Changes between Version 5 and Version 6 of waue/2009/0609
- Timestamp: Jun 9, 2009, 4:39:34 PM
waue/2009/0609
v5  v6
 9   9  === readdb ===
10  10  - read / dump crawl db
11      - Usage: CrawlDbReader <crawldb> (-stats | -dump <out_dir> | -topN <nnnn> <out_dir> [<min>] | -url <url>)
    11  - Usage: !CrawlDbReader <crawldb> (-stats | -dump <out_dir> | -topN <nnnn> <out_dir> [<min>] | -url <url>)
12  12  - -stats [-sort] print overall statistics to System.out
13  13  {{{
 …   …
21  21  }}}
22  22  - -dump <out_dir> [-format normal|csv ] dump the whole db to a text file in <out_dir>
    23  {{{
    24  $ nutch readdb /tmp/search/crawldb/ -dump ./dump
    25  $ vim ./dump/part-00000
    26  }}}
23  27  - -url <url> print information on <url> to System.out
    28  {{{
    29  $ nutch readdb /tmp/search/crawldb/ -url http://www.nchc.org.tw/tw/
    30  URL: http://www.nchc.org.tw/tw/
    31
    32  Version: 7
    33
    34  Status: 6 (db_notmodified)
    35
    36  Fetch time: Thu Jul 09 14:34:48 CST 2009
    37
    38  Modified time: Thu Jan 01 08:00:00 CST 1970
    39
    40  Retries since fetch: 0
    41
    42  Retry interval: 2592000 seconds (30 days)
    43
    44  Score: 3.1152809
    45
    46  Signature: ce0202bbd593b09b86ce8a9aa991b321
    47
    48  Metadata: _pst_: success(1), lastModified=0
    49  }}}
24  50  - -topN <nnnn> <out_dir> [<min>] dump top <nnnn> urls sorted by score to <out_dir>
25  51
26  52  === inject ===
27  53  - inject new urls into the database
28      - Usage: Injector <crawldb> <url_dir>
    54  - Usage: !Injector <crawldb> <url_dir>
29  55
30  56  === readlinkdb ===
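
Note on the `-url` output added under readdb in this revision: the record is plain `Key: value` text, so individual fields can be picked out with standard shell tools. The following is a minimal sketch, not part of the wiki page itself. It assumes the field layout matches the sample shown in the diff (other Nutch versions may print it differently), and the variable names `record`, `seconds`, and `days` are illustrative only. It checks the "Retry interval" line by converting the seconds back to days.

```shell
# Sample record copied from the -url output in the diff (blank lines dropped).
record='URL: http://www.nchc.org.tw/tw/
Status: 6 (db_notmodified)
Retry interval: 2592000 seconds (30 days)
Score: 3.1152809'

# Take the text after "Retry interval: " and keep its first word (the seconds).
seconds=$(printf '%s\n' "$record" |
  awk -F': ' '/^Retry interval:/ { split($2, parts, " "); print parts[1] }')

# One day is 86400 seconds; the shell does integer division here.
days=$((seconds / 86400))
echo "$seconds seconds = $days days"
```

In a real session the here-string would presumably be replaced by piping the output of the `nutch readdb ... -url ...` command from the diff into the same `awk` filter.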