- Timestamp: Jun 9, 2009, 4:39:34 PM
- Author: waue
- Comment: --
v5 | v6 |
9 | 9 | === readdb === |
10 | 10 | - read / dump crawl db |
11 | | - Usage: CrawlDbReader <crawldb> (-stats | -dump <out_dir> | -topN <nnnn> <out_dir> [<min>] | -url <url>) |
| 11 | - Usage: !CrawlDbReader <crawldb> (-stats | -dump <out_dir> | -topN <nnnn> <out_dir> [<min>] | -url <url>) |
12 | 12 | - -stats [-sort] print overall statistics to System.out |
13 | 13 | {{{ |
… |
… |
|
21 | 21 | }}} |
22 | 22 | - -dump <out_dir> [-format normal|csv ] dump the whole db to a text file in <out_dir> |
| 23 | {{{ |
| 24 | $ nutch readdb /tmp/search/crawldb/ -dump ./dump |
| 25 | $ vim ./dump/part-00000 |
| 26 | }}} |
23 | 27 | - -url <url> print information on <url> to System.out |
| 28 | {{{ |
| 29 | $ nutch readdb /tmp/search/crawldb/ -url http://www.nchc.org.tw/tw/ |
| 30 | URL: http://www.nchc.org.tw/tw/ |
| 31 | |
| 32 | Version: 7 |
| 33 | |
| 34 | Status: 6 (db_notmodified) |
| 35 | |
| 36 | Fetch time: Thu Jul 09 14:34:48 CST 2009 |
| 37 | |
| 38 | Modified time: Thu Jan 01 08:00:00 CST 1970 |
| 39 | |
| 40 | Retries since fetch: 0 |
| 41 | |
| 42 | Retry interval: 2592000 seconds (30 days) |
| 43 | |
| 44 | Score: 3.1152809 |
| 45 | |
| 46 | Signature: ce0202bbd593b09b86ce8a9aa991b321 |
| 47 | |
| 48 | Metadata: _pst_: success(1), lastModified=0 |
| 49 | }}} |
24 | 50 | - -topN <nnnn> <out_dir> [<min>] dump top <nnnn> urls sorted by score to <out_dir> |
25 | 51 | |
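The `-url` report above is a list of `Key: value` lines. Below is a minimal Python sketch for pulling that report into a dict, assuming every field follows the same layout as the sample (only the sample above confirms it). The helper names `readdb_url`, `parse_readdb_url` and `parse_status` are hypothetical and not part of Nutch.

```python
import re
import subprocess

def readdb_url(crawldb, url):
    """Run `nutch readdb <crawldb> -url <url>` and return its stdout.

    Assumes the `nutch` launcher is on PATH, as in the commands above.
    """
    result = subprocess.run(
        ["nutch", "readdb", crawldb, "-url", url],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def parse_readdb_url(text):
    """Turn the "Key: value" lines of a -url report into a dict.

    Only the first ": " on a line separates key from value, so values that
    contain colons (URL, Metadata) stay intact. Blank lines are skipped.
    """
    record = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:
            record[key] = value
    return record

def parse_status(value):
    """Split a status such as '6 (db_notmodified)' into (6, 'db_notmodified')."""
    match = re.fullmatch(r"(\d+) \((\w+)\)", value)
    if match is None:
        raise ValueError(f"unexpected status format: {value!r}")
    return int(match.group(1)), match.group(2)

# Sample copied from the -url example on this page.
SAMPLE = """URL: http://www.nchc.org.tw/tw/

Version: 7

Status: 6 (db_notmodified)

Retry interval: 2592000 seconds (30 days)

Score: 3.1152809

Metadata: _pst_: success(1), lastModified=0
"""

if __name__ == "__main__":
    record = parse_readdb_url(SAMPLE)
    code, name = parse_status(record["Status"])
    print(record["URL"], code, name, float(record["Score"]))
```

Splitting on the first `": "` rather than on every colon keeps the `http://` URL and the `_pst_: ...` metadata value in one piece.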
26 | 52 | === inject === |
27 | 53 | - inject new urls into the database |
28 | | - Usage: Injector <crawldb> <url_dir> |
| 54 | - Usage: !Injector <crawldb> <url_dir> |
29 | 55 | |
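`nutch inject` takes a directory of seed URLs, not a single URL. In the usual Nutch setup, that directory holds plain-text files with one URL per line. Below is a minimal Python sketch that prepares such a directory and builds the inject command, assuming that layout. The `seed.txt` file name and the helpers `write_seed_dir`, `inject_command` and `run_inject` are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path

def write_seed_dir(url_dir, urls, filename="seed.txt"):
    """Write non-empty URLs, one per line, to <url_dir>/<filename>."""
    directory = Path(url_dir)
    directory.mkdir(parents=True, exist_ok=True)
    seed = directory / filename
    lines = [u.strip() for u in urls if u.strip()]
    seed.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return seed

def inject_command(crawldb, url_dir):
    """Build the argv for `nutch inject <crawldb> <url_dir>`."""
    return ["nutch", "inject", str(crawldb), str(url_dir)]

def run_inject(crawldb, url_dir):
    """Run the injector; assumes the `nutch` launcher is on PATH."""
    subprocess.run(inject_command(crawldb, url_dir), check=True)

if __name__ == "__main__":
    # Demo in a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        seed = write_seed_dir(Path(tmp) / "urls", ["http://www.nchc.org.tw/tw/"])
        print(seed.read_text(encoding="utf-8"), end="")
        print(" ".join(inject_command("/tmp/search/crawldb", seed.parent)))
```

Passing the argument list straight to `subprocess.run` avoids shell quoting issues with unusual paths.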
30 | 56 | === readlinkdb === |