- Timestamp: Jun 10, 2010, 5:02:18 PM
- Author: shunfa
- Comment: --
| v8 | v9 | |
| 28 | 28 | == Execution == |
| 29 | 29 | |
| 30 | | === Upload urls === |
| 31 | | * bin/hadoop dfs -put urls urls |
| | 30 | === 2010/06/10 === |
| 32 | 31 | {{{ |
| 33 | | log4j:ERROR setFile(null,true) call failed. |
| 34 | | java.io.FileNotFoundException: /tmp/NutchEZ/logs/hadoop.log (Permission denied) |
| 35 | | ...something message... |
| 36 | | log4j:ERROR Either File or DatePattern options are not set for appender [DRFA]. |
| 37 | | put: org.apache.hadoop.security.AccessControlException: Permission denied: user=nutchuser, access=WRITE, inode="":root:supergroup:rwxr-xr-x |
| | 32 | 10/06/10 16:58:42 INFO mapred.JobClient: Task Id : attempt_201006091555_0003_r_000000_0, Status : FAILED |
| | 33 | Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out. |
| | 34 | 10/06/10 16:58:53 INFO mapred.JobClient: Task Id : attempt_201006091555_0003_r_000000_1, Status : FAILED |
| | 35 | Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out. |
| | 36 | 10/06/10 16:59:05 INFO mapred.JobClient: Task Id : attempt_201006091555_0003_r_000000_2, Status : FAILED |
| | 37 | Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out. |
| | 38 | Exception in thread "main" java.io.IOException: Job failed! |
| | 39 | at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232) |
| | 40 | at org.apache.nutch.crawl.Generator.generate(Generator.java:472) |
| | 41 | at org.apache.nutch.crawl.Generator.generate(Generator.java:409) |
| | 42 | at org.apache.nutch.crawl.Crawl.main(Crawl.java:116) |
| | 43 | nutch crawl is error |
| 38 | 44 | }}} |
| 39 | | * Temporarily switch to root for testing |
| 40 | | |
| 41 | | === Crawl === |
| 42 | | * bin/nutch crawl urls -dir search -threads 2 -depth 3 -topN 100000 |
| 43 | | |
| 44 | | == To-Do ==  |
| 45 | | * Run-time tests: crawling, file search, etc. |
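
The 2010/06/10 log above shows the same reduce task failing three times before the job aborts. As a rough aid for reading such output, the sketch below pulls the FAILED attempt IDs out of JobClient lines and counts failures per task. The function names `failed_attempts` and `failures_per_task` are hypothetical helpers, not part of Hadoop or Nutch, and the sample text is copied from the log in this diff. It assumes the usual Hadoop attempt ID layout, `attempt_<job>_<m|r>_<task>_<attempt number>`.

```python
import re
from collections import Counter

# Sample lines copied from the 2010/06/10 log in the diff above.
LOG = """\
10/06/10 16:58:42 INFO mapred.JobClient: Task Id : attempt_201006091555_0003_r_000000_0, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
10/06/10 16:58:53 INFO mapred.JobClient: Task Id : attempt_201006091555_0003_r_000000_1, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
10/06/10 16:59:05 INFO mapred.JobClient: Task Id : attempt_201006091555_0003_r_000000_2, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
"""

# Assumed attempt ID layout: attempt_<job>_<m|r>_<task>_<attempt number>
ATTEMPT_RE = re.compile(
    r"Task Id : attempt_(?P<job>\d+_\d+)_(?P<kind>[mr])_(?P<task>\d+)_(?P<attempt>\d+)"
    r", Status : FAILED"
)


def failed_attempts(log_text):
    """Return (job, kind, task, attempt_number) for each FAILED attempt line."""
    return [
        (m["job"], m["kind"], m["task"], int(m["attempt"]))
        for m in ATTEMPT_RE.finditer(log_text)
    ]


def failures_per_task(log_text):
    """Count FAILED attempts per (kind, task) pair."""
    return Counter((kind, task) for _, kind, task, _ in failed_attempts(log_text))


if __name__ == "__main__":
    for (kind, task), count in failures_per_task(LOG).items():
        label = "reduce" if kind == "r" else "map"
        print(f"{label} task {task}: {count} failed attempts")
```

Grouping by task rather than by attempt makes it easier to see whether one reducer keeps failing, as here, or whether failures are spread across many tasks.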