[[PageOutline]]

= Ubuntu 10.04 (CPU 1, RAM 2G) in VM =

 * Ran 7 crawl jobs concurrently (memory usage 1.2G, CPU usage 80~85%):
   1. [http://www.nchc.org.tw/ NCHC official site], tw and en sections (depth 6)
   2. NCHC intranet (depth 3)
   3. [http://code.google.com/p/crawlzilla/ Crawlzilla site on Google Code] (depth 8)
   4. [http://crawlzilla.sourceforge.net/ Crawlzilla site on SourceForge] (depth 8)
   5. [http://trac.nchc.org.tw/grid trac grid] (depth 8)
   6. [https://trac.nchc.org.tw/cloud/ trac cloud] (depth 10)
   7. [http://forum.hadoop.tw/index.php hadoop forum] (depth 10)

>> jazz suggests this is probably because Hadoop's default heap is 1G, so with 2G of RAM it runs normally [[BR]]
>> follow-up: retest with less RAM and a modified Hadoop heap parameter (the heap change is sketched after the tests below) [[BR]]

 * Ubuntu 10.04 (CPU 1, RAM 512M) ('''''Hadoop heap 512M''''') in VM
   * Running 3 of the above crawl jobs concurrently still produces the out-of-memory problem
 * Ubuntu 10.04 (CPU 1, RAM 512M) ('''''Hadoop heap 256M''''') in VM
   * Running 3 of the above crawl jobs concurrently still produces the out-of-memory problem
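The heap values in the two 512M tests refer to Hadoop's daemon heap setting in `conf/hadoop-env.sh`. A minimal sketch of how that parameter can be lowered; `$HADOOP_HOME` is a placeholder for wherever Crawlzilla installs its bundled Hadoop, which this page does not confirm:

{{{
# Sketch only: lower the daemon heap from Hadoop's default of 1000 MB.
# $HADOOP_HOME stands for the (unconfirmed) install path of the bundled Hadoop.
cd "$HADOOP_HOME"

# HADOOP_HEAPSIZE is given in MB; the ~1G default matches jazz's note that
# a 2G VM runs fine while the 512M VMs hit the OOM killer.
echo "export HADOOP_HEAPSIZE=256" >> conf/hadoop-env.sh

# Restart the daemons so the new heap size takes effect.
bin/stop-all.sh && bin/start-all.sh
}}}

Note that HADOOP_HEAPSIZE only caps the daemons; the map/reduce child JVMs are sized separately via mapred.child.java.opts, which may be why lowering the heap alone did not prevent the OOM shown below.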
 * Error message ('''syslog'''):
{{{
Sep 9 09:59:28 ubuntu-186 kernel: [ 3708.133724] Out of memory: kill process 3843 (go.sh) score 1775788 or a child
Sep 9 09:59:28 ubuntu-186 kernel: [ 3708.133791] Killed process 4205 (counter.sh)
Sep 9 09:59:28 ubuntu-186 kernel: [ 3709.200789] counter.sh invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
Sep 9 09:59:28 ubuntu-186 kernel: [ 3709.200978] counter.sh cpuset=/ mems_allowed=0
Sep 9 09:59:28 ubuntu-186 kernel: [ 3709.201021] Pid: 11384, comm: counter.sh Not tainted 2.6.32-24-generic #39-Ubuntu
Sep 9 09:59:28 ubuntu-186 kernel: [ 3709.201061] Call Trace:
Sep 9 09:59:28 ubuntu-186 kernel: [ 3709.201133] [] oom_kill_process+0xa4/0x2b0
Sep 9 09:59:28 ubuntu-186 kernel: [ 3709.201201] [] ? select_bad_process+0xa9/0xe0
Sep 9 09:59:28 ubuntu-186 kernel: [ 3709.201267] [] __out_of_memory+0x51/0xa0
Sep 9 09:59:28 ubuntu-186 kernel: [ 3709.201333] [] out_of_memory+0x58/0xb0
Sep 9 09:59:28 ubuntu-186 kernel: [ 3709.201399] [] __alloc_pages_slowpath+0x407/0x4a0
Sep 9 09:59:28 ubuntu-186 kernel: [ 3709.201466] [] __alloc_pages_nodemask+0x13a/0x170
Sep 9 09:59:28 ubuntu-186 kernel: [ 3709.201533] [] do_wp_page+0x1b9/0x820
Sep 9 09:59:28 ubuntu-186 kernel: [ 3709.201600] [] ? kmap_atomic_prot+0x4c/0xf0
Sep 9 09:59:28 ubuntu-186 kernel: [ 3709.201666] [] handle_mm_fault+0x2fc/0x390
Sep 9 09:59:28 ubuntu-186 kernel: [ 3709.201733] [] do_page_fault+0x10d/0x3a0
Sep 9 09:59:28 ubuntu-186 kernel: [ 3709.201799] [] ? do_page_fault+0x0/0x3a0
Sep 9 09:59:28 ubuntu-186 kernel: [ 3709.201865] [] error_code+0x73/0x80
}}}
 * Error message ('''hadoop-crawler-jobtracker-ubuntu-186.log'''):
{{{
2010-09-09 10:04:16,052 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201009090900_0026_m_000000_0: java.io.IOException: Cannot run program "bash": java.io.IOException: error=12, Cannot allocate memory
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
        at org.apache.hadoop.util.Shell.run(Shell.java:134)
        at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:321)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
        at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:107)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:930)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:842)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.Child.main(Child.java:158)
Caused by: java.io.IOException: java.io.IOException: error=12, Cannot allocate memory
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
        at java.lang.ProcessImpl.start(ProcessImpl.java:65)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
        ... 10 more
}}}
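The error=12 (ENOMEM) above is raised when the task JVM forks a `bash` child (Hadoop shells out to `df` to check free disk space) and the kernel refuses to reserve memory for the child's copy of the JVM address space. A commonly cited workaround, not verified in these tests and offered only as a suggestion, is to allow the kernel to overcommit memory:

{{{
# Untested suggestion: let the kernel overcommit so fork() from a large
# JVM can succeed even when RAM + swap cannot cover a full copy of its
# address space (the copy is replaced almost immediately by exec).
sysctl -w vm.overcommit_memory=1

# Persist across reboots:
echo "vm.overcommit_memory = 1" >> /etc/sysctl.conf
}}}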
= Ubuntu 10.04 (CPU 1, RAM 1G, Disk 8G) in VBox =

== Crawl depths 3 through 10, tested separately; results below ==

Sites crawled:
 * http://www.nchc.org.tw/tw/
 * http://www.nchc.org.tw/en/

||Depth||Elapsed time||Result||
||3||1h:31m:00s||Finished||
||4||2h:48m:22s||Finished||
||5||more than 12 hours||Unfinished||

Execution order and status:
 * After the depth-3 crawl, ran "echo 1 > /proc/sys/vm/drop_caches" to free the page cache
 * After the depth-4 crawl, ran "echo 1 > /proc/sys/vm/drop_caches" to free the page cache
 * During the depth-5 crawl, the job progress and the related error message were as follows:

||Jobid||Priority||User||Name||Map % Complete||Reduce % Complete||
||job_201009231019_0059||NORMAL||crawler||fetch NCHC_5/segments/20100923162900||100.00%||0.00%||

 * Error message:
{{{
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/crawler/NCHC_5/segments/20100923162900/crawl_fetch/part-00000/index could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1280)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
        at org.apache.hadoop.ipc.Client.call(Client.java:697)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at $Proxy1.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy1.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2814)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2696)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
}}}
 * After dynamically adding one compute node, the system resumed the unfinished job (a sketch of this step follows the summary)
 * Summary:
   1. Besides RAM, the disk also has to be large enough to hold the intermediate files produced during computation.
   2. On this kind of error (a datanode out of capacity), dynamically adding and starting another compute node is enough to let the system resume; there is no need to kill the stuck job.
   3. This test still does not establish the crawl limit with 1G of RAM.
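For reference, a minimal sketch of the "dynamically add a compute node" step from the summary, assuming a Hadoop 0.20-style layout on the new node and a conf/ directory already pointing at the existing namenode and jobtracker (both assumptions; exact paths on a Crawlzilla node may differ):

{{{
# On the newly added node: start the slave daemons by hand.
# $HADOOP_HOME is a placeholder for the Hadoop install path on that node.
cd "$HADOOP_HOME"
bin/hadoop-daemon.sh start datanode
bin/hadoop-daemon.sh start tasktracker

# From any node: confirm HDFS now sees the extra capacity; the writes that
# failed with "could only be replicated to 0 nodes" should then retry and
# complete without killing the stuck job.
bin/hadoop dfsadmin -report
}}}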