wiki:crawlzilla/stress_testing

Ubuntu 10.04 (CPU 1, RAM 2G) in VM

jazz suggested this is probably because Hadoop's default heap is 1G, so with 2G of RAM it operates normally.
Subsequent tests use less RAM together with a modified Hadoop heap parameter.
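
Where those heap settings live: a minimal sketch, assuming a stock Hadoop 0.20-era layout under $HADOOP_HOME (a Crawlzilla install may place these files elsewhere). HADOOP_HEAPSIZE covers the daemons; the per-task child JVMs are sized separately.

    # Daemon heap (NameNode/JobTracker/DataNode/TaskTracker), in MB.
    # Hadoop's shipped default is 1000 MB, matching the "default heap = 1G" note above.
    echo 'export HADOOP_HEAPSIZE=256' >> $HADOOP_HOME/conf/hadoop-env.sh

    # The task child JVM heap is a separate knob, set in mapred-site.xml
    # (hadoop-site.xml on older releases):
    #   <property>
    #     <name>mapred.child.java.opts</name>
    #     <value>-Xmx128m</value>
    #   </property>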


  • Ubuntu 10.04 (CPU1, RAM 512M) (hadoop Heap 512M) in VM
• Running the 3 crawl jobs above concurrently still triggers the out-of-memory problem
  • Ubuntu 10.04 (CPU1, RAM 512M) (hadoop Heap 256M) in VM
• Running the 3 crawl jobs above concurrently still triggers the out-of-memory problem
    • error message (syslog)
      Sep  9 09:59:28 ubuntu-186 kernel: [ 3708.133724] Out of memory: kill process 3843 (go.sh) score 1775788 or a child
      Sep  9 09:59:28 ubuntu-186 kernel: [ 3708.133791] Killed process 4205 (counter.sh)
      Sep  9 09:59:28 ubuntu-186 kernel: [ 3709.200789] counter.sh invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
      Sep  9 09:59:28 ubuntu-186 kernel: [ 3709.200978] counter.sh cpuset=/ mems_allowed=0
      Sep  9 09:59:28 ubuntu-186 kernel: [ 3709.201021] Pid: 11384, comm: counter.sh Not tainted 2.6.32-24-generic #39-Ubuntu
      Sep  9 09:59:28 ubuntu-186 kernel: [ 3709.201061] Call Trace:
      Sep  9 09:59:28 ubuntu-186 kernel: [ 3709.201133]  [<c01cd1f4>] oom_kill_process+0xa4/0x2b0
      Sep  9 09:59:28 ubuntu-186 kernel: [ 3709.201201]  [<c01cd869>] ? select_bad_process+0xa9/0xe0
      Sep  9 09:59:28 ubuntu-186 kernel: [ 3709.201267]  [<c01cd8f1>] __out_of_memory+0x51/0xa0
      Sep  9 09:59:28 ubuntu-186 kernel: [ 3709.201333]  [<c01cd998>] out_of_memory+0x58/0xb0
      Sep  9 09:59:28 ubuntu-186 kernel: [ 3709.201399]  [<c01d01a7>] __alloc_pages_slowpath+0x407/0x4a0
      Sep  9 09:59:28 ubuntu-186 kernel: [ 3709.201466]  [<c01d037a>] __alloc_pages_nodemask+0x13a/0x170
      Sep  9 09:59:28 ubuntu-186 kernel: [ 3709.201533]  [<c01e6049>] do_wp_page+0x1b9/0x820
      Sep  9 09:59:28 ubuntu-186 kernel: [ 3709.201600]  [<c013052c>] ? kmap_atomic_prot+0x4c/0xf0
      Sep  9 09:59:28 ubuntu-186 kernel: [ 3709.201666]  [<c01e6e0c>] handle_mm_fault+0x2fc/0x390
      Sep  9 09:59:28 ubuntu-186 kernel: [ 3709.201733]  [<c058f1bd>] do_page_fault+0x10d/0x3a0
      Sep  9 09:59:28 ubuntu-186 kernel: [ 3709.201799]  [<c058f0b0>] ? do_page_fault+0x0/0x3a0
      Sep  9 09:59:28 ubuntu-186 kernel: [ 3709.201865]  [<c058d0b3>] error_code+0x73/0x80
      
    • error message (hadoop-crawler-jobtracker-ubuntu-186.log)
      2010-09-09 10:04:16,052 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201009090900_0026_m_000000_0: java.io.IOException: Cannot run program "bash": java.io.IOException: error=12, Cannot allocate memory
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
        at org.apache.hadoop.util.Shell.run(Shell.java:134)
        at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:321)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
        at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:107)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:930)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:842)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.Child.main(Child.java:158)
      Caused by: java.io.IOException: java.io.IOException: error=12, Cannot allocate memory
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
        at java.lang.ProcessImpl.start(ProcessImpl.java:65)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
        ... 10 more
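
Note that error=12 (ENOMEM) above is not the task JVM exhausting its own heap: it is the JVM failing to fork() a bash child for Hadoop's df disk check, which on a swapless 256M-512M VM can fail even when little memory is actually in use. Two generic mitigations worth trying, a sketch only and not Crawlzilla-specific advice:

    # 1. Let the kernel overcommit, so fork() from a large JVM does not
    #    need the whole parent address space to be committable:
    sysctl -w vm.overcommit_memory=1

    # 2. Or add some swap as fork headroom (a hypothetical 512 MB swap file):
    dd if=/dev/zero of=/swapfile bs=1M count=512
    mkswap /swapfile
    swapon /swapfile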
      

Ubuntu 10.04 (CPU 1, RAM 1G, Disk 8G) in VBox

Crawl depths from 3 to 10 were tested separately; the results are as follows.

Crawled site:

Depth   Time taken           Result
3       1h:31m:00s           Finished
4       2h:48m:22s           Finished
5       more than 12 hours   Unfinished
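
For reference, Crawlzilla drives Nutch underneath; a roughly equivalent hand-run Nutch 1.x command for the depth-5 test might look like the following. The urls seed directory and the -topN value are assumptions; NCHC_5 matches the HDFS paths in the logs below.

    # Depth-5 crawl into NCHC_5 (seed list and topN are illustrative only)
    bin/nutch crawl urls -dir NCHC_5 -depth 5 -topN 1000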

Execution order and status:

  • After the depth-3 crawl, run "echo 1 > /proc/sys/vm/drop_caches" to drop the kernel page cache
  • After the depth-4 crawl, run "echo 1 > /proc/sys/vm/drop_caches" to drop the kernel page cache
  • While crawling depth 5, job progress and the related error messages were as follows:
Jobid                  Priority  User     Name                                  Map % Complete  Reduce % Complete
job_201009231019_0059  NORMAL    crawler  fetch NCHC_5/segments/20100923162900  100.00%         0.00%

  • Error message:
    org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/crawler/NCHC_5/segments/20100923162900/crawl_fetch/part-00000/index could only be replicated to 0 nodes, instead of 1
    	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1280)
    	at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
    	at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    	at java.lang.reflect.Method.invoke(Method.java:597)
    	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
    	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
    
    	at org.apache.hadoop.ipc.Client.call(Client.java:697)
    	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    	at $Proxy1.addBlock(Unknown Source)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    	at java.lang.reflect.Method.invoke(Method.java:597)
    	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    	at $Proxy1.addBlock(Unknown Source)
    	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2814)
    	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2696)
    	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
    	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
    
  • After dynamically adding one compute node, the system continued executing the unfinished job
  • Summary:
    • 1. Besides RAM, the disk must also be large enough to hold the intermediate files produced during the computation.
    • 2. On a similar error (DataNode out of space), dynamically adding another compute node and starting it is enough for the system to continue; there is no need to kill the stuck job (see the sketch after this list).
    • 3. This test still does not establish the crawl limit with 1G of RAM.
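
A minimal sketch of the node-addition step in point 2, assuming a plain Hadoop 0.20 layout on the new machine with its conf/ already pointing at the existing NameNode and JobTracker (Crawlzilla's own node-management scripts may wrap this differently):

    # On the new node: start a DataNode (adds HDFS capacity) and a
    # TaskTracker (adds compute). The running NameNode/JobTracker accept
    # them without a restart, so the stalled job can finish writing.
    $HADOOP_HOME/bin/hadoop-daemon.sh start datanode
    $HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker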