= Crawlzilla Crawl Data Reference =

== Single-Machine Test ==

||Host information|| ||
||CPU||Intel® Core™ i7-920 Processor||
||Memory||12 GiB||
||Operating system||Ubuntu 10.04 (Lucid) (x86)||
||Linux Kernel||2.6.32-26-generic||
||Crawlzilla version||0.3.0-101115, installed on a single host||

Test method: jobs were submitted one at a time. Each crawl job was started only after the previous one had finished, rather than submitting all jobs at once.

Start URL: http://udn.com/NEWS/mainpage.shtml

Test results:

||Depth||Exec. Time (hr)||Crawled Files||Crawled Words||
||3||0.88||4599||89742||
||4||1.58||8903||126229||
||5||2.83||13498||171480||
||6||9.12||16744||204349||
||7||9.61||21324||312669||
||8||10.28||24984||356119||
||9||9.3||28044||413921||
||10||9.44||31981||431790||

== Cluster Test ==

=== 6 Compute Nodes ===

||Host information|| ||
||Number of compute nodes||6||
||CPU||Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz||
||Memory||8 GiB||
||Operating system||Ubuntu 10.04 (Lucid) (x86)||
||Linux Kernel||2.6.32-27 (PAE enabled)||
||Crawlzilla version||0.3.0-101116||

Test method: a shell script ran the crawl jobs for depths 3 through 10.

Start URL: http://udn.com/NEWS/mainpage.shtml

Test results:

||Depth||Exec. Time (hr)||Crawled Files||Crawled Words||
||3||0.9||4642||89168||
||4||2.02||8212||123186||
||5||2.98||12517||163206||
||6||3.95||16220||210714||
||7||6.23||19577||319898||
||8||5.78||22705||351934||
||9||6.01||26148||407658||
||10||6.34||30954||440307||

=== 3 Compute Nodes ===

||Host information|| ||
||Number of compute nodes||3||
||CPU||Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz||
||Memory||8 GiB||
||Operating system||Ubuntu 10.04 (Lucid) (x86)||
||Linux Kernel||2.6.32-27 (PAE enabled)||
||Crawlzilla version||0.3.0-101116||

Test method: a shell script ran the crawl jobs for depths 3 through 10.

Start URL: http://udn.com/NEWS/mainpage.shtml

Test results:

||Depth||Exec. Time (hr)||Crawled Files||Crawled Words||
||3||0.75||2457||60438||
||4||2.39||6830||108784||
||5||2.5||11398||153627||
||6||4.7||15310||233880||
||7||6.09||19538||232897||
||8||5.51||23300||348894||
||9||6.33||26689||379194||
||10||7.98||30605||431518||

== Reference ==

 * Number of maps and reduces: [http://www.ics.uci.edu/~abehm/hadoop.html#how_many_maps_reduces]
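To compare the single-machine and cluster runs recorded in the test results tables, the numbers can be reduced to crawl throughput (crawled files per hour) and to the ratio of single-machine time to cluster time at each depth. The sketch below only re-keys the published measurements; the dictionary and function names are illustrative, not part of Crawlzilla. Because the single host (i7-920, build 0.3.0-101115) and the cluster nodes (Q9550, build 0.3.0-101116) differ in hardware and version, the ratio should be read as a rough indicator rather than a clean speedup measurement.

```python
# Measurements copied from the test results tables: depth -> (hours, files, words).
# Names are illustrative only; they are not Crawlzilla identifiers.
SINGLE = {
    3: (0.88, 4599, 89742), 4: (1.58, 8903, 126229),
    5: (2.83, 13498, 171480), 6: (9.12, 16744, 204349),
    7: (9.61, 21324, 312669), 8: (10.28, 24984, 356119),
    9: (9.3, 28044, 413921), 10: (9.44, 31981, 431790),
}
CLUSTER_6 = {
    3: (0.9, 4642, 89168), 4: (2.02, 8212, 123186),
    5: (2.98, 12517, 163206), 6: (3.95, 16220, 210714),
    7: (6.23, 19577, 319898), 8: (5.78, 22705, 351934),
    9: (6.01, 26148, 407658), 10: (6.34, 30954, 440307),
}
CLUSTER_3 = {
    3: (0.75, 2457, 60438), 4: (2.39, 6830, 108784),
    5: (2.5, 11398, 153627), 6: (4.7, 15310, 233880),
    7: (6.09, 19538, 232897), 8: (5.51, 23300, 348894),
    9: (6.33, 26689, 379194), 10: (7.98, 30605, 431518),
}


def files_per_hour(results):
    """Crawled files divided by execution time, for each depth."""
    return {d: files / hours for d, (hours, files, _words) in results.items()}


def time_ratio(baseline, other):
    """Baseline time divided by other time per depth; > 1 means `other` was faster."""
    return {d: baseline[d][0] / other[d][0] for d in baseline if d in other}


if __name__ == "__main__":
    for name, res in (("single", SINGLE), ("6 nodes", CLUSTER_6), ("3 nodes", CLUSTER_3)):
        rates = files_per_hour(res)
        print(name, "files/hr:", {d: round(r) for d, r in rates.items()})
    for name, res in (("6 nodes", CLUSTER_6), ("3 nodes", CLUSTER_3)):
        ratios = time_ratio(SINGLE, res)
        print("single /", name, "time:", {d: round(r, 2) for d, r in ratios.items()})
```

At depth 3 the single machine (0.88 hr) and the six-node cluster (0.9 hr) take about the same time, so the ratio only rises above 1 noticeably at the deeper crawls.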