{{{
#!html
Lab 4: Compiling Hadoop Programs
}}}
[[PageOutline]]

= Exercise 1: Word Count, Basic Version =

 * Upload the input files to HDFS
{{{
$ cd /opt/hadoop
$ mkdir lab4_input
$ echo "I like NCHC Cloud Course." > lab4_input/input1
$ echo "I like nchc Cloud Course, and we enjoy this course." > lab4_input/input2
$ bin/hadoop fs -put lab4_input lab4_input
$ bin/hadoop fs -ls lab4_input
}}}

 * Download [attachment:wiki:jazz/Hadoop_Lab6:WordCount.java?format=raw WordCount.java] and save it to /opt/hadoop
{{{
$ wget http://trac.nchc.org.tw/cloud/attachment/wiki/jazz/Hadoop_Lab6/WordCount.java?format=raw
$ mv WordCount.java\?format\=raw WordCount.java
}}}

 * Compile, package, and run the program
{{{
$ mkdir MyJava
$ javac -classpath hadoop-*-core.jar -d MyJava WordCount.java
$ jar -cvf wordcount.jar -C MyJava .
$ bin/hadoop jar wordcount.jar WordCount lab4_input/ lab4_out1/
$ bin/hadoop fs -cat lab4_out1/part-00000
}}}

 * Output in lab4_out1
{{{
#!text
Cloud	2
Course,	1
Course.	1
I	2
NCHC	1
and	1
course.	1
enjoy	1
like	2
nchc	1
this	1
we	1
}}}

-----

= Exercise 2: Word Count, Advanced Version =

{{{
$ echo "\." >pattern.txt && echo "\," >>pattern.txt
$ bin/hadoop fs -put pattern.txt ./
$ mkdir MyJava2
}}}

 * Download [attachment:wiki:jazz/Hadoop_Lab6:WordCount2.java?format=raw WordCount2.java] and save it to /opt/hadoop
{{{
$ wget http://trac.nchc.org.tw/cloud/attachment/wiki/jazz/Hadoop_Lab6/WordCount2.java?format=raw
$ mv WordCount2.java\?format\=raw WordCount2.java
}}}

{{{
$ javac -classpath hadoop-*-core.jar -d MyJava2 WordCount2.java
$ jar -cvf wordcount2.jar -C MyJava2 .
$ bin/hadoop jar wordcount2.jar WordCount2 lab4_input lab4_out2 -skip pattern.txt
$ bin/hadoop fs -cat lab4_out2/part-00000
}}}

 * Output in lab4_out2
{{{
#!text
Cloud	2
Course	2
I	2
NCHC	1
and	1
course	1
enjoy	1
like	2
nchc	1
this	1
we	1
}}}

{{{
$ bin/hadoop jar wordcount2.jar WordCount2 -Dwordcount.case.sensitive=false lab4_input lab4_out3 -skip pattern.txt
$ bin/hadoop fs -cat lab4_out3/part-00000
}}}

 * Output in lab4_out3
{{{
#!text
and	1
cloud	2
course	3
enjoy	1
i	2
like	2
nchc	2
this	1
we	1
}}}

= When a Hadoop Job Fails =

 * Check the log files in /opt/hadoop/logs
 * Alternatively, open [http://localhost:50070/logs/] or [http://localhost:50030/logs/] in a browser

Log file names follow this format:

||hadoop||-||user||-||role||-||host||.||extension||

Therefore:
 * The log of the namenode running on host pc121 is
   * hadoop-hadooper-namenode-pc121.log
 * The log of the datanode running on host pc122 is
   * hadoop-hadooper-datanode-pc122.log
 * The log of the jobtracker running on host pc123 is
   * hadoop-hadooper-jobtracker-pc123.log
 * The log of the tasktracker running on host pc124 is
   * hadoop-hadooper-tasktracker-pc124.log
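
The naming pattern above can be sketched as a small shell function. This is only an illustration: `hadoop_log_name` is a hypothetical helper, not a Hadoop command, and the `hadooper` user and `pc121` host are simply the values used in the examples above. The extension is assumed to default to `log`.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): builds a log file name that
# follows the pattern hadoop-<user>-<role>-<host>.<extension>.
hadoop_log_name() {
    user=$1          # account that runs the daemon, e.g. hadooper
    role=$2          # namenode, datanode, jobtracker or tasktracker
    host=$3          # host name, e.g. pc121
    ext=${4:-log}    # extension; assumed to default to "log"
    printf 'hadoop-%s-%s-%s.%s\n' "$user" "$role" "$host" "$ext"
}

# Prints: hadoop-hadooper-namenode-pc121.log
hadoop_log_name hadooper namenode pc121
```

To inspect a log directly, the result can be passed to `tail`, for example `tail -n 50 /opt/hadoop/logs/$(hadoop_log_name hadooper namenode pc121)`.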