
Version 1 (modified by waue, 15 years ago)


Lab 4: Compiling Hadoop Programs

Preliminaries: Starting the Hadoop Environment

  • Restart yesterday's environment
    • Run the following on node1:
      $ cd ~
      $ wget http://hadoop.nchc.org.tw/~waue/clean.sh
      $ chmod 755 clean.sh
      $ ./clean.sh
    • Verify that Hadoop is running correctly.
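
One way to carry out this check is to look at the Java processes listed by the JDK tool `jps`. The sketch below is only an illustration and not part of the original lab: the function name `missing_daemons` is hypothetical, and the list of expected daemons (NameNode, DataNode, JobTracker, TaskTracker) is an assumption based on classic pre-YARN Hadoop, so adjust it to whatever node1 is configured to run.

```shell
# Print the name of every expected Hadoop daemon that does not appear
# in the given `jps` output. Prints nothing when all are present.
# The daemon list is an assumption; edit it to match the node's role.
missing_daemons() {
  jps_output=$1
  for d in NameNode DataNode JobTracker TaskTracker; do
    # -w matches whole words only, so "SecondaryNameNode" is not
    # mistaken for "NameNode".
    printf '%s\n' "$jps_output" | grep -qw "$d" || echo "$d"
  done
}
```

On node1 this could be called as `missing_daemons "$(jps)"`. Running `bin/hadoop dfsadmin -report` from /opt/hadoop is another way to see whether HDFS is responding.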

Exercise 1: Word Count, Basic Version

  • Upload the input files to HDFS
$ cd /opt/hadoop
$ bin/hadoop fs -mkdir input
$ echo "I like NCHC Cloud Course." > input1
$ echo "I like nchc Cloud Course, and we enjoy this course." > input2
$ bin/hadoop fs -put input1 input
$ bin/hadoop fs -put input2 input
$ bin/hadoop fs -ls input
  • Compile and run the program (WordCount.java must be in the current directory)
$ mkdir MyJava
$ javac -classpath hadoop-*-core.jar -d MyJava WordCount.java
$ jar -cvf wordcount.jar -C MyJava .
$ bin/hadoop jar wordcount.jar WordCount input/ output/
$ bin/hadoop fs -cat output/part-00000
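
As a cross-check on the job output, the same counts can be approximated locally with standard Unix tools. This is a minimal sketch, not part of the original lab: the function name `local_wordcount` is hypothetical, and it assumes that WordCount.java splits each line on whitespace only (as the stock Hadoop WordCount example does), so `Course.`, `Course,` and `course.` are counted as three different words.

```shell
# Count whitespace-separated words in the given files and print
# "word<TAB>count" lines sorted in byte order.
local_wordcount() {
  cat "$@" |
    awk '{ for (i = 1; i <= NF; i++) print $i }' |  # one word per line
    LC_ALL=C sort |                                 # byte-order sort
    uniq -c |                                       # count identical words
    awk '{ print $2 "\t" $1 }'                      # reorder to word<TAB>count
}
```

Running `local_wordcount input1 input2` in /opt/hadoop should list the same word and count pairs as output/part-00000 if the assumption about the tokenizer holds.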

Exercise 2: Word Count, Advanced Version

$ echo "\." >pattern.txt && echo "\," >>pattern.txt
$ bin/hadoop fs -put pattern.txt ./
$ mkdir MyJava2
$ javac -classpath hadoop-*-core.jar -d MyJava2 WordCount2.java
$ jar -cvf wordcount2.jar -C MyJava2 .
$ bin/hadoop jar wordcount2.jar WordCount2 input output2 -skip pattern.txt
$ bin/hadoop fs -cat output2/part-00000
$ bin/hadoop jar wordcount2.jar WordCount2 -Dwordcount.case.sensitive=false input output3 -skip pattern.txt
$ bin/hadoop fs -cat output3/part-00000
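
The effect of the two options can be previewed locally. The sketch below is an approximation under stated assumptions, not the WordCount2 implementation: the function name `local_wordcount2` is hypothetical, it hard-codes removal of `.` and `,` (equivalent to the two patterns written to pattern.txt above, but it does not read that file), and it assumes that WordCount2 lowercases the text when `wordcount.case.sensitive` is false and otherwise splits on whitespace.

```shell
# Usage: local_wordcount2 <true|false> file...
# First argument mirrors wordcount.case.sensitive.
local_wordcount2() {
  case_sensitive=$1
  shift
  cat "$@" |
    tr -d '.,' |                                    # drop the skipped characters
    if [ "$case_sensitive" = "false" ]; then
      tr '[:upper:]' '[:lower:]'                    # fold case when insensitive
    else
      cat
    fi |
    awk '{ for (i = 1; i <= NF; i++) print $i }' |  # one word per line
    LC_ALL=C sort |
    uniq -c |
    awk '{ print $2 "\t" $1 }'                      # word<TAB>count
}
```

With the two input files from Exercise 1, `local_wordcount2 true input1 input2` corresponds to the output2 run and `local_wordcount2 false input1 input2` to the output3 run. In the second case `NCHC` and `nchc` merge into a single entry, as do `Course` and `course`.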
