Version 34 (modified by waue, 12 years ago)

--

Lab 4: Compiling Hadoop Programs

Exercise 0: Using the Examples

  • Download nchc-example.jar
    $ wget http://trac.nchc.org.tw/cloud/raw-attachment/wiki/Hadoop_Lab4/nchc-example.jar
    
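  • Run the customized Hadoop program
    $ bin/hadoop jar nchc-example.jar
  • Output (the program prints this menu in Chinese; translated here)
    ******************************************
    Welcome to the NCHC.Hadoop computing functions
    Usage:
      Hadoop jar nchc-example-*.jar <function>
    Functions:
      hello:     print the contents of the key-value pairs
      wordcount: count words separately for each file in the input directory
      mwc:       count words across all input files combined
      grep:      count the matches of a specified string
      nchcgrep:  list every word in the source files with all the lines where it appears
    ******************************************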
    
     * Example:
    
    {{{
    $ bin/hadoop jar nchc-example.jar hello lab3_input lab4_out6 1 1
    }}}
    
    
     = Exercise 1: Hello =
    
    [wiki:waue/helloHadoop code]
    
     * Run hello without arguments
    {{{
    $ bin/hadoop jar nchc-example.jar hello
    }}}
     * Usage hint printed by the program:
    {{{
    #!text
    hello <inDir> <outDir> <m> <r>
    }}}
    
     * Run it:
    
    {{{
    $ bin/hadoop dfs -put conf-local lab3_input
    $ bin/hadoop jar nchc-example.jar hello lab3_input lab4_out6 1 1
    }}}
    
     * View the results in lab4_out6
    
     = Exercise 2: Word Count (Basic) =
    [wiki:waue/wordCountI code]
    
    
     * Upload the input files to HDFS
    
    {{{
    $ cd /opt/hadoop
    $ mkdir lab4_input
    $ echo "I like NCHC Cloud Course." > lab4_input/input1
    $ echo "I like nchc Cloud Course, and we enjoy this course." > lab4_input/input2
    $ bin/hadoop fs -put lab4_input lab4_input
    $ bin/hadoop fs -ls lab4_input
    }}}
    
     * Download [raw-attachment:wiki:Hadoop_Lab4:WordCount.java WordCount.java] and save it to /opt/hadoop:
    {{{
    $ wget http://trac.nchc.org.tw/cloud/raw-attachment/wiki/Hadoop_Lab4/WordCount.java
    }}}
    
     * Compile, package, and run the program
    
    {{{
    $ mkdir MyJava
    $ javac -classpath hadoop-*-core.jar -d MyJava WordCount.java
    $ jar -cvf wordcount.jar -C MyJava .
    $ bin/hadoop jar wordcount.jar WordCount lab4_input/ lab4_out1/
    $ bin/hadoop fs -cat lab4_out1/part-r-00000
    }}}
    
     * Result in lab4_out1
    {{{
    #!text
    Cloud	2
    Course,	1
    Course.	1
    I	2
    NCHC	1
    and	1
    course.	1
    enjoy	1
    like	2
    nchc	1
    this	1
    we	1
    }}}
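
The listing above can be cross-checked locally, without a cluster, using standard Unix tools. This is only a sketch and not the logic of WordCount.java: it assumes the job splits lines on whitespace and orders keys by byte value (hence `LC_ALL=C`), and the helper name `local_wordcount` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of the lab files): count tokens read from stdin.
# Assumption: tokens are separated by whitespace only, so punctuation such as
# "Course." stays attached to the word, as in the lab4_out1 listing.
local_wordcount() {
    tr -s '[:space:]' '\n' |   # put one token on each line
    sed '/^$/d' |              # drop any empty line
    LC_ALL=C sort |            # byte order: uppercase sorts before lowercase
    uniq -c |                  # prefix each distinct token with its count
    awk '{ printf "%s\t%s\n", $2, $1 }'   # emit token<TAB>count
}

# Feed in the two input lines created earlier in this exercise.
printf '%s\n' \
    "I like NCHC Cloud Course." \
    "I like nchc Cloud Course, and we enjoy this course." |
    local_wordcount
```

Under those assumptions the pipeline prints one `token<TAB>count` line per distinct token, which can be compared line by line with the output of `bin/hadoop fs -cat lab4_out1/part-r-00000`.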
    -----
    
     = Exercise 3: Word Count (Advanced) =
    
    [wiki:waue/wordCountII code]
    
     * Create a file of patterns to skip (period and comma) and upload it to HDFS
    
    {{{
    $ echo "\." >pattern.txt && echo "\," >>pattern.txt
    $ bin/hadoop fs -put pattern.txt ./
    $ mkdir MyJava2
    }}}
    
    
     * Download [raw-attachment:wiki:Hadoop_Lab4:WordCount2.java WordCount2.java] and save it to /opt/hadoop:
    {{{
    $ wget http://trac.nchc.org.tw/cloud/raw-attachment/wiki/Hadoop_Lab4/WordCount2.java
    }}}
    
     * Compile, package, and run the program with the skip patterns
    
    {{{
    $ javac -classpath hadoop-*-core.jar -d MyJava2 WordCount2.java
    $ jar -cvf wordcount2.jar -C MyJava2 .
    $ bin/hadoop jar wordcount2.jar WordCount2 lab4_input lab4_out2 -skip pattern.txt
    $ bin/hadoop fs -cat lab4_out2/part-00000
    }}}
    
     * Result in lab4_out2
    {{{
    #!text
    Cloud	2
    Course	2
    I	2
    NCHC	1
    and	1
    course	1
    enjoy	1
    like	2
    nchc	1
    this	1
    we	1
    }}}
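
The effect of `-skip pattern.txt` can be approximated locally in the same way. This sketch assumes each line of pattern.txt (`\.` and `\,`) is treated as a regular expression whose matches are deleted from the text before it is split into words; the helper name `local_wordcount_skip` is hypothetical and the two patterns are hard-coded rather than read from the file.

```shell
#!/bin/sh
# Hypothetical helper: delete literal periods and commas, then count tokens.
# Assumption: this mirrors what the job does with the two skip patterns.
local_wordcount_skip() {
    sed -e 's/\.//g' -e 's/,//g' |   # remove every "." and ","
    tr -s '[:space:]' '\n' |         # put one token on each line
    sed '/^$/d' |                    # drop any empty line
    LC_ALL=C sort |                  # byte order, as in the listings
    uniq -c |                        # count each distinct token
    awk '{ printf "%s\t%s\n", $2, $1 }'   # emit token<TAB>count
}

printf '%s\n' \
    "I like NCHC Cloud Course." \
    "I like nchc Cloud Course, and we enjoy this course." |
    local_wordcount_skip
```

With the punctuation removed, `Course,` and `Course.` collapse into a single `Course` key, which is why the listing shrinks from twelve lines to eleven.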
    
     * Run again with case-insensitive counting
    {{{
    $ bin/hadoop jar wordcount2.jar WordCount2 -Dwordcount.case.sensitive=false lab4_input lab4_out3 -skip pattern.txt
    $ bin/hadoop fs -cat lab4_out3/part-00000
    }}}
    
     * Result in lab4_out3
    {{{
    #!text
    and	1
    cloud	2
    course	3
    enjoy	1
    i	2
    like	2
    nchc	2
    this	1
    we	1
    }}}
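
The case-insensitive run can be sketched locally by lowercasing the text before counting. This assumes `-Dwordcount.case.sensitive=false` makes the job fold the input to lower case before the skip patterns are applied; the helper name `local_wordcount_nocase` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: lowercase the input, delete periods and commas, count tokens.
# Assumption: this approximates the job run with wordcount.case.sensitive=false.
local_wordcount_nocase() {
    tr '[:upper:]' '[:lower:]' |     # fold everything to lower case
    sed -e 's/\.//g' -e 's/,//g' |   # remove every "." and ","
    tr -s '[:space:]' '\n' |         # put one token on each line
    sed '/^$/d' |                    # drop any empty line
    LC_ALL=C sort |                  # byte order, as in the listings
    uniq -c |                        # count each distinct token
    awk '{ printf "%s\t%s\n", $2, $1 }'   # emit token<TAB>count
}

printf '%s\n' \
    "I like NCHC Cloud Course." \
    "I like nchc Cloud Course, and we enjoy this course." |
    local_wordcount_nocase
```

Folding case merges `NCHC` with `nchc` and `Course` with `course`, so the three spellings of "course" end up under one key with a count of 3.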
    
