Building a jar with Eclipse that runs MapReduce on Hadoop
Note: requires Eclipse 3.3 or later together with Hadoop 0.17 or later.
| Hadoop install directory | /opt/hadoop |
| Input folder | /opt/hadoop/input |
| Output folder | /opt/hadoop/output |
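- Create a MapReduce project

| Window action | Settings in the dialog | Notes |
| File > New > Map/Reduce Project > Next | Project name: sample; Configure Hadoop install directory: /opt/hadoop; => Finish | When done, the sample project is added and Eclipse switches to the MapReduce perspective |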
- Add the file WordCount.java

| Window action | Settings in the dialog | Result |
| Right-click the sample project > New > File | sample > src; File name: WordCount.java; => Finish | When done, the project contains a new WordCount.java file |
- Write the contents (code) of WordCount.java
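
The WordCount.java source itself is not reproduced on this page. As a rough, Hadoop-free illustration of what a word-count job computes, the sketch below splits each line into whitespace-separated tokens (the "map" idea) and sums one per occurrence of each token (the "reduce" idea). The class name `WordCountSketch` and the method `count` are hypothetical; the real job would implement Hadoop's Mapper and Reducer interfaces from the `mapred` package that appears in the console logs, and this sketch is not a replacement for it.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical, Hadoop-free sketch; NOT the WordCount.java used on this page.
public class WordCountSketch {

    // Splits every line on whitespace and adds 1 to the tally of each token.
    // A TreeMap keeps the words in sorted order.
    public static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                counts.merge(tokens.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two sample lines stand in for the plain-text files in the input folder.
        Map<String, Integer> counts = count(Arrays.asList("hello hadoop", "hello eclipse"));
        // Prints one "word<TAB>count" pair per line.
        counts.forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

For the two sample lines, `hello` is counted twice and the other two words once each.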
 
- Run

| Window action | Settings in the dialog | Result |
| Run > Run Configurations... | Main tab: Name: WordCount; Project: sample; Main class: WordCount. Arguments tab: Program arguments: /opt/hadoop/input /opt/hadoop/output => Apply => Run | The run output appears in the Console view |
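- Eclipse runs this code in a simulated Hadoop environment on the local machine, so nothing is sent to HDFS or handed to Hadoop's job tracker for MapReduce. The absence of any job record at http://localhost:50030 confirms this.
- Because the job runs locally, the Program arguments /opt/hadoop/input /opt/hadoop/output are directories on the local machine.
 - Make sure the input folder contains plain-text data and that the output folder does not exist yet (the run creates it and writes the results there).
 
 - If the Console shows no error messages, the program runs correctly on the local machine: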
09/02/06 17:18:35 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
09/02/06 17:18:35 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
09/02/06 17:18:35 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
09/02/06 17:18:35 INFO mapred.FileInputFormat: Total input paths to process : 1
... (omitted) ...
09/02/06 17:18:36 INFO mapred.JobClient:     Map output bytes=445846
09/02/06 17:18:36 INFO mapred.JobClient:     Map input bytes=320950
09/02/06 17:18:36 INFO mapred.JobClient:     Combine input records=37943
09/02/06 17:18:36 INFO mapred.JobClient:     Map output records=37943
09/02/06 17:18:36 INFO mapred.JobClient:     Reduce input records=9284
 
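Troubleshooting:
- Check that the input folder contains plain-text data
 - Check that the output folder does not exist yet (the run creates it and writes the results there)
 - Check that the settings under "Java Application" > "WordCount" in "Run Configurations" are correct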
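
The first two checks above can be automated before launching the run. The following is a minimal sketch, assuming the local paths used in this walkthrough; the class `LocalRunPreflight` and its `check` method are hypothetical helpers, not part of Hadoop or of WordCount.java. It only verifies that the input folder holds at least one regular file and that the output folder is absent; it does not inspect whether the files are actually plain text.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper; not part of Hadoop or of the original WordCount.java.
public class LocalRunPreflight {

    // Returns a list of problems; an empty list means both checks passed.
    public static List<String> check(Path input, Path output) throws IOException {
        List<String> problems = new ArrayList<>();
        if (!Files.isDirectory(input)) {
            problems.add("input folder does not exist: " + input);
        } else {
            // Look for at least one regular file directly inside the input folder.
            try (Stream<Path> entries = Files.list(input)) {
                if (entries.noneMatch(Files::isRegularFile)) {
                    problems.add("input folder contains no files: " + input);
                }
            }
        }
        // The job creates the output folder itself, so it must not exist yet.
        if (Files.exists(output)) {
            problems.add("output folder already exists: " + output);
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        // Defaults are the local paths used in this walkthrough.
        Path input = Paths.get(args.length > 0 ? args[0] : "/opt/hadoop/input");
        Path output = Paths.get(args.length > 1 ? args[1] : "/opt/hadoop/output");
        List<String> problems = check(input, output);
        if (problems.isEmpty()) {
            System.out.println("OK: ready to run");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

Run it with the same two arguments that are entered under Program arguments; any line it prints names the folder that needs attention.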
 
- Package as a JAR

| Window action | Settings in the dialog | Result |
| File > Export > Java > Runnable JAR file | Launch configuration: WordCount - sample; Export destination: /opt/hadoop/WordCount.jar => Finish => OK | WordCount.jar can be found under /opt/hadoop/ |
- The final OK confirms that the required Hadoop libraries are packed into the jar, which is why the exported WordCount.jar is about 4.3 MB
 
- Run WordCount on HDFS

Commands:
$ cd /opt/hadoop
$ bin/hadoop jar WordCount.jar /user/waue/input /user/waue/out/
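- With bin/hadoop, write "jar", not "-jar". When running a jar with plain java, it is the other way round: use $ java -jar XXX.jar, not "jar" alone.
 - /user/waue/input and /user/waue/out/ are the input and output arguments. Both are paths on HDFS; make sure /user/waue/input on HDFS contains plain-text files and that the folder /user/waue/out/ does not exist.
 - To run the job again after a successful run, change the output folder name; otherwise the run fails with an error because the folder already exists.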
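
One way to avoid the "folder already exists" error on repeated runs is to derive a fresh output folder name each time. The sketch below is only an illustration of that idea: the class `OutputPathNamer` and the timestamp suffix format are assumptions, and the code merely builds a path string without contacting HDFS.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; it only builds a path string and never touches HDFS.
public class OutputPathNamer {

    // Assumed suffix format: date and time down to the second.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss");

    // Appends "-<timestamp>" to the base path, dropping one trailing slash if present.
    public static String withTimestamp(String base, LocalDateTime when) {
        String trimmed = base.endsWith("/") ? base.substring(0, base.length() - 1) : base;
        return trimmed + "-" + STAMP.format(when);
    }

    public static void main(String[] args) {
        // Default is the HDFS output path used in this walkthrough.
        String base = args.length > 0 ? args[0] : "/user/waue/out/";
        System.out.println(withTimestamp(base, LocalDateTime.now()));
    }
}
```

The printed path can then be passed as the second argument to bin/hadoop jar WordCount.jar in place of /user/waue/out/.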
 
Console output
09/02/06 18:13:14 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
09/02/06 18:13:14 INFO mapred.FileInputFormat: Total input paths to process : 1
09/02/06 18:13:14 INFO mapred.FileInputFormat: Total input paths to process : 1
09/02/06 18:13:15 INFO mapred.JobClient: Running job: job_200902051032_0009
09/02/06 18:13:16 INFO mapred.JobClient:  map 0% reduce 0%
09/02/06 18:13:20 INFO mapred.JobClient:  map 100% reduce 0%
09/02/06 18:13:23 INFO mapred.JobClient: Job complete: job_200902051032_0009
09/02/06 18:13:23 INFO mapred.JobClient: Counters: 16
09/02/06 18:13:23 INFO mapred.JobClient:   File Systems
09/02/06 18:13:23 INFO mapred.JobClient:     HDFS bytes read=320950
09/02/06 18:13:23 INFO mapred.JobClient:     HDFS bytes written=130568
09/02/06 18:13:23 INFO mapred.JobClient:     Local bytes read=168448
09/02/06 18:13:23 INFO mapred.JobClient:     Local bytes written=336932
09/02/06 18:13:23 INFO mapred.JobClient:   Job Counters 
09/02/06 18:13:23 INFO mapred.JobClient:     Launched reduce tasks=1
09/02/06 18:13:23 INFO mapred.JobClient:     Launched map tasks=1
09/02/06 18:13:23 INFO mapred.JobClient:     Data-local map tasks=1
09/02/06 18:13:23 INFO mapred.JobClient:   Map-Reduce Framework
09/02/06 18:13:23 INFO mapred.JobClient:     Reduce input groups=9284
09/02/06 18:13:23 INFO mapred.JobClient:     Combine output records=18568
09/02/06 18:13:23 INFO mapred.JobClient:     Map input records=7868
09/02/06 18:13:23 INFO mapred.JobClient:     Reduce output records=9284
09/02/06 18:13:23 INFO mapred.JobClient:     Map output bytes=445846
09/02/06 18:13:23 INFO mapred.JobClient:     Map input bytes=320950
09/02/06 18:13:23 INFO mapred.JobClient:     Combine input records=47227
09/02/06 18:13:23 INFO mapred.JobClient:     Map output records=37943
09/02/06 18:13:23 INFO mapred.JobClient:     Reduce input records=9284