Note: This requires Eclipse 3.3 or later together with Hadoop 0.17 or later.
* The installation environment used in this article:
|| Name || Directory ||
|| Hadoop installation directory || /opt/hadoop ||
|| Input folder || /opt/hadoop/input ||
|| Output folder || /opt/hadoop/output ||

1. Create a MapReduce project

|| Menu path || Settings in the dialog || Notes ||
|| '''File''' > '''New''' > '''Map/Reduce Project''' > '''Next''' || '''Project name''': ''sample'' [[br]] '''Configure Hadoop install directory''': /opt/hadoop [[br]] => '''Finish''' || A new sample project is added and Eclipse switches to the Map/Reduce perspective ||

2. Add the file WordCount.java

|| Menu path || Settings in the dialog || Result ||
|| Right-click the sample project > '''New''' > '''File''' || sample > '''src''' [[br]] '''File name''': WordCount.java [[br]] => '''Finish''' || A new WordCount.java file appears in the project ||

3. Write the contents of WordCount.java ([wiki:WordCount code])
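The actual job code is on the linked page and uses Hadoop's Mapper and Reducer classes. As a rough illustration of what those phases compute, here is a plain-Java sketch with no Hadoop dependency: the map step emits a (word, 1) pair per token, and the reduce step groups the pairs by word and sums them. It assumes whitespace tokenization with `StringTokenizer`, as in the standard Hadoop WordCount example; `WordCountSketch` and its methods are hypothetical names, and this is not the code to paste into WordCount.java.

{{{
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical plain-Java illustration of WordCount's data flow; no Hadoop classes are used.
class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(Map.entry(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle + reduce phase: group the pairs by word and sum their counts.
    // A TreeMap keeps the words sorted, similar to Hadoop's sorted reduce input.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Runs the map step on every line, then reduces all emitted pairs.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("hello hadoop", "hello eclipse")));
        // prints {eclipse=1, hadoop=1, hello=2}
    }
}
}}}

The Combine counters in the logs suggest that the job also applies this summing step on the map side as a combiner, which shrinks the number of pairs sent to the reducer.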

4. Run the program

|| Menu path || Settings in the dialog || Result ||
|| '''Run''' > '''Run Configurations...''' || '''Main''' tab: [[br]] '''Name''': '''WordCount''' [[br]] '''Project''': sample [[br]] '''Main class''': WordCount [[br]] '''Arguments''' tab: [[br]] '''Program arguments''': /opt/hadoop/input /opt/hadoop/output => '''Apply''' => '''Run''' || The results appear in the Console view ||

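* Eclipse runs this code in a simulated Hadoop environment on the local machine, so the job is not submitted through HDFS to Hadoop's JobTracker for MapReduce processing. You can confirm this at http://localhost:50030, which shows no record of the job.
* Because the job runs locally, the Program arguments '''/opt/hadoop/input /opt/hadoop/output''' refer to directories on the local machine.
* Make sure the input folder contains plain-text files and that the output folder does not exist yet (the job creates it and writes the results into it).
* If the Console shows no error messages, the program ran correctly on the local machine. Example output: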
{{{
09/02/06 17:18:35 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
09/02/06 17:18:35 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
09/02/06 17:18:35 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
09/02/06 17:18:35 INFO mapred.FileInputFormat: Total input paths to process : 1

... (omitted) ...

09/02/06 17:18:36 INFO mapred.JobClient: Map output bytes=445846
09/02/06 17:18:36 INFO mapred.JobClient: Map input bytes=320950
09/02/06 17:18:36 INFO mapred.JobClient: Combine input records=37943
09/02/06 17:18:36 INFO mapred.JobClient: Map output records=37943
09/02/06 17:18:36 INFO mapred.JobClient: Reduce input records=9284
}}}
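Whether a program argument such as /opt/hadoop/input points at the local disk or at HDFS depends on the default file system in the job's configuration: a path without a scheme is qualified against it. In Hadoop 0.17 the `fs.default.name` property defaults to `file:///`, so a launch that runs with default settings (presumably the case for the Eclipse run above) resolves such paths on the local disk, while bin/hadoop with a cluster configuration resolves them on HDFS. The sketch below imitates that qualification with `java.net.URI` only; it is not Hadoop's `Path` API, `PathQualifierSketch` is a hypothetical name, and `hdfs://localhost:9000` is a hypothetical NameNode address.

{{{
import java.net.URI;

// Hypothetical illustration: qualify a scheme-less path against a default
// file system URI, similar in spirit to how Hadoop resolves job paths.
class PathQualifierSketch {

    static URI qualify(String defaultFs, String path) {
        URI p = URI.create(path);
        if (p.getScheme() != null) {
            return p; // already fully qualified, e.g. hdfs://host:port/dir
        }
        // Borrow scheme and authority from the default file system.
        return URI.create(defaultFs).resolve(p);
    }

    public static void main(String[] args) {
        // Local mode: the default file system is the local disk.
        System.out.println(qualify("file:///", "/opt/hadoop/input"));
        // Cluster mode: hypothetical NameNode address.
        System.out.println(qualify("hdfs://localhost:9000", "/user/waue/input"));
    }
}
}}}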

Troubleshooting:

* Make sure the input folder contains plain-text files.
* Make sure the output folder does not exist yet (the job creates it and writes the results into it).
* Check that the settings under "Run Configurations" > "Java Application" > "WordCount" are correct.
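The first two checks can be automated before launching the job. This sketch uses only `java.nio.file` and assumes the local folders from the environment table; it reports whether the input folder holds at least one regular file and whether the output folder is still absent. It does not verify that the files are plain text, and `RunPrecheck` is a hypothetical name.

{{{
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical pre-flight check for a local WordCount run (not part of Hadoop).
class RunPrecheck {

    // Returns the problems found; an empty list means both folders look ready.
    static List<String> check(Path input, Path output) {
        List<String> problems = new ArrayList<>();
        if (!Files.isDirectory(input)) {
            problems.add("input folder does not exist: " + input);
        } else {
            try (Stream<Path> entries = Files.list(input)) {
                if (entries.noneMatch(Files::isRegularFile)) {
                    problems.add("input folder contains no files: " + input);
                }
            } catch (IOException e) {
                problems.add("cannot read input folder: " + e.getMessage());
            }
        }
        if (Files.exists(output)) {
            problems.add("output folder already exists: " + output);
        }
        return problems;
    }

    public static void main(String[] args) {
        // Local paths from the environment table.
        List<String> problems = check(Paths.get("/opt/hadoop/input"), Paths.get("/opt/hadoop/output"));
        if (problems.isEmpty()) {
            System.out.println("Ready to run WordCount.");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
}}}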

5. Package the program as a JAR

|| Menu path || Settings in the dialog || Result ||
|| '''File''' > '''Export''' > '''Java''' > '''Runnable JAR file''' || '''Launch configuration''': '''WordCount - sample''' [[br]] '''Export destination''': /opt/hadoop/WordCount.jar => '''Finish''' => '''OK''' || WordCount.jar is created under /opt/hadoop/ ||

* The final '''OK''' confirms that the required Hadoop libraries are packed into the JAR, which is why the exported WordCount.jar is about 4.3 MB.
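To see what the export produced, you can read the JAR's manifest and count its entries. When the manifest has a Main-Class attribute, bin/hadoop jar runs that class, which is why the command in step 6 does not name WordCount explicitly; depending on the export option chosen, Eclipse may instead point Main-Class at its own launcher class. A JAR that bundles the Hadoop libraries also holds far more entries than WordCount's own classes. A minimal sketch using `java.util.jar`, assuming the export destination from the table above; `JarInspector` is a hypothetical name.

{{{
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper that inspects an exported JAR.
class JarInspector {

    // Returns the manifest's Main-Class attribute, or null if there is none.
    static String mainClass(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest();
            return manifest == null ? null
                    : manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    // Counts the entries (files and directories) stored in the JAR.
    static int entryCount(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.size();
        }
    }

    public static void main(String[] args) throws IOException {
        String path = "/opt/hadoop/WordCount.jar"; // export destination from the table
        System.out.println("Main-Class: " + mainClass(path));
        System.out.println("Entries: " + entryCount(path));
    }
}
}}}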

6. Run WordCount on HDFS

Command:
{{{
$ cd /opt/hadoop
$ bin/hadoop jar WordCount.jar /user/waue/input /user/waue/out/
}}}

* With bin/hadoop, use the subcommand '''jar''', not '''-jar'''. Conversely, to run a JAR with plain java you must use '''$ java -jar XXX.jar''', not just '''jar'''.
* /user/waue/input and /user/waue/out/ are the input and output arguments. Both are HDFS paths: make sure /user/waue/input on HDFS contains plain-text files and that /user/waue/out/ does not exist.
* To run the job a second time after a successful run, use a different output folder name; otherwise the job fails with an error because the folder already exists.
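If you rerun the job often, one option is to derive a fresh output name automatically instead of editing it by hand. The sketch below appends `_2`, `_3`, and so on to a base name until it finds one that does not exist. The existence check is passed in as a predicate, so it could be backed by a listing of HDFS (for example, the output of `bin/hadoop dfs -ls`) or by anything else. `OutputDirNamer` and the suffix scheme are hypothetical, not a Hadoop feature.

{{{
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical helper: pick an output directory name that does not exist yet,
// so a second run does not fail because the previous output folder is still there.
class OutputDirNamer {

    static String freshName(String base, Predicate<String> exists) {
        if (!exists.test(base)) {
            return base;
        }
        int n = 2;
        while (exists.test(base + "_" + n)) {
            n++;
        }
        return base + "_" + n;
    }

    public static void main(String[] args) {
        // Stand-in for the set of directories already present on HDFS.
        Set<String> existing = Set.of("/user/waue/out", "/user/waue/out_2");
        System.out.println(freshName("/user/waue/out", existing::contains));
        // prints /user/waue/out_3
    }
}
}}}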

Execution output:
{{{
09/02/06 18:13:14 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
09/02/06 18:13:14 INFO mapred.FileInputFormat: Total input paths to process : 1
09/02/06 18:13:14 INFO mapred.FileInputFormat: Total input paths to process : 1
09/02/06 18:13:15 INFO mapred.JobClient: Running job: job_200902051032_0009
09/02/06 18:13:16 INFO mapred.JobClient: map 0% reduce 0%
09/02/06 18:13:20 INFO mapred.JobClient: map 100% reduce 0%
09/02/06 18:13:23 INFO mapred.JobClient: Job complete: job_200902051032_0009
09/02/06 18:13:23 INFO mapred.JobClient: Counters: 16
09/02/06 18:13:23 INFO mapred.JobClient: File Systems
09/02/06 18:13:23 INFO mapred.JobClient: HDFS bytes read=320950
09/02/06 18:13:23 INFO mapred.JobClient: HDFS bytes written=130568
09/02/06 18:13:23 INFO mapred.JobClient: Local bytes read=168448
09/02/06 18:13:23 INFO mapred.JobClient: Local bytes written=336932
09/02/06 18:13:23 INFO mapred.JobClient: Job Counters
09/02/06 18:13:23 INFO mapred.JobClient: Launched reduce tasks=1
09/02/06 18:13:23 INFO mapred.JobClient: Launched map tasks=1
09/02/06 18:13:23 INFO mapred.JobClient: Data-local map tasks=1
09/02/06 18:13:23 INFO mapred.JobClient: Map-Reduce Framework
09/02/06 18:13:23 INFO mapred.JobClient: Reduce input groups=9284
09/02/06 18:13:23 INFO mapred.JobClient: Combine output records=18568
09/02/06 18:13:23 INFO mapred.JobClient: Map input records=7868
09/02/06 18:13:23 INFO mapred.JobClient: Reduce output records=9284
09/02/06 18:13:23 INFO mapred.JobClient: Map output bytes=445846
09/02/06 18:13:23 INFO mapred.JobClient: Map input bytes=320950
09/02/06 18:13:23 INFO mapred.JobClient: Combine input records=47227
09/02/06 18:13:23 INFO mapred.JobClient: Map output records=37943
09/02/06 18:13:23 INFO mapred.JobClient: Reduce input records=9284
}}}
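The counters at the end of the log can also be checked programmatically. In WordCount the reducer writes one record per distinct word, so "Reduce output records" should equal "Reduce input groups" (both 9284 here). A minimal sketch that extracts `name=value` counter lines from JobClient output, assuming the log format stays as printed above; `CounterLogParser` and `oneOutputPerWord` are hypothetical names.

{{{
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for "INFO mapred.JobClient: <counter name>=<value>" log lines.
class CounterLogParser {

    private static final Pattern COUNTER =
            Pattern.compile("mapred\\.JobClient:\\s+([^=]+?)=(\\d+)\\s*$");

    // Collects counters in log order; lines without "name=number" are skipped.
    static Map<String, Long> parse(List<String> logLines) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = COUNTER.matcher(line);
            if (m.find()) {
                counters.put(m.group(1).trim(), Long.parseLong(m.group(2)));
            }
        }
        return counters;
    }

    // For WordCount, each distinct word (reduce input group) yields one output record.
    static boolean oneOutputPerWord(Map<String, Long> counters) {
        Long groups = counters.get("Reduce input groups");
        return groups != null && groups.equals(counters.get("Reduce output records"));
    }

    public static void main(String[] args) {
        List<String> log = List.of(
                "09/02/06 18:13:23 INFO mapred.JobClient: Reduce input groups=9284",
                "09/02/06 18:13:23 INFO mapred.JobClient: Reduce output records=9284");
        Map<String, Long> counters = parse(log);
        System.out.println(counters + " one output per word: " + oneOutputPerWord(counters));
    }
}
}}}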

* The job that just ran is now recorded at http://localhost:50030.
