Version 1 (modified by jazz, 10 years ago) (diff)

Homework 1

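Task: Using hadoop_labs/lab013 as a starting point, modify it into a reverse index (Reverse Index). The result of running ReverseIndex should have the form "keyword"\t"file names (separated by commas)".
Reference: run it as described in the link, ignoring periods (\.) and commas (\,), and ignoring case (case.sensitive=false).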
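
To make the normalization requirement concrete, here is a minimal plain-Java sketch of what stripping the skip patterns and ignoring case could look like. It does not use the Hadoop API; the class `TokenNormalizer`, the method `tokenize`, and the sample sentence are all hypothetical and are not taken from lab013.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.StringTokenizer;

// Hypothetical helper (not part of lab013): normalizes one line of text.
public class TokenNormalizer {

    // Lowercases the line unless caseSensitive is true, removes every match of
    // each skip pattern (treated as a regular expression), then splits the
    // remaining text on whitespace.
    public static List<String> tokenize(String line, List<String> skipPatterns,
                                        boolean caseSensitive) {
        String text = caseSensitive ? line : line.toLowerCase(Locale.ROOT);
        for (String pattern : skipPatterns) {
            text = text.replaceAll(pattern, "");
        }
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The two patterns named in the task: period and comma.
        List<String> skip = Arrays.asList("\\.", "\\,");
        // Assumed sample sentence, for illustration only.
        System.out.println(tokenize("I like NCTU Cloud Course.", skip, false));
    }
}
```

With the assumed sample sentence, the tokens come out lowercased and without the trailing period.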

Reference steps:

$ hadoop jar WordCount -Dwordcount.case.sensitive=false hw1_input hw1_out -skip pattern.txt
$ hadoop fs -cat hw1_out/part-00000

The reference result should look like the following (the format of the path is not restricted):

and     input2
cloud   input1,input2
course  input1,input2,input2
enjoy   input2
i       input1,input2
like    input1,input2
nctu    input1,input2
this    input2
we      input2

Due date: 11:59 AM, Monday, June 13, 2011
Submission: e-mail the source code and the report as attachments to jazz _AT_ nchc _DOT_ org _DOT_ tw. (1) One copy of the Java source code, compressed and named ${student ID}.zip. (2) One report (doc or PDF), named ${student ID}.
Hint:
- Change the (Key, Value) types of the Mapper output and of the Reducer input and output from the original (Text, IntWritable) to (Text, Text).
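
The hint can be illustrated with a plain-Java simulation of the data flow, in which `String` stands in for Hadoop's `Text`. This is only a sketch under stated assumptions: the class `ReverseIndexSketch`, its methods, and the two sample sentences are hypothetical, and a real job would implement the same logic in Mapper and Reducer classes instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical simulation; String plays the role of Hadoop's Text.
public class ReverseIndexSketch {

    // "Map" step: emit one (keyword, fileName) pair per word, instead of the
    // (keyword, 1) pair that a word count emits.
    static List<String[]> map(String fileName, String line) {
        List<String[]> pairs = new ArrayList<>();
        String cleaned = line.toLowerCase(Locale.ROOT).replaceAll("[.,]", "").trim();
        if (cleaned.isEmpty()) {
            return pairs;
        }
        for (String word : cleaned.split("\\s+")) {
            pairs.add(new String[] {word, fileName});
        }
        return pairs;
    }

    // "Reduce" step: join all file names seen for one keyword with commas.
    static String reduce(List<String> fileNames) {
        return String.join(",", fileNames);
    }

    // Groups the mapped pairs by keyword (sorted, like the shuffle phase)
    // and reduces each group. The input maps file name to file content.
    static SortedMap<String, String> run(Map<String, String> files) {
        SortedMap<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            for (String[] pair : map(file.getKey(), file.getValue())) {
                grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
            }
        }
        SortedMap<String, String> index = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            index.put(entry.getKey(), reduce(entry.getValue()));
        }
        return index;
    }

    public static void main(String[] args) {
        // Assumed sample contents; the real hw1_input files are not shown here.
        Map<String, String> files = new LinkedHashMap<>();
        files.put("input1", "I like NCTU Cloud Course.");
        files.put("input2", "I like NCTU Cloud Course, and we enjoy this course.");
        for (Map.Entry<String, String> entry : run(files).entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

Note that this sketch keeps duplicate file names, so a word that occurs twice in one file lists that file twice. In a real Hadoop job the order of values reaching the reducer is not guaranteed, so the order of file names may differ.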

Extra credit:

Try to add the occurrence count for each file to the result, i.e. the reference result should look like the following:

and     input2(1)
cloud   input1(1),input2(1)
course  input1(1),input2(2)
enjoy   input2(1)
i       input1(1),input2(1)
like    input1(1),input2(1)
nctu    input1(1),input2(1)
this    input2(1)
we      input2(1)
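
One possible way to produce this format is to count, on the reduce side, how many times each file name arrives for a keyword. The following plain-Java sketch shows only that formatting step; the class `CountedPostings` and the method `format` are hypothetical names, and sorting the file names alphabetically is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: formats the file names collected for one keyword.
public class CountedPostings {

    // Counts how often each file name occurs and renders the counts as
    // "name(count)" entries separated by commas, sorted by file name.
    public static String format(List<String> fileNames) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String fileName : fileNames) {
            counts.merge(fileName, 1, Integer::sum);
        }
        StringBuilder result = new StringBuilder();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (result.length() > 0) {
                result.append(',');
            }
            result.append(entry.getKey())
                  .append('(').append(entry.getValue()).append(')');
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // File names as they might arrive for the keyword "course".
        System.out.println(format(Arrays.asList("input1", "input2", "input2")));
    }
}
```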

Grading:
- Source code for the standard task: 60%
- Report: 20%
  - The report should include the following items:
  - Cover: your name and student ID
  - Screenshot of your program running on hadoop.nchc.org.tw
  - The result of your program
- Extra credit: 20%
