Version 1 (modified by jazz, 14 years ago)
Lab 5
Running Basic MapReduce Operations in Local Mode, by Example
MapReduce Example 1: Word Count (WordCount)
- STEP 1: Practice the command for submitting a MapReduce job: "hadoop jar <local jar file> <class name> <parameters>"
Jazz@human ~ $ cd /opt/hadoop/
Jazz@human /opt/hadoop $ hadoop jar hadoop-*-examples.jar wordcount input output
11/10/21 14:08:58 INFO input.FileInputFormat: Total input paths to process : 12
11/10/21 14:09:00 INFO mapred.JobClient: Running job: job_201110211130_0001
11/10/21 14:09:01 INFO mapred.JobClient:  map 0% reduce 0%
11/10/21 14:09:31 INFO mapred.JobClient:  map 16% reduce 0%
11/10/21 14:10:29 INFO mapred.JobClient:  map 100% reduce 27%
11/10/21 14:10:33 INFO mapred.JobClient:  map 100% reduce 100%
11/10/21 14:10:35 INFO mapred.JobClient: Job complete: job_201110211130_0001
11/10/21 14:10:35 INFO mapred.JobClient: Counters: 17
11/10/21 14:10:35 INFO mapred.JobClient:   Job Counters
11/10/21 14:10:35 INFO mapred.JobClient:     Launched reduce tasks=1
11/10/21 14:10:35 INFO mapred.JobClient:     Launched map tasks=12
11/10/21 14:10:35 INFO mapred.JobClient:     Data-local map tasks=12
11/10/21 14:10:35 INFO mapred.JobClient:   FileSystemCounters
11/10/21 14:10:35 INFO mapred.JobClient:     FILE_BYTES_READ=16578
11/10/21 14:10:35 INFO mapred.JobClient:     HDFS_BYTES_READ=18312
11/10/21 14:10:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=32636
11/10/21 14:10:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=10922
11/10/21 14:10:35 INFO mapred.JobClient:   Map-Reduce Framework
11/10/21 14:10:35 INFO mapred.JobClient:     Reduce input groups=592
11/10/21 14:10:35 INFO mapred.JobClient:     Combine output records=750
11/10/21 14:10:35 INFO mapred.JobClient:     Map input records=553
11/10/21 14:10:35 INFO mapred.JobClient:     Reduce shuffle bytes=15674
11/10/21 14:10:35 INFO mapred.JobClient:     Reduce output records=592
11/10/21 14:10:35 INFO mapred.JobClient:     Spilled Records=1500
11/10/21 14:10:35 INFO mapred.JobClient:     Map output bytes=24438
11/10/21 14:10:35 INFO mapred.JobClient:     Combine input records=1755
11/10/21 14:10:35 INFO mapred.JobClient:     Map output records=1755
11/10/21 14:10:35 INFO mapred.JobClient:     Reduce input records=750
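To make the counters in the log above easier to read, here is a minimal Python sketch of the map, combine, and reduce stages of a word count. It is not the code inside hadoop-*-examples.jar and uses no Hadoop API: the function names (map_phase, combine, run_wordcount) and the sample input are hypothetical, and splitting on whitespace is an assumption about how words are tokenized.

```python
from collections import Counter, defaultdict


def map_phase(line):
    """Emit one (word, 1) pair per whitespace-separated token (assumed tokenization)."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Pre-sum the counts emitted by a single map task before the shuffle."""
    sums = Counter()
    for word, n in pairs:
        sums[word] += n
    return list(sums.items())


def run_wordcount(splits):
    """Run a toy word count.

    splits: list of input splits, each a list of text lines.
    Each split stands in for one map task (hypothetical simplification).
    Returns (word -> total count, counters named after the log entries).
    """
    counters = Counter()
    shuffled = defaultdict(list)  # word -> partial counts from each map task

    for lines in splits:
        counters["Map input records"] += len(lines)
        pairs = [pair for line in lines for pair in map_phase(line)]
        counters["Map output records"] += len(pairs)

        combined = combine(pairs)
        counters["Combine output records"] += len(combined)
        for word, n in combined:
            shuffled[word].append(n)

    counters["Reduce input groups"] = len(shuffled)
    # Reduce: sum the partial counts of each word, keys in sorted order.
    result = {word: sum(ns) for word, ns in sorted(shuffled.items())}
    counters["Reduce output records"] = len(result)
    return result, counters


# Invented sample input: two splits, three lines in total.
splits = [["a b a", "c"], ["a c c"]]
result, counters = run_wordcount(splits)
print(result)
print(dict(counters))
```

In this toy run the combiner shrinks 7 map output records to 5, much as the real job reports Combine input records=1755 and Combine output records=750. The number of reduce input groups also equals the number of reduce output records, which is 592 in the log.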
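- STEP 2: Practice checking the status of the running MapReduce job at http://localhost:50030
- STEP 3: Use the HDFS command "hadoop fs -get <HDFS file/dir> <local file/dir>" to copy the results to the local file system. Note that the output files are all named part-r-*, and that the job's execution parameters are recorded in <HOSTNAME>_<TIME>_job_<JOBID>_0001_conf.xml. It is worth comparing the contents of that XML file with the parameters in the Hadoop config files.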
Jazz@human /opt/hadoop $ hadoop fs -get output my_output
Jazz@human /opt/hadoop $ ls -alR my_output
my_output:
total 12
drwxr-xr-x+ 3 Jazz None 0 Oct 21 14:12 .
drwxr-xr-x+ 15 Jazz None 0 Oct 21 14:12 ..
drwxr-xr-x+ 3 Jazz None 0 Oct 21 14:12 _logs
-rwxr-xr-x 1 Jazz None 10922 Oct 21 14:12 part-r-00000

my_output/_logs:
total 0
drwxr-xr-x+ 3 Jazz None 0 Oct 21 14:12 .
drwxr-xr-x+ 3 Jazz None 0 Oct 21 14:12 ..
drwxr-xr-x+ 2 Jazz None 0 Oct 21 14:12 history

my_output/_logs/history:
total 48
drwxr-xr-x+ 2 Jazz None 0 Oct 21 14:12 .
drwxr-xr-x+ 3 Jazz None 0 Oct 21 14:12 ..
-rwxr-xr-x 1 Jazz None 26004 Oct 21 14:12 localhost_1319167815125_job_201110211130_0001_Jazz_word+count
-rwxr-xr-x 1 Jazz None 16984 Oct 21 14:12 localhost_1319167815125_job_201110211130_0001_conf.xml
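One way to inspect the downloaded files is a short Python script, sketched below. It assumes that part-r-00000 holds one "word<TAB>count" pair per line and that the *_conf.xml file uses the usual Hadoop layout of <property> elements with <name> and <value> children. The helper names (parse_wordcount_output, parse_job_conf) are hypothetical, and the sample strings are invented for illustration, not taken from the job above.

```python
import xml.etree.ElementTree as ET


def parse_wordcount_output(text):
    """Parse 'word<TAB>count' lines (assumed output format) into a dict."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        word, count = line.rsplit("\t", 1)
        counts[word] = int(count)
    return counts


def parse_job_conf(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config XML."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.iter("property")
    }


# Invented sample data; real files would be read from my_output/ instead.
sample_output = "hadoop\t3\nmapreduce\t2\n"
sample_conf = (
    "<configuration>"
    "<property><name>mapred.reduce.tasks</name><value>1</value></property>"
    "<property><name>mapred.job.name</name><value>word count</value></property>"
    "</configuration>"
)

word_counts = parse_wordcount_output(sample_output)
job_conf = parse_job_conf(sample_conf)
print(word_counts)
print(job_conf)
```

To try it on the real results, the same functions could be given the text of my_output/part-r-00000 and of the *_conf.xml file under my_output/_logs/history. The resulting dictionary can then be compared by property name with the files in the Hadoop conf directory.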