◢ <[wiki:VNU120925/Lab6 Lab 6]> | <[wiki:VNU120925 Back to Course Outline]> ▲ | <[wiki:VNU120925/Lab8 Lab 8]> ◣

= Lab 7 =

[[PageOutline]]
                  
{{{
#!html
<div style="text-align: center;"><big style="font-weight: bold;"><big>Running Basic MapReduce Jobs in Fully Distributed Mode by Example</big></big></div>
}}}
                  
{{{
#!text
For the following exercises, connect to hadoop.nchc.org.tw. In the examples below, hXXXX stands for your user name.
}}}
                  
== Sample 1: WordCount ==

 * As its name suggests, the WordCount example counts how many times each word appears in the input documents and writes the results sorted by word.
{{{
$ hadoop fs -put /etc/hadoop/conf lab5_input
$ hadoop fs -rmr lab5_out2
$ hadoop jar hadoop-examples.jar wordcount lab5_input lab5_out2
}}}
                  
 * Check the computed result of '''wordcount''' in HDFS the same way as in the earlier labs:
{{{
$ hadoop fs -ls lab5_out2
$ hadoop fs -cat lab5_out2/part-r-00000
}}}
                  
 * You should see results like this:
{{{
"".     4
"*"     9
"127.0.0.1"     3
"AS     2
"License");     2
"_logs/history/"        1
"alice,bob      9

( ... skip ... )
}}}
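
To make the shape of this output easier to reason about, here is a minimal local sketch of the same computation using standard Unix tools. The function name `wordcount_local` is hypothetical, and the sketch assumes that the bundled example splits its input on whitespace only, which would explain why tokens such as `"License");` keep their punctuation. It is an approximation for small local files, not the Hadoop implementation.

```shell
# Hypothetical helper: approximates the wordcount result for text on stdin.
# Assumption: words are whitespace-separated tokens, punctuation included.
wordcount_local() {
  # 1. put every whitespace-separated token on its own line ("map")
  # 2. sort tokens in byte order so equal tokens become adjacent ("shuffle")
  # 3. count each run of equal tokens ("reduce") and print token<TAB>count
  tr -s ' \t\r\f' '\n' | grep -v '^$' | LC_ALL=C sort | LC_ALL=C uniq -c |
    awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Usage on a tiny in-line sample instead of lab5_input
printf 'dfs.replication 2\ndfs.permissions false\ndfs.replication 3\n' | wordcount_local
```

Sorting in byte order (`LC_ALL=C`) is a deliberate choice here: it places quotes and digits ahead of letters, which is the ordering visible in the sample result above.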
                  
                  
== Sample 2: grep ==

 * The grep command extracts text that matches a given pattern. The grep example in Hadoop extracts every string in the input documents that matches a regular expression and counts how many times each matched string occurs.
{{{
$ hadoop fs -ls lab5_input
$ hadoop jar hadoop-examples.jar grep lab5_input lab5_out3 'dfs[a-z.]+'
}}}
                  
 * While the job runs, you should see output like this:
{{{
11/04/19 10:00:20 INFO mapred.FileInputFormat: Total input paths to process : 25
11/04/19 10:00:20 INFO mapred.JobClient: Running job: job_201104120101_0645
11/04/19 10:00:21 INFO mapred.JobClient:  map 0% reduce 0%
( ... skip ... )
}}}
                  
 * Next, check the computed result of '''grep''' in HDFS:
{{{
$ hadoop fs -ls lab5_out3
Found 2 items
drwx------   - hXXXX supergroup          0 2011-04-19 10:00 /user/hXXXX/lab5_out3/_logs
-rw-r--r--   2 hXXXX supergroup       1146 2011-04-19 10:00 /user/hXXXX/lab5_out3/part-00000
$ hadoop fs -cat lab5_out3/part-00000
}}}
                  
 * You should see results like this:
{{{
4       dfs.permissions
4       dfs.replication
4       dfs.name.dir
3       dfs.namenode.decommission.interval.
3       dfs.namenode.decommission.nodes.per.interval
3       dfs.
( ... skip ... )
}}}
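
The result above lists each matched string with its count, highest count first. A rough local equivalent can be sketched with standard Unix tools, as below. The function name `grep_count_local` is hypothetical, and the sketch relies on `grep -oE`, whose extended regular expressions agree with the pattern used here but are not guaranteed to match Java regular expressions in general. The order of entries with equal counts is an arbitrary choice in this sketch (alphabetical) and may differ from what the Hadoop job produces.

```shell
# Hypothetical helper: counts every match of an extended regex in stdin
# and prints count<TAB>match, highest count first.
# Limitation of this sketch: matches must not contain whitespace.
grep_count_local() {
  # $1 is the regular expression, e.g. 'dfs[a-z.]+'
  grep -oE -- "$1" |                 # one match per output line
    LC_ALL=C sort | LC_ALL=C uniq -c |   # count identical matches
    LC_ALL=C sort -k1,1nr -k2 |      # by count descending, then by match
    awk '{ printf "%s\t%s\n", $1, $2 }'
}

# Usage on a tiny in-line sample instead of lab5_input
printf 'dfs.replication=2\ndfs.name.dir=/tmp dfs.replication\nfoo dfs.permissions\n' |
  grep_count_local 'dfs[a-z.]+'
```

Because `[a-z.]+` also accepts dots, a match can end in a trailing dot or consist of `dfs.` alone, which is why entries like `dfs.` appear in the result above.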
                  
                  
== More Examples ==

 The following example programs are available in hadoop-examples.jar:

                  
 || aggregatewordcount || An Aggregate based map/reduce program that counts the words in the input files. ||
 || aggregatewordhist || An Aggregate based map/reduce program that computes the histogram of the words in the input files. ||
 || grep || A map/reduce program that counts the matches of a regex in the input. ||
 || join || A job that effects a join over sorted, equally partitioned datasets ||
 || multifilewc || A job that counts words from several files. ||
 || pentomino || A map/reduce tile laying program to find solutions to pentomino problems. ||
 || pi || A map/reduce program that estimates Pi using monte-carlo method. ||
 || randomtextwriter || A map/reduce program that writes 10GB of random textual data per node. ||
 || randomwriter || A map/reduce program that writes 10GB of random data per node. ||
 || sleep || A job that sleeps at each map and reduce task. ||
 || sort || A map/reduce program that sorts the data written by the random writer. ||
 || sudoku || A sudoku solver. ||
 || wordcount || A map/reduce program that counts the words in the input files. ||
                  

You can find more details at [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/package-summary.html org.apache.hadoop.examples].