Changes between Initial Version and Version 1 of Hadoop_Lab3


Timestamp: Aug 28, 2009, 6:16:42 PM
Author: waue
{{{
#!html
<div style="text-align: center;"><big
 style="font-weight: bold;"><big><big>Lab 3: Running Basic MapReduce Computations</big></big></big></div>
}}}
[[PageOutline]]

== 1 Hadoop Example Command: grep ==

 * The grep command extracts text matching a given pattern; in the Hadoop examples, this command finds every string in the input files that matches the given regular expression and tallies the number of occurrences.

{{{
/opt/hadoop$ bin/hadoop jar hadoop-*-examples.jar grep input grep_output 'dfs[a-z.]+'
}}}

The job produces output like the following:

{{{
09/03/24 12:33:45 INFO mapred.FileInputFormat: Total input paths to process : 9
09/03/24 12:33:45 INFO mapred.FileInputFormat: Total input paths to process : 9
09/03/24 12:33:45 INFO mapred.JobClient: Running job: job_200903232025_0003
09/03/24 12:33:46 INFO mapred.JobClient:  map 0% reduce 0%
09/03/24 12:33:47 INFO mapred.JobClient:  map 10% reduce 0%
09/03/24 12:33:49 INFO mapred.JobClient:  map 20% reduce 0%
09/03/24 12:33:51 INFO mapred.JobClient:  map 30% reduce 0%
09/03/24 12:33:52 INFO mapred.JobClient:  map 40% reduce 0%
09/03/24 12:33:54 INFO mapred.JobClient:  map 50% reduce 0%
09/03/24 12:33:55 INFO mapred.JobClient:  map 60% reduce 0%
09/03/24 12:33:57 INFO mapred.JobClient:  map 70% reduce 0%
09/03/24 12:33:59 INFO mapred.JobClient:  map 80% reduce 0%
09/03/24 12:34:00 INFO mapred.JobClient:  map 90% reduce 0%
09/03/24 12:34:02 INFO mapred.JobClient:  map 100% reduce 0%
09/03/24 12:34:10 INFO mapred.JobClient:  map 100% reduce 10%
09/03/24 12:34:12 INFO mapred.JobClient:  map 100% reduce 13%
09/03/24 12:34:15 INFO mapred.JobClient:  map 100% reduce 20%
09/03/24 12:34:20 INFO mapred.JobClient:  map 100% reduce 23%
09/03/24 12:34:22 INFO mapred.JobClient: Job complete: job_200903232025_0003
09/03/24 12:34:22 INFO mapred.JobClient: Counters: 16
09/03/24 12:34:22 INFO mapred.JobClient:   File Systems
09/03/24 12:34:22 INFO mapred.JobClient:     HDFS bytes read=48245
09/03/24 12:34:22 INFO mapred.JobClient:     HDFS bytes written=1907
09/03/24 12:34:22 INFO mapred.JobClient:     Local bytes read=1549
09/03/24 12:34:22 INFO mapred.JobClient:     Local bytes written=3584
09/03/24 12:34:22 INFO mapred.JobClient:   Job Counters
......
}}}

 * Next, check the results:

{{{
/opt/hadoop$ bin/hadoop fs -ls grep_output
/opt/hadoop$ bin/hadoop fs -cat grep_output/part-00000
}}}

The results look like this:

{{{
3       dfs.class
3       dfs.
2       dfs.period
1       dfs.http.address
1       dfs.balance.bandwidth
1       dfs.block.size
1       dfs.blockreport.initial
1       dfs.blockreport.interval
1       dfs.client.block.write.retries
1       dfs.client.buffer.dir
1       dfs.data.dir
1       dfs.datanode.address
1       dfs.datanode.dns.interface
1       dfs.datanode.dns.nameserver
1       dfs.datanode.du.pct
1       dfs.datanode.du.reserved
1       dfs.datanode.handler.count
1       dfs.datanode.http.address
1       dfs.datanode.https.address
1       dfs.datanode.ipc.address
1       dfs.default.chunk.view.size
1       dfs.df.interval
1       dfs.file
1       dfs.heartbeat.interval
1       dfs.hosts
1       dfs.hosts.exclude
1       dfs.https.address
1       dfs.impl
1       dfs.max.objects
1       dfs.name.dir
1       dfs.namenode.decommission.interval
1       dfs.namenode.decommission.interval.
1       dfs.namenode.decommission.nodes.per.interval
1       dfs.namenode.handler.count
1       dfs.namenode.logging.level
1       dfs.permissions
1       dfs.permissions.supergroup
1       dfs.replication
1       dfs.replication.consider
1       dfs.replication.interval
1       dfs.replication.max
1       dfs.replication.min
1       dfs.replication.min.
1       dfs.safemode.extension
1       dfs.safemode.threshold.pct
1       dfs.secondary.http.address
1       dfs.servers
1       dfs.web.ugi
1       dfsmetrics.log
}}}

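Under the hood, the grep example is a two-stage MapReduce flow: the map phase emits each regex match with a count of 1, the reduce phase sums the counts per match, and a second job sorts the results by frequency. The following standalone Python sketch mimics that logic locally; the `grep_count` helper and the sample lines are illustrative, not part of Hadoop:

```python
import re
from collections import Counter

def grep_count(lines, pattern):
    # "map" phase: emit every regex match found in every line
    counts = Counter()
    for line in lines:
        for match in re.findall(pattern, line):
            counts[match] += 1  # "reduce" phase: sum per key
    # second job: sort by count (descending), then by key
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# hypothetical sample input, standing in for the files in `input`
sample = [
    "dfs.replication is defined in hdfs-site.xml",
    "dfs.replication and dfs.name.dir are common settings",
]

print(grep_count(sample, r"dfs[a-z.]+"))
# → [('dfs.replication', 2), ('dfs.name.dir', 1)]
```

The count-then-key sort order mirrors the `part-00000` output above, where the most frequent matches appear first.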
== 2 Hadoop Example Command: WordCount ==

 * As its name suggests, WordCount counts the number of occurrences of every word in the input files and lists the results in alphabetical (a-z) order.

{{{
/opt/hadoop$ bin/hadoop jar hadoop-*-examples.jar wordcount input wc_output
}}}

The output can be inspected the same way as for grep in section 1 above.

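The WordCount flow can be sketched as a plain-Python map/shuffle/reduce pipeline, assuming simple whitespace tokenization; the function names here are illustrative, not Hadoop APIs:

```python
from itertools import groupby

def map_phase(text):
    # map: emit a (word, 1) pair for every word in the input
    return [(word, 1) for word in text.split()]

def reduce_phase(pairs):
    # shuffle/sort: order pairs by key so identical words are adjacent;
    # this sort is also why WordCount output comes out in a-z order
    pairs.sort(key=lambda kv: kv[0])
    # reduce: sum the counts within each group of identical words
    return [(word, sum(c for _, c in group))
            for word, group in groupby(pairs, key=lambda kv: kv[0])]

text = "the quick fox jumps over the lazy fox"
print(reduce_phase(map_phase(text)))
```

This prints each distinct word with its count, sorted alphabetically, just as `wc_output/part-00000` would list them.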
=== 2.1 More Example Commands ===

The runnable example commands are listed below:

|| aggregatewordcount || An Aggregate based map/reduce program that counts the words in the input files. ||
|| aggregatewordhist || An Aggregate based map/reduce program that computes the histogram of the words in the input files. ||
|| grep || A map/reduce program that counts the matches of a regex in the input. ||
|| join || A job that effects a join over sorted, equally partitioned datasets. ||
|| multifilewc || A job that counts words from several files. ||
|| pentomino || A map/reduce tile laying program to find solutions to pentomino problems. ||
|| pi || A map/reduce program that estimates Pi using monte-carlo method. ||
|| randomtextwriter || A map/reduce program that writes 10GB of random textual data per node. ||
|| randomwriter || A map/reduce program that writes 10GB of random data per node. ||
|| sleep || A job that sleeps at each map and reduce task. ||
|| sort || A map/reduce program that sorts the data written by the random writer. ||
|| sudoku || A sudoku solver. ||
|| wordcount || A map/reduce program that counts the words in the input files. ||

Please refer to [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/package-summary.html org.apache.hadoop.examples]

== 3. Browsing Information via the Web GUI ==

 * [http://localhost:50030 Map/Reduce Administration]
 * [http://localhost:50070 NameNode]