Changes between Version 15 and Version 16 of jazz/Hadoop_Lab2


Timestamp: Mar 24, 2009, 2:10:07 PM
Author: waue
== Content 1. Basic Operations ==

=== 1.1 Browse your HDFS directory ===

{{{
/opt/hadoop$ bin/hadoop fs -ls
}}}

=== 1.2 Upload data to the HDFS directory ===

 * Upload:
{{{
/opt/hadoop$ bin/hadoop fs -put conf input
}}}
 * Check:
{{{
/opt/hadoop$ bin/hadoop fs -ls
/opt/hadoop$ bin/hadoop fs -ls input
}}}
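
Note that -copyFromLocal (see the full command list in 1.6) behaves like -put for local sources, so the same upload could equivalently be written as:
{{{
/opt/hadoop$ bin/hadoop fs -copyFromLocal conf input
}}}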

=== 1.3 Download data from HDFS to the local directory ===

 * Download:
{{{
/opt/hadoop$ bin/hadoop fs -get input fromHDFS
}}}
 * Check:
{{{
/opt/hadoop$ ls -al | grep fromHDFS
/opt/hadoop$ ls -al fromHDFS
}}}

=== 1.4 Delete files ===

{{{
/opt/hadoop$ bin/hadoop fs -ls input
/opt/hadoop$ bin/hadoop fs -rm input/masters
}}}

=== 1.5 View a file directly ===

{{{
/opt/hadoop$ bin/hadoop fs -ls input
/opt/hadoop$ bin/hadoop fs -cat input/slaves
}}}

=== 1.6 More commands ===

{{{
hadooper@vPro:/opt/hadoop$ bin/hadoop fs

Usage: java FsShell
           [-ls <path>]
           [-lsr <path>]
           [-du <path>]
           [-dus <path>]
           [-count[-q] <path>]
           [-mv <src> <dst>]
           [-cp <src> <dst>]
           [-rm <path>]
           [-rmr <path>]
           [-expunge]
           [-put <localsrc> ... <dst>]
           [-copyFromLocal <localsrc> ... <dst>]
           [-moveFromLocal <localsrc> ... <dst>]
           [-get [-ignoreCrc] [-crc] <src> <localdst>]
           [-getmerge <src> <localdst> [addnl]]
           [-cat <src>]
           [-text <src>]
           [-copyToLocal [-ignoreCrc] [-crc] <src> <localdst>]
           [-moveToLocal [-crc] <src> <localdst>]
           [-mkdir <path>]
           [-setrep [-R] [-w] <rep> <path/file>]
           [-touchz <path>]
           [-test -[ezd] <path>]
           [-stat [format] <path>]
           [-tail [-f] <file>]
           [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
           [-chown [-R] [OWNER][:[GROUP]] PATH...]
           [-chgrp [-R] GROUP PATH...]
           [-help [cmd]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
}}}
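
The full description of any single command can be queried with the -help option shown in the usage above, for example:
{{{
/opt/hadoop$ bin/hadoop fs -help rm
}}}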

=== 1.7 Exercise ===

 * Delete the entire input folder in HDFS (see the hint below).
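
A minimal hint, using the recursive remove option -rmr from the command list in 1.6:
{{{
/opt/hadoop$ bin/hadoop fs -rmr input
}}}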

== Content 2. Hadoop Computation Commands ==

=== 2.1 Hadoop computation command: grep ===

 * grep extracts strings that match a given pattern from files. In the Hadoop examples, this command finds every string in the input files that matches the specified regular expression and counts how often each match occurs.

{{{
/opt/hadoop$ bin/hadoop jar hadoop-*-examples.jar grep input grep_output 'dfs[a-z.]+'
}}}

Here 'dfs[a-z.]+' matches strings beginning with "dfs" followed by one or more lowercase letters or dots. The run looks like this:

{{{
09/03/24 12:33:45 INFO mapred.FileInputFormat: Total input paths to process : 9
09/03/24 12:33:45 INFO mapred.FileInputFormat: Total input paths to process : 9
09/03/24 12:33:45 INFO mapred.JobClient: Running job: job_200903232025_0003
09/03/24 12:33:46 INFO mapred.JobClient:  map 0% reduce 0%
09/03/24 12:33:47 INFO mapred.JobClient:  map 10% reduce 0%
09/03/24 12:33:49 INFO mapred.JobClient:  map 20% reduce 0%
09/03/24 12:33:51 INFO mapred.JobClient:  map 30% reduce 0%
09/03/24 12:33:52 INFO mapred.JobClient:  map 40% reduce 0%
09/03/24 12:33:54 INFO mapred.JobClient:  map 50% reduce 0%
09/03/24 12:33:55 INFO mapred.JobClient:  map 60% reduce 0%
09/03/24 12:33:57 INFO mapred.JobClient:  map 70% reduce 0%
09/03/24 12:33:59 INFO mapred.JobClient:  map 80% reduce 0%
09/03/24 12:34:00 INFO mapred.JobClient:  map 90% reduce 0%
09/03/24 12:34:02 INFO mapred.JobClient:  map 100% reduce 0%
09/03/24 12:34:10 INFO mapred.JobClient:  map 100% reduce 10%
09/03/24 12:34:12 INFO mapred.JobClient:  map 100% reduce 13%
09/03/24 12:34:15 INFO mapred.JobClient:  map 100% reduce 20%
09/03/24 12:34:20 INFO mapred.JobClient:  map 100% reduce 23%
09/03/24 12:34:22 INFO mapred.JobClient: Job complete: job_200903232025_0003
09/03/24 12:34:22 INFO mapred.JobClient: Counters: 16
09/03/24 12:34:22 INFO mapred.JobClient:   File Systems
09/03/24 12:34:22 INFO mapred.JobClient:     HDFS bytes read=48245
09/03/24 12:34:22 INFO mapred.JobClient:     HDFS bytes written=1907
09/03/24 12:34:22 INFO mapred.JobClient:     Local bytes read=1549
09/03/24 12:34:22 INFO mapred.JobClient:     Local bytes written=3584
09/03/24 12:34:22 INFO mapred.JobClient:   Job Counters
......
}}}

 * Then check the results:
{{{
/opt/hadoop$ bin/hadoop fs -ls grep_output
/opt/hadoop$ bin/hadoop fs -cat grep_output/part-00000
}}}
The results look like this:
{{{
3       dfs.class
3       dfs.
2       dfs.period
1       dfs.http.address
1       dfs.balance.bandwidth
1       dfs.block.size
1       dfs.blockreport.initial
1       dfs.blockreport.interval
1       dfs.client.block.write.retries
1       dfs.client.buffer.dir
1       dfs.data.dir
1       dfs.datanode.address
1       dfs.datanode.dns.interface
1       dfs.datanode.dns.nameserver
1       dfs.datanode.du.pct
1       dfs.datanode.du.reserved
1       dfs.datanode.handler.count
1       dfs.datanode.http.address
1       dfs.datanode.https.address
1       dfs.datanode.ipc.address
1       dfs.default.chunk.view.size
1       dfs.df.interval
1       dfs.file
1       dfs.heartbeat.interval
1       dfs.hosts
1       dfs.hosts.exclude
1       dfs.https.address
1       dfs.impl
1       dfs.max.objects
1       dfs.name.dir
1       dfs.namenode.decommission.interval
1       dfs.namenode.decommission.interval.
1       dfs.namenode.decommission.nodes.per.interval
1       dfs.namenode.handler.count
1       dfs.namenode.logging.level
1       dfs.permissions
1       dfs.permissions.supergroup
1       dfs.replication
1       dfs.replication.consider
1       dfs.replication.interval
1       dfs.replication.max
1       dfs.replication.min
1       dfs.replication.min.
1       dfs.safemode.extension
1       dfs.safemode.threshold.pct
1       dfs.secondary.http.address
1       dfs.servers
1       dfs.web.ugi
1       dfsmetrics.log
}}}
=== 2.2 Hadoop computation command: WordCount ===

 * As its name suggests, WordCount counts every word in the input files and sorts the results from a to z.

{{{
/opt/hadoop$ bin/hadoop jar hadoop-*-examples.jar wordcount input wc_output
}}}

Check the output the same way as in 2.1; a sketch follows.
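
A minimal sketch of that check (the part-00000 file name assumes a single reduce task, as in 2.1):
{{{
/opt/hadoop$ bin/hadoop fs -ls wc_output
/opt/hadoop$ bin/hadoop fs -cat wc_output/part-00000
}}}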

=== 2.3 More computation commands ===

    40344 || wordcount || A map/reduce program that counts the words in the input files. ||

Please refer to [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/package-summary.html org.apache.hadoop.examples]; its class summary is reproduced below.

|| '''Class''' || '''Description''' ||
|| [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/AggregateWordCount.html AggregateWordCount] || This is an example Aggregated Hadoop Map/Reduce application. It reads the text input files, breaks each line into words and counts them. The output is a locally sorted list of words and the count of how often they occurred. To run: bin/hadoop jar hadoop-*-examples.jar aggregatewordcount in-dir out-dir numOfReducers textinputformat ||
|| [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/AggregateWordHistogram.html AggregateWordHistogram] || This is an example Aggregated Hadoop Map/Reduce application. Computes the histogram of the words in the input texts. To run: bin/hadoop jar hadoop-*-examples.jar aggregatewordhist in-dir out-dir numOfReducers textinputformat ||
|| [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/ExampleDriver.html ExampleDriver] || A description of an example program based on its class and a human-readable description. ||
|| [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/Grep.html Grep] ||  ||
|| [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/Join.html Join] || This is the trivial map/reduce program that does absolutely nothing other than use the framework to fragment and sort the input values. To run: bin/hadoop jar build/hadoop-examples.jar join [-m maps] [-r reduces] [-inFormat input format class] [-outFormat output format class] [-outKey output key class] [-outValue output value class] [-joinOp inner|outer|override] [in-dir]* in-dir out-dir ||
|| [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/RandomTextWriter.html RandomTextWriter] || This program uses map/reduce to just run a distributed job where there is no interaction between the tasks and each task writes a large unsorted random sequence of words. To run: bin/hadoop jar hadoop-${version}-examples.jar randomtextwriter [-outFormat output format class] output ||
|| [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/RandomWriter.html RandomWriter] || This program uses map/reduce to just run a distributed job where there is no interaction between the tasks and each task writes a large unsorted random binary sequence file of BytesWritable. To run: bin/hadoop jar hadoop-${version}-examples.jar randomwriter [-outFormat output format class] output ||
|| [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/Sort.html Sort<K,V>] || This is the trivial map/reduce program that does absolutely nothing other than use the framework to fragment and sort the input values. To run: bin/hadoop jar build/hadoop-examples.jar sort [-m maps] [-r reduces] [-inFormat input format class] [-outFormat output format class] [-outKey output key class] [-outValue output value class] [-totalOrder pcnt num samples max splits] in-dir out-dir ||
|| [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/examples/WordCount.html WordCount] || This is an example Hadoop Map/Reduce application. ||
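
One way to try another example from this table, following its own "To run" line (the output directory agg_output and the reducer count 1 are illustrative choices):
{{{
/opt/hadoop$ bin/hadoop jar hadoop-*-examples.jar aggregatewordcount input agg_output 1 textinputformat
}}}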

=== 2.4 Exercise ===

== Content 3. Browse Information via the Web GUI ==

 * [http://localhost:50030 Map/Reduce Administration]
 * [http://localhost:50070 NameNode]

=== 3.1 Exercise ===

 * Use the web GUI to display the output of your wordcount exercise (see the hint below).
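
One possible approach, assuming the NameNode page offers a "Browse the filesystem" link: confirm the output location first, then open http://localhost:50070 and navigate to the wc_output directory in the browser.
{{{
/opt/hadoop$ bin/hadoop fs -ls wc_output
}}}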