Changes between Version 2 and Version 3 of III140322/Lab4


Timestamp: Mar 23, 2014, 9:34:56 AM
Author: jazz

{{{
#!text
以下練習,請連線至 https://lab.3du.me 操作。底下的 hXXXX 等於您的用戶名稱。
For the following exercises, please connect to https://lab.3du.me. The hXXXX below stands for your username.
}}}

== Content 1: HDFS Shell 基本操作 ==
== Content 1: Basic HDFS Shell Commands ==

=== 1.1 瀏覽你的 HDFS 目錄 ===
=== 1.1 Browsing Your HDFS Folder ===

{{{
~$ hadoop fs -ls
Found 1 items
drwxr-xr-x   - hXXXX supergroup          0 2010-01-24 17:23 /user/hXXXX/tmp
~$ hadoop fs -lsr
drwxr-xr-x   - hXXXX supergroup          0 2010-01-24 17:23 /user/hXXXX/tmp
}}}

=== 1.2 上傳資料到 HDFS 目錄 ===
=== 1.2 Upload Files or Folders to HDFS ===

 * 上傳 Upload

{{{
~$ hadoop fs -put /etc/hadoop/conf input
}}}

 * 檢查 Check

{{{
~$ hadoop fs -ls
Found 2 items
drwxr-xr-x   - hXXXX supergroup          0 2011-04-19 09:16 /user/hXXXX/input
drwxr-xr-x   - hXXXX supergroup          0 2010-01-24 17:23 /user/hXXXX/tmp
~$ hadoop fs -ls input
Found 25 items
-rw-r--r--   2 hXXXX supergroup        321 2011-04-19 09:16 /user/hXXXX/input/README
-rw-r--r--   2 hXXXX supergroup       3936 2011-04-19 09:16 /user/hXXXX/input/capacity-scheduler.xml
-rw-r--r--   2 hXXXX supergroup        196 2011-04-19 09:16 /user/hXXXX/input/commons-logging.properties
(.... skip ....)
}}}

=== 1.3 下載 HDFS 的資料到本地目錄 ===
=== 1.3 Download HDFS Files or Folders to Local ===

 * 下載 Download

{{{
~$ hadoop fs -get input fromHDFS
}}}

 * 檢查 Check

{{{
~$ ls -al | grep fromHDFS
drwxr-xr-x    2 hXXXX hXXXX  4096 2011-04-19 09:18 fromHDFS
~$ ls -al fromHDFS
總計 160
drwxr-xr-x 2 hXXXX hXXXX  4096 2011-04-19 09:18 .
drwx--x--x 3 hXXXX hXXXX  4096 2011-04-19 09:18 ..
-rw-r--r-- 1 hXXXX hXXXX  3936 2011-04-19 09:18 capacity-scheduler.xml
-rw-r--r-- 1 hXXXX hXXXX   196 2011-04-19 09:18 commons-logging.properties
-rw-r--r-- 1 hXXXX hXXXX   535 2011-04-19 09:18 configuration.xsl
(.... skip ....)
~$ diff /etc/hadoop/conf fromHDFS/
}}}

=== 1.4 刪除檔案 ===
=== 1.4 Remove Files or Folders ===

{{{
~$ hadoop fs -ls input/masters
Found 1 items
-rw-r--r--   2 hXXXX supergroup         10 2011-04-19 09:16 /user/hXXXX/input/masters
~$ hadoop fs -rm input/masters
Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/input/masters
}}}

=== 1.5 直接看檔案 ===
=== 1.5 Browse Files Directly ===

{{{
~$ hadoop fs -ls input/slaves
Found 1 items
-rw-r--r--   2 hXXXX supergroup         10 2011-04-19 09:16 /user/hXXXX/input/slaves
~$ hadoop fs -cat input/slaves
localhost
}}}

=== 1.6 更多指令操作 ===
=== 1.6 More Commands -- Help Message ===

{{{
hXXXX@hadoop:~$ hadoop fs

Usage: java FsShell
           [-ls <path>]
           [-lsr <path>]
           [-du <path>]
           [-dus <path>]
           [-count[-q] <path>]
           [-mv <src> <dst>]
           [-cp <src> <dst>]
           [-rm <path>]
           [-rmr <path>]
           [-expunge]
           [-put <localsrc> ... <dst>]
           [-copyFromLocal <localsrc> ... <dst>]
           [-moveFromLocal <localsrc> ... <dst>]
           [-get [-ignoreCrc] [-crc] <src> <localdst>]
           [-getmerge <src> <localdst> [addnl]]
           [-cat <src>]
           [-text <src>]
           [-copyToLocal [-ignoreCrc] [-crc] <src> <localdst>]
           [-moveToLocal [-crc] <src> <localdst>]
           [-mkdir <path>]
           [-setrep [-R] [-w] <rep> <path/file>]
           [-touchz <path>]
           [-test -[ezd] <path>]
           [-stat [format] <path>]
           [-tail [-f] <file>]
           [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
           [-chown [-R] [OWNER][:[GROUP]] PATH...]
           [-chgrp [-R] GROUP PATH...]
           [-help [cmd]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
hadoop command [genericOptions] [commandOptions]
}}}

== Content 2: 使用網頁 GUI 瀏覽資訊 ==
== Content 2: Use the Web GUI to Browse HDFS ==

 * [http://hadoop.nchc.org.tw:50030 JobTracker Web Interface]
 * [http://hadoop.nchc.org.tw:50070 NameNode Web Interface]

== Content 3: 更多 HDFS Shell 的用法 ==
== Content 3: More About the HDFS Shell ==

 * hadoop fs <args>,下面則列出 <args> 的用法[[BR]]The following are examples of hadoop fs related commands.
 * 以下操作預設的目錄在 /user/<$username>/ 下[[BR]]By default, your working directory is /user/<$username>/.
{{{
$ hadoop fs -ls input
Found 25 items
-rw-r--r--   2 hXXXX supergroup        321 2011-04-19 09:16 /user/hXXXX/input/README
-rw-r--r--   2 hXXXX supergroup       3936 2011-04-19 09:16 /user/hXXXX/input/capacity-scheduler.xml
-rw-r--r--   2 hXXXX supergroup        196 2011-04-19 09:16 /user/hXXXX/input/commons-logging.properties
(.... skip ....)
}}}
 * 完整的路徑則是 '''hdfs://node:port/path''',如:[[BR]]Alternatively, you can give an __''absolute path''__, such as '''hdfs://node:port/path''':
{{{
$ hadoop fs -ls hdfs://hadoop.nchc.org.tw/user/hXXXX/input
Found 25 items
-rw-r--r--   2 hXXXX supergroup        321 2011-04-19 09:16 /user/hXXXX/input/README
-rw-r--r--   2 hXXXX supergroup       3936 2011-04-19 09:16 /user/hXXXX/input/capacity-scheduler.xml
-rw-r--r--   2 hXXXX supergroup        196 2011-04-19 09:16 /user/hXXXX/input/commons-logging.properties
(.... skip ....)
}}}

=== -cat ===

 * 將路徑指定文件的內容輸出到 STDOUT [[BR]] Print the content of the given file to STDOUT
{{{
$ hadoop fs -cat input/hadoop-env.sh
}}}

=== -chgrp ===

 * 改變文件所屬的組 [[BR]] Change the '''owner group''' of the given file or folder
{{{
$ hadoop fs -ls
Found 2 items
drwxr-xr-x   - hXXXX supergroup          0 2011-04-19 09:16 /user/hXXXX/input
drwxr-xr-x   - hXXXX supergroup          0 2010-01-24 17:23 /user/hXXXX/tmp
$ hadoop fs -chgrp -R ${USER} input
$ hadoop fs -ls
Found 2 items
drwxr-xr-x   - hXXXX hXXXX               0 2011-04-19 09:21 /user/hXXXX/input
drwxr-xr-x   - hXXXX supergroup          0 2010-01-24 17:23 /user/hXXXX/tmp
}}}

=== -chmod ===

 * 改變文件的權限 [[BR]] Change the '''read and write permissions''' of the given file or folder
{{{
$ hadoop fs -ls
Found 2 items
drwxr-xr-x   - hXXXX hXXXX               0 2011-04-19 09:21 /user/hXXXX/input
drwxr-xr-x   - hXXXX supergroup          0 2010-01-24 17:23 /user/hXXXX/tmp
$ hadoop fs -chmod -R 777 input
$ hadoop fs -ls
Found 2 items
drwxrwxrwx   - hXXXX hXXXX               0 2011-04-19 09:21 /user/hXXXX/input
drwxr-xr-x   - hXXXX supergroup          0 2010-01-24 17:23 /user/hXXXX/tmp
}}}
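HDFS 的 chmod 沿用 Unix 的八進位權限表示法。底下是一個純本機(非 HDFS)的示範,路徑 /tmp/perm_demo 只是舉例,用來確認八進位模式對應到的權限字串:[[BR]]HDFS chmod reuses the Unix octal permission notation. A purely local sketch (plain Unix, not HDFS; the /tmp/perm_demo path is just an illustration) to check what an octal mode renders as:

```shell
# Plain local filesystem, not HDFS: show how octal mode 755 renders
# as a permission string, the same notation hadoop fs -ls prints.
mkdir -p /tmp/perm_demo
chmod 755 /tmp/perm_demo
ls -ld /tmp/perm_demo | cut -c1-10   # first 10 chars are the mode string
# prints: drwxr-xr-x
```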

=== -chown ===

 * 改變文件的擁有者 [[BR]] Change the '''owner''' of the given file or folder
{{{
$ hadoop fs -chown -R ${USER} input
}}}
 * 注意:因為在 hadoop.nchc.org.tw 上您沒有管理者權限,因此若要改成其他使用者時,會看到類似以下的錯誤訊息:[[BR]]Note: Since you don't have superuser permission on hadoop.nchc.org.tw, changing the owner to another user produces an error message like the following:
{{{
$ hadoop fs -chown -R h1000 input
chown: changing ownership of 'hdfs://hadoop.nchc.org.tw/user/hXXXX/input':org.apache.hadoop.security.AccessControlException: Non-super user cannot change owner.
}}}


=== -copyFromLocal, -put ===

 * 從 local 放檔案到 hdfs [[BR]] Both commands copy the given file or folder from local to HDFS
{{{
$ hadoop fs -copyFromLocal /etc/hadoop/conf dfs_input
}}}

=== -copyToLocal, -get ===

 * 把 hdfs 上的檔案下載到 local [[BR]] Both commands copy the given file or folder from HDFS to local
{{{
$ hadoop fs -copyToLocal dfs_input input1
}}}

=== -cp ===

 * 將文件從 hdfs 原本路徑複製到 hdfs 目標路徑 [[BR]] Copy the given file or folder from an HDFS source path to an HDFS target path
{{{
$ hadoop fs -cp input input1
}}}

=== -du ===

 * 顯示目錄中所有文件的大小 [[BR]] Display the size of each file in the given folder
{{{
$ hadoop fs -du input
Found 24 items
321         hdfs://hadoop.nchc.org.tw/user/hXXXX/input/README
3936        hdfs://hadoop.nchc.org.tw/user/hXXXX/input/capacity-scheduler.xml
196         hdfs://hadoop.nchc.org.tw/user/hXXXX/input/commons-logging.properties
( .... skip .... )
}}}

=== -dus ===

 * 顯示該目錄/文件的總大小 [[BR]] Display the total size of the given folder
{{{
$ hadoop fs -dus input
hdfs://hadoop.nchc.org.tw/user/hXXXX/input      84218
}}}
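-du 與 -dus 的關係就像本機的 du 與 du -s。底下是純本機(GNU du,非 HDFS)的對照示範,/tmp/du_demo 路徑只是舉例:[[BR]]-du and -dus mirror the local du / du -s pair. A purely local sketch (GNU du, not HDFS; the /tmp/du_demo path is illustrative):

```shell
# Local analogue of -du vs. -dus (GNU du, apparent sizes; not HDFS).
mkdir -p /tmp/du_demo
printf 'hello\n' > /tmp/du_demo/a      # 6 bytes
printf 'hi\n'    > /tmp/du_demo/b      # 3 bytes
du -b /tmp/du_demo/a /tmp/du_demo/b    # per-file sizes, like hadoop fs -du
du -sb /tmp/du_demo                    # one summary line, like hadoop fs -dus
```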

=== -expunge ===

 * 清空垃圾桶 [[BR]] Empty the trash
{{{
$ hadoop fs -expunge
}}}

=== -getmerge ===

 * 將來源目錄 <src> 下所有的文件都集合到本地端一個 <localdst> 檔案內 [[BR]] Merge all files in the HDFS source folder <src> into one local file <localdst>
{{{
$ hadoop fs -getmerge <src> <localdst>
}}}
{{{
$ mkdir -p in1
$ echo "this is one; " >> in1/input
$ echo "this is two; " >> in1/input2
$ hadoop fs -put in1 in1
$ hadoop fs -getmerge in1 merge.txt
$ cat ./merge.txt
}}}
 * 您應該會看到類似底下的結果:[[BR]]You should see results like this:
{{{
this is one;
this is two;
}}}
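概念上,-getmerge 等於把 <src> 目錄下的所有檔案串接成一個本地檔案。底下用 cat 做一個純本機的示意(非 HDFS,/tmp 底下的路徑只是舉例):[[BR]]Conceptually, -getmerge concatenates every file under <src> into one local file. The same effect can be sketched purely locally with cat (not HDFS; the /tmp paths are illustrative):

```shell
# Local sketch of what -getmerge does: concatenate the files of a
# folder into one output file.
mkdir -p /tmp/in1
echo "this is one; " > /tmp/in1/input
echo "this is two; " > /tmp/in1/input2
cat /tmp/in1/input /tmp/in1/input2 > /tmp/merge.txt
cat /tmp/merge.txt
```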

=== -ls ===

 * 列出文件或目錄的資訊 [[BR]] List files and folders
 * 文件:權限 <副本數> 用戶ID 組ID 文件大小 修改日期 修改時間 文件名 [[BR]] file: <permission> <replication> <user id> <group id> <size> <modified date> <modified time> <file name>
 * 目錄:權限 用戶ID 組ID 修改日期 修改時間 目錄名 [[BR]] folder: <permission> <user id> <group id> <modified date> <modified time> <folder name>
{{{
$ hadoop fs -ls
Found 5 items
drwxr-xr-x   - hXXXX supergroup          0 2011-04-19 09:32 /user/hXXXX/dfs_input
drwxr-xr-x   - hXXXX supergroup          0 2011-04-19 09:34 /user/hXXXX/in1
drwxrwxrwx   - hXXXX hXXXX               0 2011-04-19 09:21 /user/hXXXX/input
drwxr-xr-x   - hXXXX supergroup          0 2011-04-19 09:33 /user/hXXXX/input1
drwxr-xr-x   - hXXXX supergroup          0 2010-01-24 17:23 /user/hXXXX/tmp
}}}
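這些欄位是以空白分隔的,所以可以用 awk 把一行列表拆開。底下是一個純本機的示意,取樣自上面的輸出(permissions、replication、owner、group、size、date、time、path 的順序):[[BR]]The listing fields are whitespace-separated, so one line can be pulled apart with awk. A local sketch using one sample line from the output above (fields in the order permissions, replication, owner, group, size, date, time, path):

```shell
# Split one sample -ls output line into its fields with awk.
line='-rw-r--r--   2 hXXXX supergroup        321 2011-04-19 09:16 /user/hXXXX/input/README'
echo "$line" | awk '{print "perm=" $1; print "replication=" $2; print "size=" $5; print "path=" $8}'
# prints:
# perm=-rw-r--r--
# replication=2
# size=321
# path=/user/hXXXX/input/README
```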

=== -lsr ===

 * ls 命令的遞迴版本 [[BR]] Recursive version of ls
{{{
$ hadoop fs -lsr in1
-rw-r--r--   2 hXXXX supergroup         14 2011-04-19 09:34 /user/hXXXX/in1/input
-rw-r--r--   2 hXXXX supergroup         14 2011-04-19 09:34 /user/hXXXX/in1/input2
}}}

=== -mkdir ===

 * 建立資料夾 [[BR]] Create directories
{{{
$ hadoop fs -mkdir a b c
}}}

=== -moveFromLocal ===

 * 將 local 端的資料夾剪下移動到 hdfs 上 [[BR]] Move local files or folders to HDFS (the local copy is removed afterwards)
{{{
$ hadoop fs -moveFromLocal in1 in2
}}}

=== -mv ===

 * 更改資料的名稱 [[BR]] Move or rename files and folders
{{{
$ hadoop fs -mv in2 in3
}}}

=== -rm ===

 * 刪除指定的檔案(不可資料夾)[[BR]] Remove the given files (not folders)
{{{
$ hadoop fs -rm in1/input
Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/in1/input
}}}

=== -rmr ===

 * 遞迴刪除資料夾(包含在內的所有檔案) [[BR]] Recursively remove the given files and folders
{{{
$ hadoop fs -rmr a b c dfs_input in3 input input1
Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/a
Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/b
Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/c
Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/dfs_input
Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/in3
Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/input
Deleted hdfs://hadoop.nchc.org.tw/user/hXXXX/input1
}}}

=== -setrep ===

 * 設定副本係數 [[BR]] Set the replication factor of the given files or folders
{{{
$ hadoop fs -setrep [-R] [-w] <rep> <path/file>
}}}
{{{
$ hadoop fs -setrep -w 2 -R in1
Replication 2 set: hdfs://hadoop.nchc.org.tw/user/hXXXX/in1/input2
Waiting for hdfs://hadoop.nchc.org.tw/user/hXXXX/in1/input2 ... done
}}}

=== -stat ===

 * 印出時間資訊 [[BR]] Print the modification time of the given path
{{{
$ hadoop fs -stat in1
2011-04-19 09:34:49
}}}

=== -tail ===

 * 將文件的最後 1K 內容輸出 [[BR]] Display the last 1KB of the given file
 * 用法 Usage
{{{
hadoop fs -tail [-f] 檔案 (-f 參數用來顯示如果檔案增大,則秀出被 append 上的內容)
hadoop fs -tail [-f] <path/file> (-f keeps printing new data as the file grows)
}}}
{{{
$ hadoop fs -tail in1/input2
this is two;
}}}

=== -test ===

 * 測試檔案或目錄;條件成立時回傳值為 0,否則為 1 [[BR]] Test files or folders; the exit status is 0 when the condition holds, 1 otherwise [[BR]] -e : the file or folder exists [[BR]] -z : the file has zero length [[BR]] -d : the given path is a directory
   * 要用 echo $? 來看回傳值 [[BR]] You have to use '''echo $?''' to read the exit status
 * 用法 Usage
{{{
$ hadoop fs -test -[ezd] URI
}}}

{{{
$ hadoop fs -test -e in1/input2
$ echo $?
0
$ hadoop fs -test -z in1/input3
$ echo $?
1
$ hadoop fs -test -d in1/input2
$ echo $?
1
}}}
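hadoop fs -test 的回傳值慣例與 shell 內建的 test 指令相同:0 代表條件成立。底下是純本機的示意(/tmp 下的檔名只是舉例):[[BR]]hadoop fs -test follows the same exit-status convention as the shell's own test command: 0 means the condition holds. A purely local sketch of that convention (the file names under /tmp are illustrative):

```shell
# Same exit-status convention as hadoop fs -test: 0 means "true".
touch /tmp/exists_demo
rm -f /tmp/missing_demo
test -e /tmp/exists_demo;  echo $?   # prints 0: the file exists
test -e /tmp/missing_demo; echo $?   # prints 1: it does not
```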

=== -text ===

 * 將檔案(如壓縮檔、TextRecordInputStream)輸出為純文字格式 [[BR]] Output the given file (e.g. a compressed file or TextRecordInputStream) as plain text to STDOUT
{{{
$ hadoop fs -text <src>
}}}
{{{
$ gzip merge.txt
$ hadoop fs -put merge.txt.gz .
$ hadoop fs -text merge.txt.gz
11/04/19 09:54:16 INFO util.NativeCodeLoader: Loaded the native-hadoop library
11/04/19 09:54:16 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
this is one;
this is two;
}}}
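-text 之所以能解開 .gz,是因為 Hadoop 內建 zlib 解碼;在本機上 zcat 做的是同一件解碼工作。純本機示意(檔名只是舉例):[[BR]]-text can decode the .gz file because Hadoop ships zlib support; locally, zcat performs the same decode. A purely local sketch (the file name is illustrative):

```shell
# Local sketch: compress with gzip, then decode, as -text does for .gz.
echo "this is one;" > /tmp/text_demo.txt
gzip -f /tmp/text_demo.txt            # produces /tmp/text_demo.txt.gz
zcat /tmp/text_demo.txt.gz            # prints: this is one;
```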
 * ps : 目前沒支援 zip 的函式庫 [[BR]] P.S. zip files are not supported yet, so -text prints the raw bytes:
{{{
$ gunzip merge.txt.gz
$ zip merge.zip merge.txt
$ hadoop fs -put merge.zip .
$ hadoop fs -text merge.zip
PK�N�>E73       merge.txtUT     ���Mq��Mux
                                           ��+��,V���Tk�(��<�PK�N�>E73  ��merge.txtUT���Mux
                              ��PKOY
}}}

=== -touchz ===

 * 建立一個空文件 [[BR]] Create an empty file
{{{
$ hadoop fs -touchz in1/kk
$ hadoop fs -test -z in1/kk
$ echo $?
0
}}}

----

 * 您可以用以下指令把以上練習產生的暫存目錄與檔案清除:[[BR]]You can clean up the temporary folders and files created by the exercises above with the following commands:
{{{
~$ hadoop fs -rmr in1 merge.txt.gz merge.zip
~$ rm -rf input1/ fromHDFS/ merge.zip
}}}