{{{
#!html
<div style="text-align: center;"><big
style="font-weight: bold;"><big><big>Lab 8: Advanced Hadoop Cluster Operations</big></big></big></div>
}}}
[[PageOutline]]

= Scenario 1: How to add a datanode and a tasktracker dynamically =
* In some cases the environment changes after the initial deployment. For example, you originally set up Hadoop on five machines, and today your boss hands you another five. The steps below show how to grow the running cluster by adding nodes dynamically.

== 1.0 Notes ==

* The new node must run the same Hadoop version and the same configuration files as the existing cluster.
* Whether the new daemons connect to the right cluster depends on the jobtracker and namenode settings in conf/hadoop-site.xml (in our tests this is independent of conf/slaves and conf/masters).
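* For reference, a minimal sketch of the two relevant properties in conf/hadoop-site.xml on the new node is shown below; the host name and the jobtracker port 9001 are illustrative assumptions, so substitute your own namenode and jobtracker addresses.
{{{
#!xml
<!-- HDFS namenode address (must match the running cluster) -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://gm1.nchc.org.tw:9000/</value>
</property>
<!-- MapReduce jobtracker address (port 9001 is only an example) -->
<property>
  <name>mapred.job.tracker</name>
  <value>gm1.nchc.org.tw:9001</value>
</property>
}}}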

== 1.1 Adding a datanode ==

* Run the following commands on the node you want to add:
{{{
$ cd $HADOOP_HOME
$ bin/hadoop-daemon.sh --config ./conf start datanode
}}}
* Sample output:
{{{
starting datanode, logging to /tmp/hadoop/logs/hadoop-waue-datanode-Dx7200.out
}}}
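* To confirm that the new datanode actually registered with the namenode, a quick check (run from any node that has the cluster configuration) is the dfsadmin report; the new machine should appear in the datanode list:
{{{
$ bin/hadoop dfsadmin -report
}}}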

== 1.2 Adding a tasktracker ==

* Whether the tasktracker connects to the right namenode and jobtracker is again determined by conf/hadoop-site.xml; in our tests it is independent of conf/slaves and conf/masters.
{{{
$ cd $HADOOP_HOME
$ bin/hadoop-daemon.sh --config ./conf start tasktracker
}}}
* Sample output:
{{{
starting tasktracker, logging to /tmp/hadoop/logs/hadoop-waue-tasktracker-Dx7200.out
}}}

-----

= Scenario 2: How to spread the data in HDFS evenly across the nodes =

* The following command analyzes the block distribution and rebalances the data across the datanodes:
{{{
$ bin/hadoop balancer
}}}
* Sample output:
{{{
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
09/04/01 18:00:08 INFO net.NetworkTopology: Adding a new node: /default-rack/140.110.138.191:50010
09/04/01 18:00:08 INFO net.NetworkTopology: Adding a new node: /default-rack/140.110.141.129:50010
09/04/01 18:00:08 INFO dfs.Balancer: 0 over utilized nodes:
09/04/01 18:00:08 INFO dfs.Balancer: 0 under utilized nodes:
The cluster is balanced. Exiting...
Balancing took 186.0 milliseconds
}}}
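* The balancer also accepts a threshold: the maximum allowed deviation, in percent, of each datanode's disk usage from the cluster average (the default is 10 as far as we know). A smaller value gives a more even distribution but takes longer; this is an optional sketch, not required for normal use:
{{{
$ bin/hadoop balancer -threshold 5
}}}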

------

= Scenario 3: How to upgrade a Hadoop cluster in production without losing existing data =

* This example upgrades from Hadoop 0.16 to Hadoop 0.18.
* If the conf/ directory is kept under $HADOOP_HOME, it gets replaced whenever you switch versions. The configuration files can in fact be shared across Hadoop versions, so we move conf/ out of the installation directory first.

== Step 1. Stop HDFS ==

* First check the upgrade status:
{{{
$ cd /opt/hadoop/
$ bin/hadoop dfsadmin -upgradeProgress status

There are no upgrades in progress.
}}}

* Then stop HDFS.
* Note: do not use bin/stop-all.sh here; stop only the DFS daemons.
{{{
$ bin/stop-dfs.sh
}}}

== Step 2. Link in the new Hadoop version ==

* Move conf to /opt/conf, then use a symbolic link (ln -s) to switch between hadoop-0.16 and hadoop-0.18.
* The following assumes you have already downloaded and unpacked Hadoop 0.18 into a directory named hadoop-0.18.3.
{{{
$ cd /opt/
$ mv hadoop/conf ./
$ mv hadoop hadoop-0.16
$ ln -s hadoop-0.18.3 hadoop
}}}

== Step 3. Set the environment variables ==

* Since conf/ is no longer inside $HADOOP_HOME, remember to load the settings from conf/hadoop-env.sh.
* Set the correct path for $HADOOP_CONF_DIR in hadoop-env.sh, then source the file:
{{{
$ source /opt/conf/hadoop-env.sh
}}}
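* As a sketch, the relevant line in /opt/conf/hadoop-env.sh would look like the following (the path assumes the layout used in this lab; adjust it if your conf directory lives elsewhere):
{{{
#!sh
# point every Hadoop command at the shared configuration directory
export HADOOP_CONF_DIR=/opt/conf
}}}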

== Step 4. Deploy the new Hadoop version on every node ==

* If the cluster has multiple nodes, every node must run the same Hadoop version, otherwise the daemons will fail to join; one way to push the new release out is sketched below.
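* A minimal sketch using rsync over ssh; the node names node2 ... node5 are examples, and it assumes password-less ssh is already set up and that the other nodes use the same /opt layout with a hadoop symlink:
{{{
$ for node in node2 node3 node4 node5; do
>   rsync -av /opt/hadoop-0.18.3 /opt/conf ${node}:/opt/
>   ssh ${node} "cd /opt && rm -f hadoop && ln -s hadoop-0.18.3 hadoop"
> done
}}}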

== Step 5. Start HDFS with the upgrade option ==

{{{
$ bin/start-dfs.sh -upgrade
}}}

* The namenode web UI will show the upgrade status.
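* You can also follow the upgrade from the command line, and once everything has been verified, finalize it so the pre-upgrade backup data can be reclaimed. Finalizing removes the ability to roll back, so only run it when you are sure you no longer need the old version:
{{{
$ bin/hadoop dfsadmin -upgradeProgress status
$ bin/hadoop dfsadmin -finalizeUpgrade
}}}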

-------

= Scenario 4: How to downgrade a Hadoop cluster in production without losing existing data =

* This is the reverse of Scenario 3, so the procedure is similar. The steps below assume the configuration files are already in /opt/conf, both hadoop-0.16 and hadoop-0.18.3 exist under /opt, and the cluster has a single node.

== Step 1. Stop HDFS ==

{{{
$ cd /opt/hadoop
$ bin/stop-dfs.sh
}}}

== Step 2. Switch back to the old Hadoop version ==

{{{
$ cd /opt
# /opt/hadoop is the symlink created in Scenario 3; removing it leaves hadoop-0.18.3 intact
$ rm /opt/hadoop
$ ln -s hadoop-0.16 hadoop
}}}

== Step 3. Roll back to the previous version ==

{{{
$ bin/start-dfs.sh -rollback
}}}
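* As a quick sanity check after the rollback (a suggestion, not part of the original procedure), confirm that the old release is the one running and that the datanodes have reported back:
{{{
$ bin/hadoop version
$ bin/hadoop dfsadmin -report
}}}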

-----

= Scenario 5: Is my HDFS file system healthy? =

* HDFS ships with a file system checking tool, "bin/hadoop fsck".

{{{
$ bin/hadoop fsck /
}}}
* Sample output:
{{{
.
/user/waue/input/1.txt: Under replicated blk_-90085106852013388_1001. Target Replicas is 3 but found 2 replica(s).
/user/waue/input/1.txt: Under replicated blk_-4027196261436469955_1001. Target Replicas is 3 but found 2 replica(s).
.
/user/waue/input/2.txt: Under replicated blk_-2300843106107816641_1002. Target Replicas is 3 but found 2 replica(s).
.
/user/waue/input/3.txt: Under replicated blk_-1561577350198661966_1003. Target Replicas is 3 but found 2 replica(s).
.
/user/waue/input/4.txt: Under replicated blk_1316726598778579026_1004. Target Replicas is 3 but found 2 replica(s).
Status: HEALTHY
Total size: 143451003 B
Total dirs: 8
Total files: 4
Total blocks (validated): 5 (avg. block size 28690200 B)
Minimally replicated blocks: 5 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 5 (100.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 5 (50.0 %)
Number of data-nodes: 2
Number of racks: 1
The filesystem under path '/' is HEALTHY
}}}

* Different options serve different purposes, for example:
{{{
$ bin/hadoop fsck / -files
}}}
* Sample output:
{{{
/tmp <dir>
/tmp/hadoop <dir>
/tmp/hadoop/hadoop-waue <dir>
/tmp/hadoop/hadoop-waue/mapred <dir>
/tmp/hadoop/hadoop-waue/mapred/system <dir>
/user <dir>
/user/waue <dir>
/user/waue/input <dir>
/user/waue/input/1.txt 115045564 bytes, 2 block(s): Under replicated blk_-90085106852013388_1001. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_-4027196261436469955_1001. Target Replicas is 3 but found 2 replica(s).
/user/waue/input/2.txt 987864 bytes, 1 block(s): Under replicated blk_-2300843106107816641_1002. Target Replicas is 3 but found 2 replica(s).
/user/waue/input/3.txt 1573048 bytes, 1 block(s): Under replicated blk_-1561577350198661966_1003. Target Replicas is 3 but found 2 replica(s).
/user/waue/input/4.txt 25844527 bytes, 1 block(s): Under replicated blk_1316726598778579026_1004. Target Replicas is 3 but found 2 replica(s).
Status: HEALTHY
....(same as above)
}}}
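* Other fsck options can be combined. For example, to also list each block and the datanodes that hold it (the output gets long on a large file system):
{{{
$ bin/hadoop fsck / -files -blocks -locations
}}}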

-----

= Scenario 6: My cluster is running too many jobs; time to slim it down =

== Step 1. List all the jobs ==

* You can find the JobId of each job on the jobtracker web page (JobTracker:50030),
* or list every job from the command line:
{{{
$ bin/hadoop job -list all

5 jobs submitted
States are:
Running : 1 Succeded : 2 Failed : 3 Prep : 4
JobId State StartTime UserName
job_200904021140_0001 2 1238652150499 waue
job_200904021140_0002 3 1238657754096 waue
job_200904021140_0004 3 1238657989495 waue
job_200904021140_0005 2 1238658076347 waue
job_200904021140_0006 2 1238658644666 waue
}}}

== Step 2. Get more detail ==

* Check the status of a job:
{{{
$ bin/hadoop job -status job_200904021140_0001
}}}
* Print the history of a job from its output directory:
{{{
$ bin/hadoop job -history /user/waue/stream-output1

Hadoop job: job_200904021140_0005
=====================================
Job tracker host name: gm1.nchc.org.tw
job tracker start time: Thu Apr 02 11:40:06 CST 2009
User: waue
JobName: streamjob9019.jar
JobConf: hdfs://gm1.nchc.org.tw:9000/tmp/hadoop/hadoop-waue/mapred/system/job_200904021140_0005/job.xml
Submitted At: 2-Apr-2009 15:41:16
Launched At: 2-Apr-2009 15:41:16 (0sec)
Finished At: 2-Apr-2009 15:42:04 (48sec)
Status: SUCCESS
=====================================
...(truncated)
}}}

== Step 3. Kill jobs at will ==

* Terminate a running job, e.g. the one with id job_200904021140_0001:
{{{
$ bin/hadoop job -kill job_200904021140_0001
}}}
--------

= Scenario 7: How to check the current Hadoop version =

* Print the Hadoop version currently in use:
{{{
$ bin/hadoop version
}}}
* Sample output:
{{{
Hadoop 0.18.3
Subversion https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 -r 736250
Compiled by ndaley on Thu Jan 22 23:12:08 UTC 2009
}}}

-----
= Scenario 8: Setting up HDFS accounts and quotas =

== Step 1. Set each user's home directory, ownership and permissions ==

* HDFS permissions distinguish owner, group and other, much like POSIX.
* A user's identity is taken from the client machine: the user name comes from whoami and the groups from bash -c groups.
* Typical operations:
{{{
$ bin/hadoop fs -mkdir own
$ bin/hadoop fs -chmod -R 755 own
$ bin/hadoop fs -chgrp -R waue own
$ bin/hadoop fs -chown -R waue own
$ bin/hadoop fs -lsr own
}}}
* Related parameters that can be set in conf/hadoop-site.xml:
{{{
#!php
dfs.permissions = true
dfs.web.ugi = webuser,webgroup
dfs.permissions.supergroup = supergroup
dfs.upgrade.permission = 777
dfs.umask = 022
}}}
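* The lines above are shorthand; in conf/hadoop-site.xml each setting is written as a <property> element. As an illustration, the first and last entries would look like this:
{{{
#!xml
<property>
  <name>dfs.permissions</name>
  <value>true</value>
</property>
<property>
  <name>dfs.umask</name>
  <value>022</value>
</property>
}}}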

== Step 2. Set quotas ==

* A directory quota is a hard limit on the number of names in the directory tree rooted at that directory.
* The quota is a count of names, not a size limit (for example, a single file that occupies two blocks can still be uploaded, but uploading two tiny files may exceed the quota).
* A quota of 1 forces a directory to stay empty, because the directory itself counts toward its own quota.
* Renaming a directory does not change its quota.
{{{
$ bin/hadoop fs -mkdir quota
$ bin/hadoop dfsadmin -setQuota 2 quota
$ bin/hadoop fs -put ../conf/hadoop-env.sh quota/
$ bin/hadoop fs -put ../conf/hadoop-site.xml quota/

put: org.apache.hadoop.dfs.QuotaExceededException: The quota of /user/waue/quota is exceeded: quota=2 count=3
}}}

* To check a directory's quota, use "bin/hadoop fs -count -q <dir>"; the columns are quota, remaining quota, directory count, file count, content size and path.

{{{
$ bin/hadoop fs -count -q own
none inf 1 0 0 hdfs://gm1.nchc.org.tw:9000/user/waue/own
$ bin/hadoop dfsadmin -setQuota 4 own
$ bin/hadoop fs -count -q own
4 3 1 0 0 hdfs://gm1.nchc.org.tw:9000/user/waue/own
}}}

* Remove a previously set quota:
{{{
$ bin/hadoop dfsadmin -clrQuota quota/
}}}