= 2009-08-27 =

 * [Plan] Hadoop cluster maintenance
 * [Status] hadoop104 and hadoop106 were found in a kernel panic.
 * [Status] dfs.replication in /etc/hadoop/conf/hadoop-site.xml was found to be 1, so HDFS was keeping a single copy of each block with no redundancy.
 * [Fix]
   1. Change dfs.replication to 3.
   2. Restart hadoop-namenode and hadoop-datanode.
   3. Use hadoop fs -setrep to raise the replication factor of every file under /user, which is currently 1.
{{{
root@hadoop:~# su -s /bin/sh hadoop -c "hadoop fs -setrep -R 3 /user"
}}}
   4. Run hadoop balancer to see whether it triggers re-replication of the data.
{{{
root@hadoop:~# su -s /bin/sh hadoop -c "hadoop balancer"
}}}
   5. Run hadoop fsck to check whether re-replication is actually taking place.
{{{
root@hadoop:~# su -s /bin/sh hadoop -c "hadoop fsck / -racks"
}}}
{{{
#!sh
### Messages like the following show that the current replica count is too low
/user/waue/input/1.txt: Under replicated blk_-682447276956362627_16045. Target Replicas is 3 but found 1 replica(s).
### Ever since the disk data on hadoop113 was deleted by mistake, HDFS has reported CORRUPT; it looks like everyone will have to be asked to re-upload their data
Status: CORRUPT
Total size: 2514937876121 B
Total dirs: 2800
Total files: 14972
Total blocks (validated): 51686 (avg. block size 48658009 B)
********************************
CORRUPT FILES: 1921
MISSING BLOCKS: 4972
MISSING SIZE: 232737717270 B
CORRUPT BLOCKS: 4972
********************************
Minimally replicated blocks: 46714 (90.38037 %)
Over-replicated blocks: 3 (0.00580428 %)
Under-replicated blocks: 45388 (87.81488 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 1.0597067
Corrupt blocks: 4972
Missing replicas: 89401 (163.2239 %)
Number of data-nodes: 17
Number of racks: 1

The filesystem under path '/' is CORRUPT
}}}
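To track whether re-replication is making progress, the fsck report above can be reduced to a few key figures and compared between runs. The following is only a minimal sketch: the function name `fsck_summary` and the `key=value` output format are assumptions made for illustration, and it relies on the summary labels appearing exactly as in the report above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): reads `hadoop fsck` output on
# stdin and prints selected summary figures as key=value lines.
fsck_summary() {
    awk -F':[ \t]*' '
        # Return the first whitespace-delimited token of a value string.
        function first(s,   a) { split(s, a, " "); return a[1] }
        /^[ \t]*Status:/                    { print "status=" first($2) }
        /^[ \t]*MISSING BLOCKS:/            { print "missing_blocks=" first($2) }
        /^[ \t]*Under-replicated blocks:/   { print "under_replicated=" first($2) }
        /^[ \t]*Average block replication:/ { print "avg_replication=" first($2) }
    '
}

# Example use on the cluster (assumes the same invocation as in step 5):
#   su -s /bin/sh hadoop -c "hadoop fsck / -racks" | fsck_summary
```

If re-replication is working, `under_replicated` should fall and `avg_replication` should climb toward 3 on successive runs. `missing_blocks` will not recover by itself, because a missing block has no surviving replica to copy from.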
 * [Finding] Hadoop is extremely protective of the files under the HDFS directory /var/lib/hadoop/cache: they are stored with a replication factor of 10. This is most likely the effect of mapred.submit.replication, which defaults to 10 for submitted job files.
{{{
-rw-r--r-- 10 hadoop002 supergroup 108739 2009-08-24 22:55 /var/lib/hadoop/cache/hadoop/mapred/system/job_200908242228_0009/job.jar
}}}
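Files like this can be found by filtering a recursive listing on its second column, which holds the replication factor. This is a rough sketch rather than an established tool: the function name `high_replication` and its default threshold of 4 are assumptions, and it takes the last field as the path, so paths containing spaces would be truncated.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): reads `hadoop fs -lsr` style
# listings on stdin and prints "<replication> <path>" for files whose
# replication factor is at least the given threshold (default 4).
# Directories show "-" in column 2 and are skipped.
high_replication() {
    awk -v min="${1:-4}" '$2 ~ /^[0-9]+$/ && $2 + 0 >= min + 0 { print $2, $NF }'
}

# Example use on the cluster:
#   su -s /bin/sh hadoop -c "hadoop fs -lsr /var/lib/hadoop/cache" | high_replication 4
```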