== Prepare a large data set ==

* First, create a 200 MB file filled with zeros:
{{{
h998@hadoop:~$ dd if=/dev/zero of=200mb.img bs=1M count=200
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 0.239545 s, 875 MB/s
}}}
* Check the file size:
{{{
h998@hadoop:~$ du -sh 200mb.img
200M 200mb.img
}}}
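dd reports 210 MB while du reports 200M for the same file. Both are correct: with GNU dd, `bs=1M` means 1 MiB (1,048,576 bytes), dd's summary in parentheses uses decimal megabytes (10^6 bytes), and `du -h` uses binary units. A minimal sketch of the arithmetic using only shell integer arithmetic (the variable names are just for illustration):

{{{
#!sh
# bs=1M in GNU dd is 1 MiB = 1024 * 1024 bytes; count=200 blocks were written
bs=$((1024 * 1024))
count=200
bytes=$((bs * count))
echo "bytes: $bytes"

# dd's "(210 MB)" uses decimal megabytes (10^6 bytes), rounded to the nearest one
echo "decimal MB: $(( (bytes + 500000) / 1000000 ))"

# du -sh uses binary units, so the same file is shown as 200M (MiB)
echo "binary MiB: $(( bytes / bs ))"
}}}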
* Upload 200mb.img to HDFS (a relative destination path is placed under your HDFS home directory, here /user/h998):
{{{
h998@hadoop:~$ hadoop fs -put 200mb.img 200mb.img
}}}
* Check that the upload succeeded:
{{{
h998@hadoop:~$ hadoop fs -ls 200mb.img
Found 1 items
-rw-r--r-- 2 h998 supergroup 209715200 2013-08-12 12:06 /user/h998/200mb.img
}}}
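In this listing, the second column (2) is the file's replication factor and the fifth column (209715200) is its size in bytes, which matches the local file created by dd. A minimal sketch for checking this from a script, assuming the column layout shown above; `parse_ls_line` is a hypothetical helper name, not part of Hadoop:

{{{
#!sh
# Hypothetical helper: read `hadoop fs -ls` output on stdin and print
# "<replication> <size-in-bytes>" for every file entry.
# Assumed column layout of a file entry (as in the listing above):
#   permissions replication owner group size date time path
# Lines not starting with "-" (the "Found N items" header, directories) are skipped.
parse_ls_line() {
    awk '$1 ~ /^-/ { print $2, $5 }'
}

# Example usage (needs the hadoop CLI and the uploaded file):
#   remote_size=$(hadoop fs -ls 200mb.img | parse_ls_line | cut -d' ' -f2)
#   local_size=$(wc -c < 200mb.img | tr -d ' ')
#   [ "$remote_size" = "$local_size" ] && echo "sizes match"
}}}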

* To start, look at the basic usage of fsck:
{{{
#!sh
~$ hadoop fsck
Usage: DFSck <path> [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]
        <path>          start checking from this path
        -move           move corrupted files to /lost+found
        -delete         delete corrupted files
        -files          print out files being checked
        -openforwrite   print out files opened for write
        -blocks         print out block report
        -locations      print out locations for every block
        -racks          print out network topology for data-node locations
By default fsck ignores files opened for write, use -openforwrite to report such files. They are usually tagged CORRUPT or HEALTHY depending on their block allocation status
}}}
* Run fsck without any options first, passing only the file's absolute path, and look at the result:
{{{
h998@hadoop:~$ hadoop fsck /user/${USER}/200mb.img
.Status: HEALTHY
Total size: 209715200 B
Total dirs: 0
Total files: 1
Total blocks (validated): 2 (avg. block size 104857600 B)
Minimally replicated blocks: 2 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 12
Number of racks: 1


The filesystem under path '/user/h998/200mb.img' is HEALTHY
}}}
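The report ends with a one-line verdict for the checked path. When running fsck from a script, a minimal sketch is to look for that verdict in the report text; it assumes the report format shown above, and `fsck_is_healthy` is a hypothetical helper name. The sketch also shows that the "avg. block size" in the summary is simply Total size divided by Total blocks, not the block size the cluster is configured with.

{{{
#!sh
# Hypothetical helper: succeed (exit status 0) only if the fsck report read on
# stdin contains the final verdict "... is HEALTHY", as in the output above.
fsck_is_healthy() {
    grep -q 'is HEALTHY$'
}

# "avg. block size" is Total size / Total blocks, not the configured block size
avg=$((209715200 / 2))
echo "avg. block size: $avg B"

# Example usage (needs the hadoop CLI):
#   if hadoop fsck /user/${USER}/200mb.img | fsck_is_healthy; then
#       echo "200mb.img is healthy"
#   fi
}}}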

* Next, add fsck's options to find out how many blocks 200mb.img is split into and which machines store each block:
{{{
h998@hadoop:~$ hadoop fsck /user/${USER}/200mb.img -files -blocks -locations -racks
/user/h998/200mb.img 209715200 bytes, 2 block(s): OK
0. blk_-6674004733773524889_19333928 len=134217728 repl=2 [/default-rack/192.168.1.4:50010, /default-rack/192.168.1.8:50010]
1. blk_-2951307914939094717_19333928 len=75497472 repl=2 [/default-rack/192.168.1.14:50010, /default-rack/192.168.1.2:50010]

Status: HEALTHY
Total size: 209715200 B
Total dirs: 0
Total files: 1
Total blocks (validated): 2 (avg. block size 104857600 B)
Minimally replicated blocks: 2 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 12
Number of racks: 1


The filesystem under path '/user/h998/200mb.img' is HEALTHY
}}}
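The block report shows that the file was split into two blocks: block 0 holds 134217728 bytes (128 MiB) and block 1 holds the remaining 75497472 bytes (72 MiB). Each block has two replicas (`repl=2`) on different DataNodes, for example block 0 on 192.168.1.4 and 192.168.1.8; the `/default-rack/` prefix comes from the `-racks` option. The sketch below reproduces that split with shell arithmetic and extracts the location list of each block. It assumes a 128 MiB block size, inferred from the length of block 0 rather than read from the cluster configuration, and `block_locations` is a hypothetical helper name.

{{{
#!sh
# File size from the report; block size inferred from "len" of block 0
# (an assumption about this cluster's configuration, not read from it).
FILE_SIZE=209715200
BLOCK_SIZE=134217728

# Number of blocks = ceil(FILE_SIZE / BLOCK_SIZE), using integer arithmetic
NUM_BLOCKS=$(( (FILE_SIZE + BLOCK_SIZE - 1) / BLOCK_SIZE ))
# Every block but the last is full; the last one holds the remainder
LAST_LEN=$(( FILE_SIZE - (NUM_BLOCKS - 1) * BLOCK_SIZE ))
echo "blocks: $NUM_BLOCKS, last block length: $LAST_LEN"

# Hypothetical helper: for each block line of an fsck block report read on
# stdin, print "<block-id> <location> <location> ...".
block_locations() {
    awk '/ blk_/ {
        locs = $0
        sub(/.*\[/, "", locs)   # keep only the text after "["
        sub(/\].*/, "", locs)   # drop the closing "]"
        gsub(/,/, "", locs)     # drop the commas between locations
        print $2, locs
    }'
}

# Example usage (needs the hadoop CLI):
#   hadoop fsck /user/${USER}/200mb.img -files -blocks -locations -racks | block_locations
}}}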