Changes between Version 15 and Version 16 of waue/Hadoop_DRBL
Timestamp: Feb 25, 2009, 6:38:10 PM
waue/Hadoop_DRBL
[[PageOutline]]

= Running Hadoop on a DRBL Cluster =
'''Hadoop Cluster Based on DRBL'''

 * The goal of this page is to use DRBL to bring up a cluster and run Hadoop on top of it.
 * Because DRBL is a diskless system rather than an ordinary cluster, a few points need special attention.

== 0. Environment ==

The environment has seven machines in total: one is the DRBL server, which is also the Hadoop namenode; the other nodes are DRBL clients and Hadoop datanodes, as follows:

|| name || IP || DRBL role || Hadoop role ||
|| hadoop || 192.168.1.254 || DRBL server || namenode ||
|| hadoop102 || 192.168.1.2 || DRBL client || datanode ||
|| hadoop103 || 192.168.1.3 || DRBL client || datanode ||
|| hadoop104 || 192.168.1.4 || DRBL client || datanode ||
|| hadoop105 || 192.168.1.5 || DRBL client || datanode ||
|| hadoop106 || 192.168.1.6 || DRBL client || datanode ||
|| hadoop107 || 192.168.1.7 || DRBL client || datanode ||

The DRBL server runs:
|| debian || etch (4.0) || server - 64 bit ||

DRBL is a diskless system. Once the DRBL server and the services it needs are installed, every client that boots over the network loads a file system derived from the server's. In other words, apart from a few specific directories whose contents differ from node to node (such as /etc, /root, /home, /tmp, /var, ...), everything is identical. For example, if /etc/hosts is changed on the server, every client picks up the change immediately, because these files are mounted over NFS.

So it is enough to finish '''1. Installation''' and '''2. Configure Hadoop''' on the DRBL server first, then boot the clients and follow '''3. DRBL Operation'''.

== 1. Installation ==

=== Install DRBL ===
 * See [http://drbl.nchc.org.tw/one4all/desktop/ Installing DRBL]

=== Install Java 6 ===

 * Add the non-free component and the backports repository to /etc/apt/sources.list so that sun-java6 can be installed:
{{{
deb http://opensource.nchc.org.tw/debian/ etch main contrib non-free
...
deb http://free.nchc.org.tw/drbl-core drbl stable
}}}
 * Install the archive key and Java 6:
{{{
$ wget http://www.backports.org/debian/archive.key
...
}}}

=== Install Hadoop 0.18.3 ===

{{{
$ cd /opt
...
hadoop:/opt# ln -sf hadoop-0.18.3 hadoop
}}}

== 2. Configure Hadoop ==

 * Append the following to the end of /etc/bash.bashrc:
{{{
PATH=$PATH:/opt/drbl/bin:/opt/drbl/sbin
...
export HADOOP_HOME=/opt/hadoop/
}}}

 * Edit /etc/hosts and append the following at the end:
{{{
192.168.1.254 gm2.nchc.org.tw
192.168.1.1   hadoop101
192.168.1.2   hadoop102
192.168.1.3   hadoop103
192.168.1.4   hadoop104
192.168.1.5   hadoop105
192.168.1.6   hadoop106
192.168.1.7   hadoop107
192.168.1.8   hadoop108
192.168.1.9   hadoop109
192.168.1.10  hadoop110
192.168.1.11  hadoop111
}}}

 * Edit /opt/hadoop-0.18.3/conf/hadoop-env.sh:
{{{
#!diff
...
@@ -6,7 +6,9 @@
 # remote nodes.

 # The java implementation to use.  Required.
-# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
...
}}}

 * Edit /opt/hadoop-0.18.3/conf/hadoop-site.xml:
{{{
#!diff
...
+  <property>
+    <name>fs.default.name</name>
+    <value>hdfs://gm2.nchc.org.tw:9000/</value>
+    <description>
+      The name of the default file system.  Either the literal string
...
+  <property>
+    <name>mapred.job.tracker</name>
+    <value>hdfs://gm2.nchc.org.tw:9001</value>
+    <description>
+      The host and port that the MapReduce job tracker runs at.  If
...
}}}

 * Edit /opt/hadoop/conf/slaves:
{{{
hadoop102
hadoop103
hadoop104
hadoop105
hadoop106
hadoop107
hadoop
}}}

== 3. DRBL Operation ==

=== Boot the clients ===
 * Boot all of the clients; the network layout looks like this:
{{{
******************************************************
...
|    +-- [eth2] 140.110.xxx.130|  +- to WAN
|                              |
|    +-- [eth1] 192.168.1.254  +- to clients group 1 [ 6 clients, their IP
|                              |   from 192.168.1.2 - 192.168.1.7]
+------------------------------+
******************************************************
Total clients: 6
******************************************************
}}}

...

 * Copy the ssh keys to every client so that the server can log in to each of them without a password:
{{{
#!/bin/bash

for ((i=2;i<=7;i++));
do
  scp -r ~/.ssh/ "192.168.1.$i":~/
  ...
done
}}}
 * If everything went correctly, you can now log in to each client without a password.

==== dsh ====
 * This step is optional.

{{{
$ sudo apt-get install dsh
$ mkdir -p .dsh
$ for ((i=2;i<=7;i++)); do echo "192.168.1.$i" >> .dsh/machines.list; done
}}}
Then run:
{{{
$ dsh -a source /etc/bash.bashrc
}}}

=== DRBL Server as Hadoop namenode ===
 * Start:
{{{
bin/hadoop namenode -format
bin/start-all.sh
}}}
 * Test (the sketches below show how to check the result):
{{{
mkdir input
cp *.txt input/
bin/hadoop dfs -put input input
bin/hadoop jar hadoop-*-examples.jar wordcount input output
}}}
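 * Before inspecting the job output, it can be worth confirming that the datanodes on the DRBL clients actually registered with the namenode. A minimal check, assuming the commands are run from /opt/hadoop on the server as above:
{{{
$ cd /opt/hadoop
# Ask the namenode for a cluster summary; with the setup above the report
# should list 6 datanodes (hadoop102 - hadoop107) once every client is up.
$ bin/hadoop dfsadmin -report
}}}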
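 * To read the wordcount result back out of HDFS, list and print the output directory. A small sketch, assuming the output directory is named output as in the command above and that the job used the default single reduce task, so the result lands in part-00000:
{{{
# List the job output and print the first few word counts.
$ bin/hadoop dfs -ls output
$ bin/hadoop dfs -cat output/part-00000 | head

# Optionally clean up HDFS afterwards.
$ bin/hadoop dfs -rmr input output
}}}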
 * Browse http://gm2.nchc.org.tw:50070 for the DFS status.

== References ==