 * This lab is based on Ubuntu 8.04 LTS. For the Ubuntu 8.04 installation procedure, see "[wiki:jazz/Hardy Ubuntu 8.04 Server 版安裝步驟]".
 * The machines in the course classroom run Ubuntu 8.04 Server with the xubuntu desktop installed on top.
 * Some commands on this page are "lazy" one-liners intended for users who are not comfortable with Linux text editors; you can make the same changes with whatever editor you prefer (vi, nano, joe, ...).
 * On this page, white-on-black text is a command. Copy and paste only the part after the prompt: "$" means an ordinary user, "#" means the root administrator.

 * Login information

 || Account || hadooper ||
 || Group || hadoop ||
 || Password || ****** ||

 * hadooper has sudo privileges.
-----
 * Notes to self:

Add this user on every machine:
{{{
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hadooper
}}}
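The notes above say hadooper has sudo rights; a minimal sketch of one way to grant them (assuming Ubuntu 8.04's default, where members of the admin group may use sudo):
{{{
$ sudo adduser hadooper admin
}}}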
Test whether it matters if JAVA_HOME is not set in .bashrc.
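For that test, a minimal sketch of the .bashrc variant (assuming the Sun JDK path used elsewhere on this page):
{{{
$ echo 'export JAVA_HOME=/usr/lib/jvm/java-6-sun' >> ~/.bashrc
$ source ~/.bashrc
}}}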
-------

 * You have two machines at hand; assume the one you have just been working on is node1 and the other is node2. In the setup that follows, node1 acts as the server (master) and node2 as the slave.
 * This lab sets up Hadoop running on a cluster, so if your machine still carries the environment from Lab 1, do step 0 first to remove the old settings.


=== step 0. Clean up everything from Lab 1 ===

 * Run this on node1 (the machine that was used for Lab 1):
{{{
~$ killall java
~$ rm -rf /tmp/hadoop-hadooper*
~$ rm -rf /opt/hadoop/logs/*
~$ rm -rf /opt/hadoop
~$ rm -rf ~/.ssh
}}}

=== Set the hostname ===

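 * Both machines must be able to resolve the hostnames node1 and node2. A minimal sketch (node1_ip and node2_ip are placeholders for the machines' real IP addresses): append the mappings to /etc/hosts on both node1 and node2.
{{{
$ sudo sh -c 'echo "node1_ip   node1" >> /etc/hosts'
$ sudo sh -c 'echo "node2_ip   node2" >> /etc/hosts'
}}}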
Append the required environment variables to conf/hadoop-env.sh (the "lazy" heredoc method):
{{{
/opt/hadoop$ cat >> conf/hadoop-env.sh << EOF
}}}

Paste in the following:

{{{
#!sh
export JAVA_HOME=/usr/lib/jvm/java-6-sun
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=/opt/hadoop/conf
EOF
}}}

Or edit conf/hadoop-env.sh directly with a text editor instead:
{{{
/opt/hadoop$ gedit conf/hadoop-env.sh
}}}

Edit it to contain the following:

{{{
#!diff
--- hadoop-0.18.3/conf/hadoop-env.sh.org
+++ hadoop-0.18.3/conf/hadoop-env.sh
@@ -6,7 +6,10 @@
 # remote nodes.
 # The java implementation to use. Required.
-# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
+export JAVA_HOME=/usr/lib/jvm/java-6-sun
+export HADOOP_HOME=/opt/hadoop
+export HADOOP_CONF_DIR=/opt/hadoop/conf
+export HADOOP_LOG_DIR=/home/hadooper/logs
+export HADOOP_PID_DIR=/home/hadooper/pids

 # Extra Java CLASSPATH elements. Optional.
 # export HADOOP_CLASSPATH=
}}}

Next, set up conf/hadoop-site.xml. The "lazy" heredoc method (run as root, hence the "#" prompt):
{{{
/opt/hadoop# cat > conf/hadoop-site.xml << EOF
}}}

Paste in the following:

{{{
#!sh
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <description>
      The name of the default file system. Either the literal string
      "local" or a host:port for NDFS.
    </description>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
    <description>
      The host and port that the MapReduce job tracker runs at. If
      "local", then jobs are run in-process as a single map and
      reduce task.
    </description>
  </property>
</configuration>
EOF
}}}

== step 6. Format HDFS ==

 * We have now configured Hadoop for a single-machine test. Next we start the Hadoop services (namenode, secondarynamenode, jobtracker, tasktracker, datanode); before they can run, the namenode has to be formatted.
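 * A minimal sketch of the format step (bin/hadoop namenode -format is the standard Hadoop 0.18 command; run it as hadooper from /opt/hadoop):
{{{
/opt/hadoop$ bin/hadoop namenode -format
}}}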
For the two-node cluster, conf/hadoop-site.xml is instead edited to point at node1 rather than localhost:
{{{
/opt/hadoop# gedit conf/hadoop-site.xml
}}}

Edit it to contain the following:

{{{
#!diff
--- hadoop-0.18.3/conf/hadoop-site.xml.org
+++ hadoop-0.18.3/conf/hadoop-site.xml
@@ -4,5 +4,31 @@
 <!-- Put site-specific property overrides in this file. -->
 <configuration>
-
+  <property>
+    <name>fs.default.name</name>
+    <value>hdfs://node1_ip:9000/</value>
+    <description>
+      The name of the default file system. Either the literal string
+      "local" or a host:port for NDFS.
+    </description>
+  </property>
+  <property>
+    <name>mapred.job.tracker</name>
+    <value>hdfs://node1_ip:9001</value>
+    <description>
+      The host and port that the MapReduce job tracker runs at. If
+      "local", then jobs are run in-process as a single map and
+      reduce task.
+    </description>
+  </property>
+  <property>
+    <name>hadoop.tmp.dir</name>
+    <value>/tmp/hadoop/hadoop-${user.name}</value>
+    <description>A base for other temporary directories.</description>
+  </property>
 </configuration>
}}}

== step 6. Set masters and slaves ==

 * Next we specify which hosts make up the cluster: node1 (the machine named in the configuration above) acts as the namenode, and every host listed in conf/slaves runs as a datanode.
 * Edit conf/slaves:
{{{
/opt/hadoop$ gedit conf/slaves
}}}
Contents:
{{{
#!diff
--- hadoop/conf/slaves.org
+++ hadoop/conf/slaves
@@ -1 +1,2 @@
-localhost
+node1
+node2
}}}
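 * The heading also mentions masters: conf/masters lists the host that runs the secondary namenode (it contains localhost by default). A minimal sketch, assuming the secondary namenode stays on node1:
{{{
/opt/hadoop$ cat > conf/masters << EOF
node1
EOF
}}}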

== step 7. Copy the contents of HADOOP_HOME to the other machine ==

 * From node1, create the directory /opt/hadoop on the remote machine node2 and give it the right ownership:
{{{
/opt/hadoop$ ssh node2_ip "sudo mkdir /opt/hadoop"
/opt/hadoop$ ssh node2_ip "sudo chown -R hadooper:hadoop /opt/hadoop"
}}}

 * Copy node1's hadoop directory over to node2:
{{{
/opt/hadoop$ scp -r /opt/hadoop/* node2_ip:/opt/hadoop/
}}}

== step 8. Format HDFS ==

 * The Hadoop cluster is now installed and configured. Next let's start Hadoop; as before, the first step is to format HDFS.

 * Run this on node1:
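A minimal sketch of the command (the standard Hadoop 0.18 namenode format action, run from /opt/hadoop):
{{{
/opt/hadoop$ bin/hadoop namenode -format
}}}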
== step 7. Start Hadoop ==

 * Next, use start-all.sh to start every service: the namenode, datanode, secondarynamenode, jobtracker and tasktracker.
{{{
/opt/hadoop$ bin/start-all.sh
}}}
The output looks like:
{{{
starting namenode, logging to /opt/hadoop/logs/hadoop-hadooper-namenode-vPro.out
localhost: starting datanode, logging to /opt/hadoop/logs/hadoop-hadooper-datanode-vPro.out
localhost: starting secondarynamenode, logging to /opt/hadoop/logs/hadoop-hadooper-secondarynamenode-vPro.out
starting jobtracker, logging to /opt/hadoop/logs/hadoop-hadooper-jobtracker-vPro.out
}}}

== step 8. Done! Check that everything is running ==
== step 9. Start Hadoop ==

 * On node1, run the following command to start HDFS:
{{{
$ bin/start-dfs.sh
}}}

 * The bin/start-dfs.sh script consults the ${HADOOP_CONF_DIR}/slaves file on the NameNode and starts a DataNode daemon on every slave listed there.

 * To start Map/Reduce on node2, run the following from node1 over ssh:
{{{
$ ssh node2_ip "/opt/hadoop/bin/start-mapred.sh"
}}}

 * The bin/start-mapred.sh script consults the ${HADOOP_CONF_DIR}/slaves file on the JobTracker and starts a TaskTracker daemon on every slave listed there.

== step 10. Check that everything is running ==
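
 * A minimal sketch of one way to check (an assumption, not the page's original steps): list the running Java daemons with jps on each node, and browse the built-in status pages. In Hadoop 0.18 the NameNode web UI listens on port 50070 and the JobTracker web UI on port 50030 (on node1, given the mapred.job.tracker setting above).
{{{
$ /usr/lib/jvm/java-6-sun/bin/jps
$ firefox http://node1:50070/ &
$ firefox http://node1:50030/ &
}}}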