Changes between Version 16 and Version 17 of waue/2009/nutch_install
- Timestamp: Apr 24, 2009, 6:47:29 PM
waue/2009/nutch_install
 v16  v17
 101  101    = step 3 Edit the configuration files =
 102  102     * All configuration files are under /opt/nutch/conf
 103       - == 3.1 hadoop-env.sh ==
      103  + == 3.1 $NUTCH_HOME/conf/hadoop-env.sh ==
 104  104     * Add the following anywhere in the existing hadoop-env.sh file
      105  + {{{
      106  + $ cd /opt/nutch/conf
      107  + $ gedit hadoop-env.sh
      108  + }}}
      109  +
 105  110    {{{
 106  111    #!sh
   …    …
 116  121     * Load the environment settings
 117  122    {{{
 118       - $ source /opt/nutch/conf/hadoop-env.sh
      123  + $ source ./hadoop-env.sh
 119  124    }}}
 120  125     * PS: it is strongly recommended to also write these settings into /etc/bash.bashrc, which is the safest approach!
 121  126
 122  127
 123       - == 3.2 conf/nutch-site.xml ==
      128  + == 3.2 $NUTCH_HOME/conf/nutch-site.xml ==
 124  129     * An important configuration file; the required content has been added to it. For details on further parameters, see nutch-default.xml
 125  130    {{{
 126       - $ vim conf/nutch-site.xml
      131  + $ gedit nutch-site.xml
 127  132    }}}
 128  133    {{{
   …    …
 198  203    }}}
 199  204
 200       - == 3.3 crawl-urlfilter.txt ==
      205  + == 3.3 $NUTCH_HOME/conf/crawl-urlfilter.txt ==
 201  206     * Edit the crawl rules. This file matters because, if it is misconfigured, the crawl returns almost nothing, which means your search engine will end up finding no data!
 202  207    {{{
 203       - $ vim conf/crawl-urlfilter.txt
      208  + $ gedit ./crawl-urlfilter.txt
 204  209    }}}
 205  210    {{{
   …    …
 221  226    == 4.1 Edit the URL list ==
 222  227    {{{
      228  + $ cd /opt/nutch
 223  229    $ mkdir urls
 224  230    $ echo "http://www.nchc.org.tw" >> ./urls/urls.txt
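
Version 17 switches several commands to relative paths (`./hadoop-env.sh`, `./urls/urls.txt`), so they only work from the right working directory, and the `>>` in step 4.1 appends a duplicate line each time it is rerun. The following is a minimal sketch of step 4.1 that can be rerun safely, not part of the wiki page itself. It assumes `NUTCH_HOME` points at the Nutch directory (the page uses /opt/nutch) and falls back to a temporary directory so it can be tried without an installation; `add_seed` is a hypothetical helper name introduced only for this illustration.

```shell
#!/bin/sh
# Assumption: NUTCH_HOME is the Nutch install directory (/opt/nutch on the wiki page).
# A throwaway temporary directory is used when it is unset, so the sketch runs anywhere.
NUTCH_HOME="${NUTCH_HOME:-$(mktemp -d)}"

# Relative paths such as ./urls/urls.txt depend on the working directory,
# so change into NUTCH_HOME first and stop if that fails.
cd "$NUTCH_HOME" || exit 1

# mkdir -p does not fail when the directory already exists.
mkdir -p urls

# add_seed (hypothetical helper): append a URL to the seed list only if that
# exact line is not already present, so reruns do not create duplicates.
add_seed() {
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF "$1" urls/urls.txt 2>/dev/null; then
        echo "$1" >> urls/urls.txt
    fi
}

add_seed "http://www.nchc.org.tw"
add_seed "http://www.nchc.org.tw"   # second call leaves the file unchanged
```

Running the script repeatedly leaves a single entry in `urls/urls.txt`, whereas the plain `echo ... >>` from the page adds one line per run.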