- Timestamp: Jul 15, 2010, 7:19:25 PM (16 years ago)
- Author: waue
- Comment: --
| v2 | v3 | |
| 19 | 19 | || my_tomcat_dir || the folder that originally held the nutch web pages (inside tomcat) || /opt/nutchez/tomcat/webapps/ROOT/ || |
| 20 | 20 | |
| 21 | | = 1. Installation = |
| | 21 | = 1. Modify the source code = |
| | 22 | |
| 22 | 23 | * Install the required tools (Java is already installed) |
| 23 | 24 | {{{ |
| … | … | |
| 118 | 119 | }}} |
| 119 | 120 | |
| | 121 | = Recompile nutch = |
| 120 | 122 | * Recompile nutch-1.0 |
| 121 | 123 | {{{ |
| 122 | 124 | ant |
| 123 | 125 | }}} |
| 124 | | |
| 125 | 126 | |
| 126 | 127 | * When the build finishes, there is an additional folder, build, |
| … | … | |
| 138 | 139 | |
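| 139 | 140 | * Finally, copy nutch-1.0.job into my nutchez folder to replace the one currently in use |
| | 141 | * (The steps below carefully back up the original job first. This is optional: you can instead overwrite it directly with the newly built one.) |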
| 140 | 142 | |
| 141 | 143 | {{{ |
| 142 | 144 | cd nutch-1.0 |
| 143 | | sudo mv /opt/nutchez/nutch/nutch-1.0.job /opt/nutchez/nutch/nutch-1.0-ori.job |
| 144 | | sudo cp build/nutch-1.0.job /opt/nutchez/nutch/nutch-1.0-ika-waue-100715.job |
| | 145 | sudo mv $my_nutch_dir/nutch-1.0.job $my_nutch_dir/nutch-1.0-ori.job |
| | 146 | sudo cp build/nutch-1.0.job $my_nutch_dir/nutch-1.0-ika-waue-100715.job |
| 145 | 147 | # cp build/nutch-1.0.job   (no destination given; the cp above already installs the job) |
| 146 | | sudo ln -sf /opt/nutchez/nutch/nutch-1.0-ika-waue-100715.job /opt/nutchez/nutch/nutch-1.0.job |
| | 148 | sudo ln -sf $my_nutch_dir/nutch-1.0-ika-waue-100715.job $my_nutch_dir/nutch-1.0.job |
| 147 | 149 | }}} |
| 148 | 150 | |
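| 149 | | * In the lib folder of the directory regenerated from nutch-1.0.war, replace nutch-1.0.jar with your newly built nutch-1.0.jar and also add the IK analyzer jar file |
| | 151 | * In the lib folder of the directory regenerated from nutch-1.0.war, replace nutch-1.0.jar with your newly built nutch-1.0.jar, and also add the IK analyzer jar file |
| 150 | 152 | * Finally, run the crawl; search results will then be based on Chinese words segmented by IK |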
| | 153 | |
| 151 | 154 | {{{ |
| 152 | 155 | cd nutch-1.0/ |
| 153 | | cp lib/IKAnalyzer3.1.6GA.jar /opt/nutchez/nutch/lib/ |
| 154 | | |
| | 156 | cp lib/IKAnalyzer3.1.6GA.jar $my_nutch_dir/lib/ |
| 155 | 157 | cd /opt/nutchez/tomcat/webapps/ROOT/WEB-INF/lib |
| 156 | 158 | cp $OLDPWD/build/nutch-1.0-ika.jar ./ |
| … | … | |
| 158 | 160 | }}} |
| 159 | 161 | |
| 160 | | Done |
| 161 | | |
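| | 162 | = Done = |
| | 163 | * Crawled data will now be indexed using Chinese word segmentation. |
| | 164 | * This also works for data crawled with the original nutch: place the recompiled nutch-1.0.jar, nutch-1.0.job and IKAnalyzer3.1.6 correctly in your existing nutch search web app, restart tomcat, and the segmented results are available immediately |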
| 162 | 165 | [[Image(2010-07-15-190139_751x697_scrot.png)]] |
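The deployment steps in this changeset (back up the original job, install the rebuilt job behind a symlink, add the IK jar, swap the webapp's nutch-1.0.jar) can be collected into a single function. This is a minimal sketch, not part of Nutch or NutchEz: the function name `deploy_ik_build`, its arguments, and the assumption that `ant` leaves `build/nutch-1.0.jar` next to `build/nutch-1.0.job` are illustrative. Under /opt you will probably need to run it with sudo, as the commands above do.

```shell
#!/usr/bin/env bash
# Sketch only: deploy_ik_build is a hypothetical helper, not a Nutch/NutchEz tool.
set -euo pipefail

# deploy_ik_build SRC NUTCH_DIR WEBAPP_LIB TAG
#   SRC        : nutch-1.0 source tree after `ant` (contains build/ and lib/)
#   NUTCH_DIR  : e.g. $my_nutch_dir (/opt/nutchez/nutch)
#   WEBAPP_LIB : e.g. $my_tomcat_dir/WEB-INF/lib
#   TAG        : suffix for the new job file, e.g. ika-waue-100715
deploy_ik_build() {
  local src=$1 nutch_dir=$2 webapp_lib=$3 tag=$4
  local job="$nutch_dir/nutch-1.0.job"
  local new_job="$nutch_dir/nutch-1.0-$tag.job"

  # Back up the original job only if it is a real file rather than our
  # symlink, so a second run does not overwrite the backup.
  if [ -f "$job" ] && [ ! -L "$job" ]; then
    mv "$job" "$nutch_dir/nutch-1.0-ori.job"
  fi

  # Install the rebuilt job under a tagged name and point nutch-1.0.job at it.
  cp "$src/build/nutch-1.0.job" "$new_job"
  ln -sf "$new_job" "$job"

  # The IK analyzer jar is needed by both the crawler and the search webapp.
  cp "$src/lib/IKAnalyzer3.1.6GA.jar" "$nutch_dir/lib/"
  cp "$src/lib/IKAnalyzer3.1.6GA.jar" "$webapp_lib/"

  # Replace the webapp's nutch-1.0.jar with the rebuilt one (assumed path).
  cp "$src/build/nutch-1.0.jar" "$webapp_lib/nutch-1.0.jar"
}
```

A typical call would be `deploy_ik_build ~/nutch-1.0 "$my_nutch_dir" "$my_tomcat_dir/WEB-INF/lib" ika-waue-100715`, followed by a tomcat restart. Keeping the rebuilt job under a tagged name and switching only the symlink makes rolling back a matter of re-pointing `nutch-1.0.job` at `nutch-1.0-ori.job`.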