Version 14 (modified by waue, 14 years ago)
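crawlzilla new version
v 1.0
Goals
- Multi-user (shared) version
- Updated web interface
- New features such as scheduling
- Upgrade nutch to version 1.2
- Install/test mode run from the svn repository
- slave installation can be guided through the web interface
System analysis
Directory structure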
- /home/crawler/crawlzilla
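
| Directory 1 | Directory 2 | Description |
|---|---|---|
| ./user/[admin,username]/ | ./IDB/XXX/meta | admin is a mandatory folder; username is a user added later; XXX is a newly added index DB; meta holds the files related to each index DB |
| | ./IDB/XXX/index~segments | index~segments are the five folders required by the lucene db |
| | ./tmp | The IndexDB this user is currently computing |
| ./workspace | | hadoop working folder |
| ./slave/ | | Files needed for slave installation |
| ./meta/ | | Intermediate files generated by dialog |
| ./meta/tmp/ | | Temporary files |
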
- /opt/crawlzilla/
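
| Directory 1 | Directory 2 | Description |
|---|---|---|
| ./tomcat | ./webapps/UUU/XXX | Maps to index DB XXX of user UUU |
| ./nutch | | nutch directory |
| ./main/ | | Holds the crawlzilla executables |
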
- /var/log/crawlzilla/

| Directory 1 | Directory 2 | Description |
|---|---|---|
| ./hadoop-logs | | |
| ./hadoop-pids | | |
| ./shell-logs | | |
| ./tomcat-logs | | |

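
The three trees above could be created roughly as follows. This is only a sketch: `make_layout`, `add_index_db`, `ROOT` and `demo_db` are hypothetical names, everything is built under a scratch root rather than the real `/`, and placing `tmp` under each user directory is an assumption drawn from its description.

```shell
#!/bin/sh
# Sketch only: build the planned v1.0 directory skeleton under a scratch root.
# make_layout, add_index_db and ROOT are hypothetical, not part of crawlzilla.

make_layout() {
    root=$1
    home="$root/home/crawler/crawlzilla"
    opt="$root/opt/crawlzilla"
    log="$root/var/log/crawlzilla"

    # Data tree: admin is the mandatory user folder.
    # Assumption: tmp lives under each user directory.
    mkdir -p "$home/user/admin/IDB" "$home/user/admin/tmp" \
             "$home/workspace" "$home/slave" "$home/meta/tmp"

    # Program tree.
    mkdir -p "$opt/tomcat/webapps" "$opt/nutch" "$opt/main"

    # Log tree.
    mkdir -p "$log/hadoop-logs" "$log/hadoop-pids" \
             "$log/shell-logs" "$log/tomcat-logs"
}

add_index_db() {
    root=$1
    user=$2
    db=$3
    # Index DB XXX of user UUU, plus the matching tomcat webapp folder.
    mkdir -p "$root/home/crawler/crawlzilla/user/$user/IDB/$db/meta"
    mkdir -p "$root/opt/crawlzilla/tomcat/webapps/$user/$db"
}

# Demo in a throwaway directory.
ROOT=$(mktemp -d)
make_layout "$ROOT"
add_index_db "$ROOT" admin demo_db
find "$ROOT" -type d | sort
```

Running it against a temporary root keeps the sketch safe to try without touching `/home/crawler` or `/opt`.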
Old-to-new file/directory mapping

| Old | New | Description |
|---|---|---|
| /home/crawler/crawlzilla/logs | (remove this link) | |
| /home/crawler/crawlzilla/nutch | (remove this link) | |
| /home/crawler/crawlzilla/source | /home/crawler/crawlzilla/slave | |
| /home/crawler/crawlzilla/archieve/_DBName_ | /home/crawler/crawlzilla/user/admin/IDB/_DBName_ | |
| /home/crawler/crawlzilla/tmp | /home/crawler/crawlzilla/tmp | |
| /home/crawler/crawlzilla/urls | /home/crawler/crawlzilla/meta/urls | |
| /home/crawler/crawlzilla/.metadata/_DBName_ | /home/crawler/crawlzilla/user/admin/IDB/_DBName_/meta | |
| /home/crawler/crawlzilla/.menu_tmp | /home/crawler/crawlzilla/meta/menu_tmp | |
| /home/crawler/crawlzilla/system/ | (described below) | |

- /home/crawler/crawlzilla/system:

| Old | New | Description |
|---|---|---|
| executables | /opt/crawlzilla/main/ (executables) | e.g. crawlzilla, install, go.sh ... |
| lang/ | /opt/crawlzilla/main/lang/ | Language file folder |
| hosts | /home/crawler/crawlzilla/meta/ | |
| hosts.old | /home/crawler/crawlzilla/meta/ | |
| hosts.bak | /home/crawler/crawlzilla/meta/ | |
| version | /opt/crawlzilla/version | |
| crawl_nodes | /home/crawler/crawlzilla/meta/ | |
| crawl_nodes.bak | /home/crawler/crawlzilla/meta/ | |
| crawl_nodes.old | /home/crawler/crawlzilla/meta/ | |

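
A migration along the lines of the two mappings above might look like the sketch below. `migrate_layout` and `ROOT` are hypothetical names, only part of the mapping is covered (the executables and `version` are left out), and it runs against a scratch root with a fake old tree. A real migration would also need backups and error handling.

```shell
#!/bin/sh
# Sketch only: apply part of the old-to-new mapping under a scratch root.
# migrate_layout and ROOT are hypothetical, not part of crawlzilla.

migrate_layout() {
    root=$1
    home="$root/home/crawler/crawlzilla"
    main="$root/opt/crawlzilla/main"

    mkdir -p "$home/meta" "$home/user/admin/IDB" "$main"

    # The logs and nutch links are removed in the new layout.
    for link in logs nutch; do
        if [ -L "$home/$link" ]; then rm "$home/$link"; fi
    done

    # Straight renames.
    if [ -e "$home/source" ]; then mv "$home/source" "$home/slave"; fi
    if [ -e "$home/urls" ]; then mv "$home/urls" "$home/meta/urls"; fi
    if [ -e "$home/.menu_tmp" ]; then mv "$home/.menu_tmp" "$home/meta/menu_tmp"; fi

    # Each archived index DB moves under the admin user; its metadata follows.
    for db in "$home"/archieve/*/; do
        [ -d "$db" ] || continue
        name=$(basename "$db")
        mv "$home/archieve/$name" "$home/user/admin/IDB/$name"
        if [ -d "$home/.metadata/$name" ]; then
            mv "$home/.metadata/$name" "$home/user/admin/IDB/$name/meta"
        fi
    done

    # Files from system/ that move to meta/.
    for f in hosts hosts.old hosts.bak crawl_nodes crawl_nodes.bak crawl_nodes.old; do
        if [ -f "$home/system/$f" ]; then mv "$home/system/$f" "$home/meta/"; fi
    done
    if [ -d "$home/system/lang" ]; then mv "$home/system/lang" "$main/lang"; fi
}

# Demo: fake a small old-style tree, then migrate it.
ROOT=$(mktemp -d)
old="$ROOT/home/crawler/crawlzilla"
mkdir -p "$old/source" "$old/urls" "$old/archieve/db1" \
         "$old/.metadata/db1" "$old/system/lang"
touch "$old/system/hosts" "$old/system/crawl_nodes"
ln -s /nonexistent "$old/logs"
migrate_layout "$ROOT"
find "$ROOT" | sort
```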
Environment parameters
(The following are the old values)
- Crawlzilla_Install_PATH="/opt/crawlzilla"
- Tomcat_HOME="/opt/crawlzilla/tomcat"
- Crawlzilla_HOME="/home/crawler/crawlzilla"
- Work_Path=$Crawlzilla_HOME/system
- Manu_Tmp_Path="/home/crawler/crawlzilla/meta"
- Hadoop_Daemon="/opt/crawlzilla/nutch/bin/hadoop-daemon.sh"
- PID_Dir="/var/log/crawlzilla/hadoop-pids"
- Crawl_Nodes=$Crawlzilla_HOME/meta/crawl_nodes
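
As an illustration of how a script might use `PID_Dir`, the sketch below reports whether each PID file in a directory points at a live process. `check_pids` is a hypothetical helper, the `*.pid` file naming is an assumption not stated in these notes, and the demo uses a scratch directory instead of `/var/log/crawlzilla/hadoop-pids`.

```shell
#!/bin/sh
# Sketch only: report whether each PID file points at a live process.
# check_pids is hypothetical; the *.pid naming is an assumption.

check_pids() {
    pid_dir=$1
    for f in "$pid_dir"/*.pid; do
        [ -f "$f" ] || continue
        pid=$(cat "$f")
        name=$(basename "$f" .pid)
        # kill -0 sends no signal; it only tests whether the process can be signalled.
        if kill -0 "$pid" 2>/dev/null; then
            echo "$name: running (pid $pid)"
        else
            echo "$name: not running"
        fi
    done
}

# Demo: one PID file for this shell, one for a PID that should not exist.
PID_Dir=$(mktemp -d)
echo $$ > "$PID_Dir/self.pid"
echo 99999999 > "$PID_Dir/stale.pid"
check_pids "$PID_Dir"
```

Note that `kill -0` also fails when the process exists but belongs to another user, so a real status check would need to run as the account that owns the daemons.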
Features
shell
- Status
- Computation settings
- Quick setup
- Web server settings
- Multi-user account management
- Language switching
- slave installation prompts