Changes between Version 1 and Version 2 of waue/2010/0526


Timestamp: May 27, 2010, 4:31:58 PM
Author: waue

 * [https://issues.apache.org/jira/browse/NUTCH-427]

A.  Introduction
Installation procedure

    The protocol-smb plugin allows you to crawl Microsoft Windows shares. It implements
    the CIFS/SMB protocol, which is commonly used on Microsoft operating systems. The plugin replicates the
    behaviour of protocol-file over the CIFS/SMB protocol. It uses the JCIFS library and
    supports all of that library's properties.
1.  Download the latest protocol-smb archive and extract it; the extracted folder is referred to below as $pro-smb-dir.

    You can find more information at http://jcifs.samba.org/
    The SMB URL syntax for crawling is smb://xxxxx (e.g. smb://server/share).

B.  Installation
[https://issues.apache.org/jira/secure/attachment/12442365/protocol-smb-dist.zip]

    1) Binaries only:
2. Copy the '''protocol-smb''' folder from $pro-smb-dir/build/plugins/ (it contains three files: jcifs-1.3.0.jar, plugin.xml, and protocol-smb.jar)
to '''$nutch_home/plugin/'''.

    The protocol-smb files can be found in the ../plugins directory.
3. Edit $nutch_home/conf/nutch-site.xml:

    Copy the "protocol-smb" folder to the NUTCHHOME/build/plugins directory.
{{{
#!xml
<property>
  <name>plugin.includes</name>
  <value>protocol-smb| other plugins...</value>
  <description>
  </description>
</property>
}}}

    Put the "smb.properties" file in the NUTCHHOME/conf directory.
4. Copy $pro-smb-dir/conf/smb.properties to $nutch_home/conf/ and set its values.

    Configure the properties in the "smb.properties" file.
5. The URL format is smb://server/share

    Enable the plugin by updating the "nutch-site.xml" file found in the NUTCHHOME/conf directory.
6. Run the Nutch crawl:
{{{
#!sh
#!/bin/bash
# Usage: pass the crawl depth as the first argument.
crawl_dep=$1
echo "$crawl_dep"

# Report whether the previous command succeeded; abort on failure.
# $? is captured first so that later commands cannot overwrite it.
function debug_echo () {
  local status=$?
  if [ $status -eq 0 ]; then
      echo "$1 finished"
  else
      echo "$1 is error"
      exit 1
  fi
}
source /opt/nutchez/nutch/conf/hadoop-env.sh
debug_echo "import hadoop-env.sh"
echo "delete search (local,hdfs) and urls (hdfs) "
rm -rf /home/nutchuser/nutchez/search
/opt/nutchez/nutch/bin/hadoop dfs -rmr urls search
/opt/nutchez/nutch/bin/hadoop dfs -put /home/nutchuser/nutchez/urls urls
# Crawl the injected URLs to the requested depth.
/opt/nutchez/nutch/bin/nutch crawl urls -dir search -depth "$crawl_dep" -topN 5000 -threads 1000
debug_echo "nutch crawl"
# Copy the crawl result from HDFS to the local filesystem.
/opt/nutchez/nutch/bin/hadoop dfs -get search /home/nutchuser/nutchez/search
debug_echo "download search"
# Restart Tomcat.
/opt/nutchez/tomcat/bin/shutdown.sh
/opt/nutchez/tomcat/bin/startup.sh
debug_echo "tomcat restart"
}}}
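Step 5 gives the URL format as smb://server/share. The following is a minimal sketch of assembling and inspecting such a URL with the JDK alone. The class and method names are hypothetical and belong to neither Nutch nor JCIFS; the trailing slash simply mirrors the format string used in the jcifs test further down this page.

```java
import java.net.URI;

// Hypothetical helper, for illustration only: builds and inspects smb:// URLs.
final class SmbUrlSketch {

    // Builds smb://server/share/ from its parts (trailing slash kept, as in the test code).
    static String build(String server, String share) {
        return String.format("smb://%s/%s/", server, share);
    }

    public static void main(String[] args) {
        String url = build("server", "share");
        // java.net.URI parses any scheme, so no smb protocol handler is needed here.
        URI uri = URI.create(url);
        System.out.println(uri.getScheme()); // smb
        System.out.println(uri.getHost());   // server
        System.out.println(uri.getPath());   // /share/
    }
}
```

Parsing with java.net.URI rather than java.net.URL keeps the sketch independent of whether an smb handler is installed.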
    e.g. <property>
           <name>plugin.includes</name>
           <value>protocol-smb| other plugins...</value>
           <description>
           </description>
         </property>
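The plugin.includes value is a regular expression over plugin ids, with alternatives separated by "|". The sketch below assumes a plugin is included only when its id matches the whole expression; the class name and the extra plugin ids are hypothetical and chosen for illustration. It also shows why a stray space after "|" can matter under that assumption.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: checks plugin ids against a plugin.includes-style expression.
final class PluginIncludesSketch {

    // Returns true if the plugin id matches the expression in full.
    static boolean isIncluded(String includesRegex, String pluginId) {
        return Pattern.compile(includesRegex).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        // Assumed value: protocol-smb plus other ids picked only for this example.
        String includes = "protocol-smb|parse-(text|html)|index-basic";
        System.out.println(isIncluded(includes, "protocol-smb"));  // true
        System.out.println(isIncluded(includes, "protocol-http")); // false
        // A space after '|' becomes part of the alternative, so the id no longer matches.
        System.out.println(isIncluded("protocol-smb| parse-text", "parse-text")); // false
    }
}
```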
 * Problem encountered

{{{
#!txt
2010-05-27 14:07:19,417 WARN org.apache.nutch.crawl.Injector: Skipping smb://140.110.138.179/share:java.net.MalformedURLException: unknown protocol: smb
}}}
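The same exception can be reproduced outside Nutch. This is a minimal sketch with a hypothetical class name, assuming no smb protocol handler is registered in the JVM: java.net.URL then rejects the smb scheme at construction time, before any network access happens.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical sketch: shows java.net.URL rejecting a scheme it has no handler for.
final class UnknownProtocolSketch {

    // Returns null if the URL can be constructed, otherwise the exception message.
    static String tryUrl(String spec) {
        try {
            new URL(spec);
            return null;
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryUrl("http://example.com/")); // null (built-in handler)
        System.out.println(tryUrl("smb://server/share"));  // unknown protocol: smb
    }
}
```

This suggests the warning comes from URL parsing in the JVM rather than from the SMB server itself.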
 * Attempted the following fixes:
{{{
#!txt
a) A short-term solution is to install the JCIFS jar
library found in the protocol-smb folder into
JDKHOME/jre/lib/ext and/or JREHOME/lib/ext.

b) After completing step a), if the exception is still thrown,
set the system property by passing the following argument
to the JVM:

-Djava.protocol.handler.pkgs=jcifs

c) You can also set the property in your code. For example, if
you start crawling with org.apache.nutch.crawl.Crawl,
add the following two lines. This has the same effect as b).
public static void main(String args[]) throws Exception {
System.setProperty("java.protocol.handler.pkgs", "jcifs");
new java.util.PropertyPermission("java.protocol.handler.pkgs","read, write");
//and so on

Also you can visit the FAQ page: http://jcifs.samba.org/src/docs/faq.html
}}}
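The JDK reads java.protocol.handler.pkgs as a "|"-separated list of package prefixes and looks for a class named <package>.<protocol>.Handler, so the value jcifs points it at jcifs.smb.Handler. Option c) overwrites whatever value is already set. A minimal sketch that appends instead, with a hypothetical class and helper name:

```java
// Hypothetical sketch: adds a package prefix to java.protocol.handler.pkgs without
// discarding prefixes that are already configured.
final class HandlerPkgsSketch {
    static final String KEY = "java.protocol.handler.pkgs";

    // Appends pkg to a '|'-separated list unless it is already present.
    static String merge(String existing, String pkg) {
        if (existing == null || existing.isEmpty()) {
            return pkg;
        }
        for (String p : existing.split("\\|")) {
            if (p.equals(pkg)) {
                return existing;
            }
        }
        return existing + "|" + pkg;
    }

    public static void main(String[] args) {
        // Set the property before the first smb:// URL is constructed.
        System.setProperty(KEY, merge(System.getProperty(KEY), "jcifs"));
        System.out.println(System.getProperty(KEY));
    }
}
```

Setting the property only tells the JVM where to look; the JCIFS jar still has to be loadable by the class loader that resolves the handler.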
However, the warning persisted, so the crawl still had no entry point. I then turned to http://jcifs.samba.org/src/docs/faq.html

Testing the jcifs project on its own:
{{{
#!java
import java.net.MalformedURLException;

import jcifs.smb.NtlmPasswordAuthentication;
import jcifs.smb.SmbException;
import jcifs.smb.SmbFile;

public class test {

        /**
         * Lists the entries of an SMB share using jcifs alone, without Nutch.
         *
         * @param args unused
         * @throws MalformedURLException if the smb:// URL cannot be parsed
         * @throws SmbException if the share cannot be listed
         */
        public static void main(String[] args) throws MalformedURLException, SmbException {
                String domain = "WORKSTATION";
                String username = "waue";
                String password = "cccccc";
                String server = "140.110.138.179";
                String share = "share";
                String directory = ".";

                // Credentials are prepared but not used: the share is opened
                // without explicit credentials below.
                NtlmPasswordAuthentication auth = new NtlmPasswordAuthentication(domain,
                                username, password);
                String smburl = String.format("smb://%s/%s/%s/", server, share, directory);
//              SmbFile file = new SmbFile(smburl, auth);
                SmbFile file = new SmbFile(smburl);
                SmbFile[] files = file.listFiles();
                System.err.println("file : ");
                for (SmbFile fi : files) {
                        System.err.println(fi.getName());
                }
        }
}
}}}
Result:

{{{
file :
【影片】/
人月神話.pdf
其他/
【音樂】/
test.txt
【軟體】/
【照片】/
【遊戲】/
}}}
This shows that jcifs works on my machine, so the problem lies between protocol-smb and Nutch.