
Changes between Version 1 and Version 2 of administration/search


Timestamp: Apr 12, 2009, 3:59:58 PM
Author: hank
Comment: work on nutch

is set to /usr/wwwapps/crawldir

http://today.java.net/pub/a/today/2006/01/10/introduction-to-nutch-1.html

http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.html

create files[[BR]]
add file:///FILES/
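
A minimal sketch of these two steps, assuming "create files" means a seed-list directory that is handed to Nutch at crawl time. The names `urls` and `seed.txt` are hypothetical choices, not taken from this page.

{{{
# Hypothetical seed-list layout: one start URL per line in urls/seed.txt.
mkdir -p urls
echo 'file:///FILES/' > urls/seed.txt
}}}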

New Nutch:
wget http://nutch

edit regex-urlfilter.txt [post here]
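
The edits used here were never posted. As a hedged sketch only: the stock Nutch URL filter files normally skip `file:` URLs with the rule `-^(file|ftp|mailto):`, so crawling file:///FILES/ usually means dropping `file` from that rule. The helper name `allow_file_urls` is hypothetical, the rule text is an assumption to check against your own copy, and `sed -i` as written is GNU sed.

{{{
# Hypothetical helper: stop the URL filter from rejecting file: URLs.
# Assumes the file contains the stock rule "-^(file|ftp|mailto):".
# In a GNU sed basic regex the second ^, the parentheses and | are literal.
allow_file_urls() {
  sed -i 's/^-^(file|ftp|mailto):/-^(ftp|mailto):/' "$1"
}

# Usage, from the Nutch top-level directory:
#   allow_file_urls conf/regex-urlfilter.txt
}}}

If the rule in your copy is worded differently, the substitution matches nothing and the file is left unchanged.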

edit crawl-urlfilter.txt [post here]
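
The posted contents are missing here too, so this is only a sketch under assumptions: this file usually carries the same `-^(file|ftp|mailto):` skip rule and ends with a catch-all `-.` line that rejects every URL not accepted earlier, so an accept rule for file:///FILES/ has to come before that line. The helper name is hypothetical, both rule texts are assumptions, and the `i` one-line form is a GNU sed extension.

{{{
# Hypothetical helper: let file:///FILES/ through the crawl URL filter.
# Assumes the stock "-^(file|ftp|mailto):" rule and a final "-." catch-all line.
allow_files_in_crawl_filter() {
  sed -i \
    -e 's/^-^(file|ftp|mailto):/-^(ftp|mailto):/' \
    -e '/^-\.$/i +^file:///FILES/' \
    "$1"
}

# Usage, from the Nutch top-level directory:
#   allow_files_in_crawl_filter conf/crawl-urlfilter.txt
}}}

The helper is not idempotent: running it twice inserts the accept rule twice.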

edit nutch-site.xml [post here]
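
The nutch-site.xml contents were not posted either. A hedged sketch of a minimal file for this setup: Nutch 1.0 refuses to fetch until `http.agent.name` is set, `plugin.includes` needs `protocol-file` (and `parse-pdf`) to crawl local PDFs, and a `file.content.limit` of -1 disables truncation of files fetched over `file:`. The plugin list below is an illustrative assumption; compare it with `plugin.includes` in conf/nutch-default.xml before use. The helper name and the agent name are hypothetical, and the helper overwrites the target file.

{{{
# Hypothetical helper: write a minimal nutch-site.xml (overwrites the file).
# The plugin.includes value is an assumption; check conf/nutch-default.xml.
write_nutch_site() {
  local out="$1" agent="$2"
  cat > "$out" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>${agent}</value>
  </property>
  <property>
    <name>plugin.includes</name>
    <value>protocol-file|protocol-http|urlfilter-regex|parse-(text|html|pdf)|index-(basic|anchor)|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
  <property>
    <name>file.content.limit</name>
    <value>-1</value>
  </property>
</configuration>
EOF
}

# Usage, from the Nutch top-level directory ("my-crawler" is a placeholder):
#   write_nutch_site conf/nutch-site.xml my-crawler
}}}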

download the PDF libraries used by the parse-pdf plugin
{{{
cd src/plugin/parse-pdf/lib
# URLs are quoted so the shell cannot glob-expand *checkout*
wget 'http://pdfbox.cvs.sourceforge.net/viewvc/*checkout*/pdfbox/pdfbox/external/jai_codec.jar'
wget 'http://pdfbox.cvs.sourceforge.net/viewvc/*checkout*/pdfbox/pdfbox/external/jai_core.jar'
}}}
In src/plugin/parse-pdf/plugin.xml, uncomment the two library lines so they read:
{{{
<!-- Uncomment the following two lines after you have downloaded the
     libraries, see README.txt for more details.-->
<library name="jai_codec.jar"/>
<library name="jai_core.jar"/>
}}}
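
A small check for the two steps above, as a hedged sketch. A jar is a zip archive and starts with the bytes `PK`, so this catches a download that saved an HTML error page instead of the library; it also requires each `<library>` line to start its line, i.e. not sit inside a same-line `<!-- ... -->` comment. The helper name is hypothetical, and a multi-line comment wrapped around the two lines would not be detected.

{{{
# Hypothetical helper: sanity-check the parse-pdf jars and plugin.xml.
check_pdf_libs() {
  local dir="$1" jar
  for jar in jai_codec.jar jai_core.jar; do
    # A real jar is a zip archive, so its first two bytes are "PK";
    # a failed download often leaves an HTML error page instead.
    if [ "$(head -c 2 "$dir/lib/$jar" 2>/dev/null)" != "PK" ]; then
      echo "missing or not a jar: $dir/lib/$jar" >&2
      return 1
    fi
    # The <library> line must start its line (not inside <!-- ... -->).
    if ! grep -Eq "^[[:space:]]*<library name=\"$jar\"/>" "$dir/plugin.xml"; then
      echo "library line missing or commented out: $jar" >&2
      return 1
    fi
  done
  echo "parse-pdf libraries look OK"
}

# Usage, from the Nutch top-level directory:
#   check_pdf_libs src/plugin/parse-pdf
}}}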

Rebuild Nutch:
{{{
cd ../../../../   # from src/plugin/parse-pdf/lib back to the nutch-1.0/ top-level directory
ant jar
ant compile-plugins
ant war
}}}
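
The three ant targets above can be wrapped so they always run in order from the Nutch top-level directory and the build stops at the first failing target. A minimal sketch; the helper name and the path in the usage line are hypothetical.

{{{
# Hypothetical helper: run the rebuild targets in order, stopping at the first failure.
rebuild_nutch() {
  (
    cd "$1" || exit 1
    for target in jar compile-plugins war; do
      echo "==> ant $target"
      ant "$target" || { echo "ant $target failed" >&2; exit 1; }
    done
  )
}

# Usage (the path is a placeholder):
#   rebuild_nutch /path/to/nutch-1.0
}}}

Running the targets in a subshell keeps the `cd` from changing the caller's working directory.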