Changes between Version 5 and Version 6 of administration/search

Timestamp: Apr 17, 2009, 2:33:45 PM
Author: hank
Comment: work on search

Let's crawl the /usr/share/doc directory. First, create some test files for the crawler to find.

 1. Create haystack.txt in /usr/share/doc/sed/ containing:
 {{{
Nutch has found the haystack.
haystack9
 }}}

 1. Create needle.txt in / containing the text below, then create a symlink to it in /usr/share/doc/wget/:
 {{{
Symlink to /needle.txt
Here is a needle:
needle008
 }}}

 Symlink command:
 {{{
ln -s /needle.txt /usr/share/doc/wget/needle.txt
 }}}

 1. Create jump.txt in /usr/share/ (one level above the directory being crawled) containing:
 {{{
nutch... you naughty crawler!
the secret code is meatball33
 }}}
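The file setup above can also be scripted. This is a minimal sketch, assuming a POSIX shell; the function name make_test_files and its optional prefix argument are hypothetical, added only so the layout can be tried in a scratch directory before writing into the real /usr/share/doc (which normally needs root).

```shell
#!/bin/sh
# Hypothetical helper: create the crawl test files from the steps above.
# $1 is an optional prefix directory. Leave it empty to write into the real
# filesystem (normally needs root); pass a scratch directory to try it safely.
make_test_files() {
  root="${1:-}"

  # haystack.txt sits inside the crawl's start directory
  mkdir -p "$root/usr/share/doc/sed"
  printf '%s\n' 'Nutch has found the haystack.' 'haystack9' \
    > "$root/usr/share/doc/sed/haystack.txt"

  # needle.txt lives at the top of the tree; only a symlink to it is
  # placed under /usr/share/doc/wget/
  printf '%s\n' 'Symlink to /needle.txt' 'Here is a needle:' 'needle008' \
    > "$root/needle.txt"
  mkdir -p "$root/usr/share/doc/wget"
  ln -sf "$root/needle.txt" "$root/usr/share/doc/wget/needle.txt"

  # jump.txt sits in /usr/share, one level above the start directory
  printf '%s\n' 'nutch... you naughty crawler!' 'the secret code is meatball33' \
    > "$root/usr/share/jump.txt"
}
```

Call `make_test_files` with no argument (as root) to create the real files, or `make_test_files /tmp/nutch-test` to preview the layout first. The three files appear to cover three cases: a plain file inside the start directory, a file reachable only through a symlink, and a file outside the start directory that the crawler should not reach. That reading is inferred from the file contents rather than stated on this page.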

Crawl /usr/share/doc:
 1. Edit crawl-urlfilter.txt and regex-urlfilter.txt so that they accept the top-level directory:
    {{{
    +^file:///usr/share/doc/
    +^file:/usr/share/doc/
    }}}
 1. Create a seed file named docurls inside the urls.doc directory, containing:
    {{{
    file:///usr/share/doc/
    }}}
 1. Run the crawl:
    {{{
    nutch crawl urls.doc -dir crawl.doc > crawl.doc.log
    }}}
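The seed and crawl steps can be combined into a small shell function. This is a minimal sketch, assuming a POSIX shell with the `nutch` launcher on PATH and URL filter files that were already edited by hand; the function name run_doc_crawl and its two optional arguments are hypothetical and default to the urls.doc and crawl.doc names used above.

```shell
#!/bin/sh
# Hypothetical helper: write the seed list and start the crawl.
# Assumes the nutch launcher is on PATH and that crawl-urlfilter.txt and
# regex-urlfilter.txt have already been edited by hand.
run_doc_crawl() {
  seed_dir="${1:-urls.doc}"   # seed directory passed to "nutch crawl"
  out_dir="${2:-crawl.doc}"   # crawl output directory (-dir)

  # Seed file: one start URL per line
  mkdir -p "$seed_dir"
  printf '%s\n' 'file:///usr/share/doc/' > "$seed_dir/docurls"

  # Run the crawl and keep its standard output in a log file
  nutch crawl "$seed_dir" -dir "$out_dir" > "$out_dir.log"
}
```

The filter edits are left manual because rule order matters: Nutch's regex URL filters apply the first pattern that matches, so the two `+^file:` lines need to come before any rule that rejects file: URLs, such as the `-^(file|ftp|mailto):` line found in the stock configuration files.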

== Helpful Articles ==