nutch-1.0 » org.apache.nutch » parse » js
org.apache.nutch.parse.js
public class JSParseFilter
java.lang.Object
   org.apache.nutch.parse.js.JSParseFilter

All Implemented Interfaces:
    HtmlParseFilter, Parser

This class is a heuristic link extractor for JavaScript files and code snippets. The general idea of two-pass regex matching comes from Heritrix. Parts of the code come from OutlinkExtractor.java by Stephan Strittmatter.
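The two-pass idea can be illustrated with a minimal, self-contained sketch. This is not the Nutch source: the class name TwoPassLinkSketch, the method extract, and both regular expressions are hypothetical and chosen only for illustration. The first pass collects quoted string literals from the script; the second pass keeps only those literals that look like URLs and resolves them against a base URL.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of two-pass regex link extraction.
// Not the JSParseFilter implementation; patterns are assumptions.
public class TwoPassLinkSketch {

    // Pass 1: single- or double-quoted string literals (escapes are not handled).
    private static final Pattern STRING_LITERAL =
        Pattern.compile("\"([^\"\\r\\n]*)\"|'([^'\\r\\n]*)'");

    // Pass 2: an absolute http(s) URL, or a path ending in a common web file extension.
    private static final Pattern URL_LIKE = Pattern.compile(
        "^(?:https?://\\S+|[\\w./-]+\\.(?:html?|js|php|jsp|css))$",
        Pattern.CASE_INSENSITIVE);

    public static List<String> extract(String js, String baseUrl) {
        URI base = URI.create(baseUrl);
        List<String> links = new ArrayList<>();
        Matcher literals = STRING_LITERAL.matcher(js);
        while (literals.find()) {
            // Exactly one of the two groups is non-null for each match.
            String literal = literals.group(1) != null ? literals.group(1) : literals.group(2);
            if (!URL_LIKE.matcher(literal).matches()) {
                continue; // heuristic says this literal is not a link
            }
            try {
                links.add(base.resolve(literal).toString());
            } catch (IllegalArgumentException e) {
                // The literal passed the heuristic but is not a valid URI; skip it.
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String js = "var a = 'page2.html'; window.open(\"http://example.org/x\"); var s = 'hello world';";
        for (String link : extract(js, "http://example.com/dir/index.html")) {
            System.out.println(link);
        }
    }
}
```

For the sample script in main, the relative literal page2.html is resolved against the base URL, the absolute URL is kept as is, and the plain text literal is dropped. Because the approach is heuristic, it can miss links built by string concatenation and can report strings that merely resemble URLs.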
Field Summary
public static final Log LOG
Method Summary for org.apache.nutch.parse.js.JSParseFilter:
filter, getConf, getParse, main, setConf
Methods from java.lang.Object:
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Method Detail for org.apache.nutch.parse.js.JSParseFilter:
public ParseResult filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
public Configuration getConf()
public ParseResult getParse(Content c)
public static void main(String[] args) throws Exception
public void setConf(Configuration conf)