org.htmlparser.scanners
Class TagScanner

java.lang.Object
  extended by org.htmlparser.scanners.TagScanner
All Implemented Interfaces:
java.io.Serializable
Direct Known Subclasses:
BaseHrefScanner, BgSoundScanner, CompositeTagScanner, DoctypeScanner, FrameScanner, ImageScanner, InputTagScanner, JspScanner, LinkTagScanner, MetaTagScanner

public abstract class TagScanner
extends java.lang.Object
implements java.io.Serializable

TagScanner is an abstract superclass that is subclassed to create specific scanners, which operate on a tag's strings, identify the tag, and can extract data from it.
If you wish to write your own scanner, you must implement scan(). You MAY implement evaluate() as well, if your evaluation logic is not based on a simple text match. You MUST implement getID(), which identifies your scanner uniquely in the hashtable of scanners.
A feedback object is also provided, should you want to send log messages. This object is instantiated by Parser when a scanner is added to its collection.
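
The division of labour between getID(), evaluate() and scan() described above can be sketched as follows. This is a minimal, self-contained illustration only: SketchScanner, TitleAttributeScanner and ScannerRegistry are hypothetical stand-ins that work on plain strings, not the org.htmlparser classes, and the real method signatures (listed under Method Summary) take Tag and NodeReader arguments.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the scanner contract; NOT org.htmlparser.scanners.TagScanner.
abstract class SketchScanner {
    // Tag names handled by this scanner; they serve as keys in the registry.
    abstract String[] getID();

    // Optional hook: by default any tag whose first word matched an ID is accepted.
    boolean evaluate(String tagContents) {
        return true;
    }

    // Required: extract data from the tag's text.
    abstract String scan(String tagContents);
}

// Example scanner for ABBR tags that only accepts tags carrying a title attribute.
final class TitleAttributeScanner extends SketchScanner {
    private static final String MARKER = "title=\"";

    @Override
    String[] getID() {
        return new String[] {"ABBR"};
    }

    @Override
    boolean evaluate(String tagContents) {
        // Extra check beyond the first-word match.
        return tagContents.contains(MARKER);
    }

    @Override
    String scan(String tagContents) {
        int start = tagContents.indexOf(MARKER) + MARKER.length();
        int end = tagContents.indexOf('"', start);
        return end < 0 ? "" : tagContents.substring(start, end);
    }
}

// Hypothetical registry: a map from ID to scanner, standing in for the hashtable of scanners.
final class ScannerRegistry {
    private final Map<String, SketchScanner> scanners = new HashMap<>();

    void add(SketchScanner scanner) {
        for (String id : scanner.getID()) {
            scanners.put(id, scanner);
        }
    }

    // Finds a scanner by the tag's first word, then calls evaluate() before scan().
    // Returns null when no scanner is registered or the scanner declines the tag.
    String process(String tagContents) {
        String firstWord = tagContents.trim().split("\\s+")[0].toUpperCase(Locale.ROOT);
        SketchScanner scanner = scanners.get(firstWord);
        if (scanner == null || !scanner.evaluate(tagContents)) {
            return null;
        }
        return scanner.scan(tagContents);
    }
}
```

In this sketch a caller registers the scanner once with add() and then passes each tag's text to process(); a tag that fails evaluate() is simply left unhandled.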


Field Summary
protected  org.htmlparser.util.ParserFeedback feedback
          HTMLParserFeedback object automatically initialized
protected  java.lang.String filter
          A filter which is used to associate this tag.
 
Constructor Summary
TagScanner()
          Default constructor. Automatically registers the scanner into a static array of scanners inside Tag.
TagScanner(java.lang.String filter)
          This constructor automatically registers the scanner, and sets the filter for this tag.
 
Method Summary
 java.lang.String absorb(java.lang.String s, char c)
          Insert the method's description here.
static java.lang.String absorbLeadingBlanks(java.lang.String s)
          Remove whitespace from the front of the given string.
static java.util.Map adjustScanners(org.htmlparser.NodeReader reader)
           
 org.htmlparser.tags.Tag createScannedNode(org.htmlparser.tags.Tag tag, java.lang.String url, org.htmlparser.NodeReader reader, java.lang.String currLine)
           
protected  org.htmlparser.tags.Tag createTag(org.htmlparser.tags.data.TagData tagData, org.htmlparser.tags.Tag tag, java.lang.String url)
          Override this method to create your own tag type
 boolean evaluate(java.lang.String s, TagScanner previousOpenScanner)
          This method is used to decide if this scanner can handle this tag type.
static java.lang.String extractXMLData(org.htmlparser.Node node, java.lang.String tagName, org.htmlparser.NodeReader reader)
           
 java.lang.String getFilter()
           
abstract  java.lang.String[] getID()
           
protected  org.htmlparser.tags.Tag getInsertedEndTag(org.htmlparser.tags.Tag tag, org.htmlparser.NodeReader reader, java.lang.String currentLine)
           
protected  org.htmlparser.tags.Tag getReplacedEndTag(org.htmlparser.tags.Tag tag, org.htmlparser.NodeReader reader, java.lang.String currentLine)
           
 java.lang.String insertEndTagBeforeNode(org.htmlparser.Node node, java.lang.String currentLine)
          Insert an EndTag in the currentLine, just before the occurrence of the provided tag
static boolean isXMLTagFound(org.htmlparser.Node node, java.lang.String tagName)
           
 java.lang.String removeChars(java.lang.String s, java.lang.String occur)
           
 java.lang.String replaceFaultyTagWithEndTag(org.htmlparser.tags.Tag tag, java.lang.String currentLine)
           
static void restoreScanners(org.htmlparser.NodeReader pReader, java.util.Hashtable tempScanners)
           
 org.htmlparser.tags.Tag scan(org.htmlparser.tags.Tag tag, java.lang.String url, org.htmlparser.NodeReader reader, java.lang.String currLine)
          Scan the tag and extract the information related to this type.
 void setFeedback(org.htmlparser.util.ParserFeedback feedback)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

filter

protected java.lang.String filter
A filter which is used to associate this tag. The filter contains a string that is used to match which tags are allowed to pass through. This can be useful when one wishes to dynamically filter out all tags except one type, which may be programmed later than the parser. It is also useful for command-line implementations of the parser.


feedback

protected org.htmlparser.util.ParserFeedback feedback
HTMLParserFeedback object automatically initialized

Constructor Detail

TagScanner

public TagScanner()
Default constructor. Automatically registers the scanner into a static array of scanners inside Tag.


TagScanner

public TagScanner(java.lang.String filter)
This constructor automatically registers the scanner, and sets the filter for this tag.

Method Detail

absorb

public java.lang.String absorb(java.lang.String s,
                               char c)
Insert the method's description here. Creation date: (6/4/2001 11:44:09 AM)


absorbLeadingBlanks

public static java.lang.String absorbLeadingBlanks(java.lang.String s)
Remove whitespace from the front of the given string.
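
A minimal sketch of what such a helper might look like is given here. The class name LeadingBlanks is hypothetical, and the sketch assumes that "whitespace" means whatever Character.isWhitespace accepts; the library's own definition may differ.

```java
// Hypothetical stand-in, not the library's implementation.
final class LeadingBlanks {
    // Returns s without its leading whitespace; trailing whitespace is kept.
    // Assumption: whitespace is defined by Character.isWhitespace.
    static String absorbLeadingBlanks(String s) {
        int i = 0;
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
            i++;
        }
        return s.substring(i);
    }
}
```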


evaluate

public boolean evaluate(java.lang.String s,
                        TagScanner previousOpenScanner)
This method is used to decide if this scanner can handle this tag type. If the evaluation returns true, the calling side makes a call to scan(). This method has to be implemented meaningfully only if a first-word match with the scanner id does not imply a match (or extra processing needs to be done). The default implementation returns true.


extractXMLData

public static java.lang.String extractXMLData(org.htmlparser.Node node,
                                              java.lang.String tagName,
                                              org.htmlparser.NodeReader reader)
                                       throws org.htmlparser.util.ParserException

getFilter

public java.lang.String getFilter()

isXMLTagFound

public static boolean isXMLTagFound(org.htmlparser.Node node,
                                    java.lang.String tagName)

createScannedNode

public final org.htmlparser.tags.Tag createScannedNode(org.htmlparser.tags.Tag tag,
                                                       java.lang.String url,
                                                       org.htmlparser.NodeReader reader,
                                                       java.lang.String currLine)
                                                throws org.htmlparser.util.ParserException

scan

public org.htmlparser.tags.Tag scan(org.htmlparser.tags.Tag tag,
                                    java.lang.String url,
                                    org.htmlparser.NodeReader reader,
                                    java.lang.String currLine)
                             throws org.htmlparser.util.ParserException
Scan the tag and extract the information related to this type. The URL of the page that initiated the scan has to be provided in case relative links are found; the initial URL is then prepended to each relative link to give an absolute link. The NodeReader is provided so that the scanner can look ahead. Identification is assumed to have already been performed using the evaluate() method.
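
To illustrate why the page URL is needed, the sketch below turns a relative link into an absolute one. It uses the standard java.net.URI.resolve method rather than plain string prepending, so it is a swapped-in technique and not necessarily how the library itself builds links; the class name LinkResolver and the example URLs are hypothetical.

```java
import java.net.URI;

// Hypothetical helper, not part of org.htmlparser.
final class LinkResolver {
    // Resolves link against pageUrl; links that are already absolute are returned unchanged.
    static String toAbsolute(String pageUrl, String link) {
        return URI.create(pageUrl).resolve(link).toString();
    }
}
```

With a page URL of http://example.com/docs/index.html, the relative link images/logo.png resolves to http://example.com/docs/images/logo.png.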


removeChars

public java.lang.String removeChars(java.lang.String s,
                                    java.lang.String occur)

getID

public abstract java.lang.String[] getID()

setFeedback

public final void setFeedback(org.htmlparser.util.ParserFeedback feedback)

adjustScanners

public static java.util.Map adjustScanners(org.htmlparser.NodeReader reader)

restoreScanners

public static void restoreScanners(org.htmlparser.NodeReader pReader,
                                   java.util.Hashtable tempScanners)

insertEndTagBeforeNode

public java.lang.String insertEndTagBeforeNode(org.htmlparser.Node node,
                                               java.lang.String currentLine)
Insert an EndTag in the currentLine, just before the occurrence of the provided tag
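
The effect described here can be pictured with a plain string operation. The sketch below is an assumption-laden stand-in: EndTagInserter is a hypothetical class, and it takes the offset of the offending tag and a tag name directly, whereas the real method receives a Node and derives that information itself.

```java
// Hypothetical stand-in, not the library's implementation.
final class EndTagInserter {
    // Inserts "</tagName>" into currentLine at offset tagBegin, i.e. just before
    // the tag that starts there.
    static String insertEndTagBefore(String currentLine, int tagBegin, String tagName) {
        if (tagBegin < 0 || tagBegin > currentLine.length()) {
            throw new IllegalArgumentException("tagBegin out of range: " + tagBegin);
        }
        return currentLine.substring(0, tagBegin)
                + "</" + tagName + ">"
                + currentLine.substring(tagBegin);
    }
}
```

For instance, when a second link tag opens before the first one has been closed, inserting the end tag at the second tag's offset closes the first link and leaves the rest of the line untouched.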


createTag

protected org.htmlparser.tags.Tag createTag(org.htmlparser.tags.data.TagData tagData,
                                            org.htmlparser.tags.Tag tag,
                                            java.lang.String url)
                                     throws org.htmlparser.util.ParserException
Override this method to create your own tag type


getReplacedEndTag

protected org.htmlparser.tags.Tag getReplacedEndTag(org.htmlparser.tags.Tag tag,
                                                    org.htmlparser.NodeReader reader,
                                                    java.lang.String currentLine)

replaceFaultyTagWithEndTag

public java.lang.String replaceFaultyTagWithEndTag(org.htmlparser.tags.Tag tag,
                                                   java.lang.String currentLine)

getInsertedEndTag

protected org.htmlparser.tags.Tag getInsertedEndTag(org.htmlparser.tags.Tag tag,
                                                    org.htmlparser.NodeReader reader,
                                                    java.lang.String currentLine)