java.lang.Object
org.htmlparser.scanners.TagScanner
- All Implemented Interfaces:
- java.io.Serializable
- Direct Known Subclasses:
- BaseHrefScanner, BgSoundScanner, CompositeTagScanner, DoctypeScanner, FrameScanner, ImageScanner, InputTagScanner, JspScanner, LinkTagScanner, MetaTagScanner
- public abstract class TagScanner
- extends java.lang.Object
- implements java.io.Serializable
TagScanner is an abstract superclass that is subclassed to create specific
scanners, which operate on a tag's strings, identify the tag, and can extract
data from it.
If you wish to write your own scanner, you must implement scan(). You MAY
implement evaluate() as well, if your evaluation logic is not based on a
simple text match. You MUST implement getID(), which identifies your scanner
uniquely in the hashtable of scanners.
A feedback object is also provided, should you want to send log messages.
Parser instantiates this object when a scanner is added to its collection.
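The contract described above (getID() for lookup, an optional evaluate() check, then scan()) can be sketched in isolation. The following is only an illustration of the pattern, not the library's code: SketchScanner, SketchImageScanner and SketchScannerDemo are hypothetical stand-ins that borrow the method names but use simplified String-based signatures, and the lookup table is assumed to be keyed by the first word of the tag text.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the contract described above. It is NOT the real
// org.htmlparser.scanners.TagScanner; only the method names are borrowed.
abstract class SketchScanner {
    // Identifiers under which this scanner is stored in the lookup table.
    abstract String[] getID();

    // Default: a first-word match against an ID is treated as sufficient.
    boolean evaluate(String tagText) {
        return true;
    }

    // Extract the data this scanner cares about from the tag text.
    abstract String scan(String tagText);
}

// Example subclass: claims IMG tags and extracts the quoted src value.
class SketchImageScanner extends SketchScanner {
    private static final String SRC = "src=\"";

    @Override
    String[] getID() {
        return new String[] {"IMG"};
    }

    @Override
    boolean evaluate(String tagText) {
        // Extra check beyond the first-word match: a quoted src must exist.
        return indexOfIgnoreCase(tagText, SRC) >= 0;
    }

    @Override
    String scan(String tagText) {
        int start = indexOfIgnoreCase(tagText, SRC);
        if (start < 0) {
            return null;
        }
        start += SRC.length();
        int end = tagText.indexOf('"', start);
        return end < 0 ? null : tagText.substring(start, end);
    }

    // Case-insensitive indexOf; returns -1 when the needle is absent.
    private static int indexOfIgnoreCase(String text, String needle) {
        for (int i = 0; i + needle.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }
}

class SketchScannerDemo {
    // Store each scanner under every ID it reports.
    static Map<String, SketchScanner> tableOf(SketchScanner... scanners) {
        Map<String, SketchScanner> table = new HashMap<>();
        for (SketchScanner s : scanners) {
            for (String id : s.getID()) {
                table.put(id, s);
            }
        }
        return table;
    }

    // Look up by first word, ask evaluate(), and only then call scan().
    static String dispatch(Map<String, SketchScanner> table, String tagText) {
        String firstWord = tagText.trim().split("\\s+")[0].toUpperCase(Locale.ROOT);
        SketchScanner scanner = table.get(firstWord);
        if (scanner == null || !scanner.evaluate(tagText)) {
            return null;
        }
        return scanner.scan(tagText);
    }

    public static void main(String[] args) {
        Map<String, SketchScanner> table = tableOf(new SketchImageScanner());
        System.out.println(dispatch(table, "IMG SRC=\"logo.gif\" alt=\"x\"")); // prints logo.gif
        System.out.println(dispatch(table, "IMG alt=\"x\""));                  // prints null
    }
}
```

A real subclass would extend TagScanner itself and work with the Tag and NodeReader types listed in the summaries; the String-based version here only shows the order in which the three methods are consulted.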
Constructor Summary
TagScanner()
- Default Constructor, automatically registers the scanner into a static
array of scanners inside Tag
TagScanner(java.lang.String filter)
- This constructor automatically registers the scanner, and sets the filter
for this tag.
Method Summary
java.lang.String absorb(java.lang.String s, char c)
- Insert the method's description here.
static java.lang.String absorbLeadingBlanks(java.lang.String s)
- Remove whitespace from the front of the given string.
static java.util.Map adjustScanners(org.htmlparser.NodeReader reader)
org.htmlparser.tags.Tag createScannedNode(org.htmlparser.tags.Tag tag, java.lang.String url, org.htmlparser.NodeReader reader, java.lang.String currLine)
protected org.htmlparser.tags.Tag createTag(org.htmlparser.tags.data.TagData tagData, org.htmlparser.tags.Tag tag, java.lang.String url)
- Override this method to create your own tag type
boolean evaluate(java.lang.String s, TagScanner previousOpenScanner)
- This method is used to decide if this scanner can handle this tag type.
static java.lang.String extractXMLData(org.htmlparser.Node node, java.lang.String tagName, org.htmlparser.NodeReader reader)
java.lang.String getFilter()
abstract java.lang.String[] getID()
protected org.htmlparser.tags.Tag getInsertedEndTag(org.htmlparser.tags.Tag tag, org.htmlparser.NodeReader reader, java.lang.String currentLine)
protected org.htmlparser.tags.Tag getReplacedEndTag(org.htmlparser.tags.Tag tag, org.htmlparser.NodeReader reader, java.lang.String currentLine)
java.lang.String insertEndTagBeforeNode(org.htmlparser.Node node, java.lang.String currentLine)
- Insert an EndTag in the currentLine, just before the occurrence of the
provided tag
static boolean isXMLTagFound(org.htmlparser.Node node, java.lang.String tagName)
java.lang.String removeChars(java.lang.String s, java.lang.String occur)
java.lang.String replaceFaultyTagWithEndTag(org.htmlparser.tags.Tag tag, java.lang.String currentLine)
static void restoreScanners(org.htmlparser.NodeReader pReader, java.util.Hashtable tempScanners)
org.htmlparser.tags.Tag scan(org.htmlparser.tags.Tag tag, java.lang.String url, org.htmlparser.NodeReader reader, java.lang.String currLine)
- Scan the tag and extract the information related to this type.
void setFeedback(org.htmlparser.util.ParserFeedback feedback)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
filter
protected java.lang.String filter
- The filter associated with this scanner. It holds a string that is matched
to decide which tags are allowed to pass through. This is useful for
dynamically filtering out all tags except one type, which may be programmed
later than the parser, and for command-line implementations of the parser.
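One way to picture the pass-through behaviour is a selection step that keeps only the tags whose scanner carries the requested filter string. This is a minimal sketch under stated assumptions: the match is assumed to be plain string equality, a null filter is assumed to mean "let everything through", and FilterSketch, ScannedTag and the filter values "-l" and "-i" are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FilterSketch {
    // Hypothetical pairing of a tag's text with the filter string of the
    // scanner that produced it.
    static final class ScannedTag {
        final String text;
        final String scannerFilter;

        ScannedTag(String text, String scannerFilter) {
            this.text = text;
            this.scannerFilter = scannerFilter;
        }
    }

    // Keep the tags whose scanner filter equals the requested filter.
    // Assumption: a null filter lets every tag pass.
    static List<String> select(List<ScannedTag> tags, String filter) {
        List<String> passed = new ArrayList<>();
        for (ScannedTag tag : tags) {
            if (filter == null || filter.equals(tag.scannerFilter)) {
                passed.add(tag.text);
            }
        }
        return passed;
    }

    public static void main(String[] args) {
        List<ScannedTag> tags = Arrays.asList(
            new ScannedTag("<a href=\"x.html\">", "-l"),
            new ScannedTag("<img src=\"y.gif\">", "-i"));
        System.out.println(select(tags, "-l"));  // only the link tag
        System.out.println(select(tags, null));  // both tags
    }
}
```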
feedback
protected org.htmlparser.util.ParserFeedback feedback
- ParserFeedback object, automatically initialized
TagScanner
public TagScanner()
- Default Constructor, automatically registers the scanner into a static
array of scanners inside Tag
TagScanner
public TagScanner(java.lang.String filter)
- This constructor automatically registers the scanner, and sets the filter
for this tag.
absorb
public java.lang.String absorb(java.lang.String s,
char c)
- Insert the method's description here. Creation date: (6/4/2001 11:44:09
AM)
absorbLeadingBlanks
public static java.lang.String absorbLeadingBlanks(java.lang.String s)
- Remove whitespace from the front of the given string.
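The described behaviour can be approximated in a few lines. This is a sketch rather than the library's implementation: LeadingBlanksSketch and stripLeadingWhitespace are hypothetical names, and "whitespace" is assumed to mean any character for which Character.isWhitespace returns true.

```java
class LeadingBlanksSketch {
    // Returns s without its leading whitespace; trailing whitespace is kept.
    static String stripLeadingWhitespace(String s) {
        int i = 0;
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
            i++;
        }
        return s.substring(i);
    }

    public static void main(String[] args) {
        // prints [img src ]
        System.out.println("[" + stripLeadingWhitespace("  \t img src ") + "]");
    }
}
```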
evaluate
public boolean evaluate(java.lang.String s,
TagScanner previousOpenScanner)
- Decides whether this scanner can handle this tag type. If the evaluation
returns true, the caller then calls scan(). This method needs a meaningful
implementation only if a first-word match with the scanner ID does not imply
a match (or if extra processing is needed). The default implementation
returns true.
extractXMLData
public static java.lang.String extractXMLData(org.htmlparser.Node node,
java.lang.String tagName,
org.htmlparser.NodeReader reader)
throws org.htmlparser.util.ParserException
getFilter
public java.lang.String getFilter()
isXMLTagFound
public static boolean isXMLTagFound(org.htmlparser.Node node,
java.lang.String tagName)
createScannedNode
public final org.htmlparser.tags.Tag createScannedNode(org.htmlparser.tags.Tag tag,
java.lang.String url,
org.htmlparser.NodeReader reader,
java.lang.String currLine)
throws org.htmlparser.util.ParserException
scan
public org.htmlparser.tags.Tag scan(org.htmlparser.tags.Tag tag,
java.lang.String url,
org.htmlparser.NodeReader reader,
java.lang.String currLine)
throws org.htmlparser.util.ParserException
- Scan the tag and extract the information related to this type. The URL of
the page that initiated the scan must be provided in case relative links are
found; it is prepended to each relative link to give an absolute link. The
NodeReader is provided so that a lookahead operation can be performed.
Identification is assumed to have already been performed by the evaluate()
method.
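The role of the url parameter can be illustrated with the JDK alone. The sketch below substitutes java.net.URI.resolve for the literal prepending described above, so it also handles links that are already absolute or root-relative. LinkResolutionSketch, toAbsolute and the example.com addresses are hypothetical and not part of the library.

```java
import java.net.URI;

class LinkResolutionSketch {
    // Resolve a link found in a tag against the URL of the page being scanned.
    // URI.create throws IllegalArgumentException for malformed input.
    static String toAbsolute(String pageUrl, String link) {
        return URI.create(pageUrl).resolve(link).toString();
    }

    public static void main(String[] args) {
        String page = "http://example.com/docs/index.html";
        // Relative link: resolved against the page's directory.
        System.out.println(toAbsolute(page, "images/logo.gif"));
        // Root-relative link: resolved against the host.
        System.out.println(toAbsolute(page, "/top.html"));
        // Absolute link: returned unchanged.
        System.out.println(toAbsolute(page, "http://other.org/a.html"));
    }
}
```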
removeChars
public java.lang.String removeChars(java.lang.String s,
java.lang.String occur)
getID
public abstract java.lang.String[] getID()
setFeedback
public final void setFeedback(org.htmlparser.util.ParserFeedback feedback)
adjustScanners
public static java.util.Map adjustScanners(org.htmlparser.NodeReader reader)
restoreScanners
public static void restoreScanners(org.htmlparser.NodeReader pReader,
java.util.Hashtable tempScanners)
insertEndTagBeforeNode
public java.lang.String insertEndTagBeforeNode(org.htmlparser.Node node,
java.lang.String currentLine)
- Insert an EndTag in the currentLine, just before the occurrence of the
provided tag
createTag
protected org.htmlparser.tags.Tag createTag(org.htmlparser.tags.data.TagData tagData,
org.htmlparser.tags.Tag tag,
java.lang.String url)
throws org.htmlparser.util.ParserException
- Override this method to create your own tag type
getReplacedEndTag
protected org.htmlparser.tags.Tag getReplacedEndTag(org.htmlparser.tags.Tag tag,
org.htmlparser.NodeReader reader,
java.lang.String currentLine)
replaceFaultyTagWithEndTag
public java.lang.String replaceFaultyTagWithEndTag(org.htmlparser.tags.Tag tag,
java.lang.String currentLine)
getInsertedEndTag
protected org.htmlparser.tags.Tag getInsertedEndTag(org.htmlparser.tags.Tag tag,
org.htmlparser.NodeReader reader,
java.lang.String currentLine)