org.htmlparser.tags
Class Tag

java.lang.Object
  org.htmlparser.Node
    org.htmlparser.tags.Tag
- All Implemented Interfaces:
- java.io.Serializable
- Direct Known Subclasses:
- BaseHrefTag, BgSoundTag, CompositeTag, DoctypeTag, EndTag, FrameTag, ImageTag, InputTag, JspTag, LinkTagTag, MetaTag
- public class Tag
- extends org.htmlparser.Node
Tag represents a generic tag. This class allows users to register specific tag scanners, which can identify links or image references. The tag asks those scanners to run over its text and identify the specific tag. It can be used to dynamically configure a parser.
| Field Summary | |
protected java.util.Hashtable |
attributes
Tag parameters are parsed into this hashtable. Not implemented yet. (Added by Kaarle Kaila, 23.10.2001) |
protected static java.util.HashSet |
breakTags
Set of tags that breaks the flow. |
private static java.lang.String |
EMPTY_STRING
|
static java.lang.String |
EMPTYTAG
|
private boolean |
emptyXmlTag
|
private static org.htmlparser.parserHelper.AttributeParser |
paramParser
|
private int |
startLine
The line number on which this tag starts |
private static int |
TAG_BEFORE_PARSING_STATE
|
private static int |
TAG_BEGIN_PARSING_STATE
|
private static int |
TAG_FINISHED_PARSING_STATE
|
private static int |
TAG_IGNORE_BEGIN_TAG_STATE
|
private static int |
TAG_IGNORE_DATA_STATE
|
private static int |
TAG_ILLEGAL_STATE
|
protected java.lang.StringBuffer |
tagContents
Tag contents will have the contents of the comment tag. |
private java.lang.String |
tagLine
|
private java.lang.String[] |
tagLines
The combined text of all the lines spanned by this tag |
static java.lang.String |
TAGNAME
Constant used for the tag name entry in parseParameters (Kaarle Kaila, 3.8.2001) |
private static org.htmlparser.parserHelper.TagParser |
tagParser
|
protected org.htmlparser.scanners.TagScanner |
thisScanner
Scanner associated with this tag (useful for extracting or filtering data from an HTML node) |
static java.lang.String |
TYPE
|
| Fields inherited from class org.htmlparser.Node |
lineSeparator, nodeBegin, nodeEnd, parent |
| Constructor Summary | |
Tag(org.htmlparser.tags.data.TagData tagData)
Sets the tag's beginning position, ending position and tag contents (supplied in a TagData object). |
|
| Method Summary | |
void |
accept(org.htmlparser.visitors.NodeVisitor visitor)
|
void |
append(char ch)
|
void |
append(java.lang.String ch)
|
boolean |
breaksFlow()
Determines if the given tag breaks the flow of text. |
void |
collectInto(org.htmlparser.util.NodeList collectionList,
java.lang.String filter)
This method verifies that the current tag matches the provided filter. |
private boolean |
containsMoreThanOneKey()
|
static java.lang.String |
extractWord(java.lang.String s)
Extract the first word from the given string. |
static Tag |
find(org.htmlparser.NodeReader reader,
java.lang.String input,
int position)
Locate the tag within the input string by parsing from the given position. |
java.lang.String |
getAttribute(java.lang.String name)
If the tag was parsed in the scan method, returns the value of the named parameter. Not implemented yet. |
java.util.Hashtable |
getAttributes()
Gets the attributes in the tag. |
java.lang.String |
getParameter(java.lang.String name)
Deprecated. Use getAttribute instead. |
java.util.Hashtable |
getParsed()
Deprecated. Use getAttributes() instead. |
int |
getTagBegin()
Gets the nodeBegin. |
int |
getTagEnd()
Gets the nodeEnd. |
int |
getTagEndLine()
Gets the line number on which this tag ends. |
java.lang.String |
getTagLine()
Returns the line where the tag was found |
java.lang.String[] |
getTagLines()
Returns the combined text of all the lines spanned by this tag |
java.lang.String |
getTagName()
|
int |
getTagStartLine()
Gets the line number on which this tag starts. |
java.lang.String |
getText()
Return the text contained in this tag |
org.htmlparser.scanners.TagScanner |
getThisScanner()
Return the scanner associated with this tag. |
java.lang.String |
getType()
|
boolean |
isEmptyXmlTag()
Is this an empty xml tag of the form <tag/> |
private java.util.Hashtable |
parseAttributes()
This method is not to be called by any scanner or tag. |
java.util.Hashtable |
redoParseAttributes()
Sometimes, a scanner may need to request a re-evaluation of the attributes in a tag. |
org.htmlparser.Node |
scan(java.util.Map scanners,
java.lang.String url,
org.htmlparser.NodeReader reader)
Scan the tag using the registered scanners and attempt identification. |
void |
setAttribute(java.lang.String key,
java.lang.String value)
Set attribute with given key, value pair. |
void |
setAttributes(java.util.Hashtable attributes)
Sets the parsed attributes. |
void |
setEmptyXmlTag(boolean emptyXmlTag)
|
void |
setTagBegin(int tagBegin)
Sets the nodeBegin. |
void |
setTagEnd(int tagEnd)
Sets the nodeEnd. |
void |
setTagLine(java.lang.String newTagLine)
|
static void |
setTagParser(org.htmlparser.parserHelper.TagParser tagParser)
Sets the tagParser. |
void |
setText(java.lang.String text)
|
void |
setThisScanner(org.htmlparser.scanners.TagScanner scanner)
|
java.lang.String |
toHtml()
A call to a tag's toHTML() method will render it in HTML. Most tags that do not have children and inherit from Tag do not need to override toHTML(). |
java.lang.String |
toPlainTextString()
Returns a string representation of the node. |
java.lang.String |
toString()
Print the contents of the tag |
| Methods inherited from class org.htmlparser.Node |
collectInto, elementBegin, elementEnd, getLineSeparator, getParent, setLineSeparator, setParent, toHTML |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Field Detail |
TYPE
public static final java.lang.String TYPE
- See Also:
- Constant Field Values
TAGNAME
public static final java.lang.String TAGNAME
- Constant used for the tag name entry in parseParameters
(Kaarle Kaila, 3.8.2001)
- See Also:
- Constant Field Values
EMPTYTAG
public static final java.lang.String EMPTYTAG
- See Also:
- Constant Field Values
TAG_BEFORE_PARSING_STATE
private static final int TAG_BEFORE_PARSING_STATE
- See Also:
- Constant Field Values
TAG_BEGIN_PARSING_STATE
private static final int TAG_BEGIN_PARSING_STATE
- See Also:
- Constant Field Values
TAG_FINISHED_PARSING_STATE
private static final int TAG_FINISHED_PARSING_STATE
- See Also:
- Constant Field Values
TAG_ILLEGAL_STATE
private static final int TAG_ILLEGAL_STATE
- See Also:
- Constant Field Values
TAG_IGNORE_DATA_STATE
private static final int TAG_IGNORE_DATA_STATE
- See Also:
- Constant Field Values
TAG_IGNORE_BEGIN_TAG_STATE
private static final int TAG_IGNORE_BEGIN_TAG_STATE
- See Also:
- Constant Field Values
EMPTY_STRING
private static final java.lang.String EMPTY_STRING
- See Also:
- Constant Field Values
paramParser
private static org.htmlparser.parserHelper.AttributeParser paramParser
tagParser
private static org.htmlparser.parserHelper.TagParser tagParser
tagContents
protected java.lang.StringBuffer tagContents
- Tag contents will have the contents of the comment tag.
emptyXmlTag
private boolean emptyXmlTag
attributes
protected java.util.Hashtable attributes
- Tag parameters are parsed into this hashtable. Not implemented yet.
(Added by Kaarle Kaila, 23.10.2001)
thisScanner
protected org.htmlparser.scanners.TagScanner thisScanner
- Scanner associated with this tag (useful for extracting or filtering data
from an HTML node)
tagLine
private java.lang.String tagLine
tagLines
private java.lang.String[] tagLines
- The combined text of all the lines spanned by this tag
startLine
private int startLine
- The line number on which this tag starts
breakTags
protected static java.util.HashSet breakTags
- Set of tags that breaks the flow.
| Constructor Detail |
Tag
public Tag(org.htmlparser.tags.data.TagData tagData)
- Sets the tag's beginning position, ending position and tag contents
(supplied in a TagData object).
| Method Detail |
append
public void append(char ch)
append
public void append(java.lang.String ch)
find
public static Tag find(org.htmlparser.NodeReader reader, java.lang.String input, int position)
- Locate the tag within the input string by parsing from the given
position.
parseAttributes
private java.util.Hashtable parseAttributes()
- This method is not to be called by any scanner or tag. It is an expensive
method, hence it has been made private. However, there may be
circumstances in which a scanner wishes to force parsing of attributes over
and above what has already been parsed. For those cases, use the public
method redoParseAttributes().
getAttribute
public java.lang.String getAttribute(java.lang.String name)
- If the tag was parsed in the scan method, returns the value of the named
parameter. Not implemented yet.
setAttribute
public void setAttribute(java.lang.String key, java.lang.String value)
- Set attribute with given key, value pair.
getParameter
public java.lang.String getParameter(java.lang.String name)
- Deprecated. Use getAttribute instead.
- If the tag was parsed in the scan method, returns the value of the named parameter. Not implemented yet.
getAttributes
public java.util.Hashtable getAttributes()
- Gets the attributes in the tag.
getTagName
public java.lang.String getTagName()
getTagLine
public java.lang.String getTagLine()
- Returns the line where the tag was found
getTagLines
public java.lang.String[] getTagLines()
- Returns the combined text of all the lines spanned by this tag
getText
public java.lang.String getText()
- Return the text contained in this tag
getThisScanner
public org.htmlparser.scanners.TagScanner getThisScanner()
- Return the scanner associated with this tag.
extractWord
public static java.lang.String extractWord(java.lang.String s)
- Extract the first word from the given string. Words are delimited by
whitespace or equals signs.
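The delimiter rule stated above can be illustrated with a small standalone sketch. It does not reproduce the library's implementation, which may differ (for example in case handling or quote handling); the class name ExtractWordSketch is hypothetical and only the "whitespace or equals sign" rule is modelled.

```java
/** Hypothetical, standalone model of the delimiter rule described for extractWord. */
final class ExtractWordSketch {
    private ExtractWordSketch() {}

    /**
     * Returns the first run of characters in s that contains neither
     * whitespace nor '='. Leading whitespace is skipped.
     * Returns "" when s is null or no such run exists.
     */
    static String extractWord(String s) {
        if (s == null) {
            return "";
        }
        int start = 0;
        // Skip leading whitespace so that "  br" yields "br".
        while (start < s.length() && Character.isWhitespace(s.charAt(start))) {
            start++;
        }
        int end = start;
        // Advance until a delimiter (whitespace or '=') or the end of input.
        while (end < s.length()
                && !Character.isWhitespace(s.charAt(end))
                && s.charAt(end) != '=') {
            end++;
        }
        return s.substring(start, end);
    }
}
```

Under this sketch, a tag body such as `width=100` yields `width`, and `img src="a.gif"` yields `img`.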
scan
public org.htmlparser.Node scan(java.util.Map scanners, java.lang.String url, org.htmlparser.NodeReader reader) throws org.htmlparser.util.ParserException
- Scan the tag using the registered scanners and attempt
identification.
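As a rough illustration of scanner dispatch, the sketch below looks up a scanner in a map and falls back to a generic result when none is registered. This is an assumption about the general shape only: the page does not say how scan() keys the map or what the scanners return, so the String-based types and the name ScanDispatchSketch are hypothetical stand-ins for TagScanner and Node.

```java
import java.util.Map;
import java.util.function.Function;

/** Hypothetical model of "try the registered scanners, else stay generic". */
final class ScanDispatchSketch {
    private ScanDispatchSketch() {}

    /**
     * Returns the registered scanner's result for tagName, or
     * "generic:" + tagName when no scanner is registered under that name.
     * Keying the map by tag name is an assumption of this sketch.
     */
    static String scan(Map<String, Function<String, String>> scanners,
                       String tagName, String tagText) {
        Function<String, String> scanner = scanners.get(tagName);
        if (scanner == null) {
            // No specific scanner: the tag remains a generic tag.
            return "generic:" + tagName;
        }
        return scanner.apply(tagText);
    }
}
```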
setAttributes
public void setAttributes(java.util.Hashtable attributes)
- Sets the parsed attributes.
setTagBegin
public void setTagBegin(int tagBegin)
- Sets the nodeBegin.
getTagBegin
public int getTagBegin()
- Gets the nodeBegin.
setTagEnd
public void setTagEnd(int tagEnd)
- Sets the nodeEnd.
getTagEnd
public int getTagEnd()
- Gets the nodeEnd.
getTagStartLine
public int getTagStartLine()
- Gets the line number on which this tag starts.
getTagEndLine
public int getTagEndLine()
- Gets the line number on which this tag ends.
setTagLine
public void setTagLine(java.lang.String newTagLine)
setText
public void setText(java.lang.String text)
setThisScanner
public void setThisScanner(org.htmlparser.scanners.TagScanner scanner)
toPlainTextString
public java.lang.String toPlainTextString()
- Description copied from class: org.htmlparser.Node
- Returns a string representation of the node. This is an important method:
it allows a simple string transformation of a web page, regardless of the
node type.
Typical application code (for extracting only the text from a web page) would then be simplified to:
    Node node;
    for (Enumeration e = parser.elements(); e.hasMoreElements();) {
        node = (Node) e.nextElement();
        System.out.println(node.toPlainTextString());
        // Or do whatever processing you wish with the plain text string
    }
toHtml
public java.lang.String toHtml()
- A call to a tag's toHTML() method will render it in HTML. Most tags that
do not have children and inherit from Tag do not need to override
toHTML().
containsMoreThanOneKey
private boolean containsMoreThanOneKey()
toString
public java.lang.String toString()
- Print the contents of the tag
setTagParser
public static void setTagParser(org.htmlparser.parserHelper.TagParser tagParser)
- Sets the tagParser.
breaksFlow
public boolean breaksFlow()
- Determines if the given tag breaks the flow of text.
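A set-membership check of this kind might look like the sketch below. The entries in the set are illustrative assumptions, since this page does not list the contents of breakTags, and the class name BreakTagsSketch and the case-insensitive lookup are likewise hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical model of a static "break tags" set with a membership test. */
final class BreakTagsSketch {
    /** Illustrative entries only; the real set's contents are not shown on this page. */
    private static final Set<String> BREAK_TAGS =
            new HashSet<>(Arrays.asList("BR", "P", "DIV"));

    private BreakTagsSketch() {}

    /** Case-insensitive lookup of tagName in the break set; null never matches. */
    static boolean breaksFlow(String tagName) {
        return tagName != null
                && BREAK_TAGS.contains(tagName.toUpperCase(Locale.ROOT));
    }
}
```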
collectInto
public void collectInto(org.htmlparser.util.NodeList collectionList, java.lang.String filter)
- This method verifies that the current tag matches the provided filter.
The match is based on the string object and not its contents, so ensure
that you are using static final filter strings provided in the tag
classes.
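The warning about matching on the string object rather than its contents can be shown with a standalone sketch. It assumes the comparison is a Java reference comparison (==), which is what the description implies; the constant IMAGE_FILTER, its value and the class name are hypothetical and not taken from the library.

```java
import java.util.List;

/** Hypothetical model of a filter match based on object identity. */
final class FilterIdentitySketch {
    /** Stand-in for a tag class's static final filter constant. */
    static final String IMAGE_FILTER = "IMAGE";

    private FilterIdentitySketch() {}

    /** Adds nodeName to out only when filter is the very same object as nodeFilter. */
    static void collectInto(List<String> out, String nodeName,
                            String nodeFilter, String filter) {
        // Reference comparison on purpose: equal contents are not enough.
        if (nodeFilter == filter) {
            out.add(nodeName);
        }
    }
}
```

Passing the constant itself matches, whereas a separately constructed string with the same characters, such as `new String("IMAGE")`, does not. That is why the description recommends using the static final filter strings provided by the tag classes.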
getParsed
public java.util.Hashtable getParsed()
- Deprecated. Use getAttributes() instead.
- Returns the table of attributes in the tag.
redoParseAttributes
public java.util.Hashtable redoParseAttributes()
- Sometimes a scanner may need to request a re-evaluation of the
attributes in a tag, for example after some correction activity. An
example of its usage can be found in ImageTag.
Note: This is an expensive operation, so call it only when really necessary.
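The relationship between a private, expensive parse and a public "redo" method can be sketched as a cache that is filled on first use and rebuilt on request. This is a sketch under assumptions: the page does not confirm that attributes are parsed lazily, the parser here is deliberately naive, and LazyAttributesSketch and getParseCount are hypothetical names.

```java
import java.util.Hashtable;

/** Hypothetical model of cached attribute parsing with an explicit re-parse. */
final class LazyAttributesSketch {
    private final String tagText;
    private Hashtable<String, String> attributes; // null until the first parse
    private int parseCount;                        // kept for illustration only

    LazyAttributesSketch(String tagText) {
        this.tagText = tagText;
    }

    /** Parses on first use, then returns the cached table. */
    Hashtable<String, String> getAttributes() {
        if (attributes == null) {
            attributes = parseAttributes();
        }
        return attributes;
    }

    /** Discards the cached table and parses the tag text again. */
    Hashtable<String, String> redoParseAttributes() {
        attributes = parseAttributes();
        return attributes;
    }

    int getParseCount() {
        return parseCount;
    }

    /** Naive parser: splits on whitespace, then on the first '=' of each part. */
    private Hashtable<String, String> parseAttributes() {
        parseCount++;
        Hashtable<String, String> table = new Hashtable<>();
        String[] parts = tagText.trim().split("\\s+");
        for (int i = 1; i < parts.length; i++) { // parts[0] is the tag name
            int eq = parts[i].indexOf('=');
            if (eq > 0) {
                table.put(parts[i].substring(0, eq), parts[i].substring(eq + 1));
            }
        }
        return table;
    }
}
```

Repeated calls to getAttributes() reuse the cached table, so the cost of parsing is paid again only when redoParseAttributes() is called explicitly.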
accept
public void accept(org.htmlparser.visitors.NodeVisitor visitor)
getType
public java.lang.String getType()
isEmptyXmlTag
public boolean isEmptyXmlTag()
- Is this an empty XML tag of the form <tag/>?
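For illustration, the `<tag/>` form could be recognised from raw tag text as in the sketch below. This is not how the library sets the emptyXmlTag flag, which this page does not describe; the class name and the exact acceptance rules (for example, allowing a space before the slash) are assumptions.

```java
/** Hypothetical check for the empty XML tag form, e.g. <br/> or <br />. */
final class EmptyXmlTagSketch {
    private EmptyXmlTagSketch() {}

    /** True when raw starts with '<', ends with "/>" and has a name in between. */
    static boolean isEmptyXmlTag(String raw) {
        if (raw == null) {
            return false;
        }
        String t = raw.trim();
        if (t.length() < 4 || t.charAt(0) != '<' || !t.endsWith("/>")) {
            return false;
        }
        // Text between '<' and "/>" must be non-empty and must not be an end tag.
        String inner = t.substring(1, t.length() - 2).trim();
        return !inner.isEmpty() && !inner.startsWith("/");
    }
}
```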
setEmptyXmlTag
public void setEmptyXmlTag(boolean emptyXmlTag)