org.htmlparser.tags
Class Tag

java.lang.Object
  extended by org.htmlparser.Node
      extended by org.htmlparser.tags.Tag
All Implemented Interfaces:
java.io.Serializable
Direct Known Subclasses:
BaseHrefTag, BgSoundTag, CompositeTag, DoctypeTag, EndTag, FrameTag, ImageTag, InputTag, JspTag, LinkTagTag, MetaTag

public class Tag
extends org.htmlparser.Node

Tag represents a generic tag. This class allows users to register specific tag scanners, which can identify links or image references. The tag asks the registered scanners to run over its text and identify it as a more specific tag type. It can be used to dynamically configure a parser.


Field Summary
protected  java.util.Hashtable attributes
          Tag parameters are parsed into this hashtable. Not implemented yet. Added by Kaarle Kaila 23.10.2001.
protected static java.util.HashSet breakTags
          Set of tags that breaks the flow.
private static java.lang.String EMPTY_STRING
           
static java.lang.String EMPTYTAG
           
private  boolean emptyXmlTag
           
private static org.htmlparser.parserHelper.AttributeParser paramParser
           
private  int startLine
          The line number on which this tag starts
private static int TAG_BEFORE_PARSING_STATE
           
private static int TAG_BEGIN_PARSING_STATE
           
private static int TAG_FINISHED_PARSING_STATE
           
private static int TAG_IGNORE_BEGIN_TAG_STATE
           
private static int TAG_IGNORE_DATA_STATE
           
private static int TAG_ILLEGAL_STATE
           
protected  java.lang.StringBuffer tagContents
          Tag contents will have the contents of the comment tag.
private  java.lang.String tagLine
           
private  java.lang.String[] tagLines
          The combined text of all the lines spanned by this tag
static java.lang.String TAGNAME
          Constant used as the value of the tag name in parseParameters (Kaarle Kaila 3.8.2001)
private static org.htmlparser.parserHelper.TagParser tagParser
           
protected  org.htmlparser.scanners.TagScanner thisScanner
          Scanner associated with this tag (useful for extraction of filtering data from a HTML node)
static java.lang.String TYPE
           
 
Fields inherited from class org.htmlparser.Node
lineSeparator, nodeBegin, nodeEnd, parent
 
Constructor Summary
Tag(org.htmlparser.tags.data.TagData tagData)
          Set the Tag with the beginning position, ending position and tag contents (in a tagData object).
 
Method Summary
 void accept(org.htmlparser.visitors.NodeVisitor visitor)
           
 void append(char ch)
           
 void append(java.lang.String ch)
           
 boolean breaksFlow()
          Determines if the given tag breaks the flow of text.
 void collectInto(org.htmlparser.util.NodeList collectionList, java.lang.String filter)
          This method verifies that the current tag matches the provided filter.
private  boolean containsMoreThanOneKey()
           
static java.lang.String extractWord(java.lang.String s)
          Extract the first word from the given string.
static Tag find(org.htmlparser.NodeReader reader, java.lang.String input, int position)
          Locate the tag within the input string, by parsing from the given position.
 java.lang.String getAttribute(java.lang.String name)
          If the tag is parsed in the scan method, this returns the value of a parameter. Not implemented yet.
 java.util.Hashtable getAttributes()
          Gets the attributes in the tag.
 java.lang.String getParameter(java.lang.String name)
          Deprecated. Use getAttribute instead.
 java.util.Hashtable getParsed()
          Deprecated. Use getAttributes() instead.
 int getTagBegin()
          Gets the nodeBegin.
 int getTagEnd()
          Gets the nodeEnd.
 int getTagEndLine()
          Gets the line number on which this tag ends.
 java.lang.String getTagLine()
          Returns the line where the tag was found
 java.lang.String[] getTagLines()
          Returns the combined text of all the lines spanned by this tag
 java.lang.String getTagName()
           
 int getTagStartLine()
          Gets the line number on which this tag starts.
 java.lang.String getText()
          Return the text contained in this tag
 org.htmlparser.scanners.TagScanner getThisScanner()
          Return the scanner associated with this tag.
 java.lang.String getType()
           
 boolean isEmptyXmlTag()
          Is this an empty XML tag of the form <tag/>?
private  java.util.Hashtable parseAttributes()
          This method is not to be called by any scanner or tag.
 java.util.Hashtable redoParseAttributes()
          Sometimes, a scanner may need to request a re-evaluation of the attributes in a tag.
 org.htmlparser.Node scan(java.util.Map scanners, java.lang.String url, org.htmlparser.NodeReader reader)
          Scan the tag using the registered scanners and attempt identification.
 void setAttribute(java.lang.String key, java.lang.String value)
          Set attribute with given key, value pair.
 void setAttributes(java.util.Hashtable attributes)
          Sets the parsed attributes.
 void setEmptyXmlTag(boolean emptyXmlTag)
           
 void setTagBegin(int tagBegin)
          Sets the nodeBegin.
 void setTagEnd(int tagEnd)
          Sets the nodeEnd.
 void setTagLine(java.lang.String newTagLine)
           
static void setTagParser(org.htmlparser.parserHelper.TagParser tagParser)
          Sets the tagParser.
 void setText(java.lang.String text)
           
 void setThisScanner(org.htmlparser.scanners.TagScanner scanner)
           
 java.lang.String toHtml()
          A call to a tag's toHTML() method will render it in HTML. Most tags that do not have children and inherit from Tag do not need to override toHTML().
 java.lang.String toPlainTextString()
          Returns a string representation of the node.
 java.lang.String toString()
          Print the contents of the tag
 
Methods inherited from class org.htmlparser.Node
collectInto, elementBegin, elementEnd, getLineSeparator, getParent, setLineSeparator, setParent, toHTML
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

TYPE

public static final java.lang.String TYPE
See Also:
Constant Field Values

TAGNAME

public static final java.lang.String TAGNAME
Constant used as the value of the tag name in parseParameters (Kaarle Kaila 3.8.2001)

See Also:
Constant Field Values

EMPTYTAG

public static final java.lang.String EMPTYTAG
See Also:
Constant Field Values

TAG_BEFORE_PARSING_STATE

private static final int TAG_BEFORE_PARSING_STATE
See Also:
Constant Field Values

TAG_BEGIN_PARSING_STATE

private static final int TAG_BEGIN_PARSING_STATE
See Also:
Constant Field Values

TAG_FINISHED_PARSING_STATE

private static final int TAG_FINISHED_PARSING_STATE
See Also:
Constant Field Values

TAG_ILLEGAL_STATE

private static final int TAG_ILLEGAL_STATE
See Also:
Constant Field Values

TAG_IGNORE_DATA_STATE

private static final int TAG_IGNORE_DATA_STATE
See Also:
Constant Field Values

TAG_IGNORE_BEGIN_TAG_STATE

private static final int TAG_IGNORE_BEGIN_TAG_STATE
See Also:
Constant Field Values

EMPTY_STRING

private static final java.lang.String EMPTY_STRING
See Also:
Constant Field Values

paramParser

private static org.htmlparser.parserHelper.AttributeParser paramParser

tagParser

private static org.htmlparser.parserHelper.TagParser tagParser

tagContents

protected java.lang.StringBuffer tagContents
Tag contents will have the contents of the comment tag.


emptyXmlTag

private boolean emptyXmlTag

attributes

protected java.util.Hashtable attributes
Tag parameters are parsed into this hashtable. Not implemented yet. Added by Kaarle Kaila 23.10.2001.


thisScanner

protected org.htmlparser.scanners.TagScanner thisScanner
Scanner associated with this tag (useful for extraction of filtering data from a HTML node)


tagLine

private java.lang.String tagLine

tagLines

private java.lang.String[] tagLines
The combined text of all the lines spanned by this tag


startLine

private int startLine
The line number on which this tag starts


breakTags

protected static java.util.HashSet breakTags
Set of tags that breaks the flow.

Constructor Detail

Tag

public Tag(org.htmlparser.tags.data.TagData tagData)
Set the Tag with the beginning position, ending position and tag contents (in a tagData object).

Method Detail

append

public void append(char ch)

append

public void append(java.lang.String ch)

find

public static Tag find(org.htmlparser.NodeReader reader,
                       java.lang.String input,
                       int position)
Locate the tag within the input string, by parsing from the given position.


parseAttributes

private java.util.Hashtable parseAttributes()
This method is not to be called by any scanner or tag. It is expensive, which is why it has been made private. However, there may be circumstances in which a scanner needs to force parsing of attributes beyond what has already been parsed. To make that choice explicit, the public method redoParseAttributes() is provided for this purpose.


getAttribute

public java.lang.String getAttribute(java.lang.String name)
If the tag is parsed in the scan method, this returns the value of a parameter. Not implemented yet.


setAttribute

public void setAttribute(java.lang.String key,
                         java.lang.String value)
Set attribute with given key, value pair.


getParameter

public java.lang.String getParameter(java.lang.String name)
Deprecated. Use getAttribute instead.

If the tag is parsed in the scan method, this returns the value of a parameter. Not implemented yet.


getAttributes

public java.util.Hashtable getAttributes()
Gets the attributes in the tag.


getTagName

public java.lang.String getTagName()

getTagLine

public java.lang.String getTagLine()
Returns the line where the tag was found


getTagLines

public java.lang.String[] getTagLines()
Returns the combined text of all the lines spanned by this tag


getText

public java.lang.String getText()
Return the text contained in this tag


getThisScanner

public org.htmlparser.scanners.TagScanner getThisScanner()
Return the scanner associated with this tag.


extractWord

public static java.lang.String extractWord(java.lang.String s)
Extract the first word from the given string. Words are delimited by whitespace or equals signs.
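
The delimiter rule described above can be illustrated with a small stand-alone sketch. This is not the htmlparser source: the class name ExtractWordSketch is hypothetical, skipping leading whitespace is an assumption, and any case normalization the real method may perform is not modelled.

```java
// Hypothetical stand-alone sketch of the documented rule; not the library implementation.
class ExtractWordSketch {

    /** Returns the first word of s, where a word ends at whitespace or '='. */
    static String extractWord(String s) {
        int start = 0;
        // Assumption: leading whitespace is skipped before the word starts.
        while (start < s.length() && Character.isWhitespace(s.charAt(start))) {
            start++;
        }
        int end = start;
        // Advance until a delimiter (whitespace or '=') or the end of the string.
        while (end < s.length()) {
            char c = s.charAt(end);
            if (Character.isWhitespace(c) || c == '=') {
                break;
            }
            end++;
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(extractWord("img src=\"logo.gif\"")); // prints img
        System.out.println(extractWord("href=index.html"));      // prints href
    }
}
```

The first call stops at the space after the tag name; the second stops at the equals sign, which is how an attribute name would be separated from its value.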


scan

public org.htmlparser.Node scan(java.util.Map scanners,
                                java.lang.String url,
                                org.htmlparser.NodeReader reader)
                         throws org.htmlparser.util.ParserException
Scan the tag using the registered scanners and attempt identification.


setAttributes

public void setAttributes(java.util.Hashtable attributes)
Sets the parsed attributes.


setTagBegin

public void setTagBegin(int tagBegin)
Sets the nodeBegin.


getTagBegin

public int getTagBegin()
Gets the nodeBegin.


setTagEnd

public void setTagEnd(int tagEnd)
Sets the nodeEnd.


getTagEnd

public int getTagEnd()
Gets the nodeEnd.


getTagStartLine

public int getTagStartLine()
Gets the line number on which this tag starts.


getTagEndLine

public int getTagEndLine()
Gets the line number on which this tag ends.


setTagLine

public void setTagLine(java.lang.String newTagLine)

setText

public void setText(java.lang.String text)

setThisScanner

public void setThisScanner(org.htmlparser.scanners.TagScanner scanner)

toPlainTextString

public java.lang.String toPlainTextString()
Description copied from class: org.htmlparser.Node
Returns a string representation of the node. This is an important method: it allows a simple string transformation of a web page, regardless of the node type.
Typical application code (for extracting only the text from a web page) would then be simplified to:
 Node node;
 for (Enumeration e = parser.elements(); e.hasMoreElements();) {
     node = (Node) e.nextElement();
     // Or do whatever processing you wish with the plain text string
     System.out.println(node.toPlainTextString());
 }
 


toHtml

public java.lang.String toHtml()
A call to a tag's toHTML() method will render it in HTML. Most tags that do not have children and inherit from Tag do not need to override toHTML().


containsMoreThanOneKey

private boolean containsMoreThanOneKey()

toString

public java.lang.String toString()
Print the contents of the tag


setTagParser

public static void setTagParser(org.htmlparser.parserHelper.TagParser tagParser)
Sets the tagParser.


breaksFlow

public boolean breaksFlow()
Determines if the given tag breaks the flow of text.
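
A flow-breaking check of this kind is typically a set lookup on the tag name. The sketch below shows that idea only; the class name BreaksFlowSketch and the set members are assumptions for illustration, since this page does not list the actual contents of breakTags.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; the real breakTags contents are not documented on this page.
class BreaksFlowSketch {

    // Assumed, illustrative members only.
    static final Set<String> BREAK_TAGS =
            new HashSet<>(Arrays.asList("P", "BR", "DIV", "LI"));

    /** Returns true when the tag name is a member of the break set (case-insensitive). */
    static boolean breaksFlow(String tagName) {
        return BREAK_TAGS.contains(tagName.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(breaksFlow("br"));   // prints true
        System.out.println(breaksFlow("span")); // prints false
    }
}
```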


collectInto

public void collectInto(org.htmlparser.util.NodeList collectionList,
                        java.lang.String filter)
This method verifies that the current tag matches the provided filter. The match is based on the string object and not its contents, so ensure that you are using static final filter strings provided in the tag classes.
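
The warning about matching on the string object rather than its contents refers to Java reference comparison (==) as opposed to equals(). The sketch below demonstrates the consequence with plain strings; FilterIdentitySketch and LINK_FILTER are hypothetical names standing in for a tag class and its static final filter constant.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of identity-based filter matching; not the library implementation.
class FilterIdentitySketch {

    // Stand-in for a static final filter string provided by a tag class.
    static final String LINK_FILTER = "LINK";

    /** Collects name only when tagType is the very same String object as filter. */
    static void collectInto(List<String> collected, String tagType, String name, String filter) {
        if (tagType == filter) { // reference comparison, deliberately not equals()
            collected.add(name);
        }
    }

    public static void main(String[] args) {
        List<String> hits = new ArrayList<>();
        // Same constant object on both sides: collected.
        collectInto(hits, LINK_FILTER, "first", LINK_FILTER);
        // Equal contents but a distinct object: skipped.
        collectInto(hits, LINK_FILTER, "second", new String("LINK"));
        System.out.println(hits); // prints [first]
    }
}
```

Passing a freshly built string with the same characters silently collects nothing, which is why the constants from the tag classes should be used as filters.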


getParsed

public java.util.Hashtable getParsed()
Deprecated. Use getAttributes() instead.

Returns table of attributes in the tag


redoParseAttributes

public java.util.Hashtable redoParseAttributes()
Sometimes a scanner may need to request a re-evaluation of the attributes in a tag. This may happen when there is some correction activity. An example of its usage can be found in ImageTag.
Note: This is an intensive task, so call it only when really necessary.
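
The split between a private, expensive parse and a public method that forces a re-parse can be sketched as follows. This is a simplified model under stated assumptions, not the library code: LazyAttributesSketch, its naive whitespace-and-equals parsing, the cached lookup in getAttributes(), and the parseCount counter are all hypothetical.

```java
import java.util.Hashtable;

// Hypothetical sketch of cached attribute parsing with an explicit re-parse.
class LazyAttributesSketch {

    private String text;                          // raw tag text, e.g. "img src=a.gif"
    private Hashtable<String, String> attributes; // null until first requested
    int parseCount = 0;                           // exposed only to show how often parsing runs

    LazyAttributesSketch(String text) {
        this.text = text;
    }

    /** Models a correction made by a scanner; does not re-parse by itself. */
    void setText(String text) {
        this.text = text;
    }

    /** The expensive step, kept private. Naively splits key=value tokens on whitespace. */
    private Hashtable<String, String> parseAttributes() {
        parseCount++;
        Hashtable<String, String> table = new Hashtable<>();
        for (String token : text.trim().split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq > 0) {
                table.put(token.substring(0, eq), token.substring(eq + 1));
            }
        }
        return table;
    }

    /** Parses on first use only, then returns the cached table. */
    Hashtable<String, String> getAttributes() {
        if (attributes == null) {
            attributes = parseAttributes();
        }
        return attributes;
    }

    /** Discards the cached table and parses again. */
    Hashtable<String, String> redoParseAttributes() {
        attributes = parseAttributes();
        return attributes;
    }

    public static void main(String[] args) {
        LazyAttributesSketch tag = new LazyAttributesSketch("img src=a.gif");
        tag.getAttributes();
        tag.getAttributes();
        System.out.println(tag.parseCount);                       // prints 1
        tag.setText("img src=b.gif");
        System.out.println(tag.redoParseAttributes().get("src")); // prints b.gif
        System.out.println(tag.parseCount);                       // prints 2
    }
}
```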


accept

public void accept(org.htmlparser.visitors.NodeVisitor visitor)

getType

public java.lang.String getType()

isEmptyXmlTag

public boolean isEmptyXmlTag()
Is this an empty XML tag of the form <tag/>?
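
For illustration, the <tag/> form can be recognized from the raw tag source by checking for a slash immediately before the closing bracket. The sketch below is an assumption about how such a check might look, not the parser's own logic; EmptyXmlTagSketch is a hypothetical name, and in the documented class the flag is simply stored and read through setEmptyXmlTag and isEmptyXmlTag.

```java
// Hypothetical sketch of recognizing the <tag/> form from raw text.
class EmptyXmlTagSketch {

    /** True when the text between '<' and '>' ends with '/', as in <br/> or <br />. */
    static boolean isEmptyXmlTag(String tagSource) {
        String s = tagSource.trim();
        // Shortest valid form is four characters, e.g. <a/>.
        if (s.length() < 4 || s.charAt(0) != '<' || s.charAt(s.length() - 1) != '>') {
            return false;
        }
        String inner = s.substring(1, s.length() - 1).trim();
        // Needs at least a name character plus the trailing slash.
        return inner.length() > 1 && inner.endsWith("/");
    }

    public static void main(String[] args) {
        System.out.println(isEmptyXmlTag("<br/>"));  // prints true
        System.out.println(isEmptyXmlTag("<br>"));   // prints false
        System.out.println(isEmptyXmlTag("</div>")); // prints false
    }
}
```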


setEmptyXmlTag

public void setEmptyXmlTag(boolean emptyXmlTag)