org.htmlparser.tags
Class CompositeTag

java.lang.Object
  extended by org.htmlparser.Node
      extended by org.htmlparser.tags.Tag
          extended by org.htmlparser.tags.CompositeTag
All Implemented Interfaces:
java.io.Serializable
Direct Known Subclasses:
AppletTag, BodyTag, Bullet, BulletList, Div, FormTag, FrameSetTag, HeadTag, Html, LabelTag, LinkTag, OptionTag, ScriptTag, SelectTag, Span, StyleTag, TableColumn, TableRow, TableTag, TextareaTag, TitleTag

public abstract class CompositeTag
extends Tag


Field Summary
protected  org.htmlparser.util.NodeList childTags
           
protected  Tag endTag
           
protected  Tag startTag
           
 
Fields inherited from class org.htmlparser.tags.Tag
attributes, breakTags, EMPTYTAG, tagContents, TAGNAME, thisScanner, TYPE
 
Fields inherited from class org.htmlparser.Node
lineSeparator, nodeBegin, nodeEnd, parent
 
Constructor Summary
CompositeTag(org.htmlparser.tags.data.TagData tagData, org.htmlparser.tags.data.CompositeTagData compositeTagData)
           
 
Method Summary
 void accept(org.htmlparser.visitors.NodeVisitor visitor)
           
 org.htmlparser.Node childAt(int index)
          Get child at given index
 org.htmlparser.util.SimpleNodeIterator children()
           
 void collectInto(org.htmlparser.util.NodeList collectionList, java.lang.Class nodeType)
          Collect this node and its child nodes (if applicable) into the collection parameter, provided the node satisfies the filtering criteria.
 void collectInto(org.htmlparser.util.NodeList collectionList, java.lang.String filter)
          This method verifies that the current tag matches the provided filter.
 org.htmlparser.StringNode[] digupStringNode(java.lang.String searchText)
          Finds a string node, however embedded it might be, and returns it.
 org.htmlparser.util.SimpleNodeIterator elements()
          Return the child tags as an iterator.
 int findPositionOf(org.htmlparser.Node searchNode)
          Returns the node number of a child node given the node object.
 int findPositionOf(java.lang.String text)
          Returns the node number of the string node containing the given text.
 org.htmlparser.Node getChild(int index)
           
 int getChildCount()
           
 org.htmlparser.util.NodeList getChildren()
           
 org.htmlparser.Node[] getChildrenAsNodeArray()
           
 java.lang.String getChildrenHTML()
           
 Tag getEndTag()
           
 Tag getStartTag()
           
protected  void putChildrenInto(java.lang.StringBuffer sb)
           
protected  void putEndTagInto(java.lang.StringBuffer sb)
           
 void putStartTagInto(java.lang.StringBuffer sb)
           
 Tag searchByName(java.lang.String name)
          Searches the children for a tag with the given name attribute.
 org.htmlparser.util.NodeList searchFor(java.lang.Class classType)
          Collects all objects that are of a certain type. Note that this will not check for parent types, and will not recurse through child tags.
 org.htmlparser.util.NodeList searchFor(java.lang.String searchString)
          Searches for any node whose text representation contains the search string.
 org.htmlparser.util.NodeList searchFor(java.lang.String searchString, boolean caseSensitive)
          Searches for any node whose text representation contains the search string.
 java.lang.String toHtml()
          A call to a tag's toHTML() method will render it in HTML. Most tags that do not have children and inherit from Tag do not need to override toHTML().
 java.lang.String toPlainTextString()
          Returns a string representation of the node.
 
Methods inherited from class org.htmlparser.tags.Tag
append, append, breaksFlow, extractWord, find, getAttribute, getAttributes, getParameter, getParsed, getTagBegin, getTagEnd, getTagEndLine, getTagLine, getTagLines, getTagName, getTagStartLine, getText, getThisScanner, getType, isEmptyXmlTag, redoParseAttributes, scan, setAttribute, setAttributes, setEmptyXmlTag, setTagBegin, setTagEnd, setTagLine, setTagParser, setText, setThisScanner, toString
 
Methods inherited from class org.htmlparser.Node
elementBegin, elementEnd, getLineSeparator, getParent, setLineSeparator, setParent, toHTML
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

startTag

protected Tag startTag

endTag

protected Tag endTag

childTags

protected org.htmlparser.util.NodeList childTags
Constructor Detail

CompositeTag

public CompositeTag(org.htmlparser.tags.data.TagData tagData,
                    org.htmlparser.tags.data.CompositeTagData compositeTagData)
Method Detail

children

public org.htmlparser.util.SimpleNodeIterator children()

getChild

public org.htmlparser.Node getChild(int index)

getChildrenAsNodeArray

public org.htmlparser.Node[] getChildrenAsNodeArray()

getChildren

public org.htmlparser.util.NodeList getChildren()

elements

public org.htmlparser.util.SimpleNodeIterator elements()
Return the child tags as an iterator. Equivalent to calling getChildren().elements().


toPlainTextString

public java.lang.String toPlainTextString()
Description copied from class: org.htmlparser.Node
Returns a string representation of the node. This is an important method: it allows a simple string transformation of a web page, regardless of the node type.
Typical application code (for extracting only the text from a web page) would then be simplified to:
 Node node;
 for (NodeIterator e = parser.elements(); e.hasMoreNodes();) {
 	node = e.nextNode();
 	System.out.println(node.toPlainTextString()); // Or do whatever processing you wish with the plain text string
 }
 

Overrides:
toPlainTextString in class Tag

putStartTagInto

public void putStartTagInto(java.lang.StringBuffer sb)

putChildrenInto

protected void putChildrenInto(java.lang.StringBuffer sb)

putEndTagInto

protected void putEndTagInto(java.lang.StringBuffer sb)

toHtml

public java.lang.String toHtml()
Description copied from class: Tag
A call to a tag's toHTML() method will render it in HTML. Most tags that do not have children and inherit from Tag do not need to override toHTML().

Overrides:
toHtml in class Tag
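
The three put...Into methods listed above suggest that a composite tag assembles its HTML from a start tag, its children, and an end tag. The following is a minimal self-contained sketch of that idea, not the library's implementation: ToHtmlSketch and its fields are hypothetical, and the order in which the three hooks are called is an assumption based only on the method names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a composite tag; not an org.htmlparser class.
class ToHtmlSketch {
    private final String name;
    private final List<String> childHtml = new ArrayList<>();

    ToHtmlSketch(String name) { this.name = name; }

    ToHtmlSketch addChild(String html) { childHtml.add(html); return this; }

    // Each hook appends one part of the rendering to the shared buffer.
    void putStartTagInto(StringBuffer sb) { sb.append('<').append(name).append('>'); }

    void putChildrenInto(StringBuffer sb) {
        for (String child : childHtml) sb.append(child);
    }

    void putEndTagInto(StringBuffer sb) { sb.append("</").append(name).append('>'); }

    // Assumed composition: start tag, then children, then end tag.
    String toHtml() {
        StringBuffer sb = new StringBuffer();
        putStartTagInto(sb);
        putChildrenInto(sb);
        putEndTagInto(sb);
        return sb.toString();
    }
}
```

Splitting the rendering into hooks would let a subclass change one part, for example how children are written, without rewriting toHtml() itself.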

searchByName

public Tag searchByName(java.lang.String name)
Searches the children for a tag with the given name attribute. Returns the first match.


searchFor

public org.htmlparser.util.NodeList searchFor(java.lang.String searchString,
                                              boolean caseSensitive)
Searches for any node whose text representation contains the search string, and collects all such nodes in a NodeList. For example, to find any textareas in a form tag containing "hello world" regardless of case, the code would be: NodeList nodeList = formTag.searchFor("Hello World", false); The caseSensitive flag controls whether the match is case-sensitive.


searchFor

public org.htmlparser.util.NodeList searchFor(java.lang.Class classType)
Collects all objects that are of a certain type. Note that this will not check for parent types, and will not recurse through child tags.
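
The two restrictions in this description (no parent types, no recursion) can be illustrated with a small self-contained model. This is only a sketch: TypeSearchSketch, Base and Derived are hypothetical names, and comparing classes by identity is an assumption about what "will not check for parent types" means in practice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model; not an org.htmlparser class.
class TypeSearchSketch {
    static class Base {}
    static class Derived extends Base {}

    // Scans direct children only and keeps those whose runtime class is
    // exactly classType. Subclasses of classType are deliberately skipped.
    static List<Object> searchFor(List<Object> children, Class<?> classType) {
        List<Object> found = new ArrayList<>();
        for (Object child : children) {
            if (child.getClass() == classType) found.add(child);
        }
        return found;
    }
}
```

In this model a search for Base does not return a Derived child, which is the practical consequence of skipping parent types. A recursive, subtype-aware search would need different logic.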


searchFor

public org.htmlparser.util.NodeList searchFor(java.lang.String searchString)
Searches for any node whose text representation contains the search string, and collects all such nodes in a NodeList. For example, to find any textareas in a form tag containing "hello world", the code would be: NodeList nodeList = formTag.searchFor("Hello World"); This search is case-insensitive.
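
The relationship between the two string overloads can be sketched as follows. This is a self-contained model that works on plain strings rather than nodes. SearchForSketch is a hypothetical name, and having the one-argument overload delegate to the two-argument one with caseSensitive set to false is an assumption consistent with the descriptions above, not a statement about the library's source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical model operating on node texts; not an org.htmlparser class.
class SearchForSketch {
    // Keeps every text that contains searchString, optionally ignoring case.
    static List<String> searchFor(List<String> nodeTexts, String searchString, boolean caseSensitive) {
        String needle = caseSensitive ? searchString : searchString.toLowerCase(Locale.ROOT);
        List<String> found = new ArrayList<>();
        for (String text : nodeTexts) {
            String haystack = caseSensitive ? text : text.toLowerCase(Locale.ROOT);
            if (haystack.contains(needle)) found.add(text);
        }
        return found;
    }

    // Assumed default: the one-argument form is case-insensitive.
    static List<String> searchFor(List<String> nodeTexts, String searchString) {
        return searchFor(nodeTexts, searchString, false);
    }
}
```

With the texts "hello world", "Hello World!" and "goodbye", searching for "Hello World" matches two entries in the case-insensitive form and one in the case-sensitive form.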


findPositionOf

public int findPositionOf(java.lang.String text)
Returns the node number of the string node containing the given text. This can be useful to index into the composite tag and get other children.


findPositionOf

public int findPositionOf(org.htmlparser.Node searchNode)
Returns the node number of a child node given the node object. This would typically be used in conjunction with digupStringNode, after which the string node's parent can be used to find the string node's position. This is faster than calling findPositionOf(text) again. Note that the position is at a linear level alone: there is no recursion in this method.
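
A linear, non-recursive position lookup of this kind can be sketched as below. The model works on a flat list of child texts. FindPositionSketch is a hypothetical name, and both the case-sensitive match and the -1 result for a missing entry are assumptions, since the description above does not specify them.

```java
import java.util.List;

// Hypothetical model of a composite's direct children; not an org.htmlparser class.
class FindPositionSketch {
    // Returns the index of the first direct child whose text contains the
    // given text, or -1 if there is none. Nested children are never visited.
    static int findPositionOf(List<String> childTexts, String text) {
        for (int i = 0; i < childTexts.size(); i++) {
            if (childTexts.get(i).contains(text)) return i;
        }
        return -1;
    }
}
```

Once the position is known, neighbouring children can be reached by index, for example position + 1 for the node that follows a label.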


childAt

public org.htmlparser.Node childAt(int index)
Get child at given index


collectInto

public void collectInto(org.htmlparser.util.NodeList collectionList,
                        java.lang.String filter)
Description copied from class: Tag
This method verifies that the current tag matches the provided filter. The match is based on the string object and not its contents, so ensure that you are using static final filter strings provided in the tag classes.

Overrides:
collectInto in class Tag

collectInto

public void collectInto(org.htmlparser.util.NodeList collectionList,
                        java.lang.Class nodeType)
Description copied from class: org.htmlparser.Node
Collect this node and its child nodes (if applicable) into the collection parameter, provided the node satisfies the filtering criteria.

This mechanism allows powerful filtering code to be written very easily, without having to collect embedded tags separately. For example, when we try to get all the links on a page, it is not possible to get them at the top level, as many tags (such as form tags) can contain links embedded in them. We could get the links out by checking whether the current node is a form tag and going through its contents. However, this ties us down to specific tags and is not a very clean approach.

Using collectInto(), programs get a lot shorter. Now, the code to extract all links from a page would look like:

 NodeList collectionList = new NodeList();
 Node node;
 for (NodeIterator e = parser.elements(); e.hasMoreNodes();) {
 	node = e.nextNode();
 	node.collectInto(collectionList, LinkTag.class);
 }
 
Thus, collectionList will hold all the link nodes, irrespective of how deep the links are embedded.
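
How a composite could forward the collection request to its children is sketched below with a self-contained model. None of these classes are the htmlparser ones: SketchNode, SketchComposite, SketchLink and SketchForm are hypothetical stand-ins, and using Class.isInstance for the type test is an assumption made for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical leaf node; not an org.htmlparser class.
class SketchNode {
    // Leaf behaviour: add this node if it matches the requested type.
    void collectInto(List<SketchNode> out, Class<?> type) {
        if (type.isInstance(this)) out.add(this);
    }
}

// Hypothetical composite: checks itself, then recurses into every child.
class SketchComposite extends SketchNode {
    private final List<SketchNode> children = new ArrayList<>();

    SketchComposite add(SketchNode child) { children.add(child); return this; }

    @Override
    void collectInto(List<SketchNode> out, Class<?> type) {
        super.collectInto(out, type);
        for (SketchNode child : children) child.collectInto(out, type);
    }
}

class SketchLink extends SketchComposite {}
class SketchForm extends SketchComposite {}

class CollectIntoSketch {
    // Builds a page with one top-level link and one link nested in a form,
    // then counts how many links a single collectInto call gathers.
    static int countLinks() {
        SketchComposite page = new SketchComposite()
            .add(new SketchLink())
            .add(new SketchForm().add(new SketchLink()).add(new SketchNode()));
        List<SketchNode> links = new ArrayList<>();
        page.collectInto(links, SketchLink.class);
        return links.size();
    }
}
```

Because every composite repeats the request for its children, the caller never needs to know that a form can contain links. Here both the top-level link and the one nested in the form are collected.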


getChildrenHTML

public java.lang.String getChildrenHTML()

accept

public void accept(org.htmlparser.visitors.NodeVisitor visitor)
Overrides:
accept in class Tag

getChildCount

public int getChildCount()

getStartTag

public Tag getStartTag()

getEndTag

public Tag getEndTag()

digupStringNode

public org.htmlparser.StringNode[] digupStringNode(java.lang.String searchText)
Finds a string node, however embedded it might be, and returns it. The string node will retain links to its parents, so further navigation is possible.
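
The idea of a deep search that returns text nodes with their parent links intact can be sketched as follows. This is a self-contained model, not the library's implementation: DigupSketch and its nested class N are hypothetical, the match is a plain case-sensitive substring test by assumption, and the result is a List rather than the StringNode[] array declared above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree model; not an org.htmlparser class.
class DigupSketch {
    static class N {
        final String text;                 // non-null only for text leaves
        N parent;                          // set when the node is added to a parent
        final List<N> children = new ArrayList<>();

        N(String text) { this.text = text; }

        N add(N child) { child.parent = this; children.add(child); return this; }
    }

    // Depth-first search for text nodes containing searchText, at any depth.
    static List<N> digup(N root, String searchText) {
        List<N> found = new ArrayList<>();
        collect(root, searchText, found);
        return found;
    }

    private static void collect(N node, String searchText, List<N> found) {
        if (node.text != null && node.text.contains(searchText)) found.add(node);
        for (N child : node.children) collect(child, searchText, found);
    }

    // Walks the parent links upward to show that navigation remains possible.
    static int depthOf(N node) {
        int depth = 0;
        for (N p = node.parent; p != null; p = p.parent) depth++;
        return depth;
    }
}
```

A caller can take any returned node, follow parent to reach the enclosing tag, and then use a positional lookup on that parent to find the node's siblings.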