java.lang.Object
org.htmlparser.Node
org.htmlparser.tags.Tag
org.htmlparser.tags.CompositeTag
- All Implemented Interfaces:
- java.io.Serializable
- Direct Known Subclasses:
- AppletTag, BodyTag, Bullet, BulletList, Div, FormTag, FrameSetTag, HeadTag, Html, LabelTag, LinkTag, OptionTag, ScriptTag, SelectTag, Span, StyleTag, TableColumn, TableRow, TableTag, TextareaTag, TitleTag
- public abstract class CompositeTag
- extends Tag
Methods inherited from class org.htmlparser.tags.Tag
append, append, breaksFlow, extractWord, find, getAttribute, getAttributes, getParameter, getParsed, getTagBegin, getTagEnd, getTagEndLine, getTagLine, getTagLines, getTagName, getTagStartLine, getText, getThisScanner, getType, isEmptyXmlTag, redoParseAttributes, scan, setAttribute, setAttributes, setEmptyXmlTag, setTagBegin, setTagEnd, setTagLine, setTagParser, setText, setThisScanner, toString
startTag
protected Tag startTag
endTag
protected Tag endTag
childTags
protected org.htmlparser.util.NodeList childTags
CompositeTag
public CompositeTag(org.htmlparser.tags.data.TagData tagData,
org.htmlparser.tags.data.CompositeTagData compositeTagData)
children
public org.htmlparser.util.SimpleNodeIterator children()
getChild
public org.htmlparser.Node getChild(int index)
getChildrenAsNodeArray
public org.htmlparser.Node[] getChildrenAsNodeArray()
getChildren
public org.htmlparser.util.NodeList getChildren()
elements
public org.htmlparser.util.SimpleNodeIterator elements()
- Returns the child tags as an iterator. Equivalent to calling
getChildren().elements().
toPlainTextString
public java.lang.String toPlainTextString()
- Description copied from class:
org.htmlparser.Node
- Returns a string representation of the node. This is an important method:
it allows a simple string transformation of a web page, regardless of the
type of node.
Typical application code (for extracting only the text from a web page)
would then be simplified to:
Node node;
for (NodeIterator e = parser.elements(); e.hasMoreNodes();) {
    node = e.nextNode();
    // Or do whatever processing you wish with the plain text string
    System.out.println(node.toPlainTextString());
}
- Overrides:
toPlainTextString in class Tag
putStartTagInto
public void putStartTagInto(java.lang.StringBuffer sb)
putChildrenInto
protected void putChildrenInto(java.lang.StringBuffer sb)
putEndTagInto
protected void putEndTagInto(java.lang.StringBuffer sb)
toHtml
public java.lang.String toHtml()
- Description copied from class:
Tag
- A call to a tag's toHtml() method will render it in HTML. Most tags that
do not have children and inherit from Tag do not need to override
toHtml().
- Overrides:
toHtml in class Tag
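The putStartTagInto, putChildrenInto and putEndTagInto methods listed above suggest that a composite tag renders itself as start tag, then children, then end tag. The following is a minimal sketch of that composition only. SketchNode, SketchText and SketchComposite are hypothetical stand-ins, not htmlparser classes, and the sketch uses StringBuilder where the library signatures take a StringBuffer; the library's actual implementation may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are not htmlparser classes.
interface SketchNode {
    String toHtml();
}

// A leaf node that renders as its own text.
class SketchText implements SketchNode {
    private final String text;

    SketchText(String text) { this.text = text; }

    public String toHtml() { return text; }
}

// A node with children that renders start tag, children, end tag.
class SketchComposite implements SketchNode {
    private final String name;
    private final List<SketchNode> children = new ArrayList<>();

    SketchComposite(String name) { this.name = name; }

    SketchComposite add(SketchNode child) {
        children.add(child);
        return this;
    }

    void putStartTagInto(StringBuilder sb) {
        sb.append('<').append(name).append('>');
    }

    void putChildrenInto(StringBuilder sb) {
        // Each child renders itself, so nested composites recurse naturally.
        for (SketchNode child : children) {
            sb.append(child.toHtml());
        }
    }

    void putEndTagInto(StringBuilder sb) {
        sb.append("</").append(name).append('>');
    }

    public String toHtml() {
        StringBuilder sb = new StringBuilder();
        putStartTagInto(sb);
        putChildrenInto(sb);
        putEndTagInto(sb);
        return sb.toString();
    }
}
```

Splitting the rendering into three steps lets a subclass change one part (for example, how children are written) without rewriting the whole method.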
searchByName
public Tag searchByName(java.lang.String name)
- Searches all children for a name attribute matching the given name. Returns the first match.
searchFor
public org.htmlparser.util.NodeList searchFor(java.lang.String searchString,
boolean caseSensitive)
- Searches for any node whose text representation contains the search
string and collects all such nodes in a NodeList. For example, to find
any textareas in a form tag containing "hello world", ignoring case, the code would be:
NodeList nodeList = formTag.searchFor("Hello World", false);
searchFor
public org.htmlparser.util.NodeList searchFor(java.lang.Class classType)
- Collects all objects that are of a certain type. Note that this will not
check for parent types, and will not recurse through child tags.
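To make "will not check for parent types" concrete, here is a minimal sketch that assumes the match is on the exact runtime class, so a request for Tag does not return a LinkTag. The nested Node, Tag and LinkTag classes are hypothetical stand-ins rather than the library's types, and the exact comparison the library performs is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class ExactTypeSearchSketch {
    // Hypothetical node hierarchy for illustration only.
    static class Node {}
    static class Tag extends Node {}
    static class LinkTag extends Tag {}

    // Collects direct children whose runtime class is exactly classType.
    // Subclasses are excluded and grandchildren are never visited.
    static List<Node> searchFor(List<Node> children, Class<?> classType) {
        List<Node> matches = new ArrayList<>();
        for (Node child : children) {
            if (child.getClass() == classType) {
                matches.add(child);
            }
        }
        return matches;
    }
}
```

Under this assumption, searching for a parent type finds only nodes created as that type; a caller who wants subclasses as well would have to use an instanceof-style test instead.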
searchFor
public org.htmlparser.util.NodeList searchFor(java.lang.String searchString)
- Searches for any node whose text representation contains the search
string and collects all such nodes in a NodeList. For example, to find
any textareas in a form tag containing "hello world", the code would be:
NodeList nodeList = formTag.searchFor("Hello World");
This search is case-insensitive.
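The relationship between the two text-search overloads can be sketched as follows. This is an illustration only: it works on plain strings standing in for each child's text representation, TextSearchSketch is a hypothetical class, and normalising case with toUpperCase is an assumption about how a case-insensitive match might be done, not the library's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class TextSearchSketch {
    // Returns every text that contains searchString, optionally ignoring case.
    static List<String> searchFor(List<String> childTexts, String searchString,
                                  boolean caseSensitive) {
        List<String> matches = new ArrayList<>();
        String needle = caseSensitive ? searchString : searchString.toUpperCase(Locale.ROOT);
        for (String text : childTexts) {
            String haystack = caseSensitive ? text : text.toUpperCase(Locale.ROOT);
            if (haystack.contains(needle)) {
                matches.add(text);
            }
        }
        return matches;
    }

    // The single-argument form delegates with case sensitivity switched off.
    static List<String> searchFor(List<String> childTexts, String searchString) {
        return searchFor(childTexts, searchString, false);
    }
}
```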
findPositionOf
public int findPositionOf(java.lang.String text)
- Returns the node number of the string node containing the given text.
This can be useful to index into the composite tag and get other
children.
findPositionOf
public int findPositionOf(org.htmlparser.Node searchNode)
- Returns the node number of a child node given the node object. This would
typically be used in conjunction with digupStringNode, after which the
string node's parent can be used to find the string node's position.
Faster than calling findPositionOf(text) again. Note that the position is
at a linear level alone: there is no recursion in this method.
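A minimal sketch of a linear, non-recursive position lookup is shown below. PositionSketch is a hypothetical helper, and both the identity comparison and the -1 result for "not found" are assumptions made for the illustration, not documented behaviour.

```java
import java.util.List;

class PositionSketch {
    // Returns the index of the direct child that is the same object as
    // searchNode, or -1 (an assumed sentinel) if it is not a direct child.
    static int findPositionOf(List<Object> children, Object searchNode) {
        for (int i = 0; i < children.size(); i++) {
            if (children.get(i) == searchNode) {
                return i;
            }
        }
        return -1; // nested descendants are deliberately not searched
    }
}
```

Because only direct children are inspected, a node that sits deeper in the tree has to be looked up through its own parent.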
childAt
public org.htmlparser.Node childAt(int index)
- Gets the child at the given index.
collectInto
public void collectInto(org.htmlparser.util.NodeList collectionList,
java.lang.String filter)
- Description copied from class:
Tag
- This method verifies that the current tag matches the provided filter.
The match is based on the string object and not its contents, so ensure
that you are using static final filter strings provided in the tag
classes.
- Overrides:
collectInto in class Tag
collectInto
public void collectInto(org.htmlparser.util.NodeList collectionList,
java.lang.Class nodeType)
- Description copied from class:
org.htmlparser.Node
- Collects this node and its child nodes (if applicable) into the collection
parameter, provided the node satisfies the filtering criteria.
This mechanism allows powerful filtering code to be written very easily,
without bothering about collection of embedded tags separately. For example,
when we try to get all the links on a page, it is not possible to get them at
the top level, as many tags (like form tags) can contain links embedded
in them. We could get the links out by checking if the current node is a
form tag, and going through its contents. However, this ties us down to
specific tags, and is not a very clean approach.
Using collectInto(), programs get a lot shorter. Now, the code to extract
all links from a page would look like:
NodeList collectionList = new NodeList();
Node node;
for (NodeIterator e = parser.elements(); e.hasMoreNodes();) {
    node = e.nextNode();
    // Each top-level node adds itself and any matching descendants
    node.collectInto(collectionList, LinkTag.class);
}
Thus, collectionList will hold all the link nodes, irrespective of how
deep the links are embedded.
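The reason the caller's loop stays flat is that each node takes care of its own descendants. A minimal sketch of that recursion follows, using hypothetical stand-in classes (CollectSketch.Node, FormTag, LinkTag) rather than the library's types. The matching rule used here, Class.isInstance, is an assumption chosen for the illustration; the library may compare types differently.

```java
import java.util.ArrayList;
import java.util.List;

class CollectSketch {
    // Hypothetical stand-in node types, not htmlparser classes.
    static class Node {
        final List<Node> children = new ArrayList<>();

        Node add(Node child) {
            children.add(child);
            return this;
        }

        // Adds this node if it matches, then lets every child do the same.
        void collectInto(List<Node> collectionList, Class<?> nodeType) {
            if (nodeType.isInstance(this)) {
                collectionList.add(this);
            }
            for (Node child : children) {
                child.collectInto(collectionList, nodeType);
            }
        }
    }

    static class FormTag extends Node {}
    static class LinkTag extends Node {}
}
```

With this shape, a link nested inside a form is found by the same single call that finds a top-level link, so the calling code never needs to know which tags can contain links.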
getChildrenHTML
public java.lang.String getChildrenHTML()
accept
public void accept(org.htmlparser.visitors.NodeVisitor visitor)
- Overrides:
accept in class Tag
getChildCount
public int getChildCount()
getStartTag
public Tag getStartTag()
getEndTag
public Tag getEndTag()
digupStringNode
public org.htmlparser.StringNode[] digupStringNode(java.lang.String searchText)
- Finds string nodes containing the search text, however deeply embedded they
might be, and returns them. Each string node will retain links to its
parents, so further navigation is possible.
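A minimal sketch of such a deep search with parent links is given below. DigUpSketch and its nested Node and StringNode classes are hypothetical stand-ins, the parent field is an assumed way of retaining the link to the parent, and the sketch returns a List where the real method returns an array.

```java
import java.util.ArrayList;
import java.util.List;

class DigUpSketch {
    // Hypothetical stand-in types, not htmlparser classes.
    static class Node {
        Node parent; // set when the node is added to a parent
        final List<Node> children = new ArrayList<>();

        Node add(Node child) {
            child.parent = this;
            children.add(child);
            return this;
        }
    }

    static class StringNode extends Node {
        final String text;

        StringNode(String text) { this.text = text; }
    }

    // Walks the whole subtree and returns every string node containing searchText.
    static List<StringNode> digupStringNode(Node root, String searchText) {
        List<StringNode> found = new ArrayList<>();
        dig(root, searchText, found);
        return found;
    }

    private static void dig(Node node, String searchText, List<StringNode> found) {
        if (node instanceof StringNode && ((StringNode) node).text.contains(searchText)) {
            found.add((StringNode) node);
        }
        for (Node child : node.children) {
            dig(child, searchText, found);
        }
    }
}
```

Once a string node is found, following its parent reference gives the enclosing node, whose child list can then be indexed to reach neighbouring nodes.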