org.htmlparser
Class Node

java.lang.Object
  org.htmlparser.Node
- All Implemented Interfaces:
- java.io.Serializable
- Direct Known Subclasses:
- RemarkNode, StringNode
- public abstract class Node
- extends java.lang.Object
- implements java.io.Serializable
Node is the abstract base class for all types of nodes (tags, string elements, etc.).
| Field Summary | |
protected static java.lang.String |
lineSeparator
Variable to store lineSeparator. |
protected int |
nodeBegin
The beginning position of the tag in the line |
protected int |
nodeEnd
The ending position of the tag in the line |
protected org.htmlparser.tags.CompositeTag |
parent
The parent of this tag, if any |
| Constructor Summary | |
Node(int nodeBegin,
int nodeEnd)
|
|
Node(int nodeBegin,
int nodeEnd,
org.htmlparser.tags.CompositeTag parent)
|
|
| Method Summary | |
abstract void |
accept(org.htmlparser.visitors.NodeVisitor visitor)
|
void |
collectInto(org.htmlparser.util.NodeList collectionList,
java.lang.Class nodeType)
Collect this node and its child nodes (if-applicable) into the collection parameter, provided the node satisfies the filtering criteria. |
abstract void |
collectInto(org.htmlparser.util.NodeList collectionList,
java.lang.String filter)
Collect this node and its child nodes (if-applicable) into the collection parameter, provided the node satisfies the filtering criteria. |
int |
elementBegin()
Returns the beginning position of the tag. |
int |
elementEnd()
Returns the ending position of the tag. |
static java.lang.String |
getLineSeparator()
|
org.htmlparser.tags.CompositeTag |
getParent()
Get the parent of this tag |
static void |
setLineSeparator(java.lang.String lineSeparator)
|
void |
setParent(org.htmlparser.tags.CompositeTag tag)
Sets the parent of this tag |
abstract java.lang.String |
toHtml()
This method makes it easier to reproduce HTML pages (with or without modifications) when using the HTML parser. Applications reproducing HTML can call this method on nodes that are to be used or transferred as they were received, with the original HTML. |
java.lang.String |
toHTML()
Deprecated. - use toHtml() instead |
abstract java.lang.String |
toPlainTextString()
Returns a string representation of the node. |
abstract java.lang.String |
toString()
Return the string representation of the node. |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Field Detail |
nodeBegin
protected int nodeBegin
- The beginning position of the tag in the line
nodeEnd
protected int nodeEnd
- The ending position of the tag in the line
parent
protected org.htmlparser.tags.CompositeTag parent
- The parent of this tag, if any
lineSeparator
protected static java.lang.String lineSeparator
- Variable to store lineSeparator. This is set up to read line.separator
from the System property. However, it can also be changed using the mutator methods. This will be used in the toHtml() methods in all the sub-classes of Node.
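The behaviour described above can be sketched in a few lines. This is a minimal illustration, not the library's source: SeparatorNode and LineSeparatorDemo are hypothetical stand-ins, and the join helper only imitates what a subclass's HTML output method might do with the shared separator.

```java
// Hypothetical stand-in for the static separator handling in Node.
abstract class SeparatorNode {
    // Defaults to the platform value of the "line.separator" system property.
    protected static String lineSeparator = System.getProperty("line.separator", "\n");

    // Mutator: changes the separator for every node, since the field is static.
    public static void setLineSeparator(String lineSeparator) {
        SeparatorNode.lineSeparator = lineSeparator;
    }

    public static String getLineSeparator() {
        return lineSeparator;
    }
}

class LineSeparatorDemo {
    // Joins lines with the shared separator, as an HTML-producing subclass might.
    static String join(String... lines) {
        return String.join(SeparatorNode.getLineSeparator(), lines);
    }

    public static void main(String[] args) {
        SeparatorNode.setLineSeparator("\r\n");
        // Make the separator visible when printing.
        System.out.println(join("<p>", "</p>").replace("\r\n", "\\r\\n"));
    }
}
```

Because the field is static, a call to the setter affects all nodes in the JVM, so it is best set once before any output is generated.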
| Constructor Detail |
Node
public Node(int nodeBegin,
int nodeEnd)
Node
public Node(int nodeBegin,
int nodeEnd,
org.htmlparser.tags.CompositeTag parent)
| Method Detail |
setLineSeparator
public static void setLineSeparator(java.lang.String lineSeparator)
getLineSeparator
public static java.lang.String getLineSeparator()
toPlainTextString
public abstract java.lang.String toPlainTextString()
- Returns a string representation of the node. This is an important method:
it allows a simple string transformation of a web page, regardless of the
node type.
Typical application code (for extracting only the text from a web page) would then be simplified to:
    Node node;
    for (NodeIterator e = parser.elements(); e.hasMoreNodes();) {
        node = e.nextNode();
        System.out.println(node.toPlainTextString());
        // Or do whatever processing you wish with the plain text string
    }
toHtml
public abstract java.lang.String toHtml()
- This method makes it easier to reproduce HTML pages (with or without
modifications) when using the HTML parser. Applications reproducing HTML
can call this method on nodes that are to be used or transferred as they
were received, with the original HTML.
toString
public abstract java.lang.String toString()
- Returns the string representation of the node. Subclasses must define this
method; it is typically used in the manner
System.out.println(node)
collectInto
public abstract void collectInto(org.htmlparser.util.NodeList collectionList, java.lang.String filter)
- Collect this node and its child nodes (if-applicable) into the collection
parameter, provided the node satisfies the filtering criteria.
This mechanism allows powerful filtering code to be written very easily,
without collecting embedded tags separately. For example, when we try to
get all the links on a page, we cannot get them all at the top level, as
many tags (like form tags) can contain links embedded in them. We could
get the links out by checking whether the current node is a form tag and
going through its contents. However, this ties us down to specific tags
and is not a very clean approach.
Using collectInto(), programs get a lot shorter. Now, the code to extract
all links from a page would look like:
    NodeList collectionList = new NodeList();
    Node node;
    String filter = LinkTag.LINK_TAG_FILTER;
    for (NodeIterator e = parser.elements(); e.hasMoreNodes();) {
        node = e.nextNode();
        node.collectInto(collectionList, filter);
    }
Thus, collectionList will hold all the link nodes, irrespective of how deep the links are embedded. This of course implies that tags must fulfill their responsibilities toward honouring certain filters.
Important: In order to keep performance optimal, do not create your own filter strings, as the internal matching occurs with the pre-existing filter string object (in the relevant class). That is, do not make calls like collectInto(collectionList, "-l"); instead, make calls only like collectInto(collectionList, LinkTag.LINK_TAG_FILTER). To find out if your desired tag has filtering support, check the API of the tag.
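One reading of the advice about filter strings is that matching relies on the shared constant object rather than on its characters. The sketch below illustrates that idea under the assumption of a reference comparison (==); FilterIdentityDemo, LINK_FILTER and matches are hypothetical names, and the real tag classes may match differently.

```java
class FilterIdentityDemo {
    // Hypothetical filter constant, standing in for something like LinkTag.LINK_TAG_FILTER.
    static final String LINK_FILTER = "-l";

    // Assumed matching strategy: compare references, not characters.
    static boolean matches(String filter) {
        return filter == LINK_FILTER;
    }

    public static void main(String[] args) {
        // A string built at run time has the same characters but is a different object.
        String copy = new String("-l");
        System.out.println(matches(LINK_FILTER));
        System.out.println(matches(copy));
        System.out.println(copy.equals(LINK_FILTER));
    }
}
```

Under this assumption, a filter string assembled at run time would not be recognised even though its contents are equal, which is why reusing the constant published by the tag class is the safe choice.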
collectInto
public void collectInto(org.htmlparser.util.NodeList collectionList, java.lang.Class nodeType)
- Collect this node and its child nodes (if-applicable) into the collection
parameter, provided the node satisfies the filtering criteria.
This mechanism allows powerful filtering code to be written very easily,
without collecting embedded tags separately. For example, when we try to
get all the links on a page, we cannot get them all at the top level, as
many tags (like form tags) can contain links embedded in them. We could
get the links out by checking whether the current node is a form tag and
going through its contents. However, this ties us down to specific tags
and is not a very clean approach.
Using collectInto(), programs get a lot shorter. Now, the code to extract
all links from a page would look like:
    NodeList collectionList = new NodeList();
    Node node;
    for (NodeIterator e = parser.elements(); e.hasMoreNodes();) {
        node = e.nextNode();
        node.collectInto(collectionList, LinkTag.class);
    }
Thus, collectionList will hold all the link nodes, irrespective of how deep the links are embedded.
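To show why the caller never has to inspect composite tags itself, here is a minimal self-contained sketch of the recursion that collectInto implies. All types (SketchNode, SketchText, SketchLink, SketchComposite, CollectIntoDemo) are hypothetical stand-ins built on java.util.List; the real Node, CompositeTag and NodeList classes are richer and may be implemented differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Node: a leaf adds itself when its type matches.
abstract class SketchNode {
    void collectInto(List<SketchNode> collectionList, Class<?> nodeType) {
        if (nodeType.isInstance(this)) {
            collectionList.add(this);
        }
    }
}

// Hypothetical leaf types standing in for string and link nodes.
class SketchText extends SketchNode { }

class SketchLink extends SketchNode { }

// Hypothetical stand-in for a composite tag such as a form.
class SketchComposite extends SketchNode {
    private final List<SketchNode> children = new ArrayList<>();

    SketchComposite add(SketchNode child) {
        children.add(child);
        return this;
    }

    @Override
    void collectInto(List<SketchNode> collectionList, Class<?> nodeType) {
        super.collectInto(collectionList, nodeType); // consider this node itself
        for (SketchNode child : children) {
            child.collectInto(collectionList, nodeType); // then recurse into children
        }
    }
}

class CollectIntoDemo {
    // Builds a tiny "page": a form holding text and a link, plus a top-level link.
    static List<SketchNode> collectLinks() {
        List<SketchNode> topLevel = new ArrayList<>();
        topLevel.add(new SketchComposite().add(new SketchText()).add(new SketchLink()));
        topLevel.add(new SketchLink());

        List<SketchNode> collectionList = new ArrayList<>();
        for (SketchNode node : topLevel) {
            node.collectInto(collectionList, SketchLink.class);
        }
        return collectionList;
    }

    public static void main(String[] args) {
        System.out.println(collectLinks().size()); // prints 2
    }
}
```

The calling loop only walks the top-level nodes; the link nested inside the composite is found because the composite forwards the request to its children.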
elementBegin
public int elementBegin()
- Returns the beginning position of the tag.
elementEnd
public int elementEnd()
- Returns the ending position of the tag.
accept
public abstract void accept(org.htmlparser.visitors.NodeVisitor visitor)
toHTML
public final java.lang.String toHTML()
- Deprecated. - use toHtml() instead
getParent
public org.htmlparser.tags.CompositeTag getParent()
- Get the parent of this tag
setParent
public void setParent(org.htmlparser.tags.CompositeTag tag)
- Sets the parent of this tag