org.htmlparser
Class Node

java.lang.Object
  extended by org.htmlparser.Node
All Implemented Interfaces:
java.io.Serializable
Direct Known Subclasses:
RemarkNode, StringNode

public abstract class Node
extends java.lang.Object
implements java.io.Serializable

Node is the abstract base class extended by all types of nodes (tags, string elements, etc.).


Field Summary
protected static java.lang.String lineSeparator
          Variable to store lineSeparator.
protected  int nodeBegin
          The beginning position of the tag in the line
protected  int nodeEnd
          The ending position of the tag in the line
protected  org.htmlparser.tags.CompositeTag parent
          The parent of this tag
 
Constructor Summary
Node(int nodeBegin, int nodeEnd)
           
Node(int nodeBegin, int nodeEnd, org.htmlparser.tags.CompositeTag parent)
           
 
Method Summary
abstract  void accept(org.htmlparser.visitors.NodeVisitor visitor)
           
 void collectInto(org.htmlparser.util.NodeList collectionList, java.lang.Class nodeType)
          Collect this node and its child nodes (if applicable) into the collection parameter, provided the node satisfies the filtering criteria.
abstract  void collectInto(org.htmlparser.util.NodeList collectionList, java.lang.String filter)
          Collect this node and its child nodes (if applicable) into the collection parameter, provided the node satisfies the filtering criteria.
 int elementBegin()
          Returns the beginning position of the tag.
 int elementEnd()
          Returns the ending position of the tag.
static java.lang.String getLineSeparator()
           
 org.htmlparser.tags.CompositeTag getParent()
          Get the parent of this tag
static void setLineSeparator(java.lang.String lineSeparator)
           
 void setParent(org.htmlparser.tags.CompositeTag tag)
          Sets the parent of this tag
abstract  java.lang.String toHtml()
          This method makes it easier to reproduce html pages (with or without modifications) when using html parser. Applications reproducing html can use this method on nodes which are to be used or transferred as they were received, with the original html.
 java.lang.String toHTML()
          Deprecated. Use toHtml() instead.
abstract  java.lang.String toPlainTextString()
          Returns a string representation of the node.
abstract  java.lang.String toString()
          Return the string representation of the node.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

nodeBegin

protected int nodeBegin
The beginning position of the tag in the line


nodeEnd

protected int nodeEnd
The ending position of the tag in the line


parent

protected org.htmlparser.tags.CompositeTag parent
The parent of this tag


lineSeparator

protected static java.lang.String lineSeparator
Variable to store lineSeparator. It is initialized from the line.separator system property, but it can also be changed using the mutator method setLineSeparator(). It is used by the toHTML() methods in all the subclasses of Node.
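
The pattern described here, a static default read from the line.separator system property plus a static mutator, might be sketched as follows. The class LineSeparatorHolder and its join() helper are hypothetical and are not part of org.htmlparser; only the accessor names mirror the ones documented on this page.

```java
// Hypothetical stand-in for the static line-separator handling described above.
// This is NOT the org.htmlparser.Node source, only an illustration of the pattern.
class LineSeparatorHolder {
    // Initialized once from the JVM's line.separator system property,
    // falling back to "\n" if the property is unavailable.
    protected static String lineSeparator = System.getProperty("line.separator", "\n");

    static String getLineSeparator() {
        return lineSeparator;
    }

    static void setLineSeparator(String lineSeparator) {
        LineSeparatorHolder.lineSeparator = lineSeparator;
    }

    // Hypothetical helper: joins lines with the currently configured separator,
    // the way a subclass might when rendering html.
    static String join(String... lines) {
        return String.join(lineSeparator, lines);
    }

    public static void main(String[] args) {
        // Override the platform default, for example to force CRLF output.
        setLineSeparator("\r\n");
        System.out.print(join("<html>", "</html>"));
    }
}
```

Because the field is static, a call to the mutator affects every node in the JVM, so it is probably best set once at start-up rather than per document.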

Constructor Detail

Node

public Node(int nodeBegin,
            int nodeEnd)

Node

public Node(int nodeBegin,
            int nodeEnd,
            org.htmlparser.tags.CompositeTag parent)
Method Detail

setLineSeparator

public static void setLineSeparator(java.lang.String lineSeparator)

getLineSeparator

public static java.lang.String getLineSeparator()

toPlainTextString

public abstract java.lang.String toPlainTextString()
Returns a string representation of the node. This is an important method: it allows a simple string transformation of a web page, regardless of the type of node.
Typical application code (for extracting only the text from a web page) would then be simplified to:
 Node node;
 for (NodeIterator e = parser.elements(); e.hasMoreNodes();) {
 	node = e.nextNode();
 	System.out.println(node.toPlainTextString()); // Or do whatever processing you wish with the plain text string
 }
 


toHtml

public abstract java.lang.String toHtml()
This method makes it easier to reproduce html pages (with or without modifications) when using html parser. Applications reproducing html can use this method on nodes which are to be used or transferred as they were received, with the original html.


toString

public abstract java.lang.String toString()
Return the string representation of the node. Subclasses must define this method. It is typically used as follows:
 System.out.println(node)
 


collectInto

public abstract void collectInto(org.htmlparser.util.NodeList collectionList,
                                 java.lang.String filter)
Collect this node and its child nodes (if applicable) into the collection parameter, provided the node satisfies the filtering criteria.

This mechanism allows powerful filtering code to be written very easily, without having to collect embedded tags separately. For example, when we try to get all the links on a page, it is not possible to get them all at the top level, as many tags (like form tags) can contain links embedded in them. We could get the links out by checking whether the current node is a form tag and going through its contents. However, this ties us down to specific tags, and is not a very clean approach.

Using collectInto(), programs get a lot shorter. Now, the code to extract all links from a page would look like:

 NodeList collectionList = new NodeList();
 Node node;
 String filter = LinkTag.LINK_TAG_FILTER;
 for (NodeIterator e = parser.elements(); e.hasMoreNodes();) {
 	node = e.nextNode();
 	node.collectInto(collectionList, filter);
 }
 
Thus, collectionList will hold all the link nodes, irrespective of how deep the links are embedded. This of course implies that tags must fulfill their responsibilities toward honouring certain filters.

Important: To keep performance optimal, do not create your own filter strings, as the internal matching occurs against the pre-existing filter string object (in the relevant class). That is, do not make calls like collectInto(collectionList, "-l"); instead, make calls only like collectInto(collectionList, LinkTag.LINK_TAG_FILTER).

To find out if your desired tag has filtering support, check the API of the tag.
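
The advice to pass the pre-existing filter constant can be illustrated with a small sketch. It assumes the internal match is a reference comparison against the shared constant; that is a guess at the mechanism, not something this page confirms. FilterIdentityDemo, LINK_FILTER and matches() are hypothetical names and do not exist in org.htmlparser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names belong to org.htmlparser.
class FilterIdentityDemo {
    // Shared filter constant, analogous in role to LinkTag.LINK_TAG_FILTER.
    static final String LINK_FILTER = "-l";

    // ASSUMPTION: matching is done by reference (==), which is cheap but only
    // succeeds when the caller passes the shared constant object itself.
    static boolean matches(String filter) {
        return filter == LINK_FILTER;
    }

    // Adds the node name to the list only when the filter matches.
    static void collectInto(List<String> collected, String nodeName, String filter) {
        if (matches(filter)) {
            collected.add(nodeName);
        }
    }

    public static void main(String[] args) {
        List<String> collected = new ArrayList<>();
        // Same object as the constant: the node is collected.
        collectInto(collected, "link-1", LINK_FILTER);
        // Equal text but a distinct object built at run time: the node is skipped.
        collectInto(collected, "link-2", new String("-l"));
        System.out.println(collected); // prints [link-1]
    }
}
```

Under that assumption, a filter string assembled at run time would silently match nothing, which is one more reason to pass the constant from the relevant tag class.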


collectInto

public void collectInto(org.htmlparser.util.NodeList collectionList,
                        java.lang.Class nodeType)
Collect this node and its child nodes (if applicable) into the collection parameter, provided the node satisfies the filtering criteria.

This mechanism allows powerful filtering code to be written very easily, without having to collect embedded tags separately. For example, when we try to get all the links on a page, it is not possible to get them all at the top level, as many tags (like form tags) can contain links embedded in them. We could get the links out by checking whether the current node is a form tag and going through its contents. However, this ties us down to specific tags, and is not a very clean approach.

Using collectInto(), programs get a lot shorter. Now, the code to extract all links from a page would look like:

 NodeList collectionList = new NodeList();
 Node node;
 for (NodeIterator e = parser.elements(); e.hasMoreNodes();) {
 	node = e.nextNode();
 	node.collectInto(collectionList, LinkTag.class);
 }
 
Thus, collectionList will hold all the link nodes, irrespective of how deep the links are embedded.
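
How a class-based collectInto() can reach links at any depth may be easier to see in a miniature model. The sketch below is not the org.htmlparser implementation; MiniNode, MiniText, MiniLink and MiniComposite are hypothetical classes that only imitate the idea of a leaf node adding itself when it matches and a composite node additionally delegating to its children.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical miniature node tree; these are NOT the org.htmlparser classes.
class CollectByClassDemo {
    static abstract class MiniNode {
        // A leaf adds itself when it is an instance of the requested type.
        void collectInto(List<MiniNode> out, Class<?> nodeType) {
            if (nodeType.isInstance(this)) {
                out.add(this);
            }
        }
    }

    static class MiniText extends MiniNode {}

    static class MiniLink extends MiniNode {}

    // A container node, playing the role of a composite tag such as a form.
    static class MiniComposite extends MiniNode {
        final List<MiniNode> children = new ArrayList<>();

        MiniComposite add(MiniNode child) {
            children.add(child);
            return this;
        }

        @Override
        void collectInto(List<MiniNode> out, Class<?> nodeType) {
            super.collectInto(out, nodeType);   // the container itself may match
            for (MiniNode child : children) {   // then recurse into its children
                child.collectInto(out, nodeType);
            }
        }
    }

    // Builds a small tree with three links at different depths and collects them.
    static List<MiniNode> sampleLinks() {
        MiniComposite form = new MiniComposite()
                .add(new MiniText())
                .add(new MiniLink())
                .add(new MiniComposite().add(new MiniLink()));
        List<MiniNode> topLevel = Arrays.asList(new MiniLink(), form);

        List<MiniNode> links = new ArrayList<>();
        for (MiniNode node : topLevel) {
            node.collectInto(links, MiniLink.class);
        }
        return links;
    }

    public static void main(String[] args) {
        System.out.println(sampleLinks().size()); // prints 3
    }
}
```

The calling loop never inspects the container type; each node decides for itself whether it matches and whether it has children to visit.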


elementBegin

public int elementBegin()
Returns the beginning position of the tag.


elementEnd

public int elementEnd()
Returns the ending position of the tag.


accept

public abstract void accept(org.htmlparser.visitors.NodeVisitor visitor)

toHTML

public final java.lang.String toHTML()
Deprecated. Use toHtml() instead.


getParent

public org.htmlparser.tags.CompositeTag getParent()
Get the parent of this tag


setParent

public void setParent(org.htmlparser.tags.CompositeTag tag)
Sets the parent of this tag