org.htmlparser
Class NodeReader

java.lang.Object
  extended by java.io.Reader
      extended by java.io.BufferedReader
          extended by org.htmlparser.NodeReader

public class NodeReader
extends java.io.BufferedReader

NodeReader builds on BufferedReader, providing methods to read one element at a time.


Field Summary
static java.lang.String DECIPHER_ERROR
           
private  boolean dontReadNextLine
           
protected  java.lang.String line
           
private  int lineCount
           
private  org.htmlparser.util.NodeList nextParsedNode
           
protected  Node node
           
private  Parser parser
           
protected  int posInLine
           
private  java.lang.String previousLine
           
protected  org.htmlparser.scanners.TagScanner previousOpenScanner
           
private  RemarkNodeParser remarkNodeParser
           
private  org.htmlparser.parserHelper.StringParser stringParser
           
protected  java.lang.String url
           
 
Fields inherited from class java.io.BufferedReader
 
Fields inherited from class java.io.Reader
lock
 
Constructor Summary
NodeReader(java.io.Reader in, int len)
          This constructor mirrors the existing BufferedReader(Reader, int) constructor.
NodeReader(java.io.Reader in, int len, java.lang.String url)
          The constructor takes in a reader object, its length and the URL to be read.
NodeReader(java.io.Reader in, java.lang.String url)
          The constructor takes in a reader object, and the url to be read.
 
Method Summary
 void addNextParsedNode(Node nextParsedNode)
          Adds the given node on the front of an internal list of pre-parsed nodes.
 void appendLineDetails(java.lang.StringBuffer msgBuffer)
           
private  boolean beginTag(java.lang.String line, int pos)
          Returns true if the text at pos in line should be scanned as a tag.
 void changeLine(java.lang.String line)
          This method is intended to be called only by scanners, when a situation of dirty html has arisen, and action has been taken to correct the parsed tags.
 java.lang.String getCurrentLine()
           
 int getLastLineNumber()
          Gets the last line number that the reader has read.
 int getLastReadPosition()
          This method is useful when designing your own scanners.
 java.lang.String getLine()
          Returns the line.
 int getLineCount()
          Returns the lineCount.
static java.lang.String getLineSeparator()
          Gets the line separator that is being used.
 java.lang.String getNextLine()
           
 Parser getParser()
          Returns the parser object for which this reader exists
 java.lang.String getPreviousLine()
          Returns the previousLine.
 org.htmlparser.scanners.TagScanner getPreviousOpenScanner()
          Gets the previousOpenScanner.
 org.htmlparser.parserHelper.StringParser getStringParser()
           
 java.lang.String getURL()
          Get the url for this reader.
 boolean isDontReadNextLine()
           
 Node readElement()
          Read the next element
 Node readElement(boolean balance_quotes)
          Read the next element
protected  boolean readNextLine()
          Returns true if the next line needs to be read.
 void reset()
          Reset the stream to the point where the mark() method was called.
 void setDontReadNextLine(boolean dontReadNextLine)
           
 void setLineCount(int lineCount)
          Sets the lineCount.
static void setLineSeparator(java.lang.String lineSeparator)
           
 void setParser(Parser newParser)
          The setParser method is used by the parser to put its own object into the reader.
 void setPosInLine(int posInLine)
          Sets the posInLine.
 void setPreviousOpenScanner(org.htmlparser.scanners.TagScanner previousOpenScanner)
          Sets the previousOpenScanner.
 
Methods inherited from class java.io.BufferedReader
close, mark, markSupported, read, read, readLine, ready, skip
 
Methods inherited from class java.io.Reader
read
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DECIPHER_ERROR

public static final java.lang.String DECIPHER_ERROR
See Also:
Constant Field Values

posInLine

protected int posInLine

line

protected java.lang.String line

node

protected Node node

previousOpenScanner

protected org.htmlparser.scanners.TagScanner previousOpenScanner

url

protected java.lang.String url

parser

private Parser parser

lineCount

private int lineCount

previousLine

private java.lang.String previousLine

stringParser

private org.htmlparser.parserHelper.StringParser stringParser

remarkNodeParser

private RemarkNodeParser remarkNodeParser

nextParsedNode

private org.htmlparser.util.NodeList nextParsedNode

dontReadNextLine

private boolean dontReadNextLine

Constructor Detail

NodeReader

public NodeReader(java.io.Reader in,
                  int len,
                  java.lang.String url)
The constructor takes in a reader object, its length and the URL to be read.


NodeReader

public NodeReader(java.io.Reader in,
                  int len)
This constructor mirrors the existing BufferedReader(Reader, int) constructor. The URL defaults to an empty string.


NodeReader

public NodeReader(java.io.Reader in,
                  java.lang.String url)
The constructor takes in a reader object, and the url to be read. The buffer size defaults to 8192.
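
The defaults described for the three constructors can be pictured as simple constructor chaining. The following is only a sketch under assumptions: UrlReaderSketch is a hypothetical stand-in, not the NodeReader source, and it models nothing except the stated defaults (empty URL, buffer size 8192).

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-in for NodeReader; only the constructor defaults are modelled.
class UrlReaderSketch extends BufferedReader {
    // Default buffer size stated in the constructor description above.
    static final int DEFAULT_BUFFER_SIZE = 8192;

    private final String url;

    // Full form: reader, buffer length and URL.
    UrlReaderSketch(Reader in, int len, String url) {
        super(in, len);
        this.url = url;
    }

    // No URL given: it defaults to an empty string.
    UrlReaderSketch(Reader in, int len) {
        this(in, len, "");
    }

    // No length given: the buffer size defaults to 8192.
    UrlReaderSketch(Reader in, String url) {
        this(in, DEFAULT_BUFFER_SIZE, url);
    }

    String getURL() {
        return url;
    }

    public static void main(String[] args) throws Exception {
        try (UrlReaderSketch r = new UrlReaderSketch(new StringReader("<html></html>"), 1024)) {
            System.out.println("url=[" + r.getURL() + "] line=" + r.readLine());
        }
    }
}
```

Chaining the two shorter constructors to the full one keeps the defaults in a single place.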

Method Detail

getURL

public java.lang.String getURL()
Get the url for this reader.


changeLine

public void changeLine(java.lang.String line)
This method is intended to be called only by scanners, when a situation of dirty HTML has arisen, and action has been taken to correct the parsed tags. For example, if we have HTML of the form:
 
  <a href="somelink.html"><img src=...>
 
 <a href="someotherlink.html">...</a>
  
 
To salvage the first link, we would probably like to insert an end tag somewhere (typically before the second opening link tag). So that parsing continues uninterrupted, the line currently being parsed must be changed to contain that end tag.
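
The kind of correction a scanner might compute before calling changeLine can be sketched with plain string handling. This is an illustration under assumptions, not the library's code: ChangeLineSketch and insertEndTagBeforeSecond are hypothetical names, the image file name is made up, and matching is case-sensitive for simplicity.

```java
// Hypothetical helper showing how a corrected line could be built
// before it is handed to changeLine(String).
class ChangeLineSketch {

    /**
     * Inserts endTag immediately before the second occurrence of openTag.
     * Returns the line unchanged if openTag occurs fewer than two times.
     */
    static String insertEndTagBeforeSecond(String line, String openTag, String endTag) {
        int first = line.indexOf(openTag);
        if (first < 0) {
            return line;
        }
        int second = line.indexOf(openTag, first + openTag.length());
        if (second < 0) {
            return line;
        }
        return line.substring(0, second) + endTag + line.substring(second);
    }

    public static void main(String[] args) {
        String dirty = "<a href=\"somelink.html\"><img src=\"x.gif\">"
                + "<a href=\"someotherlink.html\">...</a>";
        // "<a " (with the trailing space) avoids matching tags such as <abbr>.
        System.out.println(insertEndTagBeforeSecond(dirty, "<a ", "</a>"));
    }
}
```

A scanner would then pass the returned string to changeLine so that the reader continues from the repaired line.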


getCurrentLine

public java.lang.String getCurrentLine()

getLastLineNumber

public int getLastLineNumber()
Gets the last line number that the reader has read.


getLastReadPosition

public int getLastReadPosition()
This method is useful when designing your own scanners. You might need to find out the position at which the reader last stopped.


getNextLine

public java.lang.String getNextLine()

getParser

public Parser getParser()
Returns the parser object for which this reader exists


getPreviousOpenScanner

public org.htmlparser.scanners.TagScanner getPreviousOpenScanner()
Gets the previousOpenScanner.


beginTag

private boolean beginTag(java.lang.String line,
                         int pos)
Returns true if the text at pos in line should be scanned as a tag. That is, an open angle bracket followed by a known special character or a letter.
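
The test described here can be sketched as follows. The set of "known special characters" is not listed on this page, so the characters '/', '!' and '%' used below are an assumption chosen for illustration, and BeginTagSketch is a hypothetical class rather than the private method itself.

```java
// Hypothetical sketch of the check described for beginTag(String, int).
class BeginTagSketch {

    /** True if line has '<' at pos followed by a letter or an assumed special character. */
    static boolean beginTag(String line, int pos) {
        // There must be at least one character after the open angle bracket.
        if (pos < 0 || pos + 1 >= line.length()) {
            return false;
        }
        if (line.charAt(pos) != '<') {
            return false;
        }
        char next = line.charAt(pos + 1);
        // Assumed special characters: '/' (end tag), '!' (remark or doctype), '%' (JSP-style tag).
        return Character.isLetter(next) || next == '/' || next == '!' || next == '%';
    }

    public static void main(String[] args) {
        System.out.println(beginTag("<a href=\"x\">", 0));
        System.out.println(beginTag("if (a < b)", 6));
    }
}
```

A check of this shape lets a stray "<" in ordinary text, such as in "a < b", pass through as string content instead of starting a tag.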


readElement

public Node readElement()
                 throws org.htmlparser.util.ParserException
Read the next element


readElement

public Node readElement(boolean balance_quotes)
                 throws org.htmlparser.util.ParserException
Read the next element


appendLineDetails

public void appendLineDetails(java.lang.StringBuffer msgBuffer)

readNextLine

protected boolean readNextLine()
Returns true if the next line needs to be read.


setParser

public void setParser(Parser newParser)
The setParser method is used by the parser to put its own object into the reader. This happens internally, so this method is not generally for use by the developer or the user.


setPreviousOpenScanner

public void setPreviousOpenScanner(org.htmlparser.scanners.TagScanner previousOpenScanner)
Sets the previousOpenScanner.


setLineSeparator

public static void setLineSeparator(java.lang.String lineSeparator)

getLineSeparator

public static java.lang.String getLineSeparator()
Gets the line separator that is being used.


getLineCount

public int getLineCount()
Returns the lineCount.


getPreviousLine

public java.lang.String getPreviousLine()
Returns the previousLine.


getLine

public java.lang.String getLine()
Returns the line.


setLineCount

public void setLineCount(int lineCount)
Sets the lineCount.


setPosInLine

public void setPosInLine(int posInLine)
Sets the posInLine.


reset

public void reset()
           throws java.io.IOException
Description copied from class: java.io.BufferedReader
Reset the stream to the point where the mark() method was called. Any chars that were read after the mark point was set will be re-read during subsequent reads.

This method will throw an IOException if the number of chars read from the stream since the call to mark() exceeds the mark limit passed when establishing the mark.
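
The inherited mark/reset contract can be demonstrated with a plain java.io.BufferedReader. This sketch does not use NodeReader at all; MarkResetSketch and readTwiceViaReset are hypothetical names, and the input is assumed to contain at least count characters.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Demonstrates the BufferedReader mark()/reset() behaviour described above.
class MarkResetSketch {

    /**
     * Marks the stream, reads count chars, resets, and reads count chars again.
     * Returns both reads joined by '|'. Assumes text has at least count chars.
     */
    static String readTwiceViaReset(String text, int count) throws IOException {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            reader.mark(count);               // mark limit: at most count chars may be read before reset()
            char[] first = new char[count];
            int n1 = reader.read(first, 0, count);
            reader.reset();                   // back to the marked position
            char[] second = new char[count];
            int n2 = reader.read(second, 0, count);
            return new String(first, 0, n1) + "|" + new String(second, 0, n2);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readTwiceViaReset("<html><body>", 6));
    }
}
```

Both reads return the same characters, because reset() rewinds to the marked point and the read count stays within the mark limit.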


getStringParser

public org.htmlparser.parserHelper.StringParser getStringParser()

addNextParsedNode

public void addNextParsedNode(Node nextParsedNode)
Adds the given node on the front of an internal list of pre-parsed nodes. Used in recursive calls where downstream nodes have been recognized in order to parse the current node.
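
The "front of an internal list" behaviour can be sketched with a deque. This is a simplified model under assumptions: PendingNodesSketch is hypothetical, nodes are represented as strings, and the idea that readElement drains the pre-parsed nodes before parsing further input is assumed rather than confirmed by this page.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical model of a reader that keeps a front-loaded list of pre-parsed nodes.
class PendingNodesSketch {
    private final Deque<String> pending = new ArrayDeque<>(); // pre-parsed nodes
    private final Deque<String> source;                       // stands in for unparsed input

    PendingNodesSketch(List<String> nodes) {
        this.source = new ArrayDeque<>(nodes);
    }

    // Adds the node at the front, so the most recently added node is returned first.
    void addNextParsedNode(String node) {
        pending.addFirst(node);
    }

    // Assumption: pre-parsed nodes are served before the remaining input; null means end of input.
    String readElement() {
        if (!pending.isEmpty()) {
            return pending.removeFirst();
        }
        return source.pollFirst();
    }

    public static void main(String[] args) {
        PendingNodesSketch reader = new PendingNodesSketch(Arrays.asList("c"));
        reader.addNextParsedNode("b");
        reader.addNextParsedNode("a");
        for (String n = reader.readElement(); n != null; n = reader.readElement()) {
            System.out.println(n);
        }
    }
}
```

Adding at the front means a recursive caller that recognized several downstream nodes pushes them in reverse order to have them read back in document order.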


isDontReadNextLine

public boolean isDontReadNextLine()

setDontReadNextLine

public void setDontReadNextLine(boolean dontReadNextLine)