org.htmlparser
Class NodeReader

java.lang.Object
  java.io.Reader
    java.io.BufferedReader
      org.htmlparser.NodeReader
- public class NodeReader
- extends java.io.BufferedReader
NodeReader builds on BufferedReader, providing methods to read one element at a time.
| Field Summary | |
static java.lang.String |
DECIPHER_ERROR
|
private boolean |
dontReadNextLine
|
protected java.lang.String |
line
|
private int |
lineCount
|
private org.htmlparser.util.NodeList |
nextParsedNode
|
protected Node |
node
|
private Parser |
parser
|
protected int |
posInLine
|
private java.lang.String |
previousLine
|
protected org.htmlparser.scanners.TagScanner |
previousOpenScanner
|
private RemarkNodeParser |
remarkNodeParser
|
private org.htmlparser.parserHelper.StringParser |
stringParser
|
protected java.lang.String |
url
|
| Fields inherited from class java.io.BufferedReader |
|
| Fields inherited from class java.io.Reader |
lock |
| Constructor Summary | |
NodeReader(java.io.Reader in,
int len)
This constructor corresponds to the existing BufferedReader(Reader, int) constructor. |
|
NodeReader(java.io.Reader in,
int len,
java.lang.String url)
The constructor takes in a reader object, its length and the url to be read. |
|
NodeReader(java.io.Reader in,
java.lang.String url)
The constructor takes in a reader object, and the url to be read. |
|
| Method Summary | |
void |
addNextParsedNode(Node nextParsedNode)
Adds the given node on the front of an internal list of pre-parsed nodes. |
void |
appendLineDetails(java.lang.StringBuffer msgBuffer)
|
private boolean |
beginTag(java.lang.String line,
int pos)
Returns true if the text at pos in line
should be scanned as a tag. |
void |
changeLine(java.lang.String line)
This method is intended to be called only by scanners, when dirty HTML has been encountered and action has been taken to correct the parsed tags. |
java.lang.String |
getCurrentLine()
|
int |
getLastLineNumber()
Get the last line number that the reader has read |
int |
getLastReadPosition()
This method is useful when designing your own scanners. |
java.lang.String |
getLine()
Returns the line. |
int |
getLineCount()
Returns the lineCount. |
static java.lang.String |
getLineSeparator()
Gets the line separator that is being used. |
java.lang.String |
getNextLine()
|
Parser |
getParser()
Returns the parser object for which this reader exists |
java.lang.String |
getPreviousLine()
Returns the previousLine. |
org.htmlparser.scanners.TagScanner |
getPreviousOpenScanner()
Gets the previousOpenScanner. |
org.htmlparser.parserHelper.StringParser |
getStringParser()
|
java.lang.String |
getURL()
Get the url for this reader. |
boolean |
isDontReadNextLine()
|
Node |
readElement()
Read the next element |
Node |
readElement(boolean balance_quotes)
Read the next element |
protected boolean |
readNextLine()
Returns whether the next line needs to be read. |
void |
reset()
Reset the stream to the point where the mark() method
was called. |
void |
setDontReadNextLine(boolean dontReadNextLine)
|
void |
setLineCount(int lineCount)
Sets the lineCount. |
static void |
setLineSeparator(java.lang.String lineSeparator)
|
void |
setParser(Parser newParser)
The setParser method is used by the parser to put its own object into the reader. |
void |
setPosInLine(int posInLine)
Sets the posInLine. |
void |
setPreviousOpenScanner(org.htmlparser.scanners.TagScanner previousOpenScanner)
Sets the previousOpenScanner. |
| Methods inherited from class java.io.BufferedReader |
close, mark, markSupported, read, read, readLine, ready, skip |
| Methods inherited from class java.io.Reader |
read |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
DECIPHER_ERROR
public static final java.lang.String DECIPHER_ERROR
- See Also:
- Constant Field Values
posInLine
protected int posInLine
line
protected java.lang.String line
node
protected Node node
previousOpenScanner
protected org.htmlparser.scanners.TagScanner previousOpenScanner
url
protected java.lang.String url
parser
private Parser parser
lineCount
private int lineCount
previousLine
private java.lang.String previousLine
stringParser
private org.htmlparser.parserHelper.StringParser stringParser
remarkNodeParser
private RemarkNodeParser remarkNodeParser
nextParsedNode
private org.htmlparser.util.NodeList nextParsedNode
dontReadNextLine
private boolean dontReadNextLine
| Constructor Detail |
NodeReader
public NodeReader(java.io.Reader in, int len, java.lang.String url)
- The constructor takes in a reader object, its length and the url to be
read.
NodeReader
public NodeReader(java.io.Reader in, int len)
- This constructor corresponds to the existing BufferedReader(Reader, int)
constructor. The URL defaults to an empty string.
NodeReader
public NodeReader(java.io.Reader in, java.lang.String url)
- The constructor takes in a reader object, and the url to be read. The
buffer size defaults to 8192.
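The defaults described above can be pictured with a small stand-in class. This is a sketch, not the NodeReader source: `UrlTaggedReader` and `DEFAULT_BUFFER_SIZE` are hypothetical names, and the delegation between constructors is an assumption. It only shows how the two shorter constructors could fall back to the documented defaults (empty URL, buffer size 8192).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-in for illustration; not the actual NodeReader source.
class UrlTaggedReader extends BufferedReader {
    // Default buffer size stated in the constructor documentation.
    static final int DEFAULT_BUFFER_SIZE = 8192;

    private final String url;

    UrlTaggedReader(Reader in, int len, String url) {
        super(in, len); // BufferedReader(Reader, int): len is the buffer size
        this.url = url;
    }

    UrlTaggedReader(Reader in, int len) {
        this(in, len, ""); // URL defaults to an empty string
    }

    UrlTaggedReader(Reader in, String url) {
        this(in, DEFAULT_BUFFER_SIZE, url); // buffer size defaults to 8192
    }

    String getURL() {
        return url;
    }

    public static void main(String[] args) throws IOException {
        try (UrlTaggedReader reader = new UrlTaggedReader(
                new StringReader("<html></html>"), "http://example.com/")) {
            System.out.println(reader.getURL() + " -> " + reader.readLine());
        }
    }
}
```

Delegating to a single full constructor keeps the default values in one place.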
| Method Detail |
getURL
public java.lang.String getURL()
- Get the url for this reader.
changeLine
public void changeLine(java.lang.String line)
- This method is intended to be called only by scanners, when dirty HTML
has been encountered and action has been taken to correct the parsed
tags. For example, given HTML of the form:
<a href="somelink.html"><img src=...>
<a href="someotherlink.html">...</a>
To salvage the first link, we would probably like to insert an end tag somewhere (typically before the second begin link tag). So that parsing continues uninterrupted, the line currently being parsed must be changed to contain the end tag.
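A minimal sketch of the repair described above, assuming both link tags sit on the same line and the unclosed link starts at index 0. `LineRepairSketch`, `insertEndTag` and `closeFirstLink` are hypothetical helpers, not part of htmlparser; a scanner would pass a string like their result to changeLine.

```java
// Hypothetical helpers for illustration; not part of the htmlparser API.
class LineRepairSketch {
    /** Returns a copy of line with endTag inserted at offset pos. */
    static String insertEndTag(String line, int pos, String endTag) {
        if (pos < 0 || pos > line.length()) {
            throw new IllegalArgumentException("pos out of range: " + pos);
        }
        return line.substring(0, pos) + endTag + line.substring(pos);
    }

    /**
     * Assumes the unclosed link starts at index 0. Looks for the next
     * "<a " (case-sensitive) and inserts "</a>" just before it. Returns
     * the line unchanged when no second link tag is found.
     */
    static String closeFirstLink(String line) {
        int second = line.indexOf("<a ", 1);
        return second < 0 ? line : insertEndTag(line, second, "</a>");
    }

    public static void main(String[] args) {
        String dirty = "<a href=\"somelink.html\"><img src=\"x.gif\">"
                + "<a href=\"someotherlink.html\">text</a>";
        System.out.println(closeFirstLink(dirty));
    }
}
```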
getCurrentLine
public java.lang.String getCurrentLine()
getLastLineNumber
public int getLastLineNumber()
- Get the last line number that the reader has read
getLastReadPosition
public int getLastReadPosition()
- This method is useful when designing your own scanners. You might need to
find out where the reader last stopped.
getNextLine
public java.lang.String getNextLine()
getParser
public Parser getParser()
- Returns the parser object for which this reader exists
getPreviousOpenScanner
public org.htmlparser.scanners.TagScanner getPreviousOpenScanner()
- Gets the previousOpenScanner.
beginTag
private boolean beginTag(java.lang.String line, int pos)
- Returns true if the text at pos in line should be scanned as a tag. Basically an open angle followed by a known special character or a letter.
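The check described above might look roughly like the following sketch. The documentation does not list the "known special characters", so the set used here ('/', '!' and '%') is an assumption, and `BeginTagSketch` is a hypothetical class rather than the library's private method.

```java
// Sketch only: the set of "known special characters" is an assumption.
class BeginTagSketch {
    /** Returns true if line has '<' at pos followed by a letter or an assumed special character. */
    static boolean beginTag(String line, int pos) {
        // Need '<' at pos plus at least one following character.
        if (pos < 0 || pos + 2 > line.length() || line.charAt(pos) != '<') {
            return false;
        }
        char next = line.charAt(pos + 1);
        return Character.isLetter(next) || next == '/' || next == '!' || next == '%';
    }

    public static void main(String[] args) {
        System.out.println(beginTag("<a href=\"x\">", 0)); // letter after '<'
        System.out.println(beginTag("1 < 2", 2));          // space after '<'
    }
}
```

A test like this lets a stray "<" in text, such as in "1 < 2", be treated as plain text rather than the start of a tag.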
readElement
public Node readElement() throws org.htmlparser.util.ParserException
- Read the next element
readElement
public Node readElement(boolean balance_quotes) throws org.htmlparser.util.ParserException
- Read the next element
appendLineDetails
public void appendLineDetails(java.lang.StringBuffer msgBuffer)
readNextLine
protected boolean readNextLine()
- Returns whether the next line needs to be read.
setParser
public void setParser(Parser newParser)
- The setParser method is used by the parser to put its own object into the
reader. This happens internally, so this method is not generally for use
by the developer or the user.
setPreviousOpenScanner
public void setPreviousOpenScanner(org.htmlparser.scanners.TagScanner previousOpenScanner)
- Sets the previousOpenScanner.
setLineSeparator
public static void setLineSeparator(java.lang.String lineSeparator)
getLineSeparator
public static java.lang.String getLineSeparator()
- Gets the line separator that is being used.
getLineCount
public int getLineCount()
- Returns the lineCount.
getPreviousLine
public java.lang.String getPreviousLine()
- Returns the previousLine.
getLine
public java.lang.String getLine()
- Returns the line.
setLineCount
public void setLineCount(int lineCount)
- Sets the lineCount.
setPosInLine
public void setPosInLine(int posInLine)
- Sets the posInLine.
reset
public void reset() throws java.io.IOException
- Description copied from class: java.io.BufferedReader
- Reset the stream to the point where the mark() method was called. Any chars that were read after the mark point was set will be re-read during subsequent reads. This method will throw an IOException if the number of chars read from the stream since the call to mark() exceeds the mark limit passed when establishing the mark.
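The mark/reset behavior described above can be seen with a plain java.io.BufferedReader. This sketch does not use NodeReader itself, and `MarkResetSketch` and `readTwice` are hypothetical names introduced for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class MarkResetSketch {
    /** Reads up to n chars, resets to the mark, then reads up to n chars again. */
    static String[] readTwice(String text, int n) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            reader.mark(n); // mark limit: up to n chars may be re-read
            char[] first = new char[n];
            int got1 = reader.read(first, 0, n);
            reader.reset(); // back to the marked position
            char[] second = new char[n];
            int got2 = reader.read(second, 0, n);
            return new String[] {
                new String(first, 0, Math.max(got1, 0)),
                new String(second, 0, Math.max(got2, 0))
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] reads = readTwice("<html><body>", 6);
        System.out.println(reads[0] + " / " + reads[1]);
    }
}
```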
getStringParser
public org.htmlparser.parserHelper.StringParser getStringParser()
addNextParsedNode
public void addNextParsedNode(Node nextParsedNode)
- Adds the given node on the front of an internal list of pre-parsed nodes.
Used in recursive calls where downstream nodes have been recognized in
order to parse the current node.
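One way to picture the front-of-list behavior described above is a deque that is drained before the underlying source is consulted. This is a sketch under stated assumptions: `PreParsedQueueSketch` is hypothetical, strings stand in for Node objects, and the rule that readElement serves pre-parsed nodes first is an assumption, not something the documentation confirms.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;

// Hypothetical stand-in; strings play the role of Node objects.
class PreParsedQueueSketch {
    private final Deque<String> nextParsed = new ArrayDeque<>();
    private final Iterator<String> source;

    PreParsedQueueSketch(Iterator<String> source) {
        this.source = source;
    }

    /** Puts a node at the front, so it is returned before older pre-parsed nodes. */
    void addNextParsedNode(String node) {
        nextParsed.addFirst(node);
    }

    /** Returns a waiting pre-parsed node if any, else the next source node, else null. */
    String readElement() {
        if (!nextParsed.isEmpty()) {
            return nextParsed.removeFirst();
        }
        return source.hasNext() ? source.next() : null;
    }

    public static void main(String[] args) {
        PreParsedQueueSketch reader =
                new PreParsedQueueSketch(Arrays.asList("<p>", "text").iterator());
        reader.addNextParsedNode("</a>");
        for (String n = reader.readElement(); n != null; n = reader.readElement()) {
            System.out.println(n);
        }
    }
}
```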
isDontReadNextLine
public boolean isDontReadNextLine()
setDontReadNextLine
public void setDontReadNextLine(boolean dontReadNextLine)
