gnu.javax.swing.text.html.parser.support
Class Parser

java.lang.Object
gnu.javax.swing.text.html.parser.support.low.Constants
gnu.javax.swing.text.html.parser.support.low.ReaderTokenizer
gnu.javax.swing.text.html.parser.support.Parser
- All Implemented Interfaces:
- javax.swing.text.html.parser.DTDConstants
- public class Parser
- extends gnu.javax.swing.text.html.parser.support.low.ReaderTokenizer
- implements javax.swing.text.html.parser.DTDConstants
A simple error-tolerant HTML parser that uses a DTD document to access data on the possible tokens, arguments and syntax.
The parser reads HTML content from a Reader and calls notification methods (which should be overridden in a subclass) when tags or data are encountered.
Some HTML elements need no opening or closing tags. This parser also invokes the tag-handling methods when the tags are not explicitly specified and must be inferred from the information stored in the DTD. For example, parsing the document
<table><tr><td>a<td>b<td>c</tr>
will invoke the handling methods in exactly the same order
(and with the same parameters) as if parsing the document:
<html><head></head><body><table><tbody><tr><td>a</td><td>b</td><td>c</td></tr></tbody></table></body></html>
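The callback style described above can be tried out with the JDK's own javax.swing.text.html.parser.ParserDelegator. This is a different implementation from the GNU Classpath class documented here, swapped in only because it ships with a standard JDK; its exact event sequence may differ from this parser's. The class name ImpliedTagDemo and the "start:", "end:" and "text:" labels are made up for this sketch.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical demo class: records the notifications fired while parsing.
class ImpliedTagDemo {

    // Parses the given HTML and returns the recorded events in call order.
    static List<String> events(String html) {
        final List<String> events = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                events.add("start:" + t);
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                events.add("end:" + t);
            }

            @Override
            public void handleText(char[] data, int pos) {
                events.add("text:" + new String(data));
            }
        };
        try {
            // 'true' tells the JDK parser to ignore charset declarations.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        // The input omits the closing td tags and the enclosing document tags.
        for (String event : events("<table><tr><td>a<td>b<td>c</tr>")) {
            System.out.println(event);
        }
    }
}
```

Running main on the abbreviated document and on the fully tagged one is a quick way to compare which start and end notifications the parser supplies on its own.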
| Field Summary | |
(package private) gnu.javax.swing.text.html.parser.htmlAttributeSet |
attributes
The attributes of the current HTML element. |
private java.lang.StringBuffer |
buffer
The buffer to collect the incremental output like text or comment. |
private parameterDefaulter |
defaulter
Provides the default values for parameters in the case when these values are defined in the DTD. |
private java.util.Set |
documentTags
The set of the document tags. |
protected javax.swing.text.html.parser.DTD |
dtd
The document template description that will be used to parse the documents. |
gnu.javax.swing.text.html.parser.support.low.Token |
hTag
The current html tag. |
protected int |
preformatted
This field has a positive value inside preformatted tags. |
protected boolean |
strict
The value of this field determines whether or not the Parser will be strict in enforcing SGML compatibility. |
private gnu.javax.swing.text.html.parser.support.low.Token |
t
The current token. |
private textPreProcessor |
textProcessor
The text pre-processor for handling line ends and tabs. |
private java.lang.StringBuffer |
title
The buffer to store the document title. |
private boolean |
titleHandled
True means that the 'title' tag of this document has already been handled. |
private boolean |
titleOpen
True means that the 'title' tag is currently open and all text is also added to the title buffer. |
private gnu.javax.swing.text.html.parser.htmlValidator |
validator
The validator, controlling the forcible closing of the tags that (in accordance with the DTD) are not allowed in the current context. |
| Fields inherited from class gnu.javax.swing.text.html.parser.support.low.ReaderTokenizer |
advanced, backupMode |
| Fields inherited from class gnu.javax.swing.text.html.parser.support.low.Constants |
AP, bDIGIT, BEGIN, bLETTER, bLINEBREAK, bNAME, bQUOTING, bSINGLE_CHAR_TOKEN, bSPECIAL, bWHITESPACE, COMMENT_END, COMMENT_OPEN, COMMENT_TRIPLEDASH_END, DOUBLE_DASH, END, ENTITY, ENTITY_NAMED, ENTITY_NUMERIC, EOF, EQ, EXCLAMATION, NUMTOKEN, OTHER, QUOT, SCRIPT, SCRIPT_CLOSE, SCRIPT_OPEN, SGML, SLASH, STYLE, STYLE_CLOSE, STYLE_OPEN, TAG, WS |
| Fields inherited from interface javax.swing.text.html.parser.DTDConstants |
ANY, CDATA, CONREF, CURRENT, DEFAULT, EMPTY, ENDTAG, ENTITIES, ENTITY, FIXED, GENERAL, ID, IDREF, IDREFS, IMPLIED, MD, MODEL, MS, NAME, NAMES, NMTOKEN, NMTOKENS, NOTATION, NUMBER, NUMBERS, NUTOKEN, NUTOKENS, PARAMETER, PI, PUBLIC, RCDATA, REQUIRED, SDATA, STARTTAG, SYSTEM |
| Constructor Summary | |
Parser(javax.swing.text.html.parser.DTD a_dtd)
Creates a new Parser that uses the given javax.swing.text.html.parser.DTD. |
|
| Method Summary | |
private void |
_handleCompleteElement(javax.swing.text.html.parser.TagElement tag)
Handle a complete element, when the tag content is already present in the buffer and both the starting and ending tags are behind. |
private void |
_handleEmptyTag(javax.swing.text.html.parser.TagElement tag)
A hook for operations preceding the call to handleEmptyTag(). |
(package private) void |
_handleEndTag_remaining(javax.swing.text.html.parser.TagElement tag)
Actions that are also required if the closing action was initiated by the tag validator. |
private void |
_handleEndTag(javax.swing.text.html.parser.TagElement tag)
A hook for operations preceding the call to handleEndTag(). |
(package private) void |
_handleStartTag(javax.swing.text.html.parser.TagElement tag)
A hook for operations preceding the call to handleStartTag(). |
protected void |
_handleText()
A hook for operations preceding the call to handleText. |
protected void |
append(gnu.javax.swing.text.html.parser.support.low.Token t)
Add the image of this token to the buffer. |
protected void |
CDATA(boolean clearBuffer)
Read parseable character data, add to buffer. |
protected void |
Comment()
Process Comment. |
protected void |
consume(gnu.javax.swing.text.html.parser.support.low.pattern p)
Consume pattern that must match. |
protected void |
endTag(boolean omitted)
The method is called when the HTML end (closing) tag is found or if the parser concludes that one should be present in the current position. |
void |
error(java.lang.String msg)
Invokes the error handler. |
void |
error(java.lang.String msg,
java.lang.String invalid)
Invokes the error handler. |
void |
error(java.lang.String parm1,
java.lang.String parm2,
java.lang.String parm3)
Invokes the error handler. |
void |
error(java.lang.String parm1,
java.lang.String parm2,
java.lang.String parm3,
java.lang.String parm4)
Invokes the error handler. |
void |
error(java.lang.String msg,
gnu.javax.swing.text.html.parser.support.low.Token atToken)
Invokes the error handler. |
void |
flushAttributes()
|
private void |
forciblyCloseTheTag()
Resume parsing after heavy errors in HTML tag structure. |
gnu.javax.swing.text.html.parser.htmlAttributeSet |
getAttributes()
Get the attributes of the current tag. |
protected int |
getCurrentLine()
Get the first line of the last parsed token. |
private void |
handleComment()
Handle comment in string buffer. |
protected void |
handleComment(char[] comment)
Handle HTML comment. |
protected void |
handleEmptyTag(javax.swing.text.html.parser.TagElement tag)
Handle the tag with no content, like <br>. |
protected void |
handleEndTag(javax.swing.text.html.parser.TagElement tag)
The method is called when the HTML closing tag (like </table>) is found or if the parser concludes that one should be present in the current position. |
protected void |
handleEOFInComment()
This is additionally called when the HTML content terminates without closing the HTML comment. |
protected void |
handleError(int line,
java.lang.String message)
|
protected void |
handleStartTag(javax.swing.text.html.parser.TagElement tag)
The method is called when the HTML opening tag (like <table>) is found or if the parser concludes that one should be present in the current position. |
protected void |
handleText(char[] text)
Handle the text section. |
protected void |
handleTitle(char[] title)
Handle HTML <title> tag. |
protected javax.swing.text.html.parser.TagElement |
makeTag(javax.swing.text.html.parser.Element element)
Constructs the tag from the given element. |
protected javax.swing.text.html.parser.TagElement |
makeTag(javax.swing.text.html.parser.Element element,
boolean isSupposed)
Constructs the tag from the given element. |
private javax.swing.text.html.parser.TagElement |
makeTagElement(java.lang.String name,
boolean isSupposed)
|
protected void |
markFirstTime(javax.swing.text.html.parser.Element element)
This is called when the tag representing the given element occurs for the first time in the document. |
protected gnu.javax.swing.text.html.parser.support.low.Token |
mustBe(int kind)
Consume the token that was checked before and hence MUST be present. |
protected void |
noValueAttribute(java.lang.String element,
java.lang.String attribute)
Handle attribute without value. |
protected gnu.javax.swing.text.html.parser.support.low.Token |
optional(int kind)
Consume the optional token, if present. |
void |
parse(java.io.Reader reader)
Parse the HTML text, calling various methods in response to the occurrence of the corresponding HTML constructions. |
protected void |
parseDocument()
Parse the html document. |
java.lang.String |
parseDTDMarkup()
Parses DTD markup declaration. |
boolean |
parseMarkupDeclarations(java.lang.StringBuffer strBuff)
Parse SGML insertion ( <! ... |
protected void |
readAttributes(java.lang.String element)
Read the element attributes, adding them into attribute set. |
private void |
readTillTokenE(int till)
Read till the given token, resolving entities. |
private void |
resolveAndAppendEntity(gnu.javax.swing.text.html.parser.support.low.Token entity)
Resolve the entity and append it to the end of buffer. |
protected java.lang.String |
resolveNamedEntity(java.lang.String a_tag)
Return the string corresponding to the given named entity. |
protected char |
resolveNumericEntity(java.lang.String a_tag)
Return the char corresponding to the given numeric entity. |
protected void |
restart()
Reset all fields to the initial default state, preparing the parser for parsing the next document. |
private void |
restOfTag(boolean closing,
gnu.javax.swing.text.html.parser.support.low.Token name,
gnu.javax.swing.text.html.parser.support.low.Token start)
Handle the remainder of HTML tags. |
protected void |
Script()
Read a script. |
protected void |
Sgml()
Process SGML insertion that is not a comment. |
private void |
startingTag(javax.swing.text.html.parser.TagElement tag)
This should fire additional actions in response to the ChangedCharSetException. |
protected void |
startTag(javax.swing.text.html.parser.TagElement tag)
The method is called when the HTML opening tag (like <table>) is found or if the parser concludes that one should be present in the current position. |
protected void |
Style()
Read a style definition. |
protected void |
Tag()
Read a html tag. |
private void |
ws_error()
|
| Methods inherited from class gnu.javax.swing.text.html.parser.support.low.ReaderTokenizer |
getEndOfLineSequence, getNextToken, getTokenAhead, getTokenAhead, mark, reset, reset |
| Methods inherited from class gnu.javax.swing.text.html.parser.support.low.Constants |
endMatches |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
hTag
public gnu.javax.swing.text.html.parser.support.low.Token hTag
- The current html tag.
dtd
protected javax.swing.text.html.parser.DTD dtd
- The document template description that will be used to parse the documents.
strict
protected boolean strict
- The value of this field determines whether or not the Parser will be
strict in enforcing SGML compatibility. The default value is false,
stating that the parser should do everything to parse and get at least
some information even from the incorrectly written HTML input.
preformatted
protected int preformatted
- This field has a positive value inside preformatted tags.
documentTags
private java.util.Set documentTags
- The set of the document tags. This field is used for supporting
markFirstTime().
buffer
private java.lang.StringBuffer buffer
- The buffer to collect the incremental output like text or comment.
title
private java.lang.StringBuffer title
- The buffer to store the document title.
t
private gnu.javax.swing.text.html.parser.support.low.Token t
- The current token.
titleHandled
private boolean titleHandled
- True means that the 'title' tag of this document has
already been handled.
titleOpen
private boolean titleOpen
- True means that the 'title' tag is currently open and all
text is also added to the title buffer.
attributes
gnu.javax.swing.text.html.parser.htmlAttributeSet attributes
- The attributes of the current HTML element.
Package-private to avoid an accessor method.
validator
private gnu.javax.swing.text.html.parser.htmlValidator validator
- The validator, controlling the forcible closing of the tags that
(in accordance with the DTD) are not allowed in the current context.
defaulter
private parameterDefaulter defaulter
- Provides the default values for parameters in the case when these
values are defined in the DTD.
textProcessor
private textPreProcessor textProcessor
- The text pre-processor for handling line ends and tabs.
| Constructor Detail |
Parser
public Parser(javax.swing.text.html.parser.DTD a_dtd)
- Creates a new Parser that uses the given
javax.swing.text.html.parser.DTD. The only standard way
to get an instance of DTD is to construct it manually, filling in
all required fields.
| Method Detail |
getAttributes
public gnu.javax.swing.text.html.parser.htmlAttributeSet getAttributes()
- Get the attributes of the current tag.
error
public void error(java.lang.String msg)
- Invokes the error handler. The default method in this implementation
delegates the call to handleError, also providing the current line.
error
public void error(java.lang.String msg, gnu.javax.swing.text.html.parser.support.low.Token atToken)
- Description copied from class:
gnu.javax.swing.text.html.parser.support.low.ReaderTokenizer - Invokes the error handler.
error
public void error(java.lang.String msg, java.lang.String invalid)
- Invokes the error handler. The default method in this implementation
delegates the call to error (msg+": '"+invalid+"'").
error
public void error(java.lang.String parm1, java.lang.String parm2, java.lang.String parm3)
- Invokes the error handler. The default method in this implementation
delegates the call to error (parm1+" "+ parm2+" "+ parm3).
error
public void error(java.lang.String parm1, java.lang.String parm2, java.lang.String parm3, java.lang.String parm4)
- Invokes the error handler. The default method in this implementation
delegates the call to error (parm1+" "+ parm2+" "+ parm3+" "+ parm4).
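The delegation between the error overloads described above can be sketched as follows. This is a minimal stand-alone illustration, not the actual source: the class name ErrorChainSketch, the currentLine field (standing in for getCurrentLine()) and the "line: message" format used inside handleError are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone model of the overload chain; not the real Parser.
class ErrorChainSketch {
    // Collected reports, so the effect of each call can be inspected.
    final List<String> reported = new ArrayList<>();

    // Assumed stand-in for the value returned by getCurrentLine().
    int currentLine = 1;

    // Final sink of the chain; the "line: message" format is an assumption.
    void handleError(int line, String message) {
        reported.add(line + ": " + message);
    }

    // Single-argument form: adds the current line and delegates to handleError.
    void error(String msg) {
        handleError(currentLine, msg);
    }

    // Two-argument form: quotes the invalid fragment after the message.
    void error(String msg, String invalid) {
        error(msg + ": '" + invalid + "'");
    }

    // Three- and four-argument forms: join the parts with single spaces.
    void error(String parm1, String parm2, String parm3) {
        error(parm1 + " " + parm2 + " " + parm3);
    }

    void error(String parm1, String parm2, String parm3, String parm4) {
        error(parm1 + " " + parm2 + " " + parm3 + " " + parm4);
    }

    public static void main(String[] args) {
        ErrorChainSketch sketch = new ErrorChainSketch();
        sketch.currentLine = 7;
        sketch.error("Unknown tag", "blink");
        sketch.error("Unexpected", "end", "tag");
        for (String line : sketch.reported) {
            System.out.println(line);
        }
    }
}
```

Funnelling every overload into the single-argument form means a subclass only has to override handleError to change how all errors are reported.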
flushAttributes
public void flushAttributes()
parse
public void parse(java.io.Reader reader) throws java.io.IOException
- Parse the HTML text, calling various methods in response to the
occurrence of the corresponding HTML constructions.
parseDTDMarkup
public java.lang.String parseDTDMarkup() throws java.io.IOException
- Parses DTD markup declaration. Currently returns null without action.
parseMarkupDeclarations
public boolean parseMarkupDeclarations(java.lang.StringBuffer strBuff) throws java.io.IOException
- Parse SGML insertion ( <! ... > ). When an
SGML insertion is found, this method is called, passing the
SGML in the string buffer as a parameter. The default method
returns false without action and can be overridden to
implement user-defined SGML support.
For more information about SGML insertions in HTML documents, see the SGML tutorial at
http://www.w3.org/TR/WD-html40-970708/intro/sgmltut.html. We also recommend Goldfarb C.F. (1991) The SGML Handbook, Oxford University Press, 688 p., ISBN: 0198537379.
getCurrentLine
protected int getCurrentLine()
- Get the first line of the last parsed token.
CDATA
protected void CDATA(boolean clearBuffer)
throws gnu.javax.swing.text.html.parser.support.low.ParseException
- Read parseable character data, add to buffer.
Comment
protected void Comment()
throws gnu.javax.swing.text.html.parser.support.low.ParseException
- Process Comment. This method skips till --> without
taking SGML constructs into consideration. The supported SGML
constructs are handled separately.
Script
protected void Script()
throws gnu.javax.swing.text.html.parser.support.low.ParseException
- Read a script. The text, returned without any changes,
is terminated only by the closing tag SCRIPT.
Sgml
protected void Sgml()
throws gnu.javax.swing.text.html.parser.support.low.ParseException
- Process SGML insertion that is not a comment.
Style
protected void Style()
throws gnu.javax.swing.text.html.parser.support.low.ParseException
- Read a style definition. The text, returned without any changes,
is terminated only by the closing tag STYLE.
Tag
protected void Tag()
throws gnu.javax.swing.text.html.parser.support.low.ParseException
- Read a html tag.
_handleText
protected void _handleText()
- A hook for operations preceding the call to handleText.
Handles text in the string buffer.
In non-preformatted mode, all line breaks immediately following the
start tag and immediately before an end tag are discarded,
\r, \n and \t are replaced by spaces, multiple spaces are replaced
by a single one, and the result is moved into an array
that is passed to handleText().
append
protected final void append(gnu.javax.swing.text.html.parser.support.low.Token t)
- Add the image of this token to the buffer.
consume
protected final void consume(gnu.javax.swing.text.html.parser.support.low.pattern p)
- Consume pattern that must match.
endTag
protected void endTag(boolean omitted)
- The method is called when the HTML end (closing) tag is found or if
the parser concludes that one should be present in the
current position. The method is called immediately
before calling handleEndTag().
handleComment
protected void handleComment(char[] comment)
- Handle HTML comment. The default method returns without action.
handleEOFInComment
protected void handleEOFInComment()
- This is additionally called when the HTML content terminates
without closing the HTML comment. This can only happen if the
HTML document contains errors (for example, the closing --> is
missing).
handleEmptyTag
protected void handleEmptyTag(javax.swing.text.html.parser.TagElement tag) throws javax.swing.text.ChangedCharSetException
- Handle the tag with no content, like <br>. The method is
called for the elements that, in accordance with the current DTD,
have empty content.
handleEndTag
protected void handleEndTag(javax.swing.text.html.parser.TagElement tag)
- The method is called when the HTML closing tag (like </table>)
is found or if the parser concludes that one should be present
in the current position.
handleError
protected void handleError(int line,
java.lang.String message)
handleStartTag
protected void handleStartTag(javax.swing.text.html.parser.TagElement tag)
- The method is called when the HTML opening tag (like <table>)
is found or if the parser concludes that one should be present
in the current position.
handleText
protected void handleText(char[] text)
- Handle the text section.
For a non-preformatted section, the parser replaces \t, \r and \n by spaces and then multiple spaces by a single space. Additionally, all whitespace around tags is discarded.
For preformatted text (inside TEXTAREA and PRE), the parser preserves all tabs and spaces, but removes one bounding \r, \n or \r\n, if it is present. Additionally, it replaces each occurrence of \r or \r\n by a single \n.
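The two whitespace policies above can be illustrated with a small stand-alone sketch. It is not the parser's textPreProcessor: the class and method names are hypothetical, trimming the ends of the string stands in for "whitespace around tags is discarded", and removing one bounding line break at each end of preformatted text is an assumption about what the description means.

```java
// Hypothetical illustration of the two whitespace policies; not the real code.
class TextNormalizationSketch {

    // Non-preformatted text: tabs and line breaks become spaces, runs of
    // spaces collapse to one, and leading/trailing whitespace is dropped.
    static String normalize(String text) {
        StringBuilder out = new StringBuilder();
        boolean lastWasSpace = false;
        for (char c : text.toCharArray()) {
            boolean whitespace = c == ' ' || c == '\t' || c == '\r' || c == '\n';
            if (whitespace) {
                if (!lastWasSpace) {
                    out.append(' ');
                }
                lastWasSpace = true;
            } else {
                out.append(c);
                lastWasSpace = false;
            }
        }
        return out.toString().trim();
    }

    // Preformatted text: tabs and spaces are kept, \r\n and \r become \n,
    // and (by assumption) one bounding line break is removed at each end.
    static String preformatted(String text) {
        String s = text.replace("\r\n", "\n").replace('\r', '\n');
        if (s.startsWith("\n")) {
            s = s.substring(1);
        }
        if (s.endsWith("\n")) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println("[" + normalize("  a\t\tb\r\n c ") + "]");
        System.out.println("[" + preformatted("\r\nx\ty\r\nz\r") + "]");
    }
}
```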
handleTitle
protected void handleTitle(char[] title)
- Handle HTML <title> tag. This method is invoked when
both title starting and closing tags are already behind.
The passed argument contains the concatenation of all
title text sections.
makeTag
protected javax.swing.text.html.parser.TagElement makeTag(javax.swing.text.html.parser.Element element)
- Constructs the tag from the given element. In this implementation,
this is defined, but never called.
makeTag
protected javax.swing.text.html.parser.TagElement makeTag(javax.swing.text.html.parser.Element element, boolean isSupposed)
- Constructs the tag from the given element.
markFirstTime
protected void markFirstTime(javax.swing.text.html.parser.Element element)
- This is called when the tag representing the given element
occurs for the first time in the document.
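A set of already seen tags, like the documentTags field described in the field details, is one straightforward way to detect a first occurrence. The sketch below is an assumption about how such bookkeeping could look, not the actual implementation; the class FirstTimeSketch, its String-based signatures and the case-insensitive comparison are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical model of first-occurrence tracking; tag names are plain strings.
class FirstTimeSketch {
    // Names of the tags already seen in the current document.
    private final Set<String> documentTags = new HashSet<>();

    // Records each markFirstTime notification, in order.
    final List<String> firstSeen = new ArrayList<>();

    // Called for every start tag; Set.add returns true only for new entries.
    void startTag(String name) {
        if (documentTags.add(name.toLowerCase(Locale.ROOT))) {
            markFirstTime(name);
        }
    }

    // The hook a subclass would override to react to a first occurrence.
    void markFirstTime(String name) {
        firstSeen.add(name);
    }

    public static void main(String[] args) {
        FirstTimeSketch sketch = new FirstTimeSketch();
        sketch.startTag("td");
        sketch.startTag("TD");
        sketch.startTag("tr");
        System.out.println(sketch.firstSeen);
    }
}
```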
mustBe
protected gnu.javax.swing.text.html.parser.support.low.Token mustBe(int kind)
- Consume the token that was checked before and hence MUST be present.
noValueAttribute
protected void noValueAttribute(java.lang.String element, java.lang.String attribute)
- Handle an attribute without a value. The default method uses
the only allowed attribute value from the DTD.
If the attribute is unknown or allows several values,
HTML.NULL_ATTRIBUTE_VALUE is used. The attribute with
this value is added to the attribute set.
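The selection rule above (use the single allowed value if there is exactly one, otherwise fall back to a null marker) can be sketched as follows. The map of allowed values stands in for the DTD lookup, and NULL_VALUE_STAND_IN is a made-up placeholder for HTML.NULL_ATTRIBUTE_VALUE; both are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of choosing a value for an attribute given without one.
class NoValueAttributeSketch {
    // Placeholder for HTML.NULL_ATTRIBUTE_VALUE; the text here is invented.
    static final String NULL_VALUE_STAND_IN = "<no value>";

    // 'allowed' maps an attribute name to the values its declaration permits.
    static String valueFor(Map<String, List<String>> allowed, String attribute) {
        List<String> values = allowed.get(attribute);
        if (values != null && values.size() == 1) {
            return values.get(0); // exactly one legal value: use it
        }
        return NULL_VALUE_STAND_IN; // unknown attribute or several choices
    }

    public static void main(String[] args) {
        Map<String, List<String>> allowed = new HashMap<>();
        allowed.put("checked", Collections.singletonList("checked"));
        allowed.put("align", Arrays.asList("left", "center", "right"));
        System.out.println(valueFor(allowed, "checked"));
        System.out.println(valueFor(allowed, "align"));
        System.out.println(valueFor(allowed, "bogus"));
    }
}
```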
optional
protected gnu.javax.swing.text.html.parser.support.low.Token optional(int kind)
- Consume the optional token, if present.
parseDocument
protected void parseDocument()
throws gnu.javax.swing.text.html.parser.support.low.ParseException
- Parse the html document.
readAttributes
protected void readAttributes(java.lang.String element)
- Read the element attributes, adding them into attribute set.
resolveNamedEntity
protected java.lang.String resolveNamedEntity(java.lang.String a_tag)
- Return the string corresponding to the given named entity. The name is passed
with the preceding &, but without the ending semicolon.
resolveNumericEntity
protected char resolveNumericEntity(java.lang.String a_tag)
- Return the char corresponding to the given numeric entity.
The name is passed with the preceding &#, but without
the ending semicolon.
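As an illustration of numeric entity resolution, the sketch below converts an entity passed without its ending semicolon into a char. It assumes the argument starts with "&#", additionally accepts a hexadecimal form introduced by x or X, and returns '?' for malformed or out-of-range input; none of these details are confirmed by this page, and the class name is hypothetical.

```java
// Hypothetical sketch of numeric entity resolution; not the actual method.
class NumericEntitySketch {

    // Accepts forms such as "&#65" or "&#x41" (no trailing semicolon).
    static char resolve(String entity) {
        String digits = entity.startsWith("&#") ? entity.substring(2) : entity;
        int radix = 10;
        if (digits.startsWith("x") || digits.startsWith("X")) {
            radix = 16; // hexadecimal form, an assumption of this sketch
            digits = digits.substring(1);
        }
        try {
            int code = Integer.parseInt(digits, radix);
            if (code < 0 || code > Character.MAX_VALUE) {
                return '?'; // outside the range of a single char
            }
            return (char) code;
        } catch (NumberFormatException e) {
            return '?'; // fallback chosen for the sketch only
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("&#65"));
        System.out.println(resolve("&#x41"));
        System.out.println(resolve("&#oops"));
    }
}
```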
restart
protected void restart()
- Reset all fields to the initial default state, preparing the
parser for parsing the next document.
startTag
protected void startTag(javax.swing.text.html.parser.TagElement tag) throws javax.swing.text.ChangedCharSetException
- The method is called when the HTML opening tag (like <table>)
is found or if the parser concludes that one should be present
in the current position. The method is called immediately before
calling handleStartTag.
_handleCompleteElement
private void _handleCompleteElement(javax.swing.text.html.parser.TagElement tag)
- Handle a complete element, when the tag content is already present in the
buffer and both the starting and ending tags are behind. This is called
when the tag text must not be parsed for nested
elements (elements STYLE and SCRIPT).
_handleEmptyTag
private void _handleEmptyTag(javax.swing.text.html.parser.TagElement tag)
- A hook for operations preceding the call to handleEmptyTag().
Handle the tag with no content, like <br>. As no
nested tags are expected, the tag validator is not involved.
_handleEndTag
private void _handleEndTag(javax.swing.text.html.parser.TagElement tag)
- A hook for operations preceding the call to handleEndTag().
The method is called when the HTML closing tag
is found. Calls handleTitle after closing the 'title' tag.
_handleEndTag_remaining
void _handleEndTag_remaining(javax.swing.text.html.parser.TagElement tag)
- Actions that are also required if the closing action was
initiated by the tag validator.
Package-private to avoid an accessor method.
_handleStartTag
void _handleStartTag(javax.swing.text.html.parser.TagElement tag)
- A hook for operations preceding the call to handleStartTag().
The method is called when the HTML opening tag (like <table>)
is found.
Package-private to avoid an accessor method.
forciblyCloseTheTag
private void forciblyCloseTheTag()
throws gnu.javax.swing.text.html.parser.support.low.ParseException
- Resume parsing after heavy errors in HTML tag structure.
handleComment
private void handleComment()
- Handle comment in string buffer. You can avoid allocating a char
array each time by processing your comment directly here.
makeTagElement
private javax.swing.text.html.parser.TagElement makeTagElement(java.lang.String name, boolean isSupposed)
readTillTokenE
private void readTillTokenE(int till)
throws gnu.javax.swing.text.html.parser.support.low.ParseException
- Read till the given token, resolving entities. Consume the given
token without adding it to buffer.
resolveAndAppendEntity
private void resolveAndAppendEntity(gnu.javax.swing.text.html.parser.support.low.Token entity)
- Resolve the entity and append it to the end of buffer.
restOfTag
private void restOfTag(boolean closing,
gnu.javax.swing.text.html.parser.support.low.Token name,
gnu.javax.swing.text.html.parser.support.low.Token start)
throws gnu.javax.swing.text.html.parser.support.low.ParseException
- Handle the remainder of HTML tags. This is a common ending for
TAG, SCRIPT and STYLE.
startingTag
private void startingTag(javax.swing.text.html.parser.TagElement tag)
- This should fire additional actions in response to the
ChangedCharSetException. The current implementation
does nothing.
ws_error
private void ws_error()