gnu.javax.swing.text.html.parser.support
Class Parser

java.lang.Object
  extended by gnu.javax.swing.text.html.parser.support.low.Constants
      extended by gnu.javax.swing.text.html.parser.support.low.ReaderTokenizer
          extended by gnu.javax.swing.text.html.parser.support.Parser
All Implemented Interfaces:
javax.swing.text.html.parser.DTDConstants

public class Parser
extends gnu.javax.swing.text.html.parser.support.low.ReaderTokenizer
implements javax.swing.text.html.parser.DTDConstants

A simple error-tolerant HTML parser that uses a DTD document to access data on the possible tokens, arguments and syntax.

The parser reads HTML content from a Reader and calls various notifying methods (which should be overridden in a subclass) when tags or data are encountered.

Some HTML elements need no opening or closing tags. The task of this parser is to invoke the tag handling methods even when the tags are not explicitly specified and must be inferred ("supposed") from the information stored in the DTD. For example, parsing the document

<table><tr><td>a<td>b<td>c</tr>
will invoke the handling methods in exactly the same order (and with the same parameters) as parsing the document:
<html><head></head><body><table><tbody><tr><td>a</td><td>b</td><td>c</td></tr></tbody></table></body></html>

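(The supposed tags are those missing from the first document: <html>, <head>, </head>, <body>, <tbody>, and the closing </td>, </tbody>, </table>, </body> and </html>.) The parser also supports obsolete elements of HTML syntax.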
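
As a rough illustration of the callback style described above, the sketch below uses the standard JDK classes ParserDelegator and HTMLEditorKit.ParserCallback rather than this GNU Parser class, on the assumption that they surface comparable notifications. The class name CallbackSketch and the event strings are made up for the example, and the exact set of supposed tags reported depends on the DTD bundled with the JDK, so no particular output is promised here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: records parser notifications as plain strings.
class CallbackSketch {
    static List<String> events(String html) {
        final List<String> out = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                out.add("start:" + tag); // explicit or supposed opening tag
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                out.add("end:" + tag); // explicit or supposed closing tag
            }

            @Override
            public void handleText(char[] data, int pos) {
                out.add("text:" + new String(data));
            }
        };
        try {
            // The third argument tells the parser to ignore charset directives.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String event : events("<table><tr><td>a<td>b<td>c</tr>")) {
            System.out.println(event);
        }
    }
}
```

Running it on both documents from the example and comparing the two event lists is a quick way to see which tags the parser supplies on its own.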


Field Summary
(package private)  gnu.javax.swing.text.html.parser.htmlAttributeSet attributes
          The attributes of the current HTML element.
private  java.lang.StringBuffer buffer
          The buffer to collect incremental output such as text or a comment.
private  parameterDefaulter defaulter
          Provides the default values for parameters in the case when these values are defined in the DTD.
private  java.util.Set documentTags
          The set of the document tags.
protected  javax.swing.text.html.parser.DTD dtd
          The document type definition (DTD) that will be used to parse the documents.
 gnu.javax.swing.text.html.parser.support.low.Token hTag
          The current html tag.
protected  int preformatted
          This field has a positive value inside preformatted tags.
protected  boolean strict
          The value of this field determines whether or not the Parser will be strict in enforcing SGML compatibility.
private  gnu.javax.swing.text.html.parser.support.low.Token t
          The current token.
private  textPreProcessor textProcessor
          The text pre-processor for handling line ends and tabs.
private  java.lang.StringBuffer title
          The buffer to store the document title.
private  boolean titleHandled
          True means that the 'title' tag of this document has already been handled.
private  boolean titleOpen
          True means that the 'title' tag is currently open and all text is also added to the title buffer.
private  gnu.javax.swing.text.html.parser.htmlValidator validator
          The validator, which controls the forcible closing of tags that (in accordance with the DTD) are not allowed in the current context.
 
Fields inherited from class gnu.javax.swing.text.html.parser.support.low.ReaderTokenizer
advanced, backupMode
 
Fields inherited from class gnu.javax.swing.text.html.parser.support.low.Constants
AP, bDIGIT, BEGIN, bLETTER, bLINEBREAK, bNAME, bQUOTING, bSINGLE_CHAR_TOKEN, bSPECIAL, bWHITESPACE, COMMENT_END, COMMENT_OPEN, COMMENT_TRIPLEDASH_END, DOUBLE_DASH, END, ENTITY, ENTITY_NAMED, ENTITY_NUMERIC, EOF, EQ, EXCLAMATION, NUMTOKEN, OTHER, QUOT, SCRIPT, SCRIPT_CLOSE, SCRIPT_OPEN, SGML, SLASH, STYLE, STYLE_CLOSE, STYLE_OPEN, TAG, WS
 
Fields inherited from interface javax.swing.text.html.parser.DTDConstants
ANY, CDATA, CONREF, CURRENT, DEFAULT, EMPTY, ENDTAG, ENTITIES, ENTITY, FIXED, GENERAL, ID, IDREF, IDREFS, IMPLIED, MD, MODEL, MS, NAME, NAMES, NMTOKEN, NMTOKENS, NOTATION, NUMBER, NUMBERS, NUTOKEN, NUTOKENS, PARAMETER, PI, PUBLIC, RCDATA, REQUIRED, SDATA, STARTTAG, SYSTEM
 
Constructor Summary
Parser(javax.swing.text.html.parser.DTD a_dtd)
          Creates a new Parser that uses the given javax.swing.text.html.parser.DTD.
 
Method Summary
private  void _handleCompleteElement(javax.swing.text.html.parser.TagElement tag)
          Handle a complete element, when the tag content is already present in the buffer and both the starting and closing tags are behind.
private  void _handleEmptyTag(javax.swing.text.html.parser.TagElement tag)
          A hook for operations preceding the call to handleEmptyTag().
(package private)  void _handleEndTag_remaining(javax.swing.text.html.parser.TagElement tag)
          Actions that are also required if the closing action was initiated by the tag validator.
private  void _handleEndTag(javax.swing.text.html.parser.TagElement tag)
          A hook for operations preceding the call to handleEndTag().
(package private)  void _handleStartTag(javax.swing.text.html.parser.TagElement tag)
          A hook for operations preceding the call to handleStartTag().
protected  void _handleText()
          A hook for operations preceding the call to handleText.
protected  void append(gnu.javax.swing.text.html.parser.support.low.Token t)
          Add the image of this token to the buffer.
protected  void CDATA(boolean clearBuffer)
          Read parseable character data, add to buffer.
protected  void Comment()
          Process Comment.
protected  void consume(gnu.javax.swing.text.html.parser.support.low.pattern p)
          Consume pattern that must match.
protected  void endTag(boolean omitted)
          The method is called when the HTML end (closing) tag is found or if the parser concludes that one should be present at the current position.
 void error(java.lang.String msg)
          Invokes the error handler.
 void error(java.lang.String msg, java.lang.String invalid)
          Invokes the error handler.
 void error(java.lang.String parm1, java.lang.String parm2, java.lang.String parm3)
          Invokes the error handler.
 void error(java.lang.String parm1, java.lang.String parm2, java.lang.String parm3, java.lang.String parm4)
          Invokes the error handler.
 void error(java.lang.String msg, gnu.javax.swing.text.html.parser.support.low.Token atToken)
          Invokes the error handler.
 void flushAttributes()
           
private  void forciblyCloseTheTag()
          Resume parsing after heavy errors in HTML tag structure.
 gnu.javax.swing.text.html.parser.htmlAttributeSet getAttributes()
          Get the attributes of the current tag.
protected  int getCurrentLine()
          Get the first line of the last parsed token.
private  void handleComment()
          Handle comment in string buffer.
protected  void handleComment(char[] comment)
          Handle HTML comment.
protected  void handleEmptyTag(javax.swing.text.html.parser.TagElement tag)
          Handle the tag with no content, like <br>.
protected  void handleEndTag(javax.swing.text.html.parser.TagElement tag)
          The method is called when the HTML closing tag (like </table>) is found or if the parser concludes that one should be present at the current position.
protected  void handleEOFInComment()
          This is additionally called when the HTML content terminates without closing the HTML comment.
protected  void handleError(int line, java.lang.String message)
           
protected  void handleStartTag(javax.swing.text.html.parser.TagElement tag)
          The method is called when the HTML opening tag (like <table>) is found or if the parser concludes that one should be present at the current position.
protected  void handleText(char[] text)
          Handle the text section.
protected  void handleTitle(char[] title)
          Handle HTML <title> tag.
protected  javax.swing.text.html.parser.TagElement makeTag(javax.swing.text.html.parser.Element element)
          Constructs the tag from the given element.
protected  javax.swing.text.html.parser.TagElement makeTag(javax.swing.text.html.parser.Element element, boolean isSupposed)
          Constructs the tag from the given element.
private  javax.swing.text.html.parser.TagElement makeTagElement(java.lang.String name, boolean isSupposed)
           
protected  void markFirstTime(javax.swing.text.html.parser.Element element)
          This is called when the tag representing the given element occurs for the first time in the document.
protected  gnu.javax.swing.text.html.parser.support.low.Token mustBe(int kind)
          Consume the token that was checked before and hence MUST be present.
protected  void noValueAttribute(java.lang.String element, java.lang.String attribute)
          Handle attribute without value.
protected  gnu.javax.swing.text.html.parser.support.low.Token optional(int kind)
          Consume the optional token, if present.
 void parse(java.io.Reader reader)
          Parse the HTML text, calling various methods in response to the occurrence of the corresponding HTML constructions.
protected  void parseDocument()
          Parse the html document.
 java.lang.String parseDTDMarkup()
          Parses DTD markup declaration.
 boolean parseMarkupDeclarations(java.lang.StringBuffer strBuff)
          Parse SGML insertion ( <! ... > ).
protected  void readAttributes(java.lang.String element)
          Read the element attributes, adding them into attribute set.
private  void readTillTokenE(int till)
          Read till the given token, resolving entities.
private  void resolveAndAppendEntity(gnu.javax.swing.text.html.parser.support.low.Token entity)
          Resolve the entity and append it to the end of buffer.
protected  java.lang.String resolveNamedEntity(java.lang.String a_tag)
          Return the string corresponding to the given named entity.
protected  char resolveNumericEntity(java.lang.String a_tag)
          Return the char corresponding to the given numeric entity.
protected  void restart()
          Reset all fields to the initial default state, preparing the parser for parsing the next document.
private  void restOfTag(boolean closing, gnu.javax.swing.text.html.parser.support.low.Token name, gnu.javax.swing.text.html.parser.support.low.Token start)
          Handle the remainder of an HTML tag.
protected  void Script()
          Read a script.
protected  void Sgml()
          Process SGML insertion that is not a comment.
private  void startingTag(javax.swing.text.html.parser.TagElement tag)
          This should fire additional actions in response to the ChangedCharSetException.
protected  void startTag(javax.swing.text.html.parser.TagElement tag)
          The method is called when the HTML opening tag (like <table>) is found or if the parser concludes that one should be present at the current position.
protected  void Style()
          Read a style definition.
protected  void Tag()
          Read an HTML tag.
private  void ws_error()
           
 
Methods inherited from class gnu.javax.swing.text.html.parser.support.low.ReaderTokenizer
getEndOfLineSequence, getNextToken, getTokenAhead, getTokenAhead, mark, reset, reset
 
Methods inherited from class gnu.javax.swing.text.html.parser.support.low.Constants
endMatches
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

hTag

public gnu.javax.swing.text.html.parser.support.low.Token hTag
The current html tag.


dtd

protected javax.swing.text.html.parser.DTD dtd
The document type definition (DTD) that will be used to parse the documents.


strict

protected boolean strict
The value of this field determines whether or not the Parser will be strict in enforcing SGML compatibility. The default value is false, stating that the parser should do everything to parse and get at least some information even from the incorrectly written HTML input.


preformatted

protected int preformatted
This field has a positive value inside preformatted tags.


documentTags

private java.util.Set documentTags
The set of the document tags. This field is used for supporting markFirstTime().


buffer

private java.lang.StringBuffer buffer
The buffer to collect incremental output such as text or a comment.


title

private java.lang.StringBuffer title
The buffer to store the document title.


t

private gnu.javax.swing.text.html.parser.support.low.Token t
The current token.


titleHandled

private boolean titleHandled
True means that the 'title' tag of this document has already been handled.


titleOpen

private boolean titleOpen
True means that the 'title' tag is currently open and all text is also added to the title buffer.


attributes

gnu.javax.swing.text.html.parser.htmlAttributeSet attributes
The attributes of the current HTML element. Package-private to avoid an accessor method.


validator

private gnu.javax.swing.text.html.parser.htmlValidator validator
The validator, which controls the forcible closing of tags that (in accordance with the DTD) are not allowed in the current context.


defaulter

private parameterDefaulter defaulter
Provides the default values for parameters in the case when these values are defined in the DTD.


textProcessor

private textPreProcessor textProcessor
The text pre-processor for handling line ends and tabs.

Constructor Detail

Parser

public Parser(javax.swing.text.html.parser.DTD a_dtd)
Creates a new Parser that uses the given javax.swing.text.html.parser.DTD. The only standard way to get an instance of DTD is to construct it manually, filling in all required fields.

Method Detail

getAttributes

public gnu.javax.swing.text.html.parser.htmlAttributeSet getAttributes()
Get the attributes of the current tag.


error

public void error(java.lang.String msg)
Invokes the error handler. The default method in this implementation delegates the call to handleError, also providing the current line.


error

public void error(java.lang.String msg,
                  gnu.javax.swing.text.html.parser.support.low.Token atToken)
Description copied from class: gnu.javax.swing.text.html.parser.support.low.ReaderTokenizer
Invokes the error handler.


error

public void error(java.lang.String msg,
                  java.lang.String invalid)
Invokes the error handler. The default method in this implementation delegates the call to error (msg+": '"+invalid+"'").


error

public void error(java.lang.String parm1,
                  java.lang.String parm2,
                  java.lang.String parm3)
Invokes the error handler. The default method in this implementation delegates the call to error (parm1+" "+ parm2+" "+ parm3).


error

public void error(java.lang.String parm1,
                  java.lang.String parm2,
                  java.lang.String parm3,
                  java.lang.String parm4)
Invokes the error handler. The default method in this implementation delegates the call to error (parm1+" "+ parm2+" "+ parm3+" "+ parm4).
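
The delegation chain described for the error overloads can be sketched in isolation as follows. This is a stand-alone illustration, not the class itself: ErrorChainSketch, the currentLine field (standing in for getCurrentLine()) and the "line: message" recording format are assumptions made for the example, and the Token overload is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in showing how the String overloads funnel into handleError.
class ErrorChainSketch {
    final List<String> reported = new ArrayList<>();
    int currentLine = 1; // assumed substitute for getCurrentLine()

    // Final sink: a subclass would normally override this.
    void handleError(int line, String message) {
        reported.add(line + ": " + message);
    }

    // Single-argument form adds the current line and delegates to handleError.
    void error(String msg) {
        handleError(currentLine, msg);
    }

    // Two-argument form quotes the invalid fragment after the message.
    void error(String msg, String invalid) {
        error(msg + ": '" + invalid + "'");
    }

    // Three- and four-argument forms join their parts with single spaces.
    void error(String parm1, String parm2, String parm3) {
        error(parm1 + " " + parm2 + " " + parm3);
    }

    void error(String parm1, String parm2, String parm3, String parm4) {
        error(parm1 + " " + parm2 + " " + parm3 + " " + parm4);
    }

    public static void main(String[] args) {
        ErrorChainSketch sketch = new ErrorChainSketch();
        sketch.error("Unknown tag", "blink");
        System.out.println(sketch.reported);
    }
}
```

With this shape, overriding only handleError is enough to capture every message, whichever overload raised it.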


flushAttributes

public void flushAttributes()

parse

public void parse(java.io.Reader reader)
           throws java.io.IOException
Parse the HTML text, calling various methods in response to the occurrence of the corresponding HTML constructions.


parseDTDMarkup

public java.lang.String parseDTDMarkup()
                                throws java.io.IOException
Parses DTD markup declaration. Currently returns null without action.


parseMarkupDeclarations

public boolean parseMarkupDeclarations(java.lang.StringBuffer strBuff)
                                throws java.io.IOException
Parse SGML insertion ( <! ... > ). When an SGML insertion is found, this method is called with the SGML passed in the string buffer parameter. The default method returns false without action and can be overridden to implement user-defined SGML support.

If you need more information about SGML insertions in HTML documents, the author suggests reading the SGML tutorial at http://www.w3.org/TR/WD-html40-970708/intro/sgmltut.html. We also recommend Goldfarb C.F. (1991) The SGML Handbook, Oxford University Press, 688 p, ISBN: 0198537379.


getCurrentLine

protected int getCurrentLine()
Get the first line of the last parsed token.


CDATA

protected void CDATA(boolean clearBuffer)
              throws gnu.javax.swing.text.html.parser.support.low.ParseException
Read parseable character data, add to buffer.


Comment

protected void Comment()
                throws gnu.javax.swing.text.html.parser.support.low.ParseException
Process Comment. This method skips till --> without taking SGML constructs into consideration. The supported SGML constructs are handled separately.


Script

protected void Script()
               throws gnu.javax.swing.text.html.parser.support.low.ParseException
Read a script. The text, returned without any changes, is terminated only by the closing tag SCRIPT.


Sgml

protected void Sgml()
             throws gnu.javax.swing.text.html.parser.support.low.ParseException
Process SGML insertion that is not a comment.


Style

protected void Style()
              throws gnu.javax.swing.text.html.parser.support.low.ParseException
Read a style definition. The text, returned without any changes, is terminated only by the closing tag STYLE.


Tag

protected void Tag()
            throws gnu.javax.swing.text.html.parser.support.low.ParseException
Read an HTML tag.


_handleText

protected void _handleText()
A hook for operations preceding the call to handleText. Handles the text in the string buffer. In non-preformatted mode, all line breaks immediately following a start tag and immediately before an end tag are discarded; \r, \n and \t are replaced by spaces; multiple spaces are replaced by a single one; and the result is moved into an array that is passed to handleText().


append

protected final void append(gnu.javax.swing.text.html.parser.support.low.Token t)
Add the image of this token to the buffer.


consume

protected final void consume(gnu.javax.swing.text.html.parser.support.low.pattern p)
Consume pattern that must match.


endTag

protected void endTag(boolean omitted)
The method is called when the HTML end (closing) tag is found or if the parser concludes that one should be present at the current position. The method is called immediately before calling handleEndTag().


handleComment

protected void handleComment(char[] comment)
Handle HTML comment. The default method returns without action.


handleEOFInComment

protected void handleEOFInComment()
This is additionally called when the HTML content terminates without closing the HTML comment. This can only happen if the HTML document contains errors (for example, the closing --> is missing).


handleEmptyTag

protected void handleEmptyTag(javax.swing.text.html.parser.TagElement tag)
                       throws javax.swing.text.ChangedCharSetException
Handle the tag with no content, like <br>. The method is called for elements that, in accordance with the current DTD, have empty content.


handleEndTag

protected void handleEndTag(javax.swing.text.html.parser.TagElement tag)
The method is called when the HTML closing tag (like </table>) is found or if the parser concludes that one should be present at the current position.


handleError

protected void handleError(int line,
                           java.lang.String message)

handleStartTag

protected void handleStartTag(javax.swing.text.html.parser.TagElement tag)
The method is called when the HTML opening tag (like <table>) is found or if the parser concludes that one should be present at the current position.


handleText

protected void handleText(char[] text)
Handle the text section.

For a non-preformatted section, the parser replaces \t, \r and \n by spaces and then multiple spaces by a single space. Additionally, all whitespace around tags is discarded.

For preformatted text (inside TEXTAREA and PRE), the parser preserves all tabs and spaces, but removes one bounding \r, \n or \r\n, if it is present. Additionally, it replaces each occurrence of \r or \r\n by a single \n.
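
The two whitespace rules above can be approximated with plain string handling. The sketch below is not the parser's textPreProcessor; the class and method names are hypothetical, trimming both ends is used as a stand-in for "whitespace around tags is discarded", and removing one line break at each end is one reading of "one bounding" line break.

```java
// Hypothetical approximation of the two text modes described above.
class TextNormalizeSketch {
    // Non-preformatted: \t, \r, \n become spaces, runs of spaces collapse,
    // and whitespace at both ends (assumed to border tags) is dropped.
    static String normal(String text) {
        StringBuilder sb = new StringBuilder();
        boolean previousWasSpace = false;
        for (char c : text.toCharArray()) {
            if (c == '\t' || c == '\r' || c == '\n') {
                c = ' ';
            }
            if (c == ' ') {
                if (!previousWasSpace) {
                    sb.append(' ');
                }
                previousWasSpace = true;
            } else {
                sb.append(c);
                previousWasSpace = false;
            }
        }
        return sb.toString().trim();
    }

    // Preformatted: unify line ends to \n, then drop at most one line break
    // at the start and at most one at the end; tabs and spaces are kept.
    static String preformatted(String text) {
        String unified = text.replace("\r\n", "\n").replace('\r', '\n');
        if (unified.startsWith("\n")) {
            unified = unified.substring(1);
        }
        if (unified.endsWith("\n")) {
            unified = unified.substring(0, unified.length() - 1);
        }
        return unified;
    }

    public static void main(String[] args) {
        System.out.println("[" + normal("\n  a\t\tb \r\n c \n") + "]");
        System.out.println("[" + preformatted("\r\n  x\r\ty \n") + "]");
    }
}
```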


handleTitle

protected void handleTitle(char[] title)
Handle HTML <title> tag. This method is invoked when both title starting and closing tags are already behind. The passed argument contains the concatenation of all title text sections.
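
The way title text is gathered before handleTitle fires can be pictured with a small state holder. This is only a sketch built from the field descriptions (title, titleOpen, titleHandled); the class TitleSketch, its method names, and the choice to ignore a second title element are assumptions, not the parser's actual code.

```java
// Hypothetical model of title collection: text is buffered while the
// 'title' tag is open and handed over once when it closes.
class TitleSketch {
    private final StringBuilder title = new StringBuilder();
    private boolean titleOpen;
    private boolean titleHandled;
    String handledTitle; // value that would be passed to handleTitle

    void startTag(String name) {
        // Assumption: only the first title element is collected.
        if (name.equalsIgnoreCase("title") && !titleHandled) {
            titleOpen = true;
        }
    }

    void text(String section) {
        if (titleOpen) {
            title.append(section); // concatenate every text section
        }
    }

    void endTag(String name) {
        if (name.equalsIgnoreCase("title") && titleOpen) {
            titleOpen = false;
            titleHandled = true;
            handledTitle = title.toString();
        }
    }

    public static void main(String[] args) {
        TitleSketch sketch = new TitleSketch();
        sketch.startTag("title");
        sketch.text("Hello, ");
        sketch.text("world");
        sketch.endTag("title");
        System.out.println(sketch.handledTitle);
    }
}
```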


makeTag

protected javax.swing.text.html.parser.TagElement makeTag(javax.swing.text.html.parser.Element element)
Constructs the tag from the given element. In this implementation, this is defined, but never called.


makeTag

protected javax.swing.text.html.parser.TagElement makeTag(javax.swing.text.html.parser.Element element,
                                                          boolean isSupposed)
Constructs the tag from the given element.


markFirstTime

protected void markFirstTime(javax.swing.text.html.parser.Element element)
This is called when the tag representing the given element occurs for the first time in the document.
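
Since the documentTags set is described as supporting markFirstTime(), one plausible mechanism is to rely on Set.add returning true only for a new member. The sketch below assumes exactly that; FirstTimeSketch, the use of tag names as keys, and the case folding are illustrative choices rather than details taken from the source.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical first-occurrence tracking backed by a set of seen tag names.
class FirstTimeSketch {
    private final Set<String> documentTags = new HashSet<>();
    final List<String> firstSeen = new ArrayList<>();

    void startTag(String name) {
        String key = name.toLowerCase(Locale.ROOT);
        // Set.add returns true only when the key was not present before.
        if (documentTags.add(key)) {
            markFirstTime(key);
        }
    }

    // Stand-in for the overridable notification.
    void markFirstTime(String name) {
        firstSeen.add(name);
    }

    public static void main(String[] args) {
        FirstTimeSketch sketch = new FirstTimeSketch();
        sketch.startTag("td");
        sketch.startTag("TD");
        sketch.startTag("tr");
        System.out.println(sketch.firstSeen);
    }
}
```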


mustBe

protected gnu.javax.swing.text.html.parser.support.low.Token mustBe(int kind)
Consume the token that was checked before and hence MUST be present.


noValueAttribute

protected void noValueAttribute(java.lang.String element,
                                java.lang.String attribute)
Handle attribute without value. The default method uses the only allowed attribute value from DTD. If the attribute is unknown or allows several values, the HTML.NULL_ATTRIBUTE_VALUE is used. The attribute with this value is added to the attribute set.
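
The selection rule for valueless attributes can be sketched without the DTD classes. In the example below a plain map of allowed values stands in for the DTD lookup, and the NULL_VALUE constant is a local placeholder for HTML.NULL_ATTRIBUTE_VALUE; both, along with the class name, are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical version of the rule: use the single allowed value if there
// is exactly one, otherwise fall back to a "no value" marker.
class NoValueAttributeSketch {
    static final String NULL_VALUE = "<no value>"; // placeholder marker

    static String valueFor(Map<String, List<String>> allowed, String attribute) {
        List<String> values = allowed.get(attribute);
        if (values != null && values.size() == 1) {
            return values.get(0); // the only value the DTD permits
        }
        return NULL_VALUE; // unknown attribute or several possible values
    }

    public static void main(String[] args) {
        Map<String, List<String>> allowed = new HashMap<>();
        allowed.put("checked", Collections.singletonList("checked"));
        allowed.put("align", Arrays.asList("left", "center", "right"));
        System.out.println(valueFor(allowed, "checked"));
        System.out.println(valueFor(allowed, "align"));
        System.out.println(valueFor(allowed, "bogus"));
    }
}
```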


optional

protected gnu.javax.swing.text.html.parser.support.low.Token optional(int kind)
Consume the optional token, if present.


parseDocument

protected void parseDocument()
                      throws gnu.javax.swing.text.html.parser.support.low.ParseException
Parse the html document.


readAttributes

protected void readAttributes(java.lang.String element)
Read the element attributes, adding them into attribute set.


resolveNamedEntity

protected java.lang.String resolveNamedEntity(java.lang.String a_tag)
Return the string corresponding to the given named entity. The name is passed with the preceding &, but without the ending semicolon.


resolveNumericEntity

protected char resolveNumericEntity(java.lang.String a_tag)
Return the char corresponding to the given numeric entity. The name is passed with the preceding &#, but without the ending semicolon.
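
The argument convention for the two entity resolvers (leading & or &# kept, trailing semicolon already removed) is easy to get wrong, so here is a small stand-alone sketch of it. The four-entry table, the hexadecimal branch, and returning unknown names unchanged are assumptions for the example; the real methods presumably consult the DTD.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical resolvers that follow the documented argument format.
class EntitySketch {
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", "&");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("quot", "\"");
    }

    // Input looks like "&lt": strip the leading '&' before the lookup.
    static String resolveNamed(String tag) {
        String value = NAMED.get(tag.substring(1));
        return value != null ? value : tag; // assumption: unknown stays as-is
    }

    // Input looks like "&#65" or "&#x41": strip "&#", then parse the number.
    static char resolveNumeric(String tag) {
        String digits = tag.substring(2);
        if (digits.startsWith("x") || digits.startsWith("X")) {
            return (char) Integer.parseInt(digits.substring(1), 16);
        }
        return (char) Integer.parseInt(digits);
    }

    public static void main(String[] args) {
        System.out.println(resolveNamed("&lt"));
        System.out.println(resolveNumeric("&#65"));
    }
}
```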


restart

protected void restart()
Reset all fields to the initial default state, preparing the parser for parsing the next document.


startTag

protected void startTag(javax.swing.text.html.parser.TagElement tag)
                 throws javax.swing.text.ChangedCharSetException
The method is called when the HTML opening tag (like <table>) is found or if the parser concludes that one should be present at the current position. The method is called immediately before calling handleStartTag.


_handleCompleteElement

private void _handleCompleteElement(javax.swing.text.html.parser.TagElement tag)
Handle a complete element, when the tag content is already present in the buffer and both the starting and closing tags are behind. This is called when the tag text must not be parsed for nested elements (elements STYLE and SCRIPT).


_handleEmptyTag

private void _handleEmptyTag(javax.swing.text.html.parser.TagElement tag)
A hook for operations preceding the call to handleEmptyTag(). Handle the tag with no content, like <br>. As no nested tags are expected, the tag validator is not involved.


_handleEndTag

private void _handleEndTag(javax.swing.text.html.parser.TagElement tag)
A hook for operations preceding the call to handleEndTag(). The method is called when the HTML closing tag is found. Calls handleTitle after closing the 'title' tag.


_handleEndTag_remaining

void _handleEndTag_remaining(javax.swing.text.html.parser.TagElement tag)
Actions that are also required if the closing action was initiated by the tag validator. Package-private to avoid an accessor method.


_handleStartTag

void _handleStartTag(javax.swing.text.html.parser.TagElement tag)
A hook for operations preceding the call to handleStartTag(). The method is called when the HTML opening tag (like <table>) is found. Package-private to avoid an accessor method.


forciblyCloseTheTag

private void forciblyCloseTheTag()
                          throws gnu.javax.swing.text.html.parser.support.low.ParseException
Resume parsing after heavy errors in HTML tag structure.


handleComment

private void handleComment()
Handle comment in string buffer. You can avoid allocating a char array each time by processing your comment directly here.


makeTagElement

private javax.swing.text.html.parser.TagElement makeTagElement(java.lang.String name,
                                                               boolean isSupposed)

readTillTokenE

private void readTillTokenE(int till)
                     throws gnu.javax.swing.text.html.parser.support.low.ParseException
Read till the given token, resolving entities. Consume the given token without adding it to buffer.


resolveAndAppendEntity

private void resolveAndAppendEntity(gnu.javax.swing.text.html.parser.support.low.Token entity)
Resolve the entity and append it to the end of buffer.


restOfTag

private void restOfTag(boolean closing,
                       gnu.javax.swing.text.html.parser.support.low.Token name,
                       gnu.javax.swing.text.html.parser.support.low.Token start)
                throws gnu.javax.swing.text.html.parser.support.low.ParseException
Handle the remainder of an HTML tag. This is the common ending for TAG, SCRIPT and STYLE.


startingTag

private void startingTag(javax.swing.text.html.parser.TagElement tag)
This should fire additional actions in response to the ChangedCharSetException. The current implementation does nothing.


ws_error

private void ws_error()