org.htmlparser.util
Class Generate

java.lang.Object
  extended by org.htmlparser.util.Generate

public class Generate
extends java.lang.Object

Create a character reference translation class source file. Usage:

 
      java -classpath .:lib/htmlparser.jar Generate > Translate.java
  
 
Derived from HTMLStringFilter.java, provided as an example with the htmlparser.jar file available at htmlparser.sourceforge.net, written by Somik Raha (somik@industriallogic.com, http://industriallogic.com).


Field Summary
protected static java.lang.String nl
          The system specific line separator string.
protected  org.htmlparser.Parser parser
          The working parser.
 
Constructor Summary
Generate()
          Create a Generate object.
 
Method Summary
 void extract(java.lang.String string)
          Parse the SGML declaration for the character entity reference name, the equivalent numeric character reference, and a comment.
 int indexOfWhitespace(java.lang.String string, int index)
          Find the lowest index of whitespace (space or newline).
static void main(java.lang.String[] args)
          Generator program.
 java.lang.String pack(java.lang.String string)
          Rewrite the comment string.
 java.lang.String pad(java.lang.String string, char character, int length)
          Pad a string on the left with the given character to the length specified.
 void parse()
          Pull out text elements from the HTML.
 java.lang.String pretty(java.lang.String string)
          Pretty up a comment string.
 void sgml(java.lang.String string)
          Extract special characters.
 java.lang.String translate(java.lang.String string)
          Translate character references.
 java.lang.String unicode(java.lang.String string)
          Convert the textual representation of the numeric character reference to a character.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

parser

protected org.htmlparser.Parser parser
The working parser.


nl

protected static final java.lang.String nl
The system specific line separator string.

Constructor Detail

Generate

public Generate()
         throws ParserException
Create a Generate object. Sets up the generation by creating a new Parser pointed at http://www.w3.org/TR/REC-html40/sgml/entities.html with the standard scanners registered.

Method Detail

translate

public java.lang.String translate(java.lang.String string)
Translate character references. After generating the Translate class we could use it to do this job, but that would involve a bootstrap problem, so this method does the reference conversion for a very tiny subset (enough to understand the w3.org page).
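
A minimal sketch of such a bootstrap translation, assuming the subset consists of the four references below. The source does not list which references the real method handles, so the table and the class name TranslateSketch are hypothetical.

```java
class TranslateSketch {
    // Hypothetical subset; the references handled by the real method are not documented.
    private static final String[][] SUBSET = {
        {"&lt;", "<"},
        {"&gt;", ">"},
        {"&quot;", "\""},
        {"&amp;", "&"}, // decoded last so "&amp;lt;" becomes "&lt;" rather than "<"
    };

    // Replace each known character entity reference with its character.
    static String translate(String string) {
        String result = string;
        for (String[] pair : SUBSET) {
            result = result.replace(pair[0], pair[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(translate("&lt;!ENTITY nbsp CDATA &quot;&amp;#160;&quot;&gt;"));
    }
}
```

Decoding the ampersand last keeps an escaped reference such as "&amp;#160;" intact as "&#160;", which is the form the entity declarations use.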


parse

public void parse()
           throws ParserException
Pull out text elements from the HTML.


indexOfWhitespace

public int indexOfWhitespace(java.lang.String string,
                             int index)
Find the lowest index of whitespace (space or newline).
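
One way this search could work is sketched below. It assumes that the index argument is the position to start searching from and that -1 is returned when no whitespace is found; neither detail is stated in the source, and the class name is hypothetical.

```java
class WhitespaceSketch {
    // Return the lowest index >= index of a space or newline, or -1 if there is none.
    static int indexOfWhitespace(String string, int index) {
        int space = string.indexOf(' ', index);
        int newline = string.indexOf('\n', index);
        if (space == -1) {
            return newline;
        }
        if (newline == -1) {
            return space;
        }
        return Math.min(space, newline);
    }

    public static void main(String[] args) {
        System.out.println(indexOfWhitespace("nbsp   CDATA", 0));
    }
}
```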


pack

public java.lang.String pack(java.lang.String string)
Rewrite the comment string. In the SGML table, the comments are of the form:
 
  -- latin capital letter I with diaeresis,
              U+00CF ISOlat1
  
 
so we just want to make a one-liner without the spaces and newlines.
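
The rewrite can be sketched as a whitespace collapse. This assumes that runs of spaces and newlines become a single space and that the leading "--" is kept; the real method may differ on both points, and the class name is hypothetical.

```java
class PackSketch {
    // Collapse every run of whitespace to one space and trim the ends.
    static String pack(String string) {
        return string.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String comment = "-- latin capital letter I with diaeresis,\n"
            + "              U+00CF ISOlat1\n";
        System.out.println(pack(comment));
    }
}
```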


pretty

public java.lang.String pretty(java.lang.String string)
Pretty up a comment string.


pad

public java.lang.String pad(java.lang.String string,
                            char character,
                            int length)
Pad a string on the left with the given character to the length specified.
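
A minimal sketch of left padding, assuming a string already at or beyond the requested length is returned unchanged (the source does not say what happens in that case). The class name is hypothetical.

```java
class PadSketch {
    // Prefix the string with the pad character until it reaches the given length.
    static String pad(String string, char character, int length) {
        StringBuilder builder = new StringBuilder(Math.max(length, string.length()));
        for (int i = string.length(); i < length; i++) {
            builder.append(character);
        }
        return builder.append(string).toString();
    }

    public static void main(String[] args) {
        // For example, widen a hexadecimal code to four digits.
        System.out.println(pad(Integer.toHexString(207), '0', 4));
    }
}
```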


unicode

public java.lang.String unicode(java.lang.String string)
Convert the textual representation of the numeric character reference to a character.
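
The conversion might look like the sketch below. It assumes decimal input, either as bare digits ("160") or as a full reference ("&#160;"); the input form the real method expects is not documented, and the class name is hypothetical.

```java
class UnicodeSketch {
    // Turn a decimal numeric character reference into the character it names.
    static String unicode(String string) {
        String digits = string.trim();
        if (digits.startsWith("&#")) {
            digits = digits.substring(2);
        }
        if (digits.endsWith(";")) {
            digits = digits.substring(0, digits.length() - 1);
        }
        int codePoint = Integer.parseInt(digits);
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String character = unicode("&#207;");
        System.out.println(character + " has code point " + character.codePointAt(0));
    }
}
```

Using Character.toChars rather than a plain cast keeps the sketch correct for code points above U+FFFF, although the HTML 4.0 entity table does not contain any.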


extract

public void extract(java.lang.String string)
Parse the SGML declaration for the character entity reference name, the equivalent numeric character reference, and a comment. Emit a Java hash table 'put' with the name as the key and the numeric character as the value, and annotate the insertion with the comment.
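
A sketch of this step using a regular expression. The exact text the real method emits is not shown in the source, so the table variable name, the Character.valueOf form of the value, and the choice to return the line instead of printing it are all assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractSketch {
    // Groups: 1 = entity name, 2 = decimal code, 3 = comment text.
    private static final Pattern DECLARATION = Pattern.compile(
        "<!ENTITY\\s+(\\w+)\\s+CDATA\\s+\"&#(\\d+);\"\\s+--\\s*(.*?)\\s*-->",
        Pattern.DOTALL);

    // Build one 'put' statement for a declaration, or return null if it does not match.
    static String extract(String declaration) {
        Matcher matcher = DECLARATION.matcher(declaration);
        if (!matcher.find()) {
            return null;
        }
        String name = matcher.group(1);
        int code = Integer.parseInt(matcher.group(2));
        String comment = matcher.group(3).replaceAll("\\s+", " ");
        // "table" is a hypothetical variable name in the generated source.
        return "table.put(\"" + name + "\", Character.valueOf((char) " + code + ")); // " + comment;
    }

    public static void main(String[] args) {
        System.out.println(extract(
            "<!ENTITY nbsp   CDATA \"&#160;\" -- no-break space = non-breaking space, U+00A0 ISOnum -->"));
    }
}
```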


sgml

public void sgml(java.lang.String string)
Extract special characters. Scan the string looking for substrings of the form:
 
  <!ENTITY nbsp   CDATA "&#160;" -- no-break space = non-breaking space, U+00A0 ISOnum -->
  
 
and emit a Java definition for each.
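
The scan itself can be sketched with plain substring searches. This assumes every declaration ends with the comment close "-->", as in the example above; the class and method names are hypothetical, and the sketch collects the declarations into a list where the real method emits a definition for each.

```java
import java.util.ArrayList;
import java.util.List;

class SgmlScanSketch {
    // Collect each substring that runs from "<!ENTITY" through the next "-->".
    static List<String> declarations(String text) {
        List<String> found = new ArrayList<>();
        int start = text.indexOf("<!ENTITY");
        while (start != -1) {
            int end = text.indexOf("-->", start);
            if (end == -1) {
                break; // unterminated declaration: stop scanning
            }
            found.add(text.substring(start, end + 3));
            start = text.indexOf("<!ENTITY", end + 3);
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "<!ENTITY nbsp CDATA \"&#160;\" -- no-break space -->\n"
            + "<!ENTITY iexcl CDATA \"&#161;\" -- inverted exclamation mark -->";
        for (String declaration : declarations(text)) {
            System.out.println(declaration);
        }
    }
}
```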


main

public static void main(java.lang.String[] args)
                 throws ParserException
Generator program.
 
      java -classpath .:lib/htmlparser.jar Generate > Translate.java