org.htmlparser.util
Class Generate

java.lang.Object
  org.htmlparser.util.Generate
- public class Generate
- extends java.lang.Object
Create a character reference translation class source file. Usage:
java -classpath .:lib/htmlparser.jar Generate > Translate.java
Derived from HTMLStringFilter.java provided as an example with the
htmlparser.jar file available at htmlparser.sourceforge.net
written by Somik Raha (somik@industriallogic.com, http://industriallogic.com).
| Field Summary | |
protected static java.lang.String |
nl
The system specific line separator string. |
protected org.htmlparser.Parser |
parser
The working parser. |
| Constructor Summary | |
Generate()
Create a Generate object. |
|
| Method Summary | |
void |
extract(java.lang.String string)
Parse the sgml declaration for character entity reference name, equivalent numeric character reference and a comment. |
int |
indexOfWhitespace(java.lang.String string,
int index)
Find the lowest index of whitespace (space or newline). |
static void |
main(java.lang.String[] args)
Generator program. |
java.lang.String |
pack(java.lang.String string)
Rewrite the comment string. |
java.lang.String |
pad(java.lang.String string,
char character,
int length)
Pad a string on the left with the given character to the length specified. |
void |
parse()
Pull out text elements from the HTML. |
java.lang.String |
pretty(java.lang.String string)
Pretty up a comment string. |
void |
sgml(java.lang.String string)
Extract special characters. |
java.lang.String |
translate(java.lang.String string)
Translate character references. |
java.lang.String |
unicode(java.lang.String string)
Convert the textual representation of the numeric character reference to a character. |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
parser
protected org.htmlparser.Parser parser
- The working parser.
nl
protected static final java.lang.String nl
- The system specific line separator string.
| Constructor Detail |
Generate
public Generate()
throws ParserException
- Create a Generate object. Sets up the generation by creating a new
Parser pointed at http://www.w3.org/TR/REC-html40/sgml/entities.html with the standard scanners registered.
| Method Detail |
translate
public java.lang.String translate(java.lang.String string)
- Translate character references. After generating the Translate class we
could use it to do this job, but that would involve a bootstrap problem,
so this method does the reference conversion for a very tiny subset
(enough to understand the w3.org page).
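For illustration only, a minimal sketch of what such a tiny-subset conversion could look like. The class name TranslateSketch and the particular set of references handled are assumptions; the actual subset used by Generate.translate is not listed on this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TranslateSketch {
    // Hypothetical subset of references; the real method may handle a different set.
    private static final Map<String, String> REFS = new LinkedHashMap<>();
    static {
        REFS.put("&lt;", "<");
        REFS.put("&gt;", ">");
        REFS.put("&quot;", "\"");
        REFS.put("&nbsp;", "\u00A0");
        // Handled last so that "&amp;lt;" becomes "&lt;" rather than "<".
        REFS.put("&amp;", "&");
    }

    /** Replace each known reference with its character, in insertion order. */
    static String translate(String string) {
        String result = string;
        for (Map.Entry<String, String> entry : REFS.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }
}
```

Replacing &amp; last avoids decoding the same text twice, which matters because the w3.org page shows the entity declarations as escaped HTML.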
parse
public void parse()
throws ParserException
- Pull out text elements from the HTML.
indexOfWhitespace
public int indexOfWhitespace(java.lang.String string, int index)
- Find the lowest index of whitespace (space or newline).
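A minimal sketch of this search, under two assumptions not stated on this page: that the search starts at the given index, and that -1 is returned when no whitespace is found (mirroring String.indexOf). The class name WhitespaceSketch is hypothetical.

```java
class WhitespaceSketch {
    /** Lowest index >= index of a space or newline, or -1 if there is none (assumed). */
    static int indexOfWhitespace(String string, int index) {
        int space = string.indexOf(' ', index);
        int newline = string.indexOf('\n', index);
        if (space == -1) {
            return newline;
        }
        if (newline == -1) {
            return space;
        }
        return Math.min(space, newline);
    }
}
```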
pack
public java.lang.String pack(java.lang.String string)
- Rewrite the comment string. In the sgml table, the comments are of the
form:
-- latin capital letter I with diaeresis, U+00CF ISOlat1
so we just want to make a one-liner without the spaces and newlines.
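One way to produce such a one-liner is to collapse every run of whitespace into a single space. This is a sketch under that assumption; whether the real pack method also strips the leading "--" is not stated here, so the sketch leaves it in place. The class name PackSketch is hypothetical.

```java
class PackSketch {
    /** Trim the comment and collapse runs of spaces and newlines to one space. */
    static String pack(String string) {
        return string.trim().replaceAll("\\s+", " ");
    }
}
```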
pretty
public java.lang.String pretty(java.lang.String string)
- Pretty up a comment string.
pad
public java.lang.String pad(java.lang.String string, char character, int length)
- Pad a string on the left with the given character to the length
specified.
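A minimal sketch of left padding, for example to bring a hexadecimal code up to four digits. It assumes that a string already at or beyond the requested length is returned unchanged; the class name PadSketch is hypothetical.

```java
class PadSketch {
    /** Prepend the pad character until the result is at least 'length' long. */
    static String pad(String string, char character, int length) {
        StringBuilder padded = new StringBuilder();
        for (int i = string.length(); i < length; i++) {
            padded.append(character);
        }
        return padded.append(string).toString();
    }
}
```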
unicode
public java.lang.String unicode(java.lang.String string)
- Convert the textual representation of the numeric character reference to
a character.
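The exact input format is not documented on this page. The sketch below assumes a decimal reference, either bare ("160") or in the "&#160;" form, and returns a one-character string; the real method may instead emit an escaped form suitable for Java source. The class name UnicodeSketch is hypothetical.

```java
class UnicodeSketch {
    /** Convert a decimal numeric character reference to the character it names. */
    static String unicode(String string) {
        String digits = string;
        if (digits.startsWith("&#")) {
            digits = digits.substring(2);
        }
        if (digits.endsWith(";")) {
            digits = digits.substring(0, digits.length() - 1);
        }
        int codePoint = Integer.parseInt(digits);
        return new String(Character.toChars(codePoint));
    }
}
```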
extract
public void extract(java.lang.String string)
- Parse the SGML declaration for the character entity reference name, the
equivalent numeric character reference, and a comment. Emit a Java hash
table 'put' with the name as the key and the numeric character as the
value, and annotate the insertion with the comment.
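The sketch below shows one way this could be done with a regular expression. It assumes the declaration body carries the numeric reference in the "&#160;" form used by the W3C entity tables, and it returns the generated line rather than printing it. The class name ExtractSketch, the variable name table, and the exact shape of the emitted statement are all hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractSketch {
    // Entity name, the CDATA keyword, a quoted decimal reference, then "--" and the comment.
    private static final Pattern DECLARATION = Pattern.compile(
        "(\\S+)\\s+CDATA\\s+\"&#(\\d+);\"\\s+--\\s*(.*)", Pattern.DOTALL);

    /** Build a 'put' statement for one declaration, or return null if it does not match. */
    static String extract(String string) {
        Matcher matcher = DECLARATION.matcher(string.trim());
        if (!matcher.matches()) {
            return null;
        }
        String name = matcher.group(1);
        int code = Integer.parseInt(matcher.group(2));
        // Collapse the comment onto one line.
        String comment = matcher.group(3).trim().replaceAll("\\s+", " ");
        return String.format("table.put(\"%s\", \"\\u%04x\"); // %s", name, code, comment);
    }
}
```

Given the body of the nbsp declaration, this yields a line that puts "nbsp" into the table with the escaped character 00a0 as its value, followed by the original comment.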
sgml
public void sgml(java.lang.String string)
- Extract special characters. Scan the string looking for substrings of the
form:
<!ENTITY nbsp CDATA "&#160;" -- no-break space = non-breaking space, U+00A0 ISOnum -->
and emit a java definition for each.
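A minimal sketch of the scanning step, assuming each declaration starts with "<!ENTITY" and ends at the next "-->". It only collects the declaration bodies; turning each body into a Java definition is left to a method such as extract. The class name SgmlSketch and the method name declarations are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SgmlSketch {
    // Lazily match everything between "<!ENTITY" and the closing "-->".
    private static final Pattern ENTITY =
        Pattern.compile("<!ENTITY\\s+(.*?)\\s*-->", Pattern.DOTALL);

    /** Return the body of every entity declaration found in the string, in order. */
    static List<String> declarations(String string) {
        List<String> found = new ArrayList<>();
        Matcher matcher = ENTITY.matcher(string);
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }
}
```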
main
public static void main(java.lang.String[] args) throws ParserException
- Generator program.
java -classpath .:lib/htmlparser.jar Generate > Translate.java