org.dom4j.io
Class HTMLWriter

java.lang.Object
  org.xml.sax.helpers.XMLFilterImpl
    org.dom4j.io.XMLWriter
      org.dom4j.io.HTMLWriter
- All Implemented Interfaces:
- org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler, org.xml.sax.ext.LexicalHandler, org.xml.sax.XMLFilter, org.xml.sax.XMLReader
- public class HTMLWriter
- extends XMLWriter
HTMLWriter takes a DOM4J tree and formats it to a stream as HTML.
This formatter is similar to XMLWriter, but it differs in four ways: it outputs
the text of CDATA and Entity sections rather than their serialised XML form, it
has an XHTML mode, it retains whitespace in certain elements such as <PRE>,
and it supports certain elements which have no corresponding close tag, such
as <BR> and <P>.
The OutputFormat passed in to the constructor is checked for isXHTML() and
isExpandEmptyElements(). See OutputFormat for details.
Here are the rules for this class based on an OutputFormat, "format",
passed in to the constructor:
- If an element is in getOmitElementCloseSet(), then it is treated specially:
  - It never expands, since some browsers treat <HR></HR> as two separate horizontal rules.
  - If format.isXHTML(), then it has a space before the closing single-tag slash, since Netscape 4.x and earlier treat <HR /> as an element named "HR" with an attribute named "/", but that's better than when it refuses to recognize <hr/>, which it thinks is an element named "HR/".
- If format.isXHTML(), all elements must have either a close element, or be a closed single tag.
- If format.isExpandEmptyElements() is true, all elements are expanded except as above.
If isXHTML == true, CDATA sections look like this:
    <myelement><![CDATA[My data]]></myelement>
Otherwise, they look like this:
    <myelement>My data</myelement>
Basically, OutputFormat.isXHTML() == true will produce valid XML, while
format.isExpandEmptyElements() determines whether empty elements are
expanded if isXHTML is true, excepting the special HTML single tags.
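The rules above can be read as a small decision function. The sketch below is an illustrative model written only against the JDK, not dom4j's implementation: the class EmptyElementRules and its methods are hypothetical names, the tag list is copied from the default given under getOmitElementCloseSet, and two points are assumptions rather than statements from this page (that tag lookup is case-insensitive, and that ordinary elements follow isExpandEmptyElements in non-XHTML mode as well). Escaping of the CDATA text is not modelled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class EmptyElementRules {
    // Default close-omitting tags, as listed under getOmitElementCloseSet.
    static final Set<String> OMIT_CLOSE = new HashSet<>(Arrays.asList(
            "AREA", "BASE", "BR", "COL", "HR", "IMG", "INPUT",
            "LINK", "META", "P", "PARAM"));

    /** Models how an element with no content is rendered under the rules above. */
    static String renderEmpty(String name, boolean xhtml, boolean expandEmpty) {
        // Assumption: the lookup ignores case.
        boolean special = OMIT_CLOSE.contains(name.toUpperCase(Locale.ROOT));
        if (special) {
            // Special tags never expand: "<HR />" in XHTML mode, bare "<HR>" otherwise.
            return xhtml ? "<" + name + " />" : "<" + name + ">";
        }
        // Assumption: ordinary elements follow isExpandEmptyElements in both modes.
        return expandEmpty ? "<" + name + "></" + name + ">" : "<" + name + "/>";
    }

    /** Models the two CDATA renderings shown above (no escaping is modelled). */
    static String renderCdata(String name, String text, boolean xhtml) {
        String body = xhtml ? "<![CDATA[" + text + "]]>" : text;
        return "<" + name + ">" + body + "</" + name + ">";
    }

    public static void main(String[] args) {
        System.out.println(renderEmpty("HR", true, true));
        System.out.println(renderEmpty("div", true, false));
        System.out.println(renderCdata("myelement", "My data", false));
    }
}
```

The point of the model is that membership in the close-omitting set is checked first, so setExpandEmptyElements never affects tags such as HR or BR.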
Also, HTMLWriter handles tags whose contents should be preformatted, that is, whitespace-preserved. By default, this set includes the tags <PRE>, <SCRIPT>, <STYLE>, and <TEXTAREA>, case insensitively. It does not include <IFRAME>. Other tags, such as <CODE>, <KBD>, <TT>, <VAR>, are usually rendered in a different font in most browsers, but don't preserve whitespace, so they also don't appear in the default list. HTML Comments are always whitespace-preserved. However, the parser you use may store comments with linefeed-only text nodes (\n) even if your platform uses another line.separator character, and HTMLWriter outputs Comment nodes exactly as the DOM is set up by the parser. See examples and discussion here:
Examples
Pretty Printing
This example shows how to pretty print a string containing a valid HTML document to a string. You can also just call the static methods of this class: prettyPrintHTML(String), prettyPrintHTML(String,boolean,boolean,boolean,boolean), or, for XHTML (note the X), prettyPrintXHTML(String).
    String testPrettyPrint(String html) throws IOException, DocumentException {
        StringWriter sw = new StringWriter();
        OutputFormat format = OutputFormat.createPrettyPrint();
        // These are the default values for createPrettyPrint,
        // so you needn't set them:
        // format.setNewlines(true);
        // format.setTrimText(true);
        format.setXHTML(true);
        HTMLWriter writer = new HTMLWriter(sw, format);
        Document document = DocumentHelper.parseText(html);
        writer.write(document);
        writer.flush();
        return sw.toString();
    }
This example shows how to create a "squeezed" document, but one that will work in browsers even if the browser line length is limited. No newlines are included, no extra whitespace at all, except where it is required by setPreformattedTags.
    String testCrunch(String html) throws IOException, DocumentException {
        StringWriter sw = new StringWriter();
        OutputFormat format = OutputFormat.createPrettyPrint();
        format.setNewlines(false);
        format.setTrimText(true);
        format.setIndent("");
        format.setXHTML(true);
        format.setExpandEmptyElements(false);
        // Break the line after every 20 tags so that lines stay short.
        format.setNewLineAfterNTags(20);
        org.dom4j.io.HTMLWriter writer = new HTMLWriter(sw, format);
        org.dom4j.Document document = DocumentHelper.parseText(html);
        writer.write(document);
        writer.flush();
        return sw.toString();
    }
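The setNewLineAfterNTags(20) call in the crunch example is what keeps line length bounded when all other newlines are switched off. As a rough illustration of the idea only, the JDK-only sketch below emits a line separator each time a multiple of n tags has been written. The class NewLineAfterNTagsSketch and its join method are hypothetical, and the exact point at which HTMLWriter inserts the separator is an assumption here, not something this page specifies.

```java
import java.util.Arrays;
import java.util.List;

final class NewLineAfterNTagsSketch {
    /**
     * Concatenates tags with no whitespace, inserting lineSeparator before
     * a tag whenever a positive multiple of n tags has already been written.
     * A value of n <= 0 disables the line breaks.
     */
    static String join(List<String> tags, int n, String lineSeparator) {
        StringBuilder out = new StringBuilder();
        int tagsOutput = 0; // running count of tags written so far
        for (String tag : tags) {
            if (n > 0 && tagsOutput > 0 && tagsOutput % n == 0) {
                out.append(lineSeparator);
            }
            out.append(tag);
            tagsOutput++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("<a>", "<b>", "<c>", "<d>", "<e>");
        System.out.println(join(tags, 2, "\n"));
    }
}
```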
- Version:
- $Revision: 1.21 $
| Nested Class Summary | |
| private class | HTMLWriter.FormatState |
| Field Summary | | |
| protected static OutputFormat | DEFAULT_HTML_FORMAT | |
| protected static java.util.HashSet | DEFAULT_PREFORMATTED_TAGS | |
| private java.util.Stack | formatStack | |
| private java.lang.String | lastText | |
| private static java.lang.String | lineSeparator | |
| private int | newLineAfterNTags | |
| private java.util.HashSet | omitElementCloseSet | Used to store the qualified element names which should have no close element tag |
| private java.util.HashSet | preformattedTags | |
| private int | tagsOuput | |
| Fields inherited from class org.dom4j.io.XMLWriter |
| DEFAULT_FORMAT, lastOutputNodeType, LEXICAL_HANDLER_NAMES, preserve, writer |
| Fields inherited from class org.xml.sax.helpers.XMLFilterImpl |
| |
| Constructor Summary | |
| HTMLWriter() |
| HTMLWriter(OutputFormat format) |
| HTMLWriter(java.io.OutputStream out) |
| HTMLWriter(java.io.OutputStream out, OutputFormat format) |
| HTMLWriter(java.io.Writer writer) |
| HTMLWriter(java.io.Writer writer, OutputFormat format) |
| Method Summary | | |
| void | endCDATA() | Report the end of a CDATA section. |
| java.util.Set | getOmitElementCloseSet() | A clone of the Set of elements that can have their close-tags omitted. |
| java.util.Set | getPreformattedTags() | |
| private java.util.HashSet | internalGetOmitElementCloseSet() | |
| boolean | isPreformattedTag(java.lang.String qualifiedName) | Returns whether the given qualified name is in the current set of preformatted tags. |
| private java.lang.String | justSpaces(java.lang.String text) | |
| private void | lazyInitNewLinesAfterNTags() | |
| protected void | loadOmitElementCloseSet(java.util.Set set) | |
| protected boolean | omitElementClose(java.lang.String qualifiedName) | |
| static java.lang.String | prettyPrintHTML(java.lang.String html) | Convenience method to just get a String result. |
| static java.lang.String | prettyPrintHTML(java.lang.String html, boolean newlines, boolean trim, boolean isXHTML, boolean expandEmpty) | Convenience method to get a String result using the given formatting options. |
| static java.lang.String | prettyPrintXHTML(java.lang.String html) | Convenience method to just get a String result, but as XHTML. |
| void | setOmitElementCloseSet(java.util.Set newSet) | To use the empty set, pass an empty Set, or null. |
| void | setPreformattedTags(java.util.Set newSet) | Override the default set, which includes PRE, SCRIPT, STYLE, and TEXTAREA, case insensitively. |
| void | startCDATA() | Report the start of a CDATA section. |
| protected void | writeCDATA(java.lang.String text) | |
| protected void | writeClose(java.lang.String qualifiedName) | Overridden method to not close certain element names, to avoid weird behaviour from browsers for versions up to 5.x |
| protected void | writeDeclaration() | This will write the declaration to the given Writer. |
| protected void | writeElement(org.dom4j.Element element) | This override handles any elements that should not remove whitespace, such as <PRE>, <SCRIPT>, <STYLE>, and <TEXTAREA>. |
| protected void | writeEmptyElementClose(java.lang.String qualifiedName) | |
| protected void | writeEntity(org.dom4j.Entity entity) | |
| protected void | writeString(java.lang.String text) | |
| Methods inherited from class org.xml.sax.helpers.XMLFilterImpl |
| error, fatalError, getContentHandler, getDTDHandler, getEntityResolver, getErrorHandler, getFeature, getParent, parse, resolveEntity, setContentHandler, setDTDHandler, setEntityResolver, setErrorHandler, setFeature, setParent, skippedEntity, warning |
| Methods inherited from class java.lang.Object |
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
lineSeparator
private static java.lang.String lineSeparator
DEFAULT_PREFORMATTED_TAGS
protected static final java.util.HashSet DEFAULT_PREFORMATTED_TAGS
DEFAULT_HTML_FORMAT
protected static final OutputFormat DEFAULT_HTML_FORMAT
formatStack
private java.util.Stack formatStack
lastText
private java.lang.String lastText
tagsOuput
private int tagsOuput
newLineAfterNTags
private int newLineAfterNTags
preformattedTags
private java.util.HashSet preformattedTags
omitElementCloseSet
private java.util.HashSet omitElementCloseSet
- Used to store the qualified element names which should have no close
element tag
| Constructor Detail |
HTMLWriter
public HTMLWriter(java.io.Writer writer)
HTMLWriter
public HTMLWriter(java.io.Writer writer, OutputFormat format)
HTMLWriter
public HTMLWriter()
throws java.io.UnsupportedEncodingException
HTMLWriter
public HTMLWriter(OutputFormat format) throws java.io.UnsupportedEncodingException
HTMLWriter
public HTMLWriter(java.io.OutputStream out) throws java.io.UnsupportedEncodingException
HTMLWriter
public HTMLWriter(java.io.OutputStream out, OutputFormat format) throws java.io.UnsupportedEncodingException
| Method Detail |
startCDATA
public void startCDATA()
throws org.xml.sax.SAXException
- Description copied from interface: org.xml.sax.ext.LexicalHandler
- Report the start of a CDATA section.
The contents of the CDATA section will be reported through the regular characters event; this event is intended only to report the boundary.
- Specified by:
startCDATA in interface org.xml.sax.ext.LexicalHandler
- Overrides:
startCDATA in class XMLWriter
endCDATA
public void endCDATA()
throws org.xml.sax.SAXException
- Description copied from interface: org.xml.sax.ext.LexicalHandler
- Report the end of a CDATA section.
- Specified by:
endCDATA in interface org.xml.sax.ext.LexicalHandler
- Overrides:
endCDATA in class XMLWriter
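The boundary-only nature of startCDATA and endCDATA can be observed with the SAX parser that ships with the JDK, independently of dom4j. The sketch below registers a recording handler as the lexical handler and collects the character data delivered between the two boundary events. CdataBoundarySketch and Recorder are hypothetical names, and the example assumes the JDK's default parser supports the standard lexical-handler property.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

final class CdataBoundarySketch {
    /** Logs the CDATA boundary events and the text reported between them. */
    static final class Recorder extends DefaultHandler2 {
        final StringBuilder log = new StringBuilder();
        private boolean inCdata;

        @Override
        public void startCDATA() {
            inCdata = true;
            log.append("[startCDATA]");
        }

        @Override
        public void endCDATA() {
            inCdata = false;
            log.append("[endCDATA]");
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The CDATA content itself arrives through the ordinary characters event.
            if (inCdata) {
                log.append(ch, start, length);
            }
        }
    }

    static String trace(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            Recorder recorder = new Recorder();
            // Standard SAX2 property used to register a LexicalHandler.
            parser.setProperty("http://xml.org/sax/properties/lexical-handler", recorder);
            parser.parse(new InputSource(new StringReader(xml)), recorder);
            return recorder.log.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(trace("<myelement><![CDATA[My data]]></myelement>"));
    }
}
```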
writeCDATA
protected void writeCDATA(java.lang.String text) throws java.io.IOException
- Overrides:
writeCDATA in class XMLWriter
writeEntity
protected void writeEntity(org.dom4j.Entity entity) throws java.io.IOException
- Overrides:
writeEntity in class XMLWriter
writeDeclaration
protected void writeDeclaration()
throws java.io.IOException
- Description copied from class: XMLWriter
- This will write the declaration to the given Writer. Assumes XML version 1.0 since we don't directly know.
- Overrides:
writeDeclaration in class XMLWriter
writeString
protected void writeString(java.lang.String text) throws java.io.IOException
- Overrides:
writeString in class XMLWriter
writeClose
protected void writeClose(java.lang.String qualifiedName) throws java.io.IOException
- Overridden method to not close certain element names, to avoid weird
behaviour from browsers for versions up to 5.x
- Overrides:
writeClose in class XMLWriter
writeEmptyElementClose
protected void writeEmptyElementClose(java.lang.String qualifiedName) throws java.io.IOException
- Overrides:
writeEmptyElementClose in class XMLWriter
omitElementClose
protected boolean omitElementClose(java.lang.String qualifiedName)
internalGetOmitElementCloseSet
private java.util.HashSet internalGetOmitElementCloseSet()
loadOmitElementCloseSet
protected void loadOmitElementCloseSet(java.util.Set set)
getOmitElementCloseSet
public java.util.Set getOmitElementCloseSet()
- A clone of the Set of elements that can have their close-tags omitted. By
default it should be "AREA", "BASE", "BR", "COL", "HR", "IMG", "INPUT",
"LINK", "META", "P", "PARAM"
setOmitElementCloseSet
public void setOmitElementCloseSet(java.util.Set newSet)
- To use the empty set, pass an empty Set, or null:
setOmitElementCloseSet(new HashSet()); or setOmitElementCloseSet(null);
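Because getOmitElementCloseSet returns a clone, editing the returned Set has no effect until it is passed back through setOmitElementCloseSet. The JDK-only sketch below models that contract with a hypothetical stand-in class, OmitCloseSetSketch, seeded with only three tags for brevity. Normalising names to upper case is an assumption made for the model, not something this page states for the omit set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class OmitCloseSetSketch {
    // Shortened stand-in for the default list ("AREA", "BASE", "BR", ...).
    private Set<String> omitElementCloseSet =
            new HashSet<>(Arrays.asList("BR", "HR", "P"));

    /** Returns a copy, so callers cannot change the internal state directly. */
    Set<String> getOmitElementCloseSet() {
        return new HashSet<>(omitElementCloseSet);
    }

    /** Replaces the set; null and an empty Set both yield the empty set. */
    void setOmitElementCloseSet(Set<String> newSet) {
        Set<String> replacement = new HashSet<>();
        if (newSet != null) {
            for (String tag : newSet) {
                if (tag != null) {
                    // Assumption: names are compared ignoring case.
                    replacement.add(tag.toUpperCase(Locale.ROOT));
                }
            }
        }
        omitElementCloseSet = replacement;
    }

    boolean omitElementClose(String qualifiedName) {
        return omitElementCloseSet.contains(qualifiedName.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        OmitCloseSetSketch writer = new OmitCloseSetSketch();
        Set<String> copy = writer.getOmitElementCloseSet();
        copy.remove("P");                      // changes only the copy
        System.out.println(writer.omitElementClose("P"));
        writer.setOmitElementCloseSet(copy);   // now the change takes effect
        System.out.println(writer.omitElementClose("P"));
    }
}
```

A practical use of this pattern is removing "P" from the set so that paragraphs always receive an explicit close tag.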
getPreformattedTags
public java.util.Set getPreformattedTags()
setPreformattedTags
public void setPreformattedTags(java.util.Set newSet)
Override the default set, which includes PRE, SCRIPT, STYLE, and TEXTAREA, case insensitively.
Setting Preformatted Tags
Pass in a Set of Strings, one for each tag name that should be treated like a PRE tag. You may pass in null or an empty Set to assign the empty set, in which case no tags will be treated as preformatted, except that HTML Comments will continue to be preformatted. If a tag is included in the set of preformatted tags, all whitespace within the tag will be preserved, including whitespace on the same line preceding the close tag. This will generally make the close tag not line up with the start tag, but it preserves the intention of the whitespace within the tag.
The browser considers leading whitespace before the close tag to be significant, but leading whitespace before the open tag to be insignificant. For example, if the HTML author doesn't put the close TEXTAREA tag flush to the left margin, then the TEXTAREA control in the browser will have spaces on the last line inside the control. This may be the HTML author's intent. Similarly, in a PRE, the browser treats a flush-left close PRE tag as different from a close tag with leading whitespace. Again, this must be left up to the HTML author.
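The difference between preformatted and ordinary tags can be sketched without dom4j. In the hypothetical model below, text inside a preformatted tag is returned untouched, including whitespace directly before the close tag, while text in any other tag is trimmed and has runs of whitespace collapsed. PreformattedSketch and its methods are illustrative names, and the collapsing rule is a simplified stand-in for the writer's text trimming, not its actual algorithm.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class PreformattedSketch {
    // The default set named in this documentation.
    static final Set<String> DEFAULT_TAGS =
            new HashSet<>(Arrays.asList("PRE", "SCRIPT", "STYLE", "TEXTAREA"));

    /** Case-insensitive membership test; a null set means no tag is preformatted. */
    static boolean isPreformattedTag(String qualifiedName, Set<String> tags) {
        return tags != null && tags.contains(qualifiedName.toUpperCase(Locale.ROOT));
    }

    /** Returns the text as it would be written inside the given tag in this model. */
    static String renderText(String tag, String text, Set<String> tags) {
        if (isPreformattedTag(tag, tags)) {
            // Every character is kept, even spaces right before the close tag.
            return text;
        }
        // Simplified stand-in for trimming: strip the ends, collapse inner runs.
        return text.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println("[" + renderText("textarea", "line 1\n  ", DEFAULT_TAGS) + "]");
        System.out.println("[" + renderText("p", "  some   text\n", DEFAULT_TAGS) + "]");
    }
}
```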
Examples
Here is an example of how you can set the PreformattedTags list using setPreformattedTags to include IFRAME, as well as the default set, if you have an instance of this class named myHTMLWriter:
    Set current = myHTMLWriter.getPreformattedTags();
    current.add("IFRAME");
    myHTMLWriter.setPreformattedTags(current);
    // The set is now {PRE, SCRIPT, STYLE, TEXTAREA, IFRAME}
Similarly, you can simply replace it with your own:
    HashSet newset = new HashSet();
    newset.add("PRE");
    newset.add("TEXTAREA");
    myHTMLWriter.setPreformattedTags(newset);
    // The set is now {PRE, TEXTAREA}
You can remove all tags from the preformatted tags list, with an empty set, like this:
    myHTMLWriter.setPreformattedTags(new HashSet());
    // The set is now {}
or with null, like this:
    myHTMLWriter.setPreformattedTags(null);
    // The set is now {}
isPreformattedTag
public boolean isPreformattedTag(java.lang.String qualifiedName)
- Returns whether the given qualified name is in the current set of preformatted tags.
writeElement
protected void writeElement(org.dom4j.Element element) throws java.io.IOException
- This override handles any elements that should not remove whitespace,
such as <PRE>, <SCRIPT>, <STYLE>, and <TEXTAREA>.
Note: the close tags won't line up with the open tag, but we can't alter
that. See javadoc note at setPreformattedTags.
- Overrides:
writeElement in class XMLWriter
justSpaces
private java.lang.String justSpaces(java.lang.String text)
lazyInitNewLinesAfterNTags
private void lazyInitNewLinesAfterNTags()
prettyPrintHTML
public static java.lang.String prettyPrintHTML(java.lang.String html) throws java.io.IOException, java.io.UnsupportedEncodingException, org.dom4j.DocumentException
- Convenience method to just get a String result.
prettyPrintXHTML
public static java.lang.String prettyPrintXHTML(java.lang.String html) throws java.io.IOException, java.io.UnsupportedEncodingException, org.dom4j.DocumentException
- Convenience method to just get a String result, but as XHTML.
prettyPrintHTML
public static java.lang.String prettyPrintHTML(java.lang.String html, boolean newlines, boolean trim, boolean isXHTML, boolean expandEmpty) throws java.io.IOException, java.io.UnsupportedEncodingException, org.dom4j.DocumentException
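- Convenience method to get a String result using the given formatting options: newlines, trim, isXHTML, and expandEmpty.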