org.apache.lenya.lucene.html
Class HtmlDocument

java.lang.Object
  org.apache.lenya.lucene.html.HtmlDocument
- public class HtmlDocument
- extends java.lang.Object
The HtmlDocument class creates a Lucene org.apache.lucene.document.Document
from an HTML document.
It does this using the JTidy package. It accepts input from a java.io.File or a java.io.InputStream.
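The flow described above (stream in, DOM root held in rawDoc, title and body read back out) might be sketched roughly as follows. This is not the Lenya source: the class name HtmlFieldsSketch and the helpers firstText and fromString are hypothetical, and the JDK's XML parser stands in for JTidy, so the input here has to be well-formed XHTML rather than arbitrary HTML.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical stand-in, not the Lenya class: holds the root element of a parsed page. */
final class HtmlFieldsSketch {
    // Plays the same role as the rawDoc field listed in the Field Summary.
    private final Element rawDoc;

    /** Parses well-formed XHTML with the JDK parser (assumption: the real class uses JTidy here). */
    HtmlFieldsSketch(InputStream is) throws IOException {
        try {
            rawDoc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(is)
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IOException("Could not parse input as XHTML", e);
        }
    }

    /** Trimmed text of the first descendant with the given tag name, or "" if there is none. */
    private String firstText(String tagName) {
        NodeList nodes = rawDoc.getElementsByTagName(tagName);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    String getTitle() {
        return firstText("title");
    }

    String getBody() {
        return firstText("body");
    }

    /** Convenience for examples only: builds the sketch from an in-memory string. */
    static HtmlFieldsSketch fromString(String xhtml) {
        try {
            return new HtmlFieldsSketch(
                    new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HtmlFieldsSketch doc = fromString(
                "<html><head><title>Hello</title></head><body><p>World</p></body></html>");
        System.out.println(doc.getTitle()); // prints "Hello"
        System.out.println(doc.getBody());  // prints "World"
    }
}
```

In the real class the resulting strings would presumably be added as fields of a Lucene Document. That step is left out here so the sketch needs nothing beyond the JDK.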
| Field Summary | |
| private java.lang.String | luceneClassValue |
| private java.lang.String | luceneTagName |
| private org.w3c.dom.Element | rawDoc |
| Constructor Summary | |
| HtmlDocument(java.io.File file) | Constructs an HtmlDocument from a java.io.File. |
| HtmlDocument(java.io.InputStream is) | Constructs an HtmlDocument from a java.io.InputStream. |
| Method Summary | | |
| static org.apache.lucene.document.Document | Document(java.io.File file) | Creates a Lucene Document from a java.io.File. |
| java.lang.String | getBody() | Gets the body text attribute of the HtmlDocument object. |
| private java.lang.String | getBodyText(org.w3c.dom.Node node, boolean indexByLucene) | Gets the bodyText attribute of the HtmlDocument object. |
| static org.apache.lucene.document.Document | getDocument(java.io.InputStream is) | Creates a Lucene Document from a java.io.InputStream. |
| java.lang.String | getTitle() | Gets the title attribute of the HtmlDocument object. |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
rawDoc
private org.w3c.dom.Element rawDoc
luceneTagName
private java.lang.String luceneTagName
luceneClassValue
private java.lang.String luceneClassValue
| Constructor Detail |
HtmlDocument
public HtmlDocument(java.io.File file) throws java.io.IOException
- Constructs an HtmlDocument from a java.io.File.
HtmlDocument
public HtmlDocument(java.io.InputStream is) throws java.io.IOException
- Constructs an HtmlDocument from a java.io.InputStream.
| Method Detail |
getDocument
public static org.apache.lucene.document.Document getDocument(java.io.InputStream is) throws java.io.IOException
- Creates a Lucene Document from a java.io.InputStream.
Document
public static org.apache.lucene.document.Document Document(java.io.File file) throws java.io.IOException
- Creates a Lucene Document from a java.io.File.
getTitle
public java.lang.String getTitle()
- Gets the title attribute of the HtmlDocument object.
getBody
public java.lang.String getBody()
- Gets the body text attribute of the HtmlDocument object.
getBodyText
private java.lang.String getBodyText(org.w3c.dom.Node node, boolean indexByLucene)
- Gets the bodyText attribute of the HtmlDocument object.
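The signature getBodyText(org.w3c.dom.Node node, boolean indexByLucene) suggests a recursive walk over the DOM that carries an "index this subtree" flag. A minimal sketch of such a walk follows. It is not the Lenya implementation. It assumes that an element whose tag name equals luceneTagName and whose class attribute equals luceneClassValue switches indexing on for its subtree; the page does not document the real meaning of those two fields. The class name BodyTextSketch and the helpers isMarked and parse are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical sketch, not the Lenya source: collects text below a DOM node. */
final class BodyTextSketch {
    // Assumed meaning: <luceneTagName class="luceneClassValue"> marks content to index.
    private final String luceneTagName;
    private final String luceneClassValue;

    BodyTextSketch(String luceneTagName, String luceneClassValue) {
        this.luceneTagName = luceneTagName;
        this.luceneClassValue = luceneClassValue;
    }

    /** True if the element carries the assumed tag name and class value. */
    private boolean isMarked(Element el) {
        return el.getTagName().equals(luceneTagName)
                && luceneClassValue.equals(el.getAttribute("class"));
    }

    /**
     * Concatenates the text nodes below {@code node}. Text is kept only while
     * indexing is on, either because the caller passed true or because an
     * enclosing element was marked.
     */
    String getBodyText(Node node, boolean indexByLucene) {
        StringBuilder text = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                if (indexByLucene) {
                    text.append(child.getNodeValue());
                }
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                boolean marked = isMarked((Element) child);
                text.append(getBodyText(child, indexByLucene || marked));
            }
        }
        return text.toString();
    }

    /** Example helper: parses well-formed XHTML with the JDK parser (JTidy is not used here). */
    static Element parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XHTML", e);
        }
    }

    public static void main(String[] args) {
        Element body = parse(
                "<body><p>skip </p><div class=\"index\">keep <b>this</b></div></body>");
        BodyTextSketch sketch = new BodyTextSketch("div", "index");
        System.out.println(sketch.getBodyText(body, false)); // prints "keep this"
        System.out.println(sketch.getBodyText(body, true));  // prints "skip keep this"
    }
}
```

Passing the flag down the recursion means no state has to be kept between calls, so the same method can serve both a "index everything" call and a "index only marked regions" call.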