java.lang.Object
org.htmlparser.visitors.NodeVisitor
org.htmlparser.visitors.TextExtractingVisitor
public class TextExtractingVisitor
extends NodeVisitor
Extracts the text of a web page. Pass an instance to the parser, then read the extracted text back from the visitor. Usage:
Parser parser = new Parser(...);
TextExtractingVisitor visitor = new TextExtractingVisitor();
parser.visitAllNodesWith(visitor);
String textInPage = visitor.getExtractedText();
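The usage above relies on the visitor pattern: the parser walks the page and calls back into the visitor for each node, and the visitor keeps what it needs. The following is a minimal, self-contained sketch of that idea only. `VisitorSketch`, `Visitor`, `Node`, `TextCollector`, `text`, `tag` and `extract` are hypothetical stand-ins invented for illustration; they are not org.htmlparser classes, and the real TextExtractingVisitor may behave differently (for example around title and pre tags).

```java
import java.util.Arrays;
import java.util.List;

// Illustrative stand-in types only; nothing here is part of org.htmlparser.
class VisitorSketch {

    /** Callback base class with no-op defaults, playing the role NodeVisitor plays. */
    static class Visitor {
        void visitText(String text) { }
        void visitTag(String tagName) { }
    }

    /** A node dispatches itself to the matching visitor callback. */
    interface Node {
        void accept(Visitor visitor);
    }

    /** Hypothetical factory for a text node. */
    static Node text(String value) {
        return visitor -> visitor.visitText(value);
    }

    /** Hypothetical factory for a tag node. */
    static Node tag(String name) {
        return visitor -> visitor.visitTag(name);
    }

    /** Overrides only the text callback and appends each piece of text to a buffer. */
    static class TextCollector extends Visitor {
        private final StringBuilder accumulator = new StringBuilder();

        @Override
        void visitText(String text) {
            accumulator.append(text);
        }

        String getCollectedText() {
            return accumulator.toString();
        }
    }

    /** Walks the nodes in order, the way a parser would drive a visitor. */
    static String extract(List<Node> nodes) {
        TextCollector collector = new TextCollector();
        for (Node node : nodes) {
            node.accept(collector);
        }
        return collector.getCollectedText();
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(tag("p"), text("Hello, "), tag("b"), text("world"));
        System.out.println(extract(nodes)); // prints "Hello, world"
    }
}
```

Because the base class supplies empty defaults, the collector overrides only the callback it cares about; tags are visited but contribute nothing to the result.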
Fields inherited from class org.htmlparser.visitors.NodeVisitor

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
textAccumulator
private java.lang.StringBuffer textAccumulator
preTagBeingProcessed
private boolean preTagBeingProcessed
TextExtractingVisitor
public TextExtractingVisitor()
getExtractedText
public java.lang.String getExtractedText()
visitStringNode
public void visitStringNode(org.htmlparser.StringNode stringNode)
Overrides: visitStringNode in class NodeVisitor
visitTitleTag
public void visitTitleTag(org.htmlparser.tags.TitleTag titleTag)
Overrides: visitTitleTag in class NodeVisitor
replaceNonBreakingSpaceWithOrdinarySpace
private java.lang.String replaceNonBreakingSpaceWithOrdinarySpace(java.lang.String text)
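The body of this private method is not shown on this page. Going by its name alone, one plausible reading is that it swaps the no-break space character (U+00A0, which the entity `&nbsp;` decodes to) for an ordinary space. The sketch below illustrates that assumption; `NbspSketch` and `replaceNbsp` are hypothetical names, and the real method may instead operate on the undecoded entity text.

```java
// Hypothetical helper; not the library's implementation.
class NbspSketch {

    /** Replaces every U+00A0 no-break space with an ordinary U+0020 space. */
    static String replaceNbsp(String text) {
        return text.replace('\u00a0', ' ');
    }

    public static void main(String[] args) {
        System.out.println(replaceNbsp("a\u00a0b")); // prints "a b"
    }
}
```

`String.replace(char, char)` returns a new string and leaves the input untouched, which suits a helper that takes text in and hands the cleaned text back.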
visitEndTag
public void visitEndTag(org.htmlparser.tags.EndTag endTag)
Overrides: visitEndTag in class NodeVisitor
visitTag
public void visitTag(org.htmlparser.tags.Tag tag)
Overrides: visitTag in class NodeVisitor
isPreTag
private boolean isPreTag(org.htmlparser.tags.Tag tag)