org.htmlparser.scanners
Class CompositeTagScanner

java.lang.Object
  extended by org.htmlparser.scanners.TagScanner
      extended by org.htmlparser.scanners.CompositeTagScanner
All Implemented Interfaces:
java.io.Serializable
Direct Known Subclasses:
AppletScanner, BodyScanner, BulletListScanner, BulletScanner, DivScanner, FormScanner, FrameSetScanner, HeadScanner, HtmlScanner, LabelScanner, LinkScanner, OptionTagScanner, ScriptScanner, SelectTagScanner, SpanScanner, StyleScanner, TableColumnScanner, TableRowScanner, TableScanner, TextareaTagScanner, TitleScanner

public abstract class CompositeTagScanner
extends TagScanner

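To create your own scanner that can hold children, create a subclass of this class. The composite tag scanner can be configured with tags which will trigger a match, tags which force correction, and a setting that prevents children of the same type.

Here are examples of each:

Tags which will trigger a match
If we wish to recognize <mytag>,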
 
  class MyScanner extends CompositeTagScanner {
    private static final String [] MATCH_IDS = { "MYTAG" };
    MyScanner() {
      super(MATCH_IDS);
    }
    ...
  }
  
 
Tags which force correction
If we wish to insert end tags when we get a </BODY> or </HTML> without receiving </mytag>,
 
  class MyScanner extends CompositeTagScanner {
    private static final String [] MATCH_IDS = { "MYTAG" };
    private static final String [] ENDERS = {};
    private static final String [] END_TAG_ENDERS = { "BODY", "HTML" };
    MyScanner() {
      // The constructors that accept endTagEnders also take a filter first; "" is assumed here.
      super("", MATCH_IDS, ENDERS, END_TAG_ENDERS, true);
    }
    ...
  }
  
 
Preventing children of same type
This is useful when you know that a certain tag can never hold children of its own type, e.g. <FORM> can never have more form tags within it. If it does, it is an error and should be corrected. The default behavior is to allow nesting.
 
  class MyScanner extends CompositeTagScanner {
    private static final String [] MATCH_IDS = { "FORM" };
    private static final String [] ENDERS = {};
    private static final String [] END_TAG_ENDERS = { "BODY", "HTML" };
    MyScanner() {
      // The constructors that accept endTagEnders also take a filter first; "" is assumed here.
      super("", MATCH_IDS, ENDERS, END_TAG_ENDERS, false);
    }
    ...
  }
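
To make the last two configuration options more concrete, here is a toy model of the corrections they describe. It is only a sketch and does not use or reproduce the htmlparser implementation: ToyCompositeScanner, its correct() method, and the idea of working on a flat list of tag strings are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: none of these names come from org.htmlparser.
class ToyCompositeScanner {
    private final String matchId;            // tag name that triggers the scanner, e.g. "FORM"
    private final Set<String> endTagEnders;  // end tags that force a missing end tag to be inserted
    private final boolean allowSelfChildren; // false: a nested tag of the same name closes the open one

    ToyCompositeScanner(String matchId, String[] endTagEnders, boolean allowSelfChildren) {
        this.matchId = matchId.toUpperCase();
        this.endTagEnders = new HashSet<>(Arrays.asList(endTagEnders));
        this.allowSelfChildren = allowSelfChildren;
    }

    /** Returns a copy of the token list with missing end tags for matchId inserted. */
    List<String> correct(List<String> tokens) {
        List<String> out = new ArrayList<>();
        String start = "<" + matchId + ">";
        String end = "</" + matchId + ">";
        int open = 0; // number of currently open matchId tags
        for (String token : tokens) {
            String upper = token.toUpperCase();
            if (upper.equals(start)) {
                if (open > 0 && !allowSelfChildren) {
                    out.add(end); // close the open tag instead of nesting
                    open--;
                }
                open++;
            } else if (upper.equals(end)) {
                if (open > 0) {
                    open--;
                }
            } else if (upper.startsWith("</") && upper.endsWith(">")
                    && endTagEnders.contains(upper.substring(2, upper.length() - 1))) {
                // An "end tag ender" arrived first: insert the missing end tags.
                while (open > 0) {
                    out.add(end);
                    open--;
                }
            }
            out.add(token);
        }
        return out;
    }
}
```

With matchId "FORM", end tag enders { "BODY", "HTML" } and allowSelfChildren set to false, the input <form>, text, </body> gains a </FORM> just before </body>, and a second <form> inside an open one is preceded by an inserted </FORM>.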
  
 
Inside the scanner, use createTag() to specify what tag needs to be created.


Field Summary
private  boolean allowSelfChildren
           
private  boolean balance_quotes
           
private  java.util.Set endTagEnderSet
           
protected  java.lang.String[] nameOfTagToMatch
           
private  java.util.Set tagEnderSet
           
 
Fields inherited from class org.htmlparser.scanners.TagScanner
feedback, filter
 
Constructor Summary
CompositeTagScanner(java.lang.String[] nameOfTagToMatch)
           
CompositeTagScanner(java.lang.String[] nameOfTagToMatch, java.lang.String[] tagEnders)
           
CompositeTagScanner(java.lang.String[] nameOfTagToMatch, java.lang.String[] tagEnders, boolean allowSelfChildren)
           
CompositeTagScanner(java.lang.String filter, java.lang.String[] nameOfTagToMatch)
           
CompositeTagScanner(java.lang.String filter, java.lang.String[] nameOfTagToMatch, java.lang.String[] tagEnders)
           
CompositeTagScanner(java.lang.String filter, java.lang.String[] nameOfTagToMatch, java.lang.String[] tagEnders, boolean allowSelfChildren)
           
CompositeTagScanner(java.lang.String filter, java.lang.String[] nameOfTagToMatch, java.lang.String[] tagEnders, java.lang.String[] endTagEnders, boolean allowSelfChildren)
           
CompositeTagScanner(java.lang.String filter, java.lang.String[] nameOfTagToMatch, java.lang.String[] tagEnders, java.lang.String[] endTagEnders, boolean allowSelfChildren, boolean balance_quotes)
          Constructor specifying all member fields.
 
Method Summary
 void beforeScanningStarts()
          Override this method if you wish to create any data structures or do anything before the start of the scan.
 void childNodeEncountered(org.htmlparser.Node node)
          This method is called every time a child of the composite is found.
abstract  org.htmlparser.tags.Tag createTag(org.htmlparser.tags.data.TagData tagData, org.htmlparser.tags.data.CompositeTagData compositeTagData)
          You must override this method to create the tag of your choice upon successful parsing.
 boolean isAllowSelfChildren()
           
 boolean isTagToBeEndedFor(org.htmlparser.tags.Tag tag)
           
 org.htmlparser.tags.Tag scan(org.htmlparser.tags.Tag tag, java.lang.String url, org.htmlparser.NodeReader reader, java.lang.String currLine)
          Scan the tag and extract the information related to this type.
 boolean shouldCreateEndTagAndExit()
          Override this method to implement scanner logic that determines if the current scanner is to be allowed.
 
Methods inherited from class org.htmlparser.scanners.TagScanner
absorb, absorbLeadingBlanks, adjustScanners, createScannedNode, createTag, evaluate, extractXMLData, getFilter, getID, getInsertedEndTag, getReplacedEndTag, insertEndTagBeforeNode, isXMLTagFound, removeChars, replaceFaultyTagWithEndTag, restoreScanners, setFeedback
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

nameOfTagToMatch

protected java.lang.String[] nameOfTagToMatch

allowSelfChildren

private boolean allowSelfChildren

tagEnderSet

private java.util.Set tagEnderSet

endTagEnderSet

private java.util.Set endTagEnderSet

balance_quotes

private boolean balance_quotes

Constructor Detail

CompositeTagScanner

public CompositeTagScanner(java.lang.String[] nameOfTagToMatch)

CompositeTagScanner

public CompositeTagScanner(java.lang.String[] nameOfTagToMatch,
                           java.lang.String[] tagEnders)

CompositeTagScanner

public CompositeTagScanner(java.lang.String[] nameOfTagToMatch,
                           java.lang.String[] tagEnders,
                           boolean allowSelfChildren)

CompositeTagScanner

public CompositeTagScanner(java.lang.String filter,
                           java.lang.String[] nameOfTagToMatch)

CompositeTagScanner

public CompositeTagScanner(java.lang.String filter,
                           java.lang.String[] nameOfTagToMatch,
                           java.lang.String[] tagEnders)

CompositeTagScanner

public CompositeTagScanner(java.lang.String filter,
                           java.lang.String[] nameOfTagToMatch,
                           java.lang.String[] tagEnders,
                           boolean allowSelfChildren)

CompositeTagScanner

public CompositeTagScanner(java.lang.String filter,
                           java.lang.String[] nameOfTagToMatch,
                           java.lang.String[] tagEnders,
                           java.lang.String[] endTagEnders,
                           boolean allowSelfChildren)

CompositeTagScanner

public CompositeTagScanner(java.lang.String filter,
                           java.lang.String[] nameOfTagToMatch,
                           java.lang.String[] tagEnders,
                           java.lang.String[] endTagEnders,
                           boolean allowSelfChildren,
                           boolean balance_quotes)
Constructor specifying all member fields.

Method Detail

scan

public org.htmlparser.tags.Tag scan(org.htmlparser.tags.Tag tag,
                                    java.lang.String url,
                                    org.htmlparser.NodeReader reader,
                                    java.lang.String currLine)
                             throws org.htmlparser.util.ParserException
Description copied from class: TagScanner
Scan the tag and extract the information related to this type. The url of the initiating scan has to be provided in case relative links are found. The initial url is then prepended to it to give an absolute link. The NodeReader is provided in order to do a lookahead operation. We assume that the identification has already been performed using the evaluate() method.

Overrides:
scan in class TagScanner
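
The description above mentions turning relative links into absolute ones using the url of the page being scanned. As a rough illustration of that idea, the sketch below uses the standard java.net.URI class rather than the library's own link handling; LinkResolution and toAbsolute are hypothetical names.

```java
import java.net.URI;

// Illustration only: resolves a link against the page url with java.net.URI,
// which is not necessarily how htmlparser does it internally.
class LinkResolution {
    /** Returns the link unchanged if it is already absolute, otherwise resolves it against pageUrl. */
    static String toAbsolute(String pageUrl, String link) {
        return URI.create(pageUrl).resolve(link).toString();
    }
}
```

For example, resolving images/logo.gif against http://example.com/docs/index.html gives http://example.com/docs/images/logo.gif, while an already absolute link is returned as it is.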

beforeScanningStarts

public void beforeScanningStarts()
Override this method if you wish to create any data structures or do anything before the start of the scan. This is just after a tag has triggered the scanner but before the scanner begins its processing.


childNodeEncountered

public void childNodeEncountered(org.htmlparser.Node node)
This method is called every time a child of the composite is found. It is useful when we need to store special children separately, although all children are collected into a node list anyway.
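
The two hooks, beforeScanningStarts() and childNodeEncountered(), could be used together roughly as follows. This is a self-contained sketch with stand-in types: ToyScannerBase and ToySelectScanner are hypothetical, nodes are modelled as plain strings, and the real scan() method takes different arguments.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a composite scanner base class; NOT the htmlparser API.
abstract class ToyScannerBase {
    /** Hook called once per scan, before any child is processed. */
    void beforeScanningStarts() { }

    /** Hook called once for each child found inside the composite. */
    void childNodeEncountered(String node) { }

    /** Collects every child into a list, notifying the hooks along the way. */
    final List<String> scan(List<String> nodes) {
        beforeScanningStarts();
        List<String> children = new ArrayList<>();
        for (String node : nodes) {
            childNodeEncountered(node);
            children.add(node); // all children are collected regardless of the hook
        }
        return children;
    }
}

// Keeps <option> children in a separate list in addition to the full child list.
class ToySelectScanner extends ToyScannerBase {
    final List<String> options = new ArrayList<>();

    @Override
    void beforeScanningStarts() {
        options.clear(); // reset per-scan state so results do not leak between scans
    }

    @Override
    void childNodeEncountered(String node) {
        if (node.startsWith("<option")) {
            options.add(node);
        }
    }
}
```

Resetting the separate list in beforeScanningStarts() matters when one scanner instance is reused for several tags, which this sketch assumes.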


createTag

public abstract org.htmlparser.tags.Tag createTag(org.htmlparser.tags.data.TagData tagData,
                                                  org.htmlparser.tags.data.CompositeTagData compositeTagData)
                                           throws org.htmlparser.util.ParserException
You must override this method to create the tag of your choice upon successful parsing. Data required for construction of your tag can be found within tagData and compositeTagData.
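
The pattern behind createTag() is a factory hook: the base class does the parsing and the subclass decides which tag object to build from the collected data. The sketch below models that pattern with stand-in types. ToyTagData, ToyCompositeTagData, ToyFormTag, and the finish() method are hypothetical, and the real TagData and CompositeTagData classes expose a different API.

```java
import java.util.List;

// Hypothetical stand-ins for the data objects handed to createTag().
final class ToyTagData {
    final String name; // tag name, e.g. "FORM"
    final String url;  // page the tag was found on
    ToyTagData(String name, String url) { this.name = name; this.url = url; }
}

final class ToyCompositeTagData {
    final List<String> children; // nodes collected between the start and end tag
    ToyCompositeTagData(List<String> children) { this.children = children; }
}

final class ToyFormTag {
    final String name;
    final String url;
    final int childCount;
    ToyFormTag(String name, String url, int childCount) {
        this.name = name;
        this.url = url;
        this.childCount = childCount;
    }
}

abstract class ToyCompositeBase<T> {
    /** Factory hook: the subclass chooses the tag object that represents the parsed region. */
    abstract T createTag(ToyTagData tagData, ToyCompositeTagData compositeTagData);

    /** Called once parsing has succeeded; delegates construction to the subclass. */
    final T finish(String name, String url, List<String> children) {
        return createTag(new ToyTagData(name, url), new ToyCompositeTagData(children));
    }
}

final class ToyFormScanner extends ToyCompositeBase<ToyFormTag> {
    @Override
    ToyFormTag createTag(ToyTagData tagData, ToyCompositeTagData compositeTagData) {
        return new ToyFormTag(tagData.name, tagData.url, compositeTagData.children.size());
    }
}
```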


isTagToBeEndedFor

public final boolean isTagToBeEndedFor(org.htmlparser.tags.Tag tag)

isAllowSelfChildren

public final boolean isAllowSelfChildren()

shouldCreateEndTagAndExit

public boolean shouldCreateEndTagAndExit()
Override this method to implement scanner logic that determines if the current scanner is to be allowed. This is useful when there are rules that do not allow recursive tags of the same type.