org.greenstone.gatherer.cdm
Class CommandTokenizer

java.lang.Object
  extended by org.greenstone.gatherer.cdm.CommandTokenizer

public class CommandTokenizer
extends java.lang.Object

This class extends the behaviour of the standard StringTokenizer (which it wraps rather than subclasses): it recognizes that quotes, or some form of bracketing, enclose a single token, so in something like:
format Search '<table><img src=... </table>'
the formatting string is parsed as a single token. Unfortunately this makes countTokens() unreliable as an exact measure of the tokens remaining; it is only useful for determining whether there are tokens left to be processed (including any that have already been read into the command buffer).
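To see why the grouping is needed, the following minimal sketch runs a plain java.util.StringTokenizer over a quoted command. The class name PlainTokenizerDemo and the sample command are made up for illustration; only StringTokenizer itself is standard library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class, not part of Greenstone.
final class PlainTokenizerDemo {
    // Splits on whitespace only, which is what StringTokenizer does by default.
    static List<String> plainTokens(String command) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(command);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Assumed sample command; the quoted part is meant to be one argument.
        String command = "format Search '<b>Hello World</b>'";
        // Prints [format, Search, '<b>Hello, World</b>']
        System.out.println(plainTokens(command));
    }
}
```

The quoted formatting string is split at its internal space and keeps its quote characters, which is the behaviour a quote-aware tokenizer has to avoid.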

Version:
2.3

Field Summary
static int BRACKET_ENCLOSED
           
private  int count
           
static int DOUBLE_QUOTE_ENCLOSED
           
private  java.io.BufferedReader in_stream
           
private  java.util.StringTokenizer internal_tokenizer
           
static int NORMAL
           
static int QUOTE_ENCLOSED
           
 
Constructor Summary
CommandTokenizer(java.lang.String command)
          Basic Constructor.
CommandTokenizer(java.lang.String command, java.io.BufferedReader in_stream)
          Advanced Constructor.
 
Method Summary
private  java.lang.String buildToken(java.lang.StringBuffer buffer, char end_char, boolean strip_characters)
          Parse in the next token.
 int countTokens()
          Returns the minimum number of remaining tokens before the tokenizer runs out of string.
 boolean hasMoreTokens()
          Determine if there are still tokens available.
 java.lang.String nextToken()
          Method to retrieve the next token from the command, taking care to group tokens enclosed in speech marks.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

BRACKET_ENCLOSED

public static final int BRACKET_ENCLOSED
See Also:
Constant Field Values

DOUBLE_QUOTE_ENCLOSED

public static final int DOUBLE_QUOTE_ENCLOSED
See Also:
Constant Field Values

NORMAL

public static final int NORMAL
See Also:
Constant Field Values

QUOTE_ENCLOSED

public static final int QUOTE_ENCLOSED
See Also:
Constant Field Values

in_stream

private java.io.BufferedReader in_stream

count

private int count

internal_tokenizer

private java.util.StringTokenizer internal_tokenizer

Constructor Detail

CommandTokenizer

public CommandTokenizer(java.lang.String command)
Basic Constructor. Used to parse tokens from a string, keeping tokens surrounded by speech marks or square brackets intact. Thus something like:
collectionmeta collectionextra [l = en] "Hello World"
is tokenized thus
{'collectionmeta', 'collectionextra', 'l = en', 'Hello World'}
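The grouping described above can be sketched roughly as follows. This is not the Greenstone implementation: QuoteAwareSplitter, closerFor and split are hypothetical names, the sketch works on a single string only, and it deliberately ignores escaped characters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not the CommandTokenizer source.
final class QuoteAwareSplitter {
    // Returns the closing character for an opening enclosure, or '\0' if c opens nothing.
    static char closerFor(char c) {
        switch (c) {
            case '"':  return '"';
            case '\'': return '\'';
            case '[':  return ']';
            default:   return '\0';
        }
    }

    static List<String> split(String command) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        int n = command.length();
        while (i < n) {
            // Skip whitespace between tokens.
            while (i < n && Character.isWhitespace(command.charAt(i))) i++;
            if (i >= n) break;
            char closer = closerFor(command.charAt(i));
            int start;
            if (closer != '\0') {
                start = ++i; // the token begins after the opening character
                while (i < n && command.charAt(i) != closer) i++;
                tokens.add(command.substring(start, i));
                if (i < n) i++; // step over the closing character
            } else {
                start = i;
                while (i < n && !Character.isWhitespace(command.charAt(i))) i++;
                tokens.add(command.substring(start, i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [collectionmeta, collectionextra, l = en, Hello World]
        System.out.println(split("collectionmeta collectionextra [l = en] \"Hello World\""));
    }
}
```

An unterminated enclosure simply runs to the end of the string here; that is a choice made for the sketch, not documented behaviour of the class.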


CommandTokenizer

public CommandTokenizer(java.lang.String command,
                        java.io.BufferedReader in_stream)
Advanced Constructor. As above, but with one major difference: because it is given an input stream (presumably the one the command string originated from), it can parse a quote-enclosed command token that stretches over several lines. Each newline is preserved in the resulting token. There is an extra complication here, as something like a collectionextra value might contain HTML code with escaped speech marks, so extra care must be taken not to break at them. Thus something like:
collectionmeta collectionextra [l = en] "
an example of the crazy as description we sometimes get which includes of all things something like >this which you could easily see might be a problem if I parse this niavely."
is tokenized thus
{'collectionmeta', 'collectionextra', 'l = en', '\nan example of the crazy as description we sometimes get which includes of all things something like this which you could easily see might be a problem if I parse this niavely.'}
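A rough sketch of the multi-line case, under stated assumptions: MultiLineQuoteReader, isEscaped, indexOfUnescapedQuote and readQuoted are hypothetical names, a backslash is assumed to be the escape character, and escaped quotes are left in the token as written. The real class may handle any of these differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical illustration, not the CommandTokenizer source.
final class MultiLineQuoteReader {
    // True if the character at index i is preceded by an odd number of backslashes.
    static boolean isEscaped(CharSequence s, int i) {
        int slashes = 0;
        for (int j = i - 1; j >= 0 && s.charAt(j) == '\\'; j--) slashes++;
        return slashes % 2 == 1;
    }

    // Index of the first unescaped '"' at or after 'from', or -1 if there is none.
    static int indexOfUnescapedQuote(CharSequence s, int from) {
        for (int i = from; i < s.length(); i++) {
            if (s.charAt(i) == '"' && !isEscaped(s, i)) return i;
        }
        return -1;
    }

    // Reads a quoted token that opens on firstLine and may continue on later lines of 'in'.
    static String readQuoted(String firstLine, BufferedReader in) {
        int open = indexOfUnescapedQuote(firstLine, 0);
        if (open < 0) throw new IllegalArgumentException("no opening quote");
        StringBuilder token = new StringBuilder(firstLine.substring(open + 1));
        int close = indexOfUnescapedQuote(token, 0);
        try {
            while (close < 0) {
                String next = in.readLine();
                if (next == null) return token.toString(); // stream ended before the quote closed
                int searchFrom = token.length() + 1;       // first index of the appended line
                token.append('\n').append(next);           // keep the newline in the token
                close = indexOfUnescapedQuote(token, searchFrom);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return token.substring(0, close);
    }

    public static void main(String[] args) {
        BufferedReader in = new BufferedReader(
                new StringReader("a \\\"quoted\\\" word\nand a last line.\""));
        String token = readQuoted("collectionmeta collectionextra [l = en] \"", in);
        System.out.println(token);
    }
}
```

Counting the backslashes before a quote, rather than checking only the previous character, keeps a literal backslash followed by a closing quote from being misread as an escape.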

Method Detail

countTokens

public int countTokens()
Returns the minimum number of remaining tokens before the tokenizer runs out of string. There may be more tokens than this count, but never fewer. The discrepancy is due to internal functionality and the fact that we can't read ahead in the string or associated stream without risking the need for unpredictable push-back.


hasMoreTokens

public boolean hasMoreTokens()
Determine if there are still tokens available.


nextToken

public java.lang.String nextToken()
Method to retrieve the next token from the command, taking care to group tokens enclosed in speech marks.


buildToken

private java.lang.String buildToken(java.lang.StringBuffer buffer,
                                    char end_char,
                                    boolean strip_characters)
Parse in the next token, paying heed to the demands of enclosing characters, escaped characters, newlines and empty buffers, and to any consequent unexpected end of token.
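Since buildToken is private and its body is not shown here, the following is only a guess at what a helper with this signature could do. It assumes the buffer starts with the opening enclosing character, that a backslash escapes the next character, and that strip_characters means "drop the enclosing characters"; all three are assumptions, and BuildTokenSketch is a hypothetical class.

```java
// Hypothetical illustration, not the CommandTokenizer source.
final class BuildTokenSketch {
    // Consumes one enclosed token from the front of the buffer and returns it.
    // Returns null for an empty buffer; an unclosed token runs to the end of the buffer.
    static String buildToken(StringBuffer buffer, char endChar, boolean stripCharacters) {
        if (buffer.length() == 0) return null;
        int i = 1; // position 0 is assumed to hold the opening character
        while (i < buffer.length()) {
            char c = buffer.charAt(i);
            if (c == '\\' && i + 1 < buffer.length()) {
                i += 2; // skip the backslash and the character it escapes
                continue;
            }
            if (c == endChar) break;
            i++;
        }
        boolean closed = i < buffer.length();
        int end = closed ? i + 1 : buffer.length();
        String raw = buffer.substring(0, end);
        buffer.delete(0, end); // leave only the unread remainder in the buffer
        if (!stripCharacters) return raw;
        return raw.substring(1, closed ? raw.length() - 1 : raw.length());
    }

    public static void main(String[] args) {
        StringBuffer buffer = new StringBuffer("\"a \\\"b\\\" c\" rest");
        System.out.println(buildToken(buffer, '"', true)); // the token without its enclosing quotes
        System.out.println(buffer);                        // what is left to tokenize
    }
}
```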