java.lang.Object
org.greenstone.gatherer.cdm.CommandTokenizer
- public class CommandTokenizer
- extends java.lang.Object
This class builds on the standard StringTokenizer by recognizing that quotes (or some form of bracketing) enclose a single token, so in something like:
format Search '<table><img src=... </table>'
the formatting string is parsed as a single token. Unfortunately this makes countTokens() unreliable as an exact measure of the tokens remaining; it is only useful for determining whether there are tokens left to be processed (including any that have already been read into the command buffer).
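For comparison, the following short demo shows what a plain java.util.StringTokenizer with its default delimiters does to the format example above: it splits on every whitespace character, so the quoted formatting string is broken into pieces. Only the standard class is used; the class name PlainTokenizerDemo and the helper split are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class PlainTokenizerDemo {

    /** Collects every whitespace-delimited piece that StringTokenizer returns. */
    static List<String> split(String command) {
        StringTokenizer tokenizer = new StringTokenizer(command);
        List<String> pieces = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            pieces.add(tokenizer.nextToken());
        }
        return pieces;
    }

    public static void main(String[] args) {
        String command = "format Search '<table><img src=... </table>'";
        // The quoted string contains two spaces, so it is split three ways.
        System.out.println(new StringTokenizer(command).countTokens()); // prints 5
        System.out.println(split(command));
        // prints [format, Search, '<table><img, src=..., </table>']
    }
}
```

A quote-aware tokenizer would report three tokens for the same input, which is one way to see why a count taken from the underlying whitespace tokenizer cannot be exact.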
- Version:
- 2.3
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
BRACKET_ENCLOSED
public static final int BRACKET_ENCLOSED
- See Also:
- Constant Field Values
DOUBLE_QUOTE_ENCLOSED
public static final int DOUBLE_QUOTE_ENCLOSED
- See Also:
- Constant Field Values
NORMAL
public static final int NORMAL
- See Also:
- Constant Field Values
QUOTE_ENCLOSED
public static final int QUOTE_ENCLOSED
- See Also:
- Constant Field Values
in_stream
private java.io.BufferedReader in_stream
count
private int count
internal_tokenizer
private java.util.StringTokenizer internal_tokenizer
CommandTokenizer
public CommandTokenizer(java.lang.String command)
- Basic Constructor. Used to parse tokens from a string, keeping tokens surrounded by speech marks or square brackets intact. Thus something like:
collectionmeta collectionextra [l = en] "Hello World"
is tokenized thus
{'collectionmeta', 'collectionextra', 'l = en', 'Hello World'}
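The grouping shown above can be sketched in a few lines. This is a minimal illustration, not the CommandTokenizer source: the class QuoteAwareSplitSketch and its tokenize method are hypothetical, and the sketch assumes a single-line command with no escaped characters, treating double quotes, single quotes and square brackets as enclosing characters.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteAwareSplitSketch {

    /** Splits on whitespace, keeping enclosed runs whole and dropping the enclosing characters. */
    static List<String> tokenize(String command) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < command.length()) {
            char c = command.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            // Work out which character, if any, closes a token opened by c.
            char end = '\0';
            if (c == '"' || c == '\'') {
                end = c;
            } else if (c == '[') {
                end = ']';
            }
            if (end != '\0') {
                int close = command.indexOf(end, i + 1);
                if (close < 0) {
                    close = command.length(); // unterminated: take the rest of the string
                }
                tokens.add(command.substring(i + 1, close));
                i = close + 1;
            } else {
                // Ordinary token: runs up to the next whitespace character.
                int j = i;
                while (j < command.length() && !Character.isWhitespace(command.charAt(j))) {
                    j++;
                }
                tokens.add(command.substring(i, j));
                i = j;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("collectionmeta collectionextra [l = en] \"Hello World\""));
        // prints [collectionmeta, collectionextra, l = en, Hello World]
    }
}
```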
CommandTokenizer
public CommandTokenizer(java.lang.String command,
java.io.BufferedReader in_stream)
- Advanced Constructor. As above, but with one major difference: since it is given an input stream (presumably the one the command string originated from), it can parse a quote-enclosed command token that stretches over several lines. Each newline is preserved in the resulting token. An extra complication is that something like a collection extra might contain HTML code with escaped speech marks, so extra care must be taken not to break the token at them. Thus something like:
collectionmeta collectionextra [l = en] "
an example of the crazy as description we sometimes get which includes of all things something like
>this which you could easily see might be a problem if I parse this naively."
is tokenized thus
{'collectionmeta', 'collectionextra', 'l = en', '\nan example of the crazy as description we sometimes get which includes of all things something like this which you could easily see might be a problem if I parse this naively.'}
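One way to read a quote-enclosed token across several lines is to keep pulling lines from the reader until a closing quote appears that is not preceded by a backslash. The sketch below shows that idea only; MultiLineQuoteSketch, findUnescaped and readQuoted are hypothetical names, the sample text is invented, and the sketch assumes a backslash is the escape character and leaves escape sequences in the token untouched.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class MultiLineQuoteSketch {

    /** Returns the index of the first quote that is not backslash-escaped, or -1 if there is none. */
    static int findUnescaped(CharSequence text, char quote) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character that follows
            } else if (c == quote) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Builds the token that follows an opening quote.
     * rest is whatever came after the opening quote on the first line.
     */
    static String readQuoted(String rest, char quote, BufferedReader in) {
        StringBuilder token = new StringBuilder(rest);
        try {
            int close;
            while ((close = findUnescaped(token, quote)) < 0) {
                String line = in.readLine();
                if (line == null) {
                    return token.toString(); // stream ended before the closing quote
                }
                token.append('\n').append(line); // preserve the line break in the token
            }
            return token.substring(0, close);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Invented continuation lines; the first line ended right after the opening quote.
        String following = "a link: <a href=\\\"x.html\\\">here</a>\nand the last line.\"";
        BufferedReader in = new BufferedReader(new StringReader(following));
        System.out.println(readQuoted("", '"', in));
    }
}
```

Because the first line ends immediately after the opening quote, the token starts with a newline, as in the example output above.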
countTokens
public int countTokens()
- Returns the minimum number of remaining tokens before the tokenizer runs out of string. There may be more tokens than this count, but never fewer. The discrepancy is due to internal functionality and the fact that we can't read ahead in the string or associated stream without risking the need for unpredictable push-back.
hasMoreTokens
public boolean hasMoreTokens()
- Determine if there are still tokens available.
nextToken
public java.lang.String nextToken()
- Method to retrieve the next token from the command, taking care to group tokens enclosed in speech marks.
buildToken
private java.lang.String buildToken(java.lang.StringBuffer buffer,
char end_char,
boolean strip_characters)
- Parses the next token, paying heed to the demands of the enclosing characters, escaped characters, newlines, and empty buffers with their consequent unexpected ends of tokens.
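The concerns listed above (enclosing characters, escapes, empty buffers, unexpected ends) can be illustrated with a small helper that consumes one enclosed token from the front of a buffer. This is a sketch under assumptions, not the private buildToken method: BuildTokenSketch and takeEnclosed are hypothetical, the buffer is assumed to start with the opening character, a backslash is assumed to escape the next character, and an empty buffer is signalled with null.

```java
class BuildTokenSketch {

    /**
     * Removes one enclosed token from the front of buffer and returns it.
     * When strip is true the enclosing characters are dropped from the result.
     */
    static String takeEnclosed(StringBuilder buffer, char endChar, boolean strip) {
        if (buffer.length() == 0) {
            return null; // nothing to parse
        }
        int i = 1; // position 0 holds the opening character
        while (i < buffer.length()) {
            char c = buffer.charAt(i);
            if (c == '\\' && i + 1 < buffer.length()) {
                i += 2; // step over the backslash and the character it escapes
            } else if (c == endChar) {
                break;
            } else {
                i++;
            }
        }
        boolean closed = i < buffer.length();
        int stop = closed ? i + 1 : buffer.length(); // unexpected end: take what is there
        String raw = buffer.substring(0, stop);
        buffer.delete(0, stop);
        if (!strip) {
            return raw;
        }
        return raw.substring(1, closed ? raw.length() - 1 : raw.length());
    }

    public static void main(String[] args) {
        StringBuilder buffer = new StringBuilder("[l = en] \"Hello World\"");
        System.out.println(takeEnclosed(buffer, ']', true)); // prints l = en
        System.out.println(buffer);                          // prints  "Hello World" (with a leading space)
    }
}
```

Returning whatever was read when the end character never appears is only one possible policy; a caller with access to an input stream could instead fetch another line and try again.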