org.apache.oro.text.awk
Class AwkCompiler

java.lang.Object
  extended by org.apache.oro.text.awk.AwkCompiler
- All Implemented Interfaces: org.apache.oro.text.regex.PatternCompiler

public final class AwkCompiler
extends java.lang.Object
implements org.apache.oro.text.regex.PatternCompiler
The AwkCompiler class is used to create compiled regular expressions conforming to the Awk regular expression syntax. It generates AwkPattern instances upon compilation to be used in conjunction with an AwkMatcher instance. AwkMatcher finds true leftmost-longest matches, so you must take care with how you formulate your regular expression to avoid matching more than you really want.
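To show what "leftmost-longest" means in practice, the sketch below contrasts it with the leftmost-first behavior of the JDK's backtracking `java.util.regex` engine. It does not use ORO at all: the leftmost-longest search is a brute-force emulation (the class and method names are hypothetical), whereas AwkMatcher gets the same semantics from its DFA.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Contrasts leftmost-first (java.util.regex) with leftmost-longest
 * (the semantics documented for AwkMatcher). The leftmost-longest
 * search is a brute-force emulation, not ORO's DFA implementation.
 */
class LeftmostLongestDemo {

    /** Leftmost-first: the first alternative that succeeds wins. */
    static String leftmostFirst(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    /** Leftmost-longest: earliest start position, then the longest match there. */
    static String leftmostLongest(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        for (int start = 0; start <= input.length(); start++) {
            // Try the longest candidate first so the first hit is the longest one.
            for (int end = input.length(); end >= start; end--) {
                m.region(start, end);
                if (m.matches()) {
                    return input.substring(start, end);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String regex = "a|ab";
        String input = "xabc";
        System.out.println("leftmost-first:   " + leftmostFirst(regex, input));   // a
        System.out.println("leftmost-longest: " + leftmostLongest(regex, input)); // ab
    }
}
```

With `a|ab`, a Perl-style engine stops at the first alternative, while an Awk-style matcher reports `ab`. This is why the text warns that an Awk pattern may match more than intended.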
The supported regular expression syntax is a superset of traditional AWK, but NOT to be confused with GNU AWK or other AWK variants. Additionally, this AWK implementation is DFA-based and only supports 8-bit ASCII. Consequently, these classes can perform very fast pattern matches in most cases.
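One way to see why a DFA restricted to 8-bit characters can be fast: matching reduces to one table lookup per input character, and the transition table needs only 256 columns per state. The hand-built automaton below for `ab*c` is illustrative only. The class name is hypothetical, and AwkCompiler presumably derives its automaton from the parsed pattern instead of having it written out by hand.

```java
import java.util.Arrays;

/**
 * A hand-built, table-driven DFA for the regex ab*c over an 8-bit alphabet.
 * Illustrative only; not ORO's data structure.
 */
class ByteDfa {
    static final int DEAD = -1;
    final int[][] next;          // next[state][char] -> state, or DEAD
    final boolean[] accepting;

    ByteDfa(int states) {
        next = new int[states][256];          // one column per 8-bit character value
        for (int[] row : next) Arrays.fill(row, DEAD);
        accepting = new boolean[states];
    }

    /** Returns true if the whole input is accepted; one table lookup per char. */
    boolean matches(CharSequence input) {
        int state = 0;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c > 0xFF) return false;       // outside the 8-bit alphabet
            state = next[state][c];
            if (state == DEAD) return false;
        }
        return accepting[state];
    }

    /** Builds the DFA for ab*c: 0 -a-> 1, 1 -b-> 1, 1 -c-> 2 (accepting). */
    static ByteDfa abStarC() {
        ByteDfa d = new ByteDfa(3);
        d.next[0]['a'] = 1;
        d.next[1]['b'] = 1;
        d.next[1]['c'] = 2;
        d.accepting[2] = true;
        return d;
    }

    public static void main(String[] args) {
        ByteDfa dfa = abStarC();
        System.out.println(dfa.matches("abbbc")); // true
        System.out.println(dfa.matches("ac"));    // true
        System.out.println(dfa.matches("abd"));   // false
    }
}
```

There is no backtracking: the running time is linear in the input length regardless of the pattern, which is the property the paragraph above relies on.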
This is the traditional Awk syntax that is supported:
- Alternatives separated by |
- Quantified atoms
  - * : Match 0 or more times.
  - + : Match 1 or more times.
  - ? : Match 0 or 1 times.
- Atoms
  - regular expression within parentheses
  - a . matches everything including newline
  - a ^ is a null token matching the beginning of a string but has no relation to newlines (and is only valid at the beginning of a regex; this differs from traditional awk for the sake of efficiency in Java).
  - a $ is a null token matching the end of a string but has no relation to newlines (and is only valid at the end of a regex; this differs from traditional awk for the sake of efficiency in Java).
  - Character classes (e.g., [abcd]) and ranges (e.g. [a-z])
    - Special backslashed characters work within a character class
  - Special backslashed characters
    - \b : backspace
    - \n : newline
    - \r : carriage return
    - \t : tab
    - \f : formfeed
    - \xnn : hexadecimal representation of character
    - \nn or \nnn : octal representation of character
    - Any other backslashed character matches itself
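The escape rules listed above can be sketched as a small decoder. This is a hypothetical helper, not ORO's `__backslashToken`, and it makes assumptions the text does not state: `\x` takes exactly two hexadecimal digits, and one to three octal digits are read as an octal code (so `\0` from the extended syntax also decodes to the null character). Note that `\b` means backspace here, not a word boundary as in Perl.

```java
/**
 * Decodes one traditional-Awk backslash escape. Hypothetical helper;
 * digit-count rules for \x and octal escapes are assumptions.
 */
class EscapeDecoder {

    /** The decoded character and how many input chars the escape used. */
    static final class Decoded {
        final char value;
        final int consumed;
        Decoded(char value, int consumed) { this.value = value; this.consumed = consumed; }
    }

    static boolean isOctal(char c) { return c >= '0' && c <= '7'; }

    /** Decodes the escape starting at s.charAt(pos), which must be a backslash. */
    static Decoded decode(String s, int pos) {
        if (s.charAt(pos) != '\\' || pos + 1 >= s.length())
            throw new IllegalArgumentException("no escape at offset " + pos);
        char c = s.charAt(pos + 1);
        switch (c) {
            case 'b': return new Decoded('\b', 2);   // backspace
            case 'n': return new Decoded('\n', 2);   // newline
            case 'r': return new Decoded('\r', 2);   // carriage return
            case 't': return new Decoded('\t', 2);   // tab
            case 'f': return new Decoded('\f', 2);   // formfeed
            case 'x': {
                // Assumption: exactly two hexadecimal digits follow \x.
                if (pos + 4 > s.length()) throw new IllegalArgumentException("short \\x escape");
                int hi = Character.digit(s.charAt(pos + 2), 16);
                int lo = Character.digit(s.charAt(pos + 3), 16);
                if (hi < 0 || lo < 0) throw new IllegalArgumentException("bad \\x escape");
                return new Decoded((char) (hi * 16 + lo), 4);
            }
            default:
                if (isOctal(c)) {
                    // Assumption: up to three octal digits form the character code.
                    int end = pos + 1;
                    while (end < s.length() && end < pos + 4 && isOctal(s.charAt(end))) end++;
                    int code = Integer.parseInt(s.substring(pos + 1, end), 8);
                    if (code > 0xFF) throw new IllegalArgumentException("octal escape exceeds 8 bits");
                    return new Decoded((char) code, end - pos);
                }
                return new Decoded(c, 2);   // any other backslashed character matches itself
        }
    }

    public static void main(String[] args) {
        System.out.println((int) decode("\\t", 0).value);   // 9
        System.out.println((int) decode("\\x41", 0).value); // 65
        System.out.println((int) decode("\\101", 0).value); // 65
        System.out.println(decode("\\.", 0).value);         // .
    }
}
```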
This is the extended syntax that is supported:
- Quantified atoms
  - {n,m} : Match at least n but not more than m times.
  - {n,} : Match at least n times.
  - {n} : Match exactly n times.
- Atoms
  - Special backslashed characters
    - \d : digit [0-9]
    - \D : non-digit [^0-9]
    - \w : word character [0-9a-z_A-Z]
    - \W : a non-word character [^0-9a-z_A-Z]
    - \s : a whitespace character [ \t\n\r\f]
    - \S : a non-whitespace character [^ \t\n\r\f]
    - \cD : matches the corresponding control character
    - \0 : matches null character
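The interval quantifiers are expressible with the traditional operators alone: `x{n,m}` is `n` mandatory copies of `x` followed by `m - n` optional copies, and `x{n,}` is `n` copies followed by `x*`. The helper below sketches that standard desugaring. It is hypothetical; the documentation does not say how AwkCompiler handles intervals internally.

```java
/**
 * Rewrites atom{min,max} using only concatenation, ? and *.
 * Hypothetical helper illustrating a standard desugaring.
 */
class IntervalExpander {

    /**
     * Expands atom{min,max}; pass max = -1 for the open form atom{min,}.
     * The atom must already be a single unit such as "a", "[0-9]" or "(ab)".
     */
    static String expand(String atom, int min, int max) {
        if (min < 0 || (max >= 0 && max < min))
            throw new IllegalArgumentException("bad interval {" + min + "," + max + "}");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < min; i++) sb.append(atom);               // min mandatory copies
        if (max < 0) {
            sb.append(atom).append('*');                              // {n,}: any number more
        } else {
            for (int i = min; i < max; i++) sb.append(atom).append('?'); // max - min optional copies
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("a", 2, 4));    // aaa?a?
        System.out.println(expand("a", 2, -1));   // aaa*
        System.out.println(expand("(ab)", 3, 3)); // (ab)(ab)(ab)
    }
}
```

The exact form `x{n}` is simply `expand(x, n, n)`. The expansion grows linearly with the bounds, so large intervals produce correspondingly large automata.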
- Since: 1.0
| Field Summary | |
| --- | --- |
| private boolean | __beginAnchor |
| private int | __bytesRead |
| private boolean | __caseSensitive |
| private int | __closeParen |
| private boolean | __endAnchor |
| private int | __expressionLength |
| private boolean | __inCharacterClass |
| private char | __lookahead |
| private boolean | __multiline |
| private int | __openParen |
| private int | __position |
| private char[] | __regularExpression |
| (package private) static char | _END_OF_INPUT |
| static int | CASE_INSENSITIVE_MASK: A mask passed as an option to the compile methods to indicate that a compiled regular expression should be case insensitive. |
| static int | DEFAULT_MASK: The default mask for the compile methods. |
| static int | MULTILINE_MASK: A mask passed as an option to the compile methods to indicate that a compiled regular expression should treat input as having multiple lines. |
| Constructor Summary |
| --- |
| AwkCompiler() |
| Method Summary | |
| --- | --- |
| private SyntaxNode | __atom() |
| private SyntaxNode | __backslashToken() |
| private SyntaxNode | __branch() |
| private SyntaxNode | __characterClass() |
| private static boolean | __isMetachar(char token) |
| private void | __match(char token) |
| private int | __parseUnsignedInteger(int radix, int minDigits, int maxDigits) |
| private SyntaxNode | __piece() |
| private void | __putback() |
| private SyntaxNode | __regex() |
| private SyntaxNode | __repetition(SyntaxNode atom) |
| (package private) static boolean | _isLowerCase(char token) |
| (package private) static boolean | _isUpperCase(char token) |
| (package private) static boolean | _isWordCharacter(char token) |
| (package private) SyntaxNode | _newTokenNode(char token, int position) |
| (package private) SyntaxTree | _parse(char[] expression) |
| (package private) static char | _toggleCase(char token) |
| org.apache.oro.text.regex.Pattern | compile(char[] pattern): Same as calling compile(pattern, AwkCompiler.DEFAULT_MASK); |
| org.apache.oro.text.regex.Pattern | compile(char[] pattern, int options): Compiles an Awk regular expression into an AwkPattern instance that can be used by an AwkMatcher object to perform pattern matching. |
| org.apache.oro.text.regex.Pattern | compile(java.lang.String pattern): Same as calling compile(pattern, AwkCompiler.DEFAULT_MASK); |
| org.apache.oro.text.regex.Pattern | compile(java.lang.String pattern, int options): Compiles an Awk regular expression into an AwkPattern instance that can be used by an AwkMatcher object to perform pattern matching. |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
DEFAULT_MASK
public static final int DEFAULT_MASK
- The default mask for the compile methods. It is equal to 0 and indicates that no special options are active.
- See Also:
- Constant Field Values
CASE_INSENSITIVE_MASK
public static final int CASE_INSENSITIVE_MASK
- A mask passed as an option to the compile methods to indicate that a compiled regular expression should be case insensitive.
- See Also:
- Constant Field Values
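The package-private helpers `_isLowerCase`, `_isUpperCase` and `_toggleCase` suggest that case-insensitive compilation pairs each letter with its other-case form. The sketch below shows that idea on a plain literal by turning every ASCII letter into a two-member character class. It is a hypothetical illustration modeled loosely on those method names, not their actual code, and it assumes the literal contains no metacharacters.

```java
/**
 * Case folding for 8-bit patterns: each ASCII letter in a literal
 * becomes a two-member class such as [aA]. Hypothetical sketch.
 */
class CaseFold {
    static boolean isLowerCase(char c) { return c >= 'a' && c <= 'z'; }
    static boolean isUpperCase(char c) { return c >= 'A' && c <= 'Z'; }

    /** Swaps ASCII case; non-letters are returned unchanged. */
    static char toggleCase(char c) {
        if (isLowerCase(c)) return (char) (c - 'a' + 'A');
        if (isUpperCase(c)) return (char) (c - 'A' + 'a');
        return c;
    }

    /** Assumes the literal holds only letters and digits (no metacharacters). */
    static String foldLiteral(String literal) {
        StringBuilder sb = new StringBuilder();
        for (char c : literal.toCharArray()) {
            char other = toggleCase(c);
            if (other == c) sb.append(c);                                    // not a letter
            else sb.append('[').append(c).append(other).append(']');         // [xX]
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(foldLiteral("Awk1"));                 // [Aa][wW][kK]1
        System.out.println("aWK1".matches(foldLiteral("Awk1"))); // true
    }
}
```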
MULTILINE_MASK
public static final int MULTILINE_MASK
- A mask passed as an option to the compile methods to indicate that a compiled regular expression should treat input as having multiple lines. This option affects the interpretation of the . metacharacter: when this mask is used, . does not match newlines. By default, . matches newlines.
- See Also:
- Constant Field Values
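Option masks like these are normally combined with bitwise OR and tested with bitwise AND. The sketch below shows that pattern together with the documented effect of MULTILINE_MASK on the `.` metacharacter. Only DEFAULT_MASK = 0 is stated in this documentation; the values used here for the other two masks are assumptions, so real code should always refer to the named constants.

```java
/**
 * Combining and testing option masks, plus the documented effect of
 * MULTILINE_MASK on '.'. The values 0x1 and 0x2 are assumptions.
 */
class MaskOptions {
    static final int DEFAULT_MASK = 0;             // documented value
    static final int CASE_INSENSITIVE_MASK = 0x1;  // assumed value
    static final int MULTILINE_MASK = 0x2;         // assumed value

    static boolean isSet(int options, int mask) { return (options & mask) != 0; }

    /** Whether '.' matches c: it matches newlines unless MULTILINE_MASK is set. */
    static boolean dotMatches(char c, int options) {
        return c != '\n' || !isSet(options, MULTILINE_MASK);
    }

    public static void main(String[] args) {
        int options = CASE_INSENSITIVE_MASK | MULTILINE_MASK;
        System.out.println(isSet(options, CASE_INSENSITIVE_MASK)); // true
        System.out.println(dotMatches('\n', DEFAULT_MASK));        // true
        System.out.println(dotMatches('\n', options));             // false
    }
}
```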
_END_OF_INPUT
static final char _END_OF_INPUT
- See Also:
- Constant Field Values
__inCharacterClass
private boolean __inCharacterClass
__caseSensitive
private boolean __caseSensitive
__multiline
private boolean __multiline
__beginAnchor
private boolean __beginAnchor
__endAnchor
private boolean __endAnchor
__lookahead
private char __lookahead
__position
private int __position
__bytesRead
private int __bytesRead
__expressionLength
private int __expressionLength
__regularExpression
private char[] __regularExpression
__openParen
private int __openParen
__closeParen
private int __closeParen
| Constructor Detail |
AwkCompiler
public AwkCompiler()
| Method Detail |
__isMetachar
private static boolean __isMetachar(char token)
_isWordCharacter
static boolean _isWordCharacter(char token)
_isLowerCase
static boolean _isLowerCase(char token)
_isUpperCase
static boolean _isUpperCase(char token)
_toggleCase
static char _toggleCase(char token)
__match
private void __match(char token)
throws org.apache.oro.text.regex.MalformedPatternException
__putback
private void __putback()
__regex
private SyntaxNode __regex() throws org.apache.oro.text.regex.MalformedPatternException
__branch
private SyntaxNode __branch() throws org.apache.oro.text.regex.MalformedPatternException
__piece
private SyntaxNode __piece() throws org.apache.oro.text.regex.MalformedPatternException
__parseUnsignedInteger
private int __parseUnsignedInteger(int radix,
int minDigits,
int maxDigits)
throws org.apache.oro.text.regex.MalformedPatternException
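The signature of `__parseUnsignedInteger(int radix, int minDigits, int maxDigits)` suggests a reader that consumes between `minDigits` and `maxDigits` digits in the given radix. Plausible callers would be `\xnn` (radix 16), octal escapes (radix 8) and `{n,m}` bounds (radix 10), though that usage is inferred, not documented. The standalone sketch below is hypothetical: it takes its input and start offset explicitly and throws `IllegalArgumentException` where ORO would throw MalformedPatternException.

```java
/**
 * Bounded unsigned-integer reader with the same parameters as
 * __parseUnsignedInteger(radix, minDigits, maxDigits). Hypothetical sketch.
 */
class BoundedIntReader {
    private final char[] input;
    private int position;

    BoundedIntReader(String input, int start) {
        this.input = input.toCharArray();
        this.position = start;
    }

    int position() { return position; }

    /** Reads at least minDigits and at most maxDigits digits in the given radix. */
    int parseUnsignedInteger(int radix, int minDigits, int maxDigits) {
        int value = 0, digits = 0;
        while (digits < maxDigits && position < input.length) {
            int d = Character.digit(input[position], radix);
            if (d < 0) break;                 // not a digit in this radix: stop
            value = value * radix + d;
            position++;
            digits++;
        }
        if (digits < minDigits)
            throw new IllegalArgumentException("expected at least " + minDigits
                + " base-" + radix + " digits at offset " + (position - digits));
        return value;
    }

    public static void main(String[] args) {
        BoundedIntReader hex = new BoundedIntReader("4142", 0);
        System.out.println(hex.parseUnsignedInteger(16, 2, 2));  // 65
        System.out.println(hex.parseUnsignedInteger(16, 2, 2));  // 66
        BoundedIntReader dec = new BoundedIntReader("12,5}", 0);
        System.out.println(dec.parseUnsignedInteger(10, 1, 10)); // 12
    }
}
```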
__repetition
private SyntaxNode __repetition(SyntaxNode atom) throws org.apache.oro.text.regex.MalformedPatternException
__backslashToken
private SyntaxNode __backslashToken() throws org.apache.oro.text.regex.MalformedPatternException
__atom
private SyntaxNode __atom() throws org.apache.oro.text.regex.MalformedPatternException
__characterClass
private SyntaxNode __characterClass() throws org.apache.oro.text.regex.MalformedPatternException
_newTokenNode
SyntaxNode _newTokenNode(char token, int position)
_parse
SyntaxTree _parse(char[] expression) throws org.apache.oro.text.regex.MalformedPatternException
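The private methods `__regex`, `__branch`, `__piece` and `__atom` suggest a classic recursive-descent parser, one method per grammar level. The grammar below is inferred from those names and the supported syntax, not taken from the source:

regex := branch ('|' branch)*; branch := piece+; piece := atom ('*' | '+' | '?')*; atom := '(' regex ')' | '.' | literal.

The sketch covers only that subset (no character classes, escapes, intervals or anchors), and it returns a prefix-notation string instead of ORO's SyntaxNode tree. The class name is hypothetical.

```java
/**
 * Minimal recursive-descent parser for a subset of Awk syntax:
 * alternation, concatenation, postfix * + ?, parentheses and '.'.
 */
class TinyAwkParser {
    private final String in;
    private int pos;

    private TinyAwkParser(String in) { this.in = in; }

    static String parse(String expr) {
        TinyAwkParser p = new TinyAwkParser(expr);
        String tree = p.regex();
        if (p.pos != expr.length())
            throw new IllegalArgumentException("unexpected '" + expr.charAt(p.pos) + "' at " + p.pos);
        return tree;
    }

    private boolean more() { return pos < in.length(); }
    private char peek() { return in.charAt(pos); }

    // regex := branch ('|' branch)*
    private String regex() {
        String left = branch();
        while (more() && peek() == '|') {
            pos++;
            left = "(| " + left + " " + branch() + ")";
        }
        return left;
    }

    // branch := piece+   (a branch ends at '|' or ')')
    private String branch() {
        String left = piece();
        while (more() && peek() != '|' && peek() != ')') {
            left = "(cat " + left + " " + piece() + ")";
        }
        return left;
    }

    // piece := atom ('*' | '+' | '?')*
    private String piece() {
        String node = atom();
        while (more() && (peek() == '*' || peek() == '+' || peek() == '?')) {
            node = "(" + peek() + " " + node + ")";
            pos++;
        }
        return node;
    }

    // atom := '(' regex ')' | '.' | literal character
    private String atom() {
        if (!more()) throw new IllegalArgumentException("unexpected end of expression");
        char c = in.charAt(pos++);
        switch (c) {
            case '(': {
                String inner = regex();
                if (!more() || peek() != ')') throw new IllegalArgumentException("missing ')'");
                pos++;
                return inner;
            }
            case ')': case '|': case '*': case '+': case '?':
                throw new IllegalArgumentException("unexpected '" + c + "' at " + (pos - 1));
            case '.':
                return "ANY";
            default:
                return String.valueOf(c);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("ab*|c"));   // (| (cat a (* b)) c)
        System.out.println(parse("(a|b)+c")); // (cat (+ (| a b)) c)
    }
}
```

Each grammar level binds tighter than the one above it, so `|` has the lowest precedence, concatenation comes next, and the postfix quantifiers bind tightest.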
compile
public org.apache.oro.text.regex.Pattern compile(char[] pattern, int options) throws org.apache.oro.text.regex.MalformedPatternException
- Compiles an Awk regular expression into an AwkPattern instance that
can be used by an AwkMatcher object to perform pattern matching.
- Specified by:
compile in interface org.apache.oro.text.regex.PatternCompiler
compile
public org.apache.oro.text.regex.Pattern compile(java.lang.String pattern, int options) throws org.apache.oro.text.regex.MalformedPatternException
- Compiles an Awk regular expression into an AwkPattern instance that
can be used by an AwkMatcher object to perform pattern matching.
- Specified by:
compile in interface org.apache.oro.text.regex.PatternCompiler
compile
public org.apache.oro.text.regex.Pattern compile(char[] pattern) throws org.apache.oro.text.regex.MalformedPatternException
- Same as calling compile(pattern, AwkCompiler.DEFAULT_MASK);
- Specified by:
compile in interface org.apache.oro.text.regex.PatternCompiler
compile
public org.apache.oro.text.regex.Pattern compile(java.lang.String pattern) throws org.apache.oro.text.regex.MalformedPatternException
- Same as calling compile(pattern, AwkCompiler.DEFAULT_MASK);
- Specified by:
compile in interface org.apache.oro.text.regex.PatternCompiler