org.apache.oro.text.perl
Class Perl5Util

java.lang.Object
  org.apache.oro.text.perl.Perl5Util
- All Implemented Interfaces:
- org.apache.oro.text.regex.MatchResult
- public final class Perl5Util
- extends java.lang.Object
- implements org.apache.oro.text.regex.MatchResult
This is a utility class implementing the 3 most common Perl5 operations involving regular expressions:
- [m]/pattern/[i][m][s][x],
- s/pattern/replacement/[g][i][m][o][s][x],
- and split().
The objective of the class is to minimize the amount of code a Java
programmer using Jakarta-ORO
has to write to achieve the same results as Perl by
transparently handling regular expression compilation, caching, and
matching. A second objective is to use the same Perl pattern matching
syntax to ease the task of Perl programmers transitioning to Java
(this also reduces the number of parameters to a method).
All the state-affecting methods are synchronized to avoid
the maintenance of explicit locks in multithreaded programs. This
philosophy differs from the
org.apache.oro.text.regex package, where
you are expected to either maintain explicit locks or, preferably,
create separate compiler and matcher instances for each thread.
To use this class, first create an instance using the default constructor or initialize the instance with a PatternCache of your choosing using the alternate constructor. The default cache used by Perl5Util is a PatternCacheLRU of capacity GenericPatternCache.DEFAULT_CAPACITY. You may want to create a cache with a different capacity, a different cache replacement policy, or even devise your own PatternCache implementation. The PatternCacheLRU is probably the best general purpose pattern cache, but your specific application may be better served by a different cache replacement policy. You should remember that you can front-load a cache with all the patterns you will be using before initializing a Perl5Util instance, or you can just let Perl5Util fill the cache as you use it.
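To make the idea of a bounded cache with a least-recently-used replacement policy concrete, here is a minimal sketch. It is not ORO's PatternCacheLRU: the class name LruPatternCacheSketch and its methods are hypothetical, and it caches java.util.regex.Pattern instances from the JDK so that it runs without the ORO jar.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-in for an LRU pattern cache; not part of Jakarta-ORO.
class LruPatternCacheSketch {
    private final Map<String, Pattern> cache;

    LruPatternCacheSketch(final int capacity) {
        // accessOrder = true keeps entries ordered from least to most
        // recently used, so the eldest entry is the eviction candidate.
        this.cache = new LinkedHashMap<String, Pattern>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Pattern> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns the cached compiled pattern, compiling and storing it on a miss.
    synchronized Pattern get(String regex) {
        Pattern p = cache.get(regex);
        if (p == null) {
            p = Pattern.compile(regex);
            cache.put(regex, p);
        }
        return p;
    }

    // containsKey does not count as an access, so it does not reorder entries.
    synchronized boolean isCached(String regex) {
        return cache.containsKey(regex);
    }

    synchronized int size() {
        return cache.size();
    }
}
```

Front-loading such a cache amounts to calling get() once for each pattern before the cache is handed to its user; otherwise it fills on demand as patterns are first requested.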
You might use the class as follows:
Perl5Util util = new Perl5Util();
String line;
DataInputStream input;
PrintStream output;
// Initialization of input and output omitted
while((line = input.readLine()) != null) {
    // First find the line with the string we want to substitute because
    // it is cheaper than blindly substituting each line.
    if(util.match("/HREF=\"description1.html\"/", line)) {
        line = util.substitute("s/description1\\.html/about1.html/", line);
    }
    output.println(line);
}
A couple of things to remember when using this class are that the
match() methods have the same meaning as
Perl5Matcher.contains()
and =~ m/pattern/ in Perl. The methods are named match
to more closely associate them with Perl and to differentiate them
from Perl5Matcher.matches().
A further thing to keep in mind is that the
MalformedPerl5PatternException class is derived from
RuntimeException which means you DON'T have to catch it. The reasoning
behind this is that you will detect your regular expression mistakes
as you write and debug your program when a MalformedPerl5PatternException
is thrown during a test run. However, we STRONGLY recommend that you
ALWAYS catch MalformedPerl5PatternException whenever you deal with a
DYNAMICALLY created pattern. Relying on a fatal
MalformedPerl5PatternException being thrown to detect errors while
debugging is only useful for dealing with static patterns, that is, actual
pregenerated strings present in your program. Patterns created from user
input or some other dynamic method CANNOT be relied upon to be correct
and MUST be handled by catching MalformedPerl5PatternException for your
programs to be robust.
Finally, as a convenience Perl5Util implements
the MatchResult interface.
The methods are merely wrappers which call the corresponding method of
the last MatchResult
found (which can be accessed with getMatch()) by a match or
substitution (or even a split, but this isn't particularly useful).
At the moment, the
MatchResult returned
by getMatch() is not stored in a thread-local variable. Therefore
concurrent calls to getMatch() will produce unpredictable
results. So if your concurrent program requires the match results,
you must protect the matching and the result retrieval in a critical
section. If you do not need match results, you don't need to do anything
special. If you feel the J2SE implementation of getMatch()
should use a thread-local variable and obviate the need for a critical
section, please express your views on the oro-dev mailing list.
- Since:
- 1.0
- Version:
- @version@
| Field Summary | |
private org.apache.oro.util.Cache |
__expressionCache
The hashtable to cache higher-level expressions |
private int |
__inputBeginOffset
Keeps track of the begin and end offsets of the original input for the postMatch() and preMatch() methods. |
private int |
__inputEndOffset
Keeps track of the begin and end offsets of the original input for the postMatch() and preMatch() methods. |
private org.apache.oro.text.regex.MatchResult |
__lastMatch
The last match from a successful call to a matching method. |
private org.apache.oro.text.regex.Perl5Matcher |
__matcher
The pattern matcher to perform matching operations. |
private static java.lang.String |
__matchExpression
The regular expression to use to parse match expression. |
private org.apache.oro.text.regex.Pattern |
__matchPattern
The compiled match expression parsing regular expression. |
private static java.lang.String |
__nullString
Used as the default return value of postMatch() and preMatch() |
private java.lang.Object |
__originalInput
Keeps track of the original input (for the postMatch() and preMatch() methods). |
private org.apache.oro.text.PatternCache |
__patternCache
The pattern cache to compile and store patterns |
private java.util.ArrayList |
__splitList
A container for temporarily holding the results of a split before deleting trailing empty fields. |
static int |
SPLIT_ALL
A constant passed to the split() methods indicating that all occurrences of a pattern should be used to split a string. |
| Constructor Summary | |
Perl5Util()
Default constructor for Perl5Util. |
|
Perl5Util(org.apache.oro.text.PatternCache cache)
A secondary constructor for Perl5Util. |
|
| Method Summary | |
private void |
__compilePatterns()
Compiles the patterns (currently only the match expression) used to parse Perl5 expressions. |
private org.apache.oro.text.regex.Pattern |
__parseMatchExpression(java.lang.String pattern)
Parses a match expression and returns a compiled pattern. |
int |
begin(int group)
Returns the begin offset of the subgroup of the last match found relative to the beginning of the match. |
int |
beginOffset(int group)
Returns an offset marking the beginning of the last pattern match found relative to the beginning of the input from which the match was extracted. |
int |
end(int group)
Returns the end offset of the subgroup of the last match found relative to the beginning of the match. |
int |
endOffset(int group)
Returns an offset marking the end of the last pattern match found relative to the beginning of the input from which the match was extracted. |
org.apache.oro.text.regex.MatchResult |
getMatch()
Returns the last match found by a call to a match(), substitute(), or split() method. |
java.lang.String |
group(int group)
Returns the contents of the parenthesized subgroups of the last match found according to the behavior dictated by the MatchResult interface. |
int |
groups()
|
int |
length()
Returns the length of the last match found. |
boolean |
match(java.lang.String pattern,
char[] input)
Searches for the first pattern match somewhere in a character array taking a pattern specified in Perl5 native format: |
boolean |
match(java.lang.String pattern,
org.apache.oro.text.regex.PatternMatcherInput input)
Searches for the next pattern match somewhere in a org.apache.oro.text.regex.PatternMatcherInput instance, taking a pattern specified in Perl5 native format: |
boolean |
match(java.lang.String pattern,
java.lang.String input)
Searches for the first pattern match in a String taking a pattern specified in Perl5 native format: |
java.lang.String |
postMatch()
Returns the part of the input following the last match found. |
char[] |
postMatchCharArray()
Returns the part of the input following the last match found as a char array. |
java.lang.String |
preMatch()
Returns the part of the input preceding the last match found. |
char[] |
preMatchCharArray()
Returns the part of the input preceding the last match found as a char array. |
void |
split(java.util.Collection results,
java.lang.String input)
Splits input in the default Perl manner, splitting on all whitespace. |
void |
split(java.util.Collection results,
java.lang.String pattern,
java.lang.String input)
This method is identical to calling: |
void |
split(java.util.Collection results,
java.lang.String pattern,
java.lang.String input,
int limit)
Splits a String into strings that are appended to a List, but no more than a specified limit. |
java.util.Vector |
split(java.lang.String input)
Deprecated. Use split(Collection results, String input) instead. |
java.util.Vector |
split(java.lang.String pattern,
java.lang.String input)
Deprecated. Use split(Collection results, String pattern, String input) instead. |
java.util.Vector |
split(java.lang.String pattern,
java.lang.String input,
int limit)
Deprecated. Use split(Collection results, String pattern, String input, int limit)
instead. |
int |
substitute(java.lang.StringBuffer result,
java.lang.String expression,
java.lang.String input)
Substitutes a pattern in a given input with a replacement string. |
java.lang.String |
substitute(java.lang.String expression,
java.lang.String input)
Substitutes a pattern in a given input with a replacement string. |
java.lang.String |
toString()
Returns the same as group(0). |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Field Detail |
__matchExpression
private static final java.lang.String __matchExpression
- The regular expression to use to parse match expression.
- See Also:
- Constant Field Values
__patternCache
private org.apache.oro.text.PatternCache __patternCache
- The pattern cache to compile and store patterns
__expressionCache
private org.apache.oro.util.Cache __expressionCache
- The hashtable to cache higher-level expressions
__matcher
private org.apache.oro.text.regex.Perl5Matcher __matcher
- The pattern matcher to perform matching operations.
__matchPattern
private org.apache.oro.text.regex.Pattern __matchPattern
- The compiled match expression parsing regular expression.
__lastMatch
private org.apache.oro.text.regex.MatchResult __lastMatch
- The last match from a successful call to a matching method.
__splitList
private java.util.ArrayList __splitList
- A container for temporarily holding the results of a split before
deleting trailing empty fields.
__originalInput
private java.lang.Object __originalInput
- Keeps track of the original input (for the postMatch() and preMatch()
methods). This will be discarded if the preMatch() and postMatch()
methods are moved into the MatchResult interface.
__inputBeginOffset
private int __inputBeginOffset
- Keeps track of the begin and end offsets of the original input for
the postMatch() and preMatch() methods.
__inputEndOffset
private int __inputEndOffset
- Keeps track of the begin and end offsets of the original input for
the postMatch() and preMatch() methods.
__nullString
private static final java.lang.String __nullString
- Used as the default return value of postMatch() and preMatch()
- See Also:
- Constant Field Values
SPLIT_ALL
public static final int SPLIT_ALL
- A constant passed to the split() methods indicating
that all occurrences of a pattern should be used to split a string.
- See Also:
- Constant Field Values
| Constructor Detail |
Perl5Util
public Perl5Util(org.apache.oro.text.PatternCache cache)
- A secondary constructor for Perl5Util. It initializes the Perl5Matcher
used by the class to perform matching operations, but requires the
programmer to provide a PatternCache instance for the class
to use to compile and store regular expressions. You would want to
use this constructor if you want to change the capacity or policy
of the cache used. Example uses might be:
// We know we're going to use close to 50 expressions a whole lot, so
// we create a cache of the proper size.
util = new Perl5Util(new PatternCacheLRU(50));
or
// We're only going to use a few expressions and know that second-chance
// fifo is best suited to the order in which we are using the patterns.
util = new Perl5Util(new PatternCacheFIFO2(10));
Perl5Util
public Perl5Util()
- Default constructor for Perl5Util. This initializes the Perl5Matcher
used by the class to perform matching operations and creates a
default PatternCacheLRU instance to use to compile and cache regular
expressions. The size of this cache is
GenericPatternCache.DEFAULT_CAPACITY.
| Method Detail |
__compilePatterns
private void __compilePatterns()
- Compiles the patterns (currently only the match expression) used to
parse Perl5 expressions. Right now it initializes __matchPattern.
__parseMatchExpression
private org.apache.oro.text.regex.Pattern __parseMatchExpression(java.lang.String pattern) throws MalformedPerl5PatternException
- Parses a match expression and returns a compiled pattern.
First checks the expression cache and if the pattern is not found,
then parses the expression and fetches a compiled pattern from the
pattern cache. Otherwise, just uses the pattern found in the
expression cache. __matchPattern is used to parse the expression.
match
public boolean match(java.lang.String pattern, char[] input) throws MalformedPerl5PatternException
- Searches for the first pattern match somewhere in a character array
taking a pattern specified in Perl5 native format:
[m]/pattern/[i][m][s][x]
The m prefix is optional and the meaning of the optional trailing options are:
- i
- case insensitive match
- m
- treat the input as consisting of multiple lines
- s
- treat the input as consisting of a single line
- x
- enable extended expression syntax incorporating whitespace and comments
If the input contains the pattern, the org.apache.oro.text.regex.MatchResult can be obtained by calling
getMatch(). However, Perl5Util implements the MatchResult interface as a wrapper around the last MatchResult found, so you can call its methods to access match information.
match
public boolean match(java.lang.String pattern, java.lang.String input) throws MalformedPerl5PatternException
- Searches for the first pattern match in a String taking
a pattern specified in Perl5 native format:
[m]/pattern/[i][m][s][x]
The m prefix is optional and the meaning of the optional trailing options are:
- i
- case insensitive match
- m
- treat the input as consisting of multiple lines
- s
- treat the input as consisting of a single line
- x
- enable extended expression syntax incorporating whitespace and comments
If the input contains the pattern, the MatchResult can be obtained by calling
getMatch(). However, Perl5Util implements the MatchResult interface as a wrapper around the last MatchResult found, so you can call its methods to access match information.
match
public boolean match(java.lang.String pattern, org.apache.oro.text.regex.PatternMatcherInput input) throws MalformedPerl5PatternException
- Searches for the next pattern match somewhere in a
org.apache.oro.text.regex.PatternMatcherInput instance, taking
a pattern specified in Perl5 native format:
[m]/pattern/[i][m][s][x]
The m prefix is optional and the meaning of the optional trailing options are:
- i
- case insensitive match
- m
- treat the input as consisting of multiple lines
- s
- treat the input as consisting of a single line
- x
- enable extended expression syntax incorporating whitespace and comments
If the input contains the pattern, the MatchResult can be obtained by calling
getMatch(). However, Perl5Util implements the MatchResult interface as a wrapper around the last MatchResult found, so you can call its methods to access match information. After the call to this method, the PatternMatcherInput current offset is advanced to the end of the match, so you can use it to repeatedly search for expressions in the entire input using a while loop as explained in the PatternMatcherInput documentation.
getMatch
public org.apache.oro.text.regex.MatchResult getMatch()
- Returns the last match found by a call to a match(), substitute(), or
split() method. This method is only intended to retrieve the match
found by the last call to a match() method. This method should
be used when you want to save MatchResult instances. Otherwise, for
simply accessing match information, it is more convenient to use the
Perl5Util methods implementing the MatchResult interface.
substitute
public int substitute(java.lang.StringBuffer result, java.lang.String expression, java.lang.String input) throws MalformedPerl5PatternException
- Substitutes a pattern in a given input with a replacement string.
The substitution expression is specified in Perl5 native format:
s/pattern/replacement/[g][i][m][o][s][x]
The s prefix is mandatory and the meaning of the optional trailing options are:
- g
- Substitute all occurrences of pattern with replacement. The default is to replace only the first occurrence.
- i
- perform a case insensitive match
- m
- treat the input as consisting of multiple lines
- o
- If variable interpolation is used, only evaluate the interpolation once (the first time). This is equivalent to using a numInterpolations argument of 1 in Util.substitute(). The default is to compute each interpolation independently. See Util.substitute() and Perl5Substitution for more details on variable interpolation in substitutions.
- s
- treat the input as consisting of a single line
- x
- enable extended expression syntax incorporating whitespace and comments
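As in Perl, a non-alphanumeric character other than the slash can be used as the delimiter, which helps avoid backslashing. For example, using slashes you would have to write:
numSubs = util.substitute(result, "s/foo\\/bar/goo\\/\\/baz/", input);
when you could more easily write:
numSubs = util.substitute(result, "s#foo/bar#goo//baz#", input);
where the hashmarks are used instead of slashes.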
There is a special case of backslashing that you need to pay attention to. As demonstrated above, to denote a delimiter in the substituted string it must be backslashed. However, this can be a problem when you want to denote a backslash at the end of the substituted string. As of PerlTools 1.3, a new means of handling this situation has been implemented. In previous versions, the behavior was that
"... a double backslash (quadrupled in the Java String) always represents two backslashes unless the second backslash is followed by the delimiter, in which case it represents a single backslash."
The new behavior is that a backslash is always a backslash in the substitution portion of the expression unless it is used to escape a delimiter. A backslash is considered to escape a delimiter if an even number of contiguous backslashes precede the backslash and the delimiter following the backslash is not the FINAL delimiter in the expression. Therefore, backslashes preceding final delimiters are never considered to escape the delimiter. The following, which used to be an invalid expression and require a special-case extra backslash, will now replace all instances of / with \:
numSubs = util.substitute(result, "s#/#\\#g", input);
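The delimiter rule described above can be sketched in plain Java. This is an illustration of the stated rule only, not ORO's actual parser: the class SubstitutionDelimiterSketch and its methods are hypothetical, and the sketch assumes a well-formed expression whose final delimiter is the last occurrence of the delimiter character (the trailing options are letters, so they cannot contain it).

```java
// Hypothetical illustration of the backslash/delimiter rule; not ORO code.
class SubstitutionDelimiterSketch {

    // Counts contiguous backslashes immediately before index i,
    // never looking further left than index 'from'.
    static int backslashesBefore(String s, int i, int from) {
        int n = 0;
        for (int j = i - 1; j >= from && s.charAt(j) == '\\'; j--) {
            n++;
        }
        return n;
    }

    // Extracts the replacement portion of an s<d>pattern<d>replacement<d>opts
    // expression, turning escaped delimiters into plain delimiters.
    static String replacementOf(String expr) {
        char delim = expr.charAt(1);
        int last = expr.lastIndexOf(delim); // the FINAL delimiter
        int mid = -1;
        // Assumption: the middle delimiter is the first delimiter that is
        // preceded by an even number of backslashes.
        for (int i = 2; i < last; i++) {
            if (expr.charAt(i) == delim && backslashesBefore(expr, i, 2) % 2 == 0) {
                mid = i;
                break;
            }
        }
        if (mid < 0) {
            throw new IllegalArgumentException("no middle delimiter: " + expr);
        }
        StringBuilder out = new StringBuilder();
        for (int i = mid + 1; i < last; i++) {
            char c = expr.charAt(i);
            // A backslash escapes a delimiter only if an even number of
            // backslashes precede it and the delimiter is not the final one.
            boolean escapesDelimiter = c == '\\'
                    && i + 1 < last
                    && expr.charAt(i + 1) == delim
                    && backslashesBefore(expr, i, mid + 1) % 2 == 0;
            if (escapesDelimiter) {
                out.append(delim);
                i++; // skip the delimiter that was just emitted
            } else {
                out.append(c); // otherwise a backslash is just a backslash
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replacementOf("s/foo\\/bar/goo\\/\\/baz/"));
        System.out.println(replacementOf("s#/#\\#g"));
    }
}
```

Under this rule the backslash in "s#/#\\#g" sits directly before the final delimiter, so it is kept as a literal backslash rather than treated as an escape.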
- Since:
- 2.0.6
substitute
public java.lang.String substitute(java.lang.String expression, java.lang.String input) throws MalformedPerl5PatternException
- Substitutes a pattern in a given input with a replacement string.
The substitution expression is specified in Perl5 native format.
- Calling this method is the same as:
String result;
StringBuffer buffer = new StringBuffer();
perl.substitute(buffer, expression, input);
result = buffer.toString();
- Since:
- 1.0
split
public void split(java.util.Collection results, java.lang.String pattern, java.lang.String input, int limit) throws MalformedPerl5PatternException
- Splits a String into strings that are appended to a List, but no more
than a specified limit. The String is split using a regular expression
as the delimiter. The regular expression is a pattern specified
in Perl5 native format:
[m]/pattern/[i][m][s][x]
The m prefix is optional and the meaning of the optional trailing options are:
- i
- case insensitive match
- m
- treat the input as consisting of multiple lines
- s
- treat the input as consisting of a single line
- x
- enable extended expression syntax incorporating whitespace and comments
The limit parameter causes the string to be split on at most the first limit - 1 pattern occurrences.
Of special note is that this split method performs EXACTLY the same as the Perl split() function. In other words, if the split pattern contains parentheses, additional elements are created from each of the matching subgroups in the pattern. Using an example similar to the one from the Camel book:
split(list, "/([,-])/", "8-12,15,18")
produces a list containing:
{ "8", "-", "12", ",", "15", ",", "18" }
Furthermore, the following Perl behavior is observed: "leading empty fields are preserved, and empty trailing ones are deleted." This has the effect that a split on a zero length string returns an empty list. The Util.split() method does NOT implement these behaviors because it is intended to be a general self-consistent and predictable split function usable with Pattern instances other than Perl5Pattern.
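The Perl split behaviors described above (captured subgroups become fields, leading empty fields are kept, trailing empty fields are deleted) can be sketched with the JDK's java.util.regex package. This is a stand-in for illustration, not Perl5Util's implementation: perlSplit is a hypothetical helper, it takes a bare regex rather than the /pattern/ form, it ignores the limit parameter, it skips subgroups that did not participate in a match, and it assumes the pattern never matches the empty string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of Perl-style split semantics; not ORO code.
class PerlSplitSketch {

    static List<String> perlSplit(String regex, String input) {
        List<String> fields = new ArrayList<String>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int start = 0;
        while (m.find()) {
            // The text between the previous delimiter and this one is a field,
            // even when it is empty (this preserves leading empty fields).
            fields.add(input.substring(start, m.start()));
            // Each parenthesized subgroup of the delimiter adds a field.
            for (int g = 1; g <= m.groupCount(); g++) {
                if (m.group(g) != null) { // assumption: skip unmatched groups
                    fields.add(m.group(g));
                }
            }
            start = m.end();
        }
        fields.add(input.substring(start));
        // Delete trailing empty fields, as Perl does.
        while (!fields.isEmpty() && fields.get(fields.size() - 1).isEmpty()) {
            fields.remove(fields.size() - 1);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(perlSplit("([,-])", "8-12,15,18"));
        System.out.println(perlSplit(",", ",a,b,,"));
    }
}
```

Because trailing empty fields are removed after the scan, splitting a zero length input leaves nothing in the list, which matches the empty-list behavior noted above.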
split
public void split(java.util.Collection results, java.lang.String pattern, java.lang.String input) throws MalformedPerl5PatternException
- This method is identical to calling:
split(results, pattern, input, SPLIT_ALL);
split
public void split(java.util.Collection results, java.lang.String input) throws MalformedPerl5PatternException
- Splits input in the default Perl manner, splitting on all whitespace.
This method is identical to calling:
split(results, "/\\s+/", input);
split
public java.util.Vector split(java.lang.String pattern, java.lang.String input, int limit) throws MalformedPerl5PatternException
- Deprecated. Use
split(Collection results, String pattern, String input, int limit) instead.
- Splits a String into strings contained in a Vector of size no greater than a specified limit. The String is split using a regular expression as the delimiter. The regular expression is a pattern specified in Perl5 native format:
[m]/pattern/[i][m][s][x]
The m prefix is optional and the meaning of the optional trailing options are:
- i
- case insensitive match
- m
- treat the input as consisting of multiple lines
- s
- treat the input as consisting of a single line
- x
- enable extended expression syntax incorporating whitespace and comments
The limit parameter causes the string to be split on at most the first limit - 1 pattern occurrences.
Of special note is that this split method performs EXACTLY the same as the Perl split() function. In other words, if the split pattern contains parentheses, additional Vector elements are created from each of the matching subgroups in the pattern. Using an example similar to the one from the Camel book:
split("/([,-])/", "8-12,15,18")
produces the Vector containing:
{ "8", "-", "12", ",", "15", ",", "18" }
The Util.split() method does NOT implement this particular behavior because it is intended to be usable with Pattern instances other than Perl5Pattern.
split
public java.util.Vector split(java.lang.String pattern, java.lang.String input) throws MalformedPerl5PatternException
- Deprecated. Use
split(Collection results, String pattern, String input) instead.
- This method is identical to calling:
split(pattern, input, SPLIT_ALL);
split
public java.util.Vector split(java.lang.String input) throws MalformedPerl5PatternException
- Deprecated. Use
split(Collection results, String input) instead.
- Splits input in the default Perl manner, splitting on all whitespace. This method is identical to calling:
split("/\\s+/", input);
length
public int length()
- Returns the length of the last match found.
- Specified by:
length in interface org.apache.oro.text.regex.MatchResult
groups
public int groups()
- Specified by:
groups in interface org.apache.oro.text.regex.MatchResult
group
public java.lang.String group(int group)
- Returns the contents of the parenthesized subgroups of the last match
found according to the behavior dictated by the MatchResult interface.
- Specified by:
group in interface org.apache.oro.text.regex.MatchResult
begin
public int begin(int group)
- Returns the begin offset of the subgroup of the last match found
relative to the beginning of the match.
- Specified by:
begin in interface org.apache.oro.text.regex.MatchResult
end
public int end(int group)
- Returns the end offset of the subgroup of the last match found
relative to the beginning of the match.
- Specified by:
end in interface org.apache.oro.text.regex.MatchResult
beginOffset
public int beginOffset(int group)
- Returns an offset marking the beginning of the last pattern match
found relative to the beginning of the input from which the match
was extracted.
- Specified by:
beginOffset in interface org.apache.oro.text.regex.MatchResult
endOffset
public int endOffset(int group)
- Returns an offset marking the end of the last pattern match found
relative to the beginning of the input from which the match was
extracted.
- Specified by:
endOffset in interface org.apache.oro.text.regex.MatchResult
toString
public java.lang.String toString()
- Returns the same as group(0).
- Specified by:
toString in interface org.apache.oro.text.regex.MatchResult
preMatch
public java.lang.String preMatch()
- Returns the part of the input preceding the last match found.
postMatch
public java.lang.String postMatch()
- Returns the part of the input following the last match found.
preMatchCharArray
public char[] preMatchCharArray()
- Returns the part of the input preceding the last match found as a
char array. This method eliminates the extra
buffer copying caused by preMatch().toCharArray().
postMatchCharArray
public char[] postMatchCharArray()
- Returns the part of the input following the last match found as a char
array. This method eliminates the extra buffer copying caused by
postMatch().toCharArray().