| Constructor: |
public Token() {
}
Constructs a Token with null text. |
public Token(int start,
int end) {
startOffset = start;
endOffset = end;
}
Constructs a Token with null text and start & end
offsets. Parameters:
start - start offset
end - end offset
|
public Token(int start,
int end,
String typ) {
startOffset = start;
endOffset = end;
type = typ;
}
Constructs a Token with null text and start & end
offsets plus the Token type. Parameters:
start - start offset
end - end offset
typ - token type
|
public Token(String text,
int start,
int end) {
termText = text;
startOffset = start;
endOffset = end;
}
Constructs a Token with the given term text, and start
& end offsets. The type defaults to "word".
NOTE: for better indexing speed you should
instead use the char[] termBuffer methods to set the
term text. Parameters:
text - term text
start - start offset
end - end offset
|
public Token(String text,
int start,
int end,
String typ) {
termText = text;
startOffset = start;
endOffset = end;
type = typ;
}
Constructs a Token with the given text, start and end
offsets, & type. NOTE: for better indexing
speed you should instead use the char[] termBuffer
methods to set the term text. Parameters:
text - term text
start - start offset
end - end offset
typ - token type
|
| Method from org.apache.lucene.analysis.Token Detail: |
public void clear() {
payload = null;
// Leave termBuffer to allow re-use
termLength = 0;
termText = null;
positionIncrement = 1;
// startOffset = endOffset = 0;
// type = DEFAULT_TYPE;
}
Resets the term text, payload, and positionIncrement to default.
Other fields such as startOffset, endOffset and the token type are
not reset since they are normally overwritten by the tokenizer. |
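As an illustration of why clear() keeps the buffer and leaves the offsets alone, here is a minimal sketch of a tokenizer loop that reuses one token instance. ReusableToken and splitOnSpaces are hypothetical stand-ins written for this example, not Lucene classes; the stand-in omits the payload and assumes an initial buffer of 16 chars.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TokenReuseSketch {
    /** Hypothetical stand-in holding only the fields this example needs. */
    static final class ReusableToken {
        char[] termBuffer = new char[16]; // assumed initial capacity
        int termLength;
        int positionIncrement = 1;
        int startOffset;
        int endOffset;

        /** Mirrors the listing above: the buffer is kept, offsets are not reset. */
        void clear() {
            termLength = 0;
            positionIncrement = 1;
        }
    }

    /** Splits on single spaces, reusing one token; returns "term@start-end" strings. */
    static List<String> splitOnSpaces(String text) {
        List<String> out = new ArrayList<>();
        ReusableToken token = new ReusableToken();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && text.charAt(i) == ' ') i++; // skip separators
            if (i == text.length()) break;
            int start = i;
            token.clear(); // term length back to 0, buffer retained
            while (i < text.length() && text.charAt(i) != ' ') {
                if (token.termLength == token.termBuffer.length) {
                    token.termBuffer = Arrays.copyOf(token.termBuffer, token.termBuffer.length * 2);
                }
                token.termBuffer[token.termLength++] = text.charAt(i++);
            }
            // The tokenizer overwrites the offsets for every token, so clear() need not reset them.
            token.startOffset = start;
            token.endOffset = i;
            out.add(new String(token.termBuffer, 0, token.termLength)
                    + "@" + token.startOffset + "-" + token.endOffset);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(splitOnSpaces("to be  or")); // [to@0-2, be@3-5, or@7-9]
    }
}
```

Only one char[] is allocated for the whole input unless a term outgrows it, which is the point of leaving termBuffer in place.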
public Object clone() {
try {
Token t = (Token)super.clone();
if (termBuffer != null) {
t.termBuffer = null;
t.setTermBuffer(termBuffer, 0, termLength);
}
if (payload != null) {
t.setPayload((Payload) payload.clone());
}
return t;
} catch (CloneNotSupportedException e) {
throw new RuntimeException(e); // shouldn't happen
}
}
|
public final int endOffset() {
return endOffset;
}
Returns this Token's ending offset, one greater than the position of the
last character corresponding to this token in the source text. |
public Payload getPayload() {
return this.payload;
}
Returns this Token's payload. |
public int getPositionIncrement() {
return positionIncrement;
}
Returns the position increment of this Token. |
public char[] resizeTermBuffer(int newSize) {
initTermBuffer();
if (newSize > termBuffer.length) {
int size = termBuffer.length;
while(size < newSize)
size *= 2;
char[] newBuffer = new char[size];
System.arraycopy(termBuffer, 0, newBuffer, 0, termBuffer.length);
termBuffer = newBuffer;
}
return termBuffer;
}
Grows the termBuffer to at least size newSize. |
public void setEndOffset(int offset) {
this.endOffset = offset;
}
|
public void setPayload(Payload payload) {
this.payload = payload;
}
Sets this Token's payload. |
public void setPositionIncrement(int positionIncrement) {
if (positionIncrement < 0)
throw new IllegalArgumentException
("Increment must be zero or greater: " + positionIncrement);
this.positionIncrement = positionIncrement;
}
Set the position increment. This determines the position of this token
relative to the previous Token in a TokenStream, used in phrase
searching.
The default value is one.
Some common uses for this are:
- Set it to zero to put multiple terms in the same position. This is
useful if, e.g., a word has multiple stems. Searches for phrases
including either stem will match. In this case, all but the first stem's
increment should be set to zero: the increment of the first instance
should be one. Repeating a token with an increment of zero can also be
used to boost the scores of matches on that token.
- Set it to values greater than one to inhibit exact phrase matches.
If, for example, one does not want phrases to match across removed stop
words, then one could build a stop word filter that removes stop words and
also sets the increment to the number of stop words removed before each
non-stop word. Then exact phrase queries will only match when the terms
occur with no intervening stop words.
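The two uses above can be sketched with a small model of how increments turn into positions. Tok, positions and removeStopWords are hypothetical names for this example, not Lucene API. The sketch assumes positions are a running sum of increments with the first token landing at position 0, and that the filter adds each removed word's increment to the next kept token so that a gap remains.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PositionIncrementSketch {
    /** Hypothetical stand-in for a token: term text plus position increment. */
    static final class Tok {
        final String term;
        final int positionIncrement;

        Tok(String term, int positionIncrement) {
            if (positionIncrement < 0)
                throw new IllegalArgumentException(
                        "Increment must be zero or greater: " + positionIncrement);
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Absolute positions as a running sum of increments (assumed to start at -1). */
    static int[] positions(List<Tok> tokens) {
        int[] result = new int[tokens.size()];
        int position = -1;
        for (int i = 0; i < tokens.size(); i++) {
            position += tokens.get(i).positionIncrement;
            result[i] = position;
        }
        return result;
    }

    /** Drops stop words and carries their increments over to the next kept token. */
    static List<Tok> removeStopWords(List<Tok> tokens, Set<String> stopWords) {
        List<Tok> kept = new ArrayList<>();
        int skipped = 0;
        for (Tok t : tokens) {
            if (stopWords.contains(t.term)) {
                skipped += t.positionIncrement;
            } else {
                kept.add(new Tok(t.term, t.positionIncrement + skipped));
                skipped = 0;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Two stems of one word share a position: the second stem has increment 0.
        List<Tok> stems = Arrays.asList(new Tok("running", 1), new Tok("run", 0), new Tok("fast", 1));
        System.out.println(Arrays.toString(positions(stems))); // [0, 0, 1]

        // Removing "of" leaves a gap, so "man" and "steel" are no longer adjacent.
        List<Tok> phrase = Arrays.asList(new Tok("man", 1), new Tok("of", 1), new Tok("steel", 1));
        List<Tok> filtered = removeStopWords(phrase, new HashSet<>(Arrays.asList("of")));
        System.out.println(Arrays.toString(positions(filtered))); // [0, 2]
    }
}
```

With the gap preserved, an exact phrase query for "man steel" would expect the terms at adjacent positions and so would not match this text.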
|
public void setStartOffset(int offset) {
this.startOffset = offset;
}
|
public final void setTermBuffer(char[] buffer,
int offset,
int length) {
resizeTermBuffer(length);
System.arraycopy(buffer, offset, termBuffer, 0, length);
termLength = length;
}
Copies the contents of buffer, starting at offset for
length characters, into the termBuffer
array. NOTE: for better indexing speed you
should instead retrieve the termBuffer, using #termBuffer() or #resizeTermBuffer(int), and
fill it in directly to set the term text. This saves
an extra copy. |
public final void setTermLength(int length) {
initTermBuffer();
termLength = length;
}
Sets the number of valid characters (length of the term) in
the termBuffer array. |
public void setTermText(String text) {
termText = text;
termBuffer = null;
}
Sets the Token's term text. NOTE: for better
indexing speed you should instead use the char[]
termBuffer methods to set the term text. |
public final void setType(String type) {
this.type = type;
}
|
public final int startOffset() {
return startOffset;
}
Returns this Token's starting offset, the position of the first character
corresponding to this token in the source text.
Note that the difference between endOffset() and startOffset() may not be
equal to termText.length(), as the term text may have been altered by a
stemmer or some other filter. |
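The note on startOffset() can be made concrete with a small sketch: offsets index into the source text, while the term text may be shorter after stemming. OffsetTok and highlight are hypothetical names for this example, and the stemmed term "runner" is simply supplied by hand rather than produced by a real stemmer.

```java
class OffsetSketch {
    /** Hypothetical stand-in: term text plus offsets into the original source. */
    static final class OffsetTok {
        final String termText;
        final int startOffset;
        final int endOffset; // one past the last source character

        OffsetTok(String termText, int startOffset, int endOffset) {
            this.termText = termText;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Brackets the source span the token came from, using offsets rather than term length. */
    static String highlight(String source, OffsetTok token) {
        return source.substring(0, token.startOffset)
                + "[" + source.substring(token.startOffset, token.endOffset) + "]"
                + source.substring(token.endOffset);
    }

    public static void main(String[] args) {
        String source = "The Runners";
        // "Runners" occupies source characters 4..10; the term was reduced to "runner".
        OffsetTok token = new OffsetTok("runner", 4, 11);
        System.out.println(token.endOffset - token.startOffset); // 7
        System.out.println(token.termText.length());             // 6
        System.out.println(highlight(source, token));            // The [Runners]
    }
}
```

Code that maps a token back to the source, such as a highlighter, should therefore rely on the two offsets and not on the length of the term text.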
public final char[] termBuffer() {
initTermBuffer();
return termBuffer;
}
Returns the internal termBuffer character array which
you can then directly alter. If the array is too
small for your token, use #resizeTermBuffer(int) to increase it. After
altering the buffer be sure to call #setTermLength to record the number of valid
characters that were placed into the termBuffer. |
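The direct-fill sequence described for termBuffer() (grow if needed, write the characters, record the length) might look like the following sketch. BufferToken is a hypothetical stand-in that copies the resizeTermBuffer logic from the listing above; its initial capacity of 10 chars is an assumption, since initTermBuffer() is not shown here.

```java
class TermBufferSketch {
    /** Hypothetical stand-in exposing only the buffer-related methods. */
    static final class BufferToken {
        private static final int INITIAL_CAPACITY = 10; // assumed, not taken from the listing
        private char[] termBuffer = new char[INITIAL_CAPACITY];
        private int termLength;

        char[] termBuffer() { return termBuffer; }
        int termLength() { return termLength; }
        void setTermLength(int length) { termLength = length; }

        /** Doubles the capacity until it is at least newSize, keeping existing contents. */
        char[] resizeTermBuffer(int newSize) {
            if (newSize > termBuffer.length) {
                int size = termBuffer.length;
                while (size < newSize) size *= 2;
                char[] newBuffer = new char[size];
                System.arraycopy(termBuffer, 0, newBuffer, 0, termBuffer.length);
                termBuffer = newBuffer;
            }
            return termBuffer;
        }
    }

    /** Writes the lower-cased slice of source straight into the token's buffer. */
    static void fillLowerCased(BufferToken token, char[] source, int offset, int length) {
        char[] buffer = token.resizeTermBuffer(length); // may return a new, larger array
        for (int i = 0; i < length; i++) {
            buffer[i] = Character.toLowerCase(source[offset + i]);
        }
        token.setTermLength(length); // record how many chars are valid
    }

    public static void main(String[] args) {
        BufferToken token = new BufferToken();
        char[] source = "Hello TokenStream World".toCharArray();
        fillLowerCased(token, source, 6, 11);
        System.out.println(new String(token.termBuffer(), 0, token.termLength())); // tokenstream
    }
}
```

No intermediate String or second array copy is created; the only allocation is the one-time growth of the buffer from 10 to 20 chars.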
public final int termLength() {
initTermBuffer();
return termLength;
}
Returns the number of valid characters (length of the term)
in the termBuffer array. |
public final String termText() {
if (termText == null && termBuffer != null)
termText = new String(termBuffer, 0, termLength);
return termText;
}
Deprecated! Use #termBuffer() and #termLength() instead.
Returns the Token's term text. |
public String toString() {
StringBuffer sb = new StringBuffer();
sb.append('(');
initTermBuffer();
if (termBuffer == null)
sb.append("null");
else
sb.append(termBuffer, 0, termLength);
sb.append(',').append(startOffset).append(',').append(endOffset);
if (!type.equals("word"))
sb.append(",type=").append(type);
if (positionIncrement != 1)
sb.append(",posIncr=").append(positionIncrement);
sb.append(')');
return sb.toString();
}
|
public final String type() {
return type;
}
Returns this Token's lexical type. Defaults to "word". |