Qizx fe-4.4p2 API
public interface TextTokenizer
Pluggable text tokenizer compatible with standard full-text features. Analyzes text chunks to extract and normalize words.
To parse words, first initialize the tokenizer by calling start(char[], int) on a text chunk. Then call nextToken() repeatedly until the last token has been parsed and END is returned.
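The start/nextToken cycle described above can be sketched as follows. This is a minimal illustration, not Qizx code: the nested TextTokenizer interface is a hypothetical stand-in that declares only a few of the documented members, the numeric values of END and WORD are assumptions (this page lists only their names), and SimpleTokenizer is a toy implementation that treats runs of letters or digits as words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TokenizerLoopSketch {
    // Hypothetical stand-in for the documented interface (subset of members only).
    interface TextTokenizer {
        int END = 0;   // assumed value; the documentation gives only the name
        int WORD = 1;  // assumed value
        void start(char[] text, int length);
        int nextToken();
        int getTokenOffset();
        int getTokenLength();
        char[] getTokenChars();
    }

    // Toy implementation (an assumption): a word is a run of letters or digits.
    static class SimpleTokenizer implements TextTokenizer {
        private char[] text = new char[0];
        private int length, pos, tokenStart, tokenLength;

        public void start(char[] text, int length) {
            this.text = text;
            this.length = length;
            this.pos = 0;
            this.tokenStart = 0;
            this.tokenLength = 0;
        }

        public int nextToken() {
            // Skip separators, then consume one word.
            while (pos < length && !Character.isLetterOrDigit(text[pos])) pos++;
            if (pos >= length) return END;
            tokenStart = pos;
            while (pos < length && Character.isLetterOrDigit(text[pos])) pos++;
            tokenLength = pos - tokenStart;
            return WORD;
        }

        public int getTokenOffset() { return tokenStart; }
        public int getTokenLength() { return tokenLength; }
        public char[] getTokenChars() {
            // Returns a fresh array, as the documentation describes.
            return Arrays.copyOfRange(text, tokenStart, tokenStart + tokenLength);
        }
    }

    // Initialize on a text chunk, then pull tokens until END is returned.
    static List<String> collectWords(TextTokenizer tokenizer, String input) {
        char[] chars = input.toCharArray();
        tokenizer.start(chars, chars.length);
        List<String> words = new ArrayList<>();
        for (int type = tokenizer.nextToken();
             type != TextTokenizer.END;
             type = tokenizer.nextToken()) {
            if (type == TextTokenizer.WORD) {
                words.add(new String(tokenizer.getTokenChars()));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(collectWords(new SimpleTokenizer(), "Hello, big world!"));
    }
}
```

With a real Qizx tokenizer, only the collectWords loop would carry over; the stand-in interface and the toy implementation exist solely to make the sketch self-contained.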
Field Summary | |
---|---|
static int | END: Code returned by nextToken when the end of the text to tokenize is reached. |
static int | PARAGRAPH: Code returned by nextToken when a paragraph boundary is recognized. |
static int | SENTENCE: Code returned by nextToken when a sentence boundary is recognized. |
static int | WORD: Code returned by nextToken when a word is recognized. |
Method Summary | |
---|---|
void | copyTokenTo(char[] array, int start): Copies the current token into a character array. |
void | defineSpecialChar(char ch): Defines a character to recognize when parsing of special characters is enabled. |
int | getDigitMax(): Returns the maximum number of digits a word can contain. |
char[] | getTokenChars(): Returns the current token as a new character array. |
int | getTokenLength(): Returns the original length of the last word returned by nextToken. |
int | getTokenOffset(): Returns the offset (in the source text chunk) of the last word returned by nextToken. |
boolean | gotWildcard(): Returns true if wildcard characters have been recognized in the current token. |
boolean | isAcceptingWildcards(): Returns true if wildcard characters are recognized. |
boolean | isParsingSpecialChars(): Returns true if special characters are recognized. |
int | nextToken(): Returns the type of the next token, or END if no more tokens can be found. |
void | setAcceptingWildcards(boolean acceptingWildcards): If set to true, wildcard characters are recognized. |
void | setDigitMax(int max): Sets the maximum number of digits a word can contain. |
void | setParsingSpecialChars(boolean parsingSpecialChars): If set to true, special characters are recognized. |
void | start(char[] text, int length): Starts the analysis of a new text chunk. |
void | start(CharSequence text): Starts the analysis of a new text chunk. |
Field Detail |
---|
static final int END
static final int WORD
static final int SENTENCE
Not yet supported.
static final int PARAGRAPH
Not yet supported.
Method Detail |
---|
void start(char[] text, int length)
    text - characters to tokenize
    length - number of characters in the text array
void start(CharSequence text)
    text - fragment to tokenize
int nextToken()
int getTokenOffset()
int getTokenLength()
char[] getTokenChars()
void copyTokenTo(char[] array, int start)
    array - destination array. Must be large enough to hold the token.
    start - offset in the destination array.
boolean isParsingSpecialChars()
    See also: defineSpecialChar(char)
void setParsingSpecialChars(boolean parsingSpecialChars)
    See also: defineSpecialChar(char)
int getDigitMax()
void setDigitMax(int max)
void defineSpecialChar(char ch)
boolean isAcceptingWildcards()
Wildcard character sequences are ".", ".?", ".*", ".+", and ".{n,m}"
void setAcceptingWildcards(boolean acceptingWildcards)
boolean gotWildcard()
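The wildcard sequences listed under isAcceptingWildcards() can be illustrated with a small matcher. This is only a sketch: how a Qizx tokenizer actually detects wildcards is not stated on this page, and the class name WildcardSketch, its methods, and the regular expression are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WildcardSketch {
    // A '.' optionally followed by ?, *, + or {n,m}: the five forms listed above.
    private static final Pattern WILDCARD =
        Pattern.compile("\\.(?:[?*+]|\\{\\d+,\\d+\\})?");

    // Returns every wildcard sequence found in the token, in order of appearance.
    static List<String> findWildcards(String token) {
        List<String> found = new ArrayList<>();
        Matcher m = WILDCARD.matcher(token);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Rough analogue of what gotWildcard() reports for the current token.
    static boolean hasWildcard(String token) {
        return WILDCARD.matcher(token).find();
    }

    public static void main(String[] args) {
        System.out.println(findWildcards("wor.*d.{1,3}s."));
    }
}
```

Under this reading any '.' counts as a wildcard, so a tokenizer with wildcards enabled would presumably keep such characters inside the token rather than treating them as separators.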
© 2010 Axyana Software