Qizx fe-4.4p2 API
java.lang.Object
  com.qizx.api.util.fulltext.DefaultTextTokenizer

public class DefaultTextTokenizer
extends java.lang.Object
implements com.qizx.api.fulltext.TextTokenizer
Generic text tokenizer, suitable for most Western languages.

A word is either (1) a sequence of letters or digits that begins with a letter, or (2) a number without an exponent. Words never contain a dash or an apostrophe.
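The word rule above can be illustrated with a minimal, self-contained sketch. It is not the Qizx implementation: the class name WordRuleSketch is hypothetical, letters and digits are assumed to be whatever Character.isLetter and Character.isDigit accept, and a number is assumed to allow a decimal fraction (digits, '.', digits) but never an exponent. Every other character, including '-' and the apostrophe, is treated as a separator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the documented word rule; NOT the Qizx implementation.
final class WordRuleSketch {

    // Returns the words of text under an assumed reading of the rule:
    //  (1) a letter followed by any letters or digits, or
    //  (2) a number: digits, optionally followed by '.' and more digits
    //      (the decimal fraction is an assumption; exponents are never included).
    // Any other character, including '-' and '\'', is treated as a separator.
    static List<String> words(CharSequence text) {
        List<String> out = new ArrayList<>();
        int n = text.length();
        int i = 0;
        while (i < n) {
            char c = text.charAt(i);
            int begin = i;
            if (Character.isLetter(c)) {
                i++;
                while (i < n && Character.isLetterOrDigit(text.charAt(i))) i++;
                out.add(text.subSequence(begin, i).toString());
            } else if (Character.isDigit(c)) {
                i++;
                while (i < n && Character.isDigit(text.charAt(i))) i++;
                // Take a fraction only when a digit follows the '.'.
                if (i + 1 < n && text.charAt(i) == '.' && Character.isDigit(text.charAt(i + 1))) {
                    i += 2;
                    while (i < n && Character.isDigit(text.charAt(i))) i++;
                }
                out.add(text.subSequence(begin, i).toString());
            } else {
                i++; // separator: skipped, never part of a word
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [Don, t, re, use, x2, for, 3.14, or, 1, e5]
        System.out.println(words("Don't re-use x2 for 3.14 or 1e5"));
    }
}
```

Under this reading, "1e5" splits into the number "1" and the word "e5", because a number never absorbs an exponent.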
Field Summary

| Fields inherited from interface com.qizx.api.fulltext.TextTokenizer |
|---|
| END, PARAGRAPH, SENTENCE, WORD |
Constructor Summary

| Constructor |
|---|
| DefaultTextTokenizer() |
Method Summary

| Type | Method | Description |
|---|---|---|
| void | copyTokenTo(char[] array, int start) | Copies the current token into a character array. |
| void | defineSpecialChar(char ch) | Defines a character to recognize when parsing of special characters is enabled. |
| int | getDigitMax() | Returns the maximum number of digits a word can contain. |
| char[] | getTokenChars() | Returns the current token as a new character array. |
| int | getTokenLength() | Returns the original length of the last token returned by nextToken(). |
| int | getTokenOffset() | Returns the offset, within the source text chunk, of the last token returned by nextToken(). |
| boolean | gotWildcard() | Returns true if wildcard characters have been recognized in the current token. |
| boolean | isAcceptingWildcards() | Returns true if wildcard characters are recognized. |
| boolean | isParsingSpecialChars() | Returns true if special characters are recognized. |
| int | nextToken() | Returns the type of the next token, or END if no more tokens can be found. |
| void | setAcceptingWildcards(boolean acceptingWildcards) | If set to true, wildcard characters are recognized. |
| void | setDigitMax(int max) | Sets the maximum number of digits a word can contain. |
| void | setParsingSpecialChars(boolean parsingSpecialChars) | If set to true, special characters are recognized. |
| void | start(char[] text, int length) | Starts the analysis of a new text chunk. |
| void | start(CharSequence text) | Starts the analysis of a new text chunk. |
| Methods inherited from class java.lang.Object |
|---|
| equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |

Constructor Detail

public DefaultTextTokenizer()

Method Detail

public void start(char[] text, int length)
Specified by: start in interface TextTokenizer
Parameters:
text - characters to tokenize
length - number of characters in the text array

public void start(CharSequence text)
Specified by: start in interface TextTokenizer
Parameters:
text - fragment to tokenize

public void copyTokenTo(char[] array, int start)
Specified by: copyTokenTo in interface TextTokenizer
Parameters:
array - destination array; it must have room for the whole token, starting at offset start.
start - offset in the destination array.

public char[] getTokenChars()
Specified by: getTokenChars in interface TextTokenizer

public int getTokenOffset()
Specified by: getTokenOffset in interface TextTokenizer

public int getTokenLength()
Specified by: getTokenLength in interface TextTokenizer

public boolean isAcceptingWildcards()
Wildcard character sequences are ".", ".?", ".*", ".+", and ".{n,m}".
Specified by: isAcceptingWildcards in interface TextTokenizer
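When wildcards are accepted, a term such as colo.?r carries one of the sequences listed above. The sketch below shows one way such a term could be matched against candidate words by translating it into a java.util.regex.Pattern. It is a hypothetical helper, not Qizx code. It assumes the sequences keep the meanings of the W3C XQuery and XPath Full Text wildcard option ("." exactly one character, ".?" zero or one, ".*" zero or more, ".+" one or more, ".{n,m}" between n and m characters), treats every other character literally, and does not handle escapes.

```java
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of Qizx: turns a term containing the wildcard
// sequences ".", ".?", ".*", ".+" and ".{n,m}" into a java.util.regex.Pattern.
// Assumes all other characters are literal; escape sequences are not handled.
final class WildcardSketch {

    static Pattern toPattern(String term) {
        StringBuilder re = new StringBuilder();
        int i = 0;
        while (i < term.length()) {
            char c = term.charAt(i);
            if (c != '.') {
                // Literal character: quote it so regex metacharacters lose their meaning.
                re.append(Pattern.quote(String.valueOf(c)));
                i++;
                continue;
            }
            re.append('.'); // "." alone: exactly one character
            i++;
            if (i < term.length()) {
                char q = term.charAt(i);
                if (q == '?' || q == '*' || q == '+') {
                    re.append(q); // ".?", ".*", ".+"
                    i++;
                } else if (q == '{') {
                    int close = term.indexOf('}', i);
                    String range = close < 0 ? "" : term.substring(i + 1, close);
                    if (range.matches("\\d{1,9},\\d{1,9}")) {
                        String[] nm = range.split(",");
                        if (Integer.parseInt(nm[0]) <= Integer.parseInt(nm[1])) {
                            re.append('{').append(range).append('}'); // ".{n,m}"
                            i = close + 1;
                        }
                    }
                    // Otherwise the '{' is left for the next iteration, as a literal.
                }
            }
        }
        return Pattern.compile(re.toString());
    }

    public static void main(String[] args) {
        Pattern p = toPattern("colo.?r");
        System.out.println(p.matcher("color").matches());   // true
        System.out.println(p.matcher("colour").matches());  // true
        System.out.println(p.matcher("colouur").matches()); // false
    }
}
```

Quoting each literal character keeps terms such as a+b from being read as regular-expression syntax; only a '+' that follows a '.' acts as a wildcard.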

public void setAcceptingWildcards(boolean acceptingWildcards)
Specified by: setAcceptingWildcards in interface TextTokenizer

public boolean isParsingSpecialChars()
Specified by: isParsingSpecialChars in interface TextTokenizer
See Also: TextTokenizer.defineSpecialChar(char)

public void setParsingSpecialChars(boolean parsingSpecialChars)
Specified by: setParsingSpecialChars in interface TextTokenizer
See Also: TextTokenizer.defineSpecialChar(char)

public void defineSpecialChar(char ch)
Specified by: defineSpecialChar in interface TextTokenizer

public boolean gotWildcard()
Specified by: gotWildcard in interface TextTokenizer

public int nextToken()
Specified by: nextToken in interface TextTokenizer
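The methods above follow a pull-style protocol: start() loads a text chunk, nextToken() is called until it returns END, and the current token is read between calls with getTokenChars(), copyTokenTo(), getTokenOffset() and getTokenLength(). So that the example runs without the Qizx library, the sketch below uses a hypothetical stand-in class, SketchTokenizer, that mirrors those method names on a much simpler rule (a token is any run of letters or digits). Its END and WORD values are placeholders, not the constants defined by TextTokenizer; with the real class the driver loop would presumably keep the same shape.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT Qizx code: mirrors the method names documented for
// DefaultTextTokenizer so the calling pattern can run without the library.
final class SketchTokenizer {
    static final int END = 0;  // placeholder value, not TextTokenizer.END
    static final int WORD = 1; // placeholder value, not TextTokenizer.WORD

    private CharSequence text = "";
    private int pos, tokStart, tokLen;

    // Starts the analysis of a new text chunk.
    void start(CharSequence text) {
        this.text = text;
        pos = 0;
        tokStart = 0;
        tokLen = 0;
    }

    // Simplified rule for the stand-in: a token is a run of letters or digits.
    int nextToken() {
        int n = text.length();
        while (pos < n && !Character.isLetterOrDigit(text.charAt(pos))) pos++;
        if (pos >= n) return END;
        tokStart = pos;
        while (pos < n && Character.isLetterOrDigit(text.charAt(pos))) pos++;
        tokLen = pos - tokStart;
        return WORD;
    }

    int getTokenOffset() { return tokStart; }
    int getTokenLength() { return tokLen; }

    // The destination must have room for tokLen chars from offset start.
    void copyTokenTo(char[] array, int start) {
        for (int k = 0; k < tokLen; k++) array[start + k] = text.charAt(tokStart + k);
    }

    char[] getTokenChars() {
        char[] out = new char[tokLen];
        copyTokenTo(out, 0);
        return out;
    }

    // Driver loop: call nextToken() until END, reading each token in between.
    static List<String> describe(CharSequence input) {
        SketchTokenizer t = new SketchTokenizer();
        t.start(input);
        List<String> out = new ArrayList<>();
        int type;
        while ((type = t.nextToken()) != END) {
            // The real interface also defines SENTENCE and PARAGRAPH token types.
            if (type != WORD) continue;
            out.add(new String(t.getTokenChars()) + "@" + t.getTokenOffset());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describe("Hello, big world")); // [Hello@0, big@7, world@11]
    }
}
```

Checking the token type before reading the token keeps the loop correct if the tokenizer reports structural tokens between words.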

public int getDigitMax()
Specified by: getDigitMax in interface TextTokenizer

public void setDigitMax(int max)
Specified by: setDigitMax in interface TextTokenizer
© 2010 Axyana Software