com.qizx.api.util.text
Class DefaultWordSieve
java.lang.Object
com.qizx.api.util.fulltext.DefaultTextTokenizer
com.qizx.api.util.text.DefaultWordSieve
- All Implemented Interfaces:
- TextTokenizer, Indexing.WordSieve, Serializable
Deprecated. Use DefaultTextTokenizer instead.
public class DefaultWordSieve
- extends DefaultTextTokenizer
- implements Indexing.WordSieve, Serializable
A basic text tokenizer suitable for most European languages.
All methods can be overridden.
- Words start with a letter, a digit, or an underscore. They may also
contain a hyphen '-', and a dot provided it is not in the last position.
- Unless specified otherwise by a constructor argument, characters are
converted to lowercase.
- Unless specified otherwise by a constructor argument, ISO-8859-1
characters with accents (diacritics) are converted to their unaccented
equivalents (e.g. 'é' is converted to 'e'). More complex mappings, such as
German ß to "ss", are not supported.
- No stemming is performed.
- See Also:
- Serialized Form
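The tokenization rules listed above can be illustrated with a small standalone sketch. It is not the Qizx implementation and does not use the Qizx API: the class WordSieveSketch and its methods are hypothetical names, and accent removal is approximated with java.text.Normalizer rather than an ISO-8859-1 mapping table. It assumes that a trailing dot is simply dropped from the word.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of the documented rules; not Qizx code.
class WordSieveSketch {

    // A word may start with a letter, a digit, or an underscore.
    static boolean isWordStart(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    // Inside a word, a hyphen and a dot are accepted as well.
    static boolean isWordPart(char c) {
        return isWordStart(c) || c == '-' || c == '.';
    }

    // Lowercase, then strip diacritics by decomposing (NFD) and removing
    // combining marks. The German sharp s has no canonical decomposition,
    // so it is left unchanged, as the class description states.
    static String normalize(String word) {
        String lower = word.toLowerCase(Locale.ROOT);
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        int i = 0;
        int n = text.length();
        while (i < n) {
            if (!isWordStart(text.charAt(i))) {
                i++;
                continue;
            }
            int end = i + 1;
            while (end < n && isWordPart(text.charAt(end))) {
                end++;
            }
            // Assumption: a dot in last position is not part of the word.
            int stop = end;
            while (stop > i + 1 && text.charAt(stop - 1) == '.') {
                stop--;
            }
            words.add(normalize(text.substring(i, stop)));
            i = end;
        }
        return words;
    }

    public static void main(String[] args) {
        // Prints [elan, vital, e.g, state-of-the-art_2]
        System.out.println(tokenize("\u00C9lan vital, e.g. state-of-the-art_2."));
    }
}
```

No stemming is applied in the sketch either: each word is emitted as found, apart from case and accent normalization.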
Methods inherited from class com.qizx.api.util.fulltext.DefaultTextTokenizer
copyTokenTo, defineSpecialChar, getDigitMax, getTokenChars, getTokenLength, getTokenOffset, gotWildcard, isAcceptingWildcards, isParsingSpecialChars, nextToken, setAcceptingWildcards, setDigitMax, setParsingSpecialChars, start, start
Methods inherited from interface com.qizx.api.fulltext.TextTokenizer
copyTokenTo, defineSpecialChar, getDigitMax, getTokenChars, getTokenLength, getTokenOffset, gotWildcard, isAcceptingWildcards, isParsingSpecialChars, nextToken, setAcceptingWildcards, setDigitMax, setParsingSpecialChars, start, start
DefaultWordSieve
public DefaultWordSieve()
- Deprecated.
- Default constructor.
copy
public Indexing.WordSieve copy()
- Deprecated.
- Description copied from interface: Indexing.WordSieve
- Creates a carbon copy of this object.
- Specified by:
- copy in interface Indexing.WordSieve
- Returns:
- a new copy of this object
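As a rough illustration of the copy() contract described above, the sketch below returns a new object carrying the same settings as the original. Everything in it is hypothetical: the Sieve interface only stands in for Indexing.WordSieve, and the lowercase and stripAccents fields are assumed settings, not documented Qizx members.

```java
// Hypothetical stand-ins; none of these types belong to the Qizx API.
class CopySketch {

    interface Sieve {
        Sieve copy();
    }

    static final class SimpleSieve implements Sieve {
        final boolean lowercase;     // assumed setting
        final boolean stripAccents;  // assumed setting

        SimpleSieve(boolean lowercase, boolean stripAccents) {
            this.lowercase = lowercase;
            this.stripAccents = stripAccents;
        }

        // Returns a distinct instance with identical settings.
        @Override
        public Sieve copy() {
            return new SimpleSieve(lowercase, stripAccents);
        }
    }

    public static void main(String[] args) {
        SimpleSieve original = new SimpleSieve(false, true);
        SimpleSieve duplicate = (SimpleSieve) original.copy();
        System.out.println(duplicate != original);                              // prints true
        System.out.println(duplicate.lowercase == original.lowercase);          // prints true
        System.out.println(duplicate.stripAccents == original.stripAccents);    // prints true
    }
}
```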