Qizx fe-4.4p2 API

com.qizx.api
Class Indexing

java.lang.Object
  extended by com.qizx.api.Indexing
All Implemented Interfaces:
com.qizx.xdm.DataConversion

public class Indexing
extends Object
implements com.qizx.xdm.DataConversion

Specification of rules and parameters used to build indexes in an XML Library.

An instance of this class is associated with each XML Library. It can be represented in XML, thus read and written in this format. A DTD is provided in the documentation.

The direct use of this class is recommended only for very advanced problems, for example when an indexing specification needs to be generated dynamically. In most cases, an indexing specification is written in XML. Its most common uses are defining specific Numeric or Date Sieves and tuning parameters such as minWordLength.


Nested Class Summary
static interface Indexing.DateSieve
          Pluggable analyzer/converter of date-time values, for custom indexing.
static interface Indexing.NumberSieve
          Pluggable analyzer/converter of numeric values, for custom indexing.
static class Indexing.Rule
          Indexing properties associated with an element or an attribute (for advanced indexing).
static interface Indexing.Sieve
          Abstraction of Sieves used for indexing XML data.
static interface Indexing.WordSieve
          Deprecated. Replaced by TextTokenizer.
 
Field Summary
static byte DATE
          Indexing type for a rule: attempt to convert element or attribute text content with a DateSieve, then put it into the date index.
static byte DATE_AND_STRING
          Indexing type for a rule: same as both DATE and STRING.
static byte DISABLE_FULL_TEXT
          Value for full-text option (global or per rule) disabling full-text.
static byte ENABLE_FULL_TEXT
          Value for full-text option (global or per rule) enabling full-text.
static byte INHERIT
          Default value for full-text on a Rule: inherit from parent element or from global setting.
static byte NUMERIC
          Indexing type for a rule: attempt to convert element or attribute text content with a NumberSieve, then put it into the numeric index.
static byte NUMERIC_AND_STRING
          Indexing type for a rule: same as both NUMERIC and STRING.
static byte STRING
          Indexing type for a rule: the text fragment is put directly into the string index if its length is less than max-string-length.
 
Constructor Summary
Indexing()
          Creates an empty indexing configuration.
 
Method Summary
 FormatDateSieve addAttrDateSieve(String format, Locale locale, QName attributeName, QName[] context)
          A convenience method that adds a format-based DateSieve for a specific attribute.
 Indexing.Rule addAttributeRule(QName attributeName, QName[] context, int indexingType)
          Adds a new Rule applicable to a specific attribute.
 FormatNumberSieve addAttrNumberSieve(String format, Locale locale, QName attributeName, QName[] context)
          A convenience method that adds a format-based NumberSieve for a specific attribute.
 Indexing.DateSieve addDateSieve(String format, Locale locale, QName elementName, QName[] context)
          A convenience method that adds a format-based DateSieve for a specific element.
 Indexing.Rule addDefaultAttributeRule(QName[] context, int indexingType)
          Adds a new Rule applicable to all attributes.
 Indexing.Rule addDefaultElementRule(QName[] context, int indexingType)
          Adds a new Rule applicable to all elements.
 Indexing.Rule addElementRule(QName elementName, QName[] context, int indexingType)
          Adds a new Rule applicable to a specific element.
 FormatNumberSieve addNumberSieve(String format, Locale locale, QName elementName, QName[] context)
          A convenience method that adds a format-based NumberSieve for a specific element.
 double convertDate(Node node)
          Attempts to convert the date or date-time contained in the text fragment to a double value (in milliseconds from 1970-01-01 00:00:00 UTC).
 double convertNumber(Node node)
          Attempts to convert the string to a double value.
static Indexing defaultRules()
          Creates a default indexing configuration.
 void export(XMLPushStream stream)
          Converts to an external representation.
 Indexing.Rule getAttributeRule(int index)
          Returns the n-th attribute Rule.
 int getAttributeRuleCount()
          Returns the number of defined Attribute rules, including default rules.
 Indexing.Rule getElementRule(int index)
          Returns the n-th element Rule.
 int getElementRuleCount()
          Returns the number of defined Element rules, including default rules.
 int getMaxStringLength()
          Returns the maximum length of text chunks that can be indexed in value indexes (Attribute and Simple element content).
 int getMaxWordLength()
          Returns the maximum length of words that can be indexed in the Full-text index.
 int getMinWordLength()
          Returns the minimum length of words that can be indexed in the Full-text index.
 Indexing.WordSieve getWordSieve()
          Returns the word sieve used for full-text indexing.
 boolean isFulltextEnabled()
          Returns true if full-text indexing is globally enabled for elements.
 void parse(InputSource source)
          Parses an Indexing specification from XML text representation.
 void parse(InputSource source, XMLReader parser)
          Parses an Indexing specification from XML text representation.
 void parse(Node specification)
          Parses an Indexing specification from a Node.
 void setFulltextEnabled(boolean fulltext)
          Enables or disables full-text indexing globally.
 void setMaxStringLength(int maxStringLength)
          Defines the maximum length of text chunks that can be indexed in value indexes (Attribute and Simple element content).
 void setMaxWordLength(int maxWordLength)
          Defines the maximum length of words that can be indexed in the Full-text index.
 void setMinWordLength(int minWordLength)
          Defines the minimum length of words that can be indexed in the Full-text index.
 void setWordSieve(Indexing.WordSieve wordSieve)
          Defines the word sieve used for full-text indexing.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

STRING

public static final byte STRING
Indexing type for a rule: the text fragment is put directly into the string index if its length is less than max-string-length.

See Also:
Constant Field Values

NUMERIC

public static final byte NUMERIC
Indexing type for a rule: attempt to convert element or attribute text content with a NumberSieve, then put it into the numeric index.

See Also:
Constant Field Values

NUMERIC_AND_STRING

public static final byte NUMERIC_AND_STRING
Indexing type for a rule: same as both NUMERIC and STRING.

See Also:
Constant Field Values

DATE

public static final byte DATE
Indexing type for a rule: attempt to convert element or attribute text content with a DateSieve, then put it into the date index.

See Also:
Constant Field Values

DATE_AND_STRING

public static final byte DATE_AND_STRING
Indexing type for a rule: same as both DATE and STRING.

See Also:
Constant Field Values

INHERIT

public static final byte INHERIT
Default value for full-text on a Rule: inherit from parent element or from global setting.

See Also:
Constant Field Values

DISABLE_FULL_TEXT

public static final byte DISABLE_FULL_TEXT
Value for full-text option (global or per rule) disabling full-text.

See Also:
Constant Field Values

ENABLE_FULL_TEXT

public static final byte ENABLE_FULL_TEXT
Value for full-text option (global or per rule) enabling full-text.

See Also:
Constant Field Values
Constructor Detail

Indexing

public Indexing()
Creates an empty indexing configuration.

Method Detail

defaultRules

public static Indexing defaultRules()
Creates a default indexing configuration.

Returns:
a new instance which contains default indexing rules.

parse

public void parse(InputSource source)
           throws QizxException
Parses an Indexing specification from XML text representation.

Parameters:
source - a SAX input source for the specification
Throws:
QizxException - on IO error, on SAX parsing error, or if the indexing specification is invalid

parse

public void parse(InputSource source,
                  XMLReader parser)
           throws QizxException
Parses an Indexing specification from XML text representation.

Parameters:
source - a SAX input source for the specification
parser - a SAX parser explicitly specified
Throws:
QizxException - on IO error, on SAX parsing error, or if the indexing specification is invalid

parse

public void parse(Node specification)
           throws DataModelException
Parses an Indexing specification from a Node.

Parameters:
specification - a node of type document or element which is the root of an indexing specification.
Throws:
DataModelException - if the indexing specification is invalid

export

public void export(XMLPushStream stream)
            throws DataModelException
Converts to an external representation.

Parameters:
stream - a Push style interface (for example XMLSerializer).
Throws:
DataModelException - if generated by the output stream

addElementRule

public Indexing.Rule addElementRule(QName elementName,
                                    QName[] context,
                                    int indexingType)
Adds a new Rule applicable to a specific element.

Parameters:
elementName - specific element name
context - enclosing elements (optional, may be null)
indexingType - possible values are STRING, NUMERIC, DATE, NUMERIC_AND_STRING, DATE_AND_STRING
Returns:
the new Rule

addDefaultElementRule

public Indexing.Rule addDefaultElementRule(QName[] context,
                                           int indexingType)
Adds a new Rule applicable to all elements.

Parameters:
context - enclosing elements (optional, may be null)
indexingType - possible values are STRING, NUMERIC, DATE, NUMERIC_AND_STRING, DATE_AND_STRING
Returns:
the new Rule

getElementRuleCount

public int getElementRuleCount()
Returns the number of defined Element rules, including default rules.

Returns:
the number of defined Element rules

getElementRule

public Indexing.Rule getElementRule(int index)
Returns the n-th element Rule.

Parameters:
index - an index starting from 0
Returns:
the element Rule at rank 'index'

addAttributeRule

public Indexing.Rule addAttributeRule(QName attributeName,
                                      QName[] context,
                                      int indexingType)
Adds a new Rule applicable to a specific attribute.

Parameters:
attributeName - specific attribute name
context - enclosing elements (optional, may be null)
indexingType - possible values are STRING, NUMERIC, DATE, NUMERIC_AND_STRING, DATE_AND_STRING
Returns:
the new Rule

addDefaultAttributeRule

public Indexing.Rule addDefaultAttributeRule(QName[] context,
                                             int indexingType)
Adds a new Rule applicable to all attributes.

Parameters:
context - enclosing elements (optional, may be null)
indexingType - possible values are STRING, NUMERIC, DATE, NUMERIC_AND_STRING, DATE_AND_STRING
Returns:
the new Rule

getAttributeRuleCount

public int getAttributeRuleCount()
Returns the number of defined Attribute rules, including default rules.

Returns:
the number of defined Attribute rules

getAttributeRule

public Indexing.Rule getAttributeRule(int index)
Returns the n-th attribute Rule.

Parameters:
index - an index starting from 0
Returns:
the attribute Rule at rank 'index'

getWordSieve

public Indexing.WordSieve getWordSieve()
Returns the word sieve used for full-text indexing.

Returns:
the current word sieve

setWordSieve

public void setWordSieve(Indexing.WordSieve wordSieve)
Defines the word sieve used for full-text indexing.

Note: there is a single word sieve per Indexing, because it is also used for parsing full-text queries.

Parameters:
wordSieve - an implementation of WordSieve. Must not be null.

getMaxStringLength

public int getMaxStringLength()
Returns the maximum length of text chunks that can be indexed in value indexes (Attribute and Simple element content).

The default value is 50.

Returns:
the current maximum length

setMaxStringLength

public void setMaxStringLength(int maxStringLength)
Defines the maximum length of text chunks that can be indexed in value indexes (Attribute and Simple element content). Text chunks longer than this value are regarded as not meaningful and are therefore not indexed there, but they can still be indexed in the full-text index.

Parameters:
maxStringLength - the maximum length desired
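As an illustration of the length limit described above, here is a minimal sketch in plain Java. It is not Qizx code: the class ValueIndexSketch and the method isValueIndexable are hypothetical names, and the treatment of a chunk whose length is exactly equal to the limit is an assumption (the STRING field description says "less than", while this method says "longer than").

```java
final class ValueIndexSketch {
    // Default documented for getMaxStringLength().
    static final int DEFAULT_MAX_STRING_LENGTH = 50;

    /**
     * Returns true if the text chunk is short enough to go into a value index.
     * Assumption: a chunk whose length equals the limit is still accepted.
     */
    static boolean isValueIndexable(String content, int maxStringLength) {
        return content != null && content.length() <= maxStringLength;
    }

    public static void main(String[] args) {
        System.out.println(isValueIndexable("ISBN 0-201-63361-2", DEFAULT_MAX_STRING_LENGTH));
    }
}
```

A chunk rejected by this check is only excluded from the value indexes; as stated above, it may still be indexed in the full-text index.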

getMaxWordLength

public int getMaxWordLength()
Returns the maximum length of words that can be indexed in the Full-text index. Words longer than this value are regarded as not meaningful and are therefore not indexed. The default value is 30.

Returns:
the current maximum length

setMaxWordLength

public void setMaxWordLength(int maxWordLength)
Defines the maximum length of words that can be indexed in the Full-text index. Words longer than this value are regarded as not meaningful and are therefore not indexed.

Parameters:
maxWordLength - word length in characters

getMinWordLength

public int getMinWordLength()
Returns the minimum length of words that can be indexed in the Full-text index. Words shorter than this value are regarded as not meaningful and are therefore not indexed. The default value is 2 (one-letter words are discarded).

Returns:
the current minimum length

setMinWordLength

public void setMinWordLength(int minWordLength)
Defines the minimum length of words that can be indexed in the Full-text index. Words shorter than this value are regarded as not meaningful and are therefore not indexed.

Parameters:
minWordLength - word length in characters
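The effect of the minimum and maximum word lengths can be sketched with a small filter in plain Java. This is only an illustration, not the Qizx implementation: the class and method names are hypothetical, and the tokenization (splitting on anything that is not a letter or a digit) is an assumption standing in for the real word sieve.

```java
import java.util.ArrayList;
import java.util.List;

final class WordLengthFilterSketch {
    // Defaults documented for getMinWordLength() and getMaxWordLength().
    static final int DEFAULT_MIN_WORD_LENGTH = 2;
    static final int DEFAULT_MAX_WORD_LENGTH = 30;

    /** Keeps only the words whose length lies within [minLength, maxLength]. */
    static List<String> indexableWords(String text, int minLength, int maxLength) {
        List<String> kept = new ArrayList<>();
        // Hypothetical tokenization: split on runs of non-letter, non-digit characters.
        for (String word : text.split("[^\\p{L}\\p{Nd}]+")) {
            int length = word.length(); // counted in UTF-16 code units
            if (!word.isEmpty() && length >= minLength && length <= maxLength) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(indexableWords("a quick fox",
                DEFAULT_MIN_WORD_LENGTH, DEFAULT_MAX_WORD_LENGTH));
    }
}
```

With the default limits, the one-letter word "a" is dropped, which matches the remark that one-letter words are discarded.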

isFulltextEnabled

public boolean isFulltextEnabled()
Returns true if full-text indexing is globally enabled for elements.

By default, full-text indexing is globally enabled. It can be disabled with setFulltextEnabled(), or by an attribute full-text='false' in an indexing specification.

Returns:
true if full-text indexing is globally enabled (the default)

setFulltextEnabled

public void setFulltextEnabled(boolean fulltext)
Enables or disables full-text indexing globally. Full-text indexing can be enabled on specific XML elements by a rule with attribute full-text='true'.

Parameters:
fulltext - true to enable full-text indexing globally
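The interplay between the global setting and the per-rule values INHERIT, ENABLE_FULL_TEXT and DISABLE_FULL_TEXT can be sketched as follows. This is a plain-Java illustration under stated assumptions, not Qizx code: the enum stands in for the byte constants (whose numeric values are not given here), and the class and method names are hypothetical.

```java
final class FullTextResolutionSketch {
    /** Stand-in for the byte constants of the same names. */
    enum FullTextOption { INHERIT, ENABLE_FULL_TEXT, DISABLE_FULL_TEXT }

    /**
     * Resolves the effective full-text setting for an element.
     *
     * @param globalEnabled the global setting (see isFulltextEnabled())
     * @param optionsFromRootToElement rule options of the enclosing elements,
     *        outermost first, ending with the element itself
     */
    static boolean isFullTextIndexed(boolean globalEnabled,
                                     FullTextOption... optionsFromRootToElement) {
        boolean effective = globalEnabled;
        for (FullTextOption option : optionsFromRootToElement) {
            if (option == FullTextOption.ENABLE_FULL_TEXT) {
                effective = true;
            } else if (option == FullTextOption.DISABLE_FULL_TEXT) {
                effective = false;
            }
            // INHERIT keeps the value coming from the parent or the global setting.
        }
        return effective;
    }

    public static void main(String[] args) {
        // Globally disabled, but an enclosing element has a rule that enables full-text.
        System.out.println(isFullTextIndexed(false,
                FullTextOption.ENABLE_FULL_TEXT, FullTextOption.INHERIT));
    }
}
```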

addDateSieve

public Indexing.DateSieve addDateSieve(String format,
                                       Locale locale,
                                       QName elementName,
                                       QName[] context)
A convenience method that adds a format-based DateSieve for a specific element.

Parameters:
format - a date format as supported by SimpleDateFormat and FormatDateSieve.
locale - optional locale used for creating the format (if null, the default locale is used).
elementName - required name of the element
context - optional element context
Returns:
the DateSieve created for the added rule, so that it can be further modified.
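To show what a format-based date conversion amounts to, here is a minimal sketch using only java.text.SimpleDateFormat. It is not the code of FormatDateSieve: the class DateSieveSketch and the method toMillis are hypothetical, and interpreting values in UTC with strict (non-lenient) parsing is an assumption. It follows the conventions documented on this page: a null locale falls back to the default locale, and a value that cannot be converted yields NaN.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class DateSieveSketch {
    /**
     * Parses text with a SimpleDateFormat pattern.
     * Returns milliseconds from 1970-01-01 00:00:00 UTC, or NaN if parsing fails.
     */
    static double toMillis(String text, String pattern, Locale locale) {
        if (text == null) {
            return Double.NaN;
        }
        Locale effective = (locale != null) ? locale : Locale.getDefault();
        SimpleDateFormat format = new SimpleDateFormat(pattern, effective);
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: values are in UTC
        format.setLenient(false);                        // assumption: reject dates like Feb 30

        String trimmed = text.trim();
        ParsePosition position = new ParsePosition(0);
        Date date = format.parse(trimmed, position);
        // Reject partial matches such as "2010-03-15xyz".
        if (date == null || position.getIndex() != trimmed.length()) {
            return Double.NaN;
        }
        return (double) date.getTime();
    }

    public static void main(String[] args) {
        System.out.println(toMillis("2010-03-15", "yyyy-MM-dd", Locale.US));
        System.out.println(Double.isNaN(toMillis("not a date", "yyyy-MM-dd", Locale.US)));
    }
}
```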

addAttrDateSieve

public FormatDateSieve addAttrDateSieve(String format,
                                        Locale locale,
                                        QName attributeName,
                                        QName[] context)
A convenience method that adds a format-based DateSieve for a specific attribute.

Parameters:
format - a date format as supported by SimpleDateFormat and FormatDateSieve.
locale - optional locale used for creating the format (if null, the default locale is used).
attributeName - required name of the attribute
context - optional element context
Returns:
the DateSieve created for the added rule.

addNumberSieve

public FormatNumberSieve addNumberSieve(String format,
                                        Locale locale,
                                        QName elementName,
                                        QName[] context)
A convenience method that adds a format-based NumberSieve for a specific element.

Parameters:
format - a number format as supported by DecimalFormat and FormatNumberSieve.
locale - optional locale used for creating the format (if null, the default locale is used).
elementName - required name of the element
context - optional element context
Returns:
the NumberSieve created for the added rule.

addAttrNumberSieve

public FormatNumberSieve addAttrNumberSieve(String format,
                                            Locale locale,
                                            QName attributeName,
                                            QName[] context)
A convenience method that adds a format-based NumberSieve for a specific attribute.

Parameters:
format - a number format as supported by DecimalFormat and FormatNumberSieve.
locale - optional locale used for creating the format (if null, the default locale is used).
attributeName - required name of the attribute
context - optional element context
Returns:
the NumberSieve created for the added rule.
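A format-based numeric conversion can be sketched in the same spirit with java.text.DecimalFormat. This is an illustration only, not the code of FormatNumberSieve: the class NumberSieveSketch and the method toDouble are hypothetical, and rejecting values with trailing characters is an assumption. A null locale falls back to the default locale, and a value that cannot be converted yields NaN.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParsePosition;
import java.util.Locale;

final class NumberSieveSketch {
    /**
     * Parses text with a DecimalFormat pattern and locale-specific symbols.
     * Returns the value as a double, or NaN if parsing fails.
     */
    static double toDouble(String text, String pattern, Locale locale) {
        if (text == null) {
            return Double.NaN;
        }
        Locale effective = (locale != null) ? locale : Locale.getDefault();
        DecimalFormat format =
                new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(effective));

        String trimmed = text.trim();
        ParsePosition position = new ParsePosition(0);
        Number value = format.parse(trimmed, position);
        // Assumption: reject partial matches such as "12abc".
        if (value == null || position.getIndex() != trimmed.length()) {
            return Double.NaN;
        }
        return value.doubleValue();
    }

    public static void main(String[] args) {
        // The same pattern reads different separators depending on the locale.
        System.out.println(toDouble("1,234.5", "#,##0.###", Locale.US));
        System.out.println(toDouble("1.234,5", "#,##0.###", Locale.GERMANY));
    }
}
```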

convertNumber

public double convertNumber(Node node)
Description copied from interface: com.qizx.xdm.DataConversion
Attempts to convert the string to a double value.

Specified by:
convertNumber in interface com.qizx.xdm.DataConversion
Returns:
the converted value, or NaN if the conversion is not possible. This method should not raise an exception.

convertDate

public double convertDate(Node node)
Description copied from interface: com.qizx.xdm.DataConversion
Attempts to convert the date or date-time contained in the text fragment to a double value (in milliseconds from 1970-01-01 00:00:00 UTC).

Specified by:
convertDate in interface com.qizx.xdm.DataConversion
Returns:
the converted value, or NaN if the conversion is not possible.

© 2010 Axyana Software