This edition of Qizx does not include a stand-alone server program. It is designed to be embedded in a Java™ application, typically a Servlet. You'll learn in this chapter everything needed to implement a basic application using Qizx. For an introduction to using Qizx, please see the chapter Getting started.
The target audience of this chapter is experienced Java programmers who have a good knowledge of XML and at least a basic knowledge of XQuery.
This chapter is organized into 7 lessons:
First lesson: how to create a database (Library) and populate it with data (Collections and Documents).
This lesson is by far the largest because it contains a refresher about the concepts (LibraryManager, Library, Collection, etc.) involved in programming Qizx, as well as sidebars about the XML catalog resolver, multi-threading and authorization, which can be skipped on a first reading.
Second lesson: how to make local copies of Documents stored in a database.
Third lesson: how to query a database.
Fourth lesson: how to delete a Document, a Collection or a whole Library.
Fifth lesson: how to modify a Document stored in a database.
Sixth lesson: how to customize the indexing of the XML content and how to re-index a database.
Seventh lesson: how to add metadata (properties) to a Document.
The directory docs/samples/book_data/ contains several kinds of XML documents. These short, simple XML documents (a few dozen) serve no other purpose than teaching how to program with the Qizx API. In real life, Qizx can be expected to store and query hundreds of thousands of XML documents of various sizes, ranging from a few hundred bytes to several hundred megabytes.
Each document found in the docs/samples/book_data/Books/ directory contains the description of a science-fiction book: its title, authors, editions, etc. Example docs/samples/book_data/Books/The_Robots_of_Dawn.xml:
<book xmlns="http://www.qizx.com/namespace/Tutorial">
  <title>The Robots of Dawn</title>
  <author>Isaac Asimov</author>
  <publicationDate>MCMLXXXIII</publicationDate>
  <editions>
    <edition>
      <ISBN>0553299492</ISBN>
      <publisher>Doubleday</publisher>
      <language>English</language>
      <year>1983</year>
    </edition>
  </editions>
</book>
Each document found in the docs/samples/book_data/Publishers/ directory contains the description of a publisher: its name, address, etc. Example docs/samples/book_data/Publishers/Doubleday.xml:
<publisher xmlns="http://www.qizx.com/namespace/Tutorial">
  <trademark>Doubleday</trademark>
  <company>Random House, Inc.</company>
  <address xml:space="preserve">1540 Broadway New York, NY 10036 US</address>
</publisher>
Each document found in the docs/samples/book_data/Authors/ directory contains the description of a science-fiction author: her/his name, pseudonyms, birth date, etc. Example docs/samples/book_data/Authors/iasimov.xml:
<author xmlns="http://www.qizx.com/namespace/Tutorial" nationality="US" gender="male">
  <fullName>Isaac Asimov</fullName>
  <pseudonyms>
    <pseudonym>Paul French</pseudonym>
    <pseudonym>George E. Dale</pseudonym>
  </pseudonyms>
  <birthDate>January 2, 1920</birthDate>
  <birthPlace>
    <city>Petrovichi</city><country>Russian SFSR</country>
  </birthPlace>
  <blurb location="../Author%20Blurbs/Isaac_Asimov.xhtml"/>
</author>
Each document found in the docs/samples/book_data/Author Blurbs/ directory is an XHTML page which is a copy of a Wikipedia article describing a science-fiction author. Example docs/samples/book_data/Author Blurbs/Isaac_Asimov.xhtml:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" dir="ltr" lang="en">
<head>
...
<title>Isaac Asimov - Wikipedia, the free encyclopedia</title>
...
</body>
</html>
The XHTML DTD and the corresponding XML catalog are found in docs/samples/xhtml_dtd/.
All the code samples used to illustrate this chapter are found in the docs/samples/programming/ directory. Files containing XQuery scripts are found in the docs/samples/book_queries/ directory.
You'll need a recent version of Ant, a Java-based build tool[3], to compile and run the code samples.
The Put class implements a command-line tool that creates a Library and populates it with Collections and Documents. More precisely, it copies one or more source files or directories to a single destination Collection or Document. If multiple sources are specified, the destination must be an existing Collection. Moreover, the Put class can filter what is being copied by means of a simple java.io.FileFilter.
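The excerpts do not show which filter Put actually uses. As an illustration only, here is a hypothetical java.io.FileFilter (the class name XmlFileFilter is an assumption) that keeps sub-directories, so that a recursive copy can still descend into them, and files whose names end with ".xml" or ".xhtml":

```java
import java.io.File;
import java.io.FileFilter;
import java.util.Locale;

// Hypothetical filter: the one Put.java really uses is not shown in the excerpts.
final class XmlFileFilter implements FileFilter {
    @Override
    public boolean accept(File file) {
        if (file.isDirectory()) {
            return true; // keep sub-directories so that put() can recurse into them
        }
        // Locale.ROOT avoids locale-dependent case mapping of file names.
        String name = file.getName().toLowerCase(Locale.ROOT);
        return name.endsWith(".xml") || name.endsWith(".xhtml");
    }
}
```

Because put() passes the filter to File.listFiles, it is applied to the entries of each source directory; a filter that rejected directories would silently stop the recursion.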
The outline of this program is as follows (excerpts of Put.java):
LibraryManager libManager = getLibraryManager(storageDir);
Library lib = getLibrary(libManager, libName);

LibraryMember dst = lib.getMember(dstPath);
boolean dstIsCollection = (dst != null && dst.isCollection());
if (args.length > l+4 && !dstIsCollection) {
    shutdown(lib, libManager);
    usage("'" + dstPath + "' does not exist or is a document");
}

try {
    for (int i = l+2; i < last; ++i) {
        File srcFile = new File(args[i]);
        String dstPath2 = dstPath;
        if (dstIsCollection) {
            dstPath2 = joinPath(dstPath, srcFile.getName());
        }
        put(lib, srcFile, filter, dstPath2);
    }

    verbose("Committing changes...");
    lib.commit();
} finally {
    shutdown(lib, libManager);
}
Get a LibraryManager.
Get a Library.
For each source, create the corresponding Collection (source directory) or Document (source file).
Commit changes made to the Library.
Close the Library and the LibraryManager.
Objects involved:
LibraryManager
A LibraryManager is similar to a database manager. It is used to open or create Libraries.
Library
A Library is similar to a database. If we use the filesystem analogy, a Library is similar to a disk drive.
A Library has a name[4]. A Library always contains a root Collection, named "/", which cannot be deleted.
Collection
If we use the filesystem analogy, a Collection is similar to a directory. It can contain Documents and/or Collections.
Note that nothing forces you to create a hierarchy of Collections. If you prefer, you can import all your Documents into the root Collection.
Document
If we use the filesystem analogy, a Document is similar to a file. Unlike plain files, the content of a Document is always well-formed XML.
LibraryMember
A common term (super-interface) for both Collection and Document.
Like its filesystem counterpart, a LibraryMember has a path. Path components are separated by a slash character "/". The last component is the name of the LibraryMember. The other path components are the names of the ancestor Collections of the LibraryMember, up to the root Collection "/".
Example: "/foo/bar/gee". The name of this LibraryMember is "gee". Its ancestor Collections are, from direct parent to the root: "bar", "foo", "/".
There is no concept of a current working Collection; therefore, relative paths are not useful.
Note that the name of a LibraryMember may contain any character supported by Java™ (including whitespace), except the slash character "/".
Unlike its filesystem counterpart, a LibraryMember may have any number of user-defined properties (metadata) in addition to its content (that is, XML content for a Document, members for a Collection). More on properties in lesson 7.
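To make the path rules concrete, here is a small sketch of hypothetical helper methods. MemberPaths is not part of the Qizx API, and the joinPath method used by Put is not shown, so join is only an assumption about what such a helper does. Paths are assumed to be absolute and normalized, such as "/foo/bar/gee".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers, not part of the Qizx API. Paths are assumed to be
// absolute and normalized, e.g. "/foo/bar/gee" (no trailing slash).
final class MemberPaths {
    private MemberPaths() {}

    // Name of a member: its last path component; the root collection is "/".
    static String name(String path) {
        if (path.equals("/")) {
            return "/";
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Names of the ancestor collections, from the direct parent up to the root.
    static List<String> ancestors(String path) {
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("not an absolute path: " + path);
        }
        List<String> result = new ArrayList<>();
        String current = path;
        while (!current.equals("/")) {
            int slash = current.lastIndexOf('/');
            current = (slash == 0) ? "/" : current.substring(0, slash);
            result.add(name(current));
        }
        return result;
    }

    // Path of a member named memberName inside the collection collectionPath.
    static String join(String collectionPath, String memberName) {
        if (memberName.indexOf('/') >= 0) {
            throw new IllegalArgumentException("a member name cannot contain '/'");
        }
        return collectionPath.endsWith("/")
                ? collectionPath + memberName
                : collectionPath + "/" + memberName;
    }
}
```

For example, MemberPaths.ancestors("/foo/bar/gee") returns [bar, foo, /], matching the example above, and MemberPaths.join("/", "Books") returns "/Books".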
private static LibraryManager getLibraryManager(File storageDir)
        throws IOException, QizxException {
    if (storageDir.exists()) {
        return Configuration.openLibraryGroup(storageDir);
    } else {
        if (!storageDir.mkdirs()) {
            throw new IOException("cannot create directory '" + storageDir + "'");
        }
        verbose("Creating library group in '" + storageDir + "'...");
        return Configuration.createLibraryGroup(storageDir);
    }
}
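If the storage directory already exists, the library group it contains is opened using Configuration.openLibraryGroup, which returns a LibraryManager.
Otherwise, the storage directory is created, then a new library group is created in it using Configuration.createLibraryGroup. Class Configuration supports many options that can be set before creating or opening a library group (that is, a LibraryManager). Note: LibraryManagerFactory is now deprecated.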
private static Library getLibrary(LibraryManager libManager, String libName)
        throws QizxException {
    Library lib = libManager.openLibrary(libName);
    if (lib == null) {
        verbose("Creating library '" + libName + "'...");
        libManager.createLibrary(libName);
        lib = libManager.openLibrary(libName);
    }
    return lib;
}
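LibraryManager.openLibrary returns null if there is no Library with the specified name. In that case, the Library is created using LibraryManager.createLibrary, then opened.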
private static void put(Library lib, File srcFile, FileFilter filter, String dstPath)
        throws IOException, QizxException {
    if (srcFile.isDirectory()) {
        Collection collection = lib.getCollection(dstPath);
        if (collection == null) {
            verbose("Creating collection '" + dstPath + "'...");
            collection = lib.createCollection(dstPath);
        }

        File[] files = srcFile.listFiles(filter);
        if (files == null) {
            throw new IOException("cannot list directory '" + srcFile + "'");
        }
        for (int i = 0; i < files.length; ++i) {
            File file = files[i];
            put(lib, file, filter, joinPath(dstPath, file.getName()));
        }
    } else {
        verbose("Importing '" + srcFile + "' as document '" + dstPath + "'...");
        lib.importDocument(dstPath, srcFile);
    }
}
If the destination Collection does not exist (Library.getCollection returns null), it is created using Library.createCollection.
A source file is imported as a Document using Library.importDocument.
Now what if your XML source is not a file? Maybe your XML source is a W3C DOM Document or a JDOM Document. Or maybe you want to dynamically create a Document. Example: dynamically create a Document:
XMLPushStream out = lib.beginImportDocument(docPath);
out.putDocumentStart();
QName helloName = lib.getQName("hello", "http://www.acme.com/ns/test");
out.putElementStart(helloName);
out.putText("Hello world!");
out.putElementEnd(helloName);
out.putDocumentEnd();
Document doc = lib.endImportDocument();
A Library is both a database (or a disk drive, if we use the filesystem analogy) and a transactional session used to modify and/or query this database. As such, a sequence of changes made to a Library must end with commit or rollback.
...
    verbose("Committing changes...");
    lib.commit();
} finally {
    shutdown(lib, libManager);
}
...

private static void shutdown(Library lib, LibraryManager libManager)
        throws QizxException {
    if (lib.isModified()) {
        lib.rollback();
    }
    lib.close();
    libManager.closeAllLibraries(10000 /*ms*/);
}
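Library.commit makes the changes permanent.
The shutdown method is invoked in a finally block, so it runs whether or not an exception was thrown.
Note that the shutdown method first checks, using Library.isModified, whether the Library still has uncommitted changes (for example, because an exception occurred before commit was invoked); if so, these changes are discarded using Library.rollback. The Library is then closed.
Finally, LibraryManager.closeAllLibraries closes all the Libraries managed by the LibraryManager; its argument (here 10000) is expressed in milliseconds.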
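The commit-or-rollback discipline can be sketched independently of Qizx. TxSession and RecordingSession below are hypothetical stand-ins for a Library, introduced only so that the example runs on its own; inTransaction mirrors the structure of Put: commit on success, then, in a finally block, roll back whatever is still uncommitted before closing.

```java
// Hypothetical, minimal stand-in for a transactional session such as a Library.
// Only the commit/rollback discipline is illustrated; this is not the Qizx API.
interface TxSession extends AutoCloseable {
    boolean isModified();
    void commit();
    void rollback();
    @Override void close();
}

final class Transactions {
    private Transactions() {}

    interface Work { void run(TxSession s) throws Exception; }

    // Commits if the work completes; like Put.shutdown(), rolls back anything
    // still uncommitted (e.g. after an exception) and always closes the session.
    static void inTransaction(TxSession s, Work work) throws Exception {
        try {
            work.run(s);
            s.commit();
        } finally {
            if (s.isModified()) {
                s.rollback();
            }
            s.close();
        }
    }
}

// Toy in-memory session that records what happened to it.
final class RecordingSession implements TxSession {
    boolean modified, committed, rolledBack, closed;

    void change() { modified = true; }
    public boolean isModified() { return modified; }
    public void commit() { committed = true; modified = false; }
    public void rollback() { rolledBack = true; modified = false; }
    public void close() { closed = true; }
}
```

When the work throws, commit is skipped, the pending change is rolled back, the session is closed, and the exception still propagates to the caller.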
Compile class Put by executing ant (see build.xml) in the docs/samples/programming/put/ directory.
Create the "Tutorial" library and populate it with all the documents found in docs/samples/book_data/ by running ant run in the docs/samples/programming/put/ directory.
The Get class implements a command-line tool that makes local copies of Collections and Documents stored in a Library. This tool can match the names of the Collections and Documents to be copied against a wildcard. For example, it can be used to make local copies of all Documents whose names end with ".xhtml" found in the "/Author Blurbs" Collection (the corresponding command-line argument is "/Author Blurbs/*.xhtml").
For queries to work properly, document imports and updates should first be completed with a commit. Some operations work even before the commit (such as getting the contents of a newly imported document), but many operations rely on indexing, which is completed at commit time.
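The excerpts do not show how Get matches names against a wildcard. Here is a minimal sketch of such matching. It assumes that "*" matches any sequence of characters, that "?" matches a single character, and that the pattern applies to the last path component only: for "/Author Blurbs/*.xhtml", "*.xhtml" would be matched against the names of the members of "/Author Blurbs". The Wildcard class is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical wildcard matcher; Get.java's own matching code is not shown.
// '*' matches any run of characters (possibly empty), '?' exactly one;
// every other character, including '.' and spaces, is matched literally.
final class Wildcard {
    private Wildcard() {}

    static Pattern compile(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString())); // escape regex metacharacters
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // The whole name must match, not just a prefix or substring.
    static boolean matches(String wildcard, String name) {
        return compile(wildcard).matcher(name).matches();
    }
}
```

Quoting the literal parts keeps "." from acting as a regular-expression wildcard, so "a.xml" does not match "abxml".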
Excerpts of Get.java:
...
LibraryMember libMember = lib.getMember(path);
if (libMember == null) {
    error("cannot find '" + path + "'");
    return;
}
get(libMember, dstFile);
...

private static void get(LibraryMember libMember, File dstFile)
        throws IOException, QizxException {
    File dstFile2;
    if (dstFile.isDirectory()) {
        String baseName = libMember.getName();
        if ("/".equals(baseName))
            baseName = "root";
        dstFile2 = new File(dstFile, baseName);
    } else {
        dstFile2 = dstFile;
    }

    if (libMember.isCollection()) {
        getCollection((Collection) libMember, dstFile2);
    } else {
        getDocument((Document) libMember, dstFile2);
    }
}
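Library.getMember returns null if there is no LibraryMember with the specified path.
If the destination is an existing directory, the local copy is created inside it and named after the LibraryMember; since the root Collection is named "/", its copy is named "root". Depending on whether the LibraryMember is a Collection or a Document, getCollection or getDocument is then invoked.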
A local copy of a Document is created as follows:
private static void getDocument(Document doc, File dstFile)
        throws IOException, QizxException {
    verbose("Copying document '" + doc.getPath() + "' to file '" + dstFile + "'...");
    FileOutputStream out = new FileOutputStream(dstFile);
    try {
        doc.export(new XMLSerializer(out, "UTF-8"));
    } finally {
        out.close();
    }
}
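Document.export writes the XML content of the Document to an XMLPushStream. Qizx comes with a number of useful implementations of the XMLPushStream interface; the XMLSerializer used above converts XML content into a character stream, here encoded in UTF-8 and written to a file.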
A local copy of a Collection is created as follows:
private static void getCollection(Collection col, File dstFile)
        throws IOException, QizxException {
    verbose("Copying collection '" + col.getPath() + "' to directory '" + dstFile + "'...");
    if (!dstFile.isDirectory()) {
        verbose("Creating directory '" + dstFile + "'...");
        if (!dstFile.mkdirs()) {
            throw new IOException("Cannot create directory '" + dstFile + "'");
        }
    }

    LibraryMemberIterator iter = col.getChildren();
    while (iter.moveToNextMember()) {
        LibraryMember libMember = iter.getCurrentMember();
        File dstFile2 = new File(dstFile, libMember.getName());
        if (libMember.isCollection()) {
            getCollection((Collection) libMember, dstFile2);
        } else {
            getDocument((Document) libMember, dstFile2);
        }
    }
}
The Qizx API contains a number of iterators which work differently from java.util.Iterator (with its hasNext and next methods).
In the Qizx API, an iterator always has a moveToNextXXX method (for example, moveToNextMember or moveToNextItem), which moves the cursor forward by one item and returns false when there are no more items, and a getCurrentXXX method (for example, getCurrentMember or getCurrentItem), which returns the item found at the current cursor position.
Invoking getCurrentXXX several times without invoking moveToNextXXX is possible and always returns the same item. However, the cursor is initially positioned before the first item (if any); therefore, you need to invoke moveToNextXXX at least once before invoking getCurrentXXX.
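The cursor protocol can be illustrated with a hypothetical Cursor class. It is not a Qizx type; it simply follows the same moveToNextXXX/getCurrentXXX convention over an ordinary java.util.List:

```java
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical cursor following the moveToNext/getCurrent convention; not a Qizx class.
final class Cursor<T> {
    private final List<T> items;
    private int position = -1; // initially one position before the first item

    Cursor(List<T> items) {
        this.items = items;
    }

    // Moves the cursor by one item; returns false once the items are exhausted.
    boolean moveToNext() {
        if (position < items.size()) {
            ++position;
        }
        return position < items.size();
    }

    // Returns the item at the current position; repeated calls return the same item.
    T getCurrent() {
        if (position < 0 || position >= items.size()) {
            throw new NoSuchElementException("moveToNext() not called, or no more items");
        }
        return items.get(position);
    }
}
```

A loop such as while (cursor.moveToNext()) { use(cursor.getCurrent()); } has the same shape as the LibraryMemberIterator loop in getCollection.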
Compile class Get by executing ant (see build.xml) in the docs/samples/programming/get/ directory.
Run ant run in the docs/samples/programming/get/ directory to make local copies of the following library members in docs/samples/programming/get/tests/out/:
Document "/Authors/pjfarmer.xml",
Documents "/Author Blurbs/Philip*",
Documents "/Books/The*.xml",
Collection "/Publishers".
Querying a database (that is, a Library) is fairly easy:
Expression expr = lib.compileExpression(script);
ItemSequence results = expr.evaluate();
while (results.moveToNextItem()) {
    Item result = results.getCurrentItem();
    /* Do something with result. */
}
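First, compile an XQuery expression using Library.compileExpression.
Then, evaluate the expression using Expression.evaluate. The result is an ItemSequence, which is iterated over using moveToNextItem and getCurrentItem.
Example:
(: Compute and return 2 + 3 :)
2 + 3
evaluates to an ItemSequence containing a single item, the integer 5.
Example:
(: List all books by their titles. :)
declare namespace t = "http://www.qizx.com/namespace/Tutorial";
collection("/Books")//t:book/t:title
evaluates to an ItemSequence containing t:title element nodes.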
The Query class, which implements a command-line tool that queries a Library, is more complicated than the above code sample because it supports somewhat advanced options.
Excerpts of Query.java:
private static Expression compileExpression(Library lib, String script,
        LibraryMember queryRoot, QName[] varNames, String[] varValues)
        throws IOException, QizxException {
    Expression expr;
    try {
        expr = lib.compileExpression(script);
    } catch (CompilationException e) {
        Message[] messages = e.getMessages();
        for (int i = 0; i < messages.length; ++i) {
            error(messages[i].toString());
        }
        throw e;
    }

    if (queryRoot != null)
        expr.bindImplicitCollection(queryRoot);

    if (varNames != null) {
        for (int i = 0; i < varNames.length; ++i) {
            expr.bindVariable(varNames[i], varValues[i], /*type*/ null);
        }
    }

    return expr;
}
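If a query root is specified, Expression.bindImplicitCollection binds it as the implicit collection of the expression. For example, with a query root of "/Books", you can write:
(: List all books by their titles. :)
declare namespace t = "http://www.qizx.com/namespace/Tutorial";
//t:book/t:title
instead of:
(: List all books by their titles. :)
declare namespace t = "http://www.qizx.com/namespace/Tutorial";
collection("/Books")//t:book/t:title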
An XQuery expression can be further parametrized by the use of variables, whose values are bound using Expression.bindVariable. Example:
(: List all books containing the value of variable $searched in their titles. :)
declare namespace t = "http://www.qizx.com/namespace/Tutorial";
declare variable $searched external;
collection("/Books")//t:book/t:title[contains(., $searched)]
Some queries may return thousands of results. Therefore, displaying just a range of results (e.g. from result #100 to result #199 inclusive) is a very common need.
private static void evaluateExpression(Expression expr, int from, int limit)
        throws QizxException {
    ItemSequence results = expr.evaluate();
    if (from > 0) {
        results.skip(from);
    }

    XMLSerializer serializer = new XMLSerializer();
    serializer.setIndent(2);

    int count = 0;
    while (results.moveToNextItem()) {
        Item result = results.getCurrentItem();

        System.out.print("[" + (from+1+count) + "] ");
        showResult(serializer, result);
        System.out.println();

        ++count;
        if (count >= limit)
            break;
    }
    System.out.flush();
}
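ItemSequence.skip skips the first from items of the sequence, so that the first item displayed is result number from+1.
This being done, you still need to limit the number of items displayed: the loop stops as soon as limit items have been printed.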
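The skip-then-limit logic of evaluateExpression can be tried without a Library. The sketch below applies it to a plain java.util.Iterator instead of an ItemSequence, an assumption made so that it runs on its own (Paging is a hypothetical name). Because from is a 0-based index while displayed result numbers start at 1, showing results #100 to #199 inclusive means from = 99 and limit = 100.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch of evaluateExpression()'s skip/limit logic over a plain Iterator.
// 'from' is the 0-based index of the first result to keep.
final class Paging {
    private Paging() {}

    static <T> List<T> page(Iterator<T> results, int from, int limit) {
        // Counterpart of ItemSequence.skip(from).
        for (int i = 0; i < from && results.hasNext(); ++i) {
            results.next();
        }
        // Take at most 'limit' items; fewer if the results run out first.
        List<T> page = new ArrayList<>();
        while (page.size() < limit && results.hasNext()) {
            page.add(results.next());
        }
        return page;
    }
}
```

Unlike the loop above, which tests count after printing an item, this version checks the limit before taking an item, so a limit of 0 yields an empty page.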
In this lesson, we'll just show how to print the string representation of an Item. In lesson 5, we'll go further and explore the data model of Qizx.
private static void showResult(XMLSerializer serializer, Item result)
        throws QizxException {
    if (!result.isNode()) {
        System.out.println(result.getString());
        return;
    }
    Node node = result.getNode();

    serializer.reset();
    String xmlForm = serializer.serializeToString(node);
    System.out.println(xmlForm);
}
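If the Item is not a node (for example, a number or a string), its string value, returned by Item.getString, is printed.
Otherwise, Item.getNode returns the Node, which is converted to its XML form using XMLSerializer.serializeToString, after the serializer has been reset using XMLSerializer.reset.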
Compile class Query by executing ant (see build.xml) in the docs/samples/programming/query/ directory.
Run ant run in the docs/samples/programming/query/ directory to perform this query:
(: Find all books written by French authors. :)
declare namespace t = "http://www.qizx.com/namespace/Tutorial";
for $a in collection("/Authors")//t:author[@nationality = "France"]
for $b in collection("/Books")//t:book[.//t:author = $a/t:fullName]
return
$b/t:title
Note that the docs/samples/book_queries/ directory contains all the queries needed to illustrate this lesson and the following ones. You can execute all these queries by running ant run_all in docs/samples/programming/query/.
Class Delete implements a command-line tool that deletes one or more Documents or Collections. If no Document or Collection paths are specified as command-line arguments, the tool deletes the whole Library.
Excerpts of Delete.java:
if (args.length == 2) {
    verbose("Deleting library '" + libName + "'...");
    if (!libManager.deleteLibrary(libName)) {
        warning("Library '" + libName + "' not found");
    }
    libManager.closeAllLibraries(10000 /*ms*/);
} else {
    Library lib = libManager.openLibrary(libName);
    try {
        for (int i = 2; i < args.length; ++i) {
            String path = args[i];
            verbose("Deleting member '" + path + "' of library '" + libName + "'...");
            if (!lib.deleteMember(path)) {
                warning("Member '" + path + "' of library '" + libName + "' not found");
            }
        }
        verbose("Committing changes...");
        lib.commit();
    } finally {
        shutdown(lib, libManager);
    }
}
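LibraryManager.deleteLibrary deletes a whole Library; it returns false if no Library has the specified name.
Library.deleteMember deletes a Document or a Collection; it returns false if no LibraryMember has the specified path. Like any other change made to a Library, deletions must be committed.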
LibraryManager
Because there is no LibraryManager.delete method, the only way to physically destroy a LibraryManager is first to "close" it using LibraryManager.closeAllLibraries, and then to delete its storage directory (obtained using LibraryManager.getStorageDirectory).
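The second step, deleting the storage directory, is plain file-system work. Here is a minimal sketch using java.nio.file. It assumes it is called only after closeAllLibraries has returned, and that the caller has converted the storage directory to a java.nio.file.Path (for a java.io.File, with toPath()). DeleteTree is a hypothetical name; no Qizx call is involved.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Deletes a directory tree, children before parents. Plain java.nio, no Qizx calls.
final class DeleteTree {
    private DeleteTree() {}

    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return; // nothing to delete
        }
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse lexicographic order puts "dir/file" before "dir",
            // so every directory is already empty when it is deleted.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (UncheckedIOException e) {
            throw e.getCause(); // rethrow the original IOException
        }
    }
}
```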
Since Qizx 2.1, there are two methods for updating a document:
Use XQuery Update, an extension to XQuery that allows insertions, deletions and updates on selected nodes. This is in general by far the easiest method.
A separate tutorial provides a quick yet comprehensive introduction to XQuery Update.
Extract the document to update as a W3C DOM Document, update the DOM, then write the DOM back to the document. This was the only method available in Qizx 2.0. It can still be useful in specific cases.
Whatever method is used, please remember that any update operation on a document basically implies replacing the document in its entirety. This corresponds to a deliberate design choice that allows faster queries.
The next sections explain both methods. The example consists of adding a pseudonym to an existing author, specified by his or her full name.
XQuery Update is an extension of XQuery which provides additional instructions for updating documents. The updating primitives are insert, delete, replace and rename.
Using XQuery Update simply consists of executing a script containing XQuery Update primitives. Such a script is called an updating query.
An "updating query" is executed in a special way by the XQuery engine:
first a "pending update list" is created by executing the query (which returns no value)
then the update list is applied at once.
This means that changes are not visible during the execution of the script, but only after its completion. This can be surprising, as noted in the example hereafter. The XQuery Update tutorial addresses such issues in more detail.
Here is the XQuery Update script used:
declare default element namespace 'http://www.qizx.com/namespace/Tutorial';
declare variable $ERR := QName('http://www.w3.org/2005/xqt-errors', 'ERR00001');
declare variable $authorName external;
declare variable $pseudo external;

let $auth := /author[fullName = $authorName]
return
  if (empty($auth))
  then error($ERR, 'no such author')
  else if ($auth/pseudonyms[pseudonym = $pseudo])
  then error($ERR, 'pseudonym already defined')
  else if ($auth/pseudonyms)
  then insert node <pseudonym>{ $pseudo }</pseudonym>
       into $auth/pseudonyms
  else insert node <pseudonyms><pseudonym>{ $pseudo }</pseudonym></pseudonyms>
       into $auth
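Preliminary tests: check that the author exists and that the pseudonym is not yet defined.
If the enclosing element pseudonyms already exists, insert a new pseudonym element into it.
If the element pseudonyms does not exist, insert a new pseudonyms element that already contains the new pseudonym. Please notice that due to the way XQuery Update works, it is not possible to create the element pseudonyms and then insert the pseudonym into it in two separate steps of the same query: the new element would not be visible before the pending update list is applied.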
The corresponding Java program is XUpdate.java.
The strategy we'll use is the following:
Find the Document to be modified by performing a query.
Convert the document found to a W3C DOM Document. This step is needed because the DOM[11] of Qizx is immutable. For example, you'll find a Node.getAttribute method, but no Node.setAttribute method.
Modify the W3C DOM Document.
Replace the content of the Document stored in the Library with the content of the W3C DOM Document.
Unlike the Put, Get and Delete classes, which implement generic command-line tools, the Edit class is specific to the dataset used to illustrate this tutorial. The Edit class adds a pseudonym to an author. The author is found by her/his full name, not by the path of the Document containing her/his record.
Excerpts of Edit.java:
Node author = findAuthor(lib, collectionPath, authorName);
if (author == null)
    return;

if (hasPseudonym(author, pseudonym)) {
    warning("'" + authorName + "' already has pseudonym '" + pseudonym + "'");
    return;
}

org.w3c.dom.Document doc =
    (org.w3c.dom.Document) author.getDocumentNode().getObject();

if (!doAddPseudo(doc, pseudonym))
    return;

XMLPushStream out =
    lib.beginImportDocument(author.getLibraryDocument().getPath());
DOMToPushStream helper = new DOMToPushStream(lib, out);
helper.putDocument(doc);
lib.endImportDocument();
The findAuthor method is implemented as follows:
private static Node findAuthor(Library lib, String collectionPath, String authorName)
        throws QizxException {
    Collection collection = lib.getCollection(collectionPath);
    if (collection == null) {
        error("'" + collectionPath + "' is not a collection");
        return null;
    }

    String script =
        "declare namespace t = '" + TUTORIAL_NS_URI + "';\n" +
        "declare variable $name external;\n" +
        "/t:author[t:fullName = $name]";
    Expression expr = lib.compileExpression(script);
    expr.bindImplicitCollection(collection);
    expr.bindVariable(lib.getQName("name"), authorName, /*type*/ null);

    ItemSequence items = expr.evaluate();
    if (!items.moveToNextItem()) {
        error("Cannot find author '" + authorName + "'");
        return null;
    }
    Item item = items.getCurrentItem();
    return item.getNode();
}
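The findAuthor method evaluates a query on the Collection specified by collectionPath, bound as the implicit collection, passing the full name of the author as the external variable $name. It returns the first t:author element found, or null.
Method hasPseudonym, described below, checks whether the author already has the specified pseudonym.
Method getDocumentNode returns the document node containing the t:author element; method getObject then returns this document as a W3C DOM Document, which, unlike the Qizx DOM, can be modified.
The doAddPseudo method (whose code is not shown here) adds the pseudonym to the W3C DOM Document.
We now need to access the Document stored in the Library: Node.getLibraryDocument returns it. Its path is passed to Library.beginImportDocument, which returns an XMLPushStream; a DOMToPushStream helper copies the modified W3C DOM Document to this stream, and Library.endImportDocument completes the replacement of the stored Document.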
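The code of doAddPseudo is not shown. The following hypothetical PseudonymEditor sketch performs the same kind of modification with the standard W3C DOM API only. It appends a t:pseudonym element to the existing t:pseudonyms element, or creates t:pseudonyms when it is missing, mirroring the two insert branches of the XQuery Update script. It does not check for duplicates, since hasPseudonym already does that. The parse helper exists only to make the example testable.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical counterpart of Edit.doAddPseudo(), using only the standard
// (mutable) W3C DOM API.
final class PseudonymEditor {
    static final String NS = "http://www.qizx.com/namespace/Tutorial";

    private PseudonymEditor() {}

    // Returns false if the document element is not a t:author element.
    static boolean addPseudonym(Document doc, String pseudonym) {
        Element author = doc.getDocumentElement();
        if (author == null || !NS.equals(author.getNamespaceURI())
                || !"author".equals(author.getLocalName())) {
            return false;
        }
        Element pseudo = doc.createElementNS(NS, "pseudonym");
        pseudo.setTextContent(pseudonym);

        NodeList existing = author.getElementsByTagNameNS(NS, "pseudonyms");
        if (existing.getLength() > 0) {
            existing.item(0).appendChild(pseudo);    // add to t:pseudonyms
        } else {
            Element pseudonyms = doc.createElementNS(NS, "pseudonyms");
            pseudonyms.appendChild(pseudo);
            author.appendChild(pseudonyms);          // create t:pseudonyms
        }
        return true;
    }

    // Namespace-aware parsing is required for getLocalName()/getNamespaceURI().
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }
}
```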
The hasPseudonym method is a simple example of using the Qizx DOM. It searches for its pseudonym argument inside a t:author/t:pseudonyms/t:pseudonym element (author having multiple pseudonyms) or inside a t:author/t:pseudonym element (author having a single pseudonym):
private static boolean hasPseudonym(Node element, String pseudonym)
        throws QizxException {
    Node child = element.getFirstChild();
    while (child != null) {
        if (child.isElement()) {
            String childName = child.getNodeName().getLocalPart();
            if ("pseudonyms".equals(childName)) {
                return hasPseudonym(child, pseudonym);
            } else if ("pseudonym".equals(childName)) {
                if (pseudonym.equals(child.getStringValue())) {
                    return true;
                }
            }
        }
        child = child.getNextSibling();
    }
    return false;
}
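Node.getFirstChild and Node.getNextSibling are used to iterate over the children of an element; they return null when there are no more children. Node.isElement tests whether a child is an element.
Node.getNodeName returns the QName of the element, whose local part is obtained using QName.getLocalPart. Node.getStringValue returns the string value of the element, that is, the concatenation of its text content.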
Compile class Edit by executing ant (see build.xml) in the docs/samples/programming/edit/ directory.
Run ant run in the docs/samples/programming/edit/ directory to add pseudonym "Kilgore Trout" to author "Philip José Farmer" [10].
Note that if you have already run the example using XQuery Update, you will get a warning and no change will be made, since the Edit class does not accept duplicate pseudonyms.
Query 20.xq:
(: Find all authors born after 1945 (e.g. Lois McMaster Bujold). :)
declare namespace t = "http://www.qizx.com/namespace/Tutorial";
collection("/")//t:author[t:birthDate > xs:date("1945-01-01Z")]/t:fullName
gives no result because the t:birthDate element is not indexed as an xs:date[13]. The cause of this problem is that the element contains a date in a local format (example: November 2, 1949) rather than a standard format (example: 1949-11-02).
This is a case where we need to specify custom indexing: on the t:birthDate element, a specific string-to-date converter based on the predefined class com.qizx.api.util.text.FormatDateSieve has to be used.
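To see what this conversion has to achieve, here is a sketch that turns the local format used in the t:birthDate elements into the ISO form of an xs:date. It relies on java.time.format.DateTimeFormatter with the same pattern, "MMMM d, yyyy", and a US English locale. This only illustrates the conversion; it is not the implementation of FormatDateSieve, whose internals are not shown, and BirthDates is a hypothetical name. A LocalDate carries no time zone, so the timezone parameter of the indexing rule has no counterpart here.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Illustration of the "MMMM d, yyyy" / en-US conversion requested for t:birthDate.
final class BirthDates {
    private static final DateTimeFormatter LOCAL_FORMAT =
            DateTimeFormatter.ofPattern("MMMM d, yyyy", Locale.US);

    private BirthDates() {}

    // "November 2, 1949" -> "1949-11-02" (ISO 8601 form, as in an xs:date literal).
    // Throws java.time.format.DateTimeParseException on text in another format.
    static String toIsoDate(String text) {
        return LocalDate.parse(text.trim(), LOCAL_FORMAT).toString();
    }
}
```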
In Qizx, custom indexing is defined through an "Indexing Specification", which is in XML format. The syntax and semantics of indexing specifications are described in great detail in Chapter 9, Configuring the indexing process.
The indexing specification we will use is in the file indexing.xml:
<indexing xmlns:t="http://www.qizx.com/namespace/Tutorial">
  <!-- Default rules -->
  <element as="numeric+string" />
  <element as="date+string" />
  <element as="string" />
  <attribute as="numeric+string" />
  <attribute as="date+string" />
  <attribute as="string" />

  <!-- Custom rules -->
  <element name="t:birthDate" context="t:author" as="date"
           sieve="com.qizx.api.util.text.FormatDateSieve"
           format="MMMM d, yyyy" locale="en-US" timezone="GMT" />

  <element name="t:publicationDate" context="t:book" as="numeric"
           sieve="RomanNumberSieve" />
</indexing>
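Including the default rules before your custom rules is mandatory. If you don't do that, the Library is re-indexed with just the custom rules, which means that many queries will not work.
This custom rule specifies that a t:birthDate element found in the context of a t:author element is indexed as a date, obtained by converting its text with FormatDateSieve using the format "MMMM d, yyyy", the en-US locale and the GMT time zone.
More about this other custom rule in Section 7.2, “Writing a custom Indexing.NumberSieve”.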
The ReIndex class implements a command-line tool that changes the indexing specification of a Library and then re-indexes this Library.
Library lib = libManager.openLibrary(libName);
try {
    verbose("Loading indexing specifications from '" + indexingFile + "'...");
    Indexing indexing = loadIndexing(indexingFile);
    lib.setIndexing(indexing);

    verbose("Re-indexing library '" + libName + "'...");
    lib.reIndex();
} finally {
    shutdown(lib, libManager);
}
The loadIndexing method parses the indexing specification file:
private static Indexing loadIndexing(File file)
        throws IOException, SAXException, QizxException {
    Indexing indexing = new Indexing();
    String systemId = file.toURI().toASCIIString();
    indexing.parse(new InputSource(systemId));
    return indexing;
}
Alternatively, it is possible to create and configure an Indexing object programmatically.
Library.setIndexing installs the new indexing specification, and Library.reIndex re-indexes the Library according to it.
This time, query 21.xq:
(: Find all books published before 1960 (e.g. The Caves of Steel). :)
declare namespace t = "http://www.qizx.com/namespace/Tutorial";
collection("/")//t:book[t:publicationDate < 1960]/t:title
gives no result because the t:publicationDate element is not indexed as a number[13]. The reason for this problem is that the element contains a year written as a Roman numeral (example: "MCMLIV" = 1954).
The predefined string-to-number converter, com.qizx.api.util.text.FormatNumberSieve, is very flexible, but not to the point of converting years written as Roman numerals to numbers. Therefore, the only way to solve the problem is:
To write a custom string-to-number converter (called a sieve in Qizx parlance), that is, to implement interface Indexing.NumberSieve.
To properly declare this custom sieve in indexing.xml, our custom indexing specification:
<element name="t:publicationDate" context="t:book"
as="numeric" sieve="RomanNumberSieve" />
To make sure that the code of our custom sieve is referenced in the CLASSPATH.
Excerpts of RomanNumberSieve.java:
public final class RomanNumberSieve implements Indexing.NumberSieve {
    ...
    public double convert(String text) {
        double converted = 0;

        char[] chars = text.trim().toUpperCase().toCharArray();
        int maxSymbolValue = -1;
        for (int j = chars.length-1; j >= 0; --j) {
            char c = chars[j];

            Symbol symbol = null;
            for (int i = 0; i < SYMBOLS.length; ++i) {
                if (SYMBOLS[i].symbol == c) {
                    symbol = SYMBOLS[i];
                    break;
                }
            }
            if (symbol == null) {
                return Double.NaN;
            }

            if (symbol.value >= maxSymbolValue) {
                // Example: second "M" in "MCMXC" (1990).
                maxSymbolValue = symbol.value;
                converted += maxSymbolValue;
            } else {
                // Example: first "C" in "MCMXC" (1990).
                converted -= symbol.value;
            }
        }

        return converted;
    }

    public void setParameters(String[] parameters) {}

    public String[] getParameters() {
        return null;
    }
    ...
}
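The excerpt elides the Symbol class and the SYMBOLS table, so the following standalone sketch assumes the seven standard Roman symbols. It reproduces the same right-to-left algorithm without implementing Indexing.NumberSieve, so it can be compiled and tried without Qizx (RomanNumerals is a hypothetical name):

```java
import java.util.Locale;

// Standalone version of the RomanNumberSieve.convert() algorithm.
// The symbol table is an assumption: the seven standard Roman symbols.
final class RomanNumerals {
    private static final char[] SYMBOLS = {'I', 'V', 'X', 'L', 'C', 'D', 'M'};
    private static final int[] VALUES = {1, 5, 10, 50, 100, 500, 1000};

    private RomanNumerals() {}

    static double convert(String text) {
        double converted = 0;
        char[] chars = text.trim().toUpperCase(Locale.ROOT).toCharArray();
        int maxSymbolValue = -1;
        // Scan right to left: a symbol smaller than the largest one seen so far
        // is subtractive (the "C" in "CM"); otherwise it is added.
        for (int j = chars.length - 1; j >= 0; --j) {
            int value = valueOf(chars[j]);
            if (value < 0) {
                return Double.NaN; // not a Roman numeral
            }
            if (value >= maxSymbolValue) {
                maxSymbolValue = value;
                converted += value;
            } else {
                converted -= value;
            }
        }
        return converted;
    }

    private static int valueOf(char c) {
        for (int i = 0; i < SYMBOLS.length; ++i) {
            if (SYMBOLS[i] == c) {
                return VALUES[i];
            }
        }
        return -1;
    }
}
```

For instance, convert("MCMLIV") yields 1954.0 and convert("MCMXC") yields 1990.0, which is what makes the comparison t:publicationDate < 1960 meaningful once the Library has been re-indexed. Like the original, the algorithm is lenient: it does not reject non-canonical numerals such as "IIX".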
Compile class ReIndex by executing ant (see build.xml) in the docs/samples/programming/reindex/ directory.
Run ant run in the docs/samples/programming/reindex/ directory to re-index the "Tutorial" Library using indexing.xml, our customized indexing specification.
Run ant run2 in the docs/samples/programming/query/ directory to check that the 20.xq and 21.xq queries now return the expected results.
A LibraryMember, whether a Collection or a Document, has not only content but also properties. Properties are also explained in the chapter Getting Started.
A property has a name (a String) and a value (any Object implementing java.io.Serializable).
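Since property values must be serializable, an application may want to check a candidate value before storing it. Implementing java.io.Serializable is necessary but not sufficient: serialization still fails at run time if the object refers to non-serializable objects. Here is a hypothetical pre-check (PropertyValues is not part of the Qizx API) that actually attempts the serialization:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical pre-check for property values; not part of the Qizx API.
final class PropertyValues {
    private PropertyValues() {}

    static boolean isSerializable(Object value) {
        if (!(value instanceof Serializable)) {
            return false; // fails the declared requirement outright
        }
        // Try to serialize into memory: this catches non-serializable
        // objects reachable from the value (NotSerializableException).
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```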
Qizx automatically adds a few system properties to all LibraryMembers. The most useful system properties are:
The nature of the LibraryMember: "collection" or "document".
The absolute path of the LibraryMember. Example: "/Author Blurbs/Philip_Jose_Farmer.xhtml".
But the real benefit of supporting properties is to allow an application to attach private information to a LibraryMember.
The AddMeta class implements a very specific command-line tool that adds metadata[14] to Documents stored in the "/Author Blurbs" Collection. Remember that the Documents stored in that Collection are copies of articles found on Wikipedia. The AddMeta class annotates a Document with the following metadata:
The date of the Wikipedia article. A java.util.Date object.
The location of the Wikipedia article. A java.net.URL object.
The license[15] attached to the Wikipedia article. A String.
Excerpts of AddMeta.java:
...
Collection collection = lib.getCollection(collectionPath);
if (collection == null) {
    error("'" + collectionPath + "' is not a collection");
    return;
}

LibraryMemberIterator iter = collection.getChildren();
while (iter.moveToNextMember()) {
    LibraryMember m = iter.getCurrentMember();
    if (m.isDocument()) {
        String name = trimExtension(m.getName());
        Info info = (Info) nameToInfo.get(name);
        if (info == null) {
            warning("No meta-data about '" + m.getPath() + "'...");
        } else {
            verbose("Adding meta-data to '" + m.getPath() + "'...");
            m.setProperty("copyDate", info.copyDate);
            m.setProperty("copiedURL", info.copiedURL);
            m.setProperty("license", license);
        }
    }
}
...
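Iterate over the members of the Collection using a LibraryMemberIterator.
If an entry having the same name as the current Document (without its extension) is found in nameToInfo, set the copyDate, copiedURL and license properties of the Document using LibraryMember.setProperty.
The values of nameToInfo are instances of the Info class, filled with the metadata read from LocalCopyInfo.txt:
private static final class Info {
    public final Date copyDate;
    public final URL copiedURL;

    public Info(Date copyDate, URL copiedURL) {
        this.copyDate = copyDate;
        this.copiedURL = copiedURL;
    }
}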
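The trimExtension method is referenced in the excerpt but not shown. Here is a plausible sketch, under the assumption that the keys of nameToInfo are document names stripped of their last extension (for example "Isaac_Asimov" for "Isaac_Asimov.xhtml"). DocumentNames is a hypothetical name.

```java
// Hypothetical version of AddMeta.trimExtension(), whose code is not shown.
final class DocumentNames {
    private DocumentNames() {}

    // Strips the last ".ext"; names without a dot, or starting with their
    // only dot (such as ".hidden"), are returned unchanged.
    static String trimExtension(String name) {
        int dot = name.lastIndexOf('.');
        return (dot > 0) ? name.substring(0, dot) : name;
    }
}
```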
Compile class AddMeta by executing ant (see build.xml) in the docs/samples/programming/addmeta/ directory.
Run ant run in the docs/samples/programming/addmeta/ directory to add the metadata found in LocalCopyInfo.txt to the corresponding Documents of Collection "/Author Blurbs".
Run ant run3 in the docs/samples/programming/query/
directory to execute 30.xq
, a query making use of some of the properties we have just added:
    (: List the original Wikipedia URLs of author blurbs containing the word
       "Russian" and copied locally on or after September 15, 2007. :)
    declare namespace html = "http://www.w3.org/1999/xhtml";

    for $doc in xlib:query-properties("/Author Blurbs/*.xhtml",
                                      copyDate ge xs:date("2007-09-15"))
    where $doc/*[ft:contains("Russian")]
    return xlib:get-property($doc, "copiedURL")
xlib:query-properties and xlib:get-property are XQuery extension functions, specific to Qizx, documented in Chapter 14, XML Library extension functions.
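To make the filtering in 30.xq concrete, here is a plain-Java sketch of the same where/return logic applied to in-memory data. It is not how Qizx evaluates the query. The Blurb record is hypothetical, LocalDate stands in for the stored java.util.Date, the path pattern "/Author Blurbs/*.xhtml" is assumed to be already satisfied, and ft:contains is approximated by a simple case-insensitive whole-word match, which is cruder than real full-text tokenization. The main point it shows is that ge is inclusive.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class QuerySketch {

    // Hypothetical in-memory view of one author blurb and two of its properties.
    record Blurb(LocalDate copyDate, String copiedURL, String text) {}

    // Rough stand-in for ft:contains: case-insensitive whole-word match.
    static boolean containsWord(String text, String word) {
        String target = word.toLowerCase(Locale.ROOT);
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.equals(target)) {
                return true;
            }
        }
        return false;
    }

    // Mirrors the where/return clauses of 30.xq on in-memory data.
    static List<String> copiedURLs(List<Blurb> blurbs, LocalDate since, String word) {
        List<String> urls = new ArrayList<>();
        for (Blurb b : blurbs) {
            // "copyDate ge since" is inclusive, hence !isBefore rather than isAfter.
            if (!b.copyDate().isBefore(since) && containsWord(b.text(), word)) {
                urls.add(b.copiedURL());
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        List<Blurb> blurbs = List.of(
                new Blurb(LocalDate.of(2007, 9, 15), "https://example.org/a", "A Russian-born writer."),
                new Blurb(LocalDate.of(2007, 9, 1), "https://example.org/b", "A Russian writer."),
                new Blurb(LocalDate.of(2007, 10, 2), "https://example.org/c", "An American writer."));
        System.out.println(copiedURLs(blurbs, LocalDate.of(2007, 9, 15), "Russian"));
        // prints [https://example.org/a]
    }
}
```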
In addition to the main packages com.qizx.api and com.qizx.api.fulltext, the Qizx Java API provides packages of miscellaneous utilities, rooted at com.qizx.api.util, with specialized sub-packages.
This section briefly presents the main functionality of these packages. For details, please consult the Javadoc documentation.
The main package contains implementations of API interfaces and some useful adapters.
DefaultModuleResolver is the default implementation of ModuleResolver. By plugging a subclass of DefaultModuleResolver into a LibraryManager or an XQuerySessionManager, it is possible to change the way modules are accessed.
XMLPushStream is a generic interface which is roughly equivalent to SAX2. Qizx uses it rather than using SAX2 because SAX2 is not well adapted to the XQuery Data Model. Adapters to and from SAX2 are provided.
XMLPushStream allows transferring XML content in a "push" style. It is typically used to export a Node item, but it can also be used to compute and store a Document into an XML Library.
The most useful implementation is XMLSerializer, which serializes XML content to a character stream.
Adapter to SAX: PushStreamToSAX converts to a flow of SAX2 events.
Adapter to DOM: PushStreamToDOM builds a DOM document.
Builder of internal XML Data Model: CorePushBuilder allows creating a representation of the internal Data Model that can be accessed through the com.qizx.api.Node interface.
There is also SAXToPushStream, a reverse adapter from SAX to XMLPushStream, which is used internally for loading XML documents into a database using the standard JAXP interface.
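The "push" style can be illustrated with a small, self-contained sketch: a producer calls methods on a consumer, and the consumer decides what to do with each event (serialize it, build a tree, forward it to SAX...). The PushSink interface below is hypothetical and much simpler than the real XMLPushStream (no attributes, namespaces or documents); Serializer plays the role of XMLSerializer, and SaxToPush plays the role of SAXToPushStream, fed by the JDK's SAX parser.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class PushSketch {

    // Hypothetical, simplified push interface; NOT the real XMLPushStream signatures.
    interface PushSink {
        void startElement(String name);
        void characters(String text);
        void endElement(String name);
    }

    // Toy serializer in the spirit of XMLSerializer: turns pushed events into text.
    static final class Serializer implements PushSink {
        private final StringBuilder out = new StringBuilder();

        public void startElement(String name) { out.append('<').append(name).append('>'); }

        public void characters(String text) {
            // Escape the characters that are special in XML text content ('&' first).
            out.append(text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;"));
        }

        public void endElement(String name) { out.append("</").append(name).append('>'); }

        String result() { return out.toString(); }
    }

    // Toy reverse adapter in the spirit of SAXToPushStream: SAX events -> PushSink.
    // Attributes are ignored in this sketch.
    static final class SaxToPush extends DefaultHandler {
        private final PushSink sink;

        SaxToPush(PushSink sink) { this.sink = sink; }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            sink.startElement(qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            sink.characters(new String(ch, start, length));
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            sink.endElement(qName);
        }
    }

    // The producer drives the consumer: this is what "push" style means.
    static void pushBook(PushSink sink, String title) {
        sink.startElement("book");
        sink.startElement("title");
        sink.characters(title);
        sink.endElement("title");
        sink.endElement("book");
    }

    // Parses XML with SAX and re-serializes it through the push interface.
    static String roundTrip(String xml) {
        Serializer out = new Serializer();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new SaxToPush(out));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return out.result();
    }

    public static void main(String[] args) {
        Serializer s = new Serializer();
        pushBook(s, "The Robots of Dawn");
        System.out.println(s.result()); // prints <book><title>The Robots of Dawn</title></book>
    }
}
```

Because every consumer implements the same interface, the same producer (pushBook, or a SAX parser through the adapter) can feed a serializer, a DOM builder or any other sink without change, which is the design idea behind the adapters listed above.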
NodeSource is a subclass of javax.xml.transform.sax.SAXSource. It can be used to pass a Qizx Node as a source to any XSLT engine supporting JAXP (for example Saxon and Xalan).
Conversely, PushStreamResult is a subclass of javax.xml.transform.sax.SAXResult that wraps any XMLPushStream. Its typical use is with Library.beginImportDocument: it allows the result of an XSLT transformation to be imported directly into a Document of a database (XML Library).
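Since NodeSource and PushStreamResult are ordinary JAXP SAXSource/SAXResult subclasses, they plug into the standard TrAX transformation call. The sketch below runs without Qizx, so it uses a plain StreamSource as the input and a SAXResult wrapping a DefaultHandler as the output; those two objects mark where a NodeSource and a PushStreamResult would go. The stylesheet and the handler, which only records element names, are illustrative assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxResultSketch {

    // Small stylesheet: wraps every title element of the input in <titles>.
    static final String XSLT =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:template match='/'>"
          + "<titles><xsl:copy-of select='//title'/></titles>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    // Runs the stylesheet and sends its output as SAX events to a handler,
    // here one that just records element names in document order.
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                names.add(localName == null || localName.isEmpty() ? qName : localName);
            }
        };
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            // With Qizx, the source could be a NodeSource and the result a PushStreamResult.
            t.transform(new StreamSource(new StringReader(xml)), new SAXResult(handler));
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        String book = "<book><title>The Robots of Dawn</title><author>Isaac Asimov</author></book>";
        System.out.println(elementNames(book)); // prints [titles, title]
    }
}
```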
Implementations of full-text interfaces:
DefaultFullTextFactory
DefaultTextTokenizer
DefaultScorer
FullTextHighlighter is an iterator extending XMLPullStream that identifies the terms of a full-text query within the content it traverses. It is mainly used to implement the ft:highlight extension function.
FullTextSnippetExtractor extracts a snippet from a document or an XML Node, attempting to show most of the terms of a full-text query within N words (20 by default). It is also an iterator extending XMLPullStream. It is used to implement the ft:snippet extension function.
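The algorithm used by FullTextSnippetExtractor is not described here. As a rough illustration of "show most of the terms of a query within N words", here is a naive sliding-window sketch on plain text: it picks the window of N consecutive words containing the most distinct query terms, keeping the earliest window on ties. The SnippetSketch class, its normalization rule and its tie-breaking are assumptions made for this example, not Qizx behavior.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class SnippetSketch {

    // Window size mirroring the documented default of 20 words.
    static final int DEFAULT_WINDOW = 20;

    // Lower-cases a word and strips surrounding punctuation.
    static String normalize(String word) {
        return word.toLowerCase(Locale.ROOT).replaceAll("^\\W+|\\W+$", "");
    }

    // Returns the `window` consecutive words containing the largest number of
    // distinct query terms (terms are expected in lower case). The earliest
    // window wins ties; texts shorter than the window are returned whole.
    static String extract(String text, Set<String> terms, int window) {
        String[] words = text.trim().split("\\s+");
        if (words.length <= window) {
            return String.join(" ", words);
        }
        int bestStart = 0;
        int bestCount = -1;
        for (int start = 0; start + window <= words.length; start++) {
            Set<String> seen = new HashSet<>();
            for (int i = start; i < start + window; i++) {
                String w = normalize(words[i]);
                if (terms.contains(w)) {
                    seen.add(w);
                }
            }
            if (seen.size() > bestCount) {
                bestCount = seen.size();
                bestStart = start;
            }
        }
        return String.join(" ", Arrays.copyOfRange(words, bestStart, bestStart + window));
    }

    public static void main(String[] args) {
        String text = "The Robots of Dawn is a novel by Isaac Asimov published in 1983 by Doubleday";
        System.out.println(extract(text, Set.of("asimov", "doubleday"), 6));
        // prints Asimov published in 1983 by Doubleday
        System.out.println(extract(text, Set.of("asimov"), DEFAULT_WINDOW)); // whole text
    }
}
```

This brute-force version costs O(words × N); a real implementation would more likely update term counts incrementally as the window slides.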
[3] “In theory, it is kind of like Make, without Make's wrinkles” say its authors.
[4] The name of the Library used in this tutorial is "Tutorial".
[5] Remember that a Library is at the same time a database and the transactional session used to modify and/or query this database.
[6] Qizx has the ACID capabilities of a transactional database: Atomicity, Consistency, Isolation, Durability.
[7] At worst, lock the root Collection.
[8] Libraries are relatively lightweight objects. Opening a Library is cheap in terms of memory and CPU usage.
[9] "An Introduction to StAX" by Elliotte Rusty Harold. Recommended StAX implementation: Woodstox.
[10] Kilgore Trout is not an actual author. This is the pseudonym used by Philip José Farmer to write the "Venus on the Half-Shell" Science-Fiction novel.
[11] Document Object Model. Actually the term used in the XML Query literature is XQuery/XPath2 Data Model or DM for short.
[12] Not a javax.xml.namespace.QName as found in the Java™ runtime, starting from version 1.5.
[13] Run ant run2 in the docs/samples/programming/query/ directory to check that.
[14] Data about data.