Chapter 10. Programming with the Qizx API

Table of Contents

1. What you'll learn
1.1. About the data samples used in this tutorial
1.2. Compiling and running the code samples
2. Creating a Library and populating it with Collections and Documents
2.1. Creating a LibraryManager
2.2. Creating a Library
2.3. Creating Collections and importing Documents
2.4. The dual nature of the Library object: both a database and a transactional session
2.5. Compiling and running the code of this lesson
3. Retrieving Documents stored in a database
3.1. Compiling and running the code of this lesson
4. Querying a database
4.1. Compiling and running the code of this lesson
5. Deleting Documents and Collections
5.1. Compiling and running the code of this lesson
6. Modifying a Document stored in a database
6.1. Updating a Document using XQuery Update
6.1.1. Compiling and running the code of this lesson
6.2. Updating a Document using the Java API and DOM
6.2.1. Compiling and running the code of this lesson
7. Customizing the indexing of XML content
7.1. Re-indexing a Library
7.2. Writing a custom Indexing.NumberSieve
7.3. Compiling and running the code of this lesson
8. Adding metadata to Documents
8.1. Compiling and running the code of this lesson
9. Convenience and utility classes provided by the API
9.1. Package com.qizx.api.util
9.2. Package com.qizx.api.util.fulltext
9.3. Package com.qizx.api.util.accesscontrol

1. What you'll learn

This edition of Qizx does not include a stand-alone server program. It is designed to be embedded in a Java™ application, typically a Servlet. You'll learn in this chapter everything needed to implement a basic application using Qizx. For an introduction to using Qizx, please see the chapter Getting started.

The target audience of this chapter is experienced Java programmers with a good knowledge of XML and at least a basic knowledge of XQuery.

This chapter is organized in 7 lessons:

  1. First lesson: how to create a database (Library) and populate it with data (Collections and Documents).

    This lesson is by far the largest because it contains a refresher on the concepts involved in programming Qizx (LibraryManager, Library, Collection, etc.) as well as sidebars about the XML catalog resolver, multi-threading and authorization, which can be skipped on a first reading.

  2. Second lesson: how to make local copies of Documents stored in a database.

  3. Third lesson: how to query a database.

  4. Fourth lesson: how to delete a Document, a Collection or a whole Library.

  5. Fifth lesson: how to modify a Document stored in a database.

  6. Sixth lesson: how to customize the indexing of the XML content and how to re-index a database.

  7. Seventh lesson: how to add metadata (properties) to a Document.

1.1. About the data samples used in this tutorial

The directory docs/samples/book_data/ contains several kinds of XML documents. These short, simple XML documents (a few dozen) serve no other purpose than teaching how to program with the Qizx API. In real life, Qizx can be expected to store and query hundreds of thousands of XML documents of various sizes, ranging from a few hundred bytes to several hundred megabytes.

Books/

Each document found in this directory contains the description of a Science-Fiction book: its title, authors, editions, etc. Example docs/samples/book_data/Books/The_Robots_of_Dawn.xml:

<book xmlns="http://www.qizx.com/namespace/Tutorial">
  <title>The Robots of Dawn</title>
  <author>Isaac Asimov</author>
  <publicationDate>MCMLXXXIII</publicationDate>
  <editions>
    <edition>
      <ISBN>0553299492</ISBN>
      <publisher>Doubleday</publisher>
      <language>English</language>
      <year>1983</year>
    </edition>
  </editions>
</book>

Publishers/

Each document found in this directory contains the description of a publisher: its name, address, etc. Example docs/samples/book_data/Publishers/Doubleday.xml:

<publisher xmlns="http://www.qizx.com/namespace/Tutorial">
  <trademark>Doubleday</trademark>
  <company>Random House, Inc.</company>
  <address xml:space="preserve">1540 Broadway
New York, NY 10036
US</address>
</publisher>

Authors/

Each document found in this directory contains the description of a Science-Fiction author: her/his name, pseudonyms, birth date, etc. Example docs/samples/book_data/Authors/iasimov.xml:

<author xmlns="http://www.qizx.com/namespace/Tutorial"
  nationality="US" gender="male">
  <fullName>Isaac Asimov</fullName>
  <pseudonyms>
    <pseudonym>Paul French</pseudonym>
    <pseudonym>George E. Dale</pseudonym>
  </pseudonyms>
  <birthDate>January 2, 1920</birthDate>
  <birthPlace>
    <city>Petrovichi</city><country>Russian SFSR</country>
  </birthPlace>
  <blurb location="../Author%20Blurbs/Isaac_Asimov.xhtml"/>
</author>

Author Blurbs/

Each document found in this directory is an XHTML page which is a copy of a Wikipedia article describing a Science-Fiction author. Example docs/samples/book_data/Author Blurbs/Isaac_Asimov.xhtml:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" dir="ltr"
lang="en">
<head>
...
<title>Isaac Asimov - Wikipedia, the free encyclopedia</title>
...
</body>
</html>

The XHTML DTD and the corresponding XML Catalog are found in docs/samples/xhtml_dtd/.

1.2. Compiling and running the code samples

All the code samples used to illustrate this chapter are found in the docs/samples/programming/ directory. Files containing XQuery scripts are found in the docs/samples/book_queries/ directory.

You'll need a recent version of ant, a Java-based build tool[3], to compile and run the code samples.

2. Creating a Library and populating it with Collections and Documents

The Put class implements a command-line tool that creates a Library and populates it with Collections and Documents. More precisely, it copies one or more source files or directories to a single destination Collection or Document. If multiple sources are specified, the destination must be an existing Collection. Moreover, the Put class can filter what is being copied by means of a simple java.io.FileFilter.

The outline of this program is (excerpts of Put.java):

        LibraryManager libManager = getLibraryManager(storageDir); // (1)
        Library lib = getLibrary(libManager, libName); // (2)

        LibraryMember dst = lib.getMember(dstPath);
        boolean dstIsCollection = (dst != null && dst.isCollection());

        if (args.length > l+4 && !dstIsCollection) {
            shutdown(lib, libManager);
            usage("'" + dstPath + "', does not exist or is a document");
        }

        try {
            for (int i = l+2; i < last; ++i) {
                File srcFile = new File(args[i]);

                String dstPath2 = dstPath;
                if (dstIsCollection) {
                    dstPath2 = joinPath(dstPath, srcFile.getName());
                }
                put(lib, srcFile, filter, dstPath2); // (3)
            }

            verbose("Committing changes...");
            lib.commit(); // (4)
        } finally {
            shutdown(lib, libManager); // (5)
        }

1

Get a LibraryManager. "Create it" if it does not exist.

2

Get a Library from the LibraryManager. Create it if it does not exist.

3

For each source directory, create the corresponding Collection in the Library. Assume that each source file is a well-formed XML document and import it in the Library.

4

Commit changes made to the Library.

5

Close the Library. "Close" the LibraryManager.

Objects involved:

LibraryManager

A LibraryManager is similar to a database manager. It lets you open or create Libraries.

Library

A Library is similar to a database. If we use the filesystem analogy, a Library is similar to a disk drive.

A Library has a name[4]. A Library always contains a root Collection, named "/", which cannot be deleted.

Collection

If we use the filesystem analogy, a Collection is similar to a directory. It can contain Documents and/or Collections.

Note that nothing forces you to create a hierarchy of Collections. If you prefer, you can import all your Documents in the root Collection.

Document

If we use the filesystem analogy, a Document is similar to a file. Unlike plain files, the content of a Document is always well-formed XML.

LibraryMember

A common term (super-interface) for both Collection and Document.

Like its filesystem counterpart, a LibraryMember has a path. Path components are separated by a slash character "/". The last component is the name of the LibraryMember. The other path components are the names of the ancestor Collections of the LibraryMember, up to the root Collection "/".

Example: "/foo/bar/gee". The name of this LibraryMember is "gee". Its ancestor Collections are, from direct parent to the root: "bar", "foo", "/".

There is no concept of current working Collection, therefore relative paths are not useful.

Note that the name of a LibraryMember may contain any character supported by Java™ (including whitespace), except the slash character "/".

Unlike its filesystem counterpart, a LibraryMember may have any number of user-defined properties (metadata) in addition to its content (that is, XML content for a Document, members for a Collection). More on properties in lesson 7.
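
The path rules above can be made concrete with a small, JDK-only sketch. MemberPaths and its methods are hypothetical helpers written for this illustration; they are not part of the Qizx API, and the join method only approximates what a helper like the joinPath used in Put.java might do. The sketch assumes absolute paths that start with "/".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of the Qizx API) that decomposes a member path. */
final class MemberPaths {
    private MemberPaths() {}

    /** Returns the last path component, or "/" for the root Collection. */
    static String name(String path) {
        if ("/".equals(path)) {
            return "/";
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    /** Returns the ancestor Collection names, from the direct parent up to the root "/". */
    static List<String> ancestors(String path) {
        List<String> result = new ArrayList<>();
        if ("/".equals(path)) {
            return result; // the root has no ancestors
        }
        String[] parts = path.substring(1).split("/");
        for (int i = parts.length - 2; i >= 0; --i) {
            result.add(parts[i]);
        }
        result.add("/");
        return result;
    }

    /** Joins a Collection path and a member name with exactly one slash. */
    static String join(String collectionPath, String name) {
        return collectionPath.endsWith("/") ? collectionPath + name
                                            : collectionPath + "/" + name;
    }

    public static void main(String[] args) {
        System.out.println(name("/foo/bar/gee"));      // gee
        System.out.println(ancestors("/foo/bar/gee")); // [bar, foo, /]
        System.out.println(join("/", "Books"));        // /Books
    }
}
```

Because there is no current working Collection, the sketch makes no attempt to resolve relative paths.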

2.1. Creating a LibraryManager

    private static LibraryManager getLibraryManager(File storageDir) 
        throws IOException, QizxException
    {
        if (storageDir.exists()) { // (1)
            return Configuration.openLibraryGroup(storageDir); // (2)
        } else {
            if (!storageDir.mkdirs()) { // (3)
                throw new IOException("cannot create directory '" + 
                                      storageDir + "'");
            }

            verbose("Creating library group in '" + storageDir + "'...");
            return Configuration.createLibraryGroup(storageDir); // (4)
        }
    }

3 1

A LibraryManager stores all its data (XML content, indexes, etc.) in a single directory of the filesystem. Creating a LibraryManager automatically creates this directory if it does not already exist. In the above code, we have preferred to create the storage directory "by hand" before invoking createLibraryGroup. See also How to delete a LibraryManager.

4 2

A LibraryManager is obtained by using the openLibraryGroup or createLibraryGroup methods of the Configuration class.

Class Configuration supports many options that can be set before creating or opening a Library Group or LibraryManager.

Note: LibraryManagerFactory is now deprecated.

2.2. Creating a Library

    private static Library getLibrary(LibraryManager libManager,
                                      String libName) 
        throws QizxException {
        Library lib = libManager.openLibrary(libName); // (1)
        if (lib == null) {
            verbose("Creating library '" + libName + "'...");
            libManager.createLibrary(libName); // (2)
            lib = libManager.openLibrary(libName);
        }
        return lib;
    }

1

openLibrary returns the Library having the specified name. It returns null if no such Library exists.

2

createLibrary creates the Library having the specified name.

2.3. Creating Collections and importing Documents

    private static void put(Library lib, File srcFile, FileFilter filter,
                            String dstPath) 
        throws IOException, QizxException {
        if (srcFile.isDirectory()) {
            Collection collection = lib.getCollection(dstPath); // (1)
            if (collection == null) {
                verbose("Creating collection '" + dstPath + "'...");
                collection = lib.createCollection(dstPath); // (2)
            }

            File[] files = srcFile.listFiles(filter);
            if (files == null) {
                throw new IOException("cannot list directory '" + 
                                      srcFile + "'");
            }

            for (int i = 0; i < files.length; ++i) {
                File file = files[i];
                put(lib, file, filter, joinPath(dstPath, file.getName()));
            }
        } else {
            verbose("Importing '" + srcFile + "' as document '" + 
                    dstPath + "'...");
            lib.importDocument(dstPath, srcFile); // (3)
        }
    }

1

Library has several methods returning a LibraryMember: getCollection, getDocument, getMember. All these methods must be passed absolute paths.

2

A Collection is created by invoking createCollection.

3

A Document is created by invoking one of the several importDocument methods. These methods differ by the types of their source arguments: java.io.File, java.net.URL, org.xml.sax.InputSource, etc. In all cases, the source must contain well-formed XML.

Note that if a Document already exists, importDocument replaces its content.

Now what if your XML source is not a file? Maybe your XML source is a W3C DOM Document or a JDOM Document. Or maybe you want to create a Document dynamically. In such cases, you'll need to use the low-level beginImportDocument and endImportDocument methods.

Example: dynamically create a Document containing "<hello xmlns="http://www.acme.com/ns/test">Hello world!</hello>":

XMLPushStream out = lib.beginImportDocument(docPath);
out.putDocumentStart();
QName helloName = lib.getQName("hello", "http://www.acme.com/ns/test");
out.putElementStart(helloName);
out.putText("Hello world!");
out.putElementEnd(helloName);
out.putDocumentEnd();
Document doc = lib.endImportDocument();

The XMLPushStream interface returned by beginImportDocument lets you "push XML content" into a Document. This is a pretty low-level interface, similar to SAX. Fortunately, Qizx comes with two handy adapters:

com.qizx.api.util.DOMToPushStream

Copies a W3C DOM document or element to an XMLPushStream. This utility class is used in lesson 5.

com.qizx.api.util.SAXToPushStream

Implements org.xml.sax.ContentHandler, org.xml.sax.ext.LexicalHandler, etc, to convert SAX events to invocations of the corresponding methods in an XMLPushStream.
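
When the source is a W3C DOM tree, the tree itself can be built with the JDK alone. The following is a minimal sketch that builds the same hello element as in the example above; the class name HelloDom is made up for this illustration. Handing the result to DOMToPushStream is deliberately not shown, because the adapter's exact method signatures are not reproduced in this chapter.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** JDK-only sketch: builds the "hello" document as a W3C DOM tree. */
final class HelloDom {
    static final String NS = "http://www.acme.com/ns/test";

    /** Builds <hello xmlns="http://www.acme.com/ns/test">Hello world!</hello>. */
    static Document build() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            // createElementNS keeps the namespace attached to the element.
            Element hello = doc.createElementNS(NS, "hello");
            hello.setTextContent("Hello world!");
            doc.appendChild(hello);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element root = build().getDocumentElement();
        System.out.println("{" + root.getNamespaceURI() + "}" + root.getLocalName()
                           + " = " + root.getTextContent());
    }
}
```

Note that org.w3c.dom.Document and the Qizx Document interface share a simple name, so code that uses both needs a fully qualified name for one of them.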

2.4. The dual nature of the Library object: both a database and a transactional session

A Library is both a database (or a disk drive, if we use the filesystem analogy) and a transactional session used to modify and/or query this database. As such, a sequence of changes made to a Library must end with commit or rollback.

        ...
            verbose("Committing changes...");
            lib.commit(); // (1)
        } finally {
            shutdown(lib, libManager);
        }
        ...

    private static void shutdown(Library lib, LibraryManager libManager) 
        throws QizxException {
        if (lib.isModified()) { // (2)
            lib.rollback();
        }
        lib.close(); // (3)
        libManager.closeAllLibraries(10000 /*ms*/); // (4)
    }

1

The commit method is invoked to commit the changes made to the Library.

2

The shutdown helper is invoked even when the program fails before committing the changes made to the Library. The isModified method may be used to test for this case, because a successful commit clears the modified flag. When this error case happens, you need to invoke the rollback method to restore the state of the Library before the changes.

3

Note that the close method raises a QizxException if the database has been modified and commit or rollback have not been invoked.

4

A LibraryManager has no close method. However, you really need to invoke its closeAllLibraries method to stop worker threads. If you don't do that, your application may not be able to exit.
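
The commit-then-shutdown pattern described above can be tried without Qizx using a toy stand-in. ToySession below is a hypothetical class written only for this illustration; it is not the Library class and models a single string value rather than a database. Its close method is assumed to refuse pending changes, mirroring the behavior described for Library.close.

```java
/** Toy stand-in for a transactional session; NOT the Qizx Library class. */
final class ToySession {
    private String committed = "";
    private String working = "";
    private boolean closed;

    void write(String value) { working = value; }
    boolean isModified() { return !working.equals(committed); }
    void commit() { committed = working; }
    void rollback() { working = committed; }
    String committedValue() { return committed; }
    boolean isClosed() { return closed; }

    /** Refuses to close while changes are neither committed nor rolled back. */
    void close() {
        if (isModified()) {
            throw new IllegalStateException("uncommitted changes");
        }
        closed = true;
    }

    /** Commits on success; rolls back in the finally block if the work failed. */
    static void runAndShutdown(ToySession session, Runnable work) {
        try {
            work.run();
            session.commit();
        } finally {
            if (session.isModified()) {
                session.rollback();
            }
            session.close();
        }
    }

    public static void main(String[] args) {
        ToySession session = new ToySession();
        try {
            runAndShutdown(session, () -> {
                session.write("half-done");
                throw new IllegalStateException("simulated failure");
            });
        } catch (IllegalStateException e) {
            System.out.println("work failed: " + e.getMessage());
        }
        System.out.println("committed value: '" + session.committedValue() + "'");
        System.out.println("closed: " + session.isClosed());
    }
}
```

The point of the sketch is the ordering: the rollback test sits in the finally block, so close never sees pending changes whether or not the work succeeded.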

2.5. Compiling and running the code of this lesson

  • Compile class Put by executing ant (see build.xml) in the docs/samples/programming/put/ directory.

  • Create the "Tutorial" library and populate it with all the documents found in docs/samples/book_data/ by running ant run in the docs/samples/programming/put/ directory.

3. Retrieving Documents stored in a database

The Get class implements a command-line tool that makes local copies of Collections and Documents stored in a Library. This tool can match the names of the Collections and Documents to be copied against a wildcard. For example, it can be used to make local copies of all Documents whose names end with ".xhtml" found in the "/Author Blurbs" Collection (the corresponding command-line argument is "/Author Blurbs/*.xhtml").

Warning

For queries to work properly, document imports and updates should first be completed with a commit. Some operations would work even before the commit (like getting the contents of a just imported document), but many operations rely on indexing, and indexing is completed at the time of the commit.

Excerpts of Get.java:

            ...
            LibraryMember libMember = lib.getMember(path); // (1)
            if (libMember == null) {
                error("cannot find '" + path + "'");
                return;
            }

            get(libMember, dstFile);
            ...

   private static void get(LibraryMember libMember, File dstFile) 
        throws IOException, QizxException {
        File dstFile2;
        if (dstFile.isDirectory()) {
            String baseName = libMember.getName();
            if ("/".equals(baseName))
                baseName = "root";

            dstFile2 = new File(dstFile, baseName);
        } else {
            dstFile2 = dstFile;
        }

        if (libMember.isCollection()) { // (2)
            getCollection((Collection) libMember, dstFile2);
        } else {
            getDocument((Document) libMember, dstFile2);
        }
    }

1

Library.getMember returns the LibraryMember (if any) corresponding to the specified absolute path.

2

LibraryMember.isCollection may be used to test if this member is a Collection or a Document. You'll also find a LibraryMember.isDocument method.

A local copy of a Document is created as follows:

    private static void getDocument(Document doc, File dstFile) 
        throws IOException, QizxException {
        verbose("Copying document '" + doc.getPath() + 
                "' to file '" + dstFile + "'...");

        FileOutputStream out = new FileOutputStream(dstFile);
        try {
            doc.export(new XMLSerializer(out, "UTF-8")); // (1)
        } finally {
            out.close();
        }
    }

1

The Document.export method used in the above code sample has an XMLPushStream parameter. That is, to export itself, a Document "pushes its XML content" (element tags, attributes, text, etc.) to an object implementing the XMLPushStream interface.

Qizx comes with a number of useful implementations of the XMLPushStream interface:

com.qizx.api.util.XMLSerializer

The most useful implementation. It saves XML content to a java.io.OutputStream and thus to a File or a String.

com.qizx.api.util.PushStreamToDOM

With this implementation of XMLPushStream, converting a Qizx Document to org.w3c.dom.Document is as simple as:

PushStreamToDOM toDOM = new PushStreamToDOM();
doc.export(toDOM);
org.w3c.dom.Document w3cDOMDoc = toDOM.getResultDocument();

com.qizx.api.util.PushStreamToSAX

With this implementation of XMLPushStream, feeding a Qizx Document into a SAX org.xml.sax.ContentHandler is as simple as:

PushStreamToSAX toSAX = new PushStreamToSAX(handler);
doc.export(toSAX);

The above export method is useful when you want to save, or simply traverse, a Document stored in a Library. There is another Document.export method, this time having no parameters, which is useful when you want to parse a Document stored in a Library. This alternate export method returns an XMLPullStream, that is, a pull parser[9], similar to a StAX parser.

A local copy of a Collection is created as follows:

    private static void getCollection(Collection col, File dstFile) 
        throws IOException, QizxException {
        verbose("Copying collection '" + col.getPath() + 
                "' to directory '" + dstFile + "'...");

        if (!dstFile.isDirectory()) {
            verbose("Creating directory '" + dstFile + "'...");

            if (!dstFile.mkdirs()) {
                throw new IOException("Cannot create directory '" + 
                                      dstFile + "'");
            }
        }

        LibraryMemberIterator iter = col.getChildren(); // (1)
        while (iter.moveToNextMember()) {
            LibraryMember libMember = iter.getCurrentMember();

            File dstFile2 = new File(dstFile, libMember.getName());
            
            if (libMember.isCollection()) {
                getCollection((Collection) libMember, dstFile2);
            } else {
                getDocument((Document) libMember, dstFile2);
            }
        }
    }

1

Collection.getChildren returns an iterator which iterates over the Collections and Documents directly contained in a Collection.

You'll also find a variant of the getChildren method which has a LibraryMemberFilter parameter. com.qizx.api.util.GlobFilter is a ready-to-use implementation of LibraryMemberFilter which matches the name (not the full path, just the name) of a LibraryMember against a glob-style (Unix shell) pattern.
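
To get a feel for glob-style name matching, the following JDK-only sketch matches member names against a pattern using java.nio's built-in glob support. It is a stand-in for illustration, not GlobFilter itself, and the JDK's glob syntax may differ in details from the one GlobFilter accepts. GlobNames is a hypothetical class name; the member names are taken from the sample data described earlier.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

/** JDK-only sketch of glob-style matching on a member name (not the full path). */
final class GlobNames {
    private GlobNames() {}

    /** Returns true if the bare name matches the glob pattern. */
    static boolean matches(String globPattern, String memberName) {
        PathMatcher matcher =
            FileSystems.getDefault().getPathMatcher("glob:" + globPattern);
        return matcher.matches(Paths.get(memberName));
    }

    public static void main(String[] args) {
        System.out.println(matches("*.xhtml", "Isaac_Asimov.xhtml"));      // true
        System.out.println(matches("The*.xml", "The_Robots_of_Dawn.xml")); // true
        System.out.println(matches("*.xhtml", "iasimov.xml"));             // false
    }
}
```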

About Qizx iterators

The Qizx API contains a number of iterators which work differently from java.util.Iterator (e.g. hasNext, next).

In the Qizx API, an iterator always has a moveToNextXXX method which moves the position of the cursor by one item and a getCurrentXXX which returns the item found at current cursor position.

Invoking getCurrentXXX several times, without invoking moveToNextXXX, is indeed possible and will always return the same item. However initially the cursor is one position before the first item (if any), therefore you need to invoke moveToNextXXX at least once before invoking getCurrentXXX.
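
The cursor protocol described above can be sketched over a plain list. ListCursor is a hypothetical class written for this illustration, and its method names only mimic the moveToNextXXX/getCurrentXXX convention. Throwing an exception when getCurrent is called too early is an assumption of the sketch; this section does not specify what the Qizx iterators do in that case.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical cursor over a list, mimicking the moveToNextXXX/getCurrentXXX style. */
final class ListCursor<T> {
    private final List<T> items;
    private int position = -1; // one position before the first item

    ListCursor(List<T> items) {
        this.items = items;
    }

    /** Advances the cursor by one item; returns false when no item is left. */
    boolean moveToNext() {
        if (position + 1 >= items.size()) {
            return false;
        }
        ++position;
        return true;
    }

    /** Returns the item at the cursor; repeated calls return the same item. */
    T getCurrent() {
        if (position < 0) {
            throw new IllegalStateException("call moveToNext() first");
        }
        return items.get(position);
    }

    public static void main(String[] args) {
        ListCursor<String> cursor =
            new ListCursor<>(Arrays.asList("Authors", "Books"));
        while (cursor.moveToNext()) {
            // Two calls to getCurrent() without moving yield the same item.
            System.out.println(cursor.getCurrent() + " " + cursor.getCurrent());
        }
    }
}
```

Compared with java.util.Iterator, the test for "is there more?" and the move are a single call, which is why the loops in this chapter are written as while (iter.moveToNextXXX()).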

3.1. Compiling and running the code of this lesson

  • Compile class Get by executing ant (see build.xml) in the docs/samples/programming/get/ directory.

  • Run ant run in the docs/samples/programming/get/ directory to make local copies of

    • Document "/Authors/pjfarmer.xml",

    • Documents "/Author Blurbs/Philip*",

    • Documents "/Books/The*.xml",

    • Collection "/Publishers".

    in docs/samples/programming/get/tests/out/.

4. Querying a database

Querying a database (that is, a Library) is fairly easy:

Expression expr = lib.compileExpression(script); // (1)
ItemSequence results = expr.evaluate(); // (2)
while (results.moveToNextItem()) { // (3)
    Item result = results.getCurrentItem();

    /*Do something with result.*/
}

1

First compile an XQuery expression using Library.compileExpression. If no compilation errors (CompilationException) are found, this returns an Expression object.

2

Then evaluate the expression using Expression.evaluate. If no evaluation errors (EvaluationException) are found, this returns the results of the evaluation in the form of an ItemSequence.

3

An ItemSequence lets you iterate over a sequence of Items (see About Qizx iterators). An Item is either an atomic value or an XML Node.

Example (1.xq):

(: Compute and return 2 + 3 :)
2 + 3

evaluates to an ItemSequence containing a single atomic value (5).

Example (3.xq):

(: List all books by their titles. :)
declare namespace t = "http://www.qizx.com/namespace/Tutorial";

collection("/Books")//t:book/t:title

evaluates to an ItemSequence containing several t:title element Nodes.

Warning

For queries to work properly, document imports and updates should first be completed with a commit. Some operations would work even before the commit (like getting the contents of a just imported document), but many operations rely on indexing, and indexing is completed at the time of the commit.

The Query class, which implements a command-line tool allowing to query a Library, is more complicated than the above code sample because it supports somewhat advanced options.

Excerpts of Query.java:

    private static Expression compileExpression(Library lib, 
                                                String script,
                                                LibraryMember queryRoot,
                                                QName[] varNames,
                                                String[] varValues) 
        throws IOException, QizxException {
        Expression expr;
        try {
            expr = lib.compileExpression(script);
        } catch (CompilationException e) {
            Message[] messages = e.getMessages();
            for (int i = 0; i < messages.length; ++i) {
                error(messages[i].toString());
            }

            throw e;
        }

        if (queryRoot != null)
            expr.bindImplicitCollection(queryRoot); // (1)

        if (varNames != null) {
            for (int i = 0; i < varNames.length; ++i) {
                expr.bindVariable(varNames[i], varValues[i], /*type*/ null); // (2)
            }
        }

        return expr;
    }

1

Expression.bindImplicitCollection lets you write queries containing paths which are not prefixed with collection("XXX") or doc("YYY").

Example (100.xq): using bindImplicitCollection to bind the expression to collection("/Books") lets you write:

(: List all books by their titles. :)
declare namespace t = "http://www.qizx.com/namespace/Tutorial";

//t:book/t:title

instead of (3.xq):

(: List all books by their titles. :)
declare namespace t = "http://www.qizx.com/namespace/Tutorial";

collection("/Books")//t:book/t:title

2

An XQuery expression can be further parametrized by the use of variables. Example (101.xq):

(: List all books containing the value of variable $searched 
   in their titles. :)
declare namespace t = "http://www.qizx.com/namespace/Tutorial";

declare variable $searched external;

collection("/Books")//t:book/t:title[contains(., $searched)]

Expression.bindVariable gives a variable its value prior to evaluating the expression.
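
The idea of binding a namespace prefix and an external variable before evaluation can be tried with the JDK's own XPath 1.0 engine. The sketch below is a stand-in for illustration only: it uses javax.xml.xpath rather than Qizx, evaluates against a single in-memory document instead of a Collection, and SearchedTitle is a hypothetical class name. The embedded XML is an abridged copy of the sample book shown earlier.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** JDK-only stand-in for variable binding; this is not Qizx code. */
final class SearchedTitle {
    static final String NS = "http://www.qizx.com/namespace/Tutorial";

    static final String BOOK =
        "<book xmlns=\"" + NS + "\">"
        + "<title>The Robots of Dawn</title><author>Isaac Asimov</author>"
        + "</book>";

    /** Returns the title if it contains `searched`, else the empty string. */
    static String findTitle(String xml, String searched) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Plays the role of "declare namespace t = ...".
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "t".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) {
                    return NS.equals(uri) ? "t" : null;
                }
                @Override public Iterator<String> getPrefixes(String uri) {
                    return NS.equals(uri)
                        ? Collections.singletonList("t").iterator()
                        : Collections.<String>emptyIterator();
                }
            });
            // Plays the role of binding the external variable $searched.
            xpath.setXPathVariableResolver(name ->
                "searched".equals(name.getLocalPart()) ? searched : null);

            return xpath.evaluate(
                "//t:book/t:title[contains(., $searched)]", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("[" + findTitle(BOOK, "Robots") + "]");
        System.out.println("[" + findTitle(BOOK, "Foundation") + "]");
    }
}
```

As with the Qizx code, the query text stays fixed and only the bound value changes between evaluations.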

Some queries may return thousands of results. Therefore, displaying just a range of results (e.g. from result #100 to result #199 inclusive) is a very common need.

    private static void evaluateExpression(Expression expr, 
                                           int from, int limit) 
        throws QizxException {
        ItemSequence results = expr.evaluate();
        if (from > 0) {
            results.skip(from); // (1)
        }

        XMLSerializer serializer = new XMLSerializer();
        serializer.setIndent(2);

        int count = 0;
        while (results.moveToNextItem()) {
            Item result = results.getCurrentItem();

            System.out.print("[" + (from+1+count) + "] ");
            showResult(serializer, result);
            System.out.println();

            ++count;
            if (count >= limit) // (2)
                break;
        }
        System.out.flush();
    }

1

ItemSequence.skip quickly skips the specified number of Items.

2

This being done, you still need to limit the number of Items you are going to display.
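
The skip-then-limit logic can be checked in isolation with a plain java.util.Iterator. ResultPage is a hypothetical helper written for this illustration and does not use the Qizx API; it only reproduces the arithmetic of the from and limit parameters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** JDK-only sketch of "skip `from` results, then keep at most `limit`". */
final class ResultPage {
    private ResultPage() {}

    static <T> List<T> page(Iterator<T> results, int from, int limit) {
        // Skip the first `from` items (fewer if the sequence is shorter).
        for (int skipped = 0; skipped < from && results.hasNext(); ++skipped) {
            results.next();
        }
        // Collect at most `limit` items.
        List<T> page = new ArrayList<>();
        while (page.size() < limit && results.hasNext()) {
            page.add(results.next());
        }
        return page;
    }

    public static void main(String[] args) {
        List<Integer> all = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(page(all.iterator(), 3, 4)); // [4, 5, 6, 7]
    }
}
```

Here from is a zero-based count of skipped results, which is why the excerpt above labels each displayed result with from+1+count.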

In this lesson, we'll just show how to print the string representation of an Item. In lesson 5, we'll go further and explore the data model of Qizx.

    private static void showResult(XMLSerializer serializer,
                                   Item result) 
        throws QizxException {
        if (!result.isNode()) { // (1)
            System.out.println(result.getString()); // (2)
            return;
        }
        Node node = result.getNode(); // (3)

        serializer.reset();
        String xmlForm = serializer.serializeToString(node); // (4)
        System.out.println(xmlForm);
    }

1 3

Item.isNode returns true for a Node and false for an atomic value. Similarly, Item.getNode returns a Node when the Item actually is a Node and null when the Item is an atomic value.

2

Item.getString returns the string value of an Item (whether Node or atomic value). What precisely is the string value of an Item is specified in the XQuery standard.

4

The XMLSerializer.serializeToString convenience method is used to obtain the string representation of a Node.

4.1. Compiling and running the code of this lesson

  • Compile class Query by executing ant (see build.xml) in the docs/samples/programming/query/ directory.

  • Run ant run in the docs/samples/programming/query/ directory to perform this query:

    (: Find all books written by French authors. :)
    declare namespace t = "http://www.qizx.com/namespace/Tutorial";
    
    for $a in collection("/Authors")//t:author[@nationality = "France"]
        for $b in collection("/Books")//t:book[.//t:author = $a/t:fullName]
        return 
            $b/t:title

    Note that directory docs/samples/book_queries/ contains all the queries needed to illustrate this lesson and also the following ones. You can execute all these queries by running ant run_all in docs/samples/programming/query/.

5. Deleting Documents and Collections

Class Delete implements a command-line tool that deletes one or more Documents or Collections. If no Document or Collection paths are specified as command-line arguments, the tool deletes the whole Library.

Excerpts of Delete.java:

        if (args.length == 2) {
            verbose("Deleting library '" + libName + "'...");
            if (!libManager.deleteLibrary(libName)) { // (1)
                warning("Library '" + libName + "' not found");
            }
            libManager.closeAllLibraries(10000 /*ms*/);
        } else {
            Library lib = libManager.openLibrary(libName);

            try {
                for (int i = 2; i < args.length; ++i) {
                    String path = args[i];

                    verbose("Deleting member '" + path + "' of library '" + 
                            libName + "'...");
                    if (!lib.deleteMember(path)) { // (2)
                        warning("Member '" + path + "' of library '" + 
                                libName + "' not found");
                    }
                }

                verbose("Committing changes...");
                lib.commit();
            } finally {
                shutdown(lib, libManager);
            }
        }

1

LibraryManager.deleteLibrary is used to delete a Library. Note that the commit method is not invoked in this case.

2

Library.deleteMember is used to delete a LibraryMember (Document or Collection). Collections are recursively deleted.

How to delete a LibraryManager

Because there is no LibraryManager.delete method, the only way to physically destroy a LibraryManager is first to "close" it using LibraryManager.closeAllLibraries, and then to delete its storage directory (obtained using LibraryManager.getStorageDirectory).

5.1. Compiling and running the code of this lesson

  • Compile class Delete by executing ant (see build.xml) in the docs/samples/programming/delete/ directory.

  • Run ant run in the docs/samples/programming/delete/ directory to delete Document "/Authors/ktrout.xml"[10].

6. Modifying a Document stored in a database

Since Qizx 2.1, there are two methods for updating a document:

  1. Use XQuery Update, an extension to XQuery that allows insertions, deletions and updates on selected nodes. This is in general by far the easiest method.

    A tutorial is available here for a quick yet comprehensive introduction to XQuery Update.

  2. Extract the document to update as a W3C DOM Document, then update the DOM form, then write back the DOM onto the document. This was the only method available in Qizx 2.0. It can still be useful in specific cases.

Whichever method is used, please remember that any update operation on a document basically implies replacing the document in its entirety. This is a deliberate design choice that allows faster queries.

In the next sections, the two methods are explained. The example described consists of adding a pseudonym to an existing author specified by his/her full name.

6.1. Updating a Document using XQuery Update

XQuery Update is an extension of XQuery which provides additional instructions for updating documents. The updating primitives are insert, delete, replace and rename.

Using XQuery Update simply consists of executing a script containing XQuery Update primitives. Such a script is called an updating query.

An "updating query" is executed in a special way by the XQuery engine:

  • first a "pending update list" is created by executing the query (which returns no value)

  • then the update list is applied at once.

This means that changes are not visible during the execution of the script, but only after completion. This can be surprising, as noted in the example hereafter. The XQuery Update tutorial addresses such issues in more detail.
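
The two-phase behaviour can be pictured with a toy model in plain Java. This is only an illustration of the idea of a pending update list, not a description of how the Qizx engine is implemented; the PendingUpdateSketch class and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a pending update list; not related to the Qizx implementation.
final class PendingUpdateSketch {
    /** The "document": here simply a list of pseudonyms. */
    private final List<String> pseudonyms = new ArrayList<>();

    /** Updates recorded while the "query" runs; none of them is applied yet. */
    private final List<Runnable> pendingUpdates = new ArrayList<>();

    /** Phase 1: record the update instead of performing it. */
    void insertPseudonym(String pseudonym) {
        pendingUpdates.add(() -> pseudonyms.add(pseudonym));
    }

    /** Phase 2: apply all recorded updates in one go. */
    void applyUpdates() {
        for (Runnable update : pendingUpdates) {
            update.run();
        }
        pendingUpdates.clear();
    }

    /** Returns a copy of the current, visible state of the "document". */
    List<String> snapshot() {
        return new ArrayList<>(pseudonyms);
    }

    public static void main(String[] args) {
        PendingUpdateSketch doc = new PendingUpdateSketch();
        doc.insertPseudonym("Kilgore Trout");
        System.out.println("during the query: " + doc.snapshot()); // prints "during the query: []"
        doc.applyUpdates();
        System.out.println("after completion: " + doc.snapshot()); // prints "after completion: [Kilgore Trout]"
    }
}
```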

Here is the XQuery Update script used:

declare default element namespace 'http://www.qizx.com/namespace/Tutorial';
declare variable $ERR := QName('http://www.w3.org/2005/xqt-errors', 'ERR00001');
declare variable $authorName external;
declare variable $pseudo external;

let $auth := /author[fullName = $authorName]
return
 if (empty($auth))
   then error($ERR, 'no such author')
 else if ($auth/pseudonyms[pseudonym = $pseudo]) (: 1 :)
   then error($ERR, 'pseudonym already defined')
 else if ($auth/pseudonyms)
   then insert node <pseudonym>{ $pseudo }</pseudonym> (: 2 :)
          into $auth/pseudonyms
   else insert node <pseudonyms><pseudonym>{ $pseudo }</pseudonym></pseudonyms> (: 3 :)
          into $auth

1

Preliminary tests: check that the author exists and that the pseudonym is not yet defined.

2

If the enclosing element pseudonyms exists, then we can directly insert the new pseudonym element into it.

3

If the element pseudonyms does not exist yet, then create one with the new pseudonym element inside it.

Please notice that due to the way XQuery Update works, it is not possible to create the element pseudonyms first, then to insert the new pseudonym element inside it. This is because the element pseudonyms is not visible until completion, therefore it cannot be used by an expression insert node ... into.

The corresponding Java program is XUpdate.java.

6.1.1. Compiling and running the code of this lesson

  • Compile class XUpdate by executing ant (see build.xml) in the docs/samples/programming/edit/ directory.

  • Run ant xurun in the docs/samples/programming/edit/ directory to add pseudonym "Kilgore Trout" to author "Philip José Farmer" [10].

6.2. Updating a Document using the Java API and DOM

The strategy we'll use is the following:

  1. Find the Document to be modified by performing a query.

  2. Convert the document found to a W3C DOM Document. This step is needed because the DOM[11] of Qizx is immutable. For example, you'll find a Node.getAttribute method, but no Node.setAttribute method.

  3. Modify the W3C DOM Document.

  4. Replace the content of the Document stored in the Library by the content of the W3C DOM Document.

Unlike the Put, Get and Delete classes, which implement generic command-line tools, the Edit class is specific to the dataset used to illustrate this tutorial. The Edit class adds a pseudonym to an author. The author is found by her/his full name, and not by the path of the Document containing her/his record.

Excerpts of Edit.java:

        Node author = findAuthor(lib, collectionPath, authorName); // 1
        if (author == null)
            return;

        if (hasPseudonym(author, pseudonym)) { // 2
            warning("'" + authorName + "' already has pseudonym '" +
                    pseudonym + "'");
            return;
        }

        org.w3c.dom.Document doc =
            (org.w3c.dom.Document) author.getDocumentNode() /* 3 */.getObject(); // 4
        if (!doAddPseudo(doc, pseudonym)) // 5
            return;

        XMLPushStream out =
            lib.beginImportDocument(author.getLibraryDocument() /* 6 */.getPath()); // 7

        DOMToPushStream helper = new DOMToPushStream(lib, out); // 8
        helper.putDocument(doc);
        lib.endImportDocument();

1

The findAuthor method finds a t:author element by the content of its t:fullName child element. Lesson 3 explained how to query a database, so there is nothing new here:

    private static Node findAuthor(Library lib, String collectionPath,
                                   String authorName) 
        throws QizxException {
        Collection collection = lib.getCollection(collectionPath);
        if (collection == null) {
            error("'" + collectionPath + "' is not a collection");
            return null;
        }

        String script = 
            "declare namespace t = '" + TUTORIAL_NS_URI + "';\n" +
            "declare variable $name external;\n" +
            "/t:author[t:fullName = $name]";

        Expression expr = lib.compileExpression(script);
        expr.bindImplicitCollection(collection);
        expr.bindVariable(lib.getQName("name"), authorName, /*type*/ null);

        ItemSequence items = expr.evaluate();
        if (!items.moveToNextItem()) {
            error("Don't find author '" + authorName + "'");
            return null;
        }
        Item item = items.getCurrentItem();

        return item.getNode();
    }

2

The hasPseudonym method is detailed below.

3

Method Node.getDocumentNode is used to access the document Node containing the t:author element Node previously found by the findAuthor method.

4

Method Item.getObject converts an Item to an equivalent Java Object. In the case of a com.qizx.api.Node, this equivalent is an org.w3c.dom.Node.

5

The doAddPseudo method adds a t:pseudonym descendant to the t:author element using the org.w3c.dom API, which is standard Java™ since version 1.4.

6

We now need to access the Document, that is, the LibraryMember, containing the t:author element Node. Method Node.getLibraryDocument returns this information. It is not to be confused with Node.getDocumentNode, which returns the outermost ancestor Node of a Node.

7 8

Library.beginImportDocument, Library.endImportDocument and the com.qizx.api.util.DOMToPushStream helper class allow importing a W3C DOM Document into a Library. This has already been explained in lesson 1.
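
The body of doAddPseudo is not shown in the excerpts above. As a rough idea of what step 3 of the strategy (modifying the W3C DOM Document) may look like, here is a minimal sketch using only the standard org.w3c.dom API. It is not the actual code of Edit.java: the AddPseudoSketch class, its methods and the in-memory sample record are assumptions made for illustration, and the element names are taken from the XQuery Update script shown earlier.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch, not the doAddPseudo method of Edit.java.
final class AddPseudoSketch {
    static final String NS = "http://www.qizx.com/namespace/Tutorial";

    /** Builds a small in-memory author record (sample data for this sketch only). */
    static Document newAuthor(String fullName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            Element author = doc.createElementNS(NS, "author");
            Element name = doc.createElementNS(NS, "fullName");
            name.setTextContent(fullName);
            author.appendChild(name);
            doc.appendChild(author);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Appends a pseudonym, creating the enclosing pseudonyms element when it is missing. */
    static void addPseudonym(Document doc, String pseudonym) {
        Element author = doc.getDocumentElement();
        NodeList wrappers = author.getElementsByTagNameNS(NS, "pseudonyms");
        Element wrapper;
        if (wrappers.getLength() > 0) {
            wrapper = (Element) wrappers.item(0);
        } else {
            wrapper = doc.createElementNS(NS, "pseudonyms");
            author.appendChild(wrapper);
        }
        Element element = doc.createElementNS(NS, "pseudonym");
        element.setTextContent(pseudonym);
        wrapper.appendChild(element);
    }

    public static void main(String[] args) {
        Document doc = newAuthor("Philip Jos\u00e9 Farmer");
        addPseudonym(doc, "Kilgore Trout");
        int count = doc.getElementsByTagNameNS(NS, "pseudonym").getLength();
        System.out.println("pseudonym elements: " + count);
    }
}
```

Unlike an updating query, a DOM is modified immediately, so the pseudonyms element can be created first and then filled in.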

The hasPseudonym method is a simple example of using the Qizx DOM. It searches for its pseudonym argument inside a t:author/t:pseudonyms/t:pseudonym element (author having multiple pseudonyms) or inside a t:author/t:pseudonym element (author having a single pseudonym):

    private static boolean hasPseudonym(Node element, String pseudonym) 
        throws QizxException {
        Node child = element.getFirstChild(); // 1
        while (child != null) {
            if (child.isElement()) { // 2
                String childName = child.getNodeName().getLocalPart(); // 3
                if ("pseudonyms".equals(childName)) {
                    return hasPseudonym(child, pseudonym);
                } else if ("pseudonym".equals(childName)) {
                    if (pseudonym.equals(child.getStringValue())) {
                        return true;
                    }
                }
            }

            child = child.getNextSibling(); // 4
        }

        return false;
    }

1 4

The Node.getFirstChild and Node.getNextSibling methods allow iterating over the children of an element or document Node.

Attributes are represented by Nodes too, but are not considered to be children of element Nodes. Attributes are accessed using the Node.getAttribute, Node.getAttributeCount, Node.getAttributes methods.

2

Nodes are not typed. That is, there are no Element, Attribute, Comment, etc, objects. The same Node object is used to represent an element, an attribute, a comment, a processing instruction, a text node or a document.

Method Node.getNodeNature returns the kind of a Node. Node.isElement is just a convenience method.

Methods such as Node.getNodeName, Node.getAttribute, etc., return values depending on the kind of the subject Node. For example, Node.getAttribute returns null for all kinds of Nodes, except for element Nodes.

3

An element Node has a name which is returned by the Node.getNodeName method, as used in the code above. In Qizx, an XML name is represented by a com.qizx.api.QName[12] object, and not by a String or a pair of Strings like in the W3C DOM.

A new QName object is obtained using ItemFactory.getQName. A Library extends the ItemFactory interface. Therefore, a QName is generally obtained from a Library.

6.2.1. Compiling and running the code of this lesson

  • Compile class Edit by executing ant (see build.xml) in the docs/samples/programming/edit/ directory.

  • Run ant run in the docs/samples/programming/edit/ directory to add pseudonym "Kilgore Trout" to author "Philip José Farmer" [10].

    Note that if you have already run the example using XQuery Update, you will get an error since the Edit class does not accept duplicate pseudonyms.

7. Customizing the indexing of XML content

7.1. Re-indexing a Library

Query 20.xq:

(: Find all authors born after 1945 (e.g. Lois McMaster Bujold). :)
declare namespace t = "http://www.qizx.com/namespace/Tutorial";

collection("/")//t:author[t:birthDate > xs:date("1945-01-01Z")]/t:fullName

gives no result because the t:birthDate element is not indexed as an xs:date[13]. The cause of this problem is that the element contains a date in a locale-specific format (example: November 2, 1949) rather than the standard format (example: 1949-11-02).

This is a case where we need to specify a custom indexing: on the t:birthDate element, a specific string-to-date converter based on the predefined class com.qizx.api.util.text.FormatDateSieve has to be used.

In Qizx, custom indexing is defined through an "Indexing Specification" which is in XML format. The syntax and semantics of indexing specifications are described in great detail in Chapter 9, Configuring the indexing process.

The indexing specification we will use is in the file indexing.xml:

<indexing xmlns:t="http://www.qizx.com/namespace/Tutorial">
  <!-- Default rules (1) -->

  <element as="numeric+string"/>
  <element as="date+string" />
  <element as="string" />

  <attribute as="numeric+string" />
  <attribute as="date+string" />
  <attribute as="string" />

  <!-- Custom rules -->

  <element name="t:birthDate" context="t:author"
           as="date" sieve="com.qizx.api.util.text.FormatDateSieve"
           format="MMMM d, yyyy" locale="en-US" timezone="GMT" /> <!-- 2 -->

  <element name="t:publicationDate" context="t:book"
           as="numeric" sieve="RomanNumberSieve" /> <!-- 3 -->
</indexing>

1

Including the default rules before your custom rules is mandatory. If you don't do that, the Library is re-indexed with just the custom rules, which means that many queries will not work.

2

This custom rule specifies that a FormatDateSieve with a US "MMMM d, yyyy" format is to be used to index the content of t:author/t:birthDate elements.

3

More about this other custom rule in Section 7.2, “Writing a custom Indexing.NumberSieve”.
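
To get a feeling for what the format, locale and timezone attributes of the t:birthDate rule express, here is a small stand-alone sketch using java.text.SimpleDateFormat. It assumes that the format attribute follows the SimpleDateFormat pattern syntax; it says nothing about how FormatDateSieve is actually implemented. The DateFormatSketch class is a hypothetical name used for illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch, not the implementation of FormatDateSieve.
final class DateFormatSketch {
    /** Converts a date such as "November 2, 1949" to "1949-11-02", or returns null on failure. */
    static String toIsoDate(String text) {
        TimeZone gmt = TimeZone.getTimeZone("GMT");

        // Same pattern, locale and time zone as in the t:birthDate rule.
        SimpleDateFormat input = new SimpleDateFormat("MMMM d, yyyy", Locale.US);
        input.setTimeZone(gmt);
        input.setLenient(false);

        SimpleDateFormat output = new SimpleDateFormat("yyyy-MM-dd", Locale.US);
        output.setTimeZone(gmt);

        try {
            Date date = input.parse(text.trim());
            return output.format(date);
        } catch (ParseException e) {
            return null; // Not a date in the expected format.
        }
    }

    public static void main(String[] args) {
        System.out.println(toIsoDate("November 2, 1949")); // prints "1949-11-02"
        System.out.println(toIsoDate("1949-11-02"));       // prints "null"
    }
}
```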

The ReIndex class implements a command-line tool that changes the indexing specification of a Library and then re-indexes this Library.

        Library lib = libManager.openLibrary(libName);

        try {
            verbose("Loading indexing specifications from '" + 
                    indexingFile + "'...");
            Indexing indexing = loadIndexing(indexingFile); // 1
            lib.setIndexing(indexing); // 2

            verbose ("Re-indexing library '" + libName + "'...");
            lib.reIndex(); // 3
        } finally {
            shutdown(lib, libManager);
        }

1

The Indexing specification is simply loaded from an XML file by using the Indexing.parse method:

    private static Indexing loadIndexing(File file) 
        throws IOException, SAXException, QizxException {
        Indexing indexing = new Indexing();

        String systemId = file.toURI().toASCIIString();
        indexing.parse(new InputSource(systemId));

        return indexing;
    }

Alternatively, it is possible to programmatically create an Indexing object by invoking methods such as Indexing.addAttributeRule, Indexing.addElementRule, etc.

2

Library.setIndexing changes the indexing specifications of a Library, but does not automatically re-index the Library.

3

Library.reIndex re-indexes a Library. This may take from a few seconds to several hours depending on the size of the Library.

Note that there is no need to invoke Library.commit after reIndex.

7.2. Writing a custom Indexing.NumberSieve

This time, query 21.xq

(: Find all books published before 1960 (e.g. The Caves of Steel). :)
declare namespace t = "http://www.qizx.com/namespace/Tutorial";

collection("/")//t:book[t:publicationDate < 1960]/t:title

gives no result because the t:publicationDate element is not indexed as a number[13]. The cause of this problem is that the element contains the year as a Roman numeral (example: "MCMLIV" = 1954).

The predefined string-to-number converter, com.qizx.api.util.text.FormatNumberSieve, is very flexible but not to the point of converting Roman numeral year dates to numbers. Therefore the only way to solve the problem is:

  1. To write a custom string-to-number converter (called a sieve in Qizx parlance), that is, to implement interface Indexing.NumberSieve.

  2. To properly declare this custom sieve in indexing.xml, our custom indexing specification.

      <element name="t:publicationDate" context="t:book" 
               as="numeric" sieve="RomanNumberSieve" />

  3. To make sure that the code of our custom sieve is referenced in the CLASSPATH.

Excerpts of RomanNumberSieve.java:

public final class RomanNumberSieve implements Indexing.NumberSieve {
    ...
    public double convert(String text) { // 1
        double converted = 0;

        char[] chars = text.trim().toUpperCase().toCharArray();
        int maxSymbolValue = -1;

        for (int j = chars.length-1; j >= 0; --j) {
            char c = chars[j];

            Symbol symbol = null;
            for (int i = 0; i < SYMBOLS.length; ++i) {
                if (SYMBOLS[i].symbol == c) {
                    symbol = SYMBOLS[i];
                    break;
                }
            }
            if (symbol == null) {
                return Double.NaN;
            }

            if (symbol.value >= maxSymbolValue) {
                // Example: second "M" in "MCMXC" (1990).
                maxSymbolValue = symbol.value;
                converted += maxSymbolValue;
            } else {
                // Example: first "C" in "MCMXC" (1990).
                converted -= symbol.value;
            }
        }

        return converted;
    }

    public void setParameters(String[] parameters) {} // 2
    public String[] getParameters() { return null; }
    ...
}

1

An Indexing.NumberSieve basically converts a String to a double. It should return Double.NaN when the conversion fails.

2

Like all Indexing.Sieves, an Indexing.NumberSieve can be parametrized. This feature is not useful in the case of RomanNumberSieve.
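
The excerpt above omits the symbol table. To experiment with the conversion algorithm outside of Qizx, the same right-to-left scan can be written as a stand-alone class. This is a sketch under stated assumptions: the RomanNumeralSketch class and its SYMBOLS/VALUES tables are hypothetical stand-ins for the parts elided from RomanNumberSieve.java, and the class deliberately does not implement Indexing.NumberSieve so that it can be compiled without the Qizx library.

```java
import java.util.Locale;

// Stand-alone sketch of the conversion loop; the tables below are assumptions.
final class RomanNumeralSketch {
    private static final String SYMBOLS = "IVXLCDM";
    private static final int[] VALUES = {1, 5, 10, 50, 100, 500, 1000};

    /** Converts a Roman numeral to a number, or returns Double.NaN on an unknown symbol. */
    static double convert(String text) {
        double converted = 0;
        int maxSymbolValue = -1;
        String numeral = text.trim().toUpperCase(Locale.ROOT);

        // Scan from right to left, remembering the largest symbol seen so far.
        for (int j = numeral.length() - 1; j >= 0; --j) {
            int index = SYMBOLS.indexOf(numeral.charAt(j));
            if (index < 0) {
                return Double.NaN;
            }
            int value = VALUES[index];
            if (value >= maxSymbolValue) {
                // Example: second "M" in "MCMXC" (1990).
                maxSymbolValue = value;
                converted += value;
            } else {
                // Example: first "C" in "MCMXC" (1990).
                converted -= value;
            }
        }
        return converted;
    }

    public static void main(String[] args) {
        System.out.println(convert("MCMLIV")); // prints "1954.0"
        System.out.println(convert("MCMXC"));  // prints "1990.0"
        System.out.println(convert("1954"));   // prints "NaN"
    }
}
```

Like the original excerpt, this loop does not validate the numeral: it only rejects characters that are not Roman symbols.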

7.3. Compiling and running the code of this lesson

  • Compile class ReIndex by executing ant (see build.xml) in the docs/samples/programming/reindex/ directory.

  • Run ant run in the docs/samples/programming/reindex/ directory to re-index the "Tutorial" Library using indexing.xml, our customized indexing specification.

  • Run ant run2 in the docs/samples/programming/query/ directory to check that the 20.xq and 21.xq queries now return the expected results.

8. Adding metadata to Documents

A LibraryMember, whether a Collection or a Document, has not only content but also properties. Properties are also explained in the chapter Getting Started.

A property has a name (String) and a value (any Object implementing java.io.Serializable).

Qizx automatically adds a few system properties to all LibraryMembers. The most useful system properties are:

nature

The nature of the LibraryMember: "collection" or "document".

path

The absolute path of the LibraryMember. Example: "/Author Blurbs/Philip_Jose_Farmer.xhtml".

But the real benefit of supporting properties is to allow an application to attach private information to a LibraryMember.

The AddMeta class implements a very specific command-line tool which adds metadata[14] to Documents stored in the "/Author Blurbs" Collection. Remember that the Documents stored in that Collection are copies of articles found on Wikipedia. The AddMeta class annotates a Document with the following metadata:

copyDate

The date of the Wikipedia article. A java.util.Date object.

copiedURL

The location of the Wikipedia article. A java.net.URL object.

license

The license[15] attached to the Wikipedia article. A String.

Excerpts of AddMeta.java:

        ...
        Collection collection = lib.getCollection(collectionPath);
        if (collection == null) {
            error("'" + collectionPath + "' is not a collection");
            return;
        }

        LibraryMemberIterator iter = collection.getChildren();
        while (iter.moveToNextMember()) { // 1
            LibraryMember m = iter.getCurrentMember();

            if (m.isDocument()) {
                String name = trimExtension(m.getName());

                Info info = (Info) nameToInfo.get(name); // 2
                if (info == null) {
                    warning("No meta-data about '" + m.getPath() + "'...");
                } else {
                    verbose("Adding meta-data to '" + m.getPath() + "'...");
                    m.setProperty("copyDate", info.copyDate); // 3
                    m.setProperty("copiedURL", info.copiedURL);
                    m.setProperty("license", license);
                }
            }
        }
        ...

1

Iterate over the members of Collection "/Author Blurbs".

2

If an entry having the same name as current LibraryMember m is found in HashMap nameToInfo, add the "copyDate", "copiedURL" and "license" properties to LibraryMember m.

HashMap nameToInfo maps Strings (LibraryMember names) to Info objects.

    private static final class Info {
        public final Date copyDate;
        public final URL copiedURL;

        public Info(Date copyDate, URL copiedURL) {
            this.copyDate = copyDate;
            this.copiedURL = copiedURL;
        }
    }

The content of HashMap nameToInfo is parsed from LocalCopyInfo.txt. The value of String license is loaded from License.txt.

3

Method LibraryMember.setProperty can be used to add a new property or to replace the value of an existing one.
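
Since a property value may be any Object implementing java.io.Serializable, the three kinds of values used by AddMeta (java.util.Date, java.net.URL and String) all qualify. The following stand-alone sketch checks this with a plain Java serialization round trip. How Qizx actually stores property values is not described here; the PropertyValueSketch class, the example URL and the license text are placeholders chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.net.URI;
import java.net.URL;
import java.util.Date;

// Hypothetical sketch; unrelated to the way Qizx stores properties internally.
final class PropertyValueSketch {
    /** Serializes a value to bytes and reads it back, as a persistent store might do. */
    static Object roundTrip(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        Date copyDate = new Date(0L);
        URL copiedURL = URI.create("https://example.org/article").toURL(); // placeholder URL
        String license = "placeholder license text";

        System.out.println(copyDate.equals(roundTrip(copyDate)));
        // Compare URLs by their text form to avoid any host name resolution.
        System.out.println(copiedURL.toExternalForm()
                           .equals(((URL) roundTrip(copiedURL)).toExternalForm()));
        System.out.println(license.equals(roundTrip(license)));
    }
}
```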

8.1. Compiling and running the code of this lesson

  • Compile class AddMeta by executing ant (see build.xml) in the docs/samples/programming/addmeta/ directory.

  • Run ant run in the docs/samples/programming/addmeta/ directory to add the metadata found in LocalCopyInfo.txt to the corresponding Documents of Collection "/Author Blurbs".

  • Run ant run3 in the docs/samples/programming/query/ directory to execute 30.xq, a query making use of some of the properties we have just added:

    (: List the original, Wikipedia, URLs of author blurbs containing 
       word "Russian" and copied locally after September 15, 2007. :)
    declare namespace html = "http://www.w3.org/1999/xhtml";
    
    for $doc in xlib:query-properties("/Author Blurbs/*.xhtml",
                                      copyDate ge xs:date("2007-09-15"))
    where $doc/*[ft:contains("Russian")]
    return xlib:get-property($doc, "copiedURL")

    xlib:query-properties and xlib:get-property are XQuery extension functions, specific to Qizx, documented in Chapter 14, XML Library extension functions.

9. Convenience and utility classes provided by the API

In addition to the main packages com.qizx.api and com.qizx.api.fulltext, the Java API of Qizx provides packages containing miscellaneous utilities: their root is com.qizx.api.util, and there are specialized sub-packages.

This section is a short presentation of the main features of these packages. For more details, please consult the Javadoc documentation.

9.1. Package com.qizx.api.util

The main package contains implementations of API interfaces and some useful adapters.

Default Implementations

  • DefaultModuleResolver is the default implementation of ModuleResolver. By plugging a subclass of DefaultModuleResolver in a LibraryManager or an XQuerySessionManager, it is possible to change the way modules are accessed.

Implementations of XMLPushStream

XMLPushStream is a generic interface which is roughly equivalent to SAX2. Qizx uses it rather than using SAX2 because SAX2 is not well adapted to the XQuery Data Model. Adapters to and from SAX2 are provided.

XMLPushStream allows transferring XML content in a "push" style. It is typically used to export a Node item, but it can also be used to compute and store a Document into an XML Library.

  • The most useful implementation is XMLSerializer. It allows transforming XML content to a character stream.

  • Adapter to SAX: PushStreamToSAX converts to a flow of SAX2 events.

  • Adapter to DOM: PushStreamToDOM builds a DOM document.

  • Builder of internal XML Data Model: CorePushBuilder allows creating a representation of the internal Data Model that can be accessed through the com.qizx.api.Node interface.

There is also SAXToPushStream, a reverse adapter from SAX to XMLPushStream which is internally used for loading XML documents into a database using the standard JAXP interface.
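
For readers unfamiliar with the "push" style, the following stand-alone sketch shows it with plain SAX2, to which XMLPushStream is said above to be roughly equivalent: the producer (here, the JDK parser) calls the consumer once per event. It does not use XMLPushStream itself, and the PushStyleSketch class is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Plain SAX2 illustration of push-style processing; XMLPushStream is not used here.
final class PushStyleSketch {
    /** Parses xml and returns the element events in the order they were pushed. */
    static List<String> trace(String xml) {
        final List<String> events = new ArrayList<>();
        DefaultHandler consumer = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                events.add("start " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end " + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), consumer);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(trace("<author><fullName>Kilgore Trout</fullName></author>"));
    }
}
```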

Adapters for JAXP transformations

NodeSource is a subclass of javax.xml.transform.sax.SAXSource. It can be used to pass a Qizx Node as a source to any XSLT engine supporting JAXP (for example Saxon and Xalan).

Conversely, PushStreamResult is a subclass of class javax.xml.transform.sax.SAXResult, wrapping any XMLPushStream. Its typical use is with Library.beginImportDocument: it allows directly reimporting the result of an XSLT transformation into a Document of a database (XML Library).
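
The place of these two adapters in a JAXP transformation can be seen in the following stand-alone sketch, which uses only the standard javax.xml.transform API and the XSLT processor bundled with the JDK. A StreamSource and a StreamResult are used so that the example runs without Qizx; in a real application, a NodeSource would take the place of the Source and a PushStreamResult the place of the Result. The JaxpPipelineSketch class, the stylesheet and the sample input are assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Standard JAXP only; the stylesheet and the input document are sample data.
final class JaxpPipelineSketch {
    static final String XSLT =
        "<xsl:stylesheet version='1.0' "
        + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='/author'>"
        + "<name><xsl:value-of select='fullName'/></name>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the sample stylesheet to xml and returns the serialized result. */
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            // With Qizx, a NodeSource would be used here instead.
            Source source = new StreamSource(new StringReader(xml));

            // With Qizx, a PushStreamResult would be used here instead.
            StringWriter buffer = new StringWriter();
            Result result = new StreamResult(buffer);

            transformer.transform(source, result);
            return buffer.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
            transform("<author><fullName>Kilgore Trout</fullName></author>"));
    }
}
```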

9.2. Package com.qizx.api.util.fulltext

Default Implementations

Implementations of full-text interfaces:

  • DefaultFullTextFactory

  • DefaultTextTokenizer

  • DefaultScorer

Utilities

  • FullTextHighlighter is an iterator extending XMLPullStream which distinguishes the terms of a full-text query. It is basically used for implementing the ft:highlight extension function.

  • FullTextSnippetExtractor extracts a snippet from a document or an XML Node, attempting to show most of the terms of a full-text query within N words (by default 20). It is also an iterator extending XMLPullStream. It is used for implementing the ft:snippet extension function.

9.3. Package com.qizx.api.util.accesscontrol

Contains a simple Unix-like implementation of interface AccessControl.



[3] “In theory, it is kind of like Make, without Make's wrinkles” say its authors.

[4] The name of the Library used in this tutorial is "Tutorial".

[5] Remember that a Library is at the same time a database and the transactional session used to modify and/or query this database.

[6] Qizx has the ACID capabilities of a transactional database: Atomicity, Consistency, Isolation, Durability.

[7] At worst, lock the root Collection.

[8] Libraries are relatively lightweight objects. Opening a Library is cheap in terms of memory and CPU usage.

[9] "An Introduction to StAX" by Elliotte Rusty Harold. Recommended StAX implementation: Woodstox.

[10] Kilgore Trout is not an actual author. This is the pseudonym used by Philip José Farmer to write the "Venus on the Half-Shell" Science-Fiction novel.

[11] Document Object Model. Actually the term used in the XML Query literature is XQuery/XPath2 Data Model or DM for short.

[12] Not a javax.xml.namespace.QName as found in the Java™ runtime, starting from version 1.5.

[13] Run ant run2 in the docs/samples/programming/query/ directory to check that.

[14] Data about data.