Chapter 4. Getting started

Table of Contents

1. Introduction
1.1. About the data samples used in this tutorial
2. Creating an XML Library
2.1. Creating a Library using Qizx Studio
2.2. Creating a Library using the qizx command-line tool
3. Populating a Library with Collections and Documents
3.1. Importing XML Documents using Qizx Studio
3.2. Importing XML Documents using the qizx tool
4. Exporting Documents from an XML Library
4.1. Using Qizx Studio
4.2. Using the qizx command-line tool
5. Querying a Library
5.1. Writing and running queries with Qizx Studio
5.2. Running queries with the qizx command line tool
6. Copying, Renaming, Deleting Documents and Collections
6.1. Using Qizx Studio
6.2. Using the qizx command-line tool
7. Updating XML Documents
8. Using Metadata Properties
8.1. Properties in Qizx Studio
8.2. Properties in the qizx command-line tool
8.3. Extension functions for Property handling
8.4. Using property queries to restrict the search domain of a standard query
8.5. Custom indexes

1. Introduction

Qizx is an XML Query database engine designed to be embedded in a Java™ application, typically a Servlet. As such, it is primarily used as a class library (see the chapter Programming with the Qizx API for an introduction).

To help you experiment with XML Query and XML databases, and develop applications, Qizx also comes with two tools that make it easy to build a database, populate it with XML documents, and run queries on it without programming (except, of course, in XML Query):

Qizx Studio

A graphical tool featuring an explorer view for browsing the contents of a group of XML Libraries, plus a simple XML Query workbench in which you can write and execute XML Query scripts and view the results.

qizx

A command-line tool which can be used to create and maintain XML Libraries, or simply execute XML Query script files.

In this chapter you will learn, in 6 lessons, how these two tools can be used to perform the most common tasks:

  1. Lesson 1: how to create a database (called XML Library)

  2. Lesson 2: how to populate a database with Collections and Documents.

  3. Lesson 3: how to extract copies of Documents stored in a database.

  4. Lesson 4: how to query a database.

  5. Lesson 5: how to delete a Document, a Collection or a whole Library.

  6. Lesson 6: how to use metadata (properties) on Documents or Collections.

The target audience of this chapter is programmers or experienced users with a good knowledge of XML and at least a basic knowledge of XQuery.

1.1. About the data samples used in this tutorial

The directory docs/samples/book_data/ contains several kinds of XML documents. These short, simple XML documents (a few dozen) serve no purpose other than teaching how to use Qizx. In real life, Qizx can be expected to store and query hundreds of thousands of XML documents of various sizes, ranging from a few hundred bytes to several hundred megabytes.

Books/

Each document found in this directory contains the description of a Science-Fiction book: its title, authors, editions, etc. Example docs/samples/book_data/Books/The_Robots_of_Dawn.xml:

<book xmlns="http://www.qizx.com/namespace/Tutorial">
  <title>The Robots of Dawn</title>
  <author>Isaac Asimov</author>
  <publicationDate>MCMLXXXIII</publicationDate>
  <editions>
    <edition>
      <ISBN>0553299492</ISBN>
      <publisher>Doubleday</publisher>
      <language>English</language>
      <year>1983</year>
    </edition>
  </editions>
</book>
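
Every element in these samples belongs to the namespace http://www.qizx.com/namespace/Tutorial, which is why the queries later in this chapter declare a prefix (t:) for it. As a stand-alone illustration that uses Python's xml.etree.ElementTree rather than Qizx, the sketch below parses a copy of the book document above and shows that an element lookup only matches when the namespace is given.

```python
import xml.etree.ElementTree as ET

# Namespace shared by all tutorial sample documents.
TUTORIAL_NS = "http://www.qizx.com/namespace/Tutorial"
NS = {"t": TUTORIAL_NS}

BOOK_XML = """<book xmlns="http://www.qizx.com/namespace/Tutorial">
  <title>The Robots of Dawn</title>
  <author>Isaac Asimov</author>
  <publicationDate>MCMLXXXIII</publicationDate>
  <editions>
    <edition>
      <ISBN>0553299492</ISBN>
      <publisher>Doubleday</publisher>
      <language>English</language>
      <year>1983</year>
    </edition>
  </editions>
</book>"""


def book_summary(xml_text):
    """Return (title, author, list of ISBNs) for a tutorial book document."""
    root = ET.fromstring(xml_text)
    title = root.findtext("t:title", namespaces=NS)
    author = root.findtext("t:author", namespaces=NS)
    isbns = [e.text for e in root.findall("t:editions/t:edition/t:ISBN", namespaces=NS)]
    return title, author, isbns


root = ET.fromstring(BOOK_XML)
# An unqualified name finds nothing: every element is in the Tutorial namespace.
print(root.find("title"))       # None
print(book_summary(BOOK_XML))   # ('The Robots of Dawn', 'Isaac Asimov', ['0553299492'])
```
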
Publishers/

Each document found in this directory contains the description of a publisher: its name, address, etc. Example docs/samples/book_data/Publishers/Doubleday.xml:

<publisher xmlns="http://www.qizx.com/namespace/Tutorial">
  <trademark>Doubleday</trademark>
  <company>Random House, Inc.</company>
  <address xml:space="preserve">1540 Broadway
New York, NY 10036
US</address>
</publisher>
Authors/

Each document found in this directory contains the description of a Science-Fiction author: her/his name, pseudonyms, birth date, etc. Example docs/samples/book_data/Authors/iasimov.xml:

<author xmlns="http://www.qizx.com/namespace/Tutorial"
  nationality="US" gender="male">
  <fullName>Isaac Asimov</fullName>
  <pseudonyms>
    <pseudonym>Paul French</pseudonym>
    <pseudonym>George E. Dale</pseudonym>
  </pseudonyms>
  <birthDate>January 2, 1920</birthDate>
  <birthPlace>
    <city>Petrovichi</city><country>Russian SFSR</country>
  </birthPlace>
  <blurb location="../Author%20Blurbs/Isaac_Asimov.xhtml"/>
</author>
Author Blurbs/

Each document found in this directory is an XHTML page which is a copy of a Wikipedia article describing a Science-Fiction author. Example docs/samples/book_data/Author Blurbs/Isaac_Asimov.xhtml:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" dir="ltr"
lang="en">
<head>
...
<title>Isaac Asimov - Wikipedia, the free encyclopedia</title>
...
</body>
</html>

The XHTML DTD and the corresponding XML Catalog are found in docs/samples/xhtml_dtd/.

2. Creating an XML Library

In Qizx, a database is called an XML Library. Physically, a Library is stored in a directory on disk. There is no limit to the number of Libraries that can be created with Qizx.

A Qizx engine can actually handle several Libraries at the same time. This allows better sharing of resources when an application needs to handle several Libraries.

A Library Group is simply a bundle of Libraries grouped together inside a parent directory. A Library group can be opened or created in a single operation by a Qizx engine.

A Library is normally part of a Library Group. This is not a hard and fast rule: a Library can be opened independently and can even belong to several groups[1].

In practice, you will likely use a single Library at a time. It is rarely useful to create two or more Libraries unless you really want separate sets of data for your applications; indexing considerations can also be a reason (see the chapter Configuring the indexing process for more details).

2.1. Creating a Library using Qizx Studio

Starting Qizx Studio

  • On Windows, the directory bin inside the Qizx distribution contains an executable qizxstudio.exe (or qizxstudio.bat) that can be started directly by double-clicking it.

  • On Linux or Mac OS X or other Unix, a shell script bin/qizxstudio can be started from a console window or from a graphic explorer.

    Note that when started from a console, Qizx Studio accepts command-line arguments, for example to directly open a Library group or to load an XML Query script in the editor. See the reference documentation.

You should then see a window looking like this:

Figure 4.1. Qizx Studio first launch

  • There are two tabs in Qizx Studio: "XQuery" for entering and running queries, "XML Libraries" for browsing and modifying XML Libraries.

  • The header [No XML Libraries] means that Qizx Studio has not yet opened any Library group. It is still possible to execute XQuery scripts, but without access to a Library.

Creation of the Library

Figure 4.2. Creating an XML Library

  1. Right-clicking the Library group icon and choosing "Create Library Group" brings up a directory selection dialog in which you select a directory (new or empty), assumed here to be "C:\works\xdb1" (of course it can be any directory you choose).

  2. Then the dialog above asks for the name of the first Library within the group. We assume in the following that the name "scifi" is chosen.

  3. When the Library is created, it contains the root collection, whose path is "/". By clicking on the root collection, you should see its default Metadata properties appear on the right side.

  4. It is possible to create more Libraries with the right-click menu on the icon of the Library Group.

  5. To open an existing Library Group, use the menu item "Open XML Library Group" and choose its directory.

    You can also directly choose the directory of a Library (instead of a group), but in that case you can manage only this single library.

    Note that a Library can be opened by only one instance of a Qizx engine at a time: if you attempt to open it several times you will get an error message complaining that the Library is locked.

Creation of a Collection

  1. Right-click the icon of the root collection and choose "Create sub-collection"; you are prompted for the name of a Collection (the name must not contain slashes). The collection is created as a direct child of the root collection. If you choose the name "books", the path of the collection is "/books".

2.2. Creating a Library using the qizx command-line tool

The shell script qizx (qizx.bat on Windows) is also located in the bin/ directory in the Qizx distribution. In the following we assume that this bin/ directory is in the PATH environment variable.

In a terminal window, type the following command (on Windows):

qizx -group c:\works\xdb1 -library scifi -create

  • The option -group (or -g for short) specifies the path of the Library group (here c:\works\xdb1).

  • The option -library (or -l for short) specifies the name of the working Library (here scifi).

  • The option -create tells the tool to create whatever is necessary:

    • If the group does not exist yet, it is created.

    • If the Library scifi does not exist yet, it is created.

    • If both already exist, the -create option has no effect.

If you explore the directory c:\works\xdb1, you will find a sub-directory corresponding to the Library scifi. You do not need to know the internal structure of a Library, and it should never be altered manually, except for the logs directories, which contain log files.

3. Populating a Library with Collections and Documents

In this section we use the sample documents provided in docs/samples/book_data/ inside the distribution.

3.1. Importing XML Documents using Qizx Studio

Assuming we have created a Library named 'scifi' as explained above:

  1. Right-click on the icon of the root collection (path '/') and choose Import Documents. A dialog appears.

  2. The import operation is performed in two steps:

    1. Files and directories are selected in an import list, using the Add File/Folder button.

      The Filter combo-box lets you restrict the selection to the file extension of interest (generally .xml).

      Here we select the whole directory docs/samples/book_data (docs\samples\book_data on Windows) inside the Qizx distribution. Because we use the filter *.xml, only the files ending with the .xml extension are selected. After selection, the number of selected documents and their total size in bytes are displayed in the table.

      This selection operation can be repeated on directories, or on single XML files. The auxiliary buttons Remove and Clear all allow editing the list.

    2. Clicking the Start Import button starts the import transaction.

      After completion, the dialog can be closed with the Close button at bottom.

      Parsing errors are displayed in the message window of the dialog. The import speed can reach about 2 megabytes per second on a 3 GHz processor for large documents, but a large number of small documents makes the process comparatively slower.

  3. Once the import has finished, you should see something like:

    Figure 4.3. Library browser after importing documents

Remark

When selecting a directory in the import dialog, the contents of the directory are imported into the current collection. The sub-directory structure of the source is replicated, but the original directory name is not used.
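The mapping described in this remark can be sketched as a small function. This is a hypothetical Python model of the behavior, not part of Qizx: each selected file keeps its path relative to the selected directory, prefixed by the target collection, while the selected directory's own name is dropped. The suffixes argument mimics the *.xml filter of the dialog.

```python
from pathlib import PurePosixPath


def library_paths(source_dir, files, target="/", suffixes=(".xml",)):
    """Map files selected under `source_dir` to Library member paths.

    Hypothetical model of the import rule: the sub-directory structure below
    `source_dir` is kept, the name of `source_dir` itself is dropped, and
    only files whose extension is in `suffixes` are imported.
    """
    base = PurePosixPath(source_dir)
    collection = PurePosixPath(target)
    paths = []
    for f in files:
        relative = PurePosixPath(f).relative_to(base)
        if relative.suffix in suffixes:
            paths.append(str(collection / relative))
    return paths


FILES = [
    "docs/samples/book_data/Books/The_Robots_of_Dawn.xml",
    "docs/samples/book_data/Author Blurbs/Isaac_Asimov.xhtml",
]
print(library_paths("docs/samples/book_data", FILES))
# ['/Books/The_Robots_of_Dawn.xml']
print(library_paths("docs/samples/book_data", FILES, suffixes=(".xml", ".xhtml")))
# ['/Books/The_Robots_of_Dawn.xml', '/Author Blurbs/Isaac_Asimov.xhtml']
```

This is why the queries later in this chapter refer to collections such as "/Books" and "/Authors" rather than "/book_data/Books".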

Using XML Catalogs

The sample data also contains some XHTML files that refer to a DTD. If we import them in the same way (after setting the filter to "*.xhtml"), we notice that the import takes a significant time (several seconds) although the total size is only a few hundred kilobytes (alternatively, a parse error may occur if you have no network access). As explained in the "XML catalogs" sidebar, this is because the DTD system identifier is an HTTP URL, so the DTD is downloaded from the network.

To avoid this, a suitable catalog can be found in the sample data: docs/samples/xhtml_dtd/catalog.xml. There are two possibilities for enabling the catalog:

  • Define an environment variable XML_CATALOG_FILES, whose value is a list of paths (or URLs) of catalogs, separated by semicolons. This method works in any context (Qizx Studio, qizx, or application).

  • In Qizx Studio, there is a dialog to define the list of catalogs more conveniently: Tools > XML Catalogs. Note that the environment variable takes precedence over this setting.

3.2. Importing XML Documents using the qizx tool

If you use this command:

qizx -g c:\works\xdb1 -l scifi -include .xml -include .xhtml -import / docs\samples\book_data

all files ending with .xml or .xhtml are imported from the directory docs\samples\book_data into the root collection, in the same way as in Qizx Studio.

  • The option -include followed by an extension acts as a file filter, roughly equivalent to the filters in Qizx Studio. Several -include options can be given in a row. There is also a converse -exclude option.

  • The option -import specifies the target collection, which is created automatically if necessary. It can be followed by any number of paths of directories or XML documents, or even HTTP locations (URLs).

Using XML Catalogs

Using the qizx tool, a catalog file can be defined with the environment variable XML_CATALOG_FILES, as explained above.

4. Exporting Documents from an XML Library

4.1. Using Qizx Studio

  • Exporting a document: in the XML Library browser, select the document. Then right-click the document icon, or use the button in the document view ("Contents of Document", bottom right), to bring up an export dialog:

    Figure 4.4. Exporting a Document from an XML Library

    From the dialog, you can choose several export, or serialization, options:

    • Encoding

    • Method: XML (standard), HTML or XHTML (meaningful only if the document contents are HTML), and Text (all tags are stripped, may be useful to generate code or data using the XML Query language).

    • Omit XML Declaration: strips the <?xml header.

    • Indentation: makes the output prettier by adding whitespace.

    • Note that not all standard serialization options are available, only the most common ones. The command-line tool allows for all options implemented by Qizx.

  • Exporting a whole Collection is not available currently in Qizx Studio, but it is in the qizx tool.

4.2. Using the qizx command-line tool

There are two option switches to control export of documents and collections:

  • Option -out file defines the export destination: a plain file if a Document is exported, or a directory if a Collection is exported. If the destination exists, it is overwritten; otherwise it is created.

    This option must come before the -export option.

  • Option -export member selects a Document or Collection to export.

    This option should come after -out and serialization options.

  • Serialization options are introduced by the switch -X, immediately followed by an option name and then, if applicable, a '=' sign and the value. Example: -Xmethod=XHTML -Xencoding=UTF-8.

    Serialization options are described in detail here.

Example:

qizx -g c:\works\xdb1 -l scifi -out myexporteddata -Xmethod=Html -export "/Author Blurbs"
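If you drive the qizx tool from a script, the -X syntax and the required option order are easy to generate. The sketch below is a minimal Python illustration; the helper names serialization_switches and export_command are hypothetical, and the code only builds the argument list without running qizx.

```python
def serialization_switches(options):
    """Build qizx -X switches, e.g. {"method": "XHTML"} -> ["-Xmethod=XHTML"].

    A value of None emits the bare option name, for options without a value.
    """
    return [f"-X{name}" if value is None else f"-X{name}={value}"
            for name, value in options.items()]


def export_command(group, library, out, member, options=None):
    """Argument list for exporting `member`: -out and -X come before -export."""
    return (["qizx", "-g", group, "-l", library, "-out", out]
            + serialization_switches(options or {})
            + ["-export", member])


cmd = export_command(r"c:\works\xdb1", "scifi", "myexporteddata",
                     "/Author Blurbs", {"method": "Html"})
print(cmd)
# To actually run it (requires qizx on the PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list keeps a member path containing a space, such as "/Author Blurbs", as a single argument without any shell quoting.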

5. Querying a Library

In this section we are going to run queries on the database we have just created.

This section assumes you have at least a basic knowledge of the XML Query language.

Note that the directory docs/samples/book_queries/ contains the queries needed to illustrate this lesson.

5.1. Writing and running queries with Qizx Studio

Qizx Studio currently provides a basic environment for editing and running XML Query queries. Later releases will likely offer debugging facilities.

  • Let us try this query (which is the contents of the file docs/samples/book_queries/4.xq):

    (: Find all books written by French authors. :)
    declare namespace t = "http://www.qizx.com/namespace/Tutorial";
    
    for $a in collection("/Authors")//t:author[@nationality = "France"]
        for $b in collection("/Books")//t:book[.//t:author = $a/t:fullName]
        return 
            $b/t:title
  • In Qizx Studio, switch to the XQuery tab, then use the menu File > Open XQuery to load the file mentioned above.

    Note that you can also save to a file a query that you have entered or edited in Qizx Studio.

    There is a history that allows running former queries again, so it is not necessary to save intermediate experiments.

  • Then, if you have created the XML Library as indicated in the previous sections, you can use the button Execute to run the query. After execution, we should obtain something similar to this:

    Figure 4.5. Result of a query

  • Notice that in the picture above the display mode of the right-side view has been changed to "Data Model", using the View combo-box. This makes it easier to see the Data Model structure.

    The result sequence contains one item, which is an element t:title whose string value is "Planet of the Apes".

    For example, changing the value "France" to "US" in the query yields a sequence of 8 items.

    The same directory contains a few other queries that you can also try.

  • The result items in the right-side view can be exported into a file using a button in the header. Notice that the resulting file will not in general be a well-formed XML document.

  • Diagnostic view: the view at bottom left contains messages, which can be simple information (execution times) or execution errors.

    Compilation and execution errors generally have a link to the location in the source code. Clicking the link displays the location of the error in the editor view.

  • For more information about the editor and the query history, please see the documentation of Qizx Studio.
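The join performed by the query 4.xq can be mimicked outside the database to check one's understanding of it. The Python sketch below is only an illustration using xml.etree.ElementTree: the author and book documents are abridged, hypothetical stand-ins for the samples, and the nested loops follow the two for clauses of the query (for each matching author, each book whose descendant author element equals the author's fullName).

```python
import xml.etree.ElementTree as ET

NS = {"t": "http://www.qizx.com/namespace/Tutorial"}
T = "{http://www.qizx.com/namespace/Tutorial}"

# Abridged, hypothetical stand-ins for a few sample documents.
AUTHORS = [ET.fromstring(x) for x in (
    '<author xmlns="http://www.qizx.com/namespace/Tutorial" nationality="France">'
    '<fullName>Pierre Boulle</fullName></author>',
    '<author xmlns="http://www.qizx.com/namespace/Tutorial" nationality="US">'
    '<fullName>Isaac Asimov</fullName></author>',
)]
BOOKS = [ET.fromstring(x) for x in (
    '<book xmlns="http://www.qizx.com/namespace/Tutorial">'
    '<title>Planet of the Apes</title><author>Pierre Boulle</author></book>',
    '<book xmlns="http://www.qizx.com/namespace/Tutorial">'
    '<title>The Robots of Dawn</title><author>Isaac Asimov</author></book>',
)]


def titles_by_nationality(authors, books, nationality):
    """Nested-loop join mirroring 4.xq: for each author of the given
    nationality, the titles of books whose <author> equals its <fullName>."""
    titles = []
    for doc in authors:
        for a in doc.iter(T + "author"):
            if a.get("nationality") != nationality:
                continue
            full_name = a.findtext("t:fullName", namespaces=NS)
            for bdoc in books:
                for b in bdoc.iter(T + "book"):
                    if any(x.text == full_name for x in b.iter(T + "author")):
                        titles.append(b.findtext("t:title", namespaces=NS))
    return titles


print(titles_by_nationality(AUTHORS, BOOKS, "France"))  # ['Planet of the Apes']
```

In Qizx the same work is done by the query engine with the help of its indexes, so this loop only illustrates the semantics, not the execution strategy.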

5.2. Running queries with the qizx command line tool

  • To run queries on a particular Library, it is sufficient to specify an XQuery source file on the command line:

    qizx -g c:\works\xdb1 -l scifi 4.xq -out results.xml

    As before, the Library is specified with the -g and -l (or -group and -library) switches.

  • By default, results are displayed on the console (standard output). Writing results to a file works like export, using -out and serialization options.

6. Copying, Renaming, Deleting Documents and Collections

In this short section, we will see how to perform the basic tasks of copying, renaming and deleting Documents and Collections.

6.1. Using Qizx Studio

  • In Qizx Studio, these tasks are fairly easy to perform: just right-click on the library member to copy, rename or delete, and select the proper menu item.

  • For copy and rename, you are prompted for a destination path: this path should be inside an existing collection, and should not point to an existing object.

6.2. Using the qizx command-line tool

  • The -delete option switch can be used to delete any Library member given its path (Collection or Document):

    qizx -g c:\works\xdb1 -l scifi -delete /Authors
  • There are no option switches for renaming and copying. You can resort to a script (let us put it in a file named rename.xq):

    declare variable $src-member external;
    declare variable $dst-member external;
    try {
      xlib:rename-member($src-member, $dst-member),
      xlib:commit()
    }
    catch($err) {
     element error { $err }
    }

    Caution

    It is highly recommended to wrap the operation within a try/catch, because the functions xlib:rename-member() and xlib:commit() have side effects. The try-catch extension guarantees that its body (the try clause) is evaluated only once and in the order specified; otherwise the expression could be evaluated twice (for the sake of display), and the second rename would fail with an error.

    To run the script, use the -D option switch to bind a value with the variables $src-member and $dst-member:

    qizx -g c:\works\xdb1 -l scifi rename.xq -Dsrc-member=/Authors/iasimov.xml -Ddst-member=/Authors/IsaacAsimov.xml

    Of course, the copy operation can be performed in the same way using the extension function xlib:copy-member.

7. Updating XML Documents

As of version 2.1, Qizx supports the XQuery Update extension, a powerful mechanism well integrated with the base XQuery language that allows modifications at the node level.

To understand the basics of XQuery Update, we recommend reading our tutorial "XQuery Update for the impatient".

Using XQuery Update in Qizx is straightforward: since XQuery Update is an extension of XQuery, executing an updating script is the same as running any other query. This is much like SQL, where an UPDATE statement is submitted in the same way as a SELECT.

Warning

Qizx is designed for performing fast queries, not fast updates. Its design has deliberately sacrificed the capability to perform fast local updates inside large documents, in order to achieve greater querying speed. So we advise against updating documents larger than about one megabyte. Small documents can be updated as quickly as in any other XML database.

Example 4.1. Delete a Node

Still using the same example data as before, let us suppose we want to remove the third pseudonym of the author Jack Vance:

declare namespace t = "http://www.qizx.com/namespace/Tutorial";

let $a := collection("/Authors")//t:author[t:fullName = "Jack Vance"]
return delete node $a//t:pseudonym[3]

This returns an empty sequence, because updating expressions such as delete node and insert node always return an empty sequence.

In Qizx Studio, a commit is performed automatically, so we only have to check the document /Authors/jvance.xml to see the result. It should now contain 4 pseudonyms instead of 5: the element <pseudonym>Peter Held</pseudonym> should have disappeared.
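Positions in XQuery start at 1, so t:pseudonym[3] selects the third pseudonym. The same edit can be reproduced on an in-memory copy of a document, as the Python sketch below shows with xml.etree.ElementTree. The author document is a hypothetical stand-in: only the name "Peter Held" (third position) comes from the text above, the other pseudonyms are placeholders.

```python
import xml.etree.ElementTree as ET

T = "{http://www.qizx.com/namespace/Tutorial}"

# Hypothetical stand-in: only "Peter Held" (third) comes from the text above.
AUTHOR = ET.fromstring(
    '<author xmlns="http://www.qizx.com/namespace/Tutorial">'
    '<fullName>Jack Vance</fullName><pseudonyms>'
    '<pseudonym>Pen name 1</pseudonym><pseudonym>Pen name 2</pseudonym>'
    '<pseudonym>Peter Held</pseudonym><pseudonym>Pen name 4</pseudonym>'
    '<pseudonym>Pen name 5</pseudonym></pseudonyms></author>')


def delete_pseudonym(author, position):
    """Remove the pseudonym at 1-based `position`, like t:pseudonym[position]."""
    # Collect the parents first: the tree must not change while iter() runs.
    for parent in list(author.iter(T + "pseudonyms")):
        children = parent.findall(T + "pseudonym")
        if 1 <= position <= len(children):
            parent.remove(children[position - 1])


delete_pseudonym(AUTHOR, 3)
names = [p.text for p in AUTHOR.iter(T + "pseudonym")]
print(len(names), "Peter Held" in names)  # 4 False
```
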


Example 4.2. Insert the Spanish edition of "Planet of the Apes".

declare default element namespace "http://www.qizx.com/namespace/Tutorial";

let $book := collection("/Books")//book[title ="Planet of the Apes"]
let $e := <edition>
   <ISBN>9788466303736</ISBN>
   <publisher>Suma de Letras</publisher>
   <language>Castellano</language>
   <year>2001</year>
 </edition> 
return insert node $e into $book/editions

Notice that here we use "declare default element namespace" so that the inserted nodes <edition>... have the proper namespace.
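The namespace point can be seen outside Qizx as well. In the Python sketch below (xml.etree.ElementTree, with an abridged, hypothetical book document), every new element is created with its full namespace-qualified name, which is what the default element namespace declaration achieves in the XQuery version; an element created as a plain "edition" would be in no namespace and would not be found by namespace-aware lookups.

```python
import xml.etree.ElementTree as ET

T = "{http://www.qizx.com/namespace/Tutorial}"

# Abridged, hypothetical stand-in for the "Planet of the Apes" book document.
BOOK = ET.fromstring(
    '<book xmlns="http://www.qizx.com/namespace/Tutorial">'
    '<title>Planet of the Apes</title><editions/></book>')


def insert_edition(book, isbn, publisher, language, year):
    """Append an <edition> whose elements are all in the Tutorial namespace."""
    editions = book.find(T + "editions")
    edition = ET.SubElement(editions, T + "edition")
    for name, value in (("ISBN", isbn), ("publisher", publisher),
                        ("language", language), ("year", year)):
        ET.SubElement(edition, T + name).text = value
    return edition


insert_edition(BOOK, "9788466303736", "Suma de Letras", "Castellano", "2001")
print([e.text for e in BOOK.iter(T + "language")])  # ['Castellano']
```
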


8. Using Metadata Properties

In this lesson, we will see what Properties are and how they can be useful.

Collections and Documents can hold any number of named properties. Some properties are created automatically by the database engine (we call them system properties), but it is also possible to add properties at will (user properties).

An important aspect is that Properties can be queried: it is possible to run a special type of query that returns a sequence of documents or collections whose properties match the query. This is a very powerful mechanism, as we will see below.

A property has a name (a simple name without namespace) and a value. The possible types of a property value are:

  • Boolean.

  • Long integer (corresponds to XQuery type xs:integer).

  • Double.

  • String.

  • java.util.Date, a date/time with millisecond precision.

  • Node, a single node of the XQuery data model, likely an element. This allows a property to contain rich structured information. Furthermore this XML value can be queried much in the same way as normal document content.

  • Any serializable Java object: this can be used through the Java API, but also in XQuery through the Java Binding mechanism, which allows handling arbitrary Java objects in XQuery.

Two system properties common to Collections and Documents are:

nature

The nature of the Library member: "collection" or "document".

path

The absolute path of the Library member. Example: "/Author Blurbs/Philip_Jose_Farmer.xhtml".

Properties are sometimes called metadata: this means properties can be used as metadata, that is, data describing data. For example, when specified, the public and system ids of the DTD of the document are stored as system properties. For documents, some statistics are computed automatically and added as properties. The source path or URI of a document and the date of import are also stored as properties.

Note

Predefined properties are described in the reference documentation.

So Properties can be used as user-defined metadata: they provide an easy way to associate information with documents without altering the contents of the documents.

8.1. Properties in Qizx Studio

  • Let's select a document in the Library, say /Authors/iasimov.xml.

  • In the view Metadata Properties, you should see a list of properties of the document.

  • Right-click one of the properties and choose "Add Property"; a dialog appears:

    Figure 4.6. Adding a new property 'meta-info-1'

  • In this dialog, you can enter the name of a new property (here meta-info-1), choose its type (here Node), and enter a fragment of XML as its value.

  • After clicking OK, the property should be visible in the property view.

Using properties in a query

Suppose you want to find all documents which have a meta-info-1 property: go to the XQuery tab and enter this expression in the query editor, then run it.

xlib:query-properties("/Authors", nature="document" and meta-info-1)

You should obtain one item which is document("/Authors/iasimov.xml").

Remarks:

  • xlib:query-properties is an extension function which returns a list of those library members which are contained within the collection passed as first argument, and match the boolean expression passed as second argument.

  • The boolean expression as second argument is standard XQuery, where properties are used as if they were XML elements.

    Thus nature="document" should be read as a library member whose property 'nature' is equal to 'document', while meta-info-1 should be read as a member which has a property named meta-info-1.

    It is even possible to do a full-text search on a property: for example use meta-info-1[ft:contains('field1')] or equivalently: ft:contains('field1', meta-info-1).
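To make these semantics concrete, here is a toy in-memory model in Python. It is not the Qizx API: members are represented as a dictionary from path to property values, the first argument restricts results to members below a collection, and a Python predicate stands in for the XQuery boolean expression. Whether the collection itself is also tested is not stated above; this toy simply excludes it.

```python
def query_properties(members, root, predicate):
    """Toy model of xlib:query-properties (not the Qizx API).

    members:   dict mapping member path -> dict of property values
    root:      collection path; only members strictly below it are tested
               (an assumption of this toy)
    predicate: Python function standing in for the XQuery boolean expression
    """
    prefix = root.rstrip("/") + "/"
    return [path for path, props in members.items()
            if path != root and path.startswith(prefix) and predicate(props)]


MEMBERS = {
    "/Authors": {"nature": "collection"},
    "/Authors/iasimov.xml": {"nature": "document", "meta-info-1": "<field1/>"},
    "/Authors/jvance.xml": {"nature": "document"},
    "/Books/The_Robots_of_Dawn.xml": {"nature": "document", "meta-info-1": "x"},
}

# Counterpart of: nature="document" and meta-info-1
result = query_properties(MEMBERS, "/Authors",
                          lambda p: p.get("nature") == "document" and "meta-info-1" in p)
print(result)  # ['/Authors/iasimov.xml']
```
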

8.2. Properties in the qizx command-line tool

There are no option switches to handle metadata properties in qizx. You have to resort to XQuery scripts using the extension functions described in the next section.

8.3. Extension functions for Property handling

In addition to xlib:query-properties, there are several functions in the xlib: namespace to handle properties: see their description in Chapter 14, XML Library extension functions.

In these functions, the $member parameter can be either a path (String) or a wrapped LibraryMember object obtained, for example, through the functions xlib:collection() or xlib:document().

xlib:property-names ($member)

Returns a list of the names of the properties owned by the member.

xlib:get-property ($member, $name)

returns the value of the property.

xlib:set-property ($member, $name, $value)

Sets the value, creating the property if necessary. If the value is the empty sequence, the property is removed.

A call to this function should be committed with the function xlib:commit.
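The behavior of these three functions can be pictured with a toy Python class. It is not the Qizx API, and it makes one simplifying assumption beyond the text above: changes made with set_property are only staged, and take effect when commit() is called (echoing xlib:commit). None plays the role of the empty sequence.

```python
class ToyMember:
    """Toy model of a member's properties (not the Qizx API).

    Simplifying assumption: set_property only stages a change, and staged
    changes take effect when commit() is called, echoing xlib:commit.
    """

    def __init__(self, properties=None):
        self._committed = dict(properties or {})
        self._staged = {}

    def property_names(self):             # cf. xlib:property-names
        return sorted(self._committed)

    def get_property(self, name):         # cf. xlib:get-property
        return self._committed.get(name)

    def set_property(self, name, value):  # cf. xlib:set-property
        # None stands for the empty sequence, which removes the property.
        self._staged[name] = value

    def commit(self):                     # cf. xlib:commit
        for name, value in self._staged.items():
            if value is None:
                self._committed.pop(name, None)
            else:
                self._committed[name] = value
        self._staged.clear()


member = ToyMember({"nature": "document", "path": "/Authors/iasimov.xml"})
member.set_property("meta-info-1", "<info/>")
member.commit()
print(member.property_names())  # ['meta-info-1', 'nature', 'path']
member.set_property("meta-info-1", None)
member.commit()
print(member.property_names())  # ['nature', 'path']
```
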

8.4. Using property queries to restrict the search domain of a standard query

Suppose you want to perform an XQuery query, but only on those documents that are marked with a boolean property latest-version equal to true (a primitive way of doing versioning).

Let us assume the query to perform is //section[ft:contains('prevention AND hazard')] (find a section containing the word hazard and the word prevention).

Then you can write a query like this:

xlib:query-properties("/", latest-version=true())//section[ft:contains('prevention AND hazard')]

Remarks:

  • The expression above is treated in a slightly special way by Qizx: normally the root of a Path Expression is a sequence of nodes, while here it is a sequence of library members. But Qizx performs an automatic expansion into a set of document nodes.

  • This mechanism is a powerful way to define a search domain for a query, according to criteria of arbitrary complexity. We will see in the next section a possible use of this capability.

8.5. Custom indexes

An application of the technique presented in the previous section is the management of custom indexes.

An example: suppose you have documents which contain invoices. You would like to find the invoices where the average item price is greater than a certain value. Let's suppose the average price is computed as follows:

declare function local:average-item-price($invoice) {
  sum(for $item in $invoice/item return $item/price * $item/quantity)
     div sum($invoice/item/quantity)
};
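Note that this function computes a quantity-weighted average (total amount divided by the number of units), not the plain mean of the listed prices. A Python transcription for checking the arithmetic is sketched below; representing an invoice as a list of (price, quantity) pairs is a hypothetical simplification of the item, price and quantity elements.

```python
def average_item_price(items):
    """Quantity-weighted average: sum(price * quantity) div sum(quantity).

    `items` is a list of (price, quantity) pairs standing in for <item>
    elements with <price> and <quantity> children.
    """
    total_quantity = sum(quantity for _, quantity in items)
    if total_quantity == 0:
        raise ValueError("invoice has no items")
    return sum(price * quantity for price, quantity in items) / total_quantity


# One unit at 1500 and three at 300: (1500 + 900) / 4 = 600.0,
# whereas the plain mean of the two prices would be 900.
print(average_item_price([(1500, 1), (300, 3)]))  # 600.0
```
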

There are several possibilities:

  1. Perform directly the query using this function:

    collection("/invoices")/invoice[ local:average-item-price(.) >= 1000 ]

    This can be very slow if there are many items.

  2. Store the average price inside the document: this is not satisfactory, we do not want to pollute our data just for the sake of queries.

  3. The best solution is to use a user property, named for example average-item-price, which contains this value. The property is initialized when the document is created or updated. The query can then be written as follows, and should be quite fast:

    xlib:query-properties("/invoices", average-item-price >= 1000)/invoice

Generally speaking, a custom index is simply a property containing a value that is expensive to compute. This value is initialized once when creating or updating the document. Then it can be used to perform fast queries.
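The principle can be illustrated with a toy Python store. This is not the Qizx API: the class InvoiceStore and its methods are hypothetical, invoices are (price, quantity) pairs, and a dictionary of properties stands in for the user property average-item-price. The expensive value is computed on save, and queries only read the stored property.

```python
def average_item_price(items):
    """Quantity-weighted average price of (price, quantity) pairs."""
    return sum(p * q for p, q in items) / sum(q for _, q in items)


class InvoiceStore:
    """Toy store keeping a custom-index property per invoice (not the Qizx API)."""

    def __init__(self):
        self.contents = {}    # member path -> list of (price, quantity)
        self.properties = {}  # member path -> {property name: value}

    def save(self, path, items):
        # The expensive value is computed once, when the document is
        # created or updated, and stored as a property.
        self.contents[path] = items
        self.properties[path] = {"average-item-price": average_item_price(items)}

    def query(self, minimum):
        # Only the stored properties are read; invoice contents are not scanned.
        return [path for path, props in self.properties.items()
                if props["average-item-price"] >= minimum]


store = InvoiceStore()
store.save("/invoices/a.xml", [(1500, 1), (300, 3)])  # average 600.0
store.save("/invoices/b.xml", [(2000, 2)])            # average 2000.0
print(store.query(1000))  # ['/invoices/b.xml']
store.save("/invoices/a.xml", [(1500, 3)])            # update: average 1500.0
print(store.query(1000))  # ['/invoices/a.xml', '/invoices/b.xml']
```

The trade-off is the usual one for any index: a little extra work on every create or update, in exchange for queries that never recompute the value.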



[1] This is a more advanced topic, not yet fully documented.