Qizx is an XML Query database engine designed to be embedded in a Java™ application, typically a Servlet. As such, it is primarily used as a class library (see the chapter Programming with the Qizx API for an introduction).
To help with experimenting with XML Query and XML databases, and with developing applications, Qizx also comes with two tools. They make it easy to build a database, populate it with XML documents, and run queries on it without programming (except, of course, in XML Query):
Qizx Studio, a graphical tool featuring an explorer view for browsing the contents of a group of XML Libraries, plus a simple XML Query workbench with which you can write and execute XML Query scripts, and view the results.
qizx, a command-line tool which can be used to create and maintain XML Libraries, or simply to execute XML Query script files.
In this chapter you'll learn in 6 lessons how these two tools can be used to achieve the most common tasks:
Lesson 1: how to create a database (called an XML Library).
Lesson 2: how to populate a database with Collections and Documents.
Lesson 3: how to extract copies of Documents stored in a database.
Lesson 4: how to query a database.
Lesson 5: how to delete a Document, a Collection or a whole Library.
Lesson 6: how to use metadata (properties) on Documents or Collections.
This chapter is intended for programmers or experienced users who have a good knowledge of XML and at least a basic knowledge of XQuery.
The directory docs/samples/book_data/ contains several kinds of XML documents. These short, simple XML documents (a few dozen) serve no other purpose than teaching how to use the Qizx API. In real life, Qizx can be expected to store and query hundreds of thousands of XML documents of various sizes, ranging from a few hundred bytes to several hundred megabytes.
Each document in the sub-directory Books contains the description of a science-fiction book: its title, authors, editions, etc. Example, docs/samples/book_data/Books/The_Robots_of_Dawn.xml:
<book xmlns="http://www.qizx.com/namespace/Tutorial">
  <title>The Robots of Dawn</title>
  <author>Isaac Asimov</author>
  <publicationDate>MCMLXXXIII</publicationDate>
  <editions>
    <edition>
      <ISBN>0553299492</ISBN>
      <publisher>Doubleday</publisher>
      <language>English</language>
      <year>1983</year>
    </edition>
  </editions>
</book>
Each document in the sub-directory Publishers contains the description of a publisher: its name, address, etc. Example, docs/samples/book_data/Publishers/Doubleday.xml:
<publisher xmlns="http://www.qizx.com/namespace/Tutorial">
  <trademark>Doubleday</trademark>
  <company>Random House, Inc.</company>
  <address xml:space="preserve">1540 Broadway New York, NY 10036 US</address>
</publisher>
Each document in the sub-directory Authors contains the description of a science-fiction author: her or his name, pseudonyms, birth date, etc. Example, docs/samples/book_data/Authors/iasimov.xml:
<author xmlns="http://www.qizx.com/namespace/Tutorial" nationality="US" gender="male">
  <fullName>Isaac Asimov</fullName>
  <pseudonyms>
    <pseudonym>Paul French</pseudonym>
    <pseudonym>George E. Dale</pseudonym>
  </pseudonyms>
  <birthDate>January 2, 1920</birthDate>
  <birthPlace>
    <city>Petrovichi</city><country>Russian SFSR</country>
  </birthPlace>
  <blurb location="../Author%20Blurbs/Isaac_Asimov.xhtml"/>
</author>
Each document in the sub-directory Author Blurbs is an XHTML page copied from a Wikipedia article describing a science-fiction author. Example, docs/samples/book_data/Author Blurbs/Isaac_Asimov.xhtml:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" dir="ltr" lang="en">
  <head>
    ...
    <title>Isaac Asimov - Wikipedia, the free encyclopedia</title>
    ...
  </body>
</html>
The XHTML DTD and the corresponding XML Catalog are found in docs/samples/xhtml_dtd/.
In Qizx, a database is called an XML Library. Physically, a Library is stored in a directory on a disk. There is no limit to the number of Libraries that can be created with Qizx.
A Qizx engine can actually handle several Libraries at the same time. This allows better sharing of resources when an application needs to handle several Libraries.
A Library Group is simply a bundle of Libraries grouped together inside a parent directory. A Library group can be opened or created in a single operation by a Qizx engine.
A Library is normally part of a Library Group. This is not a hard and fast rule: a Library can be opened independently and can even belong to several groups[1].
In practice, you will likely use a single Library at a time. It is rarely useful to create two or more Libraries, unless you really want to have separate sets of data for your applications; indexing issues can be a reason too (see the chapter Configuring the indexing process for more details).
On Windows, the directory bin inside the Qizx distribution contains an executable qizxstudio.exe (or qizxstudio.bat), which can be started directly by double-clicking it.
On Linux or Mac OS X or other Unix, a shell script bin/qizxstudio can be started from a console window or from a graphic explorer.
Note that when started from a console, Qizx Studio accepts command-line arguments, for example to directly open a Library group or load an XML Query script in the editor. See the reference documentation.
You should then see a window looking like this:
There are two tabs in Qizx Studio: "XQuery" for entering and running queries, and "XML Libraries" for browsing and modifying XML Libraries.
The header [No XML Libraries] means that Qizx Studio has not yet opened any Library group. It is still possible to execute XQuery scripts, but without access to a library.
To create a Library Group, right-click on the library group icon, choose the corresponding menu item, and select a directory for the group, for example "C:\works\xdb1" (of course it can be whatever you choose).
Then the dialog above asks for the name of the first Library within the group. We assume in the following that the name "scifi" is chosen.
When the Library is created, it contains the root collection, whose path is "/". By clicking on the root collection, you should see its default Metadata properties appear on the right side.
It is possible to create more Libraries with the right-click menu on the icon of the Library Group.
An existing Library Group is opened by using the corresponding menu item and choosing its directory. You can also directly choose the directory of a Library (instead of a group), but in that case you can manage only this single Library.
Note that a Library can be opened by only one instance of a Qizx engine at a time: if you attempt to open it several times you will get an error message complaining that the Library is locked.
To create a collection, right-click on the icon of the root collection and choose the corresponding menu item. If the new collection is named "books", its path is "/books".
The shell script qizx (qizx.bat on Windows) is also located in the bin/ directory of the Qizx distribution. In the following we assume that this bin/ directory is in the PATH environment variable.
In a terminal window, type the following command (on Windows):
qizx -group c:\works\xdb1 -library scifi -create
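The option -group (short form -g) specifies the directory of the Library Group.
The option -library (short form -l) specifies the name of the Library within the group.
The option -create requests the creation of the Library.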
If you explore the directory c:\works\xdb1, you will find a sub-directory corresponding to the Library scifi. You do not need to know the internal structure of a Library, and it should never be altered manually, except for the logs directory, which contains log files.
In this section we use the sample documents provided in docs/samples/book_data/ inside the distribution.
Assuming we have created a Library named 'scifi' as explained above:
Right-click on the icon of the root collection (path '/') and choose the import menu item. A dialog appears.
The import operation is performed in two steps:
Files and directories are selected in an import list, using the corresponding button. The Filter combo-box filters files by the extension of interest (generally .xml).
Here we select the whole directory docs\samples\book_data (or docs/samples/book_data) inside the Qizx distribution. Because we use the filter *.xml, only the files ending with the .xml extension will be selected. After selection, the number of selected documents and their total size in bytes are displayed in the table.
This selection operation can be repeated on directories, or on single XML files. Auxiliary buttons allow editing the list.
Pushing the import button actually starts the import transaction. After completion, the dialog can be closed with the Close button at the bottom.
Parsing errors are displayed in the message window of the dialog. The import speed can reach up to 2 Megabytes per second on a 3 GHz processor for large documents, but a large number of small documents can proportionally slow down this process.
Once the import is finished, you should see something like:
When selecting a directory in the import dialog, the contents of the directory are imported into the current collection. The sub-directory structure of the source is replicated, but the original directory name is not used.
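The mapping rule above can be made concrete with a small sketch. The following Java code uses only the JDK and does not call Qizx: it lists the Library paths that importing a directory would produce under the rule just stated, combined with an extension filter such as *.xml. The class and method names (ImportMapping, toLibraryPath, plan) are hypothetical, and the sketch assumes the target collection is given as an absolute path such as "/".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ImportMapping {
    // Library path for a file found under sourceDir: the name of sourceDir itself is
    // dropped, and its sub-directories become sub-collections of targetCollection.
    static String toLibraryPath(Path sourceDir, Path file, String targetCollection) {
        String base = targetCollection.endsWith("/")
                ? targetCollection.substring(0, targetCollection.length() - 1)
                : targetCollection;
        StringBuilder sb = new StringBuilder(base);
        for (Path part : sourceDir.relativize(file)) {
            sb.append('/').append(part); // Library paths always use '/', whatever the OS
        }
        return sb.toString();
    }

    // Extension filter, comparable to "*.xml" in the import dialog.
    static boolean accepted(Path file, Set<String> extensions) {
        String name = file.getFileName().toString();
        return extensions.stream().anyMatch(name::endsWith);
    }

    // Lists the documents an import of sourceDir would create; nothing is stored.
    static List<String> plan(Path sourceDir, String targetCollection, Set<String> extensions) {
        try (Stream<Path> files = Files.walk(sourceDir)) {
            return files.filter(Files::isRegularFile)
                    .filter(f -> accepted(f, extensions))
                    .map(f -> toLibraryPath(sourceDir, f, targetCollection))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = Paths.get(args.length > 0 ? args[0] : "docs/samples/book_data");
        plan(dir, "/", Set.of(".xml")).forEach(System.out::println);
    }
}
```

For the sample data imported into "/", the file book_data/Books/The_Robots_of_Dawn.xml maps to /Books/The_Robots_of_Dawn.xml, which is consistent with the collection("/Books") and collection("/Authors") calls used in the queries of Lesson 4.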
The sample data also contains some XHTML files which refer to a DTD. If we import them in the same way (after setting the filter to "*.xhtml"), we notice that it takes a significant time (several seconds) although the total size is only a few hundred kilobytes (alternatively, there might be a parse error if you have no access to the network). As explained in the "XML catalogs" sidebar, this is because the system identifier of the DTD is an HTTP location, so the DTD is downloaded from the network.
To avoid this, a suitable catalog can be found in the sample data: docs/samples/xhtml_dtd/catalog.xml. There are two possibilities for enabling the catalog:
Define an environment variable XML_CATALOG_FILES, whose value is a list of paths (or URLs) of catalogs, separated by semicolons. This method works in any context (Qizx Studio, qizx, or application).
In Qizx Studio, there is a dialog to define the list of catalogs more conveniently. Note that the environment variable has priority over this mechanism.
If you use this command:
qizx -g c:\works\xdb1 -l scifi -include .xml -include .xhtml -import / docs\samples\book_data
all files ending with .xml or .xhtml will be imported from the directory docs\samples\book_data into the root collection, in the same way as in Qizx Studio.
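The option -include specifies the extension of the files to import; it can be repeated.
The option -import is followed by the path of the destination collection, then by any number of paths of directories or XML documents, or even HTTP locations (URLs).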
With the qizx tool, a catalog file can be defined with the environment variable XML_CATALOG_FILES, as explained above.
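As an illustration of how a semicolon-separated XML_CATALOG_FILES value can be interpreted, the Java sketch below splits it into catalog locations, turning plain paths into file: URIs and keeping URLs as they are. It is not Qizx's own parsing: the class name CatalogList and the rule used to tell a URL from a Windows drive-letter path are assumptions.

```java
import java.net.URI;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class CatalogList {
    // Assumption: a URL scheme ("http:", "file:") has at least two characters before ':',
    // so a Windows drive letter such as "c:" is treated as a file path.
    private static final Pattern URL_SCHEME = Pattern.compile("^[A-Za-z][A-Za-z0-9+.-]+:");

    // Splits a semicolon-separated list of catalog paths or URLs into URIs.
    static List<URI> parse(String value) {
        List<URI> catalogs = new ArrayList<>();
        if (value == null) {
            return catalogs; // variable not defined
        }
        for (String entry : value.split(";")) {
            String s = entry.trim();
            if (s.isEmpty()) {
                continue; // tolerate empty entries
            }
            catalogs.add(URL_SCHEME.matcher(s).find()
                    ? URI.create(s)                            // already a URL
                    : Paths.get(s).toAbsolutePath().toUri());  // plain path -> file: URI
        }
        return catalogs;
    }

    public static void main(String[] args) {
        for (URI catalog : parse(System.getenv("XML_CATALOG_FILES"))) {
            System.out.println(catalog);
        }
    }
}
```

On Java 9 and later, a plain JDK application could pass such a list of URIs to javax.xml.catalog.CatalogManager.catalogResolver(...) to resolve the XHTML DTD locally; Qizx itself reads the variable on its own.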
Exporting a document: in the XML Library browser, select the document. Then right-clicking the document icon, or using the corresponding button in the document view ("Contents of Document", bottom right), brings up an export dialog:
From the dialog, you can choose several export, or serialization, options:
Encoding: the character encoding of the output.
Method: XML (standard), HTML or XHTML (meaningful only if the document contents are HTML), and Text (all tags are stripped, may be useful to generate code or data using the XML Query language).
Omit XML Declaration: strips the <?xml ... ?> header.
Indentation: makes the output prettier by adding whitespace.
Note that not all standard serialization options are available, only the most common ones. The command-line tool allows for all options implemented by Qizx.
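The export options listed above correspond closely to standard serialization parameters. As a rough illustration outside Qizx, the JDK's identity transformer exposes the same kind of settings through javax.xml.transform.OutputKeys. Note that the XSLT 1.0 serializer built into the JDK defines only the xml, html and text output methods, not xhtml. The class name SerializeDemo is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class SerializeDemo {
    // Re-serializes xml with options comparable to the choices of the export dialog.
    static String serialize(String xml, String method, String encoding,
                            boolean omitXmlDeclaration, boolean indent) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer(); // identity copy
            t.setOutputProperty(OutputKeys.METHOD, method); // "xml", "html" or "text"
            t.setOutputProperty(OutputKeys.ENCODING, encoding);
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, omitXmlDeclaration ? "yes" : "no");
            t.setOutputProperty(OutputKeys.INDENT, indent ? "yes" : "no");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String doc = "<publisher><trademark>Doubleday</trademark>"
                + "<company>Random House, Inc.</company></publisher>";
        System.out.println(serialize(doc, "xml", "UTF-8", false, true));
        System.out.println(serialize(doc, "text", "UTF-8", true, false)); // tags stripped
    }
}
```

As with the Text method of the export dialog, the text method keeps only character data, which is why it is suited to generating code or plain data.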
Exporting a whole Collection is not available currently in Qizx Studio, but it is in the qizx tool.
There are two option switches to control export of documents and collections:
Option -out file defines the export destination: it is a plain file if a Document is exported, and it should be a directory if a Collection is exported. If it exists, it is overwritten; otherwise it is created. This option must come before the -export option.
Option -export member selects a Document or Collection to export. This option should come after -out and the serialization options.
Serialization options are introduced by the switch -X immediately followed by an option name, then, if applicable, the value after a '=' sign. Example: -Xmethod=XHTML -Xencoding=UTF-8.
Serialization options are described in detail in the reference documentation.
Example:
qizx -g c:\works\xdb1 -l scifi -out myexporteddata -Xmethod=Html -export "/Author Blurbs"
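To make the -X syntax concrete, here is a minimal Java sketch that collects -Xname=value switches from an argument list into a map and skips the other arguments. It only illustrates the syntax described above and is not how qizx parses its command line; recording a bare -Xname (no '=') with an empty value is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SerializationSwitches {
    // Collects "-Xname=value" switches in command-line order; other arguments are skipped.
    // A bare "-Xname" is recorded with an empty value (assumption for this sketch).
    static Map<String, String> parse(String... args) {
        Map<String, String> options = new LinkedHashMap<>();
        for (String arg : args) {
            if (!arg.startsWith("-X") || arg.length() == 2) {
                continue;
            }
            String body = arg.substring(2); // e.g. "method=XHTML"
            int eq = body.indexOf('=');
            if (eq < 0) {
                options.put(body, "");
            } else {
                options.put(body.substring(0, eq), body.substring(eq + 1));
            }
        }
        return options;
    }

    public static void main(String[] args) {
        // e.g. java SerializationSwitches -Xmethod=XHTML -Xencoding=UTF-8 -export /Books
        System.out.println(parse(args));
    }
}
```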
In this section we are going to run queries on the database we have just created.
This section assumes you have at least a basic knowledge of the XML Query language.
Note that the directory docs/samples/book_queries/ contains the queries needed to illustrate this lesson.
Qizx Studio currently provides a basic environment for editing and running XML Query queries. Later releases will likely offer debugging facilities.
Let us try this query (which is the contents of the file docs/samples/book_queries/4.xq):
(: Find all books written by French authors. :)
declare namespace t = "http://www.qizx.com/namespace/Tutorial";
for $a in collection("/Authors")//t:author[@nationality = "France"]
for $b in collection("/Books")//t:book[.//t:author = $a/t:fullName]
return
$b/t:title
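The query joins two collections. For each author with the requested nationality, it selects the books in which one of the author elements equals the author's full name; this is an XQuery general comparison, true if any pair of values matches. As a rough analogy outside Qizx, the Java sketch below performs the same nested-loop join over in-memory DOM documents with the JDK's XPath API. Binding the prefix t to the Tutorial namespace plays the role of the declare namespace line: without a namespace-aware parser and that binding, the t: steps would match nothing. The class name AuthorBookJoin and the in-memory document lists are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class AuthorBookJoin {
    static final String NS = "http://www.qizx.com/namespace/Tutorial";

    // Namespace awareness is off by default in JAXP and is needed for the t: steps.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
    }

    // Counterpart of: declare namespace t = "http://www.qizx.com/namespace/Tutorial";
    static XPath tutorialXPath() {
        XPath xp = XPathFactory.newInstance().newXPath();
        xp.setNamespaceContext(new NamespaceContext() {
            @Override public String getNamespaceURI(String prefix) {
                return "t".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
            }
            @Override public String getPrefix(String uri) {
                return NS.equals(uri) ? "t" : null;
            }
            @Override public Iterator<String> getPrefixes(String uri) {
                return NS.equals(uri) ? Collections.singletonList("t").iterator()
                                      : Collections.<String>emptyIterator();
            }
        });
        return xp;
    }

    static NodeList select(XPath xp, String expr, Object context) {
        try {
            return (NodeList) xp.evaluate(expr, context, XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // String values of the selected nodes, in document order.
    static List<String> texts(XPath xp, String expr, Object context) {
        NodeList nodes = select(xp, expr, context);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    // for $a in authors//t:author[@nationality = $nationality]
    // for $b in books//t:book[.//t:author = $a/t:fullName]
    // return $b/t:title   (string values only in this sketch)
    static List<String> titlesByNationality(List<Document> authors, List<Document> books,
                                            String nationality) {
        XPath xp = tutorialXPath();
        List<String> titles = new ArrayList<>();
        for (Document authorDoc : authors) {
            NodeList authorElems = select(xp, "//t:author", authorDoc);
            for (int i = 0; i < authorElems.getLength(); i++) {
                Element a = (Element) authorElems.item(i);
                if (!nationality.equals(a.getAttribute("nationality"))) {
                    continue;
                }
                List<String> fullNames = texts(xp, "t:fullName", a);
                for (Document bookDoc : books) {
                    NodeList bookElems = select(xp, "//t:book", bookDoc);
                    for (int j = 0; j < bookElems.getLength(); j++) {
                        // General comparison: true if any book author equals any full name.
                        if (!Collections.disjoint(texts(xp, ".//t:author", bookElems.item(j)), fullNames)) {
                            titles.addAll(texts(xp, "t:title", bookElems.item(j)));
                        }
                    }
                }
            }
        }
        return titles;
    }
}
```

A nested loop like this scans every book for every matching author; it only mirrors the logic of the query, not the way a database engine evaluates it.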
In Qizx Studio, switch to the XQuery tab, then use the corresponding menu item to load the file mentioned above.
Note that you can also save to a file a query that you have entered or edited in Qizx Studio.
There is a history that allows running former queries again, so it is not necessary to save intermediate experiments.
Then, if you have created the XML Library as indicated in the previous sections, you can use the button Execute to run the query. After execution, we should obtain something similar to this:
Notice that in the picture above the display mode of the right-side view has been changed to "Data Model", using the View combo-box. This makes it easier to see the Data Model structure.
The result sequence contains one item, which is an element t:title whose string value is "Planet of the Apes".
We can for example change the value "France" to "US" in the query, and get a sequence of 8 items.
In the same location, there are a few other queries that you can also try.
The result items in the right-side view can be exported into a file using a button in the header. Notice that the resulting file will not in general be a well-formed XML document.
The Diagnostic view, at the bottom left, contains messages, which can be simple information (execution times) or execution errors.
Compilation and execution errors generally have a link to the location in the source code. By clicking the link, the location of the error is displayed in the editor view.
For more information about the editor and the query history, please see the documentation of Qizx Studio.
To run queries on a particular Library, it is sufficient to specify an XQuery source file on the command line:
qizx -g c:\works\xdb1 -l scifi 4.xq -out results.xml
Of course, as before, we specify the Library with the -g and -l (or -group and -library) switches.
Results are displayed on the console (standard output). Retrieving results into a file works like export, by using -out and serialization options.
In this short section, we will see how to perform the basic tasks of copying, renaming and deleting Documents and Collections.
In Qizx Studio, these tasks are fairly easy to perform: just right-click on the library member to copy, rename or delete, and select the proper menu item.
For copy and rename, you are prompted for a destination path: this path should be inside an existing collection, and should not point to an existing object.
The -delete option switch can be used to delete any Library member (Collection or Document), given its path:
qizx -g c:\works\xdb1 -l scifi -delete /Authors
There are no option switches for renaming and copying. You can resort to a script instead (let us put it in a file named rename.xq):
declare variable $src-member external;
declare variable $dst-member external;

try {
  xlib:rename-member($src-member, $dst-member),
  xlib:commit()
}
catch($err) {
  element error { $err }
}
It is highly recommended to wrap the operation within a try-catch, because the functions xlib:rename-member() and xlib:commit() have side effects. The try-catch extension guarantees that its body (the try clause) is evaluated only once and in the order specified. Otherwise, the expression could be evaluated twice (for the sake of display), and the second rename would fail with an error.
To run the script, use the -D option switch to bind values to the variables $src-member and $dst-member:
qizx -g c:\works\xdb1 -l scifi rename.xq -Dsrc-member=/Authors/iasimov.xml -Ddst-member=/Authors/IsaacAsimov.xml
Of course, the copy operation can be performed in the same way using the extension function xlib:copy-member.
As of version 2.1, Qizx supports the XQuery Update extension. This extension is a powerful mechanism, well integrated with the base XQuery language, that allows modifications at the Node level.
To understand the basics of XQuery Update, we recommend reading our tutorial "XQuery Update for the impatient".
Using XQuery Update in Qizx is straightforward: since XQuery Update is an extension of XQuery, executing an updating script is the same as running any other query. This is very much like SQL, where an UPDATE statement is run in the same way as a simple SELECT.
Qizx is designed for performing fast queries, not fast updates. Its design has deliberately sacrificed the capability to perform fast local updates inside large documents, in order to achieve greater querying speed. So we advise against updating documents larger than about one megabyte. Small documents can be updated as quickly as in any other XML database.
Example 4.1. Delete a Node
Still using the same example data as before, let us suppose we want to remove the third pseudonym of the author Jack Vance:
declare namespace t = "http://www.qizx.com/namespace/Tutorial";

let $a := collection("/Authors")//t:author[t:fullName = "Jack Vance"]
return delete node $a//t:pseudonym[3]
This returns an empty sequence, because updating expressions like delete node, insert node, etc. always return an empty sequence.
In Qizx Studio, a commit is performed automatically, so we only have to check the document /Authors/jvance.xml to see the result. It should now contain 4 pseudonyms instead of 5: the element <pseudonym>Peter Held</pseudonym> should have disappeared.
Example 4.2. Insert the Spanish edition of "Planet of the Apes".
declare default element namespace "http://www.qizx.com/namespace/Tutorial";

let $book := collection("/Books")//book[title = "Planet of the Apes"]
let $e := <edition>
            <ISBN>9788466303736</ISBN>
            <publisher>Suma de Letras</publisher>
            <language>Castellano</language>
            <year>2001</year>
          </edition>
return insert node $e into $book/editions
Notice that here we use "declare default element namespace" so that the inserted nodes <edition>... have the proper namespace.
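The namespace point is not specific to XQuery. For comparison only, here is a Java DOM sketch of the same insertion applied to an in-memory book document: the new elements are created with createElementNS in the Tutorial namespace, which is what "declare default element namespace" achieves in the query. The class and method names are hypothetical, and nothing is stored in a Library.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class InsertEditionSketch {
    static final String NS = "http://www.qizx.com/namespace/Tutorial";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
    }

    // <name>text</name> in the Tutorial namespace. createElement(name) would instead
    // produce an element in no namespace, which queries on t:name would not match.
    static Element element(Document doc, String name, String text) {
        Element e = doc.createElementNS(NS, name);
        e.setTextContent(text);
        return e;
    }

    // XQuery Update leaves the position of "insert node ... into" to the implementation;
    // this sketch simply appends the new <edition> to the book's <editions> element.
    static void insertEdition(Document book, String isbn, String publisher,
                              String language, String year) {
        Element editions = (Element) book.getElementsByTagNameNS(NS, "editions").item(0);
        if (editions == null) {
            throw new IllegalArgumentException("book has no <editions> element");
        }
        Element edition = book.createElementNS(NS, "edition");
        edition.appendChild(element(book, "ISBN", isbn));
        edition.appendChild(element(book, "publisher", publisher));
        edition.appendChild(element(book, "language", language));
        edition.appendChild(element(book, "year", year));
        editions.appendChild(edition);
    }

    public static void main(String[] args) {
        Document book = parse("<book xmlns=\"" + NS + "\"><title>Planet of the Apes</title>"
                + "<editions/></book>");
        insertEdition(book, "9788466303736", "Suma de Letras", "Castellano", "2001");
        System.out.println(book.getElementsByTagNameNS(NS, "edition").getLength()); // 1
    }
}
```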
In this lesson, we will see what Properties are and how they can be useful.
Collections and Documents can hold any number of named properties. Some properties are created automatically by the database engine (we call them system properties), but it is also possible to add properties at will (user properties).
An important aspect is that Properties can be queried: it is possible to run a special type of queries that return a sequence of documents or collections whose properties match the query. This is a very powerful mechanism as we will see below.
A property has a name (a simple name without namespace) and a value. The possible types of a property value are:
Boolean.
Long integer (corresponds to XQuery type xs:integer).
Double.
String.
java.util.Date, a date/time with millisecond precision.
Node, a single node of the XQuery data model, likely an element. This allows a property to contain rich structured information. Furthermore this XML value can be queried much in the same way as normal document content.
Any serializable Java object: this can be used through the Java API, but also in XQuery through the Java Binding mechanism, which allows handling arbitrary Java objects in XQuery.
Two system properties common to Collections and Documents are:
The nature of the Library member: "collection" or "document".
The absolute path of the Library member. Example: "/Author Blurbs/Philip_Jose_Farmer.xhtml".
Properties are sometimes called metadata: this means properties can be used as metadata, that is, data describing data. For example, when specified, the public and system ids of the DTD of the document are stored as system properties. For documents, some statistics are computed automatically and added as properties. The source path or URI of a document and the date of import are also stored as properties.
Predefined properties are described in the reference documentation.
So Properties can be used as user-defined metadata: they provide an easy way to associate information with documents without altering the contents of the documents.
Let's select a document in the Library, say /Authors/iasimov.xml.
In the view Metadata Properties, you should see a list of properties of the document.
By right-clicking on one of the properties and choosing the menu item for adding a property, a dialog should appear. In this dialog, you can enter the name of a new property (here meta-info-1), choose its type (here Node), and enter a fragment of XML as its value.
After the dialog is confirmed, the property should be visible in the property view.
Suppose you want to find all documents which have a meta-info-1 property: go to the XQuery tab, enter this expression in the query editor, then run it.
xlib:query-properties("/Authors", nature="document" and meta-info-1)
You should obtain one item, which is document("/Authors/iasimov.xml").
Remarks:
xlib:query-properties is an extension function which returns a list of those library members which are contained within the collection passed as first argument, and match the boolean expression passed as second argument.
The boolean expression as second argument is standard XQuery, where properties are used as if they were XML elements.
Thus nature="document" should be read as a library member whose property 'nature' is equal to 'document', while meta-info-1 should be read as a member which has a property named meta-info-1.
It is even possible to do a full-text search on a property: for example, use meta-info-1[ft:contains('field1')] or equivalently ft:contains('field1', meta-info-1).
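To clarify these semantics, the following Java sketch models library members as paths with property maps and filters them the way the remarks describe: members below a given collection that match a boolean condition on their properties. The Member class and the queryProperties method are hypothetical stand-ins, not the Qizx API, and the sketch assumes that the collection passed as first argument is not itself part of the result.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class PropertyQuerySketch {
    // Hypothetical stand-in for a library member: a path plus named property values.
    static final class Member {
        final String path;
        final Map<String, Object> properties = new LinkedHashMap<>();

        Member(String path, String nature) {
            this.path = path;
            properties.put("nature", nature); // "collection" or "document"
        }

        Member with(String name, Object value) {
            properties.put(name, value);
            return this;
        }

        Object get(String name) { return properties.get(name); }

        boolean has(String name) { return properties.containsKey(name); }
    }

    // True if path lies strictly below collection (the collection itself is excluded here).
    static boolean isInside(String collection, String path) {
        String prefix = collection.endsWith("/") ? collection : collection + "/";
        return path.length() > prefix.length() && path.startsWith(prefix);
    }

    // Rough analogue of xlib:query-properties(collection, condition).
    static List<Member> queryProperties(List<Member> library, String collection,
                                        Predicate<Member> condition) {
        List<Member> result = new ArrayList<>();
        for (Member m : library) {
            if (isInside(collection, m.path) && condition.test(m)) {
                result.add(m);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Member> library = List.of(
                new Member("/Authors", "collection"),
                new Member("/Authors/iasimov.xml", "document").with("meta-info-1", "field1"),
                new Member("/Authors/jvance.xml", "document"),
                new Member("/Books/The_Robots_of_Dawn.xml", "document").with("meta-info-1", "x"));
        // Counterpart of: nature="document" and meta-info-1
        List<Member> hits = queryProperties(library, "/Authors",
                m -> "document".equals(m.get("nature")) && m.has("meta-info-1"));
        hits.forEach(m -> System.out.println(m.path)); // prints /Authors/iasimov.xml
    }
}
```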
There are no option switches to handle metadata properties in qizx. You have to resort to XQuery scripts using the extension functions described in the next section.
In addition to xlib:query-properties, there are several functions in the xlib: namespace to handle properties: see their description in Chapter 14, XML Library extension functions.
In these functions, the $member parameter can be either a path (String) or a wrapped LibraryMember object, obtained for example through the functions xlib:collection() or xlib:document().
xlib:property-names($member): returns a list of the names of the properties owned by the member.
xlib:get-property($member, $name): returns the value of the property.
xlib:set-property($member, $name, $value): sets the value, creating the property if necessary. If the value is the empty sequence, the property is removed. A call to this function should be committed with the function xlib:commit.
Suppose you want to perform an XQuery query, but only on those documents which are marked with a boolean property latest-version equal to true (this would be a primitive way of doing versioning).
Let us assume the query to perform is //section[ft:contains('prevention AND hazard')] (find a section containing the word hazard and the word prevention).
Then you can write a query like this:
xlib:query-properties("/", latest-version=true())//section[ft:contains('prevention AND hazard')]
Remarks:
The expression above is treated in a slightly special way by Qizx: normally the root of a Path Expression is a sequence of nodes, while here it is a sequence of library members. But Qizx performs an automatic expansion into a set of document nodes.
This mechanism is a powerful way to define a search domain for a query, according to criteria of arbitrary complexity. We will see in the next section a possible use of this capability.
An application of the technique presented in the previous section is the management of custom indexes.
An example: suppose you have documents which contain invoices. You would like to find the invoices where the average item price is greater than a certain value. Let's suppose the average price is computed as follows:
declare function local:average-item-price($invoice)
{
  sum(for $item in $invoice/item return $item/price * $item/quantity)
    div sum($invoice/item/quantity)
};
There are several possibilities:
Perform directly the query using this function:
collection("/invoices")/invoice[ local:average-item-price(.) >= 1000 ]
This can be very slow if there are many items.
Store the average price inside the document: this is not satisfactory, we do not want to pollute our data just for the sake of queries.
The best solution is to use a user property, named for example average-item-price, which contains this value. The property is initialized when the document is created or updated. Then the query can be written as follows, and should be quite fast:
xlib:query-properties("/invoices", average-item-price >= 1000)/invoice
Generally speaking, a custom index is simply a property containing a value that is expensive to compute. This value is initialized once when creating or updating the document. Then it can be used to perform fast queries.
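The custom-index idea can be sketched outside Qizx in a few lines of Java. In the hypothetical class below, the average item price is computed with the same formula as local:average-item-price (sum of price × quantity, divided by the sum of quantities) when an invoice is stored or updated, and queries only read the precomputed value. The names CustomIndexSketch, Item, put and atLeast are illustrative assumptions; in Qizx the value would be stored as a user property with xlib:set-property and queried with xlib:query-properties, as shown above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CustomIndexSketch {
    static final class Item {
        final double price;
        final int quantity;

        Item(double price, int quantity) {
            this.price = price;
            this.quantity = quantity;
        }
    }

    // Same formula as local:average-item-price: sum(price * quantity) div sum(quantity).
    // Yields NaN for an invoice without items, which never passes a >= test.
    static double averageItemPrice(List<Item> items) {
        double total = 0;
        long quantity = 0;
        for (Item item : items) {
            total += item.price * item.quantity;
            quantity += item.quantity;
        }
        return total / quantity;
    }

    // Stand-in for the user property: document path -> precomputed average.
    private final Map<String, Double> averageIndex = new HashMap<>();

    // Called when an invoice is created or updated: the expensive value is computed once here.
    void put(String path, List<Item> items) {
        averageIndex.put(path, averageItemPrice(items));
    }

    // Counterpart of xlib:query-properties("/invoices", average-item-price >= threshold):
    // only the stored values are read, the invoices themselves are not scanned.
    List<String> atLeast(double threshold) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Double> e : averageIndex.entrySet()) {
            if (e.getValue() >= threshold) {
                hits.add(e.getKey());
            }
        }
        hits.sort(null); // natural order, for stable output
        return hits;
    }

    public static void main(String[] args) {
        CustomIndexSketch index = new CustomIndexSketch();
        index.put("/invoices/a.xml", List.of(new Item(1500, 2), new Item(500, 2))); // 4000 / 4
        index.put("/invoices/b.xml", List.of(new Item(10, 3), new Item(2000, 1)));  // 2030 / 4
        System.out.println(index.atLeast(1000)); // [/invoices/a.xml]
    }
}
```

For example, an invoice with items (price 1500, quantity 2) and (price 500, quantity 2) has an average of (3000 + 1000) / 4 = 1000, so it is returned for the threshold 1000, while (30 + 2000) / 4 = 507.5 is not.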