These general-purpose functions belong to the namespace denoted by the predefined "x:" prefix. The x: prefix refers to the namespace "com.qizx.functions.ext".
Serialization, the process of converting XML nodes into a stream of characters, is defined in the XQuery 1.0 specifications; however, there is no standard function for performing serialization.
x:serialize can output a document or a node as XML, HTML, XHTML or plain text, to a file or to the default output stream.
x:serialize($node as node() [, $options as element(options)]) as xs:string?
Description: Serializes the element and all its content into text. The output can be a file (see options below).
Parameter $node: an XML tree to be serialized to text.
Parameter $options: an element bearing options in the form of attributes: see below.
Returned value: The path of the output file if specified, otherwise the serialized result.
The options argument (which may be absent) has the form of an element of name "options" whose attributes are used to specify different options. For example:
x:serialize( $doc, <options output="out\doc.xml" encoding="ISO-8859-1" indent="yes"/>)
This mechanism is similar to XSLT's xsl:output specification and is convenient because the options can be computed or extracted from an XML document.
Table 12.1. Implemented serialization options
option name | values | description |
---|---|---|
method | XML (default), XHTML, HTML, or TEXT | output method |
output / file | a file path | output file. If this option is not specified, the generated text is returned as a string. |
version | default "1.0" | version generated in the XML declaration. No validity check. |
standalone | "yes" or "no". | No check is performed. |
encoding | must be the name of an encoding supported by the JRE. | The name supplied is generated in the XML declaration. If different than UTF-8, it forces the output of the XML declaration. |
indent | "yes" or "no" (default "no"). | output indented. |
indent-value (extension) | integer value | specifies the number of space characters used for indentation. |
omit-xml-declaration | "yes" or "no" (default "no"). | controls the output of a XML declaration. |
include-content-type | "yes" or "no" (default "no"). | for XHTML and HTML methods, if the value is "yes", a META element specifying the content type is added at the beginning of element HEAD. |
escape-uri-attributes | "yes" or "no" (default "yes"). | for XHTML and HTML methods, escapes URI attributes (i.e specific HTML attributes whose value is an URI). |
doctype-public | the public ID in the DOCTYPE declaration. | Triggers the output of the DOCTYPE declaration. Must be used together with the doctype-system option. |
doctype-system | the system ID in the DOCTYPE declaration. | Triggers the output of the DOCTYPE declaration. |
auto-dtd (extension) | "yes" or "no" (default "yes"). | If the node is a document node and if this document has DTD information, then output a DOCTYPE declaration. |
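As a minimal sketch of the options in Table 12.1, the following call gives no output or file option, so the serialized text should be returned as a string. The element content is purely illustrative.

```xquery
(: No "output"/"file" option: the serialized text is returned as a string. :)
x:serialize(
  <doc><title>Sample</title><p>Some text</p></doc>,
  <options method="XML" indent="yes" indent-value="2" omit-xml-declaration="yes"/>)
```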
The x:serialize-json function transforms an XML tree representing JSON data into JSON format.
The XML JSON tree is typically built by the x:content-parse function but can also be built with XQuery constructors.
In future versions supporting XQuery 3.0 Maps and Arrays, this function will also be able to serialize such data into JSON format.
x:serialize-json($json-data as item() [, $options as element(options)]) as xs:string?
Description: Serializes the element and all its content into JSON format. The output can be a file (see options below) or a string.
Parameter $json-data: an XML tree representing JSON data to be serialized. This tree must conform to the JSON schema used by Qizx (see below).
Parameter $options: an element bearing options in the form of attributes: see below.
Returned value: The path of the output file if specified, otherwise the serialized result.
The options argument (which may be absent) has the form of an element of name "options" whose attributes are used to specify different options. For example:
x:serialize-json( $doc, <options file="json.xml" />)
with $doc holding an XML document representing JSON data in the Qizx/JSON representation:
<?xml version='1.0'?>
<map xmlns="com.qizx.json">
  <pair name="a">
    <number>1.0</number>
  </pair>
  <pair name="b">
    <array>
      <boolean>true</boolean>
      <string>str</string>
      <map/>
    </array>
  </pair>
  <pair name="nothing">
    <null/>
  </pair>
</map>
then the file json.xml will contain:
{ "a": 1.0, "b": [ true, "str", { } ], "nothing": null }
Table 12.2. Implemented JSON serialization options
option name | values | description |
---|---|---|
method | XML (default) XHTML, HTML, or TEXT | output method |
output / file | a file path | output file. If this option is not specified, the generated text is returned as a string. |
indent | integer value | specifies the number of space characters used for indentation. |
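The input tree can also be built directly with XQuery constructors. The sketch below assumes the Qizx/JSON representation shown in the example above (elements map, pair, number and string in the namespace "com.qizx.json"); the keys and values are illustrative only.

```xquery
(: Build the Qizx/JSON tree by hand. No "file" option is given,
   so the JSON text is returned as a string. :)
let $json :=
  <map xmlns="com.qizx.json">
    <pair name="id"><number>42</number></pair>
    <pair name="label"><string>widget</string></pair>
  </map>
return x:serialize-json($json, <options indent="2"/>)
```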
function x:parse($xml-text) as node()?
Parses a string representing an XML document and returns a node built from that parsing. This is useful for converting a string from any origin into a node.
Note that the function x:eval could be used too (and it is more powerful, since any kind of node can be built with it), but there are some syntax differences: for example, in x:eval the curly braces { and } have to be escaped by doubling them.
Parameter $xml-text: A well-formed XML document as a string.
Returned value: A node of the Data Model if the string could be correctly parsed; the empty sequence if the argument was the empty sequence. An error is raised if there is a parsing error.
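A short usage sketch: the string below is an arbitrary well-formed document, and the returned node is navigated like any other tree.

```xquery
(: Parse an XML string, then navigate the resulting tree. :)
let $text := "<order id='7'><item>pen</item><item>ink</item></order>"
let $doc := x:parse($text)
return count($doc//item)
```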
From version 4.2, Qizx offers a generic mechanism for plugging in Content Importers, i.e. parsers of "semi-structured data": data that is not XML but can easily be transformed into an XML representation, and then stored and manipulated in an XML database such as Qizx.
For example:
Various dialects of HTML can be transformed into XML. The resulting XML can be serialized back into HTML using the x:serialize function above.
JSON can be mapped into XML: Qizx offers a built-in facility for parsing JSON data, using a specific schema for its XML representation.
Parsers for other formats are planned after version 4.2: MIME mail (RFC 822), CSV, and probably some office formats such as RTF.
function x:parse-content($string, $format-name [, $options]) as node()?,
function x:content-parse($string, $format-name [, $options]) as node()?
Parses a string representing semi-structured data in the format named by $format-name and returns a node built from that parsing.
Note: content-parse is the old name for parse-content and will be deprecated.
Parameter $string: The semi-structured data to parse, as a string.
Parameter $format-name: A string naming the Content Importer. For example "html", "json". The recognized names are described for each Content Importer.
Parameter $options: An XML node with an attribute for each option. For example <options namespaces="true"/>
Returned value: A node of the Data Model if the string could be correctly parsed; the empty sequence if the argument was the empty sequence. An error is raised if there is a parsing error.
function x:parse-url-content($url, $format-name [, $options]) as node()?
Parses semi-structured data located at $url, in the format named by $format-name, and returns a node built from that parsing.
Parameter $url: A well-formed URL. Currently supported URL protocols are http: and file: (by default).
Parameter $format-name: A string naming the Content Importer. For example "html", "json". The recognized names are described for each Content Importer.
Parameter $options: An XML node with an attribute for each option. For example <options namespaces="true"/>
Returned value: A node of the Data Model if the string could be correctly parsed; the empty sequence if the argument was the empty sequence. An error is raised if there is a parsing error.
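A usage sketch for x:parse-url-content. The URL is a placeholder, not a real resource; any http: or file: URL pointing at data in the named format should work the same way.

```xquery
(: http://www.example.com/data.json is a hypothetical location. :)
x:parse-url-content("http://www.example.com/data.json", "json")
```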
To invoke the JSON parser, set the $format-name argument to "json" or "text/json".
No options are available to date.
Example:
x:content-parse('{ "a" : 1, b:[true, "str", {}], nothing:null}', "json")
Produces
<?xml version='1.0'?>
<map xmlns="com.qizx.json">
  <pair name="a">
    <number>1.0</number>
  </pair>
  <pair name="b">
    <array>
      <boolean>true</boolean>
      <string>str</string>
      <map/>
    </array>
  </pair>
  <pair name="nothing">
    <null/>
  </pair>
</map>
Schema:
A JSON map is represented by a map element with as many child pair elements as there are key-value pairs in the map.
A pair element has a name attribute holding the key. Its child element represents the value.
A JSON array is represented by an array element with as many child elements as there are array items.
JSON scalar values are represented by the elements boolean, number, and string.
A JSON null value is represented by the empty element null.
All elements use the namespace "com.qizx.json".
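Because every element of the representation is in the "com.qizx.json" namespace, path expressions over a parsed tree need a prefix bound to it. A sketch, in which the prefix j is an arbitrary choice:

```xquery
declare namespace j = "com.qizx.json";

let $tree := x:parse-content('{ "a": 1, "b": [true, "str", {}], "nothing": null }', "json")
return (
  (: the number element holding the value of key "a" :)
  $tree//j:pair[@name = "a"]/j:number,
  (: how many items the array under key "b" contains :)
  count($tree//j:pair[@name = "b"]/j:array/*)
)
```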
HTML parsing is performed by the TagSoup parser, which can parse HTML "as it is found in the wild", i.e. possibly malformed.
To invoke the HTML parser, set the $format-name argument to "html" or "text/html".
Recognized options are either TagSoup options or short names for SAX features.
TagSoup options:
"ignore-bogons": A value of "true" indicates that the parser will ignore unknown elements.
"bogons-empty": A value of "true" indicates that the parser will give unknown elements a content model of EMPTY; a value of "false", a content model of ANY.
"root-bogons": A value of "true" indicates that the parser will allow unknown elements to be the root of the output document.
"default-attributes": A value of "true" indicates that the parser will return default attribute values for missing attributes that have default values.
"translate-colons": A value of "true" indicates that the parser will translate colons into underscores in names.
"restart-elements": A value of "true" indicates that the parser will attempt to restart the restartable elements.
"ignorable-whitespace": A value of "true" indicates that the parser will transmit whitespace in element-only content via the SAX ignorableWhitespace callback. Normally this is not done, because HTML is an SGML application and SGML suppresses such whitespace.
"cdata-elements": A value of "true" indicates that the parser will process the script and style elements (or any elements with type='cdata' in the TSSL schema) as SGML CDATA elements (that is, no markup is recognized except the matching end-tag).
SAX features:
Short names are used: for example "namespaces" is a short name for "http://xml.org/sax/features/namespaces".
"namespaces"
"namespace-prefixes"
"external-general-entities"
"external-parameter-entities"
etc.; see the TagSoup documentation.
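A sketch of an HTML import combining one TagSoup option with one SAX feature short name from the lists above. The markup is deliberately malformed and purely illustrative.

```xquery
(: The input has no closing tags; the HTML importer is expected to repair it. :)
let $html := "<html><body><p>First<p>Second <b>bold"
return x:parse-content($html, "html", <options ignore-bogons="true" namespaces="true"/>)
```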
HTML5 parsing is performed by the parser written by Henri Sivonen and the Mozilla Foundation (c) 2007-2010.
To invoke the HTML5 parser, set the $format-name argument to "html5" or "text/html5".
In addition to SAX features (short names), recognizable options are:
"unicode-normalization-checking"
"html4-mode-compatible-with-xhtml1-schemata"
"mapping-lang-to-xml-lang"
"scripting-enabled"
The x:transform function invokes an XSLT stylesheet on a node and can either retrieve the results of the transformation as a tree or let the stylesheet output the results.
This is a useful feature when one wants to transform a document (for example extracted from the XML Libraries) or a computed fragment of XML into different output formats like HTML, XSL-FO etc.
This example transforms the document $doc and writes the result to the file out\doc.xml:
x:transform( $doc, "ssheet1.xsl", <parameters param1="one" param2="two"/>, <options output-file="out\doc.xml" indent="yes"/>)
The next example returns a new document tree. Suppose we have this very simple stylesheet, which renames the element "doc" to "newdoc":
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="doc">
    <newdoc><xsl:apply-templates/></newdoc>
  </xsl:template>
</xsl:stylesheet>
The following XQuery expression:
x:transform( <doc>text</doc>, "ssheet1.xsl", <parameters/> )
returns:
<newdoc>text</newdoc>
x:transform($source as node(), $stylesheet-URI as xs:string, $xslt-parameters as element(parameters) [, $options as element(options)] ) as node()?
Transforms the source tree through an XSLT stylesheet. If no output file is explicitly specified in the options, the function returns a new tree.
Parameter $source: an XML tree to be transformed. It does not need to be a complete document.
Parameter $stylesheet-URI: the URI of an XSLT stylesheet. Stylesheets are cached and reused for consecutive transformations.
Parameter $xslt-parameters: an element holding parameter values to pass to the XSLT engine. The parameters are specified in the form of attributes. The name of an attribute matches the name of an xsl:param declaration in the stylesheet (namespaces can be used). The value of the attribute is passed to the XSLT transformer.
Parameter $options: [optional argument] an element holding options in the form of attributes: see below.
Returned value: if the path of an output file is not specified in the options, the function returns a new document tree which is the result of the transformation of the source tree. Otherwise, it returns the empty sequence.
Table 12.3. XSLT transform options
option name | values | description |
---|---|---|
output-file | An absolute file path. | Output file. If this option is not specified, the generated tree is returned by the function, otherwise the function returns an empty sequence. |
XSLT output properties (instruction xsl:output): version, standalone, encoding, indent, omit-xml-declaration etc. | | These options are used by the stylesheet for outputting the transformed document. They are ignored if no output-file option is specified. |
Specific options of the XSLT engine (Saxon or default XSLT engine) | | An invalid option may cause an error. |
The connection with an XSLT engine uses generic JAXP interfaces, and thus must copy XML trees passed in both directions. This is not as efficient as it could be and can even cause memory problems if the size of the processed documents is larger than a few dozen megabytes, depending on the available memory size.
The following functions allow dynamically compiling and executing XQuery expressions.
function x:eval( $expression as xs:string ) as xs:any
Compiles and evaluates a simple expression provided as a string.
The expression is executed in the context of the current query: it can use global variables, functions and namespaces of the current static context. It can also use the current item '.' if defined in the evaluation context.
However, there is no access to the local context: for example, if x:eval is invoked inside a function, the arguments and local variables of that function are not visible.
Parameter $expression: a simple expression (it cannot contain prolog declarations).
Returned value: evaluated value of the expression.
Example:
declare variable $x := 1;
declare function local:fun($p as xs:integer) { $p * 2 };
let $expr := "1 + $x, local:fun(3)"
return x:eval($expr)
This should return the sequence (2, 6).
The following functions can be used to quickly estimate the count of documents returned by a query, when an exact count of all results would take too long to compute. They are designed to work on tens of millions of documents.
The estimated count provided by these functions is valid under the following conditions:
There is zero or one result ("hit") of the query per document. The functions count documents, not nodes.
The query is applied to a homogeneous domain (typically a collection): that is, each document in the domain has a chance to match the query (in other words, the domain does not contain documents that cannot match the query and would only distort the count estimation).
Examples: assume collection /Products (the domain) contains only documents whose main node is 'Product'.
The estimated count in the following example would be the size of collection('/Products'):
x:count-estimate(collection('/Products')//Product)
In the following example, the function looks at the first result items and estimates the total number by extrapolation: this would be a fraction of the size of the domain represented by collection('/Products'). The accuracy can be controlled by an optional parameter (see below):
x:count-estimate(collection('/Products')//Product [ price > 10 ])
function x:count-estimate( $query [, $min-count as xs:integer ]) as xs:integer
Returns an estimated count of documents matching the query.
Parameter $query: any query that matches at most one node per document. It should be an expression, which is evaluated within the x:count-estimate function. Passing an already evaluated sequence brings no benefit.
Parameter $min-count: Optional (default value is 200). Controls the accuracy of the count estimation. This is the number of result items enumerated before doing the estimation. The estimated count is then obtained by comparing the current position in the search domain to the size of the domain, and extrapolating. A larger $min-count gives better accuracy, but can lead to slower execution.
Returned value: an integer item. If smaller than $min-count, this value represents the exact count. Otherwise the value is strongly rounded to provide a precision of about 10% (for example 11000 instead of 10653).
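A sketch of the optional $min-count argument, reusing the /Products collection assumed in the examples above. The value 1000 is arbitrary; it trades slower execution for a more accurate estimate.

```xquery
(: Enumerate up to 1000 hits before extrapolating, instead of the default 200. :)
x:count-estimate(collection('/Products')//Product[ price > 10 ], 1000)
```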
function x:paged-query( $page-start as xs:integer, $page-size as xs:integer, $query [, $min-count as xs:integer ]) as item()*
Similar to x:count-estimate() but in addition returns a "page" of result items.
This function could be implemented with x:count-estimate() and subsequence($query, $page-start, $page-size), but it combines the two operations in a slightly more efficient way.
Parameter $query: any query that matches at most one node per document. This expression is in fact a function (or "lambda expression") passed to the x:paged-query function itself. Passing an already evaluated expression (e.g. through a variable) would bring no benefit.
Parameter $page-start: The desired start position in the result sequence.
Parameter $page-size: The desired number of result items.
Parameter $min-count: Optional (default value is 200). Controls the accuracy of the count estimation. This is the number of result items enumerated before doing the estimation. The estimated count is then obtained by comparing the current position in the search domain to the size of the domain, and extrapolating. A larger $min-count gives better accuracy, but can lead to slower execution.
Returned value: A sequence made of an integer item, which is the estimated count exactly as in x:count-estimate(), followed by the items of the page.
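A sketch of how the returned sequence might be split into its two parts. It assumes positions are counted from 1, as with subsequence(), and reuses the /Products collection from the earlier examples; the page wrapper element is hypothetical.

```xquery
(: Third page of ten items, plus the estimated total count. :)
let $result := x:paged-query(21, 10, collection('/Products')//Product[ price > 10 ])
let $estimate := $result[1]             (: first item: the estimated count :)
let $page := subsequence($result, 2)    (: remaining items: the requested page :)
return <page start="21" estimated-total="{ $estimate }">{ $page }</page>
```

The query is written inline rather than bound to a variable first, since an already evaluated sequence would defeat the purpose of the function.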
The following functions match the string-value of nodes (elements and attributes) with a pattern.
Example 1: this expression returns true if the value of the attribute @lang matches the SQL-style pattern:
x:like( "en%", $node/@lang )
Example 2: this expression returns true if the content of the element 'NAME' matches the pattern:
$p/NAME[ x:like( "Theo%" ) ]
function x:like( $pattern as xs:string [, $context-nodes as node()* ]) as xs:boolean
Returns true if the pattern matches the string-value of at least one node in the node sequence argument.
Parameter $pattern: a SQL-style pattern: the wildcard '_' matches any single character, the wildcard '%' matches any sequence of characters.
Parameter $context-nodes: optional sequence of nodes. The function checks sequentially the string-value of each node against the pattern. If absent, the argument defaults to '.', the current item. This makes sense inside a predicate, as in example 2 above.
Returned value: a boolean.
function x:ulike( $pattern as xs:string [, $context-nodes as node()* ]) as xs:boolean
This function is very similar to x:like, except that the pattern uses Unix-style syntax ("glob pattern"). The character '?' is used instead of '_' (single character match), and '*' instead of '%' (multi-character match).
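A sketch of glob-style counterparts to the x:like examples. The PERSON element is made up for illustration.

```xquery
let $p := <PERSON><NAME>Theodore</NAME></PERSON>
return (
  (: "Theo" followed by any sequence of characters :)
  $p/NAME[ x:ulike( "Theo*" ) ],
  (: "Theo" followed by exactly four characters :)
  $p/NAME[ x:ulike( "Theo????" ) ]
)
```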
Note: these functions, as well as the standard fn:matches function and the full-text functions, are automatically recognized by the query optimizer, which uses library indexes to boost their execution whenever possible.
The x:in-range function tests whether an item belongs to a range, in an optimized way.
This function is used typically to optimize a predicate in a Library query, for example
//element[ x:in-range(@weight, 1, 10) ]
which is equivalent to
//element[ @weight >= 1 and @weight <= 10 ]
The reason for this function is that the query optimizer is not able to detect such a double test in all situations. The function could become useless in later versions of Qizx, after improvement of the query optimizer.
function x:in-range( $value, $low-bound as item(), $high-bound as item() ) as xs:boolean
function x:in-range( $value, $low-bound as item(), $high-bound as item(), $low-included as xs:boolean, $high-included as xs:boolean ) as xs:boolean
Returns true if at least one item from the sequence $value belongs to the range defined by the other parameters.
Parameter $value: Any sequence of items. Items must be comparable to the bounds, otherwise a type error is raised.
Parameters $low-bound, $high-bound: Lower and upper bounds of the range. They must be of compatible types.
Parameter $low-included: If $low-included is equal to true(), the comparison used is $low-bound <= $value, otherwise $low-bound < $value. If absent, <= is assumed.
Parameter $high-included: If $high-included is equal to true(), the comparison used is $value <= $high-bound, otherwise $value < $high-bound. If absent, <= is assumed.
Returned value: True if at least one item from the sequence $value belongs to the range defined by $low-bound, $high-bound.
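A sketch of the five-argument form, following the same pattern as the predicate example above. With the upper bound excluded, it should behave like the test @weight >= 1 and @weight < 10.

```xquery
(: Half-open range: lower bound included, upper bound excluded. :)
//element[ x:in-range(@weight, 1, 10, true(), false()) ]
```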
Qizx is compliant with the W3C Recommendation. The only differences at present are extensions of the cast operation: Qizx can directly cast date, time, dateTime and durations to and from double values representing seconds, and keeps the extended "constructors" that build date, dateTime, etc, from numeric components like days, hours, minutes, etc.
In order to make computations easier, Qizx can:
Cast xdt:yearMonthDuration to numeric values: this yields the number of months. The following expression returns 13:
xdt:yearMonthDuration("P1Y1M") cast as xs:integer
Conversely, cast a numeric value representing months to xdt:yearMonthDuration. The following expression holds true:
xdt:yearMonthDuration(13) = xdt:yearMonthDuration("P1Y1M")
Cast xdt:dayTimeDuration to double: this yields the number of seconds. The following expression returns 7201:
xdt:dayTimeDuration("PT2H1S") cast as xs:double
Conversely, cast a numeric value representing seconds to xdt:dayTimeDuration.
Cast xs:dateTime to double. This returns the number of seconds elapsed since "the Epoch", i.e. 1970-01-01T00:00:00Z. If the timezone is not specified, it is considered to be UTC (GMT).
Conversely, cast a numeric value representing seconds since the Epoch to a dateTime in the GMT timezone.
Cast from/to the xs:date type in a similar way (like a dateTime with time equal to 00:00:00). The following expression holds true:
xs:date("1970-01-02") cast as xs:double = 86400
Cast from/to the xs:time type in a similar way (seconds from 00:00:00). The following expression holds true:
xs:time("01:00:00") cast as xs:double = 3600
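The dateTime casts can be sketched in both directions. These expressions rely on the Qizx cast extensions described above and are not valid in standard XQuery; the values are illustrative.

```xquery
(
  (: One hour after the Epoch; by the rule above this should yield 3600. :)
  xs:dateTime("1970-01-01T01:00:00Z") cast as xs:double,
  (: The reverse direction: seconds since the Epoch back to a dateTime. :)
  3600e0 cast as xs:dateTime,
  (: A number of seconds turned into a duration. :)
  7201e0 cast as xdt:dayTimeDuration
)
```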
These constructors allow date, time, dateTime objects to be built from numeric components (this is quite useful in practice).
function xs:date($year as xs:integer, $month as xs:integer, $day as xs:integer ) as xs:date
Builds an xs:date from a year, a month, and a day in integer form. The implicit timezone is used.
For example xs:date(1999, 12, 31) returns the same value as xs:date("1999-12-31").
function xs:time($hour as xs:integer, $minute as xs:integer, $second as xs:double ) as xs:time
Builds an xs:time from an hour and a minute as integers, and seconds as a double. The implicit timezone is used.
function xs:dateTime($year as xs:integer, $month as xs:integer, $day as xs:integer, $hour as xs:integer, $minute as xs:integer, $second as xs:double [, $timezone as xs:double] ) as xs:dateTime
Builds an xs:dateTime from the six components that constitute date and time.
A timezone can be specified, expressed as a signed number of hours (ranging from -14 to 14); otherwise the implicit timezone is used.
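A sketch of the extended constructors, using arbitrary component values. The last argument of the xs:dateTime call is the optional timezone, given as a signed number of hours.

```xquery
(
  (: 23:59:59.5 in the implicit timezone :)
  xs:time(23, 59, 59.5),
  (: 31 December 1999, 23:59:59.5, with a timezone offset of +1 hour :)
  xs:dateTime(1999, 12, 31, 23, 59, 59.5, 1)
)
```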
These functions are kept for compatibility. They differ slightly from the standard functions:
they accept several date/time and durations types for the argument (so for example we have get-minutes instead of get-minutes-from-time, get-minutes-from-dateTime etc.),
but they do not accept untypedAtomic (node contents): such an argument should be cast to the proper type before being used. So the standard function might be as convenient here.
function get-seconds( $moment ) as xs:double?
Returns the "second" component from an xs:time, xs:dateTime, or xs:duration.
Can replace fn:seconds-from-dateTime, fn:seconds-from-time, fn:seconds-from-duration, except that the returned type is double instead of decimal, and an argument of type xdt:untypedAtomic is not valid.
function get-all-seconds( $duration ) as xs:double?
Returns the total number of seconds from an xs:duration. This does not take into account months and years, as explained above.
For example get-all-seconds(xs:duration("P1YT1H")) returns 3600.
function get-minutes( $moment ) as xs:integer?
Returns the "minute" component from an xs:time, xs:dateTime, or xs:duration.
function get-hours( $moment ) as xs:integer?
Returns the "hour" component from an xs:time, xs:dateTime, or xs:duration.
function get-days( $moment ) as xs:integer?
Returns the "day" component from an xs:date, xs:dateTime, xs:gDay, xs:gMonthDay or xs:duration.
function get-months( $moment ) as xs:integer?
Returns the "month" component from an xs:date, xs:dateTime, xs:gYearMonth, xs:gMonth, xs:gMonthDay or xs:duration.
function get-years( $moment ) as xs:integer?
Returns the "year" component from an xs:date, xs:dateTime, xs:gYear, xs:gYearMonth or xs:duration.
function get-timezone( $moment ) as xs:duration?
Returns the "timezone" component from any date/time type or xs:duration.
The returned value is like that of timezone-from-*, except that the returned type is xs:duration, not xdt:dayTimeDuration.
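A sketch of the compatibility accessors applied to a single xs:dateTime value. The second expression shows the explicit cast needed for node content, since untyped values are not accepted; the start element is hypothetical.

```xquery
let $t := xs:dateTime("2010-06-15T10:30:45Z")
return (
  get-years($t), get-months($t), get-days($t),
  get-hours($t), get-minutes($t), get-seconds($t),
  (: node content must be cast to a date/time type first :)
  get-minutes(xs:time(<start>10:30:00</start>))
)
```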
Early versions of XQuery had no mechanism to handle run-time errors. Qizx introduced its own try/catch since the very first version.
Qizx now supports the standard try/catch defined in XQuery 3.0.
For the record, the try/catch construct provided by early versions of Qizx (still supported) is documented here:
try { expr } catch($error) { fallback-expr }
The try/catch extended language construct first evaluates the body expr. If no error occurs, then the result of the try/catch is the return value of this expression.
If an error occurs, the local variable $error receives a string value which is the error message, and fallback-expr is evaluated (with possible access to the error message). The resulting value of the try/catch is in this case the value of this fallback expression. An error in the evaluation of the fallback expression is not caught.
The type of this expression is the type that encompasses the types of both arguments.
The body (first expression) is guaranteed to be evaluated completely before exiting the try/catch - unless an error occurs. In other terms, lazy evaluation, which is used in most Qizx expressions, does not apply here.
This is especially important when functions with side effects are called in the body. If such functions generate errors, these errors are caught by the try/catch, as one would expect. Otherwise lazy evaluation could produce strange effects.
Example: tries to open a document, and returns an element error with an attribute msg containing the error message if the document cannot be opened.
try { doc("unreachable.xml") } catch($err) { <error msg="{$err}"/> }
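For comparison, the same fallback can be sketched with the standard XQuery 3.0 try/catch. The err prefix is declared explicitly here in case it is not predeclared; $err:description is the standard variable holding the error message.

```xquery
declare namespace err = "http://www.w3.org/2005/xqt-errors";

(: "catch *" catches any error, like the legacy Qizx form above. :)
try { doc("unreachable.xml") }
catch * { <error msg="{ $err:description }"/> }
```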