User Guide for Sesame 2.7


Table of Contents

1. rdf:about Sesame 2
1.1. Introduction
1.2. How this manual is organized
2. Downloading Sesame
2.1. Maven artifacts
2.2. Subversion code repository
3. Short introduction to Sesame's components
4. Setting up to use the Sesame libraries
4.1. Downloading the libraries
4.2. Using Apache Maven
4.2.1. Maven Repository
4.2.2. Maven Artifacts
4.3. Logging: SLF4J initialization
5. Server software installation
5.1. Required software
5.2. Sesame Server and Workbench installation
5.3. Logging Configuration
5.4. Repository Configuration
6. Sesame Console
6.1. Getting started
6.2. Connecting to a set of repositories
6.3. Repository list
6.4. Creating a repository
6.5. Other commands
6.6. Repository configuration
6.6.1. Memory store configuration
6.6.2. Native store configuration
6.6.3. HTTP repository configuration
6.7. Repository configuration templates (advanced)
7. Application directory configuration
8. Basic Programming with Sesame
8.1. The RDF Model API
8.2. The Repository API
8.2.1. Creating a Repository object
8.2.2. Using a repository: RepositoryConnections
8.2.3. Working with Graphs, Collections and Iterations
8.2.4. Using context
8.2.5. Transactions
9. Parsing/Writing RDF with Rio
9.1. Listening to the parser
9.2. Parsing a file and collecting all triples
9.3. Using your own RDFHandler: counting statements
9.4. Writing RDF
9.5. Detecting the file format
10. HTTP communication protocol for Sesame 2
10.1. Protocol summary
10.2. Protocol version
10.2.1. Request examples
10.3. Repository list
10.3.1. Request examples
10.4. Repository queries
10.4.1. Requests examples
10.5. Repository removal
10.5.1. Request examples
10.6. Repository statements
10.6.1. Request examples
10.7. Context lists
10.7.1. Request examples
10.8. Namespace declaration lists
10.8.1. Request examples
10.9. Namespace declarations
10.9.1. Request examples
10.10. Repository size
10.10.1. Request examples
10.11. Graph Store support
10.11.1. Request examples
10.12. Content types
10.13. TODO
11. The SeRQL query language (revision 3.1)
11.1. Revisions
11.1.1. revision 1.1
11.1.2. revision 1.2
11.1.3. revision 2.0
11.1.4. revision 3.0
11.1.5. revision 3.1
11.2. Introduction
11.3. URIs, literals and variables
11.3.1. Variables
11.3.2. URIs
11.3.3. Literals
11.3.4. Blank Nodes (R1.2)
11.4. Path expressions
11.4.1. Basic path expressions
11.4.2. Path expression short cuts
11.4.3. Optional path expressions
11.5. Select- and construct queries
11.6. Select queries
11.7. Construct queries
11.8. The WHERE clause
11.8.1. Boolean constants
11.8.2. Value (in)equality
11.8.3. SameTerm (R3.1)
11.8.4. Numerical comparisons
11.8.5. Bound() (R3.0)
11.8.6. isUri() and isBnode() (R1.2)
11.8.7. Like (R1.2)
11.8.8. Regex (R3.1)
11.8.9. LangMatches (R3.1)
11.8.10. And, or, not
11.8.11. In (R3.1)
11.8.12. Nested WHERE clauses (R1.2)
11.9. Other functions
11.9.1. label(), lang() and datatype()
11.9.2. namespace() and localName() (R1.2)
11.9.3. str() (R3.1)
11.10. The ORDER BY clause
11.11. The LIMIT and OFFSET clauses
11.12. The USING NAMESPACE clause
11.13. Built-in predicates (REVISED in R2.0)
11.14. Set combinatory operations
11.14.1. UNION (REVISED in R3.0, extended in R3.1)
11.14.2. INTERSECT (R1.2)
11.14.3. MINUS (R1.2)
11.15. Query Nesting
11.15.1. IN (R1.2)
11.15.2. ANY and ALL (R1.2)
11.15.3. EXISTS (R1.2)
11.16. Querying context (R2.0)
11.17. Example SeRQL queries
11.17.1. Query 1
11.17.2. Query 2
11.17.3. Query 3
11.18. References
11.19. SeRQL grammar
Glossary

List of Figures

3.1. A high-level overview of Sesame's most prominent components and their dependencies
11.1. A basic path expression
11.2. Multi-value nodes
11.3. Multi-value nodes in a longer path expression
11.4. Branches in a path expression
11.5. Branches in a longer path expression
11.6. A reification path expression
11.7. Path expression for query 1
11.8. Path expression for query 2
11.9. Path expression for query 3

List of Tables

10.1. MIME types for RDF formats
10.2. MIME types for variable binding formats
10.3. MIME types for boolean result formats
11.1. Default namespaces

Chapter 1. rdf:about Sesame 2

1.1. Introduction

Sesame is an open source Java framework for storage and querying of RDF data. The framework is fully extensible and configurable with respect to storage mechanisms, inferencers, RDF file formats, query result formats and query languages. Sesame offers a JDBC-like user API, streamlined system APIs and a RESTful HTTP interface supporting the SPARQL Protocol for RDF.

Of course, a framework isn't very useful without implementations of the various APIs. Out of the box, Sesame supports SPARQL and SeRQL querying, a memory-based and a disk-based RDF store, and RDF Schema inferencers. It also supports most popular RDF file formats and query result formats. Various extensions are available or are being worked on elsewhere.

Originally, Sesame was developed by Aduna (then known as Aidministrator) as a research prototype for the hugely successful EU research project On-To-Knowledge. When this work ended in 2001, Aduna continued the development in cooperation with NLnet Foundation, developers from Ontotext, and a number of volunteer developers who contribute ideas, bug reports and fixes.

Sesame is currently developed as a community project, with Aduna as the project leader. Community support is available from www.openrdf.org. Aduna also offers commercial support and consultancy services; feel free to contact us for more information.

1.2. How this manual is organized

This user manual covers most aspects of working with Sesame in a variety of settings. In Chapter 2, Downloading Sesame, we explain how and where to obtain Sesame binaries, source code and/or Maven artifacts. Chapter 3, Short introduction to Sesame's components gives a brief overview of the architecture of the framework, and is recommended background reading for anyone who intends to start using Sesame. Chapter 4, Setting up to use the Sesame libraries explains how to set up your development environment and is useful if you intend to write your own (Java) programs using the Sesame libraries. In Chapter 5, Server software installation we explain how to install the Sesame Server and Workbench web applications, and Chapter 6, Sesame Console explains the workings of the Sesame command line console, which is a useful tool to quickly create and use a (local) RDF store, but which can also be used as a command line client to connect with a Sesame Server. In Chapter 7, Application directory configuration we explain where Sesame Server, Workbench and Console store their data and how you can reconfigure that location.

The basics of programming with Sesame are covered in Chapter 8, Basic Programming with Sesame. Chapter 10, HTTP communication protocol for Sesame 2 gives an overview of the structure of the HTTP REST protocol for the Sesame Server, which is useful if you want to communicate with a Sesame Server from a programming language other than Java. Finally, Chapter 11, The SeRQL query language (revision 3.1) documents Sesame's SeRQL query language.

Chapter 2. Downloading Sesame

If you follow the download links from the download section at openRDF.org you'll find that there are a number of different files that you can download. Which one you need depends on what you want to do with it:

  • openrdf-sesame-(version)-sdk.tar.gz. This is a GNU-zipped tar archive of the complete binary version of the Sesame SDK. It includes all the Sesame libraries packaged as jar files and a set of Web Archive (.war) files for easy deployment of Sesame's web applications (see Chapter 5, Server software installation). It also includes documentation (such as this user manual, and the API documentation), as well as startup scripts for the Sesame command console (see Chapter 6, Sesame Console).
  • openrdf-sesame-(version)-sdk.zip. This is a zip archive that has the same contents as the tar.gz file mentioned before.
  • openrdf-sesame-(version)-onejar.jar. This is a Java Archive (.jar) file containing all the relevant Sesame libraries. The main purpose of this jar file is easy inclusion of all Sesame components when using it as an embedded component/library in your own application: you only need to add this one jar file to your classpath and you can start programming against it.

2.1. Maven artifacts

The Sesame libraries are also available as artifacts in a Maven repository. This Maven repository is located at http://repo.aduna-software.org/maven2/releases/. Developers using Maven can use this repository to automatically resolve library dependencies and to get the latest released versions of each library. See Chapter 4, Setting up to use the Sesame libraries for more information on how to set up your POM and which dependencies to include.

For instructions how to use the Maven build system, see http://maven.apache.org/.

2.2. Subversion code repository

Sesame's source code is available from our Subversion (SVN) repository: http://repo.aduna-software.org/svn/org.openrdf/sesame/. There is an SVN Web Viewer available as well at http://repo.aduna-software.org/websvn/listing.php?repname=aduna&path=/org.openrdf/sesame/. Source code for each release can be found in the tags directory, the development version of the code is in the trunk directory. Branches exist for each major release (2.4, 2.5, 2.6, etc.), with consecutive minor releases (2.6.1, 2.6.2, etc.) being developed within that branch.

Chapter 3. Short introduction to Sesame's components

Before diving into the internals of Sesame, we will start with a short introduction to Sesame by giving a high-level overview of its components. It's important to have some basic knowledge about this, as the rest of this document will often refer to various components that are touched upon here. It is assumed that the reader has at least some basic knowledge about RDF, RDF Schema, OWL, etc. If this is not the case, some introductory articles on these subjects can be found online.

We will try to explain the Sesame framework using the following figure, which shows the most prominent components and APIs in Sesame and how they are built on top of each other. Each component/API depends on the components/APIs that are beneath it.

Figure 3.1. A high-level overview of Sesame's most prominent components and their dependencies


All the way at the bottom of the diagram is the RDF Model, the foundation of the Sesame framework. Being an RDF-oriented framework, all parts of Sesame are to some extent dependent on this RDF model, which defines interfaces and implementation for all basic RDF entities: URI, blank node, literal and statement.

Rio, which stands for "RDF I/O", consists of a set of parsers and writers for various RDF file formats. The parsers can be used to translate RDF files to sets of statements, and the writers for the reverse operation. Rio can also be used independent of the rest of Sesame.
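As a minimal sketch of this independence (assuming the Rio Turtle module is on the classpath, and ignoring the checked exceptions that parsing can throw), the following parses a small Turtle document into a collection of statements; StatementCollector is a Rio helper class that gathers the parsed statements:

import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collection;
import org.openrdf.model.Statement;
import org.openrdf.rio.RDFFormat;
import org.openrdf.rio.RDFParser;
import org.openrdf.rio.Rio;
import org.openrdf.rio.helpers.StatementCollector;

...

// create a parser for the Turtle syntax and collect the statements it reports
RDFParser parser = Rio.createParser(RDFFormat.TURTLE);
Collection<Statement> statements = new ArrayList<Statement>();
parser.setRDFHandler(new StatementCollector(statements));
parser.parse(new StringReader("<urn:a> <urn:b> <urn:c> ."), "urn:base");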

The Storage And Inference Layer (SAIL) API is a low-level System API (SPI) for RDF stores and inferencers. Its purpose is to abstract from the storage and inference details, allowing various types of storage and inference to be used. The SAIL API is mainly of interest for those who are developing SAIL implementations (typically, triplestore developers); for all others it suffices to know how to create and configure one. There are several implementations of the SAIL API, for example the MemoryStore, which stores RDF data in main memory, and the NativeStore, which uses dedicated on-disk data structures for storage.

The Repository API is a higher-level API that offers a large number of developer-oriented methods for handling RDF data. The main goal of this API is to make the life of application developers as easy as possible. It offers various methods for uploading data files, querying, and extracting and manipulating data. There are several implementations of this API; the ones shown in this figure are the SailRepository and the HTTPRepository. The former translates calls to a SAIL implementation of choice, the latter offers transparent client-server communication with a Sesame server over HTTP.

The top-most component in the diagram is the HTTP Server. The HTTP Server consists of a number of Java Servlets that implement a protocol for accessing Sesame repositories over HTTP. The details of this protocol can be found in Sesame's system documentation, but most people can simply use a client library to handle the communication. The HTTPClient that is used by the HTTPRepository is one such library.

While each part of the Sesame code is publicly available and extensible, most application developers will be primarily interested in the Repository API. This API is described in more detail in Chapter 8, Basic Programming with Sesame.

Chapter 4. Setting up to use the Sesame libraries

In this chapter, we explain some basics about setting up your application development environment to work with Sesame. In Chapter 8, Basic Programming with Sesame we go into details of the use of the APIs. If you do not want to program against the Sesame libraries but just want to install and run the Sesame HTTP server, please skip ahead to Chapter 5, Server software installation.

4.1. Downloading the libraries

As was explained in Chapter 2, Downloading Sesame, various download options are available to you. The quickest way to get started with using the Sesame libraries is to download the Sesame onejar library and include it in your classpath.

However, it is important to note that the Sesame Framework consists of a set of libraries: Sesame is not a monolithic piece of software; you can pick and choose which parts you want and which ones you don't. In those cases where you don't care about picking and choosing and just want to get on with it, the onejar is a good choice. If, however, you want a little more control over what is included, you can download the complete SDK and select (from the lib directory) those Sesame libraries that you require.

4.2. Using Apache Maven

An alternative to picking libraries by hand is to use Maven. Apache Maven is a software management tool that helps you by offering library version management and dependency management (which is very useful because it means that once you decide you need a particular Sesame library, Maven automatically downloads all the libraries that your library of choice requires in turn), as well as a handy-dandy build environment. For details on how to start using Maven, we advise you to take a look at the Apache Maven website at http://maven.apache.org/. If you are familiar with Maven, here are a few pointers to help set up your Maven project.

4.2.1. Maven Repository

OpenRDF Sesame has its own Maven repository. To configure your project to use the correct repository, add the following to your project's pom.xml (or to your Maven configuration file: settings.xml):

<repositories>
  ...
  <repository>
	 <id>aduna-opensource.releases</id>
	 <name>Aduna Open Source - Maven releases</name>
	 <url>http://repo.aduna-software.org/maven2/releases</url>
  </repository>
</repositories>

4.2.2. Maven Artifacts

The groupId for all Sesame core artifacts is org.openrdf.sesame. To include a maven dependency in your project that automatically gets you the entire Sesame core framework, use artifactId sesame-runtime:

<dependency>
  <groupId>org.openrdf.sesame</groupId>
  <artifactId>sesame-runtime</artifactId>
  <version>${sesame.version}</version>
</dependency>

For many projects you will not need the entire Sesame framework, however, and you can fine-tune your dependencies so that you don't include more than you need. Here are some typical scenarios and the dependencies that go with them. Of course, it's up to you to vary on these basic scenarios and figure out exactly which components you need (and if you don't want to bother, you can always just use the 'everything and the kitchen sink' sesame-runtime dependency).

4.2.2.1. Simple local storage and querying of RDF

If you require functionality for quick in-memory storage and querying of RDF, you will need to include dependencies on the SAIL repository module (artifactId sesame-repository-sail) and the in-memory storage backend module (artifactId sesame-sail-memory):

<dependency>
  <groupId>org.openrdf.sesame</groupId>
  <artifactId>sesame-repository-sail</artifactId>
  <version>${sesame.version}</version>
</dependency>
<dependency>
  <groupId>org.openrdf.sesame</groupId>
  <artifactId>sesame-sail-memory</artifactId>
  <version>${sesame.version}</version>
</dependency>

A straightforward variation on this scenario is of course if you decide you need a more scalable persistent storage instead of (or alongside) simple in-memory storage. In this case, you can include the native store:

<dependency>
  <groupId>org.openrdf.sesame</groupId>
  <artifactId>sesame-sail-nativerdf</artifactId>
  <version>${sesame.version}</version>
</dependency>

4.2.2.2. Parsing / writing RDF files

The Sesame parser toolkit is called Rio, and it is split into several modules: one for its main API (sesame-rio-api), and one for each specific syntax format. If you require functionality to parse or write an RDF file, you will need to include a dependency for each syntax format that you want to use. For example, if you expect to need an RDF/XML syntax parser and a Turtle syntax writer, include the following two dependencies (you do not need to include the API dependency explicitly, since each parser implementation already depends on it):

<dependency>
  <groupId>org.openrdf.sesame</groupId>
  <artifactId>sesame-rio-rdfxml</artifactId>
  <version>${sesame.version}</version>
</dependency>
<dependency>
  <groupId>org.openrdf.sesame</groupId>
  <artifactId>sesame-rio-turtle</artifactId>
  <version>${sesame.version}</version>
</dependency>

4.2.2.3. Accessing a remote Server

If your project only needs functionality to query/manipulate a remotely running Sesame server, you can stick to just including the HTTPRepository module (sesame-repository-http):

<dependency>
  <groupId>org.openrdf.sesame</groupId>
  <artifactId>sesame-repository-http</artifactId>
  <version>${sesame.version}</version>
</dependency>

4.3. Logging: SLF4J initialization

Before you begin using any of the Sesame libraries, one important configuration step needs to be taken: the initialization and configuration of a logging framework.

Sesame uses the Simple Logging Facade for Java (SLF4J), which is a framework for abstracting from the actual logging implementation. SLF4J allows you, as a user of the Sesame framework, to plug in your own favorite logging implementation at deployment time. SLF4J supports the most popular logging implementations such as Java Logging, Apache Commons Logging, Logback, log4j, etc. See the SLF4J website for more info.

What you need to do is determine/decide which logging implementation you (are going to) use and include the appropriate SLF4J logger adapter in your classpath. For example, if you decide to use Apache log4j, you need to include the SLF4J-Log4J adapter in your classpath. The SLF4J release package includes adapters for various logging implementations; just download the SLF4J release package and include the appropriate adapter in your classpath (or, when using Maven, set the appropriate dependency); slf4j-log4j12-(version).jar, for example.

One thing to keep in mind when configuring logging is that SLF4J expects only a single logger implementation on the classpath. Thus, you should choose only a single logger. In addition, if parts of your code depend on projects that use other logging frameworks directly, you can include a Legacy Bridge, which makes sure calls to the legacy logger get redirected to SLF4J (and from there on, to your logger of choice). When you set this up correctly, you can have a single logger configuration for your entire project.
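As an illustration, here is a minimal sketch of code that logs through the SLF4J facade; which logging implementation actually records the message is determined solely by the adapter jar on the classpath (the class name and message here are made up for the example):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ExampleApp {

   // The logger is obtained through the SLF4J facade; the backing
   // implementation (Logback, log4j, java.util.logging, ...) is
   // resolved at deployment time via the adapter on the classpath.
   private static final Logger logger = LoggerFactory.getLogger(ExampleApp.class);

   public static void main(String[] args) {
      logger.info("This message goes to whichever logging backend is configured");
   }
}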

Chapter 5. Server software installation

In this section, we explain how you can install a Sesame HTTP Server. You can skip this if you are not planning to run a Sesame server but intend to use Sesame as a library to program against.

5.1. Required software

The Sesame server software requires the following software:

  • Java 5 or newer
  • A Java Servlet Container that supports Java Servlet API 2.4 and Java Server Pages (JSP) 2.0, or newer. We recommend using a recent, stable version of Apache Tomcat. At the time of writing, this is either version 5.5.x or 6.x.

5.2. Sesame Server and Workbench installation

The Sesame 2 server software comes in the form of two Java Web Applications: OpenRDF Sesame Server and OpenRDF Workbench.

Sesame Server provides HTTP access to Sesame repositories and is meant to be accessed by other applications. Apart from some functionality to view the server's log messages, it doesn't provide any user-oriented functionality. Instead, the user-oriented functionality is part of OpenRDF Workbench, which provides a web interface for querying, updating and exploring the repositories of a Sesame Server.

If you have not done so already, you will first need to download the Sesame 2 SDK. Both Sesame Server and OpenRDF Workbench can be found in the war directory of the SDK. The war files in this directory need to be deployed in a Java Servlet Container (see Section 5.1, “Required software”). The deployment process is container-specific; please consult the documentation for your container on how to deploy a web application.

After you have deployed the Sesame Server webapp, you should be able to access it, by default, at path /openrdf-sesame. You can point your browser at this location to verify that the deployment succeeded[1]. Your browser should show the Sesame welcome screen as well as some options to view the server logs, among other things. Similarly, after deployment, the OpenRDF Workbench should be available at path /openrdf-workbench.

5.3. Logging Configuration

Both Sesame Server and OpenRDF Workbench use the Logback logging framework. In its default configuration, all Sesame Server log messages are sent to the log file [ADUNA_DATA]/OpenRDF Sesame/logs/main.log (and log messages for the Workbench to the same file in [ADUNA_DATA]/OpenRDF workbench). See Chapter 7, Application directory configuration for more info about data directories.

The default log level is INFO, indicating that only important status messages, warnings and errors are logged. The log level and behaviour can be adjusted by modifying the [ADUNA_DATA]/OpenRDF Sesame/conf/logback.xml file. This file will be generated when the server is first run. Please consult the Logback manual for configuration instructions.

5.4. Repository Configuration

A clean installation of a Sesame Server has a single repository by default: the SYSTEM repository. This SYSTEM repository contains all configuration data for the server, including data on which other repositories exist and (in future releases) the access rights on these repositories. The SYSTEM repository should not be used to store data that is not related to the server configuration.

The best way to create and manage repositories in a SYSTEM repository is to use the Sesame Console or OpenRDF Workbench. The Sesame Console is a command-line application for interacting with Sesame, see Chapter 6, Sesame Console.



[1] There is a known issue (SES-845) with deployment of the Sesame Server in Apache Tomcat version 7, where this link does not redirect to the correct location as expected. If you are running Sesame on Tomcat 7, try accessing /openrdf-sesame/home/overview.view instead.

Chapter 6. Sesame Console

This chapter describes Sesame Console, a command-line application for interacting with Sesame. For now, the best way to create and manage repositories in a SYSTEM repository is to use the Sesame Console.

6.1. Getting started

Sesame Console can be started using the console.bat/.sh scripts that can be found in the bin directory of the Sesame SDK. By default, the console will connect to the "default data directory", which contains the console's own set of repositories. See Chapter 7, Application directory configuration for more info on data directories.

The console can be operated by typing commands. Commands can span multiple lines and are terminated with a '.' at the end of a line. For example, to get an overview of the available commands, type:

help.

To get help for a specific command, type 'help' followed by the command name, e.g.:

help connect.

6.2. Connecting to a set of repositories

As indicated in the previous section, the console connects to its own set of repositories by default. Using the connect command you can make the console connect to a Sesame Server or to a set of repositories on your file system. For example, to connect to a Sesame Server that is listening to port 8080 on localhost, enter the following command:

connect http://localhost:8080/openrdf-sesame.

6.3. Repository list

To get an overview of the repositories that are available in the set that your console is connected to, use the 'show' command:

show repositories.

6.4. Creating a repository

The 'create' command can be used to add new repositories to the set that the console is connected to. This command expects the name of a template that describes the repository's configuration. Currently, there are nine templates that are included with the console by default:

  • memory -- a memory-based RDF repository
  • memory-rdfs -- a main-memory repository with RDF Schema inferencing
  • memory-rdfs-dt -- a main-memory repository with RDF Schema and direct type hierarchy inferencing
  • native -- a repository that uses on-disk data structures
  • native-rdfs -- a native repository with RDF Schema inferencing
  • native-rdfs-dt -- a native repository with RDF Schema and direct type hierarchy inferencing
  • mysql -- a repository that stores its data in a MySQL database
  • pgsql -- a repository that stores its data in a PostgreSQL database
  • remote -- a repository that serves as a proxy for a repository on a Sesame Server

When the 'create' command is executed, the console will ask you to fill in a number of parameters for the type of repository that you chose. For example, to create a native repository, you execute the following command:

create native.

The console will then ask you to provide an ID and title for the repository, as well as the triple indexes that need to be created for this kind of store. The values between square brackets indicate default values which you can select by simply hitting enter. The output of this dialogue looks something like this:

Please specify values for the following variables:
Repository ID [native]: myRepo
Repository title [Native store]: My repository
Triple indexes [spoc,posc]: 
Repository created

Please see Section 6.6, “Repository configuration” for more info on the repository configuration options.

6.5. Other commands

Please check the documentation that is provided by the console itself for help on how to use the other commands. Most commands should be self-explanatory.

6.6. Repository configuration

6.6.1. Memory store configuration

A memory store is an RDF repository that stores its data in main memory. Apart from the standard ID and title parameters, this type of repository has a Persist parameter and a Sync delay parameter.

6.6.1.1. Memory Store persistence

The Persist parameter controls whether the memory store will use a data file for persistence over sessions. Persistent memory stores write their data to disk before being shut down and read this data back in the next time they are initialized. Non-persistent memory stores are always empty upon initialization.

6.6.1.2. Synchronization delay

By default, the memory store persistence mechanism synchronizes the disk backup directly upon any change to the contents of the store. That means that directly after an update operation (upload, removal) completes, the disk backup is updated. It is possible to configure a synchronization delay however. This can be useful if your application performs several transactions in sequence and you want to prevent disk synchronization in the middle of this sequence to improve update performance.

The synchronization delay is specified by a number, indicating the time in milliseconds that the store will wait before it synchronizes changes to disk. The value 0 indicates that there should be no delay. Negative values can be used to postpone the synchronization indefinitely, i.e. until the store is shut down.

6.6.2. Native store configuration

A native store stores and retrieves its data directly to/from disk. The advantage of this over the memory store is that it scales much better as it is not limited to the size of available memory. Of course, since it has to access the disk, it is also slower than the in-memory store, but it is a good solution for larger data sets.

6.6.2.1. Native store indexes

The native store uses on-disk indexes to speed up querying. It uses B-Trees for indexing statements, where the index key consists of four fields: subject (s), predicate (p), object (o) and context (c). The order in which each of these fields is used in the key determines the usability of an index for a specific statement query pattern: searching statements with a specific subject in an index that has the subject as the first field is significantly faster than searching these same statements in an index where the subject field is second or third. In the worst case, the 'wrong' statement pattern will result in a sequential scan over the entire set of statements.

By default, the native repository only uses two indexes, one with a subject-predicate-object-context (spoc) key pattern and one with a predicate-object-subject-context (posc) key pattern. However, it is possible to define more or other indexes for the native repository, using the Triple indexes parameter. This can be used to optimize performance for query patterns that occur frequently.

The subject, predicate, object and context fields are represented by the characters 's', 'p', 'o' and 'c' respectively. Indexes can be specified by creating 4-letter words from these four characters. Multiple indexes can be specified by separating these words with commas, spaces and/or tabs. For example, the string "spoc, posc" specifies two indexes: a subject-predicate-object-context index and a predicate-object-subject-context index.

Creating more indexes potentially speeds up querying (a lot), but also adds overhead for maintaining the indexes. Also, every added index takes up additional disk space.

The native store automatically creates/drops indexes upon (re)initialization, so the parameter can be adjusted; upon the first refresh of the configuration, the native store will change its indexing strategy without loss of data.

6.6.3. HTTP repository configuration

An HTTP repository is not an actual store by itself, but serves as a proxy for a store on a (remote) Sesame server. Apart from the standard ID and title parameters, this type of repository has a Sesame server location and a Remote repository ID parameter.

6.6.3.1. Sesame server location

This parameter specifies the URL of the Sesame Server that the repository should communicate with. The default value is http://localhost:8080/openrdf-sesame, which corresponds to a Sesame Server running on your own machine.

6.6.3.2. Remote repository ID

This is the ID of the remote repository that the HTTP repository should communicate with. Please note that an HTTP repository in the Console has two repository ID parameters: one identifying the remote repository and one that specifies the HTTP repository's own ID.

6.7. Repository configuration templates (advanced)

In Sesame, repository configurations with all their parameters are modeled in RDF and stored in the SYSTEM repository. So, in order to create a new repository, the Console needs to create such an RDF document and submit it to the SYSTEM repository. The Console uses so-called repository configuration templates to accomplish this.

Repository configuration templates are simple Turtle RDF files that describe a repository configuration, where some of the parameters are replaced with variables. The Console parses these templates and asks the user to supply values for the variables. The variables are then substituted with the specified values, which produces the required configuration data.

The Sesame Console comes with a number of default templates, which are listed in Section 6.4, “Creating a repository”. The Console tries to resolve the parameter specified with the 'create' command (e.g. "memory") to a template file with the same name (e.g. "memory.ttl"). The default templates are included in the Console library, but the Console also looks in the templates subdirectory of [ADUNA_DATA]. You can define your own templates by placing template files in this directory.

To create your own templates, it's easiest to start with an existing template and modify that to your needs. The default "memory.ttl" template looks like this:

#
# Sesame configuration template for a main-memory repository
#
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix rep: <http://www.openrdf.org/config/repository#>.
@prefix sr: <http://www.openrdf.org/config/repository/sail#>.
@prefix sail: <http://www.openrdf.org/config/sail#>.
@prefix ms: <http://www.openrdf.org/config/sail/memory#>.

[] a rep:Repository ;
   rep:repositoryID "{%Repository ID|memory%}" ;
   rdfs:label "{%Repository title|Memory store%}" ;
   rep:repositoryImpl [
      rep:repositoryType "openrdf:SailRepository" ;
      sr:sailImpl [
         sail:sailType "openrdf:MemoryStore" ;
         ms:persist {%Persist|true|false%} ;
         ms:syncDelay {%Sync delay|0%}
      ]
   ].

Template variables are written down as {%var name%} and can specify zero or more values, separated by vertical bars ("|"). If one value is specified, then this value is interpreted as the default value for the variable. The Console will use this default value when the user simply hits the Enter key. If multiple variable values are specified, e.g. {%Persist|true|false%}, then this is interpreted as the set of all possible values. If the user enters an unspecified value, then that is considered to be an error. The value that is specified first is used as the default value.

The URIs that are used in the templates are the URIs that are specified by the RepositoryConfig and SailConfig classes of Sesame's repository configuration mechanism. The relevant namespaces and URIs can be found in the javadoc or source code of these classes.

Chapter 7. Application directory configuration

In this chapter, we explain how to change the default data directory for OpenRDF applications. You can skip this chapter if you only use Sesame as a library, or if you consider the defaults to be fine.

All OpenRDF applications (Sesame Server, Workbench, and Console) store configuration files and repository data in a single directory (with subdirectories). On Windows machines, this directory is %APPDATA%\Aduna\ by default, where %APPDATA% is the application data directory of the user that runs the application. For example, in case the application runs under the 'LocalService' user account on Windows XP, the directory is C:\Documents and Settings\LocalService\Application Data\Aduna\. On Linux/UNIX, the default location is $HOME/.aduna/, for example /home/tomcat/.aduna/. We will refer to this data directory as [ADUNA_DATA] in the rest of this manual.

The location of this data directory can be reconfigured using the Java system property info.aduna.platform.appdata.basedir. When you are using Tomcat as the servlet container, you can set this property using the JAVA_OPTS parameter, for example:

  • set JAVA_OPTS=-Dinfo.aduna.platform.appdata.basedir=\path\to\other\dir\ (on Windows)
  • export JAVA_OPTS='-Dinfo.aduna.platform.appdata.basedir=/path/to/other/dir/' (on Linux/UNIX)

If you are using Apache Tomcat as a Windows Service you should use the Windows Services configuration tool to set this property. Other users can either edit the Tomcat startup script or set the property some other way.
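For embedded use of Sesame (for example, your own application using the Sesame libraries), the property can also be set programmatically. This is a minimal sketch, under the assumption that it runs before any OpenRDF component reads the platform configuration (the path is just an example):

// Set the data directory before any Sesame/OpenRDF class inspects it,
// e.g. as one of the first statements in main().
System.setProperty("info.aduna.platform.appdata.basedir", "/path/to/other/dir/");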

One easy way to find out what the directory is in a running instance of the Sesame Server is to go to http://localhost:8080/openrdf-sesame in your browser and click on 'System' in the navigation menu on the left. The data directory will be listed as one of the configuration settings of the current server.

Chapter 8. Basic Programming with Sesame

In this chapter, we introduce the basics of programming with the Sesame framework. We assume that you have at least a basic understanding of programming in Java and of how the Resource Description Framework models data.

8.1. The RDF Model API

The core of the Sesame framework is the RDF Model API (see the Model API Javadoc). This API defines how the building blocks of RDF (statements, URIs, blank nodes, literals) are represented.

RDF statements are represented by the org.openrdf.model.Statement interface. Each Statement has a subject, predicate, object and (optionally) a context (more about contexts below, in the section about the Repository API). Each of these four items is an org.openrdf.model.Value. The Value interface is further specialized into org.openrdf.model.Resource and org.openrdf.model.Literal. Resource represents any RDF value that is either a blank node or a URI (in fact, it specializes further into org.openrdf.model.URI and org.openrdf.model.BNode). Literal represents RDF literal values (strings, dates, integer numbers, and so on).
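To make this hierarchy concrete, here is a minimal sketch that inspects the object of a statement; the variable statement is assumed to hold a previously obtained Statement:

import org.openrdf.model.BNode;
import org.openrdf.model.Literal;
import org.openrdf.model.URI;
import org.openrdf.model.Value;

...

Value obj = statement.getObject();
if (obj instanceof Literal) {
   // a literal value, for example a string or a typed value such as an integer
   System.out.println("literal with label: " + ((Literal)obj).getLabel());
}
else if (obj instanceof URI) {
   System.out.println("URI: " + obj.stringValue());
}
else if (obj instanceof BNode) {
   System.out.println("blank node with ID: " + ((BNode)obj).getID());
}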

To create new values and statements, we can use an org.openrdf.model.ValueFactory. You can use a default ValueFactory implementation called org.openrdf.model.impl.ValueFactoryImpl:

ValueFactory factory = ValueFactoryImpl.getInstance();

You can also obtain a ValueFactory from the Repository you are working with; in fact, this is the recommended approach. More about that in the next section.

Regardless of how you obtain your ValueFactory, once you have it, you can use it to create new URIs, Literals, and Statements:

URI bob = factory.createURI("http://example.org/bob");
URI name = factory.createURI("http://example.org/name");
Literal bobsName = factory.createLiteral("Bob");
Statement nameStatement = factory.createStatement(bob, name, bobsName);

The Model API also provides pre-defined URIs for several well-known vocabularies, such as RDF, RDFS, OWL, DC (Dublin Core), FOAF (Friend-of-a-Friend), and more. These constants can all be found in the org.openrdf.model.vocabulary package, and can be quite handy in quick creation of RDF statements (or in querying a Repository, as we shall see later):

Statement typeStatement = factory.createStatement(bob, RDF.TYPE, FOAF.PERSON);

8.2. The Repository API

The Repository API is the central access point for Sesame repositories. Its purpose is to give a developer-friendly access point to RDF repositories, offering various methods for querying and updating the data, while hiding a lot of the nitty gritty details of the underlying machinery.

The interfaces for the Repository API can be found in package org.openrdf.repository. Several implementations for these interfaces exist in various sub-packages. The Javadoc reference for the API is available online and can also be found in the doc directory of the download.

If you need more information about how to set up your environment for working with the Sesame APIs, take a look at Chapter 4, Setting up to use the Sesame libraries.

8.2.1. Creating a Repository object

The first step in any action that involves a Sesame repository is to create a Repository object for it.

The central interface of the repository API is the Repository interface. There are several implementations available of this interface. The three main ones are:

  • org.openrdf.repository.sail.SailRepository is a Repository that operates directly on top of a Sail. This is the class most commonly used when accessing/creating a local Sesame repository. SailRepository operates on a (stack of) Sail object(s) for storage and retrieval of RDF data. An important thing to remember is that the behaviour of a repository is determined by the Sail(s) that it operates on; for example, the repository will only support RDF Schema or OWL semantics if the Sail stack includes an inferencer for this.
  • org.openrdf.repository.http.HTTPRepository is, as the name implies, a Repository implementation that acts as a proxy to a Sesame repository available on a remote Sesame server, accessible through HTTP.
  • org.openrdf.repository.sparql.SPARQLRepository is a Repository implementation that acts as a proxy to any remote SPARQL endpoint (whether that endpoint is implemented using Sesame or not).

In the following section, we will first take a look at the use of the SailRepository class in order to create and use a local Sesame repository.

8.2.1.1. Creating a main memory RDF Repository

One of the simplest configurations is a repository that just stores RDF data in main memory without applying any inferencing. This is also by far the fastest type of repository that can be used. The following code creates and initializes a non-inferencing main-memory repository:

import org.openrdf.repository.Repository;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.sail.memory.MemoryStore;

...

Repository repo = new SailRepository(new MemoryStore());
repo.initialize();

The constructor of the SailRepository class accepts any object of type Sail, so we simply pass it a new main-memory store object (which is, of course, a Sail implementation). Following this, the repository needs to be initialized to prepare the Sail(s) that it operates on.

The repository that is created by the above code is volatile: its contents are lost when the object is garbage collected or when your Java program is shut down. This is fine for cases where, for example, the repository is used as a means for manipulating an RDF model in memory.

Different types of Sail objects take parameters in their constructor that change their behaviour. The MemoryStore, for example, takes a data directory parameter that specifies a data directory for persistent storage. If specified, the MemoryStore will write its contents to this directory so that it can restore it when it is re-initialized in a future session:

File dataDir = new File("C:\\temp\\myRepository\\");
Repository repo = new SailRepository( new MemoryStore(dataDir) );
repo.initialize();

As you can see, we can fine-tune the configuration of our repository by passing parameters to the constructor of the Sail object. Some Sail types may offer additional configuration methods, all of which need to be called before the repository is initialized. The MemoryStore currently has one such method: setSyncDelay(long), which can be used to control the strategy that is used for writing to the data file, e.g.:

File dataDir = new File("C:\\temp\\myRepository\\");
MemoryStore memStore = new MemoryStore(dataDir);
memStore.setSyncDelay(1000L);

Repository repo = new SailRepository(memStore);
repo.initialize();

8.2.1.2. Creating a Native RDF Repository

A Native RDF Repository does not keep its data in main memory, but instead stores it directly to disk (in a binary format optimized for compact storage and fast retrieval). It is an efficient, scalable and fast solution for RDF storage of datasets that are too large to keep entirely in memory.

The code for creation of a Native RDF repository is almost identical to that of a main memory repository:

import org.openrdf.repository.Repository;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.sail.nativerdf.NativeStore;


...
File dataDir = new File("/path/to/datadir/");
Repository repo = new SailRepository(new NativeStore(dataDir));
repo.initialize();

By default, the Native store creates a set of two indexes (see Section 6.6.2, “Native store configuration”). To configure which indexes it should create, we can either use the NativeStore.setTripleIndexes(String) method, or we can directly supply an index configuration string to the constructor:

import org.openrdf.repository.Repository;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.sail.nativerdf.NativeStore;


...
File dataDir = new File("/path/to/datadir/");
String indexes = "spoc,posc,cosp";
Repository repo = new SailRepository(new NativeStore(dataDir, indexes));
repo.initialize();

8.2.1.3. Creating a repository with RDF Schema inferencing

As we have seen, we can create Repository objects for any kind of back-end store by passing them a reference to the appropriate Sail object. We can pass any stack of Sails this way, allowing all kinds of repository configurations to be created quite easily. For example, to stack an RDF Schema inferencer on top of a memory store, we simply create a repository like so:

import org.openrdf.repository.Repository;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.sail.memory.MemoryStore;
import org.openrdf.sail.inferencer.fc.ForwardChainingRDFSInferencer;

...

Repository repo = new SailRepository(
                          new ForwardChainingRDFSInferencer(
                          new MemoryStore()));
repo.initialize();

Each layer in the Sail stack is created by a constructor that takes the underlying Sail as a parameter. Finally, we create the SailRepository object as a functional wrapper around the Sail stack.

The ForwardChainingRDFSInferencer that is used in this example is a generic RDF Schema inferencer; it can be used on top of any Sail that supports the methods it requires. Both MemoryStore and NativeStore support these methods. However, a word of warning: the Sesame inferencers add a significant performance overhead when adding data to and removing data from a repository, an overhead that gets progressively worse as the total size of the repository increases. For small to medium-sized datasets this performs fine, but for larger datasets you are advised not to use it and to switch to alternatives.

8.2.1.4. Accessing a remote repository

Working with remote repositories is just as easy as working with local ones. We can simply use a different Repository object, the HTTPRepository, instead of the SailRepository class.

A requirement is of course that there is a Sesame 2 server running on some remote system, which is accessible over HTTP. For example, suppose that at http://example.org/openrdf-sesame/ a Sesame server is running, which has a repository with the identification 'example-db'. We can access this repository in our code as follows:

import org.openrdf.repository.Repository;
import org.openrdf.repository.http.HTTPRepository;

...

String sesameServer = "http://example.org/openrdf-sesame/";
String repositoryID = "example-db";

Repository repo = new HTTPRepository(sesameServer, repositoryID);
repo.initialize();

8.2.2. Using a repository: RepositoryConnections

Now that we have created a Repository, we want to do something with it. In Sesame 2, this is achieved through the use of RepositoryConnection objects, which can be created by the Repository.

A RepositoryConnection represents - as the name suggests - a connection to the actual store. We can issue operations over this connection, and close it when we are done to make sure we are not keeping resources unnecessarily occupied.

In the following sections, we will show some examples of basic operations.

8.2.2.1. Adding RDF to a repository

The Repository API offers various methods for adding data to a repository. Data can be added by specifying the location of a file that contains RDF data, and statements can be added individually or in collections.

We perform operations on a repository by requesting a RepositoryConnection from the repository. On this RepositoryConnection object we can perform various operations, such as query evaluation, getting, adding, or removing statements, etc.

The following example code adds two files, one local and one available through HTTP, to a repository:

import org.openrdf.OpenRDFException;
import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.rio.RDFFormat;
import java.io.File;
import java.net.URL;

...

File file = new File("/path/to/example.rdf");
String baseURI = "http://example.org/example/local";

try {
   RepositoryConnection con = repo.getConnection();
   try {
      con.add(file, baseURI, RDFFormat.RDFXML);

      URL url = new URL("http://example.org/example/remote.rdf");
      con.add(url, url.toString(), RDFFormat.RDFXML);
   }
   finally {
      con.close();
   }
}
catch (OpenRDFException e) {
   // handle exception
}
catch (java.io.IOException e) {
   // handle io exception
}

More information on other available methods can be found in the javadoc reference of the RepositoryConnection interface.

8.2.2.2. Querying a repository

The Repository API has a number of methods for creating and evaluating queries. Three types of queries are distinguished: tuple queries, graph queries and boolean queries. The query types differ in the type of results that they produce.

The result of a tuple query is a set of tuples (or variable bindings), where each tuple represents a solution of a query. This type of query is commonly used to get specific values (URIs, blank nodes, literals) from the stored RDF data. SPARQL SELECT queries are tuple queries.

The result of graph queries is an RDF graph (or set of statements). This type of query is very useful for extracting sub-graphs from the stored RDF data, which can then be queried further, serialized to an RDF document, etc. SPARQL CONSTRUCT and DESCRIBE queries are graph queries.

The result of boolean queries is a simple boolean value, i.e. true or false. This type of query can be used to check if a repository contains specific information. SPARQL ASK queries are boolean queries.
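Boolean queries do not appear in the examples in the following sections, so here is a minimal sketch of evaluating one; con is assumed to be an open RepositoryConnection:

import org.openrdf.query.BooleanQuery;
import org.openrdf.query.QueryLanguage;

...

// An ASK query returns true if the graph pattern matches at least once
BooleanQuery askQuery = con.prepareBooleanQuery(QueryLanguage.SPARQL,
      "ASK { ?s ?p ?o }");
boolean repositoryContainsData = askQuery.evaluate();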

Note: Sesame 2 currently supports two query languages: SeRQL and SPARQL. The former is explained in Chapter 11, The SeRQL query language (revision 3.1), the specification for the latter is available online. In this chapter, we will use SPARQL queries in our examples.

8.2.2.2.1. Evaluating a tuple query

To evaluate a tuple query we simply do the following:

import java.util.List;
import org.openrdf.OpenRDFException;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.query.TupleQuery;
import org.openrdf.query.TupleQueryResult;
import org.openrdf.query.BindingSet;
import org.openrdf.query.QueryLanguage;
import org.openrdf.model.Value;

...

try {
   RepositoryConnection con = repo.getConnection();
   try {
      String queryString = "SELECT ?x ?y WHERE { ?x ?p ?y } ";
      TupleQuery tupleQuery = con.prepareTupleQuery(QueryLanguage.SPARQL, queryString);

      TupleQueryResult result = tupleQuery.evaluate();
      try {
         while (result.hasNext()) {  // iterate over the query result
            BindingSet bindingSet = result.next();
            Value valueOfX = bindingSet.getValue("x");
            Value valueOfY = bindingSet.getValue("y");

            // do something interesting with the values here...
         }
      }
      finally {
         result.close();
      }
   }
   finally {
      con.close();
   }
}
catch (OpenRDFException e) {
   // handle exception
}

This evaluates a SPARQL query and returns a TupleQueryResult, which consists of a sequence of BindingSet objects. Each BindingSet contains a set of Binding objects. A binding is a pair relating a name (as used in the query's SELECT clause) with a value.

As you can see, we use the TupleQueryResult to iterate over all results and get each individual result for x and y. We retrieve values by name rather than by an index. The names used should be the names of variables as specified in your query (note that we leave out the '?' or '$' prefixes used in SPARQL). The TupleQueryResult.getBindingNames() method returns a list of binding names, in the order in which they were specified in the query. To process the bindings in each binding set in the order specified by the projection, you can do the following:

List<String> bindingNames = result.getBindingNames();
while (result.hasNext()) {
   BindingSet bindingSet = result.next();
   Value firstValue = bindingSet.getValue(bindingNames.get(0));
   Value secondValue = bindingSet.getValue(bindingNames.get(1));

   // do something interesting with the values here...
}

Finally, it is important to invoke the close() operation on the TupleQueryResult, after we are done with it. A TupleQueryResult evaluates lazily and keeps resources (such as connections to the underlying database) open. Closing the TupleQueryResult frees up these resources. Do not forget that iterating over a result may cause exceptions! The best way to make sure no connections are kept open unnecessarily is to invoke close() in the finally clause.

An alternative to producing a TupleQueryResult is to supply an object that implements the TupleQueryResultHandler interface to the query's evaluate() method. The main difference is that when using a return object, the caller has control over when the next answer is retrieved, whereas with the use of a handler, the connection simply pushes answers to the handler object as soon as it has them available.

As an example we will use SPARQLResultsXMLWriter, which is a TupleQueryResultHandler implementation that writes SPARQL Results XML documents to an output stream or to a writer:

import java.io.FileOutputStream;
import org.openrdf.query.resultio.sparqlxml.SPARQLResultsXMLWriter;

...

FileOutputStream out = new FileOutputStream("/path/to/result.srx");
try {
   SPARQLResultsXMLWriter sparqlWriter = new SPARQLResultsXMLWriter(out);

   RepositoryConnection con = myRepository.getConnection();
   try {
      String queryString = "SELECT * FROM {x} p {y}";
      TupleQuery tupleQuery = con.prepareTupleQuery(QueryLanguage.SERQL, queryString);
      tupleQuery.evaluate(sparqlWriter);
   }
   finally {
      con.close();
   }
}
finally {
   out.close();
}

You can just as easily supply your own application-specific implementation of TupleQueryResultHandler though.

Lastly, an important warning: as soon as you are done with the RepositoryConnection object, you should close it. Notice that during processing of the TupleQueryResult object (for example, when iterating over its contents), the RepositoryConnection should still be open. We can invoke con.close() after we have finished with the result.

8.2.2.2.2. Evaluating a graph query

The following code evaluates a graph query on a repository:

import org.openrdf.query.GraphQueryResult;

GraphQueryResult graphResult = con.prepareGraphQuery(
      QueryLanguage.SPARQL, "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }").evaluate();

A GraphQueryResult is similar to TupleQueryResult in that it is an object that iterates over the query results. However, for graph queries the query results are RDF statements, so a GraphQueryResult iterates over Statement objects:

while (graphResult.hasNext()) {
   Statement st = graphResult.next();
   // ... do something with the resulting statement here.
}

The TupleQueryResultHandler equivalent for graph queries is org.openrdf.rio.RDFHandler. Again, this is a generic interface; each object implementing it can process the reported RDF statements in any way it wants.
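For example, a minimal sketch of a custom handler that simply counts the statements reported to it could look as follows; it extends the RDFHandlerBase helper class, which provides no-op defaults for the other RDFHandler methods:

import org.openrdf.model.Statement;
import org.openrdf.rio.helpers.RDFHandlerBase;

class StatementCounter extends RDFHandlerBase {

   private int countedStatements = 0;

   @Override
   public void handleStatement(Statement st) {
      // invoked once for every statement produced by the graph query (or parser)
      countedStatements++;
   }

   public int getCountedStatements() {
      return countedStatements;
   }
}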

All Rio writers (such as the RDFXMLWriter, TurtleWriter, TriXWriter, etc.) implement the RDFHandler interface. This allows them to be used in combination with querying quite easily. In the following example, we use a TurtleWriter to write the result of a SPARQL graph query to standard output in Turtle format:

import org.openrdf.rio.Rio;
import org.openrdf.rio.RDFFormat;
import org.openrdf.rio.RDFWriter;

...

RepositoryConnection con = repo.getConnection();
try {
   RDFWriter writer = Rio.createWriter(RDFFormat.TURTLE, System.out);

   con.prepareGraphQuery(QueryLanguage.SPARQL,
         "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }").evaluate(writer);
}
finally {
   con.close();
}

Again, note that as soon as we are done with the result of the query (either after iterating over the contents of the GraphQueryResult or after invoking the RDFHandler), we invoke con.close() to close the connection and free resources.

8.2.2.2.3. Preparing and Reusing Queries

In the previous sections we have simply created a query from a string and immediately evaluated it. However, the prepareTupleQuery and prepareGraphQuery methods return objects of type Query, specifically TupleQuery and GraphQuery.

A Query object, once created, can be (re)used. For example, we can evaluate a Query object, then add some data to our repository, and evaluate the same query again.
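Here is a minimal sketch of that reuse pattern, under the assumption that con is an open RepositoryConnection and f its ValueFactory (the URIs are just examples):

TupleQuery query = con.prepareTupleQuery(QueryLanguage.SPARQL,
      "SELECT ?s WHERE { ?s ?p ?o }");

TupleQueryResult before = query.evaluate();
// ... process the first result, then close it
before.close();

// add a statement to the repository
con.add(f.createURI("http://example.org/subject"),
        f.createURI("http://example.org/predicate"),
        f.createLiteral("object"));

// the same Query object can simply be evaluated again,
// and the result will reflect the newly added data
TupleQueryResult after = query.evaluate();
after.close();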

The Query object also has a setBinding method, which can be used to specify specific values for query variables. As a simple example, suppose we have a repository containing names and e-mail addresses of people, and we want to retrieve the e-mail address of each person, using a separate query per person. This can be achieved using the setBinding functionality, as follows:

RepositoryConnection con = repo.getConnection();
try {
   // First, prepare a query that retrieves all names of persons
   TupleQuery nameQuery = con.prepareTupleQuery(QueryLanguage.SPARQL,
         "SELECT ?name WHERE { ?person ex:name ?name . }");

   // Then, prepare another query that retrieves all e-mail addresses of persons:
   TupleQuery mailQuery = con.prepareTupleQuery(QueryLanguage.SPARQL,
         "SELECT ?mail WHERE { ?person ex:mail ?mail ; ex:name ?name . }");

   // Evaluate the first query to get all names
   TupleQueryResult nameResult = nameQuery.evaluate();
   try {
      // Loop over all names, and retrieve the corresponding e-mail address.
      while (nameResult.hasNext()) {
         BindingSet bindingSet = nameResult.next();
         Value name = bindingSet.getValue("name");

         // Retrieve the matching mailbox, by setting the binding for
         // the variable 'name' to the retrieved value. Note that we
         // can set the same binding name again for each iteration; it
         // will overwrite the previous setting.
         mailQuery.setBinding("name", name);

         TupleQueryResult mailResult = mailQuery.evaluate();

         // mailResult now contains the e-mail addresses for one particular person
         try {
            ....
         }
         finally {
            // after we are done, close the result
            mailResult.close();
         }
      }
   }
   finally {
      nameResult.close();
   }
}
finally {
   con.close();
}

The values with which you perform the setBinding operation of course do not necessarily have to come from a previous query result (as they do in the above example). Using a ValueFactory you can create your own value objects. You can use this functionality to, for example, query for a particular keyword that is given by user input:

ValueFactory factory = myRepository.getValueFactory();

// In this example, we specify the keyword string. Of course, this
// could just as easily be obtained by user input, or by reading from
// a file, or...
String keyword = "foobar";

// We prepare a query that retrieves all documents for a keyword.
// Notice that in this query the 'keyword' variable is not bound to
// any specific value yet.
TupleQuery keywordQuery = con.prepareTupleQuery(QueryLanguage.SPARQL,
      "SELECT ?document WHERE { ?document ex:keyword ?keyword . }");

// Then we set the binding to a literal representation of our keyword.
// Evaluation of the query object will now effectively be the same as
// if we had specified the query as follows:
//   SELECT ?document WHERE { ?document ex:keyword "foobar". }
keywordQuery.setBinding("keyword", factory.createLiteral(keyword));

// We then evaluate the prepared query and can process the result:
TupleQueryResult keywordQueryResult = keywordQuery.evaluate();
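
To round off the example, we process and close the result in the usual fashion:

try {
   while (keywordQueryResult.hasNext()) {
      BindingSet bindingSet = keywordQueryResult.next();
      Value document = bindingSet.getValue("document");
      // ... do something with the matching document here.
   }
}
finally {
   keywordQueryResult.close();
}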

8.2.2.3. Creating, retrieving, removing individual statements

The RepositoryConnection can also be used for adding, retrieving, removing or otherwise manipulating individual statements, or sets of statements.

To add new statements, we can use a ValueFactory to create the Values out of which the statements consist. For example, suppose we want to add a few statements about two resources, Alice and Bob:

import org.openrdf.model.vocabulary.RDF;
import org.openrdf.model.vocabulary.RDFS;
...

ValueFactory f = myRepository.getValueFactory();

// create some resources and literals to make statements out of
URI alice = f.createURI("http://example.org/people/alice");
URI bob = f.createURI("http://example.org/people/bob");
URI name = f.createURI("http://example.org/ontology/name");
URI person = f.createURI("http://example.org/ontology/Person");
Literal bobsName = f.createLiteral("Bob");
Literal alicesName = f.createLiteral("Alice");

try {
   RepositoryConnection con = myRepository.getConnection();

   try {
      // alice is a person
      con.add(alice, RDF.TYPE, person);
      // alice's name is "Alice"
      con.add(alice, name, alicesName);

      // bob is a person
      con.add(bob, RDF.TYPE, person);
      // bob's name is "Bob"
      con.add(bob, name, bobsName);
   }
   finally {
      con.close();
   }
}
catch (OpenRDFException e) {
   // handle exception
}

Of course, it will not always be necessary to use a ValueFactory to create URIs. In practice, you will find that you quite often retrieve existing URIs from the repository (for example, by evaluating a query) and then use those values to add new statements. Or indeed, as we have seen in Section 8.1, “The RDF Model API”, for several well-known vocabularies we can simply reuse the predefined constants found in the org.openrdf.model.vocabulary package.

Retrieving statements works in a very similar way. We have actually already seen one way of retrieving statements: we can get a GraphQueryResult containing statements by evaluating a graph query. However, we can also use direct method calls to retrieve (sets of) statements. For example, to retrieve all statements about Alice, we could do:

RepositoryResult<Statement> statements = con.getStatements(alice, null, null, true);

The additional boolean parameter at the end (set to 'true' in this example) indicates whether inferred triples should be included in the result. Of course, this parameter only makes a difference if your repository uses an inferencer.

The RepositoryResult is an iterator-like object that lazily retrieves each matching statement from the repository when its next() method is called. Note that, as is the case with QueryResult objects, iterating over a RepositoryResult may result in exceptions, and you should make sure that the RepositoryResult is always properly closed after use:

RepositoryResult<Statement> statements = con.getStatements(alice, null, null, true);

try {
   while (statements.hasNext()) {
      Statement st = statements.next();

      ... // do something with the statement
   }
}
finally {
   statements.close(); // make sure the result object is closed properly
}

In the above method invocation, we see four parameters being passed. The first three represent the subject, predicate and object of the RDF statements which should be retrieved. A null value indicates a wildcard, so the above method call retrieves all statements which have Alice as their subject, and any kind of predicate and object. The fourth parameter indicates whether or not inferred statements should be included.

Removing statements again works in a very similar fashion. Suppose we want to retract the statement that the name of Alice is "Alice":

con.remove(alice, name, alicesName);

Or, if we want to erase all statements about Alice completely, we can do:

con.remove(alice, null, null);

8.2.3. Working with Graphs, Collections and Iterations

Most of these examples have been on the level of individual statements. However, the Repository API offers several methods that work with Collections of statements, allowing more batch-like update operations.

For example, in the following bit of code, we first retrieve all statements about Alice, put them in a org.openrdf.model.Graph (which is an implementation of java.util.Collection) and then remove them:

import info.aduna.iteration.Iterations;
import org.openrdf.model.Graph;
import org.openrdf.model.impl.GraphImpl;

// Retrieve all statements about Alice and put them in a Graph
RepositoryResult<Statement> statements = con.getStatements(alice, null, null, true);
Graph aboutAlice = Iterations.addAll(statements, new GraphImpl());

// Then, remove them from the repository
con.remove(aboutAlice);

As you can see, the info.aduna.iteration.Iterations class provides a convenient method that takes an Iteration (of which RepositoryResult is a subclass) and a Collection (of which GraphImpl is a subclass) as input, and returns the Collection with the contents of the iterator added to it. It also automatically closes the Iteration for you.

In the above code, you first retrieve all statements, put them in a collection, and then remove them. Although this works fine, it can be done in an easier fashion, by simply supplying the resulting object directly:

con.remove(con.getStatements(alice, null, null, true));

The RepositoryConnection interface has several variations of add, retrieve and remove operations. See the RepositoryConnection Javadoc for a full overview of the options.
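
For example, instead of adding statements one by one, a connection can also parse and add data directly from a Reader, InputStream, File or URL. A minimal sketch, using a small hypothetical snippet of Turtle data:

import java.io.StringReader;

String turtleData = "<http://example.org/people/carol> "
      + "<http://example.org/ontology/name> \"Carol\" .";

con.add(new StringReader(turtleData), "http://example.org/", RDFFormat.TURTLE);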

8.2.3.1. More on Graphs and GraphUtil

In the above example, we used a Graph (see Graph Javadoc) as a collection class for statements. The Graph class has several advantages over an arbitrary Collection class. First of all, it provides a match method that can be used to retrieve statements from the collection that match a specific subject, predicate and object. In addition, you can use GraphUtil to easily retrieve specific pieces of information from the statement collection.

For example, imagine we have a Graph containing names and e-mails for several persons (using the FOAF vocabulary). To easily retrieve each e-mail address for each person, we can do something like this:

import org.openrdf.model.Graph;
import org.openrdf.model.Literal;
import org.openrdf.model.Resource;
import org.openrdf.model.util.GraphUtil;
import org.openrdf.model.vocabulary.RDF;
import org.openrdf.model.vocabulary.FOAF;

Graph graph = ... ; // we initialized our graph before (for example by doing a query on our repository)

for (Resource subject: GraphUtil.getSubjects(graph, RDF.TYPE, FOAF.PERSON)) {
   Literal nameOfSubject = GraphUtil.getUniqueObjectLiteral(graph, subject, FOAF.NAME); 
   Literal mboxOfSubject = GraphUtil.getUniqueObjectLiteral(graph, subject, FOAF.MBOX); 
}
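
The match method mentioned above can also be used directly, to iterate over the matching statements in the collection. A minimal sketch:

import java.util.Iterator;
import org.openrdf.model.Statement;

// find all statements with foaf:mbox as their predicate, regardless
// of subject and object
Iterator<Statement> mboxStatements = graph.match(null, FOAF.MBOX, null);
while (mboxStatements.hasNext()) {
   Statement st = mboxStatements.next();
   // ... do something with the matching statement here.
}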

8.2.4. Using context

Sesame 2 supports the notion of context, which you can think of as a way to group sets of statements together through a single group identifier (this identifier can be a blank node or a URI).

A very typical way to use context is tracking provenance of the statements in a repository, that is, which file these statements originate from. For example, consider an application where you add RDF data from different files to a repository, and then one of those files is updated. You would then like to replace the data from that single file in the repository, and to be able to do this you need a way to figure out which statements need to be removed. The context mechanism gives you a way to do that.

In the following example, we add an RDF document from the Web to our repository, in a context. In the example, we make the context identifier equal to the Web location of the file being uploaded.

String location = "http://example.org/example/example.rdf";
String baseURI = location;
URL url = new URL(location);
URI context = f.createURI(location);

con.add(url, baseURI, RDFFormat.RDFXML, context);

We can now use the context mechanism to specifically address these statements in the repository in retrieval and removal operations:

// Get all statements in the context
RepositoryResult<Statement> result = con.getStatements(null, null, null, true, context);

try {
   while (result.hasNext()) {
      Statement st = result.next();
      ... // do something interesting with the result
   }
}
finally {
   result.close();
}

// Export all statements in the context to System.out, in RDF/XML format
RDFHandler rdfxmlWriter = new RDFXMLWriter(System.out);
con.export(context, rdfxmlWriter);

// Remove all statements in the context from the repository
con.clear(context);

In most methods in the Repository API, the context parameter is a vararg, meaning that you can specify an arbitrary number (zero, one, or more) of context identifiers. This way, you can combine different contexts together. For example, we can very easily retrieve statements that appear in either 'context1' or 'context2'.

In the following example we add information about Bob and Alice again, but this time each in their own context. We also create a new property called 'creator' that has as its value the name of the person who created a particular context. The knowledge about the creators of the contexts is itself not added to any particular context, however:

URI context1 = f.createURI("http://example.org/context1");
URI context2 = f.createURI("http://example.org/context2");
URI creator = f.createURI("http://example.org/ontology/creator");

// Add stuff about Alice to context1
con.add(alice, RDF.TYPE, person, context1);
con.add(alice, name, alicesName, context1);

// Alice is the creator of context1
con.add(context1, creator, alicesName);

// Add stuff about Bob to context2
con.add(bob, RDF.TYPE, person, context2);
con.add(bob, name, bobsName, context2);

// Bob is the creator of context2
con.add(context2, creator, bobsName);

Once we have this information in our repository, we can retrieve all statements about either Alice or Bob by using the context vararg:

// Get all statements in either context1 or context2
RepositoryResult<Statement> result =
      con.getStatements(null, null, null, true, context1, context2);

You should observe that the above RepositoryResult will not contain the information that context1 was created by Alice and context2 by Bob. This is because those statements were added without any context, so they do not appear in context1 or context2 themselves.

To explicitly retrieve statements that do not have an associated context, we do the following:

// Get all statements that do not have an associated context
RepositoryResult<Statement> result =
      con.getStatements(null, null, null, true, (Resource)null);

This will give us only the statements about the creators of the contexts, because those are the only statements that do not have an associated context. Note that we have to explicitly cast the null argument to Resource, because otherwise it is ambiguous whether we are specifying a single value or an entire array that is null (a vararg is internally treated as an array). Simply invoking getStatements(s, p, o, true, null) without an explicit cast will result in an IllegalArgumentException.

We can also get everything that either has no context or is in context1:

// Get all statements that do not have an associated context, or that are in context1
RepositoryResult<Statement> result =
      con.getStatements(null, null, null, true, (Resource)null, context1);

So as you can see, you can freely combine contexts in this fashion.

Important:

getStatements(null, null, null, true);

is not the same as:

getStatements(null, null, null, true, (Resource)null);

The former (without any context id parameter) retrieves all statements in the repository, ignoring any context information. The latter, however, only retrieves statements that explicitly do not have any associated context.

8.2.5. Transactions

So far, we have shown individual operations on repositories: adding statements, removing them, etc. By default, each operation on a RepositoryConnection is immediately sent to the store and committed.

The RepositoryConnection interface supports a full transactional mechanism that allows one to group modification operations together and treat them as a single update: before the transaction is committed, none of the operations in the transaction has taken effect, and after, they all take effect. If something goes wrong at any point during a transaction, it can be rolled back so that the state of the repository is the same as before the transaction started. Bundling update operations in a single transaction often also improves update performance compared to multiple smaller transactions.

We can indicate that we want to begin a transaction by using the RepositoryConnection.begin() method. In the following example, we use a connection to bundle two file addition operations in a single transaction:

File inputFile1 = new File("/path/to/example1.rdf");
String baseURI1 = "http://example.org/example1/";

File inputFile2 = new File("/path/to/example2.rdf");
String baseURI2 = "http://example.org/example2/";

RepositoryConnection con = myRepository.getConnection();
try {
   con.begin();

   // Add the first file
   con.add(inputFile1, baseURI1, RDFFormat.RDFXML);

   // Add the second file
   con.add(inputFile2, baseURI2, RDFFormat.RDFXML);

   // If everything went as planned, we can commit the result
   con.commit();
}
catch (RepositoryException e) {
   // Something went wrong during the transaction, so we roll it back
   con.rollback();
}
finally {
   // Whatever happens, we want to close the connection when we are done.
   con.close();
}

In the above example, we use a transaction to add two files to the repository. Only if both files can be successfully added will the repository change. If one of the files can not be added (for example because it can not be read), then the entire transaction is cancelled and none of the files is added to the repository.

It's important to note that a RepositoryConnection only supports one active transaction at a time. You can check at any time whether a transaction is active on your connection by using the isActive() method. If you find you need to cater for concurrent transactions, you will need to use separate RepositoryConnections.
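
For example:

if (!con.isActive()) {
   // no transaction is currently active on this connection,
   // so it is safe to begin a new one
   con.begin();
}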

Chapter 9. Parsing/Writing RDF with Rio

The Sesame framework includes a set of parsers and writers called Rio (see Rio Javadoc). Rio (a rather imaginative acronym for “RDF I/O”) is a toolkit that can be used independently from the rest of Sesame. In this chapter, we will take a look at various ways to use Rio to parse from or write to an RDF document. We will show how to do a simple parse and collect the results, how to count the number of triples in a file, how to convert a file from one syntax format to another, and how to dynamically create a parser for the correct syntax format.

If you use Sesame via the Repository API (see Chapter 8, Basic Programming with Sesame), then typically you will not need to use the parsers directly: you simply supply the document (either via a URL, or as a File, InputStream or Reader object) to the RepositoryConnection and the parsing is all handled internally. However, sometimes you may want to parse an RDF document without immediately storing it in a triplestore. For those cases, you can use Rio directly.

9.1. Listening to the parser

The Rio parsers all work with a set of Listener interfaces that they report results to: ParseErrorListener, ParseLocationListener, and RDFHandler. Of these three, RDFHandler is the most useful one: this is the listener that receives parsed RDF triples. So we will concentrate on this interface here.

The RDFHandler interface is quite simple; it contains just five methods: startRDF, handleNamespace, handleComment, handleStatement, and endRDF. Rio also provides a number of default implementations of RDFHandler, such as StatementCollector, which stores all received RDF triples in a Java Collection. Depending on what you want to do with parsed statements, you can either reuse one of the existing RDFHandlers, or, if you have a specific task in mind, you can simply write your own implementation of RDFHandler. Here, we will show some simple examples of things you can do with RDFHandlers.

9.2. Parsing a file and collecting all triples

As a simple example of how to use Rio, we parse an RDF document and collect all the parsed statements in a Java Collection object (specifically, in a Graph object).

Let’s say we have a Turtle file, available at http://example.org/example.ttl:

java.net.URL documentUrl = new URL("http://example.org/example.ttl");
InputStream inputStream = documentUrl.openStream();

We now have an open InputStream to our RDF file. Now we need an RDFParser object that reads this InputStream and creates RDF statements out of it. Since we are reading a Turtle file, we create an RDFParser object for the RDFFormat.TURTLE syntax format:

RDFParser rdfParser = Rio.createParser(RDFFormat.TURTLE);

(note: all Rio classes and interfaces are in package org.openrdf.rio or one of its subpackages)

We also need an RDFHandler which can receive RDF statements from the parser. Since we just want to create a collection of Statements for now, we’ll just use Rio’s StatementCollector:

org.openrdf.model.Graph myGraph = new org.openrdf.model.impl.GraphImpl();
StatementCollector collector = new StatementCollector(myGraph);
rdfParser.setRDFHandler(collector);

Note, by the way, that you can use any collection class (such as java.util.ArrayList or java.util.HashSet) in place of the Graph object.

Finally, we need to set the parser to work:

try {
   rdfParser.parse(inputStream, documentUrl.toString());
}
catch (IOException e) {
  // handle IO problems (e.g. the file could not be read)
}
catch (RDFParseException e) {
  // handle unrecoverable parse error
}
catch (RDFHandlerException e) {
  // handle a problem encountered by the RDFHandler
}

After the parse() method has executed (and provided no exception has occurred), the collection myGraph will have been filled by the StatementCollector. As an aside: you do not have to provide the StatementCollector with a collection in advance; you can also use the empty constructor and then retrieve the collection afterwards, using StatementCollector.getStatements().
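
In code, that alternative looks as follows (a small sketch; exception handling omitted for brevity):

import java.util.Collection;

StatementCollector collector = new StatementCollector();
rdfParser.setRDFHandler(collector);
rdfParser.parse(inputStream, documentUrl.toString());

// the collector has created its own collection internally:
Collection<Statement> statements = collector.getStatements();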

9.3. Using your own RDFHandler: counting statements

Suppose you want to count the number of triples in an RDF file. You could of course use the code from the previous section, adding all triples to a Collection, and then just checking the size of that Collection. However, this will get you into trouble when you are parsing very large RDF files: you might run out of memory. And in any case: creating and storing all these Statement objects just to be able to count them seems a bit of a waste. So instead, we will create our own RDFHandler, which just counts the parsed RDF statements and then immediately throws them away.

To create your own RDFHandler implementation, you can of course create a class that implements the RDFHandler interface, but a useful shortcut is to instead create a subclass of RDFHandlerBase. This is a base class that provides dummy implementations of all interface methods. The advantage is that you only have to override the methods in which you need to do something. Since what we want to do is just count statements, we only need to override the handleStatement method. Additionally, we of course need a way to get back the total number of statements found by our counter:

class StatementCounter extends RDFHandlerBase {

   private int countedStatements = 0;

   @Override
   public void handleStatement(Statement st) {
      countedStatements++;
   }

   public int getCountedStatements() {
      return countedStatements;
   }
}

Once we have our custom RDFHandler class, we can supply that to the parser instead of the StatementCollector:

StatementCounter myCounter = new StatementCounter();
rdfParser.setRDFHandler(myCounter);
try {
   rdfParser.parse(inputStream, documentUrl.toString());
}
catch (Exception e) {
  // oh no!
}
int numberOfStatements = myCounter.getCountedStatements();

9.4. Writing RDF

So far, we have seen how to read RDF, but Rio of course also allows you to write RDF, using RDFWriters. An RDFWriter is an extension of RDFHandler that is intended for writing RDF in a specific syntax format.

As an example, we start with a Graph containing several RDF statements, and we want to write these statements to a file. In this example, we'll write our statements to a file in RDF/XML syntax:

Graph myGraph; // a collection of several RDF statements
FileOutputStream out = new FileOutputStream("/path/to/file.rdf");
RDFWriter writer = Rio.createWriter(RDFFormat.RDFXML, out);
try {
  writer.startRDF();
  for (Statement st: myGraph) {
    writer.handleStatement(st);
  }
  writer.endRDF();
}
catch (RDFHandlerException e) {
 // oh no, do something!
}

Since we have now seen how to read RDF using a parser and how to write using a writer, we can now convert RDF files from one syntax to another, simply by using a parser for the input syntax, collecting the statements, and then writing them again using a writer for the intended output syntax. However, you may notice that this approach may be problematic for very large files: we are collecting all statements into main memory (in a Graph object).

Fortunately, there is a shortcut. We can eliminate the need for using a Graph altogether. If you've paid attention, you might have spotted it already: RDFWriters are also RDFHandlers. So instead of first using a StatementCollector to collect our RDF data and then writing that to our RDFWriter, we can simply use the RDFWriter directly. Thus, if we want to convert our input RDF file from Turtle syntax to RDF/XML syntax, we can do that like so:

// open our input document
java.net.URL documentUrl = new URL("http://example.org/example.ttl");
InputStream inputStream = documentUrl.openStream();

// create a parser for Turtle and a writer for RDF/XML 
RDFParser rdfParser = Rio.createParser(RDFFormat.TURTLE);
RDFWriter rdfWriter = Rio.createWriter(RDFFormat.RDFXML,
                           new FileOutputStream("/path/to/example-output.rdf"));

// link our parser to our writer...
rdfParser.setRDFHandler(rdfWriter);

// ...and start the conversion!
try {
   rdfParser.parse(inputStream, documentUrl.toString());
}
catch (IOException e) {
  // handle IO problems (e.g. the file could not be read)
}
catch (RDFParseException e) {
  // handle unrecoverable parse error
}
catch (RDFHandlerException e) {
  // handle a problem encountered by the RDFHandler
}

9.5. Detecting the file format

In the examples so far, we have always assumed that you know what the syntax format of your input file is: we assumed Turtle syntax and created a new parser using RDFFormat.TURTLE. However, you may not always know in advance what exact format the RDF file is in. What then? Fortunately, Rio has a couple of useful features to help you.

RDFFormat is, as we have seen, a set of constants defining the available syntax formats. However, it also has a couple of utility methods for guessing the correct format, given either a filename or a MIME-type. For example, to get back the RDF format for our Turtle file, we could do the following:

RDFFormat format = RDFFormat.forFileName(documentUrl.toString());

This will guess, based on the extension of the file (.ttl) that the file is a Turtle file and return the correct format. We can then use that with the Rio factory class to create the correct parser dynamically:

RDFParser rdfParser = Rio.createParser(format);

As you can see, we still have the same result: we have created an RDFParser object which we can use to parse our file, but now we have not made the explicit assumption that the input file is in Turtle format: if we later use the same code with a different file (say, a .owl file, which is in RDF/XML format), our program will detect the format at runtime and create the correct parser for it.
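
Similarly, if you know the MIME type of the data (for example, from the Content-Type header of an HTTP response), you can use RDFFormat.forMIMEType. In this sketch, we supply RDF/XML as the fallback format to use when the MIME type is not recognized:

RDFFormat format = RDFFormat.forMIMEType("text/turtle", RDFFormat.RDFXML);
RDFParser rdfParser = Rio.createParser(format);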

Chapter 10. HTTP communication protocol for Sesame 2

The following is a description of the HTTP-based communication protocol for Sesame 2. Design considerations for the protocol include adherence to the rules of the REST architectural style. In brief, this means that URLs are used to represent resources and that standard HTTP methods (GET, PUT, etc.) are used to access these resources. Client properties, such as the data formats it can process, are communicated to the server using HTTP headers like Accept and are not part of the URLs. This way, a resource identified by a specific URL can, for example, be presented as an HTML page to a web browser and as a binary content stream to a client library. For more in-depth information about REST, see http://en.wikipedia.org/wiki/REST, http://rest.blueoxen.net/ and http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm. More information about HTTP in general, and HTTP headers in particular, can be found in RFC 2616 - Hypertext Transfer Protocol -- HTTP/1.1.

The Sesame 2 HTTP communication protocol is a fully compliant superset of the SPARQL Protocol for RDF W3C Recommendation. The current version of the protocol additionally supports communication for SPARQL 1.1 Update operations, as described in the SPARQL 1.1 Protocol for RDF W3C Working Draft, as well as the SPARQL 1.1 Graph Store HTTP Protocol W3C Working Draft.

10.1. Protocol summary

The REST architectural style implies that URLs are used to represent the various resources that are available on a server. This section gives a summary of the resources that are available from a Sesame server with the HTTP-methods that can be used on them. In this overview, <SESAME_URL> is used to denote the location of the Sesame server, e.g. http://localhost/openrdf-sesame. Likewise, <REP_ID> denotes the ID of a specific repository (e.g. "mem-rdf"), and <PREFIX> denotes a namespace prefix (e.g. "rdfs").

The following is an overview of the resources that are available from a Sesame server.

<SESAME_URL>
   /protocol            : protocol version (GET)
   /repositories        : overview of available repositories (GET)
      /<REP_ID>         : query evaluation and administration tasks on
                          a repository (GET/POST/DELETE)
         /statements    : repository statements (GET/POST/PUT/DELETE)
         /contexts      : context overview (GET)
         /size          : #statements in repository (GET)
         /rdf-graphs    : named graphs overview (GET)
            /service    : Graph Store operations on indirectly referenced
                          named graphs in repository (GET/PUT/POST/DELETE)
            /<NAME>     : Graph Store operations on directly referenced
                          named graphs in repository (GET/PUT/POST/DELETE)
         /namespaces    : overview of namespace definitions (GET/DELETE)
            /<PREFIX>   : namespace-prefix definition (GET/PUT/DELETE)

10.2. Protocol version

The version of the protocol that the server uses to communicate over HTTP is available at: <SESAME_URL>/protocol. The version described by this chapter is "6".

Supported methods on this URL are:

  • GET: Gets the protocol version string, e.g. "1", "2", etc.

10.2.1. Request examples

10.2.1.1. Fetch the protocol version

Request:

GET /openrdf-sesame/protocol HTTP/1.1
Host: localhost

Response:

HTTP/1.1 200 OK
Content-Type: text/plain;charset=UTF-8
Content-Length: 1

6

10.3. Repository list

An overview of the repositories that are available on a server can be retrieved from <SESAME_URL>/repositories.

Supported methods on this URL are:

  • GET: Gets a list of available repositories, including ID, title, read- and write access parameters for each listed repository. The list is formatted as a tuple query result with variables "uri", "id", "title", "readable" and "writable". The "uri" value gives the URI/URL for the repository and the "readable" and "writable" values are xsd:boolean typed literals indicating read- and write permissions.

Request headers:

  • 'Accept': Relevant values are the MIME types of supported variable binding formats.

10.3.1. Request examples

10.3.1.1. Fetch the repository list

Request:

GET /openrdf-sesame/repositories HTTP/1.1
Host: localhost
Accept: application/sparql-results+xml, */*;q=0.5

Response:

HTTP/1.1 200 OK
Content-Type: application/sparql-results+xml;charset=UTF-8

<?xml version='1.0' encoding='UTF-8'?>
<sparql xmlns='http://www.w3.org/2005/sparql-results#'>
  <head>
	 <variable name='uri'/>
	 <variable name='id'/>
	 <variable name='title'/>
	 <variable name='readable'/>
	 <variable name='writable'/>
  </head>
  <results ordered='false' distinct='false'>
	 <result>
		<binding name='uri'>
		  <uri>http://localhost/openrdf-sesame/repositories/mem-rdf</uri>
		</binding>
		<binding name='id'>
		  <literal>mem-rdf</literal>
		</binding>
		<binding name='title'>
		  <literal>Main Memory RDF repository</literal>
		</binding>
		<binding name='readable'>
		  <literal datatype='http://www.w3.org/2001/XMLSchema#boolean'>true</literal>
		</binding>
		<binding name='writable'>
		  <literal datatype='http://www.w3.org/2001/XMLSchema#boolean'>false</literal>
		</binding>
	 </result>
  </results>
</sparql>

10.4. Repository queries

Queries on a specific repository with ID <ID> can be evaluated by sending requests to: <SESAME_URL>/repositories/<ID>. This resource represents a SPARQL query endpoint. Both GET and POST methods are supported. The GET method is preferred as it adheres to the REST architectural style. The POST method should be used in cases where the length of the (URL-encoded) query exceeds practicable limits of proxies, servers, etc. In case a POST request is used, the query parameters should be sent to the server as www-form-urlencoded data.

Parameters:

  • 'query': The query to evaluate.
  • 'queryLn' (optional): Specifies the query language that is used for the query. Acceptable values are strings denoting the query languages supported by the server, i.e. "serql" for SeRQL queries and "sparql" for SPARQL queries. If not specified, the server assumes the query is a SPARQL query.
  • 'infer' (optional): Specifies whether inferred statements should be included in the query evaluation. Inferred statements are included by default. Specifying any value other than "true" (ignoring case) restricts the query evaluation to explicit statements only.
  • '$<varname>' (optional): specifies variable bindings. Variables appearing in the query can be bound to a specific value outside the actual query using this option. The value should be an N-Triples encoded RDF value.
  • 'timeout' (optional): specifies a maximum query execution time, in whole seconds. The value should be an integer. A setting of 0 or a negative number indicates unlimited query time (the default).

Request headers:

  • 'Accept': Relevant values are the MIME types of supported query result formats: variable binding formats for tuple queries, RDF document formats for graph queries, and boolean result formats for boolean queries.

10.4.1. Request examples

10.4.1.1. Evaluate a SeRQL-select query on repository "mem-rdf"

Request:

GET /openrdf-sesame/repositories/mem-rdf?query=select%20%3Cfoo:bar%3E&queryLn=serql HTTP/1.1
Host: localhost
Accept: application/sparql-results+xml, */*;q=0.5

Response:

HTTP/1.1 200 OK
Content-Type: application/sparql-results+xml;charset=UTF-8

<?xml version='1.0' encoding='UTF-8'?>
<sparql xmlns='http://www.w3.org/2005/sparql-results#'>
  <head>
	 <variable name='&lt;foo:bar&gt;'/>
  </head>
  <results ordered='false' distinct='false'>
	 <result>
		<binding name='&lt;foo:bar&gt;'>
		  <uri>foo:bar</uri>
		</binding>
	 </result>
  </results>
</sparql>

10.4.1.2. Evaluate a SPARQL-construct query on repository "mem-rdf" using a POST request

Request:

POST /openrdf-sesame/repositories/mem-rdf HTTP/1.1
Host: localhost
Content-Type: application/x-www-form-urlencoded
Accept: application/rdf+xml, */*;q=0.5

query=construct%20{?s%20?p%20?o}%20where%20{?s%20?p%20?o}

Response:

HTTP/1.1 200 OK
Content-Type: application/rdf+xml;charset=UTF-8

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
</rdf:RDF>

10.4.1.3. Evaluate a SPARQL-ask query on repository "mem-rdf"

Request:

GET /openrdf-sesame/repositories/mem-rdf?query=ask%20{?s%20?p%20?o} HTTP/1.1
Host: localhost
Accept: text/boolean, */*;q=0.5

Response:

HTTP/1.1 200 OK
Content-Type: text/boolean;charset=US-ASCII

true

10.5. Repository removal

A specific repository with ID <ID> can be deleted from the server by sending requests to: <SESAME_URL>/repositories/<ID>. The DELETE method should be used for this, and the request accepts no parameters.

Care should be taken with the use of this method: the result of this operation is the complete removal of the repository from the server, including its configuration settings and (if present) data directory.

10.5.1. Request examples

10.5.1.1. Remove the "mem-rdf" repository

Request:

DELETE /openrdf-sesame/repositories/mem-rdf HTTP/1.1
Host: localhost

Response:

HTTP/1.1 204 NO CONTENT

10.6. Repository statements

The statements for a specific repository with ID <ID> are available at: <SESAME_URL>/repositories/<ID>/statements

Supported methods on this URL are:

  • GET: Fetches statements from the repository.
  • PUT: Updates data in the repository, replacing any existing data with the supplied data. The data supplied with this request is expected to contain an RDF document in one of the supported RDF formats.
  • DELETE: Deletes statements from the repository.
  • POST: Performs updates on the data in the repository. The data supplied with this request is expected to contain either an RDF document, a SPARQL 1.1 Update string, or a special purpose transaction document. If an RDF document is supplied, the statements found in the RDF document will be added to the repository. If a SPARQL 1.1 Update string is supplied, the update operation will be parsed and executed. If a transaction document is supplied, the updates specified in the transaction document will be executed.

Parameters:

  • 'subj' (optional): Restricts a GET or DELETE operation to statements with the specified N-Triples encoded resource as subject.
  • 'pred' (optional): Restricts a GET or DELETE operation to statements with the specified N-Triples encoded URI as predicate.
  • 'obj' (optional): Restricts a GET or DELETE operation to statements with the specified N-Triples encoded value as object.
  • 'update' (optional): Only relevant for POST operations. Specifies the SPARQL 1.1 Update string to be executed. The value is expected to be a syntactically valid SPARQL 1.1 Update string.
  • 'context' (optional): If specified, restricts the operation to one or more specific contexts in the repository. The value of this parameter is either an N-Triples encoded URI or bnode ID, or the special value 'null' which represents all context-less statements. If multiple 'context' parameters are specified, the request will operate on the union of all specified contexts. The operation is executed on all statements that are in the repository if no context is specified.
  • 'infer' (optional): Specifies whether inferred statements should be included in the result of GET requests. Inferred statements are included by default. Specifying any value other than "true" (ignoring case) restricts the request to explicit statements only.
  • 'baseURI' (optional): Specifies the base URI to resolve any relative URIs found in uploaded data against. This parameter only applies to the PUT and POST method.

Request headers:

  • 'Accept': Relevant values for GET requests are the MIME types of supported RDF formats.
  • 'Content-Type': Must specify the encoding of any request data that is sent to a server. Relevant values are the MIME types of supported RDF formats, "application/x-rdftransaction" for a transaction document and "application/x-www-form-urlencoded" in case the parameters are encoded in the request body (as opposed to being part of the request URL).

10.6.1. Request examples

10.6.1.1. Fetch all statements from repository "mem-rdf"

Request:

GET /openrdf-sesame/repositories/mem-rdf/statements HTTP/1.1
Host: localhost
Accept: application/rdf+xml

Response:

HTTP/1.1 200 OK
Content-Type: application/rdf+xml;charset=UTF-8

[RDF/XML ENCODED RDF DATA]

10.6.1.2. Fetch all statements from a specific context in repository "mem-rdf"

Request:

GET /openrdf-sesame/repositories/mem-rdf/statements?context=_:n1234x5678 HTTP/1.1
Host: localhost
Accept: application/rdf+xml

Response:

HTTP/1.1 200 OK
Content-Type: application/rdf+xml;charset=UTF-8

[RDF/XML ENCODED RDF DATA]

10.6.1.3. Remove all statements from the "mem-rdf" repository

Request:

DELETE /openrdf-sesame/repositories/mem-rdf/statements HTTP/1.1
Host: localhost

Response:

HTTP/1.1 204 NO CONTENT

10.6.1.4. Add data to the "mem-rdf" repository

Request:

POST /openrdf-sesame/repositories/mem-rdf/statements HTTP/1.1
Host: localhost
Content-Type: application/rdf+xml;charset=UTF-8

[RDF/XML ENCODED RDF DATA]

Response:

HTTP/1.1 204 NO CONTENT

10.6.1.5. Add data to the "mem-rdf" repository, replacing any and all existing data

Request:

PUT /openrdf-sesame/repositories/mem-rdf/statements HTTP/1.1
Host: localhost
Content-Type: application/rdf+xml;charset=UTF-8

[RDF/XML ENCODED RDF DATA]

Response:

HTTP/1.1 204 NO CONTENT

10.6.1.6. Add data to a specific context in the "mem-rdf" repository, replacing any data that is currently in this context

Request:

PUT /openrdf-sesame/repositories/mem-rdf/statements?context=%3Curn:x-local:graph1%3E&baseURI=%3Curn:x-local:graph1%3E HTTP/1.1
Host: localhost
Content-Type: application/x-turtle;charset=UTF-8

[TURTLE ENCODED RDF DATA]

Response:

HTTP/1.1 204 NO CONTENT

10.6.1.7. Add statements without a context to the "mem-rdf" repository, ignoring any context information that is encoded in the supplied data

Request:

POST /openrdf-sesame/repositories/mem-rdf/statements?context=null HTTP/1.1
Host: localhost
Content-Type: application/x-turtle;charset=UTF-8

[TURTLE ENCODED RDF DATA]

Response:

HTTP/1.1 204 NO CONTENT

10.6.1.8. Perform update described in a SPARQL 1.1 Update string

Request:

POST /openrdf-sesame/repositories/mem-rdf/statements HTTP/1.1
Host: localhost
Content-Type: application/x-www-form-urlencoded

update=INSERT%20{?s%20?p%20?o}%20WHERE%20{?s%20?p%20?o}

Response:

HTTP/1.1 204 NO CONTENT

10.6.1.9. Perform updates described in a transaction document and treat it as a single transaction

Request:

POST /openrdf-sesame/repositories/mem-rdf/statements HTTP/1.1
Host: localhost
Content-Type: application/x-rdftransaction

[TRANSACTION DATA]

Response:

HTTP/1.1 204 NO CONTENT

10.7. Context lists

A list of resources that are used as context identifiers in a repository with ID <ID> is available at: <SESAME_URL>/repositories/<ID>/contexts

Supported methods on this URL are:

  • GET: Gets a list of resources that are used as context identifiers. The list is formatted as a tuple query result with a single variable "contextID", which is bound to URIs and bnodes that are used as context identifiers.

Request headers:

  • 'Accept': Relevant values are the MIME types of supported variable binding formats.

10.7.1. Request examples

10.7.1.1. Fetch all context identifiers from repository "mem-rdf"

Request:

GET /openrdf-sesame/repositories/mem-rdf/contexts HTTP/1.1
Host: localhost
Accept: application/sparql-results+xml

Response:

HTTP/1.1 200 OK
Content-Type: application/sparql-results+xml

<?xml version='1.0' encoding='UTF-8'?>
<sparql xmlns='http://www.w3.org/2005/sparql-results#'>
  <head>
	 <variable name='contextID'/>
  </head>
  <results ordered='false' distinct='false'>
	 <result>
		<binding name='contextID'>
		  <uri>urn:x-local:graph1</uri>
		</binding>
	 </result>
  </results>
</sparql>

10.8. Namespace declaration lists

Namespace declaration lists for a repository with ID <ID> are available at: <SESAME_URL>/repositories/<ID>/namespaces.

Supported methods on this URL are:

  • GET: Gets a list of namespace declarations that have been defined for the repository. The list is formatted as a tuple query result with variables "prefix" and "namespace", which are both bound to literals.
  • DELETE: Removes all namespace declarations from the repository.

Request headers:

  • 'Accept': Relevant values are the MIME types of supported variable binding formats.

10.8.1. Request examples

10.8.1.1. Fetch all namespace declaration info

Request

GET /openrdf-sesame/repositories/mem-rdf/namespaces HTTP/1.1
Host: localhost
Accept: application/sparql-results+xml, */*;q=0.5

Response:

HTTP/1.1 200 OK
Content-Type: application/sparql-results+xml

<?xml version='1.0' encoding='UTF-8'?>
<sparql xmlns='http://www.w3.org/2005/sparql-results#'>
  <head>
	 <variable name='prefix'/>
	 <variable name='namespace'/>
  </head>
  <results ordered='false' distinct='false'>
	 <result>
		<binding name='prefix'>
		  <literal>rdf</literal>
		</binding>
		<binding name='namespace'>
		  <literal>http://www.w3.org/1999/02/22-rdf-syntax-ns#</literal>
		</binding>
	 </result>
  </results>
</sparql>

10.8.1.2. Remove all namespace declarations from the repository.

Request:

DELETE /openrdf-sesame/repositories/mem-rdf/namespaces HTTP/1.1
Host: localhost

Response:

HTTP/1.1 204 NO CONTENT

10.9. Namespace declarations

Namespace declarations with prefix <PREFIX> for a repository with ID <ID> are available at: <SESAME_URL>/repositories/<ID>/namespaces/<PREFIX>.

Supported methods on this URL are:

  • GET: Gets the namespace that has been defined for a particular prefix.
  • PUT: Defines or updates a namespace declaration, mapping the prefix to the namespace that is supplied in plain text in the request body.
  • DELETE: Removes a namespace declaration.

10.9.1. Request examples

10.9.1.1. Get the namespace for prefix 'rdf'

Request

GET /openrdf-sesame/repositories/mem-rdf/namespaces/rdf HTTP/1.1
Host: localhost

Response:

HTTP/1.1 200 OK
Content-Type: text/plain;charset=UTF-8

http://www.w3.org/1999/02/22-rdf-syntax-ns#

10.9.1.2. Set the namespace for a specific prefix

Request:

PUT /openrdf-sesame/repositories/mem-rdf/namespaces/example HTTP/1.1
Host: localhost
Content-Type: text/plain

http://www.example.com

Response:

HTTP/1.1 204 NO CONTENT

10.9.1.3. Remove the namespace for a specific prefix

Request:

DELETE /openrdf-sesame/repositories/mem-rdf/namespaces/example HTTP/1.1
Host: localhost

Response:

HTTP/1.1 204 NO CONTENT

10.10. Repository size

The repository size (defined as the number of statements it contains) is available at: <SESAME_URL>/repositories/<ID>/size.

Supported methods on this URL are:

  • GET: Gets the number of statements in a repository.

Parameters:

  • 'context' (optional): If specified, restricts the operation to one or more specific contexts in the repository. The value of this parameter is either an N-Triples encoded URI or bnode ID, or the special value 'null' which represents all context-less statements. If multiple 'context' parameters are specified, the request will operate on the union of all specified contexts. The operation is executed on all statements that are in the repository if no context is specified.

10.10.1. Request examples

10.10.1.1. Get the size of repository 'mem-rdf'

Request

GET /openrdf-sesame/repositories/mem-rdf/size HTTP/1.1
Host: localhost

Response:

HTTP/1.1 200 OK
Content-Type: text/plain

123456

10.10.1.2. Get the size of a specific context in repository 'mem-rdf'

Request

GET /openrdf-sesame/repositories/mem-rdf/size?context=%3Curn:x-local:graph1%3E HTTP/1.1
Host: localhost

Response:

HTTP/1.1 200 OK
Content-Type: text/plain

4321

10.11. Graph Store support

The SPARQL 1.1 Graph Store HTTP Protocol is supported on a per-repository basis. The functionality is accessible at <SESAME_URL>/repositories/<ID>/rdf-graphs/service (for indirectly referenced named graphs), and <SESAME_URL>/repositories/<ID>/rdf-graphs/<NAME> (for directly referenced named graphs). A request on a directly referenced named graph entails that the request URL itself is used as the named graph identifier in the repository.

Supported methods on these resources are:

  • GET: fetches statements in the named graph from the repository.
  • PUT: Updates data in the named graph in the repository, replacing any existing data in the named graph with the supplied data. The data supplied with this request is expected to contain an RDF document in one of the supported RDF formats.
  • DELETE: Delete all data in the named graph in the repository.
  • POST: Updates data in the named graph in the repository, adding to any existing data in the named graph with the supplied data. The data supplied with this request is expected to contain an RDF document in one of the supported RDF formats.

Request headers:

  • 'Accept': Relevant values for GET requests are the MIME types of supported RDF formats.
  • 'Content-Type': Must specify the encoding of any request data that is sent to a server. Relevant values are the MIME types of supported RDF formats.

For requests on indirectly referenced graphs, the following parameters are supported:

  • 'graph' (optional): specifies the URI of the named graph to be accessed.
  • 'default' (optional): specifies that the default graph is to be accessed. This parameter is expected to be present but have no value.

Each request on an indirectly referenced graph needs to specify precisely one of the above parameters.

10.11.1. Request examples

10.11.1.1. Fetch all statements from a directly referenced named graph in repository "mem-rdf"

Request:

GET /openrdf-sesame/repositories/mem-rdf/rdf-graphs/graph1 HTTP/1.1
Host: localhost
Accept: application/rdf+xml

Response:

HTTP/1.1 200 OK
Content-Type: application/rdf+xml;charset=UTF-8

[RDF/XML ENCODED RDF DATA]

10.11.1.2. Fetch all statements from an indirectly referenced named graph in repository "mem-rdf"

Request:

GET /openrdf-sesame/repositories/mem-rdf/rdf-graphs/service?graph=http%3A%2F%2Fexample.org%2Fgraph1 HTTP/1.1
Host: localhost
Accept: application/rdf+xml

Response:

HTTP/1.1 200 OK
Content-Type: application/rdf+xml;charset=UTF-8

[RDF/XML ENCODED RDF DATA]

10.11.1.3. Fetch all statements from the default graph in repository "mem-rdf"

Request:

GET /openrdf-sesame/repositories/mem-rdf/rdf-graphs/service?default HTTP/1.1
Host: localhost
Accept: application/rdf+xml

Response:

HTTP/1.1 200 OK
Content-Type: application/rdf+xml;charset=UTF-8

[RDF/XML ENCODED RDF DATA]

10.11.1.4. Add statements to a directly referenced named graph in the "mem-rdf" repository

Request:

POST /openrdf-sesame/repositories/mem-rdf/rdf-graphs/graph1 HTTP/1.1
Host: localhost
Content-Type: application/x-turtle;charset=UTF-8

[TURTLE ENCODED RDF DATA]

Response:

HTTP/1.1 204 NO CONTENT

10.11.1.5. Clear a directly referenced named graph in the "mem-rdf" repository

Request:

DELETE /openrdf-sesame/repositories/mem-rdf/rdf-graphs/graph1 HTTP/1.1
Host: localhost

Response:

HTTP/1.1 204 NO CONTENT

10.12. Content types

The following table summarizes the MIME types for various document formats that are relevant to this protocol.

Table 10.1. MIME types for RDF formats

  Format               MIME type
  ------               ---------
  RDF/XML              application/rdf+xml
  N-Triples            text/plain
  Turtle               text/turtle
  N3                   text/rdf+n3
  TriX                 application/trix
  TriG                 application/x-trig
  Sesame Binary RDF    application/x-binary-rdf

Table 10.2. MIME types for variable binding formats

  Format                              MIME type
  ------                              ---------
  SPARQL Query Results XML Format     application/sparql-results+xml
  SPARQL Query Results JSON Format    application/sparql-results+json
  Binary RDF results table format     application/x-binary-rdf-results-table

Table 10.3. MIME types for boolean result formats

  Format                             MIME type
  ------                             ---------
  SPARQL Query Results XML Format    application/sparql-results+xml
  Plain text boolean result format   text/boolean

10.13. TODO

  • Document HTTP error codes
  • Describe use of HEAD and OPTIONS methods

Chapter 11. The SeRQL query language (revision 3.1)

11.1. Revisions

11.1.1. revision 1.1

SeRQL revision 1.1 is a syntax revision (see issue tracker item SES-75). This document describes the revised syntax. From Sesame release 1.2-RC1 onwards, the old syntax is no longer supported.

11.1.2. revision 1.2

SeRQL revision 1.2 covers a set of new functions and operators.

New operations have been marked with (R1.2) where appropriate in this document.

11.1.3. revision 2.0

SeRQL revision 2.0 is an extension of SeRQL that offers functionality for querying contexts. See Section 11.16, “Querying context (R2.0)” for details.

11.1.4. revision 3.0

SeRQL revision 3.0 modifies SeRQL to be more like SPARQL, adopting its semantics and operators. The main backwards compatibility issues with revision 2.0 are:

  • The NULL value has been deprecated; the BOUND-operator should now be used instead. For now, the SeRQL parser will automatically translate NULL values to BOUND-operators as much as possible.
  • The semantics of optional joins has been changed from the existing iterative semantics to the better-defined compositional semantics used in SPARQL. This change only affects some corner cases that are unlikely to appear in actual queries.

11.1.5. revision 3.1

SeRQL revision 3.1 adds the possibility to apply the IN-operator to a list of values. It also adds support for some SPARQL functionality that was not available in SeRQL in any form. This includes the SAMETERM, STR, LANGMATCHES and REGEX operators, result ordering using ORDER BY, UNION of path expressions and the REDUCED modifier for both select and construct queries. SeRQL revision 3.1 was implemented in Sesame 2.3.0.

11.2. Introduction

SeRQL ("Sesame RDF Query Language", pronounced "circle") is an RDF query language that is very similar to SPARQL, but with other syntax. SeRQL was originally developed as a better alternative for the query languages RQL and RDQL. A lot of SeRQL's features can now be found in SPARQL and SeRQL has adopted some of SPARQL's features in return.

This document briefly shows all of these features. After reading through this document one should be able to write SeRQL queries.

11.3. URIs, literals and variables

URIs and literals are the basic building blocks of RDF. For a query language like SeRQL, variables are added to this list. The following sections will show how to write these down in SeRQL.

11.3.1. Variables

Variables are identified by names. These names must start with a letter or an underscore ('_') and can be followed by zero or more letters, numbers, underscores, dashes ('-') or dots ('.'). Example variable names are:

  • Var1
  • _var2
  • unwise.var-name_isnt-it

SeRQL keywords are not allowed to be used as variable names. Currently, the following keywords are used in SeRQL: select, construct, distinct, reduced, as, from, context, where, order, by, asc, desc, limit, offset, using, namespace, true, false, not, and, or, sameterm, like, ignore, case, regex, label, lang, langmatches, datatype, localname, str, bound, null, isresource, isbnode, isuri, isliteral, in, union, intersect, minus, exists, any, all.

Keywords in SeRQL are all case-insensitive but variable names are case-sensitive.

11.3.2. URIs

There are two ways to write down URIs in SeRQL: either as full URIs or as abbreviated URIs. Full URIs must be surrounded with "<" and ">". Examples of this are:

  • <http://www.openrdf.org/index.html>
  • <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
  • <mailto:sesame@openrdf.org>
  • <file:///C:\rdffiles\test.rdf>

As URIs tend to be long strings, with the first part (the namespace) often being shared by several of them, SeRQL allows one to use abbreviated URIs (or QNames) by defining short names, called "prefixes", for these namespaces. A QName always starts with one of the defined prefixes and a colon (":"). After this colon, the part of the URI that is not part of the namespace follows. The first part, consisting of the prefix and the colon, is replaced by the full namespace by the query engine. Some example QNames are:

  • sesame:index.html
  • rdf:type
  • foaf:Person

11.3.3. Literals

RDF literals consist of three parts: a label, a language tag, and a datatype. The language tag and the datatype are optional and at most one of these two can accompany a label (a literal can not have both a language tag and a datatype). The notation of literals in SeRQL has been modelled after their notation in N-Triples; literals start with the label, which is surrounded by double quotes, optionally followed by a language tag with a "@" prefix or by a datatype URI with a "^^" prefix. Example literals are:

  • "foo"
  • "foo"@en
  • "<foo/>"^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral>

The SeRQL notation for abbreviated URIs can also be used. When the prefix rdf is mapped to the namespace http://www.w3.org/1999/02/22-rdf-syntax-ns#, the last example literal could also have been written down like:

  • "<foo/>"^^rdf:XMLLiteral

SeRQL has also adopted the character escapes from N-Triples; special characters can be escaped by prefixing them with a backslash. One of the special characters is the double quote. Normally, a double quote would signal the end of a literal's label. If the double quote is part of the label, it needs to be escaped. For example, the sentence John said: "Hi!" can be encoded in a SeRQL literal as: "John said: \"Hi!\"".

As the backslash is a special character itself, it also needs to be escaped. To encode a single backslash in a literal's label, two backslashes need to be written in the label. For example, a Windows directory would be encoded as: "C:\\Program Files\\Apache Tomcat\\".

SeRQL has functions for extracting each of the three parts of a literal. These functions are label, lang, and datatype. label("foo"@en) extracts the label "foo", lang("foo"@en) extracts the language tag "en", and datatype("foo"^^rdf:XMLLiteral) extracts the datatype rdf:XMLLiteral. The use of these functions is explained later.

11.3.4. Blank Nodes (R1.2)

RDF has a notion of blank nodes. These are nodes in the RDF graph that are not labeled with a URI or a literal. The interpretation of such blank nodes is as a form of existential quantification: it allows one to assert that "there exists a node such that..." without specifying what that particular node is. Blank nodes do in fact often have identifiers, but these identifiers are assigned internally by whatever processor is processing the graph and they are only valid in the local context, not as global identifiers (unlike URIs).

Strictly speaking, blank nodes are only addressable indirectly, by querying for one or more properties of the node. However, as a practical shortcut, SeRQL allows blank node identifiers to be used in queries. The syntax for blank nodes is adopted from N-Triples, using a QName-like syntax with "_" as the namespace prefix and the internal blank node identifier as the local name. For example:

  • _:bnode1

This identifies the blank node with internal identifier "bnode1". These blank node identifiers can be used in the same way that normal URIs or QNames can be used.

Caution: It is important to realize that addressing blank nodes in this way makes SeRQL queries non-portable across repositories. There is no guarantee that in two repositories, even if they contain identical datasets, the blank node identifiers will be identical. It may well be that "bnode1" in repository A is a completely different blank node than "bnode1" in repository B. Even in the same repository, it is not guaranteed that blank node identifiers are stable over updates: if certain statements are added to or removed from a repository, it is not guaranteed "bnode1" still identifies the same blank node that it did before the update operation.

11.4. Path expressions

One of the most prominent parts of SeRQL are path expressions. Path expressions are expressions that match specific paths through an RDF graph.

11.4.1. Basic path expressions

Imagine that we want to query an RDF graph for persons who work for companies that are IT companies. Querying for this information comes down to finding the following pattern in the RDF graph (gray nodes denote variables):

Figure 11.1. A basic path expression

A basic path expression

The SeRQL notation for path expressions resembles the picture above; it is written down as:

{Person} foo:worksFor {Company} rdf:type {foo:ITCompany}

The parts surrounded by curly brackets represent the nodes in the RDF graph, the parts between these nodes represent the edges in the graph. The direction of the arcs (properties) in SeRQL path expressions is always from left to right.

In SeRQL queries, multiple path expressions can be specified by separating them with commas. For example, the path expression shown before can also be written down as two smaller path expressions:

{Person} foo:worksFor {Company},
{Company} rdf:type {foo:ITCompany}

The nodes and edges in the path expressions can be variables, URIs and literals. Also, a node can be left empty in case one is not interested in the value of that node. Here are some more example path expressions to illustrate this:

  • {Person} foo:worksFor {} rdf:type {foo:ITCompany}
  • {Painting} ex:painted_by {} ex:name {"Picasso"}
  • {comic:RoadRunner} SomeRelation {foo:WillyECoyote}

11.4.2. Path expression short cuts

Each and every path can be constructed using a set of basic path expressions. Sometimes, however, it is nicer to use one of the available short cuts. There are three types of short cuts:

11.4.2.1. Multi-value nodes

In situations where one wants to query for two or more statements with identical subject and predicate, the subject and predicate do not have to be repeated over and over again. Instead, a multi-value node can be used:

{subj1} pred1 {obj1, obj2, obj3}

A built-in constraint on this construction is that the values for the variables in the multi-value node are pairwise distinct. Therefore, this path expression is equivalent to the following combination of path expressions and boolean constraints:

FROM
  {subj1} pred1 {obj1},
  {subj1} pred1 {obj2},
  {subj1} pred1 {obj3}
WHERE obj1 != obj2 AND obj1 != obj3 AND obj2 != obj3

Or graphically:

Figure 11.2. Multi-value nodes

Multi-value nodes

Multi-value nodes can also be used when statements share the predicate and object, e.g.:

{subj1, subj2, subj3} pred1 {obj1}

When used in a longer path expression, multi-value nodes apply to both the part left of the node and the part right of the node. The following path expression:

{first} pred1 {middle1, middle2} pred2 {last}

matches the following graph:

Figure 11.3. Multi-value nodes in a longer path expression

Multi-value nodes in a longer path expression

11.4.2.2. Branches

One of the short cuts that is probably used most is the notation for branches in path expressions. There are many situations where one wants to query multiple properties of a single subject. Instead of repeating the subject over and over again, one can use a semi-colon to attach a predicate-object combination to the subject of the last part of a path expression, e.g.:

{subj1} pred1 {obj1};
        pred2 {obj2}

Which is equivalent to:

{subj1} pred1 {obj1},
{subj1} pred2 {obj2}

Or graphically:

Figure 11.4. Branches in a path expression

Branches in a path expression

A more advanced example is:

{first} pred {} pred1 {obj1};
                pred2 {obj2} pred3 {obj3}

Which matches the following graph:

Figure 11.5. Branches in a longer path expression

Branches in a longer path expression

Note that an anonymous variable is used in the middle of the path expression.

11.4.2.3. Reified statements

The last short cut is a notation for reified statements. A path expression representing a single statement (i.e. {node} edge {node}) can be written between the curly brackets of a node, e.g.:

{ {reifSubj} reifPred {reifObj} } pred {obj}

This would be equivalent to querying (using "rdf:" as a prefix for the RDF namespace, and "Statement" as a variable for storing the statement's URI):

{Statement} rdf:type {rdf:Statement},
{Statement} rdf:subject {reifSubj},
{Statement} rdf:predicate {reifPred},
{Statement} rdf:object {reifObj},
{Statement} pred {obj}

Again, graphically:

Figure 11.6. A reification path expression

A reification path expression

11.4.3. Optional path expressions

Optional path expressions differ from 'normal' path expressions in that they do not have to be matched to find query results. The SeRQL query engine will try to find paths in the RDF graph matching the path expression, but when it cannot find any paths it will skip the expression and leave any variables in it unbound.

Consider an RDF graph that contains information about people that have names, ages, and optionally e-mail addresses. This is a situation that is very common in RDF data. A logical query on this data is a query that yields all names, ages and, when available, e-mail addresses of people, e.g.:

{Person} ex:name {Name};
         ex:age  {Age};
         ex:email {EmailAddress}

However, using normal path expressions as in the query above, people without an e-mail address will not be included in the query result. With optional path expressions, one can indicate that a specific (part of a) path expression is optional. This is done using square brackets, i.e.:

{Person} ex:name {Name};
         ex:age  {Age};
        [ex:email {EmailAddress}]

Or alternatively:

 {Person} ex:name {Name};
          ex:age  {Age},
[{Person} ex:email {EmailAddress}]

In contrast to the first path expressions, this expression will also match people without an e-mail address. For these people, the variable EmailAddress will be unbound.

Optional path expressions can also be nested. This is useful in situations where the existence of a specific path is dependent on the existence of another path. For example, the following path expression queries for the titles of all known documents and, if the author of the document is known, the name of the author (if it is known) and his e-mail address (if it is known):

{Document} ex:title {Title};
          [ex:author {Author} [ex:name {Name}];
                              [ex:email {Email}]]

With this path expression, the SeRQL query engine will not try to find the name and e-mail address of an author when it cannot even find the resource representing the author.

11.5. Select- and construct queries

The SeRQL query language supports two querying concepts. The first one can be characterized as returning a table of values, or a set of variable-value bindings. The second one returns an RDF graph, which can be a subgraph of the graph being queried, or a graph containing information that is derived from it. The first type of queries are called "select queries", the second type of queries are called "construct queries".

A SeRQL query is typically built up from one to seven clauses. For select queries these clauses are: SELECT, FROM, FROM CONTEXT, WHERE, LIMIT, OFFSET and USING NAMESPACE. One might recognize some of these clauses from SQL, but their usage is slightly different. For construct queries the clauses are the same with the exception of the first; construct queries start with a CONSTRUCT clause instead of a SELECT clause. Except for the first clause (SELECT or CONSTRUCT), all clauses are optional.

The first clause (i.e. SELECT or CONSTRUCT) determines what is done with the results that are found. In a SELECT clause, one can specify which variable values should be returned. In a CONSTRUCT clause, one can specify which statements should be returned.

The FROM clause specifies path expressions, which were explained in the previous section. It defines the paths in an RDF graph that are relevant to the query. Note that, when the FROM clause is not specified, the query will simply return the constants specified in the SELECT or CONSTRUCT clause.

The FROM CONTEXT clause is new in SeRQL revision 2.0. It is a variant of the FROM clause that allows one to constrain the path expressions in the clause to one or more contexts. Using context in querying will be explained in more detail in Section 11.16, “Querying context (R2.0)”.

The WHERE clause specifies additional (Boolean) constraints on the values in the path expressions. These are constraints on the nodes and edges of the paths that cannot be expressed in the path expressions themselves.

The LIMIT and OFFSET clauses can be used separately or combined in order to get a subset of all query answers. Their usage is very similar to the LIMIT and OFFSET clauses in SQL queries. The LIMIT clause determines the (maximum) number of query answers that will be returned. The OFFSET clause determines which query answer will be returned as the first result, skipping as many query results as specified in this clause.

Finally, the USING NAMESPACE clause can be used to declare namespace prefixes. These are the mappings from prefixes to namespaces that were referred to in one of the previous sections about (abbreviated) URIs.

The WHERE, LIMIT, OFFSET and USING NAMESPACE clauses will be explained in one of the next sections. The following section will explain the SELECT and FROM clauses.

11.6. Select queries

As said before, select queries return tables of values, or sets of variable-value bindings. Which values are returned can be specified in the select clause. One can specify variables and/or values in the select clause, separated by commas. The following example query returns all URIs of classes:

SELECT C
FROM {C} rdf:type {rdfs:Class}

It is also possible to use a '*' in the SELECT clause. In that case, all variable values will be returned, e.g.:

SELECT *
FROM {S} rdfs:label {O}

This query will return the values of the variables S and O. The variables to return can also be listed explicitly, which determines the order in which their values appear in the result:

SELECT O, S
FROM {S} rdfs:label {O}

By default, the results of a select query are not filtered for duplicate rows. Because of the nature of the above queries, these queries will never return duplicates. However, more complex queries might result in duplicate result rows. These duplicates can be filtered out by the SeRQL query engine. To enable this functionality, one needs to specify the DISTINCT keyword after the select keyword. For example:

SELECT DISTINCT *
FROM {Country1} ex:borders {} ex:borders {Country2}
USING NAMESPACE
    ex = <http://example.org/things#>

An alternative to DISTINCT is the REDUCED keyword (since R3.1). Specifying the REDUCED keyword allows the query engine to filter duplicates from the results, but does not require or guarantee that all duplicates are eliminated. In some cases specifying this keyword allows the query engine to apply more extensive query optimizations, resulting in better query performance. Specifying this option is recommended if there are no strong requirements to retrieve all duplicates.
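
As a minimal sketch, the previous example can be relaxed to permit, but not require, duplicate filtering by replacing DISTINCT with REDUCED:

SELECT REDUCED *
FROM {Country1} ex:borders {} ex:borders {Country2}
USING NAMESPACE
    ex = <http://example.org/things#>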

11.7. Construct queries

Construct queries return RDF graphs as sets of statements. The statements that a query should return can be specified in the construct clause using the previously explained path expressions. The following is an example construct query:

CONSTRUCT {Parent} ex:hasChild {Child}
FROM {Child} ex:hasParent {Parent}
USING NAMESPACE
    ex = <http://example.org/things#>

This query defines the inverse of the property ex:hasParent to be ex:hasChild. This is just one example of a query that produces information that is derived from the original information. Here is one more example:

CONSTRUCT
    {Artist} rdf:type {ex:Painter};
             ex:hasPainted {Painting}
FROM
    {Artist} rdf:type {ex:Artist};
             ex:hasCreated {Painting} rdf:type {ex:Painting}
USING NAMESPACE
    ex = <http://example.org/things#>

This query derives that an artist who has created a painting is a painter. The relation between the painter and the painting is modelled as ex:hasPainted.

Instead of specifying a path expression in the CONSTRUCT clause, one can also use a '*'. In that case, the CONSTRUCT clause is identical to the FROM clause. This allows one to extract a subgraph from a larger graph, e.g.:

CONSTRUCT *
FROM {SUB} rdfs:subClassOf {SUPER}

This query extracts all rdfs:subClassOf relations from an RDF graph.

Just like with select queries, the results of a construct query are not filtered for duplicate statements by default. Again, these duplicates are filtered out by the SeRQL query engine if the DISTINCT keyword is specified after the construct keyword, for example:

CONSTRUCT DISTINCT
    {Artist} rdf:type {ex:Painter}
FROM
    {Artist} rdf:type {ex:Artist};
             ex:hasCreated {} rdf:type {ex:Painting}
USING NAMESPACE
    ex = <http://example.org/things#>

Again, the REDUCED keyword can also be used as an alternative to DISTINCT. See Section 11.6, “Select queries” for a description of this keyword.

11.8. The WHERE clause

The third clause in a query is the WHERE clause. This is an optional clause in which one can specify Boolean constraints on variables.

The following sections will explain the available Boolean expressions for use in the WHERE clause. Section 11.8.12, “Nested WHERE clauses (R1.2)” will explain how WHERE clauses can be nested inside optional path expressions.

11.8.1. Boolean constants

There are two Boolean constants, TRUE and FALSE. The first one is simply always true, the last one is always false. The following query will never produce any results because the constraint in the where clause will never evaluate to true:

SELECT *
FROM {X} Y {Z}
WHERE FALSE

11.8.2. Value (in)equality

The most common boolean constraint is equality or inequality of values. Values can be compared using the operators "=" (equality) and "!=" (inequality). The expression

Var = <foo:bar>

is true if the variable Var has been bound to the URI <foo:bar>, and the expression

Var1 != Var2

checks whether two variables are bound to unequal values.

Equality of literals is influenced by the literal's datatype. This means that two literals that represent the same value, but are written down differently, still compare equal. For example, the following comparison evaluates to true:

"123"^^xsd:positiveInteger = "123.0"^^xsd:float

11.8.3. SameTerm (R3.1)

Where the equality operators described in the previous section compare values taking datatypes into account, the SameTerm operator requires an exact lexical match of values. Using a SameTerm operator on a variable and a value is equivalent to replacing the variable with the value in all path expressions. For example, the following query:

SELECT X, Y
FROM {X} Y {Z}
WHERE SameTerm(Z, "123.0"^^xsd:float)

...is equivalent to:

SELECT X, Y
FROM {X} Y {"123.0"^^xsd:float}

...but is semantically different from:

SELECT X, Y
FROM {X} Y {Z}
WHERE Z = "123.0"^^xsd:float

11.8.4. Numerical comparisons

Numbers can be compared to each other using the operators "<" (less than), "<=" (less than or equal to), ">" (greater than) and ">=" (greater than or equal to). SeRQL uses a literal's datatype to determine whether its value is numerical. All XML Schema built-in numerical datatypes are supported, i.e.: xsd:float, xsd:double, xsd:decimal and all subtypes of xsd:decimal (xsd:long, xsd:nonPositiveInteger, xsd:byte, etc.), where the prefix xsd is used to reference the XML Schema namespace.

In the following query, a comparison between values of type xsd:positiveInteger is used to retrieve all countries that have a population of less than 1 million:

SELECT Country
FROM {Country} ex:population {Population}
WHERE Population < "1000000"^^xsd:positiveInteger
USING NAMESPACE
    ex = <http://example.org/things#>

If one wants to compare values of incompatible types, one can try to cast one or both of the values to another type. For example, in the above query, if the values that Population is bound to generally do not have a datatype, one can cast these values to xsd:integer to make the comparison work, e.g.:

SELECT Country
FROM {Country} ex:population {Population}
WHERE xsd:integer(Population) < "1000000"^^xsd:positiveInteger
USING NAMESPACE
    ex = <http://example.org/things#>

SeRQL supports all value casting methods from SPARQL, see SPARQL's Constructor Functions for more details.

11.8.5. Bound() (R3.0)

The bound() boolean function checks whether a specific variable has been bound to a value. For example, the following query returns the names of all people without a (known) e-mail address.

SELECT Name
FROM {Person} foaf:name {Name};
             [ex:email {EmailAddress}]
WHERE NOT BOUND(EmailAddress)

11.8.6. isUri() and isBnode() (R1.2)

The isURI() and isBNode() boolean functions are more specific versions of isResource(). They check whether a variable is bound to a URI value or a BNode value, respectively. For example, the following query returns only URIs (and filters out all bNodes and literals):

SELECT V
FROM {R} prop {V}
WHERE isURI(V)

11.8.7. Like (R1.2)

The LIKE operator can check whether a value matches a specified pattern of characters. '*' characters can be used as wildcards, matching with zero or more characters. The rest of the characters are compared lexically. The pattern is surrounded with double quotes, just like a literal's label.

SELECT Country
FROM {Country} ex:name {Name}
WHERE Name LIKE "Belgium"
USING NAMESPACE
    ex = <http://example.org/things#>

By default, the LIKE operator does a case-sensitive comparison: in the above query, the operator fails if the variable Name is bound to the value "belgium" instead of "Belgium". Optionally, one can specify that the operator should perform a case-insensitive comparison:

SELECT Country
FROM {Country} ex:name {Name}
WHERE Name LIKE "belgium" IGNORE CASE
USING NAMESPACE
    ex = <http://example.org/things#>

In this query, the operator will succeed for "Belgium", "belgium", "BELGIUM", etc.

The '*' character can be used as a wildcard to indicate substring matches, for example:

SELECT Country
FROM {Country} ex:name {Name}
WHERE Name LIKE "*Netherlands"
USING NAMESPACE
    ex = <http://example.org/things#>

This query will match any country names that end with the string "Netherlands", for example "The Netherlands".

11.8.8. Regex (R3.1)

The regex() function in SeRQL has been adopted from SPARQL. See the SPARQL regex description for more information.
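
As a hedged sketch (reusing the hypothetical ex:name property from the LIKE examples above), a case-insensitive regular expression match on country names starting with "the" could be written as:

SELECT Country
FROM {Country} ex:name {Name}
WHERE regex(Name, "^the", "i")
USING NAMESPACE
    ex = <http://example.org/things#>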

11.8.9. LangMatches (R3.1)

The langMatches() function in SeRQL has been adopted from SPARQL. See the SPARQL langMatches description for more information.
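
A minimal sketch, combining langMatches() with the lang() function described in Section 11.9, “Other functions”, to match any English language tag (e.g. "en", "en-GB"):

SELECT L
FROM {R} rdfs:label {L}
WHERE isLiteral(L) AND langMatches(lang(L), "EN")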

11.8.10. And, or, not

Boolean constraints and functions can be combined using the AND and OR operators, and negated using the NOT operator. The NOT operator has the highest precedence, then the AND operator, and finally the OR operator. Parentheses can be used to override the default precedence of these operators. The following query is a (somewhat artificial) example of this:

SELECT *
FROM {X} Prop {Y} rdfs:label {L}
WHERE NOT L LIKE "*FooBar*" AND
      (Y = <foo:bar> OR Y = <bar:foo>) AND
      isLiteral(L)

11.8.11. In (R3.1)

The IN operator can check whether a value is contained in a list of one or more other values. For example:

SELECT *
FROM {X} Prop {Y}
WHERE Y IN (<foo:bar>, <bar:foo>)

When there are multiple alternatives, this syntax is more convenient than the semantically equivalent combination of OR and SameTerm operators:

SELECT *
FROM {X} Prop {Y}
WHERE SameTerm(Y, <foo:bar>) OR SameTerm(Y, <bar:foo>)

11.8.12. Nested WHERE clauses (R1.2)

In order to be able to express boolean constraints on variables in optional path expressions, it is possible to use a nested WHERE clause. The constraints in such a nested WHERE clause restrict the potential matches of the optional path expressions, without causing the entire query to fail if the boolean constraint fails.

To illustrate the difference between a nested WHERE clause and a 'normal' WHERE clause, consider the following two queries on the same data:

Data (using Turtle format):

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/> .

_:a  foaf:name   "Michael" .

_:b  foaf:name   "Rubens" .
_:b  ex:email    "rubinho@example.work".

_:c  foaf:name   "Giancarlo" .
_:c  ex:email    "giancarlo@example.work".

Query 1 (normal WHERE-clause):

SELECT 
   Name, EmailAddress
FROM
  {Person} foaf:name {Name};
          [ex:email {EmailAddress}]
WHERE EmailAddress LIKE "g*"

Query 2 (nested WHERE-clause):

SELECT 
   Name, EmailAddress
FROM
  {Person} foaf:name {Name};
          [ex:email {EmailAddress} WHERE EmailAddress LIKE "g*"]

In query 1, a normal WHERE clause specifies that the EmailAddress found by the optional expression must begin with the letter "g". The result of this query will be:

Name        EmailAddress
Giancarlo   "giancarlo@example.work"

Despite the fact that the match on EmailAddress is defined as optional, the persons named "Michael" and "Rubens" are not returned. The reason is that the WHERE clause explicitly says that the value bound to the optional variable must start with the letter "g". For Michael, no value is found, hence the variable is unbound, and the comparison operator fails on this. For Rubens, a value is found, but it does not start with the letter "g".

In query 2, however, a nested WHERE-clause is used. This specifies that any binding the optional expression matches must begin with the letter "g". The result of this query is:

Name        EmailAddress
Michael
Rubens
Giancarlo   "giancarlo@example.work"

The person "Michael" is returned without a result for his email address because there is no email address known for him at all. The person "Rubens" is returned without a result for his email address because, although he does have an email address, it does not start with the letter "g".

A query can contain at most one nested WHERE-clause per optional path expression, and at most one 'normal' WHERE-clause.

11.9. Other functions

Apart from the boolean functions and operators introduced in the previous section, SeRQL supports several other functions that return RDF terms rather than boolean values. These functions can be used in both the SELECT and the WHERE clause.

11.9.1. label(), lang() and datatype()

The three functions label(), lang() and datatype() all operate on literals. The result of the label() function is the lexical form of the supplied literal. The lang() function returns the language attribute. Both functions return their result as an untyped literal, which can again be compared with other literals using the (in)equality, comparison and LIKE operators. The result of the datatype() function is a URI, which can be compared to other URIs. These functions can also be used in SELECT clauses, but not in path expressions.

An example query:

SELECT label(L)
FROM {R} rdfs:label {L}
WHERE isLiteral(L) AND lang(L) LIKE "en*"
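
Along the same lines, a sketch that uses datatype() to select only values typed as xsd:int (ex:value is a hypothetical property; xsd is one of the default prefixes):

SELECT R, V
FROM {R} ex:value {V}
WHERE isLiteral(V) AND datatype(V) = xsd:int
USING NAMESPACE
    ex = <http://example.org/things#>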

11.9.2. namespace() and localName() (R1.2)

The functions namespace() and localName() operate on URIs. The namespace() function returns the namespace of the supplied URI, as a URI object. The localName() function returns the local name part of the supplied URI, as a literal. These functions can also be used in SELECT clauses, but not in path expressions.

The following query retrieves all properties of foaf:Person instances that are in the FOAF namespace. Notice that as a shorthand for the full URI, we can use a namespace prefix (followed by a colon) as an argument.

Data:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/> .

_:a  rdf:type         foaf:Person .
_:a  ex:nick          "Schumi" .
_:a  foaf:firstName   "Michael" .
_:a  foaf:knows       _:b .

_:b  rdf:type         foaf:Person .
_:b  foaf:firstName   "Rubens" .
_:b  foaf:nick        "Rubinho" .

Query:

SELECT foafProp, Value
FROM {} foafProp {Value}
WHERE namespace(foafProp) = foaf:
USING NAMESPACE
    foaf = <http://xmlns.com/foaf/0.1/>

Result:

foafProp                                Value
<http://xmlns.com/foaf/0.1/firstName>   "Michael"
<http://xmlns.com/foaf/0.1/knows>       _:b
<http://xmlns.com/foaf/0.1/firstName>   "Rubens"
<http://xmlns.com/foaf/0.1/nick>        "Rubinho"

In the following example, the localName() function is used to match two equivalent properties from different namespaces (using the above data).

Query:

SELECT nick
FROM {} rdf:type {foaf:Person};
        nickProp {nick}
WHERE localName(nickProp) LIKE "nick"
USING NAMESPACE
    foaf = <http://xmlns.com/foaf/0.1/>

Result:

nick
"Schumi"
"Rubinho"

11.9.3. str() (R3.1)

The str() function in SeRQL has been adopted from SPARQL. It is similar to the label() function, except that it can also be applied to URIs, converting them to literals. See the SPARQL specification for more extensive information.
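
For illustration, a sketch that uses str() to apply a lexical LIKE match to resource URIs, something label() cannot do:

SELECT R
FROM {R} rdfs:label {L}
WHERE str(R) LIKE "http://example.org/*"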

11.10. The ORDER BY clause

The ORDER BY clause can be used to order query results in particular ways. This functionality has been adopted from SPARQL, but the syntax is slightly different. The following example retrieves all known countries, ordered from largest to smallest population:

SELECT Country, Population
FROM {Country} ex:population {Population}
ORDER BY Population DESC
USING NAMESPACE
    ex = <http://example.org/things#>

The DESC keyword in this example tells the query engine to sort the results in descending order. The ASC keyword can be used to sort the results in ascending order; this is also the default when no sort order is specified.

Multiple ordering expressions can be specified. If the first expression doesn't define an order between two results, the query engine will use the second expression. This process continues until an order between the results has been established, or until all ordering expressions have been processed. In the latter case, the order of these results is unspecified.
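
A sketch with two ordering expressions (ex:name is a hypothetical property): results are sorted by descending population first, and countries with equal population are then sorted by name:

SELECT Country, Name, Population
FROM {Country} ex:population {Population};
               ex:name {Name}
ORDER BY Population DESC, Name ASC
USING NAMESPACE
    ex = <http://example.org/things#>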

Please see the SPARQL specification for more information on the (partial) ordering of URIs, literals, etc.

11.11. The LIMIT and OFFSET clauses

LIMIT and OFFSET allow you to retrieve just a portion of the results that are generated by the query. If a limit count is given, no more than that many results will be returned (but possibly fewer, if the query itself yields fewer results).

OFFSET says to skip that many results before beginning to return results. OFFSET 0 is the same as omitting the OFFSET clause. If both OFFSET and LIMIT appear, then OFFSET rows are skipped before starting to count the LIMIT results that are returned.
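
As a brief sketch, the following query retrieves the fourth "page" of a country list, ten results per page, assuming a hypothetical class ex:Country:

SELECT Country
FROM {Country} rdf:type {ex:Country}
LIMIT 10
OFFSET 30
USING NAMESPACE
    ex = <http://example.org/things#>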

11.12. The USING NAMESPACE clause

The USING NAMESPACE clause can be used to define short prefixes for namespaces, which can then be used in abbreviated URIs. Multiple prefixes can be defined, but each declaration must have a unique prefix. The following query shows the use of namespace prefixes:

CONSTRUCT
    {Artist} rdf:type {art:Painter};
             art:hasPainted {Painting}
FROM
    {Artist} rdf:type {art:Artist};
             art:hasCreated {Painting} rdf:type {art:Painting}
USING NAMESPACE
    rdf = <http://www.w3.org/1999/02/22-rdf-syntax-ns#>,
    art = <http://example.org/arts/>

The query engine will replace every occurrence of rdf: in an abbreviated URI with http://www.w3.org/1999/02/22-rdf-syntax-ns#, and art: with http://example.org/arts/. So art:hasPainted will be resolved to the URI http://example.org/arts/hasPainted.

Five namespaces that are used very frequently have been assigned prefixes by default:

Table 11.1. Default namespaces

Prefix   Namespace
rdf      http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs     http://www.w3.org/2000/01/rdf-schema#
xsd      http://www.w3.org/2001/XMLSchema#
owl      http://www.w3.org/2002/07/owl#
sesame   http://www.openrdf.org/schema/sesame#

These prefixes can be used without declaring them. If any of these prefixes is declared explicitly in a query, that declaration will override the default mapping.

11.13. Built-in predicates (REVISED in R2.0)

SeRQL contains a number of built-in predicates. These built-ins can be used like any other predicate, as part of a path expression. The difference with normal predicates is that the built-ins act as operators on the underlying RDF graph: they can be used to query for relations between RDF resources that are not explicitly modeled, nor immediately apparent from the RDF Semantics, but which are nevertheless very useful.

Note: in Sesame 2.0 built-in predicates are only supported on repositories that have a DirectTypeHierarchyInferencer Sail in the Sail stack. This inferencer is a stacked Sail that can be deployed on top of a normal ForwardChainingRDFSInferencer.

Currently, the following built-in predicates are supported:

  • {X} sesame:directSubClassOf {Y}

    This relation holds for every X and Y where:

    1. X rdfs:subClassOf Y.
    2. X != Y.
    3. There is no class Z (Z != Y and Z != X) such that X rdfs:subClassOf Z and Z rdfs:subClassOf Y.
  • {X} sesame:directSubPropertyOf {Y}

    This relation holds for every X and Y where:

    1. X rdfs:subPropertyOf Y.
    2. X != Y.
    3. There is no property Z (Z != X and Z != Y) such that X rdfs:subPropertyOf Z and Z rdfs:subPropertyOf Y.
  • {X} sesame:directType {Y}

    This relation holds for every X and Y where:

    1. X rdf:type Y.
    2. There is no class Z (Z != Y) such that X rdf:type Z and Z rdfs:subClassOf Y.

Note: the above definition takes class/property equivalence through cyclic subClassOf/subPropertyOf relations into account. This means that if A rdfs:subClassOf B, and B rdfs:subClassOf A, it holds that A = B.

The namespace prefix 'sesame' is built-in and does not have to be defined in the query.
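
As an illustration, the following query sketch retrieves the direct subclasses of a hypothetical class ex:Vehicle, skipping any intermediate classes:

SELECT Sub
FROM {Sub} sesame:directSubClassOf {ex:Vehicle}
USING NAMESPACE
    ex = <http://example.org/things#>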

11.14. Set combinatory operations

SeRQL offers three combinatory operations that can be used to combine sets of query results.

11.14.1. UNION (REVISED in R3.0, extended in R3.1)

UNION is a combinatory operation whose result is the set of query answers of both its operands. This allows one to specify alternatives in a query solution.

By default, UNION filters out duplicate answers from its operands. Specifying the ALL keyword ("UNION ALL") disables this filter.
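
For illustration, a sketch of the ALL variant (using the same dc10/dc11 title properties as the example below); any title that occurs identically under both properties would here be returned twice:

SELECT title
FROM {book} dc10:title {title}

UNION ALL

SELECT title
FROM {book} dc11:title {title}

USING NAMESPACE
    dc10 = <http://purl.org/dc/elements/1.0/>,
    dc11 = <http://purl.org/dc/elements/1.1/>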

The following example query retrieves the titles of books in the data, where the property used to describe the title can be either from the DC 1.0 or DC 1.1 specification.

Data:

@prefix dc10:  <http://purl.org/dc/elements/1.0/> .
@prefix dc11:  <http://purl.org/dc/elements/1.1/> .

_:a  dc10:title     "The SeRQL Query Language" .
_:b  dc11:title     "The SeRQL Query Language (revision 1.2)" .

_:c  dc10:title     "SeRQL" .
_:c  dc11:title     "SeRQL (updated)" .

Query:

SELECT title
FROM {book} dc10:title {title}

UNION

SELECT title
FROM {book} dc11:title {title}

USING NAMESPACE
    dc10 = <http://purl.org/dc/elements/1.0/>,
    dc11 = <http://purl.org/dc/elements/1.1/>

Result:

title
"The SeRQL Query Language"
"The SeRQL Query Language (revision 1.2)"
"SeRQL"
"SeRQL (updated)"

The union operator simply combines the results from both subqueries, matching bindings by their name:

SELECT title, "1.0" AS "version"
FROM {book} dc10:title {title}

UNION

SELECT title
FROM {x} dc11:title {title}

USING NAMESPACE
    dc10 = <http://purl.org/dc/elements/1.0/>,
    dc11 = <http://purl.org/dc/elements/1.1/>

Result:

title                                       version
"The SeRQL Query Language"                  "1.0"
"The SeRQL Query Language (revision 1.2)"
"SeRQL"                                     "1.0"
"SeRQL (updated)"

Since R3.1, the UNION operation can also be applied to path expressions. With this syntax, the first example can be rewritten to a more compact:

SELECT title
FROM
    {book} dc10:title {title}
    UNION
    {book} dc11:title {title}
USING NAMESPACE
    dc10 = <http://purl.org/dc/elements/1.0/>,
    dc11 = <http://purl.org/dc/elements/1.1/>

11.14.2. INTERSECT (R1.2)

The INTERSECT operation retrieves query results that occur in both its operands.

The following query only retrieves those album creators for which the name is specified identically in both DC 1.0 and DC 1.1.

Data:

@prefix dc10:  <http://purl.org/dc/elements/1.0/> .
@prefix dc11:  <http://purl.org/dc/elements/1.1/> .

_:a  dc10:creator     "George" .
_:a  dc10:creator     "Ringo" .

_:b  dc11:creator     "George" .
_:b  dc11:creator     "Ringo" .

_:c  dc10:creator     "Paul" .
_:c  dc11:creator     "Paul C." .

Query:

SELECT creator
FROM {album} dc10:creator {creator}

INTERSECT

SELECT creator
FROM {album} dc11:creator {creator}

USING NAMESPACE
    dc10 = <http://purl.org/dc/elements/1.0/>,
    dc11 = <http://purl.org/dc/elements/1.1/>

Result:

creator
"George"
"Ringo"

11.14.3. MINUS (R1.2)

The MINUS operation returns query results from its first operand which do not occur in the results of its second operand.

The following query returns the titles of all albums of which "Paul" is not a creator.

Data:

@prefix dc10:  <http://purl.org/dc/elements/1.0/> .

_:a  dc10:creator     "George" .
_:a  dc10:title       "Sergeant Pepper" .

_:b  dc10:creator     "Paul" .
_:b  dc10:title       "Yellow Submarine" .

_:c  dc10:creator     "Paul" .
_:c  dc10:creator     "Ringo" .
_:c  dc10:title       "Let it Be" .

Query:

SELECT title
FROM {album} dc10:title {title}

MINUS

SELECT title
FROM {album} dc10:title {title};
             dc10:creator {creator}
WHERE creator like "Paul"

USING NAMESPACE
    dc10 = <http://purl.org/dc/elements/1.0/>,
    dc11 = <http://purl.org/dc/elements/1.1/>

Result:

title
"Sergeant Pepper"

11.15. Query Nesting

SeRQL has several constructs for nested queries. Nested queries can occur as operands for several boolean operators, which are explained in more detail in the following sections.

SeRQL applies variable scoping for nested queries. This means that when a variable is assigned in the outer query, its value will be carried over to the inner query when that variable is reused there.
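
Thanks to this scoping, the EXISTS example of Section 11.15.3, “EXISTS (R1.2)” can be written more compactly: instead of introducing a fresh variable n with an explicit equality constraint, the variable name can simply be reused in the nested query. A sketch, using the data of that section:

SELECT name, hobby
FROM {} rdf:type {ex:Person};
        ex:name {name};
        ex:hobby {hobby}
WHERE EXISTS ( SELECT *
               FROM {} rdf:type {ex:Author};
                       ex:name {name};
                       ex:authorOf {}
             )
USING NAMESPACE
    ex = <http://example.org/things#>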

11.15.1. IN (R1.2)

The IN operator allows set membership checking where the set is defined by a nested SELECT-query.

The following example query uses the IN operator to retrieve all names of Persons, but only those names that also appear as names of Authors.

Data:

@prefix ex: <http://example.org/things#> .

_:a  rdf:type         ex:Person .
_:a  ex:name          "John" .

_:b  rdf:type         ex:Person .
_:b  ex:name          "Ringo" .

_:c  rdf:type         ex:Author .
_:c  ex:name          "John" .

_:d  rdf:type         ex:Author .
_:d  ex:name          "George" .

Query:

SELECT name
FROM {} rdf:type {ex:Person};
        ex:name {name}
WHERE name IN ( SELECT n
                FROM {} rdf:type {ex:Author};
                        ex:name {n}
              )
USING NAMESPACE
    ex = <http://example.org/things#>

Result:

name
"John"

11.15.2. ANY and ALL (R1.2)

The ANY and ALL keywords can be used for existential and universal quantification on the right operand of a boolean operator, if this operand is a set, defined by a nested query. The ALL keyword indicates that for every value of the nested query the boolean condition must hold. The ANY keyword indicates that the boolean condition must hold for at least one value of the nested query.

The following query selects the highest value from a set of values using the ALL keyword and a nested query.

Data:

@prefix ex:  <http://example.org/things#> .

_:a  ex:value     "10"^^xsd:int .
_:b  ex:value     "11"^^xsd:int .
_:c  ex:value     "12"^^xsd:int .
_:d  ex:value     "13"^^xsd:int .
_:e  ex:value     "14"^^xsd:int .

Query:

SELECT highestValue
FROM {node} ex:value {highestValue}
WHERE highestValue >= ALL ( SELECT value
                            FROM {} ex:value {value}
                          )
USING NAMESPACE
    ex = <http://example.org/things#>

Result:

highestValue
"14"^^xsd:int

11.15.3. EXISTS (R1.2)

EXISTS is a unary operator that has a nested SELECT-query as its operand. The operator is an existential quantifier that succeeds when the nested query has at least one result.

In the following example, we use EXISTS to determine whether any authors are known that share a name with a person, and if so, to retrieve that person's name and hobby.

Data:

@prefix ex: <http://example.org/things#> .

_:a  rdf:type         ex:Person .
_:a  ex:name          "John" .
_:a  ex:hobby         "Stamp collecting" .

_:b  rdf:type         ex:Person .
_:b  ex:name          "Ringo" .
_:b  ex:hobby         "Crossword puzzles" .

_:c  rdf:type         ex:Author .
_:c  ex:name          "John" .
_:c  ex:authorOf      "Let it be".

Query:

SELECT name, hobby
FROM {} rdf:type {ex:Person};
        ex:name {name};
        ex:hobby {hobby}
WHERE EXISTS ( SELECT n
               FROM {} rdf:type {ex:Author};
                       ex:name {n};
                       ex:authorOf {}
               WHERE n = name
             )
USING NAMESPACE
    ex = <http://example.org/things#>

Result:

name     hobby
"John"   "Stamp collecting"

11.16. Querying context (R2.0)

A new clause, FROM CONTEXT, is introduced in SeRQL 2.0 to allow querying of context. Context can be seen as a grouping mechanism of statements inside a repository, where the group is identified with a context identifier (a URI or a blank node).

A very typical way to use context is tracking provenance of the statements in a repository, that is, which location (on the Web, or on the file system) these statements originate from. For example, consider an application where you add RDF data from different files to a repository, and then one of those files is updated. You would then like to replace the data from that one file in the repository, and to be able to do this you need a way to figure out which statements need to be removed. The context mechanism gives you a way to do that.

By default, a SeRQL query ranges over the total repository. This is known as the default context: we do not specify a context, therefore, the default context is queried. In practice this means that all statements in all contexts in the repository are queried.

In the following example, we have a repository that contains three sets of data. The first set is added without context, the other two each have their own, specific, named context.

Data set 1 (no context):

@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix g:  <http://example.org/contexts/> .

g:graph1 dc:publisher "Bob" .
g:graph1 dc:date "2004-12-06T00:00:00Z"^^xsd:dateTime .

g:graph2 dc:publisher "Bob" .
g:graph2 dc:date "2005-01-10T00:00:00Z"^^xsd:dateTime .

Data set 2 (context http://example.org/contexts/graph1):

@prefix foaf: <http://xmlns.com/foaf/0.1/> .

_:a1 foaf:name "Alice" .
_:a1 foaf:mbox <mailto:alice@work.example> .

_:b1 foaf:name "Bob" .
_:b1 foaf:mbox <mailto:bob@oldcorp.example.org> .

Data set 3 (context http://example.org/contexts/graph2):

@prefix foaf: <http://xmlns.com/foaf/0.1/> .

_:a2 foaf:name "Alice" .
_:a2 foaf:mbox <mailto:alice@work.example> .

_:b2 foaf:name "Bob" .
_:b2 foaf:mbox <mailto:bob@newcorp.example.org> .

As you can see, the data in each of the named contexts contains different information about the e-mail address of Bob. Using a 'normal' SeRQL query (that is, without using context information), we can retrieve all e-mail addresses quite easily:

Query:

SELECT DISTINCT name, mbox
FROM {x} foaf:name {name};
         foaf:mbox {mbox}
USING NAMESPACE
foaf = <http://xmlns.com/foaf/0.1/>

Result:

name    mbox
Alice   mailto:alice@work.example
Bob     mailto:bob@oldcorp.example.org
Bob     mailto:bob@newcorp.example.org

However, we cannot identify the source of each e-mail address using such a query, because all the statements in the three files are simply merged together in a single repository. We can, however, retrieve this information using a context query:

Query:

SELECT DISTINCT source, name, mbox
FROM CONTEXT source
     {x} foaf:name {name};
         foaf:mbox {mbox}
USING NAMESPACE
foaf = <http://xmlns.com/foaf/0.1/>

Result:

source                               name    mbox
http://example.org/contexts/graph1   Alice   mailto:alice@work.example
http://example.org/contexts/graph2   Alice   mailto:alice@work.example
http://example.org/contexts/graph1   Bob     mailto:bob@oldcorp.example.org
http://example.org/contexts/graph2   Bob     mailto:bob@newcorp.example.org

As you can see, by specifying a variable source in the FROM CONTEXT clause we can retrieve the named context from which the information comes.

We can also specify a named context explicitly by using a URI directly, for example if we only want to query source graph2:

Query:

SELECT name, mbox
FROM CONTEXT <http://example.org/contexts/graph2>
     {x} foaf:name {name};
         foaf:mbox {mbox}
USING NAMESPACE
foaf = <http://xmlns.com/foaf/0.1/>

Result:

name    mbox
Alice   mailto:alice@work.example
Bob     mailto:bob@newcorp.example.org

A SeRQL query may contain any number of FROM CONTEXT clauses and may additionally contain a 'normal' FROM clause.

For example, in the following query we combine information from the default context and from the different named contexts to retrieve the most recently published e-mail information:

Query:

SELECT date, source, name, mbox
FROM {source} dc:date {date}	
FROM CONTEXT source
     {x} foaf:name {name};
         foaf:mbox {mbox}
WHERE date >= ALL (SELECT d FROM {} dc:date {d})
USING NAMESPACE
   foaf = <http://xmlns.com/foaf/0.1/>,
   dc = <http://purl.org/dc/elements/1.1/>

Result:

date                                   source                               name    mbox
"2005-01-10T00:00:00Z"^^xsd:dateTime   http://example.org/contexts/graph2   Alice   mailto:alice@work.example
"2005-01-10T00:00:00Z"^^xsd:dateTime   http://example.org/contexts/graph2   Bob     mailto:bob@newcorp.example.org

11.17. Example SeRQL queries

11.17.1. Query 1

Description: Find all papers that are about "RDF" and about "Querying", and their authors.

SELECT
   Author, Paper
FROM
   {Paper} rdf:type {foo:Paper};
           ex:keyword {"RDF", "Querying"};
           dc:author {Author}
USING NAMESPACE
   dc = <http://purl.org/dc/elements/1.0/>,
   ex = <http://example.org/things#>

Depicted as a graph, this query searches through the RDF graph for all subgraphs matching the following template:

Figure 11.7. Path expression for query 1

Path expression for query 1

11.17.2. Query 2

Description: Find all artefacts whose English title contains the string "night" and the museum where they are exhibited. The artefact must have been created by someone with first name "Rembrandt". The artefact and museum should both be represented by their titles.

SELECT DISTINCT
   label(ArtefactTitle), MuseumName
FROM
   {Artefact} arts:created_by {} arts:first_name {"Rembrandt"},
   {Artefact} arts:exhibited {} dc:title {MuseumName},
   {Artefact} dc:title {ArtefactTitle}
WHERE
   isLiteral(ArtefactTitle) AND
   lang(ArtefactTitle) = "en" AND
   label(ArtefactTitle) LIKE "*night*"
USING NAMESPACE
   dc   = <http://purl.org/dc/elements/1.0/>,
   arts = <http://example.org/arts/>

Again, depicted as a subgraph template:

Figure 11.8. Path expression for query 2

Path expression for query 2

Note that this figure only shows the path expressions from the from clause. The where clause poses additional constraints on the values of the variables which can't be as easily depicted graphically.

11.17.3. Query 3

Description: Find all siblings of class foo:bar.

SELECT DISTINCT
   Sibling
FROM
   {Sibling, <foo:bar>} rdfs:subClassOf {ParentClass}

Or graphically:

Figure 11.9. Path expression for query 3

Path expression for query 3

Note that the URI foo:bar is not returned as a result (there is an implicit constraint that doesn't allow Sibling to be equal to values that occur in the same multi-value node).

11.19. SeRQL grammar

The following is the BNF grammar of SeRQL, revision 3.1:

ParseUnit        ::= Query [NamespaceDeclList]

NamespaceDeclList::= "using" "namespace" NamespaceDecl ("," NamespaceDecl)*
NamespaceDecl    ::= <PREFIX_NAME> "=" <URI>

Query            ::= TupleQuerySet
                   | GraphQuerySet

TupleQuerySet    ::= TupleQuery [SetOperator TupleQuerySet]
TupleQuery       ::= "(" TupleQuerySet ")"
                   | SelectQuery

GraphQuerySet    ::= GraphQuery [SetOperator GraphQuerySet]
GraphQuery       ::= "(" GraphQuerySet ")"
                   | ConstructQuery

SetOperator      ::= "union" ["all"]
                   | "minus"
                   | "intersect"

SelectQuery      ::= "select" ["distinct"|"reduced"] Projection [QueryBody]
Projection       ::= "*"
                   | [ ProjectionElem ("," ProjectionElem)* ]
ProjectionElem   ::= ValueExpr ["as" Var]

ConstructQuery   ::= "construct" ["distinct"|"reduced"] ConstructClause [QueryBody]
ConstructClause  ::= "*"
                   | PathExprList

QueryBody        ::= ("from" ["context" ContextID] PathExprList)+
                     ["where" BooleanExpr]
                     ["order" "by" OrderExprList]
                     ["limit" <POS_INTEGER>]
                     ["offset" <POS_INTEGER>]

ContextID        ::= Var
                   | Uri
                   | BNode

PathExprList     ::= UnionPathExpr ("," UnionPathExpr)*
UnionPathExpr    ::= PathExpr ("union" PathExpr)*
PathExpr         ::= BasicPathExpr
                   | OptGraphPattern
                   | "(" PathExprList ")"
BasicPathExpr    ::= Node Edge Node [[";"] PathExprTail]
OptGraphPattern  ::= "[" PathExprList ["where" BooleanExpr] "]"

PathExprTail     ::= Edge Node [[";"] PathExprTail]
                   | OptPathExprTail [";" PathExprTail]
OptPathExprTail  ::= "[" Edge Node [[";"] PathExprTail] ["where" BooleanExpr] "]"

Edge             ::= Var
                   | Uri
Node             ::= "{" [ NodeElem ("," NodeElem)* ] "}"
NodeElem         ::= Var
                   | Value
                   | ReifiedStat
ReifiedStat      ::= "{" [NodeElem] "}" Edge "{" [NodeElem] "}"

OrderExprList    ::= OrderExpr ("," OrderExpr)*
OrderExpr        ::= ValueExpr ["asc"|"desc"]

BooleanExpr      ::= OrExpr
OrExpr           ::= AndExpr ["or" BooleanExpr]
AndExpr          ::= BooleanElem ["and" AndExpr]
BooleanElem      ::= "(" BooleanExpr ")"
                   | "true"
                   | "false"
                   | "not" BooleanElem
                   | "bound" "(" Var ")"
                   | "sameTerm" "(" ValueExpr "," ValueExpr ")"
                   | ValueExpr CompOp ValueExpr
                   | ValueExpr CompOp ("any"|"all") "(" TupleQuerySet ")"
                   | ValueExpr "like" <STRING>
                   | ValueExpr "in" "(" TupleQuerySet ")"
                   | ValueExpr "in" "(" ArgList ")"
                   | "exists" "(" TupleQuerySet ")"
                   | "isResource" "(" Var ")"
                   | "isURI" "(" Var ")"
                   | "isBNode" "(" Var ")"
                   | "isLiteral" "(" Var ")"
                   | "langMatches" "(" ValueExpr "," ValueExpr ")"
                   | "regex" "(" ValueExpr "," ValueExpr [ "," ValueExpr ] ")"

CompOp           ::= "=" | "!=" | "<" | "<=" | ">" | ">="

ValueExpr        ::= Var
                   | Value
                   | "datatype" "(" Var ")"
                   | "lang" "(" Var ")"
                   | "label" "(" Var ")"
                   | "namespace" "(" Var ")"
                   | "localname" "(" Var ")"
                   | "str" "(" ValueExpr ")"
                   | FunctionCall

FunctionCall     ::= Uri "(" [ArgList] ")"

ArgList          ::= ValueExpr ("," ValueExpr)*

Var              ::= <NC_NAME>

Value            ::= Uri
                   | BNode
                   | Literal

Uri              ::= <URI>
                   | <QNAME>

BNode            ::= <BNODE>

Literal          ::= <STRING>
                   | <LANG_LITERAL>
                   | <DT_LITERAL>
                   | <POS_INTEGER>
                   | <NEG_INTEGER>
                   | <DECIMAL>

<URI>            ::= "<" (* a legal URI, see http://www.ietf.org/rfc/rfc2396.txt *) ">"
<QNAME>          ::= <PREFIX_NAME> ":" <NC_NAME_CHAR>*
<BNODE>          ::= "_:" <NC_NAME>

<STRING>         ::= (* A quoted character string with escapes *)
<LANG_LITERAL>   ::= <STRING> "@" <LIT_LANG>
<DT_LITERAL>     ::= <STRING> "^^" (<URI>|<QNAME>)

<POS_INTEGER>    ::= "+"? [0-9]+
<NEG_INTEGER>    ::= "-" [0-9]+
<DECIMAL>        ::= ("+"|"-")? [0-9]* "." [0-9]+

<PREFIX_NAME>    ::= <LETTER> <NC_NAME_CHAR>*
                   | "_" <NC_NAME_CHAR>+

<NC_NAME>        ::= (<LETTER>|"_") <NC_NAME_CHAR>*
<NC_NAME_CHAR>   ::= (* see http://www.w3.org/TR/REC-xml-names/#NT-NCNameChar *)
<LETTER>         ::= (* see http://www.w3.org/TR/REC-xml/#NT-Letter *)

Note: all keywords are assumed to be case-insensitive. Whitespace characters between tokens are not significant other than for separating the tokens. Production rules with a head that is surrounded by angular brackets define tokens (aka "terminals").

Glossary

R

Resource Description Framework

The Resource Description Framework (RDF) is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata data model. It has come to be used as a general method for conceptual description or modeling of information that is implemented in web resources, using a variety of syntax formats.

S

SPARQL Protocol and RDF Query Language

SPARQL (pronounced "sparkle", a recursive acronym for SPARQL Protocol and RDF Query Language) is an RDF query language, that is, a query language for databases, able to retrieve and manipulate data stored in Resource Description Framework format. In addition, SPARQL defines a Protocol for accessing RDF data sources over the Web.

See Also Resource Description Framework.