About Inference Rules Files

The structure of inference rules files and the rules for writing them.

Inferencing can be performed to generate additional relationships once CGE builds a database. CGE accomplishes this with a user-defined rules file, which contains a set of rules specific to the data being processed. The rules file format and semantics are based on Apache Jena rules.

In this version of CGE there are certain limitations to these rules:

The @include construct is not supported.
Calls to functions or built-in primitives, such as print, all, or max, are not supported.
The [...] syntax is not supported, including named rules.
Backward chaining is not supported. Furthermore, backward syntax (<-) cannot be used to express forward chaining.
If multiple premises or conclusions (quads) are specified on either side of the -> in a single rule, adjacent quads must be separated by a space. The use of commas as separators is not supported.
Native UTF-8 is not supported in rules files. However, Unicode characters are supported within URIs, where they are valid syntax.
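
As a rough illustration of the limitations listed above, the following Python sketch flags a few of the unsupported constructs in a single line of a rules file. The function name find_unsupported and the message strings are hypothetical, and the checks are deliberately crude text tests (they do not detect built-in function calls or encoding problems); this is not part of CGE.

```python
import re
from typing import List


def find_unsupported(line: str) -> List[str]:
    """Return descriptions of unsupported constructs found in one line.

    Hypothetical helper for illustration only; it covers just the
    limitations that can be spotted with simple text checks.
    """
    text = line.strip()
    problems: List[str] = []
    if not text or text.startswith("#"):
        return problems  # blank lines and comment lines are skipped
    if text.startswith("@include"):
        problems.append("@include is not supported")
    if text.startswith("["):
        problems.append("[...] rule syntax is not supported")
    if "<-" in text:
        problems.append("backward syntax (<-) is not supported")
    # A comma sitting between two quads, such as ") , (" or "),("
    if re.search(r"\)\s*,\s*\(", text):
        problems.append("quads must be separated by spaces, not commas")
    return problems


for example in [
    "(?x rdf:type ub:Course) -> (?x rdf:type ub:Work) .",
    "(?a ?p ?b), (?b ?p ?c) -> (?a ?p ?c) .",
    "(?a ?p ?b) <- (?b ?p ?a) .",
]:
    print(example, "=>", find_unsupported(example))
```

A check like this could be run over a rules file before the database is built, so that Jena-style rules copied from elsewhere are caught early.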

CAUTION: Turning inferencing on or off is a database-level setting, and turning it on can negatively impact performance. When the setting is true, the inferencer runs the first time the database is compiled and on each subsequent update. Because the whole database is examined whenever inferencing occurs, data loaded while inferencing was turned off is still affected once it is turned back on. In other words, if a user turns inferencing off and then adds or updates data, that data is also inferenced after the user turns inferencing on again and performs another update.

Inference Rules File Format

A rules file consists of optional prefix statements followed by one or more rules. Each rule has the form:

left-hand side quad(s) -> right-hand side quad(s)

Comments are denoted by a # character at the beginning of a line. The inferencer attempts to match the quad or quads on the left-hand side of the -> in order to infer the quad or quads on the right-hand side. All of the left-hand-side quads must be satisfied for the inference to be made. Each rule must be on its own line and must end with a period (.) followed by a newline character. The inferencer does not recognize the escape character (\).

A quad takes the form:

(subject predicate object [graph])

The subject, predicate, and object are mandatory. The graph field is optional. If a graph is not specified, the inferencer uses the default graph and the rule applies only to triples in that graph. The subject, predicate, and object fields can take any valid form specified by the N-Quads grammar, except as described in the list of limitations above. The graph field has the same valid forms as an object.

If a rule contains a URI, that URI must exist in at least one of the data files that were included in the database. Alternatively, to apply a new ontology that was not in the original data files, create a new file that contains any new objects and predicates, and add that file to the database.

The fields of a quad in a rule can also be variables, or shorthand versions of strings built from a specified prefix. A variable must begin with a ? character, followed by a valid name. A name must match the following pattern:

name := [a-zA-Z][_a-zA-Z0-9]*
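
The rule, quad, and variable-name forms described above can be sketched in Python as follows. This is a minimal illustration, not the CGE parser: the names parse_rule, parse_quads, and is_variable are hypothetical, and the sketch assumes that parentheses appear only as quad delimiters and that -> appears only as the rule arrow.

```python
import re
from typing import List, Tuple

Quad = Tuple[str, ...]

NAME = re.compile(r"[a-zA-Z][_a-zA-Z0-9]*")        # the name pattern shown above
QUAD = re.compile(r"\(([^()]*)\)")                 # text between one pair of parentheses
FIELD = re.compile(r'<[^>]*>|"[^"]*"\S*|\S+')      # <uri>, "literal", or a bare token


def is_variable(field: str) -> bool:
    """True if field is '?' followed by a valid name."""
    return field.startswith("?") and NAME.fullmatch(field[1:]) is not None


def parse_quads(text: str) -> List[Quad]:
    """Split one side of a rule into quads of three or four fields."""
    quads: List[Quad] = []
    for body in QUAD.findall(text):
        fields = tuple(FIELD.findall(body))
        if len(fields) not in (3, 4):
            raise ValueError(f"a quad needs 3 or 4 fields: ({body})")
        for field in fields:
            if field.startswith("?") and not is_variable(field):
                raise ValueError(f"invalid variable name: {field}")
        quads.append(fields)
    return quads


def parse_rule(line: str) -> Tuple[List[Quad], List[Quad]]:
    """Parse 'premises -> conclusions .' into two lists of quads."""
    text = line.strip()
    if not text.endswith("."):
        raise ValueError("a rule must end with a period")
    lhs, arrow, rhs = text[:-1].partition("->")
    if not arrow:
        raise ValueError("a rule must contain ->")
    premises, conclusions = parse_quads(lhs), parse_quads(rhs)
    if not premises or not conclusions:
        raise ValueError("a rule needs quads on both sides of ->")
    return premises, conclusions


premises, conclusions = parse_rule(
    "(?x ?a ?y ?g) (?a owl:inverseOf ?b ?g) -> (?y ?b ?x ?g) ."
)
print(premises)
print(conclusions)
```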

To specify one or more prefixes, place them at the beginning of the rules file, before any rules, using the following syntax:

@prefix prefix_name: <http://urlstring#> .

A rules file does not have to use prefixes. However, they can be used to simplify quads within rules. For example, prefixes are useful for creating shorthand versions of URIs that are used repeatedly in the rules.

As with rules, each prefix must end with a period (.) and a newline, and each prefix must be on its own line.
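
To make the prefix mechanism concrete, the following Python sketch collects prefix statements and expands a prefixed term into a full URI. The helper names read_prefixes and expand are hypothetical, and the sketch assumes that prefix names follow the same name pattern as variables, which the text above does not state.

```python
import re
from typing import Dict, Iterable

# Assumption: a prefix name follows the same pattern as a variable name.
PREFIX_LINE = re.compile(r"@prefix\s+([a-zA-Z][_a-zA-Z0-9]*):\s*<([^>]*)>\s*\.")


def read_prefixes(lines: Iterable[str]) -> Dict[str, str]:
    """Collect prefix_name -> URI string from @prefix lines."""
    prefixes: Dict[str, str] = {}
    for line in lines:
        found = PREFIX_LINE.fullmatch(line.strip())
        if found:
            prefixes[found.group(1)] = found.group(2)
    return prefixes


def expand(term: str, prefixes: Dict[str, str]) -> str:
    """Expand prefix:local into <uri + local>; leave other terms alone."""
    if term.startswith(("<", "?", '"')):
        return term  # full URIs, variables, and literals are unchanged
    name, colon, local = term.partition(":")
    if colon and name in prefixes:
        return f"<{prefixes[name]}{local}>"
    return term


prefixes = read_prefixes([
    "@prefix ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#> .",
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
])
print(expand("rdf:type", prefixes))
print(expand("ub:Course", prefixes))
```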

Inferencing a Database

When a database is built with inferencing enabled and a rules.txt file is found in the database directory, CGE applies the forward-chaining rules in that file to the triples or quads read from the RDF. The inferred quads are added to the in-memory database and stored in the compiled dbQuads file. If inferencing is enabled, the rules.txt file is also used when updating a database using SPARUL commands. As with any other quads added by SPARUL commands, the inferred quads are added to the in-memory database but are not written to disk until the database is checkpointed.

Note: Inferencing is enabled by default and can be disabled by setting the cge.server.InferOnUpdate control parameter to 0. Control parameters are configuration keywords that control server configuration settings.

Examples

The following prefix and rule examples are from the rule set used for the LUBM data.

A prefix statement

@prefix ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . 
(?x rdf:type ub:Course) -> (?x rdf:type ub:Work) .

In this example the term rdf:type is shorthand for:

<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>

The inferencer expands the prefixed version of the string to the full string when creating the rules used during inferencing. The rule in this example says that for a given triple ?x rdf:type ub:Course in the default graph, infer a new triple ?x rdf:type ub:Work and add it to the default graph, as shown in the next example.

Inferring a new triple

Applying this rule:

(?x rdf:type ub:Course) -> (?x rdf:type ub:Work) .

to this triple in the data input:

<http://www.Department10.University0.edu/Course6> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> \
 <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#Course>

infers (and adds) this new triple to the default graph:

<http://www.Department10.University0.edu/Course6> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> \
<http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#Work>
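
The inference step above can be traced with a small Python sketch that matches the premise against the input triple, records the binding for ?x, and substitutes it into the conclusion. The functions match and substitute are hypothetical names for illustration and do not reflect how CGE is implemented.

```python
from typing import Dict, Optional, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
UB = "http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#"


def match(pattern: Triple, triple: Triple) -> Optional[Dict[str, str]]:
    """Return variable bindings if pattern matches triple, else None."""
    bindings: Dict[str, str] = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must bind to the same value everywhere it appears.
            if bindings.setdefault(term, value) != value:
                return None
        elif term != value:
            return None  # a constant field must match exactly
    return bindings


def substitute(pattern: Triple, bindings: Dict[str, str]) -> Triple:
    """Replace each bound variable in pattern with its value."""
    return tuple(bindings.get(term, term) for term in pattern)


# The rule (?x rdf:type ub:Course) -> (?x rdf:type ub:Work), with prefixes expanded.
premise = ("?x", RDF_TYPE, "<" + UB + "Course>")
conclusion = ("?x", RDF_TYPE, "<" + UB + "Work>")

triple = ("<http://www.Department10.University0.edu/Course6>", RDF_TYPE, "<" + UB + "Course>")

bindings = match(premise, triple)
inferred = substitute(conclusion, bindings) if bindings is not None else None
print(inferred)
```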

A rule to establish a hierarchy of types

The following rules show one way that ontology rules are used to establish a hierarchy of data types.

(?x rdf:type ub:Faculty) -> (?x rdf:type ub:Employee) .
(?x rdf:type ub:Employee) -> (?x rdf:type ub:Person) .

A Faculty member is also an Employee, an Employee is also a Person, and so on. Such rules eliminate the need to explicitly include each desired type for each such item in the database. Note that these rules do not use the graph field. The following rule uses a variable for the graph field. This rule is excerpted from the RDFS rules file, which is based on some of the Jena rules for RDFS and OWL. The complete rules file is reproduced in Sample RDFS Rules File.

(?x ?a ?y ?g) (?a owl:inverseOf ?b ?g) -> (?y ?b ?x ?g) .

This rule is also an example of another way rules are used to establish relationships between triples in the database. It states that if two predicates A and B are defined to be inverses of each other, and the triple (X A Y) appears in the database, then the system can infer that the triple (Y B X) also belongs there.
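
The way the type-hierarchy rules and the inverseOf rule combine can be sketched as a naive forward-chaining loop in Python that reapplies every rule until no new quads appear. This is only an illustration under stated assumptions: the names unify, solutions, and infer, the DEFAULT graph placeholder, and the ex: sample data are all hypothetical, and nothing here reflects how CGE evaluates rules internally.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Quad = Tuple[str, str, str, str]
Rule = Tuple[List[Quad], List[Quad]]

DEFAULT = "<default>"  # hypothetical stand-in for the default graph


def unify(pattern: Quad, quad: Quad, bindings: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend bindings so that pattern matches quad, or return None."""
    extended = dict(bindings)
    for term, value in zip(pattern, quad):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def solutions(premises: List[Quad], quads: Set[Quad],
              bindings: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Yield every set of bindings that satisfies all premises."""
    bindings = {} if bindings is None else bindings
    if not premises:
        yield bindings
        return
    for quad in quads:
        extended = unify(premises[0], quad, bindings)
        if extended is not None:
            yield from solutions(premises[1:], quads, extended)


def infer(quads: Set[Quad], rules: List[Rule]) -> Set[Quad]:
    """Apply the rules repeatedly until no new quad is produced."""
    known = set(quads)
    changed = True
    while changed:
        changed = False
        for premises, conclusions in rules:
            # Collect matches first so the set is not modified while iterating.
            for found in list(solutions(premises, known)):
                for conclusion in conclusions:
                    new_quad = tuple(found.get(term, term) for term in conclusion)
                    if new_quad not in known:
                        known.add(new_quad)
                        changed = True
    return known


rules: List[Rule] = [
    ([("?x", "rdf:type", "ub:Faculty", DEFAULT)], [("?x", "rdf:type", "ub:Employee", DEFAULT)]),
    ([("?x", "rdf:type", "ub:Employee", DEFAULT)], [("?x", "rdf:type", "ub:Person", DEFAULT)]),
    ([("?x", "?a", "?y", "?g"), ("?a", "owl:inverseOf", "?b", "?g")], [("?y", "?b", "?x", "?g")]),
]

data: Set[Quad] = {
    ("ex:bob", "rdf:type", "ub:Faculty", DEFAULT),
    ("ex:alice", "ex:teaches", "ex:course1", "ex:g1"),
    ("ex:teaches", "owl:inverseOf", "ex:taughtBy", "ex:g1"),
}

for quad in sorted(infer(data, rules) - data):
    print(quad)
```

In this sketch the second hierarchy rule fires on a quad that the first rule produced, which is why the loop runs until nothing changes. The inverseOf rule only fires when both premises are in the same graph, because ?g must bind to the same value in each.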

Cross-database rules

Another use of a rules file is to establish a relationship between triples in two different databases. For example, if one were extending a U.S.-based database with some additional data from France, it might streamline the process to include such rules as:

(<x.cray.eg.france#personne> <x.cray.eg.france#nom> ?name <x.cray.eg.frenchdb>) -> \
(<x.cray.eg.us#person> <x.cray.eg.us#name> ?name <x.cray.eg.usdb>) .

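By this rule, the fields in the quads are translated into their English counterparts, consistent with the data that is already in the U.S.-based database. Because the inferencer does not recognize the escape character (\), the backslash above only marks a line break for display; write the rule on a single line in the rules file.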