OWL Web Ontology Language Parsing OWL in RDF/XML (original) (raw)

Abstract

An OWL-RDF parser takes an RDF/XML file and attempts to construct an OWL ontology that corresponds to the triples represented in the RDF. This document describes a basic strategy that could be used in such a parser. Note that this is not intended as a complete specification, but hopefully provides enough information to point the way towards how one would build a parser that will deal with a majority of (valid) OWL ontologies.

For example, we do not discuss the implementation or handling ofowl:imports here, nor do we address in depth issues concerned with spotting some of the more obscure violations of the DL/Lite rules.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time.

This document was produced as part of the Web Ontology Working Group's effort to develop the OWL Web Ontology Language. It is not part of the OWL language specification, but contains additional non-normative information which the Working Group expects will be useful to some users.

The author and the Working Group welcome comments on this document, but do not guarantee a reply or any further action. Please send comments to public-webont-comments@w3.org(public archive).

The Web Ontology Working Group is part of the W3C Semantic Web Activity (Activity Statement, Group Charter).

The W3C maintains a list of any patent disclosures related to this work.

Table of Contents


1 Introduction

An OWL-RDF parser takes an RDF/XML file and attempts to construct an OWL ontology that corresponds to the triples represented in the RDF. This document describes a basic strategy that could be used in such a parser. Note that this is not intended as a complete specification, but hopefully provides enough information to point the way towards how one would build a parser that will deal with a majority of (valid) OWL ontologies.

For example, we do not discuss the implementation or handling ofowl:imports here, nor do we address in depth issues concerned with spotting some of the more obscure violations of the DL/Lite rules.

The scope of this discussion is OWL DL and OWL Lite, and the procedure described below is targeted primarily at parsing OWL DL ontologies. For example, wheneverrdfs:subPropertyOf is used, OWL DL requires that the subject and object of the triple have corresponding types (e.g. both are either owl:ObjectProperty orowl:DatatypeProperty). If this is not the case, the parser will raise an error. An OWL Full parser should allow this (but it is not necessarily clear what the corresponding abstract syntax for such a construct would be).

2 OWL in RDF

The OWL Semantics and Abstract Syntax [OWL S&AS] document provides a characterisation of OWL ontologies in terms of an abstract syntax. This is a high level description of the way in which we can define the characteristics of classes and properties.

In addition, S&AS gives a mapping to RDF triples. This tells us how such an abstract description of an OWL ontology can be transformed to a collection of RDF triples (which can then be represented in a concrete fashion using, for example RDF/XML).

In order to parse an OWL-RDF file into some structure closer to the abstract syntax we need to reverse this mapping, i.e. determine what the class and property definitions were that lead to those particular triples. Note that this reverse mapping is not necessarily unique. For example, the following two ontology fragments:

Class( a ) Class( b ) SubClassOf( b a )

and

Class( a ) Class( b partial a )

both give rise to the same collection of triples under the mapping:

a rdf:type owl:Class b rdf:type owl:Class b rdfs:subClassOf a

For many purposes, e.g. species validation, this is not necessarily a problem. For other situations, e.g. where an editing tool is being used, we would at least expect a parser to be consistent in the strategy it employed to produce abstract syntax descriptions.

An arbitrary RDF graph may not necessarily correspond to an OWL Lite or DL ontology. In other words, there may not be an OWL Lite or DL ontology which when transformed using the mapping produces the given graph. This is what a species validator attempts to determine: if such an ontology exists. A parser (as described here) will go one step further and actually attempt to construct such an ontology.

2.1 Errors

There are, in general, two ways in which an RDF graph may fail to correspond to an OWL [Lite|DL] ontology.

  1. There is an OWL Lite or DL ontology in abstract syntax form which maps to a superset of the given triples but some of the triples have been forgotten and are not in the graph.
  2. The ontologies in abstract syntax form that map to the triples or any superset of the triples violate some of the restrictions for membership of the OWL Lite or DL subspecies. (This includes the case where there are no such ontologies).

We might (loosely) describe the first as external errors, and the second as internal errors. Examples of_external_ errors include:

Once we have an ontology in abstract form, we can then check for internal errors. For example, there are restrictions on the expressiveness that can be used in OWL Lite (no unions or enumerations and limited cardinality restrictions). The Lite and DL subspecies also have a constraint that effectively says that the collections of URI references of classes, individuals and properties must be disjoint. Thus in OWL Lite and DL we can not use metamodelling devices such as classes as instances.

3 Parser Implementation

The following discussion assumes that we have some implementation of a data structure representing the ontology which is close to the abstract syntax description (something along the lines of our proposedOWL API). We do not discuss the details of such an implementation here — hopefully the meaning of actions such as add a class x or set the functional flag on a property will be clear.

3.1 Streaming vs. non-streaming

Many XML parsers operate in a streaming fashion — elements are reported to the parser as they are encountered during the parse, and the file is processed incrementally. It is difficult to do this when parsing RDF models (or at least when performing a task such as producing an abstract syntax representation of an OWL ontology from a given RDF/XML file). The problem is that we have no guarantee of the order in which the triples in the graph are processed (and thus reported by the streaming parser). A particular syntactic construct may actually be split across several locations in the RDF file. In order to parse in a streaming fashion, we may have to make note of triples encountered earlier on and then come back to process them later. As a concrete example of this, consider a situation where anowl:AnnotationPRoperty is used to make an annotation about a particular individual:

AnnotationProperty( hasName ) Individual( fred hasName "Frederick" )

This results in the triples:

[1] hasName rdf:type owl:AnnotationProperty [2] fred hasName "Frederick"

If we encounter [1] before [2] during the parse, we know that the property is an annotation property, and can thus process [2] as an annotation. If, however, we encounter [2] first, we do not know whether to process[2] as an annotation or a value on the individual. As there is no way of knowing whether or not[1] will occur until we have seen all the triples, we must wait until we have seen all triples before processing[2].

Because of this, our strategy is that the parser does not attempt to process anything until all triples are available. Although it may be possible to process some information in a streaming manner, it reduces the conceptual complexity of the parser if we first collect the triples then process them. Note that this has ramifications on the resources that will be required when parsing — when parsing large RDF graphs, large amounts of memory may be needed.

We assume that while parsing we have access to the objects in the ontology already created, e.g.. if an ObjectPropertyp has been introduced we can get access to it. When we refer to, for example, the ObjectProperty p, we mean the ObjectProperty that has been defined with namep.

In addition, we assume that we can query the RDF graph to determine the presence or absence of particular arcs (e.g. precisely the kind of functionality provided by an RDF API such as Jena).

Using Triples

While processing the graph, we keep a record of any triples that have been used in the translation. For example, if there is a triple:

x rdf:type owl:Class

which results in the introduction of a class x.

Class( c )

then we consider that triple to have been used.

3.2 Named Objects

We first identify the name classes and properties that make up the ontology.

Classes

For any non-bnode x in the graph s.t. there is a triple:

x rdf:type owl:Class

introduce a new class x.

Class( c )

We will refer to any such classes that have been introduced in this manner as named classes.

Properties

Properties should all be introduced with an explicit type.

ObjectProperty

For any node p in the graph where there is one of the following triples:

p rdf:type owl:ObjectProperty p rdf:type owl:TransitiveProperty p rdf:type owl:InverseFunctionalProperty p rdf:type owl:SymmetricProperty

introduce a new ObjectProperty p.

ObjectProperty( p )

In addition, if any of the latter three triples are present, the appropriate flag should be set on the property, e.g.:

ObjectProperty( p Transitive )

If there is also a triple of the form:

p rdf:type FunctionalProperty

then the property should be set as functional.

For any object property p dealt with as above, there may also be an (optional) triple:

p rdf:type rdf:Property

DatatypeProperty

For any node p in the graph where there is a triple:

q rdf:type owl:DatatypeProperty

introduce a new DatatypeProperty q:

DatatypeProperty( q )

If there is also a triple of the form:

q rdf:type FunctionalProperty

then the property should be set as functional.

For any data property pdealt with as above, there may also be an (optional) triple:

p rdf:type rdf:Property

AnnotationProperty

For any node a in the graph where there is a triple:

a rdf:type owl:AnnotationProperty

introduce a new AnnotationProperty a.

AnnotationProperty( a )

For any annotation property p dealt with as above, there may also be an (optional) triple:

p rdf:type rdf:Property

Datatypes

For any node d in the graph where there is a triple:

d rdf:type rdfs:Datatype

introduce a new Datatype d.

Datatype( d )

There may also be an (optional) triple:

d rdf:type rdfs:Class

3.3 Axioms

Now that the named classes and properties have been identified, we can determine the axioms that have been asserted.

Property Axioms

Property axioms assert characteristics of properties.

Domain

For any triples of the form:

p rdfs:domain d

translate d to a class description, and add the resulting class description to the domains of the property p. If p is not a property, raise an error.

Range

For any triples of the form:

p rdfs:range r

if p is an ObjectProperty, then translate r to a class description, and add the resulting class description to the ranges of the property p. Ifp is a data property, convert r to adata range and add the result to the ranges of the property.

subProperty & equivalentProperty

For any triples of the form:

p rdfs:subPropertyOf q

or

p owl:equivalentProperty q

first check that either:

  1. p and q are ObjectProperties;
    or
  2. p and q are DatatypeProperties.

If so, add an axiom asserting thatq is a superproperty or equivalent property ofp as appropriate. If neither of the above are true, raise an error.

inverseOf

For any triples of the form:

p owl:inverseOf q

Check that p and q are ObjectProperties. If not, raise an error. If so, add qto the collection of inverses of p.

Class Definitions

We have to deal with any class definitions that occur in the ontology. For example, the following RDF fragment:

arises when a class a has been given a complete definition involving an intersection.

For any named class x, do the following.

Class Axioms

Class axioms can provide relationships and characteristics of arbitrary class descriptions.

SubClass

For all triples of the form:

c rdfs:subClassOf d

add a new axiom:

SubClassOf( ct dt )

where ct is the translation of c to a class description, and dt the translation ofd. If c is a named class, then due to the ambiguity of the reverse mapping, an alternative here is to include the assertion as part of the definition of the class and add the axiom:

Class( c partial dt )

to the ontology. Note that in this case, if the class already has a partial description in the ontology, e.g. there is an axiom:

Class( c partial e1 e2...en )

then we can simply add dt to this axiom to get:

Class( c partial e1 e2...en dt )

rather than introducing a new axiom.

EquivalentClass

See below.

DisjointClass

See below.

Individual Axioms

Individual axioms assert relationships about the equality and inequality of individuals.

Same

For all triples of the form:

x owl:sameAs y

where x and y are individualIDs, add individuals x and y (if necessary) and an axiom:

SameIndividual( x y )

Different

For all triples of the form:

x owl:differentFrom y

where x and y are individualIDs, add individuals x and y (if necessary) and an axiom:

DifferentIndividuals( x y )

AllDifferent

For all triples of the form:

x rdf:type owl:AllDifferent

where x is a bnode, there should also be a triple:

x owl:distinctMembers l

where l is a list. Add an axiom:

DifferentIndividuals( i1 i2...in )

where i1 i2 ... in are the individuals in the list l. If l is not a list (of named individuals), x is not a bnode or theowl:distinctMembers triple is missing, raise an error.

3.4 Translating Lists

Lists are used in a number of places in OWL ontologies: for example to represent the arguments of boolean expressions or the individuals listed in an enumeration (oneOf). For the purposes of producing a OWL ontology, order is not particularly important — the order of the operands in an intersection or union does not alter their semantics, so for simplicitly, we consider converting a node representing a list to a set of nodes. Lists are thus handled using the following simple recursive procedure.

The node rdf:nil is translated to the empty set.

For a node l s.t. there is a triple:

l rdf:type rdf:List

find the node r s.t. there is a triple:

l rdf:rest r

If such a node does not exist, or there are are multiple nodes which are the objects of such triples, raise an error. The noder should be a list node itself. Convert this node to a set of nodes rs. Now find the node s.t. there is a triple:

l rdf:first f

Again, there should be a single such node — if not, raise an error. Return the result of adding this node to the setrs.

For cases where we expect a list of class descriptions, we do the obvious thing, e.g. convert to a collection of nodes, then translate each node using the procedure described below.

For any node l which is used as a list (e.g. as the subject of a rdf:first or rdf:rest, the object of a rdf:rest, or in a place where a list is expected, there may be an (optional) triple:

l rdf:type rdf:List

Circular lists are forbidden. If we have:

l rdf:rest r

and we encounter the node l used as the object of anrdf:rest triple while translatingr, then an error should be raised.

3.5 Translating Class Decriptions

If a node is used in particular contexts (e.g. as the subject or object of an owl:subClassOf triple) then we know that the node is intended to represent a class expression. In order to handle this, we define a procedure which takes a node in the RDF graph and yields a class expression.

If n is a named class, then return n.

If this is not the case, n must be the subject of the subject of anrdf:type triple with objectowl:Restriction or be the subject of exactly one triple involving owl:oneOf,owl:intersectionOf, owl:unionOf,owl:complementOf. If not, raise an error.

The node may also be the subject of triple:

n rdf:type owl:Class

or

n rdf:type rdfs:Class

Translation then proceeds on a case-analysis of the particular triple found.

3.6 Translating Data Ranges

If n is an XML schema data type orrdf:XMLLiteral, then return that type.

If n is a datatype introduced as above, then return that datatype.

If n is a blank node and there is a triple:

n owl:oneOf l

then l should be a list of data values. Return:

oneOf(d1 d2...dn)

where d1 d2... dn are the data values in the listl. If l is not a list, raise an error.

3.7 Structure Sharing

[OWL S&AS] includes the following comment relating to translation from abstract syntax to RDF graphs:

For many directives these transformation rules call for the transformation of components of the directive using other transformation rules. When the transformation of a component is used as the subject, predicate, or object of a triple, even an optional triple, the transformation of the component is part of the production (but only once per production) and the main node of that transformation should be used in the triple.

In practice, this means that blank nodes (i.e. those with no name) which are produced during the transformation and represent arbitrary expressions in the abstract syntax form should not be "re-used".

Consider the following example:

Class(A partial intersectionOf(C D))

In this case, translation to an RDF graph would result in a blank node representing the intersection of C and D. This would then be used as the object of a rdfs:subClassOf triple withA as subject.

Now consider if the ontology also included a second axiom as below.

Class(A partial intersectionOf(C D)) Class(B partial intersectionOf(C D))

In this case, we are not allowed to "re-use" the blank node, but must instead produce a new node to represent the intersection being used in the definition of B, even though the expressions are identical.

There are, however, two cases where a blank node corresponding to an expression can be used in more than one place — when the translation results from an EquivalentClasses orDisjointClasses axiom. These are discussed in more detail below.

In order to check whether an RDF graph corresponds to an OWL [Lite|DL] ontology, we must check that the rules for structure sharing have not been violated. We describe strategies for doing this.

Marking Used Blank Nodes

We keep track of all the blank nodes that have been used during the parsing process. Effectively, this means that whenever we see a blank node that occurs as the object of a triple involvingowl:complementOf, rdf:type,owl:someValuesFrom, owl:allValuesFrom or rdfs:subClassOf, occurs as a value in a list which is the object of anowl:intersectionOf or owl:unionOf, or occurs as the subject of a triple involvingrdfs:subClassOf, we first check to see whether the node has been used. If so, then structure sharing has occurred and the ontology is not in DL. If not, then we mark the node as used and carry on. Processingowl:equivalentClass andowl:disjointWith triples is slightly more complicated as the mapping rules permit us to share structure in particular ways.

EquivalentClass

In general, an equivalence axiom

EquivalentClasses( D1 D2...Dn )

is translated to a collection of nodes, one for each expression in the equivalence, and a number of owl:equivalentClasstriples between these nodes such that those triples form a connected graph over the nodes. In other words, starting from any node in the collection, we can get to any other node in the collection along a path that only traverses owl:equivalentClass edges in either direction.

In practice, this means that a blank node may participate in more than one owl:equivalentClass triple (but note that it cannot also participate in other triples).

A possible strategy for dealing withowl:equivalentClass triples is as follows.

  1. Collect all owl:equivalentClass triples that occur in the graph.
  2. Partition the nodes that occur in these triples into sets, where each set consists of connected blank nodes and URI references connected to them, or pairs of URI references: if n andm are in a set, there is a path between them consisting only of owl:equivalentClass edges.
  3. For each set of nodes n1n2...nn, add an equivalence axiom:
    EquivalentClasses( tn1tn2...tnn )
    where tni is the translated description of ni. In addition, if any of the ni are blank nodes, check that they have not been used. If they have, this isnot an OWL DL ontology. If they are not used, mark as_used_

An improvement to this strategy is to attempt to identify the situations where the owl:equivalentClass triple may have come from a class definition (recall the ambiguity of the mapping). To address this, if any of the node sets have size 2, and have been produced because of a single triple:

c owl:equivalentClass d

where c is a named class, then we translate the assertion as a definition of the class and add the axiom:

Class( c complete dt )

to the ontology. In order to correctly parse OWL Lite ontologies, this approach is essential, as it ensures that a situation such as:

c owl:equivalentClass _:a _:a rdf:type owl:Restriction _:a owl:onProperty p _:a owl:minCardinality 0

is translated to a definition of the class rather than a class axiom (the resulting axiom would not be permitted in OWL Lite).

DisjointClass

The rules for DisjointClasses axioms tell us that an axiom:

DisjointClasses( D1 D2...Dn )

is translated to a collection of nodes, one for each expression in the equivalence, and a number of owl:disjointWithtriples, such that every node in the collection is connected to every other node by at least one triple (in either direction). Again, this may lead to blank nodes being used in more than one place.

A possible strategy for dealing with owl:disjointWithtriples is as follows:

Tests for Structure Sharing

There are a number of tests in the OWL Test Cases [OWL Tests] which are designed to illustrate these issues, in particular:

3.8 Everything Else

Once all the triples that relate to primitive object definitions and axioms have been processed, (more or less) everything else is assumed to be a fact relating to individuals. For all remaining triples:

x p y

the action taken depends on the type of p. If no explicit type has been given for the property p, raise an error.

If p is an annotation property, then add an appropriate annotation to the object x (which should correspond to a named class, property, datatype, ontology, individual or unnamed ontology or individual).

If p is an ObjectProperty, assume that the subject and object are individuals and add a fact:

Individual( x value( p y ) )

If p is an DatatypeProperty, assume that the subject is an individual and add a fact:

Individual( x value( p dy ) )

where dy is the translation of y to a literal value.

4 Error Recovery

There are many cases in the above discussion where errors may be raised — for example if properties are used without explicit typing. Strictly speaking, an OWL DL or Lite parser could choose to fail when encountering such situations. Of course, in practice, we might expect parsers to be more resilient and be able to recover. So for example, if the parser detects the following use of a propertyp:

x rdf:type owl:Thing y rdf:type owl:Thing x p y

it is reasonable to assume that the property p is intended to be an owl:ObjectProperty. In this case, we might expect the parser to assume that p is an ObjectProperty and try and proceed with the parse (but would of course warn the user about the assumption being made).

References

[OWL S&AS]

OWL Web Ontology Language Semantics and Abstract Syntax Peter F. Patel-Schneider, Patrick Hayes and Ian Horrocks, eds. W3C Proposed Recommendation, 15 December 2003.

[OWL Tests]

OWL Web Ontology Language Test Cases Jeremy J. Carroll and Jos De Roo, eds. W3C Proposed Recommendation, 15 December 2003.


Valid XHTML 1.0!