Resource Description Framework (RDF): Concepts and Abstract Syntax ([original](http://www.w3.org/TR/2002/WD-rdf-concepts-20021108))
Abstract
The Resource Description Framework (RDF) is a framework for representing information in the Web.
This document defines an abstract syntax on which RDF is based, and which serves to link its concrete syntax to its formal semantics. It also includes discussion of design goals, meaning of RDF documents, key concepts, datatyping, character normalization and handling of URI references.
Status of this Document
This is a W3C RDF Core Working Group Working Draft produced as part of the W3C Semantic Web Activity (Activity Statement).
This document is being released for review by W3C Members and other interested parties to encourage feedback and comments, especially with regard to the sections on datatyping and how the changes affect existing implementations and content.
This is a public W3C Working Draft and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite as other than "work in progress". A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/.
In conformance with W3C policy requirements, known patent and IPR constraints associated with this Working Draft are detailed on the RDF Core Working Group Patent Disclosure page.
Comments on this document are invited and should be sent to the public mailing list www-rdf-comments@w3.org. An archive of comments is available at http://lists.w3.org/Archives/Public/www-rdf-comments/.
Table of contents
- 1. Introduction
- 2. RDF background, rationale and concepts
- 2.1 Motivation
- 2.2 Design goals
* 2.2.1 A simple data model
* 2.2.2 Formal semantics and inference
* 2.2.3 Extensible URI-based vocabulary
* 2.2.4 XML-based syntax
* 2.2.5 Use XML schema datatypes
* 2.2.6 Anyone can make simple assertions about anything
* 2.2.7 Arbitrary expression of simple facts
* 2.2.8 A basis for binding agreements
- 2.3 RDF concepts
* 2.3.1 Graph data model
* 2.3.2 URI-based vocabulary and node identification
* 2.3.3 Datatypes
* 2.3.4 Literals
* 2.3.5 XML serialization syntax
* 2.3.6 Representation of simple facts
* 2.3.7 Entailment
- 2.4 Meaning of RDF
* 2.4.1 Asserted and non-asserted forms
* 2.4.2 Social meaning
* 2.4.3 Interaction between social and formal meaning
* 2.4.3.1 Example
* 2.4.4 Authoritative definition of a predicate
- 2.5 RDF core URI vocabulary and namespaces
- 3. XML Content within an RDF Graph
- 4. Abstract Syntax
- 5. Additional technical considerations
- 6. Acknowledgments
- 7. References
1. Introduction
The Resource Description Framework (RDF) is a framework for representing information in the Web.
This document defines an abstract syntax on which RDF is based, and which serves to link its concrete syntax to its formal semantics. It also includes discussion of design goals, meaning of RDF documents, key concepts, datatyping, character normalization and handling of URI references.
Normative documentation of the RDF core falls into the following areas:
- XML serialization syntax [RDF-SYNTAX],
- formal semantics [RDF-SEMANTICS], and
- this document.
The framework is designed so that vocabularies can be layered on top of this core. RDF vocabulary definition language (RDF schema) [RDF-VOCABULARY] is the first such vocabulary. Others (cf. OWL [OWL] and the applications in the primer [RDF-PRIMER]) are in development.
1.1 Structure of this document
In section 2, some background to the design goals and rationale of RDF is presented. There is also some discussion of the intended implications of publishing an RDF document (section 2.4).
RDF's abstract syntax is a graph, which can be serialized using XML (but which is quite distinct from XML's tree-based infoset [XML-INFOSET]). The abstract syntax captures the fundamental structure of RDF, independently of any concrete syntax used for serialization. The formal semantics of RDF are defined in terms of the abstract syntax. XML content of literals is described in section 3, and the abstract syntax is defined in section 4 of this document.
Section 5 discusses character normalization and fragment identifier use.
1.2 Background reading
RDF draws upon ideas from knowledge representation, artificial intelligence and data management, including from Conceptual Graphs, logic-based knowledge representation, frames, and relational databases. Some possible sources of background information are [Sowa] [CG] [KIF] [Hayes] [Luger] [Gray].
2. RDF background, rationale and concepts
RDF has an abstract syntax that reflects a simple graph-based data model, and formal semantics with a rigorously defined notion of entailment, providing a basis for well-founded deductions in RDF data.
2.1 Motivation
The development of RDF has been motivated by the following uses, among others:
- Web metadata: providing information about Web resources and the systems that use them (e.g. content rating, capability descriptions, privacy preferences, etc.)
- Applications that require open rather than constrained information models (e.g. scheduling activities, describing organizational processes, annotation of Web resources, etc.)
- To do for machine processable information (application data) what the World Wide Web has done for hypertext: to allow data to be processed outside the particular environment in which it was created, in a fashion that can work at Internet scale.
- Interworking among applications: combining data from several applications to arrive at new information.
- Automated processing of Web information by software agents: the Web is moving from having just human-readable information to being a world-wide network of cooperating processes. RDF provides a world-wide lingua franca for these processes.
RDF is designed to represent information in a minimally constraining, flexible way. It can be used in isolated applications, where individually designed formats may be more perspicuous, but RDF's generality offers greater value from sharing. The value of information thus increases as it becomes accessible to more applications across the entire Internet.
2.2 Design goals
The design of RDF is intended to meet the following goals:
- A simple data model
- Formal semantics and provable inference
- Extensible URI-based vocabulary
- XML-based syntax
- Support use of XML schema datatypes
- Anyone can make simple assertions about anything
- Universal expression of simple facts
- A basis for legally binding agreements
2.2.1 A simple data model
RDF has a simple data model that is easy for applications to process and manipulate. The data model is independent of any specific serialization syntax.
NOTE: the term "model" used here in "data model" has a completely different sense to its use in the term "model theory". See the RDF model theory specification [RDF-SEMANTICS] or a textbook on logical semantics (e.g., [HUNTER] [DAVIS]) for more information about "model theory" as used in the literature of mathematics and logic.
2.2.2 Formal semantics and inference
RDF has a formal semantics which provides a dependable basis for reasoning about the meaning of an RDF expression. In particular, it supports rigorously defined notions of entailment which provide a basis for defining reliable rules of inference in RDF data.
2.2.3 Extensible URI-based vocabulary
The vocabulary is fully extensible, being based on URIs with optional fragment identifiers (URI references, or URIrefs). URI references are used for naming all kinds of things in RDF.
The other kind of value that appears in RDF data is a literal.
2.2.4 XML-based syntax
RDF has a recommended XML serialization form [RDF-SYNTAX], which can be used to encode the data model for exchange of information among applications.
2.2.5 Use XML schema datatypes
RDF can use values represented according to XML schema datatypes [XML-SCHEMA2], thus assisting the exchange of information between RDF and other XML applications.
2.2.6 Anyone can make simple assertions about anything
To facilitate operation at Internet scale, RDF is an open-world framework that allows anyone to make simple assertions about anything. In general, it is not assumed that all information about any topic is available. A consequence of this is that RDF cannot prevent anyone from making assertions that are nonsensical or inconsistent with the world as people see it, and applications that build upon RDF must find ways to deal with incomplete and conflicting sources of information. (This is where RDF departs from more prescriptive approaches to representing data in XML, which aim to present information that is well-formed and complete for an application's needs.)
2.2.7 Arbitrary expression of simple facts
RDF can represent arbitrary information that can be expressed as simple facts. (What constitutes a simple fact is discussed later, in section 2.3.6.)
2.2.8 A basis for binding agreements
RDF is intended to convey assertions that are meaningful to the extent that they may, in appropriate contexts, be used to express the terms of binding agreements.
This goal is explored further in section 2.4 below.
2.3 RDF concepts
RDF uses the following key concepts:
- Graph data model
- URI-based vocabulary
- Datatypes
- Literals
- XML serialization syntax
- Information as representation of simple facts
- Entailment
2.3.1 Graph data model
The underlying structure of any expression in RDF can be viewed as a directed labelled graph, which consists of nodes and labelled directed arcs that link pairs of nodes (these notions are defined more formally in section 4). The RDF graph is a set of triples.
Each property arc represents a statement of a relationship between the nodes that it links, having three parts:
- a property that describes some relationship (also called a predicate),
- a value that is the subject of the statement, and
- a value that is the object of the statement.
The direction of the arc is significant: it always points toward the object of a statement.
The meaning of an RDF graph is the conjunction (i.e. logical AND) of all the statements that it contains.
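As an informal illustration of the graph data model, the following Python sketch represents a graph as a set of (subject, predicate, object) triples and treats the graph as true only when every statement is true. The names `Triple`, `graph_holds` and the `http://example.org/terms#` URIs are assumptions made for this example; they are not defined by RDF.

```python
from typing import Callable, NamedTuple, Set


class Triple(NamedTuple):
    """One statement: an arc labelled `predicate` pointing from subject to object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical vocabulary prefix, used only for illustration.
EX = "http://example.org/terms#"

# A graph is a set, so repeating a statement adds nothing.
graph: Set[Triple] = {
    Triple(EX + "oil", EX + "floats", EX + "water"),
    Triple(EX + "oil", EX + "burns", EX + "air"),
}


def graph_holds(graph: Set[Triple], statement_is_true: Callable[[Triple], bool]) -> bool:
    """The graph means the conjunction (logical AND) of all its statements."""
    return all(statement_is_true(t) for t in graph)
```

Because the graph is a set, the order in which statements are added has no effect on its meaning.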
2.3.2 URI-based vocabulary and node identification
Nodes in an RDF graph are URIs with optional fragment identifiers (URI references, or URIrefs), literals, or blank (having no separate form of identification). Arcs are labelled with URI references. (See [URIS], section 4, for a description of URI reference forms, noting that relative URIs are not used in an RDF graph. See also section 4.1.)
The URI reference or literal on a node identifies what that node represents. The label on an arc identifies the relationship between the nodes connected by the arc. The arc label may also be a node in the graph.
A blank node is an RDF graph node that is not a URI reference or a literal. In the RDF abstract syntax, a blank node is just a unique node that can be used in one or more RDF statements, and has no globally distinguishing identity.
A convention used by some linear representations of an RDF graph to allow several statements to reference the same blank node is to use a blank node identifier, which is a local identifier that can be distinguished from all URIs and literals. When graphs are merged, their blank nodes must be kept distinct if meaning is to be preserved; this may call for re-allocation of blank node identifiers.
Note that blank node identifiers are not part of the RDF abstract syntax, and the representation of statements that use blank nodes is entirely dependent on the particular concrete syntax used.
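The re-allocation of blank node identifiers on merging can be sketched as follows. This is a minimal illustration, assuming a hypothetical `BNode` class for locally labelled blank nodes and plain strings for all other terms; it is not part of the RDF abstract syntax.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class BNode:
    """A blank node with a label that is only meaningful inside one graph."""
    label: str


def merge(*graphs):
    """Merge graphs, giving every blank node of every input graph a fresh label."""
    fresh_ids = itertools.count()
    merged = set()
    for graph in graphs:
        renamed = {}  # per-graph table: old blank node -> fresh blank node
        for triple in graph:
            new_triple = []
            for term in triple:
                if isinstance(term, BNode):
                    if term not in renamed:
                        renamed[term] = BNode(f"merged{next(fresh_ids)}")
                    term = renamed[term]
                new_triple.append(term)
            merged.add(tuple(new_triple))
    return merged
```

Two input graphs that both happen to use the label `x` therefore end up with two distinct blank nodes in the merged graph, while repeated uses of `x` within one input graph still refer to a single node.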
2.3.3 Datatypes
Datatypes are used by RDF in the representation of values such as integers, floating point numbers and dates.
RDF uses the datatype abstraction defined by XML Schema Part 2: Datatypes [XML-SCHEMA2]. A datatype consists of a lexical space, a value space and a datatype mapping.
A datatype mapping is a set of pairs whose first element belongs to the lexical space of the datatype, and the second element belongs to the value space of the datatype:
- Each member of the lexical space is paired with (maps to) exactly one member of the value space.
- Each member of the value space is paired with at least one member of the lexical space (a lexical representation for that value).
With one exception, the datatypes used in RDF have a lexical space consisting of a set of strings. The exception is rdfs:XMLLiteral, whose lexical space also includes pairs of strings and language identifiers. The value obtained through its datatype mapping may depend on the language identifier.
For example, the datatype mapping for the XML Schema datatype xsd:boolean, where each member of the value space (represented here as 'T' and 'F') has two lexical representations, is as follows:
Value Space | {T, F} |
---|---|
Lexical Space | {"0", "1", "true", "false"} |
Datatype Mapping | {<"true", T>, <"1", T>, <"0", F>, <"false", F>} |
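The two conditions on a datatype mapping can be checked mechanically. The sketch below encodes the xsd:boolean table as Python sets; the function name `is_datatype_mapping` is an assumption for this example.

```python
# The xsd:boolean example, with 'T' and 'F' standing for the two values.
VALUE_SPACE = {"T", "F"}
LEXICAL_SPACE = {"0", "1", "true", "false"}
MAPPING = {("true", "T"), ("1", "T"), ("0", "F"), ("false", "F")}


def is_datatype_mapping(mapping, lexical_space, value_space):
    """Check the two conditions stated for a datatype mapping."""
    # Every pair must draw its elements from the two spaces.
    if not all(lex in lexical_space and val in value_space for lex, val in mapping):
        return False
    # Each lexical form is paired with exactly one value.
    for lex in lexical_space:
        if len({val for l, val in mapping if l == lex}) != 1:
            return False
    # Each value has at least one lexical representation.
    for val in value_space:
        if not any(v == val for _, v in mapping):
            return False
    return True
```

Removing any pair from `MAPPING` leaves a lexical form with no value, so the check fails.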
RDF predefines just one datatype rdfs:XMLLiteral, used for embedding XML in RDF (see section 3).
There is no built-in concept of numbers or dates or other common values. Rather, RDF defers to datatypes that are defined separately, and identified with URIs. The predefined XML Schema datatypes [XML-SCHEMA2] are expected to be widely used for this purpose. The defining authority of a URI which identifies a datatype is responsible for specifying the datatype's lexical space, value space and datatype mapping.
RDF provides no mechanism for defining new datatypes. XML Schema Datatypes [XML-SCHEMA2] provides an extensibility framework suitable for defining new datatypes for use in RDF.
2.3.4 Literals
Literals are used to identify values such as numbers and dates by means of a lexical representation. Anything represented by a literal could also be represented by a URI, but it is often more convenient or intuitive to use literals.
A literal may be the object of an RDF statement, but not the subject or the arc.
Literals may be plain or typed:
- A plain literal is a string combined with an optional language identifier. This should be used for plain text in a natural language. As recommended in the RDF formal semantics [RDF-SEMANTICS], these plain literals are self-denoting.
- A typed literal is a string, a datatype URI and an optional language identifier. It denotes the member of the identified datatype's value space obtained by applying the datatype mapping to the literal string.
Continuing the example from section 2.3.3, the typed literals which can be defined using the XML Schema datatype xsd:boolean are:
Typed Literal | Datatype Mapping | Value |
---|---|---|
<xsd:boolean, "true"> | <"true", T> | T |
<xsd:boolean, "1"> | <"1", T> | T |
<xsd:boolean, "false"> | <"false", F> | F |
<xsd:boolean, "0"> | <"0", F> | F |
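How a typed literal denotes a value can be sketched in a few lines. This is an illustration only: the `TypedLiteral` class, the `DATATYPES` registry and the `value_of` function are assumptions made for the example, and the optional language identifier is omitted for brevity.

```python
from typing import NamedTuple

XSD_BOOLEAN = "http://www.w3.org/2001/XMLSchema#boolean"


class TypedLiteral(NamedTuple):
    lexical: str   # the literal string
    datatype: str  # the datatype URI


# Hypothetical registry: datatype URI -> datatype mapping (lexical form -> value).
DATATYPES = {
    XSD_BOOLEAN: {"true": True, "1": True, "false": False, "0": False},
}


def value_of(literal: TypedLiteral):
    """Apply the datatype mapping of the identified datatype to the literal string."""
    mapping = DATATYPES[literal.datatype]
    if literal.lexical not in mapping:
        raise ValueError(f"{literal.lexical!r} is not in the lexical space of {literal.datatype}")
    return mapping[literal.lexical]
```

Under this sketch, the literals with strings "1" and "true" denote the same value, while a string such as "yes" lies outside the lexical space and is rejected.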
2.3.5 XML serialization syntax
An RDF graph, as described by the RDF abstract syntax, can be represented in various ways, using different concrete syntaxes but each conveying a common RDF meaning.
Only the XML syntax [RDF-SYNTAX] is normatively specified and recommended for use to exchange information between applications.
2.3.6 Representation of simple facts
Roughly, a "simple fact" is the kind of information that can be stored in one row of a relational database, possibly about any nameable thing or concept.
The basic building block of RDF is a statement, which is a binary relational assertion. For example, the expression "floats(oil,water)" is a binary relational assertion expressing that oil floats on water. The term "floats" names a relationship that holds between "oil" and "water". An RDF statement can also contain a variable; e.g., as in "floats(?x,water)" expressing that there is something that floats on water, where "?x" stands for the something, without saying what it is.
Conjunction (logical-AND) of statements can be used to express more complex facts, such as "floats(oil,water) AND burns(oil,air)". Using the same variable in several different statements of a conjunction can say more than one might immediately expect, e.g. "type(?x,fluid) AND floats(?x,water)" says there is a fluid that floats on water.
Relationships involving more than two things can be expressed as a conjunction of binary relations, so "boilsAt(water,100C,1atm)" could be expressed as the existence of a boiling event, say "?b", such that "boils(?b,water) AND temp(?b,100C) AND press(?b,1atm)".
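The decomposition of a relation with more than two arguments can be sketched as follows. The helper names `new_blank` and `decompose`, and the `_:b0` style of blank node label, are assumptions for this illustration.

```python
import itertools

_blank_ids = itertools.count()


def new_blank():
    """Return a fresh placeholder playing the role of a variable such as ?b."""
    return f"_:b{next(_blank_ids)}"


def decompose(roles):
    """Turn one n-ary fact into binary statements about a shared unnamed event.

    `roles` maps each binary relation name to the argument it relates the event to.
    """
    event = new_blank()
    return {(event, relation, argument) for relation, argument in roles.items()}


# boilsAt(water, 100C, 1atm) as three binary statements about one boiling event.
boiling = decompose({"boils": "water", "temp": "100C", "press": "1atm"})
```

All three resulting statements share the same subject, which is what ties them together into a single fact.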
The expressive power of RDF corresponds to the existential-conjunctive (EC) subset of first order logic [Sowa]. It does not provide means to express negation (NOT) or disjunction (OR). RDF is unusual, for a first order logic subset, in that it allows statements to be made about the relation terms themselves, e.g. "type(floats,physical-relationship) and floats(oil,water)". This kind of expression is more commonly associated with higher order logics, but the use allowed by RDF has first-order semantics [RDF-SEMANTICS] [[[cite section when MT is stabilized]]].
Through its use of extensible URI-based vocabularies, RDF provides for expression of facts about arbitrary subjects; i.e. assertions of named properties about specific named things. A URI can be constructed for any thing that can be named, so RDF facts can be about any such things.
In an RDF graph, the function of variables in the examples above is provided by blank nodes.
2.3.7 Entailment
The ideas on meaning and inference in RDF are underpinned by the concept of entailment. An RDF expression A is said to entail another RDF expression B if every possible arrangement of things in the world that makes A true also makes B true. On this basis, if we presume or demonstrate the truth of A then we can also infer the truth of B. Entailment is discussed at greater length in the RDF formal semantics document [RDF-SEMANTICS].
This idea of entailment sets RDF apart from many other network data formats. What obligations does an entailment place on a processor of RDF data? The answer is: none. RDF applications are not required to find all facts that can be inferred on the basis of allowed entailments. (Further, it may not be possible to find all such facts.) But useful applications may infer some such facts, and treat those facts as if they were part of the supplied input data.
The RDF test cases described by [RDF-TESTS] contain some entailment and non-entailment tests (positive entailment tests and negative entailment tests), covering both RDF-entailment and RDFS-entailment. A positive entailment test indicates that the indicated conclusion can be inferred from the given antecedents; RDF applications are allowed to perform such inferences, but not required to do so. A negative entailment test indicates that an RDF application is not entitled by the rules of RDF alone to infer the indicated conclusion from the corresponding antecedent. A non-entailment does not mean that the conclusion is necessarily false: it may be true for reasons unrelated to the antecedent facts.
So we have the situation that a positive entailment does not mean that an RDF application must infer the conclusion, and a negative entailment does not mean the conclusion is necessarily false. How are we to judge whether a given RDF application is truly playing by the rules of RDF? The answer lies in the existence of a proof. An application that validly infers a conclusion from some antecedent facts must do so in a series of steps that can be directly traced to allowable entailments, which series constitutes a proof. The allowable entailments and corresponding proof steps sanctioned by the RDF specification are set out in the RDF formal semantics document [RDF-SEMANTICS].
2.4 Meaning of RDF
There are two aspects to the meaning of an RDF graph. There is the formal meaning as determined by the RDF model theory [RDF-SEMANTICS]. This determines, with mathematical precision, all the conclusions that can be legitimately drawn from an RDF graph. There is also the social meaning of the graph. It is the social meaning that affects what it means to people and how it interacts with human social institutions such as our systems of law.
2.4.1 Asserted and non-asserted forms
RDF/XML expressions, i.e. encodings of RDF graphs, can be used to make claims or assertions about the 'real' world. Such expressions are said to be asserted.
But not every RDF/XML expression is asserted. While the formal semantics of an RDF graph is that of an assertion, some may convey meaning that is partly determined by the circumstances in which they are used. For example, in English, a statement "I don't believe that George is a clown" contains the words "George is a clown", which, considered in isolation, has the form of an assertion that George exhibits certain comic qualities. However, considering the whole sentence, no such assertion is considered to be made.
2.4.2 Social meaning
When an RDF graph is asserted in the Web, its publisher is saying something about their view of the world. Such an assertion should be understood to carry the same social import and responsibilities as an assertion in any other format. A combination of social (e.g. legal) and technical machinery (protocols, file formats, publication frameworks) provides the contexts that fix the intended meanings of the vocabulary of some piece of RDF, and that distinguish assertions from other uses (e.g. citations, denials or illustrations).
The technical machinery includes protocols for transferring information (e.g. HTTP, SMTP) and file formats for encapsulating and labelling information (e.g. MIME, XML). A media type, application/rdf+xml [RDF-MIME-TYPE] indicates the use of RDF/XML as distinct from some other XML that happens to look like RDF. Issuing an HTTP GET request and obtaining data with a "200 OK" response code is a technical indication that the received data was published at the request URI; but data received with a "404 Not found" response cannot be considered to be similarly published information.
The social machinery includes the form of publication: publishing some unqualified statements on one's World Wide Web home page would generally be taken as an assertion of those statements. But publishing the same statements with a qualification, such as "here are some common myths", or as part of a rebuttal, would likely not be construed as an assertion of the truth of those statements. Similar considerations apply to the publication of assertions expressed in RDF.
When a user invokes an application that uses RDF, there is also a social and technical context of invocation that determines some set of RDF assertions that will be assumed to be true: the application itself, and any RDF files that are passed to it. Garbage-in, garbage-out applies: if the initial assumed facts are wrong or meaningless, the results will have little value. No specific mechanisms for deciding or evaluating the validity of any such assertions are defined here.
2.4.3 Interaction between social and formal meaning
[[[This whole section needs more work: the issues still are not being conveyed clearly]]]
Using RDF, 'received meaning' can be characterized as the social meaning of any logical consequences. If you publish a graph G and G logically entails G', and we interpret G' using the same social conventions that everyone agrees could be reasonably used to interpret G, then you are asserting that content of G' as well.
Human publishers of RDF content commit themselves to the mechanically-inferred social obligations. The machines doing the inferences are not expected to know about all these social conventions and obligations.
The social conventions surrounding use of RDF include the idea that each URI 'belongs to' somebody who has authority and responsibility for defining its meaning. The social conventions are rooted in the URI specification [RFC2396] and registration procedures [RFC2717]. A URI scheme registration refers to a specification of the detailed syntax and interpretation for that scheme, from which the defining authority for a given URI may be deduced. In the case of http: URIs, the defining specification is the HTTP protocol specification [RFC2616], which obtains a resource representation from the host named in the URI; thus, the owner of the host's DNS domain controls (observable aspects of) the URI's meaning.
2.4.3.1 Example
Imagine three websites each publishing some RDF:
(A) http://insult.example.com/lexicon# asserts the following, and this is all that one can find on the website about that term:
A:Clown rdf:type rdfs:Class .
A:Clown rdfs:comment "A foolish person, whose pronouncements are probably ill-considered and not to be taken seriously" .
(B) http://AngloSaxon.example.org/lexicon# asserts:
B:Comic rdfs:subClassOf http://insult.example.com/lexicon#Clown .
(C) http://skunk.example.org/ asserts the following, assuming that C:JohnSmith is understood to refer to some particular person:
C:JohnSmith rdf:type http://AngloSaxon.example.org/lexicon#Comic .
Now, it follows by the formal RDF model theory that these three together entail:
C:JohnSmith rdf:type http://insult.example.com/lexicon#Clown .
which the person identified as C:JohnSmith might reasonably consider an insult. Why? Not because of the RDF model theory, which merely says he is in some class about which nothing can be formally inferred. However, the rdfs:comment associated with that class name by the owner of that name provides the insulting content, in the social context of Web publication, even though it cannot be formally inferred via the RDF inference rules.
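The formal step in this example can be reproduced with a small sketch. It assumes the subclass rule of RDFS entailment (if x has type C, and C is a subclass of D, then x has type D) and expands the prefixes A:, B: and C: to the three site URIs given above. The function name `subclass_closure` is hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASSOF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

CLOWN = "http://insult.example.com/lexicon#Clown"
COMIC = "http://AngloSaxon.example.org/lexicon#Comic"
JOHN = "http://skunk.example.org/JohnSmith"

# The statements from B and C that take part in the inference.
published = {
    (COMIC, RDFS_SUBCLASSOF, CLOWN),  # asserted by B
    (JOHN, RDF_TYPE, COMIC),          # asserted by C
}


def subclass_closure(graph):
    """Repeatedly apply: (x type C) and (C subClassOf D) gives (x type D)."""
    inferred = set(graph)
    while True:
        new = {
            (x, RDF_TYPE, d)
            for (x, p1, c) in inferred if p1 == RDF_TYPE
            for (c2, p2, d) in inferred if p2 == RDFS_SUBCLASSOF and c2 == c
        } - inferred
        if not new:
            return inferred
        inferred |= new


conclusion = (JOHN, RDF_TYPE, CLOWN)
```

The conclusion is absent from the published statements but present in the closure. The text of the rdfs:comment plays no part in the computation, which is the point made above: the insulting content is carried socially, not formally.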
But who has insulted the identified person? A merely defined the term; B does not mention him in particular, so even A and B together do not constitute a personal insult. And C might argue that although he refers to the person, he only asserts that he is a comic, which is not in itself grounds for a libel suit. However, one could reasonably claim that C is to blame, since C uses not a generic term 'Comic', but a particular URI reference which is defined by its owner (B) in a way which is clearly insulting, since B in turn explicitly refers to, and uses, the term defined by A. Thus, C's use of a B-defined term suggests a clear intent by C to convey a meaning defined by B, by virtue of a definition by A, which is insulting.
By using the specific name http://AngloSaxon.example.org/lexicon#Comic instead of some term defined in, say, a glossary of job descriptions, B has explicitly removed his use of the term 'Clown' from any formal connection with people who are entertainers. In order to succeed in his probable intent of making a generic slander against these people, B should have used a term that was defined by someone else, such as:
http://entertainers.example.com/glossary#Comic
rdfs:subClassOf http://insult.example.com/lexicon#Clown .
and then if C had also used this first URI reference, then in spite of a similar formal inference chain generating the insulting conclusion about C:JohnSmith, there would be nobody to sue, since now C would indeed have simply made a harmless observation about his occupation, and B's assertion, while indeed arguably offensive, makes no reference to him in particular.
The point of this example is to emphasize that publication of RDF, when considered as a social act, constitutes a publication of some content that is defined by whatever normal social conditions are used by the publishers of any terms in the RDF to define the meanings of those terms, even if those meanings and definitions are not accessible to the formal semantics of RDF; and, moreover, those meanings are preserved under any formally sanctioned inference processes. In a nutshell, the formal entailments of social meanings are themselves part of the social meaning.
2.4.4 Authoritative definition of a predicate
RDF assumes that for any URI some individual or organization has the authority to define the meaning of that URI. An RDF predicate is defined by the individual or organization with such authority with respect to its URI, and misuse by others should not be permitted to undermine that authority.
2.5 RDF core URI vocabulary and namespaces
RDF uses URIs to identify resources and properties. Certain URIs are reserved for use by RDF, and may not be used for any purpose not sanctioned by the RDF specifications. Specifically, URIs with the following leading substrings are reserved for RDF core vocabulary:
- http://www.w3.org/1999/02/22-rdf-syntax-ns# (conventionally associated with namespace prefix rdf:)
- http://www.w3.org/2000/01/rdf-schema# (conventionally associated with namespace prefix rdfs:)
Used with the RDF/XML serialization, these URI prefix strings correspond to XML namespaces [XML-NS] associated with the RDF core vocabulary terms.
NOTE: these namespace URIs are the same as those used in earlier RDF documents [RDF-MS] [RDF-SCHEMA].
[[[NOTE FOR REVIEWERS: Some terms in these namespaces have been deprecated, some have been added, and some RDF schema terms have had their meaning changed. We invite community feedback regarding the relative costs of adopting these changes under the old namespace URIs vs creating new URIs for this revision of RDF.]]]
Vocabulary terms in the rdf: namespace are listed in section 3.4 of the RDF syntax specification [RDF-SYNTAX].
Vocabulary terms defined in the rdfs: namespace are defined [[[where?]]] in the RDF schema vocabulary specification [RDF-VOCABULARY].
3. XML Content within an RDF Graph
RDF provides for XML content as a possible literal value. This typically originates from the use of rdf:parseType="Literal" in the RDF/XML Syntax [RDF-SYNTAX].
Such content is indicated in an RDF graph using a typed literal whose datatype is a special built-in datatype, rdfs:XMLLiteral.
As part of the definition of this datatype, we use an ancillary definition.
The XML document corresponding to a pair ( str, lang ) is formed as follows:
Concatenate the five strings:
- "<rdf-wrapper xml:lang='"
- lang
- "'>"
- str
- "</rdf-wrapper>"
Encode the resulting Unicode string in UTF-8 to form the corresponding XML document.
No escaping is applied. The choice of rdf-wrapper is fixed but arbitrary.
The XML document corresponding to a string str is formed as the XML document corresponding to the pair (str, "").
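The construction of the corresponding XML document can be sketched in Python. The sketch assumes that the fifth string of the concatenation is the closing tag `</rdf-wrapper>`, matching the opening tag; the function name `xml_document` is hypothetical.

```python
def xml_document(s: str, lang: str = "") -> bytes:
    """Build the XML document corresponding to the pair (s, lang).

    Calling it with a string alone corresponds to the pair (s, "").
    No escaping is applied to either argument.
    """
    text = "".join([
        "<rdf-wrapper xml:lang='",
        lang,
        "'>",
        s,
        "</rdf-wrapper>",  # assumed closing tag for the fixed wrapper element
    ])
    # The resulting Unicode string is encoded in UTF-8.
    return text.encode("utf-8")
```

For example, `xml_document("<b>hi</b>", "en")` yields the bytes of `<rdf-wrapper xml:lang='en'><b>hi</b></rdf-wrapper>`. The sketch does not check that the result is well-formed XML.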
Using this, the datatype rdfs:XMLLiteral is defined as follows.
The datatype URI is http://www.w3.org/2000/01/rdf-schema#XMLLiteral.
The value space is the set of all XML documents that:
- have the root element tag <rdf-wrapper>,
- have no attributes on the root element other than xml:lang, and
- are Canonical XML [XML-C14N] (with comments).
The lexical space contains all pairs ( string, lang ) where lang is any language identifier [RFC-3066] in lowercase, and string is well-balanced, self-contained XML element content [XML], for which the XML document corresponding to the pair is a well-formed XML document [XML] that also conforms to XML Namespaces [XML-NS].
The lexical space also contains all strings string which are well-balanced, self-contained XML element content [XML], and for which the corresponding XML document is a well-formed XML document [XML] that also conforms to XML Namespaces [XML-NS].
The mapping is defined as the function that maps a pair or string to the canonical form [XML-C14N] (with comments) of the corresponding XML document.
REMINDER: All other datatypes have a lexical space being a set of strings, and a mapping which maps strings to values.
NOTE: Not all values of this datatype are compliant with XML 1.1 [XML 1.1]. If compliance with XML 1.1 is desired, then only those values that are fully normalized according to XML 1.1 should be used.
4. Abstract Syntax
[[[This section has had a substantial rewrite since the last Working Group review. ]]]
This section defines the RDF abstract syntax. The RDF abstract syntax is a set of triples, called the RDF graph.
This section also defines equality between RDF graphs. A definition of equality is needed to support the RDF Test Cases [RDF-TESTS] specification.
4.1 RDF Triples
An RDF triple contains three components, called:
- the subject
- the predicate, which is an RDF URI reference
- the object
The subject may not be an RDF literal.
Note: subjects and objects are otherwise unrestricted, since anything that is neither an RDF literal nor an RDF URI reference is treated as a blank node.
An RDF triple is conventionally written in the order subject, predicate, object.
4.2 RDF graph
An RDF graph is a set of RDF triples.
The set of nodes of an RDF graph is the set of subjects and objects of triples in the graph.
The blank nodes of an RDF graph are those nodes that are not RDF literals or RDF URI references.
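Informative sketch (not part of this specification): the definitions in sections 4.1 and 4.2 can be illustrated with a small Python data model. The names URIRef, Literal, BlankNode, make_triple, nodes and blank_nodes are hypothetical and chosen only for this example; a blank node label is assumed to be meaningful only inside one graph.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class URIRef:
    """An RDF URI reference (hypothetical class name)."""
    value: str


@dataclass(frozen=True)
class Literal:
    """An RDF literal; language identifier and datatype URI may be absent."""
    lexical_form: str
    language: Optional[str] = None
    datatype: Optional[URIRef] = None


@dataclass(frozen=True)
class BlankNode:
    """A node that is neither an RDF literal nor an RDF URI reference."""
    label: str


Node = Union[URIRef, Literal, BlankNode]
Triple = Tuple[Node, URIRef, Node]


def make_triple(subject, predicate, obj):
    """Build a (subject, predicate, object) triple, enforcing section 4.1."""
    if isinstance(subject, Literal):
        raise ValueError("the subject may not be an RDF literal")
    if not isinstance(predicate, URIRef):
        raise ValueError("the predicate must be an RDF URI reference")
    return (subject, predicate, obj)


def nodes(graph):
    """The set of subjects and objects of the triples in the graph."""
    return {t[0] for t in graph} | {t[2] for t in graph}


def blank_nodes(graph):
    """The nodes that are not RDF literals or RDF URI references."""
    return {n for n in nodes(graph) if not isinstance(n, (Literal, URIRef))}
```

In this sketch an RDF graph is simply a Python set of such triples, so adding the same triple twice has no effect.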
4.3 Graph Equality
Two RDF graphs G and G' are equal if there is a bijection I between the nodes of the two graphs, such that:
- I(lit)=lit for all RDF literals lit which are nodes of either graph.
- I(uri)=uri for all RDF URI references uri which are nodes of either graph.
- The triple ( s, p, o ) is in G if and only if the triple ( I(s), p, I(o) ) is in G'.
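Informative sketch (not part of this specification): the definition above can be checked by brute force, trying every bijection between the blank nodes of the two graphs while leaving all other nodes fixed. The class BNode and the function graphs_equal are hypothetical names; the sketch assumes that URI references and literals are represented by hashable Python values (for example strings and tuples) that compare equal exactly when the nodes are equal. The search is factorial in the number of blank nodes, so it is suitable only for small graphs.

```python
from itertools import permutations


class BNode:
    """Hypothetical blank node; identity is the Python object itself."""

    def __init__(self, label=""):
        self.label = label


def _blank_nodes(graph):
    """Collect the blank nodes occurring as subject or object."""
    return list({n for s, _, o in graph for n in (s, o) if isinstance(n, BNode)})


def graphs_equal(g1, g2):
    """Return True if some bijection I on nodes maps g1 exactly onto g2."""
    b1, b2 = _blank_nodes(g1), _blank_nodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for candidate in permutations(b2):
        # I maps each blank node of g1 to one of g2; literals and
        # URI references are left unchanged by mapping.get(n, n).
        mapping = dict(zip(b1, candidate))
        image = {(mapping.get(s, s), p, mapping.get(o, o)) for s, p, o in g1}
        if image == g2:
            return True
    return False
```

Because the candidate mapping is the identity on literals and URI references, a match is only possible when both graphs use the same literals and URI references as nodes, as the first two conditions require.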
4.4 RDF URI References
[[[This text should be reviewed in light of the IRI section in the namespaces 1.1 Last Call WD and comments made on it; the editor had one attempt but it failed.]]]
A URI reference within an RDF graph (an RDF URI reference) is a Unicode string [UNICODE] that:
- is in Normal Form C [NFC] and
- would produce a US-ASCII string that is an absolute URI reference in the form defined by [URIS], as modified by [RFC-2732], when:
* encoded in UTF-8, and
* with disallowed characters escaped according to the percent escaping algorithm below.
The disallowed characters that must be %-escaped include all non-ASCII characters and the excluded characters listed in Section 2.4 of [URIS], except for the number sign (#) and percent sign (%) characters and the square bracket characters re-allowed in [RFC-2732].
Disallowed characters must be escaped as follows:
- Each disallowed character is converted to UTF-8 [RFC-2279] as one or more bytes.
- Any bytes corresponding to a disallowed character are escaped with the URI escaping mechanism (that is, converted to %HH, where HH is the hexadecimal notation of the byte value).
- The original character is replaced by the resulting character sequence.
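Informative sketch (not part of this specification): the escaping steps above can be written as a short Python function. The names EXCLUDED_ASCII, is_disallowed and percent_escape are hypothetical. The set of excluded US-ASCII characters is taken from the control, space, delims and unwise classes of RFC 2396, with "#", "%", "[" and "]" left out as described above; the use of uppercase hexadecimal digits is an assumption of this sketch.

```python
# Excluded US-ASCII characters: control characters (00-1F and 7F), space,
# and the delims and unwise characters, minus "#", "%", "[" and "]".
EXCLUDED_ASCII = (
    set(' <>"{}|\\^`')
    | {chr(code) for code in range(0x20)}
    | {"\x7f"}
)


def is_disallowed(ch):
    """True for non-ASCII characters and excluded US-ASCII characters."""
    return ord(ch) > 0x7F or ch in EXCLUDED_ASCII


def percent_escape(uri_ref):
    """Replace each disallowed character by the %HH escapes of its UTF-8 bytes."""
    pieces = []
    for ch in uri_ref:
        if is_disallowed(ch):
            # One %HH group per UTF-8 byte of the disallowed character.
            pieces.append("".join("%{:02X}".format(b) for b in ch.encode("utf-8")))
        else:
            pieces.append(ch)
    return "".join(pieces)
```

Existing percent signs are left untouched, so a string that already contains %-escapes is not escaped a second time.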
Two RDF URI references are equal if and only if they compare as equal, character by character, as Unicode strings.
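Informative sketch (not part of this specification): character-by-character comparison means that no %-unescaping, case folding or other URI normalization is applied before comparing. The function names is_nfc and uri_refs_equal are hypothetical; the Normal Form C test relies on the standard Python unicodedata module.

```python
import unicodedata


def is_nfc(text):
    """True if the string is already in Unicode Normal Form C."""
    return unicodedata.normalize("NFC", text) == text


def uri_refs_equal(a, b):
    """Compare two RDF URI references as Unicode strings, with no normalization."""
    return a == b
```

Under this definition http://example.org/~user and http://example.org/%7Euser are different RDF URI references, even though a Web server may treat them as the same address.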
Note: RDF URI references are compatible with the anyURI datatype as defined by XML schema datatypes [XML-SCHEMA2], constrained to be an absolute rather than a relative URI reference, and constrained to be in Unicode Normal Form C [NFC] (for compatibility with [CHARMOD]).
Note: RDF URI references are compatible with International Resource Identifiers as defined by [XML-NAMESPACES-1.1].
Note: The restriction to absolute URI references is found in this abstract syntax. When there is a well-defined base URI, concrete syntaxes, such as RDF/XML, may permit relative URIs as a shorthand for such absolute URI references.
4.5 RDF Literals
A literal in an RDF graph contains three components called:
- The lexical form being a Unicode [UNICODE] string in Normal Form C [NFC].
- The language identifier as defined by [RFC-3066], normalized to lowercase.
- The datatype URI being an RDF URI reference.
The lexical form is present in all RDF literals; the language identifier and the datatype URI may be absent from an RDF literal.
A plain literal is one in which the datatype URI is absent.
A typed literal is one in which the datatype URI is present.
Note: Literals in which the lexical form begins with a composing character (as defined by [CHARMOD]) are allowed; however, they may cause interoperability problems, particularly with XML version 1.1 [XML-1.1].
Note: When using the language identifier, care must be taken not to confuse language with locale. The language identifier only relates to human language text. Presentational issues, how to best represent typed data to the end-user, should be addressed in end-user applications.
4.5.1 Literal Equality
Two literals are equal if and only if all of the following hold:
- The strings of the two lexical forms compare equal, character by character.
- Either both or neither have language identifiers.
- The two language identifiers, if any, compare equal.
- Either both or neither have datatype URIs.
- The two datatype URIs, if any, compare equal, character by character.
Note: RDF Literals are distinct and distinguishable from RDF URI references; e.g. http://example.org as an RDF Literal (untyped, without a language identifier) is not equal to http://example.org as an RDF URI reference.
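Informative sketch (not part of this specification): the five conditions above can be expressed directly in Python. The class Literal and the function literals_equal are hypothetical names; an absent language identifier or datatype URI is represented by None, and language identifiers are assumed to have been normalized to lowercase already.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    lexical_form: str
    language: Optional[str] = None   # None means the identifier is absent
    datatype: Optional[str] = None   # datatype URI; None means absent


def literals_equal(a, b):
    """Apply the literal equality conditions one by one."""
    return (
        a.lexical_form == b.lexical_form                       # same lexical form
        and (a.language is None) == (b.language is None)       # both or neither
        and a.language == b.language                           # same identifier
        and (a.datatype is None) == (b.datatype is None)       # both or neither
        and a.datatype == b.datatype                           # same datatype URI
    )
```

Python compares strings code point by code point, which matches the character-by-character comparison required here. A plain string used for a URI reference is a different kind of object from a Literal in this sketch, which mirrors the note that literals and URI references are distinguishable.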
4.5.2 The Value Corresponding to a Typed Literal
The datatype URI refers to a datatype. For XML Schema built-in datatypes, URIs such as http://www.w3.org/2001/XMLSchema#int are used. The URI of the datatype rdfs:XMLLiteral may be used. There may be other, implementation dependent, mechanisms by which URIs refer to datatypes.
The value associated with a typed literal is found by applying the datatype mapping associated with the datatype URI to the lexical form. This mapping fails if the lexical form is not in the lexical space of the datatype associated with the datatype URI. Exceptionally, if the datatype is rdfs:XMLLiteral and the literal has a language identifier, then the datatype mapping is applied to the pair formed by the lexical form and the language identifier.
A typed literal for which the datatype does not map the lexical form to a value is not syntactically ill-formed.
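Informative sketch (not part of this specification): one possible way to look up a datatype mapping by its URI and apply it to a lexical form. The names DATATYPE_MAPPINGS and value_of are hypothetical, and the treatment of xsd:int is a simplification that assumes a lexical form made of an optional sign followed by ASCII digits, with a 32-bit value range.

```python
import re

XSD_INT = "http://www.w3.org/2001/XMLSchema#int"


def _int_mapping(lexical_form):
    """Simplified datatype mapping for xsd:int (an assumption of this sketch)."""
    if re.fullmatch(r"[+-]?[0-9]+", lexical_form) is None:
        raise ValueError("lexical form is not in the lexical space")
    value = int(lexical_form)
    if not -2**31 <= value <= 2**31 - 1:
        raise ValueError("value is outside the 32-bit range")
    return value


# Table from datatype URI to datatype mapping; other mechanisms are possible.
DATATYPE_MAPPINGS = {XSD_INT: _int_mapping}


def value_of(lexical_form, datatype_uri):
    """Return the value of a typed literal, or raise if the mapping fails."""
    mapping = DATATYPE_MAPPINGS.get(datatype_uri)
    if mapping is None:
        raise LookupError("no datatype known for " + datatype_uri)
    return mapping(lexical_form)
```

A ValueError here corresponds to the case where the mapping fails. The literal itself remains a well-formed part of the graph; only the attempt to obtain its value is unsuccessful.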
[[[Review interaction with model theory concerning typed values.]]]
5. Additional technical considerations
5.1 Character normalization
[[[This subsection will be deleted at the next draft of this document. This subsection normatively depends on CHARMOD, currently a Last Call Working Draft. The editors are unwilling to progress to Last Call with such a dependency.]]]
For the processing of character data that can be represented in different ways, RDF processors are required to conform to Early Uniform Normalization, as described by Character Model for the World Wide Web 1.0 [CHARMOD].
5.2 Fragment identifiers
RDF uses an RDF URI Reference, which may include a fragment identifier, as a context free identifier for a resource. RFC 2396 [URIS] states that the meaning of a fragment identifier depends on the MIME content-type of a document, i.e. is context dependent.
These apparently conflicting views are reconciled by considering that, in an RDF graph, any RDF URI reference consisting of an absolute URI and a fragment identifier identifies the same thing as the fragment identifier does in an application/rdf+xml [RDF-MIME-TYPE] representation of the resource identified by the absolute URI component. Thus:
- we assume that the URI part (i.e. excluding fragment identifier) indicates a Web resource with an RDF representation. So when someurl#frag is used in an RDF document, someurl is presumed to designate an RDF document.
- someurl#frag means the thing that is indicated, according to the rules of the application/rdf+xml MIME content-type as a "fragment" or "view" of the RDF document at someurl. If the document does not exist, or cannot be retrieved, then exactly what that view may be is somewhat undetermined, but that does not prevent use of RDF to say things about it.
- the RDF interpretation of a fragment identifier allows it to indicate a thing that is entirely external to the document, or even to the "shared information space" known as the Web. That is, it can be an abstract idea, like my car or a mythical Unicorn.
- thus, an application/rdf+xml document acts as an intermediary between some Web retrievable documents (itself, at least, and also any other Web retrievable URIs that it may use, including schema URIs and references to other RDF documents), and some set of abstract or non-Web entities that the RDF may describe.
This provides a handling of URI references and their denotation that is consistent with the RDF model theory and usage, and also with conventional Web behavior.
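Informative sketch (not part of this specification): separating someurl from frag can be done with the standard Python urllib.parse.urldefrag function. The wrapper name split_uri_ref is hypothetical.

```python
from urllib.parse import urldefrag


def split_uri_ref(uri_ref):
    """Split an RDF URI reference into (URI without fragment, fragment identifier)."""
    absolute_uri, fragment = urldefrag(uri_ref)
    return absolute_uri, fragment
```

For example, split_uri_ref("http://example.org/doc#frag") separates the URI of the document presumed to have an RDF representation from the fragment identifier interpreted within it. The split is only a convenience for retrieval; within an RDF graph the whole string remains the identifier.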
6. Acknowledgments
This document contains a significant contribution from Pat Hayes, Sergey Melnik and Patrick Stickler, under whose leadership the framework described in the RDF family of specifications for representing datatyped values, such as integers and dates, was developed.
The editors acknowledge valuable contributions from the following:
- Frank Manola
- Pat Hayes
- Dan Brickley
- Jos de Roo
- Dave Beckett
- Patrick Stickler
- Peter F. Patel-Schneider
- Jerome Euzenat
- Massimo Marchiori
- Tim Berners-Lee
- Dave Reynolds
- Dan Connolly
- [[[Other contributors]]]
Jeremy Carroll thanks Oreste Signore, his host at the W3C Office in Italy and Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo", part of the Consiglio Nazionale delle Ricerche, where Jeremy is a visiting researcher.
This document is a product of extended deliberations by the RDFcore Working Group, whose members have included:
- Art Barstow (W3C)
- Dave Beckett (ILRT)
- Dan Brickley (ILRT)
- Dan Connolly (W3C)
- Jeremy Carroll (Hewlett Packard)
- Ron Daniel (Interwoven Inc)
- Bill dehOra (InterX)
- Jos De Roo (AGFA)
- Jan Grant (ILRT)
- Graham Klyne (Nine by Nine)
- Frank Manola (MITRE Corporation)
- Brian McBride (Hewlett Packard)
- Eric Miller (W3C)
- Stephen Petschulat (IBM)
- Patrick Stickler (Nokia)
- Aaron Swartz (HWG)
- Mike Dean (BBN Technologies / Verizon)
- R. V. Guha (Alpiri Inc)
- Pat Hayes (IHMC)
- Sergey Melnik (Stanford University)
- Martyn Horner (Profium Ltd)
This specification also draws upon an earlier RDF Model and Syntax document edited by Ora Lassila and Ralph Swick, and RDF Schema edited by Dan Brickley and R. V. Guha. RDF and RDF Schema Working Group members who contributed to this earlier work are:
- Nick Arnett (Verity)
- Tim Berners-Lee (W3C)
- Tim Bray (Textuality)
- Dan Brickley (ILRT / University of Bristol)
- Walter Chang (Adobe)
- Sailesh Chutani (Oracle)
- Dan Connolly (W3C)
- Ron Daniel (DATAFUSION)
- Charles Frankston (Microsoft)
- Patrick Gannon (CommerceNet)
- R. V. Guha (Epinions, previously of Netscape Communications)
- Tom Hill (Apple Computer)
- Arthur van Hoff (Marimba)
- Renato Iannella (DSTC)
- Sandeep Jain (Oracle)
- Kevin Jones (InterMind)
- Emiko Kezuka (Digital Vision Laboratories)
- Joe Lapp (webMethods Inc.)
- Ora Lassila (Nokia Research Center)
- Andrew Layman (Microsoft)
- Ralph LeVan (OCLC)
- John McCarthy (Lawrence Berkeley National Laboratory)
- Chris McConnell (Microsoft)
- Murray Maloney (Grif)
- Michael Mealling (Network Solutions)
- Norbert Mikula (DataChannel)
- Eric Miller (OCLC)
- Jim Miller (W3C, emeritus)
- Frank Olken (Lawrence Berkeley National Laboratory)
- Jean Paoli (Microsoft)
- Sri Raghavan (Digital/Compaq)
- Lisa Rein (webMethods Inc.)
- Paul Resnick (University of Michigan)
- Bill Roberts (KnowledgeCite)
- Tsuyoshi Sakata (Digital Vision Laboratories)
- Bob Schloss (IBM)
- Leon Shklar (Pencom Web Works)
- David Singer (IBM)
- Wei (William) Song (SISU)
- Neel Sundaresan (IBM)
- Ralph Swick (W3C)
- Naohiko Uramoto (IBM)
- Charles Wicksteed (Reuters Ltd.)
- Misha Wolf (Reuters Ltd.)
- Lauren Wood (SoftQuad)
7. References
7.1 Normative References
[RDF-SYNTAX]
RDF/XML Syntax Specification (Revised), Dave Beckett, World Wide Web Consortium, 6 November 2002 (work in progress). This version of the RDF/XML Syntax Specification (Revised) is http://www.w3.org/TR/2002/WD-rdf-syntax-grammar-20021108/. The latest version is at http://www.w3.org/TR/rdf-syntax-grammar/.
[RDF-SEMANTICS]
RDF Model Theory, P. Hayes, Editor. Work in progress. World Wide Web Consortium, 29 April 2002. This version of the RDF Model Theory is http://www.w3.org/TR/2002/WD-rdf-mt-20020429/. The latest version of the RDF Model Theory is at http://www.w3.org/TR/rdf-mt/.
[RDF-VOCABULARY]
RDF Vocabulary Description Language 1.0: RDF Schema, Dan Brickley, R.V. Guha, World Wide Web Consortium, April 2002 (work in progress). The latest version is at http://www.w3.org/TR/rdf-schema/.
[RDF-MIME-TYPE]
Application/rdf+xml Media Type Registration, A. Swartz, IETF Internet Draft, March 2002 (work in progress). Version available at http://www.ietf.org/internet-drafts/draft-swartz-rdfcore-rdfxml-mediatype-01.txt.
[RDF-TESTS]
RDF Test Cases, Jan Grant and Dave Beckett, Editors. Work in progress. World Wide Web Consortium, 29 April 2002. This version of the RDF Test Cases is http://www.w3.org/TR/2002/WD-rdf-testcases-20020429/. The latest version of the RDF Test Cases is at http://www.w3.org/TR/rdf-testcases/.
[XML]
Extensible Markup Language (XML) 1.0, Second Edition, T. Bray, J. Paoli, C.M. Sperberg-McQueen and E. Maler, Editors. World Wide Web Consortium. 6 October 2000. This version is http://www.w3.org/TR/2000/REC-xml-20001006. The latest version of XML is available at http://www.w3.org/TR/REC-xml.
[XML-NS]
Namespaces in XML, T. Bray, D. Hollander and A. Layman, Editors. World Wide Web Consortium. 14 January 1999. This version is http://www.w3.org/TR/1999/REC-xml-names-19990114/. The latest version of Namespaces in XML is available at http://www.w3.org/TR/REC-xml-names/.
[URIS]
RFC 2396 - Uniform Resource Identifiers (URI): Generic Syntax, T. Berners-Lee, R. Fielding and L. Masinter, IETF, August 1998. This document is http://www.isi.edu/in-notes/rfc2396.txt.
[URI-REG]
RFC 2717 - Registration Procedures for URL Scheme Names, R. Petke and I. King, IETF, November 1999. This document is http://www.isi.edu/in-notes/rfc2717.txt.
[RFC-2732]
RFC 2732 - Format for Literal IPv6 Addresses in URL's, R. Hinden, B. Carpenter and L. Masinter, IETF, December 1999. This document is http://www.isi.edu/in-notes/rfc2732.txt.
[RFC-2279]
RFC 2279 - UTF-8, a transformation format of ISO 10646, F. Yergeau, IETF, January 1998. This document is http://www.isi.edu/in-notes/rfc2279.txt.
[UNICODE]
The Unicode Standard, Version 3, The Unicode Consortium, Addison-Wesley, 2000. ISBN 0-201-61633-5, as updated from time to time by the publication of new versions. (See http://www.unicode.org/unicode/standard/versions/ for the latest version and additional information on versions of the standard and of the Unicode Character Database).
[NFC]
Unicode Normalization Forms, Unicode Standard Annex #15, Mark Davis, Martin Dürst. (See http://www.unicode.org/unicode/reports/tr15/ for the latest version).
[CHARMOD]
Character Model for the World Wide Web 1.0, M. Dürst, F. Yergeau, R. Ishida, M. Wolf, A. Freytag, T. Texin, Editors, World Wide Web Consortium Working Draft, work in progress, 20 February 2002. This version of the Character Model is http://www.w3.org/TR/2002/WD-charmod-20020220/. The latest version of the Character Model is at http://www.w3.org/TR/charmod/.
[RFC-3066]
RFC 3066 - Tags for the Identification of Languages, H. Alvestrand, IETF, January 2001. This document is http://www.isi.edu/in-notes/rfc3066.txt.
[XML-C14N]
Canonical XML, J. Boyer, W3C Recommendation, March 2001. Available at http://www.w3.org/TR/2001/REC-xml-c14n-20010315 and at http://www.ietf.org/rfc/rfc3076.txt.
[KEYWORDS]
RFC 2119 - Key words for use in RFCs to Indicate Requirement Levels, S. Bradner, IETF. March 1997. This document is http://www.ietf.org/rfc/rfc2119.txt. [[[Is this used?]]]
[RFC-3023]
RFC 3023 - XML Media Types, M. Murata, S. St.Laurent, D. Kohn, IETF, January 2001. This document is http://www.ietf.org/rfc/rfc3023.txt.
7.2 Informational References
[RDF-PRIMER]
RDF Primer, F. Manola, E. Miller, Editors, World Wide Web Consortium W3C Working Draft, work in progress, 26 April 2002. This version of the RDF Primer is http://www.w3.org/TR/2002/WD-rdf-primer-20020426/. The latest version of the RDF Primer is at http://www.w3.org/TR/rdf-primer/.
[XML-1.1]
Extensible Markup Language (XML) 1.1, John Cowan, Editor. World Wide Web Consortium Working Draft 25 April 2002. (Work in progress)
[XML-NAMESPACES-1.1]
Namespaces in XML 1.1, Tim Bray, Dave Hollander, Andrew Layman, Richard Tobin, Editors. World Wide Web Consortium Working Draft 5 September 2002. (Work in progress)
[XML-INFOSET]
XML Information Set, John Cowan and Richard Tobin, W3C Recommendation, 24 October 2001. This document is http://www.w3.org/TR/xml-infoset/.
[XML-SCHEMA0]
XML Schema Part 0: Primer - W3C Recommendation, World Wide Web Consortium, 2 May 2001.
[XML-SCHEMA1]
XML Schema Part 1: Structures - W3C Recommendation, World Wide Web Consortium, 2 May 2001.
[XML-SCHEMA2]
XML Schema Part 2: Datatypes - W3C Recommendation, World Wide Web Consortium, 2 May 2001.
[OWL]
OWL Web Ontology Language 1.0 Reference, Mike Dean, Dan Connolly, Frank van Harmelen, James Hendler, Ian Horrocks, Deborah L. McGuinness, Peter F. Patel-Schneider, and Lynn Andrea Stein. W3C Working Draft 29 July 2002. Latest version is available at http://www.w3.org/TR/owl-ref/.
[SOWA]
Knowledge Representation: Logical, Philosophical and Computational Foundations, John F. Sowa, Brooks/Cole, 2000. ISBN 0-534-94965-7.
[SOWA2]
Mathematical Background, John F. Sowa, (an extended version of appendix A from Conceptual Structures: Information Processing in Mind and Machine, 1984).
[CG]
Conceptual Graphs, John F. Sowa, ISO working document ISO/JTC1/SC 32/WG2 N 000, 2 April 2001 (work in progress). Available at http://users.bestweb.net/~sowa/cg/cgstand.htm.
[KIF]
Knowledge Interchange Format, Michael R. Genesereth, draft proposed American National Standard NCITS.T2/98-004. Available at http://logic.stanford.edu/kif/dpans.html.
[LUGER]
Artificial Intelligence: Structures and Strategies for Complex Problem Solving (3rd ed.), George F. Luger and William A. Stubblefield, Addison Wesley Longman, 1998. ISBN 0-805-31196-3.
[HAYES]
In Defense of Logic, Patrick J. Hayes, Proceedings from the International Joint Conference on Artificial Intelligence, 1975, San Francisco. Morgan Kaufmann Inc., 1977. Also in Computation and Intelligence: Collected Readings, George F. Luger (ed), AAAI press/MIT press, 1995. ISBN 0-262-62101-0.
[GRAY]
Logic, Algebra and Databases, Peter Gray, Ellis Horwood Ltd., 1984. ISBN 0-85312-709-3, 0-85312-803-0, 0-470-20103-7, 0-470-20259-9.
[HUNTER]
Metalogic: An Introduction to the Metatheory of Standard First Order Logic, Geoffrey Hunter, University of California Press, 1971. ISBN 0-520-02356-0.
[DAVIS]
Truth, Deduction and Computation: logic and semantics for computer science, Ruth E. Davis, Computer Science Press, 1989. ISBN 0-7167-8201-4.
[QUINE]
Philosophy of Logic (2nd ed.), W. V. Quine, Harvard University Press 1986, ISBN 0-674-66563-5.
[NOTATION3]
Tim Berners-Lee, DesignIssues note on N3, ...
[RDF-MS]
Resource Description Framework (RDF) Model and Syntax Specification, O. Lassila and R. Swick, Editors. World Wide Web Consortium. 22 February 1999. This version is http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/. The latest version of RDF M&S is available at http://www.w3.org/TR/REC-rdf-syntax/.
[RDF-SCHEMA]
Resource Description Framework (RDF) Schema Specification 1.0, Dan Brickley and R. V. Guha, W3C Candidate Recommendation, 27 March 2000. This document is http://www.w3.org/TR/rdf-schema/.