RDFa for HTML Authors

RDFa is a thin layer of markup you can add to your web pages that makes them understandable for machines as well as people. You could describe it as a CSS for meaning. Once it is added, browsers, search engines, and other software can understand more about the pages, and so offer more services or better results to the user. For instance, if a browser knows that a page is about an event such as a conference, it can offer to add it to your calendar, show it on a map, locate hotels or flights, or any number of other things.

This document introduces RDFa and gives examples of its use.

If you know HTML markup, you will know that you can add metadata to an HTML document by adding <meta> and <link> elements in the head. For instance:
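A minimal sketch of such a meta element, using the description quoted just below:

```html
<!-- Goes inside <head>; the name/content pair is the metadata -->
<meta name="description" content="A site about fish"/>
```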

gives a description of the current document. You could say that the current page has a description property, whose value is "A site about fish".
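A link element works the same way for relations between pages. As a sketch, taking the file name as it appears in the sentence that follows:

```html
<!-- Goes inside <head>; rel names the relation, href its target -->
<link rel="next" href="thecod.html"/>
```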

which says that if you consider this page as one in a series of pages, the next one is thecod.html. In other words, this page has a next relation to thecod.html.

There is a smattering of other places in HTML where you can add some metadata, such as the title element, the title attribute on various elements, and the cite attribute on <blockquote> and others, but that is about it.

You might ask why the description is held in a content attribute rather than in the content of the meta element itself. The answer is simply that at the time this feature was added to HTML, some browsers would incorrectly have displayed text inside a meta element, even though it was in the <head>; to prevent that happening, the content was put in an attribute instead. (This, by the way, is being fixed in XHTML2.)


In the time since the meta element was added to HTML, a generalised way of representing metadata has been defined at W3C. This is called RDF, the Resource Description Framework ('resource' roughly speaking means 'document' here, but you'll see examples of things other than documents later).

RDF is a very simple framework. Essentially all knowledge is gathered as assertions of the form:

URI property value

where 'URI' is the URI of the thing being described, 'property' is (the URI of) a property, and 'value' is the value that that property can take: another URI, a literal string, or a chunk of XML.

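So assuming the example document above has a URL of http://www.example.com/home.html, then the RDF assertion, or _triple_ as it is often called, for the description property has http://www.example.com/home.html as its URI, description as its property, and the string "A site about fish" as its value.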

The value [html:next] means here "the URL that represents the HTML next property", and is expressed here as a _Compact URI_, or CURIE for short. More on those later.

RDFa extends the possibilities of metadata in XHTML by generalising the attributes on meta and link, allowing them to be used on any element (so that you may now have metadata in the body of the page as well as the head), and then defining how those attributes can be interpreted as RDF.

To take a simple example, many people add a number of so-called Dublin Core properties to their pages, such as title and author (which is called creator in Dublin Core, since the properties can be used with other things, such as paintings):

John Smith's Fish of the World

Fish of the World

by John Smith
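A sketch of the kind of markup being described, assuming the widespread DC.title and DC.creator naming convention for meta elements; the element choices in the body are also assumptions:

```html
<head>
  <title>John Smith's Fish of the World</title>
  <!-- Dublin Core metadata, repeating what the body already says -->
  <meta name="DC.title" content="Fish of the World"/>
  <meta name="DC.creator" content="John Smith"/>
</head>
<body>
  <h1>Fish of the World</h1>
  <p>by John Smith</p>
</body>
```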

The Dublin Core Metadata Initiative organization defined these and other properties for defining the metadata about books, works of art and so on. You can see in the example above that they duplicate information in the document itself. A nice thing about RDFa is that you can attach the properties to the document text instead:

John Smith's Fish of the World

Fish of the World

by John Smith
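As a sketch, the RDFa version of the same page could look like this. The h1, p and span elements are assumptions; the dc prefix declaration uses the Dublin Core namespace given in the next section:

```html
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
<head>
  <title>John Smith's Fish of the World</title>
</head>
<body>
  <!-- The visible heading doubles as the dc:title value -->
  <h1 property="dc:title">Fish of the World</h1>
  <!-- Only the name itself is marked up as the creator -->
  <p>by <span property="dc:creator">John Smith</span></p>
</body>
</html>
```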

What this does is declare that we are going to use the Dublin Core properties, and prefix them with dc:. It then attaches the Dublin Core properties title and creator to the relevant parts of the text. Of course, a major advantage of this is that the visible versions in the text don't get out of sync with the metadata versions.

Compact URIs

In the last example we had property="dc:title". This says "the property called title from the vocabulary identified by dc:". But we also said earlier that a property was kept as a URI. A form such as dc:title is called a Compact URI, or CURIE for short. The URI it represents is just the concatenation of the URI in the declaration of the prefix (in this case xmlns:dc="http://purl.org/dc/elements/1.1/") and whatever follows the colon. So in this case dc:title is a short form of the full URI http://purl.org/dc/elements/1.1/title. (You can now probably see why CURIEs are nice to have.)

When there is no prefix (as in something like rel="index"), a default prefix is used. For XHTML that default is http://www.w3.org/1999/xhtml/vocab#.

Using rel

Using the property attribute like this gives you an equivalent of the meta element, but within the text of your page. To get the equivalent of a link element, you use the rel attribute. For instance, pages often have a clickable "Next" link to take you to the next page:

Next

It can be expressed like this:
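A sketch, assuming the next page is a hypothetical page2.html:

```html
<!-- rel="next" records that page2.html follows this page -->
<a rel="next" href="page2.html">Next</a>
```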

Similarly,

Back

can be written
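Again as a sketch, with page1.html as a hypothetical previous page and prev assumed as the relation name:

```html
<!-- rel="prev" records that page1.html precedes this page -->
<a rel="prev" href="page1.html">Back</a>
```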

Another typical use for rel is to use it to point to the copyright or licensing information of a page. Instead of:

Copyright

You can write

Copyright
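Side by side, the two forms might look like this. The file name copyright.html is hypothetical, and license is assumed as the relation, as in the examples further on:

```html
<!-- Plain link: software learns nothing about what the target is -->
<a href="copyright.html">Copyright</a>

<!-- With rel: the target is stated to be this page's license -->
<a rel="license" href="copyright.html">Copyright</a>
```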

(By the way, it doesn't matter what order you put the href and rel in.)

Of course, you could already do this in HTML. What is new is that it is now defined how to interpret this as RDF, and, as you will later see, you can apply it to more than just <a> elements.

Talking about Other Documents

Most of the metadata in HTML only allows you to talk about the document itself, and in all the examples we have given so far, we have been giving metadata about the page in question. But you may want to be able to talk about other things than just the current document (and you will see more examples of this shortly).

For this you can use the about attribute to specify what it is the information applies to. For instance, suppose you link to some data:

Here is a plot of the data: Rainfall 1900-1999. The raw data is available.

and you want to include the licensing conditions of that data:

The data is available under these conditions.

then you can say this:

The data is available under <a about="rainfall.csv" rel="license" href="license.html">these conditions</a>.

If you use about on a container element, like a <p>, then the about applies to all the contained relations:

The data Rainfall 1900-1999 is the property of Data Be We, Inc and is available under these conditions.
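A sketch of such a paragraph, in which everything inside the p describes rainfall.csv. Marking the name of the data set with dc:title is an assumption, and the owner is left as plain text here:

```html
<p about="rainfall.csv">
  The data <span property="dc:title">Rainfall 1900-1999</span>
  is the property of Data Be We, Inc and is available under
  <!-- No about needed here: the subject is inherited from the p -->
  <a rel="license" href="license.html">these conditions</a>.
</p>
```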

Using URIs and CURIEs in the about attribute

Note that the about attribute contains a URI. It can point to anything on the Web:

The title of the RDFa specification is RDFa in XHTML: Syntax and Processing...

Occasionally you may want to use a CURIE instead of a URI in about (as you will see shortly), and so to distinguish a CURIE from a URI in those cases, you enclose the CURIE in square brackets. For instance, suppose you had defined xmlns:tr="http://www.w3.org/TR/", then you could write the above in the following way:

The title of the RDFa specification is RDFa in XHTML: Syntax and Processing...
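The two forms can be sketched as follows, assuming the specification is identified by http://www.w3.org/TR/rdfa-syntax and that dc:title is the property wanted; the em element is also an assumption:

```html
<!-- Full URI in about -->
The title of the RDFa specification is
<em about="http://www.w3.org/TR/rdfa-syntax"
    property="dc:title">RDFa in XHTML: Syntax and Processing</em>

<!-- The same subject as a CURIE, given xmlns:tr="http://www.w3.org/TR/" -->
The title of the RDFa specification is
<em about="[tr:rdfa-syntax]"
    property="dc:title">RDFa in XHTML: Syntax and Processing</em>
```

Both describe the same resource, since [tr:rdfa-syntax] expands to the prefix URI followed by rdfa-syntax.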

Talking about People, Places and Things

Up to now we have been talking about assigning properties to things with URIs. But there is a problem: not everything that you might want to talk about has a URI. The city of Amsterdam doesn't have a URI. Nor does a person, or an object like a car, or a concept like love. Of course, these things have pages about them, but that is different. It is important not to confuse a website about something with that thing itself.

To take an example to explain the difference, suppose we want to say that T.S. Eliot is the author of the poem The Waste Land. Well, we might do a search for the poem, and find http://en.wikipedia.org/wiki/The_Waste_Land. We might then be tempted to say:

T.S. Eliot
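The tempting markup would be something like this sketch (dc:creator and the span element are assumptions):

```html
<!-- The subject here is the Wikipedia page itself, not the poem -->
<span about="http://en.wikipedia.org/wiki/The_Waste_Land"
      property="dc:creator">T.S. Eliot</span>
```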

Unfortunately, this says that T.S. Eliot created the Wikipedia page, which is patently not true. So what do we do?

Well, RDFa has a notation that allows you to create a local name for something that doesn't have a URI (or that has a URI that you don't know), and say something about it anyway:
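A sketch of the notation, assuming foaf:isPrimaryTopicOf as the relation and a link element as the carrier:

```html
<!-- [_:TheWasteLand] is a local name for the poem itself -->
<link about="[_:TheWasteLand]" rel="foaf:isPrimaryTopicOf"
      href="http://en.wikipedia.org/wiki/The_Waste_Land"/>
```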

The "_:" is a reserved prefix for this notation. You can put any identifier after the colon. What this says is "There is something (which we shall call 'TheWasteLand') which is the primary topic of the page at http://en.wikipedia.org/wiki/The_Waste_Land."

Now that we have uniquely identified the poem we can record that its creator was 'T.S. Eliot':

T.S. Eliot
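As a sketch, with dc:creator assumed as the property, the statement is now about the poem and not the page:

```html
<!-- Subject is the local name declared for the poem -->
<span about="[_:TheWasteLand]" property="dc:creator">T.S. Eliot</span>
```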

(By the way, the foaf properties are identified by xmlns:foaf="http://xmlns.com/foaf/0.1/".)

In this way we can mint all sorts of names for people, places, organizations and other things that haven't got URIs, and uniquely identify them. A person:

A place:

An organization:

And then we can use those names in order to talk about them:

W3C
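The declarations and one use of them might be sketched like this. All of the local names, the example.com URL, and the choice of foaf:isPrimaryTopicOf and foaf:name are assumptions made for illustration:

```html
<!-- A person, identified by a (hypothetical) home page -->
<link about="[_:StevenPemberton]" rel="foaf:isPrimaryTopicOf"
      href="http://www.example.com/steven/"/>

<!-- A place, identified by a page about it -->
<link about="[_:Amsterdam]" rel="foaf:isPrimaryTopicOf"
      href="http://en.wikipedia.org/wiki/Amsterdam"/>

<!-- An organization, identified by its home page -->
<link about="[_:W3C]" rel="foaf:isPrimaryTopicOf"
      href="http://www.w3.org/"/>

<!-- Using one of the names as the subject of a property -->
<span about="[_:W3C]" property="foaf:name">W3C</span>
```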

These special CURIEs beginning "_:" are called blank nodes or _bnodes_. Note that they are local to a document, so you have to redeclare them in each document in which you use them.

By the way, the important thing with blank nodes is to uniquely identify them by some means if you can. foaf:isPrimaryTopicOf is one way, but any property that is unique will work. For instance:
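As a sketch, assuming foaf:mbox as the property and the address used elsewhere in this document:

```html
<!-- A mailbox identifies exactly one person -->
<link about="[_:StevenPemberton]" rel="foaf:mbox"
      href="mailto:steven@w3.org"/>
```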

is just as good, since there is only one person who has that email address, and so we have uniquely identified that person.

Note that since an empty URI "" means 'the current page', on your own home page you can add code like
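A sketch of such a declaration, again assuming a link element as the carrier:

```html
<!-- href="" resolves to the current page -->
<link about="[_:StevenPemberton]" rel="foaf:isPrimaryTopicOf" href=""/>
```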

which says "The thing we call StevenPemberton is the primary topic of this page".

Overriding the Content

Sometimes although the content contains information that needs to be tagged, it is not always in the form you need it. For instance:

Amsterdam is located at latitude 52°22'23"N and longitude 4°53'32"E

While there are properties for recording latitude and longitude, they expect the values to be decimal numbers. Well we can write this:

Amsterdam is located at latitude 52°22'23"N and longitude 4°53'32"E
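A sketch of the markup, with the decimal values obtained by converting degrees, minutes and seconds (52 + 22/60 + 23/3600 ≈ 52.373056 and 4 + 53/60 + 32/3600 ≈ 4.892222). The local name _:Amsterdam and the span elements are assumptions:

```html
<p about="[_:Amsterdam]">
  Amsterdam is located at latitude
  <!-- Readers see the text; software reads the content attribute -->
  <span property="geo:lat" content="52.373056">52°22'23"N</span>
  and longitude
  <span property="geo:long" content="4.892222">4°53'32"E</span>
</p>
```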

This is of course the same content attribute you know from the meta element. Its value overrides whatever is in the content of the element.

(The geo properties are at xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#".)

Swapping subject and object

A lesser-used but nevertheless useful relationship in HTML is the _reverse relationship_ rev. It is like rel, but reverses the direction of the relationship. For instance, if a document doc.html is indexed by the page index.html, then doc.html can record this fact with the link:
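In doc.html, a sketch of that link:

```html
<!-- "The index of this page is index.html" -->
<link rel="index" href="index.html"/>
```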

However, index.html can also record the relationship:
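In index.html, correspondingly:

```html
<!-- Same relation, read from the other end -->
<link rev="index" href="doc.html"/>
```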

which says "this page is the index for doc.html".

You can use rev similarly in RDFa. All it does is swap the subject (the 'about') with the object (the 'href'). For instance, suppose we have a set of data about a person:

Name: Steven Pemberton Mail: steven@w3.org

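Now, foaf has a property depicts that says that a particular image is a picture of some thing, such as a person. (The similarly named foaf:img runs the other way, from a person to an image.) With depicts the relationship is from the picture to the person. What we would like to say is: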

except that at the moment we are talking about the person, and not the image. So if we want to add this information to the block above, we just reverse the relationship with rev:

Name: Steven Pemberton Mail: steven@w3.org Mugshot: Photo
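A sketch of the whole block. It uses foaf:depicts, which FOAF defines as running from an image to the thing pictured, so rev turns it into a statement made while the person is the subject. The div and span elements, foaf:name, foaf:mbox and the file name Steven.jpg are assumptions:

```html
<div about="[_:StevenPemberton]">
  Name: <span property="foaf:name">Steven Pemberton</span>
  Mail: <a rel="foaf:mbox" href="mailto:steven@w3.org">steven@w3.org</a>
  <!-- rev swaps the roles: Steven.jpg depicts the person -->
  Mugshot: <a rev="foaf:depicts" href="Steven.jpg">Photo</a>
</div>
```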

Note that you can have (if you want) both rel and rev on an element:
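One possible sketch, with page2.html as a hypothetical file name:

```html
<!-- This page's next is page2.html, and page2.html's prev is this page -->
<a rel="next" rev="prev" href="page2.html">Next</a>
```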

(Not that this example gives you very much in terms of extra information!)

Advanced topics

You now know enough to use RDFa for day-to-day use, but there are a few extras you might find useful.

The resource attribute

Alongside the href attribute, there is also a resource attribute with the same purpose, but usable when you don't want the link to be clickable, or you want to use a CURIE (since you can't use a CURIE in href):

The photo is entitled Steven in London

Note in passing that you may have more than one relation on an element. So we could also say:

The photo is entitled <em about="Steven.jpg" rel="foaf:depicts" resource="[_:StevenPemberton]" property="dc:title">Steven in London</em>

Packaging a group of relations

Often a group of properties together make up a whole. For instance an event can have a title, a description, a location, and a start and end date. If you want to say that a section of markup contains such a group of properties, you can use the typeof attribute. For instance, to mark up a conference:

WWW 2009

18th International World Wide Web Conference

To be held from 20th April 2009 until 24th April, in Madrid, Spain.
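A sketch of the conference markup, assuming the RDF Calendar vocabulary at http://www.w3.org/2002/12/cal/icaltzd# with its Vevent type and summary, description, dtstart, dtend and location properties. The element choices are also assumptions:

```html
<div xmlns:cal="http://www.w3.org/2002/12/cal/icaltzd#"
     typeof="cal:Vevent">
  <h1 property="cal:summary">WWW 2009</h1>
  <h2 property="cal:description">18th International World Wide Web Conference</h2>
  <p>To be held from
    <span property="cal:dtstart" content="2009-04-20">20th April 2009</span>
    until
    <!-- iCalendar end dates are exclusive, hence the 25th -->
    <span property="cal:dtend" content="2009-04-25">24th April</span>,
    in <span property="cal:location">Madrid, Spain</span>.</p>
</div>
```

Every property inside the div is attached to the one event that typeof introduces, which is what groups them into a whole.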

or a TV program:

Have I Got Old News For You

BBC2

Saturday 28 June, 9pm-9.30pm

Team captains Paul Merton and Ian Hislop are joined by returning guest host Jeremy Clarkson and panellists Danny Baker and Germaine Greer for the topical news quiz. [S]

Note the use of content here to get the dates and times into a machine-readable format.

Data Types

Occasionally you may want to specify that a particular property is of a certain data type. The datatype attribute is precisely for this purpose:

24th April
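As a sketch, with dc:date assumed as the property and the year taken from the conference example:

```html
<!-- datatype says the content value is a date, not just a string -->
<span property="dc:date" content="2009-04-24"
      datatype="xsd:date">24th April</span>
```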

This would need an xmlns:xsd="http://www.w3.org/2001/XMLSchema#" declaration (the trailing # matters, since a CURIE is expanded by simple concatenation).

Validating

If you want to make sure your pages validate, they should have the following at the top of the document (before the <html>).
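Assuming the page is written as XHTML+RDFa 1.0, the document type declaration is:

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
```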

The validator at http://validator.w3.org/ will check your pages.

Looking at the results

There are a number of online services that will extract all the properties from an RDFa-enabled page and tell you what they are, for instance the RDFa Distiller at http://www.w3.org/2007/08/pyRdfa/.

Summary of Attributes

about

Specifies the subject of a relationship. If not given, then the subject is the current document.

rel

Defines a relation between the subject and a URL given by either href or resource. The subject is either specified by the closest about or src attribute or, if there is none, is the current document.

rev

The same as the rel attribute, except that subject and object are reversed.

property

Defines a relationship between the subject and either a string (if the content attribute is present) or a piece of markup otherwise (the content of the element that the property attribute is on).

content

Specifies a string to use as an object for the property attribute.

href

Specifies an object URI for the rev and rel attributes. Takes precedence over the resource attribute.

resource

Specifies an object URI for the rev and rel attributes if href is not present.

src

Specifies the subject of a relationship.

datatype

Specifies a datatype of the object of the property attribute (either in the content attribute, or the content of the element that the datatype attribute is on). By default, data in the content attribute is of type string, and data in the content of an element has type rdf:XMLLiteral. If datatype="" is used, then for the RDF the element content is stripped of markup, and is of type string.

typeof

Creates a blank node, which becomes the subject, and asserts that the current element contains relationships that match the given RDF type.

Examples

There are many vocabularies available across the web (called _taxonomies_ by some), and more are being created all the time. Here is a selection:

See the RDFa Wiki list of vocabularies and RDFa examples in the wild for some more.

Further Reading

RDFa Specification - not written for beginners, and therefore hard going, but the final arbiter on RDFa

RDFa Primer - another introduction to RDFa

rdfa.info - news and information about developments.

RDFa Wiki - community meeting place for RDFa.