escaping of URI references in N-triples from Dave Beckett on 2003-05-13 (www-rdf-comments@w3.org from April to June 2003)

On Thu, 08 May 2003 15:28:15 -0400 Martin Duerst <duerst@w3.org> wrote:

Dear RDF specialists,

[This is currently a personal comment. I'll ask the I18N WG to look at it on their teleconf next week.]

Emmanuel and I just discovered a problem in the RDF spec, in the definition of N-triples at http://www.w3.org/TR/rdf-testcases/#sec-uri-encoding

The 23 January 2003 last call WD, http://www.w3.org/TR/2003/WD-rdf-testcases-20030123/#sec-uri-encoding

This says:

Disallowed characters are represented in UTF-8 and then encoded using the %HH format, where HH is the byte value expressed using hexadecimal notation.

Characters above the US-ASCII range are made available by the \u or \U escapes as described in section Strings for ranges [#x80-#xFFFF] and [#x10000-#x10FFFF] respectively.

So if I have <http://example.org/鈴木>, ...

an IRI

... what's the correct representation of this in N-triples? Is it <http://example.org/\u9234\u6728>? Or is it <http://example.org/%E9%88%B4%E6%9C%A8>? The spec currently seems to allow both, which clearly goes against the purpose of N-triples as a format for testing.
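To make the two candidate forms concrete, here is a minimal Python sketch that produces both from the same string. The function names are hypothetical, and the choice of which characters to leave unencoded in the %HH form (`:` and `/`) is an assumption made only for this example, not something the spec text above states.

```python
from urllib.parse import quote


def escape_ntriples(text):
    """Replace each non-ASCII character with a \\uXXXX or \\UXXXXXXXX escape."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                # US-ASCII passes through unchanged
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")    # range #x80-#xFFFF
        else:
            out.append(f"\\U{cp:08X}")    # range #x10000-#x10FFFF
    return "".join(out)


def percent_encode_utf8(text):
    """Encode non-ASCII characters as UTF-8 bytes in %HH form."""
    # Assumption: keep ':' and '/' as-is so the URI structure survives.
    return quote(text, safe=":/")


iri = "http://example.org/\u9234\u6728"
print(escape_ntriples(iri))
print(percent_encode_utf8(iri))
```

Both outputs are pure US-ASCII, which is why the spec text as quoted appears to admit either one.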

RDF uses its own definition of RDF URI References, which should have been linked from here rather than the IRI definition in the ongoing Charmod draft work. It might be easier to say less, so that whatever characters your RDF URI reference contains, this document just tells you how to encode them.

I think that removing the first paragraph in the quote above should be sufficient, along with adding a reference to the RDF Concepts WD definitions. Would that help?

In line with the IRI spec, which says that conversion to URIs should be done as late as possible, I strongly suggest using only the http://example.org/\u9234\u6728 form. This should also make it possible to streamline the description of escaping, which should become the same for Strings and for URIs. There should also be a statement saying that the escaping is needed for N-triples, but not for N3, ...

I won't be discussing N3 in this document; it doesn't define that changing research language. For example, I recall that N3 has changed from ASCII to UTF-8 since N-Triples was designed.

...and there should be some I18N component in the example to show the points. ...

This can be fixed by adding some encoded RDF URI examples, such as <http://example.org/\u9234\u6728>, to the test file http://www.w3.org/2000/10/rdf-tests/rdfcore/ntriples/test.nt. I will do this if that addresses the problem sufficiently.
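For anyone consuming such a test entry, a rough sketch of turning the escaped form back into the RDF URI reference might look like the following. The function name is hypothetical. The sketch assumes uppercase hex digits in the escapes and deliberately ignores the other string escapes (such as \\ and \n), so it is not a complete N-Triples unescaper.

```python
import re

# Assumption: escapes use exactly 4 (\u) or 8 (\U) uppercase hex digits.
_ESCAPE = re.compile(r"\\u([0-9A-F]{4})|\\U([0-9A-F]{8})")


def unescape_ntriples(text):
    """Decode \\uXXXX and \\UXXXXXXXX escapes into the characters they name."""
    def replace(match):
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))
    return _ESCAPE.sub(replace, text)


escaped = "<http://example.org/\\u9234\\u6728>"
print(unescape_ntriples(escaped))
```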

... Using the \u form is also robust to potential changes in the IRI spec. Some care is needed about characters in the ASCII range that are not allowed in URIs.

I really don't want to give the details of URIs or IRIs; people can look up those specs if they want to know that, and it shouldn't be duplicated here.

The RDF validator currently does a third thing, namely it does not use any escaping at all:

The original RDF/XML document

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.w3.org/鈴木">
    <dc:title>鈴木太郎</dc:title>
  </rdf:Description>
</rdf:RDF>

Triples of the Data Model in N-Triples Format (Sub, Pred, Obj)

<http://www.w3.org/鈴木> <http://purl.org/dc/elements/1.1/title> "\u9234\u6728\u592A\u90CE" .

That's illegal N-Triples (7-bit US-ASCII); no character above 126 is allowed. Which reminds me, I made a validator for N-Triples:

Redland N-Triples Validator http://www.redland.opensource.ac.uk/ntriples/
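As an illustration of the character-range rule only (this is not the Redland validator's code, and the function name is hypothetical), a minimal Python sketch that reports characters outside US-ASCII 32 to 126 in an N-Triples line. Allowing tab as whitespace and ignoring the line terminator are assumptions of this sketch.

```python
def find_illegal_chars(line):
    """Return (index, char) pairs for characters outside US-ASCII 0x20-0x7E."""
    body = line.rstrip("\r\n")  # the end-of-line marker is not checked here
    return [
        (i, ch)
        for i, ch in enumerate(body)
        if ch != "\t" and not (0x20 <= ord(ch) <= 0x7E)
    ]


# The subject below contains two unescaped non-ASCII characters.
raw = ('<http://www.w3.org/\u9234\u6728> '
       '<http://purl.org/dc/elements/1.1/title> '
       '"\\u9234\\u6728\\u592A\\u90CE" .')
for index, ch in find_illegal_chars(raw):
    print(f"illegal character U+{ord(ch):04X} at offset {index}")
```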

Dave

Received on Tuesday, 13 May 2003 05:24:38 UTC