Relative URIs are resolved incorrectly after redirects · Issue #130 · RDFLib/rdflib
vfaronov, 2010-09-10T23:23:28.000Z
What steps will reproduce the problem?
- Prepare a resource http://example.org/foo serving up an RDF description that contains relative URIs, for example <#frag1>.
- Prepare a resource http://example.org/bar that redirects (for example, HTTP 301) to http://example.org/foo.
- Use RDFLib's Graph.parse() to parse http://example.org/bar.
What is the expected output? What do you see instead?
I expect the "real" URI http://example.org/foo to be used as the base URI, giving absolute URIs of the form http://example.org/foo#frag1. Instead, RDFLib uses the original requested URI http://example.org/bar as the base, giving http://example.org/bar#frag1.
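The difference comes down to which base the relative reference is joined against. Here is a minimal sketch using Python 3's `urllib.parse.urljoin`. It is for illustration only, and RDFLib's own resolution path may differ:

```python
from urllib.parse import urljoin

requested = "http://example.org/bar"  # URI passed to Graph.parse()
final = "http://example.org/foo"      # where the 301 redirect lands
relative = "#frag1"                   # relative reference inside the document

observed = urljoin(requested, relative)  # resolved against the requested URI
expected = urljoin(final, relative)      # resolved against the final URI

print(observed)  # http://example.org/bar#frag1
print(expected)  # http://example.org/foo#frag1
```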
What version of the product are you using? On what operating system?
RDFLib trunk (r1895) on GNU/Linux.
Please provide any additional information below.
RFC 3986, Uniform Resource Identifier (URI): Generic Syntax, section 5.1.3:
http://tools.ietf.org/html/rfc3986#section-5.1.3
"Note that if the retrieval was the result of a redirected request, the last URI used (i.e., the URI that resulted in the actual retrieval of the representation) is the base URI."
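The following self-contained sketch shows the behaviour the RFC asks for, using only Python 3's standard library. The patch below targets Python 2, but it relies on the same `geturl()` call on the opened response. The local server, the `/bar` and `/foo` paths, and the Turtle body are hypothetical stand-ins for the example.org resources above. `fetch_with_base` is an illustrative helper, not an RDFLib function.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urljoin
from urllib.request import urlopen


class RedirectHandler(BaseHTTPRequestHandler):
    """Serves /foo and answers /bar with a 301 pointing at /foo."""

    def do_GET(self):
        if self.path == "/bar":
            self.send_response(301)
            self.send_header("Location", "/foo")
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path == "/foo":
            body = b'<#frag1> <#p> "x" .\n'
            self.send_response(200)
            self.send_header("Content-Type", "text/turtle")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_with_base(url):
    """Return (base_uri, body), where base_uri is the URL after redirects."""
    with urlopen(url) as response:
        # geturl() reports the URL that actually produced this response,
        # which RFC 3986 section 5.1.3 says is the base URI.
        return response.geturl(), response.read()


server = ThreadingHTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
try:
    requested = f"http://127.0.0.1:{port}/bar"
    base, body = fetch_with_base(requested)
    print(base)                     # http://127.0.0.1:<port>/foo
    print(urljoin(base, "#frag1"))  # http://127.0.0.1:<port>/foo#frag1
finally:
    server.shutdown()
    server.server_close()
```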
Comment 1 by vfaronov
For a working example, see <http://linked-data.ru/example>, which 301s to <http://linked-data.ru/example/> (RDFa).
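The trailing slash in that redirect makes the example a useful test: relative references resolve differently against the two URIs. Below is a short sketch with `urllib.parse.urljoin`. The references `#me` and `about` are hypothetical and are not taken from the actual RDFa page.

```python
from urllib.parse import urljoin

before = "http://linked-data.ru/example"  # URI as requested
after = "http://linked-data.ru/example/"  # URI after the 301

# "#me" and "about" are hypothetical relative references.
resolved = {
    ref: (urljoin(before, ref), urljoin(after, ref))
    for ref in ("#me", "about")
}

for ref, (against_requested, against_final) in resolved.items():
    print(f"{ref!r}: {against_requested} vs {against_final}")
# '#me': http://linked-data.ru/example#me vs http://linked-data.ru/example/#me
# 'about': http://linked-data.ru/about vs http://linked-data.ru/example/about
```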
Comment 2 by vfaronov
First attempt at a patch.
This changes the base URI resolution logic a bit, and I'm not 100% sure it doesn't break anything.
```diff
Index: rdflib/parser.py
--- rdflib/parser.py	(revision 1895)
+++ rdflib/parser.py	(working copy)
@@ -94,9 +94,11 @@
         except HTTPError, e:
             # TODO:
             raise Exception('"%s" while trying to open "%s"' % (e, self.url))
+        self.url = file.geturl()    # in case redirections took place
         self.content_type = file.info().get('content-type')
         self.content_type = self.content_type.split(";", 1)[0]
         self.setByteStream(file)
+        self.setPublicId(self.url)
         # TODO: self.setEncoding(encoding)

     def __repr__(self):
@@ -147,6 +149,8 @@
             else:
                 raise Exception("Unexpected type '%s' for source '%s'" % (type(source), source))

+    absolute_location = None
+
     if location is not None:
         base = urljoin("file:", "%s/" % pathname2url(os.getcwd()))
         absolute_location = URIRef(location, base=base).defrag()
@@ -155,7 +159,6 @@
             file = __builtin__.file(filename, "rb")
         else:
             input_source = URLInputSource(absolute_location, format)
-        publicID = publicID or absolute_location

     if file is not None:
         input_source = FileInputSource(file)
@@ -168,13 +171,11 @@
     if input_source is None:
         raise Exception("could not create InputSource")
     else:
-        if publicID:
+        if publicID is not None:
             input_source.setPublicId(publicID)

-        # TODO: what motivated this bit?
-        id = input_source.getPublicId()
-        if id is None:
-            input_source.setPublicId("")
+        elif input_source.getPublicId() is None:
+            input_source.setPublicId(absolute_location or "")
         return input_source
```
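To make the intended precedence of the `publicID` changes easier to review, here is a hypothetical model of the old and new logic in `create_input_source`. `FakeInputSource`, `old_logic` and `new_logic` are illustrative names only. The model assumes `location` was given and that the patched `URLInputSource` has already set the public ID to the post-redirect URL.

```python
class FakeInputSource:
    """Hypothetical stand-in for rdflib's InputSource (public ID only)."""

    def __init__(self, public_id=None):
        self._public_id = public_id

    def getPublicId(self):
        return self._public_id

    def setPublicId(self, public_id):
        self._public_id = public_id


def old_logic(input_source, publicID, absolute_location):
    """Pre-patch: the requested location overrides any ID already set."""
    publicID = publicID or absolute_location
    if publicID:
        input_source.setPublicId(publicID)
    if input_source.getPublicId() is None:
        input_source.setPublicId("")
    return input_source.getPublicId()


def new_logic(input_source, publicID, absolute_location):
    """Post-patch: explicit ID, else existing ID, else requested location."""
    if publicID is not None:
        input_source.setPublicId(publicID)
    elif input_source.getPublicId() is None:
        input_source.setPublicId(absolute_location or "")
    return input_source.getPublicId()


requested = "http://example.org/bar"
final = "http://example.org/foo"  # as set by the patched URLInputSource

print(old_logic(FakeInputSource(final), None, requested))      # http://example.org/bar
print(new_logic(FakeInputSource(final), None, requested))      # http://example.org/foo
print(new_logic(FakeInputSource(), None, requested))           # http://example.org/bar
print(repr(new_logic(FakeInputSource(final), "", requested)))  # ''
```

In this model, two behaviours fall out of the patch. Dropping `publicID = publicID or absolute_location` stops the requested URI from overwriting the redirect target. Switching to `is not None` means an explicit `publicID=""` is honoured instead of being treated as missing.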