HTTP 308 handling does not correctly change Host header in request · Issue #2382 · RDFLib/rdflib

RDFLib handles HTTP 308 responses with this code:

def _urlopen(req: Request) -> Any:
    try:
        return urlopen(req)
    except HTTPError as ex:
        # 308 (Permanent Redirect) is not supported by current python version(s)
        # See https://bugs.python.org/issue40321
        # This custom error handling should be removed once all
        # supported versions of python support 308.
        if ex.code == 308:
            # type error: Incompatible types in assignment (expression has type "Optional[Any]", variable has type "str")
            req.full_url = ex.headers.get("Location")  # type: ignore[assignment]
            return _urlopen(req)
        else:
            raise

This is because Python's urllib did not support 308 handling [ref] before Python 3.11 [ref]:

Python 3.11.0 alpha 2

...
bpo-40321: Adds support for HTTP 308 redirects to urllib. See RFC 7538 for details. Patch by Jochem Schulenklopper.

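However, RDFLib's 308 handling does not account for a change of host: it retries with the same Request object, only changing its URL, so when the Location header points to a different host the retried request still carries the old Host header.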
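The stale header can be reproduced without any network access. This is a minimal sketch using only the standard library: urllib's HTTP handler adds a Host header to a Request when it prepares it for sending (the opener does this via `http_request`), and only if the header is not already present. Calling `http_request` directly is for illustration only; neither RDFLib nor the script below does this.

```python
from urllib.request import HTTPHandler, Request, build_opener

# Registering the handler with an opener sets handler.parent, which the
# handler's request preprocessing reads its default headers from.
handler = HTTPHandler()
build_opener(handler)

req = Request("http://www.w3.org/ns/adms.ttl")

# The preprocessing urllib applies before sending: it stores
# "Host: www.w3.org" on req as an unredirected header.
handler.http_request(req)
print(req.get_header("Host"))  # www.w3.org

# What RDFLib's 308 branch does: point the same Request at the new URL.
req.full_url = "https://uri.semic.eu/w3c/ns/adms.ttl"
print(req.host)  # uri.semic.eu

# Host is only added when missing, so the old value is never refreshed.
print(req.get_header("Host"))  # www.w3.org
```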

The following script handles HTTP 308 errors the same way RDFLib does, with debug output enabled on the HTTP and HTTPS handlers:

from typing import Any
from urllib.error import HTTPError
from urllib.request import HTTPHandler, HTTPSHandler, Request, build_opener
from urllib.response import addinfourl

opener = build_opener(HTTPHandler(debuglevel=1), HTTPSHandler(debuglevel=1))


def _opener(req: Request) -> Any:
    try:
        return opener.open(req)
    except HTTPError as ex:
        if ex.code == 308:
            req.full_url = ex.headers.get("Location")
            return _opener(req)
        else:
            raise


response: addinfourl = _opener(Request("http://www.w3.org/ns/adms.ttl"))

Running it with Python 3.10 gives this output:

$ VIRTUAL_ENV=.venv/py310 poetry run python var/check.py
[... elided ...]
send: b'GET /w3c/ns/adms.ttl HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: uri.semic.eu\r\nUser-Agent: Python-urllib/3.10\r\nConnection: close\r\n\r\n'
reply: 'HTTP/1.1 308 Permanent Redirect\r\n'
header: Connection: close
header: Location: https://uri.semic.eu/w3c/ns/adms.ttl
header: Server: Caddy
header: Date: Fri, 12 May 2023 09:12:53 GMT
header: Content-Length: 0
send: b'GET /w3c/ns/adms.ttl HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: www.w3.org\r\nUser-Agent: Python-urllib/3.10\r\nConnection: close\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Server: Caddy
header: Date: Fri, 12 May 2023 09:12:54 GMT
header: Content-Length: 0
header: Connection: close

Running it with Python 3.11, where Python's urllib handles 308, gives this output for the same redirect:

$ VIRTUAL_ENV=.venv/py311 poetry run python var/check.py
[... elided ...]
send: b'GET /w3c/ns/adms.ttl HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: uri.semic.eu\r\nUser-Agent: Python-urllib/3.11\r\nConnection: close\r\n\r\n'
reply: 'HTTP/1.1 308 Permanent Redirect\r\n'
header: Connection: close
header: Location: https://uri.semic.eu/w3c/ns/adms.ttl
header: Server: Caddy
header: Date: Fri, 12 May 2023 09:13:51 GMT
header: Content-Length: 0
send: b'GET /w3c/ns/adms.ttl HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: uri.semic.eu\r\nUser-Agent: Python-urllib/3.11\r\nConnection: close\r\n\r\n'
reply: 'HTTP/1.1 302 Found\r\n'
header: Access-Control-Allow-Headers: Origin, X-Requested-With, Content-Type, Accept
header: Access-Control-Allow-Origin: *
header: Content-Length: 145
header: Content-Type: text/html
header: Date: Fri, 12 May 2023 09:13:52 GMT
header: Location: https://raw.githubusercontent.com/SEMICeu//uri.semic.eu-puris/main/releases/w3c/ns/adms.ttl
header: Server: Caddy
header: Server: nginx/1.23.2
header: X-Request-Id: 82b477d4f02242d9813ecaecc615dd92
header: Connection: close
[... elided ...]

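The reason for the difference is that RDFLib's 308 handling sends Host: www.w3.org when being redirected to https://uri.semic.eu/w3c/ns/adms.ttl, which is wrong: the Host header should name the redirect target's host, uri.semic.eu, as it does with Python 3.11. With the wrong Host header, the server at uri.semic.eu answers with an empty 200 OK response instead of the 302 redirect to the actual file. The stale value comes from urllib adding the Host header to the Request object when it is first sent, and only when the header is missing, so changing full_url afterwards does not update it.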