Issue 17624: Confusing TypeError in urllib.urlopen

When urllib.urlopen is called with a string containing the NULL ('\x00') character, a TypeError exception is raised, as in the following example:

urllib.urlopen('\x00\x00\x00')

[...]
  File "/home/bucur/onion/python-bin/lib/python2.7/urllib.py", line 86, in urlopen
    return opener.open(url)
  File "/home/bucur/onion/python-bin/lib/python2.7/urllib.py", line 207, in open
    return getattr(self, name)(url)
  File "/home/bucur/onion/python-bin/lib/python2.7/urllib.py", line 462, in open_file
    return self.open_local_file(url)
  File "/home/bucur/onion/python-bin/lib/python2.7/urllib.py", line 474, in open_local_file
    stats = os.stat(localname)
TypeError: must be encoded string without NULL bytes, not str

This exception is confusing, since apparently the right type (a string) is passed to the function. Since this behavior cannot change, it would be good to mention this exception in the function documentation.

I can imagine code that composes a URL based on user-supplied input and passes it to urlopen crashing if it doesn't properly sanitize the URL and/or doesn't catch TypeError.
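To make the concern concrete, here is a minimal sketch of the kind of check such code could run before calling urlopen. It is written for Python 3 (urllib.parse), and the helper name check_user_url and the ALLOWED_SCHEMES set are hypothetical names chosen for illustration, not part of urllib. It rejects NUL bytes and anything that is not an absolute http/https URL, so the input never reaches the local-file code path.

```python
from urllib.parse import urlsplit

# Assumption: the application only ever needs to fetch http/https URLs.
ALLOWED_SCHEMES = {"http", "https"}


def check_user_url(url):
    """Return url unchanged if it passes basic checks, else raise ValueError.

    Hypothetical helper for illustration; not part of the urllib API.
    """
    if not isinstance(url, str):
        raise ValueError("URL must be a string, got %s" % type(url).__name__)
    if "\x00" in url:
        # The input from this report ('\x00\x00\x00') is rejected here.
        raise ValueError("URL contains a NUL byte: %r" % (url,))
    parts = urlsplit(url)
    if parts.scheme.lower() not in ALLOWED_SCHEMES:
        # Covers both missing schemes (bare paths) and schemes such as file:.
        raise ValueError("unsupported URL scheme in %r" % (url,))
    if not parts.netloc:
        raise ValueError("URL has no host: %r" % (url,))
    return url
```

A caller would then write something like urlopen(check_user_url(user_input)) and handle ValueError in one place, rather than relying on whichever exception urlopen happens to raise.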

In Python 3 the equivalent urllib.request.urlopen call produces:

ValueError: unknown url type:

So this is effectively already fixed (although that error message should be doing a repr on the value, so I fixed that).
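For anyone who wants to check the Python 3 behavior locally, the following small snippet should be enough. It needs no network access, because the error is raised while the URL is being parsed, before any connection is attempted. The exact message text may differ between Python 3 versions (for example, whether the value is shown via repr), so the snippet only prints it.

```python
import urllib.request

try:
    urllib.request.urlopen("\x00\x00\x00")
except ValueError as exc:
    # Raised during URL parsing: the string has no "scheme:" prefix.
    message = str(exc)
    print(message)
```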

We don't in general document every exception that might be raised by a function. Here the TypeError is coming from treating the url as a local filename. I don't think it is appropriate to document all the errors that can arise from treating the URL as a filename in the urllib docs, so I don't believe any changes should be made here. I've added the 'doc' component, so if someone from the doc team disagrees with me they can reopen the issue.

As for your specific concern, if the application is composing the URL from user input in such a way that the URL gets treated as a local filename, it has bigger problems (as in, security problems) than crashing because of a TypeError. (This is arguably a bug in urllib, one that appears to have been fixed in Python 3.)