Issue 1599329: urllib(2) should allow automatic decoding by charset
Created on 2006-11-19 19:47 by edemaine, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Messages (3) | ||
---|---|---|
msg61263 - (view) | Author: Erik Demaine (edemaine) | Date: 2006-11-19 19:47 |
Currently, urllib.urlopen(...).read() returns a string, not a unicode object. Ditto for urllib2. No attempt is made to decode the data using the charset encoding specified in the header ....info()['Content-Type']. Is it fair to assume that, in Python 3K, urllib....read() will return (Unicode) strings instead of bytes, automatically decoding according to the charset? Do you think we could expose this futuristic functionality in Python 2? I doubt we could change read() without breaking a lot of existing code that already does this decoding (e.g., http://zesty.ca/python/scrape.py), but perhaps a 'uread()' method could return a unicode object instead of a string. | ||
msg61264 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2006-11-22 06:57 |
I don't think urlopen(...).read() should return strings in Py3k; instead, it should return bytes. In general, resources retrieved are byte sequences (many are application/octet-stream). Making the return type depend on the resource being fetched is also unintuitive. It might be reasonable to have the user specify "binary" or "text" on urlopen() (just like regular open()). | ||
msg81429 - (view) | Author: Daniel Diniz (ajaksu2) * | Date: 2009-02-09 00:29 |
There's an attempt to implement this behavior, for 3.1, in issue 4733. Maybe having a parallel in 2.7 could help the 2.x to 3.x transition for some users? |
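
For illustration only, here is a rough Python 3 sketch of the two ideas discussed above: a 'uread()'-style helper that decodes according to the charset declared in Content-Type, and an explicit "binary"/"text" choice on the caller's side. The names `decode_body`, `uread` and `fetch` are hypothetical, as is the UTF-8 fallback used when no charset is declared; this is not the implementation proposed in issue 4733 and not an existing urllib API. It relies only on `urllib.request.urlopen` and on `get_content_charset()`, which response headers inherit from `email.message.Message`.

```python
from email.message import Message
from urllib.request import urlopen


def decode_body(raw, headers, default="utf-8"):
    """Decode raw bytes using the charset declared in Content-Type.

    `headers` is an email.message.Message (http.client.HTTPMessage is a
    subclass of it). The `default` fallback is an assumption of this sketch.
    An unknown charset name would raise LookupError.
    """
    charset = headers.get_content_charset() or default
    return raw.decode(charset, errors="replace")


def uread(response, default="utf-8"):
    """Hypothetical stand-in for the proposed uread(): returns str, not bytes."""
    return decode_body(response.read(), response.headers, default)


def fetch(url, mode="binary"):
    """Hypothetical wrapper: the caller picks bytes or text explicitly."""
    if mode not in ("binary", "text"):
        raise ValueError("mode must be 'binary' or 'text'")
    with urlopen(url) as response:
        return uread(response) if mode == "text" else response.read()


if __name__ == "__main__":
    # Offline demonstration with hand-built headers instead of a real request.
    headers = Message()
    headers["Content-Type"] = "text/html; charset=iso-8859-1"
    print(decode_body(b"caf\xe9", headers))  # prints: café
```

Keeping `read()` as bytes and making the text path opt-in follows the concern raised in msg61264 that the return type should not depend on the resource being fetched.
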
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:21 | admin | set | github: 44254 |
2010-11-18 02:33:31 | eric.araujo | unlink | issue4733 dependencies |
2010-11-18 02:26:19 | eric.araujo | set | status: open -> closed; dependencies: - Add a "decode to declared encoding" version of urlopen to urllib; superseder: Add a "decode to declared encoding" version of urlopen to urllib; versions: + Python 3.2, - Python 3.1, Python 2.7; nosy: + eric.araujo; resolution: duplicate; stage: test needed -> resolved |
2010-01-27 23:47:10 | mastrodomenico | set | nosy: + mastrodomenico |
2009-04-22 17:26:15 | ajaksu2 | set | keywords: + easy |
2009-02-13 01:20:26 | ajaksu2 | set | nosy: + jjlee |
2009-02-12 18:24:23 | ajaksu2 | link | issue4733 dependencies |
2009-02-12 18:23:53 | ajaksu2 | set | nosy: + orsenthil; dependencies: + Add a "decode to declared encoding" version of urlopen to urllib; stage: test needed |
2009-02-09 00:29:25 | ajaksu2 | set | nosy: + ajaksu2; messages: +; versions: + Python 3.1, Python 2.7 |
2006-11-19 19:47:30 | edemaine | create |