[Python-Dev] urllib unicode handling
Kristján Valur Jónsson kristjan at ccpgames.com
Wed May 7 14:00:51 CEST 2008
-----Original Message-----
From: python-dev-bounces+kristjan=ccpgames.com at python.org [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org] On Behalf Of Jeroen Ruigrok van der Werven
Sent: Wednesday, May 07, 2008 05:20
To: Tom Pinckney
Cc: python-dev at python.org
Subject: Re: [Python-Dev] urllib unicode handling

-On [20080507 04:06], Tom Pinckney (thomaspinckney3 at gmail.com) wrote:
>While in theory UTF-8 is not a standard, sites like Last.fm, Facebook and
>Wikipedia seem to have embraced it (as have pretty much all other major web
>sites). As with HTML, there is what the standard says and what the actual
>browsers have to accept in order to work in the real world.

FYI, here is how we have patched urllib for use in EVE:

--- C:\p4\sdk\stackless25\Lib\urllib.py 2008-03-21 14:47:23.000000000 -0000
+++ C:\p4\eve\KALI\common\stdlib\urllib.py 2007-11-06 11:18:01.000000000 -0000
@@ -1158,12 +1158,29 @@
         except KeyError:
             res[i] = '%' + item
         except UnicodeDecodeError:
             res[i] = unichr(int(item[:2], 16)) + item[2:]
     return "".join(res)

+unquote_inner = unquote
+def unquote(s):
+    """CCP attempt at making sensible choices in unicode quoting / unquoting """
+    s = unquote_inner(s)
+    try:
+        u = s.decode("utf-8")
+        try:
+            s2 = s.decode("ascii")
+        except UnicodeDecodeError:
+            s = u #yes, s was definitely utf8, which isn't pure ascii
+        else:
+            if u != s:
+                s = u
+    except UnicodeDecodeError:
+        pass #can't have been utf8
+    return s
+
 def unquote_plus(s):
     """unquote('%7e/abc+def') -> '~/abc def'"""
     s = s.replace('+', ' ')
     return unquote(s)

 always_safe = ('ABCDEFGHIJKLMNOPQRSTUVWXYZ'
@@ -1201,12 +1218,20 @@
         for i in range(256):
             c = chr(i)
             safe_map[c] = (c in safe) and c or ('%%%02X' % i)
         _safemaps[cachekey] = safe_map
     res = map(safe_map.__getitem__, s)
     return ''.join(res)
+
+quote_inner = quote
+def quote(s, safe = '/'):
+    """CCP addition, to try to sensibly support / circumvent issues with unicode in urls"""
+    try:
+        return quote_inner(s, safe)
+    except KeyError:
+        return quote_inner(s.encode("utf-8"), safe)

 def quote_plus(s, safe = ''):
     """Quote the query fragment of a URL; replacing ' ' with '+'"""
     if ' ' in s:
         s = quote(s, safe + ' ')
         return s.replace(' ', '+')
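
For anyone who wants to try the idea outside the 2.5 tree, here is a rough restatement of the same heuristic. It is only a sketch, and it assumes Python 3 with urllib.parse rather than the patched Python 2 module above. The names unquote_guess and quote_guess are made up for illustration and are not part of the patch or of the standard library. The Latin-1 fallback is also an assumption: the patch simply returns the undecoded byte string in that case, which has no direct equivalent once everything is text.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes


def unquote_guess(s):
    """Percent-decode s, preferring UTF-8 when the bytes are valid UTF-8.

    Hypothetical helper mirroring the intent of the patched unquote().
    """
    raw = unquote_to_bytes(s)
    try:
        # Pure ASCII is a subset of UTF-8, so this covers both cases.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Cannot have been UTF-8. Assumption: fall back to Latin-1,
        # which maps every byte to the code point of the same value.
        return raw.decode("latin-1")


def quote_guess(s, safe="/"):
    """Percent-encode s, encoding text as UTF-8 first.

    Hypothetical helper mirroring the intent of the patched quote().
    """
    if isinstance(s, str):
        s = s.encode("utf-8")
    # safe is given to the quoting call so callers can override it.
    return quote_from_bytes(s, safe)
```

With these definitions, unquote_guess("caf%C3%A9") and unquote_guess("caf%E9") both give "café", and quote_guess("café") gives "caf%C3%A9". As with the patch, this is a guess about the sender's encoding: a Latin-1 byte sequence that happens to be valid UTF-8 will be read as UTF-8.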