[Python-Dev] urllib unicode handling

Kristján Valur Jónsson kristjan at ccpgames.com
Wed May 7 14:00:51 CEST 2008


-----Original Message-----
From: python-dev-bounces+kristjan=ccpgames.com at python.org [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org] On Behalf Of Jeroen Ruigrok van der Werven
Sent: Wednesday, May 07, 2008 05:20
To: Tom Pinckney
Cc: python-dev at python.org
Subject: Re: [Python-Dev] urllib unicode handling

-On [20080507 04:06], Tom Pinckney (thomaspinckney3 at gmail.com) wrote:
>While in theory UTF-8 is not a standard, sites like Last.fm, Facebook and
>Wikipedia seem to have embraced it (as have pretty much all other major web
>sites). As with HTML, there is what the standard says and what the actual
>browsers have to accept in order to work in the real world.

FYI, here is how we have patched urllib for use in EVE:

--- C:\p4\sdk\stackless25\Lib\urllib.py 2008-03-21 14:47:23.000000000 -0000
+++ C:\p4\eve\KALI\common\stdlib\urllib.py 2007-11-06 11:18:01.000000000 -0000
@@ -1158,12 +1158,29 @@
         except KeyError:
             res[i] = '%' + item
         except UnicodeDecodeError:
             res[i] = unichr(int(item[:2], 16)) + item[2:]
     return "".join(res)

+unquote_inner = unquote
+def unquote(s):

@@ -1201,12 +1218,20 @@
         for i in range(256):
             c = chr(i)
             safe_map[c] = (c in safe) and c or ('%%%02X' % i)
         _safemaps[cachekey] = safe_map
     res = map(safe_map.__getitem__, s)
     return ''.join(res)
+
+quote_inner = quote
+def quote(s, safe = '/'):

 def quote_plus(s, safe = ''):
     """Quote the query fragment of a URL; replacing ' ' with '+'"""
     if ' ' in s:
         s = quote(s, safe + ' ')
         return s.replace(' ', '+')
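
For readers who want to try the wrapping pattern, here is a minimal sketch of the idea. The bodies of the wrapped unquote() and quote() do not appear in the diff above, so the UTF-8 round trip below is an assumption about their intent, not the EVE code. It is also written against Python 3's urllib.parse rather than the Python 2.5 module the patch targets, and the Latin-1 fallback is hypothetical, loosely modelled on the unichr() fallback visible in the context lines.

```python
# Sketch only: assumes the wrappers encode text as UTF-8 before quoting
# and decode UTF-8 after unquoting. Not the actual EVE patch.
from urllib.parse import quote as quote_inner, unquote_to_bytes


def quote(s, safe='/'):
    """Percent-encode s, converting text to UTF-8 bytes first."""
    if isinstance(s, str):
        s = s.encode('utf-8')
    return quote_inner(s, safe=safe)


def unquote(s):
    """Decode percent-escapes to bytes, then interpret them as UTF-8."""
    raw = unquote_to_bytes(s)
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        # Hypothetical fallback: map each byte to the code point of the
        # same value, as the stock code does with unichr().
        return raw.decode('latin-1')


if __name__ == '__main__':
    encoded = quote('Bj\u00f6rk')
    print(encoded)
    print(unquote(encoded) == 'Bj\u00f6rk')
```

Keeping the original function under an _inner name, as the patch does, means the rest of the module (quote_plus, for example) picks up the new behaviour without further changes.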


