[Python-Dev] Copying cgi.parse_qs() to the urllib.parse module
Tom Pinckney thomaspinckney3 at gmail.com
Mon May 12 22:58:47 CEST 2008
Is there any thought of extending escape / unescape to handle, by
default, characters other than <, >, and &? At a minimum it should
handle arbitrary &#xxx; values. Ideally, it would also handle other
common symbolic names besides &lt;, &gt;, etc.
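As a rough illustration of the kind of unescape being asked for, here is a minimal sketch that expands decimal, hexadecimal, and named references. The helper name unescape_entities is hypothetical, and the sketch assumes a current Python 3 where the entity table lives in html.entities (it was htmlentitydefs in Python 2). Unknown names and out-of-range numbers are left untouched.

```python
import re
from html.entities import name2codepoint

# Matches &#123; (decimal), &#x1F; (hex), or &name; (symbolic) references.
_ENTITY_RE = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def unescape_entities(text):
    """Hypothetical helper: expand numeric and named character references."""

    def replace(match):
        body = match.group(1)
        try:
            if body[0] == "#":
                if body[1] in "xX":
                    return chr(int(body[2:], 16))
                return chr(int(body[1:]))
            return chr(name2codepoint[body])
        except (KeyError, ValueError, OverflowError):
            # Unknown name or out-of-range code point: leave the text as-is.
            return match.group(0)

    return _ENTITY_RE.sub(replace, text)
```

Because the substitution is a single pass, doubly escaped input such as &amp;lt; comes out as &lt; rather than being unescaped twice.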
HTML from common web sites such as nytimes.com frequently has a
variety of characters escaped.
Consider the page at http://travel.nytimes.com/travel/guides/europe/france/provence-and-the-french-riviera/overview.html
It lists its content type as:
content="text/html; charset=UTF-8"
And contains text like:
There&#146;s the C&ocirc;te d&#146;
Ideally, we would decode &#146; into ’ and &ocirc; into ô.
Unfortunately, #146 is really an error -- 146 is not the Unicode code
point for an apostrophe (U+0092 is a control character) but the MS
codepage 1252 value for one (apparently many HTML editing systems
intermingle Unicode and codepage 1252 content for apostrophes and a
few other common characters).
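One possible way to cope with such references, sketched under the assumption that any decimal reference in the range 128-159 was meant as a codepage 1252 byte: decode that byte with the cp1252 codec and fall back to the plain code point when the codec has no mapping. The names decode_charref and fix_cp1252_refs are hypothetical.

```python
import re

# Decimal numeric character references such as &#146;
_NUMERIC_RE = re.compile(r"&#([0-9]+);")


def decode_charref(number):
    """Hypothetical helper: map a decimal reference to a character,
    treating 128-159 as Windows-1252 bytes (an assumption, not a rule)."""
    if 128 <= number <= 159:
        try:
            return bytes([number]).decode("cp1252")
        except UnicodeDecodeError:
            # A few values (e.g. 129) are unassigned in cp1252; fall through.
            pass
    return chr(number)


def fix_cp1252_refs(text):
    """Replace decimal references, leaving out-of-range ones untouched."""

    def replace(match):
        try:
            return decode_charref(int(match.group(1)))
        except (ValueError, OverflowError):
            return match.group(0)

    return _NUMERIC_RE.sub(replace, text)
```

With this sketch, fix_cp1252_refs("There&#146;s") yields the text with a right single quotation mark (U+2019). In current Python 3, html.unescape applies a similar Windows-1252 remapping to references in this range.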
I'm happy to contribute some additional code for these other cases if
people agree it's useful.
On May 12, 2008, at 10:36 AM, Tony Nelson wrote:

> At 11:56 PM -0400 5/10/08, Fred Drake wrote:
>> On May 10, 2008, at 11:49 PM, Guido van Rossum wrote:
>>> Works for me. The other thing I always use from cgi is escape() --
>>> will that be available somewhere else too?
>>
>> xml.sax.saxutils.escape() would be an appropriate replacement, though
>> the location is a little funky.
>
> At least it's right next to the valuable quoteattr().
> --
> TonyN.:' <mailto:tonynelson at georgeanelson.com> ' <http://www.georgeanelson.com/>