[Python-Dev] PEP 3333: wsgi_string() function

P.J. Eby pje at telecommunity.com
Fri Jan 7 06:12:16 CET 2011

Previous message: [Python-Dev] PEP 3333: wsgi_string() function
Next message: [Python-Dev] PEP 3333: wsgi_string() function
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

At 04:00 PM 1/6/2011 -0800, Raymond Hettinger wrote:

Can you please take a look at http://docs.python.org/dev/whatsnew/3.2.html#pep-3333-python-web-server-gateway-interface-v1-0-1 to see if it accurately recaps the resolution of the WSGI text/bytes issues. I would appreciate any feedback, as it is likely that the whatsnew document will be most people's first chance to hear the outcome of the multi-year discussion.

Hi Raymond -- nice work there. A few minor suggestions:

Native strings are used as the keys and values of the environ dictionary, not just as headers for start_response.
The read_environ() method is strictly for use with CGI-to-WSGI gateways, or for bridging other CGI-like protocols (e.g. FastCGI) to WSGI. It is ONLY for server implementers, in other words, and the typical app developer is doing something terribly wrong if they are even bothering to read its documentation. ;-)
The primary relevance of the "native string" type to an app developer is that when porting code from Python 2 to 3, they must still decode environment variable values, even though they are "already" Unicode. If their code was previously dealing only in Python 2 'str' objects, then nothing really changes. If they were previously decoding from environ str's to unicode, then they must replace their prior .decode('whatever') with .encode('latin1').decode('whatever'). That's basically it for porting from Python 2.

IOW, this design choice allows most HTTP header-manipulating code (whether input or output) to be ported to Python 3 with a very mechanical change pattern. Most such code is working with ASCII anyway, since normally both input and output headers are, and there are few headers that an application would be likely to convert to actual unicode.
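As a rough illustration of the input-side pattern described above (a sketch only, not part of PEP 3333; the helper name decode_environ_value and the UTF-8 default are assumptions made for this example):

```python
def decode_environ_value(environ, key, encoding="utf-8"):
    """Recover real text from a WSGI native-string environ value.

    Hypothetical helper: on Python 3 the server hands over bytes that
    were decoded as latin1, so we re-encode to latin1 to get the
    original bytes back, then decode with the encoding we expect.
    """
    raw = environ.get(key, "")
    return raw.encode("latin1").decode(encoding)


# A server that received the UTF-8 bytes b'/caf\xc3\xa9' would expose
# them as this latin1-decoded native string:
environ = {"PATH_INFO": "/caf\u00c3\u00a9"}
path = decode_environ_value(environ, "PATH_INFO")
```

Here path ends up as the five-character string '/caf\u00e9'. Code that never decoded environ values in Python 2 needs no such helper.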

On output via start_response(), if an application is currently encoding an output header -- why they would be, I have no idea, but if they are -- they need to add a decode from latin1 after the encode (i.e., .encode('whatever').decode('latin1')).
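The output direction is the mirror image. A minimal sketch, assuming a hypothetical helper named to_wsgi_header_value and UTF-8 as the 'whatever' encoding:

```python
def to_wsgi_header_value(text, encoding="utf-8"):
    """Turn real text into a native string suitable for start_response().

    Hypothetical helper: encode to the bytes that should go on the
    wire, then decode those bytes as latin1 so that the server's own
    latin1 encode reproduces them exactly.
    """
    return text.encode(encoding).decode("latin1")


value = to_wsgi_header_value("caf\u00e9")
# Encoding the result as latin1 yields the same bytes as encoding the
# original text as UTF-8.
wire_bytes = value.encode("latin1")
```

For pure-ASCII header values the helper returns its input unchanged, which is why most code never needs it.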

IOW, a short 2-to-3 porting guide for WSGI:

If you just used strings for headers before, that part of your code doesn't change. (And if it was broken before, it's still broken in exactly the same way. No new breakage is introduced. ;-) )
If you encoded any output headers or decoded any input headers, you must take into account the extra latin1 step. This is expected to be rare, since it's usually only SCRIPT_NAME and PATH_INFO that anybody would ever care about on input, and almost never anything on output.
Values yielded by an application or sent via a write() call MUST be byte strings; the environ keys and values, and the status and headers passed to start_response(), MUST be native strings. No mixing and matching.
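To make the bytes/native-string split concrete, here is a small sketch of a Python 3 application that follows these rules. The application, the fake_start_response stand-in, and the sample environ are all invented for illustration; a real server supplies its own start_response and a much fuller environ.

```python
def app(environ, start_response):
    # environ values are native strings; undo the latin1 step to get text.
    path = environ.get("PATH_INFO", "/").encode("latin1").decode("utf-8")

    # The response body MUST be bytes.
    body = ("You asked for " + path + "\n").encode("utf-8")

    # Status and header names/values MUST be native strings (str).
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


# Stand-in for what a server would pass in, used only to exercise the app.
captured = {}

def fake_start_response(status, headers, exc_info=None):
    captured["status"] = status
    captured["headers"] = headers


chunks = app({"PATH_INFO": "/index"}, fake_start_response)
```

After the call, every item in chunks is a bytes object, while captured["status"] and each header name and value are str.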
