[Web-SIG] Backup plan: WSGI 1 Addenda and wsgiref update for Py3

P.J. Eby pje at telecommunity.com
Tue Sep 21 18:09:44 CEST 2010

Previous message: [Web-SIG] PEP 444 (aka Web3)
Next message: [Web-SIG] Backup plan: WSGI 1 Addenda and wsgiref update for Py3

While the Web-SIG is trying to hash out PEP 444, I thought it would be a good idea to have a backup plan that would allow the Python 3 stdlib to move forward, without needing a major new spec to settle out implementation questions.

After all, even if PEP 333 is ultimately replaced by PEP 444, it's probably a good idea to have some sort of WSGI 1-ish thing available on Python 3, with bytes/unicode and other matters settled.

In the past, I was waiting for some consensuses (consensi?) on Web-SIG about different approaches to Python 3, looking for some sort of definite, "yes, we all like this" response. However, I can see now that this just means it's my fault we don't have a spec yet. :-(

So, unless any last-minute showstopper rebuttals show up this week, I've decided to go ahead and officially bless nearly all of what Graham Dumpleton (who's not only the mod_wsgi author, but has put huge amounts of work into shepherding WSGI-on-Python3 proposals, WSGI amendments, etc.) has proposed, with a few minor exceptions.

In other words: almost none of the following is my own original work; it's like 90% Graham's. Any praise for this belongs to him; the only thing that belongs to me is the blame for not doing this sooner! (Sorry Graham. You asked me to do this ages ago, and you were right.)

Anyway, I'm posting this for comment to both Python-Dev and the Web-SIG. If you are commenting on the technical details of the amendments, please reply to the Web-SIG only. If you are commenting on the development agenda for wsgiref or other Python 3 library issues, please reply to Python-Dev only. That way, neither list will see off-topic discussions. Thanks!

The Plan

I plan to update the proposal below per comments and feedback during this week, then update PEP 333 itself over the weekend or early next week, followed by a code review of Python 3's wsgiref and implementation of any needed changes (such as recoding os.environ in the CGI handler so that its values hold the original bytes, decoded as latin-1).
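A minimal sketch of that CGI-handler recoding, assuming POSIX behaviour: Python 3 decodes os.environ values using sys.getfilesystemencoding() with the "surrogateescape" error handler, so encoding them the same way recovers the original bytes, which are then decoded as latin-1 to give exactly one code point per byte. The function names are hypothetical, not wsgiref's actual API.

```python
import os
import sys


def recode_environ_value(value, encoding=None):
    """Return a WSGI native string carrying the original environment bytes.

    Hypothetical helper. Assumes POSIX semantics: os.environ values were
    decoded with the filesystem encoding and 'surrogateescape', so encoding
    them the same way gives back the raw bytes.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    raw = value.encode(encoding, "surrogateescape")
    # latin-1 maps every byte 0-255 to the code point with the same value.
    return raw.decode("latin-1")


def cgi_environ(source=None, encoding=None):
    """Build a WSGI-style environ dict from os.environ (hypothetical name)."""
    if source is None:
        source = os.environ
    # Keys are left alone; CGI variable names are plain ASCII.
    return {key: recode_environ_value(val, encoding) for key, val in source.items()}
```

With a UTF-8 filesystem encoding, a PATH_INFO of "/café" becomes "/cafÃ©": two latin-1 characters standing for the two UTF-8 bytes of "é", which the application can turn back into bytes with .encode("latin-1").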

To complete the changes, it is possible that I may need assistance from one or more developers who have more Python 3 experience. If after reading the proposed changes to the spec, you would like to volunteer to help with updating wsgiref to match, please let me know!

The Proposal

Overview

The primary purpose of this update is to provide a uniform porting pattern for moving Python 2 WSGI code to Python 3, meaning a pattern of changes that can be mechanically applied to as little code as practical, while still keeping the WSGI spec easy to programmatically validate (e.g. via wsgiref.validate).

The Python 3 specific changes are to use:

* bytes for I/O streams in both directions
* str for environ keys and values
* bytes for arguments to start_response() and write()
* a text stream for wsgi.errors

In other words, "strings in, bytes out" for headers, bytes for bodies.
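A minimal application following the four rules above might look like the sketch below. It assumes the bytes-valued start_response() arguments exactly as proposed here; the application itself is illustrative only.

```python
def application(environ, start_response):
    """Sketch of a Python 3 WSGI app under the rules proposed above."""
    # environ keys and values are native strings (str) whose code points
    # are all <= U+00FF, so latin-1 recovers the raw request bytes.
    path = environ.get("PATH_INFO", "/")
    body = b"You requested " + path.encode("latin-1") + b"\n"

    # Under this proposal, status and headers passed to start_response()
    # are bytestrings, not str.
    status = b"200 OK"
    headers = [
        (b"Content-Type", b"text/plain"),
        (b"Content-Length", str(len(body)).encode("ascii")),
    ]
    start_response(status, headers)

    # The response body is an iterable of bytestrings.
    return [body]
```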

In general, only changes that don't break Python 2 WSGI implementations are allowed. The changes should also not break mod_wsgi on Python 3, but may make some Python 3 WSGI applications non-compliant, even though they continue to function on mod_wsgi.

This is because mod_wsgi allows applications to output string headers and bodies, but I am ruling that option out because it would force every piece of middleware to be tested against arbitrary combinations of strings and bytes in order to verify compliance. If you want your application to output strings rather than bytes, you can always use a decorator to convert them. (A sample one could be provided in wsgiref.)
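A sketch of such a decorator, using hypothetical names (bytes_output, _EncodedBody) rather than anything actually shipped in wsgiref. It assumes status and header text should be encoded as latin-1 and body text with a configurable encoding; the wrapper forwards close() so the server's obligations still reach the application's own iterator.

```python
import functools


class _EncodedBody:
    """Iterable that encodes str chunks and forwards close() (hypothetical)."""

    def __init__(self, iterable, encoding):
        self._iterable = iterable
        self._encoding = encoding

    def __iter__(self):
        for chunk in self._iterable:
            if isinstance(chunk, str):
                chunk = chunk.encode(self._encoding)
            yield chunk

    def close(self):
        # Servers call close() if present; pass it through to the app's iterable.
        close = getattr(self._iterable, "close", None)
        if close is not None:
            close()


def bytes_output(body_encoding="utf-8"):
    """Decorator letting an app emit str while the server only sees bytes."""

    def to_header_bytes(value):
        # Status and header text is restricted to latin-1 (one byte per char).
        return value.encode("latin-1") if isinstance(value, str) else value

    def decorator(app):
        @functools.wraps(app)
        def wrapped(environ, start_response):
            def encoding_start_response(status, headers, exc_info=None):
                write = start_response(
                    to_header_bytes(status),
                    [(to_header_bytes(k), to_header_bytes(v)) for k, v in headers],
                    exc_info,
                )

                def encoding_write(data):
                    if isinstance(data, str):
                        data = data.encode(body_encoding)
                    return write(data)

                return encoding_write

            return _EncodedBody(app(environ, encoding_start_response), body_encoding)

        return wrapped

    return decorator
```

One caveat with this design: if the wrapped application sets Content-Length itself, it has to count encoded bytes, not characters.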

The secondary purpose of the update is to address some long-standing open issues documented here:

http://www.wsgi.org/wsgi/Amendments_1.0

As with the Python 3 changes, only changes that don't retroactively invalidate existing implementations are allowed.

There is no tertiary purpose. ;-) (By which I mean, all other kinds of changes are out-of-scope for this update.)
The section below labeled "A Note On String Types" is proposed for verbatim addition to the "Specification Overview" section in the PEP; the other sections below describe changes to be made inline at the appropriate part of the spec, and changes that were proposed but are rejected for inclusion in this amendment.

A Note On String Types

In general, HTTP deals with bytes, which means that this specification is mostly about handling bytes.

However, the content of those bytes often has some kind of textual interpretation, and in Python, strings are the most convenient way to handle text.

But in many Python versions and implementations, strings are Unicode, rather than bytes. This requires a careful balance between a usable API and correct translations between bytes and text in the context of HTTP... especially to support porting code between Python implementations with different str types.

WSGI therefore defines two kinds of "string":

* "Native" strings (which are always implemented using the type named str)
* "Bytestrings" (which are implemented using the bytes type in Python 3, and str elsewhere)

So, even though HTTP is in some sense "really just bytes", there are many API conveniences to be had by using whatever Python's default str type is.

Do not be confused, however: even if Python's str is actually Unicode under the hood, the content of a native string is still restricted to bytes, i.e. to the code points U+0000 through U+00FF! See the "Unicode Issues" section later in this document.

In short: where you see the word "string" in this document, it refers to a "native" string, i.e., an object of type str, whether it is internally implemented as bytes or unicode. Where you see references to "bytestring", this should be read as "an object of type bytes under Python 3, or type str under Python 2".
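On Python 3, the "native string restricted to bytes" rule amounts to a lossless latin-1 round trip. A sketch with hypothetical helper names:

```python
def to_bytestring(native: str) -> bytes:
    """Convert a WSGI native string to a bytestring (hypothetical helper).

    latin-1 maps each code point U+0000..U+00FF to exactly one byte; any
    other character raises UnicodeEncodeError, flagging a non-compliant value.
    """
    return native.encode("latin-1")


def to_native(data: bytes) -> str:
    """Convert a bytestring to a WSGI native string (hypothetical helper)."""
    return data.decode("latin-1")


def is_valid_native(value) -> bool:
    """Return True if value is a str whose contents are restricted to bytes."""
    if not isinstance(value, str):
        return False
    try:
        value.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True
```

Because every byte value decodes to a distinct code point, to_bytestring(to_native(b)) == b holds for any bytestring b.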

Clarifications (To be made in-line)

The following amendments are clarifications to parts of the existing spec that proved over the years to be ambiguous or insufficiently specified, as well as some attempts to correct practical errors.

(Note: many of these issues cannot be completely fixed in WSGI 1 without breaking existing implementations, and so the text below has notations such as "(MUST in WSGI 2)" to indicate where any replacement spec for WSGI 1 should strengthen them.)

* If an application returns a body iterator, a server (or middleware) MAY stop iterating over it and discard the remainder of the output, as long as it calls any close() method provided by the iterator. Applications returning a generator or other custom iterator SHOULD NOT assume that the entire iterator will be consumed. (This change makes it explicit that caching middleware or HEAD-processing servers can throw away the response body.)
* start_response() SHOULD (MUST in WSGI 2) check for errors in the status or headers at the time it's called, so that an error can be raised as close to the problem as possible.
* If start_response() raises an error when called normally (i.e. without exc_info), it SHOULD be an error to call it a second time without passing exc_info.
* The SERVER_PORT variable is of type str, just like any other CGI environ variable. (According to the WSGI wiki, "some implementations" expect it to be an integer, even though nothing in the WSGI spec allows a CGI variable to be anything but a str.)
* A server SHOULD (MUST in WSGI 2) support the size hint argument to readline() on its wsgi.input stream.
* A server SHOULD (MUST in WSGI 2) return an empty bytestring from read() on wsgi.input to indicate an end-of-file condition. (In WSGI 2, the language should be clarified to allow the input stream length and CONTENT_LENGTH to be out of sync, for reasons explained in Graham's blog post.)
* A server SHOULD (MUST in WSGI 2) allow read() to be called without an argument, and return the entire remaining contents of the stream.
* If an application provides a Content-Length header, the server SHOULD NOT (MUST NOT in WSGI 2) send more data to the client than was specified in that header, whether via write(), yielded body bytestrings, or a wsgi.file_wrapper. (This rule applies to middleware as well.)
* wsgi.errors is a text stream accepting "native strings".
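A server-side sketch of three of these clarifications: stopping early on the body iterator, capping output at Content-Length, and calling close() unconditionally. send_body() and its write callback are hypothetical; a real server would also deal with headers, chunked encoding, and error reporting.

```python
import itertools


def send_body(result, write, content_length=None, max_chunks=None):
    """Hypothetical server-side loop that emits a WSGI response body.

    write: the server's low-level output callable (an assumption)
    content_length: the application's Content-Length as an int, or None
    max_chunks: stop after this many chunks (0 for a HEAD request), or None
    """
    sent = 0
    try:
        chunks = result if max_chunks is None else itertools.islice(result, max_chunks)
        for chunk in chunks:
            if content_length is not None:
                remaining = content_length - sent
                if remaining <= 0:
                    break  # MAY stop iterating and discard the rest
                chunk = chunk[:remaining]  # never send past Content-Length
            if chunk:
                write(chunk)
                sent += len(chunk)
    finally:
        # Required whether iteration finished, stopped early, or raised.
        close = getattr(result, "close", None)
        if close is not None:
            close()
    return sent
```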
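A sketch of a start_response() that validates its arguments when called and refuses a second call without exc_info, even if the first call raised. It follows this proposal's Python 3 rule that status and headers are bytestrings; the class name and the specific checks (status format, header-name characters, no CR/LF in values) are illustrative assumptions, not text from the spec.

```python
import re

# Illustrative checks only; the spec does not mandate these exact rules.
_STATUS = re.compile(rb"[1-5][0-9]{2} [^\r\n]*")
_HEADER_NAME = re.compile(rb"[!#$%&'*+.^_`|~0-9A-Za-z-]+")


class StartResponse:
    """Hypothetical server-side start_response() that validates eagerly."""

    def __init__(self):
        self.called = False
        self.headers_sent = False
        self.status = None
        self.headers = None
        self.body = []

    def __call__(self, status, headers, exc_info=None):
        if exc_info is not None:
            try:
                if self.headers_sent:
                    # Too late to change the response: re-raise the app's error.
                    raise exc_info[1].with_traceback(exc_info[2])
            finally:
                exc_info = None  # break the traceback reference cycle
        elif self.called:
            # Applies even if the first call raised during validation.
            raise RuntimeError("start_response() called twice without exc_info")
        self.called = True
        self._check(status, headers)  # raise as close to the mistake as possible
        self.status, self.headers = status, list(headers)
        return self.write

    def write(self, data):
        if self.status is None:
            raise RuntimeError("write() before a successful start_response()")
        if not isinstance(data, bytes):
            raise TypeError("write() requires a bytestring")
        self.headers_sent = True  # a real server would flush headers here
        self.body.append(data)

    @staticmethod
    def _check(status, headers):
        if not isinstance(status, bytes) or not _STATUS.fullmatch(status):
            raise ValueError(f"invalid status: {status!r}")
        if type(headers) is not list:
            raise TypeError("headers must be a list of (name, value) tuples")
        for item in headers:
            if type(item) is not tuple or len(item) != 2:
                raise TypeError(f"invalid header item: {item!r}")
            name, value = item
            if not isinstance(name, bytes) or not _HEADER_NAME.fullmatch(name):
                raise ValueError(f"invalid header name: {name!r}")
            if not isinstance(value, bytes) or b"\r" in value or b"\n" in value:
                raise ValueError(f"invalid header value for {name!r}")
```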
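A sketch of a wsgi.input wrapper that satisfies the read() and readline() clarifications by capping reads at CONTENT_LENGTH, so the application never blocks waiting on the socket. LimitedInput is a hypothetical name; it assumes the raw stream is a binary file object such as the one returned by socket.makefile("rb").

```python
class LimitedInput:
    """Hypothetical wsgi.input wrapper over a raw binary stream."""

    def __init__(self, raw, content_length):
        self._raw = raw
        self._remaining = max(0, content_length)

    def read(self, size=-1):
        # read() with no argument (or a negative size) returns everything left.
        if size is None or size < 0 or size > self._remaining:
            size = self._remaining
        data = self._raw.read(size) if size else b""
        self._remaining -= len(data)
        return data  # b"" signals end-of-file

    def readline(self, size=-1):
        # The size hint caps how many bytes of the line are returned.
        if size is None or size < 0 or size > self._remaining:
            size = self._remaining
        data = self._raw.readline(size) if size else b""
        self._remaining -= len(data)
        return data

    def __iter__(self):
        while True:
            line = self.readline()
            if not line:
                return
            yield line
```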

Rejected Amendments

Manlio Perillo's suggestion to allow header specification to be delayed until the response iterator is producing non-empty output. This would've been a possible win for async WSGI, but could require substantial changes to existing servers.
