[Web-SIG] Backup plan: WSGI 1 Addenda and wsgiref update for Py3 (original) (raw)

P.J. Eby pje at telecommunity.com
Tue Sep 21 18:09:44 CEST 2010


While the Web-SIG is trying to hash out PEP 444, I thought it would be a good idea to have a backup plan that would allow the Python 3 stdlib to move forward, without needing a major new spec to settle out implementation questions.

After all, even if PEP 333 is ultimately replaced by PEP 444, it's probably a good idea to have some sort of WSGI 1-ish thing available on Python 3, with bytes/unicode and other matters settled.

In the past, I was waiting for some consensuses (consensi?) on Web-SIG about different approaches to Python 3, looking for some sort of definite, "yes, we all like this" response. However, I can see now that this just means it's my fault we don't have a spec yet. :-(

So, unless any last-minute showstopper rebuttals show up this week, I've decided to go ahead officially bless nearly all of what Graham Dumpleton (who's not only the mod_wsgi author, but has put huge amounts of work into shepherding WSGI-on-Python3 proposals, WSGI amendments, etc.) has proposed, with a few minor exceptions.

In other words: almost none of the following is my own original work; it's like 90% Graham's. Any praise for this belongs to him; the only thing that belongs to me is the blame for not doing this sooner! (Sorry Graham. You asked me to do this ages ago, and you were right.)

Anyway, I'm posting this for comment to both Python-Dev and the Web-SIG. If you are commenting on the technical details of the amendments, please reply to the Web-SIG only. If you are commenting on the development agenda for wsgiref or other Python 3 library issues, please reply to Python-Dev only. That way, neither list will see off-topic discussions. Thanks!

The Plan

I plan to update the proposal below per comments and feedback during this week, then update PEP 333 itself over the weekend or early next week, followed by a code review of Python 3's wsgiref, and implementation of needed changes (such as recoding os.environ to latin1-captured bytes in the CGI handler).

To complete the changes, it is possible that I may need assistance from one or more developers who have more Python 3 experience. If after reading the proposed changes to the spec, you would like to volunteer to help with updating wsgiref to match, please let me know!

The Proposal

Overview

  1. The primary purpose of this update is to provide a uniform porting pattern for moving Python 2 WSGI code to Python 3, meaning a pattern of changes that can be mechanically applied to as little code as practical, while still keeping the WSGI spec easy to programmatically validate (e.g. via wsgiref.validate).

The Python 3 specific changes are to use:

In other words, "strings in, bytes out" for headers, bytes for bodies.

In general, only changes that don't break Python 2 WSGI implementations are allowed. The changes should also not break mod_wsgi on Python 3, but may make some Python 3 wsgi applications non-compliant, despite continuing to function on mod_wsgi.

This is because mod_wsgi allows applications to output string headers and bodies, but I am ruling that option out because it forces every piece of middleware to have to be tested with arbitrary combinations of strings and bytes in order to test compliance. If you want your application to output strings rather than bytes, you can always use a decorator to do that. (And a sample one could be provided in wsgiref.)

  1. The secondary purpose of the update is to address some long-standing open issues documented here:

    http://www.wsgi.org/wsgi/Amendments_1.0

As with the Python 3 changes, only changes that don't retroactively invalidate existing implementations are allowed.

  1. There is no tertiary purpose. ;-) (By which I mean, all other kinds of changes are out-of-scope for this update.)

  2. The section below labeled "A Note On String Types" is proposed for verbatim addition to the "Specification Overview" section in the PEP; the other sections below describe changes to be made inline at the appropriate part of the spec, and changes that were proposed but are rejected for inclusion in this amendment.

A Note On String Types

In general, HTTP deals with bytes, which means that this specification is mostly about handling bytes.

However, the content of those bytes often has some kind of textual interpretation, and in Python, strings are the most convenient way to handle text.

But in many Python versions and implementations, strings are Unicode, rather than bytes. This requires a careful balance between a usable API and correct translations between bytes and text in the context of HTTP... especially to support porting code between Python implementations with different str types.

WSGI therefore defines two kinds of "string":

So, even though HTTP is in some sense "really just bytes", there are many API conveniences to be had by using whatever Python's default str type is.

Do not be confused however: even if Python's str is actually Unicode under the hood, the content of a native string is still restricted to bytes! See the section on Unicode Issues_ later in this document.

In short: where you see the word "string" in this document, it refers to a "native" string, i.e., an object of type str, whether it is internally implemented as bytes or unicode. Where you see references to "bytestring", this should be read as "an object of type bytes under Python 3, or type str under Python 2".

Clarifications (To be made in-line)

The following amendments are clarifications to parts of the existing spec that proved over the years to be ambiguous or insufficiently specified, as well as some attempts to correct practical errors.

(Note: many of these issues cannot be completely fixed in WSGI 1 without breaking existing implementations, and so the text below has notations such as "(MUST in WSGI 2)" to indicate where any replacement spec for WSGI 1 should strengthen them.)

Rejected Amendments



More information about the Web-SIG mailing list