[Python-Dev] [Python-checkins] r84842 - peps/trunk/pep-0444.txt

Brett Cannon brett at python.org
Thu Sep 16 01:15:33 CEST 2010


Can I just ask why 444 since 392 was the last assigned Python 2 number?

On Wed, Sep 15, 2010 at 15:40, georg.brandl <python-checkins at python.org> wrote:

Author: georg.brandl
Date: Thu Sep 16 00:40:38 2010
New Revision: 84842

Log:
Add PEP 444, Python Web3 Interface.

Added:
   peps/trunk/pep-0444.txt   (contents, props changed)

Added: peps/trunk/pep-0444.txt
==============================================================================
--- (empty file)
+++ peps/trunk/pep-0444.txt     Thu Sep 16 00:40:38 2010
@@ -0,0 +1,1570 @@
PEP: 444
Title: Python Web3 Interface
Version: $Revision$
Last-Modified: $Date$
Author: Chris McDonough <chrism at plope.com>,
        Armin Ronacher <armin.ronacher at active-4.com>
Discussions-To: Python Web-SIG <web-sig at python.org>
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: 19-Jul-2010


Abstract
========

This document specifies a proposed second-generation standard
interface between web servers and Python web applications or
frameworks.


Rationale and Goals
===================

This protocol and specification is influenced heavily by the Web
Services Gateway Interface (WSGI) 1.0 standard described in PEP 333
[1]_.  The high-level rationale for having any standard that allows
Python-based web servers and applications to interoperate is outlined
in PEP 333.  This document essentially uses PEP 333 as a template, and
changes its wording in various places for the purpose of forming a
different standard.

Python currently boasts a wide variety of web application frameworks
which use the WSGI 1.0 protocol.  However, due to changes in the
language, the WSGI 1.0 protocol is not compatible with Python 3.  This
specification describes a standardized WSGI-like protocol that lets
Python 2.6, 2.7 and 3.1+ applications communicate with web servers.
Web3 is clearly a WSGI derivative; it only uses a different name than
"WSGI" in order to indicate that it is not in any way backwards
compatible.

Applications and servers which are written to this specification are
meant to work properly under Python 2.6.X, Python 2.7.X and Python
3.1+.
Neither an application nor a server that implements the Web3
specification can easily be written to work under Python 2 versions
earlier than 2.6 or Python 3 versions earlier than 3.1.

.. note::

   The true minimum Python 3 version is whichever release fixed
   http://bugs.python.org/issue4006, so that ``os.environ['foo']``
   returns surrogates (à la PEP 383) instead of failing with a
   ``KeyError`` when the value of ``'foo'`` cannot be decoded using
   the current locale.  In any case, Python 3.0 is not supported.

.. note::

   Python 2.6 is the first Python version that supports an alias for
   ``bytes`` and the ``b"foo"`` literal syntax.  This is why it is the
   minimum version supported by Web3.

Explicability and documentability are the main technical drivers for
the decisions made within the standard.


Differences from WSGI
=====================

- All protocol-specific environment names are prefixed with ``web3.``
  rather than ``wsgi.``, e.g. ``web3.input`` rather than
  ``wsgi.input``.

- All values present as environment dictionary values are explicitly
  ``bytes`` instances instead of native strings.  (Environment keys,
  however, are native strings, always ``str`` regardless of
  platform.)

- All values returned by an application must be ``bytes`` instances,
  including the status code, header names and values, and the body.

- Wherever WSGI 1.0 referred to an ``app_iter``, this specification
  refers to a ``body``.

- No ``start_response()`` callback (and therefore no ``write()``
  callable nor ``exc_info`` data).

- The ``readline()`` function of ``web3.input`` must support a size
  hint parameter.

- The ``read()`` function of ``web3.input`` must be length delimited.
  A call without a size argument must not read more than the content
  length header specifies.  If no content length header is present,
  the stream must not return anything on read.  It must never request
  more data than specified from the client.

- No requirement for middleware to yield an empty string if it needs
  more information from an application to produce output (i.e. no
  "Middleware Handling of Block Boundaries").

- File-like objects passed to a "file_wrapper" must have an
  ``__iter__`` which returns bytes (never text).

- ``wsgi.file_wrapper`` is not supported.

- ``QUERY_STRING``, ``SCRIPT_NAME`` and ``PATH_INFO`` values must be
  placed in the environ by the server (each as the empty bytes
  instance if no associated value is received in the HTTP request).

- ``web3.path_info`` and ``web3.script_name`` should be put into the
  Web3 environment, if possible, by the origin Web3 server.  When
  available, each is the original, plain 7-bit ASCII, URL-encoded
  variant of its CGI equivalent derived directly from the request URI
  (with %2F segment markers and other meta-characters intact).  If the
  server cannot provide one (or both) of these values, it must omit
  the value(s) it cannot provide from the environment.

- This requirement was removed: "middleware components must not
  block iteration waiting for multiple values from an application
  iterable.  If the middleware needs to accumulate more data from the
  application before it can produce any output, it must yield an
  empty string."

- ``SERVER_PORT`` must be a bytes instance (not an integer).

- The server must not inject an additional ``Content-Length`` header
  by guessing the length from the response iterable.  This must be set
  by the application itself in all situations.

- If the origin server advertises that it has the ``web3.async``
  capability, a Web3 application callable used by the server is
  permitted to return a callable that accepts no arguments.  When it
  does so, this callable is to be called periodically by the origin
  server until it returns a non-``None`` response, which must be a
  normal Web3 response tuple.

  .. XXX (chrism) Needs a section of its own for explanation.

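
The length-delimited ``read()`` and size-hinted ``readline()`` requirements above can be illustrated with a small wrapper that a server might place around the raw client stream. This is a minimal sketch under stated assumptions, not part of the proposal. The class name ``LimitedInput`` is hypothetical, and the sketch assumes the server has already parsed ``CONTENT_LENGTH`` into an ``int``, or ``None`` when the header is absent.

```python
import io


class LimitedInput(object):
    """Hypothetical length-delimited wrapper for ``web3.input``.

    ``stream`` is any binary file-like object (e.g. the client socket
    file); ``content_length`` is the integer parsed from CONTENT_LENGTH,
    or None when the request carried no Content-Length header.
    """

    def __init__(self, stream, content_length):
        self._stream = stream
        # No Content-Length header: act as an empty (dummy) stream.
        self._remaining = content_length or 0

    def _clamp(self, size):
        # Never allow a read to go past the declared Content-Length.
        if size is None or size < 0 or size > self._remaining:
            return self._remaining
        return size

    def read(self, size=None):
        size = self._clamp(size)
        data = self._stream.read(size) if size else b''
        self._remaining -= len(data)
        return data

    def readline(self, size=None):
        # The size hint is honoured, but still capped at Content-Length.
        size = self._clamp(size)
        line = self._stream.readline(size) if size else b''
        self._remaining -= len(line)
        return line

    def readlines(self, hint=None):
        # The specification lets the server ignore the size hint.
        return list(self)

    def __iter__(self):
        # Yield lines until readline() signals EOF with b''.
        return iter(self.readline, b'')


if __name__ == '__main__':
    body = LimitedInput(io.BytesIO(b'name=web3\ntrailing bytes'), 10)
    print(body.readline())  # b'name=web3\n'
    print(body.read())      # b''
```

If ``content_length`` is ``None``, the wrapper behaves as the empty stream that the ``read()`` requirement calls for. Reads past the declared length return ``b''`` and never ask the underlying stream for more data.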


Specification Overview
======================

The Web3 interface has two sides: the "server" or "gateway" side, and
the "application" or "framework" side.  The server side invokes a
callable object that is provided by the application side.  The
specifics of how that object is provided are up to the server or
gateway.  It is assumed that some servers or gateways will require an
application's deployer to write a short script to create an instance
of the server or gateway, and supply it with the application object.
Other servers and gateways may use configuration files or other
mechanisms to specify where an application object should be imported
from, or otherwise obtained.

In addition to "pure" servers/gateways and applications/frameworks, it
is also possible to create "middleware" components that implement both
sides of this specification.  Such components act as an application to
their containing server, and as a server to a contained application,
and can be used to provide extended APIs, content transformation,
navigation, and other useful functions.

Throughout this specification, we will use the term "application
callable" to mean "a function, a method, or an instance with a
``__call__`` method".  It is up to the server, gateway, or application
implementing the application callable to choose the appropriate
implementation technique for their needs.  Conversely, a server,
gateway, or application that is invoking a callable must not have
any dependency on what kind of callable was provided to it.
Application callables are only to be called, not introspected upon.


The Application/Framework Side
------------------------------

The application object is simply a callable object that accepts one
argument.  The term "object" should not be misconstrued as requiring
an actual object instance: a function, method, or instance with a
``__call__`` method are all acceptable for use as an application
object.
Application objects must be able to be invoked more than
once, as virtually all servers/gateways (other than CGI) will make
such repeated requests.  If this cannot be guaranteed by the
implementation of the actual application, it has to be wrapped in a
function that creates a new instance on each call.

.. note::

   Although we refer to it as an "application" object, this should not
   be construed to mean that application developers will use Web3 as a
   web programming API.  It is assumed that application developers
   will continue to use existing, high-level framework services to
   develop their applications.  Web3 is a tool for framework and
   server developers, and is not intended to directly support
   application developers.

An example of an application which is a function (``simple_app``)::

    def simple_app(environ):
        """Simplest possible application object"""
        status = b'200 OK'
        headers = [(b'Content-type', b'text/plain')]
        body = [b'Hello world!\n']
        return body, status, headers

An example of an application which is an instance (``simple_app``)::

    class AppClass(object):

        """Produce the same output, but using an instance.  An
        instance of this class must be instantiated before it is
        passed to the server.  """

        def __call__(self, environ):
            status = b'200 OK'
            headers = [(b'Content-type', b'text/plain')]
            body = [b'Hello world!\n']
            return body, status, headers

    simple_app = AppClass()

Alternatively, an application callable may return a callable instead
of the tuple if the server supports asynchronous execution.  See the
description of ``web3.async`` for more information.


The Server/Gateway Side
-----------------------

The server or gateway invokes the application callable once for each
request it receives from an HTTP client that is directed at the
application.
To illustrate, here is a simple CGI gateway, implemented
as a function taking an application object.  Note that this simple
example has limited error handling, because by default an uncaught
exception will be dumped to ``sys.stderr`` and logged by the web
server.

::

    import locale
    import os
    import sys

    encoding = locale.getpreferredencoding()

    stdin = sys.stdin
    stdout = sys.stdout

    if hasattr(sys.stdout, 'buffer'):
        # Python 3 compatibility; we need to be able to read bytes in
        # and push bytes out
        stdin = sys.stdin.buffer
        stdout = sys.stdout.buffer

    def get_environ():
        d = {}
        for k, v in os.environ.items():
            # Python 3 compatibility
            if not isinstance(v, bytes):
                # We must explicitly encode the string to bytes under
                # Python 3.1+
                v = v.encode(encoding, 'surrogateescape')
            d[k] = v
        return d

    def run_with_cgi(application):

        environ = get_environ()
        environ['web3.input']        = stdin
        environ['web3.errors']       = sys.stderr
        environ['web3.version']      = (1, 0)
        environ['web3.multithread']  = False
        environ['web3.multiprocess'] = True
        environ['web3.run_once']     = True
        environ['web3.async']        = False

        if environ.get('HTTPS', b'off') in (b'on', b'1'):
            environ['web3.url_scheme'] = b'https'
        else:
            environ['web3.url_scheme'] = b'http'

        rv = application(environ)
        if hasattr(rv, '__call__'):
            raise TypeError('This webserver does not support asynchronous '
                            'responses.')
        body, status, headers = rv

        CRLF = b'\r\n'

        try:
            stdout.write(b'Status: ' + status + CRLF)
            for header_name, header_val in headers:
                stdout.write(header_name + b': ' + header_val + CRLF)
            stdout.write(CRLF)
            for chunk in body:
                stdout.write(chunk)
                stdout.flush()
        finally:
            if hasattr(body, 'close'):
                body.close()


Middleware: Components that Play Both Sides
-------------------------------------------

A single object may play the role of a server with respect to some
application(s), while also acting as an application with respect to
some server(s).  Such "middleware" components can perform such
functions as:

* Routing a request to different application objects based on the
  target URL, after rewriting the ``environ`` accordingly.

* Allowing multiple applications or frameworks to run side-by-side in
  the same process.

* Load balancing and remote processing, by forwarding requests and
  responses over a network.

* Performing content postprocessing, such as applying XSL stylesheets.

The presence of middleware in general is transparent to both the
"server/gateway" and the "application/framework" sides of the
interface, and should require no special support.  A user who desires
to incorporate middleware into an application simply provides the
middleware component to the server, as if it were an application, and
configures the middleware component to invoke the application, as if
the middleware component were a server.  Of course, the "application"
that the middleware wraps may in fact be another middleware component
wrapping another application, and so on, creating what is referred to
as a "middleware stack".

Middleware must support asynchronous execution if possible, or else
fall back to disabling itself.

Here is a middleware that changes the ``HTTP_HOST`` key if an
``X-Host`` header exists, and adds a comment to all HTML responses::

    import time

    def apply_filter(app, environ, filter_func):
        """Helper function that passes the return value from an
        application to a filter function when the results are
        ready.
        """
        app_response = app(environ)

        # synchronous response, filter now
        if not hasattr(app_response, '__call__'):
            return filter_func(*app_response)

        # asynchronous response.  filter when results are ready
        def polling_function():
            rv = app_response()
            if rv is not None:
                return filter_func(*rv)
        return polling_function

    def proxy_and_timing_support(app):
        def new_application(environ):
            def filter_func(body, status, headers):
                now = time.time()
                for key, value in headers:
                    if key.lower() == b'content-type' and \
                       value.split(b';')[0] == b'text/html':
                        # assumes ascii compatible encoding in body,
                        # but the middleware should actually parse the
                        # content type header and figure out the
                        # encoding when doing that.
                        comment = ('<!-- Execution time: %.2fsec -->' %
                                   (now - then)).encode('ascii')
                        # materialize the body so the comment can be
                        # appended, then release the original iterable
                        original = body
                        body = list(original) + [comment]
                        if hasattr(original, 'close'):
                            original.close()
                        break
                return body, status, headers
            then = time.time()
            host = environ.get('HTTP_X_HOST')
            if host is not None:
                environ['HTTP_HOST'] = host

            # use the apply_filter function that applies a given filter
            # function for both async and sync responses.
            return apply_filter(app, environ, filter_func)
        return new_application

    app = proxy_and_timing_support(app)


Specification Details
=====================

The application callable must accept one positional argument.  For the
sake of illustration, we have named it ``environ``, but it is not
required to have this name.  A server or gateway must invoke the
application object using a positional (not keyword) argument.
(E.g.
by calling ``body, status, headers = application(environ)`` as
shown above.)

The ``environ`` parameter is a dictionary object, containing CGI-style
environment variables.  This object must be a builtin Python
dictionary (not a subclass, ``UserDict`` or other dictionary
emulation), and the application is allowed to modify the dictionary in
any way it desires.  The dictionary must also include certain
Web3-required variables (described in a later section), and may also
include server-specific extension variables, named according to a
convention that will be described below.

When called by the server, the application object must return a tuple
of three elements, ``body``, ``status`` and ``headers`` (in the order
used by the examples above), or, if supported by an async server, an
argumentless callable which either returns ``None`` or a tuple of
those three elements.

The ``status`` element is a status in bytes of the form
``b'999 Message here'``.

``headers`` is a Python list of ``(header_name, header_value)`` pairs
describing the HTTP response header.  The ``headers`` structure must
be a literal Python list; it must yield two-tuples.  Both
``header_name`` and ``header_value`` must be bytes values.

The ``body`` is an iterable yielding zero or more bytes instances.
This can be accomplished in a variety of ways, such as by returning a
list containing bytes instances as ``body``, by returning a generator
that yields bytes instances as ``body``, or by the ``body`` being an
instance of a class which is iterable.  Regardless of how it is
accomplished, the application object must always return a ``body``
iterable yielding zero or more bytes instances.

The server or gateway must transmit the yielded bytes to the client in
an unbuffered fashion, completing the transmission of each set of
bytes before requesting another one.  (In other words, applications
should perform their own buffering.  See the `Buffering and
Streaming`_ section below for more on how application output must be
handled.)
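
The constraints on the three returned elements are simple enough that a server or a test suite could check them mechanically before writing anything to the client. The following is a minimal sketch, not something this specification defines. The name ``check_response`` is hypothetical. It handles only the synchronous tuple form, not a ``web3.async`` callable, and it follows the ``body, status, headers`` order used by the example applications. It also applies the control-character restrictions that this specification places on status and header values. Its header-name test is only a rough approximation of the RFC 2616 token rules.

```python
def _has_control_chars(data):
    # Control characters are bytes 0x00-0x1F and 0x7F (DEL).
    return any(byte < 0x20 or byte == 0x7F for byte in bytearray(data))


def check_response(response):
    """Hypothetical validator for a synchronous Web3 response.

    Returns the (body, status, headers) tuple unchanged, or raises
    ValueError describing the first problem found.  The body is not
    iterated, so generators are not consumed.
    """
    try:
        body, status, headers = response
    except (TypeError, ValueError):
        raise ValueError('response must yield exactly three values')

    # A bare bytes object is iterable but would yield ints on Python 3.
    if isinstance(body, (bytes, str)):
        raise ValueError('body must be an iterable of bytes, not bytes')
    try:
        iter(body)
    except TypeError:
        raise ValueError('body must be iterable')

    if not isinstance(status, bytes):
        raise ValueError('status must be bytes')
    code, sep, reason = status.partition(b' ')
    if not (len(code) == 3 and code.isdigit() and sep and reason):
        raise ValueError("status must look like b'200 OK'")
    if status != status.strip() or _has_control_chars(status):
        raise ValueError('status must not contain control characters '
                         'or surrounding whitespace')

    if type(headers) is not list:
        raise ValueError('headers must be a literal list')
    for item in headers:
        if not (isinstance(item, tuple) and len(item) == 2):
            raise ValueError('each header must be a two-tuple')
        name, value = item
        if not (isinstance(name, bytes) and isinstance(value, bytes)):
            raise ValueError('header names and values must be bytes')
        # Rough field-name check: non-empty, no colon, space or control.
        if (not name or b':' in name or b' ' in name
                or _has_control_chars(name)):
            raise ValueError('invalid header name %r' % (name,))
        if _has_control_chars(value):
            raise ValueError('header value for %r contains control '
                             'characters' % (name,))

    return body, status, headers
```

For example, ``check_response(([b'Hi'], b'200 OK', []))`` returns its argument unchanged, and a status of ``b'200 OK\r\n'`` raises ``ValueError``. A server would make this check only after confirming that the return value is not an async callable.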

The server or gateway should treat the yielded bytes as binary byte
sequences: in particular, it should ensure that line endings are not
altered.  The application is responsible for ensuring that the
string(s) to be written are in a format suitable for the client.  (The
server or gateway may apply HTTP transfer encodings, or perform
other transformations for the purpose of implementing HTTP features
such as byte-range transmission.  See `Other HTTP Features`_, below,
for more details.)

If the ``body`` iterable returned by the application has a ``close()``
method, the server or gateway must call that method upon
completion of the current request, whether the request was completed
normally, or terminated early due to an error.  This is to support
resource release by the application and is intended to complement PEP
325's generator support, and other common iterables with ``close()``
methods.

Finally, servers and gateways must not directly use any other
attributes of the ``body`` iterable returned by the application.


``environ`` Variables
---------------------

The ``environ`` dictionary is required to contain various CGI
environment variables, as defined by the Common Gateway Interface
specification [2]_.

The following CGI variables must be present.  Each key is a native
string.  Each value is a bytes instance.

.. note::

   In Python 3.1+, a "native string" is a ``str`` instance decoded
   using the ``surrogateescape`` error handler, as done by
   ``os.environ.__getitem__``.  In Python 2.6 and 2.7, a "native
   string" is a ``str`` instance representing a sequence of bytes.

``REQUEST_METHOD``
  The HTTP request method, such as ``"GET"`` or ``"POST"``.

``SCRIPT_NAME``
  The initial portion of the request URL's "path" that corresponds to
  the application object, so that the application knows its virtual
  "location".  This may be the empty bytes instance if the application
  corresponds to the "root" of the server.
  ``SCRIPT_NAME`` will be a bytes instance representing a sequence of
  URL-encoded segments separated by the slash character (``/``).  It
  is assumed that ``%2F`` characters will be decoded into literal
  slash characters within ``PATH_INFO``, as per CGI.

``PATH_INFO``
  The remainder of the request URL's "path", designating the virtual
  "location" of the request's target within the application.  This
  may be the empty bytes instance if the request URL targets the
  application root and does not have a trailing slash.  ``PATH_INFO``
  will be a bytes instance representing a sequence of URL-encoded
  segments separated by the slash character (``/``).  It is assumed
  that ``%2F`` characters will be decoded into literal slash
  characters within ``PATH_INFO``, as per CGI.

``QUERY_STRING``
  The portion of the request URL (in bytes) that follows the ``"?"``,
  if any, or the empty bytes instance.

``SERVER_NAME``, ``SERVER_PORT``
  When combined with ``SCRIPT_NAME`` and ``PATH_INFO`` (or their raw
  equivalents), these variables can be used to complete the URL.
  Note, however, that ``HTTP_HOST``, if present, should be used in
  preference to ``SERVER_NAME`` for reconstructing the request URL.
  See the `URL Reconstruction`_ section below for more detail.
  ``SERVER_PORT`` should be a bytes instance, not an integer.

``SERVER_PROTOCOL``
  The version of the protocol the client used to send the request.
  Typically this will be something like ``"HTTP/1.0"`` or
  ``"HTTP/1.1"`` and may be used by the application to determine how
  to treat any HTTP request headers.  (This variable should probably
  be called ``REQUEST_PROTOCOL``, since it denotes the protocol used
  in the request, and is not necessarily the protocol that will be
  used in the server's response.  However, for compatibility with CGI
  we have to keep the existing name.)

The following CGI values **may** be present in the Web3 environment.
Each key is a native string.
Each value is a bytes instance.

``CONTENT_TYPE``
  The contents of any ``Content-Type`` fields in the HTTP request.

``CONTENT_LENGTH``
  The contents of any ``Content-Length`` fields in the HTTP request.

``HTTP_`` Variables
  Variables corresponding to the client-supplied HTTP request headers
  (i.e., variables whose names begin with ``"HTTP_"``).  The presence
  or absence of these variables should correspond with the presence or
  absence of the appropriate HTTP header in the request.

A server or gateway **should** attempt to provide as many other CGI
variables as are applicable, each with a string for its key and a
bytes instance for its value.  In addition, if SSL is in use, the
server or gateway **should** also provide as many of the Apache SSL
environment variables [5]_ as are applicable, such as ``HTTPS=on`` and
``SSL_PROTOCOL``.  Note, however, that an application that uses any
CGI variables other than the ones listed above is necessarily
non-portable to web servers that do not support the relevant
extensions.  (For example, web servers that do not publish files will
not be able to provide a meaningful ``DOCUMENT_ROOT`` or
``PATH_TRANSLATED``.)

A Web3-compliant server or gateway **should** document what variables
it provides, along with their definitions as appropriate.
Applications **should** check for the presence of any variables they
require, and have a fallback plan in the event such a variable is
absent.

Note that CGI variable *values* must be bytes instances, if they are
present at all.  It is a violation of this specification for a CGI
variable's value to be of any type other than ``bytes``.  On Python 2,
this means they will be of type ``str``.
On Python 3, this means they
will be of type ``bytes``.

The *keys* of all CGI and non-CGI variables in the ``environ``,
however, must be "native strings" (on both Python 2 and Python 3, they
will be of type ``str``).

In addition to the CGI-defined variables, the ``environ`` dictionary
**may** also contain arbitrary operating-system "environment
variables", and **must** contain the following Web3-defined variables.

=====================  ===============================================
Variable               Value
=====================  ===============================================
``web3.version``       The tuple ``(1, 0)``, representing Web3
                       version 1.0.

``web3.url_scheme``    A bytes value representing the "scheme" portion of
                       the URL at which the application is being
                       invoked.  Normally, this will have the value
                       ``b"http"`` or ``b"https"``, as appropriate.

``web3.input``         An input stream (file-like object) from which bytes
                       constituting the HTTP request body can be read.
                       (The server or gateway may perform reads
                       on-demand as requested by the application, or
                       it may pre-read the client's request body and
                       buffer it in-memory or on disk, or use any
                       other technique for providing such an input
                       stream, according to its preference.)

``web3.errors``        An output stream (file-like object) to which error
                       output text can be written, for the purpose of
                       recording program or other errors in a
                       standardized and possibly centralized location.
                       This should be a "text mode" stream; i.e.,
                       applications should use ``"\n"`` as a line
                       ending, and assume that it will be converted to
                       the correct line ending by the server/gateway.
                       Applications may *not* send bytes to the
                       ``write`` method of this stream; they may only
                       send text.

                       For many servers, ``web3.errors`` will be the
                       server's main error log.  Alternatively, this
                       may be ``sys.stderr``, or a log file of some
                       sort.  The server's documentation should
                       include an explanation of how to configure this
                       or where to find the recorded output.  A server
                       or gateway may supply different error streams
                       to different applications, if this is desired.

``web3.multithread``   This value should evaluate true if the
                       application object may be simultaneously
                       invoked by another thread in the same process,
                       and should evaluate false otherwise.

``web3.multiprocess``  This value should evaluate true if an
                       equivalent application object may be
                       simultaneously invoked by another process, and
                       should evaluate false otherwise.

``web3.run_once``      This value should evaluate true if the server
                       or gateway expects (but does not guarantee!)
                       that the application will only be invoked this
                       one time during the life of its containing
                       process.  Normally, this will only be true for
                       a gateway based on CGI (or something similar).

``web3.script_name``   The non-URL-decoded ``SCRIPT_NAME`` value.
                       Through a historical inequity, by virtue of the
                       CGI specification, ``SCRIPT_NAME`` is present
                       within the environment as an already
                       URL-decoded string.  This is the original
                       URL-encoded value derived from the request URI.
                       If the server cannot provide this value, it
                       must omit it from the environ.

``web3.path_info``     The non-URL-decoded ``PATH_INFO`` value.
                       Through a historical inequity, by virtue of the
                       CGI specification, ``PATH_INFO`` is present
                       within the environment as an already
                       URL-decoded string.  This is the original
                       URL-encoded value derived from the request URI.
                       If the server cannot provide this value, it
                       must omit it from the environ.

``web3.async``         This is ``True`` if the webserver supports
                       async invocation.  In that case an application
                       is allowed to return a callable instead of a
                       tuple with the response.  The exact semantics
                       are not specified by this specification.

=====================  ===============================================

Finally, the ``environ`` dictionary may also contain server-defined
variables.  These variables should have names which are native
strings, composed of only lower-case letters, numbers, dots, and
underscores, and should be prefixed with a name that is unique to the
defining server or gateway.
For example, ``mod_web3`` might define
variables with names like ``mod_web3.some_variable``.


Input Stream
~~~~~~~~~~~~

The input stream (``web3.input``) provided by the server must support
the following methods:

=====================  ========
Method                 Notes
=====================  ========
``read(size)``         1,4
``readline([size])``   1,2,4
``readlines([size])``  1,3,4
``__iter__()``         4
=====================  ========

The semantics of each method are as documented in the Python Library
Reference, except for these notes as listed in the table above:

1. The server is not required to read past the client's specified
   ``Content-Length``, and is allowed to simulate an end-of-file
   condition if the application attempts to read past that point.  The
   application **should not** attempt to read more data than is
   specified by the ``CONTENT_LENGTH`` variable.

2. The implementation must support the optional ``size`` argument to
   ``readline()``.

3. The application is free to not supply a ``size`` argument to
   ``readlines()``, and the server or gateway is free to ignore the
   value of any supplied ``size`` argument.

4. The ``read``, ``readline`` and ``__iter__`` methods must return a
   bytes instance.  The ``readlines`` method must return a sequence
   which contains instances of bytes.

The methods listed in the table above **must** be supported by all
servers conforming to this specification.  Applications conforming to
this specification **must not** use any other methods or attributes of
the ``input`` object.  In particular, applications **must not**
attempt to close this stream, even if it possesses a ``close()``
method.

The input stream should silently ignore attempts to read more than the
content length of the request.
If no content length is specified, the
stream must be a dummy stream that does not return anything.


Error Stream
~~~~~~~~~~~~

The error stream (``web3.errors``) provided by the server must support
the following methods:

===================   ==========  ========
Method                Stream      Notes
===================   ==========  ========
``flush()``           ``errors``  1
``write(str)``        ``errors``  2
``writelines(seq)``   ``errors``  2
===================   ==========  ========

The semantics of each method are as documented in the Python Library
Reference, except for these notes as listed in the table above:

1. Since the ``errors`` stream may not be rewound, servers and
   gateways are free to forward write operations immediately, without
   buffering.  In this case, the ``flush()`` method may be a no-op.
   Portable applications, however, cannot assume that output is
   unbuffered or that ``flush()`` is a no-op.  They must call
   ``flush()`` if they need to ensure that output has in fact been
   written.  (For example, to minimize intermingling of data from
   multiple processes writing to the same error log.)

2. The ``write()`` method must accept a string argument, but needn't
   necessarily accept a bytes argument.  The ``writelines()`` method
   must accept a sequence argument that consists entirely of strings,
   but needn't necessarily accept any bytes instance as a member of
   the sequence.

The methods listed in the table above **must** be supported by all
servers conforming to this specification.  Applications conforming to
this specification **must not** use any other methods or attributes of
the ``errors`` object.
+In particular, applications **must not**
+attempt to close this stream, even if it possesses a ``close()``
+method.
+
+
+Values Returned by A Web3 Application
+-------------------------------------
+
+Web3 applications return an iterable in the form (``status``,
+``headers``, ``body``).  The return value can be any iterable type
+that returns exactly three values.  If the server supports
+asynchronous applications (``web3.async``), the response may be a
+callable object (which accepts no arguments).
+
+The ``status`` value is assumed by a gateway or server to be an HTTP
+"status" bytes instance like ``b'200 OK'`` or ``b'404 Not Found'``.
+That is, it is a string consisting of a Status-Code and a
+Reason-Phrase, in that order and separated by a single space, with no
+surrounding whitespace or other characters.  (See RFC 2616, Section
+6.1.1 for more information.)  The string **must not** contain control
+characters, and must not be terminated with a carriage return,
+linefeed, or combination thereof.
+
+The ``headers`` value is assumed by a gateway or server to be a
+literal Python list of ``(header_name, header_value)`` tuples.  Each
+``header_name`` must be a bytes instance representing a valid HTTP
+header field-name (as defined by RFC 2616, Section 4.2), without a
+trailing colon or other punctuation.  Each ``header_value`` must be a
+bytes instance and **must not** include any control characters,
+including carriage returns or linefeeds, either embedded or at the
+end.
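To make the status and header rules concrete, here is a minimal sketch of a check that a server or gateway could apply to an application's return value. It is not part of the proposal. ``validate_response`` and the regular expressions are hypothetical, and the field-name check assumes the RFC 2616 ``token`` grammar. Raising ``ValueError`` is only one possible way to report a violation.

```python
import re

# Status-Code, one space, then a Reason-Phrase with no control characters
# and no leading or trailing whitespace (RFC 2616, section 6.1.1).
_STATUS_RE = re.compile(
    br'\A[0-9]{3} [^\x00-\x20\x7f](?:[^\x00-\x1f\x7f]*[^\x00-\x20\x7f])?\Z')
# A field-name is an RFC 2616 "token": no separators, spaces or controls.
_TOKEN_RE = re.compile(br"\A[!#$%&'*+.^_`|~0-9A-Za-z-]+\Z")
# Any control character (this includes CR, LF and TAB).
_CTL_RE = re.compile(br'[\x00-\x1f\x7f]')


def validate_response(status, headers):
    """Raise ValueError if status or headers break the rules above."""
    if not isinstance(status, bytes) or not _STATUS_RE.match(status):
        raise ValueError('invalid status: %r' % (status,))
    # The headers value must be a literal list, not merely an iterable.
    if type(headers) is not list:
        raise ValueError('headers must be a literal list')
    for item in headers:
        if not isinstance(item, tuple) or len(item) != 2:
            raise ValueError('header is not a two-tuple: %r' % (item,))
        name, value = item
        if not isinstance(name, bytes) or not _TOKEN_RE.match(name):
            raise ValueError('invalid header name: %r' % (name,))
        if not isinstance(value, bytes) or _CTL_RE.search(value):
            raise ValueError('invalid header value: %r' % (value,))
```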
+(These requirements are to minimize the complexity of any
+parsing that must be performed by servers, gateways, and intermediate
+response processors that need to inspect or modify response headers.)
+
+In general, the server or gateway is responsible for ensuring that
+correct headers are sent to the client: if the application omits a
+header required by HTTP (or other relevant specifications that are in
+effect), the server or gateway **must** add it.  For example, the HTTP
+``Date:`` and ``Server:`` headers would normally be supplied by the
+server or gateway.  However, the gateway must not override values with
+the same name if they are emitted by the application.
+
+(A reminder for server/gateway authors: HTTP header names are
+case-insensitive, so be sure to take that into consideration when
+examining application-supplied headers!)
+
+Applications and middleware are forbidden from using HTTP/1.1
+"hop-by-hop" features or headers, any equivalent features in HTTP/1.0,
+or any headers that would affect the persistence of the client's
+connection to the web server.  These features are the exclusive
+province of the actual web server, and a server or gateway **should**
+consider it a fatal error for an application to attempt sending them,
+and raise an error if they are supplied as return values from an
+application in the ``headers`` structure.  (For more specifics on
+"hop-by-hop" features and headers, please see the `Other HTTP
+Features`_ section below.)
+
+
+Dealing with Compatibility Across Python Versions
+-------------------------------------------------
+
+Creating Web3 code that runs under both Python 2.6/2.7 and Python 3.1+
+requires some care on the part of the developer.  In general, the Web3
+specification assumes a certain level of equivalence between the
+Python 2 str type and the Python 3 bytes type.
+For example,
+under Python 2, the values present in the Web3 environ will be
+instances of the str type; in Python 3, these will be instances of
+the bytes type.  The Python 3 bytes type does not possess all
+the methods of the Python 2 str type, and some methods which it
+does possess behave differently than the Python 2 str type.
+Effectively, to ensure that Web3 middleware and applications work
+across Python versions, developers must do these things:
+
+#) Do not assume comparison equivalence between text values and bytes
+   values.  If you do so, your code may work under Python 2, but it
+   will not work properly under Python 3.  For example, don't write
+   ``somebytes == 'abc'``.  This will sometimes be true on Python 2
+   but it will never be true on Python 3, because a sequence of bytes
+   never compares equal to a string under Python 3.  Instead, always
+   compare a bytes value with a bytes value, e.g.
+   ``somebytes == b'abc'``.  Code which does this is compatible with
+   and works the same in Python 2.6, 2.7, and 3.1.  The ``b`` in front
+   of ``'abc'`` signals to Python 3 that the value is a literal bytes
+   instance; under Python 2 it's a forward compatibility placebo.
+
+#) Don't use the ``__contains__`` method (directly or indirectly) of
+   items that are meant to be byteslike without ensuring that its
+   argument is also a bytes instance.  If you do so, your code may
+   work under Python 2, but it will not work properly under Python 3.
+   For example, ``'abc' in somebytes`` will raise a ``TypeError``
+   under Python 3, but under Python 2.6 and 2.7 it will return
+   ``True`` if ``somebytes`` contains ``abc``.  However,
+   ``b'abc' in somebytes`` will work the same on both versions.  In
+   Python 3.2, this restriction may be partially removed, as it's
+   rumored that bytes types may obtain a ``__mod__`` implementation.
+
+#) ``__getitem__`` should not be used.
+
+   ..
+      XXX
+
+#) Don't try to use the ``format`` method or the ``__mod__`` method of
+   instances of bytes (directly or indirectly).  In Python 2, the str
+   type, which we treat as equivalent to Python 3's bytes, supports
+   these methods, but actual Python 3 bytes instances don't support
+   them.  If you use these methods, your code will work under Python
+   2, but not under Python 3.
+
+#) Do not try to concatenate a bytes value with a string value.  This
+   may work under Python 2, but it will not work under Python 3.  For
+   example, doing ``'abc' + somebytes`` will work under Python 2, but
+   it will result in a ``TypeError`` under Python 3.  Instead, always
+   make sure you're concatenating two items of the same type,
+   e.g. ``b'abc' + somebytes``.
+
+Web3 expects byte values in other places, such as in all the values
+returned by an application.
+
+In short, to ensure compatibility of Web3 application code between
+Python 2 and Python 3, in Python 2, treat CGI and server variable
+values in the environment as if they had the Python 3 bytes API
+even though they actually have a more capable API.  Likewise for all
+stringlike values returned by a Web3 application.
+
+
+Buffering and Streaming
+-----------------------
+
+Generally speaking, applications will achieve the best throughput by
+buffering their (modestly-sized) output and sending it all at once.
+This is a common approach in existing frameworks: the output is
+buffered in a StringIO or similar object, then transmitted all at
+once, along with the response headers.
+
+The corresponding approach in Web3 is for the application to simply
+return a single-element body iterable (such as a list) containing
+the response body as a single bytes instance.
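A minimal sketch of this buffered style follows. It assumes the ``(status, headers, body)`` return value described earlier and keeps every value as bytes, as the compatibility rules above require. The routing on ``PATH_INFO`` and the response texts are hypothetical.

```python
def application(environ):
    """Hypothetical Web3 application that buffers its whole response."""
    # environ values are bytes, so compare them only with bytes literals;
    # this behaves the same on Python 2.6/2.7 and 3.1+.
    if environ.get('PATH_INFO', b'/') == b'/':
        status = b'200 OK'
        body = b'Hello, Web3!\n'
    else:
        status = b'404 Not Found'
        body = b'Not Found\n'
    headers = [
        (b'Content-Type', b'text/plain; charset=utf-8'),
        # Build the length as text, then encode it to bytes.
        (b'Content-Length', str(len(body)).encode('ascii')),
    ]
    # The entire body is returned as one bytes element in a list.
    return status, headers, [body]
```

Because the body is complete before the application returns, the application can also report an accurate ``Content-Length``.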
+This is the recommended
+approach for the vast majority of application functions that render
+HTML pages whose text easily fits in memory.
+
+For large files, however, or for specialized uses of HTTP streaming
+(such as multipart "server push"), an application may need to provide
+output in smaller blocks (e.g. to avoid loading a large file into
+memory).  It's also sometimes the case that part of a response may be
+time-consuming to produce, but it would be useful to send ahead the
+portion of the response that precedes it.
+
+In these cases, applications will usually return a body iterator
+(often a generator-iterator) that produces the output in a
+block-by-block fashion.  These blocks may be broken to coincide with
+multipart boundaries (for "server push"), or just before
+time-consuming tasks (such as reading another block of an on-disk
+file).
+
+Web3 servers, gateways, and middleware **must not** delay the
+transmission of any block; they **must** either fully transmit the
+block to the client, or guarantee that they will continue transmission
+even while the application is producing its next block.  A
+server/gateway or middleware may provide this guarantee in one of
+three ways:
+
+1. Send the entire block to the operating system (and request that any
+   O/S buffers be flushed) before returning control to the
+   application, OR
+
+2. Use a different thread to ensure that the block continues to be
+   transmitted while the application produces the next block.
+
+3. (Middleware only) send the entire block to its parent
+   gateway/server.
+
+By providing this guarantee, Web3 allows applications to ensure that
+transmission will not become stalled at an arbitrary point in their
+output data.  This is critical for proper functioning of e.g.
+multipart "server push" streaming, where data between multipart
+boundaries should be transmitted in full to the client.
+
+
+Unicode Issues
+--------------
+
+HTTP does not directly support Unicode, and neither does this
+interface.  All encoding/decoding must be handled by the
+**application**; all values passed to or from the server must be of
+the Python 3 type bytes or instances of the Python 2 type str,
+not Python 2 unicode or Python 3 str objects.
+
+All "bytes instances" referred to in this specification **must**:
+
+- On Python 2, be of type str.
+
+- On Python 3, be of type bytes.
+
+All "bytes instances" **must not**:
+
+- On Python 2, be of type unicode.
+
+- On Python 3, be of type str.
+
+The result of using a textlike object where a byteslike object is
+required is undefined.
+
+Values returned from a Web3 app as a status or as response headers
+**must** follow RFC 2616 with respect to encoding.  That is, the bytes
+returned must contain a character stream of ISO-8859-1 characters, or
+the character stream should use RFC 2047 MIME encoding.
+
+On Python platforms which do not have a native bytes-like type
+(e.g. IronPython, etc.), but instead which generally use textlike
+strings to represent bytes data, the definition of "bytes instance"
+can be changed: their "bytes instances" must be native strings that
+contain only code points representable in ISO-8859-1 encoding
+(``\u0000`` through ``\u00FF``, inclusive).  It is a fatal error for
+an application on such a platform to supply strings containing any
+other Unicode character or code point.  Similarly, servers and
+gateways on those platforms **must not** supply strings to an
+application containing any other Unicode characters.
+
+..
+   XXX (armin: Jython now has a bytes type, we might remove this
+   section after seeing about IronPython)
+
+
+HTTP 1.1 Expect/Continue
+------------------------
+
+Servers and gateways that implement HTTP 1.1 **must** provide
+transparent support for HTTP 1.1's "expect/continue" mechanism.  This
+may be done in any of several ways:
+
+1. Respond to requests containing an ``Expect: 100-continue`` request
+   with an immediate "100 Continue" response, and proceed normally.
+
+2. Proceed with the request normally, but provide the application with
+   a ``web3.input`` stream that will send the "100 Continue" response
+   if/when the application first attempts to read from the input
+   stream.  The read request must then remain blocked until the client
+   responds.
+
+3. Wait until the client decides that the server does not support
+   expect/continue, and sends the request body on its own.  (This is
+   suboptimal, and is not recommended.)
+
+Note that these behavior restrictions do not apply for HTTP 1.0
+requests, or for requests that are not directed to an application
+object.  For more information on HTTP 1.1 Expect/Continue, see RFC
+2616, sections 8.2.3 and 10.1.1.
+
+
+Other HTTP Features
+-------------------
+
+In general, servers and gateways should "play dumb" and allow the
+application complete control over its output.  They should only make
+changes that do not alter the effective semantics of the application's
+response.  It is always possible for the application developer to add
+middleware components to supply additional features, so server/gateway
+developers should be conservative in their implementation.  In a
+sense, a server should consider itself to be like an HTTP "gateway
+server", with the application being an HTTP "origin server".
+(See RFC 2616, section 1.3, for the definition of these terms.)
+
+However, because Web3 servers and applications do not communicate via
+HTTP, what RFC 2616 calls "hop-by-hop" headers do not apply to Web3
+internal communications.  Web3 applications **must not** generate any
+"hop-by-hop" headers [4]_, attempt to use HTTP features that would
+require them to generate such headers, or rely on the content of any
+incoming "hop-by-hop" headers in the environ dictionary.  Web3
+servers **must** handle any supported inbound "hop-by-hop" headers on
+their own, such as by decoding any inbound ``Transfer-Encoding``,
+including chunked encoding if applicable.
+
+Applying these principles to a variety of HTTP features, it should be
+clear that a server **may** handle cache validation via the
+``If-None-Match`` and ``If-Modified-Since`` request headers and the
+``Last-Modified`` and ``ETag`` response headers.  However, it is not
+required to do this, and the application **should** perform its own
+cache validation if it wants to support that feature, since the
+server/gateway is not required to do such validation.
+
+Similarly, a server **may** re-encode or transport-encode an
+application's response, but the application **should** use a suitable
+content encoding on its own, and **must not** apply a transport
+encoding.  A server **may** transmit byte ranges of the application's
+response if requested by the client, and the application doesn't
+natively support byte ranges.
+Again, however, the application
+**should** perform this function on its own if desired.
+
+Note that these restrictions on applications do not necessarily mean
+that every application must reimplement every HTTP feature; many HTTP
+features can be partially or fully implemented by middleware
+components, thus freeing both server and application authors from
+implementing the same features over and over again.
+
+
+Thread Support
+--------------
+
+Thread support, or lack thereof, is also server-dependent.  Servers
+that can run multiple requests in parallel **should** also provide
+the option of running an application in a single-threaded fashion, so
+that applications or frameworks that are not thread-safe may still be
+used with that server.
+
+
+Implementation/Application Notes
+================================
+
+Server Extension APIs
+---------------------
+
+Some server authors may wish to expose more advanced APIs that
+application or framework authors can use for specialized purposes.
+For example, a gateway based on ``mod_python`` might wish to expose
+part of the Apache API as a Web3 extension.
+
+In the simplest case, this requires nothing more than defining an
+environ variable, such as ``mod_python.some_api``.  But, in many
+cases, the possible presence of middleware can make this difficult.
+For example, an API that offers access to the same HTTP headers that
+are found in environ variables might return different data if
+environ has been modified by middleware.
+
+In general, any extension API that duplicates, supplants, or bypasses
+some portion of Web3 functionality runs the risk of being incompatible
+with middleware components.
+Server/gateway developers should *not*
+assume that nobody will use middleware, because some framework
+developers specifically organize their frameworks to function almost
+entirely as middleware of various kinds.
+
+So, to provide maximum compatibility, servers and gateways that
+provide extension APIs that replace some Web3 functionality **must**
+design those APIs so that they are invoked using the portion of the
+API that they replace.  For example, an extension API to access HTTP
+request headers must require the application to pass in its current
+environ, so that the server/gateway may verify that HTTP headers
+accessible via the API have not been altered by middleware.  If the
+extension API cannot guarantee that it will always agree with
+environ about the contents of HTTP headers, it must refuse service
+to the application, e.g. by raising an error, returning ``None``
+instead of a header collection, or whatever is appropriate to the API.
+
+These guidelines also apply to middleware that adds information such
+as parsed cookies, form variables, sessions, and the like to
+environ.  Specifically, such middleware should provide these
+features as functions which operate on environ, rather than simply
+stuffing values into environ.  This helps ensure that information
+is calculated from environ *after* any middleware has done any URL
+rewrites or other environ modifications.
+
+It is very important that these "safe extension" rules be followed by
+both server/gateway and middleware developers, in order to avoid a
+future in which middleware developers are forced to delete any and all
+extension APIs from environ to ensure that their mediation isn't
+being bypassed by applications using those extensions!
+
+
+Application Configuration
+-------------------------
+
+This specification does not define how a server selects or obtains an
+application to invoke.
+These and other configuration options are
+highly server-specific matters.  It is expected that server/gateway
+authors will document how to configure the server to execute a
+particular application object, and with what options (such as
+threading options).
+
+Framework authors, on the other hand, should document how to create an
+application object that wraps their framework's functionality.  The
+user, who has chosen both the server and the application framework,
+must connect the two together.  However, since both the framework and
+the server have a common interface, this should be merely a mechanical
+matter, rather than a significant engineering effort for each new
+server/framework pair.
+
+Finally, some applications, frameworks, and middleware may wish to use
+the environ dictionary to receive simple string configuration
+options.  Servers and gateways **should** support this by allowing an
+application's deployer to specify name-value pairs to be placed in
+environ.  In the simplest case, this support can consist merely of
+copying all operating system-supplied environment variables from
+``os.environ`` into the environ dictionary, since the deployer in
+principle can configure these externally to the server, or in the CGI
+case they may be able to be set via the server's configuration files.
+
+Applications **should** try to keep such required variables to a
+minimum, since not all servers will support easy configuration of
+them.
+Of course, even in the worst case, persons deploying an
+application can create a script to supply the necessary configuration
+values::
+
+   from the_app import application
+
+   def new_app(environ):
+       environ['the_app.configval1'] = b'something'
+       return application(environ)
+
+But, most existing applications and frameworks will probably only need
+a single configuration value from environ, to indicate the
+location of their application or framework-specific configuration
+file(s).  (Of course, applications should cache such configuration, to
+avoid having to re-read it upon each invocation.)
+
+
+URL Reconstruction
+------------------
+
+If an application wishes to reconstruct a request's complete URL (as a
+bytes object), it may do so using the following algorithm::
+
+    # url_quote is assumed to be a helper that percent-encodes a
+    # bytes path and returns bytes.
+    host = environ.get('HTTP_HOST')
+
+    scheme = environ['web3.url_scheme']
+    port = environ['SERVER_PORT']
+    query = environ['QUERY_STRING']
+
+    url = scheme + b'://'
+
+    if host:
+        url += host
+    else:
+        url += environ['SERVER_NAME']
+
+        if scheme == b'https':
+            if port != b'443':
+                url += b':' + port
+        else:
+            if port != b'80':
+                url += b':' + port
+
+    # web3.script_name and web3.path_info are already URL-encoded, so
+    # only the decoded SCRIPT_NAME/PATH_INFO fallbacks need quoting.
+    if 'web3.script_name' in environ:
+        url += environ['web3.script_name']
+    else:
+        url += url_quote(environ['SCRIPT_NAME'])
+    if 'web3.path_info' in environ:
+        url += environ['web3.path_info']
+    else:
+        url += url_quote(environ['PATH_INFO'])
+    if query:
+        url += b'?' + query
+
+Note that such a reconstructed URL may not be precisely the same URI
+as requested by the client.  Server rewrite rules, for example, may
+have modified the client's originally requested URL to place it in a
+canonical form.
+
+
+Open Questions
+==============
+
+- filewrapper replacement.
+  Currently nothing is specified here,
+  but it's clear that the old system of in-band signalling is broken
+  if it does not provide a way for middleware in the pipeline to
+  figure out whether the response is a file wrapper.
+
+
+Points of Contention
+====================
+
+Outlined below are potential points of contention regarding this
+specification.
+
+
+WSGI 1.0 Compatibility
+----------------------
+
+Components written using the WSGI 1.0 specification will not
+transparently interoperate with components written using this
+specification.  That's because the goals of this proposal and the
+goals of WSGI 1.0 are not directly aligned.
+
+WSGI 1.0 is obliged to provide specification-level backwards
+compatibility with versions of Python between 2.2 and 2.7.  This
+specification, however, ditches Python 2.5 and lower compatibility in
+order to provide compatibility between relatively recent versions of
+Python 2 (2.6 and 2.7) as well as relatively recent versions of Python
+3 (3.1).
+
+It is currently impossible to write components which work reliably
+under both Python 2 and Python 3 using the WSGI 1.0 specification,
+because the specification implicitly posits that CGI and server
+variable values in the environ and values returned via
+``start_response`` represent a sequence of bytes that can be addressed
+using the Python 2 string API.  It posits such a thing because that
+sort of data type was the sensible way to represent bytes in all
+Python 2 versions, and WSGI 1.0 was conceived before Python 3 existed.
+
+Python 3's str type supports the full API provided by the Python 2
+str type, but Python 3's str type does not represent a
+sequence of bytes; it instead represents text.  Therefore, using it to
+represent environ values also requires that the environ byte sequence
+be decoded to text via some encoding.
+We cannot decode these bytes to
+text (at least in any way where the decoding has any meaning other
+than as a tunnelling mechanism) without widening the scope of WSGI to
+include server and gateway knowledge of decoding policies and
+mechanics.  WSGI 1.0 never concerned itself with encoding and
+decoding.  It made statements about allowable transport values, and
+suggested that various values might be best decoded as one encoding or
+another, but it never required a server to *perform* any decoding
+before
+
+Python 3 does not have a stringlike type that can be used instead to
+represent bytes: it has a bytes type.  A bytes type operates quite
+a bit like a Python 2 str in Python 3.1+, but it lacks behavior
+equivalent to ``str.__mod__`` and its iteration protocol, and
+containment, sequence treatment, and equivalence comparisons are
+different.
+
+In either case, there is no type in Python 3 that behaves just like
+the Python 2 str type, and a way to create such a type doesn't
+exist because there is no such thing as a "String ABC" which would
+allow a suitable type to be built.  Due to this design
+incompatibility, existing WSGI 1.0 servers, middleware, and
+applications will not work under Python 3, even after they are run
+through 2to3.
+
+Existing Web-SIG discussions about updating the WSGI specification so
+that it is possible to write a WSGI application that runs in both
+Python 2 and Python 3 tend to revolve around creating a
+specification-level equivalence between the Python 2 str type
+(which represents a sequence of bytes) and the Python 3 str type
+(which represents text).  Such an equivalence becomes strained in
+various areas, given the different roles of these types.  An arguably
+more straightforward equivalence exists between the Python 3 bytes
+type API and a subset of the Python 2 str type API.
+This specification exploits this subset equivalence.
+
+In the meantime, aside from any Python 2 vs. Python 3 compatibility
+issue, as various discussions on Web-SIG have pointed out, the WSGI
+1.0 specification is too general, providing support (via ``.write``)
+for asynchronous applications at the expense of implementation
+complexity.  This specification uses the fundamental incompatibility
+between WSGI 1.0 and Python 3 as a natural divergence point to create
+a specification with reduced complexity by changing specialized
+support for asynchronous applications.
+
+To provide backwards compatibility for older WSGI 1.0 applications, so
+that they may run on a Web3 stack, it is presumed that Web3 middleware
+will be created which can be used "in front" of existing WSGI 1.0
+applications, allowing those existing WSGI 1.0 applications to run
+under a Web3 stack.  This middleware will require, when under Python
+3, an equivalence to be drawn between Python 3 str types and the
+bytes values represented by the HTTP request and all the attendant
+encoding-guessing (or configuration) it implies.
+
+.. note::
+
+   Such middleware *might* in the future, instead of drawing an
+   equivalence between Python 3 str and HTTP byte values, make use
+   of a yet-to-be-created "ebytes" type (aka "bytes-with-benefits"),
+   particularly if a String ABC proposal is accepted into the Python
+   core and implemented.
+
+Conversely, it is presumed that WSGI 1.0 middleware will be created
+which will allow a Web3 application to run behind a WSGI 1.0 stack on
+the Python 2 platform.
+
+
+Environ and Response Values as Bytes
+------------------------------------
+
+Casual middleware and application writers may consider the use of
+bytes as environment values and response values inconvenient.
+In particular, they won't be able to use common string formatting
+functions such as ``('%s' % bytesval)`` or
+``bytesval.format('123')`` because bytes don't have the same API as
+strings on platforms such as Python 3 where the two types differ.
+Likewise, on such platforms, stdlib HTTP-related API support for using
+bytes interchangeably with text can be spotty.  In places where bytes
+are inconvenient or incompatible with library APIs, middleware and
+application writers will have to decode such bytes to text explicitly.
+This is particularly inconvenient for middleware writers: to work with
+environment values as strings, they'll have to decode them from an
+implied encoding, and if they need to mutate an environ value, they'll
+then need to encode the value into a byte stream before placing it
+into the environ.  While the use of bytes by the specification as
+environ values might be inconvenient for casual developers, it
+provides several benefits.
+
+Using bytes types to represent HTTP and server values to an
+application most closely matches reality because HTTP is fundamentally
+a bytes-oriented protocol.  If the environ values are mandated to be
+strings, each server will need to use heuristics to guess about the
+encoding of various values provided by the HTTP environment.  Using
+all strings might increase casual middleware writer convenience, but
+will also lead to ambiguity and confusion when a value cannot be
+decoded to a meaningful non-surrogate string.
+
+Use of bytes as environ values avoids any need for the specification
+to mandate that a participating server be informed of encoding
+configuration parameters.  If environ values are treated
+as strings, and so must be decoded from bytes, configuration
+parameters may eventually become necessary as policy clues from the
+application deployer.
+Such a policy would be used to guess an
+appropriate decoding strategy in various circumstances, effectively
+placing the burden for enforcing a particular application encoding
+policy upon the server.  If the server must serve more than one
+application, such configuration would quickly become complex.  Many
+policies would also be impossible to express declaratively.
+
+In reality, HTTP is a complicated and legacy-fraught protocol which
+requires a complex set of heuristics to make sense of.  It would be
+nice if we could allow this protocol to protect us from this
+complexity, but we cannot do so reliably while still providing to
+application writers a level of control commensurate with reality.
+Python applications must often deal with data embedded in the
+environment which not only must be parsed by legacy heuristics, but
+*does not conform even to any existing HTTP specification*.  While
+these eventualities are unpleasant, they crop up with regularity,
+making it impossible and undesirable to hide them from application
+developers, as application developers are the only people who are able
+to decide upon an appropriate action when an HTTP specification
+violation is detected.
+
+Some have argued for mixed use of bytes and string values as environ
+*values*.  This proposal avoids that strategy.  Sole use of bytes as
+environ values makes it possible to fit this specification entirely in
+one's head; you won't need to guess about which values are strings and
+which are bytes.
+
+This protocol would also fit in a developer's head if all environ
+values were strings, but this specification doesn't use that strategy.
+This will likely be the point of greatest contention regarding the use
+of bytes.
+In defense of bytes: developers often prefer protocols with
+consistent contracts, even if the contracts themselves are suboptimal.
+If we hide encoding issues from a developer until a value that
+contains surrogates causes problems after it has already reached
+beyond the I/O boundary of their application, they will need to do a
+lot more work to fix assumptions made by their application than if we
+were to just present the problem much earlier in terms of "here's some
+bytes, you decode them".  This is also a counter-argument to the
+"bytes are inconvenient" assumption: while presenting bytes to an
+application developer may be inconvenient for a casual application
+developer who doesn't care about edge cases, they are extremely
+convenient for the application developer who needs to deal with
+complex, dirty eventualities, because use of bytes allows him the
+appropriate level of control with a clear separation of
+responsibility.
+
+If the protocol uses bytes, it is presumed that libraries will be
+created to make working with bytes-only in the environ and within
+return values more pleasant; for example, analogues of the WSGI 1.0
+libraries named "WebOb" and "Werkzeug".  Such libraries will fill the
+gap between convenience and control, allowing the spec to remain
+simple and regular while still allowing casual authors a convenient
+way to create Web3 middleware and application components.  This seems
+to be a reasonable alternative to baking encoding policy into the
+protocol, because many such libraries can be created independently
+from the protocol, and application developers can choose the one that
+provides them the appropriate levels of control and convenience for a
+particular job.
+
+Here are some alternatives to using all bytes:
+
+- Have the server decode all values representing CGI and server
+  environ values into strings using the latin-1 encoding, which is
+  lossless.
+  Smuggle any undecodable bytes within the resulting string.
+
+- Decode all CGI and server environ values into strings using the
+  utf-8 encoding with the surrogateescape error handler.  This
+  does not work under any existing version of Python 2, which lacks
+  the surrogateescape error handler.
+
+- Encode some values into bytes and other values into strings, as
+  decided by their typical usages.
+
+
+Applications Should be Allowed to Read web3.input Past CONTENT_LENGTH
+-----------------------------------------------------------------------------
+
+At [6], Graham Dumpleton makes the assertion that wsgi.input
+should be required to return the empty string as a signifier of
+out-of-data, and that applications should be allowed to read past the
+number of bytes specified in CONTENT_LENGTH, depending only upon
+the empty string as an EOF marker.  WSGI relies on an application
+"being well behaved and once all data specified by CONTENT_LENGTH
+is read, that it processes the data and returns any response. That
+same socket connection could then be used for a subsequent request."
+Graham would like WSGI adapters to be required to wrap raw socket
+connections: "this wrapper object will need to count how much data has
+been read, and when the amount of data reaches that as defined by
+CONTENT_LENGTH, any subsequent reads should return an empty string
+instead."  This may be useful to support chunked encoding and input
+filters.
+
+
+web3.input Unknown Length
+-----------------------------
+
+There is no documented way to indicate that environ['web3.input']
+has content whose length is unknown.
+
+
+read() of web3.input Should Support No-Size Calling Convention
+----------------------------------------------------------------------
+
+At [6], Graham Dumpleton makes the assertion that the read()
+method of wsgi.input should be callable without arguments, and
+that the result should be "all available request content".
+Needs discussion.
+
+Comment Armin: I changed the spec to require that of implementations.
+I had too much pain with that in the past already.  Open for
+discussion, though.
+
+
+Input Filters should set environ CONTENT_LENGTH to -1
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+At [6], Graham Dumpleton suggests that an input filter might set
+environ['CONTENT_LENGTH'] to -1 to indicate that it mutated the
+input.
+
+
+headers as Literal List of Two-Tuples
+-----------------------------------------
+
+Why do we make applications return a headers structure that is a
+literal list of two-tuples?  I think the iterability of headers
+needs to be maintained while it moves up the stack, but I don't think
+we need to be able to mutate it in place at all times.  Could we
+loosen that requirement?
+
+Comment Armin: Strong yes
+
+
+Removed Requirement that Middleware Not Block
+---------------------------------------------
+
+This requirement was removed: "middleware components **must not**
+block iteration waiting for multiple values from an application
+iterable.  If the middleware needs to accumulate more data from the
+application before it can produce any output, it **must** yield an
+empty string."  This requirement existed to support asynchronous
+applications and servers (see PEP 333's "Middleware Handling of Block
+Boundaries").  Asynchronous applications are now serviced explicitly
+by the web3.async-capable protocol (a Web3 application callable may
+itself return a callable).
+
+
+web3.script_name and web3.path_info
+-------------------------------------------
+
+These values are required to be placed into the environment by an
+origin server under this specification.  Unlike SCRIPT_NAME and
+PATH_INFO, these must be the original URL-encoded variants
+derived from the request URI.
+We probably need to figure out how these should be computed
+originally, and what their values should be if the server performs
+URL rewriting.
+
+
+Long Response Headers
+---------------------
+
+Bob Brewer notes on Web-SIG [7]:
+
+    Each header_value must not include any control characters,
+    including carriage returns or linefeeds, either embedded or at the
+    end.  (These requirements are to minimize the complexity of any
+    parsing that must be performed by servers, gateways, and
+    intermediate response processors that need to inspect or modify
+    response headers.) [1]
+
+That's understandable, but HTTP headers are defined as (mostly)
+*TEXT, and "words of *TEXT MAY contain characters from character
+sets other than ISO-8859-1 only when encoded according to the rules of
+RFC 2047."  [2] And RFC 2047 specifies that "an 'encoded-word' may
+not be more than 75 characters long...  If it is desirable to encode
+more text than will fit in an 'encoded-word' of 75 characters,
+multiple 'encoded-word's (separated by CRLF SPACE) may be used." [3]
+This satisfies HTTP header folding rules, as well: "Header fields can
+be extended over multiple lines by preceding each extra line with at
+least one SP or HT." [1]
+
+So in my reading of HTTP, some code somewhere should introduce
+newlines in longish, encoded response header values.  I see three
+options:
+
+1. Keep things as they are and disallow response header values if they
+   contain words over 75 chars that are outside the ISO-8859-1
+   character set.
+
+2. Allow newline characters in WSGI response headers.
+
+3. Require/strongly suggest WSGI servers to do the encoding and
+   folding before sending the value over HTTP.
+
+
+Request Trailers and Chunked Transfer Encoding
+----------------------------------------------
+
+When using chunked transfer encoding on request content, the RFCs
+allow there to be request trailers.  These are like request headers
+but come after the final null data chunk.
+These trailers are only available when the chunked data stream is of
+finite length and when it has all been read in.  Neither WSGI nor Web3
+currently supports them.
+
+.. XXX (armin) each yield from the application iterator should be
+   specified as a write plus flush by the server.
+
+.. XXX (armin) websocket API.
+
+
+References
+==========
+
+.. [1] PEP 333: Python Web Services Gateway Interface
+   (http://www.python.org/dev/peps/pep-0333/)
+
+.. [2] The Common Gateway Interface Specification, v 1.1, 3rd Draft
+   (http://cgi-spec.golux.com/draft-coar-cgi-v11-03.txt)
+
+.. [3] "Chunked Transfer Coding" -- HTTP/1.1, Section 3.6.1
+   (http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.6.1)
+
+.. [4] "End-to-end and Hop-by-hop Headers" -- HTTP/1.1, Section 13.5.1
+   (http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.5.1)
+
+.. [5] mod_ssl Reference, "Environment Variables"
+   (http://www.modssl.org/docs/2.8/ssl_reference.html#ToC25)
+
+.. [6] Details on WSGI 1.0 amendments/clarifications.
+   (http://blog.dscpl.com.au/2009/10/details-on-wsgi-10-amendmentsclarificat.html)
+
+.. [7] [Web-SIG] WSGI and long response header values
+   (http://mail.python.org/pipermail/web-sig/2006-September/002244.html)
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   coding: utf-8
+   End:
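The draft discusses a wrapper that lets applications read web3.input past CONTENT_LENGTH and a read() that can be called without arguments. Both ideas can be sketched together in a few lines of Python 3. This is a hypothetical illustration, not part of the draft. The class name `LimitedInput` is invented, `io.BytesIO` stands in for a socket file object, and the sketch assumes the server knows the body length in advance; the unknown-length and -1 cases are out of scope. A server following this idea would place such an object in `environ['web3.input']`.

```python
import io


class LimitedInput:
    """Hypothetical web3.input wrapper that never reads past CONTENT_LENGTH.

    After `length` bytes have been consumed, every further read returns b"",
    so an application can rely on the empty bytes object as its EOF marker.
    """

    def __init__(self, stream, length):
        # `length` may be an int or a bytes value such as b"11"; this sketch
        # assumes a known length and does not handle the unknown case.
        length = int(length)
        if length < 0:
            raise ValueError("CONTENT_LENGTH must be non-negative")
        self._stream = stream
        self._remaining = length

    def _clamp(self, size):
        # No size (or a negative one) means "all remaining request content".
        if size is None or size < 0 or size > self._remaining:
            return self._remaining
        return size

    def read(self, size=-1):
        size = self._clamp(size)
        if size == 0:
            return b""
        data = self._stream.read(size)
        self._remaining -= len(data)
        return data

    def readline(self, size=-1):
        size = self._clamp(size)
        if size == 0:
            return b""
        line = self._stream.readline(size)
        self._remaining -= len(line)
        return line


# An 11-byte body followed by the start of a pipelined request on the same
# connection, which the application must not consume.
raw = io.BytesIO(b"hello world" + b"GET /next HTTP/1.1\r\n")
body = LimitedInput(raw, b"11")

print(body.read(5))  # b'hello'
print(body.read())   # b' world'
print(body.read())   # b''
print(raw.read(4))   # b'GET '
```

Reads stop exactly at the declared length, so the bytes of a following request stay in the underlying stream. The quoted WSGI text relies on exactly this property when it reuses the same socket connection for a subsequent request.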


Python-checkins mailing list Python-checkins at python.org http://mail.python.org/mailman/listinfo/python-checkins


