[Python-Dev] urllib, multipart/form-data encoding and file uploads

Chris AtLee chris at atlee.ca
Fri Jun 27 22:20:59 CEST 2008


On Fri, Jun 27, 2008 at 11:40 AM, Bill Janssen <janssen at parc.com> wrote:

>> I notice that there is some work being done on urllib / urllib2 for python 2.6/3.0. One thing I've always missed in urllib/urllib2 is the facility to encode POST data as multipart/form-data. I think it would also be useful to be able to stream a POST request to the remote server, rather than requiring the user to create the entire POST body in memory before starting the request. This would be extremely useful when writing any kind of code that does file uploads.

>> I didn't see any recent discussion about this so I thought I'd ask here: do you think this would make a good addition to the new urllib package?

> I think it would be very helpful. I'd separate the two things, though; you want to be able to format a set of values as "multipart/form-data", and do various things with that resulting "document", and you want to be able to stream a POST (or PUT) request.

How about if the function that encoded the values as "multipart/form-data" was able to stream data to a POST (or PUT) request via an iterator that yielded chunks of data?

def multipart_encode(params, boundary=None):
    """Encode params as multipart/form-data.

    ``params`` should be a dictionary where the keys represent parameter names,
    and the values are either parameter values, or file-like objects to
    use as the parameter value.  The file-like objects must support the
    .read(), .seek(), and .tell() methods.

    If ``boundary`` is set, then it is used as the MIME boundary.  Otherwise
    a randomly generated boundary will be used.  In either case, if the
    boundary string appears in the parameter values a ValueError will be
    raised.

    Returns an iterable object that will yield blocks of data representing
    the encoded parameters."""
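
To make the idea concrete, the generator body could look roughly like this. This is only a sketch of one possible encoding, not a finished implementation: the filename handling, the 4096-byte read size, and the use of uuid for the boundary are all arbitrary choices, and the boundary/ValueError check described in the docstring is only noted in a comment.

    import uuid

    def multipart_encode(params, boundary=None):
        # Sketch only: pick a random boundary if none was supplied.  A real
        # implementation would also verify that the boundary does not appear
        # in any of the parameter values and raise ValueError if it does.
        if boundary is None:
            boundary = uuid.uuid4().hex
        for name, value in params.items():
            yield '--%s\r\n' % boundary
            if hasattr(value, 'read'):
                # File-like object: emit a filename and stream its contents
                # in fixed-size blocks instead of reading it all at once.
                filename = getattr(value, 'name', name)
                yield ('Content-Disposition: form-data; name="%s"; '
                       'filename="%s"\r\n\r\n' % (name, filename))
                while True:
                    block = value.read(4096)
                    if not block:
                        break
                    yield block
                yield '\r\n'
            else:
                # Plain value: emit the whole part in one block.
                yield ('Content-Disposition: form-data; name="%s"\r\n\r\n'
                       '%s\r\n' % (name, value))
        yield '--%s--\r\n' % boundary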

The file objects need to support .seek() and .tell() so we can determine how large they are before including them in the output. I've been trying to come up with a good way to specify the size separately so you could use unseekable objects, but no good ideas have come to mind. Maybe it could look for a 'size' attribute or callable on the object? That seems a bit hacky...
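
For what it's worth, the seek/tell dance, with the 'size' attribute idea as a fallback, could be wrapped in a small helper.  The name guess_size and the exact fallback behaviour below are just guesses at an interface:

    def guess_size(fileobj):
        # Preferred path: measure the remaining data with seek()/tell(),
        # then restore the original position.
        try:
            pos = fileobj.tell()
            fileobj.seek(0, 2)              # seek to end of file
            size = fileobj.tell() - pos
            fileobj.seek(pos)
            return size
        except (AttributeError, IOError):
            pass
        # Fallback for unseekable objects: look for a 'size' attribute,
        # which might be a plain value or a callable.
        size = getattr(fileobj, 'size', None)
        if callable(size):
            size = size()
        if size is None:
            raise ValueError("cannot determine size of %r" % (fileobj,))
        return size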

A couple of helper functions would be necessary as well: one to generate random boundary strings that are guaranteed not to collide with the file data, and another to calculate the total size of the encoded body for use in the 'Content-Length' header of the main HTTP request.
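
Roughly, those two helpers might look like the following.  The names choose_boundary and multipart_size are made up, guess_size() is the hypothetical helper sketched above, and the size calculation only works if it mirrors whatever multipart_encode() actually emits:

    import uuid

    def choose_boundary(params):
        # Keep generating random boundaries until one is found that does not
        # occur in any of the string parameter values.  (File contents are
        # not scanned here, so a collision there is still possible.)
        while True:
            boundary = uuid.uuid4().hex
            if not any(isinstance(value, str) and boundary in value
                       for value in params.values()):
                return boundary

    def multipart_size(params, boundary):
        # Total number of bytes multipart_encode() would yield, for use as
        # the Content-Length header.  This has to mirror the encoder's
        # formatting exactly, so the two really belong in the same module.
        total = 0
        for name, value in params.items():
            total += len('--%s\r\n' % boundary)
            if hasattr(value, 'read'):
                filename = getattr(value, 'name', name)
                total += len('Content-Disposition: form-data; name="%s"; '
                             'filename="%s"\r\n\r\n' % (name, filename))
                total += guess_size(value) + len('\r\n')
            else:
                total += len('Content-Disposition: form-data; name="%s"'
                             '\r\n\r\n%s\r\n' % (name, value))
        total += len('--%s--\r\n' % boundary)
        return total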

Then we'd need to change either urllib or httplib to accept iterable objects in addition to the plain strings they currently use.
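
Until such a change lands, the pieces above could be glued to httplib by hand, something like the following.  Again this is only a sketch for illustration: post_multipart and its behaviour are invented here, and it assumes the hypothetical multipart_encode, choose_boundary, and multipart_size helpers sketched earlier.

    import httplib

    def post_multipart(host, path, params):
        boundary = choose_boundary(params)
        conn = httplib.HTTPConnection(host)
        conn.putrequest('POST', path)
        conn.putheader('Content-Type',
                       'multipart/form-data; boundary=%s' % boundary)
        conn.putheader('Content-Length',
                       str(multipart_size(params, boundary)))
        conn.endheaders()
        # Stream the body block by block instead of building it in memory.
        for block in multipart_encode(params, boundary):
            conn.send(block)
        return conn.getresponse()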

Cheers, Chris


