[Python-Dev] urllib, multipart/form-data encoding and file uploads

Fri Jun 27 22:20:59 CEST 2008

On Fri, Jun 27, 2008 at 11:40 AM, Bill Janssen <janssen at parc.com> wrote:
>> I notice that there is some work being done on urllib / urllib2 for
>> python 2.6/3.0.  One thing I've always missed in urllib/urllib2 is the
>> facility to encode POST data as multipart/form-data.  I think it would
>> also be useful to be able to stream a POST request to the remote
>> server rather than requiring the user to create the entire POST
>> body in memory before starting the request.  This would be extremely
>> useful when writing any kind of code that does file uploads.
>>
>> I didn't see any recent discussion about this so I thought I'd ask
>> here: do you think this would make a good addition to the new urllib
>> package?
>
> I think it would be very helpful.  I'd separate the two things,
> though; you want to be able to format a set of values as
> "multipart/form-data", and do various things with that resulting
> "document", and you want to be able to stream a POST (or PUT) request.

How about if the function that encoded the values as "multipart/form-data"
was able to stream data to a POST (or PUT) request via an iterator that
yielded chunks of data?

def multipart_encode(params, boundary=None):
    """Encode ``params`` as multipart/form-data.

    ``params`` should be a dictionary where the keys represent parameter names,
    and the values are either parameter values, or file-like objects to
    use as the parameter value.  The file-like object must support the .read(),
    .seek(), and .tell() methods.

    If ``boundary`` is set, then it is used as the MIME boundary.  Otherwise
    a randomly generated boundary will be used.  In either case, if the
    boundary string appears in the parameter values a ValueError will be
    raised.

    Returns an iterable object that will yield blocks of data representing
    the encoded parameters."""
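
A rough sketch of how the body of such a function might look as a
generator.  Everything beyond the signature above is an assumption made for
illustration: the name multipart_encode_sketch, the blocksize argument, the
use of the parameter name as a placeholder filename, and the
application/octet-stream content type.  It is written for Python 3 (bytes
output) and is not a finished implementation.

```python
import io
import uuid


def multipart_encode_sketch(params, boundary=None, blocksize=4096):
    """Yield bytes blocks encoding ``params`` as multipart/form-data."""
    if boundary is None:
        boundary = uuid.uuid4().hex  # assumption: random hex string
    marker = boundary.encode("ascii")

    def check(block):
        # Simplification: a boundary that straddles two blocks is missed.
        if marker in block:
            raise ValueError("boundary appears in a parameter value")
        return block

    for name, value in params.items():
        # Simplification: quotes in ``name`` are not escaped.
        disposition = 'Content-Disposition: form-data; name="%s"' % name
        if hasattr(value, "read"):
            # File-like value: stream it block by block.
            disposition += '; filename="%s"' % name  # placeholder filename
            yield (b"--" + marker + b"\r\n"
                   + disposition.encode("utf-8") + b"\r\n"
                   + b"Content-Type: application/octet-stream\r\n\r\n")
            while True:
                block = value.read(blocksize)
                if not block:
                    break
                yield check(block)
        else:
            # Plain value: emit it in one block.
            yield (b"--" + marker + b"\r\n"
                   + disposition.encode("utf-8") + b"\r\n\r\n")
            yield check(str(value).encode("utf-8"))
        yield b"\r\n"
    # Closing delimiter ends the multipart body.
    yield b"--" + marker + b"--\r\n"
```

Because this is a generator, the ValueError surfaces while iterating rather
than at call time; a real version would probably want to check up front.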

The file objects need to support .seek() and .tell() so we can determine
how large they are before including them in the output.  I've been trying
to come up with a good way to specify the size separately so you could use
unseekable objects, but no good ideas have come to mind.  Maybe it could
look for a 'size' attribute or callable on the object?  That seems a bit
hacky...
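
To make the size question concrete, here is a minimal sketch of the
.seek()/.tell() approach with the 'size' attribute-or-callable fallback
mentioned above.  The helper name remaining_size and the exact lookup order
are assumptions for illustration only.

```python
import io
import os


def remaining_size(fileobj):
    """Return the number of bytes left to read from ``fileobj``."""
    # Hypothetical escape hatch for unseekable objects: a ``size``
    # attribute, or a ``size()`` callable, supplied by the caller.
    size = getattr(fileobj, "size", None)
    if size is not None:
        return size() if callable(size) else size

    # Otherwise measure with seek/tell and restore the original position.
    pos = fileobj.tell()
    fileobj.seek(0, os.SEEK_END)
    end = fileobj.tell()
    fileobj.seek(pos)
    return end - pos
```

Measuring from the current position rather than from offset 0 means a
partially read file reports only what would actually be sent.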

A couple of helper functions would be necessary as well, one to generate
random boundary strings that are guaranteed not to collide with file data,
and another function to calculate the total size of the encoding to be used
in the 'Content-Length' header in the main HTTP request.
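
The two helpers could be sketched roughly as follows.  The names
gen_boundary, part_header and total_length are hypothetical, and the part
layout (placeholder filename, application/octet-stream) is assumed rather
than settled.  Note that a random boundary only makes a collision very
unlikely; an actual guarantee would need the file data to be scanned.

```python
import io
import os
import uuid


def gen_boundary():
    """Return a random boundary string (collisions unlikely, not impossible)."""
    return uuid.uuid4().hex


def part_header(name, boundary, is_file):
    """Build the header bytes that precede one parameter's value."""
    lines = ["--" + boundary,
             'Content-Disposition: form-data; name="%s"' % name]
    if is_file:
        lines[1] += '; filename="%s"' % name  # placeholder filename
        lines.append("Content-Type: application/octet-stream")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("utf-8")


def total_length(params, boundary):
    """Return the byte length of the whole encoded body."""
    total = 0
    for name, value in params.items():
        is_file = hasattr(value, "read")
        total += len(part_header(name, boundary, is_file))
        if is_file:
            # Measure the remaining bytes without consuming the file.
            pos = value.tell()
            value.seek(0, os.SEEK_END)
            total += value.tell() - pos
            value.seek(pos)
        else:
            total += len(str(value).encode("utf-8"))
        total += 2  # CRLF after each value
    # Closing delimiter: "--" + boundary + "--" + CRLF (ASCII boundary).
    total += len(boundary) + 6
    return total
```

The result is what would go into the 'Content-Length' header, so the
encoder has to emit exactly the same layout that total_length assumes.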

Then we'd need to change either urllib or httplib to accept iterable
objects in addition to the regular strings that they currently take.
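
For illustration, this is roughly what sending an iterable body looks like
when done by hand on top of the existing low-level connection methods.  It
is written against Python 3's http.client (the renamed httplib); the
function name post_iterable and its arguments are assumptions, not a
proposed API.

```python
import http.client


def post_iterable(conn: http.client.HTTPConnection, url, blocks,
                  content_type, content_length):
    """POST ``blocks`` (an iterable of bytes) over ``conn`` block by block."""
    conn.putrequest("POST", url)
    conn.putheader("Content-Type", content_type)
    # The length must be known up front, since no chunked encoding is used.
    conn.putheader("Content-Length", str(content_length))
    conn.endheaders()
    for block in blocks:
        # Each block is written to the socket as soon as it is produced,
        # so the whole body never has to sit in memory at once.
        conn.send(block)
    return conn.getresponse()
```

The content type passed in would be something like
'multipart/form-data; boundary=...' with the same boundary that was used
for the encoding.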

Cheers,
Chris