[Python-Dev] urllib, multipart/form-data encoding and file uploads

Bill Janssen janssen at parc.com
Sat Jun 28 01:21:03 CEST 2008


All sounds reasonable to me.

Bill

> On Fri, Jun 27, 2008 at 11:40 AM, Bill Janssen <janssen at parc.com> wrote:
> >> I notice that there is some work being done on urllib / urllib2 for
> >> python 2.6/3.0.  One thing I've always missed in urllib/urllib2 is the
> >> facility to encode POST data as multipart/form-data.  I think it would
> >> also be useful to be able to stream a POST request to the remote
> >> server rather than requiring the user to create the entire POST
> >> body in memory before starting the request.  This would be extremely
> >> useful when writing any kind of code that does file uploads.
> >>
> >> I didn't see any recent discussion about this so I thought I'd ask
> >> here: do you think this would make a good addition to the new urllib
> >> package?
> >
> > I think it would be very helpful.  I'd separate the two things,
> > though; you want to be able to format a set of values as
> > "multipart/form-data", and do various things with that resulting
> > "document", and you want to be able to stream a POST (or PUT) request.
> 
> How about if the function that encoded the values as "multipart/form-data"
> was able to stream data to a POST (or PUT) request via an iterator that
> yielded chunks of data?
> 
> def multipart_encode(params, boundary=None):
>     """Encode ``params`` as multipart/form-data.
> 
>     ``params`` should be a dictionary where the keys represent parameter names,
>     and the values are either parameter values, or file-like objects to
>     use as the parameter value.  The file-like object must support the .read(),
>     .seek(), and .tell() methods.
> 
>     If ``boundary`` is set, then it is used as the MIME boundary.  Otherwise
>     a randomly generated boundary will be used.  In either case, if the
>     boundary string appears in the parameter values, a ValueError will be
>     raised.
> 
>     Returns an iterable object that will yield blocks of data representing
>     the encoded parameters."""
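> 
> To make that concrete, here is a rough, untested sketch of how the
> generator might go.  It reads file data in 8 KB chunks, and it leaves
> out the boundary-collision check described in the docstring:
> 
> import uuid
> 
> def multipart_encode(params, boundary=None):
>     # Sketch only; the collision check from the docstring is omitted.
>     if boundary is None:
>         boundary = uuid.uuid4().hex
>     for name, value in params.items():
>         yield '--%s\r\n' % boundary
>         if hasattr(value, 'read'):
>             # Anything with .read() is treated as a file upload; the
>             # filename falls back to the parameter name if it has none.
>             filename = getattr(value, 'name', name)
>             yield ('Content-Disposition: form-data; '
>                    'name="%s"; filename="%s"\r\n\r\n' % (name, filename))
>             chunk = value.read(8192)
>             while chunk:
>                 yield chunk
>                 chunk = value.read(8192)
>             yield '\r\n'
>         else:
>             yield ('Content-Disposition: form-data; '
>                    'name="%s"\r\n\r\n%s\r\n' % (name, value))
>     yield '--%s--\r\n' % boundary
> 
> Since it never holds more than one chunk in memory at a time, even very
> large uploads stay cheap.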
> 
> The file objects need to support .seek() and .tell() so we can determine
> how large they are before including them in the output.  I've been trying
> to come up with a good way to specify the size separately so you could use
> unseekable objects, but no good ideas have come to mind.  Maybe it could
> look for a 'size' attribute or callable on the object?  That seems a bit
> hacky...
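> 
> FWIW, that lookup could be wrapped in one small helper, something like
> this (untested; 'size' is just a guess at the attribute name):
> 
> def guess_size(fileobj):
>     # Prefer an advertised size (attribute or callable) so unseekable
>     # objects could still be used; otherwise fall back to seek()/tell().
>     size = getattr(fileobj, 'size', None)
>     if size is not None:
>         if hasattr(size, '__call__'):
>             size = size()
>         return size
>     pos = fileobj.tell()
>     fileobj.seek(0, 2)               # 2 == os.SEEK_END
>     size = fileobj.tell() - pos
>     fileobj.seek(pos)                # restore the original position
>     return size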
> 
> A couple helper functions would be necessary as well, one to generate
> random boundary strings that are guaranteed not to collide with file data,
> and another function to calculate the total size of the encoding to be used
> in the 'Content-Length' header in the main HTTP request.
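> 
> Roughly, and reusing the guess_size helper sketched above (untested,
> names are placeholders):
> 
> import uuid
> 
> def gen_boundary():
>     # Random enough that a collision with real data is very unlikely;
>     # the encoder still has to check it against the actual values.
>     return uuid.uuid4().hex
> 
> def multipart_size(params, boundary):
>     # Mirror multipart_encode's output so the total matches byte for
>     # byte; the result goes into the Content-Length header.
>     total = 0
>     for name, value in params.items():
>         total += len('--%s\r\n' % boundary)
>         if hasattr(value, 'read'):
>             filename = getattr(value, 'name', name)
>             total += len('Content-Disposition: form-data; '
>                          'name="%s"; filename="%s"\r\n\r\n' % (name, filename))
>             total += guess_size(value) + 2        # file data plus CRLF
>         else:
>             total += len('Content-Disposition: form-data; '
>                          'name="%s"\r\n\r\n%s\r\n' % (name, value))
>     total += len('--%s--\r\n' % boundary)
>     return total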
> 
> Then we'd need to change either urllib or httplib to support iterable
> objects in addition to the regular strings they currently accept.
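> 
> As an experiment that could be tried outside the standard library
> first, something along these lines (untested sketch):
> 
> import httplib
> 
> class IterableHTTPConnection(httplib.HTTPConnection):
>     # Accept an iterable of string chunks as the request body, falling
>     # back to the base class for everything else.
>     def send(self, data):
>         if hasattr(data, '__iter__') and not isinstance(data, str):
>             for chunk in data:
>                 httplib.HTTPConnection.send(self, chunk)
>         else:
>             httplib.HTTPConnection.send(self, data)
> 
> The caller would still have to set Content-Length itself (that's what
> the size helper above is for), and urllib would need a matching way to
> hand the iterable through to the connection.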
> 
> Cheers,
> Chris


