How to parse multi-part content

Sun Sep 26 20:33:20 EDT 2004

Tim Roberts wrote:

> Dave Kuhlman <dkuhlman at rexx.com> wrote:
>>
>>Suppose that I have content that looks like what I've included at
>>the end of this message.  Is there something in the standard
>>Python library that will help me parse it, break into the parts
>>separated by the boundary strings, extract headers from each
>>sub-part, etc?
>>...
>>In case you are curious, this is content posted to my Zope server
>>when I include an element '<input type="file" .../>' in my form.
> 
> Actually, you get this because your <form> header has
> enctype="multipart/form-data".  It happens that file upload only works
> with that enctype, but you can use it without a file upload.
> 
> That's why cgi.py knows how to parse this.  Look at cgi.parse_multipart.

Ah. A clue.  I think you're telling me that it's the CGI
specification that I need to be reading, right?  I'll read some of
that.

Per your suggestion, I tried cgi.parse_multipart() and also the
cgi.FieldStorage class.  They don't work.  Or, more correctly, I
don't know how to use them.
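
For reference, on a current Python 3 (where the cgi module was
deprecated in 3.11 and removed in 3.13) the standard email package
can split this kind of content.  The following is only a sketch:
parse_form_data is a made-up helper name, the sample body and the
boundary "XYZ" are invented for illustration, and it assumes you
have the raw request body as bytes plus the boundary string taken
from the request's Content-Type header.

```python
from email import message_from_bytes


def parse_form_data(body, boundary):
    """Return {field name: raw bytes} for a multipart/form-data body.

    Hypothetical helper, not part of the standard library.
    """
    # The boundary is carried in the HTTP Content-Type header, not in the
    # body, so rebuild a minimal header for the email parser to read.
    header = 'Content-Type: multipart/form-data; boundary="%s"\r\n\r\n' % boundary
    msg = message_from_bytes(header.encode("ascii") + body)

    fields = {}
    if msg.is_multipart():
        for part in msg.get_payload():
            # Pull the name="..." parameter out of Content-Disposition.
            name = part.get_param("name", header="content-disposition")
            if name is not None:
                fields[name] = part.get_payload(decode=True)
    return fields


# Invented sample data, shaped like a browser form post.
sample = (
    b'--XYZ\r\n'
    b'Content-Disposition: form-data; name="xschemaFile"; filename="a.xsd"\r\n'
    b'Content-Type: text/plain\r\n'
    b'\r\n'
    b'<schema/>\r\n'
    b'--XYZ\r\n'
    b'Content-Disposition: form-data; name="comment"\r\n'
    b'\r\n'
    b'hello\r\n'
    b'--XYZ--\r\n'
)
fields = parse_form_data(sample, "XYZ")
```

The header is prepended because the email parser only learns the
boundary from a Content-Type header, which the posted body by
itself does not contain.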

I guess I'll have to concede defeat, which in Python-speak means:
"It was easier to write it myself."

Basically, I wrote a little parser class, ContentParser, which
exposes a method get_content_by_name.  This method returns the
body (everything after the blank line that ends the part's
headers, up to the next boundary line) for a given name, where
name is the value of the "name" parameter in the line:

    Content-Disposition: form-data; name="xschemaFile"

I was in a bit of a hurry, so my solution (class ContentParser) is
not very elegant.  But if anyone needs it, let me know.
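
The class itself was not included in this message.  Purely as an
illustration of the approach described above, a hand-written parser
with the same interface might look roughly like this.  Everything
except the names ContentParser and get_content_by_name is an
assumption: the constructor arguments, the use of bytes, CRLF line
endings, and the sample data are all invented here.

```python
import re


class ContentParser:
    """Hypothetical sketch; not the class from the original post."""

    def __init__(self, content, boundary):
        self._parts = {}
        # A boundary line is CRLF + "--" + boundary.  Prefixing CRLF lets
        # the very first boundary line match the same pattern.
        delimiter = b"\r\n--" + boundary
        chunks = (b"\r\n" + content).split(delimiter)
        for chunk in chunks[1:]:  # chunks[0] is the (usually empty) preamble
            if chunk.startswith(b"--"):
                break  # closing "--boundary--" line reached
            # Headers end at the first blank line; the rest is the body.
            headers, sep, body = chunk.partition(b"\r\n\r\n")
            if not sep:
                continue
            # \b keeps filename="..." from being mistaken for name="...".
            match = re.search(rb'\bname="([^"]*)"', headers)
            if match:
                self._parts[match.group(1).decode("ascii")] = body

    def get_content_by_name(self, name):
        """Return the body bytes for the given field name, or None."""
        return self._parts.get(name)


# Invented sample data.
content = (
    b'--XYZ\r\n'
    b'Content-Disposition: form-data; name="xschemaFile"; filename="a.xsd"\r\n'
    b'Content-Type: text/plain\r\n'
    b'\r\n'
    b'<schema/>\r\n'
    b'--XYZ\r\n'
    b'Content-Disposition: form-data; name="comment"\r\n'
    b'\r\n'
    b'hello\r\n'
    b'--XYZ--\r\n'
)
parser = ContentParser(content, b"XYZ")
schema = parser.get_content_by_name("xschemaFile")
```

This ignores quoting rules, repeated field names, and content
transfer encodings, so it is only suitable for simple form posts.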

And, thanks for the suggestions.

Dave

-- 
Dave Kuhlman
http://www.rexx.com/~dkuhlman