Newline interpretation issue with MIMEApplication with binary data, Python 3.3.2

Nils Bunger nilsbunger at gmail.com
Thu Sep 26 11:56:38 EDT 2013


Hi Neil, 

Thanks for looking at this.  

I'm trying to create a multipart MIME message for an HTTP POST request, not an email.  This is for a third-party API that requires a multipart POST with a binary file, so I don't have the option to just use a different encoding.

Multipart HTTP is standardized in HTTP 1.0 and supports binary parts. Also, nothing on the wire will reinterpret the contents of an HTTP body, as binary is quite normal in HTTP.

The issue seems to be that some parts of the Python MIME encoder still assume the output is for email only, where everything would be base64 encoded.
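
For contrast, here is a minimal sketch of that email-oriented default: MIMEApplication applies base64 unless told otherwise, so no raw CR byte ever appears in the encoded part. The sample bytes are made up for illustration.

```python
from email.mime.application import MIMEApplication

# Made-up binary sample containing bare CR, bare LF and CRLF.
sample = b"\x00\r\x01\n\r\n\xff"

# The default _encoder for MIMEApplication is base64.
part = MIMEApplication(sample)

encoding = part["Content-Transfer-Encoding"]
encoded_text = part.get_payload()           # base64 text, no raw CR in it
decoded = part.get_payload(decode=True)     # original bytes recovered

print(encoding)
print(decoded == sample)
```

That is safe for mail transport, but it is not what an API expecting raw binary parts wants.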

Maybe I have to roll my own code to create a multipart message with a binary file? I was hoping to avoid that.
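
A rough sketch of what rolling my own could look like, assuming the API accepts multipart/form-data with a single file part. The function name, field name, filename and sample bytes are all hypothetical, not taken from the third-party API.

```python
import uuid

def build_multipart(field_name, filename, data,
                    content_type="application/octet-stream"):
    """Return (body, content_type_header) for a one-file multipart POST.

    Hypothetical helper: the binary data is copied in untouched,
    and only the framing lines use CRLF.
    """
    # A random boundary; a collision with the data is very unlikely,
    # but this sketch does not check for one.
    boundary = uuid.uuid4().hex
    head = (
        "--{b}\r\n"
        'Content-Disposition: form-data; name="{n}"; filename="{f}"\r\n'
        "Content-Type: {t}\r\n"
        "\r\n"
    ).format(b=boundary, n=field_name, f=filename, t=content_type)
    tail = "\r\n--{b}--\r\n".format(b=boundary)
    body = head.encode("ascii") + data + tail.encode("ascii")
    return body, "multipart/form-data; boundary=" + boundary

payload = b"\x00\r\x01\n\r\n\xff"   # made-up bytes with bare CR and LF
body, header_value = build_multipart("file", "blob.bin", payload)
```

The returned header_value would go in the Content-Type header of the POST, and body would be sent as-is.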

Nils

P.S. You probably know this, but in case anyone else reads this thread: HTTP requires all header lines to end with CRLF, not native line endings. The Python MIME modules can do that properly as of Python 3.2 (fixed in this changeset: http://hg.python.org/cpython/rev/ebf6741a8d6e/)
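
A minimal sketch of the CRLF handling mentioned in the postscript, using the linesep argument of BytesGenerator.flatten (documented as added in Python 3.2). It deliberately uses a text part only, since binary parts are the problem under discussion; the helper name and field name are made up.

```python
import io
from email.generator import BytesGenerator
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def flatten_with_crlf(msg):
    """Serialize msg to bytes with CRLF line endings (hypothetical helper)."""
    buf = io.BytesIO()
    gen = BytesGenerator(buf, mangle_from_=False)
    gen.flatten(msg, linesep="\r\n")
    return buf.getvalue()

outer = MIMEMultipart("form-data")
part = MIMEText("hello", "plain", "us-ascii")
part.add_header("Content-Disposition", "form-data", name="greeting")
outer.attach(part)

data = flatten_with_crlf(outer)

# After removing every CRLF pair, no stray LF should remain.
only_crlf = b"\n" not in data.replace(b"\r\n", b"")
print(only_crlf)
```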



> I got interested in it since I have never used any of the
> modules. So I played with it enough to discover that the part of
> the code above that converts the \r to \n is the flatten call.
>
> I got to here and RFC 2049 and gave up.
>
>    The following guidelines may be useful to anyone devising a data
>    format (media type) that is supposed to survive the widest range of
>    networking technologies and known broken MTAs unscathed.  Note that
>    anything encoded in the base64 encoding will satisfy these rules, but
>    that some well-known mechanisms, notably the UNIX uuencode facility,
>    will not.  Note also that anything encoded in the Quoted-Printable
>    encoding will survive most gateways intact, but possibly not some
>    gateways to systems that use the EBCDIC character set.
>
>     (1)   Under some circumstances the encoding used for data may
>           change as part of normal gateway or user agent
>           operation.  In particular, conversion from base64 to
>           quoted-printable and vice versa may be necessary.  This
>           may result in the confusion of CRLF sequences with line
>           breaks in text bodies.  As such, the persistence of
>           CRLF as something other than a line break must not be
>           relied on.
>
>     (2)   Many systems may elect to represent and store text data
>           using local newline conventions.  Local newline
>           conventions may not match the RFC822 CRLF convention --
>           systems are known that use plain CR, plain LF, CRLF, or
>           counted records.  The result is that isolated CR and LF
>           characters are not well tolerated in general; they may
>           be lost or converted to delimiters on some systems, and
>           hence must not be relied on.
>
> So putting a raw CR in a binary chunk may be intolerable, and
> you need to use a different encoder. But I'm out of my element.
>
> -- 
> Neil Cerutti


