[Python-Dev] Patch making the current email package (mostly) support bytes

Sun Oct 3 04:43:19 CEST 2010

On Sun, Oct 3, 2010 at 9:00 AM, R. David Murray <rdmurray at bitdance.com> wrote:
> I do not propose that this is a *good* API, since it has the classic
> problem that if there are coding bugs in the email module strings may
> "escape" that have surrogates in them and we end up with programs that
> work most of the time... except when they fail with mysterious errors
> because of unusual bytes input data.  On the other hand you always
> *know* when you have bytes data in an unknown encoding (because they
> are surrogate escaped), so it is ever so much better than the Python 2
> situation.

It's a similar concept to one Antoine and I (and some others) have
been considering on the tracker for making urllib.parse able to handle
bytes in ASCII-compatible encodings. I've already implemented a version
of that patch which keeps parallel bytes and str versions of all the
ASCII constants, and the result is pretty ugly. My next goal is to
implement a version that uses the same trick you have here for email
and see how the code complexity compares.
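
To make the comparison concrete, here is a minimal sketch of that trick
as I understand it: decode bytes input as ASCII with the surrogateescape
error handler, run a single str-only algorithm, and encode the results
on the way out. The names split_scheme and _split_scheme_str are made up
for illustration; they are not taken from urllib.parse or from the email
patch.

```python
def _split_scheme_str(url):
    # The one and only implementation; it works purely on str.
    scheme, sep, rest = url.partition(":")
    if not sep:
        return "", url
    return scheme.lower(), rest


def split_scheme(url):
    """Accept str or ASCII-compatible bytes; return parts of the same type."""
    if isinstance(url, str):
        return _split_scheme_str(url)
    # Bytes 0x80-0xFF become lone surrogates U+DC80-U+DCFF, so the str
    # algorithm can carry them along without knowing the real encoding.
    text = url.decode("ascii", "surrogateescape")
    scheme, rest = _split_scheme_str(text)
    # Encoding with the same handler restores the original bytes exactly.
    return (scheme.encode("ascii", "surrogateescape"),
            rest.encode("ascii", "surrogateescape"))
```

With this shape the ASCII constants (":" above) only need to exist as
str, at the cost of a decode/encode step at the API boundary.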

We do need to tread carefully to make sure the pseudo strings don't
escape, but the other approach requires similar care all the way
through the internal algorithms to make sure none of them assumes it
is working with bytes instances or with str instances.
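
For what it's worth, an escaped pseudo string is at least detectable. A
rough sketch, assuming the standard surrogateescape mapping of bytes
0x80-0xFF onto U+DC80-U+DCFF (has_escaped_bytes is a hypothetical helper
name, not an existing API):

```python
def has_escaped_bytes(text):
    # surrogateescape maps each undecodable byte 0x80-0xFF to a lone
    # surrogate in the range U+DC80-U+DCFF.
    return any("\udc80" <= ch <= "\udcff" for ch in text)


def strict_encode_fails(text):
    # A leaked pseudo string blows up as soon as someone encodes it
    # without the surrogateescape handler.
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False
```

A check like has_escaped_bytes could be asserted at the points where
strings leave the module, which is where the "mysterious errors" would
otherwise surface much later.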

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia