[Python-Dev] email package status in 3.X

Barry Warsaw barry at python.org
Mon Jun 21 17:43:07 CEST 2010


On Jun 21, 2010, at 10:20 PM, Nick Coghlan wrote:

>Something that may make sense to ease the porting process is for some
>of these "on the boundary" I/O related string manipulation functions
>(such as os.path.join) to grow "encoding" keyword-only arguments. The
>recommended approach would be to provide all strings, but bytes could
>also be accepted if an encoding was specified. (If you want to mix
>encodings - tough, do the decoding yourself).
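As a rough illustration of the keyword-only idea quoted above, here is a sketch using a hypothetical join_path() wrapper. It is not a real os.path API, just an assumption for illustration. str parts pass through unchanged, and bytes parts are accepted only when encoding= is given, all decoded with that one encoding:

```python
import os.path


def join_path(*parts, encoding=None):
    """Hypothetical os.path.join() wrapper with a keyword-only encoding.

    str parts are passed through unchanged; bytes parts are accepted only
    when an encoding is given, and all of them are decoded with that one
    encoding (mixing encodings is left to the caller).
    """
    decoded = []
    for part in parts:
        if isinstance(part, bytes):
            if encoding is None:
                raise TypeError('bytes path parts need an explicit encoding=')
            part = part.decode(encoding)
        decoded.append(part)
    return os.path.join(*decoded)


print(join_path('mail', b'inbox', encoding='ascii'))
```

Every function that takes bytes this way grows the same encoding= parameter, which is the signature creep discussed below.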

This is probably a stupid idea, and if so I'll plead Monday morning mindfuzz
for it.

Would it make sense to have "encoding-carrying" bytes and str types?
Basically, I'm thinking of types (maybe even the current ones) that carry
around a .encoding attribute so that they can be automatically encoded and
decoded where necessary.  This at least would simplify APIs that need to do
the conversion.

By default, the .encoding attribute would be some marker to indicate "I have
no idea, do it explicitly" and if you combine ebytes or estrs that have
incompatible encodings, you'd either throw an exception or reset the .encoding
to IAmConfuzzled.  But say you had an email header like:

=?euc-jp?b?pc+l7aG8pe+hvKXrpcmhqg==?=

And code like the following (made less crappy):

-----snip snip-----
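class ebytes(bytes):
    # Encoding used to turn these bytes into text; set per instance.
    encoding = 'ascii'

    def __str__(self):
        # Decode with the carried encoding and pass it on to the estr.
        s = estr(self.decode(self.encoding))
        s.encoding = self.encoding
        return s


class estr(str):
    # Encoding the text came from (and should be encoded back to).
    encoding = 'ascii'


# Round-trip the katakana "Hello World!" through euc-jp to get raw bytes.
s = str(b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc\xa5\xeb\xa5\xc9\xa1\xaa', 'euc-jp')
b = bytes(s, 'euc-jp')

# Tag the bytes with their encoding; str() then decodes automatically.
eb = ebytes(b)
eb.encoding = 'euc-jp'
es = str(eb)
print(repr(eb), es, es.encoding)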
-----snip snip-----

Running this you get:

b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc\xa5\xeb\xa5\xc9\xa1\xaa' ハローワールド！ euc-jp
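For comparison, the stdlib email package already produces this pairing for the header above: email.header.decode_header() returns each encoded word as a (bytes, charset) tuple, which is the same information an ebytes would carry around as one object. A minimal sketch:

```python
from email.header import decode_header

header = '=?euc-jp?b?pc+l7aG8pe+hvKXrpcmhqg==?='

# One encoded word in, one (raw bytes, charset) pair out.
[(raw, charset)] = decode_header(header)
print(repr(raw), charset)
print(raw.decode(charset))
```

The bytes are identical to the ones in the snippet above; the difference is that the charset travels alongside them in a tuple rather than on the object itself.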

Would it be feasible?  Dunno.  Would it help ease the bytes/str confusion?
Dunno.  But I think it would help make APIs easier to design and use because
it would cut down on the encoding-keyword function signature infection.
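The combination rule from the paragraph about incompatible encodings, and the idea that an API could just call str() instead of taking an encoding keyword, might look roughly like this. It's only a sketch under assumptions: the UNKNOWN marker, the _combined_encoding() helper, and the choice to reset rather than raise are all hypothetical, and plain str/bytes operands are treated as having no known encoding.

```python
class _Unknown:
    """Hypothetical marker for "I have no idea, do it explicitly"."""

    def __repr__(self):
        return 'UNKNOWN'


UNKNOWN = _Unknown()


def _combined_encoding(left, right):
    # Keep the encoding only when both operands agree; otherwise reset to
    # the UNKNOWN marker (raising an exception would be the other option).
    # Plain str/bytes carry no .encoding, so they count as UNKNOWN too.
    left_enc = getattr(left, 'encoding', UNKNOWN)
    right_enc = getattr(right, 'encoding', UNKNOWN)
    return left_enc if left_enc == right_enc else UNKNOWN


class estr(str):
    encoding = UNKNOWN

    def __add__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        result = estr(str.__add__(self, other))
        result.encoding = _combined_encoding(self, other)
        return result


class ebytes(bytes):
    encoding = UNKNOWN

    def __add__(self, other):
        if not isinstance(other, bytes):
            return NotImplemented
        result = ebytes(bytes.__add__(self, other))
        result.encoding = _combined_encoding(self, other)
        return result

    def __str__(self):
        # An API can simply call str(value) instead of taking encoding=.
        if self.encoding is UNKNOWN:
            raise ValueError('encoding unknown; decode explicitly')
        result = estr(self.decode(self.encoding))
        result.encoding = self.encoding
        return result


a = ebytes(b'\xa5\xcf\xa5\xed')
a.encoding = 'euc-jp'
b = ebytes(b'\xa1\xbc')
b.encoding = 'euc-jp'
print(str(a + b))          # ハロー
c = ebytes(b'abc')
c.encoding = 'ascii'
print((a + c).encoding)    # UNKNOWN
```

Only + is handled here; slicing, join() and the other str/bytes methods would still hand back plain str or bytes and silently drop the encoding.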

-Barry