[Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]

Fri Sep 17 21:44:54 CEST 2010

On Fri, Sep 17, 2010 at 3:25 PM, Michael Foord <fuzzyman at voidspace.org.uk> wrote:

>  On 16/09/2010 23:05, Antoine Pitrou wrote:
>
>> On Thu, 16 Sep 2010 16:51:58 -0400
>> "R. David Murray"<rdmurray at bitdance.com>  wrote:
>>
>>> What do we store in the model?  We could say that the model is always
>>> text.  But then we lose information about the original bytes message,
>>> and we can't reproduce it.  For various reasons (mailman being a big one),
>>> this is not acceptable.  So we could say that the model is always bytes.
>>> But we want access to (for example) the header values as text, so header
>>> lookup should take string keys and return string values[2].
>>>
>> Why can't you have both in a single class? If you create the class
>> using a bytes source (a raw message sent by SMTP, for example), the
>> class automatically parses and decodes it to unicode strings; if you
>> create the class using a unicode source (the text body of the e-mail
>> message and the list of recipients, for example), the class
>> automatically creates the bytes representation.
>>
> I think something like this would be great for WSGI. Rather than focus on
> whether bytes *or* text should be used, use a higher level object that
> provides a bytes view, and (where possible/appropriate) a unicode view too.
>

This is what WebOb does; e.g., there is only a bytes version of a POST body,
and a view on that body that does the decoding and encoding.  If you don't
touch something, it is never decoded or encoded.  I only vaguely understand
the specifics here, and I suspect the specifics matter, but this seems
applicable in this case too.  An incoming email can have a smattering of
raw bytes, inline (RFC 2047) encoding, other encoding declarations, and
then orthogonal systems like quoted-printable.  You don't want to touch
that stuff if you don't need to: handling unicode objects implies you are
normalizing the content, and that might have subtle impacts you don't know
about, or don't want to know about, or the content might just not fit into
the unicode model (like a string with two character sets).
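
To make the "two character sets" case concrete, here is a small sketch
using the standard library's email.header.decode_header.  The header value
is made up for illustration; the point is only that one header can carry
chunks in different charsets, and that the raw form has to be kept if you
want to reproduce it exactly.

```python
from email.header import decode_header

# Made-up header value: two RFC 2047 encoded words, each in its own charset.
raw_header = "=?iso-8859-1?q?caf=E9?= =?utf-8?b?4pi6?="

# decode_header returns (chunk, charset) pairs; it does not pick one charset
# for the whole header.
parts = decode_header(raw_header)
charsets = [charset for _, charset in parts]

# Building a single unicode string means decoding each chunk separately.
# After this step the original per-chunk encoding is no longer visible.
text = "".join(
    chunk.decode(charset or "ascii") if isinstance(chunk, bytes) else chunk
    for chunk, charset in parts
)
```

Once you only hold `text`, you can no longer tell which charset or which
transfer encoding each piece arrived in, which is the normalization I mean.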

Note that WebOb does not have two views; it has only one view -- unicode
viewing bytes.  I'm not sure I could keep two views straight.  I *think*
Antoine is describing two possible canonical data types (unicode or bytes)
and two views.  That sounds hard.
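
Roughly, the single-view idea could be sketched like this.  This is not
WebOb's actual code; the class name, the attributes, and the decode_count
counter are all hypothetical and exist only to show that bytes stay
canonical and that decoding happens only when the view is touched.

```python
class BytesBackedBody:
    """Hypothetical sketch: bytes are canonical, text is a lazy view."""

    def __init__(self, raw, charset="utf-8"):
        self.raw = raw            # canonical data, stored as received
        self.charset = charset    # assumed to be known from the request
        self.decode_count = 0     # illustration only: counts decodes

    @property
    def text(self):
        # Decode on demand; nothing is cached, so raw is the only copy.
        self.decode_count += 1
        return self.raw.decode(self.charset)

    @text.setter
    def text(self, value):
        # Writing through the view re-encodes into the canonical bytes.
        self.raw = value.encode(self.charset)


body = BytesBackedBody(b"name=caf\xc3\xa9")
untouched = body.raw      # no decoding has happened at this point
decoded = body.text       # the one and only decode
body.text = "name=thé"    # encoded straight back to bytes
```

There is one canonical type (bytes) and one view (unicode) in this sketch;
supporting a unicode-canonical object as well would mean a second class or
a mode flag, which is the part that sounds hard to me.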

-- 
Ian Bicking  |  http://blog.ianbicking.org