[Python-Dev] [Email-SIG] Dropping bytes "support" in json

Fri Apr 10 19:08:26 CEST 2009

On Apr 9, 2009, at 11:41 PM, Tony Nelson wrote:

> At 22:38 -0400 04/09/2009, Barry Warsaw wrote:
> ...
>> So, what I'm really asking is this.  Let's say you agree that there
>> are use cases for accessing a header value as either the raw encoded
>> bytes or the decoded unicode.  What should this return:
>>
>>>>> message['Subject']
>>
>> The raw bytes or the decoded unicode?
>
> That's an easy one:  Subject: is an unstructured header, so it must be
> text, thus Unicode.  We're looking at a high-level representation of an
> email message, with parsed header fields and a MIME message tree.

I'm liking Glyph's suggestion here.  We'll probably have to support  
the message['Subject'] API for backward compatibility, but in that  
case it really should be a bytes API.
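As a point of reference, the two views of an RFC 2047-encoded Subject can already be obtained with the existing Python 3 `email.header` functions. This is not the API under discussion here. The sample header value is made up for illustration:

```python
from email.header import decode_header, make_header

# An RFC 2047 encoded-word, as it would appear on the wire.
raw = b"=?utf-8?q?Caf=C3=A9_menu?="

# Raw view: the header value exactly as transmitted (ASCII bytes).
print(raw)

# Decoded view: decode_header() splits the value into (bytes, charset)
# chunks; make_header() reassembles them and str() yields unicode text.
decoded = str(make_header(decode_header(raw.decode("ascii"))))
print(decoded)
```

Whichever view `message['Subject']` ends up returning, the other one has to stay reachable through an explicit call.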

>> (or better names... it's late and I'm tired ;).  One of those maps to
>> message['Subject'] but which is the more obvious choice?
>
> Structured header fields are more of a problem.  Any header with addresses
> should return a list of addresses.  I think the default return type should
> depend on the data type.  To get an explicit bytes or string or list of
> addresses, be explicit; otherwise, for convenience, return the appropriate
> type for the particular header field name.

Yes, structured headers are trickier.  In a separate message, James
Knight makes some excellent points, which I agree with.  However, the
email package obviously cannot support every possible type of structured
header.  It must support them through extensibility.

The obvious way is through inheritance (i.e. subclasses of Header),  
but in my experience, using inheritance of the Message class really  
doesn't work very well.  You need to pass around factories to parsing  
functions and your application tends to have its own hierarchy of  
subclasses for whatever extra things it needs.  ISTM that subclassing  
is simply not the right pattern to support extensibility in the  
Message objects or Header objects.  Yes, this leads me to think that  
all the MIME* subclasses are essentially /wrong/.

Having said all that, the email package must support structured
headers.  Look at the insanity which is the current folding whitespace
splitting and the current code's inability to do the right thing for,
say, Subject headers and Received headers, and you begin to see why it
must be possible to extend this stuff.
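Here is a minimal sketch of extensibility without subclassing. It assumes a hypothetical registry: `register_header_parser`, `parse_header` and `_PARSERS` are illustrative names and are not part of the email package. Parsers are plain functions keyed by header name. An application adds support for a new structured header by registering a function, rather than by subclassing and passing factories through the parser:

```python
from email.header import decode_header, make_header
from email.utils import getaddresses
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical registry: lowercased header name -> parser callable.
HeaderParser = Callable[[str], Any]
_PARSERS: Dict[str, HeaderParser] = {}


def register_header_parser(name: str, parser: HeaderParser) -> None:
    """Associate a parser with a header field name (case-insensitive)."""
    _PARSERS[name.lower()] = parser


def parse_unstructured(value: str) -> str:
    """Unstructured fields (e.g. Subject): decode RFC 2047 words to text."""
    return str(make_header(decode_header(value)))


def parse_address_list(value: str) -> List[Tuple[str, str]]:
    """Address fields: return (display name, address) pairs."""
    return getaddresses([value])


def parse_header(name: str, value: str) -> Any:
    """Dispatch on the field name; unknown fields are treated as unstructured."""
    parser = _PARSERS.get(name.lower(), parse_unstructured)
    return parser(value)


# Built-in defaults; an application would register its own fields the same way.
register_header_parser("Subject", parse_unstructured)
for field in ("From", "To", "Cc"):
    register_header_parser(field, parse_address_list)
```

For example, `parse_header("to", "Alice <alice@example.com>, bob@example.com")` returns `[('Alice', 'alice@example.com'), ('', 'bob@example.com')]`. A Received parser could be added later without touching any class hierarchy.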

>> Now, setting headers.  Sometimes you have some unicode thing and
>> sometimes you have some bytes.  You need to end up with bytes in the
>> ASCII range and you'd like to leave the header value unencoded if so.
>> But in both cases, you might have bytes or characters outside that
>> range, so you need an explicit encoding, defaulting to utf-8 probably.
>
> Never for header fields.  The default is always RFC 2047, unless it
> isn't, say for params.
>
> The Message class should create an object of the appropriate subclass of
> Header based on the name (or use the existing object, see other
> discussion), and that should inspect its argument and DTRT or complain.

>>>>> Message.set_header('Subject', 'Some text', encoding='utf-8')
>>>>> Message.set_header('Subject', b'Some bytes')
>>
>> One of those maps to
>>
>>>>> message['Subject'] = ???
>
> The expected data type should depend on the header field.  For Subject:,
> it should be bytes to be parsed or verbatim text.  For To:, it should be
> a list of addresses or bytes or text to be parsed.

At a higher level, yes.  At the low level, it has to be bytes.
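To make the low-level requirement concrete, here is a sketch of a hypothetical helper, `header_value_to_bytes()`. It accepts either str or bytes and always yields ASCII bytes. ASCII text is left unencoded. Non-ASCII text becomes RFC 2047 encoded-words, assuming the utf-8 default suggested earlier in the thread. Bytes must already be 7-bit clean:

```python
from email.header import Header
from typing import Union


def header_value_to_bytes(value: Union[str, bytes], charset: str = "utf-8") -> bytes:
    """Hypothetical helper: produce the ASCII bytes stored at the low level."""
    if isinstance(value, bytes):
        # Raw bytes are taken verbatim, but must already be 7-bit clean.
        if not value.isascii():
            raise ValueError("raw header bytes must be ASCII")
        return value
    if value.isascii():
        # Pure ASCII text needs no encoding; leave it readable on the wire.
        return value.encode("ascii")
    # Non-ASCII text becomes one or more RFC 2047 encoded-words.
    return Header(value, charset).encode().encode("ascii")
```

A typed, higher-level setter could then sit on top of this and convert address lists or other structured values to text before calling it.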

> The email package should be pythonic, and not require deep understanding
> of dozens of RFCs to use properly.  Users don't need to know about the raw
> bytes; that's the whole point of MIME and any email package.  It should be
> easy to set header fields with their natural data types, and doing it with
> bad data should produce an error.  This may require a bit more care in the
> message parser, to always produce a parsed message with defects.

I agree that we should have some higher level APIs that make it easy  
to compose email messages, and probably easy-ish to parse a byte  
stream into an email message tree.  But we can't build those without  
the lower level raw support.  I'm also convinced that this lower level  
will be the domain of those crazy enough to have the RFCs tattooed to  
the back of their eyelids.
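For comparison, a higher-level API of the kind described above later became available in the Python 3 standard library as `email.message.EmailMessage`, selected with `email.policy.default`. The sketch below assumes Python 3.6 or later and uses a made-up sample message. It parses raw bytes and reads a decoded Subject and a structured To header:

```python
from email import message_from_bytes, policy

# A raw message as bytes off the wire, with an RFC 2047 Subject.
raw = (
    b"From: Barry <barry@example.com>\r\n"
    b"To: Alice <alice@example.com>, bob@example.com\r\n"
    b"Subject: =?utf-8?q?Caf=C3=A9?=\r\n"
    b"\r\n"
    b"Hello\r\n"
)

# policy.default makes the parser return EmailMessage objects whose
# headers are parsed according to their field name.
msg = message_from_bytes(raw, policy=policy.default)

subject = msg["Subject"]  # decoded text (a str subclass)
to_header = msg["To"]     # address header exposing parsed Address objects
addresses = [addr.addr_spec for addr in to_header.addresses]
```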

-Barry
