[Email-SIG] [Python-Dev] headers api for email package

R. David Murray rdmurray at bitdance.com
Mon Apr 13 17:49:35 CEST 2009


On Mon, 13 Apr 2009 at 10:28, Barry Warsaw wrote:
> On Apr 11, 2009, at 8:39 AM, Chris Withers wrote:
>
>> Barry Warsaw wrote:
>> > >>> message['Subject']
>> > The raw bytes or the decoded unicode?
>> 
>> A header object.
>
> Yep.  You got there before I did. :)

+1

>> > Okay, so you've picked one.  Now how do you spell the other way?
>> 
>> str(message['Subject'])
>
> Yes for unstructured headers like Subject.  For structured headers... hmm.

Some "reasonable" printable interpretation that has no semantic meaning?
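
For the unstructured case, the two spellings can already be approximated
with the existing email.header.Header class. This is only a sketch of the
distinction (decoded text versus wire form), not the API being proposed in
this thread; the sample subject text is made up.

```python
from email.header import Header, decode_header

# An unstructured header value containing a non-ASCII character.
subject = Header("Caf\u00e9 menu", charset="utf-8")

# "The decoded unicode": what str() gives back.
text = str(subject)

# "The raw bytes": the RFC 2047 encoded-word form, which is pure ASCII.
wire = subject.encode()
wire_bytes = wire.encode("ascii")

# Decoding the wire form recovers the original bytes and charset.
decoded_bytes, charset = decode_header(wire)[0]
print(text)
print(wire_bytes)
print(decoded_bytes.decode(charset))
```

A structured header (an address list, say) has no equally obvious str()
form, which is the open question above.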

>> bytes(message['Subject'])
>
> Yes.
>
>> > Now, setting headers.  Sometimes you have some unicode thing and 
>> > sometimes you have some bytes.  You need to end up with bytes in the 
>> > ASCII range and you'd like to leave the header value unencoded if so. 
>> > But in both cases, you might have bytes or characters outside that range, 
>> > so you need an explicit encoding, defaulting to utf-8 probably.
>> > >>> Message.set_header('Subject', 'Some text', encoding='utf-8')
>> > >>> Message.set_header('Subject', b'Some bytes')
>> 
>> Where you just want "a damned valid email and stop making my life hard!":
>> 
>> Message['Subject']='Some text'
>
> Yes.  In which case I propose we guess the encoding as 1) ascii, 2) utf-8, 3) 
> wtf?

Given some Usenet postings I've just dealt with, (3) appears to
sometimes be spelled 'x-unknown' and sometimes (in the most recent case)
'unknown-8bit'.  A quick Google search turns up a hit on RFC 1428 for the
latter, and a bunch of trouble tickets for the former, so I think 'wtf' is
correctly spelled 'unknown-8bit'.

However, it's not supposed to be used by mail composers, who are
expected to know the encoding.  It's for mail gateways that are
transforming something and don't know the encoding.  I'm not
sure what this means for the email module, which certainly
will be used in mail gateways... maybe it's the responsibility
of the application code to explicitly say 'unknown encoding'?
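
To make the guessing order concrete, here is a minimal sketch for the case
where the input is bytes. The function name guess_charset is hypothetical
(nothing like it exists in the email package), and falling back to
'unknown-8bit' is just the gateway-style choice discussed above, not a
settled decision.

```python
def guess_charset(raw: bytes) -> str:
    """Guess a charset label for raw header bytes (hypothetical helper).

    Tries the candidates in the order proposed in this thread:
    1) ascii, 2) utf-8, 3) give up and label the data 'unknown-8bit'.
    """
    for charset in ("ascii", "utf-8"):
        try:
            raw.decode(charset)
        except UnicodeDecodeError:
            continue  # not valid in this charset, try the next one
        return charset
    # Neither candidate fits; we do not know what these bytes mean.
    return "unknown-8bit"


print(guess_charset(b"Some bytes"))
print(guess_charset("Caf\u00e9".encode("utf-8")))
print(guess_charset(b"Caf\xe9"))  # Latin-1 bytes, invalid as UTF-8
```

An application that actually knows the encoding would skip the guessing and
say so explicitly.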

>> Where you care about what encoding is used:
>> 
>> Message['Subject']=Header('Some text',encoding='utf-8')
>
> Yes.
>
>> If you have bytes, for whatever reason:
>> 
>> Message['Subject']=b'some bytes'.decode('utf-8')
>> 
>> ...because only you know what encoding those bytes use!
>
> So you're saying that __setitem__() should not accept raw bytes?

If I'm understanding things correctly, if it did accept bytes the
person using that interface would need to do whatever encoding (e.g.
encoded-word) was needed, so the interface should check that the byte
string is 8-bit clean.  But having some sort of 'setraw' method on Header
might be better for that case.
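
One possible reading of that check, as a sketch only: I am assuming it
means the raw value must already be wire-safe, i.e. contain nothing but
ASCII bytes, with any other text pre-encoded by the caller as encoded-words.
The name check_raw_header_value is hypothetical and is not part of the
email package.

```python
def check_raw_header_value(raw: bytes) -> bytes:
    """Accept raw header bytes only if they are pure ASCII (hypothetical).

    Assumption: the caller has already applied any needed encoding
    (for example RFC 2047 encoded-words), so a byte outside the ASCII
    range means the value was not prepared correctly.
    """
    try:
        raw.decode("ascii")
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"non-ASCII byte at offset {exc.start}; encode the value first"
        ) from exc
    return raw


# Already encoded by the caller: accepted unchanged.
print(check_raw_header_value(b"=?utf-8?q?Caf=C3=A9_menu?="))

# Unencoded UTF-8 bytes: rejected.
try:
    check_raw_header_value("Caf\u00e9 menu".encode("utf-8"))
except ValueError as err:
    print("rejected:", err)
```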

--David

