[Email-SIG] fixing the current email module

Sat Oct 10 03:25:56 CEST 2009

On Fri, 9 Oct 2009 at 17:54, Glenn Linderman wrote:
> On approximately 10/9/2009 4:20 PM, came the following characters from the 
> keyboard of R. David Murray:
>>  On Fri, 9 Oct 2009 at 13:26, Glenn Linderman wrote:
>> >  On approximately 10/9/2009 8:10 AM, came the following characters from 
>> >  the keyboard of Stephen J. Turnbull:
>> > >   Glenn Linderman writes:
>> > > > > >   produce a defect report, but then simply converted to Unicode as if
>> > > > > >   it were Latin-1 (since there is no other knowledge available that
>> > > > > >   could produce a better conversion).
>> > > > >
>> > > > >   No, that is already corruption.  Most clients will assume that string
>> > > > >   is valid as a header, because it's valid as a string.
>> > > >
>> > > >   Sure it is corruption.  That's why there is a defect report.  But
>> > > >   the conversion technique is appropriate, per the Postel principle.
>> > > 
>> > >   Actually, I would say you are emitting leniently, in violation of the
>> > >   Postel principle. 
>> > 
>> >  You can say that, but I don't have to believe it.  I'm talking about 
>> >  accepting; the message has arrived, it is here, the client is trying to 
>> >  look at it, and I'm talking about ways the client can look at 
>> >  not-quite-perfect data, knowing that it is not quite perfect, but still 
>> >  being able to see it. I'm not at all talking about emitting data.  You 
>> >  seem to be calling the email package helping the client to accept 
>> >  not-quite-perfect data, as a form of emitting data.  It is not.
>>
>>  IMO, the appropriate way for the email package to provide the API you
>>  are talking about is to provide the client with a way to get at the raw
>>  byte string, which I think everyone agrees on.  If the client wants to
>>  decode it as if it were latin-1 to process it, it can then do that. 
>
> That certainly works, but it isn't very helpful... that forces the client 
> application to reproduce the logic to parse the header value and decode the 
> parts that can be decoded successfully, and that is exactly the sort of thing 
> Stephen was complaining about when he thought I was suggesting that to be a 
> requirement (but he was confused about what I was suggesting).

I wasn't clear, sorry :).  The current API has a 'decode_header' function,
which doesn't do the byte-to-unicode decode (yeah, there's another naming
problem here...we have two types of decoding and only one word for both)
but instead returns (bytes, charset) tuples.  This piece of the API is
broken in Python 3, and I don't think it is the right API going forward,
but that _kind_ of API is what I meant by 'getting at the raw byte
string':  the byte string that failed the bytes-to-unicode decoding,
not the entire header (though there will also be a way to get that if
you need it, I presume.)
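
To make that concrete, here is a rough sketch of how a client might use that
kind of API today.  It is only an illustration, not a proposal for the new
API: decode_parts and its (text, defects) return value are hypothetical names
invented for this example, and the latin-1 fallback is the client's own
choice, as discussed above.  The only library call it relies on is
email.header.decode_header, which in Python 3 returns str for a header with
no encoded words and bytes for decoded chunks.

```python
from email.header import decode_header


def decode_parts(header_value):
    """Hypothetical client-side helper: return (text, defects).

    Each chunk is decoded with its declared charset.  A chunk that
    fails is recorded in defects as (raw_bytes, charset) and then
    decoded as latin-1 so the client can still look at it.
    """
    pieces = []
    defects = []
    for chunk, charset in decode_header(header_value):
        if isinstance(chunk, str):
            # Header had no encoded words; already text.
            pieces.append(chunk)
            continue
        try:
            pieces.append(chunk.decode(charset or "ascii"))
        except (UnicodeDecodeError, LookupError):
            # Bad bytes or unknown charset: keep the raw bytes
            # available and fall back to latin-1 (never fails).
            defects.append((chunk, charset))
            pieces.append(chunk.decode("latin-1"))
    return "".join(pieces), defects


good_text, good_defects = decode_parts("=?iso-8859-1?q?p=F6stal?=")
bad_text, bad_defects = decode_parts("=?utf-8?q?p=F6stal?=")
```

In the second call the chunk claims utf-8 but is not valid utf-8, so the raw
bytes land in bad_defects and the client still gets a displayable string.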

--David (RDM)