[Python-Dev] [Email-SIG] Dropping bytes "support" in json

Stephen J. Turnbull turnbull at sk.tsukuba.ac.jp
Fri Apr 10 07:22:04 CEST 2009


Barry Warsaw writes:

 > There are really two ways to look at an email message.  It's either an  
 > unstructured blob of bytes, or it's a structured tree of objects.

Indeed!

 > Those objects have headers and payload.  The payload can be of any
 > type, though I think it generally breaks down into "strings" for
 > text/* types and bytes for anything else (not counting multiparts).

*sigh*  Why are you back-tracking?

The payload should be of an appropriate *object* type.  Atomic object
types will have their content stored as string or bytes [nb I use
Python 3 terminology throughout].  Composite types (multipart/*) won't
need string or bytes attributes AFAICS.

Start by implementing the application/octet-stream and
text/plain;charset=utf-8 object types, of course.
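A minimal sketch of those two starter object types plus a composite type. The class names (OctetStreamPayload, TextPlainUTF8Payload, MultipartPayload) and the to_wire() method are hypothetical illustrations, not part of the stdlib email package. Atomic types keep their content as bytes or str, while the multipart type holds only subparts and has no str or bytes content of its own.

```python
from dataclasses import dataclass, field
from typing import ClassVar, List, Union


@dataclass
class OctetStreamPayload:
    """Hypothetical application/octet-stream object: content is opaque bytes."""
    content: bytes
    content_type: ClassVar[str] = "application/octet-stream"

    def to_wire(self) -> bytes:
        # Already bytes; nothing to encode.
        return self.content


@dataclass
class TextPlainUTF8Payload:
    """Hypothetical text/plain;charset=utf-8 object: content is a str."""
    content: str
    content_type: ClassVar[str] = "text/plain; charset=utf-8"

    def to_wire(self) -> bytes:
        # Text becomes bytes only when a wire-format message is built.
        return self.content.encode("utf-8")


@dataclass
class MultipartPayload:
    """Hypothetical multipart/* object: holds subparts, no content attribute."""
    parts: List[Union[OctetStreamPayload, TextPlainUTF8Payload, "MultipartPayload"]] = field(
        default_factory=list
    )
    content_type: ClassVar[str] = "multipart/mixed"
```

Keeping the str-to-bytes conversion inside to_wire() means the text object never has to guess at a charset: for this type the charset is fixed at UTF-8.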

 > It does seem to make sense to think about headers as text header names  
 > and text header values.

I disagree.  IMHO, structured header types should have object values,
and something like

message['to'] = "Barry 'da FLUFL' Warsaw <barry at python.org>"

should be smart enough to detect that it's a string and attempt to
(flexibly) parse it into a fullname and a mailbox, adding escapes, etc.
Whether these should be structured objects or can simply be strings or
bytes, I'm not sure (probably bytes, not strings, though -- see the next
example).  OTOH

message['to'] = b'''"Barry 'da.FLUFL' Warsaw" <barry at python.org>'''

should assume that the client knows what they are doing, and should
parse it strictly (and I mean "be a real bastard", eg, raise an
exception on any non-ASCII octet), merely dividing it into fullname
and mailbox, and caching the bytes for later insertion in a
wire-format message.
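A minimal sketch of this asymmetric assignment rule. AddressHeader and its methods are hypothetical names, not an existing stdlib API, and the address is a placeholder. A str is parsed leniently with the real email.utils.parseaddr. A bytes value is handled strictly: any non-ASCII octet raises, and the original bytes are cached for reuse in a wire-format message. Only the ASCII check is enforced here. A real "bastard" parser would also validate the RFC 5322 syntax.

```python
from email.utils import parseaddr
from typing import Optional, Union


class AddressHeader:
    """Hypothetical structured value for an address header such as To:."""

    def __init__(self, fullname: str, mailbox: str, raw: Optional[bytes] = None):
        self.fullname = fullname
        self.mailbox = mailbox
        self.raw = raw  # original wire bytes; only set when built from bytes

    @classmethod
    def from_value(cls, value: Union[str, bytes]) -> "AddressHeader":
        if isinstance(value, str):
            return cls._parse_lenient(value)
        if isinstance(value, bytes):
            return cls._parse_strict(value)
        raise TypeError("header value must be str or bytes")

    @classmethod
    def _parse_lenient(cls, text: str) -> "AddressHeader":
        # parseaddr tolerates an unquoted display name; quoting and
        # escaping would be added later, when the header is serialized.
        fullname, mailbox = parseaddr(text)
        if not mailbox:
            raise ValueError(f"could not find a mailbox in {text!r}")
        return cls(fullname, mailbox)

    @classmethod
    def _parse_strict(cls, data: bytes) -> "AddressHeader":
        # The client claims to know what it is doing, so refuse any
        # non-ASCII octet instead of guessing at a charset.
        try:
            text = data.decode("ascii")
        except UnicodeDecodeError as exc:
            raise ValueError("non-ASCII octet in wire-format header") from exc
        fullname, mailbox = parseaddr(text)
        if not mailbox:
            raise ValueError(f"could not find a mailbox in {data!r}")
        # Cache the bytes so they can be inserted into the wire format unchanged.
        return cls(fullname, mailbox, raw=data)
```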

 > In that case, I think you want the values as unicodes, and probably  
 > the headers as unicodes containing only ASCII.  So your table would be  
 > strings in both cases.  OTOH, maybe your application cares about the  
 > raw underlying encoded data, in which case the header names are  
 > probably still strings of ASCII-ish unicodes and the values are  
 > bytes.  It's this distinction (and I think the competing use cases)  
 > that make a true Python 3.x API for email more complicated.

I don't see why you can't have the email API be specific, with
message['to'] always returning a structured_header object (or maybe
even more specifically an address_header object), and methods like

message['to'].build_header_as_text()

which returns

"""To: "Barry 'da.FLUFL' Warsaw" <barry at python.org>"""

and

message['to'].build_header_in_wire_format()

which returns

b"""To: "Barry 'da.FLUFL' Warsaw" <barry at python.org>"""

Then have email.textview.Message and email.wireview.Message which
provide a simple interface where message['to'] would invoke
.build_header_as_text() and .build_header_in_wire_format()
respectively.
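A minimal sketch of the two builder methods and the two views. The names (AddressHeader, TextViewMessage, WireViewMessage) are hypothetical stand-ins for the proposed email.textview.Message and email.wireview.Message, which do not exist in the stdlib. The sketch uses the real email.utils.formataddr to quote the display name. It assumes ASCII-only display names, so the text and wire forms differ only in type, as in the two examples above. The address is a placeholder.

```python
from email.utils import formataddr


class AddressHeader:
    """Hypothetical structured To:/From: header value."""

    def __init__(self, name: str, fullname: str, mailbox: str):
        self.name = name
        self.fullname = fullname
        self.mailbox = mailbox

    def build_header_as_text(self) -> str:
        # formataddr wraps the display name in double quotes when it contains
        # specials such as '.', and backslash-escapes any '"' or '\'.
        return f"{self.name}: {formataddr((self.fullname, self.mailbox))}"

    def build_header_in_wire_format(self) -> bytes:
        # With an ASCII-only display name (assumed here), the wire form is
        # simply the text form encoded as ASCII.
        return self.build_header_as_text().encode("ascii")


class _HeaderStore:
    """Shared storage: header objects keyed by lower-cased field name."""

    def __init__(self):
        self._headers = {}

    def __setitem__(self, key: str, header: AddressHeader) -> None:
        self._headers[key.lower()] = header


class TextViewMessage(_HeaderStore):
    def __getitem__(self, key: str) -> str:
        return self._headers[key.lower()].build_header_as_text()


class WireViewMessage(_HeaderStore):
    def __getitem__(self, key: str) -> bytes:
        return self._headers[key.lower()].build_header_in_wire_format()
```

Both views store the same structured object. Only the accessor decides whether the caller sees str or bytes, so neither representation is privileged inside the message itself.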

 > Thinking about this stuff makes me nostalgic for the sloppy happy days  
 > of Python 2.x

Er, yeah.

Nostalgic-for-the-BITNET-days-where-everything-was-Just-EBCDIC-ly y'rs,
