[Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?]

Stephen J. Turnbull stephen at xemacs.org
Tue Jan 7 14:00:17 CET 2014


Ethan Furman writes:

 > I just read your proposal again, and must admit I don't understand
 > how it would help me, but I look forward to testing an
 > implementation!
 > 
 > One wrinkle, though -- the data is binary, and if read would have
 > to be read using the latin1 codec...

That depends on what you mean by "binary".  If the binary payload is
just a blob that gets passed on (e.g., an HTTP client receiving
and storing a JPEG file), you read the stream as 'ascii-compatible',
parse the headers using regexps or whatever, print any relevant parsed
data to logs using 'ascii-compatible', slice off the blob, and write
the blob to disk as 'ascii-compatible'.  This has the advantage over
latin1 that the bytes are marked as "uninterpreted text".  It doesn't
mean you can't create mojibake; you still can.  But Python will
complain if you try to output it as text in an encoding (unless you
use the 'surrogateescape' handler, in which case you're explicitly
accepting responsibility for any mess you create).
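
A rough sketch of that workflow follows.  The 'ascii-compatible' codec
is only a proposal and does not exist in the stdlib, so this sketch
assumes ASCII decoding with the 'surrogateescape' error handler as a
stand-in.  The helper names and the sample response are made up for
illustration.

```python
# Stand-in for the proposed 'ascii-compatible' codec (an assumption, not the
# proposal itself): ASCII plus the 'surrogateescape' error handler.
def decode_uninterpreted(data: bytes) -> str:
    return data.decode("ascii", errors="surrogateescape")


def encode_uninterpreted(text: str) -> bytes:
    return text.encode("ascii", errors="surrogateescape")


# Hypothetical HTTP-like response: ASCII headers followed by a binary blob.
raw = b"Content-Type: image/jpeg\r\nContent-Length: 4\r\n\r\n\xff\xd8\xff\xe0"

# Read the whole stream as text, parse the headers, slice off the blob.
text = decode_uninterpreted(raw)
head, _, blob = text.partition("\r\n\r\n")
headers = dict(line.split(": ", 1) for line in head.split("\r\n"))

# Writing the blob back out with the same codec restores the original bytes.
blob_bytes = encode_uninterpreted(blob)

# Encoding the blob strictly as real text is refused, because the non-ASCII
# bytes were decoded to lone surrogates.
try:
    blob.encode("utf-8")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True

# With latin1 the same bytes become ordinary characters, so the strict
# encode succeeds silently and produces different bytes (mojibake).
latin1_blob = raw.decode("latin-1").partition("\r\n\r\n")[2]
latin1_reencoded = latin1_blob.encode("utf-8")
```

Here the strict UTF-8 encode of the blob raises, while the latin1
version goes through without complaint and no longer matches the
original bytes.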

If you mean to process the binary, whether it helps depends on what
you want to do.  For struct- and ctypes-style processing, no, it won't
help, because you need to convert to bytes to use those.  (It might
make sense to read the headers into a buffer this way, parse them as
ASCII-compatible text, and then read the rest as bytes.)  For code
that works purely on bytes, it doesn't help, although it probably
doesn't hurt.
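
To illustrate the struct case, here is a small sketch of the split
approach: parse the header as text, then hand the rest to struct as
bytes.  The message layout is hypothetical, and ASCII with
'surrogateescape' again stands in for the proposed codec.

```python
import struct

# Hypothetical message: an ASCII header line, then a 4-byte big-endian
# unsigned integer as the binary payload.
raw = b"Length: 4\n" + struct.pack(">I", 1234)

# Split while still in bytes, then decode only the header as text.
header_bytes, _, payload = raw.partition(b"\n")
header = header_bytes.decode("ascii", errors="surrogateescape")
name, _, value = header.partition(": ")

# struct works on bytes, so the payload is never decoded.
(number,) = struct.unpack(">I", payload[: int(value)])

# Passing a decoded str to struct is rejected with TypeError.
try:
    struct.unpack(">I", payload.decode("ascii", errors="surrogateescape"))
    str_accepted = True
except TypeError:
    str_accepted = False
```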


