[Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?]

Steven D'Aprano steve at pearwood.info
Wed Jan 8 01:39:11 CET 2014


On Tue, Jan 07, 2014 at 08:48:05AM -0800, Ethan Furman wrote:

> [...] My binary stream is mixed:
> 
>   - binary that has to be converted (4-byte ints, for example)
>   - ascii that has to be converted (ints stored as ascii text)
>   - encoded text (character and memo fields)

Ethan, you keep referring to ascii text and encoded text as if they are 
different things. They're not. You have a binary file containing bytes. 
Some of those bytes represent data of one kind (say, 4-byte ints). Some 
of those bytes represent data of a different kind (Latin-1 encoded text 
representing character and memo fields) and other bytes represent data 
of a third kind (ASCII encoded text representing ints, but you don't 
mention what the meaning of those ints is).

ASCII or Latin-1, the text is still encoded into bytes, and still needs 
to be decoded back to text. Since Latin-1 is a superset of ASCII, you 
could use Latin-1 for them all, and still get the same result.
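
A minimal sketch of that point in Python 3. The field contents below are 
made up for illustration and are not taken from Ethan's file:

```python
# Hypothetical field contents, for illustration only.
ascii_int_field = b"  1234"   # an int stored as ASCII text, space padded
memo_field = b"caf\xe9"       # Latin-1 encoded text containing a non-ASCII byte

# Pure-ASCII bytes decode to the same str under either codec.
assert ascii_int_field.decode("ascii") == ascii_int_field.decode("latin-1")

# int() ignores surrounding whitespace, so the padded field converts directly.
value = int(ascii_int_field.decode("latin-1"))

# Bytes above 0x7F are valid Latin-1 but are rejected by the ASCII codec.
text = memo_field.decode("latin-1")
try:
    memo_field.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```

Decoding the ASCII-only field with Latin-1 loses nothing, while decoding 
the memo field with ASCII fails, which is why Latin-1 would work for both.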

Of course you can't just decode the entire file as Latin-1, since 
parts of it represent non-text data, but you could decode all the text 
parts individually using Latin-1 and/or ASCII.
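
One way to do that, sketched with the standard struct module. The record 
layout here is invented purely for illustration (it is an assumption, not 
Ethan's actual file format), as are the names RECORD and parse_record:

```python
import struct

# Hypothetical record layout, invented for this example:
#    4 bytes  little-endian signed int (binary)
#    6 bytes  int stored as ASCII text, space padded
#   10 bytes  Latin-1 encoded text, space padded
RECORD = struct.Struct("<i6s10s")

def parse_record(raw):
    """Split one record into its fields and decode only the text parts."""
    count, ascii_num, name = RECORD.unpack(raw)
    return (
        count,                               # already an int, no decoding
        int(ascii_num.decode("ascii")),      # ASCII text -> str -> int
        name.decode("latin-1").rstrip(),     # Latin-1 text -> str
    )

# Build a sample record and parse it back.
raw = struct.pack("<i", 42) + b"   123" + "José".encode("latin-1").ljust(10)
record = parse_record(raw)
```

The binary int never goes through a codec at all; only the two text 
fields are decoded, each on its own.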

(To those reading and wondering how I know the character and memo fields 
use Latin-1, Ethan has discussed this case on comp.lang.python.)



-- 
Steven

