Ascii to Unicode.

Ethan Furman ethan at stoneleaf.us
Thu Jul 29 14:34:18 EDT 2010


Joe Goldthwaite wrote:
> Hi Ulrich,
> 
> Ascii.csv isn't really a latin-1 encoded file.  It's an ascii file with a
> few characters above the 128 range . . .

It took me a while to get this point, too.  (If you have already "gotten 
it", I apologize, but the comment above leads me to believe you haven't.)

*Every* file is an encoded file... even your UTF-8 file is encoded using 
the UTF-8 encoding.  Someone correct me if I'm wrong, but I believe 
ASCII (0-127) matches up to the first 128 Unicode code points.  So 
while those first 128 code points translate easily to ASCII, ASCII is 
still an encoding, and if you have bytes higher than 127, you don't 
really have an ASCII file -- you have (for example) a cp1252 file (which 
also, not coincidentally, shares the first 128 characters/code points 
with ASCII).
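
For what it's worth, here is a minimal Python 3 sketch of that point.  The 
sample bytes are made up, and cp1252 is only my guess at what such a file 
might really be; your file could just as well be latin-1 or something else.

```python
# Hypothetical sample: bytes such as might sit in a file like Ascii.csv.
# 0xE9 and 0xEF are above 127, so these are not pure-ASCII bytes.
raw = b"caf\xe9,na\xefve\n"

# Bytes in the 0-127 range decode to the same text under all of these codecs.
plain = b"cafe"
same_low_range = (
    plain.decode("ascii") == plain.decode("cp1252") == plain.decode("utf-8")
)

# A strict ASCII decode rejects any byte above 127.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Assumption: the data is really cp1252. Decode it to text (Unicode),
# then encode that text as UTF-8.
text = raw.decode("cp1252")
utf8_bytes = text.encode("utf-8")
```

If the guess at the source encoding is wrong, the decode may still succeed 
but give you the wrong characters, so it is worth checking a few known 
values by eye.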

Hopefully I'm not adding to the confusion.  ;)

~Ethan~


