[Python-Dev] directive statement (PEP 244)

M.-A. Lemburg mal@lemburg.com
Mon, 16 Jul 2001 23:38:50 +0200


Guido van Rossum wrote:
> 
> > Hmm, I guess you have something like this in mind...
> >
> > 1. read the file
> > 2. decode it into Unicode assuming some fixed per-file encoding
> > 3. tokenize the Unicode content
> > 4. compile it, creating Unicode objects from the given Unicode data
> >    and creating string objects from the Unicode literal data
> >    by first reencoding the Unicode data into 8-bit string data
> >
> > To make this backwards compatible, the implementation would have to
> > assume Latin-1 as the original file encoding if not given (otherwise,
> > binary data currently stored in 8-bit strings wouldn't make the
> > roundtrip).
> 
> To be compatible with the current default encoding, I would use ASCII
> as the default encoding and issue an error if any non-ASCII characters
> are found.  One should always use hex/oct escapes to enter binary data
> in literals!

Hmm, far too many people in Europe and elsewhere currently use
Latin-1 and other locale-specific encodings in 8-bit strings...
they won't feel good about an ASCII-only default.

Note that the reason for using Latin-1 is that Latin-1 decoded
into Unicode and then reencoded into Latin-1 is a 1-1
mapping for all 8-bit values -- this gives us binary 
backward compatibility.
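
A minimal sketch of that round-trip property, written in present-day
Python 3 syntax (an assumption on my part; the variable names are purely
illustrative and not part of any proposal). It decodes every possible
8-bit value as Latin-1, reencodes it, and compares the result, then shows
that an ASCII default would reject the same data:

```python
# Every possible 8-bit value, packed into one byte string.
all_bytes = bytes(range(256))

# Latin-1 maps byte value N to code point U+00NN, so decoding never fails...
as_unicode = all_bytes.decode("latin-1")

# ...and reencoding yields exactly the original bytes.
round_tripped = as_unicode.encode("latin-1")

latin1_is_lossless = (round_tripped == all_bytes)
code_points_match = all(ord(ch) == b for ch, b in zip(as_unicode, all_bytes))

# An ASCII default, by contrast, rejects any byte value >= 0x80.
try:
    all_bytes.decode("ascii")
    ascii_accepts_all = True
except UnicodeDecodeError:
    ascii_accepts_all = False
```

Under these assumptions, arbitrary binary data kept in an 8-bit string
literal would survive the decode/reencode step unchanged with Latin-1,
whereas an ASCII default would raise an error on it.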

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Consulting & Company:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/