[Python-Dev] directive statement (PEP 244)
M.-A. Lemburg
mal@lemburg.com
Mon, 16 Jul 2001 23:38:50 +0200
Guido van Rossum wrote:
>
> > Hmm, I guess you have something like this in mind...
> >
> > 1. read the file
> > 2. decode it into Unicode assuming some fixed per-file encoding
> > 3. tokenize the Unicode content
> > 4. compile it, creating Unicode objects from the given Unicode data
> > and creating string objects from the Unicode literal data
> > by first reencoding the Unicode data into 8-bit string data
> >
> > To make this backwards compatible, the implementation would have to
> > assume Latin-1 as the original file encoding if not given (otherwise,
> > binary data currently stored in 8-bit strings wouldn't make the
> > roundtrip).
>
> To be compatible with the current default encoding, I would use ASCII
> as the default encoding and issue an error if any non-ASCII characters
> are found. One should always use hex/oct escapes to enter binary data
> in literals!
Hmm, Latin-1 and other locale-specific encodings are currently
used in 8-bit string literals by far too many people in Europe and
elsewhere... they won't be happy if non-ASCII characters in their
source files suddenly raise errors.
Note that the reason for using Latin-1 is that decoding Latin-1
into Unicode and then reencoding the result into Latin-1 is a 1-1
mapping for all 8-bit values -- this gives us binary
backward compatibility.
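
The round-trip property can be checked with a short sketch. It is written
for a current Python 3 interpreter (where bytes and text are separate
types), not the interpreter under discussion here, and the helper name
`roundtrips` is only an illustrative assumption:

```python
# All 256 possible byte values, as they might appear in an 8-bit string.
raw = bytes(range(256))

# Latin-1 maps byte value N to code point N, so decoding never fails.
text = raw.decode("latin-1")
assert [ord(ch) for ch in text] == list(range(256))

# Reencoding gives back exactly the original bytes.
assert text.encode("latin-1") == raw


def roundtrips(data: bytes, encoding: str) -> bool:
    """Return True if data survives decode + encode unchanged (hypothetical helper)."""
    try:
        return data.decode(encoding).encode(encoding) == data
    except UnicodeError:
        # The codec cannot represent some of the byte values.
        return False


latin1_ok = roundtrips(raw, "latin-1")  # every byte value survives
ascii_ok = roundtrips(raw, "ascii")     # bytes >= 0x80 are rejected
```

With ASCII as the assumed file encoding, any byte at or above 0x80 fails
at the decode step, which is the error case described in the quoted text.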
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Consulting & Company: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/