[Python-Dev] directive statement (PEP 244)
Paul Prescod
paulp@ActiveState.com
Mon, 16 Jul 2001 12:22:43 -0700
"M.-A. Lemburg" wrote:
>
>...
>
> Hmm, I guess you have something like this in mind...
>
> 1. read the file
> 2. decode it into Unicode assuming some fixed per-file encoding
> 3. tokenize the Unicode content
> 4. compile it,
Right. This is how XML, Java, Perl, etc. work. XML and Python would be
the only languages to actually declare the encoding in use inside the
file itself (in ASCII). I think that the declaration approach is clearly
superior to depending on command-line arguments or BOMs.
But this is just how it has to *look* to the user. If there is an
implementation that behind the scenes only decodes Unicode literals,
that would be fine.
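The four quoted steps might be sketched, in modern Python, roughly as
follows. The "# coding: ..." cookie format here is borrowed from what
later became common practice; PEP 244's directive syntax differed, so
treat this purely as illustration:

```python
import re

# Rough, hypothetical sketch of the four quoted steps.
CODING = re.compile(rb'coding[:=]\s*([-\w.]+)')

def decode_source(raw: bytes) -> str:
    """Steps 1-2: read the bytes, find the ASCII declaration, decode."""
    first_line = raw.split(b'\n', 1)[0]
    m = CODING.search(first_line)
    # No declaration: assume Latin-1 so existing 8-bit literals
    # round-trip byte-for-byte (see the backward-compatibility note).
    encoding = m.group(1).decode('ascii') if m else 'latin-1'
    return raw.decode(encoding)

raw = b"# coding: utf-8\ns = 'caf\xc3\xa9'\n"
text = decode_source(raw)

# Steps 3-4: tokenize and compile the now-Unicode source.  The cookie
# line is stripped here only to sidestep compile()'s own handling of
# encoding declarations in str input.
code = compile(text.split('\n', 1)[1], '<example>', 'exec')
namespace = {}
exec(code, namespace)
```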
> ... creating Unicode objects from the given Unicode data
> and creating string objects from the Unicode literal data
> by first reencoding the Unicode data into 8-bit string data
Or we could just disallow non-ASCII 8-bit string literals in files that
use the declaration. That was never a feature Guido really intended to
support (as I understand it!) and I don't see a need to carry it
forward. If you are in the Unicode universe then the need to put binary
data in 8-bit string literals is massively reduced.
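One way to picture that restriction: an encoding-declared file would
reject any 8-bit string literal whose contents are not pure ASCII. The
helper name and approach below are purely illustrative, not anything
PEP 244 specifies:

```python
def check_byte_literal(literal: bytes) -> bytes:
    """Hypothetical check: 8-bit literals in an encoding-declared
    file must contain only ASCII bytes."""
    try:
        literal.decode('ascii')
    except UnicodeDecodeError:
        raise SyntaxError(
            'non-ASCII data in an 8-bit string literal of an '
            'encoding-declared file')
    return literal

check_byte_literal(b'plain ascii')   # accepted unchanged
try:
    check_byte_literal(b'caf\xe9')   # rejected
except SyntaxError:
    rejected = True
```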
> To make this backwards compatible, the implementation would have to
> assume Latin-1 as the original file encoding if not given (otherwise,
> binary data currently stored in 8-bit strings wouldn't make the
> roundtrip).
Another way to think about it is that files without the declaration skip
directly to the tokenize step and skip the decoding step.
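The reason Latin-1 is the safe backward-compatible assumption is a
property worth making explicit: every possible byte value decodes under
Latin-1, and re-encoding restores the original bytes exactly, so binary
data stored in 8-bit string literals survives the round trip. A minimal
demonstration:

```python
# Latin-1 maps each of the 256 byte values to the code point with the
# same number, so decode followed by encode is the identity on bytes.
data = bytes(range(256))
roundtripped = data.decode('latin-1').encode('latin-1')
assert roundtripped == data
```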
--
Take a recipe. Leave a recipe.
Python Cookbook! http://www.ActiveState.com/pythoncookbook