[Python-Dev] directive statement (PEP 244)

Paul Prescod paulp@ActiveState.com
Mon, 16 Jul 2001 12:22:43 -0700


"M.-A. Lemburg" wrote:
> 
>...
> 
> Hmm, I guess you have something like this in mind...
> 
> 1. read the file
> 2. decode it into Unicode assuming some fixed per-file encoding
> 3. tokenize the Unicode content
> 4. compile it, 

Right. This is how XML, Java, Perl, etc. work. XML and Python would be
the only languages to actually declare the encoding in use (in ASCII). I
think that the declaration approach is clearly superior to depending on
command-line arguments or BOMs.

But this is just how it has to *look* to the user. If there is an
implementation that behind the scenes only decodes Unicode literals,
that would be fine.
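The four quoted steps can be sketched roughly as follows. This is a sketch
only, not interpreter code: the "directive encoding" detection is a
hypothetical stand-in for whatever declaration syntax PEP 244 ends up with,
and a real compiler's grammar would also have to accept the directive line.

```python
import io
import tokenize

def compile_source(path, default_encoding="latin-1"):
    # 1. read the file (as raw bytes)
    with open(path, "rb") as f:
        raw = f.read()
    # 2. decode it into Unicode assuming some fixed per-file encoding:
    #    look for a declaration on the first line, else fall back to a
    #    default (the declaration syntax here is a hypothetical example)
    first_line = raw.split(b"\n", 1)[0]
    encoding = default_encoding
    if first_line.startswith(b"directive encoding"):
        encoding = first_line.split()[-1].strip(b"'\"").decode("ascii")
    text = raw.decode(encoding)
    # 3. tokenize the Unicode content
    tokens = list(tokenize.generate_tokens(io.StringIO(text).readline))
    # 4. compile it
    return compile(text, path, "exec")
```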

> ... creating Unicode objects from the given Unicode data
>    and creating string objects from the Unicode literal data
>    by first reencoding the Unicode data into 8-bit string data

Or we could just disallow non-ASCII 8-bit string literals in files that
use the declaration. That was never a feature Guido really intended to
support (as I understand it!) and I don't see a need to carry it
forward. If you are in the Unicode universe then the need to put binary
data in 8-bit string literals is massively reduced.
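As a rough illustration of such a restriction (my own sketch using the
tokenize module, not anything the compiler actually does): reject any
plain, non-u-prefixed string literal that contains non-ASCII characters.

```python
import io
import tokenize

# Hypothetical sketch of the proposed rule: in a file that declares its
# encoding, non-ASCII data may only appear in Unicode (u"...") literals.
def check_8bit_literals(text):
    for tok in tokenize.generate_tokens(io.StringIO(text).readline):
        if tok.type != tokenize.STRING:
            continue
        if tok.string[:1] in ("u", "U"):
            continue  # Unicode literals may carry any character
        try:
            tok.string.encode("ascii")
        except UnicodeEncodeError:
            raise SyntaxError(
                "non-ASCII data in 8-bit string literal: %r" % tok.string)
```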

> To make this backwards compatible, the implementation would have to
> assume Latin-1 as the original file encoding if not given (otherwise,
> binary data currently stored in 8-bit strings wouldn't make the
> roundtrip).

Another way to think about it is that files without the declaration skip
the decoding step and go directly to tokenization.
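In code terms, the two paths might look like this. Sketch only: the
"directive encoding" check is a hypothetical stand-in for the real
declaration syntax.

```python
def prepare_source(raw: bytes):
    first_line = raw.split(b"\n", 1)[0]
    if first_line.startswith(b"directive encoding"):
        # Declared: decode the whole file before tokenizing.
        enc = first_line.split()[-1].strip(b"'\"").decode("ascii")
        return raw.decode(enc)
    # No declaration: skip decoding and hand the bytes straight
    # to the byte-oriented tokenizer, as today.
    return raw
```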
-- 
Take a recipe. Leave a recipe.  
Python Cookbook!  http://www.ActiveState.com/pythoncookbook