[Python-Dev] directive statement (PEP 244)

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Tue, 17 Jul 2001 00:12:12 +0200


> But this is just how it has to *look* to the user. If there is an
> implementation that behind the scenes only decodes Unicode literals,
> that would be fine.

Formally, you would have to decode the entire file just to make sure
that it follows the declared encoding (i.e. contains no invalid byte
sequences).
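
A minimal sketch of what that validation step could look like
(illustrative only; the helper name is hypothetical, and nothing beyond
a plain decode of the whole file is assumed):

    # Sketch: checking conformance to the declared encoding means
    # decoding all of the source, since an invalid byte sequence can
    # appear anywhere in the file, not just inside string literals.
    def validate_source(raw_bytes, declared_encoding):
        try:
            return raw_bytes.decode(declared_encoding)
        except UnicodeDecodeError as exc:
            raise SyntaxError("source is not valid %s: %s"
                              % (declared_encoding, exc))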

I'm also not sure what exactly an "ASCII superset" is. Is it an
encoding in which all ASCII strings just mean themselves? If so, and
if you allow encodings that have a shift state, you need to keep track
of the shift state during tokenization: byte 39 might not always mean
APOSTROPHE, e.g. if you are in a shift state.
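
To make that concrete, here is a small illustration (not from the
original mail; ISO-2022-JP is just one example of a shift-state
encoding):

    # In a shift-state encoding such as ISO-2022-JP, byte 39 (0x27,
    # the ASCII apostrophe) can occur inside a shifted two-byte
    # sequence even though the decoded text contains no apostrophe.
    text = "\u3047"                   # small hiragana 'e', JIS code 0x2427
    data = text.encode("iso2022_jp")
    print(0x27 in data)               # True: here 0x27 is half of a kana code
    print("'" in text)                # False: no apostrophe in the text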

> Or we could just disallow non-ASCII 8-bit string literals in files that
> use the declaration.

+1.
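
For concreteness, a rough sketch of the check being agreed to here
(the name and error wording are hypothetical):

    # Sketch: with an encoding declaration present, reject plain 8-bit
    # string literals that contain any non-ASCII byte.
    def check_plain_literal(literal_bytes, has_coding_declaration):
        if has_coding_declaration and any(b > 0x7f for b in literal_bytes):
            raise SyntaxError("non-ASCII byte in 8-bit string literal in "
                              "a file with an encoding declaration")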

> > To make this backwards compatible, the implementation would have to
> > assume Latin-1 as the original file encoding if not given (otherwise,
> > binary data currently stored in 8-bit strings wouldn't make the
> > roundtrip).
> 
> Another way to think about it is that files without the declaration skip
> directly to the tokenize step and skip the decoding step.

That's the way I would think of it also. You don't have Latin-1 values
in such strings - they are just byte strings.
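
A minimal sketch of that two-path model (illustrative only; the
function name is hypothetical):

    # Sketch: a file with a declaration is decoded before tokenizing;
    # a file without one is tokenized as raw bytes, so 8-bit string
    # literals keep their bytes untouched.
    def prepare_source(raw_bytes, declared_encoding=None):
        if declared_encoding is None:
            return raw_bytes                        # no decoding step
        return raw_bytes.decode(declared_encoding)  # decode, then tokenize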

Regards,
Martin