[Python-Dev] #pragmas in Python source code
Fredrik Lundh <effbot@telia.com>
Thu, 13 Apr 2000 13:50:17 +0200
M.-A. Lemburg wrote:
> The current need for #pragmas is really very simple: to tell
> the compiler which encoding to assume for the characters
> in u"...strings..." (*not* "...8-bit strings...").
why not?
why keep on pretending that strings and strings are two
different things? it's an artificial distinction, and it only
causes problems all over the place.
> Could be that we don't need this pragma discussion at all
> if there is a different, more elegant solution to this...
here's one way:
1. standardize on *unicode* as the internal character set. use
an encoding marker to specify what *external* encoding you're
using for the *entire* source file. output from the tokenizer is
a stream of *unicode* strings.
2. if the user tries to store a unicode character larger than 255
in an 8-bit string, raise an OverflowError.
3. the default encoding is "none" (instead of XML's "utf-8"). in
this case, treat the script as an ascii superset, and store each
string literal as is (character-wise, not byte-wise).
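a minimal sketch of how items (1) to (3) could fit together, written in present-day Python 3 rather than 1.6. the names decode_source and to_8bit_string are made up for illustration, bytes stands in for the 8-bit string type, and reading the "none" default as a one-to-one byte-to-character mapping (which is what latin-1 does) is an assumption, not something the proposal spells out.

```python
def decode_source(raw, encoding=None):
    """Items (1) and (3): turn raw source bytes into unicode text.

    encoding=None models the "none" default. Assumption: each byte is
    kept as the character with the same ordinal, which is exactly the
    latin-1 mapping.
    """
    if encoding is None:
        return raw.decode("latin-1")
    return raw.decode(encoding)


def to_8bit_string(text):
    """Item (2): store unicode text in an 8-bit string.

    Raises OverflowError if any character is larger than 255.
    """
    for ch in text:
        if ord(ch) > 255:
            raise OverflowError(
                "character U+%04X does not fit in an 8-bit string" % ord(ch)
            )
    return bytes(ord(ch) for ch in text)


if __name__ == "__main__":
    # default encoding: bytes survive the round trip unchanged
    print(to_8bit_string(decode_source(b"caf\xe9")))

    # utf-8 source containing a euro sign: fine as unicode,
    # but too large for an 8-bit string
    euro = decode_source(b"\xe2\x82\xac", "utf-8")
    try:
        to_8bit_string(euro)
    except OverflowError as exc:
        print("OverflowError:", exc)
```

with the default encoding every character is at most 255, so the check in to_8bit_string can never fail; it only matters once a real external encoding is declared.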
additional notes:
-- item (3) is for backwards compatibility only. might be okay to
change this in Py3K, but not before that.
-- leave the implementation of (1) to 1.7. for now, assume that
scripts have the default encoding, which means that (2) cannot
happen.
-- we still need an encoding marker for ascii supersets (how about
<?python encoding="utf-8" version="1.6"?> ;-). however, it's up to
the tokenizer to detect that one, not the parser. the parser only
sees unicode strings.
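a rough sketch of tokenizer-side marker detection, again in present-day Python 3. the marker syntax is the tongue-in-cheek one from the note above, not a settled format; detect_encoding and tokenizer_input are hypothetical names, and looking only at the first line and falling back to a latin-1 style one-to-one mapping are both assumptions made for the example.

```python
import re

# Assumption: the marker sits somewhere on the first line of the file
# (for instance inside a comment) and uses the XML-like syntax above.
MARKER = re.compile(
    rb'<\?python\s+encoding="([^"]+)"(?:\s+version="([^"]+)")?\s*\?>'
)


def detect_encoding(raw):
    """Return the declared encoding name, or None if there is no marker."""
    first_line = raw.split(b"\n", 1)[0]
    match = MARKER.search(first_line)
    if match is None:
        return None
    return match.group(1).decode("ascii")


def tokenizer_input(raw):
    """Decode raw source bytes so later stages only see unicode text."""
    encoding = detect_encoding(raw)
    if encoding is None:
        # no marker: the "none" default, modelled as a one-to-one
        # byte-to-character mapping (latin-1)
        return raw.decode("latin-1")
    return raw.decode(encoding)


if __name__ == "__main__":
    src = b'# <?python encoding="utf-8" version="1.6"?>\ns = "\xe2\x82\xac"\n'
    print(detect_encoding(src))
    print(repr(tokenizer_input(src)))
```

everything after tokenizer_input works on unicode text only, which is the division of labour described above: the tokenizer deals with bytes and markers, the parser never does.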
</F>