Python strings outside the 128 range
Fredrik Lundh
fredrik at pythonware.com
Thu Jul 13 11:02:15 EDT 2006
Gerhard Fiedler wrote:
> If I understand you correctly, you are saying that if I distribute a file
> with the following lines:
>
> s = "é"
> print s
>
> I basically need to distribute also the information how the file is encoded
> and every user needs to use the same (or a compatible) encoding for reading
> this file?
if you put a, say, chr(233) in an 8-bit string literal in your source code, whoever runs
your program will get a chr(233) byte (unless someone's recoded the file on the way;
ordinary file copies and installation tools usually don't do that). how your program
treats that chr(233) is up to your program.
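as a rough illustration of that last point, here is a minimal sketch of how one and the same byte can mean different things depending on the encoding the program assumes. the helper name interpret and the three codecs are arbitrary choices for the example, and the b"" prefix is an assumption that you are on Python 2.6 or later (on older versions, drop the prefix and use a plain 8-bit literal):

```python
# the byte the thread calls chr(233), written as an escape so the source stays pure ASCII
raw = b"\xe9"

def interpret(data, encoding):
    """Decode bytes with the given codec; return None if they are not valid in it.

    (hypothetical helper, introduced only for this illustration)
    """
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

as_latin1 = interpret(raw, "latin-1")  # U+00E9, LATIN SMALL LETTER E WITH ACUTE
as_cp437 = interpret(raw, "cp437")     # U+0398, GREEK CAPITAL LETTER THETA
as_utf8 = interpret(raw, "utf-8")      # None: a lone 0xE9 is an incomplete UTF-8 sequence
```

the byte itself never changes; only the program's interpretation of it does.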
to write robust and future-proof code:
- use Unicode literals if you want to put non-ASCII *text* in Python string literals,
and use a PEP 263-style coding directive to tell the parser what encoding your file
is using:
http://www.python.org/dev/peps/pep-0263/
- avoid putting non-ASCII characters in 8-bit literal strings; use escape sequences if
you need to embed binary data in a string literal.
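the two recommendations above could look roughly like this in a source file. this is only a sketch: it assumes the file is saved as UTF-8, the sample bytes are arbitrary, and the b"" prefix assumes Python 2.6 or later (on older versions a plain "..." literal with the same escapes plays that role):

```python
# -*- coding: utf-8 -*-
# PEP 263 directive: it has to sit on the first or second line of the file,
# and it must match the encoding the file is actually saved in.

# non-ASCII *text* goes in a Unicode literal; the parser decodes it
# using the declared source encoding
text = u"é"

# binary data is spelled with escape sequences, so the source itself
# contains no non-ASCII bytes for this literal (arbitrary sample bytes)
data = b"\x00\xe9\xff"

# the program picks an encoding explicitly when text has to become bytes
utf8_bytes = text.encode("utf-8")      # two bytes: 0xC3 0xA9
latin1_bytes = text.encode("latin-1")  # one byte: 0xE9
```

the Unicode literal is a single character no matter how the file was encoded on disk, as long as the directive tells the truth about that encoding.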
also see the "lexical analysis" section in the language reference:
http://pyref.infogami.com/lexical-analysis
</F>