Python strings outside the 128 range

Fredrik Lundh fredrik at pythonware.com
Thu Jul 13 11:02:15 EDT 2006


Gerhard Fiedler wrote:

> If I understand you correctly, you are saying that if I distribute a file
> with the following lines:
>
>   s = "é"
>   print s
>
> I basically need to distribute also the information how the file is encoded
> and every user needs to use the same (or a compatible) encoding for reading
> this file?

if you put, say, a chr(233) in an 8-bit string literal in your source code, whoever runs
your program will get a chr(233) byte (unless someone has recoded the file along the way;
ordinary file copies and installation tools usually don't do that).  the byte itself carries
no encoding information; how it is interpreted is up to your program.
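
as a rough illustration of the point above (a sketch, not part of the original post): the
same chr(233) byte reads as different characters depending on which encoding the program
assumes.  the variable names are made up for the example, and the b"" prefix is an
assumption so that the snippet also runs on newer Python versions.

```python
# one byte with value 233; the b"" prefix is an assumption for newer Pythons
# (on 2006-era Python 2, a plain "\xe9" literal is the same 8-bit string)
raw = b"\xe9"

# the same byte means different things under different encodings
as_latin1 = raw.decode("iso-8859-1")   # LATIN SMALL LETTER E WITH ACUTE
as_cp437 = raw.decode("cp437")         # GREEK CAPITAL LETTER THETA

# and on its own it is not valid UTF-8 at all
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
```

which of these readings is "right" depends entirely on what the program decides the byte
means.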

to write robust and future-proof code,

- use Unicode literals if you want to put non-ASCII *text* in Python string literals,
  and use a PEP 263-style coding directive to tell the parser what encoding your file
  is using:

        http://www.python.org/dev/peps/pep-0263/

- avoid putting non-ASCII characters in 8-bit string literals; use escape sequences if
  you need to embed binary data in a string literal.
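
the two guidelines above could look roughly like this in a source file (a minimal sketch,
assuming the file is saved as UTF-8; the names are hypothetical, and the b"" prefix is an
assumption for newer Python versions, where the 8-bit string type is spelled that way):

```python
# -*- coding: utf-8 -*-
# the directive above only takes effect on the first or second line of a
# file, and must match the encoding the file is actually saved in (assumed UTF-8)

# text: a Unicode literal, decoded by the parser using the declared encoding
text = u"é"

# the same character spelled with an escape, independent of the file encoding
same_text = u"\u00e9"

# binary data: escape sequences only, so this literal stays pure ASCII in the source
# (b"" is an assumption for newer Pythons; in 2006 this was written "\xe9\x00\xff")
data = b"\xe9\x00\xff"

# encode explicitly at the boundary where text has to become bytes
utf8_bytes = text.encode("utf-8")
```

with this layout the non-ASCII character only ever appears as text, and the program chooses
the byte encoding explicitly when it writes the text out.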

also see the "lexical analysis" section in the language reference:

    http://pyref.infogami.com/lexical-analysis

</F> 





