How to get Python to default to UTF8
weheh
weheh at verizon.net
Sun Dec 23 00:49:02 EST 2007
Hi Fredrik,
Thanks again for your feedback. I am much obliged.
Indeed, I am forced to be extremely rigorous about decoding on the way in
and encoding on the way out everywhere in my program, just as you say. Your
advice is excellent and concurs with other sources of Unicode expertise.
Following this approach is the only thing that has made it possible for me
to get my program to work.
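
A minimal sketch of that decode-in, encode-out discipline, assuming UTF-8 at both boundaries. The helper names decode_input and encode_output are hypothetical, made up for illustration, and the sample bytes stand in for whatever a form or file delivers:

```python
def decode_input(raw, encoding="utf-8"):
    """Turn encoded bytes from a form or file into text at the boundary."""
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw  # already text, leave it alone


def encode_output(text, encoding="utf-8"):
    """Turn text back into bytes just before it leaves the program."""
    return text.encode(encoding)


# Hypothetical input: the UTF-8 bytes for "café" as they might arrive from a form.
raw = b"caf\xc3\xa9"
text = decode_input(raw)             # text only on the "inside"
out = encode_output(text.upper())    # bytes again on the way out
```

Everything between the two calls works on text only, so the encoding is named in exactly two places.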
However, the situation is still unacceptable to me because I often make
mistakes and it is easy for me to miss places where encoding is necessary. I
rely on testing to find my faults. In my development environment, I get no
error message and it seems that everything works perfectly. However, once
ported to the server, I see a crash. But this is too late a stage to catch
the error since the app is already live.
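
One possible way to make a forgotten encode fail the same way on both machines is to have the output routine accept only encoded bytes, so the check does not depend on any environment default. This is only a sketch: write_response is a hypothetical name, and it assumes all output can be routed through a single function.

```python
import io


def write_response(stream, payload):
    """Write already-encoded bytes; reject text that was never encoded."""
    if not isinstance(payload, bytes):
        raise TypeError(
            "payload must be encoded bytes, got %s" % type(payload).__name__
        )
    stream.write(payload)


# Stand-in for the browser or output file during testing.
buf = io.BytesIO()
write_response(buf, u"caf\xe9".encode("utf-8"))  # explicit encode: accepted

try:
    write_response(buf, u"caf\xe9")  # forgot to encode: fails on every machine
except TypeError as exc:
    missed = str(exc)
```

With a guard like this, a test run on the development machine should hit the same exception the server would, rather than passing silently.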
I assume that the default encoding, which you say shouldn't ever be
changed, is set in the site.py file. I've checked this file and it's set
to ascii on both machines (development and server). I haven't touched
site.py. However, a week or so ago, following the advice of someone I read
on the web, I did create a file in my cgi-bin directory called something
like site-config.py, wherein encoding was set to utf8. I ran my program a
few times, but then reading elsewhere that the site-config.py approach was
outmoded, I decided to remove this file. I'm wondering whether it made a
permanent change somewhere in the bowels of python while I wasn't looking?
Can you elaborate on where to look to see what stdin/stdout encodings are
set to? All inputs are coming at my app either via html forms or input
files. All output goes either to the browser via html or to an output file.
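
For comparing the two machines, a small sketch that reports the encodings the interpreter exposes through the standard sys and locale modules. The function name report_encodings is hypothetical. The stream attributes are read with getattr because, as far as I understand, they can be missing or None when the process is not attached to a terminal, as under CGI:

```python
import locale
import sys


def report_encodings():
    """Collect the encodings the interpreter reports in this environment."""
    return {
        "default": sys.getdefaultencoding(),
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
        "filesystem": sys.getfilesystemencoding(),
        "locale": locale.getpreferredencoding(False),
    }


# Print one line per setting so the output of two machines can be compared.
for name, value in sorted(report_encodings().items()):
    print("%s: %s" % (name, value))
```

Running this once from a terminal and once from the cgi-bin directory on each machine should show whether the environments really agree.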
>
> to fix this, figure out from where you got the encoded (8-bit) string, and
> make sure you decode it properly on the way in. only use Unicode strings
> on the "inside".
>
> (Python does have two encoding defaults; there's a default encoding that
> *shouldn't* ever be changed from the "ascii" default, and there's also a
> stdin/stdout encoding that's correctly set if you run the code in an
> ordinary terminal window. if you get your data from anywhere else, you
> cannot trust any of these, so you should do your own decoding on the way
> in, and encoding things on the way out).
>
> </F>
>