How to get Python to default to UTF8

weheh weheh at verizon.net
Sun Dec 23 00:49:02 EST 2007


Hi Fredrik,

Thanks again for your feedback. I am much obliged.

Indeed, I am forced to be extremely rigorous about decoding on the way in 
and encoding on the way out everywhere in my program, just as you say. Your 
advice is excellent and concurs with other sources of Unicode expertise. 
Following this approach is the only thing that has made it possible for me 
to get my program to work.
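
For concreteness, the "decode on the way in, encode on the way out" discipline can be sketched roughly as below. This is only an illustration in Python 3 terms (bytes at the edges, str inside); in the Python 2 of this thread the same roles are played by str and unicode. The function names and the choice of UTF-8 are assumptions made for the example, not anything taken from the actual program.

```python
# Sketch: bytes at the edges, text (Unicode) on the inside.
# All names here are hypothetical; UTF-8 is an assumed wire encoding.

def decode_in(raw, encoding="utf-8"):
    """Turn bytes from a form or file into text; strict mode fails loudly."""
    return raw.decode(encoding)

def process(text):
    """Placeholder for the application's real work, done on text only."""
    return text.upper()

def encode_out(text, encoding="utf-8"):
    """Turn text back into bytes just before it leaves the program."""
    return text.encode(encoding)

# Stands in for data arriving from an HTML form or an input file.
raw_input_bytes = "caf\u00e9".encode("utf-8")

result = encode_out(process(decode_in(raw_input_bytes)))
```

The point of the three-step shape is that every conversion happens in a named place, so a missing one is easier to spot in review.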

However, the situation is still unacceptable to me because I often make 
mistakes and it is easy for me to miss places where encoding is necessary. I 
rely on testing to find my faults. In my development environment, I get no 
error message and it seems that everything works perfectly. However, once 
ported to the server, I see a crash. But this is too late a stage to catch 
the error since the app is already live.
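
One way to make such mistakes surface before deployment might be a guard that rejects undecoded bytes on the inside, exercised with deliberately non-ASCII sample data. The sketch below is hedged accordingly: `require_text`, `build_greeting` and the sample value are hypothetical, and it is written in Python 3 terms rather than for the interpreter discussed in this thread.

```python
# Sketch: fail fast when undecoded bytes reach internal code.
# All names are hypothetical and exist only for this illustration.

def require_text(value):
    """Raise immediately if bytes reach the inside of the program."""
    if isinstance(value, bytes):
        raise TypeError("undecoded bytes reached internal code: %r" % (value,))
    return value

def build_greeting(name):
    """Hypothetical internal function; it accepts text only."""
    return "Hello, " + require_text(name)

# Non-ASCII sample data makes a missed decode step visible everywhere,
# instead of only when real data of that kind happens to arrive.
SAMPLE = "Z\u00fcrich"

def run_checks():
    """Return True if the guard accepts text and rejects raw bytes."""
    assert build_greeting(SAMPLE) == "Hello, Z\u00fcrich"
    try:
        build_greeting(SAMPLE.encode("utf-8"))  # simulates a forgotten decode
    except TypeError:
        return True
    return False
```

Running checks like these on the development machine does not depend on how either machine is configured, which is the property that matters here.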

I assume that the default encoding that you mention shouldn't ever be 
changed is stored in the site.py file. I've checked this file and it's set 
to ascii on both machines (development and server). I haven't touched 
site.py. However, a week or so ago, following the advice of someone I read 
on the web, I did create a file in my cgi-bin directory called something 
like site-config.py, wherein encoding was set to utf8. I ran my program a 
few times, but then, having read elsewhere that the site-config.py approach 
was outmoded, I decided to remove this file. I'm wondering whether it made a 
permanent change somewhere in the bowels of Python while I wasn't looking?
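
A small diagnostic along the following lines could be run on both machines to compare what each interpreter actually reports. It assumes the file in question was really the `sitecustomize` module that Python's site machinery imports at startup; that is a guess, since the name above is only approximate. The default encoding is established at interpreter startup, so the more likely thing to look for is a leftover compiled copy of such a module rather than a lasting change inside Python itself. `default_encoding_report` is a hypothetical helper name.

```python
import sys

def default_encoding_report():
    """Collect what the running interpreter says about its default encoding."""
    # Assumption: the customization file, if any, is the standard
    # "sitecustomize" module, which would appear in sys.modules once imported.
    module = sys.modules.get("sitecustomize")
    return {
        "default_encoding": sys.getdefaultencoding(),
        "sitecustomize_loaded": module is not None,
        "sitecustomize_file": getattr(module, "__file__", None),
    }

if __name__ == "__main__":
    for key, value in sorted(default_encoding_report().items()):
        print("%s: %s" % (key, value))
```

If the two machines print different values, the reported file path (when present) shows where the difference comes from.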

Can you elaborate on where to look to see what stdin/stdout encodings are 
set to? All inputs are coming at my app either via HTML forms or input 
files. All output goes either to the browser via HTML or to an output file.
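
For reference, the values in question can be inspected with something like the sketch below; `stream_encodings` is a hypothetical helper name. Under a web server these values describe the server process's environment, not the encoding of the bytes a browser submits, so they are probably of limited use for form data, which still has to be decoded explicitly.

```python
import locale
import sys

def stream_encodings():
    """Return the encodings the interpreter attached to its standard streams."""
    return {
        # The attribute may be missing or None when a stream is not a terminal.
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
        # The locale's preferred encoding, queried without changing the locale.
        "locale_preferred": locale.getpreferredencoding(False),
    }

if __name__ == "__main__":
    for name, value in sorted(stream_encodings().items()):
        print("%s: %s" % (name, value))
```

Comparing the output from a terminal session with the output from a CGI run on the server would show whether the two environments differ.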


>
> to fix this, figure out from where you got the encoded (8-bit) string, and 
> make sure you decode it properly on the way in.  only use Unicode strings 
> on the "inside".
>
> (Python does have two encoding defaults; there's a default encoding that 
> *shouldn't* ever be changed from the "ascii" default, and there's also a 
> stdin/stdout encoding that's correctly set if you run the code in an 
> ordinary terminal window.  if you get your data from anywhere else, you 
> cannot trust any of these, so you should do your own decoding on the way 
> in, and encoding things on the way out).
>
> </F>
> 




