Help needed with python unicode cgi-bin script

weheh weheh at verizon.net
Tue Dec 11 12:46:12 EST 2007


Hi John:
Thanks for responding.

>Look at your file using
 >   print repr(open('c:/test/spanish.txt','rb').read())

>If you see 'a\xf1o' then use charset="windows-1252"
I did this ... no change ... still see 'a\xf1o'

>else if you see 'a\xc3\xb1o' then use charset="utf-8" else ????

>Based on your responses to Martin, it appears that your file is
>actually windows-1252 but you are telling browsers that it is utf-8.

>Another check: if the file is utf-8, then doing
 >   open('c:/test/spanish.txt','rb').read().decode('utf8')
>should be OK; if it's not valid utf8, it will complain.
No, this causes a decode error:

UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-4: invalid data
      args = ('utf8', 'a\xf1o', 1, 5, 'invalid data')
      encoding = 'utf8'
      end = 5
      object = 'a\xf1o'
      reason = 'invalid data'
      start = 1


>Yet another check: open the file with Notepad. Do File/SaveAs, and
>look at the Encoding box -- ANSI or UTF-8?
Notepad says it's ANSI
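
Taken together, the three checks above (repr shows 'a\xf1o', the utf8 decode fails, Notepad reports ANSI) all point to the file being windows-1252 rather than UTF-8. A minimal sketch of the same test in code, written in Python 3 syntax rather than the Python 2 used in this thread; the helper name sniff_encoding is hypothetical, and the fallback to windows-1252 is an assumption that only fits files saved as "ANSI" on a Western European Windows system:

```python
def sniff_encoding(raw):
    """Return 'utf-8' if the bytes decode cleanly as UTF-8, else 'windows-1252'.

    Assumption: anything that is not valid UTF-8 is treated as windows-1252.
    Pure-ASCII input is reported as 'utf-8', which is harmless because ASCII
    bytes mean the same thing in both encodings.
    """
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "windows-1252"


ansi_bytes = b"a\xf1o"       # the bytes repr() showed for the file in this thread
utf8_bytes = b"a\xc3\xb1o"   # the same word as it would look if saved as UTF-8

print(sniff_encoding(ansi_bytes))
print(sniff_encoding(utf8_bytes))

# Both byte strings decode to the same text once the right codec is used.
print(ansi_bytes.decode("windows-1252") == utf8_bytes.decode("utf-8"))
```

In practice the bytes would come from open(path, 'rb').read() rather than from literals.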

Thanks. What now? Also, this is a general problem for me, whether I read 
from a file, an HTML text field, or an HTML textarea, so I'm looking for a 
general solution. If it would help to debug by reading from a textarea or 
text field instead, let me know.
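
One general pattern that would cover all three inputs is to decode bytes into text as soon as they arrive, using the encoding they were actually written in, and to encode exactly once on output, using the same charset that the Content-Type header declares. The sketch below is in Python 3 syntax and is only an illustration under stated assumptions: the function names reencode and cgi_response are hypothetical, and the source encoding is passed in explicitly because it cannot be known from the bytes alone. For form fields, browsers generally submit data in the encoding of the page that contained the form, so declaring one charset consistently on every page should keep file input and form input in agreement.

```python
def reencode(raw, source_encoding, target_encoding="utf-8"):
    """Decode raw bytes from their real encoding, then encode for output."""
    text = raw.decode(source_encoding)    # bytes -> text at the input boundary
    return text.encode(target_encoding)   # text -> bytes at the output boundary


def cgi_response(text, charset="utf-8"):
    """Build a CGI response whose body encoding matches the declared charset."""
    header = "Content-Type: text/html; charset=%s\r\n\r\n" % charset
    return header.encode("ascii") + text.encode(charset)


file_bytes = b"a\xf1o"                      # windows-1252 bytes, as in this thread
text = file_bytes.decode("windows-1252")    # now a proper text string
response = cgi_response(text)               # body is UTF-8, header says UTF-8

print(repr(reencode(file_bytes, "windows-1252")))
print(repr(response))
```

In a real CGI script under Python 3 the response bytes would be written with sys.stdout.buffer.write(response) so that no second, implicit encoding step is applied.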




