Help needed with python unicode cgi-bin script
weheh
weheh at verizon.net
Tue Dec 11 12:46:12 EST 2007
Hi John:
Thanks for responding.
>Look at your file using
> print repr(open('c:/test/spanish.txt','rb').read())
>If you see 'a\xf1o' then use charset="windows-1252"
I did this ... no change ... still see 'a\xf1o'
>else if you see 'a\xc3\xb1o' then use charset="utf-8" else ????
>Based on your responses to Martin, it appears that your file is
>actually windows-1252 but you are telling browsers that it is utf-8.
>Another check: if the file is utf-8, then doing
> open('c:/test/spanish.txt','rb').read().decode('utf8')
>should be OK; if it's not valid utf8, it will complain.
No, this causes a decode error:
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-4: invalid data
args = ('utf8', 'a\xf1o', 1, 5, 'invalid data')
encoding = 'utf8'
end = 5
object = 'a\xf1o'
reason = 'invalid data'
start = 1
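For reference, the failing check can be reproduced without the file. This is only a minimal sketch, written in Python 3 syntax (the session above is Python 2), using the byte string from the repr output; the variable names are illustrative.

```python
# Bytes as shown by repr() of the file contents above.
raw = b'a\xf1o'

# 0xF1 is the lead byte of a 4-byte UTF-8 sequence, but the byte that
# follows ('o') is not a continuation byte, so strict decoding fails.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# The same bytes are valid windows-1252, where 0xF1 is n-with-tilde.
text = raw.decode('windows-1252')
print(utf8_ok, repr(text))
```

If the decode with windows-1252 gives the expected word, that agrees with the earlier suggestion that the file is windows-1252 rather than utf-8.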
>Yet another check: open the file with Notepad. Do File/SaveAs, and
>look at the Encoding box -- ANSI or UTF-8?
Notepad says it's ANSI
Thanks. What now? Also, this is a general problem for me, whether I read
from a file, an HTML text field, or an HTML textarea. So I'm looking for a
general solution. If it helps to debug by reading from a textarea or text
field, let me know.
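One possible shape for a general approach, offered as a sketch rather than a confirmed fix: decode incoming bytes to text once, using the encoding the data really has, and encode the outgoing page in the same charset the header declares. The helper names to_text and render_page are hypothetical, the code is Python 3 syntax, and the windows-1252 default is an assumption based on Notepad reporting ANSI.

```python
def to_text(raw, encoding='windows-1252'):
    """Decode incoming bytes (file contents or form data) to text.

    The default encoding is an assumption, not something the data states.
    """
    return raw.decode(encoding)


def render_page(text):
    """Build CGI response bytes whose charset header matches the body."""
    header = 'Content-Type: text/html; charset=utf-8\r\n\r\n'
    body = '<html><body><p>%s</p></body></html>' % text
    # The body is encoded as UTF-8 because that is what the header declares.
    return header.encode('ascii') + body.encode('utf-8')


page = render_page(to_text(b'a\xf1o'))
```

For text fields and textareas, the incoming encoding usually follows the charset of the page that held the form, so that value would be passed to to_text in place of the default.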