[Tutor] latin-1 to unicode in python

Kent Johnson kent37 at tds.net
Wed Aug 2 22:39:31 CEST 2006


anil maran wrote:
> How do I determine what encoding it is in?
> For example, I might not know it is in latin-1.
This is hard. It is better by far to know what encoding your data is in. 
There is no way to determine for sure what encoding it is by looking at 
the data. The best you can do is rule out some encodings, and make a 
best guess. Here is one way:
http://chardet.feedparser.org/
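
For illustration, here is a minimal sketch of calling that library. It
assumes the third-party chardet package is installed (it is not part of the
standard library) and is written for Python 3. chardet.detect() takes bytes
and returns a dictionary that includes 'encoding' and 'confidence' keys;
guess_encoding is a hypothetical wrapper name, not part of chardet.

```python
try:
    import chardet  # third-party package; must be installed separately
except ImportError:
    chardet = None


def guess_encoding(data):
    """Return chardet's guess for data (bytes), or None if chardet is missing."""
    if chardet is None:
        return None
    return chardet.detect(data)


guess = guess_encoding("Voil\u00e0, un caf\u00e9 cr\u00e8me".encode("latin-1"))
if guess is not None:
    # The result is a best guess with a confidence score, not a guarantee.
    print(guess["encoding"], guess["confidence"])
```

Short inputs give the detector little to work with, so treat the reported
encoding as a hint and check the confidence value.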

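You can also try to decode the text using different encodings and use 
the first one that works. This is risky because latin-1 will always work: 
every byte value is a valid latin-1 character, so decoding never fails even 
when the guess is wrong. If you do this, try latin-1 last.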
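
A minimal sketch of the try-each-encoding approach, written for Python 3
(where bytes objects have a decode() method). The function name
decode_first_match and the candidate list are illustrative assumptions;
latin-1 is placed last because it accepts any byte sequence.

```python
def decode_first_match(data, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes data."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            # These bytes are not legal in this encoding; try the next one.
            continue
    raise ValueError("none of the candidate encodings could decode the data")


text, used = decode_first_match(b"\x92")
print(repr(text), used)
```

A successful decode only shows that the bytes are legal in that encoding,
not that the guess is right, so the order of the candidates matters.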

Finally, a non-Python method - MS Word is pretty good about guessing the 
encodings of text files.

Kent

PS Please reply on list
>
> */Kent Johnson <kent37 at tds.net>/* wrote:
>
>     anil maran wrote:
>     > Unicode?
>     > I'm getting this error:
>     > invalid byte sequence for encoding "UTF8": 0x92
>     >
>     > Since the db is not handling latin-1 and is set to use UTF8, how do I
>     > handle this?
>
>     If you have a latin-1 string and you want utf-8, convert it to Unicode
>     and then to utf-8 using decode() and encode():
>
>     In [1]: s='\x92'
>
>     In [3]: s.decode('latin-1').encode('utf-8')
>     Out[3]: '\xc2\x92'
>
>     Kent