Problem Regarding Handling of Unicode string

Piet van Oostrum piet at cs.uu.nl
Mon Aug 10 11:58:27 EDT 2009


>>>>> joy99 <subhakolkata1234 at gmail.com> (j) wrote:

>j> Dear Group,
>j> I am using Python26 on WindowsXP with service pack2. My GUI is IDLE.
>j> I am using Hindi resources and get nice output like:
>j> एक
>j> where I can use all the re functions and other functions without doing
>j> any transliteration,etc.
>j> I was trying to use Bengali but it is giving me output like:
>j> '\xef\xbb\xbf\xe0\xa6\x85\xe0\xa6\xa8\xe0\xa7\x87\xe0\xa6\x95'
>j> I wanted to see Bengali output as
>j> অনেক
>j> and I like to use all functions including re.
>j> If any one can help me on that.
>j> Best Regards,
>j> Subhabrata.

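What you show is the repr of a UTF-8 encoded byte string (the leading
\xef\xbb\xbf is a UTF-8 byte order mark), so the data itself is intact.
To see it as text, make sure your stdout (if you use print) uses UTF-8
encoding. This can be problematic on Windows, however.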

>>> print '\xef\xbb\xbf\xe0\xa6\x85\xe0\xa6\xa8\xe0\xa7\x87\xe0\xa6\x95'
অনেক
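
To find out beforehand whether print is likely to work, you could check
the encoding of stdout along these lines. This is only a sketch:
can_print and stdout_encoding are hypothetical helper names, not standard
functions, and sys.stdout.encoding may be None (for instance in Python 2
when output is piped).

```python
import sys

def stdout_encoding():
    """Return the encoding print will use for stdout, or None if unknown."""
    return getattr(sys.stdout, "encoding", None)

def can_print(text, encoding):
    """Report whether the unicode string `text` can be encoded in `encoding`."""
    if encoding is None:
        return False
    try:
        text.encode(encoding)
    except (UnicodeError, LookupError):
        # UnicodeError: characters not representable; LookupError: unknown codec
        return False
    return True

# The Bengali word from the question, without the byte order mark.
sample = b'\xe0\xa6\x85\xe0\xa6\xa8\xe0\xa7\x87\xe0\xa6\x95'.decode('utf-8')
printable_here = can_print(sample, stdout_encoding())
```

If printable_here is False, printing the decoded text will probably fail
or produce garbage, and writing to a file is the safer route.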

Or if you write to a file, open it with utf-8 encoding.
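
A minimal sketch of the file approach, assuming io.open (available from
Python 2.6 on). The helper names and the file name bengali.txt are made
up for illustration.

```python
import io
import os
import tempfile

def write_utf8(path, text):
    """Write the unicode string `text` to `path`, encoded as UTF-8."""
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(text)

def read_utf8(path):
    """Read `path` back as unicode; 'utf-8-sig' drops a leading BOM if present."""
    with io.open(path, 'r', encoding='utf-8-sig') as f:
        return f.read()

# The Bengali word from the question, decoded to a unicode string.
text = b'\xe0\xa6\x85\xe0\xa6\xa8\xe0\xa7\x87\xe0\xa6\x95'.decode('utf-8')

# Hypothetical output location in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'bengali.txt')
write_utf8(path, text)
```

Any editor that understands UTF-8 should then show the word correctly,
independent of what the console can display.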

I chose UTF-8 because it is generally the preferred encoding for
non-ASCII text. Bengali may have a different preferred encoding, though.
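
As for using re: that tends to work best on unicode strings rather than
on the raw bytes. A sketch, assuming the data really is UTF-8 with a
byte order mark as in your example; the pattern and the variable names
are my own choice, not something from your program.

```python
import re

# The byte string from the question, including the UTF-8 byte order mark.
raw = b'\xef\xbb\xbf\xe0\xa6\x85\xe0\xa6\xa8\xe0\xa7\x87\xe0\xa6\x95'

# 'utf-8-sig' decodes UTF-8 and strips a leading BOM if there is one.
text = raw.decode('utf-8-sig')

# Match runs of characters from the Bengali Unicode block (U+0980 to U+09FF).
bengali_run = re.compile(u'[\u0980-\u09ff]+')
words = bengali_run.findall(text)
```

Here text has four characters and words contains the whole word once.
An explicit character range is used because \w does not necessarily
match combining vowel signs such as U+09C7.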
-- 
Piet van Oostrum <piet at cs.uu.nl>
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: piet at vanoostrum.org


