Problem Regarding Handling of Unicode string

Tue Aug 11 04:17:00 EDT 2009

On Aug 10, 9:26 pm, joy99 <subhakolkata1... at gmail.com> wrote:
> Dear Group,
>
> I am using Python26 on WindowsXP with service pack2. My GUI is IDLE.
> I am using Hindi resources and get nice output like:
> एक
> where I can use all the re functions and other functions without doing
> any transliteration,etc.
> I was trying to use Bengali but it is giving me output like:

WHAT is giving you this output?

> '\xef\xbb\xbf\xe0\xa6\x85\xe0\xa6\xa8\xe0\xa7\x87\xe0\xa6\x95'

In a very ordinary IDLE session (Win XP SP3, Python 2.6.2, locale:
Australia/English, no "Hindi resources"):

>>> x = '\xef\xbb\xbf\xe0\xa6\x85\xe0\xa6\xa8\xe0\xa7\x87\xe0\xa6\x95'
>>> ux = x.decode('utf-8')
>>> ux
u'\ufeff\u0985\u09a8\u09c7\u0995'
>>> print ux
অনেক # looks like what you wanted; please confirm
>>> import unicodedata
>>> for c in ux:
	print unicodedata.name(c)

ZERO WIDTH NO-BREAK SPACE # this is a BOM
BENGALI LETTER A
BENGALI LETTER NA
BENGALI VOWEL SIGN E
BENGALI LETTER KA
>>>
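
The leading '\xef\xbb\xbf' in that string is the UTF-8 encoding of U+FEFF, which is why the BOM shows up as the first character after decoding. A minimal sketch, assuming the data really is UTF-8 bytes as the string above suggests: the standard 'utf-8-sig' codec decodes the same bytes but drops a leading BOM. The variable names here are only illustrative.

```python
import unicodedata

# The bytes from the original post: a UTF-8 BOM followed by four Bengali code points.
raw = b'\xef\xbb\xbf\xe0\xa6\x85\xe0\xa6\xa8\xe0\xa7\x87\xe0\xa6\x95'

with_bom = raw.decode('utf-8')         # keeps U+FEFF as the first character
without_bom = raw.decode('utf-8-sig')  # drops a leading BOM if one is present

# Character names of the BOM-free text, one per code point.
names = [unicodedata.name(c) for c in without_bom]
```

If the bytes come from a file saved by an editor that writes a BOM, decoding with 'utf-8-sig' instead of 'utf-8' avoids having to strip the first character by hand.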

> I wanted to see Bengali output as
> অনেক
> and I like to use all functions including re.
> If any one can help me on that.

"I am using Hindi resources" doesn't tell us much ... except to prompt
the comment that if you want to display Bengali script, you may need
Bengali resources. However, it looks like I can display your Bengali
data without any special resources.
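
As for re: a hedged sketch, assuming the input is the UTF-8 byte string quoted above and that the goal is to match runs of Bengali text. Once the bytes are decoded, re works on the resulting unicode object like on any other text. The pattern below uses the Bengali Unicode block (U+0980 to U+09FF); the names bengali_run and padded are made up for illustration.

```python
import re

# Same bytes as in the original post, decoded with the BOM removed.
raw = b'\xef\xbb\xbf\xe0\xa6\x85\xe0\xa6\xa8\xe0\xa7\x87\xe0\xa6\x95'
text = raw.decode('utf-8-sig')

# Match one or more consecutive characters from the Bengali block.
bengali_run = re.compile(u'[\u0980-\u09ff]+')

# Surround the word with ASCII text to show that only the Bengali run matches.
padded = u'abc ' + text + u' xyz'
matches = bengali_run.findall(padded)
```

The point of the explicit range is that both the pattern and the subject are unicode; matching a pattern against the undecoded byte string would compare individual bytes, not characters.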

It seems like you are not doing the same with Bengali as you are doing
with Hindi. We can't help you very much if you don't show exactly what
you are doing.

Have you considered asking in an Indian Python forum? Note: you will
still need to say what you are doing that works with Hindi but not
with Bengali.

Cheers,
John