What is this UnicodeDecodeError?

Wed Oct 11 12:18:41 EDT 2006

John Machin wrote:
> kath wrote:
> > I have a number of Excel files. In each file DATE is represented by a
> > different name, and the date is in a different column in each file. I
> > want to read the date from all of these files.
> >
> > To identify the date field in the different files I have created a file
> > called _globals where I keep all aliases for DATE in an array called
> > 'alias_DATE'.
>
> It's actually a list. In Python an array is something else; look at the
> docs for the array module if you're interested.
>
> >
> > The array alias_DATE looks like this:
> >
> > alias_DATE=['TRADEDATE', 'Accounting Date', 'Date de VL','Datum',
> > 'Kurs-datum', 'Date', 'Fecha Datos', 'Calculation Date', 'ClosingDate',
> > 'Pricing Date', 'NAV Date', 'NAVDate', 'NAVDATE', 'ValuationDate',
> > 'Datestamp', 'Fecha de Valoración', 'Kurs-','datum',
> > """Kurs-\ndatum""", "Kurs-\ndatum"]
>
> Nothing to do with the question you asked, but the last two entries
> have the same value; is that intentional?
> | >>> """Kurs-\ndatum""" == "Kurs-\ndatum"
> | True
>
>
> >
> > Now I want the index of the column that holds the date. I tried the
> > following code.
> >
> >
> > >>> b=xlrd.open_workbook('Santander_051206.xls')
> > >>> sh=b.sheet_by_index(0)
> > >>> sh.cell_value(rowx=0, colx=11)
> > u'Fecha de Valoraci\xf3n'
> > >>> val=sh.cell_value(rowx=0, colx=11)
> > >>> val
> > u'Fecha de Valoraci\xf3n'
> > >>> print val
> > Fecha de Valoración
> > >>> import _globals		# the file where I have stored my 'alias_DATE' array
> > >>> _globals.alias_DATE.index(val)
> > Traceback (most recent call last):
> >   File "<interactive input>", line 1, in ?
> > UnicodeDecodeError: 'ascii' codec can't decode byte 0xf3 in position
> > 17: ordinal not in range(128)
> > >>>
> >
> > Though I have a matching value in the array, why am I getting this
> > error? Can anyone please tell me what causes it and how to get rid of
> > it? I have some files which contain more special characters.
> >
>
> Hello again, Sudhir.
>
> The text string returned by xlrd is a unicode object (u'Fecha de
> Valoraci\xf3n'). The text strings in your list are str objects, encoded
> in some unspecified encoding. Python is trying to convert the str
> object 'Fecha de Valoración' to Unicode, using the (default) ascii
> codec to do the conversion, and failing.
>
> One way to handle this is to specify any non-ASCII strings in your
> lookup list as unicode, like this:
>
> contents of sudhir.py:
> | # -*- coding: cp1252 -*-
> | alist = ['Datestamp', u'Fecha de Valoraci\xf3n', 'Kurs-','datum']
> | blist = ['Datestamp', u'Fecha de Valoración', 'Kurs-','datum']
> | assert alist == blist
> | val = u'Fecha de Valoraci\xf3n'
> | print 'a', alist.index(val)
> | print 'b', blist.index(val)
>
> | OS prompt>sudhir.py
> | a 1
> | b 1
>
> Note: the encoding "cp1252" is appropriate to my environment, not
> necessarily to yours.
>
> You may like to have a look through this:
>  http://www.amk.ca/python/howto/unicode
>
> HTH,
> John
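
As a rough illustration of John's explanation, here is a minimal sketch. It is written in Python 3 syntax, where byte strings and text never compare equal, so the decode that Python 2 attempted implicitly inside list.index() is spelled out by hand. It assumes the aliases in _globals were saved as cp1252 bytes (as John notes, your environment may differ), and the helper name find_alias is hypothetical, not part of xlrd or the original code.

```python
# The value xlrd returned in the thread: a text (unicode) string.
val = u'Fecha de Valoraci\xf3n'

# Assumption: the alias file was saved as cp1252, so the non-ASCII
# alias is really this byte string.
raw_alias = val.encode('cp1252')

# This is the conversion Python 2 tried implicitly when it compared a
# str entry with the unicode value: decode using the ascii codec.
try:
    raw_alias.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError as exc:
    ascii_failed = True
    print(exc)


def find_alias(aliases, value, encoding='cp1252'):
    """Hypothetical helper: decode byte-string aliases once with an
    explicit encoding, then look the text value up in the result."""
    decoded = [a.decode(encoding) if isinstance(a, bytes) else a
               for a in aliases]
    return decoded.index(value)


# A shortened alias list, stored as bytes to mimic the str entries.
aliases = [b'Datestamp', raw_alias, b'Kurs-', b'datum']
print(find_alias(aliases, val))
```

The idea is the same as in sudhir.py above: make sure both sides of the comparison are text before searching, either by writing the non-ASCII aliases as unicode literals or by decoding them with a known encoding.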

Hi, thanks for your reply. The link you gave was a good one. It had
comprehensive information and I enjoyed reading it. It cleared up my
doubts about encoded data, what Unicode data is, and how to deal with
it.

Thank you very much..

Regards,
sudhir.