Regular expression

Fri Jun 20 03:38:08 EDT 2008

Sallu <praveen.sunsetpoint at gmail.com> wrote:
> string = 'riché'
...
> unicode(string)).encode('ASCII', 'ignore')
...
> 
> Output :
> 
> sys:1: DeprecationWarning: Non-ASCII character '\xc3' in file regu.py
> on line 4, but no encoding declared; see
> http://www.python.org/peps/pep-0263.html for details
> riché
> Traceback (most recent call last):
>   File "regu.py", line 13, in ?
>     msg=strip_accents(string)
>   File "regu.py", line 10, in strip_accents
>     return unicodedata.normalize('NFKD',
> unicode(string)).encode('ASCII', 'ignore')
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
> 4: ordinal not in range(128)
> 
> 
The problem is the expression unicode(string), which, with the default 
encoding, is equivalent to saying string.decode('ascii').

The string contains a non-ASCII character, so the decode fails. You should 
specify whatever encoding you used for the source file. From the error 
message it looks like you used UTF-8, so "string.decode('utf-8')" should 
give you a unicode string to work with.
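
A minimal sketch of that fix, assuming the source file really is UTF-8 and 
a Python new enough to accept the b'' literal and "except ... as" (2.6 or 
later). The name strip_accents comes from your traceback; the encoding 
parameter and the names raw, result and failed_at are only illustrative.

```python
import unicodedata

def strip_accents(raw, encoding='utf-8'):
    """Decode a byte string, then drop accents and other non-ASCII characters."""
    # Decode with an explicit codec instead of relying on the ASCII default.
    text = raw.decode(encoding)
    # NFKD splits an accented letter into the base letter plus a combining mark.
    decomposed = unicodedata.normalize('NFKD', text)
    # Encoding to ASCII with 'ignore' silently drops the combining mark.
    return decomposed.encode('ascii', 'ignore')

# The UTF-8 bytes of 'riché': 'rich' followed by 0xc3 0xa9.
raw = b'rich\xc3\xa9'
result = strip_accents(raw)

# For comparison, the ASCII decode fails at the first byte of the accented letter.
try:
    raw.decode('ascii')
    failed_at = None
except UnicodeDecodeError as exc:
    failed_at = exc.start
```

Passing the encoding as an argument keeps the guess about the source file 
in one place, so it is easy to change if the file turns out to be saved in 
something other than UTF-8.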

-- 
Duncan Booth http://kupuguy.blogspot.com