Regular expression

Fri Jun 20 04:20:13 EDT 2008

Duncan Booth wrote:
> Sallu <praveen.sunsetpoint at gmail.com> wrote:
>> string = 'riché'
> ...
>> unicode(string)).encode('ASCII', 'ignore')
> ...
>> Output :
>>
>> sys:1: DeprecationWarning: Non-ASCII character '\xc3' in file regu.py
>> on line 4, but no encoding declared; see
>> http://www.python.org/peps/pep-0263.html for details
>> riché
>> Traceback (most recent call last):
>>   File "regu.py", line 13, in ?
>>     msg=strip_accents(string)
>>   File "regu.py", line 10, in strip_accents
>>     return unicodedata.normalize('NFKD',
>> unicode(string)).encode('ASCII', 'ignore')
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
>> 4: ordinal not in range(128)
>>
>>
> The problem is the expression unicode(string), which is equivalent to
> string.decode('ascii').
>
> The string contains a non-ASCII character, so the decode fails. You should
> specify whatever encoding you used for the source file. From the error
> message it looks like you used UTF-8, so string.decode('utf-8') should
> give you a unicode string to work with.
> 
> 

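Also declare the source encoding at the top of the file, which gets rid of
the DeprecationWarning. Note that the literal is still a byte string, so you
still have to decode it as Duncan says (or write it as u'riché'):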
#!/usr/bin/python
# -*- coding: utf-8 -*-

or

#!/usr/bin/python
# coding=utf-8
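
Putting both suggestions together, here is a minimal sketch of how
strip_accents from the traceback could look. The original function body was
not posted in full, so the parameter names (raw, encoding) and the UTF-8
default are assumptions, not the poster's actual code. The bytes are written
with escapes so the example does not depend on how the file itself is saved.

```python
# -*- coding: utf-8 -*-
import unicodedata


def strip_accents(raw, encoding='utf-8'):
    """Return the byte string `raw` with accents removed, as ASCII bytes.

    `encoding` is an assumed parameter: it must match how `raw` was encoded.
    """
    # Decode with an explicit codec instead of relying on the ASCII default.
    text = raw.decode(encoding)
    # NFKD splits an accented letter into a base letter plus a combining mark.
    decomposed = unicodedata.normalize('NFKD', text)
    # The combining marks are not ASCII, so 'ignore' silently drops them.
    return decomposed.encode('ascii', 'ignore')


raw = b'rich\xc3\xa9'  # the UTF-8 bytes of 'riché'
print(strip_accents(raw))
```

The only real change from the failing code is the explicit decode; the
normalize/encode step is the same as in the original post. If the input is
in some other encoding, pass it in, e.g. strip_accents(b'rich\xe9', 'latin-1').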


-- 
Soltys

"Free software is a matter of liberty not price"