Regular expression

Fri Jun 20 02:16:02 EDT 2008

On Jun 20, 10:58 am, Soltys <sol... at noabuse.com> wrote:
> Hi,
> Your post is not about re but about encoding; next time,
> be more careful when choosing the topic for your post!
> Did you check what PEP 263 says about encoding?
> One of the first things it says is:
>
> "(...)
> Defining the Encoding
> Python will default to ASCII as standard encoding if no other
>      encoding hints are given.
> (...)"
>
> So when you're using non-ASCII characters you should always
> specify the encoding. Again, read PEP 263 for how this can
> be done, especially the section "Defining the Encoding", which
> lists several ways of doing that.
>
> Sallu wrote:
>
>
>
> > Hi All,
> > Here I have a textbox in which I want to prevent the user from
> > entering accented characters like ( é ).
> > I wrote this program:
>
> > import re
> > value="this is Praveen"
> > #value = 'riché gerry'
> > if(re.search(r"^[A-Za-z0-9]*$",value)):
> >   print "Not allowed accent character"
> > else:
> >   print "Valid"
>
> > output :
>
> > sys:1: DeprecationWarning: Non-ASCII character '\xc3' in file regu1.py
> > on line 3, but no encoding declared; see http://www.python.org/peps/pep-0263.html
> > for details
> > Valid
>
> > When I comment out value="this is Praveen" and uncomment
> > value = 'riché gerry',
> > I still get the same output, even though it contains an accented character.
>
> --
> Soltys
>
> "Free software is a matter of liberty not price"

I am sorry, Soltys. Actually, I am very new to Python.
import re
import os, sys

string = 'riché'
print string

def strip_accents(string):
  import unicodedata
  return unicodedata.normalize('NFKD',
unicode(string)).encode('ASCII', 'ignore')

msg=strip_accents(string)
print msg

Output :

sys:1: DeprecationWarning: Non-ASCII character '\xc3' in file regu.py
on line 4, but no encoding declared; see http://www.python.org/peps/pep-0263.html
for details
riché
Traceback (most recent call last):
  File "regu.py", line 13, in ?
    msg=strip_accents(string)
  File "regu.py", line 10, in strip_accents
    return unicodedata.normalize('NFKD',
unicode(string)).encode('ASCII', 'ignore')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
4: ordinal not in range(128)
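
For reference, here is a minimal sketch of one way around the UnicodeDecodeError shown above. It rests on assumptions not stated in the thread: the source file is saved as UTF-8, and it is written for Python 3, whereas the scripts above are Python 2. The `encoding` parameter and the `has_non_ascii` helper are hypothetical names introduced only for illustration. The idea is to declare the source encoding as PEP 263 describes and to decode byte strings explicitly instead of relying on the default ASCII codec.

```python
# -*- coding: utf-8 -*-
# The line above is the PEP 263 encoding declaration. It is required in
# Python 2 for non-ASCII source text; Python 3 already assumes UTF-8.
import re
import unicodedata


def strip_accents(text, encoding="utf-8"):
    """Return `text` with accents removed, as an ASCII-only str.

    `encoding` is an assumption: it must match how the bytes were produced.
    """
    # Byte strings are decoded explicitly. Falling back to the ASCII
    # codec is what fails on bytes such as 0xc3.
    if isinstance(text, bytes):
        text = text.decode(encoding)
    # NFKD splits 'é' into 'e' plus a combining accent; encoding to
    # ASCII with 'ignore' then drops the combining accent.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def has_non_ascii(text):
    """Return True if any character lies outside the ASCII range."""
    return re.search(r"[^\x00-\x7f]", text) is not None


print(strip_accents("riché"))             # riche
print(strip_accents(b"rich\xc3\xa9"))     # riche
print(has_non_ascii("riché gerry"))       # True
print(has_non_ascii("this is Praveen"))   # False
```

Testing for characters outside the ASCII range, rather than matching an allowed set such as [A-Za-z0-9], means that spaces in the input are not mistaken for accented characters.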