regular expression unicode character class trouble

Steven Bethard steven.bethard at gmail.com
Sun Sep 4 15:08:36 EDT 2005


Diez B. Roggisch wrote:
> Hi,
> 
> I need in a unicode-environment the character-class
> 
> set("\w") - set("[0-9]")
> 
> or alpha w/o num. Any ideas how to create that?

I'd use something like r"[^_\d\W]", that is, anything that is not an 
underscore, a digit, or a non-word character (\W), which leaves only 
the letters.  In action:

py> import re
py> re.findall(r'[^_\d\W]+', '42badger100x__xxA1BC')
['badger', 'x', 'xxA', 'BC']
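
Since the question was about a unicode environment, here is a rough sketch of the same character class applied to non-ASCII text. It assumes Python 3, where str patterns are matched against the Unicode database by default. On a Python 2 interpreter you would need unicode strings plus the re.UNICODE flag, which is passed explicitly below and is harmless on Python 3. The names ALPHA and alpha_runs are made up for illustration.

```python
import re

# Word characters minus digits and the underscore, i.e. letters only.
# re.UNICODE is redundant for str patterns on Python 3, but it is kept
# to make the intent explicit.
ALPHA = re.compile(r'[^_\d\W]+', re.UNICODE)

def alpha_runs(text):
    """Return every maximal run of letters found in text."""
    return ALPHA.findall(text)

print(alpha_runs('42badger100x__xxA1BC'))
print(alpha_runs('Größe42_naïve'))
```

The double negation is what makes this work: a negated class can exclude \W, so the result is "\w without digits and underscore" without having to list the letters by hand.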

HTH,

STeVe


