python regex character group matches

Marc 'BlackJack' Rintsch bj_666 at gmx.net
Wed Sep 17 09:45:24 EDT 2008


On Wed, 17 Sep 2008 09:27:47 -0400, christopher taylor wrote:

> the other day, i was trying to match unicode character sequences that
> looked like this:
> 
> \\uAD0X...
>
> my issue, is that the pattern i used was returning:
> 
> [ '\\uAD0X', '\\u1BF3', ... ]
> 
> when i expected:
> 
> [ '\\uAD0X\\u1BF3', ]
> 
> the code looks something like this:
> 
> pat = re.compile("(\\\u[0-9A-F]{4})+", re.UNICODE|re.LOCALE)
> #print pat.findall(txt_line)
> results = pat.finditer(txt_line)
>  
> i ran the pattern through a couple of my colleagues and they were all in
> agreement that my pattern should have matched correctly.

Correctly for what input?  The examples above don't match (no pun
intended) the regular expression.  `pat` can't match '\\uAD0X' because
'X' is not in the character class `[0-9A-F]`.  BTW: Are you sure you
need or want the `re.UNICODE` flag?
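
A minimal sketch of the point about the character class, written for
Python 3 rather than the Python 2 of the original post.  The sample
strings `bad` and `good` and the helper `full_matches` are made up for
illustration, and the flags are left out because Python 3 does not
accept `re.LOCALE` for str patterns:

```python
import re

# Raw string: the regex \\u matches a literal backslash followed by 'u',
# then exactly four characters from [0-9A-F]; the group may repeat.
pat = re.compile(r"(\\u[0-9A-F]{4})+")

# Hypothetical inputs (not from the original post's data).
bad = r"\uAD0X\u1BF3"   # 'X' is not in [0-9A-F]
good = r"\uAD01\u1BF3"  # every sequence has four hex digits


def full_matches(pattern, text):
    """Return the complete text of each match (group 0)."""
    return [m.group(0) for m in pattern.finditer(text)]


bad_matches = full_matches(pat, bad)    # ['\\u1BF3']
good_matches = full_matches(pat, good)  # ['\\uAD01\\u1BF3']

print(bad_matches)
print(good_matches)
print(pat.findall(good))                # ['\\u1BF3']
```

With the 'X' in the input the first sequence is skipped entirely, so
only the second one is found.  Note also that `findall` returns the
contents of the group when the pattern has one, so for a repeated group
it gives just the last repetition; `finditer` with `group(0)` gives the
whole run.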

Ciao,
	Marc 'BlackJack' Rintsch


