python regex character group matches
Marc 'BlackJack' Rintsch
bj_666 at gmx.net
Wed Sep 17 09:45:24 EDT 2008
On Wed, 17 Sep 2008 09:27:47 -0400, christopher taylor wrote:
> the other day, i was trying to match unicode character sequences that
> looked like this:
>
> \\uAD0X...
>
> my issue, is that the pattern i used was returning:
>
> [ '\\uAD0X', '\\u1BF3', ... ]
>
> when i expected:
>
> [ '\\uAD0X\\u1BF3', ]
>
> the code looks something like this:
>
> pat = re.compile("(\\\u[0-9A-F]{4})+", re.UNICODE|re.LOCALE)
> #print pat.findall(txt_line)
> results = pat.finditer(txt_line)
>
> i ran the pattern through a couple of my colleagues and they were all in
> agreement that my pattern should have matched correctly.
Correctly for what input? The examples above do not match (no pun
intended) the regular expression: `pat` cannot match '\\uAD0X' because
there's no 'X' in the character class [0-9A-F]. BTW: Are you sure you
need or want the `re.UNICODE` flag?
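
To make the point concrete, here is a minimal sketch in Python 3 syntax. The sample strings `valid` and `invalid` are made up for illustration, since the real input was not posted. The names `repeated_group` and `whole_run` are likewise just labels for this example. The flags are left out because the pattern uses only an explicit character class, so neither `re.UNICODE` nor `re.LOCALE` would change what it matches. The sketch also shows a second thing that may be relevant: when a capturing group is repeated, `findall()` returns only the last repetition of that group, not the whole run.

```python
import re

# Raw strings keep the backslashes literal. The regex reads:
# a backslash, the letter 'u', then exactly four upper-case hex digits.
repeated_group = re.compile(r"(\\u[0-9A-F]{4})+")

# Same pattern, but the repetition sits in a non-capturing group and
# one outer capturing group wraps the whole run.
whole_run = re.compile(r"((?:\\u[0-9A-F]{4})+)")

# Hypothetical sample input containing literal backslash-u sequences.
valid = r"\uAD0C\u1BF3"
invalid = r"\uAD0X\u1BF3"  # 'X' is not a hex digit

# A repeated capturing group keeps only its last repetition.
print(repeated_group.findall(valid))  # ['\\u1BF3']

# The outer group captures the complete run.
print(whole_run.findall(valid))       # ['\\uAD0C\\u1BF3']

# '\uAD0X' cannot match, so only the second sequence is found.
print(whole_run.findall(invalid))     # ['\\u1BF3']
```

If the whole run is what you are after, `match.group(0)` on the objects returned by `finditer()` gives it as well, without changing the grouping.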
Ciao,
Marc 'BlackJack' Rintsch