python regex character group matches

christopher taylor christopher.paul.taylor at gmail.com
Wed Sep 17 09:27:47 EDT 2008


hello python-list!

the other day, i was trying to match unicode character sequences that
looked like this:

\\uAD0X...

my issue is that the pattern i used was returning:

[ '\\uAD0X', '\\u1BF3', ... ]

when i expected:

[ '\\uAD0X\\u1BF3', ]

the code looks something like this:

import re

# raw string: the regex \\u matches a literal backslash followed by "u"
pat = re.compile(r"(\\u[0-9A-F]{4})+", re.UNICODE|re.LOCALE)
#print pat.findall(txt_line)
results = pat.finditer(txt_line)

i ran the pattern past a couple of my colleagues and they all agreed
that it should have matched correctly.

is this a simple case of a messed up regex or am i not using the regex
api correctly?
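
for reference, here is a minimal sketch of how the re api reports a
repeated group, assuming python 3 and a made-up txt_line. the sample
text, its hex values and the names last_only, whole_runs and iter_runs
are hypothetical, and the flags are left out because they do not change
how this particular pattern matches. it is only meant to show the
difference between what a capturing group holds and what the whole
match holds, not to stand in for the real code:

```python
import re

# hypothetical sample line; the hex values are made up for illustration
txt_line = r"abc \uAD0F\u1BF3 def \u00E9"

# repeated capturing group: findall returns the group's text, and a
# repeated group keeps only its last repetition
capturing = re.compile(r"(\\u[0-9A-F]{4})+")
last_only = capturing.findall(txt_line)

# non-capturing group: with no groups in the pattern, findall returns
# the whole match
non_capturing = re.compile(r"(?:\\u[0-9A-F]{4})+")
whole_runs = non_capturing.findall(txt_line)

# with finditer, group(0) is the whole match regardless of grouping
iter_runs = [m.group(0) for m in capturing.finditer(txt_line)]

print(last_only)
print(whole_runs)
print(iter_runs)
```

in this sketch the adjacent escapes come back as one string from
whole_runs and iter_runs, while last_only holds just the final escape
of each run.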

cheers,

ct


