python regex character group matches

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Wed Sep 17 10:55:35 EDT 2008


On Wed, 17 Sep 2008 15:56:31 +0200, Fredrik Lundh wrote:

> Assuming that you want to find runs of \uXXXX escapes, simply use
> non-capturing parentheses:
> 
>     pat = re.compile(u"(?:\\\u[0-9A-F]{4})")

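Doesn't work for me. In a non-raw unicode literal, \\ is an escaped
backslash, so the \u that follows is parsed as the start of a \uXXXX
escape, and [0-9 is not four hex digits: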

>>> pat = re.compile(u"(?:\\\u[0-9A-F]{4})")
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 5-7: truncated \uXXXX escape


Assuming that the OP is searching byte strings, I came up with this:

>>> pat = re.compile('(\\\u[0-9A-F]{4})+')
>>> pat.search('abcd\\u1234\\uAA99\\u0BC4efg').group(0)
'\\u1234\\uAA99\\u0BC4'
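A raw string avoids the backslash counting altogether. What follows is
only a minimal sketch, assuming either Python 3 or a Python 2 byte
string; the names text and run are mine, chosen for illustration. It
also uses a non-capturing group, since with a capturing group under +
only the last repetition ends up in group(1), while group(0) is the
whole run either way.

```python
import re

# Raw string: the regex engine receives \\u, which means one literal
# backslash followed by the letter u. (?:...) groups without capturing,
# and the trailing + matches a whole run of consecutive escapes.
pat = re.compile(r'(?:\\u[0-9A-F]{4})+')

# Sample input holding literal backslash-u sequences, not decoded
# characters (each \\ in the literal is a single backslash).
text = 'abcd\\u1234\\uAA99\\u0BC4efg'

# group(0) is the full matched run.
run = pat.search(text).group(0)
print(run)
```

As written, the character class only accepts uppercase hex digits;
[0-9A-Fa-f] would be needed if lowercase escapes can occur.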



-- 
Steven


