[Tutor] New Issues with REGEX Greediness:

Mon Aug 3 16:49:46 CEST 2009

On Sun, Aug 2, 2009 at 2:55 PM, gpo<goodpotatoes at yahoo.com> wrote:
>
> Python:
> line='>Checking Privilege for UserId:
> {874BE70A-194B-DE11-BE5C-000C297901A5}, PrivilegeId: {prvReadSdkMessage}.
> Returned hr = 0'
> (re.search('(\w+)\:.+.{8}-.{4}-.{4}-.{4}-.{12}',line)).group(0)
> RESULT
> 'UserId: {874BE70A-194B-DE11-BE5C-000C297901A5'
>
> How/Why are these results different?  What is Python doing differently in
> regex, that I need to adjust to?

Calling group(0) returns the entire portion of the string that
matched your regex.  Calling group(1) gives you what you want, namely
the portion that matched the first parenthesized capture group:

>>> import re
>>> line='>Checking Privilege for UserId: {874BE70A-194B-DE11-BE5C-000C297901A5}, PrivilegeId: {prvReadSdkMessage}. Returned hr = 0'

>>> (re.search(r'(\w+)\:.+.{8}-.{4}-.{4}-.{4}-.{12}',line)).group(0)
'UserId: {874BE70A-194B-DE11-BE5C-000C297901A5'

>>> (re.search(r'(\w+)\:.+.{8}-.{4}-.{4}-.{4}-.{12}',line)).group(1)
'UserId'
>>>

The section of the docs on the group method of MatchObjects may be
helpful: http://docs.python.org/library/re.html#re.MatchObject.group
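
If you also need the GUID itself, one option is to give it its own
capture group.  The following is only a sketch and rests on some
assumptions of mine: that the GUID is always hexadecimal, and that it
is always wrapped in braces right after the colon.  The names
pattern, whole, label and guid are just illustrative.

```python
import re

line = ('>Checking Privilege for UserId: '
        '{874BE70A-194B-DE11-BE5C-000C297901A5}, '
        'PrivilegeId: {prvReadSdkMessage}. Returned hr = 0')

# Raw strings keep the backslashes intact for the regex engine.
# Group 1: the label before the colon.
# Group 2: the GUID between the braces (assumed to be hex digits only).
pattern = re.compile(
    r'(\w+):\s*\{([0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}'
    r'-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12})\}'
)

match = pattern.search(line)
if match is not None:
    whole = match.group(0)   # everything the pattern matched
    label = match.group(1)   # first capture group
    guid = match.group(2)    # second capture group
    print(whole)
    print(label)
    print(guid)
    print(match.groups())    # all capture groups as a tuple
```

Checking that search() did not return None before calling group()
avoids an AttributeError on lines that do not match.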

-- 
Jerry