Help with regex and optional substring in search string

Zero Piraeus schesis at gmail.com
Wed Oct 14 11:30:56 EDT 2009


2009/10/14 Timur Tabi <timur.tabi at gmail.com>:
> I'm having trouble creating a regex pattern that matches a string that
> has an optional substring in it.  What I'm looking for is a pattern
> that matches both of these strings:
>
> Subject: [PATCH 08/18] This is the patch name
> Subject: This is the patch name
>
> What I want is to extract the "This is the patch name".  I tried this:
>
> m = re.search('Subject:\s*(\[[\w\s]*\])*(.*)', x)
>
> Unfortunately, the second group appears to be too greedy, and returns
> this:
>
>>>> print m.group(1)
> None
>>>> print m.group(2)
> [PATCH 08/18] Subject line

It's not that the second group is too greedy. The first group isn't
matching what you want it to, because neither \w nor \s matches the "/"
inside your brackets. This works for your example input:

>>> import re
>>> pattern = re.compile(r"Subject:\s*(?:\[[^\]]*\])?\s*(.*)")
>>> for s in (
...     "Subject: [PATCH 08/18] This is the patch name",
...     "Subject: This is the patch name",
... ):
...     pattern.search(s).group(1)
...
'This is the patch name'
'This is the patch name'
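
To see the diagnosis for yourself, here is a minimal sketch that runs
your original pattern (written as a raw string, which is my change) and
keeps what each group captured. The second input, with no "/" in the
brackets, is a made-up example added for contrast.

```python
import re

# The pattern from the question, unchanged apart from the r"" prefix.
original = re.compile(r"Subject:\s*(\[[\w\s]*\])*(.*)")

# With a "/" inside the brackets, [\w\s]* stops at "PATCH 08" and the
# required "]" cannot follow, so the first group matches zero times.
m = original.search("Subject: [PATCH 08/18] This is the patch name")
with_slash = (m.group(1), m.group(2))

# Hypothetical input without a "/": now the first group does match.
m = original.search("Subject: [PATCH] This is the patch name")
without_slash = (m.group(1), m.group(2))

print(with_slash)
print(without_slash)
```

So the second group only swallows the bracketed part because the first
group gave up on it.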

Going through the changes from your original regex in order:

'(?:etc)' instead of '(etc)' makes the parentheses non-capturing (since
you apparently don't care about that bit). The '?' after them, instead
of '*', lets the bracketed part match at most once.
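
A small sketch of the difference, assuming the same example line as
above: with ordinary parentheses the bracketed part takes up group 1,
while with '(?:...)' the subject text is the only group.

```python
import re

line = "Subject: [PATCH 08/18] This is the patch name"

# Ordinary parentheses: the bracketed tag is captured as group 1.
capturing = re.search(r"Subject:\s*(\[[^\]]*\])?\s*(.*)", line)

# (?:...) groups without capturing, so the subject text is group 1.
non_capturing = re.search(r"Subject:\s*(?:\[[^\]]*\])?\s*(.*)", line)

capturing_groups = capturing.groups()
non_capturing_groups = non_capturing.groups()

print(capturing_groups)
print(non_capturing_groups)
```

Either form finds the same text; the non-capturing one just keeps the
group numbering simpler.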

'[^\]]' instead of '[\w\s]' matches "everything except a closing bracket".
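
As a rough illustration of that character class on its own (the sample
tags other than "[PATCH 08/18]" are made up for the example):

```python
import re

# "[", then any run of characters that are not "]", then "]".
tag = re.compile(r"\[[^\]]*\]")

# Hypothetical tags: slashes, digits and spaces are all accepted.
samples = ["[PATCH 08/18]", "[RFC v2 3/7]", "[]"]
all_match = [tag.fullmatch(s) is not None for s in samples]

# The class cannot cross a "]", so the match ends at the first one.
first_tag = tag.match("[PATCH 1/2][v3] name").group(0)

print(all_match)
print(first_tag)
```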

The '\s*' before the second set of parentheses takes out the leading
whitespace that would otherwise be returned as part of the match.
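
For comparison, a quick sketch of the same pattern with and without
that extra '\s*', again assuming the example line from the question:

```python
import re

line = "Subject: [PATCH 08/18] This is the patch name"

# Without \s* after the optional tag, the space following "]" ends up
# at the start of the captured text.
without_ws = re.search(r"Subject:\s*(?:\[[^\]]*\])?(.*)", line).group(1)

# With it, the whitespace is consumed before the capture begins.
with_ws = re.search(r"Subject:\s*(?:\[[^\]]*\])?\s*(.*)", line).group(1)

print(repr(without_ws))
print(repr(with_ws))
```

For a line with no bracketed part, the '\s*' after "Subject:" already
eats the space, so both patterns give the same result there.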

 -[]z.


