a regexp riddle: re.search(r'(?:(\w+), |and (\w+))+', 'whatever a, bbb, and c') =? ('a', 'bbb', 'c')

Steve Holden steve at holdenweb.com
Thu Nov 25 10:25:52 EST 2010


On 11/24/2010 10:46 PM, Phlip wrote:
> HypoNt:
> 
> I need to turn a human-readable list into a list():
> 
>    print re.search(r'(?:(\w+), |and (\w+))+', 'whatever a, bbb, and c').groups()
> 
> That currently returns ('c',). I'm trying to match "any word \w+
> followed by a comma, or a final word preceded by and."
> 
> The match returns 'a, bbb, and c', but the groups return ('bbb', 'c').
> What do I type for .groups() to also get the 'a'?
> 
> Please go easy on me (and no RTFM!), because I have only been using
> regular expressions for about 20 years...

A somewhat lazy approach is to write a pattern for the separators and
pass it to re.split(). I assume that "and" and "," are both acceptable
as separators in any position.

The best I've been able to do so far is below. Because split() has the
annoying habit of including the matches of any capturing groups in the
pattern, I have to throw away every second element:

>>> re.split(r"\s*(,|and)?\s*", 'whatever a, bbb, and c')[::2]
['whatever', 'a', 'bbb', '', 'c']

The empty string appears because ", and" isn't recognised as a single
delimiter: the comma and the "and" are matched separately, leaving an
empty field between them.
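A caveat for anyone trying this today: the pattern above can match the empty string, and since Python 3.7 re.split() also splits on empty matches, so the session above reflects older Python behaviour. What follows is a minimal Python 3 sketch; the delimiter pattern is my own assumption, not something tested against every input. Every alternative consumes at least one character, ", and" is treated as one delimiter, and only non-capturing groups are used, so there is nothing to throw away.

```python
import re

# Hypothetical delimiter (an assumption, not the pattern from the session above):
#   \s*,\s*(?:and\s+)?  a comma with optional spaces, optionally followed by "and "
#   \s+(?:and\s+)?      a run of whitespace, optionally followed by "and "
# Both alternatives consume at least one character, so re.split() never
# splits on an empty match, and (?:...) groups are non-capturing, so the
# delimiters themselves are not returned in the result.
DELIMITER = re.compile(r'\s*,\s*(?:and\s+)?|\s+(?:and\s+)?')


def split_human_list(text):
    """Split text such as 'whatever a, bbb, and c' into its words."""
    # Strip first so leading/trailing whitespace doesn't produce empty fields.
    return DELIMITER.split(text.strip())


print(split_human_list('whatever a, bbb, and c'))
# ['whatever', 'a', 'bbb', 'c']
```

Like the split above, this treats plain whitespace as a separator too, so "whatever" comes back as a word of its own.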

A parsing package might give you better results.
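Short of a parsing package, the 'a' from the original question can be recovered with the standard re module. A capturing group inside a repeated group only reports its last match, so no .groups() call on the original pattern will return all three words. A minimal sketch using re.findall() instead, assuming the list items are plain \w+ words (the \b is my addition, not part of the original pattern):

```python
import re

# The question's two alternatives, without the surrounding (?:...)+ repeat.
# re.findall() returns one (comma_word, and_word) tuple per match, with ''
# for whichever group didn't take part. The \b (an addition) stops a word
# like "brand" followed by a space from matching the "and" alternative.
ITEM = re.compile(r'(\w+), |\band (\w+)')


def list_items(text):
    """Return the items of a list like 'a, bbb, and c'."""
    return [comma_word or and_word for comma_word, and_word in ITEM.findall(text)]


print(list_items('whatever a, bbb, and c'))
# ['a', 'bbb', 'c']
```

Unlike the split approach, this skips "whatever", since it is neither followed by a comma nor preceded by "and".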

regards
 Steve
-- 
Steve Holden           +1 571 484 6266   +1 800 494 3119
PyCon 2011 Atlanta March 9-17       http://us.pycon.org/
See Python Video!       http://python.mirocommunity.org/
Holden Web LLC                 http://www.holdenweb.com/

More information about the Python-list mailing list