Python's regular expression?

Mirco Wahab peace.is.our.profession at gmx.de
Tue May 9 04:02:28 EDT 2006


Hi Duncan

> Nick Craig-Wood wrote:
>> Which translates to
>>   match = re.search('(blue|white|red)', t)
>>   if match:
>>   else:
>>      if match:
>>      else:
>>         if match:
> 
> This of course gives priority to colours and only looks for garments or 
> footwear if it hasn't matched on a prior pattern. If you actually 
> wanted to match the first occurrence of any of these (or if the condition 
> was re.match instead of re.search) then named groups can be a nice way of 
> simplifying the code:

A good point. And a good example of when to use named
capture group references. This is easily extended
to 'spit out' all other occurring categories
(see below).
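
To make the difference Duncan describes concrete, here is a
small sketch. The sample string and the variable names
'cascaded' and 'combined' are my own assumptions, not part
of the earlier posts; the patterns are the ones used in
this thread.

```python
import re

t = 'socks and a blue shoe'

# Cascaded searches: the colour pattern is tried first over the whole
# string, so it wins even though a garment appears earlier in the text.
cascaded = (re.search(r'blue|white|red', t)
            or re.search(r'socks|tights', t)
            or re.search(r'boot|shoe|trainer', t))

# One alternation with named groups: the leftmost match wins, and
# lastgroup reports which named group produced it.
combined = re.search(
    r'(?P<c>blue|white|red)|(?P<g>socks|tights)|(?P<f>boot|shoe|trainer)', t)

print(cascaded.group(0))                      # blue
print(combined.group(0), combined.lastgroup)  # socks g
```

So the cascade answers "is there a colour anywhere, and
only if not, a garment?", while the single pattern answers
"what comes first in the text?".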

> PATTERN = '''
>     (?P<c>blue|white|red)
>     ...

This is one nice thing in Python's regex syntax;
you have to emulate the ?P-thing in other
regex systems more or less 'awk'-wardly ;-)

> For something this simple the titles and group names could be the 
> same, but I'm assuming real code might need a bit more.
No, no, this is quite good because it involves
some math-generated table-code lookup.

I managed somehow to extend your example in order
to spit out all matches and their corresponding
category:

  import re

  PATTERN = '''
      (?P<c>blue |white |red    )
  |   (?P<g>socks|tights        )
  |   (?P<f>boot |shoe  |trainer)
  '''

  PATTERN = re.compile(PATTERN , re.VERBOSE)
  TITLES = { 'c': 'Colour', 'g': 'Garment', 'f': 'Footwear' }

  t = 'blue socks and red shoes'
  for match in PATTERN.finditer(t):
      grp = match.lastgroup
      print("%s: %s" % (TITLES[grp], match.group(grp)))

which writes out the expected:
   Colour: blue
   Garment: socks
   Colour: red
   Footwear: shoe
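
If you would rather collect the matches per category than
print them one by one, a sketch along these lines should
do. The helper name 'by_category' is hypothetical; the
pattern and titles are the ones from above.

```python
import re
from collections import defaultdict

PATTERN = re.compile(r'''
      (?P<c>blue |white |red    )
  |   (?P<g>socks|tights        )
  |   (?P<f>boot |shoe  |trainer)
''', re.VERBOSE)
TITLES = {'c': 'Colour', 'g': 'Garment', 'f': 'Footwear'}

def by_category(text):
    """Hypothetical helper: map each category title to its matches, in order."""
    found = defaultdict(list)
    for match in PATTERN.finditer(text):
        # lastgroup is the name of the one named group that matched
        found[TITLES[match.lastgroup]].append(match.group())
    return dict(found)

result = by_category('blue socks and red shoes')
print(result)
```

For the sample text this groups 'blue' and 'red' under
Colour, 'socks' under Garment and 'shoe' under Footwear.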

The corresponding Perl program would look like this:

   $PATTERN = qr/
       (blue |white |red    )(?{'c'})
   |   (socks|tights        )(?{'g'})
   |   (boot |shoe  |trainer)(?{'f'})
   /x;

   %TITLES = (c =>'Colour', g =>'Garment', f =>'Footwear');

   $t = 'blue socks and red shoes';
   print "$TITLES{$^R}: $^N\n" while( $t=~/$PATTERN/g );

and prints the same:
   Colour: blue
   Garment: socks
   Colour: red
   Footwear: shoe

You don't have nice named match references (?P<..>)
in Perl 5, so you have to emulate them with an ordinary
code assertion (?{..}) and set some value ($^R) on
the fly - which is not that bad in the end (imho).

(?{..}) is a "zero-width code assertion";
it sets Perl's predefined $^R to the value
that the code in {...} evaluates to.

As you can see, the pattern matching related part
reduces from 4 lines to one line.

If you didn't need the dictionary lookup and
could get away with putting the category names
straight into the pattern, all you'd have to do
would be this:

   $PATTERN = qr/
       (blue |white |red    )(?{'Colour'})
   |   (socks|tights        )(?{'Garment'})
   |   (boot |shoe  |trainer)(?{'Footwear'})
   /x;

   $t = 'blue socks and red shoes';
   print "$^R: $^N\n" while( $t=~/$PATTERN/g );
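
For comparison, the same shortcut seems possible on the
Python side by using the titles themselves as group names,
as Duncan hinted. This is only a sketch; the group names
'Colour', 'Garment' and 'Footwear' and the 'lines' list are
my own choices.

```python
import re

# The category titles double as group names, so no lookup table is needed.
PATTERN = re.compile(r'''
      (?P<Colour>  blue |white |red    )
  |   (?P<Garment> socks|tights        )
  |   (?P<Footwear>boot |shoe  |trainer)
''', re.VERBOSE)

t = 'blue socks and red shoes'
lines = ["%s: %s" % (m.lastgroup, m.group()) for m in PATTERN.finditer(t)]
print("\n".join(lines))
```

This only works as long as the titles are valid Python
identifiers, which is presumably why a separate TITLES
table is the safer choice for real code.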

What's the point of all that? IMHO, Python's
Regex support is quite good and useful, but
won't give you an edge over Perl's in the end.

Thanks & Regards

Mirco



