Python's regular expression?

Nick Craig-Wood nick at craig-wood.com
Mon May 8 10:30:12 EDT 2006


Mirco Wahab <peace.is.our.profession at gmx.de> wrote:
>  After some minutes in this NG I start to get
>  the picture. So I narrowed the above regex-question
>  down to a nice equivalence between Perl and Python:
> 
>  Python:
> 
>     import re
> 
>     t = 'blue socks and red shoes'
>     if re.match('blue|white|red', t):
>         print t
> 
>     t = 'blue socks and red shoes'
>     if re.search('blue|white|red', t):
>        print t
> 
>  Perl:
> 
>     use Acme::Pythonic;
> 
>     $t = 'blue socks and red shoes'
>     if $t =~ /blue|white|red/:
>       print $t
> 
>  And Python Regexes eventually lost (for me) some of
>  their (what I believed) 'clunky appearance' ;-)

If you are used to Perl regexes there is one clunkiness of Python
regexps which you'll notice eventually...

Let's make the above example a bit more "real world", i.e. use the
matched item in some way...

Perl:

   $t = 'blue socks and red shoes';
   if ( $t =~ /(blue|white|red)/ )
   {
      print "Colour: $1\n";
   }
   
Which prints

  Colour: blue

In Python you have to express this like

  import re

  t = 'blue socks and red shoes'
  match = re.search('(blue|white|red)', t)
  if match:
     print "Colour:", match.group(1)

Note the extra variable "match".  You can't do assignment in an
expression in Python, which makes for the extra verbosity, and you
need a variable to store the result of the match in (since Python
doesn't have the magic $1..$9 variables).
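
To make the role of that variable concrete, here is a small sketch of
what the match object holds in place of $1..$9.  The second group and
the names "pattern" and "captured" are my own additions for
illustration, and the print calls are written so they should run
unchanged on Python 3 as well.

```python
import re

t = 'blue socks and red shoes'

# Two groups here (the second is added purely for illustration).
pattern = r'(blue|white|red) (socks|tights)'
match = re.search(pattern, t)

if match:
    whole = match.group(0)      # the entire matched text, like Perl's $&
    colour = match.group(1)     # first group, like $1
    garment = match.group(2)    # second group, like $2
    captured = match.groups()   # all groups at once, as a tuple
    print("Colour: %s" % colour)
    print("Garment: %s" % garment)
```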

This becomes particularly frustrating when you have to do a series of
regexp matches, e.g. in Perl

   if ( $t =~ /(blue|white|red)/ )
   {
      print "Colour: $1\n";
   }
   elsif ( $t =~ /(socks|tights)/)
   {
      print "Garment: $1\n";
   }
   elsif ( $t =~ /(boot|shoe|trainer)/)
   {
      print "Footwear: $1\n";
   }

Which translates to

  match = re.search('(blue|white|red)', t)
  if match:
     print "Colour:", match.group(1)
  else:
     match = re.search('(socks|tights)', t)
     if match:
        print "Garment:", match.group(1)
     else:
        match = re.search('(boot|shoe|trainer)', t)
        if match:
           print "Footwear:", match.group(1)
           # indented ad infinitum!
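
On Python 3.8 and later, which arrived long after this was written,
assignment expressions (the ":=" operator) let the same chain stay
flat.  This is only a sketch: the function name "describe" is made up
for the example, and it returns the text rather than printing it.

```python
import re

def describe(t):
    """Return a label for the first pattern that matches, or None."""
    # ':=' binds the match object and tests it in one expression
    # (needs Python 3.8 or newer).
    if (m := re.search(r'(blue|white|red)', t)):
        return "Colour: " + m.group(1)
    elif (m := re.search(r'(socks|tights)', t)):
        return "Garment: " + m.group(1)
    elif (m := re.search(r'(boot|shoe|trainer)', t)):
        return "Footwear: " + m.group(1)
    return None

print(describe('blue socks and red shoes'))
```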

You can use a helper class to get over this frustration like this

  import re

  class Matcher:
      """Remembers the last match so it can be tested and used in one go."""
      def search(self, r, s):
          # Store the match object (or None) and return it for the "if"
          self.value = re.search(r, s)
          return self.value
      def __getitem__(self, i):
          # m[1] is shorthand for group(1) of the last match
          return self.value.group(i)

  m = Matcher()
  t = 'blue socks and red shoes'

  if m.search(r'(blue|white|red)', t):
      print "Colour:", m[1]
  elif m.search(r'(socks|tights)', t):
      print "Garment:", m[1]
  elif m.search(r'(boot|shoe|trainer)', t):
      print "Footwear:", m[1]
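
Another way round the nesting, without a helper class, is to put the
patterns in a list and stop at the first one that matches.  This is
just a sketch: the names "PATTERNS" and "classify" are invented here,
and it returns the string rather than printing it so it should behave
the same on old and new Pythons.

```python
import re

# (label, pattern) pairs, tried in order like the elsif chain above.
PATTERNS = [
    ("Colour", r'(blue|white|red)'),
    ("Garment", r'(socks|tights)'),
    ("Footwear", r'(boot|shoe|trainer)'),
]

def classify(t):
    """Return 'Label: text' for the first pattern found in t, else None."""
    for label, pattern in PATTERNS:
        match = re.search(pattern, t)
        if match:
            return "%s: %s" % (label, match.group(1))
    return None

print(classify('blue socks and red shoes'))
```

The trade-off is that every branch has to do the same kind of thing
with its match; if the branches differ a lot, the helper class reads
better.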

Having made the transition from Perl to Python a couple of years ago,
I find myself using regexps much less.  In Perl everything looks like
it needs a regexp, but Python has a much richer set of string methods,
e.g. .startswith, .endswith, good subscripting and the nice "in"
operator for strings.
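
For instance, checks like these need no regexp at all (the sample
string is the one from above; the variable names are only for
illustration):

```python
t = 'blue socks and red shoes'

starts_blue = t.startswith('blue')   # True: the string begins with 'blue'
ends_shoes = t.endswith('shoes')     # True: the string ends with 'shoes'
has_red = 'red' in t                 # True: substring test with "in"
first_word = t[:4]                   # 'blue': slice of the first four characters
last_word = t[-5:]                   # 'shoes': negative index counts from the end

print("%s ... %s" % (first_word, last_word))
```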

-- 
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick


