[Tutor] re module- puzzling results when matching money

Sat Aug 3 22:30:14 CEST 2013

----- Original Message -----

> From: Alex Kleider <akleider at sonic.net>
> To: Python Tutor <tutor at python.org>
> Cc: 
> Sent: Saturday, August 3, 2013 8:15 PM
> Subject: [Tutor] re module- puzzling results when matching money
> 
> #!/usr/bin/env python
> 
> """
> I've been puzzling over the re module and have a couple of questions
> regarding the behaviour of this script.
> 
> I've provided two possible patterns (re_US_money):
> the one surrounded by the 'word boundary' meta sequence seems not to work
> while the other one does. I can't understand why the addition of the word
> boundary defeats the match.

\b
Word boundary.  This is a zero-width assertion that matches only at the
beginning or end of a word.  A word is defined as a sequence of alphanumeric
characters, so the end of a word is indicated by whitespace or a
non-alphanumeric character.[http://docs.python.org/2/howto/regex.html]
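So I think it's because a dollar sign is not an alphanumeric character. \b only matches where a word character (letter, digit or underscore) sits next to a non-word character or next to the start/end of the string. A space (or the start of the string) followed by "$" has no word character on either side, so a \b placed directly before \$ cannot match there.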

>>> import re
>>> re.findall(r"\be\b", "d e f")
['e']
>>> re.findall(r"\b\$\b", "d $ f")
[]
>>> re.findall(r"\b\&\b", "d & f")
[]
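To see where \b can and cannot fall around a dollar sign, here is a small sketch. The sample text and the `money` pattern are my own hypothetical illustration, not your re_US_money (which isn't quoted here). The negative lookbehind (?<!\w), meaning "not preceded by a word character", is just one possible replacement for the leading \b.

```python
import re

text = "Cost is $4.50, not US$4.50 or $4.5"

# \b needs a word character (letter, digit or underscore) on exactly one side.
# " $" has non-word characters on both sides, so there is no boundary there;
# only the "$" in "US$" (preceded by the word character "S") qualifies.
boundary_hits = re.findall(r"\b\$", text)
print(boundary_hits)

# Hypothetical alternative anchor: a negative lookbehind rejects a "$" that
# follows a word character (so "US$4.50" is skipped), and the trailing \b,
# which sits between a digit and a non-digit, rejects extra digits like "$4.505".
money = re.compile(r"(?<!\w)\$\d*\.\d{2}\b")
money_hits = money.findall(text)
print(money_hits)
```

The trailing \b is fine where it is, because after the cents there is a digit on one side. Only the leading \b, which lands between two non-word characters, causes trouble. Note that "$4.5" is not matched either, since the pattern insists on two digits of cents.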

How about this version, which drops the \b anchors altogether? (I like the re.VERBOSE/re.X flag!)

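from __future__ import print_function  # print() then behaves the same on Python 2 and 3

import re
import collections

# re.X (re.VERBOSE) ignores whitespace in the pattern and allows # comments
regex = r"""(?P<sign>\$)        # literal dollar sign
            (?P<dollars>\d*)    # zero or more digits, so $.30 also matches
            \.                  # literal decimal point
            (?P<cents>\d{2})    # exactly two digits of cents"""
target = """Cost is $4.50. With a $.30 discount:
Price is $4.15.
The price could be less, say $4 or $4.
Let's see how this plays out:  $4.50.60
"""
# findall returns one (sign, dollars, cents) tuple per match
Match = collections.namedtuple("Match", "sign dollars cents")
matches = [Match(*match) for match in re.findall(regex, target, re.X)]
for match in matches:
    print(repr(match.sign), repr(match.dollars), repr(match.cents))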
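If you'd rather read the named groups directly instead of unpacking findall's tuples into a namedtuple, here is a sketch using re.finditer, which yields match objects, and their groupdict() method. It uses an equivalent pattern, compiled once with re.compile, and the same target text. The name `found` is just for illustration.

```python
import re

# Equivalent verbose pattern; (?:\.) is written as a plain \.
regex = re.compile(r"""(?P<sign>\$)       # literal dollar sign
                       (?P<dollars>\d*)   # optional whole dollars
                       \.                 # decimal point
                       (?P<cents>\d{2})   # exactly two digits of cents
                    """, re.X)

target = """Cost is $4.50. With a $.30 discount:
Price is $4.15.
The price could be less, say $4 or $4.
Let's see how this plays out:  $4.50.60
"""

# groupdict() maps each group name to the text it matched
found = [m.groupdict() for m in regex.finditer(target)]
for d in found:
    print(d["sign"] + d["dollars"] + "." + d["cents"])
```

This also makes it easy to see what the pattern accepts. The cents are mandatory, so "$4" and "$4." are skipped, and "$4.50.60" yields only "$4.50".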