[Tutor] re module- puzzling results when matching money

Alex Kleider akleider at sonic.net
Sun Aug 4 01:34:34 CEST 2013


On 2013-08-03 13:30, Albert-Jan Roskam wrote:


> Word boundary.  This is a zero-width assertion that matches only at the
> beginning or end of a word.  A word is defined as a sequence of alphanumeric
> characters, so the end of a word is indicated by whitespace or a
> non-alphanumeric character. [http://docs.python.org/2/howto/regex.html]
> So I think it's because a dollar sign is not an alphanumeric character.

I get it now, thanks.
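
For anyone reading along, here is a minimal sketch of the word-boundary point above. The variable names are only illustrative; the patterns are the ones from this thread, plus a "d$f" case that is an added assumption to show where \b does succeed around a dollar sign.

```python
import re

# \b matches only where a word character (\w) sits next to a non-word
# character, or at the very start/end of the string.
spaced = re.findall(r"\b\$\b", "d $ f")   # '$' has spaces on both sides
tight = re.findall(r"\b\$\b", "d$f")      # '$' has letters on both sides
letter = re.findall(r"\be\b", "d e f")    # 'e' is itself a word character

print(spaced)   # []
print(tight)    # ['$']
print(letter)   # ['e']
```

In "d $ f" both neighbours of '$' are spaces, so there is no word/non-word transition next to it and \b fails on both sides.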


> 
>>>> re.findall(r"\b\e\b", "d e f")
                     ^
I'm puzzled by the presence of the '\' character before the 'e' above.
Testing suggests that its presence or absence makes no difference.


> ['e']
>>>> re.findall(r"\b\$\b", "d $ f")
                     ^
Here it escapes the '$' which would otherwise be a metachar.

> []
>>>> re.findall(r"\b\&\b", "d & f")
                     ^
Here, too, I don't understand the backslash, but again it seems not to matter.

> []
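
A small sketch of the escaping question, leaving \b out so only the backslash is being tested. The sample string and variable names are made up for illustration. As far as I know, Python 3.6 and later reject an unknown letter escape such as \e with an error, so this sketch sticks to the punctuation cases.

```python
import re

text = "d $ f & g"

# Escaped: '\$' matches a literal dollar sign.
literal_dollar = re.findall(r"\$", text)
# Unescaped: '$' is an end-of-string anchor, so the only match is the
# empty string at the very end.
anchor_dollar = re.findall(r"$", text)
# '&' is not a metacharacter, so the backslash changes nothing.
plain_amp = re.findall(r"&", text)
escaped_amp = re.findall(r"\&", text)

print(literal_dollar)          # ['$']
print(anchor_dollar)           # ['']
print(plain_amp, escaped_amp)  # ['&'] ['&']

# re.escape() adds the backslash for you when one is needed.
print(re.escape("$"))          # \$
```

So the backslash matters for '$' because '$' has a special meaning, and is harmless but unnecessary for '&'.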
> 
> 
> How about this version (I like the re.VERBOSE/re.X flag!)

I am also now getting to like re.VERBOSE.

> 
> import re
> import collections
> 
> regex = r"""(?P<sign>\$)
>             (?P<dollars>\d*)
>             (?:\.)
>             (?P<cents>\d{2})"""
> target = \
> """Cost is $4.50. With a $.30 discount:
> Price is $4.15.
> The price could be less, say $4 or $4.
> Let's see how this plays out:  $4.50.60
> """
> Match = collections.namedtuple("Match", "sign dollars cents")
> matches = [Match(*match) for match in re.findall(regex, target, re.X)]
> for match in matches:
>     print repr(match.sign), repr(match.dollars), repr(match.cents)

'collections' is new to me.  A new topic to study.
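
The quoted snippet uses the Python 2 print statement. Below is a sketch of the same idea for Python 3, with the pattern and target text copied from the quoted message; only the print call and the loop variable name "groups" are changed. It is meant as an illustration of re.VERBOSE plus namedtuple, not as the definitive way to parse money.

```python
import collections
import re

# Same pattern as the quoted snippet; with re.VERBOSE the whitespace and
# newlines inside the pattern are ignored.
regex = r"""(?P<sign>\$)
            (?P<dollars>\d*)
            (?:\.)
            (?P<cents>\d{2})"""

target = """Cost is $4.50. With a $.30 discount:
Price is $4.15.
The price could be less, say $4 or $4.
Let's see how this plays out:  $4.50.60
"""

# findall() returns one tuple of captured groups per match; the
# non-capturing (?:\.) group is left out of each tuple.
Match = collections.namedtuple("Match", "sign dollars cents")
matches = [Match(*groups) for groups in re.findall(regex, target, re.VERBOSE)]

for match in matches:
    print(repr(match.sign), repr(match.dollars), repr(match.cents))
```

This prints four lines: '$' '4' '50', then '$' '' '30', then '$' '4' '15', then '$' '4' '50'. The bare "$4" and "$4." amounts are skipped because the pattern insists on a dot followed by two digits.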
Thanks for the help, much appreciated!
alex k

