Beginner question: global string substitution with re
Gary Herron
gherron at islandtraining.com
Mon Sep 22 05:40:06 EDT 2003
On Monday 22 September 2003 02:22 am, peter leonard wrote:
> Hi,
> This is a basic question, but I can't figure out what is wrong, even after
> reading the documentation. I have a script that normalizes strings. One of
> the steps is to convert all fractions to the tag 'fraction'. For example:
>
> import re
> line = "This is the first ratio, 170/37, and this is the second 170/37 "
>
>
> def normalise(text):
>
>     #Tag fractions
>     fraction = r'(\s+\d+\/\d+\s+)'
>     regfr = re.compile(fraction)
>     text = regfr.sub(" |fraction| ",text)
>
>     #Remove punctuation
>     punc = r'\,'
>     regpunc = re.compile(punc)
>     text = regpunc.sub("",text)
>
>     return text
>
> print line,"\n"
> print normalise(line),"\n"
>
>
> The output from this script is:
>
> This is the first ratio, 170/37, and this is the second 170/37
>
> This is the first ratio 170/37 and this is the second |fraction|
>
>
> I can't understand why only one of the fractions gets substituted. The
> documentation for sub states that the default count argument is 0, which
> means replace all occurrences. The output of my script should be:
>
> This is the first ratio |fraction| and this is the second |fraction|
The problem is that your regular expression ends with "\s+". This means
the digits of the fraction *must* be followed by at least one whitespace
character, and the digits of your first fraction are followed by a comma,
not a space. So sub() really does replace every match; there is simply
only one match in your string.
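A quick way to confirm this is to ask re for every match before substituting. The check below reuses the pattern and test string from the question; `findall` and `subn` are standard `re` methods, and `subn` additionally reports how many substitutions it made.

```python
import re

line = "This is the first ratio, 170/37, and this is the second 170/37 "
fraction = re.compile(r'(\s+\d+\/\d+\s+)')

# findall lists every non-overlapping match of the original pattern.
# Only the second fraction qualifies: the first is followed by a comma.
print(fraction.findall(line))  # [' 170/37 ']

# subn returns (new_string, number_of_substitutions). With the default
# count=0 it replaces every match; there is just one match to replace.
new_text, n = fraction.subn(" |fraction| ", line)
print(n)  # 1
```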
Your re matches spaces, then the fraction, then spaces. I'd guess that
you don't really want the spaces on either side to be part of the match.
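Following that suggestion, a corrected normalise() could match only the fraction itself. This is a minimal sketch, assuming a fraction is a run of digits, a slash, and another run of digits with word boundaries (`\b`) on both sides; the boundary choice is an assumption, not something the original post specifies. Because the surrounding spaces are no longer consumed, the replacement text no longer needs the padding spaces the original pattern required.

```python
import re

# Assumption: a "fraction" is digits/digits bounded by \b on both sides,
# so neighbouring spaces and punctuation are left in place.
FRACTION = re.compile(r'\b\d+/\d+\b')
PUNCT = re.compile(r',')

def normalise(text):
    # Tag fractions first; sub() with the default count=0 replaces all.
    text = FRACTION.sub('|fraction|', text)
    # Then strip commas, as in the original script.
    text = PUNCT.sub('', text)
    return text

line = "This is the first ratio, 170/37, and this is the second 170/37 "
print(normalise(line))
```

With `\b`, a fraction followed by a comma or by the end of the string still matches. One side effect of this assumption is that something like `abc1/2` is not tagged, since there is no word boundary between `c` and `1`.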
Gary Herron
More information about the Python-list
mailing list