Extracting text from a string + note

bearophileHUGS at lycos.com
Thu Sep 7 15:24:04 EDT 2006


Tempo:
> I am having a little trouble extracting text from a string. The
> string that I am dealing with is pasted below, and I want to
> extract the prices that are contained in the string below.

This may help:

>>> import re
>>> reg = r"(?<=  \$  )  (?:  \d* \.? \d*  )"
>>> prices = re.compile(reg, flags=re.VERBOSE)
>>> prices.findall('</span>, $66.99 <span class="sale"> $.99')
['66.99', '.99']
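
One caveat: every part of that pattern is optional, so it also returns an
empty string for a "$" that has no digits after it. A stricter variant could
look like the sketch below. The names PRICE_RE and extract_prices are mine,
not from the original question, and I am assuming prices look like "66.99",
"66" or ".99".

```python
import re

# Verbose pattern: whitespace is ignored and "#" starts a comment,
# so each token can be documented on its own line.
PRICE_RE = re.compile(r"""
    (?<= \$ )               # must follow a literal dollar sign
    (?: \d+ (?: \. \d+ )?   # digits, optionally followed by a fraction
      | \. \d+              # or a bare fraction such as .99
    )
""", re.VERBOSE)


def extract_prices(text):
    """Return every price-like number that directly follows a '$' in text."""
    return PRICE_RE.findall(text)


print(extract_prices('</span>, $66.99 <span class="sale"> $.99'))
# prints ['66.99', '.99']
```

With this version a lone "$" yields no match at all instead of an empty
string.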

You can read about Python regular expressions:
http://www.amk.ca/python/howto/regex/
http://docs.python.org/lib/module-re.html

------------------------

Perl 6 regular expressions are verbose by default; a future Python may do
the same.

From Apocalypse 5, by Larry Wall:
http://dev.perl.org/perl6/doc/design/apo/A05.html

>In real life, tokens are more recognizable if they are separated by whitespace.<

>Now, you may rightly point out that + is something we already have, and we already introduced /x to allow whitespace, so why is this bullet point here? Well, there's a lot of inertia in culture, and the problem with /x is that it's not the default, so people don't think to turn it on when it would probably do a lot of good. The culture is biased in the wrong direction. Whitespace around tokens should be the norm, not the exception. It should be acceptable to use whitespace to separate tokens that could be confused.<
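
For comparison, this is roughly how the same choice looks in Python today:
the pattern from the top of this message written compactly, and again with
the inline (?x) flag, which is Python's counterpart to Perl 5's /x. The
variable names are only for illustration.

```python
import re

sample = '</span>, $66.99 <span class="sale"> $.99'

# Compact form: every character in the pattern is significant.
compact = re.compile(r"(?<=\$)(?:\d*\.?\d*)")

# Same pattern with the inline verbose flag: whitespace between tokens is ignored.
spaced = re.compile(r"(?x) (?<= \$ ) (?: \d* \.? \d* )")

# In verbose mode a literal space must be escaped or put in a character class.
with_space = re.compile(r"(?x) \$ [ ] \d+")

print(compact.findall(sample) == spaced.findall(sample))
print(with_space.findall("total: $ 42"))
```

Both forms find the same prices; the only cost of verbose mode is that a
literal space needs to be written as "[ ]" or "\ ".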

Bye,
bearophile



