[Tutor] re module / separator

Serdar Tumgoren zstumgoren at gmail.com
Wed Jun 24 21:07:44 CEST 2009


Hey Tiago,

> text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."
>
> Of this line of text, I want to take out strings where all words start
> with a, end with "b.". But I don't want a list of words. I want that:
>
> ["a2345b.", "a45453b. a325643b. a435643b."]
>

Are you saying you want a list of every item that starts with an "a"
and ends with a "b"? If so, the above list is not what you're after.
It only contains two items:
  a2345b.
  a45453b. a325643b. a435643b.

You can verify this by trying
  len(["a2345b.", "a45453b. a325643b. a435643b."])
which returns 2. You can also see that each item is wrapped in double
quotes and that the two items are separated by a comma.

> And I feel I still don't fully understand regular expression's logic. I
> do not understand the results below:

Try reading this:
http://www.amk.ca/python/howto/regex/

I've found it to be a very gentle and useful introduction to regexes.

It explains, among other things, what the search and findall methods
do. You should definitely take the time to read up on regexes. Your
patterns grew too complex for this problem (again, if I'm
understanding you right), which is probably why you're not
understanding your results.

If I'm understanding your problem correctly, you probably want the
findall method:

In [9]:   re.findall(r'a[a-z0-9]+b',text)
Out[9]: ['a2345b', 'a45453b', 'a325643b', 'a435643b']
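
In case the difference between search and findall isn't clear yet,
here's a rough sketch using the same text and pattern as above. The
variable names (pattern, first, first_text, all_matches) are just ones
I made up for illustration. search stops at the first match and hands
back a match object (or None), while findall returns a list of every
non-overlapping match.

```python
import re

text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."
pattern = r'a[a-z0-9]+b'

# search() returns a match object for the first hit only, or None
first = re.search(pattern, text)
first_text = first.group(0) if first else None

# findall() returns a list of strings, one per non-overlapping match
all_matches = re.findall(pattern, text)

print(first_text)
print(all_matches)
```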

There are other ways to perform the above, for instance using the "\w"
metacharacter, which matches any alphanumeric character or underscore.

In [20]: re.findall(r'a\w+b',text)
Out[20]: ['a2345b', 'a45453b', 'a325643b', 'a435643b']
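
The two patterns give the same result on your text, but they aren't
strictly equivalent. Here's a small sketch of where they differ; the
sample string is invented purely for illustration and isn't from your
data.

```python
import re

# Hypothetical sample: one word with an underscore, one with a hyphen
sample = "a_1b a-1b"

# [a-z0-9] excludes the underscore, so neither word matches
with_class = re.findall(r'a[a-z0-9]+b', sample)

# \w includes the underscore (but not the hyphen)
with_w = re.findall(r'a\w+b', sample)

print(with_class)
print(with_w)
```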

Or, to get even more (needlessly) complicated:

In [21]: re.findall(r'\ba\w+b\b',text)
Out[21]: ['a2345b', 'a45453b', 'a325643b', 'a435643b']
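
The "\b" anchors make no difference on your text, but they would if an
"a...b" run sat inside a longer word. A quick sketch, again with a
made-up sample string rather than your data:

```python
import re

# Hypothetical sample: "a12b" inside "xa12b", a clean "a34b",
# and "a56b" followed by an extra letter
sample = "xa12b a34b a56bc"

# Without \b the pattern happily matches inside longer words
loose = re.findall(r'a\w+b', sample)

# With \b the match must start and end on a word boundary
strict = re.findall(r'\ba\w+b\b', sample)

print(loose)
print(strict)
```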

As you learned, regexes can get really complicated, really quickly if
you don't understand the syntax.  Others with more experience might
offer more elegant solutions to your problem, but I'd still encourage
you to read up on the basics and get comfortable with the re module.
It's a great tool once you understand it.
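
One more thought, in case I misread you and the grouping in your list
was intentional, i.e. you want each run of consecutive matching words
kept together as one string. Something along these lines might work.
It's only a sketch under that assumption: I'm guessing that the words
in a run are separated by exactly one space, and the names
word, run_pattern and runs are mine.

```python
import re

text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."

# One "word": starts with a, ends with "b." (assumed word shape)
word = r'a\w*b\.'

# A run: one word, then any number of further words, each preceded
# by a single space (assumption about the separator)
run_pattern = r'\b' + word + r'(?: ' + word + r')*'

runs = re.findall(run_pattern, text)
print(runs)
```

Because the group is non-capturing, findall returns the whole run
rather than just the last word in it.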

Best of luck,
Serdar

