[Tutor] re module / separator

Serdar Tumgoren zstumgoren at gmail.com
Wed Jun 24 23:57:59 CEST 2009


As usual, Kent Johnson has swooped in and untangled the mess with a
clear explanation.

By the time a regex gets this complicated, I typically start thinking
of ways to simplify it or avoid it altogether.

Below is the code I came up with. It goes through some gymnastics and
can surely stand improvement, but it seems to get the job done.
Suggestions are welcome.


In [83]: text
Out[83]: 'a2345b. f325. a45453b. a325643b. a435643b. g234324b.'

In [84]: textlist = text.split()

In [85]: textlist
Out[85]: ['a2345b.', 'f325.', 'a45453b.', 'a325643b.', 'a435643b.', 'g234324b.']

In [86]: newlist = []

In [87]: pat = re.compile(r'a\w+b\.')

In [88]: for item in textlist:
   ....:     if pat.match(item):
   ....:         newlist.append(item)
   ....:     else:
   ....:         newlist.append("|")
   ....:
   ....:

In [89]: newlist
Out[89]: ['a2345b.', '|', 'a45453b.', 'a325643b.', 'a435643b.', '|']

In [90]: lastlist = ''.join(newlist)

In [91]: lastlist
Out[91]: 'a2345b.|a45453b.a325643b.a435643b.|'

In [92]: lastlist.rstrip("|").split("|")
Out[92]: ['a2345b.', 'a45453b.a325643b.a435643b.']
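
The same steps can be folded into a single function. This is only a
rough sketch: the name split_on_nonmatching and its pattern parameter
are made up for illustration, and it assumes the tokens are separated
by whitespace and never contain a literal "|".

```python
import re

def split_on_nonmatching(text, pattern=r'a\w+b\.'):
    """Group runs of matching tokens; non-matching tokens act as separators."""
    pat = re.compile(pattern)
    # Keep tokens that match at their start; replace the rest with a marker.
    marked = [item if pat.match(item) else "|" for item in text.split()]
    # Glue everything together, drop trailing markers, then split on the marker.
    return ''.join(marked).rstrip("|").split("|")

text = 'a2345b. f325. a45453b. a325643b. a435643b. g234324b.'
print(split_on_nonmatching(text))
# ['a2345b.', 'a45453b.a325643b.a435643b.']
```

Like the session above, this only strips markers from the right-hand
end, so a non-matching token at the start of the text (or two in a row)
leaves empty strings in the result.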

