[Tutor] Regular expression on python

Peter Otten __peter__ at web.de
Wed Apr 15 11:42:34 CEST 2015


Albert-Jan Roskam wrote:

> On Tue, 4/14/15, Peter Otten <__peter__ at web.de> wrote:

>>> >>> pprint.pprint(
>>> ... [(k, int(v)) for k, v in
>>> ... re.compile(r"(.+?):\s+(\d+)(?:\s+\(.*?\))?\s*").findall(line)])
>>> [('Input Read Pairs', 2127436),
>>>  ('Both Surviving', 1795091),
>>>  ('Forward Only Surviving', 17315),
>>>  ('Reverse Only Surviving', 6413),
>>>  ('Dropped', 308617)]

> Yes, nice, but why do you use
> re.compile(regex).findall(line)
> and not
> re.findall(regex, line)
> 
> I know what re.compile is for. I often use it outside a loop and then
> use the compiled regex inside the loop; I just haven't seen it used the
> way you use it before.
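
For reference, the two spellings in the question can be compared side by side.
This is only a minimal sketch: the input line is not shown in this thread, so
SAMPLE below is a hypothetical line made up to fit the quoted output, and
parse_lines() is an illustrative helper, not something from the original post.

```python
import re

# Hypothetical input, reconstructed to fit the output quoted above;
# the real line (and its percentages) is not shown in the thread.
SAMPLE = (
    "Input Read Pairs: 2127436 Both Surviving: 1795091 (84.38%) "
    "Forward Only Surviving: 17315 (0.81%) "
    "Reverse Only Surviving: 6413 (0.30%) Dropped: 308617 (14.51%)"
)

PATTERN = r"(.+?):\s+(\d+)(?:\s+\(.*?\))?\s*"

# Both spellings return the same list of (label, digits) string tuples.
via_compile = re.compile(PATTERN).findall(SAMPLE)
via_module = re.findall(PATTERN, SAMPLE)


def parse_lines(lines):
    """Compile once outside the loop, then reuse the pattern per line."""
    regex = re.compile(PATTERN)
    return [[(k, int(v)) for k, v in regex.findall(line)] for line in lines]


parsed = parse_lines([SAMPLE])
```

With a single line there is no practical difference between the two forms;
the compiled pattern mainly pays off when it is reused, as in parse_lines().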

What you describe here is how I use regular expressions most of the time.
Also, re.compile() behaves the same across Python versions, while the
module-level shortcuts for the pattern methods have changed signature over
time. Finally, some of the shortcuts have a gotcha. Compare:

>>> re.compile("a", re.IGNORECASE).sub("b", "aAAaa")
'bbbbb'
>>> re.sub("a", "b", "aAAaa", re.IGNORECASE)
'bAAba'

Did you expect that? If so, congrats on your thorough reading of the docs ;)
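
A short sketch of what is going on in the second call: the signature is
re.sub(pattern, repl, string, count=0, flags=0), so a flag passed as the
fourth positional argument is taken as count. re.IGNORECASE has the integer
value 2, which is why exactly two lowercase "a"s were replaced. The variable
names below are illustrative only.

```python
import re

text = "aAAaa"

# The flag is an integer-valued enum member; as a count it means "2".
flag_as_int = int(re.IGNORECASE)

# Equivalent to the surprising call: replace at most two case-sensitive matches.
limited = re.sub("a", "b", text, count=flag_as_int)

# Passing the flag by keyword gives the intended case-insensitive substitution.
insensitive = re.sub("a", "b", text, flags=re.IGNORECASE)

# The compiled form has no such ambiguity: flags belong to re.compile().
compiled = re.compile("a", re.IGNORECASE).sub("b", text)
```

Passing flags by keyword avoids the mix-up in the shortcut functions.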

> Personally, I prefer to be verbose about being verbose, i.e. use the
> re.VERBOSE flag. But perhaps that's just a matter of taste. Are there any
> use cases where the ?iLmsux operators are clearly a better choice than the
> equivalent flag? For me, the mental burden of a regex is big enough
> already without these operators.

I pass flags separately myself, but

>>> re.sub("(?i)a", "b", "aAAaa")
'bbbbb'

might serve as an argument for inlined flags.
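
One further case where an inlined flag can help, sketched under an
assumption: some function accepts only a pattern string and offers no way to
pass flags. count_matches() below is a hypothetical stand-in for such an API,
not a function from the re module.

```python
import re


def count_matches(pattern, text):
    """Hypothetical helper that takes only a pattern string, no flags."""
    return len(re.findall(pattern, text))


text = "aAAaa"

# Without a flags parameter, only the lowercase letters are counted.
case_sensitive = count_matches("a", text)

# The inline flag travels with the pattern, so the helper needs no changes.
case_insensitive = count_matches("(?i)a", text)

# For comparison: the inline form and the flags keyword give the same result.
inline = re.sub("(?i)a", "b", text)
keyword = re.sub("a", "b", text, flags=re.IGNORECASE)
```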


