Reading a file, sans whitespace

Michael Geary Mike at DeleteThis.Geary.com
Sun May 23 19:47:05 EDT 2004


Uri wrote:
> Thanks guys! Tim's idea seems like the easiest for a newbie
> to implement, but I'll play around with Mike's pre-compiling
> thing, too. I don't really understand what the compile part
> does, could you expound upon that?

It's just a way to make a regular expression more efficient when you use it
repeatedly. When you use a regular expression, Python does two things:
First, it compiles the regular expression into a pattern object, a customized
matcher that implements the string matching the regular expression specifies.
Then, it runs that matcher on the string you're using. If you're going to
use the regular expression repeatedly, you can hand the pattern string to the
re module on every call, or compile it once yourself and use the precompiled
version after that.
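
Here is a minimal sketch of those two steps done by hand. The sample string
and the variable names are made up for illustration; re.compile and the
pattern object's split method are the standard re module calls.

```python
import re

# Step 1: compile the pattern once; this returns a reusable pattern object.
whitespace = re.compile(r'\s+')

# Step 2: run the compiled pattern against a string (a made-up sample here).
sample = 'spam   eggs\tham'
fields = whitespace.split(sample)

# The module-level function does both steps in a single call.
same_fields = re.split(r'\s+', sample)
```

Both fields and same_fields end up holding the same list of words; the only
difference is who did the compiling and when.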

For example, these do exactly the same thing:

import re
# Pattern passed as a string on every call
with open( 'inputFile' ) as f:
    for line in f:
        print( re.split( r'\s+', line.strip() ) )

import re
# Pattern compiled once, before the loop
reWhitespace = re.compile( r'\s+' )
with open( 'inputFile' ) as f:
    for line in f:
        print( reWhitespace.split( line.strip() ) )

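But for a large file, the second version will be faster because the regular
expression is compiled once up front and the loop uses the compiled object
directly. The first version isn't quite as slow as recompiling every time
through the loop, since the re module caches recently compiled patterns, but
it still pays for a cache lookup on every call. It may not make much
difference for a simple regular expression like this (and of course the plain
string method line.split() is even simpler and probably faster), but whenever
a complicated regular expression does have to be recompiled, it will make
more of a difference in performance.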
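
If you want to measure it yourself rather than take my word for it, something
like this rough sketch should do. The sample data and the function names are
invented for the test, the timings will vary from machine to machine, and it
uses the standard timeit module.

```python
import re
import timeit

# Hypothetical sample data standing in for the lines of a large file.
lines = ['alpha  beta\tgamma   delta\n'] * 2000

whitespace = re.compile(r'\s+')

def split_with_module_function():
    # Pattern passed as a string; re looks it up in its cache on each call.
    return [re.split(r'\s+', line.strip()) for line in lines]

def split_with_precompiled():
    # Pattern object compiled once, outside the loop.
    return [whitespace.split(line.strip()) for line in lines]

def split_with_str_method():
    # No regular expression at all; split() with no arguments splits on
    # runs of whitespace and ignores leading and trailing whitespace.
    return [line.split() for line in lines]

# All three approaches give the same fields for this data.
results = [split_with_module_function(),
           split_with_precompiled(),
           split_with_str_method()]
assert results[0] == results[1] == results[2]

for func in (split_with_module_function,
             split_with_precompiled,
             split_with_str_method):
    seconds = timeit.timeit(func, number=10)
    print('%-28s %.3f s' % (func.__name__, seconds))
```

One thing to watch for: on a blank line, line.split() gives you an empty
list, while the regular expression versions give you a list containing one
empty string, so the three are only interchangeable for non-blank lines.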

-Mike




