Reading a file, sans whitespace

Terry Reedy tjreedy at udel.edu
Mon May 24 10:35:57 EDT 2004


"Michael Geary" <Mike at DeleteThis.Geary.com> wrote in message
news:10b2dvrfqhmbm5a at corp.supernews.com...
> Uri wrote:
> For example, these do exactly the same thing:
>
> import re
> with open( 'inputFile' ) as f:
>     for line in f:
>         print( re.split( r'\s+', line.strip() ) )
>
> import re
> reWhitespace = re.compile( r'\s+' )
> with open( 'inputFile' ) as f:
>     for line in f:
>         print( reWhitespace.split( line.strip() ) )
>
> But for a large file, the second version will be faster because the
> regular expression is compiled only once instead of every time through
> the loop.

I am curious whether you have actually timed this or seen others' timings.
My impression (from other posts and from reading the code a year ago) is
that the current re implementation caches compiled re's
(roughly, cache[restring] = re.compile(restring)) just so that the first
example will *not* recompile every time thru the loop.  If so, I think one
should name an re for pretty much the same reasons as for anything else:
conceptual chunking and reuse in multiple places.
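One way to settle the question empirically is a quick timing run. The following is a minimal sketch in Python 3 using the standard `timeit` module; the sample lines, the function names and the `WHITESPACE` constant are hypothetical stand-ins for the quoted examples. The `is` check relies on a CPython implementation detail (the `re` module's internal pattern cache) and may behave differently on other interpreters. Timings depend on machine and Python version, so no figures are claimed here.

```python
import re
import timeit

# Hypothetical sample data standing in for the lines of 'inputFile'.
LINES = ["  alpha beta\tgamma  \n", "one   two three\n"] * 1000

def split_with_module_function(lines):
    # re.split() takes a pattern string; re looks it up in its internal
    # cache on each call instead of recompiling it (CPython behaviour).
    return [re.split(r'\s+', line.strip()) for line in lines]

# A named, precompiled pattern, as in the second quoted version.
WHITESPACE = re.compile(r'\s+')

def split_with_compiled_pattern(lines):
    # Calls the compiled pattern's split() method directly,
    # skipping the cache lookup entirely.
    return [WHITESPACE.split(line.strip()) for line in lines]

if __name__ == "__main__":
    # Both versions should produce identical results.
    assert split_with_module_function(LINES) == split_with_compiled_pattern(LINES)

    # In CPython, compiling the same pattern twice typically returns
    # the same cached object.
    print("cached:", re.compile(r'\s+') is re.compile(r'\s+'))

    for func in (split_with_module_function, split_with_compiled_pattern):
        seconds = timeit.timeit(lambda: func(LINES), number=50)
        print(f"{func.__name__}: {seconds:.3f} s")
```

If the cache works as described, one would expect the precompiled version to be only modestly faster, since the per-call overhead it saves is a cache lookup rather than a full recompilation.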

Terry J. Reedy