[Tutor] read file and match string

Kalle Svensson kalle@gnupung.net
Thu, 31 Jan 2002 23:59:40 +0100


[Jim Ragsdale]
> import mmap, re
> def search(filename, rx):
>     f = open(filename, 'r+')
>     mem = mmap.mmap(f.fileno(), 0)
>     for match in rx.finditer(mem):
>         print match.group(0)
>     mem.close()
>     f.close()
[...]
> Can someone explain the top snippet to me? looks like a function
> that takes a filename argument and what is the rx?

A regular expression object, like re.compile("foo").

> Is this what is needed for what i am doing or is it slightly
> different?

I believe it's slightly different.  The regular expression in the new
function should match the whole line, not just the search string.  If you had
p = re.compile("foo")
you want
rx = re.compile(".*foo.*")
now (I think).
Also, it prints the results to standard output, instead of writing them
to a result file.
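
A small sketch of that difference, written in Python 3 syntax (an
assumption; the thread itself is Python 2.x).  The names whole_lines and
write_whole_lines are made up for illustration.  Since "." does not match
a newline by default, ".*foo.*" picks up each whole line containing the
word, and the second helper writes those lines to any file-like object
instead of printing them:

```python
import re

def whole_lines(text, word):
    # ".*" stops at newlines, so every match is one complete line.
    rx = re.compile(".*" + re.escape(word) + ".*")
    return [m.group(0) for m in rx.finditer(text)]

def write_whole_lines(text, word, out):
    # 'out' is any object with a write() method, e.g. an open result file.
    for line in whole_lines(text, word):
        out.write(line + "\n")
```

With re.compile("foo") alone, match.group(0) would only ever be the
string "foo" itself, not the surrounding line.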

> The mem line looks like it opens the file like xreadlines.

The mem line maps the file into memory, thereby making access to it
faster.  It might be a bad idea if your file is very large, say as
large as your RAM.
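
For reference, a hedged sketch of the same mmap search in Python 3 syntax
(an assumption; the quoted code is Python 2.x).  The name search_mapped is
hypothetical.  It maps the file read-only, so the file does not have to be
opened with 'r+', and the pattern has to be a bytes pattern because the
mapping exposes bytes.  Note that mapping an empty file with length 0
raises ValueError:

```python
import mmap
import re

def search_mapped(filename, pattern):
    # 'pattern' must be bytes, e.g. rb".*foo.*", since mmap holds bytes.
    rx = re.compile(pattern)
    with open(filename, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ makes it read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mem:
            # group(0) copies the matched bytes out of the mapping.
            return [m.group(0) for m in rx.finditer(mem)]
```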

> Any help would be appreciated. Thanks!

If the string you're searching for is simple, it might be faster to
use the string find method instead of regular expressions.  Also, if
you're using an old version of Python (1.5.2 or 2.0), try upgrading to
2.1.2 or 2.2.  I think the file reading stuff (with xreadlines, like
you used first) has been optimized a bit in those newer versions.
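
A rough sketch of the plain string search, again in Python 3 syntax (an
assumption) and with a made-up name, find_lines.  It reads the file line
by line and uses the find method, which returns -1 when the substring is
absent:

```python
def find_lines(filename, needle):
    hits = []
    with open(filename) as f:
        # Iterating over the file reads one line at a time.
        for line in f:
            if line.find(needle) != -1:
                hits.append(line.rstrip("\n"))
    return hits
```

Whether this is actually faster than the regular expression depends on
the file and the pattern, so it is worth timing both.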

Also, a warning:  I don't use mmap or re very much, and might be
totally wrong.  I hope somebody will correct me in that case.

Peace,
  Kalle
-- 
Kalle Svensson (kalle@gnupung.net) - Laziness, impatience, hubris: Pick two!
English: http://www.gnupung.net/  Svenska: http://www.lysator.liu.se/~kalle/
Stuff: ["http://www.%s.org/" % x for x in "gnu debian python emacs".split()]