regex over files

Skip Montanaro skip at pobox.com
Tue Apr 26 15:53:49 EDT 2005


    >> It's hard to imagine how sliding a small window onto a file within Python
    >> would be more efficient than the operating system's paging system. ;-)

    Robin> Well, it might be if I only want to scan forward through the file
    Robin> (think lexical analysis). Most lexical analyzers use a buffer and
    Robin> produce a stream of tokens, i.e. a compressed version of the
    Robin> input. There are problems crossing buffers etc., but we never
    Robin> normally need the whole file in memory.

If I mmap() a file, it's not slurped into main memory immediately, though as
you pointed out, it does count against my process's virtual address space.
As I access bits of the file's contents, the OS pages in only what's
necessary.  If I mmap() a huge file, then print out a few bytes from the
middle, only the page or pages containing the interesting bytes (plus
whatever the kernel chooses to read ahead) are actually brought into
physical memory.
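
A minimal sketch of that idea, assuming Python 3, where the re module
accepts an mmap object directly as long as the pattern is a bytes pattern.
The helper names find_first and peek are hypothetical, chosen only for
illustration.

```python
import mmap
import re


def find_first(path, pattern):
    """Return (offset, matched bytes) for the first match of a bytes
    pattern in the file at path, or None if nothing matches."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ keeps it read-only.
        # Mapping does not read the file; pages are faulted in on access.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            m = re.search(pattern, mm)
            if m is None:
                return None
            # Slicing an mmap returns a bytes copy of just that range.
            return m.start(), mm[m.start():m.end()]


def peek(path, offset, length):
    """Return length bytes starting at offset, touching only the pages
    that cover that range."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]
```

Something like find_first("big.log", rb"ERROR \d+") scans forward through
the mapping, so pages are brought in as the search reaches them, while
peek() on the middle of a large file should touch little more than the
pages holding the requested bytes.  Note that mmap() refuses to map an
empty file, so a real caller would want to check for that first.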

Skip


