regex over files

Steve Holden steve at holdenweb.com
Tue Apr 26 05:36:07 EDT 2005


Robin Becker wrote:
> Richard Brodie wrote:
> 
>> "Robin Becker" <robin at reportlab.com> wrote in message
>> news:mailman.2469.1114444689.1799.python-list at python.org...
>>
>>> Gerald Klix wrote:
>>>
>>>> Map the file into RAM by using the mmap module.
>>>> The file's contents are then available as a searchable string.
>>>>
>>>
>>> That's a good idea, but I wonder if it actually saves on memory? I just
>>> tried regexing through a 25MB file and ended up with 40MB as working
>>> set (it rose linearly as the loop progressed through the file). Am I
>>> actually saving anything by not letting normal VM do its thing?
>>
>>
>>
>> You aren't saving memory in that sense, no. If you have any RAM spare,
>> the file will end up in it. However, if you are short on memory,
>> mmapping the file gives the VM the opportunity to discard pages from
>> the file instead of paging them out. Try again with a 25GB file and
>> watch the difference ;) YMMV.
>>
>>
> 
> :)
> 
> So we avoid dirty page writes and the like. However, I still think I
> could get away with a small window into the file, which would be more
> efficient.
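
For reference, the mmap approach discussed in the quoted messages might look
roughly like the sketch below. It assumes a bytes pattern and a non-empty
file (mmap raises ValueError on an empty one); the function name
find_all_mmap is made up for illustration.

```python
import mmap
import re


def find_all_mmap(path, pattern):
    """Return (start, end) offsets of every match of a bytes regex in a file.

    Hypothetical helper: the file is mapped read-only, so the OS can drop
    its pages under memory pressure instead of writing them to swap.
    """
    regex = re.compile(pattern)
    with open(path, "rb") as f:
        # Length 0 maps the whole file; this fails on an empty file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # The re module accepts any bytes-like buffer, including mmap.
            return [m.span() for m in regex.finditer(mm)]
```

Only the offsets are returned, so nothing keeps a reference to the mapped
data once the function exits.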

I seem to remember that the Medusa code contains a fairly good 
overlapped search for a terminator string, if you want to chunk the file.

Take a look at the handle_read() method of class async_chat in the 
standard library's asynchat.py.
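
To give a feel for the idea, here is a minimal sketch of an overlapped
search over fixed-size chunks. It is not the Medusa or asynchat code, just
an illustration of the same trick: keep the last len(needle) - 1 bytes of
each buffer so a match straddling a chunk boundary is still seen. The name
find_in_chunks and the default chunk size are assumptions of mine.

```python
def find_in_chunks(fileobj, needle, chunk_size=64 * 1024):
    """Yield absolute offsets of a bytes needle in a binary file object.

    Hypothetical helper: memory use is bounded by chunk_size plus
    len(needle), whatever the size of the file.
    """
    if not needle:
        raise ValueError("needle must be non-empty")
    keep = len(needle) - 1   # bytes to carry over between chunks
    tail = b""               # carried-over end of the previous buffer
    base = 0                 # file offset of the first byte of tail
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        buf = tail + chunk
        pos = buf.find(needle)
        while pos != -1:
            yield base + pos
            pos = buf.find(needle, pos + 1)
        # The tail is shorter than the needle, so it can never hold a
        # complete match and nothing is reported twice.
        tail = buf[-keep:] if keep else b""
        base += len(buf) - len(tail)
```

Used as `for offset in find_in_chunks(open(path, "rb"), b"\r\n\r\n"): ...`.
A regex with a known maximum match length could be handled the same way by
carrying over that many bytes less one.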

regards
  Steve
-- 
Steve Holden        +1 703 861 4237  +1 800 494 3119
Holden Web LLC             http://www.holdenweb.com/
Python Web Programming  http://pydish.holdenweb.com/
