regex over files
Steve Holden
steve at holdenweb.com
Tue Apr 26 05:36:07 EDT 2005
Robin Becker wrote:
> Richard Brodie wrote:
>
>> "Robin Becker" <robin at reportlab.com> wrote in message
>> news:mailman.2469.1114444689.1799.python-list at python.org...
>>
>>> Gerald Klix wrote:
>>>
>>>> Map the file into RAM by using the mmap module.
>>>> The file's contents are then available as a searchable string.
>>>>
>>>
>>> that's a good idea, but I wonder if it actually saves on memory? I just
>>> tried regexing through a 25Mb file and ended up with 40Mb as working set
>>> (it rose linearly as the loop progressed through the file). Am I actually
>>> saving anything by not letting normal vm do its thing?
>>
>>
>>
>> You aren't saving memory in that sense, no. If you have any RAM spare the
>> file will end up in it. However, if you are short on memory, mmapping the
>> file gives the VM the opportunity to discard pages from the file, instead
>> of paging them out. Try again with a 25Gb file and watch the difference ;)
>> YMMV.
>>
>>
>
> :)
>
> So we avoid dirty page writes etc etc. However, I still think I could
> get away with a small window into the file which would be more efficient.
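
For reference, the mmap suggestion quoted above might look something like
the following. This is only a minimal sketch: the function name
find_all_mmap is made up for illustration, the pattern is assumed to be a
bytes pattern, and the file is assumed to be non-empty (mmap refuses to map
a zero-length file).

```python
import mmap
import re


def find_all_mmap(path, pattern):
    """Return (start, end) offsets of every match of a bytes pattern.

    find_all_mmap is a hypothetical helper, not part of any library.
    """
    regex = re.compile(pattern)
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ keeps it read-only,
        # so the OS can simply drop pages instead of writing them out.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # The re module accepts any bytes-like object, mmap included.
            return [m.span() for m in regex.finditer(mm)]
```

The working set will still grow as the scan touches more pages, as Robin
observed; the mapping only changes what the VM can do with those pages
under memory pressure.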
I seem to remember that the Medusa code contains a fairly good
overlapped search for a terminator string, if you want to chunk the file.
Take a look at the handle_read() method of class async_chat in the
standard library's asynchat.py.
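
The general idea of an overlapped chunked search can be sketched roughly as
below. This is not the asynchat code, just an illustration of the technique
under my own assumptions: find_terminator and chunk_size are made-up names,
the terminator is a fixed, non-empty bytes string, and the file object is
opened in binary mode. The trick is to carry the last len(terminator) - 1
bytes of each window forward, so a terminator split across two reads is
still found.

```python
def find_terminator(f, terminator, chunk_size=64 * 1024):
    """Return the offset of the first terminator in binary file object f,
    counted from where reading started, or -1 if it never occurs.

    find_terminator is a hypothetical helper written for illustration.
    """
    if not terminator:
        raise ValueError("terminator must be non-empty")
    keep = len(terminator) - 1  # bytes that could begin a split match
    tail = b""                  # carried over from the previous window
    offset = 0                  # offset of tail[0] from the start
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            return -1
        window = tail + chunk
        pos = window.find(terminator)
        if pos != -1:
            return offset + pos
        # Discard everything except the last `keep` bytes of the window.
        cut = max(len(window) - keep, 0)
        tail = window[cut:]
        offset += cut
```

Memory use stays at roughly one chunk plus the overlap, whatever the file
size. For a regex rather than a fixed string, the same approach only works
if the overlap is at least one byte shorter than the longest possible
match, which means the pattern must have a bounded match length.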
regards
Steve
--
Steve Holden +1 703 861 4237 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/
Python Web Programming http://pydish.holdenweb.com/