regex over files

Robin Becker robin at reportlab.com
Mon Apr 25 11:55:57 EDT 2005


Gerald Klix wrote:
> Map the file into RAM by using the mmap module.
> The file's contents are then available as a searchable string.
> 

That's a good idea, but I wonder if it actually saves memory. I just tried 
regexing through a 25 MB file and ended up with a 40 MB working set (it rose 
linearly as the loop progressed through the file). Am I actually saving anything 
by not letting normal VM do its thing?
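
For reference, a minimal sketch of the mmap approach Gerald describes, written 
for current Python 3 rather than the interpreter available at the time. The 
function name find_separator_spans and the idea of returning only offsets are 
illustrative assumptions, not something from this thread. Pages touched through 
the mapping are file-backed, so they tend to show up in the working set as the 
scan advances even though the process never copies the file into a string.

```python
import mmap
import re


def find_separator_spans(path, pattern):
    """Return (start, end) byte offsets of every match of `pattern` in the file.

    Hypothetical helper: only offsets are returned, so no slice of the file
    is copied into a Python bytes object.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file; mmap raises ValueError for an empty file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # The pattern must be a bytes pattern because mmap exposes bytes.
            return [m.span() for m in re.finditer(pattern, mm)]
```

The spans can then drive seek/read calls on the file to extract each piece 
separately.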

> HTH,
> Gerald
> 
> Robin Becker schrieb:
> 
>> Is there any way to get regexes to work on non-string/unicode objects? 
>> I would like to split large files by regex and it seems relatively 
>> hard to do so without having the whole file in memory. Even with 
>> buffers it seems hard to get regexes to indicate that they failed 
>> because of buffer termination and getting a partial match to be 
>> resumable seems out of the question.
>>
>> What interface does re actually need for its src objects?
> 
> 
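
The buffered alternative raised in the original question can be approximated 
if one is willing to assume an upper bound on the separator length. The sketch 
below is only that: split_stream, chunk_size and max_sep_len are hypothetical 
names, and it assumes the pattern never matches the empty string, never matches 
more than max_sep_len bytes, and uses no anchors or lookaround that depend on 
text outside that window. re itself offers no way to report "failed because the 
buffer ended", so the code simply distrusts any match that ends too close to 
the end of the buffer until more data has arrived.

```python
import re


def split_stream(stream, pattern, chunk_size=64 * 1024, max_sep_len=1024):
    """Yield the pieces of a binary stream separated by `pattern`.

    Assumption: no separator match is longer than `max_sep_len` bytes.
    """
    regex = re.compile(pattern)
    buf = b""
    eof = False
    while not eof:
        chunk = stream.read(chunk_size)
        eof = not chunk
        buf += chunk
        # Before EOF, accept only matches that end ahead of the holdback margin;
        # anything closer to the end might change once more data is appended.
        limit = len(buf) if eof else len(buf) - max_sep_len
        start = 0
        for m in regex.finditer(buf):
            if m.end() > limit:
                break
            yield buf[start:m.start()]
            start = m.end()
        # Carry the unconsumed tail over to the next round.
        buf = buf[start:]
    # Whatever follows the last separator is the final piece.
    yield buf
```

The carried-over tail is rescanned on every round, so a very long stretch 
without a separator still grows the buffer and costs repeated work; this trades 
simplicity for efficiency.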


-- 
Robin Becker



