Is Python Suitable for Large Find & Replace Operations?

Peter Hansen peter at engcorp.com
Fri Jun 17 09:18:38 EDT 2005


rbt wrote:
> The script is too long to post in its entirety. In short, I open the
> files, do a binary read (in 1MB chunks for ease of memory usage) on them
> before placing that read into a variable and that in turn into a list
> that I then apply the following re to
> 
> ss = re.compile(r'\b\d{3}-\d{2}-\d{4}\b')
> 
> like this:
> 
> for chunk in whole_file:
>     search = ss.findall(chunk)
>     if search:
>         validate(search)

This seems so obvious that I hesitate to ask, but is the above a
simplification of the real code, and does the real code handle SSNs
that straddle the boundary between chunks?  In other words, what happens
if the first chunk ends with only the first four digits of an SSN and
the rest lies at the start of the second chunk?
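
For what it's worth, here is a rough sketch of one way to cope with
that, not a drop-in fix for your script.  It carries the tail of each
buffer over into the next search, and holds back any match that touches
the end of the buffer until the following byte is known.  It assumes
Python 3 with chunks read as bytes (hence the rb'' pattern); find_ssns
and MATCH_LEN are names made up for this example, and your validate()
step is left out.

```python
import re

# Same pattern as in the quoted code, as a bytes pattern for binary reads.
SSN_RE = re.compile(rb'\b\d{3}-\d{2}-\d{4}\b')
MATCH_LEN = 11  # every match of SSN_RE is exactly 11 bytes long


def find_ssns(chunks):
    """Yield SSN-like byte strings from an iterable of bytes chunks,
    including ones that straddle a chunk boundary."""
    buf = b''
    pos = 0  # index in buf from which unreported matches may start
    for chunk in chunks:
        buf += chunk
        # A match is only trusted if at least one byte follows it in buf;
        # otherwise the trailing \b could be an artifact of the chunk end.
        limit = len(buf) - MATCH_LEN
        for m in SSN_RE.finditer(buf, pos):
            if m.start() >= limit:
                break  # too close to the end, decide after the next chunk
            yield m.group()
        if limit > pos:
            # Keep the undecided tail plus one byte of context before it,
            # so the leading \b is checked against the real preceding byte.
            buf = buf[limit - 1:]
            pos = 1
    # No more data: whatever still matches at the end is a real match.
    for m in SSN_RE.finditer(buf, pos):
        yield m.group()


chunks = [b'id 123-4', b'5-6789 and 987-65-4321', b'0 end 111-22-3333']
found = list(find_ssns(chunks))
# found == [b'123-45-6789', b'111-22-3333']
```

Searching each chunk on its own, as in the loop you posted, would miss
123-45-6789 and would wrongly report 987-65-4321, which is really the
start of a longer digit run.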

-Peter
