Is Python Suitable for Large Find & Replace Operations?
Peter Hansen
peter at engcorp.com
Fri Jun 17 09:18:38 EDT 2005
rbt wrote:
> The script is too long to post in its entirety. In short, I open the
> files, do a binary read (in 1MB chunks for ease of memory usage) on them
> before placing that read into a variable and that in turn into a list
> that I then apply the following re to
>
> ss = re.compile(r'\b\d{3}-\d{2}-\d{4}\b')
>
> like this:
>
> for chunk in whole_file:
>     search = ss.findall(chunk)
>     if search:
>         validate(search)
This seems so obvious that I hesitate to ask, but is the above really a
simplification of the real code, which actually handles the case of SSNs
that lie over the boundary between chunks? In other words, what happens
if the first chunk has only the first four digits of the SSN, and the
rest lies in the second chunk?
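
For illustration, here is a minimal sketch of one way to catch matches that
straddle a chunk boundary: carry the last few bytes of each chunk over into
the next search. It is not taken from the real script. It assumes Python 3
(bytes patterns), a match that is always exactly 11 bytes long, and the
helper names find_ssns and read_chunks, which are made up for this example.

```python
import re

# Same pattern as in the quoted post, compiled as a bytes pattern because
# the files are read in binary mode.
SSN = re.compile(rb'\b\d{3}-\d{2}-\d{4}\b')

MATCH_LEN = 11          # len(b'123-45-6789')
KEEP = MATCH_LEN + 1    # one extra byte of left context for the leading \b


def find_ssns(chunks):
    """Yield SSN-shaped byte strings from an iterable of bytes chunks,
    including ones that straddle a chunk boundary (hypothetical helper)."""
    # Start with a non-word sentinel byte so that buf[0] is always
    # "context only" and never the first byte of a reportable match.
    buf = b'\n'
    for chunk in chunks:
        buf += chunk
        for m in SSN.finditer(buf):
            if m.start() == 0:
                # Left context was cut off; a genuine match here was
                # already reported on the previous pass.
                continue
            if m.end() == len(buf):
                # The next chunk might continue with another digit, so
                # the trailing \b cannot be trusted yet. Defer it.
                continue
            yield m.group()
        # Carry over enough bytes to hold a deferred match plus one
        # byte of context in front of it.
        buf = buf[-KEEP:]
    # End of input: a match that ends exactly at the end is now final.
    for m in SSN.finditer(buf):
        if m.start() > 0:
            yield m.group()


def read_chunks(path, size=1024 * 1024):
    """Yield successive binary chunks of a file (hypothetical helper)."""
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(size)
            if not chunk:
                break
            yield chunk
```

Something like validate(list(find_ssns(read_chunks(path)))) would then see
each candidate once, whether or not it crossed a 1MB boundary. The two skip
rules are there because \b also matches at the very start and end of the
buffer, which would otherwise produce duplicates or false hits on longer
digit runs.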
-Peter