Is Python Suitable for Large Find & Replace Operations?

rbt rbt at athop1.ath.vt.edu
Fri Jun 17 09:45:59 EDT 2005


On Fri, 2005-06-17 at 09:18 -0400, Peter Hansen wrote:
> rbt wrote:
> > The script is too long to post in its entirety. In short, I open the
> > files, do a binary read (in 1MB chunks for ease of memory usage) on them
> > before placing that read into a variable and that in turn into a list
> > that I then apply the following re to
> > 
> > ss = re.compile(r'\b\d{3}-\d{2}-\d{4}\b')
> > 
> > like this:
> > 
> > for chunk in whole_file:
> >     search = ss.findall(chunk)
> >     if search:
> >         validate(search)
> 
> This seems so obvious that I hesitate to ask, but is the above really a 
> simplification of the real code, which actually handles the case of SSNs 
> that lie over the boundary between chunks?  In other words, what happens 
> if the first chunk has only the first four digits of the SSN, and the 
> rest lies in the second chunk?
> 
> -Peter

No, that's a good question. As of now, nothing handles the scenario you
bring up. I have considered it (rare, but possible) but have not written
a solution for it. It's a very good point though.
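
One way the boundary case might be handled is to carry the tail of each
chunk over into the next read, so an SSN split across two reads is still
seen whole. What follows is only a sketch, not part of the script
discussed above. The names find_ssns, SSN_LEN and chunk_size are made up
for illustration, and it assumes Python 3, where a binary read returns
bytes and so needs a bytes pattern.

```python
import io
import re

# Same pattern as above, written as a bytes pattern for binary reads.
SSN = re.compile(rb'\b\d{3}-\d{2}-\d{4}\b')
SSN_LEN = 11  # length of "ddd-dd-dddd"


def find_ssns(fileobj, chunk_size=1024 * 1024):
    """Yield (file_offset, matched_bytes) for each SSN-like string."""
    buf = b''
    base = 0  # file offset of buf[0]
    skip = 0  # matches starting before buf[skip] belong to an earlier round
    while True:
        chunk = fileobj.read(chunk_size)
        buf += chunk
        # Until EOF, defer matches that start in the last SSN_LEN bytes:
        # they may be incomplete, or the byte after them (needed for the
        # trailing \b) may not have been read yet.
        limit = len(buf) if not chunk else max(len(buf) - SSN_LEN, 0)
        for m in SSN.finditer(buf):
            if skip <= m.start() < limit:
                yield base + m.start(), m.group()
        if not chunk:
            break
        # Carry the deferred tail plus one guard byte in front of it, so
        # the leading \b still sees the real preceding character.
        keep_from = max(limit - 1, 0)
        base += keep_from
        skip = limit - keep_from
        buf = buf[keep_from:]
```

With something like this, the loop above could become
"for offset, ssn in find_ssns(open(path, 'rb')): validate([ssn])",
assuming validate() accepts a list as it does now. Only about a dozen
bytes are carried between reads, so memory use stays at roughly one
chunk.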

Is that not why proper software is engineered? Anyone can build a go
cart (write a program), but it takes a team of engineers and much
testing to build a car, no? Which would you rather be riding in during a
crash? I wish upper mgt had a better understanding of this ;)



