Is Python Suitable for Large Find & Replace Operations?
rbt
rbt at athop1.ath.vt.edu
Fri Jun 17 09:45:59 EDT 2005
On Fri, 2005-06-17 at 09:18 -0400, Peter Hansen wrote:
> rbt wrote:
> > The script is too long to post in its entirety. In short, I open the
> > files, do a binary read (in 1MB chunks for ease of memory usage) on them
> > before placing that read into a variable and that in turn into a list
> > that I then apply the following re to
> >
> > ss = re.compile(r'\b\d{3}-\d{2}-\d{4}\b')
> >
> > like this:
> >
> > for chunk in whole_file:
> >     search = ss.findall(chunk)
> >     if search:
> >         validate(search)
>
> This seems so obvious that I hesitate to ask, but is the above really a
> simplification of the real code, which actually handles the case of SSNs
> that lie over the boundary between chunks? In other words, what happens
> if the first chunk has only the first four digits of the SSN, and the
> rest lies in the second chunk?
>
> -Peter
No, that's a good question. As of now, there is nothing to handle the
scenario that you bring up. I have considered this possibility (rare but
possible). I have not written a solution for it. It's a very good point
though.
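
One possible way to handle the boundary case, as a minimal sketch and not the actual script: keep the last few bytes of each chunk and rescan them together with the next chunk. The function name find_ssns, the chunk_size parameter, and the SSN_LEN constant are hypothetical names chosen for illustration. The sketch assumes Python 3, so the pattern is a bytes pattern to match the binary read. A match that ends exactly at the end of the buffer is deferred until more data arrives, because the trailing \b cannot be decided yet.

```python
import io
import re

# Same pattern as in the post, as a bytes pattern because the files are read in binary mode.
SSN = re.compile(rb'\b\d{3}-\d{2}-\d{4}\b')
SSN_LEN = 11  # length of ddd-dd-dddd


def find_ssns(stream, chunk_size=1024 * 1024):
    """Yield SSN-shaped byte strings from a binary stream, including
    ones that straddle the boundary between two reads."""
    buf = b''
    pos = 0  # index in buf where the next search starts
    while True:
        chunk = stream.read(chunk_size)
        at_eof = not chunk
        buf += chunk
        for m in SSN.finditer(buf, pos):
            # A match touching the end of the buffer may continue into the
            # next chunk (e.g. another digit follows), so wait for more data.
            if m.end() == len(buf) and not at_eof:
                break
            yield m.group()
        if at_eof:
            return
        # Any match not yet reported must start in the last SSN_LEN bytes.
        search_from = max(len(buf) - SSN_LEN, 0)
        # Keep one extra byte before that point so the leading \b still
        # sees the character that precedes a candidate match.
        cut = max(search_from - 1, 0)
        buf = buf[cut:]
        pos = search_from - cut
```

With something like this, the loop in the quoted code could become "for ssn in find_ssns(open(path, 'rb')): validate([ssn])", where path is a placeholder. Only about a dozen bytes are carried between reads, so memory use stays at roughly one chunk.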
Is that not why proper software is engineered? Anyone can build a go
cart (write a program), but it takes a team of engineers and much
testing to build a car, no? Which would you rather be riding in during a
crash? I wish upper mgt had a better understanding of this ;)