Is Python Suitable for Large Find & Replace Operations?
Gilles Lenfant
gilles.lenfant at nospam.com
Tue Jun 14 13:51:57 EDT 2005
rbt wrote:
> Here's the scenario:
>
> You have many hundred gigabytes of data... possibly even a terabyte or
> two. Within this data, you have private, sensitive information (US
> social security numbers) about your company's clients. Your company has
> generated its own unique ID numbers to replace the social security numbers.
>
> Now, management would like the IT guys to go thru the old data and
> replace as many SSNs with the new ID numbers as possible. You have a tab
> delimited txt file that maps the SSNs to the new ID numbers. There are
> 500,000 of these number pairs. What is the most efficient way to
> approach this? I have done small-scale find and replace programs before,
> but the scale of this is larger than what I'm accustomed to.
>
> Any suggestions on how to approach this are much appreciated.
Is this huge amount of data to search/replace stored in an RDBMS or in
flat file(s) with markup (XML, CSV, ...)?
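
If it turns out to be flat text files, one possible approach is sketched
below. This is only a rough illustration, not a tested solution for your
data. It assumes the SSNs appear in the dashed form 123-45-6789, that the
500,000 pairs from the tab-delimited file fit in memory as a dict, and that
each file can be processed line by line. The names load_mapping,
replace_ssns and process_stream are made up for the example.

```python
import re

# Assumption: SSNs are written in the dashed form 123-45-6789.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def load_mapping(lines):
    """Build an SSN -> new ID dict from tab-delimited lines."""
    mapping = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        ssn, new_id = line.split("\t", 1)
        mapping[ssn] = new_id
    return mapping


def replace_ssns(line, mapping):
    """Replace every mapped SSN in one line; unmapped ones stay unchanged."""
    return SSN_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), line)


def process_stream(src, dst, mapping):
    """Copy src to dst line by line, so memory use does not grow with file size."""
    for line in src:
        dst.write(replace_ssns(line, mapping))
```

With real files, src and dst would be two open file objects, and the
mapping would be loaded once and reused for every file. A single regex pass
with a dict lookup per match avoids scanning the data once per number pair.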
--
Gilles