Large Amount of Data

Marc 'BlackJack' Rintsch bj_666 at gmx.net
Sat May 26 05:40:21 EDT 2007


In <damdnZoYo5SScMrbnZ2dnUVZ_jadnZ2d at comcast.com>, Jack wrote:

> I have tens of millions (could be more) of documents in files. Each of them
> has other properties in separate files. I need to check if they exist,
> update and merge properties, etc.
> And this is not a one time job. Because of the quantity of the files, I
> think querying and updating a database will take a long time...

But databases are built and optimized exactly to handle large amounts
of data.

> Let's say, I want to do something a search engine needs to do in terms
> of the amount of data to be processed on a server. I doubt any serious
> search engine would use a database for indexing and searching. A hash
> table is what I need, not powerful queries.

You are not forced to use complex queries, and an index is much like a
hash table, often even implemented as one.  And a database doesn't have
to be an SQL database: the `shelve` module and object DBs like ZODB or
Durus are databases too.
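
Untested, and the filename, key scheme and merge rule here are made up
just for illustration, but checking and merging properties with
`shelve` could look roughly like this:

import shelve

db = shelve.open('properties.db')      # hypothetical filename
doc_id = 'doc-0000042'                 # hypothetical key scheme
if doc_id in db:                       # existence check
    props = db[doc_id]
    props.update({'status': 'merged'}) # merge/update properties
    db[doc_id] = props                 # write back; a shelf only
                                       # persists values on assignment
else:
    db[doc_id] = {'status': 'new'}
db.close()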

Maybe you should try it and measure before claiming it's going to be too
slow and spending time implementing something like a database yourself.
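
Something along these lines would give a first rough number; `N` and
the filename are arbitrary:

import shelve
import time

N = 100000                             # arbitrary test size
db = shelve.open('bench.db')           # hypothetical filename

start = time.time()
for i in range(N):
    db['doc-%09d' % i] = {'size': i}   # store a small property dict
print('inserts: %.2fs' % (time.time() - start))

start = time.time()
for i in range(N):
    props = db['doc-%09d' % i]         # look the properties up again
print('lookups: %.2fs' % (time.time() - start))

db.close()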

Ciao,
	Marc 'BlackJack' Rintsch


