Large Amount of Data

Jack nospam at invalid.com
Sat May 26 04:17:26 EDT 2007


I have tens of millions (could be more) of documents in files. Each of them
has other properties in separate files. I need to check whether they exist,
update and merge properties, etc. And this is not a one-time job. Because of
the quantity of the files, I think querying and updating a database will take
a long time...

Let's say I want to do something like what a search engine has to do in terms
of the amount of data to be processed on a server. I doubt any serious search
engine would use a database for indexing and searching. A hash table is what
I need, not powerful queries.
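
Roughly what I have in mind is an on-disk hash table, e.g. via the shelve
module, so the whole thing never has to fit in RAM. Just a sketch -- the
file name, key layout and property names here are made up:

    import shelve

    # Open (or create) an on-disk hash table; keys are document ids,
    # values are dicts of properties.  Only the records being touched
    # need to be in memory at any one time.
    index = shelve.open('doc_properties.db')

    def have_document(doc_id):
        # existence check without loading the whole index
        return doc_id in index

    def merge_properties(doc_id, new_props):
        # merge new_props into whatever is already stored for doc_id
        props = index.get(doc_id, {})
        props.update(new_props)
        index[doc_id] = props      # write the merged record back to disk

    # example: merge_properties('doc00042', {'lang': 'en', 'size': 1234})

    index.close()

shelve sits on top of dbm, so it behaves like a persistent dictionary;
whether it holds up at tens of millions of keys is something I would have
to test.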

"John Nagle" <nagle at animats.com> wrote in message 
news:nfR5i.4273$C96.1640 at newssvr23.news.prodigy.net...
> Jack wrote:
>> I need to process large amount of data. The data structure fits well
>> in a dictionary but the amount is large - close to or more than the size
>> of physical memory. I wonder what will happen if I try to load the data
>> into a dictionary. Will Python use swap memory or will it fail?
>>
>> Thanks.
>
>     What are you trying to do?  At one extreme, you're implementing 
> something
> like a search engine that needs gigabytes of bitmaps to do joins fast as
> hundreds of thousands of users hit the server, and need to talk seriously
> about 64-bit address space machines.  At the other, you have no idea how
> to either use a database or do sequential processing.  Tell us more.
>
> John Nagle 




