Sorted and reversed on huge dict?

Paul Rubin
Fri Nov 3 16:59:39 EST 2006


vd12005 at yahoo.fr writes:
> but maybe if keys of dicts are not duplicated in memory it can be done
> (as all dicts will have the same keys, with different (count) values)?

There will still be a pointer for each key, but the strings themselves
won't be duplicated.
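A quick way to convince yourself of this (a minimal sketch, not from the original post): when the same string object is used as a key in several dicts, each dict stores a reference to the one object rather than a copy.

```python
# Use the same string object as a key in two dicts.
word = "frequency"
counts_a = {word: 1}
counts_b = {word: 2}

# Both dicts point at the very same string object in memory;
# only the 8-byte (on 64-bit) pointer is stored per dict entry.
key_a = next(iter(counts_a))
key_b = next(iter(counts_b))
print(key_a is key_b)  # True: one string, two references
```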

> memory is 4Gb of ram, 

That sounds like enough ram to hold all your stuff easily.

> is there a good way to know how much ram is used
> directly from python (or should i rely on 'top' and other unix
> commands?)

I'd try resource.getrusage().
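Something along these lines (a sketch; the resource module is a Unix-only part of the stdlib, and the units of ru_maxrss vary by platform):

```python
import resource  # Unix-only standard library module

# Resource usage for the current process.
usage = resource.getrusage(resource.RUSAGE_SELF)

# ru_maxrss is the peak resident set size: kilobytes on Linux,
# bytes on macOS -- check your platform's getrusage(2) man page.
print("peak RSS:", usage.ru_maxrss)
```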

> by now around 220mb is used for around 200.000 words handled in 15
> dicts)

That sounds very manageable given a 4gb machine.  Otherwise, since
it sounds like you're trying to scan some big text corpus and figure
out word frequencies, you could do it the old-fashioned way.  Write
the words one per line into a big file, then sort the file with the
Unix sort utility (which is an external sort, so it works even when
the data doesn't fit in memory), then read the sorted file (or pipe
it through "uniq -c") to figure out the counts.
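The pipeline above might be driven from Python like this (a sketch; the filename words.txt and the sample words are made up for illustration):

```python
import subprocess

# Write the words one per line, as they would come out of the scan.
with open("words.txt", "w") as f:
    for word in ["spam", "eggs", "spam", "ham", "spam"]:
        f.write(word + "\n")

# Equivalent of running: sort words.txt | uniq -c
# sort(1) is an external sort, so this scales past available RAM.
out = subprocess.run(
    "sort words.txt | uniq -c",
    shell=True, capture_output=True, text=True, check=True,
).stdout

# Each output line is "<count> <word>"; parse it back into a dict.
counts = {}
for line in out.splitlines():
    count, word = line.split()
    counts[word] = int(count)
print(counts)  # {'eggs': 1, 'ham': 1, 'spam': 3}
```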



More information about the Python-list mailing list