Scalable python dict {'key_is_a_string': [count, some_val]}

Arnaud Delobelle arnodel at googlemail.com
Sat Feb 20 06:03:47 EST 2010


On 20 Feb, 06:36, krishna <krishna.k.0... at gmail.com> wrote:
> I have to manage a couple of dicts holding a huge dataset (larger than
> fits in the memory on my system). Each key is a string (actually a
> tuple converted to a string) and each value is a two-item list, one
> element of which is a count related to the key. At the end I have to
> sort the dictionary by that count.
>
> The platform is Linux. I am planning to implement it by setting a
> threshold beyond which I write the data into files (3 columns: 'key
> count some_val') and later merging those files: I plan to sort the
> individual files by the key column, walk through them with one
> pointer per file, and add up the counts when entries from two files
> match by key. Sorting would be done with the 'sort' command, so that
> command is the bottleneck.
>
> Any suggestions, comments?
>
> By the way, is there a Linux command that does the merging part?
>
> Thanks,
> Krishna

Have you looked here? http://docs.python.org/library/persistence.html
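
One of the modules listed on that page is sqlite3, which keeps the
data on disk and can do the final ordering by count itself. A minimal
sketch, not a tuned implementation: the table name `counts`, the column
names `k`, `cnt`, `val` and the helpers `open_store`, `add` and
`by_count` are all made up for illustration, and overwriting some_val
on every update is an assumption, since the post does not say how
some_val should be combined.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the store; pass a file name to keep it on disk."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS counts ("
        "k TEXT PRIMARY KEY, cnt INTEGER NOT NULL, val TEXT)"
    )
    return conn


def add(conn, key, n=1, some_val=None):
    """Add n to the count for key, creating the row if it is missing.

    Assumption: the latest some_val replaces the stored one.
    """
    cur = conn.execute(
        "UPDATE counts SET cnt = cnt + ?, val = ? WHERE k = ?",
        (n, some_val, key),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO counts (k, cnt, val) VALUES (?, ?, ?)",
            (key, n, some_val),
        )


def by_count(conn):
    """Iterate over (key, count, some_val) rows, highest count first."""
    return conn.execute(
        "SELECT k, cnt, val FROM counts ORDER BY cnt DESC, k"
    )
```

Calling conn.commit() every few thousand updates rather than after
each one should keep the write cost down; how often is something to
measure on the real data.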

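If you stay with the spill-to-files plan instead, `sort -m` merges
files that are already sorted, but it will not add up the counts of
matching keys. The merge step you describe can be sketched in Python
with heapq.merge, which keeps one open iterator per file. This is only
a sketch under assumptions: the helper names `spill`, `read_run` and
`merge_runs` are made up, the columns are tab-separated, keys and
some_val are strings containing no tab or newline, and keeping the
some_val of the first matching row is a placeholder for whatever
combining rule you actually need.

```python
import heapq
import os
import tempfile
from itertools import groupby
from operator import itemgetter


def spill(counts, directory):
    """Write one in-memory dict {key: [count, some_val]} to a run file.

    Lines are 'key<TAB>count<TAB>some_val', sorted by key.
    """
    fd, path = tempfile.mkstemp(suffix=".run", dir=directory, text=True)
    with os.fdopen(fd, "w") as f:
        for key in sorted(counts):
            count, some_val = counts[key]
            f.write("%s\t%d\t%s\n" % (key, count, some_val))
    return path


def read_run(path):
    """Yield (key, count, some_val) tuples from one run file, in file order."""
    with open(path) as f:
        for line in f:
            key, count, some_val = line.rstrip("\n").split("\t")
            yield key, int(count), some_val


def merge_runs(paths):
    """Yield (key, total_count, some_val) in key order across all runs.

    Counts of equal keys are summed; the some_val of the first row in
    each group is kept (an assumption, see above).
    """
    merged = heapq.merge(*(read_run(p) for p in paths))
    for key, group in groupby(merged, key=itemgetter(0)):
        rows = list(group)
        yield key, sum(row[1] for row in rows), rows[0][2]
```

The merged rows can be written back out in the same three-column form,
so only the final ordering by count is left to the 'sort' command.
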
--
Arnaud


