Hashed lookups for tabular data

Joseph L. Casale jcasale at activenetwerx.com
Mon Jan 19 12:09:05 EST 2015


> So presumably your data's small enough to fit into memory, right? If
> it isn't, going back to the database every time would be the best
> option. But if it is, can you simply keep three dictionaries in sync?

Hi Chris,
Yeah the data can fit in memory and hence the desire to avoid a trip here.

> row = (foo, bar, quux) # there could be duplicate quuxes but not foos or bars
> foo_dict = {}
> bar_dict = {}
> quux_dict = collections.defaultdict(set)
> 
> foo_dict[row[0]] = row
> bar_dict[row[1]] = row
> quux_dict[row[2]].add(row)

This is actually far simpler than I had started imagining, however the row data
is duplicated. I am hacking away at an attempt with references to one copy of
the row.

Its kind of hard to recreate an sql like object in Python with indexes and the
inherent programmability against a single copy of data.

I'll mock this up and see what it profiles like.
Thanks!
jlc


More information about the Python-list mailing list