Hashed lookups for tabular data

Kushal Kumaran kushal at locationd.net
Mon Jan 19 12:44:31 EST 2015


"Joseph L. Casale" <jcasale at activenetwerx.com> writes:

>> So presumably your data's small enough to fit into memory, right? If
>> it isn't, going back to the database every time would be the best
>> option. But if it is, can you simply keep three dictionaries in sync?
>
> Hi Chris,
> Yeah, the data can fit in memory, hence the desire to avoid a trip to
> the database here.
>
>> import collections
>>
>> row = (foo, bar, quux) # there could be duplicate quuxes but not foos or bars
>> foo_dict = {}
>> bar_dict = {}
>> quux_dict = collections.defaultdict(set)
>> 
>> foo_dict[row[0]] = row
>> bar_dict[row[1]] = row
>> quux_dict[row[2]].add(row)
>
> This is actually far simpler than I had started imagining; however, the
> row data is duplicated.  I am hacking away at an attempt that keeps
> references to a single copy of each row.
>
> It's kind of hard to recreate an SQL-like object in Python, with
> indexes and the inherent programmability, against a single copy of the
> data.
>
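
For what it's worth, the dictionaries in that sketch already hold
references to a single row tuple rather than copies; each assignment
just stores another pointer to the same object.  A quick check (the
sample values here are made up):

 import collections

 row = ('f1', 'b1', 'q1')
 foo_dict = {}
 bar_dict = {}
 quux_dict = collections.defaultdict(set)

 foo_dict[row[0]] = row
 bar_dict[row[1]] = row
 quux_dict[row[2]].add(row)

 # All three mappings point at the same tuple object; nothing is copied.
 print(foo_dict['f1'] is bar_dict['b1'])  # True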

That said, if you want a fully SQL-like interface, you can simply create
an in-memory sqlite3 database:

 import sqlite3
 db = sqlite3.connect(':memory:')

You can create indexes as you need, and query using SQL.  Later, if the
data grows too big to fit in memory, you can switch to an on-disk
database without significant changes to the code.
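
For example, a minimal sketch (the schema and sample values are made up
here to mirror the (foo, bar, quux) rows above):

 import sqlite3

 db = sqlite3.connect(':memory:')
 db.execute('CREATE TABLE rows (foo TEXT PRIMARY KEY, bar TEXT, quux TEXT)')
 db.execute('CREATE UNIQUE INDEX idx_bar ON rows (bar)')  # bars are unique
 db.execute('CREATE INDEX idx_quux ON rows (quux)')       # quuxes may repeat

 db.execute('INSERT INTO rows VALUES (?, ?, ?)', ('f1', 'b1', 'q1'))

 # Each lookup is an indexed query against the single stored copy.
 for row in db.execute('SELECT * FROM rows WHERE quux = ?', ('q1',)):
     print(row)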

-- 
regards,
kushal


