Most efficient way to build very large dictionaries

Roger Binns rogerb at rogerbinns.com
Wed Dec 24 03:50:49 EST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

python at bdurham.com wrote:
> Thank you for your suggestion about looking at SQLite. I haven't
> compared the performance of SQLite to Python dictionaries, but I'm
> skeptical that SQLite would be faster than in-memory Python dictionaries
> for the type of analysis I'm doing.

I'd recommend at least trying a test just to see.  As an example SQLite
uses indices which will be faster than Python dicts for some set
operations.  (And if you aren't careful, your various Python based
optimizations will end up duplicating what SQLite does internally anyway :-)

> Prior to my use of Python, my
> customer used a very expensive Oracle system to analyze their log files.
> My simple Python scripts are 4-20x faster than the Oracle PL/SQL they
> are replacing - and run on much cheaper hardware.

SQLite is not like Oracle or any similar database system.  It does not
operate over the network or similar connection. It is a library in your
process that has an optimized disk storage format (single file) and a
SQL parser that generates bytecode for a special purpose virtual machine
in pretty much the same way CPython operates.  The performance
improvements you are seeing with Python over Oracle are exactly the same
range people see with SQLite over Oracle.  One common usage reported on
the SQLite mailing list is people copying data out of Oracle and running
their analysis in SQLite because of the performance advantages.

> Note: Memory is currently not a concern for me so I don't need SQLite's
> ability to work with data sets larger than my physical memory.

The pragmas tune things like cache sizes.  The SQLite default is 2MB,
relying on the operating system for caching beyond that.  Bumping up
that kind of size was my suggestion :-)

Roger

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAklR9+UACgkQmOOfHg372QSMbwCdGS5S2/96fWW8knjfBVqReAfV
AEwAn2Yc+L9BEZgT69OjwtyqxLtifVpU
=mPfy
-----END PGP SIGNATURE-----




More information about the Python-list mailing list