Fastest database solution

Roger Binns rogerb at rogerbinns.com
Fri Feb 6 04:12:45 EST 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Curt Hash wrote:
> I started out using sqlite3, but was not satisfied with the performance
> results. I then tried using psycopg2 with a local postgresql server, and
> the performance got even worse. 

SQLite runs inside your own process.  PostgreSQL runs as a separate
server process, so every query has to be marshalled across to it and
incurs context switches.  That adds the overhead you observed.

> I don't think
> my code/queries are inherently slow, but I'm not a DBA or a very
> accomplished Python developer, so I could be wrong.

It doesn't sound like a database is the best solution to your problem
anyway.  A better approach would likely be to hash each line and keep
the hashes in a structure that gives quick lookups.  The lines would
need to be normalised before hashing so that, for example, the choice
of variable names does not change the hash.
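
As a rough sketch of that idea, using only the standard library: the
function names, the "ID" placeholder and the shape of the input (a dict
mapping file name to source text) are my own assumptions for
illustration, and the normalisation is deliberately crude.  It also
rewrites words inside strings and comments, which a real tool would
handle properly with a tokenizer.

```python
import hashlib
import keyword
import re
from collections import defaultdict

# Matches anything that looks like an identifier (or a keyword).
IDENTIFIER = re.compile(r"\b[A-Za-z_]\w*\b")


def normalise(line):
    """Replace every non-keyword identifier with "ID" and drop whitespace."""
    def replace(match):
        word = match.group(0)
        return word if keyword.iskeyword(word) else "ID"

    return "".join(IDENTIFIER.sub(replace, line).split())


def line_hash(line):
    """Hash the normalised form of a single source line."""
    return hashlib.sha1(normalise(line).encode("utf-8")).hexdigest()


def build_index(files):
    """Map each line hash to the set of (file name, line number) pairs."""
    index = defaultdict(set)
    for name, text in files.items():
        for number, line in enumerate(text.splitlines(), start=1):
            if line.strip():  # skip blank lines
                index[line_hash(line)].add((name, number))
    return index


def shared_lines(index):
    """Keep only the hashes that occur in more than one file."""
    return {
        digest: locations
        for digest, locations in index.items()
        if len({name for name, _ in locations}) > 1
    }
```

With this, "total = total + 1" in one file and "count=count+1" in
another hash to the same value, and finding the matches is a dictionary
lookup rather than a database query.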

There are already lots of plagiarism detectors out there, so it may be
more prudent to use one of them, or at least to learn how they work so
that your own system can improve on them.

Roger
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkmL/wgACgkQmOOfHg372QTAmACg0INMfUKA10Uc6UJwNhYhDeoV
EKwAoKpDMRzr7GzCKeYxn93TU69nDx4X
=4r01
-----END PGP SIGNATURE-----



