Fastest database solution
M.-A. Lemburg
mal at egenix.com
Fri Feb 6 07:19:55 EST 2009
On 2009-02-06 09:10, Curt Hash wrote:
> I'm writing a small application for detecting source code plagiarism that
> currently relies on a database to store lines of code.
>
> The application has two primary functions: adding a new file to the database
> and comparing a file to those that are already stored in the database.
>
> I started out using sqlite3, but was not satisfied with the performance
> results. I then tried using psycopg2 with a local postgresql server, and the
> performance got even worse. My simple benchmarks show that sqlite3 is an
> average of 3.5 times faster at inserting a file, and on average less than a
> tenth of a second slower than psycopg2 at matching a file.
>
> I expected postgresql to be a lot faster ... is there some peculiarity in
> psycopg2 that could be causing slowdown? Are these performance results
> typical? Any suggestions on what to try from here? I don't think my
> code/queries are inherently slow, but I'm not a DBA or a very accomplished
> Python developer, so I could be wrong.
>
> Any advice is appreciated.
In general, if you do bulk inserts into a large table, you should consider
turning off indexing on the table and recreating or updating the indexes in
one go afterwards.
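
As a rough illustration of the point above, here is a minimal sketch using
sqlite3, where "turning off indexing" is approximated by dropping the index
before the bulk insert and recreating it afterwards. The table name (lines),
the index name (idx_lines_text) and the column layout are assumptions made up
for this example, not taken from the original application.

```python
import sqlite3

def bulk_insert_lines(conn, rows):
    """Insert (file_id, line_no, text) rows without the index, then rebuild it.

    Table and index names are hypothetical, chosen for this sketch only.
    """
    with conn:  # commits on success, rolls back on error
        # Drop the index so it is not updated once per inserted row.
        conn.execute("DROP INDEX IF EXISTS idx_lines_text")
        conn.executemany(
            "INSERT INTO lines (file_id, line_no, text) VALUES (?, ?, ?)",
            rows,
        )
        # Rebuild the index in one pass over the finished table.
        conn.execute("CREATE INDEX idx_lines_text ON lines (text)")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lines (file_id INTEGER, line_no INTEGER, text TEXT)")
conn.execute("CREATE INDEX idx_lines_text ON lines (text)")

rows = [(1, n, "line %d" % n) for n in range(1000)]
bulk_insert_lines(conn, rows)
```

Whether this actually helps depends on the table size and the database, so it
is worth timing both variants on real data.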
But regardless of this detail, I think you should consider a
filesystem-based approach. This is going to be a lot faster than using a
database to store the source code line by line. You can still use
a database for the administration and indexing of the data, e.g.
by storing a hash of each line in the database.
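
One possible shape for this, as a sketch only: the files themselves are
written to a directory, and the database holds just one hash per line plus
the path of the file it came from. The function names (line_hashes, add_file,
match_file), the line_hashes table, the choice of SHA-1 and the rule of
stripping whitespace and skipping blank lines are all assumptions made for
this example.

```python
import hashlib
import os
import sqlite3
import tempfile

def line_hashes(text):
    """Return one SHA-1 hex digest per non-blank line (whitespace stripped)."""
    return [
        hashlib.sha1(line.strip().encode("utf-8")).hexdigest()
        for line in text.splitlines()
        if line.strip()
    ]

def add_file(conn, store_dir, name, text):
    """Store the file on disk and index only its line hashes in the database."""
    path = os.path.join(store_dir, name)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with conn:
        conn.executemany(
            "INSERT INTO line_hashes (hash, path) VALUES (?, ?)",
            [(h, path) for h in line_hashes(text)],
        )

def match_file(conn, text):
    """Map each stored path to the number of distinct lines it shares with text."""
    counts = {}
    for h in set(line_hashes(text)):
        query = "SELECT DISTINCT path FROM line_hashes WHERE hash = ?"
        for (path,) in conn.execute(query, (h,)):
            counts[path] = counts.get(path, 0) + 1
    return counts

store_dir = tempfile.mkdtemp()
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE line_hashes (hash TEXT, path TEXT)")
conn.execute("CREATE INDEX idx_line_hashes ON line_hashes (hash)")

add_file(conn, store_dir, "a.py", "x = 1\ny = 2\n")
add_file(conn, store_dir, "b.py", "z = 3\n")
matches = match_file(conn, "y = 2\nprint(y)\n")
```

Here matches maps the stored path of a.py to the number of lines it has in
common with the submitted text; the full source only needs to be read from
disk for files that look suspicious.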
--
Marc-Andre Lemburg
eGenix.com