using sqlite3 - execute vs. executemany; committing ...

David wizzardx at gmail.com
Sun May 4 18:36:37 EDT 2008


> >
> Thank you very much for your comprehensive answer and detailed information,
> David; I really appreciate it!

You're welcome.

>
> As for the number of items, there would be approx. 34 000 calls of execute()
> in my present code, in the final version probably more; I think executemany
> is more efficient here, if there aren't any drawbacks.

executemany is probably a good idea here. If memory becomes a problem
at some point (eg: millions of lines) you'll probably want to use an
on-disk database (I suggest postgresql), and read in batches of, say,
1000 rows, saving each batch to the database with a single executemany
call.
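
Here's a rough sketch of the batching idea (the table and column
names are made up for illustration; adapt to your schema):

import sqlite3

conn = sqlite3.connect(":memory:")  # or an on-disk file
conn.execute("CREATE TABLE tags (name TEXT, value TEXT)")

def save_batch(conn, rows):
    # One executemany call per batch is much cheaper than one
    # execute call per row.
    conn.executemany("INSERT INTO tags (name, value) VALUES (?, ?)",
                     rows)

batch = []
for i in range(34000):  # stand-in for your parsing loop
    batch.append(("tag%d" % i, "value%d" % i))
    if len(batch) >= 1000:
        save_batch(conn, batch)
        batch = []
if batch:  # don't forget the final, partial batch
    save_batch(conn, batch)
conn.commit()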

>  I thought a database would perform much better in such cases (and the
> preliminary tests confirm this);

It sounds like your Python code has some serious (algorithm or data
structure) problems. Python code (with appropriate dictionaries,
caches, memoization, etc.) should always be able to perform better than
an in-memory SQL database (even one optimized with indexes, etc.). You
definitely need to check that. You don't even create any db indexes
(in the code you gave), but sqlite still performs better than your
structures, which do use dicts!

Basically what you're saying is that the equivalent (but with SQL
overhead) of a set of lists containing tuples, without any dicts for
lookup, performs better than your original structure.

Unless you actually need a database (too much data for memory, or you
need persistent data, possibly for sharing between apps) I suggest
sticking to Python structures, and optimizing your structures and
look-up algorithms.
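
To illustrate (the record fields here are invented), this is the kind
of dict "index" I mean:

# Records as (tag, offset, title) tuples:
records = [("intro", 0, "Introduction"), ("ch1", 120, "Chapter 1")]

# Build the lookup dict once, or maintain it as records are added:
by_tag = dict((rec[0], rec) for rec in records)

# Constant-time lookup:
print(by_tag["ch1"])    # ('ch1', 120, 'Chapter 1')

# versus the linear scan you pay on every query if you only keep
# lists of tuples:
print([r for r in records if r[0] == "ch1"][0])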

One good reason for you to use a database would be if maintaining
your dicts (for fast lookup) became too complicated and you wanted
the DBMS to keep indexes automatically updated for you.

> however now I see the possible risks too.
> I'll look into other possible approaches. There will clearly be a finite set
> of the column names, hence it is possible to define the db at the beginning
> and fill it with values only after parsing the texts; the problem is, this
> can't be done until the texts are available. I just thought a more generic
> approach would be more easily extensible and also a bit simpler.

If you don't know in advance what the fields are, then how does your
Python app work? Are all queries in the user interface (that refer to
this database) arbitrary and initiated by a human?

Also, how do you set up foreign relationships between tables, and
indexes (for performance), if you don't know what fields are going to
be present?

Maybe this would be clearer (to me) if you gave a short example of
the data, with a note or two to explain where there are performance
problems.

> the parsing of the text to get the tags with their values is done in the
> program itself

What kind of tags? From your description it sounds like you have
markers inside the text (basically bookmarks with arbitrary metadata,
hence your need for dynamic schema), which the user can query, so they
can quickly jump between parts of the text, based on their search
results. Is this about right?

> Just for information, what are valid table and column names in sqlite?
> (Unfortunately, I couldn't find any reference for that, maybe except the
> reserved sqlite_master); as quoted names, everything I tried worked fine,
> also whitespace, various unicode characters, etc.; of course, I can imagine,

I'm not sure about that.

If you want to be safe:

1) Add a prefix to your table and field names
2) 'normalise' all table and field names (eg: convert to lower case,
remove non-alphabetic characters, etc.).
3) Make sure that you don't get the same 'normalised' name for 2
different incoming strings (see the sketch after this list).
4) Be prepared to scrap the schema-generating approach if your app's
database requirements change (eg: you need to share the db with other
apps which have their own tables that you don't want to stomp over).
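
Something along these lines (the 'usr_' prefix and the exact
normalisation rule are just placeholders):

import re

seen = {}

def safe_name(name, prefix="usr_"):
    # Lower-case and strip everything except letters, digits and
    # underscores, then add a prefix to stay clear of anything
    # reserved.
    norm = prefix + re.sub(r"[^a-z0-9_]", "", name.lower())
    # Refuse two different incoming strings that collapse to the
    # same normalised name.
    other = seen.setdefault(norm, name)
    if other != name:
        raise ValueError("%r and %r both map to %r"
                         % (name, other, norm))
    return norm

print(safe_name("Page Count"))   # usr_pagecount
# safe_name("page-count") would now raise ValueError, since it
# also normalises to "usr_pagecount".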

David.


