using sqlite3 - execute vs. executemany; committing ...

Mon May 5 08:59:30 EDT 2008

>
>
> Thanks for further comments David,

You are right, the choice of an appropriate data structure isn't easy; I
described the requirements in an earlier post:
http://mail.python.org/pipermail/python-list/2007-December/469506.html

Here is a slightly modified sample of the text source I currently use in my
present (ad hoc and quite dirty) code:

<KC 12><KNJ 13><VCC 0><VN rn_1_vers_1><FN 21rb><b>Wi den L...n
<VN rn_2_vers_1>m...k gein den P<u>ra</u>gern.</b>

<VNJ 1><VCC 1><VN 1>Wen dy h<u>er</u> czu T... vf dy vient
<VNJ 2><VCC 2><VN 2>w<u>er</u>n gesampt czu h<u>er</u>tin strit,
<VNJ 3><VN 3>dez m...s vs gein obint
<VNJ 4><VCC 3><VN 4>W... stunt s...t.
<VNJ 5><VCC 3><VN 5>Doch gar Sthir czu em kam,
<VNJ 6><VCC 5><VN 6>h...g W... sich ken em nom.
<VNJ 7><VCC 6><VN 7>Ez irgink ubil den L...:
<VNJ 8><VCC 7><VN 8>S... slug W...u<u>m</u> daz houbt ab vil gern.
<VNJ 9><VN 9>Den P<u>ra</u>g<u>er</u>n mochtin dy L...n<u>er</u> nit intken,
<VNJ 10><VCC 8><VN 10>si slugin si, das si plutis hin runnen.

<VNJ 00><VCC 9><VN _rn_1_vers_11><b>Wi de<u>m</u> Stracka geschach,
<VN rn_2_vers_11>du er do heim sin husf<u>ro</u>win sach.</b>

<VNJ 11><VCC 9><VN 11>Aus allin L...t<u>er</u>n ein<u>er</u> hin quom,
<VNJ 12><VCC 10><VN 12><FN 21va>der waz S...<u>ra</u>ka gna<u>n</u>t mit
de<u>m</u> nom.
<VNJ 13><VCC 11><VN 13>Em ein vetil den rat gab, ...

========================================
The corresponding data stored as a dict  look like the following dict (here
sorted, at least on the highest level):
(<u>, <i>, <b> ... are ignored here, these are used for the text formatting
for the gui and aren't retrieved any more)

{0: {u'KC': u'12', u'VCC': u'0', u'KNJ': u'13', u'FN': u'21rb', u'VN':
u'rn_1_vers_1'},
13: {u'KC': u'12', u'VCC': u'0', u'KNJ': u'13', u'VN': u'rn_2_vers_1',
u'FN': u'21rb'},
39: {u'KC': u'12', u'VCC': u'1', u'KNJ': u'13', u'VN': u'1', u'VNJ': u'1',
u'FN': u'21rb'},
71: {u'KC': u'12', u'VCC': u'2', u'KNJ': u'13', u'FN': u'21rb', u'VNJ':
u'2', u'VN': u'2'},
102: {u'KC': u'12', u'VCC': u'2', u'KNJ': u'13', u'FN': u'21rb', u'VNJ':
u'3', u'VN': u'3'},
126: {u'KC': u'12', u'VCC': u'3', u'KNJ': u'13', u'VN': u'4', u'VNJ': u'4',
u'FN': u'21rb'},
144: {u'KC': u'12', u'VCC': u'3', u'KNJ': u'13', u'FN': u'21rb', u'VNJ':
u'5', u'VN': u'5'},
171: {u'KC': u'12', u'VCC': u'5', u'KNJ': u'13', u'VN': u'6', u'VNJ': u'6',
u'FN': u'21rb'},
199: {u'KC': u'12', u'VCC': u'6', u'KNJ': u'13', u'FN': u'21rb', u'VNJ':
u'7', u'VN': u'7'},
224: {u'KC': u'12', u'VCC': u'7', u'KNJ': u'13', u'VN': u'8', u'VNJ': u'8',
u'FN': u'21rb'},
264: {u'KC': u'12', u'VCC': u'7', u'KNJ': u'13', u'VN': u'9', u'VNJ': u'9',
u'FN': u'21rb'},
307: {u'KC': u'12', u'VCC': u'8', u'KNJ': u'13', u'FN': u'21rb', u'VNJ':
u'10', u'VN': u'10'},
348: {u'KC': u'12', u'VCC': u'9', u'KNJ': u'13', u'VN': u'_rn_1_vers_11',
u'VNJ': u'00', u'FN': u'21rb'},
373: {u'KC': u'12', u'VCC': u'9', u'KNJ': u'13', u'FN': u'21rb', u'VNJ':
u'00', u'VN': u'rn_2_vers_11'},
409: {u'KC': u'12', u'VCC': u'9', u'KNJ': u'13', u'VN': u'11', u'VNJ':
u'11', u'FN': u'21rb'},
444: {u'KC': u'12', u'VCC': u'10', u'KNJ': u'13', u'VN': u'12', u'VNJ':
u'12', u'FN': u'21va'},
480: {u'KC': u'12', u'VCC': u'11', u'KNJ': u'13', u'FN': u'21va', u'VNJ':
u'13', u'VN': u'13'}}

The keys are indices of the places in text, where the
value of some tag/property ... changes; also the not changing values are copied
in the inner dicts, hence after determining the index of the first such
"boundary" preceeding the given text position (using bisect), the
complete parameters can be retrieved with a single dict lookup.
The other way seems much more complicated with this dict
structure. I.e. it should be possible to find the index/or all
indices/or the lowest index given some combinations of tag values - e.g.
with the input: u'FN': u'21va' >> 444 should  be found (as the lowest index
with this value)
for u'KC': u'12' AND u'VN': u'3' >> 102
(there are of course several other combinations to search for)
I couldn't find an effective way to lookup the specific values of the inner
dict (other than nested loops), probably using a set of tuples instead the
inner dict would allow the checks for intersection with the input data, but
this also seems quite expensive, as it requires the loop with checks over
the entire data ... the same would probably be true for multiple reversed
dicts keyed on the tags and containing the indices and the
corresponding values (but I actually haven't  tested the latter two
approaches).

The approach with sqlite I'm testing now simply contains columns for all
relevant "tags" plus a column for text index; this way I find it quite easy
to query the needed information in both directions (index >> tags; tag(s) >>
index/indices) with the db selects - even as a beginner with SQL.

The exact set of tags will be complete with the availability of all
texts, then the gui will be adapted to make the user selections available
(comboboxes). The searches are restricted
to the predefined "properties", they are not manually queried.
With the dynamic building of the db I just
tried, not to hardcode all the tags, but it would be later
possible to create the database tables and columns also in a
controlled "static" way.

Comparing the two versions I have, the dictionary approach really seems
quite a bit more complicated (the data as well as the functions for the
retrieval), than the sqlite3 version.

For this app I really actually don't need a persistent db, nor should it be
shared etc. The only used thing are the efficient lookup possibilities,
which seemed to do what I needed (simply using intersect selects). Are there
any ways how to implements this efficiently using the python builtin types?

Greetings,
   Vlasta
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-list/attachments/20080505/4ac98c98/attachment.html>