using sqlite3 - execute vs. executemany; committing ...

David wizzardx at gmail.com
Tue May 6 05:07:11 EDT 2008


Hi Vlasta.

I had a look at your original mail.

I think your simpler (than XML) format is a good idea for now. At a
later stage you could change it to something like this:

<CUSTOM_TAG KC=12 KNJ=13 VCC=0...etc>text goes here, some more text</CUSTOM_TAG>
<CUSTOM_TAG KC=13 KNJ=14 VCC=1...etc>text goes here, some more text</CUSTOM_TAG>

And so on. This isn't much more verbose than your current syntax, and
it is (I think) valid XML that you can feed into other tools, etc.

As for internal representation, the sqllite table-like format seems to
suite your app well. Also, there may be a performance advantage (for
queries) since the actual searching takes place in optimized C code
(in C-friendly structures), rather than vanilla Python code (which
uses string dict lookups for a lot of things which C doesn't).

Have you benchmarked how long the sqllite version takes, as opposed to
a simple list of dicts like this?:

[{'line':123, 'KC':12, 'VCC':0, KNJ: 12, text: u'some text'},
 {'line':124, 'KC':13, 'VCC':1, KNJ: 13, text: u'some more text'},
 {'line':125, 'KC':14, 'VCC':2, KNJ: 14, text: u'some more text'},

 ... etc ....
]

(Note that I'm storing some values as ints rather than unicode)

In theory, if you enable psyco, this structure (even without dict
lookups) should be almost (or within a few orders at least) as fast as
sqlite (without db indexes).

Could you try that, and let me know how it perfoms?

As for optimization, here's a simple method which might help. I'm
basically simulating db indexes on all the fields:

1) Use the simple structure, as above (a list of dicts)

2) Add a dict for each field, which gives you a list of indexes into
the above structure, for each unique value

example (simplified) structure:

lookups = {
  'line': {123: [0,1,2], 124: [3,4,5].... etc}
  'KC': {12: [0,4,6], 13: [7,8,9]...etc}
  'VC': {14: [0,3,4], 15: [10,11,12] ... etc}
   ... etc
}


3) When the user searches on various criteria, look up all the
matches, and get intersections (using the 'set' module).

eg:

user searches for line 123, VC 15:

Your code:

a) Looks up line 13 indexes (0,1,2), and adds them to a set object
b) Looks up VC 15 indexes (10,11,12), and adds them to another set object
c) Gets the intersection of those sets (see python docs) (in this case
no results, but you get the idea)
d) Gives those results to the user

In theory this should perform well.

Could you try this and let me know how well it works?

David



More information about the Python-list mailing list