Databases and python

Thu Feb 16 04:23:12 EST 2006

Jonathan Gardner wrote:
> I'm no expert in BDBs, but I have spent a fair amount of time working
> with PostgreSQL and Oracle. It sounds like you need to put some
> optimization into your algorithm and data representation.
> 
> I would do pretty much like you are doing, except I would only have the
> following relations:
> 
> - word to word ID
> - filename to filename ID
> - word ID to filename ID
> 
> You're going to want an index on pretty much every column in this
> database. 

Stop!

I'm no DB expert either, but putting indexes everywhere is a well-known
DB antipattern. An index is only useful if the indexed field is
selective enough (i.e. as few records as possible share the same value
for this field). Otherwise, the indexed lookup may end up taking far
more time than a simple linear scan. Also, indexes slow down write
operations.
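
To make the selectivity point concrete, here is a minimal sketch. It
assumes SQLite through the standard sqlite3 module as a stand-in (the
thread is about BDB, which works differently), and the table and column
names are made up for illustration. It builds the three relations
quoted above and measures how selective a column is as the ratio of
distinct values to total rows:

```python
import sqlite3

# Hypothetical schema for the three relations discussed in the thread.
SCHEMA = """
CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT NOT NULL);
CREATE TABLE files (id INTEGER PRIMARY KEY, filename TEXT NOT NULL);
CREATE TABLE word_files (word_id INTEGER NOT NULL, file_id INTEGER NOT NULL);
"""

def selectivity(conn, table, column):
    """Return distinct values / total rows (close to 1.0 = very selective)."""
    # Identifiers are interpolated directly: only pass trusted names here.
    total, distinct = conn.execute(
        "SELECT COUNT(*), COUNT(DISTINCT {c}) FROM {t}".format(c=column, t=table)
    ).fetchone()
    return distinct / total if total else 0.0

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

words = ["spam", "eggs", "ham", "bacon"]
conn.executemany("INSERT INTO words (word) VALUES (?)", [(w,) for w in words])
conn.executemany("INSERT INTO files (filename) VALUES (?)",
                 [("a.txt",), ("b.txt",)])
# Toy data: every word occurs in both files.
conn.executemany("INSERT INTO word_files VALUES (?, ?)",
                 [(w, f) for w in range(1, 5) for f in (1, 2)])

word_sel = selectivity(conn, "words", "word")          # every word is unique
file_sel = selectivity(conn, "word_files", "file_id")  # 2 files over 8 rows
```

A column with a ratio close to 1.0 (like words.word here) is a
reasonable index candidate; a column with a ratio close to 0 is the
kind of field where an index probably won't pay for its write cost.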

> That's because you're going to lookup by any one of these
> columns for the corresponding value.
> 
> I said I wasn't an expert in BDBs. But I do have some experience
> building up large databases. In the first stage, you just accumulate
> the data. Then you build the indexes only as you need them.

Yes. And only where it makes sense.
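
As a rough illustration of "load first, index later", here is a small
sketch, again assuming SQLite via the standard sqlite3 module rather
than BDB. The table name, index name and the query_plan helper are
hypothetical. It bulk-loads the data with no index, then adds one index
for a lookup that is actually needed, and asks the engine for its query
plan before and after:

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each row.
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT NOT NULL)")

# Stage 1: accumulate the data in one transaction, with no index to maintain.
with conn:
    conn.executemany("INSERT INTO words (word) VALUES (?)",
                     (("w%d" % i,) for i in range(10000)))

lookup = "SELECT id FROM words WHERE word = ?"
plan_before = query_plan(conn, lookup, ("w42",))

# Stage 2: build the index only once this lookup turns out to be needed.
conn.execute("CREATE INDEX idx_words_word ON words (word)")
plan_after = query_plan(conn, lookup, ("w42",))

found = conn.execute(lookup, ("w42",)).fetchone()[0]
```

Comparing plan_before and plan_after shows whether the engine really
uses the new index for that query; if it doesn't, the index is just
overhead on every write.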

(snip)

> And your idea of hundreds of thousands of tables? Very bad. Don't do
> it.

+100 on this

-- 
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in 'onurb at xiludom.gro'.split('@')])"