DB API

Boudewijn Rempt boud at rempt.xs4all.nl
Wed Jan 19 15:02:42 EST 2000


Gordon McMillan <gmcm at hypernet.com> wrote:

> Perhaps some bozo coded "FOR UPDATE" in the select 
> statement? <sound of fingers breaking>. He won't do that 
> again.

> Actually, if the SQL is coded right (no ORDER BY or GROUP 
> BY) and the stuff in the middle is decent, the DB can be 
> returning rows before the query has even finished executing. If 
> the query is parameterized (pre-parsed) and simple, the 
> bottleneck will be the network and the load on the DB will be 
> insignificant. Unfortunately, that's a lot of "if"s...
>  

Well, all I can say - before this turns into an 'it is. No it ain't'
match - is that I've been working with several fairly large databases
(above a million rows, above a thousand tables, and with some quite
serious processing), and that in my experience the best plan is to do
most of your processing as close as possible to the database server -
preferably inside it. The network never has enough bandwidth, and the
client PC never has enough memory.
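
To make that concrete, here's a quick sketch in DB API terms of the
difference between dragging the raw rows over the wire and letting the
server do the work. The 'orders' table and its columns are made up,
and 'conn' is whatever your driver's connect() gives you:

def totals_on_client(conn):
    # Bad: ships every row over the network, aggregates in Python.
    cur = conn.cursor()
    cur.execute("SELECT customer_id, amount FROM orders")
    totals = {}
    for customer_id, amount in cur.fetchall():
        totals[customer_id] = totals.get(customer_id, 0) + amount
    return totals

def totals_on_server(conn):
    # Better: the server aggregates; only the summary crosses the wire.
    cur = conn.cursor()
    cur.execute("SELECT customer_id, SUM(amount) FROM orders"
                " GROUP BY customer_id")
    return dict(cur.fetchall())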

I've seen some horrible GROUP BY's and ORDER BY's - but on the other
hand, postprocessing the rows on the client can be impossible, which
makes an ORDER BY useful (especially if there's an ordered index to go
by). Using unparameterized queries in a production application is
simply silly, since it will overload Oracle's shared SQL area - I
assume other RDBMSs have an equivalent. I'm not sure what the stuff in
the middle should be - there are such a lot of possibilities, and even
when you've got a nice transaction manager, you'll find that the
manager is trying out his Cognos toy in the daytime ;-).
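
For what it's worth, the DB API makes parameterizing painless. A
sketch, using the 'named' paramstyle as cx_Oracle does - other modules
want '?' or '%s' placeholders instead, and the 'customers' table is
invented:

def lookup_customer(conn, customer_id):
    cur = conn.cursor()
    # Silly: every distinct id yields a distinct statement text, and
    # each one gets parsed and cached separately in the SQL area:
    #   cur.execute("SELECT name FROM customers WHERE id = %d"
    #               % customer_id)
    # Sane: one statement text, reused for every id.
    cur.execute("SELECT name FROM customers WHERE id = :id",
                {"id": customer_id})
    return cur.fetchone()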

You need a lot of space on the server if you want to handle large
queries - all that data has to be stored someplace. Even if the
database can push the first rows to the client before it has found the
last ones (and complex joins often make that impossible), it will in
all probability sweep the buffer cache clean in the process, making
the whole thing inefficient after all. At least, that's what happened
when our last large application had an architecture where most queries
processed a megabyte of data: forty clients brought the database to a
standstill.
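
And when a big result set really is unavoidable, at least don't slurp
it in one go - fetchmany() lets the client work in batches, so it
never holds the whole set in memory. A sketch; 'big_table' and the
batch size are pulled out of thin air:

def process_in_batches(conn, batch_size=500):
    cur = conn.cursor()
    cur.arraysize = batch_size   # hint for the driver's row buffering
    cur.execute("SELECT id, payload FROM big_table")
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        for row in rows:
            handle(row)          # stand-in for the real client work

def handle(row):
    pass                         # whatever processing is actually needed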

I still think that a relational database is for getting small amounts
of compact data out of a large (but well-indexed) mass of data.
Depending on the nature of the application, the size of the server,
and the number of concurrent users, a few thousand rows could count as
a small amount, but even on a large server you wouldn't want hundreds
of users all demanding a few thousand rows every other moment. And I'm
sure that for handling large blobs other solutions, like object
databases, are more suitable - but I haven't tried them.
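
The pattern I mean, in code: a parameterized key in, a handful of
indexed rows out. Another sketch - the 'invoices' table and its index
on customer_id are hypothetical:

def recent_invoices(conn, customer_id, limit=20):
    # A small, compact result from a well-indexed mass; the ORDER BY
    # can ride on an ordered index instead of forcing a sort.
    cur = conn.cursor()
    cur.execute("SELECT invoice_no, total FROM invoices"
                " WHERE customer_id = :cid"
                " ORDER BY invoice_no DESC",
                {"cid": customer_id})
    return cur.fetchmany(limit)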

Sure, it's possible that there is a need for data-slurping, but you'd
better be prepared for some questions about performance.

-- 

Boudewijn Rempt  | http://www.valdyas.org


