[PYTHON DB-SIG] DB-Modules returning lists

P.S.Craig@durham.ac.uk
Fri, 10 Jan 1997 09:45:37 +0000 (GMT)


Hi,

I have been following this discussion with some interest and debating
whether I should step in with my pennyworth. Here it is.

About 4 months ago, I needed a python binding to ingres so that I
could use python/Tk to provide a GUI for some student attendance records
which are entered by many people and which need to be accessible to
the whole department.

I decided to brew my own for a number of reasons:
1) I was unable to get hold of the standard stuff which people have
   been talking about because of connection problems to the ftp site.
2) I do not come from a database background.  In fact, I am generally
   much more interested in the Matrix-SIG than the DB-SIG.
3) I wanted to be able to do numerical computations with the results
   of queries without having to write lots of type conversion code.

The solution I settled on was to use two-dimensional Numerical python
arrays to hold the results of select queries. I can imagine that those
reading this will immediately say that not all data is numerical and
of course you are right.  The secret is that Numerical python can
handle multi-dimensional arrays of arbitrary python objects.
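
To make the idea concrete, here is a minimal sketch of a query result held
in a two-dimensional array of arbitrary python objects. It assumes the
modern NumPy package as a stand-in for Numerical python, and the rows and
column meanings are invented for illustration; it is not the code described
in this message.

```python
import numpy as np

# Hypothetical rows, shaped like the result of a select on attendance data:
# (student, course, sessions held, fraction attended).
rows = [
    ("Smith", "MATH101", 12, 0.75),
    ("Jones", "MATH101", 15, 0.94),
    ("Patel", "STAT201", 9, 0.60),
]

# dtype=object keeps every cell as the original python object, so strings
# and numbers can share one 2-d array.
result = np.array(rows, dtype=object)

# A column is a 1-d slice. Numeric columns can be converted once and then
# used in ordinary array arithmetic.
sessions = result[:, 2].astype(int)
rate = result[:, 3].astype(float)
attended = sessions * rate
```

The conversion with astype is only needed when a typed numeric array is
wanted; arithmetic on the object column itself also works, element by
element, but more slowly.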

There are several benefits: 

1) The ability to perform binary operations with results of queries
   can be nice.
2) Often the 2-d array from a db actually represents a
   higher-dimensional structure. It's easy with numerical python to
   change the dimensionality of the returned array. Then we can access
   the data in its natural representation.
3) The problem of naming the columns is easily solved by wrapping the
   numerical python array in a class which implements __getitem__ and
   __setitem__ to allow named columns. It's then easy to pick out data
   which satisfy additional criteria to those specified in the
   original query without returning to the db.
4) Numerical python is fairly efficient.
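
Points 2 and 3 can be sketched roughly as follows. This assumes modern
NumPy in place of Numerical python; the class name NamedTable, its where
method and the sample data are all hypothetical, chosen only to show the
shape of such a wrapper.

```python
import numpy as np


class NamedTable:
    """Hypothetical wrapper: a 2-d object array plus a list of column names."""

    def __init__(self, names, rows):
        self.names = list(names)
        self.data = np.array(rows, dtype=object)

    def _key(self, key):
        # A column name is translated into "all rows, that column";
        # any other key is passed straight through to the array.
        if isinstance(key, str):
            return (slice(None), self.names.index(key))
        return key

    def __getitem__(self, key):
        return self.data[self._key(key)]

    def __setitem__(self, key, value):
        self.data[self._key(key)] = value

    def where(self, mask):
        # Keep only the rows selected by a boolean mask, without
        # going back to the database.
        return NamedTable(self.names, self.data[np.asarray(mask, dtype=bool)])


t = NamedTable(
    ["student", "week", "present"],
    [("Smith", 1, 1), ("Smith", 2, 0), ("Jones", 1, 1), ("Jones", 2, 1)],
)

# Extra criterion applied after the query: rows where the student was present.
attended = t.where(t["present"].astype(int) == 1)

# The rows are ordered by student and then week, so the flat column is
# really a student-by-week grid.
grid = t["present"].astype(int).reshape(2, 2)
```

The reshape only makes sense because the rows arrive in a known order
with no gaps; a real query would need an ORDER BY to guarantee that.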

A couple of slight problems:

1) Wrapping the numerical python arrays in another class means that
   binary operations don't work transparently. In principle, this can
   be resolved using the UserArray stuff for numerical python, but I
   haven't yet had the time.
2) I have dropped the whole cursor idea. All data from a query is
   lifted immediately into python in one gulp. Potentially disastrous
   for very large query results. Is this a big hassle in practice?
   Not for me, at any rate.
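
The trade-off in point 2 can be illustrated with a small sketch. It uses
the sqlite3 module from the python standard library purely as a stand-in
for ingres, with an invented attendance table; the two helper functions
are hypothetical names, not part of any module mentioned here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attendance (student TEXT, week INTEGER, present INTEGER)"
)
conn.executemany(
    "INSERT INTO attendance VALUES (?, ?, ?)",
    [("s%d" % i, week, 1) for i in range(100) for week in range(1, 11)],
)


def fetch_in_one_gulp(conn, sql):
    # The whole result is lifted into python at once: simple, but memory
    # use grows with the size of the result.
    return conn.execute(sql).fetchall()


def fetch_in_chunks(conn, sql, size=100):
    # Cursor style: a generator that holds at most `size` rows at a time.
    cur = conn.execute(sql)
    while True:
        chunk = cur.fetchmany(size)
        if not chunk:
            break
        yield chunk


sql = "SELECT student, week, present FROM attendance"
all_rows = fetch_in_one_gulp(conn, sql)
chunk_sizes = [len(chunk) for chunk in fetch_in_chunks(conn, sql, size=300)]
```

For results of this size the one-gulp version is clearly the more
convenient; the chunked version only pays off when the result would not
fit comfortably in memory.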

The modified numerical python stuff is implemented in a module. I sort
of intend this module to end up offering an array interface similar to
S (a statistics and data analysis package).

I also have a class based SQL interface to ingres. The basic classes
are Connection, View, Alias, Table and a bunch of operator classes for
handling query building. The module works well, but I am aware that it
needs a number of extra classes to make it cleaner (Field, Query,
etc).
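
For readers unfamiliar with this style, here is a rough sketch of how
operator classes can build a query. It is not the module described above:
the Table name is borrowed from the list of classes, but Expr, BinOp,
Column, render and every method shown are hypothetical, and the sketch
only produces SQL text without talking to any database.

```python
class Expr:
    """Base class for pieces of a WHERE clause."""

    def __and__(self, other):
        return BinOp("AND", self, other)

    def __or__(self, other):
        return BinOp("OR", self, other)


def render(value):
    # Expressions render themselves; strings are quoted with embedded
    # quotes doubled; anything else is assumed to be a number.
    if isinstance(value, Expr):
        return value.sql()
    if isinstance(value, str):
        return "'%s'" % value.replace("'", "''")
    return str(value)


class BinOp(Expr):
    """Operator class: two operands joined by an SQL operator."""

    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def sql(self):
        return "(%s %s %s)" % (render(self.left), self.op, render(self.right))


class Column(Expr):
    def __init__(self, table, name):
        self.table, self.name = table, name

    def sql(self):
        return "%s.%s" % (self.table.name, self.name)

    # Comparisons return operator objects instead of booleans.
    def __eq__(self, other):
        return BinOp("=", self, other)

    def __gt__(self, other):
        return BinOp(">", self, other)


class Table:
    def __init__(self, name):
        self.name = name

    def __getattr__(self, column):
        # Only reached for attributes not found normally, so unknown
        # public names are treated as column names.
        if column.startswith("_"):
            raise AttributeError(column)
        return Column(self, column)

    def select(self, *columns, where=None):
        sql = "SELECT %s FROM %s" % (
            ", ".join(c.sql() for c in columns),
            self.name,
        )
        if where is not None:
            sql += " WHERE " + where.sql()
        return sql


attendance = Table("attendance")
query = attendance.select(
    attendance.student,
    attendance.week,
    where=(attendance.present == 0) & (attendance.week > 3),
)
```

The parentheses around each comparison matter, because & binds more
tightly than == and > in python.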

I feel irritated with myself for doing everything completely from
scratch. I don't know if any of what I have described is interesting
to others. If it is, I'll be pleased to show you my (lousy) code or
tell you more about how it works.

Peter Craig

#--------------------------------------------------------------------#
| E-mail:   P.S.Craig@durham.ac.uk  Telephone: +44-91-3742376 (Work) |
| Fax:      +44-91-3747388                     +44-91-3860448 (Home) |
|                                                                    |
| WWW:      http://fourier.dur.ac.uk:8000/stats/psc.html             |
|                                                                    |
| Snail:    Peter Craig, Dept. of Math. Sciences, Univ. of Durham,   |
|           South Road, Durham DH1 3LE, England                      |
#--------------------------------------------------------------------#

=================
DB-SIG  - SIG on Tabular Databases in Python

send messages to: db-sig@python.org
administrivia to: db-sig-request@python.org
=================