[Persistence-sig] getting started

Kevin Altis altis@semi-retired.com
Wed, 10 Jul 2002 13:30:34 -0700


I'm the lead for the PythonCard project
http://pythoncard.sourceforge.net/

I'll mostly be lurking and don't expect that I will contribute any code. I'm
not a database guy and I've only been using Python for a little over a year,
so all you data gurus are much more qualified than I to say what is good and
proper. However, if something usable comes out of this SIG, then it is
likely that a PythonCard sample or two will be written that uses the
API/package. I'll go ahead and give a long introduction now to get it out
of the way and, hopefully, to bring up some relevant topics.

Persistence in the context of PythonCard is probably a bit different than
what most people have in mind for this SIG. We don't even have complete
agreement among the main PythonCard developers on this topic.

I would like to have a storage solution that is built into the Python
standard distribution and that won't change for at least the next few
years, and preferably not for 5-10 years or longer, so that there is little
risk of stored data becoming unusable as Python is updated. The data format
must also be cross-platform, at least for the major desktop platforms in
use, so that data created on one platform can be easily exchanged with a
user on another platform without the need for an explicit import/export.
This is where shelve falls down, unless you use dumbdbm: by default shelve
stores its data with whichever dbm module is available on a given machine,
and those file formats are not interchangeable across platforms.
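
For illustration, here is a minimal sketch (in modern Python 3, not the
Python of 2002) of forcing shelve onto the pure-Python backend. In Python 3
the dumbdbm module lives on as dbm.dumb, and shelve.Shelf accepts any
dbm-style mapping. The helper name open_portable_shelf and the pinned pickle
protocol are assumptions of this sketch, not part of PythonCard.

```python
import dbm.dumb
import shelve


def open_portable_shelf(path, flag="c"):
    """Open a shelf stored with dbm.dumb instead of the platform default.

    shelve.open() would pick whatever dbm backend this machine has;
    dbm.dumb is pure Python, so its path.dat/path.dir files do not
    depend on which C database libraries happen to be installed.
    """
    db = dbm.dumb.open(path, flag)
    # Pin the pickle protocol (a hypothetical choice) instead of relying
    # on the interpreter default, which changes between Python releases.
    return shelve.Shelf(db, protocol=4)


if __name__ == "__main__":
    import os
    import tempfile

    path = os.path.join(tempfile.mkdtemp(), "cards")
    with open_portable_shelf(path) as cards:
        cards["card1"] = {"name": "Kevin", "tags": ["python", "gui"]}
    with open_portable_shelf(path, "r") as cards:
        print(cards["card1"])  # {'name': 'Kevin', 'tags': ['python', 'gui']}
```

The trade-off is speed: the dbm.dumb documentation describes it as a
last-resort fallback that is not written for speed.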

The storage format we end up using for PythonCard will be a basic document
type that any PythonCard app/Python script should be able to open and make
some sense of. Other storage formats will always be an option, but there
will be at least one well-defined format that any and all apps should
understand regardless of whether they are running on Windows, Mac OS X, or
Linux/Unix. I'm mostly thinking of storing "dumb data" or simple types,
lists and dictionaries, so I'm not particularly concerned about being able
to store instances of complex classes and their member relations. Storing
class instances worries me because I expect some classes to change over time
and potentially break the loading of old data files created with different
versions of the classes.
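
A small check along these lines could enforce the "dumb data" rule before
anything is saved. This is only a sketch: the exact set of allowed scalar
types and the name check_simple are hypothetical. Exact type checks are used
on purpose, because a subclass of list or dict would be pickled with a
reference to its own class, reintroducing the dependency on external code.

```python
# Hypothetical choice of "simple types"; bool is listed separately
# because the checks below compare exact types, not isinstance().
SIMPLE_SCALARS = (type(None), bool, int, float, str, bytes)


def check_simple(value, path="data"):
    """Raise TypeError unless value is built only from simple scalars,
    lists, and dictionaries with string keys."""
    kind = type(value)
    if kind in SIMPLE_SCALARS:
        return
    if kind is list:
        for i, item in enumerate(value):
            check_simple(item, f"{path}[{i}]")
        return
    if kind is dict:
        for key, item in value.items():
            if type(key) is not str:
                raise TypeError(f"{path}: non-string key {key!r}")
            check_simple(item, f"{path}[{key!r}]")
        return
    raise TypeError(f"{path}: unsupported type {kind.__name__}")


if __name__ == "__main__":
    check_simple([{"name": "Kevin", "tags": ["python"]}])  # passes
    try:
        check_simple([{"title": "Card 1", "size": (320, 240)}])
    except TypeError as err:
        print(err)  # data[0]['size']: unsupported type tuple
```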

Plain pickles seem to fit my requirements as long as you only use native
Python types, so that there are no dependencies on external classes and
modules when loading the pickle. A conversion of the data to a newer format
might be acceptable, but this implies some kind of versioning or other
smarts in the data file. The PythonCard flatfileDatabase sample uses a
simple list of dictionaries for storing data, keeping the entire data set in
memory while the app is running. The data can be loaded and saved as a
single pickle file (in the CVS version, not in release 0.6.7). I would have
preferred a solution where all the data didn't need to be in memory and the
access to each record in the list was transparent, but I ran into issues
trying to use shelve for this and we haven't gotten far enough along with
ZODB to know whether it will do the job.
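
The plain-pickle approach, with the list of dictionaries wrapped in a small
versioned envelope, might look like the following minimal sketch. The
envelope layout, FORMAT_VERSION, and the function names are hypothetical,
not the flatfileDatabase sample's actual code. Overriding
pickle.Unpickler.find_class is the mechanism the pickle documentation
describes for restricting globals; here it refuses every class, so a file
that would need an external class or module fails loudly instead of
importing it.

```python
import pickle

FORMAT_VERSION = 1  # hypothetical version tag for the file layout


class NativeOnlyUnpickler(pickle.Unpickler):
    """Unpickler that refuses to look up any class or function."""

    def find_class(self, module, name):
        # Native types (dict, list, str, int, float, bool, None, bytes)
        # are encoded directly by protocol 4 and never reach this method.
        raise pickle.UnpicklingError(
            f"refusing to load {module}.{name}: only native types allowed")


def save_records(path, records):
    """Save a list of dicts as a single pickle inside a versioned envelope."""
    envelope = {"version": FORMAT_VERSION, "records": records}
    with open(path, "wb") as f:
        pickle.dump(envelope, f, protocol=4)


def load_records(path):
    """Load records, rejecting unknown versions and non-native objects."""
    with open(path, "rb") as f:
        envelope = NativeOnlyUnpickler(f).load()
    if not isinstance(envelope, dict) or envelope.get("version") != FORMAT_VERSION:
        # A real application could convert older versions here instead.
        raise ValueError("unrecognized data file version")
    return envelope["records"]
```

Keeping the version number inside the file is what would later allow a
loader to recognize an old layout and convert it rather than fail.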

A number of people working on PythonCard apps would be very happy if the
simple lists and dictionaries could be mapped to underlying SQL data stores
without the user of the storage needing to know anything about SQL.
Concurrency and transactions would be nice too.
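
One way to hide SQL behind an ordinary list of dictionaries is sketched
below, using the sqlite3 module that later joined the standard library (in
Python 2.5). The class SQLRecordList and its two-table layout (one row per
record, one row per field) are hypothetical. The sketch assumes flat records
whose values are simple scalars SQLite can store (int, float, str, bytes,
None); only the requested record is read from the database, and each append
or assignment runs in its own transaction.

```python
import sqlite3


class SQLRecordList:
    """Hypothetical list-of-dicts facade over an SQLite database.

    Records are flat dicts of int, float, str, bytes or None values
    (bools come back as 0/1). Records can be appended or replaced but
    not deleted, so record number N is always stored under idx = N.
    """

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        with self._conn:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS records (idx INTEGER PRIMARY KEY)")
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS fields ("
                " idx INTEGER NOT NULL, name TEXT NOT NULL, value,"
                " PRIMARY KEY (idx, name))")

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]

    def _normalize(self, idx):
        # Support negative indexes the way a list does.
        n = len(self)
        if idx < 0:
            idx += n
        if not 0 <= idx < n:
            raise IndexError("record index out of range")
        return idx

    def _write_fields(self, idx, record):
        self._conn.execute("DELETE FROM fields WHERE idx = ?", (idx,))
        self._conn.executemany(
            "INSERT INTO fields (idx, name, value) VALUES (?, ?, ?)",
            [(idx, name, value) for name, value in record.items()])

    def append(self, record):
        # The connection's context manager commits on success and rolls
        # back on any exception, e.g. an unsupported value type.
        with self._conn:
            idx = len(self)
            self._conn.execute("INSERT INTO records (idx) VALUES (?)", (idx,))
            self._write_fields(idx, record)

    def __getitem__(self, idx):
        idx = self._normalize(idx)
        rows = self._conn.execute(
            "SELECT name, value FROM fields WHERE idx = ?", (idx,))
        return dict(rows)

    def __setitem__(self, idx, record):
        idx = self._normalize(idx)
        with self._conn:
            self._write_fields(idx, record)

    def __iter__(self):
        for idx in range(len(self)):
            yield self[idx]

    def close(self):
        self._conn.close()
```

Note that changing a dict returned by cards[0] does not write it back; the
record has to be reassigned with cards[0] = record, the same trade-off
shelve makes when writeback is off.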

I posted a message to the PythonCard-users mailing list about shelve at the
end of June that covers some of the issues I ran into with shelve.

"why we probably don't want to use shelve"
http://aspn.activestate.com/ASPN/Mail/Message/1259977

There are even more messages in the PythonCard-users archive about
persistence and pickle, but most of them only touch on issues this SIG will
address.

http://aspn.activestate.com/ASPN/Mail/Browse/Threaded/PythonCard

ka
---
Kevin Altis
altis@semi-retired.com
http://www.pythoncard.org/