Persistent objects

Paul Rubin
Sun Dec 12 04:54:28 EST 2004


I've had this recurring half-baked desire for long enough that I
thought I'd post about it, even though I don't have any concrete
proposals and the whole idea is fraught with hazards.

Basically I wish there was a way to have persistent in-memory objects
in a Python app, maybe a multi-process one.  So you could have a
persistent dictionary d, and if you say 
   d[x] = Frob(foo=9, bar=23)
that creates a Frob instance and stores it in d[x].  Then if you
exit the app and restart it later, there'd be a way to bring d back
into the process and have that Frob instance be there.
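
Just to pin down the semantics I'm after, here's a throwaway
stand-in.  PersistentDict and the file name are made up, and it
cheats by pickling under the hood (which is exactly the cost I want
to get rid of, as I say below); it only illustrates the desired
usage:

   import os, pickle

   class Frob:
       def __init__(self, foo, bar):
           self.foo, self.bar = foo, bar

   class PersistentDict(dict):
       """Loads itself from `path` on creation, dumps itself on close()."""
       def __init__(self, path):
           super().__init__()
           self.path = path
           if os.path.exists(path):
               with open(path, "rb") as f:
                   self.update(pickle.load(f))
       def close(self):
           with open(self.path, "wb") as f:
               pickle.dump(dict(self), f)

   d = PersistentDict("app.db")
   d["x"] = Frob(foo=9, bar=23)
   d.close()

   d = PersistentDict("app.db")    # "restart" the app
   print(d["x"].foo, d["x"].bar)   # -> 9 23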

Please don't suggest using pickle or shelve; I know about those
already.  I'm after something higher-performance.  Basically d would
live in a region of memory that could be mmap'd to a disk file as well
as shared with other processes.  Once d was rooted into that region,
any entries created in it would also be in that region, and any
objects assigned to the entries would also get moved to that region.
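
The raw mechanism is already in the stdlib mmap module: a region can
be disk-backed and shared between processes at once.  What's missing
is a Python object layout that can live inside it instead of in the
private heap.  A sketch of just the region part (file name and size
are arbitrary):

   import mmap

   SIZE = 1024 * 1024                   # 1 MB region, arbitrary

   with open("region.bin", "wb") as f:  # create the backing file once
       f.write(b"\0" * SIZE)

   f = open("region.bin", "r+b")
   region = mmap.mmap(f.fileno(), SIZE) # disk-backed, shareable memory

   region[0:5] = b"hello"               # raw bytes, persistent and shared
   region.flush()                       # push changes back to the file
   region.close()
   f.close()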

There'd probably have to be a way to lock the region for update, using
semaphores.  Ordinary subscript assignments would lock automatically,
but there might be times when you want to update several structures in
a single transaction.
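
Roughly, using multiprocessing.Lock as a stand-in for a semaphore
that would really live in the region (d here is just an ordinary
dict for illustration):

   from multiprocessing import Lock

   region_lock = Lock()   # stand-in for a semaphore kept in the region
   d = {}                 # stand-in for the persistent dict

   # An ordinary d[x] = ... would take the lock by itself.  An explicit
   # transaction holds it across several updates so no other process
   # sees the structures half-changed:
   with region_lock:
       d["accounts"] = {"alice": 10, "bob": 7}
       d["index"] = ["alice", "bob"]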

A thing like this could save a heck of a lot of SQL traffic in a busy
server app.  There are all kinds of bogus limitations you see on web
sites, where you can't see more than 10 items per HTML page or
whatever, because they didn't want loading a page to cause too many
database hits.  With the in-memory approach, all that data could be
right there in the process: no TCP messages, no context switches,
just ordinary in-memory dictionary references.  Lots
of machines now have multi-GB of physical memory which is enough to
hold all the stuff from all but the largest sites.  A site like
Slashdot, for example, might get 100,000 logins and 10,000 message
posts per day.  At 1k bytes per login (way too much) and 10k bytes
per message post (also way too much), that's still just 200 megabytes
for a full day of activity.  Even a low-end laptop these days comes
with more RAM than that, and multi-GB workstations are no big deal any
more.  Occasionally someone might look at a several-day-old thread and
that might cause some disk traffic, but even that can be left in
memory (the paging system can handle it).
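
Spelling that arithmetic out:

   logins = 100_000 * 1_000          # 100,000 logins at 1k bytes each
   posts  =  10_000 * 10_000         # 10,000 posts at 10k bytes each
   print((logins + posts) // 10**6)  # -> 200 (megabytes per day)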

On the other hand, there'd either have to be interpreter hair to
separate the persistent objects from the non-persistent ones, or else
make everything persistent and then have some way to keep processes
sharing memory from stepping on each other.  Maybe the abstraction
machinery in PyPy can make this easy.

Well, as you can see, this idea leaves a lot of details not yet
thought out.  But it's alluring enough that I thought I'd ask if
anyone else sees something to pursue here.


