persistent composites

kindly kindly at gmail.com
Sun Jun 14 10:54:12 EDT 2009


On Jun 14, 3:27 pm, Aaron Brady <castiro... at gmail.com> wrote:
> Hi, please forgive the multi-posting on this general topic.
>
> Some time ago, I suggested keeping 'persistent composite' types on
> disk, to be read and updated at other times by other processes.
> Databases provide this functionality, except that the field types in
> any given table are required to be uniform.  Python has no such
> restriction.
>
> I tried out an implementation of composite collections, specifically
> lists, sets, and dicts, using 'sqlite3' as a persistence back-end.
> It's significantly slower, but we might argue that attempting to do it
> by hand qualifies as premature optimization; it is easy to optimize
> debugged code.
>
> The essentials of the implementation are:
>   - each 'object' gets its own table.
>     = this includes immutable types
>   - a reference count table
>     = when an object's ref. count reaches zero, its table is dropped
>   - a type-map table
>     = maps object's table ID to a string of its type
>   - a single 'entry point' table, with the table ID of the
>     entry-point object
>     = the entry point is the only data structure available to new
>       connections.  (I imagine it will usually be a list or dict.)
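
The bookkeeping tables listed above could be laid out roughly as follows. This is only a sketch: the 'tableid' and 'count' column names come from the snippets further down, while the 'typename' column, the 'entry' table name, and the 'create_schema' helper are my own assumptions.

```python
import sqlite3

def create_schema(conn):
    """Create the three bookkeeping tables (names partly assumed)."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS refcounts (
            tableid INTEGER PRIMARY KEY,   -- one row per stored object
            count   INTEGER NOT NULL       -- current reference count
        );
        CREATE TABLE IF NOT EXISTS types (
            tableid  INTEGER PRIMARY KEY,
            typename TEXT NOT NULL         -- class name, used to revive the object
        );
        CREATE TABLE IF NOT EXISTS entry (
            tableid INTEGER                -- table ID of the entry-point object
        );
    """)

conn = sqlite3.connect(":memory:")
create_schema(conn)
tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
```

Each object's own table would be created separately, alongside its rows in 'refcounts' and 'types'.
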
>
> I will be sure to kill any interest you might have by now, by
> "revealing" a snippet of code.
>
> The object creation procedure:
>
> def new_table( self, type_name ):
>   ''' 'type_name' is a string, the name of the class the object is an
>   instance of.  Returns the new object's table ID. '''
>   cur= self.conn.cursor( )
>   # Next free ID: one past the largest existing ID, or 0 if none exist.
>   cur.execute( '''SELECT max( tableid ) FROM refcounts''' )
>   rec= cur.fetchone( )
>   if rec[ 0 ] is None:
>     obid= 0
>   else:
>     obid= rec[ 0 ]+ 1
>   cur.execute( '''INSERT INTO types VALUES( ?, ? )''',
>     ( obid, type_name ) )
>   cur.execute( '''INSERT INTO refcounts VALUES( ?, ? )''',
>     ( obid, 1 ) )
>   return obid
>
> The increment ref. count procedure:
>
> def incref( self, obid ):
>   cur= self.conn.cursor( )
>   # Increment in one statement, so there is no gap between reading
>   # and writing the count.
>   cur.execute( '''UPDATE refcounts SET count = count + 1
>     WHERE tableid = ?''', ( obid, ) )
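
The matching decrement is not shown, so here is a guess at what it might look like, following the rule above that an object's table is dropped when its count reaches zero. The 'decref' name, the 'obj_<id>' table naming, and the 'typename' column are assumptions of mine, and the sketch does not decrement the counts of objects that the dropped one referred to.

```python
import sqlite3

def decref(conn, obid):
    """Decrement obid's count; at zero, drop its table and bookkeeping rows.

    Returns True if the object was dropped.
    """
    cur = conn.cursor()
    cur.execute("UPDATE refcounts SET count = count - 1 WHERE tableid = ?",
                (obid,))
    row = cur.execute("SELECT count FROM refcounts WHERE tableid = ?",
                      (obid,)).fetchone()
    if row is not None and row[0] <= 0:
        # Table names cannot be bound as parameters, so the ID is
        # coerced to int before being formatted into the statement.
        cur.execute('DROP TABLE IF EXISTS "obj_%d"' % int(obid))
        cur.execute("DELETE FROM refcounts WHERE tableid = ?", (obid,))
        cur.execute("DELETE FROM types WHERE tableid = ?", (obid,))
        return True
    return False

# Small demonstration with one object that starts with two references.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE refcounts (tableid INTEGER PRIMARY KEY, count INTEGER);
    CREATE TABLE types (tableid INTEGER PRIMARY KEY, typename TEXT);
    CREATE TABLE obj_0 (idx INTEGER, value);
    INSERT INTO refcounts VALUES (0, 2);
    INSERT INTO types VALUES (0, 'list');
""")
first = decref(conn, 0)    # count goes 2 -> 1, table kept
second = decref(conn, 0)   # count goes 1 -> 0, table dropped
remaining = conn.execute(
    "SELECT count(*) FROM sqlite_master WHERE name = 'obj_0'").fetchone()[0]
```
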
>
> The top-level structure contains these two procedures, as well as
> 'setentry', 'getentry', and 'revive' procedures.
>
> Most notably, user-defined types are possible.  The dict is merely a
> persistent dict.  'revive' checks the global namespace by name for the
> original type, subject to the same restrictions that we all know and
> love that 'pickle' has.
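
As I read it, the lookup in 'revive' amounts to something like the sketch below. The signature and the explicit 'namespace' argument are assumptions; the real procedure presumably consults the module globals and then rebuilds the object from its table.

```python
def revive(type_name, namespace):
    """Find the class whose name was stored in the type-map table."""
    try:
        cls = namespace[type_name]
    except KeyError:
        # Same restriction as pickle: the class must be importable or
        # otherwise defined where the object is brought back to life.
        raise LookupError("type %r is not defined here" % type_name)
    if not isinstance(cls, type):
        raise TypeError("%r does not name a class" % type_name)
    return cls

class Point:
    pass

# A real implementation might pass globals() here instead.
namespace = {"Point": Point, "list": list}
found = revive("Point", namespace)
```
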
>
> Deadlocks and cyclic garbage pose the usual problems.  The approach I
> used last time was to maintain a graph of acquired locks, kept in a
> separate table, and check it for cycles to avert deadlocks.  For
> garbage, I can't find a better solution than the one Python already
> uses.
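
For the deadlock check, a minimal sketch of the cycle test, assuming the lock graph has been loaded into a dict that maps each waiting process to the set of processes it waits on. The 'would_deadlock' name and the in-memory dict are mine; the original keeps the graph in a table.

```python
def would_deadlock(waits_for, waiter, holder):
    """Return True if adding the edge waiter -> holder would close a cycle."""
    # Walk outward from holder; if waiter is reachable, the new edge
    # would complete a loop in the wait-for graph.
    stack = [holder]
    seen = set()
    while stack:
        node = stack.pop()
        if node == waiter:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(waits_for.get(node, ()))
    return False

# A waits on B, and B waits on C.
graph = {"A": {"B"}, "B": {"C"}}
closes_cycle = would_deadlock(graph, "C", "A")   # C -> A would loop back to C
safe = would_deadlock(graph, "A", "C")           # C waits on nobody
```

A lock request for which the check returns True would be refused or retried rather than granted.
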
>
> From the 3.0 docs:
> gc.garbage
>
>     A list of objects which the collector found to be unreachable but
> could not be freed (uncollectable objects).
> ...
> Python doesn’t collect such [garbage] cycles automatically because, in
> general, it isn’t possible for Python to guess a safe order in which
> to run the __del__() methods. If you know a safe order, you can force
> the issue by examining the garbage list, and explicitly breaking
> cycles due to your objects within the list.
>
> Before I go and flesh out the full interfaces for the provided
> types, does anyone have a use for it?

I like it as a concept, have not got any use for it this minute, but I
am sure it will be useful someday.


