Python ORM library for distributed mostly-read-only objects?

Roy Smith roy at panix.com
Sun Jun 22 09:49:53 EDT 2014


In article <85659fdd-511b-4aea-9c4b-17a4bbb88662 at googlegroups.com>,
 smurfix at gmail.com wrote:

> My problem: I have a large database of interconnected objects which I need to 
> process with a combination of short- and long-lived workers. These objects 
> are mostly read-only (i.e. any of them can be changed/marked-as-deleted, but 
> that happens infrequently). The workers may or may not be within one Python 
> process, or even on one system.
> 
> I've been doing this with a "classic" session-based SQLAlchemy ORM approach, 
> but that ends up way too slow and memory-intensive, as each thread gets its own 
> copy of every object it needs. I don't want that.
> 
> My existing code does object loading and traversal by simple attribute 
> access; I'd like to keep that if at all possible.
> 
> Ideally, what I'd like to have is an object server which mediates write 
> access to the database and then sends change/invalidation notices to the 
> workers. (Changes are infrequent enough that I don't care if a worker gets a 
> notice it's not interested in.)
> 
> I don't care if updates are applied immediately or are only visible to the 
> local process until committed. I also don't need fancy indexing or query 
> abilities; if necessary I can go to the storage backend for that. (That 
> should be SQL, though a NoSQL back-end would be nice to have.)
> 
> Does something like this already exist, somewhere out there, or do I need to 
> write this, or does somebody know of an alternate solution?

If you want to go NoSQL, I think what you're describing is a MongoDB 
replica set (http://docs.mongodb.org/manual/replication/).  One of the 
replicas is the primary, to which all writes are directed.  You can have 
some number of secondaries, which get all the changes applied to the 
primary, and spread out the load for read access.  If you want a vaguely 
SQLAlchemy-flavored ORM, there's mongoengine (http://mongoengine.org/).
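
As a rough sketch of how workers might be pointed at such a replica set, 
the snippet below builds a standard MongoDB connection URI that names the 
replica set and asks for reads to go to secondaries when possible.  The 
host names, the database name "objects", the replica set name "rs0" and 
the helper replica_set_uri() are all made up for illustration; only the 
URI option names (replicaSet, readPreference) come from MongoDB itself.

```python
from urllib.parse import quote_plus, urlencode


def replica_set_uri(hosts, database, replica_set,
                    read_preference="secondaryPreferred"):
    """Build a MongoDB connection URI for a replica set.

    Hypothetical helper: it only assembles the string and does not
    contact any server.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    # replicaSet names the set; readPreference controls where reads go.
    options = urlencode({
        "replicaSet": replica_set,
        "readPreference": read_preference,
    })
    return "mongodb://{}/{}?{}".format(
        ",".join(hosts), quote_plus(database), options)


# Placeholder hosts and names, assumed for the example.
uri = replica_set_uri(
    ["db1.example.com:27017", "db2.example.com:27017"],
    database="objects",
    replica_set="rs0",
)
print(uri)
```

A URI like this can be handed to pymongo.MongoClient(uri) or to 
mongoengine.connect(host=uri).  Writes still go to the primary whatever 
the read preference is, and because replication is asynchronous, a read 
from a secondary may briefly return stale data; whether that is 
acceptable depends on how the workers use the objects.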

On the other hand, this may be overkill for what you're trying to do.  
Can you give us some more quantitative idea of your requirements?  How 
many objects?  How much total data is being stored?  How many queries 
per second, and what is the acceptable latency for a query?


