From pje@telecommunity.com Thu Aug 1 00:34:31 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Wed, 31 Jul 2002 19:34:31 -0400
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: <15688.25222.339977.30416@slothrop.zope.com>
References: <3.0.5.32.20020731174131.00904c10@telecommunity.com> <3.0.5.32.20020730150539.0089c240@telecommunity.com> <3.0.5.32.20020730135832.008fa690@telecommunity.com> <3.0.5.32.20020731174131.00904c10@telecommunity.com>
Message-ID: <5.1.0.14.0.20020731191704.051f9700@mail.telecommunity.com>

At 06:19 PM 7/31/02 -0400, Jeremy Hylton wrote:
>[Meta-comment: I'm sorry it's taking us so long to reach some kind of
>understanding on this issue. It seems like we keep talking past each
>other, but I'm not sure why.]
>
> >>>>> "PJE" == Phillip J Eby writes:
>
> [I wrote:]
> >> If I understand this example correctly, then there are three
> >> different objects that implement the resource manager interface:
> >>
> >> 1. persist->XML
> >> 2. XML->Database
> >> 3. Database
> >>
> >> It sounds like the application code only interacts with 1, and
> >> that 2 and 3 should be considered implementation details of 1.
> >> Thus, only 1 should register with the transaction, since it's the
> >> only independent entity.
> >>
> >> When the transaction commits, it first calls prepare() on 1.
> >> This delegates the responsibility for the commit to 2, which in
> >> turn delegates to 3. So for 1 to return True from its prepare, 2
> >> and 3 must also return True.
> >>
> >> Why doesn't this work? :-)
> >>
>
> PJE> Because 3 would be shared by other objects also being persisted
> PJE> to that SQL database, for just the first thing that comes to
> PJE> mind.
>
>If you call prepare() twice on a resource manager, it should return
>the same answer both times, right? If so, then it shouldn't matter if
>the same resource manager is being used as a top-level component and
>an internal component.
It will perform its prepare work the first >time it is called and then just return its vote the second time it is >called. The greater the guarantees that the transaction can give in its contract to the resource manager, the easier it is to write resource managers. I think that we should err on the side of making the transaction core more complex, if it makes implementation of other components easier. Specifically, the transaction should make guarantees that certain methods will be called a specific number of times (as I proposed in the Straw Man API), because it makes the resource manager code simpler -- i.e., less boilerplate needed to write them. > PJE> But that's an implementation detail. This is primarily an > PJE> architectural issue. > >I agree that it's an architectural issue. (It's good that we agree on >some things .) The example above sounds like a component-based >system, where there is a compound persist->xml->database component. >The subcomponents of this entity should not be registering themselves >with the transaction manager. A component should control all >communication of its constituent parts with other components. Er, no. See my first point. The "database" component, in the specific application example I have in mind, is *shared*. In addition to persist->xml->database there's also some persist->database taking place, on different objects stored in the same relational back-end. To re-state, there is *not* a three-part-component composed of three data managers, there are three data managers, loosely coupled via the objects they manage. You seem to be assuming that there's only one data manager to an application. I expect to be dealing with a variety of application scenarios where I will have a *bunch* of them, each written relatively independently of the others. In most cases, they'll have little indirect coupling to underlying data managers. But it will happen sometimes. > PJE> Data manager 1 is generic code written to work on an XML DOM. 
> PJE> It shouldn't have to *know* that the DOM *is* persistent, let > PJE> alone *how* it's persisted. > >The description of the first component implies that is supports >persistence objects and stores them using another component that >stores XML. That top-level component *must* know how to handle >persistent objects and transactions, as it implements those >interfaces. As Jim would say, Waaaaa! :) My whole point is that I don't *want* to have a "top-level component" in order to implement this scenario. It seems a poor component architecture that doesn't support delegation without implementation knowledge, and that's what you're asking for here. To create this top-level component, it has to know about implementation details of its children. But if the data managers are simply peer transaction participants, this is unnecessary. > PJE> You're calling for the placement of global architecture > PJE> knowledge into individual components, that should only be known > PJE> at a higher abstraction level. > >I thought I was arguing the opposite. Individual components should >not all talk to the global transaction manager. Instead, when a >component is assembled, the parts should be wired together so that >each knows who to communicate with. The application simply says, "Hello Mr. Transaction Manager. I'll be using the following collection of data managers today. Kindly inform them when you have something going on that they need to know about." That is a *lot* different than the degree of implementation knowledge required to assemble compound data managers. For one thing, it requires considerably less skill on the part of the application developer. From barry@zope.com Thu Aug 1 02:06:28 2002 From: barry@zope.com (Barry A. 
Warsaw) Date: Wed, 31 Jul 2002 21:06:28 -0400 Subject: [Persistence-sig] "Straw Man" transaction API References: <15685.57251.14632.949497@slothrop.zope.com> <15688.15508.288534.906790@slothrop.zope.com> Message-ID: <15688.35220.331816.465900@anthem.wooz.org> >>>>> "JH" == Jeremy Hylton writes: JH> The database needs some object to represent the particular JH> savepoint. A transaction could call savepoint() three times JH> and have three different states it could rollback to. I JH> decided a rollback object was clearer than a rollback() method JH> on the transaction that took a savepoint_id argument. Say you had savepoint(t1), savepoint(t2), and savepoint(t3) where t1 < t2 < t3. Then you rolled back savepoint(t1) and then try to rollback savepoint(t3), you'd get an exception right? -Barry From anthony@interlink.com.au Thu Aug 1 06:23:03 2002 From: anthony@interlink.com.au (Anthony Baxter) Date: Thu, 01 Aug 2002 15:23:03 +1000 Subject: [ZODB-Dev] Re: [Persistence-sig] "Straw Man" transaction API In-Reply-To: <15688.35220.331816.465900@anthem.wooz.org> Message-ID: <200208010523.g715N3625198@localhost.localdomain> >>> Barry A. Warsaw wrote > Say you had savepoint(t1), savepoint(t2), and savepoint(t3) where t1 < > t2 < t3. Then you rolled back savepoint(t1) and then try to rollback > savepoint(t3), you'd get an exception right? If you have multiple savepoints in the same transaction, should you be allowed to roll back the one that's not the most recent? To my brain, this doesn't make sense... -- Anthony Baxter It's never too late to have a happy childhood. From jim@zope.com Thu Aug 1 13:42:10 2002 From: jim@zope.com (Jim Fulton) Date: Thu, 01 Aug 2002 08:42:10 -0400 Subject: [ZODB-Dev] Re: [Persistence-sig] "Straw Man" transaction API References: <200208010523.g715N3625198@localhost.localdomain> Message-ID: <3D492CA2.1000108@zope.com> Anthony Baxter wrote: >>>>Barry A. Warsaw wrote >>>> >>Say you had savepoint(t1), savepoint(t2), and savepoint(t3) where t1 < >>t2 < t3. 
Then you rolled back savepoint(t1) and then try to rollback >>savepoint(t3), you'd get an exception right? >> > > If you have multiple savepoints in the same transaction, should you be > allowed to roll back the one that's not the most recent? To my brain, this > doesn't make sense... Rolling back to a non-recent savepoint implicitly rolls back the recent savepoints. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From barry@zope.com Thu Aug 1 13:46:57 2002 From: barry@zope.com (Barry A. Warsaw) Date: Thu, 1 Aug 2002 08:46:57 -0400 Subject: [ZODB-Dev] Re: [Persistence-sig] "Straw Man" transaction API References: <15688.35220.331816.465900@anthem.wooz.org> <200208010523.g715N3625198@localhost.localdomain> Message-ID: <15689.11713.625074.103729@anthem.wooz.org> >>>>> "AB" == Anthony Baxter writes: >> Barry A. Warsaw wrote >> Say you had savepoint(t1), savepoint(t2), and savepoint(t3) >> where t1 < t2 < t3. Then you rolled back savepoint(t1) and >> then try to rollback savepoint(t3), you'd get an exception >> right? AB> If you have multiple savepoints in the same transaction, AB> should you be allowed to roll back the one that's not the most AB> recent? To my brain, this doesn't make sense... I think (but am not sure) that the idea was that rolling back to an earlier savepoint would roll back all the intermediate ones. In either case, it means the savepoints have some shared state so that the proper exceptions would be raised if you Did Something Nasty. -Barry From pje@telecommunity.com Thu Aug 1 13:52:22 2002 From: pje@telecommunity.com (Phillip J. 
Eby) Date: Thu, 01 Aug 2002 08:52:22 -0400 Subject: [ZODB-Dev] Re: [Persistence-sig] "Straw Man" transaction API In-Reply-To: <3D492CA2.1000108@zope.com> References: <200208010523.g715N3625198@localhost.localdomain> Message-ID: <5.1.0.14.0.20020801084747.05ff4dc0@mail.telecommunity.com> At 08:42 AM 8/1/02 -0400, Jim Fulton wrote: >Anthony Baxter wrote: >>>>>Barry A. Warsaw wrote >>>Say you had savepoint(t1), savepoint(t2), and savepoint(t3) where t1 < >>>t2 < t3. Then you rolled back savepoint(t1) and then try to rollback >>>savepoint(t3), you'd get an exception right? >>If you have multiple savepoints in the same transaction, should you be >>allowed to roll back the one that's not the most recent? To my brain, this >>doesn't make sense... > >Rolling back to a non-recent savepoint implicitly rolls back the recent >savepoints. Oy. That makes my head hurt. Not on the interface side, where it makes perfect sense. More on the implementation side. The one DBMS whose savepoint implementation I'm sufficiently familiar with to try and figure this out, won't allow this. You can only have one savepoint. I don't think many DBMS systems offer such flexible savepoint capabilities, so it's going to be important to work out what happens when a data manager can only support a single savepoint at a time. From barry@zope.com Thu Aug 1 14:11:44 2002 From: barry@zope.com (Barry A. Warsaw) Date: Thu, 1 Aug 2002 09:11:44 -0400 Subject: [ZODB-Dev] Re: [Persistence-sig] "Straw Man" transaction API References: <200208010523.g715N3625198@localhost.localdomain> <5.1.0.14.0.20020801084747.05ff4dc0@mail.telecommunity.com> Message-ID: <15689.13200.31875.494136@anthem.wooz.org> >>>>> "PJE" == Phillip J Eby writes: PJE> The one DBMS whose savepoint implementation I'm sufficiently PJE> familiar with to try and figure this out, won't allow this. PJE> You can only have one savepoint. Do you mean you can only have one savepoint in total? 
PJE> I don't think many DBMS systems offer such flexible savepoint PJE> capabilities, so it's going to be important to work out what PJE> happens when a data manager can only support a single PJE> savepoint at a time. Maybe multiple savepoint rollbacks can be implemented in the connection by iterating through each individual savepoint. -Barry From pje@telecommunity.com Thu Aug 1 15:12:10 2002 From: pje@telecommunity.com (Phillip J. Eby) Date: Thu, 01 Aug 2002 10:12:10 -0400 Subject: [ZODB-Dev] Re: [Persistence-sig] "Straw Man" transaction API In-Reply-To: <15689.13200.31875.494136@anthem.wooz.org> References: <200208010523.g715N3625198@localhost.localdomain> <5.1.0.14.0.20020801084747.05ff4dc0@mail.telecommunity.com> Message-ID: <3.0.5.32.20020801101210.019cad90@telecommunity.com> At 09:11 AM 8/1/02 -0400, Barry A. Warsaw wrote: > >>>>>> "PJE" == Phillip J Eby writes: > > PJE> The one DBMS whose savepoint implementation I'm sufficiently > PJE> familiar with to try and figure this out, won't allow this. > PJE> You can only have one savepoint. > >Do you mean you can only have one savepoint in total? I meant, active at a given point in time; that is, there is only one savepoint at any given moment that you can roll back to. But it turns out I misspoke, at least in relation to Sybase 12.5. I just rechecked the manual and it appears to allow rolling back to arbitrary named savepoints. I'm not sure why I thought this wasn't the case, although perhaps I am thinking of an older version. I haven't been working with 12.5 long. I think I'm going to go back and look at the manuals for some of the other databases I'm using or plan to use in future, and verify what nesting or savepoint capabilities they have. It seems to me that one could simulate savepoints through the use of nested transactions, or vice versa. I'll also take a look at JDBC's metadata variables for transactions, to get an idea of what variations of capabilities are likely to be out there. 
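To make the savepoint semantics discussed in this thread concrete, here is a minimal in-memory sketch. It assumes the behaviour Jim Fulton described (rolling back to a non-recent savepoint implicitly rolls back the more recent ones) and treats a later savepoint as unusable once that has happened. The names Transaction, Savepoint and RollbackError, and the dictionary used as "state", are illustrative assumptions only; this is not the Straw Man API and not any particular DBMS's behaviour.

```python
class RollbackError(Exception):
    """Raised when rolling back to a savepoint that is no longer valid."""


class Savepoint:
    """Hypothetical rollback object returned by Transaction.savepoint()."""

    def __init__(self, txn, index, snapshot):
        self._txn = txn
        self._index = index        # position in the transaction's savepoint list
        self._snapshot = snapshot  # shallow copy of the state at savepoint time
        self.valid = True

    def rollback(self):
        self._txn._rollback_to(self)


class Transaction:
    """Toy transaction whose entire state is one dictionary (an assumption)."""

    def __init__(self):
        self.state = {}
        self._savepoints = []

    def savepoint(self):
        sp = Savepoint(self, len(self._savepoints), dict(self.state))
        self._savepoints.append(sp)
        return sp

    def _rollback_to(self, sp):
        if not sp.valid:
            raise RollbackError("savepoint was discarded by an earlier rollback")
        # Restore the saved state.
        self.state = dict(sp._snapshot)
        # Every savepoint taken after sp is implicitly rolled back and dropped.
        for later in self._savepoints[sp._index + 1:]:
            later.valid = False
        del self._savepoints[sp._index + 1:]


txn = Transaction()
txn.state["a"] = 1
s1 = txn.savepoint()
txn.state["a"] = 2
s2 = txn.savepoint()
txn.state["a"] = 3
s3 = txn.savepoint()

s1.rollback()          # back to the state at s1; s2 and s3 are discarded
try:
    s3.rollback()      # Barry's scenario: t1 first, then t3
except RollbackError as exc:
    error = exc
```

Because the savepoint objects share state through the transaction, the invalid rollback can be detected. A data manager that supports only one active savepoint would have to do something different, which this sketch does not attempt to model.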
After I've got a better idea of what the different DB's do or don't support in this area, I'll comment again. :) From jeremy@alum.mit.edu Thu Aug 1 15:58:50 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Thu, 1 Aug 2002 10:58:50 -0400 Subject: [ZODB-Dev] Re: [Persistence-sig] "Straw Man" transaction API In-Reply-To: <200208010523.g715N3625198@localhost.localdomain> References: <15688.35220.331816.465900@anthem.wooz.org> <200208010523.g715N3625198@localhost.localdomain> Message-ID: <15689.19626.780605.301525@slothrop.zope.com> >>>>> "AB" == Anthony Baxter writes: >>>> Barry A. Warsaw wrote >> Say you had savepoint(t1), savepoint(t2), and savepoint(t3) where >> t1 < t2 < t3. Then you rolled back savepoint(t1) and then try to >> rollback savepoint(t3), you'd get an exception right? Yes. RollbackError. AB> If you have multiple savepoints in the same transaction, should AB> you be allowed to roll back the one that's not the most recent? AB> To my brain, this doesn't make sense... It does make sense. It's all about saving partial progress so that you can return to it later if something goes wrong. Nested transactions are an example of something that wants to rollback to arbitrary earlier savepoints. If part of a subtransaction fails, you need to rollback to th beginning of the subtransaction. You need not abort the entire transaction, because some part of the application can recover and continue from the last savepoint. In particular, the state of persistent objects gets rollback but the total state of your application (e.g. control flow, non-persistent local variables, etc.) is not. Jeremy From pje@telecommunity.com Thu Aug 1 19:36:43 2002 From: pje@telecommunity.com (Phillip J. 
Eby)
Date: Thu, 01 Aug 2002 14:36:43 -0400
Subject: [ZODB-Dev] Re: [Persistence-sig] Nesting and savepoints
In-Reply-To: <3.0.5.32.20020801101210.019cad90@telecommunity.com>
References: <15689.13200.31875.494136@anthem.wooz.org> <200208010523.g715N3625198@localhost.localdomain> <5.1.0.14.0.20020801084747.05ff4dc0@mail.telecommunity.com>
Message-ID: <3.0.5.32.20020801143643.0089dcd0@telecommunity.com>

At 10:12 AM 8/1/02 -0400, Phillip J. Eby wrote:
>
>I think I'm going to go back and look at the manuals for some of the other
>databases I'm using or plan to use in future, and verify what nesting or
>savepoint capabilities they have. It seems to me that one could simulate
>savepoints through the use of nested transactions, or vice versa. I'll
>also take a look at JDBC's metadata variables for transactions, to get an
>idea of what variations of capabilities are likely to be out there.
>
>After I've got a better idea of what the different DB's do or don't support
>in this area, I'll comment again. :)
>

An update on the transaction models of some common database systems and APIs:

PostgreSQL: no nesting, no savepoints

BerkeleyDB: nesting; parallel simultaneous child transactions allowed

Sybase: nesting w/sequential children only; rollback rolls *entire*
transaction back unless savepoints are used to mark child transaction
beginnings; savepoints can be named and rollback to any point is possible

Oracle: no nesting; named savepoints with rollback to any point; savepoints
can be used to emulate sequential child transactions to arbitrary nesting
depth.

Java JTA/JTS: support for nested transactions is optional; the XAResource
interface explicitly does *not* support nested transactions, and nothing in
the JTA spec defines the semantics of nested transactions, so whether
nested transactions can be parallel or must be sequential doesn't appear to
be specified.

Java JDBC: spec does not mention nested transactions at all. Named
savepoints are supported; one calls savepoint =
connection.setSavepoint("savepoint_name"), and rolls back with
connection.rollback(savepoint). There is a metadata field for whether a
driver supports savepoints.

Note: by "sequential children", I mean that one can only have one
uncommitted child transaction per parent transaction. In this model, a
"begin" operation always nests within the outer transaction, rather than
creating a "parallel" child transaction. BerkeleyDB supports parallel
children, where you can have more than one direct child transaction of a
given parent. In other words, the sequential children model has a stack of
active transactions, while the parallel children model has a tree of active
transactions.

Anybody have any other data points to share on this subject?

From guido@python.org Wed Aug 7 14:23:01 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 07 Aug 2002 09:23:01 -0400
Subject: [Persistence-sig] Announcing this SIG
Message-ID: <200208071323.g77DN1b03301@pcp02138704pcs.reston01.va.comcast.net>

I don't think this SIG was ever announced on the python-announce list
(I searched the May, June and July archives for postings with
"persist" in their subject, and found none).

I think it should, to make sure we reach enough people with possible
interest in the subject. We've only got about 50 subscribers now.
The meta-sig is a pretty small audience to announce a SIG, and I don't
know where else it's been announced apart from some Zope forum.

I'd be happy to post the original SIG announcement with a preface (to
stress the SIG's scope limitations).
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Aug 9 16:00:04 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 09 Aug 2002 11:00:04 -0400 Subject: [Persistence-sig] NEW SIG: Persistence-SIG Message-ID: <200208091500.g79F04n07136@pcp02138704pcs.reston01.va.comcast.net> I'd like to bring a new Python Special Interest Group (SIG) under your attention. The Persistence-SIG was started early July and has seen some discussion already, but can use a wider audience. Home page: http://www.python.org/sigs/persistence-sig/ Mailing list: http://mail.python.org/mailman/listinfo/persistence-sig Archives: http://mail.python.org/pipermail-21/persistence-sig/ The SIG's charter is focused on proposing common frameworks for transaction coordination, basic persistence management (without proposing a particular storage manager), and cache management. The SIG home page has a more elaborate list of topics that are in scope as well as some examples of topics that are not. The current thinking in the SIG is to adopt an API similar to that which is currently used in Zope's ZODB (after removing Zope-specific warts and historical accidents). But that's by no means cast in stone, and I'm hoping that some new blood in the SIG will either validate this choice or present an alternative that may find wider-spread acceptance. For more information, please see the SIG home page! --Guido van Rossum (home page: http://www.python.org/~guido/) From jiba@tuxfamily.org Tue Aug 13 13:25:54 2002 From: jiba@tuxfamily.org (Lamy Jean-Baptiste) Date: Tue, 13 Aug 2002 14:25:54 +0200 (CEST) Subject: [Persistence-sig] Observing object changes Message-ID: <20020813142252.A2281@localhost.club-internet.fr> Hi everyone, I've just see the new persistance-SIG and i realize i have already worked on observing object changes. 
I use it for updating graphical interfaces
when objects are changed, and for debugging, but it appears it can be
useful for OO database too -- so such a feature would rock !


Here is the API i use:

    addevent   (obj, func) -- Add the event FUNC (=an observer) to OBJ
    removeevent(obj, func) -- Remove the event FUNC from OBJ

FUNC is a callable which is then called each time OBJ is modified. It takes
4 arguments: the modified object, the name of the modified attribute, the
new value and the old value.

Example of use:

    class Point:
      def __init__(self, x, y):
        self.x, self.y = x, y

    def event(obj, attr, new, old):
      print attr, "was", old, "is now", new

    p = Point(0.0, 0.0)
    addevent(p, event)
    p.x = 1.0    # prints "x was 0.0 is now 1.0"


I have already written a pure Python implementation of that (feel free to
ask for the code if you're interested); it works well but it is really an
ugly hack ! The principle is to change the class of the observed object to
a new class that defines a __setattr__ method.

However, the interest of this approach is that it works for ANY object
(including list and dict, and already existent object); the object doesn't
have to be designed to fit it (e.g. the Point class above was not written
especially to fit a particular observation framework).

I wonder if it is possible to write a cleaner version of it in C...?


I hope this helps. Please don't strike me if i am off-topic ;-)

Jiba

From shane@zope.com Tue Aug 13 14:34:17 2002
From: shane@zope.com (Shane Hathaway)
Date: Tue, 13 Aug 2002 09:34:17 -0400
Subject: [Persistence-sig] Observing object changes
References: <20020813142252.A2281@localhost.club-internet.fr>
Message-ID: <3D590AD9.4090001@zope.com>

Lamy Jean-Baptiste wrote:
> I have already written a pure Python implementation of that (feel free to
> ask for the code if you're interested); it works well but it is really an
> ugly hack ! The principle is to change the class of the observed object to
> a new class that defines a __setattr__ method.
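
The class-swapping principle quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Jiba's actual implementation (which was not posted): the function names addevent and removeevent come from the message, while the registries and the generated subclass are hypothetical. The sketch handles only instances of ordinary Python classes and only plain attribute assignment, and it keys observers by id(obj), which is good enough for a demonstration but not for long-lived programs.

```python
# Hypothetical registries; not part of any posted implementation.
_observers = {}          # id(obj) -> list of callbacks
_observing_classes = {}  # original class -> generated observing subclass


def _observing_class(cls):
    """Return (and cache) a subclass of cls whose __setattr__ notifies observers."""
    if cls not in _observing_classes:
        def notifying_setattr(self, name, new):
            old = getattr(self, name, None)
            cls.__setattr__(self, name, new)   # perform the real assignment
            for func in list(_observers.get(id(self), ())):
                func(self, name, new, old)
        _observing_classes[cls] = type(
            "Observed" + cls.__name__, (cls,), {"__setattr__": notifying_setattr}
        )
    return _observing_classes[cls]


def addevent(obj, func):
    """Register func(obj, attr, new, old) to be called on attribute assignment."""
    if id(obj) not in _observers:
        _observers[id(obj)] = []
        # Swap the instance's class for the observing subclass.
        object.__setattr__(obj, "__class__", _observing_class(obj.__class__))
    _observers[id(obj)].append(func)


def removeevent(obj, func):
    """Unregister func; restore the original class when no observers remain."""
    funcs = _observers[id(obj)]
    funcs.remove(func)
    if not funcs:
        del _observers[id(obj)]
        object.__setattr__(obj, "__class__", obj.__class__.__bases__[0])
```

With the Point class from the example, calling addevent(p, event) and then assigning p.x = 1.0 would invoke event(p, "x", 1.0, 0.0). Changes made by mutating an attribute's value in place, such as appending to a list held by the object, are not seen by this approach.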
> > However, the interest of this approach is that it work for ANY object > (including list and dict, and already existent object); the object doesn't > have to be designed to fit it (e.g. the Point class above was not written > especially to fit a particular observation framework). > > I wonder if it is possible to write a more cleaner version of it in C...? > > > I hope this helps. Please don't strike me if i am off-topic ;-) This is right on topic. The current proposal for a persistence API and the current ZODB mandate that a persistent object derive from a special base class. In ZODB the base class, implemented in C, is called "Persistence.Persistent" and it implements __setattr__ and __getattribute__. But it's clear there are many other reasons one might want to hook into attribute changes and accesses. There is a field of research called "aspect oriented programming" that makes extensive use of events like these. (http://aosd.net/) If Python had attribute access observability built in, the persistence framework we're working on could be simpler. Shane From pyth@devel.trillke.net Wed Aug 14 00:14:52 2002 From: pyth@devel.trillke.net (holger krekel) Date: Wed, 14 Aug 2002 01:14:52 +0200 Subject: [Persistence-sig] Observing object changes In-Reply-To: <20020813142252.A2281@localhost.club-internet.fr>; from jiba@tuxfamily.org on Tue, Aug 13, 2002 at 02:25:54PM +0200 References: <20020813142252.A2281@localhost.club-internet.fr> Message-ID: <20020814011452.U10625@prim.han.de> Lamy Jean-Baptiste wrote: > I've just see the new persistance-SIG and i realize i have already worked > on observing object changes. I use it for updating graphical interfaces > when objects are changed, and for debugging, but it appears it can be > usefull for OO database too -- so such a feature would rock ! 
> > > Here is the API i use: > > addevent (obj, func) -- Add the event FUNC (=an observer) to OBJ > removeevent(obj, func) -- Remove the event FUNC from OBJ > > I have already written a pure Python implementation of that (feel free to > ask for the code if you're interested); it works well but it is really an > ugly hack ! The principle is to change the class of the observed object to > a new class that define a __setattr__ method. This is not enough. Not every modifiction of an object goes through '__setattr__', e.g. myobj.somelist.append(42) would modify 'myobj' but you wouldn't notice. So everything an object passes out (like its attribute) would need to have a thin wrapper which holds a reference to "its" object and notifies upon change. This is likely to be a recursive process. regards, holger From shane@zope.com Wed Aug 14 03:19:58 2002 From: shane@zope.com (Shane Hathaway) Date: Tue, 13 Aug 2002 22:19:58 -0400 (EDT) Subject: [Persistence-sig] Observing object changes In-Reply-To: <20020814011452.U10625@prim.han.de> Message-ID: On Wed, 14 Aug 2002, holger krekel wrote: > Lamy Jean-Baptiste wrote: > > Here is the API i use: > > > > addevent (obj, func) -- Add the event FUNC (=an observer) to OBJ > > removeevent(obj, func) -- Remove the event FUNC from OBJ > > > > I have already written a pure Python implementation of that (feel free to > > ask for the code if you're interested); it works well but it is really an > > ugly hack ! The principle is to change the class of the observed object to > > a new class that define a __setattr__ method. > > This is not enough. Not every modifiction of an object goes through > '__setattr__', e.g. > > myobj.somelist.append(42) > > would modify 'myobj' but you wouldn't notice. So everything an object > passes out (like its attribute) would need to have a thin wrapper which > holds a reference to "its" object and notifies upon change. This is > likely to be a recursive process. This is a classic problem in ZODB applications. 
The normal ZODB solution
looks like this:

    myobj.somelist.append(42)
    myobj._p_changed = 1

an alternative:

    myobj.somelist.append(42)
    myobj.somelist = myobj.somelist

Most ZODB programmers can accept this; it's not too bad. But here's the
thing that concerns me: both of the above well-known idioms are still not
quite correct. If, for whatever reason, an exception occurs between
appending to the list and notifying the persistence framework, and the
transaction aborts, the list may be left in an inconsistent state. The
list won't be rolled back if its containing object wasn't changed in some
other way. If 42 is an important number, this might be a serious problem.
;-)

If you just reverse the two statements, the lurking bug goes away. But it
still bothers me because the programmer has to be so careful to get it
just right. I hope this SIG will look into ways of solving this (for all
kinds of persistence, not just ZODB).

Shane

From aerd@retemail.es Fri Aug 16 01:25:51 2002
From: aerd@retemail.es (Ernesto Revilla)
Date: Fri, 16 Aug 2002 02:25:51 +0200
Subject: [Persistence-sig] join the SIG
Message-ID: <002401c244bb$7e401fe0$0100a8c0@sicem.biz>

Dear all,

My name is Ernesto Revilla (Spain) and I'm also very interested because we
are designing a new ERP system (for small and medium sized firms) which
will have to use a lot of business objects. I would be grateful if anyone
could summarize somehow the results till now.

I'm actually working on a new general purpose persistence framework,
because the one included in Webware named MiddleKit
(http://webware.sourceforge.net/Webware-0.7/MiddleKit/Docs/index.html) does
not provide transactions. And there are no comparable frameworks to Java
Data Objects
(http://java.sun.com/aboutJava/communityprocess/review/jsr012/JDO_0_8.pdf)
for Python.

I started with this topic about 6 months ago but do not have very much
experience.
After studying the persistence layer white-paper written by Scott Ambler (http://www.ambysoft.com/persistenceLayer.pdf) I peeked thru some implementations for Java like Castor (http://www.castor.org) and persistence-layer (http://player.sourceforge.net ) . I propose: divide the SIG purpose. I would like to see a level 1 minimal specification, especiallly the API, because it might be difficult to agree. I would discard the savepoints discussions but perhaps allow nested transactions inside the persistence-layer (not the persistence mechanism). Anyway, I would try to keep things very simple, so we could get a initial level 1 implementation soon (end of the year). I'm willing to spend something like 15-20 hours a week on this (depending if the proposed solution goes in the same direction as what our company needs for the new project). I thought something like this: There is one or more class mapping files which specify which classes there are, which attributes they have, also the types of attributes, and to which persistent mechanism it should map. Although the map file could specify just one persistence mechanism to use, the classes and the attributes can override this information. For each persistence mechanism, there could be additional information, e.g. a relational database would specify connection info, table names, field names, primary keys and foreign keys, a file storage would use 'directory' and 'filename'. The supplied information should be specific to each type of persistence mechanism (relational databases, files, bsd sotre like, and! memory only storages (http://www.prevayler.org/) ). In MiddleKit, after specifying a class map, a batch command creates code for the default persistent classes of the class map, then a user can override them inheriting from these generated classes. This is because the mapping file only specifies the data attributes, not the code functions. May be the generation step isn't necessary thru Meta-Classing. 
Perhaps it could be the other way round: just read the class definitions
and, when storing, look up the class map.

A minimal API could be something like this:

Say the class map specifies that class Invoice has the attributes
'Reference' of type string, 'Customer' of type Customer, 'lines' of type
'List of ArticleLine'

    from Persistence import PersistenceManager
    pm=PersistenceManager()

    # The loading of the classmap automatically defines the classes with
    # all their attributes as properties.
    # The base 'set' method defined in the basic 'PersistentObject' class
    # will do type checking and others
    pm.loadClassMap('/homer/erny/classmap')

    # just a class definition with user stuff
    # like business rules, updating attributes, accessing related information
    class Invoice(pm.classes['Invoice']):
        def _totalAmount(self):
            amount=0
            for l in self.lines:
                amount+=l.amount
            return amount
        totalAmount=property(_totalAmount,None)

    # how to work with the objects:
    # all retrieves, updates, and lookups are done inside a transaction
    # this will isolate the modifications to other users. Optimistic
    # locking is used implicitly
    tr=pm.Transaction()
    result=tr.retrieve(oql="SELECT i FROM Invoice i WHERE i.customer.name LIKE 'Thomson*'")
    amount=0
    while result.hasMore():
        inv=result.next()
        inv.lines.append(ArticleLine(ref='BOOK1', qty=1))   # modifying attributes
        amount+=inv.totalAmount
    # of course you could also set whatever attributes. Note that this is
    # all done in a transaction.

    # delete the last accessed invoice:
    del inv   # Better inv.delete() ?

    # Adding the objects:
    inv=Invoice()
    # We have to add it explicitly, because otherwise we would not know to
    # what transaction it belongs
    tr.add(inv)
    inv.customer=tr.retrieve(oid=45323)
    inv.lines.append(ArticleLine(ref='TOY2',qty=3))

    # after finishing all things, do:
    tr.commit()
    # note that the transactions began automatically, no tr.begin() was needed.

    # Transactions can have nested transactions:
    tr2=tr.Transaction()

    # Metaclass information could be available like this:
    attrtype=inv._class.customer.type
    # other properties are name, description, store information, etc.

=======================

* During commit, the persistence-layer would check that no other person
  has changed the same objects, throwing a TransactionError if needed.
* The class map specifies which classes should use optimistic locking and
  which ones pessimistic locking.
* An exception in the code or the deletion of a transaction does a rollback
  automatically.
* For some classes, there should be a retry mechanism so the object would
  be re-read and the changes reapplied
* Note that nothing until here says something about the type of storage or
  if it supports transactions or inheritance.

=======================

Implementation hints:

* class definitions will be created thru meta-classing with properties
  created automatically and inheriting either from other persistent classes
  of the classmap or a PersistentObject base class (could specify other
  base class)
* the property set function should do type-checking with 'isinstance'
* whenever a user accesses an object, the object is read-in in a
  system-wide cache and a 'proxy' object is returned.
* all attribute changes will be recorded in the 'proxy' object. All called
  methods on a PersistentObject will be tracked, so the changes can be
  reapplied if necessary. The transaction is a container of the new object
  states, and the actions applied.
* The method-call tracking can be implemented thru metaclasses which scan
  the user persistent class at definition time and reroute the calls to a
  tracking procedure which in turn calls the user method. (Sadly, I can't
  override MethodType, or FunctionType and tell the interpreter to use them
  instead of the default. In 'object' terms, we need the calls to be
  'serializable'.)
* All changes are done in memory (transaction space) until the whole transaction is committed, at which moment it would start transactions in the storages used (if supported), lock all used objects, update them, and thereafter unlock them. This would also update the cache.
* Like PostgreSQL's multi-version concurrency control, we could have several versions of the same object in the cache. So with in-memory changes, readers don't block writers, nor the other way round. The important thing is that a user has a consistent image of an object, although it may be out of date.
* Objects should have backpointers to their containers, which is especially helpful for query optimization.
* I would like old object versions to be writable out to another storage system.

After all, I hope that this is not too far off track. As said before, I would like to see a minimal API spec with:
* start, commit, rollback transactions (exclude nested transactions initially?)
* retrieve, update, create and delete objects (I borrowed the Castor OQL implementation for porting to Python)
* access to meta-data
* something about retry operations for very frequently updated objects, such as global counters or total amounts per period
* loading class maps
* minimal features of a class map (the format later): classes with class names, superclasses, abstract, etc.; attribute names and types; also types of relations (for example, 'embedded' for UML composition and 'linked' for UML aggregation and association, or, at a slightly lower level, 'on delete cascade', 'on delete detach', and so on).

With best regards,
Erny

From stephan.diehl@gmx.net Fri Aug 16 14:55:18 2002
From: stephan.diehl@gmx.net (Stephan Diehl)
Date: Fri, 16 Aug 2002 15:55:18 +0200
Subject: [Persistence-sig] newbie on this list
Message-ID:

Dear all,

my name is Stephan Diehl and I'm living in Germany. After reading some mail from this list I'm still not really sure what the purpose of this SIG is.
Please excuse me if the rest of the section is beside the point :-)

I've written a persistence layer for some (internal) project. At some point, I want to publish it as open source, but there is no documentation yet, and it will take at least a couple of weeks to write some and get the software into a releasable state. If there is interest, I could give out the stuff anyway.

So, why on earth did I write another persistence layer? At first, I used standalone ZODB but got cold feet due to the black-box behaviour; I just want to see what's in the database :-)

Design requirements (not everything implemented):
    thread safe
    process safe (more than one process can access the store without problems)
    easy to use
    objects are stored in plain text and should be usable from other languages as well (maybe PHP for a web frontend)
    every object has a unique id

You could use the system in the following way:

    from PStore import PStore
    from PObject import PObject, PList, PDict

    class class1(PObject): pass

    store1 = PStore(...)
    store2 = PStore(...)  # the same store

    store1['entry1'] = obj1 = class1()
    obj1.a = 1
    obj1.b = 'some string'
    store1.update()

    obj2 = store2['entry1']
    # obj1 and obj2 now have the same attributes
    obj2.c = 100
    # obj1 and obj2 are now different
    store2.update()
    # obj1 and obj2 again have the same attributes

-----------------------------------------------------------------------------------
So, all in all, the stuff works similarly to ZODB, but uses MySQL as a database. You can even store objects that are not subclassed from PObject, but you can't change them.

As I said, if anybody is interested or has further questions, please contact me.

Cheers

Stephan

From shane@zope.com Fri Aug 16 15:18:37 2002
From: shane@zope.com (Shane Hathaway)
Date: Fri, 16 Aug 2002 10:18:37 -0400 (EDT)
Subject: [Persistence-sig] newbie on this list
In-Reply-To:
Message-ID:

On Fri, 16 Aug 2002, Stephan Diehl wrote:
> I've written a persistence layer for some (internal) project.
At some point, > I want to publish it as open source, but there is no documentation yet, and > it will take at least a couple of weeks to write some and get the software > into a releasable state. > If there is interest, I could give out the stuff anyway. > > so, why on earth did I write another persistence layer? > At first, I used Standalone ZODB but got cold feet due to the blackbox > behaviour, I just want to see, what's in the database :-) You're in the right place. I think there are two possible outcomes of this SIG: 1) we develop a persistence layer that's easy to understand; or, 2) we make ZODB's persistence layer easy to understand. I think ZODB is amazingly useful and extensible. It meets all the requirements you listed and more; with a little extension it can store in plaintext in any database. But extending ZODB requires deep Python zen, and even a little C zen. So we want to either reduce or replace that complexity, so that we can all reuse a common persistence layer. Shane From stephan.diehl@gmx.net Fri Aug 16 16:14:30 2002 From: stephan.diehl@gmx.net (Stephan Diehl) Date: Fri, 16 Aug 2002 17:14:30 +0200 Subject: [Persistence-sig] newbie on this list In-Reply-To: References: Message-ID: > > You're in the right place. That's good to know :-) > > I think there are two possible outcomes of this SIG: > > 1) we develop a persistence layer that's easy to understand; or, > > 2) we make ZODB's persistence layer easy to understand. are you talking about the usage of such a persistence layer or the inner workings? The usage of ZODB is nearly as easy as it can get. > > I think ZODB is amazingly useful and extensible. It meets all the > requirements you listed and more; with a little extension it can store in > plaintext in any database. But extending ZODB requires deep Python zen, > and even a little C zen. Well, any piece of software that uses object introspection, imports on the fly and the likes will be kind of complicated. 
The only question is whether the user has to see that.

> > So we want to either reduce or replace that complexity, so that we can all
> > reuse a common persistence layer.

O.K. I reread this SIG's scope and can see now where this is heading. Basically, we aim for a persistence API just like the DB API.
Wouldn't it be nice to have at least a rudimentary query interface as well? (I know, it's not covered by this SIG.) Something like
    getObjects(attr, regExp)
returns a list of objIds (or a list of objects).

I'm still not sure where to begin? Should one have a look at the ZODB code? Or just post everything that comes to mind?

> > Shane

Stephan

From shane@zope.com Fri Aug 16 16:41:09 2002
From: shane@zope.com (Shane Hathaway)
Date: Fri, 16 Aug 2002 11:41:09 -0400 (EDT)
Subject: [Persistence-sig] newbie on this list
In-Reply-To: <200208161513.g7GFDPo28879@smtp.zope.com>
Message-ID:

On Fri, 16 Aug 2002, Stephan Diehl wrote:
> > I think there are two possible outcomes of this SIG:
> >
> > 1) we develop a persistence layer that's easy to understand; or,
> >
> > 2) we make ZODB's persistence layer easy to understand.
>
> are you talking about the usage of such a persistence layer or the inner
> workings? The usage of ZODB is nearly as easy as it can get.

I'm talking more about the inner workings. ZODB is a black box to most people. It's easy to use, but as soon as people want to store something other than pickles, they dismiss even the components of ZODB that don't assume pickles, because it looks too hard to dive in. We want to build distinguishable components with clear contracts that fulfill a lot of people's needs for object persistence.

> (snip)
>
> Wouldn't it be nice to have at least a rudimentary query interface as well? (I
> know, it's not covered by this SIG.) Something like
> getObjects(attr, regExp)
> returns a list of objIds (or a list of objects).

It would be nice, but the task we're trying to achieve is already complex enough.
Do you really want to use a regex on a million records? ;-)

> I'm still not sure where to begin? Should one have a look at the ZODB code?
> Or just post everything that comes to mind?

It's your call. Be creative. :-)

Shane

From aerd@retemail.es Fri Aug 16 18:37:34 2002
From: aerd@retemail.es (Ernesto Revilla)
Date: Fri, 16 Aug 2002 19:37:34 +0200
Subject: [Persistence-sig] Is is possible to separate programming API from IDataManager api?
Message-ID: <001c01c2454b$a08e9dc0$0100a8c0@sicem.biz>

I'm sorry if my former mail was off track. I didn't read all the posted messages, which is really hard. There are a lot of implementation details.

Is it possible to separate the programming API from the IDataManager API and define them separately and independently? Can there be a mechanism to vote on the API, like a Wiki with a voting mechanism (with deadlines)?

Could we get a level 1 API, without multi-thread issues and multiple savepoints? (20/80 -> use 20% of resources to solve 80% of use cases, leave other 15% for level 2 API.) Please, just CRUD (basic Create, Read, Update, Delete). Transactions with the minimum, i.e. begin, commit and rollback, without nesting. Please set a minimum of metadata, so the resource manager can work out how to store the data. Without an observation framework, which can be added later on top of PersistentObject (ObservablePersistentObject). Just the programming API.

==========================
Some general use cases:

Use case 1: an application wants to use just one new relational database for storing information (relational, because there may be other applications accessing it, especially reporting tools).

Use case 2: the same with ZODB.

Use case 3: an application wants to access different models in the same database. (The application integrates two basic applications, each of which has an independent data model.)

Use case 4: an application has objects with some of their attributes loaded from another storage system.
This could be the case for new applications which integrate old data but want to add functionality for new uses, such as scanned documents, audio, and images.

===========================
Request: Can we have an index document with interesting links on ZODB4, general persistence, etc.? (probably on another Wiki page)

===========================
Technical question, a bit off track: Can objects have back-links, so that a possible OQL interpreter (borrowing code from Castor) could easily do query optimizations? E.g. "SELECT i FROM Invoice i WHERE i.customer.name LIKE 'Thomson*'" would jump directly to the customer objects and go back to the Invoice objects, that is, inverse traversal of objects. Do ZODB objects point to their containers?

Erny

From titus@caltech.edu Sun Aug 18 01:03:11 2002
From: titus@caltech.edu (Titus Brown)
Date: Sat, 17 Aug 2002 17:03:11 -0700
Subject: [Persistence-sig] 'cucumber'
Message-ID: <20020818000311.GA14227@caltech.edu>

Hi everyone,

I thought I'd toss my own little package into the fray. I've written a fairly simple O/R mapping system named 'cucumber' that sits on top of PostgreSQL. By making use of PG's inheritance hierarchies, cucumber class inheritance relations can be mapped directly into PostgreSQL in a very simple and transparent way.

Currently there is a Python implementation, but I hope to have Perl and Java interfaces soon; because all of the data is stored in the database, and because the class files are quite sparing, it's entirely possible to make cross-language data retrieval work.

cucumber is still quite young, but it's been used in three separate projects (all mine, of course) and it seems to work quite well. I'm continuing work on it because it satisfies a couple of needs:

* cross-language access is potentially quite useful;
* access to a cucumber database can be done through straight SQL, without any Python/Perl/Java knowledge; you only need to be careful when creating tables;
* it's quite lightweight and easy to extend.
I don't regard it as mature -- I've labeled the current CVS version as 0.1.1, to give you an idea -- but I think it may contrast interestingly with other approaches.

Some of the drawbacks of the package's immaturity and my own lack of intelligence are:

* I haven't yet figured out how to get rid of SQL when building lists of objects (done via a "Catalog" class, which takes SQL WHERE clauses);
* the exception/error-reporting hierarchy is, umm, "not well developed";
* documentation is, of course, non-existent.

All in all, though, it works and works well for a quick hack of under 1000 lines of Python that most Python hackers could understand.

cucumber is checked into CVS at SourceForge, and I'm happy to

To show you the basic flavor of things, you first create your various tables in straight SQL:

--- example.sql
CREATE TABLE classes (
    id INTEGER PRIMARY KEY,
    name TEXT,
    description TEXT
);

CREATE SEQUENCE object_unique_id;

INSERT INTO classes VALUES (1, 'example.BaseObject', 'base example object');
INSERT INTO classes VALUES (2, 'example.Fruit', 'fruit object.');

CREATE TABLE base_objects (
    id INTEGER PRIMARY KEY DEFAULT NEXTVAL('object_unique_id'),
    class_id INTEGER REFERENCES classes DEFAULT 1
);

CREATE TABLE fruits (
    class_id INTEGER REFERENCES classes DEFAULT 2,
    name TEXT
) INHERITS (base_objects);
---

and then write a class file:

--- example/Fruit.py
from cucumber import Object

class Fruit(Object):
    table = 'fruits'
    mymembers = ('name',)
    myrefmembers = ()
    myobjmembers = {}
---

and finally, you use it:

-- run-example
#! /usr/bin/env python
import sys

# import the object manager
from example.ExampleManager import ExampleManager

# import a class to use
from example.Fruit import Fruit

# create an instance around the database 'cuc-example'
manager = ExampleManager('cuc-example')

#
# create a new Fruit
#

orange = manager.create(Fruit, name='orange')
orange_id = orange.id
manager.commit()                # save it into the database

#
# load it back in from the id.
#

orange2 = manager.load(orange_id)
print orange2.name
--

Note that the object manager (here 'ExampleManager') is descended from cucumber.ObjectManager.

Also, transactions etc. are managed through PostgreSQL and are subject to PostgreSQL's transaction mechanisms -- no intelligence is used on the part of the program.

And finally, no object caching is yet done, although I'm sure someone more expert than me could do a good quick job of it.

cheers,
--titus

From titus@caltech.edu Sun Aug 18 01:15:20 2002
From: titus@caltech.edu (Titus Brown)
Date: Sat, 17 Aug 2002 17:15:20 -0700
Subject: [Persistence-sig] 'cucumber'
In-Reply-To: <20020818000311.GA14227@caltech.edu>
References: <20020818000311.GA14227@caltech.edu>
Message-ID: <20020818001520.GB14282@caltech.edu>

-> I thought I'd toss my own little package into the fray. I've written
-> a fairly simple O/R mapping system named 'cucumber' that sits on top of
-> PostgreSQL. By making use of PG's inheritance hierarchies, cucumber
-> class inheritance relations can be mapped directly into PostgreSQL
-> in a very simple and transparent way.

...and, to follow up, I should say that I'd be very interested in refactoring things to adhere to a standard Python persistence interface. My primary concern is that mechanisms be easily reflected into Postgres -- e.g. I don't know of any way to back out of multiple commits in Postgres, which some people seem to have been talking about earlier -- in part because that pretty much guarantees that I can implement something in e.g. Perl...

thanks,
--titus

From pje@telecommunity.com Sun Aug 18 16:46:53 2002
From: pje@telecommunity.com (Phillip J.
Eby) Date: Sun, 18 Aug 2002 11:46:53 -0400 Subject: [Persistence-sig] ACID, savepoints, and exceptions (was re: "Straw Man" transaction API) In-Reply-To: <15685.57251.14632.949497@slothrop.zope.com> References: <200207191609.g6JG91A26544@pcp02138704pcs.reston01.va.comcast.net> <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> <3.0.5.32.20020719120237.00898b60@telecommunity.com> <200207191609.g6JG91A26544@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <5.1.0.14.0.20020818111324.02942350@mail.telecommunity.com> At 08:36 PM 7/29/02 -0400, Jeremy Hylton wrote: >Last week, I worked out a revised transaction API for user code and >for data managers. It's implemented in ZODB4, but is fairly >preliminary code. I imagine we'll revise it further, but I'd like to >describe the changes briefly. During the past week, I've been writing a TransactionService for PEAK, specifically designing it to allow interaction/adaptation to the new ZODB4 transaction API, and extending it to support the multi-prepare and durable subscription models that I need for my applications and framework projects. I believe I've largely been successful, but in the process it has highlighted for me some open issues/ambiguities in the ZODB4 transaction API as it sits right now, relating to error handling and also savepoints. >class IRollback(Interface): > > def rollback(): > """Rollback changes since savepoint.""" > >I think the rollback mechanism will work well enough. Gray and Reuter >explain that it can be used to simulate a nested transaction >architecture. Thus, I think it's a reasonable building block for the >nested transaction API. In my API I've standardized on a 'CannotRevertException' when rollback to a savepoint is not possible, and added a 'NullSavepoint' object which can be returned by an object that has nothing to do on rollback. 
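To make the two names above concrete, here is a minimal sketch of how they might look. Only 'CannotRevertException', 'NullSavepoint' and the rollback() method come from the text; the 'IrreversibleSavepoint' and 'ReadOnlyManager' classes are hypothetical illustrations, not part of PEAK or ZODB4.

```python
class CannotRevertException(Exception):
    """Raised when rollback to a savepoint is not possible."""


class NullSavepoint:
    """Savepoint for a participant that has nothing to do on rollback."""

    def rollback(self):
        # Nothing was changed, so there is nothing to revert.
        return None


class IrreversibleSavepoint:
    """Hypothetical savepoint for work that can no longer be undone."""

    def rollback(self):
        raise CannotRevertException("changes were already written out")


class ReadOnlyManager:
    """Hypothetical data manager that never modifies anything."""

    def savepoint(self):
        # A participant with nothing to undo can hand back the null object
        # instead of implementing its own savepoint class.
        return NullSavepoint()


if __name__ == "__main__":
    ReadOnlyManager().savepoint().rollback()  # silently does nothing
    try:
        IrreversibleSavepoint().rollback()
    except CannotRevertException as exc:
        print("cannot revert:", exc)
```

The null object keeps the calling code uniform: the transaction can call rollback() on every savepoint it collected without first checking which participants actually did any work.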
An open issue that needs to be addressed, however, is the question of rolling back more than once to the same savepoint. In some ways, it's a very handy capability, but I'm not sure which databases support this. I'm therefore inclined to say we should explicitly say that a savepoint can be rolled back at most once (since some savepoints may not be able to be rolled back). Another open issue: what happens if a rollback fails? Is the transaction "hosed" at that point? What if five data managers roll back, and the sixth one fails? This suggests adding a 'canRollback()' method to the interface, such that a rollback aggregator can check that its aggregated savepoints can actually be rolled back, so that "CannotRevert" errors don't cause the transaction to be hosed. However, the issue of another type of exception occurring during rollback still must be addressed. >I think I'm also in favor of the new abort semantics. ZODB3 would >abort the transactions -- call abort() on all the data managers -- if >an error occurred during a commit. The new code requires that the >user do this instead. I think that's better, because it leaves the >state of the objects intact if the code wants to analyze what went >wrong before retrying the transaction. The interesting question here again is, is the transaction "hosed"? Should there be a flag that says, "you can't do anything to this transaction but abort it"? To put it in broader terms, if *any* exception is thrown during execution of a transaction-related method, should we consider the transaction unrecoverable? I'm inclined to say yes, because I can think of too many code paths in both my and the ZODB4 transaction code where it becomes nearly impossible to guarantee a "clean" state when an exception occurs. By definition, if code called by the transaction system raises an exception, it is announcing that it cannot satisfy its contract with the transaction. 
Therefore, the transaction cannot be certain of satisfying its contract with the application for a clean commit. Another issue here is clean aborts. If an error is raised by a data manager during abort, what should the semantics be? Older ZODB transaction classes wrap every data manager abort call in a try-except that ensures that *all* the abort methods get called, even if several of them raise errors. The new ZODB4 transaction API doesn't do this, and thus can fail to completely roll back a transaction. Of course, the tradeoff is that the old code only gave you information about the first exception that occurred, and not any of the later ones. Perhaps the answer is to make the transaction keep track of which data managers have received which messages, and to require the caller to keep 'abort()'-ing until all data managers have been aborted, even if each one raises errors? I don't really know what's "right" here. If the first data manager's failure causes subsequent DM's to fail, what then? How much retry and recovery logic code must somebody put into their application, in order to guarantee correctness and recovery? Isn't that what the transaction API is *for*? I guess my inclination at this point is to think that maybe the transaction needs to have some kind of log - not in the 'logging' module sense, but in the sense of a list of actions performed and errors occurred. These errors could then be wrapped up in another exception or a return value upon completion of operations like abort() and commit(). Then, if somebody wants to analyze it, they have all the data. But I don't believe it makes sense for the application to try to correct errors "under the hood" of the transaction. Data managers should handle their own errors, if there's any handling to be done. 
Any analysis of the errors after the fact is going to be by a human being, to figure out how to fix the application or the data managers so they don't do whatever it is in the first place, or so that they catch the problem before it becomes an error in a commit or abort operation. From jeremy@alum.mit.edu Tue Aug 20 05:32:18 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Tue, 20 Aug 2002 00:32:18 -0400 Subject: [Persistence-sig] Re: ACID, savepoints, and exceptions (was re: "Straw Man"transaction API) In-Reply-To: <5.1.0.14.0.20020818111324.02942350@mail.telecommunity.com> References: <200207191609.g6JG91A26544@pcp02138704pcs.reston01.va.comcast.net> <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> <3.0.5.32.20020719120237.00898b60@telecommunity.com> <5.1.0.14.0.20020818111324.02942350@mail.telecommunity.com> Message-ID: <15713.50770.267777.438658@slothrop.zope.com> >>>>> "PJE" == Phillip J Eby writes: PJE> During the past week, I've been writing a TransactionService PJE> for PEAK, specifically designing it to allow PJE> interaction/adaptation to the new ZODB4 transaction API, and PJE> extending it to support the multi-prepare and durable PJE> subscription models that I need for my applications and PJE> framework projects. Glad to hear you're making progress. I've been completely swamped with other things, so I haven't had any time for ZODB4 since we last talked. PJE> In my API I've standardized on a 'CannotRevertException' when PJE> rollback to a savepoint is not possible, and added a PJE> 'NullSavepoint' object which can be returned by an object that PJE> has nothing to do on rollback. NullSavepoint is just an implementation convenience, right? PJE> An open issue that needs to be addressed, however, is the PJE> question of rolling back more than once to the same savepoint. PJE> In some ways, it's a very handy capability, but I'm not sure PJE> which databases support this. 
Let me ask the question the other way: Of the databases that support savepoints, which ones don't support this? PJE> I'm therefore inclined to say we PJE> should explicitly say that a savepoint can be rolled back at PJE> most once (since some savepoints may not be able to be rolled PJE> back). I want savepoints that can be returned to multiple times. If a database supports savepoints at all, I don't see why it wouldn't support multiple rollbacks. (If it didn't, an adapter could just call savepoint() as part of finishing each rollback().) Multiple rollbacks is necessary to support nested transactions. PJE> Another open issue: what happens if a rollback fails? Is the PJE> transaction "hosed" at that point? I think it is. PJE> What if five data managers PJE> roll back, and the sixth one fails? Exactly. If you can't be sure each of the data managers is in a consistent state, you need to abort the transaction. PJE> This suggests adding a PJE> 'canRollback()' method to the interface, such that a rollback PJE> aggregator can check that its aggregated savepoints can PJE> actually be rolled back, so that "CannotRevert" errors don't PJE> cause the transaction to be hosed. It's probably good to have some way to query this, although I feel like the predicate methods for testing features haven't worked out all that well in the ZODB3 storage api. What about that client code has access to would support the canRollback() method? It seems like it depends on which objects are participating in the transaction. I tend more towards an ask for forgiveness (AFF) than a look before you leap (LBYL). If savepoint() returned None when it wasn't possible to rollback, that would be good enough, no? The clients know, for their specific transaction, whether rollback is going to work. The savepoint() presumably hasn't caused too much extra work in those cases. PJE> However, the issue of PJE> another type of exception occurring during rollback still must PJE> be addressed. Yes. 
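The savepoint()-returns-None idea above could be sketched roughly as follows. This is only an illustration under assumptions: the rule that the transaction returns None as soon as any participant does, and every class name here ('DictManager', 'AppendOnlyManager', 'AggregateSavepoint', 'transaction_savepoint'), are hypothetical and not taken from the ZODB4 code.

```python
class DictSavepoint:
    """Snapshot of a dict that can be restored on rollback."""

    def __init__(self, data):
        self._data = data
        self._snapshot = dict(data)

    def rollback(self):
        self._data.clear()
        self._data.update(self._snapshot)


class DictManager:
    """Hypothetical data manager that can always roll back."""

    def __init__(self):
        self.data = {}

    def savepoint(self):
        return DictSavepoint(self.data)


class AppendOnlyManager:
    """Hypothetical data manager that cannot roll back at all."""

    def savepoint(self):
        return None  # signals "no rollback possible" without raising


class AggregateSavepoint:
    """Rolls back every participant's savepoint in turn."""

    def __init__(self, savepoints):
        self._savepoints = list(savepoints)

    def rollback(self):
        for sp in self._savepoints:
            sp.rollback()


def transaction_savepoint(data_managers):
    """Return an aggregate savepoint, or None if any participant returns None."""
    savepoints = []
    for dm in data_managers:
        sp = dm.savepoint()
        if sp is None:
            return None
        savepoints.append(sp)
    return AggregateSavepoint(savepoints)


if __name__ == "__main__":
    dm = DictManager()
    dm.data["x"] = 1
    sp = transaction_savepoint([dm])
    dm.data["x"] = 2
    sp.rollback()
    print(dm.data)
    print(transaction_savepoint([dm, AppendOnlyManager()]))
```

With this shape the client learns at savepoint() time, for its specific set of participants, whether a later rollback can work, so no separate predicate method is needed.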
>> I think I'm also in favor of the new abort semantics. ZODB3
>> would abort the transactions -- call abort() on all the data
>> managers -- if an error occurred during a commit. The new code
>> requires that the user do this instead. I think that's better,
>> because it leaves the state of the objects intact if the code
>> wants to analyze what went wrong before retrying the transaction.

PJE> The interesting question here again is, is the transaction
PJE> "hosed"? Should there be a flag that says, "you can't do
PJE> anything to this transaction but abort it"?

Yes, and yes.

PJE> To put it in broader terms, if *any* exception is thrown during
PJE> execution of a transaction-related method, should we consider
PJE> the transaction unrecoverable?

Yes. If a resource manager raises an unexpected exception, you've got no idea what state it's in or whether it can/has committed the data.

PJE> I'm inclined to say yes, because I can think of too many code
PJE> paths in both my and the ZODB4 transaction code where it
PJE> becomes nearly impossible to guarantee a "clean" state when an
PJE> exception occurs. By definition, if code called by the
PJE> transaction system raises an exception, it is announcing that
PJE> it cannot satisfy its contract with the transaction.
PJE> Therefore, the transaction cannot be certain of satisfying its
PJE> contract with the application for a clean commit.

Right.

PJE> Another issue here is clean aborts. If an error is raised by a
PJE> data manager during abort, what should the semantics be? Older
PJE> ZODB transaction classes wrap every data manager abort call in
PJE> a try-except that ensures that *all* the abort methods get
PJE> called, even if several of them raise errors. The new ZODB4
PJE> transaction API doesn't do this, and thus can fail to
PJE> completely roll back a transaction.

I tried to do as little as possible within the commit() implementation to deal with errors.
I figured if an error occurs, the client had better abort the transaction explicitly. The documentation for ZODB3 said that clients needed to do this, but the implementation didn't work that way. Jeremy From pje@telecommunity.com Thu Aug 22 00:02:48 2002 From: pje@telecommunity.com (Phillip J. Eby) Date: Wed, 21 Aug 2002 19:02:48 -0400 Subject: [Persistence-sig] Re: ACID, savepoints, and exceptions (was re: "Straw Man"transaction API) Message-ID: <5.1.1.6.0.20020821190246.009f10b0@mail.telecommunity.com> At 12:32 AM 08/20/2002 -0400, Jeremy Hylton wrote: > >>>>> "PJE" == Phillip J Eby writes: > PJE> In my API I've standardized on a 'CannotRevertException' when > PJE> rollback to a savepoint is not possible, and added a > PJE> 'NullSavepoint' object which can be returned by an object that > PJE> has nothing to do on rollback. > >NullSavepoint is just an implementation convenience, right? Yep. > PJE> An open issue that needs to be addressed, however, is the > PJE> question of rolling back more than once to the same savepoint. > PJE> In some ways, it's a very handy capability, but I'm not sure > PJE> which databases support this. > >Let me ask the question the other way: Of the databases that support >savepoints, which ones don't support this? An interesting question. The reason I'm iffy about it is that the ones I looked at (Sybase, Oracle, and SleepyCat/BerkeleyDB) weren't very precise in their docs, at least the docs I looked at. They simply didn't mention what happens to a savepoint once you roll back to it. SleepyCat offers nested transactions, which I *believe* are terminated upon rollback, just like top-level transactions. So anything implemented on a SleepyCat back-end might need to work around this issue. > PJE> I'm therefore inclined to say we > PJE> should explicitly say that a savepoint can be rolled back at > PJE> most once (since some savepoints may not be able to be rolled > PJE> back). > >I want savepoints that can be returned to multiple times. 
If a
>database supports savepoints at all, I don't see why it wouldn't
>support multiple rollbacks. (If it didn't, an adapter could
>just call savepoint() as part of finishing each rollback().) Multiple
>rollbacks is necessary to support nested transactions.

I don't think that rollback to the *same* savepoint is necessary, but I suppose the point is moot, since even a DB that didn't allow multiple rollbacks would logically support creating a second savepoint at the location you got to after rolling back the first. It's a little more work to implement in that case, but I think I agree with your logic.

But... there is a difference in implementation burden that applies here. How many applications will use savepoints as part of their natural flow, and is it too much to ask to have them do:

    while 1:
        sp = txn.savepoint()
        try:
            # do something that might fail...
        except:
            sp.rollback()
            continue

The only difference here, as far as I can see, is that the savepoint() call is in the loop (in my suggested approach) instead of just above and outside it (as it would be with reusable savepoints). Perhaps there's something else you're using savepoints for that doesn't look like this sort of loop, in which case it would be interesting to learn about that use case.

> PJE> This suggests adding a
> PJE> 'canRollback()' method to the interface, such that a rollback
> PJE> aggregator can check that its aggregated savepoints can
> PJE> actually be rolled back, so that "CannotRevert" errors don't
> PJE> cause the transaction to be hosed.
>
>It's probably good to have some way to query this, although I feel
>like the predicate methods for testing features haven't worked out all
>that well in the ZODB3 storage api. What about that client code has
>access to would support the canRollback() method? It seems like it
>depends on which objects are participating in the transaction.
>
>I tend more towards an ask for forgiveness (AFF) than a look before
>you leap (LBYL).
If savepoint() returned None when it wasn't possible
>to rollback, that would be good enough, no? The clients know, for
>their specific transaction, whether rollback is going to work. The
>savepoint() presumably hasn't caused too much extra work in those
>cases.

Okay. So what you're saying is, document that savepoint() returns an IRollback or None, and None means you can't roll back to the savepoint. And if any participant returns None for the savepoint() call, the transaction must return None from its savepoint() call. I'm good with that; my primary goal here is just to remove the ambiguity of what happens when something can savepoint() but not rollback().

> PJE> Another issue here is clean aborts. If an error is raised by a
> PJE> data manager during abort, what should the semantics be? Older
> PJE> ZODB transaction classes wrap every data manager abort call in
> PJE> a try-except that ensures that *all* the abort methods get
> PJE> called, even if several of them raise errors. The new ZODB4
> PJE> transaction API doesn't do this, and thus can fail to
> PJE> completely roll back a transaction.
>
>I tried to do as little as possible within the commit() implementation
>to deal with errors. I figured if an error occurs, the client had
>better abort the transaction explicitly. The documentation for ZODB3
>said that clients needed to do this, but the implementation didn't
>work that way.

Er, the paragraph I wrote above is about the abort() method; the word "commit" isn't even in the paragraph. :) I'm fine with the idea of requiring an explicit abort() by the application upon exception during commit(). It's the fact that ZODB4 doesn't trap errors during *abort()* that's an issue for me, relative to older ZODB versions.

When I get back from the Enterprise Architecture summit, I plan to redo some things in my own "straw man" transactions for PEAK.
I realized on the trip up here, that I haven't really thought through some
of the ramifications of Shane's "multi-pass commit" counter-proposal to my
"write-through cascade" architecture. For example, durable subscriptions
make less sense in the multi-pass commit model, because there are more
objects to call, more times, up to O(n^2) in the degenerate case, for
fairly large "n" (I expect to have dozens of data managers per app,
although relatively few will have active involvement in a given
transaction). I also need to think through how the re-pass protocol will
work, given the absence of durable subscriptions.

I have some hope that these re-thinks will make the API leaner and meaner
than I currently have it, while retaining "Zopeward compatibility".
Ideally, we should be able to each present our somewhat different
transaction models to the SIG, as a jumping-off point for future
discussion.

I have lowered my expectations somewhat, however, with respect to the SIG's
goal of a transaction API. Previously I hoped to use the to-be-decided API
as PEAK's core transaction API, but now I'm aspiring merely to have in PEAK
an API that can be adapted to that of the SIG. Or, if I turn out to be
really lucky, the PEAK API may merely end up being a slight superset
relative to the SIG API. Unfortunately, I have too much code in too many
projects which need the PEAK transaction API to exist already, and so I
need to move forward with *something*, even if I end up having to do some
refactoring later.

Luckily, however, my first draft at an actual PEAK implementation, both of
a standalone transaction service and as a transaction service layered over
the ZODB4 transaction API, verified for me that it's possible to do this
kind of layering, as long as the underlying transaction API is at least as
rich as that of ZODB4. And I'm guessing the SIG isn't going to endorse any
transaction model that isn't at least that rich.
:)


From paoloinvernizzi@dmsware.com Mon Aug 26 09:19:54 2002
From: paoloinvernizzi@dmsware.com (Paolo Invernizzi)
Date: Mon, 26 Aug 2002 10:19:54 +0200
Subject: [Persistence-sig] persistence to code-related-object?
Message-ID: <754797929.20020826101954@dmsware.com>

Hello folks,

I was looking at the class persistence module in the current ZODB, and the
current implementation is pretty broken. (I've posted details on the
zodb-dev mailing list...)

Jeremy advised me to ask you all whether the problem of persistence applied
to code objects (functions, modules, classes, code objects, ...) is
relevant to this SIG or not.

I'm pretty interested, as applying the persistence machinery to
code-related objects, and leveraging the import mechanism to handle imports
from persistent modules, opens up interesting new paths to investigate.

Paolo Invernizzi


From jsasmor@gte.net Mon Aug 26 16:50:51 2002
From: jsasmor@gte.net (Jeff Sasmor)
Date: Mon, 26 Aug 2002 11:50:51 -0400
Subject: [Persistence-sig] persistence to code-related-object?
References: <754797929.20020826101954@dmsware.com>
Message-ID: <005601c24d18$6747d570$0601a8c0@NETKOOK>

Hello, Paolo,

I have done what you're writing about, I think, if what you mean is being
able to import from a code object stored in the ZODB. It's part of a
Python development system that I am working on, and actually it's not all
that hard to do (now, getting 'reload' to work properly is **much** more
challenging).

The key to getting this to work is the imputil package, which I believe was
originally written by Greg Stein and extended by the same folk(s) who wrote
the McMillan installer package. You can find it at their (McMillan)
website, or email me and I can forward it to you.

I have a whole folder/object structure similar to what Zope looks like,
with the ZODB (ZODB 3, so far) underlying it.
That made it simpler to deal with the import issue: you just say

    import folder.object

where the object is a PythonMethod object that has a GUI editor
(wxPython/Scintilla) and a code object embedded in it.

So you can start with the imputil package, or wait a few months and
leverage what I have done.

jms

#--------------------------------
Jeff Sasmor
jeff@sasmor.com

_______________________________________________
Persistence-sig mailing list
Persistence-sig@python.org
http://mail.python.org/mailman-21/listinfo/persistence-sig


From jeremy@alum.mit.edu Mon Aug 26 16:53:18 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Mon, 26 Aug 2002 11:53:18 -0400
Subject: [Persistence-sig] persistence to code-related-object?
In-Reply-To: <005601c24d18$6747d570$0601a8c0@NETKOOK>
References: <754797929.20020826101954@dmsware.com>
	<005601c24d18$6747d570$0601a8c0@NETKOOK>
Message-ID: <15722.20206.173867.19003@slothrop.zope.com>

I suspect that transparently storing code in the database is an advanced
topic that won't be addressed by the PEPs, although I'm happy to consider
it if it ends up being an important requirement.

It is an important requirement for Zope, and we've got an incomplete
version in the ZODB4 code base. It's based on ihooks, rather than imputil.
Paolo has also pointed out that we've got a write-only implementation at
the moment :-(. None of the tests verify that a class can be reloaded from
the database.

Jeremy


From jsasmor@gte.net Mon Aug 26 18:33:41 2002
From: jsasmor@gte.net (Jeff Sasmor)
Date: Mon, 26 Aug 2002 13:33:41 -0400
Subject: [Persistence-sig] persistence to code-related-object?
References: <754797929.20020826101954@dmsware.com>
	<005601c24d18$6747d570$0601a8c0@NETKOOK>
	<15722.20206.173867.19003@slothrop.zope.com>
Message-ID: <007701c24d26$bd3c5920$0601a8c0@NETKOOK>

I'd respectfully suggest looking at the augmented imputil found on the
McMillan website:

http://www.mcmillan-inc.com/importhooks.html

I found it reasonably easy to extend this to my project. How one would
wedge this into supporting any arbitrary Python code, that is, something
without a way of specifying a 'path' to a class as in:

    from whatever.x import something

is another matter; I never explored that aspect since I was only trying to
solve one problem!

jeff

#--------------------------------
Jeff Sasmor
jeff@sasmor.com


From paoloinvernizzi@dmsware.com Tue Aug 27 08:24:44 2002
From: paoloinvernizzi@dmsware.com (Paolo Invernizzi)
Date: Tue, 27 Aug 2002 09:24:44 +0200
Subject: [Persistence-sig] persistence to code-related-object?
In-Reply-To: <15722.20206.173867.19003@slothrop.zope.com>
References: <754797929.20020826101954@dmsware.com>
	<005601c24d18$6747d570$0601a8c0@NETKOOK>
	<15722.20206.173867.19003@slothrop.zope.com>
Message-ID: <552916553.20020827092444@dmsware.com>

Hello Jeremy,

JH> I suspect that transparently storing code in the database is an
JH> advanced topic that won't be addressed by the PEPs, although I'm happy
JH> to consider it if it ends up being an important requirement.

Yep, I think that the SIG can reach a PEP without addressing this facet of
the problem, while still leaving the door open for future expansion. What
I mean is "let's keep an eye on that, and let's avoid APIs that would make
that problem unsolvable."

JH> It is an important requirement for Zope, and we've got an incomplete
JH> version in the ZODB4 code base.

Yep. I've read Jeff's mail, and I know ihooks and Gordon's iu.py.
Actually, for my project I've worked with Gordon's iu, and I've written an
import director for ZODB4 (indeed, the installer as a whole is cool
material!).

But as Jeff pointed out, the real "challenge" is the "reload"
abstraction/implementation. Basically, "reload" is like a "commit" of
modified code into an existing application, but the big challenge of
updating existing instances of reloaded classes is left behind (currently
nothing happens: existing instances are not affected). If we apply the
persistence machinery to code objects, we can begin to think about possible
(limited) solutions, and their problems, to achieve this goal.

Jeremy, am I wasting my time if I start playing around with the connection
implementation and cache?
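The import-from-a-store idea and the "reload" problem discussed in this
thread can be illustrated with a small sketch. It is only an illustration
under stated assumptions: it uses the standard importlib machinery from
current Python rather than imputil, ihooks or iu.py, and a plain dict
(CODE_DB, a made-up name) stands in for a persistent store such as the
ZODB. The module name persisted_demo and the class StoreFinder are
likewise hypothetical.

```python
import importlib
import importlib.abc
import importlib.util
import sys

# Hypothetical stand-in for a persistent store: module name -> source text.
CODE_DB = {
    "persisted_demo": "class Greeter:\n    def greet(self):\n        return 'hello'\n",
}


class StoreFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finds and loads modules whose source lives in a dict-like store."""

    def __init__(self, store):
        self.store = store

    def find_spec(self, fullname, path=None, target=None):
        # Only claim modules that exist in the store; defer everything else.
        if fullname in self.store:
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        # Compile the stored source and run it in the module's namespace.
        source = self.store[module.__name__]
        code = compile(source, "<store:%s>" % module.__name__, "exec")
        exec(code, module.__dict__)


sys.meta_path.insert(0, StoreFinder(CODE_DB))

import persisted_demo

old = persisted_demo.Greeter()          # instance of the first version

# "Commit" new code to the store, then reload the module from it.
CODE_DB["persisted_demo"] = "class Greeter:\n    def greet(self):\n        return 'hi'\n"
importlib.reload(persisted_demo)

fresh = persisted_demo.Greeter()        # instance of the reloaded class
```

After the reload, fresh uses the new class while old is still bound to the
original class object, which is the "existing instances are not affected"
problem described above. Migrating old instances would need extra machinery
that this sketch does not attempt.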
--
Best regards,
 Paolo                            mailto:paoloinvernizzi@dmsware.com


From jeremy@alum.mit.edu Fri Aug 30 00:25:09 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Thu, 29 Aug 2002 19:25:09 -0400
Subject: [Persistence-sig] 'cucumber'
In-Reply-To: <20020818000311.GA14227@caltech.edu>
References: <20020818000311.GA14227@caltech.edu>
Message-ID: <15726.44373.948135.670546@slothrop.zope.com>

>>>>> "TB" == Titus Brown writes:

  TB> Hi everyone, I thought I'd toss my own little package into the
  TB> fray. I've written a fairly simple O/R mapping system named
  TB> 'cucumber' that sits on top of PostgreSQL. By making use of
  TB> PG's inheritance hierarchies, cucumber class inheritance
  TB> relations can be mapped directly into PostgreSQL in a very
  TB> simple and transparent way.

Thanks for telling us about your system. I don't know if there's much
fray here at the moment. People seem to be busier working than
chatting about persistence.

How do you see cucumber fitting into the SIG's goal of generic Python
APIs for transactions and persistence? I haven't had a chance to look
at cucumber, so I don't know what its implementation looks like. Do
the ZODB4-based APIs discussed earlier look reasonable to you?

Jeremy


From jeremy@alum.mit.edu Fri Aug 30 00:33:53 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Thu, 29 Aug 2002 19:33:53 -0400
Subject: [Persistence-sig] Is is possible to separate programming API from IDataManager api?
In-Reply-To: <001c01c2454b$a08e9dc0$0100a8c0@sicem.biz>
References: <001c01c2454b$a08e9dc0$0100a8c0@sicem.biz>
Message-ID: <15726.44897.432951.853172@slothrop.zope.com>

>>>>> "ER" == Ernesto Revilla writes:

  ER> Is it possible to separate programming API from IDataManager API
  ER> and define them separate and independently?

I'm not sure what you mean. The APIs are separate, but any useful
package is going to need to have both APIs. In other words, we can
make changes to the APIs independently, but we need to make decisions
about both of them.
  ER> Can there be a mechanism to vote about the API, like a Wiki with
  ER> a voting mechanism (with dead-lines)?

I like the IETF model for Python SIGs: We don't believe in kings,
presidents, or voting. We believe in rough consensus and running
code.

  ER> Could we get a level 1 API, without multi-thread issues and
  ER> multiple-save points? (20/80 -> use 20% of resources to solve
  ER> 80% of use cases, leave other 15% for level 2 API).

  ER> Please, just CRUD (Basic Create, Update, Delete). Transactions
  ER> with minimum, i.e. begin, commit and rollback, without nesting.
  ER> Please set minimum of metadata, so the resource manager can
  ER> imagine how to store the data.

I certainly don't think we can ignore threads. There are lots of
multi-threaded programs and we need some simple way to accommodate
them. I think the latest ideas that Phillip and I discussed deal with
threads pretty well.

The API issues for multiple save points also seem pretty
straightforward. It's a very useful feature, supported by many
databases, so I'd rather not leave it off.

Put another way, I don't think that the difficulties of nested
transactions or multi-threading are preventing us from making
progress.

Jeremy


From aerd@retemail.es Fri Aug 30 01:27:30 2002
From: aerd@retemail.es (Ernesto Revilla)
Date: Fri, 30 Aug 2002 02:27:30 +0200
Subject: [Persistence-sig] API proposals
References: <001c01c2454b$a08e9dc0$0100a8c0@sicem.biz>
	<15726.44897.432951.853172@slothrop.zope.com>
Message-ID: <001a01c24fbc$09a290d0$0100a8c0@sicem.biz>

Hi again,

although there was a discussion about the API, I can't find a source where
I can get a summary. Is there any draft posted?

I'm actually writing a persistence layer for multiple backend technologies,
and I would like to stay as close to the API as possible, to make future
migrations easier.
Thanx, Erny


From titus@caltech.edu Fri Aug 30 07:39:09 2002
From: titus@caltech.edu (Titus Brown)
Date: Thu, 29 Aug 2002 23:39:09 -0700
Subject: [Persistence-sig] 'cucumber'
In-Reply-To: <15726.44373.948135.670546@slothrop.zope.com>
References: <20020818000311.GA14227@caltech.edu>
	<15726.44373.948135.670546@slothrop.zope.com>
Message-ID: <20020830063909.GA29994@caltech.edu>

-> TB> Hi everyone, I thought I'd toss my own little package into the
-> TB> fray. I've written a fairly simple O/R mapping system named
-> TB> 'cucumber' that sits on top of PostgreSQL. By making use of
-> TB> PG's inheritance hierarchies, cucumber class inheritance
-> TB> relations can be mapped directly into PostgreSQL in a very
-> TB> simple and transparent way.
->
-> Thanks for telling us about your system. I don't know if there's much
-> fray here at the moment. People seem to be busier working than
-> chatting about persistence.

What a shame.

-> How do you see cucumber fitting into the SIG's goal of generic Python
-> APIs for transactions and persistence? I haven't had a chance to look
-> at cucumber, so I don't know what its implementation looks like. Do
-> the ZODB4-based APIs discussed earlier look reasonable to you?

I'm having trouble following the ZODB-based discussion, because I don't
understand where y'all are coming from; I've never used either Zope or
the standalone ZODB distribution, partly because I would prefer to use
databases that are accessible across languages. I have also not been
on the list for very long & haven't been able to read through all of the
archives in the detail they require.

cucumber by itself is not particularly important -- it's nice and simple
and useful and all that, but a bit too odd to be widely adopted -- but
I'd like to see if it can be fit nicely into whatever transaction & (data)
persistence APIs are produced.
I think cucumber (or, rather, PostgreSQL, but
with the cucumber adaptor layer) provides a nice example of what a minimal
persistence and transaction API should be able to encompass; if the APIs
produced are significantly more complicated than required for cucumber,
I'd argue that perhaps this SIG is making things too complex.

As I'm sure you all know, PostgreSQL provides a simple ACID-compliant SQL
database with commit & rollback. The main thing I've been confused about
in the discussions so far is how exactly savepoints work -- they seem
to be a bit more complicated than normal transactions -- and whether
or not they'll be a necessary part of the final product.

Other than that, I'll happily lurk until I feel I have something to
contribute ;).

cheers,
--titus
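
The savepoint behaviour debated earlier in this archive can be sketched in
a few lines. This is a toy model under stated assumptions, not the ZODB4,
PEAK or cucumber API: the class names (Rollback, DictManager,
AppendOnlyLog, Transaction) and the retry helper are invented for
illustration. It shows the two points the thread converged on: the
transaction's savepoint() returns None as soon as any participant cannot
roll back, and a retry loop takes a fresh savepoint on every pass.

```python
class Rollback:
    """Toy rollback handle: undoes work done since the savepoint."""

    def __init__(self, undo_callbacks):
        self._undo_callbacks = undo_callbacks

    def rollback(self):
        for undo in self._undo_callbacks:
            undo()


class DictManager:
    """Hypothetical participant that can snapshot and restore its data."""

    def __init__(self):
        self.data = {}

    def savepoint(self):
        snapshot = dict(self.data)

        def undo():
            self.data.clear()
            self.data.update(snapshot)

        return Rollback([undo])


class AppendOnlyLog:
    """Hypothetical participant that cannot undo what it has written."""

    def savepoint(self):
        return None


class Transaction:
    def __init__(self, participants):
        self.participants = list(participants)

    def savepoint(self):
        handles = []
        for participant in self.participants:
            handle = participant.savepoint()
            if handle is None:
                return None          # one "no" means the whole answer is "no"
            handles.append(handle)
        # Aggregate: rolling back the transaction rolls back every participant.
        return Rollback([h.rollback for h in handles])


def retry(txn, operation, attempts=3):
    """Run operation, rolling back to a fresh savepoint after each failure."""
    for _ in range(attempts):
        sp = txn.savepoint()         # new savepoint on every pass
        try:
            return operation()
        except Exception:
            if sp is None:
                raise                # cannot roll back, so give up
            sp.rollback()
    raise RuntimeError("operation kept failing")


store = DictManager()
txn = Transaction([store])
calls = []


def flaky():
    store.data["x"] = len(calls)     # partial work that may need undoing
    calls.append(1)
    if len(calls) < 3:
        raise ValueError("try again")
    return store.data["x"]


result = retry(txn, flaky)
```

Here the first two calls to flaky() fail and their writes are undone; the
third succeeds. A transaction that also included AppendOnlyLog would return
None from savepoint(), so the caller learns up front that rollback is not
available.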