From jacobs@penguin.theopalgroup.com Mon Jul 8 21:09:25 2002 From: jacobs@penguin.theopalgroup.com (Kevin Jacobs) Date: Mon, 8 Jul 2002 16:09:25 -0400 (EDT) Subject: [Persistence-sig] Is anyone here yet? Message-ID: Is anyone here yet? For a moment, I am king. ;) -Kevin -- Kevin Jacobs The OPAL Group - Enterprise Systems Architect Voice: (216) 986-0710 x 19 E-mail: jacobs@theopalgroup.com Fax: (216) 986-0714 WWW: http://www.theopalgroup.com From marklists@mceahern.com Mon Jul 8 21:20:43 2002 From: marklists@mceahern.com (Mark McEahern) Date: Mon, 8 Jul 2002 15:20:43 -0500 Subject: [Persistence-sig] Is anyone here yet? In-Reply-To: Message-ID: > Is anyone here yet? it's a small world so far: http://mail.python.org/mailman-21/roster/persistence-sig // m - From jeremy@zope.com Tue Jul 9 14:15:32 2002 From: jeremy@zope.com (Jeremy Hylton) Date: Tue, 9 Jul 2002 09:15:32 -0400 Subject: [Persistence-sig] getting started Message-ID: <15658.57844.17239.668311@slothrop.zope.com> It looks like many of the people who expressed interest in the SIG have subscribed to the list, so we ought to get started. I think we should begin with some introductions and a review of the SIG charter. Introductions: Please tell us about your interest in the persistence SIG, what personal/professional goals you have for it, and how much time & energy you have. (Feel free to lurk if that's your preference.) Charter: Jim Fulton wrote the SIG charter. A very brief summary is that we should: - focus on transparency, transactions, and memory-caching issues; - put off concurrency control, queries, and constraints; - produce PEPs and, if there is consensus, code for the std library. Does that sound like the right set of initial constraints? Are there other issues to consider or avoid? In the brief discussion on the meta-sig, several related projects were mentioned. It would be helpful to capture a brief summary of each on the SIG web pages. I'll follow up with my intro soon.
Jeremy From jacobs@penguin.theopalgroup.com Tue Jul 9 20:02:22 2002 From: jacobs@penguin.theopalgroup.com (Kevin Jacobs) Date: Tue, 9 Jul 2002 15:02:22 -0400 (EDT) Subject: [Persistence-sig] getting started In-Reply-To: <15658.57844.17239.668311@slothrop.zope.com> Message-ID: Hi Jeremy and other persistent folk, My primary interest has to do with developing high performance enterprise-objects and object-relational mapping systems using new-style Python class features. A secondary interest involves distributed transaction management frameworks, and heterogeneous backing stores. I plan to devote a significant amount of my own time, as well as that of my development team, to propose standards and produce reference implementations of ideas developed here. Looking forward to the future, -Kevin -- Kevin Jacobs The OPAL Group - Enterprise Systems Architect Voice: (216) 986-0710 x 19 E-mail: jacobs@theopalgroup.com Fax: (216) 986-0714 WWW: http://www.theopalgroup.com From pobrien@orbtech.com Tue Jul 9 20:15:54 2002 From: pobrien@orbtech.com (Patrick K. O'Brien) Date: Tue, 9 Jul 2002 14:15:54 -0500 Subject: [Persistence-sig] getting started In-Reply-To: <15658.57844.17239.668311@slothrop.zope.com> Message-ID: [Jeremy Hylton] > > It looks like many of the people who expressed interest in the SIG > have subscribed to the list, so we ought to get started. I think we > should begin with some introductions and a review of the SIG charter. > > Introductions: Please tell us about your interest in the persistence > SIG, what personal/professional goals you have for it, and how much > time & energy you have. (Feel free to lurk if that's your preference.) Sounds good to me. I've been programming in Python for about a year and a half now, and various other languages for the past 15 years. I've also done a lot of work with relational databases and data modeling. 
I'm the author of PyCrust (a Python shell written in wxPython) and a developer on the PythonCard project (an app building framework for wxPython). I've created a couple of websites with Quixote and various utilities with Python. I'm also in the middle of creating an xhtml-compliant html generator similar to htmlgen. My interest in this SIG is directly related to my interest in using ZODB outside of Zope to create medium-sized applications with persistent objects instead of a traditional relational database approach using SQL. I think ZODB is very good, but more could be done to make it easier to use by someone familiar with relational databases. Along those lines I started a project to make ZODB easier, called Bulldozer, and you can get the code from SourceForge at http://sourceforge.net/projects/bdoz. There isn't any documentation so the only clues to my intent are in the source and unit tests and this wiki page at http://www.orbtech.com/wiki/BullDozer. Unfortunately, I haven't had the time or energy to make any progress on this project for the past couple of months. I think there are lots of applications that need persistent data but don't necessarily need or benefit from a relational database. There is also much to be gained from not having to translate objects into relational tuples and back again. And I think having good persistence support in the Python core would be a really good selling point for Python. Transparent persistence is also a hot item on the PythonCard project right now. I've got a lot of interest in this topic so I'll do my best to make time available. I'm on vacation all next week and then I'll be at OSCON. It would be great to discuss this topic in person there. Does anyone plan to set up a Birds-of-a-feather session at OSCON? -- Patrick K. O'Brien Orbtech ----------------------------------------------- "Your source for Python software development."
----------------------------------------------- Web: http://www.orbtech.com/web/pobrien/ Blog: http://www.orbtech.com/blog/pobrien/ Wiki: http://www.orbtech.com/wiki/PatrickOBrien ----------------------------------------------- From guido@python.org Tue Jul 9 20:40:34 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 09 Jul 2002 15:40:34 -0400 Subject: [Persistence-sig] getting started In-Reply-To: Your message of "Tue, 09 Jul 2002 14:15:54 CDT." References: Message-ID: <200207091940.g69JeYw03746@odiug.zope.com> > I've got a lot of interest in this topic so I'll do my best to make time > available. I'm on vacation all next week and then I'll be at OSCON. It would > be great to discuss this topic in person there. Does anyone plan to set up a > Birds-of-a-feather session at OSCON? Could you set up this BOF? I think it would be a good idea. --Guido van Rossum (home page: http://www.python.org/~guido/) From pje@telecommunity.com Tue Jul 9 21:18:13 2002 From: pje@telecommunity.com (Phillip J. Eby) Date: Tue, 09 Jul 2002 16:18:13 -0400 Subject: [Persistence-sig] getting started In-Reply-To: References: <15658.57844.17239.668311@slothrop.zope.com> Message-ID: <3.0.5.32.20020709161813.01aa9d10@telecommunity.com> At 03:02 PM 7/9/02 -0400, Kevin Jacobs wrote: > >My primary interest has to do with developing high performance >enterprise-objects and object-relational mapping systems using new-style >Python class features. A secondary interest involves distributed >transaction management frameworks, and heterogeneous backing stores. > >I plan to devote a significant amount of my own time, as well as that of my >development team, to propose standards and produce reference implementations >of ideas developed here. > I think I can safely say, "me too", on all of the above. :) From pobrien@orbtech.com Tue Jul 9 21:41:17 2002 From: pobrien@orbtech.com (Patrick K. 
O'Brien) Date: Tue, 9 Jul 2002 15:41:17 -0500 Subject: [Persistence-sig] getting started In-Reply-To: <200207091940.g69JeYw03746@odiug.zope.com> Message-ID: > > Birds-of-a-feather session at OSCON? > > Could you set up this BOF? I think it would be a good idea. > > --Guido van Rossum (home page: http://www.python.org/~guido/) Done. I'll let you know when I hear back from Gretchen at O'Reilly. -- Patrick K. O'Brien Orbtech ----------------------------------------------- "Your source for Python software development." ----------------------------------------------- Web: http://www.orbtech.com/web/pobrien/ Blog: http://www.orbtech.com/blog/pobrien/ Wiki: http://www.orbtech.com/wiki/PatrickOBrien ----------------------------------------------- From bzimmer@ziclix.com Wed Jul 10 04:16:37 2002 From: bzimmer@ziclix.com (brian zimmer) Date: Tue, 9 Jul 2002 22:16:37 -0500 Subject: [Persistence-sig] getting started In-Reply-To: <15658.57844.17239.668311@slothrop.zope.com> Message-ID: <002001c227c0$36ed7150$6401a8c0@mountain> Hi all, I am primarily interested in relational databases, OR mappings and distributed transactions. As the author of zxJDBC (the Jython implementation of the DB API) I'm curious to see if anything proposed here has large ramifications on Jython development. thanks, brian From Sebastien.Bigaret@inqual.com Wed Jul 10 08:17:11 2002 From: Sebastien.Bigaret@inqual.com (Sebastien Bigaret) Date: 10 Jul 2002 09:17:11 +0200 Subject: [Persistence-sig] getting started In-Reply-To: "Phillip J. Eby"'s message of "Tue, 09 Jul 2002 16:18:13 -0400" References: <15658.57844.17239.668311@slothrop.zope.com> <3.0.5.32.20020709161813.01aa9d10@telecommunity.com> Message-ID: <87eleckq6g.fsf@bidibule.brest.inqual.bzh> "Phillip J. Eby" writes: > At 03:02 PM 7/9/02 -0400, Kevin Jacobs wrote: > > > >My primary interest has to do with developing high performance > >enterprise-objects and object-relational mapping systems using new-style > >Python class features.
A secondary interest involves distributed > >transaction management frameworks, and heterogeneous backing stores. > > > >I plan to devote a significant amount of my own time, as well as that of my > >development team, to propose standards and produce reference implementations > >of ideas developed here. > > > > I think I can safely say, "me too", on all of the above. :) So do I ;) I would also add that I'm primarily interested in OR mapping, that part of my working time is actually dedicated to that subject; I work on a project, a framework dedicated to object/relational mapping [1], so I am already devoting time on the subject and will be pleased to participate in elaborating standards, etc. Jeremy> [SIG charter] Does that sound like the right set of initial Jeremy> constraints? Are there other issues to consider or avoid? Ok for me. -- Sébastien. [1] soon to be open-sourced, when the last legal problems are eliminated. I'll announce it here then --hopefully next week. From pobrien@orbtech.com Wed Jul 10 13:43:47 2002 From: pobrien@orbtech.com (Patrick K. O'Brien) Date: Wed, 10 Jul 2002 07:43:47 -0500 Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session - confirmation Message-ID: The OSCON BOF information appears below. Let me know if there are any problems with the date and time that I requested. Otherwise, I'll see you all there. -- Patrick K. O'Brien Orbtech ----------------------------------------------- "Your source for Python software development." ----------------------------------------------- Web: http://www.orbtech.com/web/pobrien/ Blog: http://www.orbtech.com/blog/pobrien/ Wiki: http://www.orbtech.com/wiki/PatrickOBrien ----------------------------------------------- -----Original Message----- From: Gretchen Bartholomew [mailto:gretchen@oreilly.com] Sent: Tuesday, July 09, 2002 9:28 PM To: pobrien@orbtech.com Cc: Vee McMillen; gretchen@oreilly.com Subject: OSCON Birds of a Feather Session - confirmation Dear Mr.
O'Brien: Thank you for submitting a proposal to moderate a Birds of a Feather session (BOF) at the upcoming O'Reilly Open Source Convention -- July 22 - 26, 2002 in San Diego, CA. Your BOF proposal has been accepted and I would like to schedule your BOF for the following date/times. Please let me know if you have any conflicts with this itinerary. ====== Title: Python Persistence Date: Thursday, July 25 Time: 8:00 - 10:00 pm Location: Grande Ballroom C Moderator: Patrick O'Brien, Orbtech Summary: A Python Persistence Special Interest Group was recently formed to explore ways to add basic persistence and transaction mechanisms into the core of Python to avoid duplication of effort by a variety of projects that have similar issues. This BOF will permit participants to ponder Python persistence in person. ======== The BOF session information, as seen above, will be posted on the conference BOF page: http://conferences.oreillynet.com/pub/w/15/bof.html Audio/visual equipment and A/V support is not supplied by O'Reilly & Associates for BOF sessions. If you have any questions or concerns, please do not hesitate to contact me. We look forward to seeing you in San Diego. Kind Regards, Gretchen Gretchen Bartholomew Conf. Planning Coordinator O'Reilly & Associates Phone: 707-827-7186 Fax: 707-823-9746 ============ O'Reilly Open Source Convention Sheraton San Diego Hotel & Marina July 22 - 26, 2002 -- San Diego, CA http://conferences.oreilly.com/oscon/ ============ ============ O'Reilly Mac OS X Conference Westin Santa Clara Sept. 30 - Oct. 3, 2002 -- Santa Clara, CA http://conferences.oreillynet.com/macosx2002/ ============ From jim@zope.com Wed Jul 10 13:55:03 2002 From: jim@zope.com (Jim Fulton) Date: Wed, 10 Jul 2002 08:55:03 -0400 Subject: [Persistence-sig] getting started References: <15658.57844.17239.668311@slothrop.zope.com> Message-ID: <3D2C2EA7.6050502@zope.com> Jeremy Hylton wrote: ... 
> Introductions: Please tell us about your interest in the persistence > SIG, what personal/professional goals you have for it, This is pretty much covered in the SIG charter. :) > and how much > time & energy you have. (Feel free to lurk if that's your preference.) Not as much as I'd like, but I will try to make time. Fortunately, Jeremy and others at Python Labs are involved. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From jim@zope.com Wed Jul 10 14:05:57 2002 From: jim@zope.com (Jim Fulton) Date: Wed, 10 Jul 2002 09:05:57 -0400 Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session - confirmation References: Message-ID: <3D2C3135.4070401@zope.com> Is there any chance we could move this to Wednesday? I'm leaving Thursday morning. :( Jim Patrick K. O'Brien wrote: > The OSCON BOF information appears below. Let me know if there are any > problems with the date and time that I requested. Otherwise, I'll see you > all there. > > -- > Patrick K. O'Brien > Orbtech > ----------------------------------------------- > "Your source for Python software development." > ----------------------------------------------- > Web: http://www.orbtech.com/web/pobrien/ > Blog: http://www.orbtech.com/blog/pobrien/ > Wiki: http://www.orbtech.com/wiki/PatrickOBrien > ----------------------------------------------- > > -----Original Message----- > From: Gretchen Bartholomew [mailto:gretchen@oreilly.com] > Sent: Tuesday, July 09, 2002 9:28 PM > To: pobrien@orbtech.com > Cc: Vee McMillen; gretchen@oreilly.com > Subject: OSCON Birds of a Feather Session - confirmation > > > Dear Mr. O'Brien: > > Thank you for submitting a proposal to moderate a Birds of a Feather session > (BOF) at the upcoming O'Reilly Open Source Convention -- July 22 - 26, > 2002 in San Diego, CA. > > Your BOF proposal has been accepted and I would like to schedule your BOF > for the following date/times. 
Please let me know if you have any conflicts > with this itinerary. > > ====== > > Title: Python Persistence > Date: Thursday, July 25 > Time: 8:00 - 10:00 pm > Location: Grande Ballroom C > Moderator: Patrick O'Brien, Orbtech > Summary: A Python Persistence Special Interest Group was recently > formed to explore ways to add basic persistence and transaction mechanisms > into the core of Python to avoid duplication of effort by a variety of > projects that have similar issues. This BOF will permit participants to > ponder Python persistence in person. > > ======== > > The BOF session information, as seen above, will be posted on the conference > BOF page: > http://conferences.oreillynet.com/pub/w/15/bof.html > > Audio/visual equipment and A/V support is not supplied by O'Reilly & > Associates for BOF sessions. > > If you have any questions or concerns, please do not hesitate to contact me. > We look forward to seeing you in San Diego. > > Kind Regards, > > Gretchen > > > > Gretchen Bartholomew > Conf. Planning Coordinator > O'Reilly & Associates > > Phone: 707-827-7186 > Fax: 707-823-9746 > > > ============ > O'Reilly Open Source Convention > Sheraton San Diego Hotel & Marina > July 22 - 26, 2002 -- San Diego, CA > > http://conferences.oreilly.com/oscon/ > ============ > > ============ > O'Reilly Mac OS X Conference > Westin Santa Clara > Sept. 30 - Oct. 3, 2002 -- Santa Clara, CA > > http://conferences.oreillynet.com/macosx2002/ > ============ > > > > _______________________________________________ > Persistence-sig mailing list > Persistence-sig@python.org > http://mail.python.org/mailman-21/listinfo/persistence-sig > -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From pobrien@orbtech.com Wed Jul 10 14:19:32 2002 From: pobrien@orbtech.com (Patrick K. 
O'Brien) Date: Wed, 10 Jul 2002 08:19:32 -0500 Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session - confirmation In-Reply-To: <3D2C3135.4070401@zope.com> Message-ID: [Jim Fulton] > > Is there any chance we could move this to Wednesday? > > I'm leaving Thursday morning. :( The only timeslot on Wednesday is from 8 to 10 and that is taken up by: Python Software Foundation Date: 07/24/2002 Time: 8:00pm - 10:00pm Location: Marina II in the East Tower Moderated by: Guido van Rossum The only timeslot on Tuesday is from 6 to 7 and that is taken up by: What is Python? Date: 07/23/2002 Time: 6:00pm - 7:00pm Location: Harbor Island I in the East Tower Moderated by: Wesley J. Chun, CyberWeb Consulting Monday has no conflicts for the entire timeslot from 6 to 10, but I don't fly in until Tuesday and I wasn't sure if many people would be there on Monday. That's why I picked Thursday. I'm open to suggestions. It would be a shame not to have you there. Thoughts? -- Patrick K. O'Brien Orbtech ----------------------------------------------- "Your source for Python software development." ----------------------------------------------- Web: http://www.orbtech.com/web/pobrien/ Blog: http://www.orbtech.com/blog/pobrien/ Wiki: http://www.orbtech.com/wiki/PatrickOBrien ----------------------------------------------- From guido@python.org Wed Jul 10 14:20:55 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 10 Jul 2002 09:20:55 -0400 Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session - confirmation In-Reply-To: Your message of "Wed, 10 Jul 2002 09:05:57 EDT." <3D2C3135.4070401@zope.com> References: <3D2C3135.4070401@zope.com> Message-ID: <200207101320.g6ADKtH25999@pcp02138704pcs.reston01.va.comcast.net> > Is there any chance we could move this to Wednesday? > > I'm leaving Thursday morning. :( I'd prefer Wednesday too -- Thursday night I have an OSI board meeting to attend. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Jul 10 14:22:37 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 10 Jul 2002 09:22:37 -0400 Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session - confirmation In-Reply-To: Your message of "Wed, 10 Jul 2002 08:19:32 CDT." References: Message-ID: <200207101322.g6ADMbn26017@pcp02138704pcs.reston01.va.comcast.net> > The only timeslot on Wednesday is from 8 to 10 and that is taken up by: > > Python Software Foundation > Date: 07/24/2002 > Time: 8:00pm - 10:00pm > Location: Marina II in the East Tower > Moderated by: Guido van Rossum Oops, I forgot. Strike Wednesday, too. > The only timeslot on Tuesday is from 6 to 7 and that is taken up by: > > What is Python? > Date: 07/23/2002 > Time: 6:00pm - 7:00pm > Location: Harbor Island I in the East Tower > Moderated by: Wesley J. Chun, CyberWeb Consulting We can overlap with this -- Wesley's BOF is for absolute beginners, ours for dyed-in-the-wool developers. --Guido van Rossum (home page: http://www.python.org/~guido/) From pobrien@orbtech.com Wed Jul 10 14:27:46 2002 From: pobrien@orbtech.com (Patrick K. O'Brien) Date: Wed, 10 Jul 2002 08:27:46 -0500 Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session - confirmation In-Reply-To: <200207101322.g6ADMbn26017@pcp02138704pcs.reston01.va.comcast.net> Message-ID: [Guido van Rossum] > > The only timeslot on Tuesday is from 6 to 7 and that is taken up by: > > > > What is Python? > > Date: 07/23/2002 > > Time: 6:00pm - 7:00pm > > Location: Harbor Island I in the East Tower > > Moderated by: Wesley J. Chun, CyberWeb Consulting > > We can overlap with this -- Wesley's BOF is for absolute beginners, > ours for dyed-in-the-wool developers. Does Tuesday work for you, Jim? -- Patrick K. O'Brien Orbtech ----------------------------------------------- "Your source for Python software development." 
----------------------------------------------- Web: http://www.orbtech.com/web/pobrien/ Blog: http://www.orbtech.com/blog/pobrien/ Wiki: http://www.orbtech.com/wiki/PatrickOBrien ----------------------------------------------- From jim@zope.com Wed Jul 10 15:05:31 2002 From: jim@zope.com (Jim Fulton) Date: Wed, 10 Jul 2002 10:05:31 -0400 Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session - confirmation References: <200207101322.g6ADMbn26017@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3D2C3F2B.8060808@zope.com> Guido van Rossum wrote: >>The only timeslot on Wednesday is from 8 to 10 and that is taken up by: >> >>Python Software Foundation >>Date: 07/24/2002 >>Time: 8:00pm - 10:00pm >>Location: Marina II in the East Tower >>Moderated by: Guido van Rossum >> > > Oops, I forgot. Strike Wednesday, too. > > >>The only timeslot on Tuesday is from 6 to 7 and that is taken up by: >> >>What is Python? >>Date: 07/23/2002 >>Time: 6:00pm - 7:00pm >>Location: Harbor Island I in the East Tower >>Moderated by: Wesley J. Chun, CyberWeb Consulting >> > > We can overlap with this -- Wesley's BOF is for absolute beginners, > ours for dyed-in-the-wool developers. Unfortunately, I don't arrive at the SD airport till 8pm. I chose to keep my time at OSCON short this year. :( I guess I'll just miss the BOF. Dang. I suggest you go ahead with Thursday evening. I'll see how hard it would be to extend my stay another day. Jim -- Jim Fulton mailto:jim@zope.com Python Powered!
CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From donnalcwalter@yahoo.com Wed Jul 10 15:24:24 2002 From: donnalcwalter@yahoo.com (Donnal Walter) Date: Wed, 10 Jul 2002 07:24:24 -0700 (PDT) Subject: [Persistence-sig] getting started In-Reply-To: <15658.57844.17239.668311@slothrop.zope.com> Message-ID: <20020710142424.32998.qmail@web13901.mail.yahoo.com> [Jeremy Hylton] > Introductions: Please tell us about your interest in the > persistence SIG, what personal/professional goals you have for > it, and how much time & energy you have. (Feel free to lurk if > that's your preference.) Programming is an avocation for me (I'm an academic physician) so I am sure I will be lurking mostly. But the custom clinical apps on which I have been working all require persistent data, so I will be following these proceedings with interest. I have little expertise, some time, and lots of energy. :-) ===== Donnal Walter Arkansas Children's Hospital __________________________________________________ Do You Yahoo!? Sign up for SBC Yahoo! Dial - First Month Free http://sbc.yahoo.com From sdrees@sdrees2.de Wed Jul 10 15:40:54 2002 From: sdrees@sdrees2.de (Stefan Drees) Date: Wed, 10 Jul 2002 16:40:54 +0200 Subject: [Persistence-sig] getting started In-Reply-To: <15658.57844.17239.668311@slothrop.zope.com>; from jeremy@zope.com on Tue, Jul 09, 2002 at 09:15:32AM -0400 References: <15658.57844.17239.668311@slothrop.zope.com> Message-ID: <20020710164054.B14438@sdrees2.de> On Tue, Jul 09, 2002 at 09:15:32AM -0400 - a wonderful day - Jeremy Hylton wrote: > ... > Introductions: Please tell us about your interest in the > persistence SIG, what personal/professional goals you have for > it, and how much time & energy you have. (Feel free to lurk if > that's your preference.) I've been programming and consulting since 1989. For now I am sure I will be lurking first. 
But a standardized persistency layer in python seems - at least to me - to be an important feature for python to stay competitive. So I will be following these discussions and hopefully the coding with interest and some participation, I guess. I do have some expertise, well at least some time, and energy. All the best, s t e f a n. -- Stefan Drees, sdrees@acm.org. From guido@python.org Wed Jul 10 15:56:41 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 10 Jul 2002 10:56:41 -0400 Subject: [Persistence-sig] getting started In-Reply-To: Your message of "Wed, 10 Jul 2002 16:40:54 +0200." <20020710164054.B14438@sdrees2.de> References: <15658.57844.17239.668311@slothrop.zope.com> <20020710164054.B14438@sdrees2.de> Message-ID: <200207101456.g6AEufg26328@pcp02138704pcs.reston01.va.comcast.net> > But a standardized persistency layer in python seems - at least to > me - to be an important feature for python to stay competitive. What is the competition doing in this area? --Guido van Rossum (home page: http://www.python.org/~guido/) From sdrees@sdrees2.de Wed Jul 10 16:54:43 2002 From: sdrees@sdrees2.de (Stefan Drees) Date: Wed, 10 Jul 2002 17:54:43 +0200 Subject: [Persistence-sig] getting started In-Reply-To: <200207101456.g6AEufg26328@pcp02138704pcs.reston01.va.comcast.net>; from guido@python.org on Wed, Jul 10, 2002 at 10:56:41AM -0400 References: <15658.57844.17239.668311@slothrop.zope.com> <20020710164054.B14438@sdrees2.de> <200207101456.g6AEufg26328@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020710175443.A15365@sdrees2.de> On Wed, Jul 10, 2002 at 10:56:41AM -0400 - a wonderful day - Guido van Rossum wrote: > > But a standardized persistency layer in python seems - at > > least to me - to be an important feature for python to stay > > competitive. > What is the competition doing in this area? Hm, nothing I'm aware of, but that's the point: staying ahead in some important areas just helps, doesn't it? All the best, s t e f a n. 
-- Stefan Drees, sdrees@acm.org. From guido@python.org Wed Jul 10 17:00:42 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 10 Jul 2002 12:00:42 -0400 Subject: [Persistence-sig] getting started In-Reply-To: Your message of "Wed, 10 Jul 2002 17:54:43 +0200." <20020710175443.A15365@sdrees2.de> References: <15658.57844.17239.668311@slothrop.zope.com> <20020710164054.B14438@sdrees2.de> <200207101456.g6AEufg26328@pcp02138704pcs.reston01.va.comcast.net> <20020710175443.A15365@sdrees2.de> Message-ID: <200207101600.g6AG0gY26597@pcp02138704pcs.reston01.va.comcast.net> > > > But a standardized persistency layer in python seems - at > > > least to me - to be an important feature for python to stay > > > competitive. > > What is the competition doing in this area? > Hm, nothing I'm aware of, but that's the point: staying ahead > in some important areas just helps, doesn't it? I dunno. I personally believe there's a reason why few languages standardize persistence, and why languages that do include persistence have remained at the fringe at best. --Guido van Rossum (home page: http://www.python.org/~guido/) From smenard@bigfoot.com Wed Jul 10 17:31:31 2002 From: smenard@bigfoot.com (Steve Menard) Date: Wed, 10 Jul 2002 12:31:31 -0400 Subject: [Persistence-sig] getting started In-Reply-To: <200207101600.g6AG0gY26597@pcp02138704pcs.reston01.va.comca st.net> References: <15658.57844.17239.668311@slothrop.zope.com> <20020710164054.B14438@sdrees2.de> <200207101456.g6AEufg26328@pcp02138704pcs.reston01.va.comcast.net> <20020710175443.A15365@sdrees2.de> Message-ID: <5.1.0.14.0.20020710122707.02a76d70@pop.videotron.ca> At 12:00 PM 7/10/2002 -0400, Guido van Rossum wrote: > > > > But a standardized persistency layer in python seems - at > > > > least to me - to be an important feature for python to stay > > > > competitive. > > > What is the competition doing in this area? 
> > Hm, nothing I'm aware of, but that's the point: staying ahead > > in some important areas just helps, doesn't it? > >I dunno. I personally believe there's a reason why few languages >standardize persistence, and why languages that do include persistence >have remained at the fringe at best. > >--Guido van Rossum (home page: http://www.python.org/~guido/) Could you elaborate on why you believe so? I know the technical hurdles will not be insignificant, and we have to be careful not to try to come up with "THE ONE TRUE SOLUTION" that would be supposed to solve everyone's problems. Personally, something like ZOPE, with a few enhancements and guaranteed to work on any platform (read pure-python), would go a LONG way in the right direction. More static languages like C++, Java, Eiffel etc. will naturally have a harder time creating versatile and easy to use persistence. That's where python's dynamic nature should help us. Steve From jmillr@umich.edu Wed Jul 10 17:42:13 2002 From: jmillr@umich.edu (John Miller) Date: Wed, 10 Jul 2002 12:42:13 -0400 Subject: [Persistence-sig] getting started In-Reply-To: Message-ID: Like others, I expect mainly to lurk. I would appreciate it if someone were willing to explain how the goals of this sig go beyond pickling and shelving. I know that this sounds like a newbie question, (which, in most respects, I am) but it would help to make explicit the context for the ensuing discussion. Since Python already incorporates persistence via pickling and shelving, what is currently lacking? (I know that the answer is probably obvious to most people on this list.) In other words, quickly describe the difference between pickling and shelving, describe how ZODB incorporates one or the other or both, and why or why not extending pickling and/or shelving themselves is a wise move to achieve the goals of this sig. Thanks in advance to anyone willing to lay the groundwork in this or a similar fashion for us developers-in-training!
John Miller School of Education University of Michigan >>>> But a standardized persistency layer in python seems - at >>>> least to me - to be an important feature for python to stay >>>> competitive. >>> What is the competition doing in this area? >> Hm, nothing I'm aware of, but that's the point: staying ahead >> in some important areas just helps, doesn't it? > > I dunno. I personally believe there's a reason why few languages > standardize persistence, and why languages that do include persistence > have remained at the fringe at best. > > --Guido van Rossum (home page: http://www.python.org/~guido/) From jim@zope.com Wed Jul 10 18:01:00 2002 From: jim@zope.com (Jim Fulton) Date: Wed, 10 Jul 2002 13:01:00 -0400 Subject: [Persistence-sig] getting started References: Message-ID: <3D2C684C.9060307@zope.com> John Miller wrote: > Like others, I expect mainly to lurk. I would appreciate it if someone > were willing to explain how the goals of this sig go beyond pickling and > shelving. I know that this sounds like a newbie question, (which, in > most respects, I am) but it would help to make explicit the context for > the ensuing discussion. Since Python already incorporates persistence > via pickling and shelving, what is currently lacking? (I know that the > answer is probably obvious to most people on this list.) In other words, > quickly describe the difference between pickling and shelving, describe > how ZODB incorporates one or the other or both, and why or why not > extending pickling and/or shelving themselves is a wise move to achieve > the goals of this sig. Thanks in advance to anyone willing to lay the > groundwork in this or a similar fashion for us developers-in-training! Pickling and shelving: - Are not transactional - Are not transparent The application must explicitly load and save objects, must track object changes, and must manage when objects are and are not in memory. - Do not work with relational data. 
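A minimal sketch of the transparency point above, using the standard library's shelve module under a current Python 3. It is an illustration only and not part of the original thread; the file name and the "todo" key are invented for the example.

```python
import os
import shelve
import tempfile

# Hypothetical shelf location, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "demo_shelf")

with shelve.open(path) as db:
    db["todo"] = ["write charter"]     # explicit save
    db["todo"].append("review PEP")    # mutates a temporary copy; nothing is stored

with shelve.open(path) as db:
    after_silent_change = list(db["todo"])

with shelve.open(path) as db:
    items = db["todo"]                 # explicit load
    items.append("review PEP")
    db["todo"] = items                 # the application has to store the change itself

with shelve.open(path) as db:
    after_explicit_store = list(db["todo"])

print(after_silent_change)
print(after_explicit_store)
```

With the default writeback=False, the in-place append is lost because the shelf hands back a freshly unpickled copy on every lookup. The application has to notice the change and assign the object back, which is the kind of bookkeeping a transparent persistence layer would aim to take over.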
The proposed frameworks would provide a common *basis* (not solution) for transparent transactional persistence, both for object databases, like ZODB and for object-relational mapping frameworks. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From jacobs@penguin.theopalgroup.com Wed Jul 10 18:27:53 2002 From: jacobs@penguin.theopalgroup.com (Kevin Jacobs) Date: Wed, 10 Jul 2002 13:27:53 -0400 (EDT) Subject: [Persistence-sig] getting started In-Reply-To: Message-ID: On Wed, 10 Jul 2002, John Miller wrote: > Like others, I expect mainly to lurk. I would appreciate it if someone > were willing to explain how the goals of this sig go beyond pickling and > shelving. The major reasons why things are more complex than shelve or pickle are due to the requirements of more sophisticated back-end data storage mechanisms. For one, the backend data store may need to be interoperable with other systems; i.e., relational or object database backends. Also, the backend store may be very large, so that loading and updating objects need to be done incrementally, efficiently, and safely. Here are some articles/debates about Java's persistent data objects that are useful, even if you don't agree with them: http://www.onjava.com/pub/a/onjava/2002/05/29/jdo.html http://www.onjava.com/pub/a/onjava/2002/04/10/jdbc.html Here is a partial taxonomy of issues I'd like to see addressed. You'll notice that many of them are somewhat specific to fixed schema persistent backends, like some object-relational (OR) systems: 1) Extensible bi-directional type mapping i.e., systems for mapping types from Python to an RDBMS and back in a lossless fashion. 2) Manual Schema specification vs. 
automatic schema introspection i.e., the ability to construct sensible objects from relations depends on the schema, but also other information like foreign keys, constraints, and possibly other meta-data not available from the backend. Some OR systems require differing amounts of user-specified schema information to build appropriate objects. 3) Foreign key referenced object instantiation i.e., how and when to instantiate new objects from the attributes of another object that can act as foreign keys. 4) Transactional scoping of object updates i.e., multiple OR-mapped objects can be queried from distinct transactions, then referentially linked together. This opens the door to several rather nasty situations, some of which can be handled, while others must be explicitly disallowed. 5) Systems for tracking of uncommitted object updates. 6) Query language abstraction for building OR frameworks. -- Kevin Jacobs The OPAL Group - Enterprise Systems Architect Voice: (216) 986-0710 x 19 E-mail: jacobs@theopalgroup.com Fax: (216) 986-0714 WWW: http://www.theopalgroup.com From guido@python.org Wed Jul 10 19:31:12 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 10 Jul 2002 14:31:12 -0400 Subject: [Persistence-sig] getting started In-Reply-To: Your message of "Wed, 10 Jul 2002 12:31:31 EDT." <5.1.0.14.0.20020710122707.02a76d70@pop.videotron.ca> References: <15658.57844.17239.668311@slothrop.zope.com> <20020710164054.B14438@sdrees2.de> <200207101456.g6AEufg26328@pcp02138704pcs.reston01.va.comcast.net> <20020710175443.A15365@sdrees2.de> <5.1.0.14.0.20020710122707.02a76d70@pop.videotron.ca> Message-ID: <200207101831.g6AIVCw27387@pcp02138704pcs.reston01.va.comcast.net> > >I dunno. I personally believe there's a reason why few languages > >standardize persistence, and why languages that do include persistence > >have remained at the fringe at best. > > > >--Guido van Rossum (home page: http://www.python.org/~guido/) > > Could you elaborate on why you believe so? 
> > I know the technical hurdles will not be insignificant, and we have to be > careful not to try to come up with "THE ONE TRUE SOLUTION" that would be > supposed to solve everyone's problems. Personally, something like ZOPE, > with a few enhancements and guaranteed to work on any platform (read > pure-python), would go a LONG way ion the right direction. Kevin Jacobs's posts here are an example of what I mean. He wants to map objects to relational databases, which is very different from Zope. Coming up with something that supports both sounds hard. > More static languages like C++, Java, Eiffel etc.. will naturally have a > harder time creating versatile and easy to use persistence. That's where > python's dynamic nature should help us. I don't know why you think that. As long as a language has the metadata describing its types at run-time, it should have no problem. At least Java and (modern) C++ satisfy this condition; I don't know enough about Eiffel but I'd bet that it also has considerable run-time accessible meta-data. --Guido van Rossum (home page: http://www.python.org/~guido/) From jacobs@penguin.theopalgroup.com Wed Jul 10 19:35:03 2002 From: jacobs@penguin.theopalgroup.com (Kevin Jacobs) Date: Wed, 10 Jul 2002 14:35:03 -0400 (EDT) Subject: [Persistence-sig] getting started In-Reply-To: <200207101831.g6AIVCw27387@pcp02138704pcs.reston01.va.comcast.net> Message-ID: On Wed, 10 Jul 2002, Guido van Rossum wrote: > Kevin Jacobs's posts here are an example of what I mean. He wants to > map objects to relational databases, which is very different from > Zope. Coming up with something that supports both sounds hard. It is different than ZODB, and it will be hard to come up with an implementation that supports both paradigms. However, I am much more concerned with interface at the moment, so I still think there is much useful work that can be done here that applies to both. 
-Kevin ;) -- Kevin Jacobs The OPAL Group - Enterprise Systems Architect Voice: (216) 986-0710 x 19 E-mail: jacobs@theopalgroup.com Fax: (216) 986-0714 WWW: http://www.theopalgroup.com From smenard@bigfoot.com Wed Jul 10 19:50:41 2002 From: smenard@bigfoot.com (Steve Menard) Date: Wed, 10 Jul 2002 14:50:41 -0400 Subject: [Persistence-sig] getting started In-Reply-To: <200207101831.g6AIVCw27387@pcp02138704pcs.reston01.va.comcast.net> References: <15658.57844.17239.668311@slothrop.zope.com> <20020710164054.B14438@sdrees2.de> <200207101456.g6AEufg26328@pcp02138704pcs.reston01.va.comcast.net> <20020710175443.A15365@sdrees2.de> <5.1.0.14.0.20020710122707.02a76d70@pop.videotron.ca> Message-ID: <5.1.0.14.0.20020710144606.02aa5130@pop.videotron.ca> At 02:31 PM 7/10/2002 -0400, Guido van Rossum wrote: > > >I dunno. I personally believe there's a reason why few languages > > >standardize persistence, and why languages that do include persistence > > >have remained at the fringe at best. > > > > > >--Guido van Rossum (home page: http://www.python.org/~guido/) > > > > Could you elaborate on why you believe so? > > > > I know the technical hurdles will not be insignificant, and we have to be > > careful not to try to come up with "THE ONE TRUE SOLUTION" that would be > > supposed to solve everyone's problems. Personally, something like ZOPE, > > with a few enhancements and guaranteed to work on any platform (read > > pure-python), would go a LONG way ion the right direction. > >Kevin Jacobs's posts here are an example of what I mean. He wants to >map objects to relational databases, which is very different from >Zope. Coming up with something that supports both sounds hard. Yep. I can't help but agree on this. I think it's possible to come up with a common public interface for both mechanisms. However, I doubt an object built for one model can be reused as-is in a different model. 
My personal interest is more in ZODB-like functionality becoming standard than in other, more enterprise-oriented solutions (as OR mappings seem to be). > > More static languages like C++, Java, Eiffel etc.. will naturally have a > > harder time creating versatile and easy to use persistence. That's where > > python's dynamic nature should help us. > >I don't know why you think that. As long as a language has the >metadata describing its types at run-time, it should have no problem. >At least Java and (modern) C++ satisfy this condition; I don't know >enough about Eiffel but I'd bet that it also has considerable run-time >accessible meta-data. It's the cost of accessing that information that makes it harder. I have worked on a few Java persistence prototypes, and I have never come up with something satisfactory. Tracking changes is hard (because we can't trap the setattr), getting/setting values is hard (access protection being the chief culprit), etc. Steve From smenard@bigfoot.com Wed Jul 10 19:58:00 2002 From: smenard@bigfoot.com (Steve Menard) Date: Wed, 10 Jul 2002 14:58:00 -0400 Subject: [Persistence-sig] getting started In-Reply-To: <15658.57844.17239.668311@slothrop.zope.com> Message-ID: <5.1.0.14.0.20020710145101.080e0dd0@pop.videotron.ca> At 09:15 AM 7/9/2002 -0400, Jeremy Hylton wrote: >It looks like many of the people who expressed interest in the SIG >have subscribed to the list, so we ought to get started. I think we >should begin with some introductions and a review of the SIG charter. > >Introductions: Please tell us about your interest in the persistence >SIG, what personal/professional goals you have for it, and how much >time & energy you have. (Feel free to lurk if that's your preference.) Well, better late than never. I am a professional programmer. I've been using Python on and off for a few years now. My main interest in this is so I can get ZODB-like functionality without too much fuss. 
I currently do not have a lot of time to devote, but as I will be in recovery all of September, I can put some time into coding/testing. Additionally, since I have a few projects waiting on just such functionality (they are why I started POD http://www.sourceforge.net/projects/pypod ), I will certainly be in a position to use whatever comes out of the SIG. >Charter: Jim Fulton wrote the SIG charter. A very brief summary is >that we should: > > - focus on transparency, transactions, and memory-caching issues; > > - put off concurrency control, queries, and constraints; > > - produce PEPs and, if there is consensus, code for the std library. One small comment. Perhaps the single missing feature of ZODB (besides not running on Python 2.2) is a query language. Furthermore, without adequate support from the Storage classes, such a language will be very difficult to tack on afterward. Steve From jim@zope.com Wed Jul 10 20:16:10 2002 From: jim@zope.com (Jim Fulton) Date: Wed, 10 Jul 2002 15:16:10 -0400 Subject: [Persistence-sig] getting started References: <15658.57844.17239.668311@slothrop.zope.com> <20020710164054.B14438@sdrees2.de> <200207101456.g6AEufg26328@pcp02138704pcs.reston01.va.comcast.net> <20020710175443.A15365@sdrees2.de> <5.1.0.14.0.20020710122707.02a76d70@pop.videotron.ca> <200207101831.g6AIVCw27387@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3D2C87FA.301@zope.com> Guido van Rossum wrote: >>>I dunno. I personally believe there's a reason why few languages >>>standardize persistence, and why languages that do include persistence >>>have remained at the fringe at best. >>> >>>--Guido van Rossum (home page: http://www.python.org/~guido/) >>> >>Could you elaborate on why you believe so? >> >>I know the technical hurdles will not be insignificant, and we have to be >>careful not to try to come up with "THE ONE TRUE SOLUTION" that would be >>supposed to solve everyone's problems. 
Personally, something like ZOPE, >>with a few enhancements and guaranteed to work on any platform (read >>pure-python), would go a LONG way ion the right direction. >> > > Kevin Jacobs's posts here are an example of what I mean. He wants to > map objects to relational databases, which is very different from > Zope. Coming up with something that supports both sounds hard. Maybe, but understand that O-R mapping is not in the scope of the SIG. Rather, basic persistence and transaction frameworks, that one could build O-R mappings or object databases on top of are in scope. I'm hopeful that we could come up with low-level frameworks that could serve both. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From altis@semi-retired.com Wed Jul 10 21:30:34 2002 From: altis@semi-retired.com (Kevin Altis) Date: Wed, 10 Jul 2002 13:30:34 -0700 Subject: [Persistence-sig] getting started Message-ID: I'm the lead for the PythonCard project http://pythoncard.sourceforge.net/ I'll mostly be lurking and don't expect that I will contribute any code. I'm not a database guy and I've only been using Python for a little over a year, so all you data gurus are much more qualified than I to say what is good and proper. However, if there is something usable that comes out of this SIG then it is likely a PythonCard sample or two will get created that utilizes the API/package. I'll go ahead and give a long introduction, to get it out of the way and hopefully bring up some relevant topics. Persistence in the context of PythonCard is probably a bit different than what most people have in mind for this SIG. We don't even have complete agreement among the main PythonCard developers on this topic. 
I would like to have a storage solution that is built-in to the Python standard distribution and that won't change in the next few years but preferably won't change for 5-10 years or longer, so that there is little risk of stored data becoming unusable as Python is updated. The data format must also be cross-platform, at least for the major desktop platforms in use, so that data created on one platform can be easily exchanged with a user on another platform without the need for an explicit import/export. This is where shelve falls down, unless you use dumbdbm. The storage format we end up using for PythonCard will be a basic document type that any PythonCard app/Python script should be able to open and make some sense of. Other storage formats will always be an option, but there will be at least one well-defined format that any and all apps should understand regardless of whether they are running on Windows, Mac OS X, Linux/Unix. I'm mostly thinking of storing "dumb data" or simple types, lists and dictionaries, so I'm not particularly concerned about being able to store instances of complex classes and their member relations. Storing class instances worries me because I expect some classes to change over time and potentially break the loading of old data files created with different versions of the classes. Plain pickles seem to fit my requirements as long as you only use native Python types, so that there are no dependencies on external classes and modules when loading the pickle. A conversion of the data to a newer format might be acceptable, but this implies some kind of versioning or other smarts in the data file. The PythonCard flatfileDatabase sample uses a simple list of dictionaries for storing data, keeping the entire data set in memory while the app is running. The data can be loaded and stored as a single pickle file (version in cvs, not release 0.6.7). 
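A minimal sketch of the list-of-dictionaries approach just described, assuming records hold only built-in types. The field names and file name are invented for illustration, and this is not the PythonCard sample's actual code.

```python
import os
import pickle
import tempfile

# Hypothetical records: only built-in types, so loading the pickle
# needs no external classes or modules.
records = [
    {"name": "Ada", "phone": "555-0100"},
    {"name": "Grace", "phone": "555-0199"},
]

path = os.path.join(tempfile.mkdtemp(), "records.pickle")

# The whole data set is written as one pickle...
with open(path, "wb") as f:
    pickle.dump(records, f)

# ...and has to be read back in full before any single record is usable.
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded[0]["name"])
```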
I would have preferred a solution where all the data didn't need to be in memory and the access to each record in the list was transparent, but I ran into issues trying to use shelve for this and we haven't gotten far enough along with ZODB to know whether it will do the job. A number of people working on PythonCard apps would be very happy if the simple lists and dictionaries could be mapped to underlying SQL data stores without the user of the storage needing to know anything about SQL. Concurrency and transactions would be nice too. I posted a message to the PythonCard-users mailing list about shelve at the end of June that covers some of the issues I ran into with shelve. "why we probably don't want to use shelve" http://aspn.activestate.com/ASPN/Mail/Message/1259977 There are even more messages in the PythonCard-users archive about persistence and pickle, but most of them only touch on issues this SIG will address. http://aspn.activestate.com/ASPN/Mail/Browse/Threaded/PythonCard ka --- Kevin Altis altis@semi-retired.com http://www.pythoncard.org/ From pje@telecommunity.com Wed Jul 10 21:29:42 2002 From: pje@telecommunity.com (Phillip J. Eby) Date: Wed, 10 Jul 2002 16:29:42 -0400 Subject: [Persistence-sig] getting started In-Reply-To: References: Message-ID: <3.0.5.32.20020710162942.00868100@telecommunity.com> At 01:27 PM 7/10/02 -0400, Kevin Jacobs wrote: > >Here is a partial taxonomy of issues I'd like to see addressed. You'll >notice that many of them are somewhat specific to fixed schema persistent >backends, like some object-relational (OR) systems: > > 1) Extensible bi-directional type mapping > 2) Manual Schema specification vs. automatic schema introspection > 3) Foreign key referenced object instantiation > 4) Transactional scoping of object updates > 5) Systems for tracking of uncommited object updates. > 6) Query language abstraction for building OR frameworks. 
These are all good points, but actually solving them is (IMHO) outside scope for the SIG's mission. What we want is basic support for: 1) "Transparent" persistence, for some value of "transparent". A mechanism to either specify that a class is intended to be persistent, or to otherwise provide proxy or observer-style support to allow a persistence *mechanism* to know when object states are changed or accessed, or about to be changed or accessed, in order to do their thing. 2) Transaction framework/API, for some value of "framework/API". Again, this is also about mechanisms for registering, observing, or otherwise notifying objects (or persistence mechanisms) about transaction participation. This is actually a very narrow set of goals, ones that I think we have a high degree of ability to achieve, if we stay focused on them, and how our individual high-level requirements (such as you've described) are reflected in these introspection/notification aspects. If Python has common idioms for dealing with these issues, then many persistence *mechanisms* can co-exist and compete in their respective niches for what they can handle. Perhaps multiple mechanisms might even be able to share the management of a single object. One reason that I believe these narrow goals are attainable, is that the existing Zope "Persistence" and "Transaction" packages from ZODB4 *can* be leveraged to build O-R and other mappings. I know this, because I've designed a framework atop the existing ZODB4 code base that can map *anything* to or from *anything*. Specific examples: * Relational database * XML/XMI in a file * XML/XMI, persisted to a relational database * XML/XMI, persisted in ZODB * A relational database written using persistent objects for tables and rows, stored in ZODB. :) Indeed, the design I have is of sufficient generality to persist any object to any backend (where said backend may actually be another persistent object, stored in yet another backend!), as long as: 1. 
All objects to be persisted subclass Persistence.Persistent. 2. All backends participate in the Transactions.Transaction framework. (There is an additional restriction when dealing with backends which themselves are stored in other backends, which is that the "outermost" backends must support potentially committing the same object more than once during the tpc_begin->tpc_vote phase of transaction commit.) (I should also note that when dealing with relational databases, my design work addressed such matters as cache consistency, relational integrity constraint ordering, multi-row queries, foreign key and inverse foreign key relationships, etc., etc., ad nauseam.) Anyway, the fact that these things can be done based solely on the existing ZODB4 Persistent and Transaction classes, entirely ignoring the "ZODB" package itself, means that what's available from ZODB is actually pretty close to what's needed as a base mechanism. It's more a question (to me, anyway) of what could/should be improved, particularly in the form of how the API calls and interfaces are phrased. For my requirements, I'd be fine with it if we just put Persistent and Transaction in the standard library, with better docs. :) But it'd be nice if certain things were spelled differently, or a bit more flexible. From pobrien@orbtech.com Wed Jul 10 21:38:15 2002 From: pobrien@orbtech.com (Patrick K. O'Brien) Date: Wed, 10 Jul 2002 15:38:15 -0500 Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session - confirmation In-Reply-To: <3D2C3F2B.8060808@zope.com> Message-ID: [Jim Fulton] > > Unfortunately, I don't arrive at the SD airport till 8pm. > > I chose to keep my time as OSCON short this year. :( > > I guess I'll just miss the BOF. Dang. > > I suggest you go ahead with Thursday evening. > > I'll see how hard it would be to extend my stay another day. Unfortunately, Guido has another meeting Thursday evening. Would moving the time up to 6:00 pm on Thursday help? 
I would think this BOF would be most productive if we had the Pope, the BDFL and the SIG Coordinator all together at the same time. But I'll take whatever I can get. :-) -- Patrick K. O'Brien Orbtech ----------------------------------------------- "Your source for Python software development." ----------------------------------------------- Web: http://www.orbtech.com/web/pobrien/ Blog: http://www.orbtech.com/blog/pobrien/ Wiki: http://www.orbtech.com/wiki/PatrickOBrien ----------------------------------------------- From jim@zope.com Wed Jul 10 21:47:21 2002 From: jim@zope.com (Jim Fulton) Date: Wed, 10 Jul 2002 16:47:21 -0400 Subject: [Persistence-sig] getting started References: <3.0.5.32.20020710162942.00868100@telecommunity.com> Message-ID: <3D2C9D59.2000203@zope.com> Very well said. Jim Phillip J. Eby wrote: > At 01:27 PM 7/10/02 -0400, Kevin Jacobs wrote: > >>Here is a partial taxonomy of issues I'd like to see addressed. You'll >>notice that many of them are somewhat specific to fixed schema persistent >>backends, like some object-relational (OR) systems: >> >> 1) Extensible bi-directional type mapping >> 2) Manual Schema specification vs. automatic schema introspection >> 3) Foreign key referenced object instantiation >> 4) Transactional scoping of object updates >> 5) Systems for tracking of uncommited object updates. >> 6) Query language abstraction for building OR frameworks. >> > > These are all good points, but actually solving them is (IMHO) outside > scope for the SIG's mission. What we want is basic support for: > > 1) "Transparent" persistence, for some value of "transparent". A mechanism > to either specify that a class is intended to be persistent, or to > otherwise provide proxy or observer-style support to allow a persistence > *mechanism* to know when object states are changed or accessed, or about to > be changed or accessed, in order to do their thing. > > 2) Transaction framework/API, for some value of "framework/API". 
Again, > this is also about mechanisms for registering, observing, or otherwise > notifying objects (or persistence mechanisms) about transaction participation. > > This is actually a very narrow set of goals, ones that I think we have a > high degree of ability to achieve, if we stay focused on them, and how our > individual high-level requirements (such as you've described) are reflected > in these introspection/notification aspects. If Python has common idioms > for dealing with these issues, then many persistence *mechanisms* can > co-exist and compete in their respective niches for what they can handle. > Perhaps multiple mechanisms might even be able to share the management of a > single object. > > One reason that I believe these narrow goals are attainable, is that the > existing Zope "Persistence" and "Transaction" packages from ZODB4 *can* be > leveraged to build O-R and other mappings. I know this, because I've > designed a framework atop the existing ZODB4 code base that can map > *anything* to or from *anything*. Specific examples: > > * Relational database > * XML/XMI in a file > * XML/XMI, persisted to a relational database > * XML/XMI, persisted in ZODB > * A relational database written using persistent objects for tables and > rows, stored in ZODB. :) > > Indeed, the design I have is of sufficient generality to persist any object > to any backend (where said backend may actually be another persistent > object, stored in yet another backend!), as long as: > > 1. All objects to be persisted subclass Persistence.Persistent. > 2. All backends participate in the Transactions.Transaction framework. > > (There is an additional restriction when dealing with backends which > themselves are stored in other backends, which is that the "outermost" > backends must support potentially commiting the same object more than once > during the tpc_begin->tpc_vote phase of transaction commit.) 
> > (I should also note that when dealing with relational databases, my design > work addressed such matters as cache consistency, relational integrity > constraint ordering, multi-row queries, foreign key and inverse foreign key > relationships, etc., etc., ad nauseam.) > > Anyway, the fact that these things can be done based solely on the existing > ZODB4 Persistent and Transaction classes, entirely ignoring the "ZODB" > package itself, means that what's available from ZODB is actually pretty > close to what's needed as a base mechanism. It's more a question (to me, > anyway) of what could/should be improved, particularly in the form of how > the API calls and interfaces are phrased. > > For my requirements, I'd be fine with it if we just put Persistent and > Transaction in the standard library, with better docs. :) But it'd be > nice if certain things were spelled differently, or a bit more flexible. > > > > _______________________________________________ > Persistence-sig mailing list > Persistence-sig@python.org > http://mail.python.org/mailman-21/listinfo/persistence-sig > -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From guido@python.org Wed Jul 10 22:00:33 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 10 Jul 2002 17:00:33 -0400 Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session - confirmation In-Reply-To: Your message of "Wed, 10 Jul 2002 15:38:15 CDT." References: Message-ID: <200207102100.g6AL0Xw27884@pcp02138704pcs.reston01.va.comcast.net> > Unfortunately, Guido has another meeting Thursday evening. Would > moving the time up to 6:00 pm on Thursday help? I would think this > BOF would be most productive if we had the Pope, the BDFL and the > SIG Coordinator all together at the same time. But I'll take > whatever I can get. :-) Alas, the OSI board has a dinner meeting preceding the (open) board meeting starting at 5:30 on Thu. 
Perhaps we could try to do this over lunch on Wed? (Lunch Wed is booked too for me...) Or we could pick a slot in the Python track that is unlikely to be of interest for the persistence crowd. I could miss the two Thursday morning talks (weave and WRDLpy -- no offense intended). --Guido van Rossum (home page: http://www.python.org/~guido/) From jim@zope.com Wed Jul 10 22:08:41 2002 From: jim@zope.com (Jim Fulton) Date: Wed, 10 Jul 2002 17:08:41 -0400 Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session - confirmation References: <200207102100.g6AL0Xw27884@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3D2CA259.8040900@zope.com> Guido van Rossum wrote: >>Unfortunately, Guido has another meeting Thursday evening. Would >>moving the time up to 6:00 pm on Thursday help? I would think this >>BOF would be most productive if we had the Pope, the BDFL and the >>SIG Coordinator all together at the same time. But I'll take >>whatever I can get. :-) >> > > Alas, the OSI board has a dinner meeting preceding the (open) board > meeting starting at 5:30 on Thu. Perhaps we could try to do this over > lunch on Wed? (Lunch Wed is booked too for me...) > > Or we could pick a slot in the Python track that is unlikely to be of > interest for the persistence crowd. I could miss the two Thursday > morning talks (weave and WRDLpy -- no offense intended). This last idea sounds good to me. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From guido@python.org Wed Jul 10 22:10:13 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 10 Jul 2002 17:10:13 -0400 Subject: [Persistence-sig] getting started In-Reply-To: Your message of "Wed, 10 Jul 2002 16:47:21 EDT." 
<3D2C9D59.2000203@zope.com> References: <3.0.5.32.20020710162942.00868100@telecommunity.com> <3D2C9D59.2000203@zope.com> Message-ID: <200207102110.g6ALAEC27920@pcp02138704pcs.reston01.va.comcast.net> > Phillip J. Eby wrote: [...] > > Anyway, the fact that these things can be done based solely on the existing > > ZODB4 Persistent and Transaction classes, entirely ignoring the "ZODB" > > package itself, means that what's available from ZODB is actually pretty > > close to what's needed as a base mechanism. It's more a question (to me, > > anyway) of what could/should be improved, particularly in the form of how > > the API calls and interfaces are phrased. > > > > For my requirements, I'd be fine with it if we just put Persistent and > > Transaction in the standard library, with better docs. :) But it'd be > > nice if certain things were spelled differently, or a bit more flexible. This is a goal I can agree with. Care to start a list of what spellings you'd like to change? --Guido van Rossum (home page: http://www.python.org/~guido/) From jim@zope.com Wed Jul 10 21:52:39 2002 From: jim@zope.com (Jim Fulton) Date: Wed, 10 Jul 2002 16:52:39 -0400 Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session - confirmation References: Message-ID: <3D2C9E97.8020608@zope.com> Patrick K. O'Brien wrote: > [Jim Fulton] > >>Unfortunately, I don't arrive at the SD airport till 8pm. >> >>I chose to keep my time as OSCON short this year. :( >> >>I guess I'll just miss the BOF. Dang. >> >>I suggest you go ahead with Thursday evening. >> >>I'll see how hard it would be to extend my stay another day. >> > > Unfortunately, Guido has another meeting Thursday evening. Would moving the > time up to 6:00 pm on Thursday help? I would think this BOF would be most > productive if we had the Pope, the BDFL and the SIG Coordinator all together > at the same time. But I'll take whatever I can get. :-) I'm pretty sure that Jeremy isn't going to be there. 
If earlier on Thursday can work, then I'll try to change my reservation. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From pobrien@orbtech.com Wed Jul 10 23:12:45 2002 From: pobrien@orbtech.com (Patrick K. O'Brien) Date: Wed, 10 Jul 2002 17:12:45 -0500 Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session - confirmation In-Reply-To: <3D2CA259.8040900@zope.com> Message-ID: [Jim Fulton] > > > > Or we could pick a slot in the Python track that is unlikely to be of > > interest for the persistence crowd. I could miss the two Thursday > > morning talks (weave and WRDLpy -- no offense intended). > > This last idea sounds good to me. I'm waiting to hear back from O'Reilly to see if we can make this happen. -- Patrick K. O'Brien Orbtech ----------------------------------------------- "Your source for Python software development." ----------------------------------------------- Web: http://www.orbtech.com/web/pobrien/ Blog: http://www.orbtech.com/blog/pobrien/ Wiki: http://www.orbtech.com/wiki/PatrickOBrien ----------------------------------------------- From jcw@equi4.com Thu Jul 11 00:16:37 2002 From: jcw@equi4.com (Jean-Claude Wippler) Date: Thu, 11 Jul 2002 01:16:37 +0200 Subject: [Persistence-sig] getting started In-Reply-To: <200207102110.g6ALAEC27920@pcp02138704pcs.reston01.va.comcast.net> References: <200207102110.g6ALAEC27920@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020710231637.10844@triqs.com> Jeremy Hylton wrote: >Introductions: Please tell us about your interest in the persistence I have a long-standing interest in persistence and scripting. Finding a middle ground between the relational data model, object storage, structured storage, and plain serialization is a key area of focus for me. 
I'm self-employed, and have been so for well over a decade, with a mix of working on commissioned projects and doing research on persistence and scripting (more and more so). What I would hope to see happen here, is a generalization away from being purely OO (which has no intrinsic connection to persistence), purely single-language (because data often lives *far* longer than language technologies do), or even purely relational (which provides insufficient expressiveness for algorithmic optimizations). I think the main focus needs to be on data representation, in such a way that language access and memory-mapped files can effectively interface to each other. Ultimately, it may affect the very core, e.g. PyObject changes. As designer of the MetaKit embedded database, which binds to several languages, has many years of production use (maintaining full datafile compatibility), and is finding its way into Roundup (Python), Starkits (Tcl), and the AddressBook of every Mac (C++), I can't help but think that there has to be something to an approach which focuses on generality-through-simplicity. So much for the blurb. If this forum is about finding a genuine common ground for persistence and scripting, not just ZODB and/or Python, then I would love to help make things happen, and contribute serious time and code (FWIW). Jean-Claude Wippler Equi4 Software - http://www.equi4.com From pje@telecommunity.com Thu Jul 11 00:38:00 2002 From: pje@telecommunity.com (Phillip J. Eby) Date: Wed, 10 Jul 2002 19:38:00 -0400 Subject: [Persistence-sig] getting started In-Reply-To: <200207102110.g6ALAEC27920@pcp02138704pcs.reston01.va.comca st.net> References: <3.0.5.32.20020710162942.00868100@telecommunity.com> <3D2C9D59.2000203@zope.com> Message-ID: <3.0.5.32.20020710193800.00893350@telecommunity.com> At 05:10 PM 7/10/02 -0400, Guido van Rossum wrote: >> Phillip J. Eby wrote: >[...] 
>> > Anyway, the fact that these things can be done based solely on the existing
>> > ZODB4 Persistent and Transaction classes, entirely ignoring the "ZODB"
>> > package itself, means that what's available from ZODB is actually pretty
>> > close to what's needed as a base mechanism. It's more a question (to me,
>> > anyway) of what could/should be improved, particularly in the form of how
>> > the API calls and interfaces are phrased.
>> >
>> > For my requirements, I'd be fine with it if we just put Persistent and
>> > Transaction in the standard library, with better docs. :) But it'd be
>> > nice if certain things were spelled differently, or a bit more flexible.
>
>This is a goal I can agree with. Care to start a list of what
>spellings you'd like to change?
>

As I said, I can pretty much live with it all as it is now. Some minor
annoyances:

* There's no way to be notified that a "transaction is over". You have to
  trap different messages from Transaction, while perhaps registering a
  dummy object, just to figure out transaction boundaries. This is a pain
  when creating transactional caches, i.e., ones which want to clear
  themselves whenever a transaction commits *or* aborts.

* A similar, related pain, is that you have to re-register on *every*
  transaction, and keep track of whether you've registered yet, any time you
  do something that might mean you *should* be registered. A way to
  "permanently" (i.e. until app termination or otherwise requested)
  subscribe to transaction begin/end messages would be very handy. Or even
  to the whole tpc_begin/vote/finish message sequence.

* While on the subject of such messages, why should the Transaction object
  have to be the one to keep track of changed objects? Why shouldn't data
  managers do that themselves?
  In the case of my "storage jars" model, I have to track "currently dirty
  objects" separately from the transaction's list of objects needing to be
  committed, because I may "pre-flush" certain changes to say, an RDBMS, in
  order to ensure that queries within the same transaction will see the
  updated data. Since the "jar" has to track this anyway, why does the
  Transaction need to do the same? Why not just send the jars a set of
  begin/vote/finish messages?

  In my current framework, my "jars" automatically detect when they're being
  asked to commit something that's already flushed to the back-end, and
  ignore it. If the Transaction didn't bother tracking stuff and telling me
  to commit it, I'd just have tpc_begin cause a flush of all dirty objects,
  and I'd be ready for tpc_vote. Not only that, but the Transaction object
  itself would get lots simpler, and wouldn't need to have complex logic to
  manage data managers' objects for them!

  (Granted, data managers would need to know which items they "committed"
  during tpc_begin->tpc_vote, in order to roll them back, but I suspect that
  many data managers are already tracking this in some form, if only to do
  invalidation messages.)

* "Ghosting" attributes. Right now, persistent objects are either loaded, or
  not. There's no way to designate an object as "loaded except for
  attributes X, Y, and Z". Why do I need that? Because I may have data
  stored for that object in different back-ends (LDAP and SQL is a combo
  that comes up often for me) and don't want to incur a possibly large
  load-time penalty to get all the (non-object) attributes, that may not
  even get read during a particular transaction.

  So, if we're talking about redoing Persistence.Persistent, I'd like to see
  attribute-specific read/write monitoring, if it doesn't add so much
  performance overhead as to remove the benefits of having it.
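To make the "loaded except for attributes X, Y, and Z" idea concrete, here is
a minimal sketch of attribute-specific read/write monitoring using a Python
descriptor. It is only an illustration, not ZODB's or Persistence.Persistent's
API: the names LazyAttribute, load_phone, Person and the _dirty set are all
hypothetical, and the loader function merely stands in for a slow LDAP or SQL
lookup.

```python
class LazyAttribute:
    """Descriptor that loads one attribute on first read and records writes.

    Hypothetical helper for illustration; not part of ZODB or Persistent.
    """

    def __init__(self, loader):
        self.loader = loader              # callable(obj) -> value

    def __set_name__(self, owner, name):
        self.name = name                  # attribute name on the owning class

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self                   # accessed on the class itself
        if self.name not in obj.__dict__:
            # Still a "ghost": hit the back-end only now, on first access.
            obj.__dict__[self.name] = self.loader(obj)
        return obj.__dict__[self.name]

    def __set__(self, obj, value):
        obj.__dict__[self.name] = value
        # Write monitoring: remember which attributes need to be stored.
        obj.__dict__.setdefault("_dirty", set()).add(self.name)


calls = []                                # records each simulated back-end hit


def load_phone(person):
    calls.append(person.uid)              # stands in for an LDAP/SQL query
    return "555-0100"


class Person:
    phone = LazyAttribute(load_phone)

    def __init__(self, uid):
        self.uid = uid


p = Person("pje")
loads_before = len(calls)                 # nothing loaded at construction time
first = p.phone                           # triggers the loader
second = p.phone                          # served from the instance dict
p.phone = "555-0199"                      # marks the attribute dirty
```

The object itself stays "loaded" throughout; only the one attribute is
deferred, and a data manager could consult _dirty to decide what to write
back. Whether this is cheap enough per access is exactly the overhead
question raised above.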
  By the way, it would be an acceptable solution for this if we had
  extremely lightweight proxies that could stand-in for an arbitrary Python
  object, and call something to load the "real" object upon access. Of
  course, if we had such an animal, it could replace the need for
  subclassing Persistence.Persistent in the first place! It could also trap
  all the "modifying" methods like __setitem__, __setslice__, etc.

  (Interestingly, the Zope 3 security proxy objects written in C, look to me
  to have sufficient generality to perform these functions, in that they
  monitor all attribute and method accesses. Although I am perhaps missing
  whether they work in regard to operations that the object performs upon
  *itself*. It may be that such accesses are not checked, but would need to
  be for a persistence proxy.)

Anyway, the above pretty much sums up my principal annoyances/peeves with
Persistent and Transaction. I can pretty much do everything I want with the
existing systems, but the above things would make them easier to do.

(Right now, to do state that's loaded from multiple back-ends, I have to
have some kind of support added into the object, or change its class on the
fly to add descriptors for lazily-loaded attributes.)

From pje@telecommunity.com Thu Jul 11 01:38:49 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Wed, 10 Jul 2002 20:38:49 -0400
Subject: [Persistence-sig] getting started
In-Reply-To: <5.1.0.14.0.20020710144606.02aa5130@pop.videotron.ca>
References: <200207101831.g6AIVCw27387@pcp02138704pcs.reston01.va.comca
 st.net>
 <15658.57844.17239.668311@slothrop.zope.com>
 <20020710164054.B14438@sdrees2.de>
 <200207101456.g6AEufg26328@pcp02138704pcs.reston01.va.comcast.net>
 <20020710175443.A15365@sdrees2.de>
 <5.1.0.14.0.20020710122707.02a76d70@pop.videotron.ca>
Message-ID: <5.1.0.14.0.20020710203438.05ed8eb0@mail.telecommunity.com>

At 02:50 PM 7/10/02 -0400, Steve Menard wrote:

>Yep. I can't help but agree on this.
I think it's possible to come up with
>a common public interface for both mechanisms. However, I doubt an object
>built for one model can be reused as-is in a different model.

If by "object", you mean "persistence mechanism/mapping", then I agree. But,
if by "object" you mean the object to be persisted, I disagree. An explicit
goal of my recent work was to support transparent switching between
persistence *mechanisms* without any change to the objects which were to be
stored.

The only place I've been less than 100% successful using the ZODB4 P&T
packages, is with objects stored in more than one back-end. And even there,
I can do it successfully as long as I don't need lazy-loading of
"non-object" attributes. (By which I mean values I'd like to have as simple
strings or numbers, without subclassing them to create, say, a
PersistentInteger or PersistentString class.)

From pje@telecommunity.com Thu Jul 11 01:45:25 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Wed, 10 Jul 2002 20:45:25 -0400
Subject: [Persistence-sig] getting started
In-Reply-To: <20020710235059.80707.qmail@web20709.mail.yahoo.com>
References: <3.0.5.32.20020710193800.00893350@telecommunity.com>
Message-ID: <5.1.0.14.0.20020710202925.05ed0470@mail.telecommunity.com>

At 04:50 PM 7/10/02 -0700, Ilia Iourovitski wrote:
>Based upon your comments you need to completely
>different thing:
>1. Transaction monitor, to which you can register
>different providers. Probably heuristic commit too.
>In Java world it is JTA spec and Tyrex as example.
>In case if you mixing SQL, LDAP you xa transactions
>and
>two-way commit protocol.

Yes, and the existing ZODB4 Transaction package supports two-phase commit
across multiple providers. I just find its protocol for doing it annoying in
some ways.

>2. For slow persistence providers you need proxies,
>mapping meta info, lazy loading, lazy collection.
>Loading by id, by query.
>Usual OR Mapper stuff.
Yes, and I've designed and/or written all that, except for the proxy or base class, for which I use ZODB4's Persistence package. >And what about locks in case of RDBMS. I don't think there's anything special needed at the persistence level to handle this. >All of those thing are out of scope of persistence-sig. Item 2 things, yes, item 1 things no. And even item 2's stuff has to be *capable* of being *done* with the SIG's output. I don't need anybody to write those things; I just want to base my work for them on a solid mechanism for detecting access and changes to objects, and a solid API for interacting with a transaction object. From pobrien@orbtech.com Thu Jul 11 14:06:59 2002 From: pobrien@orbtech.com (Patrick K. O'Brien) Date: Thu, 11 Jul 2002 08:06:59 -0500 Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session - confirmation Message-ID: It looks like O'Reilly is unable to accommodate our request to hold a BOF during one of the Thursday morning sessions. (See the reply below.) So it appears we have few options, none of which look particularly good, unless someone can think of another: 1. Go with the scheduled time of 8-10pm Thursday. Guido will miss it. Jim *might* be able to make it. 2. Change the time to 6-8pm to make it easier for Jim to attend. Guido will still miss it. 3. Change the time to 6-7pm Tuesday. Guido can make it (I believe) but Jim will miss it (unless he got an earlier flight?). 4. Give up on an "official" BOF and have lunch together or meet on our own somehow, somewhere. Let me know what you think. -- Patrick K. O'Brien Orbtech ----------------------------------------------- "Your source for Python software development." 
----------------------------------------------- Web: http://www.orbtech.com/web/pobrien/ Blog: http://www.orbtech.com/blog/pobrien/ Wiki: http://www.orbtech.com/wiki/PatrickOBrien ----------------------------------------------- -----Original Message----- From: Gretchen Bartholomew [mailto:gretchen@oreilly.com] Sent: Wednesday, July 10, 2002 11:18 PM To: pobrien@orbtech.com Cc: vee@oreilly.com Subject: RE: OSCON Birds of a Feather Session - confirmation Dear Patrick, I would very much like to reschedule your BOF for a more convenient time for you and your peers. Unfortunately, however, I cannot schedule BOFs in the morning or during the day, for that matter, while sessions are in progress. All conference rooms are being utilized for convention sessions. BOFs are held in the evenings. I have several BOF slots available during the following regular BOF dates/times. You are welcome to move your BOF to any slot within these timeframes. Monday: 6:00pm - 10:00pm Tuesday: 6:00pm - 7:00 pm Wednesday: 8:00pm - 10:00pm Thursday: 6:00pm - 10:00pm Simply let me know which time out of those listed above would be the best for you and I will reschedule. Many thanks. Gretchen From jim@zope.com Thu Jul 11 14:29:28 2002 From: jim@zope.com (Jim Fulton) Date: Thu, 11 Jul 2002 09:29:28 -0400 Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session - confirmation References: Message-ID: <3D2D8838.8040809@zope.com> Patrick K. O'Brien wrote: > It looks like O'Reilly is unable to accommodate our request to hold a BOF > during one of the Thursday morning sessions. (See the reply below.) So it > appears we have few options, none of which look particularly good, unless > someone can think of another: > > 1. Go with the scheduled time of 8-10pm Thursday. Guido will miss it. Jim > *might* be able to make it. > > 2. Change the time to 6-8pm to make it easier for Jim to attend. Guido will > still miss it. That woudn't make it any easier for me. > 3. Change the time to 6-7pm Tuesday. 
Guido can make it (I believe) but Jim > will miss it (unless he got an earlier flight?). Uh, how about 10pm on Tuesday. I can make that unless my plane is late. :) > 4. Give up on an "official" BOF and have lunch together or meet on our own > somehow, somewhere. How about getting together early on Wednesday, say 7 am? We could meet over breakfast. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From guido@python.org Thu Jul 11 14:36:43 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 11 Jul 2002 09:36:43 -0400 Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session - confirmation In-Reply-To: Your message of "Thu, 11 Jul 2002 09:29:28 EDT." <3D2D8838.8040809@zope.com> References: <3D2D8838.8040809@zope.com> Message-ID: <200207111336.g6BDahf05430@odiug.zope.com> > > 4. Give up on an "official" BOF and have lunch together or meet on our own > > somehow, somewhere. > > How about getting together early on Wednesday, say 7 am? We could meet over breakfast. +1. --Guido van Rossum (home page: http://www.python.org/~guido/) From pobrien@orbtech.com Thu Jul 11 15:36:07 2002 From: pobrien@orbtech.com (Patrick K. O'Brien) Date: Thu, 11 Jul 2002 09:36:07 -0500 Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session - confirmation In-Reply-To: <200207111336.g6BDahf05430@odiug.zope.com> Message-ID: [Guido van Rossum] > > > > 4. Give up on an "official" BOF and have lunch together or > meet on our own > > > somehow, somewhere. > > > > How about getting together early on Wednesday, say 7 am? We > could meet over breakfast. > > +1. Okay, breakfast it is. Now we need to decide where. O'Reilly provides breakfast and I'm trying to find out when they start serving. But I think 7:00 is probably a safe time. So we could just plan to meet at the O'Reilly Food & Beverage Banquet Tent. The other option is the hotel restaurant. 
Harbor's Edge Restaurant is located off the main lobby of the Sheraton East Tower, tantalizes your palette with American Eclectic Cuisine and offers a panoramic view of the Marina. Better still, they're open for breakfast starting at 6:30am. Any preferences? -- Patrick K. O'Brien Orbtech ----------------------------------------------- "Your source for Python software development." ----------------------------------------------- Web: http://www.orbtech.com/web/pobrien/ Blog: http://www.orbtech.com/blog/pobrien/ Wiki: http://www.orbtech.com/wiki/PatrickOBrien ----------------------------------------------- From pobrien@orbtech.com Thu Jul 11 16:23:12 2002 From: pobrien@orbtech.com (Patrick K. O'Brien) Date: Thu, 11 Jul 2002 10:23:12 -0500 Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session -confirmation In-Reply-To: Message-ID: FYI, O'Reilly serves breakfast from 7 to 8:30 am. > -----Original Message----- > From: persistence-sig-bounces+pobrien=orbtech.com@python.org > [mailto:persistence-sig-bounces+pobrien=orbtech.com@python.org]On Behalf > Of Patrick K. O'Brien > Sent: Thursday, July 11, 2002 9:36 AM > To: Guido van Rossum; jim@zope.com > Cc: Persistence-Sig > Subject: RE: [Persistence-sig] FW: OSCON Birds of a Feather Session > -confirmation > > > [Guido van Rossum] > > > > > > 4. Give up on an "official" BOF and have lunch together or > > meet on our own > > > > somehow, somewhere. > > > > > > How about getting together early on Wednesday, say 7 am? We > > could meet over breakfast. > > > > +1. > > Okay, breakfast it is. Now we need to decide where. O'Reilly provides > breakfast and I'm trying to find out when they start serving. But I think > 7:00 is probably a safe time. So we could just plan to meet at > the O'Reilly > Food & Beverage Banquet Tent. > > The other option is the hotel restaurant. 
Harbor's Edge Restaurant is > located off the main lobby of the Sheraton East Tower, tantalizes your > palette with American Eclectic Cuisine and offers a panoramic view of the > Marina. Better still, they're open for breakfast starting at 6:30am. > > Any preferences? > > -- > Patrick K. O'Brien > Orbtech > ----------------------------------------------- > "Your source for Python software development." > ----------------------------------------------- > Web: http://www.orbtech.com/web/pobrien/ > Blog: http://www.orbtech.com/blog/pobrien/ > Wiki: http://www.orbtech.com/wiki/PatrickOBrien > ----------------------------------------------- > > > > _______________________________________________ > Persistence-sig mailing list > Persistence-sig@python.org > http://mail.python.org/mailman-21/listinfo/persistence-sig From jim@zope.com Thu Jul 11 17:03:49 2002 From: jim@zope.com (Jim Fulton) Date: Thu, 11 Jul 2002 12:03:49 -0400 Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session -confirmation References: Message-ID: <3D2DAC65.90305@zope.com> Patrick K. O'Brien wrote: > FYI, O'Reilly serves breakfast from 7 to 8:30 am. Let's just meet there right at 7 or a few minutes before. They have reasonably big tables that would be good for such a get together. Jim > >>-----Original Message----- >>From: persistence-sig-bounces+pobrien=orbtech.com@python.org >>[mailto:persistence-sig-bounces+pobrien=orbtech.com@python.org]On Behalf >>Of Patrick K. O'Brien >>Sent: Thursday, July 11, 2002 9:36 AM >>To: Guido van Rossum; jim@zope.com >>Cc: Persistence-Sig >>Subject: RE: [Persistence-sig] FW: OSCON Birds of a Feather Session >>-confirmation >> >> >>[Guido van Rossum] >> >>>>>4. Give up on an "official" BOF and have lunch together or >>>>> >>>meet on our own >>> >>>>>somehow, somewhere. >>>>> >>>>How about getting together early on Wednesday, say 7 am? We >>>> >>>could meet over breakfast. >>> >>>+1. >>> >>Okay, breakfast it is. Now we need to decide where. 
O'Reilly provides >>breakfast and I'm trying to find out when they start serving. But I think >>7:00 is probably a safe time. So we could just plan to meet at >>the O'Reilly >>Food & Beverage Banquet Tent. >> >>The other option is the hotel restaurant. Harbor's Edge Restaurant is >>located off the main lobby of the Sheraton East Tower, tantalizes your >>palette with American Eclectic Cuisine and offers a panoramic view of the >>Marina. Better still, they're open for breakfast starting at 6:30am. >> >>Any preferences? >> >>-- >>Patrick K. O'Brien >>Orbtech >>----------------------------------------------- >>"Your source for Python software development." >>----------------------------------------------- >>Web: http://www.orbtech.com/web/pobrien/ >>Blog: http://www.orbtech.com/blog/pobrien/ >>Wiki: http://www.orbtech.com/wiki/PatrickOBrien >>----------------------------------------------- >> >> >> >>_______________________________________________ >>Persistence-sig mailing list >>Persistence-sig@python.org >>http://mail.python.org/mailman-21/listinfo/persistence-sig >> -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From guido@python.org Thu Jul 11 17:33:08 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 11 Jul 2002 12:33:08 -0400 Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session -confirmation In-Reply-To: Your message of "Thu, 11 Jul 2002 12:03:49 EDT." <3D2DAC65.90305@zope.com> References: <3D2DAC65.90305@zope.com> Message-ID: <200207111633.g6BGX8F13139@odiug.zope.com> > Let's just meet there right at 7 or a few minutes > before. They have reasonably big tables that would be > good for such a get together. OK. Wednesday, 7am at the O'Reilly breakfast table. So far those present will be Jim, Patrick and me. Who else plans to be there? Where else could we announce this? 
--Guido van Rossum (home page: http://www.python.org/~guido/) From pobrien@orbtech.com Thu Jul 11 17:46:54 2002 From: pobrien@orbtech.com (Patrick K. O'Brien) Date: Thu, 11 Jul 2002 11:46:54 -0500 Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session -confirmation In-Reply-To: <200207111633.g6BGX8F13139@odiug.zope.com> Message-ID: [Guido van Rossum] > > OK. Wednesday, 7am at the O'Reilly breakfast table. > > So far those present will be Jim, Patrick and me. Who else plans to > be there? Where else could we announce this? I'm going to go ahead and hold the BOF on Thursday night as well for those of us who can't get enough of this Persistence topic. I'll report the results of the breakfast meeting at the BOF, and I'll take notes at the BOF and report them back here. I'll also see if I can get O'Reilly to change the BOF description to mention that we'll be having an informal pre-BOF meeting over breakfast Wednesday morning. Should someone list this information on the SIG web page on python.org as well? -- Patrick K. O'Brien Orbtech ----------------------------------------------- "Your source for Python software development." ----------------------------------------------- Web: http://www.orbtech.com/web/pobrien/ Blog: http://www.orbtech.com/blog/pobrien/ Wiki: http://www.orbtech.com/wiki/PatrickOBrien ----------------------------------------------- From pobrien@orbtech.com Thu Jul 11 20:38:59 2002 From: pobrien@orbtech.com (Patrick K. 
O'Brien) Date: Thu, 11 Jul 2002 14:38:59 -0500 Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session-confirmation In-Reply-To: Message-ID: The BOF info at O'Reilly (http://conferences.oreillynet.com/pub/w/15/bof.html) now looks like this: Python Persistence Date: 07/25/2002 Time: 8:00pm - 10:00pm Location: Grande Ballroom C in the East Tower Moderated by: Patrick O'Brien, Orbtech A Python Persistence Special Interest Group was recently formed to explore ways to add basic persistence and transaction mechanisms into the core of Python to avoid duplication of effort by a variety of projects that have similar issues. This BOF will permit participants to ponder Python persistence in person. In addition, anyone interested in an informal Python Persistence breakfast discussion with Jim Fulton and Guido van Rossum is welcome to join us at the O'Reilly Food Tent Wednesday morning at 7am. -- Patrick K. O'Brien Orbtech ----------------------------------------------- "Your source for Python software development." ----------------------------------------------- Web: http://www.orbtech.com/web/pobrien/ Blog: http://www.orbtech.com/blog/pobrien/ Wiki: http://www.orbtech.com/wiki/PatrickOBrien ----------------------------------------------- From jim@zope.com Fri Jul 12 17:44:45 2002 From: jim@zope.com (Jim Fulton) Date: Fri, 12 Jul 2002 12:44:45 -0400 Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session -confirmation References: Message-ID: <3D2F077D.3000201@zope.com> Patrick K. O'Brien wrote: > [Guido van Rossum] > >>OK. Wednesday, 7am at the O'Reilly breakfast table. >> >>So far those present will be Jim, Patrick and me. Who else plans to >>be there? Where else could we announce this? >> > > I'm going to go ahead and hold the BOF on Thursday night as well for those > of us who can't get enough of this Persistence topic. I'll report the > results of the breakfast meeting at the BOF, and I'll take notes at the BOF > and report them back here. 
I'll also see if I can get O'Reilly to change the
> BOF description to mention that we'll be having an informal pre-BOF meeting
> over breakfast Wednesday morning.

I went ahead and extended my stay a day, so I'll be able to make the BoF.

Jim

--
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (888) 344-4332            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org

From pje@telecommunity.com Sun Jul 14 17:21:52 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Sun, 14 Jul 2002 12:21:52 -0400
Subject: [Persistence-sig] "Straw Man" transaction API
Message-ID: <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>

Since it's been pretty quiet here, apart from the BOF discussion, I thought
I'd draft up a transaction/participant API to stir up some debate.

I did a little research on JTA and related protocols in Java, and found that
JTA is actually pretty pitiful in comparison to the rich model already
offered by ZODB. Also, the DBAPI doesn't really offer a way to get at
multi-phase commit protocols, but perhaps if we get a nice Python
transaction API together, we can encourage such access be made available in
DBAPI 3.0.

My goals for the straw man were to support the functionality of ZODB
transactions, but without any ZODB-specific baggage in the API, to decouple
the management of dirty objects, writes, etc. from the co-ordination of the
transaction itself, and to support a richer model of what a "transaction
participant" is, including the ability to nest or chain storage mechanisms
together to an arbitrary depth. Backward compatibility in the API or the
transaction coordination messages was explicitly not a goal.

Anyway, here it is, for all of you to pick apart or set fire to, like the
straw man it is. I ask only that you read the whole thing before you light
up your flamethrowers. :)


"""'Straw Man' Transaction Interfaces"""

class Transaction:

    """Manages transaction lifecycle, participants, and metadata.
    There is no predefined number of transactions that may exist, or
    what they are associated with. Depending on the application model,
    there may be one per application, one per transaction, one per incoming
    connection (in server applications), or some other number. The
    transaction package should, however, offer an API for managing
    per-thread (or per-app, if threads aren't being used) transactions,
    since this will probably be the most common usage scenario."""

    # The basic transaction lifecycle

    def begin(self, **info):
        """Begin a transaction. Raise TransactionInProgress if
        already begun. Any keyword arguments are passed on to the
        setInfo() method. (See below.)"""

    def commit(self):
        """Commit the transaction, or raise NoTransaction if not in
        progress."""

    def abort(self):
        """Abort the transaction, or raise NoTransaction if not in
        progress."""

    # Managing participants

    def subscribe(self, participant):
        """Add 'participant' to the set of objects that will receive
        transaction messages. Note that no particular ordering of
        participants should be assumed. If the transaction is already
        active, 'participant' will receive a 'begin_txn()' message. If a
        commit or savepoint is in progress, 'participant' may also receive
        other messages to "catch it up" to the other participants. However,
        if the commit or savepoint has already progressed too far for the
        new participant to join in, a TransactionInProgress error will be
        raised.

        Note: this is not ZODB! Participants remain subscribed until they
        unsubscribe, or until the transaction object is de-allocated!"""

    def unsubscribe(self, participant):
        """Remove 'participant' from the set of objects that will receive
        transaction messages. It can only be called when a transaction is
        not in progress, or in response to begin/commit/abort_txn() messages
        received by the unsubscribing participant.
        Otherwise, TransactionInProgress will be raised."""

    # Getting/setting information about a transaction

    def isActive(self):
        """Return True if transaction is in progress."""

    def getTimestamp(self):
        """Return the time that the transaction began, in time.time()
        format, or None if no transaction in progress."""

    def setInfo(self, **args):
        """Update the transaction's metadata dictionary with the supplied
        keyword arguments. This can be used to record information such as a
        description of the transaction, the user who performed it, etc. Note
        that the transaction itself does nothing with this information.
        Transaction participants will need to retrieve the information with
        'getInfo()' and record it at the appropriate point during the
        transaction."""

    def getInfo(self):
        """Return a copy of the transaction's metadata dictionary"""

    # "Sub-transaction" support

    def savepoint(self):
        """Request a write to stable storage, and mark a savepoint for
        possible partial rollback via 'revert()'. This will most often be
        used simply to suggest a good time for in-memory data to be written
        out. But it can also be used in conjunction with revert() to provide
        a single-level 'nested transaction', if all participants support
        reverting to a savepoint."""

    def revert(self):
        """Request a rollback to the last savepoint. If no savepoint has
        occurred in this transaction, this is implemented via an abort(),
        followed by a begin(), keeping the same metadata. If a savepoint has
        occurred, this will raise CannotRevertException unless all
        transaction participants support reverting to a savepoint."""


class Participant:

    """Participant in a transaction; may be a resource manager, a
    transactional cache, or just a logging/monitoring object.

    Event sequence is approximately as follows:

        begin_txn
        ( ( begin_savepoint end_savepoint ) | revert ) *
        ( begin_commit vote_commit commit_txn ) | abort_txn

    In other words, every transaction begins with begin_txn, and ends with
    either commit_txn or abort_txn.
    A commit_txn will always be preceded by begin_commit and vote_commit.
    An abort_txn may occur at *any* point following begin_txn, and aborts
    the transaction. begin/end_savepoint pairs and revert() messages may
    occur any time between begin_txn and begin_commit, as long as abort_txn
    hasn't happened.

    Generally speaking, participants fall into a few broad categories:

    * Database connections

    * Resource managers that write data to another participant, e.g. a
      storage manager writing to a database connection

    * Resource managers that manage their own storage transactions, e.g.
      ZODB Database/Storage objects, a filesystem-based queue, etc.

    * Objects which don't manage any transactional resources, but need to
      know what's happening with a transaction, in order to log it.

    Each kind of participant will typically use different messages to
    achieve their goals. Resource managers that use other participants for
    storage, for example, won't care much about begin_txn() and
    vote_commit(), while a resource manager that manages direct storage will
    care about vote_commit() very deeply!

    Resource managers that use other participants for storage, but buffer
    writes to the other participant, will need to pay close attention to the
    begin_savepoint() and begin_commit() messages. Specifically, they must
    flush all pending writes to the participant that handles their storage,
    and enter a "write-through" mode, where any further writes are flushed
    immediately to the underlying participant. This is to ensure that all
    writes are written to the "root participant" for those writes, by the
    time end_savepoint() or vote_commit() is issued.

    By following this algorithm, any number of participants may be chained
    together, such as a persistence manager that writes to an XML document,
    which is persisted in a database table, which is persisted in a disk
    file.
    The persistence manager, the XML document, the database table, and the
    disk file would all be participants, but only the disk file would
    actually use vote_commit() and commit_txn() to handle a commit. All of
    the other participants would flush pending updates and enter
    write-through mode at the begin_commit() message, guaranteeing that the
    disk file participant would know about all the updates by the time
    vote_commit() was issued, regardless of the order in which the
    participants received the messages."""

    def begin_txn(self, txn):
        """Transaction is beginning; nothing special to be done in most
        cases. A transactional cache might use this message to reset itself.
        A database connection might issue BEGIN TRAN here."""

    def begin_savepoint(self, txn):
        """Savepoint is beginning; flush dirty objects and enter
        write-through mode, if applicable.

        Note: this is not ZODB! You will not get savepoint messages before a
        regular commit, just because another savepoint has already
        occurred!"""

    def end_savepoint(self, txn):
        """Savepoint is finished, it's safe to return to buffering writes;
        a database connection would probably issue a savepoint/checkpoint
        command here."""

    def revert(self, txn):
        """Roll back to last savepoint, or raise CannotRevertException;
        Database connections whose underlying DB doesn't support savepoints
        should definitely raise CannotRevertException. Resource managers
        that write data to other participants, should simply roll back state
        for all objects changed since the last savepoint, whether written
        through to the underlying storage or not. Transactional caches may
        want to reset on this message, also, depending on their precise
        semantics.

        Note: this is not ZODB! You will not get a revert() before an
        abort_txn(), just because a savepoint has occurred during the
        transaction!"""

    def begin_commit(self, txn):
        """Transaction commit is beginning; flush dirty objects and enter
        write-through mode, if applicable. DB connections will probably do
        nothing here.
Note: participants *must* continue to accept writes until vote_commit() occurs, and *must* accept repeated writes of the same objects!""" def vote_commit(self, txn): """Raise an exception if commit isn't possible. This will mostly be used by resource managers that handle their own storage, or the few DB connections that are capable of multi-phase commit.""" def commit_txn(self, txn): """This message follows vote_commit, if no participants vetoed the commit. DB connections will probably issue COMMIT TRAN here. Transactional caches might use this message to reset themselves.""" def abort_txn(self, txn): """This message can be received at any time, and means the entire transaction must be rolled back. Transactional caches might use this message to reset themselves.""" From Sebastien.Bigaret@inqual.com Mon Jul 15 14:06:16 2002 From: Sebastien.Bigaret@inqual.com (Sebastien Bigaret) Date: 15 Jul 2002 15:06:16 +0200 Subject: [Persistence-sig] "Straw Man" transaction API In-Reply-To: "Phillip J. Eby"'s message of "Sun, 14 Jul 2002 12:21:52 -0400" References: <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> Message-ID: <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> Ok, now for some comments (with no flamethrowers lit, but maybe I'll trigger some ;) About the Transaction API: The API seems globally OK to me. I'd like to make the following remarks: - about info/setInfo(): maybe we need a setInfo() different from an updateInfo() or addToInfo(). I also suspect that a 'ResourceManager' writing info. to other participants might use such a metadictionary to pass additional information for use in the current transaction (warning: name collision); if this is *not* the place for that, it should perhaps be stated in the docs. - registration of Participants: We might need a unique identifier for a given participant; e.g., we might wish that only one participant for a given 'postgresql' DB connection is registered (in that case, the id.
could be the DB backend name+the connectionDictionary). Obviously participants could still register without an id. - revert(): I expected an 'undo()'; 'revert' sounds like 'abort' to me, but this may just be a language problem -- the documentation made it clear. - about commit(): I see this basically as a vote_commit() on each participant, followed by a commit_txn(). I have the feeling that what will be done during the commit() phase should be explicitly stated, along with the goals we are after. Here is a little example: suppose a transaction has to commit changes against two different DB storages: DB1, which supports multi-phase commit, and DB2, which does not. Then they get vote_commit(): DB1 will be able to answer OK or KO, but DB2 will not, because it is not capable of saying whether a transaction will succeed; hence it answers 'OK' to the 'vote_commit' message. Now the participants get the commit_txn(); since we do not assume any particular ordering for participants, suppose that DB1 gets it first. DB1 commits the changes, then DB2 attempts to commit its changes but fails: what can we do? We can stop committing and start sending 'abort_txn' to all participants; however, DB1 is likely to be unable to revert the already committed changes (and this will definitely be the case if neither DB1 nor DB2 supports nested transactions). My opinion here is that we shouldn't try to handle multi-backend commits as a whole -- some backends simply make it almost impossible. But this should be clearly stated. - last on this: it may be useful for observers to get events such as transaction_did_commit() (committing is a Transaction message for which we cannot guarantee that it will come to its normal end, for the reasons given above); I'm thinking here of some DB caches that would be participants/observers of the Transaction machinery and would take the opportunity to update their caches, etc.
About the Participant API: - I have some problems with begin/end_savepoint(): again this might be a language problem, but I would prefer something like 'prepareToSavepoint()' and 'markSavepoint()' - same for begin_commit() - vote_for_commit: as far as I understand, participants using other participants can simply ignore it, but should not raise (exception to be named, BTW). To my understanding, a raise here is understood as a veto. Is that it? Last: do we need to specify a TransactionManager or TransactionFactory API? Some ideas about what could be done there: (hmm, this could be made a class method as well) - registering participants' factories, so that Transactions can be initialized with a default set of participants, since applications often use the same configuration for their Transactions. Something like: def buildDefaultTransaction(self) - ??? It seems to me that the points stressed in the sig-charter are taken into account here -- except for the 'Effective Memory Usage' which, by the way, cannot be addressed at the transaction level -- and I do not really see how this particular point can be made anything else but a ``compulsory recommendation'' ?! -- Sebastien. From jim@zope.com Mon Jul 15 14:59:50 2002 From: jim@zope.com (Jim Fulton) Date: Mon, 15 Jul 2002 09:59:50 -0400 Subject: [Persistence-sig] "Straw Man" transaction API References: <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> Message-ID: <3D32D556.6040801@zope.com> This is an interesting proposal. I'll be interested to see more discussion on it. It appears to shift responsibility for management of individual object changes further into the resource managers, which is fine. I'm a little fuzzy on participants that write data to other participants. The notion that they flush data on begin_savepoint feels a little brittle to me.
If the participant they flush to does any significant work on begin_savepoint, then it appears that things could happen in an inconvenient order and cause problems. Is the transaction info cleared at transaction boundaries? Jim Phillip J. Eby wrote: > Since it's been pretty quiet here, apart from the BOF discussion, I > thought I'd draft up a transaction/participant API to stir up some > debate. I did a little research on JTA and related protocols in Java, > and found that JTA is actually pretty pitiful in comparison to the rich > model already offered by ZODB. Also, the DBAPI doesn't really offer a > way to get at multi-phase commit protocols, but perhaps if we get a nice > Python transaction API together, we can encourage such access be made > available in DBAPI 3.0. > > My goals for the straw man were to support the functionality of ZODB > transactions, but without any ZODB-specific baggage in the API, to > decouple the management of dirty objects, writes, etc. from the > co-ordination of the transaction itself, and to support a richer model > of what a "transaction participant" is, including the ability to nest or > chain storage mechanisms together to an arbitrary depth. Backward > compatibility in the API or the transaction coordination messages was > explicitly not a goal. > > Anyway, here it is, for all of you to pick apart or set fire to, like > the straw man it is. I ask only that you read the whole thing before > you light up your flamethrowers. :) > > > """'Straw Man' Transaction Interfaces""" > > class Transaction: > > """Manages transaction lifecycle, participants, and metadata. > > There is no predefined number of transactions that may exist, or > what they are associated with. Depending on the application > model, there may be one per application, one per transaction, one > per incoming connection (in server applications), or some other > number.
The transaction package should, however, offer an API for > managing per-thread (or per-app, if threads aren't being used) > transactions, since this will probably be the most common usage > scenario.""" > > # The basic transaction lifecycle > > def begin(self, **info): > """Begin a transaction. Raise TransactionInProgress if > already begun. Any keyword arguments are passed on to the > setInfo() method. (See below.)""" > > def commit(self): > """Commit the transaction, or raise NoTransaction if not in > progress.""" > > def abort(self): > """Abort the transaction, or raise NoTransaction if not in > progress.""" > > > # Managing participants > > def subscribe(self, participant): > """Add 'participant' to the set of objects that will receive > transaction messages. Note that no particular ordering of > participants should be assumed. If the transaction is already > active, 'participant' will receive a 'begin_txn()' message. If > a commit or savepoint is in progress, 'participant' may also > receive other messages to "catch it up" to the other > participants. However, if the commit or savepoint has already > progressed too far for the new participant to join in, a > TransactionInProgress error will be raised. > > Note: this is not ZODB! Participants remain subscribed until > they unsubscribe, or until the transaction object is > de-allocated!""" > > def unsubscribe(self, participant): > """Remove 'participant' from the set of objects that will > receive transaction messages. It can only be called when a > transaction is not in progress, or in response to > begin/commit/abort_txn() messages received by the > unsubscribing participant. 
Otherwise, TransactionInProgress > will be raised.""" > > > # Getting/setting information about a transaction > > def isActive(self): > """Return True if transaction is in progress.""" > > def getTimestamp(self): > """Return the time that the transaction began, in time.time() > format, or None if no transaction in progress.""" > > def setInfo(self, **args): > """Update the transaction's metadata dictionary with the > supplied keyword arguments. This can be used to record > information such as a description of the transaction, the user > who performed it, etc. Note that the transaction itself does > nothing with this information. Transaction participants will > need to retrieve the information with 'getInfo()' and record > it at the appropriate point during the transaction.""" > > def getInfo(self): > """Return a copy of the transaction's metadata dictionary""" > > > # "Sub-transaction" support > > def savepoint(self): > """Request a write to stable storage, and mark a savepoint for > possible partial rollback via 'revert()'. This will most > often be used simply to suggest a good time for in-memory data > to be written out. But it can also be used in conjunction > with revert() to provide a single-level 'nested transaction', > if all participants support reverting to a savepoint.""" > > def revert(self): > """Request a rollback to the last savepoint. If no savepoint > has occurred in this transaction, this is implemented via an > abort(), followed by a begin(), keeping the same metadata. If > a savepoint has occurred, this will raise > CannotRevertException unless all transaction participants > support reverting to a savepoint.""" > > > > class Participant: > """Participant in a transaction; may be a resource manager, a > transactional cache, or just a logging/monitoring object. 
> > Event sequence is approximately as follows: > > begin_txn > ( ( begin_savepoint end_savepoint ) | revert ) * > ( begin_commit vote_commit commit_txn ) | abort_txn > > In other words, every transaction begins with begin_txn, and ends > with either commit_txn or abort_txn. A commit_txn will always be > preceded by begin_commit and vote_commit. An abort_txn may occur > at *any* point following begin_txn, and aborts the transaction. > begin/end_savepoint pairs and revert() messages may occur any time > between begin_txn and begin_commit, as long as abort_txn hasn't > happened. > > Generally speaking, participants fall into a few broad categories: > > * Database connections > > * Resource managers that write data to another participant, e.g. a > storage manager writing to a database connection > > * Resource managers that manage their own storage transactions, > e.g. ZODB Database/Storage objects, a filesystem-based queue, etc. > > * Objects which don't manage any transactional resources, but need to > know what's happening with a transaction, in order to log it. > > Each kind of participant will typically use different messages to > achieve their goals. Resource managers that use other > participants for storage, for example, won't care much about > begin_txn() and vote_commit(), while a resource manager that > manages direct storage will care about vote_commit() very deeply! > > Resource managers that use other participants for storage, but > buffer writes to the other participant, will need to pay close > attention to the begin_savepoint() and begin_commit() messages. > Specifically, they must flush all pending writes to the > participant that handles their storage, and enter a > "write-through" mode, where any further writes are flushed > immediately to the underlying participant. This is to ensure that > all writes are written to the "root participant" for those writes, > by the time end_savepoint() or vote_commit() is issued. 
> > By following this algorithm, any number of participants may be > chained together, such as a persistence manager that writes to an > XML document, which is persisted in a database table, which is > persisted in a disk file. The persistence manager, the XML > document, the database table, and the disk file would all be > participants, but only the disk file would actually use > vote_commit() and commit_txn() to handle a commit. All of the > other participants would flush pending updates and enter > write-through mode at the begin_commit() message, guaranteeing that > the disk file participant would know about all the updates by the > time vote_commit() was issued, regardless of the order in which the > participants received the messages.""" > > def begin_txn(self, txn): > """Transaction is beginning; nothing special to be done in > most cases. A transactional cache might use this message to > reset itself. A database connection might issue BEGIN TRAN > here.""" > > def begin_savepoint(self, txn): > """Savepoint is beginning; flush dirty objects and enter > write-through mode, if applicable. Note: this is not ZODB! > You will not get savepoint messages before a regular commit, > just because another savepoint has already occurred!""" > > def end_savepoint(self, txn): > """Savepoint is finished, it's safe to return to buffering > writes; a database connection would probably issue a > savepoint/checkpoint command here.""" > > def revert(self, txn): > """Roll back to last savepoint, or raise > CannotRevertException. Database connections whose underlying > DB doesn't support savepoints should definitely raise > CannotRevertException. Resource managers that write data to other > participants should simply roll back state for all objects > changed since the last savepoint, whether written through to > the underlying storage or not. Transactional caches may want > to reset on this message, also, depending on their precise > semantics. Note: this is not ZODB!
You will not get a > revert() before an abort_txn(), just because a savepoint has > occurred during the transaction!""" > > def begin_commit(self, txn): > """Transaction commit is beginning; flush dirty objects and > enter write-through mode, if applicable. DB connections will > probably do nothing here. Note: participants *must* continue > to accept writes until vote_commit() occurs, and *must* > accept repeated writes of the same objects!""" > > def vote_commit(self, txn): > """Raise an exception if commit isn't possible. This will > mostly be used by resource managers that handle their own > storage, or the few DB connections that are capable of > multi-phase commit.""" > > def commit_txn(self, txn): > """This message follows vote_commit, if no participants vetoed > the commit. DB connections will probably issue COMMIT TRAN > here. Transactional caches might use this message to reset > themselves.""" > > def abort_txn(self, txn): > """This message can be received at any time, and means the > entire transaction must be rolled back. Transactional caches > might use this message to reset themselves.""" > > > > > _______________________________________________ > Persistence-sig mailing list > Persistence-sig@python.org > http://mail.python.org/mailman-21/listinfo/persistence-sig -- Jim Fulton mailto:jim@zope.com Python Powered! 
CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From Sebastien.Bigaret@inqual.com Mon Jul 15 15:18:47 2002 From: Sebastien.Bigaret@inqual.com (Sebastien Bigaret) Date: 15 Jul 2002 16:18:47 +0200 Subject: [Persistence-sig] "Straw Man" transaction API In-Reply-To: Sebastien Bigaret's message of "15 Jul 2002 15:06:16 +0200" References: <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> Message-ID: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh> > My opinion here is that we shouldn't try to handle multi-backend commits > as a whole -- some backends simply make it almost impossible. But this > should be clearly stated. More on this: what do you think of the following possible additions to the API? (This is just a quick draft) NB: TP Monitoring stands for Transaction Processing Monitoring - to class Participant: def canBeTPMonitored(self): # or supportNestedTransactions() ? """ Tells whether the participant can be integrated into a TP monitoring process. Valid answers are: - YES ; e.g. RDBMS that support nested transactions will definitely answer yes. - NO - NOT_APPLICABLE ; e.g. for listeners """ - to class Transaction: def isTPMonitoringEnabled(self): """ Answer is true if all participants answer 'YES' or 'NOT_APPLICABLE' to 'canBeTPMonitored', false otherwise. """ --> With such an API, we could then make sure that, when isTPMonitoringEnabled() evaluates to true, the commit() phase in a Transaction either makes all the changes or rolls everything back (e.g. by beginning a top-level transaction in each participant, via the appropriate API -- to be defined -- and by committing this top-level transaction at the end of the commit phase, when everything has gone smoothly). Last note: I'm not positive at all about canBeTPMonitored being equivalent to the ability to use/simulate nested transactions; I have the feeling that at least the latter implies the former.
For RDBMS it seems OK, for file-based storage this could be emulated through concurrent versioning, but the general case is quite a bit beyond my knowledge. And this is not something I have already played with but just a dream of mine, so people having real experience with TP monitoring processes can go and grab their flamethrowers now! -- Sebastien. From pje@telecommunity.com Mon Jul 15 16:36:26 2002 From: pje@telecommunity.com (Phillip J. Eby) Date: Mon, 15 Jul 2002 11:36:26 -0400 Subject: [Persistence-sig] "Straw Man" transaction API In-Reply-To: <3D32D556.6040801@zope.com> References: <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> Message-ID: <3.0.5.32.20020715113626.01abf990@telecommunity.com> At 09:59 AM 7/15/02 -0400, Jim Fulton wrote: > >This is an interesting proposal. I'll be interested to see >more discussion on it. It appears to shift responsibility for >management of individual object changes further into the resource >managers, which is fine. My thought on this was that the resource managers know more about their objects than the transaction does. Also, this should greatly reduce the complexity of the commit operation compared to ZODB's Transaction. And most important, it *decouples transactions from the persistence framework*. ZODB's Transaction has to know how to get at an object's storage manager, while Strawman doesn't. >I'm a little fuzzy on participants that write data to other participants. >The notion that they flush data on begin_savepoint feels a little >brittle to me. If the participant they flush to does any significant work >on begin_savepoint, then it appears that things could happen in an inconvenient >order and cause problems. The assumption here is that things which do "real" work (as opposed to writing to another participant) should trap the second message of each pair, i.e. end_savepoint() or vote_commit(). In other words, there's a pretty solid distinction between "delegating" participants and "direct" participants, in terms of their behavior.
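A minimal sketch of the "delegating" versus "direct" distinction, written in present-day Python rather than the Python 2.2 of this thread. The class names `DirectStore` and `BufferingManager`, the `write()` method, and the helper functions are hypothetical and are not part of the straw man; only the message names (begin_commit, vote_commit, commit_txn, abort_txn) come from the proposal. The point it tries to show is that a direct participant does nothing on begin_commit and does its real work on vote_commit/commit_txn, so the order in which participants receive the messages does not matter.

```python
class DirectStore:
    """Hypothetical 'direct' participant: owns the real storage."""

    def __init__(self):
        self.pending = []      # writes received during this transaction
        self.committed = []    # stands in for durable state

    def write(self, item):
        self.pending.append(item)

    def begin_commit(self, txn):
        pass  # nothing to flush; upstream writes may still arrive

    def vote_commit(self, txn):
        # By now every delegating participant has flushed; raising is a veto.
        if any(item is None for item in self.pending):
            raise ValueError("cannot commit a None item")

    def commit_txn(self, txn):
        self.committed.extend(self.pending)
        self.pending = []

    def abort_txn(self, txn):
        self.pending = []


class BufferingManager:
    """Hypothetical 'delegating' participant: buffers writes to a target."""

    def __init__(self, target):
        self.target = target
        self.buffer = []
        self.write_through = False

    def write(self, item):
        if self.write_through:
            self.target.write(item)   # pass straight down the chain
        else:
            self.buffer.append(item)

    def begin_commit(self, txn):
        # Flush everything buffered so far, then stop buffering.
        for item in self.buffer:
            self.target.write(item)
        self.buffer = []
        self.write_through = True

    def vote_commit(self, txn):
        pass  # nothing of its own to vote on

    def commit_txn(self, txn):
        self.write_through = False

    def abort_txn(self, txn):
        self.buffer = []
        self.write_through = False


def commit(participants, txn=None):
    """Send the three commit messages, each to every participant in turn."""
    for p in participants:
        p.begin_commit(txn)
    for p in participants:
        p.vote_commit(txn)
    for p in participants:
        p.commit_txn(txn)


def demo(order):
    """Build outer -> inner -> store, then commit in the given name order."""
    store = DirectStore()
    inner = BufferingManager(store)
    outer = BufferingManager(inner)
    outer.write("a")
    outer.write("b")
    chain = {"store": store, "inner": inner, "outer": outer}
    commit([chain[name] for name in order])
    return store.committed
```

Calling `demo(("store", "outer", "inner"))` and `demo(("inner", "store", "outer"))` should leave the same items in the store, because a delegating participant that has not yet seen begin_commit simply buffers what is flushed into it and flushes it onward when its own begin_commit arrives.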
The use cases I'm looking at are one or more persistence mechanisms writing to a storage mechanism. At some point, you have to "bottom out" to "real" storage, and that's where you handle the ending of a savepoint or commit. >Is the transaction info cleared at transaction boundaries? If you mean the setInfo() stuff, yes. I probably should've documented that, but then if I documented *every* assumption I made, the doc would've been twice the size. :) From pje@telecommunity.com Mon Jul 15 16:57:56 2002 From: pje@telecommunity.com (Phillip J. Eby) Date: Mon, 15 Jul 2002 11:57:56 -0400 Subject: [Persistence-sig] "Straw Man" transaction API In-Reply-To: <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> References: <"Phillip J. Eby"'s message of "Sun, 14 Jul 2002 12:21:52 -0400"> <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> Message-ID: <3.0.5.32.20020715115756.01ab65f0@telecommunity.com> At 03:06 PM 7/15/02 +0200, Sebastien Bigaret wrote: > > - about info/setInfo(): maybe we need a setInfo() different from an > updateInfo() or addToInfo(). I also suspect that a 'ResourceManager' > writing info. to other participants might use such a metadictionary to > pass additional information for use in the current transaction (warning: > name collision); if this is *not* the place for that, it should perhaps be > stated in doc. I probably should've called it updateInfo(), since a dict.update() was the semantics I had in mind. I was deliberately leaving it vague as to what information might be passed in it, since it was primarily a mechanism to allow for extensions, not to mention support of Zope's need to save "user" and "note" metadata on transactions. > - registration of Participants: > > We might need a unique identifier for a given participant ; e.g., we > might wish that only one participant for a given 'postgresql' DB > connection is registered (in that case, the id. could be the DB backend > name+the connectionDictionary). 
> > Obviously participants could still register without an id. I think an identifier is a YAGNI. I'm almost positive that my application model won't need it. But if you just want register() to guarantee that the participant is registered once and only once, that's fine by me and a sensible thing, IMHO. Although I might just as soon it raise ParticipantAlreadyRegistered if you register it again, as that might help expose a bug in your code. :) Of course, if it does that, then I suppose exposing an isRegistered(participant) method would allow you to work around that. > - revert(): I expected an 'undo()' ; 'revert' sounds like 'abort' to me, but > this can just be a language problem --the documentation made it clear. I tried to use common RDBMS terminology; the few examples of "checkpoint" or "savepoint" I found (e.g. Sybase) used "revert" as the terminology for going back to the last checkpoint or savepoint. > - about commit(): I see this basically like a vote_commit() on each > participants, followed by a commit_txn() Actually it's begin_commit() on each, vote_commit() on each, and then commit_txn() on each. > I have the feeling that what will be done during the commit() phase should > be explicitly stated, along with the goals we are going after. Here is a > little example: suppose a transaction has to commit changes against two > different DB storages, DB1 which supports multi-phase commit, DB2 which > does not. > > Then they get vote_commit(): DB1 will be able to answer OK or KO, but DB2 > will not because it is not capable of saying whether a transaction will > successfully succeed, hence: it answers 'OK' to the 'vote_commit' message. > > Now the participants gets the commit_txn() ; since we do not assume any > particular ordering for paricipants, suppose that DB1 gets it first. DB1 > commits the changes, then DB2 attempts to commit its changes but fails: > what can we do? 
We can stop committing and start sending 'abort_txn' to > all participants; however, DB1 is likely to be unable to revert the > already committed changes (and this will definitely be the case if neither > DB1 nor DB2 supports nested transactions). > > My opinion here is that we shouldn't try to handle multi-backend commits > as a whole -- some backends simply make it almost impossible. But this > should be clearly stated. Actually, I think we should just document what will happen if you mix voting and non-voting participants. Also, we may wish to have some way to declare a participant non-voting, so that such participants can receive commit_txn() first. ZODB Transactions can survive the failure of *one* commit_txn() message, and StrawMan can too. The most common use case for a non-voting participant would be an RDBMS connection, and the most common use case of such is to have only one, even if there will be other participants writing to it. ZODB declares itself "hosed" when a failure occurs past the first tpc_finish() (its equivalent to commit_txn). We will need to be similarly cautious if there is more than one non-voting participant. > - last on this: it may be useful for observers to get events such as > transaction_did_commit() (committing is a Transaction message for which > we cannot guarantee that it will come to its normal end, for the reasons > given above); I'm thinking here of some DB caches that would be > participants/observers of the Transaction machinery and would take the > opportunity to update their caches, etc. That's a good point; perhaps adding a 'commit_finished()' message might do the trick, although there are already quite a lot of messages. > - I have some problems with begin/end_savepoint(): again this might be > a language problem, but I would prefer something like > 'prepareToSavepoint()' and 'markSavepoint()' Those aren't bad. > - same for begin_commit() I could see prepareToCommit or prepare_for_commit, certainly.
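The commit sequence described earlier in this reply (begin_commit on each participant, vote_commit on each, then commit_txn on each, with non-voting participants receiving commit_txn first) could be sketched roughly as follows. This is only an illustration under stated assumptions: `can_vote()`, `CommitFailed`, and `FakeDB` are hypothetical names that do not appear in the straw man, and the policy for the partial-commit case is a guess at what "similarly cautious" might mean.

```python
class CommitFailed(Exception):
    """Hypothetical error: the commit did not complete."""


def commit(participants, txn=None):
    try:
        for p in participants:
            p.begin_commit(txn)
        for p in participants:
            p.vote_commit(txn)        # raising here is a veto
    except Exception:
        for p in participants:
            p.abort_txn(txn)
        raise

    # Non-voting participants (can_vote() is False) go first, so that the
    # one commit_txn() most likely to fail happens before anything is durable.
    ordered = sorted(participants, key=lambda p: p.can_vote())
    done = 0
    for p in ordered:
        try:
            p.commit_txn(txn)
        except Exception as exc:
            if done == 0:
                # Nothing has committed yet: aborting everyone is still safe.
                for q in participants:
                    q.abort_txn(txn)
                raise CommitFailed("first commit_txn failed; aborted") from exc
            # Past the first commit there is no clean way back.
            raise CommitFailed("partial commit; state is inconsistent") from exc
        done += 1


class FakeDB:
    """Hypothetical participant used only for this illustration."""

    def __init__(self, name, voting, fail_on_commit=False):
        self.name = name
        self.voting = voting
        self.fail_on_commit = fail_on_commit
        self.state = "idle"

    def can_vote(self):
        return self.voting

    def begin_commit(self, txn):
        self.state = "committing"

    def vote_commit(self, txn):
        pass  # a non-voting DB cannot say anything useful here

    def commit_txn(self, txn):
        if self.fail_on_commit:
            raise RuntimeError(self.name + " failed during commit_txn")
        self.state = "committed"

    def abort_txn(self, txn):
        self.state = "aborted"
```

With `FakeDB("DB1", voting=True)` and `FakeDB("DB2", voting=False, fail_on_commit=True)`, DB2 is asked to commit first, fails, and both end up aborted, which is the DB1/DB2 scenario from the quoted message with the ordering reversed. Two failing non-voting participants would still reach the "inconsistent" branch, which is why the text above asks for caution in that case.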
> - vote_for_commit: as far as I understand participants using other > participants can simply ignore it, but should not raise (exception to be > named, BTW). To my understanding, a raise here is understood as a veto. > Is that it? 'vote_on_commit' seems more natural to me, phrasing-wise. Yes, a raise is a veto; that's an assumption from ZODB transactions that I failed to document. >Last: do we need to specify a TransactionManager or TransactionFactory API? I don't think so, really, other than what I mentioned about providing some simple thread-specific associations. >Some ideas about what could be done there: (hmm, this could be made class >method as well) > > - registering participants' factories, so that Transactions can be > initialized with a default set of participants, since applications often > use the same configuration for their Transactions. Something like: > > def buildDefaultTransaction(self) YAGNI. The code that sets up the participants should know their transactional scope, and thus is capable of registering them with the appropriate transaction. > It seems to me that the points stressed in the sig-charter are taken into >account here --except for the 'Effective Memory Usage' which, by the way, >cannot be addressed at the transaction level --and I do not really see how >this particular point can be made anything else but a ``compulsory >recommendation'' ?! Actually, as was noted in the savepoint-related docstrings, one purpose of the savepoint API is to indicate a "good time to write things out", which can free up memory used by queued updates. Also, in ZODB's persistence model, dirty objects can't be dropped from the cache (since they contain state that needs to be written). So if their writes can be flushed, they become eligible to be "ghosted" out of the cache and the memory made available as well. This can be an issue in large ZODB transactions, especially those done by full-text indexing operations. 
So actually the transaction API *does* have some contact points with memory usage. And the main reason I put savepoint() in was to accommodate this requirement for ZODB. I don't really expect to have much use for it in my primary application development. From pje@telecommunity.com Mon Jul 15 22:46:54 2002 From: pje@telecommunity.com (Phillip J. Eby) Date: Mon, 15 Jul 2002 17:46:54 -0400 Subject: [Persistence-sig] "Straw Man" transaction API In-Reply-To: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh> References: <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> Message-ID: <3.0.5.32.20020715174654.0085eac0@telecommunity.com> At 04:18 PM 7/15/02 +0200, Sebastien Bigaret wrote: > >- to class Participant: > > def canBeTPMonitored(self): # or supportNestedTransactions() ? > """ > Tells whether the participant can be integrated into a TP monitoring > process. Valid answers are: > > - YES ; e.g. RDBMS that support nested transactions will definitely > answer yes. > > - NO > > - NOT_APPLICABLE ; e.g. for listeners > """ I'm pretty sure that this terminology is not accurate. Nested transactions and multi-phase commit aren't really related, AFAIK. It's quite possible to support either one without the other. If we're going to have introspection for multi-phase commit, I'd rather have something like 'canVote()', with the response being False if the participant can raise errors during commit_txn(), or True if the participant guarantees it will not fail on commit_txn() if it didn't veto commit during the voting phase. >- to class Transaction: > > def isTPMonitoringEnabled(self): > """ > Answer is true if all participants answer 'YES' or 'NOT_APPLICABLE' to > 'canBeTPMonitored', false otherwise. > > """ This is a YAGNI, I would say; the transaction is the only party that needs to know about its participants' voting capabilities.
But it might be useful to expose a 'canRevert()' introspection on the transaction, that would tell us if all the participants support reverting to a savepoint. That information would be useful outside the transaction. From kennethroberts@eqcity.ktb.net Fri Jul 19 07:51:01 2002 From: kennethroberts@eqcity.ktb.net (Kennethroberts) Date: Fri, 19 Jul 2002 06:51:01 GMT Subject: [Persistence-sig] Put me on the list Message-ID: <02071900013235742@eqcity.ktb.net> Please place me on your sigs Mail list. From pje@telecommunity.com Fri Jul 19 16:52:07 2002 From: pje@telecommunity.com (Phillip J. Eby) Date: Fri, 19 Jul 2002 11:52:07 -0400 Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: <3.0.5.32.20020715174654.0085eac0@telecommunity.com> References: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh> <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> Message-ID: <3.0.5.32.20020719115207.0086e100@telecommunity.com> Following on the unparalleled success of the "Straw Man" transaction API (he said, with tongue in cheek), I thought it might be good to make a proposal for persistence as well. Since I won't be at the BOF, I figure I should get my two cents in now, while the getting's good. Here's my proposal, such as it is... Deliver a Persistence package based on the one at http://cvs.zope.org/Zope3/lib/python/Persistence/ but with the following changes: * Remove the BTrees subpackage, and the Class, Cache, Function, and Module modules, along with the ICache interface. Rationale: The BTrees package is only useful for a relatively small subset of possible persistence backends, and is subject to periodic data structure changes which affect applications using it. It's probably best kept out of the Python core. Similar arguments apply to the Cache system, although not quite as strongly. Class, Function, and Module are very recent developments which have not had the extended usage that most of the rest of the code has. 
(Note: I don't mean to say that the persistence C code has been thoroughly exercised either, in the sense that much of it is completely new for Python 2.2. But its *design* has a long history, and previous implementations have had much testing of the kind of edge and corner issues that the Class, Function, and Module modules haven't been exposed to yet.) * I do think we should keep PersistentList and PersistentMapping in the core; they're useful for almost any kind of application, and any kind of back-end storage. They don't introduce policy or data format dependencies into users' code, either. * Make _p_dm a synonym for _p_jar, and deprecate _p_jar. This could be done by making a _p_jar descriptor that read/wrote through to _p_dm, and issued a deprecation warning. I don't personally have a problem with _p_jar, but I've heard rumblings from other people (ZC folks?) that it's confusing or that they want to get rid of it. So if we're doing it, now seems like the time. * Flag _p_changed *after* __setattr__, not before! This will help co-operative transaction participants play nicely together, since they can't "write through" a change if they're getting notified *before* the change takes place! Docs should also clarify that when set in other code, _p_changed should be set at the latest possible moment, *after* the object is in its new, stable state. * Keep the _p_atime slot, but don't fill it with anything by default. Instead, have a _p_getattr_hook(persistentObj,attrName,retrievedValue) slot at C level that's called after the getattribute completes. A data manager can then set the hook to point to a _p_atime update function, *or* it can introduce postprocessing for "proxy" attributes. That is, a data manager could set the hook to handle "lazy" loading of certain attributes which would otherwise be costly to retrieve, by placing a dummy value in the object's dictionary, and then having the post-call hook return a replacement value. 
For speed, this will generally want to be a C function; let the base package include a simple hook that updates _p_atime, and another which checks whether the retrievedValue is an instance of a LazyValue base class, and if so, calls the object. This will probably cover the basics. A data manager that uses ZODB caching will use the atime function, and non-ZODB data managers will probably want the other hook. I also have an idea about using the transaction's timestamp() plus a counter to supply a "time" value that minimizes system calls, but I'm not sure it would actually improve performance any, so I'm fine with not trying to push that into the initial package. As long as the hook slot is present in the base package, I or anyone else are free to make up and try our own hooks to put in it. * Get rid of the term "register", since objects won't "register" with the transaction, and neither should they with their data manager. They should "inform their data manager" that they have changed. Something like an objectChanged() message is appropriate in place of register(). I believe this would clarify the API. * Take out the interfaces. :( I'd rather this were, "leave this in, in a way such that it works whether you have Interface or not", but the reality is that a dependency in the standard library on something outside the standard library is a big no-no, and just begging for breakage as soon as there *is* an Interface package (with a new API) in the standard library. Whew! I think that about covers it, as far as what I'd like to see, and what I think would be needed to make it acceptable for the core. Comments? By the way, my rationale for not taking any radical new approaches to persistence, observation, or notification in this proposal is that the existing Persistence package is "transparent" enough, and has the benefit of lots of field experience. 
I spent a lot of time trying to come up with "better" ways before writing this; mostly I found that trying to make it more "transparent" to the object being persisted, just pushes the complexity into either the app or the backend, without really helping anything. It's not a really big deal to: 1. Subclass Persistent 2. Use PersistentList and PersistentMapping or other Persistent objects for your attributes, or set self._p_changed when you change a non-persistent mutable. 3. Use transactions Especially if that's all you need to do in order to have persistence to any number of backends, including the current ZODB and all the wonderful SQL or other mappings that will be creatable by everybody on this list using their own techniques. The key is not so much "transparency" per se, as *uniformity* across backends. I think the existing API is transparent enough; let's work on having uniform and universal access to it, as a Python core package. From pje@telecommunity.com Fri Jul 19 17:02:37 2002 From: pje@telecommunity.com (Phillip J. Eby) Date: Fri, 19 Jul 2002 12:02:37 -0400 Subject: [Persistence-sig] "Straw Man" transaction API In-Reply-To: <3.0.5.32.20020715115756.01ab65f0@telecommunity.com> References: <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> <"Phillip J. Eby"'s message of "Sun, 14 Jul 2002 12:21:52 -0400"> <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> Message-ID: <3.0.5.32.20020719120237.00898b60@telecommunity.com> One further comment on the Straw Man transaction API... I believe that the Python transaction API should issue Python warnings for problematic conditions, rather than write to a logger (such as zLOG in the current ZODB transactions). IMHO, even though 2.3 will include a logging module, I'm not comfortable with the idea of a transaction co-ordinator itself issuing log messages, especially given the complexity of the logging package that's the main contender for implementing the logging PEP.
I'd rather have something extremely simple, and warnings seem to me like the way to, well, issue warnings. :) If there's conflict about this point, though, I'd be okay with isolating either log calls or warnings into methods of the base transaction that could be overridden in a subclass, and then folks can choose their own way from there. From guido@python.org Fri Jul 19 17:09:01 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 19 Jul 2002 12:09:01 -0400 Subject: [Persistence-sig] "Straw Man" transaction API In-Reply-To: Your message of "Fri, 19 Jul 2002 12:02:37 EDT." <3.0.5.32.20020719120237.00898b60@telecommunity.com> References: <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> <3.0.5.32.20020719120237.00898b60@telecommunity.com> Message-ID: <200207191609.g6JG91A26544@pcp02138704pcs.reston01.va.comcast.net> > One further comment on the Straw Man transaction API... I believe that the > Python transaction API should issue Python warnings for problematic > conditions, rather than write to a logger (such as zLOG in the current ZODB > transactions). > > IMHO, even though 2.3 will include a logging mdoule, I'm not comfortable > with the idea of a transaction co-ordinator itself issuing log messages, > especially given the complexity of the logging package that's the main > contender for implementing the logging PEP. I'd rather have something > extremely simple, and warnings seem to me like the way to, well, issue > warnings. :) > > If there's conflict about this point, though, I'd be okay with isolating > either log calls or warnings into methods of the base transaction that > could be overridden in a subclass, and then folks can choose their own way > from there. Warnings seem better to me because there are several ways to decide how to deal with them (including turning them into errors and suppressing them completely) under control of either the program or command line options. 
It's also possible to have warnings be sent to a logger, and applications that use the logger should probably set this up. (Hm, maybe it would be cool if the logging module has a shortcut to redirect all warnings to the log?) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Jul 19 21:03:54 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 19 Jul 2002 16:03:54 -0400 Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: Your message of "Fri, 19 Jul 2002 11:52:07 EDT." <3.0.5.32.20020719115207.0086e100@telecommunity.com> References: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh> <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> <3.0.5.32.20020719115207.0086e100@telecommunity.com> Message-ID: <200207192003.g6JK3sw14911@pcp02138704pcs.reston01.va.comcast.net> > * I do think we should keep PersistentList and PersistentMapping in the > core; they're useful for almost any kind of application, and any kind of > back-end storage. They don't introduce policy or data format dependencies > into users' code, either. But perhaps these should be rewritten to derive from dict and list instead of UserDict and UserList? Also, the module names are inconsistent -- PersistentMapping is defined in _persistentMapping.py but PersistentList is defined in PersistentList.py. Both are then "pulled up" one level by __init__.py and their __module__ attribute modified. I find all that hideous and tricky, and I propose to clean this up before making it a standard Python package. > * Make _p_dm a synonym for _p_jar, and deprecate _p_jar. This could be > done by making a _p_jar descriptor that read/wrote through to _p_dm, and > issued a deprecation warning. I don't personally have a problem with > _p_jar, but I've heard rumblings from other people (ZC folks?) that it's > confusing or that they want to get rid of it. So if we're doing it, now > seems like the time. 
It's just that "jar" makes no sense (except in the "cutesy" sense of a jar full of pickles). But "dm" is a little obscure too. Maybe write it out in full as _p_datamanager? > * Flag _p_changed *after* __setattr__, not before! This will help > co-operative transaction participants play nicely together, since they > can't "write through" a change if they're getting notified *before* the > change takes place! Docs should also clarify that when set in other code, > _p_changed should be set at the latest possible moment, *after* the object > is in its new, stable state. +1 > * Keep the _p_atime slot, but don't fill it with anything by default. > Instead, have a _p_getattr_hook(persistentObj,attrName,retrievedValue) slot > at C level that's called after the getattribute completes. A data manager > can then set the hook to point to a _p_atime update function, *or* it can > introduce postprocessing for "proxy" attributes. That is, a data manager > could set the hook to handle "lazy" loading of certain attributes which > would otherwise be costly to retrieve, by placing a dummy value in the > object's dictionary, and then having the post-call hook return a > replacement value. > > For speed, this will generally want to be a C function; let the base > package include a simple hook that updates _p_atime, and another which > checks whether the retrievedValue is an instance of a LazyValue base class, > and if so, calls the object. This will probably cover the basics. A data > manager that uses ZODB caching will use the atime function, and non-ZODB > data managers will probably want the other hook. I also have an idea about > using the transaction's timestamp() plus a counter to supply a "time" value > that minimizes system calls, but I'm not sure it would actually improve > performance any, so I'm fine with not trying to push that into the initial > package. 
As long as the hook slot is present in the base package, I or > anyone else are free to make up and try our own hooks to put in it. Shouldn't there be a setattr hook too? > * Get rid of the term "register", since objects won't "register" with the > transaction, and neither should they with their data manager. They should > "inform their data manager" that they have changed. Something like an > objectChanged() message is appropriate in place of register(). I believe > this would clarify the API. > > * Take out the interfaces. :( I'd rather this were, "leave this in, in a > way such that it works whether you have Interface or not", but the reality > is that a dependency in the standard library on something outside the > standard library is a big no-no, and just begging for breakage as soon as > there *is* an Interface package (with a new API) in the standard library. Of course. > Whew! I think that about covers it, as far as what I'd like to see, and > what I think would be needed to make it acceptable for the core. Comments? > > By the way, my rationale for not taking any radical new approaches to > persistence, observation, or notification in this proposal is that the > existing Persistence package is "transparent" enough, and has the benefit > of lots of field experience. I spent a lot of time trying to come up with > "better" ways before writing this; mostly I found that trying to make it > more "transparent" to the object being persisted, just pushes the > complexity into either the app or the backend, without really helping > anything. It's not a really big deal to: > > 1. Subclass Persistent > > 2. Use PersistentList and PersistentMapping or other Persistent objects for > your attributes, or set self._p_changed when you change a non-persistent > mutable. > > 3. 
Use transactions > > Especially if that's all you need to do in order to have persistence to any > number of backends, including the current ZODB and all the wonderful SQL or > other mappings that will be creatable by everybody on this list using their > own techniques. The key is not so much "transparency" per se, as > *uniformity* across backends. I think the existing API is transparent > enough; let's work on having uniform and universal access to it, as a > Python core package. I've often thought that it's ugly that you have to set _p_state and _p_changed, rather than do these things with method calls. What do you think about that? Especially the conventions for _p_state look confusing to me. --Guido van Rossum (home page: http://www.python.org/~guido/) From pje@telecommunity.com Fri Jul 19 23:12:35 2002 From: pje@telecommunity.com (Phillip J. Eby) Date: Fri, 19 Jul 2002 18:12:35 -0400 Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: <200207192003.g6JK3sw14911@pcp02138704pcs.reston01.va.comca st.net> References: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh> <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> <3.0.5.32.20020719115207.0086e100@telecommunity.com> Message-ID: <3.0.5.32.20020719181235.00894ec0@telecommunity.com> At 04:03 PM 7/19/02 -0400, Guido van Rossum wrote: >> * I do think we should keep PersistentList and PersistentMapping in the >> core; they're useful for almost any kind of application, and any kind of >> back-end storage. They don't introduce policy or data format dependencies >> into users' code, either. > >But perhaps these should be rewritten to derive from dict and list >instead of UserDict and UserList? Perhaps. What are the implications for pickling? >Also, the module names are >inconsistent -- PersistentMapping is defined in _persistentMapping.py >but PersistentList is defined in PersistentList.py. 
Both are then >"pulled up" one level by __init__.py and their __module__ attribute >modified. I find all that hideous and tricky, and I propose to clean >this up before making it a standard Python package. +1 >> * Make _p_dm a synonym for _p_jar, and deprecate _p_jar. This could be >> done by making a _p_jar descriptor that read/wrote through to _p_dm, and >> issued a deprecation warning. I don't personally have a problem with >> _p_jar, but I've heard rumblings from other people (ZC folks?) that it's >> confusing or that they want to get rid of it. So if we're doing it, now >> seems like the time. > >It's just that "jar" makes no sense (except in the "cutesy" sense of a >jar full of pickles). But "dm" is a little obscure too. Maybe write >it out in full as _p_datamanager? Sure, whatever. Maybe just _p_manager. >> * Keep the _p_atime slot, but don't fill it with anything by default. >> Instead, have a _p_getattr_hook(persistentObj,attrName,retrievedValue) slot >> at C level that's called after the getattribute completes. A data manager >> can then set the hook to point to a _p_atime update function, *or* it can >> introduce postprocessing for "proxy" attributes. That is, a data manager >> could set the hook to handle "lazy" loading of certain attributes which >> would otherwise be costly to retrieve, by placing a dummy value in the >> object's dictionary, and then having the post-call hook return a >> replacement value. >> >> For speed, this will generally want to be a C function; let the base >> package include a simple hook that updates _p_atime, and another which >> checks whether the retrievedValue is an instance of a LazyValue base class, >> and if so, calls the object. This will probably cover the basics. A data >> manager that uses ZODB caching will use the atime function, and non-ZODB >> data managers will probably want the other hook. 
I also have an idea about >> using the transaction's timestamp() plus a counter to supply a "time" value >> that minimizes system calls, but I'm not sure it would actually improve >> performance any, so I'm fine with not trying to push that into the initial >> package. As long as the hook slot is present in the base package, I or >> anyone else are free to make up and try our own hooks to put in it. > >Shouldn't there be a setattr hook too? Hm. Seems like a YAGNI to me, unless you're saying that it's so that _p_atime can be updated on setattr, in which case, sure, add a _p_setattr_hook(obj,attrname,setval) that's called after successful setattr. Otherwise, I can't think of a use case that isn't already covered by the objectChanged() (formerly register()) message. >I've often thought that it's ugly that you have to set _p_state and >_p_changed, rather than do these things with method calls. What do >you think about that? Especially the conventions for _p_state look >confusing to me. I've never used _p_state for anything; I thought that was something purely private/internal to the implementation. So I'm not sure what you're talking about, there. For _p_changed, I don't have any objections to a method or methods, but it seems to me that it *was* a method at one time and Jim changed it to an attribute, so it might be good to ask him why. :) Of course, I've also seen people using ZODB write code like this: self.foo = self.foo To flag things as changed, without using an explicit _p_changed call. On a mental level, it has a certain appeal, because it's like saying, hey, I'm changing *this* attribute. :) But I don't have a strong preference for or against any of these three broad categories of change signalling. 
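The three styles of change signalling discussed above (setting the _p_changed attribute, calling a method, and the self.foo = self.foo idiom) can be compared in a rough pure-Python sketch. Everything here is an illustrative assumption rather than the actual Persistence package: the DataManager class, the _p_touch() method name, and the _p_manager attribute (one of the spellings floated in this thread) are made up, and the real base class is implemented in C. The sketch flags the change *after* the attribute is set, as proposed earlier in the thread.

```python
class DataManager:
    """Hypothetical stand-in for a data manager; it only records changed objects."""

    def __init__(self):
        self.changed = []

    def objectChanged(self, obj):
        # Record each object once, compared by identity.
        if not any(o is obj for o in self.changed):
            self.changed.append(obj)


class Persistent:
    """Toy base class (an assumption, not the real C implementation)."""

    _p_manager = None
    _p_changed = False

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)       # apply the change first
        if name.startswith("_p_") and name != "_p_changed":
            return                                   # bookkeeping, not data
        if name == "_p_changed" and not value:
            return                                   # clearing the flag
        object.__setattr__(self, "_p_changed", True)
        if self._p_manager is not None:
            self._p_manager.objectChanged(self)      # notify after the change

    def _p_touch(self):
        """Hypothetical method spelling of 'I have changed'."""
        self._p_changed = True


class Account(Persistent):
    def __init__(self):
        self.history = []    # a plain, non-persistent mutable attribute


dm = DataManager()
acct = Account()
acct._p_manager = dm


def signalled(action):
    """Reset the flags, run the action, report whether the manager was told."""
    acct._p_changed = False
    del dm.changed[:]
    action()
    return acct._p_changed and dm.changed == [acct]


results = {
    "in-place mutation only": signalled(lambda: acct.history.append("deposit")),
    "flag attribute": signalled(lambda: setattr(acct, "_p_changed", True)),
    "method call": signalled(acct._p_touch),
    "self-assignment": signalled(lambda: setattr(acct, "history", acct.history)),
}
print(results)
```

In this sketch only the bare in-place mutation goes unnoticed; the other three spellings all end up in the same __setattr__ path, which is one way to read the remark that there is little to choose between them.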
From smenard@bigfoot.com Fri Jul 19 23:28:29 2002 From: smenard@bigfoot.com (Steve Menard) Date: Fri, 19 Jul 2002 18:28:29 -0400 Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: <3.0.5.32.20020719181235.00894ec0@telecommunity.com> References: <200207192003.g6JK3sw14911@pcp02138704pcs.reston01.va.comca st.net> <87sn2lvzug.fsf@bidibule.brest.inqual.bzh> <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> <3.0.5.32.20020719115207.0086e100@telecommunity.com> Message-ID: <5.1.0.14.0.20020719182600.02a1fc98@pop.videotron.ca> At 06:12 PM 7/19/2002 -0400, Phillip J. Eby wrote: >At 04:03 PM 7/19/02 -0400, Guido van Rossum wrote: > >> * I do think we should keep PersistentList and PersistentMapping in the > >> core; they're useful for almost any kind of application, and any kind of > >> back-end storage. They don't introduce policy or data format dependencies > >> into users' code, either. > > > >But perhaps these should be rewritten to derive from dict and list > >instead of UserDict and UserList? > >Perhaps. What are the implications for pickling? I have done exactly that for POD and it works great. > >Also, the module names are > >inconsistent -- PersistentMapping is defined in _persistentMapping.py > >but PersistentList is defined in PersistentList.py. Both are then > >"pulled up" one level by __init__.py and their __module__ attribute > >modified. I find all that hideous and tricky, and I propose to clean > >this up before making it a standard Python package. > >+1 I agree too. For consistency, could we make PersistentMapping a synonym for PersistentDict?
From jeremy@alum.mit.edu Mon Jul 22 15:05:48 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Mon, 22 Jul 2002 10:05:48 -0400 Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: <200207192003.g6JK3sw14911@pcp02138704pcs.reston01.va.comcast.net> References: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh> <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> <3.0.5.32.20020719115207.0086e100@telecommunity.com> <200207192003.g6JK3sw14911@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <15676.4412.741538.532146@slothrop.zope.com> >>>>> "GvR" == Guido van Rossum writes: >> * I do think we should keep PersistentList and PersistentMapping >> in the >> core; they're useful for almost any kind of application, and any >> kind of back-end storage. They don't introduce policy or data >> format dependencies into users' code, either. GvR> But perhaps these should be rewritten to derive from dict and GvR> list instead of UserDict and UserList? One small comment. (I owe more substantial comment on Phillip's earlier proposals.) The persistent versions of dict and list can't extend the builtin types, because they need to hook __getitem__() and __setitem__(). The overridden methods may not be called if we extend the builtin types. 
Jeremy From smenard@bigfoot.com Mon Jul 22 15:26:18 2002 From: smenard@bigfoot.com (Steve Menard) Date: Mon, 22 Jul 2002 10:26:18 -0400 Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: <15676.4412.741538.532146@slothrop.zope.com> References: <200207192003.g6JK3sw14911@pcp02138704pcs.reston01.va.comcast.net> <87sn2lvzug.fsf@bidibule.brest.inqual.bzh> <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> <3.0.5.32.20020719115207.0086e100@telecommunity.com> <200207192003.g6JK3sw14911@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <5.1.0.14.0.20020722102253.02abb498@pop.videotron.ca> At 10:05 AM 7/22/2002 -0400, Jeremy Hylton wrote: > >>>>> "GvR" == Guido van Rossum writes: > > >> * I do think we should keep PersistentList and PersistentMapping > >> in the > >> core; they're useful for almost any kind of application, and any > >> kind of back-end storage. They don't introduce policy or data > >> format dependencies into users' code, either. > > GvR> But perhaps these should be rewritten to derive from dict and > GvR> list instead of UserDict and UserList? > >One small comment. (I owe more substantial comment on Phillip's >earlier proposals.) The persistent versions of dict and list can't >extend the builtin types, because they need to hook __getitem__() and >__setitem__(). The overridden methods may not be called if we extend >the builtin types. > >Jeremy Hum, if those methods are not guaranteed to be called by subclassing dict or list, then there is something broken. Either that or there is a subtle thing I do not understand. On a side note, as I have said in another post, I have done exactly that, subclassing dict and list. While my model didn't need to override __getitem__(), the __setitem__() at least seemed to act properly. In fact the only problem I have found is that it was not possible to mix __slots__ and dict/list.
Steve From jacobs@penguin.theopalgroup.com Mon Jul 22 15:59:36 2002 From: jacobs@penguin.theopalgroup.com (Kevin Jacobs) Date: Mon, 22 Jul 2002 10:59:36 -0400 (EDT) Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: <5.1.0.14.0.20020722102253.02abb498@pop.videotron.ca> Message-ID: On Mon, 22 Jul 2002, Steve Menard wrote: > At 10:05 AM 7/22/2002 -0400, Jeremy Hylton wrote: > >One small comment. (I owe more substantial comment on Phillip's > >earlier proposals.) The persistent versions of dict and list can't > >extend the builtin types, because they need to hook __getitem__() and > >__setitem__(). The overridden methods may not be called if we extend > >the builtin types. > > hum, if those method are not guaranteed to be called by subclassing dict or > list, then there is something broken. Either that or there is a subtle > thing I do not understand. In fact, I am quite sure that one can inherit from list or dict and override __getitem__ and __setitem__ in a cooperative fashion. Can you provide a little more information on why you think otherwise? > On a side note, as I have said in another post, I have done exactly that, > subclassing dict and list. While my model didn't need to override > __getitem__(), the __setitem__() at least seemed to act properly. In fact > the only problem I have found is that it was not possible to mix __slots__ > and dict/list. For all strange and perverse things I've done, slots work just fine when inheriting from list and dict. Again, can you provide an example of where you found otherwise? 
Thanks, -Kevin -- Kevin Jacobs The OPAL Group - Enterprise Systems Architect Voice: (216) 986-0710 x 19 E-mail: jacobs@theopalgroup.com Fax: (216) 986-0714 WWW: http://www.theopalgroup.com From jeremy@alum.mit.edu Mon Jul 22 16:02:11 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Mon, 22 Jul 2002 11:02:11 -0400 Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: <5.1.0.14.0.20020722102253.02abb498@pop.videotron.ca> References: <200207192003.g6JK3sw14911@pcp02138704pcs.reston01.va.comcast.net> <87sn2lvzug.fsf@bidibule.brest.inqual.bzh> <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> <3.0.5.32.20020719115207.0086e100@telecommunity.com> <5.1.0.14.0.20020722102253.02abb498@pop.videotron.ca> Message-ID: <15676.7795.542136.74597@slothrop.zope.com> >>>>> "SM" == Steve Menard writes: GvR> But perhaps these should be rewritten to derive from dict and GvR> list instead of UserDict and UserList? >> >> One small comment. (I owe more substantial comment on Phillip's >> earlier proposals.) The persistent versions of dict and list >> can't extend the builtin types, because they need to hook >> __getitem__() and __setitem__(). The overridden methods may not >> be called if we extend the builtin types. >> SM> hum, if those method are not guaranteed to be called by SM> subclassing dict or list, then there is something broken. Either SM> that or there is a subtle thing I do not understand. The latter. For performance reasons, most C code uses calls like PyDict_GetItem(), which operates directly on the C representation of a dict. If you inherit from dict, you'll get the same C representation for your object. That allows PyDict_GetItem() to be called, but doesn't arrange to call your __getitem__() method. The indirection required to invoke a subclass's __getitem__() would cause serious performance problems. 
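The effect described above can be reproduced from Python alone, because several of dict's own methods are written against the C representation. What follows is a minimal sketch, assuming a current CPython 3 interpreter rather than the 2.2 release under discussion; the class name LoudDict is made up for illustration.

```python
class LoudDict(dict):
    """Hypothetical dict subclass that records calls to its overridden hooks."""

    def __init__(self, *args, **kwargs):
        self.calls = []
        dict.__init__(self, *args, **kwargs)

    def __getitem__(self, key):
        self.calls.append(("get", key))
        return dict.__getitem__(self, key)

    def __setitem__(self, key, value):
        self.calls.append(("set", key))
        dict.__setitem__(self, key, value)


d = LoudDict(a=1)             # constructor fills the C-level table directly
d["b"] = 2                    # Python-level subscript assignment: hook runs
d.update(c=3)                 # dict.update writes at C level: hook skipped
value_via_get = d.get("a")    # dict.get reads at C level: hook skipped
value_via_index = d["a"]      # Python-level subscript: hook runs

print(d.calls)
print(sorted(d.items()))
```

Only the two subscript operations reach the overridden methods, even though all three keys end up in the mapping. A persistent mapping that relied on __setitem__ to flag changes would therefore miss the update() call unless it overrode that method as well.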
Normally Guido only recommends inheriting from dict to add new behavior (as opposed to customizing existing behavior). Jeremy From smenard@bigfoot.com Mon Jul 22 16:41:36 2002 From: smenard@bigfoot.com (Steve Menard) Date: Mon, 22 Jul 2002 11:41:36 -0400 Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: References: <5.1.0.14.0.20020722102253.02abb498@pop.videotron.ca> Message-ID: <5.1.0.14.0.20020722113746.07a8bd20@pop.videotron.ca> At 10:59 AM 7/22/2002 -0400, Kevin Jacobs wrote: >On Mon, 22 Jul 2002, Steve Menard wrote: > > At 10:05 AM 7/22/2002 -0400, Jeremy Hylton wrote: > > >One small comment. (I owe more substantial comment on Phillip's > > >earlier proposals.) The persistent versions of dict and list can't > > >extend the builtin types, because they need to hook __getitem__() and > > >__setitem__(). The overridden methods may not be called if we extend > > >the builtin types. > > > > hum, if those method are not guaranteed to be called by subclassing > dict or > > list, then there is something broken. Either that or there is a subtle > > thing I do not understand. > >In fact, I am quite sure that one can inherit from list or dict and override >__getitem__ and __setitem__ in a cooperative fashion. Can you provide a >little more information on why you think otherwise? Matter of fact, I do not think otherwise. Jeremy said : "The overridden methods may not be called if we extend the builtin types." Which I think is wrong. > > On a side note, as I have said in another post, I have done exactly that, > > subclassing dict and list. While my model didn't need to override > > __getitem__(), the __setitem__() at least seemed to act properly. In fact > > the only problem I have found is that it was not possible to mix __slots__ > > and dict/list. > >For all strange and perverse things I've done, slots work just fine when >inheriting from list and dict. Again, can you provide an example of where >you found otherwise?
Ok, my problem was from inheriting both from dict and from my Persistent class. Persistent was using slots. I could dig out or reproduce error message if you're interested. Steve From smenard@bigfoot.com Mon Jul 22 16:44:10 2002 From: smenard@bigfoot.com (Steve Menard) Date: Mon, 22 Jul 2002 11:44:10 -0400 Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: <15676.7795.542136.74597@slothrop.zope.com> References: <5.1.0.14.0.20020722102253.02abb498@pop.videotron.ca> <200207192003.g6JK3sw14911@pcp02138704pcs.reston01.va.comcast.net> <87sn2lvzug.fsf@bidibule.brest.inqual.bzh> <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> <3.0.5.32.20020719115207.0086e100@telecommunity.com> <5.1.0.14.0.20020722102253.02abb498@pop.videotron.ca> Message-ID: <5.1.0.14.0.20020722114214.07a9d2d0@pop.videotron.ca> At 11:02 AM 7/22/2002 -0400, Jeremy Hylton wrote: > >>>>> "SM" == Steve Menard writes: > > GvR> But perhaps these should be rewritten to derive from dict and > GvR> list instead of UserDict and UserList? > >> > >> One small comment. (I owe more substantial comment on Phillip's > >> earlier proposals.) The persistent versions of dict and list > >> can't extend the builtin types, because they need to hook > >> __getitem__() and __setitem__(). The overridden methods may not > >> be called if we extend the builtin types. > >> > > SM> hum, if those method are not guaranteed to be called by > SM> subclassing dict or list, then there is something broken. Either > SM> that or there is a subtle thing I do not understand. > >The latter. For performance reasons, most C code uses calls like >PyDict_GetItem(), which operates directly on the C representation of a >dict. If you inherit from dict, you'll get the same C representation >for your object. That allows PyDict_GetItem() to be called, but >doesn't arrange to call your __getitem__() method. 
The indirection >required to invoke a subclass's __getitem__() would cause serious >performance problems. Ok, makes sense. Since it is unsafe to override those methods, perhaps it should be disallowed then. Because we get different behavior when obj[x] is called from C and when called from Python. >Normally Guido only recommends inheriting from dict to add new >behavior (as opposed to customizing existing behavior). Steve From jacobs@penguin.theopalgroup.com Mon Jul 22 16:42:11 2002 From: jacobs@penguin.theopalgroup.com (Kevin Jacobs) Date: Mon, 22 Jul 2002 11:42:11 -0400 (EDT) Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: <5.1.0.14.0.20020722113746.07a8bd20@pop.videotron.ca> Message-ID: On Mon, 22 Jul 2002, Steve Menard wrote: > Mater of fact, I do not think otherwise. Jeremy said : > > "The overridden methods may not be called if we extend the builtin types." > > Which I think is wrong. I see what he is saying now -- most of the Python core does a PyDict_Check(o), not a PyDict_CheckExact(o) to determine if an object is a real (base) dictionary. Such code then does PyDict_GetItem/PyDict_SetItem rather than PyObject_GetItem/PyObject_SetItem, and thus bypasses your derived __getitem__ and __setitem__. > > > On a side note, as I have said in another post, I have done exactly that, > > > subclassing dict and list. While my model didn't need to override > > > __getitem__(), the __setitem__() at least seemed to act properly. In fact > > > the only problem I have found is that it was not possible to mix __slots__ > > > and dict/list. > > > >For all strange and perverse things I've done, slots work just fine when > >inheriting from list and dict. Again, can you provide an example of where > >you found otherwise? > > Ok, my problem was from inheriting both from dict and from my Persistent > class. Persistent was using slots. I could dig out or reproduce error > message if you're interested. From your description, I see what is happening now.
I have a meta-class that lazily instantiates slots, which may help. It totally avoids the problem of layout conflicts, so long as all base classes can have slots added to them (i.e., not anything that inherits from tuple). -Kevin -- Kevin Jacobs The OPAL Group - Enterprise Systems Architect Voice: (216) 986-0710 x 19 E-mail: jacobs@theopalgroup.com Fax: (216) 986-0714 WWW: http://www.theopalgroup.com From jacobs@penguin.theopalgroup.com Mon Jul 22 16:45:14 2002 From: jacobs@penguin.theopalgroup.com (Kevin Jacobs) Date: Mon, 22 Jul 2002 11:45:14 -0400 (EDT) Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: <5.1.0.14.0.20020722114214.07a9d2d0@pop.videotron.ca> Message-ID: On Mon, 22 Jul 2002, Steve Menard wrote: > Ok, makes sense. Since it is unsafe to override those methods, perhaps it > should be disallowed then. Because we get different behavior when obj[x] is > called from C and when called from Python. I would be happier if we had a PyDict_{G,S}etItemExact for when we know we have a base dict, and modified PyDict_{G,S}etItem to use PyObject_{G,S}etItem when not PyDict_CheckExact. It will be a pain, but being correct is almost always better than being fast. -Kevin -- Kevin Jacobs The OPAL Group - Enterprise Systems Architect Voice: (216) 986-0710 x 19 E-mail: jacobs@theopalgroup.com Fax: (216) 986-0714 WWW: http://www.theopalgroup.com From jeremy@alum.mit.edu Mon Jul 22 16:50:41 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Mon, 22 Jul 2002 11:50:41 -0400 Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: <5.1.0.14.0.20020722113746.07a8bd20@pop.videotron.ca> References: <5.1.0.14.0.20020722102253.02abb498@pop.videotron.ca> <5.1.0.14.0.20020722113746.07a8bd20@pop.videotron.ca> Message-ID: <15676.10705.19551.392096@slothrop.zope.com> >>>>> "SM" == Steve Menard writes: >> > On a side note, as I have said in another post, I have done >> > exactly that, subclassing dict and list.
While my model didn't >> > need to override __getitem__(), the __setitem__() at least >> > seemed to act properly. In fact the only problem I have found >> > is that it was not possible to mix __slots__ and dict/list. >> >> For all strange and perverse things I've done, slots work just >> fine when inheriting from list and dict. Again, can you provide >> an example of where you found otherwise? SM> Ok, my problem was from inheriting both from dict and from my SM> Persistent class. Persistent was using slots. I could dig out or SM> reproduce error message if you're interested. dict and Persistent are not compatible at the C level. That's a second problem, and one that I hadn't thought of. (It doesn't have anything to do with slots.) >>> class PD(Persistent, dict): ... pass ... Traceback (most recent call last): File "", line 1, in ? TypeError: multiple bases have instance lay-out conflict There's no way to make this problem go away if we continue to implement persistence in C. Jeremy From smenard@bigfoot.com Mon Jul 22 17:08:37 2002 From: smenard@bigfoot.com (Steve Menard) Date: Mon, 22 Jul 2002 12:08:37 -0400 Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: <15676.10705.19551.392096@slothrop.zope.com> References: <5.1.0.14.0.20020722113746.07a8bd20@pop.videotron.ca> <5.1.0.14.0.20020722102253.02abb498@pop.videotron.ca> <5.1.0.14.0.20020722113746.07a8bd20@pop.videotron.ca> Message-ID: <5.1.0.14.0.20020722120712.02aec010@pop.videotron.ca> At 11:50 AM 7/22/2002 -0400, Jeremy Hylton wrote: > >>>>> "SM" == Steve Menard writes: > > >> > On a side note, as I have said in another post, I have done > >> > exactly that, subclassing dict and list. While my model didn't > >> > need to override __getitem__(), the __setitem__() at least > >> > seemed to act properly. In fact the only problem I have found > >> > is that it was not possible to mix __slots__ and dict/list. 
> >> > >> For all strange and perverse things I've done, slots work just > >> fine when inheriting from list and dict. Again, can you provide > >> an example of where you found otherwise? > > SM> Ok, my problem was from inheriting both from dict and from my > SM> Persistent class. Persistent was using slots. I could dig out or > SM> reproduce error message if you're interested. > >dict and Persistent are not compatible at the C level. That's a >second problem, and one that I hadn't thought of. (It doesn't have >anything to do with slots.) > > >>> class PD(Persistent, dict): >... pass >... >Traceback (most recent call last): > File "", line 1, in ? >TypeError: multiple bases have instance lay-out conflict > >There's no way to make this problem go away if we continue to >implement persistence in C. Right. That's the same problem I had, even though my Persistent was not implemented in C. It simply used __slots__. I guess since __slots__ change the layout of the object the same problem is caused. Steve From jim@zope.com Mon Jul 22 18:16:46 2002 From: jim@zope.com (Jim Fulton) Date: Mon, 22 Jul 2002 13:16:46 -0400 Subject: [Persistence-sig] "Straw Baby" Persistence API References: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh> <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> <3.0.5.32.20020719115207.0086e100@telecommunity.com> Message-ID: <3D3C3DFE.6070203@zope.com> Phillip J. Eby wrote: > Following on the unparalleled success of the "Straw Man" transaction API > (he said, with tongue in cheek), It seemed pretty sucessful to me. > I thought it might be good to make a > proposal for persistence as well. Thanks. This is very helpful. > Since I won't be at the BOF, We'll miss you. > I figure I > should get my two cents in now, while the getting's good. > > Here's my proposal, such as it is... 
Deliver a Persistence package based on > the one at http://cvs.zope.org/Zope3/lib/python/Persistence/ but with the > following changes: > > * Remove the BTrees subpackage, and the Class, Cache, Function, and Module > modules, along with the ICache interface. Rationale: The BTrees package is > only useful for a relatively small subset of possible persistence backends, > and is subject to periodic data structure changes which affect applications > using it. I'm OK with taking out BTrees, however, BTrees were included in ZODB by very popular demand. You haven't given a rational for not including the caching framework. The caching framework is closely ties to persistence and, I think, largely independent of data managers. > It's probably best kept out of the Python core. Similar > arguments apply to the Cache system, although not quite as strongly. > Class, Function, and Module are very recent developments which have not had > the extended usage that most of the rest of the code has. Fair enough. > (Note: I don't > mean to say that the persistence C code has been thoroughly exercised > either, in the sense that much of it is completely new for Python 2.2. But > its *design* has a long history, and previous implementations have had much > testing of the kind of edge and corner issues that the Class, Function, and > Module modules haven't been exposed to yet.) > > * I do think we should keep PersistentList and PersistentMapping in the > core; they're useful for almost any kind of application, and any kind of > back-end storage. They don't introduce policy or data format dependencies > into users' code, either. I *never* use persistent list and almost never use persistent mapping. I find BTrees far more useful. :) > * Make _p_dm a synonym for _p_jar, and deprecate _p_jar. This could be > done by making a _p_jar descriptor that read/wrote through to _p_dm, and > issued a deprecation warning. 
I don't personally have a problem with > _p_jar, but I've heard rumblings from other people (ZC folks?) that it's > confusing or that they want to get rid of it. So if we're doing it, now > seems like the time. I wouldn't worry about backward compatability. Ditch '_p_jar' and pick a better name, like '_p_manager' as you suggested. > * Flag _p_changed *after* __setattr__, not before! This will help > co-operative transaction participants play nicely together, since they > can't "write through" a change if they're getting notified *before* the > change takes place! It would be helpful if you could provide an illustrative example in a separate dedicated message. > Docs should also clarify that when set in other code, > _p_changed should be set at the latest possible moment, *after* the object > is in its new, stable state. I'm with Guido in wanting a set of api calls to replace the baroque '_p_changed' semantics. Note to both you and Guido, you (Phillip) are right, _p_state is an internal implementation detail. > * Keep the _p_atime slot, but don't fill it with anything by default. > Instead, have a _p_getattr_hook(persistentObj,attrName,retrievedValue) slot > at C level that's called after the getattribute completes. A data manager > can then set the hook to point to a _p_atime update function, *or* it can > introduce postprocessing for "proxy" attributes. That is, a data manager > could set the hook to handle "lazy" loading of certain attributes which > would otherwise be costly to retrieve, by placing a dummy value in the > object's dictionary, and then having the post-call hook return a > replacement value. I suggest we step back a bit and think of the API in terms of events. I suggest we think about what events are generated and who they are sent to. 
Your API change is consistent with that, > For speed, this will generally want to be a C function; let the base > package include a simple hook that updates _p_atime, and another which > checks whether the retrievedValue is an instance of a LazyValue base class, > and if so, calls the object. This will probably cover the basics. A data > manager that uses ZODB caching will use the atime function, and non-ZODB > data managers will probably want the other hook. I also have an idea about > using the transaction's timestamp() plus a counter to supply a "time" value > that minimizes system calls, but I'm not sure it would actually improve > performance any, so I'm fine with not trying to push that into the initial > package. As long as the hook slot is present in the base package, I or > anyone else are free to make up and try our own hooks to put in it. I'd like to get rid of _p_atime, as it is totally dependent on a particular cache implementation, which we happen to be phasing out. Persistent objects should have *no* > * Get rid of the term "register", since objects won't "register" with the > transaction, and neither should they with their data manager. They should > "inform their data manager" that they have changed. Something like an > objectChanged() message is appropriate in place of register(). I believe > this would clarify the API. That's fine. > * Take out the interfaces. :( I'd rather this were, "leave this in, in a > way such that it works whether you have Interface or not", but the reality > is that a dependency in the standard library on something outside the > standard library is a big no-no, and just begging for breakage as soon as > there *is* an Interface package (with a new API) in the standard library. I think that this is a very bad idea. I think the interfaces clarify things quite a bit. > Whew! I think that about covers it, as far as what I'd like to see, and > what I think would be needed to make it acceptable for the core. Comments? 
> > By the way, my rationale for not taking any radical new approaches to > persistence, observation, or notification in this proposal is that the > existing Persistence package is "transparent" enough, and has the benefit > of lots of field experience. I spent a lot of time trying to come up with > "better" ways before writing this; mostly I found that trying to make it > more "transparent" to the object being persisted, just pushes the > complexity into either the app or the backend, without really helping > anything. It's not a really big deal to: > > 1. Subclass Persistent > > 2. Use PersistentList and PersistentMapping or other Persistent objects for > your attributes, or set self._p_changed when you change a non-persistent > mutable. These are not a big deal to you, because you have a deep understanding and interest in the machinery. They are a big deal to most people. It would be *wonderful* if we could avoid this. Maybe if we had a standard persistence framework, we could motivate language changes that made this cleaner. :) > 3. Use transactions > > Especially if that's all you need to do in order to have persistence to any > number of backends, including the current ZODB and all the wonderful SQL or > other mappings that will be creatable by everybody on this list using their > own techniques. The key is not so much "transparency" per se, as > *uniformity* across backends. I think the existing API is transparent > enough; let's work on having uniform and universal access to it, as a > Python core package. Transactions are a huge benefit, as opposed to something that is "not really a big deal". :) Here are some additional points: - While we should provide a standard implementation of a persistence *interface*, we should allow other implementations. For example, the data manager or cache should not depend on internal details of the persistence implementation. We should not require a specific C layout for persistent objects, for example. 
- The persistence interface and implementations should be independent of
  the cache implementations (e.g. no _p_atime). We *do* need to provide
  a better API for handling objects that are unwilling to be deactivated.
  Perhaps _p_deactivate should return a value indicating whether the object
  was deactivated, and, if not, perhaps why.

- We need to define the state model for persistent objects. I'd like to
  include the notion of a persistent refcount. Possible states are:

  o Unsaved

  o Up to date

  o changed

  o ghost

  In addition, there is a persistent reference count. This is used by C code
  to indicate that the object is being used outside of Python. An object
  can't be turned into a ghost if its persistent reference count is > 0.
  We'll model the reference count as a "sticky" state. We transition to
  the sticky state when the reference count becomes non-zero and from the
  sticky state when the reference count drops to zero. This state is largely
  independent of the other states.

- I'd like to spend some time thinking through persistence-related events.
  Here's a start:

  o When a persistent object is modified while in the up-to-date state,
    it should notify its data manager and transition to the changed state.

  o When the object is accessed, it should notify its data manager.
    Perhaps it should pass its current state.

  o The persistent object calls a method on the data manager when its
    state needs to be loaded.

  o The persistent object should probably notify the data manager of
    any state changes.

Jim

--
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (888) 344-4332            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org

From pje@telecommunity.com Mon Jul 22 19:32:17 2002
From: pje@telecommunity.com (Phillip J.
Eby) Date: Mon, 22 Jul 2002 14:32:17 -0400 Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: <3D3C3DFE.6070203@zope.com> References: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh> <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> <3.0.5.32.20020719115207.0086e100@telecommunity.com> Message-ID: <5.1.0.14.0.20020722132838.05986020@mail.telecommunity.com> At 01:16 PM 7/22/02 -0400, Jim Fulton wrote: >Phillip J. Eby wrote: >>* Remove the BTrees subpackage, and the Class, Cache, Function, and Module >>modules, along with the ICache interface. Rationale: The BTrees package is >>only useful for a relatively small subset of possible persistence backends, >>and is subject to periodic data structure changes which affect applications >>using it. > >I'm OK with taking out BTrees, however, BTrees were included in ZODB by >very popular demand. And they should continue to be included with ZODB. But IMHO their use is specific to persistence mechanisms which use "pickle jar"-style or "shelve"-like primitive databases. (Primitive in the sense of not providing any concepts such as indexes or built-in search capabilities.) If you have a higher-level mechanism, even one as simple as SleepyCat DB (aka Berkeley DB) b-trees, you're most often better off using those features of the backend. If this were not true, there'd be no need for any persistence mechanisms besides ZODB, and we wouldn't be having this conversation. :) (Note that I'm assuming that ZODB itself will continue to exist as an independent package, providing a persistence mechanism through its Connection, Database, and Storage objects. It just shouldn't need to include Persistence or Transaction any more; BTrees would become ZODB.BTrees, or something similar.) >You haven't given a rational for not including the caching framework. >The caching framework is closely ties to persistence and, I think, >largely independent of data managers. 
IMHO the existing caches are tied to a specific caching policy, which embeds many ZODB-ish assumptions. For RDBMS work, I primarily need transactional caching, where caches are cleared between transactions. For that, I can use a simple WeakValueDictionary, with some code that deactivates objects between transactions. But if you think we should throw in some basic cache implementations for the most common caching policies, I've no objection. I just thought it better to save argument at the present time over *which* caching policies would be most common. :) >>* I do think we should keep PersistentList and PersistentMapping in the >>core; they're useful for almost any kind of application, and any kind of >>back-end storage. They don't introduce policy or data format dependencies >>into users' code, either. > >I *never* use persistent list and almost never use persistent mapping. >I find BTrees far more useful. :) Then I suppose we could drop them, too. :) But I suspect that examining third-party usage of ZODB (including Zope 2 products) would show them to be moderately popular, even for use with ZODB. There's another reason for including them, though... they serve as very simple examples of implementing persistent objects whose attributes are mutable objects. >>* Flag _p_changed *after* __setattr__, not before! This will help >>co-operative transaction participants play nicely together, since they >>can't "write through" a change if they're getting notified *before* the >>change takes place! > >It would be helpful if you could provide an illustrative example in a separate >dedicated message. Okay. I'm persisting some objects in an SQL database. I have two txn participants: the SQL persistence manager, and the SQL database connection. The former writes to the latter. But actually, I have a third participant, another persistence manager which manages a "facade" object whose state is stored in two of the objects managed by the SQL persistence manager. 
We reach transaction commit, and the "facade" object has uncommitted state... The "begin_commit" message (formerly tpc_begin) reaches the SQL persistence manager first, so it does nothing because no state has been written by the third manager. It then goes into "write-through" state. The message reaches the SQL DB connection next, and it ignores it, because it's always in "write-through" mode, effectively. Finally it reaches the third manager, which writes the dirty state from the facade to the underlying SQL-persisted objects. If they notify their manager of the change, before they're actually changed (as setattr does now), the manager will try to "write-through" a change that hasn't occurred yet, causing a lost write. Conversely, if change notification always takes place *after* a change, the write-through will succeed, and by extension, one can have as many levels of "write-through" transaction participants as one desires, without the transaction itself needing to be aware of dependencies between participants, and without requiring more commit phases or other complications, such as are needed by Steve Alexander's TransactionAgents for Zope 2. (In other words, I'm not the only person who likes being able to stack persistence mechanisms and "triggers". Although I suspect it was my work with ZPatterns that initially led Steve down that dark path of corruption. ;) ) >>* Take out the interfaces. :( I'd rather this were, "leave this in, in a >>way such that it works whether you have Interface or not", but the reality >>is that a dependency in the standard library on something outside the >>standard library is a big no-no, and just begging for breakage as soon as >>there *is* an Interface package (with a new API) in the standard library. > >I think that this is a very bad idea. I think the interfaces clarify things >quite a bit. I think maybe I was unclear. 
I certainly don't think that the interfaces should cease to exist, or that they should not exist as documentation. I'm referring to their inclusion as operating code, only. >>Whew! I think that about covers it, as far as what I'd like to see, and >>what I think would be needed to make it acceptable for the core. Comments? >>By the way, my rationale for not taking any radical new approaches to >>persistence, observation, or notification in this proposal is that the >>existing Persistence package is "transparent" enough, and has the benefit >>of lots of field experience. I spent a lot of time trying to come up with >>"better" ways before writing this; mostly I found that trying to make it >>more "transparent" to the object being persisted, just pushes the >>complexity into either the app or the backend, without really helping >>anything. It's not a really big deal to: >>1. Subclass Persistent >>2. Use PersistentList and PersistentMapping or other Persistent objects for >>your attributes, or set self._p_changed when you change a non-persistent >>mutable. > >These are not a big deal to you, because you have a deep understanding and >interest in the machinery. They are a big deal to most people. It would >be *wonderful* if we could avoid this. Maybe if we had a standard persistence >framework, we could motivate language changes that made this cleaner. :) Interesting that you say this, considering how much adoption ZODB has had in the larger Python community. Perhaps you could be more specific as to the audience you're talking about? To get rid of these things is possible, but complex. Getting rid of Persistent while minimizing loss of generality would mean either introducing proxies, or dynamically altering object types in order to get the observation capability. 
I'm seriously unconvinced that adding a line to import Persistent, and adding a word to the definition of a few application base classes, is so burdensome as to be worth the complexity and fragility of either of the basic approaches to avoiding it! (The second issue could probably be addressed with an extension of the solution to the first... by adding further complexity.) If our goal is to provide a Python core package for this in a speedy timeframe -- say this summer -- I think that developing and debugging a whole new way of doing things like this is probably out of the question. Thing is, *we don't have to actually solve this problem*. If we create a decent base API/implementation, there's no reason people can't create the proxies or class-substitution mechanisms on their own, using the base implementation to do the actual persistence part. In principle, it should be possible to create such a mechanism for arbitrary data managers. I should also mention, by the way, that PEAK (formerly TransWarp) has mechanisms that allow generation of class families with re-parented base classes, and re-written methods. So that's just one of many possible means by which one could create a transparency mechanism, independent of the persistence framework or persistence mechanisms. I don't think we should tie the persistence framework, therefore, to one specific transparency mechanism. Especially since we don't know what transparency mechanisms will be "best" for a given situation. >Transactions are a huge benefit, as opposed to something that is "not >really a big deal". :) I'm really surprised you get objections to adding a base class, but not to rewriting applications to use transactions. Adding the latter actually seems more invasive a change, to me, even if the benefit is certainly noticed and appreciated. >Here are some additional points: > >- While we should provide a standard implementation of a persistence > *interface*, we should allow other implementations. 
For example, the > data manager or cache should not depend on internal details of the > persistence implementation. We should not require a specific C layout > for persistent objects, for example. Okay. >- The persistence interface and implementations should be independent of > the cache implementations (e.g. no _p_atime). We *do* need to provide > an better API for handling objects that are unwilling to be deactivated. > Perhaps _p_deactivate should return a value indicating whether the object > was deactivated, and, if not, perhaps why. Okay. >- We need to define the state model for persistent objects. I'd like to >include > the notion of a persistent refcount. Possible states are: > > o Unsaved > > o Up to date > > o changed > > o ghost > > In addition, there is a persistent reference count. This is used by C code > to indicate that the object is being used outside of Python. An objecty > can't be turned into a ghost if it's persistent reference count is > 0. > We'll model the reference count as a "sticky" state. We transition to > the sticky > state when the reference count becomes non-zero and from the sticky state > when the reference count drops to zero. This state is largely indepent > of the other > states. Sounds good. >- I'd like to spend some time thinking through persistence related events. > Here's a start: > > o When a persistent object is modified while in the up-to-date state, > it should notify it's datata manager and transition to the changed > state. Sure. > o When the object it accessed, it should notify it's data manager. > Perhaps it > should pass it's current state. I'd like to rephrase that as being it notifies, *if* it has been requested to do so by the data manager. The data manager may decide to turn on or off such notifications at will. (In other words, I want my post-getattr hook function that can modify the result of the getattr, and I want it removable so I don't continue to pay in performance once all my state is loaded.) 
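A minimal sketch of the removable post-getattr hook described above, written in present-day Python rather than the C slot the proposal has in mind. The names _p_getattr_hook and LazyValue come from this thread; lazy_hook, PersistentSketch, and the choice to store the hook per instance are assumptions made only for illustration, not part of any existing Persistence package.

```python
class LazyValue:
    """Placeholder stored where a costly attribute would otherwise live."""

    def __init__(self, loader):
        self.loader = loader

    def __call__(self):
        return self.loader()


def lazy_hook(obj, name, value):
    """Hypothetical hook: swap a LazyValue placeholder for the real value."""
    if isinstance(value, LazyValue):
        value = value()
        # Store the loaded value so the loader runs only once.
        object.__setattr__(obj, name, value)
    return value


class PersistentSketch:
    """Illustrative stand-in for a Persistent base class (an assumption)."""

    _p_getattr_hook = None  # no hook installed by default

    def __getattribute__(self, name):
        value = object.__getattribute__(self, name)
        # Leave framework and special attributes alone.
        if name.startswith("_p_") or name.startswith("__"):
            return value
        hook = object.__getattribute__(self, "_p_getattr_hook")
        if hook is None:
            return value
        # The hook runs after the normal lookup and may replace the result.
        return hook(self, name, value)


loads = []
doc = PersistentSketch()
doc.body = LazyValue(lambda: loads.append("body") or "big text")
doc._p_getattr_hook = lazy_hook   # the data manager installs the hook
first = doc.body                  # triggers the loader
second = doc.body                 # served from the instance dict
doc._p_getattr_hook = None        # hook removed once state is loaded
```

Because the hook lives in an ordinary attribute, a data manager could install it only while an object still has lazy placeholders and clear it afterwards, which is the "removable" property asked for above.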
> o The persistent object calls a method on the data manager when it's > state > needs to be loaded. As long as I still have the ability to set or remove a getattr-hook that works independently of this, I'm fine. > o The persistent object should probably notify the data manager of > any state > changes. *Shrug*. IAGNI. (I ain't gonna need it. :) I don't have a use case for any messages but "I'm changed", "load me", and "postprocess a getattr". For what it's worth, I'd *really* like to keep this *simple*. Simple to me means released sooner, more explicit, more reliable. So I'd be happiest if we can stick to specific use cases. I've spent a lot of time hacking around the existing packages to do SQL/LDAP stuff, and others here should have strong experience using ZODB for its "natural" backends and application structures. That means we should be able to get pretty concrete about what is and isn't needed. In the absence of more use cases, I'm not sure what else is really needed besides what we've already discussed. Indeed, most of what I've outlined has been stuff I think should be taken *out*. To put it another way, I think we should have to justify everything we want to put *in*, not what we take out. Python standard library modules are widely distributed, and have a long life. Whatever we put in needs to have a healthy life expectancy! From jim@zope.com Mon Jul 22 20:47:42 2002 From: jim@zope.com (Jim Fulton) Date: Mon, 22 Jul 2002 15:47:42 -0400 Subject: [Persistence-sig] "Straw Baby" Persistence API References: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh> <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> <3.0.5.32.20020719115207.0086e100@telecommunity.com> <5.1.0.14.0.20020722132838.05986020@mail.telecommunity.com> Message-ID: <3D3C615E.6030201@zope.com> Phillip J. Eby wrote: > At 01:16 PM 7/22/02 -0400, Jim Fulton wrote: > >> Phillip J. 
Eby wrote: >> >>> * Remove the BTrees subpackage, and the Class, Cache, Function, and >>> Module >>> modules, along with the ICache interface. Rationale: The BTrees >>> package is >>> only useful for a relatively small subset of possible persistence >>> backends, >>> and is subject to periodic data structure changes which affect >>> applications >>> using it. >> >> >> I'm OK with taking out BTrees, however, BTrees were included in ZODB by >> very popular demand. > > > And they should continue to be included with ZODB. They don't depend on ZODB in any way. > But IMHO their use > is specific to persistence mechanisms which use "pickle jar"-style or > "shelve"-like primitive databases. (Primitive in the sense of not > providing any concepts such as indexes or built-in search > capabilities.) If you have a higher-level mechanism, even one as simple > as SleepyCat DB (aka Berkeley DB) b-trees, you're most often better off > using those features of the backend. I don't agree. > If this were not true, there'd be no need for any persistence mechanisms > besides ZODB, and we wouldn't be having this conversation. :) There are lots of other reasons for a non-ZODB persistent storage including: 1) Need to store data in relational databases - Because they are trusted - because data needs to be accessed from other apps - because they may scale better for some apps 2) Competition is good. :) > (Note that I'm assuming that ZODB itself will continue to exist as an > independent package, providing a persistence mechanism through its > Connection, Database, and Storage objects. It just shouldn't need to > include Persistence or Transaction any more; Of course. > BTrees would become > ZODB.BTrees, or something similar.) No, they would be separate. They don't depend on ZODB. > >> You haven't given a rational for not including the caching framework. >> The caching framework is closely ties to persistence and, I think, >> largely independent of data managers. 
> > > IMHO the existing caches are tied to a specific caching policy, which > embeds many ZODB-ish assumptions. For RDBMS work, I primarily need > transactional caching, where caches are cleared between transactions. > For that, I can use a simple WeakValueDictionary, with some code that > deactivates objects between transactions. > > But if you think we should throw in some basic cache implementations for > the most common caching policies, I've no objection. I just thought it > better to save argument at the present time over *which* caching > policies would be most common. :) I think that there should, at least, be a standard cache interface. It should be possible to develop data managers and caches independently. Maybe we could include one or two standard implementations. These could provide useful examples for other implementations and, of course, be useful in themselves. ... >>> * Take out the interfaces. :( I'd rather this were, "leave this in, >>> in a >>> way such that it works whether you have Interface or not", but the >>> reality >>> is that a dependency in the standard library on something outside the >>> standard library is a big no-no, and just begging for breakage as >>> soon as >>> there *is* an Interface package (with a new API) in the standard >>> library. >> >> >> I think that this is a very bad idea. I think the interfaces clarify >> things >> quite a bit. > > > I think maybe I was unclear. I certainly don't think that the > interfaces should cease to exist, or that they should not exist as > documentation. I'm referring to their inclusion as operating code, only. So you don't want them to get imported? ... >> These are not a big deal to you, because you have a deep understanding >> and >> interest in the machinery. They are a big deal to most people. It would >> be *wonderful* if we could avoid this. Maybe if we had a standard >> persistence >> framework, we could motivate language changes that made this cleaner. 
:) > > > Interesting that you say this, considering how much adoption ZODB has > had in the larger Python community. Perhaps you could be more specific > as to the audience you're talking about? I was mainly referring to the handling of non-persistent mutable stumbling block. This is a major stumbling block and source of errors to most ZODB users. Having to mix in persistence is an annoyance. It would be really cool (but hard, very hard) to get rid of them. > To get rid of these things is possible, but complex. Getting rid of > Persistent while minimizing loss of generality would mean either > introducing proxies, or dynamically altering object types in order to > get the observation capability. I'm seriously unconvinced that adding a > line to import Persistent, and adding a word to the definition of a few > application base classes, is so burdensome as to be worth the complexity > and fragility of either of the basic approaches to avoiding it! (The > second issue could probably be addressed with an extension of the > solution to the first... by adding further complexity.) I agree that this is hard. It's really hard. I wasn't even suggesting that we needed to solve this problem. I was merely pointing out that this *is* a big deal for a lot of people. > If our goal is to provide a Python core package for this in a speedy > timeframe -- say this summer -- I think that developing and debugging a > whole new way of doing things like this is probably out of the question. Agreed. OTOH, it wouldn't hurt to ponder other alternatives, if not now, then maybe later. > Thing is, *we don't have to actually solve this problem*. If we create > a decent base API/implementation, there's no reason people can't create > the proxies or class-substitution mechanisms on their own, using the > base implementation to do the actual persistence part. In principle, it > should be possible to create such a mechanism for arbitrary data managers. True.
But maybe someone will think of a way to solve this without proxies or alchemy? ... >> o When the object is accessed, it should notify its data manager. >> Perhaps it >> should pass its current state. > > > I'd like to rephrase that as being it notifies, *if* it has been > requested to do so by the data manager. The data manager may decide to > turn on or off such notifications at will. (In other words, I want my > post-getattr hook function that can modify the result of the getattr, > and I want it removable so I don't continue to pay in performance once > all my state is loaded.) We need to think some more about this. I'd rather err on the side of simple persistent objects and complex data managers. I'd also like persistent objects to be as lightweight as possible. Carrying a bunch of attributes for hooks is worrisome. > >> o The persistent object calls a method on the data manager when >> its state >> needs to be loaded. > > > As long as I still have the ability to set or remove a getattr-hook that > works independently of this, I'm fine. Would different objects in the same DM have different values of the same hook? If so, why? >> o The persistent object should probably notify the data manager of >> any state >> changes. > > > *Shrug*. IAGNI. (I ain't gonna need it. :) I don't have a use case > for any messages but "I'm changed", "load me", and "postprocess a getattr". > > For what it's worth, I'd *really* like to keep this *simple*. Simple to > me means released sooner, more explicit, more reliable. So I'd be > happiest if we can stick to specific use cases. A decent cache is going to handle objects differently based on their states. For example, a cache that deactivates objects when they haven't been used in a while needs to know which objects are ghostifyable and needs to know when ghostifyable objects have changed.
> I've spent a lot of time hacking around the existing packages to do > SQL/LDAP stuff, and others here should have strong experience using ZODB > for its "natural" backends and application structures. That means we > should be able to get pretty concrete about what is and isn't needed. > In the absence of more use cases, I'm not sure what else is really > needed besides what we've already discussed. Indeed, most of what I've > outlined has been stuff I think should be taken *out*. > > To put it another way, I think we should have to justify everything we > want to put *in*, not what we take out. Python standard library modules > are widely distributed, and have a long life. Whatever we put in needs > to have a healthy life expectancy! I don't think we should approach this effort with the assumption that the first version is going into the standard library. I'm pretty happy with the persistence mechanism I came up with for ZODB, but there are a lot of things I'd like to fix. I agree that we should be rather conservative, but this is a good time to fix things. Having done so, we should get some experience with what we've come up with before we worry about adding it to the standard library. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! 
CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From jeremy@zope.com Mon Jul 22 21:03:11 2002 From: jeremy@zope.com (Jeremy Hylton) Date: Mon, 22 Jul 2002 16:03:11 -0400 (EDT) Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: <200207192003.g6JK3sw14911@pcp02138704pcs.reston01.va.comcast.net> References: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh> <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> <3.0.5.32.20020719115207.0086e100@telecommunity.com> <200207192003.g6JK3sw14911@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <15676.25855.718959.288651@localhost.localdomain> >>>>> "GvR" == Guido van Rossum writes: GvR> I've often thought that it's ugly that you have to set _p_state GvR> and _p_changed, rather than do these things with method calls. GvR> What do you think about that? Especially the conventions for GvR> _p_state look confusing to me. I would like to keep a simplified version of _p_changed, but not _p_state. The purpose of assignment to _p_changed is to mark an object as changed. Assignment seems clear here. _p_changed is a flag, normally false; when an object is changed, it is set to true. Why would a method call be any clearer? In general, it seems Python programs often use instance variables in this way, and the property mechanism only makes it easier for something that looks like assignment to behave in special ways. I don't think there's any need to make _p_state part of the documented API, although it may be useful to keep for debugging.
Jeremy From jeremy@zope.com Mon Jul 22 21:12:50 2002 From: jeremy@zope.com (Jeremy Hylton) Date: Mon, 22 Jul 2002 16:12:50 -0400 (EDT) Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: <3.0.5.32.20020719115207.0086e100@telecommunity.com> References: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh> <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> <3.0.5.32.20020719115207.0086e100@telecommunity.com> Message-ID: <15676.26434.240121.243006@localhost.localdomain> >>>>> "PJE" == Phillip J Eby writes: PJE> * Flag _p_changed *after* __setattr__, not before! This will PJE> help co-operative transaction participants play nicely PJE> together, since they can't "write through" a change if they're PJE> getting notified *before* the change takes place! Docs should PJE> also clarify that when set in other code, _p_changed should be PJE> set at the latest possible moment, *after* the object is in its PJE> new, stable state. Can you flesh out this request? The second sentence there suggests interesting issues, but doesn't spell them out. As for when _p_changed should be set: Why does it matter? PJE> * Keep the _p_atime slot, but don't fill it with anything by PJE> default. I'd just as soon drop it completely. If a particular application wants to extend the base persistence interface, it can. PJE> * Get rid of the term "register", since objects won't PJE> "register" with the transaction, and neither should they with PJE> their data manager. They should "inform their data manager" PJE> that they have changed. Something like an objectChanged() PJE> message is appropriate in place of register(). I believe this PJE> would clarify the API. I don't have a problem with register(). In what way is the current interface unclear? 
PJE> By the way, my rationale for not taking any radical new PJE> approaches to persistence, observation, or notification in this PJE> proposal is that the existing Persistence package is PJE> "transparent" enough, and has the benefit of lots of field PJE> experience. I'd like to see some comments from people who haven't already used ZODB. I worry that all the comments are coming from a small number of people who wrote or use ZODB's persistence mechanism, and that we'll make decisions that will be limiting for other persistent applications. (But maybe there aren't any other such applications/users.) Jeremy From tim@zope.com Mon Jul 22 21:43:25 2002 From: tim@zope.com (Tim Peters) Date: Mon, 22 Jul 2002 16:43:25 -0400 Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: <15676.25855.718959.288651@localhost.localdomain> Message-ID: [Guido] > I've often thought that it's ugly that you have to set _p_state > and _p_changed, rather than do these things with method calls. > What do you think about that? Especially the conventions for > _p_state look confusing to me. [Jeremy Hylton] > I would like to keep a simplified version of _p_changed, If _p_changed is a 1-bit flag now, how much simpler can it get? > but not _p_state. The purpose of assignment to _p_changed is to mark an > object as changed. Assignment seems clear here. _p_changed is a > flag, normally false; when an object is changed, it is set to true. > Why would a method call be any clearer? Presumably so that interested parties could influence what happens when an object becomes "dirty"? Maybe update a distributed cache, who knows. I suspect Phillip Eby was getting at something related with his plea to set _p_changed only after an object is in a sane state again after a change is complete. OTOH, method calls are a large overhead when the mutation is simple; e.g., if a persistent list has to call a changed() method every time someone does a[i] = 6 that's a real drag on potential performance.
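[Editorial illustration, not part of the original thread.] The trade-off discussed above, assignment syntax for _p_changed versus a method call on every mutation, can be sketched with a property. This is only a minimal sketch under stated assumptions: the DataManager class, its object_changed() method, and the PersistentList wrapper are hypothetical names chosen for illustration and are not the ZODB API. The flag is set after the mutation, and the data manager is called only when the object goes from clean to dirty, so repeated a[i] = 6 assignments do not each pay for a notification.

```python
class DataManager:
    """Hypothetical data manager that just records change notifications."""

    def __init__(self):
        self.notified = []

    def object_changed(self, obj):
        # Assumed notification hook; the name is illustrative only.
        self.notified.append(obj)


class Persistent:
    """Sketch of a base class whose _p_changed flag is a property."""

    def __init__(self, data_manager):
        self._p_jar = data_manager   # the object's data manager
        self._dirty = False          # backing store for the flag

    @property
    def _p_changed(self):
        return self._dirty

    @_p_changed.setter
    def _p_changed(self, value):
        was_dirty = self._dirty
        self._dirty = bool(value)
        # Notify only on the clean -> dirty transition.
        if self._dirty and not was_dirty:
            self._p_jar.object_changed(self)


class PersistentList(Persistent):
    """Toy persistent list: mutate first, then flag the change."""

    def __init__(self, data_manager, items=()):
        super().__init__(data_manager)
        self._items = list(items)

    def __getitem__(self, index):
        return self._items[index]

    def __setitem__(self, index, value):
        self._items[index] = value
        self._p_changed = True   # set *after* the new state is in place


dm = DataManager()
a = PersistentList(dm, [0, 0, 0])
a[0] = 6
a[1] = 7                  # already dirty: no second notification
a._p_changed = False      # e.g. the data manager has saved the state
a[2] = 8                  # clean -> dirty again: one more notification
```

In this sketch a write-through data manager would reset the flag inside object_changed() and so be called on every change, which is a cost only that kind of manager pays.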
From jeremy@alum.mit.edu Mon Jul 22 21:49:51 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Mon, 22 Jul 2002 16:49:51 -0400 Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: References: <15676.25855.718959.288651@localhost.localdomain> Message-ID: <15676.28655.414460.130631@slothrop.zope.com> >>>>> "TP" == Tim Peters writes: TP> [Guido] >> I've often thought that it's ugly that you have to set _p_state >> and _p_changed, rather than do these things with method calls. >> What do you think about that? Especially the conventions for >> _p_state look confusing to me. TP> [Jeremy Hylton] >> I would like to keep a simplified version of _p_changed, TP> If _p_changed is a 1-bit flag now, how much simpler can it get TP> ? It's not a one-bit flag, and that's the part I want to simplify. You can also: - set _p_changed to None, which requests that the object become a ghost. - delete the _p_changed attribute (del obj._p_changed) which also asks the object to become a ghost, but in subtly different ways than just setting the attribute to None. - revive a ghost, although I'm not entirely clear how this works. The Zope3 persistence mechanism supports all the _p_changed magic, but also exports _p_activate() and _p_deactivate(). The first makes a ghost a real object, the second makes a real object a ghost. Jeremy From jeremy@zope.com Mon Jul 22 22:24:08 2002 From: jeremy@zope.com (Jeremy Hylton) Date: Mon, 22 Jul 2002 17:24:08 -0400 (EDT) Subject: [Persistence-sig] "Straw Man" transaction API In-Reply-To: <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> References: <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> Message-ID: <15676.30712.483484.650388@localhost.localdomain> I would like to see arbitrarily nested transactions supported in the next generation transaction API. Jeremy From pje@telecommunity.com Tue Jul 23 02:12:20 2002 From: pje@telecommunity.com (Phillip J.
Eby) Date: Mon, 22 Jul 2002 21:12:20 -0400 Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: <3D3C615E.6030201@zope.com> References: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh> <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> <3.0.5.32.20020719115207.0086e100@telecommunity.com> <5.1.0.14.0.20020722132838.05986020@mail.telecommunity.com> Message-ID: <5.1.0.14.0.20020722200752.05208970@mail.telecommunity.com> At 03:47 PM 7/22/02 -0400, Jim Fulton wrote: >Phillip J. Eby wrote: > > > But IMHO their use >>is specific to persistence mechanisms which use "pickle jar"-style or >>"shelve"-like primitive databases. (Primitive in the sense of not >>providing any concepts such as indexes or built-in search >>capabilities.) If you have a higher-level mechanism, even one as simple >>as SleepyCat DB (aka Berkeley DB) b-trees, you're most often better off >>using those features of the backend. > >I don't agree. I didn't qualify my statement sufficiently, then. :) See below. >>If this were not true, there'd be no need for any persistence mechanisms >>besides ZODB, and we wouldn't be having this conversation. :) > >There are lots of other reasons for a non-ZODB persistent storage >including: > >1) Need to store data in relational databases > > - Because they are trusted > > - because data needs to be accessed from other apps > > - because they may scale better for some apps Right, and if you're doing it because of the second or third sub-item above, you will have little use for BTrees. AFAICT, the only reason one would store a BTree in another BTree would be if you're doing ZODB-type things in an SQL db "because they are trusted". This is part of what I meant by "most often better off using those [higher-level] features of the back-end." 
Applications which have different read/write characteristics and structural/performance requirements than content management applications, will generally be *much* better off leaving these things to a good back-end, than managing BTrees themselves. >I think that there should, at least, be a standard cache interface. >It should be possible to develop data managers and caches independently. >Maybe we could include one or two standard implementations. These could >provide useful examples for other implementations and, of course, be >useful in themselves. Sure. I personally don't think there's much that you can standardize on in a caching API besides which mapping methods one is required to support, without getting into policy and use cases. But I'm probably biased by the relative simplicity of my own use cases re: caching, and by my intense desire to get an "official" persistence base into the standard library, at the expense of any actual persistence *mechanisms* if need be. I'm going to have to write my own mechanism anyway, so again I'm biased. :) >>>>* Take out the interfaces. :( I'd rather this were, "leave this in, in a >>>>way such that it works whether you have Interface or not", but the reality >>>>is that a dependency in the standard library on something outside the >>>>standard library is a big no-no, and just begging for breakage as soon as >>>>there *is* an Interface package (with a new API) in the standard library. >>> >>>I think that this is a very bad idea. I think the interfaces clarify things >>>quite a bit. >> >>I think maybe I was unclear. I certainly don't think that the interfaces >>should cease to exist, or that they should not exist as >>documentation. I'm referring to their inclusion as operating code, only. > >So you don't want them to get imported? It's not that I care one way or the other. Honestly, I'd rather see Interface end up in the standard library too - at least once the metaclass bug is fixed. 
:) But my overriding priority here is a standard for Persistence and Transaction bases for eventual inclusion in the standard library. I have many projects which desperately need good persistence and transaction frameworks, but I'm between a rock (ZODB 3) and a hard place (ZODB 4) right now. Both have transaction API's that are somewhat difficult to work with, and I need some of the things that are in ZODB 4, but if ZODB 4 is about to be re-factored... I'm stuck in the middle with code that could end up orphaned. Even if I go off and write everything I need "from scratch" in order to dodge out this dependency, it doesn't help me if the eventual standard doesn't match up closely enough with my work. I'm still left with "orphaned" code - sort of like a DB connection object created prior to adoption of a DBAPI standard. Thus, my objective is to keep the shortest possible distance between me and a Python community consensus on a base-level transaction and persistence API. I have a fairly limited time window, however, before I will have to pick something and do something, regardless of the long-term cost. :( >I was mainly refering to the handling of non-persistent mutable >stumbling block. This is a major stubling block and source of errors >to most ZODB users. Yeah, that one really requires metadata, or collaborative properties. But those are things that are also already in PEAK, so again I'm probably biased as to how difficult/available they are. Also, in the SQL world, the solution to non-persistent mutable data is actually quite trivial: don't have non-persistent mutable data. :) Seriously, since a data manager loads an object's state, it can *guarantee* that there will be no non-persistent mutable attributes. (Note that if the object replaces a persistent mutable with a non-persistent one, that will trigger a change, and the data manager can force it back to a persistent mutable when the state goes back to "up to date".) 
In the SQL world, a data manager *must* have this sort of schema knowledge in order to do its job. Pickle-driven data managers may have a harder time of this, of course, if they lack sufficient schema knowledge to manage object state in this fashion. Then again, perhaps we could solve the problem for pickle-driven databases as well, if there were a Python protocol for declaring immutability! Heck, in theory, one could use interface adaptation to transform objects like lists into persistent equivalents. It would only be necessary to do this, however, if the object whose state was being loaded didn't declare that it handled its own persistence properly. The performance/space issue of saving extra persistent objects could actually be dealt with by having the substituted objects implement only observation on behalf of their holder(s), rather than being actual persistent objects. >I agree that this is hard. It's really hard. I wasn't even suggesting >that we needed to solve this problem. I was merely pointing out that this >*is* a big deal for a lot of people. Understood. Ironically, enough, I think I have stumbled onto another mechanism for doing so, above. Newly created objects and their subobjects won't be observed, of course, but that's moot since they have to be referenced from another persistent object to get saved at all. In "rootless" persistence mechanisms (such as most SQL databases), the data manager has to explicitly add the object anyhow. So it seems that all that's needed is sufficient introspection capability to distinguish between: * A persistent object * An immutable * An "observed" mutable * An "unobserved" mutable With the ability to substitute a suitable observed mutable for an unobserved one, when state is loaded or saved. I'm going to think about this some more... It seems altogether too easy, so I'm sure there's something I'm missing. Most likely, it's just that the devil is in the details... 
the specific issues of introspection, selection, and substitution are likely to have lots of little gotchas. >>If our goal is to provide a Python core package for this in a speedy >>timeframe -- say this summer -- I think that developing and debugging a >>whole new way of doing things like this is probably out of the question. > >Agreed. OTOH, it wouldn't hurt to ponder other alternatives, if not now, >them maybe later. I admit I do enjoy trying to solve the problem. I'm just not optimistic about finding a simple solution. :) >>Thing is, *we don't have to actually solve this problem*. If we create a >>decent base API/implementation, there's no reason people can't create the >>proxies or class-substitution mechanisms on their own, using the base >>implementation to do the actual persistence part. In principle, it >>should be possible to create such a mechanism for arbitrary data managers. > >True. But maybe someone will think of a way to solve this without proxies >or alchemy? Unless you're going to fundamentally alter the Python object model, it's not doable. Python objects by definition get their behavior from their type. To change the behavior, you must either change the type, the type pointer in the object, or replace the object with another one. >>I'd like to rephrase that as being it notifies, *if* it has been >>requested to do so by the data manager. The data manager may decide to >>turn on or off such notifications at will. (In other words, I want my >>post-getattr hook function that can modify the result of the getattr, and >>I want it removable so I don't continue to pay in performance once all my >>state is loaded.) > >We need to think some more about this. I'd rather err on the side of >simple persistent objects and complex data managers. So would I, which is why I want the hook, so the data manager can provide the behavior, rather than building it into the object. :) >I'd also like persistent objects to be as lightweight as possible. 
>Carrying a bunch of attributes for hooks is worrysome/ Hm. Well, we're talking C-level slots here, and I only asked for one hook, myself. Guido suggested the setattr hook. :) I like lightweight in *performance*, and having a callable C function seems lighter in that sense than having the object look up an attribute on the data manager every time an attribute lookup is performed on it. Plus, the hook can be stateful, while a method on the data manager has to check state - which could require a re-entrant attribute lookup back to the object. >>> o The persistent object calls a method on the data manager when >>> it's state >>> needs to be loaded. >> >>As long as I still have the ability to set or remove a getattr-hook that >>works independently of this, I'm fine. > >Would different objects in the same DM have different values of the same hook? Different values, yes. Different non-empty values, probably not. In other words, I'm mainly interested in having the hook be "on" or "off" for a given data manager. >If so, why? I have only one use case for having different non-empty hook values for the same DM: polymorphism. But there are other ways to achieve it, so I don't think different non-empty values per DM is a requirement. I suppose you could then implement the hook as a bit flag rather than a hook pointer, but it seems to me the performance might be worth using a pointer instead of a bit flag. >A decent cache is going to handle objects differenty based on their states. >For example, a cache that deactivates objects when they haven't been used in a >while needs to know which objects are ghostifyable and needs to know when >ghostifyable objects have changed. So add "sticky"/"unsticky" messages, and we'd be done. Or, if "stickiness" represents a minority state among ghostable objects, don't even add this, because it'd be more efficient for the cache to just ask the object to deactivate itself and see what happens, than to send lots of "I'm sticky... 
whoops, now I'm not" messages to data managers. With the messages I listed previously, a data manager should have enough information. I'd rather we try to implement some data managers or caches and find we need to add something, than add a YAGNI on this one, because the performance penalty for unnecessary notifications seems potentially high, not to mention the added complexity for data managers to handle a bunch of extra messages. >>I've spent a lot of time hacking around the existing packages to do >>SQL/LDAP stuff, and others here should have strong experience using ZODB >>for its "natural" backends and application structures. That means we >>should be able to get pretty concrete about what is and isn't needed. >>In the absence of more use cases, I'm not sure what else is really needed >>besides what we've already discussed. Indeed, most of what I've outlined >>has been stuff I think should be taken *out*. >>To put it another way, I think we should have to justify everything we >>want to put *in*, not what we take out. Python standard library modules >>are widely distributed, and have a long life. Whatever we put in needs >>to have a healthy life expectancy! > >I don't think we should approach this effort with the assumption that the >first >version is going into the standard library. I'm pretty happy with the >persistence >mechanism I came up with for ZODB, but there are a lot of things I'd like >to fix. As I mentioned above, my primary goal is just to get a consensus for the basic interfaces. I'd be happy if we end up with something like a DBAPI PEP that everybody agreed on. The standard library is gravy, but I *do* want to see it there before too terribly long. IOW, I'd like this to be like the XML processing and distutils, which were separately distributed for a (Python) release or two as candidates for the standard library, and became standard later. From pje@telecommunity.com Tue Jul 23 02:29:23 2002 From: pje@telecommunity.com (Phillip J. 
Eby) Date: Mon, 22 Jul 2002 21:29:23 -0400 Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: <15676.26434.240121.243006@localhost.localdomain> References: <3.0.5.32.20020719115207.0086e100@telecommunity.com> <87sn2lvzug.fsf@bidibule.brest.inqual.bzh> <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> <3.0.5.32.20020719115207.0086e100@telecommunity.com> Message-ID: <5.1.0.14.0.20020722211245.0520c020@mail.telecommunity.com> At 04:12 PM 7/22/02 -0400, Jeremy Hylton wrote: > >>>>> "PJE" == Phillip J Eby writes: > > PJE> * Flag _p_changed *after* __setattr__, not before! This will > PJE> help co-operative transaction participants play nicely > PJE> together, since they can't "write through" a change if they're > PJE> getting notified *before* the change takes place! Docs should > PJE> also clarify that when set in other code, _p_changed should be > PJE> set at the latest possible moment, *after* the object is in its > PJE> new, stable state. > >Can you flesh out this request? The second sentence there suggests >interesting issues, but doesn't spell them out. > >As for when _p_changed should be set: Why does it matter? Because setting _p_changed triggers a notification to the DM, which may need to perform an immediate save of the object's state, if a transaction commit is already in progress. > PJE> * Get rid of the term "register", since objects won't > PJE> "register" with the transaction, and neither should they with > PJE> their data manager. They should "inform their data manager" > PJE> that they have changed. Something like an objectChanged() > PJE> message is appropriate in place of register(). I believe this > PJE> would clarify the API. > >I don't have a problem with register(). In what way is the current >interface unclear? "register" doesn't mean anything in the context of a data manager. 
It made some sense in reference to a transaction - presumably something registering with a transaction is some sort of transacted thing. Registering with a data manager, however, doesn't say anything about what it's being registered for or what this will do. "objectChanged()", however, would clearly state that this is a notice that an object has been changed. Also, "register" implies implementation that may not exist! Some data managers may save changes immediately, and not "register" anything about the object or the change. (Think, for example, of a chat room object implemented via a persistence mechanism.) >I'd like to see some comments from people who haven't already used >ZODB. I'd like to see some, too! If it was just going to be Jim and me we could've taken it to private e-mail and avoided having a SIG. :) > I worry that all the comments are coming from a small number of >people who wrote or use ZODB's persistent mechanism, and that we'll >make decisions will be limiting for other persistent applications. >(But maybe there aren't any other such applications/users.) Personally, I'm trying to speak as someone who has *wrestled* with ZODB, trying to make it do things it's not entirely suited for. As pro-ZODB as I may sound in some ways, my needs are pretty diametrically opposite a *lot* of ZODB's design parameters. I want: * Transparent use of legacy databases w/fixed schemas (vs. new DB format, fluid schema) * Strongly transactional caching (vs. out-of-date reads of objects not written in that txn) * High write-to-read ratio (vs. high read-to-write ratio) * Use DBMS indexing and query capabilities (vs. creating them "from scratch") * Undo and versions optionally handled at the application domain level (vs. 
building them into the infrastructure) If you can think of a ZODB design parameter that I *don't* want the opposite of (besides things like lightweight, high performance, low memory, easy to use, etc., that nobody in their right mind would disagree with), please let me know. :) So, given that I'm so "opposite" in my needs, I think it's really quite impressive that it can accommodate me with so little stretching. There isn't anything I've proposed in Straw Man and Straw Baby that I can't do with the existing ZODB 4 code, if I'm willing to hack at it a bit. Okay, maybe more than a bit. But it's at least *possible*. Honestly, I was and am much more disturbed by the possibility of ZODB 4 undergoing an API upheaval, than I am about not getting every little thing I want. A de-facto "standard" framework that I can work around, is better for me than a fast-moving target that might someday meet my needs better. Practicality beats purity, and all that. :) From pje@telecommunity.com Tue Jul 23 02:32:50 2002 From: pje@telecommunity.com (Phillip J. Eby) Date: Mon, 22 Jul 2002 21:32:50 -0400 Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: References: <15676.25855.718959.288651@localhost.localdomain> Message-ID: <5.1.0.14.0.20020722212950.0520d600@mail.telecommunity.com> At 04:43 PM 7/22/02 -0400, Tim Peters wrote: > > but not _p_state. The purpose of assignment to _p_changed is to mark an > > object as changed. Assignment seems clear here. _p_changed is a > > flag, normally false; when an object is changed, it is set to true. > > Why would a method call be any clearer? > >Presumably so that interested parties could influence what happens when an >object becomes "dirty"? Maybe update a distributed cache, who knows. I >suspect Phillip Eby was getting at something related with his plea to set >_p_changed only after an object is in a sane state again after a change is >complete. Yes, that's precisely it.
Updating a distributed cache would be another example of a "write-through changes" situation. >OTOH, method calls are a large overhead when the mutation is simple; e.g., >if a persistent list has to call a changed() method every time someone does > > a[i] = 6 > >that's a real drag on potential performance. Right, which is why the existing self._p_changed = 1 thing isn't too bad as long as the descriptor is in C, and only calls through to the DM when the object transitions from up-to-date to "dirty". Of course, write-through DM's will immediately reset the state to "up-to-date", but if they want to get called on every change, that's the price they'll pay. From pje@telecommunity.com Tue Jul 23 02:34:53 2002 From: pje@telecommunity.com (Phillip J. Eby) Date: Mon, 22 Jul 2002 21:34:53 -0400 Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: <15676.28655.414460.130631@slothrop.zope.com> References: <15676.25855.718959.288651@localhost.localdomain> Message-ID: <5.1.0.14.0.20020722213305.0520e0f0@mail.telecommunity.com> At 04:49 PM 7/22/02 -0400, Jeremy Hylton wrote: > >>>>> "TP" == Tim Peters writes: > >It's not a one-bit flag, and that's the part I want to simplify. You >can also: > > - set _p_changed to None, which requests that the object become a > ghost. > > - delete the _p_changed attribute (del obj._p_changed) which also > asks the object to become a ghost, but in subtly different ways > than just setting the attribute to None. > > - revive a ghost, although I'm not entirely clear how this works. > >The Zope3 persistence mechanism supports all the _p_changed magic, but >also exports _p_activate() and _p_deactivate(). The first makes a >ghost a real object, the second makes a real object a ghost. I'd be happy sticking with the methods for activate/deactivate, and simplifying _p_changed to a one-bit flag. The change flag wants to be as lightweight as possible. And we can even make it a boolean, to make Guido happy.
;) From pje@telecommunity.com Tue Jul 23 02:37:27 2002 From: pje@telecommunity.com (Phillip J. Eby) Date: Mon, 22 Jul 2002 21:37:27 -0400 Subject: [Persistence-sig] Nested Transactions In-Reply-To: <15676.30712.483484.650388@localhost.localdomain> References: <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> Message-ID: <5.1.0.14.0.20020722213513.05099560@mail.telecommunity.com> At 05:24 PM 7/22/02 -0400, Jeremy Hylton wrote: >I would like to see arbitrarily nested transactions supported in the >next generation transaction API. Could you add some more specifics? For example, what happens if a transaction participant can't support nested transactions? I gather that this capability is not exactly common, even among SQL databases. From jeremy@alum.mit.edu Tue Jul 23 04:17:41 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Mon, 22 Jul 2002 23:17:41 -0400 Subject: [Persistence-sig] Nested Transactions In-Reply-To: <5.1.0.14.0.20020722213513.05099560@mail.telecommunity.com> References: <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> <5.1.0.14.0.20020722213513.05099560@mail.telecommunity.com> Message-ID: <15676.51925.551981.560829@slothrop.zope.com> >>>>> "PJE" == Phillip J Eby writes: PJE> At 05:24 PM 7/22/02 -0400, Jeremy Hylton wrote: >> I would like to see arbitrarily nested transactions supported in >> the next generation transaction API. PJE> Could you add some more specifics? For example, what happens PJE> if a transaction participant can't support nested transactions? PJE> I gather that this capability is not exactly common, even among PJE> SQL databases. I gather you want more specifics about the API, which I'll post as soon as I work them out :-). The general idea is clear enough, I think. A larger transaction can be composed of several component transactions, each of which is an atomic action. This approach can be applied recursively.
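The general idea of composing a larger transaction from atomic component transactions can be sketched roughly as follows. This is only an illustration under stated assumptions, not the SIG's API: the names Transaction, begin_child, set, commit, and abort are hypothetical, and a plain dict stands in for the backing store.

```python
# Hypothetical sketch: Transaction, begin_child, set, commit, and abort
# are illustrative names only, not part of any proposed transaction API.
class Transaction:
    """A transaction whose changes stay private until it commits."""

    def __init__(self, store, parent=None):
        self._store = store      # dict standing in for committed state
        self._parent = parent    # enclosing transaction, if any
        self._changes = {}       # uncommitted changes made at this level

    def begin_child(self):
        # Nesting is recursive: a child can itself have children.
        return Transaction(self._store, parent=self)

    def set(self, key, value):
        self._changes[key] = value

    def commit(self):
        # A child folds its changes into its parent; only the outermost
        # transaction writes to the store.
        target = self._parent._changes if self._parent else self._store
        target.update(self._changes)
        self._changes = {}

    def abort(self):
        # Discards only the work done at this level.
        self._changes = {}


store = {}
outer = Transaction(store)
outer.set("a", 1)

inner = outer.begin_child()
inner.set("b", 2)
inner.abort()             # the outer transaction is unaffected

inner2 = outer.begin_child()
inner2.set("c", 3)
inner2.commit()           # now part of the outer transaction only

store_before_outer_commit = dict(store)
outer.commit()
```

In this sketch an aborted component leaves its parent untouched, and nothing reaches the store until the outermost transaction commits.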
If the backend doesn't support it, then the application gets an exception when it tries to use the feature. The APIs should support it, though, so that applications can be written against it. I didn't think nested transactions were that uncommon, BTW. At least some of the EJB servers support them. The two interface ideas I have are a savepoint() method or a way to create a new transaction and specify a parent. A savepoint() method would return an object with a rollback() method. When you call savepoint(), you commit a subtransaction. If you call rollback(), you roll back changes to the savepoint. I think this API can support arbitrarily nested transactions, although the application will have to work to manage the savepoint objects. The other option seems simpler for nesting, because you explicitly begin a new transaction for each atomic subaction. The problem with it is that the current transaction API doesn't have an explicit begin phase. (Is there a common pattern for RDBMS? I've used some that do an implicit BEGIN WORK and some that require an explicit one.) Jeremy From iiourov@yahoo.com Tue Jul 23 08:13:11 2002 From: iiourov@yahoo.com (Ilia Iourovitski) Date: Tue, 23 Jul 2002 00:13:11 -0700 (PDT) Subject: [Persistence-sig] Nested Transactions In-Reply-To: <15676.51925.551981.560829@slothrop.zope.com> Message-ID: <20020723071311.39096.qmail@web20707.mail.yahoo.com> In an ODMG-style API, transactions should be started explicitly. In the RDBMS world, the user can explicitly enable or disable transaction control, typically through connection->setAutoCommit(false). Ilia From iiourov@yahoo.com Tue Jul 23 08:22:44 2002 From: iiourov@yahoo.com (Ilia Iourovitski) Date: Tue, 23 Jul 2002 00:22:44 -0700 (PDT) Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: <15676.26434.240121.243006@localhost.localdomain> Message-ID: <20020723072244.83368.qmail@web20704.mail.yahoo.com> For RDBMS-based storages, the API should provide the following methods: create(object) (the storage shall populate the id from the RDBMS, which is usually the primary key); delete(object); load(object type, object id)->object; query(string, parameters)->list of objects or smart collection. Those methods can be placed in Persistence/IPersistentDataManager.py Thanks Ilia --- Jeremy Hylton wrote: > >>>>> "PJE" == Phillip J Eby > writes: > > PJE> * Flag _p_changed *after* __setattr__, not > before! This will > PJE> help co-operative transaction participants > play nicely > PJE> together, since they can't "write through" a > change if they're > PJE> getting notified *before* the change takes > place! Docs should > PJE> also clarify that when set in other code, > _p_changed should be > PJE> set at the latest possible moment, *after* > the object is in its > PJE> new, stable state. > > Can you flesh out this request? The second sentence > there suggests > interesting issues, but doesn't spell them out. > > As for when _p_changed should be set: Why does it > matter? > > PJE> * Keep the _p_atime slot, but don't fill it > with anything by > PJE> default. > > I'd just as soon drop it completely. If a > particular application > wants to extend the base persistence interface, it > can. > > PJE> * Get rid of the term "register", since > objects won't > PJE> "register" with the transaction, and neither > should they with > PJE> their data manager. They should "inform > their data manager" > PJE> that they have changed. Something like an > objectChanged() > PJE> message is appropriate in place of > register().
I believe this > PJE> would clarify the API. > > I don't have a problem with register(). In what way > is the current > interface unclear? > > PJE> By the way, my rationale for not taking any > radical new > PJE> approaches to persistence, observation, or > notification in this > PJE> proposal is that the existing Persistence > package is > PJE> "transparent" enough, and has the benefit of > lots of field > PJE> experience. > > I'd like to see some comments from people who > haven't already used > ZODB. I worry that all the comments are coming from > a small number of > people who wrote or use ZODB's persistent mechanism, > and that we'll > make decisions will be limiting for other persistent > applications. > (But maybe there aren't any other such > applications/users.) > > Jeremy > > > > _______________________________________________ > Persistence-sig mailing list > Persistence-sig@python.org > http://mail.python.org/mailman-21/listinfo/persistence-sig __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com From pyth@devel.trillke.net Tue Jul 23 08:32:42 2002 From: pyth@devel.trillke.net (holger krekel) Date: Tue, 23 Jul 2002 09:32:42 +0200 Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: <15676.26434.240121.243006@localhost.localdomain>; from jeremy@zope.com on Mon, Jul 22, 2002 at 04:12:50PM -0400 References: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh> <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> <3.0.5.32.20020719115207.0086e100@telecommunity.com> <15676.26434.240121.243006@localhost.localdomain> Message-ID: <20020723093242.E10625@prim.han.de> Jeremy Hylton wrote: > I'd like to see some comments from people who haven't already used > ZODB. I worry that all the comments are coming from a small number of > people who wrote or use ZODB's persistent mechanism, and that we'll > make decisions will be limiting for other persistent applications. 
> (But maybe there aren't any other such > applications/users.) > > Jeremy From pyth@devel.trillke.net Tue Jul 23 08:32:42 2002 From: pyth@devel.trillke.net (holger krekel) Date: Tue, 23 Jul 2002 09:32:42 +0200 Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: <15676.26434.240121.243006@localhost.localdomain>; from jeremy@zope.com on Mon, Jul 22, 2002 at 04:12:50PM -0400 References: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh> <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> <3.0.5.32.20020719115207.0086e100@telecommunity.com> <15676.26434.240121.243006@localhost.localdomain> Message-ID: <20020723093242.E10625@prim.han.de> Jeremy Hylton wrote: > I'd like to see some comments from people who haven't already used > ZODB. I worry that all the comments are coming from a small number of > people who wrote or use ZODB's persistent mechanism, and that we'll > make decisions will be limiting for other persistent applications. > (But maybe there aren't any other such applications/users.) I am following the threads but haven't found time to contribute, although i really, really want to. Next week should be much better. Actually i have been quite involved with developing a CORBA Object transaction service in C++ for the realtime TAO-Object broker. One spin-off currently lives at xots.sourceforge.net but personally i am mainly concentrating on integrating TAO with python first :-) Interestingly, several people have asked me for transactions *without* persistence. They just wanted a lightweight in-memory protocol for handling atomicity and consistency and didn't give a damn about durability and transaction monitors. Overall, i'd like to have the basic APIs (as much) orthogonal to each other (as possible). Btw, i surely qualify as not knowing ZODB very much :-) Please have some patience while i am trying to put my thoughts into order next week. My starting point will probably be Phillip's API. regards, holger From jim@zope.com Tue Jul 23 15:15:03 2002 From: jim@zope.com (Jim Fulton) Date: Tue, 23 Jul 2002 10:15:03 -0400 Subject: [Persistence-sig] "Straw Baby" Persistence API References: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh> <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> <3.0.5.32.20020719115207.0086e100@telecommunity.com> <5.1.0.14.0.20020722132838.05986020@mail.telecommunity.com> <5.1.0.14.0.20020722200752.05208970@mail.telecommunity.com> Message-ID: <3D3D64E7.2010508@zope.com> Phillip J. Eby wrote: > At 03:47 PM 7/22/02 -0400, Jim Fulton wrote: > >> Phillip J. Eby wrote: >> ... >> I think that there should, at least, be a standard cache interface. >> It should be possible to develop data managers and caches independently. >> Maybe we could include one or two standard implementations. These could >> provide useful examples for other implementations and, of course, be >> useful in themselves. > > > Sure.
I personally don't think there's much that you can standardize on > in a caching API besides which mapping methods one is required to > support, without getting into policy and use cases. I expect that you can hide behind the interface. ... >>> I think maybe I was unclear. I certainly don't think that the >>> interfaces should cease to exist, or that they should not exist as >>> documentation. I'm referring to their inclusion as operating code, >>> only. >> >> >> So you don't want them to get imported? > > > It's not that I care one way or the other. Honestly, I'd rather see > Interface end up in the standard library too - at least once the > metaclass bug is fixed. :) What metaclass bug? > But my overriding priority here is a > standard for Persistence and Transaction bases for eventual inclusion in > the standard library. I'd like to keep the interfaces but make them resilient to the absence of the interface package. I'll deal with those details. .... >> True. But maybe someone will think of a way to solve this without proxies >> or alchemy? > > > Unless you're going to fundamentally alter the Python object model, it's > not doable. Python objects by definition get their behavior from their > type. To change the behavior, you must either change the type, the type > pointer in the object, or replace the object with another one. There's been a proposal for adding an observer framework to Python. A suitable general observer framework just might allow the problem to be solved. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From pje@telecommunity.com Tue Jul 23 15:29:34 2002 From: pje@telecommunity.com (Phillip J.
Eby) Date: Tue, 23 Jul 2002 10:29:34 -0400 Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: <3D3D64E7.2010508@zope.com> References: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh> <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> <3.0.5.32.20020719115207.0086e100@telecommunity.com> <5.1.0.14.0.20020722132838.05986020@mail.telecommunity.com> <5.1.0.14.0.20020722200752.05208970@mail.telecommunity.com> Message-ID: <5.1.0.14.0.20020723102440.0502cec0@mail.telecommunity.com> At 10:15 AM 7/23/02 -0400, Jim Fulton wrote: >Phillip J. Eby wrote: >>At 03:47 PM 7/22/02 -0400, Jim Fulton wrote: >>>I think that there should, at least, be a standard cache interface. >>>It should be possible to develop data managers and caches independently. >>>Maybe we could include one or two standard implementations. These could >>>provide useful examples for other implementations and, of course, be >>>useful in themselves. >> >>Sure. I personally don't think there's much that you can standardize on >>in a caching API besides which mapping methods one is required to >>support, without getting into policy and use cases. > >I expect that you can hide behind the interface. Huh? >>It's not that I care one way or the other. Honestly, I'd rather see >>Interface end up in the standard library too - at least once the >>metaclass bug is fixed. :) > >Whqat metaclass bug? You know, the one in Interfaces.Implements, where it doesn't treat metaclass instances as classes. The one you said you were okay with fixing, that I provided a patch for, and which Steve Alexander was checking to verify that it didn't break anything else in Zope 3... The one that's completely off-topic for this list. :) >>>True. But maybe someone will think of a way to solve this without proxies >>>or alchemy? >> >>Unless you're going to fundamentally alter the Python object model, it's >>not doable. Python objects by definition get their behavior from their >>type. 
To change the behavior, you must either change the type, the type >>pointer in the object, or replace the object with another one. > >There's been a proposal for adding an observer framework to Python. >A suitable general observer framework just might allow the problem to be >solved. Yes, and any such observer framework is going to have to work via changing the type, type pointer, or replacing the object, unless it's going to be by altering the fundamental Python object model. :) From jim@zope.com Tue Jul 23 15:32:45 2002 From: jim@zope.com (Jim Fulton) Date: Tue, 23 Jul 2002 10:32:45 -0400 Subject: [Persistence-sig] "Straw Baby" Persistence API References: <20020723072244.83368.qmail@web20704.mail.yahoo.com> Message-ID: <3D3D690D.8040905@zope.com> Ilia Iourovitski wrote: > For RDBMS based storages api should > provides the following method: I'll first note that, if these methods are needed at all, they should be methods on a specific data manager. They do not affect the transaction or the persistence frameworks. > create(object) storage shall populated id from rdbms > which is usually primary key. This should not be necessary. One should be able to design a data manager that detected new objects and assigned them ids when referencing objects are created. > delete(object) I can imagine a data manager that lacked garbage collection could need this. > load(object type, object id)->object An object type should be unnecessary. If a data manager needs to track this sort of information, it should embed it in the object id. Note also, that persistence applications load most objects automatically through object traversal. It is often necessary to explicitly load one or more root objects to provide a starting place for traversal.
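The suggestion to embed type information in the object id, so that load() needs no separate type argument, might look roughly like this. It is only a sketch: TypedIdDataManager, register_class, store, and load are hypothetical names, the oid layout (type name, key) is an assumption, and a dict stands in for the real storage.

```python
# Hypothetical sketch: TypedIdDataManager, register_class, store, and load
# are illustrative names, not part of any proposed interface.
class TypedIdDataManager:
    """Data manager whose object ids carry the type: (type_name, key)."""

    def __init__(self):
        self._classes = {}   # type name -> class
        self._records = {}   # oid -> saved attribute dict

    def register_class(self, cls):
        self._classes[cls.__name__] = cls

    def store(self, key, obj):
        oid = (type(obj).__name__, key)
        self._records[oid] = dict(vars(obj))
        return oid

    def load(self, oid):
        # The caller passes only the oid; the type is recovered from it.
        type_name, _key = oid
        cls = self._classes[type_name]
        obj = cls.__new__(cls)
        obj.__dict__.update(self._records[oid])
        return obj


class Customer:
    def __init__(self, name):
        self.name = name


dm = TypedIdDataManager()
dm.register_class(Customer)
root_oid = dm.store(1, Customer("Ann"))
root = dm.load(root_oid)   # explicit load of a root object
```

Here the explicitly loaded root would be the starting point from which other objects are reached by traversal.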
> query(string, parameters)->list of objects or smart > collection > > Those methods can be placed in > Persistence/IPersistentDataManager.py No, these methods are specific to particular data manager APIs, although I can imagine a number of data managers sharing an API like the one above. Note that IPersistentDataManager.py is an interface for use by persistent objects. It does not include all data-manager methods. Similarly, Transaction.IDataManager.IDataManager is the data-manager API used by the transaction framework. Data managers will implement Persistence.IPersistentDataManager.IPersistentDataManager and Transaction.IDataManager.IDataManager as well as application APIs like the one you propose above. Perhaps there should be some common data-manager application API somewhat like the one you propose above. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From jacobs@penguin.theopalgroup.com Tue Jul 23 15:38:05 2002 From: jacobs@penguin.theopalgroup.com (Kevin Jacobs) Date: Tue, 23 Jul 2002 10:38:05 -0400 (EDT) Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: <5.1.0.14.0.20020723102440.0502cec0@mail.telecommunity.com> Message-ID: On Tue, 23 Jul 2002, Phillip J. Eby wrote: > >There's been a proposal for adding an observer framework to Python. > >A suitable general observer framework just might allow the problem to be > >solved. > > Yes, and any such observer framework is going to have to work via changing > the type, type pointer, or replacing the object, unless it's going to be by > altering the fundamental Python object model. :) I think that this can be done with a light-weight C proxy methods -- hopefully glued in by an intelligent meta-class. That way, we do not lose much performance, and don't have to butcher Python's object model too much. 
My goal is to migrate towards a persistence system that uses metaclasses and proxies to do most of the heavy lifting involved in transforming general user-specified objects into persistent objects. -Kevin -- Kevin Jacobs The OPAL Group - Enterprise Systems Architect Voice: (216) 986-0710 x 19 E-mail: jacobs@theopalgroup.com Fax: (216) 986-0714 WWW: http://www.theopalgroup.com From jim@zope.com Tue Jul 23 15:44:34 2002 From: jim@zope.com (Jim Fulton) Date: Tue, 23 Jul 2002 10:44:34 -0400 Subject: [Persistence-sig] "Straw Baby" Persistence API References: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh> <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> <3.0.5.32.20020719115207.0086e100@telecommunity.com> <5.1.0.14.0.20020722132838.05986020@mail.telecommunity.com> <5.1.0.14.0.20020722200752.05208970@mail.telecommunity.com> <5.1.0.14.0.20020723102440.0502cec0@mail.telecommunity.com> Message-ID: <3D3D6BD2.9070807@zope.com> Phillip J. Eby wrote: > At 10:15 AM 7/23/02 -0400, Jim Fulton wrote: > >> Phillip J. Eby wrote: >> >>> At 03:47 PM 7/22/02 -0400, Jim Fulton wrote: >>> >>>> I think that there should, at least, be a standard cache interface. >>>> It should be possible to develop data managers and caches >>>> independently. >>>> Maybe we could include one or two standard implementations. These could >>>> provide useful examples for other implementations and, of course, be >>>> useful in themselves. >>> >>> >>> Sure. I personally don't think there's much that you can standardize >>> on in a caching API besides which mapping methods one is required to >>> support, without getting into policy and use cases. >> >> >> I expect that you can hide behind the interface. > > > Huh? Hee hee. Sorry, I expect that you can hide the policies behind the interface. ... >>>> True. But maybe someone will think of a way to solve this without >>>> proxies >>>> or alchemy? 
>>> >>> >>> Unless you're going to fundamentally alter the Python object model, >>> it's not doable. Python objects by definition get their behavior >>> from their type. To change the behavior, you must either change the >>> type, the type pointer in the object, or replace the object with >>> another one. >> >> >> There's been a proposal for adding an observer framework to Python. >> A suitable general observer framework just might allow the problem to be >> solved. > > > Yes, and any such observer framework is going to have to work via > changing the type, type pointer, or replacing the object, unless it's > going to be by altering the fundamental Python object model. :) I don't think it would have to do much with the type at all. If objects generate the right events, I think that properly designed observers can do the necessary work. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From jim@zope.com Tue Jul 23 15:52:57 2002 From: jim@zope.com (Jim Fulton) Date: Tue, 23 Jul 2002 10:52:57 -0400 Subject: [Persistence-sig] "Straw Baby" Persistence API References: Message-ID: <3D3D6DC9.1060000@zope.com> Kevin Jacobs wrote: > On Tue, 23 Jul 2002, Phillip J. Eby wrote: > >>>There's been a proposal for adding an observer framework to Python. >>>A suitable general observer framework just might allow the problem to be >>>solved. >>> >>Yes, and any such observer framework is going to have to work via changing >>the type, type pointer, or replacing the object, unless it's going to be by >>altering the fundamental Python object model. :) >> > > I think that this can be done with a light-weight C proxy methods -- > hopefully glued in by an intelligent meta-class. That way, we do not lose > much performance, and don't have to butcher Python's object model too much.
> > My goal is to migrate towards a persistence system that uses metaclasses and > proxies to do most of the heavy lifting involved in transforming > general user-specified objects into persistent objects. Proxies can be a useful tool. We certainly use them a lot, although I sometimes feel dirty afterwards. ;) There are a *lot* of gotchas. I would definitely *not* recommend using them for persistence. I would find a persistent mix-in to be far less intrusive than proxies. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From jacobs@penguin.theopalgroup.com Tue Jul 23 16:08:36 2002 From: jacobs@penguin.theopalgroup.com (Kevin Jacobs) Date: Tue, 23 Jul 2002 11:08:36 -0400 (EDT) Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: <3D3D6DC9.1060000@zope.com> Message-ID: On Tue, 23 Jul 2002, Jim Fulton wrote: > Proxies can be a useful tool. We certainly use them a lot, although > I sometimes feel dirty afterwards. ;) There are a *lot* of gotchas. > I would definitely *not* recommend using them for persistence. I > would find a persistent mix-in to be far less intrusive than proxies. Believe it or not, but we're on the same wavelength: I'm thinking about proxy-methods a la aspect oriented programming, more than whole proxy objects. e.g. cooperative __{g,s}et{attr,item}__ methods that implement observer semantics and can forward to base-class methods. Whole object proxies have the problem that object identity and type information is obscured in ways that are contrary to standard Python idioms. -Kevin -- Kevin Jacobs The OPAL Group - Enterprise Systems Architect Voice: (216) 986-0710 x 19 E-mail: jacobs@theopalgroup.com Fax: (216) 986-0714 WWW: http://www.theopalgroup.com From pje@telecommunity.com Tue Jul 23 16:31:50 2002 From: pje@telecommunity.com (Phillip J.
Eby) Date: Tue, 23 Jul 2002 11:31:50 -0400 Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: References: <3D3D6DC9.1060000@zope.com> Message-ID: <5.1.0.14.0.20020723112620.04fd6b50@mail.telecommunity.com> At 11:08 AM 7/23/02 -0400, Kevin Jacobs wrote: >On Tue, 23 Jul 2002, Jim Fulton wrote: > > Proxies can be a useful tool. We certainly use them a lot, although > > I sometimes feel dirty afterwards. ;) There are a *lot* of gotchas. > > I would definately *not* recommend using them for persistence. I > > would find a persistent mix-in to be far less intrusive than proxies. > >Believe it or not, but we're on the same wavelength: > >I'm thinking about proxy-methods a la aspect oriented programming, more than >whole proxy objects. e.g. cooperative __{g,s}et{attr,item}__ methods that >implement observer semantics and can forward to base-class methods. Whole >object proxies have the problem that object identity and type information is >obscured in ways that are contrary to standard Python idioms. So, you're saying you want to alter the types, then? The interesting part of that is how to alter them in such a way that your observing code doesn't get re-entered when you're modifying both subclasses and base classes of the objects. You'd need some kind of thread-specific collaboration stack, I think. From jacobs@penguin.theopalgroup.com Tue Jul 23 16:37:27 2002 From: jacobs@penguin.theopalgroup.com (Kevin Jacobs) Date: Tue, 23 Jul 2002 11:37:27 -0400 (EDT) Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: <5.1.0.14.0.20020723112620.04fd6b50@mail.telecommunity.com> Message-ID: On Tue, 23 Jul 2002, Phillip J. Eby wrote: > At 11:08 AM 7/23/02 -0400, Kevin Jacobs wrote: > >On Tue, 23 Jul 2002, Jim Fulton wrote: > > > Proxies can be a useful tool. We certainly use them a lot, although > > > I sometimes feel dirty afterwards. ;) There are a *lot* of gotchas. > > > I would definately *not* recommend using them for persistence. 
I > > > would find a persistent mix-in to be far less intrusive than proxies. > > > >Believe it or not, but we're on the same wavelength: > > > >I'm thinking about proxy-methods a la aspect oriented programming, more than > >whole proxy objects. e.g. cooperative __{g,s}et{attr,item}__ methods that > >implement observer semantics and can forward to base-class methods. Whole > >object proxies have the problem that object identity and type information is > >obscured in ways that are contrary to standard Python idioms. > > So, you're saying you want to alter the types, then? The interesting part > of that is how to alter them in such a way that your observing code doesn't > get re-entered when you're modifying both subclasses and base classes of > the objects. You'd need some kind of thread-specific collaboration stack, > I think. I suppose, though saying 'alter the types' implies slightly different things to me. I don't see great difficulty in isolating subclass and superclass modifications, although performance is clearly an important issue. As for the thread-specific business, you've totally lost me. Can you provide a use-case so that I can better understand where you are coming from? Thanks, -Kevin -- Kevin Jacobs The OPAL Group - Enterprise Systems Architect Voice: (216) 986-0710 x 19 E-mail: jacobs@theopalgroup.com Fax: (216) 986-0714 WWW: http://www.theopalgroup.com From pje@telecommunity.com Tue Jul 23 17:00:11 2002 From: pje@telecommunity.com (Phillip J. Eby) Date: Tue, 23 Jul 2002 12:00:11 -0400 Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: References: <5.1.0.14.0.20020723112620.04fd6b50@mail.telecommunity.com> Message-ID: <5.1.0.14.0.20020723114104.0541e4f0@mail.telecommunity.com> At 11:37 AM 7/23/02 -0400, Kevin Jacobs wrote: >On Tue, 23 Jul 2002, Phillip J. Eby wrote: > > > So, you're saying you want to alter the types, then? 
The interesting part > > of that is how to alter them in such a way that your observing code > doesn't > > get re-entered when you're modifying both subclasses and base classes of > > the objects. You'd need some kind of thread-specific collaboration stack, > > I think. > >I suppose, though saying 'alter the types' implies slightly different things I mean, if you're proxying methods, presumably you're doing so by altering the methods provided by the type, unless you mean to change the type's type so that the methods are altered on the fly. Either way, a change to the type instance. :) >to me. I don't see great difficulty in isolating subclass and superclass >modifications, although performance is clearly an important issue. As for >the thread-specific business, you've totally lost me. Can you provide a >use-case so that I can better understand where you are coming from? Consider a co-operative method that performs a super() call. If one surrounds both the super and subclass with observer calls, they will take place more than once. Perhaps that's what you mean by performance; I suppose if you are strictly observing things, it may not be a big deal to have the methods called more than once. My comment about thread-specificness was about a way to ensure that the wrapper method only gets called once. It's not relevant if you don't plan to ensure that wrappers on co-operative methods are called only once. I should note, however, that there is one possibly rather important use case for not calling a wrapper more than once: object changes. Let's say that class B is a subclass of class A. B had an invariant that attribute "q" is always 3 times attribute "r", and has a setR() method that sets "r". It uses a super() call to class A to do the actual setting of R, and then sets the "q" attribute. 
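A minimal sketch of the class A / class B setup just described, with a post-return observer wrapped around setR() in both classes, to show what each observer call would see. The names observed and events are hypothetical, and the syntax is modern Python rather than what was available in 2002.

```python
# Hypothetical sketch: `observed` and `events` are illustrative names only.
import functools

events = []  # (method name, r, q) snapshots taken by the observer


def observed(method):
    """Wrap a method so that an observer records state after it returns."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        result = method(self, *args, **kwargs)
        events.append((method.__qualname__, self.r, self.q))
        return result
    return wrapper


class A:
    def __init__(self):
        self.r = 0
        self.q = 0

    @observed
    def setR(self, value):
        self.r = value


class B(A):
    # Invariant for B: q == 3 * r
    @observed
    def setR(self, value):
        super().setR(value)   # A's observer fires here, before q is updated
        self.q = 3 * value


b = B()
b.setR(2)
```

In this sketch the observer runs twice for a single call: once when A.setR() returns, with r already changed but q still stale, and once more when B.setR() returns.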
Now, if there is a post-return observer associated with the setR() method in both A and B, it will be called at a point where it will announce a state that is valid for objects of type A, but violates an invariant for the specific instance being announced. (Also, even if you didn't care about publishing an invalid state, it should be noted that use cases like Tim Peters' example of a distributed cache would really multiply the performance issue, especially if you're talking about a deep hierarchy of super() calls.) Anyway, if we're strictly talking about observers, the simplest way to address this might be to carry a per-instance "nesting count" that you increment on entry to every proxy and decrement on exit. When the count reaches zero on exit, fire any pending observation events. Downside to this approach: if multiple threads enter overlapping calls on the object, a sort of "livelock" can occur where the object never issues any events. To address that, you would have to have at least per-thread counters for each instance, which adds some more performance overhead for access. This is where my comment about thread-specific collaboration stacks came from. From iiourov@yahoo.com Tue Jul 23 18:08:36 2002 From: iiourov@yahoo.com (Ilia Iourovitski) Date: Tue, 23 Jul 2002 10:08:36 -0700 (PDT) Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: <3D3D690D.8040905@zope.com> Message-ID: <20020723170836.41755.qmail@web20702.mail.yahoo.com> --- Jim Fulton wrote: > Ilia Iourovitski wrote: > > For RDBMS based storages api should > > provides the following method: > > I'll first note that, if these methods are needed > at all, they should be methods on a specific data > manager. > They do not affect the transaction or the > persistence > frameworks. > > > > create(object) storage shall populated id from > rdbms > > which is usually primary key. > > This should not be necessary. 
One should be able to > design a data manager that detected new objects and > assigned them ids when referencing objects are > created. Typical storage (rdbms, odbms, xml like xindicea) do not provide root object. So after transaction started object must be loaded from storage or created. > > > delete(object) > > I can imagine a datamanager that lacked garbage > collection could > need this. > in case of rdbms there are objects which are not referenced. > > load(object type, object id)->object > > An object type should be unnecessary. If a data > manager > needs to track this sort of information, it should > embed it in the object id. In rdbms case id usually integer. adding the whole package/class name can be expensive. > > Note also, that persistence applications load most > objects > automatically through object traverssal. It is often > necessary to explicitly load one or more root > objects to > provide a starting place for traversl. > > > > query(string, parameters)->list of objects or > smart > > collection > > > > Those methods can be placed in > > Persistence/IPersistentDataManager.py > > No, these methods are specific to particular data > manager > APIs, although I can imagine a number of data > managers sharing an > API like the one above. Note that > IPersistentDataManager.py is > an interface for use by persistent objects. It does > not include > all data-manager methods. Similarly, > Transaction.IDataManager.IDataManager is the > data-manager API > used by the transaction framework. > And most storages like rdbms, ldap, xml has it. > Data managers will implement > Persistence.IPersistentDataManager.IPersistentDataManager > and > Transaction.IDataManager.IDataManager as well as > application APIs > like the one you propose above. Perhaps there should > be some > common data-manager application API somewhat like > the one you propose > above. > > Jim > > > -- > Jim Fulton mailto:jim@zope.com > Python Powered! 
> CTO (888) 344-4332 > http://www.python.org > Zope Corporation http://www.zope.com > http://www.zope.org > __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com From pje@telecommunity.com Tue Jul 23 19:05:35 2002 From: pje@telecommunity.com (Phillip J. Eby) Date: Tue, 23 Jul 2002 14:05:35 -0400 Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: <20020723170836.41755.qmail@web20702.mail.yahoo.com> References: <3D3D690D.8040905@zope.com> Message-ID: <5.1.0.14.0.20020723140040.0519cc90@mail.telecommunity.com> At 10:08 AM 7/23/02 -0700, Ilia Iourovitski wrote: > > > > load(object type, object id)->object > > > > An object type should be unnecessary. If a data > > manager > > needs to track this sort of information, it should > > embed it in the object id. > >In rdbms case id usually integer. adding the whole >package/class name can be expensive. This is easily addressed by using separate data managers for each table or other base class type. No need to carry the type in the object ID. > > > query(string, parameters)->list of objects or > > smart > > > collection > > > > > > Those methods can be placed in > > > Persistence/IPersistentDataManager.py > > > > No, these methods are specific to particular data > > manager > > APIs, although I can imagine a number of data > > managers sharing an > > API like the one above. Note that > > IPersistentDataManager.py is > > an interface for use by persistent objects. It does > > not include > > all data-manager methods. Similarly, > > Transaction.IDataManager.IDataManager is the > > data-manager API > > used by the transaction framework. > > >And most storages like rdbms, ldap, xml has it. The most straightforward way to handle queries is by creating query data managers, which take OIDs that represent the parameters of the query. 
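To make the query-data-manager idea concrete, here is a minimal sketch. It is not code from ZODB or PEAK: the class name `QueryManager`, the in-memory `ROWS` list, and the use of a plain tuple of parameter values as the "OID" are all assumptions made for illustration only.

```python
# Hypothetical illustration: a "query data manager" whose keys (OIDs)
# are tuples of query parameters. Nothing here is a real ZODB/PEAK API.

ROWS = [
    {"id": 1, "city": "Cleveland", "dept": "sales"},
    {"id": 2, "city": "Cleveland", "dept": "ops"},
    {"id": 3, "city": "Reston", "dept": "sales"},
]


class QueryManager:
    """Maps a tuple of parameter values to the list of matching records."""

    def __init__(self, rows, fields):
        self._rows = rows
        self._fields = tuple(fields)  # which attributes the OID encodes
        self._cache = {}              # OID -> result, like an object cache

    def __getitem__(self, oid):
        if len(oid) != len(self._fields):
            raise KeyError(oid)
        if oid not in self._cache:
            wanted = dict(zip(self._fields, oid))
            self._cache[oid] = [
                row for row in self._rows
                if all(row[k] == v for k, v in wanted.items())
            ]
        return self._cache[oid]


by_city_and_dept = QueryManager(ROWS, ["city", "dept"])
matches = by_city_and_dept[("Cleveland", "sales")]
```

Looking up the same parameter tuple twice returns the same cached result object, which is roughly how a data manager would treat any other OID.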
Note, by the way, that IPersistentDataManager is an interface exposed to persistent objects by their data manager. It is *not* the interface a data manager exposes to application code, which can and should be quite different. > > Data managers will implement > > >Persistence.IPersistentDataManager.IPersistentDataManager > > and > > Transaction.IDataManager.IDataManager as well as > > application APIs > > like the one you propose above. Perhaps there should > > be some > > common data-manager application API somewhat like > > the one you propose > > above. I agree with Jim that none of this stuff is needed in the interface that a data manager exposes to persistent objects. This stuff would be in a data manager's application-level interface, and I don't see any need for standardization there; that's an area for value-add by competing persistence mechanisms and data manager implementations. Any standardization of them now would be counter-productive, I think. From iiourov@yahoo.com Tue Jul 23 19:35:48 2002 From: iiourov@yahoo.com (Ilia Iourovitski) Date: Tue, 23 Jul 2002 11:35:48 -0700 (PDT) Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: <5.1.0.14.0.20020723140040.0519cc90@mail.telecommunity.com> Message-ID: <20020723183548.42816.qmail@web20705.mail.yahoo.com> --- "Phillip J. Eby" wrote: > At 10:08 AM 7/23/02 -0700, Ilia Iourovitski wrote: > > > > > > load(object type, object id)->object > > > > > > An object type should be unnecessary. If a data > > > manager > > > needs to track this sort of information, it > should > > > embed it in the object id. > > > >In rdbms case id usually integer. adding the whole > >package/class name can be expensive. > > This is easily addressed by using separate data > managers for each table or > other base class type. No need to carry the type in > the object ID. > You mean one data manager per table. Too much. 
> > > > > query(string, parameters)->list of objects or > > > smart > > > > collection > > > > > > > > Those methods can be placed in > > > > Persistence/IPersistentDataManager.py > > > > > > No, these methods are specific to particular > data > > > manager > > > APIs, although I can imagine a number of data > > > managers sharing an > > > API like the one above. Note that > > > IPersistentDataManager.py is > > > an interface for use by persistent objects. It > does > > > not include > > > all data-manager methods. Similarly, > > > Transaction.IDataManager.IDataManager is the > > > data-manager API > > > used by the transaction framework. > > > > >And most storages like rdbms, ldap, xml has it. > > The most straightforward way to handle queries is by > creating query data > managers, which take OIDs that represent the > parameters of the query. > Most of the time people retrive object by attributes. not by OID. > Note, by the way, that IPersistentDataManager is an > interface exposed to > persistent objects by their data manager. It is > *not* the interface a data > manager exposes to application code, which can and > should be quite different. > > > > > Data managers will implement > > > > >Persistence.IPersistentDataManager.IPersistentDataManager > > > and > > > Transaction.IDataManager.IDataManager as well as > > > application APIs > > > like the one you propose above. Perhaps there > should > > > be some > > > common data-manager application API somewhat > like > > > the one you propose > > > above. > > I agree with Jim that none of this stuff is needed > in the interface that a > data manager exposes to persistent objects. This > stuff would be in a data > manager's application-level interface, and I don't > see any need for > standardization there; that's an area for value-add > by competing > persistence mechanisms and data manager > implementations. Any > standardization of them now would be > counter-productive, I think. > It's already exist. 
Look at JDO. In Java land, O/R mappers are popular because instead of learning a different api/query language every time, you can use odmg/oql against rdbms, odbms, ldap, xml, you name it. It is a major selling point for a "generic persistence" toolkit.

From pje@telecommunity.com Tue Jul 23 20:19:32 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Tue, 23 Jul 2002 15:19:32 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <20020723183548.42816.qmail@web20705.mail.yahoo.com>
References: <5.1.0.14.0.20020723140040.0519cc90@mail.telecommunity.com>
Message-ID: <5.1.0.14.0.20020723150312.04f70390@mail.telecommunity.com>

At 11:35 AM 7/23/02 -0700, Ilia Iourovitski wrote:
>--- "Phillip J. Eby" wrote:
> > At 10:08 AM 7/23/02 -0700, Ilia Iourovitski wrote:
> > > > > load(object type, object id)->object
> > > >
> > > > An object type should be unnecessary. If a data manager
> > > > needs to track this sort of information, it should
> > > > embed it in the object id.
> > >
> > >In rdbms case id usually integer. adding the whole
> > >package/class name can be expensive.
> >
> > This is easily addressed by using separate data managers for each
> > table or other base class type. No need to carry the type in
> > the object ID.
>
>You mean one data manager per table. Too much.

Why?  I could simply do something like this:

    class MyDBManager:

        def __init__(self, sqlconn):
            self.table1 = TableDBManager("table1", sqlconn, ...)
            self.table2 = TableDBManager("table2", sqlconn, ...)
            ...

    myDB = MyDBManager(someSQLconnection)

And then refer to myDB.table1['someKey'] to load an object.  This doesn't
seem like "too much", especially since you could generate the individual
DM's based on metadata.
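A rough, self-contained sketch of the "generate the individual DM's based on metadata" remark follows. `TableDBManager` is only named in the message above, so its constructor and `__getitem__` behavior here are guesses, and the dict-of-dicts "connection" is a stand-in for a real SQL connection.

```python
# Hypothetical sketch: one small data manager per table, generated from
# metadata. TableDBManager's interface and the dict-backed "connection"
# are assumptions, not a real persistence API.

class TableDBManager:
    def __init__(self, table, conn):
        self.table = table
        self.conn = conn      # here: dict of table name -> {key: row}
        self._cache = {}      # key -> loaded object

    def __getitem__(self, key):
        # Object IDs are plain keys; the table name supplies the "type".
        if key not in self._cache:
            self._cache[key] = dict(self.conn[self.table][key])
        return self._cache[key]


class MyDBManager:
    def __init__(self, conn, tables):
        # Generate one data manager per table named in the metadata.
        for name in tables:
            setattr(self, name, TableDBManager(name, conn))


fake_conn = {
    "table1": {"someKey": {"name": "spam"}},
    "table2": {42: {"name": "eggs"}},
}
myDB = MyDBManager(fake_conn, ["table1", "table2"])
obj = myDB.table1["someKey"]
```

Because each table has its own manager, an integer primary key can serve as the whole object ID, with no package or class name attached to it.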
At any rate, this is pretty much the approach I intend to use myself,
except that using PEAK eliminates the need for the __init__ method.

> > > > > query(string, parameters)->list of objects or smart
> > > > > collection
> > > > >
> > > > > Those methods can be placed in
> > > > > Persistence/IPersistentDataManager.py
> > > >
> > > > No, these methods are specific to particular data manager
> > > > APIs, although I can imagine a number of data managers sharing an
> > > > API like the one above. Note that IPersistentDataManager.py is
> > > > an interface for use by persistent objects. It does not include
> > > > all data-manager methods. Similarly,
> > > > Transaction.IDataManager.IDataManager is the data-manager API
> > > > used by the transaction framework.
> > >
> > >And most storages like rdbms, ldap, xml has it.
> >
> > The most straightforward way to handle queries is by creating query
> > data managers, which take OIDs that represent the parameters of the
> > query.
>
>Most of the time people retrieve object by attributes.
>not by OID.

Right.  So define a query manager that takes the attributes as fields in an
OID, and returns a persistent object that represents a sequence of
records.  e.g.

    for object in someQueryMgr[ ('param1value','param2value') ]:
        ...

All you need is a separate query manager for each (parameterized) query
your app needs -- and again, there's nothing stopping you from generating
your own via metadata or even from OQL if that's your heart's desire.

> > I agree with Jim that none of this stuff is needed in the interface
> > that a data manager exposes to persistent objects. This stuff would
> > be in a data manager's application-level interface, and I don't see
> > any need for standardization there; that's an area for value-add by
> > competing persistence mechanisms and data manager implementations.
Any > > standardization of them now would be > > counter-productive, I think. > > >It's already exist. Look at the JDO. In java land OR >Mappers popular because instead of learning every time >different api/query language you can use odmg/oql >against rdmbs, odbms, ldap, xml, you name it. >It is major selling point for "generic persistance" >toolkit. I'd have to disagree with you there. There is very little commonality between Java data mappers; many offer some sort of OQL dialect, but they vary in so many other aspects of their implementations and usage that calling them standardized would be a joke. Please note that specific persistence mechanisms and query languages -- especially any kind of "generic persistence toolkit" -- are completely out of scope for this SIG's goals. We want to standardize the *basis* for you to create your *own* persistence mechanisms, query languages, and so on. The SIG will not be creating any code that actually talks to any kind of database, nor supplies any kind of data management API. To the best of my understanding, the SIG's charter is focused on these interfaces: * The interface which objects to be persisted must supply to their data manager * The interface which data managers must supply to their persistent objects * The interface which transaction participants must supply to a transaction * The interface which transaction objects supply to their participants * The interface which transaction objects supply to an application The items you are talking about are not a part of any of these interfaces. From iiourov@yahoo.com Tue Jul 23 21:16:32 2002 From: iiourov@yahoo.com (Ilia Iourovitski) Date: Tue, 23 Jul 2002 13:16:32 -0700 (PDT) Subject: [Persistence-sig] "Straw Baby" Persistence API In-Reply-To: <5.1.0.14.0.20020723150312.04f70390@mail.telecommunity.com> Message-ID: <20020723201632.89700.qmail@web20709.mail.yahoo.com> --- "Phillip J. Eby" wrote: > At 11:35 AM 7/23/02 -0700, Ilia Iourovitski wrote: > > >--- "Phillip J. 
Eby" wrote: > > > At 10:08 AM 7/23/02 -0700, Ilia Iourovitski > wrote: > > > > > > > > > > load(object type, object id)->object > > > > > > > > > > An object type should be unnecessary. If a > data > > > > > manager > > > > > needs to track this sort of information, it > > > should > > > > > embed it in the object id. > > > > > > > >In rdbms case id usually integer. adding the > whole > > > >package/class name can be expensive. > > > > > > This is easily addressed by using separate data > > > managers for each table or > > > other base class type. No need to carry the > type in > > > the object ID. > > > > >You mean one data manager per table. Too much. > > Why? I could simply do something like this: > > class MyDBManager: > > def __init__(self, sqlconn): > self.table1 = TableDBManager("table1", > sqlconn, ...) > self.table2 = TableDBManager("table2", > sqlconn, ...) > ... > > myDB = MyDBManager(someSQLconnection) > > And then refer to myDB.table1['someKey'] to load an > object. This doesn't > seem like "too much", especially since you could > generate the individual > DM's based on metadata. > > At any rate, this is pretty much the approach I > intend to use myself, > except that using PEAK eliminates the need for the > __init__ method. > > > > > > > > > query(string, parameters)->list of objects > or > > > > > smart > > > > > > collection > > > > > > > > > > > > Those methods can be placed in > > > > > > Persistence/IPersistentDataManager.py > > > > > > > > > > No, these methods are specific to particular > > > data > > > > > manager > > > > > APIs, although I can imagine a number of > data > > > > > managers sharing an > > > > > API like the one above. Note that > > > > > IPersistentDataManager.py is > > > > > an interface for use by persistent objects. > It > > > does > > > > > not include > > > > > all data-manager methods. Similarly, > > > > > Transaction.IDataManager.IDataManager is the > > > > > data-manager API > > > > > used by the transaction framework. 
> > > > > > > > >And most storages like rdbms, ldap, xml has it. > > > > > > The most straightforward way to handle queries > is by > > > creating query data > > > managers, which take OIDs that represent the > > > parameters of the query. > > > > >Most of the time people retrive object by > attributes. > >not by OID. > > Right. So define a query manager that takes the > attributes as fields in an > OID, and returns a persistent object that represents > a sequence of > records. e.g. > > for object in someQueryMgr[ > ('param1value','param2value') ]: > ... > > All you need is a separate query manager for each > (parameterized) query > your app needs -- and again, there's nothting > stopping you from generating > your own via metadata or even from OQL if that's > your heart's desire. > > > > > I agree with Jim that none of this stuff is > needed > > > in the interface that a > > > data manager exposes to persistent objects. > This > > > stuff would be in a data > > > manager's application-level interface, and I > don't > > > see any need for > > > standardization there; that's an area for > value-add > > > by competing > > > persistence mechanisms and data manager > > > implementations. Any > > > standardization of them now would be > > > counter-productive, I think. > > > > >It's already exist. Look at the JDO. In java land > OR > >Mappers popular because instead of learning every > time > >different api/query language you can use odmg/oql > >against rdmbs, odbms, ldap, xml, you name it. > >It is major selling point for "generic persistance" > >toolkit. > > I'd have to disagree with you there. There is very > little commonality > between Java data mappers; many offer some sort of > OQL dialect, but they > vary in so many other aspects of their > implementations and usage that > calling them standardized would be a joke. > Both Castor and Object Bridge support odmg and oql. Object Bridge is going to be "db.apache.org standard". JDO specification is close to odmg. 
> Please note that specific persistence mechanisms and > query languages -- > especially any kind of "generic persistence toolkit" > -- are completely out > of scope for this SIG's goals. We want to > standardize the *basis* for you > to create your *own* persistence mechanisms, query > languages, and so > on. The SIG will not be creating any code that > actually talks to any kind > of database, nor supplies any kind of data > management API. To the best of > my understanding, the SIG's charter is focused on > these interfaces: > > * The interface which objects to be persisted must > supply to their data manager > > * The interface which data managers must supply to > their persistent objects > > * The interface which transaction participants must > supply to a transaction > > * The interface which transaction objects supply to > their participants > > * The interface which transaction objects supply to > an application > > The items you are talking about are not a part of > any of these interfaces. > __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com From pje@telecommunity.com Wed Jul 24 00:33:46 2002 From: pje@telecommunity.com (Phillip J. Eby) Date: Tue, 23 Jul 2002 19:33:46 -0400 Subject: [Persistence-sig] Is threaded access to persistent objects in scope? Message-ID: <5.1.0.14.0.20020723192604.050ad680@mail.telecommunity.com> Does anybody have any use cases for multi-thread access to the same persistent object? ZODB explicitly denies such thread-safety, making each thread responsible for maintaining a separate object cache, or otherwise synchronizing access, and thus avoiding locking issues and all the associated complexity. I don't have any need to change this, personally; I'm happy staying as far away from threading issues as possible. But does anybody have any *concrete* use cases where threaded access to the *same* object is a necessity? 
By same, I mean the identical object pointer, rather than a copy of the object loaded specifically for that thread? I haven't managed to come up with any use cases that wouldn't be better handled using message or event queues, or something like the Linda "tuplespace". By the way, when I say "concrete", I mean that saying "oh, that sounds terrible for performance and language X doesn't do it that way" is not a "concrete" use case. :) Thanks! From pje@telecommunity.com Wed Jul 24 02:01:24 2002 From: pje@telecommunity.com (Phillip J. Eby) Date: Tue, 23 Jul 2002 21:01:24 -0400 Subject: [Persistence-sig] A simple Observation API Message-ID: <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com> I've taken the time this evening to draft a simple Observation API, and an implementation of it. It's not well-documented, but the API should be fairly clear from the example below. Comments and questions encouraged. Note that this draft doesn't deal with any threading issues whatsoever. It also doesn't try to address the possibility that an observer might throw an exception when it's given a notification during a 'finally' clause that closes a beginWrite/endWrite pair. If anybody has suggestions for how to handle these situations, please let me know. By the way, my informal tests show that subclassing Observable makes an object's attribute read access approximately 11 times slower than normal, even if no actual observation is taking place (i.e., an _o_readHook is not set). I have not yet done a timing comparison for write operations and method calls, but I expect the slowdown to be as bad, or worse. Rewriting Observation.py in C, using structure slots for many of the attributes, would probably eliminate most of these slowdowns, at least for unobserved instances. Of course, any operations actually performed by a change observer or read hook, would add their own overhead, in addition to the raw observation overhead. 
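For anyone who wants to repeat this kind of informal measurement, a small timing sketch follows. It does not use Observation.py itself: `Plain` and `Hooked` are made-up stand-in classes, and the ratio it prints depends on the interpreter and machine, so it should not be expected to reproduce the factor of 11 quoted above.

```python
import timeit


class Plain(object):
    def __init__(self):
        self.x = 1


class Hooked(object):
    """Intercepts every attribute read, as an observable base class would."""

    def __init__(self):
        self.x = 1

    def __getattribute__(self, attr):
        # A Python-level hook runs on every read, even when nobody observes.
        return object.__getattribute__(self, attr)


def time_reads(obj, number=200000):
    """Total seconds spent reading obj.x `number` times."""
    return timeit.timeit(lambda: obj.x, number=number)


plain_t = time_reads(Plain())
hooked_t = time_reads(Hooked())
print("hooked/plain read time ratio: %.1f" % (hooked_t / plain_t))
```

The hook here does nothing except delegate, so whatever slowdown shows up is the raw interception cost, before any real observer work is added.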
This is a fairly "transparent" API, although it still requires the user to
subclass a specific base, and declare which mutable attributes are touched
by what methods.  But it is less invasive, in that observation-specific
code does not need to be incorporated into the methods themselves.

One possible enhancement to this framework: use separate observer lists for
the beforeChange() and afterChange() events, and make them simple callables
instead of objects with observation methods.  While this would require an
additional attribute, it would simplify the process of creating dynamic
activation methods, and reduce calls in situations where only one event
needed to be captured.  This could be useful for setting up observation on
a mutable attribute so as to "wire" it to trigger change events on the
object(s) that contained it.

Anyway, here's the demo, followed by the module itself.

#### Demo of observation API ####

from Observation import Observable, WritingMethod

class aSubject(Observable):

    def __init__(self):
        self.spam = []

    # __init__ touches spam, but shouldn't notify anyone about it
    __init__ = WritingMethod(__init__, ignore=['spam'])

    def addSpam(self,spam):
        self.spam.append(spam)

    # addSpam touches spam, even though it doesn't set the attribute
    addSpam = WritingMethod(addSpam, attrs=['spam'])

    def setFoo(self, foo):
        self.foo = foo
        self.bar = 3*foo

    # setFoo modifies multiple attributes, and should send at most
    # one notice of modification, upon exiting.
    setFoo = WritingMethod(setFoo)


class anObserver(object):

    def beforeChange(self, ob):
        print ob,"is about to change"

    def afterChange(self, ob, attrs):
        print ob,"changed",attrs

    def getAttr(self, ob, attr):
        print "reading",attr,"of",ob
        return object.__getattribute__(ob,attr)


subj = aSubject()
obs = anObserver()
subj._o_changeObservers = (obs,)
subj._o_readHook = obs.getAttr

subj.setFoo(9)
print subj.bar
subj.addSpam('1 can')

##### End sample code #####


#### Observation.py ####

__all__ = ['Observable', 'WritingMethod', 'getAttr', 'setAttr', 'delAttr']

getAttr = object.__getattribute__
setAttr = object.__setattr__
delAttr = object.__delattr__

class Observable(object):

    """Object that can send read/write notifications"""

    _o_readHook = staticmethod(getAttr)
    _o_nestCount = 0
    _o_changedAttrs = ()
    _o_changeObservers = ()

    def _o_beginWrite(self):
        """Start a (possibly nested) write operation"""
        ct = self._o_nestCount
        self._o_nestCount = ct + 1
        if ct:
            return
        for ob in self._o_changeObservers:
            ob.beforeChange(self)

    def _o_endWrite(self):
        """Finish a (possibly nested) write operation"""
        ct = self._o_nestCount = self._o_nestCount - 1
        if ct:
            return
        ca = self._o_changedAttrs
        if ca:
            del self._o_changedAttrs
            for ob in self._o_changeObservers:
                ob.afterChange(self,ca)

    def __getattribute__(self,attr):
        """Return an attribute of the object, using a read hook if available"""
        if attr.startswith('_o_') or attr=='__dict__':
            return getAttr(self,attr)
        return getAttr(self,'_o_readHook')(self, attr)

    def __setattr__(self,attr,val):
        if attr.startswith('_o_') or attr=='__dict__':
            setAttr(self,attr,val)
        else:
            self._o_beginWrite()
            try:
                ca = self._o_changedAttrs
                if attr not in ca:
                    self._o_changedAttrs = ca + (attr,)
                setAttr(self,attr,val)
            finally:
                self._o_endWrite()

    def __delattr__(self,attr):
        if attr.startswith('_o_') or attr=='__dict__':
            delAttr(self,attr)
        else:
            self._o_beginWrite()
            try:
                ca = self._o_changedAttrs
                if attr not in ca:
                    self._o_changedAttrs = ca + (attr,)
                delAttr(self,attr)
            finally:
                self._o_endWrite()


from new import instancemethod

class WritingMethod(object):

    """Wrap this around a function to handle write observation automagically"""

    def __init__(self, func, attrs=(), ignore=()):
        self.func = func
        self.attrs = tuple(attrs)
        self.ignore = tuple(ignore)

    def __get__(self, ob, typ=None):
        if typ is None:
            typ = type(ob)
        return instancemethod(self, ob, typ)

    def __call__(self, inst, *args, **kwargs):

        attrs, remove = self.attrs, self.ignore

        inst._o_beginWrite()

        try:
            if attrs or remove:
                ca = inst._o_changedAttrs
                # only strip ignored attrs that weren't already changed
                remove = dict([(r,1) for r in remove if r not in ca])
                inst._o_changedAttrs = ca + attrs

            return self.func(inst, *args, **kwargs)

        finally:
            if remove:
                inst._o_changedAttrs = tuple(
                    [a for a in inst._o_changedAttrs if a not in remove]
                )
            inst._o_endWrite()

From pje@telecommunity.com Wed Jul 24 02:09:35 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Tue, 23 Jul 2002 21:09:35 -0400
Subject: [Persistence-sig] Clarification: A simple Observation API
In-Reply-To: <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com>
Message-ID: <5.1.0.14.0.20020723210437.061d4030@mail.telecommunity.com>

At 09:01 PM 7/23/02 -0400, Phillip J. Eby wrote:
>class aSubject(Observable):
>
>    ....
>
>    def setFoo(self, foo):
>        self.foo = foo
>        self.bar = 3*foo
>
>    # setFoo modifies multiple attributes, and should send at most
>    # one notice of modification, upon exiting.
>    setFoo = WritingMethod(setFoo)

Just a quick clarification on the demo code... if a method only sets one
attribute, or if it sets multiple attributes, but you don't care about
consolidating the change events, it's not necessary to declare the method a
WritingMethod.  In that case, the __setattr__ hook will issue change events
for each attribute set, unless the method is being called by a
WritingMethod, either directly or indirectly.
Use of a WritingMethod wrapper is only required for methods that set attributes and need the changes to be ignored, or which manipulate mutable attributes without actually setting attributes on the instance. Any other use is optional at the implementor's discretion. From donnalcwalter@yahoo.com Wed Jul 24 09:12:35 2002 From: donnalcwalter@yahoo.com (Donnal Walter) Date: Wed, 24 Jul 2002 01:12:35 -0700 (PDT) Subject: [Persistence-sig] Naive questions about getting and setting In-Reply-To: <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com> Message-ID: <20020724081235.31632.qmail@web13906.mail.yahoo.com> 1. Would naive (and rather application specific) questions such as these be better posed to comp.lang.python? If so, I would happily comply in the future. 2. In regard to the persistence API and especially in regard to observation, would someone please point out the pitfalls of using so-called "setter" and "getter" methods in attribute classes themselves, as opposed to __setattribute__ and __getattribute__ methods in the container classes? I am in no way proposing this as a general solution, but if in a given situation one wanted to set up a scheme similar to that coded below, what would be the major liabilities? Is it simply a matter of lack of transparency? Or is there also a serious problem with decreased efficiency? 
=======
class Cell(object):

    def __init__(self, *args):
        """Arguments, if any, must be references to other cells."""
        self.__value = None              # the scalar value of the Cell
        self.__observers = []            # list of observers
        if len(args) > 0:                # if this Cell is dependent
            self.ref = args              # save the list of references
            for i in self.ref:           # for every external ref
                i.AddObserver(self)      # register as an observer
            self.Update()                # set initial from refs
        else:                            # if this Cell is independent
            self.Reset()                 # simply reset its value

    def AddObserver(self, observer):
        if observer not in self.__observers:
            self.__observers.append(observer)

    def _setValue(self, value):
        if value != self.__value:
            self.__value = value
            for o in self.__observers:
                o.Update()

    def _getValue(self):
        return self.__value

    def Set(self, input):
        try:                 # make sure input can be converted
            self._setValue(self.Encode(input))
        except ValueError:   # if incompatible input value, reinitialize
            self.Reset()

    def Get(self):
        return self.Decode(self._getValue())

    def Encode(self, value):
        """ override to change Cell type"""
        return value

    def Decode(self, value):
        """ override to change Cell type"""
        return value

    def Update(self):
        """
        Override in observer Cells.
        (Observers have access to the self.ref list.)
        """
        pass

    def Reset(self):
        """ may be overridden to change default value"""
        self._setValue('')
=======

=====
Donnal Walter
Arkansas Children's Hospital

From Sebastien.Bigaret@inqual.com Fri Jul 26 14:54:11 2002
From: Sebastien.Bigaret@inqual.com (Sebastien Bigaret)
Date: 26 Jul 2002 15:54:11 +0200
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: "Phillip J. Eby"'s message of "Tue, 23 Jul 2002 21:01:24 -0400"
References: <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com>
Message-ID: <873cu6a930.fsf@bidibule.brest.inqual.bzh>

Now that the code has shown up, I have some comments :) --just kidding, I
was just too busy to read the list since the beginning of the week, and now
I have read the whole stuff; I must admit I did not understand in detail
every single argument and point you discussed, so I might have questions
about some things that were answered but that I didn't understand.

Here are some notes & questions I had while reading:

About caching and caching policies: Phillip did talk about 'transactional
caching' and I'm not sure what it really is; however, there is some need
for an 'application-wide' caching mechanism to avoid unnecessary
round-trips to the DB. Of course, this should not defeat the
'smallest-possible-memory-footprint-requirement' pointed out in the sig
charter; but if an object has already been fetched somewhere (and is still
active in another thread, or the cache/snapshots would have been deleted),
then it is usually unnecessary to re-fetch the object: simply use the
cached snapshot instead. But this sounds to me a bit off-topic for this
list.

+1 on defining a state model for persistent objects; however I'm a little
fuzzy about the difference between 'unsaved' and 'changed'. To my
understanding 'unsaved' is for new objects, while 'changed' is for existing
(previously made persistent) objects, is this right?

About RDBMS: I'm ok with what has been said; I agree that most of the work
has to be done at DM-level; observability, as shown in the demo code,
seems sufficient for most purposes. The only thing I cannot see how it can
be done is:

Ilia> create(object) storage shall populate id from rdbms
Ilia> which is usually primary key.
Jim> This should not be necessary. One should be able to
Jim> design a data manager that detected new objects and
Jim> assigned them ids when referencing objects are created.

Can you elaborate on that?

More on the Observation API:

> Note that this draft doesn't deal with any threading issues whatsoever.

You asked earlier "does anybody have any *concrete* use cases where
threaded access to the *same* object is a necessity?". The only use-case I
can think of is when you have a pool of objects shared by all threads
(e.g. to avoid unnecessary round-trips to a DB for accessing
mostly-read-only objects), where it is possible that other objects,
loaded/copied specifically for each thread, can have references
(relationships) to shared objects. I'm not saying, however, that threading
issues **should** be addressed because of this; I have the feeling that, if
you want such a feature, you can afford the extra effort to make these
shared objects thread-safe (e.g. reentrant locks are ok as long as you
access the objects' attributes using getters/setters and not directly).

> This is a fairly "transparent" API, although it still requires the user to
> subclass a specific base, and declare which mutable attributes are touched
> by what methods. But it is less invasive, in that observation-specific
> code does not need to be incorporated into the methods themselves.

It looks pretty to my eyes, indeed.

Last question to make sure I did not miss an important point: after having
read all your messages and after I had a look at the Persistence package
in Zope3, this is how I understand the ``unghostification'' of an object:
it holds a flag saying whether it is a ghost, and has a special attribute,
_p_datamanager. The Persistent object has a _p_activate() method which in
turn calls IPersistentDataManager.setstate(); this is triggered
automatically. Is that it? Can someone be more explicit about when this is
triggered?
I tried to look at the C code but I'm not familiar at all with C code for python and couldnt get a clear answer. -- Sebastien. From guido@python.org Mon Jul 29 22:06:43 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 29 Jul 2002 17:06:43 -0400 Subject: [Persistence-sig] Naive questions about getting and setting In-Reply-To: Your message of "Wed, 24 Jul 2002 01:12:35 PDT." <20020724081235.31632.qmail@web13906.mail.yahoo.com> References: <20020724081235.31632.qmail@web13906.mail.yahoo.com> Message-ID: <200207292106.g6TL6ij06410@pcp02138704pcs.reston01.va.comcast.net> > 1. Would naive (and rather application specific) questions such as > these be better posed to comp.lang.python? If so, I would happily > comply in the future. I don't know, but given the resounding silence in response to your email you may have drawn this conclusion yourself... :-) You may also want to read up on descriptors and other aspects of new types; I wrote a tutorial: http://www.python.org/2.2.1/descrintro.html > 2. In regard to the persistence API and especially in regard to > observation, would someone please point out the pitfalls of using > so-called "setter" and "getter" methods in attribute classes > themselves, as opposed to __setattribute__ and __getattribute__ > methods in the container classes? There is no __setattribute__; for historical reasons, there's __getattr__, __setattr__, and __getattribute__. If you have a few attributes that need special handling, and the rest don't, implementing them using descriptors is much preferred, because it doesn't slow down access to the other attributes. OTOH, if you need to trap *all* attributes (like Philip's Observable class), __getattribute__ and __setattr__ are the only way. > I am in no way proposing this as a general solution, but if in a > given situation one wanted to set up a scheme similar to that coded > below, what would be the major liabilities? Is it simply a matter > of lack of transparency? 
Or is there also a serious problem with
> decreased efficiency?

I'm afraid I don't understand what your example code is trying to do. It seems out of scope for this SIG.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Mon Jul 29 22:14:15 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 29 Jul 2002 17:14:15 -0400
Subject: [Persistence-sig] Was there a Persistence-BOF at OSCON?
Message-ID: <200207292114.g6TLEF406429@pcp02138704pcs.reston01.va.comcast.net>

So now that we're all safe back home, I'd like to hear what happened at the Persistence-BOF at OSCON, if it was actually held. (I was in a different meeting that night, and very tired, so I didn't even attempt to peek in.)

I should probably report on the persistence breakfast meeting: it didn't happen, because Jim was delayed in Phoenix and the only two people showing up for breakfast were Patrick O'Brien and me. We discussed mostly other things on our minds, like PythonCard.

I also note that, in keeping with the predictions for any SIG, as soon as any code was posted, the discussion stopped. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Mon Jul 29 22:18:43 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 29 Jul 2002 17:18:43 -0400
Subject: [Persistence-sig] A simple Observation API
References: <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com>
Message-ID: <200207292118.g6TLIhm06453@pcp02138704pcs.reston01.va.comcast.net>

Some questions about Phillip's Observable protocol. Why does it have to be so complicated? E.g. if you have to do something special for a method that touches an attribute without doing a setattr operation on it, why not have the magic be inside that method rather than declare a wrapper? (The wrapper looks like it is much more expensive than another way of flagging a change would be.)

What exactly is the point of collapsing multiple setattr() ops together? Just performance? Or is there a semantic reason?
If just performance, where is the time going that you're trying to save? What's the use case for declaring a method as "touches an attribute but that change should be ignored"? (If it's only __init__, a lighter-weight mechanism might be sufficient.) --Guido van Rossum (home page: http://www.python.org/~guido/) From pje@telecommunity.com Mon Jul 29 22:16:40 2002 From: pje@telecommunity.com (Phillip J. Eby) Date: Mon, 29 Jul 2002 17:16:40 -0400 Subject: [Persistence-sig] Was there a Persistence-BOF at OSCON? In-Reply-To: <200207292114.g6TLEF406429@pcp02138704pcs.reston01.va.comca st.net> Message-ID: <3.0.5.32.20020729171640.0089b100@telecommunity.com> At 05:14 PM 7/29/02 -0400, Guido van Rossum wrote: > >I also note that, conform the predictions for any SIG, as soon as any >code was posted, the discussion stopped. :-) I actually thought it was everybody but me went to OSCON. :) As for me, I was sick for a good part of last week, which is why I haven't yet replied to Donal Walter's post about property-like objects. (I've actually got something similar in PEAK, although it's actually a way of generating getters and setters... a specialized kind of descriptor that can add generated methods to the class it's contained it. A bit esoteric for this list's purposes though.) From pje@telecommunity.com Mon Jul 29 22:37:45 2002 From: pje@telecommunity.com (Phillip J. Eby) Date: Mon, 29 Jul 2002 17:37:45 -0400 Subject: [Persistence-sig] A simple Observation API In-Reply-To: <200207292118.g6TLIhm06453@pcp02138704pcs.reston01.va.comca st.net> References: <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com> Message-ID: <3.0.5.32.20020729173745.008a0240@telecommunity.com> At 05:18 PM 7/29/02 -0400, Guido van Rossum wrote: >Some questions about Phillip's Observable protocol. Wby does it have >to be so complicated? E.g. 
if you have to do something special for >method that touches an attribute without doing a setattr operation on >it, why not have the magic be inside that method rather than declare a >wrapper? (The wrapper looks like it is much more expensive than >another way of flagging a change would be.) It's only for event compression, otherwise putting a simple flag operation in the method would indeed be more lightweight. Of course, I'm pretty sure I could write a bytecode-hacking version that would recode the underlying method to include the necessary wrapping code around its body, making it just as fast as putting the code inline. But I didn't want to put that much effort into an example. :) >What exactly is the point of collapsing multiple setattr() ops >together? Just performance? Or is there a semantic reason? If just >performance, where is the time going that you're trying to save? Semantics plus performance. The semantic part is that some "database" systems (e.g. LDAP) inherently don't support transactions, AND must receive a semantically valid set of attributes in a single update operation. I may be overgeneralizing this aspect, however. The performance save is for situations like Tim Peters' distributed cache example. If a change notification is going to cause network traffic, it would be a good idea to minimize the number of such notifications. It's a common situation (IMHO) to change multiple attributes in a set of related methods, so this supports that scenario while ensuring a minimal set of update events are issued. >What's the use case for declaring a method as "touches an attribute >but that change should be ignored"? (If it's only __init__, a >lighter-weight mechanism might be sufficient.) I discovered the __init__ issue when I went to write the example code, and adding an ignore list seemed like the simplest way to solve it quickly without adding a metaclass or something else special to handle __init__. 
Also, I know I've frequently written classes which do the bulk of their attribute setup in methods other than __init__, and imagine others do as well. These days I use PEAK attribute binding descriptors that automatically initialize attributes on first-use, instead, but I wrote the Observable example assuming "plain-jane", "mainstream" Python with no special metaclasses or the like. In general, as to the features of the API, I wrote this mostly based on the use cases that other folks had, although I'm certainly not against having event compression. :) My own requirements in the API are only the changeable "get" hook, and that notification of writes takes place after the modifications. The idea of using method wrappers to incorporate the metadata about what attributes are modified, was an attempt to help mask implementation details from the "naive" user. It seemed to me a less invasive form of "dead chicken waving", and also allowed for alternative implementation strategies for the observable's internal mechanism. From pobrien@orbtech.com Mon Jul 29 22:45:42 2002 From: pobrien@orbtech.com (Patrick K. O'Brien) Date: Mon, 29 Jul 2002 16:45:42 -0500 Subject: [Persistence-sig] Was there a Persistence-BOF at OSCON? In-Reply-To: <200207292114.g6TLEF406429@pcp02138704pcs.reston01.va.comcast.net> Message-ID: [Guido van Rossum] > > So now that we're all safe back home, I'd like to hear what happened > at the Persistence-BOF at OSCON, if it was actually held. (I was in a > different meeting that night, and very tired, so I didn't even attempt > to peek in.) > > I should probably report on the persistence breakfast meeting: it > didn't happen, because Jim was delayed in Phoenix and the only two > people showing up for breakfast were Patrick O'Brien and me. We > discussed mostly other things on our minds, like PythonCard. I was afraid the same thing would happen again on Thursday, with Jim and I as the only participants. 
(Of course, I would have enjoyed a one-on-one conversation with the ZopePope to balance out the very nice one-on-one I got to have with the Python BDFL.) Thankfully, the Persistence BOF went much better than I feared. There were about 8 of us in attendance, the discussion was productive, and we filled the entire time from 8pm to 10pm. I took notes and intend to submit a mini report to the list as soon as I get caught up with other items. Expect to hear from me no later than the end of this week. I'm still optimistic that this SIG can reach consensus and produce a useful Persistence foundation. -- Patrick K. O'Brien Orbtech ----------------------------------------------- "Your source for Python programming expertise." ----------------------------------------------- Web: http://www.orbtech.com/web/pobrien/ Blog: http://www.orbtech.com/blog/pobrien/ Wiki: http://www.orbtech.com/wiki/PatrickOBrien ----------------------------------------------- From guido@python.org Mon Jul 29 22:56:53 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 29 Jul 2002 17:56:53 -0400 Subject: [Persistence-sig] A simple Observation API In-Reply-To: Your message of "Mon, 29 Jul 2002 17:37:45 EDT." <3.0.5.32.20020729173745.008a0240@telecommunity.com> References: <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com> <3.0.5.32.20020729173745.008a0240@telecommunity.com> Message-ID: <200207292156.g6TLusI06618@pcp02138704pcs.reston01.va.comcast.net> > At 05:18 PM 7/29/02 -0400, Guido van Rossum wrote: > >Some questions about Phillip's Observable protocol. Wby does it > >have to be so complicated? E.g. if you have to do something > >special for method that touches an attribute without doing a > >setattr operation on it, why not have the magic be inside that > >method rather than declare a wrapper? (The wrapper looks like it > >is much more expensive than another way of flagging a change would > >be.) 
>
> It's only for event compression, otherwise putting a simple flag
> operation in the method would indeed be more lightweight. Of
> course, I'm pretty sure I could write a bytecode-hacking version
> that would recode the underlying method to include the necessary
> wrapping code around its body, making it just as fast as putting the
> code inline. But I didn't want to put that much effort into an
> example. :)

I hope you were really only joking. Hacking bytecode is inexcusable mixing of abstraction levels.

> >What exactly is the point of collapsing multiple setattr() ops
> >together? Just performance? Or is there a semantic reason? If
> >just performance, where is the time going that you're trying to
> >save?
>
> Semantics plus performance. The semantic part is that some
> "database" systems (e.g. LDAP) inherently don't support
> transactions, AND must receive a semantically valid set of
> attributes in a single update operation. I may be overgeneralizing
> this aspect, however.

I'm guessing that you'll have to do this differently anyway, e.g. cache all changes and then force them out all at once with a commit() operation.

> The performance save is for situations like Tim Peters' distributed
> cache example. If a change notification is going to cause network
> traffic, it would be a good idea to minimize the number of such
> notifications. It's a common situation (IMHO) to change multiple
> attributes in a set of related methods, so this supports that
> scenario while ensuring a minimal set of update events are issued.

Isn't there a way to do this in a less obtrusive way, e.g. by buffering? I don't know much of this application area, but the mechanism you are proposing looks very heavy-handed. I would expect that in a realistic system, most methods would grow wrappers. And this *still* doesn't prevent bugs like updating a list attribute by calling its append() method without somehow flagging this operation.
(Flagging changes at the method call level seems too coarse-grained. What about a method that only occasionally makes a change to a given attribute?)

> >What's the use case for declaring a method as "touches an attribute
> >but that change should be ignored"? (If it's only __init__, a
> >lighter-weight mechanism might be sufficient.)
>
> I discovered the __init__ issue when I went to write the example
> code, and adding an ignore list seemed like the simplest way to
> solve it quickly without adding a metaclass or something else
> special to handle __init__. Also, I know I've frequently written
> classes which do the bulk of their attribute setup in methods other
> than __init__, and imagine others do as well. These days I use PEAK
> attribute binding descriptors that automatically initialize
> attributes on first-use, instead, but I wrote the Observable example
> assuming "plain-jane", "mainstream" Python with no special
> metaclasses or the like.

That's good, because I have no idea what PEAK is. :-)

Anyway, if the special handling is mostly for __init__ (or things it calls), then a metaclass could make the notation a bit prettier.

> In general, as to the features of the API, I wrote this mostly based
> on the use cases that other folks had, although I'm certainly not
> against having event compression. :) My own requirements in the API
> are only the changeable "get" hook, and that notification of writes
> takes place after the modifications.
>
> The idea of using method wrappers to incorporate the metadata about
> what attributes are modified, was an attempt to help mask
> implementation details from the "naive" user. It seemed to me a
> less invasive form of "dead chicken waving", and also allowed for
> alternative implementation strategies for the observable's internal
> mechanism.

Maybe it's just too early to start proposing code? Or has the discussion already moved to IRC? I'm curious about why the message flow just stopped.
--Guido van Rossum (home page: http://www.python.org/~guido/) From pje@telecommunity.com Mon Jul 29 23:20:03 2002 From: pje@telecommunity.com (Phillip J. Eby) Date: Mon, 29 Jul 2002 18:20:03 -0400 Subject: [Persistence-sig] A simple Observation API In-Reply-To: <200207292156.g6TLusI06618@pcp02138704pcs.reston01.va.comca st.net> References: <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com> <3.0.5.32.20020729173745.008a0240@telecommunity.com> Message-ID: <3.0.5.32.20020729182003.007d2d30@telecommunity.com> At 05:56 PM 7/29/02 -0400, Guido van Rossum wrote: > >Hacking bytecode is inexcusable mixing of abstraction levels. Huh? >Isn't there a way to do this in a less obtrusive way, e.g. by >buffering? I don't know much of this application area, but the >mechanism you are proposing looks very heavy-handed. I would expect >that in a realistic system, most methods would grow wrappers. Yep. The problem with buffering is, if you're trying to allow for cascaded storage, e.g. persisting to an XML document which is persisted in a database... You then end up having to have some kind of explicit ordering that occurs between transaction participants. Unfortunately, this application area in general is one where the total amount of complexity can only be moved from one place to another, and not actually reduced by much, at least if you're trying to maintain generality. :( I was trying to keep a more or less even balance of complexity between the persistent objects, the data managers, and the transaction object. >And >this *still* doesn't prevent bugs like updating a list attribute by >calling its append() method without somehow flagging this operation. Right. If we could catch *that*, then there wouldn't be any need for a special API in the first place! 
:) Of course, it could be done by having the setattr trap assignments of mutables to attributes in the first place, and having the observer subscribe to notifications from the mutable, with an annotation that it's actually a modification to the "owner". But this leads to a new set of questions like "what's mutable?", and what kind of performance degradation ensues if your normal practice is to keep re-assigning new values to a an attribute of type "list". :) Oh, and let's not forget the overhead of un-tracking observer subscriptions when the attribute is overwritten or deleted... Ugh. It seems that the only really *simple* way to address this issue "once and for all" would be to disallow assignment of non-persistent objects to the attributes of persistent objects, except for a small set of known immutable types, such as numbers, strings, and tuples. This could be trivially trapped with an isinstance() check in setattr against say, (int,str,unicode,float,complex,tuple,Persistent). It would then be impossible to make this kind of mistake... unless of course you use a single-element tuple containing a mutable... Argh!!!! >Maybe it's just too early to start proposing code? I was actually trying to propose an API, not an implementation. I originally started trying to write text to explain the API, as I did with my Transaction API proposal, but found it more difficult in this instance than writing code. The ideas being proposed in the API were really just the event mechanism, the getattr hook, metadata declaration, and event compression. >Or has the discussion already moved to IRC? Eh? >I'm curious about why the message flow just stopped. I really did think it was OSCON. A lot of the other posters (e.g. you, Jim, and Jeremy) were gone this last week, yes? 
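
The isinstance() check described in the message above, and the tuple loophole it ends on, can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from PEAK or ZODB: the Persistent class here is a hypothetical stand-in, ALLOWED is an invented name, and the type list is adapted for Python 3, where unicode is simply str.

```python
# Assumed whitelist of immutable types, adapted from the tuple in the message.
ALLOWED = (int, str, float, complex, tuple)


class Persistent:
    """Hypothetical stand-in for a persistent base class."""

    def __setattr__(self, name, value):
        # Accept only whitelisted immutables and other Persistent objects.
        if not isinstance(value, ALLOWED + (Persistent,)):
            raise TypeError(
                "cannot assign %s to attribute %r"
                % (type(value).__name__, name)
            )
        object.__setattr__(self, name, value)


p = Persistent()
p.count = 1               # immutable value: accepted
p.child = Persistent()    # another persistent object: accepted

try:
    p.items = []          # plain list: rejected by the check
    list_rejected = False
except TypeError:
    list_rejected = True

# The loophole: a tuple passes the check even though it holds a mutable.
hidden = []
p.loophole = (hidden,)
hidden.append("changed without any setattr on p")
```

The last three lines show why the check alone is not enough: the list inside the tuple changes state reachable from p, yet __setattr__ is never called again.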
From jeremy@alum.mit.edu Tue Jul 30 01:23:29 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Mon, 29 Jul 2002 20:23:29 -0400 Subject: [Persistence-sig] A simple Observation API In-Reply-To: <3.0.5.32.20020729173745.008a0240@telecommunity.com> References: <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com> <3.0.5.32.20020729173745.008a0240@telecommunity.com> Message-ID: <15685.56449.750240.468121@slothrop.zope.com> >>>>> "PJE" == Phillip J Eby writes: [GvR asks the question that puzzles me too:] >> What exactly is the point of collapsing multiple setattr() ops >> together? Just performance? Or is there a semantic reason? If >> just performance, where is the time going that you're trying to >> save? PJE> Semantics plus performance. The semantic part is that some PJE> "database" systems (e.g. LDAP) inherently don't support PJE> transactions, AND must receive a semantically valid set of PJE> attributes in a single update operation. I may be PJE> overgeneralizing this aspect, however. PJE> The performance save is for situations like Tim Peters' PJE> distributed cache example. If a change notification is going PJE> to cause network traffic, it would be a good idea to minimize PJE> the number of such notifications. It's a common situation PJE> (IMHO) to change multiple attributes in a set of related PJE> methods, so this supports that scenario while ensuring a PJE> minimal set of update events are issued. I remain convinced that the current mechanism ought to work. Perhaps I just needed to be convinced otherwise, but I don't think these cases are worked out in enough detail to be convincing. I also think the semantics of the proposed alternative makes it harder on the users, presumably in order to make the infrastructure's job easier. I'm thinking about a complex data structure implemented using many helper methods. 
If the data structure is modified inside a helper method, it can't mark the object changed; it needs to wait for the top-level operation to finish. As a result, the data structure would need to keep a separate flag to indicate whether it should be marked as changed later. Then the methods that are "top-level" would need to be edited to check that flag and set _p_changed. It's worse, though, because you might want to implement one "top-level" operation by calling another top-level operation. That would require the introduction of extra wrappers around the public versions of methods that just do bookkeeping, so that the internal routines could call other internal routines.

The complexity aside, I don't understand why the transaction framework isn't sufficient to handle the two examples you mention above.

LDAP does not support transactions, but does expect to get consistent updates. A transaction provides, among other things, the consistency. It should be possible to delay updates to the LDAP database until the transaction commits. The fact that LDAP does not participate in two-phase commit limits its robustness, but should not affect consistency. (Specifically, I mean that a transaction may fail in the final stage of the two-phase commit with this sort of data manager.)

The distributed cache example seems to be the same. If there are multiple updates, delay sending any of the updates until the transaction commits. It might abort, after all, and then no updates need to be sent; this is just the atomic property of transactions.

The two examples seem to need the A and C of ACID transactions, so why not use them?

Proper nested transactions should make the current mechanism even cleaner. Some methods of an object may want to have ACID semantics. They can operate as a subtransaction, with all-or-nothing updates to the object state provided that the top-level transaction commits.
I think a simple boolean flag, _p_changed, is all the change notification we need when combined with transactions. Jeremy From jeremy@alum.mit.edu Tue Jul 30 01:36:51 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Mon, 29 Jul 2002 20:36:51 -0400 Subject: [Persistence-sig] "Straw Man" transaction API In-Reply-To: <200207191609.g6JG91A26544@pcp02138704pcs.reston01.va.comcast.net> References: <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> <3.0.5.32.20020719120237.00898b60@telecommunity.com> <200207191609.g6JG91A26544@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <15685.57251.14632.949497@slothrop.zope.com> Last week, I worked out a revised transaction API for user code and for data managers. It's implemented in ZODB4, but is fairly preliminary code. I imagine we'll revise it further, but I'd like to describe the changes briefly. Here's a short summary from the ZODB4/Doc/changes.txt document: The Transaction implementation has been completely overhauled. There are four significant changes that users may need to cope with. The first is that a transaction that fails because of an uncaught exception is not aborted. The user code should explicitly call get_transaction().abort(). The second is that commit() does not take an optional argument to flag subtransaction commits. Instead, call the savepoint() method. ZODB will return a rollback object from savepoint(). If the rollback object's rollback() method is called, it will abort the savepoint() -- rolling back changes to the previous savepoint or the start of the transaction. The other changes to the Transaction implementation affect implementors of resource / data managers. The ZODB Connection object is an example of a data manager. When the persistence machinery detects that an object has been modified, the register() method is called on its data manager. It's up to the data manager to register with the transaction. 
The manager is registered with the transaction, not the individual objects.

The interface the data manager implements (IDataManager) has changed. It should implement four methods: prepare(), abort(), commit(), and savepoint(). Here is how they correspond to the old API: prepare() is roughly equivalent to tpc_begin() through tpc_vote(). abort() and commit() are roughly equivalent to tpc_abort() and tpc_finish(). savepoint() is used for subtransactions.

The APIs look like this:

    class ITransaction(Interface):
        """Transaction objects."""

        def abort():
            """Abort the current transaction."""

        def begin():
            """Begin a transaction."""

        def commit():
            """Commit a transaction."""

        def join(resource):
            """Join a resource manager to the current transaction."""

        def status():
            """Return status of the current transaction."""

    class IDataManager(Interface):
        """Data management interface for storing objects transactionally."""

        def prepare(transaction):
            """Begin two-phase commit of a transaction.

            DataManager should return True or False.
            """

        def abort(transaction):
            """Abort changes made by transaction."""

        def commit(transaction):
            """Commit changes made by transaction."""

        def savepoint(transaction):
            """Do tentative commit of changes to this point.

            Should return an object implementing IRollback
            """

    class IRollback(Interface):

        def rollback():
            """Rollback changes since savepoint."""

I think the rollback mechanism will work well enough. Gray and Reuter explain that it can be used to simulate a nested transaction architecture. Thus, I think it's a reasonable building block for the nested transaction API.

I think I'm also in favor of the new abort semantics. ZODB3 would abort the transactions -- call abort() on all the data managers -- if an error occurred during a commit. The new code requires that the user do this instead. I think that's better, because it leaves the state of the objects intact if the code wants to analyze what went wrong before retrying the transaction.
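
To make the savepoint/rollback contract above concrete, here is a minimal sketch of a data manager that follows the four-method IDataManager shape. It is an illustration only, not the ZODB4 implementation: DictDataManager, DictRollback, the set() helper, and the use of a bare object() as the transaction are all assumptions made for the example.

```python
class DictRollback:
    """Hypothetical IRollback: restores the state captured at a savepoint."""

    def __init__(self, manager, snapshot):
        self._manager = manager
        self._snapshot = snapshot

    def rollback(self):
        # Discard every change made since the savepoint was taken.
        self._manager.pending = dict(self._snapshot)


class DictDataManager:
    """Hypothetical in-memory data manager with the four-method shape."""

    def __init__(self):
        self.committed = {}   # state visible after commit()
        self.pending = {}     # uncommitted working state

    def set(self, key, value):
        # Illustration-only helper for making a tentative change.
        self.pending[key] = value

    def prepare(self, transaction):
        # A real manager would vote here; this toy one always agrees.
        return True

    def abort(self, transaction):
        self.pending = dict(self.committed)

    def commit(self, transaction):
        self.committed = dict(self.pending)

    def savepoint(self, transaction):
        return DictRollback(self, dict(self.pending))


txn = object()                 # placeholder for a transaction object
dm = DictDataManager()
dm.set("a", 1)
sp = dm.savepoint(txn)         # tentative commit of the changes so far
dm.set("b", 2)
sp.rollback()                  # undo only what happened after the savepoint
if dm.prepare(txn):
    dm.commit(txn)
```

After this runs, only the change made before the savepoint is committed, which is the behaviour a nested-transaction layer would build on.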
Note that a Transaction doesn't have a register method. Instead, a modified object calls register() on its data manager. The data manager can join() that transaction if that's the right thing to do. The ZODB Connection joins on the first register call of the transaction. However, I currently have join() on the transaction, not the Transaction.Manager (aka TP monitor). I'm in favor of sticking with register() as the persistent method, although notify() would be okay, too. I imagine that some data managers would want to be notified when an object is read or written. In that case, I'm not sure if notify() is enough; we might want a notify method for each kind of event or a notify() method with the event as an argument. (The need for notify-on-read, BTW, is to support higher isolation levels than ZODB currently supports.) Jeremy From shane@zope.com Tue Jul 30 03:46:27 2002 From: shane@zope.com (Shane Hathaway) Date: Mon, 29 Jul 2002 22:46:27 -0400 (EDT) Subject: [Persistence-sig] "Straw Man" transaction API In-Reply-To: <15685.57251.14632.949497@slothrop.zope.com> Message-ID: On Mon, 29 Jul 2002, Jeremy Hylton wrote: > Last week, I worked out a revised transaction API for user code and > for data managers. It's implemented in ZODB4, but is fairly > preliminary code. I imagine we'll revise it further, but I'd like to > describe the changes briefly. This is great work. > (snip) > > The APIs look like this: > > class ITransaction(Interface): > """Transaction objects.""" > > def abort(): > """Abort the current transaction.""" > > def begin(): > """Begin a transaction.""" > > def commit(): > """Commit a transaction.""" > > def join(resource): > """Join a resource manager to the current transaction.""" By "resource manager" do you mean "IDataManager"? > > def status(): > """Return status of the current transaction.""" What kind of object would status() return? Who might make use of it? Also, I'd like to see some way to set transaction metadata. 
> class IDataManager(Interface): > """Data management interface for storing objects transactionally.""" > > def prepare(transaction): > """Begin two-phase commit of a transaction. > > DataManager should return True or False. > """ > > def abort(transaction): > """Abort changes made by transaction.""" > > def commit(transaction): > """Commit changes made by transaction.""" > > def savepoint(transaction): > """Do tentative commit of changes to this point. > > Should return an object implementing IRollback > """ I would like this interface to be called ITransactionParticipant. There are many interesting kinds of objects that would be interested in participating in a transaction, and not all of them have the immediate responsibility of storing data. But the names you chose for the methods are very clear and concise, I think. > class IRollback(Interface): > > def rollback(): > """Rollback changes since savepoint.""" > > I think the rollback mechanism will work well enough. Gray and Reuter > explain that it can be used to simulate a nested transaction > architecture. Thus, I think it's a reasonable building block for the > nested transaction API. Making rollback operations into objects is a little surprising, but as I don't fully understand the ideas behind nested transactions, I'm sure there's a reason for rollback objects to exist. :-) > I think I'm also in favor of the new abort semantics. ZODB3 would > abort the transactions -- call abort() on all the data managers -- if > an error occurred during a commit. The new code requires that the > user do this instead. I think that's better, because it leaves the > state of the objects intact if the code wants to analyze what went > wrong before retrying the transaction. > > Note that a Transaction doesn't have a register method. Instead, a > modified object calls register() on its data manager. The data > manager can join() that transaction if that's the right thing to do. 
> The ZODB Connection joins on the first register call of the > transaction. However, I currently have join() on the transaction, not > the Transaction.Manager (aka TP monitor). > > I'm in favor of sticking with register() as the persistent method, > although notify() would be okay, too. I imagine that some data > managers would want to be notified when an object is read or written. > In that case, I'm not sure if notify() is enough; we might want a > notify method for each kind of event or a notify() method with the > event as an argument. It seems to me that the data manager should register to receive specific notifications. Some data managers are only interested in knowing when an object is moving from "ghost" to "saved" and from "saved" to "changed" state (such as ZODB); others might want more events, like being notified the first time an object is read in a transaction or receiving notification of *every* attribute change. Supporting the extra events in C only incurs a speed penalty if the data manager requests those events. Shane From pje@telecommunity.com Tue Jul 30 13:27:56 2002 From: pje@telecommunity.com (Phillip J. Eby) Date: Tue, 30 Jul 2002 08:27:56 -0400 Subject: [Persistence-sig] A simple Observation API In-Reply-To: <15685.56449.750240.468121@slothrop.zope.com> References: <3.0.5.32.20020729173745.008a0240@telecommunity.com> <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com> <3.0.5.32.20020729173745.008a0240@telecommunity.com> Message-ID: <5.1.0.14.0.20020730082232.04cd62b0@mail.telecommunity.com> At 08:23 PM 7/29/02 -0400, Jeremy Hylton wrote: > >>>>> "PJE" == Phillip J Eby writes: > >I remain convinced that the current mechanism ought to work. Perhaps >I just needed to be convinced otherwise, but I don't think these cases >are worked out in enough detail to be convincing. [shrug]. 
As I said, I was attempting to propose something that fit things brought up by others, and supply a generally-useful Observation framework, usable for things besides the context of persistence and transactions. (Per Jim's suggestion that such an Observation framework would have greater motivation for a user to take advantage of it.) For me, in the persistence/transaction context, the post-change flag is sufficient. >I also think the semantics of the proposed alternative makes it harder >on the users, presumably in order to make the infrastructure's job >easier. I'm thinking about a complex data structure implemented using >many helper methods. If the data structure is modified inside a >helper message, it can't mark the object changed; it needs to wait for >the top-level operation to finish. As a result, the data structure >would need to keep a separate flag to indicate whether it should be >marked as changed later. Then the methods that are "top-level" needed >to be edited to check that flag and set _p_changed. It's worse, >though, because you might want to implement one "top-level" operation >by calling another top-level operation. That would require the >introduction of extra wrappers around the public versions of methods >that just do bookkeeping, so that the internal routines could call >other internal routines. Well, the example implementation I wrote took care of all of that, quite elegantly I thought. But for my purposes, it's sufficient as long as _p_changed is set after the last modification that occurs. It's okay if it's also set after previous modifications. It just must be set after the last modification, regardless of how many other times it's set. This requirement on my part has strictly to do with data managers that write to other data managers, in the context of the transaction API I proposed. 
From jeremy@alum.mit.edu Tue Jul 30 13:40:39 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Tue, 30 Jul 2002 08:40:39 -0400 Subject: [Persistence-sig] A simple Observation API In-Reply-To: <5.1.0.14.0.20020730082232.04cd62b0@mail.telecommunity.com> References: <3.0.5.32.20020729173745.008a0240@telecommunity.com> <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com> <5.1.0.14.0.20020730082232.04cd62b0@mail.telecommunity.com> Message-ID: <15686.35143.15684.228405@slothrop.zope.com> >>>>> "PJE" == Phillip J Eby writes: PJE> Well, the example implementation I wrote took care of all of PJE> that, quite elegantly I thought. But for my purposes, it's PJE> sufficient as long as _p_changed is set after the last PJE> modification that occurs. It's okay if it's also set after PJE> previous modifications. It just must be set after the last PJE> modification, regardless of how many other times it's set. PJE> This requirement on my part has strictly to do with data PJE> managers that write to other data managers, in the context of PJE> the transaction API I proposed. Can you explain how _p_changed is used outside of transaction control? I still don't understand how the timing of _p_changed affects things. Jeremy From pje@telecommunity.com Tue Jul 30 13:40:01 2002 From: pje@telecommunity.com (Phillip J. Eby) Date: Tue, 30 Jul 2002 08:40:01 -0400 Subject: [Persistence-sig] "Straw Man" transaction API In-Reply-To: <15685.57251.14632.949497@slothrop.zope.com> References: <200207191609.g6JG91A26544@pcp02138704pcs.reston01.va.comcast.net> <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> <3.0.5.32.20020719120237.00898b60@telecommunity.com> <200207191609.g6JG91A26544@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <5.1.0.14.0.20020730082812.05e848c0@mail.telecommunity.com> At 08:36 PM 7/29/02 -0400, Jeremy Hylton wrote: >Last week, I worked out a revised transaction API for user code and >for data managers. 
It's implemented in ZODB4, but is fairly >preliminary code. I imagine we'll revise it further, but I'd like to >describe the changes briefly. I'm not sure if this new API is in relation to the proposals on this list or not, but I'm curious how this affects a few things: * The need for participants to join every transaction. This is one of my top complaints about the existing API. I have *never* had a single application where I couldn't simply register all participants to the transactions at or near startup, and never need to do so again -- if it weren't for the fact that the transaction API doesn't work that way. I have to write code that tracks whether an object has been registered with *this* transaction, and knows when the transaction is over so that it knows it needs to register again. Could we at least have a "permanent join" operation? * Arbitrarily nested, cascading participants. Does this support them? How? I don't see any mention of the issues in the interfaces. * If a data manager can't support rollback to a savepoint, what does it return? >(The need for notify-on-read, BTW, is to support higher isolation >levels than ZODB currently supports.) And to support delayed loading of attributes by multi-backend data managers. Although to support that, there'd need to be the opportunity to override the attribute value that was read. From pje@telecommunity.com Tue Jul 30 18:58:32 2002 From: pje@telecommunity.com (Phillip J.
Eby) Date: Tue, 30 Jul 2002 13:58:32 -0400 Subject: [Persistence-sig] A simple Observation API In-Reply-To: <15686.35143.15684.228405@slothrop.zope.com> References: <5.1.0.14.0.20020730082232.04cd62b0@mail.telecommunity.com> <3.0.5.32.20020729173745.008a0240@telecommunity.com> <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com> <5.1.0.14.0.20020730082232.04cd62b0@mail.telecommunity.com> Message-ID: <3.0.5.32.20020730135832.008fa690@telecommunity.com> At 08:40 AM 7/30/02 -0400, Jeremy Hylton wrote: >>>>>> "PJE" == Phillip J Eby writes: > > PJE> Well, the example implementation I wrote took care of all of > PJE> that, quite elegantly I thought. But for my purposes, it's > PJE> sufficient as long as _p_changed is set after the last > PJE> modification that occurs. It's okay if it's also set after > PJE> previous modifications. It just must be set after the last > PJE> modification, regardless of how many other times it's set. > > PJE> This requirement on my part has strictly to do with data > PJE> managers that write to other data managers, in the context of > PJE> the transaction API I proposed. > >Can you explain how _p_changed is used outside of transaction control? >I still don't understand how the timing of _p_changed affects things. > This has to do with the "write-through mode" phase between "prepareToCommit()" and "voteOnCommit()" messages (whatever you call them). During this phase, to support cascaded storage (one data manager writes to another), all data managers must "write through" any changes that occur *immediately*. They can't wait for "prepareToCommit()", because they've already received it. Basically, when the object says, "I've changed" (i.e. via "register" or "notify" or whatever you call it), the data manager must write it out right then. But, if the _p_changed flag is set *before* the change, the data manager has no way to know what the change was and write it. 
It can't wait for "voteOnCommit()", because then the DM it writes to might have already voted, for example. It *must* know about the change as soon as the change has occurred. Thus, the change message must *follow* a change. It's okay if there are multiple change messages, as long as there's at least one *after* a set of changes. Now, you may say that there are other ways to address dependencies between participants than having "write-through mode" during the prepare->vote phase. And you're right. ZPatterns certainly manages to work around this, as does Steve Alexander's TransactionAgents. TransactionAgents, however, is actually a partial rewrite of the Zope transaction machinery, and there are some holes in how ZPatterns addresses the issue as well. (ZPatterns addresses it by adding more objects to the transaction during the "commit()" calls to the data managers, that are roughly equivalent to the current "prepare()" message concept.) We could address this by having transaction participants declare their dependencies to other participants, and have the transaction do a topological sort, and send all messages in dependency order. It could then be an error to have a circular dependency, and data managers could raise an error if they received an object change message once they were done with the prepare() call. It would make the Transaction API and implementation a bit more complex, leave data managers about the same in complexity as they would have been before, and it would mean that persistent objects wouldn't need to worry about whether _p_changed was flagged before or after a change. I proposed the direction I proposed, however, because it seemed to me easier to require _p_changed to be after, than to make the transaction manage a dependency graph. 
Data managers will still have to keep track of whether they've received a prepare() message, and do something special with a change notification during that time, regardless of whether you manage dependencies or have a "write-through" mode. But, with explicit dependency management, DM's also have the extra overhead of declaring their dependencies at registration, and they lose the ability to "not know" who they depend on. In other words, some modularity/information hiding is lost if you can't have the data manager delegate to functions or objects that know "how" to write the data, without it having to know as well in order to do the registration. Plus, had I proposed dependency management, I would be now defending *that*, and I figured "_p_changed after" would be easier to justify. :) Perhaps I should have proposed dependency management instead, so that then you could have said, "oh but we could solve that more easily if we just made _p_changed be after instead of before", and then I would have said, "Oh, of course, that's brilliant". :) All joking aside, I'm not married to either approach. If you have something that'll do it better than either way, or if I've somehow overlooked a way in which this is already solved by the new ZODB4 API, please let me know. From shane@zope.com Tue Jul 30 19:40:33 2002 From: shane@zope.com (Shane Hathaway) Date: Tue, 30 Jul 2002 14:40:33 -0400 (EDT) Subject: [Persistence-sig] A simple Observation API In-Reply-To: <3.0.5.32.20020730135832.008fa690@telecommunity.com> Message-ID: On Tue, 30 Jul 2002, Phillip J. Eby wrote: > At 08:40 AM 7/30/02 -0400, Jeremy Hylton wrote: > >>>>>> "PJE" == Phillip J Eby writes: > > > > PJE> Well, the example implementation I wrote took care of all of > > PJE> that, quite elegantly I thought. But for my purposes, it's > > PJE> sufficient as long as _p_changed is set after the last > > PJE> modification that occurs. It's okay if it's also set after > > PJE> previous modifications. 
It just must be set after the last > > PJE> modification, regardless of how many other times it's set. > > > > PJE> This requirement on my part has strictly to do with data > > PJE> managers that write to other data managers, in the context of > > PJE> the transaction API I proposed. > > > >Can you explain how _p_changed is used outside of transaction control? > >I still don't understand how the timing of _p_changed affects things. > > > > This has to do with the "write-through mode" phase between > "prepareToCommit()" and "voteOnCommit()" messages (whatever you call them). > During this phase, to support cascaded storage (one data manager writes to > another), all data managers must "write through" any changes that occur > *immediately*. They can't wait for "prepareToCommit()", because they've > already received it. Basically, when the object says, "I've changed" > (i.e. via "register" or "notify" or whatever you call it), the data manager > must write it out right then. I'm having trouble understanding this. Is prepareToCommit() the first phase, and voteOnCommit() the second phase? Can't the data manager commit the data on the second phase? > But, if the _p_changed flag is set *before* the change, the data manager > has no way to know what the change was and write it. It can't wait for > "voteOnCommit()", because then the DM it writes to might have already > voted, for example. It *must* know about the change as soon as the change > has occurred. Thus, the change message must *follow* a change. It's okay > if there are multiple change messages, as long as there's at least one > *after* a set of changes. For ZODB 3 I've realized that sometimes application code needs to set _p_changed *before* making a change. Here is an example of potentially broken code:

def addDate(self, date):
    self.dates.append(date)  # self.dates is a simple list
    self.dates.sort()
    self._p_changed = 1

Let's say self.dates.sort() raises some exception that leads to an aborted transaction.
Objects are supposed to be reverted on transaction abort, but that won't happen here! The connection was never notified that there were changes, so self.dates is now out of sync. But if the application sets _p_changed just *before* the change, aborting will work. > Now, you may say that there are other ways to address dependencies between > participants than having "write-through mode" during the prepare->vote > phase. And you're right. ZPatterns certainly manages to work around this, > as does Steve Alexander's TransactionAgents. TransactionAgents, however, > is actually a partial rewrite of the Zope transaction machinery, and there > are some holes in how ZPatterns addresses the issue as well. (ZPatterns > addresses it by adding more objects to the transaction during the > "commit()" calls to the data managers, that are roughly equivalent to the > current "prepare()" message concept.) > > We could address this by having transaction participants declare their > dependencies to other participants, and have the transaction do a > topological sort, and send all messages in dependency order. It could then > be an error to have a circular dependency, and data managers could raise an > error if they received an object change message once they were done with > the prepare() call. It would make the Transaction API and implementation a > bit more complex, leave data managers about the same in complexity as they > would have been before, and it would mean that persistent objects wouldn't > need to worry about whether _p_changed was flagged before or after a change. Are you alluding to "indexing agents" and "rule agents" like we talked about before? I think we do need some kind of transaction participant ordering to support those concepts. I had in mind a simple numerical prioritization scheme. Is the need complex enough to require topological sorting? Shane From pje@telecommunity.com Tue Jul 30 20:05:39 2002 From: pje@telecommunity.com (Phillip J. 
Eby) Date: Tue, 30 Jul 2002 15:05:39 -0400 Subject: [Persistence-sig] A simple Observation API In-Reply-To: References: <3.0.5.32.20020730135832.008fa690@telecommunity.com> Message-ID: <3.0.5.32.20020730150539.0089c240@telecommunity.com> At 02:40 PM 7/30/02 -0400, Shane Hathaway wrote: >On Tue, 30 Jul 2002, Phillip J. Eby wrote: >> >> This has to do with the "write-through mode" phase between >> "prepareToCommit()" and "voteOnCommit()" messages (whatever you call them). >> During this phase, to support cascaded storage (one data manager writes to >> another), all data managers must "write through" any changes that occur >> *immediately*. They can't wait for "prepareToCommit()", because they've >> already received it. Basically, when the object says, "I've changed" >> (i.e. via "register" or "notify" or whatever you call it), the data manager >> must write it out right then. > >I'm having trouble understanding this. Is prepareToCommit() the first >phase, and voteOnCommit() the second phase? Can't the data manager commit >the data on the second phase? They're messages, not phases. The phase is the period between messages. Let's say we have DM1, DM2, and DM3, and the transaction calls: DM2.prepare() DM3.prepare() DM1.prepare() DM2.vote() DM3.vote() DM1.vote() If DM1 writes to DM3, and DM3 writes to DM2, then this ordering doesn't work, unless you have a "write-through" phase between prepare() and vote(). That is, if DM3 goes into "write-through" mode when it receives prepare(), then it will write through to DM2 when DM1 writes to it during the DM1.prepare() method. >> But, if the _p_changed flag is set *before* the change, the data manager >> has no way to know what the change was and write it. It can't wait for >> "voteOnCommit()", because then the DM it writes to might have already >> voted, for example. It *must* know about the change as soon as the change >> has occurred. Thus, the change message must *follow* a change. 
It's okay >> if there are multiple change messages, as long as there's at least one >> *after* a set of changes. > >For ZODB 3 I've realized that sometimes application code needs to set >_p_changed *before* making a change. Here is an example of potentially >broken code: > >def addDate(self, date): > self.dates.append(date) # self.dates is a simple list > self.dates.sort() > self._p_changed = 1 > >Let's say self.dates.sort() raises some exception that leads to an aborted >transaction. Objects are supposed to be reverted on transaction abort, >but that won't happen here! The connection was never notified that there >were changes, so self.dates is now out of sync. But if the application >sets _p_changed just *before* the change, aborting will work. Good point. I hadn't really thought about that use case. But the Observation API I proposed does support it, via separate beforeChange()/afterChange() notifications. A DM could track beforeChange() to know that an object needs rolling back, and afterChange(), to actually send a change through to an underlying DB, if it's in write-through mode at the time. >> Now, you may say that there are other ways to address dependencies between >> participants than having "write-through mode" during the prepare->vote >> phase. And you're right. ZPatterns certainly manages to work around this, >> as does Steve Alexander's TransactionAgents. TransactionAgents, however, >> is actually a partial rewrite of the Zope transaction machinery, and there >> are some holes in how ZPatterns addresses the issue as well. (ZPatterns >> addresses it by adding more objects to the transaction during the >> "commit()" calls to the data managers, that are roughly equivalent to the >> current "prepare()" message concept.) >> >> We could address this by having transaction participants declare their >> dependencies to other participants, and have the transaction do a >> topological sort, and send all messages in dependency order. 
It could then >> be an error to have a circular dependency, and data managers could raise an >> error if they received an object change message once they were done with >> the prepare() call. It would make the Transaction API and implementation a >> bit more complex, leave data managers about the same in complexity as they >> would have been before, and it would mean that persistent objects wouldn't >> need to worry about whether _p_changed was flagged before or after a change. > >Are you alluding to "indexing agents" and "rule agents" like we talked >about before? That's what TransactionAgents does, but that's not what I'm looking for per se. I'm looking at simple data managers. For example, if I make a data manager that persists a set of objects to an XML DOM, I might want to use it with a DOM persistence manager that stores XML documents in an SQL database. All three "data managers" (persist->XML, XML->Database, SQL database) are transaction participants, with implied or actual ordering. >I think we do need some kind of transaction participant >ordering to support those concepts. I had in mind a simple numerical >prioritization scheme. Is the need complex enough to require topological >sorting? Numerical prioritization requires that you have global knowledge of the participants, and therefore seems to go against modular usage of components, such as in my example above. Certainly, any non-circular topological relationship can be reduced to a numerical ordering. After all, Python new-style classes do it in __mro__. A topological sort using the kjbuckets module is maybe 30-40 lines of Python code, however; not much to pay, IMHO, for the amount of debugging saved by those people who would otherwise be tearing their hair out trying to figure out why something is intermittently failing because they gave two items the same numerical priority, but sometimes one of them is going first and sometimes the other one is. 
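For readers who want the dependency-ordering alternative made concrete, the following is a minimal sketch in modern Python, not code from ZODB, ZPatterns, or TransactionAgents. It assumes the standard library's graphlib module in place of kjbuckets, and the names commit_order and writes_to are hypothetical. Each participant declares which participants it writes to, and writers are ordered ahead of their targets so that a target has not yet finished prepare() when its writer flushes into it.

```python
from graphlib import CycleError, TopologicalSorter


def commit_order(writes_to):
    """Return participants ordered so that every writer precedes its targets.

    writes_to maps a participant name to the set of participant names it
    writes to (a hypothetical declaration format, assumed for this sketch).
    Raises graphlib.CycleError if the declared dependencies are circular.
    """
    # graphlib expects {node: predecessors}; a writer must come before
    # each of its targets, so the writer is a predecessor of the target.
    graph = {name: set() for name in writes_to}
    for writer, targets in writes_to.items():
        for target in targets:
            graph.setdefault(target, set()).add(writer)
    return list(TopologicalSorter(graph).static_order())


# The DM1 -> DM3 -> DM2 chain from earlier in the thread.
order = commit_order({"DM1": {"DM3"}, "DM3": {"DM2"}, "DM2": set()})
print(order)
```

Unlike numeric priorities, two participants can never collide on the same rank here, and a circular declaration fails loudly with CycleError instead of producing an intermittent ordering.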
The post-change flag approach I proposed has the advantage of determining dependencies dynamically; that is, only dependencies that actually exist will have an effect, and explicit management through priorities or dependencies isn't required. In terms of API, I'd much rather deal with the overhead of before/after change notifications (as in my suggested Observation API) than have to explicitly declare priorities or dependencies. I can much more easily verify (by testing or local code inspection) that my object obeys the observation API, than I can debug *global* and *dynamic* interaction dependencies. So in my opinion, I'd *much* rather put up with the wrapper overhead on write methods, than deal with the global debug nightmares that declaring dependencies or priorities between data managers is (again, in my opinion) likely to bring. Such issues are harder for novice developers to understand. If their class works correctly, they reason, so too should my application. All the components worked individually, why won't they work together? IMO, the principle of least surprise says they should just work, without needing to wave any additional dead chickens over the code. From guido@python.org Tue Jul 30 20:11:21 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 30 Jul 2002 15:11:21 -0400 Subject: [Persistence-sig] A simple Observation API In-Reply-To: Your message of "Mon, 29 Jul 2002 18:20:03 EDT." <3.0.5.32.20020729182003.007d2d30@telecommunity.com> References: <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com> <3.0.5.32.20020729173745.008a0240@telecommunity.com> <3.0.5.32.20020729182003.007d2d30@telecommunity.com> Message-ID: <200207301911.g6UJBLe19703@odiug.zope.com> > >Hacking bytecode is inexcusable mixing of abstraction levels. > > Huh? The bytecode spec is not part of the Python spec, it's an implementation detail. Jython, e.g., doesn't use bytecode. Neither do various systems that translate Python source code to C or machine code. 
Unfortunately I'm going to have to pull out of this thread -- I'm so far behind on my email that I can't afford trying to understand this discussion. --Guido van Rossum (home page: http://www.python.org/~guido/) From Sebastien.Bigaret@inqual.com Tue Jul 30 20:34:13 2002 From: Sebastien.Bigaret@inqual.com (Sebastien Bigaret) Date: 30 Jul 2002 21:34:13 +0200 Subject: [Persistence-sig] A simple Observation API In-Reply-To: "Phillip J. Eby"'s message of "Tue, 30 Jul 2002 13:58:32 -0400" References: <5.1.0.14.0.20020730082232.04cd62b0@mail.telecommunity.com> <3.0.5.32.20020729173745.008a0240@telecommunity.com> <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com> <5.1.0.14.0.20020730082232.04cd62b0@mail.telecommunity.com> <3.0.5.32.20020730135832.008fa690@telecommunity.com> Message-ID: <878z3t57t6.fsf@bidibule.brest.inqual.bzh> > At 08:40 AM 7/30/02 -0400, Jeremy Hylton wrote: > >>>>>> "PJE" == Phillip J Eby writes: > > > > PJE> Well, the example implementation I wrote took care of all of > > PJE> that, quite elegantly I thought. But for my purposes, it's > > PJE> sufficient as long as _p_changed is set after the last > > PJE> modification that occurs. It's okay if it's also set after > > PJE> previous modifications. It just must be set after the last > > PJE> modification, regardless of how many other times it's set. > > > > PJE> This requirement on my part has strictly to do with data > > PJE> managers that write to other data managers, in the context of > > PJE> the transaction API I proposed. > > > >Can you explain how _p_changed is used outside of transaction control? > >I still don't understand how the timing of _p_changed affects things. > > > > This has to do with the "write-through mode" phase between > "prepareToCommit()" and "voteOnCommit()" messages (whatever you call them). > During this phase, to support cascaded storage (one data manager writes to > another), all data managers must "write through" any changes that occur > *immediately*. 
They can't wait for "prepareToCommit()", because they've > already received it. Basically, when the object says, "I've changed" > (i.e. via "register" or "notify" or whatever you call it), the data manager > must write it out right then. I'd like to add a few words here, saying that cascaded storage is not the only case where "write-through" mode is involved: the so-called 'cascade', i.e. one DM writing to a lower-level one, can be ``transverse'' as well, i.e. one DM writing to another one, at the same 'level'. Just an example here: say you have DM1 and DM2 being responsible for RDBMS DB1 and DB2. If obj1 and obj2 are to be stored within, resp., DB1 and DB2, then you can have that sort of ``write-through'' mechanism being triggered as well. The reason for this is that, if obj1 and obj2 are in relation with each other, and since informations needed for relationships are mostly stored in RDBMS in an asymetrical manner (put it simply: this info==a foreign key, stored in only one of the two tables), a change in one of the object needs to be forwarded to the other DM. Humm... Having writing this, I'm not sure this is related to what you're saying here, mainly because the forwarded informations I'm talking about is *not* in the object's properties... or is it? Well, changes are *not* in the original obj1's properties (although changes might be propagated bottom-up, but that's another story), but changes are made in the corresponding 'row1' 's properties (at DM1 level). So, if DM1 is already in write-through mode, it will in turn immediately notify/write to its SQL-database-connection-DM. We will potentially have more than one SQL statement issued for a single row/whatever, but the necessary informations about the whole architecture and dependencies (which the DMs do know) do not have to be put into the Transaction framework. 
If this is it, it makes me think that it is like having the DMs calling a (reentrant) version of 'prepareCommit()' on their level-1 DMs --but the actual forwarding of the message is not explicit, rather made implicit through the 'write-through' mode. Is this what you mean? -- Sebastien. From pje@telecommunity.com Tue Jul 30 20:55:15 2002 From: pje@telecommunity.com (Phillip J. Eby) Date: Tue, 30 Jul 2002 15:55:15 -0400 Subject: [Persistence-sig] A simple Observation API In-Reply-To: <878z3t57t6.fsf@bidibule.brest.inqual.bzh> References: <"Phillip J. Eby"'s message of "Tue, 30 Jul 2002 13:58:32 -0400"> <5.1.0.14.0.20020730082232.04cd62b0@mail.telecommunity.com> <3.0.5.32.20020729173745.008a0240@telecommunity.com> <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com> <5.1.0.14.0.20020730082232.04cd62b0@mail.telecommunity.com> <3.0.5.32.20020730135832.008fa690@telecommunity.com> Message-ID: <3.0.5.32.20020730155515.0089ea00@telecommunity.com> At 09:34 PM 7/30/02 +0200, Sebastien Bigaret wrote: > >I'd like to add a few words here, saying that cascaded storage is not the only >case where "write-through" mode is involved: the so-called 'cascade', i.e. one >DM writing to a lower-level one, can be ``transverse'' as well, i.e. one DM >writing to another one, at the same 'level'. Just an example here: say you >have DM1 and DM2 being responsible for RDBMS DB1 and DB2. If obj1 and obj2 are >to be stored within, resp., DB1 and DB2, then you can have that sort of >``write-through'' mechanism being triggered as well. The reason for this is >that, if obj1 and obj2 are in relation with each other, and since informations >needed for relationships are mostly stored in RDBMS in an asymetrical manner >(put it simply: this info==a foreign key, stored in only one of the two >tables), a change in one of the object needs to be forwarded to the other DM. > > Humm... 
> > Having writing this, I'm not sure this is related to what you're saying > here, mainly because the forwarded informations I'm talking about is *not* > in the object's properties... or is it? Well, changes are *not* in the > original obj1's properties (although changes might be propagated bottom-up, > but that's another story), but changes are made in the corresponding 'row1' > 's properties (at DM1 level). So, if DM1 is already in write-through mode, > it will in turn immediately notify/write to its > SQL-database-connection-DM. We will potentially have more than one SQL > statement issued for a single row/whatever, but the necessary informations > about the whole architecture and dependencies (which the DMs do know) do not > have to be put into the Transaction framework. > > If this is it, it makes me think that it is like having the DMs calling a > (reentrant) version of 'prepareCommit()' on their level-1 DMs --but the > actual forwarding of the message is not explicit, rather made implicit > through the 'write-through' mode. > >Is this what you mean? > Yes, if I've understood you correctly. My point was that it's easier to implement scenarios such as you described, with the "write-throughs during commit" algorithm, as it doesn't need to explicitly track all those dependencies. Yes, it may cause occasional inefficient write operations when there is a complex cascade taking place, and the participants are registered in a less-than-optimal order, but the idea is for "complex things to be possible", while keeping simple things simple, and ideally to guarantee correctness. That's why, in the absence of other information to the contrary, I favor the "write-throughs during commit" algorithm for handling dependencies.
It scales the best for complex scenarios, guarantees correctness for any non-circular dependency graph, and involves the least code to be written for even the simplest cases, with the possible exception of how persistent objects issue change notifications. From shane@zope.com Tue Jul 30 21:02:28 2002 From: shane@zope.com (Shane Hathaway) Date: Tue, 30 Jul 2002 16:02:28 -0400 (EDT) Subject: [Persistence-sig] A simple Observation API In-Reply-To: <3.0.5.32.20020730150539.0089c240@telecommunity.com> Message-ID: On Tue, 30 Jul 2002, Phillip J. Eby wrote: > They're messages, not phases. The phase is the period between messages. Yep. :-) > Let's say we have DM1, DM2, and DM3, and the transaction calls: > > DM2.prepare() > DM3.prepare() > DM1.prepare() > > DM2.vote() > DM3.vote() > DM1.vote() > > If DM1 writes to DM3, and DM3 writes to DM2, then this ordering doesn't > work, unless you have a "write-through" phase between prepare() and vote(). > That is, if DM3 goes into "write-through" mode when it receives prepare(), > then it will write through to DM2 when DM1 writes to it during the > DM1.prepare() method. I see now. From one perspective, this problem is a side effect of keeping transaction participants registered between transactions, as you've been suggesting. ZODB 3's transaction manager would normally have no problem with this, since DM3 and DM2 would only get added to the transaction once DM1 started writing. The implicit order would solve the problem. Unfortunately, this solution has a weakness--if some other data manager wrote unrelated data to DM3 or DM2 before DM1 wrote its data, the implicit order would be incorrect. Thus the need for transaction agents, which guarantee a specific order (if I recall correctly). Write-through mode seems like a performance killer for many applications. What about this: transaction participants could tell the transaction that even though their prepare() method has been called already, they need it called again. 
Shane From pje@telecommunity.com Tue Jul 30 21:17:54 2002 From: pje@telecommunity.com (Phillip J. Eby) Date: Tue, 30 Jul 2002 16:17:54 -0400 Subject: [Persistence-sig] A simple Observation API In-Reply-To: References: <3.0.5.32.20020730150539.0089c240@telecommunity.com> Message-ID: <3.0.5.32.20020730161754.008701c0@telecommunity.com> At 04:02 PM 7/30/02 -0400, Shane Hathaway wrote: > >I see now. From one perspective, this problem is a side effect of keeping >transaction participants registered between transactions, as you've been >suggesting. ZODB 3's transaction manager would normally have no problem >with this, since DM3 and DM2 would only get added to the transaction once >DM1 started writing. The implicit order would solve the problem. > >Unfortunately, this solution has a weakness--if some other data manager >wrote unrelated data to DM3 or DM2 before DM1 wrote its data, the implicit >order would be incorrect. Thus the need for transaction agents, which >guarantee a specific order (if I recall correctly). Right; this is why I described both ZPatterns and TransactionAgents as being hacks. We rely on the use of registration order to force things to work in most cases, or re-registration. It's a pretty ugly hack. >Write-through mode seems like a performance killer for many applications. >What about this: transaction participants could tell the transaction that >even though their prepare() method has been called already, they need it >called again. The only drawback I'm aware of for that approach, is that it leads to an infinite loop instead of a stack overflow, in the event you accidentally create a circular dependency graph. The infinite loop doesn't produce a traceback, and thus doesn't show you *how* you created the circularity. 
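[Editorial sketch, not part of the original message.] The "call prepare() again" idea and the circularity concern can be made concrete with a toy coordinator. Everything here is an assumption for illustration: the names `Coordinator`, `DataManager`, `request_prepare` and the `max_factor` multiplier are hypothetical and are not taken from ZODB or from any API proposed on this list. A participant that is written to after it has been prepared asks to be prepared again, and the coordinator gives up once the number of prepare() calls exceeds a multiple of the participant count.

```python
class CircularDependencyError(Exception):
    """Raised when re-prepare requests never settle."""


class Coordinator:
    """Hypothetical coordinator; a sketch, not the ZODB transaction API."""

    def __init__(self, max_factor=3):
        self.participants = []
        self.max_factor = max_factor   # assumed cap: calls <= factor * participants
        self._dirty = set()

    def join(self, participant):
        self.participants.append(participant)

    def request_prepare(self, participant):
        # Called by a participant that received a write after prepare().
        self._dirty.add(participant)

    def prepare_all(self):
        """Send prepare() until nobody asks for another round; return call count."""
        self._dirty = set(self.participants)
        limit = self.max_factor * len(self.participants)
        calls = 0
        while self._dirty:
            if calls >= limit:
                raise CircularDependencyError(
                    "still dirty after %d prepare() calls" % calls)
            # Always pick the earliest-registered dirty participant.
            participant = next(p for p in self.participants if p in self._dirty)
            self._dirty.discard(participant)
            participant.prepare(self)
            calls += 1
        return calls


class DataManager:
    """Toy participant that forwards every flushed item to a downstream DM."""

    def __init__(self, name, downstream=None):
        self.name = name
        self.downstream = downstream
        self.pending = []
        self.flushed = []
        self.prepared = False

    def write(self, item, coordinator):
        self.pending.append(item)
        if self.prepared:
            # Already prepared in this commit: ask to be called again.
            coordinator.request_prepare(self)

    def prepare(self, coordinator):
        self.prepared = True
        items, self.pending = self.pending, []
        for item in items:
            self.flushed.append(item)
            if self.downstream is not None:
                self.downstream.write(item, coordinator)
```

With participants registered in the order DM2, DM3, DM1, where DM1 writes to DM3 and DM3 writes to DM2, the loop settles after two extra prepare() calls; a two-node cycle hits the cap and raises instead of spinning forever.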
I suppose you could require that the number of times you loop through a list sending prepare() calls is no greater than some multiplier of the total number of participants, and then at least you could detect what seems like a runaway dependency. Printing out what the loop *was* could be hard, though, and the information would not show you as directly how the loop occurred. But I'm willing to bend on this point, since I think even accidental circularity is likely to be rare, that when it does occur you're likely to have known there was a risk of it, and that you'll be likely to know where to look for where it occurred. It's a lot different than the risk of out-of-order commits, which could occur with explicit dependency management for even very simple scenarios. Also, I think a different method should be used for the second prepare() call - perhaps a flush() method. That way, prepare() won't need to be able to be called twice during the same commit, which I can see some problems with. prepare() could simply call flush(), or perhaps the transaction could do it. flush() should be written so as to be usable at any point in the transaction, since it'll presumably be used to implement savepoints as well, and in some cases to ensure an underlying DB is up-to-date before performing a query. I do like the simplification of not needing a "write-through" mode, although in reality all we are doing is replacing it with a "re-flush" mode. That is, once a participant receives prepare(), it must respond to any future change notifications by requesting a re-call of flush() by the transaction. By the way, I'd still like to have the option of having participants join a transaction "permanently", in order to avoid all of the state management code that such things currently require. With the exception of the above issues, I'm good with this approach. Brilliant idea, Shane. 
:) From shane@zope.com Tue Jul 30 21:55:47 2002 From: shane@zope.com (Shane Hathaway) Date: Tue, 30 Jul 2002 16:55:47 -0400 (EDT) Subject: [Persistence-sig] A simple Observation API In-Reply-To: <3.0.5.32.20020730161754.008701c0@telecommunity.com> Message-ID: On Tue, 30 Jul 2002, Phillip J. Eby wrote: > At 04:02 PM 7/30/02 -0400, Shane Hathaway wrote: > > >Write-through mode seems like a performance killer for many applications. > >What about this: transaction participants could tell the transaction that > >even though their prepare() method has been called already, they need it > >called again. > > The only drawback I'm aware of for that approach, is that it leads to an > infinite loop instead of a stack overflow, in the event you accidentally > create a circular dependency graph. The infinite loop doesn't produce a > traceback, and thus doesn't show you *how* you created the circularity. Good point. OTOH, from my own experience, stack overflows in Python sometimes lead to segfaults, and I'd prefer an infinite loop over a segfault. :-) > (snip) > > Also, I think a different method should be used for the second prepare() > call - perhaps a flush() method. That way, prepare() won't need to be able > to be called twice during the same commit, which I can see some problems > with. prepare() could simply call flush(), or perhaps the transaction > could do it. flush() should be written so as to be usable at any point in > the transaction, since it'll presumably be used to implement savepoints as > well, and in some cases to ensure an underlying DB is up-to-date before > performing a query. Yes, flush() is a good idea. It keeps the phase change distinct from the repeatable messages, and its purpose would be well understood. > I do like the simplification of not needing a "write-through" mode, > although in reality all we are doing is replacing it with a "re-flush" > mode. 
That is, once a participant receives prepare(), it must respond to > any future change notifications by requesting a re-call of flush() by the > transaction. > > By the way, I'd still like to have the option of having participants join a > transaction "permanently", in order to avoid all of the state management > code that such things currently require. Yes, that sounds useful for logging, periodic backups (to ensure the backup is based on fully committed data), and other utilities. As long as joining permanently is optional, since objects like CommitVersions don't need to stick around. Now, I wonder about multithreaded apps. If you join a transaction permanently, do you join all threads? At first I wasn't thinking you would, but on further reflection, it seems like that's what you'd want. And how would this affect CORBA (since, from what I hear, its transactions are not bound to threads)? > With the exception of the above issues, I'm good with this approach. > Brilliant idea, Shane. :) Thanks. You too. Shane From Sebastien.Bigaret@inqual.com Wed Jul 31 00:35:09 2002 From: Sebastien.Bigaret@inqual.com (Sebastien Bigaret) Date: 31 Jul 2002 01:35:09 +0200 Subject: [Persistence-sig] A simple Observation API In-Reply-To: Shane Hathaway's message of "Tue, 30 Jul 2002 16:55:47 -0400 (EDT)" References: Message-ID: <87d6t4zt5e.fsf@inqual.com> > Phillip> (snip) > Phillip> Also, I think a different method should be used for the > Phillip> second prepare() call - perhaps a flush() method. That way, > Phillip> prepare() won't need to be able to be called twice during the > Phillip> same commit, which I can see some problems with. prepare() > Phillip> could simply call flush(), or perhaps the transaction could > Phillip> do it. 
flush() should be written so as to be usable at any > Phillip> point in the transaction, since it'll presumably be used to > Phillip> implement savepoints as well, and in some cases to ensure an > Phillip> underlying DB is up-to-date before performing a query. Shane> Yes, flush() is a good idea. It keeps the phase change Shane> distinct from the repeatable messages, and its purpose would be Shane> well understood. +1, this sounds good. Shane> Now, I wonder about multithreaded apps. If you join a Shane> transaction permanently, do you join all threads? At first I Shane> wasn't thinking you would, but on further reflection, it seems Shane> like that's what you'd want. And how would this affect CORBA Shane> (since, from what I hear, its transactions are not bound to Shane> threads)? Could you be more explicit? It seems strange to me. For me and the applications I usually address, DataManagers are most of the time bound to a ``session'' idea (just like Sessions in Zope or in any http-based app., i.e. a set of objects being modified by subsequent requests from users until it reaches the point where it needs to be made persistent, independently from each other). --> What do you think of making it possible for DM-factories to permanently join transactions, so that it is possible to do whatever is appropriate for a given situation/application? (e.g. w/ factories returning a singleton if you want to join all threads, or a session-specific DM, or thread-specific DM if you need to, etc.) NB: just before posting I have a doubt about what you're actually talking about. I know I'm mixing MT and sessions in an unreasonable manner above, but the point is that I'm basically thinking 'joining' in terms of 'initialization of a transaction's participants'. Maybe I misunderstood the whole stuff here. Phillip wrote about this: Phillip> * The need for participants to join every transaction. This Phillip> is one of my top complaints about the existing API.
I have Phillip> *never* had a single application where I couldn't simply Phillip> register all participants to the transactions at or near Phillip> startup, and never need to do so again -- if it weren't for Phillip> the fact that the transaction API doesn't work that way. I Phillip> have to write code that tracks whether an object has been Phillip> registered with *this* transaction, and knows when the Phillip> transaction is over so that it knows it needs to register Phillip> again ...I can't decide whether you are talking about initialization of a transaction _instance_. The last sentence suggests that participants are unregistered when the transaction closes: do you mean destroyed, or commit/rollback time? If this is the latter case, then I guess I have missed something, since I cannot find any references in the previous threads about participants being unregistered at that point. If this is the first case (hence, making it possible to generate a given set of DataManagers for each new transaction), then my proposal for DM-factories might be meaningful. -- Sebastien. From pje@telecommunity.com Wed Jul 31 00:53:20 2002 From: pje@telecommunity.com (Phillip J. Eby) Date: Tue, 30 Jul 2002 19:53:20 -0400 Subject: [Persistence-sig] Threads and transactions (was Observation API) In-Reply-To: References: <3.0.5.32.20020730161754.008701c0@telecommunity.com> Message-ID: <5.1.0.14.0.20020730194212.05d95460@mail.telecommunity.com> At 04:55 PM 7/30/02 -0400, Shane Hathaway wrote: >On Tue, 30 Jul 2002, Phillip J. Eby wrote: > > > > By the way, I'd still like to have the option of having participants join a > > transaction "permanently", in order to avoid all of the state management > > code that such things currently require. > >Yes, that sounds useful for logging, periodic backups (to ensure the >backup is based on fully committed data), and other utilities. As long as >joining permanently is optional, since objects like CommitVersions don't >need to stick around. 
Also, in my use cases, certain caches want to clear themselves on transactional boundaries. >Now, I wonder about multithreaded apps. If you join a transaction >permanently, do you join all threads? At first I wasn't thinking you >would, but on further reflection, it seems like that's what you'd want. >And how would this affect CORBA (since, from what I hear, its transactions >are not bound to threads)? A permanent join should be to *that* transaction object only. Anything else implies too much policy, IMHO. For my use cases, I will normally have at most one transaction per thread. This is the normal use case for Zope also, I believe. In the event that you have an object which can safely participate in multiple transactions simultaneously, then by all means you should be able to register it with them. I think that the transaction API *should* provide at least some nominal support for associating a transaction with a thread, or automatically creating per-thread transactions, if only because ZODB has supported that in the past. Java's JTA also assumes that the default use case is transaction-per-thread. But, the minimum I would like to see is that a transaction object should be reusable over and over. As long as that's the case, a permanent join is useful, since I can declare a transaction object, associate things with it, and proceed about my business. I think most people, however, will want to be able to do something similar to ZODB's existing "get_transaction()" function to get a singleton Transaction object, to do basic single-threaded, single-transaction applications. And simple things should be simple. From pje@telecommunity.com Wed Jul 31 00:57:36 2002 From: pje@telecommunity.com (Phillip J. 
Eby) Date: Tue, 30 Jul 2002 19:57:36 -0400 Subject: [Persistence-sig] Clearing participants (was Observation API) In-Reply-To: <87d6t4zt5e.fsf@inqual.com> References: Message-ID: <5.1.0.14.0.20020730195406.0267fdd0@mail.telecommunity.com> At 01:35 AM 7/31/02 +0200, Sebastien Bigaret wrote: >Phillip> * The need for participants to join every transaction. This >Phillip> is one of my top complaints about the existing API. I have >Phillip> *never* had a single application where I couldn't simply >Phillip> register all participants to the transactions at or near >Phillip> startup, and never need to do so again -- if it weren't for >Phillip> the fact that the transaction API doesn't work that way. I >Phillip> have to write code that tracks whether an object has been >Phillip> registered with *this* transaction, and knows when the >Phillip> transaction is over so that it knows it needs to register >Phillip> again > > ...I can't decide whether you are talking about initialization of a > transaction _instance_. The last sentence suggests that participants > are unregistered when the transaction closes: do you mean destroyed, > or commit/rollback time? If this is the latter case, then I guess I > have missed something, since I cannot find any references in the > previous threads about participants being unregistered at that > point. If this is the first case (hence, making it possible to > generate a given set of DataManagers for each new transaction), then > my proposal for DM-factories might be meaningful. Sorry, the "transaction API" and "existing API" I referred to is the currently available transaction API in Zope/ZODB, not the API I proposed on this list. The old Zope/ZODB transaction API requires registration for each transaction lifecycle; the registration list is cleared upon every commit or rollback. My motivation for making registration permanent in my "Straw Man" transaction API proposal was to counteract this. 
In the API Shane and I are discussing, there would be an option to register with a transaction instance such that the registration would remain across commit/rollback boundaries. From shane@zope.com Wed Jul 31 03:26:33 2002 From: shane@zope.com (Shane Hathaway) Date: Tue, 30 Jul 2002 22:26:33 -0400 (EDT) Subject: [Persistence-sig] A simple Observation API In-Reply-To: <87d6t4zt5e.fsf@inqual.com> Message-ID: On 31 Jul 2002, Sebastien Bigaret wrote: > ...I can't decide whether you are talking about initialization of a > transaction _instance_. The last sentence suggests that participants > are unregistered when the transaction closes: do you mean destroyed, > or commit/rollback time? If this is the latter case, then I guess I > have missed something, since I cannot find any references in the > previous threads about participants being unregistered at that > point. If this is the first case (hence, making it possible to > generate a given set of DataManagers for each new transaction), then > my proposal for DM-factories might be meaningful. The terminology we're using is a little confusing, since an object that is truly a transaction should probably begin its life at the beginning of a transaction and, at commit or rollback time, should become permanently immutable. It might even be stored in the database. But the things we've been calling transactions play a role more like transaction "coordinators". As coordinators, they might be reused for numerous non-overlapping transactions. If they are reused, it makes sense to be able to register a permanent transaction participant with a specific coordinator. I think there might be a problem, though. ZODB customarily uses one transaction coordinator per thread. But ZODB connections are not really thread-specific; they may be reused in a different thread when they are opened or closed.
So if, for example, you registered a permanent transaction participant that cleared the cache of a specific ZODB connection, you wouldn't get the effect you wanted! :-) That's why I suggested that if you want permanent participants, that perhaps you'd really want to register the transaction participant for all threads. It requires you to consider thread safety, but I think you'd frequently have to consider that anyway. Shane From jeremy@alum.mit.edu Wed Jul 31 03:51:13 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Tue, 30 Jul 2002 22:51:13 -0400 Subject: [Persistence-sig] Clearing participants (was Observation API) In-Reply-To: <5.1.0.14.0.20020730195406.0267fdd0@mail.telecommunity.com> References: <5.1.0.14.0.20020730195406.0267fdd0@mail.telecommunity.com> Message-ID: <15687.20641.456885.879684@slothrop.zope.com> >>>>> "PJE" == Phillip J Eby writes: PJE> My motivation for making registration permanent in my "Straw PJE> Man" transaction API proposal was to counteract this. In the PJE> API Shane and I are discussing, there would be an option to PJE> register with a transaction instance such that the registration PJE> would remain across commit/rollback boundaries. I don't think it makes sense to talk about a single transaction that spans multiple commits. A transaction ends with a commit or an abort. If you do something after that, it's a different transaction. I agree, however, that it is worth discussing 1) what mechanisms are needed for associating threads with transactions in order to support a range of policies and 2) how a resource manager can express its interest in all (some?) transactions. The second issue probably depends on the first, but not vice versa. 
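[Editorial sketch, not part of the original message.] The two questions raised here, how threads are associated with transactions and how a participant stays registered across successive transactions, can be illustrated with a small coordinator that is reused for non-overlapping transactions. This is only a sketch under stated assumptions: `Coordinator`, `CountingParticipant`, `get_coordinator` and the `permanent` flag are hypothetical names, the commit is single-phase for brevity, and the one-coordinator-per-thread policy is just one of the policies mentioned in the thread.

```python
import threading


class Coordinator:
    """Hypothetical reusable coordinator; one-shot vs. permanent participants."""

    def __init__(self):
        self._permanent = []   # survive commit/abort boundaries
        self._current = []     # cleared at the end of every transaction
        self.completed = 0

    def join(self, participant, permanent=False):
        target = self._permanent if permanent else self._current
        if participant not in target:
            target.append(participant)

    def participants(self):
        return self._permanent + self._current

    def commit(self):
        for participant in self.participants():
            participant.commit()
        self._finish()

    def abort(self):
        for participant in self.participants():
            participant.abort()
        self._finish()

    def _finish(self):
        # The logical transaction is over; only permanent joins remain.
        self._current = []
        self.completed += 1


class CountingParticipant:
    """Toy participant that only counts the messages it receives."""

    def __init__(self):
        self.commits = 0
        self.aborts = 0

    def commit(self):
        self.commits += 1

    def abort(self):
        self.aborts += 1


_local = threading.local()


def get_coordinator():
    """Return the calling thread's coordinator, creating it on first use."""
    try:
        return _local.coordinator
    except AttributeError:
        _local.coordinator = Coordinator()
        return _local.coordinator
```

A cache that wants to clear itself on every transaction boundary would join once with `permanent=True`, while a data manager touched by a single transaction joins without it and is dropped after commit or abort. Each commit or abort still ends one logical transaction; only the coordinator object is reused.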
Jeremy From iiourov@yahoo.com Wed Jul 31 07:32:40 2002 From: iiourov@yahoo.com (Ilia Iourovitski) Date: Tue, 30 Jul 2002 23:32:40 -0700 (PDT) Subject: [Persistence-sig] A simple Observation API In-Reply-To: Message-ID: <20020731063240.55579.qmail@web20705.mail.yahoo.com> If an object participates in more than one transaction concurrently, the transaction API should provide locks. It should be possible to acquire a read/write lock in the same fashion as an RDBMS lets a client lock a row using select for update. In a number of cases, transactions alone, without locks, do not guarantee problem-free concurrency control. A typical example is a user account balance which can be updated by the user through the web and at the same time by a monthly batch process. Ilia --- Shane Hathaway wrote: > On 31 Jul 2002, Sebastien Bigaret wrote: > > > ...I can't decide whether you are talking about > initialization of a > > transaction _instance_. The last sentence > suggests that participants > > are unregistered when the transaction closes: do > you mean destroyed, > > or commit/rollback time? If this is the latter > case, then I guess I > > have missed something, since I cannot find any > references in the > > previous threads about participants being > unregistered at that > > point. If this is the first case (hence, making > it possible to > > generate a given set of DataManagers for each > new transaction), then > > my proposal for DM-factories might be > meaningful. > > The terminology we're using is a little confusing, > since an object that is > truly a transaction should probably begin its life > at the beginning of a > transaction and, at commit or rollback time, should > become permanently > immutable. It might even be stored in the database. > > But the things we've been calling transactions play > a role more like > transaction "coordinators". As coordinators, they > might be reused for > numerous non-overlapping transactions.
If they are > reused, it makes > sense to be able to register a permanent transaction > participant with a > specific coordinator. > > I think there might be a problem, though. ZODB > customarily uses one > transaction coordinator per thread. But ZODB > connections are not really > thread-specific; they may be reused in a different > thread when they are > opened or closed. So if, for example, you > registered a permanent > transaction participant that cleared the cache of a > specific ZODB > connection, you wouldn't get the effect you wanted! > :-) > > That's why I suggested that if you want permanent > participants, that > perhaps you'd really want to register the > transaction participant for all > threads. It requires you to consider thread safety, > but I think you'd > frequently have to consider that anyway. > > Shane From niki@vintech.bg Wed Jul 31 09:29:41 2002 From: niki@vintech.bg (Niki Spahiev) Date: Wed, 31 Jul 2002 11:29:41 +0300 Subject: [Persistence-sig] A simple Observation API References: <3.0.5.32.20020730135832.008fa690@telecommunity.com> <3.0.5.32.20020730150539.0089c240@telecommunity.com> Message-ID: <3D479FF5.5010808@vintech.bg> Phillip J. Eby wrote:
>>def addDate(self, date):
>>    self.dates.append(date) # self.dates is a simple list
>>    self.dates.sort()
>>    self._p_changed = 1
>>
>>Let's say self.dates.sort() raises some exception that leads to an aborted >>transaction. Objects are supposed to be reverted on transaction abort, >>but that won't happen here! The connection was never notified that there >>were changes, so self.dates is now out of sync. But if the application >>sets _p_changed just *before* the change, aborting will work.
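[Editorial sketch, not part of the original message.] The ordering problem in the quoted addDate() example can be reproduced with a toy stand-in for a persistent base class. The assumptions are loud ones: `ToyPersistent` is NOT ZODB's `Persistent`, the snapshot-on-first-notification behaviour is an illustrative simplification of "the connection knows the object must be reverted", and the method names `add_date_late` and `add_date_early` are hypothetical.

```python
import copy


class ToyPersistent:
    """Stand-in base class (an assumption, not ZODB's Persistent).

    Setting _p_changed to a true value for the first time snapshots the
    object's state so that abort() can restore it.
    """

    _snapshot = None

    @property
    def _p_changed(self):
        return self._snapshot is not None

    @_p_changed.setter
    def _p_changed(self, value):
        if value and self._snapshot is None:
            state = {k: v for k, v in vars(self).items() if k != "_snapshot"}
            self._snapshot = copy.deepcopy(state)

    def abort(self):
        # Restore the state recorded at the first change notification, if any.
        snapshot = self._snapshot
        if snapshot is not None:
            vars(self).clear()
            vars(self).update(snapshot)

    def commit(self):
        self._snapshot = None


class DateBook(ToyPersistent):
    def __init__(self):
        self.dates = []

    def add_date_late(self, date):
        self.dates.append(date)
        self.dates.sort()          # may raise before the notification below
        self._p_changed = 1

    def add_date_early(self, date):
        self._p_changed = 1        # notify first, then mutate
        self.dates.append(date)
        self.dates.sort()
```

If sort() raises (for instance because a string is mixed in with integers), the "late" variant never notified anyone, so abort() has nothing to restore and the bad entry stays in the list; the "early" variant is restored to its earlier contents.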
> > > Good point. I hadn't really thought about that use case. But the > Observation API I proposed does support it, via separate > beforeChange()/afterChange() notifications. A DM could track > beforeChange() to know that an object needs rolling back, and > afterChange(), to actually send a change through to an underlying DB, if > it's in write-through mode at the time. Maybe this will solve it?
def addDate(self, date):
    self._p_changed = 1
    self.dates.append(date) # self.dates is a simple list
    self.dates.sort()
    self._p_changed = 1
_p_changed before *and* after? regards, Niki Spahiev From pje@telecommunity.com Wed Jul 31 13:06:10 2002 From: pje@telecommunity.com (Phillip J. Eby) Date: Wed, 31 Jul 2002 08:06:10 -0400 Subject: [Persistence-sig] A simple Observation API In-Reply-To: References: <87d6t4zt5e.fsf@inqual.com> Message-ID: <5.1.0.14.0.20020731075426.050ed220@mail.telecommunity.com> At 10:26 PM 7/30/02 -0400, Shane Hathaway wrote: >But the things we've been calling transactions play a role more like >transaction "coordinators". As coordinators, they might be reused for >numerous non-overlapping transactions. If they are reused, it makes >sense to be able to register a permanent transaction participant with a >specific coordinator. > >I think there might be a problem, though. ZODB customarily uses one >transaction coordinator per thread. But ZODB connections are not really >thread-specific; they may be reused in a different thread when they are >opened or closed. So if, for example, you registered a permanent >transaction participant that cleared the cache of a specific ZODB >connection, you wouldn't get the effect you wanted! :-) Well, ZODB could always: 1. do as it does now, and register non-permanently, or 2. pool the transaction with the connection. (See below.) >That's why I suggested that if you want permanent participants, that >perhaps you'd really want to register the transaction participant for all >threads.
It requires you to consider thread safety, but I think you'd >frequently have to consider that anyway. In my use case, the transaction will live as an attribute of a root "application" object, and application objects will be pooled for use by different threads. Application objects also contain as attributes all their connections, data managers, etc. So everything's pooled together, and there's no question of which transaction goes with what. This approach is virtually identical to what Zope does now, except that Zope keeps the transaction with the thread, instead of with the resource pool. From barry@python.org Wed Jul 31 13:21:48 2002 From: barry@python.org (Barry A. Warsaw) Date: Wed, 31 Jul 2002 08:21:48 -0400 Subject: [Persistence-sig] A simple Observation API References: <3.0.5.32.20020730135832.008fa690@telecommunity.com> <3.0.5.32.20020730150539.0089c240@telecommunity.com> <3D479FF5.5010808@vintech.bg> Message-ID: <15687.54876.929262.735189@anthem.wooz.org> >>>>> "NS" == Niki Spahiev writes: | def addDate(self, date): | self._p_changed = 1 | self.dates.append(date) # self.dates is a simple list | self.dates.sort() | self._p_changed = 1 NS> _p_changed before *and* after? Seems unnecessarily redundant. IIUC, setting _p_changed to 1 will register the object so setting it twice simply registers it twice, which doesn't seem very useful. I guess when to set _p_changed will be a decision that the object designer will have to make based on the semantics of the object, and the operation. -Barry From pje@telecommunity.com Wed Jul 31 14:20:52 2002 From: pje@telecommunity.com (Phillip J. 
Eby) Date: Wed, 31 Jul 2002 09:20:52 -0400 Subject: [Persistence-sig] When to set _p_changed (was A simple Observation API) In-Reply-To: <15687.54876.929262.735189@anthem.wooz.org> References: <3.0.5.32.20020730135832.008fa690@telecommunity.com> <3.0.5.32.20020730150539.0089c240@telecommunity.com> <3D479FF5.5010808@vintech.bg> Message-ID: <3.0.5.32.20020731092052.019c1210@telecommunity.com> At 08:21 AM 7/31/02 -0400, Barry A. Warsaw wrote: > >I guess when to set _p_changed will >be a decision that the object designer will have to make based on the >semantics of the object, and the operation. > Actually, if Shane's proposal for how to handle cascaded data managers is used, then it will be unequivocal that _p_changed *must* be set *before* the change, in order to ensure proper rollback behavior. A principal drawback to the approach that I had been proposing, was that it required _p_changed to be set *after* a change, which wasn't good for being able to ensure that rollbacks would always be handled correctly. From shane@zope.com Wed Jul 31 14:50:27 2002 From: shane@zope.com (Shane Hathaway) Date: Wed, 31 Jul 2002 09:50:27 -0400 (EDT) Subject: [Persistence-sig] A simple Observation API In-Reply-To: <5.1.0.14.0.20020731075426.050ed220@mail.telecommunity.com> Message-ID: On Wed, 31 Jul 2002, Phillip J. Eby wrote: > In my use case, the transaction will live as an attribute of a root > "application" object, and application objects will be pooled for use by > different threads. Application objects also contain as attributes all > their connections, data managers, etc. So everything's pooled together, > and there's no question of which transaction goes with what. > > This approach is virtually identical to what Zope does now, except that > Zope keeps the transaction with the thread, instead of with the resource pool. That's an interesting idea. It should work well (though you'll have a bootstrapping issue ;-) ). 
It may not be necessary, though, for all transaction coordinators to provide a method for registering a permanent participant. It could be a method of a different interface. Not all coordinators (e.g. non-pooled ones) will be able to fulfill the contract as expected. Shane From jim@zope.com Wed Jul 31 19:23:16 2002 From: jim@zope.com (Jim Fulton) Date: Wed, 31 Jul 2002 14:23:16 -0400 Subject: [Persistence-sig] "Straw Baby" Persistence API References: <5.1.0.14.0.20020723140040.0519cc90@mail.telecommunity.com> <5.1.0.14.0.20020723150312.04f70390@mail.telecommunity.com> Message-ID: <3D482B14.4020506@zope.com> Phillip J. Eby wrote: > At 11:35 AM 7/23/02 -0700, Ilia Iourovitski wrote: > ... >> > The most straightforward way to handle queries is by >> > creating query data >> > managers, which take OIDs that represent the >> > parameters of the query. >> > >> Most of the time people retrieve objects by attributes, >> not by OID. > > > Right. So define a query manager that takes the attributes as fields in > an OID, and returns a persistent object that represents a sequence of > records. e.g. > > for object in someQueryMgr[ ('param1value','param2value') ]: > ... > > All you need is a separate query manager for each (parameterized) query > your app needs -- and again, there's nothing stopping you from > generating your own via metadata or even from OQL if that's your heart's > desire. I think queries are entirely different beasts from oids. I would be inclined to see a data-manager specific query interface for queries. Jim -- Jim Fulton mailto:jim@zope.com Python Powered!
CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From jim@zope.com Wed Jul 31 19:35:57 2002 From: jim@zope.com (Jim Fulton) Date: Wed, 31 Jul 2002 14:35:57 -0400 Subject: [Persistence-sig] "Straw Baby" Persistence API References: <20020723170836.41755.qmail@web20702.mail.yahoo.com> Message-ID: <3D482E0D.6070303@zope.com> Ilia Iourovitski wrote: > --- Jim Fulton wrote: > ... >>>create(object) storage shall populated id from >>> >>rdbms >> >>>which is usually primary key. >>> >>This should not be necessary. One should be able to >>design a data manager that detected new objects and >>assigned them ids when referencing objects are >>created. >> > > Typical storage (rdbms, odbms, xml like xindicea) > do not provide root object. > So after transaction > started > object must be loaded from storage or created. This is a good point. There often isn't a single root object that all objects are reachable from. On the other hand, most non-trivial relational systems have related objects. Most objects are reachable from other objects. It should be possible to load objects automatically when traversing to them from other objects. In addition, if a new object is added to another object, it should be possible to add the new object to the database automatically. >>>delete(object) >>> >>I can imagine a datamanager that lacked garbage >>collection could >>need this. >> >> > in case of rdbms there are objects which are not > referenced. Right. >>>load(object type, object id)->object >>> >>An object type should be unnecessary. If a data >>manager >>needs to track this sort of information, it should >>embed it in the object id. >> > > In rdbms case id usually integer. adding the whole > package/class name can be expensive. That depends on how you do it. The object id need not be the same as the primary key, and could encode the class in a more efficient manner than storing the package and class names.
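[Editorial sketch, not part of the original message.] One way to read "encode the class in a more efficient manner" is to pack a small integer class code together with the integer primary key. This is a minimal sketch under assumed details: the `CLASS_CODES` registry, the 2-byte code and 8-byte key layout, and the function names are all hypothetical and are not a format used by ZODB or proposed on this list.

```python
import struct

# Hypothetical registry mapping persistent classes to small integer codes.
CLASS_CODES = {"Car": 1, "Engine": 2}
CLASS_NAMES = {code: name for name, code in CLASS_CODES.items()}

# Assumed layout: big-endian 2-byte class code followed by an 8-byte primary key.
_OID_FORMAT = struct.Struct(">HQ")


def make_oid(class_name, primary_key):
    """Build a fixed-size oid from a class name and an integer primary key."""
    return _OID_FORMAT.pack(CLASS_CODES[class_name], primary_key)


def split_oid(oid):
    """Recover (class name, primary key) from an oid built by make_oid()."""
    code, primary_key = _OID_FORMAT.unpack(oid)
    return CLASS_NAMES[code], primary_key
```

Under these assumptions the oid costs ten bytes regardless of how long the package and class names are, and a load(oid) call needs no separate type argument.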
Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From jim@zope.com Wed Jul 31 20:38:13 2002 From: jim@zope.com (Jim Fulton) Date: Wed, 31 Jul 2002 15:38:13 -0400 Subject: [Persistence-sig] A simple Observation API References: <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com> <873cu6a930.fsf@bidibule.brest.inqual.bzh> Message-ID: <3D483CA5.1020704@zope.com> Sebastien Bigaret wrote: ... > About caching and caching policies: > > Phillip did talk about 'transactional caching' and I'm not sure what it > really is, however, there is some need to have an 'application-wide' caching > mechanism to avoid unnecessary round-trips to the DB. Right > Of course, this should > not defeat the 'smallest-possible-memory-footprint-requirement' pointed out > in the sig charter ; Where the goal is not "smallest possible" but small enough and smaller than the entire database. > but if an object has already been fetched somewhere > (and is still active in an other thread, or the cache/snapshots would have > been deleted), then it is usually unnecessary to re-fetch the object, simply > use the cached snapshot instead. But this sounds to me a bit off-topic for > this list. One of the goals of ZODB's caching strategy is also to provide isolation between separate threads. Different threads can have separate caches and so don't need locks to mediate access to objects in the caches. Conceivably, an RDBMS-based data manager could employ a lower-level cache to avoid duplicate RDBMS accesses among threads in much the way ZEO uses a client cache to avoid extra trips to the storage server. > > +1 on defining a state model for persistent objects ; however I'm a little > fuzzy about the difference between 'unsaved' and 'changed'. To my > understanding 'unsaved' is for new objects, Right. > while 'changed' is for existing > (previously made persistent) objects, is this right? Right. ...
> Ilia> create(object) storage shall populated id from rdbms > Ilia> which is usually primary key. > > Jim> This should not be necessary. One should be able to > Jim> design a data manager that detected new objects and > Jim> assigned them ids when referencing objects are created. > > Can you elaborate on that? Suppose I have a car object that has already been stored in the database, but it doesn't have an engine. I should be able to say: # get the car ... car.engine = Engine() commit() Now when I commit, the car's data manager should be able to notice that it now has an engine and that the engine doesn't have an oid. It will then know that a new engine object needs to be created and that its primary key (which is not necessarily the same as the oid) needs to be stored in the car's engine column. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From jeremy@alum.mit.edu Wed Jul 31 20:37:56 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Wed, 31 Jul 2002 15:37:56 -0400 Subject: [Persistence-sig] "Straw Man" transaction API In-Reply-To: References: <15685.57251.14632.949497@slothrop.zope.com> Message-ID: <15688.15508.288534.906790@slothrop.zope.com> >>>>> "SH" == Shane Hathaway writes: >> The APIs look like this: >> >> class ITransaction(Interface): """Transaction objects.""" >> >> def abort(): """Abort the current transaction.""" >> >> def begin(): """Begin a transaction.""" >> >> def commit(): """Commit a transaction.""" >> >> def join(resource): """Join a resource manager to the current >> transaction.""" SH> By "resource manager" do you mean "IDataManager"? I have used these terms somewhat interchangeably, yes. I think "resource manager" is the more widely used terminology. >> >> def status(): """Return status of the current transaction.""" SH> What kind of object would status() return? Who might make use SH> of it?
I expect status returns values from an "enum" with values like in-progress, committed, aborted. SH> Also, I'd like to see some way to set transaction metadata. I didn't include any transaction metadata in the generic Transaction interface. I wasn't sure how generally applicable that was. Instead, I created a subclass of Transaction in ZODB that has the old ZODB interface. SH> I would like this interface to be called SH> ITransactionParticipant. There are many interesting kinds of SH> objects that would be interested in participating in a SH> transaction, and not all of them have the immediate SH> responsibility of storing data. But the names you chose for the SH> methods are very clear and concise, I think. I think IResourceManager is probably better (see above). I wish I could take credit for the names, but I just grabbed them from the Gray & Reuter book :-). >> class IRollback(Interface): >> >> def rollback(): """Rollback changes since savepoint.""" >> >> I think the rollback mechanism will work well enough. Gray and >> Reuter explain that it can be used to simulate a nested >> transaction architecture. Thus, I think it's a reasonable >> building block for the nested transaction API. SH> Making rollback operations into objects is a little surprising, SH> but as I don't fully understand the ideas behind nested SH> transactions, I'm sure there's a reason for rollback objects to SH> exist. :-) The database needs some object to represent the particular savepoint. A transaction could call savepoint() three times and have three different states it could rollback to. I decided a rollback object was clearer than a rollback() method on the transaction that took a savepoint_id argument. SH> It seems to me that the data manager should register to receive SH> specific notifications. 
Some data managers are only interested SH> in knowing when an object is moving from "ghost" to "saved" and SH> from "saved" to "changed" state (such as ZODB); others might SH> want more events, like being notified the first time an object SH> is read in a transaction or receiving notification of *every* SH> attribute change. Supporting the extra events in C only incurs SH> a speed penalty if the data manager requests those events. That's a good idea. We need to flesh out all the events that might be part of the persistence framework, then we can see how that percolates up into the transaction API. Jeremy From jim@zope.com Wed Jul 31 20:46:08 2002 From: jim@zope.com (Jim Fulton) Date: Wed, 31 Jul 2002 15:46:08 -0400 Subject: [Persistence-sig] A simple Observation API References: <5.1.0.14.0.20020730082232.04cd62b0@mail.telecommunity.com> <3.0.5.32.20020729173745.008a0240@telecommunity.com> <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com> <5.1.0.14.0.20020730082232.04cd62b0@mail.telecommunity.com> <3.0.5.32.20020730135832.008fa690@telecommunity.com> Message-ID: <3D483E80.6070906@zope.com> Phillip J. Eby wrote: > At 08:40 AM 7/30/02 -0400, Jeremy Hylton wrote: > ... > This has to do with the "write-through mode" phase between > "prepareToCommit()" and "voteOnCommit()" messages (whatever you call them). > During this phase, to support cascaded storage (one data manager writes to > another), all data managers must "write through" any changes that occur > *immediately*. They can't wait for "prepareToCommit()", because they've > already received it. Basically, when the object says, "I've changed" > (i.e. via "register" or "notify" or whatever you call it), the data manager > must write it out right then. > > But, if the _p_changed flag is set *before* the change, the data manager > has no way to know what the change was and write it. It can't wait for > "voteOnCommit()", because then the DM it writes to might have already > voted, for example. 
It *must* know about the change as soon as the change > has occurred. Thus, the change message must *follow* a change. It's okay > if there are multiple change messages, as long as there's at least one > *after* a set of changes. I realize that this issue seems to be resolved by eliminating write throughs, but I want to make sure I understand something here. I would have assumed that all changes to persistent objects would occur before commit, and thus before any prepares are done. You seem to be assuming that persistent objects could change after the application has issued a commit. Is that right? Is the reason that the prepare logic of some data managers do their work by manipulating persistent objects in other data managers? Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From jim@zope.com Wed Jul 31 20:54:46 2002 From: jim@zope.com (Jim Fulton) Date: Wed, 31 Jul 2002 15:54:46 -0400 Subject: [Persistence-sig] A simple Observation API References: Message-ID: <3D484086.2020408@zope.com> Shane Hathaway wrote: > On 31 Jul 2002, Sebastien Bigaret wrote: > > >> ...I can't decide whether you are talking about initialization of a >> transaction _instance_. The last sentence suggests that participants >> are unregistered when the transaction closes: do you mean destroyed, >> or commit/rollback time? If this is the latter case, then I guess I >> have missed something, since I cannot find any references in the >> previous threads about participants being unregistered at that >> point. If this is the first case (hence, making it possible to >> generate a given set of DataManagers for each new transaction), then >> my proposal for DM-factories might be meaningful. >> > > The terminology we're using is a little confusing, since an object that is > truly a transaction should probably begin its life at the beginning of a > transaction and, This doesn't help. 
;) > at commit or rollback time, should become permanently > immutable. It might even be stored in the database. > > But the things we've been calling transactions play a role more like > transaction "coordinators". As coordinators, they might be reused for > numerous non-overlapping transactions. If they are reused, it makes > sense to be able to register a permanent transaction participant with a > specific coordinator. Right. We really need to clean up the terminology. We should distinguish between "transaction coordinators" (or "transaction managers" or whatever) and transactions. > I think there might a problem, though. ZODB customarily uses one > transaction coordinator per thread. This is changing. In the future, you'll be able to associate a connection and a transaction coordinator independent of thread. > But ZODB connections are not really > thread-specific; they may be reused in a different thread when they are > opened or closed. So if, for example, you registered a permanent > transaction participant that cleared the cache of a specific ZODB > connection, you wouldn't get the effect you wanted! :-) But ZODB connections are rarely used by multiple threads at the same time. In the model where you do associate transaction coordinators with threads, what you'd want to do is register a connection with a thread-global transaction coordinator when the connection is opened and unregister it when it is closed. > That's why I suggested that if you want permanent participants, that > perhaps you'd really want to register the transaction participant for all > threads. No, you don't want that. > It requires you to consider thread safety, but I think you'd > frequently have to consider that anyway. No, with ZODB you effectively never have to worry about threads. Jim -- Jim Fulton mailto:jim@zope.com Python Powered!
CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From jim@zope.com Wed Jul 31 20:56:27 2002 From: jim@zope.com (Jim Fulton) Date: Wed, 31 Jul 2002 15:56:27 -0400 Subject: [Persistence-sig] A simple Observation API References: <20020731063240.55579.qmail@web20705.mail.yahoo.com> Message-ID: <3D4840EB.20501@zope.com> Ilia Iourovitski wrote: > If object participate in more than one transaction > concurrently, transaction API shall provide locks. Right. That's why you really don't want to share a single (copy of an) object among multiple concurrent threads or transactions. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From pje@telecommunity.com Wed Jul 31 20:55:40 2002 From: pje@telecommunity.com (Phillip J. Eby) Date: Wed, 31 Jul 2002 15:55:40 -0400 Subject: [Persistence-sig] A simple Observation API In-Reply-To: <3D483E80.6070906@zope.com> References: <5.1.0.14.0.20020730082232.04cd62b0@mail.telecommunity.com> <3.0.5.32.20020729173745.008a0240@telecommunity.com> <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com> <5.1.0.14.0.20020730082232.04cd62b0@mail.telecommunity.com> <3.0.5.32.20020730135832.008fa690@telecommunity.com> Message-ID: <3.0.5.32.20020731155540.00910250@telecommunity.com> At 03:46 PM 7/31/02 -0400, Jim Fulton wrote: > >I realize that this issue seems to be resolved by eliminating write throughs, >but I want to make sure I understand something here. I would have assumed that >all changes to persistent objects would occur before commit, and thus before >any prepares are done. You seem to be assuming that persistent objects could >change after the application has issued a commit. Is that right? Is the reason >that the prepare logic of some data managers do their work by manipulating >persistent objects in other data managers? Right. 
One of my examples was data manager "A" saving its objects by writing them to an XML DOM. That DOM in turn might be a set of persistent objects, managed by data manager "B", which saves them by writing the entire XML document into a field of a relational database. From shane@zope.com Wed Jul 31 21:01:43 2002 From: shane@zope.com (Shane Hathaway) Date: Wed, 31 Jul 2002 16:01:43 -0400 (EDT) Subject: [Persistence-sig] A simple Observation API In-Reply-To: <3D484086.2020408@zope.com> Message-ID: On Wed, 31 Jul 2002, Jim Fulton wrote: > Shane Hathaway wrote: > > But ZODB connections are not really > > thread-specific; they may be reused in a different thread when they are > > opened or closed. So if, for example, you registered a permanent > > transaction participant that cleared the cache of a specific ZODB > > connection, you wouldn't get the effect you wanted! :-) > > But ZODB connections are rarely used by multiple threads at the same time. > In the model where you do associate transaction coordinators with threads, > what you'd want to do is register a connection with a thread-global > transaction coordinator when the connection is opened and unregister it > when it is closed. Ah-ha, that would work. Thanks. Shane From jeremy@alum.mit.edu Wed Jul 31 21:09:10 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Wed, 31 Jul 2002 16:09:10 -0400 Subject: [Persistence-sig] "Straw Man" transaction API In-Reply-To: References: <15688.15508.288534.906790@slothrop.zope.com> Message-ID: <15688.17382.566113.907598@slothrop.zope.com> >>>>> "SH" == Shane Hathaway writes: SH> On Wed, 31 Jul 2002, Jeremy Hylton wrote: I would like this SH> interface to be called ITransactionParticipant. There are many SH> interesting kinds of objects that would be interested in SH> participating in a transaction, and not all of them have the SH> immediate responsibility of storing data. But the names you SH> chose for the methods are very clear and concise, I think. 
>> >> I think IResourceManager is probably better (see above). I wish >> I could take credit for the names, but I just grabbed them from >> the Gray & Reuter book :-). SH> Ok, but some of the things we'd like to tie into transactions SH> don't really manage data/resources. For example, SH> "CommitVersion", "AbortVersion", and "TransactionalUndo" objects SH> (from ZODB 3) just listen for the "commit" message. Then they SH> ask an object that really is responsible for data/resources to SH> do something. SH> I don't have the book, but my uneducated guess is that we're SH> working with something a little more general than what Gray and SH> Reuter proposed. I think that "resource manager" is a suitably generic term. Do we really care whether the thing-with-a-commit-method manages an object or not? I don't think it makes things clearer to distinguish between the overall class of resource managers and the subset that manage their own objects. There are a bunch of ways to split this hair: The XXXVersion and TransactionalUndo objects really do have resources -- the names of the version or the transaction id. The Connection doesn't manage objects either, the storage does. So the storage is a resource manager (except that it doesn't support the resource manager API) and all these things layered on top constitute nested resource managers. All of the above are resource managers. It's not appropriate to ask how these managers work, because that's not part of the transaction API. A resource manager is just a black box with prepare(), commit(), etc. methods. Jeremy From pje@telecommunity.com Wed Jul 31 21:07:55 2002 From: pje@telecommunity.com (Phillip J. 
Eby) Date: Wed, 31 Jul 2002 16:07:55 -0400 Subject: [Persistence-sig] "Straw Man" transaction API In-Reply-To: <15688.16504.339771.416052@slothrop.zope.com> References: <5.1.0.14.0.20020730082812.05e848c0@mail.telecommunity.com> <200207191609.g6JG91A26544@pcp02138704pcs.reston01.va.comcast.net> <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> <3.0.5.32.20020719120237.00898b60@telecommunity.com> <5.1.0.14.0.20020730082812.05e848c0@mail.telecommunity.com> Message-ID: <3.0.5.32.20020731160755.01389ca0@telecommunity.com> At 03:54 PM 7/31/02 -0400, Jeremy Hylton wrote: >>>>>> "PJE" == Phillip J Eby writes: > > PJE> At 08:36 PM 7/29/02 -0400, Jeremy Hylton wrote: > >> Last week, I worked out a revised transaction API for user code > >> and for data managers. It's implemented in ZODB4, but is fairly > >> preliminary code. I imagine we'll revise it further, but I'd > >> like to describe the changes briefly. > > PJE> I'm not sure if this new API is in relation to the proposals on > PJE> this list or not, but I'm curious how this affects a few > PJE> things: > > PJE> * The need for participants to join every transaction. > >How would you like this feature to interact with custom policies for >mapping threads to transaction ids? If ZODB keeps with its default >policy, it may be useful for a ZODB Connection (resource manager) to >join every transaction run by a particular thread. However, the >Connection would need to stop joining at some pount. I'd just like to be able to create a transaction manager that's used for some set of transactions, and register some set of participants to it. Nothing fancy, really. I'll manage my own threading issues entirely outside of the transaction manager and participants. > PJE> * If a data manager can't support rollback to a savepoint, what > PJE> does it return? > >Good question. Here's my first guess at an answer: It returns None. 
>If multiple resource managers participate in a transaction and one >doesn't support savepoints, then the application can't rollback the >savepoint. The other resource managers may execute the savepoint, but >rollback is impossible. Perhaps it should return a NullSavePoint object, that returns False to a "can_rollback" method. The aggregated savepoint object would return true for can_rollback if all its contents return true, and the rollback() method would only run if can_rollback is true. >(In the case of ZODB, it can be useful to execute a savepoint >regardless of whether it is rollback, because it allows modified >objects to become ghosts.) Yes; it could also be used to ensure that an external system such as an SQL database reflects the current state of persistent objects, which is handy if one must also use legacy code in the same transaction context which runs against that data. > >> (The need for notify-on-read, BTW, is to support higher isolation > >> levels than ZODB currently supports.) > > PJE> And to support delayed loading of attributes by multi-backend > PJE> data managers. Although to support that, there'd need to be > PJE> the opportunity to override the attribute value that was read. > >It's possible to define a custom __getattr__ on a Persistent >subclass. Is that enough? Nope. __getattr__ is implemented by the Persistent object, not the data manager. I want the *data manager* to do delayed loading of attributes in certain cases. For example, it's a common use case for me to need data for an object from both an LDAP and an SQL database. However, some LDAP attributes (such as a user's picture or the membership of an LDAP group) are *huge*. I'd like to avoid the overhead of loading these attributes until/unless they're needed (because on most transactions they're not needed). That's why I need the ability for a data manager to implement delayed loading of certain attributes. 
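A minimal sketch of the delayed loading described above, under stated assumptions: a framework-level base class forwards attribute misses to the object's data manager, so application classes carry no storage knowledge. The names Persistent, DirectoryDataManager, load_attribute and _p_key are hypothetical stand-ins (not the ZODB API), and plain dicts take the place of the LDAP and SQL backends.

```python
# Hypothetical sketch: the class and method names below are illustrative only.
class Persistent:
    """Base class that forwards attribute misses to the data manager."""

    _p_jar = None  # the data manager responsible for this object

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. the value is not loaded.
        if name.startswith("_p_") or self._p_jar is None:
            raise AttributeError(name)
        value = self._p_jar.load_attribute(self, name)
        self.__dict__[name] = value  # cache so later reads skip the manager
        return value


class DirectoryDataManager:
    """Stand-in for a multi-backend manager; the backends are dicts here."""

    def __init__(self, cheap, expensive):
        self.cheap = cheap          # small attributes, loaded eagerly
        self.expensive = expensive  # huge attributes, loaded on first use
        self.expensive_loads = 0

    def load(self, cls, key):
        obj = cls()
        obj.__dict__.update(self.cheap[key])
        obj.__dict__["_p_jar"] = self
        obj.__dict__["_p_key"] = key
        return obj

    def load_attribute(self, obj, name):
        try:
            value = self.expensive[obj._p_key][name]
        except KeyError:
            raise AttributeError(name) from None
        self.expensive_loads += 1
        return value


class User(Persistent):
    """Application class: knows nothing about how it is stored."""


manager = DirectoryDataManager(
    cheap={"u1": {"name": "kevin"}},
    expensive={"u1": {"picture": b"large-binary-blob"}},
)
user = manager.load(User, "u1")
```

Reading user.name never touches the expensive backend; the first read of user.picture calls the data manager once, and later reads come from the instance dictionary.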
Without this separation, you can't implement your Persistent objects as a truly abstract application model, that's portable to different data managers as backends. A Persistent object should never have to know details of how its storage is implemented. In theory, as a result of this SIG's work, people should be able to write a set of Persistent classes for any application model, and then persist it with any sufficiently capable data manager(s). From shane@zope.com Wed Jul 31 20:56:36 2002 From: shane@zope.com (Shane Hathaway) Date: Wed, 31 Jul 2002 15:56:36 -0400 (EDT) Subject: [Persistence-sig] "Straw Man" transaction API In-Reply-To: <15688.15508.288534.906790@slothrop.zope.com> Message-ID: On Wed, 31 Jul 2002, Jeremy Hylton wrote: > SH> I would like this interface to be called > SH> ITransactionParticipant. There are many interesting kinds of > SH> objects that would be interested in participating in a > SH> transaction, and not all of them have the immediate > SH> responsibility of storing data. But the names you chose for the > SH> methods are very clear and concise, I think. > > I think IResourceManager is probably better (see above). I wish I > could take credit for the names, but I just grabbed them from the Gray > & Reuter book :-). Ok, but some of the things we'd like to tie into transactions don't really manage data/resources. For example, "CommitVersion", "AbortVersion", and "TransactionalUndo" objects (from ZODB 3) just listen for the "commit" message. Then they ask an object that really is responsible for data/resources to do something. I don't have the book, but my uneducated guess is that we're working with something a little more general than what Gray and Reuter proposed. 
Shane From jeremy@alum.mit.edu Wed Jul 31 20:54:32 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Wed, 31 Jul 2002 15:54:32 -0400 Subject: [Persistence-sig] "Straw Man" transaction API In-Reply-To: <5.1.0.14.0.20020730082812.05e848c0@mail.telecommunity.com> References: <200207191609.g6JG91A26544@pcp02138704pcs.reston01.va.comcast.net> <87y9cdw37b.fsf@bidibule.brest.inqual.bzh> <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com> <3.0.5.32.20020719120237.00898b60@telecommunity.com> <5.1.0.14.0.20020730082812.05e848c0@mail.telecommunity.com> Message-ID: <15688.16504.339771.416052@slothrop.zope.com> >>>>> "PJE" == Phillip J Eby writes: PJE> At 08:36 PM 7/29/02 -0400, Jeremy Hylton wrote: >> Last week, I worked out a revised transaction API for user code >> and for data managers. It's implemented in ZODB4, but is fairly >> preliminary code. I imagine we'll revise it further, but I'd >> like to describe the changes briefly. PJE> I'm not sure if this new API is in relation to the proposals on PJE> this list or not, but I'm curious how this affects a few PJE> things: PJE> * The need for participants to join every transaction. How would you like this feature to interact with custom policies for mapping threads to transaction ids? If ZODB keeps with its default policy, it may be useful for a ZODB Connection (resource manager) to join every transaction run by a particular thread. However, the Connection would need to stop joining at some point. PJE> * Arbitrarily nested, cascading participants. Does this support PJE> them? How? I don't see any mention of the issues in the PJE> interfaces. It doesn't support them at all, as much previous discussion has illustrated. I'd like to address that in a separate email. PJE> * If a data manager can't support rollback to a savepoint, what PJE> does it return? Good question. Here's my first guess at an answer: It returns None.
If multiple resource managers participate in a transaction and one doesn't support savepoints, then the application can't roll back to the savepoint. The other resource managers may execute the savepoint, but rollback is impossible. (In the case of ZODB, it can be useful to execute a savepoint regardless of whether it can be rolled back, because it allows modified objects to become ghosts.) I'm not sure if it's useful to provide an introspection capability to see if rollback is allowed. >> (The need for notify-on-read, BTW, is to support higher isolation >> levels than ZODB currently supports.) PJE> And to support delayed loading of attributes by multi-backend PJE> data managers. Although to support that, there'd need to be PJE> the opportunity to override the attribute value that was read. It's possible to define a custom __getattr__ on a Persistent subclass. Is that enough? Jeremy From jeremy@alum.mit.edu Wed Jul 31 21:23:42 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Wed, 31 Jul 2002 16:23:42 -0400 Subject: [Persistence-sig] A simple Observation API In-Reply-To: <3.0.5.32.20020730150539.0089c240@telecommunity.com> References: <3.0.5.32.20020730135832.008fa690@telecommunity.com> <3.0.5.32.20020730150539.0089c240@telecommunity.com> Message-ID: <15688.18254.298450.338865@slothrop.zope.com> >>>>> "PJE" == Phillip J Eby writes: PJE> That's what TransactionAgents does, but that's not what I'm PJE> looking for per se. I'm looking at simple data managers. For PJE> example, if I make a data manager that persists a set of PJE> objects to an XML DOM, I might want to use it with a DOM PJE> persistence manager that stores XML documents in an SQL PJE> database. All three "data managers" (persist->XML, PJE> XML->Database, SQL database) are transaction participants, with PJE> implied or actual ordering. If I understand this example correctly, then there are three different objects that implement the resource manager interface: 1. persist->XML 2. XML->Database 3.
Database It sounds like the application code only interacts with 1, and that 2 and 3 should be considered implementation details of 1. Thus, only 1 should register with the transaction, since it's the only independent entity. When the transaction commits, it first calls prepare() on 1. This delegates the responsibility for the commit to 2, which in turn delegates to 3. So for 1 to return True from its prepare, 2 and 3 must also return True. Why doesn't this work? :-) Jeremy From jeremy@alum.mit.edu Wed Jul 31 21:26:49 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Wed, 31 Jul 2002 16:26:49 -0400 Subject: [Persistence-sig] A simple Observation API In-Reply-To: <3.0.5.32.20020730150539.0089c240@telecommunity.com> References: <3.0.5.32.20020730135832.008fa690@telecommunity.com> <3.0.5.32.20020730150539.0089c240@telecommunity.com> Message-ID: <15688.18441.175230.815465@slothrop.zope.com> > DM2.prepare() > DM3.prepare() > DM1.prepare() > > DM2.vote() > DM3.vote() > DM1.vote() Note in the API I've proposed/implemented, there is only prepare(), not vote(). The resource manager should return True from prepare() if it is prepared to commit. Jeremy From shane@zope.com Wed Jul 31 21:33:02 2002 From: shane@zope.com (Shane Hathaway) Date: Wed, 31 Jul 2002 16:33:02 -0400 (EDT) Subject: [Persistence-sig] "Straw Man" transaction API In-Reply-To: <15688.17382.566113.907598@slothrop.zope.com> Message-ID: On Wed, 31 Jul 2002, Jeremy Hylton wrote: > >>>>> "SH" == Shane Hathaway writes: > SH> I don't have the book, but my uneducated guess is that we're > SH> working with something a little more general than what Gray and > SH> Reuter proposed. > > I think that "resource manager" is a suitably generic term. Do we > really care whether the thing-with-a-commit-method manages an object > or not? I don't think it makes things clearer to distinguish between > the overall class of resource managers and the subset that manage > their own objects.
I'm going to defer to your book, with a final objection that "resource manager" is terribly non-descriptive except in the context of a special jargon. If I'm a Python programmer with plenty of experience but no experience in transactions, I'm going to have to read a whole book to learn what a resource manager is. Something like "transaction participant", however, gives me a much better idea of the contract between the coordinator and the participant. Shane From jim@zope.com Wed Jul 31 21:33:30 2002 From: jim@zope.com (Jim Fulton) Date: Wed, 31 Jul 2002 16:33:30 -0400 Subject: [Persistence-sig] "Straw Man" transaction API References: Message-ID: <3D48499A.6060507@zope.com> Shane Hathaway wrote: > On Wed, 31 Jul 2002, Jeremy Hylton wrote: > > >> SH> I would like this interface to be called >> SH> ITransactionParticipant. There are many interesting kinds of >> SH> objects that would be interested in participating in a >> SH> transaction, and not all of them have the immediate >> SH> responsibility of storing data. But the names you chose for the >> SH> methods are very clear and concise, I think. >> >>I think IResourceManager is probably better (see above). I wish I >>could take credit for the names, but I just grabbed them from the Gray >>& Reuter book :-). >> > > Ok, but some of the things we'd like to tie into transactions don't really > manage data/resources. I agree, theoretically, but > For example, "CommitVersion", "AbortVersion", and > "TransactionalUndo" objects (from ZODB 3) just listen for the "commit" > message. These are not good examples, as these would no longer be registered with the transaction coordinator. Rather, they would be handled internally to the resource manager (or whatever). They are certainly also about managing data. Jim -- Jim Fulton mailto:jim@zope.com Python Powered!
CTO (888) 344-4332 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org From jeremy@alum.mit.edu Wed Jul 31 21:45:06 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Wed, 31 Jul 2002 16:45:06 -0400 Subject: [Persistence-sig] "Straw Man" transaction API In-Reply-To: References: <15688.17382.566113.907598@slothrop.zope.com> Message-ID: <15688.19538.482052.762174@slothrop.zope.com> >>>>> "SH" == Shane Hathaway writes: SH> I'm going to defer to your book, with a final objection that SH> "resource manager" is terribly non-descriptive except in the SH> context of a special jargon. I think special jargon is appropriate for the SIG. We'll have to have a section of the transaction PEP that defines the terms. SH> If I'm a Python programmer with plenty of experience but no SH> experience in transactions, I'm going to have to read a whole SH> book to learn what a resource manager is. Something like SH> "transaction participant", however, gives me a much better idea SH> of the contract between the coordinator and the participant. On the other hand, I think it's fair for end-user documentation to use less precise terminology if that makes it easier to understand. On the third hand, this definition is only important for people writing transaction participants. It doesn't seem unreasonable for them to learn the domain jargon; even better, if we stick with widely used jargon, people with database backend experience but no Python experience will be at home. Jeremy From pje@telecommunity.com Wed Jul 31 22:41:31 2002 From: pje@telecommunity.com (Phillip J. 
Eby) Date: Wed, 31 Jul 2002 17:41:31 -0400 Subject: [Persistence-sig] A simple Observation API In-Reply-To: <15688.18254.298450.338865@slothrop.zope.com> References: <3.0.5.32.20020730150539.0089c240@telecommunity.com> <3.0.5.32.20020730135832.008fa690@telecommunity.com> <3.0.5.32.20020730150539.0089c240@telecommunity.com> Message-ID: <3.0.5.32.20020731174131.00904c10@telecommunity.com> At 04:23 PM 7/31/02 -0400, Jeremy Hylton wrote: >>>>>> "PJE" == Phillip J Eby writes: > > PJE> That's what TransactionAgents does, but that's not what I'm > PJE> looking for per se. I'm looking at simple data managers. For > PJE> example, if I make a data manager that persists a set of > PJE> objects to an XML DOM, I might want to use it with a DOM > PJE> persistence manager that stores XML documents in an SQL > PJE> database. All three "data managers" (persist->XML, > PJE> XML->Database, SQL database) are transaction participants, with > PJE> implied or actual ordering. > >If I understand this example correctly, then there are three different >objects that implement the resource manager interface: > >1. persist->XML >2. XML->Database >3. Database > >It sounds like the application code only interacts with 1, and that 2 >and 3 should be considered implementation details of 1. Thus, only 1 >should register with the transaction, since it's the only independent >entity. > >When the transaction commits, it first calls prepare() on 1. This >delegates the responsibility for the commit to 2, which in turn >delegates to 3. So for 1 to return True from its prepare, 2 and 3 >must also return True. > >Why doesn't this work? :-) > Because 3 would be shared by other objects also being persisted to that SQL database, for just the first thing that comes to mind. But that's an implementation detail. This is primarily an architectural issue. Data manager 1 is generic code written to work on an XML DOM. It shouldn't have to *know* that the DOM *is* persistent, let alone *how* it's persisted.
You're calling for the placement of global architecture knowledge into
individual components, that should only be known at a higher abstraction
level.

From pje@telecommunity.com Wed Jul 31 22:42:48 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Wed, 31 Jul 2002 17:42:48 -0400
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: <15688.18441.175230.815465@slothrop.zope.com>
References: <3.0.5.32.20020730150539.0089c240@telecommunity.com> <3.0.5.32.20020730135832.008fa690@telecommunity.com> <3.0.5.32.20020730150539.0089c240@telecommunity.com>
Message-ID: <3.0.5.32.20020731174248.009022d0@telecommunity.com>

At 04:26 PM 7/31/02 -0400, Jeremy Hylton wrote:
>> DM2.prepare()
>> DM3.prepare()
>> DM1.prepare()
>>
>> DM2.vote()
>> DM3.vote()
>> DM1.vote()
>
>Note in the API I've proposed/implemented, there is only prepare(),
>not vote(). The resource manager should return True from prepare() if
>it is prepared to commit.
>

Note that this doesn't work correctly when resource managers are cascaded
and need re-flush messages, per the discussion between Shane and I. :)

From jeremy@alum.mit.edu Wed Jul 31 23:19:50 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Wed, 31 Jul 2002 18:19:50 -0400
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: <3.0.5.32.20020731174131.00904c10@telecommunity.com>
References: <3.0.5.32.20020730150539.0089c240@telecommunity.com> <3.0.5.32.20020730135832.008fa690@telecommunity.com> <3.0.5.32.20020731174131.00904c10@telecommunity.com>
Message-ID: <15688.25222.339977.30416@slothrop.zope.com>

[Meta-comment: I'm sorry it's taking us so long to reach some kind of
understanding on this issue. It seems like we keep talking past each
other, but I'm not sure why.]

>>>>> "PJE" == Phillip J Eby writes:

[I wrote:]
>> If I understand this example correctly, then there are three
>> different objects that implement the resource manager interface:
>>
>> 1. persist->XML
>> 2. XML->Database
>> 3. Database
>>
>> It sounds like the application code only interacts with 1, and
>> that 2 and 3 should be considered implementation details of 1.
>> Thus, only 1 should register with the transaction, since it's the
>> only independent entity.
>>
>> When the transaction commits, it first calls prepare() on 1.
>> This delegates the responsibility for the commit to 2, which in
>> turn delegates to 3. So for 1 to return True from its prepare, 2
>> and 3 must also return True.
>>
>> Why doesn't this work? :-)
>>

PJE> Because 3 would be shared by other objects also being persisted
PJE> to that SQL database, for just the first thing that comes to
PJE> mind.

If you call prepare() twice on a resource manager, it should return the
same answer both times, right? If so, then it shouldn't matter if the
same resource manager is being used as a top-level component and an
internal component. It will perform its prepare work the first time it
is called and then just return its vote the second time it is called.

PJE> But that's an implementation detail. This is primarily an
PJE> architectural issue.

I agree that it's an architectural issue. (It's good that we agree on
some things.) The example above sounds like a component-based system,
where there is a compound persist->xml->database component. The
subcomponents of this entity should not be registering themselves with
the transaction manager. A component should control all communication of
its constituent parts with other components.

PJE> Data manager 1 is generic code written to work on an XML DOM.
PJE> It shouldn't have to *know* that the DOM *is* persistent, let
PJE> alone *how* it's persisted.

The description of the first component implies that it supports
persistent objects and stores them using another component that stores
XML. That top-level component *must* know how to handle persistent
objects and transactions, as it implements those interfaces.

PJE> You're calling for the placement of global architecture
PJE> knowledge into individual components, that should only be known
PJE> at a higher abstraction level.

I thought I was arguing the opposite. Individual components should not
all talk to the global transaction manager. Instead, when a component is
assembled, the parts should be wired together so that each knows who to
communicate with.

Jeremy

From jeremy@alum.mit.edu Wed Jul 31 23:08:41 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Wed, 31 Jul 2002 18:08:41 -0400
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: <3.0.5.32.20020731174248.009022d0@telecommunity.com>
References: <3.0.5.32.20020730150539.0089c240@telecommunity.com> <3.0.5.32.20020730135832.008fa690@telecommunity.com> <3.0.5.32.20020731174248.009022d0@telecommunity.com>
Message-ID: <15688.24553.963964.576250@slothrop.zope.com>

>>>>> "PJE" == Phillip J Eby writes:

>> Note in the API I've proposed/implemented, there is only
>> prepare(), not vote(). The resource manager should return True
>> from prepare() if it is prepared to commit.
>>

PJE> Note that this doesn't work correctly when resource managers
PJE> are cascaded and need re-flush messages, per the discussion
PJE> between Shane and I. :)

I didn't understand that discussion. :)

Jeremy
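
The delegation model argued for in the messages above (only the
top-level component registers with the transaction, prepare() cascades
down the chain, and a second prepare() call on a shared resource manager
just returns the earlier vote) might be sketched roughly as follows.
This is an illustration under stated assumptions, not the API that was
proposed on the list: only prepare() returning True comes from the
thread, while ResourceManager, Transaction, prepare_all, downstream,
can_prepare and prepare_calls are hypothetical names. It also does not
attempt the cascaded re-flush case raised as an objection.

```python
class ResourceManager:
    """Hypothetical resource manager whose prepare() caches its vote."""

    def __init__(self, name, downstream=None, can_prepare=True):
        self.name = name
        self.downstream = downstream    # next manager in the chain, if any
        self.can_prepare = can_prepare  # stand-in for the real prepare work
        self.prepare_calls = 0          # how often the real work actually ran
        self._vote = None               # cached answer after the first call

    def prepare(self):
        # Do the work once; later calls return the same answer.
        if self._vote is None:
            self.prepare_calls += 1
            ok = self.can_prepare
            if ok and self.downstream is not None:
                # Delegate to the component this one stores into.
                ok = self.downstream.prepare()
            self._vote = ok
        return self._vote


class Transaction:
    """Hypothetical coordinator that only knows top-level managers."""

    def __init__(self):
        self._managers = []

    def register(self, manager):
        if manager not in self._managers:
            self._managers.append(manager)

    def prepare_all(self):
        # True only if every registered manager is prepared to commit.
        return all(m.prepare() for m in self._managers)


# Wire the three components from the example into one chain.
database = ResourceManager("Database")
xml_store = ResourceManager("XML->Database", downstream=database)
persister = ResourceManager("persist->XML", downstream=xml_store)

# A second, unrelated top-level manager sharing the same database.
other = ResourceManager("other->Database", downstream=database)

txn = Transaction()
txn.register(persister)
txn.register(other)
prepared = txn.prepare_all()
```

In this sketch the shared database is reached through both chains, but
its prepare work runs only once. Whether caching the vote is acceptable
when a later write arrives after the first prepare() is exactly the
point still under dispute in the thread.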