Please have a look at this class

Steven D'Aprano steve at REMOVE.THIS.cybersource.com.au
Thu Jan 25 10:40:16 EST 2007


On Thu, 25 Jan 2007 06:38:28 -0800, antred wrote:

> Hello everyone,
> 
> While working on a program I encountered a situation where I'd
> construct a largish data structure (a tree) from parsing a host of
> files and would end up having to throw away parts of my newly built
> tree if a file turned out to contain invalid data.

Why not validate the file before you add it to the tree?

> My first thought was
> 'Well, you can always make a deep copy of your tree first, then add new
> data to the copy and revert to the original if you need to.", but as
> this tree can grow very big this is not exactly efficient.

My first thought would have been to just delete the invalid node from the
tree.

BTW, what sort of tree are you using? A basic binary tree, or something
more sophisticated? Deleting nodes from trees need not be expensive.


> So my second idea was to come up with a class that offers a very
> limited degree of database-like behavior, meaning you can make changes
> to the object and then decide whether you want to commit those changes
> or roll them back to get back to the original.

I don't know whether that would be a useful tool to have in general. How
do you use it? I can guess two broad strategies:

(1) commit; add; validate; commit or roll-back.

That seems wasteful, because if you're adding only a single node to the
tree, it should be *very* easy to delete it and revert to the previous
state.

Alternatively:

(2) commit; add multiple times; validate multiple items; commit the lot,
or roll-back the lot and then re-add the ones that validated.

But does it really make sense to be rolling back a whole lot of
transactions, just because one of them is faulty? I don't think so.

In this specific example, I think the right solution is to make adding and
deleting a file an atomic operation (or at least as close to atomic as
Python allows), so you don't need to think about attributes. You just add
or delete nodes. Anyway, that's the solution I'd be looking at.

[snip]

> The basic idea behind this is that each instance of the Unrollable
> class keeps an internal dictionary (which, in lieu of a better name I'm
> currently calling 'sand box') 

Bad name! "Sand box" already has an established usage, and that's not it.

A better name might be something like "provisional" or "uncommitted" or
similar.

[snip]

> Finally, this works for 'private' attributes (i.e. names with two
> leading underscores), too, as the __setattr__ implementation mangles
> the name of the attribute if it detects a private name.

I think it would be better NOT to mangle the names of the attribute, as
that defeats the purpose of name mangling in the first place. If the user
of your code wants to muck about with private variables, then he should
mangle the name before passing it to your code.


[snip]

> class Unrollable( object ):
> 	"""Provides a very simple commit/rollback system."""
> 
> 	def __setattr__( self, attributeName, attributeValue ):
> 		"""Changes the specified attribute by setting it to the passed value.
> The change is only made to the sandbox and is not committed."""
> 
> 		if attributeName.find( '__' ) == 0:
> 			# Mangle name to make attribute private.
> 			attributeName = '_' + self.__class__.__name__ + attributeName

You use mangle twice in your code -- that should be factored out as a
function or method.

Or better yet, just don't do it at all.


> 		try:
> 			theDict = self.__dict__[ '_Unrollable__dSandBox' ]
> 		except KeyError:
> 			theDict = self.__dict__[ '_Unrollable__dSandBox' ] = {}

Use this instead:

theDict = self.__dict__.setdefault('_Unrollable__dSandBox', {})



> 		theDict[ attributeName ] = attributeValue
> 
> 
> 	def __getattr__( self, attributeName ):
> 		"""This method ensures an attribute can be accessed even when it
> hasn't been committed yet (since it might not exist in the object
> 		itself yet)."""

Doesn't that defeat the purpose of having a distinct commit? If something
hasn't been committed, you shouldn't be able to access it!


[snip]

What about commits/roll-backs for deletion of attributes?


> 	def commitChanges( self ):
> 		"""Commits the contents of the sandbox to the actual object. Clears
> the sandbox."""
> 		while len( self.__dSandBox ) > 0:
> 			key, value = self.__dSandBox.popitem()
> 			self.__dict__[ key ] = value

If you change the name of your class, or subclass it, your 'sandbox' will
break. Here, you use a double-underscore name and let Python mangle it,
but earlier you hard-coded the mangled name. Do one or the other, not both.

 
> 	def unroll( self, bRecurseSandBox = True, bRecurseDict = False ):
> 		"""Ditches all changes currently in the sandbox. Recurses all objects
> in the instance itself and in its sandbox and, if
> 		they're unrollable instances themselves, invokes the unroll method on
> them as well."""

But if you're ditching the objects in the 'sandbox', why would you need to
unroll them? Aren't they going to disappear?


> 		if bRecurseSandBox:
> 			while len( self.__dSandBox ) > 0:
> 				key, value = self.__dSandBox.popitem()
> 
> 				if isinstance( value, Unrollable ):
> 					value.unroll( bRecurseSandBox, bRecurseDict )
> 		else:
> 			self.__dSandBox.clear()
> 
> 		if bRecurseDict:
> 			iterator = self.__dict__.itervalues()
> 
> 			while True:
> 				try:
> 					nextVal = iterator.next()
> 				except StopIteration:
> 					break
> 
> 				if isinstance( nextVal, Unrollable ):
> 					nextVal.unroll( bRecurseSandBox, bRecurseDict )


Use this instead:


for nextVal in iterator:
    if isinstance( ... ):



> 	def hasUncommittedChanges( self ):
> 		"""Returns true if there are uncommitted changes, false otherwise."""
> 		return len( self.__dSandBox ) > 0

Which will fail if the 'sandbox' doesn't exist. In other parts of the code
you test for its existence.


Another thing... I notice that you seem to be inconsistently using a form
of the Hungarian Notation (e.g. dSandbox, bRecurseDict, etc.). I suggest
you read this for a better understanding of when you should do so: 

http://www.joelonsoftware.com/articles/Wrong.html


Hope this was helpful,



-- 
Steven.




More information about the Python-list mailing list