A cautionary tale

Wed Dec 4 04:16:40 EST 2013

Hi all

There is no question at the end of this, it is just an account of a couple 
of days in the life of a programmer (me). I just felt like sharing it. Feel 
free to ignore.

The business/accounting system I am writing involves a lot of reading data 
from a database, and if changed, writing it back again.

There are a number of data types involved - string, integer, decimal, 
boolean, date, datetime. I currently support PostgreSQL, MS SQL Server, and 
sqlite3. In all cases, they have a DB-API 2.0-compliant adaptor which 
handles the conversion to and from python objects transparently.

Over the last year or so I have added two new types. I added a JSON type, to 
handle 'lists' and 'dicts', and an XML type to handle more complex 
structures. In both cases they are stored in the database as strings, so I 
have to handle the conversions myself.

I don't allow direct access to the objects, as they can be affected by 
various business rules, so I use getters and setters. For the new types, I 
used the getter to convert from the string to the underlying object, and the 
setter to convert it back to a string.

Then a couple of days ago I decided that this was not the correct place to 
do it - it should be done when reading from and writing to the database. 
That way the data is always represented by the underlying object, which can 
be passed around without worrying about conversions.

It was a bit of effort, as I had to add extra getters and setters to handle 
the transfer between the database and the program, and then over-ride them 
in the case of the new data types to provide the required functionality. But 
after a few hours of hunting down all the places that required changes, 
testing, fixing errors, etc, it seemed to be working fine, so I thought I 
could carry on with the meat of my program.

Then I noticed that certain changes were not being written back to the 
database. After some investigation, I found the error in a part of my 
program that I have not had to look at for ages. When reading data in from 
the database, I preserve a copy of the original value. When saving, I 
compare that to the current value when deciding which columns need updating. 
I do this in the obvious way -

    on reading -
        orig_value = value

    on saving -
       if value != orig_value:
            this one needs updating

Have you spotted the deliberate mistake yet? In the case of a JSON list, 
orig_value and value point to the same, mutable, list. So when I compare 
value with orig_value, they are always the same, whether changes have been 
made or not!

The obvious answer is to store a copy of the list. It was not so obvious 
where to make the change, as there were other implications. Eventually I 
decided to over-ride the 'getter' for the JSON type, and return copy(value) 
instead of value. That way if it is changed and then put back using the 
'setter', the two objects are no longer equal. I have made that change, done 
some more testing, and for now it seems ok.

So have the last couple of days been a waste of time? I don't think so. Is 
the program a bit cleaner and conceptually sounder? I hope so.

Why am I telling you all this? No particular reason, just thought some of 
you might find it interesting.

Frank Millman