A cautionary tale
Frank Millman
frank at chagford.com
Wed Dec 4 04:16:40 EST 2013
Hi all
There is no question at the end of this, it is just an account of a couple
of days in the life of a programmer (me). I just felt like sharing it. Feel
free to ignore.
The business/accounting system I am writing involves a lot of reading data
from a database, and if changed, writing it back again.
There are a number of data types involved - string, integer, decimal,
boolean, date, datetime. I currently support PostgreSQL, MS SQL Server, and
sqlite3. In all cases, they have a DB-API 2.0-compliant adaptor which
handles the conversion to and from Python objects transparently.
Over the last year or so I have added two new types. I added a JSON type, to
handle 'lists' and 'dicts', and an XML type to handle more complex
structures. In both cases they are stored in the database as strings, so I
have to handle the conversions myself.
I don't allow direct access to the objects, as they can be affected by
various business rules, so I use getters and setters. For the new types, I
used the getter to convert from the string to the underlying object, and the
setter to convert it back to a string.
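The original arrangement might look something like the following sketch. JsonField, get_value() and set_value() are hypothetical names, not the author's actual classes. The value is kept in its string form and converted on every access.

```python
import json

class JsonField:
    """Hypothetical field that keeps a JSON value as a string (the original design)."""

    def __init__(self, db_string="null"):
        # Internally the value stays in its database form: a string.
        self._value = db_string

    def get_value(self):
        # Convert from the stored string to a Python object on every read.
        return json.loads(self._value)

    def set_value(self, obj):
        # Convert back to a string on every write.
        self._value = json.dumps(obj)

field = JsonField('["a", "b"]')
items = field.get_value()
items.append("c")
field.set_value(items)
print(field._value)  # ["a", "b", "c"]
```

One side effect of this design is that each call to get_value() builds a new list from the string, so no two callers ever share the stored object.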
Then a couple of days ago I decided that this was not the correct place to
do it - it should be done when reading from and writing to the database.
That way the data is always represented by the underlying object, which can
be passed around without worrying about conversions.
It was a bit of effort, as I had to add extra getters and setters to handle
the transfer between the database and the program, and then override them
in the case of the new data types to provide the required functionality. But
after a few hours of hunting down all the places that required changes,
testing, fixing errors, etc., it seemed to be working fine, so I thought I
could carry on with the meat of my program.
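One way to structure the revised design is sketched below, again with hypothetical names. A base Field class gains from_db() and to_db() hooks for the transfer to and from the database, and only the JSON subclass overrides them. The rest of the program uses get_value() and set_value() and always sees a list or dict.

```python
import json

class Field:
    """Hypothetical base field: the DB-API adaptor already returns Python objects."""

    def __init__(self):
        self._value = None

    # Hooks used only when transferring to and from the database.
    def from_db(self, db_value):
        self._value = db_value

    def to_db(self):
        return self._value

    # Getter and setter used by the rest of the program.
    def get_value(self):
        return self._value

    def set_value(self, value):
        self._value = value


class JsonField(Field):
    """Overrides only the database hooks; the program always sees a list or dict."""

    def from_db(self, db_value):
        self._value = json.loads(db_value)

    def to_db(self):
        return json.dumps(self._value)


field = JsonField()
field.from_db('{"qty": 3}')
print(field.get_value()["qty"])  # 3
print(field.to_db())             # {"qty": 3}
```

Note that in this sketch get_value() now returns the stored object itself rather than a freshly built one.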
Then I noticed that certain changes were not being written back to the
database. After some investigation, I found the error in a part of my
program that I have not had to look at for ages. When reading data in from
the database, I preserve a copy of the original value. When saving, I
compare that to the current value when deciding which columns need updating.
I do this in the obvious way:
    # on reading
    orig_value = value
    # on saving
    if value != orig_value:
        ...  # this one needs updating
Have you spotted the deliberate mistake yet? In the case of a JSON list,
orig_value and value refer to the same mutable list. So when I compare
value with orig_value, they are always equal, whether or not changes have
been made!
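The aliasing can be reproduced in a few lines, with a plain list standing in for a JSON value read from the database. Assignment binds a second name to the same list; it does not copy it.

```python
# A plain list standing in for a JSON column value read from the database.
value = ["a", "b"]

# On reading: keep the "original" value (this only copies the reference).
orig_value = value

# The program later modifies the list in place.
value.append("c")

print(orig_value is value)   # True: both names refer to one list
print(value != orig_value)   # False: so the change is never detected
```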
The obvious answer is to store a copy of the list. It was not so obvious
where to make the change, as there were other implications. Eventually I
decided to override the 'getter' for the JSON type, and return copy(value)
instead of value. That way, if the copy is changed and then put back using
the 'setter', value and orig_value are different objects and no longer
compare equal. I have made that change, done some more testing, and for now
it seems OK.
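Below is a minimal sketch of that fix, using a hypothetical JsonField class, a hypothetical needs_update() method and the standard copy module. It also shows one caveat that follows from Python's copy semantics: copy.copy() is shallow. If a JSON list contains dicts or other lists, changing a nested element through the copy still modifies the object that orig_value refers to, and the change goes undetected. copy.deepcopy() avoids that, at some cost in speed.

```python
import copy
import json

class JsonField:
    """Hypothetical JSON field whose getter returns a copy of the stored value."""

    def __init__(self, db_string, deep=False):
        self._value = json.loads(db_string)
        self.orig_value = self._value  # reference kept on reading, as before
        self._deep = deep

    def get_value(self):
        # Hand out a copy so in-place edits cannot reach the stored object.
        return copy.deepcopy(self._value) if self._deep else copy.copy(self._value)

    def set_value(self, value):
        self._value = value

    def needs_update(self):
        # Hypothetical dirty check, mirroring "if value != orig_value".
        return self._value != self.orig_value


# Flat list: a shallow copy is enough.
flat = JsonField('["a", "b"]')
items = flat.get_value()
items.append("c")
flat.set_value(items)
print(flat.needs_update())     # True

# Nested data: a shallow copy still shares the inner dict.
shallow = JsonField('[{"qty": 1}]')
rows = shallow.get_value()
rows[0]["qty"] = 2
shallow.set_value(rows)
print(shallow.needs_update())  # False: the change is missed

# Nested data with a deep copy: the change is detected.
deep = JsonField('[{"qty": 1}]', deep=True)
rows = deep.get_value()
rows[0]["qty"] = 2
deep.set_value(rows)
print(deep.needs_update())     # True
```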
So have the last couple of days been a waste of time? I don't think so. Is
the program a bit cleaner and conceptually sounder? I hope so.
Why am I telling you all this? No particular reason, just thought some of
you might find it interesting.
Frank Millman