Python declarative

Fri Jan 24 06:18:01 EST 2014

On Fri, Jan 24, 2014 at 8:21 PM, Frank Millman <frank at chagford.com> wrote:
> I find that I am using JSON and XML more and more in my project, so I
> thought I would explain what I am doing to see if others think this is an
> acceptable approach or if I have taken a wrong turn.

Please don't take this the wrong way, but the uses of JSON and XML
that you describe are still completely inappropriate. Python does not
need this sort of thing.

> I store database metadata in the database itself. I have a table that
> defines each table in the database, and I have a table that defines each
> column. Column definitions include information such as data type, allow
> null, allow amend, maximum length, etc. Some columns require that the value
> is constrained to a subset of allowable values (e.g. 'title' must be one of
> 'Mr', 'Mrs', etc.). I know that this can be handled by a 'check' constraint,
> but I have added some additional features which can't be handled by that. So
> I have a column in my column-definition table called 'choices', which
> contains a JSON'd list of allowable choices. It is actually more complex
> than that, but this will suffice for a simple example.

This is MySQL-style of thinking, that a database is a thing that's
interpreted by an application. The better way to do it is to craft a
check constraint; I don't know what features you're trying to use that
can't be handled by that, but (at least in PostgreSQL) I've never had
anything that can't be expressed that way. With the simple example you
give, incidentally, it might be better to use an enumeration rather
than a CHECK constraint, but whichever way.

> For my more complex example, I should explain that my project involves
> writing a generalised business/accounting system. Anyone who has worked on
> these knows that you quickly end up with hundreds of database tables storing
> business data, and hundreds of forms allowing users to CRUD the data
> (create/read/update/delete). Each form is unique, and yet they all share a
> lot of common characteristics. I have abstracted the contents of a form
> sufficiently that I can represent at least 90% of it in XML. This is not
> just the gui, but all the other elements - the tables required, any input
> parameters, any output parameters, creating any variables to be used while
> entering the form, any business logic to be applied at each point, etc. Each
> form definition is stored as gzip'd XML in a database, and linked to the
> menu system. There is just one python program that responds to the selection
> of a menu option, retrieves the form from the database, unpacks it and runs
> it.

Write your rendering engine as a few simple helper functions, and then
put all the rest in as code instead of XML. The easiest way to go
about it is to write three forms, from scratch, and then look at the
common parts and figure out which bits can go into helper functions.
You'll find that you can craft a powerful system with just a little
bit of code, if you build it right, and it'll mean _less_ library code
than you currently have as XML. If you currently represent 90% of it
in XML, then you could represent 90% of it with simple calls to helper
functions, which is every bit as readable (probably more so), and much
*much* easier to tweak. Does your XML parsing engine let you embed
arbitrary expressions into your code? Can you put a calculated value
somewhere? With Python, everything's code, so there's no difference
between saying "foo(3)" and saying "foo(1+2)".

> Incidentally, I would take issue with the comment that 'JSON is easily
> readable by humans (UNLIKE XML)'. Here is a more complete example of my
> 'choices' definition.
>
> [true, true, [["admin", "System administrator", [], []], ["ind",
> "Individual", [["first_name", true], ["surname", true]], [["first_name", "
> "], ["surname", ""]]], ["comp", "Company", [["comp_name", true], ["reg_no",
> true], ["vat_no", false]], [["comp_name", ""]]]]]
>
> You can read it, but what does it mean?
>
> This is what it would look like if I stored it in XML -
>
> <choices use_subtypes="true" use_displaynames="true">
>   <choice code="admin" descr="System administrator">
>     <subtype_columns/>
>     <displaynames/>
>   </choice>
>   <choice code="ind" descr="Individual">
>     <subtype_columns>
>       <subtype_column col_name="first_name" required="true"/>
>       <subtype_column col_name="surname" required="true"/>
>     </subtype_columns>
>     <displaynames>
>       <displayname col_name="first_name" separator=" "/>
>       <displayname col_name="surname" separator=""/>
>     </displaynames>
>   </choice>
>   <choice code="comp" descr="Company">
>     <subtype_columns>
>       <subtype_column col_name="comp_name" required="true"/>
>       <subtype_column col_name="reg_no" required="true"/>
>       <subtype_column col_name="vat_no" required="false"/>
>     </subtype_columns>
>     <displaynames>
>       <displayname col_name="comp_name" separator=""/>
>     </displaynames>
>   </choice>
> </choices>
>
> More verbose - sure. Less human-readable - I don't think so.

Well, that's because you elided all the names in the JSON version. Of
course that's not a fair comparison. :) Here's how I'd render that XML
in JSON. I assume that this is stored in a variable, database field,
or whatever, named "choices", and so I skip that first-level name.

{"use_subtypes":true, "use_displaynames":true, "choice":[
    {"code":"admin", "descr":"System administrator",
"subtype_columns":[], "displaynames":[]},
    {"code":"ind", "descr":"Individual",
        "subtype_columns":[
            {"col_name":"first_name", "required":true},
            {"col_name":"surname", "required":true}
        ], "displaynames":[
            {"col_name":"first_name", "separator":" "},
            {"col_name":"surname", "separator":""}
        ]
    },
    {"code":"comp", "descr":"Company",
        "subtype_columns":[
            {"col_name":"comp_name", "required":true},
            {"col_name":"reg_no", "required":true},
            {"col_name":"vat_no", "required":false}
        ], "displaynames":[
            {"col_name":"comp_name", "separator":""}
        ]
    }
]}

You can mix and match the two styles to get the readability level you
need. I think the "col_name" and "required" tags are probably better
omitted, which would make the JSON block look like this:

{"use_subtypes":true, "use_displaynames":true, "choice":[
    {"code":"admin", "descr":"System administrator",
"subtype_columns":[], "displaynames":[]},
    {"code":"ind", "descr":"Individual",
        "subtype_columns":[
            ["first_name", "required"],
            ["surname", "required"]
        ], "displaynames":[
            ["first_name", " "]
            ["surname", ""]
        ]
    },
    {"code":"comp", "descr":"Company",
        "subtype_columns":[
            ["comp_name", "required"]
            ["reg_no", "required"]
            ["vat_no", "optional"]
        ], "displaynames":[
            ["comp_name", ""]
        ]
    }
]}

Note that instead of "true" or "false", I've simply used "required" or
"optional". Easy readability without excessive verbosity. And you
could probably make "required" the default, so you just have
[["comp_name"], ["reg_no"], ["vat_no", "optional"]] for the last
block.

Now, here's the real killer. You can take this JSON block and turn it
into Python code like this:

choices = ... block of JSON ...

And now it's real code. It's that simple. You can't do that with XML.
And once it's code, you can add functionality to it to improve
readability even more.

> Also, intuitively one would think it would take much longer to process the
> XML version compared with the JSON version. I have not done any benchmarks,
> but I use lxml, and I am astonished at the speed. Admittedly a typical
> form-processor spends most of its time waiting for user input. Even so, for
> my purposes, I have never felt the slightest slowdown caused by XML.

On this, I absolutely agree. You will almost never see a performance
difference between any of the above. Don't pick based on performance -
pick based on what makes sense and is readable. The pure-code version
might suffer terribly (interpreted Python with several levels of
helper functions, each one having its overhead - though personally, I
expect it'd be comparable to the others), but you would still do
better with it, because the difference will be on the scale of
microseconds.

I've built GUIs in a number of frameworks, including building
frameworks myself. I've built a system for creating CRUD databases.
(It's still in use, incidentally, though only on our legacy OS/2
systems. One of the costs of writing in REXX rather than Python, but I
didn't know Python back in the early 1990s when I wrote Hudson.)
Frameworks that boast that it doesn't take code to use them tend to
suffer from the Inner-Platform Effect [1] [2] if they get sufficiently
powerful, and it quickly becomes necessary to drop to code somewhere.
(Or, if they DON'T get powerful enough to hit the IPE, they overly
restrain what you can do with them. That's fine if all you want is
simple, but what if 99.9% of what you want fits into the framework and
0.1% simply can't be done?)

With Hudson, I handled the "easy bits" with a single program (basic
table view display with some options, database/table selection, etc,
etc), and then dropped to actual GUI code to handle the custom parts
(the main form for displaying one record; optionally the Search
dialog, though a basic one was provided "for free"), with a library of
handy functions available to call on. Okay, I did a terrible job of
the "library" part back in those days - some of them were copied and
pasted into each file :| - but the concept is there.

In Gypsum (a current project, MUD client for Linux/Windows/Mac OS),
all GUI work is done through GTK. But there are a few parts that get
really clunky, like populating a Table layout, so I made a helper
function that creates a table based on a list of lists: effectively, a
2D array of cell values, where a cell could have a string (becomes a
label), a widget (gets placed there), or nothing (the label/widget to
the left spans this cell too). That doesn't cover _every_ possible
use-case (there's no way to span multiple rows), but if I need
something it can't do, I'm writing code already right there, so I can
just use the Table methods directly. There was actually one extremely
common sub-case where even my GTK2Table function got clunky, so I made
a second level of wrapper that takes a simple list of elements and
creates a 2D array of elements, with labels right-justified. Once
again, if I'm doing the same thing three times, it's a prime candidate
for a helper.

The cool thing about doing everything in code is that you can create
helpers at the scope where they're most needed. To build the character
sheet module for Gypsum (one of its uses is Dungeons and Dragons), I
had to build up a more complex and repetitive GUI than most, so I
ended up making a handful of helpers that existed right there in the
charsheet module. (A couple of them have since been promoted to
global; GTK2Table mentioned above started out in charsheet.) Anyone
reading code anywhere _else_ in the project doesn't need to understand
those functions; but they improve readability in that module
enormously. That's something that's fundamentally impossible in an
interpreted/structured system like an XML GUI - at least, impossible
without some horrible form of inner-platform effect.

Python code is more readable than most data structures you could come
up with, because you don't have to predict everything in advance.

ChrisA

[1] http://en.wikipedia.org/wiki/Inner-platform_effect
[2] http://thedailywtf.com/Articles/The_Inner-Platform_Effect.aspx