Alternatives to XML?

Thu Aug 25 02:29:47 EDT 2016

On Thu, Aug 25, 2016 at 4:11 PM, Frank Millman <frank at chagford.com> wrote:
> "Chris Angelico"  wrote in message
> news:CAPTjJmq2bcQPmQ9itVvZrBZJPcbYe5z6vDpKGYQj=8H+qKvXxQ at mail.gmail.com...
>
> On Thu, Aug 25, 2016 at 3:33 PM, Frank Millman <frank at chagford.com> wrote:
>>
>> @Peter/Chris
>> > I don't understand - please explain.
>> >
>> > If I store the business rule in Python code, how do I prevent untrusted
>> > users putting malicious code in there? I presume I would have to execute
>> > > the
>> > code by calling eval(), which we all know is dangerous. Is there another
>> > > way
>> > of executing it that I am unaware of?
>
>
>> The real question is: How malicious can your users be?
>
>
>> If the XML file is stored adjacent to the Python script that runs it,
>> anyone who can edit one can edit the other. Ultimately, that means that (a)
>> any malicious user can simply edit the Python script, and therefore (b)
>> anyone who's editing the other file is not malicious.
>
>
>> If that's not how you're doing things, give some more details of what
>> you're trying to do. How are you preventing changes to the Python script?
>> How frequent will changes be? Can you simply put all changes through a git
>> repository and use a pull request workflow to ensure that a minimum of two
>> people eyeball every change?
>
>
> All interaction with users is via a gui. The database contains tables that
> define the database itself - tables, columns, form definitions, etc. These
> are not purely descriptive, they drive the entire system. So if a user
> modifies a definition, the changes are immediate.
>
> Does that answer your question? I can go into a lot more detail, but I am
> not sure where to draw the line.

Sounds to me like you have two very different concerns, then. My
understanding of "GUI" is that it's a desktop app running on the
user's computer, as opposed to some sort of client/server system - am
I right?

1) Malicious users, as I describe above, can simply mess with your
code directly, or bypass it and talk to the database, or anything. So
you can ignore them.

2) Non-programmer users, without any sort of malice, want to be able
to edit these scripts but not be caught out by a tiny syntactic
problem.

Concern #2 is an important guiding principle. You need your DSL to be
easy to (a) read/write, and (b) verify/debug. That generally means
restricting functionality some, but you don't have to dig into the
nitty-gritty of security. If someone figures out a way to do something
you didn't intend to be possible, no big deal; but if someone CANNOT
figure out how to use the program normally, that's a critical failure.

So I would recommend making your config files as simple and clean as
possible. That might mean Python code; it does not mean XML, and
probably not JSON either. It might mean YAML. It might mean creating
your own DSL. It might be as simple as "Python code with a particular
style guide". There are a lot of options. Here's your original XML and
Python code:

<case>
 <compare src="_param.auto_party_id" op="is_not" tgt="$None">
   <case>
     <on_insert>
       <auto_gen args="_param.auto_party_id"/>
     </on_insert>
     <not_exists>
       <literal value="<new>"/>
     </not_exists>
   </case>
 </compare>
</case>

if _param.auto_party_id is not None:
   if on_insert:
       value = auto_gen(_param.auto_party_id)
   elif not_exists:
       value = '<new>'

Here's a very simple format, borrowing from RFC822 with a bit of Python added:

if: _param.auto_party_id != None
    if: on_insert
        value: =auto_gen(_param.auto_party_id)
    elif: not_exists
        value: <new>

"if" and "elif" expect either a single boolean value, or two values
separated by a known comparison operator (==, !=, <, >, etc). I'd
exclude "is" and "is not" for parser simplicity. Any field name that
isn't a keyword (in this case "value:") sets something to the given
value, or evals the given string if it starts with an equals sign.

You can mess around with the details all you like, but this is a
fairly simple format (every line consists of "keyword: value" at some
indentation level), and wouldn't be too hard to parse. The use of
eval() means this is assuming non-malicious usage, which is (as
mentioned above) essential anyway, as a full security check on this
code would be virtually impossible.

And, as mentioned, Python itself makes a fine DSL for this kind of thing.

ChrisA