XML Considered Harmful

Michael F. Stemper michael.stemper at gmail.com
Tue Sep 21 15:22:52 EDT 2021


On 21/09/2021 13.49, alister wrote:
> On Tue, 21 Sep 2021 13:12:10 -0500, Michael F. Stemper wrote:
> 
>> On the prolog thread, somebody posted a link to:
>> <https://dirtsimple.org/2004/12/python-is-not-java.html>
>>
>> One thing that it tangentially says is "XML is not the answer."
>>
>> I read this page right when I was about to write an XML parser to get
>> data into the code for a research project I'm working on.
>> It seems to me that XML is the right approach for this sort of thing,
>> especially since the data is hierarchical in nature.
>>
>> Does the advice on that page mean that I should find some other way to
>> get data into my programs, or does it refer to some kind of misuse/abuse
>> of XML for something that it wasn't designed for?
>>
>> If XML is not the way to package data, what is the recommended approach?
> 
> 1st, can I say don't write your own XML parser; there are already a
> number of existing parsers that should do everything you will need.  This
> is a wheel that does not need re-inventing.

I was going to build it on top of xml.etree.ElementTree.
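
Roughly what I have in mind, as a minimal sketch only. The element and
attribute names (generator, fuel, curve, point, mw, ihr) and the sample
values are made up for illustration and are not a settled schema;
ElementTree does the parsing, and my code just walks the tree:

```python
import xml.etree.ElementTree as ET

# Hypothetical document layout; names and numbers are illustrative only.
SAMPLE = """
<generators>
  <generator name="Unit1">
    <fuel name="coal" uom="ton" heat_content="24.0" price="55.0"/>
    <curve name="summer">
      <point mw="100" ihr="9.5"/>
      <point mw="200" ihr="10.1"/>
    </curve>
  </generator>
</generators>
"""


def load_generators(text):
    """Return a list of plain dicts, one per <generator> element."""
    root = ET.fromstring(text)
    result = []
    for gen in root.findall("generator"):
        fuel = gen.find("fuel")
        # Each curve becomes a list of (mw, ihr) float pairs, keyed by name.
        curves = {
            curve.get("name"): [
                (float(p.get("mw")), float(p.get("ihr")))
                for p in curve.findall("point")
            ]
            for curve in gen.findall("curve")
        }
        result.append({
            "name": gen.get("name"),
            "fuel": dict(fuel.attrib),  # attribute values stay as strings
            "curves": curves,
        })
    return result


generators = load_generators(SAMPLE)
```

So the "parser" would really just be a thin layer that turns the element
tree into my own data structures.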

> 2nd, if you are not generating the data, then you have to use whatever data
> format you are supplied.

It's my own research, so I can give myself the data in any format that I
like.

> As far as I can see, the main issue with XML is bloat: it tries to do too
> many things & is a very verbose format; often the quantity of mark-up can
> easily exceed the data contained within it.
> 
> Other formats such as JSON & CSV have far less overhead, although again
> they are not always suitable.

I've heard of JSON, but never done anything with it.
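
From a quick look at the standard library's json module, nested data
such as a generator with a fuel and a curve of ordered pairs might be
read like this. This is only a sketch; the key names and the numbers are
hypothetical:

```python
import json

# Hypothetical layout; key names and values are illustrative only.
TEXT = """
{
  "generators": [
    {
      "name": "Unit1",
      "fuel": {"name": "coal", "uom": "ton", "heat_content": 24.0, "price": 55.0},
      "curves": [
        {"name": "summer", "points": [[100, 9.5], [200, 10.1]]}
      ]
    }
  ]
}
"""

data = json.loads(TEXT)  # nested dicts and lists, numbers already converted
first = data["generators"][0]

# JSON has no tuple type, so ordered pairs arrive as two-element lists.
points = [tuple(p) for p in first["curves"][0]["points"]]
```

The nesting comes back as ordinary dicts and lists, with no tree-walking
code needed.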

How does CSV handle hierarchical data? For instance, I have
generators[1], each of which has a name, a fuel and one or more
incremental heat rate curves. Each fuel has a name, UOM, heat content,
and price. Each incremental cost curve has a name, and a series of
ordered pairs (representing a piecewise linear curve).

Can CSV files model this sort of situation?
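
The only way I can see is to flatten the hierarchy into one table per
kind of thing and repeat the parent's name on every child row, then
stitch it back together after reading. A sketch of that idea, with
made-up column names and values, using the csv module:

```python
import csv
import io
from collections import defaultdict

# Hypothetical tables; in practice each would be its own .csv file.
GENERATORS_CSV = """generator,fuel
Unit1,coal
"""

POINTS_CSV = """generator,curve,mw,ihr
Unit1,summer,100,9.5
Unit1,summer,200,10.1
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


# Rebuild the hierarchy: group curve points by (generator, curve) name.
curves = defaultdict(list)
for row in read_rows(POINTS_CSV):
    key = (row["generator"], row["curve"])
    curves[key].append((float(row["mw"]), float(row["ihr"])))

# CSV values are all strings, so the fuel is just a name to look up elsewhere.
generators = {
    row["generator"]: {"fuel": row["fuel"]}
    for row in read_rows(GENERATORS_CSV)
}
```

That works, but the relationships live in my code and in repeated key
columns rather than in the file format itself.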

> As in all such cases, it is a matter of choosing the most appropriate tool
> for the job in hand.

Naturally. That's what I'm exploring.


[1] The kind made of tons of iron and copper, filled with oil, and
rotating at 1800 rpm.

-- 
Michael F. Stemper
This sentence no verb.

