XML Considered Harmful

Pete Forman petef4+usenet at gmail.com
Tue Sep 21 17:21:43 EDT 2021


"Michael F. Stemper" <michael.stemper at gmail.com> writes:

> On 21/09/2021 13.49, alister wrote:
>> On Tue, 21 Sep 2021 13:12:10 -0500, Michael F. Stemper wrote:
> It's my own research, so I can give myself the data in any format that I
> like.
>
>> as far as I can see the main issue with XML is bloat, it tries to do
>> too many things & is a very verbose format, often the quantity of
>> mark-up can easily exceed the data contained within it. other formats
>> such a JSON & csv have far less overhead, although again not always
>> suitable.
>
> I've heard of JSON, but never done anything with it.

Then you should certainly try to get a basic understanding of it. One
thing JSON shares with XML is that it is best left to machines to
produce and consume. Because both can be viewed in a text editor there
is a common misconception that they are easy to edit. Not so, commas are
a common bugbear in JSON and non-trivial edits in (XML unaware) text
editors are tricky.

Consider what overhead you should worry about. If you are concerned
about file sizes then XML, JSON and CSV should all compress to a similar
size.

> How does CSV handle hierarchical data? For instance, I have
> generators[1], each of which has a name, a fuel and one or more
> incremental heat rate curves. Each fuel has a name, UOM, heat content,
> and price. Each incremental cost curve has a name, and a series of
> ordered pairs (representing a piecewise linear curve).
>
> Can CSV files model this sort of situation?

The short answer is no. CSV files represent spreadsheet row-column
values with nothing fancier such as formulas or other redirections.

CSV is quite good as a lowest common denominator exchange format. I say
quite because I would characterize it by 8 attributes and you need to
pick a dialect such as MS Excel which sets out what those are. XML and
JSON are controlled much better. You can easily verify that you conform
to those and guarantee that *any* conformant parser can read your
content. XML is more powerful in that repect than JSON in that you can
define and enforce schemas. In your case the fuel name, UOM, etc. can be
validated with standard tools. In JSON all that checking is entirely
handled by the consuming program(s).

>> As in all such cases it is a matter of choosing the most apropriate tool
>> for the job in hand.
>
> Naturally. That's what I'm exploring.

You might also like to consider HDF5. It is targeted at large volumes of
scientific data and its capabilities are well above what you need.
MATLAB, Octave and Scilab use it as their native format. PyTables and
h2py provide Python/NumPy bindings to it.

-- 
Pete Forman


More information about the Python-list mailing list