XML Considered Harmful

Avi Gross avigross at verizon.net
Mon Sep 27 21:01:04 EDT 2021


Michael,

Given your further explanation, reading a varying number of points per curve
from a CSV is indeed a poor fit, although someone might just make N columns
(maybe a few more than 7) to handle a hoped-for worst case. It definitely
makes more sense to read the data into a list or other data structure.

You keep talking about generators, though. If the generators are outside of
your program, then yes, you need to read in whatever they produce. But if
your data generator is within your own program, that opens up other
possibilities. I am not saying you necessarily would want to use the usual
numpy/pandas modules and have some kind of data.frame. I do know other
languages (like R) where I have used columns that are lists.

My impression is you may not be using your set of data points for any other
purposes except when ready to draw a spline. Again, in some languages this
opens up many possibilities. A fairly trivial one is if you store your
points as something like "1.2:3.86:12:83.2" meaning a character string with
some divider. When ready to use that, it is fairly straightforward to
convert it to a list to use for your purpose.
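
As a minimal sketch of that conversion in python (the function name
parse_points and the choice of ":" as the default divider are only
illustrative assumptions, not anything from your code):

```python
def parse_points(text, sep=":"):
    """Split a delimited string such as "1.2:3.86:12:83.2" into floats."""
    # float() raises ValueError on a malformed field, which is usually
    # what you want for hand-edited data.
    return [float(field) for field in text.split(sep)]


points = parse_points("1.2:3.86:12:83.2")
print(points)  # [1.2, 3.86, 12.0, 83.2]
```

The string can hold any number of fields, so a curve with two points and a
curve with seven are stored the same way.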

Can I just ask whether, by a generator, you do NOT mean the more typical use
of "generator" in python, in which some code runs as needed to keep
generating the next item to work on? Do you mean something that creates
realistic test cases to simulate a real-world scenario? These often can
create everything at once, often based on random numbers. Again, if you
have or build such code, it is not clear the data needs to be written to
disk and then read back. You may of course want to save it, perhaps as a
log, to show what your program was working on.
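
A rough sketch of what building the test data inside the program might look
like, using a python generator function and the random module. The name
mock_curves, the value ranges, and the two-to-five point limits are all
placeholders chosen for illustration, not taken from any real data:

```python
import random


def mock_curves(count, seed=0, min_points=2, max_points=5):
    """Yield `count` made-up curves, each a list of (x, y) breakpoints."""
    rng = random.Random(seed)  # fixed seed so a run can be reproduced
    for _ in range(count):
        n = rng.randint(min_points, max_points)
        # Arbitrary placeholder ranges; sorting keeps breakpoints ordered.
        xs = sorted(rng.uniform(0.0, 100.0) for _ in range(n))
        ys = sorted(rng.uniform(5.0, 15.0) for _ in range(n))
        yield list(zip(xs, ys))


curves = list(mock_curves(3))
for curve in curves:
    print(curve)
```

Nothing touches the disk here; the same seed reproduces the same curves, and
the print loop could be swapped for a log file if a record is wanted.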



-----Original Message-----
From: Python-list <python-list-bounces+avigross=verizon.net at python.org> On
Behalf Of Michael F. Stemper
Sent: Monday, September 27, 2021 11:40 AM
To: python-list at python.org
Subject: Re: XML Considered Harmful

On 25/09/2021 16.39, Avi Gross wrote:
> Michael,
> 
> I don't care what you choose. Whatever works is fine for an internal use.

Maybe I should have taken the provoking article with a few more grains of
salt. At this point, I'm not seeing any issues that are applicable to my use
case.

> But is the data scheme you share representative of your actual
> application?
>
> From what I see below, unless the number of "point" variables is not
> always exactly four, the application might be handled well by any format
> that handles rectangular data, perhaps even CSV.
> 
> You show a I mean anything like a data.frame can contain data columns
> like p1,p2,p3,p4 and a categorical one like IHRcurve_name.
> 
> Or do you have a need for more variability such as an undetermined 
> number of similar units in ways that might require more flexibility or 
> be more efficient done another way?

As far as the number of points per IHR curve, the only requirement is that
there must be at least two. It's hard to define a line segment with only
one. The mock data that I have so far has curves ranging from two to five
points. I didn't notice that the snippet that I posted had two curves with
the same number of breakpoints, which was misleading.

My former employer's systems had, IIRC, space for seven points per curve in
the database structures. Of all the sizing changes made over a long career,
I don't recall any customer ever requiring more than that. But, it's
cleanest to use python lists (with no inherent sizing limitations) to
represent the IHR (and incremental cost) curves.
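
A minimal sketch of that list representation, enforcing the
at-least-two-points rule. The helper make_curve, the unit names, and every
number below are hypothetical, made up only to show the shape of the data:

```python
def make_curve(points):
    """Return a list of (x, y) breakpoints, requiring at least two."""
    curve = [(float(x), float(y)) for x, y in points]
    if len(curve) < 2:
        raise ValueError("a curve needs at least two points")
    return curve


# Hypothetical units with different numbers of breakpoints.
curves = {
    "unit_a": make_curve([(50, 10.2), (200, 9.1)]),
    "unit_b": make_curve([(40, 11.0), (120, 9.8), (300, 9.5),
                          (450, 9.9), (500, 10.4)]),
}

for name, curve in curves.items():
    print(name, len(curve), "points")
```

Because each curve is just a list, a two-point curve and a five-point curve
sit side by side with no fixed-width padding.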


> MOST of the discussion I am seeing here seems peripheral to getting
> you what you need for your situation and may require a learning curve
> to learn to use properly. Are you planning on worrying about how to
> ship your data encrypted, for example? Any file format you use for
> storage can presumably be encrypted and sent and decrypted if that
> matters.

This work is intended to look at the feasibility of relaxing some
constraints normally required for the solution of Economic Dispatch.
So all of my data are hypothetical. Once I have stuff up and running, I'll
be making up data for lots of different generators.

Being retired, I don't have access to any proprietary information about any
specific generators, so all of the data is made up out of my head. I still
need a way to get it into my programs, of course.

> So, yes, from an abstract standpoint we can discuss the merits of 
> various approaches. If it matters that humans can deal with your data 
> in a file or that it be able to be imported into a program like EXCEL, 
> those are considerations. But if not, there are quite a few relatively 
> binary formats where your program can save a snapshot of the data into 
> a file and read it back in next time.

Not needed here. I'm strictly interested in getting the models of
(generic) generating fleets in. Output of significant results will probably
be in CSV, which nicely replicates tabular displays that I used through most
of my career.

> Or, did I miss something and others have already produced the data
> using other tools, in which case you have to read it in at least once?

Well, the "tool" is vi, but this is a good description of what I'm doing.

--
Michael F. Stemper
The FAQ for rec.arts.sf.written is at
<http://leepers.us/evelyn/faqs/sf-written.htm>
Please read it before posting.
--
https://mail.python.org/mailman/listinfo/python-list


