enumerate XML tags (keys that will become headers) along with text (values) and write to CSV in one row (as opposed to "stacked" values with one header)

Sahlusar sahluwalia at wynyardgroup.com
Mon Jun 29 10:52:07 EDT 2015


On Sunday, 28 June 2015 03:46:56 UTC-4, Stefan Behnel  wrote:
> Denis McMahon schrieb am 26.06.2015 um 09:44:
> > xml data is an unordered list, and are trying to assign an order to it.
> > 
> > If the xml data was ordered, either each tag would be different, or each 
> > tag would have an attribute specifying a sequence number.
> 
> XML is not unordered. The document order is well defined and entirely
> obvious from the data. Whether this order is relevant and has a meaning or
> not is, however, not part of XML itself but is left to the semantics of the
> specific document format at hand. Meaning, XML document formats can choose
> to ignore that order and define it as irrelevant. That doesn't mean it's
> not there for a given document, but it may mean that a re-transmission of
> the same document would be allowed to use a different order without
> changing the information.
> 
> This property applies to pretty much all structured data formats and not
> just XML, by the way, also to CSV and other tabular formats.
> 
> Stefan

@Stefan, Ned, and Robert: You have all hit the nail on the head. I do not have an authoritative XSD (or real XML data structures, for that matter). So far all I have is deprecated and/or anonymized data from the client. Therefore, I can only hypothesize about what the end output will be for the database architecture that I am working with.

From what I understand, based on your constructive insight, the 14 occurrences of the same tag (regardless of their placement relative to neighbouring children and the root) are all being defined as the same key. However, their individual values are also being treated as the same by the algorithm that I wrote in my Stack Overflow post (please see above). My design constraint is that I am anticipating terabytes of data every day from the client in the coming months, so the algorithm should parse the XML and write it out to CSV as efficiently as possible. I welcome your feedback on this.

Here is the post, again, for your convenience: 

http://stackoverflow.com/questions/31058100/enumerate-column-headers-in-csv-that-belong-to-the-same-tag-key-in-python 
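
To make the goal concrete, here is a minimal sketch of the one-row layout I am after. It is not the algorithm from the Stack Overflow post. The sample document, the function names (flatten_to_row, row_to_csv) and the "tag_N" header scheme are all made up for illustration, since I do not have the real schema. Repeated tags are numbered in document order so that each occurrence gets its own column instead of being stacked under one header.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample; the real client data and XSD are not available.
SAMPLE = """<root>
  <item>a</item>
  <name>x</name>
  <item>b</item>
  <item>c</item>
</root>"""


def flatten_to_row(xml_text):
    """Return (headers, values), numbering repeated tags in document order."""
    root = ET.fromstring(xml_text)
    seen = Counter()  # how many times each tag has been met so far
    headers, values = [], []
    for elem in root.iter():  # iter() walks the tree in document order
        text = (elem.text or "").strip()
        if not text:
            continue  # skip elements that carry no text of their own
        seen[elem.tag] += 1
        headers.append("%s_%d" % (elem.tag, seen[elem.tag]))
        values.append(text)
    return headers, values


def row_to_csv(headers, values):
    """Write one header line and one value line, return the CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(headers)
    writer.writerow(values)
    return buf.getvalue()


headers, values = flatten_to_row(SAMPLE)
print(row_to_csv(headers, values))
```

This reads the whole document into memory with ET.fromstring, so it is only meant to show the header enumeration. For the volumes I expect, something incremental such as ET.iterparse would presumably be needed instead, and I have not attempted that here.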




