JSON Object to CSV File Troubleshooting

Sahlusar ahlusar.ahluwalia at gmail.com
Sun Jun 21 19:56:27 EDT 2015


On Sunday, June 21, 2015 at 7:34:47 PM UTC-4, Denis McMahon wrote:
> On Mon, 22 Jun 2015 00:55:11 +0300, Joonas Liik wrote:
> 
> > In xml for instance this is valid:
> 
> > <a>
> >  <b>1</b>
> > </a>
> > .. and so is this:
> > <a>
> >  <b>1</b> <b>2</b>
> > </a>
> 
> What the OP needs to do is sit down with the XML and work out how it 
> needs to be represented in CSV terms, and then code that transformation, 
> but it appears that he's not listening.
> 
> ie, does your xml:
> 
> <a>
>   <b>1</b> <b>2</b>
> </a>
> 
> translate to CSV:
> 
> "a","b"   // headers
> "b",1     // data row 1
> "b",2     // data row 2
> 
> or to CSV:
> 
> "a", "b", "b"     // headers
> "", 1, 2          // data row
> 
> or even CSV:
> 
> "b"   // headers
> 1     // data row 1
> 2     // data row 2
> 
> If he can't codify that in a consistent manner across all the XML he 
> wishes to process, then he really does need to find someone competent to 
> do the job instead of wallowing around in json until the client gives up 
> in despair at the lack of progress and finds someone else to do the job.
> 
> This should really have been defined by whoever set the task to do the 
> conversion. If the job is to convert from some XML DTD to a CSV format, 
> then there should be a clear description of what extracts from the XML 
> are expected to be in which positions in the CSV.
> 
> This is the sort of data conversion code I generally turn out in a day or 
> so, it's hardly rocket science as long as you have a clear description of 
> what is required. If you don't have a clear description of what is 
> required, you have to keep asking questions until you get one.
> 
> -- 
> Denis McMahon, denismfmcmahon at gmail.com



On Sunday, June 21, 2015 at 7:34:47 PM UTC-4, Denis McMahon wrote:
> On Mon, 22 Jun 2015 00:55:11 +0300, Joonas Liik wrote:
> 
> > In xml for instance this is valid:
> 
> > <a>
> >  <b>1</b>
> > </a>
> > .. and so is this:
> > <a>
> >  <b>1</b> <b>2</b>
> > </a>
> 
> What the OP needs to do is sit down with the XML and work out how it 
> needs to be represented in CSV terms, and then code that transformation, 
> but it appears that he's not listening.
> 
> ie, does your xml:
> 
> <a>
>   <b>1</b> <b>2</b>
> </a>
> 
> translate to CSV:
> 
> "a","b"   // headers
> "b",1     // data row 1
> "b",2     // data row 2
> 
> or to CSV:
> 
> "a", "b", "b"     // headers
> "", 1, 2          // data row
> 
> or even CSV:
> 
> "b"   // headers
> 1     // data row 1
> 2     // data row 2
> 
> If he can't codify that in a consistent manner across all the XML he 
> wishes to process, then he really does need to find someone competent to 
> do the job instead of wallowing around in json until the client gives up 
> in despair at the lack of progress and finds someone else to do the job.
> 
> This should really have been defined by whoever set the task to do the 
> conversion. If the job is to convert from some XML DTD to a CSV format, 
> then there should be a clear description of what extracts from the XML 
> are expected to be in which positions in the CSV.
> 
> This is the sort of data conversion code I generally turn out in a day or 
> so, it's hardly rocket science as long as you have a clear description of 
> what is required. If you don't have a clear description of what is 
> required, you have to keep asking questions until you get one.
> 
> -- 
> Denis McMahon, denismfmcmahon at gmail.com



On Sunday, June 21, 2015 at 7:34:47 PM UTC-4, Denis McMahon wrote:
> On Mon, 22 Jun 2015 00:55:11 +0300, Joonas Liik wrote:
> 
> > In xml for instance this is valid:
> 
> > <a>
> >  <b>1</b>
> > </a>
> > .. and so is this:
> > <a>
> >  <b>1</b> <b>2</b>
> > </a>
> 
> What the OP needs to do is sit down with the XML and work out how it 
> needs to be represented in CSV terms, and then code that transformation, 
> but it appears that he's not listening.
> 
> ie, does your xml:
> 
> <a>
>   <b>1</b> <b>2</b>
> </a>
> 
> translate to CSV:
> 
> "a","b"   // headers
> "b",1     // data row 1
> "b",2     // data row 2
> 
> or to CSV:
> 
> "a", "b", "b"     // headers
> "", 1, 2          // data row
> 
> or even CSV:
> 
> "b"   // headers
> 1     // data row 1
> 2     // data row 2
> 
> If he can't codify that in a consistent manner across all the XML he 
> wishes to process, then he really does need to find someone competent to 
> do the job instead of wallowing around in json until the client gives up 
> in despair at the lack of progress and finds someone else to do the job.
> 
> This should really have been defined by whoever set the task to do the 
> conversion. If the job is to convert from some XML DTD to a CSV format, 
> then there should be a clear description of what extracts from the XML 
> are expected to be in which positions in the CSV.
> 
> This is the sort of data conversion code I generally turn out in a day or 
> so, it's hardly rocket science as long as you have a clear description of 
> what is required. If you don't have a clear description of what is 
> required, you have to keep asking questions until you get one.
> 
> -- 
> Denis McMahon, denismfmcmahon at gmail.com


I have gone back to the drawing board and scrapped that idea of an intermediate JSON/dictionary. 


This is the output that I seek:

> "a","b"   // headers
> "b",1     // data row 1
> "b",2     // data row 2
> 

Here is an example XML document that I am working with:

<Response ID="24856-775" RequestType="Moverview">        
        <MonthDayCount>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
        </MonthDayCount>
        <Warnings />
        <SList />
        <LList />
        <EA>Y</EA>
        <EHA>Y</EHA>
        <EBY>Y</EBY>
        <EOTH>Y</EOTH>
        <EIL>Y</EIL>
        <EM>Y</EM>
        <ED>Y</ED>
        <EQ>Y</EQ>
        <ERS>Y</ERS>
        <ECCS>Y</ECCS>
        <EES>Y</EES>
        <UAS>Y</UAS>
        <PA>False</PA>
        <PL>False</PL>
        <PC>False</PC>
        <PCs>False</PCs>
        <PJ>False</PJ>
        <OITC>0</OITC>
        <MG />
        <R />
        <CCGoods />
    </MO>
</Response>


So far I have gotten this output:

""" my output -->
('Response.RequestType', 'Moverview')
('Response.ID', '24856-775')
('Response.MonthDayCount.Int32.', '0')
('Response.MonthDayCount.Int32.', '0')
('Response.MonthDayCount.Int32.', '0')
('Response.MonthDayCount.Int32.', '0')
('Response.MonthDayCount.Int32.', '0')
('Response.MonthDayCount.Int32.', '0')
('Response.MonthDayCount.Int32.', '0')
('Response.MonthDayCount.Int32.', '0')
('Response.MonthDayCount.Int32.', '0')
('Response.MonthDayCount.Int32.', '0')
('Response.MonthDayCount.Int32.', '0')
('Response.MonthDayCount.Int32.', '0')
('Response.MonthDayCount.Int32.', '0')
('Response.MonthDayCount.Int32.', '0')
('Response.MonthDayCount.Int32.', '0')
('Response.MonthDayCount.Int32.', '0')
('Response.MonthDayCount.Int32.', '0')
('Response.MonthDayCount.Int32.', '0')
('Response.MonthDayCount.Int32.', '0')
('Response.MonthDayCount.Int32.', '0')
('Response.MonthDayCount.Int32.', '0')
('Response.MonthDayCount.Int32.', '0')
('Response.MonthDayCount.Int32.', '0')
('Response.MonthDayCount.Int32.', '0')
('Response.MonthDayCount.Int32.', '0')
('Response.Warnings.', None)
('Response.SList.', None)
('Response.LList.', None)
('Response.EA.', 'Y')
('Response.EHA.', 'Y')
('Response.EBY.', 'Y')
('Response.EOTH.', 'Y')
('Response.EIL.', 'Y')
('Response.EM.', 'Y')
('Response.ED.', 'Y')
('Response.EQ.', 'Y')
('Response.ERS.', 'Y')
('Response.ECCS.', 'Y')
('Response.EES.', 'Y')
('Response.UAS.', 'Y')
('Response.PA.', 'False')
('Response.PL.', 'False')
('Response.PC.', 'False')
('Response.PCs.', 'False')
('Response.PJ.', 'False')
('Response.OITC.', '0')
('Response.MG.', None)
('Response.R.', None)
('Response.CCGoods.', None)
"""

The function that produced this has been redesigned to flatten the XML using a generator:

import xml.etree.cElementTree as ElementTree 
from xml.etree.ElementTree import XMLParser 
import json 
import csv 
import tokenize 
import token 
try: 
    from collections import OrderedDict 
    import json 
except ImportError: 
    from ordereddict import OrderedDict 
    import simplejson as json 
import itertools 
import six 
import string 
#from csvkit import CSVKitWriter 
def flatten_list(aList, prefix=''): 
    for element in aList:
        if element: 
            # treat like dict 
            if len(element) == 1 or element[0].tag != element[1].tag: 
                yield from flatten_dict(element, prefix)
            # treat like list 
            elif element[0].tag == element[1].tag: 
                yield from flatten_list(element, prefix)
        elif element.text: 
            text = element.text.strip() 
            if text: 
                yield prefix, text
def flatten_dict(parent_element, prefix=''):
    prefix = prefix + parent_element.tag + '.'
    if parent_element.items():
        for k, v in parent_element.items():
            yield prefix + k, v
    for element in parent_element:
        eprefix = prefix + element.tag + '.'
        if element:
            # treat like dict - we assume that if the first two tags 
            # in a series are different, then they are all different. 
            if len(element) == 1 or element[0].tag != element[1].tag: 
                yield from flatten_dict(element, prefix=prefix)
            # treat like list - we assume that if the first two tags 
            # in a series are the same, then the rest are the same. 
            else: 
                # here, we put the list in dictionary; the key is the 
                # tag name the list elements all share in common, and 
                # the value is the list itself
                yield from flatten_list(element, prefix=eprefix+element[0].tag+'.')
            # if the tag has attributes, add those to the dict
            if element.items():
                for k, v in element.items():
                    yield eprefix+k, v 
        # this assumes that if you've got an attribute in a tag, 
        # you won't be having any text. This may or may not be a 
        # good idea -- time will tell. It works for the way we are 
        # currently doing XML configuration files... 
        elif element.items(): 
            for k, v in element.items():
                yield eprefix+k, v
        # finally, if there are no child tags and no attributes, extract 
        # the text 
        else:
            yield eprefix, element.text
def main():
    with open('source.xml', 'r', encoding='utf-8') as f: 
        xml_string = f.read() 
    xml_string= xml_string.replace('&#x0;', '') #optional to remove ampersands. 
    root = ElementTree.XML(xml_string) 
    for item in flatten_dict(root):
        print(item)
if __name__ == "__main__":
    main()


I now seek to work on the following. 

1.

(**'Response.RequestType'**, 'Moverview') 
(**'Response.ID', **'24856-775') 

The ** 'sample.text.in_tag ** corresponds with the column headers. The values would be the second item in the tuple.

All of the "Int32" should be stripped away and replaced with an enumeration number corresponding with an index for the header "MonthDayCount" (for example MonthDayCount_1, MonthDayCount_2, MonthDayCount_3, MonthDayCount_4 etc)


The Responses for the rest of the elements (save for the first two above (under item number 1) could be removed). I am still trying to figure how to do that as well...

Again The first and second item in the tuple would be the header and corresponding value, respectively - just like your initial assumption. 

You are welcome to contribute and provide me with feedback. Thank you for your continued feedback and guidance. 



More information about the Python-list mailing list