To write headers once with different values in separate row in CSV

kbtyo ahlusar.ahluwalia at gmail.com
Thu Jun 25 12:37:12 EDT 2015


Okay, so I have gone back to the drawing board and have the following predicament (my apologies, in advance for the indentation):

Here is my sample:
    <Response ID="24856-775" RequestType="Moverview">        
            <MonthDayCount>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
                <Int32>0</Int32>
            </MonthDayCount>
    			<FeeCount>
    					<Int32>0</Int32>
    					<Int32>0</Int32>
    					<Int32>0</Int32>
    					<Int32>0</Int32>
    					<Int32>0</Int32>
    					<Int32>0</Int32>
    			</FeeCount>
    			<PaymentBucketAmount>
    						<Double>0</Double>
    						<Double>0</Double>
    						<Double>0</Double>
    						<Double>0</Double>
    						<Double>0</Double>
    						<Double>0</Double>
    						<Double>0</Double>
    						<Double>0</Double>
    					</PaymentBucketAmount>
    					<PaymentBucketDueDate>
    						<DateTime>1/1/0001 12:00:00 AM</DateTime>
    						<DateTime>1/1/0001 12:00:00 AM</DateTime>
    						<DateTime>1/1/0001 12:00:00 AM</DateTime>
    						<DateTime>1/1/0001 12:00:00 AM</DateTime>
    						<DateTime>1/1/0001 12:00:00 AM</DateTime>
    						<DateTime>1/1/0001 12:00:00 AM</DateTime>
    						<DateTime>1/1/0001 12:00:00 AM</DateTime>
    						<DateTime>1/1/0001 12:00:00 AM</DateTime>
    					</PaymentBucketDueDate>
            <Warnings />
            <SList />
            <LList />
            <PA>False</PA>
            <PL>False</PL>
            <PC>False</PC>
            <PCs>False</PCs>
            <PJ>False</PJ>
            <OITC>0</OITC>
            <MG />
            <R />
            <CCGoods />
    </Response>


Using this:


    import xml.etree.cElementTree as ElementTree 
    from xml.etree.ElementTree import XMLParser
    import csv
    
    def flatten_list(aList, prefix=''):
        for i, element in enumerate(aList, 1):
            eprefix = "{}{}".format(prefix, i)
            if element:
                # treat like dict 
                if len(element) == 1 or element[0].tag != element[1].tag: 
                    yield from flatten_dict(element, eprefix)
                # treat like list 
                elif element[0].tag == element[1].tag: 
                    yield from flatten_list(element, eprefix)
            elif element.text: 
                text = element.text.strip() 
                if text: 
                    yield eprefix[:].rstrip('.'), element.text
    
    def flatten_dict(parent_element, prefix=''):
        prefix = prefix + parent_element.tag 
        if parent_element.items():
            for k, v in parent_element.items():
                yield prefix + k, v
        for element in parent_element:
            eprefix = prefix + element.tag  
            if element:
                # treat like dict - we assume that if the first two tags 
                # in a series are different, then they are all different. 
                if len(element) == 1 or element[0].tag != element[1].tag: 
                    yield from flatten_dict(element, prefix=prefix)
                # treat like list - we assume that if the first two tags 
                # in a series are the same, then the rest are the same. 
                else: 
                    # here, we put the list in dictionary; the key is the 
                    # tag name the list elements all share in common, and 
                    # the value is the list itself
                    yield from flatten_list(element, prefix=eprefix)
                # if the tag has attributes, add those to the dict
                if element.items():
                    for k, v in element.items():
                        yield eprefix+k
            # this assumes that if you've got an attribute in a tag, 
            # you won't be having any text. This may or may not be a 
            # good idea -- time will tell. It works for the way we are 
            # currently doing XML configuration files... 
            elif element.items(): 
                for k, v in element.items():
                    yield eprefix+k
            # finally, if there are no child tags and no attributes, extract 
            # the text 
            else:
                yield eprefix, element.text                
    
    def makerows(pairs):
        headers = []
        columns = {}
        for k, v in pairs:
            if k in columns:
                columns[k].extend((v,))
            else:
                headers.append(k)
                columns[k] = [k, v]
        m = max(len(c) for c in columns.values())
        for c in columns.values():
            c.extend(' ' for i in range(len(c), m))
        L = [columns[k] for k in headers]
        rows = list(zip(*L))
        return rows                   
                        
    
    def main():
        with open('sample.xml', 'r', encoding='utf-8') as f: 
            xml_string = f.read() 
        xml_string= xml_string.replace('&#x0;', '') #optional to remove ampersands. 
        root = ElementTree.XML(xml_string) 
        for key, value in flatten_dict(root):
            key = key.rstrip('.').rsplit('.', 1)[-1]
            print(key,value)            
            
    if __name__ == "__main__":
        main()


I receive this output:

    ResponseRequestType Moverview
    ResponseID 24856-775
    ResponseMonthDayCount1 0
    ResponseMonthDayCount2 0
    ResponseMonthDayCount3 0
    ResponseMonthDayCount4 0
    ResponseMonthDayCount5 0
    ResponseMonthDayCount6 0
    ResponseMonthDayCount7 0
    ResponseMonthDayCount8 0
    ResponseMonthDayCount9 0
    ResponseMonthDayCount10 0
    ResponseMonthDayCount11 0
    ResponseMonthDayCount12 0
    ResponseMonthDayCount13 0
    ResponseMonthDayCount14 0
    ResponseMonthDayCount15 0
    ResponseMonthDayCount16 0
    ResponseMonthDayCount17 0
    ResponseMonthDayCount18 0
    ResponseMonthDayCount19 0
    ResponseMonthDayCount20 0
    ResponseMonthDayCount21 0
    ResponseMonthDayCount22 0
    ResponseMonthDayCount23 0
    ResponseMonthDayCount24 0
    ResponseMonthDayCount25 0
    ResponseFeeCount1 0
    ResponseFeeCount2 0
    ResponseFeeCount3 0
    ResponseFeeCount4 0
    ResponseFeeCount5 0
    ResponseFeeCount6 0
    ResponsePaymentBucketAmount1 0
    ResponsePaymentBucketAmount2 0
    ResponsePaymentBucketAmount3 0
    ResponsePaymentBucketAmount4 0
    ResponsePaymentBucketAmount5 0
    ResponsePaymentBucketAmount6 0
    ResponsePaymentBucketAmount7 0
    ResponsePaymentBucketAmount8 0
    ResponsePaymentBucketDueDate1 1/1/0001 12:00:00 AM
    ResponsePaymentBucketDueDate2 1/1/0001 12:00:00 AM
    ResponsePaymentBucketDueDate3 1/1/0001 12:00:00 AM
    ResponsePaymentBucketDueDate4 1/1/0001 12:00:00 AM
    ResponsePaymentBucketDueDate5 1/1/0001 12:00:00 AM
    ResponsePaymentBucketDueDate6 1/1/0001 12:00:00 AM
    ResponsePaymentBucketDueDate7 1/1/0001 12:00:00 AM
    ResponsePaymentBucketDueDate8 1/1/0001 12:00:00 AM
    ResponseWarnings None
    ResponseSList None
    ResponseLList None
    ResponsePA False
    ResponsePL False
    ResponsePC False
    ResponsePCs False
    ResponsePJ False
    ResponseOITC 0
    ResponseMG None
    ResponseR None
    ResponseCCGoods None


When I write it out to the CSV, using :        


    writer = csv.writer(open("try2.csv", 'wt')) 
    writer.writerows(makerows(flatten_dict(root)))`


I still receive the headers with the **Response** chained to the sub-elements of the root with the tags' text as the values (which is just fine). My goal is to only have the sublements as the headers (along with their values), sans the Response (unless it is the tag name). Thanks you all for your feedback! For example, ResponsePaymentBucketAmount1 0, should just be PaymentBucketAmount1 as the header.



More information about the Python-list mailing list