stripping fields from xml file into a csv

Stefan Behnel stefan_ml at behnel.de
Mon Mar 1 02:46:04 EST 2010


Hal Styli, 01.03.2010 00:15:
> Stefan, I was happy to see such concise code.
> Your python worked with only very minor modifications.
> 
> Hai's test xml data *without* the first and last line is close enough
> to the data I am using:
> 
> <order customer="john" product="eggs" quantity="12" />
> <order customer="cindy" product="bread" quantity="1" />
> <order customer="larry" product="tea bags" quantity="100" />
> <order customer="john" product="butter" quantity="1" />
> <order product="chicken" quantity="2" customer="derek" />
> 
> ... quirky.
>
> I get a large file given to me in this format. I believe it is
> created by something like:
> grep 'customer=' *.xml, where there are a large number of xml files.

Try to get this fixed at the source. Exporting non-XML that looks like XML
is not a good idea in general, and it means that everyone who wants to read
the data has to adapt, instead of fixing the source once and for all.


> I had to edit the data to include the first and last lines, <orders>
> and </orders>,
> to get the python code to work. It's not an arduous task(!), but can
> you recommend a way to get it to work without
> manually editing the data?

Iff this cannot be fixed at the source, you can write a file-like wrapper
around a file that simply returns the boundary tags before and after
reading from the file itself. All you need is a .read(n) method; see the
documentation of the file type.
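
A minimal sketch of such a wrapper, assuming Python 3 and the stdlib
xml.etree.ElementTree parser. The names RootTagWrapper and orders_to_csv are
made up for illustration, and the root tag "orders" is just the one used in
this thread. Note that read() hands out the boundary tags as separate chunks
and may return more than n bytes for them, so this is only suitable for
consumers that keep calling read() until it returns an empty result, as the
XML parsers do:

```python
import csv
import xml.etree.ElementTree as ET


class RootTagWrapper:
    """File-like object that surrounds a binary stream with a root element.

    Hypothetical helper: returns the opening tag first, then the wrapped
    file's data, then the closing tag, then b"" to signal end of input.
    """

    def __init__(self, fileobj, tag="orders"):
        self._file = fileobj
        self._head = ("<%s>" % tag).encode("ascii")
        self._tail = ("</%s>" % tag).encode("ascii")
        self._state = "head"

    def read(self, n=-1):
        if self._state == "head":
            # First call: emit the opening boundary tag only.
            self._state = "body"
            return self._head
        if self._state == "body":
            data = self._file.read(n)
            if data:
                return data
            # Wrapped file is exhausted: emit the closing boundary tag.
            self._state = "tail"
            return self._tail
        return b""


def orders_to_csv(fileobj, out):
    """Write customer, product, quantity of each <order> as a CSV row."""
    writer = csv.writer(out)
    for _, elem in ET.iterparse(RootTagWrapper(fileobj)):
        if elem.tag == "order":
            writer.writerow(
                [elem.get("customer"), elem.get("product"), elem.get("quantity")]
            )
            elem.clear()  # free memory, the input may be large
```

To use it, open the data file in binary mode and the output file with
newline="" (as the csv module documentation recommends), then call
orders_to_csv(infile, outfile).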


> One other thing, what's the Roland Mueller post above about (I'm
> viewing this in Google Groups)? What would the test.xsl file look
> like?

This is the XSLT script he posted:

============================
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:fo="http://www.w3.org/1999/XSL/Format"
  version="1.0">

<!-- text output because we want to have a CSV file -->
<xsl:output method="text"/>

<!-- remove all whitespace coming with input XML -->
<xsl:strip-space elements="*"/>

<!-- matches any <order> element and extracts the customer, product and
quantity attributes -->
<xsl:template match="order">
  <xsl:value-of select="@customer"/>
  <xsl:text>,</xsl:text>
  <xsl:value-of select="@product"/>
  <xsl:text>,</xsl:text>
  <xsl:value-of select="@quantity"/>
  <xsl:text>
</xsl:text>
</xsl:template>

</xsl:stylesheet>
============================

Stefan



