stripping fields from xml file into a csv

Sun Feb 28 07:01:03 EST 2010

Hello,

2010/2/28 Stefan Behnel <stefan_ml at behnel.de>

> Hal Styli, 27.02.2010 21:50:
> > I have a sed solution to the problems below but would like to rewrite
> > in python...
>
> Note that sed (or any other line based or text based tool) is not a
> sensible way to handle XML. If you want to read XML, use an XML parser.
> They are designed to do exactly what you want in a standard compliant way,
> and they can deal with all sorts of XML formatting and encoding, for
> example.
>
>
> > I need to strip out some data from a quirky xml file into a csv:
> >
> > from something like this
> >
> > < ..... cust="dick" .... product="eggs" ... quantity="12" .... >
> > < .... cust="tom" .... product="milk" ... quantity="2" ...>
> > < .... cust="harry" .... product="bread" ... quantity="1" ...>
> > < .... cust="tom" .... product="eggs" ... quantity="6" ...>
> > < ..... cust="dick" .... product="eggs" ... quantity="6" .... >
>
> As others have noted, this doesn't tell much about your XML. A more
> complete example would be helpful.
>
>
> > to this
> >
> > dick,eggs,12
> > tom,milk,2
> > harry,bread,1
> > tom,eggs,6
> > dick,eggs,6
> >
> > I am new to python and xml and it would be great to see some slick
> > ways of achieving the above by using python's XML capabilities to
> > parse the original file or python's regex to achieve what I did using
> > sed.
>
>
Another solution in this case could be to use an XSLT stylesheet. That way
the extraction logic lives in the stylesheet rather than in the Python code.

The stylesheet is test.xsl and the input data is test.xml. The following
Python code then applies the stylesheet to the input data and writes the
output to the file foo.

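Python code:
#!/usr/bin/python
import libxml2
import libxslt

# parse the stylesheet and compile it
styledoc = libxml2.parseFile("test.xsl")
style = libxslt.parseStylesheetDoc(styledoc)

# parse the input document and apply the stylesheet (no parameters)
doc = libxml2.parseFile("test.xml")
result = style.applyStylesheet(doc, None)

# write the result to the file "foo" (0 = no compression)
style.saveResultToFilename("foo", result, 0)

# the libxml2/libxslt bindings need explicit cleanup
style.freeStylesheet()
doc.freeDoc()
result.freeDoc()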

BR,
Roland

*Example run in Linux:*
roland at komputer:~/Desktop/XML/XSLT$ ./xslt_test.py
roland at komputer:~/Desktop/XML/XSLT$ cat foo
john,eggs,12
cindy,bread,1
larry,tea bags,100
john,butter,1
derek,chicken,2
derek,milk,2

*The test.xsl stylesheet:*
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:fo="http://www.w3.org/1999/XSL/Format"
  version="1.0">

<!-- text output because we want a CSV file -->
<xsl:output method="text"/>

<!-- remove all whitespace coming with input XML -->
<xsl:strip-space elements="*"/>

<!-- matches any <order> element and extracts the customer, product and
quantity attributes -->
<xsl:template match="order">
  <xsl:value-of select="@customer"/>
  <xsl:text>,</xsl:text>
  <xsl:value-of select="@product"/>
  <xsl:text>,</xsl:text>
  <xsl:value-of select="@quantity"/>
  <xsl:text>
</xsl:text>
</xsl:template>

</xsl:stylesheet>
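
For comparison, here is a minimal sketch of the same extraction using only
the Python standard library instead of XSLT. It assumes the input has
<order> elements carrying customer, product and quantity attributes, which
is what the stylesheet above matches. The sample data and the function name
orders_to_csv are made up for illustration; the real test.xml is not shown
in this post.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample input; the actual test.xml may be structured differently.
SAMPLE_XML = """<orders>
  <order customer="john" product="eggs" quantity="12"/>
  <order customer="cindy" product="bread" quantity="1"/>
  <order customer="larry" product="tea bags" quantity="100"/>
</orders>"""


def orders_to_csv(xml_text):
    """Return CSV text with one customer,product,quantity row per <order>."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # iter() visits <order> elements at any depth below the root.
    for order in root.iter("order"):
        writer.writerow(
            [order.get("customer"), order.get("product"), order.get("quantity")]
        )
    return out.getvalue()


if __name__ == "__main__":
    print(orders_to_csv(SAMPLE_XML), end="")
```

Using the csv module rather than joining strings by hand means a value that
itself contains a comma gets quoted properly, which the plain XSLT text
output above would not do.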