[python-win32] Properly encoded HTML from MSXML XSLT processor into Python string (via IStream)?

Andreas Neubauer aneubauer at ra.rockwell.com
Mon Sep 10 21:23:57 CEST 2007


Dear all,
I am using Microsoft XML Core Services (MSXML 4.0) as an XSLT processor 
from Python, and I ran into a trap when trying to generate properly 
UTF-8 encoded HTML: 
  The encoding declaration is missing from the HTML header, and 
non-breaking spaces (UTF-8 hex bytes C2 A0) are converted to the single 
byte A0. 
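
To make the byte-level symptom concrete, here is a small sketch in plain Python 3 (no MSXML involved). It assumes the character in question is U+00A0 (NO-BREAK SPACE). Latin-1 is used only as one example of a conversion that yields a lone A0 byte; which conversion actually happens in the pipeline is not known.

```python
# U+00A0 (NO-BREAK SPACE) is the character behind the C2 A0 byte pair.
nbsp = "\u00a0"

utf8_bytes = nbsp.encode("utf-8")      # two bytes: C2 A0
latin1_bytes = nbsp.encode("latin-1")  # one byte:  A0 (example encoding only)

print(utf8_bytes.hex(), latin1_bytes.hex())

# A lone A0 byte is not valid UTF-8, so a consumer expecting UTF-8
# cannot decode it.
try:
    latin1_bytes.decode("utf-8")
    lone_a0_is_valid_utf8 = True
except UnicodeDecodeError:
    lone_a0_is_valid_utf8 = False

print(lone_a0_is_valid_utf8)
```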

From testing and from reading the Microsoft documentation, I found that 
this works correctly if the target output is of type IStream.
Can I somehow use a Microsoft IStream object from Python, or implement 
one in a suitable manner?

The XSLT stylesheet controls the output like this:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" >
        <xsl:output version="1.0" method="html" indent="no" encoding="UTF-8"/>

However, this output configuration is ignored unless the IXSLProcessor 
is used with a custom output (e.g. an IStream object).

Microsoft: "When a new transform is started, the processor will use a 
QueryInterface this output for IStream. When the transform is complete or 
reset is called, IStream is released. The only method that is used on 
IStream is Write. The bytes written to the stream will be encoded 
according to the encoding attribute on the <xsl:output> element.
If you do not provide a custom output, then you will get a string when you 
read this property. The string contains the incrementally buffered 
transformation result.
Reading this property has the side effect of resetting that internal 
buffer so that each time you read the property you get the next chunk of 
output. In this case, the output is always generated in the Unicode 
encoding, and the encoding attribute on the <xsl:output> element is 
ignored."
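
Since the string read from the property is a Unicode string, one possible fallback is to encode it explicitly on the Python side. The following is only a minimal sketch in Python 3: `encode_html` is a hypothetical helper, and the sample string stands in for the value read from the output property. It does not restore the missing encoding declaration in the HTML header.

```python
def encode_html(markup, encoding="utf-8"):
    """Encode a Unicode transform result to bytes (hypothetical helper)."""
    return markup.encode(encoding)


# Stand-in for the Unicode string read from the processor's output property.
sample = "<html><body>a\u00a0b</body></html>"

encoded = encode_html(sample)
print(encoded)
```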

This is what happens in the following Python example, where no custom 
output is set:
-----------------------------------------------------------------------------------
import win32com.client

_msxmlLib = win32com.client.gencache.EnsureModule(
    "{F5078F18-C551-11D3-89B9-0000F81FE221}", 0, 4, 0)
...
xslt = win32com.client.dynamic.Dispatch("Msxml2.XSLTemplate.4.0")
...
    xslProc = xslt.createProcessor()
    xslProc.input = xmlDoc
    xslProc.transform()

    # No custom output was set, so the result is read back as a string
    xmlData = xslProc.output
-----------------------------------------------------------------------------------
If I try to pass a custom output like this:
-----------------------------------------------------------------------------------
   xmlData=""
   xslProc.transform()
   xslProc.output(xmlData)
-----------------------------------------------------------------------------------
I get an error: 
  "Exception during xslt transformation: 'unicode' object is not callable"
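
The wording of the error suggests that output is read as a property that yields a string, and that the string is then called like a function. This can be reproduced without COM. In the sketch below, `FakeProcessor` is a made-up stand-in, not the real MSXML object, and under Python 3 the message names 'str' rather than 'unicode'.

```python
class FakeProcessor:
    """Stand-in for the XSLT processor; not the real COM object."""

    @property
    def output(self):
        # Reading the property yields a string, as in the no-custom-output case.
        return "<html></html>"


proc = FakeProcessor()

try:
    # Reads the property (a string) and then tries to call that string.
    proc.output("")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```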

Is there a way in Python to provide a suitable object that receives the 
output without the encoding statement being ignored?
Is there any proven way of using IXSLProcessor to generate properly 
encoded HTML in Python?
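
One direction that might be worth trying is to assign a stream object to the output property before calling transform(), instead of calling output afterwards. The sketch below is untested and rests on assumptions: that an ADODB.Stream COM object is accepted by MSXML as a custom output, and that pywin32 passes it through unchanged. `transform_to_bytes` and its parameter names are hypothetical.

```python
def transform_to_bytes(xsl_proc, xml_doc, stream=None):
    """Run the transform into a binary stream and return the raw bytes.

    Assumes xsl_proc is an IXSLProcessor and xml_doc a loaded DOM document.
    If no stream is given, an ADODB.Stream is created (Windows, pywin32).
    """
    if stream is None:
        import win32com.client  # imported lazily: only available on Windows
        stream = win32com.client.Dispatch("ADODB.Stream")

    stream.Type = 1  # adTypeBinary
    stream.Open()
    try:
        xsl_proc.input = xml_doc
        xsl_proc.output = stream   # assign the property, do not call it
        xsl_proc.transform()
        stream.Position = 0        # rewind before reading
        data = stream.Read()
        return bytes(data) if data is not None else b""
    finally:
        stream.Close()
```

If this works as assumed, the returned bytes would already be encoded according to the encoding attribute on <xsl:output>, so they could be written to a file opened in binary mode or decoded with .decode("utf-8").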

Kind regards
Andreas Neubauer

 