From janssen at parc.com Tue Jun 10 21:13:35 2008
From: janssen at parc.com (Bill Janssen)
Date: Tue, 10 Jun 2008 12:13:35 PDT
Subject: [XML-SIG] "encoding" argument to xml.dom.minidom.toxml()?
Message-ID: <08Jun10.121338pdt."58698"@synergy1.parc.xerox.com>

I've been using the minidom to produce little properly-formatted XML
documents, by building a DOM tree, then calling "toxml" to generate
the actual XML. But I tripped over the optional "encoding" argument
to that function.

I figured that the only point of having an encoding argument would be
to allow the user to control the output character set encoding, but it
turns out that specifying an encoding of, say, "ASCII", doesn't do
that. It just raises encoding exceptions when you attempt to encode a
non-ASCII character. What's the point of having an encoding argument
when it always has to be "UTF-8"?

Especially since it seems that this could be made useful by changing
one line of code. In xml/dom/minidom.py, in the class Node, in the
method "toprettyxml", change the line

    writer = codecs.lookup(encoding)[3](writer)

to

    writer = codecs.lookup(encoding)[3](writer, "xmlcharrefreplace")

What am I missing here?

Bill

From stefan_ml at behnel.de Tue Jun 10 21:33:31 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 10 Jun 2008 21:33:31 +0200
Subject: [XML-SIG] "encoding" argument to xml.dom.minidom.toxml()?
In-Reply-To: <08Jun10.121338pdt."58698"@synergy1.parc.xerox.com>
References: <08Jun10.121338pdt."58698"@synergy1.parc.xerox.com>
Message-ID: <484ED70B.8060107@behnel.de>

Hi,

Bill Janssen wrote:
> I figured that the only point of having an encoding argument would be
> to allow the user to control the output character set encoding, but it
> turns out that specifying an encoding of, say, "ASCII", doesn't do
> that. It just raises encoding exceptions when you attempt to encode a
> non-ASCII character.

Well, what did you expect? That it magically transmogrifies your non-ASCII
data into plain ASCII data?
> What's the point of having an encoding argument
> when it always has to be "UTF-8"?

Did you try any other encoding besides "ASCII"?

> Especially since it seems that this could be made useful by changing
> one line of code. In xml/dom/minidom.py, in the class Node, in the
> method "toprettyxml", change the line
>
> writer = codecs.lookup(encoding)[3](writer)
>
> to
>
> writer = codecs.lookup(encoding)[3](writer, "xmlcharrefreplace")

Could be done, yes. ElementTree and lxml do it that way. It's not required,
though. If you say you want to serialise plain ASCII data, nothing keeps an
XML serialiser from shouting at you when it finds non-ASCII data. Same for
latin1 data or Cyrillic data, or ...

Stefan

From janssen at parc.com Tue Jun 10 22:59:06 2008
From: janssen at parc.com (Bill Janssen)
Date: Tue, 10 Jun 2008 13:59:06 PDT
Subject: [XML-SIG] "encoding" argument to xml.dom.minidom.toxml()?
In-Reply-To: <484ED70B.8060107@behnel.de>
References: <08Jun10.121338pdt."58698"@synergy1.parc.xerox.com> <484ED70B.8060107@behnel.de>
Message-ID: <08Jun10.135914pdt."58698"@synergy1.parc.xerox.com>

Stefan,

> > writer = codecs.lookup(encoding)[3](writer, "xmlcharrefreplace")
>
> Could be done, yes. ElementTree and lxml do it that way. It's not required,
> though. If you say you want to serialise plain ASCII data, nothing keeps an
> XML serialiser from shouting at you when it finds non-ASCII data. Same for
> latin1 data or Cyrillic data, or ...

I'm not sure what you're saying. The "encoding" parameter is about
the character set encoding of the XML output file; it has little or
nothing to do with the input data, which in my case is all unicode
strings. Clearly, with XML, one can use ASCII, for instance, as a
character set encoding. Why not make this parameter work?

Bill

From janssen at parc.com Tue Jun 10 23:00:17 2008
From: janssen at parc.com (Bill Janssen)
Date: Tue, 10 Jun 2008 14:00:17 PDT
Subject: [XML-SIG] "encoding" argument to xml.dom.minidom.toxml()?
In-Reply-To: <484ED70B.8060107@behnel.de>
References: <08Jun10.121338pdt."58698"@synergy1.parc.xerox.com> <484ED70B.8060107@behnel.de>
Message-ID: <08Jun10.140019pdt."58698"@synergy1.parc.xerox.com>

> Well, what did you expect? That it magically transmogrifies your non-ASCII
> data into plain ASCII data?

Yep. And there's no reason I can see why it can't do exactly that. I
think the "encoding" argument should either be removed, or made to
work.

Bill

From stefan_ml at behnel.de Tue Jun 10 23:05:44 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 10 Jun 2008 23:05:44 +0200
Subject: [XML-SIG] "encoding" argument to xml.dom.minidom.toxml()?
In-Reply-To: <08Jun10.140019pdt."58698"@synergy1.parc.xerox.com>
References: <08Jun10.121338pdt."58698"@synergy1.parc.xerox.com> <484ED70B.8060107@behnel.de> <08Jun10.140019pdt."58698"@synergy1.parc.xerox.com>
Message-ID: <484EECA8.9000508@behnel.de>

Hi,

Bill Janssen wrote:
>> Well, what did you expect? That it magically transmogrifies your non-ASCII
>> data into plain ASCII data?
>
> Yep. And there's no reason I can see why it can't do exactly that. I
> think the "encoding" argument should either be removed, or made to
> work.

That's why I asked if you tried other encodings. Obviously, you only tried
"UTF-8" and "ASCII". There's tons of other encodings out there, and I bet they
work just fine - as does "ASCII" (for ASCII data, that is).

Stefan

From bkline at rksystems.com Tue Jun 10 23:51:12 2008
From: bkline at rksystems.com (Bob Kline)
Date: Tue, 10 Jun 2008 17:51:12 -0400
Subject: [XML-SIG] "encoding" argument to xml.dom.minidom.toxml()?
In-Reply-To: <484EECA8.9000508@behnel.de>
References: <08Jun10.121338pdt."58698"@synergy1.parc.xerox.com> <484ED70B.8060107@behnel.de> <08Jun10.140019pdt."58698"@synergy1.parc.xerox.com> <484EECA8.9000508@behnel.de>
Message-ID: <484EF750.8020907@rksystems.com>

Stefan Behnel wrote:
> Hi,
>
> Bill Janssen wrote:
>
>>> Well, what did you expect?
That it magically transmogrifies your non-ASCII
>>> data into plain ASCII data?
>>>
>> Yep. And there's no reason I can see why it can't do exactly that. I
>> think the "encoding" argument should either be removed, or made to
>> work.
>>
>
> That's why I asked if you tried other encodings. Obviously, you only tried
> "UTF-8" and "ASCII". There's tons of other encodings out there, and I bet they
> work just fine - as does "ASCII" (for ASCII data, that is).
>
> Stefan
>

I suspect there's a certain amount of unarticulated assumptions on both
sides of this exchange. I'm guessing that Bill might be thinking
something like: "it's possible to represent any Unicode character in XML
as &#"; and was hoping that the method
would do just that for the non-ASCII characters if he asks for ASCII
encoding. Stefan is (if he even realizes that Bill might be thinking
this) himself possibly thinking "no way is the method going to do that
much work for the caller."

Of course, I realize that it's always risky trying to guess what people
are thinking, but I throw this out as a possibility in the hopes that,
if I turn out to be right for even just one side of the exchange, this
might help clear the air a little bit. :-)

--
Bob Kline
http://www.rksystems.com
mailto:bkline at rksystems.com

From stefan_ml at behnel.de Wed Jun 11 00:16:54 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 11 Jun 2008 00:16:54 +0200
Subject: [XML-SIG] "encoding" argument to xml.dom.minidom.toxml()?
In-Reply-To: <484EF750.8020907@rksystems.com>
References: <08Jun10.121338pdt."58698"@synergy1.parc.xerox.com> <484ED70B.8060107@behnel.de> <08Jun10.140019pdt."58698"@synergy1.parc.xerox.com> <484EECA8.9000508@behnel.de> <484EF750.8020907@rksystems.com>
Message-ID: <484EFD56.5020908@behnel.de>

Hi,

Bob Kline wrote:
> Stefan is (if he even realizes that Bill might be thinking
> this) himself possibly thinking "no way is the method going to do that
> much work for the caller."

No, I'm actually just saying that this is not a bug and maybe not even a
missing feature. It's a design decision. Saying that the "encoding" keyword
doesn't work, just because it detects the error that the user passed an
encoding target that cannot represent the data, is pretty obviously wrong.

Some people may expect this error.

Stefan

From janssen at parc.com Wed Jun 11 01:32:53 2008
From: janssen at parc.com (Bill Janssen)
Date: Tue, 10 Jun 2008 16:32:53 PDT
Subject: [XML-SIG] "encoding" argument to xml.dom.minidom.toxml()?
In-Reply-To: <484EF750.8020907@rksystems.com>
References: <08Jun10.121338pdt."58698"@synergy1.parc.xerox.com> <484ED70B.8060107@behnel.de> <08Jun10.140019pdt."58698"@synergy1.parc.xerox.com> <484EECA8.9000508@behnel.de> <484EF750.8020907@rksystems.com>
Message-ID: <08Jun10.163259pdt."58698"@synergy1.parc.xerox.com>

> I suspect there's a certain amount of unarticulated assumptions on both
> sides of this exchange. I'm guessing that Bill might be thinking
> something like: "it's possible to represent any Unicode character in XML
> as &#"; and was hoping that the method
> would do just that for the non-ASCII characters if he asks for ASCII
> encoding.

Yep, that's what I was thinking. I don't see any other reason to have
that parameter there.

The reason I asked on this list (instead of just committing the change
:-) is that I don't really know much about the grubby details of XML,
and wanted to engage some minds to consider possible nasty
side-effects of making that change. For instance, would emitting
charrefs in a CDATA section or a Processing Instructions section
really be a good idea?

Bill

From janssen at parc.com Wed Jun 11 01:41:19 2008
From: janssen at parc.com (Bill Janssen)
Date: Tue, 10 Jun 2008 16:41:19 PDT
Subject: [XML-SIG] "encoding" argument to xml.dom.minidom.toxml()?
In-Reply-To: <484EFD56.5020908@behnel.de>
References: <08Jun10.121338pdt."58698"@synergy1.parc.xerox.com> <484ED70B.8060107@behnel.de> <08Jun10.140019pdt."58698"@synergy1.parc.xerox.com> <484EECA8.9000508@behnel.de> <484EF750.8020907@rksystems.com> <484EFD56.5020908@behnel.de>
Message-ID: <08Jun10.164124pdt."58698"@synergy1.parc.xerox.com>

> Hi,
>
> Bob Kline wrote:
> > Stefan is (if he even realizes that Bill might be thinking
> > this) himself possibly thinking "no way is the method going to do that
> > much work for the caller."
>
> No, I'm actually just saying that this is not a bug and maybe not even a
> missing feature. It's a design decision. Saying that the "encoding" keyword
> doesn't work, just because it detects the error that the user passed an
> encoding target that cannot represent the data, is pretty obviously wrong.

That's not what I'm saying. I'm objecting to the fact that the
encoding target *can* represent the data, but the code isn't written
to do that. That's the bug I'm pointing out.

If in fact the "encoding" argument is about "type-checking" the input
data against some character set, rather than being about the XML
character set encoding, then both the code and the documentation are
broken. But that's a different bug, and to my way of thinking a much
less interesting and less useful way to perceive the situation.

Bill

From bkline at rksystems.com Wed Jun 11 03:47:13 2008
From: bkline at rksystems.com (Bob Kline)
Date: Tue, 10 Jun 2008 21:47:13 -0400
Subject: [XML-SIG] "encoding" argument to xml.dom.minidom.toxml()?
In-Reply-To: <08Jun10.164124pdt."58698"@synergy1.parc.xerox.com>
References: <08Jun10.121338pdt."58698"@synergy1.parc.xerox.com> <484ED70B.8060107@behnel.de> <08Jun10.140019pdt."58698"@synergy1.parc.xerox.com> <484EECA8.9000508@behnel.de> <484EF750.8020907@rksystems.com> <484EFD56.5020908@behnel.de> <08Jun10.164124pdt."58698"@synergy1.parc.xerox.com>
Message-ID: <484F2EA1.2060307@rksystems.com>

I believe there are reasonable grounds for both Bill's and Stefan's
interpretation of the somewhat ambiguous documentation for the method,
and further, that the documentation would benefit from some
clarification one way or another.

I don't think Bill is correct in thinking that there is no other
possible reason for having the encoding parameter than to induce the
method to use numeric character references for those characters which
don't directly fit in the selected encoding. I have used similar methods
which are known to raise an exception for mismatched encodings/values to
determine the most widely supported encoding which adequately handles
all the characters in a Unicode string, and I've seen others do the
same. Of course, it's more justifiable to rely on such behavior when the
documentation makes it clear exactly when the exceptions will be raised.
In this case, given the current wording, it would technically be up to
the whim of the implementor.

In general, this page of the standard library documentation could use a
little cleanup (for example, the docs for the next method refer to "the
encoding argument" which - according to the signature for the method -
doesn't accept that argument at all). Once it's clear that there is
consensus as to which behavior is the most appropriate for toxml(), I'll
be happy to contribute to a documentation patch which will nail things
down more clearly.

--
Bob Kline
http://www.rksystems.com
mailto:bkline at rksystems.com

From stefan_ml at behnel.de Wed Jun 11 06:57:34 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 11 Jun 2008 06:57:34 +0200
Subject: [XML-SIG] "encoding" argument to xml.dom.minidom.toxml()?
In-Reply-To: <08Jun10.163259pdt."58698"@synergy1.parc.xerox.com>
References: <08Jun10.121338pdt."58698"@synergy1.parc.xerox.com> <484ED70B.8060107@behnel.de> <08Jun10.140019pdt."58698"@synergy1.parc.xerox.com> <484EECA8.9000508@behnel.de> <484EF750.8020907@rksystems.com> <08Jun10.163259pdt."58698"@synergy1.parc.xerox.com>
Message-ID: <484F5B3E.9050307@behnel.de>

Hi,

Bill Janssen wrote:
>> I suspect there's a certain amount of unarticulated assumptions on both
>> sides of this exchange. I'm guessing that Bill might be thinking
>> something like: "it's possible to represent any Unicode character in XML
>> as &#"; and was hoping that the method
>> would do just that for the non-ASCII characters if he asks for ASCII
>> encoding.
>
> Yep, that's what I was thinking. I don't see any other reason to have
> that parameter there.

Have you considered that it may be there to allow other encodings than UTF-8?
Check the codecs module to see how many others there are.

Stefan

From janssen at parc.com Wed Jun 11 18:14:36 2008
From: janssen at parc.com (Bill Janssen)
Date: Wed, 11 Jun 2008 09:14:36 PDT
Subject: [XML-SIG] "encoding" argument to xml.dom.minidom.toxml()?
In-Reply-To: <484F5B3E.9050307@behnel.de>
References: <08Jun10.121338pdt."58698"@synergy1.parc.xerox.com> <484ED70B.8060107@behnel.de> <08Jun10.140019pdt."58698"@synergy1.parc.xerox.com> <484EECA8.9000508@behnel.de> <484EF750.8020907@rksystems.com> <08Jun10.163259pdt."58698"@synergy1.parc.xerox.com> <484F5B3E.9050307@behnel.de>
Message-ID: <08Jun11.091436pdt."58698"@synergy1.parc.xerox.com>

Stefan, I think we are talking past each other.
I know it's there to
allow encodings other than UTF-8, and I'm familiar with the codecs
module, and I like the parameter, in general. The problem is that if
you ignore the documentation, which seems to know that it's broken,
and specify an encoding other than UTF-8, the generated XML sometimes
doesn't conform to that encoding. Instead, an exception is raised
from deep inside Python, which contains no indication of what piece of
input data caused it. And there's no need for that to happen. XML
can fully support any output encoding for any Unicode input stream,
and it should do that.

Bill

> Hi,
>
> Bill Janssen wrote:
> >> I suspect there's a certain amount of unarticulated assumptions on both
> >> sides of this exchange. I'm guessing that Bill might be thinking
> >> something like: "it's possible to represent any Unicode character in XML
> >> as &#"; and was hoping that the method
> >> would do just that for the non-ASCII characters if he asks for ASCII
> >> encoding.
> >
> > Yep, that's what I was thinking. I don't see any other reason to have
> > that parameter there.
>
> Have you considered that it may be there to allow other encodings than UTF-8?
> Check the codecs module to see how many others there are.
>
> Stefan

From sap28 at kent.ac.uk Wed Jun 11 17:50:48 2008
From: sap28 at kent.ac.uk (sap28 at kent.ac.uk)
Date: Wed, 11 Jun 2008 16:50:48 +0100 (BST)
Subject: [XML-SIG] installing PyXML for Python 2.5
Message-ID: <1472.129.12.16.180.1213199448.squirrel@webmail.cs.kent.ac.uk>

Hello!

I'm trying to install PyXML for Python 2.5 on Windows XP to be able to use
the validating parser but I have been unsuccessful so far.

I tried to compile the PyXML from the any platform package but I got the
following error:

"running build_ext
error: Python was built with Visual Studio 2003;
extensions must be built with a compiler than can generate compatible binaries.
Visual Studio 2003 was not found on this system.
If you have Cygwin installed,
you can try compiling with MingW32, by passing "-c mingw32" to setup.py."

(The Python package was added by the installation of the Haptic API H3D Beta).

I then installed Cygwin and tried compiling using the indication provided
but either the -c command wasn't recognized or the mingw32 wasn't defined.
I've never really used Cygwin either so I don't quite know if I've
inputted the wrong command or if it just doesn't work.

I also naively tried to just copy the files to check if it could work
without compilation, but I get the following error:

    from xml.parsers import expat
  File "C:\Python25\lib\site-packages\_xmlplus\parsers\expat.py", line 4, in <module>
    from pyexpat import *
ImportError: DLL load failed: The specified module could not be found.

I really need to have it working for my PhD work, so I would be
grateful if you could help me.

Thanks for your help,
Best regards,
Sabrina

From stefan_ml at behnel.de Wed Jun 11 19:47:46 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 11 Jun 2008 19:47:46 +0200
Subject: [XML-SIG] installing PyXML for Python 2.5
In-Reply-To: <1472.129.12.16.180.1213199448.squirrel@webmail.cs.kent.ac.uk>
References: <1472.129.12.16.180.1213199448.squirrel@webmail.cs.kent.ac.uk>
Message-ID: <48500FC2.9000100@behnel.de>

Hi,

sap28 at kent.ac.uk wrote:
> I'm trying to install PyXML for Python 2.5 on Windows XP

Sounds like a FAQ to me.

> to be able to use the validating parser

Try lxml.

http://codespeak.net/lxml/

Stefan

From jeanmarc.chourot at free.fr Fri Jun 20 22:40:42 2008
From: jeanmarc.chourot at free.fr (jeanmarc.chourot at free.fr)
Date: Fri, 20 Jun 2008 22:40:42 +0200
Subject: [XML-SIG] elementtree and uncomplete parsing
Message-ID: <1213994442.485c15ca1aaca@imp.free.fr>

Hi all,

As a noob, I cannot find the way to make an incomplete parse of a tree.
For instance, please consider the following xml file

This text is completely crap because blabla

This is another node with random tags

I would like to retrieve what is between the tags ... into
strings, the "subelements" being considered as simple strings and not
processed by ElementTree.

In other words, this could be badly formed HTML not processed embedded into
well formed xml tags.

i.e. :
string1 = "This text is completely crap because blabla "
string2="This is another node with random tags "

Could anyone help me with this ?

Thanks a lot

From stefan_ml at behnel.de Sat Jun 21 07:39:17 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 21 Jun 2008 07:39:17 +0200
Subject: [XML-SIG] elementtree and uncomplete parsing
In-Reply-To: <1213994442.485c15ca1aaca@imp.free.fr>
References: <1213994442.485c15ca1aaca@imp.free.fr>
Message-ID: <485C9405.7010606@behnel.de>

Hi,

jeanmarc.chourot at free.fr wrote:
>
> This text is completely crap because blabla
>
>
>
> This is another node with random tags
>
>
>
> I would like to retrieve what is between the tags ... into
> strings, the "subelements" being considered as simple strings and not
> processed by ElementTree.

You are trying to make an XML parser not parse XML, that's bound to fail.

> In other words, this could be badly formed HTML not processed embedded into
> well formed xml tags.

If you really have something like "embedded HTML", it must be escaped in your
data to be parsable. There is no way an XML parser can return what you want
without modifying your 'data' (at least losing whitespace etc.).

I think the easiest option (if you have it) is to talk to the idiots who sent
you the data and have them fix it.

Stefan

From jeanmarc.chourot at free.fr Sat Jun 21 10:23:08 2008
From: jeanmarc.chourot at free.fr (Jean-Marc Chourot)
Date: Sat, 21 Jun 2008 10:23:08 +0200
Subject: [XML-SIG] elementtree and uncomplete parsing
In-Reply-To: <485C9405.7010606@behnel.de>
References: <1213994442.485c15ca1aaca@imp.free.fr> <485C9405.7010606@behnel.de>
Message-ID: <1214036588.6525.9.camel@jeanmarc-laptop>

> Hi,
>
> jeanmarc.chourot at free.fr wrote:
> >
> > This text is completely crap because blabla
> >
> >
> >
> > This is another node with random tags
> >
> >
> >
> > I would like to retrieve what is between the tags ... into
> > strings, the "subelements" being considered as simple strings and not
> > processed by ElementTree.
>
> You are trying to make an XML parser not parse XML, that's bound to fail.
>
>
> > In other words, this could be badly formed HTML not processed embedded into
> > well formed xml tags.
>
> If you really have something like "embedded HTML", it must be escaped in your
> data to be parsable. There is no way an XML parser can return what you want
> without modifying your 'data' (at least losing whitespace etc.).
>
> I think the easiest option (if you have it) is to talk to the idiots who sent
> you the data and have them fix it.
>
> Stefan
>

Thanks for your help,

The real problem is not about "badly formed HTML" : each node will
correspond to a leaf of a wx.TreeCtrl and the data associated to the
leaf will be the content of a wx.RichTextCtrl. When saving the whole
tree content in one file, I want to be able to get the structure of the
tree and relocate the data to each leaf and definitely not touch the
content, which is parsed by the wx.RichTextCtrl.

I was hoping ElementTree could help with this.. but maybe I am wrong and
should think of a simpler system of tags in the text.
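
One way to keep per-leaf content untouched while still saving the tree
structure as XML is to store the content as escaped text. The following is
only a sketch, assuming the rich-text content is available as a plain Python
string; the element names "tree" and "node" and the variable rich_text are
made up for illustration and do not come from the thread, and the code uses
the Python 3 standard library rather than the 2008-era ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical leaf content: markup that is not well-formed XML.
rich_text = "This is <b>not closed and <i>badly</b> nested"

# Build a small tree; "tree" and "node" are illustrative names only.
root = ET.Element("tree")
leaf = ET.SubElement(root, "node")

# Assigning to .text stores a plain string; ElementTree escapes
# '<', '>' and '&' when the tree is serialised.
leaf.text = rich_text

serialized = ET.tostring(root, encoding="unicode")

# Parsing the document again gives back the original string unchanged.
restored = ET.fromstring(serialized).find("node").text

print(serialized)
print(restored == rich_text)
```

The parser never sees the inner markup as elements, so the structure of the
outer tree and the content of each leaf stay independent.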

From stefan_ml at behnel.de Sat Jun 21 12:02:06 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 21 Jun 2008 12:02:06 +0200
Subject: [XML-SIG] elementtree and uncomplete parsing
In-Reply-To: <1214036588.6525.9.camel@jeanmarc-laptop>
References: <1213994442.485c15ca1aaca@imp.free.fr> <485C9405.7010606@behnel.de> <1214036588.6525.9.camel@jeanmarc-laptop>
Message-ID: <485CD19E.7020708@behnel.de>

Hi,

Jean-Marc Chourot wrote:
> The real problem is not about "badly formed HTML" : each node will
> correspond to a leaf of a wx.TreeCtrl and the data associated to the
> leaf will be the content of a wx.RichTextCtrl. When saving the whole
> tree content in one file, I want to be able to get the structure of the
> tree and relocate the data to each leaf and definitely not touch the
> content, which is parsed by the wx.RichTextCtrl.

If I understand correctly, your XML-like string content comes from user input
in the RichTextCtrl. Meaning: when you copy it into the XML tree, it should
get escaped (i.e. '<' replaced by '&lt;' etc.). Then every XML parser will read
this as you expect. ElementTree will do the escaping for you when you set the
".text" property of a leaf node.

Or did you mean to say that wxPython saves the broken XML for you?

Stefan

From martin at v.loewis.de Mon Jun 23 06:54:25 2008
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Mon, 23 Jun 2008 06:54:25 +0200
Subject: [XML-SIG] "encoding" argument to xml.dom.minidom.toxml()?
In-Reply-To: <08Jun11.091436pdt."58698"@synergy1.parc.xerox.com>
References: <08Jun10.121338pdt."58698"@synergy1.parc.xerox.com> <484ED70B.8060107@behnel.de> <08Jun10.140019pdt."58698"@synergy1.parc.xerox.com> <484EECA8.9000508@behnel.de> <484EF750.8020907@rksystems.com> <08Jun10.163259pdt."58698"@synergy1.parc.xerox.com> <484F5B3E.9050307@behnel.de> <08Jun11.091436pdt."58698"@synergy1.parc.xerox.com>
Message-ID: <485F2C81.8050003@v.loewis.de>

> Stefan, I think we are talking past each other.
I know it's there to
> allow encodings other than UTF-8, and I'm familiar with the codecs
> module, and I like the parameter, in general. The problem is that if
> you ignore the documentation, which seems to know that it's broken,
> and specify an encoding other than UTF-8, the generated XML sometimes
> doesn't conform to that encoding.

Can you give an example? I'm unable to reproduce the behavior you are
seeing; it works just fine for me:

py> import xml.dom.minidom
py> d=xml.dom.minidom.getDOMImplementation().createDocument(None,"root",None)
py> t=d.createTextNode(u"\u20ac")
py> x=d.documentElement.appendChild(t)
py> d.toxml(encoding="iso-8859-15")
'\xa4'

AFAICT, this is the correct byte string. Can you give an example where
toxml returns an incorrect byte string?

Regards,
Martin
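
The two behaviours argued over in this thread can be reproduced at the codec
level, without minidom. This is a minimal sketch in Python 3 (the session
above is Python 2); it is not the minidom implementation, and the variable
names are illustrative. codecs.getwriter() is used here in place of the
codecs.lookup(encoding)[3] spelling from the first message; both return the
codec's stream writer class.

```python
import codecs
import io

euro = "\u20ac"  # the euro sign used in the session above

# iso-8859-15 has a code point for the euro sign, so encoding succeeds.
latin9_bytes = euro.encode("iso-8859-15")

# ASCII does not; the default "strict" error handler raises.
try:
    euro.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# The "xmlcharrefreplace" handler emits a numeric character reference
# for every character the target encoding cannot represent.
charref_bytes = euro.encode("ascii", "xmlcharrefreplace")

# The same handler applied through a stream writer, as in the proposed
# one-line change.
buffer = io.BytesIO()
writer = codecs.getwriter("ascii")(buffer, "xmlcharrefreplace")
writer.write("<root>" + euro + "</root>")
stream_bytes = buffer.getvalue()

print(latin9_bytes, strict_failed, charref_bytes, stream_bytes)
```

A handler installed on the whole output stream cannot tell element text from
other constructs. Character references are not recognised inside CDATA
sections, so a reference written there would be read back as literal text,
which is the kind of side effect asked about earlier in the thread.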