From nick at isilon.com Thu Apr 8 14:35:45 2004
From: nick at isilon.com (Nicholas M. Kirsch)
Date: Thu Apr 8 14:35:49 2004
Subject: [XML-SIG] XForms
Message-ID: <20040408183544.GG11388@isilon.com>

Does anyone know of any Python libraries for interacting with XForms? A
cursory Google search didn't turn up anything useful.

Additionally, are there any opinions on XForms? I did find an article in
which the author thought adoption of XForms was in jeopardy because of its
complexity and the lack of Microsoft support (apparently they have a
similar product called InfoPath).

Essentially, I am looking for a good XML schema for creating user
interfaces which are mainly forms. I'm also investigating Mozilla's XUL;
there seem to be a couple of Python libraries for it.

I appreciate any comments.

Nick

From hasank at metu.edu.tr Thu Apr 8 15:13:02 2004
From: hasank at metu.edu.tr (hasan karaaslan)
Date: Thu Apr 8 15:13:10 2004
Subject: [XML-SIG] Installation problem with:EZRO, PyXML, 4Suite, Python2.3, Zope-2.7.0
Message-ID: <200404081913.i38JD4719646@myra.general.services.metu.edu.tr>

Hi,

I am trying to install an open source package, WorkForce Connections v1.4
(or the newest version, EZRO 2.x). It is content repository software based
on Python and Zope.

I need some help with the installation of xmllib and related software.
I think the following problem arises from interference between the xmllib
modules of PyXML and 4Suite. Because of the errors below, the Zope server
does not create the "SAMS metasite" correctly, and as a result it is
unable to create a subsite.
The related software that was installed or upgraded is as follows:

Zope 2.7.0
Workforce_Connections_linux14
OrderedFolder-0.4.0.tgz
Python 2.3 (and all other related packages, such as the development packages)
PyXML-0.8.3
4Suite-1.0a3.tar.gz

During the installation of the software I ran into the following errors:

2004-04-03T16:09:31 PROBLEM(100) Init Ambiguous name for method of Products.ContentRepository.RawContent.RawContent: "editContent" != "manage_editForm"
------
2004-04-03T16:09:31 PROBLEM(100) Init Ambiguous name for method of Products.ContentRepository.RawContent.RawContent: "editContent" != "manage_editForm"
------
2004-04-03T16:09:31 PROBLEM(100) Init Ambiguous name for method of Products.ContentRepository.Comment.Comment: "index_html" != "__call__"
------
2004-04-03T16:09:31 PROBLEM(100) Init Ambiguous name for method of Products.ContentRepository.Comment.Comment: "index_html" != "__call__"
------
2004-04-03T16:09:31 PROBLEM(100) Init Ambiguous name for method of Products.ContentRepository.MCQuestion.MCQuestion: "editContent" != "manage_editForm"
------
2004-04-03T16:09:31 PROBLEM(100) Init Ambiguous name for method of Products.ContentRepository.MCQuestion.MCQuestion: "editContent" != "manage_editForm"
------
2004-04-03T16:09:31 PROBLEM(100) Init Ambiguous name for method of Products.ContentRepository.MPQuestion.MPQuestion: "editContent" != "manage_editForm"
------
2004-04-03T16:09:31 PROBLEM(100) Init Ambiguous name for method of Products.ContentRepository.MPQuestion.MPQuestion: "editContent" != "manage_editForm"
------
2004-04-03T16:09:31 PROBLEM(100) Init Ambiguous name for method of Products.ContentRepository.RDFContent.RDFContent: "__call__" != "index_html"
------
2004-04-03T16:09:31 PROBLEM(100) Init Ambiguous name for method of Products.ContentRepository.RDFContent.RDFContent: "__call__" != "index_html"
/usr/lib/python2.3/xmllib.py:10: DeprecationWarning: The xmllib module is obsolete. Use xml.sax instead.
  DeprecationWarning)
------
2004-04-03T16:09:31 DEBUG(-200) TemporaryStorage create storage temporary storage for sessioning
------
2004-04-03T16:09:31 BLATHER(-100) ZODB Commiting subtransaction of size 5386
------
2004-04-03T16:09:31 PROBLEM(100) Init Ambiguous name for method of Products.ContentRepository.SuggestFolder.SuggestFolder: "__str__" != "index_html"
------
2004-04-03T16:09:31 PROBLEM(100) Init Ambiguous name for method of Products.ContentRepository.SuggestFolder.SuggestFolder: "__str__" != "manageSuggestFolder"
------
2004-04-03T16:09:31 PROBLEM(100) Init Ambiguous name for method of Products.ContentRepository.SuggestFolder.SuggestFolder: "__str__" != "__call__"
------
2004-04-03T16:09:31 PROBLEM(100) Init Ambiguous name for method of Products.ContentRepository.SuggestFolder.SuggestFolder: "__str__" != "index_html"
------
2004-04-03T16:09:31 PROBLEM(100) Init Ambiguous name for method of Products.ContentRepository.SuggestFolder.SuggestFolder: "__str__" != "manageSuggestFolder"
------
2004-04-03T16:09:31 PROBLEM(100) Init Ambiguous name for method of Products.ContentRepository.SuggestFolder.SuggestFolder: "__str__" != "__call__"
------
2004-04-03T16:09:31 PROBLEM(100) Init Ambiguous name for method of Products.ContentRepository.SuggestPage.SuggestPage: "__str__" != "__call__"
------
2004-04-03T16:09:31 PROBLEM(100) Init Ambiguous name for method of Products.ContentRepository.SuggestPage.SuggestPage: "__str__" != "manageSuggestPage"
------
2004-04-03T16:09:31 PROBLEM(100) Init Ambiguous name for method of Products.ContentRepository.SuggestPage.SuggestPage: "__str__" != "__call__"
------
2004-04-03T16:09:31 PROBLEM(100) Init Ambiguous name for method of Products.ContentRepository.SuggestPage.SuggestPage: "__str__" != "manageSuggestPage"

If anyone has suggestions about what I'm doing wrong, I'd be very glad to
hear them.
Thanks in advance,
__________________
Hasan Karaaslan

--------------------------------------
hasan karaaslan
hasank@metu.edu.tr

From hao.xing at inet.com Tue Apr 13 12:59:25 2004
From: hao.xing at inet.com (hao xing)
Date: Thu Apr 15 09:47:47 2004
Subject: [XML-SIG] XInclude
Message-ID: <407C1C6D.6070805@inet.com>

Hello:

Is there any support for XInclude in the latest PyXML package?

thanks

From thomasj at eworld.hu Sat Apr 17 05:15:22 2004
From: thomasj at eworld.hu (Thomas)
Date: Sat Apr 17 05:15:35 2004
Subject: [XML-SIG] SAX encoding and special characters
Message-ID: <806592003.20040417111522@eworld.hu>

Hello,

I'm playing with SAX with Python-2.3.3. My goal is to parse XML files
(I don't want to generate them).
My XML file starts with:

I would like to get the encoding before parsing (I would like to use
it in the ContentHandler class).
Is there a way to get the encoding from the XML file with SAX?
I tried to open the file with InputSource and ask with getEncoding(), but
it returned None all the time.
Is the encoding given in the XML file used by SAX?

My second problem/question is about special characters in XML.
Sometimes I have special characters (with char codes 0-31) in the XML and
the parser ends with:

Traceback (most recent call last):
  File "./testXML.py", line 175, in ?
    saxparser.parse(sys.argv[1])
  File "/usr/local/lib/python2.3/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/local/lib/python2.3/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/local/lib/python2.3/xml/sax/expatreader.py", line 211, in feed
    self._err_handler.fatalError(exc)
  File "/usr/local/lib/python2.3/xml/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: spec_char.xml:68271:61: not well-formed (invalid token)

I would like to just ignore/drop the problematic character. How can I do
that? I thought about installing an ErrorHandler, but I think it can only
catch that situation and cannot continue processing the problematic field.

I googled for some hours but didn't find any solution. I would be happy to
get some ideas.

Thanks in advance,
Thomas

I have:

    oh = optionsHandler()
    saxparser = make_parser()
    saxparser.setContentHandler(oh)
    saxparser.parse(sys.argv[1])

optionsHandler works fine.

From fredrik at pythonware.com Sat Apr 17 11:46:34 2004
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Sat Apr 17 11:46:22 2004
Subject: [XML-SIG] Re: SAX encoding and special characters
References: <806592003.20040417111522@eworld.hu>
Message-ID:

"Thomas" wrote:

> I'm playing with SAX with Python-2.3.3. My goal is to parse XML files
> (I don't want to generate them).
> My XML file starts with:
>
> I would like to get the encoding before parsing (I would like to use
> it in ContentHandler class).

just curious, but why do you need the encoding to handle the content?
> My second problem/question is about special characters in XML.
> Sometimes I have spec. chars (with char code 0-31) in XML and the
> parser ends with:
> xml.sax._exceptions.SAXParseException: spec_char.xml:68271:61:
> not well-formed (invalid token)

as the parser says, control characters are not allowed in XML files (except
for a few whitespace codes). if you really need to parse those files, you
have to fix them up before passing them to the parser (you can simply read
them into a python string, delete all junk characters, and then use
parseString to parse them)

From thomasj at eworld.hu Sun Apr 18 03:04:56 2004
From: thomasj at eworld.hu (Thomas)
Date: Sun Apr 18 03:05:06 2004
Subject: [XML-SIG] Re: SAX encoding and special characters
In-Reply-To:
References: <806592003.20040417111522@eworld.hu>
Message-ID: <1909283730.20040418090456@eworld.hu>

Saturday, April 17, 2004, 5:46:34 PM, Fredrik wrote:

FL> "Thomas" wrote:

>> I'm playing with SAX with Python-2.3.3. My goal is to parse XML files
>> (I don't want to generate them).
>> My XML file starts with:
>>
>> I would like to get the encoding before parsing (I would like to use
>> it in ContentHandler class).

FL> just curious, but why do you need the encoding to handle the content?

I need the encoding information because later I need to convert the
Unicode back to that encoding. Unfortunately I can't change the XML format
because it's not in my hands (I can't put it into another element).

>> My second problem/question is about special characters in XML.
>> Sometimes I have spec. chars (with char code 0-31) in XML and the
>> parser ends with:
>> xml.sax._exceptions.SAXParseException: spec_char.xml:68271:61:
>> not well-formed (invalid token)

FL> as the parser says, control characters are not allowed in XML files (except
FL> for a few whitespace codes).
FL> if you really need to parse those files, you have
FL> to fix them up before passing them to the parser (you can simply read them
FL> into a python string, delete all junk characters, and then use parseString
FL> to parse them)

Yes, this was the first thing I tried. Unfortunately I got:

Traceback (most recent call last):
  File "./xmlparser_new.py", line 210, in ?
    saxparser.parseString(document)
AttributeError: ExpatParser instance has no attribute 'parseString'

Do you have an idea how to fix it? (Yes, I understand that it's not
supported by expat; unfortunately I don't have experience with it.)

Thanks,
Thomas

python-2.3.3, Debian Woody, libexpat1-1.95.2-6, libexpat1-dev-1.95.2-6

From fredrik at pythonware.com Sun Apr 18 04:14:05 2004
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Sun Apr 18 04:13:57 2004
Subject: [XML-SIG] Re: Re: SAX encoding and special characters
References: <806592003.20040417111522@eworld.hu> <1909283730.20040418090456@eworld.hu>
Message-ID:

"Thomas" wrote:

> Yes, this was the first thing I tried. Unfortunately I got:
> Traceback (most recent call last):
>   File "./xmlparser_new.py", line 210, in ?
>     saxparser.parseString(document)
> AttributeError: ExpatParser instance has no attribute 'parseString'
>
> Do you have an idea how to fix it? (yes, I understand that it's not
> supported by expat - unfortunately I don't have experience with it).

iirc, parse takes either a file name or a file object, so the following
might work:

    import StringIO
    ...
    saxparser.parse(StringIO.StringIO(document))

Python's SAX implementation also supports incremental parsing; I think
you should be able to simply do:

    saxparser.feed(document)
    saxparser.close()

:::

and yes, since you have to read the entire document into a string, you can
extract the encoding from that string.
here's a fairly robust RE that should do the trick:

    m = re.match(r"<\?xml[^>]+encoding=['\"]([-\w]+)['\"]", data)
    if m:
        encoding = m.group(1)

(a much better approach is to stick to a standard encoding in the output
files, no matter what encoding the XML files use. XML is unicode, and the
XML encoding shouldn't matter).

From mike at skew.org Sun Apr 18 04:54:14 2004
From: mike at skew.org (Mike Brown)
Date: Sun Apr 18 04:54:23 2004
Subject: [XML-SIG] Re: Re: SAX encoding and special characters
In-Reply-To: "from Fredrik Lundh at Apr 18, 2004 10:14:05 am"
Message-ID: <200404180854.i3I8sESW006861@chilled.skew.org>

Fredrik Lundh wrote:
> and yes, since you have to read the entire document into a string, you can
> extract the encoding from that string. here's a fairly robust RE that
> should do the trick:
>
>     m = re.match(r"<\?xml[^>]+encoding=['\"]([-\w]+)['\"]", data)
>     if m:
>         encoding = m.group(1)

That works as long as the string itself is Unicode or is encoded with a
superset of ASCII. It won't work on UTF-16 (w/BOM), UTF-16LE, or UTF-16BE
strings.

There's also this, for detecting the actual encoding, not necessarily
what's declared:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52257
...it's not perfect, though, as I noted in the comments.

From thomasj at eworld.hu Sun Apr 18 05:15:14 2004
From: thomasj at eworld.hu (Thomas)
Date: Sun Apr 18 05:15:57 2004
Subject: [XML-SIG] Re: Re: SAX encoding and special characters
In-Reply-To:
References: <806592003.20040417111522@eworld.hu> <1909283730.20040418090456@eworld.hu>
Message-ID: <444014500.20040418111514@eworld.hu>

Thanks Fredrik!

Both suggested solutions work like a charm. Thank you for help!

Regards,
Thomas

Sunday, April 18, 2004, 10:14:05 AM, Fredrik wrote:

FL> "Thomas" wrote:

>> Yes, this was the 1st thing I tryed out. Unfortunately I got:
>> Traceback (most recent call last):
>>   File "./xmlparser_new.py", line 210, in ?
>>     saxparser.parseString(document)
>> AttributeError: ExpatParser instance has no attribute 'parseString'
>>
>> Do you have an idea how to fix it? (yes, I underestand that it's not
>> supported by expat - unfortunately I don't have experience with it).

FL> iirc, parse takes either a file name or a file object, so the following
FL> might work:

FL>     import StringIO
FL>     ...
FL>     saxparser.parse(StringIO.StringIO(document))

FL> Python's SAX implementation also supports incremental parsing; I think
FL> you should be able to simply do:

FL>     saxparser.feed(document)
FL>     saxparser.close()

FL> :::

FL> and yes, since you have to read the entire document into a string, you can
FL> extract the encoding from that string. here's a fairly robust RE that
FL> should do the trick:

FL>     m = re.match(r"<\?xml[^>]+encoding=['\"]([-\w]+)['\"]", data)
FL>     if m:
FL>         encoding = m.group(1)

FL> (a much better approach is to stick to a standard encoding in the output
FL> files, no matter what encoding the XML files use. XML is unicode, and the
FL> XML encoding shouldn't matter).

From fredrik at pythonware.com Mon Apr 19 11:18:09 2004
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Mon Apr 19 11:17:54 2004
Subject: [XML-SIG] Re: XInclude
References: <407C1C6D.6070805@inet.com>
Message-ID:

hao xing wrote:

> Is there any support to XInclude in the latest PyXml package?

it's not PyXML, but the latest versions of ElementTree have basic XInclude
support:

    http://effbot.org/zone/element-xinclude.htm
    http://effbot.org/zone/element.htm

You might wish to check if you can use ElementTree in your project; if not,
feel free to use the ElementInclude module as a base for an XInclude
implementation for the standard DOM (the spec is short and fairly simple,
so that shouldn't be much work).

From eustaquiorangel at yahoo.com Mon Apr 19 13:35:45 2004
From: eustaquiorangel at yahoo.com (Eustaquio Rangel de Oliveira Jr.)
Date: Mon Apr 19 13:35:55 2004
Subject: [XML-SIG] Encoding again
Message-ID: <40840DF1.9040104@yahoo.com>

Hey.

I have an XML file with:

and an XSLT file with:

Using both files with Sablotron, for example, gives me ISO-8859-1 output
with all the Latin characters.

But using libxml and Python to do this gives me weird chars. What
could be happening here? Using encode/decode on the result string gives
me a "UnicodeError: ASCII decoding error: ordinal not in range(128)".

Thanks!

--
TaQ (Eustáquio Rangel)

From fredrik at pythonware.com Mon Apr 19 13:52:39 2004
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Mon Apr 19 13:52:26 2004
Subject: [XML-SIG] Re: Encoding again
References: <40840DF1.9040104@yahoo.com>
Message-ID:

Eustaquio Rangel de Oliveira Jr. wrote:

> But using libxml and Python to make this gives me weird chars. What
> could be happening here? Using encode/decode on the result string gives
> me a "UnicodeError: ASCII decoding error: ordinal not in range(128)".

what's the output encoding? the XML default is UTF-8, which tends to
look like weird characters if you view it as ISO-8859-1.

check if you can specify the encoding when you're saving the file (according
to the libxml2 docs, xmlSaveFileEnc and xmlSaveFileTo both take encoding
arguments).
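
The "weird chars" effect described in this reply can be reproduced with a minimal sketch. It is written for Python 3 (the thread itself uses Python 2.3), and the sample string is an assumed example, not data from the original post. It shows UTF-8 bytes being misread as ISO-8859-1, and that the misreading is reversible.

```python
# Assumed sample text containing one non-ASCII Latin character (U+00E1).
text = "Eust\u00e1quio"

# What a UTF-8 serializer would emit: the accented letter becomes two bytes.
utf8_bytes = text.encode("utf-8")

# What a viewer sees if it wrongly treats those bytes as ISO-8859-1:
# each of the two bytes is shown as a separate Latin-1 character.
misread = utf8_bytes.decode("iso-8859-1")

# Undoing the mistake: re-encode as ISO-8859-1, then decode as UTF-8.
recovered = misread.encode("iso-8859-1").decode("utf-8")

print(len(text), len(misread))  # the misread string is one character longer
```

If output looks like this, the data is probably intact and only the declared or assumed encoding is wrong.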

From eustaquiorangel at yahoo.com Mon Apr 19 14:08:16 2004
From: eustaquiorangel at yahoo.com (Eustaquio Rangel de Oliveira Jr.)
Date: Mon Apr 19 14:08:26 2004
Subject: [XML-SIG] Re: Encoding again
Message-ID: <40841590.3040502@yahoo.com>

|what's the output encoding? the XML default is UTF-8, which tends to
|look as weird characters if you view them as ISO-8859-1.
|check if you can specify the encoding when you're saving the file
|(according to the libxml2 docs, xmlSaveFileEnc and xmlSaveFileTo both
|take encoding arguments).

Hey. Thanks for your answer!

I'm running it on the console, not saving to a file.
I'm using this:

    import libxml2
    import libxslt

    styledoc = libxml2.parseFile("impmod.xsl")
    style = libxslt.parseStylesheetDoc(styledoc)
    mods = ("/home/taq/progs.xml","/home/taq/progs2.xml");
    for mod in mods:
        doc = libxml2.parseFile(mod)
        result = style.applyStylesheet(doc,None)
        print result.content
    style.freeStylesheet()
    doc.freeDoc()
    result.freeDoc()

The "print result.content" line is where the weird chars show up. I'd like
to get it working on the console first, and then maybe save it to a file.

That None value in applyStylesheet is kind of mysterious to me. I'm a
little confused about where I can find an API guide to all the functions
I can use with libxml2/libxslt from Python (dir works, but it would be
nice to find the APIs documented somewhere).

For example, I didn't know about xmlSaveFileEnc and xmlSaveFileTo; where
are they?

Excuse my ignorance about this matter; I'm a newbie to Python and a
newbie to using XML with it. :-)

Thanks again,

--
TaQ (Eustáquio Rangel)

From fredrik at pythonware.com Mon Apr 19 16:12:55 2004
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Mon Apr 19 16:12:40 2004
Subject: [XML-SIG] Re: Encoding again
References: <40841590.3040502@yahoo.com>
Message-ID:

Eustaquio Rangel de Oliveira Jr wrote:

> I'm running it on the console, not saving to a file.
> I'm using this:
>
>     import libxml2
>     import libxslt
>
>     styledoc = libxml2.parseFile("impmod.xsl")
>     style = libxslt.parseStylesheetDoc(styledoc)
>     mods = ("/home/taq/progs.xml","/home/taq/progs2.xml");
>     for mod in mods:
>         doc = libxml2.parseFile(mod)
>         result = style.applyStylesheet(doc,None)
>         print result.content
>     style.freeStylesheet()
>     doc.freeDoc()
>     result.freeDoc()
>
> There on "print result.content" is where the weird chars are. I'd like
> to make it work first viewing on the console and so maybe saving it to
> a file.

the "content" attribute contains the content of an element, using
the default encoding.

digging a little more, it looks as if you can use the "serialize" method
to get the result you want:

    print result.serialize("iso-8859-1")

> That None value on applyStylesheet is kind of mysterious to me. I'm a
> little confused by some questions as where I can get the API guide of
> all the functions I can use with libxml2/libxslt with Python (dir works
> but it will be a cool thing find the APIs somewhere ehehe).

    >>> help(libxml2)
    >>> help(libxslt)

also see: http://www.xmlsoft.org/

> For example, didn't know about xmlSaveFileEnc and xmlSaveFileTo, where
> are they?

somewhere inside libxml2, I suppose.

looking again, the functions seem to be available as "SaveTo" and
"SaveFileTo" and "SaveFileEnc" methods on the document objects.

hope this helps!
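
The idea of choosing the output encoding at serialization time can be illustrated without libxml2. The sketch below uses the standard library's xml.etree.ElementTree on Python 3 as a stand-in. It is not the libxml2 API discussed in this message, and the element name and text are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document holding a non-ASCII character (U+00E1).
root = ET.Element("name")
root.text = "Eust\u00e1quio"

# The tree itself holds Unicode text; the byte encoding is only chosen
# when the tree is serialized.
latin1_bytes = ET.tostring(root, encoding="iso-8859-1")
utf8_bytes = ET.tostring(root, encoding="utf-8")

# Same document, different bytes on the wire.
print(latin1_bytes)
print(utf8_bytes)
```

A console that expects ISO-8859-1 would display the first result correctly and the second as two odd characters in place of the accented letter.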

From eustaquiorangel at yahoo.com Mon Apr 19 16:32:20 2004
From: eustaquiorangel at yahoo.com (Eustaquio Rangel de Oliveira Jr.)
Date: Mon Apr 19 16:35:11 2004
Subject: [XML-SIG] Re: Encoding again
Message-ID: <40843754.1040101@yahoo.com>

Hi!

|digging a little more, it looks as if you can use the
|"serialize" method to get the result you want:
|    print result.serialize("iso-8859-1")

Perfect! Now I can see the latin chars! :-)

|>>> help(libxml2)
|>>> help(libxslt)
| also see: http://www.xmlsoft.org/
| hope this helps!

Sure helped! Thanks a lot!

--
TaQ (Eustáquio Rangel)

From veillard at redhat.com Mon Apr 19 18:03:18 2004
From: veillard at redhat.com (Daniel Veillard)
Date: Mon Apr 19 18:03:47 2004
Subject: [XML-SIG] Re: Encoding again
In-Reply-To:
References: <40840DF1.9040104@yahoo.com>
Message-ID: <20040419220316.GC5799@redhat.com>

On Mon, Apr 19, 2004 at 07:52:39PM +0200, Fredrik Lundh wrote:
> Eustaquio Rangel de Oliveira Jr. wrote:
>
> > But using libxml and Python to make this gives me weird chars. What
> > could be happening here? Using encode/decode on the result string gives
> > me a "UnicodeError: ASCII decoding error: ordinal not in range(128)".
>
> what's the output encoding? the XML default is UTF-8, which tends to
> look as weird characters if you view them as ISO-8859-1.
>
> check if you can specify the encoding when you're saving the file (according
> to the libxml2 docs, xmlSaveFileEnc and xmlSaveFileTo both take encoding
> arguments).

Yes, but xmlSaveFile... is not used directly; you need to use the XSLT
related serialization routine, since the encoding= and method= information
is stored in the stylesheet and not in the resulting tree.
If the command line xsltproc shows the problem, then it sounds like
a bug; please report it at http://xmlsoft.org/XSLT/bugs.html
If the command line tool does it right, then it sounds more like a problem
with how the libxml2 API is being used.

Daniel

--
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard@redhat.com  | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

From veillard at redhat.com Mon Apr 19 18:08:27 2004
From: veillard at redhat.com (Daniel Veillard)
Date: Mon Apr 19 18:08:55 2004
Subject: [XML-SIG] Re: Encoding again
In-Reply-To:
References: <40841590.3040502@yahoo.com>
Message-ID: <20040419220827.GD5799@redhat.com>

On Mon, Apr 19, 2004 at 10:12:55PM +0200, Fredrik Lundh wrote:
> digging a little more, it looks as if you can use the "serialize" method
> to get the result you want:
>
>     print result.serialize("iso-8859-1")

That works, but it then ignores the options provided in the stylesheet :-)

    stringval = style.saveResultToString(result)

is the right way to serialize according to the xsl:output option.
There are a few examples in the source tree under python/tests/
or in /usr/share/doc/libxslt-python-$version/ if you installed from the RPMs.

Daniel

--
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard@redhat.com  | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

From derekfountain at yahoo.co.uk Thu Apr 22 01:39:32 2004
From: derekfountain at yahoo.co.uk (Derek Fountain)
Date: Thu Apr 22 01:37:14 2004
Subject: [XML-SIG] Skipped entities under SAX
Message-ID: <200404221339.32223.derekfountain@yahoo.co.uk>

I've been exploring the PyXML SAX parser, and I've come across the handler
for skipped entities, i.e. the skippedEntity method in the ContentHandler
object.

Puzzlement here. What exactly is a skipped entity? It's not mentioned in the
XML spec, as far as I can see. The name suggests that it's an entity which
the parser comes across and doesn't know how to handle, so it passes over it.
Sort of:

    This odd &weirdthing; in the text

where the weirdthing entity isn't declared.

I'm only guessing the above, and it seems that the expat parser doesn't
entertain such ideas. AFAICT, if it comes across an entity which it doesn't
recognise, it throws an error.

So what is a skipped entity, and when would I receive a call to the
skippedEntity method in my SAX handler?

--
> eatapple
core dump

From walter.doerwald at livinglogic.de Thu Apr 22 06:26:48 2004
From: walter.doerwald at livinglogic.de (Walter Dörwald)
Date: Thu Apr 22 06:27:08 2004
Subject: [XML-SIG] Skipped entities under SAX
In-Reply-To: <200404221339.32223.derekfountain@yahoo.co.uk>
References: <200404221339.32223.derekfountain@yahoo.co.uk>
Message-ID: <40879DE8.40602@livinglogic.de>

Derek Fountain wrote:

> I've been exploring the PyXML SAX parser, and I've come across the handler for
> skipped entities. i.e. the skippedEntity method in the ContentHandler object.
>
> Puzzlement here. What exactly is a skipped entity? It's not mentioned in the
> XML spec, as far as I can see. The name suggests that it's an entity which
> the parser comes across and doesn't know how to handle, so it passes over it.
> Sort of:
>
>     This odd &weirdthing; in the text
>
> where the weirdthing entity isn't declared.
>
> I'm only guessing the above, and it seems that the expat parser doesn't
> entertain such ideas. AFAICT, if it comes across an entity which it doesn't
> recognise, it throws an error.
>
> So what is a skipped entity, and when would I receive a call to the
> skippedEntity method in my SAX handler?

For ExpatParser you can override reset like this:

    # imports needed by this snippet and by the setFeature call below
    from xml.sax import expatreader, handler

    class ExpatParser(expatreader.ExpatParser):
        def reset(self):
            expatreader.ExpatParser.reset(self)
            # treat every document as if it declared an external DTD subset
            self._parser.UseForeignDTD(True)

Before parsing you have to call:

    parser.setFeature(handler.feature_external_ges, False)

Then ExpatParser will pass entity references to skippedEntity().

Bye,
   Walter Dörwald

From martin at v.loewis.de Thu Apr 22 14:26:34 2004
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Thu Apr 22 14:28:27 2004
Subject: [XML-SIG] Skipped entities under SAX
In-Reply-To: <200404221339.32223.derekfountain@yahoo.co.uk>
References: <200404221339.32223.derekfountain@yahoo.co.uk>
Message-ID: <40880E5A.7040704@v.loewis.de>

Derek Fountain wrote:
> I've been exploring the PyXML SAX parser, and I've come across the handler for
> skipped entities. i.e. the skippedEntity method in the ContentHandler object.
>
> Puzzlement here. What exactly is a skipped entity? It's not mentioned in the
> XML spec, as far as I can see. The name suggests that it's an entity which
> the parser comes across and doesn't know how to handle, so it passes over it.

It's a SAX thing, and yes, it is exactly that. It is the implementation
of 4.4.3 of XML 1.0:

    If a non-validating processor does not include the replacement text,
    it MUST inform the application that it recognized, but did not read,
    the entity.

> I'm only guessing the above, and it seems that the expat parser doesn't
> entertain such ideas. AFAICT, if it comes across an entity which it doesn't
> recognise, it throws an error.

Why are you saying this?
It works for me just fine:

    import xml.sax,xml.sax.handler

    class Handler(xml.sax.ContentHandler):
        def skippedEntity(self, e):
            print "Skipping",e

    p = xml.sax.make_parser()
    h = Handler()
    p.setContentHandler(h)
    p.setFeature(xml.sax.handler.feature_external_ges, False)
    p.feed("&unknown;")

This prints

    Skipping unknown

Regards,
Martin

From derekfountain at yahoo.co.uk Fri Apr 23 01:33:34 2004
From: derekfountain at yahoo.co.uk (Derek Fountain)
Date: Fri Apr 23 01:31:56 2004
Subject: [XML-SIG] Skipped entities under SAX
In-Reply-To: <40880E5A.7040704@v.loewis.de>
References: <200404221339.32223.derekfountain@yahoo.co.uk> <40880E5A.7040704@v.loewis.de>
Message-ID: <200404231333.34954.derekfountain@yahoo.co.uk>

> > I'm only guessing the above, and it seems that the expat parser doesn't
> > entertain such ideas. AFAICT, if it comes across an entity which it
> > doesn't recognise, it throws an error.
>
> Why are you saying this? It works for me just fine:

Yes, your example works for me too. However, the same code with my XML sample
didn't work. It raises an exception:

    xml.sax._exceptions.SAXParseException: ./bookmarks.xml:8:10: undefined entity

It turns out that the problem is my lack of a DTD declaration. If I add an
external DTD declaration:

then it works.

Why might that be?

--
> eatapple
core dump

From fredrik at pythonware.com Fri Apr 23 02:11:15 2004
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Fri Apr 23 02:11:01 2004
Subject: [XML-SIG] Re: Skipped entities under SAX
References: <200404221339.32223.derekfountain@yahoo.co.uk> <40880E5A.7040704@v.loewis.de> <200404231333.34954.derekfountain@yahoo.co.uk>
Message-ID:

Derek Fountain wrote:

> Yes, your example works for me too. However, the same code with my XML sample
> didn't work. It raises an exception:
>
> xml.sax._exceptions.SAXParseException: ./bookmarks.xml:8:10: undefined entity
>
> It turns out that the problem is my lack of a DTD declaration. If I add an
> external DTD declaration:
>
>
>
> then it works.
>
> Why might that be?

the parser knows that if the document doesn't have an external DTD, there's
no way that entity could be defined.

From derekfountain at yahoo.co.uk Fri Apr 23 02:59:09 2004
From: derekfountain at yahoo.co.uk (Derek Fountain)
Date: Fri Apr 23 02:56:42 2004
Subject: [XML-SIG] Re: Skipped entities under SAX
In-Reply-To:
References: <200404221339.32223.derekfountain@yahoo.co.uk>
    <200404231333.34954.derekfountain@yahoo.co.uk>
Message-ID: <200404231459.09475.derekfountain@yahoo.co.uk>

> > It turns out that the problem is my lack of a DTD declaration. If I add
> > an external DTD declaration:
> >
> >
> >
> > then it works.
> >
> > Why might that be?
>
> the parser knows that if the document doesn't have an external DTD, there's
> no way that entity could be defined.

Correct, but is there a reason it chooses to throw an error, rather than hand
it over to the handler's skipped entity routine? If it chose the latter, it
would obviously skip all entities and hand them all over, but why is that
route not preferable to throwing an error and stopping the script?

-- 
> eatapple
core dump

From martin at v.loewis.de Sat Apr 24 06:53:57 2004
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sat Apr 24 08:21:40 2004
Subject: [XML-SIG] Re: Skipped entities under SAX
In-Reply-To: <200404231459.09475.derekfountain@yahoo.co.uk>
References: <200404221339.32223.derekfountain@yahoo.co.uk>
    <200404231333.34954.derekfountain@yahoo.co.uk>
    <200404231459.09475.derekfountain@yahoo.co.uk>
Message-ID: <408A4745.1080903@v.loewis.de>

Derek Fountain wrote:
> Correct, but is there a reason it chooses to throw an error, rather than hand
> it over to the handler's skipped entity routine? If it chose the latter, it
> would obviously skip all entities and hand them all over, but why is that
> route not preferable to throwing an error and stopping the script?

Yes. The document is ill-formed, as it violates this well-formedness
condition from section 4.1 of XML 1.0:

  In a document without any DTD, a document with only an internal DTD
  subset which contains no parameter entity references, or a document
  with "standalone='yes'", for an entity reference that does not occur
  within the external subset or a parameter entity, the Name given in
  the entity reference MUST match that in an entity declaration that
  does not occur within the external subset or a parameter entity,
  except that well-formed documents need not declare any of the
  following entities: amp, lt, gt, apos, quot.

Your document is without any DTD, so to be well-formed, it must not contain
any entity reference except for the five predefined ones. According to
section 5.1, a processor must report a violation of a well-formedness
constraint (WFC). In SAX, such violations are reported to the error handler,
which defaults to raising an exception.

Regards,
Martin
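
The two cases discussed in this thread can be put side by side in a short sketch. It is written for current Python 3 and the standard library's expat-based xml.sax driver, not the Python 2.3 / PyXML setup used above, so treat it as an illustration under those assumptions. The element name "doc" and the DTD file name "doc.dtd" are made up for the example; with external general entities switched off, the DTD file is never opened. The first document declares an external DTD, so the unknown entity is expected to reach skippedEntity; the second has no DTD at all, so the same reference is expected to be reported as a fatal error, which the default error handler raises.

```python
import io
import xml.sax
import xml.sax.handler


class SkipRecorder(xml.sax.handler.ContentHandler):
    """Collects the entity names the parser reports as skipped."""

    def __init__(self):
        super().__init__()
        self.skipped = []

    def skippedEntity(self, name):
        self.skipped.append(name)


def parse_bytes(data):
    """Parse data without reading external entities; return skipped names."""
    recorder = SkipRecorder()
    parser = xml.sax.make_parser()
    parser.setContentHandler(recorder)
    # Same feature setting as in the example earlier in the thread.
    parser.setFeature(xml.sax.handler.feature_external_ges, False)
    parser.parse(io.BytesIO(data))
    return recorder.skipped


# Hypothetical documents: "doc" and "doc.dtd" are illustrative names only.
WITH_DTD = b'<!DOCTYPE doc SYSTEM "doc.dtd"><doc>&unknown;</doc>'
WITHOUT_DTD = b'<doc>&unknown;</doc>'

# External DTD declared but not read: the entity may be defined there,
# so the parser skips the reference and tells the content handler.
skipped = parse_bytes(WITH_DTD)

# No DTD at all: the reference violates a well-formedness constraint,
# and the default error handler raises the reported exception.
try:
    parse_bytes(WITHOUT_DTD)
    error_message = None
except xml.sax.SAXParseException as exc:
    error_message = exc.getMessage()

print("skipped with external DTD declared:", skipped)
print("error without any DTD:", error_message)
```

To handle the error differently, a custom xml.sax.handler.ErrorHandler can be installed with setErrorHandler; the document is still not well-formed, so parsing cannot meaningfully continue past that point.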