From stefan_ml at behnel.de Fri Feb 1 19:48:46 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 01 Feb 2008 19:48:46 +0100
Subject: [XML-SIG] lxml 2.0 released
Message-ID: <47A3698E.7090208@behnel.de>

Hi everyone,

I'm very happy to announce the official release of lxml 2.0!

http://codespeak.net/lxml/
http://pypi.python.org/pypi/lxml/2.0

** What is lxml?

"""
In short: lxml is the most feature-rich and easy-to-use library for working
with XML and HTML in the Python language.

lxml is a Pythonic binding for the libxml2 and libxslt libraries. It is unique
in that it combines the speed and feature completeness of these libraries with
the simplicity of a native Python API.
"""

** Install lxml 2.0 with

  $ easy_install lxml==2.0

** The complete changelog is here:

http://codespeak.net/lxml/changes-2.0.html

This release marks the end of a development effort of more than 6 months,
starting with the release of the last stable series lxml 1.3. The major
differences are explained on this page:

http://codespeak.net/lxml/lxml2.html

lxml 2.0 is not a revolution, it is a gradual move towards a cleaner API with
more things working together as expected. But it nevertheless comes with a lot
of new tools and features that make your XML life easier - and even more your
HTML life.

There are also a couple of minor things that were deprecated, which will be
removed for lxml 2.1. See the above link for details.

The new release has already adopted a lot of changes from the upcoming
ElementTree 1.3 library, and implements a much broader set of compatible
features, such as the TreeBuilder interface for parser targets.

Have fun,
Stefan

From woodcocs at hotmail.com Sun Feb 3 00:04:21 2008
From: woodcocs at hotmail.com (woodcock)
Date: Sat, 2 Feb 2008 15:04:21 -0800 (PST)
Subject: [XML-SIG] SAX characters() output on multiple lines for non-ascii
Message-ID: <15248449.post@talk.nabble.com>

I am starting with SAX and am trying to parse a file that contains non-ascii
characters. The xml file uses 'ISO-8859-1'. When it parses text containing
non-ascii characters the output is across multiple lines.

Example

Trying to output 'Der Einfluss kleiner naturnaher Retentionsmaßnahmen in der
Fläche auf den Hochwasserabfluss - Kleinrückhaltebecken'

The output I get is

Start ELEMENT ='title'
String read is 'Der Einfluss kleiner naturnaher Retentionsma'
String read is 'ß'
String read is 'nahmen in der Fl'
String read is 'ä'
String read is 'che auf den Hochwasserabfluss - Kleinr'
String read is 'ü'
String read is 'ckhaltebecken -.'
End ELEMENT ='title'

whereas I want a single string something like...

Start ELEMENT ='title'
String read is 'Der Einfluss kleiner naturnaher Retentionsmaßnahmen in der Fläche auf den Hochwasserabfluss - Kleinrückhaltebecken -.
End ELEMENT ='title'

My code is:

    def characters(self, chars):
        newchars=[]
        newchars.append(chars.encode('ISO-8859-1'))
        if newchars[-1] == '\n':
            newchars = newchars[:-1]
        if len(newchars)> 0:
            output = 'String read is ' + "'" + ''.join(newchars) + "'\n"
            sys.stdout.write(output)
        return

Does anyone have any ideas?
--
View this message in context: http://www.nabble.com/SAX-characters%28%29-output-on-multiple-lines-for-non-ascii-tp15248449p15248449.html
Sent from the Python - xml-sig mailing list archive at Nabble.com.

From fdrake at acm.org Sun Feb 3 04:03:20 2008
From: fdrake at acm.org (Fred Drake)
Date: Sat, 2 Feb 2008 22:03:20 -0500
Subject: [XML-SIG] SAX characters() output on multiple lines for non-ascii
In-Reply-To: <15248449.post@talk.nabble.com>
References: <15248449.post@talk.nabble.com>
Message-ID:

On Feb 2, 2008, at 6:04 PM, woodcock wrote:
> I am starting with SAX and am trying to parse a file that contains
> non-ascii characters. The xml file uses 'ISO-8859-1'. When it parses
> text containing non-ascii characters the output is across multiple lines.

This is a fundamental issue with the SAX interface (the interface doesn't
mandate the splits, but states that they're allowed).

If you want something that buffers the text and provides it in larger chunks,
that could be written as a proxy content handler.

It might be nice if one were provided out of the box, since this is a common
request, but the basic issue is that some seriously huge amounts of data may
be enclosed between non-text calls, and one of the advantages of SAX is that
it doesn't require loading large portions of the document into memory if the
application doesn't require it.

  -Fred

--
Fred Drake

From woodcocs at hotmail.com Sun Feb 3 15:12:26 2008
From: woodcocs at hotmail.com (woodcock)
Date: Sun, 3 Feb 2008 06:12:26 -0800 (PST)
Subject: [XML-SIG] SAX characters() output on multiple lines for non-ascii
In-Reply-To: <15248449.post@talk.nabble.com>
References: <15248449.post@talk.nabble.com>
Message-ID: <15253815.post@talk.nabble.com>

Thanks for your reply. Well I have looked at it again and I lose the repeated
lines if I remove the \n and simplify part of it to:

    if len(newchars)> 0:
        output = ''.join(newchars)
        sys.stdout.write(output)

Start ELEMENT ='title'
Der Einfluss kleiner naturnaher Retentionsmaßnahmen in der Fläche auf den Hochwa
sserabfluss - Kleinrückhaltebecken -.
End ELEMENT ='title'

However if I try and put some of the surrounding text back in either by
concatenating strings or using multiple sys.stdout.write() calls I get
repetitions of the strings.

    if len(newchars)> 0:
        output = ''.join(newchars)
        sys.stdout.write("String read is '")
        sys.stdout.write(output)
        sys.stdout.write("'")

Start ELEMENT ='title'
String read is 'Der Einfluss kleiner naturnaher Retentionsma'String read is 'ß'S
tring read is 'nahmen in der Fl'String read is 'ä'String read is 'che auf den Ho
chwasserabfluss - Kleinr'String read is 'ü'String read is 'ckhaltebecken -.'
End ELEMENT ='title'
--
View this message in context: http://www.nabble.com/SAX-characters%28%29-output-on-multiple-lines-for-non-ascii-tp15248449p15253815.html
Sent from the Python - xml-sig mailing list archive at Nabble.com.

From stefan_ml at behnel.de Sun Feb 3 18:23:24 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 03 Feb 2008 18:23:24 +0100
Subject: [XML-SIG] SAX characters() output on multiple lines for non-ascii
In-Reply-To: <15253815.post@talk.nabble.com>
References: <15248449.post@talk.nabble.com> <15253815.post@talk.nabble.com>
Message-ID: <47A5F88C.7070408@behnel.de>

Hi,

woodcock wrote:
> Thanks for your reply. Well I have looked at it again and I lose the repeated
> lines if I remove the \n and simplify part of it to:
>
>     if len(newchars)> 0:
>         output = ''.join(newchars)
>         sys.stdout.write(output)
>
> However if I try and put some of the surrounding text back in either by
> concatenating strings or using multiple sys.stdout.write() calls I get
> repetitions of the strings.

Is there a reason why you want to use SAX? It's one of the most difficult to
use XML interfaces.

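For comparison, a tree API returns an element's text as one string, so the
chunking never becomes visible to the application. The following is only a
minimal sketch, assuming Python 3 and the standard-library
xml.etree.ElementTree module; the sample document and the "records" wrapper
element are hypothetical and not taken from the file in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input, encoded as ISO-8859-1 like the file in the thread.
xml_bytes = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    "<records><title>Retentionsmaßnahmen in der Fläche</title></records>"
).encode("ISO-8859-1")

# The whole document is parsed into memory as a tree of elements.
root = ET.fromstring(xml_bytes)

# Each element's .text is a single, already decoded string.
titles = [element.text for element in root.iter("title")]
for title in titles:
    print("String read is %r" % title)
```

The trade-off is the one Fred Drake describes: the tree holds the whole
document in memory, which SAX avoids.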
Stefan

From stefan_ml at behnel.de Wed Feb 6 22:49:25 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 06 Feb 2008 22:49:25 +0100
Subject: [XML-SIG] PyXML on PyPI
In-Reply-To: <99b561f50801092233g1572c981v7e0f0c0751f3497e@mail.gmail.com>
References: <99b561f50801092233g1572c981v7e0f0c0751f3497e@mail.gmail.com>
Message-ID: <47AA2B65.5000200@behnel.de>

Hi,

Michael Dunstan wrote:
> It would be useful to have a copy of source tar ball for PyXML 0.8.4
> uploaded to PyPI. Then it can be easy_installed or used in
> zc.buildout. There is an existing entry for PyXML on PyPI which has
> the owners "loewis, aaronsw, jkloth". Can one of these owners upload
> PyXML-0.8.4.tar.gz to PyPI? (Or alternatively add myself, dunny, as an
> owner for that project and I could have a go at doing the upload.)

Didn't try, but the PyPI entry has a download link, so EasyInstall should
work. Are there any problems with it?

Stefan

From martin at v.loewis.de Thu Feb 7 06:52:16 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 07 Feb 2008 06:52:16 +0100
Subject: [XML-SIG] PyXML on PyPI
In-Reply-To: <47AA2B65.5000200@behnel.de>
References: <99b561f50801092233g1572c981v7e0f0c0751f3497e@mail.gmail.com> <47AA2B65.5000200@behnel.de>
Message-ID: <47AA9C90.9020403@v.loewis.de>

>> It would be useful to have a copy of source tar ball for PyXML 0.8.4
>> uploaded to PyPI. Then it can be easy_installed or used in
>> zc.buildout. There is an existing entry for PyXML on PyPI which has
>> the owners "loewis, aaronsw, jkloth". Can one of these owners upload
>> PyXML-0.8.4.tar.gz to PyPI? (Or alternatively add myself, dunny, as an
>> owner for that project and I could have a go at doing the upload.)
>
> Didn't try, but the PyPI entry has a download link, so EasyInstall should
> work. Are there any problems with it?

I just added that a few days ago - before, easy_install couldn't find it.

Regards,
Martin

From martin at v.loewis.de Thu Feb 7 07:01:52 2008
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 07 Feb 2008 07:01:52 +0100
Subject: [XML-SIG] SAX characters() output on multiple lines for non-ascii
In-Reply-To: <15253815.post@talk.nabble.com>
References: <15248449.post@talk.nabble.com> <15253815.post@talk.nabble.com>
Message-ID: <47AA9ED0.10609@v.loewis.de>

> However if I try and put some of the surrounding text back in either by
> concatenating strings or using multiple sys.stdout.write() calls I get
> repetitions of the strings.
>
>     if len(newchars)> 0:
>         output = ''.join(newchars)
>         sys.stdout.write("String read is '")
>         sys.stdout.write(output)
>         sys.stdout.write("'")
>
>
> Start ELEMENT ='title'
> String read is 'Der Einfluss kleiner naturnaher Retentionsma'String read is
> 'ß'S
> tring read is 'nahmen in der Fl'String read is 'ä'String read is 'che auf
> den Ho
> chwasserabfluss - Kleinr'String read is 'ü'String read is 'ckhaltebecken -.'
> End ELEMENT ='title'

Please read Fred Drake's answer again. SAX will split the data in the XML
document into multiple pieces. You put your decoration ("String read is")
around each piece. Multiple pieces -> multiple decorations.

To solve this issue, collect all pieces in a global variable:

output = u""

def characters(self, chars):
    global output
    output += chars

def endElement(self, name):
    global output
    print "String read is", output.encode("latin-1")
    output = u""

You could also choose to make output an attribute of self.

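The attribute-of-self variant could be sketched roughly as below. This is only
a minimal sketch under stated assumptions: it uses Python 3 (print function,
str instead of unicode) and the standard xml.sax module; the class name
BufferingHandler, its results list, and the sample document are hypothetical
and not part of the original code.

```python
import xml.sax


class BufferingHandler(xml.sax.ContentHandler):
    """Hypothetical handler: joins all characters() chunks of an element."""

    def __init__(self):
        super().__init__()
        self._chunks = []   # text pieces seen since the last start tag
        self.results = []   # (element name, complete text) pairs

    def startElement(self, name, attrs):
        # Start a fresh buffer. This simple version ignores mixed content,
        # i.e. text that is interleaved with child elements.
        self._chunks = []

    def characters(self, content):
        # SAX may deliver one run of text in several calls; just collect.
        self._chunks.append(content)

    def endElement(self, name):
        # Only here is the text known to be complete.
        self.results.append((name, "".join(self._chunks)))
        self._chunks = []


# Hypothetical sample input, encoded as ISO-8859-1 like the file in the thread.
xml_bytes = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    "<title>Retentionsmaßnahmen in der Fläche</title>"
).encode("ISO-8859-1")

handler = BufferingHandler()
xml.sax.parseString(xml_bytes, handler)
for name, text in handler.results:
    print("String read is %r (element %s)" % (text, name))
```

Keeping the buffer on the handler instance avoids the module-level global, so
several handlers can run without sharing state.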
Regards,
Martin

From michael at elyt.com Thu Feb 7 07:49:11 2008
From: michael at elyt.com (Michael Dunstan)
Date: Thu, 7 Feb 2008 19:49:11 +1300
Subject: [XML-SIG] PyXML on PyPI
In-Reply-To: <47AA9C90.9020403@v.loewis.de>
References: <99b561f50801092233g1572c981v7e0f0c0751f3497e@mail.gmail.com> <47AA2B65.5000200@behnel.de> <47AA9C90.9020403@v.loewis.de>
Message-ID: <99b561f50802062249j1edc26eraf599e8c5172683c@mail.gmail.com>

On Feb 7, 2008 6:52 PM, "Martin v. Löwis" wrote:
> >> It would be useful to have a copy of source tar ball for PyXML 0.8.4
> >> uploaded to PyPI. Then it can be easy_installed or used in
> >> zc.buildout. There is an existing entry for PyXML on PyPI which has
> >> the owners "loewis, aaronsw, jkloth". Can one of these owners upload
> >> PyXML-0.8.4.tar.gz to PyPI? (Or alternatively add myself, dunny, as an
> >> owner for that project and I could have a go at doing the upload.)
> >
> > Didn't try, but the PyPI entry has a download link, so EasyInstall should
> > work. Are there any problems with it?
>
> I just added that a few days ago - before, easy_install couldn't find it.

Yup - that does the trick. Thanks.

--
Michael Dunstan

From sin at image.ocn.ne.jp Fri Feb 8 10:27:39 2008
From: sin at image.ocn.ne.jp (=?ISO-2022-JP?B?GyRCSXBGYhsoQiAbJEJGRjtOGyhC?=)
Date: Fri, 8 Feb 2008 18:27:39 +0900
Subject: [XML-SIG] Thank You Very Much.
Message-ID: <5292D904-76B9-4D99-9CB4-071E291840D4@image.ocn.ne.jp>

I downloaded "PyXML-0.8.4" (source code).
Thank you very much.

From guido at python.org Sun Feb 10 18:20:44 2008
From: guido at python.org (Guido van Rossum)
Date: Sun, 10 Feb 2008 09:20:44 -0800
Subject: [XML-SIG] [Baypiggies] News flash: Python possibly guilty in excessive DTD traffic
In-Reply-To: <20080209040312.725218a2@dartworks.biz>
References: <20080209040312.725218a2@dartworks.biz>
Message-ID:

[+xml-sig]

On Feb 8, 2008 8:03 PM, Keith Dart ♂ wrote:
>
> http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic
>
> This is interesting. I've noticed that when you use Python's XML
> package in validating mode it does try to fetch the DTD. Be careful
> when you use that.

I think this is worth filing a bug, but I'd like to understand better
where the call is made. I can't find any places in the standard xml
package that do this -- but I'm not all that familiar with the code.
Do you know if it's in the base xml package, or in etree, or in the
separately distributed "XMLplus"? Any details you have would be
appreciated (like a traceback from the point where the call is made).

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From bkline at rksystems.com Mon Feb 11 05:31:33 2008
From: bkline at rksystems.com (Bob Kline)
Date: Sun, 10 Feb 2008 23:31:33 -0500
Subject: [XML-SIG] python-xml unsupported?
Message-ID: <47AFCFA5.9040706@rksystems.com>

According to http://www.python.org/community/sigs/current/xml-sig/ "While
Python includes basic XML processing capabilities, the goal of this SIG is to
make Python become //the// premier language for XML processing. The SIG,
through the mailing list and the PyXML project hosted on SourceForge, is
helping to decide what software is required for this purpose, and coordinate
its implementation and documentation." When I follow the link to the
SourceForge project, the first thing I see is "PyXML is no longer
maintained." So does this mean that the SIG expects Python to become "/the/
premier language for XML processing" without support for validating XML
documents?

--
Bob Kline
http://www.rksystems.com
mailto:bkline at rksystems.com

From stefan_ml at behnel.de Mon Feb 11 08:06:06 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 11 Feb 2008 08:06:06 +0100
Subject: [XML-SIG] python-xml unsupported?
In-Reply-To: <47AFCFA5.9040706@rksystems.com>
References: <47AFCFA5.9040706@rksystems.com>
Message-ID: <47AFF3DE.4040807@behnel.de>

Hi,

Bob Kline wrote:
> According to http://www.python.org/community/sigs/current/xml-sig/
> "While Python includes basic XML processing capabilities, the goal of
> this SIG is to make Python become //the// premier language for XML
> processing. The SIG, through the mailing list and the PyXML
> project hosted on SourceForge,
> is helping to decide what software is required for this purpose, and
> coordinate its implementation and documentation." When I follow the
> link to the SourceForge project, the first thing I see is "PyXML is no
> longer maintained."

That is true. It has not been updated in years.

> So does this mean that the SIG expects Python to
> become "/the/ premier language for XML processing" without support for
> validating XML documents?

No, just by embracing better tools like ElementTree (which is in stdlib) and
lxml (which supports validation, which you were asking for).

http://codespeak.net/lxml

Stefan

From tesco4 at gmail.com Mon Feb 11 13:41:15 2008
From: tesco4 at gmail.com (Tesco Scoco)
Date: Mon, 11 Feb 2008 14:41:15 +0200
Subject: [XML-SIG] PyXML
Message-ID:

Hi,

I need info on PyXML with regard to the following:
a. How it works in processing XML documents.
b. How different is it from the normal built-in python xml processing
capabilities

A pdf document will be of value. Hey, I'm new in this business and I find it
exciting.

Regards,
Tesco

From stefan_ml at behnel.de Mon Feb 11 15:20:27 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 11 Feb 2008 15:20:27 +0100
Subject: [XML-SIG] PyXML
In-Reply-To:
References:
Message-ID: <47B059AB.7000801@behnel.de>

Hi,

Tesco Scoco wrote:
> I need info on PyXML with regard to the following:
> b. How different is it from the normal built-in python xml processing
> capabilities

I think the main difference is that it is no longer maintained. There are
better tools which are (ElementTree) or are not (lxml) in the stdlib, and
which *are* well maintained.

Stefan

From bkline at rksystems.com Mon Feb 11 15:28:10 2008
From: bkline at rksystems.com (Bob Kline)
Date: Mon, 11 Feb 2008 09:28:10 -0500
Subject: [XML-SIG] python-xml unsupported?
In-Reply-To: <47AFF3DE.4040807@behnel.de>
References: <47AFCFA5.9040706@rksystems.com> <47AFF3DE.4040807@behnel.de>
Message-ID: <47B05B7A.3070900@rksystems.com>

Stefan Behnel wrote:
> That is true. It has not been updated in years.
>
>
>> So does this mean that the SIG expects Python to
>> become "/the/ premier language for XML processing" without support for
>> validating XML documents?
>>
>
> No, just by embracing better tools like ElementTree (which is in stdlib) and
> lxml (which supports validation, which you were asking for).
>
> http://codespeak.net/lxml
>
>

Ah, thanks. Perhaps it would be a good idea for the SIG page to direct
programmers to packages other than PyXML, then.

--
Bob Kline
http://www.rksystems.com
mailto:bkline at rksystems.com

From tesco4 at gmail.com Mon Feb 11 15:39:38 2008
From: tesco4 at gmail.com (Tesco Scoco)
Date: Mon, 11 Feb 2008 16:39:38 +0200
Subject: [XML-SIG] python-xml unsupported?
In-Reply-To: <47B05B7A.3070900@rksystems.com>
References: <47AFCFA5.9040706@rksystems.com> <47AFF3DE.4040807@behnel.de> <47B05B7A.3070900@rksystems.com>
Message-ID:

Thanks Bob and Stef.

On Feb 11, 2008 4:28 PM, Bob Kline wrote:

> Stefan Behnel wrote:
> > That is true. It has not been updated in years.
> >
> >
> >> So does this mean that the SIG expects Python to
> >> become "/the/ premier language for XML processing" without support for
> >> validating XML documents?
> >>
> >
> > No, just by embracing better tools like ElementTree (which is in stdlib)
> > and lxml (which supports validation, which you were asking for).
> >
> > http://codespeak.net/lxml
> >
> >
>
> Ah, thanks. Perhaps it would be a good idea for the SIG page to direct
> programmers to packages other than PyXML, then.
>
> --
> Bob Kline
> http://www.rksystems.com
> mailto:bkline at rksystems.com

From bkline at rksystems.com Mon Feb 11 16:58:55 2008
From: bkline at rksystems.com (Bob Kline)
Date: Mon, 11 Feb 2008 10:58:55 -0500
Subject: [XML-SIG] python-xml unsupported?
In-Reply-To: <47B05B7A.3070900@rksystems.com>
References: <47AFCFA5.9040706@rksystems.com> <47AFF3DE.4040807@behnel.de> <47B05B7A.3070900@rksystems.com>
Message-ID: <47B070BF.8080200@rksystems.com>

Bob Kline wrote:
> Stefan Behnel wrote:
>
>> That is true. It has not been updated in years.
>>
>>
>>
>>> So does this mean that the SIG expects Python to
>>> become "/the/ premier language for XML processing" without support for
>>> validating XML documents?
>>>
>>>
>> No, just by embracing better tools like ElementTree (which is in stdlib) and
>> lxml (which supports validation, which you were asking for).
>>
>> http://codespeak.net/lxml
>>
>>
>>
>
> Ah, thanks. Perhaps it would be a good idea for the SIG page to direct
> programmers to packages other than PyXML, then.
>
>

Are there any plans to include the XML validation support in the standard
Python libraries at some point?

--
Bob Kline
http://www.rksystems.com
mailto:bkline at rksystems.com

From fdrake at acm.org Tue Feb 12 15:36:32 2008
From: fdrake at acm.org (Fred Drake)
Date: Tue, 12 Feb 2008 09:36:32 -0500
Subject: [XML-SIG] python-xml unsupported?
In-Reply-To: <47B070BF.8080200@rksystems.com>
References: <47AFCFA5.9040706@rksystems.com> <47AFF3DE.4040807@behnel.de> <47B05B7A.3070900@rksystems.com> <47B070BF.8080200@rksystems.com>
Message-ID:

On Feb 11, 2008, at 10:58 AM, Bob Kline wrote:
> Are there any plans to include the XML validation support in the
> standard Python libraries at some point?

I don't know of any such plans at this time. Most programmers aren't
actually in need of schema-based validation most of the time, though
there are clearly times when it would be handy. Even then, however,
getting programmers to agree as to what's needed doesn't result in a
single clear set of requirements, mostly due to the variety of schema
languages available; everyone needs something different.

  -Fred

--
Fred Drake

From stefan_ml at behnel.de Tue Feb 12 16:59:24 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 12 Feb 2008 16:59:24 +0100 (CET)
Subject: [XML-SIG] python-xml unsupported?
In-Reply-To:
References: <47AFCFA5.9040706@rksystems.com> <47AFF3DE.4040807@behnel.de> <47B05B7A.3070900@rksystems.com> <47B070BF.8080200@rksystems.com>
Message-ID: <10850.194.114.62.39.1202831964.squirrel@groupware.dvs.informatik.tu-darmstadt.de>

> On Feb 11, 2008, at 10:58 AM, Bob Kline wrote:
>> Are there any plans to include the XML validation support in the
>> standard Python libraries at some point?
>
> I don't know of any such plans at this time. Most programmers aren't
> actually in need of schema-based validation most of the time, though
> there are clearly times when it would be handy. Even then, however,
> getting programmers to agree as to what's needed doesn't result in a
> single clear set of requirements, mostly due to the variety of schema
> languages available; everyone needs something different.

Hmmm, at least, there are not so many important schema languages that are not
supported by lxml (read: libxml2). I think the combined force of DTD, XML
Schema, RelaxNG and Schematron should meet most people's needs. Actually, the
only one that I could imagine being missing would be Examplotron.

Stefan

From stefan_ml at behnel.de Tue Feb 12 17:20:01 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 12 Feb 2008 17:20:01 +0100 (CET)
Subject: [XML-SIG] PyXML for py 2.5
In-Reply-To: <472AEA6A.9040102@v.loewis.de>
References: <8B473CE55AB5B34FAAE37EF3BE19EE949E09E6@esebe113.NOE.Nokia.com> <472ADB06.3090907@v.loewis.de> <472AE76E.8060305@behnel.de> <472AEA6A.9040102@v.loewis.de>
Message-ID: <19904.194.114.62.39.1202833201.squirrel@groupware.dvs.informatik.tu-darmstadt.de>

Martin v. Löwis wrote:
>> BTW, who's responsible for updating the XML-SIG page that the Python
>> homepage links to behind its prominent "XML" link?
>
> In short: anybody who volunteers.

What about changing the "XML" link on the Python homepage to point to a
Wiki page? I think this one would come close:

http://wiki.python.org/moin/PythonXml

Even the following page would be a better target if it was updated a little
to point to the Wiki instead of the horribly outdated "XML topic guide":

http://www.python.org/community/sigs/current/xml-sig/

That way, I think, it would be clear that there *is* an XML-SIG (or at least
a mailing list of interested people), and if you look for software, well,
there's the link to the Wiki.

Stefan

From bkline at rksystems.com Tue Feb 12 18:06:05 2008
From: bkline at rksystems.com (Bob Kline)
Date: Tue, 12 Feb 2008 12:06:05 -0500
Subject: [XML-SIG] PyXML for py 2.5
In-Reply-To: <19904.194.114.62.39.1202833201.squirrel@groupware.dvs.informatik.tu-darmstadt.de>
References: <8B473CE55AB5B34FAAE37EF3BE19EE949E09E6@esebe113.NOE.Nokia.com> <472ADB06.3090907@v.loewis.de> <472AE76E.8060305@behnel.de> <472AEA6A.9040102@v.loewis.de> <19904.194.114.62.39.1202833201.squirrel@groupware.dvs.informatik.tu-darmstadt.de>
Message-ID: <47B1D1FD.7010407@rksystems.com>

Stefan Behnel wrote:
> What about changing the "XML" link on the Python homepage to point to a
> Wiki page? I think this one would come close:
>
> http://wiki.python.org/moin/PythonXml
>

If I understand the responses I got to my original question correctly, the
SIG views the lxml package as the successor to PyXML, its former but now
abandoned flagship for making Python the premier language for XML processing.
Is that right? If so, I'm not sure that this WiKi page makes that fact clear.
If I misunderstood the responses I got, well, it's hard to imagine Python
becoming the premier language for XML processing without support for document
validation in the standard library distribution.

--
Bob Kline
http://www.rksystems.com
mailto:bkline at rksystems.com

From behnel at dvs.tu-darmstadt.de Tue Feb 12 18:47:58 2008
From: behnel at dvs.tu-darmstadt.de (Stefan Behnel)
Date: Tue, 12 Feb 2008 18:47:58 +0100 (CET)
Subject: [XML-SIG] PyXML for py 2.5
In-Reply-To: <47B1D1FD.7010407@rksystems.com>
References: <8B473CE55AB5B34FAAE37EF3BE19EE949E09E6@esebe113.NOE.Nokia.com> <472ADB06.3090907@v.loewis.de> <472AE76E.8060305@behnel.de> <472AEA6A.9040102@v.loewis.de> <19904.194.114.62.39.1202833201.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <47B1D1FD.7010407@rksystems.com>
Message-ID: <48312.194.114.62.39.1202838478.squirrel@groupware.dvs.informatik.tu-darmstadt.de>

> Stefan Behnel wrote:
>> What about changing the "XML" link on the Python homepage to point to a
>> Wiki page? I think this one would come close:
>>
>> http://wiki.python.org/moin/PythonXml
>
> If I understand the responses I got to my original question correctly,
> the SIG views the lxml package as the successor to PyXML its former but
> now abandoned flagship

I didn't say that (and I'm far from speaking for "the XML SIG"). Although:

> for making Python the premier language for XML processing.

That comes close. :)

[ok, I admit: I'm biased]

> Is that right? If so, I'm not sure that this WiKi page
> makes that fact clear.

I just fixed it up a bit so that it makes clear what is in stdlib and what
people should preferably use.

http://wiki.python.org/moin/PythonXml

I'm open for arguments, but so far, I would say it is backed by what I tend
to read on c.l.py and this list.

> If I misunderstood the responses I got, well,
> it's hard to imagine Python becoming the premier language for XML
> processing without support for document validation in the standard
> library distribution.

That may be so, but if validation is just an

    easy_install lxml

and an updated import line away, I don't think that makes it that much less
suited for the "premier language".

Stefan

From bkline at rksystems.com Tue Feb 12 19:34:25 2008
From: bkline at rksystems.com (Bob Kline)
Date: Tue, 12 Feb 2008 13:34:25 -0500
Subject: [XML-SIG] PyXML for py 2.5
In-Reply-To: <48312.194.114.62.39.1202838478.squirrel@groupware.dvs.informatik.tu-darmstadt.de>
References: <8B473CE55AB5B34FAAE37EF3BE19EE949E09E6@esebe113.NOE.Nokia.com> <472ADB06.3090907@v.loewis.de> <472AE76E.8060305@behnel.de> <472AEA6A.9040102@v.loewis.de> <19904.194.114.62.39.1202833201.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <47B1D1FD.7010407@rksystems.com> <48312.194.114.62.39.1202838478.squirrel@groupware.dvs.informatik.tu-darmstadt.de>
Message-ID: <47B1E6B1.7000005@rksystems.com>

Stefan Behnel wrote:
>> it's hard to imagine Python becoming the premier language for XML
>> processing without support for document validation in the standard
>> library distribution.
>>
>
> That may be so, but if validation is just an
>
>     easy_install lxml
>
> and an updated import line away, I don't think that makes it that much
> less suited for the "premier language".
>

Well, I guess we could have said the same about PyXML when we adopted it for
a large project with one of my customers, but now we have a bunch of critical
processing tied to a package that's dead (despite the "semi-official" on the
WiKi page -- we don't have any useful place to report a nasty bug we just ran
into). I'd be happier if the validation support were folded in with the
standard libraries. I remember attending a Python conference some years back,
at which Guido seemed to be expressing a certain amount of disdain for XML
generally, so I'm grateful for whatever we get. But such a view held by the
BDFL could easily be seen as an obstacle in the path to the "premier language
for XML processing" goal.

--
Bob Kline
http://www.rksystems.com
mailto:bkline at rksystems.com

From john at nmt.edu Tue Feb 12 22:35:42 2008
From: john at nmt.edu (John W. Shipman)
Date: Tue, 12 Feb 2008 14:35:42 -0700 (MST)
Subject: [XML-SIG] PyXML
In-Reply-To:
References:
Message-ID:

On Mon, 11 Feb 2008, Tesco Scoco wrote:

+--
| I need info on PyXML with regard to the following:
| a. How it works in processing XML documents.
| b. How different is it from the normal built-in python xml processing
| capabilities
|
| A pdf document will be of value. Hey, I'm new in this business and I find it
| exciting.
+--

I've written up a quick-reference for lxml, here:

    http://www.nmt.edu/tcc/help/pubs/pylxml/

There's a link to the PDF version at the top of the first page.

This is a relatively new document, and I would greatly appreciate any
comments on improving its usefulness.

Best regards,
John Shipman (john at nmt.edu), Applications Specialist, NM Tech Computer Center,
Speare 119, Socorro, NM 87801, (505) 835-5950, http://www.nmt.edu/~john
  ``Let's go outside and commiserate with nature.'' --Dave Farber

From john at nmt.edu Tue Feb 12 23:15:22 2008
From: john at nmt.edu (John W. Shipman)
Date: Tue, 12 Feb 2008 15:15:22 -0700 (MST)
Subject: [XML-SIG] python-xml unsupported?
In-Reply-To:
References: <47AFCFA5.9040706@rksystems.com> <47AFF3DE.4040807@behnel.de> <47B05B7A.3070900@rksystems.com> <47B070BF.8080200@rksystems.com>
Message-ID:

On Feb 11, 2008, at 10:58 AM, Bob Kline wrote:

+--
| Are there any plans to include the XML validation support in the
| standard Python libraries at some point?
+--

On Tue, 12 Feb 2008, Fred Drake replied:

+--
| I don't know of any such plans at this time. Most programmers aren't
| actually in need of schema-based validation most of the time, though
| there are clearly times when it would be handy. Even then, however,
| getting programmers to agree as to what's needed doesn't result in a
| single clear set of requirements, mostly due to the variety of schema
| languages available; everyone needs something different.
+-- Minor details aside, I'd like to point out that validation is a pretty open-ended process, and no canonical validator can handle all possible needs. For example, suppose you have a 'part-number' attribute that is not truly valid unless it is defined in your organization's part number database. Do you expect XSchema to go out and peer into the database? I think that's terribly unreasonable. The xsd: datatypes from XSchema that I use in my Relax NG schemas[^1] are perfectly adequate for well over 90% of my validation needs. The rest of them have to be covered by my Python application anyway. For me, one of the big payoffs of validating XML files as they are input is that it eliminates a lot of error-checking logic from my application. I never have to worry, 'what if this attribute is missing or isn't a valid number?' I can just grab the attribute and coerce it with int(). May I suggest that if there is some validating parser added to the library, stick with a basic set of content types that cover the vast majority of needs. Careful applications coders will not mind the additional overhead of validating the rest. Best regards, John Shipman (john at nmt.edu), Applications Specialist, NM Tech Computer Center, Speare 119, Socorro, NM 87801, (505) 835-5950, http://www.nmt.edu/~john ``Let's go outside and commiserate with nature.'' --Dave Farber --------------- [^1] http://www.nmt.edu/tcc/help/pubs/rnc/ esp. 
http://www.nmt.edu/tcc/help/pubs/rnc/xsd.html From stefan_ml at behnel.de Wed Feb 13 14:07:19 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 13 Feb 2008 14:07:19 +0100 (CET) Subject: [XML-SIG] PyXML for py 2.5 In-Reply-To: References: <8B473CE55AB5B34FAAE37EF3BE19EE949E09E6@esebe113.NOE.Nokia.com> <472ADB06.3090907@v.loewis.de> <472AE76E.8060305@behnel.de> <472AEA6A.9040102@v.loewis.de> <19904.194.114.62.39.1202833201.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <47B1D1FD.7010407@rksystems.com> <48312.194.114.62.39.1202838478.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Message-ID: <46106.194.114.62.34.1202908039.squirrel@groupware.dvs.informatik.tu-darmstadt.de> Kevin Cole wrote: > On Feb 12, 2008 12:47 PM, Stefan Behnel wrote: > >> That may be so, but if validation is just an >> >> easy_install lxml >> >> and an updated import line away, I don't think that makes it that much >> less suited for the "premier language". > > A slightly off-topic rant: I've come to really detest the so-called > "easy_install" as so many eggs appear to put files wherever they feel > like on the system. That's definitely off-topic here, as lxml does not easy_install anything outside the Python site-packages directory (i.e. outside the one egg directory that easy_install creates). Actually, most Python eggs I have come across behave that way, as most of them were library packages, not applications. Stefan From stefan_ml at behnel.de Wed Feb 13 14:13:59 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 13 Feb 2008 14:13:59 +0100 (CET) Subject: [XML-SIG] python-xml unsupported? In-Reply-To: References: <47AFCFA5.9040706@rksystems.com> <47AFF3DE.4040807@behnel.de> <47B05B7A.3070900@rksystems.com> <47B070BF.8080200@rksystems.com> Message-ID: <24495.194.114.62.34.1202908439.squirrel@groupware.dvs.informatik.tu-darmstadt.de> John W.
Shipman wrote: > For me, one of the big payoffs of validating XML files as they > are input is that it eliminates a lot of error-checking logic > from my application. I never have to worry, 'what if this > attribute is missing or isn't a valid number?' I can just > grab the attribute and coerce it with int(). That's why I added parse-time XML-Schema validation support to lxml 2.0, especially for lxml.objectify. It's the easiest way of making sure the object structure the parser returns is as expected. > May I suggest that if there is some validating parser added > to the library, stick with a basic set of content types that > cover the vast majority of needs. Careful applications > coders will not mind the additional overhead of validating > the rest. I think that's how additions to stdlib work (or at least should work) in general. Batteries included does not mean you also get a free power socket. Stefan From harmenkampinga at yahoo.com Thu Feb 14 16:11:40 2008 From: harmenkampinga at yahoo.com (Harmen Kampinga) Date: Thu, 14 Feb 2008 07:11:40 -0800 (PST) Subject: [XML-SIG] problem installing PyXML Message-ID: <723398.74369.qm@web52005.mail.re2.yahoo.com> Dear reader, I have a problem installing PyXML-0.8.4 on my Mac OS X Tiger. When running the steps 1 and 2 I receive this error: IOError: [Errno 2] No such file or directory: '/System/Library/Frameworks/Python.framework/Versions/2.3/include/python2.3/pyconfig.h' What am I doing wrong? Should I do more? I hope that you can help. Kind regards, Harmen Kampinga
From mike at skew.org Sun Feb 17 13:36:00 2008 From: mike at skew.org (Mike Brown) Date: Sun, 17 Feb 2008 05:36:00 -0700 (MST) Subject: [XML-SIG] [Baypiggies] News flash: Python possibly guilty in excessive DTD traffic In-Reply-To: Message-ID: <200802171236.m1HCa1Zm016884@chilled.skew.org> Before looking for a bug, create a test case and verify that the behavior isn't expected for it. I mean, of *course* there'll be an attempt to fetch whatever DTD is mentioned in a DOCTYPE when your XML processor is validating, and it's quite reasonable to fetch one even when not validating, because there's more info in a DTD than just what's needed for validation. AFAICT, the main problem the W3C is talking about is not what happens when a legitimate DTD request occurs in response to a system ID in a DOCTYPE, but rather when there really shouldn't be such a request -- that is, when the DTD's URL is just a namespace ID. What evidence is there that Python's standard XML libs are making illegitimate requests for namespace IDs? I see none in that W3C blog post. Show us a reproducible example of a namespace ID being subjected to a fetch attempt while reading in an XML document with standard Python APIs. I don't think it's happening at all. Apparently there *is* evidence that urllib is ultimately called by something quite often to grab XHTML DTDs, and the HTTP response may not always be handled very well. But assuming it's part of normal XML processing, we have no details about whether it's a legitimate call for a DOCTYPE or an illegit one for a namespace ID, and whether it's really unreasonable to keep trying to fetch every time the reference is encountered. It sounds like application-level issues, not misbehavior by Python's SAX or DOM APIs.
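For readers who want to check the DOCTYPE behaviour for themselves, here is a minimal sketch, assuming the standard library's expat-based SAX reader and its feature_external_ges switch are what decide whether the system identifier in a DOCTYPE gets fetched. The handler class, the helper function and the placeholder DTD URL are made up for illustration; only the xml.sax calls are real.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class ElementNames(ContentHandler):
    """Collect element names so we can see that parsing succeeded."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_without_external_entities(data):
    """Parse XML bytes with external general entity loading switched off."""
    parser = xml.sax.make_parser()
    # With this feature off, the expat-based reader does not ask its entity
    # resolver to open the SYSTEM identifier named in the DOCTYPE.
    parser.setFeature(feature_external_ges, False)
    handler = ElementNames()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.names


# Hypothetical document; the DTD URL is a placeholder and is never contacted.
DOC = (
    b'<!DOCTYPE html SYSTEM "http://example.invalid/placeholder.dtd">'
    b"<html><body/></html>"
)
names = parse_without_external_entities(DOC)
```

With the feature switched off the document parses without any network access. An application that does want the DTD could instead install its own entity resolver via setEntityResolver() and hand back a locally cached copy.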
That blog author also seems to feel it's unreasonable for an app to seek out the same network-bound resource repeatedly, which is a sound position in some document and application contexts, but not others; it really depends on the situation, doesn't it? Sure, an app developer might be able to configure the parser to not read external entities, or could cache responses to minimize that traffic, if necessary, but it's not an obligation or necessarily a bug if that doesn't happen. And the XML spec is silent on the issue of unfetchable external entities anyway. To answer your question, legitimate DTD processing is probably a feature of the underlying parser (Expat). I assume it calls back to a urllib-based resolver. But like I said, there's no bug there; just a lack of features to encourage application developers to use XML catalogs. I don't know if this helps.. or am I missing something here? Guido van Rossum wrote: > [+xml-sig] > > On Feb 8, 2008 8:03 PM, Keith Dart ? wrote: > > > > http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic > > > > This is interesting. I've noticed that when you use Python's XML > > package in validating mode it does try to fetch the DTD. Be careful > > when you use that. > > I think this is worth filing a bug, but I'd like to understand better > where the call is made. I can't find any places in the standard xml > package that does this -- but I'm not all that familiar with the code. > Do you know if it's in the base xml package, or in etree, or in the > separately distributed "XMLplus"? Any details you have would be > appreciated (like a traceback from the point where the call is made). 
> > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > _______________________________________________ > XML-SIG maillist - XML-SIG at python.org > http://mail.python.org/mailman/listinfo/xml-sig From martin at v.loewis.de Sun Feb 24 06:30:43 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 24 Feb 2008 06:30:43 +0100 Subject: [XML-SIG] PyXML for py 2.5 In-Reply-To: <47B1D1FD.7010407@rksystems.com> References: <8B473CE55AB5B34FAAE37EF3BE19EE949E09E6@esebe113.NOE.Nokia.com> <472ADB06.3090907@v.loewis.de> <472AE76E.8060305@behnel.de> <472AEA6A.9040102@v.loewis.de> <19904.194.114.62.39.1202833201.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <47B1D1FD.7010407@rksystems.com> Message-ID: <47C10103.20908@v.loewis.de> > If I understand the responses I got to my original question correctly, > the SIG views the lxml package as the successor to PyXML, its former but > now abandoned flagship for making Python the premier language for XML > processing. Is that right? If so, I'm not sure that this WiKi page > makes that fact clear. If I misunderstood the responses I got, well, > it's hard to imagine Python becoming the premier language for XML > processing without support for document validation in the standard > library distribution. I don't think Python is or should be the premier language to do XML processing. If you have an application that is entirely about XML processing, use Java. If you have an application that integrates a lot of different things (or perhaps just two or three of them), and XML processing is one, you should consider Python. Then you should analyze your processing needs, and pick a Python library that meets these needs. If you found that validation is a processing need, I strongly recommend that you re-evaluate your processing needs (whether you use Python or not). IMHO, validation is much over-rated and over-used. 
As for the flagship library to do XML processing: I still think that's the standard library. It has always met my own processing needs, and it comes as an included battery. Most applications of PyXML should easily port to the standard library. Regards, Martin From martin at v.loewis.de Sun Feb 24 06:31:57 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 24 Feb 2008 06:31:57 +0100 Subject: [XML-SIG] problem installing PyXML In-Reply-To: <723398.74369.qm@web52005.mail.re2.yahoo.com> References: <723398.74369.qm@web52005.mail.re2.yahoo.com> Message-ID: <47C1014D.7090100@v.loewis.de> > I have a problem installing PyXML-0.8.4 on my Mac OS X > Tiger. > When running the steps 1 and 2 I receive this error: > > IOError: [Errno 2] No such file or directory: > '/System/Library/Frameworks/Python.framework/Versions/2.3/include/python2.3/pyconfig.h' > > What am I doing wrong? Should I do more? Have you installed Apple's developer tools (Xcode, gcc, header files, ...)? Regards, Martin From martin at v.loewis.de Sun Feb 24 06:54:14 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 24 Feb 2008 06:54:14 +0100 Subject: [XML-SIG] [Baypiggies] News flash: Python possibly guilty in excessive DTD traffic In-Reply-To: References: <20080209040312.725218a2@dartworks.biz> Message-ID: <47C10686.3060804@v.loewis.de> > I think this is worth filing a bug, but I'd like to understand better > where the call is made. I can't find any places in the standard xml > package that does this -- but I'm not all that familiar with the code. > Do you know if it's in the base xml package, or in etree, or in the > separately distributed "XMLplus"? Any details you have would be > appreciated (like a traceback from the point where the call is made).
In case you didn't get an answer yet: I don't know about the OP's stack trace, but the standard library accesses the internet in xml.sax.saxutils.prepare_input_source, which in turn may be called from xml.sax.expatreader.ExpatParser.external_entity_ref (unless the feature_external_ges is off). That, in turn, is called by the parser when it sees the DOCTYPE declaration. The OP was referring to validation, so more likely he was talking about the xmlproc parser (which is only in PyXML). I also agree with Mike Brown: The author of this W3C article apparently confuses a number of things, in particular whether an XML parser *should* fetch the SYSTEM identifier in a document. According to the XML spec, it should indeed. Now, the other question is whether there should be caching; and yes, there should be, and no caching is implemented (except in xmlproc, which supports catalogs). As for accessing URLs that are used as namespace URIs: our XML libraries never do that. In any case, AMK created issue2124. Regards, Martin From stefan_ml at behnel.de Sun Feb 24 10:08:00 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 24 Feb 2008 10:08:00 +0100 Subject: [XML-SIG] PyXML for py 2.5 In-Reply-To: <47C10103.20908@v.loewis.de> References: <8B473CE55AB5B34FAAE37EF3BE19EE949E09E6@esebe113.NOE.Nokia.com> <472ADB06.3090907@v.loewis.de> <472AE76E.8060305@behnel.de> <472AEA6A.9040102@v.loewis.de> <19904.194.114.62.39.1202833201.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <47B1D1FD.7010407@rksystems.com> <47C10103.20908@v.loewis.de> Message-ID: <47C133F0.2080402@behnel.de> Hi, Martin v. Löwis wrote: > I don't think Python is or should be the premier language to do XML > processing. I object! > If you have an application that is entirely about XML > processing, use Java. I highly object!
Performance is one reason: http://www.xml.com/pub/a/2007/05/09/xml-parser-benchmarks-part-1.html http://www.xml.com/pub/a/2007/05/16/xml-parser-benchmarks-part-2.html http://effbot.org/zone/celementtree.htm#benchmarks Simplicity is another. Python wins that contest, for XML and most other topics you may choose. > If you have an application that integrates a lot of different things > (or perhaps just two or three of them), and XML processing is one, > you should consider Python. Then you should analyze your processing > needs, and pick a Python library that meets these needs. Absolutely. > If you found that validation is a processing need, I strongly recommend > that you re-evaluate your processing needs (whether you use Python > or not). IMHO, validation is much over-rated and over-used. It's very handy, though. You can validate on the way in (right in the parser) and be sure that the structure you get is as expected, without adding tons of "is this valid input" checks to your code. That one is about simplicity, too. I think validation is somewhat comparable to assertions that you put into your code. > As for the flagship library to do XML processing: I still think that's > the standard library. It has always met my own processing needs, and > it comes as an included battery. Most applications of PyXML should > easily port to the standard library. Again, depends on your needs. But I think we agree that the stdlib should include the batteries for the majority of processing needs, and leave special needs to external packages. And I would agree that it meets this goal for XML. 
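To make the "is this valid input" point concrete, here is a rough sketch of the checks an application ends up writing by hand when nothing validates the document on the way in. It uses only the standard library's ElementTree; the <part> element, its quantity attribute and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET


def read_quantity(xml_text):
    """Return the integer 'quantity' attribute of a hypothetical <part> element."""
    root = ET.fromstring(xml_text)
    # Each of these checks is something a schema could have guaranteed up front.
    if root.tag != "part":
        raise ValueError("expected <part>, got <%s>" % root.tag)
    raw = root.get("quantity")
    if raw is None:
        raise ValueError("missing 'quantity' attribute")
    try:
        return int(raw)
    except ValueError:
        raise ValueError("'quantity' is not an integer: %r" % raw)


good = read_quantity('<part quantity="12"/>')
```

With a validating parser in front of the application, the body of the function would shrink to the final int() call, which is the simplification being argued for here.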
Stefan From martin at v.loewis.de Sun Feb 24 11:05:34 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 24 Feb 2008 11:05:34 +0100 Subject: [XML-SIG] PyXML for py 2.5 In-Reply-To: <47C133F0.2080402@behnel.de> References: <8B473CE55AB5B34FAAE37EF3BE19EE949E09E6@esebe113.NOE.Nokia.com> <472ADB06.3090907@v.loewis.de> <472AE76E.8060305@behnel.de> <472AEA6A.9040102@v.loewis.de> <19904.194.114.62.39.1202833201.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <47B1D1FD.7010407@rksystems.com> <47C10103.20908@v.loewis.de> <47C133F0.2080402@behnel.de> Message-ID: <47C1416E.4040400@v.loewis.de> >> If you have an application that is entirely about XML >> processing, use Java. > > I highly object! > > Performance is one reason: > http://www.xml.com/pub/a/2007/05/09/xml-parser-benchmarks-part-1.html > http://www.xml.com/pub/a/2007/05/16/xml-parser-benchmarks-part-2.html > http://effbot.org/zone/celementtree.htm#benchmarks > > Simplicity is another. Python wins that contest, for XML and most other topics > you may choose. Implementation of specifications is another reason. XSLT2, XPath2, XQuery? Mapping of Schema definitions to Python classes? XML 1.1? XML Encryption and Signature? XML Base? Catalogs? Sure, if your processing needs are simple, the Python implementation will be simple, and perhaps also reasonably performant. However, in an application that is all about XML processing, chances are high that you need a functionality that is not available in the XML library of your choice, but would be available in Java. 
Regards, Martin From stefan_ml at behnel.de Sun Feb 24 16:05:39 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 24 Feb 2008 16:05:39 +0100 Subject: [XML-SIG] PyXML for py 2.5 In-Reply-To: <47C1416E.4040400@v.loewis.de> References: <8B473CE55AB5B34FAAE37EF3BE19EE949E09E6@esebe113.NOE.Nokia.com> <472ADB06.3090907@v.loewis.de> <472AE76E.8060305@behnel.de> <472AEA6A.9040102@v.loewis.de> <19904.194.114.62.39.1202833201.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <47B1D1FD.7010407@rksystems.com> <47C10103.20908@v.loewis.de> <47C133F0.2080402@behnel.de> <47C1416E.4040400@v.loewis.de> Message-ID: <47C187C3.6030103@behnel.de> Hi, Martin v. Löwis wrote: > Implementation of specifications is another reason. XSLT2, XPath2, > XQuery? There definitely isn't a "standard" solution, but it's not true that there is nothing, either. http://www.w3.org/XML/Query/#implementations http://behnel.de/cgi-bin/weblog_basic/index.php?p=12 > Mapping of Schema definitions to Python classes? A combination of lxml.objectify and schema validation is close enough to that, IMHO, but not a bit less powerful, as it's C-implemented and completely runtime configurable in Python code at basically any granularity. > XML 1.1? Honestly - what for? > XML Encryption and Signature? Should be easy to wrap libxmlsec if you need it. http://www.aleksey.com/xmlsec/ > XML Base? Supported by libxml2. > Catalogs? Supported by libxml2. > Sure, if your processing needs are simple, the Python implementation > will be simple, and perhaps also reasonably performant. However, > in an application that is all about XML processing, chances are high > that you need a functionality that is not available in the XML library > of your choice, but would be available in Java. If you start with ElementTree and find that you need a feature that isn't supported there, chances are high that you will either find it in lxml or that it would be easy to add to it if you really feel like needing it.
Stefan From hgg9140 at seanet.com Sun Feb 24 17:21:13 2008 From: hgg9140 at seanet.com (Harry George) Date: Sun, 24 Feb 2008 08:21:13 -0800 Subject: [XML-SIG] PyXML for py 2.5 In-Reply-To: <47C1416E.4040400@v.loewis.de> References: <8B473CE55AB5B34FAAE37EF3BE19EE949E09E6@esebe113.NOE.Nokia.com> <472ADB06.3090907@v.loewis.de> <472AE76E.8060305@behnel.de> <472AEA6A.9040102@v.loewis.de> <19904.194.114.62.39.1202833201.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <47B1D1FD.7010407@rksystems.com> <47C10103.20908@v.loewis.de> <47C133F0.2080402@behnel.de> <47C1416E.4040400@v.loewis.de> Message-ID: <20080224082113.0dd67af5@fred.site> On Sun, 24 Feb 2008 11:05:34 +0100 "Martin v. Löwis" wrote: > >> If you have an application that is entirely about XML > >> processing, use Java. [snip] > > Sure, if your processing needs are simple, the Python implementation > will be simple, and perhaps also reasonably performant. However, > in an application that is all about XML processing, chances are high > that you need a functionality that is not available in the XML library > of your choice, but would be available in Java. > > Regards, > Martin This completely misses the point of XML. Its purpose is to provide a cross-platform, cross-language lingua franca everyone can use. As for complexity, the original premise was to be simple enough that a grad student can write a parser in a weekend. A bit more complexity is acceptable, as long as standards-approved complexity isn't used as a lockin mechanism. The fact that this promise scared the crap out of COTS vendors who then "embraced, enhanced, extended" XML into a bloated stds-based lockin monster is no excuse to surrender. Use a reasonably well-supported (on all platforms and languages) subset of XML. Use the libxml2 bindings. If you find there is XML which is truly unparseable with cross-language tools, then isolate it, translate it to reusable XML, and go from there. (And tell the supplier you are examining other options.)
I've had to do this with "XML" data feeds from Microsoft, IBM, and Dassault Systemes products. Java wouldn't have helped. Further, when we've done Java and Python systems in parallel (e.g., in SOAP/SSL communications), python was by far the easiest, cleanest treatment. -- Harry George hgg9140 at seanet.com www.seanet.com/~hgg9140 From martin at v.loewis.de Sun Feb 24 19:28:40 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 24 Feb 2008 19:28:40 +0100 Subject: [XML-SIG] PyXML for py 2.5 In-Reply-To: <47C187C3.6030103@behnel.de> References: <8B473CE55AB5B34FAAE37EF3BE19EE949E09E6@esebe113.NOE.Nokia.com> <472ADB06.3090907@v.loewis.de> <472AE76E.8060305@behnel.de> <472AEA6A.9040102@v.loewis.de> <19904.194.114.62.39.1202833201.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <47B1D1FD.7010407@rksystems.com> <47C10103.20908@v.loewis.de> <47C133F0.2080402@behnel.de> <47C1416E.4040400@v.loewis.de> <47C187C3.6030103@behnel.de> Message-ID: <47C1B758.7070402@v.loewis.de> >> Mapping of Schema definitions to Python classes? > > A combination of lxml.objectify and schema validation is close enough to that, > IMHO, but not a bit less powerful, as it's C-implemented and completely > runtime configurable in Python code at basically any granularity. So can you also use that to generate documents from scratch? Suppose I have then with lxml.objectify, how would I spell p = Person(first="Monika", last="Mustermann", age=57) p.toxml() >> XML 1.1? > > Honestly - what for? To parse it, should you ever see documents that use it. In the language for best XML processing, it's reasonable to expect that this is implemented, no? Xerces 2.9 for Java supports it. >> XML Encryption and Signature? > > Should be easy to wrap libxmlsec if you need it. > > http://www.aleksey.com/xmlsec/ Perhaps. In Java, I get working implementations without further work.
> If you start with ElementTree and find that you need a feature that isn't > supported there, chances are high that you will either find it in lxml or that > it would be easy to add to it if you really feel like needing it. So how about a web services stack :-? Regards, Martin From dkuhlman at rexx.com Sun Feb 24 19:28:08 2008 From: dkuhlman at rexx.com (Dave Kuhlman) Date: Sun, 24 Feb 2008 10:28:08 -0800 Subject: [XML-SIG] PyXML for py 2.5 In-Reply-To: <47C133F0.2080402@behnel.de> References: <8B473CE55AB5B34FAAE37EF3BE19EE949E09E6@esebe113.NOE.Nokia.com> <472ADB06.3090907@v.loewis.de> <472AE76E.8060305@behnel.de> <472AEA6A.9040102@v.loewis.de> <19904.194.114.62.39.1202833201.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <47B1D1FD.7010407@rksystems.com> <47C10103.20908@v.loewis.de> <47C133F0.2080402@behnel.de> Message-ID: <20080224182808.GB33487@cutter.rexx.com> On Sun, Feb 24, 2008 at 10:08:00AM +0100, Stefan Behnel wrote: > Hi, > > Martin v. Löwis wrote: > > I don't think Python is or should be the premier language to do XML > > processing. > > I object! > > > > If you have an application that is entirely about XML > > processing, use Java. > > I highly object! I'll agree with Stefan, here. If lxml does not take care of your needs, then you must have very special needs. And, in the event that you *really* do need to use some Java XML class library, seriously consider using that library through Jython. Jython makes it very easy to use something like dom4j (http://www.dom4j.org/). I've even generated a Java jar file from an XML Schema using XMLBeans (http://xmlbeans.apache.org/), then used that jar file from Jython.
[more good reasons for using Python XML support snipped] - Dave -- Dave Kuhlman http://www.rexx.com/~dkuhlman From fdrake at acm.org Sun Feb 24 23:54:45 2008 From: fdrake at acm.org (Fred Drake) Date: Sun, 24 Feb 2008 17:54:45 -0500 Subject: [XML-SIG] PyXML for py 2.5 In-Reply-To: <47C133F0.2080402@behnel.de> References: <8B473CE55AB5B34FAAE37EF3BE19EE949E09E6@esebe113.NOE.Nokia.com> <472ADB06.3090907@v.loewis.de> <472AE76E.8060305@behnel.de> <472AEA6A.9040102@v.loewis.de> <19904.194.114.62.39.1202833201.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <47B1D1FD.7010407@rksystems.com> <47C10103.20908@v.loewis.de> <47C133F0.2080402@behnel.de> Message-ID: <18FED7E4-529F-4A62-8160-2740BB41C584@acm.org> On Feb 24, 2008, at 4:08 AM, Stefan Behnel wrote: > I think validation is somewhat comparable to assertions that you put > into your > code. There's one case where we use schema-based validation in our products, and that's at organizational boundaries: Where we accept XML from another company, it gets validated using the schema which are defined for the purposes of that communications channel. But that's the only place we've had an actual use for validation. It's a place where validation is appropriate, though. -Fred -- Fred Drake From bkline at rksystems.com Mon Feb 25 01:15:12 2008 From: bkline at rksystems.com (Bob Kline) Date: Sun, 24 Feb 2008 19:15:12 -0500 Subject: [XML-SIG] PyXML for py 2.5 In-Reply-To: <47C10103.20908@v.loewis.de> References: <8B473CE55AB5B34FAAE37EF3BE19EE949E09E6@esebe113.NOE.Nokia.com> <472ADB06.3090907@v.loewis.de> <472AE76E.8060305@behnel.de> <472AEA6A.9040102@v.loewis.de> <19904.194.114.62.39.1202833201.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <47B1D1FD.7010407@rksystems.com> <47C10103.20908@v.loewis.de> Message-ID: <47C20890.10002@rksystems.com> Martin v. Löwis wrote: > I don't think Python is or should be the premier language to do XML > processing. I was just echoing language from the SIG's own home page.
We don't need it to be the best language for XML processing, but it's the language we use for much of our administrative interfaces and all of our data exchange arrangements. > If you have an application that integrates a lot of different things > (or perhaps just two or three of them), and XML processing is one, > you should consider Python. Then you should analyze your processing > needs, and pick a Python library that meets these needs. That's what we did, and that package (the one the SIG recommended) has been abandoned in favor of a newer one. It would be nice if we could reduce the chance of that happening again by having the package incorporated into the standard library. > > If you found that validation is a processing need, I strongly recommend > that you re-evaluate your processing needs (whether you use Python > or not). IMHO, validation is much over-rated and over-used. Well, if you have customers (as we do) who have entered into data exchange arrangements with other organizations and companies with agreements to make sure that the documents exchanged meet sets of validation rules, then it seems that the choice you have is to use an existing XML validation package or to write your own XML validation software. We'd rather not re-invent the wheel, but we'd also like to avoid repeatedly replacing third-party packages as they are abandoned. 
-- Bob Kline http://www.rksystems.com mailto:bkline at rksystems.com From stefan_ml at behnel.de Mon Feb 25 06:46:00 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 25 Feb 2008 06:46:00 +0100 Subject: [XML-SIG] PyXML for py 2.5 In-Reply-To: <47C1B758.7070402@v.loewis.de> References: <8B473CE55AB5B34FAAE37EF3BE19EE949E09E6@esebe113.NOE.Nokia.com> <472ADB06.3090907@v.loewis.de> <472AE76E.8060305@behnel.de> <472AEA6A.9040102@v.loewis.de> <19904.194.114.62.39.1202833201.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <47B1D1FD.7010407@rksystems.com> <47C10103.20908@v.loewis.de> <47C133F0.2080402@behnel.de> <47C1416E.4040400@v.loewis.de> <47C187C3.6030103@behnel.de> <47C1B758.7070402@v.loewis.de> Message-ID: <47C25618.3030405@behnel.de> Hi Martin, Martin v. Löwis wrote: >>> Mapping of Schema definitions to Python classes? >> >> A combination of lxml.objectify and schema validation is close enough >> to that, >> IMHO, but not a bit less powerful, as it's C-implemented and completely >> runtime configurable in Python code at basically any granularity. > > So can you also use that to generate documents from scratch? Suppose I have > > > > > > > > > > then with lxml.objectify, how would I spell > > p = Person(first="Monika", last="Mustermann", age=57) > p.toxml() As the docs tell you to:

from lxml.objectify import E, deannotate
from lxml.etree import tostring

# this is the line you want:
p = E.person(E.first("Monika"), E.last("Mustermann"), E.age(57))

# now do whatever you like with p here, for example:
p.age += 1
p.last = "Musterfrau"

deannotate(p)  # remove type hints
tostring(p)    # serialise

>>> XML 1.1? >> >> Honestly - what for? > > To parse it, should you ever see documents that use it. In the language > for best XML processing, it's reasonable to expect that this is > implemented, no? Xerces 2.9 for Java supports it. Fine. I've never seen an XML 1.1 document in the wild.
Real-life applications tend to avoid them as people know they're not portable (and XML is about platform independence and portability and all that...) >>> XML Encryption and Signature? >> >> Should be easy to wrap libxmlsec if you need it. >> >> http://www.aleksey.com/xmlsec/ > > Perhaps. In Java, I get working implementations without further work. Ok, fine. We never had a request on the lxml list so far. Maybe people just don't use it (yet). But if someone needs it, it's not hard to enable. The implementation is there, it's just the binding to lxml that is missing. So just add that to the number of implementation weeks for your entire application and then go and compare that to a pure Java implementation. >> If you start with ElementTree and find that you need a feature that isn't >> supported there, chances are high that you will either find it in lxml >> or that >> it would be easy to add to it if you really feel like needing it. > > So how about a web services stack :-? Web services are not so much about XML processing (anymore) as you might think. They are more about hiding XML (and networking and...) than about processing it. So I don't really see the link to ElementTree or lxml here. That said, Google says it has some 5 million hits for "web service python": http://www.google.de/search?q=web+service+python including this: http://pywebsvcs.sourceforge.net/ but I have no idea how useful/usable/well-designed the tools are here. 
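For comparison, the same kind of from-scratch document can be built with nothing but the standard library's ElementTree. This is only a sketch of one way to do it, not a replacement for the objectify approach; the helper name make_person is made up here, and the element names follow the example in the question.

```python
import xml.etree.ElementTree as ET


def make_person(first, last, age):
    """Build a <person> element with <first>, <last> and <age> children."""
    person = ET.Element("person")
    ET.SubElement(person, "first").text = first
    ET.SubElement(person, "last").text = last
    # ElementTree stores text only, so the number is converted explicitly.
    ET.SubElement(person, "age").text = str(age)
    return person


p = make_person("Monika", "Mustermann", 57)
xml_text = ET.tostring(p, encoding="unicode")
```

Plain ElementTree keeps every value as text, so arithmetic such as incrementing the age needs an explicit int() round trip, which is the convenience objectify's typed elements add on top.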
Stefan From martin at v.loewis.de Mon Feb 25 07:18:20 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 25 Feb 2008 07:18:20 +0100 Subject: [XML-SIG] PyXML for py 2.5 In-Reply-To: <47C20890.10002@rksystems.com> References: <8B473CE55AB5B34FAAE37EF3BE19EE949E09E6@esebe113.NOE.Nokia.com> <472ADB06.3090907@v.loewis.de> <472AE76E.8060305@behnel.de> <472AEA6A.9040102@v.loewis.de> <19904.194.114.62.39.1202833201.squirrel@groupware.dvs.informatik.tu-darmstadt.de> <47B1D1FD.7010407@rksystems.com> <47C10103.20908@v.loewis.de> <47C20890.10002@rksystems.com> Message-ID: <47C25DAC.3020506@v.loewis.de> > That's what we did, and that package (the one the SIG recommended) has > been abandoned in favor of a newer one. That's not exactly true. It hasn't been abandoned in favor of lxml, or some other package. It has been abandoned in favor of the standard library (at least, that's why *I* abandoned it; not sure why nobody was picking it up). > It would be nice if we could > reduce the chance of that happening again by having the package > incorporated into the standard library. PyXML *is* integrated into the standard library. Just use it there. > Well, if you have customers (as we do) who have entered into data > exchange arrangements with other organizations and companies with > agreements to make sure that the documents exchanged meet sets of > validation rules, then it seems that the choice you have is to use an > existing XML validation package or to write your own XML validation > software. We'd rather not re-invent the wheel, but we'd also like to > avoid repeatedly replacing third-party packages as they are abandoned. One option, of course, is to take over maintenance of a package that was abandoned. This is free software. 
Regards, Martin From asylvan at gmail.com Thu Feb 28 00:17:20 2008 From: asylvan at gmail.com (Aerik Sylvan) Date: Wed, 27 Feb 2008 15:17:20 -0800 Subject: [XML-SIG] XBEL and tags (aka "labels") Message-ID: <355a36af0802271517v25e5d4bm80383bb2a203eb4a@mail.gmail.com> Hi All, You've probably seen this article http://www.xml.com/pub/a/2005/03/02/restful.html which talks in the second part about XBEL and tags (which Google calls "labels"). From what I've read of the XBEL spec, it does not seem to support tags well - it supports folders, and you could say they are similar, but not the same. Has there already been discussion about adding tags to the XBEL spec? What's the current status? Thanks, Aerik -- http://www.wikidweb.com - the Wiki Directory of the Web http://tagthis.info - Hosted Tagging for your website!