From fredrik at pythonware.com Wed Jun 16 14:02:02 2004
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Thu Jun 17 07:10:11 2004
Subject: [XML-SIG] ANN: ElementTree 1.2 release candidate 1

The Element type is a simple but flexible container object, designed to store hierarchical data structures, such as simplified XML infosets, in memory. The ElementTree package provides a Python implementation of this type, plus code to serialize element trees to and from XML files.

The 1.2 release adds limited support for XPath and XInclude, and also fixes a number of serialization bugs, mostly related to extensive use of namespaces and unicode in tags and attribute names. For a complete list of changes, see the CHANGES document in the source kit.

You can get the ElementTree toolkit from:

    http://effbot.org/downloads

Brief documentation and some code samples (including an XML-RPC unmarshaller in 16 lines) are available from:

    http://effbot.org/zone/element.htm

enjoy
/F

From fredrik at pythonware.com Fri Jun 18 12:48:55 2004
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Fri Jun 18 14:49:28 2004
Subject: [XML-SIG] ANN: ElementTree 1.2 final (june 18, 2004)

The Element type is a simple but flexible container object, designed to store hierarchical data structures, such as simplified XML infosets, in memory.
The ElementTree package provides a Python implementation of this type, plus code to serialize element trees to and from XML files.

The 1.2 release adds limited support for XPath and XInclude, and also fixes a number of serialization bugs, mostly related to extensive use of namespaces and unicode in tags and attribute names. For a complete list of changes, see the CHANGES document in the source kit.

You can get the ElementTree toolkit from:

    http://effbot.org/downloads

Documentation, articles, and some code samples (including an XML-RPC unmarshaller in 16 lines) are available from:

    http://effbot.org/zone/element.htm

enjoy
/F
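A minimal sketch of the Element API described in this announcement follows. It is written for Python 3 and uses `xml.etree.ElementTree`, the descendant of this package that has shipped with the standard library since Python 2.5, rather than the 1.2 toolkit itself (which was imported as `elementtree.ElementTree`). The sample document and its element names are invented for illustration. The code parses a string, runs a query in the limited XPath syntax that `findall()` accepts, and serializes a newly built tree back to XML.

```python
import xml.etree.ElementTree as ET

# A small, made-up document; ET.parse() would read the same thing from a file.
doc = ET.fromstring(
    "<calendar>"
    "<day name='mon'><event time='09:00'>Standup</event></day>"
    "<day name='tue'><event time='14:00'>Review</event>"
    "<event time='16:00'>Retro</event></day>"
    "</calendar>"
)

# findall() understands a limited XPath-like syntax: './/event' matches
# every <event> element at any depth below the root.
events = [(e.get("time"), e.text) for e in doc.findall(".//event")]

# Build a new tree in memory ...
summary = ET.Element("summary", count=str(len(events)))
for when, title in events:
    item = ET.SubElement(summary, "item", time=when)
    item.text = title

# ... look up one child with a simple attribute predicate ...
review = summary.find("item[@time='14:00']").text

# ... and serialize the tree back to an XML string.
xml_text = ET.tostring(summary, encoding="unicode")
print(xml_text)
```

The path language is intentionally small: descendant searches and simple attribute predicates work, but arbitrary XPath 1.0 expressions are not supported.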

From list-matt at reprocessed.org Fri Jun 18 10:24:42 2004
From: list-matt at reprocessed.org (Matthew Patterson)
Date: Mon Jun 21 00:30:54 2004
Subject: [XML-SIG] double-encoding XSL parameters in Python with libxslt
Message-ID: <3D359A44-C133-11D8-B80B-000393CBB978@reprocessed.org>

Hello,

I've got an annoying problem using Gnome libxslt's Python bindings.

I'm passing in a global parameter (a string), which needs to be enclosed in quotes. I can't guarantee that the string won't contain more quotes, so to ensure that I don't terminate my quoted-string parameter early I'm encoding any single quotes as &apos; before I pass in the string.

libxslt is encoding my already encoded string again, so 'hello here's a parameter' gets encoded to 'hello here&apos;s a parameter' by me, and then to 'hello here&amp;apos;s a parameter' by libxslt.

If I just pass in 'hello here's a parameter' then libxslt complains about terminating the string early...

Is there any way I can avoid this?

Thanks,

Matt

--
Matt Patterson | Design & Code
| http://www.emdash.co.uk/
| http://www.reprocessed.org/
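A minimal sketch of one workaround, under stated assumptions: libxslt evaluates each stylesheet parameter as an XPath expression (which is why the string has to be quoted at all), and XPath 1.0 string literals have no escape syntax, so neither `&apos;` nor a backslash is treated specially inside one. A value containing only one kind of quote can be wrapped in the other kind; a value containing both can be assembled with the XPath `concat()` function. The helper name `xpath_string_literal` is hypothetical and is not part of libxslt.

```python
def xpath_string_literal(value):
    """Return an XPath 1.0 expression that evaluates to the string `value`.

    XPath 1.0 literals are delimited by ' or " and cannot contain their own
    delimiter, and there is no escape syntax, so a value holding both quote
    characters is split on its apostrophes and rebuilt with concat().
    """
    if "'" not in value:
        return "'" + value + "'"
    if '"' not in value:
        return '"' + value + '"'
    parts = []
    for i, chunk in enumerate(value.split("'")):
        if i > 0:
            parts.append('"\'"')             # the apostrophe, in double quotes
        if chunk:
            parts.append("'" + chunk + "'")  # an apostrophe-free run
    # Both quote kinds are present, so there are always at least two
    # arguments here, as concat() requires.
    return "concat(" + ", ".join(parts) + ")"


print(xpath_string_literal("hello here's a parameter"))
# "hello here's a parameter"
print(xpath_string_literal('say "it\'s fine"'))
# concat('say "it', "'", 's fine"')
```

Assuming the bindings' `applyStylesheet(doc, params)` call, which takes a dictionary mapping parameter names to XPath expressions, the returned string would be the dictionary value, with no entity or backslash escaping applied beforehand.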

From brian at sweetapp.com Sun Jun 20 13:10:45 2004
From: brian at sweetapp.com (Brian Quinlan)
Date: Mon Jun 21 01:07:47 2004
Subject: [XML-SIG] ANN: Pyana 0.9.0 Released
Message-ID: <0d1901c456e9$8747b9d0$d445a8c0@dell8200>

ANN: Pyana 0.9.0 Released

You can find it here:
http://sourceforge.net/project/showfiles.php?group_id=28142

Changes:
- Updated for Xalan 1.8/Xerces 2.5
- Added basic support for tracing (see examples)
- Removed transform to DOM support (will devise a better system in a future release)

What is Pyana?

Pyana is a Python interface to the Xalan-C XSLT processor. It provides a simple and safe API for doing XSLT transformations from Python but with the performance of a C processor.

For example:

import Pyana
print Pyana.transform2String(
    source=Pyana.URI('http://pyana.sourceforge.net/examples/helloworld.xml'),
    style=Pyana.URI('http://pyana.sourceforge.net/examples/helloworld.xsl'))

Some more complex examples are provided here:
http://pyana.sourceforge.net/examples/

Cheers,
Brian

From jennyw at colorfulexpressions.com Mon Jun 21 15:25:59 2004
From: jennyw at colorfulexpressions.com (jennyw)
Date: Mon Jun 21 16:57:17 2004
Subject: [XML-SIG] minidom w/ HTML

I have a project where I need to parse html files that are table heavy (a calendar, actually), and I thought minidom would be perfect for my needs. The problem is that the HTML that I'm trying to parse isn't quite valid XML -- mostly minor things, but enough so that minidom won't work. Is there something that would convert an html file into XML that would work with minidom?
Or is there something better, like something more geared towards html that I should be looking at?

The reason I thought of minidom is because I want to easily be able to navigate through table cells. Basically, it's a weekly calendar, and there's a table that has cells for each day. Inside each day cell, there are cells for time and for the name of the event. There are other ways to do this, but I'd like to learn more about parsing XML documents and thought this would be a good way to accomplish my immediate needs and learn something new.

Thanks!

Jen

From tpassin at comcast.net Wed Jun 23 19:09:07 2004
From: tpassin at comcast.net (Thomas B. Passin)
Date: Wed Jun 23 19:06:17 2004
Subject: [XML-SIG] double-encoding XSL parameters in Python with libxslt
In-Reply-To: <3D359A44-C133-11D8-B80B-000393CBB978@reprocessed.org>
References: <3D359A44-C133-11D8-B80B-000393CBB978@reprocessed.org>
Message-ID: <40DA0D93.4010603@comcast.net>

Matthew Patterson wrote:
>
> I've got an annoying problem using Gnome libxslt's Python bindings.
>
> I'm passing in a global parameter (a string), which needs to be enclosed
> in quotes. I can't guarantee that the string won't contain more quotes,
> so to ensure that I don't terminate my quoted-string parameter early I'm
> encoding any single quotes as &apos; before I pass in the string.
>
> libxslt is encoding my already encoded string again, so 'hello here's a
> parameter' gets encoded to 'hello here&apos;s a parameter' by me, and
> then to 'hello here&amp;apos;s a parameter' by libxslt.
>
> If I just pass in 'hello here's a parameter' then libxslt complains
> about terminating the string early...
>
> Is there any way I can avoid this?

It's presumably Python or C that's doing the escaping, so escape the quotes and apostrophes with backslashes. I haven't tried it with libxslt, but I bet it will work.

Cheers,

Tom P

--
Thomas B. Passin
Explorer's Guide to the Semantic Web (Manning Books)
http://www.manning.com/catalog/view.php?book=passin

From derekfountain at yahoo.co.uk Thu Jun 24 05:39:14 2004
From: derekfountain at yahoo.co.uk (Derek Fountain)
Date: Thu Jun 24 05:38:46 2004
Subject: [XML-SIG] Which DOM implementation?
Message-ID: <200406241739.14338.derekfountain@yahoo.co.uk>

Which Python based DOM implementation is the best in terms of compliance to the W3C specification? I'm looking to work with DOM in an educational scenario, and looking at the table on this page:

http://pyxml.sourceforge.net/topics/compliance.html

is making things less clear instead of more so!

The table suggests there are two minidom implementations: one in the Python package itself, and one in the PyXML package. It looks like the PyXML one is a little more compliant - is that a fair assessment?

Further, PyXML has another DOM package called 4DOM. That looks to be the most compliant of the lot according to the table. Was it donated to the PyXML project by FourThought?
Bits of the documentation (not to mention the name) suggest that's its heritage.

Finally, 4Suite appears to have 3 DOM packages available, none of which appears to be especially compliant. I was under the impression that cDomlette was built with speed in mind. I'm not sure about pDOM and FtMD.

--
> eatapple
core dump

From fdrake at acm.org Thu Jun 24 11:00:23 2004
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu Jun 24 11:00:35 2004
Subject: [XML-SIG] minidom w/ HTML
Message-ID: <200406241100.24117.fdrake@acm.org>

On Monday 21 June 2004 03:25 pm, jennyw wrote:
> I have a project where I need to parse html files that are table heavy
> (a calendar, actually), and I thought minidom would be perfect for my
> needs. The problem is that the HTML that I'm trying to parse isn't quite
> valid XML -- mostly minor things, but enough so that minidom won't work.

I wouldn't generally expect HTML to be parsable as XML, only XHTML.

> Is there a something that would convert an html file into XML that
> would work with minidom? Or is there something better, like something
> more geared towards html that I should be looking at?

You could run the HTML through HTML Tidy before parsing it as XML. This could be done using the HTML Tidy command line, or I think someone has built a Python interface to Tidy.

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From cbearden at hal-pc.org Thu Jun 24 10:49:07 2004
From: cbearden at hal-pc.org (Chuck Bearden)
Date: Thu Jun 24 11:06:39 2004
Subject: [XML-SIG] minidom w/ HTML
Message-ID: <20040624144907.GA842@hal-pc.org>

On Mon, Jun 21, 2004 at 12:25:59PM -0700, jennyw wrote:
> I have a project where I need to parse html files that are table heavy
> (a calendar, actually), and I thought minidom would be perfect for my
> needs. The problem is that the HTML that I'm trying to parse isn't quite
> valid XML -- mostly minor things, but enough so that minidom won't work.
> Is there a something that would convert an html file into XML that
> would work with minidom? Or is there something better, like something
> more geared towards html that I should be looking at?
>
> The reason I thought of minidom is because I want to easily be able to
> navigate through table cells. Basically, it's a weekly calendar, and
> there's a table that has cells for each day. Inside each day cell, there
> are cells for time and for the name of the event. There are other ways
> to do this, but I'd like to learn more about parsing XML documents and
> thought this would be a good way accomplish my immediate needs and learn
> something new.

I have used a combination of one of the Python tidy implementations together with the microdom[1] from the Twisted framework[2]. When creating a Twisted microdom, the 'parseString' method takes an optional argument 'beExtremelyLenient', which does just what it says. Some HTML has flaws so serious (e.g. unbalanced quotes in attribute values) that these must be corrected before tidying. You can imagine a three-step process:

(1) ad hoc fixing of HTML problems, if necessary;
(2) creating a "tidied" version of the HTML doc;
(3) creating an extremely lenient twisted.web.microdom object.

Itamar Shtull-Trauring has an introductory article[3] on the Twisted microdom at O'Reilly's XML.com.

Hope this helps,
Chuck

[1] http://twistedmatrix.com/documents/current/api/twisted.web.microdom.html
[2] http://twistedmatrix.com/products/twisted
[3] http://www.xml.com/pub/a/2003/10/15/microdom.html
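Where tidying is not practical, an event-driven parser that does not insist on well-formed input can still recover table rows and cells. The sketch below uses `html.parser.HTMLParser` from the Python 3 standard library (the successor of the `HTMLParser` module mentioned later in this thread); the class name `TableCellCollector` and the sample markup are hypothetical. It treats an unclosed `<td>` or `<tr>` as ended when the next cell or row begins, and it makes no attempt to keep nested tables separate.

```python
from html.parser import HTMLParser


class TableCellCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows; each row is a list of cell strings
        self._row = None    # row currently being filled
        self._cell = None   # text fragments of the cell currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_cell()
            self._close_row()
            self._row = []
        elif tag in ("td", "th"):
            self._close_cell()          # an unclosed previous cell ends here
            if self._row is None:       # cell outside any <tr>: start a row
                self._row = []
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_cell()
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:      # ignore text outside cells
            self._cell.append(data)

    def _close_cell(self):
        if self._cell is not None:
            # Join fragments (e.g. around <b>) and collapse whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _close_row(self):
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def close(self):
        super().close()                 # flush any buffered text first
        self._close_cell()
        self._close_row()


# Deliberately sloppy markup: unquoted attribute, unclosed <th>, <td>, <tr>.
page = """<table border=1>
<tr><th>Time<th>Event
<tr><td>09:00<td>Standup
<tr><td>14:00</td><td>Design <b>review</b></td></tr>
</table>"""

parser = TableCellCollector()
parser.feed(page)
parser.close()
print(parser.rows)
```

The trade-off is the one discussed in the thread: the callbacks tolerate sloppy markup, but the caller has to rebuild whatever structure it needs, here a plain list of rows.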

From mike at skew.org Thu Jun 24 21:09:54 2004
From: mike at skew.org (Mike Brown)
Date: Thu Jun 24 21:09:58 2004
Subject: [XML-SIG] Which DOM implementation?
In-Reply-To: <200406241739.14338.derekfountain@yahoo.co.uk> "from Derek Fountain at Jun 24, 2004 05:39:14 pm"
Message-ID: <200406250109.i5P19sXW014518@chilled.skew.org>

Derek Fountain wrote:
> Further, PyXML has another DOM package called 4DOM. That looks to be the most
> compliant of the lot according to the table. Was is donated to the PyXML
> project by FourThought?

Yes. It is entirely in the PyXML domain now. It is also quite slow. Some aspects of total conformance are hard to implement, and it is also coded to support Python 1.5.

Conformance is overrated, by the way, when what you're conforming to is partly JavaScript, Java & C-centric junk with no formal, mandatory levels of conformance defined (or even an explicit data model).

> Finally, 4Suite appears to have 3 DOM packages available, none of which
> appears to be especially compliant. I was under the impression that cDomlette
> was built with speed in mind. I'm not sure about pDOM and FtMD.

To clarify- The intent is for 4Suite to have just one Domlette: a faster, lighter, XPath-friendlier alternative to minidom, and that's basically what it has. DOM conformance was never a goal, although we do try where it makes sense. Where XPath and DOM conflict, XPath wins (e.g. namespace support is mandatory, lexical cruft like CDATA sections and unexpanded entity references aren't modeled, adjacent text nodes are automatically merged, attribute nodes encapsulate their values rather than having text node children, etc.). Where DOM L1 was clarified by L2 or L3, we go with the latest. Where DOM APIs are excessively Java-ish (e.g. hide as much data as possible and force people to use getters and setters), we prefer the Pythonic approach (e.g. just make it read-only if you have to, although Domlette nodes do essentially subclass xml.dom.Node).

Domlette was originally implemented in Python only, but for speed, a second implementation, written as mostly C extensions, was introduced.
As it became more stable, this C version became the default underlying implementation used by the Domlette APIs, but you could always force the use of the other version by setting an environment variable. Both implementations are supposed to be identical and transparent to you, although as the chart shows, there were some slight differences as of 4Suite 1.0a1. I think these have been resolved.

The two implementations have three different names. The Python version was called pDomlette through 4Suite 0.12.0a1. Thereafter, it has been called FtMiniDom. The C version was introduced in 4Suite 0.11.1 and has always been called cDomlette.

The plan is to drop FtMiniDom after the 1.0 release. This shouldn't matter to anyone since the APIs don't really expose which implementation is being used, and the ability to select one or the other was just a convenience for debugging and to ensure that Domlette would be usable for everyone while the C version was stabilizing.

See also:
http://4suite.org/docs/timeline.html
http://uche.ogbuji.net/tech/akara/nodes/2003-01-01/domlettes
http://uche.ogbuji.net/tech/akara/nodes/2004-06-19/033124

-Mike

From and-xml at doxdesk.com Thu Jun 24 22:10:31 2004
From: and-xml at doxdesk.com (Andrew Clover)
Date: Thu Jun 24 22:08:52 2004
Subject: [XML-SIG] Which DOM implementation?
In-Reply-To: <200406241739.14338.derekfountain@yahoo.co.uk>
References: <200406241739.14338.derekfountain@yahoo.co.uk>
Message-ID: <40DB8997.6050207@doxdesk.com>

Derek Fountain wrote:

> http://pyxml.sourceforge.net/topics/compliance.html
> is making things less clear instead of more so!

Sorry about that. It was compiled as a guide to what areas to avoid when using the Python DOMs, rather than a comparison table as such.

> The table suggests there are two minidom implementations: one in the Python
> package itself, and one in the PyXML package.

Sort of. They're the result of the same development process, though.
minidom is developed in PyXML, and a snapshot is copied into the Python tree every so often. The versions distributed with Python don't always seem to correspond with exactly one release of PyXML, so I grouped them separately.

> It looks like the PyXML version is a little more compliant - is that a
> fair assessment?

Only because the PyXML trunk is generally at a later stage of development than the Python branch. For example, the minidom for Python 2.3 was, IIRC, taken between the 0.8.2 and 0.8.3 PyXML versions, so its behaviour is very similar to the latest PyXML version.

> Was [4DOM] donated to the PyXML project by FourThought?

Yes.

> Finally, 4Suite appears to have 3 DOM packages available, none of which
> appears to be especially compliant. I was under the impression that cDomlette
> was built with speed in mind. I'm not sure about pDOM and FtMD.

pDomlette (or FtMiniDom in later versions) is built for compatibility with cDomlette, as a backup for when the C extension isn't available. It's not really an implementation you'd target in its own right.

> Which Python based DOM implementation is the best in terms of compliance to
> the W3C specification?

I would naturally plug my own. ;-)

(Speaking of which, pxdom 1.1 will be out this week. It's got external entities and everything. How exciting. If you like that kind of thing.)

--
Andrew Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/

From andrew at shearersoftware.com Fri Jun 25 00:35:49 2004
From: andrew at shearersoftware.com (Andrew Shearer)
Date: Fri Jun 25 00:35:55 2004
Subject: [XML-SIG] minidom w/ HTML
Message-ID: <2264C831-C661-11D8-8CA0-000393B3AC06@shearersoftware.com>

You could use Python's HTMLParser module[1] or my own HTMLFilter module[2]. Both present a SAX-like interface that calls back to your code as tags fly by, rather than the DOM approach of handing you a fully-formed, consistent data structure made from the document.
The DOM approach is complicated because of the non-well-formed nature of typical HTML, while the SAX-like interface is a more natural fit.

[1] http://docs.python.org/lib/module-HTMLParser.html
[2] http://www.shearersoftware.com/software/developers/htmlfilter/

> From: jennyw
>
> I have a project where I need to parse html files that are table heavy
> (a calendar, actually), and I thought minidom would be perfect for my
> needs. The problem is that the HTML that I'm trying to parse isn't quite
> valid XML -- mostly minor things, but enough so that minidom won't work.
> Is there a something that would convert an html file into XML that
> would work with minidom? Or is there something better, like something
> more geared towards html that I should be looking at?

--
Andrew Shearer
Senior Analyst, Medical Computing
IS Applications Group
Lifespan

From fredrik at pythonware.com Fri Jun 25 04:50:27 2004
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Fri Jun 25 04:51:15 2004
Subject: [XML-SIG] Re: minidom w/ HTML
References: <200406241100.24117.fdrake@acm.org>

Fred L. Drake wrote:

> > Is there a something that would convert an html file into XML that
> > would work with minidom? Or is there something better, like something
> > more geared towards html that I should be looking at?
>
> You could run the HTML through HTML Tidy before parsing it as XML. This could
> be done using the HTML Tidy command line, or I think someone has built a
> Python interface to Tidy.

some alternatives:

    http://effbot.org/zone/element-tidylib.htm
    (note that elementtree also allows you to use command-line versions of tidy to turn HTML into nice XHTML)

    http://www.egenix.com/files/python/mxTidy.html

    http://sourceforge.net/projects/utidylib

here's a short example:

    import urllib
    from elementtree.TidyTools import tidy

    def XHTML(tag):
        # prepend XHTML namespace
        return "{http://www.w3.org/1999/xhtml}" + tag

    # grab a page and store it in a temporary file
    file, message = urllib.urlretrieve("http://www.python.org")

    # parse the page using the tidy command
    page = tidy(file)

    # find all images on this page
    for image in page.findall(".//" + XHTML("img")):
        print image.get("src")

for more information on element trees, see:

    http://effbot.org/zone/element-index.htm

From asc at vineyard.net Fri Jun 25 10:45:17 2004
From: asc at vineyard.net (Aaron Straup Cope)
Date: Fri Jun 25 10:44:21 2004
Subject: [XML-SIG] [XBEL] XML::XBEL.pm
Message-ID: <1088174717.504.134.camel@localhost>

FYI : http://search.cpan.org/dist/XML-XBEL

Cheers,
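XBEL (the XML Bookmark Exchange Language) documents are ordinary XML, so Python code can read them with the ElementTree API discussed elsewhere in this archive. The sketch below assumes the XBEL 1.0 layout without a namespace: an `xbel` root holding nested `folder` elements (each with an optional `title` child) and `bookmark` elements that carry an `href` attribute and a `title` child. The sample document and the `walk_bookmarks` helper are invented for illustration, and the code targets Python 3's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Made-up XBEL 1.0 document: folders nest, bookmarks carry an href.
SAMPLE = """<xbel version="1.0">
  <folder>
    <title>Python</title>
    <bookmark href="http://www.python.org/"><title>Python home</title></bookmark>
    <folder>
      <title>XML</title>
      <bookmark href="http://effbot.org/zone/element.htm"><title>ElementTree</title></bookmark>
    </folder>
  </folder>
  <bookmark href="http://search.cpan.org/dist/XML-XBEL"><title>XML::XBEL</title></bookmark>
</xbel>"""


def walk_bookmarks(node, path=()):
    """Yield (folder path, title, href) for every bookmark below `node`."""
    for child in node:
        if child.tag == "folder":
            # Descend into the folder, remembering its title in the path.
            name = child.findtext("title", default="")
            yield from walk_bookmarks(child, path + (name,))
        elif child.tag == "bookmark":
            yield path, child.findtext("title", default=""), child.get("href")
        # Other children (title, info, desc, separator, alias) are skipped.


root = ET.fromstring(SAMPLE)
bookmarks = list(walk_bookmarks(root))
for folders, title, href in bookmarks:
    print("/".join(folders) or "(top level)", "|", title, "|", href)
```

Recursion mirrors the folder nesting directly; a very deep bookmark tree would call for an explicit stack instead.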

From hostetlerm at gmail.com Mon Jun 28 14:54:41 2004
From: hostetlerm at gmail.com (Mike Hostetler)
Date: Mon Jun 28 14:54:49 2004
Subject: [XML-SIG] minidom w/ HTML

On Mon, 21 Jun 2004 12:25:59 -0700, jennyw wrote:
>
> I have a project where I need to parse html files that are table heavy
> (a calendar, actually), and I thought minidom would be perfect for my
> needs. The problem is that the HTML that I'm trying to parse isn't quite
> valid XML -- mostly minor things, but enough so that minidom won't work.
> Is there a something that would convert an html file into XML that
> would work with minidom? Or is there something better, like something
> more geared towards html that I should be looking at?
>

I've recently discovered BeautifulSoup, and it works wonderfully for parsing HTML:

http://www.crummy.com/software/BeautifulSoup/

I've done the "run through Tidy and then use minidom" approach before. It works fine, except that it can be quite slow, especially if the HTML isn't anything that resembles XHTML.

--
mikeh
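Once Tidy (or another cleaner) has produced well-formed XHTML, the minidom half of the "Tidy, then minidom" approach is straightforward. The sketch below is written for Python 3's `xml.dom.minidom` and assumes that the cleaned markup is already in a string and declares the XHTML default namespace, as Tidy's XHTML output typically does. The calendar layout (one outer cell per day, each holding an inner table of time/event rows) is a hypothetical reconstruction of the one jennyw describes, and the helpers `text_of` and `child_elements` are illustrative names.

```python
from xml.dom import minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical tidied output: one outer cell per day, each holding an
# inner table whose rows are (time, event name) pairs.
TIDIED = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<table><tr>
  <td><b>Monday</b>
    <table><tr><td>09:00</td><td>Standup</td></tr>
           <tr><td>13:00</td><td>Lunch &amp; learn</td></tr></table></td>
  <td><b>Tuesday</b>
    <table><tr><td>14:00</td><td>Review</td></tr></table></td>
</tr></table>
</body></html>"""


def text_of(node):
    """Concatenate all text below `node`, collapsing whitespace."""
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            parts.append(child.data)
        else:
            parts.append(text_of(child))
    return " ".join("".join(parts).split())


def child_elements(node, local_name):
    """Direct element children of `node` with the given XHTML local name."""
    return [c for c in node.childNodes
            if c.nodeType == c.ELEMENT_NODE
            and c.namespaceURI == XHTML_NS and c.localName == local_name]


dom = minidom.parseString(TIDIED)
outer = dom.getElementsByTagNameNS(XHTML_NS, "table")[0]  # first in document order
schedule = {}
for day_cell in child_elements(child_elements(outer, "tr")[0], "td"):
    day = text_of(child_elements(day_cell, "b")[0])
    inner = child_elements(day_cell, "table")[0]
    schedule[day] = [tuple(text_of(td) for td in child_elements(tr, "td"))
                     for tr in child_elements(inner, "tr")]
dom.unlink()  # break minidom's parent/child reference cycles
print(schedule)
```

Walking direct children, rather than calling `getElementsByTagNameNS` on each cell, keeps the inner and outer tables apart; the price is that the code is tied to one particular table layout.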

From walter at livinglogic.de Wed Jun 30 15:32:52 2004
From: walter at livinglogic.de (Walter Dörwald)
Date: Wed Jun 30 15:32:57 2004
Subject: [XML-SIG] ANN: XIST 2.5
Message-ID: <40E31564.1030004@livinglogic.de>

XIST 2.5 has been released!

What is it?
===========

XIST is an XML-based extensible HTML generator written in Python. XIST is also a DOM parser (built on top of SAX2) with a very simple and Pythonesque tree API. Every XML element type corresponds to a Python class, and these Python classes provide a conversion method to transform the XML tree (e.g., into HTML). XIST can be considered "object oriented XSL".

What's new in version 2.5?
==========================

* Specifying content models for elements has seen major enhancements. The boolean class attribute empty has been replaced by an object model whose checkvalid method will be called for validating the element content.
* A new module ll.xist.sims has been added that provides a simple schema validation. Schema violations will be reported via Python's warning framework.
* All namespace modules have been updated to use sims information. The SVG module has been updated to SVG 1.1. The docbook module has been updated to DocBook 4.3.
* It's possible to switch off validation during parsing and publishing.
* Experimental support for Holger Krekel's XPython has been added.
* Creating global attributes has been simplified. Passing an instance of ll.xist.xsc.Namespace.Attrs to an Element constructor now does the right thing:
* ll.xist.xsc.CharRef now inherits from ll.xist.xsc.Text too, so you don't have to special case CharRefs any more. When publishing, CharRefs will be handled like Text nodes.
* ll.xist.ns.meta.contenttype now has an attribute mimetype (defaulting to "text/html") for specifying the MIME type.
* ll.xist.ns.htmlspecials.caps has been removed.
* Registering elements in namespace classes has been rewritten to use a cache now.
* Pretty printing has been changed: Whitespace will only be added now if there are no text nodes in element content.
* Two mailing lists are now available: One for discussion about XIST and one for XIST announcements.

For changes in older versions see:
http://www.livinglogic.de/Python/xist/History.html

Where can I get it?
===================

XIST can be downloaded from http://ftp.livinglogic.de/xist/ or ftp://ftp.livinglogic.de/pub/livinglogic/xist/

Web pages are at http://www.livinglogic.de/Python/xist/

ViewCVS access is available at http://www.livinglogic.de/viewcvs/

For information about the mailing lists go to http://www.livinglogic.de/Python/xist/Mailinglists.html

Bye,
Walter Dörwald

From brian at sweetapp.com Wed Jun 30 16:01:37 2004
From: brian at sweetapp.com (Brian Quinlan)
Date: Wed Jun 30 15:57:51 2004
Subject: [XML-SIG] ANN: Pyana 0.9.1 Released
Message-ID: <40E31C21.4080504@sweetapp.com>

ANN: Pyana 0.9.1 Released

You can find it here:
http://sourceforge.net/project/showfiles.php?group_id=28142

Changes:
- Fixes a bug in Pyana 0.9.0 where repeated warning messages could cause a crash

What is Pyana?

Pyana is a Python interface to the Xalan-C XSLT processor. It provides a simple and safe API for doing XSLT transformations from Python but with the performance of a C processor.

For example:

import Pyana

source_url = 'http://pyana.sourceforge.net/examples/helloworld.xml'
style_url = 'http://pyana.sourceforge.net/examples/helloworld.xsl'

print Pyana.transform2String(
    source=Pyana.URI(source_url),
    style=Pyana.URI(style_url))

Some more complex examples are provided here:
http://pyana.sourceforge.net/examples/

Cheers,
Brian