From fredrik at pythonware.com  Wed Jun 16 14:02:02 2004
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Thu Jun 17 07:10:11 2004
Subject: [XML-SIG] ANN: ElementTree 1.2 release candidate 1
Message-ID: <caq1uu$sc2$1@sea.gmane.org>

The Element type is a simple but flexible container object, designed
to store hierarchical data structures, such as simplified XML infosets,
in memory.  The ElementTree package provides a Python implementation
of this type, plus code to serialize element trees to and from XML files.

The 1.2 release adds limited support for XPath and XInclude, and also
fixes a number of serialization bugs, mostly related to extensive use of
namespaces and unicode in tags and attribute names.  For a complete
list of changes, see the CHANGES document in the source kit.

You can get the ElementTree toolkit from:

    http://effbot.org/downloads

Brief documentation and some code samples (including an XML-RPC
unmarshaller in 16 lines) are available from:

    http://effbot.org/zone/element.htm

enjoy /F


From fredrik at pythonware.com  Fri Jun 18 12:48:55 2004
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Fri Jun 18 14:49:28 2004
Subject: [XML-SIG] ANN: ElementTree 1.2 final (june 18, 2004)
References: <caq1uu$sc2$1@sea.gmane.org>
Message-ID: <cav6dt$inr$1@sea.gmane.org>

The Element type is a simple but flexible container object, designed
to store hierarchical data structures, such as simplified XML infosets,
in memory.  The ElementTree package provides a Python implementation
of this type, plus code to serialize element trees to and from XML files.

The 1.2 release adds limited support for XPath and XInclude, and also
fixes a number of serialization bugs, mostly related to extensive use of
namespaces and unicode in tags and attribute names.  For a complete
list of changes, see the CHANGES document in the source kit.

You can get the ElementTree toolkit from:

    http://effbot.org/downloads

Documentation, articles, and some code samples (including an XML-RPC
unmarshaller in 16 lines) are available from:

    http://effbot.org/zone/element.htm

enjoy /F
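
To make the container idea concrete, here is a minimal sketch. It assumes the xml.etree.ElementTree module that ships with Python 2.5 and later (the standard-library descendant of this package; the 1.2 kit itself is imported as elementtree.ElementTree). The calendar/day/event names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Build a small tree in memory; tag and attribute names are hypothetical.
root = ET.Element("calendar")
day = ET.SubElement(root, "day", name="Mon")
event = ET.SubElement(day, "event", time="9:00")
event.text = "Standup"

# Serialize to XML text, then parse that text back into a new tree.
xml_text = ET.tostring(root, encoding="unicode")
copy = ET.fromstring(xml_text)

# The limited XPath support covers simple paths such as these.
titles = [e.text for e in copy.findall(".//event")]
times = [e.get("time") for e in copy.findall("day/event")]
print(titles)
print(times)
```

Elements behave like lists of their children and carry a dictionary of attributes, which is why the serialization round trip above needs no extra bookkeeping.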


From list-matt at reprocessed.org  Fri Jun 18 10:24:42 2004
From: list-matt at reprocessed.org (Matthew Patterson)
Date: Mon Jun 21 00:30:54 2004
Subject: [XML-SIG] double-encoding XSL parameters in Python with libxslt
Message-ID: <3D359A44-C133-11D8-B80B-000393CBB978@reprocessed.org>

Hello,

I've got an annoying problem using Gnome libxslt's Python bindings.

I'm passing in a global parameter (a string), which needs to be 
enclosed in quotes. I can't guarantee that the string won't contain 
more quotes, so to ensure that I don't terminate my quoted-string 
parameter early I'm encoding any single quotes as &apos; before I pass 
in the string.

libxslt is encoding my already encoded string again, so 'hello here's a 
parameter' gets encoded to 'hello here&apos;s a parameter' by me, and 
then to 'hello here&amp;apos;s a parameter' by libxslt.

If I just pass in 'hello here's a parameter' then libxslt complains 
about terminating the string early...

Is there any way I can avoid this?

Thanks,

Matt

-- 
   Matt Patterson | Design & Code
   <matt at emdash co uk> | http://www.emdash.co.uk/
   <matt at reprocessed org> | http://www.reprocessed.org/


From brian at sweetapp.com  Sun Jun 20 13:10:45 2004
From: brian at sweetapp.com (Brian Quinlan)
Date: Mon Jun 21 01:07:47 2004
Subject: [XML-SIG] ANN: Pyana 0.9.0 Released
Message-ID: <0d1901c456e9$8747b9d0$d445a8c0@dell8200>

ANN: Pyana 0.9.0 Released

You can find it here:
http://sourceforge.net/project/showfiles.php?group_id=28142

Changes:

- Updated for Xalan 1.8/Xerces 2.5
- Added basic support for tracing (see examples)
- Removed transform to DOM support (will devise a better system in a future
  release)

What is Pyana?

Pyana is a Python interface to the Xalan-C XSLT processor. It provides a
simple and safe API for doing XSLT transformations from Python but with the
performance of a C processor. For example:

import Pyana
print Pyana.transform2String( 
 source=Pyana.URI('http://pyana.sourceforge.net/examples/helloworld.xml'),
 style=Pyana.URI('http://pyana.sourceforge.net/examples/helloworld.xsl'))

Some more complex examples are provided here:
http://pyana.sourceforge.net/examples/

Cheers,
Brian 


From jennyw at colorfulexpressions.com  Mon Jun 21 15:25:59 2004
From: jennyw at colorfulexpressions.com (jennyw)
Date: Mon Jun 21 16:57:17 2004
Subject: [XML-SIG] minidom w/ HTML
Message-ID: <cb7co8$2cb$1@sea.gmane.org>

I have a project where I need to parse html files that are table heavy 
(a calendar, actually), and I thought minidom would be perfect for my 
needs. The problem is that the HTML that I'm trying to parse isn't quite 
valid XML -- mostly minor things, but enough so that minidom won't work. 
  Is there something that would convert an html file into XML that 
would work with minidom? Or is there something better, like something 
more geared towards html that I should be looking at?

The reason I thought of minidom is because I want to easily be able to 
navigate through table cells. Basically, it's a weekly calendar, and 
there's a table that has cells for each day. Inside each day cell, there 
are cells for time and for the name of the event. There are other ways 
to do this, but I'd like to learn more about parsing XML documents and 
thought this would be a good way to accomplish my immediate needs and learn 
something new.

Thanks!

Jen


From tpassin at comcast.net  Wed Jun 23 19:09:07 2004
From: tpassin at comcast.net (Thomas B. Passin)
Date: Wed Jun 23 19:06:17 2004
Subject: [XML-SIG] double-encoding XSL parameters in Python with libxslt
In-Reply-To: <3D359A44-C133-11D8-B80B-000393CBB978@reprocessed.org>
References: <3D359A44-C133-11D8-B80B-000393CBB978@reprocessed.org>
Message-ID: <40DA0D93.4010603@comcast.net>

Matthew Patterson wrote:
> 
> I've got an annoying problem using Gnome libxslt's Python bindings.
> 
> I'm passing in a global parameter (a string), which needs to be enclosed 
> in quotes. I can't guarantee that the string won't contain more quotes, 
> so to ensure that I don't terminate my quoted-string parameter early I'm 
> encoding any single quotes as &apos; before I pass in the string.
> 
> libxslt is encoding my already encoded string again, so 'hello here's a 
> parameter' gets encoded to 'hello here&apos;s a parameter' by me, and 
> then to 'hello here&amp;apos;s a parameter' by libxslt.
> 
> If I just pass in 'hello here's a parameter' then libxslt complains 
> about terminating the string early...
> 
> Is there any way I can avoid this?

It's presumably Python or C that's doing the escaping, so escape the 
quotes and apostrophes with backslashes.  I haven't tried it with 
libxslt, but I bet it will work.

Cheers,

Tom P

-- 
Thomas B. Passin
Explorer's Guide to the Semantic Web (Manning Books)
http://www.manning.com/catalog/view.php?book=passin
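
One thing to keep in mind: XPath 1.0 string literals have no escape character, so a backslash will not protect a quote inside one. Below is a minimal sketch of the usual workaround, assuming the bindings evaluate each parameter value as an XPath expression (not verified against libxslt here). It picks whichever quote character the text does not contain and falls back to concat() when the text contains both. The helper name xpath_string_literal is hypothetical.

```python
def xpath_string_literal(text):
    """Return an XPath 1.0 expression that evaluates to `text`."""
    if "'" not in text:
        return "'" + text + "'"
    if '"' not in text:
        return '"' + text + '"'
    # Both quote characters occur: glue single-quoted pieces together,
    # with each apostrophe supplied as a double-quoted literal.
    pieces = []
    for index, part in enumerate(text.split("'")):
        if index:
            pieces.append('"\'"')
        if part:
            pieces.append("'" + part + "'")
    return "concat(" + ", ".join(pieces) + ")"


print(xpath_string_literal("hello here's a parameter"))
print(xpath_string_literal('say "hi", it\'s me'))
```

With this approach the raw string is never entity-encoded before it is handed over, so there is nothing for a later stage to encode a second time.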

From derekfountain at yahoo.co.uk  Thu Jun 24 05:39:14 2004
From: derekfountain at yahoo.co.uk (Derek Fountain)
Date: Thu Jun 24 05:38:46 2004
Subject: [XML-SIG] Which DOM implementation?
Message-ID: <200406241739.14338.derekfountain@yahoo.co.uk>

Which Python based DOM implementation is the best in terms of compliance to 
the W3C specification? I'm looking to work with DOM in an educational 
scenario, and looking at the table on this page:

http://pyxml.sourceforge.net/topics/compliance.html

is making things less clear instead of more so!

The table suggests there are two minidom implementations: one in the Python 
package itself, and one in the PyXML package. It looks like the PyXML one is 
a little more compliant - is that a fair assessment?

Further, PyXML has another DOM package called 4DOM. That looks to be the most 
compliant of the lot according to the table. Was it donated to the PyXML 
project by FourThought? Bits of the documentation (not to mention the name) 
suggest that's its heritage.

Finally, 4Suite appears to have 3 DOM packages available, none of which 
appears to be especially compliant. I was under the impression that cDomlette 
was built with speed in mind. I'm not sure about pDOM and FtMD.

-- 
> eatapple
core dump

From fdrake at acm.org  Thu Jun 24 11:00:23 2004
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu Jun 24 11:00:35 2004
Subject: [XML-SIG] minidom w/ HTML
In-Reply-To: <cb7co8$2cb$1@sea.gmane.org>
References: <cb7co8$2cb$1@sea.gmane.org>
Message-ID: <200406241100.24117.fdrake@acm.org>

On Monday 21 June 2004 03:25 pm, jennyw wrote:
 > I have a project where I need to parse html files that are table heavy
 > (a calendar, actually), and I thought minidom would be perfect for my
 > needs. The problem is that the HTML that I'm trying to parse isn't quite
 > valid XML -- mostly minor things, but enough so that minidom won't work.

I wouldn't generally expect HTML to be parsable as XML, only XHTML.

 >   Is there something that would convert an html file into XML that
 > would work with minidom? Or is there something better, like something
 > more geared towards html that I should be looking at?

You could run the HTML through HTML Tidy before parsing it as XML.  This could 
be done using the HTML Tidy command line, or I think someone has built a 
Python interface to Tidy.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Zope Corporation
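
Once the markup is well-formed, walking the table with minidom could look roughly like the sketch below. It assumes the HTML has already been converted (for example by HTML Tidy); the sample markup, with a day label in a <b> element and a nested table of events, is invented to match the description of the calendar, and the helper names are hypothetical.

```python
from xml.dom import minidom

# Hypothetical, already well-formed markup: one cell per day, each holding
# a nested table with a time cell and an event-name cell per row.
SAMPLE = """<table><tr>
  <td><b>Mon</b><table><tr><td>9:00</td><td>Standup</td></tr></table></td>
  <td><b>Tue</b><table><tr><td>14:00</td><td>Review</td></tr>
                       <tr><td>16:00</td><td>Demo</td></tr></table></td>
</tr></table>"""


def children(node, name):
    """Direct child elements of `node` with the given tag name."""
    return [c for c in node.childNodes
            if c.nodeType == c.ELEMENT_NODE and c.tagName == name]


def text_of(node):
    """Concatenated text directly inside `node`, stripped."""
    return "".join(c.data for c in node.childNodes
                   if c.nodeType == c.TEXT_NODE).strip()


doc = minidom.parseString(SAMPLE)
week_row = children(doc.documentElement, "tr")[0]

calendar = {}
for day_cell in children(week_row, "td"):
    day = text_of(children(day_cell, "b")[0])
    events = []
    for inner in children(day_cell, "table"):
        for row in children(inner, "tr"):
            time_cell, name_cell = children(row, "td")
            events.append((text_of(time_cell), text_of(name_cell)))
    calendar[day] = events

print(calendar)
```

Using direct children rather than getElementsByTagName keeps the cells of the nested tables from being mixed up with the outer day cells.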


From cbearden at hal-pc.org  Thu Jun 24 10:49:07 2004
From: cbearden at hal-pc.org (Chuck Bearden)
Date: Thu Jun 24 11:06:39 2004
Subject: [XML-SIG] minidom w/ HTML
In-Reply-To: <cb7co8$2cb$1@sea.gmane.org>
References: <cb7co8$2cb$1@sea.gmane.org>
Message-ID: <20040624144907.GA842@hal-pc.org>

On Mon, Jun 21, 2004 at 12:25:59PM -0700, jennyw wrote:
> I have a project where I need to parse html files that are table heavy 
> (a calendar, actually), and I thought minidom would be perfect for my 
> needs. The problem is that the HTML that I'm trying to parse isn't quite 
> valid XML -- mostly minor things, but enough so that minidom won't work. 
>  Is there something that would convert an html file into XML that 
> would work with minidom? Or is there something better, like something 
> more geared towards html that I should be looking at?
> 
> The reason I thought of minidom is because I want to easily be able to 
> navigate through table cells. Basically, it's a weekly calendar, and 
> there's a table that has cells for each day. Inside each day cell, there 
> are cells for time and for the name of the event. There are other ways 
> to do this, but I'd like to learn more about parsing XML documents and 
> thought this would be a good way to accomplish my immediate needs and learn 
> something new.

I have used a combination of one of the Python tidy implementations
together with the microdom[1] from the Twisted framework[2].  When
creating a Twisted microdom, the 'parseString' method takes an optional
argument 'beExtremelyLenient', which does just what it says.  Some HTML
has flaws so serious (e.g. unbalanced quotes in attribute values) that
these must be corrected before tidying.  You can imagine a three-step
process:

  (1) ad hoc fixing of HTML problems, if necessary;
  (2) creating "tidied" version of HTML doc;
  (3) creating extremely lenient twisted.web.microdom object.


Itamar Shtull-Trauring has an introductory article[3] on the Twisted
microdom at O'Reilly's XML.com.

Hope this helps,
Chuck

[1]
http://twistedmatrix.com/documents/current/api/twisted.web.microdom.html
[2] http://twistedmatrix.com/products/twisted
[3] http://www.xml.com/pub/a/2003/10/15/microdom.html


From mike at skew.org  Thu Jun 24 21:09:54 2004
From: mike at skew.org (Mike Brown)
Date: Thu Jun 24 21:09:58 2004
Subject: [XML-SIG] Which DOM implementation?
In-Reply-To: <200406241739.14338.derekfountain@yahoo.co.uk> "from Derek
	Fountain at Jun 24, 2004 05:39:14 pm"
Message-ID: <200406250109.i5P19sXW014518@chilled.skew.org>

Derek Fountain wrote:
> Further, PyXML has another DOM package called 4DOM. That looks to be the most 
> compliant of the lot according to the table. Was it donated to the PyXML 
> project by FourThought?

Yes. It is entirely in the PyXML domain now. It is also quite slow.
Some aspects of total conformance are hard to implement, and it is
also coded to support Python 1.5.

Conformance is overrated, by the way, when what you're conforming to is partly 
JavaScript, Java & C-centric junk with no formal, mandatory levels of 
conformance defined (or even an explicit data model).

> Finally, 4Suite appears to have 3 DOM packages available, none of which 
> appears to be especially compliant. I was under the impression that cDomlette 
> was built with speed in mind. I'm not sure about pDOM and FtMD.

To clarify-

The intent is for 4Suite to have just one Domlette: a faster, lighter, 
XPath-friendlier alternative to minidom, and that's basically what it has.

DOM conformance was never a goal, although we do try where it makes sense. 
Where XPath and DOM conflict, XPath wins (e.g. namespace support is mandatory, 
lexical cruft like CDATA sections and unexpanded entity references aren't 
modeled, adjacent text nodes are automatically merged, attribute nodes 
encapsulate their values rather than having text node children, etc.). Where 
DOM L1 was clarified by L2 or L3, we go with the latest. Where DOM APIs are 
excessively Java-ish (e.g. hide as much data as possible and force people to 
use getters and setters), we prefer the Pythonic approach (e.g. just make it 
read-only if you have to, although Domlette nodes do essentially subclass 
xml.dom.Node).

Domlette was originally implemented in Python only, but for speed, a second 
implementation, written as mostly C extensions, was introduced. As it became 
more stable, this C version became the default underlying implementation used 
by the Domlette APIs, but you could always force the use of the other version 
by setting an environment variable. Both implementations are supposed to be 
identical and transparent to you, although as the chart shows, there were some 
slight differences as of 4Suite 1.0a1. I think these have been resolved.

The two implementations have three different names. The Python version was 
called pDomlette through 4Suite 0.12.0a1. Thereafter, it has been called 
FtMiniDom. The C version was introduced in 4Suite 0.11.1 and has always been 
called cDomlette.

The plan is to drop FtMiniDom after the 1.0 release. This shouldn't matter to 
anyone since the APIs don't really expose which implementation is being used, 
and the ability to select one or the other was just a convenience for 
debugging and to ensure that Domlette would be usable for everyone while the C 
version was stabilizing.

See also:

http://4suite.org/docs/timeline.html
http://uche.ogbuji.net/tech/akara/nodes/2003-01-01/domlettes
http://uche.ogbuji.net/tech/akara/nodes/2004-06-19/033124

-Mike
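
The point about adjacent text nodes can be seen with the standard library's minidom, which follows the DOM model: two text nodes appended one after another stay separate until normalize() is called, whereas an XPath-style model never exposes the split. A small sketch using only xml.dom.minidom (nothing here touches 4Suite itself):

```python
from xml.dom import minidom

doc = minidom.parseString("<p/>")
p = doc.documentElement

# Appending two text nodes leaves them as separate children in the DOM.
p.appendChild(doc.createTextNode("Hello, "))
p.appendChild(doc.createTextNode("world"))
before = len(p.childNodes)

# normalize() merges adjacent text nodes into a single node.
p.normalize()
after = len(p.childNodes)

print(before)
print(after)
print(p.firstChild.data)
```

Code written against XPath semantics generally wants the merged form, which is one reason a DOM-like library aimed at XPath might merge eagerly.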

From and-xml at doxdesk.com  Thu Jun 24 22:10:31 2004
From: and-xml at doxdesk.com (Andrew Clover)
Date: Thu Jun 24 22:08:52 2004
Subject: [XML-SIG] Which DOM implementation?
In-Reply-To: <200406241739.14338.derekfountain@yahoo.co.uk>
References: <200406241739.14338.derekfountain@yahoo.co.uk>
Message-ID: <40DB8997.6050207@doxdesk.com>

Derek Fountain <derekfountain@yahoo.co.uk> wrote:

> http://pyxml.sourceforge.net/topics/compliance.html

> is making things less clear instead of more so!

Sorry about that. It was compiled as a guide to what areas to avoid when 
using the Python DOMs, rather than a comparison table as such.

> The table suggests there are two minidom implementations: one in the Python 
> package itself, and one in the PyXML package.

Sort of. They're the result of the same development process though. minidom 
is developed in PyXML, and a snapshot is copied into the Python tree 
every so often. The versions distributed with Python don't always seem 
to correspond with exactly one release of PyXML, so I grouped them 
separately.

> It looks like the PyXML version is a little more compliant - is that a
> fair assessment?

Only because the PyXML trunk is generally at a later stage of 
development than the Python branch. For example, the minidom for Python 
2.3 was, IIRC, taken between the 0.8.2 and 0.8.3 PyXML versions, so its 
behaviour is very similar to the latest PyXML version.

> Was [4DOM] donated to the PyXML project by FourThought?

Yes.

> Finally, 4Suite appears to have 3 DOM packages available, none of which 
> appears to be especially compliant. I was under the impression that cDomlette 
> was built with speed in mind. I'm not sure about pDOM and FtMD.

pDomlette (or FtMiniDom in later versions) is built for compatibility 
with cDomlette, as a backup for when the C extension isn't available. 
It's not really an implementation you'd target in its own right.

> Which Python based DOM implementation is the best in terms of compliance to
> the W3C specification?

I would naturally plug my own. ;-)

(Speaking of which, pxdom 1.1 will be out this week. It's got external 
entities and everything. How exciting. If you like that kind of thing.)

-- 
Andrew Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/
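
A DOM implementation can also be asked what it claims to support. Here is a minimal sketch using the standard library's xml.dom.getDOMImplementation() and the DOM hasFeature() call, with minidom requested by name. This only reports what the implementation advertises, which is not the same as tested conformance. The list of features probed is an arbitrary choice for illustration.

```python
from xml.dom import getDOMImplementation

# Ask for the standard library's minidom implementation by name.
impl = getDOMImplementation("minidom")

# Probe a few (feature, version) pairs; the selection is arbitrary.
probes = [("Core", "1.0"), ("Core", "2.0"), ("XML", "2.0"), ("Events", "2.0")]
claims = {}
for feature, version in probes:
    claims[(feature, version)] = bool(impl.hasFeature(feature, version))

for key in sorted(claims):
    print("%s %s: %s" % (key[0], key[1], claims[key]))

# The same object is the standard way to create an empty document.
doc = impl.createDocument(None, "root", None)
print(doc.documentElement.tagName)
```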

From andrew at shearersoftware.com  Fri Jun 25 00:35:49 2004
From: andrew at shearersoftware.com (Andrew Shearer)
Date: Fri Jun 25 00:35:55 2004
Subject: [XML-SIG] minidom w/ HTML
Message-ID: <2264C831-C661-11D8-8CA0-000393B3AC06@shearersoftware.com>

You could use Python's HTMLParser module[1] or my own HTMLFilter 
module[2]. Both present a SAX-like interface that calls back to your 
code as tags fly by, rather than the DOM approach of handing you a 
fully-formed, consistent data structure made from the document.

The DOM approach is complicated because of the non-well-formed nature 
of typical HTML, while the SAX-like interface is a more natural fit.

[1] http://docs.python.org/lib/module-HTMLParser.html
[2] http://www.shearersoftware.com/software/developers/htmlfilter/
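
As a rough illustration of the callback style, here is a sketch using the standard library parser (named html.parser in Python 3; the module was called HTMLParser in Python 2). The CellCollector class and the sample markup, which deliberately leaves its <td> and <tr> tags unclosed, are made up for this example.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of table cells, row by row, as tags fly by."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
            self._in_cell = False
        elif tag in ("td", "th"):
            if not self.rows:
                self.rows.append([])
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr", "table"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


# Not well-formed XML: the cell and row tags are never closed.
SAMPLE = """<table>
<tr><td>Mon<td>9:00 Standup
<tr><td>Tue<td>14:00 Review
</table>"""

parser = CellCollector()
parser.feed(SAMPLE)
parser.close()

rows = [[cell.strip() for cell in row] for row in parser.rows]
print(rows)
```

Because the handler only reacts to start tags and character data, the missing end tags do not matter here. This simple version does not try to handle nested tables.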

> From: jennyw <jennyw@colorfulexpressions.com>
> Message-ID: <cb7co8$2cb$1@sea.gmane.org>
>
> I have a project where I need to parse html files that are table heavy
> (a calendar, actually), and I thought minidom would be perfect for my
> needs. The problem is that the HTML that I'm trying to parse isn't 
> quite
> valid XML -- mostly minor things, but enough so that minidom won't 
> work.
>   Is there something that would convert an html file into XML that
> would work with minidom? Or is there something better, like something
> more geared towards html that I should be looking at?

--
Andrew Shearer
Senior Analyst, Medical Computing
IS Applications Group
Lifespan


From fredrik at pythonware.com  Fri Jun 25 04:50:27 2004
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Fri Jun 25 04:51:15 2004
Subject: [XML-SIG] Re: minidom w/ HTML
References: <cb7co8$2cb$1@sea.gmane.org> <200406241100.24117.fdrake@acm.org>
Message-ID: <cbgp1t$9kn$1@sea.gmane.org>

Fred L. Drake wrote:

> > Is there something that would convert an html file into XML that
> > would work with minidom? Or is there something better, like something
> > more geared towards html that I should be looking at?
>
> You could run the HTML through HTML Tidy before parsing it as XML.  This could
> be done using the HTML Tidy command line, or I think someone has built a
> Python interface to Tidy.

some alternatives:

    http://effbot.org/zone/element-tidylib.htm
    (note that elementtree also allows you to use command-line
    versions of tidy to turn HTML into nice XHTML)

    http://www.egenix.com/files/python/mxTidy.html

    http://sourceforge.net/projects/utidylib

here's a short example:

    import urllib
    from elementtree.TidyTools import tidy

    def XHTML(tag): # prepend XHTML namespace
        return "{http://www.w3.org/1999/xhtml}" + tag

    # grab a page and store it in a temporary file
    file, message = urllib.urlretrieve("http://www.python.org")

    # parse the page using the tidy command
    page = tidy(file)

    # find all images on this page
    for image in page.findall(".//" + XHTML("img")):
        print image.get("src")

for more information on element trees, see:

    http://effbot.org/zone/element-index.htm

</F>
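
The same findall pattern carries over to table cells. The sketch below assumes the page has already been turned into XHTML, so it parses a small hand-written string instead of calling tidy. It uses xml.etree.ElementTree, the standard-library version of the package in Python 2.5 and later; the sample rows are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical tidied output: well-formed XHTML in the XHTML namespace.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body><table>
<tr><td>Mon</td><td>9:00</td><td>Standup</td></tr>
<tr><td>Tue</td><td>14:00</td><td>Review</td></tr>
</table></body></html>"""


def XHTML(tag):  # prepend XHTML namespace
    return "{http://www.w3.org/1999/xhtml}" + tag


page = ET.fromstring(SAMPLE)

# One tuple per table row, built from the text of its direct <td> children.
events = []
for row in page.findall(".//" + XHTML("tr")):
    cells = [(cell.text or "").strip() for cell in row.findall(XHTML("td"))]
    events.append(tuple(cells))

print(events)
```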


From asc at vineyard.net  Fri Jun 25 10:45:17 2004
From: asc at vineyard.net (Aaron Straup Cope)
Date: Fri Jun 25 10:44:21 2004
Subject: [XML-SIG] [XBEL] XML::XBEL.pm
Message-ID: <1088174717.504.134.camel@localhost>

FYI : 

 http://search.cpan.org/dist/XML-XBEL

Cheers,


From hostetlerm at gmail.com  Mon Jun 28 14:54:41 2004
From: hostetlerm at gmail.com (Mike Hostetler)
Date: Mon Jun 28 14:54:49 2004
Subject: [XML-SIG] minidom w/ HTML
In-Reply-To: <cb7co8$2cb$1@sea.gmane.org>
References: <cb7co8$2cb$1@sea.gmane.org>
Message-ID: <c60e627c040628115441893c4b@mail.gmail.com>

On Mon, 21 Jun 2004 12:25:59 -0700, jennyw
<jennyw@colorfulexpressions.com> wrote:
> 
> I have a project where I need to parse html files that are table heavy
> (a calendar, actually), and I thought minidom would be perfect for my
> needs. The problem is that the HTML that I'm trying to parse isn't quite
> valid XML -- mostly minor things, but enough so that minidom won't work.
>   Is there something that would convert an html file into XML that
> would work with minidom? Or is there something better, like something
> more geared towards html that I should be looking at?
> 

I've recently discovered BeautifulSoup, and it works wonderfully for
parsing HTML:

http://www.crummy.com/software/BeautifulSoup/

I've done the "run through Tidy and then use minidom" approach before.
 It works fine, except that it can be quite slow, especially if the
HTML isn't anything that resembles XHTML.

-- mikeh

From walter at livinglogic.de  Wed Jun 30 15:32:52 2004
From: walter at livinglogic.de (=?ISO-8859-15?Q?Walter_D=F6rwald?=)
Date: Wed Jun 30 15:32:57 2004
Subject: [XML-SIG] ANN: XIST 2.5
Message-ID: <40E31564.1030004@livinglogic.de>

XIST 2.5 has been released!


What is it?
===========

XIST is an XML-based extensible HTML generator written in Python.
XIST is also a DOM parser (built on top of SAX2) with a very simple
and Pythonesque tree API. Every XML element type corresponds to a
Python class, and these Python classes provide a conversion method
to transform the XML tree (e.g., into HTML). XIST can be considered
"object oriented XSL".


What's new in version 2.5?
==========================

   * Specifying content models for elements has seen major enhancements.
     The boolean class attribute empty has been replaced by an object
     model whose checkvalid method will be called for validating the
     element content.
   * A new module ll.xist.sims has been added that provides simple
     schema validation. Schema violations will be reported via Python's
     warning framework.
   * All namespace modules have been updated to use sims information.
     The SVG module has been updated to SVG 1.1. The docbook module has
     been updated to DocBook 4.3.
   * It's possible to switch off validation during parsing and
     publishing.
   * Experimental support for Holger Krekel's XPython has been added.
   * Creating global attributes has been simplified. Passing an instance
     of ll.xist.xsc.Namespace.Attrs to an Element constructor now does
     the right thing.
   * ll.xist.xsc.CharRef now inherits from ll.xist.xsc.Text too, so you
     don't have to special case CharRefs any more. When publishing,
     CharRefs will be handled like Text nodes.
   * ll.xist.ns.meta.contenttype now has an attribute mimetype
     (defaulting to "text/html") for specifying the MIME type.
   * ll.xist.ns.htmlspecials.caps has been removed.
   * Registering elements in namespace classes has been rewritten to use
     a cache now.
   * Pretty printing has been changed: Whitespace will only be added now
     if there are no text nodes in element content.
   * Two mailing lists are now available: One for discussion about XIST
     and one for XIST announcements.
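
As a rough illustration of what reporting schema violations through Python's
warning framework means in practice, here is a small sketch using only the
standard warnings module. It does not use XIST: the names
ContentModelWarning, NoChildren and validate are hypothetical stand-ins, and
only the idea of a model object with a checkvalid method and a switch for
turning validation off comes from the list above.

```python
import warnings


class ContentModelWarning(UserWarning):
    """Hypothetical warning category; not XIST's actual class."""


class NoChildren:
    """Hypothetical content model: the element must be empty."""

    def checkvalid(self, name, children):
        if children:
            warnings.warn(
                "element %r must be empty, found %d child node(s)"
                % (name, len(children)),
                ContentModelWarning,
                stacklevel=2,
            )


def validate(name, children, model, enabled=True):
    """Run the model's check unless validation is switched off."""
    if enabled:
        model.checkvalid(name, children)


# Record violations in a list instead of printing them to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    validate("br", ["some text"], NoChildren())                 # warns
    validate("br", ["some text"], NoChildren(), enabled=False)  # skipped

for entry in caught:
    print(entry.category.__name__, entry.message)
```

Because violations are ordinary warnings, the caller decides what happens to
them: they can be ignored, logged, or turned into exceptions with
warnings.simplefilter("error", ContentModelWarning).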

For changes in older versions see:
http://www.livinglogic.de/Python/xist/History.html


Where can I get it?
===================

XIST can be downloaded from http://ftp.livinglogic.de/xist/
or ftp://ftp.livinglogic.de/pub/livinglogic/xist/

Web pages are at
http://www.livinglogic.de/Python/xist/

ViewCVS access is available at
http://www.livinglogic.de/viewcvs/

For information about the mailing lists go to
http://www.livinglogic.de/Python/xist/Mailinglists.html


Bye,
     Walter Dörwald


From brian at sweetapp.com  Wed Jun 30 16:01:37 2004
From: brian at sweetapp.com (Brian Quinlan)
Date: Wed Jun 30 15:57:51 2004
Subject: [XML-SIG] ANN: Pyana 0.9.1 Released
Message-ID: <40E31C21.4080504@sweetapp.com>

ANN: Pyana 0.9.1 Released

You can find it here:
http://sourceforge.net/project/showfiles.php?group_id=28142

Changes:

- Fixes a bug in Pyana 0.9.0 where repeated warning messages could
   cause a crash

What is Pyana?

Pyana is a Python interface to the Xalan-C XSLT processor. It provides 
a simple and safe API for doing XSLT transformations from Python but 
with the performance of a C processor. For example:

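import Pyana

# URLs of the sample source document and stylesheet
source_url = 'http://pyana.sourceforge.net/examples/helloworld.xml'
style_url = 'http://pyana.sourceforge.net/examples/helloworld.xsl'

# Transform the source with the stylesheet and print the result
print(Pyana.transform2String(
           source=Pyana.URI(source_url),
           style=Pyana.URI(style_url)))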

Some more complex examples are provided here:
http://pyana.sourceforge.net/examples/

Cheers,
Brian