[Python-Dev] Fixing the XML batteries

Stefan Behnel stefan_ml at behnel.de
Mon Dec 12 10:59:23 CET 2011


"Martin v. Löwis", 11.12.2011 23:03:
> Am 09.12.2011 10:09, schrieb Xavier Morel:
>> On 2011-12-09, at 09:41 , Martin v. Löwis wrote:
>>>> a) The stdlib documentation should help users to choose the right
>>>> tool right from the start. Instead of using the totally
>>>> misleading wording that it uses now, it should be honest about
>>>> the performance characteristics of MiniDOM and should actively
>>>> suggest that those who don't know what to choose (or even *that*
>>>> they can choose) should not use MiniDOM in the first place.
>>>
> [...]
>>
>> Minidom is inferior in interface flow and pythonicity, in terseness,
>> in speed, in memory consumption (even more so using cElementTree, and
>> that's not something which can be fixed unless minidom gets a C
>> accelerator), etc… Even after fixing minidom (if anybody has the time
>> and drive to commit to it), ET/cET should be preferred over it.
>
> I don't mind pointing people to ElementTree, despite that I disagree
> whether the ET interface is "superior" to DOM.

Yes, that's clearly a point where we agree to disagree, and I understand 
that you are as biased towards minidom as I am biased towards ElementTree.

However, I think I made it clear that the implementation of cElementTree 
(and lxml.etree as well, for that matter) is largely superior to MiniDOM 
in terms of performance, for any sensible meaning of the word performance.
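A rough way to check the performance claim on your own machine is to parse the same document with both stdlib APIs and time it. This is a minimal sketch, not a rigorous benchmark. The test document (10,000 hypothetical `<item>` elements) and the helper names `count_minidom` and `count_etree` are illustrative assumptions, and the timings depend on the machine, the Python version and the shape of the XML, so no numbers are claimed here. On Python 3.3 and later, `xml.etree.ElementTree` uses the C accelerator automatically; the separate `xml.etree.cElementTree` module was deprecated in 3.3 and removed in 3.9.

```python
import timeit
import xml.dom.minidom
import xml.etree.ElementTree as ET
from functools import partial

# Hypothetical test document: 10,000 small <item> elements under one root.
DOC = "<root>" + "".join(
    '<item id="%d">text %d</item>' % (i, i) for i in range(10000)
) + "</root>"

def count_minidom(data):
    """Parse with minidom and count the <item> elements."""
    dom = xml.dom.minidom.parseString(data)
    return len(dom.getElementsByTagName("item"))

def count_etree(data):
    """Parse with ElementTree and count the <item> children of the root."""
    root = ET.fromstring(data)
    return len(root.findall("item"))

if __name__ == "__main__":
    # Both parsers must agree before any timing is meaningful.
    assert count_minidom(DOC) == count_etree(DOC) == 10000
    for name, func in [("minidom", count_minidom), ("ElementTree", count_etree)]:
        seconds = timeit.timeit(partial(func, DOC), number=5)
        print("%-12s %.3f s for 5 runs" % (name, seconds))
```

Memory use can be compared the same way with `tracemalloc`, which is where the gap between a full DOM tree and an ElementTree tree usually shows most clearly.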

And I'm also convinced that the API is largely superior in terms of 
usability. ET certainly matches Python as a language much better than 
MiniDOM. But that's just my personal opinion.


> It's Stefan's reasoning
> as to *why* people should be pointed to ET, and what words should be
> used to do that. IOW, I detest bashing some part of the standard
> library, just to urge users to use some other part of the standard library.

I'm all for finding a good way of putting it into words, as long as it 
keeps uninformed users from making the wrong decision and getting the wrong 
idea of how complicated and slow Python is.


> People are still using PyXML, despite its not being maintained anymore.

My experience with that is that it's only *new* users that are still 
running into PyXML by accident, because they didn't see that it's a dead 
project and they find it through ancient web pages that tell them that they 
need it because "it's the way to do XML in Python" and "if minidom is not 
enough, use PyXML". Maybe we should "misuse" the stdlib documentation to 
clear that up as well. "PyXML" is just too attractive a name for a dead 
project.

Just look through the xml-sig page, basically all requests regarding PyXML 
during the last five years deal with problems in installing it, i.e. 
*before* even starting to use it. So you can't use this to claim that 
people really *are* still using it.


> Telling them to replace 4DOM with minidom is much more appropriate

Do you actually have any evidence that anyone is still actively using 4DOM?


> than telling them to rewrite in ET.

I usually encourage people to rewrite minidom code for ET. It makes the 
code simpler, more readable, more maintainable and much faster.
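As an illustration of what such a rewrite typically looks like, here is a small sketch that extracts the same data with both APIs. The `<entries>` document and the function names are hypothetical, chosen only for the comparison. The DOM version has to walk child nodes and join text nodes by hand, while ElementTree exposes attributes and text directly through `get()` and `findtext()`.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Hypothetical input document used only for illustration.
DATA = """<entries>
  <entry key="a"><value>alpha</value></entry>
  <entry key="b"><value>beta</value></entry>
</entries>"""

def pairs_minidom(data):
    """Collect (key, value) pairs with the DOM API."""
    dom = xml.dom.minidom.parseString(data)
    result = []
    for entry in dom.getElementsByTagName("entry"):
        key = entry.getAttribute("key")
        value_node = entry.getElementsByTagName("value")[0]
        # DOM keeps character data in child Text nodes that must be joined by hand.
        text = "".join(
            child.data for child in value_node.childNodes
            if child.nodeType == child.TEXT_NODE
        )
        result.append((key, text))
    return result

def pairs_etree(data):
    """Collect the same (key, value) pairs with ElementTree."""
    root = ET.fromstring(data)
    return [(entry.get("key"), entry.findtext("value")) for entry in root.iter("entry")]

if __name__ == "__main__":
    print(pairs_minidom(DATA))  # [('a', 'alpha'), ('b', 'beta')]
    print(pairs_etree(DATA))    # [('a', 'alpha'), ('b', 'beta')]
```

One behavioural difference to watch for when porting: for a missing attribute, minidom's `getAttribute()` returns an empty string while ElementTree's `get()` returns `None`.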

Stefan
