Fixing the XML batteries

Stefan Behnel stefan_ml at behnel.de
Tue Dec 13 14:38:54 EST 2011


Serhiy Storchaka, 13.12.2011 19:57:
> 13.12.11 16:59, Stefan Behnel wrote:
>> It matches my opinion though.
>
> I would be glad to share your view; however, ElementTree looks less
> documented than minidom

It's certainly a lot smaller, which makes its API easier to learn and remember.


> and is not a full replacement.

It's good enough for a surprisingly large part of all XML processing needs, 
and if you need more, there's lxml for that.


> For example, I haven't found how to get the XML encoding.

True - lxml provides it, but plain ET doesn't. However, I can't think of 
any major use cases where you'd care about the encoding of the original 
input file. Just use what suits your needs on the way back out. UTF-8 will 
usually do just fine.
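
As a minimal sketch of that point (assuming a current Python 3 and the stdlib
xml.etree.ElementTree; the sample document is made up for illustration), the
parser decodes whatever the input declares, and you pick the output encoding
yourself when serialising:

```python
import io
import xml.etree.ElementTree as ET

# Made-up input, declared and encoded as ISO-8859-1.
source = u'<?xml version="1.0" encoding="iso-8859-1"?><root>caf\xe9</root>'
source_bytes = source.encode('iso-8859-1')

# The parser reads the declaration and decodes the text for us.
root = ET.fromstring(source_bytes)

# On the way back out, choose the encoding that suits the consumer.
out = io.BytesIO()
ET.ElementTree(root).write(out, encoding='utf-8')
utf8_bytes = out.getvalue()
```

The tree itself only holds decoded text, so the original encoding plays no
role once parsing is done.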


> Also, when using ElementTree instead
> of minidom, the suffix "ns0:" is added to each element.

That's a "prefix", not a suffix. And since prefixes are basically useless 
for XML processing, it isn't commonly a problem whether they are called 
'nsXY' or 'abcdefg'. It's the parser's duty to handle them for you.
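
A small sketch of why the prefix name doesn't matter (assuming a current
Python 3; the namespace URI and the prefixes 'a', 'zzz' and 'ex' are made up
for illustration). ET identifies elements by namespace URI plus local name,
and register_namespace() lets you choose the prefix used on output if you
care about how it looks:

```python
import xml.etree.ElementTree as ET

NS = 'http://example.com/ns'  # hypothetical namespace URI

# Two documents that differ only in the prefix they picked.
doc_a = ET.fromstring('<a:item xmlns:a="%s">x</a:item>' % NS)
doc_b = ET.fromstring('<zzz:item xmlns:zzz="%s">x</zzz:item>' % NS)

# Both parse to the same qualified tag name, '{uri}localname'.
same_tag = doc_a.tag == doc_b.tag

# Optional: ask the serialiser to use 'ex' instead of a generated 'ns0'.
ET.register_namespace('ex', NS)
serialised = ET.tostring(doc_a, encoding='unicode')
```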


> I do not see how to _create_ a new element

element = Element('tagname')
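
A slightly longer sketch, in case it helps (assuming a current Python 3; the
attribute and child names are made up for illustration):

```python
import xml.etree.ElementTree as ET

# Create a root element and give it an attribute.
root = ET.Element('tagname')
root.set('id', '1')

# SubElement() creates a child and appends it to the parent in one step.
child = ET.SubElement(root, 'child')
child.text = 'some text'

# Serialise to a text string to inspect the result.
xml_text = ET.tostring(root, encoding='unicode')
```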


> and to write it with <?xml ...?> header.

That's called a "declaration". You can get it with, e.g.,

ElementTree(element).write(output_file, encoding='utf8')

where output_file is a file name or a file object opened for writing.

By default, ET doesn't write it unless it can put useful information into 
it. (Note that the XML spec makes the declaration optional for XML 1.0 
serialisation as UTF-8.)
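
If you'd rather ask for the declaration explicitly, write() also accepts an
xml_declaration argument (in ET 1.3, i.e. Python 2.7/3.2 and later). A minimal
sketch, assuming a current Python 3 and an in-memory buffer standing in for
the output file:

```python
import io
import xml.etree.ElementTree as ET

element = ET.Element('tagname')

# Explicitly request the declaration.
buf = io.BytesIO()
ET.ElementTree(element).write(buf, encoding='utf-8', xml_declaration=True)
with_declaration = buf.getvalue()

# Default behaviour for UTF-8 output: no declaration is written.
buf = io.BytesIO()
ET.ElementTree(element).write(buf, encoding='utf-8')
without_declaration = buf.getvalue()
```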


> And the DOM interface is more familiar to those who work with some other
> languages.

Not really. DOM is also considered unwieldy in many other languages. Even 
in a language as unwieldy as Java it's being frowned upon these days. In 
JavaScript, it has basically been replaced by jQuery, and many other 
languages also have substantially more "natural" ways to deal with XML than 
the DOM.

It's true, though, that ElementTree isn't a widely known interface outside 
of the Python world.


>> Yes, that's what C14N is there for, typically used for cryptography,
>> hashing, etc. However, MiniDOM doesn't implement that standard, so
>> you're on your own here.
>
> MiniDOM suited me quite well earlier in this respect. I will switch to C14N as
> soon as I can.
>
>> The ET module is actually quite short (<1700 lines), so you can just
>> copy the Py2.7 version into your sources and optionally import it on
>> older Python releases. Since you only seem to depend on the serialiser
>> (which is worth using anyway because it is much faster in the Py2.7
>> version), older platform versions of cET should also work just fine with
>> that module copy, so you can basically just import everything from
>> xml.etree.cElementTree and use the ElementTree class and the tostring()
>> function from your own local version if the platform version is too old.
>>
>> Note that ET is also still available as a separately installable
>> package, which may or may not be simpler for you to use.
>
> Thanks, but it is too bulky for my small scripts (which I have decided to
> update from Python 2.3 or 2.4 to modern Python 3 and 2.6+). I would rather
> postpone the full migration for half a year or a year, until Python 2.7
> and 3.2 appear in stable versions of popular distributions.

In case you are only dealing with small in-house scripts, I'd suggest 
installing ET 1.3 (or, even better, lxml) on the machines where you want to 
use it. Then you no longer have to care about those dependencies.
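
Scripts that should run with or without lxml installed commonly use an import
fallback along these lines. This is only a sketch, assuming the code sticks to
the API subset that lxml.etree and the stdlib ElementTree share; the sample
document is made up:

```python
try:
    # Prefer lxml (third-party) where it is installed.
    from lxml import etree as ET
except ImportError:
    # Otherwise fall back to the ElementTree shipped with Python.
    import xml.etree.ElementTree as ET

# From here on, the script only touches the shared API.
root = ET.fromstring('<root><item>1</item><item>2</item></root>')
items = [el.text for el in root.findall('item')]
```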


> Thank you for ET, it really is more convenient for some applications
> (especially when working with the text in elements).

Careful. ;) I'm just the author of lxml, not of ET. That would be Fredrik 
Lundh.

Stefan



