[XML-SIG] Finding _xmlplus in Python 2.3a2

Martijn Faassen faassen@vet.uu.nl
Tue, 4 Mar 2003 15:44:03 +0100


"Martin v. Löwis" wrote:
> Martijn Faassen wrote:
> >Don't know about this one. Does this happen a lot, and what is the
> >motivation? Is the use case like installing something in site-packages but
> >keeping the new modules in a separate location instead?
>
> I believe only few packages ship with .pth files, which predate
> packages. AFAIK, Numerical Python uses that, to put stuff into
> site-packages/Numeric, and still have all modules inside Numeric appear
> at the toplevel.

All right. So let's not ship a .pth file with PyXML. :)
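
For readers unfamiliar with the .pth mechanism mentioned above, here is a
minimal sketch of its effect. It assumes a current Python 3 interpreter, and
the directory and module names (NumericDemo, demo_multiarray) are hypothetical
stand-ins for the Numeric layout described in the quote, not the real files.

```python
import importlib
import os
import site
import sys
import tempfile

# Hypothetical layout standing in for site-packages/Numeric.
site_dir = tempfile.mkdtemp()
pkg_dir = os.path.join(site_dir, "NumericDemo")
os.mkdir(pkg_dir)

# A module that lives inside the subdirectory.
with open(os.path.join(pkg_dir, "demo_multiarray.py"), "w") as f:
    f.write("NAME = 'demo_multiarray'\n")

# The .pth file lists directories, relative to site_dir, to add to sys.path.
with open(os.path.join(site_dir, "numeric_demo.pth"), "w") as f:
    f.write("NumericDemo\n")

# site.addsitedir() adds site_dir to sys.path and processes its .pth files,
# much as the interpreter does for site-packages at startup.
site.addsitedir(site_dir)
importlib.invalidate_caches()

# The module is now importable at the top level, without a package prefix.
demo_multiarray = importlib.import_module("demo_multiarray")
pth_result = demo_multiarray.NAME
```

The module inside the subdirectory ends up importable under its bare name,
which is the "appear at the toplevel" effect.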

> >>2. A module can install additional codecs, causing changes in
> >>  behaviour to unicode_string.encode(), and codecs.lookup().
> >
> >Additional codecs is the key thing here. Can existing codecs get
> >overridden?
>
> This is underspecified, and undocumented, but I believe you can, at least
> by modifying the codecs aliases.

Okay. I hope this is not a common practice as it would get really
confusing (install a module and suddenly code that uses one codec
elsewhere all breaks, ugh, though presumably the code elsewhere would
still need to call a function to register the new codec? In that case it's
not as bad).
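
The registration step wondered about here can be sketched as follows. This is
an illustration written for a current Python 3, not code from PyXML or from
any codec package; the codec name "demoreverse" and its toy algorithm are
made up for the example. Only codecs.register(), codecs.lookup() and
codecs.CodecInfo are real library calls.

```python
import codecs


def _demo_encode(text, errors="strict"):
    # Toy algorithm (hypothetical): reverse the string, then encode as UTF-8.
    return text[::-1].encode("utf-8", errors), len(text)


def _demo_decode(data, errors="strict"):
    # Inverse of _demo_encode; the input may arrive as a buffer object.
    raw = bytes(data)
    return raw.decode("utf-8", errors)[::-1], len(raw)


def demo_search(name):
    # A search function receives a codec name and returns a CodecInfo,
    # or None if it does not provide that codec.
    if name == "demoreverse":
        return codecs.CodecInfo(_demo_encode, _demo_decode, name="demoreverse")
    return None


def codec_known(name):
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


known_before = codec_known("demoreverse")
codecs.register(demo_search)  # the explicit call some code has to make
known_after = codec_known("demoreverse")

encoded = "abc".encode("demoreverse")
decoded = encoded.decode("demoreverse")
```

Nothing changes until some code calls codecs.register(). After that call,
every str.encode() in the process can see the new name, including code that
never imported the registering module.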

> >Anyway, this is not installing a new version of source code, but
> >plugging in data.
>
> No. A codec is an algorithm, that performs some computation. Depending
> on whether certain codecs are installed, the very same Python program
> can behave differently on different machines.

A codec does not introduce new API however, which is probably a more accurate
description of what I was trying to get at. Also, the codec algorithm,
as far as I understand, follows a fairly uncomplicated interface
(data in, data out), while PyXML provides a very extensive set of interfaces
(which do frequently have the benefit of being defined by standards at least).

> >>3. A module can register things in copy_reg, causing changes in
> >>  behaviour to the pickle, cPickle, and copy modules.
> >
> >Sure, a module can add its own __add__ too causing changes in behavior
> >in the core too. :) Does existing code commonly get *overridden* by this
> >mechanism, or is it a mechanism to plug in more functionality?
>
> It is designed to plug in more functionality, just like _xmlplus is.

Are you sure this is equivalent? Does this system add new APIs? And it adds
abilities for objects presumably defined by the installed package itself. Does
installing that package suddenly change the behavior of *other* code on the
system, commonly, without this code having asked for it? I can't think of
examples (unless things like site.py get modified), but perhaps I'm missing
them.
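
A minimal sketch of the copy_reg case, assuming a current Python 3 where the
module is spelled copyreg. The Point class and the reduce_point helper are
hypothetical. The example uses copy.copy() rather than pickle only to keep it
self-contained.

```python
import copy
import copyreg  # spelled copy_reg in the Python 2 of this thread


class Point:
    # Hypothetical class owned by the package doing the registration.
    def __init__(self, x, y):
        self.x, self.y = x, y


reducer_calls = []


def reduce_point(p):
    # Tells copy/pickle how to rebuild a Point: a callable plus its arguments.
    reducer_calls.append((p.x, p.y))
    return Point, (p.x, p.y)


# Register the reducer for this one type only.
copyreg.pickle(Point, reduce_point)

original = Point(1, 2)
clone = copy.copy(original)      # goes through reduce_point
other = copy.copy([1, 2, 3])     # an unrelated type does not
```

The registration is keyed on a type, so in this sketch it only affects
copying of objects that the registering package defines itself.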

> >And if something goes wrong, the offending module will be in the traceback.
>
> What is the offending module? The plugin, or the code using the plugin?

The plugin, presumably. As well as the code using the plugin. :)

> >If something goes wrong with PyXML, _xmlplus will be in the traceback
> >and this is not very enlightening.
>
> I can't see the difference. If something goes wrong after replacing xml
> with _xmlplus, both _xmlplus, and the offending code (usually its
> caller) will be in the traceback.

Except that nobody is supposed to know what the heck _xmlplus is, where
it comes from, and how it ended up in the 'xml' namespace, as it's an
implementation detail. And it happens when I run client software installed
in a completely separate way.

So yes, I'll see _xmlplus in the traceback while I didn't before. But
I don't believe that commonly happens with the other systems you sketched
before. If you plug in code into the Python core so your own application
will be able to use it, fine. If you plug in code into the Python core so
that all sorts of applications suddenly start using it *without explicitly
asking for it*, not fine.
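
The kind of silent replacement being objected to can be sketched like this.
The sketch assumes the swap is done by rebinding an entry in sys.modules,
which is one way to get the effect. It is not copied from the real xml
package, and the module names coredemo and _coredemo_plus are hypothetical.

```python
import importlib
import os
import sys
import tempfile

# Source of a hypothetical "core" module that looks for an add-on on import.
CORE_SOURCE = '''
import sys
WHO = "core"
try:
    import _coredemo_plus
except ImportError:
    pass                                    # add-on absent: keep the core
else:
    sys.modules[__name__] = _coredemo_plus  # add-on present: swap it in
'''
ADDON_SOURCE = 'WHO = "add-on"\n'

demo_dir = tempfile.mkdtemp()
sys.path.insert(0, demo_dir)
with open(os.path.join(demo_dir, "coredemo.py"), "w") as f:
    f.write(CORE_SOURCE)


def client_code():
    # The same client code in both runs: it only ever names "coredemo".
    sys.modules.pop("coredemo", None)   # force a fresh import for the demo
    importlib.invalidate_caches()
    module = importlib.import_module("coredemo")
    return module.WHO, module.__name__


without_addon = client_code()

# "Install" the add-on. The client code above is not modified.
with open(os.path.join(demo_dir, "_coredemo_plus.py"), "w") as f:
    f.write(ADDON_SOURCE)

with_addon = client_code()
```

In the second run the client still asks for coredemo but receives a module
whose __name__ is _coredemo_plus, which is the name that would then show up
in a traceback.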

> > And if you hadn't installed PyXML
> >_xmlplus won't be in the traceback, with the *same client code*.
>
> Just like any other plugin: If you don't install the plugin, you don't
> see the plugin in the traceback (obviously).

Give me examples of plugins in the Python core besides PyXML that change the
behavior of client code of core libraries that *don't* explicitly ask
for such a thing to happen (by for instance importing from the modules
that install the plugins). Give me examples of modules that when installed,
will break code that never references that module explicitly, due to a
bug or bugfix in the newly installed module.

> >Yes, and people believe (if they reach this stage) that PyXML silently
> >augments the 'xml' functionality with new features. Unfortunately it
> >silently overrides all of its code instead.
>
> But it does augment, by overriding all code! The code it overrides is
> nearly-identical to the code it replaces, except for bug fixes.

I've already pointed out that even bug fixes cause confusion and problems.
You cannot shield users from having to know about the _xmlplus hack.

> >I don't believe that in the cases you mention core Python APIs get extended
> >with new ones, either.
>
> Define "get extended". If I can't pass "big-5" to the builtin unicode
> function first, but can do so after installing a codec, I would say that
> I have just extended the builtin unicode function.

Add new classes and new functions to existing modules, and new methods to
existing classes. I'm not talking about allowing existing APIs to
accept new arguments.

Mind that I'm not arguing that such an architecture is always a bad idea.
I would like to have a registry for Python XML software where I can
ask for a DOM implementation or an XPath implementation and get one.
Indirection can be useful, especially if you have a clear set of interfaces.
An example in PyXML is how you can ask different engines to do the
parsing for you (or the default one if you don't care). A difference with
the current situation is that the user doesn't have to deal with near-identical
code in two places (instead there are different implementations of the same
functionality, each with its own tradeoffs), and that the user can be
explicit as well if desired.
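
One possible shape for such a registry, as a rough sketch only. Every name
here (ImplementationRegistry, SimpleXPath, FancyXPath, the "xpath" key) is
hypothetical and not an existing PyXML or standard library API.

```python
class ImplementationRegistry:
    """Maps an interface name to named implementation factories."""

    def __init__(self):
        self._factories = {}  # interface -> {implementation name: factory}
        self._defaults = {}   # interface -> name of the default implementation

    def register(self, interface, name, factory, default=False):
        self._factories.setdefault(interface, {})[name] = factory
        # The first implementation registered becomes the default unless
        # a later one asks for that role explicitly.
        if default or interface not in self._defaults:
            self._defaults[interface] = name

    def get(self, interface, name=None):
        available = self._factories.get(interface, {})
        chosen = self._defaults.get(interface) if name is None else name
        if chosen not in available:
            raise LookupError(
                "no %r implementation named %r" % (interface, chosen))
        return available[chosen]()


class SimpleXPath:
    engine = "simple"   # placeholder engine, no real XPath logic


class FancyXPath:
    engine = "fancy"    # placeholder engine, no real XPath logic


registry = ImplementationRegistry()
registry.register("xpath", "simple", SimpleXPath)
registry.register("xpath", "fancy", FancyXPath)

default_engine = registry.get("xpath")            # caller does not care
explicit_engine = registry.get("xpath", "fancy")  # caller is explicit
```

The caller always asks for an implementation by interface, so which code runs
is visible at the call site, and two engines can coexist without one
replacing the other behind the scenes.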

Java as well as Zope 3 now define a tree of interfaces, and various
implementations can claim to implement these interfaces. This type of
thing is very useful, and I'd like more of that, as then I could
more easily swap out one (say) xpath engine with another.

> >Hmm...can we at least get a consensus that the current _xmlplus approach is
> >not ideal and can be confusing and that we should try to improve it if we
> >can?
>
> I can agree with that.

Phew. So we at least end this posting on a positive note. :)

Regards,

Martijn