[Python-Dev] Re: [Patches] Patch for xmlrpc encoding

Ragnar Kjørstad python@ragnark.vestdata.no
Tue, 10 Dec 2002 04:45:39 +0100


On Mon, Dec 09, 2002 at 11:15:38AM +0100, M.-A. Lemburg wrote:
> > The dumps-method in xmlrpclib has the following comment:
> >     All 8-bit strings in the data structure are assumed to use the
> >     packet encoding.  Unicode strings are automatically converted,
> >     where necessary.
> > 
> > This doesn't work very well. In our particular case we're using latin_1
> > as our default encoding, and we're using UTF-8 for the packet encoding.
> > We can't really change the default encoding, because the sql-modules
> > transfer latin_1 encoded data and we can't change the packet encoding to
> > latin_1 because the xmlrpc-client (php) doesn't work with that.
> >
> > The attached patch changes xmlrpclib to convert strings to unicode using
> > the default encoding, and then convert them back to strings with the
> > packet encoding. If unicode is not available it falls back to the old
> > behaviour.
> 
> I believe this is overkill. If you need this behaviour, subclass
> the Marshaller in xmlrpclib and add your feature to that subclass.
> Then replace the Marshaller class in xmlrpclib with your subclass.

Well, we replaced the xmlrpclib.Marshaller.dump_string method from our
application. That works as a workaround for us, but my point was not
merely to make our application work but to fix this problem for other
Python users as well.

The library makes an assumption that is (IMHO) just not valid. There is
simply no reason to assume strings use the packet encoding. 
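
To make the problem concrete, here is a minimal sketch. It is written in
present-day Python 3 syntax and is not the code from the patch; the names
transcode, source_encoding and packet_encoding are made up for this
illustration. It shows that latin_1 bytes passed through unchanged are not
valid UTF-8, while decoding and re-encoding gives a correct packet string.

```python
def transcode(data, source_encoding="latin_1", packet_encoding="utf-8"):
    """Re-encode a byte string from source_encoding to packet_encoding.

    Hypothetical helper for illustration; not part of xmlrpclib.
    """
    # Decode with the encoding the bytes really use, then encode for the packet.
    return data.decode(source_encoding).encode(packet_encoding)


# What a latin_1 data source (e.g. an SQL module) might hand over.
raw = "Kj\u00f8rstad".encode("latin_1")

# Treating the bytes as if they already used the packet encoding fails:
# 0xF8 is not a valid UTF-8 start byte.
try:
    raw.decode("utf-8")
    passthrough_is_valid = True
except UnicodeDecodeError:
    passthrough_is_valid = False

# Converting through unicode gives bytes that are valid in the packet encoding.
converted = transcode(raw)
```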

Why would you not like to fix this? Because of the performance? It would
be possible to have both functions available in the class, and only use
the encoding conversion when the encodings are actually different. This
could be done with no other performance penalty than a simple check when
the encoding is set. (The constructor?)
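
A rough sketch of that idea, again in present-day Python 3 and not taken
from the patch: the class name StringEncoder and its methods are
hypothetical. The comparison is done once in the constructor, so each
string afterwards costs only a call to whichever function was selected.

```python
import codecs


class StringEncoder:
    """Hypothetical helper: picks pass-through or conversion once, up front."""

    def __init__(self, source_encoding, packet_encoding):
        self.source_encoding = source_encoding
        self.packet_encoding = packet_encoding
        # codecs.lookup() normalizes aliases, so "latin_1" and "iso-8859-1"
        # are recognized as the same encoding.
        same = (codecs.lookup(source_encoding).name
                == codecs.lookup(packet_encoding).name)
        # The only extra cost is this single check when the encoding is set.
        self.encode = self._passthrough if same else self._transcode

    def _passthrough(self, data):
        # Old behaviour: the bytes are assumed to use the packet encoding.
        return data

    def _transcode(self, data):
        # New behaviour: go through unicode to reach the packet encoding.
        return data.decode(self.source_encoding).encode(self.packet_encoding)


same_encoding = StringEncoder("latin_1", "iso-8859-1")
different_encoding = StringEncoder("latin_1", "utf-8")
unchanged = same_encoding.encode(b"\xf8")
recoded = different_encoding.encode(b"\xf8")
```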

I simply didn't include this code in my patch because it would make the
code harder to read and I think most people use ascii or latin_1 for their 
string-encoding and UTF-8 as their packet-encoding.


> Aside: xmlrpclib should support subclassing the Marshaller and
> Unmarshaller more transparently. Currently, the two are hard-coded
> into the rest of xmlrpclib without the possibility to provide your
> own subclasses without tweaking xmlrpclib from the outside.

In principle I agree, but it should not be necessary to subclass the
Marshaller for most applications, and tweaking it from the outside can
be done pretty easily in python :)
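
For illustration only, here is what such a subclass could look like. This
sketch targets xmlrpc.client, the Python 3 successor of xmlrpclib, so it
differs from the 2002 module: there, text is already unicode and byte
strings are normally sent as base64. The class name Latin1BytesMarshaller
and the source_encoding attribute are assumptions made for this example.

```python
import xmlrpc.client


class Latin1BytesMarshaller(xmlrpc.client.Marshaller):
    """Hypothetical subclass: marshal byte strings as latin_1 text."""

    # Assumption for this sketch: all byte strings are latin_1 encoded.
    source_encoding = "latin_1"

    # Copy the dispatch table so the base class is left untouched.
    dispatch = dict(xmlrpc.client.Marshaller.dispatch)

    def dump_bytes(self, value, write):
        # Decode with the real source encoding and emit a <string> value
        # instead of the default base64 representation.
        self.dump_unicode(value.decode(self.source_encoding), write)

    dispatch[bytes] = dump_bytes


marshaller = Latin1BytesMarshaller(encoding="utf-8")
xml = marshaller.dumps((b"Kj\xf8rstad",))

# The "tweak from the outside" would then be a single assignment such as
#     xmlrpc.client.Marshaller = Latin1BytesMarshaller
# which is deliberately not executed here because it changes global state.
```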


> Please post patches using the SourceForge patch manager.

Didn't you just write that the patch was overkill and you didn't want
it? Do you want me to post it anyway? Or did you just mean for any
potential future patches?



-- 
Ragnar Kjørstad
Zet.no