[I18n-sig] XML and codecs

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Tue, 5 Jun 2001 22:50:43 +0200


> > To create one of these, I need a file object. I just want a stateful
> > encoder, not a stream. So if I don't have a file object, how do I
> > create an encoder?
> 
> Simple: use cStringIO !

Are you serious? To encode strings, I need cStringIO ?!?
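
For illustration, here is a minimal sketch of that workaround, written
for a current Python rather than the one under discussion. It assumes
io.BytesIO as the in-memory stand-in for cStringIO and codecs.getwriter
as the way to reach the StreamWriter; the helper name
make_stateful_encoder is hypothetical.

```python
import codecs
import io


def make_stateful_encoder(encoding):
    """Return a function that encodes successive strings, keeping codec state.

    Hypothetical helper: the stream exists only to satisfy the
    StreamWriter constructor, which wants a file-like object.
    """
    buf = io.BytesIO()
    writer = codecs.getwriter(encoding)(buf)  # StreamWriter over the buffer

    def encode(text):
        writer.write(text)
        data = buf.getvalue()
        # Empty the buffer so the next call returns only the new bytes.
        buf.seek(0)
        buf.truncate()
        return data

    return encode


encode = make_stateful_encoder("utf-16")
first = encode("ab")   # begins with the byte order mark
second = encode("cd")  # continues the same stream, no second mark
```

The stream is pure overhead here: all that is wanted is the state the
writer keeps between calls.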

> > Plus, if I cannot use the functions returned from codecs.lookup in
> > stateful encodings, what are they good for, anyways?
> 
> For simple stateless encodings.

So it is not a general-purpose facility. What should a lookup function
return if it cannot provide a stateless encoding function?

> Please reread what I wrote and then think this over again... 

Why do you think I did not pay attention?

> by reusing a stateful encoder multiple times you would carry over
> state from one usage to the next, e.g. carry over the shift state
> from one data set to the next (which may not even use this shift
> state).

Indeed, that's what I want. How else could continuing after an
encoding error work?

If I want to start with fresh data, I also need to get a fresh codec
function, from codecs.lookup.

> These two APIs are exposed to simplify the interface for simple,
> stateless encodings. Since most encodings work just fine with
> these APIs they are indeed very useful.

It turns out that both UTF-16 and UTF-8 have problems with a stateless
approach, so I'm questioning the usefulness of the API.
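
The UTF-16 half of that claim can be sketched as follows, assuming a
current Python in which codecs.lookup returns an object with an encode
attribute. The chunk contents are made up for the example.

```python
import codecs

# The stateless encoding function returned by the lookup.
encode = codecs.lookup("utf-16").encode

chunks = ["ab", "cd"]
# Each stateless call returns (bytes, length consumed) and starts from scratch.
data = b"".join(encode(chunk)[0] for chunk in chunks)

# One byte order mark per call, not one per data stream.
bom_count = data.count(codecs.BOM_UTF16)
# On decoding, only the leading mark is stripped; the second one
# survives as the character U+FEFF in the middle of the text.
decoded = data.decode("utf-16")
```

Since the function cannot know that the second call continues the first,
it has no way to suppress the second mark.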

Of course, having to use cStringIO isn't any better...

> Again, this decision was per design: the codec registry lookup
> mechanism caches the lookup tuples. With your proposal the cache
> would be rendered useless.

Given that encodings.search_function also caches its result, it is
questionable why codecs.lookup should do so as well. One cache should be
enough, and it should be in encodings, since all these encodings are
known to be stateless.
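
The two layers of caching can be observed with a small sketch like the
one below. It assumes a current CPython and relies on behaviour I
believe holds there, not on a documented guarantee; the encoding names
are arbitrary.

```python
import codecs
import encodings

# Repeated lookups of the same name return the very same object,
# which is the registry cache at work (CPython behaviour, assumed) ...
first = codecs.lookup("utf-8")
second = codecs.lookup("utf-8")
same_from_codecs = first is second

# ... and the search function in the encodings package does the same
# on its own, without going through the registry.
same_from_encodings = (
    encodings.search_function("latin_1") is encodings.search_function("latin_1")
)
```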

Regards,
Martin