[I18n-sig] XML and codecs

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Tue, 5 Jun 2001 22:05:04 +0200


> > What do you mean: "provided it's a StreamReader/Writer". What if I
> > invoke the encode method found in codec lookup, and get an exception?
> 
> The encoders/decoders returned in the lookup tuple are not
> supposed to store state. If you want to or need to store state,
> then you should use the factory functions (StreamWriter and
> -Reader) to first create an instance which can store state
> and then use its .encode()/.decode() methods.

To create one of these, I need a file object. I just want a stateful
encoder, not a stream. So if I don't have a file object, how do I
create an encoder?
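A minimal sketch of the problem, written against current Python 3 rather than the 2001 codecs module under discussion. The first two lookup entries are plain stateless functions, while the StreamWriter factory insists on a stream. Wrapping an in-memory `io.BytesIO` buffer is one workaround for getting a stateful encoder without a real file. The helper name `encode_pieces` is hypothetical, and UTF-16 stands in for a stateful encoding: its only state is whether the byte order mark has already been written.

```python
import codecs
import io

# The first two lookup entries are plain, stateless functions.
encode, decode = codecs.lookup("utf-16")[:2]

def encode_pieces(pieces, encoding="utf-16"):
    """Hypothetical helper: encode several string pieces as one stream.

    StreamWriter requires a stream argument, so an in-memory BytesIO
    buffer stands in for the file object.
    """
    buffer = io.BytesIO()
    writer = codecs.getwriter(encoding)(buffer)
    for piece in pieces:
        writer.write(piece)  # the writer keeps state between calls
    return buffer.getvalue()

# Stateless function: every call emits its own byte order mark.
separate = encode("a")[0] + encode("b")[0]
# Stateful writer: the BOM is written only once, as for "ab" in one go.
joined = encode_pieces(["a", "b"])
print(len(separate), len(joined))       # 8 6
print(joined == "ab".encode("utf-16"))  # True
```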

Plus, if I cannot use the functions returned from codecs.lookup in
stateful encodings, what are they good for, anyways?

> > So I think the sentence in the documentation saying "expected to work"
> > is an error.
> 
> This is per design and not a mistake.

Ok, so it is an error in the design, not only in the documentation.

> If encoders/decoders (the first two items in the
> lookup tuple) stored state, then you would have serious problems
> when reusing them for different inputs. I'm not even talking about
> threading problems here.

What specific problems would you have? I.e. is there anything in the
standard library that gets into serious problems if codecs.lookup
returns a stateful object?
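One concrete form of the reuse hazard mentioned in the quote, sketched with current Python 3's UTF-16 StreamWriter (an assumption: this is today's codecs module, not the 2001 code). Once the writer has emitted a byte order mark, encoding an unrelated second input with the same object silently omits it, so the second output lacks the BOM a standalone UTF-16 document would normally start with. Calling `reset()` restores the initial state.

```python
import codecs
import io

# One stateful writer reused for two unrelated inputs.
writer = codecs.getwriter("utf-16")(io.BytesIO())

first = writer.encode("doc1")[0]   # first call: BOM + data
second = writer.encode("doc2")[0]  # reused: the writer remembers the BOM was sent

print(first.startswith(codecs.BOM_UTF16))   # True
print(second.startswith(codecs.BOM_UTF16))  # False

# Resetting the state makes the writer usable for a fresh input again.
writer.reset()
third = writer.encode("doc3")[0]
print(third.startswith(codecs.BOM_UTF16))   # True
```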

> The other two entries were designed to provide stateful codec
> interfaces, so your JIS codec would have to use those in order
> to store shift states etc. or do more complex work on the data.

First, as I said, I cannot use them as-is, since I need a file.

Furthermore, are you saying that I can use codecs.lookup(enc)[:2] only
for some encodings, not for others? That sounds like a huge design
flaw.
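To make the concern concrete: splitting input into pieces and feeding each piece to `codecs.lookup(enc)[0]` gives the same bytes as a one-shot encode for some encodings but not for others. This sketch assumes current Python 3. `encode_in_chunks` is a hypothetical helper, and UTF-16 again stands in for a stateful encoding; its state is far simpler than JIS shift states.

```python
import codecs

def encode_in_chunks(text, encoding, size=2):
    """Hypothetical helper: encode text piecewise with the lookup function."""
    encode = codecs.lookup(encoding)[0]
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    return b"".join(encode(chunk)[0] for chunk in chunks)

text = "abcdef"
for enc in ("latin-1", "utf-8", "utf-16"):
    same = encode_in_chunks(text, enc) == text.encode(enc)
    print(enc, same)
# latin-1 True
# utf-8 True
# utf-16 False  (one BOM per chunk instead of one in total)
```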

> The encoder/decoder functions should only provide very basic
> encoding/decoding facilities which do not require keeping
> state (e.g. they might have additional keyword arguments to
> parameterize them to work in different shift states).

Arghh. Whether the facilities are basic or not depends on the
encoding.
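The approach in the quoted paragraph, where the lookup functions stay stateless and the caller passes the shift state in as an extra keyword argument, could look roughly like the sketch below. Everything here is hypothetical: `toy_shift_encode` and its `state`/`final` parameters are invented for illustration. Only the two escape sequences are taken from ISO-2022-JP; the two-byte payload is raw big-endian code points, not real JIS X 0208.

```python
from typing import Tuple

# Escape sequences borrowed from ISO-2022-JP; the two-byte payload below is
# a made-up toy (raw big-endian code points), NOT real JIS X 0208.
ESC_WIDE = b"\x1b$B"
ESC_ASCII = b"\x1b(B"

def toy_shift_encode(text: str, state: str = "ascii",
                     final: bool = True) -> Tuple[bytes, int, str]:
    """Stateless toy encoder: the shift state is passed in and handed back.

    Returns (encoded bytes, characters consumed, shift state at the end).
    With final=True the output is shifted back to ASCII, so it stands alone.
    """
    out = bytearray()
    for i, ch in enumerate(text):
        cp = ord(ch)
        if cp < 0x80:
            if state != "ascii":
                out += ESC_ASCII
                state = "ascii"
            out.append(cp)
        elif cp <= 0xFFFF:
            if state != "wide":
                out += ESC_WIDE
                state = "wide"
            out += cp.to_bytes(2, "big")
        else:
            raise UnicodeEncodeError("toy-shift", text, i, i + 1,
                                     "outside the toy range")
    if final and state != "ascii":
        out += ESC_ASCII
        state = "ascii"
    return bytes(out), len(text), state

# The caller, not the codec object, carries the state between pieces.
part1, _, st = toy_shift_encode("a\u3042", final=False)
part2, _, st = toy_shift_encode("\u3044b", state=st)
whole, _, _ = toy_shift_encode("a\u3042\u3044b")
print(part1 + part2 == whole)  # True
```

The catch Martin points to shows up in the signature: a caller can only thread `state` through if it knows that this particular codec needs it, whereas for latin-1 there is nothing to pass.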

So again I consider this broken, and the best fix is to allow the
callable objects returned in codecs.lookup(enc)[:2] to maintain state
if they want.

Users must then either look them up again if they want to reuse them
for different input, or they can recycle them if they happen to know
that no state is maintained.
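The proposal here, in which the lookup hands out callables that may keep state and users obtain a fresh one per input, resembles the incremental codec interface that later Python versions provide (`codecs.getincrementalencoder`, which did not exist when this message was written). The following is a sketch in current Python 3; the helper names `fresh_encoder` and `encode_document` are invented for illustration.

```python
import codecs

def fresh_encoder(encoding):
    """Hypothetical helper: one new, possibly stateful, encoder per input."""
    return codecs.getincrementalencoder(encoding)()

def encode_document(pieces, encoding="utf-16"):
    encoder = fresh_encoder(encoding)  # obtained again for each document
    out = b"".join(encoder.encode(p) for p in pieces)
    return out + encoder.encode("", final=True)  # flush any pending state

doc1 = encode_document(["a", "b"])
doc2 = encode_document(["c"])
print(doc1 == "ab".encode("utf-16"), doc2 == "c".encode("utf-16"))  # True True
```

Because each document gets its own encoder, the BOM state of `doc1` cannot leak into `doc2`; a caller that knows an encoder keeps no state could instead reuse one object.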

Regards,
Martin