[I18n-sig] Proposal: Extended error handling for unicode.encode

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Fri, 5 Jan 2001 22:00:25 +0100


> The codec design is supposed to cover the general case of
> encoding/decoding arbitrary data from and to arbitrary formats.

Where is it documented as such? I believe it is wishful thinking to
assume they cover some general case, although I have to acknowledge
that *your* wish is more relevant than other people's wishes.

> Please don't try to break everything down to Unicode<->8-bit
> codecs. The design should be able to cover conversion between
> image formats, audio formats, compression schemes and other
> encodings just as well as between different text formats.

Is there any precedent that it is actually useful for anything else?

> I agree that the case for Unicode codecs allows some simplification
> to the codec API design, but restricting it to this range of
> application only would cause us much trouble in the years to come
> when other codec applications start to appear in the Python
> universe.

Well, there are a number of codec applications in the Python universe
already (e.g. uuencode/base64, various graphics format converters,
compression modules), none of which uses the codec module. I firmly
believe that they shouldn't: I would rather have a good solution for
each single problem than a mediocre solution that also solves
unrelated problems.

> Other applications do have a need to jump back and forth in
> the data stream, e.g. say you want to decode a corrupt image
> file or a truncated MP3 file.

Then they also need a special API for that; your codec framework will
be useless.

> I am planning to add compression codecs based on zlib and
> possibly cryptographic codecs which can then be used together
> with stackable streams to provide seamless compression and/or
> encryption to applications which otherwise do not provide this
> functionality.

Which application do you want to enhance with that functionality?  To
support writing compressed files, you just use gzip.open; or
gzip.GzipFile(fileobj=mystream) if you want to operate on a stream
instead of a named file.
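
As a minimal sketch of the stream case, assuming a current Python 3 and
using an in-memory io.BytesIO object as a stand-in for mystream (any
writable binary file-like object should work the same way):

```python
import gzip
import io

# An in-memory stream stands in for `mystream` (an assumption made
# for this illustration only).
mystream = io.BytesIO()

payload = b"hello, compressed world\n" * 100

# Wrap the existing stream. Closing the GzipFile writes the gzip
# trailer but leaves the underlying stream open.
with gzip.GzipFile(fileobj=mystream, mode="wb") as gz:
    gz.write(payload)

compressed = mystream.getvalue()

# Reading back works the same way, over a stream positioned at its start.
with gzip.GzipFile(fileobj=io.BytesIO(compressed), mode="rb") as gz:
    restored = gz.read()
```

No codec machinery is involved; the wrapping object is simply another
file-like object layered on the stream.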

> > > If we were to provide a callback as optional method to
> > > StreamReaders/Writers, the task could be done either statically
> > > by subclassing the existing codec StreamReaders/Writers or
> > > dynamically by asking the codec registry to return the StreamReader/
> > > Writer classes.
> > 
> > So how would the implementation of charmap_encode invoke this method?
> > It currently doesn't even get hold of the codec object.
> 
> Through the extended API I proposed earlier on: the extra context
> object would allow providing a callback mechanism. Alternatively,
> the StreamReader/Writer classes could use their own specific
> C coding functions.

Was there some detailed proposal of an API? I don't recall that; could
you kindly point me to the message in the archives that elaborates on
that proposal?

Specifically, as an author of an application that wants to extend
existing codecs, could you post some Python code that shows how to
create the context objects (including an implementation of the codec
object's class), and how to pass it to unicode.encode?

> Exactly. There is a set of error strings which the codec
> must accept, but it is free to also implement other schemes
> as well.

Ok, the guaranteed error strings being 'strict', 'ignore' and
'replace'.
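
For illustration, here is how the three guaranteed strings behave with
the ASCII codec. This is a sketch written against current Python 3
(where the method lives on str); the sample text is made up:

```python
text = "caf\u00e9"  # one character that ASCII cannot represent

# 'ignore' silently drops the unencodable character.
ignored = text.encode("ascii", "ignore")

# 'replace' substitutes a codec-chosen replacement, "?" for ASCII.
replaced = text.encode("ascii", "replace")

# 'strict' raises, and the exception records where the problem is.
try:
    text.encode("ascii", "strict")
    strict_failed = False
except UnicodeEncodeError as exc:
    strict_failed = True
    bad_position = exc.start
```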

> I chose strings to simplify the implementation. Back when the
> design was discussed, we figured that the codec should take
> care of the error handling. Python's codec design is one of
> the few which does allow setting error handling behaviour --
> other implementations tend to simply raise an exception and leave
> the user in the dark.
> 
> It's too late to *change* the design. We can only extend it.

It's too late to change the *API*; the design of it can be changed as
long as the current API still emerges as a special case. That's what
Walter's proposal does: The API is extended to allow callable objects
as the error parameter, and three well-known constants are
provided (codecs.{STRICT|IGNORE|REPLACE}).
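
This thread does not spell out the signature such a callable would have,
so the following is only a sketch of the general idea, not the proposed
API. It uses the mechanism found in later Python versions, where a
callable is registered under a name with codecs.register_error and then
selected through the usual error string. The handler name "hexescape"
and its output format are invented for this example:

```python
import codecs


def hexescape(exc):
    """Replace each unencodable character with a <U+XXXX> marker."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("<U+%04X>" % ord(ch) for ch in bad)
    # Return the replacement text and the position at which to resume.
    return replacement, exc.end


# "hexescape" is a hypothetical name chosen for this illustration.
codecs.register_error("hexescape", hexescape)

encoded = "caf\u00e9 \u20ac5".encode("ascii", "hexescape")
```

The existing string-based API is untouched here: 'strict', 'ignore' and
'replace' keep working, and the callable is reached through one more
string.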

Regards,
Martin