[I18n-sig] Proposal: Extended error handling for unicode.encode

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Sat, 6 Jan 2001 19:48:02 +0100


> I don't see the point in trying to argue for uselessness of
> an existing design. If you want your own design, then nobody 
> will stop you from rolling your own.

The design exists only on paper. What really matters is the API
and the implementation. I could not care less about the design, but
you bring it up to argue why the implementation should not be changed.

I don't want my own design, I want to enhance the API.

> > > > So how would the implementation of charmap_encode invoke this method?
> > > > It currently doesn't even get hold of the codec object.
[...]
> There wasn't a detailed proposal, only a design idea...

That's one of the major problems here, IMO. If there was a specific
proposal, it would be possible to evaluate whether it meets the
requirements.

Instead, you use "design ideas" to claim that some other specific
proposal which we already have is a bad thing, and that the design
could be much more general. That is not very convincing, as apparently
nobody can follow your design to really understand whether what you
claim is true.

> For the general case, I'd rather add new PyUnicode_EncodeEx()
> and PyUnicode_DecodeEx() APIs which then take a Python
> context object as extra argument. The error treatment string
> would then define how to use this context object, e.g. 'callback'
> could be made to apply processing similar to what Walter
> suggested.

Ok, PyUnicode_EncodeEx would then invoke PyCodec_EncodeEx, which would
eventually end up in encodings.koi8_r.Codec.encode (or
encodings.koi8_r.Codec.encode_ex?). Now, how would that be implemented?

> The xxxEx() APIs will have to take special precautions to also
> work with pre-2.1 codecs though, since the codec API definition
> does not include the extra context object.

In the specific case of KOI8-R, what would these precautions look like,
written in, say, Python?
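One possible precaution, sketched under the assumption that an old codec only understands error strings: encode character by character in strict mode and hand each failing character to the callback. The function name encode_with_fallback and the (text, position) callback protocol are hypothetical; the codec is obtained with codecs.lookup, whose .encode takes (input, errors) and returns (bytes, length consumed).

```python
import codecs


def encode_with_fallback(encode, text, callback):
    """Hypothetical precaution for a codec that only understands error strings.

    encode is a stateless encoder with the (input, errors) -> (bytes, consumed)
    signature. Characters are encoded one at a time in strict mode; any that
    fail are passed to callback(text, position), whose replacement string is
    then encoded strictly. Only valid for stateless, character-at-a-time
    encodings such as charmap codecs.
    """
    out = []
    for pos, ch in enumerate(text):
        try:
            encoded, _ = encode(ch, 'strict')
        except UnicodeError:
            encoded, _ = encode(callback(text, pos), 'strict')
        out.append(encoded)
    return b''.join(out)


koi8_r_encode = codecs.lookup('koi8_r').encode
print(encode_with_fallback(koi8_r_encode, 'aб中', lambda t, i: '&#%d;' % ord(t[i])))
```

The per-character loop is slow, but it needs nothing from the old codec beyond its existing strict mode; it would be wrong for stateful encodings, where characters cannot be encoded in isolation.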

> > Specifically, as an author of an application that wants to extend
> > existing codecs, could you post some Python code that shows how to
> > create the context objects (including an implementation of the codec
> > object's class), and how to pass it to Unicodeobject.encode?
> 
> Sure, but only *after* the context object design has been implemented...
> otherwise there wouldn't be a point ;-)

So you want to implement it first, and discuss use cases later???
Or maybe you don't want to discuss the design at all?

> No, it does not: the error string parameter is defined as "const char*".

You mean, in PyUnicode_FromEncodedObject, PyUnicode_Decode, and other
C functions? So you would have to provide additional functions in the
C API, but that is the same as your proposal with the *Ex functions,
as I understand it.

> You can't change that to PyObject* in the C API and for the Python API
> I wouldn't want to introduce "switch semantics on type" variables.

Ah, but it's 'switch semantics on value' :-) Passing the string
'ignore' has different semantics from passing 'replace', which in
turn has different semantics from passing
codecs.REPLACE_WITH_XML_CHARACTER_ENTITIES, which happens to be
callable.

> Extending APIs is OK, changing them is not.

That is just an extension. For the C interface, it apparently means
duplication; for the Python interface, we can keep the old signatures
and extend the set of acceptable parameter values.

> I'll write a patch which implements the 'xml-escape' error
> treatment. Hopefully that will buy us some time to think of
> a design extension -- provided you play along :-)

Good. I'm willing to agree to any proposal once I can see that it does
what it was designed for...
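For comparison only: later Python versions adopted a callback mechanism along these lines (PEP 293). codecs.register_error maps an error-handler name to a callable that receives the UnicodeEncodeError and returns a replacement string plus the position at which encoding resumes, and 'xmlcharrefreplace' became a built-in handler. The sketch below uses that later Python 3 API, not the patch announced above; the handler name 'xml-escape' is simply reused from this thread.

```python
import codecs


def xml_escape(exc):
    """Error handler: replace unencodable characters with XML character references."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # Return the replacement text and the position at which encoding resumes.
    return ''.join('&#%d;' % ord(ch) for ch in bad), exc.end


codecs.register_error('xml-escape', xml_escape)

encoded = 'aб中'.encode('koi8_r', 'xml-escape')
print(encoded)
```

The errors argument stays a string, so existing signatures are untouched; the new behaviour comes from the value naming a registered callable.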

Regards,
Martin