[I18n-sig] Proposal: Extended error handling for unicode.encode

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Thu, 4 Jan 2001 11:41:38 +0100


> How would such a scheme allow passing back control information
> such as: continue with the next character in the stream or
> break with an exception ?

If it wanted to break with an exception, it would raise one. So the
function really has two acceptable results: an exception, and a Unicode
object. Since most Python functions are allowed to raise exceptions,
that went without saying.
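
As a rough illustration of that contract (not the patch under
discussion), here is a minimal sketch in current Python. The names
encode_with_callback, xml_escape and strict are hypothetical, and the
per-character loop assumes a stateless encoding such as ASCII or
Latin-1:

```python
def encode_with_callback(text, encoding, on_error):
    """Encode text, asking on_error(char) about each unencodable character.

    Hypothetical helper: encoding one character at a time is only
    reasonable for stateless encodings.
    """
    out = []
    for ch in text:
        try:
            out.append(ch.encode(encoding))
        except UnicodeEncodeError:
            # The callback has two acceptable results: it returns a
            # replacement string (encoding continues with the next
            # character), or it raises an exception (encoding stops).
            out.append(on_error(ch).encode(encoding))
    return b"".join(out)


def xml_escape(ch):
    # "Continue": replace the character with a numeric character reference.
    return "&#%d;" % ord(ch)


def strict(ch):
    # "Break": simply raise.
    raise ValueError("cannot encode %r" % ch)


print(encode_with_callback("a\u20acb", "ascii", xml_escape))  # b'a&#8364;b'
```

Passing strict instead of xml_escape makes the same call raise
ValueError at the first unencodable character, so no separate control
channel is needed.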

> Which is what I'm talking about all along: the codecs know best
> what to do, so better extend them than try to fiddle in some
> information using a callback.

If that means touching the source of all codecs, then that would be an
unacceptable solution. Doing it in a generic way would be ok - except
that I still can't see *how* this could possibly work.

> I did propose a solution which would satisfy your needs: simply
> add a new error treatment 'xml-escape' to the builtin codecs
> which then does the needed XML escaping. XML is general enough
> to warrant such a step and the required changes are minor.

Sorry, I missed that. That would also solve the problem at hand. Since
nobody has come up with a different use case for a more general
solution, that might be the solution which we can reasonably implement
for 2.1.
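
To make the intended effect concrete: current Python ships an error
treatment of this kind under the name 'xmlcharrefreplace' (not the
'xml-escape' name proposed above, and not something available in 2.1).
A short sketch of what it produces:

```python
# Characters that ASCII cannot represent become numeric character
# references; everything else is encoded as usual.
text = "caf\u00e9 \u20ac"
encoded = text.encode("ascii", "xmlcharrefreplace")
print(encoded)  # b'caf&#233; &#8364;'
```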

> Another candidate for a new error treatment would be
> 'unicode-escape' which then replaces the character in question with
> '\uXXXX'.

+0. While that falls into the same category, I haven't seen anybody
saying "I need such a feature".
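
For comparison, the closest equivalent in current Python is the
'backslashreplace' error treatment (again a different name from the
'unicode-escape' suggested above). A minimal sketch:

```python
# The unencodable character is replaced by its backslash escape.
escaped = "a\u20acb".encode("ascii", "backslashreplace")
print(escaped)  # b'a\\u20acb'
```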

> For the general case, I'd rather add new PyUnicode_EncodeEx()
> and PyUnicode_DecodeEx() APIs which then take a Python
> context object as extra argument. The error treatment string
> would then define how to use this context object, e.g. 'callback'
> could be made to apply processing similar to what Walter
> suggested.

What other acceptable values for the string would you foresee?

Regards,
Martin