[I18n-sig] XML and codecs

Walter Doerwald walter@livinglogic.de
Wed, 06 Jun 2001 17:51:10 +0200


On 06.06.01 at 17:26 M.-A. Lemburg wrote:

> Walter Doerwald wrote:
> > 
> > On 05.06.01 at 11:02 M.-A. Lemburg wrote:
> > 
> > > [...]
> > >
> > > Sure, but it breaks the current API completely. The above
> > > mechanism is different in that the communication in the error
> > > case is done by means of an exception. While this is not as
> > > fast as a callback it does have some advantages:
> > >
> > > * you can write the error handling code in the context using
> > >   the codec
> > >
> > > * it enables you to write error handling code at higher levels
> > >   in the calling stack
> > 
> > But this means that you would have to allow the encoder to keep
> > state between calls. That's no issue with a callback, because there
> > is only one call.
> 
> Well, either the codec keeps state or your application;
> here's some pseudo code to illustrate the first situation:
> 
> def do_something(data):
> 
>     StreamWriter = codec.lookup('myencoding')[3]
>     output = cStringIO(data)
>     writer = StreamWriter(output, 'break')
>     while 1:
>         try:
>             writer.write(data)
>         except UnicodeBreakError, (reason, position, work):
>             # Write data converted so far
>             output.write(work)
>             # Roll back 10 chars in the input and retry
>             data = data[position - 10:]
>         else:
>             break
>     return output.getvalue()

Apart from the fact that I have to use a StreamWriter
(which I probably would have to anyway, since only one BOM
is required at the start of an output file), this looks usable.
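
To illustrate the BOM point, here is a minimal sketch. It assumes
the Python 3 spelling of the standard codecs module (str literals,
io.BytesIO), which is not the API as it stood when this was written;
the sample strings are arbitrary.

```python
import codecs
import io

# Stateless encoding: every call starts from scratch, so each
# chunk gets its own BOM.
stateless = "a".encode("utf-16") + "b".encode("utf-16")

# StreamWriter: state is kept between write() calls, so the BOM
# is emitted only once, before the first chunk.
output = io.BytesIO()
writer = codecs.getwriter("utf-16")(output)
writer.write("a")
writer.write("b")
streamed = output.getvalue()

print(len(stateless), len(streamed))
```

The streamed result is one BOM shorter than the stateless one, which
is why chunked output wants a StreamWriter in any case.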

The big question is: Is 'break' a temporary workaround
that will go away as soon as we have error handling
callbacks? Do we want error handling callbacks?
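
For comparison, a rough sketch of what the callback style could look
like from the caller's side. Everything here is hypothetical:
encode_with_callback and xml_charref are made-up names, not part of
any codec API, and the character-by-character loop is only a toy
that ignores stateful encodings and speed. It is written in Python 3
syntax.

```python
def encode_with_callback(text, encoding, on_error):
    """Hypothetical helper: encode text in a single call and ask
    on_error(text, position) for a replacement string whenever a
    character cannot be encoded."""
    chunks = []
    for position, char in enumerate(text):
        try:
            chunks.append(char.encode(encoding))
        except UnicodeEncodeError:
            # The callback decides what to emit instead; the
            # replacement itself must be encodable.
            replacement = on_error(text, position)
            chunks.append(replacement.encode(encoding))
    return b"".join(chunks)


def xml_charref(text, position):
    """Hypothetical callback: replace the offending character
    with an XML numeric character reference."""
    return "&#%d;" % ord(text[position])


result = encode_with_callback("gr\u00fc\u00df \u20ac", "ascii", xml_charref)
print(result)
```

The caller makes exactly one call and needs no retry loop, so no
state has to survive between calls. The price is that the error
handling lives in the callback and not in the calling context.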

And finally: How fast is it?

> > > * it fits in with the current API
> > 
> > That's right. Unfortunately there are a lot of functions that
> > would have to be changed.
> 
> That's why I prefer small steps rather than replacing the
> complete codec suite with new interfaces.

The type of one argument changes in all the functions, i.e.
there's a new set of *Ex functions, where
	const char *errors
becomes
	PyObject *errors

Bye,
   Walter Dörwald

-- 
Walter Dörwald · LivingLogic AG · Bayreuth, Germany ·
www.livinglogic.de