[I18n-sig] Codecs

M.-A. Lemburg mal@lemburg.com
Mon, 05 Jun 2000 15:11:56 +0200


Brian Takashi Hooper wrote:
> 
> This issue came up before on this list, I think Andy Robinson suggested
> it before in the midst of a lot of other Unicode musings.  One thing I
> remember Andy mentioned was that a codec object could then additionally
> contain methods in addition to those required by the codec API, for
> example a method to fix broken legacy encoding input strings, etc.
> 
> Personally, I would be happier to get an object back from
> codecs.lookup(), one vote in favor if it matters.
> 
> Are there any good reasons to prefer getting a tuple back from codecs.lookup()?

Here are some: 

* The tuple entries have two different flavours: the first
two are readily usable encode/decode APIs, while the last
two point to factory functions which can be used to create
new objects.

* Tuples are much easier to create and query at C level than
Python objects having a certain interface.

* The tuples can easily be cached, and this is what the codec
registry currently does to enhance performance. Attribute lookups
on an object are slower than indexing into a tuple (OK, not much of
an argument, because the conversion itself is likely to cause much
more overhead).

* There is quite a lot of code in the distribution which already uses
the tuple value (all codecs, the codec registry, sample apps,
etc.).

* Who's going to write the code and produce the patches?
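To make the "two flavours" point concrete, here is a minimal sketch. It assumes present-day Python 3, where codecs.lookup() returns a CodecInfo object; that object is a tuple subclass and still unpacks into the same four entries:

```python
import codecs
import io

# The first two entries are ready-to-use functions; the last two are
# factories that wrap an existing byte stream in a new reader/writer object.
encode, decode, stream_reader, stream_writer = codecs.lookup('utf-8')

# encode/decode return a (result, length_consumed) pair.
encoded, n_chars = encode('\u00e9t\u00e9')   # (b'\xc3\xa9t\xc3\xa9', 3)
decoded, n_bytes = decode(encoded)           # ('\u00e9t\u00e9', 5)

# The factories create new stream objects around a stream you supply.
buf = io.BytesIO()
writer = stream_writer(buf)
writer.write(decoded)                        # buf now holds the UTF-8 bytes

reader = stream_reader(io.BytesIO(encoded))
round_trip = reader.read()                   # '\u00e9t\u00e9'
```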

Note that you can easily add your own wrappers of codecs.lookup()
which then give you an object instead of the tuple.

The extensibility argument does point to a weakness of the current
solution, but is there really such a great need for extra
codec APIs? (Please remember that all codec writers would
have to implement these new APIs; the more you put in
there, the more difficult and less attractive it gets...)
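A minimal sketch of such a wrapper, assuming Python 3: the `Codec` class, the `lookup_object()` helper and the `decode_lenient()` method are hypothetical names for illustration and are not part of the codecs module. Because the extra method lives in the wrapper, individual codec writers do not have to implement it, and `functools.lru_cache` stands in for the registry's own caching:

```python
import codecs
import functools
import io


class Codec:
    """Hypothetical object view of the 4-tuple returned by codecs.lookup()."""

    def __init__(self, encoding):
        self.encoding = encoding
        # Same order as the tuple: encoder, decoder, reader and writer factories.
        (self.encode, self.decode,
         self.stream_reader, self.stream_writer) = codecs.lookup(encoding)

    def decode_lenient(self, data):
        # Hypothetical extra API, written once here rather than in every codec:
        # undecodable bytes become U+FFFD instead of raising an error.
        text, _consumed = self.decode(data, 'replace')
        return text


@functools.lru_cache(maxsize=None)
def lookup_object(encoding):
    """Return a cached Codec object (keyed on the exact name string)."""
    return Codec(encoding)


# Paul's example, with an in-memory buffer standing in for open('/tmp/output', 'wb').
buf = io.BytesIO()
output = lookup_object('UTF-8').stream_writer(buf)
output.write('caf\u00e9')
```

The tuple stays the underlying interface, so existing codecs and the registry are untouched; only callers who want attribute access pay for the wrapper.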

> --Brian
> 
> On Sun, 04 Jun 2000 09:54:01 -0500
> Paul Prescod <paul@prescod.net> wrote:
> 
> > Should codecs be returned to the user as objects instead of tuples?
> > Today we have:
> >
> > (UTF8_encode, UTF8_decode,
> >       UTF8_streamreader, UTF8_streamwriter) = codecs.lookup('UTF-8')
> >
> > output = UTF8_streamwriter( open( '/tmp/output', 'wb') )
> >
> > I think this would be a little simpler:
> >
> > output=codecs.lookup('UTF-8').stream_writer( open( '/tmp/output', 'wb') )
> >
> > The object solution is more extensible, requires less "bogus"
> > assignments and does not require the user to remember the order of the
> > return values.
> >
> > --
> >  Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
> > Simplicity does not precede complexity, but follows it.
> >       - http://www.cs.yale.edu/~perlis-alan/quotes.html
> >
> > _______________________________________________
> > I18n-sig mailing list
> > I18n-sig@python.org
> > http://www.python.org/mailman/listinfo/i18n-sig
> >
> 

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/