codecs limitation

Wed Feb 18 11:06:49 EST 2004

On Wed, 18 Feb 2004 17:20:38 +0300 (MSK), 
	Denis S. Otkidach <ods at strana.ru> wrote:
> I have the same question as stated in comments: should we really
> enforce this and forget the idea to define some specialized
> encodings like 'html'?

I suppose it depends on what the codecs system is *for*.  If it's an
interface that goes between the abstract world of Unicode code
points and the concrete world of 8-bit characters that represent those code
points, then the idea of returning anything but an 8-bit string from
.encode() doesn't make sense.  If codecs are for arbitrary string-to-string
transformations, then the restriction should be relaxed.
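
To make the two readings concrete, here is a small sketch. It assumes a
present-day Python 3 interpreter, which postdates this message, and uses
only the standard codecs module; the sample strings are arbitrary.

```python
import codecs

# Reading 1: a codec maps Unicode code points to 8-bit data.
# str.encode() hands back a bytes object.
encoded = "caf\u00e9".encode("utf-8")
print(type(encoded).__name__, encoded)

# Reading 2: a codec is an arbitrary string-to-string transformation.
# rot_13 is looked up in the same codec registry but returns text.
rotated = codecs.encode("html", "rot_13")
print(type(rotated).__name__, rotated)
```

Under the first reading an 'html' codec has no place; under the second it
would sit next to rot_13.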

In any case, it's straightforward to define a separate string-like class
that escapes the string, e.g. as Quixote does:

>>> from quixote.html import htmltext as h
>>> h('<h1>%s</h1>') % 'Page title'
<htmltext '<h1>Page title</h1>'>
>>> h('<h1>%s</h1>') % 'Page title with <, &, > in it'
<htmltext '<h1>Page title with &lt;, &amp;, &gt; in it</h1>'>
>>> h('<p>This is a test.')  + '<a href="http://example.com>'
<htmltext '<p>This is a test.&lt;a href=&quot;http://example.com&gt;'>
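
For anyone who wants the idea without Quixote, a rough sketch of such a
class follows. It is not Quixote's implementation: the names EscapedText
and _quote are made up for illustration, it assumes present-day Python 3
and the standard html.escape function, and it only handles positional
% arguments.

```python
from html import escape


def _quote(value):
    """Escape plain strings; leave already-safe text and non-strings alone."""
    if isinstance(value, EscapedText) or not isinstance(value, str):
        return value
    return EscapedText(escape(value, quote=True))


class EscapedText(str):
    """Hypothetical str subclass whose contents are treated as safe markup."""

    def __mod__(self, args):
        # Escape each plain-string argument before interpolating it.
        if not isinstance(args, tuple):
            args = (args,)
        quoted = tuple(_quote(arg) for arg in args)
        return EscapedText(str.__mod__(self, quoted))

    def __add__(self, other):
        # safe + plain: escape the right-hand side.
        return EscapedText(str.__add__(self, _quote(other)))

    def __radd__(self, other):
        # plain + safe: escape the left-hand side.
        return EscapedText(str.__add__(_quote(other), self))

    def __repr__(self):
        return "<EscapedText %s>" % str.__repr__(self)


title = EscapedText('<h1>%s</h1>') % 'Page title with <, &, > in it'
print(repr(title))
```

The design choice is that anything not already marked as safe gets escaped
when it is combined with safe text, so no codec is involved at all.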

--amk