[Python-ideas] Add "htmlcharrefreplace" error handler

Fri Jun 14 17:25:37 CEST 2013

On Fri, 14 Jun 2013 18:09:16 +0300
Serhiy Storchaka <storchaka at gmail.com>
wrote:
> 14.06.13 11:49, Antoine Pitrou wrote:
> > I'd like to know which good reasons there are to not use utf-8 for HTML
> > pages in 2013.
> 
> Russian text requires 2 bytes per character in utf-8 (not counting 
> spaces, punctuation and markup) and only 1 byte per character in any 
> dedicated 8-bit encoding (cp1251/cp866/koi8-r). The same holds for other 
> European non-Latin alphabets.

And even for Latin-based ones (e.g. latin-1), but if you really care about this,
it's certainly more efficient to compress your HTTP response than
to try to save space at the character level.
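A quick way to check both claims is to encode the same Russian text as UTF-8 and as cp1251, then gzip each. This is only a sketch: the sample sentence is made up, and because it is repeated to fake a page body, gzip will do much better here than on a real, less repetitive page, so treat the compressed numbers as illustrative only.

```python
import gzip


def encoded_sizes(text, legacy="cp1251"):
    """Return raw and gzip-compressed byte counts for `text` in UTF-8 and a legacy 8-bit codec."""
    utf8 = text.encode("utf-8")
    legacy_bytes = text.encode(legacy)
    return {
        "utf-8": len(utf8),
        legacy: len(legacy_bytes),
        "utf-8 + gzip": len(gzip.compress(utf8)),
        legacy + " + gzip": len(gzip.compress(legacy_bytes)),
    }


# Hypothetical sample: a short Russian sentence repeated to mimic a page body.
sample = "Съешь же ещё этих мягких французских булок, да выпей чаю. " * 50

for name, size in encoded_sizes(sample).items():
    print(f"{name:>14}: {size} bytes")
```

Cyrillic letters (U+0400 to U+04FF) take two bytes each in UTF-8 and one in cp1251, while spaces and punctuation take one byte in both, which is exactly the "not counting spaces, punctuation and markup" caveat above.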

> Some old databases contain data in one of these 
> 8-bit encodings, and generating an HTML page in the same encoding does not 
> require encoding/decoding at all.

If it doesn't require encoding/decoding, how are you going to specify
an encoding error handler?
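To make the point concrete: an error handler is only consulted while a `str` is being encoded to bytes, so bytes read straight from a cp1251 database and written out unchanged never reach one. The sketch below compares the built-in "xmlcharrefreplace" with a hypothetical "htmlcharrefreplace" handler (the name comes from this thread's subject; it is not a standard Python handler) registered via `codecs.register_error`. As an assumption, it prefers the HTML 4 named entities in `html.entities.codepoint2name` and falls back to numeric references.

```python
import codecs
from html.entities import codepoint2name


def htmlcharrefreplace(exc):
    """Hypothetical encode error handler: named HTML entity if one exists, else &#N;."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    parts = []
    # exc.start:exc.end may cover several consecutive unencodable characters.
    for ch in exc.object[exc.start:exc.end]:
        name = codepoint2name.get(ord(ch))
        parts.append(f"&{name};" if name else f"&#{ord(ch)};")
    # Return the replacement text and the position where encoding resumes.
    return "".join(parts), exc.end


# Registered under a made-up name for this sketch only.
codecs.register_error("htmlcharrefreplace", htmlcharrefreplace)

# pi, "almost equal" and the check mark are not representable in cp1251.
text = "Число π ≈ 3.14 ✓"
print(text.encode("cp1251", "xmlcharrefreplace").decode("cp1251"))
print(text.encode("cp1251", "htmlcharrefreplace").decode("cp1251"))
```

The first line prints numeric references for all three characters; the second uses `&pi;` and `&asymp;` and falls back to `&#10003;` for the check mark, which has no HTML 4 entity name.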

Regards

Antoine.