[Python-ideas] Add "htmlcharrefreplace" error handler

Fri Jun 14 10:49:31 CEST 2013

On Fri, 14 Jun 2013 09:44:09 +0200
"M.-A. Lemburg" <mal at egenix.com> wrote:
> 
> > IMHO character references (named or numerical) should never be used in
> > HTML (with the exception of &quot;, &gt; and &lt;).
> > They exist mainly for three reasons:
> > 1) provide a way to include characters that are not available in the
> > used encoding (e.g. if you are using an obsolete encoding like
> > windows-1252 but still want to use "fancy" characters);
> > 2) to keep the HTML source ASCII-only;
> 
> This is the main reason for using them. HTML's default encoding
> is Latin-1, unlike XML.

I'd like to know what good reasons there are not to use UTF-8 for HTML
pages in 2013.
"Keeping the HTML source ASCII-only" is just silly IMO, and it doesn't
warrant special support in Python's codec error handlers.
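
For reference, here is a rough sketch of the alternatives under discussion.
Encoding to UTF-8 needs no error handler at all, and the built-in
"xmlcharrefreplace" handler already yields ASCII-only output with numeric
references. The "htmlcharrefreplace" handler below is hypothetical: it is
registered by hand with codecs.register_error() and is only an assumption
about how the proposed handler might behave (named references where
html.entities knows one, numeric ones otherwise).

```python
import codecs
from html.entities import codepoint2name


def htmlcharrefreplace(exc):
    """Hypothetical handler: use named HTML references where one exists."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    parts = []
    # exc.start:exc.end covers the run of characters that failed to encode.
    for ch in exc.object[exc.start:exc.end]:
        name = codepoint2name.get(ord(ch))
        # Fall back to a numeric reference when no named entity is known.
        parts.append("&%s;" % name if name else "&#%d;" % ord(ch))
    return "".join(parts), exc.end


# Not part of the standard library; registered here for illustration only.
codecs.register_error("htmlcharrefreplace", htmlcharrefreplace)

text = "caf\u00e9 \u20ac5 \u2603"  # e-acute, euro sign, snowman

as_utf8 = text.encode("utf-8")                          # no handler needed
as_numeric = text.encode("ascii", "xmlcharrefreplace")  # built-in handler
as_named = text.encode("ascii", "htmlcharrefreplace")   # sketch above

print(as_utf8)
print(as_numeric)
print(as_named)
```

With UTF-8 the text round-trips unchanged. The two handlers differ only in
how they spell the references in the ASCII output.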

Regards

Antoine.