[Python-ideas] Add "htmlcharrefreplace" error handler

Fri Jun 14 11:37:28 CEST 2013

On 2013-06-14 11:22 CEST, Antoine Pitrou wrote:
> On Fri, 14 Jun 2013 19:06:55 +1000
> Steven D'Aprano ... wrote:
>> On 14/06/13 18:49, Antoine Pitrou wrote:
>>> "Keeping the HTML source ASCII-only" is just silly IMO,
>>
>> Surely no sillier than "keep the Python std lib source ASCII-only".
>
> <ignored level="suggested"/> the difference
> between source code and hypertext documents?

Still in 2013, if you upload documents to at least one standardizing 
organization and you, as the author, use UTF-8, you are fine, as long as 
the document only uses ASCII characters ;-)

Any umlaut or other UTF-8-encoded typographic character that slips in 
ends up rendered as broken Latin-1.
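A minimal sketch of that failure mode, assuming the document is written as 
UTF-8 but decoded as Latin-1 somewhere down the chain (the sample name is 
made up). It also shows the ASCII-only workaround that Python's built-in 
`xmlcharrefreplace` error handler already provides:

```python
# Text as the author wrote it (hypothetical sample containing an umlaut).
original = "Müller"

# The author saves the document as UTF-8 ...
utf8_bytes = original.encode("utf-8")

# ... but a later step in the chain decodes those bytes as Latin-1,
# so the two-byte UTF-8 sequence for "ü" becomes two Latin-1 characters.
mojibake = utf8_bytes.decode("latin-1")
print(mojibake)    # MÃ¼ller

# Keeping the file ASCII-only avoids this: every non-ASCII character is
# replaced by a decimal numeric character reference.
ascii_safe = original.encode("ascii", errors="xmlcharrefreplace")
print(ascii_safe)  # b'M&#252;ller'
```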

It will take many more years, I presume, until the whole chain of 
submitted documents and of servers serving the received versions is 
really UTF-8 safe.

>>> and it doesn't
>>> warrant special support in Python's codec error handlers.
>>
>> We're talking about this as if it were a major change. Doesn't
>> this count as a trivial addition? The only question in my mind is, "Are the
>> HTML char ref rules different enough from the XML rules that Python
>> should provide both?"
>
> It's not trivial, it's additional C code in an important part of the
> language (unicode and codecs).
>
> And I haven't seen you propose a patch <ignored level="suggested"/>.
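For illustration only: a pure-Python approximation of what an 
"htmlcharrefreplace" handler might look like, registered through 
`codecs.register_error`. The handler name and its behaviour (prefer a named 
HTML entity from `html.entities.codepoint2name`, else fall back to a decimal 
reference) are assumptions for this sketch, not an existing built-in:

```python
import codecs
from html.entities import codepoint2name


def htmlcharrefreplace(exc):
    """Hypothetical handler: named HTML entity if one exists,
    otherwise a decimal numeric character reference."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    pieces = []
    for ch in exc.object[exc.start:exc.end]:
        cp = ord(ch)
        name = codepoint2name.get(cp)  # e.g. 252 -> "uuml"
        pieces.append(f"&{name};" if name else f"&#{cp};")
    # Return the replacement text and the index where encoding resumes.
    return "".join(pieces), exc.end


# "htmlcharrefreplace" is a name chosen for this sketch, not a stdlib handler.
codecs.register_error("htmlcharrefreplace", htmlcharrefreplace)

text = "Müller \u2192 caf\u00e9 \U0001F600"
xml_refs = text.encode("ascii", errors="xmlcharrefreplace")
html_refs = text.encode("ascii", errors="htmlcharrefreplace")
print(xml_refs)   # b'M&#252;ller &#8594; caf&#233; &#128512;'
print(html_refs)  # b'M&uuml;ller &rarr; caf&eacute; &#128512;'
```

The named form is only safe for HTML: XML without a DTD predefines just 
`amp`, `lt`, `gt`, `quot` and `apos`, which is one concrete way the two 
rule sets differ.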

Could we try to refrain from some of the b.t.w.'s\? (using trigraph-safe 
question mark encoding, in case some tool still has trigraphs turned on 
:-?)

All the best,
Stefan.