[Python-ideas] Add "htmlcharrefreplace" error handler

Serhiy Storchaka storchaka at gmail.com
Tue Jun 11 16:49:51 CEST 2013


I propose adding an "htmlcharrefreplace" error handler, which is similar to 
the "xmlcharrefreplace" error handler but uses HTML entity names where possible.

 >>> '∀ x∈ℜ'.encode('ascii', 'xmlcharrefreplace')
b'&#8704; x&#8712;&#8476;'
 >>> '∀ x∈ℜ'.encode('ascii', 'htmlcharrefreplace')
b'&forall; x&isin;&real;'

Possible implementation:

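import codecs
from html.entities import codepoint2name

def htmlcharrefreplace_errors(exc):
    # Only encoding errors can be handled here; re-raise anything else.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    try:
        # Use the HTML entity name of the first unencodable character.
        replace = '&%s;' % codepoint2name[ord(exc.object[exc.start])]
    except KeyError:
        # No named entity: fall back to numeric references. Note that this
        # covers the whole range exc.start:exc.end, not just one character.
        return codecs.xmlcharrefreplace_errors(exc)
    # Resume encoding right after the character just replaced.
    return replace, exc.start + 1

codecs.register_error('htmlcharrefreplace', htmlcharrefreplace_errors)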
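
One edge case in the implementation above: the fallback to codecs.xmlcharrefreplace_errors() replaces the whole range exc.start:exc.end, so a character that has an entity name but directly follows one that has none should come out as a numeric reference. A minimal sketch of a variant that walks the whole failing run instead is given below. The function name htmlcharrefreplace_run and the handler name it is registered under are made up for illustration and are not part of the proposal.

```python
import codecs
from html.entities import codepoint2name

def htmlcharrefreplace_run(exc):
    """Hypothetical variant: handle every character in the failing run."""
    if not isinstance(exc, UnicodeEncodeError):
        raise TypeError("don't know how to handle %r" % exc)
    parts = []
    for ch in exc.object[exc.start:exc.end]:
        cp = ord(ch)
        name = codepoint2name.get(cp)
        # Named entity where html.entities knows one, decimal reference otherwise.
        parts.append('&%s;' % name if name is not None else '&#%d;' % cp)
    # Resume encoding after the whole run, since all of it was replaced.
    return ''.join(parts), exc.end

# Registered under a made-up name so it cannot clash with a real handler.
codecs.register_error('htmlcharrefreplace_run', htmlcharrefreplace_run)

# U+044F (Cyrillic 'я') has no entry in codepoint2name; U+2200 ('∀') does.
mixed = '\u044f\u2200'.encode('ascii', 'htmlcharrefreplace_run')
print(mixed)  # b'&#1103;&forall;'
```

Returning exc.end rather than exc.start + 1 also means the handler is called once per run of unencodable characters rather than once per character.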

Even if this handler is not registered by default, it may be worth 
providing htmlcharrefreplace_errors() in the html or html.entities module.


