Pound sign problem

Tim Chase python.list at tim.thechases.com
Tue Apr 11 12:55:06 EDT 2017


On 2017-04-12 02:29, Steve D'Aprano wrote:
> >> In 2017, unless you are reading from old legacy files created
> >> using a non-Unicode encoding, you should just use UTF-8.  
> > 
> > Thanks for your opinion. My opinion differs.  
> 
> What would you suggest then, if not UTF-8?
> 
> My personal favourite legacy encoding is MacRoman, but I wouldn't
> recommend anyone use it except to interoperate with legacy Mac
> applications and/or data from the 80s and 90s.
> 
> What's your recommendation? "Anything but ASCII"?

Heh, how about "Unicode as ASCII-compatible-Python-strings"? ;-)

Got this from Peter Otten a while back, in response to my request for
functionality along these lines:

http://www.mail-archive.com/python-list@python.org/msg420100.html

-tkc



$ cat codecs_mynamereplace.py
# -*- coding: utf-8 -*-
import codecs
import unicodedata

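try:
    # Python 3.5+ ships a built-in "namereplace" error handler.
    codecs.namereplace_errors
except AttributeError:
    # Older Pythons: register an equivalent handler under the same name.
    print("using mynamereplace")
    def mynamereplace(exc):
        # Replace each unencodable character with \N{NAME}; resume at exc.end.
        return u"".join(
            "\\N{%s}" % unicodedata.name(c)
            for c in exc.object[exc.start:exc.end]
        ), exc.end
    codecs.register_error("namereplace", mynamereplace)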


print(u"mañana".encode("ascii", "namereplace").decode())
$ python3.5 codecs_mynamereplace.py
ma\N{LATIN SMALL LETTER N WITH TILDE}ana
$ python3.4 codecs_mynamereplace.py
using mynamereplace
ma\N{LATIN SMALL LETTER N WITH TILDE}ana
$ python2.7 codecs_mynamereplace.py
using mynamereplace
ma\N{LATIN SMALL LETTER N WITH TILDE}ana
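
One caveat with the fallback handler above: unicodedata.name(c) raises
ValueError for characters that have no name, such as control characters.
Below is a rough sketch of a variant that falls back to numeric escapes in
that case, and that also shows the ASCII result can be turned back into text
with the "unicode_escape" codec. It assumes Python 3; the handler name
"namereplace_fallback" and the function name are made up for illustration
and are not part of the stdlib or of Peter's original.

```python
import codecs
import unicodedata


def namereplace_with_fallback(exc):
    """Emit \\N{NAME} where a name exists, else a numeric \\u or \\U escape."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc  # this sketch only handles encoding errors
    pieces = []
    for ch in exc.object[exc.start:exc.end]:
        # Passing a default avoids the ValueError for unnamed characters.
        name = unicodedata.name(ch, None)
        if name is not None:
            pieces.append("\\N{%s}" % name)
        elif ord(ch) <= 0xFFFF:
            pieces.append("\\u%04x" % ord(ch))
        else:
            pieces.append("\\U%08x" % ord(ch))
    # Return the replacement text and the position at which to resume.
    return "".join(pieces), exc.end


# Hypothetical handler name, chosen so it cannot shadow the built-in one.
codecs.register_error("namereplace_fallback", namereplace_with_fallback)

# U+00A3 is the pound sign; U+0085 is a control character with no name.
original = "\u00a35 ma\u00f1ana \u0085"
encoded = original.encode("ascii", "namereplace_fallback")

# The \N{...} and \uXXXX escapes are understood by "unicode_escape".
decoded = encoded.decode("unicode_escape")
```

The round trip is only safe when the original text contains no literal
backslashes, since the handler never sees (and so never escapes) characters
that are already ASCII.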


