smart quotes

Peter Otten __peter__ at web.de
Tue Aug 26 03:13:46 EDT 2008


Adrian Smith wrote:

> Can anyone tell me how to get rid of smart quotes in html using
> Python? I've tried variations on
> stuff = string.replace(stuff, "\“", "\""), but to no avail, presumably
> because they're not standard ASCII.

Convert the string to Unicode first. For that you have to know its encoding;
I assume UTF-8:

>>> s = "a “smart quote” example"
>>> u = s.decode("utf-8")

Now you can replace the quotes (I looked up the code points on Wikipedia):

>>> u.replace(u"\u201c", "").replace(u"\u201d", "")
u'a smart quote example'
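
The session above is Python 2. On Python 3, where str is already Unicode,
the same two steps might look roughly like the sketch below. The name raw
and the assumption that the input arrives as UTF-8 bytes are mine, for
illustration only:

```python
# Bytes as they might arrive from a UTF-8 encoded HTML file (assumed encoding).
raw = "a \u201csmart quote\u201d example".encode("utf-8")

# Decode to text first, so the quotes can be matched by code point
# rather than by their multi-byte UTF-8 sequences.
text = raw.decode("utf-8")

# Remove the left and right double quotation marks.
stripped = text.replace("\u201c", "").replace("\u201d", "")
print(stripped)
```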

Alternatively, if you have many characters to remove, translate() is more
efficient:

>>> u.translate(dict.fromkeys([0x201c, 0x201d, 0x2018, 0x2019]))
u'a smart quote example'
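
If the goal is to swap the curly quotes for plain ASCII ones rather than
delete them (which is what the original question seems to attempt),
translate() can take replacement values as well. A minimal Python 3 sketch;
QUOTE_MAP and straighten_quotes are hypothetical names, not part of any
library:

```python
# Map each curly quote to a plain ASCII counterpart. Replacing instead of
# deleting is an assumed variation on the translate() approach.
QUOTE_MAP = str.maketrans({
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
})

def straighten_quotes(text):
    """Return text with curly quotes replaced by ASCII quotes."""
    return text.translate(QUOTE_MAP)

print(straighten_quotes("a \u201csmart quote\u201d example"))
```

A single translate() call makes one pass over the string no matter how many
characters are in the table, which is why it scales better than a chain of
replace() calls.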

If necessary, convert the result back to the original encoding:

>>> clean = u.translate(dict.fromkeys([0x201c, 0x201d, 0x2018, 0x2019]))
>>> clean.encode("utf-8")
'a smart quote example'
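
The whole round trip can be wrapped in one function. The following is only
a Python 3 sketch: strip_smart_quotes is a hypothetical helper, and the
Windows-1252 input is an assumed second case to show why the encoding has
to be known up front:

```python
# Code points to delete: left/right double and single quotation marks.
SMART_QUOTES = dict.fromkeys([0x201C, 0x201D, 0x2018, 0x2019])

def strip_smart_quotes(data, encoding="utf-8"):
    """Decode bytes, delete smart quotes, and re-encode (hypothetical helper)."""
    text = data.decode(encoding)
    return text.translate(SMART_QUOTES).encode(encoding)

# The same text stored in two different encodings.
utf8_bytes = "a \u201csmart quote\u201d example".encode("utf-8")
cp1252_bytes = b"a \x93smart quote\x94 example"  # Windows-1252 curly quotes

print(strip_smart_quotes(utf8_bytes))
print(strip_smart_quotes(cp1252_bytes, "cp1252"))
```

Passing the wrong encoding either raises UnicodeDecodeError or silently
leaves the quotes in place, so it is worth checking the page's declared
charset before decoding.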

Peter


