smart quotes

Adrian Smith adrian_p_smith at yahoo.com
Tue Aug 26 18:30:28 EDT 2008


On Aug 26, 4:13 pm, Peter Otten <__pete... at web.de> wrote:
> Adrian Smith wrote:
> > Can anyone tell me how to get rid of smart quotes in html using
> > Python? I've tried variations on
> > stuff = string.replace(stuff, "\“", "\""), but to no avail, presumably
> > because they're not standard ASCII.
>
> Convert the string to unicode. For that you have to know its encoding. I
> assume UTF-8:
>
> >>> s = "a “smart quote” example"
> >>> u = s.decode("utf-8")
>
> Now you can replace the quotes (I looked up the codes in Wikipedia):
>
> >>> u.replace(u"\u201c", "").replace(u"\u201d", "")
>
> u'a smart quote example'
>
> Alternatively, if you have many characters to remove, translate() is more
> efficient:
>
> >>> u.translate(dict.fromkeys([0x201c, 0x201d, 0x2018, 0x2019]))
>
> u'a smart quote example'
>
> If necessary convert the result back to the original encoding:
>
> >>> clean = u.translate(dict.fromkeys([0x201c, 0x201d, 0x2018, 0x2019]))
> >>> clean.encode("utf-8")
>
> 'a smart quote example'
>
> Peter
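
For anyone reading this on Python 3: the session quoted above is Python 2, where a plain string literal is a byte string. A rough Python 3 equivalent might look like the sketch below. It assumes, as Peter does, that the input is UTF-8; the variable names (raw, text, smart_quotes, clean, result) are only illustrative.

```python
# Python 3 sketch of the quoted session. Assumption: the input bytes are UTF-8.
raw = "a \u201csmart quote\u201d example".encode("utf-8")

# bytes -> str (the Python 3 counterpart of s.decode("utf-8") in Python 2)
text = raw.decode("utf-8")

# Code points mapped to None are deleted by str.translate().
smart_quotes = dict.fromkeys([0x201C, 0x201D, 0x2018, 0x2019])
clean = text.translate(smart_quotes)

# Convert back to the original encoding only if bytes are needed.
result = clean.encode("utf-8")

print(clean)
```

If the HTML is read with open(path, encoding="utf-8"), the decode step is already done and only the translate() call is needed.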

Brilliant, thanks!
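
A note on the original question, which was about turning smart quotes into plain ones rather than deleting them: translate() also accepts replacement strings as values, so the same table approach can map each curly quote to its ASCII counterpart. The following is a minimal Python 3 sketch; the names ASCII_QUOTES and straighten_quotes are hypothetical and chosen just for this example.

```python
# Map each curly quote code point to its plain ASCII counterpart.
# ASCII_QUOTES and straighten_quotes are illustrative names, not from the thread.
ASCII_QUOTES = {
    0x201C: '"',  # left double quotation mark
    0x201D: '"',  # right double quotation mark
    0x2018: "'",  # left single quotation mark
    0x2019: "'",  # right single quotation mark
}


def straighten_quotes(text):
    """Return text with curly quotes replaced by ASCII quotes."""
    return text.translate(ASCII_QUOTES)


print(straighten_quotes("a \u201csmart quote\u201d example, isn\u2019t it"))
```

Because the table is an ordinary dict, further characters can be added to it without chaining more replace() calls.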


