Convert from unicode chars to HTML entities
Adonis Vargas
adonis at REMOVETHISearthlink.net
Sun Jan 28 22:44:09 EST 2007
Steven D'Aprano wrote:
> I have a string containing Latin-1 characters:
>
> s = u"© and many more..."
>
> I want to convert it to HTML entities:
>
> result =>
> "&copy; and many more..."
>
> Decimal/hex escapes would be acceptable:
> "&#169; and many more..."
> "&#xA9; and many more..."
>
> I can look up tables of HTML entities on the web (they're a dime a
> dozen), turn them into a dict mapping character to entity, then convert
> the string by hand. Is there a "batteries included" solution that doesn't
> involve reinventing the wheel?
>
>
It's *very* ugly, but I'm pretty sure you can make it look prettier.
    import htmlentitydefs as entity

    s = u"© and many more..."
    t = u""
    for i in s:
        # Only characters that have a named HTML entity are escaped.
        if ord(i) in entity.codepoint2name:
            # The code point is just ord(i); no need to go via the name.
            entityCode = ord(i)
            # A character reference must end with a semicolon.
            t += u"&#" + unicode(entityCode) + u";"
        else:
            t += i
    print t
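
For anyone reading this on Python 3, here is a rough sketch of the same idea. It assumes the module's Python 3 name, html.entities (htmlentitydefs in Python 2); the helper names to_entities and to_charrefs are made up for illustration. The first emits named entities from the codepoint2name table, and the second leans on the built-in "xmlcharrefreplace" error handler, which is probably the closest thing to a "batteries included" answer if decimal escapes are acceptable.

```python
from html.entities import codepoint2name  # htmlentitydefs on Python 2


def to_entities(text):
    """Replace each character that has a named HTML entity with &name;.

    Hypothetical helper. Note that &, <, > and " are in the table too,
    so they are escaped as well.
    """
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        out.append("&%s;" % name if name else ch)
    return "".join(out)


def to_charrefs(text):
    """Replace every non-ASCII character with a decimal reference.

    Hypothetical helper built on the standard "xmlcharrefreplace"
    codec error handler; ASCII characters are left untouched.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


s = "\u00a9 and many more..."
print(to_entities(s))   # &copy; and many more...
print(to_charrefs(s))   # &#169; and many more...
```

The two differ in scope: to_entities only touches characters listed in the entity table, while to_charrefs escapes anything outside ASCII and leaves & and < alone.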
Hope this helps.
Adonis