Convert from unicode chars to HTML entities

Adonis Vargas adonis at REMOVETHISearthlink.net
Sun Jan 28 22:44:09 EST 2007


Steven D'Aprano wrote:
> I have a string containing Latin-1 characters:
> 
> s = u"© and many more..."
> 
> I want to convert it to HTML entities:
> 
> result =>
> "&copy; and many more..."
> 
> Decimal/hex escapes would be acceptable:
> "&#169; and many more..."
> "&#xa9; and many more..."
> 
> I can look up tables of HTML entities on the web (they're a dime a
> dozen), turn them into a dict mapping character to entity, then convert
> the string by hand. Is there a "batteries included" solution that doesn't
> involve reinventing the wheel?
> 
> 

It's *very* ugly, but I'm pretty sure you can make it look prettier.

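# -*- coding: utf-8 -*-
# htmlentitydefs was renamed html.entities in Python 3
try:
    import htmlentitydefs as entity
except ImportError:
    import html.entities as entity

s = u"© and many more..."
t = u""
for ch in s:
    # Escape only characters that have a named HTML entity
    if ord(ch) in entity.codepoint2name:
        # Decimal character reference, e.g. U+00A9 -> "&#169;";
        # the trailing ";" terminates the reference
        t += u"&#%d;" % ord(ch)
    else:
        t += ch
print(t)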
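A tidier sketch of the same idea, producing the named form ("&copy;") that the question asked for first. It relies on the codepoint2name table, which maps code points to HTML 4 entity names; to_named_entities is a hypothetical helper name, not a library function. The try/except is there so it runs under either Python 2 (htmlentitydefs) or Python 3 (html.entities).

```python
try:
    from htmlentitydefs import codepoint2name  # Python 2
except ImportError:
    from html.entities import codepoint2name   # Python 3


def to_named_entities(text):
    """Hypothetical helper: replace each character that has a named
    HTML 4 entity with "&name;" and leave everything else unchanged.

    The table also covers &, <, > and the double quote, so those are
    escaped as well.
    """
    return u"".join(
        u"&%s;" % codepoint2name[ord(ch)] if ord(ch) in codepoint2name else ch
        for ch in text
    )


print(to_named_entities(u"\u00a9 and many more..."))  # &copy; and many more...
```

Characters outside the HTML 4 entity set (most of Unicode) pass through untouched, so arbitrary text would still need a numeric fallback.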

Hope this helps.
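For the decimal escapes the question also accepts, the codec machinery already has a "batteries included" answer (not part of the reply above): encoding with the "xmlcharrefreplace" error handler replaces every character the target codec cannot represent with a decimal reference. No built-in handler emits the hex form as far as I know, so to_hex_refs below is a hypothetical helper.

```python
s = u"\u00a9 and many more..."

# Every non-ASCII character becomes a decimal reference such as "&#169;"
decimal = s.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(decimal)  # &#169; and many more...


def to_hex_refs(text):
    """Hypothetical helper: hex references for non-ASCII characters."""
    return u"".join(ch if ord(ch) < 128 else u"&#x%x;" % ord(ch) for ch in text)


print(to_hex_refs(s))  # &#xa9; and many more...
```

Neither version touches ASCII characters, so a literal &, < or > in the input stays raw; escape those first (cgi.escape in Python 2, html.escape in Python 3) if the text is going into HTML.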

Adonis



More information about the Python-list mailing list