Html entities

Fredrik Lundh fredrik at pythonware.com
Wed Mar 21 10:29:36 EST 2001


Syver Enstad wrote:
> Is there an easy way to convert ISO Latin-1 characters that are above 127
> ascii to their HTML, XML entity form?

something like this might work:

# htmlentitydefs-example-3.py
# from (the eff-bot guide to) the standard python library

import htmlentitydefs
import re, string

# this pattern matches substrings of reserved and non-ASCII characters
pattern = re.compile(r"[&<>\"\x80-\xff]+")

# create character map
entity_map = {}

for i in range(256):
    entity_map[chr(i)] = "&#%d;" % i  # numeric character reference, e.g. &#229;

for entity, char in htmlentitydefs.entitydefs.items():
    if entity_map.has_key(char):
        entity_map[char] = "&%s;" % entity

def escape_entity(m, get=entity_map.get):
    return string.join(map(get, m.group()), "")

def escape(text):
    # "text" rather than "string", to avoid shadowing the string module
    return pattern.sub(escape_entity, text)

print escape("<spam&eggs>")
print escape("å i åa ä e ö")

## prints:
## &lt;spam&amp;eggs&gt;
## &aring; i &aring;a &auml; e &ouml;
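
for readers on Python 3, where htmlentitydefs has become html.entities and print is a function, a rough equivalent might look like the sketch below. it is not from the library book; the helper name escape_char is only illustrative, and it assumes the input is a str whose non-ASCII characters fall in the Latin-1 range, as in the question.

```python
# Python 3 sketch of the same idea (assumption: str input, Latin-1 range only)
import re
from html.entities import codepoint2name

# same character class as above: reserved characters and U+0080 to U+00FF
pattern = re.compile(r"[&<>\"\x80-\xff]+")

def escape_char(ch):
    # use the named entity when one exists, else a numeric character reference
    code = ord(ch)
    name = codepoint2name.get(code)
    if name is not None:
        return "&%s;" % name
    return "&#%d;" % code

def escape(text):
    # replace each matched run character by character
    return pattern.sub(lambda m: "".join(map(escape_char, m.group())), text)

print(escape("<spam&eggs>"))   # &lt;spam&amp;eggs&gt;
print(escape("å i åa ä e ö"))  # &aring; i &aring;a &auml; e &ouml;
```

if only the reserved characters need escaping, html.escape() in the standard library covers that case; "...".encode("ascii", "xmlcharrefreplace") gives numeric references for everything outside ASCII.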

Cheers /F

<!-- (the eff-bot guide to) the standard python library:
http://www.pythonware.com/people/fredrik/librarybook.htm
-->




