Wanted: python script to convert to/from UTF-8 to/from XML Entities

Fredrik Lundh fredrik at pythonware.com
Sat Aug 30 12:54:43 EDT 2008


Siegfried Heintze wrote:

> Does someone have a little python script that will read a file in 
> UTF-8/UTF-16/UTF-32 (my choice) and search for all the characters between 
> 0x7f-0xffffff and convert them to an ASCII digit string that begins with 
> "&#" and ends with ";" and output the whole thing? If not, could someone 
> tell me how to write one?

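     # Python 2: read the raw bytes, decode them as UTF-8, then re-encode
     # as ASCII; "xmlcharrefreplace" turns each non-ASCII character into
     # a numeric character reference such as &#8364;
     infile = open("filename.txt", "rb")
     text = infile.read()
     infile.close()
     text = unicode(text, "utf-8")
     text = text.encode("ascii", "xmlcharrefreplace")
     print text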

tweak as necessary.
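
for Python 3, where unicode() and the print statement are gone, the same idea might look roughly like the sketch below. it also covers the reverse direction mentioned in the subject line. the names to_char_refs and from_char_refs are made up for illustration, and the reverse step is an assumption about what is wanted: it expands only numeric references (&#NNNN; and &#xHHHH;) and leaves named entities such as &amp; alone so the markup stays well-formed.

```python
import re

# Matches decimal (&#8364;) and hexadecimal (&#x20ac;) character references.
_CHAR_REF = re.compile(r"&#(?:x([0-9a-fA-F]+)|([0-9]+));")


def to_char_refs(text):
    """Replace every non-ASCII character in text with a &#NNNN; reference."""
    # "xmlcharrefreplace" is a standard codec error handler; the result of
    # encode() is bytes, so decode it back to get an ASCII-only str.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def from_char_refs(text):
    """Expand numeric character references back into the characters."""
    def expand(match):
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # chr() raises ValueError for code points above 0x10FFFF.
        return chr(code)
    return _CHAR_REF.sub(expand, text)


if __name__ == "__main__":
    sample = "caf\u00e9 \u20ac"
    escaped = to_char_refs(sample)
    print(escaped)
    print(from_char_refs(escaped) == sample)
```

to process a file, open it with open("filename.txt", encoding="utf-8"), pass the result of read() through to_char_refs, and write the output wherever it is needed; for UTF-16 or UTF-32 input only the encoding argument changes.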

</F>



