Unicode and rdf

Paul Prescod paul at prescod.net
Wed Mar 10 14:24:45 EST 2004


Richard West wrote:
 > I'm trying to parse the rdf dumps from dmoz.org (Open Directory
 > Project) and am having great difficulty just getting Python to read
 > the files.  The files are RDF in UTF-8 encoding according to the
 > dmoz.org web site, but I get the following error:
 >
 > UnicodeDecodeError: 'utf8' codec can't decode bytes in position
 > 52376-52378: invalid data


Perhaps you could try another XML parser or validator that is unrelated
to Python. I am 90% confident that it will report the same problem,
which would mean the file itself contains bytes that are not valid
UTF-8. For instance, you could use "xmlwf", which comes with Expat:

http://sourceforge.net/projects/expat/
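
If you would rather look at the offending bytes from Python first, here
is a rough sketch of one way to do that. It assumes Python 3, and the
helper name find_invalid_utf8 and the sample bytes are made up for
illustration; they are not from the dmoz dump. It only reports the first
bad sequence, and it decodes the whole input in memory, which may be too
much for a very large dump.

```python
def find_invalid_utf8(data, context=16):
    """Return (offset, nearby bytes) for the first invalid UTF-8
    sequence in `data`, or None if everything decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start and err.end delimit the bytes the codec rejected.
        lo = max(0, err.start - context)
        hi = min(len(data), err.end + context)
        return err.start, data[lo:hi]
    return None


# Made-up sample: 0xE9 is "e acute" in Latin-1, but it is not a valid
# UTF-8 sequence when followed by "<".
sample = b"<Title>caf\xe9</Title>"
print(find_invalid_utf8(sample))

# For a real file, read it in binary mode so no decoding happens early:
#     with open(path, "rb") as f:
#         print(find_invalid_utf8(f.read()))
```

If the bytes around the reported offset look like Latin-1 or Windows-1252
text, that would suggest the dump is not pure UTF-8, whatever the site
says.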

  Paul Prescod




