Unicode and rdf
Paul Prescod
paul at prescod.net
Wed Mar 10 14:24:45 EST 2004
Richard West wrote:
> I'm trying to parse the rdf dumps from dmoz.org (Open Directory
> Project) and am having great difficulty just getting Python to read
> the files. The files are RDF in UTF-8 encoding according to the
> dmoz.org web site, but I get the following error:
>
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position
> 52376-52378: invalid data
Perhaps you could try another XML parser or validator that is unrelated
to Python. I am 90% confident that it will report the same problem,
which would mean the data itself is not valid UTF-8 rather than Python
misreading it. For instance, you could use "xmlwf", which comes with
Expat:
http://sourceforge.net/projects/expat/
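
If you want to look at the offending bytes from within Python as well, something along these lines may help. It is only a rough sketch, not a tested recipe for the dmoz dumps: the function name find_invalid_utf8 and the sample data are made up for illustration, and it assumes a current Python 3 where the data is held as bytes.

```python
def find_invalid_utf8(data: bytes, context: int = 16):
    """Return (start, end, nearby_bytes) for the first invalid UTF-8
    sequence in data, or None if the whole buffer decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start and err.end are byte offsets into data.
        lo = max(0, err.start - context)
        return err.start, err.end, data[lo:err.end + context]
    return None


# Hypothetical sample: a stray Latin-1 byte (0xE9) inside otherwise
# plain ASCII markup, which is not a valid UTF-8 sequence.
sample = b"<Title>Caf\xe9 guide</Title>"
result = find_invalid_utf8(sample)
if result is not None:
    start, end, nearby = result
    print("invalid bytes at", start, "-", end, "near", nearby)
```

For a real file you would read it in binary mode (open(path, "rb")) and pass the bytes in. A full dump may be too large to read in one go, in which case the same check would have to be done in chunks with an incremental decoder.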
Paul Prescod