Demo/xml/roundtrip.py
Ignacio Vazquez-Abrams
ignacio at openservices.net
Sat Sep 8 16:29:46 EDT 2001
On Sat, 8 Sep 2001, Richard West wrote:
>
> I'm trying to parse the ODP (dmoz.org) RDF files with Python but I
> seem to be running into some problems. I've compiled Expat 1.2 in but
> I'm getting errors that I believe are related to the character set.
> The sample testing document can be found here:
>
> http://dmoz.org/rdf/structure.example.txt
>
> I'm just trying to run the document through roundtrip.py to check it
> out before I attempt to use the data.
>
> [snip]
>
> According to dmoz.org the files are UTF-8 encoded. I've been using
> Python for awhile now but I'm new at this whole xml and unicode stuff.
> Can anyone point me in the right direction?
Right. The .rdf files may be in UTF-8, but the file at that URL isn't; it
looks like Latin-1. Use iconv to convert it first:
iconv -f latin1 -t utf8 structure.example.txt -o structure.example.txt.utf8
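
If iconv isn't handy, the same conversion can be sketched in Python itself.
This is only a rough equivalent of the command above and assumes the input
really is Latin-1, as the -f latin1 flag implies. The function names
(transcode_latin1_to_utf8, is_valid_utf8, convert_file) are made up for
illustration and are not part of roundtrip.py or any library.

```python
def transcode_latin1_to_utf8(raw):
    """Re-encode Latin-1 bytes as UTF-8 bytes.

    Assumption: the input is Latin-1. Decoding as Latin-1 never raises,
    so a wrong guess produces garbage characters rather than an error.
    """
    return raw.decode("latin-1").encode("utf-8")


def is_valid_utf8(raw):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def convert_file(src_path, dst_path):
    """Read src_path as raw bytes and write a UTF-8 copy to dst_path."""
    with open(src_path, "rb") as src:
        raw = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(transcode_latin1_to_utf8(raw))


# Example use (hypothetical paths, matching the iconv command above):
# convert_file("structure.example.txt", "structure.example.txt.utf8")
```

Running is_valid_utf8 on the original bytes first is a cheap way to confirm
that the file is the problem before converting anything.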
--
Ignacio Vazquez-Abrams <ignacio at openservices.net>