Demo/xml/roundtrip.py

Ignacio Vazquez-Abrams ignacio at openservices.net
Sat Sep 8 16:29:46 EDT 2001


On Sat, 8 Sep 2001, Richard West wrote:

>
> I'm trying to parse the ODP (dmoz.org) RDF files with Python but I
> seem to be running into some problems.  I've compiled Expat 1.2 in but
> I'm getting errors that I believe are related to the character set.
> The sample testing document can be found here:
>
> http://dmoz.org/rdf/structure.example.txt
>
> I'm just trying to run the document through roundtrip.py to check it
> out before I attempt to use the data.
>
>  [snip]
>
> According to dmoz.org the files are UTF-8 encoded.  I've been using
> Python for a while now but I'm new to this whole XML and Unicode stuff.
> Can anyone point me in the right direction?

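Right. The full .rdf files may be in UTF-8, but the example file at that URL
isn't; it looks like Latin-1, which is why Expat chokes on it. Use iconv to
convert it before running it through roundtrip.py: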

  iconv -f latin1 -t utf8 structure.example.txt -o structure.example.txt.utf8
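
If iconv isn't handy, roughly the same conversion can be sketched in Python
itself. This is only an illustration, written for a current Python 3, and it
assumes (as the iconv command above does) that the file really is Latin-1.
The helper names is_valid_utf8 and latin1_to_utf8 are made up for this
example and are not part of roundtrip.py:

```python
def is_valid_utf8(data):
    """Return True if the given bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def latin1_to_utf8(src_path, dst_path):
    """Re-encode a file from Latin-1 to UTF-8, line by line.

    Assumes the input is Latin-1. Every byte sequence decodes as
    Latin-1, so a wrong guess produces garbage rather than an error.
    """
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding="latin-1", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)


# Example call, mirroring the iconv command above:
# latin1_to_utf8("structure.example.txt", "structure.example.txt.utf8")
```

Running is_valid_utf8 on the raw bytes of the downloaded file first is a
quick way to confirm that the encoding is actually the problem before
converting anything.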

-- 
Ignacio Vazquez-Abrams  <ignacio at openservices.net>




