write xml to txt encoding
Stefan Behnel
stefan_ml at behnel.de
Thu Jul 29 08:54:54 EDT 2010
William Johnston, 29.07.2010 14:12:
> I have a Python app that parses XML files and then writes to text files.
XML or HTML?
> However, the output text file is "sometimes" encoded in some Asian language.
>
> Here is my code:
>
>
> encoding = "iso-8859-1"
>
> clean_sent = nltk.clean_html(sent.text)
>
> clean_sent = clean_sent.encode(encoding, "ignore");
>
>
> I also tried "UTF-8" encoding, but received the same results.
What result?
Maybe the NLTK cannot determine the encoding of the HTML file (because the
file is broken and/or doesn't correctly specify its own encoding) and thus
fails to decode it?
Stefan
More information about the Python-list
mailing list