[Tutor] Is there a package to "un-mangle" characters?

Mark Lawrence breamoreboy at yahoo.co.uk
Thu Nov 21 21:40:35 CET 2013


On 21/11/2013 20:04, Albert-Jan Roskam wrote:
> Hi,
>
> Today I had a csv file in utf-8 encoding, but some of the accented characters were mangled. The data were scraped from a website, and it turned out that at least some of the data were already mangled on the website itself. Bits of the text were actually cp1252 (or cp850), I think, even though the webpage was in utf-8. Is there any package that helps to correct such issues? (I tried looking for one, but it doesn't really help that there is such a thing as name mangling! ;-) This comes pretty close though: https://gist.github.com/litchfield/1282752
>
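A minimal sketch, not taken from either poster, of the usual manual repair for the case described in the question. It assumes the mangling happened exactly once, by decoding UTF-8 bytes as cp1252 (so "café" shows up as "cafÃ©"). Re-encoding the text with the wrong codec recovers the original bytes, which can then be decoded correctly as UTF-8. The function name `unmangle` is hypothetical. Because only part of the data was mangled, the repair is applied field by field and any field that does not survive the round trip is left untouched.

```python
def unmangle(text, wrong_encoding="cp1252"):
    """Undo one round of UTF-8 bytes being mis-decoded as `wrong_encoding`.

    Heuristic: if the text cannot be encoded with `wrong_encoding`, or the
    resulting bytes are not valid UTF-8, it was probably not mangled this
    way, so it is returned unchanged.
    """
    try:
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Hypothetical rows as they might come out of csv.reader on the scraped file:
# some fields are mangled, others are already correct.
rows = [["José", "cafÃ©"], ["MÃ¼ller", "Zürich"]]
fixed = [[unmangle(field) for field in row] for row in rows]
print(fixed)
```

Correct fields such as "José" fail the UTF-8 decode and pass through unchanged, while mangled ones are repaired. A field that mixes correct and mangled text will also fail the round trip and stay as-is; splitting such fields into words first would be one way to handle that. If cp850 turns out to be the culprit, `wrong_encoding="cp850"` can be tried instead.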

1) Would something like this help 
https://pypi.python.org/pypi/charset/1.0.1 ?

2) Surely this topic is too advanced for a tutor mailing list?

-- 
Python is the second best programming language in the world.
But the best has yet to be invented.  Christian Tismer

Mark Lawrence


