[Tutor] Is there a package to "un-mangle" characters?
Albert-Jan Roskam
fomcl at yahoo.com
Sat Nov 23 13:12:35 CET 2013
--------------------------------------------
On Fri, 11/22/13, Steven D'Aprano <steve at pearwood.info> wrote:
Subject: Re: [Tutor] Is there a package to "un-mangle" characters?
To: tutor at python.org
Date: Friday, November 22, 2013, 4:30 PM
On Thu, Nov 21, 2013 at 12:04:19PM -0800, Albert-Jan Roskam wrote:
> Hi,
>
> Today I had a csv file in utf-8 encoding, but part of the accented
> characters were mangled. The data were scraped from a website and it
> turned out that at least some of the data were mangled on the website
> already. Bits of the text were actually cp1252 (or cp850), I think,
> even though the webpage was in utf-8. Is there any package that helps
> to correct such issues?
Python has superpowers :-)
http://blog.luminoso.com/2012/08/20/fix-unicode-mistakes-with-python/
====> Cool website! Love the corny terminology he uses. The function he created may be useful in situations where chardet, charset and icu are not: a small amount of textual data that's a total mess.
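
For the archive, here is a minimal sketch of the basic idea behind this kind of repair. It is not the function from the blog post. It assumes Python 3 and that the damage came from UTF-8 bytes being decoded as cp1252 (for cp850, swap the codec name). The name unmangle is made up for illustration.

```python
def unmangle(text, wrong_codec="cp1252"):
    """Try to undo UTF-8 text that was mistakenly decoded as wrong_codec.

    Hypothetical helper: re-encode with the codec that was wrongly applied,
    then decode the resulting bytes as UTF-8. If either step fails, the
    text was probably not mangled this way, so return it unchanged.
    """
    try:
        return text.encode(wrong_codec).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Simulate the damage: UTF-8 bytes for "caf\u00e9" read as cp1252.
mangled = "caf\u00e9".encode("utf-8").decode("cp1252")
print(repr(mangled))
print(repr(unmangle(mangled)))
```

Because only part of the file is mangled, something like this would have to be applied per field or per line, not to the whole file at once. Text that is already correct usually fails the round trip and comes back unchanged, but that is not guaranteed for every input, so the output still needs a sanity check.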