[Tutor] Is there a package to "un-mangle" characters?

Albert-Jan Roskam fomcl at yahoo.com
Sat Nov 23 13:12:35 CET 2013


--------------------------------------------
On Fri, 11/22/13, Steven D'Aprano <steve at pearwood.info> wrote:

 Subject: Re: [Tutor] Is there a package to "un-mangle" characters?
 To: tutor at python.org
 Date: Friday, November 22, 2013, 4:30 PM
 
 On Thu, Nov 21, 2013 at 12:04:19PM -0800, Albert-Jan Roskam wrote:
 > Hi,
 > 
 > Today I had a csv file in utf-8 encoding, but some of the accented
 > characters were mangled. The data were scraped from a website and it
 > turned out that at least some of the data were mangled on the website
 > already. Bits of the text were actually cp1252 (or cp850), I think,
 > even though the webpage was in utf-8. Is there any package that helps
 > to correct such issues?
 
 Python has superpowers :-)
 
 http://blog.luminoso.com/2012/08/20/fix-unicode-mistakes-with-python/
 
 
 ====> Cool website! Love the corny terminology he uses. The function he created may be useful in situations where chardet, charset and icu fall short: a small amount of textual data that's a total mess.
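
 For illustration, here is a minimal sketch of the basic idea behind this kind of repair. It is not the function from the blog post, and it assumes Python 3 and one specific kind of damage: UTF-8 bytes that were wrongly decoded as cp1252. The name unmangle is made up for this example. It re-encodes the text as cp1252 to get the original bytes back, then decodes those bytes as UTF-8, and leaves the text alone if either step fails.

```python
def unmangle(text):
    """Try to undo UTF-8 bytes that were mistakenly decoded as cp1252.

    Hypothetical helper for illustration only. It handles a single
    kind of mangling and returns the input unchanged when the round
    trip is not possible.
    """
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not cp1252-mangled UTF-8 (or already fine): leave it alone.
        return text


# Simulate the damage: UTF-8 bytes read as if they were cp1252.
mangled = "café".encode("utf-8").decode("cp1252")

print(repr(mangled))
print(repr(unmangle(mangled)))
```

 Text that is already correct usually fails the round trip and comes back unchanged, but that is not guaranteed for every input, and data with several kinds of damage mixed together needs more careful heuristics than this.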
 

