UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to <undefined>

bellcanadardp at gmail.com
Sun Jun 3 19:36:12 EDT 2018


On Tuesday, 22 May 2018 17:23:55 UTC-4, Peter J. Holzer  wrote:
> On 2018-05-20 15:43:54 +0200, Karsten Hilbert wrote:
> > On Sun, May 20, 2018 at 04:59:12AM -0700, bellcanadardp at gmail.com wrote:
> > 
> > > On Saturday, 19 May 2018 19:48:20 UTC-4, Skip Montanaro  wrote:
> > > > As Chris indicated, you'll have to figure out the correct encoding. You
> > > > might want to check out the chardet module (available on PyPI, I believe)
> > > > and see if it can come up with a better guess. I imagine there are other
> > > > encoding guessers out there. That's just one I'm familiar with.
> > > 
> > > Thank you for the reply, but how exactly am I supposed to find out what the correct encoding is?
> > 
> > One CAN NOT.
> > 
> > The best you can do is to go ask the canonical source of the
> > file what encoding the file is _supposed_ to be in.
> 
> I disagree on both counts.
> 
> 1) For any given file it is almost always possible to find the correct
>    encoding (or *a* correct encoding, as there may be more than one).
> 
>    This may require domain-specific knowledge (e.g. it may be necessary
>    to recognize the human language and know at least some distinctive
>    words, or to know some special symbols likely to be used in a data
>    file), and it almost always takes a bit of detective work and trial
>    and error. But I don't think I ever encountered a file where I
>    couldn't figure out the encoding.
> 
>    (If you have several files in the same encoding, it may not be
>    possible to figure out the encoding from a subset of them. For
>    example, the files may all be in ISO-8859-2, but the subset you have
>    contains only characters <= 0x7F. But if you have several files, they
>    may not all be the same encoding, either).
> 
> 2) The canonical source of the file may not know. This is quite frequent
>    when the source is some non-technical person. Then you get answers
>    like "it's ASCII" (although the file contains umlauts, which aren't
>    in ASCII) or "it's ANSI" (which isn't an encoding, although Windows
>    pretends it is). Or they may not be aware that the file is converted
>    somewhere in the pipeline, so that the file they generated isn't
>    actually the file you received. So ask (or check the docs), but
>    verify!
> 
>         hp
> 
> -- 
>    _  | Peter J. Holzer    | we build much bigger, better disasters now
> |_|_) |                    | because we have much more sophisticated
> | |   | hjp at hjp.at         | management tools.
> __/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
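Peter's trial-and-error approach, and the kind of guessing Skip suggests, can be sketched with nothing but the standard library. This is a hypothetical helper, not part of chardet or any other library, and the candidate list is an assumed shortlist. It tries each candidate encoding in strict mode, keeps the ones that decode without error, and prints a preview of each so a human can judge which one actually reads correctly. Two caveats follow from Peter's points. latin-1 maps every byte to some character, so it never fails and a clean decode alone proves little. And if every byte is <= 0x7F, all ASCII-compatible candidates decode identically, so those bytes cannot tell them apart.

```python
# Hypothetical helper (not from chardet or any library): try each candidate
# encoding in strict mode and keep the ones that decode without error.
CANDIDATES = ["utf-8", "cp1252", "iso-8859-2", "latin-1"]  # assumed shortlist


def decodable_as(data, candidates=CANDIDATES):
    """Return the candidate encodings that decode `data` without error."""
    ok = []
    for enc in candidates:
        try:
            data.decode(enc)  # errors="strict" is the default
        except UnicodeDecodeError:
            continue
        ok.append(enc)
    return ok


def is_pure_ascii(data):
    """True if every byte is <= 0x7F, so the bytes cannot tell
    ASCII-compatible encodings apart."""
    return all(b <= 0x7F for b in data)


# Sample bytes: German umlauts plus curly quotes, written as UTF-8.
data = "Gr\u00fc\u00dfe, \u201cquoted\u201d".encode("utf-8")

if is_pure_ascii(data):
    print("ASCII only: the encoding cannot be determined from these bytes")
for enc in decodable_as(data):
    # A human inspects each preview and picks the one that reads correctly.
    print(f"{enc:>10}: {data.decode(enc)!r}")
```

Here cp1252 is rejected because the closing quote ends in byte 0x9D, which cp1252 leaves undefined. The remaining candidates are where the domain knowledge Peter mentions comes in: only one preview shows readable German.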

Hello Peter, how exactly would I solve this issue? I have a script that works in Python 2 but not in Python 3. I ran 2to3, but I still get the error "character maps to <undefined>": a UnicodeDecodeError saying it can't decode byte 1x09 in line 7414 using cp1252. Would you have a straightforward solution? I can't get a straight answer. The file was ported from ANSI to Python, so it's UTF-8 as far as I can see.
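A minimal sketch of the likely cause and fix, assuming (as the poster believes) that the data really is UTF-8 and that the script reads it with a plain `open()` call. The temporary file stands in for the real input. In Python 2, `open()` returned undecoded byte strings, so a script could run without ever choosing an encoding. In Python 3, text-mode `open()` without an `encoding` argument decodes with the locale's preferred encoding, which on many Western-language Windows systems is cp1252, and 2to3 does not insert an `encoding` argument for you. Byte 0x9D, the one named in the thread's subject, has no mapping in cp1252 but is common in UTF-8 text, for instance as the last byte of a right double quotation mark.

```python
import os
import tempfile

# U+201D RIGHT DOUBLE QUOTATION MARK is encoded as E2 80 9D in UTF-8.
# Byte 0x9D is undefined in cp1252, so decoding it with cp1252 raises
# "'charmap' codec can't decode byte 0x9d ...: character maps to <undefined>".
text = "He said \u201chello\u201d.\n"

# Write a throwaway UTF-8 file to stand in for the real input file.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Reading it as cp1252 (what a bare open() may do on Windows) reproduces the error.
cp1252_failed = False
try:
    with open(path, encoding="cp1252") as f:
        f.read()
except UnicodeDecodeError as exc:
    cp1252_failed = True
    print("cp1252:", exc)

# Naming the encoding explicitly makes the script independent of the locale.
with open(path, encoding="utf-8") as f:
    content = f.read()
print("utf-8:", content, end="")

os.remove(path)
```

If the file turns out to contain a handful of genuinely invalid bytes, `open(path, encoding="utf-8", errors="replace")` substitutes U+FFFD for them instead of raising, though that hides the problem rather than fixing it.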



More information about the Python-list mailing list