Newbie question about text encoding

Chris Angelico rosuav at gmail.com
Tue Feb 24 06:09:30 EST 2015


On Tue, Feb 24, 2015 at 9:49 PM,  <pierrick.brihaye at gmail.com> wrote:
> Working with pyshp, this is my code :
>
> import shapefile
>
> inFile = shapefile.Reader("blah")
>
> for sr in inFile.shapeRecords():
>     rec = sr.record[2]
>     print("Output : ", rec, type(rec))
>
> Output:  hippodrome du resto <class 'str'>
> Output:  b'stade de man\xe9 braz' <class 'bytes'>
>
> Why do I get 2 different types ?
> How to get a string object when I have accented characters ?

I don't know what pyshp is doing here, so you may want to seek a
pyshp-specific mailing list for help. My guess is that it's
automatically decoding to str if it's ASCII-only, and giving you back
the raw bytes if there are any that it can't handle. The question is:
What encoding _is_ that? Do you know what character you're expecting
to see there? Before you can turn that into a string, you have to
figure out whether it's Latin-1 (ISO-8859-1), or some other ISO-8859-x
standard, or a Windows codepage, or an ancient thing off a Mac, or
whatever else it might be. Once you know that, it's easy: you just
call decode() on the bytes objects, naming that encoding. But you MUST
figure out the encoding first.
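
Here is a rough sketch of how you might test a few guesses against the
sample value from your output. The candidate encodings listed are only
my guesses, not anything pyshp tells you, and to_text() is a made-up
helper name for illustration, not part of pyshp.

```python
# Sample value copied from the output quoted above.
raw = b'stade de man\xe9 braz'


def to_text(value, encoding):
    """Return str values unchanged; decode bytes with the given encoding.

    Hypothetical helper for illustration only, not a pyshp function.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Candidate encodings are guesses. Look at each result and see which
# one shows the character you expect.
for enc in ("latin-1", "cp1252", "cp850", "mac_roman"):
    print(enc, "->", to_text(raw, enc))

# UTF-8 can be ruled out: 0xE9 followed by a space is not valid UTF-8.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not UTF-8:", exc)
```

For this particular byte, latin-1 and cp1252 both give an e with an
acute accent, so this one sample cannot tell those two apart. You would
need the file's documentation, or a record containing a byte where the
two differ, to settle it.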

ChrisA


