Newbie question about text encoding

Dave Angel davea at davea.name
Tue Feb 24 06:25:24 EST 2015


On 02/24/2015 05:49 AM, pierrick.brihaye at gmail.com wrote:
> Hello,
>
> Working with pyshp, this is my code :

What version of Python, what version of pyshp, from where, and what OS? 
These are the first pieces of information to supply in any query that 
goes outside the standard library.

For example, you might be running CPython 3.2.1 on Ubuntu 14.04.1, and 
have installed pyshp 1.2.1 from https://pypi.python.org/pypi/pyshp

Or some other combination.
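One way to gather those details for a single paste is a small script 
like the one below. It is only a sketch: it assumes Python 3.8 or later 
(for importlib.metadata) and that pyshp was installed under its PyPI 
distribution name "pyshp". The function name report_environment is made 
up for this example.

```python
import importlib.metadata
import platform


def report_environment(package="pyshp"):
    """Collect the details worth quoting when asking about a third-party package."""
    try:
        # Version recorded by the installer (pip etc.), if the package is installed.
        pkg_version = importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        pkg_version = "not installed"
    return {
        "python": f"{platform.python_implementation()} {platform.python_version()}",
        "os": platform.platform(),
        package: pkg_version,
    }


if __name__ == "__main__":
    for key, value in report_environment().items():
        print(f"{key}: {value}")
```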

>
> import shapefile
>
> inFile = shapefile.Reader("blah")
>
> for sr in inFile.shapeRecords():
>      rec = sr.record[2]
>      print("Output : ", rec, type(rec))
>
> Output:  hippodrome du resto <class 'str'>
> Output:  b'stade de man\xe9 braz' <class 'bytes'>
>
> Why do I get 2 different types ?
> How to get a string object when I have accented characters ?
>
> Thank you,
>
> p.b.
>

From my (cursory) reading of the pyshp docs on the above page, I cannot 
see what the [2] element of the record list should look like, so I'd 
have to guess.

The bytes object is presumably an encoded version of the character 
string. I don't see anything on that page about Unicode or decoding, so 
you might have to guess the encoding. Either way, you can decode the 
bytestring into a regular string if you correctly guess the encoding, 
such as utf-8.
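When the docs are silent, one rough way to guess is to try a short list 
of candidate encodings in order and keep the first that decodes without 
error. This is a heuristic sketch, not anything pyshp provides; the name 
guess_decode and the candidate list are assumptions for illustration. A 
clean decode does not prove the guess is right, and latin-1 maps every 
byte to some character, so it never fails and only makes sense last.

```python
def guess_decode(data, candidates=("utf-8", "cp1252", "latin-1")):
    """Try each candidate encoding in turn; return (text, encoding) for the first that works.

    Strict error handling is used, so an invalid byte sequence raises
    UnicodeDecodeError and we move on to the next candidate.
    """
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {candidates} could decode {data!r}")


text, used = guess_decode(b"stade de man\xe9 braz")
print(used, text)  # prints: cp1252 stade de mané braz
```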

If utf-8 were the right encoding, you could just use the default 
(bytes.decode() uses utf-8 when no encoding is given):
     mystring = rec.decode()

But utf-8 cannot be the right encoding for that bytestring: the byte 
\xe9 followed by a space is not a valid utf-8 sequence. So you'll need 
a form like:
     mystring = rec.decode(encoding='xxx')

for some value of xxx.
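Since the loop in the question gets both str and bytes values, a small 
helper can turn every field into a str before printing. This sketch 
assumes the file's text is latin-1 (cp1252 gives the same result for 
this particular byte); that choice, and the helper name as_text, are 
guesses for illustration, not something the pyshp docs state.

```python
def as_text(value, encoding="latin-1"):
    """Return value as a str, decoding it first if it is a bytes object."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# The two values from the question's output, one str and one bytes:
for rec in ["hippodrome du resto", b"stade de man\xe9 braz"]:
    text = as_text(rec)
    print("Output : ", text, type(text))
```

In the original loop that would be rec = as_text(sr.record[2]), so both 
rows print as <class 'str'>.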

-- 
DaveA