Python 2.x or 3.x, which is faster?

Chris Angelico rosuav at gmail.com
Wed Mar 9 09:11:17 EST 2016


On Thu, Mar 10, 2016 at 1:03 AM, BartC <bc at freeuk.com> wrote:
> I've just tried a UTF-8 file and getting some odd results. With a file
> containing [three euro symbols]:
>
> €€€
>
> (including a 3-byte utf-8 marker at the start), and opened in text mode,
> Python 3 gives me this series of bytes (ie. the ord() of each character):
>
> 239
> 187
> 191
> 226
> 8218
> 172
> 226
> 8218
> 172
> 226
> 8218
> 172
>
> And prints the resulting string as: €€€.

The first three bytes are the "UTF-8 BOM", which suggests you may have
created this in a broken editor like Notepad.

For the rest, I'm not sure how you told Python to open this as text,
but you certainly did NOT specify an encoding of UTF-8. The 8218
entries in there are completely bogus: a byte can't exceed 255, so
those are code points produced by decoding with some other encoding.
Can you show your code, please, and also what you get if you open the
file as binary?
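
For illustration, here is a minimal sketch of one way numbers like those
could arise. It assumes the file was decoded as Windows-1252 (cp1252),
which is only a guess about the default encoding in use; the variable
names are made up for the example.

```python
# Raw bytes of a UTF-8 file: a 3-byte BOM followed by three euro signs.
raw = b"\xef\xbb\xbf" + ("\u20ac" * 3).encode("utf-8")

# ASSUMPTION: the file was decoded as cp1252 rather than UTF-8.
# Each byte becomes one character, and byte 0x82 maps to U+201A (8218).
wrong = raw.decode("cp1252")
print([ord(c) for c in wrong])

# Decoding as UTF-8 ("utf-8-sig" also strips the BOM) gives three
# characters, each U+20AC (8364).
right = raw.decode("utf-8-sig")
print([ord(c) for c in right])
```

If the real default was something other than cp1252, the bogus values
would differ, but the cause would be the same.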

Unicode handling is easy as long as you (a) understand the fundamental
difference between text and bytes, and (b) declare your encodings.
Python isn't magical. It can't know the encoding without being told.
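
A rough sketch of both points, using a throwaway temp file as a stand-in
for the real one (the file contents and variable names are assumptions
for the example, not your actual code):

```python
import os
import tempfile

# Hypothetical sample file: UTF-8 BOM plus three euro signs, written as bytes.
data = b"\xef\xbb\xbf" + ("\u20ac" * 3).encode("utf-8")
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(data)

# Binary mode: you get bytes, every value in range(256), no decoding done.
with open(path, "rb") as f:
    as_bytes = f.read()

# Text mode with a declared encoding: you get str (Unicode code points).
# "utf-8-sig" is UTF-8 that also discards a leading BOM if present.
with open(path, "r", encoding="utf-8-sig") as f:
    as_text = f.read()

os.remove(path)

print(list(as_bytes))
print([ord(c) for c in as_text])
```

Without the encoding= argument, open() falls back to a platform-dependent
default, which is how the same file can read differently on different
machines.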

ChrisA


