Python 2.x or 3.x, which is faster?

BartC bc at freeuk.com
Wed Mar 9 09:39:29 EST 2016


On 09/03/2016 14:11, Chris Angelico wrote:
> On Thu, Mar 10, 2016 at 1:03 AM, BartC <bc at freeuk.com> wrote:
>> I've just tried a UTF-8 file and getting some odd results. With a file
>> containing [three euro symbols]:
>>
>> €€€
>>
>> (including a 3-byte utf-8 marker at the start), and opened in text mode,
>> Python 3 gives me this series of bytes (ie. the ord() of each character):
>>
>> 239
>> 187
>> 191
>> 226
>> 8218
>> 172
>> 226
>> 8218
>> 172
>> 226
>> 8218
>> 172
>>
>> And prints the resulting string as: ï»¿â‚¬â‚¬â‚¬.
>
> The first three bytes are the "UTF-8 BOM", which suggests you may have
> created this in a broken editor like Notepad.

Yes, that's what I used, but what's broken about it? If Python doesn't 
understand the BOM, it should still resynchronise after a few bytes.

> For the rest, I'm not sure how you told Python to open this as text,
> but you certainly did NOT specify an encoding of UTF-8. The 8218
> entries in there are completely bogus. Can you show your code, please,
> and also what you get if you open the file as binary?

This is the code:

f=open("input","r")
t=f.read(1000)
f.close()

print ("T",type(t),len(t))

print (t)

for i in t:
	print (ord(i))

This doesn't specify any encoding; I don't know how to, and 
Steven didn't mention anything other than a text file. The input data is 
represented by this dump, which is also what binary mode gives:

0000: ef bb bf e2 82 ac e2 82 ac e2 82 ac    ............
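
For reference, the ord() values quoted earlier are what these twelve bytes produce when decoded one byte at a time as Windows-1252. That the default text encoding here was cp1252 is an assumption; the post does not say. A minimal sketch that rebuilds the bytes from the dump and decodes them both ways:

```python
# Bytes from the hex dump: a UTF-8 BOM followed by three euro signs.
data = bytes.fromhex("efbbbf" + "e282ac" * 3)

# Assumption: the platform default encoding was cp1252 (not stated in the post).
# Each byte becomes one character, so 0x82 maps to U+201A, i.e. 8218.
as_cp1252 = data.decode("cp1252")
print([ord(c) for c in as_cp1252])

# "utf-8-sig" decodes UTF-8 and drops a leading BOM if there is one.
as_utf8 = data.decode("utf-8-sig")
print([ord(c) for c in as_utf8])
```

On Python 3 the same codec name can be passed to open(), as in open("input", "r", encoding="utf-8-sig").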

> Unicode handling is easy as long as you (a) understand the fundamental
> difference between text and bytes, and (b) declare your encodings.
> Python isn't magical. It can't know the encoding without being told.

Hence the BOM bytes.

(Isn't it better that it's automatic? Someone sends you a text file that 
you want to open within a Python program. Are you supposed to analyze it 
first, or expect the sender to tell you what it is (they probably won't 
know) then need to hack the program to read it properly?)
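
The automatic detection asked about here could look roughly like the sketch below. It is not something open() does by default; guess_encoding and read_text are hypothetical helper names, and the UTF-8 fallback for files without a BOM is an assumption, since a BOM-less file gives no reliable hint.

```python
import codecs

# Longer BOMs come first, so UTF-32-LE (ff fe 00 00) is not
# mistaken for UTF-16-LE (ff fe).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def guess_encoding(raw, fallback="utf-8"):
    """Hypothetical helper: choose a codec from a leading BOM, else use the fallback."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return fallback

def read_text(path):
    """Hypothetical helper: read a whole file and decode it using its BOM, if any."""
    with open(path, "rb") as f:
        raw = f.read()
    # The "utf-8-sig", "utf-16" and "utf-32" codecs consume the BOM themselves.
    return raw.decode(guess_encoding(raw))
```

With the file from this thread, read_text("input") would return the three euro characters without the marker.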

-- 
Bartc



