UnicodeDecodeError issue

Wed Sep 4 08:38:04 EDT 2013

On 4/9/2013 07:38, Ferrous Cranus wrote:

> On 4/9/2013 2:26 PM, Dave Angel wrote:

>>
>>>>
>>>> So first in the interpreter, I ran
>>>>
>>>>
>>>>
>>>> >>> f = open("junk.txt", "w")
>>>>
>>>> >>> f.write(b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2\n')
>>>>
>>>> >>> f.close()
>>>>
>>>>
>>>>
         <snip>
>> So since the tets.py file was a sidetrack, I just ran those three lines
>> in the interpreter.
>>
> I'm still confused about this.
>
> Say we save those 3 lines inside junk.txt and we save it by default as UTF-8.
>
> when we 'file junk.txt'
>
> what will file respond with?

junk2.txt: ASCII text

>
> filename's charset?
>
> or
>
> will it look at the bytestring within to decide what encoding it uses?
>

'file' isn't magic.  And again, it doesn't look at the filename, it
looks at the content.  What heuristics it uses, I don't know, but it has
hundreds of them.  (I wish you hadn't confused the issue by using the
same name junk.txt for an entirely different purpose.)  When it looks at
a file like this one, it looks only at the bytes within it.  In this
case, the instance of 'file' on my machine decides it's an ASCII file.
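
As a rough illustration of a content-only decision, here is a minimal
Python 3 sketch.  It is not the algorithm 'file' uses (its real heuristics
are far more elaborate); classify() and its labels are hypothetical names
made up for this example, and the byte literal is shortened from the one
quoted above.

```python
def classify(data: bytes) -> str:
    """Crude content-only guess; NOT the algorithm 'file' really uses."""
    try:
        data.decode("ascii")
        return "ASCII text"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "UTF-8 text"
    except UnicodeDecodeError:
        return "unknown 8-bit data"


# The three interpreter lines saved as source text (literal shortened for
# the example).  Every byte is ASCII, including the backslash escapes
# inside the b'...' literal.
source = (
    b'f = open("junk.txt", "w")\n'
    b"f.write(b'\\xb6\\xe3\\xed\\n')\n"
    b'f.close()\n'
)

# The bytes that literal would actually write are a different matter:
# 0xb6 cannot start a UTF-8 sequence, so neither decode succeeds.
written = b'\xb6\xe3\xed\n'

print(classify(source))
print(classify(written))
print(classify("caf\u00e9\n".encode("utf-8")))
```

The filename never enters into it; only the bytes passed in are examined.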

If I add a silly shebang line

#!/usr/tmp/pyttthon

it says
junk2.txt: a /usr/tmp/pyttthon script, ASCII text executable

It doesn't know it's Python, it just trusts the shebang line.  And it
identifies it as ASCII, not UTF-8, since there are no non-ASCII
characters in it.  It certainly does not try to interpret the b'xxxx'
byte string according to Python syntax rules.
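
The same idea can be sketched in Python 3.  This is only a toy that
imitates the wording of the report above, not 'file' itself; describe()
is a hypothetical helper, and the script contents are invented for the
example.

```python
def describe(data: bytes) -> str:
    """Toy imitation of the report wording; not how 'file' is implemented."""
    kind = "ASCII text" if all(b < 0x80 for b in data) else "non-ASCII data"
    first_line = data.split(b"\n", 1)[0]
    if first_line.startswith(b"#!"):
        # Trust the shebang blindly: no check that the interpreter exists.
        interpreter = first_line[2:].strip().decode("ascii", "replace")
        return f"a {interpreter} script, {kind} executable"
    return kind


script = b"#!/usr/tmp/pyttthon\nf = open('junk.txt', 'w')\n"
body_only = script.split(b"\n", 1)[1]

print(describe(script))
print(describe(body_only))

# In source text, the escape \xb6 is just four ASCII characters.
escape = rb"\xb6"
print(len(escape), all(b < 0x80 for b in escape))
```

Nothing here parses Python: the b'...' literal is never evaluated, so the
escapes stay as plain ASCII characters in the file.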

-- 
DaveA