UnicodeDecodeError issue
Dave Angel
davea at davea.name
Wed Sep 4 08:38:04 EDT 2013
On 4/9/2013 07:38, Ferrous Cranus wrote:
> On 4/9/2013 2:26 PM, Dave Angel wrote:
>>
>>>>
>>>> So first in the interpreter, I ran
>>>>
>>>>
>>>>
>>>>>>>> f = open("junk.txt", "w")
>>>>
>>>>>>>> f.write(b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2\n')
>>>>
>>>>>>>> f.close()
>>>>
>>>>
>>>>
<snip>
>> So since the tets.py file was a sidetrack, I just ran those three lines
>> in the interpreter.
>>
> I'm still confused about this.
>
> say we save those 3 lines inside junk.txt and we save it by default as utf-8
>
> when we 'file junk.txt'
>
> what will file respond with?
junk2.txt: ASCII text
>
> filename's charset?
>
> or
>
> will it look at the byte string within to decide what encoding it uses?
>
'file' isn't magic. And again, it doesn't look at the filename, it
looks at the content. What heuristics it uses, I don't know, but it has
hundreds of them. (I wish you hadn't confused the issue by using the
same name junk.txt for an entirely different purpose.) When it looks at a
file like this one, it looks only at the bytes within it. In this
case, the instance of 'file' on my machine decides it's an ASCII file.
If I add a silly shebang line
#!/usr/tmp/pyttthon
it says
junk2.txt: a /usr/tmp/pyttthon script, ASCII text executable
It doesn't know it's Python, it just trusts the shebang line. And it
identifies it as ASCII, not UTF-8, since there are no non-ASCII
characters in it. It certainly does not try to interpret the b'xxxx'
byte string by Python syntax rules.
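
To make the distinction concrete, here is a rough Python sketch of the
two checks described above. It is not how 'file' is implemented (its
real heuristics are far more extensive); the helper names is_pure_ascii
and shebang_interpreter are made up for illustration. It assumes the
three interpreter lines were saved as source text, and compares that
with the bytes the write() call would actually put in the data file.

```python
# The three interpreter lines saved as *source text* (what junk2.txt holds).
# The raw string keeps \xb6 etc. as literal backslash sequences, which is
# exactly what sits in the saved script.
source = (
    'f = open("junk.txt", "w")\n'
    r"f.write(b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2\n')" "\n"
    'f.close()\n'
)
as_saved = source.encode("utf-8")  # bytes on disk if saved as UTF-8

# The bytes the write() call itself would produce: a different file entirely.
payload = (b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 '
           b'\xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2\n')


def is_pure_ascii(data):
    """Hypothetical helper: true if every byte is below 0x80."""
    return all(byte < 0x80 for byte in data)


def shebang_interpreter(data):
    """Hypothetical helper: return the text after '#!' on line 1, or None."""
    first_line = data.split(b"\n", 1)[0]
    if first_line.startswith(b"#!"):
        return first_line[2:].strip().decode("ascii", "replace")
    return None


print(is_pure_ascii(as_saved))   # the script: only ASCII bytes
print(is_pure_ascii(payload))    # the data: bytes >= 0x80 present
print(shebang_interpreter(b"#!/usr/tmp/pyttthon\n" + as_saved))
```

The script is pure ASCII because the escapes are spelled out with
backslashes, hex digits and letters; only the data file written by
running it contains bytes above 0x7f. The shebang check simply reports
whatever path follows '#!', whether or not such a program exists.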
--
DaveA