Eclipse/PyDev - BOM Lexical Error

Diez B. Roggisch deets at web.de
Thu Oct 7 12:54:55 EDT 2010


Lawrence D'Oliveiro <ldo at geek-central.gen.new_zealand> writes:

> In message <87d3rorf2f.fsf at web.de>, Diez B. Roggisch wrote:
>
>> Lawrence D'Oliveiro <ldo at geek-central.gen.new_zealand> writes:
>> 
>>> What exactly is the point of a BOM in a UTF-8-encoded file?
>> 
>> It's a marker like the "coding: utf-8" in python-files. It tells the
>> software aware of it that the content is UTF-8.
>
> But if the software is aware of it, then why does it need to be told?

Let me rephrase: Windows editors such as Notepad recognize the BOM, and
then assume (hopefully rightly) that the rest of the file is text
in UTF-8 encoding.

So it is similar to the coding-header in Python.
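
To make that concrete, here is a minimal sketch of what a BOM-aware
reader might do. The helper names (has_utf8_bom, decode_text) and the
latin-1 fallback are made up for illustration; this is not how Notepad
or PyDev is actually implemented.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

def decode_text(data: bytes, fallback: str = "latin-1") -> str:
    # Assumed policy: trust the BOM when it is present, otherwise
    # fall back to some legacy encoding chosen by the caller.
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    return data.decode(fallback)
```

Without the marker, such a reader has to guess the encoding, which is the
situation the BOM is meant to avoid.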

>
>> Naming it "BOM" is obviously stupid, but that's the way it is called.
>
> It is in fact a Unicode BOM character, and I can understand why it’s called 
> that. What I’m trying to understand is why you need to put one in a UTF-8-
> encoded file.

I hope that's clear now. It says "I'm a UTF-8 file".
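
For what it's worth, Python's standard "utf-8-sig" codec shows the same
idea from both sides: it writes the marker when encoding and drops it
when decoding. A small sketch (the sample text is arbitrary):

```python
text = "print('hi')\n"

# Encoding with utf-8-sig prepends the three BOM bytes EF BB BF.
with_bom = text.encode("utf-8-sig")
starts_with_bom = with_bom[:3] == b"\xef\xbb\xbf"

# Decoding with utf-8-sig strips the marker again.
round_trip = with_bom.decode("utf-8-sig")

# Decoding with plain utf-8 keeps the BOM as a leading U+FEFF character,
# which is what a tool unaware of the BOM ends up seeing.
naive = with_bom.decode("utf-8")
```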

Diez


