Eclipse/PyDev - BOM Lexical Error

Steven D'Aprano steve-REMOVE-THIS at cybersource.com.au
Thu Oct 14 02:40:52 EDT 2010


On Thu, 14 Oct 2010 16:41:13 +1300, Lawrence D'Oliveiro wrote:

> In message <mailman.1544.1286800257.29448.python-list at python.org>, Ethan
> Furman wrote:
> 
>> Lawrence D'Oliveiro wrote:
>>
>>> In message <mailman.1533.1286774527.29448.python-list at python.org>,
>>> Ethan Furman wrote:
>>> 
>>>>Lawrence D'Oliveiro wrote:
>>>>
>>>>>In message <mailman.1466.1286556950.29448.python-list at python.org>,
>>>>>Ethan Furman wrote:
>>>>>
>>>>MS treats those first three bytes as a flag -- if they equal the BOM,
>>>>MS treats it as UTF-8, if they equal anything else, MS does not treat
>>>>it as UTF-8.
>>> 
>>> So what does it treat it as? You previously gave examples of flag
>>> values for dBase III. What are the flag values for Windows-1252,
>>> versus, say, ISO-8859-15?
>> 
>> I am not aware of any other flag values for text files besides the BOM
>> for UTF-8.
> 
> Then how can you say “MS treats those first three bytes as a flag”,
> then?

Because Microsoft tools treat those first three bytes as a flag. An 
*optional* flag, but still a flag. If the first three bytes of a text 
file equal the UTF-8 BOM, most MS tools treat them as a BOM. If they 
equal any other value, then they are not treated as a BOM, but merely 
as part of the file's contents.

http://blogs.msdn.com/b/oldnewthing/archive/2004/03/24/95235.aspx
http://blogs.msdn.com/b/oldnewthing/archive/2007/04/17/2158334.aspx
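
To make the "optional flag" idea concrete, here is a minimal sketch in Python of the kind of check described above. It is not Microsoft's code, just an illustration; the function name split_utf8_bom is made up for this example.

```python
import codecs
from typing import Tuple


def split_utf8_bom(data: bytes) -> Tuple[bool, bytes]:
    """Return (had_bom, payload) for the raw bytes of a text file.

    Hypothetical helper: if the first three bytes equal the UTF-8 BOM,
    treat them as a flag and strip them; otherwise every byte is
    ordinary file content.
    """
    if data.startswith(codecs.BOM_UTF8):  # b'\xef\xbb\xbf'
        return True, data[len(codecs.BOM_UTF8):]
    return False, data


flagged = split_utf8_bom(b"\xef\xbb\xbfhello")
plain = split_utf8_bom(b"hello")
```

Here flagged is (True, b"hello") and plain is (False, b"hello"). A file without the BOM tells you nothing about its encoding, which is why the flag is optional rather than one value among several.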

It's not just Notepad either:

http://support.microsoft.com/kb/301623
http://msdn.microsoft.com/en-us/library/cc295463.aspx


The Python interpreter does the same thing: a UTF-8 BOM at the start of a 
source file is taken as declaring the file's encoding to be UTF-8:

http://docs.python.org/reference/lexical_analysis.html#encoding-declarations
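
One way to see this from inside Python is to ask the standard tokenize module which encoding it would pick for a piece of source. This is a sketch assuming Python 3's tokenize.detect_encoding; the helper name source_encoding is made up for this example.

```python
import codecs
import io
import tokenize


def source_encoding(source: bytes) -> str:
    """Return the encoding the tokenizer detects for raw source bytes."""
    # detect_encoding wants a readline callable that yields bytes.
    encoding, _first_lines = tokenize.detect_encoding(
        io.BytesIO(source).readline
    )
    return encoding


with_bom = source_encoding(codecs.BOM_UTF8 + b"x = 1\n")
without_bom = source_encoding(b"x = 1\n")
```

With the BOM present, detect_encoding reports 'utf-8-sig'; with no BOM and no coding comment it falls back to the default, 'utf-8'. Any other leading bytes are just treated as source text.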



-- 
Steven


