Eclipse/PyDev - BOM Lexical Error
Steven D'Aprano
steve-REMOVE-THIS at cybersource.com.au
Thu Oct 14 02:40:52 EDT 2010
On Thu, 14 Oct 2010 16:41:13 +1300, Lawrence D'Oliveiro wrote:
> In message <mailman.1544.1286800257.29448.python-list at python.org>, Ethan
> Furman wrote:
>
>> Lawrence D'Oliveiro wrote:
>>
>>> In message <mailman.1533.1286774527.29448.python-list at python.org>,
>>> Ethan Furman wrote:
>>>
>>>>Lawrence D'Oliveiro wrote:
>>>>
>>>>>In message <mailman.1466.1286556950.29448.python-list at python.org>,
>>>>>Ethan Furman wrote:
>>>>>
>>>>MS treats those first three bytes as a flag -- if they equal the BOM,
>>>>MS treats it as UTF-8, if they equal anything else, MS does not treat
>>>>it as UTF-8.
>>>
>>> So what does it treat it as? You previously gave examples of flag
>>> values for dBase III. What are the flag values for Windows-1252,
>>> versus, say, ISO-8859-15?
>>
>> I am not aware of any other flag values for text files besides the BOM
>> for UTF-8.
>
> Then how can you say “MS treats those first three bytes as a flag”,
> then?
Because Microsoft tools treat those first three bytes as a flag. An
*optional* flag, but still a flag. If the first three bytes of a text
file equal the UTF-8 BOM, most MS tools treat them as a BOM. If they
equal any other value, then they are not treated as a BOM, but merely
part of the file's contents.

http://blogs.msdn.com/b/oldnewthing/archive/2004/03/24/95235.aspx
http://blogs.msdn.com/b/oldnewthing/archive/2007/04/17/2158334.aspx
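
To make the "optional flag" idea concrete, here is a rough sketch of that
check in Python. It is not Microsoft's code: split_bom is a name I made up
for illustration, and leaving the no-BOM case for the caller to decide is
my assumption.

```python
import codecs

def split_bom(data: bytes):
    """Hypothetical helper: return (has_bom, payload) for raw file bytes."""
    # codecs.BOM_UTF8 is the three-byte sequence b'\xef\xbb\xbf'.
    if data.startswith(codecs.BOM_UTF8):
        # The flag is present: strip it and report the rest as content.
        return True, data[len(codecs.BOM_UTF8):]
    # Any other leading bytes are ordinary file contents, not a flag.
    return False, data
```

Note that a missing BOM tells you nothing about the encoding; it only
means the flag was not set.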
It's not just Notepad either:

http://support.microsoft.com/kb/301623
http://msdn.microsoft.com/en-us/library/cc295463.aspx

The Python interpreter does the same thing too:

http://docs.python.org/reference/lexical_analysis.html#encoding-declarations
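
You can watch the interpreter's side of this from Python itself. The
sketch below assumes Python 3, where the standard library exposes
tokenize.detect_encoding; source_encoding is just a wrapper name I chose
for the example.

```python
import io
import tokenize

def source_encoding(source: bytes) -> str:
    """Report the encoding the tokenizer detects for these source bytes."""
    # detect_encoding reads at most two lines, looking for a UTF-8 BOM
    # and/or a PEP 263 coding declaration.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding

with_bom = b"\xef\xbb\xbfprint('hi')\n"
without_bom = b"print('hi')\n"

print(source_encoding(with_bom))     # utf-8-sig
print(source_encoding(without_bom))  # utf-8
```

With the BOM present the tokenizer reports utf-8-sig; without it, and
with no coding declaration, it falls back to the default.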
--
Steven