Eclipse/PyDev - BOM Lexical Error

Ethan Furman ethan at stoneleaf.us
Mon Oct 11 08:04:09 EDT 2010


Lawrence D'Oliveiro wrote:
> In message <mailman.1533.1286774527.29448.python-list at python.org>, Ethan 
> Furman wrote:
> 
> 
>>Lawrence D'Oliveiro wrote:
>>
>>
>>>In message <mailman.1466.1286556950.29448.python-list at python.org>, Ethan
>>>Furman wrote:
>>>
>>>
>>>>Lawrence D'Oliveiro wrote:
>>>>
>>>>
>>>>>But they can only recognize it as a BOM if they assume UTF-8 encoding to
>>>>>begin with. Otherwise it could be interpreted as some other coding.
>>>>
>>>>Not so.  The first three bytes are the flag.
>>>
>>>But this is just a text file. All parts of its contents are text, there
>>>is no “flag”.
>>>
>>>If you think otherwise, then tell us what are these three “flag” bytes
>>>for a Windows-1252-encoded text file?
>>
>>MS treats those first three bytes as a flag -- if they equal the BOM, MS
>>treats it as UTF-8, if they equal anything else, MS does not treat it as
>>UTF-8.
> 
> 
> So what does it treat it as? You previously gave examples of flag values for 
> dBase III. What are the flag values for Windows-1252, versus, say, 
> ISO-8859-15?

I am not aware of any other flag values for text files besides the BOM
for UTF-8.  If the BOM is not there, I imagine MS falls back to the
default code page for that machine's locale, but I do not know for sure.
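
To make the logic concrete, here is a minimal Python sketch of the
behaviour described above: check the first three bytes for the UTF-8 BOM,
and otherwise fall back to the machine's locale encoding.  The names
guess_encoding and decode_text are made up for illustration, and the
fallback branch is only my guess at what MS does, not documented
behaviour.

```python
import codecs
import locale


def guess_encoding(raw: bytes) -> str:
    """Pick a codec name for raw text bytes (hypothetical helper)."""
    # codecs.BOM_UTF8 is the three bytes EF BB BF.
    if raw.startswith(codecs.BOM_UTF8):
        # 'utf-8-sig' decodes as UTF-8 and drops the leading BOM.
        return "utf-8-sig"
    # Assumption: with no BOM, fall back to the locale's default encoding.
    return locale.getpreferredencoding(False)


def decode_text(raw: bytes) -> str:
    """Decode raw bytes using the encoding guessed above."""
    # errors='replace' keeps the sketch from raising on bytes that are
    # invalid in the fallback encoding.
    return raw.decode(guess_encoding(raw), errors="replace")
```

Note that only the UTF-8 branch is triggered by the file contents; a
Windows-1252 or ISO-8859-15 file carries no marker, so it lands in the
fallback branch either way.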

~Ethan~


