remove BOM from string read from utf-8 file

Piet van Oostrum piet at cs.uu.nl
Fri Feb 27 18:37:38 EST 2004


>>>>> "Achim Domma" <domma at procoders.net> (AD) wrote:

AD> "Piet van Oostrum" <piet at cs.uu.nl> wrote in message
AD> news:wzoerkinig.fsf at Ordesa.local...

>> Check text[0] and len(text) to verify.

AD> That's what I did. The file contains 24 Chinese characters and len(text) is
AD> 25. And 0xef is the hex code for the BOM if I'm not completely wrong.

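Sorry, I was wrong.
0xef is only the first byte of the BOM as it is encoded in UTF-8
(0xef 0xbb 0xbf). Once the file has been decoded, the BOM is the single
character u'\ufeff', which accounts for the extra character in len(text).
You have to check for text.startswith(u'\ufeff') and, if it matches,
drop that first character with text[1:].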
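
The check can be sketched roughly as follows. This is only an illustration:
the helper name strip_bom and the two-character sample string are made up
for the example, and the u'' literals assume Python 2 or Python 3.3 and
later.

```python
import codecs

BOM = u'\ufeff'  # the byte order mark as a decoded character (U+FEFF)

def strip_bom(text):
    """Return text without a leading U+FEFF, if one is present.

    strip_bom is a hypothetical helper name used only for this sketch.
    """
    if text.startswith(BOM):
        return text[len(BOM):]
    return text

# Bytes as they would appear in a UTF-8 file that starts with a BOM,
# followed by two sample Chinese characters (made-up data).
raw = codecs.BOM_UTF8 + u'\u4e2d\u6587'.encode('utf-8')

text = raw.decode('utf-8')   # the BOM survives decoding as one character
cleaned = strip_bom(text)    # the leading u'\ufeff' is removed here
```

Later Python versions (2.5 and up) also ship a 'utf-8-sig' codec that skips
a leading BOM while decoding, so raw.decode('utf-8-sig') should give the
same result without the explicit check.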

-- 
Piet van Oostrum <piet at cs.uu.nl>
URL: http://www.cs.uu.nl/~piet [PGP]
Private email: P.van.Oostrum at hccnet.nl


