Detecting Unicode encodings
Christos TZOTZIOY Georgiou
tzot at sil-tec.gr
Sat Aug 21 15:35:22 EDT 2004
On Sat, 21 Aug 2004 10:57:34 -0700, rumours say that Jason Diamond
<jason at injektilo.org> might have written:
>If I read up to four bytes from the byte stream, I can figure out what
>encoding the stream is in but that has problems for UTF-8 streams
>without BOMs--I would have just eaten one or more bytes that might need
>to be decoded by the StreamReader. I could seek back to the beginning of
>the stream but what if the file-like object I was reading from didn't
>support seeking?
Two options pop up instantly:
1. "Programmers do it byte by byte" (mainly a joke, so go to option 2 :)
2. Wrap your file-like object in a custom object that implements a
pushback method and whose read method returns data from the push-back
buffer first. If you read data that you shouldn't have, push it back
and give your custom object to the StreamReader.
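
A rough sketch of option 2, in case it helps. It assumes Python 3 and a
binary file-like object; the names PushbackReader, open_detected and BOMS
are made up for illustration and are not part of the codecs module. Only
codecs.getreader and the codecs.BOM_* constants come from the standard
library. Without a BOM it simply falls back to a default encoding, which
is an assumption on my part, not real detection.

```python
import codecs
import io


class PushbackReader:
    """Hypothetical wrapper: read() drains a push-back buffer first."""

    def __init__(self, stream):
        self._stream = stream
        self._buffer = b""

    def pushback(self, data):
        # Bytes pushed back here are returned first by the next read().
        self._buffer = data + self._buffer

    def read(self, size=-1):
        if size is None or size < 0:
            # Read everything: buffered bytes, then the rest of the stream.
            data, self._buffer = self._buffer + self._stream.read(), b""
            return data
        data, self._buffer = self._buffer[:size], self._buffer[size:]
        if len(data) < size:
            # Buffer exhausted; take the remainder from the wrapped stream.
            data += self._stream.read(size - len(data))
        return data

    def close(self):
        self._stream.close()


# UTF-32 must be checked before UTF-16: the UTF-32-LE BOM starts with
# the same two bytes as the UTF-16-LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def open_detected(stream, default="utf-8"):
    """Return a StreamReader for stream without ever calling seek()."""
    wrapped = PushbackReader(stream)
    head = wrapped.read(4)
    for bom, encoding in BOMS:
        if head.startswith(bom):
            # Drop the BOM, give back whatever was read past it.
            wrapped.pushback(head[len(bom):])
            break
    else:
        # No BOM found: assume the default and give back all the bytes.
        encoding = default
        wrapped.pushback(head)
    return codecs.getreader(encoding)(wrapped)
```

Usage would be something like open_detected(sock_file).read(), where
sock_file is any object with a read method. Since nothing calls seek,
it should also work for pipes and sockets.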
--
TZOTZIOY, I speak England very best,
"Tssss!" --Brad Pitt as Achilles in unprecedented Ancient Greek