Detecting Unicode encodings

Christos TZOTZIOY Georgiou tzot at sil-tec.gr
Sat Aug 21 15:35:22 EDT 2004


On Sat, 21 Aug 2004 10:57:34 -0700, rumours say that Jason Diamond
<jason at injektilo.org> might have written:

>If I read up to four bytes from the byte stream, I can figure out what
>encoding the stream is in but that has problems for UTF-8 streams
>without BOMs--I would have just eaten one or more bytes that might need
>to be decoded by the StreamReader. I could seek back to the beginning of
>the stream but what if the file-like object I was reading from didn't
>support seeking?

Two options pop up instantly:

1. "Programmers do it byte by byte" (mainly a joke, so go to option 2 :)

2. Wrap your file-like object in a custom object that implements a
pushback method and whose read method returns data from the pushback
buffer first.  If you read data that you shouldn't have, push it back
and give your custom object to the StreamReader.
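
A minimal sketch of option 2, assuming Python 3 and byte streams.  The
names PushbackReader and open_sniffed are made up for illustration; only
the codecs module calls (getreader and the BOM_* constants) come from
the standard library.  The sketch checks for a BOM only and falls back
to a default encoding when none is found.

```python
import codecs
import io


class PushbackReader:
    """Hypothetical wrapper: adds a pushback buffer to any object with read()."""

    def __init__(self, stream):
        self._stream = stream
        self._buffer = b""

    def pushback(self, data):
        # Pushed-back bytes are returned before anything new is read.
        self._buffer = data + self._buffer

    def read(self, size=-1):
        if size is None or size < 0:
            # Read everything: drain the buffer, then the underlying stream.
            data = self._buffer + self._stream.read()
            self._buffer = b""
            return data
        # Serve from the pushback buffer first, then top up from the stream.
        data = self._buffer[:size]
        self._buffer = self._buffer[size:]
        if len(data) < size:
            data += self._stream.read(size - len(data))
        return data


# UTF-32 is checked before UTF-16 because the UTF-32-LE BOM
# starts with the same two bytes as the UTF-16-LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def open_sniffed(stream, default="utf-8"):
    """Hypothetical helper: sniff a BOM, return a StreamReader. No seek() needed."""
    wrapped = PushbackReader(stream)
    head = wrapped.read(4)
    for bom, name in BOMS:
        if head.startswith(bom):
            # Keep the BOM consumed; push back only the bytes after it.
            wrapped.pushback(head[len(bom):])
            return codecs.getreader(name)(wrapped)
    # No BOM: push back everything that was read and use the default.
    wrapped.pushback(head)
    return codecs.getreader(default)(wrapped)
```

For example, open_sniffed(some_stream).read() returns the decoded text
whether or not some_stream supports seeking, since the bytes consumed
while sniffing are handed back through the pushback buffer.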
-- 
TZOTZIOY, I speak England very best,
"Tssss!" --Brad Pitt as Achilles in unprecedented Ancient Greek


