[Python-ideas] TextIOWrapper callable encoding parameter

Stephen J. Turnbull stephen at xemacs.org
Mon Jun 11 18:24:20 CEST 2012


Nick Coghlan writes:

 > Immediate thought: it seems like it would be easier to offer a way to
 > inject data back into a buffered IO object's internal buffer.

ungetch()?

If you're only interested in the top of the file (see below), I would
suggest allowing only one bufferful, and then simply rewinding the
buffer pointer once you're done.  This is one strategy used by Emacsen
for encoding detection (for the reason pointed out by Rurpy: not all
streams are rewindable).
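
For concreteness, here is a rough sketch of the "one bufferful" strategy
using what the io module already provides.  BufferedReader.peek() returns
buffered bytes without advancing the position, so nothing needs to be
seekable.  The names sniff_encoding and open_sniffed are hypothetical
(they are not part of io), and the detection rule (a BOM, or a
"coding: NAME" cookie anywhere in the first bufferful) is deliberately
simplistic and only an assumption made for illustration.

```python
import codecs
import io
import re

# Hypothetical cookie pattern, loosely modelled on "coding: NAME" markers.
_COOKIE = re.compile(rb"coding[:=]\s*([-\w.]+)")

def sniff_encoding(buffered, default="utf-8"):
    """Guess an encoding from one bufferful without consuming it."""
    # peek() may return more or fewer bytes than requested; the stream
    # position is left unchanged either way.
    head = buffered.peek(1)
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    match = _COOKIE.search(head)
    if match:
        return match.group(1).decode("ascii")
    return default

def open_sniffed(raw, default="utf-8"):
    """Wrap a raw binary stream in a text layer with a sniffed encoding."""
    buffered = io.BufferedReader(raw)
    encoding = sniff_encoding(buffered, default)
    return io.TextIOWrapper(buffered, encoding=encoding)

data = "# -*- coding: latin-1 -*-\ncaf\xe9\n".encode("latin-1")
with open_sniffed(io.BytesIO(data)) as stream:
    print(stream.encoding, repr(stream.read()))
```

Note that this sidesteps the question rather than answering it: the text
layer is only created after the encoding is known, so nothing has to be
reinitialized.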

But is that really "easier"?  It might be more general, but you still
need to reinitialize the encoding (i.e., from the trivial "binary" to
whatever is detected), with all the hair that comes with that.

 > > Executive summary:
 > > ==================
 > >
 > > There is no good way to read a text file when the
 > > encoding has to be determined by reading the start
 > > of the file.  A long-winded version of that follows.
 > > Scroll down to the "Proposal" section to skip it.

This may be insufficiently general.  Specifically, both Emacsen and vi
allow specification of editor configuration variables at the bottom of
the file as well as the top.  I don't know whether vi allows encoding
specs at the bottom, but Emacsen do (though only for files).
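
To illustrate why a bottom-of-file spec only works for files, here is a
minimal sketch that searches the tail of a stream for a "coding: NAME"
marker.  It needs seek(), so it simply gives up on pipes and sockets.
The function name sniff_tail_encoding is hypothetical, the marker syntax
is simplified, and the 3000-byte window is an assumption (if I recall
the Emacs manual correctly, that is roughly how far from the end a local
variables list may start).

```python
import io
import os
import re

# Simplified marker; real local-variables blocks have more structure.
_TAIL_COOKIE = re.compile(rb"coding:\s*([-\w.]+)")

def sniff_tail_encoding(raw, window=3000):
    """Look for a 'coding: NAME' marker in the last `window` bytes."""
    if not raw.seekable():
        return None  # pipes and sockets cannot be searched from the end
    start = raw.tell()
    size = raw.seek(0, os.SEEK_END)
    raw.seek(max(size - window, 0))
    tail = raw.read()
    raw.seek(start)  # restore the caller's position
    match = _TAIL_COOKIE.search(tail)
    return match.group(1).decode("ascii") if match else None

demo = io.BytesIO(b"text\n" * 10
                  + b"# Local Variables:\n# coding: koi8-r\n# End:\n")
print(sniff_tail_encoding(demo))
```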

I wouldn't recommend paying much attention to what Emacsen actually
*do* when initializing a stream (it's, uh, "baroque").


