[Python-Dev] TextIOWrapper.tell()

Guido van Rossum guido at python.org
Wed Jun 30 19:28:10 CEST 2010


On Wed, Jun 30, 2010 at 10:20 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Wed, 30 Jun 2010 10:03:49 -0700
> Guido van Rossum <guido at python.org> wrote:
>>
>> > Also, please note that values used by seek() and tell() on
>> > text I/O are "opaque cookies". While they can happen to match the
>> > raw binary file position, it is a mere coincidence (or an
>> > implementation detail, at your will). Therefore, reusing tell() values
>> > of a binary file to seek() a TextIOWrapper accessing the same file
>> > is wrong.
>>
>> Well, um, I actually designed it carefully so that bytes offsets
>> *would* work as text offsets in those cases where they make sense at
>> all.
>
> Ah, this is embarrassing. I always assumed it was an implementation
> detail since neither the PEP nor the module docs say otherwise.
>
> PEP 3116 clearly says:
>
> “Unlike with raw I/O, the units for .seek() are not specified - some
> implementations (e.g. StringIO) use characters and others (e.g.
> TextIOWrapper) use bytes.”
>
> And also:
>
> “.seek(pos: object, whence: int = 0) -> int
>
>    Seek to position pos. If pos is non-zero, it must be a cookie
>    returned from .tell() and whence must be zero.”
>
> “it must be a cookie returned from .tell()” here seems to imply that
> non-zero values of other origin should not be used.

Guilty as charged. I really did take care that it would work, but
forgot to mention it. I guess we can depend on this property *inside*
the stdlib (as long as there are tests for each piece of code
depending on it that would break if it ever changed) but should not
advertise it widely. Note that it doesn't go the other way -- due to
encoding state, text streams can certainly return cookies that make no
sense to binary streams. But text streams take byte offsets too and do
the best they can. (Obviously if a byte offset points in the middle of
a multibyte character all bets are off.)
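
A minimal sketch of the property described above, assuming CPython's io
module and UTF-8 data. The sample bytes and variable names are hypothetical,
and since the behaviour is undocumented this is an illustration, not a
guarantee. A byte offset taken from the binary data is passed straight to
seek() on a TextIOWrapper, and it lands on a character boundary:

```python
import io

# Hypothetical sample data: two UTF-8 lines containing multibyte characters.
data = "héllo\nwörld\n".encode("utf-8")

# A byte offset computed on the binary side: the start of the second line.
byte_offset = data.index(b"\n") + 1

# newline="\n" disables newline translation, so text maps directly onto bytes.
text = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", newline="\n")

# Pass the plain byte offset to seek(), not a cookie obtained from tell().
text.seek(byte_offset)
line = text.readline()
print(repr(line))
```

The offset here points at the start of a character. An offset that falls
inside the two-byte "é" or "ö" would not be expected to decode cleanly.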

The C stdlib has a similar thing -- while AFAIK POSIX lseek() really
is required to return and take byte offsets, this is not required for
fseek() and ftell() on text streams according to the C standard -- but
I think it's still a pretty safe bet, and I betcha lots of apps are
making this assumption.

-- 
--Guido van Rossum (python.org/~guido)

