[Python-Dev] IO module precisions and exception hierarchy
Antoine Pitrou
solipsis at pitrou.net
Sun Sep 27 13:49:06 CEST 2009
On Sun, 27 Sep 2009 10:20:23 +0200, Pascal Chambon wrote:
> Q: Do we want to mandate in the specification that switching between
> reading and writing on a read-write object implies a .flush()?
The specification doesn't mandate it, and IMO it shouldn't. An
implementation should be free to optimize out those "implicit flush()
calls" for performance reasons.
> Eg. If a user opens a file in r/w mode, writes two bytes in it (which
> stay buffered), and then reads 2 bytes, the two bytes read should be
> those on range [2:4] of course, even though the file pointer would, due
> to python buffering, still be at index 0.
Actually the raw file pointer would be at index N, where N is at least
the buffer size (say 4096).
However, it is not specified how the raw file pointer behaves when using
a Buffered{Reader, Writer, Random} wrapper over it. The buffered object
is free to do what it wants with the raw stream until flush() is called
(or the file is closed, which calls flush() implicitly).
Even after flush() is called, the raw file pointer can still be at
any position, but the raw file contents are consistent with the view
given by the buffered object.
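
To make the distinction concrete, here is a minimal sketch using an
in-memory io.BytesIO as a stand-in for the raw file (an assumption made
for brevity; a real raw stream would be an io.FileIO). It checks only the
logical view and the post-flush contents, and deliberately says nothing
about where the raw pointer ends up:

```python
import io

# In-memory stand-in for a raw file (assumption: a real one would be io.FileIO).
raw = io.BytesIO(b"abcdefgh")
buffered = io.BufferedRandom(raw, buffer_size=4096)

buffered.write(b"XY")          # may sit in the write buffer for now
data = buffered.read(2)        # logical position is 2, so this reads bytes [2:4]
logical_pos = buffered.tell()  # position as seen through the buffered object

# raw.tell() is intentionally not inspected here: where the raw pointer
# sits at this point is up to the buffered object.

buffered.flush()
contents = raw.getvalue()      # after flush(), raw contents match the buffered view
```

Whether the write reaches the raw stream before the read is served is
left to the implementation; only the values seen through the buffered
object, and the raw contents after flush(), are relied upon here.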
> Q from me: What happens in read/write text files, when overwriting a
> three-byte character with a single-byte character? Or conversely,
> when a single Chinese character overwrites 3 ASCII characters in a UTF-8
> file? Is there any system designed to avoid this data corruption? Or
> should TextIO classes forbid read+write streams?
What happens isn't specified, but in practice (with the current
implementation) the overwriting will happen at the byte level, without
any check for correctness at the character level.
Actually, read+write text streams are implemented quite crudely, and
they receive little testing. The reason, as you discovered, is that
the semantics are too weak, and it is not obvious what stronger semantics
would look like. People wanting to do sophisticated random reads+writes
over a text file should probably handle the encoding themselves and
access the file at the binary level.
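
As an illustration, the sketch below simulates a byte-level overwrite
directly on a binary stream (it does not go through TextIOWrapper, whose
behaviour here is unspecified) and then shows one possible way to handle
the encoding by hand. The helper replace_char is hypothetical, not part
of the io module, and it rewrites the whole stream, so it only suits
small files:

```python
import io

# "a", one CJK character (3 bytes in UTF-8), "b"
original = "a\u4e2db".encode("utf-8")    # b"a\xe4\xb8\xadb"

# Byte-level overwrite: replace only the first byte of the 3-byte character.
stream = io.BytesIO(original)
stream.seek(1)
stream.write(b"X")
corrupted = stream.getvalue()            # two stray continuation bytes remain

try:
    corrupted.decode("utf-8")
    decodes_cleanly = True
except UnicodeDecodeError:
    decodes_cleanly = False


def replace_char(binary_stream, index, new_char, encoding="utf-8"):
    """Hypothetical helper: replace the character at `index` (counted in
    characters, not bytes) by decoding, editing and re-encoding everything."""
    binary_stream.seek(0)
    text = binary_stream.read().decode(encoding)
    text = text[:index] + new_char + text[index + 1:]
    binary_stream.seek(0)
    binary_stream.write(text.encode(encoding))
    binary_stream.truncate()             # drop leftover bytes if the data shrank


safe = io.BytesIO(original)
replace_char(safe, 1, "X")
repaired = safe.getvalue()
```

The first half leaves bytes that no longer decode as UTF-8; the second
half yields a valid, shorter file because the edit is made on decoded
text and the stream is truncated afterwards.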
> Here is a very rough beginning of IOError hierarchy. I'd like to have
> people's opinion on the relevance of these, as well as on what other
> exceptions should be distinguished from basic IOErrors.
This deserves its own PEP IMO, although I'm not sure it would be accepted
(ISTR the idea of a detailed IO exception hierarchy was already refused
in the past).
Regards
Antoine.