[Python-Dev] IO module precisions and exception hierarchy

Antoine Pitrou solipsis at pitrou.net
Sun Sep 27 13:49:06 CEST 2009


On Sun, 27 Sep 2009 10:20:23 +0200, Pascal Chambon wrote:
> Q: Do we want to mandate in the specification that switching between
> reading and writing on a read-write object implies a .flush()?

It doesn't and shouldn't be mandated in the specification, IMO. An 
implementation should be free to optimize out those "implicit flush() 
calls" for performance reasons.

> Eg. If a user opens a file in r/w mode, writes two bytes in it (which
> stay buffered), and then reads 2 bytes, the two bytes read should be
> those on range [2:4] of course, even though the file pointer would, due
> to python buffering, still be at index 0.

Actually the raw file pointer would be at index N, where N is at least 
the buffer size (say 4096).

However, it is not specified how the raw file pointer behaves when using 
a Buffered{Reader, Writer, Random} wrapper over it. The buffered object 
is free to do what it wants with the raw stream until flush() is called 
(or the file is closed, which calls flush() implicitly).
Even after flush() is called, the raw file pointer can still be 
anywhere, but the raw file contents are consistent with the view 
given by the buffered object.
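
As a rough illustration of the point above, here is a minimal sketch 
using the io module with a BufferedRandom wrapper over a FileIO object. 
The function name demo_buffered_read_after_write and the sample bytes 
are made up for the example. It only checks the logical view and the 
file contents after flush(); the raw pointer value is returned but 
deliberately not relied upon, since its behaviour is unspecified.

```python
import io
import os
import tempfile


def demo_buffered_read_after_write():
    """Write 2 bytes, then read 2 bytes, through a BufferedRandom object."""
    # Create a temporary file holding 8 known bytes (sample data, assumed).
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"abcdefgh")

        raw = io.FileIO(path, "r+")
        buffered = io.BufferedRandom(raw)

        buffered.write(b"XY")          # may stay in the write buffer
        data = buffered.read(2)        # logical positions 2 and 3
        logical_pos = buffered.tell()  # position as seen by the user
        raw_pos = raw.tell()           # implementation-dependent value

        buffered.flush()               # raw contents now match the view
        buffered.close()               # also closes the raw stream

        with open(path, "rb") as f:
            on_disk = f.read()
        return data, logical_pos, raw_pos, on_disk
    finally:
        os.remove(path)


if __name__ == "__main__":
    data, logical_pos, raw_pos, on_disk = demo_buffered_read_after_write()
    print(data, logical_pos, on_disk)
    print("raw pointer (not specified):", raw_pos)
```

Whether or not the implementation flushes when switching from writing 
to reading, the user-visible result is the same: the read returns the 
bytes following the ones just written.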

> Q from me: What happens in read/write text files when overwriting a
> three-byte character with a single-byte character? Or, conversely,
> when a single Chinese character overwrites 3 ASCII characters in a UTF-8
> file? Is there any system designed to avoid this data corruption? Or
> should TextIO classes forbid read+write streams?

What happens isn't specified, but in practice (with the current 
implementation) the overwriting will happen at the byte level, without 
any check for correctness at the character level.

Actually, read+write text streams are implemented quite crudely, and 
they get little testing. The reason, as you discovered, is that 
the semantics are too weak, and it is not obvious what stronger semantics 
could look like. People wanting to do sophisticated random reads+writes 
over a text file should probably handle the encoding themselves and 
access the file at the binary level.
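
To make this concrete, here is a small sketch, assuming the behaviour 
of the current CPython implementation (which, again, is not specified). 
It wraps an in-memory io.BytesIO in an io.TextIOWrapper and overwrites 
the start of a UTF-8 stream, then shows one possible way of doing the 
same edit by handling the encoding by hand. The helper names 
overwrite_first_char_text and replace_char_binary are invented for 
this example.

```python
import io


def overwrite_first_char_text(initial: bytes, new_char: str) -> bytes:
    """Overwrite the start of a UTF-8 stream through a text wrapper."""
    raw = io.BytesIO(initial)
    text = io.TextIOWrapper(raw, encoding="utf-8")
    text.write(new_char)  # encoded bytes are written at byte offset 0
    text.flush()
    text.detach()         # keep the BytesIO usable after the wrapper goes away
    return raw.getvalue()


def replace_char_binary(data: bytes, index: int, new_char: str) -> bytes:
    """Replace one character by decoding, editing and re-encoding.

    This rewrites the whole content, so the byte length may change.
    """
    chars = data.decode("utf-8")
    chars = chars[:index] + new_char + chars[index + 1:]
    return chars.encode("utf-8")


if __name__ == "__main__":
    original = "\u20acbc".encode("utf-8")  # euro sign: 3 bytes in UTF-8

    damaged = overwrite_first_char_text(original, "a")
    try:
        damaged.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Only the first byte was replaced; two continuation bytes remain.
        print("corrupted:", damaged, exc)

    print("safe:", replace_char_binary(original, 0, "a"))
```

The second helper trades efficiency for correctness: it works on the 
whole decoded content rather than patching bytes in place, which is the 
simplest way to keep character boundaries intact.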

> Here is a very rough beginning of IOError hierarchy. I'd like to have
> people's opinion on the relevance of these, as well as on what other
> exceptions should be distinguished from basic IOErrors.

This deserves its own PEP IMO, although I'm not sure it would be accepted 
(ISTR the idea of a detailed IO exception hierarchy was already refused 
in the past).

Regards

Antoine.
