file object, details of modes and some issues.

Christos TZOTZIOY Georgiou tzot at sil-tec.gr
Tue Aug 26 14:11:52 EDT 2003


On Tue, 26 Aug 2003 16:43:08 +0100, rumours say that simon place
<simon_place at lineone.net> might have written:

>is the code below meant to produce rubbish?, i had expected an exception.
>
>f=file('readme.txt','w')
>f.write(' ')
>f.read()
>
>( PythonWin 2.3 (#46, Jul 29 2003, 18:54:32) [MSC v.1200 32 bit (Intel)] on
>win32. )

[snip]

>     'w' is (w)rite mode so you can't read from the file, ( any existing file is
>erased or a new file created, and bear in mind that anything you write to the
>file can't be read back directly on this object.), you get 'IOError: [Errno 9]
>Bad file descriptor' if you try reading, which is an awful error description.
>BUT this only happens at the beginning of the file? when at the end of the
>file, as is the case when you have just written something ( without a backward
>seek, see below), you don't get an exception, but lots of rubbish data ( see
>example at beginning.) This mode allows you to seek backward and rewrite
>data, but if you try a read somewhere between the first character and the end,
>you get a different exception 'IOError: (0, 'Error')'

This is a bug (reading after writing on 'w' files returns garbage),
although I don't know if it is a Python bug or stdio's.  It seems to
happen only on Windows (Linux and Irix behave correctly, raising IOError
with errno = 9, "bad file descriptor"), and I don't have VS6 available
right now.  Anyone out there willing to try it?
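
For anyone who wants to check their own platform, here is a minimal
sketch of the test. It assumes a modern Python 3 interpreter rather than
the 2.3 build discussed above, so it uses open() instead of file(); the
helper name read_after_write and the temporary path are only
illustrative. On Python 3 the read is rejected by the io layer before it
reaches the C library, so no garbage should come back.

```python
import io
import os
import tempfile


def read_after_write(path):
    """Write to a file opened with 'w', then try to read from it.

    Returns the data read, or the exception raised by the read.
    """
    with open(path, "w") as f:
        f.write(" ")
        try:
            return f.read()
        except (OSError, ValueError) as exc:
            # Python 3 raises io.UnsupportedOperation here, which is a
            # subclass of both OSError and ValueError.
            return exc


path = os.path.join(tempfile.mkdtemp(), "readme.txt")
result = read_after_write(path)
print(type(result).__name__, result)
```

If the helper ever returns a string instead of an exception, that
platform is handing back data from a write-only file.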

>     'a' is (a)ppend mode, you can only add to the file, so basically write mode
>(with the same problems ) plus a seek to the end, obviously append doesn't
>erase an existing file and it also ignores file seeks, so all writes pile up
>at the end. tell() gives the correct location in the file after a write ( so
>actually always gives the length of the file.) but if you seek() you don't get
>an exception and tell() returns the new value but writes actually go to the
>end of the file, so if you use tell() to find out where writes are going, in
>this mode it might not always be right.

This is stdio behaviour.  Perhaps Python should try to provide a more
stable environment than the underlying library as it does in other
places.
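
The append rule comes from C itself: a stream opened in append mode
sends every write to the current end of file, whatever seeks came
before. A small sketch of that, assuming Python 3 and a scratch file
(the function name append_after_seek is made up for the example):

```python
import os
import tempfile


def append_after_seek(path):
    """Seek to the start of an append-mode file, write, and report
    where the data actually landed."""
    with open(path, "wb") as f:
        f.write(b"abc")
    with open(path, "ab") as f:
        f.seek(0)                 # accepted without complaint
        pos_before = f.tell()     # reports the position we asked for
        f.write(b"X")             # but the write still goes to the end
    with open(path, "rb") as f:
        return pos_before, f.read()


path = os.path.join(tempfile.mkdtemp(), "append.bin")
pos_before, content = append_after_seek(path)
print(pos_before, content)
```

So tell() before the write is not a reliable guide to where the data
will go in this mode; checking the file afterwards is.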

>     'r+' is (r)ead (+) update, which means read and write access,  but
>don't read, without backward seeking, after a write because it will then read
>a lot of garbage.( the rest of the disk fragment/buffer i guess? )

stdio behaviour again.
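
The C standard requires a flush or a file-positioning call between a
write and a following read on an update stream, which is why the
backward seek matters. A minimal sketch of the safe pattern, assuming
Python 3 (the name overwrite_then_read and the sample data are
illustrative only):

```python
import os
import tempfile


def overwrite_then_read(path):
    """Overwrite the first bytes of a file opened 'r+' and read it back."""
    with open(path, "r+") as f:
        f.write("XY")
        f.seek(0)        # reposition before switching from write to read
        return f.read()


path = os.path.join(tempfile.mkdtemp(), "update.txt")
with open(path, "w") as f:
    f.write("hello")
print(overwrite_then_read(path))
```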

>     'a+' is (a)ppend (+) update mode, which also means read and write, but
>file seeks are ignored,

No, they're not (on Linux at least).  I do use this mode in a script,
where I want to append to a file without duplicating data, so I first do
a .seek(0), read the existing data, and then do my .write(data).  Works
fine.
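
Roughly, the pattern looks like the sketch below. This is not the actual
script; it assumes Python 3 and line-oriented data, and the helper name
append_missing_lines is hypothetical. The explicit seek to the end
before writing is there to keep a positioning call between the read and
the write.

```python
import os
import tempfile


def append_missing_lines(path, lines):
    """Append only those lines that are not already in the file."""
    added = []
    with open(path, "a+") as f:
        f.seek(0)                          # 'a+' starts at the end
        existing = set(f.read().splitlines())
        f.seek(0, os.SEEK_END)             # reposition before writing
        for line in lines:
            if line not in existing:
                f.write(line + "\n")
                existing.add(line)
                added.append(line)
    return added


path = os.path.join(tempfile.mkdtemp(), "log.txt")
with open(path, "w") as f:
    f.write("a\nb\n")
print(append_missing_lines(path, ["b", "c"]))
```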

>so any reads seem a bit pointless since they always
>read past the end of the file!

I did try that on Windows (2k SP4) and you are absolutely correct;
whether the file is opened as text or binary, a .seek(0) followed by
.read() fetches garbage.  I will open a bug report if no one else does,
but first I would like to know whether the Windows stdio is to blame or
not.

>returning garbage, but it does extend
>the file, so this garbage becomes incorporated in the file!! ( yes really )

Ouch!  I replicated that behaviour too...

>     'b', all modes can have a 'b' appended to indicate binary mode, i think this
>is something of a throw-back to serial comms ( serial comms being bundled into
>the same handlers as files because when these things were developed, 20+ years
>ago, nothing better was around. )

Treating everything as a file (common API to open, read and write)
seemed ingenious to me when I first met Unix 15 years ago; extra calls
to accommodate extra features a la ioctl seem fine too.  'nothing better
was around' seems strange, it sounds like a compromise... what would be /
is better?

>Binary mode turns off the 'clever' handling
>of line ends and ( depending on use and os ) other functional characters (
>tabs expanded to spaces etc ), the normal mode is already binary on windows so

Normal mode on Windows is text, not binary.  AFAIK text mode == binary
mode only on *nix (VMS was CR+LF I believe, Mac is CR, MS "Doors" :) and
MS Windows are CR+LF, but I haven't used all OSes since the beginning of
time).
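
To see what text mode does to line ends without needing each platform at
hand, here is a small sketch. It assumes Python 3, whose open() takes a
newline argument that did not exist in 2.3; passing it explicitly
imitates each platform's text-mode convention. The function name
written_bytes is made up for the example.

```python
import os
import tempfile


def written_bytes(path, text, newline):
    """Write text using the given line-end convention and return the
    raw bytes that ended up in the file."""
    with open(path, "w", newline=newline) as f:
        f.write(text)
    with open(path, "rb") as f:   # binary mode: no translation on read
        return f.read()


path = os.path.join(tempfile.mkdtemp(), "lines.txt")
for name, nl in [("unix", "\n"), ("windows", "\r\n"), ("old mac", "\r")]:
    print(name, written_bytes(path, "a\nb\n", nl))
```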

>binary makes no difference on win32 files. But since it may do on other
>o.s.'s, ( or when actually using the file object for serial comms.) i think

What would be so different for serial comms?  Talking about *nix, by the
time the data arrive at the file layer, any 'stty tab?' etc. settings
have already been applied... if your tabs have been converted to spaces
because of the line settings, opening the tty in binary mode won't
change a thing.  And LFs are LFs.

>you should actually ALWAYS use the binary version of the mode, and handle the
>line ends etc. yourself. ( then of course you'll have to deal with the
>different line end types!)

>     Bit surprised that the file object doesn't do ANY access control, multiple
>file objects on the same actual file can ALL write to it!! and other software
>can edit files opened for writing by the file object. However a write lock on
>a file made by other software causes an 'IOError: [Errno 13] Permission denied'
>when opened by python with write access. i guess you need
>os.access to test file locks and os.chmod to change the file locks, but i
>haven't gone into this, shame that there doesn't appear to be a nice simple
>file object subclass that does all this! Writes to the file object actually
>get done when flush() ( or seek() ) is called.

A File object class with unified locking has been and still is a good
idea;  I believe you just volunteered to do it? :)
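
As a starting point only, here is a rough sketch of what such a wrapper
might look like. It assumes Python 3 and covers just the POSIX case via
fcntl.flock, which is an advisory lock: it only restrains other programs
that also ask for the lock. On platforms without fcntl the sketch opens
the file with no locking at all. The name locked_open is hypothetical,
not an existing API.

```python
import contextlib
import os
import tempfile

try:
    import fcntl          # POSIX only
except ImportError:       # e.g. Windows: this sketch does not lock there
    fcntl = None


@contextlib.contextmanager
def locked_open(path, mode="a+"):
    """Hypothetical helper: open path and hold an exclusive advisory
    lock on it for the duration of the with-block."""
    f = open(path, mode)
    try:
        if fcntl is not None:
            fcntl.flock(f.fileno(), fcntl.LOCK_EX)   # blocks until free
        yield f
    finally:
        if fcntl is not None:
            f.flush()                                # write before unlocking
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)
        f.close()


path = os.path.join(tempfile.mkdtemp(), "shared.txt")
with locked_open(path) as f:
    f.write("one writer at a time\n")
```

A truly unified version would need a Windows branch as well, and a
decision about what to do when the lock cannot be obtained.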

>     suffice to say, i wasn't entirely impressed with the python file object, then
>i remembered the cross platform problems it's dealing with and all
>the code that works ok with it, and thought i'd knock up this post of my
>findings to try to elicit some discussion / get it improved / stop others
>making mistakes.

Indeed, "cross-platformability" is not a trivial task...
-- 
TZOTZIOY, I speak England very best,
Microsoft Security Alert: the Matrix began as open source.



