[Tutor] New file stuff (formerly inoffensive non-commercial mail)

Scot W. Stevenson scot@possum.in-berlin.de
Tue, 13 Aug 2002 23:28:51 +0200


Hello Alan, 

> Yep, so maybe there's a reason...

There usually is. You know, back when I was 17, I knew it all, and ever 
since then, I seem to have become progressively more stupid =X)...

> Actually altho' Python does wrap the C stuff the file
> methods are very similar to every mainstream programming
> language around from ADA to Lisp, to Smalltalk...

Well, yes, but just because it is the way that it has always been done by 
computer scientists doesn't mean that a different way might not be easier 
for people who don't tend to start counting with 0. Look at indentation 
and (new) division, two places where Python is now marching to its own 
drummer. Asking somebody to remember "br+" or whatever it is to open a 
file just is not the way things are usually done in the language.

> Pascal kind of makes that distinction by having text
> files as a special category. But binary files are
> broken into myriad types... FILE OF <FOO>

If anybody wanted to do that (deal with a file as a collection of 64-bit 
words instead of bytes, for example) they could presumably subclass the 
binfile object. Or something like that. I'm guessing that people use text 
or text-like files (HTML, XML, whatnot) so much that having a set of 
commands for text files is worth the effort. 
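
A minimal sketch of the "64-bit words instead of bytes" idea, assuming a 
current Python 3 and the standard struct module. The name read_words and 
the little-endian default are assumptions made for illustration; the 
"binfile" object above is hypothetical and does not exist in Python.

```python
import struct


def read_words(path, word_format="<Q"):
    """Yield a binary file's content as unsigned 64-bit words.

    read_words is a hypothetical helper; "<Q" means one little-endian
    unsigned 64-bit integer in struct notation.
    """
    size = struct.calcsize(word_format)  # 8 bytes for "<Q"
    with open(path, "rb") as f:
        while True:
            chunk = f.read(size)
            if len(chunk) < size:
                break  # end of file; an incomplete trailing word is dropped
            yield struct.unpack(word_format, chunk)[0]
```

Something like "for word in read_words(path): ..." would then loop over 
words rather than bytes.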

> > With this type of file, we can iterate, splice, or index the content
> > without having to explicitly tell the Elves that we want to
> > read or write or whatnot:

> That's actually quite tricky to do. Why not try implementing
> the interface in Python to see what's involved.... 

Yes, that would be the next logical step in my argument, wouldn't 
it...argh. I'll have to see what I can come up with (did I mention I'm 
just learning Python <g>) next week when I have some time on my hands...
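
A rough sketch of what such an interface could look like for text files, 
assuming a current Python 3. LineFile is a hypothetical name, and the 
approach (keep all lines in memory, rewrite the whole file after every 
change) is only one possible design, chosen because it is short, not 
because it is fast.

```python
class LineFile:
    """Hypothetical wrapper that treats a text file as a list of lines."""

    def __init__(self, path):
        self.path = path
        try:
            with open(path, "r") as f:
                self.lines = f.read().splitlines()
        except FileNotFoundError:
            self.lines = []  # a missing file starts out empty

    def __len__(self):
        return len(self.lines)

    def __iter__(self):
        return iter(self.lines)

    def __getitem__(self, index):
        # Accepts integers and slices, exactly like a list.
        return self.lines[index]

    def __setitem__(self, index, value):
        self.lines[index] = value
        self.save()

    def append(self, line):
        self.lines.append(line)
        self.save()

    def save(self):
        # Rewrite the whole file; simple, but slow for large files.
        with open(self.path, "w") as f:
            f.writelines(line + "\n" for line in self.lines)
```

With this, "notes = LineFile(path)" followed by "notes[0]", "notes[2:5]" 
or "for line in notes:" works without the caller ever naming a mode.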

> One problem is that under the covers you have to figure out predictively
> what mode to open the raw file in - what
> does the user want to do with it. Otherwise you have to
> open/close the file after each operation and keep track
> of where the last access was etc etc...

This is where I run up against my lack of background knowledge on operating 
system basics - why can't you just read the whole file into a buffer and 
manipulate that, with occasional flushes to a backup version? If I 
understand Linux correctly, this is what the operating system does anyway, 
or at least that is the excuse everybody keeps giving me when I ask why 
"free" shows me that all my nice RAM is being used for buffers and caches 
and stuff like that.

The trick (I guess) would be to make sure at all times that the file is not 
corrupted when the system crashes (this seems to be more a constant worry 
with Windows and (old) Mac users, but I also remember being told that 
Murphy was a computer scientist at heart). Include a buffer flush command 
after every write? Once you have everything in a buffer, you can do 
everything you want rather quickly (the end version has to be in C anyway 
for speed). 
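
One way the "flush after every write" idea might look, as a sketch in a 
current Python 3. The function name save_safely and the ".tmp" suffix are 
assumptions for illustration. The new content is written to a scratch 
file, pushed out to disk, and only then renamed over the original, so a 
crash in the middle should leave the old version untouched.

```python
import os


def save_safely(path, text):
    """Write text to path via a scratch file (hypothetical helper)."""
    tmp_path = path + ".tmp"  # assumed naming scheme for the scratch copy
    with open(tmp_path, "w") as f:
        f.write(text)
        f.flush()             # push Python's own buffer to the operating system
        os.fsync(f.fileno())  # ask the operating system to write it to disk
    # Swap the new version in; on POSIX systems this rename is atomic.
    os.replace(tmp_path, path)
```

The fsync call is the expensive part, so doing this after every single 
write would trade a good deal of speed for safety.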

If you do decide to do everything directly, yes, you might have to reopen 
and close the file a few times. But if speed is the problem, you can 
always go to the os module and do it the hard way. I'm assuming here that 
the lowest level you can get to is the POSIX calls (was that the name?) 
to the operating system, and that they force you to decide if you want to 
read or write? So you couldn't just write a new C library for opening and 
closing files?
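
For reference, the POSIX open() call takes a read-only, a write-only or a 
read-write flag, and Python's os module exposes them as os.O_RDONLY, 
os.O_WRONLY and os.O_RDWR. So the lowest level does not force a choice 
between reading and writing. A small sketch, assuming a current Python 3; 
overwrite_start is a hypothetical name.

```python
import os


def overwrite_start(path, data):
    """Read the first len(data) bytes, then overwrite them in place."""
    fd = os.open(path, os.O_RDWR)  # one descriptor for reading and writing
    try:
        old = os.read(fd, len(data))
        os.lseek(fd, 0, os.SEEK_SET)  # go back to the start of the file
        os.write(fd, data)
        return old
    finally:
        os.close(fd)
```

The descriptor stays open across the read and the write; only the 
position has to be moved in between.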

> Not too hard if its text and you assume line by line
> access rather than characters, binary presumably
> returns bytes?

Yes - with maybe an option for multiples of bytes, but that would be for 
somebody to decide who knows more about the uses of binary files. I don't 
think I have ever accessed one in Python, but then I've heard that they 
are more common with Windows and (old) Macs than with Linux. 

[splices and indices]
> Ah, but now try implementing that on a binary file.
> But I guess you could just seek(0) after each
> operation... or could you? It might depend on the
> current mode...

Worst case would probably be close file, open file, seek(0). That certainly 
would not be fast in relative terms; the question is, how fast is this 
going to be in human terms? I'm assuming the heavy-lifting people will 
want to use the old version with the os module anyway, because staying 
close to C (or Java in the case of Jython) is always going to be faster 
than anything one level up.
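
The "how fast in human terms" question can be measured rather than 
guessed. A sketch, assuming a current Python 3; time_reopen is a 
hypothetical helper that runs the worst case (open, seek(0), read one 
byte, close) many times and reports the average cost of one round trip.

```python
import time


def time_reopen(path, repeats=1000):
    """Return the average seconds for one open/seek/read/close cycle."""
    start = time.perf_counter()
    for _ in range(repeats):
        f = open(path, "rb")
        f.seek(0)
        f.read(1)
        f.close()
    return (time.perf_counter() - start) / repeats
```

The result depends heavily on the machine and on whether the file is 
already in the operating system's cache, so it is only a rough guide.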

> BTW Have you looked at the fileinput module which
> does a little bit of what you want I think....

No, I hadn't, thank you for the reference. Will read it.
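
For anyone else who has not met it: the standard fileinput module loops 
over the lines of one or more files without explicit open and close 
calls. A short sketch, assuming a current Python 3; number_lines is a 
hypothetical name.

```python
import fileinput


def number_lines(paths):
    """Return (filename, line number, text) for every line in paths."""
    result = []
    with fileinput.input(files=paths) as stream:
        for line in stream:
            result.append(
                (stream.filename(), stream.filelineno(), line.rstrip("\n"))
            )
    return result
```

It only covers reading line by line, so it does just a part of the 
index-and-splice idea discussed above.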

I'll see about throwing together an interface for the new versions as a 
first step; though I should warn everybody right away that I'll need a bit of 
help here...

Y, Scot

-- 
  Scot W. Stevenson wrote me on Tuesday, 13. Aug 2002 in Zepernick, Germany   
       on his happy little Linux system that has been up for 1363 hours       
        and has a CPU that is falling asleep at a system load of 0.00.