Python 3 is killing Python

Chris Angelico rosuav at gmail.com
Wed Jul 16 04:44:38 EDT 2014


On Wed, Jul 16, 2014 at 5:49 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> ... Although I'm open to the suggestion
> that maybe the Pythonic way to do that should be:
>
> print("foo bar baz", file="foo.txt")
>

And I would argue against that suggestion, having worked with a
language where that's the case. In REXX, you write to files using the
LINEOUT and CHAROUT functions (the former adds end-of-line after its
written content, the latter doesn't):

call lineout "foo.txt","foo bar baz"
/* or */
call charout "foo.txt","foo bar "
call lineout "foo.txt","baz"

And correspondingly, CHARIN and LINEIN to read from files. This is
nice and convenient, but it has a number of problems:

1) Hidden global state. Somewhere there's a mapping of file names to
open file handles, and it's not obvious.
2) Corollary: Surprising behaviour if you try to use a file twice in
one program.
3) Closing a file is sometimes unobvious. If you terminate the program
with open files, there's no problem, but if the program keeps running,
its files stay open.
4) Very VERY occasionally, you might run into a problem with too many
open files. (It should be noted that I learned REXX back in the 90s.
It's entirely possible that "too many open files" would be at some
insanely ridiculous number now.) At that point, you need to close
something... but how can you know?

Here's a REXX-style set of functions, implemented in Python:

# files.py
_filemap = {}
def _open(fn, mode): _filemap[fn] = open(fn, mode)

def charout(fn, s):
    if fn not in _filemap: _open(fn, "w")
    _filemap[fn].write(s)

def lineout(fn, s): charout(fn, s+"\n")

def charin(fn, n=1):
    if fn not in _filemap: _open(fn, "r")
    return _filemap[fn].read(n)

# Okay, the stream() function does a *lot* more than
# closing files, but that's all I'm implementing.
def stream(fn, *args):
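    # *args arrives as a tuple, so compare against a tuple, not a list
    if args != ("c", "close"): raise NotImplementedError
    # actually close the file as well as forgetting about it
    _filemap.pop(fn).close()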



That's more-or-less how REXX does things. There are a lot more
complications (I didn't implement LINEIN, which requires buffering -
more global state), but that's the basic layout. Now, that's already
scaring you a bit (all that global state!), but it gets worse: you
either have heaps of duplication all through your code (repeating the
file name in every output statement), or you have a variable with the
file name that functions as a cookie - it's the same as a file handle
integer, or a FILE *fp with the C stdio library, or a file object in
Python, or anything like that. Usage would be like this:

fn = "foo.txt"
print("foo bar baz", file=fn)
print("hello, world", file=fn)
close_file(fn)

Which has no significant improvement over the current:

f = open("foo.txt", "w")
print("foo bar baz", file=f)
print("hello, world", file=f)
f.close()

And it's worse, because if you put this into a function and return
early from it, the second form will garbage-collect f and close the
file (promptly in CPython, thanks to reference counting), but the
first form won't: the hidden file map still holds the handle. That's a
recipe for surprises down the track.

There is a use-case where this is an improvement: you can have a
function that writes to a log file or something, and it doesn't need
to monitor state:

def some_func(...):
    do_stuff()
    if condition: print(some_state, file="some.log")
    do_more_stuff()

With Python's normal style, this would need to either keep on opening
and closing the file (slow and inefficient), or keep track of an open
file object somewhere (global state). If you're going to have global
state anyway, then it's easier to push it to someone else. But I'd
much rather NOT have that state... not to mention the potential
problems from having aliases to the file. I've never tried, for
instance, opening a file using two equivalent names, but it'd probably
open the file twice. Even more confusion.

It's great to be open to suggestions. It's great to be able to discuss
them and figure out which ones are actually worth pursuing :)

ChrisA


