[Python-ideas] struct.unpack should support open files

Steven D'Aprano steve at pearwood.info
Wed Dec 26 19:42:30 EST 2018


On Wed, Dec 26, 2018 at 01:32:38PM +0000, Paul Moore wrote:
> On Wed, 26 Dec 2018 at 09:26, Steven D'Aprano <steve at pearwood.info> wrote:
> > Regardless, my point doesn't change. That has nothing to do with the
> > behaviour of unpack. If you pass a non-blocking file-like object which
> > returns None, you get exactly the same exception as if you wrote
> >
> >     unpack(fmt, f.read(size))
> >
> > and the call to f.read returned None. Why is it unpack's responsibility
> > to educate the caller that f.read can return None?
> 
> Abstraction, basically - once the unpack function takes responsibility
> for doing the read, and hiding the fact that there's a read going on
> behind an API unpack(fmt, f), it *also* takes on responsibility for
> managing all of the administration of that read call.

As I keep pointing out, the json.load and pickle.load functions don't 
take on all that added administration. Neither do marshal or 
zipfile, and I daresay there are others.

Why does "abstraction" apply to this proposal but not the others?

If you pass marshal.load a file-like object whose read method returns 
less than a full record, it simply raises an exception. There's no 
attempt to handle non-blocking streams, or to re-read until a full 
record arrives:

py> import marshal
py> class MyFile:
...     def read(self, n=-1):
...             print("reading")
...             return marshal.dumps([1, "a"])[:5]
...
py> marshal.load(MyFile())
reading
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
EOFError: EOF read where object expected

The use-case for marshal.load is to read a valid, complete marshal 
record from a file on disk. Likewise for json.load and pickle.load. 
There's no need to complicate the implementation by handling streams 
from ttys and other exotic file-like objects.
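
json.load behaves exactly the same way. Give it a truncated record 
and it just raises (traceback trimmed; the exact message may vary 
across versions):

py> import io, json
py> json.load(io.StringIO('[1, "a"'))
Traceback (most recent call last):
  ...
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 8 (char 7)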

Likewise there's zipfile, which also doesn't take on this extra 
responsibility. It doesn't try to support non-blocking streams which 
return None, for example. It assumes the input file is seekable, and 
doesn't raise a dedicated error for the case that it isn't. Nor does it 
support non-blocking streams by looping until it has read the data it 
expects.
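
For example, hand ZipFile a file-like object with no seek method at 
all and you just get whatever error falls out (traceback trimmed; the 
exact exception may differ between versions, but there's nothing 
zipfile-specific about it):

py> import zipfile
py> class Unseekable:
...     def read(self, n=-1):
...             return b""
...
py> zipfile.ZipFile(Unseekable())
Traceback (most recent call last):
  ...
AttributeError: 'Unseekable' object has no attribute 'seek'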

The use-case for unpack with a file object argument is the same. Why 
should we demand that it alone take on this unnecessary, unwanted, 
unused extra responsibility?


It seems to me that the only people insisting that unpack() take on 
this extra responsibility are those who are opposed to the proposal. 
We're 
asking for a battery, and they're insisting that we actually need a 
nuclear reactor, and rejecting the proposal because nuclear reactors are 
too complex. Here are some of the features that have been piled on to 
the proposal:

- you need to deal with non-blocking streams that return None;
- if you read an incomplete struct, you need to block and read 
  in a loop until the struct is complete;
- you need to deal with OS errors in some unspecified way, apart from 
  just letting them bubble up to the caller.

The response to all of these is:

No, we don't need to do these things; they are all out of scope for 
the proposal, and other similar functions in the standard library 
don't do them. These are examples of over-engineering and YAGNI.

*If* (a very big if!) somebody requests these features in the future, 
then they'll be considered as enhancement requests. The effort required 
versus the benefit will be weighed up, and if the benefit exceeds the 
costs, then the function may be enhanced to support streams which return 
partial records.

The benefit will need to be more than just "abstraction".

If there are objective, rational reasons for unpack() taking on these 
extra responsibilities, when other stdlib code doesn't, then I wish 
people would explain what those reasons are. Why does "abstraction" 
apply to struct.unpack() but not json.load()?

I'm willing to be persuaded, I can change my mind. When Andrew suggested 
that unpack would need extra code to generate better error messages, I 
tested a few likely exceptions, and ended up agreeing that at least one 
and possibly two such enhancements were genuinely necessary. Those 
better error messages ended up in my subsequent proof-of-concept 
implementations, tripling the size from five lines to fifteen. (A second 
implementation reduced it to twelve.)
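
For concreteness, here is a sketch of the general shape (the actual 
code I posted differed in detail, and the error messages here are 
only illustrative):

import struct

def unpack(fmt, source):
    # If the argument is file-like, read exactly the number of
    # bytes the format requires, then delegate to struct.unpack.
    if hasattr(source, 'read'):
        size = struct.calcsize(fmt)
        data = source.read(size)
        if not isinstance(data, bytes):
            raise TypeError('read() did not return bytes')
        if len(data) != size:
            raise struct.error(
                'expected %d bytes but read %d' % (size, len(data)))
        return struct.unpack(fmt, data)
    return struct.unpack(fmt, source)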

But it irks me when people unnecessarily demand that new proposals be 
written to standards far beyond what the rest of the stdlib is written 
to. (I'm not talking about some of the venerable old, crufty parts of 
the stdlib dating back to Python 1.4, I'm talking about actively 
maintained, modern parts like json.)

Especially when they seem unwilling or unable to explain *why* we need 
to apply such a high standard. What's so special about unpack() that 
it has to handle these additional use-cases?

If an objection to a proposal equally applies to parts of the stdlib 
that are in widespread use without actually being a problem in practice, 
then the objection is probably invalid.

Remember the Zen:

Now is better than never.
Although never is often better than *right* now.

Even if we do need to deal with rare, exotic or unusual inputs, we don't 
need to deal with them *right now*. When somebody submits an enhancement 
request "support non-blocking streams", we can deal with it then.

Probably by rejecting it.


-- 
Steve

