[Tutor] Defining variable arguments in a function in python

Avi Gross avigross at verizon.net
Sun Dec 30 12:26:39 EST 2018


Replying to Steve's points. Again, it was not a serious design, and I said
so; it was an ACADEMIC exploration of what could be done. I fully agree with
Steve that it is probably not a great idea to do this, but note the original
request might not have been a great one in the first place.

There are people who make a family of related functions and tell users to
pick the right one.

def add_two(x, y): return x+y
def add_three(x, y, z): return add_two(x,y)+z

You get the idea, I hope. So you can create one function in which an
argument is required and a sister function in which it is not allowed and a
default is built in. The user picks which function to call.
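
A minimal sketch of that sister-function idea. The names greet_with() and
greet() are made up for illustration and are not from this thread: one
function requires the greeting, the other does not accept it and builds in a
default.

```python
def greet_with(name, greeting):
    """The greeting is required here; there is no default."""
    return f"{greeting}, {name}!"


def greet(name):
    """Sister function: no greeting argument, a default is built in."""
    return greet_with(name, "Hello")


print(greet_with("Avi", "Good morning"))
print(greet("Steve"))
```

The user never passes a placeholder; choosing the function is the choice.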

Again, NOT suggesting that is a great solution, just exploring
possibilities. There are often many ideas you can brainstorm before
rejecting most or all of them.

-----Original Message-----
From: Tutor <tutor-bounces+avigross=verizon.net at python.org> On Behalf Of
Steven D'Aprano
Sent: Sunday, December 30, 2018 5:39 AM
To: tutor at python.org
Subject: Re: [Tutor] Defining variable arguments in a function in python.

So everything Steve says as commentary and even criticism is indeed true. It
is a bug magnet and so on. So is just about any code I have seen, albeit not
as bad. A function with 40 positional arguments very often gets called with
arguments out of order, for example. Some languages allow a row of empty
commas so you can specify the third and then the eighth and then the
sixteenth and skip the others.

Languages that use named arguments may be better but some get cute and will
match any argument that starts right. I mean you can call them with
layout="portrait" but also lay="landscape" or even l="..." and who knows
what they do if your allowed arguments include layers=True.
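
Python itself insists on exact keyword names, so the sketch below only
imitates that "starts right" matching to show where it goes wrong. The
helper match_argument() is hypothetical, as is the rule that an exact name
wins over a prefix.

```python
def match_argument(abbrev, allowed):
    """Return the one allowed name that starts with abbrev, or raise."""
    if abbrev in allowed:          # assumed rule: an exact name always wins
        return abbrev
    hits = [name for name in allowed if name.startswith(abbrev)]
    if len(hits) != 1:
        raise TypeError(f"{abbrev!r} matches {hits or 'nothing'}")
    return hits[0]


allowed = ["layout", "layers"]
print(match_argument("layo", allowed))   # long enough to be unique
try:
    match_argument("lay", allowed)       # could be layout or layers
except TypeError as err:
    print("ambiguous:", err)
```

Adding layers to the allowed names is what breaks the old call with lay=...,
which is the kind of surprise described above.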

That is not a defense. It is an acknowledgement that many things we do have
drawbacks as well as advantages. On the whole, I doubt I would use the
design I offered except as an academic display of what not to do even if the
language allows it.

Having said that, I do defend the placeholder construct in places where it
is needed as a serious way to make a point.

Consider a request to do something frequent where the default should be
something like ALL, or ALL UP TO HERE, or ALL FROM HERE. Think of how you
take slices out of a list or other data structure including
multi-dimensional structures.

Here is an assortment of examples with many more possible. This is not a
function call exactly but the sub-language used allows many variations with
the use of what seems to be a series of up to three arguments with a colon
between them. But sometimes the absence of an argument makes it look like
the colon defaulted to something.

>>> a = list(range(10))
>>> a
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> a[ : 3]
[0, 1, 2]
>>> a[ 3: ]
[3, 4, 5, 6, 7, 8, 9]
>>> a[3:9]
[3, 4, 5, 6, 7, 8]
>>> a[3:9:2]
[3, 5, 7]
>>> a[9:3:-1]
[9, 8, 7, 6, 5, 4]
>>> a[-1]
9
>>> a[::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

So the DEFAULT for anything left out before a colon is a loose concept. It
means everything up to whatever the next argument states (and with a
negative step, as in a[::-1], it means start from the far end). This shows
that in the extreme case the default works out to nothing:

>>> a[:0]
[]

The optional third argument, the step, defaults to 1.

But underneath it all this is what is termed syntactic sugar. Each and every
such pattern is translated into a slice call:

>>> slice(1,9,4)
slice(1, 9, 4)
>>> a[slice(1,9,4)]
[1, 5]
>>> a[slice(None,9,4)]
[0, 4, 8]
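
One way to see the translation for yourself is a throwaway class whose
__getitem__ simply hands back whatever the brackets were given. ShowKey is a
made-up name for illustration.

```python
class ShowKey:
    """Hypothetical helper: return whatever was written between the brackets."""

    def __getitem__(self, key):
        return key


show = ShowKey()
print(show[:3])      # a slice object with None for the missing start
print(show[3:])
print(show[3:9:2])
print(show[::-1])
print(show[-1])      # a plain integer, no slice involved
```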

Did you see the value None is used to mean use the darn DEFAULT?

It can be used for any of the parts or even all.

>>> a[slice(None,None,None)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
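
What each None turns into depends on the length of the sequence and the sign
of the step. The slice.indices() method reports the concrete values, as in
this small sketch:

```python
a = list(range(10))

for s in (slice(None, 9, 4), slice(None, None, None), slice(None, None, -1)):
    # indices() fills in every None for a sequence of the given length
    start, stop, step = s.indices(len(a))
    print(s, "->", (start, stop, step))
```

Note that with a step of -1 the missing start becomes the last index rather
than 0, so the default is not one fixed value.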

So here is an example of a dangerous function we all use regularly, albeit
usually indirectly, that uses my odd design principle. 

I can drown you with more examples, but since I don't hate anyone here, just
one more. Without details, the Ellipsis seems to have been partially created
to be used in the numpy module as a way to deal with indexing arrays with
multiple indices. It is not easy to say I want the first and the last and
skip the middle ones. Enough said.
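
To keep this free of numpy, here is only the plain-Python half of the story:
what an object receives when you index it with an Ellipsis in the middle.
ShowKey is again a made-up helper; how numpy then interprets the Ellipsis
(as however many full slices are needed) is not shown.

```python
class ShowKey:
    """Hypothetical helper: return whatever was written between the brackets."""

    def __getitem__(self, key):
        return key


key = ShowKey()[0, ..., -1]
print(key)                   # a tuple with the Ellipsis object in the middle
print(key[1] is Ellipsis)    # the literal ... is the same singleton
```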

Steven is quite reasonably more concerned when the placeholder is invisibly
changed than when it is obviously just a placeholder. As he says you can
just look up the default in documentation and use it. I could argue that a
pythonic way to do things is to make it so if the default later changes, you
are not stuck. Then again, maybe you don't want it to change invisibly.  A
compromise would be to have the user explicitly ask for whatever is the
default as in soup="Du Jour" gets you whatever the soup of the day is, and
may even get you a salad if they ran out.
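
A sketch of that compromise, with all names invented for the example: a
unique marker object means "give me whatever the default currently is", and
the default is looked up when the call is made.

```python
DU_JOUR = object()               # unique marker: "whatever today's default is"
SOUP_OF_THE_DAY = "minestrone"   # assumed value; the kitchen may change it


def order(soup=DU_JOUR):
    """Return the order, resolving the marker at call time."""
    if soup is DU_JOUR:
        soup = SOUP_OF_THE_DAY   # read now, not when the function was defined
    return f"one {soup}"


print(order())            # implicit default
print(order(DU_JOUR))     # explicitly asking for the default
print(order("borscht"))   # an ordinary value
```

Unlike 11 or 999, the marker cannot be mistaken for a real soup.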

Explicit is often better. But you can have too much of a good thing. People
often create a wrapper to a function with too many required or optional
arguments. An example is a function used to read data in from a file.  There
are general functions where you must tell it in excruciating detail what to
do. What character (or sequence) separates the fields? Is it a comma as in a
CSV file? Is it a tab or some other character? Or are the fields of
specified constant widths with no separator? Are all rows the same or is
there a header row? Should the program make an educated guess about the
number of columns of data or will you tell it how many or which ones you
want? Ditto for the types of data in each. If the file might have characters
embedded such as in quotes containing a comma, how is that recognized or
dealt with? If a column read in has repeated values, should it be read in as
a more compressed categorical form such as a factor in R? The list of
possibilities is HUGE.

So instead of using a function like read.table() directly, people make
available lots of others like read.csv(), a wrapper that merges your
supplied arguments with a set of defaults that make sense when reading a
Comma Separated Values file.

But in the above example, to be clear, the defaults are all done in a normal
way. You do not use silly placeholders.
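
A rough Python analogue of that arrangement, assuming simplified stand-ins
named read_table() and read_csv() that parse a string rather than a file.
These are not the R functions, just the same shape: a general reader that
must be told everything, and a wrapper with ordinary defaults.

```python
import csv
import io


def read_table(text, sep, header):
    """General reader: the caller must spell out every choice."""
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    if not rows:
        return [], []
    if header:
        return rows[0], rows[1:]
    # No header row: invent column names V1, V2, ... (an assumed convention)
    names = [f"V{i + 1}" for i in range(len(rows[0]))]
    return names, rows


def read_csv(text, sep=",", header=True):
    """Wrapper: same reader, with defaults that suit a CSV file."""
    return read_table(text, sep=sep, header=header)


names, rows = read_csv("x,y\n1,2\n3,4\n")
print(names, rows)
```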

Steve writes:

" But at least with sentinels like None, or Ellipsis, it is *obvious* that
the value is probably a placeholder. With a placeholder like 11 or 999, it
isn't. They look like ordinary values."

I partially agree. I can think of cases where None really means None as in I
want None for Dessert, not the gentle substitution of sugar-free Jell-O.

Steve then provides an example of a more subtle bug. To be succinct, it is a
class of arguments that boils down to this: localized knowledge fails when
evaluated elsewhere, or worse, when a substitution is evaluated elsewhere.
That is also true in more normal cases but is indeed a problem.

Back to my CSV example, if the wrapper function ASSUMES that there will
normally be a header line in a CSV file and calls the wrapped function
appropriately with that added request, then mistakes can happen. If it makes
the opposite assumption, mistakes can happen. If it makes no assumption and
specifies nothing, mistakes can happen!

So I ask the question of how to deal with real life scenarios? There are
many no-win situations out there. Do you try to give warnings? Or maybe a
customized warning that says you should call the function again and this
time specify one of several alternatives as an explicit argument? I have
seen that done.
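
One way such a customized warning might look, using the standard warnings
module. The function load_rows() and its header parameter are hypothetical;
None here means "the caller did not say".

```python
import warnings


def load_rows(lines, header=None):
    """Hypothetical loader that warns when it has to guess about the header."""
    if header is None:
        warnings.warn(
            "header not specified; assuming header=True. "
            "Pass header=True or header=False to silence this.",
            stacklevel=2,   # point the warning at the caller's line
        )
        header = True
    return lines[1:] if header else lines


print(load_rows(["a,b", "1,2"]))             # warns, then guesses
print(load_rows(["1,2"], header=False))      # explicit, so no warning
```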

Steve is right that the scenario where a function changes things quietly can
be nefarious, especially downstream. A good example is when dealing with
missing data: your program may use a placeholder like NaN or NA or 999 or a
blank string. But if you call a function to take the mean of a sample, it
may not see the marker as a suggestion to skip it. It may treat nonsense as
a 0 and get a wrong mean. It may skip it but count it when dividing by the
number of items. Or, it may need an argument like na.rm=TRUE and perhaps
even a second argument specifying what an 'na' looks like. Worse, I often
save the results of a calculation in a file using some format like a CSV. I
then have it re-read into perhaps a program in another programming language.
Some expect strange things like a single period to mean not available. So
you may want to be careful how you write a file containing NOT AVAILABLE
items. Even worse, you often cannot export some things reliably as the
receiver cannot handle any form. Can you save a value of Inf that is
meaningful even to read back in? I don't mean from using something like
pickle, but from some storage format like CSV or Excel.
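
A small sketch of the mean problem, with made-up sample values. It compares
a numeric placeholder that looks like data against skipping the missing item
and not counting it, and then checks that Inf at least survives a trip
through plain text within Python itself. Whether another program reads that
text back correctly is a separate question.

```python
import math
from statistics import mean

sample = [4.0, 6.0, math.nan, 8.0]

# A placeholder that looks like data silently poisons the result.
with_999 = [999.0 if math.isnan(x) else x for x in sample]
print(mean(with_999))

# Skip the marker AND leave it out of the count.
present = [x for x in sample if not math.isnan(x)]
print(mean(present))

# Round trip of Inf through its text form, Python to Python only.
print(float(str(math.inf)) == math.inf)
```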

I see Steve wrote a bit more and will just say I agree. Good programming
style tries to avoid surprises when it can.

On Sun, Dec 30, 2018 at 12:07:20AM -0500, Avi Gross wrote:

[...]
> Or on a more practical level, say a function wants an input from 1 to 10.
> The if statement above can be something like:
> 
> >>> def hello(a, *n, **m) :
> 	if not (1 <= a <= 10) : a=5
> 	print(a)
> 	print(*n)
> 
> 	
> >>> hello(1,2,3)
> 1
> 2 3
> >>> hello(21,2,3)
> 5
> 2 3
> >>> hello(-5,2,3)
> 5
> 2 3


This design is an example of "an attractive nuisance", possibly even a "bug
magnet". At first glance, when used for mickey-mouse toy examples like this,
it seems quite reasonable:

    hello(999, 1, 2)  # I want the default value instead of 999

but think about it a bit more deeply, and you will recognise some
problems with it.

First problem: 

How do you know what value to pass if you want the default? Is 999 out of
range? How about 11? 10? Who knows? If you have to look up the docs to know
what counts as out of range, you might as well read the docs to find out
what the default is, and just pass that:

    hello(5, 1, 2)  # I want the default value 5

but that kind of defeats the purpose of a default. The whole point of a
default is that you shouldn't need to pass *anything at all*, not even a
placeholder.

(If you need a placeholder, then you probably need to change your function
parameters.)

But at least with sentinels like None, or Ellipsis, it is *obvious* that the
value is probably a placeholder. With a placeholder like 11 or 999, it
isn't. They look like ordinary values.
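
One possible reading of "change your function parameters", offered as a
sketch rather than as the intended fix: make the ranged argument
keyword-only with a real default, and reject bad values instead of replacing
them. The 1 to 10 range and the default of 5 come from the hello() example
quoted above; returning a tuple instead of printing is a choice made here.

```python
def hello(*n, a=5):
    """Keyword-only 'a' with a real default; bad values are rejected."""
    if not 1 <= a <= 10:
        raise ValueError(f"a must be between 1 and 10, got {a!r}")
    return (a, *n)


print(hello(2, 3))         # no placeholder needed to get the default
print(hello(2, 3, a=1))    # an explicit in-range value
```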


Second problem:

Most of the time, we don't pass literal values to toy functions. We do
something like this example:

    for number, widget_ID, delivery_date in active_orders:
        submit_order(number, widget_ID, delivery_date)

Can you see the bug? Of course you can't. There's no obvious bug. But little
do you know, one of the orders was accidentally entered with an out-of-range
value, let's say -1, and instead of getting a nice error message telling
you that there's a problem that you need to fix, the
submit_order() function silently replaces the erroneous value with the
default.

The bug here is that the submit_order() function exceeds its authority.

The name tells us that it submits orders, but it also silently decides to
change invalid orders to valid orders using some default value. But this
fact isn't obvious from either the name or the code. You only learn this
fact by digging into the source code, or reading the documentation, and
let's be honest, nobody wants to do either of those unless you really have
to.

So when faced with an invalid order, instead of getting an error that you
can fix, or even silently skipping the bad order, the submit_order()
function silently changes it to a valid-looking but WRONG order that you
probably didn't want. And that costs real money.

The risk of this sort of bug comes directly from the design of the function.
While I suppose I must acknowledge that (hypothetically) there could be
use-cases for this sort of design, I maintain that in general this design is
a bug magnet: responsibility for changing out-of-range values to in-range
values belongs with the caller, not the called function.

The caller may delegate that responsibility to another:

    for number, widget_ID, delivery_date in active_orders:
        number = validate_or_replace(number)
        submit_order(number, widget_ID, delivery_date)

which is fine because it is explicit and right there in plain sight.

This then allows us to make submit_order() far more resilient: if it is
passed an invalid order, it can either fail fast, giving an obvious error,
or at least skip the invalid order and notify the responsible people.
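
A minimal sketch of that split of responsibility. The bodies below are
assumptions for illustration only: the valid range of 1 to 10 and the
default of 5 are borrowed from the hello() example, and a real
submit_order() would do more than return its arguments.

```python
def validate_or_replace(number, default=5):
    """Caller-side policy: any substitution happens in plain sight."""
    return number if 1 <= number <= 10 else default


def submit_order(number, widget_ID, delivery_date):
    """Fail fast: refuse an out-of-range order instead of repairing it."""
    if not 1 <= number <= 10:
        raise ValueError(f"invalid order number {number!r} for {widget_ID!r}")
    return (number, widget_ID, delivery_date)


active_orders = [(3, "W-1", "2019-01-02"), (-1, "W-2", "2019-01-03")]
for number, widget_ID, delivery_date in active_orders:
    try:
        print(submit_order(number, widget_ID, delivery_date))
    except ValueError as err:
        print("rejected:", err)
```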


--
Steven