Defending the ternary operator

Sat Feb 8 19:10:23 EST 2003

Andrew Koenig:
> As an experiment, I went back through a 2800-line C program that I
> wrote about 20 years ago (!) to see how often I used ?: and in what
> context.  I figure that reading my own 20-year-old code is probably a
> lot like reading some else's code.

Ahh, but you're a good programmer, even 20 years ago.

I analyzed a codebase I have access to.  It's proprietary, so
I can't say much of the details, but can say it was written over
a course of years by life science researchers (chemists and
biologists) with little formal training in C++.  There were a
couple of programmers, but with only a couple of years of
professional experience.

EXAMPLE 1: (used 6 times)

  return arr[i] & mask ? true : false;

(This C++ code was pre any sort of built-in boolean so
those two values are defined as part of their package, I think.)

Python now has a bool, so this could be written as
   return bool(arr[i] & mask)

If C++ does have a boolean type, then this isn't needed.  Even
when it didn't exist, it would have been easy to define, just like
the values 'true' and 'false' were defined.

EXAMPLE 2:

ClassSpam::ClassSpam(huge, set, of, args, including, input_arg)
 : instance_var(input_arg ? 2 : 1)
{
  ... rest of constructor ...
}

One way to do this in Python (besides an if/then statement) is
   self.instance_var = bool(input_arg) + 1

This is needed for C++ because it's the only way to initialize
const values.  It isn't needed in Python.

EXAMPLE 3: (used twice, 3x each)

  sprintf(outputStr, msgFmt, is_clean ? "CLEAN": "DIRTY",
            "spam=", spamStr,
            output_format == FORMAT1 ? "STR1" : "STR2",
            output_format == FORMAT1 ? "" : output_filename

consider instead

   if self.output_format == FORMAT1:
     output_info = ("STR1", "")
  else:
    output_info == ("STR2", output_filename)
   print >> outputStr, msgFmt % ((["DIRTY", "CLEAN"][self.is_clean],
          "spam=", self.spamStr) + output_info)

For C++ this is useful.  Doesn't mean it's needed for Python.

EXAMPLE 4:

Personally, I can't figure out what this code is doing.  Not because
of the ?:, which looks like

  if (!ExportAs(inputStr, usingChemDraw ? CHEMDRAW : outputFilename))

but because the surrounding code is complicated.  The above is used twice,
once in two different branches.  I would have made it into a variable like

  destination = usingChemDraw ? CHEMDRAW : outputFilename;
   ..
  if (branch1) {
    ... ExportAs(inputStr, destination)
  } else {
    ... ConvertFile(inputFilename, destination);
 }

so I think the two uses of '?:' should have been folded into one.

EXAMPLE 5:

  // flg can be 0 ("no"), 1 ("same as last time"), or 2 ("yes")
  if (flg != 1 && cursor && (cursor.dirty() ? !flg : flg) {
     delete cursor;
     cursor = NULL;
  }
  if (!cursor) ...

I have a hard time following the logic.  I think it's more
understandable as

  if ((flg == 0 and cursor and cursor.dirty()) or
      (flg == 2 and cursor and not cursor.dirty()):

assuming I didn't get something swapped.

EXAMPLE 6:

Okay, this is going through an array-like object indexed
by opaque handles and deleting terms which don't meet
a given criteria ... I think.  The handles are never 0.

   for (prev = 0, curr = arrary.first();
         curr; curr = (prev ? array.first() : array.after(prev)) {
     ...
    if (need_to_delete) {
      array.remove(curr)
    } else {
      prev = curr;
   }

This is just wrong.  IMHO of course.

EXAMPLE 7
  n = num_items == 2 ? 1 : (num_items == 3 ? 3 : 6);

In Python, perhaps better written as

  n = {2: 1, 3: 3}.get(num_items, 6)

In C++ ...  hey, it's got a switch statement, right?
  switch (num_items) {
   2: n = 1; break;
   3: n = 2; break;
  default: n=6; break;
  }

Naah, pretty long. I'll let you all decide if it's needed for
C++.  For Python, I'm willing to live with the dict solution.

 EXAMPLE 8

    n = arr == NULL ? 0 : arr->maxVal();

All nice and pretty, right?  Well, two lines down is

  head = arr->GetNext(index, n)

so there's no need to check that arr == NULL .. because
if it was NULL, the other code would break!

EXAMPLE 9

  return ((minSize == maxval) ? 0 : minSize)

This term is looking for all items in a list which match  a given
condition.  If there are no items in the list with that condition, it
returns 0.  The search is done by setting minSize to maxint
and doing the old
  if condition_met:
    minSize = min(currSize, minSize)

None of the items will ever have maxint, so at the end there's
the check to see if it's maxint and, if so, return 0.

I don't like the way the search is done.  Still, if done that way
I would rather do something with a bit more comments

  if minSize == maxval:  # Nothing was found so return 0
    return 0
  return minSize

EXAMPLE 10 (4 cases)

    return needs_inversion ? 1 - val : val;

Here are two options

  if needs_inversion:
    return -val
  return val

Or perhaps the 'needs_inversion' should be a +1/-1 value
(inversion_factor) instead of a boolean flag, so the written as

   return inversion_factor * val

I can't tell because the flag is used/set somewhere outside of the
scope of the file I'm looking at.

As mentioned, this is used 4 times.  Twice as "1-val" and
once as "1.0 - val".  So I think that a flag is the wrong approach
and this should use an inversion factor instead.

EXAMPLE 11:

In this case, atoms are being rearranged.  Bonds can be in
either CIS or TRANS state.  If the order of the atoms is the
same, keep them in the same state.  Otherwise, invert them

   if (need_inversion) {
    bond -> setChirality(bond.chirality() == CIS ? TRANS : CIS);
  }

I would rather there be a "bond.invertChirality()".  Even then,
the internal implementation needs to swap.  One way would be
to have the states be 0 and 1, so it could be done with 1-old_state.

I'm somewhat okay with how it's written now

EXAMPLE 12:

  size = fromSize <= toSize ? fromSize : toSize

which in Python is

  size = min(fromSize, toSize)

I think C++ has a 'min' as well?

EXAMPLE 13:

  n = obj == NULL ? 0 : obj->getCount();
  vector.append(n);
  if (n > 0) {
    arr -> append(something_else);
  }

I would rather this be

  if (obj == NULL) {
    n = obj->getCount();
    vector.append(n);
    if (n > 0) {  // Actually, I think n can never be 0 so this isn't needed
      arr -> append(something_else);
   }
 } else {
  vector.append(0);
 }

Yes, it's more verbose (7 (or 6) coding lines vs. 4) but then I don't
have to track in my head the fact that n means 'no object or empty
object'.

EXAMPLE 14:

  box = (n > 2) ? (n-2) * BOX_FACTOR : BOX_FACTOR;

How about
  box = max(0, (n-2) * BOX_FACTOR)
?

And C++ has a max, right?

EXAMPLE 15:
  if (pn != NULL) {
    *pn = (rep_count < max_count ? rep_count + 1 : max_count);
 }
which can be replaced by 'min' in Python and C++.

EXAMPLE 16:  (4 in a row)

  index_A = (atom1 != OXYGEN)
                  ?   long_function_name(atom1->realAtom(),
                          atomCount, reactionMapping)
                  : -1;
  index_B = (atom2 != OXYGEN) ? function(atom2->realAtom(), params) : -1;
  index_C = (atom3 != OXYGEN) ? function(atom3->realAtom(), params) : -1;
  index_D = (atom4 != OXYGEN) ? function(atom4->realAtom(), params) : -1;

Yes, the real code is 4 lines long per assignment, so 16 lines total.
And it's hard to tell that all the parameters (unless noted otherwise)
are the same.

In C++ I would make a helper function.

  In Python this could be done better by building a list, as in
the following 7 lines

 indicies = []
for atom in (atom1, atom2, atom3, atom4):
  if atom != OXYGEN:
    indicies.append(long_function_name(atom.realAtom(),
                                                        atomCount,
reactionMapping))
 else:
   indicies.append(-1)

However, some might argue that a list comprehension and an
if/else expression is better.

  indicies = [(long_function_name(atom->realAtom(), atomCount,
reactionMapping)
                     if atom != OXYGEN else -1)
                        for atom in (atom1, atom2, atom3, atom4)]

I don't think so.

This is a case where list comprehensions and if/else expression
would only serve to make things *less* readable, because the function
call is so heineously long that the small '[]' and 'if/else' symbols get
hidden amoung the noise.

Yet I am sure there are those who would use it, because writing
this in the first place is easy.  Add if/else expressions and you enable
those people to write junk.

Okay, that's 42,500 lines of C++ code.  There's a lot more, but
this analysis takes time.  And in that code, there are (assuming I
counted correctly) 34 uses of ?:. making one use every 1,250 lines.

Andrew K found it was used once every 400 lines.  The difference
may reflect that he's a good C programmer, and a professional
developer, while the code I'm reviewing was written by a buch of
non-software engineers.

Of those, I think 9 are appropriate for the C++ code and
4, count 'em, 4! might be justified for Python.

Assume that Python code is 5x more expressive than C++.
That means I think it might be used once every 2,000 lines
of Python code, amoung my target audience.

Assuming it saves 4 lines of code, then overall the code is
reduced in size by 0.04%.

Yet were people to introduce this ternary operator to Python,
it opens the way up for all sorts of nasties, like people doing

    val = x if x < y else y
instead of
    val = min(x, y)

This will happen.  I have seen people post to this list asking
for the ternary operator exactly so they can write x<y?x:y.
We told them about min, max, and it made that request go away.

It also opens up Python code to allow the above listcomp + if/else

  indicies = [(long_function_name(atom->realAtom(), atomCount,
reactionMapping)
                     if atom != OXYGEN else -1)
                        for atom in (atom1, atom2, atom3, atom4)]

I can guarantee you that people will write in this form.  I do not
want to read nor maintain code written like this, but it will happen.

So from my sampling, I believe that people will misuse the ?: operator
about as often as it is used correctly, based on my analysis of a
non-trivial
amount of C++ code.  The abuse will come from 1) not using the operator
correctly (eg, having a check which isn't needed, lke #8), 2) using it
to introduce confusing control flow (#13), 3) use it to make a hard-to-
understand expression (#5, #6), 4) use it in lieu of existing functions
(the various min/max examples, esp. #14, and also #7) and 5)
because it is 'cool' (#16, in the list comprehension).

Anyone else have numbers to suggest otherwise for my target
audience of  "people who don't consider themselves to be software
engineers but still program"?

                    Andrew
                    dalke at dalkescientific.com