Real-world use cases for map's None fill-in feature?

Raymond Hettinger python at rcn.com
Mon Jan 9 02:59:50 EST 2006


[Alex Martelli]
> I had (years ago, version was 1.5.2) one real-world case of map(max,
> seq1, seq2).  The sequences represented alternate scores for various
> features, using None to mean "the score for this feature cannot be
> computed by the algorithm used to produce this sequence", and it was
> common to have one sequence longer (using a later-developed algorithm
> that computed more features).  This use may have been an abuse of my
> observation that max(None, N) and max(N, None) were always N on the
> platform I was using at the time.
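
For concreteness, here is a minimal sketch of that pattern (the
sequences and their scores below are made up for illustration):

    seq1 = [3, None, 7]           # earlier algorithm; None = "cannot compute"
    seq2 = [5, 2, 6, 9]           # later algorithm scored one more feature
    best = map(max, seq1, seq2)   # seq1 is padded with None -> [5, 2, 7, 9]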

Analysis
--------

That particular dataset has three unique aspects that allow the map(max,
s1, s2, s3) approach to work at all.

1) A fortuitous alignment among the various meanings of None:
   - the input sequence using it to mean "feature cannot be computed"
   - the auto-fillin of None meaning "feature used in later
     algorithms, but not earlier ones"
   - the implementation quirk where max(None, n) == max(n, None) == n

2) Use of a reduction function like max(), which does not care about the
order of its inputs (i.e. the output sequence does not indicate which
algorithm produced the best score).

3) Later-developed sequences had to be created with the knowledge of
the features used by all earlier sequences (lest two of the sequences
get extended with different features corresponding to the same ordinal
position).

Getting around the last limitation suggests using a mapping
(feature -> score) rather than tracking scores by ordinal position (with
each position corresponding to a particular feature):

    # for each feature, keep the best score seen across all algorithms;
    # zero is a safe default for max(), assuming non-negative scores
    bestscore = {}
    for d in d1, d2, d3:
        for feature, score in d.iteritems():
            bestscore[feature] = max(bestscore.get(feature, 0), score)

Such an approach also gets around dependence on the other two unique
aspects of the dataset.  With dict.get(), any object can be specified as
a default value (zero being a better choice than None as a null input to
max()).  Also, the pattern is not limited to commutative reduction
functions like max(); it would work just as well with a
result.setdefault(feature, []).append(score) style of accumulating all
results, or with other combining/analysis functions.
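
For instance, the accumulate-everything variant is just as short (a
sketch with hypothetical per-algorithm score dicts):

    d1 = {'f1': 3, 'f2': 5}       # hypothetical scores from algorithm 1
    d2 = {'f2': 2, 'f3': 7}       # hypothetical scores from algorithm 2
    allscores = {}
    for d in d1, d2:
        for feature, score in d.iteritems():
            allscores.setdefault(feature, []).append(score)
    # allscores -> {'f1': [3], 'f2': [5, 2], 'f3': [7]}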

So, while map's None fill-in feature happened to suit this dataset's
unique characteristics, I wonder whether its availability steered you
away from a better data structure with greater flexibility, less
dependence on quirks, and more generality.

Perhaps the lesson is that outer-join operations are best expressed
with dictionaries rather than sequences with unequal lengths.
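
In dictionary form, a full outer join is just a union over the key sets
(again using the hypothetical d1 and d2 from the sketch above):

    features = set(d1) | set(d2)
    joined = dict((f, (d1.get(f), d2.get(f))) for f in features)
    # missing scores show up as an explicit None in the joined pairs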


> I was relatively new at Python, and
> in retrospect I feel I might have been going for "use all the new toys
> we've just gotten"

That suggests that if itertools.zip_longest() doesn't turn out to be
TheRightTool(tm) for many tasks, then it may have ill effects beyond
just being cruft -- it may steer folks away from better solutions.  As
you know, it can take a while for Python newcomers to realize the full
power and generality of dictionary-based approaches.  I wonder whether
this proposed itertool would distract from that realization.


> I don't recall ever relying on map's None-filling feature in other
> real-world cases, and, as I mentioned, even here the reliance was rather
> doubtful.  OTOH, if I had easily been able to specify a different
> filler, I _would_ have been able to use it a couple of times.
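
For reference, with the proposed function such uses might look something
like this sketch (the fillvalue keyword is taken from the proposal;
seq1 and seq2 are the hypothetical sequences from the earlier sketch):

    from itertools import zip_longest   # the proposed itertool
    best = [max(a, b) for a, b in zip_longest(seq1, seq2, fillvalue=0)]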

Did you run across any cookbook code that would have been improved by
the proposed itertools.zip_longest() function?



Raymond



