Real-world use cases for map's None fill-in feature?

Mon Jan 9 03:29:38 EST 2006

"Raymond Hettinger" <python at rcn.com> wrote in message
news:mailman.194.1136781640.27775.python-list at python.org...
> Proposal
> --------
> I am gathering data to evaluate a request for an alternate version of
> itertools.izip() with a None fill-in feature like that for the built-in
> map() function:
>
> >>> map(None, 'abc', '12345')   # demonstrate map's None fill-in feature
> [('a', '1'), ('b', '2'), ('c', '3'), (None, '4'), (None, '5')]
>
> The motivation is to provide a means for looping over all data elements
> when the input lengths are unequal.  The question of the day is whether
> that is both a common need and a good approach to real-world problems.
> The answer can likely be found in results from other programming
> languages and from surveying real-world Python code.
>
> Other languages
> ---------------
> I scanned the docs for Haskell, SML, and Perl6's yen operator and found
> that the norm for map() and zip() is to truncate to the shortest input
> or raise an exception for unequal input lengths.  Ruby takes the
> opposite approach and fills-in nil values -- the reasoning behind the
> design choice is somewhat inscrutable:
>   http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-dev/18651

From what I can make out (with help of internet
language translation sites) the relevant part
(section [2]) of this presents three options for
handling unequal length arguments:
1. zip to longest (Perl6 does it this way)
2. zip to shortest (Python does it this way)
3. use zip method and choose depending on
  whether argument list is shorter or longer
  than object's list.
It then solicits opinions on the best way.
It does not state or justify any particular choice.
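
To make the first two options concrete, here is a small sketch.
The helper name zip_to_longest and its fill parameter are made up
for illustration (nothing by that name exists in the library), and
option 3 is left out because the description of it is too vague
to code up with any confidence.

```python
def zip_to_longest(a, b, fill=None):
    """Option 1 (hypothetical helper): pair items until both inputs
    run out, padding the shorter one with `fill`."""
    a, b = list(a), list(b)
    n = max(len(a), len(b))
    a += [fill] * (n - len(a))   # pad the shorter input
    b += [fill] * (n - len(b))
    return list(zip(a, b))

# Option 2: the built-in zip() stops at the shortest input.
shortest = list(zip('abc', '12345'))

# Option 1: the longer input is kept and the gaps are filled.
longest = zip_to_longest('abc', '12345')
```

Here shortest drops '4' and '5' entirely, while longest pairs them
with None, the same result map(None, 'abc', '12345') gives.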

If "perl6"=="perl6 yen operator" then there
is a contradiction with your earlier statement.

> Real-world code
> ---------------
> I scanned the standard library, my own code, and a few third-party
> tools.  I found no instances where map's fill-in feature was used.
>
> History of zip()
> ----------------
> PEP 201 (lock-step iteration) documents that a fill-in feature was
> contemplated and rejected for the zip() built-in introduced in Py2.0.
> In the years before and after, SourceForge logs show no requests for a
> fill-in feature.

My perception is that many people view the process
of advocating for a library addition as
1. Very time consuming due to the large amount of
   work involved in presenting and defending a proposal.
2. Having a very small chance of acceptance.
I do not know whether this is really the case or even if my
perception is correct, but if it is, it could account for the
lack of feature requests.

> Request for more information
> ----------------------------
> My request for readers of comp.lang.python is to search your own code
> to see if map's None fill-in feature was ever used in real-world code
> (not toy examples).  I'm curious about the context, how it was used,
> and what alternatives were rejected (i.e. did the fill-in feature
> improve the code).  Likewise, I'm curious as to whether anyone has seen
> a zip-style fill-in feature employed to good effect in some other
> programming language.

How well correlated is the use of map()-with-fill with the
(need for the) use of zip/izip-with-fill?

> Parallel to SQL?
> ----------------
> If an iterator element's ordinal position were considered as a record
> key, then the proposal equates to a database-style full outer join
> operation (one which includes unmatched keys in the result) where record
> order is significant.  Does an outer-join have anything to do with
> lock-step iteration?  Is this a fundamental looping construct or just a
> theoretical wish-list item?  Does Python need itertools.izip_longest()
> or would it just become a distracting piece of cruft?
>
> Raymond Hettinger
>
> FWIW, the OP's use case involved printing files in multiple
> columns:
>
>     for f, g in itertools.izip_longest(file1, file2, fillin_value=''):
>         print '%-20s\t|\t%-20s' % (f.rstrip(), g.rstrip())
>
> The alternative was straightforward but less terse:
>
>     while 1:
>         f = file1.readline()
>         g = file2.readline()
>         if not f and not g:
>             break
>         print '%-20s\t|\t%-20s' % (f.rstrip(), g.rstrip())

Actually my use case did not have quite so much
perlish line noise :-)
Compared to
    for f, g in izip2 (file1, file2, fill=''):
        print '%s\t%s' % (f, g)
the above looks like a relatively minor loss
of conciseness, but consider the uses of the
current izip, for example

    for i1, i2 in itertools.izip (iterable_1, iterable_2):
        print '%-20s\t|\t%-20s' % (i1.rstrip(), i2.rstrip())

can be replaced by:
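    it1, it2 = iter(iterable_1), iter(iterable_2)
    while 1:
        try:
            i1 = it1.next()
            i2 = it2.next()
        except StopIteration:
            break    # stop at the shortest input, as izip does
        print '%-20s\t|\t%-20s' % (i1.rstrip(), i2.rstrip())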

yet that was not justification for rejecting izip()'s
inclusion in itertools.

The other use case I had was a simple file diff.
All I cared about was if the files were the same or
not, and if not, what were the first differing lines.
This was to compare output from a process that
was supposed to match some saved reference
data.  Because of error propagation, lines beyond
the first difference were meaningless.  The code,
using an "iterate to longest with fill" izip would be
roughly:

# Simple file diff to ident
    for ln1, ln2 in izip_long (file1, file2, fill="<EOF>"):
        if ln1 != ln2:
            break
    if ln1 == ln2:
        print "files are identical"
    else:
        print "files are different"
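
For anyone who wants to try the idea, a self-contained sketch of
that diff follows.  izip_long and its fill keyword are hypothetical
(itertools has no such function as I write this), and the sketch
relies on the two-argument next() builtin, so it needs a Python
that provides it.  first_difference is likewise just a name I
picked for the example.

```python
def izip_long(it1, it2, fill=None):
    """Hypothetical 'iterate to longest with fill' for two inputs."""
    it1, it2 = iter(it1), iter(it2)
    done = object()   # private marker that cannot collide with real data
    while True:
        a = next(it1, done)
        b = next(it2, done)
        if a is done and b is done:
            return    # both inputs are exhausted
        yield (fill if a is done else a,
               fill if b is done else b)

def first_difference(lines1, lines2):
    """Return (line_number, ln1, ln2) for the first mismatch, else None."""
    pairs = izip_long(lines1, lines2, fill="<EOF>")
    for n, (ln1, ln2) in enumerate(pairs):
        if ln1 != ln2:
            return n + 1, ln1, ln2
    return None

same = first_difference(["a\n", "b\n"], ["a\n", "b\n"])
diff = first_difference(["a\n"], ["a\n", "b\n"])
```

Returning None for "identical" also covers two empty inputs, a
case where a loop that tests ln1 and ln2 after the for statement
would find them unassigned.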

This same use case occurred again very recently
when writing unit tests to compare output of a parser
with known correct output during refactoring.

With file iterators one can imagine many potential
use cases for izip but not imap, but there are probably
few real uses in existence because files generally may be
of different lengths, and there currently is no usable
izip for this case.

[jan09 08:30 utc]