syntax philosophy

Tue Nov 18 04:04:15 EST 2003

Tuang wrote:
   ...
> $dict{$color}++
> 
> to count up what you find. The first time that line is called, it
> creates the dictionary, then creates a key for $color, initializes its
> value to zero, then increments it to 1.

Right: it does a lot of different things depending on context.  Python
tends to avoid context-dependency while Perl tends to use it with
enthusiasm.

> This is a very common data analysis problem. It's the SQL database
> operation of GROUP BY and then returning COUNT, but applied to any
> sequence.

import sets   # standard library module, new in 2.3
histogram = dict([ (value, seq.count(value)) for value in sets.Set(seq) ])

is a higher-order expression of the same concept.  It's not quite as
fast as the lower-level expression, because each .count step is a
separate loop over seq (so the total work grows with len(seq) times the
number of distinct values), but sometimes one prefers abstraction and
concision to speed.  When speed IS preferred, being just a tad more
explicit:

histogram = {}
for value in seq:
    histogram[value] = 1 + histogram.get(value, 0)

doesn't seem all that big a deal to me -- basically, you just have
to initialize the histogram to be the empty dictionary (no implicit
creation!) and be explicit about using 0 as "previous mapping" for
values that are not keys in the histogram yet.  I know that Raymond
Hettinger considers bags (aka multisets) just as fundamental as sets
(which he's laboring to make built-ins in the future 2.4 release),
so there may be a histogram.add(value) if he has his way -- but you
still will have to initialize histogram (to a bag, in that case).
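
For concreteness, here is a small sketch of both spellings side by side
on a made-up sequence.  The sample data and the names by_count and
by_loop are invented purely for illustration, and the sketch assumes a
Python where set is a built-in, standing in for sets.Set:

```python
# Hypothetical sample data, just for illustration.
seq = ["red", "blue", "red", "green", "blue", "red"]

# Higher-order version: one .count() pass over seq per distinct value.
# Assumes the built-in set; on 2.3 this would be sets.Set(seq).
by_count = dict([(value, seq.count(value)) for value in set(seq)])

# Explicit version: a single pass over seq, using 0 as the
# "previous mapping" for values that are not keys yet.
by_loop = {}
for value in seq:
    by_loop[value] = 1 + by_loop.get(value, 0)

print(by_count == by_loop)
print(by_loop["red"], by_loop["blue"], by_loop["green"])
```

Both build the same mapping; they differ only in how many times seq
gets traversed.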

Python will never second-guess you in terms of "oh he's using it as
a [set/bag/dict/list], and it doesn't exist, so I'll just create a
[whatever type] instance on the fly".
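
A minimal sketch of what that refusal looks like in practice (the names
tally and errors are made up for the example): indexing a name that was
never bound raises NameError, and incrementing a key that isn't there
raises KeyError, rather than anything being created behind your back.

```python
# Hypothetical names, for illustration only.
errors = []

try:
    # 'tally' was never bound to anything, so there is nothing to index:
    # Python does not invent a dictionary here.
    tally["red"] = 1
except NameError:
    errors.append("NameError")

tally = {}  # explicit creation is required

try:
    # The dictionary exists now, but the key does not, and no
    # implicit zero gets inserted for the increment.
    tally["red"] += 1
except KeyError:
    errors.append("KeyError")

# Being explicit about the starting value works fine.
tally["red"] = tally.get("red", 0) + 1

print(errors)
print(tally)
```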

> But I may be misunderstanding Python's philosophy a bit. I'm surprised
> that value++ has to be spelled out as value = value+1, too, so I'm not

value += 1 is preferred these days.  But in the histogram case, there
is no previous "value" to increment, and guesstimating to insert a 0
there just isn't Python's way.

> quite sure that I understand the philosophy.

There is an ideal about "only one obvious way to do it", just as, in
C (per the Rationale to the C standard) there is an ideal to "provide
only one way to do an operation".  So, having both value++ and
value += 1 should never happen (though in C it did: ideals are ideals,
the real world is sometimes messier:-).  Perl's enthusiastic abundance
of multiple ways to perform each task is a very different philosophy.

> I agree for word frequency, but not for something as general as GROUP
> BY and (some operation, such as COUNT or SUM). Maybe using some of the
> functional programming constructs of Python (before they're removed in
> Python 3) would be the way to build my own.

List comprehensions (which Python copied from Haskell) are the key
FP construct in Python, and far from being removed they're growing
(with generator expressions, aka genexps, coming in 2.4 -- they're
"lazy", like Haskell's LCs).
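
To illustrate the eager/lazy difference, here is a sketch that assumes a
Python with generator expressions (2.4 or later); the helper name
noisy_square and the counters are invented for the example.  A list
comprehension does all of its work immediately, while a generator
expression calls nothing until it is consumed:

```python
calls = []

def noisy_square(n):
    # Record each call so we can see when the work actually happens.
    calls.append(n)
    return n * n

# List comprehension: all three calls run right here.
eager = [noisy_square(n) for n in range(3)]
calls_after_listcomp = len(calls)

# Generator expression: building it calls noisy_square zero times.
lazy = (noisy_square(n) for n in range(3))
calls_after_genexp = len(calls)

# Consuming the generator is what triggers the calls.
total = sum(lazy)
calls_after_sum = len(calls)

print(calls_after_listcomp, calls_after_genexp, calls_after_sum)
```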

A more general GROUP BY (dict of lists) is built by:

histogram = {}
for value in seq:
    histogram.setdefault(f(value), []).append(value)

where f(...) represents the grouping -- e.g., to group by the
value of an attribute x.key of each item x, loop with "for x in seq:"
and use as the loop body

    histogram.setdefault(x.key, []).append(x)

or if you want the higher-order abstraction,

histogram = dict([ (key, [y for y in seq if y.key==key] )
                    for key in sets.Set( [x.key for x in seq] )
                ])
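
Here is a runnable sketch of both forms, with COUNT derived from the
groups afterwards.  The Item class, its name attribute and the sample
data are hypothetical, made up only to have something with a .key
attribute to group on, and the built-in set is assumed in place of
sets.Set:

```python
class Item:
    # Hypothetical record type with a 'key' attribute to group on.
    def __init__(self, key, name):
        self.key = key
        self.name = name

seq = [Item("a", "ant"), Item("b", "bee"), Item("a", "asp")]

# Single pass: setdefault supplies the empty list the first time
# a key is seen, then the item is appended to that key's list.
groups = {}
for x in seq:
    groups.setdefault(x.key, []).append(x)

# Higher-order version: one filtering pass over seq per distinct key.
groups_hof = dict([(key, [y for y in seq if y.key == key])
                   for key in set([x.key for x in seq])])

# GROUP BY ... COUNT is then just the length of each group.
counts = dict([(key, len(items)) for key, items in groups.items()])
names = dict([(key, [x.name for x in items])
              for key, items in groups.items()])

print(names)
print(counts)
```

Swapping len for sum over some attribute of the items would give the
SUM flavor in the same way.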

Alex