syntax philosophy

Mon Nov 17 19:06:34 EST 2003

On 17 Nov 2003 13:29:16 -0800, tuanglen at hotmail.com (Tuang) wrote:

>I'm checking out Python as a candidate for replacing Perl as my "Swiss
>Army knife" tool. The longer I can remember the syntax for performing
>a task, the more likely I am to use it on the spot if the need arises.
>If I have to go off and look it up, as I increasingly have to do with
>Perl's ever hairier syntax, I'm more likely to just skip it, making me
>even less likely to remember the syntax the next time.
>
>So I hear that Python is easier to remember between uses than Perl. So
>far, I like what I see. Iterators and generators, for example, are
>great. Basic loops and other things are very convenient in Python.
>
>But I'm surprised at what you apparently have to go through to do
>something as common as counting the frequency of elements in a
>collection. For example, counting word frequency in a file in Perl
>means looping over all the words with the following line:
>
>$histogram{$word}++;
>
>The creation of the dictionary, the creation of new elements, the
>initialization to zero, are all taken care of automatically. You just
>tell it to start counting and the rest is taken care of for you. And
>the incrementing is just a simple "++".
>
>In Python, apparently you have to first remember to declare your
>dictionary outside the loop:
>
>histogram = {}
>
>Then within the loop you use the following construct:
>
>histogram[word] = histogram.get(word, 0) + 1
>
>That's quite a bit hairier and it requires remembering to use braces
>{}, then square brackets [], then parentheses (), and accessing the
>dictionary via two different techniques in the same line.
>
>This seems sort of unPythonesque to me, given the relative cleanliness
>and obviousness (after seeing it once) of other common Python
>constructs.
>
>But I guess I'm making assumptions about what Python's philosophy
>really is. I would expect that a language with something as nice as
>
>[x**3 for x in my_list]
>
>would want to use something like:
>
>histogram[word]++
>
>or even combine them into something like:
>
>{histogram[word]++ for word in my_list}
>
>Is this just something that hasn't been done yet but is on the way, or
>is it a violation of Python's philosophy in some way?
>
>Since I'm trying to choose a good Swiss-Army-knife programming
>language, I'm wondering if this Python histogram technique is the sort
>of thing that bothers Pythonistas and gets cleaned up in subsequent
>versions because it violates Python's "philosophy" or whether, on the
>contrary, it's just the way Pythonistas like to do things and is a
>fair representation of what the community (or Guido) wants the
>language to be.
>
>Thanks.

IMO Python is a very good Swiss Army Knife. If you have a pattern that you
will want to re-use, it is pretty easy to write something that supplies the
defaults you want and hides the unnecessary details. E.g., if you want
histograms, it's easy to make a histogram class that takes a word sequence
and gives you a histogram object that does what you want, and that you can
extend as your requirements change:

 >>> class Histogram(dict):
 ...     def __iadd__(self, name):
 ...         self[name] = self.get(name, 0) + 1
 ...         return self
 ...     def __init__(self, wordseq=None):
 ...         if wordseq is not None:
 ...             for w in wordseq: self += w
 ...

now you can start to use this, e.g., pick some "words"

 >>> words = 'a a bb a c bb a'.split()
 >>> words
 ['a', 'a', 'bb', 'a', 'c', 'bb', 'a']

and make a histogram object

 >>> h = Histogram(words)
 >>> h
 {'a': 4, 'c': 1, 'bb': 2}

That's the __repr__ of the underlying dict showing the data. We can use other
dict methods, e.g.,

 >>> for name, value in h.items(): print '%6s: %s' %(name, value*'*')
 ...
      a: ****
      c: *
     bb: **

Or we could have overridden .items() to return a list sorted by name, or
sorted by frequency first. Or we could add some specialized methods to do
anything you like.

pump some more data:

 >>> for c in 'cccc': h+=c
 ...
 >>> h
 {'a': 4, 'c': 5, 'bb': 2}

and another type:

 >>> for i in range(5): h+=i
 ...
 >>> h
 {'a': 4, 0: 1, 'c': 5, 3: 1, 4: 1, 'bb': 2, 1: 1, 2: 1}

that's not very orderly

 >>> hitems = h.items()
 >>> hitems.sort()
 >>> hitems
 [(0, 1), (1, 1), (2, 1), (3, 1), (4, 1), ('a', 4), ('bb', 2), ('c', 5)]

Of course we could override the items method of our class to return a sorted list,
by key or by value (which is the occurrence frequency here),

and we could override __repr__ and/or __str__ to return other representations of
the histogram. Etc., etc.

The point is, if we added a built-in for every little problem that someone
would like a concise way to handle, Python would become a gigantic midden of one-offs.

So Python makes your one-offs easy instead ;-)
OTOH, if enough people like something, eventually it may get added.
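
For what it's worth, word counting is one case where that did happen: later
Python releases (2.7 and 3.1 onward) ship collections.Counter in the standard
library. A minimal sketch, assuming a Python new enough to have it:

```python
from collections import Counter

words = 'a a bb a c bb a'.split()

histogram = Counter()
for word in words:
    histogram[word] += 1  # a missing key counts as 0, so no .get() is needed

# Counter(words) would build the same counts in one step.
print(histogram.most_common())  # [('a', 4), ('bb', 2), ('c', 1)]
```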

Regards,
Bengt Richter