My first Python program -- a lexer

Steve Holden steve at holdenweb.com
Tue Nov 11 11:18:09 EST 2008


Thomas Mlynarczyk wrote:
> John Machin schrieb:
> 
>> You are getting closer. A better analogy is that using a dict is like
>> transporting passengers along an autobahn in an aeroplane or
>> helicopter that never leaves the ground.
> 
> It is not a bad idea to transport passengers in an airplane, but then
> the airplane should not follow an autobahn, but use the shortest way --
> at an appropriate altitude. Translated back to Python dicts, this would
> mean that using a dict for my purposes is a good idea, but that I do not
> make use of its full capabilities -- in other words, I should rewrite my
> code -- still using dict but in a better way. Or do you mean that for 10
> kilometers of autobahn, an airplane would be overkill?
> 
The latter.

> Maybe I am a bit biased by my PHP background, but { name: regex, ... }
> looks simpler to me than [ ( name, regex ), ... ], because the former is
> not a nested structure, while the latter would be a 2D-array in PHP.
> 
But it's a question of the properties of the objects. Because lists are
ordered it's appropriate to use them when you want to iterate over them
and no random access is required.

If you need random access by key and no particular ordering is required
for the iteration then you should use a dict.

Forget "simplicity" until you know how the different objects are
implemented under the hood.

A good first rule is that simple, readable Python will usually be
efficient. As far as execution efficiency goes I'd be very surprised if
the dict solution was faster, but even if it were I would still prefer
the list-based solution because it's more appropriate to the problem.

"Premature optimization is the root of all evil" ...

> Suppose I use the dict and I want to access the regex associatetd with
> the token named "tokenname" (that is, no iteration, but a single
> access). I could simple write tokendict["tokenname"]. But with the list
> of tuples, I can't think of an equally easy way to do that. But then, as
> a beginner, I might be underestimating Python.
> 
But when do you want to do that? There's no point inventing use cases -
they should be demonstrated needs.

The only time your original code needs to use the token name to retrieve
the value is for your compile() method, but assuming the passed-in
tokens was of the form [(name, pattern), ...] you could just as easily
throw the method away and in __init__() write

    self.tokens = [(name, re.compile(pat)) for name, pat in tokens]

Or simply pass compiled token patterns in in the first place when they
are necessary ... then the caller has the option of not bothering to
optimize in the first place!

The advantage of the list-based approach is that it at least allows of
the possibility that you can observe the relative frequencies of the
tokens' appearance in real code, and then optimize the ordering to give
best (mean) performance (by putting the commonest token first) should
tokenization become a program hotspot.

With a dict you have no such opportunity, because the ordering is
determined by the implementation and not by your data structure.

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/




More information about the Python-list mailing list