Solve a Debate

castironpi at gmail.com
Mon Feb 18 15:30:45 EST 2008


On Feb 17, 11:23 pm, greg <g... at cosc.canterbury.ac.nz> wrote:
> Wolfgang Draxinger wrote:
> > Somehow you seem to think that a lookup table will require more
> > resources (memory, I guess you thought) than a sequence of
> > comparisons. However, you didn't take into account that the
> > program code itself requires memory, too (for the operation
> > codes).
>
> In Python, there's the additional twist that for
> lookup tables based on a mutable type, such as a
> dictionary, you need to execute code to build the
> dictionary. So you end up using very roughly twice
> the memory -- for the code to build the dictionary,
> and the dictionary itself.
>
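
To make that concrete -- a minimal sketch, with an invented table; the
function's compiled body is the "code to build the dictionary", and
sys.getsizeof shows the size of the dict object itself:

    import dis
    import sys

    def build_table():
        # Evaluating the dict literal executes dict-building bytecode.
        return {'red': 0xFF0000, 'green': 0x00FF00, 'blue': 0x0000FF}

    table = build_table()
    print(sys.getsizeof(table))               # memory for the dict itself
    print(len(build_table.__code__.co_code))  # bytecode that rebuilds it
    dis.dis(build_table)                      # shows the dict-building ops
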
> If the dictionary isn't too huge, and it's looked
> up many times during each run of the program, the
> extra speed of the dict lookup is probably worth
> the overhead of creating it.
>
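
For example (a rough sketch; the size and keys are arbitrary), timeit
makes the trade-off easy to measure -- the build cost is paid once in
the setup, after which each lookup is a single hash probe:

    import timeit

    build = "table = dict((i, i * i) for i in range(1000))"
    print(timeit.timeit("table[500]", setup=build))  # fast repeated lookups
    print(timeit.timeit(build, number=1000))         # cost of rebuilding it
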
> If the dict is very large, and is consulted
> relatively few times in a given run, then it might
> well be faster and not use too much more memory to
> use a tree (NOT a linear sequence!) of if-else
> statements in the code.
>
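
A hand-rolled sketch of such a tree, for a toy four-key table (real
code would presumably be generated from the sorted keys):

    def lookup(k):
        # Binary decision tree: roughly log2(n) comparisons,
        # versus up to n tests for a linear if/elif chain.
        if k < 30:
            if k == 10:
                return "ten"
            if k == 20:
                return "twenty"
        else:
            if k == 30:
                return "thirty"
            if k == 40:
                return "forty"
        return None
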
> You could also consider loading the dict from some
> other form, such as a pickle, instead of creating
> it using code. This would use less memory, although it would
> probably take roughly the same time to set up the table.
>
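
Something like this, assuming the table was pickled once, offline (the
filename is invented):

    import pickle

    # Done once, elsewhere:
    #     with open("table.pkl", "wb") as f:
    #         pickle.dump(table, f)

    # At startup -- no dict-building code needs to be compiled or run:
    with open("table.pkl", "rb") as f:
        table = pickle.load(f)
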
> For extremely large infrequently-used tables, it's
> probably better to look at not loading the whole
> table into memory at all, and keeping it in some
> kind of external indexed structure such as a b-tree,
> relational database, etc.
>
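
The stdlib's shelve module is one low-effort version of this -- a
dict-like object backed by an on-disk dbm file, so only the entries
actually consulted get read into memory (the filename is invented):

    import shelve

    db = shelve.open("bigtable")  # keys must be strings
    value = db.get("some key")    # single indexed disk lookup
    db.close()
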
> In other languages, the tradeoffs will be completely
> different. E.g. in C, if you can describe the table
> entirely with static data, it'll be very fast to
> load and incur no overhead for code to create it at
> all. Also, with demand-paging out of the executable
> file, and infrequent lookups, only parts of the
> table will actually get loaded anyway.
>
> --
> Greg

Past a "many-small" certain point on numbers of hash-tables, if that's
the right word, in a program, and intepreter process on a machine, is
it be more time-efficient to allocate a 2**32-byte table?  Are
'modulo' and 'doublesize' the only steps of the lookup process that it
would eliminate, and are they expensive ones?  If so, what point?  If
not, what's a tighter bottleneck?
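
(To pin down those terms -- a toy open-addressing table, purely
illustrative; CPython actually masks against a power-of-two size
rather than taking a general modulo, and resizes on insertion once the
table gets too full:)

    class ToyTable(object):
        def __init__(self, size=8):
            self.slots = [None] * size

        def _probe(self, key):
            i = hash(key) % len(self.slots)    # the 'modulo' step
            while self.slots[i] is not None and self.slots[i][0] != key:
                i = (i + 1) % len(self.slots)  # linear probing on collision
            return i

        def __setitem__(self, key, value):
            self.slots[self._probe(key)] = (key, value)
            # A real table would check its load factor here and
            # 'doublesize' when too full, rehashing every entry.

        def __getitem__(self, key):
            slot = self.slots[self._probe(key)]
            if slot is None:
                raise KeyError(key)
            return slot[1]
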


