A critic of Guido's blog on Python's lambda

Ken Tilton kentilton at gmail.com
Mon May 8 14:29:38 EDT 2006


[Sorry, i was just reading comp.lang.lisp, missed the following till 
someone mentioned it in email. k]

Alex Martelli wrote:
> Carl Friedrich Bolz <cfbolz at gmx.de> wrote:
>    ...
> 
>>>an extension that allows the programmer to specify how the value of
>>>some slot (Lisp lingo for "member variable") can be computed.  It
>>>frees the programmer from having to recompute slot values since Cells
> 
>    ...
> 
>>I have not looked at Cells at all, but what you are saying here sounds
>>amazingly like Python's properties to me. You specify a function that
>>calculates the value of an attribute (Python lingo for something like a
> 
> 
> You're right that the above-quoted snipped does sound exactly like
> Python's properties, but I suspect that's partly because it's a very
> concise summary.  A property, as such, recomputes the value each and
> every time, whether the computation is necessary or not; in other words,
> it performs no automatic caching/memoizing.

Right, and the first thing we did was simply that, no memoizing. The 
second thing we did was memoize without tracking dependencies, updating 
everything on each pass thru the eventloop. We knew both would not 
(uh-oh) scale, but we wanted to see if the approach solved the 
computational problem that started the whole research programme.

Soon enough we were tracking dependencies.


> 
> A more interesting project might therefore be a custom descriptor, one
> that's property-like but also deals with "caching wherever that's
> possible". This adds interesting layers of complexity, some of them not
> too hard (auto-detecting dependencies by introspection), others really
> challenging (reliably determining what attributes have changed since
> last recomputation of a property).  Intuition tells me that the latter
> problem is equivalent to the Halting Problem -- if somewhere I "see" a
> call to self.foo.zap(), even if I can reliably determine the leafmost
> type of self.foo, I'm still left with the issue of analyzing the code
> for method zap to find out if it changes self.foo on this occasion, or
> not -- there being no constraint on that code, this may be too hard.

"no constraint on the code" is a sufficient objection but also: if the 
rule for the code is (in neutral pseudo-code)

     if a is true
      then return b
      else return c

..you only want to depend on one of b or c at a time. (also, not do the 
internal dependency tracking on, say, b if it happens to hold an 
immutable value (part of the Cells API). Nice performance win, it turns out.

I just keep what I call a "datapulse ID", sequentially growing from 
zero, in a global variable. Each ruled Cell keeps track of its memoized 
value, datapulse stamp, and whether it in fact changed value in reaching 
its current datapulse stamp. (I can reach the current datapulse stamp by 
determining no dependency (direct or indirect recursively down the 
dependency graph) is both more current than me /and/ in fact changed in 
value getting there.[1] If not, I take on the current datapulse but 
never run my rule. Or, if yes, I run my rule but might compute the same 
value as last time. Either way, I can flag myself as current but 
not-actually-changed.)

So we have push and pull. The whole thing gets kicked off by a setting 
operation on some slot, who notifies dependents that they should 
recompute. This is a cascade of further notifications if anyone notified 
recomputes and in fact computes a different value. The pull comes in 
while rules are running to make sure no obsolete value gets used. So the 
dependency graph gets updated JIT during rule evaluations, possively 
recursively so.


> 
> The practical problem of detecting alterations may be softened by
> realizing that some false positives are probably OK -- if I know that
> self.foo.zap() *MAY* alter self.foo, I might make my life simpler by
> assuming that it *HAS* altered it. This will cause some recomputations
> of property-like descriptors' values that might theoretically have been
> avoided, "ah well", not a killer issue.  Perhaps a more constructive
> approach would be: start by assuming the pseudoproperty always
> recomputes, like a real property would; then move toward avoiding SOME
> needless recomputations when you can PROVE they're needless. You'll
> never avoid ALL needless recomputations, but if you avoid enough of them
> to pay for the needed introspection and analysis, it may still be a win.
> As to whether it's enough of a real-world win to warrant the project, I
> pass -- in a research setting it would surely be a worthwhile study, in
> a production setting there are other optimizations that look like
> lower-hanging fruits to me.  But, I'm sure the Cells people will be back
> with further illustrations of the power of their approach, beyond mere
> "properties with _some_ automatic-caching abilities".

Who knows, maybe something like that lies in the future of cells. It 
would not be because of the need for update being undecidable, it would 
be because the decision somehow would be too expensive compared to "Just 
Run the Rule!"[2] It seems every new application brings up interesting 
new requirements where Cells can usefully be extended with new 
capabilities. But over time the code has gotten simpler and simpler, so 
I think we are headed in the right direction.

kenny

[1] Aha! I see a flaw. Arises if two datapulses pass before I (a cell 
<g>) get read and must determine if my cache is obsolete, and some 
dependency changed in the first datapulse but not the second. I have to 
enhance regression test suite and cure. One possible cure is akin to 
your thinking: if I missed a generation, I have to assume it changed in 
the missed generation. The alternatives are keeping a history of my 
datapulses going back no further than the last true change, or having a 
Cell keep a separate record of the last value used from each dependency. 
  The second could get expensive if I consult a lot of other cells, 
while the first ... ah wait, I do not need a history, I just need one 
new attribute: datapulse-of-latest-actual change. Sweet.

[2] I am making a mental note to look into that optimization.



More information about the Python-list mailing list