Dice gen and analyser script for RPGs: comments sought

Tue Sep 12 21:48:19 EDT 2006

Paul Rubin <http://phr.cx@NOSPAM.invalid> wrote 

(re <http://www.sailmaker.co.uk/newfiles/dice.py>)

> I looked at this for a minute

Thanks for taking the time to look over it; I greatly appreciate it!

> and found it carefully written but somewhat hard to understand. 

Obviously, as the author, I have a blind spot about others
understanding it, but I always strive for my code to be self-commenting
and understandable at first sight, so I would genuinely welcome
feedback on which parts are difficult to understand and need more
commenting. 

Perhaps it is insufficient commenting of the implementations of the
various subclasses of class Dice? I admit /they/ get a but funky in
places.

> I think it's maybe more OOP-y than
> Python programs usually are; perhaps it's Java-like.  

That's interesting: while my background is certainly more OO than
functional, Java is the language I use least. Certainly I relish the
freedom to mix OO, functional and procedural paradigms that you get
with Python, so if there are ways in which that mixture can improve
dice.py, please enlighten me!

On the other hand, is "being more OOP-y than Python programs usually
are" a bad thing in itself, if the object analysis is cleanly designed?

> There are some
> efficiency issues that would be serious if the code were being used on
> large problems, but probably no big deal for "3D6" or the like.  In
> particular, you call histogram.numSamples() in a bunch of places and
> each call results in scanning through the dict.  

A good point which I did consider, however after practical tests I
ruled in favour of code simplicity for the case in hand, while being
careful to leave the Histogram class open to optimisations such as you
describe via subclassing. 

An earlier draft of the Histogram class did indeed have each method do
a single pass through the dict, accumulating what is now numSamples(),
sigma() and sigmaSquared() in local variables. But in practice I found
that for my problem domain, even a 'large' problem would have
relatively few histogram buckets, and computing the summary statistics
for the histogram took a /miniscule/ amount of time compared to the
time taken by generating the data. 

On that basis I refactored the Histogram class to follow the "once and
only once" coding principle, resulting in a much clearer and more
concise Histogram class, with very simple and natural code that any
high-school statisician can immediately see is correct. I found no
measurable perfomance hit.

> You might want to cache this value in the histogram instance.  

On the same basis as above, I decided against caching any results in
the Histogram class, preferring simplicity of code to the additional
logic that would be required to manage cache invalidation upon arrival
of new samples. 

I've learned the hard way that caching, while a dramatic optimisation
in the right place, introduces a lot of complexity and pitfalls (even
more so in multi-threaded scenarios), so these days I only do it when
performance analysis reveals the need.

Of course, for histograms with huge numbers of buckets, there may well
be mileage in subclassing the Histogram class to implement the above
optimisations, and I think the class is sufficiently cleanly designed
to facilitate that.

> Making histogram a subclass of dict also seems a little bit peculiar.

Why? Genuine question. Keep in mind that I might want to re-use the
Histogram class for cases where the buckets may be floating point, so I
can't just use a vector.

I wanted a mapping of bucket->frequency with an easy way to add a new
sample by incrementing the frequency, so dict seemed a natural choice.
I am of course open to better ideas!

> You might find it more in the functional programming spirit to use
> gencomps rather than listcomps in a few places:
> 
>         return sum([freq * val for val, freq in self.items()])
> becomes
>         return sum((freq * val) for val, freq in self.iteritems())

Nice: thanks for making me aware of gencomps! 

I agree that they're better, but when at home I'm on Mac OS X and so
stuck with Python 2.3 unless I engage in the necessary shenanigans to
slap Python 2.4 onto Mac OS X 10.4.7. Yes, I know how to do it, I've
done it before, but it's a PITA and I don't want to deal with it any
more. 

Moreover I want my code to run on vanilla installations of Mac OS X
10.4.x, so I just put up with it and code to Python 2.3.

I'm on Apple's developer program, so if anyone has a Radar bug number I
can cite while requesting they pull their socks up and ship Python 2.4
with the OS, just let me know and I'll cite it to Apple. 

Richard.