set partition question

pball.benetech at gmail.com pball.benetech at gmail.com
Mon May 26 02:33:00 EDT 2008


On May 25, 10:41 pm, miller.pau... at gmail.com wrote:
> So, basically, V = (v_1, v_2, ... , v_{k-1}, v_k) can be regarded as
> an abstract, k-dimensional vector, right?

Yes.


> If I understand your revised problem statement correctly, what you
> really want to do is build a graph of these vectors, where graph
> adjacency is equivalent to adjacency in your sense.  That is, imagine
> you have V_1, V_2, ... , V_n all sitting out in front of you,
> represented abstractly as simple points.  Draw a line between V_i and
> V_j if they are "adjacent" in your sense of the word.  What you have
> then is a graph structure where your version of adjacency exactly
> corresponds to graph adjacency.  Then, in your language, a stratum is
> simply a path in this graph, and finding those is easy.

You're solving an earlier part of the problem which I call stratum
generation. I've never thought to use a graph representation for
stratum generation -- very interesting, and I'll pursue it. Would you
be willing to outline how you'd do it here?

I've had fun "breeding" strata using genetic algorithms, and a lot of
interesting ideas came out of that experiment, in particular the
utility of building randomly shaped strata to do indirect estimates
(the objective of the set arithmetic described here). I used GA's
because I think that the space of possible strata is too big to search
by brute force.

I'd still like to have the graph representation for the brute force
solution. Or, in another random algorithm, the graph could be a
sampling frame from which random paths could be pulled.

In a real dataset, some of the strata will contain adequate data to
make an estimate, and these are called valid strata. Other legal
strata (as shown by a path through the graph) may not have adequate
data to make an estimate (thus, legal but invalid).

We've done this problem by hand: observing that we can estimate a
stratum (1,2,3) and another (2,3) but not (1). So
     E(1) = E(1,2,3) - E(2,3)
There are issues with the variance of E(1), but that's a very
different problem. Using the valid strata E(1,2,3) and E(2,3) we've
made an indirect estimate of E(1).

I'm looking for an algorithm which automates the search for
combinations like the one above. Given a set of valid strata[1], yield
combinations of valid strata which, when combined via union or
difference[2], evaluate to a stratum (adjacent but possibly not
valid). The combination of valid strata is called an indirect
estimate

[1] strata can be generated via many methods; they all end up in the
same database
[2] Note that the set of operations is shrinking :)

Thanks in advance for your thoughts -- PB.




More information about the Python-list mailing list