[SciPy-User] [SciPy-user] Maximum entropy distribution for Ising model - setup?

Robin robince at gmail.com
Fri Mar 12 05:38:22 EST 2010


Hi,

I apologise for the long delay in this reply - I originally wrote it
while I was on holiday, but there was something funny with the
coupling/theta values in my example that I couldn't get to the bottom
of in time, and I'm afraid I forgot about it...

Now that I've come to ask the scipy list another question of my own, I
thought I should really answer this one first! Hopefully it is still of
some use.

On Wed, Jan 13, 2010 at 3:42 AM, Jordi Molins Coronado
<jordi_molins at hotmail.com> wrote:
> Hello,
> I find all the ideas posted in reply to my message very interesting, thank
> you very much to all who have answered to my question. Especially, I would
> like to know more about Kilian's and Robin's suggestions.
> In particular, I find difficult to understand and translate the ideas posted
> by them into my background. Of course, this is not Kilian's or Robin's
> fault, but my complete fault due to lack of knowledge.
> To Robin:
> - Is there a paper covering your package, but explained in layman's terms,
> not requiring previous knowledge on the subject? Or maybe a simple but
> fully-worked example (ideally closely related to the Ising model) that can
> be used in your package to see how everything works.

The original paper describing the toolbox and the maximum entropy
algorithm is here:
http://www.frontiersin.org/neuroscience/neuroinformatics/paper/10.3389/neuro.11/004.2009/
but I suppose it is pretty focussed on our field. I hope the maximum
entropy section might make sense on its own though.
The algorithm is based on the information geometry of Amari, which is
covered in this paper; it is a bit heavy going but not particularly
field-specific:
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=930911
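
Just for orientation (this is standard, not specific to the toolbox):
for binary variables, the maximum entropy distribution constrained to
match first and second order marginals has the Ising form, with
p(x) proportional to exp( sum_i h_i x_i + sum_{i<j} J_ij x_i x_j ).
A minimal plain-numpy sketch of that form (illustration only, not
pyentropy code):

import numpy as np
from itertools import product

def ising_p(h, J):
    # h: length-n fields, J: (n,n) couplings (only the upper triangle is used)
    n = len(h)
    states = np.array(list(product([0, 1], repeat=n)))   # all 2**n binary words
    logp = states.dot(h) + np.einsum('ki,ij,kj->k', states, np.triu(J, 1), states)
    p = np.exp(logp)
    return p / p.sum()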

For an example:

import numpy as np
import pyentropy

# start with 4 binary variables - 1000 trials
X = np.random.random_integers(0, 1, (4, 1000))
# convert the binary 4-vector on each trial to a decimal value
X_d = pyentropy.decimalise(X, 4, 2)
# sample full joint probability distribution (16 possible values)
p = pyentropy.prob(X_d, 16)
# this p vector is ordered so that P(x=(1,0,1,0)) is found at the index
# with the decimal value of the word, ie p[1] = P(x=(0,0,0,1)),
# p[10] = P(x=(1,0,1,0))
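
(Just to illustrate that indexing with plain numpy - the word is read
as a binary number with the first variable as the most significant bit:)

# -> 10, matching p[10] = P(x=(1,0,1,0)) above
np.dot([1, 0, 1, 0], 2 ** np.arange(3, -1, -1))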

# obtain maximum entropy solution
from pyentropy.maxent import AmariSolve
s = AmariSolve(4,2)
# solve for maximum entropy distribution preserving up to second
# order marginals ( = ising for binary variables)
p_me = s.solve(p, k=2)
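
(A couple of quick sanity checks one can do at this point - just
illustrative, not part of the toolbox usage:)

print(p_me.sum())    # should still be normalised, ~ 1.0
# the maximum entropy solution can't have lower entropy than p itself
H = lambda q: -np.sum(q[q > 0] * np.log2(q[q > 0]))
print(H(p), H(p_me))    # H(p_me) >= H(p)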

# theta vector (which is h, J's in different notation)
# this should be covered in the paper
# this is length 15 (no zero entry)
theta = s.theta_from_p(p_me)
# theta[:4] are the first order thetas (the h's)
# theta[4:10] are the second order thetas (the J's) (I would have to think
# a bit harder to get the exact ordering if you are interested in that)
# theta[10:] are the higher order terms which should be numerically
# zero (ie 10^-16 or thereabouts)
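
(So pulling out the Ising parameters and checking the higher order
terms looks something like this - with the caveat above about the
exact ordering of the pairwise terms:)

h = theta[:4]                      # first order thetas (the h's)
J = theta[4:10]                    # the 6 second order thetas (the J's)
print(np.abs(theta[10:]).max())    # ~ 1e-16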

# marginals
eta = s.eta_from_p(p_me)
# these are the corresponding ordered marginals
# so from the constraints, (s.eta_from_p(p) - s.eta_from_p(p_me))[:10]
# is close to zero (first and second order marginals are preserved)
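
(As a concrete check - the tolerance reached of course depends on how
tightly the solver has converged:)

print(np.abs((s.eta_from_p(p) - s.eta_from_p(p_me))[:10]).max())    # ~ 0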

If you don't want to sample the full space, you can provide just the
vector of marginal values to the solve function, ie:

s.solve(s.eta_from_p(p), k=2, eta_given=True)

or even:

marg_constraints = s.eta_from_p(p)[:10]
s.solve(np.r_[marg_constraints, np.zeros(5)], k=2, eta_given=True)

since the higher order marginals don't make any difference.
(Here marg_constraints would be a vector of <S1>..<S4> then <SiSj> etc.)
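
(And if the full joint really is too big to sample, the constraints can
also be estimated directly from the data. The sketch below assumes the
<S1>..<S4> then <SiSj> ordering just described, so the exact ordering
of the pairwise terms would need double checking against eta_from_p:)

first = X.mean(axis=1)                         # <Si> for each variable
pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]
second = np.array([(X[i] * X[j]).mean() for (i, j) in pairs])   # <SiSj>, i < j
eta_data = np.r_[first, second, np.zeros(5)]
p_me2 = s.solve(eta_data, k=2, eta_given=True)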

Please let me know how you get on. Of course the major limitation with
this is that you can only do small populations of variables (up to
18/20 should be comfortable) - whereas the probabilistic flow method
looks very promising for much larger populations. I would be really
interested to see whether the Ising model algorithm could be extended to
the more general finite-alphabet, any-order maximum entropy solution.
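
(The limit is essentially just that the number of possible states grows
exponentially with the number of variables:)

for n in (10, 18, 20, 25, 30):
    print(n, 2 ** n)    # 2**20 is already ~1e6 states, 2**30 ~1e9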

In my work to date it hasn't been a practical problem though - since I
am interested in entropy and information values, which are impossible
to sample on much bigger spaces. For me the limitation has always been
the available data rather than the computation time for the maximum
entropy solutions.

Cheers

Robin


