[SciPy-user] maxentropy

Ed Schofield schofield at ftw.at
Thu Mar 23 12:29:59 EST 2006


On 21/03/2006, at 9:53 PM, Matthew Cooper wrote:

>
> Hi Ed,
>
> I am playing around with the code on some more small examples and
> everything has been fine.  The thing that will hold me back from
> testing on larger datasets is the F matrix which thus far requires the
> space of (context,label) pairs to be enumerable.  I know that
> internally you are using a sparse representation for this matrix.  Can
> I initialize the model with a sparse matrix also?  This also requires
> changes with the indices_context parameter in the examples.  

Hi Matt,
Yes, good point.  I'd conveniently forgotten about this little problem
;)  It turns out scipy's sparse matrices need extending to support
this.  I've made some changes already (to the ejs branch); the next
requirement is more flexible slicing support.  I added partial slicing
support (for slicing an entire row of a lil_matrix) a couple of months
ago, but this isn't good enough here, although it shouldn't be too hard
to extend.  One upside of using slicing, rather than fancy indexing as
before (which some of scipy's sparse matrix formats do already support),
is that the indices_context parameter can then go away completely; we'll
just expect the feature indices to be ordered contiguously, which I
think is perfectly reasonable here.
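
To make the slicing idea concrete, here is a rough sketch using the
scipy.sparse API as it exists in current SciPy releases (not the ejs
branch code). The layout is an assumption for illustration only:
features index the rows of F, and the (context, label) pairs index the
columns, with all labels for one context stored next to each other.
The helper name columns_for_context and the toy sizes are hypothetical.

```python
import numpy as np
from scipy import sparse

# Assumed toy layout (not taken from the maxentropy module itself):
# 2 contexts x 3 labels = 6 (context, label) pairs as columns, 4 features as rows.
num_contexts, num_labels, num_features = 2, 3, 4

# Build incrementally in LIL format, which is cheap to fill entry by entry.
F = sparse.lil_matrix((num_features, num_contexts * num_labels))
F[0, 0] = 1.0
F[1, 2] = 1.0
F[2, 4] = 1.0
F[3, 5] = 1.0

# Convert to CSC, where taking a contiguous range of columns is efficient.
F = F.tocsc()


def columns_for_context(F, context, num_labels):
    """Return the block of columns for one context (hypothetical helper).

    Relies on the columns for each context being stored contiguously,
    so a plain slice replaces an explicit index array.
    """
    start = context * num_labels
    return F[:, start:start + num_labels]


# Slicing: no index array needed, only the context number.
block = columns_for_context(F, 1, num_labels)

# Fancy indexing: the same block, but the caller must carry the indices.
indices_context = np.arange(3, 6)
fancy = F[:, indices_context]
```

Both block and fancy hold the same 4 x 3 sub-matrix; the difference is
only that the slice needs no separate index array to be passed around.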

I've checked in my latest code (into the ejs branch) in case you want to
follow my progress or work on it yourself.  But the conditional maxent
examples no longer work, so avoid doing 'svn update' if you want to keep
the working version for now...


> I see that you also have an unconditional bigmodel class that seems
> related, but I'm not sure what would need to be changed.

Actually, the definition of 'big' here is 'requires Monte Carlo
simulation' -- for example, continuous models in many dimensions or
models on very large discrete spaces, such as the space of all possible
sentences.
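
As an illustration of what 'requires Monte Carlo simulation' can mean,
here is a minimal sketch that estimates the feature expectations of an
exponential-form model p(x) proportional to exp(theta . f(x)) by
self-normalized importance sampling. This is one common technique, not
necessarily what the bigmodel class does internally. The feature
functions, the auxiliary sampling distribution, and every name below
are assumptions chosen for the example.

```python
import numpy as np


def features(x):
    # Hypothetical feature functions: f1(x) = x, f2(x) = x**2.
    return np.stack([x, x ** 2], axis=-1)


def estimate_expectations(theta, n_samples=100_000, seed=0):
    """Estimate E_p[f(x)] for p(x) proportional to exp(theta . f(x)).

    Samples come from an auxiliary distribution q (assumed here to be a
    normal with standard deviation 2), and each sample is reweighted by
    exp(theta . f(x)) / q(x). The normalizing constant of p cancels
    because the weights are normalized to sum to one.
    """
    rng = np.random.default_rng(seed)
    scale = 2.0
    x = rng.normal(0.0, scale, size=n_samples)

    # Log-density of the auxiliary normal distribution q.
    log_q = -0.5 * (x / scale) ** 2 - np.log(scale * np.sqrt(2.0 * np.pi))

    f = features(x)                 # shape (n_samples, 2)
    log_w = f @ theta - log_q       # unnormalized log importance weights
    log_w -= log_w.max()            # stabilize before exponentiating
    w = np.exp(log_w)
    w /= w.sum()                    # self-normalize

    return w @ f                    # weighted average of the features


# With theta = (0, -1/2) the model is a standard normal, so the
# estimates should come out close to E[x] = 0 and E[x**2] = 1.
theta = np.array([0.0, -0.5])
est = estimate_expectations(theta)
```

On a space small enough to enumerate, the same expectations could be
computed exactly by summing over every point, which is why sampling is
only needed in the 'big' case.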

I'll give some more thought to the rest of your post and get back to you
in a few more days...


-- Ed



