[SciPy-user] maxentropy

Matthew Cooper m.cooper at computer.org
Tue Mar 21 15:53:04 EST 2006


Hi Ed,

I am playing around with the code on some more small examples and everything
has been fine.  The thing that will hold me back from testing on larger
datasets is the F matrix, which thus far requires the space of
(context,label) pairs to be enumerable.  I know that internally you are
using a sparse representation for this matrix.  Can I initialize the model
with a sparse matrix as well?  This would also require changes to the
indices_context parameter in the examples.
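For what it's worth, here is a rough sketch of how such a sparse F matrix
over enumerated (context,label) pairs could be assembled without ever
densifying it.  This uses only scipy.sparse directly, not the maxentropy
API; the toy contexts, labels, and feature functions are all made up for
illustration:

```python
import numpy as np
from scipy import sparse

# Hypothetical toy data: contexts are words, labels are POS tags.
contexts = ["the", "dog", "runs"]
labels = ["DET", "NOUN", "VERB"]

# Hypothetical binary feature functions f_i(w, x).
features = [
    lambda w, x: w == "the" and x == "DET",
    lambda w, x: w.endswith("s") and x == "VERB",
    lambda w, x: x == "NOUN",
]

# Build F in COO form: one row per feature, one column per enumerated
# (context, label) pair, storing only the nonzero entries.
rows, cols, vals = [], [], []
pairs = [(w, x) for w in contexts for x in labels]
for i, f in enumerate(features):
    for j, (w, x) in enumerate(pairs):
        if f(w, x):
            rows.append(i)
            cols.append(j)
            vals.append(1.0)

F = sparse.coo_matrix((vals, (rows, cols)),
                      shape=(len(features), len(pairs))).tocsr()
print(F.shape, F.nnz)
```

Only the nonzero feature values are ever stored, so the (context,label)
space can be large as long as the features are sparse.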

I see that you also have an unconditional bigmodel class that seems related,
but I'm not sure what would need to be changed.  For a conditional model,
computing the feature expectation under the current model still requires
knowledge of the training samples.  So what I think would make sense is to
use two sparse matrices.  One matrix needs to represent the training data:
our model is for q(x|w), but we still use the empirical p(w) as the prior
on the context when computing the feature expectations under the model, so
we don't need to consider the whole exponential space of possible
contexts.  This is shown in Malouf's paper in the equation for the
log-likelihood (2) and in the second equation of Sec. 2.1.  Each feature
then maps the training data to the corresponding feature output.  This
requires an N-vector per feature, so an N by (#features) sparse matrix
could be used for F.  Does this make sense?
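To make the bookkeeping concrete, here is a sketch of the expectation
computation I have in mind, with an N by (#features) sparse F over the
enumerated (context,label) pairs and the empirical prior p~(w) over
contexts.  All names, shapes, and data here are my own guesses for
illustration, not the actual conditionalmodel internals:

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

# Hypothetical sizes: W contexts, X labels per context, K features.
W, X, K = 4, 3, 5
N = W * X                       # (context, label) pairs, row-major by context

# Sparse N x K feature matrix: row n holds f(w, x) for pair n.
F = sparse.random(N, K, density=0.3, format="csr", random_state=0)

theta = rng.normal(size=K)      # current parameter vector

# Empirical prior p~(w) over contexts, from (made-up) training counts.
counts = np.array([10.0, 5.0, 3.0, 2.0])
p_w = counts / counts.sum()

# q(x|w) is proportional to exp(theta . f(w, x)): normalise each
# context's block of X scores separately.
scores = (F @ theta).reshape(W, X)
scores -= scores.max(axis=1, keepdims=True)   # numerical stability
q = np.exp(scores)
q /= q.sum(axis=1, keepdims=True)             # each row sums to 1

# E_q[f] = sum_w p~(w) sum_x q(x|w) f(w, x), done as one sparse matvec,
# so only the empirical contexts are ever visited.
weights = (p_w[:, None] * q).ravel()          # length-N pair weights
E_f = F.T @ weights                           # length-K expectation vector
print(E_f.shape)
```

The key point is that the sum over contexts runs only over those seen in
training, weighted by p~(w), rather than over the exponential space of all
possible contexts.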

I should be able to test on some standard datasets if we can figure out how
to handle the larger context spaces that come with larger text collections.

Matt

On 3/18/06, Ed Schofield <schofield at ftw.at> wrote:
>
>
> On 18/03/2006, at 1:31 AM, Matthew Cooper wrote:
>
> >
> > Hi Ed,
> >
> > Thanks again for working on this.  I can try and work on it a bit
> > this weekend.  I've had time to look over the two example scripts
> > you provided.  There seemed to be some difference between the two in
> > terms of the call to the conditionalmodel fit method.  In the low
> > level example, the count parameter seemed to provide the empirical
> > counts of the feature functions, where the features were simply
> > (context,label) co-occurrences.  In the high level example, the
> > features are more complicated, and the counts parameter seems to
> > have different dimensionality.  I'll try and get a working high
> > level example together next.
> >
>
> Hi Matt,
>
> I've now found and fixed some bugs in the conditional maxent code.
> The computation of the conditional expectations was wrong, and the
> p_tilde parameter was interpreted inconsistently.  Both the examples
> work now!  Fantastic!
>
> I'd be very grateful for any assistance you could give in providing
> more examples -- especially real examples from text classification.
> The two examples at the moment are too artificial and perhaps a bit
> confusing.  Or if you have any suggestions or patches for simplifying
> the interface (e.g. the constructor arguments) or any other
> improvements (e.g. bug fixes, better docs, or a tutorial) I'd also
> readily merge them.
>
> Let me know how you go with it.  When you're happy that it's all
> working, I'll merge it with the main SVN trunk.
>
> -- Ed
>
>