[scikit-learn] Using perplexity from LatentDirichletAllocation for cross validation of Topic Models

Markus Konrad markus.konrad at wzb.eu
Wed Oct 11 10:33:43 EDT 2017


Hi again,

> just a note that if you're using this for topic modelling, perplexity might
> not be a good choice of objective function. others have been proposed. see
> the diagnostic functions for MALLET topic modelling for instance.

Unfortunately, I can't find any of these methods implemented in Python,
and as they seem rather complicated, I don't think I can implement
them myself.
Since perplexity on held-out data is reported quite often in papers on
topic modeling, I wanted to use it for my own experiments. There are also
methods that don't rely on validation with held-out data (like Cao Juan
et al. 2009 or Arun et al. 2010), and I'm using them, but I'd still like
to compare those results with cross-validation of models with different
numbers of topics.
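
To make the setup concrete, here is a minimal sketch of the kind of
cross-validation I mean. It is an illustration only: the helper name
cv_perplexity, the random toy matrix, the candidate topic numbers and
settings such as max_iter=10 are all assumptions, not my actual
experiment.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.model_selection import KFold


def cv_perplexity(dtm, n_topics_list, n_splits=3, random_state=0):
    """Return {n_topics: mean held-out perplexity} over K folds.

    `dtm` is a document-term count matrix (documents in rows).
    This helper is hypothetical and not part of scikit-learn.
    """
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    results = {}
    for n_topics in n_topics_list:
        fold_scores = []
        for train_idx, test_idx in kfold.split(dtm):
            lda = LatentDirichletAllocation(
                n_components=n_topics,
                learning_method="batch",
                max_iter=10,  # assumed small value to keep the sketch fast
                random_state=random_state,
            )
            # Fit on the training documents only ...
            lda.fit(dtm[train_idx])
            # ... and evaluate perplexity on the held-out documents.
            fold_scores.append(lda.perplexity(dtm[test_idx]))
        results[n_topics] = float(np.mean(fold_scores))
    return results


# Toy data (assumption): random counts stand in for a real corpus.
rng = np.random.RandomState(0)
toy_dtm = rng.poisson(1.0, size=(60, 30))

scores = cv_perplexity(toy_dtm, n_topics_list=[2, 4, 6])
for n_topics, ppl in sorted(scores.items()):
    print(n_topics, ppl)
```

Lower held-out perplexity would indicate a better fit, so the number of
topics with the smallest mean value would be the candidate to compare
against the other metrics.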

Bye,
Markus

