I would like to know if anyone has good tutorials (fast and straightforward) on topic models and LDA that teach intuitively how to set the parameters and what they mean, if possible with some real examples.
5 Answers
If you're working in R, Carson Sievert's tutorial on using LDA to model topics in movie reviews is an excellent starting point:
http://cpsievert.github.io/LDAvis/reviews/reviews.html
This tutorial makes use of LDAvis, an interactive visualization of topic and word distributions that can really aid intuition.
Also, although not short, David M. Blei's lectures on topic models are a great resource for understanding the meaning behind the parameters: http://videolectures.net/mlss09uk_blei_tm/
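To make the parameters a little more concrete, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling in plain Python. It is not taken from either resource above; the function name `lda_gibbs`, the toy documents, and the default values are all made up for illustration. The three knobs are the number of topics, `alpha` (the Dirichlet prior on each document's topic mixture; smaller values push documents toward fewer topics), and `beta` (the Dirichlet prior on each topic's word distribution, which some libraries call eta; smaller values push topics toward fewer words).

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    # Count tables: topics per document, words per topic, tokens per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    topic_total = [0] * num_topics
    # Start from a random topic assignment for every token.
    assign = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # p(z = k | rest) is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta)
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + V * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    # Smoothed estimates of document-topic and topic-word distributions.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    phi = [
        {w: (tw[w] + beta) / (topic_total[k] + V * beta) for w in vocab}
        for k, tw in enumerate(topic_word)
    ]
    return theta, phi

# Hypothetical toy corpus: two documents about pets, two about finance.
docs = [
    "cat dog pet cat dog".split(),
    "dog pet cat pet".split(),
    "stock market bank stock".split(),
    "bank market stock bank".split(),
]
theta, phi = lda_gibbs(docs, num_topics=2)
```

Here `theta[d]` is the estimated topic mixture for document `d` and `phi[k]` is the estimated word distribution for topic `k`. Rerunning with a larger `alpha` or `beta` and comparing the results is a quick way to build intuition before moving to a real library.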
I highly recommend this tutorial: Getting Started with Topic Modeling and MALLET
Here are some additional links to help you get started...
Good introductory materials (including links to research papers): http://www.cs.princeton.edu/~blei/topicmodeling.html
Software:
- MALLET (Java): http://mallet.cs.umass.edu/topics.php
- topic modeling developer's guide: http://mallet.cs.umass.edu/topics-devel.php
- gensim (Python): http://radimrehurek.com/gensim/
- topicmodels (R): http://cran.r-project.org/web/packages/topicmodels/index.html
- Stanford Topic Modeling Toolbox (designed for use by social scientists): http://www-nlp.stanford.edu/software/tmt/tmt-0.4/
- Mr.LDA (scalable topic modeling using MapReduce): http://lintool.github.io/Mr.LDA/
- If you're working with massive amounts of input text, you might want to consider using Mr.LDA to build your topic models -- its MapReduce-based approach may scale better to large corpora.
Even more here on the Biased Estimates blog: Topic Models Reading List
If you are looking for something simple to start with and easy to implement, I would recommend this.
The CLARIN-D project has collected some good pointers to tutorials for topic modeling and LDA on the Teaching and Learning Materials Collection (TeLeMaCo) site hosted by the Universität des Saarlandes CLARIN centre.
I suggest trying Machine Learning Plus's Gensim tutorial. It gives a holistic overview of NLP and LDA, including how to pre-process your data, do feature engineering, and apply LDA.
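As a rough idea of what the pre-processing step looks like, here is a small standard-library sketch. It is not the tutorial's code and does not use Gensim; the helper names `preprocess` and `bag_of_words`, the tiny stopword list, and the sample sentences are assumptions made for illustration. The output is a list of (token id, count) pairs per document, similar in spirit to the bag-of-words format that topic modeling libraries usually expect.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; real projects use a much larger one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "it"}

def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stopwords and very short words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]

def bag_of_words(tokens, vocab):
    """Turn a token list into sorted (token id, count) pairs."""
    counts = Counter(tokens)
    return [(vocab[w], c) for w, c in sorted(counts.items()) if w in vocab]

raw_docs = [
    "The cat sat in the garden.",
    "A dog and a cat played in the garden!",
]
tokenized = [preprocess(d) for d in raw_docs]
# Map every distinct token to an integer id, in alphabetical order.
vocab = {w: i for i, w in enumerate(sorted({w for doc in tokenized for w in doc}))}
corpus = [bag_of_words(doc, vocab) for doc in tokenized]
```

Steps such as lemmatization or bigram detection would slot in between tokenizing and building the vocabulary.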