Questions tagged [document-term-matrix]

7 questions
4 votes, 0 answers

Bag of words: Prediction on new (out-of-sample) data

I'm working with a bag of words in R: library(tm); corpus = VCorpus(textsource); dtm = DocumentTermMatrix(corpus); dtm = as.matrix(dtm). I use the matrix dtm to train a lasso model. Now I want to predict on new (unseen) text. The problem is that I need…
Peter
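The usual obstacle here is that a new document-term matrix built from unseen text has different columns than the training matrix, so the fitted model cannot be applied to it directly. The fix is to reuse the training vocabulary as a fixed column order: terms never seen in training are dropped, and training terms absent from the new text get a count of 0. In tm this is commonly done by passing the training terms as a dictionary, e.g. DocumentTermMatrix(new_corpus, control = list(dictionary = Terms(dtm))). The sketch below shows the same idea in plain Python; the tokenizer and the function names build_vocabulary and to_dtm are hypothetical, illustrative choices.

```python
import re
from collections import Counter

def tokenize(text):
    # Simplifying assumption: lowercase and keep runs of letters only.
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(train_docs):
    # Fix the column order of the training matrix once, from training data only.
    return sorted({tok for doc in train_docs for tok in tokenize(doc)})

def to_dtm(docs, vocabulary):
    # Rows are documents; columns follow `vocabulary` exactly.
    # Unseen terms are dropped, missing training terms stay at 0.
    index = {term: j for j, term in enumerate(vocabulary)}
    rows = []
    for doc in docs:
        row = [0] * len(vocabulary)
        for tok, count in Counter(tokenize(doc)).items():
            j = index.get(tok)
            if j is not None:
                row[j] = count
        rows.append(row)
    return rows

train = ["the cat sat", "the dog barked"]
vocab = build_vocabulary(train)
X_new = to_dtm(["the cat chased the bird"], vocab)
print(vocab)  # ['barked', 'cat', 'dog', 'sat', 'the']
print(X_new)  # [[0, 1, 0, 0, 2]]
```

Because X_new has exactly the same columns as the training matrix, a model trained on the latter can score it; "chased" and "bird" are silently ignored.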
1 vote, 0 answers

Literature on query generation from a labelled document-term matrix

I have a labelled dataset of relevant and non-relevant documents, for which I built a boolean document-term matrix. I am trying to develop an algorithm that, given this input, would create a text-based boolean search rule which identifies a subset of…
Bakaburg
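As a point of reference for the question above, one simple heuristic (a greedy, set-cover-style approach, not a method taken from the literature being asked about) builds an OR rule term by term. At each step it adds the term that covers the most still-uncovered relevant documents, provided the documents containing that term meet a precision threshold. The function name greedy_or_rule and its min_precision and max_terms parameters are hypothetical.

```python
def greedy_or_rule(dtm, terms, labels, min_precision=0.8, max_terms=10):
    # dtm: one 0/1 row per document, one column per term in `terms`.
    # labels: 1 = relevant, 0 = non-relevant.
    uncovered = {i for i, y in enumerate(labels) if y == 1}
    rule = []
    for _ in range(max_terms):
        best, best_gain = None, 0
        for j, term in enumerate(terms):
            if term in rule:
                continue
            hits = [i for i, row in enumerate(dtm) if row[j]]
            if not hits:
                continue
            # Precision of the single-term query "term" on the labelled data.
            precision = sum(labels[i] for i in hits) / len(hits)
            # Gain = relevant documents this term would newly retrieve.
            gain = len(uncovered.intersection(hits))
            if precision >= min_precision and gain > best_gain:
                best, best_gain = j, gain
        if best is None:
            break  # no remaining term is both precise enough and useful
        rule.append(terms[best])
        uncovered -= {i for i, row in enumerate(dtm) if row[best]}
    return " OR ".join(rule)

terms = ["lasso", "regression", "football"]
dtm = [
    [1, 0, 0],  # relevant
    [0, 1, 0],  # relevant
    [1, 1, 0],  # relevant
    [0, 0, 1],  # non-relevant
    [0, 1, 1],  # non-relevant
]
labels = [1, 1, 1, 0, 0]
print(greedy_or_rule(dtm, terms, labels))                     # lasso
print(greedy_or_rule(dtm, terms, labels, min_precision=0.6))  # lasso OR regression
```

The threshold trades recall for precision: at 0.8, "regression" (precision 2/3) is rejected; at 0.6 it is accepted and picks up the remaining relevant document.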
1 vote, 2 answers

How to interpret k=1 when summing over k? Is there something wrong with this equation?

I am having a hard time understanding the following equation related to document clustering. Is there something wrong with it? $$\ln \left(\prod_{n=1}^N \sum_{k=1}^K p\left(x_n \mid z_n, k=1\right) p\left(z_n, k=1\right)\right)$$ In pseudocode I…
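One common reading, assuming the usual mixture-model notation in which each $z_n$ is a one-hot vector with components $z_{nk} \in \{0, 1\}$: the "$z_n, k=1$" inside the probabilities would then stand for $z_{nk} = 1$ (document $n$ belongs to cluster $k$), which is unrelated to the lower limit $k=1$ of the sum. Under that assumption, with mixing weights $\pi_k$ shared across documents and using $\ln \prod_n = \sum_n \ln$, the expression becomes:

$$\ln \left(\prod_{n=1}^N \sum_{k=1}^K p\left(x_n \mid z_{nk}=1\right) p\left(z_{nk}=1\right)\right) = \sum_{n=1}^N \ln \sum_{k=1}^K \pi_k \, p\left(x_n \mid z_{nk}=1\right), \qquad \pi_k = p\left(z_{nk}=1\right)$$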
1 vote, 2 answers

Topic Modeling - n-grams or 1,2,3,...n-grams?

Do people use only n-grams, or 1, 2, 3, ..., n-grams, in both matrix factorisation and generative models for topic modeling? I've been trying to understand the basics of topic modeling and learned that there are two approaches: Matrix Factorisation like LSA and…
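To make the distinction in the question concrete: "n-grams" can mean only the windows of exactly n tokens, while "1, 2, 3, ..., n-grams" means the union of all window sizes up to n. The sketch below builds both from a token list; the function names ngrams and ngrams_up_to are hypothetical. In scikit-learn, the same choice corresponds to CountVectorizer(ngram_range=(n, n)) versus CountVectorizer(ngram_range=(1, n)).

```python
def ngrams(tokens, n):
    # All contiguous windows of exactly n tokens, joined with spaces.
    # Returns [] when there are fewer than n tokens.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngrams_up_to(tokens, n):
    # Union of 1-grams, 2-grams, ..., n-grams, shortest first.
    return [g for k in range(1, n + 1) for g in ngrams(tokens, k)]

tokens = "topic models find themes".split()
print(ngrams(tokens, 2))
# ['topic models', 'models find', 'find themes']
print(ngrams_up_to(tokens, 2))
# ['topic', 'models', 'find', 'themes', 'topic models', 'models find', 'find themes']
```

Either list can then serve as the vocabulary of the document-term matrix fed to a factorisation or generative model; the second keeps the unigram signal while adding multi-word features.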
0 votes, 1 answer

Text mining of Amazon product reviews using R: I wasn't able to extract a particular product's reviews

Text mining on Amazon product reviews using R. I wasn't able to extract a particular product's reviews (i.e., if the iPhone 11 has 6k reviews, I need to extract all of them). I'm getting only one column, labelled x. Please let me know where I need to…
0 votes, 1 answer

Is it possible to classify documents of corpus using labels?

I have a corpus of 23,000 documents that need to be classified into 5 different categories. I do not have any labeled data available, just free-form text documents and labels (yes, one-word labels, not topics). So I followed a 2-step…
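When the only supervision is a list of one-word label names, one naive baseline (not the 2-step approach the question goes on to describe) is to count how often each label word occurs in a document and assign the most frequent one, leaving documents that mention no label unassigned. Such keyword matches could, under that assumption, serve as rough seed labels for training a proper classifier. The function name label_by_keyword and the example documents and labels are hypothetical.

```python
import re
from collections import Counter

def label_by_keyword(docs, labels):
    # Score each label by how often its word appears in the document;
    # ties go to the label listed first, and zero matches yields None.
    assigned = []
    for doc in docs:
        counts = Counter(re.findall(r"[a-z]+", doc.lower()))
        scores = {label: counts[label.lower()] for label in labels}
        best = max(labels, key=lambda lab: scores[lab])
        assigned.append(best if scores[best] > 0 else None)
    return assigned

docs = ["Invoice overdue, please pay the invoice", "Shipping delayed again", "Hello there"]
labels = ["invoice", "shipping", "refund"]
print(label_by_keyword(docs, labels))  # ['invoice', 'shipping', None]
```

Exact word matching misses synonyms and inflections ("shipped", "bill"), which is one reason a second, learned step is usually needed on top of it.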
0 votes, 2 answers

Building embeddings for phrases from scratch

I have a dataset with many phrases that I would like to embed from scratch. I don't want to use the cosine of the words to get a phrase embedding, because the phrases may appear in a different environment and I want to embed the two…
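One common way to give a phrase its own embedding, rather than composing it from its words, is to merge every occurrence of the phrase into a single token before training a word-embedding model such as word2vec. The phrase then receives one vector learned from its own contexts. The sketch below covers only that preprocessing step, assuming the phrase list is already known; the function name merge_phrases is hypothetical, and gensim's Phrases class is one existing tool for detecting such phrases automatically.

```python
def merge_phrases(tokens, phrases):
    # Replace each occurrence of a known multi-word phrase with one
    # underscore-joined token. Longer phrases are tried first, so
    # "new york times" wins over "new york" at the same position.
    phrase_tuples = sorted((tuple(p.split()) for p in phrases), key=len, reverse=True)
    out, i = [], 0
    while i < len(tokens):
        for p in phrase_tuples:
            if tuple(tokens[i:i + len(p)]) == p:
                out.append("_".join(p))
                i += len(p)
                break
        else:
            # No phrase starts here: keep the single token.
            out.append(tokens[i])
            i += 1
    return out

sentence = "the new york times reported on new york city".split()
print(merge_phrases(sentence, ["new york", "new york times"]))
# ['the', 'new_york_times', 'reported', 'on', 'new_york', 'city']
```

After this step, "new_york_times" and "new_york" are distinct vocabulary entries, so their vectors can diverge according to the environments each one actually appears in.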