Questions tagged [document-term-matrix]
7 questions
4
votes
0 answers
Bag of words: Prediction on new (out-of-sample) data
I'm working with a bag of words in R:
library(tm)
corpus = VCorpus(textsource)
dtm = DocumentTermMatrix(corpus)
dtm = as.matrix(dtm)
I use the matrix dtm to train a lasso model.
Now I want to predict on new (unseen) text. The problem is that I need…
Peter
- 7,896
- 5
- 23
- 50
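A requirement that often comes up in this setting (it may or may not be what the truncated excerpt goes on to describe) is that the matrix for new text must have exactly the same columns as the training matrix. The sketch below illustrates the idea in plain Python rather than R: the vocabulary is fixed from the training documents, and new documents are counted against it, so unseen terms are dropped. The function names `build_vocabulary` and `to_dtm` and the toy documents are hypothetical. In tm, the `dictionary` entry of the `control` list passed to `DocumentTermMatrix` serves a similar purpose.

```python
from collections import Counter


def build_vocabulary(train_docs):
    """Return the sorted list of terms seen in the training documents."""
    return sorted({term for doc in train_docs for term in doc.lower().split()})


def to_dtm(docs, vocabulary):
    """Build one row of term counts per document, restricted to `vocabulary`."""
    rows = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        # Terms outside the vocabulary are ignored; missing terms count as 0.
        rows.append([counts[term] for term in vocabulary])
    return rows


# Toy data, invented for illustration only.
train_docs = ["the cat sat", "the dog barked"]
new_docs = ["the cat barked loudly"]

vocabulary = build_vocabulary(train_docs)
train_dtm = to_dtm(train_docs, vocabulary)
new_dtm = to_dtm(new_docs, vocabulary)  # same columns as train_dtm
```

Because both matrices share the column order given by `vocabulary`, a model fitted on `train_dtm` can be applied to `new_dtm` without realigning features.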
1
vote
0 answers
Literature on query generation from a labelled document term matrix
I have a labelled dataset of relevant and non-relevant documents, for which I built a boolean document term matrix.
I am trying to develop an algorithm that, given this input, would create a text-based boolean search rule which identifies a subset of…
Bakaburg
- 195
- 5
1
vote
2 answers
How to interpret k=1 when summing over k? Is there something wrong with this equation?
I am having a hard time understanding the following equation to do with document clustering.
Is there something wrong with it?
$$\ln \left(\prod_{n=1}^N \sum_{k=1}^K p\left(x_n \mid z_n, k=1\right) p\left(z_n, k=1\right)\right)$$
In pseudo code I…
Kirsten
- 67
- 7
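For comparison, a mixture-model log-likelihood is commonly written with the sum running over the values of the latent assignment. This assumes that $z_n \in \{1, \dots, K\}$ denotes the cluster of document $x_n$; the excerpt does not define its symbols, so that reading is an assumption:

$$\ln \left(\prod_{n=1}^N \sum_{k=1}^K p\left(x_n \mid z_n = k\right) p\left(z_n = k\right)\right) = \sum_{n=1}^N \ln \sum_{k=1}^K p\left(x_n \mid z_n = k\right) p\left(z_n = k\right)$$

In this form $k$ is the summation index, so it is not fixed at 1 inside the sum.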
1
vote
2 answers
Topic Modeling - n-grams or 1,2,3,...n-grams?
Do people use n-grams or 1,2,3,...n-grams in both matrix factorisation and generative models in Topic Modeling?
I've been trying to understand the basics of Topic Modeling and came to know that there are two ways - Matrix Factorisation like LSA and…
rahuladwani
- 11
- 1
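The distinction in the question title can be made concrete with a small sketch: "n-grams" taken as terms of exactly length n, versus "1,2,3,...n-grams" taken as all terms of length 1 up to n. The helper names `ngrams` and `ngram_range` and the sample sentence are hypothetical, and whitespace tokenisation is assumed for simplicity.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of exactly length n, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_range(tokens, max_n):
    """Return all contiguous n-grams of length 1 through max_n."""
    terms = []
    for n in range(1, max_n + 1):
        terms.extend(ngrams(tokens, n))
    return terms


# Toy sentence, invented for illustration only.
tokens = "topic models find themes".split()

only_bigrams = ngrams(tokens, 2)        # terms of length exactly 2
up_to_bigrams = ngram_range(tokens, 2)  # unigrams plus bigrams
```

Either list can serve as the term set for the columns of a document-term matrix; the second keeps the single words alongside the longer phrases.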
0
votes
1 answer
Text mining on Amazon product reviews using R. I wasn't able to extract a particular product's reviews
Text mining on Amazon product reviews using R. I wasn't able to extract a particular product's reviews (i.e., if the iPhone 11 has 6k reviews, I need to extract all of them). I'm getting only one column labelled x.
Please let me know where I need to…
0
votes
1 answer
Is it possible to classify the documents of a corpus using labels?
I have a corpus of 23000 documents that need to be classified into 5 different categories. I do not have any labeled data available to me, just freeform text documents and labels (yes, one-word labels, not topics).
So I followed a 2-step…
0
votes
2 answers
Building embeddings for phrases from scratch
I have a dataset with many phrases that I would like to embed from scratch. I don't want the cosine of the words in order to get a phrase embedding, because the phrases may appear in a different environment and I want to embed the two…
Christina Valavani
- 27
- 2