For questions about text classification, the task of assigning predefined categories (or classes) to free-text documents.
Questions tagged [text-classification]
273 questions
77 votes, 4 answers
What is the purpose of the [CLS] token and why is its encoding output important?
I am reading this article on how to use BERT by Jay Alammar and I understand things up until:
For sentence classification, we’re only interested in BERT’s output for the [CLS] token, so we select that slice of the cube and discard everything…
user3768495
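A minimal sketch of what "selecting that slice of the cube" can look like, assuming the model output is arranged as batch × tokens × hidden size and that [CLS] sits at token position 0. The numbers are made up for illustration and `select_cls` is a hypothetical helper, not part of any library.

```python
# Toy stand-in for a BERT output "cube": batch x tokens x hidden size.
# Values are invented; a real model would produce them.
hidden_states = [
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]],  # sentence 1
    [[1.1, 1.2, 1.3], [1.4, 1.5, 1.6], [1.7, 1.8, 1.9]],  # sentence 2
]


def select_cls(batch):
    """Keep only the vector at position 0 ([CLS]) of every sequence."""
    return [sequence[0] for sequence in batch]


# One vector per sentence; the other token vectors are discarded.
cls_vectors = select_cls(hidden_states)
```

With an array library the same selection is usually written as a slice such as `hidden_states[:, 0, :]`; the resulting vectors are then passed to a classifier.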
9 votes, 2 answers
Effect of Stop-Word Removal on Transformers for Text Classification
The domain here is essentially topic classification, so not necessarily a problem where stop-words have an impact on the analysis (as opposed to, say, sentiment analysis where structure can affect meaning).
With respect to the positional encoding…
Andy
6 votes, 1 answer
Using trainable=True in Keras Embedding obtained better performance
It is suggested by the author of Keras [1] to use trainable=False when using the embedding layer in Keras to prevent the weights from being updated during training. But in my experience, I always got better performance (lower error in regression)…
sugab
6 votes, 1 answer
How to use the NDCG metric for binary relevance
I am working on a ranking problem to predict the single right document for a user query, and I use the NDCG metric to evaluate the model.
Given the details:
Queries (Q), result documents (D), relevance score.
But the relevance score is a…
kannandreams
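A minimal sketch of NDCG with binary relevance labels, assuming the common definition DCG = Σ rel_i / log2(i + 1) over 1-based ranks. The function names and the optional cutoff `k` are illustrative choices, not taken from the question.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain for relevances listed in ranked order."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances, k=None):
    """NDCG for one query; relevances are 0/1 in the order the model ranked them."""
    ranked = relevances if k is None else relevances[:k]
    ideal = sorted(relevances, reverse=True)
    ideal = ideal if k is None else ideal[:k]
    ideal_dcg = dcg(ideal)
    # No relevant document at all: define the score as 0 to avoid dividing by zero.
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Single relevant document ranked first, then second.
top_hit = ndcg([1, 0, 0])
second_hit = ndcg([0, 1, 0])
```

When exactly one document is relevant, the ideal DCG is 1, so the score reduces to 1 / log2(r + 1), where r is the rank at which the relevant document appears.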
6 votes, 2 answers
What are the exact differences between Word Embedding and Word Vectorization?
I am learning NLP. I have tried to figure out the exact difference between Word Embedding and Word Vectorization. However, it seems that some articles use these terms interchangeably. But I think there must be some difference.
In…
Nahid
5 votes, 1 answer
How to preprocess a big dataset with NLP for text classification
TL;DR
I've never done NLP before and I feel like I'm not doing it the right way. I'd like to know whether I've been doing things wrong from the beginning or whether there's still hope of fixing the problems mentioned later.
Some basic info
I'm trying…
gabriel garcia
5 votes, 1 answer
Text classification based on n-grams and similarity
I have tried to cluster a hundred texts using k-means clustering. I would like to consider other algorithms to group texts based on their content and to spot news items unrelated to the others (different topic).
I would like to know if there is some…
Val
4 votes, 1 answer
How to include categorical fields to enhance text classification
I have a question about how to add more categorical fields to a classification problem.
My dataset initially had 4 fields:
Date Text Short_Mex Username Label
01/01/2020 I…
Math
4 votes, 0 answers
Bag of words: Prediction on new (out-of-sample) data
I'm working with a bag of words in R:
library(tm)
corpus = VCorpus(textsource)
dtm = DocumentTermMatrix(corpus)
dtm = as.matrix(dtm)
I use the matrix dtm to train a lasso model.
Now I want to predict new (unseen) text. The problem is that I need…
Peter
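The excerpt is truncated, so the exact obstacle is not stated. One common approach to out-of-sample prediction with a bag of words is to fix the vocabulary on the training texts and map new texts onto the same columns, ignoring unseen terms. The sketch below shows that idea in Python rather than R; `fit_vocabulary` and `transform` are hypothetical helpers, and whitespace tokenisation is a simplifying assumption.

```python
from collections import Counter


def fit_vocabulary(train_docs):
    """Build a term -> column index mapping from the training texts only."""
    terms = sorted({tok for doc in train_docs for tok in doc.lower().split()})
    return {term: idx for idx, term in enumerate(terms)}


def transform(docs, vocabulary):
    """Count terms per document using the fixed training vocabulary."""
    rows = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        row = [0] * len(vocabulary)
        for term, n in counts.items():
            idx = vocabulary.get(term)
            if idx is not None:  # terms unseen during training are dropped
                row[idx] = n
        rows.append(row)
    return rows


vocabulary = fit_vocabulary(["good movie", "bad movie"])
train_matrix = transform(["good movie", "bad movie"], vocabulary)
new_matrix = transform(["good good film"], vocabulary)
```

Because `new_matrix` has the same columns in the same order as `train_matrix`, a model fitted on the training matrix can score it directly.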
3 votes, 2 answers
Over-sampling: is my model over-fitting?
I would like to ask how to judge (good or not) the following results:
OVER-SAMPLING
precision recall f1-score support
0.0 1.00 0.85 0.92 873
1.0 0.87 …
V_sqrt
3 votes, 1 answer
Predictive output with a self-built model
I need to better understand how to build a machine learning algorithm from scratch using my own model based on boolean values (for example, number of words in a text, number of punctuation marks, number of capital letters, and so on) to determine if…
LdM
3 votes, 2 answers
Is there any way to plot a ROC curve for an ensemble hard voting classifier?
I am working on a multi-class text classification problem and using ensemble learning. I chose hard voting as the ensemble technique. I tried to plot a ROC curve for my ensemble method, but it didn't work, showing the…
Muneeb
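A ROC curve needs a continuous score per sample, while hard voting only yields a label. One possible workaround, sketched here under that assumption, is to use the fraction of base models voting for each class as the score. `vote_shares` is a hypothetical helper and the predictions are invented for illustration.

```python
from collections import Counter


def vote_shares(model_predictions, classes):
    """Turn per-model hard labels into per-class vote fractions.

    model_predictions: one list of predicted labels per model,
    all of the same length (one label per sample).
    """
    n_models = len(model_predictions)
    shares = []
    for sample_votes in zip(*model_predictions):
        counts = Counter(sample_votes)
        shares.append([counts[c] / n_models for c in classes])
    return shares


# Hypothetical predictions from three base classifiers on four samples.
preds = [
    ["sport", "tech", "tech", "news"],
    ["sport", "tech", "news", "news"],
    ["tech", "tech", "news", "sport"],
]
classes = ["news", "sport", "tech"]
scores = vote_shares(preds, classes)
```

Each column of `scores` can then serve as the score for a one-vs-rest ROC curve of that class, for example with `sklearn.metrics.roc_curve`. With only a few base models the curve has very few distinct thresholds, so soft voting over predicted probabilities may give a more informative plot.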
3 votes, 1 answer
Use a genetic algorithm for feature selection in text classification
How can I apply a genetic algorithm as a feature selection method for text classification in Python?
I need to use a GA to select the most relevant features in text classification.
Ahmed
3 votes, 1 answer
Overfitting with text classification using Transformers
I am trying to build a binary text classification model using the encoder part of the transformer and feeding its output into an LSTM network. However, I am not able to achieve good accuracy on both the training set (92%) and the…
Khobaib Alam
3 votes, 1 answer
Text vectorizer that captures feature offsets in the text?
I'm using sklearn's TfidfVectorizer to extract features from text for text classification.
I believe the information I need tends to be at the beginning of the document, so I would like to somehow capture the offset of each feature per document…
R Sorek