Highest Voted 'text-classification' Questions

77

votes

4 answers

What is the purpose of the [CLS] token and why is its encoding output important?

I am reading this article on how to use BERT by Jay Alammar and I understand things up until: For sentence classification, we’re only interested in BERT’s output for the [CLS] token, so we select that slice of the cube and discard everything…

asked Jan 09 '20 at 17:20

user3768495
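
A minimal sketch of the "slice of the cube" mentioned in the excerpt, assuming the encoder output is laid out as sentences x tokens x hidden units with [CLS] at position 0 of every sequence. The toy values and the helper name `cls_slice` are illustrative assumptions, not taken from the article; with NumPy or PyTorch the same selection is usually written `output[:, 0, :]`.

```python
def cls_slice(hidden_states):
    """Keep only the first-token ([CLS]) vector of each sequence."""
    return [sequence[0] for sequence in hidden_states]


# Toy "cube": 2 sentences x 3 tokens x 4 hidden units (made-up numbers).
hidden_states = [
    [[0.1, 0.2, 0.3, 0.4], [1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]],
    [[0.5, 0.6, 0.7, 0.8], [3.0, 3.0, 3.0, 3.0], [4.0, 4.0, 4.0, 4.0]],
]

# Result is 2 sentences x 4 hidden units: one vector per sentence,
# which is what a downstream classifier would consume.
cls_vectors = cls_slice(hidden_states)
```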

9

votes

2 answers

Effect of Stop-Word Removal on Transformers for Text Classification

The domain here is essentially topic classification, so not necessarily a problem where stop-words have an impact on the analysis (as opposed to, say, sentiment analysis where structure can affect meaning). With respect to the positional encoding…

nlp preprocessing transfer-learning transformer text-classification

asked Dec 03 '20 at 20:24

Andy

6

votes

1 answer

Using trainable=True in Keras Embedding obtained better performance

It is suggested by the author of Keras [1] to use trainable=False when using the embedding layer in Keras to prevent the weights from being updated during training. But in my experience, I always got better performance (lower error in regression)…

keras word-embeddings sentiment-analysis text-classification

asked Feb 10 '20 at 08:10

sugab

6

votes

1 answer

How to use the NDCG metric for binary relevance

I am working on a ranking problem to predict the single right document for a user query, and I use the NDCG metric to measure the model. Given the details: Queries (Q), Result Documents (D), Relevance score. But the relevance score is a…

machine-learning recommender-system ranking learning-to-rank text-classification

asked Jan 10 '20 at 20:03

kannandreams
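
A minimal sketch of NDCG with binary relevance, assuming each query's results are given as a list of 0/1 labels in ranked order (the excerpt is truncated, so the binary labels follow the question title). The function names `dcg_at_k` and `ndcg_at_k` are illustrative, not from the question. With exactly one relevant document, the ideal DCG is 1, so NDCG reduces to 1 / log2(rank + 1) for the rank at which that document appears.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k ranked items."""
    # Position 0 is rank 1, so the discount is log2(rank + 1) = log2(pos + 2).
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalised by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# One relevant document per query (binary labels, made-up rankings).
top_hit = ndcg_at_k([1, 0, 0, 0], k=4)    # relevant document ranked first
third_hit = ndcg_at_k([0, 0, 1, 0], k=4)  # relevant document ranked third
```

Under this definition a query with no relevant document in the list scores 0.0 rather than raising a division error; whether such queries should be skipped instead is a design choice.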

6

votes

2 answers

What are the exact differences between Word Embedding and Word Vectorization?

I am learning NLP. I have tried to figure out the exact difference between Word Embedding and Word Vectorization. However, it seems that some articles use these terms interchangeably. But I think there must be some difference. In…

nlp word-embeddings word2vec text-classification tfidf

asked Mar 13 '22 at 17:20

Nahid

5

votes

1 answer

How to preprocess a big dataset with NLP for text classification

TL;DR I've never done NLP before and I feel like I'm not doing it the right way. I'd like to know whether I've been doing things wrong from the beginning or whether there's still hope to fix the problems mentioned later. Some basic info I'm trying…

python nlp text-classification

asked Feb 01 '21 at 20:38

gabriel garcia

5

votes

1 answer

Text classification based on n-grams and similarity

I have tried to cluster a hundred texts using k-means clustering. I would like to consider other algorithms to group texts based on their content and to spot news items that are unrelated to the others (different topic). I would like to know if there is some…

python clustering data-science-model similarity text-classification

asked May 21 '20 at 07:57

Val

4

votes

1 answer

How to include categorical fields to enhance text classification

I have a question about how to add more categorical fields to a classification problem. My dataset initially had 4 fields: Date Text Short_Mex Username Label 01/01/2020 I…

python logistic-regression supervised-learning text-classification

asked Aug 30 '20 at 12:30

Math

4

votes

0 answers

Bag of words: Prediction on new (out-of-sample) data

I'm working with a bag of words in R: library(tm); corpus = VCorpus(textsource); dtm = DocumentTermMatrix(corpus); dtm = as.matrix(dtm). I use the matrix dtm to train a lasso model. Now I want to predict new (unseen) text. The problem is that I need…

r text-classification bag-of-words document-term-matrix

asked Jun 28 '20 at 11:24

Peter
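
A minimal sketch of one common way to handle out-of-sample text in a bag-of-words setup: fix the vocabulary on the training documents, then map each new document onto those same columns and drop unseen terms. The excerpt is truncated, so this only assumes that the new data must match the training matrix's columns. The question uses R and tm; this sketch is in Python, and the whitespace tokenisation, toy documents, and function names (`fit_vocabulary`, `to_count_vector`) are all illustrative assumptions.

```python
from collections import Counter


def fit_vocabulary(train_docs):
    """Map every training term to a fixed column index."""
    terms = sorted({tok for doc in train_docs for tok in doc.lower().split()})
    return {term: idx for idx, term in enumerate(terms)}


def to_count_vector(doc, vocabulary):
    """Count terms of a document using only the training columns."""
    row = [0] * len(vocabulary)
    for term, count in Counter(doc.lower().split()).items():
        if term in vocabulary:  # terms unseen during training are dropped
            row[vocabulary[term]] = count
    return row


train_docs = ["the cat sat", "the dog barked"]
vocabulary = fit_vocabulary(train_docs)

# The new row has the same width and column order as the training matrix.
new_row = to_count_vector("the cat meowed loudly", vocabulary)
```

Because the row keeps the training column order, it can be passed to a model fitted on the original matrix.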

3

votes

2 answers

Over-sampling: is my model over-fitting?

I would like to ask how to judge (good or not) the following results: OVER-SAMPLING precision recall f1-score support 0.0 1.00 0.85 0.92 873 1.0 0.87 …

machine-learning overfitting sampling text-classification

asked Nov 30 '20 at 04:43

V_sqrt

3

votes

1 answer

Predictive output with a self-built model

I need to better understand how a machine learning algorithm can be created from scratch, using my own model based on boolean values, for example # of words in a text, # of punctuation, # of capital letters, and so on, to determine if…

machine-learning python classification predictive-modeling text-classification

asked Oct 08 '20 at 14:01

LdM

3

votes

2 answers

Is there any way to plot a ROC curve for an ensemble hard voting classifier?

I am working on a multi-class text classification problem and using ensemble learning for text classification. I chose hard voting as the ensemble technique. I tried to plot a ROC curve for my ensemble method but it didn't work, showing the…

machine-learning ensemble-modeling text-classification

asked Jul 07 '20 at 17:56

Muneeb

3

votes

1 answer

Use a genetic algorithm for feature selection in text classification

How can I apply a genetic algorithm for feature selection in text classification in Python? I need to use a GA to select the most relevant features in text classification.

feature-selection text-classification genetic-algorithms

asked Jul 05 '20 at 18:07

Ahmed

3

votes

1 answer

Overfitting with text classification using Transformers

I am trying to make a binary text classification model by using the encoder part of the transformer and then using its output to feed into an LSTM network. However, I am not able to achieve good accuracy on both the training set (92%) and the…

classification nlp transformer text-classification

asked Apr 23 '20 at 12:43

Khobaib Alam

3

votes

1 answer

Text vectorizer that captures feature offsets in the text?

I'm using sklearn's TfidfVectorizer to extract features from text for text classification. I believe the information I need tends to be in the beginning of the document, so I would like to somehow capture the offset of each feature per document…

scikit-learn feature-extraction text tfidf text-classification

asked Mar 19 '20 at 14:39

R Sorek
