Questions tagged [classifier]

75 questions
18
votes
3 answers

When should I use StandardScaler and when MinMaxScaler?

I have a feature vector with one-hot-encoded features and continuous features. How can I decide which data to scale with StandardScaler and which with MinMaxScaler? I think I do not have to scale the one-hot-encoded features anyway…
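
A minimal sketch of the mechanics behind this question, assuming scikit-learn is available: `ColumnTransformer` can apply a different scaler to each group of columns and pass the one-hot columns through unchanged. The column layout and the toy data are hypothetical, and which scaler suits which continuous feature still depends on the model and the feature's distribution.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical layout: columns 0 and 1 are continuous, columns 2 and 3 are one-hot.
X = np.array([
    [1.0, 200.0, 1, 0],
    [2.0, 400.0, 0, 1],
    [3.0, 600.0, 1, 0],
])

preprocess = ColumnTransformer(
    transformers=[
        ("standard", StandardScaler(), [0]),  # zero mean, unit variance
        ("minmax", MinMaxScaler(), [1]),      # rescaled to [0, 1]
    ],
    remainder="passthrough",  # one-hot columns are left untouched
)

# Output column order: standardized column, min-max column, then the passthrough columns.
X_scaled = preprocess.fit_transform(X)
```

Fitting the transformer on the training split only, and reusing it on the test split, keeps test statistics from leaking into the scaling.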
16
votes
1 answer

Train Accuracy vs Test Accuracy vs Confusion matrix

After I developed my predictive model using Random Forest I get the following metrics: Train Accuracy :: 0.9764634601043997 Test Accuracy :: 0.7933284397683713 Confusion matrix [[28292 1474] …
15
votes
2 answers

How do I get the feature importance for an MLPClassifier?

I use the MLPClassifier from scikit-learn. I have about 20 features. Is there a scikit-learn method to get the feature importance? I found clf.feature_importances_ but it seems that it only exists for decision trees.
jochen6677
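
One model-agnostic option, sketched here under the assumption that scikit-learn's `sklearn.inspection.permutation_importance` fits the use case: shuffle one feature at a time on held-out data and measure how much the score drops. The synthetic dataset and the network size are placeholders, not the asker's setup.

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Placeholder data with 20 features, standing in for the real dataset.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)

# Mean drop in accuracy when each feature is shuffled, measured on held-out data.
result = permutation_importance(clf, X_test, y_test, n_repeats=5, random_state=0)

# Feature indices ordered from most to least important.
ranking = result.importances_mean.argsort()[::-1]
```

Strongly correlated features can share importance under this method, so low scores do not always mean a feature is useless.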
6
votes
1 answer

XGBoost skews towards minority class

I have a dataset with 85k positive labels and 53k negative labels. For this use-case, I am trying to maximize my efforts to the negative class (accurately identify true negatives, and minimize false negatives). Currently, I am able to train a…
6
votes
3 answers

Classifier that optimizes performance on only a subset of the data?

I'm working on a machine learning problem where I'm only interested in getting high accuracy within a narrow band of my predicted likelihoods. Specifically, I want an algorithm that will score very accurately when it predicts a likelihood above a…
5
votes
3 answers

How to handle "unknown" category in machine learning classification problems?

Tutorial problems come in the form of binary or multi-class classification where the data are all properly labelled. In real-life applications, there are incoming data that do not belong to any category and cannot be classified. How can we handle these…
user781486
4
votes
1 answer

Selecting a boundary on a binary classifier to optimal precision and recall

I have a logistic regression classifier that shows differing levels of performance for precision and recall at different probability boundaries as follows: The default threshold for the classifier to decide which class something belongs to is 0.5.…
Sandy Lee
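
To make the trade-off in this question concrete, here is a small pure-Python sketch that sweeps candidate thresholds and reports precision and recall at each. Picking the threshold with the best F1 is an assumption made for illustration; the labels, probabilities, and function names are hypothetical.

```python
def precision_recall_at(y_true, y_prob, threshold):
    """Precision and recall when predicting class 1 for prob >= threshold."""
    pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def best_f1_threshold(y_true, y_prob, thresholds):
    """Return (threshold, f1) for the first threshold with the highest F1."""
    best_threshold, best_f1 = None, -1.0
    for t in thresholds:
        p, r = precision_recall_at(y_true, y_prob, t)
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        if f1 > best_f1:
            best_threshold, best_f1 = t, f1
    return best_threshold, best_f1


# Hypothetical validation labels and predicted probabilities.
y_true = [0, 0, 1, 0, 1, 1]
y_prob = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]

thresholds = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
threshold, f1 = best_f1_threshold(y_true, y_prob, thresholds)
```

If false positives and false negatives have different costs, a cost-weighted criterion would replace F1 in the selection step.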
4
votes
2 answers

What is a discrimination threshold of binary classifier?

With respect to ROC can anyone please tell me what the phrase "discrimination threshold of binary classifier system" means? I know what a binary classifier is.
girl101
4
votes
1 answer

Distinguishing features of linear vs. non-linear machine learning models (algorithms)

What are some examples of linear and non-linear machine learning models (algorithms) for purposes of comparison between the two categories? Which are the parameters (or scalars in a linear algebraic sense) and which are the predictors/factors (or…
4
votes
1 answer

One hot encoding of target space

I had a face-to-face interview for a data scientist job a few days ago. One of the questions I was asked was: in the case of a classifier predicting the brand of TV from some features (price, size, specs, ...) out of 4 possible brands, how do you…
Learning is a mess
4
votes
1 answer

How should I calculate AUROC if my (TPR,FPR) doesn't go till (1,1)? Should it be area just under the curve or should I include 1 and calculate?

I am running a model where it generates song detections with a confidence value. I then validate it across an annotated dataset. I then plot the values of TPR and FPR at each confidence threshold, starting with 0 till 1 with a stepping of 0.01. This…
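
The two options in the title can be compared numerically. The sketch below uses the trapezoidal rule on hypothetical (FPR, TPR) points that stop short of (1, 1), once as-is and once with (1, 1) appended; the point values and the helper name are made up for illustration.

```python
def trapezoid_area(points):
    """Area under a piecewise-linear curve through (x, y) points."""
    pts = sorted(points)  # order by FPR, then TPR
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(pts, pts[1:])
    )


# Hypothetical (FPR, TPR) pairs; the curve ends at FPR = 0.6.
roc_points = [(0.0, 0.0), (0.1, 0.5), (0.3, 0.8), (0.6, 0.9)]

partial_area = trapezoid_area(roc_points)                # area under the observed curve only
closed_area = trapezoid_area(roc_points + [(1.0, 1.0)])  # curve extended to (1, 1)
```

With these numbers the partial area is about 0.41 and the extended one about 0.79, so the two conventions are not interchangeable and the one used should be stated alongside the score.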
3
votes
1 answer

What do these points mean in Naive Bayes?

I have two conceptual questions about Naïve Bayes. Naïve Bayes is robust to irrelevant features. What does this mean? Can anyone give an example of how the irrelevant features cancel out, and what the irrelevant features are? It is…
3
votes
3 answers

Image Classification on non real images

I was wondering how image classifier networks perform on images that are not photographs. For example, if you were to feed a drawing of a car or a face to an image classifier that was only trained on photos would the network still be able to…
2
votes
1 answer

Question about reshaping array size for KNN Classifiers

I keep trying to run a new set of data through my KNN Classifier but receive the message: ValueError: query data dimension must match training data dimension It then used: x_new = pd.read_csv('NewFeaturePractice.csv' , names = attributes) …
2
votes
1 answer

Attitude to text mining and preparing tokens, irrelevant words, low accuracy

For the purpose of quite a big project, I am doing text mining on some documents. My steps are quite common: all to lower case, tokenization, stop list and stop words, lemmatization, stemming, and some other steps like removing symbols. Then I prepare bag of…