Questions tagged [binary]

104 questions
63 votes, 11 answers

How to deal with version control of large amounts of (binary) data

I am a PhD student in geophysics and work with large amounts of image data (hundreds of GB, tens of thousands of files). I know svn and git fairly well and have come to value a project history, combined with the ability to easily work together and have…
Johann
38 votes, 5 answers

Best practices to store Python machine learning models

What are the best practices to save, store, and share machine learning models? In Python, we generally store the binary representation of the model, using pickle or joblib. Models, in my case, can be ~100 MB large. Also, joblib can save one model to…
Antoine Dusséaux
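As a minimal sketch of the pickle approach the question describes, the helpers below (hypothetical names `save_model` and `load_model`) bundle the model with metadata such as library versions and record a SHA-256 checksum of the stored bytes. A plain dict stands in for a real fitted model; any picklable estimator would be stored the same way, and `joblib.dump`/`joblib.load` play the same role for joblib users.

```python
import hashlib
import os
import pickle
import tempfile


def save_model(model, path, metadata):
    """Pickle a model together with metadata describing how it was produced."""
    payload = pickle.dumps({"model": model, "metadata": metadata})
    with open(path, "wb") as fh:
        fh.write(payload)
    # A checksum lets a later reader confirm the artifact has not changed.
    return hashlib.sha256(payload).hexdigest()


def load_model(path, expected_sha256=None):
    """Load a pickled bundle. Never unpickle files from untrusted sources."""
    with open(path, "rb") as fh:
        payload = fh.read()
    if expected_sha256 is not None:
        if hashlib.sha256(payload).hexdigest() != expected_sha256:
            raise ValueError("checksum mismatch: file changed since it was saved")
    bundle = pickle.loads(payload)
    return bundle["model"], bundle["metadata"]


# Usage with a plain dict standing in for a fitted model (hypothetical values).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    model = {"coef": [0.5, -1.0], "intercept": 0.1}
    digest = save_model(model, path, {"library": "example", "version": "1.0"})
    restored, meta = load_model(path, expected_sha256=digest)
```

Storing the metadata next to the pickled object matters because unpickling generally expects compatible versions of the libraries the model was built with.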
30 votes, 2 answers

How to interpret classification report of scikit-learn?

As you can see, it is about a binary classification with LinearSVC. Class 1 has a higher precision than class 0 (+7%), but class 0 has a higher recall than class 1 (+11%). How would you interpret this? And two other questions: what does…
user77241
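Per-class precision and recall come from the same confusion counts, read along different axes: precision asks how many predictions of a class were right, recall asks how many true members of a class were found. The sketch below uses made-up labels (not the question's data) that reproduce the pattern described, higher precision for class 1 and higher recall for class 0; `per_class_report` is a hypothetical helper, not a scikit-learn function.

```python
def per_class_report(y_true, y_pred, labels=(0, 1)):
    """Per-class precision, recall and support, like a classification report."""
    pairs = list(zip(y_true, y_pred))
    report = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        report[label] = {
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            "support": tp + fn,  # number of true samples of this class
        }
    return report


# Hypothetical predictions: the classifier says "0" six times for five true 0s.
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0, 0, 1, 1, 1]
report = per_class_report(y_true, y_pred)
for label, row in report.items():
    print(label, {k: round(v, 3) for k, v in row.items()})
```

Here class 0 is predicted more often than it occurs, so it finds 4 of 5 true zeros (recall 0.8) at the cost of two false zeros (precision about 0.667), while class 1 ends up with precision 0.75 and recall 0.6.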
20 votes, 5 answers

Choose binary classification algorithm

I have a binary classification problem: approximately 1000 samples in the training set; 10 attributes, including binary, numeric and categorical ones. Which algorithm is the best choice for this type of problem? By default I'm going to start with SVM…
IgorS
11 votes, 4 answers

Why might several types of models give almost identical results?

I've been analyzing a data set of ~400k records and 9 variables. The dependent variable is binary. I've fitted a logistic regression, a regression tree, a random forest, and a gradient boosted tree. All of them give virtually identical goodness of fit…
JenSCDC
9 votes, 3 answers

Binary (Unary) Recommendation System with Biased Views

I would like to create a content recommendation system based on binary click data that also takes views into account. What content a user has been exposed to, and therefore has the chance to click on, is currently biased by a rule-based system that…
elz
9 votes, 1 answer

Using SVM as a binary classifier, is the label for a data point chosen by consensus?

I'm learning Support Vector Machines, and I'm unable to understand how a class label is chosen for a data point in a binary classifier. Is it chosen by consensus with respect to the classification in each dimension of the separating hyperplane?
gc5
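For a linear SVM, the label is the sign of a single decision value $w \cdot x + b$ computed from all features at once; kernel SVMs replace the dot product with a kernel expansion over the support vectors but still threshold one scalar. A minimal sketch with hypothetical weights, where the per-dimension contributions "disagree" yet the point still gets one label:

```python
def decision_value(w, b, x):
    """Signed score of a linear SVM: w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict_label(w, b, x):
    # The label comes from the sign of one scalar score, not from a vote
    # over dimensions. Points exactly on the hyperplane go to +1 here by convention.
    return 1 if decision_value(w, b, x) >= 0 else -1


# Hypothetical learned weights and bias.
w = [2.0, -1.0]
b = -0.5
x = [1.0, 0.8]
contributions = [wi * xi for wi, xi in zip(w, x)]  # [2.0, -0.8]: opposite signs
label = predict_label(w, b, x)  # score 2.0 - 0.8 - 0.5 = 0.7, so +1
```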
8 votes, 1 answer

Micro-F1 and Macro-F1 are equal in binary classification and I don't know why

I have a binary classification problem in which the test set contains equal numbers of samples from both classes (class 0 and class 1). Since we know that the number of samples from every class is equal, I use median on the…
user137927
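When F1 is averaged over both classes, every misclassified sample is a false positive for one class and a false negative for the other, so pooled FP equals pooled FN and micro-F1 reduces to accuracy. Macro-F1 instead averages the per-class F1 scores. In the made-up examples below, the two coincide when the errors are split evenly between the classes and differ when they are not; the helper names are hypothetical.

```python
def class_counts(y_true, y_pred, label):
    """TP, FP, FN for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    return 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0


def micro_macro_f1(y_true, y_pred, labels=(0, 1)):
    counts = [class_counts(y_true, y_pred, label) for label in labels]
    # Macro: one F1 per class, then an unweighted mean.
    macro = sum(f1_from_counts(*c) for c in counts) / len(counts)
    # Micro: pool TP, FP and FN over classes first, then compute one F1.
    micro = f1_from_counts(*(sum(c[i] for c in counts) for i in range(3)))
    return micro, macro


y_true = [0, 0, 0, 0, 1, 1, 1, 1]
symmetric = micro_macro_f1(y_true, [0, 0, 0, 1, 1, 1, 1, 0])   # one error each way
asymmetric = micro_macro_f1(y_true, [0, 0, 0, 0, 1, 1, 0, 0])  # both errors predict 0
```

In the asymmetric case, micro-F1 stays at the accuracy of 0.75 while macro-F1 drops to (0.8 + 2/3) / 2, about 0.733.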
8 votes, 2 answers

Why are precision and recall used in the F1 score, rather than precision and NPV?

In binary classification problems, it seems the F1 score is often used as a performance measure. As far as I've understood, the idea is to find the best tradeoff between precision and recall. The formula for the F1 score is symmetric in precision and…
egdvnyjklu
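Writing the standard definitions out in terms of confusion-matrix counts makes the asymmetry visible: F1 depends only on TP, FP and FN, so true negatives never enter it, whereas NPV is defined by TN.

```latex
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}, \qquad
\text{NPV} = \frac{TN}{TN + FN}

F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
    = \frac{2\,TP}{2\,TP + FP + FN}
```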
6 votes, 3 answers

What is the best metric to evaluate highly imbalanced binary classification (such as credit card fraud detection)?

What is the best metric to evaluate highly imbalanced binary classification (such as credit card fraud detection)? I have been examining several metrics: precision, recall, F1, the classification report (macro avg, weighted avg), ROC, AUC, ... but I do not know…
user10296606
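Accuracy is dominated by the majority class when positives are rare. The sketch below uses made-up data with 1% positives and compares a model that never flags fraud with one that flags some: accuracy prefers the useless model, while precision and recall on the positive class expose the difference. `binary_metrics` is a hypothetical helper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and recall with `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # no positive predictions: 0 by convention
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical data: 10 frauds among 1000 transactions (1% positives).
y_true = [1] * 10 + [0] * 990
always_negative = [0] * 1000
# A second hypothetical model: catches 8 frauds, raises 20 false alarms.
flagging = [1] * 8 + [0] * 2 + [1] * 20 + [0] * 970

print(binary_metrics(y_true, always_negative))
print(binary_metrics(y_true, flagging))
```

The first model scores 0.99 accuracy with zero recall; the second scores lower accuracy (0.978) but finds 80% of the frauds.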
6 votes, 2 answers

Is a correlation matrix meaningful for a binary classification task?

When examining my dataset with a binary target (y) variable, I wonder if a correlation matrix is useful for determining the predictive power of each variable. My predictors (X) contain some numeric and some factor variables.
Georg Heiler
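For a numeric predictor and a 0/1 target, the Pearson coefficient in a correlation matrix is the point-biserial correlation, a scaled difference between the predictor's means in the two classes. The sketch checks that equivalence on made-up data. It only captures a linear, mean-shift relationship and does not apply directly to the factor variables mentioned in the question.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def point_biserial(x, y):
    """Point-biserial correlation of numeric x with a 0/1 target y."""
    group1 = [a for a, b in zip(x, y) if b == 1]
    group0 = [a for a, b in zip(x, y) if b == 0]
    n, n1, n0 = len(x), len(group1), len(group0)
    mean = sum(x) / n
    s = math.sqrt(sum((a - mean) ** 2 for a in x) / n)  # population std of x
    diff = sum(group1) / n1 - sum(group0) / n0           # difference of class means
    return diff / s * math.sqrt(n1 * n0 / n ** 2)


x = [1.0, 2.0, 3.0, 4.0]  # hypothetical numeric predictor
y = [0, 0, 1, 1]          # binary target
r = pearson(x, y)
r_pb = point_biserial(x, y)
```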
5 votes, 1 answer

Reduce multiclass classification targets to binary classification targets in scikit-learn

I would like to reduce multiclass classification targets to binary classification targets. Ideally, this mapping would happen within scikit-learn so the same transformation applies during both training and prediction. I looked at transforming the…
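One way to make training and prediction use the same reduction is to define the mapping once and reuse it wherever targets appear. The sketch below is plain Python with hypothetical class names rather than a scikit-learn transformer; scikit-learn pipelines transform the features X, not y, so a mapping like this would usually be applied to the targets before fitting.

```python
def make_binarizer(positive_classes):
    """Return a function mapping multiclass labels to 1 (positive) or 0 (rest)."""
    positive = frozenset(positive_classes)

    def binarize(labels):
        return [1 if label in positive else 0 for label in labels]

    return binarize


# Hypothetical classes: treat "cat" and "dog" as positive, everything else as 0.
to_binary = make_binarizer({"cat", "dog"})
y_train = to_binary(["cat", "bird", "dog", "fish"])
y_test = to_binary(["fish", "dog"])
```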
5 votes, 1 answer

How can I generate a Bernoulli block mixture model in MATLAB?

I am trying to write the code of a Bernoulli block mixture model in MATLAB, but I get an error every time I run the function. In particular, I'm having a problem with how to relate the distribution parameter $\alpha$ to the latent variables $Z$…
Ahmad Tay
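Assuming the question means the standard Bernoulli latent block model (row labels $Z$, column labels, here called $W$, and one Bernoulli parameter $\alpha_{kl}$ per pair of row block $k$ and column block $l$), generating data takes two steps: draw the latent labels, then draw each cell from a Bernoulli with the parameter of its block. The sketch is in Python rather than MATLAB, and the names `pi`, `rho` and `W` are assumptions, not taken from the question.

```python
import random


def generate_bernoulli_lbm(n_rows, n_cols, pi, rho, alpha, seed=0):
    """Sample a binary matrix from a Bernoulli latent block model.

    pi:    row-cluster proportions (length g)
    rho:   column-cluster proportions (length m)
    alpha: g x m matrix, alpha[k][l] = P(x_ij = 1 | z_i = k, w_j = l)
    """
    rng = random.Random(seed)
    # Latent row labels Z and column labels W, drawn independently.
    z = rng.choices(range(len(pi)), weights=pi, k=n_rows)
    w = rng.choices(range(len(rho)), weights=rho, k=n_cols)
    # Each cell is Bernoulli with the parameter of its (row block, column block).
    x = [[int(rng.random() < alpha[z[i]][w[j]]) for j in range(n_cols)]
         for i in range(n_rows)]
    return x, z, w


x, z, w = generate_bernoulli_lbm(
    n_rows=20, n_cols=15, pi=[0.4, 0.6], rho=[0.5, 0.5],
    alpha=[[0.9, 0.1], [0.2, 0.7]],
)
```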
5 votes, 1 answer

Sklearn Aggregating Multiple Fitted Models Into A Single Model? (binary classification)

My problem context: the dataset is too big to fit into memory; binary classification [0,1]; 30 CSV files in a directory, each with exactly 30,000 samples (rows); each file contains 15,000 class-0 and 15,000 class-1 samples (no imbalance); model is…
Jarad
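One common way to combine models fitted on separate chunks is soft voting: average each model's predicted probability for class 1 and threshold the mean. This is a swapped-in technique rather than anything stated in the question, and the three logistic models below have hypothetical coefficients standing in for models fitted on different CSV files.

```python
import math


def make_logistic(coef, intercept):
    """Hypothetical fitted logistic model returning P(y=1 | x)."""
    def predict_proba(x):
        z = intercept + sum(c * xi for c, xi in zip(coef, x))
        return 1.0 / (1.0 + math.exp(-z))
    return predict_proba


def soft_vote(models, x, threshold=0.5):
    """Average the positive-class probabilities of several models, then threshold."""
    p = sum(m(x) for m in models) / len(models)
    return p, int(p >= threshold)


# One model per data chunk (coefficients are made up for illustration).
models = [
    make_logistic([1.0, -0.5], 0.0),
    make_logistic([0.8, -0.7], 0.1),
    make_logistic([1.2, -0.4], -0.2),
]
prob, label = soft_vote(models, [2.0, 1.0])
```

If the estimator supports incremental learning (for example, scikit-learn's `SGDClassifier` exposes `partial_fit`), training a single model over the files in sequence is an alternative to aggregating separate ones.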
5 votes, 1 answer

How to do Exploratory Data Analysis when my response variable is binary?

I am doing a multilevel regression, and my response variable is binary (presence of females on a tech board). All the EDA methods I know are about plotting correlations, but as this is a binary variable I don't know where to start. My predictors are some…