Most Popular

1500 questions
71 votes, 2 answers

Are Support Vector Machines still considered "state of the art" in their niche?

This question is in response to a comment I saw on another question. The comment was about the Machine Learning course syllabus on Coursera, and was along the lines of "SVMs are not used so much nowadays". I have only just finished the relevant…
Neil Slater
  • 29,388
  • 5
  • 82
  • 101
71 votes, 4 answers

What is the use of torch.no_grad in pytorch?

I am new to PyTorch and started with this GitHub code. I do not understand the comment on lines 60-61 of the code: "because weights have requires_grad=True, but we don't need to track this in autograd". I understood that we mention requires_grad=True…
mausamsion
  • 1,312
  • 1
  • 10
  • 14
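
For readers skimming this one, a minimal sketch of what torch.no_grad() does: operations run inside the block are not recorded by autograd, which is why manual weight updates are typically wrapped in it. The tensor names below are illustrative, not taken from the linked repository.

```python
import torch

w = torch.randn(3, requires_grad=True)  # a leaf tensor that autograd tracks

# Inside torch.no_grad(), operations are not recorded in the autograd graph,
# so the result carries no grad_fn and has requires_grad=False.
with torch.no_grad():
    y = w * 2
print(y.requires_grad)  # False

z = w * 2               # outside the block the same op is tracked again
print(z.requires_grad)  # True

# Typical use: a manual weight update that should not become part of the graph.
with torch.no_grad():
    w -= 0.1 * torch.ones_like(w)  # stand-in for `lr * w.grad` in a real loop
```
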
71 votes, 5 answers

Adding Features To Time Series Model LSTM

I have been reading up a bit on LSTMs and their use for time series, and it's been interesting but difficult at the same time. One thing I have had difficulty understanding is the approach to adding additional features to what is already a list…
Rjay155
  • 1,235
  • 2
  • 12
  • 9
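
The usual answer to this kind of question is that extra features become additional entries in the last axis of the (samples, timesteps, features) input tensor; only the size of that axis changes, the rest of the model stays the same. A minimal, hedged tf.keras sketch with made-up toy data:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Toy data: 100 sequences, 10 time steps, 3 features per step
# (e.g. the series itself plus two additional features).
X = np.random.rand(100, 10, 3)
y = np.random.rand(100, 1)

model = Sequential([
    LSTM(32, input_shape=(10, 3)),  # timesteps=10, features=3
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=16, verbose=0)
```
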
71 votes, 5 answers

Why is mini-batch size better than one single "batch" with all training data?

I often read that, in the case of deep learning models, the usual practice is to apply mini-batches (generally small ones, 32/64) over several training epochs. I cannot really fathom the reason behind this. Unless I'm mistaken, the batch size is the…
Hendrik
  • 8,767
  • 17
  • 43
  • 55
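
For context, the practice the question refers to looks roughly like the sketch below: instead of one gradient step on the full dataset, the data is shuffled each epoch and the weights are updated one small slice at a time. A toy linear-regression example with illustrative names:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((1000, 20))          # toy dataset
y = X @ rng.random(20) + 0.1        # toy linear target
w = np.zeros(20)                    # model weights
batch_size, lr = 32, 0.01

for epoch in range(5):
    # Shuffle once per epoch, then walk through the data in mini-batches.
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        X_b, y_b = X[idx], y[idx]
        # Gradient of the mean squared error on this mini-batch only.
        grad = 2 * X_b.T @ (X_b @ w - y_b) / len(idx)
        w -= lr * grad
```
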
70 votes, 2 answers

Sparse_categorical_crossentropy vs categorical_crossentropy (keras, accuracy)

Which is better for accuracy or are they the same? Of course, if you use categorical_crossentropy you use one hot encoding, and if you use sparse_categorical_crossentropy you encode as normal integers. Additionally, when is one better than the…
Master M
  • 803
  • 1
  • 7
  • 5
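
A minimal sketch of the difference the excerpt describes, assuming tf.keras: both losses compute the same cross-entropy, they just expect different label formats (integer class indices vs one-hot vectors).

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

X = np.random.rand(200, 8)
y_int = np.random.randint(0, 3, size=200)        # integer labels: 0, 1, 2
y_onehot = to_categorical(y_int, num_classes=3)  # one-hot labels

def build():
    return Sequential([Dense(16, activation="relu", input_shape=(8,)),
                       Dense(3, activation="softmax")])

# Integer labels -> sparse_categorical_crossentropy
m1 = build()
m1.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
m1.fit(X, y_int, epochs=2, verbose=0)

# One-hot labels -> categorical_crossentropy
m2 = build()
m2.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
m2.fit(X, y_onehot, epochs=2, verbose=0)
```
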
70 votes, 11 answers

Why should the data be shuffled for machine learning tasks?

In machine learning tasks it is common to shuffle the data and normalize it. The purpose of normalization is clear (so that features have the same range of values). But, after struggling a lot, I did not find any valuable reason for shuffling the data. I have read…
Green Falcon
  • 14,308
  • 10
  • 59
  • 98
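
A small illustration of why shuffling matters: if the raw data is ordered (for example by class), an unshuffled split or the first mini-batches only ever see part of the distribution. Toy data, scikit-learn's shuffle helper assumed:

```python
import numpy as np
from sklearn.utils import shuffle

# Toy data that is sorted by class, as many raw datasets are.
X = np.arange(10).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# Without shuffling, a naive split or the first mini-batches
# would contain only class 0.
X_shuf, y_shuf = shuffle(X, y, random_state=42)
print(y_shuf)  # classes are now interleaved
```
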
69 votes, 5 answers

In a softmax classifier, why use the exp function to do normalization?

Why use softmax as opposed to standard normalization? In the comments on the top answer to this question, @Kilian Batzner raised 2 questions which also confuse me a lot. It seems no one gives an explanation apart from the numerical benefits. I get the…
Hans
  • 793
  • 1
  • 6
  • 5
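
For reference, a quick numpy comparison of softmax against plain division by the sum ("standard normalization"): the exp sharpens differences between scores, handles negative scores, and is invariant to shifting all scores by a constant, which plain division is not.

```python
import numpy as np

def softmax(z):
    z = z - z.max()               # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def standard_norm(z):
    return z / z.sum()

scores = np.array([1.0, 2.0, 3.0])
print(standard_norm(scores))      # ~[0.167 0.333 0.500]
print(softmax(scores))            # ~[0.090 0.245 0.665] - exp sharpens differences

# Softmax gives the same answer if every score is shifted by a constant,
# and still works for negative scores, where plain division breaks down.
print(softmax(scores + 10.0))     # same output as softmax(scores)
```
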
67 votes, 9 answers

Clustering geolocation coordinates (lat, long pairs)

What is the right approach and clustering algorithm for geolocation clustering? I'm using the following code to cluster geolocation coordinates: import numpy as np import matplotlib.pyplot as plt from scipy.cluster.vq import kmeans2,…
rokpoto.com
  • 813
  • 1
  • 7
  • 6
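
The excerpt runs kmeans2 on raw lat/long values; a commonly suggested alternative for geographic points is DBSCAN with the haversine metric, which clusters by great-circle distance instead of Euclidean distance in degrees. A hedged sketch with made-up coordinates:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy latitude/longitude pairs in degrees (illustrative values only).
coords = np.array([
    [52.52, 13.40], [52.51, 13.41], [52.53, 13.39],   # around Berlin
    [48.85, 2.35],  [48.86, 2.34],                     # around Paris
])

kms_per_radian = 6371.0088
eps_km = 5.0  # points within ~5 km of each other join the same cluster

db = DBSCAN(eps=eps_km / kms_per_radian, min_samples=2,
            algorithm="ball_tree", metric="haversine").fit(np.radians(coords))
print(db.labels_)  # e.g. [0 0 0 1 1]
```
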
66 votes, 5 answers

How to get accuracy, F1, precision and recall for a Keras model?

I want to compute the precision, recall and F1-score for my binary KerasClassifier model, but can't find any solution. Here's my actual code: # Split dataset in train and test data X_train, X_test, Y_train, Y_test = train_test_split(normalized_X,…
ZelelB
  • 1,067
  • 2
  • 11
  • 15
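
One common route, sketched below with toy data standing in for the question's split: get probabilities from the fitted Keras model, threshold them, and hand the result to scikit-learn's metric functions.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Toy binary-classification data in place of the question's train/test split.
X_train, X_test = np.random.rand(200, 5), np.random.rand(50, 5)
Y_train = np.random.randint(0, 2, 200)
Y_test = np.random.randint(0, 2, 50)

model = Sequential([Dense(8, activation="relu", input_shape=(5,)),
                    Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(X_train, Y_train, epochs=2, verbose=0)

# Predict probabilities, threshold at 0.5, then use scikit-learn's metrics.
y_pred = (model.predict(X_test).ravel() > 0.5).astype(int)
print("accuracy :", accuracy_score(Y_test, y_pred))
print("precision:", precision_score(Y_test, y_pred, zero_division=0))
print("recall   :", recall_score(Y_test, y_pred, zero_division=0))
print("F1       :", f1_score(Y_test, y_pred, zero_division=0))
```
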
66 votes, 4 answers

Does batch_size in Keras have any effect on the results' quality?

I am about to train a big LSTM network with 2-3 million articles and am struggling with Memory Errors (I use AWS EC2 g2x2large). I found out that one solution is to reduce the batch_size. However, I am not sure if this parameter is only related to…
hipoglucido
  • 1,200
  • 1
  • 10
  • 19
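
batch_size here is simply the argument to model.fit(): it sets how many samples are processed per gradient step, so a smaller value reduces memory per step while making the updates more frequent and noisier. A toy tf.keras sketch with illustrative shapes:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Toy sequence data standing in for the articles in the question.
X = np.random.rand(500, 20, 8)   # (samples, timesteps, features)
y = np.random.randint(0, 2, 500)

model = Sequential([LSTM(16, input_shape=(20, 8)),
                    Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy")

# batch_size controls how many samples are loaded and back-propagated per step:
# a smaller value needs less memory per step, at the cost of more updates.
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
```
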
65 votes, 6 answers

When is a Model Underfitted?

Logic often states that by underfitting a model, its capacity to generalize is increased. That said, clearly at some point underfitting causes a model to become worse regardless of the complexity of the data. How do you know when your model has…
blunders
  • 1,932
  • 2
  • 15
  • 19
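
One standard diagnostic is a learning curve: an underfit (high-bias) model scores low on both the training and the validation data, with the two curves close together. A hedged scikit-learn sketch using a deliberately over-regularised model as the underfitting candidate:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# A deliberately weak (strongly regularised) classifier.
sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(C=1e-4, max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

# If both curves plateau at a low score and stay close together,
# the model is likely underfitting (high bias).
print(train_scores.mean(axis=1))
print(val_scores.mean(axis=1))
```
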
63 votes, 9 answers

Tools and protocol for reproducible data science using Python

I am working on a data science project using Python. The project has several stages. Each stage consists of taking a data set, using Python scripts, auxiliary data, configuration and parameters, and creating another data set. I store the code in…
Yuval F
  • 761
  • 1
  • 6
  • 7
63 votes, 11 answers

How to deal with version control of large amounts of (binary) data

I am a PhD student of Geophysics and work with large amounts of image data (hundreds of GB, tens of thousands of files). I know svn and git fairly well and have come to value a project history, combined with the ability to easily work together and have…
Johann
  • 741
  • 1
  • 5
  • 5
63 votes, 4 answers

Difference between OrdinalEncoder and LabelEncoder

I was going through the official documentation of scikit-learn after going through a book on ML and came across the following: the documentation describes sklearn.preprocessing.OrdinalEncoder(), whereas in the book it was given…
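
The practical difference, in a small sketch: OrdinalEncoder is written for 2-D feature matrices (it encodes each column separately), while LabelEncoder is meant for a single 1-D target array.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder, LabelEncoder

# OrdinalEncoder works on a 2-D array of features (one column per feature).
X = np.array([["red", "S"], ["green", "M"], ["blue", "L"]])
print(OrdinalEncoder().fit_transform(X))
# [[2. 2.] [1. 1.] [0. 0.]]  -- categories sorted per column

# LabelEncoder works on a single 1-D array, intended for the target y.
y = np.array(["cat", "dog", "cat", "bird"])
print(LabelEncoder().fit_transform(y))  # [1 2 1 0]
```
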
63 votes, 5 answers

Is it always better to use the whole dataset to train the final model?

A common technique after training, validating and testing the machine learning model of preference is to use the complete dataset, including the testing subset, to train a final model to deploy, e.g. in a product. My question is: Is it always…
pcko1
  • 4,030
  • 2
  • 17
  • 30
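
The workflow the question describes, sketched with scikit-learn: estimate performance with cross-validation, then refit the same configuration on all of the data for deployment, keeping the cross-validation score as the quoted performance figure.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=0)

# 1. Estimate generalisation performance on held-out folds.
scores = cross_val_score(model, X, y, cv=5)
print("CV accuracy:", scores.mean())

# 2. Refit the chosen configuration on the complete dataset for deployment;
#    the CV estimate above remains the reported performance.
final_model = model.fit(X, y)
```
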