Highest Voted Questions - Data Science Stack Exchange

295

votes

8 answers

Micro Average vs Macro average Performance in a Multiclass classification setting

I am trying out a multiclass classification setting with 3 classes. The class distribution is skewed, with most of the data falling in 1 of the 3 classes (class labels being 1, 2, 3, with 67.28% of the data falling in class label 1, 11.99% of the data in…

multiclass-classification model-evaluations

asked Dec 29 '16 at 17:39

SHASHANK GUPTA

3,855
4
20
26
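
For the micro vs macro question above, a minimal sketch of how the two averages can diverge on skewed classes. The labels and predictions are made up for illustration (they are not the asker's data), and `precision_micro_macro` is a hypothetical helper, not a library function.

```python
from collections import Counter


def precision_micro_macro(y_true, y_pred):
    """Return (micro, macro, per_class) precision for multiclass labels."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter()  # true positives per predicted class
    fp = Counter()  # false positives per predicted class
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[p] += 1
        else:
            fp[p] += 1

    per_class = {}
    for c in labels:
        predicted = tp[c] + fp[c]
        # Convention assumed here: precision is 0.0 for a class never predicted.
        per_class[c] = tp[c] / predicted if predicted else 0.0

    # Macro: unweighted mean of per-class scores, so every class counts equally.
    macro = sum(per_class.values()) / len(labels)
    # Micro: pool all decisions first, so frequent classes dominate.
    total_tp = sum(tp.values())
    micro = total_tp / (total_tp + sum(fp.values()))
    return micro, macro, per_class


# Illustrative skewed data: class 1 dominates and the model over-predicts it.
y_true = [1] * 6 + [2] * 2 + [3] * 2
y_pred = [1, 1, 1, 1, 1, 1, 1, 2, 1, 1]
micro, macro, per_class = precision_micro_macro(y_true, y_pred)
print(micro, macro, per_class)
```

Here the micro average stays high because the majority class is handled well, while the macro average drops because class 3 is never predicted.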

286

votes

12 answers

What are deconvolutional layers?

I recently read Fully Convolutional Networks for Semantic Segmentation by Jonathan Long, Evan Shelhamer, Trevor Darrell. I don't understand what "deconvolutional layers" do / how they work. The relevant part is 3.3. Upsampling is backwards strided…

neural-network convolutional-neural-network convolution

asked Jun 13 '15 at 09:56

Martin Thoma

19,540
36
98
170
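
For the deconvolution question above, a rough 1D sketch of what is usually called a transposed (or "backwards strided") convolution: each input value stamps a scaled copy of the kernel into a longer output. The function name, the fixed kernel, and the absence of padding are assumptions made for illustration; real layers learn the kernel.

```python
def transposed_conv1d(x, kernel, stride):
    """Upsample x by scattering scaled copies of kernel, stride positions apart."""
    out_len = (len(x) - 1) * stride + len(kernel)
    out = [0.0] * out_len
    for i, value in enumerate(x):
        for j, weight in enumerate(kernel):
            # Overlapping contributions from neighbouring inputs are summed.
            out[i * stride + j] += value * weight
    return out


# Two inputs become five outputs with a length-3 kernel and stride 2.
upsampled = transposed_conv1d([1.0, 2.0], [1.0, 1.0, 1.0], stride=2)
print(upsampled)
```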

267

votes

10 answers

How to set class weights for imbalanced classes in Keras?

I know that there is a possibility in Keras with the class_weights parameter dictionary at fitting, but I couldn't find any example. Would somebody be so kind as to provide one? By the way, in this case the appropriate practice is simply to weight up the…

deep-learning classification keras weighted-data

asked Aug 17 '16 at 09:35

Hendrik

8,767
17
43
55
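
For the class weight question above, one possible way to build the dictionary, assuming the common "inverse frequency" heuristic `n_samples / (n_classes * count)`. The helper name and the toy labels are hypothetical, and the Keras call is shown only as a comment because it needs a compiled model.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Map each class to a weight inversely proportional to its frequency."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy imbalanced labels: 8 samples of class 0, 2 samples of class 1.
y_train = [0] * 8 + [1] * 2
class_weight = balanced_class_weights(y_train)
print(class_weight)

# Assumed usage with an already compiled Keras model (not run here):
# model.fit(X_train, y_train, class_weight=class_weight)
```

The rarer class receives the larger weight, so its errors contribute more to the loss.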

250

votes

10 answers

What's the difference between fit and fit_transform in scikit-learn models?

I do not understand the difference between the fit and fit_transform methods in scikit-learn. Can anybody explain simply why we might need to transform data? What does it mean, fitting a model on training data and transforming to test data? Does it…

python scikit-learn

asked Jun 21 '16 at 10:05

Kaggle

2,977
5
15
8
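
For the fit vs fit_transform question above, a small sketch using `StandardScaler` as the example transformer (the choice of transformer and the toy arrays are assumptions). `fit` learns the statistics, `transform` applies them, and `fit_transform` does both on the same data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[2.0], [4.0]])

scaler = StandardScaler()
# Learn mean and standard deviation from the training data and scale it.
X_train_scaled = scaler.fit_transform(X_train)
# Reuse the training statistics on the test data; do not refit here.
X_test_scaled = scaler.transform(X_test)

# The two-step form gives the same result as fit_transform on the training set.
two_step = StandardScaler().fit(X_train).transform(X_train)
print(scaler.mean_, X_test_scaled.ravel())
```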

206

votes

18 answers

Train/Test/Validation Set Splitting in Sklearn

How could I randomly split a data matrix and the corresponding label vector into X_train, X_test, X_val, y_train, y_test, y_val with scikit-learn? As far as I know, sklearn.model_selection.train_test_split is only capable of splitting into two not…

machine-learning scikit-learn cross-validation

asked Nov 15 '16 at 14:55

Hendrik

8,767
17
43
55
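
For the three-way split question above, one commonly used workaround is to call `train_test_split` twice. This is a sketch with made-up data and an assumed 60/20/20 ratio, not necessarily the approach taken in the answers.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 samples, 2 features, binary labels.
X = np.arange(200).reshape(100, 2)
y = np.arange(100) % 2

# First split off 20% as the test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
# Then take 25% of the remaining 80% as validation (20% of the total).
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0
)
print(len(X_train), len(X_val), len(X_test))
```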

203

votes

35 answers

Publicly Available Datasets

One of the common problems in data science is gathering data from various sources in a somewhat cleaned (semi-structured) format and combining metrics from various sources for making a higher level analysis. Looking at other people's efforts,…

open-source dataset

asked May 18 '14 at 18:45

Amir Ali Akbari

1,393
3
13
25

202

votes

13 answers

K-Means clustering for mixed numeric and categorical data

My data set contains a number of numeric attributes and one categorical. Say, NumericAttr1, NumericAttr2, ..., NumericAttrN, CategoricalAttr, where CategoricalAttr takes one of three possible values: CategoricalAttrValue1, CategoricalAttrValue2 or…

data-mining clustering octave k-means categorical-data

asked May 14 '14 at 05:58

IgorS

5,474
11
34
43

200

votes

6 answers

What is the "dying ReLU" problem in neural networks?

Referring to the Stanford course notes on Convolutional Neural Networks for Visual Recognition, a paragraph says: "Unfortunately, ReLU units can be fragile during training and can "die". For example, a large gradient flowing through a ReLU…

machine-learning neural-network deep-learning

asked May 07 '15 at 04:11

tejaskhot

4,125
7
22
18
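
For the dying ReLU question above, a toy illustration of why a unit can stop learning: once its pre-activation is negative for every input, both its output and its gradient are zero. The inputs and the "after a bad update" weights are invented for this sketch.

```python
def relu(z):
    return max(0.0, z)


def relu_grad(z):
    # Derivative of ReLU with respect to its input (taken as 0 at z <= 0).
    return 1.0 if z > 0 else 0.0


inputs = [0.5, 1.0, 1.5, 2.0]

# A healthy unit: positive pre-activations, so gradients flow.
w, b = 1.0, 0.0
healthy_grads = [relu_grad(w * x + b) for x in inputs]

# Hypothetical weights after an oversized update pushed them negative.
w_dead, b_dead = -3.0, -1.0
dead_outputs = [relu(w_dead * x + b_dead) for x in inputs]
dead_grads = [relu_grad(w_dead * x + b_dead) for x in inputs]

print(healthy_grads, dead_outputs, dead_grads)
```

With all gradients at zero, gradient descent has no signal to move `w_dead` and `b_dead` back, which is the sense in which the unit has "died".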

200

votes

7 answers

How to draw Deep learning network architecture diagrams?

I have built my model. Now I want to draw the network architecture diagram for my research paper. Example is shown below:

machine-learning neural-network deep-learning svm software-recommendation

asked Nov 03 '16 at 03:10

Muhammad Ali

2,509
5
21
22

196

votes

2 answers

Difference between isna() and isnull() in pandas

I have been using pandas for quite some time, but I don't understand what the difference is between isna() and isnull() and, more importantly, which one to use when identifying missing values in a dataframe. What is the basic underlying difference…

python pandas dataframe

asked Sep 06 '18 at 10:14

Vaibhav Thakur

2,403
3
13
9
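
For the isna() vs isnull() question above, a quick check on a toy DataFrame (the data is made up). The pandas documentation describes `isnull` as an alias of `isna`, so the two masks should match.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", None, "z"]})

mask_isna = df.isna()
mask_isnull = df.isnull()

# Both calls flag the same cells as missing.
same = mask_isna.equals(mask_isnull)
print(same)
print(mask_isna)
```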

185

votes

21 answers

How do you visualize neural network architectures?

When writing a paper / making a presentation on a topic involving neural networks, one usually visualizes the network's architecture. What are good / simple ways to visualize common architectures automatically?

machine-learning neural-network deep-learning visualization

asked Jul 18 '16 at 17:08

Martin Thoma

19,540
36
98
170

184

votes

6 answers

When to use GRU over LSTM?

The key difference between a GRU and an LSTM is that a GRU has two gates (reset and update gates) whereas an LSTM has three gates (namely input, output and forget gates). Why do we make use of a GRU when we clearly have more control over the network…

neural-network deep-learning lstm gru

asked Oct 17 '16 at 11:47

Sayali Sonawane

2,101
3
13
13
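
For the GRU vs LSTM question above, a back-of-the-envelope parameter count under the textbook formulation: each gate or candidate block has a weight matrix over the concatenated input and hidden state plus a bias. The layer sizes are arbitrary, and specific libraries may add extra bias terms, so treat the numbers as approximate.

```python
def block_params(input_size, hidden_size):
    # One block: weights over [input, hidden] plus one bias vector.
    return hidden_size * (input_size + hidden_size) + hidden_size


def lstm_params(input_size, hidden_size):
    # Input, forget and output gates plus the candidate cell state.
    return 4 * block_params(input_size, hidden_size)


def gru_params(input_size, hidden_size):
    # Reset and update gates plus the candidate hidden state.
    return 3 * block_params(input_size, hidden_size)


lstm_count = lstm_params(10, 20)
gru_count = gru_params(10, 20)
print(lstm_count, gru_count)
```

Under these assumptions a GRU has about three quarters of the parameters of an LSTM of the same size, which is one practical reason it is sometimes preferred.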

177

votes

4 answers

When to use One Hot Encoding vs LabelEncoder vs DictVectorizer?

I have been building models with categorical data for a while now and when in this situation I basically default to using scikit-learn's LabelEncoder function to transform this data prior to building a model. I understand the difference between OHE,…

scikit-learn categorical-data feature-engineering

asked Dec 19 '15 at 19:30

anthr

1,893
3
12
11
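
For the encoding question above, a hand-written sketch of what label encoding and one-hot encoding produce for the same column. This is plain Python for illustration, not scikit-learn's `LabelEncoder` or `OneHotEncoder`, and the colour values are made up.

```python
colors = ["red", "green", "blue", "green"]

# Sorted unique categories define the column order: blue, green, red.
categories = sorted(set(colors))

# Label encoding: one integer per category (implies an ordering).
label_encoded = [categories.index(c) for c in colors]

# One-hot encoding: one indicator column per category (no implied ordering).
one_hot = [[1 if c == cat else 0 for cat in categories] for c in colors]

print(label_encoded)
print(one_hot)
```

The integer codes suggest that "red" is greater than "blue", which some models will treat as meaningful; the one-hot form avoids that at the cost of extra columns.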

153

votes

6 answers

The cross-entropy error function in neural networks

In the MNIST For ML Beginners they define cross-entropy as $$H_{y'} (y) := - \sum_{i} y_{i}' \log (y_i)$$ where $y_i$ is the predicted probability value for class $i$ and $y_i'$ is the true probability for that class. Question 1: Isn't it a problem that…

machine-learning tensorflow

asked Dec 10 '15 at 06:22

Martin Thoma

19,540
36
98
170
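
For the cross-entropy question above, a direct transcription of the quoted formula, with $y'$ as the true distribution and $y$ as the predicted one. Skipping terms where the true probability is zero is a convention assumed here, and the example vectors are invented.

```python
import math


def cross_entropy(y_true, y_pred):
    """H_{y'}(y) = -sum_i y'_i * log(y_i), skipping terms with y'_i == 0."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)


# One-hot true label for class 1 and a fairly confident prediction.
y_true = [0.0, 1.0, 0.0]
y_pred = [0.1, 0.8, 0.1]
loss = cross_entropy(y_true, y_pred)
print(loss)
```

With a one-hot target, only the predicted probability of the true class enters the sum, so the loss reduces to `-log(0.8)` in this example.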

151

votes

13 answers

Why do people prefer Pandas to SQL?

I've been using SQL since 1996, so I may be biased. I've used MySQL and SQLite 3 extensively, but have also used Microsoft SQL Server and Oracle. The vast majority of the operations I've seen done with Pandas can be done more easily with SQL. This…

python pandas sql

asked Jul 12 '18 at 09:25

vy32

611
3
7
11
