Highest Voted Questions - Data Science Stack Exchange

9

votes

4 answers

Is t-SNE just for visualization?

I have used the t-SNE algorithm to visualize my high-dimensional data. However, I was wondering whether it is also a practical method for inference.

machine-learning visualization dimensionality-reduction tsne

asked Jul 29 '16 at 15:12

smw

203
1
5

9

votes

3 answers

Is it possible to (de)activate a specific set of cells in jupyter?

I have a Jupyter notebook and I would like to perform several runs, where the code I want to run depends on the results of the previous runs. I divided my notebook into several cells, and for each run I would like to select which cell should be executed…

jupyter

asked Jul 26 '16 at 14:21

Manu H

419
2
4
13

9

votes

2 answers

Cross-attention mask in Transformers

I can't fully understand how we should create the mask for the decoder's cross-attention in the original Transformer model from "Attention Is All You Need". Here is my attempt at finding a solution: Suppose we are training such a Transformer model,…

nlp transformer attention-mechanism masking

asked Dec 27 '23 at 15:44

ИванКарамазов

230
2
9

9

votes

1 answer

How to approach the numer.ai competition with anonymous scaled numerical predictors?

Numer.ai has been around for a while now, and there seem to be only a few posts or other discussions about it on the web. The system has changed from time to time, and the set-up today is the following: train (N=96K) and test (N=33K) data with 21…

machine-learning deep-learning cross-validation preprocessing competitions

asked Jun 29 '16 at 16:11

Richi W

165
2
11

9

votes

1 answer

feature importance via random forest and linear regression are different

Applied Lasso to rank the features and got the following results:

rank  feature  prob.
==================================
1     a        0.1825477951589229
2     b        0.07858498115577893
3     c        0.07041793111843796

Note that the data set has…

feature-selection random-forest linear-regression

asked Jun 10 '16 at 08:35

neurite

193
2
10

9

votes

2 answers

Tips for a new data scientist

I am about to start a job in which I will be working with large datasets and will be expected to find trends, etc... I have found lots of resources on where to learn ML and other hard skills and feel that I am (semi) competent on this end. I am…

beginner

asked May 31 '16 at 15:07

Hobbes

1,469
9
15

9

votes

2 answers

How to update bias and bias's weight using backpropagation algorithm

I'm writing my own training algorithm, but I don't know how to set the bias weight. Do I have to set a bias in every layer? Must the bias weight be updated in every layer?

machine-learning neural-network backpropagation

asked May 27 '16 at 15:01

miguelote

93
1
1
3
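
For readers skimming the question above, here is a minimal sketch of one common way the bias is handled: it is treated as a weight on a constant input of 1 and updated with the same gradient rule as the other weights. The single linear neuron, the squared-error loss, and the names `train_linear_neuron`, `lr`, and `epochs` are illustrative assumptions, not taken from the question or its answers.

```python
def train_linear_neuron(samples, lr=0.1, epochs=200):
    """Fit y = w * x + b by per-sample gradient descent (illustrative only)."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            y = w * x + b
            error = y - target      # dL/dy for L = 0.5 * (y - target) ** 2
            w -= lr * error * x     # dL/dw = error * x
            b -= lr * error * 1.0   # dL/db = error * 1 (the bias input is the constant 1)
    return w, b


# Toy data generated from y = 2x + 1 (an assumption made for the example).
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w, b = train_linear_neuron(samples)
```

In a multi-layer network the same idea would apply per layer: each layer's bias gradient is that layer's error term multiplied by 1.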

9

votes

5 answers

Convolutional Neural Networks in R

I don't see a package for doing Convolutional Neural Networks in R. Has anyone implemented this kind of algorithm in R?

r convolutional-neural-network software-recommendation

asked May 25 '16 at 13:30

Hack-R

1,949
1
21
34

9

votes

3 answers

Which, if any, machine learning algorithms are accepted as being a good tradeoff between explainability and prediction?

Machine learning texts describing algorithms such as gradient boosting machines or neural networks often comment that these models are good at prediction, but this comes at the price of a loss of explainability or interpretability. Conversely,…

machine-learning predictive-modeling

asked May 22 '16 at 23:56

Robert de Graaf

899
5
17

9

votes

3 answers

What recommendation engine for a situation where users can only see a fraction of all items?

I want to add a recommendation feature to a document management system. It is a server on which most company documents are stored. Employees browse the web interface and click to download (or read online) the documents they want. Each employee only…

machine-learning recommender-system

asked May 17 '16 at 10:16

Nicolas Raoul

345
2
12

9

votes

3 answers

Split a list of values into columns of a dataframe?

I am new to Python and stuck on a particular problem involving dataframes. The image has a sample column; however, the data is not consistent. There are also some floats and NaN. I need these to be split across columns. That is, each unique value…

python pandas

asked May 17 '16 at 01:37

Drj

427
1
7
19

9

votes

1 answer

What is the distribution of categories in imagenet training set (ILSVRC2012)

http://arxiv.org/pdf/1409.0575v3.pdf Table 2 says there are 1,281,167 images and 732-1300 per class in the ILSVRC2012 training set. Ideally I'd like to avoid downloading the 138 GB just for this purpose as I otherwise don't need it. I was wondering…

dataset image-classification image-recognition

asked May 15 '16 at 10:46

user1030139

91
1
4

9

votes

1 answer

Is there any domain where Spiking Neural Networks outperform other algorithms (non-spiking)?

I'm reading about reservoir computing techniques like Echo State Networks and Liquid State Machines. Both of the methods involve feeding inputs to a population of randomly (or not) connected spiking neurons, and a relatively simple readout algorithm…

machine-learning classification neural-network deep-learning svm

asked Apr 29 '16 at 15:43

Justas

191
3

9

votes

4 answers

k-means in R, usage of nstart parameter?

I am trying to use k-means clustering (using SQL Server + R), and it seems that my model is not stable: each time I run the k-means algorithm, it finds different clusters. But if I set nstart (in the R k-means function) high enough (10 or more) it becomes…

r k-means

asked Apr 28 '16 at 15:44

irimias

277
1
3
7
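
As a rough illustration of the restart idea behind `nstart` in the question above, the sketch below runs a tiny one-dimensional k-means from several random initializations and keeps the run with the lowest within-cluster sum of squares. It is written in Python rather than R, and the helper names `kmeans_once`, `kmeans`, and `n_start` are hypothetical; it is not R's implementation.

```python
import random


def kmeans_once(points, k, rng, iters=50):
    """One k-means run on 1-D points from a random initialization."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: (p - centers[j]) ** 2)
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    inertia = sum(min((p - c) ** 2 for c in centers) for p in points)
    return centers, inertia


def kmeans(points, k, n_start=1, seed=0):
    """Run k-means n_start times and return the (centers, inertia) of the best run."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(n_start)]
    return min(runs, key=lambda run: run[1])


# Toy data with three well-separated groups (an assumption for the example).
points = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8, 9.0, 9.1, 8.9]
single = kmeans(points, 3, n_start=1, seed=0)
multi = kmeans(points, 3, n_start=10, seed=0)
```

Because the best of several runs can never have a higher sum of squares than the first run alone, more restarts tend to give more repeatable clusters, at the cost of extra computation.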

9

votes

2 answers

Why did Tufte call this a "superbly produced duck"?

I think I understand Tufte's concept of a "duck": a graphic that is taken over by decorative forms. But I couldn't understand why he called this a duck (a "superbly produced" one at that). It seemed to me more functional than decorative.…

visualization terminology

asked Apr 22 '16 at 17:03

thanks_in_advance

325
2
11
