Highest Voted Questions - Data Science Stack Exchange

9

votes

2 answers

Text similarity with sentence embeddings

I'm trying to calculate the similarity between texts of various lengths. My current approach is as follows: using Universal Sentence Encoder, I convert the text to a set of vectors. I average these vectors to create the final feature vector. I compare…

word-embeddings similarity similar-documents

asked Sep 19 '19 at 20:04

Kertis van Kertis

143
1
6
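The approach described in the excerpt above (one vector per sentence, averaged into a single feature vector per text) can be sketched roughly as follows. This is only an illustration under stated assumptions: the excerpt is cut off before it says how the averaged vectors are compared, so cosine similarity is assumed here, and the toy 3-dimensional vectors are hypothetical stand-ins for real Universal Sentence Encoder output.

```python
import math


def average_vectors(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (assumed comparison metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for a zero vector; return 0 by convention
    return dot / (norm_a * norm_b)


# Hypothetical per-sentence embeddings: text_a has two sentences, text_b has one.
text_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
text_b = [[1.0, 1.0, 0.0]]

vec_a = average_vectors(text_a)
vec_b = average_vectors(text_b)
similarity = cosine_similarity(vec_a, vec_b)
print(similarity)
```

Because averaging yields one fixed-length vector per text, texts with different numbers of sentences can be compared directly, at the cost of discarding sentence order.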

9

votes

1 answer

Using Vowpal Wabbit for NER

Vowpal Wabbit (VW) apparently supports sequence tagging via SEARN. The problem is that I cannot find a detailed parameter list with explanations and examples anywhere. The best I could find is Zinkov's blog entry with a…

machine-learning nlp

asked Jun 06 '15 at 07:00

Vladislavs Dovgalecs

481
3
9

9

votes

1 answer

Theano in deep learning research

How widely is Theano used in deep learning research? Is Theano a good start to learn the implementation of machine learning algorithms? Will learning the implementation of something like a feed forward network really help? Do graduate students…

machine-learning python deep-learning library

asked May 30 '15 at 08:33

Karthik Thiagarajan

9

votes

1 answer

Are there any unsupervised learning algorithms for time sequenced data?

Each observation in my data was collected 0.1 seconds apart. I don't call it a time series because it doesn't have a date and time stamp. In the examples of clustering algorithms and PCA that I found online, the sample data have 1…

algorithms

asked May 29 '15 at 23:04

umair durrani

344
1
2
8

9

votes

2 answers

Feature reduction for an uncorrelated data set

I am working on a classification problem with a training data set that has 100 features. Pairs of features show no visible correlation, as can be seen in the example pair plot for some of the features. I am trying to find the right way to…

r feature-selection correlation

asked Sep 04 '19 at 18:45

Ruben Kazumov

211
1
4

9

votes

3 answers

Is There a Way to Re-Calibrate Predicted Probabilities After Using Class Weights?

I have classification data with far more negative instances than positive instances. I have used class weights in my models and have achieved the discrimination I want but the predicted probabilities from the models do not match the actual…

python prediction class-imbalance

asked Sep 03 '19 at 20:36

from keras import michael

370
3
13

9

votes

1 answer

Gensim LDA model: return keywords based on relevance (λ - lambda) value

I am using the gensim library for topic modeling, more specifically LDA. I created my corpus, my dictionary, and my LDA model. With the help of the pyLDAvis library I visualized the results. When I print the words with the highest probability on…

python topic-model lda gensim

asked Aug 21 '19 at 17:40

Tasos Lytos

91
4

9

votes

1 answer

How is the cross-product transformation defined for binary features?

I am reading the paper on Wide & Deep learning and for the wide component, it states that one of the most important transformations is the cross-product transformation. This is defined as follows: $$\phi_{k}(\mathbf{x})=\prod_{i=1}^{d} x_{i}^{c_{k…

machine-learning deep-learning recommender-system

asked Aug 12 '19 at 12:42

Dimitris Poulopoulos

93
1
3

9

votes

3 answers

In an elbow curve, how do I find the point where the curve starts to rise?

I am computing a distance metric on my data. The result is then sorted in ascending order. Samples with a distance greater than a specific threshold are marked as outliers and discarded. Below is a plot of all distance…

python graphs outlier matplotlib

asked Aug 07 '19 at 10:49

Faiz Kidwai

235
1
2
12

9

votes

3 answers

xgboost: Is there a way to perform regression on rates/percentages data?

I have a dependent variable, $Y$, that is made up of rates/percentages data, so each value is between $0$ and $1$. I was attracted to the xgboost library because it allows focusing in on specific subsets of the data in training itself, but I am…

regression linear-regression xgboost distribution

asked Aug 06 '19 at 01:25

Coolio2654

300
3
10

9

votes

2 answers

Is Faster RCNN the same thing as VGG-16, RESNET-50, etc... or not?

My understanding is that Faster RCNN is an architecture for performing object detection. It finds objects in an image and classifies them. My understanding is also that VGG-16, RESNET-50, etc... also find objects in images and classify them. Are…

neural-network deep-learning faster-rcnn vgg16

asked Jun 26 '19 at 14:32

b19wh33l5

91
1
2

9

votes

2 answers

What does an Input layer of shape=(None,) or (None,12) actually mean?

Is this telling the model that there are two dimensions (i.e. it’s a matrix) but we don’t yet know the size of that particular dimension? If so, how can the model be compiled? Doesn’t the size of each dimension affect the number of nodes in middle…

neural-network keras tensorflow

asked Jun 20 '19 at 14:01

Nic Cottrell

303
1
2
10

9

votes

1 answer

How does SQL Server Analysis Services compare to R?

This may be too broad and opinion-based a question, but I am finding it hard to find information about running various algorithms using SQL Server Analysis Services Data Mining projects versus using R. This is mainly because all the data…

data-mining r algorithms

asked Mar 27 '15 at 08:41

Fastidious

213
2
7

9

votes

2 answers

How to determine input shape in keras?

I am having difficulty finding where my error is while building deep learning models, but I typically have issues when setting the input layer input shape. This is my model: model = Sequential([ Dense(32, activation='relu', input_shape=(1461,…

python deep-learning keras numpy

asked Jun 12 '19 at 03:21

Josh Zwiebel

193
1
1
6

9

votes

3 answers

Why do RNNs usually have fewer hidden layers than CNNs?

CNNs can have hundreds of hidden layers, and since they are often used with image data, having many layers captures more complexity. However, as far as I have seen, RNNs usually have few layers, e.g. 2-4. For example, for electrocardiogram (ECG)…

deep-learning cnn lstm rnn feature-extraction

asked Jun 09 '19 at 02:18

KRL

231
1
4
