Highest Voted Questions - Data Science Stack Exchange

8

votes

1 answer

Difference between Gensim word2vec and keras Embedding layer

I have used the gensim word2vec package and the Keras Embedding layer in various projects. I then realized they seem to do the same thing: both try to convert a word into a feature vector. Am I understanding this properly? What exactly is the…

keras word2vec word-embeddings gensim embeddings

asked Oct 11 '19 at 13:25

Edamame

2,785
5
25
34

8

votes

2 answers

Best way to store large data set using R from Twitter?

I am working on a project that aims to retrieve a large data set (i.e., tweet data which is a couple of days old) from Twitter using the twitteR library in R. I have difficulty storing tweets because my machine has only 8 GB of memory. It ran out of…

r dataset

asked Jun 18 '15 at 18:23

Digital Dude

181
1

8

votes

2 answers

Can a decision tree learn to solve an XOR problem?

I have read online that decision trees can solve XOR-type problems, as shown in images (XOR problem: 1) and (Possible solution as decision tree: 2). My question is how a decision tree can learn to solve this problem in this scenario. I just don't…

random-forest decision-trees

asked Oct 04 '19 at 12:13

lguerra

83
1
5

8

votes

3 answers

Algorithm for segmentation of sequence data

I have a large sequence of vectors of length N. I need some unsupervised learning algorithm to divide these vectors into M segments. For example: K-means is not suitable, because it puts similar elements from different locations into a single…

machine-learning clustering sequence

asked Jun 14 '15 at 10:19

generall

273
1
11

8

votes

2 answers

Visualize a horizontal box plot in R

I have a dataset like this. The data has been collected through a questionnaire and I am going to do some exploratory data analysis. windows <- c("yes", "no","yes","yes","no") sql <- c("no","yes","no","no","no") excel <-…

r visualization

asked Jun 11 '15 at 15:40

Hamideh

942
2
12
22

8

votes

4 answers

How to learn spam email detection?

I want to learn how a spam email detector is built. I'm not trying to build a commercial product; it'll be a serious learning exercise for me. Therefore, I'm looking for resources such as existing projects, source code, articles, papers, etc. that I…

machine-learning classification text-mining

asked Jun 01 '15 at 12:36

SmallChess

3,760
2
21
31

8

votes

3 answers

What can be the cause of a sudden explosion in the loss when training a CNN (Deeplab)

I am training the DeepLab CNN from https://github.com/tensorflow/models/tree/master/research/deeplab and during training I see the following loss: for the first 50k steps of training the loss is quite stable and low, and then suddenly it starts to…

neural-network deep-learning tensorflow training loss-function

asked Sep 05 '19 at 13:58

MuadDev

181
1
1
2

8

votes

2 answers

Time-series prediction: Model & data assumptions in AI/ML models vs conventional models

I was wondering if there is a good paper out there that discusses the model and data assumptions in AI/ML approaches. For example, if you look at Time Series Modelling (Estimation or Prediction) with Linear models or (G)ARCH/ARMA processes, there…

machine-learning neural-network time-series linear-regression

asked Aug 29 '19 at 06:45

Maeaex1

570
2
15

8

votes

4 answers

Why is there a difference between predicting on Validation set and Test set?

I have an XGBoost model that tries to predict whether a currency will go up or down in the next period (5 min). I have a dataset from 2004 to 2018. I randomly split the data into 95% train and 5% validation, and the accuracy on the validation set is up to 55%. When…

machine-learning xgboost

asked Aug 24 '19 at 20:10

DBSE

221
2
4

8

votes

1 answer

Complex Chunking with NLTK

I am trying to figure out how to use NLTK's cascading chunker as per Chapter 7 of the NLTK book. Unfortunately, I'm running into a few issues when performing non-trivial chunking measures. Let's start with this phrase: "adventure movies between 2000…

python nlp nltk

asked May 16 '15 at 00:15

grill

234
3
7

8

votes

1 answer

Which classification algorithms to try for classifying text data into 300 categories

I have 40,000 rows of text data from the health care domain. The data has one column for the text (2-5 sentences) and one column for its category. I want to classify the text into 300 categories. Some categories are independent while some are somewhat related.…

machine-learning classification nlp text-mining

asked May 07 '15 at 08:52

Alok Nayak

191
1
5

8

votes

2 answers

How to use Graph Neural Network to predict relationships between nodes with pytorch_geometric?

Let's say I have a partly connected graph that represents members of many unrelated communities. I would like to predict the possible friendships between members of the same community: on a sliding scale from 0 to 10, how likely would they like…

pytorch-geometric

asked Jul 31 '19 at 16:38

Soerendip

744
1
9
16

8

votes

5 answers

What is the state of the art in question generation with NLP?

I have been trying out various question generation projects available on GitHub, namely NQG, question-generation, and a lot of others, but I don't see good results from them: either they have very bad question formation or the questions generated are…

machine-learning deep-learning nlp

asked Jul 27 '19 at 07:39

Sundeep

108
1
10

8

votes

2 answers

Why is taking the gradient of the average error in SGD not correct, but rather the average of the gradients of single errors?

I am a little confused about taking averages in cost functions and SGD. So far I always thought in SGD you would compute the average error for a batch and then backpropagate it. But then I was told in a comment on this question that that was wrong.…

machine-learning optimization gradient-descent mini-batch-gradient-descent

asked Jul 25 '19 at 21:13

lo tolmencre

235
1
9

8

votes

2 answers

Which classification algorithms are negatively affected by class imbalances?

I've seen a few posts and papers floating around the web (mostly those related to over/undersampling, SMOTE, and cost-sensitive training) that, when discussing class imbalance, specify that certain algorithms are negatively impacted by class…

machine-learning classification predictive-modeling multilabel-classification class-imbalance

asked Jul 03 '19 at 19:45

Danny David Leybzon

180
2
