Questions tagged [model-selection]

Model selection is the process of comparing several models and their respective results to choose the model that is best according to some evaluation metric.

250 questions
29 votes • 4 answers

Any "rules of thumb" on number of features versus number of instances? (small data sets)

I am wondering if there are any heuristics on the number of features versus the number of observations. Obviously, if the number of features equals the number of observations, the model will overfit. By using sparse methods (LASSO, elastic net) we can…
Arnold Klein • 513 • 2 • 5 • 13
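A minimal sketch of the sparse-methods idea mentioned in this question, assuming scikit-learn and a synthetic dataset (all names and values here are illustrative): an L1 penalty drives most coefficients exactly to zero when the feature count approaches the number of observations.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 50, 40                      # few observations, almost as many features
X = rng.normal(size=(n, p))
y = X[:, 0] * 3.0 + X[:, 1] * (-2.0) + rng.normal(scale=0.1, size=n)

# The L1 penalty zeroes out most coefficients, effectively
# selecting a small subset of the 40 candidate features.
model = Lasso(alpha=0.1).fit(X, y)
n_selected = int(np.sum(model.coef_ != 0))
print(n_selected, "of", p, "features kept")
```

The surviving coefficient count gives a rough answer to the features-versus-instances trade-off: with this few observations, only a handful of features can be supported.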
17 votes • 3 answers

Why does my network need so many epochs to learn?

I'm working on a relation classification task for natural language processing and I have some questions about the learning process. I implemented a convolutional neural network using PyTorch, and I'm trying to select the best hyper-parameters. The…
15 votes • 4 answers

How to compare the performance of feature selection methods?

There are several feature selection / variable selection approaches (see for example Guyon & Elisseeff, 2003; Liu et al., 2010): filter methods (e.g., correlation-based, entropy-based, random forest importance based), wrapper methods (e.g.,…
hopfk • 341 • 2 • 10
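One common way to compare a filter method against a wrapper method, sketched here with scikit-learn (the dataset, estimator, and `k=5` choice are illustrative assumptions): wrap each selection strategy in a pipeline and score both under the same cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=300, n_features=30,
                           n_informative=5, random_state=0)

# Filter method: univariate ANOVA F-score, independent of the model.
filt = make_pipeline(SelectKBest(f_classif, k=5),
                     LogisticRegression(max_iter=1000))
# Wrapper method: recursive feature elimination driven by the estimator.
wrap = make_pipeline(RFE(LogisticRegression(max_iter=1000), n_features_to_select=5),
                     LogisticRegression(max_iter=1000))

for name, pipe in [("filter", filt), ("wrapper", wrap)]:
    scores = cross_val_score(pipe, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Keeping the selection step inside the pipeline matters: selecting features on the full data before cross-validating would leak information and inflate both scores.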
10 votes • 2 answers

How do scientists come up with the correct Hidden Markov Model parameters and topology to use?

I understand how a Hidden Markov Model is used in genomic sequences, such as finding a gene. But I don't understand how to come up with a particular Markov model. I mean, how many states should the model have? How many possible transitions? Should…
SmallChess • 3,760 • 2 • 21 • 31
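One standard answer to "how many states?" is to fit models of increasing size and compare them with an information criterion. A sketch of BIC-based state-count selection for a discrete-emission HMM; the log-likelihood values below are hypothetical stand-ins for real fits:

```python
import math

def hmm_n_params(n_states, n_symbols):
    # Free parameters of a discrete-emission HMM:
    # transitions n(n-1), initial distribution (n-1), emissions n(m-1).
    return n_states * (n_states - 1) + (n_states - 1) + n_states * (n_symbols - 1)

def bic(log_likelihood, n_params, n_obs):
    # Lower BIC is better: fit term plus a complexity penalty.
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

# Hypothetical log-likelihoods from fitting 2..5-state models to a
# sequence of 1000 symbols over a 4-letter alphabet (e.g. DNA).
fits = {2: -1380.0, 3: -1315.0, 4: -1308.0, 5: -1305.0}
scores = {k: bic(ll, hmm_n_params(k, 4), 1000) for k, ll in fits.items()}
best = min(scores, key=scores.get)
print("best number of states by BIC:", best)
```

In this toy example the 4- and 5-state models fit slightly better but pay more in penalty, so BIC picks 3 states; topology (which transitions to allow) is usually fixed from domain knowledge instead.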
10 votes • 4 answers

Which comes first: tuning the parameters or selecting the model?

I've been reading about how we split our data into 3 parts; generally, we use the validation set to help us tune the parameters and the test set to get an unbiased estimate of how well our model performs, so that we can compare models based on…
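The three-way split the question describes can be sketched in a few lines (assuming scikit-learn; the 60/20/20 proportions are an illustrative choice):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First carve off the test set, then split the rest into train/validation.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2,
                                                random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25,
                                                  random_state=0)

# Validation set: tune hyper-parameters and pick between models.
# Test set: touched once, at the very end, for an unbiased estimate.
print(len(X_train), len(X_val), len(X_test))
```

The order implied here: tuning and model selection both happen against the validation set; the test set only settles the final comparison.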
10 votes • 2 answers

Adding feature leads to worse results

I have a dataset with 20 variables and ~50K observations, and I created several new features using those 20 variables. I compared the results of a GBM model (using Python xgboost and LightGBM) and found that it doesn't matter what the…
Yaron • 201 • 1 • 2 • 5
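The standard way to check whether an engineered feature helps or hurts is a controlled cross-validated comparison. A sketch with scikit-learn's gradient boosting as a stand-in for xgboost/LightGBM (dataset and the deliberately noisy feature are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# "Engineered" feature: a very noisy copy of an existing column,
# the kind of addition that can leave scores flat or slightly worse.
rng = np.random.default_rng(0)
X_plus = np.hstack([X, X[:, :1] + rng.normal(scale=5.0, size=(500, 1))])

clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
base = cross_val_score(clf, X, y, cv=5).mean()
plus = cross_val_score(clf, X_plus, y, cv=5).mean()
print(f"without: {base:.3f}  with: {plus:.3f}")
```

If the scores barely move, the trees are likely already capturing the same signal from the original 20 variables; tree ensembles can also waste splits on noisy redundant columns.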
10 votes • 6 answers

What are some of the best practices for sharing data and models with colleagues?

As a data scientist who recently joined a new team, I wanted to ask the community how they share data and models among their colleagues. Currently I have to resort to storing data on some central server or location that all of us can access (which…
10 votes • 3 answers

Nested cross-validation and selecting the best regression model - is this the right SKLearn process?

If I understand correctly, nested CV can help me evaluate which model and hyperparameter tuning process is best. The inner loop (GridSearchCV) finds the best hyperparameters, and the outer loop (cross_val_score) evaluates the hyperparameter tuning…
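The nesting described in the question is literally one estimator inside another in scikit-learn; a minimal sketch (dataset, `SVC`, and the `C` grid are illustrative assumptions):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Inner loop: GridSearchCV picks C on each outer training fold.
inner = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=3)

# Outer loop: cross_val_score evaluates the *whole tuning procedure*,
# so the reported score is not biased by the hyper-parameter search.
outer_scores = cross_val_score(inner, X, y, cv=5)
print(f"nested-CV estimate: {outer_scores.mean():.3f}")
```

Because `GridSearchCV` is itself an estimator, passing it to `cross_val_score` re-runs the search inside every outer fold, which is exactly what makes the outer estimate honest.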
9 votes • 2 answers

Ethical consequences of non-deterministic learning processes?

Most advanced supervised learning techniques are non-deterministic by construction. The final output of the model usually depends on some random parts of the learning process. (Random weight initialization for Neural Networks or variable selection /…
Lucas Morin • 2,775 • 5 • 25 • 47
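One practical handle on the non-determinism this question raises is seed control: fixing the seed makes a stochastic learning step reproducible, while varying it exposes the run-to-run variability. A toy sketch with NumPy standing in for random weight initialization:

```python
import numpy as np

def train_step(seed):
    # Stand-in for a stochastic part of learning (e.g. weight init).
    rng = np.random.default_rng(seed)
    return rng.normal(size=3)

# Same seed -> identical "model"; different seed -> a different outcome,
# which is exactly the variability the ethical question is about.
a = train_step(42)
b = train_step(42)
c = train_step(7)
print(np.allclose(a, b), np.allclose(a, c))
```

Seeding makes individual runs auditable, but it does not remove the underlying sensitivity: a different (equally arbitrary) seed can still yield a model with different decisions.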
8 votes • 2 answers

Why are RNN/LSTM preferred in time series analysis and not other NN?

I recently had a great discussion about the advantages of RNN/LSTM in time series analysis in comparison to other neural networks like MLPs or CNNs. The other side said that the NN just has to be deep enough to model the time connections; RNNs are…
8 votes • 2 answers

Machine Learning models in production environment

Let's say a model was trained on date $dt1$ using the available labeled data, split into training and test, i.e. $train_{dt1}$, $test_{dt1}$. This model is then deployed in production and makes predictions on new incoming data. Some $X$ days pass, and…
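A common monitoring pattern for this situation: once fresh labels arrive, score the deployed model on them and compare against its training-time performance. A sketch with scikit-learn, where the drift is simulated by shifting the new data (dataset and shift are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# "dt1": the data available at training time.
X_old, y_old = make_classification(n_samples=500, n_features=8, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X_old, y_old)

# X days later: newly labeled data arrives, possibly drifted
# (here crudely simulated with a covariate shift and a new draw).
X_new, y_new = make_classification(n_samples=500, n_features=8, random_state=2)
X_new = X_new + 1.5

old_acc = model.score(X_old, y_old)
new_acc = model.score(X_new, y_new)
print(f"train-time acc: {old_acc:.3f}  fresh-data acc: {new_acc:.3f}")
```

A sustained drop on fresh data relative to $test_{dt1}$ performance is a typical trigger for retraining or for comparing the old model against one refit on the newer window.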
7 votes • 6 answers

Is there any way to explicitly measure the complexity of a Machine Learning Model in Python

I'm interested in model debugging, and one of the points it mentions is to compare your model with a "less complex" one to check whether the performance is substantially better for the more complex model than for the simpler one. So, it…
Multivac • 3,199 • 2 • 10 • 26
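There is no single complexity number, but crude structural proxies are easy to extract in Python: parameter counts for linear models, total node counts for tree ensembles. A sketch of both, assuming scikit-learn (the models and dataset are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Proxy 1: number of fitted parameters in a linear model.
lin = LogisticRegression(max_iter=1000).fit(X, y)
lin_params = lin.coef_.size + lin.intercept_.size

# Proxy 2: total decision nodes across a tree ensemble.
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
forest_nodes = sum(est.tree_.node_count for est in forest.estimators_)

print("logistic parameters:", lin_params)
print("forest tree nodes:  ", forest_nodes)
```

These proxies are only comparable within a model family; comparing across families usually falls back on behavioral measures (e.g. the performance gap against a deliberately simple baseline, as the question suggests).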
7 votes • 1 answer

Difference between empirical risk minimization and structural risk minimization?

I understand the meaning of empirical risk minimization as a separate topic and was reading about structural risk minimization; it is hard for me to understand the difference between the two. I read somewhere that the perceptron uses empirical risk…
A.B • 336 • 1 • 3 • 12
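A compact way to state the difference (a sketch, with $L$ the loss and $\mathcal{H}_1 \subset \mathcal{H}_2 \subset \dots$ a nested hierarchy of hypothesis classes): ERM minimizes the empirical risk alone, while SRM additionally trades it off against a complexity penalty over the hierarchy.

```latex
\hat{R}_n(h) = \frac{1}{n}\sum_{i=1}^{n} L\bigl(h(x_i), y_i\bigr)
\qquad \text{(empirical risk)}

h_{\mathrm{ERM}} = \arg\min_{h \in \mathcal{H}} \hat{R}_n(h)

h_{\mathrm{SRM}} = \arg\min_{k,\; h \in \mathcal{H}_k}
  \Bigl[\, \hat{R}_n(h) + \mathrm{pen}(\mathcal{H}_k, n) \,\Bigr]
```

So the perceptron's training rule is pure ERM over one fixed class, whereas regularized methods (an SVM's margin term, for instance) follow the SRM pattern of penalizing richer classes.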
7 votes • 3 answers

TypeError: Expected binary or unicode string, got [

ERROR SUMMARY: I'm getting the following error: TypeError: Expected binary or unicode string, got [ BACKGROUND: I have several features that are histories of user activity. I am trying to predict whether a given user will take an action…
Paul • 171 • 1 • 1 • 3
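This error commonly appears when a string-expecting consumer (e.g. TensorFlow feature handling) receives a Python list in a cell, which matches the `got [` in the message. One common fix, sketched here with pandas and a hypothetical `history` feature, is to flatten each list into a single delimited string:

```python
import pandas as pd

# Hypothetical feature frame: each cell of "history" is a list of events,
# which string-expecting consumers reject with "Expected binary or
# unicode string, got [...".
df = pd.DataFrame({"history": [["click", "view"], ["view"], []]})

# Join each list into one space-delimited string per row.
df["history"] = df["history"].apply(lambda xs: " ".join(xs))
print(df["history"].tolist())
```

If the lists must stay variable-length rather than being flattened, the usual alternative is a representation designed for ragged data instead of a plain string column.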
7 votes • 2 answers

How would you describe the trade-off between model interpretability and model prediction power in layman's terms?

I know it depends on the data and the question asked, but imagine a scenario where, for a given dataset, you could either go for a fairly complex nonlinear model (hard to interpret, though) giving you better prediction power, perhaps because the model may…
TwinPenguins • 4,429 • 3 • 22 • 54