Questions tagged [best-practice]
7 questions
6 votes, 1 answer
What are some popular but outdated or ineffective practices in data science?
I was taught stepwise feature selection (such as forward and backward selection) in college, and at the time it seemed like a really effective way to pick features. But recently I have been reading more and realized that it’s actually considered…
Guna
3 votes, 3 answers
A good way to organize/store a lot of datasets
In machine translation, we often have bilingual datasets, e.g. for German-English and French-English we will have something that looks like this:
/en-de
    train.de
    train.en
    dev.de
    dev.en
    test.de
    test.en
/en-fr
    train.fr
…
alvas
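The layout in the excerpt above uses one directory per language pair and one file per split and language. A minimal sketch of how those paths could be enumerated follows; the helper name `corpus_paths` and the root directory `data` are hypothetical and do not come from the question.

```python
from pathlib import PurePosixPath

# Splits as they appear in the excerpt's directory listing.
SPLITS = ("train", "dev", "test")

def corpus_paths(root, pair):
    """Return {split: (first_lang_file, second_lang_file)} for a pair like 'en-de'."""
    first, second = pair.split("-")
    pair_dir = PurePosixPath(root) / pair
    return {
        split: (pair_dir / f"{split}.{first}", pair_dir / f"{split}.{second}")
        for split in SPLITS
    }

# "data" is an assumed root directory, used only for illustration.
paths = corpus_paths("data", "en-de")
print(paths["train"][0])  # data/en-de/train.en
```

Deriving the file names from the pair name keeps the two sides of each split together, which is one possible way to avoid hard-coding every file.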
1 vote, 0 answers
Best Practices for Storing "Target" Events Alongside Time Series Data in MongoDB
I’m currently storing sensor data in MongoDB using a schema like this, as recommended here:
{
    "receivedAt": ISODate, // timeField
    "node_id": "string", // metaField
    "type": "string",
    "data": { /* sensor-specific data */ }
}
Now, I want to…
trya2l
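As a rough illustration of the schema in the excerpt above, the sketch below builds one document of that shape in Python. The helper `make_reading`, the sample values, and the collection name mentioned in the comment are hypothetical; only the field names and the `timeField`/`metaField` roles come from the question.

```python
from datetime import datetime, timezone

# Time series options matching the comments in the schema:
# "receivedAt" is the timeField, "node_id" is the metaField.
# With pymongo these could be passed along the lines of
# db.create_collection("readings", timeseries=TIMESERIES_OPTIONS),
# where "readings" is an assumed collection name.
TIMESERIES_OPTIONS = {"timeField": "receivedAt", "metaField": "node_id"}

def make_reading(node_id, sensor_type, data, received_at=None):
    """Build one sensor document in the shape shown in the question."""
    return {
        "receivedAt": received_at or datetime.now(timezone.utc),
        "node_id": node_id,
        "type": sensor_type,
        "data": data,  # sensor-specific payload
    }

# Sample values are made up for illustration.
reading = make_reading(
    "node-1",
    "temperature",
    {"celsius": 21.5},
    received_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
```

The sketch does not connect to a database, so it says nothing about how the "target" events the question asks about should be stored.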
0 votes, 0 answers
Python vs. SQL: best practices for feature engineering
I have an SQL database, and there are two ways I can connect to it:
Using Azure Data Studio and running SQL commands there (either in a .sql script file or in a .ipynb notebook). The advantage of this is that you get some SQL syntax highlighting and…
Maverick Meerkat
0 votes, 1 answer
Training data includes data not needing predictions - should these be included in training? (best practice question)
Best-practice advice for linear regression: if the training data contains entries that do not need predictions, is it common to remove these entries? For example, if you are predicting a fare amount but some fares are flat-fee fares (not needing…
ssou
0 votes, 2 answers
Data Science Project Data Workflow Structure
I'm in the middle of a marketing project on predicting sales with promotions. The client has very complex business processes, so the data needs a lot of preprocessing (joins, filters, etc.). I have organized the code in different…
ru.mp
0 votes, 0 answers
Best way to store "small" data
I have a small dataset consisting of around 200 measurements of different light sources. The light is measured with a broadband and an IR diode, and the resulting voltage is sampled at 4 megasamples/second. All the measurements have a length of…
ilja