Questions tagged [best-practice]

7 questions
6 votes, 1 answer

What are some popular but outdated or ineffective practices in data science?

I was taught stepwise feature selection (such as forward and backward selection) in college, and at the time it seemed like a really effective way to pick features. But recently I have been reading more and realized that it’s actually considered…
3 votes, 3 answers

A good way to organize/store a lot of datasets

In machine translation, we often have bilingual datasets; e.g., for German-English and French-English we will have something that looks like this: /en-de train.de train.en dev.de dev.en test.de test.en /en-fr train.fr …
alvas
1 vote, 0 answers

Best Practices for Storing "Target" Events Alongside Time Series Data in MongoDB

I’m currently storing sensor data in MongoDB using a schema like this, as recommended here: { "receivedAt": ISODate, // timeField "node_id": "string", // metaField "type": "string", "data": { /* sensor-specific data */ } } Now, I want to…
trya2l
0 votes, 0 answers

Python vs. SQL: best practices for feature engineering

I have an SQL database, and there are two ways I can connect to it: using Azure Data Studio and running SQL commands there (either with a .sql script file or with an .ipynb notebook). The advantage of this is that you get some SQL syntax highlighting and…
0 votes, 1 answer

Training data includes data not needing predictions - should these be included in training? (best practice question)

Best practice advice for linear regression - if training data contains entries that do not need predictions, is it commonplace to remove these entries? For example, if you are predicting a fare amount but some fares are flat fee fares (not needing…
ssou
0 votes, 2 answers

Data Science Project Data Workflow Structure

I'm in the middle of a marketing project on sales prediction with promotions. The client has very complex business processes, so the data needs a lot of preprocessing (joins, filters, etc.). I have organized the code in different…
ru.mp
0 votes, 0 answers

Best way to store "small" data

I have a small dataset consisting of around 200 measurements of different light sources. The light is measured with a broadband diode and an IR diode, and the resulting voltage is sampled at 4 megasamples/second. All the measurements have a length of…