Questions tagged [mlops]
21 questions
9
votes
2 answers
MLOps for beginner
I have one year of experience in ML and have so far been using Jupyter notebooks to build static models, run some analyses, and present my results to my bosses, since it was all POC work.
Now, we would like to scale the solution to become automatic and be able to…
The Great
8
votes
1 answer
MLflow real-world experience
Can someone provide a summary of their real-world deployment experience with MLflow? We have a few ML models (e.g., LightGBM, TensorFlow v2) and want to avoid frameworks like SageMaker (due to a customer requirement). So we are looking into various…
David293836
8
votes
1 answer
How to Combat Data Drift
I have customer demographic data that include columns like: age, the first half of the postcode, occupation (there is a defined list of possible occupations), and more. Each month I get a new batch of 1000 rows of this type of data (which is not…
scott lucas
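A minimal sketch of one way to flag drift in a categorical column such as occupation, by comparing a reference batch with a new monthly batch. The total variation distance used here, the function names, and the sample occupation values are illustrative assumptions; they are not taken from the question or its answers.

```python
from collections import Counter


def category_shares(values):
    """Return the share of each category in a list of values."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


def total_variation_distance(reference, batch):
    """Half the sum of absolute share differences; 0 means identical shares."""
    p = category_shares(reference)
    q = category_shares(batch)
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


# Hypothetical occupation values, for illustration only.
reference = ["nurse"] * 5 + ["teacher"] * 5
new_batch = ["nurse"] * 8 + ["teacher"] * 2

tvd = total_variation_distance(reference, new_batch)
print(round(tvd, 3))
```

A value near 0 means the category shares barely moved. Which threshold counts as drift is a judgment call that depends on the data.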
7
votes
1 answer
What is the difference between Covariate Shift, Label Shift, Concept Shift, Concept Drift, and Prior Probability Shift?
As a beginner in MLOps, I was overwhelmed by some confusing definitions.
As far as I understand, when we have a classifier or regressor with a function y = f(X):
Covariate Shift is a change in the distribution of the independent variables (X),
Label Shift…
Mohsen Mahmoodzadeh
4
votes
2 answers
Meaningfully compare target vs observed TPR & FPR
Suppose I have a binary classifier $f$ which acts on an input $x$. Given a threshold $t$, the predicted binary output is defined as:
$$
\widehat{y} = \begin{cases}
1, & f(x) \geq t \\
0, & f(x) < t
\end{cases}
$$
I then compute the $TPR$…
Alexandru Dinu
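The threshold rule in this question can be sketched in plain Python, together with the usual definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN). The function names, scores, and labels below are made up for illustration.

```python
def predict(scores, t):
    """Apply the threshold rule: 1 if f(x) >= t, else 0."""
    return [1 if s >= t else 0 for s in scores]


def tpr_fpr(y_true, y_pred):
    """Compute (TPR, FPR) from binary labels and binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical classifier outputs f(x) and ground-truth labels.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
y_true = [1, 1, 0, 1, 0]

y_pred = predict(scores, t=0.5)
tpr, fpr = tpr_fpr(y_true, y_pred)
print(y_pred, tpr, fpr)
```

Re-running `tpr_fpr` over a range of thresholds shows how the observed rates move relative to a target.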
3
votes
1 answer
Which ML models can be saved using HDF5?
I've read that the HDF5 format can be used to save machine learning models. However, when using a trained CNNClassifier instance from sktime:
import h5py
from sktime.classification.deep_learning.cnn import CNNClassifier
cnn = CNNClassifier(n_epochs=100,…
Rubem Pacelli
2
votes
1 answer
Data preprocessing framework/library alternatives
I am currently working on some Python machine learning projects that will soon be deployed to production. As such, our team is interested in doing this the most "correct" way, following MLOps principles.
Specifically, I am currently…
neondot42
1
vote
3 answers
Prompt Ops Alternatives
What are the main alternatives for prompt ops nowadays? By prompt ops, I mean a comprehensive solution for tracking prompt engineering experiments and also registering prompts in different stages, similar to how I would with an ML model in a model…
Raffaele
1
vote
0 answers
Integrating MLflow and SageMaker for a More Robust ML Model Deployment Pipeline
I'm seeking advice on enhancing the deployment pipeline of a machine learning model that's accessed via FastAPI in production. My goal is to replace the existing setup with a more robust and efficient system that includes built-in model…
Daniel Ben Zaken
1
vote
1 answer
Should I apply the same data transformations in production for my classification model's inference steps?
I am now moving my best classification model to production and am currently running tests.
Should I use the same scaler() I used in training during my inference in production?
Also, what should I do if I used SMOTE during training? Should I also apply…
easymoneysniper
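A rough sketch of the pattern this question asks about: fit the scaling statistics on the training data once, then reuse that same fitted object at inference time instead of refitting on production data. `SimpleStandardScaler` is a hypothetical stand-in written for this example, not a library class, and the numbers are invented. SMOTE is not covered here.

```python
import statistics


class SimpleStandardScaler:
    """Hypothetical minimal scaler: stores the training mean and std."""

    def fit(self, values):
        self.mean_ = statistics.fmean(values)
        # Fall back to 1.0 so a constant feature does not divide by zero.
        self.std_ = statistics.pstdev(values) or 1.0
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]


# Training time: fit once on training data.
train_feature = [10.0, 20.0, 30.0]
scaler = SimpleStandardScaler().fit(train_feature)

# Inference time: reuse the fitted scaler, do not call fit again.
production_feature = [20.0, 40.0]
scaled = scaler.transform(production_feature)
print(scaled)
```

In a real pipeline the fitted scaler would typically be saved alongside the model so that both are loaded together at inference time.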
1
vote
1 answer
Training a CNN in production on new data
How should I approach training a convolutional neural network in production on new data when I detect model performance degradation due to data or concept drift? Resources like this one and this one lead me to conclude that I need to fine-tune the…
Fijoy Vadakkumpadan
1
vote
1 answer
Sustain learning separately - continuous learning
This question seeks suggestions on how to architect a continuous learning approach in a distributed manner. Let me explain the situation:
In my classification problem, the number of classes can grow large over time, as…
Sandeep Bhutani
1
vote
0 answers
How is model scheduling set up in practice?
I have worked on various machine learning models so far, but never on the deployment phase of an ML project. I have used Apache Airflow a little and I'm aware that it is a tool for scheduling DAGs, but I have never set up such scheduling on…
lazarea
1
vote
2 answers
Automate Clustering predictions and RFM metrics
We did a POC for customer segmentation and followed the approach below:
a) Extract data from the source system (SAP BusinessObjects)
b) Use a Python Jupyter notebook to manipulate, merge, and group data (multiple CSV files)
c) We cluster based on some…
The Great
0
votes
0 answers
Is my idea of a Feature Store wrong?
Cross-posted on Reddit ML.
Should a Feature Store be part of an enterprise data catalog?
To me, a feature store seems to be a highly niche data catalog but missing a lot of the benefits of having an enterprise data catalog / data discovery tool. My…
Pouya Barrach-Yousefi