I'm very new to machine learning. I am doing a project for a subject called parallel and distributed computing, in which we have to speed up a heavy computation using parallelism or distributed computing. My idea was to divide a dataset into equal parts and train a separate neural network on each subset on a separate machine in the cloud. Once the models are trained, they would be returned to me and somehow combined into a single model. I am aware of federated learning, but it doesn't quite fit my scenario, where I actually split the dataset and send the parts to the cloud. Does someone know any feasible approaches (maybe a variant of federated learning) for doing this?
1 Answer
There are many ways to parallelize machine learning. It is often better to distribute the model parameters, not the data.
Training each model on only a subset of the data will result in worse parameter estimates than training a single model on random samples drawn from the whole dataset.
Additionally, moving data around is more expensive than moving parameters.
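To see why parameter averaging over data shards can degrade estimates, here is a minimal, hypothetical sketch of the asker's plan using closed-form linear regression as a stand-in for training a neural network: the data is split into equal shards, one model is fit per shard (each shard could live on a separate machine), and the trained models are combined by averaging their parameters, which is the core idea behind federated averaging.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: y = X @ true_w + noise
n_samples, n_features = 1000, 5
true_w = rng.normal(size=n_features)
X = rng.normal(size=(n_samples, n_features))
y = X @ true_w + 0.1 * rng.normal(size=n_samples)

def fit_least_squares(X, y):
    """Closed-form linear regression; stands in for training one network."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

# Split the data into 4 equal shards and fit one model per shard.
shard_models = [fit_least_squares(Xs, ys)
                for Xs, ys in zip(np.array_split(X, 4), np.array_split(y, 4))]

# Combine the shard models by averaging their parameters.
averaged_w = np.mean(shard_models, axis=0)

# Baseline: a single model trained on all of the data.
full_w = fit_least_squares(X, y)

print("error of averaged model:", np.linalg.norm(averaged_w - true_w))
print("error of full-data model:", np.linalg.norm(full_w - true_w))
```

For a linear model on i.i.d. shards the averaged estimate is close to the full-data fit, but for neural networks the averaged weights generally do not correspond to any trained network, which is why federated-style schemes average repeatedly during training rather than once at the end.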
Brian Spiering