
I'm running a deep learning neural network that has been trained on a GPU. I now want to deploy it to multiple hosts for inference. The question is: what are the conditions for deciding whether I should use GPUs or CPUs for inference?


Adding more details from comments below.

I'm new to this so guidance is appreciated.

  • Memory: GPU is a K80

  • Framework: CUDA and cuDNN

  • Data size per workload: 20 GB

  • Computing nodes to consume: one per job, although I would like to consider a scale-out option

  • Cost: I can afford a GPU option if the reasons make sense

  • Deployment: running on our own hosted bare-metal servers, not in the cloud

Right now I'm running on CPU simply because the application runs OK. But beyond that reason, I'm unsure why one would even consider a GPU.

Dan

3 Answers


It is true that for training, a lot of the parallelization can be exploited by GPUs, resulting in much faster training. For inference there is less to parallelize, but CNNs will still get an advantage from it, resulting in faster inference. Now you just have to ask yourself: is faster inference important? Do I want these extra dependencies (a good GPU, the right drivers and libraries installed, etc.)?

If speed is not an issue, go for CPU. Note, however, that in my experience GPUs can make inference an order of magnitude faster.
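One way to ground "is faster inference important?" is to measure it. Below is a minimal, framework-agnostic timing harness in pure Python; `run_inference` here is a hypothetical stand-in that you would replace with your model's actual CPU-backed or GPU-backed forward pass and compare the two numbers:

```python
import time
import statistics

def benchmark(fn, warmup=3, runs=20):
    """Time a zero-argument callable and return (mean, stdev) in seconds.

    Warmup iterations are skipped because the first few calls often pay
    one-off initialization costs (especially on a GPU: context creation,
    kernel compilation, host-to-device transfers)."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)

# Hypothetical stand-in for a model's forward pass -- replace with
# your framework's inference call on CPU, then on GPU, and compare.
def run_inference():
    sum(i * i for i in range(10_000))

mean_s, stdev_s = benchmark(run_inference)
print(f"mean {mean_s * 1e3:.3f} ms +/- {stdev_s * 1e3:.3f} ms")
```

Run the same harness against both backends on a realistic batch size; the ratio of the two means tells you what the GPU actually buys you for your model, rather than relying on rules of thumb.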

Laurens Meeus

Running inference on a GPU instead of a CPU will give you close to the same speedup as it does for training, less a little due to memory-transfer overhead.

However, as you said, the application runs okay on CPU. If you get to the point where inference speed is a bottleneck in the application, upgrading to a GPU will alleviate that bottleneck.
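A rough way to tell whether inference speed is actually the bottleneck is to compare your measured per-item latency against the time budget your daily volume allows. The sketch below uses hypothetical numbers and assumes a single serial worker; it is a sizing heuristic, not a capacity model:

```python
def latency_budget_ms(items_per_day):
    """Serial per-item time budget in milliseconds: how long each item
    may take if items are processed one after another, around the clock.
    86_400_000 = milliseconds in a day."""
    return 86_400_000.0 / items_per_day

def inference_is_bottleneck(measured_ms, items_per_day):
    """True if measured per-item inference time exceeds the budget."""
    return measured_ms > latency_budget_ms(items_per_day)

# Hypothetical example: 50 ms per item on CPU, 2 million items/day.
print(latency_budget_ms(2_000_000))            # 43.2 ms budget
print(inference_is_bottleneck(50, 2_000_000))  # True -> consider a GPU
```

If the check comes back False with comfortable headroom, the CPU deployment is fine; if it is True (or latency matters for interactive users), that is the point where the GPU's order-of-magnitude speedup starts paying for itself.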

mpotma

You'd typically only use a GPU for training, because deep learning requires massive computation to arrive at an optimal solution. However, you don't need GPU machines for deployment.

Let's take Apple's new iPhone X as an example. The iPhone X ships with an advanced machine learning model for facial detection. Apple engineers must have a cluster of machines for training and validation. But your iPhone X doesn't need a GPU just to run the model.

SmallChess