Load Testing GPUs

In a typical machine learning setup you probably rely on GPU power. And chances are, you expect your infrastructure to scale based on GPU load. Graphics cards have their own memory and compute cores, so it’s reasonable to want autoscaling to trigger when either of the two resources nears saturation.

Working with a client that uses GPU instances for image recognition, I had to come up with a way to test an AWS autoscaling policy based on GPU memory and compute load in a predictable and easy way. A quick search for GPU load generation tools revealed a couple of abandoned projects and some more-or-less active ones, but I didn’t find a single tool that covers both of my required use cases. So I built a Docker image that acts as a wrapper for two different open-source utilities and lets you saturate a GPU’s compute or memory resources. This container image requires nvidia-docker (I will cover nvidia-docker setup on ECS with Amazon Linux 2 in a separate post).

If you run gpu-loadtest without arguments, it will print a brief help message, which essentially boils down to this: you call the image with either the compute or the memory argument. You can also provide more options if you wish: for compute it’s how long to run the load generation (10 minutes by default), for memory it’s the amount of GPU memory in MB that the tool will try to use (by default, the maximum allocatable size).

Here are a couple of examples to illustrate it better:

docker run --rm registry.gitlab.com/yaroslav.tarasenko/gpu-loadtest compute 300

Load-test the GPU compute resources for 5 minutes (300 seconds).

docker run --rm registry.gitlab.com/yaroslav.tarasenko/gpu-loadtest memory -g 1 1024 10

Test memory on the second GPU card (-g 1, indexing from zero) by reading and writing the first 1024 MB 10 times.
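
To confirm that a test like the ones above really pushes the resource your autoscaling policy watches, it helps to sample the GPU from the host while the container runs. Here is a minimal sketch of how that could look, assuming nvidia-smi is available on the host. The function names format_sample and watch_gpus are my own for illustration and are not part of gpu-loadtest.

```shell
#!/bin/sh
# Sketch: watch GPU compute and memory utilization while gpu-loadtest runs.
# Assumption: nvidia-smi is installed on the host and on the PATH.
# format_sample and watch_gpus are hypothetical helper names.

# Turn one CSV sample "index, util %, used MiB, total MiB"
# into a readable line with memory usage as a percentage.
format_sample() {
  awk -F', *' '{
    mem = ($4 > 0) ? ($3 * 100) / $4 : 0   # guard against a zero total
    printf "gpu=%s compute=%d%% memory=%d%%\n", $1, $2, mem
  }'
}

# Print one line per GPU every 5 seconds until interrupted with Ctrl-C.
watch_gpus() {
  while true; do
    nvidia-smi \
      --query-gpu=index,utilization.gpu,memory.used,memory.total \
      --format=csv,noheader,nounits | format_sample
    sleep 5
  done
}

# Usage (in a second terminal while the load test is running):
#   watch_gpus
```

While the compute test runs, the compute figure for the targeted card should climb toward 100%; while the memory test runs, the memory figure should grow by roughly the amount you requested. The percentages are truncated to whole numbers, which is usually precise enough for checking a scaling threshold.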

P.S.: Check out the Dockerfile here.

2018/09/23