In a typical machine learning setup you probably rely on GPU power, and chances are you expect your infrastructure to scale based on GPU load. Graphics cards have their own memory and compute cores, so it's reasonable to want autoscaling triggered when either of those two resources nears saturation.
Working with a client that uses GPU instances for image recognition, I had to come up with a way to test an AWS autoscaling policy based on GPU memory and compute load in a predictable and easy manner. A quick search for GPU load generation tools turned up a couple of abandoned projects and some more-or-less active ones, but not a single tool that covered both of my use cases. So I put together a docker image that acts as a wrapper around two different open-source utilities and lets you saturate either the GPU's compute or its memory resources. The image requires nvidia-docker (I will cover nvidia-docker setup on ECS with Amazon Linux 2 in a separate post).
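Once nvidia-docker is in place, you start the container like any other image, just with the NVIDIA runtime enabled. The exact flag depends on your setup (older nvidia-docker2 installs use --runtime=nvidia, while Docker 19.03+ with the NVIDIA container toolkit uses --gpus all), so take this as a sketch rather than the one true invocation:
docker run --rm --gpus all registry.gitlab.com/yaroslav.tarasenko/gpu-loadtest compute
In the examples further down I leave the runtime flag out, which assumes the NVIDIA runtime is configured as the default on the host.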
If you run gpu-loadtest without arguments, it prints a brief help message, which essentially boils down to this: you call the image with either the compute or the memory argument. You can also pass extra options if you wish: for compute it's the time in seconds to run the load generation (10 minutes by default), for memory it's the amount of GPU memory in MB that the tool will try to use (by default, the maximum allocatable size).
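The two zero-argument variants follow directly from that: running the image with no arguments at all prints the help text, and running memory without a size makes the tool grab as much GPU memory as it can allocate (again assuming the NVIDIA runtime is the default on the host):
docker run --rm registry.gitlab.com/yaroslav.tarasenko/gpu-loadtest
docker run --rm registry.gitlab.com/yaroslav.tarasenko/gpu-loadtest memory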
Here are a couple of examples to illustrate this better:
docker run --rm registry.gitlab.com/yaroslav.tarasenko/gpu-loadtest compute 300
Load test GPU compute resources for 5 minutes (300 seconds).
docker run --rm registry.gitlab.com/yaroslav.tarasenko/gpu-loadtest memory -g 1 1024 10
Test memory on the second GPU card (-g 1, indexed from zero) by reading/writing the first 1024 MB 10 times.
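While a test is running, it helps to watch the effect from the host before trusting the autoscaling metrics; nvidia-smi works well for that (the query fields below are just one reasonable selection, refresh every 5 seconds):
nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total --format=csv -l 5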
P.S.: Check out the Dockerfile here.