A reference implementation for performing offline inference using multiple GPUs, with each GPU hosting one instance of the model. Surprisingly, I couldn't find existing tools that easily support this. Therefore, I had to manually launch several server instances on different ports and use Ray's data ...
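The setup described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the project's actual code. It assumes one server process per GPU, pinned with the `CUDA_VISIBLE_DEVICES` environment variable, and consecutive ports starting at a base port. It uses simple round-robin sharding of prompts across the servers and does not use Ray. `plan_instances`, `shard_round_robin`, `Instance`, and the `my_server --port {port}` command are hypothetical placeholders. Substitute your real server command.

```python
import os
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Instance:
    """One model server pinned to a single GPU (hypothetical helper type)."""
    gpu_id: int
    port: int
    env: Dict[str, str]
    cmd: List[str]

    @property
    def url(self) -> str:
        return f"http://localhost:{self.port}"


def plan_instances(num_gpus: int, base_port: int,
                   cmd_template: Sequence[str]) -> List[Instance]:
    """Build one launch spec per GPU; GPU i listens on base_port + i.

    `cmd_template` is whatever command starts your model server (an
    assumption: any "{port}" placeholder in it is replaced per instance).
    """
    instances = []
    for gpu in range(num_gpus):
        port = base_port + gpu
        env = dict(os.environ)
        env["CUDA_VISIBLE_DEVICES"] = str(gpu)  # process sees only this GPU
        cmd = [arg.replace("{port}", str(port)) for arg in cmd_template]
        instances.append(Instance(gpu, port, env, cmd))
    return instances


def shard_round_robin(prompts: Sequence[str],
                      instances: Sequence[Instance]) -> Dict[str, List[str]]:
    """Assign prompt i to instance i % len(instances), keyed by server URL."""
    if not instances:
        raise ValueError("need at least one instance")
    shards: Dict[str, List[str]] = {inst.url: [] for inst in instances}
    for i, prompt in enumerate(prompts):
        shards[instances[i % len(instances)].url].append(prompt)
    return shards


if __name__ == "__main__":
    # "my_server --port {port}" is a hypothetical placeholder command.
    plan = plan_instances(2, 8000, ["my_server", "--port", "{port}"])
    for inst in plan:
        print(inst.gpu_id, inst.url, inst.cmd)
        # To actually start it: subprocess.Popen(inst.cmd, env=inst.env)
    print(shard_round_robin(["a", "b", "c"], plan))
```

Round-robin keeps the sketch dependency-free. If prompt lengths vary a lot, a shared work queue that each server pulls from would balance load better than a fixed split.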