Production ML architectures (deployed at scale in production) are evolving at a rapid pace. We suggest there have been two generations so far: the first generation were very much fixed function pipelines with predetermined stages, the second generation was pluggable components with a bit more flexibility but still pretty constrained. If history is a guide (especially looking at the evolution of GPU APIs), the third generation is going to come from making the computational power accessible and flexible.
We share our experiences with Ray, a system that makes distributed computing accessible and flexible. We give a two slide introduction to Ray, and show how Ray’s flexibility enables approaches like online reinforcement learning that are not easy to fit in to existing production ML architectures without some serious shoe-horning.
We then outline how different companies (such as Uber, Ant Financial, McKinsey) are using Ray in a way that extends beyond the constraints of existing second generation architectures.