Anyone who has tried doing machine learning at scale knows it can get expensive. The costs of training models on on-demand compute and serving features from low-latency databases can quickly get out of hand, and we're often forced to make hard decisions about what to include in a model to keep it financially viable.
Not all features are created equal: they vary widely in their relevance to a particular use case and in how much they cost to productionise. Data scientists therefore need to weigh a feature's cost against its impact, often with incomplete information.
Drawing on experience productionising machine learning at scale at Atlassian, this talk will explore how to make these decisions better, including:
- An exploration of the potential costs involved when productionising features
- How to estimate the cost of a feature before productionising it
- What tradeoffs can be made
- A technique for factoring in feature cost to the performance of a model during training and feature selection
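As a flavour of the last point, one common way to fold cost into feature selection is to penalise each candidate feature's estimated performance gain by a weighted cost term during greedy forward selection. The sketch below is illustrative only: the feature names, gain and cost figures, and the trade-off weight `lam` are all invented for the example, and real gains would come from cross-validation rather than a lookup table.

```python
# Cost-aware greedy forward feature selection: at each step, add the
# feature whose estimated performance gain, minus a cost penalty, is
# highest; stop when no remaining feature is worth its cost.
# NOTE: gains are treated as independent per feature here, which a
# real implementation would replace with cross-validated scores.

def cost_aware_forward_selection(features, gain, cost, lam):
    """features: iterable of feature names
    gain: dict of feature -> estimated performance gain (e.g. delta AUC)
    cost: dict of feature -> production cost (arbitrary monetary units)
    lam: trade-off weight converting cost onto the performance scale
    Returns selected features in the order they were added."""
    selected = []
    remaining = set(features)
    while remaining:
        # Score each candidate by gain minus its cost penalty.
        best = max(remaining, key=lambda f: gain[f] - lam * cost[f])
        if gain[best] - lam * cost[best] <= 0:
            break  # nothing left is worth its cost
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical example: one expensive real-time feature versus two
# cheap batch features (all numbers are made up for illustration).
gain = {"clickstream_rt": 0.04, "account_age": 0.03, "plan_tier": 0.02}
cost = {"clickstream_rt": 500.0, "account_age": 10.0, "plan_tier": 5.0}
picked = cost_aware_forward_selection(gain, gain, cost, lam=0.0001)
# The costly real-time feature is dropped despite its higher raw gain.
```

Raising `lam` makes the selection stingier; setting it to zero recovers ordinary gain-based forward selection.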