At Stitch Fix we have 130+ “Full Stack Data Scientists” who, in addition to doing data science work, are also expected to engineer and own data pipelines for their production models. One data science team, the Forecasting, Estimation, and Demand team, was in a bind. Their data generation process was causing iteration and operational frustrations in delivering time-series forecasts for the business. In this talk I’ll present Hamilton, a novel Python microframework that solved their pain points by changing their working paradigm.
Specifically, Hamilton enables a simpler paradigm for a data science team to create, maintain, and execute code for generating wide dataframes, especially when there are many inter-column dependencies. Hamilton does this by building a DAG of dependencies directly from Python functions defined in a special manner, which also makes unit testing and documentation easy; tune in to the talk to find out how. I’ll also cover our experience migrating to it and using it in production for over a year, along with possible future directions.
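To make the idea of building a DAG directly from Python functions concrete, here is a minimal sketch. It is not Hamilton's actual API. It assumes one plausible reading of "defined in a special manner": each function's name is the column it produces, and each parameter name is a column it depends on. The column names (`spend`, `signups`, and so on) and the `compute` helper are hypothetical, and plain lists stand in for dataframe columns to keep the example dependency-free.

```python
import inspect

# Hypothetical column definitions (not from the talk): the function name is
# the output column, and the parameter names are the columns it depends on.
def avg_3wk_spend(spend):
    """Trailing average of spend over up to the last three weeks."""
    out = []
    for i in range(len(spend)):
        window = spend[max(0, i - 2): i + 1]
        out.append(sum(window) / len(window))
    return out


def spend_per_signup(spend, signups):
    """Spend divided by signups, week by week."""
    return [s / n for s, n in zip(spend, signups)]


def spend_ratio_to_avg(spend, avg_3wk_spend):
    """Each week's spend relative to its trailing average."""
    return [s / a for s, a in zip(spend, avg_3wk_spend)]


def compute(requested, functions, inputs):
    """Illustrative resolver: treat parameter names as DAG edges and compute
    only the columns needed for the requested outputs. No cycle detection."""
    funcs = {f.__name__: f for f in functions}
    cache = dict(inputs)  # raw input columns are the leaves of the DAG

    def resolve(name):
        if name not in cache:
            if name not in funcs:
                raise KeyError(f"no input or function provides column {name!r}")
            fn = funcs[name]
            deps = inspect.signature(fn).parameters
            cache[name] = fn(**{dep: resolve(dep) for dep in deps})
        return cache[name]

    return {name: resolve(name) for name in requested}


result = compute(
    requested=["spend_per_signup", "spend_ratio_to_avg"],
    functions=[avg_3wk_spend, spend_per_signup, spend_ratio_to_avg],
    inputs={"spend": [10, 20, 30], "signups": [1, 2, 3]},
)
```

Under this assumption, unit testing follows naturally: every column is an ordinary function that can be called on small inputs in isolation, and its docstring documents the column it defines.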