A Point in Time: Mutable Data in Online Inference
Most business applications mutate relational data. Online inference often runs on this mutable data, so training data should reflect each object's state at the prediction's "point in time". There are a number of data architecture / domain …
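The point-in-time requirement can be sketched with a backward as-of join: for each prediction, take the latest feature value known before the prediction timestamp, never a future one. This is a minimal illustration using pandas; the table and column names are hypothetical, not from the talk.

```python
import pandas as pd

# Hypothetical mutable state: account balances, updated over time.
balances = pd.DataFrame({
    "account_id": [1, 1, 2],
    "updated_at": pd.to_datetime(["2023-01-01", "2023-01-10", "2023-01-05"]),
    "balance": [100.0, 250.0, 40.0],
})

# Hypothetical prediction log: when inference happened for each object.
predictions = pd.DataFrame({
    "account_id": [1, 2],
    "predicted_at": pd.to_datetime(["2023-01-07", "2023-01-06"]),
})

# Point-in-time join: for each prediction, pick the most recent balance
# with updated_at <= predicted_at. Account 1 gets the Jan 1 value (100.0),
# not the Jan 10 value that did not yet exist at prediction time.
training = pd.merge_asof(
    predictions.sort_values("predicted_at"),
    balances.sort_values("updated_at"),
    left_on="predicted_at",
    right_on="updated_at",
    by="account_id",
    direction="backward",
)
```

Using `direction="backward"` is what prevents future-state leakage into the training set.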
Redis as an Online Feature Store
Feature stores are becoming an important component of any ML/AI architecture today. What is a feature store? In a nutshell, it lets you build and manage features for the training phase (offline feature store) and inference …
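The online-store pattern the abstract describes boils down to a fast key-value read at inference time. The sketch below uses an in-process stand-in class so it runs without a server; a real Redis client (e.g. redis-py) exposes analogous hash commands such as `hset`/`hgetall`. All keys and feature names are illustrative, not from the talk.

```python
# Stand-in for a Redis connection: a dict of hashes keyed by entity.
class FakeRedis:
    def __init__(self):
        self._data = {}

    def hset(self, key, mapping):
        # Write (or update) the hash stored at `key`.
        self._data.setdefault(key, {}).update(mapping)

    def hgetall(self, key):
        # Read the whole hash back for serving.
        return dict(self._data.get(key, {}))

store = FakeRedis()

# Ingestion path: a feature pipeline materializes the latest values
# per entity into the online store.
store.hset("features:user:42", mapping={"txn_count_7d": 13, "avg_spend": 57.2})

# Serving path: at inference time, the model fetches its feature
# vector with a single keyed lookup.
features = store.hgetall("features:user:42")
```

The key design point is that serving reads are keyed lookups, not queries, which is what makes low-latency online inference feasible.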
Evolution and Unification of Pinterest ML Platform
As Pinterest grew, machine learning use cases emerged organically across multiple teams, leading to a proliferation of technical approaches with bespoke infrastructure. The ML Platform team has been driving Pinterest Engineering to …
MLOps Done Right with Centralized Model Performance Management Powered by XAI
Machine learning can bring a business additional revenue and competitive advantage. But because models depend heavily on data, it is natural for their performance to degrade over time. Whether from data drift or integrity issues, …
Feature Stores at Tide
After a brief introduction to Tide, we’ll talk about the challenges Tide faced in quickly productionizing models, why we decided to move forward with a feature store, and how it interacts with rules-based engines.
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning Without a Data Lake
Machine learning (ML) workloads divide into model training and model inference. ML frameworks typically use a data lake like HDFS or S3 to process historical data and train analytic models. But it’s possible to avoid such a data store entirely, using …
Towards Reproducible Machine Learning
We live in a time of both feast and famine in machine learning. Large organizations are publishing state-of-the-art models at an ever-increasing rate, but the average data scientist faces daunting challenges in reproducing those results themselves. Even in …
Third Generation Production ML Architectures: Lessons from History, Experiences with Ray
Production ML architectures (deployed at scale in production) are evolving at a rapid pace. We suggest there have been two generations so far: the first consisted of fixed-function pipelines with predetermined stages; the second …
Supercharging our Data Scientists’ Productivity at Netflix
Netflix’s unique culture affords its data scientists an extraordinary amount of freedom. They are expected to build, deploy, and operate large machine learning workflows autonomously with only limited experience in systems or data engineering. …
Data Observability: The Next Frontier of Data Engineering
As companies become increasingly data driven, the technologies underlying these rich insights have grown more nuanced and complex. While our ability to collect, store, aggregate, and visualize this data has largely kept up with the needs of modern …
Hamilton: a Micro Framework for Creating Dataframes
At Stitch Fix we have 130+ “Full Stack Data Scientists” who, in addition to doing data science work, are also expected to engineer and own data pipelines for their production models. One data science team, the Forecasting, Estimation, and Demand …
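Hamilton's central idea can be sketched without the library itself: each function declares one output column, and its parameter names declare the columns it depends on, so a small resolver can wire the graph automatically. The functions and resolver below are illustrative stand-ins, not Hamilton's actual API.

```python
import inspect

# Each function's name is the column it produces; its parameter names
# are the columns (or raw inputs) it consumes.
def spend_per_signup(spend: list, signups: list) -> list:
    return [s / n for s, n in zip(spend, signups)]

def spend_doubled(spend: list) -> list:
    return [2 * s for s in spend]

def resolve(funcs, inputs, output):
    """Compute `output` by recursively resolving its dependencies."""
    if output in inputs:
        return inputs[output]
    fn = funcs[output]
    # Parameter names drive the dependency graph, no manual wiring.
    args = {p: resolve(funcs, inputs, p)
            for p in inspect.signature(fn).parameters}
    return fn(**args)

funcs = {f.__name__: f for f in (spend_per_signup, spend_doubled)}
inputs = {"spend": [10.0, 20.0], "signups": [2, 4]}
result = resolve(funcs, inputs, "spend_per_signup")
```

Because dependencies are declared by naming convention rather than imperative glue code, each column's logic stays a small, independently testable function.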
Tying the Room Together: Apache Arrow and the Next Generation of Data Analytics Systems
In this talk, I will discuss the coming architectural shift in data analytics systems enabled by the widespread adoption of Apache Arrow, a universal, language-independent standard for analytical data processing, and companion …