Explainability and Observability Archives | Tecton


Workshop: The Key Pillars of ML Observability and How to Apply Them to Your ML Systems

Taking a model from research to production is hard, and keeping it there is even harder! As more machine learning models are deployed into production, it is imperative to have tools to monitor, troubleshoot, and explain model decisions. Join Amber Roberts, Machine Learning Engineer at Arize AI, for an overview of Arize AI’s ML Observability platform, which enables ML teams to surface, resolve, and improve model performance issues automatically.

Experience ML observability firsthand with a deep dive into the Arize platform using a practical use case. Attendees will learn how to identify segments where a model is underperforming, troubleshoot and perform root cause analysis, and proactively monitor their models for future degradations.

ralf: Real-time, Accuracy Aware Feature Store Maintenance

Feature stores are becoming ubiquitous in real-time model serving systems; however, there has been limited work on understanding how features should be maintained over changing data. In this talk, we present ongoing research at the RISELab on streaming feature maintenance that optimizes both resource costs and downstream model accuracy. We introduce a notion of feature store regret to evaluate the feature quality of different maintenance policies, and we test various policies on real-world time-series data.
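The regret idea in this abstract can be sketched in a few lines. This is a hedged illustration, not the ralf paper’s formal definition: the function name and inputs are invented here, and the actual definition may normalize or weight per key differently.

```python
def feature_store_regret(policy_losses, fresh_losses):
    """Cumulative extra prediction loss from serving policy-maintained
    (possibly stale) features instead of always-fresh ones.

    A rough sketch of the regret notion named in the abstract; the
    formal definition in the ralf work may differ (normalization,
    per-key weighting, etc.). Both arguments are per-query loss values.
    """
    if len(policy_losses) != len(fresh_losses):
        raise ValueError("loss sequences must be aligned per query")
    return sum(p - f for p, f in zip(policy_losses, fresh_losses))
```

A cheap maintenance policy with near-zero regret dominates a costly one, which is exactly the resource-cost-versus-accuracy trade-off the talk evaluates.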

More ethical machine learning using model cards at Wikimedia

First proposed by Mitchell et al. in 2018, model cards are a form of transparent reporting on machine learning models, their intended uses, and their performance, written for public audiences. As part of a broader effort to strengthen our ethical approaches to machine learning at Wikimedia, we started implementing model cards for every model hosted by the Foundation. This talk describes our process, our motivation, and the lessons we learned along the way.
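For readers unfamiliar with the format, a model card is essentially a structured document. Below is a minimal, hypothetical sketch of one as a Python record, loosely following sections proposed by Mitchell et al.; Wikimedia’s actual schema and tooling are not described in the abstract and will differ.

```python
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    """Minimal model-card record, loosely following sections from
    Mitchell et al.'s proposal. Field names are illustrative only."""
    model_name: str
    intended_use: str
    training_data: str
    evaluation_metrics: dict = field(default_factory=dict)
    ethical_considerations: str = ""

    def to_markdown(self) -> str:
        # Render the card as a human-readable document for public audiences.
        lines = [
            f"# Model Card: {self.model_name}",
            f"## Intended Use\n{self.intended_use}",
            f"## Training Data\n{self.training_data}",
            "## Metrics",
        ]
        lines += [f"- {name}: {value}"
                  for name, value in self.evaluation_metrics.items()]
        if self.ethical_considerations:
            lines.append(f"## Ethical Considerations\n{self.ethical_considerations}")
        return "\n".join(lines)
```

Keeping the card next to the model artifact makes it easy to require one card per hosted model, as the Foundation does.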

Machine Learning in Production: What I learned from monitoring 30+ models

It’s a software monitoring best practice to alert on symptoms, not on causes. “Customer Order Rate dropped to 0” is a great alert: it fires directly on a bad outcome. For machine learning stacks, this means we should focus monitoring on the output of our models. Data monitoring is also helpful, but it should come later in your maturity cycle. In this talk, I will share practical strategies for prioritizing your monitoring efforts.
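The “alert on symptoms” advice can be made concrete with a toy monitor: track the outcome you care about (orders, in the example above) over a rolling window and fire when it collapses, regardless of which upstream cause is to blame. The class name and thresholds below are invented for illustration, a sketch rather than a production design.

```python
from collections import deque

class SymptomMonitor:
    """Alert on a bad outcome (the symptom), not on upstream causes.

    Keeps a rolling window of successful outcome events (e.g. customer
    orders) and fires when the count falls below a floor. Timestamps
    are plain seconds to keep the sketch self-contained.
    """

    def __init__(self, window_seconds=300, min_events=1):
        self.window = window_seconds
        self.min_events = min_events
        self.events = deque()

    def record(self, timestamp):
        # Called once per observed outcome event (e.g. each order).
        self.events.append(timestamp)

    def should_alert(self, now):
        # Evict events outside the rolling window, then check the floor.
        while self.events and self.events[0] < now - self.window:
            self.events.popleft()
        return len(self.events) < self.min_events
```

With `min_events=1`, this is exactly the “Customer Order Rate dropped to 0” alert; raising the floor catches partial degradations too.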

Wild Wild Tests: Monitoring Recommender Systems in the Wild

As with most machine learning systems, recommender systems are typically evaluated through performance metrics computed over held-out data points. Real-world behavior, however, is far more nuanced, and case-specific tests must be employed to ensure the desired quality. We introduce RecList, a behavioral testing methodology and open-source package for RecSys, designed to scale up testing through sensible defaults, extensible abstractions, and wrappers for popular datasets.
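A behavioral test differs from a held-out metric in that it asserts a property of the system’s behavior. The sketch below is not the RecList API, just a generic example of the idea: checking that a recommender’s top-k slots are not dominated by a tiny slice of the catalog. Function names and thresholds are invented for illustration.

```python
def coverage_test(recommend, catalog, users, k=10, min_coverage=0.1):
    """Behavioral test: recommendations across a user sample should
    touch at least `min_coverage` of the catalog, guarding against a
    model that degenerates into serving only a handful of popular items.

    `recommend(user, k)` is any callable returning k item ids.
    """
    seen = set()
    for user in users:
        seen.update(recommend(user, k))
    return len(seen) / len(catalog) >= min_coverage
```

A held-out accuracy metric can look fine while this test fails, which is why case-specific behavioral checks complement offline evaluation.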

Data Observability for Machine Learning Teams

Once models go to production, observability becomes key to ensuring reliable performance over time. But what’s the difference between “ML Observability” and “Data Observability”, and how can ML engineering teams apply them to maintain model performance? Get fast, practical answers in this lightning talk by Uber’s former leader of data operations tooling and founder of the data observability company Bigeye.

Machine Learning Platform for Online Prediction and Continual Learning

This talk breaks down stage-by-stage requirements and challenges for online prediction and fully automated, on-demand continual learning. We’ll also discuss key design decisions a company might face when building or adopting a machine learning platform for online prediction and continual learning use cases.

Model Calibration in the Etsy Ads Marketplace

When Etsy displays relevant first-party ads to buyers in its marketplace, the ads are ranked using a combination of outputs from ML models. The relevance of the ads displayed to buyers, and the costs charged to sellers, are highly sensitive to the output distributions of those models. Various factors shape these output distributions, including the makeup of the training data, the model architecture, and the input features. To make the system more robust and resilient to modeling changes, we have calibrated all of the ML models that power ranking and bidding.

In this talk, we will first discuss the pain points and use cases that revealed the need for calibration in our system. We will share the journey, learnings, and challenges of calibrating our machine learning models, along with the implications of calibrated outputs. Finally, we will explain how we are using the calibrated outputs in downstream applications and explore the opportunities that calibration unlocks at Etsy.
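As a concrete illustration of what calibrated outputs mean: a calibrated score can be read as a probability, so among items scored around 0.8, roughly 80% should convert. The Etsy abstract does not specify a method; below is a hedged sketch of one of the simplest techniques, histogram binning, where each raw score is mapped to the empirical positive rate of its bin.

```python
def bin_calibrate(scores, labels, n_bins=10):
    """Fit a histogram-binning calibrator on (raw score, binary label)
    pairs and return a function mapping new scores in [0, 1] to the
    empirical positive rate of their bin. Empty bins fall back to the
    bin midpoint. A minimal sketch, not Etsy's actual method.
    """
    width = 1.0 / n_bins

    def bucket(score):
        return min(int(score / width), n_bins - 1)

    counts = [[0, 0] for _ in range(n_bins)]  # [positives, total] per bin
    for score, label in zip(scores, labels):
        b = bucket(score)
        counts[b][0] += label
        counts[b][1] += 1

    rates = [
        positives / total if total else (i + 0.5) * width
        for i, (positives, total) in enumerate(counts)
    ]
    return lambda score: rates[bucket(score)]
```

Once scores behave like probabilities, downstream bidding can treat them as expected conversion rates, which is what keeps costs charged to sellers stable under modeling changes.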

Building Malleable ML Systems through Measurement, Monitoring & Maintenance

Machine learning systems are now easier to build than ever, but they still don’t perform as well as we would hope on real applications. I’ll explore a simple idea in this talk: if ML systems were more malleable and could be maintained like software, we might build better systems. I’ll discuss an immediate bottleneck to building more malleable ML systems: the evaluation pipeline. I’ll describe the need for finer-grained performance measurement and monitoring, the opportunities that attention to this area could open up for maintaining ML systems, and some of the tools that I’m building (with great collaborators) in the Robustness Gym and Meerkat projects to close this gap.

© Tecton, Inc. All rights reserved. Various trademarks held by their respective owners.

Request a free trial

Interested in trying Tecton? Leave us your information below and we’ll be in touch.
