Tecton is the interface between raw data and production ML models. Sitting at the intersection of data and AI gives Tecton powerful visibility into feature lineage – namely, which raw data ultimately powers which models.
Today we’re excited to announce the new Tecton Dataflow diagram. The Dataflow diagram renders an end-to-end visualization of feature pipelines, from data sources through serving features in production.
Dataflow has been designed to provide a full view of every Tecton resource in a workspace and how they all work together to produce the real-time features powering your production machine learning applications. In this article, we’ll discuss why lineage is so important when creating and using machine learning features, the new aspects of Dataflows that make tracking the lineage of Tecton resources even easier, and how Tecton solves the lineage problem.
Why lineage matters
Change is constant in enterprise machine learning organizations. Teams are continually working to improve the performance and efficiency of the data, features, and models that make up production ML applications. As the number of moving parts increases, it quickly becomes critical to track where features are derived from and where they are served, to ensure pipelines are operating as well as possible.
For example, a Data Engineer may want to update a data source by changing a table name, dropping or adding a column, or changing a data type. Without proper lineage tracking, such a change can break feature pipelines and silently degrade model performance. With Dataflow, the Data Engineer can see every Feature Service that references the updated data source and alert the owners before the change takes effect.
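To make the impact analysis above concrete, here is a minimal sketch of the underlying idea in plain Python. It does not use the Tecton SDK: the lineage graph, the resource names, and the `affected_feature_services` helper are all hypothetical and exist only to illustrate how a lineage graph can answer "which Feature Services depend on this data source?"

```python
from collections import deque

# Hypothetical lineage graph: each resource maps to the resources that consume it.
# All names are illustrative and not taken from a real Tecton workspace.
DOWNSTREAM = {
    "transactions_batch": ["user_txn_counts", "merchant_fraud_rate"],
    "users_table": ["user_age_bucket"],
    "user_txn_counts": ["fraud_detection_service"],
    "merchant_fraud_rate": ["fraud_detection_service"],
    "user_age_bucket": ["recommendation_service"],
}

# Hypothetical set of nodes in the graph that are Feature Services.
FEATURE_SERVICES = {"fraud_detection_service", "recommendation_service"}


def affected_feature_services(source, downstream=DOWNSTREAM, services=FEATURE_SERVICES):
    """Return the sorted Feature Services reachable from `source`."""
    seen = {source}
    queue = deque([source])
    # Breadth-first traversal over everything downstream of the source.
    while queue:
        node = queue.popleft()
        for child in downstream.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen & services)


if __name__ == "__main__":
    print(affected_feature_services("transactions_batch"))
```

In this toy graph, a change to `transactions_batch` reaches only `fraud_detection_service`, so that is the owner to notify before the change ships.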
Similarly, an ML Engineer or Data Scientist may want to alter feature transformation logic or add new features. A quick look at a Workspace’s Dataflow shows every Feature Service that depends on the altered features, helping ensure that updates do not cause unintended issues through cascading changes. Tecton’s plan/apply paradigm adds a further safeguard by automatically flagging all dependencies on the features to be updated and requiring confirmation before proceeding with the changes.
Tracking Lineage with Tecton Dataflow
With Dataflow, Tecton visualizes all the feature pipelines in a Workspace, from data source all the way through to production models. This visualization can promote feature and data source re-use and standardization by making duplicate and unused resources in a Workspace easy to identify and trace.
Understanding Model Inputs
When you highlight a Feature Service, the Dataflow diagram shows the data sources it uses. A Data Scientist using a Feature Service to develop a new model can use the Dataflow diagram to quickly identify other data sources and features that have already been created and are available to use. These additional model inputs could improve model performance if they were added to the Feature Service. Data Engineers can also benefit from regularly reviewing Feature Services in a Dataflow diagram to ensure that data sources and features are being reused across Feature Services as much as possible. Duplicate model inputs can lead to unnecessary costs and unintended performance issues, but a Dataflow diagram makes it easy to spot and consolidate these duplicates.
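The checks described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not Tecton's API: the workspace contents, the "transformation label" used to detect duplicates, and all three helper functions are hypothetical.

```python
# Hypothetical workspace: feature view -> (data sources it reads, transformation label).
# Two feature views with the same sources and label are treated as duplicates.
FEATURE_VIEWS = {
    "user_txn_count_7d": (("transactions_batch",), "count_7d"),
    "user_txn_count_weekly": (("transactions_batch",), "count_7d"),
    "user_age_bucket": (("users_table",), "bucket_age"),
}

# Hypothetical Feature Services and the feature views each one serves.
FEATURE_SERVICES = {
    "fraud_detection_service": ["user_txn_count_7d"],
    "churn_service": ["user_txn_count_weekly"],
}

# Hypothetical list of every data source registered in the workspace.
ALL_SOURCES = {"transactions_batch", "users_table", "clickstream"}


def sources_for_service(service):
    """Data sources that ultimately feed the given Feature Service."""
    return sorted(
        {src for fv in FEATURE_SERVICES[service] for src in FEATURE_VIEWS[fv][0]}
    )


def duplicate_feature_views():
    """Groups of feature views that share the same sources and transformation."""
    groups = {}
    for name, signature in FEATURE_VIEWS.items():
        groups.setdefault(signature, []).append(name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


def unused_sources():
    """Data sources that no feature view reads from."""
    used = {src for sources, _ in FEATURE_VIEWS.values() for src in sources}
    return sorted(ALL_SOURCES - used)


if __name__ == "__main__":
    print(sources_for_service("fraud_detection_service"))
    print(duplicate_feature_views())
    print(unused_sources())
```

Here the two transaction-count feature views would be flagged as candidates for consolidation, and `clickstream` would surface as a data source that nothing consumes yet.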
Understanding Data Consumers
Likewise, Dataflow lets a data owner easily track every materialized data source. This provides a quick way to monitor and control the costs associated with an Online Store. If a Feature View that uses the Online Store is not yet in production, it may be more cost-efficient to turn off its materialization to the Online Store until it is ready for production.
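As a rough sketch of that cost check, the snippet below lists Feature Views that are materialized online but not referenced by any production Feature Service. It is plain Python rather than the Tecton SDK, and the `FeatureView` class, its `online` flag, and the sample data are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureView:
    name: str
    online: bool  # hypothetical flag: is this view materialized to the Online Store?


# Hypothetical workspace contents.
FEATURE_VIEWS = [
    FeatureView("user_txn_count_7d", online=True),
    FeatureView("experimental_embedding", online=True),
    FeatureView("user_age_bucket", online=False),
]

# Hypothetical production Feature Services and the feature views they serve.
PRODUCTION_SERVICES = {"fraud_detection_service": ["user_txn_count_7d"]}


def online_but_not_in_production(feature_views, production_services):
    """Names of online-materialized views that no production service references."""
    in_production = {fv for fvs in production_services.values() for fv in fvs}
    return [
        fv.name
        for fv in feature_views
        if fv.online and fv.name not in in_production
    ]


if __name__ == "__main__":
    print(online_but_not_in_production(FEATURE_VIEWS, PRODUCTION_SERVICES))
```

In this example only `experimental_embedding` is reported, making it a candidate for pausing online materialization until it is ready for production.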
Understanding Feature Pipelines
Dataflows showcase Workspaces by illustrating the different types of feature pipelines Tecton orchestrates in a centralized feature platform. Our UI improvements can help you better understand all the pipelines running in a Workspace and the Tecton resources that make up each one. This can be especially useful for new Tecton users who want to learn how it works and which compute is used in different parts of a feature pipeline.
Using Dataflow diagrams
These diagrams provide a great new overview of everything happening within your Tecton Workspace – the resources every user has built, the lineage that connects them, and the pipelines Tecton automatically orchestrates with them. If you would like to start seeing Dataflows in your account, please contact your Tecton representative. These diagrams will display any resource you create with Tecton, including resources built with Rift, Tecton’s new AI-optimized, Python-based compute engine for batch, streaming, and real-time features. For more information on Rift, please see our announcement.