If you work with machine learning in any form, you know the goal: getting powerful models to production, faster. But to develop and scale machine learning systematically, individuals, teams, and organizations need the ability to tame, train, and manage the data and underlying systems that fuel predictive applications and products in production, from input to output.
With each new release, Tecton strives to help its customers do just that: transform raw data into powerful predictive signals and use those signals on demand to power predictive models, cost-efficiently.
Our latest release, Tecton 0.5, has exciting new capabilities designed to give our customers more flexibility and control over their features and underlying systems—all the while accelerating their journey toward real-time machine learning.
Advanced Data Flexibility & Quality Capabilities
A model is only as good as the data that powers it. That’s why Tecton 0.5 improves how you can access and interact with data.
Serverless feature retrieval. No Spark required!
What this means: Tecton’s SDK can now leverage AWS Athena compute—an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL—to generate training data sets from materialized features.
Why this matters: This enables fast offline feature retrieval without the need for Spark. It’s particularly useful if you want to generate training data sets using Tecton as part of Airflow, Kubeflow, Dagster, etc.
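To make the idea concrete, here is a simplified sketch of the kind of point-in-time SQL an Athena-backed retrieval layer could run over materialized features in S3. This is illustrative only: the table names, column names, and query shape are hypothetical and do not reflect Tecton's actual generated queries.

```python
# Build an illustrative point-in-time retrieval query: for each spine row,
# pick the latest materialized feature row whose timestamp is at or before
# the spine's event timestamp. All identifiers here are hypothetical.

def build_training_query(spine_table, feature_table, join_key, feature_cols):
    inner_cols = ", ".join(f"f.{c}" for c in feature_cols)
    outer_cols = ", ".join(feature_cols)
    return (
        f"SELECT {join_key}, event_ts, {outer_cols} FROM ("
        f"SELECT s.{join_key}, s.event_ts, {inner_cols}, "
        f"ROW_NUMBER() OVER (PARTITION BY s.{join_key}, s.event_ts "
        f"ORDER BY f.feature_ts DESC) AS rn "
        f"FROM {spine_table} s LEFT JOIN {feature_table} f "
        f"ON s.{join_key} = f.{join_key} AND f.feature_ts <= s.event_ts"
        f") t WHERE rn = 1"
    )

query = build_training_query(
    "training_spine", "user_features_offline",
    "user_id", ["txn_count_7d", "avg_amount_30d"])
```

Because the heavy lifting is a SQL window function over Parquet files in S3, an orchestrator like Airflow can generate training data without spinning up a Spark cluster.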
Unlimited data source flexibility with Spark data source functions
What this means: You can now use functions to define data sources for both batch and streaming Spark features. Whatever you can do in an interactive Spark notebook, you can now do in Tecton.
Why this matters: By simply writing any PySpark function that returns a DataFrame, you have unlimited flexibility in data source types, authentication mechanisms, schema registry integrations, partition filtering logic, and more.
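The shape of such a data source function might look like the sketch below: any callable that takes a SparkSession and returns a DataFrame. The S3 path, filter condition, and function name are hypothetical, and the SDK decorator that registers the function with Tecton is omitted here; see the Tecton docs for the exact registration API.

```python
# Sketch of a Spark data source function: take a SparkSession, return a
# DataFrame. Inside the function you can use any PySpark logic you like --
# custom auth, schema registries, partition pruning, etc. The path and
# filter below are illustrative only.

def click_events(spark):
    """Read raw click logs and keep only well-formed rows."""
    df = spark.read.parquet("s3://example-bucket/click-logs/")
    return df.where("user_id IS NOT NULL")
```

Since the function is plain PySpark, anything that works in an interactive notebook (custom formats, credentials, filters) works here too.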
Improve models with batch feature view skew reduction
What this means: In order to select historically accurate feature values, Tecton’s time-travel queries now consider more information, such as scheduling details.
Why this matters: Reducing online/offline skew is critical to achieving good model quality. Tecton now further ensures that offline feature data reflects the values that would have been available in the online store at a given point in time.
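The underlying idea can be modeled in a few lines of plain Python: a batch-computed value only becomes visible online after its materialization interval closes and the job finishes, so a skew-free time-travel query must join on availability time, not event time. The daily schedule and one-hour processing delay below are illustrative numbers, not Tecton defaults.

```python
from datetime import datetime, timedelta

def availability_time(event_time, schedule=timedelta(days=1),
                      processing_delay=timedelta(hours=1)):
    """When a batch-computed value for `event_time` would actually land in
    the online store: the end of its schedule interval plus job runtime.
    Interval length and delay are illustrative assumptions."""
    epoch = datetime(1970, 1, 1)
    intervals = (event_time - epoch) // schedule
    window_end = epoch + (intervals + 1) * schedule
    return window_end + processing_delay

def point_in_time_value(rows, query_time):
    """rows: list of (event_time, value) sorted by event_time.
    Return the latest value that would have been *online* at query_time."""
    best = None
    for event_time, value in rows:
        if availability_time(event_time) <= query_time:
            best = value
    return best
```

A value computed from events at 10:00 on Jan 1 is only served after the daily job for Jan 1 lands (here, 01:00 on Jan 2), so a training query at 23:00 on Jan 1 correctly sees nothing.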
Additional Transformation Functionalities
From a simple feature definition, Tecton compiles and orchestrates production-ready pipelines that transform batch, real-time, and streaming data into predictive features. Tecton 0.5 lets you further fine-tune how and when to materialize these features for consumption.
Program upstream job triggers with the Feature Materialization API
What this means: You can now trigger feature materialization jobs programmatically. In other words, Tecton now makes it easy to use upstream data pipelines that run outside of Tecton to kick off feature processing as soon as new raw data is ready. The API can also be used to monitor feature materialization job completion statuses in order to kick off training or inference when new feature data is ready. The Tecton Airflow provider makes leveraging this API in Airflow DAGs quick and easy!
Why this matters: Manage your entire ML pipeline, from feature materialization and ML model training, all the way to making ML predictions, in the pipeline orchestration tool of your choice (Airflow, Kubeflow, Dagster, Prefect, etc.).
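In an orchestrator, this typically reduces to a polling loop that gates the next task on job completion. The sketch below is generic: `get_status` stands in for a wrapper around the materialization-status API (whose actual endpoints and response fields are not shown here), and the status strings are assumptions.

```python
import time

def wait_for_completion(get_status, job_id, poll_seconds=30,
                        timeout_seconds=3600):
    """Poll a job-status callable until the job succeeds, fails, or the
    deadline passes. `get_status(job_id)` is assumed to return a string
    like 'RUNNING', 'SUCCESS', or 'FAILED' (hypothetical values)."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = get_status(job_id)
        if status == "SUCCESS":
            return True
        if status == "FAILED":
            raise RuntimeError(f"materialization job {job_id} failed")
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish in time")
```

An Airflow sensor (or the Tecton Airflow provider) plays exactly this role: block the training task until fresh feature data has landed.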
Trigger event-driven applications with new feature updates using Feature View Output Streams
What this means: You can now enable event-driven applications that react to new feature updates in Tecton. Tecton 0.5 supports both Kafka and Kinesis.
Why this matters: Once you configure the output stream for a feature view, Tecton will write records to that stream for every new value processed. For example, if you’re building a movie recommendation system, you may want to refresh “watch next” recommendations in the background after a user clicks on a new title.
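A consumer of the output stream might look like the sketch below. The record schema (a join key plus a dict of feature values) is an assumption for illustration; consult the output-stream documentation for the actual payload format written to Kafka or Kinesis.

```python
# Illustrative stream consumer: react to each feature-update record by
# triggering a downstream action, e.g. refreshing recommendations.
# The record shape here is hypothetical.

def handle_feature_updates(records, refresh_recommendations):
    """Dispatch each feature-update record and return the keys touched."""
    refreshed = []
    for record in records:
        user_id = record["user_id"]
        refresh_recommendations(user_id, record["features"])
        refreshed.append(user_id)
    return refreshed
```

In the movie-recommendation example, `refresh_recommendations` would recompute the "watch next" shelf in the background each time a user's features change.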
Optimized Cost Capabilities
Real-time predictions drive real revenue, but they can also incur costs. That’s why Tecton 0.5 gives you more options to better optimize and dynamically scale resource utilization based on pre-defined requirements.
Optimize costs with “Suppress Object Recreation”
What this means: By default, Tecton automatically re-materializes feature data when changes are made to a feature’s transformation logic. This keeps historical feature data accurate. With the new “Suppress Object Recreation” option, Tecton admins can now choose to suppress the recreation of objects and avoid unnecessary materialization costs.
Why this matters: Tecton’s Command Line Interface (CLI) now offers greater control over evolving feature pipelines and their underlying costs. With Tecton 0.5, admins can choose to avoid rematerialization costs if the changes do not affect feature semantics (e.g., commenting code, extending a data source schema, changing to a mirror data source).
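The judgment call an admin makes here can be sketched as a simple classification: does the change alter what historical feature values would have been? Tecton's actual change analysis is internal; the categories below are purely illustrative, echoing the examples above.

```python
# Illustrative decision logic only. Changes that alter historical feature
# values require rematerialization; cosmetic ones (comments, descriptions,
# an extra unused source column) can safely skip it. Category names are
# hypothetical, not Tecton's.

SEMANTIC_CHANGES = {"transformation_logic", "aggregation_window", "join_keys"}
COSMETIC_CHANGES = {"comments", "description", "added_unused_source_column"}

def requires_rematerialization(changes):
    """Return True if any change could alter historical feature values."""
    return any(c in SEMANTIC_CHANGES for c in changes)
```

Suppressing recreation is the admin asserting the second branch: the diff is cosmetic, so paying for a full backfill would buy nothing.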
But that’s not all: Tecton 0.5 also optimizes feature retrieval on Spark with a more stable and performant implementation of the point-in-time join, supports structs as feature types, and makes it easy to programmatically access metadata via the Python SDK.