From healthcare and online retail to meal kits and financial services, companies across industries are deploying real-time machine learning applications. This creates ever-increasing demands on ML engineering teams, which, in turn, need the support of a toolkit to build and deploy ML models quickly and reliably.
A feature platform is a core component of the modern ML stack, and helps ML teams deploy features to production. (A feature store also is an integral part of the system—we detail the differences between feature stores and feature platorms in this post.) While many companies have organically developed ad-hoc feature stores for their ML use cases, those that adopt a full-fledged feature platform stand to benefit in multiple ways, which I’ll detail in this post.
Benefit 1: Increase efficiency of ML teams and improve ML models
A good feature platform acts as a central source of truth that standardizes ML data workflows from design to production and, over time, breaks down silos and reduces tech debt. It provides easy access to high-quality training data that data scientists can use to train ML models, reducing training / serving skew.
Because training data is the foundation for all ML models, having easy access to vetted, high-quality data (i.e., data that mirrors real-life environments) makes a big difference in how quickly companies can build and deploy new models into production. If a model isn’t generating accurate predictions, a good feature platform makes it easier for a data scientist who’s training the model or an engineer who’s maintaining the model in a production environment to know right away if the cause is bad data or another problem. Meanwhile, ML engineers benefit because they can focus on deploying new models into production rather than doing infrastructure and maintenance work, reducing deployment time.
A large American insurance provider originally spent 60% to 80% of its data science bandwidth exclusively on ML engineering—i.e., infrastructure and maintenance work. After adopting a feature platform, the company reported that figure is now just 20%, freeing up the data scientists’ time so they can instead focus on improving features that feed into their ML models.
Feature platforms also encourage standardization in how data scientists and data engineers write features, leading to feature re-use across teams and models, saving precious time. At Tecton, we’ve found that feature re-use tends to follow the 80/20 principle: In similar use cases, the bulk of features get re-used from one model to the next, and teams need to create only a few features for new models.
Finally, for smaller companies that can dedicate only a small team to machine learning, feature platforms enable them to operate at or even above the level of a larger company, reaping all of the above benefits.
Benefit 2: Improve reliability and optimize ML infrastructure costs
Building and maintaining reliable data pipelines is one of the most challenging tasks in real-time ML—and broken data pipelines are one of the main hurdles to getting these ML models into production. A good feature platform will handle data pipeline orchestration so that features are accurately and reliably generated from raw data, stored, and served to models for both training and predictions.
Good feature platforms should also be able to handle high volumes of data and serve features at low latency, which is vital since extra milliseconds spent making a prediction could negatively impact the end-user experience. For example, if a customer submits an application for a car loan, the finance company needs to know in less than a minute whether the customer is approved or not and / or whether it could be a fraudulent transaction.
We all know it: writing new data values that are stored online for immediate retrieval by predictive systems is expensive and not necessary for most ML use cases. This is where a good feature platform comes in—users can optimize which features should be stored online for use cases that require real-time predictions (like fraud detection and dynamic pricing, which require super fresh features) while everything else can be stored offline.
Another cost saving is the reduction in engineering overhead required. Using a feature platform means needing fewer engineers to build and maintain data pipelines because the platform automatically builds, orchestrates, and maintains said pipelines. Data scientists don’t need to constantly rewrite features and can simply re-use feature materializations that the feature platform already automatically computes.
Benefit 3: Unlock potential new revenue sources with real-time data
The most valuable advantage of a feature platform comes into play when you want to use real-time data. This means you’re going beyond purely analytical cases like reporting, sales forecasting, or customer segmentation, all of which use batch historical data. With a feature platform, you can unlock new use cases that require near-instant predictions and use real-time (or streaming) data—examples include fraud detection, recommender systems, and loan approvals.
If you’ve tried implementing these use cases on your own, you probably experienced how difficult it is to get right—this is new territory for most companies. Integrating streaming infrastructure with your ML systems and having it all work nicely with your batch data is a massive task. Most teams lack the expertise and so they are stuck with real-time use cases that work well in a data scientist’s notebook or a product manager’s roadmap but are too difficult and costly to productionize.
To solve this problem, a good feature platform should be capable of providing real-time computed data to power real-time ML models, all while helping accelerate batch systems as well. This is why we often see companies adopting a feature platform for the batch ML use cases they’re building today, with an eye on the new real-time ML use cases they want to build to stay competitive tomorrow.
A good feature platform should enable ML teams to efficiently manage and improve their features, which in turn will improve their ML models, while minimizing the associated infrastructure and human costs. It should be compatible with a wide variety of tools and technologies so that teams can keep working in environments they are familiar with—for instance, if data scientists are used to writing everything in Python using Jupyter notebooks, they can continue doing what they’re doing.
With the right feature platform, your team will be well-equipped to handle growing volumes of data, successfully tackle and deliver a much wider range of use cases, and reap the benefits of real-time ML. For example, by adopting Tecton’s feature platform as a component of a wider ML platform, your team can gradually ramp up from using batch historical data to real-time data for ML models. Interested in learning more about how Tecton works? Sign up for our monthly demo with a live Q&A!