Knowing when to adopt a new technology is just as important as knowing what to adopt. If you’re even considering a feature platform, you’re probably somewhere in the middle of a massive transformation of your data stack: laying the foundation for machine learning (ML) by moving to new streaming or batch data sources, transitioning to a microservices architecture, adding data quality monitoring, and introducing new CI/CD processes for models.
Depending on your goals and your current ML maturity, a feature platform could unlock a lot of value, or it could be premature and a waste of money. So how do you know when it’s time to shore up your capabilities, and when it makes more sense to wait? Here are five signs that a team is ready to benefit significantly from a feature platform.
1. You want to make predictions in real time
Most teams start out using machine learning for offline, analytical use cases like reporting and forecasting. But at some point, they become interested in using ML to make real-time or “on-demand” predictions.
One example is processing credit card transactions, where your model needs to immediately decide whether to flag a transaction as fraudulent and decline it or let the transaction go through. Another example is when you’re trying to personalize content on your front page without introducing any lag when users load the page.
With offline, analytical ML, you can wait for the prediction to finish or rerun it if it fails. But real-time predictions in your product are operational: they have to work reliably, at scale, and without introducing latency. For model inference that is part of a product flow, you typically want a latency SLA of around 100 milliseconds.
To solve this, a team with the requisite data engineering expertise might try precomputing features and storing them in a database like Redis or DynamoDB for fast lookup. However, this means you now have to maintain new, mission-critical infrastructure that handles feature computation and serving, including monitoring and keeping someone on call at all times in case data pipelines break.
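The precompute-and-lookup pattern can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: a plain dictionary stands in for Redis or DynamoDB, and the transaction records and feature names (`txn_count`, `avg_amount`) are hypothetical. A batch job aggregates raw rows into per-user feature vectors ahead of time, so the serving path becomes a single key lookup.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw transaction history: (user_id, amount) pairs.
RAW_TRANSACTIONS = [
    ("u1", 20.0),
    ("u1", 35.0),
    ("u2", 500.0),
]


def precompute_features(transactions):
    """Batch job: aggregate raw rows into one feature vector per user."""
    amounts_by_user = defaultdict(list)
    for user_id, amount in transactions:
        amounts_by_user[user_id].append(amount)
    return {
        user_id: {"txn_count": len(amounts), "avg_amount": mean(amounts)}
        for user_id, amounts in amounts_by_user.items()
    }


class OnlineStore:
    """In-memory stand-in (an assumption) for a low-latency key-value store like Redis or DynamoDB."""

    def __init__(self):
        self._data = {}

    def write_batch(self, features):
        # Overwrite each user's row with the latest precomputed values.
        self._data.update(features)

    def get(self, user_id, default=None):
        # Serving path: one key lookup, no recomputation from raw data.
        return self._data.get(user_id, default)


store = OnlineStore()
store.write_batch(precompute_features(RAW_TRANSACTIONS))
print(store.get("u1"))  # {'txn_count': 2, 'avg_amount': 27.5}
```

Even this toy version hints at the operational burden described above. A real deployment also needs a refresh schedule for the batch job, handling for users with no precomputed row, monitoring for stale or missing values, and someone on call when the pipeline fails.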
On the other hand, a feature platform that’s optimized for real-time serving simplifies how you orchestrate feature computation and storage, and it provides an API for fast retrieval. It can also provide SLAs, support, and built-in monitoring.
Note that online storage requires low-latency data stores, which are expensive. There’s no way around it, whether you use a third-party feature platform or build it yourself. So if you’re moving to real-time predictions, you’ll need a use case that justifies this expense.
2. You want to use extremely fresh data to make predictions
Most teams start out using batch historical data to make predictions. This data is typically updated daily or every few days (and maybe even less frequently). This is great for offline analytics use cases and may even work in some instances where you’re making real-time predictions—like if you want to base a product recommendation on a customer’s long-term history and don’t anticipate much lift from adding real-time or near real-time signals.
But at a certain point, this starts to break down. For example, you might want to serve a recommendation for a new customer for whom you haven’t precomputed any batch features (the “cold start” problem). Or you might have some new products that were just ingested today, and you want your model to highlight them immediately.
With fresh data, you can build ML-driven experiences that react immediately as new information comes in—which can be hugely valuable (ask ByteDance). However, to get fresh features to your models, you’re typically looking at using new data from many sources. Some examples include:
- Streaming data (Kafka/Kinesis)
- Third-party APIs
- Transactional databases
- Internal APIs managed by other teams
- Real-time context from the application
Each of these real-time or near-real-time sources presents its own challenges, such as doing efficient time-windowed aggregations for streaming data. For each new data source, you’ll need to build and maintain pipelines to handle the ETL, increasing the complexity of your ML stack. If you’re also using fresh features to make real-time predictions in your product (see Sign #1 above), this additional infrastructure becomes mission-critical and will likely require significant resources to monitor and maintain.
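To make the streaming challenge concrete, here is a minimal sketch of a per-key sliding-window aggregation (the count and sum of values over the last N seconds), the kind of feature you might compute from a Kafka or Kinesis stream. It assumes that events for each key arrive in timestamp order and that every in-window event is kept in memory. The class and method names are hypothetical and not taken from any streaming framework.

```python
from collections import defaultdict, deque


class SlidingWindowAggregator:
    """Per-key count and sum over the window (now - window_seconds, now].

    Assumes events for each key arrive in non-decreasing timestamp order.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> deque of (timestamp, value)
        self._sums = defaultdict(float)    # key -> running sum of in-window values

    def _evict(self, key, now):
        events = self._events[key]
        # Drop events that have fallen out of the window, oldest first.
        while events and events[0][0] <= now - self.window_seconds:
            _, old_value = events.popleft()
            self._sums[key] -= old_value

    def add(self, key, timestamp, value):
        self._evict(key, timestamp)
        self._events[key].append((timestamp, value))
        self._sums[key] += value

    def features(self, key, now):
        # Evicting here keeps results correct even if no new events arrived.
        self._evict(key, now)
        return {"count": len(self._events[key]), "sum": self._sums[key]}


agg = SlidingWindowAggregator(window_seconds=60)
agg.add("card_1", timestamp=0, value=10.0)
agg.add("card_1", timestamp=30, value=5.0)
agg.add("card_1", timestamp=70, value=20.0)  # evicts the event at t=0
print(agg.features("card_1", now=75))  # {'count': 2, 'sum': 25.0}
print(agg.features("card_1", now=95))  # {'count': 1, 'sum': 20.0}
```

Holding every event is the simplest correct approach, but memory grows with traffic. Production pipelines usually also have to handle late or out-of-order events and persist state across restarts, and they often pre-aggregate into time buckets to bound memory. That is exactly the kind of ETL work this section describes.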
By plugging into the various real-time data sources that you want to use for features, a feature platform takes over the burden of orchestrating, monitoring, and maintaining this infrastructure. And it’s built to serve features from different fresh data sources at the enterprise scale and low latency required for production use cases.
3. Your ML team is spending too much time maintaining and not enough time building
The old adage that 80% of time spent on ML goes towards sourcing and wrangling data still holds true for many teams. For instance, a common workflow might look like this:
- A data scientist wants to improve a production model and gets an idea for a new feature.
- They first have to figure out where the data lives by poking around in the data warehouse or asking other teams.
- They eventually manage to get the data in the right format and test out the feature in offline model training, where they see a good performance uplift.
- At this point, the data scientist often hands the model off, or “throws it over the fence,” to ML engineers to implement in production. This process can be frustratingly drawn out, with a lot of back and forth to explain requirements and explore options. It gets worse when, as often happens, the engineer discovers that the model is built with data that can’t be used in a production environment.
- After the model is finally pushed to production, future iterations can be slow, too. For instance, the data science team might believe they could improve the model with fresher streaming data. But the engineering team would need to build new infrastructure to enable this, so the improvement is put on hold.
A feature store acts as a central repository and single source of truth for all your ML features. With features defined as code, a feature store gives data scientists access to the same DevOps practices (like version control and CI/CD) that are the backbone of modern agile engineering teams.
A good feature platform takes this a step further and goes beyond just storing and serving features. It also manages and orchestrates the data pipelines that transform raw data into features and helps automate the delivery of features to models. This helps get models to production faster and lets scientists test and iterate more quickly, ultimately resulting in more innovation. It also frees up ML engineers to work on thornier problems or help support other teams rather than just replicating data scientists’ work for production environments.
4. You want to scale to multiple ML use cases
If you’re only building ML for one use case at your company, especially if it’s a batch use case, you can probably get away without having a feature platform. However, once you try to scale that—even using batch data only—it can be really challenging to move quickly and to manage all the moving parts across teams and use cases.
If you want your ML team to operate more like a software team, where collaboration is easy and new ideas don’t get blocked by constantly wondering “where will we get the data” or thinking “but this will be hard to productionize and then maintain,” a feature platform can help you scale effectively.
For example, let’s say you build features around customer data for a single model. Once you have feature definitions and feature values stored in your feature platform, you don’t have to reinvent the wheel for your next model. You can use those same features over and over again across different use cases and teams to build all kinds of batch, streaming, or real-time predictive applications.
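Reuse depends on feature definitions living in one shared place as code. The sketch below is hypothetical and is not Tecton’s API. It models a feature as a small Python object (name, version, entity key, transformation) and uses a registry that refuses to silently overwrite an existing (name, version) pair. Because the definitions are ordinary code, they can be reviewed, versioned in Git, and tested in CI. Two models can also resolve the same definition instead of each re-implementing it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class FeatureDefinition:
    name: str
    version: int
    entity: str                         # join key the feature is keyed on, e.g. "user_id"
    transform: Callable[[dict], float]  # maps a raw record to a feature value
    description: str = ""


class FeatureRegistry:
    """Hypothetical single source of truth: each (name, version) is registered once."""

    def __init__(self):
        self._features: Dict[Tuple[str, int], FeatureDefinition] = {}

    def register(self, feature: FeatureDefinition) -> None:
        key = (feature.name, feature.version)
        if key in self._features:
            raise ValueError(f"{feature.name} v{feature.version} is already registered")
        self._features[key] = feature

    def get(self, name: str, version: int) -> FeatureDefinition:
        return self._features[(name, version)]


registry = FeatureRegistry()
registry.register(FeatureDefinition(
    name="order_total_usd",
    version=1,
    entity="user_id",
    transform=lambda record: record["quantity"] * record["unit_price"],
    description="Total value of a single order in USD",
))

# Two different models resolve the same definition instead of rebuilding it.
fraud_feature = registry.get("order_total_usd", 1)
recs_feature = registry.get("order_total_usd", 1)
record = {"user_id": "u1", "quantity": 3, "unit_price": 2.5}
print(fraud_feature.transform(record))  # 7.5
```

This sketch requires a new version number for a changed definition, rather than editing one in place. That keeps models that depend on version 1 from changing underneath them. It is a design choice for this example, not a requirement of any particular feature store.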
5. You want to future-proof your ML program
Let’s be clear: There will be challenges when adopting a feature platform. Data scientists are going to need to upskill. You’ll encounter new processes and need to change some aspects of how your team currently works, which is never easy (even if it makes life easier later on). So when it comes to considering a feature platform, teams often look at how their use of ML will evolve one to three years into the future. If you’re already investing in your digital transformation in other ways (e.g., you’ve moved from a monolith to microservices, or moved your data from Hadoop to Snowflake or Databricks), then you’re familiar with what it takes to invest in new infrastructure and what the timeframe looks like for full adoption.
Like these technologies, an enterprise feature platform is geared toward supporting both the basic use cases you might have today and the advanced ones you might need tomorrow. If you decide you want an ML feature solution in a year, then realistically, it’s a good idea to start evaluating your options now. And if you’re thinking of adding real-time or streaming use cases in the next year or two, it’s time to look into a feature platform.
To wrap up, your team stands to benefit from a feature platform if one or more of these things are true:
- You want to deploy ML in user-facing products to deliver instantaneous (real-time) predictions.
- You want to build models that use real-time or near-real-time data.
- Your team spends too much time wrangling data and not enough time building models, AND you want to be able to deploy and iterate on models faster.
- You want to build for multiple ML use cases, or you already have one or two models in production and plan to deploy many more.
- You see ML as a long-term investment area vs. a one-off.
On the other hand, you probably don’t currently need a feature platform if:
- You only want to use machine learning for offline analytics with batch historical data.
- You have a limited number of ML use cases that are easy to maintain.
- You haven’t laid a foundation of digital modernization, such as moving to a microservices-based cloud architecture.
- You don’t see many opportunities for growth through expanding your use of ML.
If you have any questions that we didn’t answer here, we’d be more than happy to help you think through the potential costs and benefits of a feature platform like Tecton. Or if you prefer doing your own research first, you can view a Tecton demo and Q&A or sign up for a free trial.