In the past 10 years, we’ve seen an explosion of growth both in the use cases for machine learning (ML) and the MLOps tools for putting it into production. It’s now possible for a small team to deliver a use case like real-time fraud detection—which would have been extremely painful even 5 years ago, requiring a huge amount of engineering resources to get the data pipelines connected and deployed reliably in production.
In this article, we’ll look at the very early days of getting machine learning into production (aka production ML), where we are today, and some predictions for what it will be like to build ML applications in the near future, as well as the role real-time predictions play in staying ahead of the competition. Keep in mind that every business is at a different stage in its adoption of production ML, so it’s possible (and in fact likely) that a given team will be slightly behind or ahead of the curve.
2015–Today: From analytical ML to MLOps
The story of production ML begins with analytics. The rise of complex decision tree and neural net models gave companies more sophisticated tools for answering business questions. At a typical Fortune 500 company, a data analytics team would be tasked with building ML models for internal sales forecasting, churn predictions, customer segmentation, and similar use cases.
Meanwhile, Silicon Valley was reaping the benefits of using machine learning in a very different way. Companies there were putting models into production and using them to drive customer-facing products. Their data scientists worked alongside engineers, embedded on product teams, building features like route optimization, personalized recommendations, targeted advertising, and real-time pricing.
Most companies were left out of this early phase of ML-driven experiences because it was so new. Running a static analysis with ML is very different from deploying a model to serve live predictions in the product. These differences mostly have to do with the need for complex data pipelines to manage model features in production.
The data problem
Around 2019, a popular stat was circulating: 87% of data science projects never made it into production. Businesses could see the immense value that production ML was unlocking in the tech sector, and were at work on models for everything from gaming to ETA prediction to fraud detection—yet most of those models were failing to launch.
Why was it so hard to deploy ML four years ago? Most of the pain revolved around working with data. To begin with, it was hard for data scientists to discover all the sources of data they had to work with for building models. After they had a proof of concept, teams struggled to build reliable production data pipelines to feed models. Fetching fresh features from a Kafka stream, combining them with historical data in Snowflake, and doing all of this at low latency and high scale is a very challenging data engineering problem.
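To make the challenge concrete, here is a minimal Python sketch of the request-time join a model server has to perform. The dictionaries stand in for real infrastructure (a Kafka-fed cache of fresh aggregates and a Snowflake table of historical features); all names and values are hypothetical:

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical stand-ins: in production, `fresh_counts` would be fed by a
# Kafka consumer, and `historical_features` would live in Snowflake.
fresh_counts: Dict[str, int] = {"user_42": 3}  # transactions in last 10 min
historical_features: Dict[str, Dict[str, float]] = {
    "user_42": {"avg_txn_amount_90d": 54.20, "account_age_days": 812.0},
}

@dataclass
class FeatureVector:
    user_id: str
    features: Dict[str, float]

def get_online_features(user_id: str) -> FeatureVector:
    """Join fresh streaming features with historical batch features at
    request time, the low-latency lookup a model server must perform."""
    feats = dict(historical_features.get(user_id, {}))
    feats["txn_count_10m"] = float(fresh_counts.get(user_id, 0))
    return FeatureVector(user_id=user_id, features=feats)

vec = get_online_features("user_42")
print(vec.features)
```

The hard part in production is not the join itself but keeping both sides fresh, consistent, and fast enough to call on every request.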
Data scientists typically worked in an analytics “silo,” and back-and-forths with engineering could drag on for months or quarters. Training-serving skew, where the data used to build the model is different from the data the model will see in production, made the process longer with painful debugging sessions.
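Training-serving skew often comes down to the same feature being computed by two different pipelines. Here is a toy sketch of a sanity check that would catch one common variant, a unit mismatch between pipelines; all numbers are hypothetical:

```python
import statistics

# Toy example: the same feature computed two ways. Training used a batch
# pipeline; serving uses a separate online pipeline, and a unit mismatch
# (dollars vs. cents) has crept in, a classic source of skew.
training_values = [12.0, 15.5, 11.2, 14.8, 13.1]
serving_values = [1200.0, 1550.0, 1120.0, 1480.0, 1310.0]

def skew_ratio(train, serve):
    """Ratio of means; values far from 1.0 flag potential skew."""
    return statistics.mean(serve) / statistics.mean(train)

ratio = skew_ratio(training_values, serving_values)
assert ratio > 10  # the dollars-vs-cents bug shows up immediately
```

A production check would compare full distributions rather than means, but even this crude comparison catches the kind of silent mismatch that used to take painful debugging sessions to find.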
Feature stores and the emergence of MLOps
Productionizing ML was painful—and the story felt a bit familiar. Something similar had played out decades prior, back when releasing production code was incredibly hard and time-consuming. Until, that is, DevOps tools came along, and made it easy to collaborate, test, deploy, and iterate on software systems.
The same tech companies that were successful at putting ML into production in the early days had figured this out. With data engineering expertise and large teams dedicated to building bespoke tools, they built massive DevOps-like systems for ML (e.g., Uber’s Michelangelo platform). These systems supported accessing data sources, creating features from raw data, and combining features into training data. They made it possible to quickly spin up the data infrastructure for calculating, serving, and monitoring features in production.
With growing demand for these capabilities to be democratized and available to any business, a new domain emerged: MLOps. Feature stores, as a cornerstone of a typical MLOps implementation, make it easy to define features from data sources and serve as a centralized place to share, manage, and collaborate on features for production models.
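Conceptually, a feature store is a central registry: a feature is defined once, and both training and serving resolve it by name so the logic cannot silently diverge. Here is a toy sketch of that idea; the decorator and names are illustrative, not any vendor’s actual API:

```python
from typing import Callable, Dict

# A toy registry illustrating the core idea of a feature store: features
# are defined once, centrally, and shared between training and serving.
FEATURE_REGISTRY: Dict[str, Callable] = {}

def feature(name: str):
    """Register a feature transformation under a shared name."""
    def register(fn: Callable) -> Callable:
        FEATURE_REGISTRY[name] = fn
        return fn
    return register

@feature("txn_amount_zscore")
def txn_amount_zscore(txn_amount: float, mean: float, std: float) -> float:
    """The same transformation runs in batch training jobs and online serving."""
    return (txn_amount - mean) / std

# Both the training pipeline and the model server look the feature up by
# name, so there is a single source of truth for its logic.
z = FEATURE_REGISTRY["txn_amount_zscore"](150.0, mean=100.0, std=25.0)
print(z)  # 2.0
```

Real feature stores add what this sketch omits: materialization to offline and online stores, backfills, point-in-time-correct training data, and monitoring.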
Today: Demand for production-readiness and real time
While it’s gotten easier to launch ML models, engineers are now being pushed to deliver on two very important dimensions: production-readiness and real time.
Businesses have learned over the past few years that it’s not enough to launch a model that’s accurate. Plenty of teams launch ML models that go stale, leaving the business back at square one. Instead, the model needs to be repeatedly accurate. For instance, can it make accurate predictions for thousands or millions of users per minute on an ongoing basis?
In addition to being repeatable, ML has to be as reliable as any other part of the application. It’s not enough for ML systems to make many predictions; they have to be served at the right latency, not drift, and remain highly available and trustworthy.
The second dimension ML teams are pushed on is real time. Real time is now at the competitive frontier for many companies. If your competitor is showing product recommendations based on what the customer clicked on 10 seconds ago and you’re not, you have a problem. If bad actors are getting faster and faster (e.g., creating thousands of fake accounts within a few minutes), and your fraud detection systems aren’t able to respond that quickly, you have a problem. Real time is increasingly not a nice-to-have but a critical requirement for many production machine learning use cases.
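A real-time fraud signal like “accounts created in the last few minutes” can be sketched as a sliding-window counter. The window, threshold, and event volumes below are hypothetical:

```python
from collections import deque

WINDOW_SECONDS = 300.0   # look at signups in the last 5 minutes
SPIKE_THRESHOLD = 1000   # hypothetical alert threshold

class SignupRateFeature:
    """Sliding-window counter: how many accounts were created in the last
    few minutes? A simple real-time fraud signal. Assumes events arrive
    in timestamp order, as they would from a single stream partition."""

    def __init__(self):
        self.events = deque()  # timestamps, oldest first

    def record(self, ts: float) -> None:
        self.events.append(ts)

    def count(self, now: float) -> int:
        # Evict events older than the window, then count what remains.
        while self.events and self.events[0] < now - WINDOW_SECONDS:
            self.events.popleft()
        return len(self.events)

feat = SignupRateFeature()
for i in range(1500):
    feat.record(i * 0.1)  # 1,500 signups within 150 seconds: a burst

spike = feat.count(now=150.0) >= SPIKE_THRESHOLD
print(spike)  # the burst is detected while it is still happening
```

The point of real-time ML is that this count is available to the model at prediction time, not in tomorrow’s batch job, when the fake accounts have already done their damage.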
Meeting these requirements is a significant challenge, especially for teams coming to production ML from analytics backgrounds, and it keeps challenging even experienced teams as ML is applied to new problems with new requirements.
Better results with better tools
The good news is that the modern MLOps ecosystem is getting less complicated and more powerful, and the following trends are making it much easier for data scientists and data engineers to build and deploy high-quality models to power products:
- Collapsing data storage costs. Historical data is now preserved almost indefinitely, and companies can collect, purchase, and store information about every touchpoint with customers.
- The modern data stack. With the modern data stack as a new architectural pattern, centralized data storage and/or access is replacing data silos.
- Real-time streaming data. Fresh data is a must for real-time ML use cases, and the past few years have seen massive adoption of streaming infrastructure like Kafka or Kinesis to supply applications with real-time signals.
- Embedded teams. While not all companies have moved to this model, many are beginning to embed ML teams within product and engineering teams, rather than keeping them as a separate analytics unit.
With the right tools and skills, a small team can now run an impressively sophisticated MLOps lifecycle for a real-time ML application. For example, using a feature platform like Tecton to orchestrate and manage the data pipelines for batch and real-time features frees up your team to focus on building the features themselves.
As an example close to home for me, Cash App recently presented with me at Databricks’ Data + AI Summit about their real-time ML stack and how it’s become simpler and more powerful with a modern feature platform. It goes to show that what wasn’t possible last year may be possible today.
The future: A powerful ML flywheel
While building production ML use cases has gotten easier, teams still face too many roadblocks. The future of ML will be marked by advancements in how we manage the data lifecycle for machine learning. This lifecycle consists of four key stages:
- Decide: A model generates a prediction.
- Collect: Data needs to be logged to show what happened as a result of the model’s prediction. (E.g., did the customer click on the recommended ad?)
- Organize: Logs need to be turned into structured data that can be used to improve existing models or build new ones.
- Learn: Data scientists use the organized observations to extract features and train models.
Teams that can manage all four of these phases successfully will build a self-reinforcing feedback loop that unlocks faster iteration speeds and higher-quality models. This is the machine learning flywheel.
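The four stages above can be sketched as a loop. Each function below is a placeholder for real infrastructure (a model server, an event log, a warehouse, a training pipeline); the point is that each turn of the flywheel feeds the next model:

```python
# A minimal sketch of the Decide -> Collect -> Organize -> Learn loop.
# The "model" is a trivial majority-vote predictor, purely illustrative.

def decide(model, request):
    """Decide: the model generates a prediction."""
    return model(request)

def collect(log, request, prediction, outcome):
    """Collect: log what happened as a result of the prediction."""
    log.append({"request": request, "prediction": prediction, "outcome": outcome})

def organize(log):
    """Organize: turn raw logs into structured training examples."""
    return [(entry["request"], entry["outcome"]) for entry in log]

def learn(examples):
    """Learn: train a new model from the organized observations. Here it
    just predicts whichever outcome was in the majority."""
    positives = sum(1 for _, outcome in examples if outcome)
    return lambda request: positives * 2 >= len(examples)

# One turn of the flywheel:
log = []
model = lambda request: False  # cold-start model before any feedback
for request, outcome in [("ad_1", True), ("ad_2", True), ("ad_3", False)]:
    prediction = decide(model, request)
    collect(log, request, prediction, outcome)
model = learn(organize(log))
```

In a real system, every arrow in this loop is its own engineering problem, which is why teams that automate all four stages iterate so much faster than those that manage each by hand.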
Today, some parts of the flywheel are still too hard. First, adding logging for a new model feature can be painful, since it often requires a lot of cross-team coordination. Second, even minor differences in data models can lead to problems integrating different sources into one model.
What’s needed are new abstractions that let teams automate previously challenging workflows. ML systems will need to be very closely integrated with the business’s existing cloud data warehouse or lakehouse. Furthermore, production ML teams will ideally avoid reinventing the wheel for use cases like product recommendations or real-time pricing. They’ll be able to pick the specific serving architecture they need and get a new use case running quickly, out of the box.
Additionally, compliance and monitoring—essential considerations when building ML systems—must become less work for teams to manage on their own. These capabilities should become increasingly integrated into centralized feature and dataflow platforms, simplifying workflows.
Whether you’re a practitioner or a business leader, it’s an amazing time to be working in the ML space. With the latest tools, it’s possible for a small team to have a major impact and create massive value. And these tools are still in the early stages of what’s possible to make real-time ML more reliable and achievable for every business. Interested in learning more about how to get started with real-time ML use cases? Check out these resources: