Hear Cash App’s journey through various generations of its core machine-learning capabilities and how Tecton’s real-time feature platform helps them deliver world-class recommendations.
Hey, how's everyone doing? Awesome. All right, first two presenters, both named Mike. So today we're going to be talking about real-time ML at Cash App with Tecton. There's a bunch to unpack here. Real-time: we want to do machine learning at scale with low latency. Cash App: that's us, and we'll explain what we're doing. And Tecton: that's Mike.
So to give you an idea of what I do, I’m the head of Applied Machine Learning at Cash App. I run the engineering side of the machine learning team. So we are divided up into modelers and engineers and the key is that we need to work together to holistically solve a production machine learning problem.
And hey everybody. So I’m Mike. I’m also Mike. I’m co-founder and CEO of a company called Tecton, and we’ll talk about that a little bit today as well.
And my background: I used to run some machine learning teams at Google and helped start the machine learning team at Uber, where we built out an ML platform called Michelangelo.
A few of you may have heard of it. Cool. So what are we doing at Cash App? Most people know us as kind of a Venmo competitor, as an app where you can send payments between people, but actually our overall mission is to redefine the way that people interact with their money. So sending payments is just one piece of this, and it's an interesting piece. I mean, we've got all these fun payment graphs that we're working with there, but we want to do a lot more with this. We want every touchpoint with your money to be something that you can do within Cash App. So we support things like spending at merchants with Pay with Cash App, banking, investing, Square. We also have the crypto thing going on with Bitcoin. So every time that we have the chance to touch a transaction or an interaction with money, we want to be there supporting it. With these use cases come some unique challenges. If you want to be sending money between two users, you're going to be dealing with a very large payment graph where user A is sending money to user B. Payments have an amount, they have a date, and they have two users at the ends of them, as well as a bunch of other metadata. But that's the fundamental architecture of the problem that we're trying to solve there.
If you want to do spending, your graph is now bipartite. So you're taking a graph that was previously a user and another user, and you're replacing one of those users with an entity of some sort: a restaurant, a coffee shop, wherever you're transacting. If you want to invest, and let's say you want to do a recommendation problem over an investment, then you need to be able to reason across users and assets and understand what makes those assets interesting and how you're going to recommend similar assets to users. Search and discovery can span all of these problems. So overall what we're trying to do is take this graph structure and make sense of the implicit relationships that exist between the two different entity types. If you're going to search, you have the ability to significantly boost conversion rates. You can use distances in the search to limit candidates down and potentially speed your problem up. And search queries themselves are actually a really good indicator of user intent. So if you see a user that's constantly searching for payments that look like, maybe, their rent payments, for example, you then have a signal on what the user is trying to do within the app. They're trying to pay their rent.
Discovery is similar to search but doesn’t involve a query. So if you’re going to be doing a discovery task within the app, you’re going to surface new things for a user. You’re going to find things that are similar to what they’ve already explored and put them there without the user doing anything just ambiently to help guide them through the app and realize their intent.
Both of these are unified by a single recommendation architecture, which I’m going to dive into. So let’s say we’ve got a data scientist, call him Bob, we should have named him Mike, and he wants to do recommendation. What is he fundamentally trying to do? He’s trying to join and rank two different entity types. So at Cash App, we like our eye logo and I always like to joke that it’s the eye on top of the pyramid on the dollar bill. I have no idea if that’s true or not, so don’t quote me on that. But what we’re doing is we’re ranking the eye against different shapes. The eye wants to go in the pyramid, so we have the strongest weighting on top of there.
This seems like a simple problem. You’ve got a graph, you’ve got some weights, you’re just maximizing them. It’s not. When you start scaling this graph up to the size of the internet, you’re getting into a bunch of complexity just trying to keep this thing in memory, sort out what this feature space is going to look like and figure out how you’re going to efficiently search over it.
The architecture of a recommender in today's world can be summarized in about three steps. You have to featurize both entity types: you need to create a feature vector using some sort of feature extraction process to make sense of what these objects represent. You then generate joint embeddings over them. So you've got something like an autoencoder that will take the embedding of the shape features and an embedding of the color features, and then you have the ability to combine them using a dot product. This is called a two-tower model, and it's widely replacing the older singular-value-decomposition-based search infrastructure that preceded it. You can still use those, but for most newer search tasks you'll see an architecture like this evolving. This is also one of the foundational techniques behind newer techniques such as CLIP, where you're trying to interrelate two different modalities using some sort of latent space that you're deriving across both.
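The two-tower idea above can be sketched in a few lines. This is a toy illustration, not Cash App's actual model: random linear projections stand in for the trained user and item towers, and the match score is just the dot product of the two normalized embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "towers": in practice each tower is a trained neural network.
# Here each is a single random linear projection into a shared
# 8-dimensional latent space.
W_user = rng.normal(size=(16, 8))   # projects 16-dim user features
W_item = rng.normal(size=(32, 8))   # projects 32-dim item features

def user_tower(x):
    # L2-normalize so the dot product behaves like cosine similarity
    e = x @ W_user
    return e / np.linalg.norm(e)

def item_tower(x):
    e = x @ W_item
    return e / np.linalg.norm(e)

user_vec = user_tower(rng.normal(size=16))
item_vec = item_tower(rng.normal(size=32))

# The match score is just the dot product of the two embeddings
score = float(user_vec @ item_vec)
```

Because both embeddings are unit-normalized, the score lands between -1 and 1, which makes candidates comparable across users.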
And then the third step is retrieval. So once you've got this feature vector, you need the ability to plug in a new pair of things. In particular, you may want to plug in one of these and then search over the other one in order to find the best match. And that in turn will give you a ranked set of pairs that you can then use to return the recommendation results in order. This is the five-minute recommendation in a nutshell. Every one of these has a whole bunch of details and subtleties and nuances. So let's say we're really impressed by Bob's understanding of this problem. Congrats, you're hired. Now do it in Cash App.
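The retrieval step can likewise be sketched as a brute-force scan: score every candidate embedding against the query and sort. This is illustrative only; at Cash App scale you would swap this for an approximate-nearest-neighbor index rather than scoring everything.

```python
import numpy as np

rng = np.random.default_rng(1)

# 1,000 candidate item embeddings and one query (user) embedding,
# all living in the same 8-dim latent space the two towers produce.
items = rng.normal(size=(1000, 8))
query = rng.normal(size=8)

# Brute-force retrieval: score every candidate with a dot product,
# then take the 5 highest-scoring candidates in descending order.
scores = items @ query
top5 = np.argsort(scores)[::-1][:5]
```

The exhaustive scan is O(number of candidates) per query, which is exactly why the talk's scale numbers push you toward approximate indexes.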
Now we run into some problems. So Cash App (and these are all public statistics): we have 44 million monthly active users as of Q4 2021 and $12 billion in revenue as of 2022. So this is very large. If you're going to be dealing with a 44-million-squared graph of users, you're not going to be able to fit that in memory or make this an efficient pipeline that will run on one machine. You need to send a lot of queries per second, in particular to the model hosting and feature store aspects of your pipeline. So queries per second upwards of 100K are not uncommon in this type of pipeline architecture. You need latency to be very low as well. The end user is not going to want to wait more than a couple of hundred milliseconds. So that is basically your end-to-end deadline to get results back to users.
And you also need this to be available. You need ideally at least three nines of uptime so that you're not suffering from outages in the middle of this. The data itself also needs to be very recent, and this is one of the challenges that gives the talk the "real-time" in its name. You do have the ability to use caching to speed up a lot of these feature lookups if you're willing to accept some level of staleness in your features. So if you only want to update your feature vectors once every 30 minutes, you can run a batch job that essentially updates some kind of cache, let's say a Redis cache, that allows you to do fast similarity search lookups with low latency. But if you want something real-time, then your updates, whether on the write path or on the read path, are going to happen at very close to the frequency of requests.
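The staleness trade-off described above can be illustrated with a toy TTL cache. This is a sketch, not the actual infrastructure (a real system would use something like Redis), but it shows why a cached feature value can be up to `ttl_seconds` old before the slow path to the feature store is taken again.

```python
import time

class TTLFeatureCache:
    """Feature lookup cache that accepts bounded staleness.

    Illustrative only: serves a cached value until it is older than
    the TTL, then falls back to the (slow) feature store fetch.
    """

    def __init__(self, fetch_fn, ttl_seconds):
        self.fetch_fn = fetch_fn      # slow path: hit the feature store
        self.ttl = ttl_seconds
        self._store = {}              # key -> (value, fetched_at)

    def get(self, key):
        hit = self._store.get(key)
        if hit is not None and time.monotonic() - hit[1] < self.ttl:
            return hit[0]             # fresh enough: serve cached value
        value = self.fetch_fn(key)    # stale or missing: refetch
        self._store[key] = (value, time.monotonic())
        return value

calls = []
cache = TTLFeatureCache(lambda k: calls.append(k) or len(calls),
                        ttl_seconds=60)
v1 = cache.get("user:42")   # miss: fetches from the "feature store"
v2 = cache.get("user:42")   # hit: served from cache, no second fetch
```

A longer TTL means fewer fetches but staler features; shrinking the TTL toward zero is effectively the real-time regime the talk describes, where every read goes back to the source.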
There are also organizational problems to this. So in addition to the technical problems, you’ve got privacy. Privacy is very important. We need to protect user privacy at all costs, critical. Team ownership, that’s another thing that starts to come into play when you’re dealing with multiple aspects of these systems like ranking and recommendation and storage and feature extraction and model hosting. These all need team structures to maintain them and with those structures come boundaries, come on call rotations and team setups and the need to staff each of these independently.
Support and maintenance, this is an ongoing burden. Once you think you’re done, you’re not actually done. You need to continue maintaining them forever. And then understanding the pipeline and being able to run experiments. This is something that requires different groups across the organization to talk with one another and really understand where the other one’s problems are. So all this to say is this is really hard.
Our existing infrastructure wasn't a good match for this. So we had a feature store, but it wasn't designed for this level of throughput. Our model hosting service also ended up needing an overhaul because there were network and serialization costs to anticipate. If you're storing models in a format that needs to be serialized and deserialized, then that's going to add to your latency as well.
Feature caching, as I mentioned, it didn’t quite work out because it’s not real time enough. There’s a trade-off there and then the existing infrastructure couldn’t handle array valued features which are important if you’re going to be doing things like storing embeddings and trying to retrieve them later. And this just resulted in a bunch of bad things for us. It was causing P99 latencies to be pushed out. It was delaying feature updates. We had problems using these in recommendation problems to begin with and it sucked up all the bandwidth of the eng team.
We looked at typical feature pipeline architectures. If you've dealt with this problem before, you've probably come up with something that overall looks like this in some way. You're taking your traffic, you're getting events, pushing them onto a Kafka queue or something like that. Your data's then being stored in a data lake. Databricks is supposed to have quite a good data lake from what I understand, shameless plug. You're ETLing that into some kind of distributed processing pipeline using something like Spark, for example, or Google's Dataflow. You're deriving features, you're taking some sort of dimensionality reduction, PCA, autoencoders, or even just simple aggregate functions, and storing those somewhere. And then these are all getting plugged into a feature store somewhere, which then goes out to your models. There are challenges here.
In particular, when you're dealing with the processing-to-feature-store step, there are a lot of things that need to happen. You need to be logging your features throughout this pipeline so you can go back and introspect if something goes wrong. You need the ability to debug, to interrupt your pipeline in the middle and say, there's a problem here, let's isolate this step and see what we can do. You need the ability to orchestrate these services. These are separate services, and therefore you need infrastructure that's going to be able to help orchestrate them and have them communicate with each other.
Caching is a problem as well. You need your cache to live close to the actual user request to minimize latency, and that entails some level of replication if you’re running a global service. And then maintaining this whole thing is always a nightmare. So that is to say that this architecture, it works, but it definitely is a big lift in order to support.
So we looked into different architectures and we started to settle on something which Mike will explain more, called a feature platform. This essentially combines the processing and the feature storage steps into one more comprehensive platform that can be maintained independently. And I'll turn it over to Mike now to explain what a feature platform is.
Cool, thank you. So just to kind of recap the problems we were just looking at, this is a technical problem. It’s really hard, really high scale. It’s also an organizational problem. And so what we’re trying to figure out is how can Mike’s team not have to have a lot of full-time people dedicated to solving these data pipeline and data management problems for their realtime machine learning use cases.
So what is the feature platform? Well, I’ll start with how you use it. So the feature platform powers the data flows in the ML application and what does it look like from someone on Mike’s team’s perspective?
First, they just define their features, and we'll look at how that's done. But then the feature platform orchestrates all of the data flows that are related to that machine learning application. So think of things like backfilling that feature data so that you can generate historical training datasets, generating point-in-time accurate training datasets for when we're building machine learning models, constantly computing fresh feature values as new data arrives, as new interactions happen in the product, and also serving those features in real time to allow the product to use those for real-time predictions.
And then logging everything that we see that's related to features, labels, and predictions from the product back into this central system to allow for efficient training dataset updates. Everything we see: if someone's taking a certain action that might become a feature, we need to update features based on that. And then monitoring all of this stuff for drift, for data quality, and a lot of operational concerns as well, making sure it's actually running at the right speed in production. So all of these data flows, these are things that someone like an engineer on Mike's team would have to be fully on top of building, managing, and designing for every single data application, every single ML application.
So with that handled, their modelers can actually just get straight, after defining those features, to training models, deploying them into production, and making predictions with them. So what does this look like? The feature platform is all about making it very easy for the modeling team, or the engineers that support them, to have a very simple way to define their features. So a single declarative feature definition file, which is just a Python file, as you can see on the left side here, in this case a SQL feature that's being defined. And then the user is asked to define a little bit of metadata that tells the system what to do: what kind of feature this is, is this a real-time feature, who owns this feature, how it should be backfilled, for example, stuff like that. And then what Tecton, the feature platform, does is take that feature definition and essentially compile it to a plan of all of the infrastructure needed to operationalize the different data flows for this feature.
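For a rough feel for what a declarative feature definition looks like, here is a toy, hypothetical stand-in. The decorator name, parameters, owner string, and SQL are all illustrative, not the real Tecton API; the point is that a plain Python file pairs a transformation with the metadata the platform compiles into pipelines, storage, and serving.

```python
from dataclasses import dataclass, field

# A toy registry standing in for the platform's "compile" step.
REGISTRY = {}

@dataclass
class FeatureView:
    name: str
    owner: str
    mode: str            # e.g. "sql" or "python" (illustrative)
    schedule: str        # how often to refresh, e.g. "1d"
    transform: callable = field(repr=False, default=None)

def feature_view(*, owner, mode, schedule):
    """Hypothetical decorator: captures the feature plus its metadata."""
    def wrap(fn):
        fv = FeatureView(fn.__name__, owner, mode, schedule, fn)
        REGISTRY[fv.name] = fv    # the platform would turn this registry
        return fv                 # into pipelines, stores, and serving
    return wrap

@feature_view(owner="ml-platform", mode="sql", schedule="1d")
def user_payment_count_30d():
    # Illustrative table and column names.
    return """
        SELECT user_id, COUNT(*) AS payment_count_30d
        FROM payments
        WHERE ts > CURRENT_DATE - INTERVAL 30 DAY
        GROUP BY user_id
    """
```

The declarative shape is what matters: ownership, refresh schedule, and transformation live in one reviewable file, which is what makes the Git-based workflow described next possible.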
So in this case, we're showing connecting to a data source, running an offline transformation, running that query, orchestrating that query transformation on, say, a daily basis, taking those values and loading them into an online store for real-time serving, and taking those values and saving them historically in an offline store, likely your data lake. We call that the offline feature store, so that it's really easy to generate historical training datasets in the future.
And then a variety of those other data flows we talked about as well: logging which of these features are served and when these features are served in production for debuggability, and even looking at what's happening in production and helping you keep these datasets updated in that first data source. So it's about orchestrating these data pipelines, provisioning this infrastructure, and managing and maintaining it in production, so that there's a lot less work on the modeler side, because they don't need engineering support to build up this pipeline, and a lot less work on the engineering side, because it's a significantly lower-maintenance task.
So how do we do this, right? Well, there are kind of three components that are important here. One is just this development workflow, managing features as code. So really simple declarative feature definitions. They're just Python files: just write really simple Python, PySpark, SQL, and a couple of other things, and then iterate on this in a very easy way, similar to how you would do any other kind of Git-based workflow. Create a branch to run your experiment, deploy this feature code to an experimental workspace, your private workspace, kind of like a branch. And then you can test out your model and evaluate it against the feature store there. Then when you feel good about your model and you've evaluated it, deploy it to production: merge that to the master branch and just go through your normal CI/CD process to deploy that code to production.
So it's all about bringing engineering best practices to these types of data science workflows, these data science projects, and treating the data science projects as production software, really: making them as reliable as production software, as debuggable as production software, and having them fit in with our other production software infrastructure.
Secondly, the feature pipelines. So we talked about that; that's quite important. There's a handful of types of feature pipelines that are supported in the platform. Every use case is going to need a different set of features, but those features may be of all different types. It's not always that simple run-a-SQL-query-once-a-day kind of thing. Mike's team has a bunch of use cases that are: hey, we need to compute this feature value in real time. It's actually only data that we see at prediction time, so we want to process that right away.
Sometimes there's a streaming transaction and there's a streaming aggregation, so the platform needs to provision and maintain a streaming job behind the scenes. And sometimes it's just that kind of simple run-a-Spark-job, run-a-SQL-job behind the scenes as well. But the output of those feature pipelines is both fresh data that is then used for serving, and historical data that is then used for compiling the long list of historical values of a feature for later model training. So let's see where that data goes.
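A streaming aggregation of the kind described can be sketched as a sliding window maintained per key as events arrive. This toy version keeps raw timestamps in memory; a managed streaming job would be far more efficient, but the feature semantics (a fresh count over the last N seconds) are the same.

```python
from collections import deque

class SlidingWindowCount:
    """Streaming-style aggregation: count of events per key over the
    last `window_seconds`, updated as each event arrives. A toy
    stand-in for what a managed streaming job maintains."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = {}   # key -> deque of event timestamps

    def observe(self, key, ts):
        q = self.events.setdefault(key, deque())
        q.append(ts)
        self._evict(q, ts)

    def value(self, key, now):
        q = self.events.get(key, deque())
        self._evict(q, now)   # drop anything that fell out of the window
        return len(q)

    def _evict(self, q, now):
        while q and q[0] <= now - self.window:
            q.popleft()

agg = SlidingWindowCount(window_seconds=60)
for ts in [0, 10, 20, 70]:          # event timestamps in seconds
    agg.observe("user:42", ts)

fresh = agg.value("user:42", now=70)   # only events at t=20 and t=70 remain
```

The served value is always current as of the last event, which is the "fresh data for serving" half; logging each window value over time would give the historical series used for training.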
So the last element here is the feature store. So all of that feature data that's computed, we store in two places: the online store and the offline store. The online store is really used for that fast retrieval to support real-time predictions. So if someone's trying to send a payment and you've got to determine, "Hey, is this fraud? Should I allow this? Should this be accepted?", you need to decide right away, and you need to retrieve those features really quickly so that you can make a real-time prediction, all within a pretty tight latency budget. And so that's what that online store is intended to support. The freshest value of every feature is always available from there in real time.
And then the offline store holds all of the historical values and puts them behind a really intuitive API for data scientists. It allows them to easily generate point-in-time correct training datasets, so they can say, "Hey, my model needs these 1,000 features, and I need every feature at this point in time, for every time a user logged in in the past. For these timestamps, give me that full array, that full matrix, of all of those features at those points in time." So that's what that offline store supports. And with that, you can support a really interesting set of workflows across the full data life cycle of the ML application.
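Point-in-time correctness can be illustrated in a few lines of plain Python: for each training example, take the latest feature value at or before the example's timestamp, never a later one. Using a later value would leak future information into training, which is exactly what this offline-store API is designed to prevent. The data below is made up for illustration.

```python
# user_id -> list of (timestamp, feature_value), sorted by timestamp
feature_history = {
    "u1": [(100, 3), (200, 5), (300, 9)],
}

def point_in_time_value(user_id, as_of):
    """Latest feature value whose timestamp is <= as_of, else None."""
    best = None
    for ts, value in feature_history.get(user_id, []):
        if ts <= as_of:
            best = value       # keep the most recent eligible value
        else:
            break              # history is sorted; later rows would leak
    return best

training_events = [("u1", 150), ("u1", 250), ("u1", 50)]
rows = [(u, t, point_in_time_value(u, t)) for u, t in training_events]
# rows -> [("u1", 150, 3), ("u1", 250, 5), ("u1", 50, None)]
```

Note the event at t=50 gets no feature value at all rather than the later value from t=100; a point-in-time join returns null before the feature existed instead of peeking into the future.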
So we have our data source, but we also have the product. The whole point of doing all the ML is to power predictions in the product. Well, let's look at the data flows through the whole data life cycle here. The first is just extracting the features from the raw data and turning them into training datasets, building our model, then using our model, deploying it online, and then, if we're doing recommendations, maybe generating some candidates that we're going to score, then generating the feature data for each of those candidates, and then creating those predictions. So these are all the different datasets that are needed in an operational machine learning application, but that's not really all of it.
There’s actually a second half of this loop, which is the data that is generated from the user’s behavior in production that then feeds the training data that we are constantly trying to keep updated. So we’re looking at what’s happening, we’re looking at if someone clicks the ad that we’re recommending or the item that we’re recommending, we’re logging that. We’re logging metrics, we’re logging predictions, features, labels. Did someone click it? This is the ground truth. We’re joining that all together into different data sets that can then be used by downstream ML processes. And so this whole loop is really important. And at Tecton we call this the ML flywheel. And it’s my claim that the teams that are really good at operational ML are really good at managing this loop. This flywheel is super important and the teams that are really good at ML, they’re actually very intentional about designing this loop and building this loop and managing it.
The teams that are still struggling with ML tend to not even realize there's a bottom half of this loop; it's owned by someone else in the organization, the data just shows up somehow, and they're not even really aware of where the rest of that data comes from. And so being very intentional about this loop allows your team to go faster, have higher debuggability throughout the different processes at any stage of the loop, and also just collaborate much more easily with your team.
So what’s the whole point of this? To allow feature engineers on your team to build and deploy and improve your models much faster. And so the idea is to support a really effective and natural workflow for those feature engineers. Define those features, deploy them, fetch them offline for training, online for predictions. But also we talked about not just the technical side of things, but the kind of organizational side. It’s important for collaboration, for sharing, for governance, to be able to track and reuse all of these features and then monitor them to ensure that they stay correct over time.
So at Tecton, we've built this platform, and this is what we were referring to with "feature platform". So we connect to your different data sources, and the system is implemented so that the monitoring, the feature pipelines, the feature store, and all of the feature repository and workflows are all implemented on top of your underlying data platform. That could be Databricks, for example. And then on the other end, you just pull training datasets or predictions from Tecton through the different APIs that are available in your data science environments.
So we just want to say one final thing about the flywheel: it's super important to be very intentional about it, and I think it's a thing that we're going to see again and again. It's what we're focused on enabling with Tecton, because the hard part, the friction in enabling this loop, tends to be in all of the data processes that underlie it. It's one of the things that we've been focused on enabling the Cash App team with: how can they get as fast as possible, as many iterations as possible, through this loop. And I think we've gotten there so far.
I’ll let you take over, Mike.
Yeah, sounds good. Yeah, so the quicker that you can do that, the more efficient your ML team’s going to be. The faster that you can get data in and out of the app and understand what your users are doing in response to your predictions and then use those in turn to influence the predictions, the more you’ll be able to run experiments and test things out and really accelerate the ML workflow in your business.
So going back to the payments recommendation problem, what does this give us? Primarily, it speeds our teams up. That's the key. This reduces the time to deploy a feature from days to hours. And what this allows us to do is focus more on the high-level business logic and less on things like configuring serving infrastructure or just keeping things running. And it also allows you to do things like trace back through your pipeline more easily, and log and introspect what's going on.
So this gives us a bunch of nice outcomes. It simplifies your ecosystem. I love simple. The simpler things are, the less we have to maintain. That is music to my ears. Data and compute are kept close. That's actually very important from a latency perspective, because you don't need to ferry things over the network. That will save you at least a few tens of milliseconds, if not more. Serialization also tends to be an enemy of latency. So it eliminates the serialization overhead that we had passing data between two different systems that didn't understand each other, and that reduces things like having to go back and forth from JSON or something really expensive like that. And it also requires less maintenance overhead. Less maintenance overhead, as I mentioned, means more focus on the business problem.
Organizationally, this also solves some big problems for us. It creates one system that we don't have to worry about organizational overlap on, so we don't need to get into questions of which team should be working on what. And it reduces the overhead of having to coordinate with multiple teams as well. This is really easy for the scientists to plug into, and that, I think, is an unspoken advantage of every one of these types of technologies: if your scientists have the ability to iterate more quickly, they'll come up with better features. You'll end up with higher accuracy in your ML systems because their loop will become tighter. This also allows us to focus more on the business logic, find the data that we need to trace and resolve issues, and reduce the time to market for a new feature, as I mentioned.
So faster iteration leads to happier modelers. Features getting easier, it leads to happier data scientists. More focus on business logic leads to happier engineers. And lower latency leads to happier users. So happily ever after. If you think this is cool, shameless plug for Cash App careers. We’re hiring ML engineers throughout our organization and we’d be really happy to speak with you.
And I'll also plug Tecton: we're hiring. Come visit our Tecton booth, Booth 127, and you can request a free trial and try it out if this sounds relevant for your team, on tecton.ai, or just come talk to us. And we have two solutions. Tecton is an enterprise feature platform; it's really great for collaboration and high-scale, production, operational ML. But we also support and are the maintainers of Feast, which is the most popular open source feature store, and a lot of people are using that to get their hands dirty. So if you have any interest in either of those, just come chat with us afterwards. Thank you.
If you have an existing pipeline, can you still take it over?
Yeah. Should I just repeat the question?
Yeah, just repeat it.
The question is: if you have an existing feature pipeline, can you just take it over? Yeah. Basically, all of Tecton's customers, they're not starting from zero. They're like, "Hey, we've got some machine learning pipelines that we already have." And as everybody knows, it's kind of a pain in the ass to migrate a data pipeline from one technology to another. So what we're really good at is just plugging into the existing pipelines that you have and consuming the output of those. And some people want to maintain their own orchestration engines, their own feature code, their own feature transformations, and that's fine. It's more of an add-on, kind of a wrapper around, or a consumer of, the output of that, rather than asking you to lift and shift onto an entirely new platform. But then what we see is people tend to say, "You know what? Let me build my next feature pipelines in Tecton." That's quite common. Okay, cool. Yeah, question.
So when ML scientists are using certain features, two different people might be using the same features. But on a feature engineering platform, if they write two files, for example two Python files, that are generating the same features, how do you avoid that redundancy, telling the ML scientist that these features have already been generated and they can just use them?
So the question, multiple people using the feature platform, how do you prevent two people from building the same feature?
Yeah. So the short answer is there's not a lot of automation around preventing that, and we frankly don't see that as being a main problem that our users and our customers have. What we do encourage is for teams like Mike's team to keep everything, their feature repository, in a Git repository and treat it like production code, and when there are changes you're making to that repo, do a code review. Have someone who knows a little bit about that go through it, just like any software change you would make to production. A lot of that can be managed in the data science world as well, so you don't need someone like Mike to review it. You can have some of the modeling team review it, and if you have the right review process, someone can say, "Hey, we already have that." If you have the right naming scheme, you can have someone notice, wait, there's already a feature that does this. But the kind of automation around validating, hey, this is 100% a different feature, or this is the same feature, that's not in the platform today. It just hasn't been a focus.
Keeping things tidy and centralized helps a lot too with that. I think there was a question in the back, right?
Thank you. Yeah, I have a question regarding what about the feature change? So while you’re developing a project that you realized there are new features that you needed, so how do you handle the feature change?
And also another question is about lineage. Yesterday during the training, [inaudible 00:31:38]. Does Tecton track each feature, which feature was used in which model, the lineage in the data?
And the last one is: how does Tecton's feature store compare to the other existing feature stores, like the Databricks feature store and the SageMaker feature store? Because what was presented here looks quite similar to what the SageMaker feature store offers as well. So thank you.
Cool. So sorry, what was the first part? So it's lineage, changes, and then comparison to other feature stores. So changes: there's actually pretty robust change management in Tecton, and I would love to hear, Mike, if you guys do something specific on top of that. But Tecton knows which models are using which features, and it knows which features are derived from which data sources, and it has a full lineage graph throughout that to make that really visible. And because we really focus on that lineage, what that enables is some smarter change management processes to make it easy for your teams to avoid common mistakes. So a common mistake is: oh, I just changed this feature, and I didn't know that this other model depended on this feature. I was doing my own work, but I broke the fraud model from the other folks.
So all the features are immutable in Tecton, so we make it really hard for you to accidentally change a feature in a way that's going to break another model. But also, if you really are making that change, then we have logic in the system, advanced error messages and workflows, to highlight that and emphasize it to the user. So we'll say, "Hey, are you really sure you want to change this feature? This model depends on this feature already, and you may break these other two models, and you're not even on the team that owns those models." So that's kind of tricky. And if you pair that up with the right access controls, then you can be very safe about which models can be affected by whom and prevent any errors. Before I go on, I'm curious if you guys do anything different on top of that.
Yeah, I mean, I don’t know how much I can get into the specifics there, but basically versioning of features and models is very important. So very often you’ll want to reproduce a particular point in time and then being able to see exactly what changed in the feature graph gives you that ability. So not much to add there. The immutability allows you to notify different downstream teams that you might break them. And then the ability to see the lineage will also allow you to go back and make sure that your training is consistent.
Cool. And then just in terms of comparison between Tecton and other feature stores: Tecton is a feature platform, and we do a lot more than just storing and serving the features. But I think there are kind of two ways to look at it. One way to look at it is the same kind of model as Databricks compared to AWS, right? AWS makes a hosted Spark service, but Databricks, that's their business's focus, and they're 100% focused on making the best Spark execution engine. And we're super focused on building the best user experience around a feature platform. So we just think that overall, across a variety of different parts of the user experience, it's just much nicer.
But some specifics, and I know I'm getting the hook here, so just some specifics to be more concrete. We do real-time feature transformations. We have super efficient real-time aggregations that are really easy to define. We have data connections that are really easy to set up and a lot more self-serve than other solutions. And the thing that we really differentiate on is reliability. We support Cash App, and these guys put us through the wringer. So it's really a focus on being able to support business-critical data processes, and not treating it like a data science tool but treating it like a production software tool has been our focus, and it affects how you build it. Thank you. Thanks for the question.
Thank you. Thank you very much guys.
Thanks everyone. Thank you for your attention, everybody. Appreciate it.
Interested in trying Tecton? Leave us your information below and we’ll be in touch.