
Intelligent Systems with real-time ML systems

apply(conf) - May '22 - 30 minutes

In an omni-commerce space such as Walmart, personalization is key to enabling customer journeys tailored to individual needs, preferences, and routines. Moreover, in e-commerce, customers’ needs and intent evolve over time as they navigate and engage with hundreds of millions of products. Real-time, session-aware ML systems are best suited to adapt to such changing dynamics, and they can power intelligent systems that provide 1:1 personalized customer experiences, from finding a product to delivering it to the customer. In this talk we will look at how we leverage session features to power customer preference engines in real-time applications at Walmart scale.

Manoj Agarwal:

Hey, everyone. My name is Manoj Agarwal. I’m an architect at Walmart, and I’m joined today by my colleague Praveen. We are here to share how we do personalization on our e-commerce platform. Praveen, can you share the slides? Praveen?

Praveen Kumar Kanumala:

Yes?

Manoj Agarwal:

Would you like to introduce yourself?

Praveen Kumar Kanumala:

Hi, everyone. This is Praveen. I work as a principal software engineer at Walmart Global Tech in the personalization and recommendations area, and I’m very excited to present our work at the apply() conference.

Manoj Agarwal:

Awesome. Let’s go to the next slide. Our agenda is pretty simple today. We’ll walk you through the high-level architecture of our personalization platform and the use cases we have on walmart.com. Then Praveen will take a deeper dive into a couple of areas: our customer preference engine and our feature and model store. Praveen, next slide please.

Manoj Agarwal:

We have hundreds of millions of customers across the US and international markets shopping at Walmart, across our thousands of stores and our websites. This diagram shows how, as we look at customer data from different channels, the picture of the customer becomes more and more complete. Looking at the data from only one channel lets you focus on customer behavior in that channel alone; to understand our customers holistically, we need to look at the broader picture of everything they are doing. With that, let’s start by understanding what personalization and recommendations do for our business and for our customers. Next slide please.

Manoj Agarwal:

Some of these use cases are examples from the e-commerce experience, but they apply equally to other channels, like in-store purchases. The customer journey spans different stages: acquisition, finding products, deciding on the most relevant product to purchase on our e-commerce platform or in-store, and finally converting, making the purchase of the items they are looking for. The customer journey also includes post-purchase experiences, like changing or amending orders, handling substitutions, and recommendations in case of out-of-stock items.

Manoj Agarwal:

For personalization and recommendations, as a customer goes through these stages, the majority of these experiences are what we call zero-query problems. What that means is that, for example, when a customer opens our application and lands on the homepage, she has not given us any query to tell us what she’s there to do. We have to rely on what we know about the customer prior to the start of the session: understand what her preferences could be, predict what her intent in coming to the Walmart app might be, and then paint our homepage based on those intents.

Manoj Agarwal:

As the customer engages and interacts during these stages with the different experiences presented to her, we can build a better view and better predictions of what her intentions could be in the given session. That helps us shape the customer’s experience, [inaudible] it towards her exact needs in the current session, and allow her to convert with a seamless, consistent experience. Across these different stages of the journey, several recommender system designs are at play in the background, enabling us to find the most relevant content to show to customers.

Manoj Agarwal:

Taking the homepage as an example: if the customer is a repeat grocery customer, we want the recommender systems at play to help her build her basket very quickly. Grocery baskets can be very big, typically 25 to 30 items, and adding each item to the cart one by one is time-consuming. So we provide a personalization module right on the homepage that lets customers add 10 to 15 items to the cart with a single click. That significantly reduces basket-building time and lets customers complete their journey efficiently.

Manoj Agarwal:

Apart from that, we also have recommendations that let customers discover new products. It’s not only about repurchasing what she has bought in the past, but also surfacing sections of the inventory she may not have engaged with before, or where she has shown interest but not converted. These recommendations also contain seasonal and event-based content: with, say, Father’s Day coming up next month, or a Rollback event, we work out what relevant content to show our customers at different stages of their journey.

Manoj Agarwal:

Currently, personalization serves billions of product impressions across the entire ecosystem of our customer journey. Beyond product impressions, we also personalize non-product content like banners and badges: what to show to the customer, and in what context. All of this content is, in a way, the language in which we speak to our customers, telling them that we know them, we understand their needs and intent, and we are there to help them build their basket with the right items.

Manoj Agarwal:

For these predictions to be right, we need to understand customer intent at the micro level, and we’ll talk about that micro intent in detail in upcoming slides. The idea is that a customer’s intent can shift multiple times during a session. For example, they might start by shopping for produce, then immediately switch to pantry items, and then maybe to bathroom supplies and baby care products. Our inference engine has to recognize these shifts in intent and adapt accordingly, so the customer doesn’t hit friction points in the journey.

Manoj Agarwal:

In the cart, we provide two kinds of recommendation experiences. One we call post-add to cart, or PAC for short; the other is the last call experience. Post-add to cart, as the name suggests, shows recommendations based on the items in your cart, for example items that go well with what you have already added. The last call experience, again as the name suggests, is the last reminder before you hit checkout, in case there are other items you might be interested in. Last call mimics the checkout aisle in our stores, so it may feature low-consideration items like candy or batteries: things our prediction engine expects the customer would likely buy at checkout.

Manoj Agarwal:

For these cart experiences, last call and post-add to cart, the objectives of the recommendation system are different. In last call, we typically don’t want the customer to go back into the discovery experience; they have made up their mind to check out, and we want to make sure they proceed to checkout. So these are the different areas in which we provide recommendations. Praveen, next slide please.

Manoj Agarwal:

This picture represents the high-level architecture of probably any ML system. We have a data lake with multiple kinds of data processing attached: batch processing, stream processing, and on-demand processing. These typically perform feature engineering and data transformation. Eventually, all of that data is used for model training and testing. Once a model is ready, we store it in the model store, and all the features go to a feature store. The model store supports multiple versions of each model; from there, a model is picked up by the inferencing service, and that’s how it gets plugged into the application. The feedback loop completes with the application sending more data, click-through rates and other feedback, back to the data lake, which is used for further training. There’s nothing new here; I’m sure anybody attending this conference has seen this kind of flow many times, so I won’t spend much time on it. Praveen, next slide please.
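
To make the flow concrete, here is a minimal, self-contained sketch of the loop just described: feature engineering feeding a feature store, training feeding a versioned model store, and an online inference call that enriches the request with stored features. Everything here is a toy stand-in for illustration, not Walmart’s actual API.

```python
# Toy stand-ins for the components in the diagram; all names are hypothetical.
feature_store = {}   # customer_id -> feature dict
model_store = {}     # model_name -> list of model versions

def engineer_features(events):
    # Batch/stream/on-demand processing reduces raw events to features.
    return {"views": sum(e == "view" for e in events),
            "adds": sum(e == "add_to_cart" for e in events)}

def train(training_rows):
    # Stand-in "training": returns a scoring function as the model.
    def model(f):
        return 0.7 * f["adds"] + 0.3 * f["views"]
    return model

def infer(customer_id, model_name="preference"):
    # Online inference: enrich the request from the feature store,
    # then score with the latest model version from the model store.
    features = feature_store[customer_id]
    model = model_store[model_name][-1]
    return model(features)

feature_store["c1"] = engineer_features(["view", "view", "add_to_cart"])
model_store["preference"] = [train([feature_store["c1"]])]
print(infer("c1"))  # clicks on the result flow back to the data lake
```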

Manoj Agarwal:

So let’s talk a little bit about intent, since I’ve been mentioning it for a while. Here we introduce the concept of what we call micro intent, mainly because intent is dynamic in nature: it applies in a given context, at a given time, for a given customer. In this example, say the customer is trying to arrange a party. First, she looks at the party supplies she needs, adding plastic forks, spoons, and other utensils. Then she immediately starts looking for ice cream, because her micro intent has changed from getting party supplies to getting the food for the party. Then she moves on to adding gift cards; maybe she’s looking for return gifts.

Manoj Agarwal:

This is just one example of how micro intent comes into play in a given session. In addition to the data itself, we pay a lot of attention to detecting the customer’s micro intent so our inferencing engine can shift accordingly. Let’s move to the next slide; Praveen will walk you through the details of the recommendation engine and personalization system.
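
As a hedged illustration of micro-intent detection, the toy function below weights the most recent session interactions more heavily and reads off the dominant category. The exponential decay and the add-to-cart boost are assumptions for illustration, not Walmart’s actual model.

```python
from collections import defaultdict

def micro_intent(session_events, decay=0.5):
    """session_events: oldest-to-newest list of (category, action)."""
    scores = defaultdict(float)
    weight = 1.0
    for category, action in reversed(session_events):   # newest first
        boost = 2.0 if action == "add_to_cart" else 1.0
        scores[category] += weight * boost
        weight *= decay                                  # older events fade
    return max(scores, key=scores.get)

events = [("party_supplies", "add_to_cart"),
          ("party_supplies", "view"),
          ("ice_cream", "view"),
          ("ice_cream", "add_to_cart")]
print(micro_intent(events))  # -> ice_cream: the micro intent has shifted
```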

Praveen Kumar Kanumala:

Thank you, Manoj. That was a very good overview of the personalization and recommendation areas, and of customer intent. Coming to what exactly customer preferences are and how we build a model from them, I’ll give a high-level overview of some of the feature data we use in the customer preferences model.

Praveen Kumar Kanumala:

Here you can see an example where we capture all the historical information about a particular customer: the different items viewed, the transactions the customer has made, and the different items added to the cart. Using these historical features, we learn model weights and try to predict which items the customer has a high probability of purchasing in a future time interval.

Praveen Kumar Kanumala:

For training the model weights, we use variable time windows: different periods of time, like a 5-day or a 10-day window. In the example shown, we use historical information from March to January 2021 to predict what the customer has a high probability of purchasing in February 2021.
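
A sketch of how such windowed training examples might be assembled: features are aggregated over a trailing history window, and the label is whether the customer purchased in the following interval. The window lengths, event schema, and feature names are illustrative assumptions.

```python
from datetime import date, timedelta

def make_example(events, cutoff, label_days=10):
    """events: list of (event_date, item_id, action) for one customer."""
    history = [e for e in events if e[0] <= cutoff]
    future = [e for e in events
              if cutoff < e[0] <= cutoff + timedelta(days=label_days)]
    features = {"views": sum(a == "view" for _, _, a in history),
                "carts": sum(a == "add_to_cart" for _, _, a in history)}
    label = any(a == "purchase" for _, _, a in future)   # bought in next window?
    return features, label

events = [(date(2021, 1, 5), "milk", "view"),
          (date(2021, 1, 20), "milk", "add_to_cart"),
          (date(2021, 2, 3), "milk", "purchase")]
print(make_example(events, cutoff=date(2021, 1, 31)))
# -> ({'views': 1, 'carts': 1}, True): history predicts a February purchase
```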

Praveen Kumar Kanumala:

Coming to a real-world example of how customer context works in reranking an underlying item recommendation pool: here we have a customer whose context we have identified, say that he or she likes iPhone products. By learning that customer context, there are a couple of other things we can predict. One is that, given the product, the iPhone, we can also predict which price bands the customer is likely to buy in. With both the price band and the brand, data we call customer understanding, we can use this customer context to rerank any underlying item recall set and highlight the items that have a very high probability of purchase at any point in time.

Praveen Kumar Kanumala:

Here you can see what we call the customer understanding ranking algorithm, which is a function of the underlying item recommendation pool’s relevance score. It also considers item attribute understanding, as we discussed: for any given product, what is the brand, and what is the price band for that item? And it takes the customer context into consideration. For example, brand-level affinity scores are available for any given customer, and price-level affinity scores as well.

Praveen Kumar Kanumala:

With all of this information together, you can see in the example how the items get reranked: the AirPods item is reranked to the top of the recommendation list, and the remaining items are reranked below it. This is a good real-world example of how we use customer understanding preference data.
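
A hedged sketch of that reranking step: the final score blends the recall set’s relevance score with the customer’s brand- and price-band affinities. The linear blend and its weights are illustrative assumptions, not the exact production formula.

```python
def rerank(recall_set, brand_affinity, price_affinity,
           w_rel=0.5, w_brand=0.3, w_price=0.2):
    def score(item):
        return (w_rel * item["relevance"]
                + w_brand * brand_affinity.get(item["brand"], 0.0)
                + w_price * price_affinity.get(item["price_band"], 0.0))
    return sorted(recall_set, key=score, reverse=True)

recall_set = [
    {"id": "USB-C cable", "brand": "Anker", "price_band": "low", "relevance": 0.9},
    {"id": "AirPods",     "brand": "Apple", "price_band": "mid", "relevance": 0.7},
]
brand_affinity = {"Apple": 0.95}   # learned from the customer's iPhone history
price_affinity = {"mid": 0.8}
print([i["id"] for i in rerank(recall_set, brand_affinity, price_affinity)])
# -> ['AirPods', 'USB-C cable']: affinity lifts AirPods to the top
```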

Praveen Kumar Kanumala:

At a very high level, how do we build the customer preferences engine? On the left-hand side you can see the item recommendation flow, and we can go into a bit more detail on what the recall set scoring architecture looks like. On the left, we gather the recall-related information using user-to-item interaction data, for example which items were viewed and added to the cart. Using this information, we form a recall set, and we extract user-to-item interaction embeddings from it. From the item catalog data, we also generate item embedding vectors, which can be used to find similar items.

Praveen Kumar Kanumala:

Pairwise item attributes are also extracted from the item catalog. All of these features are fed into the pairwise scoring algorithm we use, a wide-and-deep model. We combine the wide and the deep parts to minimize the loss function at the very last layer, producing a pairwise score. Using this relevance score, we can present a recall set that the customer has a high probability of purchasing from at any point in time.
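
For readers unfamiliar with the architecture, here is a minimal wide-and-deep scorer in Keras, in the spirit of the pairwise model described above; the input shapes and layer sizes are illustrative assumptions, not the production configuration.

```python
import tensorflow as tf

# Wide path: sparse/crossed pairwise item attributes, kept linear.
wide_in = tf.keras.Input(shape=(32,), name="pairwise_attributes")
# Deep path: dense interaction/item embeddings through hidden layers.
deep_in = tf.keras.Input(shape=(64,), name="interaction_embeddings")
deep = tf.keras.layers.Dense(128, activation="relu")(deep_in)
deep = tf.keras.layers.Dense(64, activation="relu")(deep)

# Both paths meet at the final layer, where one loss is minimized jointly.
merged = tf.keras.layers.concatenate([wide_in, deep])
pairwise_score = tf.keras.layers.Dense(1, activation="sigmoid",
                                       name="pairwise_score")(merged)

model = tf.keras.Model([wide_in, deep_in], pairwise_score)
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```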

Praveen Kumar Kanumala:

Moving to the next slide, one of our use cases is generating the recall set at a faster pace, that is, keeping latencies very low. To serve all of our embedding data, we use the Milvus ANN platform, where we maintain different collections of embeddings, as represented here: one set of embeddings is targeted at finding similar items, while other embeddings are targeted at customers.

Praveen Kumar Kanumala:

Different kinds of embedding collections are created in the Milvus platform. Using this data, given any anchor item, we can fetch in real time: say I want to find similar items, we can query the Milvus platform and find all the similar items for a given item embedding within the available embedding space.
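
A sketch of that real-time similar-item lookup using the pymilvus client; the host, collection name, field names, and index parameters are assumptions for illustration.

```python
from pymilvus import connections, Collection

connections.connect(host="milvus.internal", port="19530")  # hypothetical host
items = Collection("item_embeddings")                      # hypothetical collection
items.load()

anchor_vector = [0.1] * 128   # embedding of the anchor item

results = items.search(
    data=[anchor_vector],
    anns_field="embedding",
    param={"metric_type": "IP", "params": {"nprobe": 16}},
    limit=10,                                # top-10 similar items
    output_fields=["item_id"],
)
for hit in results[0]:
    print(hit.entity.get("item_id"), hit.distance)
```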

Praveen Kumar Kanumala:

This is a very good example of generating the recall set efficiently so that model coverage can be very high. Rather than preloading all the batch-inferred data into a backend data store and serving recommendations only when data happens to be available there, moving some of these use cases to the Milvus platform lets you efficiently improve your model’s coverage, since you access this data in real time.

Praveen Kumar Kanumala:

Coming to a combined, more detailed architecture of what our feature and model store looks like: as everyone knows, to build a good real-time inference platform you need a very good feature store, and that feature store must be accessible for real-time inference at low latency. Keeping latency to a minimum is the main goal, because if recommendations take too long to load, there is a chance we lose the customer: she may move away from the recommendation module altogether.

Praveen Kumar Kanumala:

As Manoj mentioned, session-related context plays a very important role in customer and intent understanding. We focus heavily on stitching events together in real time and aggregating session-context data, so that we can use this data for real-time inference and provide better recommendations, because customer intent can change within a single session based on the items viewed or the transactions made.

Praveen Kumar Kanumala:

You can see here that, in terms of the feature store, we aggregate data from different feature sources while keeping data latency very low. That helps us build a very useful session-related context for a given customer, to predict that customer’s intent and understanding at any given point.
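
A toy sketch of that session stitching: clickstream events are folded into a per-session context record that the online path can read with low latency. The event schema and the in-memory store are stand-ins for illustration.

```python
import time

session_store = {}  # session_id -> context; stand-in for a low-latency KV store

def on_event(event):
    """Called per clickstream event by a streaming job."""
    ctx = session_store.setdefault(event["session_id"],
                                   {"categories": {}, "last_action": None,
                                    "updated_at": None})
    cat = event["category"]
    ctx["categories"][cat] = ctx["categories"].get(cat, 0) + 1
    ctx["last_action"] = event["action"]
    ctx["updated_at"] = time.time()

for e in [{"session_id": "s1", "category": "produce", "action": "view"},
          {"session_id": "s1", "category": "pantry", "action": "add_to_cart"}]:
    on_event(e)

print(session_store["s1"])  # the read path used by real-time inference
```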

Praveen Kumar Kanumala:

On the other side, you can see the model store, where we store all our trained models so that we can easily A/B test different models: is the new version of a model performing better than the old versions, and if not, what steps do we need to take, using the feedback data, to retrain those models?
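
One common way a versioned model store supports A/B testing is deterministic traffic splitting by customer, so each customer consistently sees one variant. The hashing split below is an illustrative assumption, not the platform’s actual mechanism.

```python
import hashlib

model_store = {"preference": {"v1": "old model", "v2": "new model"}}

def variant_for(customer_id, experiment="pref-v2-test", treatment_pct=10):
    digest = hashlib.md5(f"{experiment}:{customer_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100          # stable bucket per customer
    return "v2" if bucket < treatment_pct else "v1"

for cid in ["c1", "c2", "c3"]:
    version = variant_for(cid)
    print(cid, version, model_store["preference"][version])
# Per-variant engagement metrics then feed the retraining decision.
```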

Praveen Kumar Kanumala:

Coming to the next slide, a detailed architecture view of our online inference and serving platform, which plays a critical role in predicting a customer’s intent in any given session. We built this online inference platform entirely on Kubernetes in a cloud environment. We are also multi-cloud, since Walmart operates with different cloud partners, so whatever platform we build must be easily deployable across our different cloud partners’ sites.

Praveen Kumar Kanumala:

As part of the online inference platform, we did a couple of things to ensure it is a multi-model platform. That means a data scientist can bring a TensorFlow model, a PyTorch model, an ONNX model, or any other kind of model to this platform and easily spin up a microservice from it. We also provide some interesting features where you can tie different kinds of models together in an ensemble fashion. That is, if you have two independent, focused models and you want to compose them into an ensemble, feeding the output of one model as a feature into the second, you can easily do that through configuration that ties the two models together.
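
A sketch of configuration-driven chaining as described, where one model’s output is fed as a feature into the next; the config shape and model names are hypothetical, not the platform’s actual schema.

```python
ensemble_config = {
    "name": "intent-then-rank",
    "steps": [
        {"model": "intent_classifier", "output_feature": "predicted_intent"},
        {"model": "item_ranker"},   # consumes predicted_intent as a feature
    ],
}

models = {  # stand-ins for deployed model microservices
    "intent_classifier": lambda features: "grocery",
    "item_ranker": lambda features: (["milk", "eggs"]
                                     if features.get("predicted_intent") == "grocery"
                                     else []),
}

def run_ensemble(config, features):
    result = None
    for step in config["steps"]:
        result = models[step["model"]](features)
        if "output_feature" in step:
            features[step["output_feature"]] = result  # feed forward
    return result

print(run_ensemble(ensemble_config, {"customer_id": "c1"}))  # -> ['milk', 'eggs']
```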

Praveen Kumar Kanumala:

As you can see, we have also built something called the inference graph. At a high level, the main idea behind the inference graph was twofold. First, we want to expose human-readable APIs to all our clients, because most of the time the models take embedding vectors or features that only the engineers, ML engineers, or data scientists working on the model can easily understand; to build a platform, you need a proper human-readable API. So the inference graph exposes gRPC or REST APIs.
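
A sketch of the human-readable API idea: the client sends business identifiers, and the inference graph resolves them internally to model-ready vectors via the feature store. Names and shapes here are hypothetical.

```python
feature_store = {"c1": [0.12, -0.45, 0.88]}   # customer_id -> embedding

def model_predict(embedding):
    return ["item-42", "item-7"]   # stand-in for the model call

def handle_request(request):
    """REST/gRPC handler: clients never deal in raw embeddings."""
    embedding = feature_store[request["customer_id"]]  # resolved internally
    return {"recommendations": model_predict(embedding)}

print(handle_request({"customer_id": "c1", "page": "homepage"}))
```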

Praveen Kumar Kanumala:

The inference graph also has other capabilities. Given a request, it can automatically talk to the feature store to gather all the required features. And based on how you configure your model, it runs a set of configurable pre-processing and post-processing steps. To give a post-processing example: say we have a business rule for a particular model where we only want to show items that can be delivered to your home, rather than items you pick up in the store. To handle these kinds of cases, we use post-processing as a filtering mechanism on the recall set generated by the model, and the output of this post-processing step is what goes back to the clients.
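
A minimal sketch of that post-processing filter: recall-set items that cannot be delivered to home are dropped before the response returns to the client. The field names are illustrative.

```python
def postprocess(recall_set, required_fulfillment="home_delivery"):
    return [item for item in recall_set
            if required_fulfillment in item["fulfillment_options"]]

recall_set = [
    {"id": "tv", "fulfillment_options": ["home_delivery", "pickup"]},
    {"id": "propane tank", "fulfillment_options": ["pickup"]},  # store only
]
print([i["id"] for i in postprocess(recall_set)])  # -> ['tv']
```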

Praveen Kumar Kanumala:

A couple of other things you can notice here: we envisioned this real-time inference platform handling both async and real-time calls. That means there are cases where you make predictions and update the prediction scores into a data store, for example in a Lambda-style architecture where predictions are regenerated from customer interaction events. We use streaming jobs in those cases to compute the prediction scores and update the data stores. And that’s a preview of our talk today on the intelligent customer preferences platform.
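
To round that out, here is a toy sketch of the async path just described: a streaming job consumes interaction events, recomputes prediction scores, and upserts them into a serving store, Lambda-architecture style. The event schema and store are stand-ins.

```python
prediction_store = {}  # customer_id -> score, read by the online path

def score_model(features):
    return 0.6 * features["adds"] + 0.4 * features["views"]

def streaming_job(event_stream):
    features = {}
    for event in event_stream:   # in production, a stream consumer
        f = features.setdefault(event["customer_id"], {"views": 0, "adds": 0})
        f["adds" if event["action"] == "add_to_cart" else "views"] += 1
        prediction_store[event["customer_id"]] = score_model(f)  # upsert

streaming_job([{"customer_id": "c1", "action": "view"},
               {"customer_id": "c1", "action": "add_to_cart"}])
print(prediction_store["c1"])  # precomputed score served at request time
```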

Demetrios:

Awesome, guys. That is super cool. I want to ask a few questions from the chat, because we have a minute before the scheduled time. While we wait for more questions to come in, there was a really cool one I’d like to hear your take on. It’s about feature stores in general: does a feature store contain pre-computed features for all possible online inputs, to avoid the need to compute features in real time during inference, or is it used differently to speed up inference?

Manoj Agarwal:

Yeah. For online use cases, it’s used for exactly that purpose: the features are already computed and available, and we use them to enrich the inferencing request. But the feature store has many other purposes too. We use it to save compute costs for model training: if we have already spent compute to calculate a feature, we store the intermediate or final result in the feature store, and it can be reused when training another model that needs the same feature, or when retraining the same model later. But yes, for inferencing it’s exactly that use case: we refer to the feature store to enrich the request, so we don’t have to do all the feature computation at request time.

Demetrios:

Excellent. Sweet. Well, thanks guys. It looks like the chat’s pretty quiet. Anyone who wants to throw in questions for Manoj or Praveen, feel free to hit them up on Slack as usual, as we’ve been doing all day. Manoj, it’s great seeing you again, man. And I hope-

Manoj Agarwal:

Yeah, same here.

Demetrios:

… we can come back on the MLOps Community podcast sometime soon and talk all about this.

Manoj Agarwal:

Yeah, looking forward to that.

Manoj Agarwal

Architect Fellow

Walmart Global Tech

Manoj has 25 years of strong distributed systems and ML platforms experience. Currently, he is an Architect Fellow at Walmart, making e-commerce smarter with AI. Previously, at Salesforce, he architected a brand-new, comprehensive machine learning platform designed to serve millions of models and billions of inferences per day. He has been building search and ML platforms for the last ten years: he modernized the search middleware at Yahoo and was an initial architect of the Amazon Visual Search platform. He holds 10+ patents in the search and ML area. Earlier in his career, he enjoyed building cloud platforms. He was a founding member of the Azure team at Microsoft, where he led a team delivering a B2B integration suite of services on Azure, and his passion for cloud platforms led him to Rackspace, contributing to the OpenStack control plane. Manoj likes to share his knowledge at meetups and conferences; recently, he presented at the IEEE Infrastructure Conference, AI DevWorld, MLOps.community, and other meetup groups. In his spare time, he likes to play board games and explore Bay Area hikes with his wife and two young adult children.
Praveen Kumar Kanumala

Principal Software Engineer

Walmart Global Tech

Praveen is a Principal Software Engineer of Personalization & Recommendations at Walmart Global Tech, with 10+ years of experience working on distributed systems, microservices, and ML inference platforms. In his current role, he leads a group of engineers building a multi-model ML inference platform, a similarity vector data store, and an interleaving testing platform for ranking algorithms, which serves millions of requests and powers the core models of Personalization. In previous roles at Walmart, he was instrumental in building catalog services for Samsclub.com. He likes to drive innovation and research in ML platforms focused on real-time inference in the recommender and personalization area. In his free time, he likes to watch movies and spend time with his family.
