Extending Open Source Feature Stores to Fit Adyen | Tecton

apply(conf) - May '22 - 10 minutes

We walk you through how we adopted Feast at Adyen. We’ll discuss the decisions we made because of infra and tech constraints, and the customizations we added, in particular our open source project, spark-offline-store, which was adopted into the main Feast repo. We hope our journey can help you reason about adopting Feast into your stack.

Joost van Ingen:

Hi guys, my name is Joost. I’m here with Thijs, and we are here to speak a little bit about feature stores and the work we did for that at Adyen. So, yeah, as you can see from the slide, I’m no longer part of Adyen, but I guess we left on such good terms that they still agreed to let me do this talk. I’m doing this talk with Thijs. It’ll consist of two parts.

Joost van Ingen:

So in the first bit, I’m basically going to give a whole bunch of context. I found this quite useful in talks from previous iterations of the Apply Conference that we watched while doing the work, so we’re returning the favor here. Thijs will do the second part, in which he will discuss the plugin that we developed to make sure that we can run Feast on our infrastructure at Adyen. So in the next slide, there’s a bit of introduction about Adyen. We are a payment company. Basically, if you’re a business and you want to accept money from shoppers, Adyen can help you do that. You can see a bunch of colorful logos of companies that already trust us. As you can see from the numbers, it’s quite a big company, and we process large amounts of money. So on the next slide, we listed a few of the use cases that Adyen uses machine learning for.

Joost van Ingen:

I think many people can sort of predict that credit card fraud is something that we want to prevent, and we definitely use ML for that. Another use case that is maybe a bit newer has to do with the fact that payments fail. Of course, that’s not new, but payments do fail, and a common reason is that people don’t have money in their account, while later in the month they might receive their salary. So you can use ML to predict when it’s a good time to retry the payment. And like that, there are many use cases.

Joost van Ingen:

In order to support all these use cases at Adyen, we built an in-house machine learning platform. So on the next slide, it’s important to know about this platform that everything is hosted on-premise. Early on in the history of the company, Adyen made the decision not to use public cloud, so we have to host everything ourselves. Some other facts and figures about our ML platform: we release a couple of model versions every week and process quite a large volume, and many of these machine learning models are also part of the real-time payment flow, right? So we have some low-latency and reliability constraints there. And as you can see, over the last years, the ML team has grown quite a bit into a sizeable organization. So it was very fun to be able to work on all this.

Joost van Ingen:

Well, on the next slide we’re getting closer to feature stores. This is a basic timeline of the last year and the major milestones we had in implementing a feature store based on Feast. So yeah, a bit more than a year and a half ago, there was no general-purpose feature store at Adyen, which was a very sad time: it was very difficult for data scientists to reuse the feature engineering efforts they had made. That’s why we decided to implement one based on Feast. Version 0.8 was the first one we used, a long time ago. We saw quite some adoption, people liked it, and that’s why we kept developing. Then at the end of last year we were able to upgrade to a more recent version, Feast 0.14, and at the same time develop a plugin to make sure that we can run Feast on Spark, which is the main data processing tool that we use. And then quite a few months ago, this plugin was moved into the main Feast project, which is also the main reason we’re here today.

Joost van Ingen:

And then the next slide is sort of the mandatory feature store diagram slide. I always appreciated these a lot. They gave us a lot of confirmation that we were building something sensible. Basically, on the right side you see the part where Feast is really living at Adyen: our big data and machine learning platform. It’s a Hadoop platform. I’m not an infrastructure engineer, so I’m probably forgetting a bunch of important technologies. We use HDFS for storage and Python and Spark for processing, and we use Feast basically only here. On the left side, you see the real-time payment processing platform with the online feature store. You can see that everything there is based on Java and Postgres, and we built a very custom solution.

Joost van Ingen:

So this is definitely a part where we see some future improvements and also something that Adyen is still working on. The rest of the presentation will focus mainly on the right side and the component called the Spark Offline Store. So, Thijs?

Thijs Brits:

Thank you, Joost. So yeah, the Feast-Spark-offline-store is an in-house built package. I’ll explain a bit why we really needed to build this, but of course it’s for Feast, the open source feature store. In earlier versions, Spark was supported in Feast, but later they decided to move it out of the main repo. Then the other repos that focused on the Spark and Feast combination became incompatible, so we really needed something to interface with it. And the reason for it was that, at Adyen, the most wanted offline features were things like aggregations over day ranges and larger calculations, which every feature store facilitates. And yeah, a lot of our internal ETL pipelines were, of course, calculating these features using PySpark. Feast interfaces nicely with Python, and we were using Spark internally as well.
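To make "aggregations over day ranges" concrete, here is a minimal Python sketch of one such feature: the total amount a shopper paid in the days leading up to a given date. The `Payment` record, the field names, the helper `trailing_sum`, and the sample values are all assumptions made for illustration. The talk says these features were computed with PySpark in internal ETL pipelines; this plain-Python version only shows the shape of the calculation.

```python
from datetime import date, timedelta
from typing import Iterable, NamedTuple


class Payment(NamedTuple):
    """Hypothetical payment record; not an actual Adyen schema."""
    shopper_id: str
    day: date
    amount: float


def trailing_sum(payments: Iterable[Payment], shopper_id: str,
                 as_of: date, days: int) -> float:
    """Sum a shopper's amounts over the `days` days before `as_of`.

    The window is [as_of - days, as_of): the `as_of` day itself is
    excluded, so the feature only uses data from earlier days.
    """
    start = as_of - timedelta(days=days)
    return sum(
        p.amount
        for p in payments
        if p.shopper_id == shopper_id and start <= p.day < as_of
    )


# Made-up sample data.
payments = [
    Payment("s1", date(2022, 5, 1), 10.0),
    Payment("s1", date(2022, 5, 6), 5.0),
    Payment("s1", date(2022, 5, 8), 7.0),
    Payment("s2", date(2022, 5, 6), 3.0),
]

seven_day_total = trailing_sum(payments, "s1", date(2022, 5, 8), days=7)
three_day_total = trailing_sum(payments, "s1", date(2022, 5, 8), days=3)
```

Computing the same feature for several window lengths (here 3 and 7 days) is the kind of repeated work that a shared feature store lets teams define once and reuse.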

Thijs Brits:

So yeah, we needed some consistency and we wanted to promote feature reuse within the company, so we wanted a familiar-looking interface instead of custom-built adapters. We wanted consistency with the actual Feast repo, so we set up a custom plugin. We decided to build this in-house first, see if it got adopted, and then maybe open source it later. Then we started building the Spark Offline Store. We basically followed the templates of BigQuery and Redshift. We could really look at these nice examples and focus on the consistency between them, and it was really easy to develop this way.

Thijs Brits:

So here’s an example of what it looks like. Basically, you can use Spark to read your different files, or tables that you have in, maybe, your Hive metastore, or put a SQL query directly in it, the usual way, but it uses the data source objects that Feast uses to interface with. You can also add your Spark configuration to your feature store YAML. It’s not necessary, though: if you already have a Spark session running, that’s also fine.
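The slide itself is not part of the transcript, so the following is only a rough Python sketch of the idea: one data source object that can point at files, at a metastore table, or at a SQL query. `SourceSpec` and `from_clause` are hypothetical stand-ins written for this illustration, not Feast classes. In Feast the corresponding class is `SparkSource`, whose exact parameters depend on the Feast version.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceSpec:
    """Hypothetical stand-in for a Spark-backed data source (not Feast's API)."""
    name: str
    timestamp_field: str
    path: Optional[str] = None         # files, e.g. on HDFS
    file_format: Optional[str] = None  # e.g. "parquet"; needed with `path`
    table: Optional[str] = None        # table known to the Hive metastore
    query: Optional[str] = None        # arbitrary SQL query

    def __post_init__(self) -> None:
        chosen = [v for v in (self.path, self.table, self.query) if v is not None]
        if len(chosen) != 1:
            raise ValueError("set exactly one of path, table or query")
        if self.path is not None and self.file_format is None:
            raise ValueError("file_format is required when path is set")

    def from_clause(self) -> str:
        """Return a fragment that can follow FROM in a Spark SQL statement."""
        if self.table is not None:
            return self.table
        if self.query is not None:
            return f"({self.query}) AS {self.name}"
        # Spark SQL can query files directly: format.`path`
        return f"{self.file_format}.`{self.path}`"


# Made-up names and paths, one source per access style.
files = SourceSpec(name="payments_files", timestamp_field="event_timestamp",
                   path="/data/payments", file_format="parquet")
table = SourceSpec(name="payments_table", timestamp_field="event_timestamp",
                   table="payments")
query = SourceSpec(name="payments_query", timestamp_field="event_timestamp",
                   query="SELECT * FROM payments WHERE amount > 0")

statements = [f"SELECT * FROM {s.from_clause()}" for s in (files, table, query)]
```

Whichever of the three options is used, the rest of the pipeline sees the same kind of object, which is what gives the familiar-looking interface described in the talk.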

Thijs Brits:

So we got some buy-in and adoption within the company, but then we were like, okay, we need to open source this. We need to give back to the community, because we’re using Feast a lot. And that was great.

Thijs Brits:

We got some comments and some pull requests from other people. Then at a certain point, we got contacted by, I think, the main contributor of Feast, asking if we wanted to add this to the main [inaudible], which was recently done. So big thanks to Kevin and Danny, who are, I think, also giving a talk at this conference. It’s currently in alpha status, so still be a little bit careful, but it will continue to be worked on. So the conclusion for us is that the feature store is a valuable part of an ML platform. We’ve seen a lot of buy-in within Adyen. We built a lot of custom logic, but Feast is valuable because it provides good abstractions and guides the engineering thinking. Over time, the custom logic became less and less needed with new Feast upgrades, and yeah, Feast is extensible to fit diverse tech and infra constraints. So we like the new direction. And to conclude, our experience in creating a custom plugin was very smooth. So yeah, that’s it. Thanks.

Joost van Ingen

Data Engineer

Dexter Energy

Joost is a software engineer with experience working on ML and data platforms. While at Adyen, he helped design and build a feature store. He recently joined Dexter Energy, a startup providing predictions on the energy market, where he will continue his career building data systems.

Thijs Brits

ML Engineer

Adyen

Thijs is a Tech Lead and ML engineer at Adyen and worked on various feature lifecycle projects both for online- and batch prediction ML products.
