Tecton

Effective ML System Development

apply(conf) - May '22 - 10 minutes

In order to efficiently deliver and maintain ML systems, the adoption of MLOps practices is a must. In recent times, the ML community has embraced and modified ideas originating from software engineering with reasonable success. Software 2.0 (AI/ML) poses some additional challenges that we are still struggling with today. In addition to code, data and models also abide by the continuous principles (Continuous Integration, Delivery and Training). At Volvo Cars, we are embracing a git-centric, declarative approach to ML experimentation and delivery. The adoption of MLOps principles requires cultural transformation alongside supportive infrastructure & tooling that enables efficient development throughout the ML lifecycle. Join us for this session to learn about how Volvo Cars embraces MLOps.

Leonard Aukea:

Cool. I just realized by hearing all of these other talks that I might be preaching to the choir, but anyways, I have 10 minutes to go and I was planning to talk about effective ML system development. Sorry, just give me a second. Right there.

Leonard Aukea:

First and foremost, I think some people might forget that machine learning is actually software. It’s data-intensive software. I see it as usually being implemented as data-intensive distributed systems. Anyways, it’s also, as stated, dependent on data. If you want to build good machine learning systems, or reliable ones, you need to figure out how you’re managing your data.

Leonard Aukea:

Uncertainty is a feature of machine learning, but also sometimes a bug. It’s harder to test for those reasons that we mentioned. You also need to test your model and you need to test your data. That’s also why it’s harder to debug, since the search space for a potential bug is much larger and has additional dimensions to it.
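To illustrate the point about testing data as well as code, here is a minimal sketch of a batch validation check. It is not taken from the talk: the `ColumnRule` type, the `validate_rows` helper, and the `speed_kmh` column with its thresholds are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class ColumnRule:
    """Hypothetical expectation for one numeric column of a data batch."""
    name: str
    min_value: float
    max_value: float
    max_null_fraction: float


def validate_rows(rows, rules):
    """Return a list of violations; an empty list means the batch passed."""
    violations = []
    for rule in rules:
        values = [row.get(rule.name) for row in rows]
        nulls = sum(v is None for v in values)
        # Too many missing values is treated as a data bug.
        if rows and nulls / len(rows) > rule.max_null_fraction:
            violations.append(
                f"{rule.name}: null fraction {nulls / len(rows):.2f} "
                f"exceeds {rule.max_null_fraction:.2f}"
            )
        # Values outside the agreed range are reported separately.
        out_of_range = [
            v for v in values
            if v is not None and not (rule.min_value <= v <= rule.max_value)
        ]
        if out_of_range:
            violations.append(
                f"{rule.name}: {len(out_of_range)} value(s) outside "
                f"[{rule.min_value}, {rule.max_value}]"
            )
    return violations


if __name__ == "__main__":
    rules = [ColumnRule("speed_kmh", 0.0, 300.0, 0.1)]
    batch = [{"speed_kmh": 50.0}, {"speed_kmh": 400.0}, {"speed_kmh": None}]
    for problem in validate_rows(batch, rules):
        print(problem)
```

A check like this can run in CI against a sample of the training data, so that a data problem fails the build the same way a code bug would.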

Leonard Aukea:

Here are some, in my opinion, low-hanging fruits. Basically, you should care about system design. This is something that I see in practice not being done properly. Furthermore, you should adopt a branching strategy. Many machine learning teams or data science teams don’t really do that.

Leonard Aukea:

Another aspect is to have a clear review process where you review your code and your analysis. I think this is very obvious for many of you. This ensures quality and you distribute knowledge across the team.

Leonard Aukea:

Anyways, another thing that is important is that you should write tests. As mentioned above, they are harder to write and they are not just standard assertions. You might need to write some statistical tests as well. You should adopt practices where, for failure modes that you have identified in production, the team automatically starts working on a regression test and adds it to the system, to make sure that further releases do not regress over time.
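As an illustration of what a statistical test can look like next to ordinary assertions, the sketch below compares a reference sample of a feature with a new sample using a two-sample Kolmogorov-Smirnov statistic. The talk does not name a specific test, so the choice of KS, the function names, and the 5% significance level are assumptions made for this example. The critical value uses the standard large-sample approximation, so it is only a rough guide for small samples.

```python
import bisect
import math


def ks_statistic(sample_a, sample_b):
    """Largest gap between the empirical CDFs of the two samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    # The maximum gap is always reached at one of the observed points.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


def distributions_differ(sample_a, sample_b, coefficient=1.358):
    """True if the KS statistic exceeds the approximate critical value.

    coefficient=1.358 corresponds to a significance level of about 5%
    in the large-sample approximation (an assumption of this sketch).
    """
    n, m = len(sample_a), len(sample_b)
    critical = coefficient * math.sqrt((n + m) / (n * m))
    return ks_statistic(sample_a, sample_b) > critical


def test_feature_has_not_drifted():
    """Example of a statistical test in a pytest-style test function."""
    reference = [i / 100 for i in range(100)]
    candidate = [i / 100 for i in range(100)]
    assert not distributions_differ(reference, candidate)
```

The same pattern works for regression tests: once a failure mode has been seen in production, the inputs that triggered it can be pinned as a fixture and asserted against on every release.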

Leonard Aukea:

Documentation is very important. You should document your analysis, your approach, your code base in general. This is something that we should care more about in standard software development as well. You can also have silent failures: your model and your system can be up and running, but you won’t really see the failure from standard monitoring that only checks that your system is up and providing predictions. You need to actually make sure that the predictions are within the scope of requirements.
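The silent-failure point can be sketched as a check on the predictions themselves rather than on system uptime. This is a minimal example under stated assumptions: the function name `check_prediction_window`, the idea of a fixed valid range, and the 1% tolerance are hypothetical and would come from the system's actual requirements.

```python
def check_prediction_window(predictions, lower, upper,
                            max_out_of_range_fraction=0.01):
    """Return alerts for a window of recent predictions.

    An empty list means no problem was detected by these checks.
    """
    if not predictions:
        # The service may be "up" while producing nothing at all.
        return ["no predictions in window"]

    alerts = []
    # NaN fails both comparisons, so it is counted as out of range.
    out_of_range = sum(1 for p in predictions if not (lower <= p <= upper))
    fraction = out_of_range / len(predictions)
    if fraction > max_out_of_range_fraction:
        alerts.append(
            f"{fraction:.1%} of predictions outside [{lower}, {upper}]"
        )

    # A model that returns the same value every time is often broken
    # even though every individual value looks valid.
    if len(predictions) > 1 and len(set(predictions)) == 1:
        alerts.append("constant output across the whole window")
    return alerts


if __name__ == "__main__":
    window = [0.2, 5.0, float("nan"), 0.3]
    for alert in check_prediction_window(window, lower=0.0, upper=1.0):
        print(alert)
```

A health endpoint would report this service as healthy in all of these cases, which is why the check looks at the values being served.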

Leonard Aukea:

Another part is obviously automation. If you don’t have a release process and a well-defined path towards production, and you don’t utilize Git properly, it’s hard to build CI/CD into your daily work to make your development process more effective.

Leonard Aukea:

You should plan for disaster, because disaster will happen, and you should probably have a disaster recovery plan in place from the early stages. That said, you should start simple and gradually iterate. I guess those were my slides. I don’t know how I am on time. Those were really quick, but I guess we can have a discussion around it.

Demetrios:

I love it, man. I appreciate that. We had a lot of people mentioning that your slides are very effective and the simplicity is key. It is crucial. So if there are any questions that you have for Leonard, throw them in the chat right now. Otherwise throw them into Slack and he will be over there. I already saw that you introduced yourself.

Leonard Aukea:

One thing: the reason I’m stating it this way is that I don’t see our challenges as being mainly technical in this case. It’s about practices being in place. And I see this adoption as being very much low-hanging fruit.

Demetrios:

Ooh, I like that. We’ve got one question that came through. Since we actually are a little bit early right now, I’m going to ask it to you real quick. Could you elaborate on branching strategy and review process?

Leonard Aukea:

Well, I mean, if you’re doing experimentation and working on a machine learning system or any type of software development, you should have a branching strategy in place, like Gitflow, which is quite simple. We have adopted that mainly because going with more complicated approaches usually turns out to be a challenge for machine learning practitioners, in particular data scientists. We need to build a solid foundation for them to collaborate. I mean, that’s how you build automation around your development process. Otherwise, you can’t really get integration tests or automatic deployments and stuff like that up and running.

Leonard Aukea:

So, the way we build our CI systems is based on having a particular branching strategy. That’s why that is important. And when it comes to having a review process, you should take different perspectives, so you can invite some people who are, say, more software-centric to have a look at the code base, but also maybe some senior ML engineers or data scientists to review the analysis. I think we can try to make that easier for the practitioner, for example by being able to review in a more interactive fashion and look at plots, distributions, and whatnot. But it just ensures quality in a sense, and you should not merge a new feature in any way, shape, or form unless it’s been reviewed.
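To make the link between branching strategy and CI concrete, here is a small sketch of how pipeline stages could be selected from a Gitflow-style branch name. The branch prefixes (`feature/`, `develop`, `release/`, `hotfix/`, `main` or `master`) follow the usual Gitflow convention. The stage names and the mapping itself are hypothetical and do not describe the actual Volvo Cars setup.

```python
# Hypothetical stage names; a real pipeline would define its own.
BASE_STAGES = ["lint", "unit_tests"]


def stages_for_branch(branch):
    """Pick CI stages from a Gitflow-style branch name."""
    if branch.startswith("feature/"):
        # Fast feedback only while a feature is under development.
        return list(BASE_STAGES)
    if branch == "develop":
        # Merged work is integrated and tested together.
        return BASE_STAGES + ["integration_tests"]
    if branch.startswith(("release/", "hotfix/")):
        # Release candidates are also deployed to a staging environment.
        return BASE_STAGES + ["integration_tests", "deploy_staging"]
    if branch in ("main", "master"):
        # Only the production branch triggers a production deployment.
        return BASE_STAGES + ["integration_tests", "deploy_production"]
    raise ValueError(f"branch {branch!r} does not follow the branching strategy")


if __name__ == "__main__":
    for name in ["feature/new-model", "develop", "release/1.2.0", "main"]:
        print(name, "->", stages_for_branch(name))
```

Because the mapping rejects unknown branch names, a branch that ignores the agreed strategy fails early instead of silently skipping tests.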

Demetrios:

Makes complete sense. I mean, it’s kind of obvious, but I know it’s easier said than done.

Leonard Aukea:

Sorry.

Demetrios:

No, no, go ahead.

Leonard Aukea:

No, it’s just, I feel like right now the wind is blowing this way in the machine learning community, which is great, but I’ve worked in this space for quite a while, and I think we haven’t cared about these things at all in a broader setting for a long period of time, and we need to shape up.

Demetrios:

Well, in that same vein, there’s a great question that just came through that’s talking about what disaster management techniques have ML practitioners used in general. Usually these are not discussed during the design process. Curious to learn your view.

Leonard Aukea:

Well, I think they might and should be discussed during the design process, where you try to iterate towards setting up well-defined requirements for your system and the scope within which it should function. Then you should run some type of stress testing and really think about what the worst-case scenarios are that might happen in production, or things that might be exposed to the user. I think that’s something worth taking up in the early stages, and you should make sure that you have tests to ensure that your system does not behave in that way. In other talks I have previously talked about a red team, which I think should definitely be taking care of these things and auditing systems.

Leonard Aukea

Head of Machine Learning Engineering & Operations

Volvo Cars

Leonard is driving ML Engineering and Operations at Volvo Cars. He is responsible for defining the overall mission and strategy for ML Engineering and Operations, leading the build of reproducible ML systems. Leonard Aukea has spent most of his career as a Data Scientist/ML Engineer.
