Model serving Archives | Tecton


Workshop: Bring Your Models to Production with Ray Serve

In this workshop, we will walk through a step-by-step guide on how to deploy an ML application with Ray Serve. Compared to building your own model servers with Flask or FastAPI, Ray Serve makes it easier to build and scale to multiple models and to serve them across the nodes of a Ray cluster.

Ray Serve supports inference on CPUs, GPUs (even fractional GPUs!), and other accelerators – using just Python code. In addition to single-node serving, Serve enables seamless multi-model inference pipelines (also known as model composition); autoscaling in Kubernetes, both locally and in the cloud; and integrations between business logic and machine learning model code.
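To make the idea of model composition concrete, here is a minimal sketch in plain Python with `asyncio`. It is not Ray Serve's API: the `Tokenizer`, `SentimentModel`, and `Pipeline` classes and the `run_pipeline` helper are hypothetical names invented for illustration, and the "model" is a toy keyword scorer. It only shows the shape of a multi-stage inference pipeline in which one component awaits the output of another.

```python
import asyncio
from typing import List


class Tokenizer:
    """Hypothetical first stage: splits text into lowercase tokens."""

    async def __call__(self, text: str) -> List[str]:
        return text.lower().split()


class SentimentModel:
    """Hypothetical second stage: a toy keyword scorer standing in for a real model."""

    POSITIVE = {"good", "great", "fast"}

    async def __call__(self, tokens: List[str]) -> float:
        if not tokens:
            return 0.0
        hits = sum(1 for token in tokens if token in self.POSITIVE)
        return hits / len(tokens)


class Pipeline:
    """Composes the two stages: the output of one is awaited and fed to the next."""

    def __init__(self, tokenizer: Tokenizer, model: SentimentModel) -> None:
        self.tokenizer = tokenizer
        self.model = model

    async def __call__(self, text: str) -> float:
        tokens = await self.tokenizer(text)
        return await self.model(tokens)


def run_pipeline(text: str) -> float:
    """Builds the pipeline and runs a single request through it."""
    pipeline = Pipeline(Tokenizer(), SentimentModel())
    return asyncio.run(pipeline(text))


if __name__ == "__main__":
    print(run_pipeline("Ray Serve is fast"))
```

In a serving framework, each stage would typically be a separately deployed and independently scaled unit; here they are ordinary objects in one process so that the example stays self-contained.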

We will also share how to integrate your model serving system with feature stores and operationalize your end-to-end ML application on Ray. … Read More

Streamlining NLP Model Creation and Inference

At Primer we deliver applications with cutting-edge NLP models to surface actionable information from vast stores of unstructured text. The size of these models and our applications’ latency requirements create an operational challenge of deploying a model as a service. Furthermore, creation/customization of these models for our customers is difficult as model training requires the procurement, setup, and use of specialized hardware and software. Primer’s ML Platform team solved both of these problems, model training and serving, by creating Kubernetes operators. In this talk we will discuss why we chose the Kubernetes operator pattern to solve these problems and how the operators are designed. … Read More

Accelerating Model Deployment Velocity

All ML teams need to be able to translate offline gains to online performance. Deploying ML models to production is hard. Making sure that those models stay fresh and performant can be even harder. In this talk, we will cover the value of regularly redeploying models, and the failure modes of not doing so. We will discuss approaches to make ML deployment easier, faster, and safer that allowed our team to spend more time improving models and less time shipping them. … Read More

Building a Best-in-Class Customer Experience Platform – The Hux Journey – Deloitte Digital

New technologies have been advancing rapidly across the areas of frictionless data ingestion, customer data management, identity resolution, feature stores, MLOps and customer interaction orchestration.  Over the same period many large enterprises have started to find themselves in the uncomfortable position of watching from the sidelines as these advances happen faster than they can evaluate the opportunities, build and sell the business cases, and select and integrate the new desired components.  Offering a pre-configured architecture – using best-in-class components and packaged as a performant and proven platform – offers an opportunity to jump to the desired end-state.  Shorter time-to-value, lower total cost of ownership and much reduced risk are the KPIs of interest.  The CTO of Hux, by Deloitte Digital, and the VP Hux ML Technology talk about the Hux journey, from thesis to execution, and from pain to proof over the last three years. … Read More

Reusability in Machine Learning

In this session we will explore modern techniques and tooling which empower reusability in data and analytics solutions. Creating and leveraging reusable machine-learning code has many similarities with traditional software engineering but is also different in many respects.

We will discuss ways of developing, delivering, assembling and deploying reusable components. We will compare multi-repos with mono-repos, libraries with micro-libraries, components with templates and pipelines, and present tooling which fosters discoverability and collaboration. We will touch on code and data dependency resolution and injection, reusable data assets, data lakes and feature stores. Additionally, we will discuss tooling and MLOps automation which empowers rapid development and continuous integration/delivery. The discussion is going to frequently link back to functional and non-functional requirements like modularity, composability, single source of truth, versioning, performance, isolation and security.

This talk aims to cover tools of choice, processes and design patterns for building and sharing production-ready ML components at scale. It will surface learnings and battle scars from trying to prevent reinvention of the wheel in one of the largest consultancies, with 2000+ analytics practitioners. … Read More

Towards Reproducible Machine Learning

We live in a time of both feast and famine in machine learning. Large organizations are publishing state-of-the-art models at an ever-increasing rate, but the average data scientist faces daunting challenges in reproducing the results themselves. Even in the best cases, where newly forked code runs without syntax errors (often not the case), this only solves part of the problem, as the pipelines used to run the models are often completely excluded. The Self-Assembling Machine Learning Environment (SAME) project is a new Kubernetes and Kubeflow project and community around a common goal: creating tooling that allows for quick ramp-up, seamless collaboration and efficient scaling. This talk will discuss our initial public release, done in collaboration with data scientists from across the spectrum, where we are going next and how people can use our learnings in their own practices. … Read More

Third Generation Production ML Architectures: Lessons from History, Experiences with Ray

Production ML architectures (deployed at scale in production) are evolving at a rapid pace. We suggest there have been two generations so far: the first generation consisted of fixed-function pipelines with predetermined stages; the second generation consisted of pluggable components with a bit more flexibility, but was still pretty constrained. If history is a guide (especially looking at the evolution of GPU APIs), the third generation is going to come from making the computational power accessible and flexible.

We share our experiences with Ray, a system that makes distributed computing accessible and flexible. We give a two-slide introduction to Ray, and show how Ray’s flexibility enables approaches like online reinforcement learning that are not easy to fit into existing production ML architectures without some serious shoe-horning.

We then outline how different companies (such as Uber, Ant Financial, McKinsey) are using Ray in a way that extends beyond the constraints of existing second generation architectures. … Read More

Machine Learning is Going Real-Time

This talk covers different levels of real-time machine learning, their use cases, challenges, and adoption. … Read More

Scaling Online ML Predictions to Meet DoorDash Logistics Engine and Marketplace Growth

As DoorDash's business grows, online ML prediction volume grows exponentially to support a range of machine learning use cases, such as ETA predictions, Dasher assignments, personalized restaurant and menu item recommendations, and ranking for a large volume of search queries.

The prediction service built to meet these use cases now supports many dozens of models spanning different machine learning approaches, such as gradient boosting, neural networks and rule-based models. The service handles more than 10 billion predictions every day, with a peak rate above 1 million per second.
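As a rough back-of-the-envelope check on these figures, the short calculation below converts the daily volume into an average per-second rate and compares it with the stated peak. Both inputs are the lower bounds quoted above ("more than 10 billion" and "above 1 million"), so the results are indicative only; the variable names are chosen here for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Lower bounds taken from the abstract; the real figures are somewhat higher.
daily_predictions = 10_000_000_000
peak_per_second = 1_000_000

# Average rate if the daily volume were spread evenly over the day.
average_per_second = daily_predictions / SECONDS_PER_DAY

# How much higher the quoted peak is than that even-spread average.
peak_to_average = peak_per_second / average_per_second

print(f"average: {average_per_second:,.0f} predictions/second")
print(f"peak is about {peak_to_average:.2f}x the average")
```

Under these assumptions the average works out to roughly 116,000 predictions per second, so the quoted peak is several times the average load, which is the kind of gap a serving system has to be provisioned or autoscaled for.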

In this session, we will share our journey of building and scaling the prediction service, the optimizations we experimented with, the lessons learned, and the technical decisions and tradeoffs made. We will also share how we measure success and how we set goals for the future. Finally, we will end by highlighting the challenges ahead of us in extending the service to wider use cases across the DoorDash machine learning realm. … Read More


© Tecton, Inc. All rights reserved. Various trademarks held by their respective owners.
