Exploiting the Data/Code Duality: Applying Modern Software Development Practices to Data with Dali


apply(conf) - Apr '21 - 10 minutes

Most large software projects in existence today are the result of the collaborative efforts of hundreds or even thousands of developers. These projects consist of millions of lines of code and rely on a wide range of reusable libraries and services provided by third parties. Projects of this scale would not be possible without the tools and processes that now define the practice of modern software development: language support for decoupling the interface from the implementation, version control, semantic versioning of artifacts, dependency management, issue tracking, peer review of code, integration testing, and the ability to tie all of these things together with comprehensive code search and dependency tracking mechanisms.

We have observed similar forces at play in the world of big data. At LinkedIn, the number of people who produce and consume data, the number of datasets they need to manage, and the rate at which these datasets change are all growing at an exponential rate. This has resulted in a host of problems: rampant duplication of business logic and data, increasingly fragile and hard-to-maintain data pipelines, and schemas that are littered with deprecated fields due to the prohibitive cost of making backward-incompatible changes.

To cope with these challenges, the team built Dali, a unified data abstraction layer for offline (Hadoop, Spark, Presto, etc.) and nearline (Kafka, Samza) systems that enables data engineers to benefit from the same processes and infrastructure already used by LinkedIn’s software engineers. In this talk, Carl explains how Dali employs virtual SQL views to decouple the API of a dataset from the details of its implementation, describes how view versioning and dependency tracking make it possible to introduce backward-incompatible changes without breaking downstream consumers, and reviews the ways Dali has been integrated with the rest of LinkedIn’s software development ecosystem. Finally, he discusses how Dali is leveraged in several company-wide initiatives, including the redesign of the LinkedIn mobile app and GDPR.
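
The general idea of versioned views can be illustrated with a small sketch. This is not Dali's API: it uses an in-memory SQLite database purely as a stand-in, and the table, view, and column names (`member_raw`, `member_v1`, `member_v2`) are hypothetical. It shows one way a backward-incompatible schema change could be published as a new view version while existing consumers keep reading the old one.

```python
import sqlite3

# Illustration only: SQLite stands in for the storage layer, and all
# table, view, and column names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Physical storage; its layout is an implementation detail.
    CREATE TABLE member_raw (id INTEGER, first TEXT, last TEXT);
    INSERT INTO member_raw VALUES (1, 'Ada', 'Lovelace');

    -- Version 1 of the view exposes a single full_name column.
    CREATE VIEW member_v1 AS
        SELECT id, first || ' ' || last AS full_name FROM member_raw;

    -- Version 2 makes a backward-incompatible change (the name is split),
    -- but it is published alongside v1 instead of replacing it.
    CREATE VIEW member_v2 AS
        SELECT id, first AS first_name, last AS last_name FROM member_raw;
""")


def column_names(view):
    """Return the column names a view exposes, i.e. its public 'API'."""
    cur = conn.execute(f"SELECT * FROM {view}")
    return [d[0] for d in cur.description]


# An existing consumer pinned to v1 keeps working unchanged.
v1_rows = conn.execute("SELECT id, full_name FROM member_v1").fetchall()

# A consumer that has migrated reads the new shape from v2.
v2_rows = conn.execute(
    "SELECT id, first_name, last_name FROM member_v2"
).fetchall()

print(column_names("member_v1"), v1_rows)
print(column_names("member_v2"), v2_rows)
```

Because consumers query the views and never `member_raw` directly, the underlying table could in principle be reorganized without either version of the view changing its columns.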
Carl Steinbach

Senior Staff Software Engineer


Carl Steinbach is a software engineer and member of the Big Data Platform group at LinkedIn. He is the tech lead for the Grid Platform team and the architect of Dali, LinkedIn’s unified, virtualized data access layer for batch analytics. Before joining LinkedIn, Carl was an early employee at Cloudera. He is an ASF member and former PMC chair of the Apache Hive project.
