Lambda vs. Delta Architecture: An Overview - Pynomial (2024)

Delta Architecture is gaining popularity and advocates across the Big Data world because it offers more simplicity, quality, and reliability, backed by ACID transactions, than alternatives such as the Lambda or Kappa Architectures.

As pointed out by Denny Lee, Developer Advocate at Databricks, a data engineer’s dream is to “process data continuously and incrementally as new data arrive in a cost-efficient way without having to choose between batch or streaming”. The Delta Architecture promises to move engineers one step closer to that dream.

Previously, we discussed the differences between the Lambda and Kappa Architectures. Now it’s time to look more closely at how Delta Architecture can be an evolution in data management.

The Lambda Architecture, an Old Friend

In the early 2010s, processing data in real time, especially very large datasets, was still a problem. Latency, complexity, and the lack of a single tool for building a Big Data system were some of the problems Nathan Marz pointed out at the time. In this context, Marz proposed the Lambda Architecture, which tried to solve the problem with a hybrid approach, “by decomposing the problem into three layers: the batch layer, the serving layer, and the speed layer.”

In this architecture, the Batch Layer can take its time processing large volumes of data that require a lot of computation (cold path), while the Speed Layer computes in real time and provides incremental updates on top of the batch layer results (hot path). Finally, the Serving Layer takes the outputs of both and uses them to answer incoming queries. Additionally, “it features an append-only immutable data source that serves as a system of record. Timestamped events are appended to existing events and nothing ever gets overwritten”, as this blog post notes.
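
To make the three layers concrete, here is a minimal sketch in plain Python. It is a toy model rather than a real Big Data stack: the event list, the `BATCH_CUTOFF` value, and the function names are all assumptions made for illustration.

```python
from collections import Counter

# Immutable, append-only system of record: (timestamp, page) events.
EVENTS = [
    (1, "home"), (2, "pricing"), (3, "home"),  # covered by the last batch run
    (4, "home"), (5, "docs"),                  # arrived after the batch cutoff
]
BATCH_CUTOFF = 3  # hypothetical: the last batch run processed events up to here


def batch_view(events, cutoff):
    """Batch layer (cold path): recompute counts over all data up to the cutoff."""
    return Counter(page for ts, page in events if ts <= cutoff)


def speed_view(events, cutoff):
    """Speed layer (hot path): count only events newer than the batch cutoff."""
    return Counter(page for ts, page in events if ts > cutoff)


def serve(page, batch, speed):
    """Serving layer: answer a query by merging both views."""
    return batch[page] + speed[page]


batch = batch_view(EVENTS, BATCH_CUTOFF)
speed = speed_view(EVENTS, BATCH_CUTOFF)
print(serve("home", batch, speed))  # 3
```

Even in this tiny example, the counting logic exists twice, once per path, and both copies have to stay in agreement.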

However, complexity has always been a downside. “While a Lambda architecture can handle large volumes of batch and streaming data, it increases complexity by requiring different code bases for batch and streaming, along with its tendency to cause data loss and corruption. In response to these data reliability issues, the traditional data pipeline architecture adds even more complexity by adding steps like validation, reprocessing for job failures, and manual update and merge”, says Hector Leno in this article.

The Kappa Architecture, an Improvement

Later, the Kappa Architecture appeared as an alternative. It’s event-based and drops the separate batch layer: only the Streaming Layer and the Serving Layer remain, so every kind of data that needs to be processed is handled by a single technology stack.
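
The single-stack idea can be sketched in a few lines of plain Python. This is an illustrative toy, not a streaming engine: the `LOG` list and the `process` function are assumptions. The point is that live processing and reprocessing share one code path, and reprocessing simply replays the log from the start.

```python
from collections import Counter

# Append-only event log of (timestamp, page) pairs.
LOG = [(1, "home"), (2, "pricing"), (3, "home"), (4, "docs")]


def process(events, state=None):
    """Single streaming code path: fold each event into the running state."""
    state = Counter() if state is None else state
    for _ts, page in events:
        state[page] += 1
    return state


# Live processing: consume events in the order they arrive.
live = process(LOG[:2])
live = process(LOG[2:], live)

# Reprocessing: replay the whole log through the same function.
replayed = process(LOG)

print(live == replayed)  # True
```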

The Kappa proposal represented an evolution in data processing and data analysis. Still, it remains complex to implement, requires extensive compute resources, and is hard to scale.

Delta Architecture, a New Approach

Currently, the Delta Architecture seems to be the next step in data architecture. But first, it helps to be familiar with the Delta Lake concept, since the Delta Architecture relies on it. Delta Lake, as we explained previously, is an open-source storage framework that brings ACID transaction support and schema enforcement to Apache Spark-driven data lakes. It allows users to build a “Data Lakehouse” architecture that works with structured, semi-structured, and unstructured data.

Delta Lake “extends Parquet data files with a file-based transaction log for ACID transactions and scalable metadata handling.” It’s also compatible with Apache Spark APIs and integrated with Structured Streaming. Additionally, the separation between Layers in Delta Architecture is minimal compared to the Lambda Architecture, so there’s no need to treat data differently based on its source.
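
The idea of a file-based transaction log can be illustrated with a toy model in plain Python. This sketch is only loosely inspired by Delta Lake and is not its actual protocol or API: the `commit` and `snapshot` helpers, the action format, and the file names are simplifying assumptions. Each commit is one numbered JSON file listing data files added or removed, and readers replay the commits in order to see a consistent table.

```python
import json
import os
import tempfile


def commit(log_dir, actions):
    """Write the next numbered commit file; fail if that version already exists."""
    version = len(os.listdir(log_dir))
    path = os.path.join(log_dir, f"{version:020d}.json")
    # Mode "x" creates the file exclusively, so two writers cannot claim one version.
    with open(path, "x") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")
    return version


def snapshot(log_dir, as_of=None):
    """Replay commits in order to find which data files make up the table."""
    files = set()
    for name in sorted(os.listdir(log_dir)):
        if as_of is not None and int(name.split(".")[0]) > as_of:
            break
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    files.add(action["add"])
                elif "remove" in action:
                    files.discard(action["remove"])
    return files


log_dir = tempfile.mkdtemp()
commit(log_dir, [{"add": "part-0.parquet"}, {"add": "part-1.parquet"}])
# A rewrite of two files into one: readers see all of it or none of it.
commit(log_dir, [{"remove": "part-0.parquet"},
                 {"remove": "part-1.parquet"},
                 {"add": "part-2.parquet"}])

print(sorted(snapshot(log_dir)))           # ['part-2.parquet']
print(sorted(snapshot(log_dir, as_of=0)))  # ['part-0.parquet', 'part-1.parquet']
```

Because old commits are never edited, reading the table “as of” an earlier version falls out of the same replay logic.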

Considering this context, Databricks presents it as “a completely different approach to ingesting, processing, storing, and managing data focused on simplicity. All the processing and enrichment of data from Bronze (raw data) to Silver (filtered) to Gold (fully ready to be used by analytics, reporting, and data science) happens within Delta Lake, requiring fewer data hops”.
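
The Bronze, Silver, and Gold stages can be pictured with a small plain-Python sketch. In a real deployment each stage would be a Delta table; here they are in-memory lists, and the order records, validation rules, and function names are assumptions chosen for illustration.

```python
from collections import defaultdict

# Bronze: raw events exactly as ingested, including bad and duplicate records.
bronze = [
    {"order_id": "1", "country": "US", "amount": "20.0"},
    {"order_id": "2", "country": "us", "amount": "5.5"},
    {"order_id": "2", "country": "us", "amount": "5.5"},   # duplicate
    {"order_id": "3", "country": "DE", "amount": "oops"},  # malformed amount
]


def to_silver(rows):
    """Silver: validated, typed, de-duplicated records."""
    seen, out = set(), []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows that fail validation
        if row["order_id"] in seen:
            continue  # drop repeated orders
        seen.add(row["order_id"])
        out.append({
            "order_id": int(row["order_id"]),
            "country": row["country"].upper(),
            "amount": amount,
        })
    return out


def to_gold(rows):
    """Gold: business-level aggregate ready for reporting."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["country"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'US': 25.5}
```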

Delta Architecture’s Promises

  • Lower your costs: its simplicity helps you reduce costs significantly by cutting the amount of data that needs to be sent and received, the time needed to process it, and the number of times you need to rerun jobs because of failures.
  • Delta = Less code: as we already said, Lambda Architectures need different code bases for each part of the architecture. With Delta, because transactions are ACID compliant, your code is less complex: several pieces that previously had to be written by hand (to guarantee data consistency, for example) aren’t needed anymore.
  • Improved Indexing: when you use Delta Lake as the storage for your architecture, you can take advantage of Bloom Filter Indexes, which improve query execution performance by over 50%, according to MSSQLTips.com.
  • One source of data: when using other architectures and trying to simplify processes, data will often be copied from a data lake to other smaller data warehouses. This creates consistency and versioning issues that are solved by using the Delta Architecture.
  • Adding more data sources? No problem: usually, after a data architecture is designed and deployed for a specific use case, it’s hard for new data sources to be added. But when using Delta Lake as your engine, this no longer presents an enormous challenge as schema evolution makes adding new data sources (or changing the formats of existing data sources) a simpler task.
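
The difference between schema enforcement and schema evolution mentioned in the last point can be sketched in plain Python. The `append` function and its `merge_schema` flag are hypothetical names for this toy model; Delta Lake exposes a comparable opt-in through its `mergeSchema` write option.

```python
def append(table_schema, rows, merge_schema=False):
    """Return the table schema after appending rows, enforcing or evolving it."""
    new_columns = {}
    for row in rows:
        for name, value in row.items():
            if name not in table_schema:
                new_columns[name] = type(value).__name__
            elif table_schema[name] != type(value).__name__:
                raise TypeError(f"column {name!r} expects {table_schema[name]}")
    if new_columns and not merge_schema:
        raise ValueError(f"unexpected columns: {sorted(new_columns)}")
    return {**table_schema, **new_columns}


schema = {"id": "int", "name": "str"}
new_rows = [{"id": 3, "name": "c", "source": "crm"}]  # a new source adds a column

try:
    append(schema, new_rows)
    rejected = False
except ValueError:
    rejected = True  # enforcement: unknown columns are refused by default

evolved = append(schema, new_rows, merge_schema=True)
print(rejected, evolved)  # True {'id': 'int', 'name': 'str', 'source': 'str'}
```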

Summary

In a world seeking to be data-driven, developing a robust solution that can scale and handle any amount or type of data has been the biggest challenge of the last few years. Proposals such as the Lambda and Kappa Architectures emerged in response to this need. However, they’re still far from ideal.

“There have been attempts to unify batch and streaming into a single system in the past. Organizations have not been that successful though in those attempts. But, with the advent of Delta Lake, we are seeing a lot of our customers adopting a simple continuous data flow model to process data as it arrives. We call this architecture, the Delta Architecture”, explains Databricks, the company behind it.

“Using this approach, we can improve our data through a connected pipeline that allows us to combine streaming and batch workflows through a shared file store with ACID-compliant transactions and provides the best of both worlds”, adds this analysis.

Article information

Author: Greg Kuvalis
