Weight of the Modern Data Stack

Double-edged outcome, alignment with Agile, and Impact
6 min | Nov 23, 2022
Originally published on Modern Data 101 Newsletter, the following is a revised edition.

The Modern Data Stack, or MDS, has been a frequent subject of debate among technocrats. This can easily confuse data personnel, and a question that frequently comes up is:

Aren’t we supposed to implement the modern data stack instead of running away from it? After all, it is the “modern” equivalent of the legacy frameworks which aren’t the best alternatives either.


Consider the MDS as a giant experimental direction that has not been faring too well, so the data industry at large is experimenting with other strategies or directions. But what is the Modern Data Stack experiment and why are data practitioners overwhelmed with it?



🧩 The Modern Data Stack is a collection of multiple point solutions that are stitched together by users to enable a working flow from physical data to business insights.


Some key phrases to note here:

  • “multiple point solutions stitched together by users”
  • “working flow from physical data to business insights”

The MDS is positioned to deliver on both of the above, but is it doing a good job of it? Not really.

Modern Data Stack: The Double-Edged Sword


The MDS could easily be called a double-edged sword: while it solves problems A & B, it creates problems C & D, and the cycle continues. Anybody who wields the MDS is bound to suffer a couple of wounds.

After engineers solved (data collaboration) problems within Big Tech, the rest of the industry raced to copy them without being thoughtful about their unique data needs. The products of the Modern Data Stack were primarily developed in highly mature data ecosystems to solve very specific infra issues. The tools were not fundamentally designed to operate in a workflow. This is not necessarily a bad thing (In fact, I think most tools in the MDS are excellent) but it creates a mentality rooted in ‘what’s next?’ instead of ‘Are our fundamentals in place?’.

~ Chad Sanderson | Head of Product @Convoy

MDS is a consistent state of dilemmas

Over the last five to six years, hundreds of tools have popped up in the data space, each claiming to solve part of the problem, creating an overwhelming number of competing tools. In any typical data architecture today, data practitioners constantly have to decide, and justify, which tool to plug in and why it makes more sense than a pool of other options.

The integration overhead

Not every tool in the data ecosystem integrates readily with every other tool in the data architecture. Often a little push is needed to make a tool fit. Moreover, even if a tool works in harmony with all other pieces of the infrastructure, onboarding it still adds the effort of checking integration feasibility and then tying it up with every other piece.
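
To get a feel for how this overhead grows, here is a minimal sketch in Python. It assumes the worst case described above, where each newly onboarded tool has to be checked against every tool already in the stack. The function names are illustrative and do not belong to any real library.

```python
def checks_for_new_tool(existing_tools: int) -> int:
    """Integration checks needed when one more tool joins the stack.

    Worst-case assumption: the new tool is checked against every existing tool.
    """
    if existing_tools < 0:
        raise ValueError("existing_tools must be non-negative")
    return existing_tools


def total_pairwise_integrations(tool_count: int) -> int:
    """Number of tool-to-tool pairs in a stack of n tools: n * (n - 1) / 2."""
    if tool_count < 0:
        raise ValueError("tool_count must be non-negative")
    return tool_count * (tool_count - 1) // 2
```

Under this assumption, a stack of 10 tools has 45 possible pairs to consider, while a stack of 20 tools has 190, so doubling the stack roughly quadruples the integration surface.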

The resource overhead

Every time a new tool is onboarded, extra cost is incurred for the solution itself and for licensing fees, which have to be processed and renewed on a recurring basis. There’s also a huge cost in employee onboarding, training, and hiring the right expertise, especially for niche tools that require specific skills.

The maintenance overhead

The MDS redirects data teams from focusing on data and applications to maintaining the infrastructure that powers them. If, say, 70% of the team’s effort goes to maintenance, only 30% is left for building the applications that mine valuable insights and actually impact the bottom line.
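
The arithmetic behind that split can be made concrete. The sketch below is illustrative only: the team size and weekly hours are made-up inputs, and the 70% figure is the hypothetical one from the paragraph above, not a measured value.

```python
def build_hours(team_size: int, hours_per_week: float, maintenance_percent: int) -> float:
    """Weekly team hours left for building applications after maintenance."""
    if not 0 <= maintenance_percent <= 100:
        raise ValueError("maintenance_percent must be between 0 and 100")
    total_hours = team_size * hours_per_week
    # Whatever is not spent on maintenance is available for building.
    return total_hours * (100 - maintenance_percent) / 100
```

For a hypothetical team of 10 people working 40 hours a week, `build_hours(10, 40, 70)` leaves 120 of the 400 weekly hours for building applications.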

Creating data silos instead of solving them

Despite the tight coupling between tools, an overwhelming number of integrations easily becomes fragile, and soon the tools stop communicating smoothly with each other. In other words, there is rampant friction between tools, which leaves data and insights caged within the walls of siloed tools and environments.


Learn more about what a data stack is and how it evolved in this blog:
Evolution of the Data Stack: The story of how we interpret ever-growing data

Breaking the 12 Principles of an Agile Data Approach


Optimized Collaboration

Agile Principles

  • Business people and developers must work together daily throughout the project.
  • Build projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done.
  • The most efficient and effective method of conveying information to and within a development team is face-to-face conversation.
  • The best architectures, requirements, and designs emerge from self-organizing teams.
  • At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly.

State of Modern Data Stack

The MDS has successfully solved several collaboration issues of legacy systems, but it has also introduced new ones. The primary reason is that it was built not around people but around processes, tools, and specific infrastructures that accommodate neither change nor flexibility.

Complex integrations, multiple layers, and the resulting information friction open significant gaps between business and IT folks. There is no common layer where business teams can define, enforce, and operate business logic and requirements across the physical data and architectures. This increases the number of interactions teams need to resolve the after-effects of data silos.

With isolated tools, there is not enough information or metadata to give a detailed view of the data ecosystem. So while teams brainstorm new ways to improve, their plans rest on partial information.

Change Management

Agile Principles

  • Our highest priority is to satisfy the customer through the early and continuous delivery of valuable software.
  • Welcome changing requirements, even late in development. Agile processes harness change for the customer’s competitive advantage.

State of Modern Data Stack

The MDS is resistant to change because of the vast web of upstream and downstream integration points where every change has to be reflected. Moreover, changes land in abrupt phases, since in real-world scenarios the pipelines are never all up at the same time. Because of this resistance, continuous integration and continuous delivery (CI/CD) are challenging for the MDS. With tons of point solutions to integrate, operate, and fix, focus is pulled away from ensuring a continuous data flow across business and IT teams.
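
One way to picture this web of integration points is as a directed graph in which an edge means "feeds data into". The sketch below uses a breadth-first search to list everything downstream of a changed component. The graph and its component names are entirely hypothetical and only illustrate how far a single change can reach.

```python
from collections import deque


def downstream_of(graph: dict[str, list[str]], changed: str) -> list[str]:
    """Return every component reachable from `changed`, in breadth-first order."""
    seen = {changed}
    order: list[str] = []
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


# Hypothetical stack: each key feeds data into the components in its list.
STACK = {
    "ingestion": ["warehouse"],
    "warehouse": ["transformation", "catalog"],
    "transformation": ["bi_dashboard", "reverse_etl"],
    "catalog": [],
    "bi_dashboard": [],
    "reverse_etl": ["crm"],
    "crm": [],
}
```

In this made-up stack, a schema change in `warehouse` has to be reflected in five downstream components, which `downstream_of(STACK, "warehouse")` lists in the order the change would reach them.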

Success Measurement

Agile Principles

  • Deliver working software frequently, from a couple of weeks to a couple of months, with a preference for a shorter timescale.
  • Working software is the primary measure of progress.

State of Modern Data Stack

Installing and maintaining the MDS stretches the data-to-insights journey across months and even years. The cycle becomes long once integration, maintenance, and onboarding overheads are taken into account. Reaching a working data product takes several iterations and a good deal of backtracking.

Sustainable Consistency

Agile Principles

  • Agile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely.
  • Continuous attention to technical excellence and good design enhances agility.
  • Simplicity–the art of maximizing the amount of work not done–is essential.

State of Modern Data Stack

Even when the MDS is designed well, it is limited to a woven architecture where every integration is a continuous source of data debt and requires constant maintenance. With data silos, integration overheads, and recurring licensing complexities, the MDS imposes complexity on its users instead of simplicity.

Impact of MDS


  • Short-term impact: In the short term, the MDS can create data debt at every integration point, which stacks up over time into large resource and maintenance costs for the organization. The more day-to-day impact is a longer data-to-insights journey and a wider gap between data and business teams.
  • Long-term impact: The long-term impact of the MDS is the feared data swamp. A data swamp is a heap of rich and valuable data that data teams cannot utilize or operationalize because the business context is missing. In other words, because of the wide gap between business and data teams, the physical data is stored and mapped with irrelevant business logic, which essentially makes the data meaningless.