The following is a revised edition.
Today, businesses are mostly drowning in data, yet the quest for context and actionable, data-driven insights remains a meandering loop with no destination in sight. Despite having access to vast amounts of data and cutting-edge language model technologies, many organizations struggle to put them to meaningful use in ways that transparently and measurably move their KPIs.
But what’s really holding them back?
Today’s LLMs are very impressive! They can whip up content, translate languages, answer questions, and more.
But here’s the thing—LLMs on their own are like adolescents: they often appear smart, but they need guidance and much more knowledge before they can be trusted to draw conclusions. LLMs are essentially advanced text processors and struggle with basic math and logic. While they’re incredible, they’re not flawless.
Issues like hallucinations, contradictions, and an inability to handle complex logic still linger.
So, while improving LLM accuracy and minimizing hallucinations is important, that's not the crux of the problem.
The real challenge lies in how these LLMs struggle to capture context around cross-functional workflows. They don’t natively connect the dots between data silos and multiple processes, which leaves a huge gap in delivering cross-domain insights—the true essence of business.
There are two primary problems that limit how accurately AI penetrates cross-domain functions and decisions.
LLMs typically struggle with understanding domain-specific semantics (like jargon), correctly mapping user intent to structured data queries, and reconciling inconsistent terminology across domains. Models trained on general datasets often fall short of the precision each industry-specific domain requires.
Moreover, context itself becomes siloed: when different apps in a domain have their own vector DBs, or when each domain maintains its own separate knowledge graph, context gets divided and isolated from other domains and apps, developing into branches that don’t understand data or events beyond their purview.
This creates an echo chamber for the domain—the domain’s self-induced knowledge gap!
Why self-induced? Because of the choice to dissociate context across separate tools and technologies, the outputs of which cannot converse with each other.
This issue is exacerbated when dealing with heterogeneous data sources or contexts requiring precise retrieval from structured databases, as LLMs are inherently designed to process unstructured text and may lack the necessary contextual grounding for structured inputs.
What organisations need here is something that bridges the gap between broad capabilities and the deep knowledge required to perform accurately in these areas.
While building an application, development teams and data engineers often need the capability to call models (say, LLMs) at multiple steps in the process.
For instance, suppose you’re creating an adaptive survey. You start by asking a survey group a series of questions, but the responses trigger follow-up questions based on specific answers. For example, if a participant mentions that a product feature was "okay but a bit confusing," you might want to dig deeper: Was it confusing because of terminology? Usability? Expectations?
To make this possible, you'd need a combination of external logic and an LLM’s insights at each step.
External Logic + Responsive LLM Tasks = Intuitive Solution
Here, an LLM can help detect the sentiment behind a response—such as identifying frustration or ambivalence—and suggest the next logical question for richer data collection. This adaptive, back-and-forth approach allows for deeper insights, making the survey much more valuable.
But this demands a wider context during each jump from one task to another, often interwoven across domain data. For example, to ask intelligent questions, the LLMs across different steps would need context or retrieved information from not just the Product domain but also Sales & Marketing to truly grasp the intentions, patterns, and response expectations of different survey segments. Moreover, the data retrieved from different domains must be mutually intelligible (e.g., knowing that ‘target_region’ in marketing data and ‘geo_state’ in a sales asset could refer to the same information).
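To make this concrete, here is a minimal sketch of such a loop in Python. The call_llm() helper is a hypothetical stand-in for whatever model API you use, and the branching rule (follow up only on negative or ambivalent answers) is purely illustrative:

```python
# A minimal sketch of the adaptive-survey loop: external logic decides
# when to follow up, and the LLM supplies sentiment plus the next question.
# `call_llm` is a hypothetical stand-in for whatever model API you use.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to your LLM provider")

def run_survey(questions: list[str]) -> list[dict]:
    transcript = []
    for question in questions:
        answer = input(f"{question}\n> ")
        sentiment = call_llm(
            "Classify the sentiment of this survey answer as positive, "
            f"negative, or ambivalent: {answer!r}"
        )
        entry = {"question": question, "answer": answer, "sentiment": sentiment}
        # External logic: only negative or ambivalent answers get a follow-up.
        if sentiment.strip().lower() in {"negative", "ambivalent"}:
            follow_up = call_llm(
                f"The participant answered {answer!r}. Suggest one short "
                "follow-up question that uncovers why."
            )
            entry["follow_up"] = follow_up
            entry["follow_up_answer"] = input(f"{follow_up}\n> ")
        transcript.append(entry)
    return transcript
```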
Let’s face it: no business operates on simple, one-off tasks. It’s always about navigating complex workflows, juggling numerous deliverables, and aligning processes across different domains with those ultimate goals that push the organisation forward.
The approach is to humanise the entire problem-solving process: decompose the problem into smaller, manageable sub-parts and assign a tailored, specialized agent to solve each one.
This is where AI agents have made a sharp entry to turn your LLM into more than just a chatty assistant, connecting it with specialized skills to take on real-world challenges and closing the ‘reasoning and step-wise execution’ gap.
When it comes to addressing cross-domain & cross-functional needs, multi-agent AI workflows help break down complex domain-specific tasks to allow individual agents to handle specialized aspects of a problem, creating a highly adaptive and efficient workflow.
In this multi-agent workflow, every agent is an independent actor powered by LLMs and collaborating with other agents. Every agent has its own prompt, a connected LLM, and some custom code for collaborating with the other agents.
Imagine an adaptive survey where follow-up questions dynamically evolve based on real-time sentiment analysis, with each decision-making step handled by a dedicated agent.
By having specialized agents handle these decision-making steps, businesses ensure workflows that are both robust and flexible, allowing for more personalized and insightful user interactions.
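As a rough sketch of what that anatomy could look like in code, the snippet below gives each agent its own prompt, its own model binding, and only the tools its task needs (anticipating point 1 in the list that follows). The Agent class, stub_llm, and the fetch_history tool are all hypothetical:

```python
from dataclasses import dataclass, field
from typing import Callable

# Illustrative anatomy of an agent: its own prompt, its own model binding,
# and only the tools its task needs. All names here are hypothetical.
@dataclass
class Agent:
    name: str
    system_prompt: str
    llm: Callable[[str], str]                  # any prompt -> text callable
    tools: dict[str, Callable] = field(default_factory=dict)

    def run(self, task: str) -> str:
        return self.llm(f"{self.system_prompt}\n\nTask: {task}")

def stub_llm(prompt: str) -> str:              # placeholder model for the sketch
    return f"[model output for: {prompt[:40]}...]"

sentiment_agent = Agent("sentiment", "You classify survey-answer sentiment.", stub_llm)
question_agent = Agent(
    "question-gen",
    "You write one concise follow-up question.",
    stub_llm,
    tools={"fetch_history": lambda participant_id: []},  # hypothetical tool
)
print(sentiment_agent.run("Rate: 'okay but a bit confusing'"))
```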
1. Adding too many tools to a single agent increases the likelihood of failures. Multi-agent systems mitigate this by assigning each agent only the tools needed for its task.
2. Different prompts can be coupled with different agents, and each prompt can carry its own instructions, powered by a separate, custom-tailored LLM.
3. Each agent can be independently improved without disrupting the entire system, enabling targeted updates, risk reduction, and easier scaling while maintaining system stability.
Reinforcement learning (RL) is like learning from trial and error! In a multi-agent system, RL algorithms allow agents to adjust their actions based on rewards, such as task accuracy or efficiency improvements.
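As a toy illustration of that trial-and-error loop, the sketch below has an agent choose between two hypothetical query strategies and drift toward whichever earns the higher simulated reward (standing in for task accuracy). A production system would use a proper RL framework rather than this hand-rolled bandit:

```python
import random

# A toy rendering of trial-and-error learning: the agent tries candidate
# strategies and drifts toward whichever earns the higher reward (here a
# stand-in for task accuracy). Strategy names are invented.
class BanditAgent:
    def __init__(self, strategies, epsilon=0.1):
        self.q = {s: 0.0 for s in strategies}   # estimated value per strategy
        self.n = {s: 0 for s in strategies}     # times each strategy was tried
        self.epsilon = epsilon

    def choose(self):
        if random.random() < self.epsilon:       # explore occasionally
            return random.choice(list(self.q))
        return max(self.q, key=self.q.get)       # otherwise exploit

    def update(self, strategy, reward):
        self.n[strategy] += 1
        # Incremental mean: Q <- Q + (reward - Q) / n
        self.q[strategy] += (reward - self.q[strategy]) / self.n[strategy]

agent = BanditAgent(["concise_query", "verbose_query"])
for _ in range(200):
    s = agent.choose()
    # Simulated reward: one strategy succeeds more often than the other.
    reward = 1.0 if random.random() < (0.8 if s == "concise_query" else 0.5) else 0.0
    agent.update(s, reward)
print(agent.q)   # values should favour 'concise_query'
```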
While orgs try to achieve optimal results through reinforcement learning, leveraging RAG systems in addition to RL makes the system more resilient, intelligent, and context-aware by enabling continuous learning.
RAG systems comprise two primary components, retrieval and generation: the retrieval phase identifies relevant external data or documents, and the generation phase produces contextually appropriate responses or outputs.
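Here is a deliberately minimal sketch of those two phases, assuming toy bag-of-words embeddings and an llm callable passed in by the caller; a real system would use a proper embedding model and vector store:

```python
import math

# Minimal sketch of the two RAG phases. Embeddings are toy bag-of-words
# vectors here; a real system would use an embedding model and a vector
# store, and `llm` would be a real model call.
def embed(text: str) -> dict[str, int]:
    vec: dict[str, int] = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    dot = sum(v * b.get(w, 0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Retrieval phase: rank documents by similarity to the query.
    q = embed(query)
    return sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)[:k]

def generate(query: str, corpus: list[str], llm) -> str:
    # Generation phase: the model answers grounded in the retrieved context.
    context = "\n".join(retrieve(query, corpus))
    return llm(f"Context:\n{context}\n\nQuestion: {query}")
```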
In cross-domain scenarios, RL models often need to handle a variety of contexts. RAG’s retrieval system helps by fetching relevant external knowledge to guide the RL agent's decision-making. This allows RL to apply general strategies to specific situations, benefiting from data that wasn’t initially available in the agent’s training environment.
Instead of relying only on internal model predictions, the RL agent can pull in relevant, external data from large document corpora or databases with RAG’s retrieval component. This is especially useful when the agent needs to understand the environment more deeply (such as answering queries or generating text based on real-world facts).
RL spanning multiple domains often requires the agent to adapt to new information in real time, which can be difficult if the agent only has access to a narrow, domain-specific dataset. RAG allows the agent to access external knowledge relevant to the task at hand.
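Tying the two ideas together, a hedged sketch might look like the following, reusing the retrieve() and BanditAgent pieces from the sketches above. The llm and score callables are hypothetical; score() would grade the response (say, downstream task accuracy) to produce the RL reward:

```python
# A sketch of RAG-assisted RL decision-making, reusing `retrieve` and
# `BanditAgent` from above. `llm` and `score` are hypothetical callables.
def answer_with_rl_and_rag(agent, llm, score, query, corpus):
    context = retrieve(query, corpus)    # external knowledge, not training data
    strategy = agent.choose()            # RL picks a prompting strategy
    prompt = f"Strategy: {strategy}\nContext: {context}\nQuestion: {query}"
    response = llm(prompt)
    agent.update(strategy, score(response))   # learn from the outcome
    return response
```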
The proposed solution combines the strengths of individual components and their ability to work with each other in order to address the challenges of multi-functional tasks across multiple domains.
Integrating multiple technology stacks is complex: it causes siloed data and misaligned goals, and it demands high maintenance as teams struggle with communication and data consistency.
In one line: Data Products serve contextually enhanced data aligned to a specific agent’s primary tasks and functions.
In this part of the article, we’ll walk through a use case where the data product approach integrates with a multi-agent workflow, in support of LLMs, to solve the challenges of performing cross-functional operations.
For a typical business setting, take a marketing-and-supplier example: let’s assume we want to measure the impact of a marketing campaign on suppliers.
Our focus: showing how cross-functional needs across these domains can be met with the data product approach + agentic workflows.
The individual data products are bounded sources of domain-specific intelligence, storing data and contextual information that agents can query effectively.
At the core, the data products act as specialised containers representing discrete, specific domains. Each data product represents a domain-specific dataset with defined boundaries and intelligence; a "marketing" data product, for example, is one of the data products hosted by the marketing domain.
For simplicity, we can assume there’s one data product for each of the domains in this example: a ‘Campaign Compass’ data product for the marketing domain and a ‘Supplier Hub’ data product for the supply domain.
Campaign Compass provides its users with an end-to-end view of marketing metrics, while Supplier Hub carries insights into supply-side metrics.
These two data products provide specialised data, metrics, and dimensions, which can be queried to answer questions while analysing or making decisions that evolve campaigns, optimise supply, or troubleshoot.
The RAGs here act as intermediaries: they understand complex, cross-functional questions, identify the domains involved, and parse the queries into sub-tasks designated for the relevant agents or data products.
The retrieval counterpart of RAGs collects query-specific data from across domains, and the generative counterpart generates a new query with additional context, which is then used to deliver context-enhanced and accurate data.
For example, an LLM could supply a high-level question to a RAG model, which retrieves and interprets the various elements (e.g., marketing campaign performance and supplier impact) and directs each agent to query the necessary data products.
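A crude way to picture that routing step: the sketch below splits a cross-functional question into domain-scoped sub-tasks, one per data product or agent. The keyword map is a stand-in for the LLM/RAG classification described above:

```python
# An illustrative router: a cross-functional question is split into
# domain-scoped sub-tasks, one per data product/agent. The keyword map is
# a crude stand-in for LLM/RAG-based classification.
DOMAIN_KEYWORDS = {
    "marketing": ["campaign", "engagement", "reach"],
    "supply": ["supplier", "fulfillment", "delivery"],
}

def route(question: str) -> dict[str, str]:
    q = question.lower()
    return {
        domain: f"Answer the {domain} portion of: {question}"
        for domain, keywords in DOMAIN_KEYWORDS.items()
        if any(kw in q for kw in keywords)
    }

print(route("How did Campaign X impact Supplier Y's fulfillment rates and delivery times?"))
# -> one sub-task each for the marketing and supply domains
```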
There is a defined workflow laying out step-by-step processes for the agents.
In the established scenario, an LLM interprets complex, multi-faceted queries like:
"How did Campaign X impact Supplier Y's fulfillment rates and delivery times?"
Now, you have deployed a bunch of domain-specific AI agents in the multi-agent system, each dedicated to executing the part of the workflow assigned to it in pursuit of the end goal.
Here, each data product is paired with a custom agent or a set of custom agents trained to query and analyze metrics unique to its domain.
Let’s say agents A, B, and C work with Campaign Compass, interacting sequentially to interpret marketing questions, translate them into queries, and validate the resulting data against the campaign objectives.
Similarly, agents working in the supplier domain with Supplier Hub collaborate to ensure supply chain insights are accurate and relevant, validating each step from question analysis to final data consistency.
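A plain-Python sketch of that sequential A to B to C hand-off might look like this, with each stage as a simple function where a real system would place an LLM-backed agent; the table and column names are invented for illustration:

```python
# Sketch of the sequential A -> B -> C hand-off for Campaign Compass:
# question analysis, query creation, then validation. Each stage is a
# plain function here; in practice each would be an LLM-backed agent,
# and the table/column names are invented.
def analyse_question(question: str) -> dict:
    return {"intent": "campaign_impact", "entities": ["Campaign X"]}

def create_query(analysis: dict) -> str:
    campaign = analysis["entities"][0]
    return f"SELECT metric, value FROM campaign_metrics WHERE campaign = '{campaign}'"

def validate(query: str, analysis: dict) -> bool:
    # e.g., confirm the query targets the data the objective cares about
    return analysis["intent"] == "campaign_impact" and "campaign_metrics" in query

analysis = analyse_question("How did Campaign X perform against its objectives?")
query = create_query(analysis)
assert validate(query, analysis)
```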
This approach empowers businesses with reliable insights for managing supplier performance effectively.
Mapping AI Agent workflows with Data Object Graphs
A data object graph (DOG) is like a structured map showing how data objects or nodes (e.g., metrics, datasets, insights) connect and influence each other.
Data Expert Recommendation
Jon Cooke has done a spectacular job of advocating the impact of DOGs across his socials. Check out his work to learn more.
An agent creates a DOG and uses this graph to navigate through complex, multi-step business workflows. Starting with a clear business objective, the agent parsing the DOG breaks it into manageable subtasks, identifying the data objects and processes necessary to achieve the goal.
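As a small illustration of that traversal, the sketch below encodes a toy DOG as a dependency map (node names invented) and uses Python's standard graphlib to walk it in dependency order, turning one objective into ordered subtasks:

```python
from graphlib import TopologicalSorter

# A toy data object graph (DOG): each node is a data object or insight,
# mapped to the set of objects it depends on. Node names are invented.
dog = {
    "campaign_impact_report": {"campaign_metrics", "supplier_metrics"},
    "campaign_metrics": {"raw_campaign_events"},
    "supplier_metrics": {"raw_fulfillment_logs"},
    "raw_campaign_events": set(),
    "raw_fulfillment_logs": set(),
}

# graphlib yields prerequisites first, so raw objects are materialised
# before the insights that depend on them.
for step, node in enumerate(TopologicalSorter(dog).static_order(), start=1):
    print(f"subtask {step}: materialise '{node}'")
```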
LLMs back in Action
Once the agents respond, the LLM integrates their insights, connecting marketing metrics (e.g., audience reach) to supply chain outcomes (e.g., delivery delays).
The LLM relies on structured prompts and templates to ensure consistency in outputs, integrating predefined metrics into readable text based on user queries, all while maintaining relevance and clarity for decision-makers.
The LLM uses NLG capabilities to combine outputs from the Question Analysis, Query Creator, and Data Analysis & Validation agents.
Hence, after the agents validate that Supplier Z’s on-time delivery rate dropped by 10% following Campaign X, the LLM might produce a summary like this: “Campaign X boosted demand for Supplier Z’s products by 20%, but fulfilment delays increased by 10%, likely impacting customer satisfaction.”
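That synthesis step can be sketched as a structured template into which validated agent outputs are slotted, which is one way to keep summaries consistent for decision-makers. The figures below are the ones from the example above, not real data:

```python
# Sketch of the synthesis step: validated agent outputs slot into a
# structured template, keeping summaries consistent for decision-makers.
# Figures are from the running example, not real data.
SUMMARY_TEMPLATE = (
    "Campaign {campaign} boosted demand for {supplier}'s products by "
    "{demand_lift:.0%}, but fulfilment delays increased by {delay_rise:.0%}, "
    "likely impacting customer satisfaction."
)

validated_outputs = {
    "campaign": "X",
    "supplier": "Supplier Z",
    "demand_lift": 0.20,
    "delay_rise": 0.10,
}

print(SUMMARY_TEMPLATE.format(**validated_outputs))
```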
The semantic layer standardizes how data products define terms like "campaign success" or "partner performance," ensuring agents and LLMs interpret data consistently. A centralised repository of knowledge embedded within the unified platform ensures context sharing and closes the gaps between siloed vector stores.
Utilising the semantic layer of the data product platform also allows users to create knowledge graphs to link entities (e.g., campaigns, partners, demographics) across domains, enabling nuanced cross-domain queries.
For instance, the LLM adds context by relating marketing data (like campaign engagement) to supply chain metrics (like delivery performance), offering stakeholders a clear picture of cause-and-effect relationships. Here, the semantic layer ensures that both the LLM and the AI agents can query these metrics consistently across domains.
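One minimal way to picture the semantic layer is as a shared term registry where every domain synonym resolves to one canonical business term; the mappings below (including the target_region/geo_state pair from earlier) are illustrative:

```python
# A minimal sketch of the semantic layer as a shared term registry:
# every domain-specific synonym resolves to one canonical business term,
# so agents and LLMs query the same concept. Mappings are illustrative.
CANONICAL_TERMS = {
    "target_region": "region",        # marketing's name for the concept
    "geo_state": "region",            # sales' name for the same concept
    "campaign_success": "campaign_roi",
    "partner_performance": "supplier_score",
}

def canonicalise(field: str) -> str:
    return CANONICAL_TERMS.get(field, field)

# Queries written in either domain's vocabulary resolve to the same field:
assert canonicalise("target_region") == canonicalise("geo_state") == "region"
```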
Resource Recommendation
A brief take by Animesh Kumar on the impact of Vector DBs when offered as an out-of-the-box feature of Unified Data Platforms.
LLMs/RAG models use the semantic layer to interpret and translate marketing and supply chain terms into a unified business language.
The agents are configured to connect directly to the semantic layers. These layers abstract the technical complexity, allowing the agents to access clean, consistent data sets.
Organisations leveraging custom AI agents tend to derive optimal value when they focus on effectively integrating knowledge graphs, allowing agents to delineate and understand the context of, and the relationships between, multiple data points.
Instead of feeding the AI agent’s memory with basic, flat vector databases that can conflate similar information, incorporating a knowledge graph helps agents track and remember data with nuanced relationships.
For instance, an AI agent with flat memory deployed for a marketing task might confuse similar-looking campaigns and target the wrong audience.
Using a knowledge graph, the agent links campaigns with audience details, past performance, and demographics. This allows it to make precise recommendations for each audience segment, avoiding generic suggestions and boosting campaign effectiveness.
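A small contrast sketch, with all entities and edges invented: flat memory holds campaigns as lookalike text blobs, while a knowledge graph keeps typed relationships the agent can follow deterministically:

```python
# Contrast sketch: flat memory stores campaigns as lookalike text blobs,
# while a small knowledge graph keeps typed relationships the agent can
# follow. All entities and edges are invented for illustration.
flat_memory = [
    "Summer Sale campaign, apparel, young adults",
    "Summer Sale campaign, apparel, parents",   # easy to conflate with the above
]

knowledge_graph = {
    ("campaign:summer_sale_a", "targets"): "segment:18_25",
    ("campaign:summer_sale_b", "targets"): "segment:35_50",
    ("campaign:summer_sale_a", "achieved_ctr"): "0.042",
    ("campaign:summer_sale_b", "achieved_ctr"): "0.017",
}

def campaigns_for(segment: str) -> list[str]:
    # Follow explicit edges instead of fuzzy similarity over flat text.
    return [
        subject
        for (subject, relation), obj in knowledge_graph.items()
        if relation == "targets" and obj == segment
    ]

print(campaigns_for("segment:18_25"))   # -> ['campaign:summer_sale_a']
```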
When it comes to AI and data initiatives, governance cannot be an afterthought. Beyond the operational improvements, incorporating the data product platform is a governance driver as well.
Your data product platform serves as a unified layer that supports all data products, ensuring consistency in data management, governance, and security across domains through shared foundational technologies.
The governance engine operates centrally in the control plane, enabling both local (use-case-/domain-specific) and global compliance for data products across multiple domains in the data activation plane.
This inherently addresses a spectrum of AI-related compliance drills one layer up in the consumption plane. And with user trust becoming the key priority, addressing governance needs is a must with a data product approach in the AI space.
Thanks for reading Modern Data 101! Subscribe for free to receive new posts and support our work.
If you have any queries about the piece, feel free to connect with the author(s). Or feel free to connect with the MD101 team directly at community@moderndata101.com 🧡
From The MD101 Team 🧡
Here’s your own copy of the Actionable Data Product Playbook. With 1500+ downloads so far and quality feedback, we are thrilled with the response to this 6-week guide we’ve built with industry experts and practitioners.
Stay tuned on moderndata101.com for more actionable resources from us!
A senior analytics engineer and a founding member of the Data Product Leadership Community spilling the beans for the curious ones!