The following is a revised edition.
We want to experience data as a product, with a focus on the product mindset, and in that pursuit we need to define our first data product. To do so, we need to meet several prerequisites:
For the sake of discussion on building data products (our first one), let's consider ourselves as the data owners of all the data we will use, based on the following logical data model for the Customer Domain for our enterprise (called ACME):
Now, let's consider our physical data model, which consists of three distinct data assets inside our Data System (it could be a DWH, a Data Platform, or something else): two relational tables ("CustomerTable" and "AddressTable") that explicitly express the many-to-many relationship through a simple foreign key, and a topic, "AddressQueue", that carries address change information (for simplicity, let's assume its schema matches that of "AddressTable" exactly). In this case, we have the following representation (note: in real life, it would be slightly different, but I've simplified it for illustrative purposes):
Assume that our first data product aims to publish a comprehensive version of the customer base along with their addresses. At this point, someone should have already defined the structure of the data contract, which should include at least the following information:
In the process of building a data product, the data product owner should identify the basic information needed to satisfy a business need, whether it be for publication/sharing (a source-aligned data product) or for solving a specific use case (a consumer-aligned data product).
Together with any supporting figures (data owner, data steward, and so on), they should be able to complete the initial sections of the data contract, such as the name, classification, and necessary assets.
For our first iteration, let's consider we want to publish customer registry information with addresses without any logic other than the mere publication on our marketplace (or data mesh registry). In this way, the first version of the data contract for the 'Customer Registry' product would be as follows:
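As a rough illustration only, the initial sections of such a contract could be modelled as a small Python structure. The class name, field names, the `version` field, and the business entity values are assumptions made for this sketch; only the product name, the "internal" classification, and the three asset names come from the text.

```python
from dataclasses import dataclass


@dataclass
class DataContract:
    """Initial sections of a data contract (illustrative field names)."""
    name: str
    classification: str            # e.g. "internal"
    assets: list[str]              # physical assets backing the product
    business_entities: list[str]   # terms taken from the data catalog taxonomy
    version: str = "0.1"           # hypothetical versioning scheme


customer_registry = DataContract(
    name="Customer Registry",
    classification="internal",
    assets=["CustomerTable", "AddressTable", "AddressQueue"],
    business_entities=["Customer", "Address"],  # assumed taxonomy terms
)
```

Keeping the contract as structured data rather than free text is what would let a marketplace validate and index it later.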
Some fields deserve particular attention in the initial stage of building a data product. The "Data Content: Business Entities" field should refer to the taxonomy of our data catalog (put simply, the list of business terms). This is one of the cornerstones that enables subsequent semantic interoperability because it ensures two effects:
Then there is the "Quality Assurance" field, which is actually a separate section as it involves integration with the data quality engine, the exposure of scores, and their weighting (data veracity paradigm).
The absence of these fields is limiting: without them, interoperability cannot be managed, and the reliability of the data product cannot be assessed in context.
The preceding information should be sufficient for publication on a generic marketplace. Once the product is published, we need to define who can access it. The first check is what the company policies are for "internal" data (the classification we included in the first component of the contract). Assuming that our company is not particularly restrictive and allows usage by the entire internal user base, we could have different visibility cones on our marketplace for roles:
and for organizational units
These evaluations obviously have aspects that can be explored further (in terms of scope and cost model), but let's say we decide that the dataset can be requested by "data analysts" and "data scientists" belonging to our own region (for example, Italy). We thus obtain the following enrichment.
Once the data product and its audience have been defined, we need to establish how the product can be accessed. Obviously, the components can vary greatly depending on the architecture, but we can assume that the following methods have been defined:
For Data Analysts & Business Users:
For Data Scientists:
Since we selected both data scientists and data analysts as potential users in the previous section, there should be a constraint requiring at least one output port per user type. Within that constraint, we decide to eliminate read-only access (for example, due to workload or access cost reasons). The contract would then appear as follows:
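The "at least one output port per user type" constraint can be sketched as a simple validation step. This is a minimal sketch, not a real marketplace API: the port names, the role identifiers, and the mapping of each port to an audience (in particular, which audience read-only access would serve) are assumptions made for illustration.

```python
# Hypothetical catalog: which user type each output port serves.
PORT_AUDIENCE = {
    "reporting": "data_analyst",
    "read_only_access": "data_analyst",  # assumed audience
    "lake": "data_scientist",
}


def missing_port_types(selected_roles, selected_ports):
    """Return the roles that have no output port among the selected ones."""
    covered = {PORT_AUDIENCE[port] for port in selected_ports}
    return sorted(set(selected_roles) - covered)


roles = ["data_analyst", "data_scientist"]
print(missing_port_types(roles, ["reporting", "lake"]))  # → []
print(missing_port_types(roles, ["reporting"]))          # → ['data_scientist']
```

An empty result means the contract satisfies the constraint; dropping read-only access is allowed here because the analysts remain covered by the reporting port.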
In the example, the possibility of applying filters or local authorizations was not explicitly stated, but the system accommodates the profiling mentioned above. Therefore, users would have the option, depending on their role, to access only one mode of utilization (reporting for data analysts, lake for data scientists).
All other categories, such as non-Italian users, external users, and business analysts, would see the product on the marketplace but would not be able to access it (not even in the preview).
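The access rules described above could be expressed roughly as follows. This is a sketch under stated assumptions: the `User` type, the role identifiers, and the `resolve_access` function are hypothetical names, and a real platform would typically enforce such rules in its own policy engine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    role: str              # e.g. "data_analyst" (hypothetical identifier)
    region: str            # e.g. "Italy"
    internal: bool = True  # False for external users


# One mode of utilization per role, as in the example contract.
ROLE_TO_PORT = {"data_analyst": "reporting", "data_scientist": "lake"}
ALLOWED_REGION = "Italy"


def resolve_access(user: User) -> Optional[str]:
    """Return the output port the user may use, or None if the product
    is visible on the marketplace but not accessible to this user."""
    if not user.internal or user.region != ALLOWED_REGION:
        return None
    return ROLE_TO_PORT.get(user.role)


print(resolve_access(User("data_analyst", "Italy")))      # → reporting
print(resolve_access(User("business_analyst", "Italy")))  # → None
```

Returning `None` rather than raising an error mirrors the behaviour in the text: excluded users still see the product listed, they simply get no output port.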
To keep the journey of building a data product easy to follow, our example covers the simple case where the data product owner (DPO) or data product manager is also the data owner of all entities, and there are no complex validation paths due to the sensitivity level of the data. It is therefore reasonable to assume that the publication is self-approving, and there are no validation steps in this stage of building a data product.
Once the data product is published and made accessible/searchable, it should receive requests for usage (here, it needs to be evaluated whether they are automatically approved or not) and become an active member of the data economy. Therefore, DPOs must keep the products alive in the solution, evolve them, and manage them like any other asset under their responsibility.
Some of the evolutions could include: