
Module #2.1

Evolving Data Personas

Understand directional flow and context in data, and learn the key details of the data developer, data analyst, data engineer, and data product manager roles, along with the skills each persona should develop.

The concept of Data Consumers and Data Producers offers a refreshing lens for understanding data and the intricate roles humans play in this domain. From a technical perspective, the line between these roles defines how data flows within systems. The components of modern applications, such as databases, APIs, and front-ends, fall naturally into these roles, and some straddle both as hybrids that produce and consume data.

Directional flow is a crucial aspect of the model: data moves in one direction only, from producers to consumers, which keeps the system clear and reduces cognitive load. This one-way movement establishes a clear pathway in which every flow has a purpose. It is not just about moving data; it is about ensuring that every producer has a corresponding consumer, so that no data is produced aimlessly.

Context adds depth to these relationships. A component's role depends on the specific scenario or need it serves: a service may produce data for human consumption, while a report that consumes data in one context can become a producer in another. Establishing clear contexts keeps the model applicable and relevant.
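Both ideas can be checked mechanically once the flows are written down. The sketch below is a minimal, hypothetical illustration (the component names, the `check_directional_flow` function, and the practice of declaring producers up front are assumptions, not part of any specific tool): it treats each flow as a directed edge from producer to consumer, uses Kahn's topological sort to confirm that data only moves one way, and lists declared producers that have no consumer.

```python
from collections import defaultdict


def check_directional_flow(producers, edges):
    """Check a producer -> consumer graph for one-way flow and aimless producers.

    producers: names of components declared as data producers (hypothetical inventory).
    edges: (source, target) pairs meaning "target consumes data produced by source".
    Returns (is_one_way, producers_without_consumers).
    """
    graph = defaultdict(set)
    nodes = set(producers)
    for src, dst in edges:
        graph[src].add(dst)
        nodes.update((src, dst))

    # Kahn's algorithm: every node can be ordered only if the graph has no cycle,
    # i.e. data never flows back to a component upstream of itself.
    indegree = {node: 0 for node in nodes}
    for targets in graph.values():
        for dst in targets:
            indegree[dst] += 1
    ready = [node for node in nodes if indegree[node] == 0]
    ordered = 0
    while ready:
        node = ready.pop()
        ordered += 1
        for dst in graph.get(node, ()):
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    is_one_way = ordered == len(nodes)

    # A declared producer with no outgoing edge produces data nobody consumes.
    orphans = sorted(p for p in producers if not graph.get(p))
    return is_one_way, orphans


# "sales_report" consumes from the warehouse in one context and produces for a forecast in another.
edges = [("crm", "warehouse"), ("warehouse", "sales_report"), ("sales_report", "forecast_service")]
print(check_directional_flow({"crm", "warehouse", "sales_report", "iot_sensor"}, edges))
# → (True, ['iot_sensor'])
```

Adding an edge such as `("forecast_service", "crm")` would close a loop, and the check would then report that the flow is no longer one-way.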



At the heart of this model lies the human element: a constant producer and an occasional consumer of data. Humans constantly interact with their environment, leaving behind trails of data. This dynamic is a cornerstone for businesses; ventures ranging from entertainment to research are built on leveraging human activity either as a source of data or as an audience for it.

When applying this model to business strategies, it prompts reflection. Where does your venture lie? Are you primarily a producer or consumer of data? Striking a balance and understanding the necessity of both roles becomes pivotal. Whether it's creating content for consumption or gathering data to refine production strategies, this model offers a structured approach to decision-making.

From a personal standpoint, this model shifts the focus from mere categorization to actionable principles. Emphasizing effective production over consumption, understanding the importance of quality, and recognizing the interplay between production and consumption become imperative. Moreover, in a data-rich landscape, consuming efficiently from existing sources minimizes the need for direct harvesting, optimizing processes and resources.

Ultimately, this framework isn't just about labeling but offers a structured way to approach problems, fostering a more concrete and strategic mindset. Whether it’s about enhancing data flow in technical systems or leveraging human activities for business, these concepts provide a guiding light for effective decision-making and action.

Deep Dive
Data Consumers and Data Producers

Data Developer

The term "Data Developer" emerged as a response to the evolving landscape of data science and analytics. Traditionally, the focus has been on data scientists conducting experiments with data, especially in large companies like Google and Facebook. However, the need for a broader set of skills beyond traditional data engineering led to the conceptualization of the role of a Data Developer. This role emphasizes the development and prototyping of data products with a keen eye on creating business value.

A Data Developer's main concern is to leverage the "Analytic Sphere" to provide analytic insights either as a service or through visualization. This requires a comprehensive understanding of computer science, software programming, and various data-related domains. Data Developers share many required skills with data scientists; the crucial difference lies in their perspective and problem-solving approach. Unlike data scientists, who often run experiments, Data Developers are oriented towards building tangible data products on existing frameworks and solutions.

Analytic Sphere: A toolbox for the Data Developer


The journey of a Data Developer involves building data products from the ground up. This includes architecting the platform, connecting data sources, and making analytics accessible. The "Analytic Sphere" serves as a toolbox for Data Developers, comprising Platform Architecture (core storage and processing), Connecting Data (data transformation and validation), and Accessing Analytics (surface layer providing analytic services and visualization).
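To make the three layers concrete, here is a toy sketch of one data product passing through them. Everything in it is hypothetical: an in-memory list stands in for the platform's storage, `connect` plays the role of Connecting Data by validating and typing records, and `revenue_by_region` is a surface-layer analytic service. A real Analytic Sphere would use actual storage, processing, and serving tools.

```python
from dataclasses import dataclass

# Platform Architecture (stand-in): raw records as they land in core storage.
RAW_ORDERS = [
    {"order_id": "1", "region": "EU", "amount": "120.50"},
    {"order_id": "2", "region": "US", "amount": "80.00"},
    {"order_id": "3", "region": "EU", "amount": "not-a-number"},
]


@dataclass(frozen=True)
class Order:
    order_id: int
    region: str
    amount: float


def connect(raw_rows):
    """Connecting Data: validate raw rows and transform them into typed records."""
    valid, rejected = [], []
    for row in raw_rows:
        try:
            valid.append(Order(int(row["order_id"]), row["region"], float(row["amount"])))
        except (KeyError, ValueError):
            rejected.append(row)  # kept aside for follow-up instead of silently dropped
    return valid, rejected


def revenue_by_region(orders):
    """Accessing Analytics: a surface-layer service answering a business question."""
    totals = {}
    for order in orders:
        totals[order.region] = totals.get(order.region, 0.0) + order.amount
    return totals


orders, rejected = connect(RAW_ORDERS)
print(revenue_by_region(orders))  # → {'EU': 120.5, 'US': 80.0}
print(len(rejected))  # → 1
```

Keeping the layers as separate functions means the storage stand-in or the serving function can be swapped without touching the validation logic.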

In contrast to traditional data science roles, a Data Developer actively engages with user interfaces and business requirements. Their concerns are not confined to storing, retrieving, and moving data; they extend to evaluating new ideas within the production line. The term "Data Developer" is not just a semantic shift; it represents a shift in focus towards building data products that align with business objectives.

The responsibilities of a Data Developer extend beyond tuning algorithms or building competitive solutions for existing frameworks. Instead, they focus on building and connecting all layers in a data product, emphasizing practical applications rather than incremental performance improvements. The role is not fixated on specific technologies like Hadoop or SVM algorithms but is adaptable to different tools and frameworks based on the specific business value they bring.

The emergence of the term "Data Developer" serves a practical purpose in organizational structuring. It provides a clear definition of responsibilities and aids in recruitment processes, helping build focused teams that deliver value by incorporating big data. As organizations grapple with the increasing complexity of data, having a distinct role like Data Developer contributes to building a clear vision and fostering expertise in developing data products.

Deep Dive
Data Developer

What Does a Data Developer Do?

Data Producer (actions)

A data producer is the initial source of data, whether it is a person manually entering data, an automated service gathering information from a website, or a device such as an IoT sensor collecting readings. Often, these data producers are junior staff who may interact directly with customers, such as contact center agents or sales clerks entering orders. Within an organization, various departments contribute to data production, such as sales, marketing, and customer service, each updating records in systems like CRM.



As data plays a pivotal role across organizations, there's a multitude of data producers fulfilling different roles within different departments. Typically, data production is just one aspect of their responsibilities, integrated into their broader job functions. At the core of the data process are two main roles: data producers, creating data, and data consumers, utilizing this data for various tasks like planning, decision-making, or forecasting.

In addition to data producers and consumers, data governance frameworks encompass several other roles essential for maintaining data integrity and usability: the data owner, responsible for a particular dataset's quality; the data steward, overseeing day-to-day management of the dataset; the data custodian, managing storage and maintenance; and the data governance manager, monitoring the entire framework. Ultimately, data producers play a critical role in ensuring data quality and usability, which requires proper training and education on data processes and standards. Feedback loops between producers and consumers foster a symbiotic relationship within the data governance framework.
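The governance roles and the producer/consumer feedback loop can be sketched as plain data plus two functions. This is a hypothetical illustration, not a governance tool: the `DatasetRecord` fields mirror the roles named above, the validation rules in `validate_entry` are invented for a sales-order example, and routing consumer feedback to the steward and then the owner is an assumed convention.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    name: str
    owner: str      # accountable for the dataset's quality
    steward: str    # oversees day-to-day management
    custodian: str  # manages storage and maintenance
    issues: list = field(default_factory=list)  # (consumer, description) pairs


def validate_entry(entry, required=("customer_id", "order_total")):
    """Producer-side check run when a record is entered (hypothetical rules)."""
    problems = [f"missing {key}" for key in required if entry.get(key) in (None, "")]
    total = entry.get("order_total")
    if isinstance(total, (int, float)) and total < 0:
        problems.append("negative order_total")
    return problems


def report_issue(dataset, consumer, description):
    """Consumer-side feedback: record the issue and return who should act on it."""
    dataset.issues.append((consumer, description))
    return [dataset.steward, dataset.owner]


orders = DatasetRecord("orders", owner="head_of_sales", steward="sales_ops", custodian="platform_team")
print(validate_entry({"customer_id": "C-17", "order_total": -5}))  # → ['negative order_total']
print(report_issue(orders, "finance_forecasting", "refunds entered as negative totals"))
# → ['sales_ops', 'head_of_sales']
```

Checking entries at the point of production catches problems early, while the issue log gives consumers a defined path back to the people responsible for the dataset.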

Deep Dive
Data producer

The Data Fluency Framework

Data Consumer

The relationship between users and tech companies has become a complex, interconnected web. The collection of user data has drawn scrutiny because of its widespread use and, at times, misuse. Users often unwittingly consent to this collection through lengthy, convoluted terms of service, leaving them with little transparency about how their personal information is used. While this data is not always as sensitive as a Social Security number, it captures users' habits, preferences, and interactions, which together shape an individual's digital profile.

This data has multifaceted applications: chiefly targeted advertising and analytical research, but also unethical practices such as political manipulation or trolling. Legislation such as the EU's GDPR emerged to ensure transparency and protect consumer data, forcing companies worldwide to adapt their data collection practices. Amid the negative press, however, the underlying reason for collecting this data is clear: it is a fundamental tool for refining and developing products.

In this digital era, consumers have become the linchpin of product development. Demographic data helps predict market trends and also steers the direction of product innovation. Companies like Google, Facebook, and Microsoft rely heavily on user data to drive their product evolution. This data-driven approach helps them understand users' behavior, preferences, and needs, enabling personalized and targeted offerings.

The online landscape offers unparalleled advantages for collecting and analyzing user data. Websites track user metrics extensively, from engagement patterns to preferences, enabling companies to refine their products and marketing strategies. Yet these practices raise concerns, especially when collection extends beyond what is relevant to the product and invades personal privacy by tracking online activities or social connections.

Despite this, not all data collection is invasive. Market research leveraging big data showcases how businesses can ethically gather information to enhance user experiences without breaching privacy. For instance, analyzing user age groups and their interactions with a website provides insights to refine marketing strategies, ensuring that content appeals to various demographics effectively.
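The age-group example can be sketched as an aggregate computation that never needs to know who the visitors are. The snippet below is hypothetical: the bins, the `(age, pages_viewed)` session format, and average pages per session as the engagement measure are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical age bins: (exclusive upper bound, label); anything above falls into "45+".
AGE_BINS = [(25, "<25"), (35, "25-34"), (45, "35-44")]


def age_group(age):
    for upper, label in AGE_BINS:
        if age < upper:
            return label
    return "45+"


def engagement_by_age_group(sessions):
    """sessions: iterable of (age, pages_viewed) with no user identifiers attached.

    Returns the average pages viewed per session for each age group."""
    totals = defaultdict(lambda: [0, 0])  # group -> [total pages, session count]
    for age, pages in sessions:
        bucket = totals[age_group(age)]
        bucket[0] += pages
        bucket[1] += 1
    return {group: pages / count for group, (pages, count) in totals.items()}


sessions = [(22, 8), (24, 4), (31, 5), (52, 2), (29, 3)]
print(engagement_by_age_group(sessions))  # → {'<25': 6.0, '25-34': 4.0, '45+': 2.0}
```

A marketing team could compare these averages to see which groups the content engages less; in practice, groups with very few sessions are often suppressed to reduce the risk of re-identifying individuals.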

Various tech giants utilize user data in different ways. Google's ad targeting and personalized search results rely on user data, while Facebook monetizes its vast user base through targeted ads. Spotify analyzes user behavior to enhance its music recommendations, while even smart devices collect data for improved user experiences. Amidst these practices, issues of trust and privacy loom large, prompting questions about the ethical use and safeguarding of user data.

The value of data in modern society is undeniable. It fuels insights, shapes decisions, and steers innovation. However, the trade-off between data collection and privacy remains a critical concern. Transparency in data practices and a balance between personalized experiences and user privacy emerge as crucial considerations in this data-driven landscape.

Deep Dive
In Big Data, The Consumer Is The Product

Data Analyst

Embarking on a career as a data analyst involves navigating through structured steps and acquiring a blend of technical and soft skills. The journey begins with laying a robust foundation in technical aspects: proficiency in data analysis tools like Excel, SQL for database querying, and programming languages like Python or R is pivotal. Understanding statistical analysis and data visualization tools like Tableau or Power BI forms the backbone of interpreting and presenting data effectively. Soft skills, equally crucial, encompass critical thinking, adept communication to relay findings to diverse audiences, and grasping the business landscape to provide context to your analyses.

Education is a key pillar, offering structured learning opportunities. Pursuing a degree in Mathematics, Statistics, Computer Science, or related disciplines provides a formal base. Alternatively, online courses from platforms like Coursera, Udemy, or LinkedIn Learning, and participation in bootcamps or workshops, offer a flexible route to acquire certifications and hands-on experience. Building a portfolio through personal projects, open-source contributions, or data analysis competitions serves as a testament to your skills and practical knowledge, crucial for attracting potential employers.



Networking adds a dynamic dimension to your career trajectory. Engaging with the data analytics community, both online and through local events, fosters connections and exposes you to insights and opportunities. Job search and internships mark the transition from preparation to practical application. Applying for entry-level roles or internships and preparing for technical assessments and case studies becomes pivotal at this stage. Continuous learning is the crux of a data analyst's journey; staying abreast of the ever-evolving trends and technologies in data analytics ensures professional growth and adaptability in a dynamic landscape.

In conclusion, success as a data analyst stems from dedication, practice, and a continuous hunger for learning. The roadmap lays out the foundational steps, but the journey itself also calls for adaptability and steadily accumulated hands-on experience.

Deep Dive
How to Become a Data Analyst in 2023: A Complete Guide

11 Data Analyst Skills You Need

Data Engineer

The landscape of Data Engineering is undergoing a profound transformation, shifting from a focus on Data Infrastructure and Integration towards fostering Accessible Analytics. This evolution, visible within Meta and across the industry, highlights a crucial challenge: the available tools predominantly cater to infrastructure and integration and do not sufficiently address the growing need for accessible, self-describing data. Accessible Analytics redefines data accessibility as more than tool-based access to data; it requires data to be inherently self-descriptive, so that users can derive insights without specialized skills.

As analytics teams mature, the role of data engineers evolves. Initially focused on setting up foundational data infrastructure and ETL pipelines, the emphasis gradually shifts towards creating abstractions that enable consistent analysis, facilitating the development of sophisticated analytic applications. These applications, such as interactive dashboards and workflow automation, empower end-users to access and interpret data without deep technical expertise. The growing role of data engineers involves ensuring that the data foundation represents underlying business concepts accurately, supporting both data analysts and business users in drawing precise conclusions effortlessly.

In the maturation journey, data engineers increasingly concentrate on building high-quality analytic applications and ensuring a consistently modeled, scalable, and accessible data environment. This necessitates adopting software engineering best practices, like reusable abstractions and automated testing, to establish shared schema and business logic. Such practices not only enhance human accessibility but also facilitate system-driven accessibility, unlocking automation possibilities.
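Shared schema and business logic, in the software-engineering sense described above, might look like the sketch below: one schema definition, one reusable definition of a metric, and an automated test that pins the metric's behavior so every dashboard or automation that calls it gets the same answer. The schema, the seven-day window, and the rule that `bounce` events do not count as activity are hypothetical.

```python
from datetime import date, timedelta

# Shared schema: the single agreed definition of an event row.
EVENTS_SCHEMA = {"user_id": str, "event_date": date, "event_type": str}


def conforms(row, schema=EVENTS_SCHEMA):
    """True if the row has exactly the schema's columns with the expected types."""
    return set(row) == set(schema) and all(isinstance(row[col], typ) for col, typ in schema.items())


def active_users(rows, as_of, window_days=7):
    """Shared business logic: users with a non-bounce event in the trailing window."""
    start = as_of - timedelta(days=window_days - 1)
    users = set()
    for row in rows:
        if not conforms(row):
            raise ValueError(f"row does not match EVENTS_SCHEMA: {row}")
        if start <= row["event_date"] <= as_of and row["event_type"] != "bounce":
            users.add(row["user_id"])
    return users


def test_active_users():
    """Automated test pinning the metric definition (runnable with pytest or directly)."""
    rows = [
        {"user_id": "a", "event_date": date(2024, 1, 7), "event_type": "view"},
        {"user_id": "b", "event_date": date(2023, 12, 31), "event_type": "view"},  # outside window
        {"user_id": "c", "event_date": date(2024, 1, 5), "event_type": "bounce"},  # not activity
    ]
    assert active_users(rows, as_of=date(2024, 1, 7)) == {"a"}


test_active_users()
```

Rejecting non-conforming rows loudly, rather than skipping them, keeps schema drift from quietly changing the metric that downstream consumers see.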



The evolution of Data Engineering at Meta has moved through distinct phases, from a focus on Data Integration at scale to higher-level challenges of analytics complexity and privacy. Initially, the emphasis was on integrating data at scale and building curated datasets for exploration and reporting. As Meta expanded its product ecosystem, however, the analytics function decentralized, producing diverse use cases and data ecosystems and driving exponential growth in data processing demands.

Later phases focused on establishing end-to-end workflows, making the logging process more consistent, and developing modular pipeline authoring, all crucial to maintaining high-quality datasets. Yet this evolution faced challenges, notably achieving consistent product measurement and enforcing privacy constraints across a sprawling, siloed data lineage.

These challenges stem from the disconnect between logical concepts (meaning and purpose) and the physical constructs used for production and consumption. Primitive schema types and unstructured schemas often fail to convey the meaning of sensitive columns throughout the data lineage, which limits analytics potential. Moreover, the need to account for varying SQL dialects and business logic across different processing stacks creates consistency risks that hinder seamless analytics.
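One way to narrow the gap between physical and logical constructs is to attach semantic annotations to columns and carry them along the lineage. The sketch below is hypothetical and is not Meta's implementation: both source columns are plain strings physically, the annotations add a semantic type and tags, and `derive` unions the tags of a derived column's inputs so that a sensitive origin is never silently lost downstream.

```python
# Logical annotations for a physical table whose columns are all plain strings.
USERS = {
    "email": {"semantic_type": "email_address", "tags": {"pii"}},
    "country": {"semantic_type": "country_code", "tags": set()},
}


def derive(parent, lineage):
    """Annotate a derived table.

    lineage maps each output column to (semantic_type, [parent columns it is computed from]).
    Tags are unioned across inputs, so sensitivity propagates conservatively."""
    return {
        out_col: {
            "semantic_type": semantic_type,
            "tags": set().union(*(parent[col]["tags"] for col in inputs)),
        }
        for out_col, (semantic_type, inputs) in lineage.items()
    }


def blocked_columns(table, forbidden_tags):
    """Columns that must not be used for a purpose that forbids the given tags."""
    return sorted(col for col, meta in table.items() if meta["tags"] & forbidden_tags)


daily_users = derive(USERS, {
    "email_hash": ("hashed_identifier", ["email"]),  # in this sketch a hashed email stays tagged PII
    "country": ("country_code", ["country"]),
})
print(blocked_columns(daily_users, {"pii"}))  # → ['email_hash']
```

Because the tags travel with the lineage rather than with the physical type, a privacy check can be enforced on any downstream table without re-inspecting how each column was computed.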

The reflections on the Data Engineering lifecycle and the challenges it poses pave the way for exploring solutions to bridge the gap between physical constructs and richer, descriptive logical datasets. Addressing questions about dataset attributes, schema definition and reuse, automatic enrichment, governance, and correctness of dynamically generated SQL becomes pivotal in shaping the next phase of Accessible Analytics within Meta and the broader industry.

Deep Dive
The future of the data engineer (series)

Data Product Manager

The surge in data-driven decision-making has given rise to a pivotal role: the Data Product Manager (DPM). These professionals oversee the development and success of data products within organizations, serving as the linchpin between data science, technology, and business stakeholders. Their role spans diverse responsibilities, from envisioning product strategies and identifying business opportunities to prioritizing features, ensuring alignment with user needs, and tracking product performance for continual improvement.



The journey to becoming a Data Product Manager involves a fusion of data science, analytics, and product management skills. Individuals aspiring to this role often take specialized courses or certifications, gaining proficiency in data visualization, hypothesis testing, modeling, and understanding the intricate relationships between variables. These professionals become architects of data strategies, shaping the product's journey by harnessing data-driven insights and advanced analytics to solve complex problems.

A Data Product Manager is not just a product manager with a data-driven perspective; their expertise extends into programming, machine learning, and UX/UI design. While traditional Product Managers focus on technology, design, and business, Data PMs delve deeper into data intricacies, actively designing systems that collect, manage, and leverage data to refine and enhance the product. They play a crucial role in democratizing access to valuable insights across organizations, fostering collaboration, and paving the way for a future where data forms the cornerstone of success.

The collaborative nature of data product creation necessitates strong partnerships between various experts, with the Data PM acting as the bridge. They orchestrate collaboration between the Product Owner, Data Scientist, Product/UX Designer, Data Engineer, and Machine Learning Engineer, ensuring seamless integration of diverse skill sets for a unified product vision.

Whether an organization needs a Data PM depends on its product, the stage of its data strategy, and the level of interdepartmental collaboration required. These professionals offer enduring value by establishing robust data pipelines and democratizing access to insights, which reduces reliance on individuals and keeps data-driven decision-making resilient during transitions.

Embracing a Data Product Manager could be transformative, unlocking extraordinary creativity and development potential within organizations in an increasingly data-centric landscape. The fusion of technical expertise, data proficiency, and product management acumen positions Data PMs as catalysts for innovation, driving product success through a comprehensive understanding of the interplay between data and product strategy.

Deep Dive
Understand Data Products and Data Product Management

A step-by-step guide to becoming a Data Product Manager

Who is a Data Product Manager?