Data Lakehouse in Action

BOOK EXCERPT:

Propose a new scalable data architecture paradigm, Data Lakehouse, that addresses the limitations of current data architecture patterns

Key Features
● Understand how data is ingested, stored, served, governed, and secured for enabling data analytics
● Explore a practical way to implement Data Lakehouse using cloud computing platforms like Azure
● Combine multiple architectural patterns based on an organization's needs and maturity level

Book Description
The Data Lakehouse architecture is a new paradigm that enables large-scale analytics. This book will guide you in developing data architecture in the right way to ensure your organization's success. The first part of the book discusses the different data architectural patterns used in the past and the need for a new architectural paradigm, as well as the drivers that have caused this change. It covers the principles that govern the target architecture, the components that form the Data Lakehouse architecture, and the rationale and need for those components. The second part deep dives into the different layers of Data Lakehouse. It covers various scenarios and components for data ingestion, storage, data processing, data serving, analytics, governance, and data security. The book's third part focuses on the practical implementation of the Data Lakehouse architecture in a cloud computing platform. It focuses on various ways to combine the Data Lakehouse pattern to realize macro-patterns, such as Data Mesh and Data Hub-Spoke, based on the organization's needs and maturity level. The frameworks introduced will be practical and organizations can readily benefit from their application. By the end of this book, you'll clearly understand how to implement the Data Lakehouse architecture pattern in a scalable, agile, and cost-effective manner.

What you will learn
● Understand the evolution of the Data Architecture patterns for analytics
● Become well versed in the Data Lakehouse pattern and how it enables data analytics
● Focus on methods to ingest, process, store, and govern data in a Data Lakehouse architecture
● Learn techniques to serve data and perform analytics in a Data Lakehouse architecture
● Cover methods to secure the data in a Data Lakehouse architecture
● Implement Data Lakehouse in a cloud computing platform such as Azure
● Combine Data Lakehouse in a macro-architecture pattern such as Data Mesh

Who this book is for
This book is for data architects, big data engineers, data strategists and practitioners, data stewards, and cloud computing practitioners looking to become well-versed with modern data architecture patterns to enable large-scale analytics. Basic knowledge of data architecture and familiarity with data warehousing concepts are required.

Product Details :

Genre : Computers
Author : Pradeep Menon
Publisher : Packt Publishing Ltd
Release : 2022-03-17
File : 206 Pages
ISBN-13 : 9781801815109


Data Mesh in Action

BOOK EXCERPT:

Revolutionize the way your organization approaches data with a data mesh! This new decentralized architecture outpaces monolithic lakes and warehouses and can work for a company of any size.

In Data Mesh in Action you will learn how to:
● Implement a data mesh in your organization
● Turn data into a data product
● Move from your current data architecture to a data mesh
● Identify data domains, and decompose an organization into smaller, manageable domains
● Set up the central governance and local governance levels over data
● Balance responsibilities between the two levels of governance
● Establish a platform that allows efficient connection of distributed data products and automated governance

Data Mesh in Action reveals how this groundbreaking architecture looks for both small startups and large enterprises. You won’t need any new technology: this book shows you how to start implementing a data mesh with flexible processes and organizational change. You’ll explore both an extended case study and multiple real-world examples. As you go, you’ll be expertly guided through discussions around Socio-Technical Architecture and Domain-Driven Design with the goal of building a sleek data-as-a-product system. Plus, dozens of workshop techniques for both in-person and remote meetings help you onboard colleagues and drive a successful transition.

About the technology
Business increasingly relies on efficiently storing and accessing large volumes of data. The data mesh is a new way to decentralize data management that radically improves security and discoverability. A well-designed data mesh simplifies self-service data consumption and reduces the bottlenecks created by monolithic data architectures.

About the book
Data Mesh in Action teaches you pragmatic ways to decentralize your data and organize it into an effective data mesh. You’ll start by building a minimum viable data product, which you’ll expand into a self-service data platform, chapter-by-chapter. You’ll love the book’s unique “sliders” that adjust the mesh to meet your specific needs. You’ll also learn processes and leadership techniques that will change the way you and your colleagues think about data.

What's inside
● Decompose an organization into manageable domains
● Turn data into a data product
● Set up central and local governance levels
● Build a fit-for-purpose data platform
● Improve management, initiation, and support techniques

About the reader
For data professionals. Requires no specific programming stack or data platform.

About the author
Jacek Majchrzak is a hands-on lead data architect. Dr. Sven Balnojan manages data products and teams. Dr. Marian Siwiak is a data scientist and a management consultant for IT, scientific, and technical projects.

Table of Contents
PART 1 FOUNDATIONS
1 The what and why of the data mesh
2 Is a data mesh right for you?
3 Kickstart your data mesh MVP in a month
PART 2 THE FOUR PRINCIPLES IN PRACTICE
4 Domain ownership
5 Data as a product
6 Federated computational governance
7 The self-serve data platform
PART 3 INFRASTRUCTURE AND TECHNICAL ARCHITECTURE
8 Comparing self-serve data platforms
9 Solution architecture design

Product Details :

Genre : Computers
Author : Jacek Majchrzak
Publisher : Simon and Schuster
Release : 2023-03-21
File : 326 Pages
ISBN-13 : 9781638351849


Data Mesh

BOOK EXCERPT:

Data Mesh: The future of data architecture!

KEY FEATURES
● Decentralize data with domain-oriented design.
● Enhance scalability and data autonomy.
● Implement robust governance across domains.

DESCRIPTION
"Data Mesh: Principles, patterns, architecture, and strategies for data-driven decision making" introduces Data Mesh, which is a macro data architecture pattern designed to harmonize governance with flexibility. This book guides readers through the nuances of Data Mesh topologies, explaining how they can be tailored to meet specific organizational needs while balancing central control with domain-specific autonomy. The book delves into the Data Mesh governance framework, which provides a structured approach to manage and control decentralized data assets effectively. It emphasizes the importance of a well-implemented governance structure that ensures data quality, compliance, and access control across various domains. Additionally, the book outlines robust data cataloging and sharing strategies, enabling organizations to improve data discoverability, usage, and interoperability between cross-functional teams. Securing Data Mesh architectures is another critical focus. The text explores comprehensive security strategies that protect data across different layers of the architecture, ensuring data integrity and protecting against breaches. By implementing the strategies discussed, data professionals will strengthen their ability to safeguard sensitive information in a distributed environment, making this book a vital resource for anyone involved in data management, security, or governance.

WHAT YOU WILL LEARN
● Understand the evolution and need for Data Mesh architectures.
● Learn the core principles and design for Data Mesh implementations.
● Identify and apply Data Mesh architectural patterns and components.
● Implement effective Data Mesh governance frameworks.
● Develop and execute a strategic data cataloging plan.
● Create comprehensive data-sharing strategies and security strategies within Data Mesh.

WHO THIS BOOK IS FOR
This book is ideal for data professionals, including chief data officers, chief analytics officers, chief information officers, enterprise data architects, data stewards, and data governance and compliance professionals.

TABLE OF CONTENTS
1. Establishing the Data Mesh Context
2. Evolution of Data Architectures
3. Principles of Data Mesh Architecture
4. The Patterns of Data Mesh Architecture
5. Data Governance in a Data Mesh
6. Data Cataloging in a Data Mesh
7. Data Sharing in a Data Mesh
8. Data Security in a Data Mesh
9. Data Mesh in Practice
Appendix: Key terms

Product Details :

Genre : Computers
Author : Pradeep Menon
Publisher : BPB Publications
Release : 2024-05-16
File : 331 Pages
ISBN-13 : 9789355519962


Building Modern Data Applications Using Databricks Lakehouse

BOOK EXCERPT:

Develop, optimize, and monitor data pipelines on Databricks

Product Details :

Genre :
Author : Will Girten
Publisher : Packt Publishing Ltd
Release : 2024-10-21
File : 246 Pages
ISBN-13 : 9781804612873


Data Engineering with Apache Spark, Delta Lake, and Lakehouse

BOOK EXCERPT:

Understand the complexities of modern-day data engineering platforms and explore strategies to deal with them with the help of use case scenarios led by an industry expert in big data

Key Features
● Become well-versed with the core concepts of Apache Spark and Delta Lake for building data platforms
● Learn how to ingest, process, and analyze data that can be later used for training machine learning models
● Understand how to operationalize data models in production using curated data

Book Description
In the world of ever-changing data and schemas, it is important to build data pipelines that can auto-adjust to changes. This book will help you build scalable data platforms that managers, data scientists, and data analysts can rely on. Starting with an introduction to data engineering, along with its key concepts and architectures, this book will show you how to use Microsoft Azure Cloud services effectively for data engineering. You'll cover data lake design patterns and the different stages through which the data needs to flow in a typical data lake. Once you've explored the main features of Delta Lake to build data lakes with fast performance and governance in mind, you'll advance to implementing the lambda architecture using Delta Lake. Packed with practical examples and code snippets, this book takes you through real-world examples based on production scenarios faced by the author in his 10 years of experience working with big data. Finally, you'll cover data lake deployment strategies that play an important role in provisioning the cloud resources and deploying the data pipelines in a repeatable and continuous way. By the end of this data engineering book, you'll know how to effectively deal with ever-changing data and create scalable data pipelines to streamline data science, ML, and artificial intelligence (AI) tasks.

What you will learn
● Discover the challenges you may face in the data engineering world
● Add ACID transactions to Apache Spark using Delta Lake
● Understand effective design strategies to build enterprise-grade data lakes
● Explore architectural and design patterns for building efficient data ingestion pipelines
● Orchestrate a data pipeline for preprocessing data using Apache Spark and Delta Lake APIs
● Automate deployment and monitoring of data pipelines in production
● Get to grips with securing, monitoring, and managing data pipelines efficiently

Who this book is for
This book is for aspiring data engineers and data analysts who are new to the world of data engineering and are looking for a practical guide to building scalable data platforms. If you already work with PySpark and want to use Delta Lake for data engineering, you'll find this book useful. Basic knowledge of Python, Spark, and SQL is expected.

Product Details :

Genre : Computers
Author : Manoj Kukreja
Publisher : Packt Publishing Ltd
Release : 2021-10-22
File : 480 Pages
ISBN-13 : 9781801074322


Delta Lake: The Definitive Guide

BOOK EXCERPT:

Ready to simplify the process of building data lakehouses and data pipelines at scale? In this practical guide, learn how Delta Lake is helping data engineers, data scientists, and data analysts overcome key data reliability challenges with modern data engineering and management techniques. Authors Denny Lee, Tristen Wentling, Scott Haines, and Prashanth Babu (with contributions from Delta Lake maintainer R. Tyler Croy) share expert insights on all things Delta Lake, including how to run batch and streaming jobs concurrently and accelerate the usability of your data. You'll also uncover how ACID transactions bring reliability to data lakehouses at scale.

This book helps you:
● Understand key data reliability challenges and how Delta Lake solves them
● Explain the critical role of Delta transaction logs as a single source of truth
● Learn the Delta Lake ecosystem with technologies like Apache Flink, Kafka, and Trino
● Architect data lakehouses with the medallion architecture
● Optimize Delta Lake performance with features like deletion vectors and liquid clustering

Product Details :

Genre : Computers
Author : Denny Lee
Publisher : O'Reilly Media, Inc.
Release : 2024-10-30
File : 391 Pages
ISBN-13 : 9781098151904


Databricks ML in Action

BOOK EXCERPT:

Get to grips with autogenerating code, deploying ML algorithms, and leveraging various ML lifecycle features on the Databricks Platform, guided by best practices and reusable code for you to try, alter, and build on

Key Features
● Build machine learning solutions faster than peers only using documentation
● Enhance or refine your expertise with tribal knowledge and concise explanations
● Follow along with code projects provided in GitHub to accelerate your projects
● Purchase of the print or Kindle book includes a free PDF eBook

Book Description
Discover what makes the Databricks Data Intelligence Platform the go-to choice for top-tier machine learning solutions. Written by a team of industry experts at Databricks with decades of combined experience in big data, machine learning, and data science, Databricks ML in Action presents cloud-agnostic, end-to-end examples with hands-on illustrations of executing data science, machine learning, and generative AI projects on the Databricks Platform. You’ll develop expertise in Databricks' managed MLflow, Vector Search, AutoML, Unity Catalog, and Model Serving as you learn to apply them practically in everyday workflows. This Databricks book not only offers detailed code explanations but also facilitates seamless code importation for practical use. You’ll discover how to leverage the open-source Databricks platform to enhance learning, boost skills, and elevate productivity with supplemental resources. By the end of this book, you'll have mastered the use of Databricks for data science, machine learning, and generative AI, enabling you to deliver outstanding data products.

What you will learn
● Set up a workspace for a data team planning to perform data science
● Monitor data quality and detect drift
● Use autogenerated code for ML modeling and data exploration
● Operationalize ML with feature engineering client, AutoML, VectorSearch, Delta Live Tables, AutoLoader, and Workflows
● Integrate open-source and third-party applications, such as OpenAI's ChatGPT, into your AI projects
● Communicate insights through Databricks SQL dashboards and Delta Sharing
● Explore data and models through the Databricks marketplace

Who this book is for
This book is for machine learning engineers, data scientists, and technical managers seeking hands-on expertise in implementing and leveraging the Databricks Data Intelligence Platform and its Lakehouse architecture to create data products.

Product Details :

Genre : Computers
Author : Stephanie Rivera
Publisher : Packt Publishing Ltd
Release : 2024-05-17
File : 280 Pages
ISBN-13 : 9781800564008


Delta Lake: Up and Running

BOOK EXCERPT:

With the surge in big data and AI, organizations can rapidly create data products. However, the effectiveness of their analytics and machine learning models depends on the data's quality. Delta Lake's open source format offers a robust lakehouse framework over platforms like Amazon S3, ADLS, and GCS. This practical book shows data engineers, data scientists, and data analysts how to get Delta Lake and its features up and running. The ultimate goal of building data pipelines and applications is to gain insights from data. You'll understand how your storage solution choice determines the robustness and performance of the data pipeline, from raw data to insights.

You'll learn how to:
● Use modern data management and data engineering techniques
● Understand how ACID transactions bring reliability to data lakes at scale
● Run streaming and batch jobs against your data lake concurrently
● Execute update, delete, and merge commands against your data lake
● Use time travel to roll back and examine previous data versions
● Build a streaming data quality pipeline following the medallion architecture

Product Details :

Genre : Computers
Author : Bennie Haelen
Publisher : O'Reilly Media, Inc.
Release : 2023-10-16
File : 271 Pages
ISBN-13 : 9781098139681


Hands-On Salesforce Data Cloud

BOOK EXCERPT:

Learn how to implement and manage a modern customer data platform (CDP) through the Salesforce Data Cloud platform. This practical book provides a comprehensive overview that shows architects, administrators, developers, data engineers, and marketers how to ingest, store, and manage real-time customer data. Author Joyce Kay Avila demonstrates how to use Salesforce's native connectors, canonical data model, and Einstein's built-in trust layer to accelerate your time to value. You'll learn how to leverage Salesforce's low-code/no-code functionality to expertly build a Data Cloud foundation that unlocks the power of structured and unstructured data. Use Data Cloud tools to build your own predictive models or leverage third-party machine learning platforms like Amazon SageMaker, Google Vertex AI, and Databricks.

This book will help you:
● Develop a plan to execute a CDP project effectively and efficiently
● Connect Data Cloud to external data sources and build out a Customer 360 Data Model
● Leverage data sharing capabilities with Snowflake, BigQuery, Databricks, and Azure
● Use Salesforce Data Cloud capabilities for identity resolution and segmentation
● Create calculated, streaming, visualization, and predictive insights
● Use Data Graphs to power Salesforce Einstein capabilities
● Learn Data Cloud best practices for all phases of the development lifecycle

Product Details :

Genre : Computers
Author : Joyce Kay Avila
Publisher : O'Reilly Media, Inc.
Release : 2024-08-09
File : 451 Pages
ISBN-13 : 9781098147839


Apache Iceberg: The Definitive Guide

BOOK EXCERPT:

Traditional data architecture patterns are severely limited. To use these patterns, you have to ETL data into each tool, a cost-prohibitive process for making warehouse features available to all of your data. The lack of flexibility with these patterns requires you to lock into a set of priority tools and formats, which creates data silos and data drift. This practical book shows you a better way. Apache Iceberg provides the capabilities, performance, scalability, and savings that fulfill the promise of an open data lakehouse. By following the lessons in this book, you'll be able to achieve interactive, batch, machine learning, and streaming analytics with this high-performance open source format. Authors Tomer Shiran, Jason Hughes, and Alex Merced from Dremio show you how to get started with Iceberg.

With this book, you'll learn:
● The architecture of Apache Iceberg tables
● What happens under the hood when you perform operations on Iceberg tables
● How to further optimize Apache Iceberg tables for maximum performance
● How to use Iceberg with popular data engines such as Apache Spark, Apache Flink, and Dremio
● How Apache Iceberg can be used in streaming and batch ingestion

Discover why Apache Iceberg is a foundational technology for implementing an open data lakehouse.

Product Details :

Genre : Computers
Author : Tomer Shiran
Publisher : O'Reilly Media, Inc.
Release : 2024-05-02
File : 352 Pages
ISBN-13 : 9781098148584