Practical Data Engineering with Apache Projects : Solving Everyday Data Challenges with Spark, Iceberg, Kafka, Flink, and More

Danushka, Dunith

Web store price: ¥12,004 (¥10,913 before tax)
APress (released 2026/01)
List price in foreign currency: US$ 59.99
Points: 109pt

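In stock at our partner overseas book distributor. Usually ships in about 3 weeks.
[Important notes]
1. Delivery may occasionally be delayed, or the item may turn out to be unavailable.
2. For orders of multiple copies, we ship them together once the full ordered quantity has arrived.
3. We cannot accept requests for copies in pristine condition.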

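[About arrival delays]
Owing to world events, foreign-language and antiquarian books ordered from overseas may arrive later than the standard delivery time shown.
We apologize for the inconvenience and appreciate your understanding.

◆ The cover, obi band, and other details shown in the image may differ from the actual item.

◆ Web store prices for foreign-language books differ from the prices at our physical stores.
The price of a foreign-language book is the Japanese yen price at the time the order is confirmed.
If the price of the same book changes after the order is confirmed, the change is not reflected.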

Binding: Paperback / Pages: 252 p.
Language: ENG
Product code: 9798868821417
DDC classification: 005.432

Description

This book is a comprehensive guide designed to equip you with the practical skills and knowledge necessary to tackle real-world data challenges using open-source solutions. Focusing on 10 real-world data engineering projects, it caters specifically to data engineers at the early stages of their careers, providing a strong foundation in essential open-source tools and techniques such as Apache Spark, Flink, Airflow, Kafka, and many more.
Each chapter is dedicated to a single project, starting with a clear presentation of the problem it addresses. You will then be guided through a step-by-step process to solve the problem, leveraging widely-used open-source data tools. This hands-on approach ensures that you not only understand the theoretical aspects of data engineering but also gain valuable experience in applying these concepts to real-world scenarios.
At the end of each chapter, the book delves into common challenges that may arise during the implementation of the solution, offering practical advice on troubleshooting these issues effectively. Additionally, the book highlights best practices that data engineers should follow to ensure the robustness and efficiency of their solutions. A major focus of the book is using open-source projects and tools to solve problems encountered in data engineering.
In summary, this book is an indispensable resource for data engineers looking to build a strong foundation in the field. By offering practical, real-world projects and emphasizing problem-solving and best practices, it will prepare you to tackle the complex data challenges encountered throughout your career. Whether you are an aspiring data engineer or looking to enhance your existing skills, this book provides the knowledge and tools you need to succeed in the ever-evolving world of data engineering.

You Will Learn:

The foundational concepts of data engineering and practical experience in solving real-world data engineering problems
How to proficiently use open-source data tools like Apache Kafka, Flink, Spark, Airflow, and Trino
How to build 10 hands-on data engineering projects
How to troubleshoot common challenges in data engineering projects

Who is this book for: Early-career data engineers and aspiring data engineers who are looking to build a strong foundation in the field; mid-career professionals looking to transition into data engineering roles; and technology enthusiasts interested in gaining insights into data engineering practices and tools.

Part I: Data Lakehouses, Iceberg, Batch ETL, and Orchestration
Chapter 1: Foundational Data Engineering Concepts
Chapter 2: Building a Data Lakehouse with Apache Iceberg
Chapter 3: Batch ETL Pipeline with Apache Spark
Chapter 4: Data Visualization with Apache Superset
Chapter 5: Workflow Orchestration with Apache Airflow

Part II: Streaming Data and Real-time Analytics
Chapter 6: Change Data Capture with Debezium and Kafka
Chapter 7: Low-latency Analytics Dashboard with ClickHouse
Chapter 8: Real-time Fraud Detection with Apache Flink

Part III: Machine Learning and Generative AI
Chapter 9: Building a Product Recommendation Engine with Spark MLlib
Chapter 10: Vector Similarity Search with Postgres and pgvector

Dunith Danushka has over 10 years of experience in the data analytics domain, including real-time analytics, stream processing, streaming data platforms, and event-driven systems. Throughout his career, he has designed, developed, and supported large-scale data-intensive applications, led teams doing the same, and helped customers build their own.
His transition from engineering to solution architecture, developer relations, and now Product Marketing at EDB has given him a comprehensive outlook on the data space. His goal is to provide practical, hands-on guidance to help aspiring data engineers bridge the gap between theoretical knowledge and real-world applications.