Apache Hadoop YARN: Moving Beyond MapReduce and Batch Processing with Apache Hadoop 2 (Addison-Wesley Data and Analytics)

Murthy, Arun C./ Vavilapalli, Vinod Kumar/ Eadline, Doug/ Niemiec, Jos

Addison-Wesley Professional (released March 2014)


Binding: Paperback / Pages: 304 p.
Language: ENG
Product code: 9780321934505
DDC classification: 004.36

Full Description

"This book is a critically needed resource for the newly released Apache Hadoop 2.0, highlighting YARN as the significant breakthrough that broadens Hadoop beyond the MapReduce paradigm." -From the Foreword by Raymie Stata, CEO of AltiscaleThe Insider's Guide to Building Distributed, Big Data Applications with Apache Hadoop (TM) YARNApache Hadoop is helping drive the Big Data revolution. Now, its data processing has been completely overhauled: Apache Hadoop YARN provides resource management at data center scale and easier ways to create distributed applications that process petabytes of data. And now in Apache Hadoop (TM) YARN, two Hadoop technical leaders show you how to develop new applications and adapt existing code to fully leverage these revolutionary advances.YARN project founder Arun Murthy and project lead Vinod Kumar Vavilapalli demonstrate how YARN increases scalability and cluster utilization, enables new programming models and services, and opens new options beyond Java and batch processing. They walk you through the entire YARN project lifecycle, from installation through deployment.You'll find many examples drawn from the authors' cutting-edge experience-first as Hadoop's earliest developers and implementers at Yahoo! and now as Hortonworks developers moving the platform forward and helping customers succeed with it.Coverage includesYARN's goals, design, architecture, and components-how it expands the Apache Hadoop ecosystem Exploring YARN on a single node Administering YARN clusters and Capacity Scheduler Running existing MapReduce applications Developing a large-scale clustered YARN application Discovering new open source frameworks that run under YARN

Table of Contents

Foreword by Raymie Stata
Foreword by Paul Dix
Preface
Acknowledgments
About the Authors

Chapter 1: Apache Hadoop YARN: A Brief History and Rationale
  Introduction
  Apache Hadoop
  Phase 0: The Era of Ad Hoc Clusters
  Phase 1: Hadoop on Demand
  Phase 2: Dawn of the Shared Compute Clusters
  Phase 3: Emergence of YARN
  Conclusion

Chapter 2: Apache Hadoop YARN Install Quick Start
  Getting Started
  Steps to Configure a Single-Node YARN Cluster
  Run Sample MapReduce Examples
  Wrap-up

Chapter 3: Apache Hadoop YARN Core Concepts
  Beyond MapReduce
  Apache Hadoop MapReduce
  Apache Hadoop YARN
  YARN Components
  Wrap-up

Chapter 4: Functional Overview of YARN Components
  Architecture Overview
  ResourceManager
  YARN Scheduling Components
  Containers
  NodeManager
  ApplicationMaster
  YARN Resource Model
  Managing Application Dependencies
  Wrap-up

Chapter 5: Installing Apache Hadoop YARN
  The Basics
  System Preparation
  Script-based Installation of Hadoop 2
  Script-based Uninstall
  Configuration File Processing
  Configuration File Settings
  Start-up Scripts
  Installing Hadoop with Apache Ambari
  Wrap-up

Chapter 6: Apache Hadoop YARN Administration
  Script-based Configuration
  Monitoring Cluster Health: Nagios
  Real-time Monitoring: Ganglia
  Administration with Ambari
  JVM Analysis
  Basic YARN Administration
  Wrap-up

Chapter 7: Apache Hadoop YARN Architecture Guide
  Overview
  ResourceManager
  NodeManager
  ApplicationMaster
  YARN Containers
  Summary for Application-writers
  Wrap-up

Chapter 8: Capacity Scheduler in YARN
  Introduction to the Capacity Scheduler
  Capacity Scheduler Configuration
  Queues
  Hierarchical Queues
  Queue Access Control
  Capacity Management with Queues
  User Limits
  Reservations
  State of the Queues
  Limits on Applications
  User Interface
  Wrap-up

Chapter 9: MapReduce with Apache Hadoop YARN
  Running Hadoop YARN MapReduce Examples
  MapReduce Compatibility
  The MapReduce ApplicationMaster
  Calculating the Capacity of a Node
  Changes to the Shuffle Service
  Running Existing Hadoop Version 1 Applications
  Running MapReduce Version 1 Existing Code
  Advanced Features
  Wrap-up

Chapter 10: Apache Hadoop YARN Application Example
  The YARN Client
  The ApplicationMaster
  Wrap-up

Chapter 11: Using Apache Hadoop YARN Distributed-Shell
  Using the YARN Distributed-Shell
  Internals of the Distributed-Shell
  Wrap-up

Chapter 12: Apache Hadoop YARN Frameworks
  Distributed-Shell
  Hadoop MapReduce
  Apache Tez
  Apache Giraph
  Hoya: HBase on YARN
  Dryad on YARN
  Apache Spark
  Apache Storm
  REEF: Retainable Evaluator Execution Framework
  Hamster: Hadoop and MPI on the Same Cluster
  Wrap-up

Appendix A: Supplemental Content and Code Downloads
  Available Downloads

Appendix B: YARN Installation Scripts
  install-hadoop2.sh
  uninstall-hadoop2.sh
  hadoop-xml-conf.sh

Appendix C: YARN Administration Scripts
  configure-hadoop2.sh

Appendix D: Nagios Modules
  check_resource_manager.sh
  check_data_node.sh
  check_resource_manager_old_space_pct.sh

Appendix E: Resources and Additional Information

Appendix F: HDFS Quick Reference
  Quick Command Reference

Index