Virtualizing Hadoop : How to Install, Deploy, and Optimize Hadoop in a Virtualized Architecture



  • Binding: Paperback, 454 pages
  • Language: ENG
  • Product code: 9780133811025
  • DDC classification: 004

Full Description


Plan and Implement Hadoop Virtualization for Maximum Performance, Scalability, and Business Agility

Enterprises running Hadoop must absorb rapid changes in big data ecosystems, frameworks, products, and workloads. Virtualized approaches can offer important advantages in speed, flexibility, and elasticity. Now, a world-class team of enterprise virtualization and big data experts guides you through the choices, considerations, and tradeoffs surrounding Hadoop virtualization. The authors help you decide whether to virtualize Hadoop, deploy Hadoop in the cloud, or integrate conventional and virtualized approaches in a blended solution.

First, Virtualizing Hadoop reviews big data and Hadoop from the standpoint of the virtualization specialist. The authors demystify MapReduce, YARN, and HDFS and guide you through each stage of Hadoop data management. Next, they turn the tables, introducing big data experts to modern virtualization concepts and best practices. Finally, they bring Hadoop and virtualization together, guiding you through the decisions you'll face in planning, deploying, provisioning, and managing virtualized Hadoop. From security to multitenancy to day-to-day management, you'll find reliable answers for choosing your best Hadoop strategy and executing it.

Coverage includes the following:

  • Reviewing the frameworks, products, distributions, use cases, and roles associated with Hadoop
  • Understanding YARN resource management, HDFS storage, and I/O
  • Designing data ingestion, movement, and organization for modern enterprise data platforms
  • Defining SQL engine strategies to meet strict SLAs
  • Considering security, data isolation, and scheduling for multitenant environments
  • Deploying Hadoop as a service in the cloud
  • Reviewing the essential concepts, capabilities, and terminology of virtualization
  • Applying current best practices, guidelines, and key metrics for Hadoop virtualization
  • Managing multiple Hadoop frameworks and products as one unified system
  • Virtualizing master and worker nodes to maximize availability and performance
  • Installing and configuring Linux for a Hadoop environment

Contents

Foreword
Preface

Part I: Introduction to Hadoop

Chapter 1 Understanding the Big Data World
  • The Data Revolution
  • Traditional Data Systems
  • Semi-Structured and Unstructured Data
  • Causation and Correlation
  • Data Challenges
  • The Modern Data Architecture
  • Organizational Transformations
  • Industry Transformation
  • Summary

Chapter 2 Hadoop Fundamental Concepts
  • Types of Data in Hadoop
  • Use Cases
  • What Is Hadoop?
  • Hadoop Distributions
  • Hadoop Frameworks
  • NoSQL Databases
  • What Is NoSQL?
  • A Hadoop Cluster
  • Hadoop Software Processes
  • Hadoop Hardware Profiles
  • Roles in the Hadoop Environment
  • Summary

Chapter 3 YARN and HDFS
  • A Hadoop Cluster Is Distributed
  • Hadoop Directory Layouts
  • Hadoop Operating System Users
  • The Hadoop Distributed File System
  • YARN Logging
  • The NameNode
  • The DataNode
  • Block Placement
  • NameNode Configurations and Managing Metadata
  • Rack Awareness
  • Block Management
  • The Balancer
  • Maintaining Data Integrity in the Cluster
  • Quotas and Trash
  • YARN and the YARN Processing Model
  • Running Applications on YARN
  • Resource Schedulers
  • Benchmarking
  • TeraSort Benchmarking Suite
  • Summary

Chapter 4 The Modern Data Platform
  • Designing a Hadoop Cluster
  • Enterprise Data Movement
  • Summary

Chapter 5 Data Ingestion
  • Extraction, Loading, and Transformation (ELT)
  • Sqoop: Data Movement with SQL Sources
  • Flume: Streaming Data
  • Oozie: Scheduling and Workflow
  • Falcon: Data Lifecycle Management
  • Kafka: Real-time Data Streaming
  • Summary

Chapter 6 Hadoop SQL Engines
  • Where SQL Was Born
  • SQL in Hadoop
  • Hadoop SQL Engines
  • Selecting the SQL Tool For Hadoop
  • Now Getting Groovy with Hive and Pig
  • Hive
  • HCatalog
  • Pig
  • Summary

Chapter 7 Multitenancy in Hadoop
  • Securing the Access
  • Authentication
  • Auditing
  • Authorization
  • Data Protection
  • Isolating the Data
  • Isolating the Process
  • Summary

Part II: Introduction to Virtualization

Chapter 8 Virtualization Fundamentals
  • Why Virtualize Hadoop?
  • Introduction to Virtualization
  • Summary
  • References

Chapter 9 Best Practices for Virtualizing Hadoop
  • Running Virtualized Hadoop with Purpose and Discipline
  • The Discipline of Purpose Starts with a Clear Target
  • Virtualizing Different Tiers of Hadoop
  • Industry Best Practices
  • Summary

Part III: Virtualizing Hadoop

Chapter 10 Virtualizing Hadoop
  • How Are Hadoop Ecosystems Going to Be Managed?
  • Building an Enterprise Hadoop Platform That Is Agile and Flexible
  • Clarification of Terms
  • The Journey from Bare-Metal to Virtualization
  • Why Consider Virtualizing Hadoop?
  • Benefits of Virtualizing Hadoop
  • Virtualized Hadoop Can Run as Fast or Faster Than Native
  • Coordination and Cross-Purpose Specialization Is the Future
  • Barriers Can Be Organizational
  • Virtualization Is Not an All or Nothing Option
  • Rapid Provisioning and Improving Quality of Development and Test Environments
  • Improve High Availability with Virtualization
  • Use Virtualization to Leverage Hadoop Workloads
  • Hadoop in the Cloud
  • Big Data Extensions
  • The Path to Virtualization
  • The Software-Defined Data Center
  • Virtualizing the Network
  • vRealize Suite
  • Summary
  • References

Chapter 11 Virtualizing Hadoop Master Servers
  • Virtualizing Servers in a Hadoop Cluster
  • Virtualizing the Environment Around Hadoop
  • Virtualizing the Master Hadoop Servers
  • Virtualizing Without the SAN
  • Summary

Chapter 12 Virtualizing the Hadoop Worker Nodes
  • A Brief Introduction to the Worker Nodes in Hadoop
  • Deployment Models for Hadoop Clusters
  • The Combined Model
  • The Separated Model
  • Network Effects of the Data-Compute Separation
  • The Shared-Storage Approach to the Data-Compute Separated Model
  • Local Disks for the Application's Temporary Data
  • The Shared Storage Architecture Model Using Network-Attached Storage (NAS)
  • Deployment Model Summary
  • Best Practices for Virtualizing Hadoop Workers
  • Disk I/O
  • The Hadoop Virtualization Extensions (HVE)
  • Summary
  • References
  • Resources

Chapter 13 Deploying Hadoop as a Service in the Private Cloud
  • The Cloud Context
  • Stakeholders for Hadoop
  • Overview of the Solution Architecture
  • Summary
  • References

Chapter 14 Understanding the Installation of Hadoop
  • Map the Right Solutions to the Right Use Case
  • Thoughts About Installing Hadoop
  • Configuring Repositories
  • Installing HDP 2.2
  • Environment Preparation
  • Setting Up the Hadoop Configuration
  • Starting HDFS and YARN
  • Start YARN
  • Verifying MapReduce Functionality
  • Installing and Configuring Hive
  • Installing and Configuring MySQL Database
  • Installing and Configuring Hive and HCatalog
  • Summary

Chapter 15 Configuring Linux for Hadoop
  • Supported Linux Platforms
  • Different Deployment Models
  • Linux Golden Templates
  • Building a Linux Enterprise Hadoop Platform
  • Selecting the Linux Distribution
  • Optimal Linux Kernel Parameters and System Settings
  • epoll
  • Disable Swap Space
  • Disable Security During Install
  • IO Scheduler Tuning
  • Check Transparent Huge Pages Configuration
  • Limits.conf
  • Partition Alignment for RDMs
  • File System Considerations
  • Lazy Count Parameter for XFS
  • Mount Options
  • I/O Scheduler
  • Disk Read and Write Options
  • Storage Benchmarking
  • Java Version
  • Set Up NTP
  • Enable Jumbo Frames
  • Additional Network Considerations
  • Summary

Appendix A Hadoop Cluster Creation: A Prerequisite Checklist

Appendix B Big Data/Hadoop on VMware vSphere Reference Materials
  • Deployment Guides
  • Reference Architectures
  • Customer Case Studies
  • Performance
  • vSphere Big Data Extensions (BDE)
  • Other vSphere Features and Big Data
