Book Details
Authors: Arun C. Murthy, Vinod Kumar Vavilapalli, Doug Eadline, Joseph Niemiec, Jeff Markham
Year: 2014
Language: English
1875
402
0
Apache Hadoop™ YARN
Foreword by Raymie Stata
Foreword by Paul Dix
Preface
Acknowledgments
About the Authors
1 Apache Hadoop YARN: A Brief History and Rationale
Introduction
Apache Hadoop
Phase 0: The Era of Ad Hoc Clusters
Phase 1: Hadoop on Demand
HDFS in the HOD World
Features and Advantages of HOD
Shortcomings of Hadoop on Demand
Phase 2: Dawn of the Shared Compute Clusters
Evolution of Shared Clusters
Issues with Shared MapReduce Clusters
Phase 3: Emergence of YARN
Conclusion
2 Apache Hadoop YARN Install Quick Start
Getting Started
Steps to Configure a Single-Node YARN Cluster
Step 1: Download Apache Hadoop
Step 2: Set JAVA_HOME
Step 3: Create Users and Groups
Step 4: Make Data and Log Directories
Step 5: Configure core-site.xml
Step 6: Configure hdfs-site.xml
Step 7: Configure mapred-site.xml
Step 8: Configure yarn-site.xml
Step 9: Modify Java Heap Sizes
Step 10: Format HDFS
Step 11: Start the HDFS Services
Managing Application Dependencies
LocalResources Definitions
LocalResource Timestamps
LocalResource Types
LocalResource Visibilities
Lifetime of LocalResources
Wrap-up
5 Installing Apache Hadoop YARN
The Basics
System Preparation
Step 1: Install EPEL and pdsh
Step 2: Generate and Distribute ssh Keys
Script-based Installation of Hadoop 2
JDK Options
Step 1: Download and Extract the Scripts
Step 2: Set the Script Variables
Step 3: Provide Node Names
Step 4: Run the Script
Step 5: Verify the Installation
Script-based Uninstall
Configuration File Processing
Configuration File Settings
core-site.xml
hdfs-site.xml
mapred-site.xml
yarn-site.xml
Start-up Scripts
Installing Hadoop with Apache Ambari
Performing an Ambari-based Hadoop Installation
Step 1: Check Requirements
Step 2: Install the Ambari Server
Step 3: Install and Start Ambari Agents
Step 4: Start the Ambari Server
Step 5: Install an HDP2.X Cluster
Wrap-up
6 Apache Hadoop YARN Administration
Script-based Configuration
Monitoring Cluster Health: Nagios
Monitoring Basic Hadoop Services
Monitoring the JVM
Real-time Monitoring: Ganglia
Administration with Ambari
JVM Analysis
Basic YARN Administration
YARN Administrative Tools
Adding and Decommissioning YARN Nodes
Capacity Scheduler Configuration
YARN WebProxy
Using the JobHistoryServer
Refreshing User-to-Groups Mappings
Refreshing Superuser Proxy Groups Mappings
Refreshing ACLs for Administration of ResourceManager
Reloading the Service-level Authorization Policy File
Managing YARN Jobs
Setting Container Memory
Setting Container Cores
Setting MapReduce Properties
User Log Management
Wrap-up
7 Apache Hadoop YARN Architecture Guide
Overview
ResourceManager
Overview of the ResourceManager Components
Client Interaction with the ResourceManager
Application Interaction with the ResourceManager
Interaction of Nodes with the ResourceManager
Core ResourceManager Components
Security-related Components in the ResourceManager
NodeManager
Overview of the NodeManager Components
NodeManager Components
NodeManager Security Components
Important NodeManager Functions
ApplicationMaster
Overview
Liveliness
Resource Requirements
Scheduling
Scheduling Protocol and Locality
Launching Containers
Completed Containers
ApplicationMaster Failures and Recovery
Coordination and Output Commit
Information for Clients
Security
Cleanup on ApplicationMaster Exit
YARN Containers
Container Environment
Communication with the ApplicationMaster
Summary for Application-writers
Wrap-up
8 Capacity Scheduler in YARN
Introduction to the Capacity Scheduler
Elasticity with Multitenancy
Security
Resource Awareness
Granular Scheduling
Locality
Scheduling Policies
Capacity Scheduler Configuration
Queues
Hierarchical Queues
Key Characteristics
Scheduling Among Queues
Defining Hierarchical Queues
Queue Access Control
Capacity Management with Queues
User Limits
Reservations
State of the Queues
Limits on Applications
User Interface
Wrap-up
9 MapReduce with Apache Hadoop YARN
Running Hadoop YARN MapReduce Examples
Listing Available Examples
Running the Pi Example
Using the Web GUI to Monitor Examples
Running the Terasort Test
Run the TestDFSIO Benchmark
MapReduce Compatibility
The MapReduce ApplicationMaster
Enabling Application Master Restarts
Enabling Recovery of Completed Tasks
The JobHistory Server
Calculating the Capacity of a Node
Changes to the Shuffle Service
Running Existing Hadoop Version 1 Applications
Binary Compatibility of org.apache.hadoop.mapred APIs
Source Compatibility of org.apache.hadoop.mapreduce APIs
Compatibility of Command-line Scripts
Compatibility Tradeoff Between MRv1 and Early MRv2 (0.23.x) Applications
Running MapReduce Version 1 Existing Code
Running Apache Pig Scripts on YARN
Running Apache Hive Queries on YARN
Running Apache Oozie Workflows on YARN
Advanced Features
Uber Jobs
Pluggable Shuffle and Sort
Wrap-up
10 Apache Hadoop YARN Application Example
The YARN Client
The ApplicationMaster
Wrap-up
11 Using Apache Hadoop YARN Distributed-Shell
Using the YARN Distributed-Shell
A Simple Example
Using More Containers
Distributed-Shell Examples with Shell Arguments
Internals of the Distributed-Shell
Application Constants
Client
ApplicationMaster
Final Containers
Wrap-up
12 Apache Hadoop YARN Frameworks
Distributed-Shell
Hadoop MapReduce
Apache Tez
Apache Giraph
Hoya: HBase on YARN
Dryad on YARN
Apache Spark
Apache Storm
REEF: Retainable Evaluator Execution Framework
Hamster: Hadoop and MPI on the Same Cluster
Wrap-up
A Supplemental Content and Code Downloads
Available Downloads
B YARN Installation Scripts
install-hadoop2.sh
uninstall-hadoop2.sh
hadoop-xml-conf.sh
C YARN Administration Scripts
configure-hadoop2.sh
D Nagios Modules
check_resource_manager.sh
check_data_node.sh
check_resource_manager_old_space_pct.sh
E Resources and Additional Information
F HDFS Quick Reference
Quick Command Reference
Starting HDFS and the HDFS Web GUI
Get an HDFS Status Report
Perform an FSCK on HDFS
General HDFS Commands
List Files in HDFS
Make a Directory in HDFS
Copy Files to HDFS
Copy Files from HDFS
Copy Files within HDFS
Delete a File within HDFS
Delete a Directory in HDFS
Decommissioning HDFS Nodes
Index