Provides an overview of how high availability for the ResourceManager works.
The ResourceManager service tracks a cluster's resources and schedules YARN applications. Configure high availability for the ResourceManager so that the service is not a single point of failure for the cluster. ResourceManager high availability depends on how the cluster configures three features: restart, recovery, and failover.
By default, the Warden attempts to restart a failed service three times. In the warden.conf file, you can configure how many times the Warden attempts to restart a failed service before it initiates failover. For more information, see warden.conf.
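The restart-then-failover behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not the Warden's implementation: the function name `restart_or_fail_over`, the `try_restart` callback, and the `max_retries` parameter are hypothetical names introduced for this example, and the default of three attempts mirrors the default stated above.

```python
from typing import Callable, Optional, Sequence


def restart_or_fail_over(
    nodes: Sequence[str],
    try_restart: Callable[[str], bool],
    max_retries: int = 3,  # hypothetical parameter; mirrors the default of three attempts
) -> Optional[str]:
    """Return the node that ends up running the service, or None if every node fails.

    Each candidate node gets up to `max_retries` restart attempts. Only when
    all attempts on a node fail does the loop move on (fail over) to the next node.
    """
    for node in nodes:
        for _attempt in range(max_retries):
            if try_restart(node):
                return node
        # All restart attempts on this node failed; fall through to the next candidate.
    return None


if __name__ == "__main__":
    attempts = []

    def flaky_restart(node: str) -> bool:
        # Hypothetical stand-in for a restart: only "node-b" ever comes back up.
        attempts.append(node)
        return node == "node-b"

    chosen = restart_or_fail_over(["node-a", "node-b"], flaky_restart)
    print(chosen, attempts)
```

In this model, a larger `max_retries` delays failover because the service stays on the original node for longer, while a smaller value moves it to another node sooner.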
When a ResourceManager restarts or fails over, the active ResourceManager can recover the state of the previously running ResourceManager. By default, ResourceManager recovery is enabled and uses the FileSystemRMStateStore implementation to store the ResourceManager state in the filesystem. You can disable recovery, and you can choose which state store implementation to use. For more information, see Recovery for the ResourceManager.
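As a rough illustration of the recovery settings, the following Python sketch parses a yarn-site.xml-style document and reports whether recovery is enabled and which state store class is configured. The property names `yarn.resourcemanager.recovery.enabled` and `yarn.resourcemanager.store.class` come from Apache Hadoop YARN and are not listed on this page, so treat them, the sample values, and the full class path as assumptions to verify against your cluster's documentation. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; property names follow Apache Hadoop YARN conventions
# and are an assumption here, not taken from this page.
SAMPLE_YARN_SITE = """<configuration>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore</value>
  </property>
</configuration>"""


def read_properties(xml_text: str) -> dict:
    """Collect <property> name/value pairs from a Hadoop-style configuration document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def recovery_settings(props: dict) -> tuple:
    """Return (recovery_enabled, state_store_class) from parsed properties."""
    enabled = props.get("yarn.resourcemanager.recovery.enabled", "").lower() == "true"
    store = props.get("yarn.resourcemanager.store.class")
    return enabled, store


if __name__ == "__main__":
    enabled, store = recovery_settings(read_properties(SAMPLE_YARN_SITE))
    print("recovery enabled:", enabled)
    print("state store:", store)
```

The sketch only reads values that are explicitly present in the document it is given. It does not model the cluster's built-in defaults, so a missing property is reported as disabled or `None` even though the page above states that recovery is enabled by default.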
When a ResourceManager fails, the cluster can fail over the ResourceManager process to another node. To configure failover, the cluster must have one or more nodes with the ResourceManager role.
Starting in Version 4.0.2, zero configuration failover provides automatic failover without requiring that you specify the ResourceManager nodes when you run configure.sh. It also does not require any further changes to yarn-site.xml.
Upgrade any client nodes to the 4.0.2 client to ensure proper communication with the ResourceManager service. Earlier versions of the MapR client do not support the zero configuration feature.
You can select one of the following failover implementations when you use the configure.sh utility to configure each node:
You can perform the following procedures to manage ResourceManager: