ResourceManager High Availability

Provides an overview of how high availability for the ResourceManager works.

The ResourceManager service tracks a cluster's resources and schedules YARN applications. Configure high availability for the ResourceManager so that the service is not a single point of failure for the cluster. ResourceManager high availability depends on how the cluster configures three features: restart, recovery, and failover.

Restart

By default, Warden attempts to restart a failed service three times. In the warden.conf file, you can configure how many times Warden attempts to restart a failed service before it initiates failover. For more information, see warden.conf.
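The restart-before-failover behavior can be pictured with a minimal Python sketch. It is an illustration only, not Warden's implementation: `supervise`, `start_service`, and `max_retries` are hypothetical names, and the default of three attempts is taken from the text above.

```python
from typing import Callable


def supervise(start_service: Callable[[], bool], max_retries: int = 3) -> str:
    """Try to restart a failed service, then give up and request failover.

    start_service is a hypothetical callback that returns True when the
    service comes back up. max_retries mirrors the default of three
    restart attempts described above.
    """
    for attempt in range(1, max_retries + 1):
        if start_service():
            return f"restarted on attempt {attempt}"
    # Every restart attempt failed, so hand control to failover.
    return "failover"


# A service that recovers on its second restart attempt.
outcomes = iter([False, True])
print(supervise(lambda: next(outcomes)))  # restarted on attempt 2

# A service that never comes back triggers failover.
print(supervise(lambda: False))  # failover
```

Raising `max_retries` in this sketch delays failover in favor of more local restart attempts, which is the trade-off the retry setting controls.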

Recovery

When a ResourceManager restarts or fails over, the active ResourceManager can recover the state of the previously running ResourceManager. By default, ResourceManager recovery is enabled and uses the FileSystemRMStateStore implementation to store the ResourceManager state in the file system. You can disable recovery, or keep it enabled and choose a different state store implementation. For more information, see Recovery for the ResourceManager.
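The idea behind file-system-backed recovery can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the FileSystemRMStateStore code: `FileStateStore`, `ToyResourceManager`, the JSON file format, and the application ID are all hypothetical and exist only to show a new instance picking up state that a previous instance saved.

```python
import json
import tempfile
from pathlib import Path


class FileStateStore:
    """Hypothetical stand-in for a filesystem-backed state store."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def save(self, state: dict) -> None:
        # Persist the state so that it survives a process restart.
        self.path.write_text(json.dumps(state))

    def load(self) -> dict:
        # No file yet means there is nothing to recover.
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text())


class ToyResourceManager:
    """Toy manager that records submitted applications in the store."""

    def __init__(self, store: FileStateStore, recovery_enabled: bool = True) -> None:
        self.store = store
        # With recovery enabled, start from whatever the previous instance saved.
        self.apps = store.load() if recovery_enabled else {}

    def submit(self, app_id: str, status: str) -> None:
        self.apps[app_id] = status
        self.store.save(self.apps)


with tempfile.TemporaryDirectory() as tmp:
    store = FileStateStore(Path(tmp) / "rm_state.json")
    first = ToyResourceManager(store)
    first.submit("app_0001", "RUNNING")

    # Simulate a restart or failover: a new instance reads the same store.
    second = ToyResourceManager(store)
    print(second.apps)  # {'app_0001': 'RUNNING'}

    # With recovery disabled, the new instance starts empty.
    third = ToyResourceManager(store, recovery_enabled=False)
    print(third.apps)  # {}
```

Because the state lives outside the process, the same mechanism serves both a restart on the same node and a failover to a different node, provided both can reach the store.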

Failover

When a ResourceManager fails, the cluster can fail over the ResourceManager process to another node. To configure failover, the cluster must have two or more nodes with the ResourceManager role, so that another node is available to take over.
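Because failover moves the process to another node, it needs a second node that holds the ResourceManager role. The following Python sketch illustrates that constraint only; `pick_failover_target` and the node names are hypothetical, and real clusters decide the target through their own failover implementation.

```python
from typing import Dict, Optional


def pick_failover_target(rm_nodes: Dict[str, bool], failed_node: str) -> Optional[str]:
    """Return another healthy node with the ResourceManager role, if any.

    rm_nodes maps a node name to whether that node is currently healthy.
    """
    for node, healthy in rm_nodes.items():
        if node != failed_node and healthy:
            return node
    # No other healthy ResourceManager node exists, so failover is impossible.
    return None


# Two nodes hold the role, so the healthy one takes over.
print(pick_failover_target({"node1": False, "node2": True}, "node1"))  # node2

# Only one node holds the role, so there is nowhere to fail over to.
print(pick_failover_target({"node1": False}, "node1"))  # None
```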

Note:

Starting in version 4.0.2, zero configuration failover provides automatic failover without requiring you to specify the ResourceManager nodes when you run configure.sh. It also requires no further changes to yarn-site.xml.

Upgrade any client nodes to the 4.0.2 client to ensure proper communication with the ResourceManager service. Earlier versions of the MapR client do not support the zero configuration feature.

You can select one of the following failover implementations when you use the configure.sh utility to configure each node:

You can perform the following procedures to manage ResourceManager: