This procedure configures Spark to use the mounted NFS directory instead of the
/tmp directory on the local filesystem. Note that spill to disk
should be directed to node-local storage on the filesystem
only when local disks are unavailable or space is limited on those
disks.
-
Install the
mapr-loopbacknfs and nfs-utils
packages if they are not already installed. For reference, see Installing the mapr-loopbacknfs Package and Setting Up MapR NFS.
-
Start the mapr-loopbacknfs service by following the steps at Managing the mapr-loopbacknfs Service.
-
To configure Spark Shuffle on NFS, complete these steps on all
nodes:
-
Create a local volume for Spark Shuffle:
sudo -u mapr maprcli volume create -name mapr.$(hostname -f).local.spark -path /var/mapr/local/$(hostname -f)/spark -replication 1 -localvolumehost $(hostname -f)
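As an illustration of what the command above creates, the sketch below expands the volume name and mount path for a hypothetical FQDN (node1.my.cluster.com is an assumed example, not part of this procedure):

```shell
# Illustration only: expand the per-node volume name and path that the
# maprcli volume create command above derives from $(hostname -f).
host="node1.my.cluster.com"            # assumed example FQDN
vol="mapr.${host}.local.spark"         # value passed to -name
path="/var/mapr/local/${host}/spark"   # value passed to -path
echo "$vol"
echo "$path"
```

Because the volume is created with -replication 1 and -localvolumehost, its data stays on the node that runs the shuffle, which is why each node needs its own volume.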
-
Point the NodeManager local directory to the Spark Shuffle volume
mounted through NFS by setting the following property in the
yarn-site.xml file on the NodeManager nodes:
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/mapr/my.cluster.com/var/mapr/local/<node hostname>/spark</value>
</property>
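Before restarting services, it can help to confirm that the per-node shuffle directory is actually reachable through the NFS mount. A minimal sketch, assuming the cluster is named my.cluster.com and the loopback NFS mount is at /mapr:

```shell
# Hypothetical pre-check: verify that the Spark Shuffle volume for this
# node is visible through the NFS mount before NodeManager uses it.
# Assumes cluster name my.cluster.com and NFS mounted at /mapr.
dir="/mapr/my.cluster.com/var/mapr/local/$(hostname -f)/spark"
if [ -d "$dir" ]; then
  echo "shuffle dir present: $dir"
else
  echo "shuffle dir missing: $dir"
fi
```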
-
(Optional) Configure how many times the NodeManager can attempt to
delete application-related directories from a volume when Spark is
configured to use the mounted NFS directory instead of the /tmp
directory on the local filesystem. Increasing this property's value
(the default is 2) can prevent application cache data from accumulating
in the volume. This functionality is available by default starting in
MEP 7.1.0. For previous MEP versions, request the patch. See Applying a Patch.
<property>
<name>yarn.nodemanager.max-retry-file-delete</name>
<value>2</value>
</property>
-
Restart the NodeManager service on the NodeManager nodes and the
ResourceManager service on the main node to pick up the
yarn-site.xml changes:
maprcli node services -name nodemanager -action restart -nodes <node 1> <node 2> <node 3>
maprcli node services -name resourcemanager -action restart
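Filled in with hypothetical node names (node1 through node3 are assumptions, not part of the procedure), the restart commands above would look like this:

```shell
# Illustration only: the restart commands with example node names
# substituted for the <node N> placeholders above.
nodes="node1 node2 node3"   # assumed NodeManager hostnames
echo "maprcli node services -name nodemanager -action restart -nodes $nodes"
echo "maprcli node services -name resourcemanager -action restart"
```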