Explains the shuffle phase of a MapReduce application.
During the shuffle phase of a MapReduce application, HPE Ezmeral Data Fabric writes intermediate data to a filesystem volume whose topology is restricted to the local node, instead of writing it to local disks controlled by the operating system. This improves performance and reduces demand on local disk space, while making the shuffle output available cluster-wide.
Direct Shuffle is the default shuffle mechanism for HPE Ezmeral Data Fabric. However, you can modify the yarn-site.xml and mapred-site.xml configuration files to enable Apache Shuffle for MapReduce applications. See Apache Shuffle on YARN.
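As a rough illustration, enabling Apache Shuffle generally means pointing these properties back at the stock Apache Hadoop shuffle classes. The snippet below is a sketch only, using the standard Apache class names (ShuffleHandler, Shuffle, and MapTask$MapOutputBuffer); follow Apache Shuffle on YARN for the supported procedure, which may also require changes to other properties such as mapred.local.mapoutput.
<!-- yarn-site.xml: illustrative Apache Shuffle settings -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<!-- mapred-site.xml: illustrative Apache Shuffle settings -->
<property>
  <name>mapreduce.job.reduce.shuffle.consumer.plugin.class</name>
  <value>org.apache.hadoop.mapreduce.task.reduce.Shuffle</value>
</property>
<property>
  <name>mapreduce.job.map.output.collector.class</name>
  <value>org.apache.hadoop.mapred.MapTask$MapOutputBuffer</value>
</property>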
The LocalVolumeAuxiliaryService runs in the NodeManager process. It manages the local volume on each node and cleans up shuffle data after a MapReduce application finishes executing.

When a MapReduce application starts, it calls initializeApplication() on the LocalVolumeAuxiliaryService. When the application finishes, it calls stopApplication() on the LocalVolumeAuxiliaryService to clean up data on the local volume.

The default YARN parameters for Direct Shuffle are as follows:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,mapr_direct_shuffle</value>
  <description>shuffle service that needs to be set for Map Reduce to run</description>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapr_direct_shuffle.class</name>
  <value>org.apache.hadoop.mapred.LocalVolumeAuxService</value>
</property>
The default mapred parameters for Direct Shuffle are as follows:
<property>
  <name>mapreduce.job.shuffle.provider.services</name>
  <value>mapr_direct_shuffle</value>
</property>
<property>
  <name>mapreduce.job.reduce.shuffle.consumer.plugin.class</name>
  <value>org.apache.hadoop.mapreduce.task.reduce.DirectShuffle</value>
</property>
<property>
  <name>mapreduce.job.map.output.collector.class</name>
  <value>org.apache.hadoop.mapred.MapRFsOutputBuffer</value>
</property>
<property>
  <name>mapred.ifile.outputstream</name>
  <value>org.apache.hadoop.mapred.MapRIFileOutputStream</value>
</property>
<property>
  <name>mapred.ifile.inputstream</name>
  <value>org.apache.hadoop.mapred.MapRIFileInputStream</value>
</property>
<property>
  <name>mapred.local.mapoutput</name>
  <value>false</value>
</property>
<property>
  <name>mapreduce.task.local.output.class</name>
  <value>org.apache.hadoop.mapred.MapRFsOutputFile</value>
</property>