Lists the parameters for MapReduce configuration.
MapReduce is a type of application that can run on the Hadoop 2.x framework. MapReduce
configuration options are stored in the
/opt/mapr/hadoop/hadoop-2.x.x/etc/hadoop/mapred-site.xml file and
are editable by the root user. This file contains configuration
information that overrides the default values for MapReduce parameters. Overrides of the
default values for core configuration properties are stored in the HPE Ezmeral Data Fabric Parameters file.
To override a default value for a property, specify the new
value within the <configuration> tags,
using the following format:
<property>
  <name> </name>
  <value> </value>
  <description> </description>
</property>
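For example, a hypothetical override of the mapreduce.map.memory.mb property might look like the following; the value 2048 and the description text are illustrative only, so substitute values that suit your workload:
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
  <description>Resource limit, in MB, for map task containers (illustrative value)</description>
</property>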
Configurations for MapReduce Applications
The configuration comprises the following parameters (a sample override snippet appears after the list):
- mapreduce.framework.name
- Value: yarn
- Description: Execution framework set to Hadoop YARN.
- mapreduce.input.fileinputformat.split.maxblocknum
- Value: 0
- Description: Number of blocks that can be added to one split. A value of
0 means that a single split is generated per node.
- mapreduce.map.memory.mb
- Value: 1024
- Description: Larger resource limit for maps.
- mapreduce.map.java.opts
- Value: -Xmx900m --add-opens java.base/java.lang=ALL-UNNAMED -XX:+UseParallelGC
- Description: Larger heap size for child JVMs of maps.
- mapreduce.reduce.memory.mb
- Value: 3072
- Description: Larger resource limit for reduces.
- mapreduce.reduce.java.opts
- Value: -Xmx2560m --add-opens java.base/java.lang=ALL-UNNAMED -XX:+UseParallelGC
- Description: Larger heap size for child JVMs of reduces.
- mapreduce.task.io.sort.mb
- Value: 512
- Description: Higher memory limit while sorting data for efficiency.
- mapreduce.task.io.sort.factor
- Value: 100
- Description: More streams merged at once while sorting files.
- mapreduce.reduce.shuffle.parallelcopies
- Value: 50
- Description: Higher number of parallel copies run by reduces to fetch
outputs from a very large number of maps.
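A mapred-site.xml fragment that applies the map- and reduce-side settings from the list above might look like the following sketch; the values simply restate the list and should be adjusted to the resources available in your cluster:
<configuration>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx900m --add-opens java.base/java.lang=ALL-UNNAMED -XX:+UseParallelGC</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>3072</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx2560m --add-opens java.base/java.lang=ALL-UNNAMED -XX:+UseParallelGC</value>
  </property>
</configuration>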
Configurations for MapReduce JobHistory Server
The configuration comprises the following parameters:
- mapr.localspill.expiration.date
- Value: days
- Description: Determines the expiration period, in days, for spill files.
The default value is 30 days.
- mapreduce.jobhistory.address
- Value: MapReduce JobHistory Server host:port
- Description: Default port is 10020.
- mapreduce.jobhistory.webapp.address
- Value: MapReduce JobHistory Server Web UI host:port
- Description: Default port is 19888.
- mapreduce.jobhistory.intermediate-done-dir
- Value:
/mr-history/tmp
- Description: Directory where history files are written by MapReduce
applications.
- mapreduce.jobhistory.intermediate-done-scan-timeout
- Value: milliseconds
- Description: Timeout, in milliseconds, for rescanning the
done_intermediate user directory to reduce the load on the JobHistory
Server. Information about a job is received with a delay equal to the
timeout. Adjust the setting based on the cluster load: start with 5000 ms and
increase the timeout as needed (see the snippet after this list).
- mapreduce.jobhistory.done-dir
- Value:
/mr-history/done
- Description: Directory where history files are managed by the MapReduce
JobHistory Server.
- mapreduce.jobhistory.webapp.https.address
- Value: Secure MapReduce JobHistory Server Web UI host:port
(HTTPS)
- Description: Default port is 19890.
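As a starting point for the scan-timeout tuning described above, the override might look like the following sketch; 5000 ms is only the suggested initial value, not a required setting:
<property>
  <name>mapreduce.jobhistory.intermediate-done-scan-timeout</name>
  <value>5000</value>
</property>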
Sample Hadoop 2.x mapred-site.xml File
The following mapred-site.xml file defines values for two job history
parameters.
<configuration>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>__HS_IP__:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>__HS_IP__:19888</value>
  </property>
</configuration>
Configuration for Apache Shuffle
You can disable Direct Shuffle and enable Apache Shuffle for MapReduce applications
through the following settings (a combined snippet follows the list):
- mapreduce.job.shuffle.provider.services
- Value: mapreduce_shuffle
- mapreduce.job.reduce.shuffle.consumer.plugin.class
- Value:
org.apache.hadoop.mapreduce.task.reduce.Shuffle
- mapreduce.job.map.output.collector.class
- Value:
org.apache.hadoop.mapred.MapTask$MapOutputBuffer
- mapred.ifile.outputstream
- Value: org.apache.hadoop.mapred.IFileOutputStream
- mapred.ifile.inputstream
- Value: org.apache.hadoop.mapred.IFileInputStream
- mapred.local.mapoutput
- Value: true
- mapreduce.task.local.output.class
- Value: org.apache.hadoop.mapred.YarnOutputFiles
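Taken together, a mapred-site.xml fragment that switches from Direct Shuffle to Apache Shuffle could look like the following sketch; the property names and values simply restate the list above:
<configuration>
  <property>
    <name>mapreduce.job.shuffle.provider.services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>mapreduce.job.reduce.shuffle.consumer.plugin.class</name>
    <value>org.apache.hadoop.mapreduce.task.reduce.Shuffle</value>
  </property>
  <property>
    <name>mapreduce.job.map.output.collector.class</name>
    <value>org.apache.hadoop.mapred.MapTask$MapOutputBuffer</value>
  </property>
  <property>
    <name>mapred.ifile.outputstream</name>
    <value>org.apache.hadoop.mapred.IFileOutputStream</value>
  </property>
  <property>
    <name>mapred.ifile.inputstream</name>
    <value>org.apache.hadoop.mapred.IFileInputStream</value>
  </property>
  <property>
    <name>mapred.local.mapoutput</name>
    <value>true</value>
  </property>
  <property>
    <name>mapreduce.task.local.output.class</name>
    <value>org.apache.hadoop.mapred.YarnOutputFiles</value>
  </property>
</configuration>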