Any third-party library that is required by a MapReduce program must be accessible to the data node that processes the application.
A data node is a node in the cluster that includes the NodeManager role. You can provide the third-party libraries when you submit the program, or you can install the third-party libraries on each node that processes the application.
Including the third-party libraries with each program is the preferred method.
Perform one of the following operations to include the third-party jars when you submit the program:
Package the third-party libraries with the MapReduce jar file. The benefit of this method is that neither the node from which you submit the program nor the node that runs the program is required to have the library files installed.
Use the -libjars parameter to specify the third-party libraries on the command line. With this option, the library files are submitted to the data node along with the program. The benefit of this method is that the node that runs the program does not need to have the library files installed. However, the node that submits the program must have the library files installed.
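The two submission options above can be sketched as a dry-run shell script. It prints the commands rather than running them, so it works without a cluster. The names myjob.jar, com.example.MyDriver, STAGING, LIB_DIR, /input, and /output are hypothetical placeholders, not values from this documentation. The sketch assumes that jars placed in a lib/ directory inside the job jar are added to the task classpath, and that the driver parses generic options (for example through ToolRunner), which -libjars requires.

```shell
#!/bin/sh
# Dry-run sketch: prints commands instead of running them, so no cluster is needed.
# myjob.jar, com.example.MyDriver, STAGING, and LIB_DIR are hypothetical placeholders.
STAGING=${STAGING:-./staging}   # compiled classes plus a lib/ subdirectory of jars
LIB_DIR=${LIB_DIR:-./lib}       # third-party jars on the submitting node

# Option 1: bundle everything under STAGING (including STAGING/lib/*.jar) into the job jar.
print_package_command() {
    echo "jar cf myjob.jar -C $STAGING ."
}

# Option 2: join every jar in a directory into the comma-separated list -libjars expects.
build_libjars() {
    list=""
    for jar in "$1"/*.jar; do
        [ -e "$jar" ] || continue          # the glob matched nothing
        list="${list:+$list,}$jar"
    done
    printf '%s\n' "$list"
}

print_package_command
# Generic options such as -libjars go before the program's own arguments.
echo "hadoop jar myjob.jar com.example.MyDriver -libjars $(build_libjars "$LIB_DIR") /input /output"
```

Review the printed commands and run them by hand once the placeholder names match your job.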
You can also install the third-party libraries on each data node. However, this method is less desirable because library versions or library files can conflict.
To install the third-party libraries on each data node, perform one of the following operations:
Install the third-party libraries in the following directory on each NodeManager node: /opt/mapr/hadoop/hadoop-2.x/share/hadoop/common
Configure the location of the third-party libraries in the env_override.sh file. The env_override.sh file is located in the following directory: /opt/mapr/conf. For more information about the file, see About env_override.sh.
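As a rough illustration of the per-node installation options, the following dry-run script prints the copy commands for the common directory and a candidate line for env_override.sh. The names NODES, LIB_DIR, and CUSTOM_DIR are hypothetical placeholders. Using HADOOP_CLASSPATH in env_override.sh is an assumption, because this page does not name the variable to set. Confirm it against About env_override.sh before applying it.

```shell
#!/bin/sh
# Dry-run sketch: prints commands and a config line; nothing runs on the cluster.
# NODES, LIB_DIR, and CUSTOM_DIR are hypothetical placeholders.
COMMON_DIR=/opt/mapr/hadoop/hadoop-2.x/share/hadoop/common   # as documented; 2.x left unexpanded
NODES=${NODES:-"node1 node2"}                  # NodeManager hosts
LIB_DIR=${LIB_DIR:-./lib}                      # local directory holding the jars
CUSTOM_DIR=${CUSTOM_DIR:-/opt/thirdparty/lib}  # custom jar directory on each node

# First option: copy every jar into the common directory on each NodeManager node.
print_copy_commands() {
    for node in $NODES; do
        echo "scp $LIB_DIR/*.jar $node:$COMMON_DIR/"
    done
}

# Second option (ASSUMPTION): append a custom directory to HADOOP_CLASSPATH
# with a line added to /opt/mapr/conf/env_override.sh.
print_env_override_line() {
    printf 'export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}%s/*"\n' "$CUSTOM_DIR"
}

print_copy_commands
print_env_override_line
```

Printing the commands first makes it easier to check that every node receives the same library versions, which is the main risk of installing libraries per node.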