Managing Third-Party Libraries

Any third-party library that a MapReduce program requires must be accessible to every data node that processes the application.

A data node is a node in the cluster that includes the NodeManager role. You can provide the third-party libraries when you submit the program, or you can install the third-party libraries on each node that processes the application.

Include the third-party libraries with each program

Including the third-party libraries with each program is the preferred method.

Perform one of the following operations to include the third-party jars when you submit the program:
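As an illustration only, one widely used Hadoop mechanism for shipping jars with a job is the -libjars generic option of the hadoop jar command. It takes a comma-separated list of jars and is honored when the program's driver parses generic options (for example, through ToolRunner). The sketch below only assembles such a command line. The class name LibJarsCommand, the jar paths, and the main class com.example.WordCount are hypothetical placeholders, not names from this document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: assembles a "hadoop jar" command line that ships
// third-party jars with the job through the -libjars generic option.
final class LibJarsCommand {

    // Generic options such as -libjars go after the main class and before
    // the application's own arguments.
    static List<String> build(String jobJar, String mainClass,
                              List<String> libJars, List<String> appArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("hadoop");
        cmd.add("jar");
        cmd.add(jobJar);
        cmd.add(mainClass);
        if (!libJars.isEmpty()) {
            cmd.add("-libjars");
            // -libjars expects one comma-separated value with no spaces.
            cmd.add(String.join(",", libJars));
        }
        cmd.addAll(appArgs);
        return cmd;
    }

    public static void main(String[] args) {
        // All paths and class names here are placeholders for illustration.
        List<String> cmd = build(
                "wordcount.jar",
                "com.example.WordCount",
                Arrays.asList("/opt/libs/commons-csv.jar", "/opt/libs/guava.jar"),
                Arrays.asList("/input", "/output"));
        System.out.println(String.join(" ", cmd));
    }
}
```

Keeping the command as a list of arguments, rather than one string, means it can be passed to a process launcher without quoting problems if a path contains spaces.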

Install the third-party libraries on each node that runs the program

You can also install the third-party libraries on each data node. However, this method is not preferred because it can cause conflicts between library versions or library files.
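To make the version-conflict risk concrete, the following sketch groups jar file names by library name and reports any library that appears with more than one version. It assumes the common name-version.jar naming convention, so it is a heuristic rather than a reliable check. The class name JarVersionConflicts and the sample file names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: flags libraries present in more than one version.
final class JarVersionConflicts {

    // Assumes "<name>-<version>.jar", where the version starts with a digit.
    private static final Pattern JAR_NAME =
            Pattern.compile("^(.+?)-(\\d[\\w.\\-]*)\\.jar$");

    static Map<String, Set<String>> findConflicts(Collection<String> jarFileNames) {
        Map<String, Set<String>> versionsByLibrary = new TreeMap<>();
        for (String fileName : jarFileNames) {
            Matcher m = JAR_NAME.matcher(fileName);
            if (m.matches()) {
                versionsByLibrary
                        .computeIfAbsent(m.group(1), k -> new TreeSet<>())
                        .add(m.group(2));
            }
            // File names that do not follow the convention are skipped.
        }
        // Keep only libraries that occur with two or more distinct versions.
        versionsByLibrary.values().removeIf(versions -> versions.size() < 2);
        return versionsByLibrary;
    }

    public static void main(String[] args) {
        // Sample names standing in for the contents of a node's library directory.
        List<String> names = Arrays.asList(
                "guava-11.0.2.jar", "guava-27.0-jre.jar", "commons-csv-1.8.jar");
        System.out.println(findConflicts(names));
    }
}
```

Running a check of this kind against a node's library directory before copying new jars into it can show whether an install would leave two versions of the same library side by side.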

To install the third-party libraries on each data node, perform one of the following operations: