About DataTaps

DataTaps expand access to shared data by assigning a name to a path within a specified storage resource. Because DataTaps implement the Hadoop File System API, applications running within virtual clusters that can use the HDFS filesystem protocols can access paths within that resource using that name. This allows you to run jobs against your existing data systems without making time-consuming copies or transfers of your data. Tenant/Project Administrator users can create, edit, and remove DataTaps using the DataTaps screen, as described in The DataTaps Screen (Admin). Tenant Member users can access DataTaps by name.

Each DataTap requires the following properties to be configured, depending on the type of storage being connected to (MapR, HDFS, HDFS with Kerberos, NFS, or GCS):

The following fields depend on the DataTap type:

MapR

A MapR DataTap is configured as follows:

See the following examples for additional information:

HDFS

An HDFS DataTap is configured as follows:

NFS

Note: This option is not available for Kubernetes tenants.

An NFS DataTap is configured as follows:

GCS

A GCS DataTap is configured as follows:

Using a DataTap

The storage pointed to by a DataTap can be accessed via a URI that includes the name of the DataTap.

A DataTap points to the top of the “path” configured for the given DataTap. The URI has the following form:

dtap://datatap_name/

In this example, datatap_name is the name of the DataTap that you wish to use. You can access files and directories further in the hierarchy by appending path components to the URI:

dtap://datatap_name/some_subdirectory/another_subdirectory/some_file

For example, the URI dtap://mydatatap/home/mydirectory means that the data is located within the /home/mydirectory directory in the storage that the DataTap named mydatatap points to.
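
As a rough illustration of the URI form described above, the following Python sketch builds and splits dtap:// URIs using only the standard library. The helper names build_dtap_uri and split_dtap_uri are hypothetical and are not part of any DataTap API; the sketch only manipulates strings and does not contact any storage.

```python
from urllib.parse import urlsplit


def build_dtap_uri(datatap_name, *components):
    """Return dtap://<datatap_name>/<components...> (hypothetical helper)."""
    if not datatap_name or "/" in datatap_name:
        raise ValueError("DataTap name must be non-empty and contain no '/'")
    # Drop empty components and stray slashes so the pieces join cleanly.
    parts = [c.strip("/") for c in components if c.strip("/")]
    return "dtap://" + datatap_name + "/" + "/".join(parts)


def split_dtap_uri(uri):
    """Split a dtap:// URI into (datatap_name, path_within_the_datatap)."""
    parsed = urlsplit(uri)
    if parsed.scheme != "dtap" or not parsed.netloc:
        raise ValueError("not a dtap:// URI: " + uri)
    # An empty path means the top of the path configured for the DataTap.
    return parsed.netloc, parsed.path or "/"


uri = build_dtap_uri("mydatatap", "home", "mydirectory")
name, path = split_dtap_uri(uri)
```

Here uri is the same URI used in the example above, name is the DataTap name, and path is the location relative to the top of the path configured for that DataTap.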

DataTaps exist on a per-tenant basis. This means that a DataTap created for Tenant A cannot be used by Tenant B. You may, however, create a DataTap for Tenant B with the exact same properties as its counterpart for Tenant A, thus allowing both tenants to access the same storage resource. Further, multiple jobs within a tenant may use a given DataTap simultaneously. While such sharing can be useful, be aware that the same cautions and restrictions apply to these use cases as for other types of shared storage: multiple jobs modifying files at the same location may lead to file access errors and/or unexpected job results.
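
The per-tenant scoping described above can be pictured with a small toy model. This is only a sketch under stated assumptions: the DataTapRegistry class, its methods, and the storage location string are hypothetical illustrations, not how the product stores or resolves DataTaps.

```python
class DataTapRegistry:
    """Toy model: a DataTap is looked up by (tenant, name), never by name alone."""

    def __init__(self):
        self._taps = {}

    def create(self, tenant, name, storage_location):
        # The same name may be registered independently under several tenants.
        self._taps[(tenant, name)] = storage_location

    def resolve(self, tenant, name):
        try:
            return self._taps[(tenant, name)]
        except KeyError:
            raise LookupError(
                "tenant %r has no DataTap named %r" % (tenant, name)
            ) from None


registry = DataTapRegistry()
# Hypothetical storage location, used purely for illustration.
registry.create("TenantA", "mydatatap", "/export/shared")
# Tenant B only gains access once it has its own DataTap with the same properties.
registry.create("TenantB", "mydatatap", "/export/shared")
```

In this model, resolving mydatatap for a tenant that has not created it raises LookupError, while two tenants that each define a DataTap with identical properties end up pointing at the same storage location, with the shared-storage cautions that implies.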

Users who have a Tenant Administrator role may view and modify detailed DataTap information. Tenant Member users may only view general DataTap information and cannot create, edit, or remove a DataTap.

CAUTION:
Data conflicts may occur if more than one DataTap points to a location being used by multiple jobs at once.
CAUTION:
Editing or deleting a DataTap while it is being used by one or more running jobs may cause errors in the affected jobs.