Tenant Administrators can create DataTaps. Clicking the Create
button in the DataTaps screen opens the Create New DataTap screen.
DataTaps are created on a per-tenant basis. This means that a DataTap created in Tenant A
is not available to Tenant B. You may, however, choose to create DataTaps in different
tenants that point to the same storage path; in this situation, jobs in different
tenants can access the same storage simultaneously. Also, multiple jobs within a tenant
may use a given DataTap simultaneously. While such sharing can be useful, be aware that
the same cautions and restrictions apply to these use cases as for other types of shared
storage: multiple jobs modifying files at the same location may lead to file access
errors and/or unexpected job results.
CAUTION:
Creating multiple DataTaps to the same directory can lead to
conflicts and potential data loss.
Note:
This article contains generic instructions for creating a
DataTap. Please see the following for more specific
examples:
To create a DataTap:
- Please see About DataTaps for important
limitations on where you can create DataTaps.
- Enter a name for the DataTap in the Name field. This name may contain letters
(A-Z or a-z), digits (0-9), and hyphens (-), but may not contain spaces.
- Enter a brief description for the DataTap in the Description field.
- You can make a DataTap read only by checking the Read Only check box. Clearing
this check box allows read/write access.
- Select the file system type using the Select Type pull-down menu. The
available options are:
- Review the entries you made in Steps 1-6 to make sure they are accurate.
When you have finished modifying the parameters for the DataTap, click Submit to
create the new DataTap.
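The naming rule for the Name field (letters, digits, and hyphens only, no spaces) can be checked before you fill in the form. The following is a minimal Python sketch of that rule; the function name `is_valid_datatap_name` is hypothetical and not part of the product, and rejecting an empty name is an assumption that the rule above does not state.

```python
import re

# Characters allowed by the naming rule above: A-Z, a-z, 0-9, and hyphen.
# The "+" also rejects an empty name, which is an assumption.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9-]+")


def is_valid_datatap_name(name: str) -> bool:
    """Return True if the whole name uses only letters, digits, and hyphens."""
    return _NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("sales-data-01", "sales data", "sales_data"):
        print(candidate, is_valid_datatap_name(candidate))
```

Using `fullmatch` rather than `match` ensures that a trailing space or other disallowed character anywhere in the name causes the check to fail.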
MAPR Parameters
If you selected MAPR in Step 5, above, then enter the following
parameters:
HDFS Parameters
If you selected HDFS in Step 5, above, then enter the following
parameters:
- Host: DNS name or IP address of the server providing access to the
storage resource. For example, this could be the host running the namenode
service of an HDFS cluster.
- Standby NameNode: DNS name or IP address of a standby namenode host that
an HDFS DataTap will try to reach if it cannot contact the primary host. This
field is optional; when used, it provides high-availability access to the
specified HDFS DataTap.
- Port: For HDFS DataTaps, this is the port for the namenode server on the
host used to access the HDFS file system.
- Path: Complete path to the directory containing the data within the
specified HDFS file system. You can leave this field blank if you intend the
DataTap to point at the root of the specified file system.
- Kerberos parameters: If the HDFS DataTap has Kerberos enabled, then you
will need to specify additional parameters. HPE Ezmeral Container Platform
supports two modes of user access/authentication:
- Proxy mode permits a “proxy user” to be configured to have access to the
remote HDFS cluster. Individual users are granted access to the remote
HDFS cluster by the proxy user configuration. Mixing and matching
distributions is permitted between the compute Hadoop cluster and the
remote HDFS. See Sample HDFS Proxy DataTap for an example.
- Passthrough mode passes the credentials of the current user to the
remote HDFS cluster for authentication. See Sample HDFS Passthrough
DataTap for an example.
- HDFS file systems configured with TDE encryption as well as cross-realm Kerberos
authentication are supported. See HDFS DataTap TDE Configuration and HDFS DataTap
Cross-Realm Kerberos Authentication for additional configuration
instructions.
Continue from Step 6, above, after entering the HDFS parameters.
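To illustrate how the Host, Standby NameNode, Port, and Path fields relate to one another, the following Python sketch assembles them into standard `hdfs://host:port/path` URIs, listing the primary host first and the standby second. The class `HdfsDataTapFields` and its method are hypothetical, the host names and port 8020 are example values only, and the sketch assumes that the standby namenode listens on the same port as the primary. It does not reproduce how the platform itself contacts the hosts.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HdfsDataTapFields:
    """Hypothetical container mirroring the HDFS fields described above."""
    host: str
    port: int
    path: str = ""                          # blank means the file system root
    standby_namenode: Optional[str] = None  # optional field

    def _uri(self, host: str) -> str:
        # Strip surrounding slashes so a blank path maps to "/" and any
        # other path gets exactly one leading slash.
        cleaned = self.path.strip("/")
        return f"hdfs://{host}:{self.port}/{cleaned}"

    def candidate_uris(self) -> List[str]:
        """Return the primary URI first, then the standby URI if one is set."""
        hosts = [self.host]
        if self.standby_namenode:
            hosts.append(self.standby_namenode)
        return [self._uri(h) for h in hosts]


if __name__ == "__main__":
    tap = HdfsDataTapFields("nn1.example.com", 8020, "/data/logs",
                            standby_namenode="nn2.example.com")
    for uri in tap.candidate_uris():
        print(uri)
```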
NFS Parameters
Note: This option is not available for Kubernetes tenants.
If you selected NFS in Step 5, above, then enter the following
parameters:
- Host: DNS name or IP address of the server providing access to the
storage resource.
- Share: The exported share on the selected host.
- Path: Complete path to the directory containing the data within the
specified NFS share. You can leave this field blank if you intend the DataTap to
point at the root of the specified share.
Also, be sure to configure the storage device to allow access from each host and each
Controller and Worker that will use this DataTap.
Continue from Step 6, above, after entering the NFS parameters.
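As a rough illustration of how the Host, Share, and Path fields combine, the sketch below joins them into the conventional `host:/share/path` form used to name an NFS location. The function name `nfs_source` and the example values are hypothetical; the platform's internal representation is not described here and may differ.

```python
def nfs_source(host: str, share: str, path: str = "") -> str:
    """Join host, share, and optional path into host:/share/path form."""
    # Drop surrounding slashes from each part and skip parts that are blank,
    # so a blank path points at the root of the share.
    parts = [p.strip("/") for p in (share, path) if p.strip("/")]
    return f"{host}:/" + "/".join(parts)


if __name__ == "__main__":
    print(nfs_source("nas01.example.com", "/exports/data", "projects/q1"))
    print(nfs_source("nas01.example.com", "/exports/data"))
```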
GCS Parameters
A GCS DataTap is configured as follows:
- Bucket Name: Specify the bucket name for GCS.
- Credential File Source: This will be one of the following:
- When Upload Ticket File is selected, the Browse button is
enabled so that you can select the credential file. The credential file is
a JSON file that contains the service account key.
- When Use the Existing One is selected, enter the name of the
previously uploaded credential file. The credential file is a JSON file
that contains the service account key.
- Proxy: Optional. Specify an HTTP proxy to use when accessing GCS.
- Mount Path: Enter a path within the bucket that will serve as the starting
point for the DataTap. If the path is not specified, the starting point will
default to the bucket.
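Because the credential file must be a JSON file containing a service account key, it can be useful to sanity-check the file before uploading it. The following Python sketch is one way to do that. The function name `check_service_account_key` is hypothetical, and the list of expected fields (`type`, `project_id`, `private_key`, `client_email`) is an assumption based on the usual layout of Google Cloud service account key files, not something this article specifies. The check only inspects the file's structure; it does not verify that the key is valid or authorized.

```python
import json
from typing import List

# Fields assumed to be present in a service account key file.
REQUIRED_KEYS = {"type", "project_id", "private_key", "client_email"}


def check_service_account_key(text: str) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]

    # Report each assumed field that is absent, in a stable order.
    problems = [f"missing field: {key}"
                for key in sorted(REQUIRED_KEYS - set(data))]
    # If a type is given, it should identify a service account key.
    if "type" in data and data["type"] != "service_account":
        problems.append('field "type" is not "service_account"')
    return problems


if __name__ == "__main__":
    # Placeholder content for illustration; not a real key.
    sample = json.dumps({
        "type": "service_account",
        "project_id": "example-project",
        "private_key": "PLACEHOLDER",
        "client_email": "example@example-project.iam.gserviceaccount.com",
    })
    print(check_service_account_key(sample))
```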