This article describes the various types of storage and how datasets are made available to the containerized clusters.
HDFS is provisioned within the Docker containers that comprise a virtual Hadoop cluster when that cluster is created. The underlying storage for the HDFS data nodes in the containers resides on local disks in the physical servers hosting those containers. The deployment refers to the set of local disks as node storage. When using HDFS storage in a virtual cluster, the data does not persist beyond the life of the virtual cluster.
Ephemeral storage is built from the local storage in each host. It is used for the disk volumes that back the local storage for each virtual node. Installing a host reserves a subset of the local disks on that host for node storage. Linux® physical volumes are created on those disks and then used to create a Linux volume group. A Linux logical volume is then created from this Linux volume group. This Linux logical volume is assigned to the Linux Docker subsystem, which in turn assigns portions of the logical volume to the containers running on that host for use as local storage within those containers.
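The layering described above (disks, then physical volumes, then a volume group, then a single logical volume) can be sketched as a sequence of standard LVM commands. The following is a minimal illustration only, not the installer's actual procedure: the device paths, the volume group name `vg_node_storage`, and the logical volume name `lv_docker` are hypothetical, and the function only builds the commands without running them.

```python
from typing import List


def lvm_commands(disks: List[str],
                 vg_name: str = "vg_node_storage",   # hypothetical name
                 lv_name: str = "lv_docker") -> List[List[str]]:  # hypothetical name
    """Build the LVM commands that turn raw disks into one logical volume.

    Nothing is executed; the commands are only assembled and returned so
    that the order of the layers is visible.
    """
    if not disks:
        raise ValueError("at least one disk is required for node storage")

    commands: List[List[str]] = []
    # 1. Initialize each reserved disk as an LVM physical volume.
    for disk in disks:
        commands.append(["pvcreate", disk])
    # 2. Combine the physical volumes into a single volume group.
    commands.append(["vgcreate", vg_name, *disks])
    # 3. Create one logical volume from all free space in the volume group.
    commands.append(["lvcreate", "-l", "100%FREE", "-n", lv_name, vg_name])
    return commands


if __name__ == "__main__":
    # Hypothetical device paths, for illustration only.
    for cmd in lvm_commands(["/dev/sdb", "/dev/sdc"]):
        print(" ".join(cmd))
```

Returning the commands as lists rather than executing them keeps the sketch safe to run on any machine; how the resulting logical volume is handed to Docker depends on the deployment and is not shown.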
Deploying a persistent data fabric is supported on the local disks within the hosts. This local storage can serve either as HDFS storage or as persistent volumes for Kubernetes clusters. Persistent volumes for Kubernetes stateful clusters are seamlessly available from either the native persistent data fabric or Nimble Storage, using the Container Storage Interface (CSI) driver that is deployed during cluster creation.
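As an illustration of how a stateful workload might request one of these persistent volumes, the sketch below builds a standard Kubernetes PersistentVolumeClaim manifest. The claim name and the storage class name `example-csi-class` are hypothetical; the actual storage class created by the CSI driver in a given deployment may be named differently.

```python
import json


def build_pvc(name: str, size_gi: int, storage_class: str) -> dict:
    """Return a PersistentVolumeClaim manifest as a plain dictionary.

    The storage class is supplied by the caller because its real name
    depends on the CSI driver installed in the cluster (an assumption here).
    """
    if size_gi <= 0:
        raise ValueError("size_gi must be a positive number of gibibytes")
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": f"{size_gi}Gi"}},
        },
    }


if __name__ == "__main__":
    # Hypothetical claim name and storage class, for illustration only.
    manifest = build_pvc("example-data", 10, "example-csi-class")
    print(json.dumps(manifest, indent=2))
```

Because Kubernetes accepts JSON as well as YAML manifests, the printed output could be saved to a file and applied with `kubectl apply -f`.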
Disks located across the hosts provide an integrated scale-out, edge-ready persistent HPE Ezmeral Data Fabric. This data fabric effectively handles the diversity of data types, data access, and ecosystem tools needed to manage data as an enterprise resource regardless of the underlying infrastructure or location.
A deployment of HPE Ezmeral Container Platform must use an HPE Ezmeral Data Fabric for persistent storage. The standard way to do this is via HPE Ezmeral Data Fabric on Kubernetes.

HPE Ezmeral Container Platform allows you to create a Data Fabric as an embedded deployment, also called "embedded Data Fabric." If you configure tenant storage to connect to an embedded Data Fabric, then you will not be able to connect tenant storage to HPE Ezmeral Data Fabric on Kubernetes.
A deployment of HPE Ezmeral Container Platform can include HPE Ezmeral Data Fabric on Kubernetes, or an embedded Data Fabric, but not both.
HPE highly recommends configuring your deployment to use HPE Ezmeral Data Fabric on Kubernetes instead of an embedded HPE Ezmeral Data Fabric.

Getting the maximum flexibility from a container-based solution requires being able to independently scale compute and storage resources. It is also essential to be able to support the persistence of Big Data datasets beyond the lifespan of a Big Data compute cluster. The DataTap and IOBoost technologies allow virtual clusters to access remote data regardless of location or format.
A DataTap creates a logical data lake overlay that allows access to shared data in the enterprise storage devices. This allows users to run Big Data and ML/DL jobs using the existing enterprise storage without needing to make time-consuming copies or transfers of data to local disks. IOBoost augments DataTap's flexibility by adding an application-aware data caching and tiering server to ensure high-speed remote data delivery.
This persistent storage can also serve as filesystem mount storage (FS mounts). The filesystem mount feature automatically adds mounts to virtual nodes/containers, allowing them to directly access POSIX data as if it were stored in local directories. You can use this feature to provide common files across all of the virtual nodes/containers in a given tenant, such as a common configuration file that will be used by all of the virtual nodes/containers in the Marketing tenant. This eliminates the need to manually copy common files to individual virtual nodes/containers.
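Because an FS mount appears as an ordinary directory, an application can read a shared file with regular file I/O. The sketch below assumes a simple `key = value` configuration file; the file name `common.conf` and the mount directory passed in are hypothetical and are not defined by the platform.

```python
from pathlib import Path
from typing import Dict


def read_shared_config(mount_dir: str,
                       filename: str = "common.conf") -> Dict[str, str]:
    """Parse a key=value file found under an FS mount directory.

    The mount directory and file name are illustrative assumptions; the
    code only relies on the mount behaving like a local POSIX directory.
    """
    settings: Dict[str, str] = {}
    for line in (Path(mount_dir) / filename).read_text().splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that do not contain '='
            settings[key.strip()] = value.strip()
    return settings
```

Every virtual node/container in the tenant that has the same mount would read the same file, so no per-node copy is needed.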
All applications running in containers can natively access data across the HPE Persistent Storage fabric via both DataTaps and FS mounts. Persistent volumes are seamlessly available across clusters from this persistent data fabric.
For all host types, the recommended storage for the operating system is two 960 GB SSDs in a RAID 1 configuration. See Host Requirements for detailed storage requirements and recommendations.


Please see the following for additional information: