HPE Ezmeral Data Fabric Object Store
HPE Ezmeral Data Fabric Object Store is the industry’s first data fabric to combine S3-native object store, files, streams, and databases in one scalable data platform that spans edge to cloud.
Technical white paper
Object storage, with its virtually infinite capacity and low cost, has a long history of being deployed for backup, archiving, disaster recovery, and regulatory compliance. But the demands of today’s data-centric organizations have moved the technology to the center of digital transformation.
Object storage is the most effective underlying technology for applying data analytics, machine learning, and artificial intelligence to large data stores. It enables enterprises to build critical applications on vast stores of aggregated data and then analyze that data for business insights.
The biggest advantage of object storage is its ability to add value to primary data through metadata, or labels. Metadata enables companies to easily search large data volumes, view audit trails, set policies, and keep auditable records of who can see, open, or download the data.
Object stores have become the preferred platform for data analytics because of their speed, scalability, security, data integrity, and reliability.
HPE Ezmeral Data Fabric Object Store
Designed for enterprises that need high speed, performance, and scalability for analytic workloads, HPE Ezmeral Data Fabric Object Store is the industry’s first solution to combine native S3 objects,¹ files, streams, and databases in a unified data platform that spans on-premises, cloud, and edge deployments. Available on bare-metal and Kubernetes-native deployments, HPE Ezmeral Data Fabric Object Store delivers a global view and console that unify management, security, and data access for demanding data engineering, data analytics, and data science applications. The global console enables customers to automate and orchestrate both apps and data while delivering outstanding performance.
The HPE Ezmeral Data Fabric Object Store provides key advantages over other object-based solutions:
- Optimized storage and performance for analytics: objects of all sizes are optimized for both performance and storage efficiency in a persistent data store.
- Multiprotocol object access using the native S3 API or standard interfaces such as NFS, HDFS, POSIX, and CSI, allowing data science applications and teams to keep using their existing data access mechanisms.
- Resiliency and reliability inherited from HPE Ezmeral Data Fabric, with globally synchronized edge-to-cloud access while clusters and data are orchestrated together. A global namespace simplifies edge-to-cloud topologies by bringing traditional file storage and object store data into a single namespace.
- Global policies for mirrors, snapshots, and replication ensure that the right people and applications can access datasets when they need them. Replication eliminates any single point of failure, ensuring real-time access to data.
Storage and performance optimized
With HPE Ezmeral Data Fabric Object Store, customers connect either to a load balancer or directly to the native object store server (OSS), not to a gateway. The OSS executes S3 API calls and handles access authentication and authorization. Users can place the client-accessible OSS either on the object store cluster nodes or on external nodes, depending on available resources.
Object metadata, small objects, and large objects are distributed and replicated across the cluster, providing virtually unlimited scalability and high performance. This distribution also increases uptime and makes efficient use of hardware resources, and it delivers the low-latency data access that intensive data analytics workloads require.
Resiliency and reliability
HPE Ezmeral Data Fabric Object Store provides high availability and redundancy to keep data accessible 24x7. User-defined replication, mirroring, snapshots, and erasure coding make it possible to place data into hot and warm categories.
Automatic replication delivers high availability and protects against any single point of failure. Objects and metadata are stored in volumes. By default, HPE Ezmeral Data Fabric Object Store keeps three copies of each volume's data on separate nodes, but the replication factor can be set as low as two or as high as six replicas per volume.
For each volume, you can specify a desired and a minimum data replication factor, and a desired and a minimum namespace (name container) replication factor. When the number of replicas falls below the minimum, re-replication is prioritized to restore it rapidly if data is being actively written to the container.
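The interplay between the desired and minimum replication factors can be sketched as a small decision function. This is a hypothetical model, not the product's internal logic: the function name `replication_action`, the returned labels, the treatment of idle containers, and the lower bound of 1 for the minimum factor are assumptions. Only the two-to-six range for the desired factor and the rapid re-replication below the minimum come from the text above.

```python
from enum import Enum


class Action(Enum):
    HEALTHY = "healthy"             # live replicas meet the desired factor
    REREPLICATE = "re-replicate"    # below desired: restore at normal priority
    URGENT = "urgent re-replicate"  # below minimum while being written: restore rapidly


def replication_action(live: int, minimum: int, desired: int, actively_written: bool) -> Action:
    """Hypothetical model of how a volume's replica count is assessed."""
    # The text states that volumes keep between two and six replicas.
    if not 2 <= desired <= 6:
        raise ValueError("desired replication factor must be between 2 and 6")
    # Assumption: the minimum is at least 1 and never exceeds the desired factor.
    if not 1 <= minimum <= desired:
        raise ValueError("minimum must be between 1 and the desired factor")
    if live >= desired:
        return Action.HEALTHY
    if live < minimum and actively_written:
        return Action.URGENT
    return Action.REREPLICATE
```

In this model, `replication_action(1, 2, 3, actively_written=True)` returns `Action.URGENT`, while the same replica count on an idle container returns `Action.REREPLICATE`.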
HPE Ezmeral Data Fabric Object Store supports mirroring by associating source volumes with mirror volumes. When synchronization is triggered, either manually or automatically on a schedule, the mirror volume is updated to match its local or remote source volume. If tiering is enabled, it must be configured on both the source and the mirror volumes, although the type of tiering need not be the same.
Snapshots are point-in-time, read-only images of a volume that can be created manually or on an automated schedule. Snapshots are well suited to data engineers and data scientists who need a historical record of their work or need to recover a specific model when a sandbox version fails.
A snapshot is created instantaneously and initially takes up no disk space; it consumes space only as data in the volume changes. All changes after the initial snapshot are stored incrementally, allowing analysts and data science teams to roll back to the original state when required.
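The space behavior described above follows from copy-on-write. The sketch below is a hypothetical in-memory model (the `Volume` class and its methods are illustrative, not a product API): a snapshot starts as an empty map and records a block's previous contents only the first time that block changes, so it costs nothing until data is modified.

```python
from typing import Dict, Optional


class Volume:
    """Hypothetical copy-on-write model of a volume with snapshots."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}  # live data: block index -> contents
        # snapshot name -> {block index -> contents before first change (None = absent)}
        self.snapshots: Dict[str, Dict[int, Optional[bytes]]] = {}

    def snapshot(self, name: str) -> None:
        # A new snapshot stores nothing; it shares every block with the live volume.
        self.snapshots[name] = {}

    def _preserve(self, idx: int) -> None:
        # Before a block changes, save its current contents in every snapshot
        # that has not already preserved it (copy-on-write).
        for saved in self.snapshots.values():
            if idx not in saved:
                saved[idx] = self.blocks.get(idx)

    def write(self, idx: int, data: bytes) -> None:
        self._preserve(idx)
        self.blocks[idx] = data

    def delete(self, idx: int) -> None:
        self._preserve(idx)
        self.blocks.pop(idx, None)

    def read_snapshot(self, name: str, idx: int) -> Optional[bytes]:
        saved = self.snapshots[name]
        # Blocks unchanged since the snapshot are read from the live volume.
        return saved[idx] if idx in saved else self.blocks.get(idx)

    def snapshot_space(self, name: str) -> int:
        # Bytes preserved for this snapshot: pre-change contents of modified blocks.
        return sum(len(b) for b in self.snapshots[name].values() if b is not None)

    def rollback(self, name: str) -> None:
        # Restore every changed block to its state at snapshot time.
        for idx, old in list(self.snapshots[name].items()):
            if old is None:
                self.delete(idx)
            else:
                self.write(idx, old)
```

For example, after `v.write(0, b"aaaa")` and `v.snapshot("s1")`, `v.snapshot_space("s1")` is 0; it grows only when block 0 or a new block is written.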
Erasure coding (EC) is a data protection method in which data is broken into fragments, expanded and encoded with redundant data pieces, and stored across a set of different locations or storage media.
If data becomes corrupted or a fragment is lost, the parity fragments are used to reconstruct the data. Reconstruction time depends on the number of data fragments in the erasure coding scheme and on the number of failures. For example, reconstruction in a 10+2 erasure coding scheme takes longer than in a 4+2 scheme, because more data fragments must be read.
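The trade-off in the 4+2 versus 10+2 example can be made concrete. Assuming a standard maximum-distance-separable code such as Reed-Solomon, a k+m scheme stores user data in k of every k+m fragments and rebuilds a lost fragment by reading k survivors: 4+2 gives about 67% storage efficiency (1.5x raw capacity) and reads 4 fragments per rebuild, while 10+2 gives about 83% (1.2x) but reads 10. The reconstruction demo below simplifies further to a single XOR parity fragment (a k+1 scheme), since real k+2 codes need Galois-field arithmetic. The helper names are hypothetical, and this is not the product's codec.

```python
from functools import reduce
from typing import List, Optional


def storage_efficiency(k: int, m: int) -> float:
    """Fraction of raw capacity holding user data in a k+m scheme."""
    return k / (k + m)


def fragments_read_to_rebuild(k: int) -> int:
    """Rebuilding one lost fragment requires reading k surviving fragments."""
    return k


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal fragments plus one XOR parity fragment (k+1)."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    return fragments + [reduce(xor_bytes, fragments)]


def rebuild(fragments: List[Optional[bytes]]) -> List[Optional[bytes]]:
    """Reconstruct at most one missing fragment (marked None) from the survivors."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can recover only one lost fragment")
    restored = list(fragments)
    if missing:
        # XOR of all surviving fragments (data and parity) equals the lost one.
        survivors = [f for f in fragments if f is not None]
        restored[missing[0]] = reduce(xor_bytes, survivors)
    return restored
```

Losing any one of the five fragments produced by `encode(b"hello world!", 4)` is recoverable, and the rebuild reads the four survivors, matching `fragments_read_to_rebuild(4)`.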
Multiprotocol access and AWS-compliant S3 API
Customers can access objects either with the native S3 API or with traditional methods, for example, NFS, POSIX, REST, and HDFS. This allows applications and users to use existing access patterns without disrupting productivity.
The HPE Ezmeral Data Fabric Object Store S3 API enables users to:
- Create and name a bucket that stores data—buckets are the fundamental containers in S3 storage
- Store data—Save a virtually unlimited amount of data in a bucket by uploading any number of objects; each object can contain up to 5 TB of data and is stored and retrieved using a unique, developer-assigned key
- Download data—Transfer object data
- Set permissions—Grant or deny access to upload or download data to an S3 bucket. Grant upload and download permissions to three types of users. Authentication mechanisms keep data secure from unauthorized access
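Objects approaching the 5 TB limit are normally uploaded in parts. Assuming the store follows the AWS S3 multipart limits (at most 10,000 parts, with every part except the last between 5 MiB and 5 GiB), which this paper does not state, a client could choose a part size as follows. `plan_parts` is a hypothetical helper, and 5 TB is interpreted here as 5 TiB.

```python
import math
from typing import Tuple

MIB = 1024 ** 2
GIB = 1024 ** 3
MAX_PARTS = 10_000          # assumption: AWS S3 multipart part-count limit
MIN_PART = 5 * MIB          # assumption: AWS minimum part size (all but the last part)
MAX_PART = 5 * GIB          # assumption: AWS maximum part size
MAX_OBJECT = 5 * 1024 ** 4  # the 5 TB per-object limit, interpreted as 5 TiB


def plan_parts(object_size: int) -> Tuple[int, int]:
    """Return (part_size, part_count) for a multipart upload of object_size bytes."""
    if not 0 < object_size <= MAX_OBJECT:
        raise ValueError("object size must be between 1 byte and 5 TiB")
    # Smallest part size that keeps the upload within MAX_PARTS parts...
    needed = math.ceil(object_size / MAX_PARTS)
    # ...rounded up to a whole MiB and never below the minimum part size.
    part_size = max(MIN_PART, math.ceil(needed / MIB) * MIB)
    if part_size > MAX_PART:
        raise ValueError("object too large for the assumed part limits")
    return part_size, math.ceil(object_size / part_size)
```

Under these assumptions, a 1 GiB object uploads as 205 parts of 5 MiB, and a maximum-size object needs 525 MiB parts (`plan_parts(MAX_OBJECT)` returns `(550502400, 9987)`).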
HPE Ezmeral Data Fabric Object Store supports s3cmd, a command-line S3 client for Linux® and Mac. The following operations are available from both the API and the s3cmd CLI:
- mb (make bucket)
- put (place a file in a bucket)
- get (retrieve a file from a bucket)
- sync (from file system to bucket and vice versa)
- ls (list buckets/objects)
- del (delete object)
- rb (remove bucket)
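`sync` transfers only what differs between the two sides. s3cmd compares file size and MD5 checksum by default; the hypothetical `plan_sync` function below models that comparison in memory for a file-system-to-bucket sync, assuming the remote listing supplies each object's size and MD5 (for S3, the ETag of a single-part upload is its MD5, but multipart ETags are not). The `delete_removed` flag mirrors s3cmd's `--delete-removed` option.

```python
import hashlib
from typing import Dict, List, Tuple


def plan_sync(
    local: Dict[str, bytes],
    remote: Dict[str, Tuple[int, str]],
    delete_removed: bool = False,
) -> Tuple[List[str], List[str]]:
    """Return (keys to upload, keys to delete) for a file-system-to-bucket sync.

    local maps object keys to file contents; remote maps keys to (size, MD5 hex).
    """
    uploads = []
    for key, data in sorted(local.items()):
        # Upload when the object is missing remotely or its size/MD5 differs.
        if remote.get(key) != (len(data), hashlib.md5(data).hexdigest()):
            uploads.append(key)
    # Optionally remove bucket objects that no longer exist locally.
    deletes = sorted(set(remote) - set(local)) if delete_removed else []
    return uploads, deletes
```

Unchanged files are skipped, so repeated syncs of a large directory transfer only new or modified files.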
Single unified object store
HPE Ezmeral Data Fabric Object Store provides a unified interface for object store configuration, administration, management, and monitoring, with functionality similar to that of the command-line and REST APIs. When licensed separately, files, events, and databases are also managed in the same interface alongside objects.
When a user logs in to the HPE Ezmeral Data Fabric Object Store UI, the screen shown in Figure 7 is displayed.
When users log in to the Object Store UI with their domain user ID, their access-level privileges are determined by the privileges assigned to their domain account in the LDAP or Active Directory server.
There are three levels of access available to domain users:
- Cluster admin
- Account admin
- User
The cluster admin has the highest level of privileges. The cluster admin can create and delete:
- Account admins
- Access keys
- User policies
The concept of accounts is new to HPE Ezmeral Data Fabric Object Store and was introduced to emulate cloud-based object store constructs such as organizations. An account contains one or more buckets, and accounts can be named to reflect the organization's hierarchy, for example, Engineering, Sales Engineers, or HR. An account is a grouping of buckets, not of users, because a single user may be able to access multiple accounts.
When a cluster admin creates an account, they can assign it an account admin drawn from the domain. The cluster admin can also attach a bucket policy to the account: either the default policy or a custom bucket policy defined in JSON.
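A JSON bucket policy of this kind might look like the following. This sketch assumes the store accepts AWS-style policy syntax (`Version`, `Statement`, `Effect`, `Principal`, `Action`, `Resource`); the bucket name and principal identifier are hypothetical placeholders, and the exact ARN format the store expects should be taken from the product documentation.

```python
import json


def read_only_bucket_policy(bucket: str, principal: str) -> str:
    """Build a hypothetical AWS-style bucket policy granting read-only access."""
    policy = {
        "Version": "2012-10-17",  # AWS policy-language version string
        "Statement": [
            {
                "Sid": "AllowReadOnly",
                "Effect": "Allow",
                "Principal": {"AWS": [principal]},
                # List the bucket and read its objects; no write or delete actions.
                "Action": ["s3:ListBucket", "s3:GetObject"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",    # the bucket itself (for ListBucket)
                    f"arn:aws:s3:::{bucket}/*",  # every object in it (for GetObject)
                ],
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Hypothetical bucket and principal names, for illustration only.
print(read_only_bucket_policy("engineering-data", "arn:aws:iam::engineering:user/analyst"))
```

Both resource entries are needed because listing applies to the bucket while reading applies to the objects inside it.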
When an account is created in the HPE Ezmeral Data Fabric Object Store, a volume is automatically created. At the creation stage, the cluster admin can define various volume parameters and configuration options such as:
- Storage label
- Erasure coding scheme
- Replication factor
When buckets are created, they belong to a volume. Volumes are sized automatically as additional accounts and buckets are created and objects are uploaded; volume sizes are determined by internal algorithms and cannot be configured by the cluster admin.
In addition to accessing HPE Ezmeral Data Fabric Object Store through the unified interface, objects can be accessed directly, or programmatically, using API calls. This requires access keys and secret keys, the same concept used for accessing cloud-based object stores. Users can generate a permanent access key in the UI for use in application development; access keys can also be generated with a REST call.
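Access and secret keys are used to sign each API request rather than being sent with it. Assuming the store accepts AWS Signature Version 4, which this paper does not state explicitly, the signing key is derived from the secret key through a chain of HMAC-SHA256 operations, as sketched below. Building the canonical request and the string to sign is omitted, and the region and key values are placeholders.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signing_key(secret_key: str, date_stamp: str, region: str, service: str = "s3") -> bytes:
    """Derive the SigV4 signing key; date_stamp is YYYYMMDD."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sigv4_signature(secret_key: str, date_stamp: str, region: str, string_to_sign: str) -> str:
    """Hex-encoded signature placed in the request's Authorization header."""
    key = sigv4_signing_key(secret_key, date_stamp, region)
    return hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
```

Because the derived key is scoped to a date, region, and service, the secret key itself never travels with a request, and a leaked signing key is of limited use outside that scope.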
In addition to the three default user policies (read only, write only, and read write), the cluster admin can also create a custom user policy using a JSON file.
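How the statements in a custom user policy combine can be sketched with a simplified version of AWS's documented evaluation rules: access is denied by default, an explicit `Deny` overrides any `Allow`, and `*` wildcards match actions and resources. Whether HPE Ezmeral Data Fabric Object Store evaluates policies exactly this way is an assumption; `is_allowed` and the sample policy are hypothetical, and policy conditions are ignored.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List


def _as_list(value: Any) -> List[str]:
    return value if isinstance(value, list) else [value]


def is_allowed(statements: List[Dict[str, Any]], action: str, resource: str) -> bool:
    """Simplified policy evaluation: default deny, explicit Deny wins over Allow."""
    allowed = False
    for st in statements:
        # Action names are matched case-insensitively; resources case-sensitively.
        action_match = any(fnmatchcase(action.lower(), a.lower()) for a in _as_list(st["Action"]))
        resource_match = any(fnmatchcase(resource, r) for r in _as_list(st["Resource"]))
        if action_match and resource_match:
            if st["Effect"] == "Deny":
                return False  # an explicit deny always wins
            allowed = True
    return allowed


# Hypothetical custom policy: read and write one bucket, but never delete objects.
policy = [
    {"Effect": "Allow", "Action": ["s3:Get*", "s3:Put*"], "Resource": "arn:aws:s3:::models/*"},
    {"Effect": "Deny", "Action": "s3:DeleteObject", "Resource": "*"},
]
```

`fnmatchcase` also interprets `[...]` character classes, which AWS policy patterns do not, so this is only an approximation of wildcard matching.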
The cluster admin creates account admins. Account admins cannot create new accounts; they can administer only the accounts to which the cluster admin has granted them access. When account admins log in with their domain credentials, they see a dashboard similar to the cluster admin's, except that it shows only the accounts under their responsibility. The account admin has privileges to:
- Create buckets
- Create IAM users
- Create IAM groups
- Set quota
- Create user policies
- Assign user policies to domain users or IAM users
IAM users are intended for programmatic access to HPE Ezmeral Data Fabric Object Store, and their user names are not domain user names. When creating an IAM user, the account admin specifies:
- User name—For example, Engineering Dept.
- The account names to which the IAM user is granted access
- An IAM group if defined
- User policy
- IAM user access keys
Account admins can create, edit, or delete buckets. This privilege is assigned by the cluster admin.
When creating or editing a bucket, an account admin can:
- Enable versioning
- Explore buckets
- Upload objects to the bucket
- Assign policies to a bucket—account admins can create bucket policies using JSON, in the same manner as user policies.
- View bucket metrics
Buckets cannot be nested (a bucket inside a bucket), but a bucket can have a folder structure into which an admin can upload objects.
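In S3-compatible stores, folders are usually not separate entities but shared key prefixes, which clients browse by listing with a `/` delimiter. Assuming the store follows this convention, the hypothetical `list_level` function below emulates one level of such a listing: keys directly under the prefix are returned as objects, and deeper keys are collapsed into their next folder prefix (what the S3 API reports as common prefixes).

```python
from typing import Iterable, List, Tuple


def list_level(keys: Iterable[str], prefix: str = "", delimiter: str = "/") -> Tuple[List[str], List[str]]:
    """Return (objects at this level, folder prefixes one level down)."""
    objects: List[str] = []
    folders = set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue  # outside the folder being listed
        rest = key[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            objects.append(key)  # no further delimiter: an object in this folder
        else:
            # Collapse deeper keys into their next-level folder prefix.
            folders.add(prefix + rest[: cut + len(delimiter)])
    return objects, sorted(folders)


keys = ["readme.txt", "models/v1/model.bin", "models/v2/model.bin", "data/train.csv"]
```

Here `list_level(keys)` shows `readme.txt` plus the folders `data/` and `models/`, and `list_level(keys, "models/")` descends one level to `models/v1/` and `models/v2/`.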
The bucket metrics screen shows statistics about the bucket, including the bucket size, the total number of objects, and the object sizes.
An account admin or a user can view, delete, upload, and download objects. If desired, a folder structure can be created to hold the object. Tags and metadata can also be added.
Query with S3 Select
An object can be queried using S3 Select (Note: only one object can be queried at a time). Query results can be downloaded as a CSV or JSON file.
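S3 Select runs a SQL expression such as `SELECT s.model, s.accuracy FROM S3Object s WHERE CAST(s.accuracy AS FLOAT) > 0.9` against a single object and returns only the matching records. The sketch below emulates that query locally on a CSV object with Python's `csv` module, so it shows what the service computes rather than how to call it; the object contents, column names, and helper names are hypothetical. Results are written either as CSV or as JSON lines (one JSON record per line).

```python
import csv
import io
import json
from typing import Callable, Dict, List

Row = Dict[str, str]


def select(csv_text: str, columns: List[str], where: Callable[[Row], bool]) -> List[Row]:
    """Project `columns` from every CSV record that satisfies `where`."""
    reader = csv.DictReader(io.StringIO(csv_text))  # first line is the header
    return [{c: row[c] for c in columns} for row in reader if where(row)]


def to_csv(rows: List[Row], columns: List[str]) -> str:
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def to_json_lines(rows: List[Row]) -> str:
    return "".join(json.dumps(r) + "\n" for r in rows)


# Hypothetical object contents: model evaluation results stored as CSV.
obj = "model,accuracy,owner\nresnet,0.93,alice\nlenet,0.88,bob\nvit,0.95,carol\n"
rows = select(obj, ["model", "accuracy"], lambda r: float(r["accuracy"]) > 0.9)
```

Only the two qualifying rows and the two requested columns come back, which is the point of S3 Select: the filtering happens next to the data instead of after downloading the whole object.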
A regular user has the lowest level of privileges. The user cannot:
- Create accounts
- Create buckets
- Create policies
The user can only:
- Create a folder
- Upload/download an object
- Create access keys for programmatic access (Note: a user can create only two access keys.)
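One common reason for allowing two keys is rotation: create a second key, move applications to it, then delete the old one. The hypothetical `AccessKeyStore` below models only the two-key limit; the key formats (20-character IDs and 40-character secrets, borrowed from the AWS convention) and the use of Python's `secrets` module are assumptions, not how the product generates keys.

```python
import secrets
import string
from typing import Dict, List, Tuple

MAX_KEYS_PER_USER = 2  # per the note above: a user can create only two access keys
_ID_ALPHABET = string.ascii_uppercase + string.digits


class AccessKeyStore:
    """Hypothetical per-user access key registry enforcing the two-key limit."""

    def __init__(self) -> None:
        self._keys: Dict[str, Dict[str, str]] = {}  # user -> {access key ID -> secret}

    def create_key(self, user: str) -> Tuple[str, str]:
        user_keys = self._keys.setdefault(user, {})
        if len(user_keys) >= MAX_KEYS_PER_USER:
            raise PermissionError(f"{user} already has {MAX_KEYS_PER_USER} access keys")
        key_id = "".join(secrets.choice(_ID_ALPHABET) for _ in range(20))
        secret = secrets.token_urlsafe(30)  # 30 random bytes -> 40 URL-safe characters
        user_keys[key_id] = secret
        return key_id, secret

    def delete_key(self, user: str, key_id: str) -> None:
        del self._keys[user][key_id]

    def list_keys(self, user: str) -> List[str]:
        return sorted(self._keys.get(user, {}))
```

A third `create_key` call for the same user raises `PermissionError` until one of the existing keys is deleted.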
HPE Ezmeral Data Fabric Object Store is the industry’s first data fabric to combine S3-native object store, files, streams, and databases in one scalable data platform that spans edge to cloud. Available on bare-metal and Kubernetes-native deployments, HPE Ezmeral Data Fabric Object Store provides a global view of an enterprise’s dispersed data assets: unified access to all data within a cloud-native model, securely accessible to the most demanding data engineering, data analytics, and data science applications. Designed with a native S3 API and optimized for advanced analytics, HPE Ezmeral Data Fabric Object Store enables customers to orchestrate both apps and data in a single control plane while delivering the best price for outstanding performance.
Key features of the HPE Ezmeral Data Fabric Object Store include:
- Storage and performance optimized for all object sizes
- Resiliency and reliability inherited from HPE Ezmeral Data Fabric
- Highly performant, S3-compliant API and data management capabilities
- Multiprotocol object access and management for containerized and legacy workloads
- A single unified management interface that spans on-premises, cloud, and edge resources
TensorFlow is a registered trademark of Google LLC. Linux is the registered trademark of Linus Torvalds in the U.S. and other countries. Active Directory is either a registered trademark or trademark of Microsoft Corporation in the United States and/or other countries. All third-party marks are property of their respective owners.
¹ Based on HPE internal analysis, September 2021