HPE Ezmeral Data Fabric
Modern data management for your data-driven enterprise
Accelerate business insights by operationalizing data and then exploit it across multiple use cases
Data related challenges
Meaningful business insight comes from integrating and connecting data across multiple sources, tools, and technologies. Only then can you deliver the right data to the right place at the right time.
Connecting data across multiple sources is an enormous challenge. Do you replicate it to different geographical locations? How can multiple use cases leverage the same data sets? How can you democratize data without impacting security, compliance, or locality regulations?
Traditional data management solutions struggle to keep pace with the growth of data and the changing requirements of today’s digital enterprise. They don’t scale well, require a complex infrastructure for each system, and demand a workforce with specialized management skill sets. As a result, delivering business initiatives becomes time consuming and expensive.
The digital enterprise is responding with data modernization initiatives that include a modern data fabric allowing them to:
- Automate, curate, and orchestrate data across diverse sources
- Accelerate fit-for-purpose insights to business and technical users with self-service data access
- Enable global data sharing across peers, employees, partners, and customers
- Deliver consistent, real-time data to build modern API-based cloud-to-edge applications 1
Forrester reports that organizations without a data fabric strategy will spend more time and effort ingesting, integrating, curating, and securing their data. 2
HPE Ezmeral Data Fabric
Just as a loom weaves multiple threads into fabric, a modern data fabric unifies data from any source into a single data store and then democratizes it with self-service access to trusted data. A data fabric operationalizes data so it can be used across multiple use cases and processes, no matter where the data is located.
HPE Ezmeral Data Fabric is a proven software-defined data store and file system with a strong track record across a wide variety of large-scale production environments. The founding vision was to make data-driven applications a reality for today’s digital enterprise by:
- Supporting multiple data types, application programming interfaces (APIs), and ingest mechanisms
- Enabling end-to-end artificial intelligence (AI)/machine learning (ML) workflows
- Leveraging community innovation to handle a large and evolving set of tools and frameworks
- Innovating data fabric technology
- Accelerating containers and Kubernetes
Universal data access
One of the core tenets of HPE Ezmeral Data Fabric is an open platform that provides direct data access for legacy applications, modern analytics, and AI applications.
Applications and users can use Hadoop Distributed File System (HDFS) or Portable Operating System Interface (POSIX)-based APIs. Developers can use Java, Python, SQL, Apache Spark, or Apache Hive queries. Data scientists can use the latest AI and ML tools, such as TensorFlow, H2O.ai, or PyTorch, reducing the need to copy data to a special-purpose system before it can be accessed. If your teams are familiar with Hadoop, they have access to the full range of Linux® commands without changing Hadoop or Spark programs (see Figure 1). The built-in Container Storage Interface (CSI) driver allows Kubernetes-based applications to directly access and manage data stored within HPE Ezmeral Data Fabric.
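Because the fabric exposes a POSIX interface, ordinary file tooling can read and write fabric data directly. The sketch below is illustrative only: a real deployment would mount the cluster (for example under a path such as `/mapr/<cluster>`), whereas here a local temporary directory stands in for the mount point so the example runs anywhere.

```python
from pathlib import Path
import tempfile

# Stand-in for a POSIX mount of the data fabric (e.g. /mapr/<cluster>).
# A local temp dir is used so this sketch runs without a cluster.
mount = Path(tempfile.mkdtemp())

# Any tool that reads ordinary files can work with fabric data directly:
data_file = mount / "sensor" / "readings.csv"
data_file.parent.mkdir(parents=True)
data_file.write_text("ts,value\n1,0.5\n2,0.7\n")

# Standard Linux-style file access, no Hadoop client required.
rows = data_file.read_text().strip().splitlines()
print(len(rows) - 1)  # → 2 data rows
```

The same path could equally be read by a Spark job, a Hive query, or a TensorFlow input pipeline, which is the point of direct data access: no copy to a special-purpose system first.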
Workflow and architecture
HPE Ezmeral Data Fabric’s support for a wide range of APIs enables customers to reduce the proliferation of point solutions.
Figure 3 presents a real-world use case in which a variety of systems, such as local files, NFS farms, and HDFS, are used to store data. Each gray box represents a different system or tool, and the arrows show the workflow. Before data can be shared, IT must manually copy it to secondary systems, each with a different security model, resulting in gaps where data can be lost or exploited by malicious attacks.
Notice how many times IT needs to copy large data sets between systems.
The global namespace of HPE Ezmeral Data Fabric aggregates file information into a unified structure in which physical servers and share locations appear under a single name.
Imagine the impact on your developers, analysts, and data scientists when applications and workloads can directly access files, tables, and event streams as if they were local. It simplifies the conceptual design of large systems and allows multiple applications to work together on the same data sets.
Scalability, reliability, and performance
With traditional data management solutions, you need to make trade-offs between scalability, reliability, and performance. HPE Ezmeral Data Fabric uses distributed metadata, self-healing, and internal load balancing to reduce these trade-offs.
Using 3-way replication, HPE Ezmeral Data Fabric distributes and replicates directories, files, tables, and streams across the cluster, protecting against data loss, single points of failure, and hotspots. Three-way replication also spreads data and metadata across multiple processors, delivering high performance and resilience. If a replica is lost, HPE Ezmeral Data Fabric uses the remaining replicas to create a new copy with zero impact on the system, application, or developer. When multiple applications need to access the same data, the solution avoids congestion by using multiple replicas and network links, when available, to balance traffic across network paths.
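The load-balancing idea can be illustrated with a small conceptual sketch. The node names and round-robin policy below are purely illustrative, not the product’s actual replica-selection algorithm; the point is that three replicas let reads spread evenly and survive the loss of any one copy.

```python
import itertools

# Conceptual sketch: with 3-way replication, each piece of data has three
# replica locations; a client can rotate reads across them so no single
# node or network path becomes a hotspot. Names are illustrative only.
replicas = ["node-a", "node-b", "node-c"]
pick = itertools.cycle(replicas)

# Simulate nine read requests round-robining across the replicas.
reads = [next(pick) for _ in range(9)]
load = {node: reads.count(node) for node in replicas}
print(load)  # each replica serves an equal share of reads

# If one replica is lost, the remaining copies keep serving reads while
# a new replica is rebuilt in the background.
survivors = [n for n in replicas if n != "node-b"]
print(len(survivors))  # → 2 copies still available
```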
Platform-level data management
In systems where there are millions of files and objects and hundreds of applications with conflicting needs, it is critical that the management effort remains nearly constant as the amount of data increases.
HPE Ezmeral Data Fabric helps ensure efficient platform-level data orchestration through a concept known as a data fabric volume. Data fabric volumes behave much like directories: they expand only as data is ingested into them.
Snapshots and mirroring
Data fabric volumes serve as the foundation for point-in-time snapshots and mirroring, an efficient method of moving data from a source volume to a remote cluster. Snapshots and mirrors can be created manually, or their lifecycle can be managed through an automated policy. They are resource efficient, consuming no additional storage capacity until the data changes.
Snapshots are well suited to maintaining exact data versions because files, tables, and event streams are captured together, which is required for reproducible AI and ML model training.
Mirroring provides an efficient way to move data between clusters or geographic locations. The mirroring process starts when snapshots on both the source and destination volumes record the changes made since the last snapshot. After the initial copy, all future updates are forever incremental. Mirrors can be invoked manually or by script, are commonly scheduled hourly, and are retained according to a defined retention or expiration schedule.
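The forever-incremental idea above can be sketched in a few lines. The data structures here are purely illustrative (a snapshot is modeled as a map from path to content version, not the product’s on-disk format): each mirror cycle compares the latest snapshot with the previous one and ships only the entries that changed.

```python
# Conceptual sketch of forever-incremental mirroring: after the initial
# full copy, each cycle diffs the latest snapshot against the previous
# one and transfers only the changed entries.

def changed_since(prev_snapshot, curr_snapshot):
    """Return entries that are new or modified since the previous snapshot."""
    return {
        path: version
        for path, version in curr_snapshot.items()
        if prev_snapshot.get(path) != version
    }

# Snapshot state, modeled as path -> content version.
snap_t0 = {"logs/a": 1, "logs/b": 1}
snap_t1 = {"logs/a": 2, "logs/b": 1, "logs/c": 1}  # a modified, c added

delta = changed_since(snap_t0, snap_t1)
print(sorted(delta))  # → ['logs/a', 'logs/c']: only changes cross the wire
```

The unchanged file (`logs/b`) is never re-sent, which is why incremental mirrors stay cheap even as volumes grow.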
You need a solution that unifies data while supporting the need for real-time decisions. HPE Ezmeral Data Fabric delivers a unified approach to data management that supports multiple use cases and processes while democratizing data to empower your technical and business teams.
Linux is the registered trademark of Linus Torvalds in the U.S. and other countries. Java is a registered trademark of Oracle and/or its affiliates. All third-party marks are property of their respective owners.
- 1,2 “Enterprise Data Fabric Enables DataOps,” Forrester, January 2021