The 12 rules for a Data Fabric
Enable organizations to simplify data management across the enterprise
The concept of a data fabric has emerged to describe a new approach to supporting agile development of data-driven applications, analytics, and artificial intelligence (AI).
This paper lays out the 12 rules for a data fabric. A true data fabric provides the foundation for the next generation of applications and supports AI, analytics, and the Internet of Things (IoT). It must also scale, perform well, and remain reliable, whether it runs on-premises, across clouds, or at the edge.
- Major paradigm shift
We are in the midst of a major paradigm shift. The traditional approach, in which applications dictate how data is organized and stored, is being transformed by growing data volumes and the advent of more complex applications, particularly new AI workloads that require large amounts of data. The term data fabric has emerged to describe a solution that reduces complexity and provides the agility these requirements demand. There is, however, much confusion in the market, created by a proliferation of different approaches that all describe themselves as data fabrics.
One has to be careful to look past marketing hype and the repackaging of old technologies. For example, ETL vendors offer integration and data federation tools and describe the data flows from sources to destinations as data fabrics. Storage vendors market data fabrics that extend traditional storage networks. Virtualization vendors also extol data fabric solutions. Finally, is every solution that provides a Hadoop distribution a legitimate data fabric?
One way to separate the contenders from the pretenders is to review the technical capabilities a data fabric must have in order to reduce complexity and enable agility. Without these capabilities, a data fabric is limited in its ability to scale, stretch across locations, meet performance levels, and ultimately drive business value.
- The 12 rules
The following rules are useful in defining a data fabric. If a solution fails to meet any of these rules, it won’t fully function as a data fabric, and an organization will be forced to work around its shortcomings.
How data is stored
1. Linear scalability: The fabric should scale linearly, without hard limits, as data volumes, the number of files, and the number of concurrent clients grow.
2. Architected to support scale, performance, and consistency: A data fabric should ensure data consistency locally and provide a simple consistency model that developers can rely on across locations. Expecting each developer to implement their own tradeoffs among scale, performance, and consistency is not tenable.
3. Distributed metadata in the fabric: A fabric should distribute metadata across all data-storing nodes to avoid single points of failure and bottlenecks.
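Rules 1 and 3 can be made concrete with a toy model. The sketch below assumes consistent hashing as the distribution technique (the paper does not name one). It spreads file metadata across storage nodes so that no single metadata server becomes a bottleneck or point of failure, and it shows that adding a node relocates only the entries the new node takes over. The class name `MetadataRing`, the node names, and the virtual-node count are hypothetical; this is not how HPE Ezmeral Data Fabric or any other product is implemented.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a 64-bit point on the hash ring (stable across runs)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class MetadataRing:
    """Toy consistent-hash ring assigning each path's metadata to one node.

    Hypothetical illustration only; real fabrics use their own schemes.
    """

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes   # virtual points per node smooth out the load
        self._points = []      # sorted ring positions
        self._owners = []      # node owning each position (parallel to _points)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            point = ring_hash(f"{node}#{i}")
            idx = bisect.bisect_left(self._points, point)
            self._points.insert(idx, point)
            self._owners.insert(idx, node)

    def owner(self, path):
        """Return the node holding metadata for `path`: the next ring point clockwise."""
        idx = bisect.bisect_right(self._points, ring_hash(path)) % len(self._points)
        return self._owners[idx]


ring = MetadataRing(["node-a", "node-b", "node-c"])
paths = [f"/projects/p{i}/data.parquet" for i in range(10_000)]
before = {p: ring.owner(p) for p in paths}

ring.add_node("node-d")  # scale out by one node
moved = [p for p in paths if ring.owner(p) != before[p]]
print(f"{len(moved) / len(paths):.0%} of metadata entries moved")
```

Because each new node claims only its own arcs of the ring, every entry that moves goes to the new node, and expansion relocates roughly 1/N of the entries rather than reshuffling everything. That property is what lets metadata capacity and throughput grow with node count.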
How data is accessed
4. Mixed data access from multiple protocols: A fabric needs to provide equal support for disparate data types and access methods. It must fully integrate multiple data types, not simply orchestrate across different underlying datastores and link to separate silos of data. Letting multiple file and object protocols access and update the same data eliminates ETL steps between those silos, allows a broader range of software applications, and delivers low-latency processing for complex applications.
5. Distributed multi-tenancy: To effectively support a wide range of applications and users as well as manage and secure the fabric, organizations need the ability to create logical separations in the fabric for administration, access, update, and execution. The fabric should be able to isolate heavy loads and provide protected resources. It should rebalance automatically based on load changes or after failures.
6. Global namespace: A data fabric needs a global namespace that provides a single view of, and access to, data regardless of how it is distributed across physical locations, including on-premises, cloud, and edge.
7. Integrated data streaming for AI: The line between data in motion and data at rest is blurring, particularly for IoT and AI applications that must act within very small event windows to impact the business. A data fabric needs integrated streaming so it can easily ingest data in motion and combine it with data at rest under a common management, security, and analytics framework.
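The global namespace in rule 6 can be pictured as a mount table that maps one logical path tree onto physical locations. The minimal sketch below assumes longest-prefix matching, a common technique that the paper does not itself specify. The class, the site names, and the `s3://example-bucket` root are hypothetical. Applications address `/iot/plant7/...` or `/finance/...` the same way, and the namespace decides whether the bytes live at the edge, on-premises, or in the cloud.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mount:
    location: str        # hypothetical site name, e.g. "edge-plant7"
    physical_root: str   # where the data actually lives at that site


class GlobalNamespace:
    """Toy global namespace: one logical path tree spanning many locations."""

    def __init__(self):
        self._mounts = {}  # logical prefix (no trailing slash) -> Mount

    def mount(self, logical_prefix, location, physical_root):
        self._mounts[logical_prefix.rstrip("/")] = Mount(location, physical_root.rstrip("/"))

    def resolve(self, logical_path):
        """Map a logical path to (location, physical path) via longest-prefix match."""
        matches = [p for p in self._mounts
                   if logical_path == p or logical_path.startswith(p + "/")]
        if not matches:
            raise FileNotFoundError(logical_path)
        prefix = max(matches, key=len)
        mount = self._mounts[prefix]
        return mount.location, mount.physical_root + logical_path[len(prefix):]


ns = GlobalNamespace()
ns.mount("/", "on-prem-dc1", "/mnt/fabric")
ns.mount("/iot/plant7", "edge-plant7", "/data/local")
ns.mount("/archive", "cloud-us-east", "s3://example-bucket/archive")

print(ns.resolve("/iot/plant7/sensors.csv"))  # ('edge-plant7', '/data/local/sensors.csv')
print(ns.resolve("/finance/q3.csv"))          # ('on-prem-dc1', '/mnt/fabric/finance/q3.csv')
```

Matching on whole path components (`p + "/"`) keeps `/iot/plant70` from being captured by the `/iot/plant7` mount, so a new site can be attached under any prefix without renaming paths that applications already use.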
How data is distributed
8. Distributed location support: A data fabric must provide the ability to deploy and execute across on-premises, cloud, and edge locations with centralized management and fully integrated functionality across the entire infrastructure. A fabric needs to stretch across locations, not simply have the ability to install and run in different locations.
9. Multi-master replication: A data fabric requires multi-master replication across multiple locations to support distributed operations with transactional integrity.
10. Location awareness: Although a fabric is uniform, it must understand and control the placement of specific data and jobs for cost, performance, and compliance reasons. Remote data should be addressable and accessible in place as well as replicable locally.
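Rule 10 implies a placement decision in which some criteria are hard constraints and others are tradeoffs. The sketch below assumes a simple policy invented for illustration. Compliance (data residency) filters the candidate sites, and a weighted score of latency and cost ranks whatever remains. The `Site` fields, the weights, and the site names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    region: str          # legal jurisdiction of the site, e.g. "EU" or "US"
    cost_per_gb: float   # storage cost, arbitrary currency units
    latency_ms: float    # latency from the workload that reads the data


def place(sites, allowed_regions, copies=2, latency_weight=1.0, cost_weight=100.0):
    """Choose `copies` sites for a data set.

    Compliance is a hard filter; latency and cost are traded off by a weighted score.
    """
    eligible = [s for s in sites if s.region in allowed_regions]
    if len(eligible) < copies:
        raise ValueError("not enough compliant sites for the requested number of copies")

    def score(site):
        return latency_weight * site.latency_ms + cost_weight * site.cost_per_gb

    return [s.name for s in sorted(eligible, key=score)[:copies]]


sites = [
    Site("frankfurt-dc", "EU", cost_per_gb=0.020, latency_ms=5),
    Site("dublin-cloud", "EU", cost_per_gb=0.015, latency_ms=20),
    Site("paris-edge", "EU", cost_per_gb=0.050, latency_ms=1),
    Site("virginia-cloud", "US", cost_per_gb=0.010, latency_ms=90),
]
print(place(sites, allowed_regions={"EU"}))  # ['paris-edge', 'frankfurt-dc']
```

Treating residency as a filter rather than as a term in the score means that no cost or latency advantage can place data in a disallowed region. The weights only decide among sites that are already compliant.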
How data is secured
11. Capability to serve as a system of record: A data fabric requires features that prevent data corruption and provide backup and disaster recovery capabilities. Without these, a fabric is only appropriate for a narrow set of uses that are not impacted by data loss.
12. Data security and governance within the fabric: Data must be secured within the fabric itself, and that security must not depend on the access method. As applications and access methods multiply, security becomes more complex and more open to compromise if it is not enforced at the lowest level of the fabric.
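Rule 12, together with the multi-protocol access in rule 4, can be sketched as a single storage layer that performs authorization itself, with file-style and object-style front ends acting as thin adapters over the same bytes. Everything here, including the class names, the ACL model, and the `bucket/key` mapping, is a hypothetical toy. It is not a real protocol implementation or product API.

```python
class AccessDenied(Exception):
    """Raised when a principal lacks read permission on a key."""


class SecureStore:
    """Toy storage layer that authorizes every read, whichever protocol asked."""

    def __init__(self):
        self._data = {}  # key -> bytes
        self._acl = {}   # key -> set of principals allowed to read

    def put(self, key, data, readers):
        self._data[key] = data
        self._acl[key] = set(readers)

    def read(self, principal, key):
        # The check lives below every front end, so no access path can bypass it.
        if principal not in self._acl.get(key, set()):
            raise AccessDenied(f"{principal} may not read {key}")
        return self._data[key]


class FileFrontEnd:
    """Hypothetical POSIX-style adapter: addresses data by path."""

    def __init__(self, store):
        self.store = store

    def open_read(self, user, path):
        return self.store.read(user, path.lstrip("/"))


class ObjectFrontEnd:
    """Hypothetical S3-style adapter: addresses the same data by bucket and key."""

    def __init__(self, store):
        self.store = store

    def get_object(self, user, bucket, key):
        return self.store.read(user, f"{bucket}/{key}")


store = SecureStore()
store.put("finance/q3.csv", b"revenue,42\n", readers={"alice"})
files, objects = FileFrontEnd(store), ObjectFrontEnd(store)

print(files.open_read("alice", "/finance/q3.csv"))       # b'revenue,42\n'
print(objects.get_object("alice", "finance", "q3.csv"))  # b'revenue,42\n'
try:
    objects.get_object("bob", "finance", "q3.csv")
except AccessDenied as err:
    print(err)                                           # bob may not read finance/q3.csv
```

Because both adapters delegate to `SecureStore.read`, adding a third access method does not require re-implementing the permission check, and it cannot accidentally skip that check.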
So how do the various data fabrics stack up? Very few solutions on the market today meet more than a few of these rules. Some vendors provide a data fabric that is little more than a collection of separate products. In that case, the data fabric is a brand, not a technology, and it struggles to meet more than three of the above rules. The exception is HPE Ezmeral Data Fabric, which fulfills all 12 rules.
By fulfilling all of these rules, HPE Ezmeral Data Fabric enables organizations to simplify data management across the enterprise, easily move data, and execute workloads across on-premises, cloud, and the edge for performance, cost, or compliance reasons. HPE Ezmeral Data Fabric also provides the speed, agility, and flexibility required to successfully integrate AI into business operations.
A data fabric is the foundation that helps organizations reduce costs and drive innovation. These innovations combine AI and analytics into next-generation applications that increase revenue, efficiency, and the ability to manage risk. Reviewing these rules can save you and your organization a lot of time, frustration, and money.
© Copyright 2020 Hewlett Packard Enterprise Development LP. The information contained herein is subject to change without notice. The only warranties for Hewlett Packard Enterprise products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. Hewlett Packard Enterprise shall not be liable for technical or editorial errors or omissions contained herein.