Is data gravity a barrier to becoming a data-first digital enterprise?
The edge is about action, and as such, it's quickly becoming the dominant point for data generation. Add to the mix the growth of machine learning and artificial intelligence and it follows that processing and acting on data where it is created, at the edge, will lead to results faster. This is all well and good, but the flip side is that moving compute power to the edge at scale requires us to formalize that movement in terms of people, process, and technology. This has led to the concept of data gravity: Data may not be tangible, but it certainly has weight, which makes it hard to move around easily.
Denis Vilfort, an edge evangelist, is the director responsible for edge strategy at Hewlett Packard Enterprise. For Vilfort, it's a matter of time before the edge is where the majority of compute power will reside. Lin Nease is an HPE Fellow and chief technologist. His overarching view of the internet of things is that the data it produces needs to be captured and controlled in order to generate greater insights.
To understand their differing perspectives, we invited the duo to discuss data gravity as a barrier to becoming a data-first enterprise.
Explain the problem of data gravity.
Vilfort: The obvious factor in data gravity is the amount of data. But there is another dimension that is less obvious, and that is time. The value of data typically declines over time. The newer the data, the more valuable it is. This is especially true for data captured in real time, which needs immediate processing to yield value. Using camera streams to inspect production in a manufacturing plant can create 100 megabytes or more of data per second.
To be of value, this data must be processed in less than a second—sometimes in less than a millisecond. We can therefore express data gravity as the relationship between the amount of data generated and the size of the data processing window. A 100-megabyte dataset needing to be processed in a single second has 1/1,000th the data gravity of the same dataset needing to be processed in a millisecond. Data gravity, then, is the ratio of the amount of data to the window in which it must be processed.
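Vilfort's ratio can be sketched in a few lines. The 100 MB payload and the one-second versus one-millisecond windows come from his example; the `data_gravity` helper is an illustrative formalization, not an HPE metric.

```python
def data_gravity(data_bytes: float, window_seconds: float) -> float:
    """Illustrative data-gravity score: data volume divided by the
    processing window. More data, or a tighter deadline, means more gravity."""
    return data_bytes / window_seconds

payload = 100 * 1024 * 1024  # 100 MB captured by the camera streams

slow = data_gravity(payload, 1.0)    # one-second processing window
fast = data_gravity(payload, 0.001)  # one-millisecond processing window

# The one-second window carries roughly 1/1,000th the gravity of the
# one-millisecond window, matching the example above.
print(slow / fast)  # ≈ 0.001
```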
Nease: The edge is simply the set of locations where the enterprise runs its operations—factories, hospitals, retail stores, logistics centers, even customer work sites. More and more enterprises are transforming these operations via digitalization, enabling them to automate processes, make better decisions, generate better forecasts, predict operational disruptions, use pattern detection to uncover bottlenecks, and so on.
The data generated by operations is typically regarding the physical world and generated at the same location as the physical phenomena the data describes. Video, audio, and time series sensor data arrives in real time, must be processed in some way locally, and represents a volume too large to send completely to core locations—cloud or data centers—for processing.
The more data that is generated and the less time one has to process it, the higher the data gravity. The data simply cannot be moved from that edge location and still yield value in a timely fashion. It is too "heavy": the location exerts a gravitational pull on the data mass.
Can data gravity be addressed by a cloud-first approach?
Vilfort: Cloud-first strategies have been talked about for a while now, but here we are discussing a data-first strategy. Does addressing data gravity call for something different from either? Cloud-first is a necessary first step.
This is infrastructure-centric modernization, using cloud platforms to manage silos of multi-generational compute and storage. Driven by development velocity on the one hand and cost optimization on the other, it cedes control to cloud kingdoms. Data-first, by contrast, is about seeing around corners: delivering digitalized, differentiated experiences where the value comes from decisions informed by data insights, while keeping control of that data.
When people talk about cloud-first, they are saying that they are getting out of the business of building and managing their own data centers located on the premises. Instead, they move their workloads and datasets to a large data center somewhere else, run by a service provider. This shifts costs from capital expenditure to operational charges.
There is a blind spot in this attitude, in that the data that the typical distributed enterprise generates is not generated in the cloud but rather at each company location. Moving data to the cloud for processing takes time—in many cases, time that is not available when datasets are large and processing time short. This means that critical insight that could be gleaned from timely data processing now simply cannot be obtained. And the more data that is generated in each company location, the more data gravity it has.
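Vilfort's point about time that "is not available" is easy to make concrete with some transfer-time arithmetic. The dataset size, uplink speed, and processing window below are hypothetical figures chosen for illustration, not measurements from any real site.

```python
def wan_transfer_seconds(data_bytes: float, uplink_bits_per_s: float) -> float:
    """Time to push a dataset over a WAN uplink, ignoring latency and protocol overhead."""
    return data_bytes * 8 / uplink_bits_per_s

# Hypothetical site: 100 MB of fresh operational data, a 1 Gb/s uplink,
# and half a second in which the insight is still worth acting on.
dataset = 100 * 1e6   # 100 MB
uplink = 1e9          # 1 Gb/s
window = 0.5          # seconds available to act on the data

transfer = wan_transfer_seconds(dataset, uplink)
# 0.8 s just to move the data: the processing window closes before
# cloud-side processing can even begin.
print(transfer > window)
```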
Nease: Let's take an application-centric view. Cloud-first strategies are really about application support rather than data support. In many cases, the persistent data associated with a given application is compact and well-encapsulated and does not factor heavily in the decision to rehost the application in the cloud. However, when huge volumes of data are generated in the enterprise's operational locations, migrating it to anywhere else—cloud or otherwise—makes little sense and is cost prohibitive.
Cloud-first strategies do at times address this, by virtue of proclaiming that the applications that need local access to these large datasets cannot be migrated to the cloud. However, treating data on equal footing with application logic would be a much more head-on way of addressing data gravity.
Didn't data and data gravity grow at the same time as cloud-first?
Nease: Data logic is to a data-first strategy as application logic is to a cloud-first strategy, so are these mutually exclusive trends? Not at all. The same hardware and network capacity and software productivity boosts that enabled data gravity have also enabled cloud services. Cloud is simply a concentration of commoditized data centers with ever deeper software-defined services.
Yesterday's apps are relatively easy to host with today's technology, and cloud providers are doing just that. Data gravity is focused more on new workloads and other challenging data problems: AI at the edge; IoT; smart factories, cities, venues, and buildings; autonomous entities; and so on.
Vilfort: I agree. Cloud usage will continue to grow and mature. As centralized cloud paradigms, under digital transformation, give way to "cloud where you need it," edge computing will become the premier strategy. Data gravity is building fastest at the edge. In fact, letting data gravity accumulate in a central cloud benefits only the cloud provider and creates an expanding moat between the enterprise and its data.
Data is best when fresh, and that means it's best left in place. It is the cloud experience and the data computation that now must be distributed out to the edge locations exerting that gravitational pull.
How will edge architecture change because of increasing data gravity?
Vilfort: Edge architecture will increasingly make use of massive parallelism and extremely high bandwidth. As sensors, and especially cameras, continue to increase in resolution, considerable processing firepower will be needed at the edge. There will simply not be enough WAN bandwidth to process this data anywhere but at the location of the camera.
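Vilfort's bandwidth argument can be sized with a quick sketch. The per-camera bit rate and site uplink below are assumed figures for illustration; real deployments vary widely with codec, resolution, and frame rate.

```python
def max_streams(per_camera_mbps: float, wan_mbps: float) -> int:
    """How many camera streams an uplink can carry before saturating.
    Illustrative sizing only; ignores other WAN traffic and overhead."""
    return int(wan_mbps // per_camera_mbps)

camera_4k_mbps = 25    # assumed compressed 4K camera stream
wan_uplink_mbps = 500  # assumed site uplink capacity

# A 500 Mb/s uplink tops out at 20 such streams, far below the camera
# count of a large plant, so the processing has to stay on site.
print(max_streams(camera_4k_mbps, wan_uplink_mbps))  # → 20
```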
If we advance to three-dimensional imagery, we will need massive numbers of GPU cores to process such large data streams. My feeling is that a multi-location enterprise estate covering hundreds or thousands of locations will need fully integrated fleet management to keep costs under control.
Nease: To accommodate the processing and data lifecycle needs of all this distributed data, edge locations will need to adopt architectures akin to cloud data centers, much like the communications world has become software-driven with network function virtualization. But the edge is even more complex than a data center, because the edge must also accommodate users, IoT devices like sensors, and operational technologies like control systems.
How does data gravity change the organizational dynamics between the business and IT?
Vilfort: Data that has a high degree of gravity is predominantly operational data. This is the data the business needs to make better real-time and frequently automated decisions. As data flows now mirror the creation and allocation of goods and services in real time, IT discipline and practices must be directly integrated with the organization's value creation. All businesses will be digital businesses. Increasingly, a move to fully integrated and highly distributed DevOps will be the norm.
Nease: Full-on IT skills and processes will be needed more and more at the edge. Over time, we will come to regard IT less as an organization and more as a set of processes, skills, and technologies. Every line of business will have IT-like organizations.
At the same time, IT processes will need to consume less labor, since data gravity demands that IT processes be applied in more places than ever before. The solution? Managed as-a-service technologies, with vendors picking up the slack to monitor and manage their modular portions of this new edge architecture.
This article/content was written by the individual writer identified and does not necessarily reflect the view of Hewlett Packard Enterprise Company.