Siloed data is rising from the dead
Data is the foundation upon which all enterprises innovate, drive business models, woo customers, disrupt the market, and ultimately reap profits. However, data can’t support all that if bits and bytes are missing. Thus, it became a business imperative to bust data silos—and that effort largely succeeded thanks to the rise of the API economy. But new technologies are once again creating silos of data and often in incompatible formats. So now what?
In their previous incarnation, data silos were mostly applications and the data attached to them. Each was a quagmire of proprietary code chained to monolithic infrastructure firmly grounded in a company's own premises. But even when the cloud became not only "a thing" but THE thing in IT, software-as-a-service (SaaS) applications largely carried on the tradition of vendor lock-in and data hostage-taking. A few things contributed to the move away from data silos:
- First, API proliferation connected datasets. Admittedly, APIs did little to help with cleaning or updating data, or with reconciling versions. Still, the move to APIs at least made such work possible once the previously separated data was integrated.
- Open source also took flight around the same time. Open source projects reshaped formerly proprietary applications, making it easier to share data and easing some of the need for an ever-growing number of APIs. Unfortunately, all of this built up a bank of clouds, which themselves became data silos.
The challenge of integrating multiple clouds led to more APIs and new tools. And—voila!—the issue of data silos was largely considered solved or at least managed.
But technology doesn’t stand still. The issue of silo management is back on the problem list. The culprit: new technologies, from wearables and IoT to blockchain.
More inputs, more datasets
“IoT devices are multiplying, as device manufacturers all over the world create exciting new devices loaded with sensors that can measure and react to many metrics, including light, sound, temperature, visuals, motion, etc.,” says Neeraj Murarka, CTO of Bluzelle, a blockchain startup.
“Unfortunately, there is no clear standard yet on how these data points are collected, formatted, scaled, and stored," Murarka adds. "This means that interoperability of data between devices and between different manufacturers is a challenge. In fact, it can be a challenge even between devices from the same manufacturer.” Previous solutions to these dilemmas often don’t work because they are next-gen data silos with entirely different characteristics.
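The interoperability gap Murarka describes shows up as soon as two devices report the same metric differently. Here is a minimal sketch of one common workaround: normalizing vendor-specific payloads into a shared schema. The field names ("tempF", "t", "ts") and both payload formats are invented for illustration, not any real standard.

```python
# Sketch: normalize temperature readings from two hypothetical vendor
# payload formats into one common schema. Field names are assumptions.

def normalize(reading: dict) -> dict:
    """Return a reading as {"metric", "value_c", "timestamp"}."""
    if "tempF" in reading:          # vendor A: Fahrenheit, epoch seconds
        return {"metric": "temperature",
                "value_c": (reading["tempF"] - 32) * 5 / 9,
                "timestamp": reading["ts"]}
    if "t" in reading:              # vendor B: Celsius x 10, ISO timestamp
        return {"metric": "temperature",
                "value_c": reading["t"] / 10,
                "timestamp": reading["time"]}
    raise ValueError("unknown device format")

readings = [{"tempF": 98.6, "ts": 1700000000},
            {"t": 370, "time": "2023-11-14T22:13:20Z"}]
normalized = [normalize(r) for r in readings]
```

The catch, as Murarka notes, is that every new device format means another branch in code like this, which is exactly why industry-wide standards matter.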
Not only is much of the data produced outside of the company’s firewall, but it can be computed and stored somewhere else too. “The majority of data in modern enterprises is produced outside of the data center. Examples of such data include video surveillance, output of IoT devices, and sensors, as well as documents generated by end users on their desktops, laptops, and mobile devices,” says Aron Brand, CTO of CTERA, an enterprise file services platform for telcos and managed service providers.
In the cases of wearables and IoT, data may be used and stored on the device, at a gateway device, or in the cloud. Various data points from that device may even be stored across all three.
“There’s a clear trade-off between processing done at the edge and in the cloud. Computing power and storage are limited at the edge compared to the cloud, whereas the edge has access to higher fidelity data,” says Johnathan Vee Cree, PhD, embedded and wireless systems scientist at Pacific Northwest National Laboratory (PNNL), a U.S. Department of Energy government research laboratory.
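One common way to manage the trade-off Vee Cree describes is to keep the high-fidelity samples at the edge and forward only a compact summary upstream. A minimal sketch, with the window size and summary fields chosen purely for illustration:

```python
# Sketch of the edge/cloud trade-off: the edge sees every raw sample
# but has limited storage and bandwidth, so it ships only a summary.

from statistics import mean

def summarize_window(samples: list) -> dict:
    """Reduce a window of raw sensor samples to a cloud-bound summary."""
    return {"count": len(samples),
            "mean": mean(samples),
            "min": min(samples),
            "max": max(samples)}

raw = [20.1, 20.3, 25.7, 20.2, 20.4]   # high-fidelity data, kept at the edge
summary = summarize_window(raw)         # a few bytes sent to the cloud
```

The design choice is the point: the cloud gets enough to trend and alert on, while the detailed data (the 25.7 spike, in this toy window) stays retrievable at the edge.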
Similarly, data from healthcare devices often is locked in HIPAA-protected repositories such as patient or provider clouds. Yet, that data and analysis is often needed immediately for medical staff to react in time. “There are obvious needs for edge computing in healthcare. Devices need to be able to react to changing patient conditions as quickly as possible and must not rely on the Internet to do so. Any calculations being performed need to be accurate and have access to as much data as necessary,” says William Moeglein, a software engineer at PNNL.
Balancing patient privacy and compliance with regulations such as HIPAA against the need to bust silos and gain access to much needed data, even in the absence of an Internet connection, is a complex set of problems. In the case of blockchain, data is stored in a closed network where it has multiple points of verification and validation and is guarded against breach and change attacks. But yes, those very characteristics can make a blockchain’s network a data silo too.
But completely new data silos are springing up every day from applications that were only imagined a short time ago. Take, for example, commercial smart buildings, which use networks of sensors to manage systems such as lighting, security, and HVAC, often on a per-zone basis. “Sensors measure signals such as temperature, humidity, air quality, and occupancy to decide how to optimally route resources. These sensors increasingly have more interconnectivity and can communicate to collectively make decisions,” explains Vee Cree. “For example, building smoke detectors can communicate with the HVAC system to disable the flow of air to slow the spread of fire.”
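The smoke-detector-to-HVAC coordination Vee Cree describes is, at heart, a publish/subscribe pattern. Here is a minimal in-memory sketch of that idea; the topic name and the Hvac class are illustrative assumptions, and real buildings would use protocols such as BACnet or MQTT rather than a single in-process bus.

```python
# Sketch: a tiny in-memory publish/subscribe bus, standing in for the
# building network that lets sensors coordinate across systems.

from collections import defaultdict

class Bus:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers[topic]:
            callback(message)

class Hvac:
    def __init__(self):
        self.airflow_enabled = True

    def on_smoke(self, message):
        if message.get("smoke"):
            self.airflow_enabled = False   # stop feeding air to the fire

bus = Bus()
hvac = Hvac()
bus.subscribe("zone1/smoke", hvac.on_smoke)
bus.publish("zone1/smoke", {"smoke": True})   # smoke detector fires
```

Note that each vendor running its own incompatible bus is precisely how a smart building becomes a collection of silos.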
It doesn’t take a rocket scientist to see that APIs and data lakes alone cannot bust these newfangled data silos. Is it time to reconsider how we handle all this data?
How innovation is tackling the new data silos
That is not to say that these new datasets can’t or shouldn’t be busted apart and the data made more accessible. Indeed, innovation already has made it possible in some cases.
Take, for example, video surveillance footage. It now can be stored in the cloud and analyzed in real-time data feeds. But that also poses more than a few immediate problems. “In practice, sending footage directly to the cloud has major limitations. How would your cameras keep on recording during a lengthy Internet service interruption? And do you need to be able to retrieve videos from the archive quickly in order to view them on site?” asks CTERA's Brand.
“Do local storage silos need to make a comeback? This thought is causing IT to shiver with horror. Remote branch office storage is extremely hard to manage and to protect at scale, and when something goes wrong with a remote storage silo, things can get really ugly,” Brand explains.
The same problems Brand outlines with video surveillance footage are exacerbated in the case of autonomous, connected cars, where data must be collected and analyzed in less time than it takes for a car to crash.
This new urgency potentially creates more localized data silos, even as it spurs the growth of edge computing, wherein data can be analyzed locally and then moved to the cloud for more in-depth evaluation.
One solution for preventing a data silo in just such a scenario is local caching. “Capturing cloud and edge synergy can be done in edge caching, often in the form of edge gateways. These gateways ingest all of the files from the old servers, including existing security ACLs and shares, and push everything to the cloud while exposing the same experience as a traditional network share. By adding a layer of smart local caching, everything feels as fast and responsive as before,” says Brand.
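The gateway pattern Brand describes can be sketched in a few lines: every write lands in the cloud, while a small local cache keeps hot files fast. In this toy version a plain dict stands in for cloud object storage, and the cache size is an arbitrary assumption; real gateways also handle ACLs, shares, and eviction policies far more carefully.

```python
# Sketch: an edge gateway that pushes everything to the cloud while
# serving recent files from a local LRU cache.

from collections import OrderedDict

class EdgeGateway:
    def __init__(self, capacity=2):
        self.cloud = {}                  # stand-in for cloud object storage
        self.cache = OrderedDict()       # local LRU cache
        self.capacity = capacity

    def write(self, path, data):
        self.cloud[path] = data          # push everything to the cloud...
        self._cache(path, data)          # ...but keep a local copy hot

    def read(self, path):
        if path in self.cache:           # fast path: local hit
            self.cache.move_to_end(path)
            return self.cache[path]
        data = self.cloud[path]          # slow path: fetch from the cloud
        self._cache(path, data)
        return data

    def _cache(self, path, data):
        self.cache[path] = data
        self.cache.move_to_end(path)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict least recently used

gw = EdgeGateway()
gw.write("/cam1/clip1.mp4", b"frame-data")
clip = gw.read("/cam1/clip1.mp4")        # served from the local cache
```

The point of the design is that users see a local-speed file share while the cloud remains the single source of truth, so no new on-premises silo forms.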
Other possible solutions include using more open source applications and platforms and developing industry-wide standards. “The lack of interoperability means that the value of the data being collected is limited. It is siloed only for use by the applications and developers that have intimate knowledge of the data and how to use it,” says Murarka. “But commercially, there is an incentive to lock customers into the datasets and proprietary ‘standard’ a company uses for its own devices, to ensure that customers cannot easily consider competitors.”
Fortunately, pressure is mounting on vendors to stop these lock-in practices. Addressing this issue is much of the impetus behind open source adoption in proprietary software companies. Many business intelligence (BI) applications, for example, enable data ingestion from a variety of sources. However, those tend to focus on either SQL or NoSQL (structured vs. unstructured data), which can further complicate working with siloed data. To combat this, some BI software providers are working toward developing more general BI apps that use machine learning to overcome variety in data formats and the resulting complexities in querying the data.
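The ingestion problem described above comes down to merging records from structured and semi-structured sources into one stream. A minimal sketch, using an in-memory SQL table and a JSON document; the table, field names, and values are invented for illustration:

```python
# Sketch: one pipeline pulling rows from a SQL store (structured) and
# JSON documents (semi-structured) into a single record stream.

import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.execute("INSERT INTO sales VALUES ('west', 1200.0)")

docs = ['{"region": "east", "amount": 800.0, "note": "promo"}']

# Tag each record with its origin so downstream analytics can mix them.
records = [{"region": r, "amount": a, "source": "sql"}
           for r, a in conn.execute("SELECT region, amount FROM sales")]
records += [{**json.loads(d), "source": "json"} for d in docs]
```

Even in this toy case the schemas don't quite line up (the JSON side carries an extra "note" field), which is the variety problem the ML-assisted BI tools mentioned above aim to paper over.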
Sometimes a technology that is currently forming a data silo is ultimately its own answer to the problems it creates. Blockchains are a prime example, as they are on course to become providers of clean, verified data that can then be fed directly into data storage or ingested by analytics like any other data source. Blockchain can ensure data origin, authenticity, and accuracy, while it also guards against editing, says Ron Wince, CEO of Myndshft, an AI and blockchain provider for healthcare organizations. “The ability to process it is similar to what is faced today," Wince says. "Unstructured data is still a problem for generating insights. For that, the same tools that companies are using today for unstructured data are still the best bets.”
All paths are leading to data decentralization
While enterprises have to stay sharp to prevent the formation of more data silos and break data free whenever it does get trapped, a larger trend is forming in the IoT sphere that will likely resolve most future data silo problems: data decentralization.
"The best way to address the problem is to keep the data from going into silos in the first place," says Duncan Pauly, CTO of Edge Intelligence, a distributed analytics provider. "Edge computing services are rapidly maturing to offer greater capabilities that analyze vast amounts of IoT-generated and other kinds of big data in real time, wherever it resides and without needing to move it to a centralized location, so there isn’t the issue of data going into unreachable silos. For legacy data, flexible/configurable data import functionality included in these evolving services ensures that potential silos are removed, by making it possible to analyze newly generated data along with legacy systems, all at once."
Return of data silos: Lessons for leaders
- A problem once thought solved is returning.
- Open standards and open source may drive solutions to the silo problem.
- Expect the massive increase in edge data to exacerbate the issue.
This article/content was written by the individual writer identified and does not necessarily reflect the view of Hewlett Packard Enterprise Company.