How the pandemic is driving data literacy
This article was first published in The new IT playbook, a report that explores what it means to be resilient and adaptable in the face of disruption.
The novel coronavirus appeared abruptly in early 2020 and spread across the globe within weeks. No country was spared the ravages of COVID-19, the potentially lethal disease it causes. The disease continues to stump doctors and researchers because its early symptoms, clinical presentation, and lingering health effects vary so widely from one person to the next.
Tremendous amounts of data have been, and are still being, collected and shared as the entire scientific world collaborates in a massive, historic push to stop the seemingly unstoppable. Finding treatments and vaccines is, of course, the intended end goal. But it turns out that the data has much more to reveal.
Microorganism generates macro data and the big picture
As the world stood still and united in lockdowns, massive amounts of data on the virus and on the affected nations were shared among multiple entities, including academic institutions such as Johns Hopkins University, national governments, and international organizations such as the World Health Organization (WHO).
Some historical data had existed for quite some time, but much of it was hard to access. Newer data was often proprietary and closed off to many researchers and competing entities. Then the pandemic hit, and suddenly these datasets were made accessible to everyone. The data is available in many formats, from programmatically accessible APIs and downloadable comma-delimited (CSV) files to ready-made data visualizations.
"The amount of data on the pandemic that is suddenly available is breathtaking. It's not just medical data, but economic data, societal data, community and world response data, educational and cultural impact data, remote and migrated workforce data, and on and on," says Iveta Lohovska, principal data scientist at Hewlett Packard Enterprise.
Models were built, algorithms coded, inputs weighted, queries run, and artificial intelligence set loose on the data as researchers around the world mined it from multiple perspectives with a single purpose: to save lives.
We have rarely been better informed about the current state of anything, and the discoveries have been many and surprising. Among them, this newfound wealth of data revealed the true power of data aggregation: combining datasets from different sources to expose patterns that no single dataset shows on its own.
Going beyond immediate results
Initially, the thinking was that, given the immediacy of the threat, real-time data and analytics were the sole aim. The lesson eventually learned is that immediate insights are only part of the story. Whenever new statistics were published, everyone had immediate access to them and ran them through their own models. As a result, a wide variety of results were generated that, on the surface, could even conflict with one another despite being based on the same data. This is where understanding how the results were derived became critical.
Only a limited number of conclusions can be drawn from the number of active and resolved cases per nation and region. Over time, those numbers can show a trend, and they also give a very real snapshot of where we stand today. However, if we layer on additional data, such as what actions were taken and when, along with data from broader macroeconomic sources, we get a much clearer picture of each strategy's impact over time.
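To make the idea of layering concrete, here is a minimal sketch that joins daily case counts with a timeline of actions and compares average daily cases in the week before and the week after an action. Everything in it is hypothetical: the region, the numbers, the `interventions` mapping, and the `before_after` helper are invented for illustration and do not come from any real dataset or tool.

```python
from datetime import date, timedelta

# Hypothetical daily new-case counts for one region (illustrative numbers only).
start = date(2020, 3, 1)
daily_cases = [120, 135, 150, 170, 190, 210, 230, 240, 235, 220, 200, 180, 160, 150]
cases_by_day = {start + timedelta(days=i): n for i, n in enumerate(daily_cases)}

# Hypothetical timeline of actions taken in the same region.
interventions = {date(2020, 3, 8): "stay-at-home order"}


def before_after(cases, action_day, window=7):
    """Mean daily cases in the `window` days before vs. from `action_day` onward."""
    before = [n for d, n in cases.items()
              if action_day - timedelta(days=window) <= d < action_day]
    after = [n for d, n in cases.items()
             if action_day <= d < action_day + timedelta(days=window)]
    return sum(before) / len(before), sum(after) / len(after)


# Layer the action timeline on top of the case counts.
for day, action in interventions.items():
    pre, post = before_after(cases_by_day, day)
    print(f"{action} on {day}: {pre:.1f} cases/day before, {post:.1f} after")
```

With these made-up numbers the week after the action averages more cases than the week before (about 197.9 vs. 172.1), even though daily cases are falling by the end of the series. A naive before/after comparison ignores the lag between an action and its visible effect, which is exactly the kind of context a reader needs before drawing conclusions.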
Crisis data wins and warnings
Nations took different approaches depending on their specific characteristics, ranging from current political forces to culture and other socioeconomic factors. Side-by-side comparisons of their strategies and of how effectively they combated the spread of the novel coronavirus offer several valuable lessons, but only if the reasons for those differences are well understood and accounted for.
These comparative studies are helping organizations and governments make decisions going forward. In fact, the data we are producing today by processing all of these feeds may turn out to be far more valuable for the next pandemic than it is for this one.
Data illiteracy is widespread, both in corporate workforces and in the general population. Yet the urgency of the crisis pushes information out quickly, typically with little regard for how untrained readers will interpret it without the necessary context. Those releasing the data tend to assume it will be interpreted by subject matter experts (SMEs), so the data is often incomplete or lacks context: an SME is expected to already have the background data and to aggregate it, whereas unintended recipients may not even be aware that this is necessary. This is one reason the same data can lead to such varied interpretations.
For example, a reduction in infections within a country could be read as a sign that the situation is improving when it is merely expected variation. At the time of writing, the total number of recorded deaths from the novel coronavirus stood at more than 325,000 and continues to rise. Whether a number looks large or small depends on the time frame, the geographic scale, the demographic makeup of the affected population, and whether it is expressed as an absolute count or as a percentage of the population.
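Two common ways of adding that context are normalizing counts by population and smoothing daily counts with a moving average. The sketch below shows both on hypothetical numbers; the region names, figures, and helper functions are invented for illustration, and the 7-day trailing window is one common choice rather than a standard prescribed by the article.

```python
# Hypothetical regions: identical raw counts can mean very different things.
regions = {
    # name: (new cases reported this week, population)
    "Region A": (5_000, 1_000_000),
    "Region B": (5_000, 50_000_000),
}


def per_100k(count, population):
    """Express a raw count as a rate per 100,000 residents."""
    return count * 100_000 / population


for name, (cases, pop) in regions.items():
    print(f"{name}: {cases} cases, {per_100k(cases, pop):.1f} per 100k")


def rolling_mean(values, window=7):
    """Trailing moving average; the first window-1 days produce no value."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


# Hypothetical daily counts with a dip on days 6 and 7,
# such as one caused by reduced reporting over a weekend.
daily = [300, 310, 305, 320, 315, 180, 170, 325, 330, 320]
print([round(x, 1) for x in rolling_mean(daily)])
```

The same 5,000 cases amount to 500 per 100,000 residents in Region A but only 10 per 100,000 in Region B. Likewise, the raw series drops by almost half on day 7, while the 7-day average keeps rising (271.4, 275.0, 277.9, 280.0), so the apparent improvement is just variation.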
Improved data literacy improves future responses
Enterprises encounter similar issues in a business crisis or a natural disaster. Improving data literacy now can go a long way toward preparing an organization to survive, and even thrive, through the next disaster. These skills also transfer today: applied to market data or supply chain analysis, the same principles yield much better insights when the consumer understands basic data literacy and the sources of the data.
Other lessons are being learned as well, such as cautions against unfettered data collection. In short, more data is not always better. That is true partly because data has a defined shelf life. Data fluctuates, particularly medical data, and data surrounding a specific event can become less relevant over time. Quality also degrades as less attention is paid to keeping the data clean and current. Privacy concerns may also emerge over time, leading to data pollution: people deliberately feeding in false information to protect their privacy.
Teams at Hewlett Packard Enterprise and other organizations have been presenting information not in its raw format but normalized, cleaned, and set alongside other influencing data. Sharing this contextualized data with the community draws valuable insights out of the mass of data we have. And augmenting that data to provide new contexts and new insights is proving ever more valuable in helping societies react to, and even predict, the impact of COVID-19. The same will hold true for corporations seeking refined, accurate predictions and actions in any crisis.
By analyzing every facet of the world's biggest collective crisis, corporations can learn how to analyze data quickly and effectively when the next crisis arrives. In the end, a truly data-driven organization is one that can capture the meaning of the moment and respond with meaningful corrective or opportunistic action in record time.
Where the big picture and the bottom line meet
What all of this work on novel coronavirus pandemic-related data has taught us, as eager data scientists and engineers, is that there are two personas when it comes to data science: producers and consumers.
Consumers come in multiple levels: the intended audience, often professionals who bring their own implied context, and passive observers who encounter the data through news reports, internet searches, and general discovery.
The data scientists and engineers are the producers. They have a specific view of the data, a firm understanding of how to interpret it, and a sense of which parts can safely be disregarded. For them, a scatter plot, a hexbin map, or a similar visualization can be processed intuitively and provides immediate understanding.
Consumers lack the experience and training needed to make such judgments. Data should therefore be presented to them with the intent to fully inform, which means accounting for likely assumptions and pertinent context.
The data parsing and interpretation skills the population can learn from this approach, along with the skills data professionals refine in presenting information in consumable packages, will likely combine to produce a level of data literacy never seen before. That same skill set can then be used to present social, political, financial, and many other kinds of data to a data-savvy populace.
The best defense for the next event
In summary, current work focused on the news about COVID-19 and surrounding data is only the tip of a very large iceberg. This reactive response is likely not the work that will have the most impact over the longer term.
Educating millions of people about the meaning of data presented in context will drive new social conversations far into the future. It will help all of us understand how our societies and economies really work and what our priorities should be, so that when the next pandemic hits, we are ready and informed.
The learning we are doing now will be the best defense for the next event, while helping us make immediate decisions to inform our reaction to this one.
This article/content was written by the individual writer identified and does not necessarily reflect the view of Hewlett Packard Enterprise Company.