Cloud Data Platform

What is a Cloud Data Platform?

A cloud data platform is a data center that is hosted in the cloud, including servers and data storage. It provides virtualized access to data from multiple sources in multiple locations.

What does a cloud data platform do?

One step in an organization’s digital transformation involves migrating its data ecosystem and enterprise data to the cloud from its traditional on-premises data centers or warehouses. A cloud data platform is where these resources are relocated to, allowing enterprises to create a data lake that can be accessed anywhere, any time. With this “democratized” data, both structured and unstructured data can be rapidly ingested to empower analytics. The platform can also scale quickly as data and analytics needs change.

Why do enterprises use a cloud data platform?

By using a cloud data platform, enterprises gain an easier way to leverage their data. It allows the data to be managed, secured, and viewed from any location, both remotely and on-premises. These virtual data platforms offer the reliability of an on-premises data warehouse with an affordability that physical hardware cannot match. Organizations use these platforms to gain a much more flexible data exchange, which then empowers more informed business decisions.

Cloud data platform elasticity

Cloud data platforms are far more elastic than their on-premises counterparts and provide an integrated view into the data hosted on the platform. These platforms enable full observability of everything running on them, including CPU and memory utilization, as well as insight into what queries are running and how they can be optimized. 

Data is stored in clusters, and by observing actual workload behavior, an enterprise can grow or shrink a cluster to avoid underutilized capacity.
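
As a rough illustration of the sizing decision described above, the following Python sketch suggests a node count from observed CPU utilization. The function name, the 60% default target utilization, and the node limits are assumptions made for this example, not settings of any particular platform.

```python
import math


def recommend_node_count(current_nodes, cpu_samples, target_utilization=0.6,
                         min_nodes=1, max_nodes=64):
    """Suggest a cluster size that runs the observed peak near the target.

    cpu_samples are fractions (0.0 to 1.0) of the current cluster's capacity.
    All thresholds here are illustrative assumptions.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    if not cpu_samples:
        # No observations yet: keep the cluster as it is.
        return current_nodes

    peak = max(cpu_samples)
    # Peak load expressed in "node equivalents", then spread so that each
    # node runs at roughly the target utilization.
    needed = current_nodes * peak / target_utilization
    # Round away floating-point noise before taking the ceiling.
    nodes = math.ceil(round(needed, 6))
    return max(min_nodes, min(max_nodes, nodes))


if __name__ == "__main__":
    # An 8-node cluster that never exceeds 25% CPU can shrink.
    print(recommend_node_count(8, [0.125, 0.25], target_utilization=0.5))
```

Sizing from the peak rather than the average is a conservative choice; a real platform would also weigh memory, query concurrency, and how quickly nodes can be added.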

Moving to a cloud data platform

CIOs often find it difficult to predict the peak usages of their enterprise, making it likely that they will overprovision their data warehouses to avoid performance problems. As a result, the case to modernize data resources and move them to a cloud data platform that can quickly scale seems obviously beneficial. 

However, many CIOs are slow to give up on more than six decades of running and maintaining their workloads on-premises. To stay on top of their data, enterprises need to do a cost-benefit analysis for a potential switch to a cloud data platform. Fundamentally, they need to decide if the cost of migration and new licenses outweighs the cost of overprovisioning and long-term operations.
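
The comparison above can be sketched as simple arithmetic. The Python example below is a minimal sketch with invented figures; the function names, the capacity units, and the prices are all hypothetical, and a real analysis would include staffing, licensing terms, and data transfer costs.

```python
def annual_on_prem_cost(provisioned_units, cost_per_unit):
    """Yearly cost of capacity sized for peak demand (overprovisioned)."""
    return provisioned_units * cost_per_unit


def annual_cloud_cost(average_units_used, cost_per_unit):
    """Yearly cost under a consumption-based model: pay for average usage."""
    return average_units_used * cost_per_unit


def break_even_years(migration_cost, on_prem_annual, cloud_annual):
    """Years until the one-time migration cost is repaid by yearly savings.

    Returns None when the cloud is not cheaper per year, because the
    migration cost is then never recovered.
    """
    annual_saving = on_prem_annual - cloud_annual
    if annual_saving <= 0:
        return None
    return migration_cost / annual_saving


if __name__ == "__main__":
    # Hypothetical figures: 100 units provisioned on-premises for peak load,
    # but only 40 units used on average in the cloud at a higher unit price.
    on_prem = annual_on_prem_cost(100, 1_000)
    cloud = annual_cloud_cost(40, 1_500)
    print(break_even_years(200_000, on_prem, cloud))
```

With these made-up numbers the migration pays for itself after five years; if average usage were close to peak, the function would return None and staying on-premises would look cheaper.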

What is the architecture of a cloud data platform?

A typical data platform is made up of several components that handle different aspects of data management. The architecture is layered into:

  • Data lineage
  • Data security and audit logging
  • Metadata, business glossary, data catalog, and data search
  • Storage and compute
  • Data governance
  • Data quality and data trust

The cloud itself allows users to decouple all components of data platforms, which helps enterprises scale applications and avoid getting locked into any vendor’s proprietary tools. And most cloud data platform providers separate compute and storage for better data control and agility. 

Data is first imported and then cleaned in data pipelines. As for storage, cloud data platforms store data in two tiers: one for “hot” data and the other for “cold” data. The first tier is memory, where the data index and the most frequently accessed data are held. The second tier is persistent storage: local or persistent disk (often a solid-state disk) or, more typically, basic cloud object storage. This tier usually delivers slower performance.

To store data, the cloud data platform first writes updates to the fastest in-memory tier and then copies out to the cloud object storage tier to help improve overall performance. The hot data tier pulls data up from the cold data tier when queried, and looks at the data on a very deep, granular level, which eases the path toward business-critical insights.
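
The write and read path described above can be sketched in a few lines of Python. This is a toy model under stated assumptions: the class name TieredStore is hypothetical, a plain dictionary stands in for cloud object storage, and least-recently-used eviction is one possible policy, not necessarily what any given platform uses.

```python
from collections import OrderedDict


class TieredStore:
    """Toy two-tier store: a small in-memory hot tier over a cold tier."""

    def __init__(self, hot_capacity=2):
        self.hot = OrderedDict()  # in-memory tier, most recently used last
        self.cold = {}            # stand-in for cloud object storage
        self.hot_capacity = hot_capacity

    def _place_in_hot(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)
        # Evict least recently used entries once the hot tier is full.
        while len(self.hot) > self.hot_capacity:
            self.hot.popitem(last=False)

    def put(self, key, value):
        # Write to the fast tier first, then copy out to the cold tier.
        self._place_in_hot(key, value)
        self.cold[key] = value

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        # Hot-tier miss: pull the value up from the cold tier.
        value = self.cold[key]  # raises KeyError if the key was never stored
        self._place_in_hot(key, value)
        return value


if __name__ == "__main__":
    store = TieredStore(hot_capacity=2)
    for k, v in [("a", 1), ("b", 2), ("c", 3)]:
        store.put(k, v)
    print("a" in store.hot)   # evicted from memory, still in cold storage
    print(store.get("a"))     # pulled back up from the cold tier
```

Because every write is copied to the cold tier, evicting an entry from memory never loses data; it only makes the next read of that key slower.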

What are the advantages and disadvantages of cloud data platforms?

As workloads fluctuate and unstructured data volume continues to rise, the pressure to modernize IT is accelerating. However, organizations need to carefully consider whether and how to incorporate cloud infrastructure, such as cloud data platforms, into their IT ecosystem. 

Advantages

  • Flexibility: As data and analytics needs evolve, cloud data platforms can scale capacity quickly and easily.
  • Visibility: Cloud data platforms rapidly ingest structured and unstructured data, which empowers faster analytics.
  • Access: Moving resources to the cloud facilitates creation of a data lake to democratize data and share it anywhere and anytime.
  • Right-sized costs: Rather than paying for an overprovisioned system, using a cloud data platform with its consumption-based model allows enterprises to pay only for what they use, as they use it.

Disadvantages

  • Utilization: Data center utilization can quickly change from full capacity to two-thirds utilization as workloads are moved to the cloud. Skipping a single server-refresh cycle can create that scenario.
  • Complexity: Shifting workloads can increase the complexity of IT operations, because decisions to ramp up or down are made case by case in response to changes in business priorities, portfolio, or workloads.
  • Increased compliance pressure: Data privacy and data residency regulations continue to evolve, so decisions about where workloads can run may need to be revisited as the rules change.

How are cloud data platforms used?

The elastic nature of cloud data platforms makes them an ideal tool for responding to changing workloads, business goals, and markets. But how exactly do businesses use them? Read below for a few use cases:

  • Data consolidation: Rather than using multiple spreadsheets and other flat-file data sources, analysts use cloud data platforms to build a “data mart.” There they can easily load and optimize data from multiple sources for analysis and actionable insights.
  • Operational insight: Data on a cloud data platform can be easily integrated with business-critical applications, offering a simple way for results to be operationalized and fed back into applications to enable data-driven decisions.
  • Versatile analysis: Data analysts all have their own favorite tools, particularly open-source tools, which can be incompatible with fixed data platforms. Cloud data platforms offer full interoperability, which enables subscribers to plug in their own tools and use them within the platform. This way, they can migrate insights to another tool if needed and prevent vendor lock-in.
  • Streaming data processing: A cloud data platform combines the abilities of a data lake and a data warehouse to process streaming data and other unstructured enterprise data, enabling machine learning (ML).
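
To make the data consolidation use case above more concrete, here is a minimal Python sketch that loads two flat-file sources into one queryable store and joins them. The CSV contents, table names, and column names are invented for illustration, and an in-memory SQLite database stands in for the data mart; a real cloud data platform would use its own loading and query tools.

```python
import csv
import io
import sqlite3

# Two hypothetical flat-file sources, inlined so the example is self-contained.
SALES_CSV = "region,amount\nnorth,100\nsouth,250\nnorth,50\n"
TARGETS_CSV = "region,target\nnorth,120\nsouth,300\n"


def load_csv(conn, table, text):
    """Create a table from CSV text, using the header row as column names."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0].keys())
    conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {table} VALUES ({placeholders})",
        [tuple(row[c] for c in columns) for row in rows],
    )


def build_mart():
    """Load both sources into one in-memory database (the toy data mart)."""
    conn = sqlite3.connect(":memory:")
    load_csv(conn, "sales", SALES_CSV)
    load_csv(conn, "targets", TARGETS_CSV)
    return conn


def sales_vs_target(conn):
    """Total sales per region next to that region's target."""
    query = """
        SELECT s.region,
               SUM(CAST(s.amount AS INTEGER)) AS total,
               CAST(t.target AS INTEGER) AS target
        FROM sales AS s
        JOIN targets AS t ON t.region = s.region
        GROUP BY s.region, t.target
        ORDER BY s.region
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in sales_vs_target(build_mart()):
        print(row)
```

Once the sources sit in one store, a single query answers a question that would otherwise require reconciling two spreadsheets by hand.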

HPE and cloud data platforms

Organizations face many challenges in managing their data: not just how to optimize data workloads on the cloud, but also how to optimize them in hybrid environments that comprise edge, data center, cloud, and multi-cloud infrastructure. HPE offers an edge-to-cloud platform for users to run applications and services on-premises and in the cloud, along with services to manage the workload. For example, the growing portfolio of HPE GreenLake cloud services includes:

  • Analytics: Open and unified analytics cloud services to modernize all data and applications everywhere—on-premises, at the edge, and in the cloud.
  • Data protection: Disaster recovery and backup cloud services to help customers take ransomware head-on and secure data from edge to cloud.
  • HPE Edge-to-Cloud Adoption Framework and automation tools: A comprehensive, proven set of methodologies, expertise, and automation tools to accelerate and de-risk the path to a cloud experience everywhere.
  • HPE Ezmeral Data Fabric Object Store: A Kubernetes-based storage technology that will run across hybrid environments. It enables users to combine different types of data from files, object event streams, and databases into the same data fabric.

HPE also recently introduced Ezmeral Unified Analytics, a cloud data lakehouse platform built with a group of open-source technologies that provide a data fabric for users to run data analytics and business intelligence workloads without being locked into any single vendor’s technologies.