Enabling modern data analytics with HPE Ezmeral Container Platform and StreamSets

Learn how StreamSets and HPE Ezmeral Container Platform enable modern data analytics and extend analytics to a broader user base across the organization.

Solution brief

Enterprise data analytics has evolved from being a business intelligence (BI) function focusing on historical reporting and dashboards to a democratized function where any user can access and analyze information using BI tools or advanced analytics techniques.


  • DataOps—DevOps for data integration


    Today’s business success hinges on advanced analytics techniques that leverage data from multiple sources, from edge to cloud. The ever-expanding volume and variety of data put increasing pressure on the systems that ingest, store, transform, and deliver this data to data analytics and machine learning applications.


    Modern data analytics requires enterprises to extend analytics to a broader user base across the organization. It requires datasets that augment structured data from on-premises databases with semi-structured and unstructured data stored in flexible and cost-effective platforms such as HDFS, cloud object stores, and even smart-edge devices. As the number of data users grows, enterprise data teams need to ensure data access for various applications, from SQL queries and data mining to data science, advanced analytics, machine learning, deep learning, and artificial intelligence. Enterprises also need a flexible platform that allows their data users to spin up data analytics, data science, and AI/ML environments on a shared pool of resources, all while securely accessing enterprise data sources without data duplication.

  • HPE Ezmeral Container Platform and StreamSets bring DataOps to the Enterprise

    The HPE Ezmeral Container Platform and StreamSets solution combines the benefits of a modern DataOps platform with a best-in-class container platform built on 100% open-source Kubernetes.

    It enables enterprises to continuously flow big, streaming, and traditional data to their data science and data analytics applications. This solution uniquely handles data drift—those frequent and unexpected changes to upstream data that break pipelines and damage data integrity—while allowing for the execution of any-to-any pipelines, ETL processing, and machine learning. An intuitive operations portal allows for continuous automation and monitoring of complex multipipeline topologies.
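To make the idea of data drift concrete, the following is a minimal, generic Python sketch of drift-tolerant record handling. It is not StreamSets code and does not use any StreamSets API. The schema `EXPECTED_FIELDS`, the rename table `FIELD_ALIASES`, and the function `normalize_record` are all hypothetical names chosen for illustration. The sketch shows one way a pipeline stage could absorb renamed, missing, retyped, or unexpected fields and report them instead of failing.

```python
from __future__ import annotations

from typing import Any

# Hypothetical target schema: field name -> (expected type, default value).
EXPECTED_FIELDS: dict[str, tuple[type, Any]] = {
    "device_id": (str, ""),
    "temperature": (float, 0.0),
    "timestamp": (int, 0),
}

# Hypothetical table of known upstream renames: old name -> current name.
FIELD_ALIASES: dict[str, str] = {"temp": "temperature", "ts": "timestamp"}


def normalize_record(record: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Coerce a record to the expected schema and list the drift observed."""
    drift: list[str] = []

    # Step 1: map renamed fields back to their expected names.
    renamed: dict[str, Any] = {}
    for key, value in record.items():
        target = FIELD_ALIASES.get(key, key)
        if target != key:
            drift.append(f"renamed field: {key} -> {target}")
        renamed[target] = value

    # Step 2: fill in missing fields and coerce values to the expected type.
    normalized: dict[str, Any] = {}
    for name, (expected_type, default) in EXPECTED_FIELDS.items():
        if name not in renamed:
            drift.append(f"missing field: {name}")
            normalized[name] = default
            continue
        value = renamed[name]
        try:
            normalized[name] = expected_type(value)
        except (TypeError, ValueError):
            drift.append(f"bad value for {name}: {value!r}")
            normalized[name] = default
            continue
        if not isinstance(value, expected_type):
            drift.append(f"type changed for {name}: {type(value).__name__}")

    # Step 3: report fields the schema does not know about, then drop them.
    for name in renamed:
        if name not in EXPECTED_FIELDS:
            drift.append(f"unexpected field: {name}")

    return normalized, drift


if __name__ == "__main__":
    sample = {"device_id": "a1", "temp": "21.5", "ts": 1700000000, "firmware": "2.0"}
    clean, notes = normalize_record(sample)
    print(clean)
    for note in notes:
        print(note)
```

In this sketch a drifted record still produces a usable row, and the drift notes can be routed to monitoring, so a downstream consumer is not broken by an upstream change.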


    StreamSets on the HPE Ezmeral Container Platform gives data teams the flexibility to create data pipelines for batch and streaming data, with clear visibility into their data operations and performance.


    Ease of use: With an easy-to-use drag-and-drop interface, StreamSets enables the entire data team, regardless of expertise, to create data pipelines for their data applications. Data scientists, data analysts, and software developers can use the drag-and-drop interface of the StreamSets UI to build data pipelines that execute on Apache Spark or other stream processing engines running on containerized single-node or multinode compute clusters on the HPE Ezmeral Container Platform.


    Any data, any format: The data that organizations need to make the best business decisions arrives from multiple sources in multiple forms. This solution eliminates the overhead of managing multiple tools for different data formats. With StreamSets on HPE Ezmeral Container Platform, users can connect to any data source and access it through their choice of tools and data processing engines.


    Extreme extensibility: This solution not only allows for data pipelines with any data source, format, or environment, it is also extensible for use by any member of the data team. StreamSets provides higher-order transformations for data engineers, SQL-based queries with SparkSQL for analysts, PySpark for data scientists, and custom Java/Scala processors for Apache Spark developers.


    Automated operations: The HPE Ezmeral Container Platform’s app store enables self-service, one-click deployment of applications from a curated collection of validated and certified application and microservice images with out-of-the-box configuration of networking, load balancing, and storage.


    Hybrid deployments: The HPE Ezmeral Container Platform offers a performance-optimized solution to deploy and manage data pipelines on any infrastructure, either on-premises or in multiple public clouds.


    Enterprise-ready: Built on a proven, enterprise-class, multitenant container platform with 100% open-source Kubernetes and integrations into security and authentication services, the solution supports high availability, fault tolerance, and resiliency for business-critical data applications.

  • Key benefits
    • Speed: Unified console for collaboration and visibility across all lifecycle stages, all design patterns, all engines. Rapid access to data science and advanced data analytics environments with in-place access to data sources.
    • Flexibility: A centralized dashboard to monitor and manage multiple data sources, with the ability to add or change sources or platforms without losing visibility or control and to deploy and manage across your hybrid infrastructure.
    • Resiliency: Live data maps (aka topologies) for real-time operational insights with broad visibility to detect and prevent issues and lower deployment risk.
    • Enterprise-grade: Multitenancy with data isolation to ensure logical separation between users, plus integrations into enterprise security and authentication mechanisms such as LDAP, Active Directory, and Kerberos.

    The ability to process data and make it available for analytics—rapidly, accurately, and securely—has become a core competitive capability for enterprises across industries and geographies. This ability is crucial for enabling analytics, but tools have not kept pace with today’s fast-multiplying data sources and formats.


    HPE Ezmeral Container Platform and StreamSets combine to deliver a next-generation DataOps solution. This solution—built on a best-in-class container platform—streamlines your data pipelines with on-demand, flexible access to containerized environments for various data science and data analytics use cases to deliver business-critical analytics, insights, and predictions.


Figure 1. StreamSets Control Hub





The Intel logo is a trademark of Intel Corporation in the U.S. and other countries. Active Directory is either a registered trademark or trademark of Microsoft Corporation in the United States and/or other countries. All third-party marks are property of their respective owners.