StreamSets speeds data integration for data lakes and data warehouses across hybrid and multi-cloud environments. High-performance execution engines combined with a powerful management hub give you the flexibility and resiliency you need to deliver continuous data in the face of constant change.

Product Name

StreamSets Data Collector

Product Version


HPE Ezmeral Runtime Version



StreamSets is a modern DataOps platform that enables enterprises to continuously flow big, streaming, and traditional data to their data science and data analytics applications. StreamSets uniquely handles data drift—those frequent and unexpected changes to upstream data that break pipelines and damage data integrity—while allowing for the execution of any-to-any pipelines, ETL processing, and machine learning. An intuitive operations portal allows for continuous automation and monitoring of complex multipipeline topologies. 

StreamSets on the HPE Ezmeral Runtime gives data teams the flexibility to create data pipelines for batch and streaming data, with clear visibility into their data operations and performance.

Ease of use: With a drag-and-drop interface, StreamSets enables the entire data team, regardless of expertise, to create data pipelines for their data applications. Data scientists, data analysts, and software developers can use the StreamSets UI to build data pipelines that execute on Apache Spark or other stream processing engines running on containerized single-node or multinode compute clusters on the HPE Ezmeral Runtime.

Any data, any format: The data that organizations need to make the best business decisions arrives from multiple sources in multiple forms. This solution eliminates the overhead of managing multiple tools for different data formats. With StreamSets on HPE Ezmeral Runtime, users can connect to any data source and access it through their choice of tools and data processing engines.

Extreme extensibility: This solution not only supports data pipelines for any data source, format, or environment but is also extensible for use by any member of the data team. StreamSets provides higher-order transformations for data engineers, SQL-based queries with SparkSQL for analysts, PySpark for data scientists, and custom Java/Scala processors for Apache Spark developers.

Additional Information

Evaluate StreamSets Data Collector.   

Explore HPE Ezmeral Runtime, the industry’s first enterprise-grade container platform for cloud-native and distributed non-cloud-native applications.

Interested in learning more about the HPE Ezmeral Runtime and StreamSets? Please contact us.

Explore other featured applications