HPE EZMERAL MARKETPLACE


Run:AI

Helps organizations manage Graphics Processing Unit (GPU) resource allocation and increase cluster utilization. Applies advanced scheduling mechanisms to dynamically set policies and orchestrate jobs, optimizing resource utilization for IT Ops teams and data scientists.

Product Name: Run:AI Orchestration Platform

Product Version: 1.0.43

Ezmeral Container Platform Version: 5.0

Overview

Run:AI’s software builds on powerful distributed computing and scheduling concepts and is implemented as a simple plugin to HPE Ezmeral Container Platform. Together, the HPE Ezmeral Container Platform and the Run:AI GPU orchestration solution enable dynamic provisioning of GPUs, so that resources can be easily shared, AI/ML workloads orchestrated more efficiently, and resource use optimized. With Run:AI, data scientists can seamlessly consume massive amounts of GPU power to accelerate their research.

Run:AI creates a virtualization and acceleration layer over GPU resources that manages granular scheduling, prioritization, and allocation of compute power. A dedicated batch scheduler, running on top of HPE Ezmeral Container Platform, manages GPU-based workloads and includes mechanisms for creating multiple queues, setting fixed and guaranteed resource quotas, and managing priorities, policies, and multi-node training. It provides an elegant solution to simplify complex ML scheduling processes.

Current standards for orchestrating AI workloads rely on static resource allocations and lack the ability to schedule dynamic access to GPUs. Run:AI provides GPU resource optimization that enables:

● Efficient use of resources – jobs run on as many GPUs as they need, subject to availability across the entire environment, with each essentially getting a ‘guaranteed quota’ of compute resources from a shared pool

● Simplified GPU sharing – dynamic resource allocation removes static allocation hassles

● Fractional GPU – multiple workloads can share a single GPU for more efficient resource utilization

● Automated job scheduling – jobs run concurrently as long as there are available resources, greatly reducing the time for training tasks like hyperparameter tuning

● Granular monitoring of GPU usage – by cluster, node, project, and job
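
The ‘guaranteed quota’ idea in the list above can be illustrated with a toy model. The sketch below is not Run:AI’s scheduler or API. It is a minimal Python illustration, under the assumption that each project has a guaranteed GPU quota and may borrow idle GPUs from the shared pool when other projects are under-using theirs. The names `Project`, `quota`, `demand`, and `allocate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Project:
    name: str
    quota: int   # guaranteed number of GPUs (hypothetical field)
    demand: int  # GPUs the project's pending jobs are asking for


def allocate(projects: List[Project], total_gpus: int) -> Dict[str, int]:
    """Toy allocation: satisfy guaranteed quotas first, then lend idle GPUs."""
    # Step 1: each project receives its demand, capped at its guaranteed quota.
    alloc = {p.name: min(p.demand, p.quota) for p in projects}
    idle = total_gpus - sum(alloc.values())
    if idle < 0:
        raise ValueError("guaranteed allocations exceed cluster capacity")

    # Step 2: lend idle GPUs one at a time, round-robin, to projects whose
    # demand is still above what they currently hold.
    while idle > 0:
        hungry = [p for p in projects if alloc[p.name] < p.demand]
        if not hungry:
            break
        for p in hungry:
            if idle == 0:
                break
            alloc[p.name] += 1
            idle -= 1
    return alloc


# Example: an 8-GPU pool shared by two projects with a quota of 4 GPUs each.
projects = [Project("vision", quota=4, demand=7), Project("nlp", quota=4, demand=1)]
print(allocate(projects, total_gpus=8))  # {'vision': 7, 'nlp': 1}
```

In this example the "nlp" project needs only one of its four guaranteed GPUs, so the three idle GPUs are lent to "vision". Under a static allocation they would sit unused. A real scheduler would also need to reclaim borrowed GPUs when the lending project's demand rises, which this sketch leaves out.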

Documentation
Additional Information

Evaluate Run:AI with a free trial

Explore HPE Ezmeral Container Platform, the industry’s first enterprise-grade container platform for cloud-native and distributed non-cloud-native applications.

Interested in learning more about the HPE Ezmeral Container Platform and Run:AI? Please contact us.
