HPE Machine Learning Development System

Scale AI model training from idea to impact with minimal code rewrites or infrastructure changes.


Discover and scale high-quality AI models seamlessly—focusing on innovation, not infrastructure

Unleash your AI models, developing and training at scale without hassle or delays. Instead of addressing infrastructure issues, iterate and collaborate on inventive models and leverage the best ones.

Avoid IT complexity with a turnkey AI system

Accelerate model training at scale with a purpose-built, preconfigured, fully installed solution that is performant out of the box and built on a flexible architecture with a foundation for heterogeneous accelerator support. Create the right environment for AI success with a system that is validated and backed by solution-level support.

Confidently advance your vision for AI with resources and reliability from an expert, trusted partner

Propel your AI initiatives forward with an industry leader uniquely qualified to deliver a purpose-built, high-performance system. Build AI competencies rather than AI systems by drawing on a wealth of expertise, insights, and support.

First-of-its-kind, purpose-built system for AI model training at scale

Only the HPE Machine Learning Development System combines HPE hardware and third-party accelerators with HPE Performance Cluster Manager and HPE Machine Learning Development Environment software. The combination delivers faster setup, monitoring of infrastructure and resource utilization by workload, and faster, easier model development and training at scale. Unlike other solutions, the HPE Machine Learning Development System saves you time, resources, and costs with the industry’s first turnkey, purpose-built AI solution, which integrates all hardware, software, and services to create a faster path to more accurate AI models and model training at scale.

An AI Turnkey Solution

An AI turnkey solution for model training and development at scale

The HPE Machine Learning Development System is a turnkey solution that combines model training and development software with high-performance computing in an optimized AI infrastructure, including accelerators. It is backed by expert installation and support services, validated and performant out of the box, and ready for model training and development on Day One. Features like distributed training let you perform machine learning across GPU clusters without rewriting code or restructuring infrastructure, and automated hyperparameter optimization finds and trains more accurate models faster.
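To make the idea of automated hyperparameter optimization concrete, the sketch below runs a plain random search over a learning rate and a batch size. It is an illustration only: `toy_validation_loss`, the search ranges, and the trial count are hypothetical stand-ins, and random search is used here simply because it is easy to follow; the system's own search method is not described on this page.

```python
import math
import random


def toy_validation_loss(learning_rate: float, batch_size: int) -> float:
    """Hypothetical stand-in for training a model and measuring validation loss.

    The minimum is placed at learning_rate=0.01 and batch_size=64 purely for
    illustration; a real trial would train and evaluate a model instead.
    """
    lr_term = (math.log10(learning_rate) + 2.0) ** 2
    bs_term = (math.log2(batch_size) - 6.0) ** 2
    return lr_term + 0.1 * bs_term


def random_search(num_trials: int, seed: int = 0) -> list:
    """Sample hyperparameters at random and record the loss of every trial."""
    rng = random.Random(seed)
    trials = []
    for _ in range(num_trials):
        params = {
            # Sample the learning rate log-uniformly between 1e-4 and 1e-1.
            "learning_rate": 10 ** rng.uniform(-4.0, -1.0),
            # Pick a batch size from a small set of powers of two.
            "batch_size": rng.choice([16, 32, 64, 128, 256]),
        }
        trials.append({"params": params, "loss": toy_validation_loss(**params)})
    return trials


def best_trial(trials: list) -> dict:
    """Return the trial with the lowest recorded loss."""
    return min(trials, key=lambda t: t["loss"])


if __name__ == "__main__":
    winner = best_trial(random_search(num_trials=20))
    print(f"best loss {winner['loss']:.4f} with {winner['params']}")
```

Because every trial is independent, a search like this is a natural fit for a GPU cluster: trials can run side by side, and only the comparison at the end needs all results.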

Accelerating time to AI value

ESG research found that 60% of organizations need more than four months to see value from their AI/ML initiatives. Why does this matter?

Maximize the productivity of your teams

Scale your best ideas while maximizing the productivity of your teams

Give your team the resources they need to improve model quality and train models at scale without delay. Free your data scientists and ML engineers from writing infrastructure code. Save IT from spending time researching and configuring infrastructure solutions for AI model development and training.

The HPE Machine Learning Development System is:

  • Built for Scale: Seamlessly scale experiments from 10 to 100 GPUs powered by distributed training, high-performance computing, and cluster management on a validated, purpose-built infrastructure.
  • Collaborative: Data scientists can easily and safely share resources, experiments, and data, enabling them to build on one another’s progress.
  • Complete: Get everything in one system purpose-built for scalability: software, hardware (network, compute, storage, accelerators, etc.), and services (on-site installation, standard model setup, etc.). A single provider eliminates integration and support issues and frees your team to focus on driving outcomes.
  • Flexible: Standard and custom options today, plus a foundation for heterogeneous accelerators. A variety of financing options are available through HPE Financial Services.
  • Trusted: A strong reputation and proven success, with HPE in data center and advisory services and HPE Machine Learning Development Environment as the modeling software platform.
Out-of-the-box performance

Pre-configured solution provides out-of-the-box performance

The HPE Machine Learning Development System is a purpose-built, validated, and pre-configured solution that reduces IT complexity and gives you out-of-the-box performance, allowing you to focus time and resources on model development and training. The solution includes a platform for distributed ML/DL model training (HPE Machine Learning Development Environment software) integrated with HPE hardware infrastructure (HPE Apollo 6500 Gen10 Plus system) for standardized and configurable AI clusters, creating a faster path to more accurate models at scale. Built for exascale computing, each node in the system supports up to 8 NVIDIA® A100 80GB GPUs and includes fast, local solid-state drives (SSDs) for establishing a distributed file system. Connected through a Mellanox® InfiniBand HDR switch, the HPE Machine Learning Development System establishes a high-speed, low-latency InfiniBand network ideal for distributed ML/DL training.
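As a rough sizing aid, the sketch below turns the per-node figures above (up to 8 GPUs per node, 80 GB per GPU) into node counts and aggregate GPU memory. The function names are hypothetical, and the calculation assumes fully populated nodes; it ignores memory reserved by drivers and frameworks.

```python
import math

GPUS_PER_NODE = 8    # from the text: up to 8 NVIDIA A100 GPUs per node
GPU_MEMORY_GB = 80   # from the text: 80 GB per A100 GPU


def nodes_needed(total_gpus: int) -> int:
    """Smallest number of fully populated nodes that provides total_gpus GPUs."""
    if total_gpus < 0:
        raise ValueError("total_gpus must be non-negative")
    return math.ceil(total_gpus / GPUS_PER_NODE)


def aggregate_gpu_memory_gb(nodes: int) -> int:
    """Total GPU memory, in GB, across the given number of full nodes."""
    return nodes * GPUS_PER_NODE * GPU_MEMORY_GB


if __name__ == "__main__":
    for gpus in (10, 100):
        nodes = nodes_needed(gpus)
        print(f"{gpus} GPUs -> {nodes} nodes, "
              f"{aggregate_gpu_memory_gb(nodes)} GB aggregate GPU memory")
```

Under these assumptions, a 10-GPU experiment needs 2 nodes and a 100-GPU experiment needs 13 nodes, since 100 is not a multiple of 8.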

See key technical specifications below.

Integrated, efficient, automated AI/ML infrastructure

The HPE Machine Learning Development System is easy to set up, use, and scale.

Competitive advantages

HPE Machine Learning Development System advantages

System competitive advantages include:

  • Seamless distributed training with an easy-to-use interface lets you perform deep learning across GPU clusters without rewriting code or managing infrastructure
  • Turnkey and validated out-of-box solution, ready for model development and training on Day One
  • Track and reproduce ML model work with experiment tracking that works out of the box, covering code versions, metrics, checkpoints, and hyperparameters
  • Automated hyperparameter optimization automatically finds and trains more accurate models, faster
  • Integrated hardware, software, and services solution that is performant out of the box, installed on-site, and built for scale
  • Future-proof AI infrastructure with flexibility for heterogeneous accelerators
  • Validated and supported by a trusted vendor
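The experiment-tracking item above names four things to capture: code versions, metrics, checkpoints, and hyperparameters. A minimal sketch of such a record is shown below. The class, field, and method names are hypothetical and chosen for illustration; they are not the product's actual schema or API.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ExperimentRecord:
    """Hypothetical record of what is needed to reproduce a training run."""
    code_version: str        # e.g. a source-control commit identifier
    hyperparameters: dict    # the settings used for this run
    metrics: list = field(default_factory=list)
    checkpoints: list = field(default_factory=list)

    def log_metrics(self, step: int, **values: float) -> None:
        """Append the metric values observed at a training step."""
        self.metrics.append({"step": step, **values})

    def add_checkpoint(self, step: int, path: str) -> None:
        """Record where the checkpoint for a training step was saved."""
        self.checkpoints.append({"step": step, "path": path})

    def to_json(self) -> str:
        """Serialize the record so it can be stored and shared."""
        return json.dumps(asdict(self), sort_keys=True)


if __name__ == "__main__":
    record = ExperimentRecord(
        code_version="abc1234",                    # placeholder value
        hyperparameters={"learning_rate": 0.01},   # placeholder value
    )
    record.log_metrics(step=100, validation_loss=0.42)
    record.add_checkpoint(step=100, path="checkpoints/step-100")
    print(record.to_json())
```

Keeping all four items in one record is what makes a run reproducible: the code version and hyperparameters say how it was produced, and the metrics and checkpoints say what it produced.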

Key technical specifications

Pre-configured, fully installed and performant out of the box

  • Out-of-the-box performance means reduced IT complexity
  • Focus time and resources on model development
  • On-site installation, configuration, and standard model setup

Seamless scalability - distributed training, hyperparameter optimization

  • Perform deep learning across GPU clusters with minimal code changes (Distributed Training, Hyperparameter Optimization)
  • Manage GPU costs
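Distributed training across GPUs is commonly done with data parallelism: each worker computes gradients on its own shard of a batch, and the gradients are averaged before the model is updated. The sketch below illustrates that idea in plain Python for a one-parameter linear model. It is a conceptual illustration with hypothetical function names, not the system's implementation, and it assumes the batch divides evenly among workers.

```python
def gradient(w, xs, ys):
    """Gradient of the mean squared error of y = w * x with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def shard(items, num_workers):
    """Split items into num_workers contiguous, equally sized shards."""
    if len(items) % num_workers != 0:
        raise ValueError("batch size must be divisible by num_workers")
    size = len(items) // num_workers
    return [items[i * size:(i + 1) * size] for i in range(num_workers)]


def data_parallel_gradient(w, xs, ys, num_workers):
    """Average the per-shard gradients, as an all-reduce step would."""
    x_shards = shard(xs, num_workers)
    y_shards = shard(ys, num_workers)
    # In a real cluster each shard would be processed on a different GPU.
    local = [gradient(w, xk, yk) for xk, yk in zip(x_shards, y_shards)]
    return sum(local) / num_workers


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    ys = [2.0 * x for x in xs]
    print(gradient(0.5, xs, ys))
    print(data_parallel_gradient(0.5, xs, ys, num_workers=4))
```

With equally sized shards, the averaged gradient matches the single-worker gradient on the full batch, which is why adding workers in this scheme does not change the training math.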

Manageability and observability

  • Monitor infrastructure and model metrics through a single interface
  • Improved experiment tracking and collaboration between ML engineers

Trusted vendor and enterprise-level support and services

  • Predictable and secure supply chain
  • Access to talent pool of AI, HPC, and IT experts
  • Continuously evolving and improving software stack

Flexible and heterogeneous architecture

  • Roadmap for multiple accelerator support

Component architecture

  • Compute Node – HPE Apollo 6500 Gen10 Plus with 8-way NVIDIA® A100 80GB GPUs
  • Management Node – HPE ProLiant DL325
  • Cluster Manager – HPE Performance Cluster Manager (HPCM)
  • Storage – HPE Parallel File System
  • Network – Mellanox HDR InfiniBand

Software and hardware supported

  • Software: HPE Machine Learning Development Environment, HPE Performance Cluster Manager, Red Hat Linux, SUSE Linux
  • Hardware: HPE Apollo 6500 Gen10 Plus with 8-way NVIDIA® A100 80GB GPUs, HPE ProLiant DL325, Mellanox InfiniBand, HPE Parallel File System

Service and support

  • Warranties and service options are based on the offerings of the underlying components
  • HPE customer support provides onsite hardware break/fix support and remote, remedial software call center support
