HPE Machine Learning Development System
Scale AI model training from idea to impact with minimal code rewrites or infrastructure changes.
Discover and scale high-quality AI models seamlessly—focusing on innovation, not infrastructure
Develop and train AI models at scale without hassle or delays. Instead of troubleshooting infrastructure, iterate and collaborate on inventive models and put the best ones to work.
Avoid IT complexity with a turnkey AI system
Accelerate model training at scale with a purpose-built, preconfigured, fully installed solution that is performant out of the box and built on a flexible architecture with a foundation for heterogeneous accelerator support. Create the right environment for AI success with a system that is validated and backed by solution-level support.
Confidently advance your vision for AI with resources and reliability from an expert trusted partner
Propel your AI initiatives forward with an industry leader uniquely qualified to deliver a purpose-built, high-performance system. Build AI competencies rather than AI systems by drawing on a wealth of expertise, insights, and support.
First-of-its-kind, purpose-built system for AI model training at scale
Only the HPE Machine Learning Development System combines HPE hardware and third-party accelerators with HPE Performance Cluster Manager and HPE Machine Learning Development Environment software. The combination delivers faster setup, the ability to monitor infrastructure and resource utilization by workload, and faster, easier model development and training at scale. Unlike other solutions, the HPE Machine Learning Development System saves you time, resources, and costs with the industry’s first turnkey, purpose-built AI solution, which integrates all hardware, software, and services to create a faster path to more accurate AI models and model training at scale.
An AI turnkey solution for model training and development at scale
The HPE Machine Learning Development System is a turnkey solution that combines model training and development software with high-performance computing in an optimized AI infrastructure, including accelerators. Backed by expert installation and support services, it is validated, performant out of the box, and ready for model training and development on Day One. Features like distributed training allow you to perform machine learning across GPU clusters without rewriting code or restructuring infrastructure, and automated hyperparameter optimization finds and trains more accurate models, faster.
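To make the idea of distributed training concrete, here is a minimal conceptual sketch of data-parallel training in plain Python. It is not the product's API: the function names, the toy model (y = w * x), and the learning rate are all assumptions chosen for illustration. Each "worker" stands in for one GPU that computes a gradient on its own shard of the batch; the gradients are then averaged, which is the role an all-reduce step plays on a real cluster.

```python
# Conceptual sketch only: every name here is hypothetical, not an HPE API.

def gradient(w, batch):
    """Gradient of mean squared error for the toy model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def shard(batch, num_workers):
    """Split a batch into equal shards, one per worker.

    Assumes len(batch) is divisible by num_workers.
    """
    size = len(batch) // num_workers
    return [batch[i * size:(i + 1) * size] for i in range(num_workers)]

def data_parallel_step(w, batch, num_workers, lr=0.01):
    """One SGD step: average the per-worker gradients, then update w."""
    grads = [gradient(w, s) for s in shard(batch, num_workers)]
    avg_grad = sum(grads) / num_workers
    return w - lr * avg_grad

# Toy data drawn from y = 3x
data = [(float(x), 3.0 * x) for x in range(1, 9)]

w_single = data_parallel_step(0.0, data, num_workers=1)
w_multi = data_parallel_step(0.0, data, num_workers=4)
```

With equal-sized shards, the averaged gradient matches the full-batch gradient, so `w_single` and `w_multi` agree. That equivalence is why a platform can, in principle, spread a training step across many GPUs without changing the model code itself.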
Scale your best ideas while maximizing the productivity of your teams
Give your team the resources they need to improve model quality and train models at scale without delay. Free your data scientists and ML engineers (MLEs) from writing infrastructure code. Save IT from spending time researching and configuring infrastructure solutions for AI model development and training.
The HPE Machine Learning Development System is:
- Built for Scale: Seamlessly scale experiments from 10 to 100 GPUs powered by distributed training, high-performance computing, and cluster management on a validated, purpose-built infrastructure.
- Collaborative: Data scientists can easily and safely share resources, experiments, and data, enabling them to build on one another’s progress.
- Complete: Get everything in one system purpose-built for scalability: software, hardware (network, compute, storage, accelerators, etc.), and services (on-site installation, standard model set-up, etc.). A single provider eliminates integration and support issues and frees your team to focus on driving outcomes.
- Flexible: Flexibility today with standard and custom options, and a foundation for heterogeneity using accelerators. A variety of financing options are available through HPE Financial Services.
- Trusted: Sterling reputation and proven success, with HPE for data center and advisory services and HPE Machine Learning Development Environment as the modelling software platform.
Pre-configured solution provides out-of-the-box performance
The HPE Machine Learning Development System is a purpose-built, validated, and pre-configured solution that reduces IT complexity and gives you out-of-the-box performance, allowing you to focus time and resources on model development and training. This solution includes a platform for distributed ML/DL model training (HPE Machine Learning Development Environment software) and is integrated with HPE hardware infrastructure (HPE Apollo 6500 Gen10 Plus system) for standardized and configurable AI clusters, creating a faster path to more accurate models at scale. Built for exascale computing, each node in the system supports up to 8 powerful NVIDIA® A100 80GB GPUs and includes fast, local solid-state drives (SSDs) for establishing a distributed file system. Connecting through a Mellanox® InfiniBand HDR switch, the HPE Machine Learning Development System establishes a high-speed, low-latency InfiniBand network ideal for distributed ML/DL training.
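The per-node figures above lend themselves to a quick sizing calculation. The sketch below assumes fully populated nodes (8 GPUs each, 80 GB per GPU, as stated in the text); the function names are hypothetical, and real configurations may be sized differently.

```python
import math

GPUS_PER_NODE = 8    # from the text: up to 8 GPUs per node
GPU_MEMORY_GB = 80   # from the text: NVIDIA A100 80GB

def nodes_needed(total_gpus, gpus_per_node=GPUS_PER_NODE):
    """Smallest number of fully populated nodes that provides total_gpus."""
    return math.ceil(total_gpus / gpus_per_node)

def gpu_memory_per_node_gb(gpus_per_node=GPUS_PER_NODE,
                           gpu_memory_gb=GPU_MEMORY_GB):
    """Aggregate GPU memory available on one fully populated node."""
    return gpus_per_node * gpu_memory_gb

small = nodes_needed(10)    # 2 nodes
large = nodes_needed(100)   # 13 nodes
per_node_memory = gpu_memory_per_node_gb()  # 640 GB
```

Under these assumptions, an experiment that grows from 10 to 100 GPUs grows from 2 nodes to 13, with 640 GB of aggregate GPU memory on each node.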
See key technical specifications below.
HPE Machine Learning Development System advantages
System competitive advantages include:
- Seamless distributed training with an easy-to-use interface allows you to perform deep learning across GPU clusters without rewriting code or managing infrastructure
- Turnkey and validated out-of-box solution, ready for model development and training on Day One
- Track and reproduce ML model work with experiment tracking that works out-of-the-box, covering code versions, metrics, checkpoints, and hyperparameters.
- Automated hyperparameter optimization automatically finds and trains more accurate models, faster
- Integrated hardware, software, and services solution that is performant out of the box, installed on-site, and built for scale
- Future-proof AI infrastructure with flexibility for heterogeneous accelerators
- Validated and supported by a trusted vendor
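As an illustration of the automated hyperparameter optimization listed above, here is a minimal sketch of random search, one of the simplest search strategies. The source does not say which search algorithm the product uses, so this is a swapped-in technique for illustration. The `validation_loss` function is a hypothetical stand-in for training a model and measuring its loss, and the search ranges are assumptions.

```python
import math
import random

def validation_loss(lr, hidden_units):
    """Hypothetical stand-in for training a model and measuring its loss.

    Lowest when lr is near 0.01 and hidden_units is 64.
    """
    return (math.log10(lr) + 2) ** 2 + ((hidden_units - 64) / 64) ** 2

def random_search(num_trials, seed=0):
    """Sample hyperparameters at random and keep the best trial."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    best = None
    for _ in range(num_trials):
        params = {
            "lr": 10 ** rng.uniform(-4, 0),  # log-uniform in [1e-4, 1]
            "hidden_units": rng.choice([16, 32, 64, 128, 256]),
        }
        loss = validation_loss(**params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best

best_loss, best_params = random_search(num_trials=50)
```

Each trial is independent of the others, so trials can run in parallel on separate GPUs. That independence is what makes hyperparameter search a natural fit for a shared cluster.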
ESG recently evaluated the HPE Machine Learning Development System, exploring how the system can help organizations accelerate their time to insight, providing tools to accelerate and simplify model development and training. The team reviewed the productivity, ease of use, flexibility, performance, and investment value of the solution.
Key technical specifications
Pre-configured, fully installed and performant out of the box
- Out-of-the-box performance means reduced IT complexity
- Focus time and resources on model development
- On-site installation, configuration, and standard model setup
Seamless scalability - distributed training, hyperparameter optimization
- Perform deep learning across GPU clusters with minimal code changes (Distributed Training, Hyperparameter Optimization)
- Manage GPU costs
Manageability and observability
- Monitor infrastructure and model metrics through a single interface
- Improved experiment tracking and collaboration between ML engineers
Trusted vendor and enterprise-level support and services
- Predictable and secure supply chain
- Access to talent pool of AI, HPC, and IT experts
- Continuously evolving and improving software stack
Flexible and heterogeneous architecture
- Roadmap for multiple accelerator support
- Compute Node – HPE Apollo 6500 Gen10 Plus with NVIDIA® A100 8-way 80GB GPUs
- Management Node – HPE ProLiant DL325
- Cluster Manager – HPE Performance Cluster Manager (HPCM)
- Storage – HPE Parallel File System
- Network – Mellanox HDR InfiniBand
Software and hardware supported
- Software: HPE Machine Learning Development Environment, HPE Performance Cluster Manager, Red Hat Linux, SUSE Linux
- Hardware: HPE Apollo 6500 Gen10 Plus with NVIDIA® A100 8-way 80GB GPUs, HPE ProLiant DL325, Mellanox InfiniBand, HPE Parallel File System
Service and support
- Warranties and service options are based on the offerings of the underlying components
- HPE customer support provides onsite hardware break/fix support and remote, remedial software call center support