This topic describes container migration and the external persistent storage pool
used when migrating containers between hosts in HPE Ezmeral Container Platform
deployments that implement EPIC.
In deployments of HPE Ezmeral Container Platform that implement EPIC, you may
specify an external persistent storage pool for tenants and/or AI/ML projects using the
Application Persistent Storage tab of the System
Settings screen (see Application Persistent Storage
Tab). Persistent storage exists either on hyper-converged resources or on a
remote storage resource that is referenced, but not managed, by HPE Ezmeral Container Platform. The persistent storage pool can then be used when
migrating virtual containers among hosts.
Persistent storage can be implemented across some or all hosts, as shown here:
You can create, expand, and shrink storage capacity just as you would any other
resource. This feature allows you to migrate containers between hosts
by preserving the following critical container folders (by default) for ongoing
use:
- /usr
- /opt
- /var
- /etc
- /home
The contents of other folders can be preserved during container migration by specifying
their names in the metadata JSON file when creating a new application image generated
using the App Workbench.
Use Cases
Big Data applications such as Hadoop and Spark offer robust high availability
capabilities; however, some enterprise customers have operational requirements that
call for moving virtual nodes/containers from one host to another. These
requirements include:
- A host crashes, and the containers that were running on that host must be
  redeployed on other working hosts with minimal downtime and no additional
  configuration required.
- One or more hosts need to be replaced for maintenance and/or as part of a
  server refresh cycle. In this scenario, the containers running on those hosts
  must be seamlessly moved to other hosts that are not being replaced, with
  minimal downtime to the applications running in the containers.
- An application running in a containerized cluster (e.g. Spark or Hadoop)
  cannot meet its SLA due to poor CPU, network, or storage performance. This
  resource contention (bottleneck) condition requires rebalancing virtual
  nodes/containers onto hosts with more available resources.
Enabling Container Migration
Once you have enabled persistent storage, the next step is to create one or more
flavors that include at least 20 GB of persistent storage, as described in Creating a New Flavor and Editing an Existing Flavor.
Containers created using a flavor with persistent storage enabled will be preserved
as described above. You may also assign a persistent storage quota to tenants, as
described in Tenant and Project Quotas.
Note:
Containers created using a flavor that does not have
persistent storage enabled will not benefit from this feature,
even if you later edit the flavor to enable persistent
storage.
Note: You may create a flavor that specifies persistent storage even if no
persistent storage has been defined in the Application Persistent
Storage tab; however, an error will be returned if you attempt to
use this flavor before enabling persistent storage.
Note: The persistent storage resource must have enough free capacity to
accommodate the sum of all tenant persistent storage quotas or to accommodate the
amount of persistent storage specified in all applicable flavors times the number of
containers that use the flavors, whichever is greater. Further, if you specify a
per-tenant persistent storage quota, then that quota must be large enough to
accommodate the flavor-defined persistent storage times the number of containers
using the applicable flavors.
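This sizing rule can be expressed as a quick calculation. The following sketch (illustrative only; the function and parameter names are not part of the product) computes the minimum free capacity the persistent storage resource must provide:

```python
# Illustrative sketch (not a product API): computes the minimum free
# capacity required of the persistent storage resource, per the rule above.

def required_capacity(tenant_quotas_gb, flavor_usage):
    """tenant_quotas_gb: per-tenant persistent storage quotas, in GB.
    flavor_usage: (per_container_storage_gb, container_count) pairs,
    one per flavor that specifies persistent storage."""
    quota_total = sum(tenant_quotas_gb)
    flavor_total = sum(gb * count for gb, count in flavor_usage)
    # The storage pool must cover whichever total is greater.
    return max(quota_total, flavor_total)

# Example: two tenants with 100 GB quotas each; a 20 GB flavor used by
# 12 containers and a 40 GB flavor used by 3 containers.
print(required_capacity([100, 100], [(20, 12), (40, 3)]))  # → 360
```

Here the flavor-driven demand (240 + 120 = 360 GB) exceeds the summed tenant quotas (200 GB), so the pool needs at least 360 GB free; note that in this situation the per-tenant quotas themselves would also need to be raised to cover the flavor-defined usage.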
There are two ways to use persistent storage to migrate a container:
- Worker Vacate (EPIC hosts only): If an EPIC Worker host
goes down and some or all of the containers on that host use persistent storage,
then you can click the Worker Vacate button (moving
dolly) for the affected host in the EPIC Hosts Installation
screen (see The
EPIC Hosts Installation Screen).
All jobs running on the affected containers will end, but the containers
themselves will be recovered as follows:
- No new containers will be placed on the affected host.
- The protected containers are removed from the affected host.
- Containers automatically migrate to one or more new hosts, provided that
there are sufficient available resources, including any applicable
placement constraints, as described in About Tags and Tenant/Project Tags
and Constraints.
- Node Migration: This use case applies to a scenario where
the hosts are functioning properly but are overburdened. In this case, the
Tenant Administrator or Platform Administrator can add new hosts as described in
EPIC Worker Installation
Overview. Containers can then be migrated to the new hosts on a
container-by-container basis, as described in Viewing and Migrating
Virtual Nodes. Placement constraints apply to this type of container
migration as well.
The following storage systems are supported for persistent storage:
- CEPH RBD
- NFS
- ScaleIO
- Local MapR
Migrating a container/virtual node has the following effects:
- The cluster to which the container belongs will be impacted because the
container will not be executing jobs during the migration process.
- Any jobs or ActionScripts running on a cluster with one or more migrating
containers will be lost and must be run again after the migration
completes.
- Any data residing in non-persistent storage directories of a container being
migrated will be lost.
- Any external hosts that have access to HPE Ezmeral Container Platform
will be unable to access the affected containers until the migration process
completes.
- Migrated containers maintain their configuration and IP addresses.