Draining Nodes for Planned Maintenance with Windows Server 2012

First published on MSDN on Apr 03, 2012

Windows Server 2012 Failover Clusters are easier to manage and maintain with the new “Node Drain” and “Resume with Failback” features, which allow nodes to be gracefully drained for planned maintenance. This functionality is part of the infrastructure behind “Cluster-Aware Updating” (CAU), which patches the nodes in a cluster.


Bringing an individual node down for planned maintenance, for example to install a Service Pack or upgrade hardware, is a common administrative task.

On a Windows Server 2008 R2 cluster, this is a manual process: you place the cluster node in the PAUSED state and then move the individual Roles (workloads) to the other nodes in the cluster, as outlined in this KB article.

In Windows Server 2012, planned maintenance on clusters is dramatically simplified because these steps are automated by the Node Drain (or Node Maintenance Mode) feature.

Node Drain

Node Drain automates moving the Roles (workloads) off a cluster node. Think of Node Drain as an enhanced, workload-aware Node Pause.

Steps automated by Node Drain:

1) The cluster node is put in the PAUSED state, which prevents workloads hosted on other nodes from moving to it.

2) The Roles (workloads) currently owned by the cluster node are sorted according to their Priority order. (Role Priority is another new Failover Clustering feature in Windows Server 2012.)

3) The Roles are then distributed to the other active nodes in the cluster in priority order. Node Drain works with all workloads running on the cluster. For virtual machines, it uses Live Migration and memory-aware intelligent placement.

4) When all the Roles have been moved off the cluster node, the Node Drain operation is complete.
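For comparison, the four steps above can be approximated by hand, which is roughly the manual Windows Server 2008 R2 process described earlier. This is a minimal sketch, assuming the FailoverClusters module and that the group objects returned by Get-ClusterGroup expose OwnerNode and Priority. Get-NodeDrainOrder and Invoke-ManualNodeDrain are hypothetical helper names, not cluster cmdlets. Calling Move-ClusterGroup without -Node leaves the destination to the cluster, so this sketch does not reproduce Node Drain's live migration or memory-aware placement.

```powershell
# Hypothetical helper: returns the roles (cluster groups) owned by a node,
# highest Priority first (3000 = High, 2000 = Medium, 1000 = Low, 0 = No Auto Start),
# which mirrors the order in which Node Drain moves them.
function Get-NodeDrainOrder {
    param(
        [Parameter(Mandatory)] [AllowEmptyCollection()] [object[]] $Groups,
        [Parameter(Mandatory)] [string] $NodeName
    )
    $Groups |
        Where-Object { "$($_.OwnerNode.Name)" -eq $NodeName } |
        Sort-Object -Property Priority -Descending
}

# Hypothetical sketch of the manual equivalent of Node Drain.
# Assumes the FailoverClusters module is loaded.
function Invoke-ManualNodeDrain {
    param([Parameter(Mandatory)] [string] $NodeName)

    # Step 1: pause the node (no -Drain), so other roles cannot move onto it.
    Suspend-ClusterNode -Name $NodeName | Out-Null

    # Steps 2-3: move the node's roles off in priority order; without -Node,
    # Move-ClusterGroup lets the cluster pick the destination.
    foreach ($group in Get-NodeDrainOrder -Groups @(Get-ClusterGroup) -NodeName $NodeName) {
        Move-ClusterGroup -Name $group.Name | Out-Null
    }
}
```

Node Drain replaces all of this with a single Suspend-ClusterNode -Drain call, and it also handles live migration and placement for virtual machines.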

Initiating Node Drain through Failover Cluster Manager:

Initiating Node Drain through the Failover Cluster Manager snap-in is a simple one-click operation:

1. Open Failover Cluster Manager.
2. In the left-hand pane, navigate to Nodes.
3. Right-click the node you wish to drain.
4. Under Pause, select Drain Roles.

Note: If you select “Do Not Drain Roles”, the node is simply paused, as in Windows Server 2008 R2.

Initiating Node Drain through PowerShell:

You can initiate Node Drain using the Suspend-ClusterNode PowerShell cmdlet.

Additional advanced options for managing node drains are available through PowerShell, including these Suspend-ClusterNode parameters:

-Drain: Initiates Node Drain.
-TargetNode: The destination node to which all drained roles will be moved or live migrated.
-ForceDrain: Moves the roles off the draining node even if a group cannot be moved, either because no other node can host it or because it is in a locked state.
-Wait: Defines an amount of time, in seconds, to wait for the Node Drain operation to begin.
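The parameters above combine into a single Suspend-ClusterNode call. In this hedged sketch, New-NodeDrainParameters is a hypothetical helper (not part of the FailoverClusters module) that builds a splatting hashtable so the mapping from each option to its parameter is explicit; the node names are placeholders.

```powershell
# Hypothetical helper: builds the parameter set for Suspend-ClusterNode.
function New-NodeDrainParameters {
    param(
        [Parameter(Mandatory)] [string] $Name,  # node to drain
        [string] $TargetNode,                   # optional: destination for every drained role
        [switch] $ForceDrain,                   # move roles even if a group cannot move normally
        [int] $WaitSeconds = -1                 # optional -Wait value; -1 leaves -Wait off
    )
    # -Drain is what turns a plain pause into a Node Drain.
    $params = @{ Name = $Name; Drain = $true }
    if ($TargetNode) { $params.TargetNode = $TargetNode }
    if ($ForceDrain) { $params.ForceDrain = $true }
    if ($WaitSeconds -ge 0) { $params.Wait = $WaitSeconds }
    return $params
}

# Usage on a cluster node with the FailoverClusters module:
# $p = New-NodeDrainParameters -Name 'Node1' -TargetNode 'Node2'
# Suspend-ClusterNode @p
```

Splatting keeps optional parameters out of the call entirely when they are not needed, rather than passing empty values.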

Status of Drained Node:

When a Node Drain is initiated, the command returns the NodeDrainStatus property, indicating that the cluster node has begun the node drain operation. You can track the status of an ongoing node drain operation using these two cluster node common properties:

NodeDrainStatus
  Values: 0 – Not Initiated, 1 – In Progress, 2 – Completed, 3 – Failed
  Description: Indicates the current status of the Node Drain.

NodeDrainTarget
  Value: a cluster node ID
  Description: ID of the cluster node to which all the workloads will be moved. This ID is set when you use the TargetNode parameter.
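To follow a drain from PowerShell, you can poll the node until NodeDrainStatus reaches 2 (Completed) or 3 (Failed). This is a sketch under one stated assumption: that the objects returned by Get-ClusterNode expose NodeDrainStatus as a DrainStatus property (check with Get-ClusterNode | Format-List * on your build). ConvertTo-NodeDrainStatusName and Wait-NodeDrain are hypothetical names.

```powershell
# Maps the NodeDrainStatus values from the table above to readable names.
function ConvertTo-NodeDrainStatusName {
    param([Parameter(Mandatory)] [int] $Status)
    switch ($Status) {
        0 { 'Not Initiated' }
        1 { 'In Progress' }
        2 { 'Completed' }
        3 { 'Failed' }
        default { "Unknown ($Status)" }
    }
}

# Hypothetical poller: waits until the drain finishes (Completed or Failed)
# or the timeout expires, then returns the last status name it saw.
# Assumption: NodeDrainStatus is exposed as the DrainStatus property.
function Wait-NodeDrain {
    param(
        [Parameter(Mandatory)] [string] $NodeName,
        [int] $TimeoutSeconds = 600,
        [int] $PollSeconds = 5
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    do {
        $status = [int]((Get-ClusterNode -Name $NodeName).DrainStatus)
        if ($status -ge 2) { break }    # 2 = Completed, 3 = Failed
        Start-Sleep -Seconds $PollSeconds
    } while ((Get-Date) -lt $deadline)
    ConvertTo-NodeDrainStatusName -Status $status
}
```

If the timeout expires first, the function returns 'In Progress', so a caller can tell a slow drain apart from a failed one.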

Node Drain Failure:

Node Drain will fail if a virtual machine's Live Migration fails for any reason, or if a Role cannot be moved because the node being drained is the last possible owner node for that Role.

When an error occurs with an individual role, the node drain operation continues to drain the remaining roles hosted on the node. NodeDrainStatus is set to 3 (Failed) only after the remaining roles have been drained from the cluster node.

You can then restart the Node Drain, optionally specifying the -ForceDrain parameter to override any errors encountered during the initial drain.
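A failed drain can be reissued with -ForceDrain, as described above. This minimal sketch uses the hypothetical name Invoke-NodeDrainWithRetry and rests on two assumptions: that Suspend-ClusterNode, called without -Wait, returns only after the drain attempt finishes, and that Get-ClusterNode exposes NodeDrainStatus as DrainStatus.

```powershell
# Hypothetical sketch: drain a node and, only if the first attempt ends in
# NodeDrainStatus 3 (Failed), retry once with -ForceDrain.
# Assumptions: Suspend-ClusterNode without -Wait returns after the attempt
# finishes; NodeDrainStatus is exposed as the DrainStatus property.
function Invoke-NodeDrainWithRetry {
    param([Parameter(Mandatory)] [string] $NodeName)

    Suspend-ClusterNode -Name $NodeName -Drain | Out-Null
    if ([int]((Get-ClusterNode -Name $NodeName).DrainStatus) -eq 3) {
        Write-Warning "Drain of $NodeName failed; retrying with -ForceDrain."
        Suspend-ClusterNode -Name $NodeName -Drain -ForceDrain | Out-Null
    }
    # Return the final NodeDrainStatus value (0-3).
    [int]((Get-ClusterNode -Name $NodeName).DrainStatus)
}
```

Retrying without -ForceDrain first keeps a forced move as a deliberate second step rather than the default.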

Rebooting a Drained Node:

Once a node is drained, it remains in the PAUSED state across reboots, so no roles can move to it until the node is resumed. This keeps the node drained for the duration of the maintenance window.
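Because a drained node stays paused across reboots, it is easy to forget to resume it. The sketch below lists nodes that are still paused; Get-PausedClusterNode is a hypothetical helper, and it assumes the State property of Get-ClusterNode objects reads as "Paused" for a paused node.

```powershell
# Hypothetical helper: returns the names of cluster nodes that are still paused,
# for example drained nodes that came back from a reboot.
# By default it queries the cluster; pass -Nodes to check an existing list.
function Get-PausedClusterNode {
    param([object[]] $Nodes = @(Get-ClusterNode))
    $Nodes |
        Where-Object { "$($_.State)" -eq 'Paused' } |
        Select-Object -ExpandProperty Name
}
```

Running this at the end of a maintenance window is a quick check that every drained node has been resumed.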

Node Resume with Failback

When a node is drained, the cluster remembers the workloads that were moved off it. When you resume the node after maintenance, you have the option of moving all of those workloads back to it. This restores the cluster to the state it was in before the maintenance.

Steps automated by Node Resume with Failback:

1) The cluster node is removed from the PAUSED state, which allows workloads to move to it.

2) The workloads that were originally drained from the node are moved back using Failback. If a role's failback policy only allows failback during a specific failback window, resume honors that setting and the role's failback is delayed until the window opens.

Resuming Node through Failover Cluster Manager:

1. Open Failover Cluster Manager.
2. In the left-hand pane, navigate to Nodes.
3. Right-click the node you wish to resume.
4. Under Resume, select Fail Roles Back.

Note: If you select “Do Not Fail Roles Back”, the node is simply resumed, as in Windows Server 2008 R2.

Resuming Node through PowerShell:

You can resume a node using the Resume-ClusterNode PowerShell cmdlet.

Additional advanced options for resuming nodes are available through PowerShell, including the -Failback parameter, which defines the type of failback to expect after the node is resumed:

NoFailback: Don't fail back workloads.
Immediate: Fail back immediately.
Policy: Fail back during the configured failback window.
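The three -Failback values above can be enforced in a small wrapper so that a typo is rejected before the cluster is touched. Resume-MaintenanceNode is a hypothetical name; it simply forwards to Resume-ClusterNode.

```powershell
# Hypothetical wrapper around Resume-ClusterNode that accepts only the three
# documented failback types:
#   NoFailback = don't fail back, Immediate = fail back now,
#   Policy     = fail back during each role's configured failback window.
function Resume-MaintenanceNode {
    param(
        [Parameter(Mandatory)] [string] $NodeName,
        [Parameter(Mandatory)]
        [ValidateSet('NoFailback', 'Immediate', 'Policy')]
        [string] $Failback
    )
    Resume-ClusterNode -Name $NodeName -Failback $Failback
}

# Example: restore the node's original workloads right away.
# Resume-MaintenanceNode -NodeName 'Node1' -Failback Immediate
```

Making -Failback mandatory forces the operator to choose a failback behavior explicitly at the end of each maintenance window.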

Additional Information:

Cancelling Node Drain:

Draining a node can be a long-running operation. A Node Drain that is in progress can be cancelled by initiating a Node Resume. This stops the Node Drain operation, and if Fail Roles Back is specified, the workloads that were already drained are moved back to the cluster node.
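Cancelling a drain is therefore just a resume issued while NodeDrainStatus is 1 (In Progress). In this hedged sketch, Stop-NodeDrain is a hypothetical name, and it assumes Get-ClusterNode exposes NodeDrainStatus as DrainStatus; -Failback Immediate corresponds to Fail Roles Back.

```powershell
# Hypothetical sketch: cancel an in-progress drain by resuming the node.
# Assumption: NodeDrainStatus is exposed as the DrainStatus property.
# Returns $true if a drain was cancelled, $false if none was in progress.
function Stop-NodeDrain {
    param(
        [Parameter(Mandatory)] [string] $NodeName,
        [switch] $KeepRolesWhereTheyAre   # resume without failing roles back
    )
    if ([int]((Get-ClusterNode -Name $NodeName).DrainStatus) -ne 1) {
        Write-Verbose "No drain in progress on $NodeName."
        return $false
    }
    # Fail Roles Back = Immediate; Do Not Fail Roles Back = NoFailback.
    $failback = 'Immediate'
    if ($KeepRolesWhereTheyAre) { $failback = 'NoFailback' }
    Resume-ClusterNode -Name $NodeName -Failback $failback | Out-Null
    return $true
}
```

Checking the status first avoids resuming a node that was paused for some other reason.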

Configuring the Move Type for a Virtual Machine

Node Drain and Node Resume with Failback use Live Migration for virtual machines so that a node can be drained with no downtime. Live Migration can sometimes be a long-running operation, and there may be scenarios where you wish to drain a node quickly. Node Drain therefore lets you configure how VMs are moved, using either Live Migration or Quick Migration.

You also have granular control over which move type is used, based on the priority setting of each VM. This is configured with the NodeDrainMoveTypeThreshold private property of the Virtual Machine resource type:

NodeDrainMoveTypeThreshold (private property)
  Value: a Virtual Machine priority
  Description: Virtual Machines with a priority equal to or higher than the specified priority are moved using Live Migration. Virtual Machines with a priority lower than the specified priority are moved using Quick Migration.
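The threshold rule above is a simple comparison. The function below only models it for illustration (the cluster applies the rule itself during a drain); Get-NodeDrainMoveType is a hypothetical name, and the priority values are the Windows Server 2012 role priorities.

```powershell
# Models the NodeDrainMoveTypeThreshold rule: VMs at or above the threshold
# are live migrated, VMs below it are quick migrated.
# Role priorities: 3000 = High, 2000 = Medium, 1000 = Low, 0 = No Auto Start.
function Get-NodeDrainMoveType {
    param(
        [Parameter(Mandatory)] [int] $VmPriority,
        [Parameter(Mandatory)] [int] $Threshold
    )
    if ($VmPriority -ge $Threshold) { 'Live Migration' } else { 'Quick Migration' }
}

# With the threshold at 3000, only High-priority VMs are live migrated:
# Get-NodeDrainMoveType -VmPriority 3000 -Threshold 3000   # Live Migration
# Get-NodeDrainMoveType -VmPriority 2000 -Threshold 3000   # Quick Migration
```

Raising the threshold trades downtime on lower-priority VMs for a faster drain.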

Example PowerShell commands to view or modify this private property:

Creating the property:

Get-ClusterResourceType "Virtual Machine" | Set-ClusterParameter -Name NodeDrainMoveTypeThreshold -Value 3000 -Create

Modifying the existing property:

Get-ClusterResourceType "Virtual Machine" | Set-ClusterParameter -Multiple @{"NodeDrainMoveTypeThreshold"=3000}

Reading the property:

Get-ClusterResourceType "Virtual Machine" | Get-ClusterParameter NodeDrainMoveTypeThreshold


Node Drain is a great new time-saving feature in Windows Server 2012 Failover Clustering for conducting planned maintenance. With it, you can drain the workloads off a cluster node in a single click and easily restore them once maintenance on the node is complete.


Amitabh Tamhane
Program Manager II
Clustering & High Availability
Microsoft

Lokesh Koppolu
Principal Development Lead
Clustering & High Availability
Microsoft


This article was originally published by Microsoft’s Failover Clustering Blog. You can find the original article here.