Storage Spaces Direct Step by Step: Part 2 Troubleshooting

Testing guidelines for the network and storage components of an S2D cluster in a common configuration. Instructions on health monitoring are included. Part 2 of 4.

Introduction

This article is the second of a four-part series.

  1. Configuring Core Cluster
  2. Storage Clusters [this article]
  3. Configuring Storage Network Infrastructures
  4. Managing Storage Clusters

Test Health & Performance

This article provides a guideline for performing health tests on an S2D cluster. The most common configuration issues for a new S2D cluster involve the networking and storage components. The tests documented here cover only those components and provide a starting procedure for more comprehensive testing.

The testing procedures covered are:

  1. Stress-load each network port and test for performance and errors. Network issues often present as reduced network speed caused by hardware problems in the network interconnecting the physical servers.
  2. Perform a simple storage stress test on each cluster node. This uses a simple file copy or a performance tool to move data from the physical node to the S2D storage system.
  3. Load the cluster with a group of VMs running stress tests using VMFleet. This simulates a large number of VM application workloads.

Instructions on monitoring are included at the end of this article.

Create Cluster Shared Volume

Create the volume with two or three storage tiers based on your hardware configuration and the types of physical disks in place. The following storage performance tests can optionally be performed on various Storage Spaces Virtual Disk configurations. Since each Virtual Disk configuration may perform differently, this procedure is a valuable way to verify that storage volumes perform as desired.

Documentation on creating Storage Spaces volumes:

  1. Planning Volumes on Storage Spaces Direct
  2. New-Volume
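As a sketch, a mirror-resilient Cluster Shared Volume might be created with New-Volume. The pool name, volume name, and size below are examples, not values from this article; match them to your own configuration.

```powershell
# Sketch: create a 1 TB mirror-resilient CSV named VMSpace1.
# Pool name and size are examples; match them to your hardware.
New-Volume -StoragePoolFriendlyName "S2D on Cluster1" `
    -FriendlyName "VMSpace1" `
    -FileSystem CSVFS_ReFS `
    -Size 1TB
```

On an S2D cluster the resiliency and tiering defaults come from the pool configuration, so a minimal command like this is usually enough for a first test volume.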
Failover Cluster Manager Console

Once the Cluster Shared Volume is created, it should appear in the Disks section of the Failover Cluster Manager console.

The local path on each physical server will be C:\ClusterStorage\Volume1, and in this case C:\ClusterStorage\VMSpace1.

Storage tests use the direct local file path rather than accessing storage via a share.

Test Server-to-Server

This step is optional but useful. Network issues are often caused by cabling problems or Top of Rack (ToR) switches in the path between cluster nodes. This step performs network tests on each Network Interface Card (NIC) port.

The test can be performed using either a network performance software utility or by performing file copies to and from the cluster node.

Simple File Copy Tests

This test will require creating a network share on the boot disk on the target server. This will bypass Storage Spaces, isolating the environment to only the servers and local area network.

  1. Each network port will require an IP address. The server network interfaces will not have IP addresses by default. This IP address will be used to isolate network traffic to the specific NIC port.
  2. Create a share on the C: drive on each cluster node
  3. Select a large file, copy it to the share, then retrieve it back.
  4. Multiple copy operations will likely be required to place a large load on the high-speed network interfaces.
  5. Monitor the network performance and error counters on both the source and target cluster servers.
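The steps above can be sketched in PowerShell. The folder paths, share name, test file, and the 192.168.10.22 address are all placeholders; substitute the IP address you assigned to the NIC port under test.

```powershell
# Sketch, run on the target node: share a test folder on the boot disk.
New-Item -Path C:\NetTest -ItemType Directory -Force
New-SmbShare -Name NetTest -Path C:\NetTest -FullAccess Everyone

# Sketch, run on the source node: repeated copies over a specific NIC's IP.
# 192.168.10.22 is a placeholder for the IP assigned to the NIC under test;
# C:\TestData\large.vhdx is a placeholder for any large test file.
1..10 | ForEach-Object {
    Copy-Item -Path C:\TestData\large.vhdx `
        -Destination \\192.168.10.22\NetTest\copy$_.vhdx
}
```

Using the IP address (rather than the server name) in the UNC path is what pins the traffic to the specific NIC port being tested.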

Info

A simple method to monitor network speed is documented below at Monitor Network Speed with Resource Monitor.

Monitoring network performance and error counters is documented below in section Monitor Performance and Error Counters with Performance Monitor.

Network Test Utility

Numerous network software tools are available that will both create a load of network traffic and provide performance statistics.

  1. Configure the utility as required to send traffic between a source cluster node to a target network node. Some software utilities have a software component on both the source and target server node. Others require a network share.
  2. Run the performance tests.

Info

iPerf is a common network performance tool, available at iPerf – The ultimate speed test tool for TCP, UDP and SCTP.
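A typical iPerf run between two nodes might look like the following. The iperf3 binary location and the 192.168.10.22 address are assumptions for illustration; use the IP assigned to the NIC under test.

```powershell
# Sketch, on the target node: start the iPerf server.
.\iperf3.exe -s

# Sketch, on the source node: 8 parallel streams for 60 seconds
# against the IP assigned to the NIC port under test.
.\iperf3.exe -c 192.168.10.22 -P 8 -t 60
```

Multiple parallel streams (-P) are usually needed to approach line rate on high-speed interfaces; a single stream often cannot saturate a 10 GbE or faster link.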

Review Test Results

The file copies between servers should be fast and consistent. Network errors will cause sporadic performance drops and spikes in the Windows Resource Monitor graphs.

Run Windows Performance Monitor and add the following counters:

  • RDMA Activity
  • Network Adapter
  • Network Interface

Configuring Windows Performance Monitor and adding counters is documented below.

Look at the following Network Adapter and Network Interface counters:

  • Output Queue Length
  • Packets Outbound Discarded
  • Packets Outbound Errors
  • Packets Received Discarded
  • Packets Received Errors

The Output Queue Length should be low. Larger queue lengths occur when network congestion causes packets to queue until they are processed.

The Packets Discarded and Packets Errors counters should be very low, and usually zero, on healthy networks.
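These counters can also be sampled from PowerShell with Get-Counter. This is a sketch; the wildcard instance and sampling interval are arbitrary choices.

```powershell
# Sketch: sample the queue, discard, and error counters for all adapters
# every 5 seconds for one minute (12 samples).
Get-Counter -Counter @(
    "\Network Adapter(*)\Output Queue Length",
    "\Network Adapter(*)\Packets Outbound Discarded",
    "\Network Adapter(*)\Packets Outbound Errors",
    "\Network Adapter(*)\Packets Received Discarded",
    "\Network Adapter(*)\Packets Received Errors"
) -SampleInterval 5 -MaxSamples 12
```

Running this in a second console while the file copies are in flight makes it easy to correlate performance dips with discard or error spikes.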

Info

Often the simple tests described in this section will not saturate the high-speed LAN. Creating a large volume of network traffic usually requires multiple clients running in parallel, such as a group of VMs working simultaneously. A process that does this is described in a section below.

Test Storage Performance

This section describes storage performance test procedures using both simple file copies and the Microsoft DiskSpd utility.

File Copy Tests

In this section, we describe a simple storage test: copying large files on an S2D storage cluster node to the CSV folder path.

The direct path uses the local path: C:\ClusterStorage

Copy data on each cluster node to the local path for the volume being tested. The data being written and read will be processed by and distributed across all storage nodes.

Monitor the read/write speed during the file copy for speed and consistency. File Explorer shows a performance graph during copies. Spiky and sporadic performance will likely indicate configuration or health issues.
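One way to put a number on the copy is to time it and compute throughput, as sketched below. C:\TestData\large.vhdx is a placeholder for any large test file, and VMSpace1 is the example volume name used in this article.

```powershell
# Sketch: time a large file copy to the CSV path and compute MB/s.
$file = Get-Item C:\TestData\large.vhdx
$elapsed = Measure-Command {
    Copy-Item -Path $file.FullName `
        -Destination C:\ClusterStorage\VMSpace1\copytest.vhdx
}
"{0:N0} MB/s" -f ($file.Length / 1MB / $elapsed.TotalSeconds)
```

Repeating the copy several times and comparing the results is a quick consistency check: healthy storage should produce similar numbers on each run.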

Review Physical Disks Health

As the data copies are in process, run the following PowerShell command to show the health status of Storage Spaces disk drives:

Get-StoragePool -IsPrimordial $False | Get-PhysicalDisk

This command will create a list of disk drives and the health status of each. The HealthStatus field of each disk should be “Healthy”.

The following PowerShell command shows the physical disk SMART counters:

Get-StoragePool -isPrimordial $false | Get-PhysicalDisk | Get-StorageReliabilityCounter | fl

The following counters should be monitored periodically:

  • ReadErrorsCorrected
  • ReadErrorsTotal
  • ReadErrorsUncorrected
  • WriteErrorsCorrected
  • WriteErrorsTotal
  • WriteErrorsUncorrected

Some errors may be expected. However, error counters that increment quickly or accelerate can indicate a disk that will soon fail critically.
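Building on the command above, a quick sketch to surface only the disks reporting uncorrected errors:

```powershell
# Sketch: list only disks whose uncorrected error counters are non-zero.
Get-StoragePool -IsPrimordial $false | Get-PhysicalDisk |
    Get-StorageReliabilityCounter |
    Where-Object {
        $_.ReadErrorsUncorrected -gt 0 -or $_.WriteErrorsUncorrected -gt 0
    } |
    Select-Object DeviceId, ReadErrorsUncorrected, WriteErrorsUncorrected
```

An empty result is the healthy outcome; any disk that appears here is worth tracking over time for acceleration in its error counts.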

Documentation on Storage Spaces health states: Troubleshoot Storage Spaces Direct health and operational states.

Microsoft Disk Speed Utility

Microsoft has a utility available on GitHub called Disk Speed (DiskSpd). DiskSpd can perform a variety of disk performance benchmark tests and creates a performance report.

Download

Download DISKSPD from GitHub then refer to the DiskSpd Storage Performance Tool documentation.

Download DiskSpd and run this utility on one of the cluster storage nodes. Use the Storage Spaces volume as the target disk to test.

An example command to test random concurrent reads of 4 KB blocks (DiskSpd requires a file target, so testfile.dat is a scratch file it will create on the volume):

diskspd -c2G -b4K -F8 -r -o32 -W60 -d60 -Sh C:\ClusterStorage\VMSpace1\testfile.dat

VMSpace1 is the name of the volume; use your own volume name.
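A complementary write-oriented run might look like the following sketch. The flag values are illustrative choices, not recommendations; see the DiskSpd documentation for details.

```powershell
# Sketch: 64 KB sequential writes, 8 threads, 100% write ratio,
# 60-second duration, software and hardware caching disabled (-Sh),
# latency statistics captured (-L). write-test.dat is a scratch file.
diskspd -c10G -b64K -t8 -w100 -si -o8 -d60 -Sh -L `
    C:\ClusterStorage\VMSpace1\write-test.dat
```

Running both a read-heavy and a write-heavy pass exercises the cache and capacity tiers differently, which helps isolate whether a problem is read-path or write-path related.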

Review Test Results

As the storage performance tests are in process, monitor the following Windows Performance Monitor counters:

  • Cluster CSVFS
  • RDMA Activity
  • Storage Spaces Tier
  • Storage Spaces Virtual Disk
  • Storage Spaces Write Cache
  • Cluster Storage Cache Stores

The counters for Storage Spaces Direct are numerous and complex. Review the counters for reasonable values. The Cluster Storage Cache Stores counters show cache performance.

The RDMA Activity counters report on RDMA health and activity.

More information on storage performance counters is available at Windows Performance Monitor Disk Counters Explained.
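To see which of these counter sets exist on a node and which individual counters they contain, Get-Counter can enumerate them; a sketch:

```powershell
# Sketch: list the S2D-related counter sets and the counters inside them.
Get-Counter -ListSet "Cluster CSVFS", "RDMA Activity", "Storage Spaces*" |
    Select-Object -ExpandProperty Paths
```

The exact set names vary by Windows Server version, so enumerating them first avoids guessing at counter paths when building a monitoring view.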

Workload Stress S2D Cluster – VMFleet

Microsoft provides a software package that creates a workload for S2D hyperconverged systems. VMFleet launches a group of VMs running DiskSpd and places a load on the S2D cluster network and storage subsystems.

VMFleet is part of the DiskSpd package available on GitHub. Instructions on installing and running VMFleet can be found at Leverage VM Fleet Testing the Performance of Storage Space Direct.

VMFleet can be configured to run for an extended time and apply as much of a stress workload as desired. At the end of the tests, VMFleet will return a report of performance calculations including overall storage performance measurements.

Monitor Network Speed with Resource Monitor

Windows Resource Monitor can be launched on any of the storage node desktops.

Run: All Programs → Windows Administrative Tools → Resource Monitor

Select the Network tab to display the network activities for each physical and virtual network interface.

Resource Monitor Network

Monitor Performance and Error Counters with Performance Monitor

The following steps show how to configure Windows Performance Monitor to select and display specific system counters.

Windows Performance Monitor can be launched on any of the storage node desktops.

  1. Start Windows Performance Monitor as Administrator.

Run: All Programs → Windows Administrative Tools → Performance Monitor

Run Performance Monitor as Administrator

  2. Select Performance Monitor.

Performance Monitor

  3. Click the green plus icon to add counters to the Performance Monitor.

After clicking the icon, the Add Counters form will appear.

Add Counter to Performance Monitor

  4. To add counter groups, select the counter group, then click Add.

Performance Monitor Counters

Continue selecting another group or click OK to display all the selected counters.

Info

Individual counters can be selected from a counter group.

  5. The default display is a line graph of each counter.

Performance Monitor Counter Graph

  6. Click the display type pulldown arrow and select Report.

Choose Report in Performance Monitor

The Report layout will display all of the active counters selected.

Performance Monitor Report

