Network protocols like SMB or NFS are actually remote file systems. They allow a client to mount destination storage as if it were a local disk and read and write files on it. These protocols rely on underlying transports like TCP and provide a layer on top that lets your apps believe the server 5,000 miles away is truly just your F: volume.
Because remote networks, latency, and storage added up to a much slower experience than local I/O for the first few decades of computing, file servers that implemented these protocols were stuffed with buffers and caches to squeeze better performance out of craptastic spinning disks, as well as help the servers deal with lots of clients fighting for resources simultaneously – a problem your laptop doesn't have.
With the advent of incredibly high-throughput storage like SSD, NVMe, and NVDIMM and incredibly low-latency networking fabrics like RDMA, these simple SMB servers with simple user workloads morphed into Scale-out File Servers, where applications like SQL and Hyper-V use them as part of a software-defined storage fabric that requires near-perfect resiliency and durability. That also meant we needed to stop relying on caches and start requiring data to commit to the disk, not memory, for safety.
I know what you're thinking right now: “Doggone software devs, playing games with my data.” Well, disks have buffers too! SCSI and modern SATA drives implement “Force Unit Access” (FUA), which guarantees that when an IO is marked for write-through, it lands on true stable storage rather than in the disk's own caches, which are legion thanks to the hardware makers' constant battle of IOPS brochures. Basically, if your drive gets told “you better write this IO and don't reply until it's really written for realsies,” it will.
We first added FUA support for SCSI in Windows Vista. We later added SATA support, and I am here to tell you, dear reader, that we still see SATA disks out there that acknowledge write-through commands and then don't actually write through. So I recommend sticking with commercial, name-brand disks when not using SAS/SCSI storage!
If you suffer from insomnia, I recommend reading more about WRITE DMA FUA EXT (command 3Dh), WRITE DMA QUEUED FUA EXT (command 3Eh), and WRITE MULTIPLE FUA EXT (command CEh).
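For the curious, here's a minimal Python sketch of what “marked for write-through” means at the device command level. It assumes the SCSI WRITE(10) command layout from the SBC command set, where FUA is bit 3 of byte 1, and simply lists the three ATA FUA opcodes named above. The helper names are hypothetical, and nothing here talks to a real disk.

```python
import struct

# ATA write commands that carry Force Unit Access (the opcodes named above).
ATA_FUA_WRITE_OPCODES = {
    0x3D: "WRITE DMA FUA EXT",
    0x3E: "WRITE DMA QUEUED FUA EXT",
    0xCE: "WRITE MULTIPLE FUA EXT",
}

SCSI_WRITE_10 = 0x2A  # WRITE(10) operation code
FUA_BIT = 0x08        # byte 1, bit 3 of a WRITE(10) CDB: Force Unit Access


def build_write10_cdb(lba: int, blocks: int, fua: bool) -> bytes:
    """Build a 10-byte WRITE(10) CDB (hypothetical helper, for illustration only)."""
    if not 0 <= lba <= 0xFFFFFFFF:
        raise ValueError("WRITE(10) LBA must fit in 32 bits")
    if not 0 <= blocks <= 0xFFFF:
        raise ValueError("WRITE(10) transfer length must fit in 16 bits")
    flags = FUA_BIT if fua else 0
    # opcode, flags byte, big-endian LBA, group number, transfer length, control
    return struct.pack(">BBIBHB", SCSI_WRITE_10, flags, lba, 0, blocks, 0)


def is_fua_write(cdb: bytes) -> bool:
    """True if a WRITE(10) CDB says the data must reach stable media before completion."""
    return cdb[0] == SCSI_WRITE_10 and bool(cdb[1] & FUA_BIT)


if __name__ == "__main__":
    cdb = build_write10_cdb(lba=2048, blocks=8, fua=True)
    print(cdb.hex(), is_fua_write(cdb))
    print(ATA_FUA_WRITE_OPCODES[0x3D])
```

The same write without the FUA bit is byte-for-byte identical except for byte 1, which is why a drive that ignores that one bit can quietly complete from its cache.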
Ensuring write-through on Windows failover clusters
For organizations using those Scale-out File Servers for software-defined datacenter workloads like SQL, you get write-through for free as soon as you create a Windows Server 2012, 2012 R2, 2016, or 2019 failover cluster with the File Server resource configured. The Scale-out File Server (SoFS) cluster role enables the “Continuous Availability” (CA) flag on every share you create, providing write-through as part of a larger set of durability and reliability guarantees for your application data workload. Combined with features like Transparent Failover and Persistent Handles, this means a dead cluster node will not lead to a crashed workload: IOs are persisted and handed over to another node, all while getting FUA.
We also enable the CA share flag on shares in regular file server clusters, but admins often disable it there for performance reasons, the same way they might avoid SoFS for compatibility reasons. Remember when I wrote the Shakespearean prose to scale out or not to scale out? CA is not designed for copying files; it is designed for handling IO on files that are opened and then modified indefinitely, because they are virtual machines or databases.
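If you want to confirm from a packet capture that a share is offering CA, the server advertises it as a share capability bit in the SMB2 TREE_CONNECT response. Below is a minimal Python sketch, assuming the response layout and capability values from the MS-SMB2 specification (section 2.2.10); the function names and sample bytes are hypothetical, and the input is the response body after the 64-byte SMB2 header.

```python
import struct
from typing import List

# Share capability bits in the SMB2 TREE_CONNECT response (per MS-SMB2 2.2.10).
SHARE_CAPS = {
    0x00000008: "DFS",
    0x00000010: "CONTINUOUS_AVAILABILITY",
    0x00000020: "SCALEOUT",
    0x00000040: "CLUSTER",
    0x00000080: "ASYMMETRIC",
    0x00000100: "REDIRECT_TO_OWNER",
}


def share_capabilities(tree_connect_response: bytes) -> List[str]:
    """Return the capability names set in a TREE_CONNECT response body."""
    # StructureSize(2) ShareType(1) Reserved(1) ShareFlags(4) Capabilities(4) MaximalAccess(4)
    structure_size, _type, _reserved, _flags, caps, _access = struct.unpack_from(
        "<HBBIII", tree_connect_response, 0
    )
    if structure_size != 16:
        raise ValueError(f"unexpected StructureSize {structure_size}, expected 16")
    return [name for bit, name in SHARE_CAPS.items() if caps & bit]


def is_continuously_available(tree_connect_response: bytes) -> bool:
    return "CONTINUOUS_AVAILABILITY" in share_capabilities(tree_connect_response)


if __name__ == "__main__":
    # Made-up response for a disk share (ShareType 0x01) advertising CA and scale-out.
    sample = struct.pack("<HBBIII", 16, 0x01, 0, 0, 0x10 | 0x20, 0x001F01FF)
    print(share_capabilities(sample), is_continuously_available(sample))
```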
Forcing write-through from Windows 10 clients and Windows Server 2019
That's all fine for a specific workload type, but what if you want to force write-through from a client, regardless of your Windows Server OS version and configuration? Starting in Windows 10 1809 and Windows Server 2019, I've got an answer for you:
NET USE /WRITETHROUGH
When you map a UNC path (with or without a drive letter) to a remote Windows Server using any flavor of SMB and provide the new flag, the client sends along the write-through flag for any files you create or modify over that session. Now an admin can specify, in users' logon scripts or on their own mapped drives, that any IO on that mapping bypasses those caches and is guaranteed to reach stable storage, for maximum durability when you don't trust the reliability of your servers. And you'll certainly find out how fast your drives really are!
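If your logon scripts are driven from Python rather than a batch file, a sketch like this one builds the mapping command with the new flag. Only NET USE and /WRITETHROUGH come from above; the helper names and the \\fs01\data share are hypothetical, and the flag only exists on Windows 10 1809, Windows Server 2019, or later clients.

```python
import subprocess
import sys
from typing import List, Optional


def net_use_args(unc_path: str, drive: Optional[str] = None,
                 writethrough: bool = True) -> List[str]:
    """Build a NET USE command line for mapping a share (hypothetical helper)."""
    if not unc_path.startswith("\\\\"):
        raise ValueError(r"expected a UNC path such as \\server\share")
    args = ["net", "use"]
    if drive:                         # omit to map the UNC path without a letter
        args.append(drive)
    args.append(unc_path)
    if writethrough:
        args.append("/WRITETHROUGH")  # Windows 10 1809 / Server 2019 and later
    return args


def map_share(unc_path: str, drive: Optional[str] = None,
              writethrough: bool = True) -> None:
    """Run NET USE; only meaningful on a Windows client."""
    if sys.platform != "win32":
        raise OSError("NET USE is only available on Windows")
    subprocess.run(net_use_args(unc_path, drive, writethrough), check=True)


if __name__ == "__main__":
    print(" ".join(net_use_args(r"\\fs01\data", "F:")))
```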
Let's see it in action. First, I map a drive normally and copy a single 10 GB file, then 3,100 small files that add up to 10 GB. I use Robocopy for all my tests because it reports exact copy times and supports efficiency options like multi-threaded copies; stay away from File Explorer for any testing. Hell, stay away from it for copies the rest of the time too!
My single big file took 48 seconds, and my many small files took a bit longer. A large, multi-directory batch of small files copied to a server carries tons of overhead that a single large sequential write does not, so just to get that time close, I still needed to add multi-threading (another reason to use Robocopy instead of File Explorer every time!).
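To script the same comparison, here's a minimal Python sketch that builds the Robocopy command lines for the two tests and times a run. The paths, file name, and thread count are hypothetical placeholders (Robocopy's /MT defaults to 8 threads and accepts 1 to 128); /E copies subdirectories and /J requests unbuffered I/O. Robocopy's own summary already reports elapsed time, so the timer is only a convenience, and exit codes of 8 or higher mean something failed.

```python
import subprocess
import time
from typing import List, Optional


def robocopy_args(src: str, dst: str, files: Optional[List[str]] = None,
                  recurse: bool = False, threads: Optional[int] = None,
                  unbuffered: bool = False) -> List[str]:
    """Build a Robocopy command line (hypothetical helper)."""
    args = ["robocopy", src, dst]
    if files:
        args.extend(files)            # specific files; Robocopy defaults to *.*
    if recurse:
        args.append("/E")             # include subdirectories, even empty ones
    if threads is not None:
        if not 1 <= threads <= 128:
            raise ValueError("/MT accepts 1 to 128 threads")
        args.append(f"/MT:{threads}")
    if unbuffered:
        args.append("/J")             # unbuffered I/O, meant for very large files
    return args


def timed_robocopy(args: List[str]) -> float:
    """Run Robocopy and return wall-clock seconds; exit codes 8+ signal failures."""
    start = time.perf_counter()
    result = subprocess.run(args)
    elapsed = time.perf_counter() - start
    if result.returncode >= 8:
        raise RuntimeError(f"robocopy failed with exit code {result.returncode}")
    return elapsed


if __name__ == "__main__":
    big = robocopy_args(r"C:\test\big", r"F:\big", files=["big.bin"])
    small = robocopy_args(r"C:\test\small", r"F:\small", recurse=True, threads=16)
    print(" ".join(big))
    print(" ".join(small))
```

Running the same pair once against a normal mapping and once against a /WRITETHROUGH mapping gives a like-for-like comparison.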
Now let's try that again with write-through enabled:
We took a hit in copy time. Still, that may be a fair price for knowing some 125 GB file copy wasn't corrupted by a server crash during the last couple of seconds of IO, the way Murphy always guarantees.
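To put numbers on that hit, it helps to compare average throughput and relative slowdown rather than raw seconds. A minimal Python sketch, assuming decimal units (1 GB = 1,000 MB) and using only the 48-second baseline from the single-file test; plug in your own write-through time.

```python
def throughput_mb_per_s(size_gb: float, seconds: float) -> float:
    """Average throughput in MB/s, treating 1 GB as 1,000 MB."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return size_gb * 1000 / seconds


def slowdown_percent(baseline_s: float, writethrough_s: float) -> float:
    """Extra time the write-through run took, as a percentage of the baseline."""
    if baseline_s <= 0:
        raise ValueError("baseline must be positive")
    return (writethrough_s - baseline_s) / baseline_s * 100


baseline = throughput_mb_per_s(10, 48)   # the single 10 GB file in 48 seconds
print(f"{baseline:.1f} MB/s, {baseline * 8 / 1000:.2f} Gbit/s")  # prints "208.3 MB/s, 1.67 Gbit/s"
```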
How it looks on the wire
The difference in SMB is quite simple: a single flag is now set on every SMB2 CREATE request. This tells the server that all subsequent writes to the file require the storage to support and use FUA.
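If you want to check a capture yourself, the flag in question should be FILE_WRITE_THROUGH (0x00000002) in the CreateOptions field of the CREATE request. Here's a minimal Python sketch that reads CreateOptions out of a CREATE request body (the bytes after the 64-byte SMB2 header) and decodes a few common bits. It assumes the field offset and bit values from MS-SMB2 section 2.2.13; the helper names are hypothetical and the flag table is deliberately incomplete.

```python
import struct
from typing import List

# A subset of SMB2 CREATE CreateOptions bits (per MS-SMB2 2.2.13); not exhaustive.
CREATE_OPTIONS = {
    0x00000001: "FILE_DIRECTORY_FILE",
    0x00000002: "FILE_WRITE_THROUGH",
    0x00000004: "FILE_SEQUENTIAL_ONLY",
    0x00000008: "FILE_NO_INTERMEDIATE_BUFFERING",
    0x00000040: "FILE_NON_DIRECTORY_FILE",
    0x00001000: "FILE_DELETE_ON_CLOSE",
}
FILE_WRITE_THROUGH = 0x00000002
CREATE_STRUCTURE_SIZE = 57   # fixed StructureSize value of a CREATE request
CREATE_OPTIONS_OFFSET = 40   # byte offset of CreateOptions in the request body


def create_options(create_request: bytes) -> int:
    """Read CreateOptions from a CREATE request body (after the SMB2 header)."""
    (structure_size,) = struct.unpack_from("<H", create_request, 0)
    if structure_size != CREATE_STRUCTURE_SIZE:
        raise ValueError(f"not a CREATE request: StructureSize={structure_size}")
    (options,) = struct.unpack_from("<I", create_request, CREATE_OPTIONS_OFFSET)
    return options


def decode_create_options(options: int) -> List[str]:
    return [name for bit, name in CREATE_OPTIONS.items() if options & bit]


def requests_write_through(create_request: bytes) -> bool:
    return bool(create_options(create_request) & FILE_WRITE_THROUGH)


if __name__ == "__main__":
    # Made-up 56-byte fixed portion with only StructureSize and CreateOptions filled in.
    body = bytearray(56)
    struct.pack_into("<H", body, 0, CREATE_STRUCTURE_SIZE)
    struct.pack_into("<I", body, CREATE_OPTIONS_OFFSET, 0x00000042)
    print(decode_create_options(create_options(bytes(body))))
    print(requests_write_through(bytes(body)))
```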
Nothing else needs to be done and the rest of the SMB conversation will look normal.
As you saw, requiring write-through carries a performance hit, and it can be small or large. Your mileage will vary here: I am using some pretty quick SSD storage and low-latency 10Gb networking without congestion, and you might not be so fortunate. If you're feeling fancy, you can use the Robocopy /J (unbuffered I/O) option on very large files (tens of GB or larger) to offset the hit a bit.
Test test test!
The odds of needing write-through for a normal user doing normal user things are pretty low: their files are small, apps like Office often keep local copies, and the window in which a server has acknowledged an IO that is still sitting in its buffer, then crashes before committing it to disk, is really quite small. However, the overhead for them is pretty light too; unless they are copying very large files all the time, they typically won't see a huge downside to you mapping their drives with write-through.
Until next time,
Ned “write on!” Pyle