By James Kehr, Networking Support Escalation Engineer
There are, as of this writing, five Container network types in Windows: NAT, Transparent, L2bridge, Overlay, and L2tunnel.
This part of the article series will cover the NAT network type. Part 6, the conclusion, will cover Transparent and L2bridge, plus Hyper-V isolation. Overlay and L2tunnel will not be discussed. Overlay because Docker needs to be in swarm mode for that to work, and I’m not Docker-savvy enough to set up swarm mode. L2tunnel because that is exclusive to the Microsoft Cloud Stack, so I’ll let the Cloud Stack folks write about that.
Which brings us back to the Container NAT network.
- Auto-magically set up when Containers are installed.
- Works well out-of-the-box.
- Doesn’t require any configuration for basic dev workload.
- Can require an insane amount of management for production needs, especially when running multiple Containers on a single host for the same service (like multiple web applications).
- Not good for latency-sensitive applications, as the NBLs (see Part 2) have a longer travel distance inside of Windows, plus a trip through WinNAT.
- Higher resource needs on the host.
- Did I mention the management nightmare?
What is a NAT?
NAT stands for Network Address Translation. It’s the technology that allows two desktops, a laptop, four tablets, and three smartphones to get Internet access with the single public Internet address provided by your ISP. NAT works by taking your inside IP address, usually something in the 192.168.1.xxx IP address range, and changing it to the single outside IP address when you need to reach an Internet resource.
NAT does this by creating a session table. The session table is a collection of inside to outside IP address combinations. For example, when going to your preferred search engine – which is, of course, Bing.com – the inside IP address and port of your computer gets matched to an available outside IP address and port combination, with the destination address (a Bing.com address like 188.8.131.52) and port (TCP 443) added to complete the uniqueness of the session. This is what allows NAT to transfer data back and forth between multiple systems using only a single public IP address.
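A session table can be sketched in a few lines of PowerShell. This is a toy model for illustration only, not how a real NAT is implemented: `New-NatSession`, the 203.0.113.10 outside address, the starting port, and the destination address are all hypothetical.

```powershell
$nat = @{
    OutsideIP = '203.0.113.10'   # hypothetical single public address
    NextPort  = 1024             # hypothetical first free outside port
    Sessions  = @{}              # flow key -> outside "IP:port"
}

function New-NatSession {
    param([string]$Inside, [string]$Destination)
    # The inside endpoint plus the destination endpoint identifies the flow.
    $key = "${Inside}->${Destination}"
    if (-not $nat.Sessions.ContainsKey($key)) {
        # New flow: hand out the next outside port and remember the pairing.
        $nat.Sessions[$key] = '{0}:{1}' -f $nat.OutsideIP, $nat.NextPort
        $nat.NextPort += 1
    }
    return $nat.Sessions[$key]
}

New-NatSession -Inside '192.168.1.10:49161' -Destination '198.51.100.20:443'   # 203.0.113.10:1024
New-NatSession -Inside '192.168.1.11:49161' -Destination '198.51.100.20:443'   # 203.0.113.10:1025
New-NatSession -Inside '192.168.1.10:49161' -Destination '198.51.100.20:443'   # 203.0.113.10:1024 again
```

Two inside machines share one outside address, and a repeated flow reuses its existing entry instead of consuming another port.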
Time for a pretty diagram.
The same basic process happens inside of Windows, except the NAT portion is handled by a subsystem called WinNAT. Think of it like a virtual home router sitting between the Windows host and Containers. The inside Container address gets turned into an outside address, the pairing is written to a session table inside of Windows, and that allows Containers to reach the world through WinNAT-based networking. This is needed because addresses on the inside of the NAT are isolated, meaning nothing from the outside can reach the inside, and nothing from the inside can reach the outside, without WinNAT’s permission.
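The NetNat PowerShell module offers one way to peek at this state on a live host. This is a minimal sketch, assuming an elevated prompt on a Container host with the default NAT network in place. What comes back depends on the Windows version and on whether any Container traffic is active at that moment.

```powershell
# List the NAT instances WinNAT knows about.
Get-NetNat

# Dump the current session table: one entry per translated flow.
# An idle host may return nothing.
Get-NetNatSession
```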
Because WinNAT is a newer Windows subsystem it has an ETW provider, Microsoft-Windows-WinNAT. Modify the provider list from the PowerShell steps in Part 2, run a capture on the Container host, and you can see all the address translation magic happening.
    # the primary WNV ETW provider.
    [array]$providerList = 'Microsoft-Windows-Hyper-V-VmSwitch', 'Microsoft-Windows-WinNAT'
Here’s an example of some WinNAT traffic. Look closely at the events and you’ll spot the first sign of Compartment IDs.
|Microsoft_Windows_WinNat||TCP binding created. Internal transport addr: 172.23.183.39:49161 (CompartmentId 1), External transport addr 192.168.1.70:1227, SessionCount: 0, Configured: False|
|Microsoft_Windows_WinNat||TCP binding session count updated. Internal transport addr: 172.23.183.39:49161 (CompartmentId 1), External transport addr 192.168.1.70:1227, SessionCount: 1, Configured: False|
|Microsoft_Windows_WinNat||TCP session created. Internal source transport addr: 172.23.183.39:49161 (CompartmentId 1), Internal dest transport addr: 184.108.40.206:443, External source transport addr 192.168.1.70:1227, External dest transport addr 220.127.116.11:443, Lifetime: 6 seconds, TcpState:Closed/NA|
|Microsoft_Windows_WinNat||TCP session state updated. Internal source transport addr: 172.23.183.39:49161 (CompartmentId 1), Internal dest transport addr: 18.104.22.168:443, External source transport addr 192.168.1.70:1227, External dest transport addr 22.214.171.124:443, Lifetime: 120 seconds, TcpState: Internal SYN received|
|Microsoft_Windows_WinNat||TCP session lifetime updated. Internal source transport addr: 172.23.183.39:49161 (CompartmentId 1), Internal dest transport addr: 126.96.36.199:443, External source transport addr 192.168.1.70:1227, External dest transport addr 188.8.131.52:443, Lifetime: 120 seconds, TcpState: Internal SYN received|
|Microsoft_Windows_WinNat||NAT translated and forwarded IPv4 TCP packet which arrived over INTERNAL interface 8 in compartment 1 to interface 2 in compartment 1.|
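When a capture holds many of these events, pulling the addresses out with a regular expression can make them easier to compare. The following is a sketch that assumes the message text looks exactly like the "TCP binding created" sample above. The pattern and the `$binding` variable are my own illustration, not part of any WinNAT tooling.

```powershell
# Sample message copied from the capture above.
$message = 'TCP binding created. Internal transport addr: 172.23.183.39:49161 (CompartmentId 1), External transport addr 192.168.1.70:1227, SessionCount: 0, Configured: False'

# Named groups: internal endpoint, compartment ID, external endpoint.
$pattern = 'Internal transport addr: (?<internal>[\d.]+:\d+) \(CompartmentId (?<compartment>\d+)\), External transport addr (?<external>[\d.]+:\d+)'

$binding = $null
if ($message -match $pattern) {
    # -match fills the automatic $Matches table with the named groups.
    $binding = [pscustomobject]@{
        Internal    = $Matches['internal']
        Compartment = [int]$Matches['compartment']
        External    = $Matches['external']
    }
}
$binding
```

The result is an object pairing the Container-side endpoint (172.23.183.39:49161) with the host-side endpoint (192.168.1.70:1227), plus the compartment the Container address lives in.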
The Binding Issue
One of the benefits of Containers is that you can run multiple Containers per host. Dozens to hundreds of Containers on a single host, depending on the workload and host hardware. The main problem with the NAT network is binding. Network-based services normally have a common network port. Web sites and services use TCP ports 80 and 443. SQL Server uses TCP port 1433 as the default. And so on.
The problem: Each port to IP address binding must be unique.
Let’s say our example host is running 100 containers: 75 web sites, 10 Minecraft servers, and 15 SQL Server Containers. All the web-based Containers are going to want to use TCP ports 80 and 443 because browsers are built to use those ports. The SQL Servers will use TCP 1433 by default. Minecraft, TCP port 25565.
NAT’ing means that the host only needs a single IP address for outgoing Container traffic. All 100 Containers could connect to anything using just that one host address. For clients to reach a Container service, however, something called port forwarding is needed. Port forwarding allows devices on the outside of the NAT to reach a service on the inside of the NAT, and works like a NAT in reverse.
To host 10 Minecraft servers on port 25565, the Container host needs 10 IP addresses. The web site Containers need 75 IP addresses, and the SQL Servers need 15 for public access. While there can be some overlap, each time you need to bind port 80, or 1433, or 25565 to a new Container, the host needs another IP address. This goes back to the uniqueness requirement for a binding. Port 80 cannot be bound twice to the same IP address.
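The uniqueness rule is easy to model. This is a minimal sketch, not how Windows tracks bindings: `Add-Binding` is a hypothetical helper, and the 192.168.1.x host addresses are made up for illustration.

```powershell
# Every claimed "IP:port" pair goes into a set; a HashSet refuses duplicates.
$bindings = [System.Collections.Generic.HashSet[string]]::new()

function Add-Binding {
    param([string]$IPAddress, [int]$Port)
    # HashSet.Add returns $false when the pair is already present.
    return $bindings.Add("${IPAddress}:$Port")
}

Add-Binding -IPAddress '192.168.1.70' -Port 80     # True: first web Container
Add-Binding -IPAddress '192.168.1.70' -Port 80     # False: port 80 is taken on this address
Add-Binding -IPAddress '192.168.1.71' -Port 80     # True: a second address frees up port 80
Add-Binding -IPAddress '192.168.1.70' -Port 1433   # True: different port, same address

# Containers per service from the 100-Container example. With as much overlap
# as possible, the busiest port decides the minimum number of host addresses.
$containersPerService = @{ Web = 75; SqlServer = 15; Minecraft = 10 }
($containersPerService.Values | Measure-Object -Maximum).Maximum   # 75
```

Even in the best case, where every address carries a web site and some also carry a SQL Server or Minecraft binding, the example host still needs 75 addresses because 75 Containers all want port 80.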
Adding a bunch of IP addresses to a server isn’t the complex part. Getting those unique IP addresses to work with NAT requires an additional feature called a PortProxy. The PortProxy feature is a type of port forwarding. This feature can forward network traffic from one IP:port combination to a different IP:port combination inside of Windows. This allows an administrator to use PortProxy to route traffic to an individual Container, giving the NAT type network the ability to host many Containers publicly.
This practice is highly discouraged in a production environment.
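For a test host only, a PortProxy rule of the kind described above might look like the following. This is a sketch under assumptions: 192.168.1.71 is a hypothetical extra address added to the host, 172.23.183.39 stands in for a Container's address on the NAT network, and the commands need an elevated prompt.

```powershell
# Forward traffic arriving on the extra host address, port 80,
# to port 80 on the Container's inside address.
netsh interface portproxy add v4tov4 listenaddress=192.168.1.71 listenport=80 connectaddress=172.23.183.39 connectport=80

# Review every PortProxy rule currently configured on the host.
netsh interface portproxy show all

# Remove the rule when the Container goes away.
netsh interface portproxy delete v4tov4 listenaddress=192.168.1.71 listenport=80
```

Each public Container needs its own rule like this, and nothing ties the rule to the Container's lifetime, so stale entries have to be cleaned up by hand.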
Not Recommended for Production
The first problem with NAT in production is the double network stack connection. The PortProxy creates one network connection on the host. The second connection is created on the Container itself (which is also the host, kind of). Ultimately, there is one connection between the client and host, and a second between the host and the Container. While this technically works, it’s an unnecessary mess.
Then there’s the administrative nightmare trying to keep all the PortProxy settings straight, which can be a chore since the only mechanism that exposes PortProxy is the legacy netsh command. Perhaps the biggest issue is the latency. All that bouncing around inside of Windows adds precious microseconds to each packet. That doesn’t seem like a lot, but it adds up fast.
This is an example of a client connecting to a host with a PortProxy and a NAT Container network. The TCP SYN arrives from the client and a TCP/IP connection is created on the host.
Followed by the second connection from the host to the Container. The traffic goes from the host, across the vmSwitch, bypassing WinNAT, and directly to the Container. Or as directly as the NBL can travel through the vmSwitch. A second network stack connection is then established on the Container.
From packet arrival on the host to the packet arriving at the Container’s network stack is about 0.6783 milliseconds, or 678 microseconds, in this example. This makes NAT + PortProxy about 15-20 times slower than standard vmSwitch traversal. Which is another reason why NAT is good for testing, less so for production purposes.
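Working backward from those numbers gives a rough idea of the baseline. This is simple arithmetic on the figures quoted above, not a separate measurement.

```powershell
$natPortProxyUs = 678.3   # 0.6783 ms from the example above, in microseconds

# "15-20 times slower" implies a standard vmSwitch traversal in this range:
[math]::Round($natPortProxyUs / 20, 1)   # 33.9 microseconds
[math]::Round($natPortProxyUs / 15, 1)   # 45.2 microseconds
```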
The final article in this series will cover the preferred production Container network type, transparent mode. I’ll talk briefly about L2Bridge mode, because it’s just transparent with a catch, and end the series with a brief explanation about what happens when you throw Hyper-V Container isolation into the mix.