How to deploy gMSA on AKS with Terraform

The other day I posted a blog on how to deploy an AKS cluster that is ready for Windows workloads using Terraform. Today, I wanted to expand that to include gMSA, which is a highly requested feature from Windows customers running containers on AKS. Obviously, the complexity of the Terraform template grows a lot, so this blog post will provide the details on what is needed for that to work.

gMSA requirements and items outside of Terraform scope

Before diving into the Terraform template, it's important to review the gMSA pre-requisites and what is not part of the scope of Terraform when deploying the Azure resources:

  • Azure resources: As part of the gMSA environment, we need different Azure resources, such as an AKS cluster, an Azure Virtual Network, an Azure Key Vault, an Azure Managed Identity, access for the Managed Identity to the Azure Key Vault, a secret in the Azure Key Vault containing the standard user that retrieves the gMSA, and a Domain Controller VM. All of these will be created using the Terraform template.
  • Non-Azure resources: To use gMSA, you will need to manually configure Active Directory on the Domain Controller. This includes installing the AD role, creating a new forest with a root domain, and enabling gMSA in AD via the KDS feature. You also need to install the gMSA credential spec on your AKS cluster. These two operations are very sensitive, and the credential spec needs to be configured according to your environment.
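For reference, the AD-side preparation usually boils down to PowerShell along these lines, run on the DC VM. All domain and account names below are placeholders for your environment, and the backdated KDS root key effective time is a lab shortcut only (in production, wait the ~10 hours for replication):

```powershell
# Run on the DC VM; all names below are placeholders for your environment
Install-WindowsFeature AD-Domain-Services -IncludeManagementTools

# Create a new forest with a root domain (the VM reboots afterwards)
Install-ADDSForest -DomainName "contoso.local"

# Enable gMSA in AD by creating the KDS root key
# (backdated effective time skips the replication wait - lab use only)
Add-KdsRootKey -EffectiveTime ((Get-Date).AddHours(-10))

# Create the gMSA itself, plus the standard user account that will be
# stored in Azure Key Vault to retrieve it
New-ADServiceAccount -Name "gmsaaks" -DNSHostName "gmsaaks.contoso.local" `
    -PrincipalsAllowedToRetrieveManagedPassword "gmsa-retrievers"
New-ADUser -Name "gmsauser" -AccountPassword (Read-Host -AsSecureString) -Enabled $true
```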

A few notes on the Terraform template:

  1. The template deploys a Domain Controller. If your environment has a Domain Controller with Active Directory configured, you can remove this section of the Terraform template. Keep in mind that your AKS cluster needs to be configured with the IP address of the DC, so you will need to change that in the template. Also, make sure you read my other blog post with networking and AD considerations for gMSA on AKS.
  2. The script uses the same username and password for the Windows nodes on AKS and the Domain Controller. This is just to simplify the deployment; there's no need to use the same credentials, and you can update the template to use different ones.
  3. The standard user account stored in Azure Key Vault doesn't exist in AD at the moment this script runs – the DC is being created by the script itself. After the deployment, make sure you create the user account in AD with the same username and password you provided when you deployed the template.

Since this is a more complex Terraform template, I invite you to collaborate on it and if you see an opportunity for improvement, please send your suggestions!

gMSA on AKS Terraform template

The Terraform deployment has two files. The main.tf file contains the resources to be deployed. The variables.tf file contains the variables used during the deployment. Note that some of the variables' values are not set in the file, both because you need to define them for your deployment and because some are sensitive, such as passwords.
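Since several variables have no default, one convenient way to supply them (besides -var flags or interactive prompts) is a terraform.tfvars file in the same folder. The values below are placeholders. Note that the Key Vault secret is built as netbios_name + gmsa_username + ":" + password, and the gMSA plugin expects a domain\user:password string, so include the backslash in the NetBIOS value accordingly:

```hcl
# terraform.tfvars - placeholder values only; never commit real passwords
win_username                  = "azureuser"
win_userpass                  = "<windows-admin-password>"
Domain_DNSName                = "contoso.local"
netbios_name                  = "CONTOSO\\" # trailing backslash for domain\user format
SafeModeAdministratorPassword = "<safe-mode-password>"
gmsa_username                 = "gmsauser"
gmsa_userpassword             = "<standard-user-password>"
```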

Here is the main.tf file:

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "=3.55.0"
    }
  }
}
data "azurerm_client_config" "current" {}
data "azurerm_subscription" "current" {}
provider "azurerm" {
  features {
    key_vault {
      purge_soft_delete_on_destroy    = true
      recover_soft_deleted_key_vaults = false
    }
  }
}
#Creates Azure Resource Group
resource "azurerm_resource_group" "rg" {
  name     = var.resource_group
  location = var.location
}
#Creates Azure User Assigned Managed Identity
resource "azurerm_user_assigned_identity" "managed_identity" {
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  name                = "gmsami"
}
#Creates Azure Key Vault
resource "azurerm_key_vault" "akv" {
  name                        = "viniapgmsatest"
  location                    = azurerm_resource_group.rg.location
  resource_group_name         = azurerm_resource_group.rg.name
  tenant_id                   = data.azurerm_client_config.current.tenant_id
  soft_delete_retention_days  = 90
  purge_protection_enabled    = false
  sku_name = "standard"
}
#Assign reader role to MI on Azure Key Vault
resource "azurerm_role_assignment" "mi_akv_reader" {
  scope                = azurerm_key_vault.akv.id
  role_definition_name = "Reader"
  principal_id         = azurerm_user_assigned_identity.managed_identity.principal_id
}
#Define AKV access policy for MI
resource "azurerm_key_vault_access_policy" "akvpolicy" {
  key_vault_id = azurerm_key_vault.akv.id
  tenant_id    = data.azurerm_client_config.current.tenant_id
  object_id    = azurerm_user_assigned_identity.managed_identity.principal_id
  secret_permissions = [
    "Get"
  ]
}
#Define AKV access for terraform session
resource "azurerm_key_vault_access_policy" "tfpolicy" {
  key_vault_id = azurerm_key_vault.akv.id
  tenant_id    = data.azurerm_client_config.current.tenant_id
  object_id    = data.azurerm_client_config.current.object_id
  secret_permissions = [
    "Get",
    "List",
    "Set"
  ]
}
#Creates the secret on Azure Key Vault (careful: this is the standard user on your AD)
resource "azurerm_key_vault_secret" "gmsa_secret" {
  name         = "gmsasecret"
  value        = "${var.netbios_name}${var.gmsa_username}:${var.gmsa_userpassword}"
  key_vault_id = azurerm_key_vault.akv.id
}
#Creates Azure Virtual Network
resource "azurerm_virtual_network" "vnet" {
  name                = "gmsavnet"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  address_space       = ["10.0.0.0/16","10.1.0.0/26"]
}
#Creates the gMSA Subnet - both pods and Domain Controller will use this subnet
resource "azurerm_subnet" "gmsasubnet" {
  name                 = "gmsasubnet"
  resource_group_name  = azurerm_resource_group.rg.name
  virtual_network_name = azurerm_virtual_network.vnet.name
  address_prefixes     = ["10.0.0.0/16"]
}
#Optional: Creates the Azure Bastion subnet for RDP into DC01
resource "azurerm_subnet" "AzureBastionSubnet" {
  name                 = "AzureBastionSubnet"
  resource_group_name  = azurerm_resource_group.rg.name
  virtual_network_name = azurerm_virtual_network.vnet.name
  address_prefixes     = ["10.1.0.0/26"]
}
#Creates a vNIC for the DC VM - remove this if you have an existing DC
resource "azurerm_network_interface" "dc01_nic" {
  name                = "dc01_nic"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  ip_configuration {
    name                          = "dc01_nic"
    subnet_id                     = azurerm_subnet.gmsasubnet.id
    private_ip_address_allocation = "Dynamic"
  }
}
#Creates the DC VM - remove this if you have an existing VM
#You need to connect to this VM and finish the Active Directory configuration
resource "azurerm_windows_virtual_machine" "dc01" {
  name                = "DC01"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  size                = "Standard_D4s_v3"
  admin_username      = var.win_username
  admin_password      = var.win_userpass
  network_interface_ids = [
    azurerm_network_interface.dc01_nic.id
  ]
  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
  }
  source_image_reference {
    publisher = "MicrosoftWindowsServer"
    offer     = "WindowsServer"
    sku       = "2022-Datacenter"
    version   = "latest"
  }
}
#Creates AKS cluster with Windows profile and gMSA enabled, and uses existing vNet
#This depends on the DC01 VM, as we need its IP as the primary DNS server for the Windows nodes
resource "azurerm_kubernetes_cluster" "aks" {
  name                = "ContosoCluster"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  dns_prefix = "contosocluster"
  default_node_pool {
    name           = "lin"
    node_count     = var.node_count_linux
    vm_size        = "Standard_D2_v2"
    vnet_subnet_id = azurerm_subnet.gmsasubnet.id
  }
  windows_profile {
    admin_username = var.win_username
    admin_password = var.win_userpass
    gmsa {
      dns_server = "10.0.0.4"
      root_domain = var.Domain_DNSName
    }
  }
  network_profile {
    network_plugin = "azure"
    service_cidr = "10.240.0.0/16"
    dns_service_ip = "10.240.0.10"
  }
  identity {
    type         = "SystemAssigned"
  }
  depends_on = [
    azurerm_windows_virtual_machine.dc01
   ]
}
#Creates Windows node pool on AKS cluster
resource "azurerm_kubernetes_cluster_node_pool" "win" {
  name                  = "wspool"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.aks.id
  vm_size               = "Standard_D4s_v3"
  node_count            = var.node_count_windows
  os_type               = "Windows"
}
output "kube_config" {
  value = azurerm_kubernetes_cluster.aks.kube_config_raw
  sensitive = true
}
#Assigns the User assigned Managed Identity to the Windows node pool
resource "null_resource" "identity_assign" {
  provisioner "local-exec" {
    command = "az vmss identity assign -g MC_${azurerm_resource_group.rg.name}_${azurerm_kubernetes_cluster.aks.name}_${azurerm_resource_group.rg.location}  -n aks${azurerm_kubernetes_cluster_node_pool.win.name} --identities /subscriptions/${data.azurerm_subscription.current.subscription_id}/resourcegroups/${azurerm_resource_group.rg.name}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/${azurerm_user_assigned_identity.managed_identity.name}"
  }
  depends_on = [
    azurerm_kubernetes_cluster_node_pool.win
   ]
}
#Update the VMSS instances
resource "null_resource" "vmss_update" {
  provisioner "local-exec" {
    command = "az vmss update-instances -g MC_${azurerm_resource_group.rg.name}_${azurerm_kubernetes_cluster.aks.name}_${azurerm_resource_group.rg.location}  -n aks${azurerm_kubernetes_cluster_node_pool.win.name} --instance-ids *"
  }
  depends_on = [
    null_resource.identity_assign
   ]
}
#Optional: Creates a public IP address for the Azure Bastion host
resource "azurerm_public_ip" "bastion_ip" {
  name                = "bastionip"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  allocation_method   = "Static"
  sku                 = "Standard"
}
#Optional: Creates a Bastion Host to connect to the DC VM via RDP
resource "azurerm_bastion_host" "gmsa_dc_bastion" {
  name                = "gmsabastion"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  ip_configuration {
    name                 = "configuration"
    subnet_id            = azurerm_subnet.AzureBastionSubnet.id
    public_ip_address_id = azurerm_public_ip.bastion_ip.id
  }
}

Here is the variables.tf file:

variable "resource_group" {
    type = string
    description = "Resource group name"
    default = "58TestRG"
}
variable "location" {
    type = string
    description = "RG and resources location"
    default = "East US"
}
variable "node_count_linux" {
    type = number
    description = "Linux nodes count"
    default = 1
}
variable "node_count_windows" {
    type = number
    description = "Windows nodes count"
    default = 2
}
variable "win_username" {
  description = "Windows node username"
  type        = string
  sensitive   = false
}
variable "win_userpass" {
  description = "Windows node password"
  type        = string
  sensitive   = true
}
variable "Domain_DNSName" {
  description = "FQDN for the Active Directory forest root domain"
  type        = string
  sensitive   = false
}
variable "netbios_name" {
  description = "NETBIOS name for the AD domain"
  type        = string
  sensitive   = false
}
variable "SafeModeAdministratorPassword" {
  description = "Password for AD Safe Mode recovery"
  type        = string
  sensitive   = true
}
variable "gmsa_username" {
  description = "Username for the standard domain account"
  type        = string
  sensitive   = false
}
variable "gmsa_userpassword" {
  description = "Password for standard domain account"
  type        = string
  sensitive   = true
}

With the two files in the same folder, you can run:

az login
az account set --subscription <subscription-id>
terraform init
terraform apply

I did not include the -auto-approve flag, as you probably want to confirm that everything will run as you expected. Once you review the plan for the deployment, type yes and continue with it.

Now, let me go over the details of this template:

We start by creating a resource group. The name and location for the RG come from the variables.tf file.

Next, we create the auxiliary Azure services (Key Vault and user-assigned managed identity). You could reuse the regular identity from the AKS cluster once it's deployed; I decided to go with a new one for testing and learning purposes. We then assign the managed identity the Reader role on the Azure Key Vault and give it the "Get" permission for secrets. This is what allows the managed identity to read the standard user account used to connect to AD. We then create the secret in the Key Vault. Note that we also give the Terraform session itself "Get", "List", and "Set" permissions on the Key Vault, so it can write the value of the standard user account into that Key Vault's secrets.
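If you do want to reuse the identity AKS creates for its node pools instead of a dedicated one, a sketch of that access policy might look like the following. The resource name kubelet_policy is my own; the kubelet_identity attribute is exported by the azurerm_kubernetes_cluster resource, and this variant is untested against this exact template:

```hcl
# Alternative sketch: grant the AKS kubelet identity access to the
# Key Vault secrets instead of creating a dedicated managed identity.
resource "azurerm_key_vault_access_policy" "kubelet_policy" {
  key_vault_id = azurerm_key_vault.akv.id
  tenant_id    = data.azurerm_client_config.current.tenant_id
  object_id    = azurerm_kubernetes_cluster.aks.kubelet_identity[0].object_id
  secret_permissions = [
    "Get"
  ]
}
```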

Moving on, we create the Azure virtual network and two subnets: one for the AKS cluster and the Domain Controller, and another for Azure Bastion. This last one is optional, as you might not need it, but I added it just in case.

To create the Domain Controller, we create a network interface associated with the gMSA subnet, and then create the Windows VM on Azure with that vNIC attached. Here you can change the size and disk of the VM, depending on your environment and cost limitations. The image used here is Windows Server 2022. While that's the recommended version, this deployment also works with Windows Server 2019. Keep in mind that you need to connect to this VM to finish the Active Directory configuration – this is outside the scope of this template.
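For example, to deploy a Windows Server 2019 DC instead, only the source_image_reference block in the dc01 resource needs to change – a minimal sketch:

```hcl
  # Swap the image SKU for Windows Server 2019 (same publisher/offer)
  source_image_reference {
    publisher = "MicrosoftWindowsServer"
    offer     = "WindowsServer"
    sku       = "2019-Datacenter"
    version   = "latest"
  }
```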

We then finally create the AKS cluster. This is a standard AKS cluster with a simple default node pool of Linux nodes. Note that the subnet associated with it is the gMSA subnet created earlier. We also use a Windows profile for this cluster and configure gMSA right away. IMPORTANT: At this point, you must indicate the gMSA DNS server and the FQDN of the AD root domain. If you have an existing DC that is also a DNS server, you should pass the internal IP address of that machine. This is just like adding a primary (and secondary) DNS server in the IP configuration of a Windows instance. However, if you are using this template to deploy your DC, do not change the DNS server here. Since the DC VM is the first to be created in the subnet, it gets the first available IP address, which in this case is 10.0.0.4, hence the value in the template. For that to work, I set the "depends_on" argument on this resource (in other words, the AKS cluster is created after the DC VM). Next, the Windows node pool is created with standard configurations. Here you can change the number of Windows nodes and the VM size.
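If you'd rather not hardcode 10.0.0.4, one variation is to reference the DC vNIC's exported private_ip_address attribute instead – a sketch, keeping the rest of the windows_profile exactly as in the template:

```hcl
  windows_profile {
    admin_username = var.win_username
    admin_password = var.win_userpass
    gmsa {
      # Reference the DC NIC's private IP instead of hardcoding 10.0.0.4
      dns_server  = azurerm_network_interface.dc01_nic.private_ip_address
      root_domain = var.Domain_DNSName
    }
  }
```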

The final steps in the template are to assign the managed identity to the Virtual Machine Scale Set (VMSS) of the Windows node pool and then update it. Since the managed identity has access to the Azure Key Vault, and we're associating the managed identity to the VMSS, all nodes in that VMSS will be able to access the secret and authenticate with AD.

Post installation steps

The template does the heavy lifting of creating the Azure resources needed for gMSA to work. As mentioned before, there are additional steps, so let me go over them once again:

  • Finish the AD preparation on the DC VM.
    • This includes deploying Active Directory itself and configuring the KDS service.
    • You need to create the gMSA account which will be used in the credential spec.
    • You also need to create the standard user account to be stored in the Azure Key Vault.
  • Deploy the credential spec.
    • This is environment and application specific. Just keep in mind that some parameters used in the Terraform template are also needed in the credential spec.
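For reference, a credential spec deployed through the Windows gMSA CRD generally has the shape below. Every name, GUID, and SID here is a placeholder that must be generated from your own AD (typically with the CredentialSpec PowerShell module), and the host account plugin values should be checked against the AKS gMSA documentation – note how the domain names and NetBIOS name mirror the Terraform variables:

```yaml
apiVersion: windows.k8s.io/v1
kind: GMSACredentialSpec
metadata:
  name: gmsa-spec-demo               # name your pod spec will reference
credspec:
  ActiveDirectoryConfig:
    GroupManagedServiceAccounts:
    - Name: gmsaaks                  # gMSA account name (placeholder)
      Scope: CONTOSO                 # NetBIOS name, as in netbios_name
    - Name: gmsaaks
      Scope: contoso.local           # FQDN, matching Domain_DNSName
    HostAccountConfig:
      PluginGUID: "<azure-key-vault-ccg-plugin-guid>"
      PluginInput: "ObjectId=<managed-identity-object-id>;SecretUri=<key-vault-secret-uri>"
      PortableCcgVersion: "1"
  CmsPlugins:
  - ActiveDirectory
  DomainJoinConfig:
    DnsName: contoso.local
    DnsTreeName: contoso.local
    Guid: <domain-guid>
    MachineAccountName: gmsaaks
    NetBiosName: CONTOSO
    Sid: <domain-sid>
```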

Conclusion

It is possible to deploy a gMSA application on Windows containers on an AKS cluster. Automating this process reduces the chance of errors in the future and allows you to set up a CI/CD pipeline. This blog post covered the Terraform deployment of the Azure resources needed for gMSA on AKS to work. It deploys all the Azure resources and configures them, while some environment-specific actions are still needed.

I hope this is helpful. No doubt you'll need to modify the template for your environment. Luckily, you can leverage the ITOpsTalk repo to do that – and even let us know if you have any feedback by submitting a PR! Let us know what you think!


This article was originally published by Microsoft's Entra (Azure AD) Blog. You can find the original article here.