Azure Policy Recommended Practices

Azure Policy has multiple uses including general governance, monitoring setup, security, and compliance. It should not be used to deal with items better handled with role-based access control (RBAC). The following rules codify this:

  • Prohibit anybody and any service from doing something: Azure Policy.
  • Prohibit specific users and service principals from doing something: RBAC.

Note: Many professionals use security and compliance interchangeably. Security encompasses much more than some checkboxes on a compliance spreadsheet; however, complying with the Microsoft Cloud Security Benchmark and NIST SP 800-53 is a decent baseline for enforcing security aspects with Azure Policy.


Policy as Code

I am not covering PaC solutions in detail here. I recommend Enterprise Azure Policy as Code (EPAC). I'm one of its maintainers, so, not surprisingly, I believe EPAC to be vastly superior to any other PaC solution.

Cloud and most on-prem datacenters are software defined, leading to the term Infrastructure as Code (IaC). Azure Policy is a special form of infrastructure; therefore, we call the approach Policy as Code (PaC). When adopting (or building) a Policy as Code solution, you should ensure that the solution is:

  • Idempotent: you can run the deployment multiple times without any harm.
  • Desired state: each deployment reverses any drift since the last deployment.
  • Team-friendly: different teams can co-exist, each owning some aspects of Azure Policy.
  • DRY (Don't Repeat Yourself): each piece of information is defined once instead of being repeated in multiple files (Write Everything Twice/Thrice, the WET anti-pattern).
  • CI/CD based: it follows GitHub flow (https://docs.github.com/en/get-started/quickstart/github-flow) or a similarly simple branching strategy.
  • Round-trip capable: it can read an existing environment and extract the existing deployment to be ingested later.
  • Low-code: it minimizes the amount of JSON, Bicep, or Terraform to be written.
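To make "idempotent" and "desired state" concrete, here is a minimal in-memory sketch in Python. It is not how EPAC or any other tool is implemented; the function names and the dictionary shapes are assumptions made for illustration only. It compares what the repo says should exist with what is deployed, and shows that a second run after applying the plan finds nothing left to do.

```python
from typing import Any

Resources = dict[str, dict[str, Any]]  # hypothetical shape: resource ID -> resource body


def plan_changes(desired: Resources, deployed: Resources) -> dict[str, list[str]]:
    """Compare the desired state (repo) with the deployed state (tenant)."""
    plan: dict[str, list[str]] = {"create": [], "update": [], "delete": []}
    for resource_id, body in desired.items():
        if resource_id not in deployed:
            plan["create"].append(resource_id)
        elif deployed[resource_id] != body:
            # Drift since the last deployment is reverted.
            plan["update"].append(resource_id)
    # Desired state also means removing what is no longer defined in the repo.
    plan["delete"] = [rid for rid in deployed if rid not in desired]
    return plan


def apply_plan(desired: Resources, deployed: Resources, plan: dict[str, list[str]]) -> Resources:
    """Simulate a deployment and return the resulting state."""
    result = {rid: body for rid, body in deployed.items() if rid not in plan["delete"]}
    for resource_id in plan["create"] + plan["update"]:
        result[resource_id] = desired[resource_id]
    return result
```

A real solution that supports several teams would restrict the delete step to the resources it owns; this sketch ignores ownership.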

Management Groups and Policy Resources

Custom Policy/Initiative Definitions and Policy Assignments need to be deployed at a scope.

Custom definitions should always be deployed at the top Management Group (MG) in each tenant. That MG should be the single MG (no siblings) underneath the “Tenant root group” as recommended by Microsoft (see https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/ready/landing-zone/design-areas) or at the actual “Tenant root group” if you are not following Microsoft's recommendation verbatim.

Policy Assignments must be at this level or lower. They should be at the highest MG possible. Do NOT assign Policies to subscriptions or resource groups.
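If you want to enforce this rule in a review script, a check along the following lines may help. It is a sketch only: the function name is made up, and it relies on the usual resource ID format for management group scopes (/providers/Microsoft.Management/managementGroups/<name>).

```python
MG_PREFIX = "/providers/microsoft.management/managementgroups/"


def is_management_group_scope(scope: str) -> bool:
    """Return True if the scope is a management group, not a subscription or resource group."""
    normalized = scope.strip().lower()  # Azure resource IDs are case-insensitive
    if not normalized.startswith(MG_PREFIX):
        return False
    remainder = normalized[len(MG_PREFIX):].strip("/")
    # Exactly one path segment (the MG name) may follow the prefix.
    return bool(remainder) and "/" not in remainder
```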

Note 1: The landing zones diagram in the link above shows Policy Assignments at the subscription level, which is technically incorrect as they are applied at the management group scope and inherited by subscriptions. The rest of the Cloud Adoption Framework documentation puts them correctly at the Management Group level (see https://github.com/Azure/Enterprise-Scale/wiki/ALZ-Policies).

Note 2: You must set the default location for new subscriptions in a MG at or below the scope where the security-oriented Policy Assignments are deployed to prevent rogue subscriptions from bypassing your security controls enforcement with Azure Policy.

Policy Assignments

Policies are inert elements in Azure until you create a Policy Assignment at a scope. Each assignment should:

  • Define a semi-readable short name (limited to 24 characters by Azure).
  • Define a readable displayName (visible in Portal).
  • Define a description.
  • Optionally include metadata, such as a work item ID.

“Azure Security Benchmark” (ASB – “name”: “1f3afdf9-d0c9-4c3d-847f-89da613e70a8”) is automatically assigned by Microsoft Defender for Cloud in each subscription to protect new environments. All Policy effects are set to “Audit”. In most scenarios, you will set some of the effects to “Deny”. It is best to create a new Assignment at an MG (see “Management Groups and Policy Resources” above) to change the effects centrally. Once done, you should remove the auto-assigned Policy Assignments to avoid problems with overlaps.

It is essential that ASB is assigned to cover all subscriptions. Defender for Cloud depends on this Policy Assignment.

You may assign additional security-oriented and compliance-oriented Initiatives, such as “NIST SP 800-53 Rev. 5” (“name”: “179d1daa-458f-4e47-8086-2a68d0d6c38f”). You should limit yourself to no more than 5 Initiatives (including custom Initiatives). Larger numbers will make maintenance and managing Policy Exemptions extremely difficult.

Assignments containing Policies with Modify or DeployIfNotExists effects require a Managed Identity (MI). The MI must be granted the Azure roles specified in the details section of the Policy rule.

I prefer a system-assigned Managed Identity since its service principal cannot be used outside a single assignment, eliminating the already minimal threat of malicious usage (Azure provides controls for the usage).

To reduce the number of role assignments, you can use a user-assigned MI instead.
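As an illustration of where those roles come from, the sketch below collects the roleDefinitionIds listed under then.details of each Policy rule in an assignment. The function names are hypothetical, and the input is assumed to be definitions already loaded from JSON into Python dictionaries.

```python
def required_role_definition_ids(policy_definition: dict) -> list[str]:
    """Return the role definition IDs a single Policy asks for, if any."""
    then = policy_definition.get("properties", {}).get("policyRule", {}).get("then", {})
    details = then.get("details", {})
    # For some effects (e.g. Append) "details" is a list and carries no roles.
    if not isinstance(details, dict):
        return []
    return list(details.get("roleDefinitionIds", []))


def roles_for_assignment(policy_definitions: list[dict]) -> list[str]:
    """Union of all roles the assignment's Managed Identity must be granted."""
    roles: set[str] = set()
    for definition in policy_definitions:
        roles.update(required_role_definition_ids(definition))
    return sorted(roles)
```

Deduplicating across the Policies of an Initiative keeps the number of role assignments per identity as small as possible.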

Custom Definitions

First, question the need for any custom Policy/Initiative definition requested. While the built-in Policies are not perfect, the choices in them are often made due to constraints and conflicts between settings and include tradeoffs in risk versus usability. If you still think you need custom definitions, sleep on it and revisit the topic one more time.

If you have multiple tenants, the same definition should be propagated to every tenant (DRY principle). Do not use a separate repo per tenant, which would cause copy/paste issues (WET anti-pattern).

Policy Definitions

Custom Policy definitions are notoriously hard to design/implement. Debugging issues is even harder. There are a few items which will make the experience easier.

  • The name should be a GUID or a unique name within your company. Using a GUID simplifies contributing the Policy to the community or merging multiple tenants, especially in a merger (companies) scenario.
  • Create a nested properties structure with only the name outside.
  • Supply a displayName for the Policy.
  • Description is highly recommended.
  • version – in metadata; use semantic versioning.
  • category – in metadata, must be one of the categories in the built-in Policies and Policy Sets.

Azure's community-contributed Policy definitions repo (https://github.com/Azure/Community-Policy/blob/master) contains a script which validates the above and corrects the definition if necessary (see https://github.com/Azure/Community-Policy/blob/master/Submit-PolicyDefinitionFile.ps1).
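For a quick local check before running that script, the checklist can be approximated in a few lines of Python. This is a rough sketch and not a port of the repo's script: the function name is made up, the semantic version pattern is simplified, and the category is only checked for presence, not against the list of built-in categories.

```python
import re

# Simplified pattern: MAJOR.MINOR.PATCH with an optional suffix such as "-preview".
SEMVER = re.compile(r"^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$")


def check_definition(definition: dict) -> list[str]:
    """Return a list of checklist violations (empty if none were found)."""
    problems: list[str] = []
    if not definition.get("name"):
        problems.append("missing name")
    extra = set(definition) - {"name", "properties"}
    if extra:
        problems.append(f"unexpected top-level keys: {sorted(extra)}")
    props = definition.get("properties", {})
    for key in ("displayName", "description"):
        if not props.get(key):
            problems.append(f"missing properties.{key}")
    metadata = props.get("metadata", {})
    if not SEMVER.match(str(metadata.get("version", ""))):
        problems.append("metadata.version is not a semantic version")
    if not metadata.get("category"):
        problems.append("missing metadata.category")
    return problems
```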

Do not include system generated properties:

  • properties.policyType
  • properties.metadata
    • createdOn
    • createdBy
    • updatedOn
    • updatedBy
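When a definition has been exported from an existing environment, these properties can be removed mechanically. The following sketch assumes the definition is already loaded into a Python dictionary; the function name is hypothetical.

```python
import copy

SYSTEM_METADATA_KEYS = ("createdOn", "createdBy", "updatedOn", "updatedBy")


def strip_system_properties(definition: dict) -> dict:
    """Return a copy of the definition without system-generated properties."""
    cleaned = copy.deepcopy(definition)  # leave the caller's object untouched
    props = cleaned.get("properties", {})
    props.pop("policyType", None)
    metadata = props.get("metadata", {})
    for key in SYSTEM_METADATA_KEYS:
        metadata.pop(key, None)
    return cleaned
```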

Policy effects should always be parameterized. Name the parameter “effect”, set its displayName to “Effect”, and specify “allowedValues” and a “defaultValue”. Recommended combinations are:

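  • “allowedValues”: “Append”, “Deny”, “Audit”, “Disabled”. Recommended “defaultValue”: Append.
  • “allowedValues”: “Append”, “Audit”, “Disabled” (use only when Deny is not possible). Recommended “defaultValue”: Append.
  • “allowedValues”: “Modify”, “Deny”, “Audit”, “Disabled”. Recommended “defaultValue”: Modify.
  • “allowedValues”: “Modify”, “Audit”, “Disabled” (use only when Deny is not possible). Recommended “defaultValue”: Modify.
  • “allowedValues”: “Deny”, “Audit”, “Disabled”. Recommended “defaultValue”: Audit.
  • “allowedValues”: “Audit”, “Disabled” (use only when Deny is not possible). Recommended “defaultValue”: Audit.
  • “allowedValues”: “DeployIfNotExists”, “AuditIfNotExists”, “Disabled”. Recommended “defaultValue”: AuditIfNotExists or DeployIfNotExists.
  • “allowedValues”: “AuditIfNotExists”, “Disabled”. Recommended “defaultValue”: AuditIfNotExists.
  • “allowedValues”: “DenyAction”, “Disabled”. Recommended “defaultValue”: DenyAction.
  • “allowedValues”: “Manual”, “Disabled”. Recommended “defaultValue”: Manual.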
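As a sketch of what such a parameterized effect looks like, the Python helper below builds the “effect” parameter block and the matching “then” clause of a Policy rule. The helper name and the description text are assumptions; the resulting dictionaries follow the usual parameter layout (type, metadata, allowedValues, defaultValue) and can be serialized with json.dumps.

```python
def effect_parameter(allowed_values: list[str], default_value: str) -> dict:
    """Build the "effect" parameter for a custom Policy definition."""
    if default_value not in allowed_values:
        raise ValueError(f"defaultValue '{default_value}' is not in allowedValues")
    return {
        "effect": {
            "type": "String",
            "metadata": {
                "displayName": "Effect",
                "description": "Enable or disable the execution of the Policy",  # example wording
            },
            "allowedValues": list(allowed_values),
            "defaultValue": default_value,
        }
    }


# The Policy rule then references the parameter instead of a hard-coded effect.
THEN_CLAUSE = {"effect": "[parameters('effect')]"}

parameters = effect_parameter(["Deny", "Audit", "Disabled"], "Audit")
```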

Append, Modify and DeployIfNotExists Policies are only advisable if the required parameters are known at Policy Assignment time.

Note: Modify and Append can interfere with desired state deployment technologies (e.g., Terraform). Terraform has a lifecycle argument, “ignore_changes”, to account for this problem (see https://developer.hashicorp.com/terraform/language/meta-arguments/lifecycle#ignore_changes).

Policy Set Definitions

Initiative (Policy Set) definitions benefit from the same guidelines as Policy definitions.

  • The name should be a GUID or a unique name within your company. Using a GUID simplifies contributing the Initiative to the community or merging multiple tenants, especially in a merger (companies) scenario.
  • Create a nested properties structure with only the name outside.
  • Supply a displayName for the Initiative.
  • Description is highly recommended.
  • version – in metadata; use semantic versioning.
  • category – in metadata, must be one of the categories in the built-in Policies and Policy Sets.

Parameters (especially effect parameters) should be surfaced by the Initiative. You will need to prefix the Policy-level parameter name with an indicator for the Policy in the Initiative.

When including Policies with a GUID name, I recommend that you make the policyDefinitionReferenceId a short version of the Policy's displayName to make the Initiative readable.
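The two recommendations above can be sketched as follows. The helper name, the use of the policyDefinitionReferenceId as the prefix, and the hyphen separator are illustrative choices, not a convention required by Azure; the structure (policyDefinitions entries with policyDefinitionReferenceId, policyDefinitionId, and parameters whose values reference Initiative parameters) is the regular Policy Set layout.

```python
def add_policy_to_initiative(
    initiative: dict,
    reference_id: str,
    policy_definition_id: str,
    policy_parameters: dict,
) -> None:
    """Add a Policy to an Initiative and surface its parameters with a prefix."""
    props = initiative.setdefault("properties", {})
    surfaced_parameters = props.setdefault("parameters", {})
    member_parameters = {}
    for name, definition in policy_parameters.items():
        surfaced_name = f"{reference_id}-{name}"  # e.g. "storage-https-only-effect"
        surfaced_parameters[surfaced_name] = definition
        # The Policy-level parameter is wired to the Initiative-level one.
        member_parameters[name] = {"value": f"[parameters('{surfaced_name}')]"}
    props.setdefault("policyDefinitions", []).append(
        {
            "policyDefinitionReferenceId": reference_id,  # readable short name
            "policyDefinitionId": policy_definition_id,
            "parameters": member_parameters,
        }
    )
```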

Policy Exemptions

Even with the best intentions some Policies may get in the way. If there is a business reason within acceptable risk parameters, you can grant an Exemption.

Exemptions come in two categories (the distinction carries no technical meaning):

  • Mitigated – Most often used for permanent exemptions. An example is allowing public IP addresses for a Storage account which is used as an upload folder, combined with mitigations such as virus scans and deletion of processed data.
  • Waiver – Most often used for temporary exemptions to allow a solution team to fix their non-compliant deployment. Generally granted until Monday after the ETA (estimated time of arrival) for the fix.
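The "Monday after the ETA" rule for a Waiver is easy to get wrong by a day, so here is a small Python sketch. It assumes that an ETA falling on a Monday moves the expiry to the following Monday and that the expiry is expressed as midnight UTC; both are assumptions, as are the function names. exemptionCategory and expiresOn are the regular Exemption properties.

```python
from datetime import date, datetime, time, timedelta, timezone


def waiver_expiry(eta: date) -> datetime:
    """Return midnight UTC on the first Monday strictly after the ETA."""
    # date.weekday() is 0 for Monday; "or 7" pushes a Monday ETA to the next week.
    days_ahead = (7 - eta.weekday()) % 7 or 7
    monday = eta + timedelta(days=days_ahead)
    return datetime.combine(monday, time.min, tzinfo=timezone.utc)


def build_waiver(name: str, policy_assignment_id: str, eta: date) -> dict:
    """Build a temporary Exemption body that expires after the fix is due."""
    return {
        "name": name,
        "properties": {
            "policyAssignmentId": policy_assignment_id,
            "exemptionCategory": "Waiver",
            "expiresOn": waiver_expiry(eta).isoformat(),
        },
    }
```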

Exemptions allow metadata. Add a link in metadata to the work item (e.g., Azure DevOps work item, GitHub issue, Jira ticket, etc.) to keep a record of why the exemption was granted and who granted it.

If you exempt an entire subscription with a Mitigated Exemption, it is likely that you should have used notScopes (called Excluded Scope in Azure Portal) in the Assignment instead.

Warning: When you delete a Policy Assignment that has Exemptions, the Exemptions are not deleted and become orphaned.
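A periodic cleanup script can detect such orphans by comparing each Exemption's policyAssignmentId with the Assignments that still exist. The sketch below works on data already retrieved from Azure and loaded into Python; the function name and the input shapes are assumptions.

```python
def find_orphaned_exemptions(exemptions: list[dict], assignment_ids: list[str]) -> list[str]:
    """Return the names of Exemptions whose Policy Assignment no longer exists."""
    # Azure resource IDs are case-insensitive, so compare in lower case.
    existing = {assignment_id.lower() for assignment_id in assignment_ids}
    return [
        exemption["name"]
        for exemption in exemptions
        if exemption["properties"]["policyAssignmentId"].lower() not in existing
    ]
```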

Operating Azure Policy

Operational tasks (e.g., Remediation tasks, generating documentation) must be scripted. Do not use CI/CD tools (including Terraform) to execute operational tasks since CI/CD is intended to deploy resources, not to operate those resources.

Keeping track of built-in changes

I keep track of changes by cloning and following Microsoft's official Azure Policy repo on GitHub (https://github.com/Azure/azure-policy/tree/master/built-in-policies). When I receive an email about a merged PR (pull request), I fetch the latest version from GitHub into my clone. This allows me to use Visual Studio Code on my local clone instead of using Azure Portal or the GitHub web interface.

 

This article was originally published by Microsoft's SQL Server Blog. You can find the original article here.