Azure Policy enables an organization to enforce standards and assess compliance against regulatory requirements. In general, there are four main Azure Policy objects.
- A Policy Definition describes the conditions under which it is enforced. This Azure Policy object has no effect until it is assigned.
- A Policy Set can bundle Policy Definitions that are related. Again, this Azure Policy object has no effect until it is assigned.
- A Policy Assignment assigns a Policy Definition or Policy Set on a specific scope. When a Policy Assignment has been created, the Azure Policy artifacts that it assigns influence the environment.
- A Policy Exemption exempts a specific scope from a Policy Assignment. As a result, the Policy Definitions or Policy Sets that were assigned by the Policy Assignment have no effect over that scope.
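The interplay between Policy Assignments and Policy Exemptions can be illustrated with a small model. The following is a minimal sketch in Python, not how Azure evaluates policies internally. It assumes that scopes are plain resource ID prefixes and ignores management groups and excluded scopes. The class names and the `effective_assignments` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    name: str
    scope: str  # e.g. a subscription or resource group ID (simplified)


@dataclass(frozen=True)
class Exemption:
    assignment_name: str
    scope: str


def in_scope(resource_id: str, scope: str) -> bool:
    """True when resource_id is the scope itself or lies beneath it."""
    r = resource_id.lower().rstrip("/")
    s = scope.lower().rstrip("/")
    return r == s or r.startswith(s + "/")


def effective_assignments(resource_id, assignments, exemptions):
    """Names of assignments that still affect the resource after exemptions."""
    exempted = {
        e.assignment_name for e in exemptions if in_scope(resource_id, e.scope)
    }
    return [
        a.name
        for a in assignments
        if in_scope(resource_id, a.scope) and a.name not in exempted
    ]
```

An assignment on a subscription affects every resource beneath it, except those under a scope that has been exempted from that same assignment.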
Policy Definitions with the DeployIfNotExists effect have multiple use cases. Such a Policy Definition conducts a deployment when its existence condition is not met. For instance, when the Diagnostic Settings of a Storage Account are not configured although the existence condition requires them, a deployment is conducted to remediate this situation. Unfortunately, deployments might fail due to changes in the environment, such as the removal of a Role Assignment that was required for a successful deployment.
When there is a team actively monitoring the compliance state of Azure services, remediation activities can be conducted regularly. Unfortunately, a lot of organizations do not have that luxury, often resulting in long-term non-compliance across their environment.
In this blogpost, I will discuss a solution that automatically creates Remediation Tasks for non-compliant Policy Definitions with the DeployIfNotExists effect. On top of that, when one or more Remediation Tasks fail, a Bug is created in Azure DevOps so that the failed deployment can be investigated right away.
To build the solution defined above, different Azure and Azure DevOps services are implemented. In Figure 1, these services, and the role these play in the overall solution, are visualized in more detail.
As you can see in Figure 1, each of these services is denoted by a number.
- First, an Azure Pipeline is used to run different Tasks, including the automatic remediation of non-compliant Policy Definitions and the creation of a Bug when one or more of the Remediation Tasks fail.
- To be able to authenticate to Azure DevOps, a Personal Access Token (PAT) is leveraged. The PAT is stored as a secret in a Key Vault so that it is protected throughout its lifecycle.
- As part of the Azure Pipeline, the Azure Policy compliance state of the environment is retrieved. Hence, Policy Definitions play an important role in this solution.
- Finally, Azure Boards is used for the creation of the Bug.
The execution flow is also visualized in Figure 1, this time with the use of letters.
- The Azure Pipeline runs on a daily schedule. The first Task retrieves the PAT from the Key Vault so that it can be used for authentication in a later Task.
- After that, the second Task is triggered. This Task retrieves the Azure Policy compliance state of the environment and checks for non-compliant Policy Definitions.
- When one or multiple non-compliant Policy Definitions have been discovered, a Remediation Task is created for every one of them.
- If one of the Remediation Tasks fails, the third and final Task is started. This Task creates a Bug on the Board that contains an HTML table with links to all failed Remediation Tasks.
- As a result, the team is provided with a clear overview of the failed Remediation Tasks, enabling them to investigate and resolve these as quickly as possible.
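The execution flow above can be summarized in a few lines of code. The following is a minimal sketch in Python, not the actual pipeline, which uses Azure Pipeline Tasks and PowerShell. All four callables (`get_pat`, `find_non_compliant`, `remediate`, `create_bug`) are hypothetical placeholders for the real steps and are passed in so that the flow itself stays self-contained.

```python
def run_pipeline(get_pat, find_non_compliant, remediate, create_bug):
    """Run the flow once and return the definitions whose remediation failed."""
    # Step 1: retrieve the PAT up front so it is available for the final step.
    pat = get_pat()

    # Steps 2 and 3: remediate every non-compliant definition sequentially
    # and remember the ones that failed.
    failed = []
    for definition_id in find_non_compliant():
        succeeded = remediate(definition_id)
        if not succeeded:
            failed.append(definition_id)

    # Step 4: only create a Bug when at least one remediation failed.
    if failed:
        create_bug(pat, failed)
    return failed
```

When every remediation succeeds, `create_bug` is never called, which corresponds to the final Task being skipped.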
Enough theory for now. Let’s check how the solution works in practice!
Putting Theory into Practice
By default, the Azure DevOps Pipeline is configured to run every day at 12:00 AM. However, as you can see in Figure 2, I triggered a run manually as I did not want to wait around while writing this blogpost.
In the first Task, the Personal Access Token is retrieved from the Key Vault. I simply used the AzureKeyVault@2 Task for this purpose and set the ‘RunAsPreJob’ property to ‘true’ to ensure that the secret is available throughout the entire Job.
After the first Task has finished successfully, the second Task is triggered, as visualized in Figure 3.
This Task first retrieves the Azure Policy compliance state for the environment** within a certain timeframe. The reason for selecting a certain timeframe is that the Azure Policy compliance state should be as current as possible.
**By default, the Azure Pipeline Agent runs the Connect-AzAccount command before executing the PowerShell script. As a result, the logic runs in that context. If you want to change the Azure context, you should modify the Get-AzPolicyState command used in the first PowerShell script, for instance by adding the ‘ManagementGroupName’ parameter.
After retrieving the Azure Policy compliance state, all unique non-compliant Policy Definitions are selected. According to the logic I built, there should be five non-compliant Policy Definitions across the environment. As you can see in Figure 4, this information is accurate.
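The selection step can be sketched as follows. This is an illustrative Python version, not the PowerShell script from the repository. It assumes that every compliance record is a dictionary with the hypothetical keys `policy_definition_id`, `compliance_state`, `effect` and `timestamp`, and that a 24-hour window is an acceptable default. Both are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone


def unique_non_compliant(records, now, window_hours=24):
    """Return unique DeployIfNotExists definition IDs that are non-compliant
    within the given timeframe, in order of first appearance."""
    cutoff = now - timedelta(hours=window_hours)
    seen = set()
    unique = []
    for record in records:
        if record["compliance_state"] != "NonCompliant":
            continue
        if record["effect"].lower() != "deployifnotexists":
            continue
        if record["timestamp"] < cutoff:
            continue  # too old to reflect the current state
        key = record["policy_definition_id"].lower()  # IDs compared case-insensitively
        if key not in seen:
            seen.add(key)
            unique.append(record["policy_definition_id"])
    return unique
```

Deduplicating on the Policy Definition ID ensures that a definition that is non-compliant for many resources still results in a single entry.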
Subsequently, a Remediation Task is created for every non-compliant Policy Definition in a sequential manner. When a Remediation Task succeeds, the PowerShell script simply moves on to the next Policy Definition. However, when a Remediation Task fails, it is added to a variable that will be used later in the Azure Pipeline. As visualized in Figure 3, a single Remediation Task failed, which Figure 5 confirms.
The Remediation Task failed because I removed the Role Assignments of the System-assigned Managed Identity that was used by the Policy Assignment. In Figure 6, the error message of the Remediation Task is visualized, showing that the deployment indeed failed due to a permission issue.
Since a Remediation Task has failed, the third Task in the Azure Pipeline is triggered as visualized in Figure 7.
This Task starts with installing and importing the VSTeam PowerShell module since it is required for the creation of the Bug. After that, the PAT is used to authenticate to the Contoso Azure DevOps Project. Subsequently, the current Iteration of the Contoso Team is selected since it is logical to create the Bug there. Finally, the failed Remediation Task is converted to an HTML table, which is included in the Bug that is placed on the current Iteration of the Contoso Team. In Figure 8, the Bug is visualized in more detail.
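The conversion to an HTML table can be sketched as follows. This is an illustrative Python version, not the PowerShell logic from the repository. It assumes that every failed Remediation Task is a dictionary with the hypothetical keys `name` and `url`. The column headers are made up for this example as well.

```python
from html import escape


def failed_tasks_to_html(tasks):
    """Render failed Remediation Tasks as an HTML table with clickable links."""
    rows = []
    for task in tasks:
        name = escape(task["name"])
        url = escape(task["url"], quote=True)  # safe inside the href attribute
        rows.append(f'<tr><td>{name}</td><td><a href="{url}">{url}</a></td></tr>')
    header = "<tr><th>Remediation Task</th><th>Link</th></tr>"
    return "<table>" + header + "".join(rows) + "</table>"
```

Escaping the name and URL prevents characters such as `<` or `&` from breaking the markup of the Bug description.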
Consequently, members of the Contoso Team see the Bug on their current Iteration and can start resolving it. By selecting the URL of a failed Remediation Task, members of the Contoso Team are conveniently directed to that Remediation Task in the Azure portal.
But what would have happened if the Role Assignments of the Managed Identity were there? Well, the Remediation Task would have succeeded as you can see in Figure 9.
On top of that, since no Remediation Task failed, the third Task in the Azure DevOps Pipeline is skipped as visualized in Figure 10.
How can you use this solution?
As this blogpost does not provide detailed information on the configuration of the solution, I have uploaded all code in my public GitHub repository.
If you need more information on the configuration of the solution or on the use of the different artifacts in the GitHub repository, please let me know so that I can create a follow-up blogpost.
The sample scripts are not supported by any Microsoft standard support program or service. The sample scripts are provided AS IS without a warranty of any kind. Microsoft further disclaims all implied warranties including, without limitation, any implied warranties of merchantability or of fitness for a particular purpose. The entire risk arising out of the use or performance of the sample scripts and documentation remains with you. In no event shall Microsoft, its authors, or anyone else involved in the creation, production, or delivery of the scripts be liable for any damages whatsoever (including, without limitation, damages for loss of business profits, business interruption, loss of business information, or other pecuniary loss) arising out of the use of or inability to use the sample scripts or documentation, even if Microsoft has been advised of the possibility of such damages.