Using HashiCorp Vault to generate ephemeral Azure AD Service Principals

Matt Betts
Jun 16, 2021

Service accounts are often overlooked in a zero-trust security posture. Historically, service account credentials either never expired or expired after 365 days, and even then the overhead of rotating the password meant the exposure was often tracked as an accepted organizational risk rather than remediated. As organizations continue to move to cloud providers (Azure, AWS, GCP, etc.), there is an opportunity to prevent these bad habits from the outset.

This article explains how to use HashiCorp Vault alongside HashiCorp Terraform to generate Azure AD Service Principals on a per-Terraform-run basis, taking service principals from never expiring to a TTL of 1 hour (or even less).

Within HashiCorp Vault, the Azure Secrets Engine can dynamically generate Azure AD Applications on a per-request basis, and it handles the creation, role assignment and cleanup automatically. HashiCorp documents how to enable the engine, so this article focuses on how to leverage it from Terraform.

HashiCorp Vault and Terraform with Azure workflow


The workflow is as follows:

1. A Vault Administrator enables the Azure Secrets Engine within the Vault instance (or within a namespace, if namespaces are enabled).

2. The Vault Administrator configures the Azure Secrets Engine with an Azure AD Application. Step 34 of the Azure Secrets Engine documentation grants the Application Owner-level permissions, but in practice this is overkill. As the account is only used to grant permissions to newly generated Azure AD Applications, the User Access Administrator role is sufficient. If the organization uses management groups, grant the role at the Management Group rather than at the subscription.
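Steps 1 and 2 can be sketched as a small shell helper. The `vault secrets enable` and `vault write azure/config` commands follow the Azure Secrets Engine documentation; the function name and the `VAULT_AZURE_CLIENT_ID` / `VAULT_AZURE_CLIENT_SECRET` variable names are assumptions made for this illustration.

```shell
# Hypothetical helper: enable and configure the Azure Secrets Engine.
# Assumes VAULT_ADDR and an administrator VAULT_TOKEN are already exported.
setup_azure_engine() {
  # Step 1: mount the engine at the default path "azure/".
  vault secrets enable azure || return 1

  # Step 2: give Vault the Azure AD Application it uses to create
  # and clean up the dynamically generated applications.
  # VAULT_AZURE_CLIENT_ID / VAULT_AZURE_CLIENT_SECRET are assumed names.
  vault write azure/config \
    subscription_id="$ARM_SUBSCRIPTION_ID" \
    tenant_id="$ARM_TENANT_ID" \
    client_id="$VAULT_AZURE_CLIENT_ID" \
    client_secret="$VAULT_AZURE_CLIENT_SECRET"
}

# Usage: setup_azure_engine
```

If the engine is mounted at a different path or inside a namespace, the `azure/` prefix in later commands would need to change to match.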

3. Within Vault, create a Vault Role and Policy. Instructions can be found in the Create a Role documentation. The sample role there grants Contributor access to a Resource Group, but this can be changed to something more appropriate for your deployment. For example, if the app is to be used to administer management groups, the following sample role could be used:

vault write azure/roles/lz-permission ttl=60m azure_roles=-<<EOF
[
  {
    "role_name": "Reader",
    "scope": "/subscriptions/$SUBSCRIPTION_ID"
  },
  {
    "role_name": "User Access Administrator",
    "scope": "/providers/Microsoft.Management/managementGroups/MgmtGroup"
  },
  {
    "role_name": "Management Group Contributor",
    "scope": "/providers/Microsoft.Management/managementGroups/MgmtGroup"
  }
]
EOF

Now that the role is configured, an application policy needs to be defined in Vault. Instructions are in the Vault documentation; below is an updated policy that contains the minimum requirements for Terraform to interact with Vault. The azure/creds path must end with the name of the role that Terraform will request, which is edu-app in the rest of this article:

path "azure/creds/edu-app" {
  capabilities = [ "read" ]
}
path "auth/token/create" {
  capabilities = [ "update" ]
}

4. Using the newly created policy, follow the instructions to create a token for use in the VAULT_TOKEN environment variable during Terraform execution.
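As a rough sketch of steps 3 and 4, the policy can be registered and a token minted with the Vault CLI. The policy name `edu-app`, the file name `edu-app-policy.hcl`, the 1 hour token TTL and the function name are all assumptions for illustration; only the `edu-app` role path comes from the policy in this article.

```shell
# Hypothetical helper: register the policy and mint a token for Terraform.
# $1 = path to a file containing the policy (file name is an assumption).
create_terraform_token() {
  policy_file="${1:-edu-app-policy.hcl}"

  # Store the policy in Vault under the (assumed) name "edu-app".
  vault policy write edu-app "$policy_file" || return 1

  # -field=token prints only the token value, ready for VAULT_TOKEN.
  vault token create -policy=edu-app -ttl=1h -field=token
}

# Usage: export VAULT_TOKEN="$(create_terraform_token edu-app-policy.hcl)"
```

The `auth/token/create` capability is in the policy because, as I understand it, the Terraform Vault provider issues itself a short-lived child token from the one it is given.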

5. Now that we have a token and Vault is set up to interact with Azure AD (instructions to test with the token can be found in the Vault documentation), we’re ready to configure Terraform to interact with Vault and Azure.

To begin with, the following environment variables need to be set:

export ARM_TENANT_ID="<Azure AD Tenant ID>"
export ARM_SUBSCRIPTION_ID="<Azure Subscription ID>"
export VAULT_ADDR="<Vault address, e.g. http://127.0.0.1>"
export VAULT_TOKEN="<token created in step 4>"
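A missing variable tends to surface as a confusing provider error, so a small pre-flight check can help. This is a minimal sketch; the function name is hypothetical and the list of variables is simply the four named above.

```shell
# Hypothetical pre-flight check: report every required variable that is unset
# or empty, and return non-zero if any are missing.
check_required_env() {
  missing=0
  for name in ARM_TENANT_ID ARM_SUBSCRIPTION_ID VAULT_ADDR VAULT_TOKEN; do
    # Indirect variable lookup that works in POSIX sh as well as bash.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_required_env && echo "environment looks complete"
```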

For the purpose of this example we’ll create a simple resource group and virtual network using Terraform.

To get started, we’ll need to define the Azure and Vault providers in Terraform. N.B. In the example below I have pinned the provider versions so that this demo remains consistent.

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "=2.46.0"
    }
    vault = {
      source  = "hashicorp/vault"
      version = "2.20.0"
    }
  }
}

The Vault provider is then used to query the credentials from the Vault instance defined in the environment variable. Details of the specific settings can be found on the vault_azure_access_credentials data source page for hashicorp/vault in the Terraform Registry.

provider "vault" {
}

data "vault_azure_access_credentials" "creds" {
  backend                     = "azure"
  role                        = "edu-app"
  validate_creds              = true
  num_sequential_successes    = 2
  num_seconds_between_tests   = 20
  max_cred_validation_seconds = 1200 // 20 minutes
}

Next up, we’ll define the AzureRM provider. Typically you’d define ARM_CLIENT_ID and ARM_CLIENT_SECRET as environment variables; however, as we’re extracting these from Vault, we’ll read them from the data source instead:

provider "azurerm" {
  features {}
  client_id     = data.vault_azure_access_credentials.creds.client_id
  client_secret = data.vault_azure_access_credentials.creds.client_secret
}

Now that the providers are configured, the Terraform code can be defined. Below is a complete sample of the Terraform file, including the above code snippets:

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "=2.46.0"
    }
    vault = {
      source  = "hashicorp/vault"
      version = "2.20.0"
    }
  }
}

provider "vault" {}

data "vault_azure_access_credentials" "creds" {
  backend                     = "azure"
  role                        = "edu-app"
  validate_creds              = true
  num_sequential_successes    = 2
  num_seconds_between_tests   = 20
  max_cred_validation_seconds = 1200 // 20 minutes
}

provider "azurerm" {
  features {}
  client_id     = data.vault_azure_access_credentials.creds.client_id
  client_secret = data.vault_azure_access_credentials.creds.client_secret
}

# Create a resource group
resource "azurerm_resource_group" "example" {
  name     = "example-resources"
  location = "East US 2"
}

# Create a virtual network within the resource group
resource "azurerm_virtual_network" "example" {
  name                = "example-network"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
  address_space       = ["10.0.0.0/16"]
}

6/7/8. When a Terraform plan is performed, a request is sent to Vault. Vault goes to Azure AD and provisions a new application with an associated service principal. It then assigns the necessary Azure roles to the newly generated application. Once generated, the Client ID and Client Secret are returned to the Terraform plan.

9. Due to the globally distributed nature of Azure AD, an app can take a few minutes to propagate. The Vault data source performs checks to validate that the credentials have propagated sufficiently. These checks can be controlled using the num_sequential_successes, num_seconds_between_tests and max_cred_validation_seconds settings.
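To make the three settings concrete, here is an illustrative shell sketch of a retry loop with the same shape: keep testing until a number of consecutive checks succeed, or give up after a time limit. It is not the provider's actual code, and the function name and the check command passed in are hypothetical.

```shell
# Illustrative only: mimics the retry behavior described above.
# $1 = command that returns 0 when the credentials work (hypothetical check)
# $2 = required consecutive successes   (cf. num_sequential_successes)
# $3 = seconds to wait between tests    (cf. num_seconds_between_tests)
# $4 = overall time budget in seconds   (cf. max_cred_validation_seconds)
wait_for_credentials() {
  check_cmd="$1"
  needed="${2:-2}"
  interval="${3:-20}"
  max_seconds="${4:-1200}"
  successes=0
  elapsed=0
  while [ "$elapsed" -le "$max_seconds" ]; do
    if "$check_cmd"; then
      successes=$((successes + 1))
      if [ "$successes" -ge "$needed" ]; then
        return 0
      fi
    else
      # A failed test resets the streak, so successes must be consecutive.
      successes=0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 1
}

# Usage: wait_for_credentials my_login_check 2 20 1200
```

Requiring consecutive successes guards against a single lucky hit on one replica while the new application is still propagating elsewhere.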

10/11. Terraform will now perform the plan and output any desired state changes. Once the plan is successful, you can run a Terraform apply in the same way.
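The plan and apply sequence can be wrapped in a short helper. This is only a sketch: the function name is hypothetical, and it assumes the four environment variables from step 5 are exported and the sample Terraform file is in the current directory.

```shell
# Hypothetical wrapper around the plan/apply sequence described above.
plan_and_apply() {
  # Download the pinned azurerm and vault providers.
  terraform init -input=false || return 1

  # The plan reads the Vault data source, which requests the credentials.
  terraform plan -input=false || return 1

  # Apply asks for confirmation before making any changes.
  terraform apply
}

# Usage: plan_and_apply
```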

Once the TTL of the Azure AD application expires, Vault performs a cleanup task in the background to remove it from Azure AD. If you need to remove the Azure AD applications manually, the following command can be executed in Vault; more examples are available in the Vault documentation.

vault lease revoke -prefix azure/creds/edu-app
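The prefix revoke removes every outstanding credential for the role. To remove a single one instead, a sketch along these lines could list the leases first and then revoke by full lease ID. The function names and the example lease ID are hypothetical.

```shell
# Hypothetical helpers for inspecting and revoking individual leases.

# List the lease IDs currently outstanding for a role, e.g. edu-app.
list_role_leases() {
  vault list "sys/leases/lookup/azure/creds/$1"
}

# Revoke one lease by its full ID; Vault then removes the matching
# Azure AD application.
revoke_lease() {
  vault lease revoke "$1"
}

# Usage (the lease ID shown is made up):
#   list_role_leases edu-app
#   revoke_lease azure/creds/edu-app/abc123
```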

Conclusion

Using the above example, it’s easy to remove the need for long-lived Azure Service Principals when interacting with Azure using Terraform. Whilst the above is focused on Azure and Terraform, the same approach can be leveraged for AWS, GCP and other cloud providers. Using the Vault APIs, it is also possible to leverage this functionality from other services, such as Azure DevOps.
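As a rough illustration of the API route, a pipeline step could request credentials over HTTP rather than through Terraform. The sketch below assumes the engine is mounted at `azure/`, that `VAULT_ADDR` and `VAULT_TOKEN` are exported, and that `curl` is available; the function name is hypothetical.

```shell
# Hypothetical helper: request dynamic Azure credentials over the Vault HTTP API.
# $1 = role name (defaults to edu-app, the role used in this article)
fetch_azure_creds() {
  role="${1:-edu-app}"
  # --fail makes curl return non-zero on HTTP errors such as 403.
  curl --silent --fail \
    --header "X-Vault-Token: $VAULT_TOKEN" \
    "$VAULT_ADDR/v1/azure/creds/$role"
}

# Usage: fetch_azure_creds edu-app
```

The JSON response carries the generated client ID and client secret in its data section, which the calling service can then use to authenticate to Azure.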
