Execute Mitigations

TL;DR

Ask your agent to fix an issue — it proposes, you approve, it executes
Full audit trail: who triggered it, what changed, whether it worked
Choose your level of trust: Review mode (approve each action) or Autonomous mode (agent handles it)

The problem: diagnosis without action wastes time

You've identified the issue. Now what? You navigate to the Azure portal, find the right blade, confirm the resource, click through confirmation dialogs, wait for the operation to complete, then verify it worked. The investigation took five minutes; the fix takes another ten.

This friction exists across your operational workflows:

Daily operations: Scaling resources for expected load, restarting services during maintenance windows
Compliance checks: Hardening security settings across dozens of storage accounts
On-call response: Executing well-known fixes quickly so engineers can get back to sleep
Proactive optimization: Adjusting SKUs based on usage patterns before problems occur

How your agent closes the loop

When your agent identifies an issue, it doesn't stop at telling you what's wrong. It proposes a specific remediation action and, depending on your run mode, either waits for your approval or executes immediately.

The agent follows a consistent pattern: diagnose → identify action → check permissions → execute (or propose) → verify the fix worked. Every action is logged with who triggered it, what changed, why, and whether it succeeded.
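The pattern above can be sketched as one function that checks permissions, executes, verifies, and always returns a log entry. This is a minimal illustration under stated assumptions, not the agent's implementation. The `run_mitigation` function and its callback parameters are hypothetical, and the diagnosis and chosen action are passed in as plain strings. The run-mode choice between executing and proposing is left out, and the resource group `rg-prod` is made up.

```python
from typing import Callable

def run_mitigation(
    diagnosis: str,
    command: str,
    triggered_by: str,
    has_permission: Callable[[str], bool],
    execute: Callable[[str], None],
    verify: Callable[[], bool],
) -> dict:
    """Check permissions, execute, verify, and return a log entry either way."""
    entry = {"who": triggered_by, "what": command, "why": diagnosis}
    if not has_permission(command):
        # Nothing ran; the entry still records what was attempted and why.
        entry.update(executed=False, succeeded=False)
        return entry
    execute(command)
    # Verify by re-checking the symptom rather than trusting the command's exit.
    entry.update(executed=True, succeeded=verify())
    return entry

# Example: a restart with stubbed callbacks (all hypothetical).
ran = []
entry = run_mitigation(
    diagnosis="Memory at 94% and rising, no recent deployments",
    command="az webapp restart --name prod-api --resource-group rg-prod",
    triggered_by="sre-agent",
    has_permission=lambda cmd: True,  # stand-in for an RBAC check
    execute=ran.append,               # stand-in for invoking az
    verify=lambda: True,              # stand-in for re-querying memory
)
print(entry["succeeded"])  # True
```

Keeping verification as a separate callback means that a command which "succeeds" but leaves the symptom in place is still recorded as a failure.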

Agent response paths: execute a fix, create a work item, or send a notification. After investigating, your agent can take direct action, create tracking items, or notify your team, each with full context.

What makes this different from scripts

Scripts are rigid — they run the same action regardless of context. Your agent reasons about the situation first. It considers what it found during investigation, what it remembers from past incidents, and what your skills and knowledge base recommend. The same symptom might lead to a restart in one case and a scale-up in another, because the agent adapts based on evidence.

Run modes give you graduated trust. Start in Review mode where the agent proposes and you approve. Move to Autonomous when you're confident in the pattern. Use ReadOnly for monitoring-only agents that never take action.
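Here is a rough sketch that assumes each command is already labeled as a read or a write. Under that assumption, the three run modes reduce to a small decision table. `RunMode` and `next_step` are hypothetical names. The `report_only` outcome for write commands in ReadOnly mode is also an assumption about how a monitoring-only agent would surface a finding it is not allowed to act on.

```python
from enum import Enum

class RunMode(Enum):
    READ_ONLY = "ReadOnly"
    REVIEW = "Review"
    AUTONOMOUS = "Autonomous"

def next_step(mode: RunMode, is_write: bool) -> str:
    """Decide what happens to one command under a given run mode."""
    if not is_write:
        # Read commands run immediately, no approval needed.
        return "execute"
    if mode is RunMode.READ_ONLY:
        # Monitoring-only agents never take action; report the finding instead.
        return "report_only"
    if mode is RunMode.REVIEW:
        # Propose the action and wait for a human to approve it.
        return "propose"
    # Autonomous: the agent handles it without waiting.
    return "execute"

for mode in RunMode:
    print(mode.value, next_step(mode, is_write=True))
# ReadOnly report_only
# Review propose
# Autonomous execute
```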

What your agent can do

Your agent can run Azure operations through Azure CLI commands. If you can run it with az, your agent can run it too, within the permissions of its managed identity and the safety guardrails described below. This includes querying and managing any resource type, modifying configurations, and creating resources.

Command type	What it enables
Read commands	Query any Azure resource — `az webapp list`, `az containerapp show`, `az vm list`, `az network vnet show`. Run immediately, no approval needed.
Write commands	Modify any Azure resource — `az webapp restart`, `az containerapp update`, `az vm resize`, `az role assignment create`. Requires approval in Review mode.
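One way to picture the read/write split is to look at the command verb, which is the last token before the first flag in an `az` command. This is a hypothetical heuristic, not the agent's documented rule. `READ_VERBS`, `az_verb`, and `is_write` are illustrative names. Any verb the sketch does not recognize is treated as a write, so that it would need approval. The resource names are placeholders.

```python
import shlex

# Hypothetical: verbs treated as read-only. Not the agent's documented rule set.
READ_VERBS = {"list", "show"}

def az_verb(command: str) -> str:
    """Return the verb of an az command: the last token before the first flag."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "az":
        raise ValueError(f"not an az command: {command!r}")
    path = []
    for tok in tokens[1:]:
        if tok.startswith("-"):
            break
        path.append(tok)
    if not path:
        raise ValueError(f"no command found in: {command!r}")
    return path[-1]

def is_write(command: str) -> bool:
    # Unknown verbs count as writes, so anything unrecognized needs approval.
    return az_verb(command) not in READ_VERBS

for cmd in [
    "az webapp list",
    "az network vnet show --name hub-vnet --resource-group rg-net",
    "az vm resize --name vm1 --resource-group rg-prod --size Standard_D4s_v3",
]:
    print(is_write(cmd), cmd)
```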

The agent's actions are constrained only by the permissions assigned to its managed identity. Grant Contributor on a resource group, and your agent can manage everything in that group. Grant a custom role with specific actions, and your agent is limited to those actions.

Safety guardrails

The agent enforces safety constraints at the command level:

Delete operations blocked — delete and remove commands are never executed. The agent returns an error directing users to the Azure Portal for deletions.
Key Vault commands blocked — all az keyvault commands are blocked to prevent credential exposure.
Management locks respected — before modifying any resource, the agent checks for Azure management locks. Resources with ReadOnly locks cannot be modified.
Subscription validation — subscription IDs in commands are validated for correct GUID format before execution.
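The following is a minimal sketch of these checks. It assumes the lock levels on the target resource have already been looked up, and that subscriptions are passed as IDs through `--subscription`. `check_guardrails` and `GuardrailError` are hypothetical names. `ReadOnly` and `CanNotDelete` are the two Azure management lock levels. The agent's real checks may differ in detail.

```python
import re
import shlex
from typing import Iterator, List, Sequence

# Canonical 8-4-4-4-12 hexadecimal GUID layout used by subscription IDs.
GUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)

class GuardrailError(Exception):
    """Raised when a command breaks one of the safety rules."""

def subscription_args(tokens: List[str]) -> Iterator[str]:
    """Yield values passed as `--subscription X` or `--subscription=X`."""
    for i, tok in enumerate(tokens):
        if tok == "--subscription" and i + 1 < len(tokens):
            yield tokens[i + 1]
        elif tok.startswith("--subscription="):
            yield tok.split("=", 1)[1]

def check_guardrails(command: str, modifies: bool, lock_levels: Sequence[str]) -> None:
    tokens = shlex.split(command)
    # Command path: positional tokens after "az", up to the first flag.
    path = []
    for tok in tokens[1:]:
        if tok.startswith("-"):
            break
        path.append(tok)
    if path and path[-1] in ("delete", "remove"):
        raise GuardrailError("Deletions are blocked; use the Azure portal.")
    if path and path[0] == "keyvault":
        raise GuardrailError("az keyvault commands are blocked.")
    for sub in subscription_args(tokens):
        if not GUID_RE.fullmatch(sub):
            raise GuardrailError(f"Invalid subscription ID: {sub!r}")
    if modifies and "ReadOnly" in lock_levels:
        raise GuardrailError("Resource has a ReadOnly lock and cannot be modified.")

try:
    check_guardrails("az group delete --name rg-prod", modifies=True, lock_levels=[])
except GuardrailError as err:
    print(err)  # Deletions are blocked; use the Azure portal.
```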

Before and after

Aspect	Before	After
Fix execution	Navigate to Azure Portal, find resource, click through blades	Ask agent, approve, done
Verification	Manually check if fix worked	Agent verifies and reports result
Audit	Hope someone documented what they did	Full audit trail in Application Insights
Knowledge	One engineer knows the fix	Agent applies learned patterns consistently

Permission requirements

By default, agents have Reader access and cannot take actions. You explicitly grant write permissions by assigning roles to your agent's managed identity.

Scope	What the agent can act on	Recommended for
Resource	A single resource only	Maximum restriction, start here
Resource Group	All resources in one group	Production workloads
Subscription	Any resource in the subscription	Development and testing only
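Each scope in the table is an Azure Resource Manager ID. A role assigned at one scope is inherited by everything beneath it. The following sketch uses hypothetical helper names and a placeholder subscription ID. It builds the three scope strings and checks whether an assignment at one scope covers a given resource. Strings of this shape are what you pass to `az role assignment create --scope` when granting the agent's managed identity a role.

```python
def subscription_scope(sub_id: str) -> str:
    return f"/subscriptions/{sub_id}"

def resource_group_scope(sub_id: str, rg: str) -> str:
    return f"{subscription_scope(sub_id)}/resourceGroups/{rg}"

def resource_scope(sub_id: str, rg: str, provider: str, rtype: str, name: str) -> str:
    return f"{resource_group_scope(sub_id, rg)}/providers/{provider}/{rtype}/{name}"

def scope_covers(assignment_scope: str, resource_id: str) -> bool:
    """True if a role assigned at assignment_scope also applies to resource_id.

    Azure RBAC assignments are inherited by child scopes, and resource IDs
    compare case-insensitively. Appending "/" keeps "rg-prod" from matching
    "rg-prod-2".
    """
    scope = assignment_scope.rstrip("/").lower()
    target = resource_id.rstrip("/").lower()
    return target == scope or target.startswith(scope + "/")

SUB = "00000000-0000-0000-0000-000000000000"  # placeholder subscription ID
site = resource_scope(SUB, "rg-prod", "Microsoft.Web", "sites", "prod-api")
print(scope_covers(resource_group_scope(SUB, "rg-prod"), site))    # True
print(scope_covers(resource_group_scope(SUB, "rg-prod-2"), site))  # False
```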

Warning

The agent checks Azure management locks before modifying any resource. Resources with ReadOnly locks cannot be modified, regardless of permissions or run mode. Delete and remove operations are blocked entirely — use the Azure Portal for deletions.

When the agent lacks permissions

If the agent's managed identity doesn't have the permissions needed to run a command, the agent doesn't fail silently. Instead, it follows a transparent fallback:

Tries with its managed identity first — every command starts with the agent's own credentials
Detects the authorization error — the agent recognizes when access is denied
Explains what happened — an info message tells you exactly what went wrong and what action you can take
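The fallback can be sketched in three steps: try the command, catch the authorization error, then ask for consent. `AuthorizationError`, `Outcome`, and `run_with_fallback` are hypothetical stand-ins. The message text matches the one quoted in this section. How the agent actually detects a denied request is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Optional

class AuthorizationError(Exception):
    """Stand-in for an access-denied response from Azure Resource Manager."""

@dataclass
class Outcome:
    status: str            # "succeeded" or "needs_consent"
    button: Optional[str]  # label shown to the user, if any
    message: str

def run_with_fallback(run_as_agent: Callable[[], None], scope: str) -> Outcome:
    """Try the managed identity first; on denial, ask for consent instead of failing."""
    try:
        run_as_agent()
    except AuthorizationError:
        return Outcome(
            status="needs_consent",
            button="Grant permissions",
            message=(
                "The agent tried to execute this command using its managed identity "
                "but received an authorization error. If you grant permissions, the "
                "command will be re-executed using your credentials (OBO) with scope: "
                f"{scope}"
            ),
        )
    return Outcome(status="succeeded", button=None, message="Executed with the managed identity.")

def denied() -> None:
    raise AuthorizationError("access denied")

print(run_with_fallback(denied, "management.azure.com").button)  # Grant permissions
```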

The message reads:

"The agent tried to execute this command using its managed identity but received an authorization error. If you grant permissions, the command will be re-executed using your credentials (OBO) with scope: [scope]"

The button changes from Approve action to Grant permissions, making the consent action explicit.

Granting temporary permissions

Click Grant permissions to let the agent re-execute the command using your credentials via the on-behalf-of flow. See Permissions — OBO lifecycle for details on how your token is stored and scoped.

What you see	What it means
"Grant permissions" button	The agent's managed identity was denied — it needs your credentials to proceed
"Approve action" button	The agent has sufficient permissions — this is standard approval in Review mode
Scope in the message	The Azure API scope the agent needs access to (for example, `management.azure.com`)

After granting, the result shows who authorized the action: "The action was completed using temporary permissions granted by [your name]."

If your own Azure permissions are also insufficient for the command, the action still fails — you see: "Temporary permissions were granted by [your name], but the action failed to run." Check that you have the necessary Azure RBAC role (for example, Contributor) on the target resource.

Who can grant permissions

Only users with the SRE Agent Administrator role can authorize on-behalf-of requests. Standard Users see the info message but the buttons are disabled. Personal Microsoft accounts cannot authorize OBO — only work or school (Entra ID) accounts.
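The two conditions above can be expressed as a single check. This is an illustrative sketch. `User`, `can_authorize_obo`, and the account-type labels are hypothetical, and the sketch does not show how the portal resolves a user's role and account type.

```python
from dataclasses import dataclass

@dataclass
class User:
    name: str
    role: str          # e.g. "SRE Agent Administrator" or "Standard User"
    account_type: str  # hypothetical labels: "work_or_school" or "personal"

def can_authorize_obo(user: User) -> bool:
    """Both documented conditions must hold for Grant permissions to be enabled."""
    is_admin = user.role == "SRE Agent Administrator"
    is_entra_account = user.account_type == "work_or_school"
    return is_admin and is_entra_account

print(can_authorize_obo(User("Ana", "SRE Agent Administrator", "work_or_school")))  # True
print(can_authorize_obo(User("Ben", "Standard User", "work_or_school")))            # False
```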

See Agent Permissions for details on the on-behalf-of flow.

Alternative response paths

Direct mitigations aren't the only option. Many teams prefer to route findings to work items or ticketing systems instead of executing actions directly — especially when human review is required or change management processes apply.

Response path	How it works	Best for
Direct mitigation	Agent executes restart, scale, or hardening	Trusted patterns, non-production
Create work item	Agent creates GitHub Issue or ADO work item	Human-in-the-loop, change management
Send notification	Agent posts to Teams or sends email	Awareness without action
Trigger workflow	Agent dispatches GitHub Actions or Logic Apps	CI/CD integration, multi-step processes

Configure work item creation and notifications through connectors. For example, connect a GitHub MCP server to let your agent create issues, or connect Azure DevOps to create work items automatically.

See Send Notifications and Workflow Automation for chaining these response types together.

Example: Incident-triggered mitigation

3:47 AM — PagerDuty fires an alert: "High memory on prod-api"

Your agent (in Review mode) does the investigation while you sleep and only needs you for the approval step:

Acknowledges the incident — PagerDuty shows "Acknowledged by SRE Agent"
Investigates automatically
- Queries App Insights: memory at 94%, trending up over 2 hours
- Checks deployment history: no recent deploys
- Recalls from memory: "Last time this happened, restart resolved it"

Proposes a fix — Posts to the incident thread:

Memory at 94% on prod-api (App Service).
Recommended action: Restart the App Service.

Evidence:
- Memory climbing since 1:30 AM
- No recent deployments
- Past incident: restart resolved similar issue on 2026-01-15

[Approve] [Deny]

You approve (in Autonomous mode, the agent executes immediately without waiting for approval)

Agent executes and verifies

✓ Restarted prod-api
✓ Memory now at 42%
✓ Incident resolved

What happened: You clicked Approve and the agent handled investigation, action, and verification.

Audit trail

Every mitigation action is recorded with full context:

Field	What's captured
Identity	Which agent and managed identity
Action	Exact operation performed
Timestamp	When it executed
Trigger	The diagnosis or condition that led to the action
Result	Success or failure, with post-action verification

Query the audit trail in Application Insights via Monitor → Logs in the agent portal. Every az command is logged as an AgentAzCliExecution custom event. See Audit Agent Actions.
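To make the audit fields concrete, the following sketch shows one entry shaped like an Application Insights custom event, whose custom properties are string key/value pairs. Only the event name `AgentAzCliExecution` comes from this page. `AuditEntry`, `to_event`, and the property names are hypothetical, so check the actual event schema in Monitor → Logs before querying it.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

@dataclass
class AuditEntry:
    identity: str   # which agent and managed identity
    action: str     # exact operation performed
    timestamp: str  # when it executed (ISO 8601, UTC)
    trigger: str    # diagnosis or condition that led to the action
    result: str     # "success" or "failure", after post-action verification

    def to_event(self) -> dict:
        """Shape the entry like a custom event with string-valued properties."""
        return {
            "name": "AgentAzCliExecution",
            "properties": {k: str(v) for k, v in asdict(self).items()},
        }

entry = AuditEntry(
    identity="sre-agent (managed identity)",
    action="az webapp restart --name prod-api --resource-group rg-prod",
    timestamp=datetime.now(timezone.utc).isoformat(),
    trigger="Memory at 94% on prod-api",
    result="success",
)
print(json.dumps(entry.to_event(), indent=2))
```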

Get started

Mitigations work out of the box with the built-in Azure CLI tool. Control how much autonomy your agent has through Run Modes.

Resource	What you'll learn
Set Up a Response Plan →	Configure response plans that include automated mitigations
Run Modes →	Configure ReadOnly, Review, or Autonomous execution levels

Related capabilities

Capability	What it adds
Run Modes →	Control the level of autonomy for each action
Scheduled Tasks →	Schedule health checks that trigger mitigations automatically
Workflow Automation →	Chain mitigations with notifications and ticket creation
Audit Agent Actions →	Review and query the full action history
Permissions →	Understand agent permission model
