Execute Mitigations

TL;DR
  • Ask your agent to fix an issue — it proposes, you approve, it executes
  • Full audit trail: who triggered it, what changed, whether it worked
  • Choose your level of trust: Review mode (approve each action) or Autonomous mode (agent handles it)

The problem: diagnosis without action wastes time

You've identified the issue. Now what? You navigate to the Azure portal, find the right blade, confirm the resource, click through confirmation dialogs, wait for the operation to complete, then verify it worked. The investigation took five minutes; the fix takes another ten.

This friction exists across your operational workflows:

  • Daily operations: Scaling resources for expected load, restarting services during maintenance windows
  • Compliance checks: Hardening security settings across dozens of storage accounts
  • On-call response: Executing well-known fixes quickly so engineers can get back to sleep
  • Proactive optimization: Adjusting SKUs based on usage patterns before problems occur

How your agent closes the loop

When your agent identifies an issue, it doesn't stop at telling you what's wrong. It proposes a specific remediation action and, depending on your run mode, either waits for your approval or executes immediately.

The agent follows a consistent pattern: diagnose → identify action → check permissions → execute (or propose) → verify the fix worked. Every action is logged with who triggered it, what changed, why, and whether it succeeded.

Figure: Agent response paths (execute fix, create work item, or send notification). After investigating, your agent can take direct action, create tracking items, or notify your team, each with full context.

What makes this different from scripts

Scripts are rigid — they run the same action regardless of context. Your agent reasons about the situation first. It considers what it found during investigation, what it remembers from past incidents, and what your skills and knowledge base recommend. The same symptom might lead to a restart in one case and a scale-up in another, because the agent adapts based on evidence.

Run modes give you graduated trust. Start in Review mode where the agent proposes and you approve. Move to Autonomous when you're confident in the pattern. Use ReadOnly for monitoring-only agents that never take action.
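
As a rough illustration of how the three run modes gate a command, the sketch below maps a mode and a read/write flag to an outcome. The names `RunMode` and `decide`, the outcome labels, and the assumption that read commands run in every mode are illustrative only, not the agent's actual implementation.

```python
from enum import Enum


class RunMode(Enum):
    READ_ONLY = "ReadOnly"
    REVIEW = "Review"
    AUTONOMOUS = "Autonomous"


def decide(mode: RunMode, is_write: bool) -> str:
    """Return 'execute', 'propose', or 'skip' (hypothetical labels)."""
    if not is_write:
        return "execute"  # assumption: read commands run immediately in every mode
    if mode is RunMode.READ_ONLY:
        return "skip"  # monitoring-only agents never take action
    if mode is RunMode.REVIEW:
        return "propose"  # wait for a human to approve
    return "execute"  # Autonomous: act without waiting


for mode in RunMode:
    print(mode.value, "->", decide(mode, is_write=True))
```

Keeping the decision in one small function makes it easy to see that moving from Review to Autonomous changes only what happens to write commands.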

What your agent can do

Your agent can execute Azure actions through Azure CLI commands: if you can run it in az, your agent can generally run it too, subject to the safety guardrails below and the permissions of its managed identity. This includes managing resources of any type, modifying configurations, and creating resources.

Command type | What it enables
Read commands | Query any Azure resource: az webapp list, az containerapp show, az vm list, az network vnet show. Run immediately, no approval needed.
Write commands | Modify any Azure resource: az webapp restart, az containerapp update, az vm resize, az role assignment create. Requires approval in Review mode.
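
One way to picture the read/write split is to look at the verb of the az command. The sketch below is a guess at such a rule: it assumes that only `list` and `show` count as reads, which matches the examples in the table but is not a documented classification. `READ_VERBS`, `command_verb`, and `needs_approval` are hypothetical names.

```python
from itertools import takewhile

# Assumption: verbs treated as read-only, taken from the table's examples.
READ_VERBS = {"list", "show"}


def command_verb(command: str) -> str:
    """Return the last word before the first flag, e.g. 'restart'."""
    words = list(takewhile(lambda t: not t.startswith("-"), command.split()))
    return words[-1] if words else ""


def needs_approval(command: str) -> bool:
    """True if the command would wait for approval in Review mode."""
    return command_verb(command) not in READ_VERBS


print(needs_approval("az webapp list"))
print(needs_approval("az webapp restart --name prod-api"))
```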

Beyond the safety guardrails, the agent's actions are constrained by the permissions assigned to its managed identity. Grant Contributor on a resource group, and your agent can manage everything in that group. Grant a custom role with specific actions, and your agent is limited to those actions.

Safety guardrails

The agent enforces safety constraints at the command level:

  • Delete operations blocked: delete and remove commands are never executed. The agent returns an error directing users to the Azure Portal for deletions.
  • Key Vault commands blocked: all az keyvault commands are blocked to prevent credential exposure.
  • Management locks respected: before modifying any resource, the agent checks for Azure management locks. Resources with ReadOnly locks cannot be modified.
  • Subscription validation: subscription IDs in commands are validated for correct GUID format before execution.
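
The command-level checks above can be sketched as a pre-flight function. This is a minimal illustration, not the agent's real validator. The function name and error messages are hypothetical, the management-lock check is omitted because it needs a live Azure call, and the sketch assumes the subscription ID is passed with the `--subscription` flag.

```python
import re
import shlex
from typing import Optional

GUID_RE = re.compile(r"[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def check_guardrails(command: str) -> Optional[str]:
    """Return an error message if the command is blocked, else None."""
    tokens = shlex.split(command)
    # Command words are the tokens before the first flag.
    words = []
    for tok in tokens:
        if tok.startswith("-"):
            break
        words.append(tok)
    if any(w in ("delete", "remove") for w in words):
        return "Delete operations are blocked; use the Azure Portal."
    if words[:2] == ["az", "keyvault"]:
        return "Key Vault commands are blocked."
    for i, tok in enumerate(tokens[:-1]):
        if tok == "--subscription" and not GUID_RE.fullmatch(tokens[i + 1]):
            return "Subscription ID is not a valid GUID."
    return None


print(check_guardrails("az webapp restart --name prod-api --resource-group rg"))
print(check_guardrails("az storage account delete --name mystorage"))
```

Returning a message instead of raising keeps the sketch close to the behavior described above, where the agent reports an error back to the user.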

Before and after

Area | Before | After
Fix execution | Navigate to Azure Portal, find resource, click through blades | Ask agent, approve, done
Verification | Manually check if fix worked | Agent verifies and reports result
Audit | Hope someone documented what they did | Full audit trail in Application Insights
Knowledge | One engineer knows the fix | Agent applies learned patterns consistently

Permission requirements

By default, agents have Reader access and cannot take actions. You explicitly grant write permissions by assigning roles to your agent's managed identity.

Scope | What the agent can act on | Recommended for
Resource | A single resource only | Maximum restriction, start here
Resource Group | All resources in one group | Production workloads
Subscription | Any resource in the subscription | Development and testing only
Warning: The agent checks Azure management locks before modifying any resource. Resources with ReadOnly locks cannot be modified, regardless of permissions or run mode. Delete and remove operations are blocked entirely; use the Azure Portal for deletions.
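
To make the three scopes concrete, the sketch below builds the Azure scope string for each level and the argument list for an `az role assignment create` call. The helper names, the all-zero subscription ID, the resource group `rg-prod`, and the principal ID placeholder are illustrative. Treat this as one plausible way to assemble the command, not a prescribed procedure.

```python
from typing import List, Optional


def build_scope(subscription_id: str,
                resource_group: Optional[str] = None,
                resource_path: Optional[str] = None) -> str:
    """Build a subscription, resource group, or single-resource scope."""
    if resource_path and not resource_group:
        raise ValueError("a resource scope needs a resource group")
    scope = f"/subscriptions/{subscription_id}"
    if resource_group:
        scope += f"/resourceGroups/{resource_group}"
    if resource_path:
        # e.g. "Microsoft.Web/sites/prod-api"
        scope += f"/providers/{resource_path}"
    return scope


def role_assignment_args(principal_id: str, role: str, scope: str) -> List[str]:
    """Arguments for granting a role to the agent's managed identity."""
    return [
        "az", "role", "assignment", "create",
        "--assignee-object-id", principal_id,
        "--assignee-principal-type", "ServicePrincipal",
        "--role", role,
        "--scope", scope,
    ]


sub = "00000000-0000-0000-0000-000000000000"  # placeholder subscription ID
narrow = build_scope(sub, "rg-prod", "Microsoft.Web/sites/prod-api")
print(" ".join(role_assignment_args("<principal-id>", "Contributor", narrow)))
```

Starting with the single-resource scope and widening later follows the recommendation in the table above.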

Alternative response paths

Direct mitigations aren't the only option. Many teams prefer to route findings to work items or ticketing systems instead of executing actions directly — especially when human review is required or change management processes apply.

Response path | How it works | Best for
Direct mitigation | Agent executes restart, scale, or hardening | Trusted patterns, non-production
Create work item | Agent creates GitHub Issue or ADO work item | Human-in-the-loop, change management
Send notification | Agent posts to Teams or sends email | Awareness without action
Trigger workflow | Agent dispatches GitHub Actions or Logic Apps | CI/CD integration, multi-step processes

Configure work item creation and notifications through connectors. For example, connect a GitHub MCP server to let your agent create issues, or connect Azure DevOps to create work items automatically.

See Send Notifications and Workflow Automation for chaining these response types together.

Example: Incident-triggered mitigation

3:47 AM — PagerDuty fires an alert: "High memory on prod-api"

Your agent (in Review mode) handles the investigation and prepares a fix for your approval:

  1. Acknowledges the incident — PagerDuty shows "Acknowledged by SRE Agent"

  2. Investigates automatically

    • Queries App Insights: memory at 94%, trending up over 2 hours
    • Checks deployment history: no recent deploys
    • Recalls from memory: "Last time this happened, restart resolved it"
  3. Proposes a fix — Posts to the incident thread:

    Memory at 94% on prod-api (App Service).
    Recommended action: Restart the App Service.

    Evidence:
    - Memory climbing since 1:30 AM
    - No recent deployments
    - Past incident: restart resolved similar issue on 2026-01-15

    [Approve] [Deny]
  4. You approve (or in Autonomous mode, agent executes immediately)

  5. Agent executes and verifies

    ✓ Restarted prod-api
    ✓ Memory now at 42%
    ✓ Incident resolved

What happened: You clicked Approve and the agent handled investigation, action, and verification.


Audit trail

Every mitigation action is recorded with full context:

Field | What's captured
Identity | Which agent and managed identity
Action | Exact operation performed
Timestamp | When it executed
Trigger | The diagnosis or condition that led to the action
Result | Success or failure, with post-action verification

Query the audit trail in Application Insights via Monitor → Logs in the agent portal. Every az command is logged as an AgentAzCliExecution custom event. See Audit Agent Actions.
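
As a starting point for such a query, the sketch below assembles a KQL string that filters for the AgentAzCliExecution event. It assumes the events land in the standard Application Insights `customEvents` table. The helper name `audit_query` and its defaults are illustrative, and the keys inside `customDimensions` are not documented here, so the query projects the whole column rather than guessing at field names.

```python
def audit_query(hours: int = 24, limit: int = 50) -> str:
    """Build a KQL query for recent agent az CLI executions."""
    return "\n".join([
        "customEvents",
        '| where name == "AgentAzCliExecution"',
        f"| where timestamp > ago({hours}h)",
        "| project timestamp, name, customDimensions",
        "| order by timestamp desc",
        f"| take {limit}",
    ])


print(audit_query(hours=6, limit=20))
```

Paste the printed query into the Logs view and adjust the time window as needed.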

Get started

Mitigations work out of the box with the built-in Azure CLI tool. Control how much autonomy your agent has through Run Modes.

Resource | What you'll learn
Set Up a Response Plan → | Configure response plans that include automated mitigations
Run Modes → | Configure ReadOnly, Review, or Autonomous execution levels

Capability | What it adds
Run Modes → | Control the level of autonomy for each action
Scheduled Tasks → | Schedule health checks that trigger mitigations automatically
Workflow Automation → | Chain mitigations with notifications and ticket creation
Audit Agent Actions → | Review and query the full action history
Permissions → | Understand agent permission model