
Diagnose with Azure Observability

TL;DR
  • Your agent queries App Insights, Log Analytics, Azure Monitor metrics, Resource Graph, Activity Logs, and resource-specific diagnostics — all in one investigation
  • No connectors needed — everything works through managed identity and Azure RBAC
  • Your agent decides which sources to query based on the symptom, correlates evidence across them, and explains what it found
  • Deep diagnostics go beyond metrics — CPU profiling, memory analysis, connectivity checks, deployment history

The problem: too many places to look

Azure's observability stack is comprehensive. Application Insights captures traces and dependencies. Log Analytics stores custom logs and events. Azure Monitor tracks resource metrics. Resource Graph maps topology. Activity Logs record configuration changes. Each Azure service has its own diagnostics — Container Apps console logs, App Service deployment history, Function App health checks, AKS pod status.

That breadth is the problem. During an incident, you need data from several of these sources, but you have to remember which portal has which data, write KQL from scratch, manually copy operation IDs between tools, and correlate timestamps across tabs. The data exists everywhere. Knowing where to look — and connecting what you find — is what takes time.

How your agent investigates

[Diagram: How your agent diagnoses Azure services. A service issue reaches the SRE Agent, which queries all sources — Application Insights (KQL queries, traces, dependencies), Log Analytics (workspace logs, custom events), Azure Monitor metrics (CPU, memory, requests, analysis), Resource Graph and Activity Logs (topology, changes, deployment logs), and built-in skills plus Azure CLI (deep diagnostics, connectivity, resource-specific checks, remediation) — then correlates the evidence into a diagnosis. No connectors needed: everything works through managed identity and Azure RBAC.]

Your agent has built-in access to Azure's full diagnostic surface. Grant permissions once, and your agent queries the right sources automatically based on the symptom:

  1. Discovers resources — Resource Graph finds topology, relationships, and connected resources across your subscriptions
  2. Queries logs — Application Insights for request traces, exceptions, and dependencies; Log Analytics for custom workspace data
  3. Analyzes metrics — Azure Monitor for CPU, memory, request rates, and availability with automatic time-series analysis
  4. Checks changes — Activity Logs surface recent configuration changes and deployments that may correlate with the issue
  5. Runs deep diagnostics — Built-in skills perform CPU profiling, memory analysis, latency assessment, connectivity checks, and resource-specific health analysis
  6. Executes Azure CLI commands — Reads resource state, checks configurations, and inspects properties that logs and metrics don't expose directly
  7. Correlates everything — Evidence from all sources is connected automatically, with no copy-paste between portals

Zero configuration

Your agent selects the right tools for each resource type automatically. You don't configure which tools to use — your agent decides based on the symptom and the resource involved.

What makes this different

This is "external pain made easier." Azure's observability capabilities are excellent — the challenge is navigating them under pressure. Your agent eliminates the cognitive overhead of knowing where to look and how to connect what you find.

Unlike portal-hopping, your agent queries all sources in one investigation. You don't need to remember whether a specific metric lives in Azure Monitor, App Insights, or a resource-specific blade.

Unlike writing KQL from scratch, your agent constructs queries based on the symptom. It knows which tables to query, which dimensions to split by, and how to interpret the results in context.

Unlike manual correlation, your agent follows the thread automatically — operation IDs, timestamps, resource relationships, deployment timelines — across every source it queries.


Before and after

| | Before | After |
|---|---|---|
| Tools opened | 3-5 portals, separate browser tabs | 0 — your agent queries them all |
| Query writing | KQL from scratch for each source | Your agent constructs queries from symptoms |
| Correlation | Copy operation IDs between portals | Automatic cross-source correlation |
| Deep diagnostics | Navigate to resource-specific blades | Your agent runs CPU profiling, connectivity checks, deployment history automatically |
| Interpretation | You read raw results across tools | Your agent explains what the evidence means together |
| Which tool? | You remember which portal has which data | Your agent knows |

Built-in diagnostic capabilities

Your agent uses these capabilities automatically during investigation. For the full list of tools available, see Tools.

Application Insights

Your agent queries Application Insights resources directly using their resource ID or app ID — running KQL, correlating time series, tracing distributed transactions, and assessing the scope of impact.

When investigating slow responses, for example, your agent might query the requests table for duration spikes, use time-series correlation to identify which endpoints and result codes are contributing, trace a slow request through all dependencies, and report: "Endpoint /api/orders is 5x slower due to SQL dependency timeout."
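
For illustration, a first-pass query of that shape might look like the following sketch. The requests table and its duration, name, and resultCode columns are standard Application Insights schema; the time window and binning here are arbitrary choices:

```kusto
// Sketch: surface endpoints with the highest average duration over the
// last hour, split by result code in 5-minute bins.
requests
| where timestamp > ago(1h)
| summarize avgDurationMs = avg(duration), requestCount = count()
    by name, resultCode, bin(timestamp, 5m)
| order by avgDurationMs desc
```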

Log Analytics

Your agent runs KQL queries against Log Analytics workspaces using either resource ID or workspace ID. This covers custom logs, performance counters, Azure diagnostic logs, and any other data routed to a workspace.
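
As a sketch, a workspace query against a custom table might look like this. ContainerAppConsoleLogs_CL and its columns follow the common Container Apps schema, but custom table and column names vary by workspace:

```kusto
// Sketch: pull recent error lines from a custom log table.
// KQL's `contains` is case-insensitive, so this matches ERROR and error.
ContainerAppConsoleLogs_CL
| where TimeGenerated > ago(30m)
| where Log_s contains "error"
| project TimeGenerated, ContainerAppName_s, Log_s
| take 50
```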

Azure Monitor metrics

Your agent discovers available metrics for any resource type, queries time-series data with dimension filtering, and performs automated trend analysis. It can also generate charts from metric data for inclusion in reports and investigation threads.
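
The agent reads metrics through the Azure Monitor APIs, but the same trend analysis can be expressed in KQL when a resource's platform metrics are routed to a Log Analytics workspace. A sketch, assuming diagnostic settings populate the AzureMetrics table, using the App Service CpuPercentage metric as the example:

```kusto
// Sketch: 5-minute CPU trend per resource from routed platform metrics.
AzureMetrics
| where TimeGenerated > ago(1h)
| where MetricName == "CpuPercentage"
| summarize avgCpu = avg(Average), maxCpu = max(Maximum)
    by Resource, bin(TimeGenerated, 5m)
| order by TimeGenerated asc
```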

When your agent uses Azure Monitor as its incident platform, it also manages alerts directly — acknowledging and closing them during investigation.

Alert management permissions

Your agent needs the Monitoring Contributor role at subscription scope to acknowledge and close alerts during investigation. This role is assigned automatically when you configure managed resource groups during agent creation, or when you connect Azure Monitor as your incident platform. If the role is missing, a banner in the portal prompts you to assign it.

Resource Graph and Activity Logs

Resource Graph discovers resources across your subscriptions, maps topology, and finds connected resources. Your agent uses this to understand the full blast radius — finding related resources in the same resource group, virtual network, or app service plan.
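
Resource Graph queries are themselves KQL. A blast-radius sketch, with a hypothetical resource group name, runnable through the Resource Graph API or az graph query:

```kusto
// Sketch: list everything deployed alongside the affected app
// to scope the blast radius.
resources
| where resourceGroup == "rg-groceryapp"
| project name, type, location
| order by type asc
```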

Activity Logs surface recent configuration changes, deployments, and administrative actions. These help your agent correlate an outage with a recent change — "App started failing 4 minutes after this deployment" or "network rules changed at the same time availability dropped."
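
When activity logs are routed to a Log Analytics workspace, that correlation can be written as KQL over the AzureActivity table. A sketch, with an arbitrary four-hour window:

```kusto
// Sketch: recent write operations that might explain a sudden failure.
// `endswith` is case-insensitive, so it matches .../WRITE operation names.
AzureActivity
| where TimeGenerated > ago(4h)
| where OperationNameValue endswith "write"
| project TimeGenerated, OperationNameValue, Caller, ResourceGroup
| order by TimeGenerated desc
```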

Built-in skills and Azure CLI

Beyond metrics and logs, your agent has specialized capabilities that go deeper:

| Category | What it does |
|---|---|
| Deep diagnostics | CPU profiling, memory analysis, latency assessment, threadpool starvation detection |
| Connectivity checks | TCP connectivity tests, DNS resolution, storage connectivity verification |
| Resource-specific diagnostics | Container App revision management, App Service configuration checks, Function App deployment history, AKS kubectl commands, Redis diagnostics, PostgreSQL health, API Management analysis |
| Reliability assessment | App Service health scoring — AlwaysOn, health checks, instance count, auto-heal configuration |
| Azure CLI | Read commands (az ... show, az ... list) for any Azure service, and write commands (az ... update, az ... scale) with approval |
| ARM operations | Direct resource property inspection, app settings management, deployment slot operations |

Your agent selects these tools based on the resource type and the nature of the problem. A Container App issue triggers revision listing and console log inspection. An App Service issue triggers CPU profiling and deployment history. You don't configure this — it happens automatically.


Investigation example

You: "My container app ca-web is throwing errors"

Agent investigates:

  1. Finds the resource via Resource Graph

    Found: /subscriptions/.../resourceGroups/rg-groceryapp/providers/Microsoft.App/containerApps/ca-web-3syj3i2fat5dm
    Latest revision: ca-web-3syj3i2fat5dm--azd-1769393374 (100% traffic)
  2. Checks logs, metrics, and diagnostics

    • Console logs: web starts normally, listens on port 3000
    • Metrics (last 30m): requests=0, CPU≈0%, memory≈3%
    • Container health: running, no restarts
    • No stack traces or 4xx/5xx in this container
  3. You provide more context: "ERROR Supplier API is rate limited!"

  4. Continues investigation — checks related API container logs and configuration

    ContainerAppConsoleLogs_CL
    | where RevisionName_s contains "ca-api"
    | where Log_s contains "rate" or Log_s contains "limit"

    Found: SUPPLIER_RATE_LIMIT=5 in environment, but request bursts of 40/min observed

Agent conclusion: "API exceeding supplier rate limit — configured at 5 requests/min but observed bursts up to 40/min. Recommend implementing client-side throttling with exponential backoff."


Get started

Azure observability works out of the box when you grant your agent Reader access to your subscription during Step 1: Create and Set Up.

| Resource | What you'll learn |
|---|---|
| Create and Set Up → | Grant permissions during initial agent setup |
| Manage Permissions → | Add or change resource access after setup |

| Capability | What it adds |
|---|---|
| External Observability → | Datadog, Splunk, and custom systems via MCP connectors |
| Kusto Tools → | Custom Azure Data Explorer queries |
| Root Cause Analysis → | Hypothesis-driven investigation across all evidence |
Root Cause Analysis →Hypothesis-driven investigation across all evidence