Evidence-Driven AI · Incident Investigation

From outage to root cause in minutes.

Governed AI agents investigate incidents across logs, metrics, APIs, and deployments — grounded by a Business Context Layer that ties every signal to the services and impact that matter, so each conclusion is sharper and backed by traceable evidence.

Book a guided demo · See how it works

Faster mean time to resolution

Source-linked, auditable evidence

Autonomous triage coverage

−45% MTTR

vs. manual triage

14 sources

cited in this trace

diagnara · investigation · SEV-1 · live

High latency after deploy · contract-provider

INC-4821 · 02:14 UTC

Detect +0.4s

p95 latency 4.2× baseline across 3 regions after release.

metrics://datadog · checkout.p95

Investigate +6.1s

Runbook executed — pulled logs, deploy events, and the change ticket.

runbook://latency-after-deploy

Correlate running…

Mapping deploy #9f2a1c → slow query cluster on payments-db.

github://PR-2287 · diff

Conclude

Synthesizing evidence into a ranked root-cause hypothesis.

Root cause identified · confidence 96%

PR #2287 removed an index used by the contract lookup query, causing full-table scans under load. Rollback recommended.

The problem

Critical context — technical and business — is scattered across every tool you own.

During an incident, engineers manually correlate logs, metrics, deployments, API behavior, and tickets — then have to map it all to the business: which services, SLAs, customers, and revenue are actually at stake. Most AI tools just summarize raw telemetry — with no business context and no defensible justification. Diagnara investigates instead.

// fragmented during an incident

Logs (Elastic) · Metrics (Datadog) · Deploys & PRs (GitHub) · Jira / ServiceNow · Postgres · APIs · Feature flags · Kafka / SQS · AWS / Infrastructure · Business rules · Domain knowledge (spread across teams)

governed investigation

One evidence-backed conclusion

Root cause · confidence · impact · next actions · full audit trail

Why Diagnara is different

Not a chatbot. An investigation.

Every answer is the result of a structured investigation — with a reproducible reasoning path you can defend to engineering, leadership, compliance, and audit.

Evidence over conversation

Every conclusion links to a cited data point — a log line, metric, ticket, deploy, or contract record. No claim without a source, no source without a tool call.

Protocol over improvisation

Investigations follow versioned runbooks that define the steps, required tools, evidence thresholds, and approval gates — so outputs are consistent and reproducible.

Governance over open access

RBAC, quotas, rate limits, data masking, model routing, and audit events are built in — agents only touch approved tools, within approved environments.
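
The "no claim without a source, no source without a tool call" rule can be pictured as a simple data check. The sketch below is illustrative only: the `ToolCall` and `Claim` shapes, the `validate_claims` helper, and the second sample claim are hypothetical and are not Diagnara's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """Record of one governed tool invocation (hypothetical shape)."""
    call_id: str
    tool: str        # e.g. "metrics", "github"
    source_uri: str  # e.g. "github://PR-2287"


@dataclass(frozen=True)
class Claim:
    """A statement in a report plus the tool calls that support it."""
    text: str
    citations: tuple[ToolCall, ...] = ()


def validate_claims(claims: list[Claim]) -> list[str]:
    """Return the text of every claim that has no supporting tool call."""
    return [c.text for c in claims if not c.citations]


claims = [
    Claim("PR #2287 dropped an index",
          (ToolCall("tc-1", "github", "github://PR-2287"),)),
    # Hypothetical unsourced claim, included only to show a rejection.
    Claim("Traffic doubled overnight"),
]
unsourced = validate_claims(claims)
print(unsourced)
```

A report builder following this idea would refuse to publish while `unsourced` is non-empty.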

The evidence pipeline

A structured protocol, end to end.

Diagnara turns a signal into a defensible conclusion through five explicit stages — every run reproducible, every step auditable.

Stage 01 · Detect

Catch the signal, from anywhere

An alert, a Jira ticket, or a manual prompt opens an investigation. Diagnara intakes the signal and scopes it to the right environment.

CRIT p95 latency 4.2× baseline · contract-provider 02:14:06

WARN slow query cluster detected on payments-db 02:14:09

INTAKE environment=Production · runbook matched 02:14:12

Stage 02 · Investigate

Run a runbook, not a guess

A versioned investigation runbook defines the steps, required tools, evidence thresholds, and approval gates for this class of incident — executed inside guardrails.

STEP 1 pull deploy events (last 60m) tool: deployments

STEP 2 compare pre/post p95 latency tool: metrics

STEP 3 fetch change ticket + PR diff tool: jira, github

GATE sensitive query → human approval policy
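
One way to read the steps and gate above is as a runbook expressed in data and executed inside guardrails. This is a minimal sketch under stated assumptions: the `Step`, `Runbook`, and `execute` names are hypothetical, the tool assigned to the gated step (`sql`) is a guess for illustration, and the approval callback stands in for a real human reviewer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    tools: tuple[str, ...]
    needs_approval: bool = False


@dataclass(frozen=True)
class Runbook:
    name: str
    version: int
    steps: tuple[Step, ...]


def execute(runbook, allowed_tools, approve):
    """Run steps in order; return a log of (step name, outcome)."""
    log = []
    for step in runbook.steps:
        # Guardrail 1: every tool a step needs must be on the approved list.
        if any(t not in allowed_tools for t in step.tools):
            log.append((step.name, "blocked: tool not approved"))
            break
        # Guardrail 2: gated steps stop unless a human approves them.
        if step.needs_approval and not approve(step):
            log.append((step.name, "paused: approval denied"))
            break
        log.append((step.name, "ok"))
    return log


latency_after_deploy = Runbook(
    name="latency-after-deploy",
    version=3,
    steps=(
        Step("pull deploy events (last 60m)", ("deployments",)),
        Step("compare pre/post p95 latency", ("metrics",)),
        Step("fetch change ticket + PR diff", ("jira", "github")),
        # Assumption: the sensitive query runs through a "sql" tool.
        Step("run sensitive query", ("sql",), needs_approval=True),
    ),
)

log = execute(
    latency_after_deploy,
    allowed_tools={"deployments", "metrics", "jira", "github", "sql"},
    approve=lambda step: False,  # simulate a reviewer declining
)
```

Because the runbook is plain data with a version number, two runs of the same version against the same inputs follow the same path, which is what makes the output reproducible.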

Stage 03 · Correlate

Test hypotheses against real data

Diagnara links evidence across systems, forms competing hypotheses, and validates each against the data — accepting and discarding with reasons.

H1 ✓ PR #2287 dropped index → full-table scans 96%

H2 ✗ upstream provider degradation discarded · 31%

H3 ✗ traffic surge / capacity discarded · 12%
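
The accept-and-discard step above can be sketched as ranking scored hypotheses against a threshold. The confidence values are taken from the example trace; the `Hypothesis` type, the `rank` helper, and the 0.9 acceptance threshold are assumptions made for illustration, and how the scores are computed is outside this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    label: str
    description: str
    confidence: float  # 0.0 to 1.0, assumed to come from an upstream scoring step


def rank(hypotheses, accept_threshold=0.9):
    """Sort by confidence and mark each hypothesis accepted or discarded."""
    ordered = sorted(hypotheses, key=lambda h: h.confidence, reverse=True)
    return [
        (h.label, "accepted" if h.confidence >= accept_threshold else "discarded")
        for h in ordered
    ]


candidates = [
    Hypothesis("H3", "traffic surge / capacity", 0.12),
    Hypothesis("H1", "PR #2287 dropped index -> full-table scans", 0.96),
    Hypothesis("H2", "upstream provider degradation", 0.31),
]
ranked = rank(candidates)
```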

Stage 04 · Conclude

A conclusion you can defend

The output is a ranked root cause, a confidence label, an impact assessment, and recommended next actions — every claim source-linked and reproducible.

ROOT CAUSE missing index on contract_lookup conf 96%

IMPACT 3 regions · ~14m · 38k requests assessed

ACTION rollback PR #2287 / hotfix index recommended

Stage 05 · Institutionalize

Every investigation makes the next one faster

The timeline, evidence graph, decisions, and report are stored as reusable, auditable knowledge — so when a similar signal returns, the investigation builds on what's known and reaches a confident answer far faster.

SAVED investigation timeline + evidence graph knowledge base

AUDIT 14 tool calls · 2 approvals logged immutable

REUSE linked to runbook v3 · postmortem indexed
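
The page describes the audit log only as immutable and does not say how that is enforced. One common technique for tamper-evident logs is hash chaining, sketched below with Python's standard `hashlib` and `json` modules. The event fields and the `append_event` and `verify` helpers are hypothetical, and this is not a claim about Diagnara's storage.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, event):
    """Hash the previous entry's hash together with this event."""
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(chain, event):
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "prev": prev_hash,
                  "hash": _digest(prev_hash, event)})
    return chain


def verify(chain):
    """Recompute every hash; return False if any entry was altered."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


chain = []
append_event(chain, {"type": "tool_call", "tool": "metrics"})
append_event(chain, {"type": "approval", "step": "sensitive query"})
intact_before = verify(chain)

chain[0]["event"]["tool"] = "sql"  # tamper with a stored event
intact_after = verify(chain)
```

Editing any stored event changes its recomputed hash, so the check fails from that entry onward.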

Architecture

Governed, tool-driven, multi-agent by design.

Every input is scoped, every action is policy-checked, and every fact comes from a registered tool — so you can see exactly where your data flows before you ever book a demo.


Business Context Layer

Guides every step — ties each signal to the services, SLAs, ownership & impact that matter.

Business rules & domain logic · Critical business flows & journeys

Our differentiator

01 · Signals

Inputs

alerts · tickets · prompts

02 · Governance

Control plane

rbac · policy · audit

03 · Multi-agent

Investigation

runbook · agents

04 · Tool-first

Governed tools

logs · metrics · sql

05 · Sources

External systems

elastic · datadog · github

Evidence store & timeline

Every tool call returns a cited, source-linked fact into one reproducible, auditable record.

all evidence converges here


Governed tools

Connected to where the evidence lives.

Agents access data only through registered tools — each with permissions, rate limits, timeouts, and human-in-the-loop approval.

Logs

Search and correlate application and gateway logs across services and regions.

Elastic · REST

Metrics & APM

Compare latency, error rates, and saturation; trace spans across dependencies.

Datadog · traces

Deployments & PRs

Tie incidents to releases — deploy events, PR diffs, and approval trails.

GitHub · deploy events

Tickets

Pull change tickets and incident records to reconstruct context and timeline.

Jira · ServiceNow

Databases (SQL)

Query production databases and read replicas directly through governed SQL.

Postgres · SQL

APIs

Call internal and partner business services — billing, contracts, and ledger — as governed REST, SOAP, and GraphQL endpoints.

REST · SOAP · GraphQL
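
The registered-tool model described in this section (permissions, rate limits, timeouts) can be sketched as a small gatekeeper object. Everything here is an assumption for illustration: the `RegisteredTool` class, the sliding-window rate limit, the limit of two calls per minute, and which roles may use the `sql` tool. The timeout is carried as metadata only; enforcing it is not shown.

```python
import time
from dataclasses import dataclass, field


@dataclass
class RegisteredTool:
    name: str
    allowed_roles: frozenset
    max_calls_per_minute: int
    timeout_s: float  # metadata only in this sketch
    _calls: list = field(default_factory=list)

    def authorize(self, role, now=None):
        """Return (allowed, reason) for one attempted call."""
        now = time.monotonic() if now is None else now
        if role not in self.allowed_roles:
            return False, "role not permitted"
        # Keep only calls from the last 60 seconds (sliding window).
        self._calls = [t for t in self._calls if now - t < 60.0]
        if len(self._calls) >= self.max_calls_per_minute:
            return False, "rate limit exceeded"
        self._calls.append(now)
        return True, "ok"


sql_tool = RegisteredTool(
    name="sql",
    allowed_roles=frozenset({"Admin", "Manager"}),
    max_calls_per_minute=2,
    timeout_s=5.0,
)

# Explicit timestamps keep the example deterministic.
results = [
    sql_tool.authorize("User", now=0.0),
    sql_tool.authorize("Admin", now=1.0),
    sql_tool.authorize("Admin", now=2.0),
    sql_tool.authorize("Admin", now=3.0),
    sql_tool.authorize("Admin", now=62.5),
]
```

Routing every agent call through a check like `authorize` is what lets permissions and limits be enforced in one place rather than inside each agent.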

The output

An evidence-backed report, not a hunch.

Every investigation ends in a structured report your whole organization can trust — and auditors can review.

Ranked root cause & confidence

A primary hypothesis with a calibrated confidence label — and the alternatives that were ruled out.

Source-linked evidence

Each conclusion cites the exact log, metric, ticket, or deploy — traceable back to the tool call.

Impact & next actions

Scope of impact and a recommended remediation — rollback, hotfix, replay, or escalation.

diagnara · report · resolved

Connection errors spike · checkout-api

INC-4906 · checkout-api · Production

Root cause · 93% confidence

The orders-db RDS instance hit its max_connections ceiling (820/820) after release v3.4 opened a new client per request without pooling, throwing FATAL: too many connections across checkout.

Recommended next action

Raise max_connections and recycle the leaked pool, then roll back v3.4. Add a connection-pool leak assertion to the load-test gate.

Evidence · 4 sources

rds://DatabaseConnections pinned at 820 ceiling · 04:12

metrics://checkout 5xx 0.2% → 31% · spike

github://v3.4 opens a client per request · diff

Tools used · 4

aws CloudWatch · aws RDS console · logs Kibana · github PR #341

Resolved in 3m12s · 9 tool calls · 1 approval · full audit trail

Schema drift after release · billing-service

INC-4877 · billing-service · Production

Root cause · 94% confidence

Liquibase changeset add-invoice-status never ran — the deploy migration job exited on a stale DATABASECHANGELOGLOCK, so v2.9 queried a column that doesn't exist yet: column "invoice_status" does not exist.

Recommended next action

Release the stale changelog lock and re-run the Liquibase migration, then add a post-deploy migration verification gate to the release runbook.

Evidence · 4 sources

logs://column "invoice_status" does not exist · ×2.3k

db://DATABASECHANGELOGLOCK held 41m · lock

deploy://migrate job exit code 1 · step

Tools used · 4

logs Kibana · sql Postgres · ci Liquibase · github changeset #512

Resolved in 1m54s · 11 tool calls · 2 approvals · full audit trail

High latency after deploy · contract-provider

INC-4821 · contract-provider · Production

Root cause · 96% confidence

PR #2287 dropped the index used by the contract_lookup query, causing full-table scans under peak load and a 4.2× p95 latency increase across three regions.

Recommended next action

Roll back PR #2287 or ship a hotfix re-adding the index. Add a migration check to the deploy runbook.

Evidence · 5 sources

metrics://p95 latency 210ms → 880ms post-deploy · 02:14

github://PR #2287 drops idx_contract_lookup · diff

logs://seq scan on contracts (1.2k slow queries) · 1.2k

Tools used · 4

logs Kibana · metrics Grafana · github contract-query · github PR #2287

Resolved in 2m38s · 14 tool calls · 2 approvals · full audit trail

Governance & Trust

Move faster without giving up control.

Governance isn't an add-on — it's the product. RBAC, quotas, audit, and human-in-the-loop approval are enforced by default, so teams accelerate while compliance stays intact.

RBAC & scope

Admin, Manager, and User roles — every action scoped to approved tools and environments.

Quotas & rate limits

Execution and tool usage are bounded to protect system stability and prevent abuse.

Audit trail

Runs, policy changes, and tool access are logged as immutable, reviewable events.

Human-in-the-loop

Low-confidence, sensitive, or costly actions pause for explicit human approval.
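
The human-in-the-loop rule above (pause on low-confidence, sensitive, or costly actions) can be sketched as a small policy function. The `ProposedAction` type, the threshold values, the cost units, and the choice to treat a rollback as sensitive are all assumptions for illustration, not documented Diagnara settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedAction:
    description: str
    confidence: float      # 0.0 to 1.0
    sensitive: bool
    estimated_cost: float  # arbitrary units


def requires_approval(action, min_confidence=0.8, max_cost=100.0):
    """Return the reasons an action must pause for a human, if any."""
    reasons = []
    if action.confidence < min_confidence:
        reasons.append("low confidence")
    if action.sensitive:
        reasons.append("sensitive")
    if action.estimated_cost > max_cost:
        reasons.append("costly")
    return reasons


# Assumption: a production rollback counts as a sensitive action.
rollback = ProposedAction("rollback PR #2287", 0.96,
                          sensitive=True, estimated_cost=10.0)
read_metrics = ProposedAction("read p95 latency", 0.96,
                              sensitive=False, estimated_cost=1.0)

rollback_reasons = requires_approval(rollback)
read_reasons = requires_approval(read_metrics)
```

An empty list means the action may proceed automatically; any reason in the list pauses it for explicit approval, and the reasons themselves can be written to the audit trail.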

See Diagnara run a live investigation.

Book a guided demo tailored to your incident workflow — from signal detection to evidence-backed conclusion, with full traceability and governance.