How to build AI agents your security team will approve

Published on June 16, 2026

A security engineer spends three weeks building an AI agent that triages phishing reports. The demo lands well. Then it hits the security review queue, and the questions start: Which tools can it call? What happens if it misclassifies? Who approves an account lockout at 2 a.m.? Where are the logs? Three more weeks pass, and the agent is still sitting in staging.

This is the pattern most teams run into. The agent works, but the governance story doesn't. Closing that gap requires building governance into the agent architecture from day one instead of retrofitting it after deployment.

Intelligent workflow platforms that unify deterministic automation, agentic AI, and human decision-making under a single governance model let teams close that gap without building the agent twice.

How AI agents work: the components inside a working agent 

An AI agent is one piece of a larger workflow, and governance has to apply across the entire workflow. Security review looks at each piece on its own, so it helps to know what those pieces are. Most production agents have four core parts:

  • Perception layer: the agent's senses. It pulls in data from outside sources, such as alert payloads from a SIEM, telemetry from an EDR, and identity events from an Identity Provider (IdP) like Okta, then cleans it up so the rest of the system can work with it.

  • Reasoning engine: the agent's brain. This is usually a large language model like Claude or GPT-4. It reads the task, figures out a plan, and decides which tools to use. Most production agents follow the ReAct pattern (Reason + Act): think, call a tool, observe the result, reflect, and repeat.

  • Tool use layer: the agent's hands. This is where reasoning connects to real systems. For security teams, that might mean pulling related events from a SIEM or isolating a laptop through an EDR.

  • Memory system: the agent's notebook. It keeps track of what happened earlier in the conversation or investigation, so the agent can pick up context when needed.

Guardrails sit across all four parts as runtime checks on what the agent can read, remember, decide, call, and say. Traditional guardrails ask whether the words an AI produces are safe. Agent guardrails ask whether the actions an AI takes are safe, because isolating a laptop or revoking a login can't be undone the way a chatbot reply can. A typical action guardrail might allow an agent to query an EDR on its own but require human approval before it isolates a host.
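The action guardrail described above can be sketched as a small policy lookup. This is a minimal illustration, not a product API: the action names (`edr.query_host`, `edr.isolate_host`) and the deny-by-default choice are assumptions made for the example.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    DENY = "deny"


# Hypothetical action names; a real deployment would map these to its own tools.
ACTION_POLICY = {
    "edr.query_host": Decision.ALLOW,               # read-only, agent may run alone
    "edr.isolate_host": Decision.REQUIRE_APPROVAL,  # hard to undo, needs a human
}


def check_action(action: str) -> Decision:
    """Return the guardrail decision for an action; unknown actions are denied."""
    return ACTION_POLICY.get(action, Decision.DENY)
```

Denying anything not listed keeps a newly added tool from becoming callable before someone has classified it.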

Because LLMs are non-deterministic, the same task can play out differently from one run to the next, even when the agent reaches the right outcome. That's why each component needs its own controls, and why those controls have to be built into the architecture rather than enabled as a setting after the agent is already running.

Types of AI agents (and which type security teams actually approve) 

AI agents carry different levels of risk, and security teams review them accordingly. Reviewers first ask how autonomous and predictable the agent is. Deterministic agents follow strict rules and produce identical results every time. Fully autonomous agents reason their way to decisions and act without human input. Most production agents fall between these extremes.

Approval then hinges on a few practical questions: how risky is the agent's work, how much can it do alone, who's accountable when something fails, and can the team reconstruct what happened. The three patterns below map to those questions and show why security teams treat each one differently.

1. Deterministic workflow agents 

Deterministic agents follow explicit conditional logic with no autonomous reasoning outside defined parameters. A workflow playbook that auto-closes low-fidelity alerts matching exact signatures, or an automated ticket-routing rule based on asset classification, both qualify.

For example, a SOC team can set up a deterministic agent to triage low-severity vulnerability scan findings. When a Tenable or Qualys report flags a CVE with a CVSS score below a defined threshold on a non-production asset with no internet exposure, the agent opens a tracking ticket in Jira.

It then assigns the ticket to the asset owner, applies a standard remediation SLA, and tags it as "auto-triaged: low-risk finding." The same inputs always produce the same outcome, so analysts can audit a week's worth of triage decisions in minutes.
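As a rough sketch, that rule could look like the function below. The threshold value, the SLA length, and the field names are assumptions for illustration; the ticket is returned as a plain dictionary instead of being sent to Jira.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed values: the text only says "a defined threshold" and
# "a standard remediation SLA".
CVSS_THRESHOLD = 4.0
REMEDIATION_SLA_DAYS = 90


@dataclass(frozen=True)
class Finding:
    cve_id: str
    cvss: float
    environment: str        # e.g. "production", "staging", "dev"
    internet_exposed: bool
    asset_owner: str


def triage(finding: Finding) -> Optional[dict]:
    """Return a ticket payload for low-risk findings, or None to leave the
    finding for an analyst. No model call and no randomness are involved."""
    low_risk = (
        finding.cvss < CVSS_THRESHOLD
        and finding.environment != "production"
        and not finding.internet_exposed
    )
    if not low_risk:
        return None
    return {
        "summary": f"{finding.cve_id} on non-production asset",
        "assignee": finding.asset_owner,
        "sla_days": REMEDIATION_SLA_DAYS,
        "tag": "auto-triaged: low-risk finding",
        "rule": "cvss_below_threshold_nonprod_not_exposed",
    }
```

Recording the rule name on the ticket is what lets an auditor trace each decision back to the condition that produced it.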

Every action traces to a specific rule. Behavior is identical across runs given identical inputs. Rollback is straightforward. Security teams approve these with little friction for well-bounded, reversible tasks.

2. Fully autonomous agents 

Fully autonomous agents perceive their environment, reason over goals, select tools, and execute multi-step actions without human approval at each step. Security teams reject these for production in most enterprise environments.

OWASP-aligned guidance emphasizes that high-risk or privileged actions should be governed by deterministic workflow controls, including human approval, permissions, and audit logging, rather than relying on the LLM alone. Autonomous agents with broad service account permissions and no human checkpoint before consequential actions run counter to this guidance.

3. Human-in-the-loop hybrid agents 

Human-in-the-loop (HITL) hybrid agents autonomously handle investigation, enrichment, and analysis, but require explicit human approval before executing consequential or irreversible actions. Security teams commonly approve this pattern for production, especially when built-in human approval checkpoints are in place.

The human approval decision creates an accountability record with authorizer identity and timestamp, while operation-level logging to a SIEM shortens the detection window. Blast radius stays bounded through least-privilege design.

Matching autonomy to blast radius is the underlying principle: agents can scale faster where consequences are low, while tighter controls hold where the stakes are greater. The build process below applies that principle at every step.

A step-by-step blueprint for designing review-ready AI agents 

Agents that pass security review share a common foundation: they automate well-documented manual processes with constrained scope, documented logic, and reversible actions.

The six steps below turn that foundation into a build sequence, from governance alignment to phased rollout. Each step builds on the last, so reviewers can trace every design decision to a control they already recognize.

Step 1. Start with the governance framework, not the use case 

Security review often fails when governance comes last. Before selecting a use case, align your agent design to the controls your security team already enforces. The NIST AI Risk Management Framework GOVERN function is the organizational governance reference for AI risk management.

It emphasizes documented risk tolerance, AI-related policies, and processes for third-party AI and other external dependencies. That sequence mirrors what organizations are already prioritizing: a 2025 Forrester study commissioned by Tines found that 54% prioritize AI governance, privacy, and regulations.

Concretely, your pre-submission package should include:

  • An agent registry entry: what the agent does, which tools it can call, what data it accesses.

  • A tiered authorization model: which actions are autonomous and which require human approval.

  • A defined blast radius for each action category.
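One way to make the three items above reviewable is to keep them in a single structured record. The sketch below is an assumed shape, not a standard schema; the agent ID, tool names, and blast-radius descriptions are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ActionPolicy:
    name: str
    requires_human_approval: bool
    blast_radius: str  # plain-language description reviewers can challenge


@dataclass(frozen=True)
class AgentRegistryEntry:
    agent_id: str
    purpose: str
    data_accessed: Tuple[str, ...]
    actions: Tuple[ActionPolicy, ...]

    def autonomous_actions(self) -> List[str]:
        return [a.name for a in self.actions if not a.requires_human_approval]

    def gated_actions(self) -> List[str]:
        return [a.name for a in self.actions if a.requires_human_approval]


# Hypothetical entry for a phishing triage agent.
phishing_agent = AgentRegistryEntry(
    agent_id="phishing-triage-v1",
    purpose="Triage user-reported phishing emails",
    data_accessed=("reported email payloads", "SIEM events", "IdP user records"),
    actions=(
        ActionPolicy("siem.search", False, "read-only query"),
        ActionPolicy("email_gateway.block_url", True, "all inbound mail"),
        ActionPolicy("idp.terminate_sessions", True, "one user account"),
    ),
)
```

Calling `phishing_agent.gated_actions()` gives reviewers the list of actions that can never run without a human.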

Starting with high-volume, rule-heavy workflows with clear success criteria is a simpler approval path than starting with ambiguous, strategic tasks.

Step 2. Map your triggers and enrichment chain 

Every agent workflow starts with a trigger. For a phishing response agent, the trigger is a user clicking "Report Phishing," delivered as a webhook payload. After the trigger fires, the agent calls external tools to gather context before classifying.

An enrichment workflow can add identity and endpoint context to a credential-dumping alert before deciding whether to escalate. Restrict which tools each agent persona can access and enforce tool allow lists at execution time, not just at configuration time.
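Enforcing the allow list at execution time means the check runs on every call, not once when the agent is configured. A minimal sketch, with a hypothetical persona and hypothetical tool names:

```python
from typing import Any, Callable, Dict, FrozenSet


class ToolNotAllowed(PermissionError):
    """Raised when a persona tries to call a tool outside its allow list."""


# Hypothetical persona and tool names, for illustration only.
ALLOW_LISTS: Dict[str, FrozenSet[str]] = {
    "phishing_triage": frozenset({"siem.search", "idp.lookup_user"}),
}


def call_tool(persona: str, tool_name: str,
              tools: Dict[str, Callable[..., Any]], **kwargs: Any) -> Any:
    """Check the allow list on every call, then dispatch to the tool."""
    allowed = ALLOW_LISTS.get(persona, frozenset())
    if tool_name not in allowed:
        raise ToolNotAllowed(f"{persona!r} may not call {tool_name!r}")
    return tools[tool_name](**kwargs)
```

Because the dispatcher is the only path to a tool, a tool that exists in the registry but is missing from the persona's list still cannot be reached.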

Step 3. Build tiered decision logic with human gates 

After enrichment, the agent classifies the alert and selects an action tier. Low-severity alerts that match known false-positive patterns and have no risk indicators can be auto-closed with documented reasoning, with no human review needed.

High-severity, high-blast-radius actions, such as disabling a production identity or blocking a network segment, always require human approval before execution. Between those poles, agents post findings to Slack with recommended actions and wait for analyst approval.
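The three tiers can be written as a small deterministic function that sits after the model's classification. This is a sketch under assumptions: the tier names and input fields are invented for the example, and auto-close is limited here to low-severity alerts.

```python
from enum import Enum


class Tier(Enum):
    AUTO_CLOSE = "auto_close"
    ANALYST_APPROVAL = "analyst_approval"      # post to chat, wait for approval
    MANDATORY_APPROVAL = "mandatory_approval"  # high severity and high blast radius


def select_tier(severity: str, blast_radius: str,
                matches_known_fp: bool, risk_indicators: int) -> Tier:
    # Check the high-stakes case first so a false-positive match can never
    # downgrade it to auto-close.
    if severity == "high" and blast_radius == "high":
        return Tier.MANDATORY_APPROVAL
    if severity == "low" and matches_known_fp and risk_indicators == 0:
        return Tier.AUTO_CLOSE
    # Everything else goes to an analyst with a recommended action.
    return Tier.ANALYST_APPROVAL
```

Keeping this logic outside the model means the tier boundaries can be unit-tested in staging like any other code.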

Approval gates for irreversible decisions matter most, and teams should verify the logic in staging before going live. A June 2025 incident (CVE-2025-53773) highlighted the risk of an AI agent altering its own approval settings to bypass human review and then gaining broader execution access. Guardrail controls themselves must be tamper-resistant and not modifiable by the agent at runtime.
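One partial way to detect tampering is to pin a digest of the approval policy at deployment and verify it before every gated action. The sketch below assumes the policy is a JSON-serializable dictionary and that the pinned digest is stored somewhere the agent cannot write to; it detects changes but does not replace file-system or permission controls.

```python
import hashlib
import hmac
import json


def policy_digest(policy: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the policy."""
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_policy(policy: dict, pinned_digest: str) -> None:
    """Raise if the policy no longer matches the digest pinned at deploy time."""
    if not hmac.compare_digest(policy_digest(policy), pinned_digest):
        raise RuntimeError("approval policy changed since deployment; refusing to act")


# Hypothetical policy: the setting name is illustrative only.
deployed_policy = {"auto_approve": False, "gated_actions": ["idp.disable_user"]}
PINNED = policy_digest(deployed_policy)  # stored outside the agent's reach
```

If the agent, or an injected prompt acting through it, flips `auto_approve`, the next call to `verify_policy` fails before any action runs.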

Step 4. Wire response actions to named systems 

Response actions connect the agent's decisions to live infrastructure. For a confirmed phishing incident, this means blocking URLs in the email gateway, querying the SIEM for all recipients who received the same email, terminating sessions and forcing MFA re-enrollment for users who clicked, and creating an incident in the ticketing system with the full evidence chain.

In practice, that sequence runs as a single workflow that wires the agent's classification to deterministic HTTP requests against the email gateway, SIEM, IdP, and ticketing system, with each call logged as a structured audit record. This is the kind of coordination that security orchestration was built for.

For human-in-the-loop patterns, a Slack approval gate works well in production. The agent posts an enriched alert summary to a dedicated channel with approve/reject/escalate buttons. The on-call engineer reviews the reasoning and recommended action, and the workflow executes only after approval, with the engineer's identity and timestamp logged.
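The core of that gate is simple: record who decided what and when, then run the action only on an explicit approval. The sketch below leaves out the Slack integration itself and assumes the button click has already been turned into an `ApprovalDecision`; all names here are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ApprovalDecision:
    approver: str        # identity of the on-call engineer who clicked
    verdict: str         # "approve", "reject", or "escalate"
    decided_at: datetime


def run_gated_action(action_name: str, action: Callable[[], str],
                     decision: ApprovalDecision,
                     audit_log: List[dict]) -> Optional[str]:
    """Log the decision first, then run the action only if it was approved."""
    audit_log.append({
        "action": action_name,
        "approver": decision.approver,
        "verdict": decision.verdict,
        "decided_at": decision.decided_at.isoformat(),
    })
    if decision.verdict != "approve":
        return None
    return action()


# Example: an approval recorded at 02:00 UTC by a hypothetical engineer.
log: List[dict] = []
approved = ApprovalDecision("oncall@example.com", "approve",
                            datetime(2026, 1, 1, 2, 0, tzinfo=timezone.utc))
result = run_gated_action("idp.terminate_sessions", lambda: "sessions terminated",
                          approved, log)
```

Logging before executing means a rejected or escalated request still leaves a record.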

Step 5. Implement audit trails that reconstruct reasoning 

An agent that closes 200 cases overnight fails the audit test if no one can reconstruct why a specific case was dismissed. Every agent action needs a logged record that includes the agent identity, the human authorizer identity (where applicable), the operation performed, and the timestamp.

Audit memory store access separately from action logs, because persistent memory is an active attack vector: indirect prompt injection can poison long-term memory and cause persistent behavioral changes across sessions.
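A minimal sketch of such a record, written to two separate streams, might look like this. The field names follow the list above; the in-memory lists stand in for real log destinations, and the agent ID and operation names are assumptions.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class AuditRecord:
    agent_id: str
    operation: str
    timestamp: str                      # ISO 8601, UTC
    authorizer: Optional[str] = None    # set only for human-approved actions


def record(sink: List[str], agent_id: str, operation: str,
           authorizer: Optional[str] = None) -> AuditRecord:
    """Append one JSON line to the given sink and return the record."""
    entry = AuditRecord(agent_id, operation,
                        datetime.now(timezone.utc).isoformat(), authorizer)
    sink.append(json.dumps(asdict(entry), sort_keys=True))
    return entry


# Two separate sinks, so memory reads and writes can be reviewed on their own.
action_log: List[str] = []
memory_access_log: List[str] = []

record(action_log, "phishing-triage-v1", "case.close")
record(memory_access_log, "phishing-triage-v1", "memory.write")
```

Keeping the streams apart lets a reviewer ask "what changed the agent's memory?" without wading through every tool call.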

Structured schemas with server-side validation support the deterministic, testable records auditors expect. Server-side validation helps constrain and validate non-deterministic AI outputs. It improves the reliability and testability of the surrounding workflow, but by itself does not make AI operations deterministic.
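In practice, server-side validation means the workflow refuses any model output that does not fit the expected shape. Here is a standard-library sketch; the three field names and the set of verdicts are assumptions, and a production system might use a schema library instead.

```python
import json

ALLOWED_VERDICTS = {"benign", "suspicious", "malicious"}  # assumed label set
REQUIRED_KEYS = {"verdict", "confidence", "reasoning"}     # assumed schema


def validate_verdict(raw: str) -> dict:
    """Parse and validate a model response; raise ValueError on any mismatch."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or set(data) != REQUIRED_KEYS:
        raise ValueError("response must be an object with exactly the required keys")
    if data["verdict"] not in ALLOWED_VERDICTS:
        raise ValueError(f"unknown verdict: {data['verdict']!r}")
    conf = data["confidence"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0 <= conf <= 1:
        raise ValueError("confidence must be a number between 0 and 1")
    if not isinstance(data["reasoning"], str) or not data["reasoning"].strip():
        raise ValueError("reasoning must be a non-empty string")
    return data
```

The model can still produce different reasoning from run to run, but everything downstream only ever sees records in one known shape.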

Step 6. Run a phased rollout 

A staged rollout is usually easier to review than a broad launch with no baseline comparison. First, measure current alert volume, triage time, and false-positive rate with no automation deployed.

Then deploy AI-driven triage on only the top three alert categories, running in parallel with human analysts and tracking AI/human agreement rates daily. Expand to additional security tools only after the pilot demonstrates sufficient AI/human agreement on triage verdicts.
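The agreement rate is the share of alerts where the agent and the analyst reached the same verdict. The sketch below also shows one possible expansion check; the 95% threshold and 14-day window are assumptions, since "sufficient" is for each team to define.

```python
from typing import List, Sequence, Tuple


def agreement_rate(pairs: Sequence[Tuple[str, str]]) -> float:
    """Fraction of (ai_verdict, human_verdict) pairs that match."""
    if not pairs:
        raise ValueError("no verdict pairs to compare")
    matches = sum(1 for ai, human in pairs if ai == human)
    return matches / len(pairs)


def ready_to_expand(daily_rates: List[float],
                    threshold: float = 0.95, min_days: int = 14) -> bool:
    """True only if each of the last `min_days` days met the threshold."""
    if len(daily_rates) < min_days:
        return False
    return all(rate >= threshold for rate in daily_rates[-min_days:])


# One day's verdicts: the agent and the analyst disagree on one of four alerts.
today = [("fp", "fp"), ("tp", "fp"), ("tp", "tp"), ("fp", "fp")]
rate = agreement_rate(today)
```

Requiring every day in the window to clear the bar, instead of the average, keeps one very good week from hiding a bad day.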

The approval problem is an architectural problem 

The gap between AI agent deployment and security approval narrows when governance is built into the architecture. Both the NIST AI RMF GOVERN function and practical governance heuristics support careful review of an agent's operating context and controls.

When teams build AI agents within the same intelligent workflow platform that runs their deterministic and human-led workflows, every AI Action produces structured metadata (input tokens, output tokens, model used, duration) captured as traceable audit records. Because Tines runs AI features on its own infrastructure, customer data stays secure and private by design.

Through Tines Agents (AI agents that operate within Tines workflows under the same governance as every other Action), teams configure reasoning and action boundaries based on the inputs and tools they control, with human-in-the-loop gates for any consequential action.

Tines Cases (Tines' built-in ticketing and incident-management surface) helps teams organize investigation information, actions, and progress in real time. Because audit logs, role-based access, and guardrails are built into the architecture, governance controls are in place from day one, with no separate layer to bolt on and no developer dependency for approval gates.

The agents getting approved today are well-documented processes wrapped in deterministic infrastructure, with human judgment at the decision points that matter. You can start building by booking a demo to see how Tines brings AI agents, deterministic workflows, and human decision-making under a single governance model.

Frequently asked questions about building AI agents 

What is the difference between a deterministic workflow and an AI agent? 

A deterministic workflow follows predefined conditional logic, where every action traces back to a specific rule and behavior is identical across runs. An AI agent uses a reasoning engine to interpret tasks, plan multi-step approaches, and dynamically select tools, which requires additional governance controls to achieve the same level of auditability.

Which governance frameworks should I reference when submitting an AI agent for security review? 

The NIST AI Risk Management Framework is a common enterprise baseline for AI risk management, particularly the GOVERN function. Security guidance for LLM applications emphasizes strong controls around high-risk and privileged operations.

What controls reduce prompt injection risk in AI agents? 

OWASP's guidance on prompt injection focuses on mitigating risk with controls such as human approval for high-risk actions and human-in-the-loop safeguards for privileged operations. Prompt injection is an attack surface to manage through input validation, tool scope restrictions, and action-level guardrails rather than something a single control eliminates outright.

What should an AI agent audit trail contain? 

At minimum, agent identity, human authorizer identity (for actions requiring approval), operation performed, policy evaluation outcome, and timestamp, all captured in structured schemas with server-side validation. Memory store access should be audited separately from action logs because persistent memory is a distinct attack surface.
