Navigating AI risks: understanding and mitigating prompt injection

Written by Conor O'DonnellStaff Software Engineer – AI , Tines

Published on December 1, 2025

Prompt injection is inevitable. Safe AI workflows come from good design, not perfect models. 

AI is becoming a routine part of technical operations. Teams use models to support ticket triage, incident routing, knowledge retrieval, code analysis, and customer interactions. As these agents move closer to production workflows, the conversation about security becomes much more important.

One of the most persistent and widely misunderstood issues is prompt injection. It is not a vulnerability that can be fully patched or trained away. It is a predictable consequence of how language models interpret text.

The right response is not to fear the risk, but to design workflows that remain safe even when prompt injection occurs.

This post explains why prompt injection persists, where guardrails fall short, and how teams can architect AI-driven workflows that remain secure and reliable.

What prompt injection looks like 

Prompt injection occurs when user-provided input causes an AI system to override its intended behavior. Any system that mixes untrusted inputs with the ability to take sensitive actions is vulnerable.

Consider a simple example – an attacker sends a crafted email to a support inbox. An AI agent reads the message and has access to internal tools. The attacker embeds instructions within the text that convince the agent to use those tools in unintended ways, such as sending sensitive information to an external server.

The dynamic is always the same. If a user can supply text to the model, that user can influence the model more than zero percent of the time.

Why prompt injection cannot be fully fixed 

Prompt injection is not like a traditional code vulnerability. There is no universal patch because the issue is tied to the fundamental mechanics of large language models.

Several properties make complete prevention impossible:

  • Models can produce arbitrary output
    They generate text based on massively complex decision networks, not strict rules. With the right input, almost any output is possible.

  • Correctness is subjective
    Determining whether a model’s response is “safe” depends on context. Malicious and benign instructions often look similar.

  • Some input will always succeed
    Adding “do not do X” to a prompt does not meaningfully constrain the model. Like any other instruction, it can be overridden by malicious inputs.

For these reasons, teams should treat prompt injection as an unavoidable environmental condition, not a solvable flaw.

Guardrails help, but they are not perfect 

Guardrails and classifiers are commonly deployed to filter harmful prompts or redact sensitive information. They can be helpful, but they also share the model’s limitations.

  • Guardrails are AI systems with failure rates
    They produce false positives and false negatives. There is no path to absolute accuracy.

  • They can be bypassed
    Attackers can change the phrasing of a request, encode the content, or shift formats to evade detection. Even simple transformations can break guardrail logic.

  • Strict guardrails create workflow friction
    Over-filtering can block legitimate tasks, reject entire messages due to mild language, or halt valuable automation.

Guardrails can be useful as additive protections. They cannot serve as a complete defense within an AI workflow.

The practical path: reduce impact instead of chasing zero-risk 

Since prompt injection cannot be eliminated, the goal shifts to impact reduction. A safe system is one where an injected instruction cannot cause serious harm.

A simple framework helps teams evaluate and improve their workflows.

  1. Assess the worst-case impact

    If the agent followed a malicious instruction perfectly, what could happen?

    • Low-impact example

      • An AI agent that classifies incoming IT tickets or drafts routing suggestions based on simple metadata. If prompt injection occurs, the impact is limited to misrouted tickets or unhelpful categorization. It creates noise, not harm.

    • High-impact example

      • An AI agent with access to identity systems or incident tooling, such as the ability to reset passwords, disable security controls, retrieve configuration data, or send outbound messages to unmanaged destinations, without human approval. If prompt injection succeeds, the agent could expose sensitive information or trigger unintended actions across critical systems. The operational and security impact is significant.

    Understanding the worst-case scenario determines how tightly controlled the surrounding workflow must be.

  2. Restrict the tools the model can access

    This is the single most effective way to improve safety.

    • Control which tools are available and lock down action parameters

      • If the agent composes emails, ensure there is a human-in-the-loop step prior to sending. Do not let the model send data without approval if it also has access to sensitive data.

    • Constrain data access

      • Allow retrieval only of the records associated with the predefined data sources. Prevent broad search queries.

    • Add deterministic validation

      • Validate inputs before they reach the model. Check domains, formats, and origins with deterministic rules.

    This shifts trust away from the model and toward the workflow itself, where hard boundaries can be enforced.

  3. Reduce the consequences of unintended actions

    Not every workflow should allow full autonomy. Consider sticking to Meta’s “Rule of Two”:

    • Insert human review for sensitive or high-impact actions.

    • Use AI for suggestions, drafts, or summaries instead of direct execution

    • Route sensitive operations through controlled internal APIs that enforce policy.

    These steps significantly reduce the risk even if a prompt injection attempt succeeds.

The responsible way to operationalize AI is not to demand perfect compliance from the model, but to shape the environment around it.

Designing workflows for safety 

Safe workflows share several traits:

  • Clear separation of trusted instructions and untrusted input.

  • Deterministic boundaries on what tools can do.

  • Minimal privilege for model-driven actions.

  • Human oversight where outcomes matter.

  • Acknowledgement that prompt injection attempts will eventually occur.

With these principles in place, teams can adopt AI confidently without creating hidden channels for data exposure or unintended behavior.

If you are exploring how to put these principles into practice, Tines can help you build workflows that follow this approach by default. The platform allows you to tightly scope tool actions, enforce deterministic checks around LLM behavior, and include human review where needed, all while integrating cleanly with the systems you already use. It will not solve prompt injection on its own, but it gives you a safer foundation to orchestrate AI in a way that limits any negative impact and keeps control in the hands of your team.

AI can be a powerful operational accelerator. Achieving that safely depends on good workflow design, and managing expectations of model behavior.

Get started building secure, intelligent workflows today for free in Tines.

Built by you,
powered by Tines

Already have an account? Log in.