Guard — Agent Runtime Security
Protect your AI agent pipeline from prompt injection attacks delivered through tool inputs and outputs.
Overview
AI agents call external tools — web search, file read, code execution, database queries — whose outputs can contain indirect prompt injection payloads. An attacker embeds malicious instructions in a web page or document, and the agent retrieves and processes them as if they were trusted input.
Guard sits between the agent and its tools. It wraps tool functions with the same deterministic detection engine as @safepaste/core, scanning both inputs and outputs at runtime. When an attack is detected, Guard can log it, warn about it, block it, or hand the decision to your callback — depending on the mode you choose.
Guard never calls the SafePaste API. It runs @safepaste/core locally, in-process, with zero network overhead.
Installation
```sh
npm install @safepaste/guard @safepaste/core
```
@safepaste/core is a peer dependency — you provide it. This means Guard always uses whatever version of the detection engine you have installed, and you control when to upgrade.
Quick Start
Three steps to protect your agent tools from prompt injection.
Create a guard instance
Choose a mode that determines what happens when an attack is detected. Use block to throw on detection, or warn to log and continue.
Wrap your tool functions
Call guard.wrapTool(name, fn) to create a guarded version of any tool function. The wrapped function scans inputs before execution and outputs after.
Handle blocked attacks
In block mode, a GuardError is thrown when an attack is detected. Catch it and respond appropriately.
```js
var { createGuard } = require('@safepaste/guard');

// 1. Create a guard instance
var guard = createGuard({ mode: 'block' });

// 2. Wrap your tool function
var safeSearch = guard.wrapTool('web_search', searchFn);

// 3. Handle blocked attacks
try {
  var result = await safeSearch('latest news');
} catch (e) {
  if (e.name === 'GuardError') {
    console.log('Blocked:', e.guardResult.scan.risk);
  }
}
```
Modes
The mode controls what happens when Guard detects a prompt injection. Choose the right mode for your use case.
| Mode | On detection | Use case |
|---|---|---|
| `log` | Returns the `GuardResult` silently. No side effects. | Monitoring and analytics. Collect data without affecting behavior. |
| `warn` | Calls `console.warn()` with detection details. Tool still executes. | Development and staging. See detections in your logs without blocking. |
| `block` | Throws a `GuardError`. Tool does not execute. | Production enforcement. Prevent attacks from reaching your tools. |
| `callback` | Calls your function with the `GuardResult`. Return `false` to block. | Custom logic. Route to a review queue, apply business rules, etc. |
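A common pattern is to pick the mode per environment and pass it to `createGuard`. A minimal sketch, assuming nothing beyond the mode names in the table above (the helper name and environment convention are this example's own, not part of the Guard API):

```js
// Illustrative helper: map a deployment environment to a Guard mode.
// 'block' enforces in production, 'warn' surfaces detections in staging
// logs, and 'log' collects data silently everywhere else.
function chooseMode(env) {
  if (env === 'production') return 'block';
  if (env === 'staging') return 'warn';
  return 'log';
}

console.log(chooseMode('production')); // prints: block
```

The returned string can then be passed as `createGuard({ mode: chooseMode(process.env.NODE_ENV) })`.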
Per-direction mode configuration
You can set different modes for inputs and outputs. For example, warn on inputs but block on outputs — useful when tool outputs (like web page content) are the higher-risk vector.
```js
var guard = createGuard({
  mode: { input: 'warn', output: 'block' }
});
```
Callback mode
Pass a function as the mode to make your own allow/block decisions. Return false to block; any other return value allows the tool to proceed.
```js
var guard = createGuard({
  mode: function (guardResult) {
    if (guardResult.scan.score > 80) {
      return false; // block high-confidence attacks
    }
    sendToReviewQueue(guardResult); // log medium-confidence for review
  }
});
```
API Reference
createGuard(options)
Creates a guard instance with the given configuration. Returns an object with scanInput, scanOutput, wrapTool, and wrapTools methods.
| Option | Type | Default | Description |
|---|---|---|---|
| `mode` | `string \| Function \| Object` | `'warn'` | Detection mode: `'log'`, `'warn'`, `'block'`, a callback function, or `{ input, output }` for per-direction config. |
| `strict` | `boolean` | `false` | Use strict detection threshold (25 instead of 35). Catches more borderline cases. |
| `on.detection` | `Function` | `null` | Called on every detection (flagged result), regardless of mode. Receives the `GuardResult`. |
| `on.blocked` | `Function` | `null` | Called when a detection results in a block (before the `GuardError` is thrown). Receives the `GuardResult`. |
| `on.error` | `Function` | `null` | Called when the scanning process itself fails (not a block). Receives the error and context `{ tool, point }`. |
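Putting the options together, a full configuration might look like the sketch below. Only the option names and shapes come from the table above; the handler bodies are placeholders:

```js
// Hypothetical full options object for createGuard, combining every
// option from the table above. Handler bodies are placeholders.
var options = {
  mode: { input: 'warn', output: 'block' }, // per-direction config
  strict: true,                             // lower threshold: 25 instead of 35
  on: {
    detection: function (result) { /* every flagged result, any mode */ },
    blocked: function (result) { /* a detection that resulted in a block */ },
    error: function (err, ctx) { /* scan failure; ctx is { tool, point } */ }
  }
};

console.log(options.mode.output); // prints: block
```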
guard.wrapTool(name, fn)
Wraps a standalone tool function. The returned function scans the input arguments before calling the original function, then scans the output after it resolves. Works with both sync and async functions.
```js
var safeSearch = guard.wrapTool('web_search', searchFn);
var safeRead = guard.wrapTool('file_read', readFileFn);

// Wrapped functions have the same signature as the originals
var results = await safeSearch('query');
var content = await safeRead('/path/to/file.txt');
```
The wrapped function calls `fn` with `this = null`. To wrap a method, bind it first: `guard.wrapTool('name', obj.method.bind(obj))`.
guard.wrapTools(toolMap)
Wraps all functions in a plain { name: fn } object. Returns a new object with the same keys, where each function is wrapped. Non-function values are copied by reference.
```js
var tools = {
  web_search: searchFn,
  file_read: readFileFn,
  code_exec: execFn
};

var safeTools = guard.wrapTools(tools);
// safeTools.web_search, safeTools.file_read, safeTools.code_exec are all guarded
```
guard.scanInput(text, ctx) / guard.scanOutput(text, ctx)
Manually scan text without wrapping a tool function. Useful when you want to scan at a specific point in your pipeline rather than wrapping entire functions.
```js
// Scan tool input manually
var inputResult = guard.scanInput(userQuery, { tool: 'web_search' });

// Scan tool output manually
var outputResult = guard.scanOutput(webPageContent, { tool: 'web_search' });

if (outputResult.flagged) {
  console.log('Attack detected in output:', outputResult.scan.risk);
}
```
scanToolInput(text, opts) / scanToolOutput(text, opts)
Standalone functions that scan text without creating a guard instance. These always use log mode — they never throw or warn, just return the GuardResult. Useful for one-off checks or when you want to handle the result entirely yourself.
```js
var { scanToolInput, scanToolOutput } = require('@safepaste/guard');

var result = scanToolInput(text, { tool: 'web_search', strict: true });
if (result.flagged) {
  // handle it your way
}
```
GuardResult Shape
Every scan returns a GuardResult object with the following structure.
| Field | Type | Description |
|---|---|---|
| `flagged` | `boolean` | Whether a prompt injection was detected (score exceeded threshold). |
| `action` | `'pass' \| 'log' \| 'warn' \| 'block' \| 'callback'` | The action taken based on the mode. `'pass'` means no detection. |
| `scan.flagged` | `boolean` | Same as top-level `flagged`. |
| `scan.risk` | `'low' \| 'medium' \| 'high'` | Risk level based on score. |
| `scan.score` | `number` | Threat score from 0 to 100. |
| `scan.threshold` | `number` | Score threshold used (35 normal, 25 strict). |
| `scan.matches` | `Array` | Matched patterns with `category`, `pattern`, and `weight`. |
| `scan.meta` | `Object` | Additional scan metadata. |
| `guard.point` | `'input' \| 'output'` | Whether this scan was on the tool input or output. |
| `guard.tool` | `string \| null` | Name of the tool being guarded. |
| `guard.mode` | `string` | The resolved mode for this direction. |
| `guard.timestamp` | `number` | Unix timestamp (ms) when the scan was performed. |
| `guard.durationMs` | `number` | How long the scan took in milliseconds. |
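As a concrete reference, a flagged output scan might produce a result shaped like the following. The field names follow the table above; the specific values are illustrative:

```js
// Illustrative GuardResult for a blocked output scan. Field names match
// the documented shape; the concrete values are made up for this example.
var guardResult = {
  flagged: true,
  action: 'block',
  scan: {
    flagged: true,
    risk: 'high',
    score: 82,
    threshold: 35,
    matches: [{ category: 'instruction_override', pattern: '...', weight: 35 }],
    meta: {}
  },
  guard: {
    point: 'output',
    tool: 'web_search',
    mode: 'block',
    timestamp: Date.now(),
    durationMs: 1
  }
};

console.log(guardResult.scan.risk, guardResult.guard.point); // prints: high output
```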
GuardError
When a detection occurs in block mode (or when a callback returns false), Guard throws a GuardError. This is a standard Error with additional properties.
```js
try {
  var result = await safeTool(input);
} catch (e) {
  if (e.name === 'GuardError') {
    // e.message     — Human-readable description of what was blocked
    // e.name        — Always 'GuardError'
    // e.guardResult — Full GuardResult object (see table above)
    console.log(e.message);
    // "Prompt injection blocked in output (web_search) — risk: high, score: 82"
    console.log(e.guardResult.scan.matches);
    // [{ category: 'instruction_override', pattern: '...', weight: 35 }]
    console.log(e.guardResult.guard.tool);
    // 'web_search'
  }
}
```
| Property | Type | Description |
|---|---|---|
| `name` | `string` | Always `'GuardError'`. Use this for reliable catch filtering. |
| `message` | `string` | Describes the block: direction, tool name, risk level, and score. |
| `guardResult` | `Object` | The full `GuardResult` that triggered the block. |
Framework Examples
Guard works with any agent framework. Here are examples for the most common ones.
OpenAI SDK
Wrap tool functions before passing them to the OpenAI function-calling flow.
```js
var { createGuard } = require('@safepaste/guard');
var OpenAI = require('openai');

var client = new OpenAI();
var guard = createGuard({ mode: 'block' });

// Your tool implementations
var tools = {
  web_search: async function (query) { /* ... */ },
  read_file: async function (path) { /* ... */ }
};

// Wrap all tools
var safeTools = guard.wrapTools(tools);

// In your function-calling loop:
for (var call of toolCalls) {
  try {
    var result = await safeTools[call.function.name](
      JSON.parse(call.function.arguments)
    );
  } catch (e) {
    if (e.name === 'GuardError') {
      result = { error: 'Tool blocked: prompt injection detected' };
    }
  }
}
```
Vercel AI SDK
Guard the tool execute functions in your Vercel AI SDK tool definitions.
```js
var { createGuard } = require('@safepaste/guard');
var { tool } = require('ai');
var { z } = require('zod');

var guard = createGuard({ mode: 'block' });

var searchTool = tool({
  description: 'Search the web',
  parameters: z.object({ query: z.string() }),
  execute: guard.wrapTool('web_search', async function ({ query }) {
    // Your search implementation
    return await fetchSearchResults(query);
  })
});
```
LangChain JS
Wrap the function inside your LangChain DynamicTool or DynamicStructuredTool definitions.
```js
var { createGuard } = require('@safepaste/guard');
var { DynamicTool } = require('@langchain/core/tools');

var guard = createGuard({ mode: 'block' });

var searchTool = new DynamicTool({
  name: 'web_search',
  description: 'Search the web for information',
  func: guard.wrapTool('web_search', async function (query) {
    // Your search implementation
    return await fetchSearchResults(query);
  })
});
```
Custom Agent Loop
For custom agent implementations, use manual scanning at the points you control.
```js
var { createGuard } = require('@safepaste/guard');

var guard = createGuard({
  mode: { input: 'warn', output: 'block' },
  on: {
    detection: function (r) { logToMonitoring(r); },
    blocked: function (r) { alertOps(r); }
  }
});

async function agentLoop(messages) {
  while (true) {
    var response = await llm.chat(messages);
    if (!response.toolCall) break;

    // Scan the input the agent is sending to the tool
    guard.scanInput(response.toolCall.args, { tool: response.toolCall.name });

    // Execute the tool
    var toolResult = await executeTool(response.toolCall);

    // Scan the output coming back from the tool
    try {
      guard.scanOutput(toolResult, { tool: response.toolCall.name });
    } catch (e) {
      if (e.name === 'GuardError') {
        toolResult = '[blocked: injection detected in tool output]';
      }
    }

    messages.push({ role: 'tool', content: toolResult });
  }
}
```
Fail-Open Design
If the scanning process itself throws, Guard fails open: the tool still executes, and the on.error callback is called so you can log the failure. Guard degrades to no scanning rather than blocking everything. The only exception is GuardError — an intentional block from detecting an attack — which is always re-thrown.
This means you can add Guard to a production pipeline without risk of it becoming a single point of failure. If something goes wrong with the scan, your agent keeps running and you get notified through the on.error callback.
```js
var guard = createGuard({
  mode: 'block',
  on: {
    error: function (err, ctx) {
      // Scanning failed — tool still executes
      console.error('Guard scan error on', ctx.tool, ctx.point, err);
      metrics.increment('guard.scan_error');
    }
  }
});
```
Need Help?
Questions about integrating Guard into your agent pipeline? We're here to help.
Contact Support