Guard — Agent Runtime Security

Protect your AI agent pipeline from prompt injection attacks delivered through tool inputs and outputs.

Overview

AI agents call external tools — web search, file read, code execution, database queries — whose outputs can contain indirect prompt injection payloads. An attacker embeds malicious instructions in a web page or document, and the agent retrieves and processes them as if they were trusted input.

Guard sits between the agent and its tools. It wraps tool functions with the same deterministic detection engine as @safepaste/core, scanning both inputs and outputs at runtime. When an attack is detected, Guard can log it, warn about it, block it, or hand the decision to your callback — depending on the mode you choose.

Guard never calls the SafePaste API. It runs @safepaste/core locally, in-process, with zero network overhead.

Installation

Shell
npm install @safepaste/guard @safepaste/core
Peer dependency: @safepaste/core is a peer dependency, so you provide it yourself. Guard always uses whatever version of the detection engine you have installed, and you control when to upgrade.

Quick Start

Three steps to protect your agent tools from prompt injection.

1. Create a guard instance

Choose a mode that determines what happens when an attack is detected. Use block to throw on detection, or warn to log and continue.

2. Wrap your tool functions

Call guard.wrapTool(name, fn) to create a guarded version of any tool function. The wrapped function scans inputs before execution and outputs after.

3. Handle blocked attacks

In block mode, a GuardError is thrown when an attack is detected. Catch it and respond appropriately.

JavaScript
var { createGuard } = require('@safepaste/guard');

// 1. Create a guard instance
var guard = createGuard({ mode: 'block' });

// 2. Wrap your tool function
var safeSearch = guard.wrapTool('web_search', searchFn);

// 3. Handle blocked attacks
try {
  var result = await safeSearch('latest news');
} catch (e) {
  if (e.name === 'GuardError') {
    console.log('Blocked:', e.guardResult.scan.risk);
  } else {
    throw e; // not a guard block, rethrow
  }
}

Modes

The mode controls what happens when Guard detects a prompt injection. Choose the right mode for your use case.

| Mode | On detection | Use case |
| --- | --- | --- |
| log | Returns the GuardResult silently. No side effects. | Monitoring and analytics. Collect data without affecting behavior. |
| warn | Calls console.warn() with detection details. Tool still executes. | Development and staging. See detections in your logs without blocking. |
| block | Throws a GuardError. Tool does not execute. | Production enforcement. Prevent attacks from reaching your tools. |
| callback | Calls your function with the GuardResult. Return false to block. | Custom logic. Route to a review queue, apply business rules, etc. |
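The table above can be summarized as a single dispatch decision. The sketch below is an illustration only, not the library's actual implementation; applyMode is a hypothetical name.

```javascript
// Simplified sketch of how the four modes dispatch (illustration only,
// not the library's actual implementation).
function applyMode(mode, guardResult) {
  if (!guardResult.flagged) return { action: 'pass' };
  if (typeof mode === 'function') {
    // callback mode: returning false blocks, anything else allows
    if (mode(guardResult) === false) {
      var cbErr = new Error('Prompt injection blocked');
      cbErr.name = 'GuardError';
      cbErr.guardResult = guardResult;
      throw cbErr;
    }
    return { action: 'callback' };
  }
  if (mode === 'warn') {
    console.warn('Guard detection:', guardResult.scan.risk); // tool still runs
    return { action: 'warn' };
  }
  if (mode === 'block') {
    var err = new Error('Prompt injection blocked');
    err.name = 'GuardError';
    err.guardResult = guardResult;
    throw err; // tool does not execute
  }
  return { action: 'log' }; // log mode: no side effects
}
```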

Per-direction mode configuration

You can set different modes for inputs and outputs. For example, warn on inputs but block on outputs — useful when tool outputs (like web page content) are the higher-risk vector.

JavaScript
var guard = createGuard({
  mode: { input: 'warn', output: 'block' }
});

Callback mode

Pass a function as the mode to make your own allow/block decisions. Return false to block; any other return value allows the tool to proceed.

JavaScript
var guard = createGuard({
  mode: function (guardResult) {
    if (guardResult.scan.score > 80) {
      return false; // block high-confidence attacks
    }
    sendToReviewQueue(guardResult); // log medium-confidence for review
  }
});

API Reference

createGuard(options)

Creates a guard instance with the given configuration. Returns an object with scanInput, scanOutput, wrapTool, and wrapTools methods.

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| mode | string \| Function \| Object | 'warn' | Detection mode: 'log', 'warn', 'block', a callback function, or { input, output } for per-direction config. |
| strict | boolean | false | Use the strict detection threshold (25 instead of 35). Catches more borderline cases. |
| on.detection | Function | null | Called on every detection (flagged result), regardless of mode. Receives the GuardResult. |
| on.blocked | Function | null | Called when a detection results in a block (before the GuardError is thrown). Receives the GuardResult. |
| on.error | Function | null | Called when the scanning process itself fails (not a block). Receives the error and context { tool, point }. |
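The effect of strict is just a lower decision boundary. The thresholds below come from the table above; whether the engine compares with > or >= is an assumption, and isFlagged is a hypothetical name for illustration.

```javascript
// How the strict option shifts the decision boundary (thresholds from the
// table above; the exact comparison operator is an assumption).
var NORMAL_THRESHOLD = 35;
var STRICT_THRESHOLD = 25;

function isFlagged(score, strict) {
  var threshold = strict ? STRICT_THRESHOLD : NORMAL_THRESHOLD;
  return score > threshold;
}

// A borderline score of 30 is not flagged by default, but is under strict:
isFlagged(30, false); // false
isFlagged(30, true);  // true
```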

guard.wrapTool(name, fn)

Wraps a standalone tool function. The returned function scans the input arguments before calling the original function, then scans the output after it resolves. Works with both sync and async functions.

JavaScript
var safeSearch = guard.wrapTool('web_search', searchFn);
var safeRead = guard.wrapTool('file_read', readFileFn);

// Wrapped functions have the same signature as the originals
var results = await safeSearch('query');
var content = await safeRead('/path/to/file.txt');

The wrapped function calls fn with this=null. To wrap a method, bind it first: guard.wrapTool('name', obj.method.bind(obj)).
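Conceptually, the wrapper scans before and after the call. The sketch below is a simplification (the real implementation also applies modes, callbacks, and fail-open handling); wrapToolSketch and the injected scan function are stand-ins, not library API.

```javascript
// Conceptual sketch of what wrapTool does (simplified; scan is a
// stand-in for the detection engine, not the library's actual code).
function wrapToolSketch(name, fn, scan) {
  return async function () {
    var args = Array.prototype.slice.call(arguments);
    scan(JSON.stringify(args), { tool: name, point: 'input' }); // inputs first
    var output = await fn.apply(null, args); // this = null, as noted above
    scan(String(output), { tool: name, point: 'output' }); // then the output
    return output;
  };
}
```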

guard.wrapTools(toolMap)

Wraps all functions in a plain { name: fn } object. Returns a new object with the same keys, where each function is wrapped. Non-function values are copied by reference.

JavaScript
var tools = {
  web_search: searchFn,
  file_read: readFileFn,
  code_exec: execFn
};

var safeTools = guard.wrapTools(tools);
// safeTools.web_search, safeTools.file_read, safeTools.code_exec are all guarded
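wrapTools is roughly equivalent to wrapping each entry yourself. In this sketch, wrapOne stands in for guard.wrapTool; the key-by-key mapping and the by-reference copy of non-function values follow the description above.

```javascript
// Sketch of the wrapTools mapping (wrapOne stands in for guard.wrapTool)
function wrapToolsSketch(wrapOne, toolMap) {
  var out = {};
  for (var key in toolMap) {
    var value = toolMap[key];
    // functions are wrapped; non-function values are copied by reference
    out[key] = typeof value === 'function' ? wrapOne(key, value) : value;
  }
  return out;
}
```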

guard.scanInput(text, ctx) / guard.scanOutput(text, ctx)

Manually scan text without wrapping a tool function. Useful when you want to scan at a specific point in your pipeline rather than wrapping entire functions.

JavaScript
// Scan tool input manually
var inputResult = guard.scanInput(userQuery, { tool: 'web_search' });

// Scan tool output manually
var outputResult = guard.scanOutput(webPageContent, { tool: 'web_search' });

if (outputResult.flagged) {
  console.log('Attack detected in output:', outputResult.scan.risk);
}

scanToolInput(text, opts) / scanToolOutput(text, opts)

Standalone functions that scan text without creating a guard instance. These always use log mode — they never throw or warn, just return the GuardResult. Useful for one-off checks or when you want to handle the result entirely yourself.

JavaScript
var { scanToolInput, scanToolOutput } = require('@safepaste/guard');

var result = scanToolInput(text, { tool: 'web_search', strict: true });
if (result.flagged) {
  // handle it your way
}

GuardResult Shape

Every scan returns a GuardResult object with the following structure.

| Field | Type | Description |
| --- | --- | --- |
| flagged | boolean | Whether a prompt injection was detected (score exceeded threshold). |
| action | 'pass' \| 'log' \| 'warn' \| 'block' \| 'callback' | The action taken based on the mode. 'pass' means no detection. |
| scan.flagged | boolean | Same as top-level flagged. |
| scan.risk | 'low' \| 'medium' \| 'high' | Risk level based on the score. |
| scan.score | number | Threat score from 0 to 100. |
| scan.threshold | number | Score threshold used (35 normal, 25 strict). |
| scan.matches | Array | Matched patterns with category, pattern, and weight. |
| scan.meta | Object | Additional scan metadata. |
| guard.point | 'input' \| 'output' | Whether this scan was on the tool input or output. |
| guard.tool | string \| null | Name of the tool being guarded. |
| guard.mode | string | The resolved mode for this direction. |
| guard.timestamp | number | Unix timestamp (ms) when the scan was performed. |
| guard.durationMs | number | How long the scan took, in milliseconds. |
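Putting the fields together, a flagged result from a blocked output scan might look like this (all values are illustrative, not real engine output):

```javascript
// Illustrative GuardResult (values are made up; shapes follow the table above)
var exampleResult = {
  flagged: true,
  action: 'block',
  scan: {
    flagged: true,
    risk: 'high',
    score: 82,
    threshold: 35,
    matches: [{ category: 'instruction_override', pattern: '...', weight: 35 }],
    meta: {}
  },
  guard: {
    point: 'output',
    tool: 'web_search',
    mode: 'block',
    timestamp: 1700000000000,
    durationMs: 2
  }
};
```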

GuardError

When a detection occurs in block mode (or when a callback returns false), Guard throws a GuardError. This is a standard Error with additional properties.

JavaScript
try {
  var result = await safeTool(input);
} catch (e) {
  if (e.name === 'GuardError') {
    // e.message    — Human-readable description of what was blocked
    // e.name       — Always 'GuardError'
    // e.guardResult — Full GuardResult object (see table above)

    console.log(e.message);
    // "Prompt injection blocked in output (web_search) — risk: high, score: 82"

    console.log(e.guardResult.scan.matches);
    // [{ category: 'instruction_override', pattern: '...', weight: 35 }]

    console.log(e.guardResult.guard.tool);
    // 'web_search'
  }
}

| Property | Type | Description |
| --- | --- | --- |
| name | string | Always 'GuardError'. Use this for reliable catch filtering. |
| message | string | Describes the block: direction, tool name, risk level, and score. |
| guardResult | Object | The full GuardResult that triggered the block. |

Framework Examples

Guard works with any agent framework. Here are examples for the most common ones.

OpenAI SDK

Wrap tool functions before passing them to the OpenAI function-calling flow.

JavaScript
var { createGuard } = require('@safepaste/guard');
var OpenAI = require('openai');

var client = new OpenAI();
var guard = createGuard({ mode: 'block' });

// Your tool implementations
var tools = {
  web_search: async function (query) { /* ... */ },
  read_file: async function (path) { /* ... */ }
};

// Wrap all tools
var safeTools = guard.wrapTools(tools);

// In your function-calling loop:
for (var call of toolCalls) {
  try {
    var result = await safeTools[call.function.name](
      JSON.parse(call.function.arguments)
    );
  } catch (e) {
    if (e.name === 'GuardError') {
      result = { error: 'Tool blocked: prompt injection detected' };
    } else {
      throw e; // not a guard block, rethrow
    }
  }
}

Vercel AI SDK

Guard the tool execute functions in your Vercel AI SDK tool definitions.

JavaScript
var { createGuard } = require('@safepaste/guard');
var { tool } = require('ai');
var { z } = require('zod');

var guard = createGuard({ mode: 'block' });

var searchTool = tool({
  description: 'Search the web',
  parameters: z.object({ query: z.string() }),
  execute: guard.wrapTool('web_search', async function ({ query }) {
    // Your search implementation
    return await fetchSearchResults(query);
  })
});

LangChain JS

Wrap the function inside your LangChain DynamicTool or DynamicStructuredTool definitions.

JavaScript
var { createGuard } = require('@safepaste/guard');
var { DynamicTool } = require('@langchain/core/tools');

var guard = createGuard({ mode: 'block' });

var searchTool = new DynamicTool({
  name: 'web_search',
  description: 'Search the web for information',
  func: guard.wrapTool('web_search', async function (query) {
    // Your search implementation
    return await fetchSearchResults(query);
  })
});

Custom Agent Loop

For custom agent implementations, use manual scanning at the points you control.

JavaScript
var { createGuard } = require('@safepaste/guard');

var guard = createGuard({
  mode: { input: 'warn', output: 'block' },
  on: {
    detection: function (r) { logToMonitoring(r); },
    blocked: function (r) { alertOps(r); }
  }
});

async function agentLoop(messages) {
  while (true) {
    var response = await llm.chat(messages);
    if (!response.toolCall) break;

    // Scan the input the agent is sending to the tool
    guard.scanInput(JSON.stringify(response.toolCall.args), {
      tool: response.toolCall.name
    });

    // Execute the tool
    var toolResult = await executeTool(response.toolCall);

    // Scan the output coming back from the tool
    try {
      guard.scanOutput(toolResult, {
        tool: response.toolCall.name
      });
    } catch (e) {
      if (e.name === 'GuardError') {
        toolResult = '[blocked: injection detected in tool output]';
      }
    }

    messages.push({ role: 'tool', content: toolResult });
  }
}

Fail-Open Design

Guard never breaks your agent pipeline. If the scanning process itself fails (a bug in scanning, unexpected input types, etc.), the tool still executes and the on.error callback is called so you can log the failure. Guard degrades to no scanning rather than blocking everything. The only exception is GuardError — an intentional block from detecting an attack — which is always re-thrown.

This means you can add Guard to a production pipeline without risk of it becoming a single point of failure. If something goes wrong with the scan, your agent keeps running and you get notified through the on.error callback.

JavaScript
var guard = createGuard({
  mode: 'block',
  on: {
    error: function (err, ctx) {
      // Scanning failed — tool still executes
      console.error('Guard scan error on', ctx.tool, ctx.point, err);
      metrics.increment('guard.scan_error');
    }
  }
});
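The rule described above can be sketched as a small wrapper: scan failures are reported but never stop the tool, while an intentional GuardError always propagates. This is a conceptual illustration under those assumptions, not the library's actual code; failOpenCall is a hypothetical name.

```javascript
// Conceptual sketch of the fail-open rule (not the library's actual code)
async function failOpenCall(fn, scan, onError) {
  try {
    scan(); // the scan itself may fail unexpectedly
  } catch (err) {
    if (err.name === 'GuardError') throw err; // intentional block: re-throw
    onError(err); // scan failure: report it, but keep going
  }
  return fn(); // the tool still executes
}
```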

Need Help?

Questions about integrating Guard into your agent pipeline? We're here to help.

Contact Support