Test CLI — Adversarial Simulation

Simulate prompt injection attacks against your system prompts. Verify your AI applications are protected in CI/CD.

On this page

Overview

Simulate prompt injection attacks against your system prompts. @safepaste/test generates adversarial variants by injecting known attack payloads into a target prompt, scans each with @safepaste/core, and reports which attacks were detected.

Primary use case: CI/CD gating — verify that your AI system prompts are protected against known prompt injection categories before deploying to production.

No API key needed. The Test CLI runs entirely locally using @safepaste/core's deterministic detection engine. No network requests, no data sent anywhere.

Installation

Install both the test CLI and the core detection engine as dependencies.

Bash
npm install @safepaste/test @safepaste/core

Or run directly without installing.

Bash
npx @safepaste/test "You are a helpful assistant"
Peer dependency: @safepaste/core (>= 0.3.0) must be installed alongside @safepaste/test. The test CLI treats it as a black box — text in, detection result out.

Quick Start

Three ways to provide your system prompt to the test CLI.

1

Positional argument

Pass the prompt directly as a quoted string.

Bash
safepaste-test "You are a helpful coding assistant"
2

File input

Read the prompt from a file — useful for long system prompts.

Bash
safepaste-test --file system-prompt.txt
3

Stdin

Pipe the prompt from another command or script.

Bash
echo "You are a helpful assistant" | safepaste-test

CLI Reference

All available flags and their defaults.

Flag Description Default
--format <report|json|jsonl> Output format report
--strict Strict mode (detection threshold 25 instead of 35) off
--categories <cat1,cat2,...> Test specific categories only all 13
--pass-threshold <N> Minimum detection rate, 0 to 1 0.8
--file <path> Read prompt from file -
--help Show help text -
--version Show version number -

Output Formats

Choose from three output formats depending on your use case.

Report (default)

Human-readable summary with an ASCII bar chart showing detection rates per category. This is the default when no --format flag is specified.

Output
SafePaste Attack Simulation
============================================================
Target: "You are a helpful coding assistant"
Categories: 13 | Variants: 78 | Threshold: 35

  instruction_override          6/6  100%  ████████████
  role_hijacking                6/6  100%  ████████████
  system_prompt                 6/6  100%  ████████████
  exfiltration                  6/6  100%  ████████████
  secrecy                       6/6  100%  ████████████
  jailbreak                     6/6  100%  ████████████
  obfuscation                   4/6   67%  ████████
  instruction_chaining          6/6  100%  ████████████
  meta                          5/6   83%  ██████████
  tool_call_injection           6/6  100%  ████████████
  system_message_spoofing       6/6  100%  ████████████
  roleplay_jailbreak            0/6    0%
  multi_turn_injection          5/6   83%  ██████████

Result: PASS  63/78 detected (80.8% >= 80% threshold)

JSON

Single JSON object with full results. Use --format json for programmatic consumption.

Bash
safepaste-test --format json "You are a helpful assistant"
JSON
{
  "pass": true,
  "prompt": "You are a helpful assistant",
  "config": { "threshold": 35, "categories": 13, "passThreshold": 0.8 },
  "summary": {
    "total": 78,
    "detected": 63,
    "missed": 15,
    "rate": 0.808,
    "pass": true
  },
  "categories": { /* per-category stats */ },
  "variants": [ /* full variant details */ ]
}

JSONL

One JSON object per line, one per variant. Use --format jsonl for streaming or log ingestion.

Bash
safepaste-test --format jsonl "You are a helpful assistant"
JSONL
{"category":"instruction_override","strategy":"prepend","flagged":true,"score":72}
{"category":"instruction_override","strategy":"append","flagged":true,"score":68}
{"category":"instruction_override","strategy":"wrap","flagged":true,"score":65}
...

Exit Codes

The CLI uses exit codes for CI/CD integration. A non-zero exit code fails your pipeline.

Code Meaning
0 Detection rate >= pass threshold (default 80%)
1 Detection rate < pass threshold — your prompt may be vulnerable
2 Usage error (missing prompt, invalid flag, file not found)

CI/CD Integration

Add prompt injection testing to your deployment pipeline. The CLI exits with code 1 when detection falls below the pass threshold, which fails the CI step.

GitHub Actions

YAML
name: Prompt Injection Test
on: [push, pull_request]
jobs:
  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 18
      - run: npm install @safepaste/test @safepaste/core
      - name: Test prompt injection detection
        run: npx @safepaste/test --file prompts/system.txt --pass-threshold 0.8

Generic CI

Bash
# Basic — fail the build if detection rate is below 80%
safepaste-test "Your system prompt here" || exit 1

# Strict mode with higher threshold
safepaste-test --strict --pass-threshold 0.9 --file system-prompt.txt

# Save a JSON report as a build artifact
safepaste-test --format json --file system-prompt.txt > security-report.json
Tip: Start with --pass-threshold 0.7 and increase it as you improve your system prompt's resilience. A threshold of 0.9 with --strict is a strong security gate.

Attack Categories

The test CLI generates payloads across all 13 prompt injection attack categories. Each category has 2 seed payloads that get expanded into 6 variants (3 injection strategies each).

Category Description
instruction_override Attempts to replace or override the system's instructions
role_hijacking Tries to change the AI's assigned role or persona
system_prompt Attempts to extract or reveal the system prompt
exfiltration Tries to exfiltrate data via URLs, markdown, or other channels
secrecy Instructs the AI to hide its actions from the user
jailbreak Attempts to bypass safety guardrails and restrictions
obfuscation Uses encoding, character substitution, or formatting tricks to hide payloads
instruction_chaining Chains multiple instructions to slip malicious ones through
meta Meta-level attacks about the AI's own behavior or training
tool_call_injection Injects fake tool calls or function invocations
system_message_spoofing Spoofs system-level messages or delimiters
roleplay_jailbreak Uses roleplay scenarios to bypass restrictions
multi_turn_injection Simulates multi-turn conversation context to inject instructions

Filter to specific categories with the --categories flag.

Bash
safepaste-test --categories exfiltration,tool_call_injection --file system-prompt.txt

How It Works

The test CLI follows a four-step process to generate and evaluate adversarial test cases.

1

Select seed payloads

Selects 2 seed payloads per attack category (26 total across 13 categories). Each payload is a known prompt injection technique.

2

Generate injection variants

For each payload, generates 3 injection variants using different strategies: prepend (payload before prompt), append (payload after prompt), and wrap (payload wraps the prompt). This produces 78 total variants.

3

Scan each variant

Scans each variant with @safepaste/core's scanPrompt() function. The core engine runs 61 weighted detection patterns and returns a score, risk level, and flagged status.

4

Aggregate and report

Aggregates results by category, computes the overall detection rate, and determines pass/fail against your threshold. Outputs the report in your chosen format.

Programmatic API

Use the run() function directly in your Node.js code for custom integrations, test suites, or dashboards.

JavaScript
var { run } = require('@safepaste/test');

var report = run('You are a helpful coding assistant', {
  strict: false,
  categories: null,       // all 13 categories
  passThreshold: 0.8
});

console.log(report.pass);             // true or false
console.log(report.summary.rate);     // 0.808
console.log(report.summary.detected); // 63
console.log(report.summary.total);    // 78

Return value

The run() function returns an object with the following structure.

Field Type Description
pass boolean Whether the detection rate meets the pass threshold
prompt string The original target prompt
config object Configuration used: threshold, categories count, passThreshold, strict
summary object total, detected, missed, rate, pass
categories object Per-category stats: { total, detected } for each category
variants array Full details for every variant: category, strategy, text, flagged, score, risk

Options

Option Type Default Description
strict boolean false Use strict mode (threshold 25 instead of 35)
categories string[] | null null Array of category names to test, or null for all
passThreshold number 0.8 Minimum detection rate (0-1) for a passing result

Need Help?

Questions about the Test CLI or CI/CD integration? Reach out.

Contact Support