Test CLI — Adversarial Simulation

npm install @safepaste/test @safepaste/core
Flag	Description	Default
`--format <report\|json\|jsonl>`	Output format	`report`
`--strict`	Strict mode (detection threshold 25 instead of 35)	off
`--categories <cat1,cat2,...>`	Test specific categories only	all 13
`--pass-threshold <N>`	Minimum detection rate, 0 to 1	`0.8`
`--file <path>`	Read prompt from file	-
`--help`	Show help text	-
`--version`	Show version number	-
Code	Meaning
`0`	Detection rate >= pass threshold (default 80%)
`1`	Detection rate < pass threshold — your prompt may be vulnerable
`2`	Usage error (missing prompt, invalid flag, file not found)
Category	Description
`instruction_override`	Attempts to replace or override the system's instructions
`role_hijacking`	Tries to change the AI's assigned role or persona
`system_prompt`	Attempts to extract or reveal the system prompt
`exfiltration`	Tries to exfiltrate data via URLs, markdown, or other channels
`secrecy`	Instructs the AI to hide its actions from the user
`jailbreak`	Attempts to bypass safety guardrails and restrictions
`obfuscation`	Uses encoding, character substitution, or formatting tricks to hide payloads
`instruction_chaining`	Chains multiple instructions to slip malicious ones through
`meta`	Meta-level attacks about the AI's own behavior or training
`tool_call_injection`	Injects fake tool calls or function invocations
`system_message_spoofing`	Spoofs system-level messages or delimiters
`roleplay_jailbreak`	Uses roleplay scenarios to bypass restrictions
`multi_turn_injection`	Simulates multi-turn conversation context to inject instructions
Field	Type	Description
`pass`	boolean	Whether the detection rate meets the pass threshold
`prompt`	string	The original target prompt
`config`	object	Configuration used: `threshold`, `categories` count, `passThreshold`, `strict`
`summary`	object	`total`, `detected`, `missed`, `rate`, `pass`
`categories`	object	Per-category stats: `{ total, detected }` for each category
`variants`	array	Full details for every variant: category, strategy, text, flagged, score, risk
Option	Type	Default	Description
`strict`	boolean	`false`	Use strict mode (threshold 25 instead of 35)
`categories`	string[] \| null	`null`	Array of category names to test, or null for all
`passThreshold`	number	`0.8`	Minimum detection rate (0-1) for a passing result