The Self-Evolving Agent Ecosystem — Trading agents that evolve through Darwinian selection and adversarial self-play
Updated Apr 13, 2026 - Python
AI Robustness Evaluation System
Open-source AI agent red-team engine, SDK, and CLI. Run offline or against the Humanbound Platform.
Open-source framework for building and testing LLM-powered applications: IRIS (single-agent orchestration), AETHER (declarative multi-agent systems), and AEGIS (adversarial security testing). Developed at MSU Denver's Community-Centered Computing (C3) Lab.
MCP server that wraps the xAI Grok CLI. Lets Claude Code, Cursor, Cline, and any MCP host use Grok as a peer code reviewer, adversary, and second-opinion consultant.
Open-source test harness for AI agents. Stress-test production agents with adversarial multi-turn scenarios in CI.
Red-team your AI agents from any coding IDE. Adversarial security testing skills for Claude Code, Cursor, Codex, and 40+ agents.
Mechanism-grounded taxonomy of 40 LLM jailbreak patterns across 10 categories. 8,000-trial bootstrap evaluation for the June 2026 frontier (Claude Opus 4-8, GPT-5.5, Gemini 3.5, DeepSeek V4). Every citation direct-WebFetch verified; refuted claims documented.
A marketplace of Claude Code plugins for adversarial security and architectural code review.
Elenchus MCP Server - Adversarial verification system for code review
Claude Code skill that stress-tests startup ideas with adversarial AI agents: 68 animals, elimination rounds, blind scoring. Your idea either survives or you get 3 pivots.
Benchmark LLM jailbreak resilience across providers with standardized tests, adversarial mode, rich analytics, and a clean Web UI.
Multi-perspective code review council for Claude Code. 3 advisors by default, 10 agents in deep mode (Opus + Codex). Evidence chains, adversarial self-test, dual-path verdict. Based on Karpathy's LLM Council.
AI safety evaluation framework testing LLM epistemic robustness under adversarial self-history manipulation
Context engineering toolkit for LLMs — pack, cache, debug, red-team, and orchestrate context windows. Council of Experts, adversarial testing, immune system, context compiler, drift detection, multi-agent entanglement. TypeScript + Python.
Adversarial testing of LLMs on constraint satisfaction deadlocks
Adversarial testing for LLM applications. Calibrated prompt-injection and jailbreak scans with reproducible reports. Pip install, async-first, framework-agnostic.
Agent-driven adversarial paper audit framework
Evolutionary adversarial testing framework for AI safety using quality-diversity search to discover interpretable, transferable vulnerabilities across LLMs. (ICLR 2026)