The Least Discussed AI Frameworks

Beyond the widely discussed frameworks (NIST AI RMF, MITRE ATLAS, OWASP LLM Top 10, etc.), there is a second tier of influential but lesser-known frameworks and methodologies that major AI labs and advanced security teams increasingly rely on.

These “missing frameworks” typically fall into three categories:

  • AI evaluation & assurance frameworks
  • AI red-teaming frameworks
  • AI runtime security / posture frameworks

Below are six important frameworks rarely discussed in public governance conversations but increasingly important inside AI labs.


1. TEVV (Test, Evaluation, Verification, and Validation)

Type: AI assurance and evaluation methodology

Originally used in defense systems engineering, TEVV has become a foundational concept for evaluating AI safety and robustness.

Purpose:

  • test AI behavior
  • evaluate performance under adversarial conditions
  • verify system correctness
  • validate real-world outcomes

Security agencies emphasize that AI red teaming should operate within a TEVV framework, integrating testing and validation across the system lifecycle. (CISA)

Why it matters

AI systems are probabilistic, meaning you cannot fully verify them with traditional software testing. TEVV provides a structured approach to continuous evaluation.

Criticism

TEVV is a methodology rather than a security framework, so organizations must design their own operational processes.


2. NVIDIA Garak

Garak

Type: AI red-teaming framework/tool

Garak is an open-source vulnerability scanner designed specifically for large language models and generative AI systems. (Wikipedia)

Architecture:

  • Probes → adversarial attacks
  • Generators → target models
  • Detectors → vulnerability detection

This allows automated testing for:

  • prompt injection
  • jailbreak attacks
  • unsafe outputs
  • data leakage

Why it matters

It provides automated AI penetration testing, similar to how tools like Nmap or Metasploit transformed traditional cybersecurity.

Criticism

  • Coverage of attack techniques is incomplete
  • Some security teams report unstable tooling
  • Requires manual interpretation of results

3. Microsoft PyRIT (Python Risk Identification Toolkit)

Type: LLM red-teaming framework

PyRIT is Microsoft’s open-source framework for systematic AI attack testing.

Capabilities include:

  • taxonomy-based attack generation
  • jailbreak testing
  • sensitive data extraction testing
  • prompt mutation

Security teams use it to run repeatable adversarial tests across different models. (Mend.io)

Why it matters

It enables structured red-team automation, which is essential for continuous AI evaluation.

Criticism

  • Primarily designed for LLM systems
  • Limited support for other AI modalities (vision, RL).

4. Trustworthy AI Posture (TAIP)

Type: Continuous AI assurance framework

TAIP reframes AI trust as a continuous signal rather than a one-time audit.

Key idea:

  • governance rules
  • operational evidence
  • runtime monitoring

These signals combine to generate a continuous trust posture for AI systems. (arXiv)

Why it matters

Traditional compliance frameworks assume static software, but AI systems change behavior dynamically.

TAIP enables:

  • continuous monitoring
  • automated assurance
  • machine-speed compliance signals

Criticism

Still mostly experimental and not widely deployed.


5. Automated AI Red-Teaming Frameworks

Modern research is exploring fully automated adversarial testing systems.

Example: AutoRedTeamer

Capabilities include:

  • autonomous attack generation
  • multi-agent testing architectures
  • adaptive attack discovery

Research shows automated red-teaming frameworks can discover significantly more vulnerabilities than manual testing. (arXiv)

Why it matters

AI systems have huge attack surfaces that humans cannot test manually.

Criticism

  • Can produce unrealistic attack scenarios
  • Hard to prioritize discovered vulnerabilities.

6. Evolutionary Agent Security Evaluation (NAAMSE)

Type: agentic AI security evaluation framework

Agentic AI systems introduce complex multi-step behaviors.

The NAAMSE framework evaluates these systems using evolutionary adversarial attacks, where attack strategies mutate over time to find weaknesses. (arXiv)

Core features:

  • genetic prompt mutation
  • adaptive attack discovery
  • multi-turn agent testing

Why it matters

Traditional red-teaming assumes static prompts, but agentic systems evolve dynamically.

Criticism

Still largely academic and experimental.


The Real AI Security Stack

Putting everything together, a mature AI security program in 2026 typically combines multiple frameworks:

LayerExample Frameworks
GovernanceNIST AI RMF, ISO 42001
Threat intelligenceMITRE ATLAS
Application securityOWASP LLM Top 10
MLSecOpsDatabricks AI Security Framework
Red teamingGarak, PyRIT
EvaluationTEVV
Continuous assuranceTAIP

In other words:

AI security is becoming a layered architecture rather than a single framework.


The Deeper Insight

The biggest conceptual shift happening in AI security:

Traditional security = protect software

AI security = continuously evaluate behavior

AI models can change behavior due to:

  • training updates
  • prompt interactions
  • new external data
  • agent tool use

That means security becomes a continuous testing process, not just code auditing.


One interesting trend:

Many security researchers now believe AI security will converge into something similar to the Kubernetes security ecosystem — a stack of tools rather than one dominant framework.