Beyond the widely discussed frameworks (NIST AI RMF, MITRE ATLAS, OWASP LLM Top 10, etc.), there is a second tier of influential but lesser-known frameworks and methodologies that major AI labs and advanced security teams increasingly rely on.
These “missing frameworks” typically fall into three categories:
- AI evaluation & assurance frameworks
- AI red-teaming frameworks
- AI runtime security / posture frameworks
Below are six important frameworks rarely discussed in public governance conversations but increasingly important inside AI labs.
1. TEVV (Test, Evaluation, Verification, and Validation)
Type: AI assurance and evaluation methodology
Originally used in defense systems engineering, TEVV has become a foundational concept for evaluating AI safety and robustness.
Purpose:
- test AI behavior
- evaluate performance under adversarial conditions
- verify system correctness
- validate real-world outcomes
Security agencies emphasize that AI red teaming should operate within a TEVV framework, integrating testing and validation across the system lifecycle. (CISA)
Why it matters
AI systems are probabilistic, meaning you cannot fully verify them with traditional software testing. TEVV provides a structured approach to continuous evaluation.
Criticism
TEVV is a methodology rather than a security framework, so organizations must design their own operational processes.
2. NVIDIA Garak
Garak
Type: AI red-teaming framework/tool
Garak is an open-source vulnerability scanner designed specifically for large language models and generative AI systems. (Wikipedia)
Architecture:
- Probes → adversarial attacks
- Generators → target models
- Detectors → vulnerability detection
This allows automated testing for:
- prompt injection
- jailbreak attacks
- unsafe outputs
- data leakage
Why it matters
It provides automated AI penetration testing, similar to how tools like Nmap or Metasploit transformed traditional cybersecurity.
Criticism
- Coverage of attack techniques is incomplete
- Some security teams report unstable tooling
- Requires manual interpretation of results
3. Microsoft PyRIT (Python Risk Identification Toolkit)
Type: LLM red-teaming framework
PyRIT is Microsoft’s open-source framework for systematic AI attack testing.
Capabilities include:
- taxonomy-based attack generation
- jailbreak testing
- sensitive data extraction testing
- prompt mutation
Security teams use it to run repeatable adversarial tests across different models. (Mend.io)
Why it matters
It enables structured red-team automation, which is essential for continuous AI evaluation.
Criticism
- Primarily designed for LLM systems
- Limited support for other AI modalities (vision, RL).
4. Trustworthy AI Posture (TAIP)
Type: Continuous AI assurance framework
TAIP reframes AI trust as a continuous signal rather than a one-time audit.
Key idea:
- governance rules
- operational evidence
- runtime monitoring
These signals combine to generate a continuous trust posture for AI systems. (arXiv)
Why it matters
Traditional compliance frameworks assume static software, but AI systems change behavior dynamically.
TAIP enables:
- continuous monitoring
- automated assurance
- machine-speed compliance signals
Criticism
Still mostly experimental and not widely deployed.
5. Automated AI Red-Teaming Frameworks
Modern research is exploring fully automated adversarial testing systems.
Example: AutoRedTeamer
Capabilities include:
- autonomous attack generation
- multi-agent testing architectures
- adaptive attack discovery
Research shows automated red-teaming frameworks can discover significantly more vulnerabilities than manual testing. (arXiv)
Why it matters
AI systems have huge attack surfaces that humans cannot test manually.
Criticism
- Can produce unrealistic attack scenarios
- Hard to prioritize discovered vulnerabilities.
6. Evolutionary Agent Security Evaluation (NAAMSE)
Type: agentic AI security evaluation framework
Agentic AI systems introduce complex multi-step behaviors.
The NAAMSE framework evaluates these systems using evolutionary adversarial attacks, where attack strategies mutate over time to find weaknesses. (arXiv)
Core features:
- genetic prompt mutation
- adaptive attack discovery
- multi-turn agent testing
Why it matters
Traditional red-teaming assumes static prompts, but agentic systems evolve dynamically.
Criticism
Still largely academic and experimental.
The Real AI Security Stack
Putting everything together, a mature AI security program in 2026 typically combines multiple frameworks:
| Layer | Example Frameworks |
|---|---|
| Governance | NIST AI RMF, ISO 42001 |
| Threat intelligence | MITRE ATLAS |
| Application security | OWASP LLM Top 10 |
| MLSecOps | Databricks AI Security Framework |
| Red teaming | Garak, PyRIT |
| Evaluation | TEVV |
| Continuous assurance | TAIP |
In other words:
AI security is becoming a layered architecture rather than a single framework.
The Deeper Insight
The biggest conceptual shift happening in AI security:
Traditional security = protect software
AI security = continuously evaluate behavior
AI models can change behavior due to:
- training updates
- prompt interactions
- new external data
- agent tool use
That means security becomes a continuous testing process, not just code auditing.
✅ One interesting trend:
Many security researchers now believe AI security will converge into something similar to the Kubernetes security ecosystem — a stack of tools rather than one dominant framework.
