Generative AI Governance
Enterprise AI governance frameworks — policy creation, usage guidelines, risk assessment, compliance tracking, and responsible AI frameworks
Published: 2026-04-22 · Last updated: 2026-04-13
As generative AI moves from experimental to production-critical, organizations need structured governance frameworks to ensure responsible, compliant, and effective use. This guide provides practical frameworks for creating AI governance programs that balance innovation with risk management.
Why Governance Matters
| Risk Category | Example | Business Impact |
|---|---|---|
| Regulatory | EU AI Act non-compliance | Fines up to €35 million or 7% of global annual turnover, whichever is higher |
| Data Privacy | PII leakage in prompts | GDPR fines, reputational damage |
| IP Risk | Training on copyrighted material | Litigation, injunctions |
| Security | Prompt injection attacks | Data breaches, unauthorized access |
| Reputational | Biased or harmful outputs | Brand damage, customer loss |
| Operational | Unvetted model causing errors | Financial loss, incorrect decisions |
Governance Framework Structure
A comprehensive AI governance program has five layers:
┌─────────────────────────────────────┐
│ Layer 1: Policy & Principles │ What we believe and commit to
├─────────────────────────────────────┤
│ Layer 2: Usage Guidelines │ What employees can and cannot do
├─────────────────────────────────────┤
│ Layer 3: Risk Classification │ How we categorize AI use cases
├─────────────────────────────────────┤
│ Layer 4: Technical Controls │ What systems enforce the rules
├─────────────────────────────────────┤
│ Layer 5: Monitoring & Audit │ How we verify compliance
└─────────────────────────────────────┘
Layer 1: Policy & Principles
Core AI Principles
# AI Usage Principles
## 1. Human Oversight
All AI-generated outputs that impact users, customers, or business decisions
must have meaningful human review before deployment.
## 2. Transparency
We disclose when users are interacting with AI systems. We do not present
AI-generated content as human-created without disclosure.
## 3. Privacy Protection
No personally identifiable information (PII), confidential business data,
or sensitive employee data may be sent to external AI providers without
explicit approval from the Data Protection Officer.
## 4. Fairness
AI systems must be tested for demographic bias before deployment and
monitored for bias drift in production.
## 5. Accountability
Each AI system has a named owner who is responsible for its behavior,
outputs, and compliance with these principles.
## 6. Security
AI systems must undergo security review including prompt injection testing
before production deployment.
## 7. Auditability
All AI system decisions, prompts, and outputs must be logged for
audit and retrospective analysis.
Layer 2: Usage Guidelines
Approved Use Cases Matrix
| Use Case | Approval Level | Data Classification | Review Required |
|---|---|---|---|
| Internal drafting (emails, docs) | Self-service | Public only | None |
| Code generation assistance | Team lead | Public + Internal | Quarterly review |
| Customer-facing content | Manager + Legal | Public only | Per-use review |
| Data analysis & summarization | Manager | Public + Internal | Monthly audit |
| Decision support (hiring, lending) | Director + Compliance | None allowed | Per-deployment audit |
| Automated decision-making | VP + Legal + Board | None allowed | Continuous monitoring |
| Creative content generation | Marketing lead | Public only | Human review required |
| Code review & security analysis | Security team | Internal only | None (internal tool) |
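One way to make the matrix machine-checkable is to encode it as a lookup table. The sketch below transcribes three rows only; the dictionary keys, the `UseCasePolicy` class, and `is_data_allowed` are hypothetical names chosen for illustration, not part of any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCasePolicy:
    approval: str                  # who must sign off
    data_allowed: frozenset[str]   # data classifications permitted in prompts
    review: str                    # review cadence from the matrix


# Hypothetical keys; values transcribed from three rows of the matrix above.
USE_CASE_POLICIES = {
    "internal_drafting": UseCasePolicy(
        "Self-service", frozenset({"public"}), "None"
    ),
    "code_generation": UseCasePolicy(
        "Team lead", frozenset({"public", "internal"}), "Quarterly review"
    ),
    "customer_content": UseCasePolicy(
        "Manager + Legal", frozenset({"public"}), "Per-use review"
    ),
}


def is_data_allowed(use_case: str, data_class: str) -> bool:
    """Return True only if the use case is known and permits the data class."""
    policy = USE_CASE_POLICIES.get(use_case)
    return policy is not None and data_class in policy.data_allowed
```

Unknown use cases are denied by default, so a new use case has to be added to the table before anyone can run it.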
Prompt Data Classification
import re
from dataclasses import dataclass
from enum import Enum


class DataClassification(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    CONFIDENTIAL = "confidential"
    RESTRICTED = "restricted"


@dataclass
class PromptComplianceCheck:
    """Policy for one AI tool: permitted data classes plus blocked regex patterns."""

    allowed_classifications: list[DataClassification]
    blocked_patterns: list[str]
    requires_approval: bool

    def check(self, prompt_text: str, data_class: DataClassification) -> dict:
        result = {
            "allowed": True,
            "violations": [],
            "requires_approval": self.requires_approval,
        }

        # Check data classification
        if data_class not in self.allowed_classifications:
            result["allowed"] = False
            result["violations"].append(
                f"Data classification '{data_class.value}' not allowed for this AI tool"
            )

        # Check for blocked patterns (case-insensitive substring match)
        for pattern in self.blocked_patterns:
            if re.search(pattern, prompt_text, re.IGNORECASE):
                result["allowed"] = False
                result["violations"].append(f"Blocked pattern detected: {pattern}")

        return result


# Default policy for employee self-service AI tools
SELF_SERVICE_POLICY = PromptComplianceCheck(
    allowed_classifications=[DataClassification.PUBLIC],
    blocked_patterns=[
        r"password|secret|api[_-]?key|token",
        r"social[_\s]?security|ssn",
        r"credit[_\s]?card",
        r"salary|compensation|payroll",
        r"patient|medical.*record",
    ],
    requires_approval=False,
)
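Because the blocked patterns are matched as substrings, harmless prompts can trip them: `token` also matches "tokenizer". The sketch below contrasts the first pattern from the policy above with a word-boundary variant. `LOOSE`, `STRICT`, and `matches` are illustrative names, and the stricter pattern is one possible adjustment, not a recommended default.

```python
import re

# The first blocked pattern from the policy above, matched as a substring.
LOOSE = re.compile(r"password|secret|api[_-]?key|token", re.IGNORECASE)

# Assumed variant: same terms, but only as whole words (optionally plural).
STRICT = re.compile(r"\b(?:password|secret|api[_-]?key|token)s?\b", re.IGNORECASE)


def matches(pattern: re.Pattern[str], text: str) -> bool:
    """Return True if the pattern occurs anywhere in the text."""
    return pattern.search(text) is not None


benign = "Explain how a tokenizer splits text"
sensitive = "Here is my API_KEY: abc123"

print(matches(LOOSE, benign), matches(STRICT, benign))
print(matches(LOOSE, sensitive), matches(STRICT, sensitive))
```

The trade-off runs both ways. Since `_` counts as a word character, the strict variant no longer flags identifiers such as `my_token`, so tightening a pattern reduces false positives at the cost of some false negatives.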
Layer 3: Risk Classification
AI Risk Tiers
| Tier | Risk Level | Description | Examples | Requirements |
|---|---|---|---|---|
| Tier 1 | Minimal | Informational only, no impact | Internal summarization, brainstorming | Self-service, basic logging |
| Tier 2 | Limited | Supports but doesn't drive decisions | Draft generation, code suggestions | Manager approval, bias testing |
| Tier 3 | High | Influences decisions with user impact | Customer chatbots, content publishing | Legal review, red teaming |
| Tier 4 | Critical | Automated decisions affecting people | Hiring screening, credit decisions | Board approval, continuous monitoring |
Risk Assessment Questionnaire
RISK_ASSESSMENT_QUESTIONS = [
    {
        "id": "Q1",
        "question": "Does the AI system interact directly with end users?",
        "weights": {"yes": 2, "no": 0},
    },
    {
        "id": "Q2",
        "question": "Can the AI system's outputs cause financial harm if incorrect?",
        "weights": {"yes": 3, "no": 0},
    },
    {
        "id": "Q3",
        "question": "Does the system process personal or sensitive data?",
        "weights": {"yes": 3, "no": 0},
    },
    {
        "id": "Q4",
        "question": "Are the AI's outputs used for decisions about individuals?",
        "weights": {"yes": 4, "no": 0},
    },
    {
        "id": "Q5",
        "question": "Is there a human in the loop reviewing outputs before action?",
        "weights": {"yes": 0, "no": 3},
    },
    {
        "id": "Q6",
        "question": "Can the system be influenced by adversarial inputs?",
        "weights": {"yes": 2, "no": 0},
    },
    {
        "id": "Q7",
        "question": "Is the system used in a regulated domain (healthcare, finance, etc.)?",
        "weights": {"yes": 3, "no": 0},
    },
]


def calculate_risk_tier(answers: dict[str, str]) -> tuple[str, int]:
    """Calculate the risk tier based on assessment answers."""
    total_score = 0
    for question in RISK_ASSESSMENT_QUESTIONS:
        answer = answers.get(question["id"], "no")
        total_score += question["weights"].get(answer, 0)

    if total_score <= 2:
        return "Tier 1 (Minimal)", total_score
    elif total_score <= 5:
        return "Tier 2 (Limited)", total_score
    elif total_score <= 9:
        return "Tier 3 (High)", total_score
    else:
        return "Tier 4 (Critical)", total_score
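A short worked example may help when calibrating the thresholds. The sketch below copies the weights and cut-offs from the questionnaire into a compact, self-contained form; `WEIGHTS`, `score`, `tier`, and the sample answers for a customer chatbot are illustrative assumptions.

```python
# Weights copied from the questionnaire above.
WEIGHTS = {
    "Q1": {"yes": 2, "no": 0},
    "Q2": {"yes": 3, "no": 0},
    "Q3": {"yes": 3, "no": 0},
    "Q4": {"yes": 4, "no": 0},
    "Q5": {"yes": 0, "no": 3},
    "Q6": {"yes": 2, "no": 0},
    "Q7": {"yes": 3, "no": 0},
}


def score(answers: dict[str, str]) -> int:
    """Sum the weights; a missing answer is treated as "no"."""
    return sum(w.get(answers.get(qid, "no"), 0) for qid, w in WEIGHTS.items())


def tier(total: int) -> str:
    """Map a total score to a tier using the same cut-offs as above."""
    if total <= 2:
        return "Tier 1 (Minimal)"
    if total <= 5:
        return "Tier 2 (Limited)"
    if total <= 9:
        return "Tier 3 (High)"
    return "Tier 4 (Critical)"


# Hypothetical customer chatbot: user-facing, no human review, injectable.
chatbot = {"Q1": "yes", "Q5": "no", "Q6": "yes"}
print(score(chatbot), tier(score(chatbot)))  # 2 + 3 + 2 = 7 -> Tier 3 (High)
```

Note that an empty questionnaire scores 3, not 0, because a missing Q5 answer counts as "no human in the loop". The other questions default to the low-risk answer, so requiring all seven answers before scoring may be the safer process.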
Layer 4: Technical Controls
Prompt Filtering and Guardrails
from datetime import datetime, timezone


class PromptGuardrail:
    """Technical enforcement of governance policies."""

    def __init__(self):
        # PIIDetector, ToxicityDetector, and InjectionDetector are assumed to be
        # provided elsewhere; they are not defined in this guide.
        self.pii_detector = PIIDetector()
        self.toxicity_detector = ToxicityDetector()
        self.injection_detector = InjectionDetector()

    async def check_prompt(self, prompt: str, user_id: str) -> dict:
        """Run all guardrails on a prompt before it reaches the LLM."""
        # _check_toxicity, _check_rate_limit, and _check_model_access are omitted
        # here; they must return the same {"passed": bool, ...} shape as the
        # two checks shown below.
        checks = {
            "pii": await self._check_pii(prompt),
            "toxicity": await self._check_toxicity(prompt),
            "injection": await self._check_injection(prompt),
            "rate_limit": await self._check_rate_limit(user_id),
            "allowed_model": await self._check_model_access(user_id),
        }

        violations = [name for name, result in checks.items() if not result["passed"]]

        return {
            "allowed": len(violations) == 0,
            "checks": checks,
            "violations": violations,
            "user_id": user_id,
            # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated)
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }

    async def _check_pii(self, prompt: str) -> dict:
        detected = self.pii_detector.find(prompt)
        return {
            "passed": len(detected) == 0,
            "detected": detected,
            "action": "block_and_redact" if detected else "allow",
        }

    async def _check_injection(self, prompt: str) -> dict:
        risk_score = self.injection_detector.score(prompt)
        return {
            "passed": risk_score < 0.7,
            "risk_score": risk_score,
            "action": "block" if risk_score >= 0.7 else "allow",
        }
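The guardrail above relies on a `PIIDetector` with a `find` method, which this guide does not define. As a rough illustration of what such a component might look like, the sketch below uses a handful of regular expressions. `RegexPIIDetector`, its patterns, and the `redact` helper are assumptions; a production system would more likely use a dedicated PII detection library or service.

```python
import re


class RegexPIIDetector:
    """Hypothetical stand-in for PIIDetector covering a few US-style formats."""

    PATTERNS = {
        "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
        "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "us_phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    }

    def find(self, text: str) -> list[dict]:
        """Return one record per match with its type and character span."""
        hits = []
        for kind, pattern in self.PATTERNS.items():
            for match in pattern.finditer(text):
                hits.append(
                    {"type": kind, "start": match.start(), "end": match.end()}
                )
        return hits

    def redact(self, text: str) -> str:
        """Replace each match with a placeholder such as [EMAIL]."""
        for kind, pattern in self.PATTERNS.items():
            text = pattern.sub(f"[{kind.upper()}]", text)
        return text


detector = RegexPIIDetector()
print(detector.redact("Contact jane@example.com or 555-123-4567"))
```

Returning spans instead of the matched text keeps the detected values out of the guardrail result, which matters if that result is later written to the audit log.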
Output Validation
class OutputGuardrail:
    """Validate LLM outputs before delivering them to users."""

    def __init__(self):
        # HallucinationDetector, FactChecker, and ComplianceChecker are assumed
        # to be provided elsewhere; they are not defined in this guide.
        self.hallucination_detector = HallucinationDetector()
        self.fact_checker = FactChecker(knowledge_base="company_docs")
        self.compliance_checker = ComplianceChecker()

    async def validate_output(
        self, prompt: str, output: str, context: dict | None = None
    ) -> dict:
        # The four _check_* helpers are omitted; each is expected to return a
        # dict containing the score key read in _compute_risk_level.
        validation = {
            "hallucination_risk": await self._check_hallucination(prompt, output),
            "factual_accuracy": await self._check_facts(output),
            "compliance": await self._check_compliance(output),
            "safety": await self._check_safety(output),
        }

        risk_level = self._compute_risk_level(validation)
        validation["risk_level"] = risk_level
        validation["action"] = self._determine_action(risk_level)

        return validation

    def _compute_risk_level(self, validation: dict) -> str:
        # Normalize every signal to "higher is riskier" on a 0-1 scale
        scores = {
            "hallucination_risk": validation["hallucination_risk"]["score"],
            "factual_accuracy": 1 - validation["factual_accuracy"]["accuracy_score"],
            "compliance": 1 - validation["compliance"]["compliance_score"],
            "safety": 1 - validation["safety"]["safety_score"],
        }

        max_score = max(scores.values())
        avg_score = sum(scores.values()) / len(scores)

        if max_score > 0.8:
            return "critical"
        elif max_score > 0.6 or avg_score > 0.4:
            return "high"
        elif max_score > 0.4 or avg_score > 0.25:
            return "medium"
        else:
            return "low"

    def _determine_action(self, risk_level: str) -> str:
        actions = {
            "critical": "block_and_escalate",
            "high": "require_human_review",
            "medium": "flag_for_review",
            "low": "allow",
        }
        return actions[risk_level]
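The risk level combines a worst-case test (the maximum score) with an overall test (the average), so several moderate signals can escalate an output even when no single check looks alarming. The standalone sketch below reuses the same thresholds on already-normalized risk scores; `risk_level` and the sample inputs are illustrative.

```python
def risk_level(scores: dict[str, float]) -> str:
    """Classify normalized risk scores (0 = safe, 1 = risky) as in the class above."""
    max_score = max(scores.values())
    avg_score = sum(scores.values()) / len(scores)

    if max_score > 0.8:
        return "critical"
    if max_score > 0.6 or avg_score > 0.4:
        return "high"
    if max_score > 0.4 or avg_score > 0.25:
        return "medium"
    return "low"


# Four moderate signals: no single score exceeds 0.6, but the average is 0.45.
moderate = {"hallucination": 0.45, "facts": 0.45, "compliance": 0.45, "safety": 0.45}
print(risk_level(moderate))

# One elevated signal among otherwise clean checks.
single = {"hallucination": 0.5, "facts": 0.0, "compliance": 0.0, "safety": 0.0}
print(risk_level(single))
```

The first case is rated "high" through the average rule, the second "medium" through the maximum rule.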
Layer 5: Monitoring & Audit
Compliance Dashboard
from collections import Counter


class GovernanceDashboard:
    """Track governance compliance across all AI systems."""

    # Assumes each log record exposes .violation, .violation_type, .action,
    # .user_id, and .system_name. The _group_by_system, _group_by_risk_tier,
    # and _compliance_trend helpers are omitted here.

    def __init__(self, audit_log: "AuditLog"):
        self.audit_log = audit_log

    async def generate_report(self, period: str = "last_30_days") -> dict:
        logs = await self.audit_log.query(period)
        total = len(logs)
        violations = sum(1 for entry in logs if entry.violation)

        return {
            "summary": {
                "total_ai_requests": total,
                "total_violations": violations,
                # Guard against division by zero when the period has no traffic
                "violation_rate": violations / total if total else 0.0,
                "blocked_prompts": sum(1 for e in logs if e.action == "blocked"),
                "human_reviews_triggered": sum(
                    1 for e in logs if e.action == "human_review"
                ),
            },
            "by_system": self._group_by_system(logs),
            "by_risk_tier": self._group_by_risk_tier(logs),
            "top_violations": self._top_violations(logs),
            "trend": self._compliance_trend(logs),
            "recommendations": self._generate_recommendations(logs),
        }

    def _top_violations(self, logs: list) -> list[dict]:
        violation_types = Counter(e.violation_type for e in logs if e.violation)
        # Percentage is relative to all requests, not to all violations
        return [
            {"type": vtype, "count": count, "percentage": count / len(logs) * 100}
            for vtype, count in violation_types.most_common(10)
        ]

    def _generate_recommendations(self, logs: list) -> list[str]:
        recommendations = []
        if not logs:
            return recommendations

        # High violation rate
        violation_rate = sum(1 for e in logs if e.violation) / len(logs)
        if violation_rate > 0.05:
            recommendations.append(
                "Violation rate exceeds 5%. Review training materials and access controls."
            )

        # Repeated offenders (threshold matches the "5+" wording in the message)
        user_violations = Counter(e.user_id for e in logs if e.violation)
        repeat_offenders = [u for u, c in user_violations.items() if c >= 5]
        if repeat_offenders:
            recommendations.append(
                f"{len(repeat_offenders)} users have 5+ violations. Schedule targeted training."
            )

        # System-specific issues
        system_violations = Counter(e.system_name for e in logs if e.violation)
        for system, count in system_violations.items():
            if count > 100:
                recommendations.append(
                    f"System '{system}' has {count} violations. Conduct a full audit."
                )

        return recommendations
Audit Trail
Every AI interaction should be logged for compliance:
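from dataclasses import dataclass, fields
from datetime import datetime, timedelta, timezone


@dataclass
class AuditEntry:
    timestamp: str
    user_id: str
    system_name: str
    risk_tier: str
    prompt_hash: str  # Hash for privacy, raw text stored separately with access controls
    model_used: str
    output_hash: str
    tokens_used: int
    cost_usd: float
    guardrail_results: dict
    action_taken: str  # allowed, blocked, flagged, human_review
    human_reviewer: str | None
    human_review_decision: str | None
    compliance_status: str  # compliant, violation, pending_review


class AuditLog:
    # Only known column names may appear in a filter; caller-supplied names
    # are never interpolated into the SQL text unchecked.
    FILTERABLE = {f.name for f in fields(AuditEntry)}

    def __init__(self, database):
        # `database` is assumed to be an async client exposing insert() and
        # execute(sql, params) with "?" placeholders; adapt to your driver.
        self.db = database

    async def log(self, entry: AuditEntry):
        await self.db.insert("ai_audit_log", entry)

    async def query(self, period: str, filters: dict | None = None) -> list[AuditEntry]:
        # Assumes period labels of the form "last_<N>_days"; convert the label
        # to an ISO cutoff timestamp instead of pasting it into the query.
        days = int(period.removeprefix("last_").removesuffix("_days"))
        cutoff = (datetime.now(timezone.utc) - timedelta(days=days)).isoformat()

        sql = "SELECT * FROM ai_audit_log WHERE timestamp >= ?"
        params = [cutoff]
        for key, value in (filters or {}).items():
            if key not in self.FILTERABLE:
                raise ValueError(f"Unknown filter column: {key}")
            # Values are bound as parameters to prevent SQL injection
            sql += f" AND {key} = ?"
            params.append(value)

        return await self.db.execute(sql, params)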
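The `prompt_hash` and `output_hash` fields store a digest instead of the raw text. A plain hash of a short or predictable prompt can be reversed by hashing guesses, so one option is a keyed digest. The sketch below uses HMAC-SHA256 from the standard library; `prompt_fingerprint` and the key handling are assumptions, and the key would normally come from a secrets manager, not from source code.

```python
import hashlib
import hmac


def prompt_fingerprint(text: str, key: bytes) -> str:
    """Return a hex-encoded HMAC-SHA256 digest of the text under the given key."""
    return hmac.new(key, text.encode("utf-8"), hashlib.sha256).hexdigest()


# Placeholder key for illustration only; load a real key from secure storage.
demo_key = b"replace-with-a-secret-key"

digest = prompt_fingerprint("Summarize the Q3 sales report", demo_key)
print(len(digest))  # 64 hex characters for SHA-256
```

The same prompt under the same key always yields the same digest, so auditors can still spot repeated prompts without reading their contents.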
Regulatory Compliance Mapping
| Regulation | Key Requirement | How to Comply |
|---|---|---|
| EU AI Act | Risk classification, transparency, human oversight | Implement risk tiers, disclose AI use, maintain human review |
| GDPR | Data minimization, right to explanation | Log data flows, provide output explanations, enable data deletion |
| SOC 2 | Access controls, monitoring, audit trails | Implement guardrails, maintain audit logs, regular reviews |
| HIPAA (healthcare) | PHI protection | Block PHI in prompts, use HIPAA-compliant providers, BAAs |
| CCPA (California) | Consumer rights, transparency | Disclose AI use in consumer interactions, honor opt-out |
| NYC Local Law 144 | Bias audits for automated employment decision tools | Regular bias testing, impact assessments for hiring tools |
Building a Governance Program
Phase 1: Foundation (Weeks 1-4)
- Appoint an AI Governance Lead (or committee)
- Draft core AI principles and usage policy
- Inventory all existing AI tool usage
- Identify highest-risk use cases
Phase 2: Controls (Weeks 5-12)
- Deploy prompt/output guardrails
- Implement audit logging
- Create risk assessment process
- Launch employee AI training
Phase 3: Maturation (Weeks 13-24)
- Establish continuous monitoring
- Build compliance dashboards
- Conduct first AI risk audit
- Create incident response procedures
Phase 4: Optimization (Ongoing)
- Regular policy reviews and updates
- Automated compliance testing
- Industry benchmarking
- Regulatory change monitoring
Cross-References
- For security-specific guidance, see LLM Security Best Practices
- For prompt injection testing, see Prompt Security Testing
- For evaluating model bias, see LLM Bias Mitigation
- For adversarial testing, see AI Safety & Red Teaming
- For monitoring production systems, see LLM Observability & Monitoring