Evaluation & Safety

Generative AI Governance

Enterprise AI governance frameworks — policy creation, usage guidelines, risk assessment, compliance tracking, and responsible AI frameworks

Published: 2026-04-22 · Last updated: 2026-04-13

Generative AI Governance

As generative AI moves from experimental to production-critical, organizations need structured governance frameworks to ensure responsible, compliant, and effective use. This guide provides practical frameworks for creating AI governance programs that balance innovation with risk management.

Why Governance Matters

Risk CategoryExampleBusiness Impact
RegulatoryEU AI Act non-complianceFines up to 7% of global revenue
Data PrivacyPII leakage in promptsGDPR fines, reputational damage
IP RiskTraining on copyrighted materialLitigation, injunctions
SecurityPrompt injection attacksData breaches, unauthorized access
ReputationalBiased or harmful outputsBrand damage, customer loss
OperationalUnvetted model causing errorsFinancial loss, incorrect decisions

Governance Framework Structure

A comprehensive AI governance program has five layers:

┌─────────────────────────────────────┐
│     Layer 1: Policy & Principles    │  What we believe and commit to
├─────────────────────────────────────┤
│     Layer 2: Usage Guidelines       │  What employees can and cannot do
├─────────────────────────────────────┤
│     Layer 3: Risk Classification    │  How we categorize AI use cases
├─────────────────────────────────────┤
│     Layer 4: Technical Controls     │  What systems enforce the rules
├─────────────────────────────────────┤
│     Layer 5: Monitoring & Audit     │  How we verify compliance
└─────────────────────────────────────┘

Layer 1: Policy & Principles

Core AI Principles

# AI Usage Principles

## 1. Human Oversight
All AI-generated outputs that impact users, customers, or business decisions
must have meaningful human review before deployment.

## 2. Transparency
We disclose when users are interacting with AI systems. We do not present
AI-generated content as human-created without disclosure.

## 3. Privacy Protection
No personally identifiable information (PII), confidential business data,
or sensitive employee data may be sent to external AI providers without
explicit approval from the Data Protection Officer.

## 4. Fairness
AI systems must be tested for demographic bias before deployment and
monitored for bias drift in production.

## 5. Accountability
Each AI system has a named owner who is responsible for its behavior,
outputs, and compliance with these principles.

## 6. Security
AI systems must undergo security review including prompt injection testing
before production deployment.

## 7. Auditability
All AI system decisions, prompts, and outputs must be logged for
audit and retrospective analysis.

Layer 2: Usage Guidelines

Approved Use Cases Matrix

Use CaseApproval LevelData ClassificationReview Required
Internal drafting (emails, docs)Self-servicePublic onlyNone
Code generation assistanceTeam leadPublic + InternalQuarterly review
Customer-facing contentManager + LegalPublic onlyPer-use review
Data analysis & summarizationManagerPublic + InternalMonthly audit
Decision support (hiring, lending)Director + ComplianceNone allowedPer-deployment audit
Automated decision-makingVP + Legal + BoardNone allowedContinuous monitoring
Creative content generationMarketing leadPublic onlyHuman review required
Code review & security analysisSecurity teamInternal onlyNone (internal tool)

Prompt Data Classification

from enum import Enum
from dataclasses import dataclass

class DataClassification(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    CONFIDENTIAL = "confidential"
    RESTRICTED = "restricted"


@dataclass
class PromptComplianceCheck:
    allowed_classifications: list[DataClassification]
    blocked_patterns: list[str]
    requires_approval: bool

    def check(self, prompt_text: str, data_class: DataClassification) -> dict:
        result = {
            "allowed": True,
            "violations": [],
            "requires_approval": self.requires_approval,
        }

        # Check data classification
        if data_class not in self.allowed_classifications:
            result["allowed"] = False
            result["violations"].append(
                f"Data classification '{data_class.value}' not allowed for this AI tool"
            )

        # Check for blocked patterns
        import re
        for pattern in self.blocked_patterns:
            if re.search(pattern, prompt_text, re.IGNORECASE):
                result["allowed"] = False
                result["violations"].append(f"Blocked pattern detected: {pattern}")

        return result


# Default policy for employee self-service AI tools
SELF_SERVICE_POLICY = PromptComplianceCheck(
    allowed_classifications=[DataClassification.PUBLIC],
    blocked_patterns=[
        r"password|secret|api[_-]?key|token",
        r"social[_\s]?security|ssn",
        r"credit[_\s]?card",
        r"salary|compensation|payroll",
        r"patient|medical.*record",
    ],
    requires_approval=False,
)

Layer 3: Risk Classification

AI Risk Tiers

TierRisk LevelDescriptionExamplesRequirements
Tier 1MinimalInformational only, no impactInternal summarization, brainstormingSelf-service, basic logging
Tier 2LimitedSupports but doesn't drive decisionsDraft generation, code suggestionsManager approval, bias testing
Tier 3HighInfluences decisions with user impactCustomer chatbots, content publishingLegal review, red teaming
Tier 4CriticalAutomated decisions affecting peopleHiring screening, credit decisionsBoard approval, continuous monitoring

Risk Assessment Questionnaire

RISK_ASSESSMENT_QUESTIONS = [
    {
        "id": "Q1",
        "question": "Does the AI system interact directly with end users?",
        "weights": {"yes": 2, "no": 0},
    },
    {
        "id": "Q2",
        "question": "Can the AI system's outputs cause financial harm if incorrect?",
        "weights": {"yes": 3, "no": 0},
    },
    {
        "id": "Q3",
        "question": "Does the system process personal or sensitive data?",
        "weights": {"yes": 3, "no": 0},
    },
    {
        "id": "Q4",
        "question": "Are the AI's outputs used for decisions about individuals?",
        "weights": {"yes": 4, "no": 0},
    },
    {
        "id": "Q5",
        "question": "Is there a human in the loop reviewing outputs before action?",
        "weights": {"yes": 0, "no": 3},
    },
    {
        "id": "Q6",
        "question": "Can the system be influenced by adversarial inputs?",
        "weights": {"yes": 2, "no": 0},
    },
    {
        "id": "Q7",
        "question": "Is the system used in a regulated domain (healthcare, finance, etc.)?",
        "weights": {"yes": 3, "no": 0},
    },
]

def calculate_risk_tier(answers: dict[str, str]) -> tuple[str, int]:
    """Calculate the risk tier based on assessment answers."""
    total_score = 0
    for question in RISK_ASSESSMENT_QUESTIONS:
        answer = answers.get(question["id"], "no")
        total_score += question["weights"].get(answer, 0)

    if total_score <= 2:
        return "Tier 1 (Minimal)", total_score
    elif total_score <= 5:
        return "Tier 2 (Limited)", total_score
    elif total_score <= 9:
        return "Tier 3 (High)", total_score
    else:
        return "Tier 4 (Critical)", total_score

Layer 4: Technical Controls

Prompt Filtering and Guardrails

class PromptGuardrail:
    """Technical enforcement of governance policies."""

    def __init__(self):
        self.pii_detector = PIIDetector()
        self.toxicity_detector = ToxicityDetector()
        self.injection_detector = InjectionDetector()

    async def check_prompt(self, prompt: str, user_id: str) -> dict:
        """Run all guardrails on a prompt before it reaches the LLM."""
        checks = {
            "pii": await self._check_pii(prompt),
            "toxicity": await self._check_toxicity(prompt),
            "injection": await self._check_injection(prompt),
            "rate_limit": await self._check_rate_limit(user_id),
            "allowed_model": await self._check_model_access(user_id),
        }

        violations = [k for k, v in checks.items() if not v["passed"]]

        return {
            "allowed": len(violations) == 0,
            "checks": checks,
            "violations": violations,
            "user_id": user_id,
            "timestamp": datetime.utcnow().isoformat(),
        }

    async def _check_pii(self, prompt: str) -> dict:
        detected = self.pii_detector.find(prompt)
        return {
            "passed": len(detected) == 0,
            "detected": detected,
            "action": "block_and_redact" if detected else "allow",
        }

    async def _check_injection(self, prompt: str) -> dict:
        risk_score = self.injection_detector.score(prompt)
        return {
            "passed": risk_score < 0.7,
            "risk_score": risk_score,
            "action": "block" if risk_score >= 0.7 else "allow",
        }

Output Validation

class OutputGuardrail:
    """Validate LLM outputs before delivering them to users."""

    def __init__(self):
        self.hallucination_detector = HallucinationDetector()
        self.fact_checker = FactChecker(knowledge_base="company_docs")
        self.compliance_checker = ComplianceChecker()

    async def validate_output(
        self, prompt: str, output: str, context: dict = None
    ) -> dict:
        validation = {
            "hallucination_risk": await self._check_hallucination(prompt, output),
            "factual_accuracy": await self._check_facts(output),
            "compliance": await self._check_compliance(output),
            "safety": await self._check_safety(output),
        }

        risk_level = self._compute_risk_level(validation)
        validation["risk_level"] = risk_level
        validation["action"] = self._determine_action(risk_level)

        return validation

    def _compute_risk_level(self, validation: dict) -> str:
        scores = {
            "hallucination_risk": validation["hallucination_risk"]["score"],
            "factual_accuracy": 1 - validation["factual_accuracy"]["accuracy_score"],
            "compliance": 1 - validation["compliance"]["compliance_score"],
            "safety": 1 - validation["safety"]["safety_score"],
        }

        max_score = max(scores.values())
        avg_score = sum(scores.values()) / len(scores)

        if max_score > 0.8:
            return "critical"
        elif max_score > 0.6 or avg_score > 0.4:
            return "high"
        elif max_score > 0.4 or avg_score > 0.25:
            return "medium"
        else:
            return "low"

    def _determine_action(self, risk_level: str) -> str:
        actions = {
            "critical": "block_and_escalate",
            "high": "require_human_review",
            "medium": "flag_for_review",
            "low": "allow",
        }
        return actions[risk_level]

Layer 5: Monitoring & Audit

Compliance Dashboard

class GovernanceDashboard:
    """Track governance compliance across all AI systems."""

    def __init__(self, audit_log: AuditLog):
        self.audit_log = audit_log

    async def generate_report(self, period: str = "last_30_days") -> dict:
        logs = await self.audit_log.query(period)

        return {
            "summary": {
                "total_ai_requests": len(logs),
                "total_violations": len([l for l in logs if l.violation]),
                "violation_rate": len([l for l in logs if l.violation]) / len(logs),
                "blocked_prompts": len([l for l in logs if l.action == "blocked"]),
                "human_reviews_triggered": len([l for l in logs if l.action == "human_review"]),
            },
            "by_system": self._group_by_system(logs),
            "by_risk_tier": self._group_by_risk_tier(logs),
            "top_violations": self._top_violations(logs),
            "trend": self._compliance_trend(logs),
            "recommendations": self._generate_recommendations(logs),
        }

    def _top_violations(self, logs: list) -> list[dict]:
        from collections import Counter
        violation_types = Counter(l.violation_type for l in logs if l.violation)
        return [
            {"type": vtype, "count": count, "percentage": count / len(logs) * 100}
            for vtype, count in violation_types.most_common(10)
        ]

    def _generate_recommendations(self, logs: list) -> list[str]:
        recommendations = []

        # High violation rate
        violation_rate = sum(1 for l in logs if l.violation) / len(logs)
        if violation_rate > 0.05:
            recommendations.append(
                "Violation rate exceeds 5%. Review training materials and access controls."
            )

        # Repeated offenders
        user_violations = Counter(l.user_id for l in logs if l.violation)
        repeat_offenders = [u for u, c in user_violations.items() if c > 5]
        if repeat_offenders:
            recommendations.append(
                f"{len(repeat_offenders)} users have 5+ violations. Schedule targeted training."
            )

        # System-specific issues
        system_violations = Counter(l.system_name for l in logs if l.violation)
        for system, count in system_violations.items():
            if count > 100:
                recommendations.append(
                    f"System '{system}' has {count} violations. Conduct a full audit."
                )

        return recommendations

Audit Trail

Every AI interaction should be logged for compliance:

@dataclass
class AuditEntry:
    timestamp: str
    user_id: str
    system_name: str
    risk_tier: str
    prompt_hash: str  # Hash for privacy, raw text stored separately with access controls
    model_used: str
    output_hash: str
    tokens_used: int
    cost_usd: float
    guardrail_results: dict
    action_taken: str  # allowed, blocked, flagged, human_review
    human_reviewer: str | None
    human_review_decision: str | None
    compliance_status: str  # compliant, violation, pending_review


class AuditLog:
    def __init__(self, database):
        self.db = database

    async def log(self, entry: AuditEntry):
        await self.db.insert("ai_audit_log", entry)

    async def query(self, period: str, filters: dict = None) -> list[AuditEntry]:
        query = f"SELECT * FROM ai_audit_log WHERE timestamp >= {period}"
        if filters:
            for key, value in filters.items():
                query += f" AND {key} = '{value}'"
        return await self.db.execute(query)

Regulatory Compliance Mapping

RegulationKey RequirementHow to Comply
EU AI ActRisk classification, transparency, human oversightImplement risk tiers, disclose AI use, maintain human review
GDPRData minimization, right to explanationLog data flows, provide output explanations, enable data deletion
SOC 2Access controls, monitoring, audit trailsImplement guardrails, maintain audit logs, regular reviews
HIPAA (healthcare)PHI protectionBlock PHI in prompts, use HIPAA-compliant providers, BAAs
CCPA (California)Consumer rights, transparencyDisclose AI use in consumer interactions, honor opt-out
NYC Bias LawBias auditing for employment AIRegular bias testing, impact assessments for hiring tools

Building a Governance Program

Phase 1: Foundation (Weeks 1-4)

  1. Appoint an AI Governance Lead (or committee)
  2. Draft core AI principles and usage policy
  3. Inventory all existing AI tool usage
  4. Identify highest-risk use cases

Phase 2: Controls (Weeks 5-12)

  1. Deploy prompt/output guardrails
  2. Implement audit logging
  3. Create risk assessment process
  4. Launch employee AI training

Phase 3: Maturation (Weeks 13-24)

  1. Establish continuous monitoring
  2. Build compliance dashboards
  3. Conduct first AI risk audit
  4. Create incident response procedures

Phase 4: Optimization (Ongoing)

  1. Regular policy reviews and updates
  2. Automated compliance testing
  3. Industry benchmarking
  4. Regulatory change monitoring

Cross-References

Related docs

Related agents