Open Source vs Closed Models

Comprehensive comparison of open-weight and closed API models — trade-offs in capability, cost, privacy, customization, and selection guidance

Published: 2026-04-19 · Last updated: 2026-04-19

The choice between open-weight (open-source) and closed (proprietary API) models is one of the most consequential architectural decisions in any LLM project. This guide provides a comprehensive comparison to help you make an informed choice based on your specific requirements for capability, cost, privacy, customization, and operational complexity.

Defining the Terms

Open-Weight Models

Open-weight models publish their trained parameters, allowing anyone to download, inspect, modify, and deploy them. The term "open-weight" is more accurate than "open-source" since the training data and code are not always available.

Notable open-weight model families:

Model Family       Publisher    Parameter Sizes    License                          Best Known For
Llama 3.x          Meta         8B, 70B, 405B      Custom (commercial use allowed)  General capability, ecosystem
Mistral / Mixtral  Mistral AI   7B, 8x7B, 8x22B    Apache 2.0                       Efficiency, MoE architecture
Qwen 2.5           Alibaba      0.5B-72B           Apache 2.0                       Multilingual, coding
Gemma 3            Google       1B, 4B, 12B, 27B   Custom (commercial use allowed)  Efficiency at small sizes
DeepSeek-V3        DeepSeek     671B (MoE)         Custom                           Reasoning, coding
Phi-3/4            Microsoft    3.8B, 14B          MIT                              Small model performance

Closed (Proprietary) Models

Closed models are accessible only via API. Their weights, architecture details, and training data are trade secrets.

Notable closed model providers:

Provider   Model Family           Access          Pricing Model  Best Known For
OpenAI     GPT-4.x series         API, Azure      Per-token      General capability, tool use
Anthropic  Claude 3.x/4.x series  API             Per-token      Safety, long context, writing
Google     Gemini 2.x series      API, Vertex AI  Per-token      Multimodal, Google integration
Cohere     Command R+/R           API             Per-token      RAG, enterprise features
xAI        Grok series            API             Per-token      Real-time data access

Capability Comparison

Benchmark Performance (April 2026)

Model            Type    MMLU  HumanEval  GSM8K  IFEval  Context Length
GPT-4.1          Closed  88.0  84.1       94.3   87.5    1M
Claude Sonnet 4  Closed  87.5  78.2       92.1   91.0    200K
Gemini 2.5 Pro   Closed  86.8  80.5       93.0   88.2    1M
Llama 3.1 405B   Open    85.2  76.8       90.5   83.0    128K
DeepSeek-V3      Open    84.5  75.3       89.8   82.5    128K
Mistral Large 2  Open    83.0  72.1       88.2   81.0    128K
Llama 3.1 70B    Open    82.0  72.5       87.2   84.0    128K
Qwen 2.5 72B     Open    81.5  74.0       86.8   80.5    128K

Key Observations

  1. Frontier closed models still lead on most benchmarks, but the gap is narrowing — especially in the 70B+ open-weight tier
  2. Open models at 70B+ parameters are competitive with closed models from 6-12 months ago
  3. Small open models (7B-14B) excel at narrow, fine-tunable tasks but struggle with general reasoning
  4. Closed models often have superior tool use and function calling capabilities out of the box

Cost Analysis

API Models: Pay Per Use

# Monthly cost estimate for API usage
def api_monthly_cost(
    daily_requests: int,
    avg_input_tokens: int,
    avg_output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    daily_input_cost = (daily_requests * avg_input_tokens / 1_000_000) * input_price_per_m
    daily_output_cost = (daily_requests * avg_output_tokens / 1_000_000) * output_price_per_m
    return (daily_input_cost + daily_output_cost) * 30

# 100K requests/day, 1K input, 500 output
scenarios = {
    "GPT-4.1 Mini": api_monthly_cost(100_000, 1000, 500, 0.40, 1.60),
    "Claude Haiku 3.5": api_monthly_cost(100_000, 1000, 500, 0.80, 4.00),
    "GPT-4.1": api_monthly_cost(100_000, 1000, 500, 2.00, 8.00),
    "Claude Sonnet 4": api_monthly_cost(100_000, 1000, 500, 3.00, 15.00),
}

for model, cost in scenarios.items():
    print(f"{model}: ${cost:,.2f}/month")
# GPT-4.1 Mini: $3,600.00/month
# Claude Haiku 3.5: $8,400.00/month
# GPT-4.1: $18,000.00/month
# Claude Sonnet 4: $31,500.00/month

Self-Hosted Open Models: Fixed Infrastructure Cost

def self_hosted_monthly_cost(
    gpu_type: str,
    num_gpus: int,
    gpu_hourly_rate: float,
    utilization: float = 0.7,
) -> dict:
    """Estimate monthly cost of self-hosting an LLM."""
    hours_per_month = 730  # average
    active_hours = hours_per_month * utilization
    compute_cost = num_gpus * gpu_hourly_rate * active_hours

    # Additional costs
    storage_cost = num_gpus * 50  # $50/GPU-month for model storage
    network_cost = num_gpus * 0.10 * active_hours  # bandwidth
    engineering_overhead = 5000  # MLOps engineer time (rough)

    total = compute_cost + storage_cost + network_cost + engineering_overhead
    return {
        "compute": compute_cost,
        "storage": storage_cost,
        "network": network_cost,
        "engineering": engineering_overhead,
        "total": total,
    }

# Hosting Llama 3.1 70B on 4x H100s
costs = self_hosted_monthly_cost("H100", 4, 2.50)
print(f"Total self-hosted cost: ${costs['total']:,.2f}/month")
# ~$10,500/month (compute + storage + network + engineering overhead)

Breakeven Analysis

Approximate daily request volume at which self-hosting becomes cheaper than the API, derived from the two estimators above (1K input / 500 output tokens per request):

                     GPT-4.1 Mini    GPT-4.1       Claude Sonnet 4
Llama 70B, 4x H100   ~290K/day       ~58K/day      ~33K/day
Llama 405B, 8x H100  ~445K/day       ~89K/day      ~51K/day

Rule of thumb: if you sustain more than roughly 50K-100K requests per day against a frontier-tier API model (or several hundred thousand per day against a budget tier), self-hosting open models often becomes cost-effective.
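
Breakeven points like these come from equating a fixed monthly hosting cost with monthly API spend. A minimal sketch, assuming a 30-day month; the $12,000/month hosting figure is a hypothetical illustration, and the per-token prices are the GPT-4.1 rates from the scenarios above:

```python
def breakeven_daily_requests(
    self_hosted_monthly: float,
    avg_input_tokens: int,
    avg_output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Daily request volume at which a fixed hosting cost equals monthly API spend."""
    # Monthly API cost contributed by a single request per day (30-day month)
    monthly_cost_per_daily_request = 30 * (
        avg_input_tokens / 1_000_000 * input_price_per_m
        + avg_output_tokens / 1_000_000 * output_price_per_m
    )
    return self_hosted_monthly / monthly_cost_per_daily_request

# Hypothetical $12,000/month cluster vs GPT-4.1 at $2/$8 per M tokens
print(f"{breakeven_daily_requests(12_000, 1000, 500, 2.00, 8.00):,.0f} requests/day")
```

Plugging in your own infrastructure estimate and price tier reproduces the table above.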

See Model Comparison Guide for detailed cost comparison methodology.

Privacy and Data Security

Closed API Models

Aspect                       Typical Policy                         Enterprise Option
Data retention               30 days for abuse monitoring           Zero-retention available
Model training on your data  Opt-out required (varies by provider)  Contractually guaranteed no-training
SOC 2 / HIPAA                Available on enterprise tiers          Full compliance packages
Data residency               Limited regions                        Multi-region with VPC peering
Audit logging                Available via dashboard                API-accessible, SIEM integration

Self-Hosted Open Models

Aspect                       Capability
Data retention               Full control; data never leaves your infrastructure
Model training on your data  Never happens unless you run the training yourself
SOC 2 / HIPAA                Your responsibility to implement
Data residency               Anywhere you deploy
Audit logging                Full infrastructure-level logging available
Air-gapped deployment        Fully supported

Compliance Decision Matrix

privacy_requirements:
  healthcare_phi_hipaa:
    closed: "Requires BAA with provider; verify zero-retention"
    open: "Preferred — full data control, but you carry compliance burden"
  financial_pci_gdpr:
    closed: "Available with enterprise agreements; check data residency"
    open: "Preferred for EU data residency requirements"
  government_il5:
    closed: "Limited — only providers with govcloud offerings"
    open: "Preferred — can deploy in classified environments"
  startup_mvp:
    closed: "Fine — standard API terms are acceptable for prototypes"
    open: "Consider if you have ML infra expertise on team"

Customization and Fine-Tuning

Closed Models

Fine-tuning options are limited and provider-specific:

Provider   Fine-Tuning Method             Supported Models    Max Training Examples
OpenAI     Supervised fine-tuning         GPT-4.1 Mini, Nano  ~10K-100K
Anthropic  Model distillation (indirect)  Claude Haiku        N/A
Google     Tuning via Vertex AI           Gemini Pro          ~10K
Cohere     Fine-tuning                    Command R+          ~50K

Limitations:

  • Cannot modify architecture or training process
  • Limited control over training hyperparameters
  • Fine-tuned models are only accessible via the same API
  • No ability to merge multiple fine-tuned models
  • Risk of provider discontinuing the base model

Open Models

Full customization freedom:

# Fine-tune Llama 3.1 70B with QLoRA using Unsloth
pip install unsloth transformers peft bitsandbytes

# Example: Fine-tune on custom instruction dataset
python finetune.py \
    --model_name "meta-llama/Llama-3.1-70B" \
    --dataset "my_org/customer_support_v2" \
    --lora_rank 64 \
    --learning_rate 2e-4 \
    --epochs 3 \
    --batch_size 4 \
    --gradient_accumulation 8 \
    --max_seq_length 4096 \
    --output_dir "./outputs/support-finetuned"

Capabilities exclusive to open models:

  • Full fine-tuning on any dataset with any hyperparameters
  • LoRA/QLoRA adapters for task-specific behavior without full retraining
  • Architecture modifications (attention variants, new layers)
  • Model merging (combining multiple fine-tuned adapters)
  • Continued pre-training on domain-specific corpora
  • Distillation to smaller models for edge deployment
  • Quantization to any precision (INT4, INT8, FP8)
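
Quantization feeds directly into capacity planning. A back-of-the-envelope sketch of the GPU memory needed to serve a model's weights at a given precision; the 20% overhead factor is an assumption covering KV cache, activations, and runtime buffers:

```python
def model_memory_gb(params_billion: float, bits: int, overhead: float = 1.2) -> float:
    """Approximate GPU memory (GB) to serve a model's weights at a given precision."""
    bytes_per_param = bits / 8
    # 1B params * bytes/param = 1 GB per billion params per byte of precision
    return params_billion * bytes_per_param * overhead

for bits in (16, 8, 4):
    print(f"70B at {bits}-bit: ~{model_memory_gb(70, bits):.0f} GB")
# 70B at 16-bit: ~168 GB
```

At 4-bit, a 70B model fits on a single 80 GB GPU with room to spare, which is why aggressive quantization is so central to self-hosted deployment.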

For detailed fine-tuning guidance, see Fine-Tuning with LoRA/QLoRA and LLM Fine-Tuning Data Preparation.

Operational Complexity

Closed API Models: Low Operational Burden

# Minimal setup — just an API key
from openai import OpenAI

client = OpenAI(api_key="sk-...")
response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

What you DON'T need to manage:

  • GPU infrastructure
  • Model loading and caching
  • Scaling and load balancing
  • Model updates and patches
  • Quantization and optimization

What you DO need to manage:

  • API key rotation and access control
  • Rate limiting and quota monitoring
  • Fallback logic for API outages
  • Cost monitoring and alerting
  • Prompt versioning and management
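
Fallback logic for API outages typically wraps the primary client in retries with backoff before degrading to a second provider. A minimal sketch; the callables here are hypothetical stand-ins for real API clients:

```python
import time

def call_with_fallback(primary, fallback, prompt, retries=2, backoff=0.5):
    """Try the primary provider with exponential backoff, then use the fallback."""
    for attempt in range(retries):
        try:
            return primary(prompt)
        except Exception:
            time.sleep(backoff * (2 ** attempt))  # back off before retrying
    return fallback(prompt)  # primary exhausted: degrade to the secondary provider

def flaky_primary(prompt):
    raise TimeoutError("provider outage")  # simulate an API outage

print(call_with_fallback(flaky_primary, lambda p: f"[fallback] {p}", "Hello"))
# [fallback] Hello
```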

Self-Hosted Open Models: High Operational Burden

# Typical infrastructure stack for self-hosting
infrastructure:
  compute:
    - "GPU instances (H100, A100, L40S, or consumer GPUs)"
    - "CPU instances for preprocessing and API layer"
    - "Load balancer for multi-replica deployment"
  serving:
    options:
      - "vLLM — high-throughput, PagedAttention"
      - "TGI (Text Generation Inference) — HuggingFace official"
      - "SGLang — advanced serving with RadixAttention"
      - "TensorRT-LLM — NVIDIA optimized"
  monitoring:
    - "Prometheus + Grafana for metrics"
    - "ELK stack for logs"
    - "Custom quality monitoring pipelines"
  scaling:
    - "Kubernetes with GPU node pools"
    - "KEDA for event-driven autoscaling"
    - "Horizontal Pod Autoscaler (HPA)"

Operational Effort Comparison

Task                    API Models                   Self-Hosted
Initial setup           1 hour                       1-4 weeks
Ongoing maintenance     2 hours/week                 10-20 hours/week
Scaling to 10x traffic  Change plan / contact sales  Provision GPUs, test, deploy
Model updates           Automatic                    Manual download, test, deploy
Incident response       Provider's responsibility    Your team's responsibility
Required team skills    Backend engineering          Backend + MLOps + GPU infra

Decision Framework

Choose Closed API Models When

  1. You're building an MVP or prototype — speed to market matters most
  2. Your volume is moderate (< 50K-100K requests/day)
  3. You need best-in-class capability without fine-tuning
  4. Your team lacks ML/GPU expertise — no dedicated infra team
  5. Your data can leave your infrastructure — no strict air-gap requirements
  6. You need advanced features (tool use, web search, vision) out of the box

Choose Open-Weight Models When

  1. Your volume is high — self-hosting is more cost-effective at scale
  2. Data privacy is paramount — healthcare, finance, government
  3. You need deep customization — fine-tuning, architecture changes
  4. You have ML infrastructure expertise — or are willing to build it
  5. You need predictable costs — fixed infrastructure vs variable API
  6. Regulatory compliance requires it — EU AI Act, data sovereignty laws
  7. You want to avoid vendor lock-in — portable models and weights

Hybrid Approach

Many production systems use both:

hybrid_architecture:
  primary:
    model: "GPT-4.1 Mini (API)"
    use_case: "General queries, complex reasoning"
    fallback: "Claude Haiku 3.5"
  secondary:
    model: "Llama 3.1 70B (self-hosted)"
    use_case: "PII-containing requests, high-volume simple tasks"
  routing:
    logic: "Classify request -> check PII -> route accordingly"
    implementation: "Lightweight classifier or rule-based router"

This approach balances capability, cost, and privacy while providing redundancy against provider outages.
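
The routing layer can start as a simple rule-based classifier. A sketch with hypothetical PII patterns and model names; production systems would use a dedicated PII detection library or model:

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),         # US SSN format
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),  # email address
]

def route(request_text: str) -> str:
    """Send PII-bearing requests to the self-hosted model, the rest to the API."""
    if any(p.search(request_text) for p in PII_PATTERNS):
        return "llama-3.1-70b-self-hosted"
    return "gpt-4.1-mini-api"

print(route("My SSN is 123-45-6789"))          # routed to the self-hosted model
print(route("Summarize this quarter's KPIs"))  # routed to the API
```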

Summary

Dimension            Closed API     Open-Weight Self-Hosted  Winner
Raw capability       Leading edge   3-12 months behind       Closed
Cost at low volume   Very low       High (fixed overhead)    Closed
Cost at high volume  Linear growth  Flat after infra         Open
Data privacy         Contractual    Absolute                 Open
Customization        Limited        Unlimited                Open
Setup speed          Minutes        Weeks                    Closed
Operational burden   Low            High                     Closed
Vendor lock-in risk  High           Low                      Open
Feature breadth      Broad          Narrow (DIY)             Closed

The best choice depends entirely on your specific constraints. Many mature teams evolve from API models (fast start) to self-hosted open models (cost control and customization) as their scale and expertise grow.
