Fundamentals

Open Source vs Closed Models

Comprehensive comparison of open-weight and closed API models — trade-offs in capability, cost, privacy, customization, and selection guidance

Published: 2026-04-19 · Last updated: 2026-04-19

Open Source vs Closed Models

The choice between open-weight (open-source) and closed (proprietary API) models is one of the most consequential architectural decisions in any LLM project. This guide provides a comprehensive comparison to help you make an informed choice based on your specific requirements for capability, cost, privacy, customization, and operational complexity.

Defining the Terms

Open-Weight Models

Open-weight models publish their trained parameters, allowing anyone to download, inspect, modify, and deploy them. The term "open-weight" is more accurate than "open-source" since the training data and code are not always available.

Notable open-weight model families:

Model Family	Publisher	Parameter Sizes	License	Best Known For
Llama 3.x	Meta	8B, 70B, 405B	Custom (commercial use allowed)	General capability, ecosystem
Mistral / Mixtral	Mistral AI	7B, 8x7B, 8x22B	Apache 2.0	Efficiency, MoE architecture
Qwen 2.5	Alibaba	0.5B-72B	Apache 2.0	Multilingual, coding
Gemma 3	Google	1B, 4B, 12B, 27B	Custom (commercial use allowed)	Efficiency at small sizes
DeepSeek-V3	DeepSeek	671B (MoE)	Custom	Reasoning, coding
Phi-3/4	Microsoft	3.8B, 14B	MIT	Small model performance

Closed (Proprietary) Models

Closed models are accessible only via API. Their weights, architecture details, and training data are trade secrets.

Notable closed model providers:

Provider	Model Family	Access	Pricing Model	Best Known For
OpenAI	GPT-4.x series	API, Azure	Per-token	General capability, tool use
Anthropic	Claude 3.x/4.x series	API	Per-token	Safety, long context, writing
Google	Gemini 2.x series	API, Vertex AI	Per-token	Multimodal, Google integration
Cohere	Command R+/R	API	Per-token	RAG, enterprise features
xAI	Grok series	API	Per-token	Real-time data access

Capability Comparison

Benchmark Performance (April 2026)

Model	Type	MMLU	HumanEval	GSM8K	IFEval	Context Length
GPT-4.1	Closed	88.0	84.1	94.3	87.5	1M
Claude Sonnet 4	Closed	87.5	78.2	92.1	91.0	200K
Gemini 2.5 Pro	Closed	86.8	80.5	93.0	88.2	1M
Llama 3.1 405B	Open	85.2	76.8	90.5	83.0	128K
DeepSeek-V3	Open	84.5	75.3	89.8	82.5	128K
Mistral Large 2	Open	83.0	72.1	88.2	81.0	128K
Llama 3.1 70B	Open	82.0	72.5	87.2	84.0	128K
Qwen 2.5 72B	Open	81.5	74.0	86.8	80.5	128K

Key Observations

Frontier closed models still lead on most benchmarks, but the gap is narrowing — especially in the 70B+ open-weight tier
Open models at 70B+ parameters are competitive with closed models from 6-12 months ago
Small open models (7B-14B) excel at narrow, fine-tunable tasks but struggle with general reasoning
Closed models often have superior tool use and function calling capabilities out of the box

Cost Analysis

API Models: Pay Per Use

# Monthly cost estimate for API usage
def api_monthly_cost(
    daily_requests: int,
    avg_input_tokens: int,
    avg_output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    daily_input_cost = (daily_requests * avg_input_tokens / 1_000_000) * input_price_per_m
    daily_output_cost = (daily_requests * avg_output_tokens / 1_000_000) * output_price_per_m
    return (daily_input_cost + daily_output_cost) * 30

# 100K requests/day, 1K input, 500 output
scenarios = {
    "GPT-4.1 Mini": api_monthly_cost(100_000, 1000, 500, 0.40, 1.60),
    "Claude Haiku 3.5": api_monthly_cost(100_000, 1000, 500, 0.80, 4.00),
    "GPT-4.1": api_monthly_cost(100_000, 1000, 500, 2.00, 8.00),
    "Claude Sonnet 4": api_monthly_cost(100_000, 1000, 500, 3.00, 15.00),
}

for model, cost in scenarios.items():
    print(f"{model}: ${cost:,.2f}/month")
# GPT-4.1 Mini: $3,600.00/month
# Claude Haiku 3.5: $7,200.00/month
# GPT-4.1: $15,000.00/month
# Claude Sonnet 4: $22,500.00/month

Self-Hosted Open Models: Fixed Infrastructure Cost

def self_hosted_monthly_cost(
    gpu_type: str,
    num_gpus: int,
    gpu_hourly_rate: float,
    utilization: float = 0.7,
) -> dict:
    """Estimate monthly cost of self-hosting an LLM."""
    hours_per_month = 730  # average
    active_hours = hours_per_month * utilization
    compute_cost = num_gpus * gpu_hourly_rate * active_hours

    # Additional costs
    storage_cost = num_gpus * 50  # $50/GPU-month for model storage
    network_cost = num_gpus * 0.10 * active_hours  # bandwidth
    engineering_overhead = 5000  # MLOps engineer time (rough)

    total = compute_cost + storage_cost + network_cost + engineering_overhead
    return {
        "compute": compute_cost,
        "storage": storage_cost,
        "network": network_cost,
        "engineering": engineering_overhead,
        "total": total,
    }

# Hosting Llama 3.1 70B on 4x H100s
costs = self_hosted_monthly_cost("H100", 4, 2.50)
print(f"Total self-hosted cost: ${costs['total']:,.2f}/month")
# ~$15,400/month (compute + engineering overhead)

Breakeven Analysis

Monthly requests at which self-hosting becomes cheaper than API:

                    GPT-4.1 Mini    GPT-4.1       Claude Sonnet 4
Llama 70B 4xH100    ~425K/day       ~100K/day     ~68K/day
Llama 405B 8xH100   ~1.2M/day       ~285K/day     ~195K/day

Rule of thumb: If you process more than 100K-200K requests per day with a mid-tier model, self-hosting open models often becomes cost-effective.

See Model Comparison Guide for detailed cost comparison methodology.

Privacy and Data Security

Closed API Models

Aspect	Typical Policy	Enterprise Option
Data retention	30 days for abuse monitoring	Zero-retention available
Model training on your data	Opt-out required (varies by provider)	Contractually guaranteed no-training
SOC 2 / HIPAA	Available on enterprise tiers	Full compliance packages
Data residency	Limited regions	Multi-region with VPC peering
Audit logging	Available via dashboard	API-accessible, SIEM integration

Self-Hosted Open Models

Aspect	Capability
Data retention	Full control — data never leaves your infrastructure
Model training on your data	Impossible unless you choose to
SOC 2 / HIPAA	Your responsibility to implement
Data residency	Anywhere you deploy
Audit logging	Full infrastructure-level logging available
Air-gapped deployment	Fully supported

Compliance Decision Matrix

privacy_requirements:
  healthcare_phi_hipaa:
    closed: "Requires BAA with provider; verify zero-retention"
    open: "Preferred — full data control, but you carry compliance burden"
  financial_pci_gdpr:
    closed: "Available with enterprise agreements; check data residency"
    open: "Preferred for EU data residency requirements"
  government_ilr5:
    closed: "Limited — only providers with govcloud offerings"
    open: "Preferred — can deploy in classified environments"
  startup_mvp:
    closed: "Fine — standard API terms are acceptable for prototypes"
    open: "Consider if you have ML infra expertise on team"

Customization and Fine-Tuning

Closed Models

Fine-tuning options are limited and provider-specific:

Provider	Fine-Tuning Method	Supported Models	Max Training Examples
OpenAI	Supervised fine-tuning	GPT-4.1 Mini, Nano	~10K-100K
Anthropic	Model distillation (indirect)	Claude Haiku	N/A
Google	Tuning via Vertex AI	Gemini Pro	~10K
Cohere	Fine-tuning	Command R+	~50K

Limitations:

Cannot modify architecture or training process
Limited control over training hyperparameters
Fine-tuned models are only accessible via the same API
No ability to merge multiple fine-tuned models
Risk of provider discontinuing the base model

Open Models

Full customization freedom:

# Fine-tune Llama 3.1 70B with QLoRA using Unsloth
pip install unsloth transformers peft bitsandbytes

# Example: Fine-tune on custom instruction dataset
python finetune.py \
    --model_name "meta-llama/Llama-3.1-70B" \
    --dataset "my_org/customer_support_v2" \
    --lora_rank 64 \
    --learning_rate 2e-4 \
    --epochs 3 \
    --batch_size 4 \
    --gradient_accumulation 8 \
    --max_seq_length 4096 \
    --output_dir "./outputs/support-finetuned"

Capabilities exclusive to open models:

Full fine-tuning on any dataset with any hyperparameters
LoRA/QLoRA adapters for task-specific behavior without full retraining
Architecture modifications (attention variants, new layers)
Model merging (combining multiple fine-tuned adapters)
Continued pre-training on domain-specific corpora
Distillation to smaller models for edge deployment
Quantization to any precision (INT4, INT8, FP8)

For detailed fine-tuning guidance, see Fine-Tuning with LoRA/QLoRA and LLM Fine-Tuning Data Preparation.

Operational Complexity

Closed API Models: Low Operational Burden

# Minimal setup — just an API key
from openai import OpenAI

client = OpenAI(api_key="sk-...")
response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

What you DON'T need to manage:

GPU infrastructure
Model loading and caching
Scaling and load balancing
Model updates and patches
Quantization and optimization

What you DO need to manage:

API key rotation and access control
Rate limiting and quota monitoring
Fallback logic for API outages
Cost monitoring and alerting
Prompt versioning and management

Self-Hosted Open Models: High Operational Burden

# Typical infrastructure stack for self-hosting
infrastructure:
  compute:
    - "GPU instances (H100, A100, L40S, or consumer GPUs)"
    - "CPU instances for preprocessing and API layer"
    - "Load balancer for multi-replica deployment"
  serving:
    options:
      - "vLLM — high-throughput, PagedAttention"
      - "TGI (Text Generation Inference) — HuggingFace official"
      - "SGLang — advanced serving with RadixAttention"
      - "TensorRT-LLM — NVIDIA optimized"
  monitoring:
    - "Prometheus + Grafana for metrics"
    - "ELK stack for logs"
    - "Custom quality monitoring pipelines"
  scaling:
    - "Kubernetes with GPU node pools"
    - "KEDA for event-driven autoscaling"
    - "Horizontal Pod Autoscaler (HPA)"

Operational Effort Comparison

Task	API Models	Self-Hosted
Initial setup	1 hour	1-4 weeks
Ongoing maintenance	2 hours/week	10-20 hours/week
Scaling to 10x traffic	Change plan / contact sales	Provision GPUs, test, deploy
Model updates	Automatic	Manual download, test, deploy
Incident response	Provider's responsibility	Your team's responsibility
Required team skills	Backend engineering	Backend + MLOps + GPU infra

Decision Framework

Choose Closed API Models When

You're building an MVP or prototype — speed to market matters most
Your volume is moderate (< 50K-100K requests/day)
You need best-in-class capability without fine-tuning
Your team lacks ML/GPU expertise — no dedicated infra team
Your data can leave your infrastructure — no strict air-gap requirements
You need advanced features (tool use, web search, vision) out of the box

Choose Open-Weight Models When

Your volume is high — self-hosting is more cost-effective at scale
Data privacy is paramount — healthcare, finance, government
You need deep customization — fine-tuning, architecture changes
You have ML infrastructure expertise — or are willing to build it
You need predictable costs — fixed infrastructure vs variable API
Regulatory compliance requires it — EU AI Act, data sovereignty laws
You want to avoid vendor lock-in — portable models and weights

Hybrid Approach

Many production systems use both:

hybrid_architecture:
  primary:
    model: "GPT-4.1 Mini (API)"
    use_case: "General queries, complex reasoning"
    fallback: "Claude Haiku 3.5"
  secondary:
    model: "Llama 3.1 70B (self-hosted)"
    use_case: "PII-containing requests, high-volume simple tasks"
  routing:
    logic: "Classify request -> check PII -> route accordingly"
    implementation: "Lightweight classifier or rule-based router"

This approach balances capability, cost, and privacy while providing redundancy against provider outages.

Cross-References

Model Comparison Guide — Systematic methodology for comparing any LLMs
LLM Fine-Tuning Data Preparation — Prepare datasets for fine-tuning open models
Fine-Tuning with LoRA/QLoRA — Efficient fine-tuning techniques for open-weight models
Cost Management & Optimization — Reduce costs regardless of model type
Deployment Strategies for Production — Production deployment patterns

Summary

Dimension	Closed API	Open-Weight Self-Hosted	Winner
Raw capability	Leading edge	3-12 months behind	Closed
Cost at low volume	Very low	High (fixed overhead)	Closed
Cost at high volume	Linear growth	Flat after infra	Open
Data privacy	Contractual	Absolute	Open
Customization	Limited	Unlimited	Open
Setup speed	Minutes	Weeks	Closed
Operational burden	Low	High	Closed
Vendor lock-in risk	High	Low	Open
Feature breadth	Broad	Narrow (DIY)	Closed

The best choice depends entirely on your specific constraints. Many mature teams evolve from API models (fast start) to self-hosted open models (cost control and customization) as their scale and expertise grow.

Related docs

Goose Guide

A guide to evaluating Goose for extensible terminal-agent workflows and tool-connected execution.

OpenCode Guide

A guide to using OpenCode for terminal-native, provider-flexible coding workflows.

Evaluation Metrics and Benchmarks

How to measure LLM capability — from academic benchmarks (MMLU, GSM8K, HumanEval) to practical evaluation pipelines for production systems

Related models

Enterprises power agentic workflows in Cloudflare Agent Cloud with OpenAI

Cloudflare brings OpenAI’s GPT-5.4 and Codex to Agent Cloud, enabling enterprises to build, deploy, and scale AI agents for real-world tasks with speed and security.

AI fundamentals

Learn what AI is, how it works, and how tools like ChatGPT use large language models. A clear, beginner-friendly guide to understanding artificial intelligence.

Analyzing data with ChatGPT

Learn how to analyze data with ChatGPT by exploring datasets, generating insights, creating visualizations, and turning findings into actionable decisions.

Related agents

Goose

Block's open-source AI agent with CLI support, extensible tools, and workflows that go beyond simple code suggestion.

OpenCode

An open-source AI coding agent built for the terminal, with multi-session workflows, LSP support, and broad model-provider flexibility.