Topic Hub
Optimization
Linked pages across the LLM-Docs library.
doc
LLM Latency Optimization
Achieving sub-second LLM latency — speculative decoding, model parallelism, prefill optimization, and real-time serving patterns.
doc
Attention Mechanisms Variants
A deep technical survey of attention variants — from scaled dot-product to FlashAttention, linear attention, and state space alternatives.
doc
Edge and On-Device LLM Inference
Running LLMs on phones, laptops, and IoT devices — model selection, optimization frameworks, and practical deployment guides for edge computing.
doc
Cost Management and Optimization
Understanding and controlling LLM costs — token pricing, caching strategies, model selection for budget, and spend tracking at scale.
doc
Inference Optimization and Quantization
Comprehensive guide to running LLMs efficiently — quantization methods, memory management, batching strategies, and throughput optimization.
doc
Cost Optimization Architecture Patterns
Reference patterns, tradeoffs, and building blocks for designing cost optimization systems.
doc
Cost Optimization Cost and Performance
How to trade off latency, throughput, quality, and spend when operating cost optimization.
doc
Cost Optimization Evaluation Metrics
Metrics, scorecards, and review methods for measuring cost optimization quality in practice.
doc
Cost Optimization Failure Modes
Common failure patterns, debugging workflows, and prevention strategies for cost optimization.
doc
Cost Optimization Foundations
Core concepts, terminology, workflows, and mental models for reducing AI spend without undermining user outcomes or engineering velocity in modern AI systems.
doc
Cost Optimization Implementation Guide
A practical step-by-step guide for implementing cost optimization with production constraints in mind.
doc
Cost Optimization Production Checklist
Deployment checklist, operational controls, and rollout guidance for cost optimization workloads.
doc
Cost Optimization Vendor Landscape
How vendors, open-source options, and ecosystem tools compare for cost optimization use cases.
doc
Knowledge Distillation Architecture Patterns
Reference patterns, tradeoffs, and building blocks for designing knowledge distillation systems.
doc
Knowledge Distillation Cost and Performance
How to trade off latency, throughput, quality, and spend when operating knowledge distillation.
doc
Knowledge Distillation Evaluation Metrics
Metrics, scorecards, and review methods for measuring knowledge distillation quality in practice.
doc
Knowledge Distillation Failure Modes
Common failure patterns, debugging workflows, and prevention strategies for knowledge distillation.
doc
Knowledge Distillation Foundations
Core concepts, terminology, workflows, and mental models for compressing capabilities from larger models into smaller and cheaper ones in modern AI systems.
doc
Knowledge Distillation Implementation Guide
A practical step-by-step guide for implementing knowledge distillation with production constraints in mind.
doc
Knowledge Distillation Production Checklist
Deployment checklist, operational controls, and rollout guidance for knowledge distillation workloads.
doc
Knowledge Distillation Vendor Landscape
How vendors, open-source options, and ecosystem tools compare for knowledge distillation use cases.
doc
Long-Context Systems Architecture Patterns
Reference patterns, tradeoffs, and building blocks for designing long-context systems.
doc
Long-Context Systems Cost and Performance
How to trade off latency, throughput, quality, and spend when operating long-context systems.
doc
Long-Context Systems Evaluation Metrics
Metrics, scorecards, and review methods for measuring long-context systems quality in practice.
doc
Long-Context Systems Failure Modes
Common failure patterns, debugging workflows, and prevention strategies for long-context systems.
doc
Long-Context Systems Foundations
Core concepts, terminology, workflows, and mental models for working with very large prompts and documents without losing relevance or speed in modern AI systems.
doc
Long-Context Systems Implementation Guide
A practical step-by-step guide for implementing long-context systems with production constraints in mind.
doc
Long-Context Systems Production Checklist
Deployment checklist, operational controls, and rollout guidance for long-context systems workloads.
doc
Long-Context Systems Vendor Landscape
How vendors, open-source options, and ecosystem tools compare for long-context systems use cases.
doc
Quantization Architecture Patterns
Reference patterns, tradeoffs, and building blocks for designing quantization systems.
doc
Quantization Cost and Performance
How to trade off latency, throughput, quality, and spend when operating quantization.
doc
Quantization Evaluation Metrics
Metrics, scorecards, and review methods for measuring quantization quality in practice.
doc
Quantization Failure Modes
Common failure patterns, debugging workflows, and prevention strategies for quantization.
doc
Quantization Foundations
Core concepts, terminology, workflows, and mental models for reducing model memory and compute requirements while preserving useful quality in modern AI systems.
doc
Quantization Implementation Guide
A practical step-by-step guide for implementing quantization with production constraints in mind.
doc
Quantization Production Checklist
Deployment checklist, operational controls, and rollout guidance for quantization workloads.
doc
Quantization Vendor Landscape
How vendors, open-source options, and ecosystem tools compare for quantization use cases.
doc
Speculative Decoding and Generation Optimization
Speeding up LLM generation — speculative decoding, cache optimization, batched inference, and throughput maximization techniques.
doc
Model Training and Pre-training
The complete LLM training pipeline — data preparation, distributed training, optimization techniques, and checkpoint management.
doc
Model Scaling Laws
Understanding the mathematical relationships between model size, data, compute, and performance — Kaplan, Chinchilla, and modern scaling research.