Topic Hub

Optimization

40 linked pages across the LLM-Docs library.

doc

LLM Latency Optimization

Achieving sub-second LLM latency — speculative decoding, model parallelism, prefill optimization, and real-time serving patterns

doc

Attention Mechanism Variants

A deep technical survey of attention variants — from scaled dot-product to FlashAttention, linear attention, and state space alternatives

doc

Edge and On-Device LLM Inference

Running LLMs on phones, laptops, and IoT devices — model selection, optimization frameworks, and practical deployment guides for edge computing

doc

Cost Management and Optimization

Understanding and controlling LLM costs — token pricing, caching strategies, model selection for budget, and spend tracking at scale

doc

Inference Optimization and Quantization

Comprehensive guide to running LLMs efficiently — quantization methods, memory management, batching strategies, and throughput optimization

doc

Cost Optimization Architecture Patterns

Reference patterns, tradeoffs, and building blocks for designing cost optimization systems.

doc

Cost Optimization Cost and Performance

How to trade off latency, throughput, quality, and spend when operating cost optimization.

doc

Cost Optimization Evaluation Metrics

Metrics, scorecards, and review methods for measuring cost optimization quality in practice.

doc

Cost Optimization Failure Modes

Common failure patterns, debugging workflows, and prevention strategies for cost optimization.

doc

Cost Optimization Foundations

Core concepts, terminology, workflows, and mental models for reducing AI spend without undermining user outcomes or engineering velocity in modern AI systems.

doc

Cost Optimization Implementation Guide

A practical step-by-step guide for implementing cost optimization with production constraints in mind.

doc

Cost Optimization Production Checklist

Deployment checklist, operational controls, and rollout guidance for cost optimization workloads.

doc

Cost Optimization Vendor Landscape

How vendors, open-source options, and ecosystem tools compare for cost optimization use cases.

doc

Knowledge Distillation Architecture Patterns

Reference patterns, tradeoffs, and building blocks for designing knowledge distillation systems.

doc

Knowledge Distillation Cost and Performance

How to trade off latency, throughput, quality, and spend when operating knowledge distillation.

doc

Knowledge Distillation Evaluation Metrics

Metrics, scorecards, and review methods for measuring knowledge distillation quality in practice.

doc

Knowledge Distillation Failure Modes

Common failure patterns, debugging workflows, and prevention strategies for knowledge distillation.

doc

Knowledge Distillation Foundations

Core concepts, terminology, workflows, and mental models for compressing capabilities from larger models into smaller and cheaper ones in modern AI systems.

doc

Knowledge Distillation Implementation Guide

A practical step-by-step guide for implementing knowledge distillation with production constraints in mind.

doc

Knowledge Distillation Production Checklist

Deployment checklist, operational controls, and rollout guidance for knowledge distillation workloads.

doc

Knowledge Distillation Vendor Landscape

How vendors, open-source options, and ecosystem tools compare for knowledge distillation use cases.

doc

Long-Context Systems Architecture Patterns

Reference patterns, tradeoffs, and building blocks for designing long-context systems.

doc

Long-Context Systems Cost and Performance

How to trade off latency, throughput, quality, and spend when operating long-context systems.

doc

Long-Context Systems Evaluation Metrics

Metrics, scorecards, and review methods for measuring long-context systems quality in practice.

doc

Long-Context Systems Failure Modes

Common failure patterns, debugging workflows, and prevention strategies for long-context systems.

doc

Long-Context Systems Foundations

Core concepts, terminology, workflows, and mental models for working with very large prompts and documents without losing relevance or speed in modern AI systems.

doc

Long-Context Systems Implementation Guide

A practical step-by-step guide for implementing long-context systems with production constraints in mind.

doc

Long-Context Systems Production Checklist

Deployment checklist, operational controls, and rollout guidance for long-context systems workloads.

doc

Long-Context Systems Vendor Landscape

How vendors, open-source options, and ecosystem tools compare for long-context systems use cases.

doc

Quantization Architecture Patterns

Reference patterns, tradeoffs, and building blocks for designing quantization systems.

doc

Quantization Cost and Performance

How to trade off latency, throughput, quality, and spend when operating quantization.

doc

Quantization Evaluation Metrics

Metrics, scorecards, and review methods for measuring quantization quality in practice.

doc

Quantization Failure Modes

Common failure patterns, debugging workflows, and prevention strategies for quantization.

doc

Quantization Foundations

Core concepts, terminology, workflows, and mental models for reducing model memory and compute requirements while preserving useful quality in modern AI systems.

doc

Quantization Implementation Guide

A practical step-by-step guide for implementing quantization with production constraints in mind.

doc

Quantization Production Checklist

Deployment checklist, operational controls, and rollout guidance for quantization workloads.

doc

Quantization Vendor Landscape

How vendors, open-source options, and ecosystem tools compare for quantization use cases.

doc

Speculative Decoding and Generation Optimization

Speeding up LLM generation — speculative decoding, cache optimization, batched inference, and throughput maximization techniques

doc

Model Training and Pre-training

The complete LLM training pipeline — data preparation, distributed training, optimization techniques, and checkpoint management

doc

Model Scaling Laws

Understanding the mathematical relationships between model size, data, compute, and performance — Kaplan, Chinchilla, and modern scaling research