Models

Model tracker

Track leading models by provider, capabilities, release history, and practical use cases.

New Ranking Method

Traceable Capability Density (TCD) ranks LLMs by how complete, current, and operationally legible their tracked entries are.

Instead of chasing one benchmark, LLM-Docs uses a bespoke metric that rewards five things at once: concrete model identity, specification depth, workload reach, release freshness, and deployment confidence. The result is a conference-style grade system built for serious model tracking rather than hype cycles.

Identity Precision

/20

Rewards entries that clearly name a real model family or checkpoint instead of a generic article or use-case page.

Specification Depth

/25

Counts structured signals such as context window, modalities, pricing, tags, and use-case coverage.

Workload Reach

/20

Measures how much practical surface area the model exposes across modality support, context scale, and workload breadth.

Temporal Momentum

/15

Gives more credit to recent releases so the ranking stays tied to the current frontier.

Deployment Confidence

/20

Rewards active, clearly tracked entries and penalizes weak or auto-detected records with thin operational detail.
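As an illustration of how two of these components could be scored, here is a minimal Python sketch of Specification Depth and Temporal Momentum. The page does not publish its formulas, so the even 5-points-per-signal split, the linear decay, the 365-day horizon, and the field names (`context_window`, `use_cases`, and so on) are all assumptions. Only the 25-point and 15-point maxima come from the cards above.

```python
from datetime import date

# Structured signals named on the Specification Depth card.
# The dictionary keys are hypothetical field names, not a published schema.
SPEC_SIGNALS = ("context_window", "modalities", "pricing", "tags", "use_cases")


def is_populated(value) -> bool:
    """Treat empty values and the "Not set" placeholder as missing."""
    return value not in (None, "", [], {}, "Not set")


def specification_depth(entry: dict) -> float:
    """Award 5 points per populated signal, 25 at most (even split is assumed)."""
    populated = sum(1 for key in SPEC_SIGNALS if is_populated(entry.get(key)))
    return 5.0 * populated


def temporal_momentum(released: date, today: date, horizon_days: int = 365) -> float:
    """Decay linearly from 15 points at release to 0 at horizon_days (assumed shape)."""
    age_days = max((today - released).days, 0)  # future dates count as age 0
    remaining = max(1.0 - age_days / horizon_days, 0.0)
    return 15.0 * remaining
```

Under these assumptions, an entry whose only populated signal is its tag list earns 5 of 25 points for Specification Depth, which is the kind of thin record the leaderboard grades as C.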

Conference-style grades

A*: exceptional across almost every tracked signal; clear model identity, high metadata depth, and strong operational clarity.

A: strong and credible, with only one weaker area preventing top-tier status.

B: useful model entry, but not yet elite on clarity, breadth, or deployment evidence.

C: low-confidence or thinly specified entry; present in tracking, but not yet specified well enough to be ranked with confidence.
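A grade scale like this can be sketched as a simple cutoff table. The page does not publish its cutoffs, and the grade descriptions are qualitative, so the real grading may not be a pure threshold rule. The numbers below are hypothetical, chosen only to agree with the grades shown on the leaderboard (72 is an A, 67 is a B, 39 is a C).

```python
# Hypothetical cutoffs: the page does not state where each grade begins.
# Listed from highest to lowest so the first match wins.
GRADE_CUTOFFS = [
    (85, "A*"),
    (70, "A"),
    (55, "B"),
]


def grade(score: float) -> str:
    """Map a 0-100 TCD score to a conference-style grade."""
    for cutoff, label in GRADE_CUTOFFS:
        if score >= cutoff:
            return label
    return "C"  # everything below the lowest cutoff
```

With these assumed cutoffs, `grade(72)` returns `"A"` and `grade(39)` returns `"C"`.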

How the metric works

The page computes a 100-point score for each tracked entry; the five component maxima (20 + 25 + 20 + 15 + 20) add up to 100. High scores go to entries that clearly name a model and carry rich specs and current release signals. Low scores usually mean the entry behaves more like a generic announcement, workflow guide, or topic page than a properly specified LLM profile. That keeps the leaderboard from confusing content noise with model quality.

This is intentionally not a raw intelligence benchmark. It is a structured ranking of model seriousness and traceability: the models you can reason about, compare, and operationalize with confidence.
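The 100-point total can be sketched as a capped sum of the five components. This is a minimal sketch, not the page's actual implementation: the component maxima come from the cards above, while the key names, the clamping, and treating a missing component as zero are assumptions.

```python
# Maximum points per component, as listed on the component cards.
MAX_POINTS = {
    "identity_precision": 20,
    "specification_depth": 25,
    "workload_reach": 20,
    "temporal_momentum": 15,
    "deployment_confidence": 20,
}


def tcd_score(components: dict) -> float:
    """Sum the five sub-scores, clamping each one to its 0..max range."""
    total = 0.0
    for name, cap in MAX_POINTS.items():
        raw = components.get(name, 0.0)  # assumption: a missing component scores 0
        total += min(max(raw, 0.0), cap)
    return total


# Made-up sub-scores for illustration; they do not describe any tracked entry.
example = {
    "identity_precision": 10,
    "specification_depth": 20,
    "workload_reach": 15,
    "temporal_momentum": 12,
    "deployment_confidence": 18,
}
print(tcd_score(example))  # 75.0
```

Clamping each component before summing means one inflated sub-score cannot push an entry past the weight that component is supposed to carry.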

Leaderboard

TCD leaderboard

Ranked from the current tracked set. Eligible entries: 12.

Rank	Model	Provider	Grade	TCD	Why it lands here
1	Reference Frontier Model	Example AI Lab	A	72	Broad workload reach, a fresh release signal, and high tracking confidence, but weak model identity holds this entry back.
2	Reference Open Model	Open Model Community	B	67	Fresh release signal and high tracking confidence, but weak model identity holds this entry back.
3	Gemma 4: Byte for byte, the most capable open models	Google DeepMind	C	39	Fresh release signal, but thin metadata holds this entry back.
4	Gemini 3.1 Flash Live: Making audio AI more natural and reliable	Google DeepMind	C	39	Fresh release signal, but thin metadata holds this entry back.
5	What 81,000 people want from AI	Anthropic	C	39	Fresh release signal, but thin metadata holds this entry back.
6	Gemini 3.1 Flash-Lite: Built for intelligence at scale	Google DeepMind	C	36	Fresh release signal, but thin metadata holds this entry back.
7	Nano Banana 2: Combining Pro capabilities with lightning-fast speed	Google DeepMind	C	36	Fresh release signal, but thin metadata holds this entry back.
8	Introducing Claude Sonnet 4.6	Anthropic	C	36	Fresh release signal, but thin metadata holds this entry back.
9	Introducing Claude Opus 4.6	Anthropic	C	36	Fresh release signal, but thin metadata holds this entry back.
10	Enterprises power agentic workflows in Cloudflare Agent Cloud with OpenAI	OpenAI	C	34	Fresh release signal, but thin metadata holds this entry back.
11	ChatGPT for research	OpenAI	C	34	Fresh release signal, but thin metadata holds this entry back.
12	Writing with ChatGPT	OpenAI	C	34	Fresh release signal, but thin metadata holds this entry back.
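The ordering in the table can be reproduced with a plain descending sort. In the table, entries with equal TCD scores still receive consecutive ranks (the three entries at 39 are ranked 3, 4, and 5), so this sketch numbers rows by position. How the page breaks ties is not stated; keeping the input order for equal scores is an assumption, and the `Entry` type is hypothetical.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """Hypothetical record holding the columns needed for ranking."""
    model: str
    provider: str
    tcd: int


def rank(entries: list) -> list:
    """Return (rank, entry) pairs ordered by TCD, highest first.

    sorted() is stable, so entries with equal scores keep their input order
    (an assumed tie-break; the page does not document one).
    """
    ordered = sorted(entries, key=lambda e: e.tcd, reverse=True)
    return list(enumerate(ordered, start=1))


rows = [
    Entry("ChatGPT for research", "OpenAI", 34),
    Entry("Reference Open Model", "Open Model Community", 67),
    Entry("Reference Frontier Model", "Example AI Lab", 72),
    Entry("Introducing Claude Sonnet 4.6", "Anthropic", 36),
    Entry("Introducing Claude Opus 4.6", "Anthropic", 36),
]
for position, entry in rank(rows):
    print(position, entry.model, entry.tcd)
```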

Example AI Lab • active • A · 72

Reference Frontier Model

Template-style entry for tracking a flagship commercial model.

Context window: 256K

Broad workload reach, a fresh release signal, and high tracking confidence, but weak model identity holds this entry back.

Open Model Community • active • B · 67

Reference Open Model

Template-style entry for tracking an open-weight or open-source model.

Context window: 128K

Fresh release signal and high tracking confidence, but weak model identity holds this entry back.

Google DeepMind • auto-detected • C · 39

Gemma 4: Byte for byte, the most capable open models

Gemma 4: Our most intelligent open models to date, purpose-built for advanced reasoning and agentic workflows.

Context window: Not set