APEX evaluates frontier AI models across four critical medical roles: General Practitioner, Radiology Expert, Pathology Specialist, and Cardiology Annotator. Our benchmark measures performance on real-world medical tasks using validated datasets from leading medical institutions.
Aggregated performance across all medical specialties
| Rank | Model | Organization | Score |
|---|---|---|---|
| 1 | GPT-5 | OpenAI | 67.0% ± 2.1% |
| 2 | Gemini 3 Pro | Google | 65.4% ± 1.8% |
| 3 | Grok 4 | xAI | 64.2% ± 2.3% |
| 4 | o3 | OpenAI | 63.8% ± 1.9% |
| 5 | Opus 4.5 | Anthropic | 63.1% ± 2.0% |
| 6 | Sonnet 4.5 | Anthropic | 62.1% ± 1.7% |
| 7 | Gemini 2.5 Flash | Google | 61.5% ± 2.2% |
| 8 | GPT OSS | OpenAI | 59.8% ± 1.6% |
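The overall score in the table above aggregates each model's per-role results. As a rough illustration only, the sketch below assumes an unweighted mean across the four roles; the role keys and example values are placeholders, not APEX's published weighting.

```python
# Hypothetical illustration of aggregating per-role scores into an overall
# APEX score. Role names, scores, and the equal weighting are assumptions
# for illustration, not the benchmark's actual methodology.
from statistics import mean

# Per-role accuracy for one model (placeholder values).
role_scores = {
    "general_practitioner": 0.66,
    "radiology_expert": 0.70,
    "pathology_specialist": 0.63,
    "cardiology_annotator": 0.69,
}

# Unweighted mean across the four medical roles (assumed aggregation rule).
overall = mean(role_scores.values())
print(f"Overall APEX score: {overall:.1%}")
```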
Detailed rankings are provided for each medical role.
The Medical AI Productivity Index (APEX) is developed in collaboration with experts from the University of Pennsylvania, Northwestern University, Cornell Medical Center, Brigham and Women's Hospital, and Mount Sinai Health System.
Our benchmark evaluates AI models on authentic medical tasks including diagnosis support, medical imaging analysis, pathology review, and clinical documentation. All scores represent performance on validated test sets with error margins calculated using bootstrap sampling.
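The ± margins are described as coming from bootstrap sampling. A minimal sketch of that procedure, assuming per-item resampling of binary correctness scores and a 95% percentile interval (the exact resampling unit, interval width, and number of resamples are assumptions, not stated by APEX):

```python
# Minimal bootstrap sketch for the score margins. Assumes per-item binary
# correctness and a 95% percentile interval; the resampling scheme actually
# used by APEX is not specified here.
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_margin(correct: np.ndarray, n_boot: int = 10_000) -> tuple[float, float]:
    """Return (mean score, half-width of the 95% percentile interval)."""
    n = len(correct)
    # Resample items with replacement and recompute the mean score each time.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_means = correct[idx].mean(axis=1)
    lo, hi = np.percentile(boot_means, [2.5, 97.5])
    return correct.mean(), (hi - lo) / 2

# Placeholder data: 500 graded items, roughly 67% answered correctly.
scores = (rng.random(500) < 0.67).astype(float)
mean_score, margin = bootstrap_margin(scores)
print(f"{mean_score:.1%} ± {margin:.1%}")
```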