moccet labs

The Medical AI Productivity Index

APEX evaluates frontier AI models across four critical medical roles: General Practitioner, Radiology Expert, Pathology Specialist, and Cardiology Annotator. Our benchmark measures real-world medical task performance using validated datasets from leading medical institutions.

Overall Rankings

Aggregated performance across all medical specialties

Rank  Model                        Score
1     GPT 5 (OpenAI)               67% ± 2.1%
2     Gemini 3 Pro (Google)        65.4% ± 1.8%
3     Grok 4 (xAI)                 64.2% ± 2.3%
4     o3 (OpenAI)                  63.8% ± 1.9%
5     Opus 4.5 (Anthropic)         63.1% ± 2%
6     Sonnet 4.5 (Anthropic)       62.1% ± 1.7%
7     Gemini 2.5 Flash (Google)    61.5% ± 2.2%
8     GPT OSS (OpenAI)             59.8% ± 1.6%

Specialty Performance

Detailed rankings for each medical role

About APEX

The Medical AI Productivity Index (APEX) is developed in collaboration with experts from the University of Pennsylvania, Northwestern University, Cornell Medical Center, Brigham and Women's Hospital, and Mount Sinai Health System.

Our benchmark evaluates AI models on authentic medical tasks including diagnosis support, medical imaging analysis, pathology review, and clinical documentation. All scores represent performance on validated test sets with error margins calculated using bootstrap sampling.
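
To illustrate how an error margin can be derived with bootstrap sampling, here is a minimal Python sketch. It is not the APEX implementation. It assumes each test item yields a pass/fail outcome and that the margin is the standard deviation of the resampled means; the page does not say whether the published ± values are standard errors or confidence intervals. The function name `bootstrap_margin`, its parameters, and the sample data are all hypothetical.

```python
import random
import statistics


def bootstrap_margin(outcomes, n_resamples=2000, seed=0):
    """Return (mean, margin) for a list of per-item scores.

    margin is the standard deviation of the means of bootstrap
    resamples (an assumption; APEX may define its margin differently).
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(outcomes)
    resampled_means = []
    for _ in range(n_resamples):
        # Draw n items with replacement from the observed outcomes.
        sample = rng.choices(outcomes, k=n)
        resampled_means.append(sum(sample) / n)
    return sum(outcomes) / n, statistics.stdev(resampled_means)


# Hypothetical outcomes (1 = pass, 0 = fail); not APEX data.
outcomes = [1] * 67 + [0] * 33
mean, margin = bootstrap_margin(outcomes)
print(f"{mean:.1%} ± {margin:.1%}")
```

With more test items the margin shrinks, which is one reason margins of about two percentage points suggest a sizeable test set for each model.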