SEA-HELM

(Southeast Asian Holistic Evaluation of Language Models)

Jul 10, 2026

SEA-HELM evaluates large language models across a range of tasks, with an emphasis on Southeast Asian languages. The leaderboard covers key multilingual capabilities: chat proficiency in Southeast Asian languages, instruction-following in Southeast Asian languages, Southeast Asian linguistic tasks, and performance on a suite of English tasks.

62 open-weights models

Open-weights models served locally. Reasoning and non-reasoning variants of the same model are counted once.

9 closed-weights models

Closed-weights models evaluated through their respective APIs.

SEA Overall

Average over 30 bootstrap resamples. 95% confidence intervals (CI) are shown.
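
For readers unfamiliar with bootstrapped scores, here is a minimal sketch of how an average and a 95% interval could be derived from 30 resamples. The page does not state what is resampled or how the interval is computed, so this assumes per-example scores resampled with replacement and a percentile interval; the function name `bootstrap_mean_ci` and its parameters are hypothetical and not part of SEA-HELM.

```python
import random
import statistics


def bootstrap_mean_ci(scores, n_bootstraps=30, seed=0):
    """Return (mean, ci_low, ci_high) over bootstrap resamples.

    Assumption: `scores` holds one score per evaluated example, and the
    interval is a simple percentile interval. The actual SEA-HELM
    procedure may differ.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(scores)

    boot_means = []
    for _ in range(n_bootstraps):
        # Draw n examples with replacement and record their mean score.
        resample = [scores[rng.randrange(n)] for _ in range(n)]
        boot_means.append(statistics.fmean(resample))

    # Splitting into 40 equal-probability groups yields 39 cut points:
    # the first is the 2.5th percentile, the last the 97.5th.
    cuts = statistics.quantiles(boot_means, n=40, method="inclusive")
    return statistics.fmean(boot_means), cuts[0], cuts[-1]


if __name__ == "__main__":
    # Made-up per-example scores, for illustration only.
    example_scores = [0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0]
    mean, low, high = bootstrap_mean_ci(example_scores)
    print(f"mean={mean:.3f}, 95% CI=[{low:.3f}, {high:.3f}]")
```

With only 30 resamples the percentile interval is coarse; it gives a rough sense of run-to-run variation and is not a precise error bar.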

Model Size: ≤200B

Open instruct models only