SEA-HELM
(Southeast Asian Holistic Evaluation of Language Models)
SEA-HELM evaluates large language models across a range of tasks, with an emphasis on Southeast Asian languages. The leaderboard covers key multilingual capabilities, including chat proficiency in Southeast Asian languages, instruction-following in Southeast Asian languages, Southeast Asian linguistic tasks, and performance on a suite of English tasks.
SEA Overall
Scores are the average of 30 bootstraps; 95% confidence intervals (CI) are shown.
Model Size: ≤200B
Open instruct models only
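As a rough illustration of what "average of 30 bootstraps" with a 95% CI can mean, here is a minimal sketch. It is not SEA-HELM's actual procedure: the function name `bootstrap_mean_ci`, the choice to resample per-item scores, the normal-approximation interval, and the example scores are all assumptions made for illustration.

```python
import random
import statistics


def bootstrap_mean_ci(scores, n_bootstraps=30, seed=0):
    """Return (mean, lower, upper) from bootstrap resamples of per-item scores.

    Hypothetical helper: the resampling unit and the interval method are
    assumptions, not taken from the leaderboard.
    """
    rng = random.Random(seed)
    n = len(scores)
    # Each bootstrap draws n scores with replacement and records their mean.
    boot_means = [
        statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_bootstraps)
    ]
    center = statistics.fmean(boot_means)
    # Normal-approximation interval: 1.96 standard deviations of the bootstrap means.
    half_width = 1.96 * statistics.stdev(boot_means)
    return center, center - half_width, center + half_width


# Made-up per-item scores (1 = correct, 0 = incorrect), for illustration only.
example_scores = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
mean, lower, upper = bootstrap_mean_ci(example_scores)
print(f"{mean:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

With only 30 resamples, a percentile-based interval would be coarse, which is why this sketch uses the standard deviation of the bootstrap means instead; either choice is an assumption here.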
Performance for each SEA Language
Burmese
Filipino
Indonesian
Malay
Tamil
Thai
Vietnamese