SEA-HELM
(Southeast Asian Holistic Evaluation of Language Models)
SEA-HELM is an assessment of large language models across a range of tasks, with an emphasis on Southeast Asian (SEA) languages. The leaderboard evaluates models on key multilingual capabilities: chat proficiency in Southeast Asian languages, instruction following in Southeast Asian languages, Southeast Asian linguistic tasks, and performance on a suite of English tasks.
89 models tested (81 open and 8 closed)
Model families: Claude 4*, Gemini 2.5*, GPT-5*, Qwen 3, Gemma 3, Llama 4, DeepSeek, Tulu, Apertus, and many more.
*Supported by credits from their respective teams.
SEA-LION v4 Instruct ranking: 1st
Among models under 200B parameters, SEA-LION v4 is the top-performing instruct model overall on SEA languages*
*Tested SEA languages: Burmese, Filipino, Indonesian, Malay, Tamil, Thai and Vietnamese.
SEA Overall
Scores are averages over 30 bootstrap samples; 95% confidence intervals (CIs) are shown.
Model Size: ≤200B
Open instruct models only
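The page reports each score as an average over 30 bootstrap samples with a 95% CI, but does not describe the exact procedure. The sketch below illustrates the general idea under stated assumptions: it resamples hypothetical per-item scores with replacement, averages the 30 bootstrap means, and takes a percentile interval (2.5th to 97.5th). The function name `bootstrap_mean_ci` and the per-item score input are illustrative only and are not SEA-HELM's actual implementation.

```python
import random
import statistics


def bootstrap_mean_ci(scores, n_boot=30, seed=0):
    """Hypothetical sketch: percentile bootstrap of a mean score.

    Assumes `scores` is a list of per-item scores for one model. Returns
    (average of bootstrap means, lower 2.5% bound, upper 97.5% bound).
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    n = len(scores)
    boot_means = []
    for _ in range(n_boot):
        # Resample n items with replacement and record the sample mean.
        sample = [scores[rng.randrange(n)] for _ in range(n)]
        boot_means.append(statistics.fmean(sample))
    # quantiles(n=40) yields cut points at 2.5%, 5%, ..., 97.5%;
    # "inclusive" interpolates within the observed bootstrap means.
    cuts = statistics.quantiles(boot_means, n=40, method="inclusive")
    return statistics.fmean(boot_means), cuts[0], cuts[-1]


if __name__ == "__main__":
    # Toy per-item scores (assumed data, not leaderboard results).
    toy_scores = [0.0, 1.0] * 50
    mean, lower, upper = bootstrap_mean_ci(toy_scores)
    print(f"mean={mean:.3f}, 95% CI=[{lower:.3f}, {upper:.3f}]")
```

With only 30 bootstrap samples the percentile bounds are coarse; a larger `n_boot` gives smoother intervals at higher cost, which is one plausible reason a leaderboard might fix a small number.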