SEA-HELM

(Southeast Asian Holistic Evaluation of Language Models)

SEA-HELM evaluates large language models across a range of tasks, with an emphasis on Southeast Asian languages. The leaderboard covers key multilingual capabilities: chat proficiency and instruction-following in Southeast Asian languages, Southeast Asian linguistic tasks, and a suite of English tasks.

76 models tested: 68 open and 8 closed

Model families: Claude 4*, Gemini 2.5*, GPT-5*, Qwen 3, Gemma 3, Llama 4, DeepSeek, Tulu, Apertus, and many more.

*Supported by credits from their respective teams.

SEA-LION v4 Instruct ranking: 1st

Among models under 200B parameters, SEA-LION v4 is the top-performing instruct model overall on SEA languages*

*Tested SEA languages: Burmese, Filipino, Indonesian, Malay, Tamil, Thai, and Vietnamese.

SEA Overall

Average of 30 bootstraps. 95% CI are shown.

Model Size: ≤200B

Open instruct models only

Model | Developer | Size | Score
SEA-LION v4 (Qwen) | AISG | 32B | 60.63±0.06
Qwen 3 Next | Alibaba | 80B MoE | 60.55±0.05
SEA-LION v4 (Gemma) | AISG | 27B | 59.74±0.06
Gemma 3 | Google | 27B | 59.63±0.06
Qwen 3 | Alibaba | 32B | 58.40±0.06
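
The "average of 30 bootstraps" with a 95% CI can be illustrated with a small sketch. The page does not say exactly how SEA-HELM resamples or how it derives the interval, so everything here is an assumption: the function name `bootstrap_summary`, the resampling of per-example scores with replacement, and the normal-approximation interval (1.96 standard errors of the resample means) are all hypothetical choices, and the input scores are made up.

```python
import random
import statistics


def bootstrap_summary(scores, n_bootstraps=30, seed=0):
    """Return (mean, half_width) computed over bootstrap resample means.

    Assumption: each bootstrap draws len(scores) examples with replacement
    and scores the resample by its mean. The leaderboard's actual procedure
    may differ.
    """
    rng = random.Random(seed)
    n = len(scores)
    resample_means = []
    for _ in range(n_bootstraps):
        # Draw n items with replacement from the original scores.
        resample = [scores[rng.randrange(n)] for _ in range(n)]
        resample_means.append(statistics.fmean(resample))

    mean = statistics.fmean(resample_means)
    # Assumed interval: normal approximation around the mean of the
    # resample means, using their standard error.
    sem = statistics.stdev(resample_means) / (n_bootstraps ** 0.5)
    return mean, 1.96 * sem


# Made-up per-example scores (1.0 = correct, 0.0 = incorrect).
scores = [0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0]
mean, half_width = bootstrap_summary(scores)
print(f"{100 * mean:.2f}±{100 * half_width:.2f}")
```

A percentile interval over the resample means would be another reasonable reading of "95% CI"; the sketch uses the standard-error form only because it yields the symmetric "score±width" format shown in the tables.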


Performance for each SEA Language

Burmese

Average of 30 bootstraps. 95% CI are shown.

Model Size: ≤200B

Open instruct models only

Model | Developer | Size | Score
SEA-LION v4 (Qwen) | AISG | 32B | 48.28±0.14
Gemma 3 | Google | 27B | 47.36±0.17
SEA-LION v4 (Gemma) | AISG | 27B | 46.50±0.16
Llama 4 Scout | Meta | 109B MoE | 44.27±0.17
Qwen 3 Next | Alibaba | 80B MoE | 43.68±0.16


Filipino

Average of 30 bootstraps. 95% CI are shown.

Model Size: ≤200B

Open instruct models only

Model | Developer | Size | Score
SEA-LION v4 (Gemma) | AISG | 27B | 68.10±0.14
Gemma 3 | Google | 27B | 67.70±0.12
Qwen 3 Next | Alibaba | 80B MoE | 66.48±0.13
SEA-LION v3 (Llama) | AISG | 70B | 66.38±0.17
SEA-LION v4 (Qwen) | AISG | 32B | 65.35±0.14


Indonesian

Average of 30 bootstraps. 95% CI are shown.

Model Size: ≤200B

Open instruct models only

Model | Developer | Size | Score
Qwen 3 Next | Alibaba | 80B MoE | 67.11±0.10
Command A 03-2025 | CohereLabs | 111B | 66.73±0.17
SEA-LION v4 (Qwen) | AISG | 32B | 66.59±0.10
Qwen 3 | Alibaba | 32B | 65.67±0.11
Qwen 2.5 | Alibaba | 72B | 64.82±0.08


Malay

Average of 30 bootstraps. 95% CI are shown.

Model Size: ≤200B

Open instruct models only

Model | Developer | Size | Score
Qwen 3 Next | Alibaba | 80B MoE | 62.80±0.12
SEA-LION v4 (Qwen) | AISG | 32B | 61.36±0.14
Qwen 3 | Alibaba | 30B MoE | 61.13±0.14
SEA-LION v4 (Gemma) | AISG | 27B | 61.10±0.16
Gemma 3 | Google | 27B | 60.92±0.17


Tamil

Average of 30 bootstraps. 95% CI are shown.

Model Size: ≤200B

Open instruct models only

Model | Developer | Size | Score
SEA-LION v4 (Gemma) | AISG | 27B | 64.43±0.16
Gemma 3 | Google | 27B | 64.36±0.22
SEA-LION v4 (Qwen) | AISG | 32B | 62.30±0.15
Qwen 3 Next | Alibaba | 80B MoE | 60.05±0.13
Gemma 3 | Google | 12B | 59.86±0.21


Thai

Average of 30 bootstraps. 95% CI are shown.

Model Size: ≤200B

Open instruct models only

Model | Developer | Size | Score
Qwen 3 Next | Alibaba | 80B MoE | 58.09±0.09
SEA-LION v4 (Qwen) | AISG | 32B | 57.91±0.13
Qwen 3 | Alibaba | 32B | 56.98±0.15
Qwen 3 | Alibaba | 30B MoE | 55.77±0.11
Qwen 3 | Alibaba | 14B | 54.82±0.15


Vietnamese

Average of 30 bootstraps. 95% CI are shown.

Model Size: ≤200B

Open instruct models only

Model | Developer | Size | Score
Qwen 3 Next | Alibaba | 80B MoE | 65.68±0.10
Qwen 3 | Alibaba | 30B MoE | 65.56±0.14
SEA-LION v4 (Qwen) | AISG | 32B | 62.63±0.14
SEA-LION v3 (Llama) | AISG | 70B | 62.37±0.23
Qwen 3 | Alibaba | 32B | 62.19±0.12


English

Average of 30 bootstraps. 95% CI are shown.

Model Size: ≤200B

Open instruct models only

Model | Developer | Size | Score
Qwen 3 | Alibaba | 32B | 68.02±0.15
Qwen 3 Next | Alibaba | 80B MoE | 67.31±0.10
SEA-LION v4 (Qwen) | AISG | 32B | 67.10±0.17
Llama 3.3 | Meta | 70B | 66.50±0.16
SEA-LION v3 (Llama) | AISG | 70B | 65.20±0.18
