SEA-HELM

(Southeast Asian Holistic Evaluation of Language Models)

SEA-HELM evaluates large language models across a range of tasks, with an emphasis on Southeast Asian (SEA) languages. The leaderboard covers key multilingual capabilities such as chat proficiency in SEA languages, instruction-following in SEA languages, SEA linguistic tasks, and performance on a suite of English tasks.

68 models tested (60 open and 8 closed)

Model families: Claude 4*, Gemini 2.5*, GPT-5*, Qwen 3, Gemma 3, Llama 4, DeepSeek, Tulu, and many more.

*Supported by credits from their respective teams.

1st

SEA-LION v4 Instruct ranking

At <200B model sizes, SEA-LION v4 is the top-performing instruct model overall on SEA languages*

*Tested SEA Languages: Burmese, Filipino, Indonesian, Malay, Tamil, Thai and Vietnamese.

SEA Overall

Average of 8 runs. 95% CI are shown.

Model Size: ≤200B

Open instruct models only

SEA-LION v4 (AISG), 27B: 67.52±0.11
Gemma 3 (Google), 27B: 67.35±0.08
Qwen 3 (Alibaba), 32B: 65.00±0.16
Gemma 3 (Google), 12B: 64.88±0.10
SEA-LION v3 (Llama) (AISG), 70B: 64.44±0.38
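
As a rough illustration of how a "mean ± 95% CI" figure over 8 runs could be produced, the sketch below uses a Student's t interval. This is an assumption: the page does not say how SEA-HELM computes its intervals, and the per-run scores here are hypothetical, not leaderboard data. The function name `mean_with_ci` is likewise made up for this example.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t for 7 degrees of freedom,
# i.e. the value that applies when there are exactly 8 runs.
T_CRIT_95_DF7 = 2.365


def mean_with_ci(scores, t_crit=T_CRIT_95_DF7):
    """Return (mean, half_width) of a t-based confidence interval.

    Assumes the runs are independent; t_crit must match len(scores) - 1
    degrees of freedom (the default is only valid for 8 runs).
    """
    n = len(scores)
    mean = statistics.mean(scores)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(scores) / math.sqrt(n)
    return mean, t_crit * sem


# Hypothetical per-run scores (NOT taken from the leaderboard).
run_scores = [67.4, 67.6, 67.5, 67.7, 67.3, 67.5, 67.6, 67.4]
mean, half_width = mean_with_ci(run_scores)
print(f"{mean:.2f}±{half_width:.2f}")
```

A bootstrap interval would be an equally plausible choice; the t interval is used here only because it needs nothing beyond the standard library.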


Performance for each SEA Language

Burmese

Average of 8 runs. 95% CI are shown.

Model Size: ≤200B

Open instruct models only

Gemma 3 (Google), 27B: 57.78±0.43
SEA-LION v4 (AISG), 27B: 57.18±0.42
Llama 4 Scout (Meta), 109B MoE: 54.76±0.23
Gemma 3 (Google), 12B: 52.82±0.22
Qwen 3 (Alibaba), 32B: 43.03±0.63


Filipino

Average of 8 runs. 95% CI are shown.

Model Size: ≤200B

Open instruct models only

SEA-LION v4 (AISG), 27B: 74.53±0.28
Gemma 3 (Google), 27B: 74.09±0.12
SEA-LION v3 (Llama) (AISG), 70B: 72.84±0.46
Gemma 3 (Google), 12B: 72.02±0.31
Llama 3.3 (Meta), 70B: 70.26±0.29


Indonesian

Average of 8 runs. 95% CI are shown.

Model Size: ≤200B

Open instruct models only

Command A 03-2025 (CohereLabs), 111B: 74.75±0.66
Qwen 3 (Alibaba), 32B: 72.81±0.18
Qwen 3 (Alibaba), 30B MoE: 72.36±0.28
SEA-LION v3 (Llama) (AISG), 70B: 72.15±0.48
SEA-LION v4 (AISG), 27B: 71.89±0.33


Malay

Average of 8 runs. 95% CI are shown.

Model Size: ≤200B

Open instruct models only

Qwen 3 (Alibaba), 30B MoE: 71.55±0.28
SEA-LION v4 (AISG), 27B: 71.31±0.43
Gemma 3 (Google), 27B: 71.20±0.34
Qwen 3 (Alibaba), 32B: 70.01±0.23
SEA-LION v3 (Llama) (AISG), 70B: 69.82±0.43


Tamil

Average of 8 runs. 95% CI are shown.

Model Size: ≤200B

Open instruct models only

SEA-LION v4 (AISG), 27B: 68.47±0.30
Gemma 3 (Google), 27B: 68.45±0.47
Gemma 3 (Google), 12B: 65.83±0.63
Llama 4 Scout (Meta), 109B MoE: 64.22±0.28
Qwen 3 (Alibaba), 32B: 64.10±0.39
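
When two scores are close, it can help to compare the reported intervals rather than the point estimates alone. The sketch below does this for three of the Tamil scores listed above. The helper names `interval` and `intervals_overlap` are made up for this example, and treating "score ± half-width" as a closed interval is an assumption about how the figures are meant to be read.

```python
def interval(score, half_width):
    """Turn a 'score ± half_width' figure into a (low, high) pair."""
    return (score - half_width, score + half_width)


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Tamil scores as reported on the leaderboard.
sea_lion_v4 = interval(68.47, 0.30)
gemma3_27b = interval(68.45, 0.47)
gemma3_12b = interval(65.83, 0.63)

print(intervals_overlap(sea_lion_v4, gemma3_27b))  # True
print(intervals_overlap(sea_lion_v4, gemma3_12b))  # False
```

Overlap is only a quick heuristic: overlapping intervals do not by themselves show that two models are equivalent, and a formal comparison would need the per-run scores.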


Thai

Average of 8 runs. 95% CI are shown.

Model Size: ≤200B

Open instruct models only

Qwen 3 (Alibaba), 32B: 65.36±0.33
Qwen 3 (Alibaba), 30B MoE: 64.57±0.21
SEA-LION v4 (AISG), 27B: 63.18±0.16
Qwen 3 (Alibaba), 14B: 63.01±0.32
Qwen 2.5 (Alibaba), 72B: 62.91±0.44


Vietnamese

Average of 8 runs. 95% CI are shown.

Model Size: ≤200B

Open instruct models only

Qwen 3 (Alibaba), 30B MoE: 72.49±0.20
Qwen 3 (Alibaba), 32B: 69.94±0.38
SEA-LION v3 (Llama) (AISG), 70B: 69.65±0.55
Command A 03-2025 (CohereLabs), 111B: 69.10±0.58
Tulu 3 (AI2), 70B: 68.85±0.24


English

Average of 8 runs. 95% CI are shown.

Model Size: ≤200B

Open instruct models only

Qwen 3 (Alibaba), 32B: 73.82±0.29
Llama 3.3 (Meta), 70B: 72.16±0.15
Qwen 3 (Alibaba), 14B: 71.66±0.24
SEA-LION v3 (Llama) (AISG), 70B: 71.35±0.45
Gemma 3 (Google), 27B: 70.90±0.24
