SEA Performance
Overall SEA Average
Scores are averages over 30 bootstraps, shown with 95% confidence intervals (CI).
Model Size: ≤200B
Open instruct models only
[Figure: ranked bar chart of the overall SEA average (± 95% CI) per model, labelled by model size, from 60.82 ± 0.06 (32B) down to 15.11 ± 0.06 (7B).]
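The captions say each score is an average of 30 bootstraps with a 95% CI, but not how the interval is computed. The following is a minimal sketch of one common way to get such a figure, not the leaderboard's actual procedure. It resamples per-example scores with replacement and uses a normal approximation for the half-width. The function name `bootstrap_mean_ci`, the seed, and the 1.96 × standard-error formula are all assumptions.

```python
import random
import statistics


def bootstrap_mean_ci(scores, n_boot=30, seed=0):
    """Return (mean of bootstrap means, 95% CI half-width).

    Assumption: the half-width is 1.96 * standard error of the bootstrap
    means (normal approximation). The source does not state its formula.
    """
    rng = random.Random(seed)
    n = len(scores)
    boot_means = []
    for _ in range(n_boot):
        # Draw n items with replacement and record the mean of the resample.
        sample = [scores[rng.randrange(n)] for _ in range(n)]
        boot_means.append(statistics.fmean(sample))
    center = statistics.fmean(boot_means)
    half_width = 1.96 * statistics.stdev(boot_means) / n_boot ** 0.5
    return center, half_width


# Hypothetical per-example scores on a 0-100 scale (not leaderboard data).
example_scores = [0.0, 100.0, 50.0, 75.0, 25.0, 100.0, 0.0, 50.0]
center, half_width = bootstrap_mean_ci(example_scores)
print(f"{center:.2f} ± {half_width:.2f}")
```

A percentile interval over the bootstrap means would be an equally plausible reading of "95% CI"; with only 30 resamples, the two can differ noticeably.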
Language Performance by Model
Scores are averages over 30 bootstraps, shown with 95% confidence intervals (CI).
Model Size: ≤200B
Open instruct models only
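Columns: SEA is the overall average; the remaining columns are per-language scores by ISO 639-1 code (MY Burmese, TL Filipino/Tagalog, ID Indonesian, MS Malay, TA Tamil, TH Thai, VI Vietnamese, EN English).

Model | SEA | MY | TL | ID | MS | TA | TH | VI | EN |
---|---|---|---|---|---|---|---|---|---|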
SEA-LION v4 (Qwen) 32B AISG | 60.82 ± 0.06 | 49.56 ± 0.14 | 65.35 ± 0.14 | 66.59 ± 0.10 | 61.36 ± 0.14 | 62.30 ± 0.15 | 57.91 ± 0.13 | 62.63 ± 0.14 | 65.30 ± 0.17 |
Qwen 3 Next 80B MoE Alibaba | 60.73 ± 0.05 | 44.88 ± 0.16 | 66.48 ± 0.13 | 67.11 ± 0.10 | 62.80 ± 0.12 | 60.05 ± 0.13 | 58.09 ± 0.09 | 65.68 ± 0.10 | 65.49 ± 0.10 |
Qwen 3 VL 32B Alibaba | 59.88 ± 0.05 | 40.89 ± 0.19 | 63.87 ± 0.11 | 68.41 ± 0.10 | 63.65 ± 0.14 | 58.50 ± 0.14 | 59.73 ± 0.12 | 64.10 ± 0.15 | 65.04 ± 0.13 |
SEA-LION v4 (Gemma) 27B AISG | 59.84 ± 0.05 | 47.18 ± 0.15 | 68.10 ± 0.14 | 64.33 ± 0.14 | 61.10 ± 0.16 | 64.43 ± 0.16 | 53.46 ± 0.14 | 60.26 ± 0.17 | 61.82 ± 0.21 |
Gemma 3 27B | 59.74 ± 0.06 | 48.14 ± 0.17 | 67.70 ± 0.12 | 64.12 ± 0.15 | 60.92 ± 0.17 | 64.36 ± 0.22 | 52.88 ± 0.12 | 60.06 ± 0.17 | 61.70 ± 0.17 |
Qwen 3 32B Alibaba | 58.60 ± 0.06 | 44.60 ± 0.19 | 62.23 ± 0.13 | 65.67 ± 0.11 | 59.67 ± 0.17 | 58.88 ± 0.24 | 56.98 ± 0.15 | 62.19 ± 0.12 | 66.27 ± 0.15 |
SEA-LION v3 (Llama) 70B AISG | 57.65 ± 0.07 | 38.21 ± 0.28 | 66.38 ± 0.17 | 64.04 ± 0.18 | 59.94 ± 0.17 | 57.99 ± 0.21 | 54.60 ± 0.15 | 62.37 ± 0.23 | 63.47 ± 0.17 |
Gemma 3 12B | 56.70 ± 0.06 | 42.46 ± 0.15 | 65.00 ± 0.11 | 61.80 ± 0.11 | 57.96 ± 0.12 | 59.86 ± 0.21 | 50.75 ± 0.13 | 59.07 ± 0.15 | 56.33 ± 0.21 |
Llama 4 Scout 109B MoE Meta | 55.85 ± 0.04 | 45.54 ± 0.17 | 61.84 ± 0.11 | 61.27 ± 0.11 | 57.38 ± 0.11 | 58.69 ± 0.15 | 48.44 ± 0.08 | 57.78 ± 0.09 | 62.22 ± 0.15 |
Qwen 3 30B MoE Alibaba | 55.55 ± 0.05 | 25.62 ± 0.12 | 61.39 ± 0.12 | 63.29 ± 0.10 | 61.13 ± 0.14 | 56.06 ± 0.16 | 55.77 ± 0.11 | 65.56 ± 0.14 | 60.93 ± 0.15 |
Tulu 3 70B AI2 | 54.67 ± 0.08 | 35.11 ± 0.17 | 62.96 ± 0.24 | 62.66 ± 0.17 | 57.39 ± 0.19 | 50.61 ± 0.23 | 54.25 ± 0.16 | 59.74 ± 0.20 | 57.17 ± 0.19 |
Llama 3.3 70B Meta | 53.28 ± 0.05 | 23.07 ± 0.15 | 63.21 ± 0.11 | 63.17 ± 0.09 | 58.80 ± 0.15 | 54.74 ± 0.15 | 50.05 ± 0.08 | 59.92 ± 0.11 | 64.74 ± 0.16 |
Qwen 3 14B Alibaba | 53.20 ± 0.05 | 32.46 ± 0.13 | 57.37 ± 0.13 | 61.37 ± 0.12 | 55.04 ± 0.11 | 52.30 ± 0.17 | 54.82 ± 0.15 | 59.03 ± 0.14 | 63.44 ± 0.16 |
Qwen 2.5 72B Alibaba | 52.97 ± 0.05 | 27.54 ± 0.20 | 61.94 ± 0.13 | 64.82 ± 0.08 | 59.48 ± 0.14 | 42.51 ± 0.15 | 53.80 ± 0.16 | 60.70 ± 0.13 | 61.71 ± 0.18 |
SEA-LION v4 (Qwen VL) 8B AISG | 52.18 ± 0.06 | 30.67 ± 0.19 | 54.93 ± 0.13 | 61.99 ± 0.15 | 58.46 ± 0.11 | 45.00 ± 0.16 | 54.19 ± 0.15 | 60.02 ± 0.13 | 60.36 ± 0.16 |
Qwen 3 VL 8B Alibaba | 51.58 ± 0.06 | 30.25 ± 0.22 | 53.44 ± 0.16 | 61.58 ± 0.12 | 58.38 ± 0.14 | 42.70 ± 0.18 | 54.52 ± 0.11 | 60.15 ± 0.13 | 56.20 ± 0.12 |
Gemma 2 27B | 51.08 ± 0.07 | 23.97 ± 0.24 | 61.19 ± 0.16 | 59.79 ± 0.15 | 54.06 ± 0.22 | 53.44 ± 0.23 | 49.78 ± 0.17 | 55.33 ± 0.18 | 41.24 ± 0.24 |
Qwen 2.5 32B Alibaba | 50.07 ± 0.06 | 26.99 ± 0.17 | 56.49 ± 0.16 | 61.82 ± 0.10 | 53.71 ± 0.13 | 44.29 ± 0.22 | 50.00 ± 0.12 | 57.15 ± 0.15 | 57.83 ± 0.16 |
SEA-LION v3 (Gemma 2) 9B AISG | 49.86 ± 0.07 | 15.40 ± 0.22 | 60.75 ± 0.22 | 59.14 ± 0.13 | 54.70 ± 0.18 | 53.89 ± 0.21 | 48.95 ± 0.18 | 56.15 ± 0.20 | 43.80 ± 0.18 |
Llama 3.1 70B Meta | 49.59 ± 0.07 | 19.87 ± 0.24 | 61.82 ± 0.17 | 60.48 ± 0.14 | 55.67 ± 0.18 | 46.08 ± 0.21 | 47.14 ± 0.20 | 56.10 ± 0.17 | 57.93 ± 0.18 |
Mistral Large 2411 123B Mistral AI | 49.02 ± 0.09 | 26.17 ± 0.29 | 58.98 ± 0.15 | 58.23 ± 0.24 | 52.73 ± 0.21 | 49.44 ± 0.29 | 45.34 ± 0.17 | 52.21 ± 0.20 | 62.72 ± 0.15 |
Command A 03-2025 111B CohereLabs | 48.07 ± 0.08 | 16.52 ± 0.23 | 49.22 ± 0.21 | 66.73 ± 0.17 | 54.76 ± 0.17 | 53.29 ± 0.20 | 34.58 ± 0.21 | 61.39 ± 0.19 | 62.16 ± 0.19 |
Qwen 3 8B Alibaba | 46.83 ± 0.06 | 26.26 ± 0.20 | 50.42 ± 0.16 | 58.58 ± 0.14 | 53.83 ± 0.15 | 33.23 ± 0.16 | 50.66 ± 0.14 | 54.80 ± 0.15 | 58.97 ± 0.18 |
SEA-LION v4 (Qwen VL) 4B AISG | 46.53 ± 0.07 | 23.31 ± 0.17 | 49.19 ± 0.14 | 56.75 ± 0.10 | 53.58 ± 0.15 | 36.85 ± 0.20 | 51.17 ± 0.12 | 54.87 ± 0.18 | 50.21 ± 0.14 |
Qwen 3 VL 4B Alibaba | 44.73 ± 0.05 | 20.42 ± 0.16 | 44.81 ± 0.19 | 55.95 ± 0.11 | 53.23 ± 0.14 | 33.85 ± 0.20 | 50.31 ± 0.11 | 54.53 ± 0.14 | 49.81 ± 0.12 |
Gemma 2 9B | 44.62 ± 0.07 | 9.63 ± 0.17 | 53.79 ± 0.17 | 54.92 ± 0.18 | 49.72 ± 0.21 | 48.17 ± 0.23 | 44.65 ± 0.17 | 51.44 ± 0.17 | 31.44 ± 0.19 |
Qwen 2.5 14B Alibaba | 43.75 ± 0.07 | 13.59 ± 0.19 | 51.62 ± 0.12 | 58.49 ± 0.12 | 51.49 ± 0.13 | 32.49 ± 0.22 | 46.32 ± 0.13 | 52.22 ± 0.15 | 54.49 ± 0.16 |
SEA-LION v3 (Llama) 8B AISG | 43.47 ± 0.09 | 17.88 ± 0.26 | 49.07 ± 0.22 | 52.51 ± 0.20 | 51.32 ± 0.28 | 42.13 ± 0.32 | 42.37 ± 0.19 | 49.00 ± 0.24 | 44.92 ± 0.28 |
MERaLiON 2 10B A*STAR | 43.45 ± 0.08 | 10.66 ± 0.16 | 52.26 ± 0.20 | 55.06 ± 0.15 | 49.12 ± 0.22 | 45.29 ± 0.25 | 41.93 ± 0.15 | 49.85 ± 0.20 | 30.25 ± 0.19 |
Aya Expanse 32B CohereLabs | 41.03 ± 0.07 | 6.44 ± 0.15 | 47.65 ± 0.13 | 59.65 ± 0.15 | 48.35 ± 0.20 | 40.58 ± 0.15 | 30.47 ± 0.18 | 54.04 ± 0.15 | 36.23 ± 0.26 |
ERNIE 4.5 21B MoE Baidu | 40.38 ± 0.10 | 17.70 ± 0.15 | 45.78 ± 0.35 | 49.00 ± 0.16 | 45.83 ± 0.18 | 40.75 ± 0.18 | 42.75 ± 0.18 | 40.86 ± 0.22 | 59.65 ± 0.16 |
Llama 3 70B Meta | 39.22 ± 0.05 | 13.09 ± 0.17 | 52.32 ± 0.14 | 49.96 ± 0.14 | 43.42 ± 0.10 | 31.13 ± 0.16 | 39.51 ± 0.09 | 45.09 ± 0.11 | 49.87 ± 0.14 |
Sailor2 20B SAIL | 36.93 ± 0.07 | 8.55 ± 0.11 | 51.82 ± 0.15 | 51.53 ± 0.16 | 45.19 ± 0.17 | 35.23 ± 0.20 | 37.12 ± 0.15 | 29.04 ± 0.18 | 32.37 ± 0.14 |
Command R+ 08-2024 104B CohereLabs | 36.76 ± 0.09 | 6.61 ± 0.18 | 45.26 ± 0.29 | 50.74 ± 0.22 | 44.12 ± 0.26 | 29.43 ± 0.39 | 32.19 ± 0.18 | 48.97 ± 0.21 | 30.39 ± 0.15 |
Qwen 2.5 7B Alibaba | 36.35 ± 0.05 | 8.04 ± 0.15 | 38.20 ± 0.16 | 52.56 ± 0.13 | 46.94 ± 0.13 | 21.63 ± 0.13 | 39.89 ± 0.16 | 47.19 ± 0.13 | 42.11 ± 0.18 |
Sailor2 8B SAIL | 35.81 ± 0.07 | 11.65 ± 0.13 | 51.41 ± 0.16 | 46.68 ± 0.18 | 43.90 ± 0.19 | 20.59 ± 0.16 | 35.75 ± 0.15 | 40.67 ± 0.17 | 14.80 ± 0.11 |
Olmo 2 0325 32B AI2 | 35.08 ± 0.09 | 4.38 ± 0.15 | 53.00 ± 0.24 | 50.75 ± 0.21 | 48.28 ± 0.23 | 19.42 ± 0.31 | 32.99 ± 0.22 | 36.77 ± 0.27 | 33.35 ± 0.09 |
Tulu 3 8B AI2 | 34.04 ± 0.06 | 10.95 ± 0.14 | 33.94 ± 0.24 | 42.96 ± 0.13 | 42.86 ± 0.14 | 24.58 ± 0.25 | 39.85 ± 0.17 | 43.13 ± 0.23 | 36.94 ± 0.17 |
Command R 08-2024 32B CohereLabs | 33.98 ± 0.08 | 4.87 ± 0.15 | 38.24 ± 0.25 | 49.15 ± 0.20 | 39.00 ± 0.21 | 36.43 ± 0.27 | 27.84 ± 0.22 | 42.34 ± 0.25 | 28.37 ± 0.24 |
Mistral Small 3.1 2503 24B Mistral AI | 33.74 ± 0.10 | 2.22 ± 0.11 | 47.51 ± 0.33 | 53.67 ± 0.22 | 44.39 ± 0.22 | 10.39 ± 0.28 | 31.49 ± 0.25 | 46.52 ± 0.24 | 48.45 ± 0.17 |
Llama 3.1 8B Meta | 32.97 ± 0.08 | 8.45 ± 0.19 | 39.63 ± 0.22 | 47.23 ± 0.21 | 45.49 ± 0.22 | 17.53 ± 0.26 | 33.65 ± 0.18 | 38.83 ± 0.20 | 38.40 ± 0.19 |
Apertus 70B Swiss AI | 31.37 ± 0.11 | 13.09 ± 0.25 | 34.68 ± 0.29 | 33.31 ± 0.22 | 42.69 ± 0.24 | 28.48 ± 0.30 | 29.18 ± 0.23 | 38.12 ± 0.25 | 26.18 ± 0.16 |
phi-4 14B Microsoft | 30.30 ± 0.11 | 6.45 ± 0.18 | 29.72 ± 0.23 | 49.14 ± 0.28 | 39.71 ± 0.24 | 22.74 ± 0.23 | 25.91 ± 0.26 | 38.42 ± 0.23 | 56.33 ± 0.20 |
Aya Expanse 8B CohereLabs | 29.53 ± 0.06 | 3.03 ± 0.13 | 29.70 ± 0.16 | 49.78 ± 0.15 | 44.12 ± 0.18 | 17.05 ± 0.23 | 17.22 ± 0.13 | 45.78 ± 0.17 | 23.32 ± 0.20 |
Babel 83B Alibaba-DAMO | 27.66 ± 0.12 | 9.87 ± 0.23 | 29.79 ± 0.41 | 36.39 ± 0.32 | 28.79 ± 0.27 | 25.82 ± 0.33 | 25.76 ± 0.30 | 37.19 ± 0.32 | 29.06 ± 0.19 |
Apertus 8B Swiss AI | 26.56 ± 0.11 | 9.30 ± 0.21 | 23.09 ± 0.39 | 33.71 ± 0.29 | 38.91 ± 0.31 | 17.21 ± 0.27 | 29.35 ± 0.30 | 34.32 ± 0.34 | 21.38 ± 0.22 |
Babel 9B Alibaba-DAMO | 26.07 ± 0.07 | 8.01 ± 0.18 | 27.56 ± 0.27 | 32.20 ± 0.26 | 31.66 ± 0.22 | 16.70 ± 0.21 | 32.17 ± 0.22 | 34.19 ± 0.20 | 19.95 ± 0.14 |
Llama 3 8B Meta | 25.41 ± 0.04 | 3.20 ± 0.06 | 30.16 ± 0.15 | 38.80 ± 0.15 | 35.77 ± 0.16 | 8.95 ± 0.16 | 25.16 ± 0.15 | 35.81 ± 0.15 | 28.85 ± 0.17 |
SeaLLMs V3 7B Alibaba-DAMO | 25.04 ± 0.08 | 7.13 ± 0.16 | 26.67 ± 0.34 | 32.57 ± 0.22 | 36.82 ± 0.28 | 11.92 ± 0.18 | 29.69 ± 0.27 | 30.44 ± 0.23 | 20.87 ± 0.11 |
Olmo 2 1124 13B AI2 | 22.61 ± 0.06 | 1.84 ± 0.09 | 35.13 ± 0.29 | 36.46 ± 0.26 | 38.54 ± 0.19 | 8.48 ± 0.18 | 14.08 ± 0.18 | 23.71 ± 0.25 | 30.19 ± 0.18 |
Olmo 3 7B AI2 | 20.69 ± 0.09 | 3.41 ± 0.11 | 23.19 ± 0.26 | 31.49 ± 0.24 | 30.18 ± 0.22 | 12.85 ± 0.19 | 15.64 ± 0.24 | 28.06 ± 0.21 | 43.41 ± 0.19 |
Command R7B 12-2024 7B CohereLabs | 20.37 ± 0.10 | 3.19 ± 0.13 | 25.28 ± 0.33 | 33.07 ± 0.25 | 28.08 ± 0.26 | 13.32 ± 0.18 | 14.64 ± 0.20 | 24.98 ± 0.31 | 30.63 ± 0.14 |
Ministral 2410 8B Mistral AI | 18.39 ± 0.10 | 3.90 ± 0.13 | 19.24 ± 0.35 | 28.04 ± 0.21 | 23.78 ± 0.30 | 11.11 ± 0.19 | 17.12 ± 0.22 | 25.51 ± 0.29 | 25.17 ± 0.26 |
Olmo 2 1124 7B AI2 | 15.11 ± 0.06 | 2.18 ± 0.12 | 14.84 ± 0.19 | 25.27 ± 0.27 | 29.81 ± 0.22 | 6.98 ± 0.14 | 11.14 ± 0.19 | 15.57 ± 0.22 | 22.37 ± 0.13 |
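In the rows spot-checked here, the SEA column matches the unweighted mean of the seven Southeast Asian language columns (MY through VI) to within about 0.01, with EN excluded. This relationship is inferred from the numbers and is not stated in the source. The sketch below parses one table row and checks it; the helper names `parse_row` and `sea_mean` are hypothetical.

```python
import re

# Column order after the SEA column, as given in the table header.
LANGS = ["MY", "TL", "ID", "MS", "TA", "TH", "VI", "EN"]


def parse_row(line):
    """Split a pipe-delimited row into (model name, SEA score, per-language scores)."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    name = cells[0]
    # Keep only the point estimate from each "score ± CI" cell.
    values = [float(re.match(r"\d+(?:\.\d+)?", c).group()) for c in cells[1:]]
    return name, values[0], dict(zip(LANGS, values[1:]))


def sea_mean(per_lang):
    """Unweighted mean over the SEA languages (assumption: EN is excluded)."""
    sea_scores = [v for code, v in per_lang.items() if code != "EN"]
    return sum(sea_scores) / len(sea_scores)


row = (
    "SEA-LION v4 (Qwen) 32B AISG | 60.82 ± 0.06 | 49.56 ± 0.14 | 65.35 ± 0.14 | "
    "66.59 ± 0.10 | 61.36 ± 0.14 | 62.30 ± 0.15 | 57.91 ± 0.13 | 62.63 ± 0.14 | "
    "65.30 ± 0.17 |"
)
name, sea, per_lang = parse_row(row)
print(name, sea, round(sea_mean(per_lang), 2))
```

The small residual difference is expected if the published SEA value is averaged before rounding rather than recomputed from the rounded per-language scores.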