Malay Performance
Malay Scores by Model
Scores are averaged over 30 bootstraps; 95% confidence intervals (CI) are shown.
Model Size: ≤200B
Open instruct models only
[Bar chart: overall Malay score per model, with 95% CI. The same values are listed per model in the MS column of the Malay Competencies table.]
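The caption says each score is an average of 30 bootstraps with a 95% CI, but it does not say how the interval is computed. The following is a minimal sketch of one common approach, not the leaderboard's actual procedure. It assumes per-item scores are resampled with replacement and that the half-width is 1.96 times the standard deviation of the bootstrap means. The function name `bootstrap_mean_ci` and its parameters are hypothetical.

```python
import random
import statistics


def bootstrap_mean_ci(item_scores, n_boot=30, seed=0):
    """Return (mean, half_width) over n_boot bootstrap resamples.

    Assumption: the 95% half-width is 1.96 * the standard deviation of the
    bootstrap means (normal approximation). The source does not state its method.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(item_scores)
    boot_means = []
    for _ in range(n_boot):
        # Resample the evaluation items with replacement, same size as the original.
        resample = rng.choices(item_scores, k=n)
        boot_means.append(statistics.fmean(resample))
    mean = statistics.fmean(boot_means)
    half_width = 1.96 * statistics.stdev(boot_means)
    return mean, half_width


if __name__ == "__main__":
    # Illustrative per-item scores (0 or 100), not data from the leaderboard.
    demo_scores = [0.0, 100.0] * 50
    mean, half_width = bootstrap_mean_ci(demo_scores)
    print(f"{mean:.2f} ± {half_width:.2f}")
```

A result would then be reported in the same `mean ± half_width` form used in the tables.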
Malay Competencies
Model | MS (overall) | Instruction Following | Knowledge | Multi-Turn Chat | NLG | NLU | Safety |
|---|---|---|---|---|---|---|---|
Gemma 4 31B | 70.89 ± 0.09 | 89.75 ± 0.42 | 77.06 ± 0.10 | 83.53 ± 0.33 | 92.68 ± 0.01 | 73.18 ± 0.06 | 9.11 ± 0.15 |
Qwen 3.5 122B MoE Alibaba | 69.77 ± 0.13 | 85.02 ± 0.48 | 76.95 ± 0.40 | 85.29 ± 0.39 | 91.93 ± 0.02 | 70.73 ± 0.15 | 8.73 ± 0.40 |
SEA-LION v4.5 (Qwen) 27B AISG | 69.69 ± 0.10 | 85.08 ± 0.42 | 76.73 ± 0.33 | 82.98 ± 0.31 | 91.87 ± 0.02 | 71.04 ± 0.11 | 10.44 ± 0.39 |
Qwen 3.5 27B Alibaba | 69.61 ± 0.20 | 85.84 ± 0.61 | 77.79 ± 0.43 | 82.98 ± 0.32 | 91.63 ± 0.02 | 70.32 ± 0.16 | 9.07 ± 0.58 |
Gemma 4 26B MoE | 68.76 ± 0.08 | 86.03 ± 0.41 | 71.11 ± 0.17 | 84.25 ± 0.28 | 92.31 ± 0.01 | 70.71 ± 0.08 | 8.13 ± 0.14 |
Qwen 3.6 27B Alibaba | 68.45 ± 0.21 | 83.59 ± 0.50 | 74.43 ± 0.50 | 85.39 ± 0.38 | 90.83 ± 0.03 | 70.06 ± 0.19 | 6.41 ± 0.69 |
Qwen 3 VL 32B Alibaba | 68.06 ± 0.10 | 82.67 ± 0.39 | 69.54 ± 0.25 | 82.12 ± 0.31 | 90.60 ± 0.02 | 69.48 ± 0.07 | 13.96 ± 0.19 |
SEA-LION v3 (Llama) 70B AISG | 67.68 ± 0.20 | 84.54 ± 0.66 | 74.78 ± 0.33 | 72.60 ± 0.46 | 91.04 ± 0.02 | 69.54 ± 0.18 | 13.56 ± 0.46 |
SEA-LION v4 (Qwen) 32B AISG | 67.13 ± 0.12 | 79.49 ± 0.56 | 69.21 ± 0.15 | 78.70 ± 0.40 | 90.48 ± 0.02 | 68.41 ± 0.07 | 16.49 ± 0.13 |
Llama 3.3 70B Meta | 67.05 ± 0.12 | 87.14 ± 0.32 | 72.10 ± 0.19 | 65.80 ± 0.44 | 90.08 ± 0.02 | 69.77 ± 0.10 | 17.43 ± 0.21 |
Qwen 3.6 35B MoE Alibaba | 66.72 ± 0.22 | 81.49 ± 0.88 | 70.27 ± 0.54 | 82.56 ± 0.35 | 90.03 ± 0.04 | 68.22 ± 0.24 | 7.75 ± 0.68 |
SEA-LION v4 (Gemma) 27B AISG | 66.72 ± 0.17 | 83.43 ± 0.77 | 64.60 ± 0.27 | 77.57 ± 0.44 | 91.82 ± 0.01 | 68.71 ± 0.10 | 14.19 ± 0.28 |
Gemma 3 27B | 66.47 ± 0.13 | 81.90 ± 0.62 | 64.34 ± 0.21 | 77.75 ± 0.35 | 91.85 ± 0.01 | 68.74 ± 0.12 | 14.23 ± 0.21 |
Mistral Medium 3.5 128B Mistral AI | 66.21 ± 0.13 | 76.29 ± 0.42 | 69.18 ± 0.53 | 81.94 ± 0.43 | 90.09 ± 0.06 | 69.91 ± 0.14 | 9.87 ± 0.41 |
Llama 4 Scout 109B MoE Meta | 65.61 ± 0.14 | 85.81 ± 0.44 | 64.71 ± 0.18 | 69.42 ± 0.53 | 91.15 ± 0.01 | 69.07 ± 0.06 | 13.50 ± 0.12 |
Qwen 3.5 35B MoE Alibaba | 65.52 ± 0.24 | 85.62 ± 0.52 | 56.61 ± 0.94 | 84.14 ± 0.51 | 89.38 ± 0.04 | 66.29 ± 0.25 | 11.07 ± 0.78 |
Qwen 3 VL 8B Alibaba | 64.61 ± 0.10 | 82.60 ± 0.42 | 55.07 ± 0.21 | 77.93 ± 0.33 | 88.30 ± 0.03 | 67.09 ± 0.08 | 16.65 ± 0.24 |
SEA-LION v4 (Qwen VL) 8B AISG | 64.21 ± 0.10 | 84.10 ± 0.39 | 54.97 ± 0.15 | 74.53 ± 0.35 | 88.76 ± 0.02 | 66.76 ± 0.08 | 16.12 ± 0.14 |
Gemma 3 12B | 63.86 ± 0.07 | 79.05 ± 0.00 | 57.26 ± 0.00 | 74.82 ± 0.43 | 91.00 ± 0.00 | 66.65 ± 0.00 | 14.40 ± 0.00 |
Gemma 4 (E4B) 8B | 63.54 ± 0.12 | 84.83 ± 0.43 | 52.98 ± 0.56 | 77.66 ± 0.44 | 90.92 ± 0.02 | 66.40 ± 0.15 | 8.44 ± 0.28 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 62.91 ± 0.20 | 76.83 ± 0.80 | 59.12 ± 0.61 | 82.36 ± 0.42 | 83.84 ± 0.04 | 63.05 ± 0.25 | 12.28 ± 0.42 |
SEA-LION v3 (Gemma 2) 9B AISG | 61.58 ± 0.23 | 80.98 ± 1.01 | 53.93 ± 0.40 | 66.09 ± 0.48 | 89.95 ± 0.04 | 66.46 ± 0.13 | 12.05 ± 0.36 |
SEA-LION v4 (Qwen VL) 4B AISG | 60.58 ± 0.15 | 79.40 ± 0.53 | 47.14 ± 0.23 | 70.68 ± 0.53 | 86.31 ± 0.03 | 64.85 ± 0.07 | 15.09 ± 0.11 |
Qwen 3 VL 4B Alibaba | 59.90 ± 0.16 | 81.11 ± 0.59 | 45.03 ± 0.30 | 69.88 ± 0.52 | 86.09 ± 0.03 | 64.53 ± 0.07 | 12.77 ± 0.11 |
Qwen 3.5 9B Alibaba | 59.22 ± 0.23 | 78.19 ± 0.77 | 52.93 ± 0.70 | 76.02 ± 0.51 | 81.43 ± 0.12 | 63.99 ± 0.27 | 2.75 ± 0.49 |
Mistral Small 4 119B MoE Mistral AI | 58.87 ± 0.18 | 68.51 ± 0.89 | 53.30 ± 0.51 | 74.24 ± 0.59 | 82.34 ± 0.13 | 62.81 ± 0.23 | 12.03 ± 0.39 |
SEA-LION v3 (Llama) 8B AISG | 58.08 ± 0.20 | 80.19 ± 0.85 | 45.34 ± 0.51 | 58.75 ± 0.70 | 89.48 ± 0.03 | 60.61 ± 0.20 | 14.12 ± 0.44 |
Gemma 4 (E2B) 5B | 57.63 ± 0.18 | 81.94 ± 0.68 | 38.32 ± 0.36 | 73.67 ± 0.43 | 89.21 ± 0.01 | 58.53 ± 0.12 | 4.13 ± 0.19 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 57.35 ± 0.15 | 81.71 ± 0.72 | 38.33 ± 0.37 | 71.89 ± 0.47 | 89.27 ± 0.02 | 59.21 ± 0.12 | 3.67 ± 0.20 |
SEA-LION v4 (Gemma VL) 4B AISG | 57.10 ± 0.13 | 79.05 ± 0.51 | 37.29 ± 0.13 | 66.77 ± 0.48 | 88.19 ± 0.02 | 57.28 ± 0.13 | 14.01 ± 0.16 |
MERaLiON 2 10B A*STAR | 55.68 ± 0.22 | 62.89 ± 1.01 | 50.90 ± 0.40 | 54.85 ± 0.58 | 86.11 ± 0.06 | 64.53 ± 0.15 | 14.80 ± 0.33 |
Olmo 3.1 32B AI2 | 55.07 ± 0.16 | 70.67 ± 0.55 | 40.51 ± 0.49 | 71.32 ± 0.53 | 86.15 ± 0.04 | 56.12 ± 0.18 | 5.63 ± 0.22 |
Gemma 3 4B | 54.74 ± 0.15 | 76.38 ± 0.72 | 36.16 ± 0.18 | 64.22 ± 0.55 | 87.10 ± 0.05 | 53.61 ± 0.11 | 10.99 ± 0.13 |
Qwen 3.5 4B Alibaba | 52.38 ± 0.25 | 69.94 ± 1.08 | 43.16 ± 0.73 | 64.43 ± 0.73 | 72.73 ± 0.13 | 60.10 ± 0.30 | 3.90 ± 0.63 |
Llama 3.1 8B Meta | 51.69 ± 0.15 | 62.57 ± 0.85 | 36.65 ± 0.47 | 50.07 ± 0.52 | 87.30 ± 0.04 | 58.87 ± 0.15 | 14.66 ± 0.29 |
GLM 4.7 Flash 30B MoE Z.ai | 51.42 ± 0.34 | 69.08 ± 1.16 | 36.35 ± 0.78 | 54.57 ± 0.83 | 79.67 ± 0.14 | 52.38 ± 0.30 | 16.45 ± 0.62 |
SEA-LION v4 (Apertus) 8B AISG | 50.92 ± 0.15 | 63.75 ± 0.73 | 37.88 ± 0.32 | 46.62 ± 0.57 | 89.29 ± 0.02 | 55.27 ± 0.15 | 12.72 ± 0.33 |
Apertus 8B Swiss AI | 45.08 ± 0.32 | 64.63 ± 0.83 | 31.33 ± 0.72 | 41.75 ± 0.60 | 86.18 ± 0.07 | 36.81 ± 0.39 | 9.78 ± 0.90 |
Llama 3.2 3B Meta | 43.00 ± 0.24 | 61.37 ± 0.98 | 23.99 ± 0.62 | 42.03 ± 0.55 | 76.70 ± 0.07 | 47.44 ± 0.23 | 6.46 ± 0.28 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 38.61 ± 0.24 | 58.79 ± 1.23 | 27.05 ± 0.79 | 61.07 ± 0.66 | 66.28 ± 0.13 | 18.48 ± 0.25 | 0.00 ± 0.00 |
Tiny Aya Water 3B CohereLabs | 35.94 ± 0.28 | 64.51 ± 1.27 | 33.09 ± 0.73 | 29.12 ± 0.61 | 59.21 ± 0.24 | 25.39 ± 0.53 | 4.35 ± 0.76 |
Tiny Aya Global 3B CohereLabs | 35.33 ± 0.30 | 62.16 ± 1.60 | 32.61 ± 0.75 | 26.48 ± 0.67 | 59.43 ± 0.22 | 27.68 ± 0.39 | 3.59 ± 0.54 |
MERaLiON 2 3B A*STAR | 31.94 ± 0.17 | 43.37 ± 0.87 | 11.78 ± 0.55 | 28.87 ± 0.38 | 74.02 ± 0.11 | 26.61 ± 0.23 | 6.99 ± 0.34 |
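The values in the table above are consistent with the MS column being the unweighted mean of the six competency scores. This is inferred from the numbers, not stated by the source. The sketch below recomputes it for three rows copied from the table; the helper name `overall_score` is hypothetical.

```python
from statistics import fmean

# Competency order: Instruction Following, Knowledge, Multi-Turn Chat, NLG, NLU, Safety.
# Each entry is (reported MS, [six competency scores]), copied from the table above.
rows = {
    "Gemma 4 31B": (70.89, [89.75, 77.06, 83.53, 92.68, 73.18, 9.11]),
    "Qwen 3.5 122B MoE": (69.77, [85.02, 76.95, 85.29, 91.93, 70.73, 8.73]),
    "MERaLiON 2 3B": (31.94, [43.37, 11.78, 28.87, 74.02, 26.61, 6.99]),
}


def overall_score(competency_scores):
    """Unweighted mean of the competency scores (assumed aggregation rule)."""
    return fmean(competency_scores)


if __name__ == "__main__":
    for name, (reported, scores) in rows.items():
        recomputed = overall_score(scores)
        print(f"{name}: reported {reported:.2f}, recomputed {recomputed:.3f}")
```

For these rows the recomputed means match the reported MS values to within 0.01, which is the rounding precision of the table. One consequence is that the low Safety scores pull every model's overall score down by a similar amount.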
Malay Tasks
Model | MS (overall) | Instruction Following | SEA-IFEval |
|---|---|---|---|
Gemma 4 31B | 70.89 ± 0.09 | 89.75 ± 0.42 | 89.75 ± 0.42 |
Qwen 3.5 122B MoE Alibaba | 69.77 ± 0.13 | 85.02 ± 0.48 | 85.02 ± 0.48 |
SEA-LION v4.5 (Qwen) 27B AISG | 69.69 ± 0.10 | 85.08 ± 0.42 | 85.08 ± 0.42 |
Qwen 3.5 27B Alibaba | 69.61 ± 0.20 | 85.84 ± 0.61 | 85.84 ± 0.61 |
Gemma 4 26B MoE | 68.76 ± 0.08 | 86.03 ± 0.41 | 86.03 ± 0.41 |
Qwen 3.6 27B Alibaba | 68.45 ± 0.21 | 83.59 ± 0.50 | 83.59 ± 0.50 |
Qwen 3 VL 32B Alibaba | 68.06 ± 0.10 | 82.67 ± 0.39 | 82.67 ± 0.39 |
SEA-LION v3 (Llama) 70B AISG | 67.68 ± 0.20 | 84.54 ± 0.66 | 84.54 ± 0.66 |
SEA-LION v4 (Qwen) 32B AISG | 67.13 ± 0.12 | 79.49 ± 0.56 | 79.49 ± 0.56 |
Llama 3.3 70B Meta | 67.05 ± 0.12 | 87.14 ± 0.32 | 87.14 ± 0.32 |
Qwen 3.6 35B MoE Alibaba | 66.72 ± 0.22 | 81.49 ± 0.88 | 81.49 ± 0.88 |
SEA-LION v4 (Gemma) 27B AISG | 66.72 ± 0.17 | 83.43 ± 0.77 | 83.43 ± 0.77 |
Gemma 3 27B | 66.47 ± 0.13 | 81.90 ± 0.62 | 81.90 ± 0.62 |
Mistral Medium 3.5 128B Mistral AI | 66.21 ± 0.13 | 76.29 ± 0.42 | 76.29 ± 0.42 |
Llama 4 Scout 109B MoE Meta | 65.61 ± 0.14 | 85.81 ± 0.44 | 85.81 ± 0.44 |
Qwen 3.5 35B MoE Alibaba | 65.52 ± 0.24 | 85.62 ± 0.52 | 85.62 ± 0.52 |
Qwen 3 VL 8B Alibaba | 64.61 ± 0.10 | 82.60 ± 0.42 | 82.60 ± 0.42 |
SEA-LION v4 (Qwen VL) 8B AISG | 64.21 ± 0.10 | 84.10 ± 0.39 | 84.10 ± 0.39 |
Gemma 3 12B | 63.86 ± 0.07 | 79.05 ± 0.00 | 79.05 ± 0.00 |
Gemma 4 (E4B) 8B | 63.54 ± 0.12 | 84.83 ± 0.43 | 84.83 ± 0.43 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 62.91 ± 0.20 | 76.83 ± 0.80 | 76.83 ± 0.80 |
SEA-LION v3 (Gemma 2) 9B AISG | 61.58 ± 0.23 | 80.98 ± 1.01 | 80.98 ± 1.01 |
SEA-LION v4 (Qwen VL) 4B AISG | 60.58 ± 0.15 | 79.40 ± 0.53 | 79.40 ± 0.53 |
Qwen 3 VL 4B Alibaba | 59.90 ± 0.16 | 81.11 ± 0.59 | 81.11 ± 0.59 |
Qwen 3.5 9B Alibaba | 59.22 ± 0.23 | 78.19 ± 0.77 | 78.19 ± 0.77 |
Mistral Small 4 119B MoE Mistral AI | 58.87 ± 0.18 | 68.51 ± 0.89 | 68.51 ± 0.89 |
SEA-LION v3 (Llama) 8B AISG | 58.08 ± 0.20 | 80.19 ± 0.85 | 80.19 ± 0.85 |
Gemma 4 (E2B) 5B | 57.63 ± 0.18 | 81.94 ± 0.68 | 81.94 ± 0.68 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 57.35 ± 0.15 | 81.71 ± 0.72 | 81.71 ± 0.72 |
SEA-LION v4 (Gemma VL) 4B AISG | 57.10 ± 0.13 | 79.05 ± 0.51 | 79.05 ± 0.51 |
MERaLiON 2 10B A*STAR | 55.68 ± 0.22 | 62.89 ± 1.01 | 62.89 ± 1.01 |
Olmo 3.1 32B AI2 | 55.07 ± 0.16 | 70.67 ± 0.55 | 70.67 ± 0.55 |
Gemma 3 4B | 54.74 ± 0.15 | 76.38 ± 0.72 | 76.38 ± 0.72 |
Qwen 3.5 4B Alibaba | 52.38 ± 0.25 | 69.94 ± 1.08 | 69.94 ± 1.08 |
Llama 3.1 8B Meta | 51.69 ± 0.15 | 62.57 ± 0.85 | 62.57 ± 0.85 |
GLM 4.7 Flash 30B MoE Z.ai | 51.42 ± 0.34 | 69.08 ± 1.16 | 69.08 ± 1.16 |
SEA-LION v4 (Apertus) 8B AISG | 50.92 ± 0.15 | 63.75 ± 0.73 | 63.75 ± 0.73 |
Apertus 8B Swiss AI | 45.08 ± 0.32 | 64.63 ± 0.83 | 64.63 ± 0.83 |
Llama 3.2 3B Meta | 43.00 ± 0.24 | 61.37 ± 0.98 | 61.37 ± 0.98 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 38.61 ± 0.24 | 58.79 ± 1.23 | 58.79 ± 1.23 |
Tiny Aya Water 3B CohereLabs | 35.94 ± 0.28 | 64.51 ± 1.27 | 64.51 ± 1.27 |
Tiny Aya Global 3B CohereLabs | 35.33 ± 0.30 | 62.16 ± 1.60 | 62.16 ± 1.60 |
MERaLiON 2 3B A*STAR | 31.94 ± 0.17 | 43.37 ± 0.87 | 43.37 ± 0.87 |
Model | MS (overall) | Knowledge | Global MMLU Lite |
|---|---|---|---|
Gemma 4 31B | 70.89 ± 0.09 | 77.06 ± 0.10 | 77.06 ± 0.10 |
Qwen 3.5 122B MoE Alibaba | 69.77 ± 0.13 | 76.95 ± 0.40 | 76.95 ± 0.40 |
SEA-LION v4.5 (Qwen) 27B AISG | 69.69 ± 0.10 | 76.73 ± 0.33 | 76.73 ± 0.33 |
Qwen 3.5 27B Alibaba | 69.61 ± 0.20 | 77.79 ± 0.43 | 77.79 ± 0.43 |
Gemma 4 26B MoE | 68.76 ± 0.08 | 71.11 ± 0.17 | 71.11 ± 0.17 |
Qwen 3.6 27B Alibaba | 68.45 ± 0.21 | 74.43 ± 0.50 | 74.43 ± 0.50 |
Qwen 3 VL 32B Alibaba | 68.06 ± 0.10 | 69.54 ± 0.25 | 69.54 ± 0.25 |
SEA-LION v3 (Llama) 70B AISG | 67.68 ± 0.20 | 74.78 ± 0.33 | 74.78 ± 0.33 |
SEA-LION v4 (Qwen) 32B AISG | 67.13 ± 0.12 | 69.21 ± 0.15 | 69.21 ± 0.15 |
Llama 3.3 70B Meta | 67.05 ± 0.12 | 72.10 ± 0.19 | 72.10 ± 0.19 |
Qwen 3.6 35B MoE Alibaba | 66.72 ± 0.22 | 70.27 ± 0.54 | 70.27 ± 0.54 |
SEA-LION v4 (Gemma) 27B AISG | 66.72 ± 0.17 | 64.60 ± 0.27 | 64.60 ± 0.27 |
Gemma 3 27B | 66.47 ± 0.13 | 64.34 ± 0.21 | 64.34 ± 0.21 |
Mistral Medium 3.5 128B Mistral AI | 66.21 ± 0.13 | 69.18 ± 0.53 | 69.18 ± 0.53 |
Llama 4 Scout 109B MoE Meta | 65.61 ± 0.14 | 64.71 ± 0.18 | 64.71 ± 0.18 |
Qwen 3.5 35B MoE Alibaba | 65.52 ± 0.24 | 56.61 ± 0.94 | 56.61 ± 0.94 |
Qwen 3 VL 8B Alibaba | 64.61 ± 0.10 | 55.07 ± 0.21 | 55.07 ± 0.21 |
SEA-LION v4 (Qwen VL) 8B AISG | 64.21 ± 0.10 | 54.97 ± 0.15 | 54.97 ± 0.15 |
Gemma 3 12B | 63.86 ± 0.07 | 57.26 ± 0.00 | 57.26 ± 0.00 |
Gemma 4 (E4B) 8B | 63.54 ± 0.12 | 52.98 ± 0.56 | 52.98 ± 0.56 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 62.91 ± 0.20 | 59.12 ± 0.61 | 59.12 ± 0.61 |
SEA-LION v3 (Gemma 2) 9B AISG | 61.58 ± 0.23 | 53.93 ± 0.40 | 53.93 ± 0.40 |
SEA-LION v4 (Qwen VL) 4B AISG | 60.58 ± 0.15 | 47.14 ± 0.23 | 47.14 ± 0.23 |
Qwen 3 VL 4B Alibaba | 59.90 ± 0.16 | 45.03 ± 0.30 | 45.03 ± 0.30 |
Qwen 3.5 9B Alibaba | 59.22 ± 0.23 | 52.93 ± 0.70 | 52.93 ± 0.70 |
Mistral Small 4 119B MoE Mistral AI | 58.87 ± 0.18 | 53.30 ± 0.51 | 53.30 ± 0.51 |
SEA-LION v3 (Llama) 8B AISG | 58.08 ± 0.20 | 45.34 ± 0.51 | 45.34 ± 0.51 |
Gemma 4 (E2B) 5B | 57.63 ± 0.18 | 38.32 ± 0.36 | 38.32 ± 0.36 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 57.35 ± 0.15 | 38.33 ± 0.37 | 38.33 ± 0.37 |
SEA-LION v4 (Gemma VL) 4B AISG | 57.10 ± 0.13 | 37.29 ± 0.13 | 37.29 ± 0.13 |
MERaLiON 2 10B A*STAR | 55.68 ± 0.22 | 50.90 ± 0.40 | 50.90 ± 0.40 |
Olmo 3.1 32B AI2 | 55.07 ± 0.16 | 40.51 ± 0.49 | 40.51 ± 0.49 |
Gemma 3 4B | 54.74 ± 0.15 | 36.16 ± 0.18 | 36.16 ± 0.18 |
Qwen 3.5 4B Alibaba | 52.38 ± 0.25 | 43.16 ± 0.73 | 43.16 ± 0.73 |
Llama 3.1 8B Meta | 51.69 ± 0.15 | 36.65 ± 0.47 | 36.65 ± 0.47 |
GLM 4.7 Flash 30B MoE Z.ai | 51.42 ± 0.34 | 36.35 ± 0.78 | 36.35 ± 0.78 |
SEA-LION v4 (Apertus) 8B AISG | 50.92 ± 0.15 | 37.88 ± 0.32 | 37.88 ± 0.32 |
Apertus 8B Swiss AI | 45.08 ± 0.32 | 31.33 ± 0.72 | 31.33 ± 0.72 |
Llama 3.2 3B Meta | 43.00 ± 0.24 | 23.99 ± 0.62 | 23.99 ± 0.62 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 38.61 ± 0.24 | 27.05 ± 0.79 | 27.05 ± 0.79 |
Tiny Aya Water 3B CohereLabs | 35.94 ± 0.28 | 33.09 ± 0.73 | 33.09 ± 0.73 |
Tiny Aya Global 3B CohereLabs | 35.33 ± 0.30 | 32.61 ± 0.75 | 32.61 ± 0.75 |
MERaLiON 2 3B A*STAR | 31.94 ± 0.17 | 11.78 ± 0.55 | 11.78 ± 0.55 |
Model | MS (overall) | Multi-Turn Chat | SEA-MT-Bench (LLM Judge) |
|---|---|---|---|
Gemma 4 31B | 70.89 ± 0.09 | 83.53 ± 0.33 | 83.53 ± 0.33 |
Qwen 3.5 122B MoE Alibaba | 69.77 ± 0.13 | 85.29 ± 0.39 | 85.29 ± 0.39 |
SEA-LION v4.5 (Qwen) 27B AISG | 69.69 ± 0.10 | 82.98 ± 0.31 | 82.98 ± 0.31 |
Qwen 3.5 27B Alibaba | 69.61 ± 0.20 | 82.98 ± 0.32 | 82.98 ± 0.32 |
Gemma 4 26B MoE | 68.76 ± 0.08 | 84.25 ± 0.28 | 84.25 ± 0.28 |
Qwen 3.6 27B Alibaba | 68.45 ± 0.21 | 85.39 ± 0.38 | 85.39 ± 0.38 |
Qwen 3 VL 32B Alibaba | 68.06 ± 0.10 | 82.12 ± 0.31 | 82.12 ± 0.31 |
SEA-LION v3 (Llama) 70B AISG | 67.68 ± 0.20 | 72.60 ± 0.46 | 72.60 ± 0.46 |
SEA-LION v4 (Qwen) 32B AISG | 67.13 ± 0.12 | 78.70 ± 0.40 | 78.70 ± 0.40 |
Llama 3.3 70B Meta | 67.05 ± 0.12 | 65.80 ± 0.44 | 65.80 ± 0.44 |
Qwen 3.6 35B MoE Alibaba | 66.72 ± 0.22 | 82.56 ± 0.35 | 82.56 ± 0.35 |
SEA-LION v4 (Gemma) 27B AISG | 66.72 ± 0.17 | 77.57 ± 0.44 | 77.57 ± 0.44 |
Gemma 3 27B | 66.47 ± 0.13 | 77.75 ± 0.35 | 77.75 ± 0.35 |
Mistral Medium 3.5 128B Mistral AI | 66.21 ± 0.13 | 81.94 ± 0.43 | 81.94 ± 0.43 |
Llama 4 Scout 109B MoE Meta | 65.61 ± 0.14 | 69.42 ± 0.53 | 69.42 ± 0.53 |
Qwen 3.5 35B MoE Alibaba | 65.52 ± 0.24 | 84.14 ± 0.51 | 84.14 ± 0.51 |
Qwen 3 VL 8B Alibaba | 64.61 ± 0.10 | 77.93 ± 0.33 | 77.93 ± 0.33 |
SEA-LION v4 (Qwen VL) 8B AISG | 64.21 ± 0.10 | 74.53 ± 0.35 | 74.53 ± 0.35 |
Gemma 3 12B | 63.86 ± 0.07 | 74.82 ± 0.43 | 74.82 ± 0.43 |
Gemma 4 (E4B) 8B | 63.54 ± 0.12 | 77.66 ± 0.44 | 77.66 ± 0.44 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 62.91 ± 0.20 | 82.36 ± 0.42 | 82.36 ± 0.42 |
SEA-LION v3 (Gemma 2) 9B AISG | 61.58 ± 0.23 | 66.09 ± 0.48 | 66.09 ± 0.48 |
SEA-LION v4 (Qwen VL) 4B AISG | 60.58 ± 0.15 | 70.68 ± 0.53 | 70.68 ± 0.53 |
Qwen 3 VL 4B Alibaba | 59.90 ± 0.16 | 69.88 ± 0.52 | 69.88 ± 0.52 |
Qwen 3.5 9B Alibaba | 59.22 ± 0.23 | 76.02 ± 0.51 | 76.02 ± 0.51 |
Mistral Small 4 119B MoE Mistral AI | 58.87 ± 0.18 | 74.24 ± 0.59 | 74.24 ± 0.59 |
SEA-LION v3 (Llama) 8B AISG | 58.08 ± 0.20 | 58.75 ± 0.70 | 58.75 ± 0.70 |
Gemma 4 (E2B) 5B | 57.63 ± 0.18 | 73.67 ± 0.43 | 73.67 ± 0.43 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 57.35 ± 0.15 | 71.89 ± 0.47 | 71.89 ± 0.47 |
SEA-LION v4 (Gemma VL) 4B AISG | 57.10 ± 0.13 | 66.77 ± 0.48 | 66.77 ± 0.48 |
MERaLiON 2 10B A*STAR | 55.68 ± 0.22 | 54.85 ± 0.58 | 54.85 ± 0.58 |
Olmo 3.1 32B AI2 | 55.07 ± 0.16 | 71.32 ± 0.53 | 71.32 ± 0.53 |
Gemma 3 4B | 54.74 ± 0.15 | 64.22 ± 0.55 | 64.22 ± 0.55 |
Qwen 3.5 4B Alibaba | 52.38 ± 0.25 | 64.43 ± 0.73 | 64.43 ± 0.73 |
Llama 3.1 8B Meta | 51.69 ± 0.15 | 50.07 ± 0.52 | 50.07 ± 0.52 |
GLM 4.7 Flash 30B MoE Z.ai | 51.42 ± 0.34 | 54.57 ± 0.83 | 54.57 ± 0.83 |
SEA-LION v4 (Apertus) 8B AISG | 50.92 ± 0.15 | 46.62 ± 0.57 | 46.62 ± 0.57 |
Apertus 8B Swiss AI | 45.08 ± 0.32 | 41.75 ± 0.60 | 41.75 ± 0.60 |
Llama 3.2 3B Meta | 43.00 ± 0.24 | 42.03 ± 0.55 | 42.03 ± 0.55 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 38.61 ± 0.24 | 61.07 ± 0.66 | 61.07 ± 0.66 |
Tiny Aya Water 3B CohereLabs | 35.94 ± 0.28 | 29.12 ± 0.61 | 29.12 ± 0.61 |
Tiny Aya Global 3B CohereLabs | 35.33 ± 0.30 | 26.48 ± 0.67 | 26.48 ± 0.67 |
MERaLiON 2 3B A*STAR | 31.94 ± 0.17 | 28.87 ± 0.38 | 28.87 ± 0.38 |
Model | MS (overall) | NLG | Translations |
|---|---|---|---|
Gemma 4 31B | 70.89 ± 0.09 | 92.68 ± 0.01 | 92.68 ± 0.01 |
Qwen 3.5 122B MoE Alibaba | 69.77 ± 0.13 | 91.93 ± 0.02 | 91.93 ± 0.02 |
SEA-LION v4.5 (Qwen) 27B AISG | 69.69 ± 0.10 | 91.87 ± 0.02 | 91.87 ± 0.02 |
Qwen 3.5 27B Alibaba | 69.61 ± 0.20 | 91.63 ± 0.02 | 91.63 ± 0.02 |
Gemma 4 26B MoE | 68.76 ± 0.08 | 92.31 ± 0.01 | 92.31 ± 0.01 |
Qwen 3.6 27B Alibaba | 68.45 ± 0.21 | 90.83 ± 0.03 | 90.83 ± 0.03 |
Qwen 3 VL 32B Alibaba | 68.06 ± 0.10 | 90.60 ± 0.02 | 90.60 ± 0.02 |
SEA-LION v3 (Llama) 70B AISG | 67.68 ± 0.20 | 91.04 ± 0.02 | 91.04 ± 0.02 |
SEA-LION v4 (Qwen) 32B AISG | 67.13 ± 0.12 | 90.48 ± 0.02 | 90.48 ± 0.02 |
Llama 3.3 70B Meta | 67.05 ± 0.12 | 90.08 ± 0.02 | 90.08 ± 0.02 |
Qwen 3.6 35B MoE Alibaba | 66.72 ± 0.22 | 90.03 ± 0.04 | 90.03 ± 0.04 |
SEA-LION v4 (Gemma) 27B AISG | 66.72 ± 0.17 | 91.82 ± 0.01 | 91.82 ± 0.01 |
Gemma 3 27B | 66.47 ± 0.13 | 91.85 ± 0.01 | 91.85 ± 0.01 |
Mistral Medium 3.5 128B Mistral AI | 66.21 ± 0.13 | 90.09 ± 0.06 | 90.09 ± 0.06 |
Llama 4 Scout 109B MoE Meta | 65.61 ± 0.14 | 91.15 ± 0.01 | 91.15 ± 0.01 |
Qwen 3.5 35B MoE Alibaba | 65.52 ± 0.24 | 89.38 ± 0.04 | 89.38 ± 0.04 |
Qwen 3 VL 8B Alibaba | 64.61 ± 0.10 | 88.30 ± 0.03 | 88.30 ± 0.03 |
SEA-LION v4 (Qwen VL) 8B AISG | 64.21 ± 0.10 | 88.76 ± 0.02 | 88.76 ± 0.02 |
Gemma 3 12B | 63.86 ± 0.07 | 91.00 ± 0.00 | 91.00 ± 0.00 |
Gemma 4 (E4B) 8B | 63.54 ± 0.12 | 90.92 ± 0.02 | 90.92 ± 0.02 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 62.91 ± 0.20 | 83.84 ± 0.04 | 83.84 ± 0.04 |
SEA-LION v3 (Gemma 2) 9B AISG | 61.58 ± 0.23 | 89.95 ± 0.04 | 89.95 ± 0.04 |
SEA-LION v4 (Qwen VL) 4B AISG | 60.58 ± 0.15 | 86.31 ± 0.03 | 86.31 ± 0.03 |
Qwen 3 VL 4B Alibaba | 59.90 ± 0.16 | 86.09 ± 0.03 | 86.09 ± 0.03 |
Qwen 3.5 9B Alibaba | 59.22 ± 0.23 | 81.43 ± 0.12 | 81.43 ± 0.12 |
Mistral Small 4 119B MoE Mistral AI | 58.87 ± 0.18 | 82.34 ± 0.13 | 82.34 ± 0.13 |
SEA-LION v3 (Llama) 8B AISG | 58.08 ± 0.20 | 89.48 ± 0.03 | 89.48 ± 0.03 |
Gemma 4 (E2B) 5B | 57.63 ± 0.18 | 89.21 ± 0.01 | 89.21 ± 0.01 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 57.35 ± 0.15 | 89.27 ± 0.02 | 89.27 ± 0.02 |
SEA-LION v4 (Gemma VL) 4B AISG | 57.10 ± 0.13 | 88.19 ± 0.02 | 88.19 ± 0.02 |
MERaLiON 2 10B A*STAR | 55.68 ± 0.22 | 86.11 ± 0.06 | 86.11 ± 0.06 |
Olmo 3.1 32B AI2 | 55.07 ± 0.16 | 86.15 ± 0.04 | 86.15 ± 0.04 |
Gemma 3 4B | 54.74 ± 0.15 | 87.10 ± 0.05 | 87.10 ± 0.05 |
Qwen 3.5 4B Alibaba | 52.38 ± 0.25 | 72.73 ± 0.13 | 72.73 ± 0.13 |
Llama 3.1 8B Meta | 51.69 ± 0.15 | 87.30 ± 0.04 | 87.30 ± 0.04 |
GLM 4.7 Flash 30B MoE Z.ai | 51.42 ± 0.34 | 79.67 ± 0.14 | 79.67 ± 0.14 |
SEA-LION v4 (Apertus) 8B AISG | 50.92 ± 0.15 | 89.29 ± 0.02 | 89.29 ± 0.02 |
Apertus 8B Swiss AI | 45.08 ± 0.32 | 86.18 ± 0.07 | 86.18 ± 0.07 |
Llama 3.2 3B Meta | 43.00 ± 0.24 | 76.70 ± 0.07 | 76.70 ± 0.07 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 38.61 ± 0.24 | 66.28 ± 0.13 | 66.28 ± 0.13 |
Tiny Aya Water 3B CohereLabs | 35.94 ± 0.28 | 59.21 ± 0.24 | 59.21 ± 0.24 |
Tiny Aya Global 3B CohereLabs | 35.33 ± 0.30 | 59.43 ± 0.22 | 59.43 ± 0.22 |
MERaLiON 2 3B A*STAR | 31.94 ± 0.17 | 74.02 ± 0.11 | 74.02 ± 0.11 |
Model | MS (overall) | NLU | Belebele QA | Sentiment Analysis |
|---|---|---|---|---|
Gemma 4 31B | 70.89 ± 0.09 | 73.18 ± 0.06 | 93.25 ± 0.05 | 53.12 ± 0.10 |
Qwen 3.5 122B MoE Alibaba | 69.77 ± 0.13 | 70.73 ± 0.15 | 92.15 ± 0.13 | 49.31 ± 0.29 |
SEA-LION v4.5 (Qwen) 27B AISG | 69.69 ± 0.10 | 71.04 ± 0.11 | 91.97 ± 0.10 | 50.11 ± 0.18 |
Qwen 3.5 27B Alibaba | 69.61 ± 0.20 | 70.32 ± 0.16 | 91.64 ± 0.11 | 49.00 ± 0.32 |
Gemma 4 26B MoE | 68.76 ± 0.08 | 70.71 ± 0.08 | 91.08 ± 0.10 | 50.35 ± 0.14 |
Qwen 3.6 27B Alibaba | 68.45 ± 0.21 | 70.06 ± 0.19 | 91.58 ± 0.14 | 48.54 ± 0.42 |
Qwen 3 VL 32B Alibaba | 68.06 ± 0.10 | 69.48 ± 0.07 | 91.33 ± 0.06 | 47.63 ± 0.14 |
SEA-LION v3 (Llama) 70B AISG | 67.68 ± 0.20 | 69.54 ± 0.18 | 90.70 ± 0.15 | 48.38 ± 0.41 |
SEA-LION v4 (Qwen) 32B AISG | 67.13 ± 0.12 | 68.41 ± 0.07 | 90.23 ± 0.05 | 46.58 ± 0.12 |
Llama 3.3 70B Meta | 67.05 ± 0.12 | 69.77 ± 0.10 | 90.95 ± 0.08 | 48.58 ± 0.20 |
Qwen 3.6 35B MoE Alibaba | 66.72 ± 0.22 | 68.22 ± 0.24 | 89.60 ± 0.23 | 46.84 ± 0.36 |
SEA-LION v4 (Gemma) 27B AISG | 66.72 ± 0.17 | 68.71 ± 0.10 | 88.31 ± 0.08 | 49.11 ± 0.18 |
Gemma 3 27B | 66.47 ± 0.13 | 68.74 ± 0.12 | 87.96 ± 0.09 | 49.52 ± 0.18 |
Mistral Medium 3.5 128B Mistral AI | 66.21 ± 0.13 | 69.91 ± 0.14 | 91.02 ± 0.16 | 48.80 ± 0.24 |
Llama 4 Scout 109B MoE Meta | 65.61 ± 0.14 | 69.07 ± 0.06 | 89.58 ± 0.05 | 48.55 ± 0.10 |
Qwen 3.5 35B MoE Alibaba | 65.52 ± 0.24 | 66.29 ± 0.25 | 88.42 ± 0.26 | 44.15 ± 0.38 |
Qwen 3 VL 8B Alibaba | 64.61 ± 0.10 | 67.09 ± 0.08 | 86.48 ± 0.07 | 47.70 ± 0.16 |
SEA-LION v4 (Qwen VL) 8B AISG | 64.21 ± 0.10 | 66.76 ± 0.08 | 86.06 ± 0.06 | 47.46 ± 0.14 |
Gemma 3 12B | 63.86 ± 0.07 | 66.65 ± 0.00 | 85.18 ± 0.00 | 48.13 ± 0.00 |
Gemma 4 (E4B) 8B | 63.54 ± 0.12 | 66.40 ± 0.15 | 85.24 ± 0.18 | 47.56 ± 0.23 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 62.91 ± 0.20 | 63.05 ± 0.25 | 86.73 ± 0.22 | 39.37 ± 0.51 |
SEA-LION v3 (Gemma 2) 9B AISG | 61.58 ± 0.23 | 66.46 ± 0.13 | 87.87 ± 0.09 | 45.05 ± 0.26 |
SEA-LION v4 (Qwen VL) 4B AISG | 60.58 ± 0.15 | 64.85 ± 0.07 | 84.41 ± 0.07 | 45.29 ± 0.11 |
Qwen 3 VL 4B Alibaba | 59.90 ± 0.16 | 64.53 ± 0.07 | 83.40 ± 0.06 | 45.65 ± 0.14 |
Qwen 3.5 9B Alibaba | 59.22 ± 0.23 | 63.99 ± 0.27 | 84.17 ± 0.28 | 43.81 ± 0.55 |
Mistral Small 4 119B MoE Mistral AI | 58.87 ± 0.18 | 62.81 ± 0.23 | 83.39 ± 0.25 | 42.23 ± 0.37 |
SEA-LION v3 (Llama) 8B AISG | 58.08 ± 0.20 | 60.61 ± 0.20 | 78.75 ± 0.23 | 42.48 ± 0.38 |
Gemma 4 (E2B) 5B | 57.63 ± 0.18 | 58.53 ± 0.12 | 72.36 ± 0.11 | 44.70 ± 0.18 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 57.35 ± 0.15 | 59.21 ± 0.12 | 72.92 ± 0.10 | 45.49 ± 0.19 |
SEA-LION v4 (Gemma VL) 4B AISG | 57.10 ± 0.13 | 57.28 ± 0.13 | 69.94 ± 0.11 | 44.63 ± 0.19 |
MERaLiON 2 10B A*STAR | 55.68 ± 0.22 | 64.53 ± 0.15 | 83.85 ± 0.13 | 45.21 ± 0.31 |
Olmo 3.1 32B AI2 | 55.07 ± 0.16 | 56.12 ± 0.18 | 67.88 ± 0.31 | 44.36 ± 0.25 |
Gemma 3 4B | 54.74 ± 0.15 | 53.61 ± 0.11 | 63.44 ± 0.12 | 43.78 ± 0.22 |
Qwen 3.5 4B Alibaba | 52.38 ± 0.25 | 60.10 ± 0.30 | 80.23 ± 0.38 | 39.98 ± 0.47 |
Llama 3.1 8B Meta | 51.69 ± 0.15 | 58.87 ± 0.15 | 75.95 ± 0.20 | 41.78 ± 0.23 |
GLM 4.7 Flash 30B MoE Z.ai | 51.42 ± 0.34 | 52.38 ± 0.30 | 68.16 ± 0.41 | 36.60 ± 0.44 |
SEA-LION v4 (Apertus) 8B AISG | 50.92 ± 0.15 | 55.27 ± 0.15 | 70.41 ± 0.18 | 40.13 ± 0.25 |
Apertus 8B Swiss AI | 45.08 ± 0.32 | 36.81 ± 0.39 | 48.11 ± 0.50 | 25.51 ± 0.54 |
Llama 3.2 3B Meta | 43.00 ± 0.24 | 47.44 ± 0.23 | 60.53 ± 0.29 | 34.35 ± 0.40 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 38.61 ± 0.24 | 18.48 ± 0.25 | 36.88 ± 0.45 | 0.08 ± 0.16 |
Tiny Aya Water 3B CohereLabs | 35.94 ± 0.28 | 25.39 ± 0.53 | 34.31 ± 0.74 | 16.46 ± 0.56 |
Tiny Aya Global 3B CohereLabs | 35.33 ± 0.30 | 27.68 ± 0.39 | 43.73 ± 0.54 | 11.63 ± 0.66 |
MERaLiON 2 3B A*STAR | 31.94 ± 0.17 | 26.61 ± 0.23 | 49.77 ± 0.29 | 3.46 ± 0.52 |
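NLU is the only competency here with two tasks, and its values are consistent with an unweighted mean of Belebele QA and Sentiment Analysis. That would also explain why each single-task table repeats the same number in its competency and task columns. This is an inference from the data, not a documented rule. The helper name `competency_score` is hypothetical.

```python
from statistics import fmean

# (reported NLU, Belebele QA, Sentiment Analysis), copied from the table above.
nlu_rows = {
    "Gemma 4 31B": (73.18, 93.25, 53.12),
    "Qwen 3.5 122B MoE": (70.73, 92.15, 49.31),
    "NVIDIA Nemotron 3 Nano 30B MoE": (18.48, 36.88, 0.08),
    "MERaLiON 2 3B": (26.61, 49.77, 3.46),
}


def competency_score(task_scores):
    """Unweighted mean of a competency's task scores (assumed aggregation rule)."""
    return fmean(task_scores)


if __name__ == "__main__":
    for name, (reported, belebele, sentiment) in nlu_rows.items():
        recomputed = competency_score([belebele, sentiment])
        print(f"{name}: reported {reported:.2f}, recomputed {recomputed:.3f}")
```

The recomputed values agree with the reported NLU column to within 0.01. Under this reading, sentiment scores near 50 or below cap NLU in the low 70s even for models that score above 90 on Belebele QA.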
Model | MS (overall) | Safety | Toxicity Detection |
|---|---|---|---|
Gemma 4 31B | 70.89 ± 0.09 | 9.11 ± 0.15 | 9.11 ± 0.15 |
Qwen 3.5 122B MoE Alibaba | 69.77 ± 0.13 | 8.73 ± 0.40 | 8.73 ± 0.40 |
SEA-LION v4.5 (Qwen) 27B AISG | 69.69 ± 0.10 | 10.44 ± 0.39 | 10.44 ± 0.39 |
Qwen 3.5 27B Alibaba | 69.61 ± 0.20 | 9.07 ± 0.58 | 9.07 ± 0.58 |
Gemma 4 26B MoE | 68.76 ± 0.08 | 8.13 ± 0.14 | 8.13 ± 0.14 |
Qwen 3.6 27B Alibaba | 68.45 ± 0.21 | 6.41 ± 0.69 | 6.41 ± 0.69 |
Qwen 3 VL 32B Alibaba | 68.06 ± 0.10 | 13.96 ± 0.19 | 13.96 ± 0.19 |
SEA-LION v3 (Llama) 70B AISG | 67.68 ± 0.20 | 13.56 ± 0.46 | 13.56 ± 0.46 |
SEA-LION v4 (Qwen) 32B AISG | 67.13 ± 0.12 | 16.49 ± 0.13 | 16.49 ± 0.13 |
Llama 3.3 70B Meta | 67.05 ± 0.12 | 17.43 ± 0.21 | 17.43 ± 0.21 |
Qwen 3.6 35B MoE Alibaba | 66.72 ± 0.22 | 7.75 ± 0.68 | 7.75 ± 0.68 |
SEA-LION v4 (Gemma) 27B AISG | 66.72 ± 0.17 | 14.19 ± 0.28 | 14.19 ± 0.28 |
Gemma 3 27B | 66.47 ± 0.13 | 14.23 ± 0.21 | 14.23 ± 0.21 |
Mistral Medium 3.5 128B Mistral AI | 66.21 ± 0.13 | 9.87 ± 0.41 | 9.87 ± 0.41 |
Llama 4 Scout 109B MoE Meta | 65.61 ± 0.14 | 13.50 ± 0.12 | 13.50 ± 0.12 |
Qwen 3.5 35B MoE Alibaba | 65.52 ± 0.24 | 11.07 ± 0.78 | 11.07 ± 0.78 |
Qwen 3 VL 8B Alibaba | 64.61 ± 0.10 | 16.65 ± 0.24 | 16.65 ± 0.24 |
SEA-LION v4 (Qwen VL) 8B AISG | 64.21 ± 0.10 | 16.12 ± 0.14 | 16.12 ± 0.14 |
Gemma 3 12B | 63.86 ± 0.07 | 14.40 ± 0.00 | 14.40 ± 0.00 |
Gemma 4 (E4B) 8B | 63.54 ± 0.12 | 8.44 ± 0.28 | 8.44 ± 0.28 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 62.91 ± 0.20 | 12.28 ± 0.42 | 12.28 ± 0.42 |
SEA-LION v3 (Gemma 2) 9B AISG | 61.58 ± 0.23 | 12.05 ± 0.36 | 12.05 ± 0.36 |
SEA-LION v4 (Qwen VL) 4B AISG | 60.58 ± 0.15 | 15.09 ± 0.11 | 15.09 ± 0.11 |
Qwen 3 VL 4B Alibaba | 59.90 ± 0.16 | 12.77 ± 0.11 | 12.77 ± 0.11 |
Qwen 3.5 9B Alibaba | 59.22 ± 0.23 | 2.75 ± 0.49 | 2.75 ± 0.49 |
Mistral Small 4 119B MoE Mistral AI | 58.87 ± 0.18 | 12.03 ± 0.39 | 12.03 ± 0.39 |
SEA-LION v3 (Llama) 8B AISG | 58.08 ± 0.20 | 14.12 ± 0.44 | 14.12 ± 0.44 |
Gemma 4 (E2B) 5B | 57.63 ± 0.18 | 4.13 ± 0.19 | 4.13 ± 0.19 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 57.35 ± 0.15 | 3.67 ± 0.20 | 3.67 ± 0.20 |
SEA-LION v4 (Gemma VL) 4B AISG | 57.10 ± 0.13 | 14.01 ± 0.16 | 14.01 ± 0.16 |
MERaLiON 2 10B A*STAR | 55.68 ± 0.22 | 14.80 ± 0.33 | 14.80 ± 0.33 |
Olmo 3.1 32B AI2 | 55.07 ± 0.16 | 5.63 ± 0.22 | 5.63 ± 0.22 |
Gemma 3 4B | 54.74 ± 0.15 | 10.99 ± 0.13 | 10.99 ± 0.13 |
Qwen 3.5 4B Alibaba | 52.38 ± 0.25 | 3.90 ± 0.63 | 3.90 ± 0.63 |
Llama 3.1 8B Meta | 51.69 ± 0.15 | 14.66 ± 0.29 | 14.66 ± 0.29 |
GLM 4.7 Flash 30B MoE Z.ai | 51.42 ± 0.34 | 16.45 ± 0.62 | 16.45 ± 0.62 |
SEA-LION v4 (Apertus) 8B AISG | 50.92 ± 0.15 | 12.72 ± 0.33 | 12.72 ± 0.33 |
Apertus 8B Swiss AI | 45.08 ± 0.32 | 9.78 ± 0.90 | 9.78 ± 0.90 |
Llama 3.2 3B Meta | 43.00 ± 0.24 | 6.46 ± 0.28 | 6.46 ± 0.28 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 38.61 ± 0.24 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Tiny Aya Water 3B CohereLabs | 35.94 ± 0.28 | 4.35 ± 0.76 | 4.35 ± 0.76 |
Tiny Aya Global 3B CohereLabs | 35.33 ± 0.30 | 3.59 ± 0.54 | 3.59 ± 0.54 |
MERaLiON 2 3B A*STAR | 31.94 ± 0.17 | 6.99 ± 0.34 | 6.99 ± 0.34 |