Vietnamese Performance
Vietnamese Scores by Model
Scores are averages over 30 bootstraps; 95% confidence intervals (CIs) are shown.
Model Size: ≤200B
Open instruct models only
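The caption does not say how the bootstrap average and its interval are computed. As a rough illustration only, the sketch below resamples a list of per-example scores with replacement 30 times, averages the replicate means, and reports a normal-approximation 95% half-width. The function name `bootstrap_mean_ci`, the per-example input, and the interval formula are all assumptions made for illustration, not the leaderboard's documented procedure.

```python
import random
import statistics


def bootstrap_mean_ci(scores, n_boot=30, z=1.96, seed=0):
    """Return (mean, half_width) estimated from n_boot bootstrap resamples.

    Hypothetical helper: the interval convention used by the leaderboard
    is not stated, so a normal-approximation interval is assumed here.
    """
    rng = random.Random(seed)
    n = len(scores)
    replicate_means = []
    for _ in range(n_boot):
        # Draw n items with replacement and record the resample mean.
        resample = [scores[rng.randrange(n)] for _ in range(n)]
        replicate_means.append(statistics.fmean(resample))
    center = statistics.fmean(replicate_means)
    # Bootstrap standard error is the spread of the replicate means.
    half_width = z * statistics.stdev(replicate_means)
    return center, half_width


if __name__ == "__main__":
    toy_scores = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0]  # made-up data
    mean, half_width = bootstrap_mean_ci(toy_scores)
    print(f"{100 * mean:.2f} ± {100 * half_width:.2f}")
```

A percentile interval over the replicates would be an equally plausible reading of the caption; the sketch only shows the general shape of such a computation.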
Figure: bar chart of the overall Vietnamese score (mean ± 95% CI) for each of the 43 models, labeled by parameter count, from 74.51 ± 0.08 (31B) down to 18.30 ± 0.25 (3B). The same values appear in the VI column of the tables below.
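Several neighbouring scores in this ranking differ by less than their reported intervals. One simple way to read such near-ties is to check whether two `mean ± half-width` intervals overlap. The sketch below does that for three pairs of overall scores taken from this page; `parse_score` and `intervals_overlap` are hypothetical helper names, and interval overlap is used here as a conservative heuristic rather than a formal significance test.

```python
def parse_score(cell):
    """Split a 'mean ± half_width' cell into two floats."""
    mean, half_width = cell.split("±")
    return float(mean), float(half_width)


def intervals_overlap(cell_a, cell_b):
    """True when the two reported intervals share at least one point."""
    mean_a, hw_a = parse_score(cell_a)
    mean_b, hw_b = parse_score(cell_b)
    return abs(mean_a - mean_b) <= hw_a + hw_b


if __name__ == "__main__":
    pairs = [
        ("71.08 ± 0.13", "71.05 ± 0.19"),  # ranks 2 and 3
        ("74.51 ± 0.08", "71.08 ± 0.13"),  # ranks 1 and 2
        ("40.13 ± 0.27", "40.13 ± 0.24"),  # the two models tied at 40.13
    ]
    for a, b in pairs:
        print(a, "vs", b, "overlap:", intervals_overlap(a, b))
```

Under this heuristic the first and third pairs overlap while the second does not, so the gap between ranks 1 and 2 is clear of the reported uncertainty but the order of ranks 2 and 3 is not.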
Vietnamese Competencies
Model | VI | Instruction Following | Knowledge | Multi-Turn Chat | NLG | NLR | NLU | Safety |
|---|---|---|---|---|---|---|---|---|
Gemma 4 31B | 74.51 ± 0.08 | 95.05 ± 0.30 | 75.46 ± 0.20 | 85.64 ± 0.29 | 55.59 ± 0.04 | 78.91 ± 0.05 | 70.05 ± 0.09 | 60.90 ± 0.14 |
SEA-LION v4.5 (Qwen) 27B AISG | 71.08 ± 0.13 | 92.41 ± 0.54 | 75.75 ± 0.30 | 86.16 ± 0.30 | 54.32 ± 0.04 | 75.28 ± 0.17 | 66.37 ± 0.17 | 47.27 ± 0.71 |
Qwen 3.5 122B MoE Alibaba | 71.05 ± 0.19 | 90.83 ± 0.45 | 76.29 ± 0.30 | 86.59 ± 0.36 | 54.88 ± 0.04 | 71.31 ± 0.21 | 67.59 ± 0.29 | 49.85 ± 0.91 |
Gemma 4 26B MoE | 69.80 ± 0.11 | 88.54 ± 0.51 | 70.74 ± 0.21 | 84.93 ± 0.33 | 55.65 ± 0.06 | 73.59 ± 0.12 | 64.17 ± 0.19 | 50.98 ± 0.28 |
Qwen 3.5 27B Alibaba | 69.55 ± 0.15 | 90.60 ± 0.45 | 76.12 ± 0.25 | 85.91 ± 0.29 | 54.45 ± 0.05 | 71.00 ± 0.21 | 65.75 ± 0.26 | 42.99 ± 0.81 |
Qwen 3.6 27B Alibaba | 68.70 ± 0.21 | 86.76 ± 0.54 | 75.66 ± 0.36 | 87.19 ± 0.27 | 54.15 ± 0.05 | 71.59 ± 0.31 | 65.83 ± 0.24 | 39.69 ± 1.21 |
SEA-LION v3 (Llama) 70B AISG | 68.31 ± 0.23 | 92.29 ± 0.61 | 71.84 ± 0.37 | 72.26 ± 0.57 | 56.25 ± 0.06 | 72.86 ± 0.26 | 68.88 ± 0.35 | 43.78 ± 0.92 |
SEA-LION v4 (Qwen) 32B AISG | 67.09 ± 0.11 | 87.90 ± 0.46 | 67.56 ± 0.15 | 78.48 ± 0.54 | 55.00 ± 0.05 | 69.90 ± 0.09 | 63.80 ± 0.15 | 46.98 ± 0.19 |
Qwen 3 VL 32B Alibaba | 66.93 ± 0.09 | 88.83 ± 0.41 | 69.27 ± 0.19 | 81.02 ± 0.37 | 53.68 ± 0.03 | 70.71 ± 0.11 | 68.54 ± 0.12 | 36.47 ± 0.16 |
Qwen 3.6 35B MoE Alibaba | 66.79 ± 0.23 | 83.78 ± 0.89 | 65.07 ± 0.66 | 85.77 ± 0.32 | 54.77 ± 0.06 | 68.72 ± 0.32 | 68.75 ± 0.34 | 40.65 ± 1.39 |
Llama 3.3 70B Meta | 66.76 ± 0.12 | 93.59 ± 0.46 | 67.25 ± 0.19 | 65.98 ± 0.55 | 54.15 ± 0.06 | 74.94 ± 0.09 | 65.40 ± 0.18 | 46.00 ± 0.48 |
Mistral Medium 3.5 128B Mistral AI | 66.64 ± 0.17 | 85.21 ± 0.63 | 69.25 ± 0.39 | 80.50 ± 0.44 | 54.12 ± 0.05 | 64.88 ± 0.27 | 68.75 ± 0.33 | 43.76 ± 0.65 |
SEA-LION v4 (Gemma) 27B AISG | 65.05 ± 0.16 | 84.32 ± 0.66 | 64.06 ± 0.34 | 77.98 ± 0.30 | 52.28 ± 0.03 | 70.73 ± 0.16 | 63.88 ± 0.24 | 42.11 ± 0.45 |
Gemma 3 27B | 64.66 ± 0.14 | 84.63 ± 0.66 | 64.58 ± 0.25 | 77.90 ± 0.41 | 51.55 ± 0.04 | 70.62 ± 0.11 | 63.12 ± 0.23 | 40.21 ± 0.39 |
Llama 4 Scout 109B MoE Meta | 64.44 ± 0.12 | 90.54 ± 0.58 | 65.07 ± 0.16 | 68.81 ± 0.56 | 54.62 ± 0.04 | 58.26 ± 0.08 | 67.48 ± 0.12 | 46.32 ± 0.24 |
SEA-LION v4 (Qwen VL) 8B AISG | 64.26 ± 0.11 | 89.87 ± 0.49 | 56.07 ± 0.21 | 77.69 ± 0.33 | 53.73 ± 0.04 | 64.79 ± 0.10 | 70.42 ± 0.13 | 37.27 ± 0.30 |
Qwen 3.5 35B MoE Alibaba | 64.13 ± 0.27 | 87.49 ± 0.71 | 57.31 ± 0.63 | 84.40 ± 0.36 | 54.43 ± 0.07 | 64.20 ± 0.42 | 62.30 ± 0.43 | 38.77 ± 1.15 |
Qwen 3 VL 8B Alibaba | 63.86 ± 0.13 | 86.41 ± 0.55 | 55.46 ± 0.27 | 80.18 ± 0.41 | 53.78 ± 0.04 | 64.22 ± 0.15 | 71.80 ± 0.12 | 35.15 ± 0.30 |
Gemma 3 12B | 63.53 ± 0.06 | 85.71 ± 0.00 | 51.27 ± 0.00 | 74.27 ± 0.44 | 53.49 ± 0.00 | 74.20 ± 0.00 | 65.55 ± 0.00 | 40.19 ± 0.00 |
SEA-LION v3 (Gemma 2) 9B AISG | 62.03 ± 0.16 | 81.56 ± 1.05 | 50.54 ± 0.51 | 67.81 ± 0.31 | 54.48 ± 0.06 | 64.21 ± 0.17 | 70.67 ± 0.26 | 44.97 ± 0.53 |
Gemma 4 (E4B) 8B | 61.70 ± 0.12 | 88.03 ± 0.44 | 52.04 ± 0.47 | 77.53 ± 0.38 | 53.97 ± 0.05 | 61.91 ± 0.26 | 53.36 ± 0.31 | 45.03 ± 0.62 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 60.31 ± 0.19 | 78.79 ± 0.86 | 62.71 ± 0.50 | 83.96 ± 0.36 | 50.60 ± 0.06 | 59.32 ± 0.36 | 60.42 ± 0.45 | 26.37 ± 0.51 |
SEA-LION v4 (Qwen VL) 4B AISG | 59.69 ± 0.11 | 87.27 ± 0.71 | 46.53 ± 0.15 | 73.33 ± 0.41 | 52.47 ± 0.04 | 62.08 ± 0.11 | 68.82 ± 0.14 | 27.37 ± 0.22 |
Qwen 3 VL 4B Alibaba | 59.45 ± 0.10 | 87.21 ± 0.52 | 44.20 ± 0.20 | 73.05 ± 0.51 | 52.70 ± 0.04 | 61.32 ± 0.13 | 71.04 ± 0.11 | 26.66 ± 0.22 |
Qwen 3.5 9B Alibaba | 58.38 ± 0.37 | 80.44 ± 1.16 | 56.39 ± 0.78 | 78.56 ± 0.42 | 51.78 ± 0.08 | 51.72 ± 0.47 | 61.18 ± 0.34 | 28.58 ± 1.27 |
Mistral Small 4 119B MoE Mistral AI | 58.01 ± 0.19 | 73.90 ± 0.83 | 58.97 ± 0.55 | 76.60 ± 0.50 | 54.10 ± 0.05 | 50.41 ± 0.56 | 70.77 ± 0.42 | 21.29 ± 0.53 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 54.80 ± 0.17 | 85.87 ± 0.93 | 40.04 ± 0.40 | 69.86 ± 0.50 | 53.02 ± 0.05 | 46.24 ± 0.23 | 53.65 ± 0.22 | 34.93 ± 0.35 |
SEA-LION v3 (Llama) 8B AISG | 54.70 ± 0.21 | 81.08 ± 0.82 | 43.74 ± 0.50 | 62.02 ± 0.70 | 53.91 ± 0.07 | 54.45 ± 0.36 | 65.69 ± 0.42 | 21.99 ± 0.48 |
Gemma 4 (E2B) 5B | 53.51 ± 0.13 | 83.05 ± 0.50 | 37.99 ± 0.39 | 71.70 ± 0.40 | 52.57 ± 0.05 | 44.18 ± 0.19 | 49.52 ± 0.22 | 35.54 ± 0.52 |
Qwen 3.5 4B Alibaba | 53.07 ± 0.28 | 79.17 ± 1.02 | 43.78 ± 0.95 | 71.23 ± 0.56 | 47.19 ± 0.11 | 40.30 ± 0.35 | 58.28 ± 0.42 | 31.55 ± 1.11 |
SEA-LION v4 (Gemma VL) 4B AISG | 52.38 ± 0.14 | 81.94 ± 0.70 | 37.54 ± 0.17 | 66.38 ± 0.35 | 52.68 ± 0.06 | 50.13 ± 0.14 | 62.55 ± 0.22 | 15.45 ± 0.20 |
Olmo 3.1 32B AI2 | 51.32 ± 0.19 | 81.05 ± 0.72 | 33.07 ± 0.33 | 70.33 ± 0.51 | 51.34 ± 0.05 | 41.52 ± 0.36 | 64.91 ± 0.33 | 17.04 ± 0.71 |
Gemma 3 4B | 50.40 ± 0.15 | 81.17 ± 0.74 | 36.85 ± 0.23 | 65.37 ± 0.43 | 52.66 ± 0.06 | 42.08 ± 0.24 | 56.33 ± 0.20 | 18.36 ± 0.35 |
MERaLiON 2 10B A*STAR | 49.83 ± 0.21 | 63.62 ± 1.01 | 29.77 ± 0.58 | 50.35 ± 0.59 | 52.52 ± 0.07 | 56.09 ± 0.24 | 55.22 ± 0.43 | 41.21 ± 0.56 |
GLM 4.7 Flash 30B MoE Z.ai | 46.83 ± 0.30 | 71.21 ± 1.27 | 35.85 ± 0.95 | 60.69 ± 0.63 | 51.25 ± 0.09 | 18.40 ± 0.74 | 58.96 ± 0.58 | 31.47 ± 1.07 |
Llama 3.1 8B Meta | 43.88 ± 0.19 | 72.06 ± 0.97 | 28.91 ± 0.61 | 50.78 ± 0.61 | 52.47 ± 0.06 | 17.25 ± 0.47 | 63.50 ± 0.33 | 22.19 ± 0.70 |
SEA-LION v4 (Apertus) 8B AISG | 43.71 ± 0.21 | 62.51 ± 0.97 | 39.39 ± 0.40 | 46.51 ± 0.50 | 50.14 ± 0.06 | 27.89 ± 0.41 | 56.89 ± 0.31 | 22.62 ± 0.39 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 40.13 ± 0.27 | 70.83 ± 1.01 | 37.07 ± 0.84 | 62.43 ± 0.85 | 42.72 ± 0.12 | 20.08 ± 0.77 | 36.37 ± 0.77 | 11.42 ± 1.03 |
Apertus 8B Swiss AI | 40.13 ± 0.24 | 68.03 ± 1.23 | 29.69 ± 0.70 | 42.60 ± 0.67 | 48.46 ± 0.15 | 24.28 ± 0.51 | 44.09 ± 0.73 | 23.75 ± 0.97 |
Tiny Aya Water 3B CohereLabs | 34.69 ± 0.32 | 66.67 ± 1.32 | 32.84 ± 0.66 | 35.83 ± 0.63 | 39.16 ± 0.15 | 20.17 ± 0.58 | 37.24 ± 0.77 | 10.90 ± 0.83 |
Tiny Aya Global 3B CohereLabs | 33.02 ± 0.30 | 65.56 ± 1.10 | 33.11 ± 0.73 | 32.21 ± 0.81 | 38.38 ± 0.17 | 14.42 ± 0.55 | 36.62 ± 0.72 | 10.82 ± 0.73 |
Llama 3.2 3B Meta | 32.31 ± 0.21 | 60.73 ± 0.89 | 9.50 ± 0.45 | 43.51 ± 0.70 | 49.50 ± 0.06 | 3.63 ± 0.24 | 50.16 ± 0.34 | 9.15 ± 0.78 |
MERaLiON 2 3B A*STAR | 18.30 ± 0.25 | 36.79 ± 1.21 | 0.11 ± 0.04 | 28.79 ± 0.49 | 37.78 ± 0.11 | 6.72 ± 0.71 | 14.87 ± 0.34 | 3.06 ± 0.71 |
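The numbers in the table above are consistent with the VI column being the unweighted mean of the seven competency columns. This is an observation from the published values, not a documented formula, so the check below should be read as a sanity test under that assumption. The helper name `overall_score` is illustrative; the three rows are copied from the table.

```python
# Competency order: Instruction Following, Knowledge, Multi-Turn Chat,
# NLG, NLR, NLU, Safety.
rows = {
    "Gemma 4 31B": (74.51, [95.05, 75.46, 85.64, 55.59, 78.91, 70.05, 60.90]),
    "SEA-LION v4.5 (Qwen) 27B": (71.08, [92.41, 75.75, 86.16, 54.32, 75.28, 66.37, 47.27]),
    "MERaLiON 2 3B": (18.30, [36.79, 0.11, 28.79, 37.78, 6.72, 14.87, 3.06]),
}


def overall_score(competency_scores):
    """Unweighted mean of the competency scores (assumed aggregation)."""
    return sum(competency_scores) / len(competency_scores)


if __name__ == "__main__":
    for name, (reported, scores) in rows.items():
        recomputed = overall_score(scores)
        print(f"{name}: reported {reported:.2f}, recomputed {recomputed:.2f}")
```

For all three rows the recomputed value matches the reported VI score to within rounding, which suggests that no competency is weighted more heavily than another.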
Vietnamese Tasks
Model | VI | Instruction Following | SEA-IFEval |
|---|---|---|---|
Gemma 4 31B | 74.51 ± 0.08 | 95.05 ± 0.30 | 95.05 ± 0.30 |
SEA-LION v4.5 (Qwen) 27B AISG | 71.08 ± 0.13 | 92.41 ± 0.54 | 92.41 ± 0.54 |
Qwen 3.5 122B MoE Alibaba | 71.05 ± 0.19 | 90.83 ± 0.45 | 90.83 ± 0.45 |
Gemma 4 26B MoE | 69.80 ± 0.11 | 88.54 ± 0.51 | 88.54 ± 0.51 |
Qwen 3.5 27B Alibaba | 69.55 ± 0.15 | 90.60 ± 0.45 | 90.60 ± 0.45 |
Qwen 3.6 27B Alibaba | 68.70 ± 0.21 | 86.76 ± 0.54 | 86.76 ± 0.54 |
SEA-LION v3 (Llama) 70B AISG | 68.31 ± 0.23 | 92.29 ± 0.61 | 92.29 ± 0.61 |
SEA-LION v4 (Qwen) 32B AISG | 67.09 ± 0.11 | 87.90 ± 0.46 | 87.90 ± 0.46 |
Qwen 3 VL 32B Alibaba | 66.93 ± 0.09 | 88.83 ± 0.41 | 88.83 ± 0.41 |
Qwen 3.6 35B MoE Alibaba | 66.79 ± 0.23 | 83.78 ± 0.89 | 83.78 ± 0.89 |
Llama 3.3 70B Meta | 66.76 ± 0.12 | 93.59 ± 0.46 | 93.59 ± 0.46 |
Mistral Medium 3.5 128B Mistral AI | 66.64 ± 0.17 | 85.21 ± 0.63 | 85.21 ± 0.63 |
SEA-LION v4 (Gemma) 27B AISG | 65.05 ± 0.16 | 84.32 ± 0.66 | 84.32 ± 0.66 |
Gemma 3 27B | 64.66 ± 0.14 | 84.63 ± 0.66 | 84.63 ± 0.66 |
Llama 4 Scout 109B MoE Meta | 64.44 ± 0.12 | 90.54 ± 0.58 | 90.54 ± 0.58 |
SEA-LION v4 (Qwen VL) 8B AISG | 64.26 ± 0.11 | 89.87 ± 0.49 | 89.87 ± 0.49 |
Qwen 3.5 35B MoE Alibaba | 64.13 ± 0.27 | 87.49 ± 0.71 | 87.49 ± 0.71 |
Qwen 3 VL 8B Alibaba | 63.86 ± 0.13 | 86.41 ± 0.55 | 86.41 ± 0.55 |
Gemma 3 12B | 63.53 ± 0.06 | 85.71 ± 0.00 | 85.71 ± 0.00 |
SEA-LION v3 (Gemma 2) 9B AISG | 62.03 ± 0.16 | 81.56 ± 1.05 | 81.56 ± 1.05 |
Gemma 4 (E4B) 8B | 61.70 ± 0.12 | 88.03 ± 0.44 | 88.03 ± 0.44 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 60.31 ± 0.19 | 78.79 ± 0.86 | 78.79 ± 0.86 |
SEA-LION v4 (Qwen VL) 4B AISG | 59.69 ± 0.11 | 87.27 ± 0.71 | 87.27 ± 0.71 |
Qwen 3 VL 4B Alibaba | 59.45 ± 0.10 | 87.21 ± 0.52 | 87.21 ± 0.52 |
Qwen 3.5 9B Alibaba | 58.38 ± 0.37 | 80.44 ± 1.16 | 80.44 ± 1.16 |
Mistral Small 4 119B MoE Mistral AI | 58.01 ± 0.19 | 73.90 ± 0.83 | 73.90 ± 0.83 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 54.80 ± 0.17 | 85.87 ± 0.93 | 85.87 ± 0.93 |
SEA-LION v3 (Llama) 8B AISG | 54.70 ± 0.21 | 81.08 ± 0.82 | 81.08 ± 0.82 |
Gemma 4 (E2B) 5B | 53.51 ± 0.13 | 83.05 ± 0.50 | 83.05 ± 0.50 |
Qwen 3.5 4B Alibaba | 53.07 ± 0.28 | 79.17 ± 1.02 | 79.17 ± 1.02 |
SEA-LION v4 (Gemma VL) 4B AISG | 52.38 ± 0.14 | 81.94 ± 0.70 | 81.94 ± 0.70 |
Olmo 3.1 32B AI2 | 51.32 ± 0.19 | 81.05 ± 0.72 | 81.05 ± 0.72 |
Gemma 3 4B | 50.40 ± 0.15 | 81.17 ± 0.74 | 81.17 ± 0.74 |
MERaLiON 2 10B A*STAR | 49.83 ± 0.21 | 63.62 ± 1.01 | 63.62 ± 1.01 |
GLM 4.7 Flash 30B MoE Z.ai | 46.83 ± 0.30 | 71.21 ± 1.27 | 71.21 ± 1.27 |
Llama 3.1 8B Meta | 43.88 ± 0.19 | 72.06 ± 0.97 | 72.06 ± 0.97 |
SEA-LION v4 (Apertus) 8B AISG | 43.71 ± 0.21 | 62.51 ± 0.97 | 62.51 ± 0.97 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 40.13 ± 0.27 | 70.83 ± 1.01 | 70.83 ± 1.01 |
Apertus 8B Swiss AI | 40.13 ± 0.24 | 68.03 ± 1.23 | 68.03 ± 1.23 |
Tiny Aya Water 3B CohereLabs | 34.69 ± 0.32 | 66.67 ± 1.32 | 66.67 ± 1.32 |
Tiny Aya Global 3B CohereLabs | 33.02 ± 0.30 | 65.56 ± 1.10 | 65.56 ± 1.10 |
Llama 3.2 3B Meta | 32.31 ± 0.21 | 60.73 ± 0.89 | 60.73 ± 0.89 |
MERaLiON 2 3B A*STAR | 18.30 ± 0.25 | 36.79 ± 1.21 | 36.79 ± 1.21 |

Model | VI | Knowledge | Global MMLU Lite |
|---|---|---|---|
Gemma 4 31B | 74.51 ± 0.08 | 75.46 ± 0.20 | 75.46 ± 0.20 |
SEA-LION v4.5 (Qwen) 27B AISG | 71.08 ± 0.13 | 75.75 ± 0.30 | 75.75 ± 0.30 |
Qwen 3.5 122B MoE Alibaba | 71.05 ± 0.19 | 76.29 ± 0.30 | 76.29 ± 0.30 |
Gemma 4 26B MoE | 69.80 ± 0.11 | 70.74 ± 0.21 | 70.74 ± 0.21 |
Qwen 3.5 27B Alibaba | 69.55 ± 0.15 | 76.12 ± 0.25 | 76.12 ± 0.25 |
Qwen 3.6 27B Alibaba | 68.70 ± 0.21 | 75.66 ± 0.36 | 75.66 ± 0.36 |
SEA-LION v3 (Llama) 70B AISG | 68.31 ± 0.23 | 71.84 ± 0.37 | 71.84 ± 0.37 |
SEA-LION v4 (Qwen) 32B AISG | 67.09 ± 0.11 | 67.56 ± 0.15 | 67.56 ± 0.15 |
Qwen 3 VL 32B Alibaba | 66.93 ± 0.09 | 69.27 ± 0.19 | 69.27 ± 0.19 |
Qwen 3.6 35B MoE Alibaba | 66.79 ± 0.23 | 65.07 ± 0.66 | 65.07 ± 0.66 |
Llama 3.3 70B Meta | 66.76 ± 0.12 | 67.25 ± 0.19 | 67.25 ± 0.19 |
Mistral Medium 3.5 128B Mistral AI | 66.64 ± 0.17 | 69.25 ± 0.39 | 69.25 ± 0.39 |
SEA-LION v4 (Gemma) 27B AISG | 65.05 ± 0.16 | 64.06 ± 0.34 | 64.06 ± 0.34 |
Gemma 3 27B | 64.66 ± 0.14 | 64.58 ± 0.25 | 64.58 ± 0.25 |
Llama 4 Scout 109B MoE Meta | 64.44 ± 0.12 | 65.07 ± 0.16 | 65.07 ± 0.16 |
SEA-LION v4 (Qwen VL) 8B AISG | 64.26 ± 0.11 | 56.07 ± 0.21 | 56.07 ± 0.21 |
Qwen 3.5 35B MoE Alibaba | 64.13 ± 0.27 | 57.31 ± 0.63 | 57.31 ± 0.63 |
Qwen 3 VL 8B Alibaba | 63.86 ± 0.13 | 55.46 ± 0.27 | 55.46 ± 0.27 |
Gemma 3 12B | 63.53 ± 0.06 | 51.27 ± 0.00 | 51.27 ± 0.00 |
SEA-LION v3 (Gemma 2) 9B AISG | 62.03 ± 0.16 | 50.54 ± 0.51 | 50.54 ± 0.51 |
Gemma 4 (E4B) 8B | 61.70 ± 0.12 | 52.04 ± 0.47 | 52.04 ± 0.47 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 60.31 ± 0.19 | 62.71 ± 0.50 | 62.71 ± 0.50 |
SEA-LION v4 (Qwen VL) 4B AISG | 59.69 ± 0.11 | 46.53 ± 0.15 | 46.53 ± 0.15 |
Qwen 3 VL 4B Alibaba | 59.45 ± 0.10 | 44.20 ± 0.20 | 44.20 ± 0.20 |
Qwen 3.5 9B Alibaba | 58.38 ± 0.37 | 56.39 ± 0.78 | 56.39 ± 0.78 |
Mistral Small 4 119B MoE Mistral AI | 58.01 ± 0.19 | 58.97 ± 0.55 | 58.97 ± 0.55 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 54.80 ± 0.17 | 40.04 ± 0.40 | 40.04 ± 0.40 |
SEA-LION v3 (Llama) 8B AISG | 54.70 ± 0.21 | 43.74 ± 0.50 | 43.74 ± 0.50 |
Gemma 4 (E2B) 5B | 53.51 ± 0.13 | 37.99 ± 0.39 | 37.99 ± 0.39 |
Qwen 3.5 4B Alibaba | 53.07 ± 0.28 | 43.78 ± 0.95 | 43.78 ± 0.95 |
SEA-LION v4 (Gemma VL) 4B AISG | 52.38 ± 0.14 | 37.54 ± 0.17 | 37.54 ± 0.17 |
Olmo 3.1 32B AI2 | 51.32 ± 0.19 | 33.07 ± 0.33 | 33.07 ± 0.33 |
Gemma 3 4B | 50.40 ± 0.15 | 36.85 ± 0.23 | 36.85 ± 0.23 |
MERaLiON 2 10B A*STAR | 49.83 ± 0.21 | 29.77 ± 0.58 | 29.77 ± 0.58 |
GLM 4.7 Flash 30B MoE Z.ai | 46.83 ± 0.30 | 35.85 ± 0.95 | 35.85 ± 0.95 |
Llama 3.1 8B Meta | 43.88 ± 0.19 | 28.91 ± 0.61 | 28.91 ± 0.61 |
SEA-LION v4 (Apertus) 8B AISG | 43.71 ± 0.21 | 39.39 ± 0.40 | 39.39 ± 0.40 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 40.13 ± 0.27 | 37.07 ± 0.84 | 37.07 ± 0.84 |
Apertus 8B Swiss AI | 40.13 ± 0.24 | 29.69 ± 0.70 | 29.69 ± 0.70 |
Tiny Aya Water 3B CohereLabs | 34.69 ± 0.32 | 32.84 ± 0.66 | 32.84 ± 0.66 |
Tiny Aya Global 3B CohereLabs | 33.02 ± 0.30 | 33.11 ± 0.73 | 33.11 ± 0.73 |
Llama 3.2 3B Meta | 32.31 ± 0.21 | 9.50 ± 0.45 | 9.50 ± 0.45 |
MERaLiON 2 3B A*STAR | 18.30 ± 0.25 | 0.11 ± 0.04 | 0.11 ± 0.04 |

Model | VI | Multi-Turn Chat | SEA-MT-Bench (LLM Judge) |
|---|---|---|---|
Gemma 4 31B | 74.51 ± 0.08 | 85.64 ± 0.29 | 85.64 ± 0.29 |
SEA-LION v4.5 (Qwen) 27B AISG | 71.08 ± 0.13 | 86.16 ± 0.30 | 86.16 ± 0.30 |
Qwen 3.5 122B MoE Alibaba | 71.05 ± 0.19 | 86.59 ± 0.36 | 86.59 ± 0.36 |
Gemma 4 26B MoE | 69.80 ± 0.11 | 84.93 ± 0.33 | 84.93 ± 0.33 |
Qwen 3.5 27B Alibaba | 69.55 ± 0.15 | 85.91 ± 0.29 | 85.91 ± 0.29 |
Qwen 3.6 27B Alibaba | 68.70 ± 0.21 | 87.19 ± 0.27 | 87.19 ± 0.27 |
SEA-LION v3 (Llama) 70B AISG | 68.31 ± 0.23 | 72.26 ± 0.57 | 72.26 ± 0.57 |
SEA-LION v4 (Qwen) 32B AISG | 67.09 ± 0.11 | 78.48 ± 0.54 | 78.48 ± 0.54 |
Qwen 3 VL 32B Alibaba | 66.93 ± 0.09 | 81.02 ± 0.37 | 81.02 ± 0.37 |
Qwen 3.6 35B MoE Alibaba | 66.79 ± 0.23 | 85.77 ± 0.32 | 85.77 ± 0.32 |
Llama 3.3 70B Meta | 66.76 ± 0.12 | 65.98 ± 0.55 | 65.98 ± 0.55 |
Mistral Medium 3.5 128B Mistral AI | 66.64 ± 0.17 | 80.50 ± 0.44 | 80.50 ± 0.44 |
SEA-LION v4 (Gemma) 27B AISG | 65.05 ± 0.16 | 77.98 ± 0.30 | 77.98 ± 0.30 |
Gemma 3 27B | 64.66 ± 0.14 | 77.90 ± 0.41 | 77.90 ± 0.41 |
Llama 4 Scout 109B MoE Meta | 64.44 ± 0.12 | 68.81 ± 0.56 | 68.81 ± 0.56 |
SEA-LION v4 (Qwen VL) 8B AISG | 64.26 ± 0.11 | 77.69 ± 0.33 | 77.69 ± 0.33 |
Qwen 3.5 35B MoE Alibaba | 64.13 ± 0.27 | 84.40 ± 0.36 | 84.40 ± 0.36 |
Qwen 3 VL 8B Alibaba | 63.86 ± 0.13 | 80.18 ± 0.41 | 80.18 ± 0.41 |
Gemma 3 12B | 63.53 ± 0.06 | 74.27 ± 0.44 | 74.27 ± 0.44 |
SEA-LION v3 (Gemma 2) 9B AISG | 62.03 ± 0.16 | 67.81 ± 0.31 | 67.81 ± 0.31 |
Gemma 4 (E4B) 8B | 61.70 ± 0.12 | 77.53 ± 0.38 | 77.53 ± 0.38 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 60.31 ± 0.19 | 83.96 ± 0.36 | 83.96 ± 0.36 |
SEA-LION v4 (Qwen VL) 4B AISG | 59.69 ± 0.11 | 73.33 ± 0.41 | 73.33 ± 0.41 |
Qwen 3 VL 4B Alibaba | 59.45 ± 0.10 | 73.05 ± 0.51 | 73.05 ± 0.51 |
Qwen 3.5 9B Alibaba | 58.38 ± 0.37 | 78.56 ± 0.42 | 78.56 ± 0.42 |
Mistral Small 4 119B MoE Mistral AI | 58.01 ± 0.19 | 76.60 ± 0.50 | 76.60 ± 0.50 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 54.80 ± 0.17 | 69.86 ± 0.50 | 69.86 ± 0.50 |
SEA-LION v3 (Llama) 8B AISG | 54.70 ± 0.21 | 62.02 ± 0.70 | 62.02 ± 0.70 |
Gemma 4 (E2B) 5B | 53.51 ± 0.13 | 71.70 ± 0.40 | 71.70 ± 0.40 |
Qwen 3.5 4B Alibaba | 53.07 ± 0.28 | 71.23 ± 0.56 | 71.23 ± 0.56 |
SEA-LION v4 (Gemma VL) 4B AISG | 52.38 ± 0.14 | 66.38 ± 0.35 | 66.38 ± 0.35 |
Olmo 3.1 32B AI2 | 51.32 ± 0.19 | 70.33 ± 0.51 | 70.33 ± 0.51 |
Gemma 3 4B | 50.40 ± 0.15 | 65.37 ± 0.43 | 65.37 ± 0.43 |
MERaLiON 2 10B A*STAR | 49.83 ± 0.21 | 50.35 ± 0.59 | 50.35 ± 0.59 |
GLM 4.7 Flash 30B MoE Z.ai | 46.83 ± 0.30 | 60.69 ± 0.63 | 60.69 ± 0.63 |
Llama 3.1 8B Meta | 43.88 ± 0.19 | 50.78 ± 0.61 | 50.78 ± 0.61 |
SEA-LION v4 (Apertus) 8B AISG | 43.71 ± 0.21 | 46.51 ± 0.50 | 46.51 ± 0.50 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 40.13 ± 0.27 | 62.43 ± 0.85 | 62.43 ± 0.85 |
Apertus 8B Swiss AI | 40.13 ± 0.24 | 42.60 ± 0.67 | 42.60 ± 0.67 |
Tiny Aya Water 3B CohereLabs | 34.69 ± 0.32 | 35.83 ± 0.63 | 35.83 ± 0.63 |
Tiny Aya Global 3B CohereLabs | 33.02 ± 0.30 | 32.21 ± 0.81 | 32.21 ± 0.81 |
Llama 3.2 3B Meta | 32.31 ± 0.21 | 43.51 ± 0.70 | 43.51 ± 0.70 |
MERaLiON 2 3B A*STAR | 18.30 ± 0.25 | 28.79 ± 0.49 | 28.79 ± 0.49 |

Model | VI | NLG | Summarization | Translations |
|---|---|---|---|---|
Gemma 4 31B | 74.51 ± 0.08 | 55.59 ± 0.04 | 17.47 ± 0.07 | 93.70 ± 0.01 |
SEA-LION v4.5 (Qwen) 27B AISG | 71.08 ± 0.13 | 54.32 ± 0.04 | 15.26 ± 0.08 | 93.38 ± 0.01 |
Qwen 3.5 122B MoE Alibaba | 71.05 ± 0.19 | 54.88 ± 0.04 | 16.29 ± 0.09 | 93.47 ± 0.02 |
Gemma 4 26B MoE | 69.80 ± 0.11 | 55.65 ± 0.06 | 17.91 ± 0.12 | 93.38 ± 0.01 |
Qwen 3.5 27B Alibaba | 69.55 ± 0.15 | 54.45 ± 0.05 | 15.54 ± 0.09 | 93.36 ± 0.02 |
Qwen 3.6 27B Alibaba | 68.70 ± 0.21 | 54.15 ± 0.05 | 15.37 ± 0.10 | 92.93 ± 0.03 |
SEA-LION v3 (Llama) 70B AISG | 68.31 ± 0.23 | 56.25 ± 0.06 | 19.77 ± 0.11 | 92.74 ± 0.02 |
SEA-LION v4 (Qwen) 32B AISG | 67.09 ± 0.11 | 55.00 ± 0.05 | 17.46 ± 0.10 | 92.53 ± 0.01 |
Qwen 3 VL 32B Alibaba | 66.93 ± 0.09 | 53.68 ± 0.03 | 14.60 ± 0.07 | 92.76 ± 0.02 |
Qwen 3.6 35B MoE Alibaba | 66.79 ± 0.23 | 54.77 ± 0.06 | 17.26 ± 0.12 | 92.29 ± 0.04 |
Llama 3.3 70B Meta | 66.76 ± 0.12 | 54.15 ± 0.06 | 17.03 ± 0.12 | 91.27 ± 0.02 |
Mistral Medium 3.5 128B Mistral AI | 66.64 ± 0.17 | 54.12 ± 0.05 | 16.14 ± 0.10 | 92.10 ± 0.03 |
SEA-LION v4 (Gemma) 27B AISG | 65.05 ± 0.16 | 52.28 ± 0.03 | 14.66 ± 0.07 | 89.91 ± 0.03 |
Gemma 3 27B | 64.66 ± 0.14 | 51.55 ± 0.04 | 14.75 ± 0.09 | 88.36 ± 0.02 |
Llama 4 Scout 109B MoE Meta | 64.44 ± 0.12 | 54.62 ± 0.04 | 17.56 ± 0.07 | 91.68 ± 0.01 |
SEA-LION v4 (Qwen VL) 8B AISG | 64.26 ± 0.11 | 53.73 ± 0.04 | 16.03 ± 0.08 | 91.43 ± 0.02 |
Qwen 3.5 35B MoE Alibaba | 64.13 ± 0.27 | 54.43 ± 0.07 | 16.71 ± 0.12 | 92.16 ± 0.02 |
Qwen 3 VL 8B Alibaba | 63.86 ± 0.13 | 53.78 ± 0.04 | 16.06 ± 0.07 | 91.49 ± 0.02 |
Gemma 3 12B | 63.53 ± 0.06 | 53.49 ± 0.00 | 14.37 ± 0.00 | 92.61 ± 0.00 |
SEA-LION v3 (Gemma 2) 9B AISG | 62.03 ± 0.16 | 54.48 ± 0.06 | 17.35 ± 0.11 | 91.61 ± 0.03 |
Gemma 4 (E4B) 8B | 61.70 ± 0.12 | 53.97 ± 0.05 | 15.70 ± 0.10 | 92.24 ± 0.02 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 60.31 ± 0.19 | 50.60 ± 0.06 | 14.54 ± 0.12 | 86.66 ± 0.05 |
SEA-LION v4 (Qwen VL) 4B AISG | 59.69 ± 0.11 | 52.47 ± 0.04 | 15.59 ± 0.07 | 89.36 ± 0.03 |
Qwen 3 VL 4B Alibaba | 59.45 ± 0.10 | 52.70 ± 0.04 | 15.60 ± 0.09 | 89.80 ± 0.02 |
Qwen 3.5 9B Alibaba | 58.38 ± 0.37 | 51.78 ± 0.08 | 14.77 ± 0.14 | 88.79 ± 0.06 |
Mistral Small 4 119B MoE Mistral AI | 58.01 ± 0.19 | 54.10 ± 0.05 | 16.07 ± 0.11 | 92.13 ± 0.03 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 54.80 ± 0.17 | 53.02 ± 0.05 | 15.13 ± 0.10 | 90.91 ± 0.01 |
SEA-LION v3 (Llama) 8B AISG | 54.70 ± 0.21 | 53.91 ± 0.07 | 16.25 ± 0.13 | 91.57 ± 0.02 |
Gemma 4 (E2B) 5B | 53.51 ± 0.13 | 52.57 ± 0.05 | 14.33 ± 0.10 | 90.81 ± 0.01 |
Qwen 3.5 4B Alibaba | 53.07 ± 0.28 | 47.19 ± 0.11 | 13.13 ± 0.13 | 81.25 ± 0.13 |
SEA-LION v4 (Gemma VL) 4B AISG | 52.38 ± 0.14 | 52.68 ± 0.06 | 15.49 ± 0.12 | 89.87 ± 0.03 |
Olmo 3.1 32B AI2 | 51.32 ± 0.19 | 51.34 ± 0.05 | 16.34 ± 0.10 | 86.34 ± 0.04 |
Gemma 3 4B | 50.40 ± 0.15 | 52.66 ± 0.06 | 14.85 ± 0.11 | 90.47 ± 0.02 |
MERaLiON 2 10B A*STAR | 49.83 ± 0.21 | 52.52 ± 0.07 | 16.86 ± 0.13 | 88.19 ± 0.06 |
GLM 4.7 Flash 30B MoE Z.ai | 46.83 ± 0.30 | 51.25 ± 0.09 | 15.68 ± 0.15 | 86.83 ± 0.07 |
Llama 3.1 8B Meta | 43.88 ± 0.19 | 52.47 ± 0.06 | 17.54 ± 0.10 | 87.40 ± 0.07 |
SEA-LION v4 (Apertus) 8B AISG | 43.71 ± 0.21 | 50.14 ± 0.06 | 13.14 ± 0.10 | 87.15 ± 0.05 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 40.13 ± 0.27 | 42.72 ± 0.12 | 12.63 ± 0.13 | 72.81 ± 0.15 |
Apertus 8B Swiss AI | 40.13 ± 0.24 | 48.46 ± 0.15 | 16.64 ± 0.19 | 80.28 ± 0.17 |
Tiny Aya Water 3B CohereLabs | 34.69 ± 0.32 | 39.16 ± 0.15 | 11.63 ± 0.25 | 66.68 ± 0.19 |
Tiny Aya Global 3B CohereLabs | 33.02 ± 0.30 | 38.38 ± 0.17 | 10.94 ± 0.24 | 65.83 ± 0.20 |
Llama 3.2 3B Meta | 32.31 ± 0.21 | 49.50 ± 0.06 | 15.98 ± 0.11 | 83.02 ± 0.05 |
MERaLiON 2 3B A*STAR | 18.30 ± 0.25 | 37.78 ± 0.11 | 10.73 ± 0.18 | 64.84 ± 0.12 |

Model | VI | NLR | Causal Reasoning | Natural Language Inference |
|---|---|---|---|---|
Gemma 4 31B | 74.51 ± 0.08 | 78.91 ± 0.05 | 92.25 ± 0.07 | 65.56 ± 0.10 |
SEA-LION v4.5 (Qwen) 27B AISG | 71.08 ± 0.13 | 75.28 ± 0.17 | 92.84 ± 0.21 | 57.72 ± 0.30 |
Qwen 3.5 122B MoE Alibaba | 71.05 ± 0.19 | 71.31 ± 0.21 | 90.97 ± 0.33 | 51.64 ± 0.24 |
Gemma 4 26B MoE | 69.80 ± 0.11 | 73.59 ± 0.12 | 90.13 ± 0.13 | 57.04 ± 0.17 |
Qwen 3.5 27B Alibaba | 69.55 ± 0.15 | 71.00 ± 0.21 | 90.29 ± 0.33 | 51.71 ± 0.18 |
Qwen 3.6 27B Alibaba | 68.70 ± 0.21 | 71.59 ± 0.31 | 89.60 ± 0.40 | 53.58 ± 0.44 |
SEA-LION v3 (Llama) 70B AISG | 68.31 ± 0.23 | 72.86 ± 0.26 | 90.67 ± 0.35 | 55.05 ± 0.44 |
SEA-LION v4 (Qwen) 32B AISG | 67.09 ± 0.11 | 69.90 ± 0.09 | 85.84 ± 0.10 | 53.96 ± 0.14 |
Qwen 3 VL 32B Alibaba | 66.93 ± 0.09 | 70.71 ± 0.11 | 90.27 ± 0.08 | 51.15 ± 0.20 |
Qwen 3.6 35B MoE Alibaba | 66.79 ± 0.23 | 68.72 ± 0.32 | 87.80 ± 0.49 | 49.63 ± 0.51 |
Llama 3.3 70B Meta | 66.76 ± 0.12 | 74.94 ± 0.09 | 89.64 ± 0.13 | 60.24 ± 0.15 |
Mistral Medium 3.5 128B Mistral AI | 66.64 ± 0.17 | 64.88 ± 0.27 | 87.16 ± 0.43 | 42.60 ± 0.20 |
SEA-LION v4 (Gemma) 27B AISG | 65.05 ± 0.16 | 70.73 ± 0.16 | 85.28 ± 0.22 | 56.18 ± 0.23 |
Gemma 3 27B | 64.66 ± 0.14 | 70.62 ± 0.11 | 85.43 ± 0.13 | 55.81 ± 0.16 |
Llama 4 Scout 109B MoE Meta | 64.44 ± 0.12 | 58.26 ± 0.08 | 86.89 ± 0.12 | 29.63 ± 0.15 |
SEA-LION v4 (Qwen VL) 8B AISG | 64.26 ± 0.11 | 64.79 ± 0.10 | 82.37 ± 0.13 | 47.20 ± 0.17 |
Qwen 3.5 35B MoE Alibaba | 64.13 ± 0.27 | 64.20 ± 0.42 | 84.36 ± 0.60 | 44.03 ± 0.48 |
Qwen 3 VL 8B Alibaba | 63.86 ± 0.13 | 64.22 ± 0.15 | 80.56 ± 0.26 | 47.88 ± 0.20 |
Gemma 3 12B | 63.53 ± 0.06 | 74.20 ± 0.00 | 90.40 ± 0.00 | 57.99 ± 0.00 |
SEA-LION v3 (Gemma 2) 9B AISG | 62.03 ± 0.16 | 64.21 ± 0.17 | 84.88 ± 0.26 | 43.54 ± 0.24 |
Gemma 4 (E4B) 8B | 61.70 ± 0.12 | 61.91 ± 0.26 | 83.84 ± 0.45 | 39.97 ± 0.41 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 60.31 ± 0.19 | 59.32 ± 0.36 | 79.28 ± 0.44 | 39.37 ± 0.41 |
SEA-LION v4 (Qwen VL) 4B AISG | 59.69 ± 0.11 | 62.08 ± 0.11 | 73.00 ± 0.19 | 51.15 ± 0.13 |
Qwen 3 VL 4B Alibaba | 59.45 ± 0.10 | 61.32 ± 0.13 | 72.65 ± 0.19 | 49.98 ± 0.21 |
Qwen 3.5 9B Alibaba | 58.38 ± 0.37 | 51.72 ± 0.47 | 75.93 ± 0.74 | 27.50 ± 0.59 |
Mistral Small 4 119B MoE Mistral AI | 58.01 ± 0.19 | 50.41 ± 0.56 | 69.89 ± 0.92 | 30.93 ± 0.52 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 54.80 ± 0.17 | 46.24 ± 0.23 | 65.40 ± 0.38 | 27.09 ± 0.30 |
SEA-LION v3 (Llama) 8B AISG | 54.70 ± 0.21 | 54.45 ± 0.36 | 75.91 ± 0.42 | 33.00 ± 0.63 |
Gemma 4 (E2B) 5B | 53.51 ± 0.13 | 44.18 ± 0.19 | 66.25 ± 0.33 | 22.11 ± 0.23 |
Qwen 3.5 4B Alibaba | 53.07 ± 0.28 | 40.30 ± 0.35 | 66.16 ± 0.74 | 14.43 ± 0.60 |
SEA-LION v4 (Gemma VL) 4B AISG | 52.38 ± 0.14 | 50.13 ± 0.14 | 66.43 ± 0.22 | 33.84 ± 0.18 |
Olmo 3.1 32B AI2 | 51.32 ± 0.19 | 41.52 ± 0.36 | 54.51 ± 0.55 | 28.54 ± 0.34 |
Gemma 3 4B | 50.40 ± 0.15 | 42.08 ± 0.24 | 56.07 ± 0.42 | 28.10 ± 0.19 |
MERaLiON 2 10B A*STAR | 49.83 ± 0.21 | 56.09 ± 0.24 | 83.23 ± 0.28 | 28.94 ± 0.33 |
GLM 4.7 Flash 30B MoE Z.ai | 46.83 ± 0.30 | 18.40 ± 0.74 | 18.47 ± 1.14 | 18.34 ± 0.73 |
Llama 3.1 8B Meta | 43.88 ± 0.19 | 17.25 ± 0.47 | 1.37 ± 0.50 | 33.13 ± 0.71 |
SEA-LION v4 (Apertus) 8B AISG | 43.71 ± 0.21 | 27.89 ± 0.41 | 55.77 ± 0.82 | 0.00 ± 0.00 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 40.13 ± 0.27 | 20.08 ± 0.77 | 29.95 ± 1.29 | 10.21 ± 0.61 |
Apertus 8B Swiss AI | 40.13 ± 0.24 | 24.28 ± 0.51 | 47.99 ± 1.01 | 0.58 ± 0.13 |
Tiny Aya Water 3B CohereLabs | 34.69 ± 0.32 | 20.17 ± 0.58 | 37.12 ± 1.13 | 3.22 ± 0.61 |
Tiny Aya Global 3B CohereLabs | 33.02 ± 0.30 | 14.42 ± 0.55 | 27.85 ± 1.06 | 1.00 ± 0.29 |
Llama 3.2 3B Meta | 32.31 ± 0.21 | 3.63 ± 0.24 | 0.00 ± 0.00 | 7.27 ± 0.49 |
MERaLiON 2 3B A*STAR | 18.30 ± 0.25 | 6.72 ± 0.71 | 13.07 ± 1.44 | 0.36 ± 0.13 |

Model | VI | NLU | Question Answering | Sentiment Analysis |
|---|---|---|---|---|
Gemma 4 31B | 74.51 ± 0.08 | 70.05 ± 0.09 | 77.72 ± 0.17 | 62.37 ± 0.07 |
SEA-LION v4.5 (Qwen) 27B AISG | 71.08 ± 0.13 | 66.37 ± 0.17 | 78.45 ± 0.16 | 54.28 ± 0.31 |
Qwen 3.5 122B MoE Alibaba | 71.05 ± 0.19 | 67.59 ± 0.29 | 80.13 ± 0.30 | 55.06 ± 0.46 |
Gemma 4 26B MoE | 69.80 ± 0.11 | 64.17 ± 0.19 | 67.04 ± 0.35 | 61.29 ± 0.09 |
Qwen 3.5 27B Alibaba | 69.55 ± 0.15 | 65.75 ± 0.26 | 80.92 ± 0.37 | 50.59 ± 0.29 |
Qwen 3.6 27B Alibaba | 68.70 ± 0.21 | 65.83 ± 0.24 | 78.79 ± 0.38 | 52.86 ± 0.42 |
SEA-LION v3 (Llama) 70B AISG | 68.31 ± 0.23 | 68.88 ± 0.35 | 76.06 ± 0.49 | 61.70 ± 0.55 |
SEA-LION v4 (Qwen) 32B AISG | 67.09 ± 0.11 | 63.80 ± 0.15 | 68.18 ± 0.22 | 59.42 ± 0.26 |
Qwen 3 VL 32B Alibaba | 66.93 ± 0.09 | 68.54 ± 0.12 | 75.94 ± 0.17 | 61.15 ± 0.19 |
Qwen 3.6 35B MoE Alibaba | 66.79 ± 0.23 | 68.75 ± 0.34 | 80.07 ± 0.52 | 57.43 ± 0.52 |
Llama 3.3 70B Meta | 66.76 ± 0.12 | 65.40 ± 0.18 | 75.56 ± 0.24 | 55.25 ± 0.25 |
Mistral Medium 3.5 128B Mistral AI | 66.64 ± 0.17 | 68.75 ± 0.33 | 79.43 ± 0.59 | 58.06 ± 0.44 |
SEA-LION v4 (Gemma) 27B AISG | 65.05 ± 0.16 | 63.88 ± 0.24 | 65.02 ± 0.42 | 62.75 ± 0.38 |
Gemma 3 27B | 64.66 ± 0.14 | 63.12 ± 0.23 | 63.63 ± 0.41 | 62.62 ± 0.30 |
Llama 4 Scout 109B MoE Meta | 64.44 ± 0.12 | 67.48 ± 0.12 | 76.13 ± 0.17 | 58.83 ± 0.15 |
SEA-LION v4 (Qwen VL) 8B AISG | 64.26 ± 0.11 | 70.42 ± 0.13 | 81.51 ± 0.19 | 59.34 ± 0.17 |
Qwen 3.5 35B MoE Alibaba | 64.13 ± 0.27 | 62.30 ± 0.43 | 79.30 ± 0.48 | 45.30 ± 0.69 |
Qwen 3 VL 8B Alibaba | 63.86 ± 0.13 | 71.80 ± 0.12 | 82.66 ± 0.20 | 60.93 ± 0.16 |
Gemma 3 12B | 63.53 ± 0.06 | 65.55 ± 0.00 | 65.03 ± 0.00 | 66.07 ± 0.00 |
SEA-LION v3 (Gemma 2) 9B AISG | 62.03 ± 0.16 | 70.67 ± 0.26 | 78.09 ± 0.45 | 63.25 ± 0.28 |
Gemma 4 (E4B) 8B | 61.70 ± 0.12 | 53.36 ± 0.31 | 44.98 ± 0.47 | 61.74 ± 0.34 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 60.31 ± 0.19 | 60.42 ± 0.45 | 68.17 ± 0.66 | 52.67 ± 0.95 |
SEA-LION v4 (Qwen VL) 4B AISG | 59.69 ± 0.11 | 68.82 ± 0.14 | 75.94 ± 0.20 | 61.69 ± 0.18 |
Qwen 3 VL 4B Alibaba | 59.45 ± 0.10 | 71.04 ± 0.11 | 77.63 ± 0.18 | 64.45 ± 0.07 |
Qwen 3.5 9B Alibaba | 58.38 ± 0.37 | 61.18 ± 0.34 | 76.67 ± 0.49 | 45.69 ± 0.63 |
Mistral Small 4 119B MoE Mistral AI | 58.01 ± 0.19 | 70.77 ± 0.42 | 77.13 ± 0.36 | 64.41 ± 0.71 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 54.80 ± 0.17 | 53.65 ± 0.22 | 46.85 ± 0.34 | 60.45 ± 0.31 |
SEA-LION v3 (Llama) 8B AISG | 54.70 ± 0.21 | 65.69 ± 0.42 | 67.51 ± 0.55 | 63.88 ± 0.56 |
Gemma 4 (E2B) 5B | 53.51 ± 0.13 | 49.52 ± 0.22 | 40.13 ± 0.28 | 58.92 ± 0.33 |
Qwen 3.5 4B Alibaba | 53.07 ± 0.28 | 58.28 ± 0.42 | 75.20 ± 0.59 | 41.36 ± 0.65 |
SEA-LION v4 (Gemma VL) 4B AISG | 52.38 ± 0.14 | 62.55 ± 0.22 | 62.22 ± 0.28 | 62.87 ± 0.30 |
Olmo 3.1 32B AI2 | 51.32 ± 0.19 | 64.91 ± 0.33 | 67.19 ± 0.40 | 62.63 ± 0.45 |
Gemma 3 4B | 50.40 ± 0.15 | 56.33 ± 0.20 | 48.03 ± 0.37 | 64.63 ± 0.21 |
MERaLiON 2 10B A*STAR | 49.83 ± 0.21 | 55.22 ± 0.43 | 49.23 ± 0.83 | 61.21 ± 0.22 |
GLM 4.7 Flash 30B MoE Z.ai | 46.83 ± 0.30 | 58.96 ± 0.58 | 73.83 ± 0.77 | 44.10 ± 0.68 |
Llama 3.1 8B Meta | 43.88 ± 0.19 | 63.50 ± 0.33 | 71.71 ± 0.40 | 55.29 ± 0.50 |
SEA-LION v4 (Apertus) 8B AISG | 43.71 ± 0.21 | 56.89 ± 0.31 | 72.93 ± 0.24 | 40.85 ± 0.51 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 40.13 ± 0.27 | 36.37 ± 0.77 | 61.53 ± 0.93 | 11.22 ± 1.18 |
Apertus 8B Swiss AI | 40.13 ± 0.24 | 44.09 ± 0.73 | 63.04 ± 0.83 | 25.14 ± 1.19 |
Tiny Aya Water 3B CohereLabs | 34.69 ± 0.32 | 37.24 ± 0.77 | 57.54 ± 0.96 | 16.94 ± 1.00 |
Tiny Aya Global 3B CohereLabs | 33.02 ± 0.30 | 36.62 ± 0.72 | 57.09 ± 1.02 | 16.16 ± 1.09 |
Llama 3.2 3B Meta | 32.31 ± 0.21 | 50.16 ± 0.34 | 56.80 ± 0.43 | 43.53 ± 0.54 |
MERaLiON 2 3B A*STAR | 18.30 ± 0.25 | 14.87 ± 0.34 | 29.74 ± 0.69 | 0.00 ± 0.00 |

Model | VI | Safety | Toxicity Detection |
|---|---|---|---|
Gemma 4 31B | 74.51 ± 0.08 | 60.90 ± 0.14 | 60.90 ± 0.14 |
SEA-LION v4.5 (Qwen) 27B AISG | 71.08 ± 0.13 | 47.27 ± 0.71 | 47.27 ± 0.71 |
Qwen 3.5 122B MoE Alibaba | 71.05 ± 0.19 | 49.85 ± 0.91 | 49.85 ± 0.91 |
Gemma 4 26B MoE | 69.80 ± 0.11 | 50.98 ± 0.28 | 50.98 ± 0.28 |
Qwen 3.5 27B Alibaba | 69.55 ± 0.15 | 42.99 ± 0.81 | 42.99 ± 0.81 |
Qwen 3.6 27B Alibaba | 68.70 ± 0.21 | 39.69 ± 1.21 | 39.69 ± 1.21 |
SEA-LION v3 (Llama) 70B AISG | 68.31 ± 0.23 | 43.78 ± 0.92 | 43.78 ± 0.92 |
SEA-LION v4 (Qwen) 32B AISG | 67.09 ± 0.11 | 46.98 ± 0.19 | 46.98 ± 0.19 |
Qwen 3 VL 32B Alibaba | 66.93 ± 0.09 | 36.47 ± 0.16 | 36.47 ± 0.16 |
Qwen 3.6 35B MoE Alibaba | 66.79 ± 0.23 | 40.65 ± 1.39 | 40.65 ± 1.39 |
Llama 3.3 70B Meta | 66.76 ± 0.12 | 46.00 ± 0.48 | 46.00 ± 0.48 |
Mistral Medium 3.5 128B Mistral AI | 66.64 ± 0.17 | 43.76 ± 0.65 | 43.76 ± 0.65 |
SEA-LION v4 (Gemma) 27B AISG | 65.05 ± 0.16 | 42.11 ± 0.45 | 42.11 ± 0.45 |
Gemma 3 27B | 64.66 ± 0.14 | 40.21 ± 0.39 | 40.21 ± 0.39 |
Llama 4 Scout 109B MoE Meta | 64.44 ± 0.12 | 46.32 ± 0.24 | 46.32 ± 0.24 |
SEA-LION v4 (Qwen VL) 8B AISG | 64.26 ± 0.11 | 37.27 ± 0.30 | 37.27 ± 0.30 |
Qwen 3.5 35B MoE Alibaba | 64.13 ± 0.27 | 38.77 ± 1.15 | 38.77 ± 1.15 |
Qwen 3 VL 8B Alibaba | 63.86 ± 0.13 | 35.15 ± 0.30 | 35.15 ± 0.30 |
Gemma 3 12B | 63.53 ± 0.06 | 40.19 ± 0.00 | 40.19 ± 0.00 |
SEA-LION v3 (Gemma 2) 9B AISG | 62.03 ± 0.16 | 44.97 ± 0.53 | 44.97 ± 0.53 |
Gemma 4 (E4B) 8B | 61.70 ± 0.12 | 45.03 ± 0.62 | 45.03 ± 0.62 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 60.31 ± 0.19 | 26.37 ± 0.51 | 26.37 ± 0.51 |
SEA-LION v4 (Qwen VL) 4B AISG | 59.69 ± 0.11 | 27.37 ± 0.22 | 27.37 ± 0.22 |
Qwen 3 VL 4B Alibaba | 59.45 ± 0.10 | 26.66 ± 0.22 | 26.66 ± 0.22 |
Qwen 3.5 9B Alibaba | 58.38 ± 0.37 | 28.58 ± 1.27 | 28.58 ± 1.27 |
Mistral Small 4 119B MoE Mistral AI | 58.01 ± 0.19 | 21.29 ± 0.53 | 21.29 ± 0.53 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 54.80 ± 0.17 | 34.93 ± 0.35 | 34.93 ± 0.35 |
SEA-LION v3 (Llama) 8B AISG | 54.70 ± 0.21 | 21.99 ± 0.48 | 21.99 ± 0.48 |
Gemma 4 (E2B) 5B | 53.51 ± 0.13 | 35.54 ± 0.52 | 35.54 ± 0.52 |
Qwen 3.5 4B Alibaba | 53.07 ± 0.28 | 31.55 ± 1.11 | 31.55 ± 1.11 |
SEA-LION v4 (Gemma VL) 4B AISG | 52.38 ± 0.14 | 15.45 ± 0.20 | 15.45 ± 0.20 |
Olmo 3.1 32B AI2 | 51.32 ± 0.19 | 17.04 ± 0.71 | 17.04 ± 0.71 |
Gemma 3 4B | 50.40 ± 0.15 | 18.36 ± 0.35 | 18.36 ± 0.35 |
MERaLiON 2 10B A*STAR | 49.83 ± 0.21 | 41.21 ± 0.56 | 41.21 ± 0.56 |
GLM 4.7 Flash 30B MoE Z.ai | 46.83 ± 0.30 | 31.47 ± 1.07 | 31.47 ± 1.07 |
Llama 3.1 8B Meta | 43.88 ± 0.19 | 22.19 ± 0.70 | 22.19 ± 0.70 |
SEA-LION v4 (Apertus) 8B AISG | 43.71 ± 0.21 | 22.62 ± 0.39 | 22.62 ± 0.39 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 40.13 ± 0.27 | 11.42 ± 1.03 | 11.42 ± 1.03 |
Apertus 8B Swiss AI | 40.13 ± 0.24 | 23.75 ± 0.97 | 23.75 ± 0.97 |
Tiny Aya Water 3B CohereLabs | 34.69 ± 0.32 | 10.90 ± 0.83 | 10.90 ± 0.83 |
Tiny Aya Global 3B CohereLabs | 33.02 ± 0.30 | 10.82 ± 0.73 | 10.82 ± 0.73 |
Llama 3.2 3B Meta | 32.31 ± 0.21 | 9.15 ± 0.78 | 9.15 ± 0.78 |
MERaLiON 2 3B A*STAR | 18.30 ± 0.25 | 3.06 ± 0.71 | 3.06 ± 0.71 |
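The task tables in this section are consistent with each competency score being the unweighted mean of its task scores: competencies with a single task (for example Safety) repeat that task's value, and two-task competencies (NLG, NLR, NLU) sit halfway between their tasks. The sketch below checks this on the Gemma 4 31B row. The aggregation rule is inferred from the published numbers rather than stated on the page, and `competency_score` is an illustrative name.

```python
# (reported competency score, task scores) for the Gemma 4 31B row.
gemma_4_31b = {
    "NLG": (55.59, {"Summarization": 17.47, "Translations": 93.70}),
    "NLR": (78.91, {"Causal Reasoning": 92.25, "Natural Language Inference": 65.56}),
    "NLU": (70.05, {"Question Answering": 77.72, "Sentiment Analysis": 62.37}),
    "Safety": (60.90, {"Toxicity Detection": 60.90}),
}


def competency_score(task_scores):
    """Unweighted mean over a competency's tasks (assumed aggregation)."""
    return sum(task_scores.values()) / len(task_scores)


if __name__ == "__main__":
    for competency, (reported, tasks) in gemma_4_31b.items():
        gap = abs(competency_score(tasks) - reported)
        print(f"{competency}: reported {reported:.2f}, gap {gap:.3f}")
```

One consequence of this averaging is visible in the NLG column: summarization scores sit in the teens while translation scores sit near 90, so NLG values cluster in the low 50s for most models regardless of overall rank.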