Indonesian Performance
Indonesian Scores by Model
Average of 30 bootstraps. 95% confidence intervals (CIs) are shown.
Model Size: ≤200B
Open instruct models only
[Chart: overall Indonesian score per model, labeled by model size. The values match the ID column of the Indonesian Competencies table.]
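
The caption above reports each score as an average over 30 bootstraps with a 95% CI. The page does not state the exact procedure, so the following is only a minimal sketch of one common approach: resample per-item scores with replacement, average each resample, and report the mean of those averages with a normal-approximation half-width. The function name `bootstrap_summary`, the per-item score list, and the choice of interval are all assumptions made for illustration.

```python
import random
import statistics


def bootstrap_summary(item_scores, n_boot=30, seed=0):
    """Return (mean, half_width) over n_boot bootstrap resamples.

    Hypothetical helper: the leaderboard's actual resampling and
    interval method are not documented in this section.
    """
    rng = random.Random(seed)
    n = len(item_scores)
    boot_means = []
    for _ in range(n_boot):
        # Draw n items with replacement and score the resample.
        resample = rng.choices(item_scores, k=n)
        boot_means.append(statistics.fmean(resample))
    mean = statistics.fmean(boot_means)
    # Assumed interval: normal approximation on the mean of the bootstrap estimates.
    standard_error = statistics.stdev(boot_means) / n_boot ** 0.5
    return mean, 1.96 * standard_error


# Hypothetical per-item correctness for a model that answers 70 of 100 items.
scores = [1.0] * 70 + [0.0] * 30
mean, half_width = bootstrap_summary(scores)
print(f"{100 * mean:.2f} ± {100 * half_width:.2f}")
```

A percentile interval over the bootstrap estimates would be an equally reasonable choice. The sketch only illustrates the "mean ± half-width" form in which the scores are printed.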
Indonesian Competencies

Model | ID (overall) | Instruction Following | Knowledge | Linguistic Diagnostics | Multi-Turn Chat | NLG | NLR | NLU | Safety |
|---|---|---|---|---|---|---|---|---|---|
Gemma 4 31B | 77.36 ± 0.07 | 96.57 ± 0.30 | 78.20 ± 0.14 | 74.41 ± 0.22 | 85.29 ± 0.28 | 56.12 ± 0.06 | 89.03 ± 0.04 | 81.98 ± 0.06 | 57.30 ± 0.17 |
SEA-LION v4.5 (Qwen) 27B AISG | 74.51 ± 0.13 | 91.46 ± 0.46 | 79.51 ± 0.34 | 57.25 ± 0.43 | 85.05 ± 0.43 | 54.87 ± 0.06 | 88.34 ± 0.19 | 79.84 ± 0.20 | 59.76 ± 0.37 |
Qwen 3.5 27B Alibaba | 74.38 ± 0.14 | 94.54 ± 0.56 | 79.77 ± 0.33 | 60.11 ± 0.44 | 84.74 ± 0.34 | 54.86 ± 0.06 | 86.80 ± 0.19 | 77.43 ± 0.21 | 56.82 ± 0.62 |
Qwen 3.5 122B MoE Alibaba | 74.18 ± 0.12 | 90.51 ± 0.52 | 78.17 ± 0.44 | 62.18 ± 0.44 | 85.47 ± 0.38 | 54.72 ± 0.05 | 87.87 ± 0.15 | 78.07 ± 0.23 | 56.50 ± 0.50 |
Qwen 3.6 27B Alibaba | 73.00 ± 0.15 | 90.89 ± 0.71 | 76.63 ± 0.51 | 55.39 ± 0.53 | 85.97 ± 0.33 | 54.38 ± 0.06 | 87.26 ± 0.28 | 79.83 ± 0.25 | 53.62 ± 0.69 |
Gemma 4 26B MoE | 72.95 ± 0.08 | 93.14 ± 0.35 | 73.33 ± 0.17 | 60.06 ± 0.22 | 84.38 ± 0.36 | 56.06 ± 0.06 | 85.92 ± 0.11 | 79.00 ± 0.13 | 51.71 ± 0.22 |
Qwen 3 VL 32B Alibaba | 71.79 ± 0.08 | 88.73 ± 0.42 | 71.14 ± 0.20 | 60.43 ± 0.17 | 81.32 ± 0.37 | 54.35 ± 0.03 | 87.41 ± 0.09 | 77.40 ± 0.10 | 53.55 ± 0.19 |
SEA-LION v4 (Qwen) 32B AISG | 70.91 ± 0.09 | 89.43 ± 0.42 | 70.13 ± 0.15 | 56.88 ± 0.26 | 79.27 ± 0.47 | 54.87 ± 0.03 | 85.10 ± 0.07 | 79.42 ± 0.09 | 52.20 ± 0.17 |
SEA-LION v3 (Llama) 70B AISG | 70.23 ± 0.14 | 92.79 ± 0.43 | 74.68 ± 0.39 | 55.90 ± 0.48 | 72.93 ± 0.45 | 56.00 ± 0.07 | 85.20 ± 0.23 | 75.96 ± 0.29 | 48.39 ± 0.60 |
Llama 3.3 70B Meta | 69.68 ± 0.08 | 94.41 ± 0.38 | 73.16 ± 0.22 | 55.73 ± 0.18 | 65.50 ± 0.54 | 55.28 ± 0.06 | 85.58 ± 0.11 | 76.22 ± 0.10 | 51.58 ± 0.30 |
Mistral Medium 3.5 128B Mistral AI | 69.30 ± 0.14 | 86.60 ± 0.52 | 73.42 ± 0.38 | 50.20 ± 0.57 | 82.85 ± 0.51 | 54.74 ± 0.05 | 78.96 ± 0.21 | 74.98 ± 0.21 | 52.66 ± 0.44 |
Qwen 3.6 35B MoE Alibaba | 68.88 ± 0.13 | 88.22 ± 0.70 | 73.10 ± 0.46 | 53.90 ± 0.59 | 84.41 ± 0.39 | 54.61 ± 0.06 | 83.80 ± 0.32 | 75.80 ± 0.32 | 37.22 ± 0.54 |
Gemma 3 27B | 68.35 ± 0.09 | 87.75 ± 0.62 | 66.37 ± 0.23 | 55.64 ± 0.25 | 77.49 ± 0.37 | 54.95 ± 0.04 | 81.80 ± 0.05 | 75.24 ± 0.11 | 47.56 ± 0.13 |
SEA-LION v4 (Gemma) 27B AISG | 68.27 ± 0.10 | 89.24 ± 0.61 | 64.80 ± 0.17 | 53.95 ± 0.32 | 78.13 ± 0.26 | 54.82 ± 0.04 | 81.91 ± 0.11 | 75.09 ± 0.15 | 48.25 ± 0.25 |
Llama 4 Scout 109B MoE Meta | 68.00 ± 0.10 | 92.25 ± 0.39 | 66.58 ± 0.11 | 54.90 ± 0.23 | 69.89 ± 0.54 | 55.38 ± 0.05 | 75.01 ± 0.06 | 76.12 ± 0.09 | 53.87 ± 0.17 |
SEA-LION v4 (Qwen VL) 8B AISG | 67.39 ± 0.10 | 93.62 ± 0.46 | 59.79 ± 0.22 | 45.45 ± 0.24 | 77.72 ± 0.41 | 54.08 ± 0.03 | 80.52 ± 0.08 | 73.33 ± 0.09 | 54.60 ± 0.26 |
Qwen 3 VL 8B Alibaba | 66.92 ± 0.09 | 91.46 ± 0.44 | 60.17 ± 0.18 | 43.43 ± 0.23 | 79.89 ± 0.35 | 53.60 ± 0.04 | 79.93 ± 0.08 | 74.01 ± 0.08 | 52.90 ± 0.25 |
Gemma 3 12B | 66.22 ± 0.06 | 92.38 ± 0.00 | 61.30 ± 0.00 | 48.57 ± 0.11 | 74.61 ± 0.45 | 54.37 ± 0.00 | 81.97 ± 0.00 | 75.53 ± 0.08 | 41.08 ± 0.00 |
Gemma 4 (E4B) 8B | 66.08 ± 0.14 | 89.24 ± 0.65 | 58.91 ± 0.45 | 47.81 ± 0.43 | 77.43 ± 0.43 | 54.41 ± 0.08 | 79.98 ± 0.18 | 73.39 ± 0.24 | 47.46 ± 0.32 |
SEA-LION v3 (Gemma 2) 9B AISG | 64.52 ± 0.20 | 89.78 ± 0.69 | 58.25 ± 0.50 | 46.49 ± 0.36 | 67.07 ± 0.60 | 55.46 ± 0.07 | 81.53 ± 0.14 | 75.06 ± 0.22 | 42.53 ± 0.61 |
Qwen 3.5 35B MoE Alibaba | 64.52 ± 0.17 | 90.00 ± 0.69 | 51.13 ± 0.71 | 51.69 ± 0.51 | 84.79 ± 0.38 | 54.16 ± 0.06 | 76.55 ± 0.32 | 73.10 ± 0.32 | 34.74 ± 0.81 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 63.14 ± 0.18 | 79.75 ± 0.74 | 68.58 ± 0.63 | 41.25 ± 0.63 | 84.77 ± 0.43 | 51.93 ± 0.05 | 70.84 ± 0.47 | 70.70 ± 0.33 | 37.27 ± 0.69 |
Qwen 3.5 9B Alibaba | 61.69 ± 0.22 | 85.87 ± 0.88 | 59.41 ± 0.58 | 37.38 ± 0.80 | 79.07 ± 0.42 | 52.39 ± 0.07 | 63.25 ± 0.43 | 71.16 ± 0.40 | 44.97 ± 0.72 |
Mistral Small 4 119B MoE Mistral AI | 61.61 ± 0.19 | 77.21 ± 0.85 | 61.59 ± 0.48 | 44.85 ± 0.77 | 76.99 ± 0.57 | 54.01 ± 0.07 | 70.00 ± 0.42 | 75.46 ± 0.30 | 32.81 ± 0.45 |
SEA-LION v4 (Qwen VL) 4B AISG | 61.32 ± 0.10 | 90.03 ± 0.43 | 50.30 ± 0.16 | 39.89 ± 0.19 | 74.48 ± 0.56 | 52.86 ± 0.04 | 75.64 ± 0.09 | 71.20 ± 0.10 | 36.15 ± 0.19 |
Qwen 3 VL 4B Alibaba | 60.83 ± 0.12 | 88.95 ± 0.56 | 48.81 ± 0.21 | 43.56 ± 0.20 | 73.90 ± 0.50 | 52.46 ± 0.04 | 74.76 ± 0.11 | 69.18 ± 0.11 | 34.97 ± 0.19 |
MERaLiON 2 10B A*STAR | 58.36 ± 0.14 | 72.16 ± 0.91 | 51.24 ± 0.51 | 44.87 ± 0.54 | 54.82 ± 0.42 | 55.31 ± 0.07 | 73.01 ± 0.23 | 71.15 ± 0.25 | 44.33 ± 0.50 |
SEA-LION v3 (Llama) 8B AISG | 58.03 ± 0.16 | 86.19 ± 0.75 | 51.73 ± 0.55 | 28.00 ± 0.51 | 62.76 ± 0.52 | 54.48 ± 0.06 | 72.28 ± 0.36 | 67.74 ± 0.38 | 41.07 ± 0.66 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 57.41 ± 0.10 | 88.92 ± 0.65 | 44.54 ± 0.34 | 34.49 ± 0.18 | 73.21 ± 0.47 | 53.80 ± 0.06 | 60.65 ± 0.19 | 64.11 ± 0.21 | 39.54 ± 0.24 |
Gemma 4 (E2B) 5B | 56.55 ± 0.11 | 84.73 ± 0.77 | 44.10 ± 0.32 | 34.58 ± 0.30 | 74.03 ± 0.43 | 53.43 ± 0.06 | 59.09 ± 0.17 | 63.82 ± 0.20 | 38.60 ± 0.20 |
SEA-LION v4 (Gemma VL) 4B AISG | 56.39 ± 0.11 | 87.30 ± 0.58 | 46.94 ± 0.20 | 28.11 ± 0.25 | 68.46 ± 0.41 | 53.71 ± 0.05 | 65.24 ± 0.10 | 68.22 ± 0.10 | 33.14 ± 0.15 |
Gemma 3 4B | 54.52 ± 0.10 | 85.68 ± 0.69 | 49.41 ± 0.13 | 26.88 ± 0.25 | 67.53 ± 0.34 | 53.45 ± 0.07 | 60.98 ± 0.09 | 67.17 ± 0.13 | 25.08 ± 0.17 |
Qwen 3.5 4B Alibaba | 54.43 ± 0.22 | 80.51 ± 0.96 | 50.74 ± 0.58 | 29.44 ± 0.84 | 68.55 ± 0.63 | 49.20 ± 0.07 | 54.73 ± 0.60 | 66.96 ± 0.48 | 35.30 ± 0.66 |
Olmo 3.1 32B AI2 | 53.36 ± 0.14 | 82.57 ± 0.73 | 46.81 ± 0.39 | 27.53 ± 0.44 | 71.82 ± 0.60 | 51.64 ± 0.05 | 66.99 ± 0.24 | 62.86 ± 0.22 | 16.65 ± 0.35 |
Llama 3.1 8B Meta | 53.03 ± 0.18 | 77.65 ± 0.92 | 41.85 ± 0.48 | 28.64 ± 0.72 | 50.00 ± 0.52 | 53.98 ± 0.07 | 64.79 ± 0.34 | 67.92 ± 0.29 | 39.40 ± 0.52 |
GLM 4.7 Flash 30B MoE Z.ai | 50.35 ± 0.28 | 75.90 ± 1.20 | 44.08 ± 0.76 | 25.93 ± 0.63 | 57.62 ± 0.60 | 51.00 ± 0.12 | 37.49 ± 0.76 | 63.92 ± 0.52 | 46.83 ± 0.52 |
SEA-LION v4 (Apertus) 8B AISG | 47.36 ± 0.13 | 71.68 ± 0.71 | 43.66 ± 0.38 | 27.69 ± 0.49 | 50.07 ± 0.63 | 54.18 ± 0.04 | 38.36 ± 0.34 | 60.92 ± 0.25 | 32.32 ± 0.31 |
Apertus 8B Swiss AI | 38.87 ± 0.21 | 74.00 ± 1.08 | 26.95 ± 0.91 | 21.49 ± 0.61 | 43.92 ± 0.48 | 53.30 ± 0.09 | 23.37 ± 0.73 | 46.63 ± 0.53 | 21.34 ± 0.66 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 37.29 ± 0.25 | 73.49 ± 1.30 | 34.11 ± 0.93 | 15.66 ± 0.52 | 66.11 ± 0.61 | 46.16 ± 0.08 | 15.76 ± 0.72 | 38.55 ± 0.72 | 8.51 ± 0.74 |
MERaLiON 2 3B A*STAR | 37.24 ± 0.16 | 52.70 ± 1.23 | 14.85 ± 0.56 | 17.26 ± 0.39 | 31.08 ± 0.72 | 47.53 ± 0.10 | 43.17 ± 0.35 | 58.36 ± 0.33 | 32.99 ± 0.40 |
Tiny Aya Water 3B CohereLabs | 36.47 ± 0.26 | 69.81 ± 1.25 | 34.87 ± 0.55 | 13.74 ± 0.76 | 30.65 ± 0.74 | 40.73 ± 0.14 | 30.22 ± 0.53 | 46.48 ± 0.72 | 25.26 ± 0.97 |
Llama 3.2 3B Meta | 35.82 ± 0.23 | 61.17 ± 1.14 | 14.26 ± 0.46 | 19.27 ± 0.43 | 44.57 ± 0.51 | 51.08 ± 0.06 | 16.54 ± 0.59 | 50.34 ± 0.52 | 29.36 ± 0.32 |
Tiny Aya Global 3B CohereLabs | 34.82 ± 0.26 | 67.08 ± 1.27 | 33.49 ± 0.68 | 13.02 ± 0.69 | 29.68 ± 0.55 | 40.97 ± 0.16 | 26.85 ± 0.57 | 47.51 ± 0.66 | 19.97 ± 0.80 |
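
In the table above, the ID column is numerically consistent with an unweighted mean of the eight competency columns. This is an observation from the published numbers, not a formula stated on the page. A short check for the Gemma 4 31B row, with values copied from the table:

```python
from statistics import fmean

# Competency scores for Gemma 4 31B, copied from the table above.
competencies = {
    "Instruction Following": 96.57,
    "Knowledge": 78.20,
    "Linguistic Diagnostics": 74.41,
    "Multi-Turn Chat": 85.29,
    "NLG": 56.12,
    "NLR": 89.03,
    "NLU": 81.98,
    "Safety": 57.30,
}

# Assumption: the overall score is the plain (unweighted) mean of the competencies.
overall = fmean(competencies.values())
print(f"{overall:.2f}")  # compare with the reported ID score of 77.36
```

The same check can be repeated for any other row. Small differences in the last digit are expected because the inputs are already rounded to two decimals.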
Indonesian Tasks

Model | ID (overall) | Instruction Following | SEA-IFEval |
|---|---|---|---|
Gemma 4 31B | 77.36 ± 0.07 | 96.57 ± 0.30 | 96.57 ± 0.30 |
SEA-LION v4.5 (Qwen) 27B AISG | 74.51 ± 0.13 | 91.46 ± 0.46 | 91.46 ± 0.46 |
Qwen 3.5 27B Alibaba | 74.38 ± 0.14 | 94.54 ± 0.56 | 94.54 ± 0.56 |
Qwen 3.5 122B MoE Alibaba | 74.18 ± 0.12 | 90.51 ± 0.52 | 90.51 ± 0.52 |
Qwen 3.6 27B Alibaba | 73.00 ± 0.15 | 90.89 ± 0.71 | 90.89 ± 0.71 |
Gemma 4 26B MoE | 72.95 ± 0.08 | 93.14 ± 0.35 | 93.14 ± 0.35 |
Qwen 3 VL 32B Alibaba | 71.79 ± 0.08 | 88.73 ± 0.42 | 88.73 ± 0.42 |
SEA-LION v4 (Qwen) 32B AISG | 70.91 ± 0.09 | 89.43 ± 0.42 | 89.43 ± 0.42 |
SEA-LION v3 (Llama) 70B AISG | 70.23 ± 0.14 | 92.79 ± 0.43 | 92.79 ± 0.43 |
Llama 3.3 70B Meta | 69.68 ± 0.08 | 94.41 ± 0.38 | 94.41 ± 0.38 |
Mistral Medium 3.5 128B Mistral AI | 69.30 ± 0.14 | 86.60 ± 0.52 | 86.60 ± 0.52 |
Qwen 3.6 35B MoE Alibaba | 68.88 ± 0.13 | 88.22 ± 0.70 | 88.22 ± 0.70 |
Gemma 3 27B | 68.35 ± 0.09 | 87.75 ± 0.62 | 87.75 ± 0.62 |
SEA-LION v4 (Gemma) 27B AISG | 68.27 ± 0.10 | 89.24 ± 0.61 | 89.24 ± 0.61 |
Llama 4 Scout 109B MoE Meta | 68.00 ± 0.10 | 92.25 ± 0.39 | 92.25 ± 0.39 |
SEA-LION v4 (Qwen VL) 8B AISG | 67.39 ± 0.10 | 93.62 ± 0.46 | 93.62 ± 0.46 |
Qwen 3 VL 8B Alibaba | 66.92 ± 0.09 | 91.46 ± 0.44 | 91.46 ± 0.44 |
Gemma 3 12B | 66.22 ± 0.06 | 92.38 ± 0.00 | 92.38 ± 0.00 |
Gemma 4 (E4B) 8B | 66.08 ± 0.14 | 89.24 ± 0.65 | 89.24 ± 0.65 |
SEA-LION v3 (Gemma 2) 9B AISG | 64.52 ± 0.20 | 89.78 ± 0.69 | 89.78 ± 0.69 |
Qwen 3.5 35B MoE Alibaba | 64.52 ± 0.17 | 90.00 ± 0.69 | 90.00 ± 0.69 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 63.14 ± 0.18 | 79.75 ± 0.74 | 79.75 ± 0.74 |
Qwen 3.5 9B Alibaba | 61.69 ± 0.22 | 85.87 ± 0.88 | 85.87 ± 0.88 |
Mistral Small 4 119B MoE Mistral AI | 61.61 ± 0.19 | 77.21 ± 0.85 | 77.21 ± 0.85 |
SEA-LION v4 (Qwen VL) 4B AISG | 61.32 ± 0.10 | 90.03 ± 0.43 | 90.03 ± 0.43 |
Qwen 3 VL 4B Alibaba | 60.83 ± 0.12 | 88.95 ± 0.56 | 88.95 ± 0.56 |
MERaLiON 2 10B A*STAR | 58.36 ± 0.14 | 72.16 ± 0.91 | 72.16 ± 0.91 |
SEA-LION v3 (Llama) 8B AISG | 58.03 ± 0.16 | 86.19 ± 0.75 | 86.19 ± 0.75 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 57.41 ± 0.10 | 88.92 ± 0.65 | 88.92 ± 0.65 |
Gemma 4 (E2B) 5B | 56.55 ± 0.11 | 84.73 ± 0.77 | 84.73 ± 0.77 |
SEA-LION v4 (Gemma VL) 4B AISG | 56.39 ± 0.11 | 87.30 ± 0.58 | 87.30 ± 0.58 |
Gemma 3 4B | 54.52 ± 0.10 | 85.68 ± 0.69 | 85.68 ± 0.69 |
Qwen 3.5 4B Alibaba | 54.43 ± 0.22 | 80.51 ± 0.96 | 80.51 ± 0.96 |
Olmo 3.1 32B AI2 | 53.36 ± 0.14 | 82.57 ± 0.73 | 82.57 ± 0.73 |
Llama 3.1 8B Meta | 53.03 ± 0.18 | 77.65 ± 0.92 | 77.65 ± 0.92 |
GLM 4.7 Flash 30B MoE Z.ai | 50.35 ± 0.28 | 75.90 ± 1.20 | 75.90 ± 1.20 |
SEA-LION v4 (Apertus) 8B AISG | 47.36 ± 0.13 | 71.68 ± 0.71 | 71.68 ± 0.71 |
Apertus 8B Swiss AI | 38.87 ± 0.21 | 74.00 ± 1.08 | 74.00 ± 1.08 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 37.29 ± 0.25 | 73.49 ± 1.30 | 73.49 ± 1.30 |
MERaLiON 2 3B A*STAR | 37.24 ± 0.16 | 52.70 ± 1.23 | 52.70 ± 1.23 |
Tiny Aya Water 3B CohereLabs | 36.47 ± 0.26 | 69.81 ± 1.25 | 69.81 ± 1.25 |
Llama 3.2 3B Meta | 35.82 ± 0.23 | 61.17 ± 1.14 | 61.17 ± 1.14 |
Tiny Aya Global 3B CohereLabs | 34.82 ± 0.26 | 67.08 ± 1.27 | 67.08 ± 1.27 |

Model | ID (overall) | Knowledge | Global MMLU Lite |
|---|---|---|---|
Gemma 4 31B | 77.36 ± 0.07 | 78.20 ± 0.14 | 78.20 ± 0.14 |
SEA-LION v4.5 (Qwen) 27B AISG | 74.51 ± 0.13 | 79.51 ± 0.34 | 79.51 ± 0.34 |
Qwen 3.5 27B Alibaba | 74.38 ± 0.14 | 79.77 ± 0.33 | 79.77 ± 0.33 |
Qwen 3.5 122B MoE Alibaba | 74.18 ± 0.12 | 78.17 ± 0.44 | 78.17 ± 0.44 |
Qwen 3.6 27B Alibaba | 73.00 ± 0.15 | 76.63 ± 0.51 | 76.63 ± 0.51 |
Gemma 4 26B MoE | 72.95 ± 0.08 | 73.33 ± 0.17 | 73.33 ± 0.17 |
Qwen 3 VL 32B Alibaba | 71.79 ± 0.08 | 71.14 ± 0.20 | 71.14 ± 0.20 |
SEA-LION v4 (Qwen) 32B AISG | 70.91 ± 0.09 | 70.13 ± 0.15 | 70.13 ± 0.15 |
SEA-LION v3 (Llama) 70B AISG | 70.23 ± 0.14 | 74.68 ± 0.39 | 74.68 ± 0.39 |
Llama 3.3 70B Meta | 69.68 ± 0.08 | 73.16 ± 0.22 | 73.16 ± 0.22 |
Mistral Medium 3.5 128B Mistral AI | 69.30 ± 0.14 | 73.42 ± 0.38 | 73.42 ± 0.38 |
Qwen 3.6 35B MoE Alibaba | 68.88 ± 0.13 | 73.10 ± 0.46 | 73.10 ± 0.46 |
Gemma 3 27B | 68.35 ± 0.09 | 66.37 ± 0.23 | 66.37 ± 0.23 |
SEA-LION v4 (Gemma) 27B AISG | 68.27 ± 0.10 | 64.80 ± 0.17 | 64.80 ± 0.17 |
Llama 4 Scout 109B MoE Meta | 68.00 ± 0.10 | 66.58 ± 0.11 | 66.58 ± 0.11 |
SEA-LION v4 (Qwen VL) 8B AISG | 67.39 ± 0.10 | 59.79 ± 0.22 | 59.79 ± 0.22 |
Qwen 3 VL 8B Alibaba | 66.92 ± 0.09 | 60.17 ± 0.18 | 60.17 ± 0.18 |
Gemma 3 12B | 66.22 ± 0.06 | 61.30 ± 0.00 | 61.30 ± 0.00 |
Gemma 4 (E4B) 8B | 66.08 ± 0.14 | 58.91 ± 0.45 | 58.91 ± 0.45 |
SEA-LION v3 (Gemma 2) 9B AISG | 64.52 ± 0.20 | 58.25 ± 0.50 | 58.25 ± 0.50 |
Qwen 3.5 35B MoE Alibaba | 64.52 ± 0.17 | 51.13 ± 0.71 | 51.13 ± 0.71 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 63.14 ± 0.18 | 68.58 ± 0.63 | 68.58 ± 0.63 |
Qwen 3.5 9B Alibaba | 61.69 ± 0.22 | 59.41 ± 0.58 | 59.41 ± 0.58 |
Mistral Small 4 119B MoE Mistral AI | 61.61 ± 0.19 | 61.59 ± 0.48 | 61.59 ± 0.48 |
SEA-LION v4 (Qwen VL) 4B AISG | 61.32 ± 0.10 | 50.30 ± 0.16 | 50.30 ± 0.16 |
Qwen 3 VL 4B Alibaba | 60.83 ± 0.12 | 48.81 ± 0.21 | 48.81 ± 0.21 |
MERaLiON 2 10B A*STAR | 58.36 ± 0.14 | 51.24 ± 0.51 | 51.24 ± 0.51 |
SEA-LION v3 (Llama) 8B AISG | 58.03 ± 0.16 | 51.73 ± 0.55 | 51.73 ± 0.55 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 57.41 ± 0.10 | 44.54 ± 0.34 | 44.54 ± 0.34 |
Gemma 4 (E2B) 5B | 56.55 ± 0.11 | 44.10 ± 0.32 | 44.10 ± 0.32 |
SEA-LION v4 (Gemma VL) 4B AISG | 56.39 ± 0.11 | 46.94 ± 0.20 | 46.94 ± 0.20 |
Gemma 3 4B | 54.52 ± 0.10 | 49.41 ± 0.13 | 49.41 ± 0.13 |
Qwen 3.5 4B Alibaba | 54.43 ± 0.22 | 50.74 ± 0.58 | 50.74 ± 0.58 |
Olmo 3.1 32B AI2 | 53.36 ± 0.14 | 46.81 ± 0.39 | 46.81 ± 0.39 |
Llama 3.1 8B Meta | 53.03 ± 0.18 | 41.85 ± 0.48 | 41.85 ± 0.48 |
GLM 4.7 Flash 30B MoE Z.ai | 50.35 ± 0.28 | 44.08 ± 0.76 | 44.08 ± 0.76 |
SEA-LION v4 (Apertus) 8B AISG | 47.36 ± 0.13 | 43.66 ± 0.38 | 43.66 ± 0.38 |
Apertus 8B Swiss AI | 38.87 ± 0.21 | 26.95 ± 0.91 | 26.95 ± 0.91 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 37.29 ± 0.25 | 34.11 ± 0.93 | 34.11 ± 0.93 |
MERaLiON 2 3B A*STAR | 37.24 ± 0.16 | 14.85 ± 0.56 | 14.85 ± 0.56 |
Tiny Aya Water 3B CohereLabs | 36.47 ± 0.26 | 34.87 ± 0.55 | 34.87 ± 0.55 |
Llama 3.2 3B Meta | 35.82 ± 0.23 | 14.26 ± 0.46 | 14.26 ± 0.46 |
Tiny Aya Global 3B CohereLabs | 34.82 ± 0.26 | 33.49 ± 0.68 | 33.49 ± 0.68 |

Model | ID (overall) | Linguistic Diagnostics | Syntax | Pragmatics | Syntax (LLM Judge) |
|---|---|---|---|---|---|
Gemma 4 31B | 77.36 ± 0.07 | 74.41 ± 0.22 | 67.33 ± 0.18 | 78.96 ± 0.46 | 76.93 ± 0.36 |
SEA-LION v4.5 (Qwen) 27B AISG | 74.51 ± 0.13 | 57.25 ± 0.43 | 46.12 ± 0.62 | 65.27 ± 0.87 | 60.37 ± 0.47 |
Qwen 3.5 27B Alibaba | 74.38 ± 0.14 | 60.11 ± 0.44 | 47.37 ± 0.80 | 67.44 ± 1.06 | 65.51 ± 0.51 |
Qwen 3.5 122B MoE Alibaba | 74.18 ± 0.12 | 62.18 ± 0.44 | 56.04 ± 0.69 | 65.67 ± 1.21 | 64.84 ± 0.41 |
Qwen 3.6 27B Alibaba | 73.00 ± 0.15 | 55.39 ± 0.53 | 44.79 ± 1.00 | 63.73 ± 1.35 | 57.64 ± 0.51 |
Gemma 4 26B MoE | 72.95 ± 0.08 | 60.06 ± 0.22 | 54.93 ± 0.38 | 56.25 ± 0.29 | 68.99 ± 0.41 |
Qwen 3 VL 32B Alibaba | 71.79 ± 0.08 | 60.43 ± 0.17 | 53.37 ± 0.29 | 69.18 ± 0.37 | 58.73 ± 0.33 |
SEA-LION v4 (Qwen) 32B AISG | 70.91 ± 0.09 | 56.88 ± 0.26 | 53.77 ± 0.30 | 59.66 ± 0.60 | 57.20 ± 0.30 |
SEA-LION v3 (Llama) 70B AISG | 70.23 ± 0.14 | 55.90 ± 0.48 | 41.00 ± 0.85 | 67.44 ± 0.86 | 59.26 ± 0.62 |
Llama 3.3 70B Meta | 69.68 ± 0.08 | 55.73 ± 0.18 | 36.88 ± 0.36 | 70.87 ± 0.48 | 59.44 ± 0.31 |
Mistral Medium 3.5 128B Mistral AI | 69.30 ± 0.14 | 50.20 ± 0.57 | 31.19 ± 0.67 | 57.68 ± 1.42 | 61.72 ± 0.46 |
Qwen 3.6 35B MoE Alibaba | 68.88 ± 0.13 | 53.90 ± 0.59 | 40.86 ± 0.85 | 66.02 ± 1.63 | 54.82 ± 0.50 |
Gemma 3 27B | 68.35 ± 0.09 | 55.64 ± 0.25 | 33.32 ± 0.47 | 70.79 ± 0.34 | 62.81 ± 0.36 |
SEA-LION v4 (Gemma) 27B AISG | 68.27 ± 0.10 | 53.95 ± 0.32 | 32.47 ± 0.55 | 67.57 ± 0.63 | 61.82 ± 0.48 |
Llama 4 Scout 109B MoE Meta | 68.00 ± 0.10 | 54.90 ± 0.23 | 37.86 ± 0.35 | 73.33 ± 0.36 | 53.51 ± 0.36 |
SEA-LION v4 (Qwen VL) 8B AISG | 67.39 ± 0.10 | 45.45 ± 0.24 | 29.00 ± 0.42 | 54.33 ± 0.69 | 53.01 ± 0.28 |
Qwen 3 VL 8B Alibaba | 66.92 ± 0.09 | 43.43 ± 0.23 | 25.40 ± 0.52 | 52.96 ± 0.47 | 51.93 ± 0.31 |
Gemma 3 12B | 66.22 ± 0.06 | 48.57 ± 0.11 | 38.42 ± 0.00 | 58.25 ± 0.00 | 49.04 ± 0.32 |
Gemma 4 (E4B) 8B | 66.08 ± 0.14 | 47.81 ± 0.43 | 31.91 ± 0.87 | 56.43 ± 0.65 | 55.10 ± 0.46 |
SEA-LION v3 (Gemma 2) 9B AISG | 64.52 ± 0.20 | 46.49 ± 0.36 | 25.44 ± 0.81 | 66.33 ± 0.44 | 47.71 ± 0.61 |
Qwen 3.5 35B MoE Alibaba | 64.52 ± 0.17 | 51.69 ± 0.51 | 41.11 ± 1.03 | 60.29 ± 1.04 | 53.67 ± 0.65 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 63.14 ± 0.18 | 41.25 ± 0.63 | 25.04 ± 1.12 | 45.40 ± 1.61 | 53.31 ± 0.66 |
Qwen 3.5 9B Alibaba | 61.69 ± 0.22 | 37.38 ± 0.80 | 21.21 ± 1.22 | 47.93 ± 1.85 | 43.00 ± 0.69 |
Mistral Small 4 119B MoE Mistral AI | 61.61 ± 0.19 | 44.85 ± 0.77 | 22.19 ± 1.24 | 55.96 ± 1.98 | 56.39 ± 0.72 |
SEA-LION v4 (Qwen VL) 4B AISG | 61.32 ± 0.10 | 39.89 ± 0.19 | 19.19 ± 0.28 | 56.46 ± 0.42 | 44.02 ± 0.26 |
Qwen 3 VL 4B Alibaba | 60.83 ± 0.12 | 43.56 ± 0.20 | 21.16 ± 0.38 | 61.81 ± 0.34 | 47.71 ± 0.21 |
MERaLiON 2 10B A*STAR | 58.36 ± 0.14 | 44.87 ± 0.54 | 30.11 ± 0.78 | 56.36 ± 0.87 | 48.15 ± 0.67 |
SEA-LION v3 (Llama) 8B AISG | 58.03 ± 0.16 | 28.00 ± 0.51 | 8.33 ± 1.23 | 31.92 ± 1.17 | 43.73 ± 0.34 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 57.41 ± 0.10 | 34.49 ± 0.18 | 11.58 ± 0.44 | 45.30 ± 0.28 | 46.61 ± 0.33 |
Gemma 4 (E2B) 5B | 56.55 ± 0.11 | 34.58 ± 0.30 | 11.53 ± 0.41 | 45.84 ± 0.61 | 46.37 ± 0.31 |
SEA-LION v4 (Gemma VL) 4B AISG | 56.39 ± 0.11 | 28.11 ± 0.25 | 14.12 ± 0.48 | 24.68 ± 0.43 | 45.51 ± 0.27 |
Gemma 3 4B | 54.52 ± 0.10 | 26.88 ± 0.25 | 16.21 ± 0.38 | 20.15 ± 0.34 | 44.28 ± 0.38 |
Qwen 3.5 4B Alibaba | 54.43 ± 0.22 | 29.44 ± 0.84 | 11.96 ± 1.44 | 39.75 ± 1.83 | 36.62 ± 0.81 |
Olmo 3.1 32B AI2 | 53.36 ± 0.14 | 27.53 ± 0.44 | 13.77 ± 1.01 | 21.64 ± 0.92 | 47.18 ± 0.44 |
Llama 3.1 8B Meta | 53.03 ± 0.18 | 28.64 ± 0.72 | 10.30 ± 1.30 | 31.23 ± 1.24 | 44.40 ± 0.52 |
GLM 4.7 Flash 30B MoE Z.ai | 50.35 ± 0.28 | 25.93 ± 0.63 | 8.79 ± 1.29 | 36.93 ± 1.48 | 32.09 ± 0.68 |
SEA-LION v4 (Apertus) 8B AISG | 47.36 ± 0.13 | 27.69 ± 0.49 | 21.56 ± 1.26 | 23.87 ± 0.92 | 37.65 ± 0.36 |
Apertus 8B Swiss AI | 38.87 ± 0.21 | 21.49 ± 0.61 | 9.46 ± 1.66 | 18.71 ± 1.10 | 36.31 ± 0.58 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 37.29 ± 0.25 | 15.66 ± 0.52 | 0.02 ± 0.03 | 15.81 ± 1.63 | 31.16 ± 0.54 |
MERaLiON 2 3B A*STAR | 37.24 ± 0.16 | 17.26 ± 0.39 | 0.00 ± 0.00 | 18.65 ± 0.80 | 33.12 ± 0.74 |
Tiny Aya Water 3B CohereLabs | 36.47 ± 0.26 | 13.74 ± 0.76 | 3.84 ± 1.40 | 17.94 ± 1.50 | 19.43 ± 0.54 |
Llama 3.2 3B Meta | 35.82 ± 0.23 | 19.27 ± 0.43 | 2.58 ± 0.85 | 18.34 ± 0.81 | 36.88 ± 0.54 |
Tiny Aya Global 3B CohereLabs | 34.82 ± 0.26 | 13.02 ± 0.69 | 3.47 ± 1.13 | 19.07 ± 1.52 | 16.52 ± 0.63 |

Model | ID (overall) | Multi-Turn Chat | SEA-MT-Bench (LLM Judge) |
|---|---|---|---|
Gemma 4 31B | 77.36 ± 0.07 | 85.29 ± 0.28 | 85.29 ± 0.28 |
SEA-LION v4.5 (Qwen) 27B AISG | 74.51 ± 0.13 | 85.05 ± 0.43 | 85.05 ± 0.43 |
Qwen 3.5 27B Alibaba | 74.38 ± 0.14 | 84.74 ± 0.34 | 84.74 ± 0.34 |
Qwen 3.5 122B MoE Alibaba | 74.18 ± 0.12 | 85.47 ± 0.38 | 85.47 ± 0.38 |
Qwen 3.6 27B Alibaba | 73.00 ± 0.15 | 85.97 ± 0.33 | 85.97 ± 0.33 |
Gemma 4 26B MoE | 72.95 ± 0.08 | 84.38 ± 0.36 | 84.38 ± 0.36 |
Qwen 3 VL 32B Alibaba | 71.79 ± 0.08 | 81.32 ± 0.37 | 81.32 ± 0.37 |
SEA-LION v4 (Qwen) 32B AISG | 70.91 ± 0.09 | 79.27 ± 0.47 | 79.27 ± 0.47 |
SEA-LION v3 (Llama) 70B AISG | 70.23 ± 0.14 | 72.93 ± 0.45 | 72.93 ± 0.45 |
Llama 3.3 70B Meta | 69.68 ± 0.08 | 65.50 ± 0.54 | 65.50 ± 0.54 |
Mistral Medium 3.5 128B Mistral AI | 69.30 ± 0.14 | 82.85 ± 0.51 | 82.85 ± 0.51 |
Qwen 3.6 35B MoE Alibaba | 68.88 ± 0.13 | 84.41 ± 0.39 | 84.41 ± 0.39 |
Gemma 3 27B | 68.35 ± 0.09 | 77.49 ± 0.37 | 77.49 ± 0.37 |
SEA-LION v4 (Gemma) 27B AISG | 68.27 ± 0.10 | 78.13 ± 0.26 | 78.13 ± 0.26 |
Llama 4 Scout 109B MoE Meta | 68.00 ± 0.10 | 69.89 ± 0.54 | 69.89 ± 0.54 |
SEA-LION v4 (Qwen VL) 8B AISG | 67.39 ± 0.10 | 77.72 ± 0.41 | 77.72 ± 0.41 |
Qwen 3 VL 8B Alibaba | 66.92 ± 0.09 | 79.89 ± 0.35 | 79.89 ± 0.35 |
Gemma 3 12B | 66.22 ± 0.06 | 74.61 ± 0.45 | 74.61 ± 0.45 |
Gemma 4 (E4B) 8B | 66.08 ± 0.14 | 77.43 ± 0.43 | 77.43 ± 0.43 |
SEA-LION v3 (Gemma 2) 9B AISG | 64.52 ± 0.20 | 67.07 ± 0.60 | 67.07 ± 0.60 |
Qwen 3.5 35B MoE Alibaba | 64.52 ± 0.17 | 84.79 ± 0.38 | 84.79 ± 0.38 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 63.14 ± 0.18 | 84.77 ± 0.43 | 84.77 ± 0.43 |
Qwen 3.5 9B Alibaba | 61.69 ± 0.22 | 79.07 ± 0.42 | 79.07 ± 0.42 |
Mistral Small 4 119B MoE Mistral AI | 61.61 ± 0.19 | 76.99 ± 0.57 | 76.99 ± 0.57 |
SEA-LION v4 (Qwen VL) 4B AISG | 61.32 ± 0.10 | 74.48 ± 0.56 | 74.48 ± 0.56 |
Qwen 3 VL 4B Alibaba | 60.83 ± 0.12 | 73.90 ± 0.50 | 73.90 ± 0.50 |
MERaLiON 2 10B A*STAR | 58.36 ± 0.14 | 54.82 ± 0.42 | 54.82 ± 0.42 |
SEA-LION v3 (Llama) 8B AISG | 58.03 ± 0.16 | 62.76 ± 0.52 | 62.76 ± 0.52 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 57.41 ± 0.10 | 73.21 ± 0.47 | 73.21 ± 0.47 |
Gemma 4 (E2B) 5B | 56.55 ± 0.11 | 74.03 ± 0.43 | 74.03 ± 0.43 |
SEA-LION v4 (Gemma VL) 4B AISG | 56.39 ± 0.11 | 68.46 ± 0.41 | 68.46 ± 0.41 |
Gemma 3 4B | 54.52 ± 0.10 | 67.53 ± 0.34 | 67.53 ± 0.34 |
Qwen 3.5 4B Alibaba | 54.43 ± 0.22 | 68.55 ± 0.63 | 68.55 ± 0.63 |
Olmo 3.1 32B AI2 | 53.36 ± 0.14 | 71.82 ± 0.60 | 71.82 ± 0.60 |
Llama 3.1 8B Meta | 53.03 ± 0.18 | 50.00 ± 0.52 | 50.00 ± 0.52 |
GLM 4.7 Flash 30B MoE Z.ai | 50.35 ± 0.28 | 57.62 ± 0.60 | 57.62 ± 0.60 |
SEA-LION v4 (Apertus) 8B AISG | 47.36 ± 0.13 | 50.07 ± 0.63 | 50.07 ± 0.63 |
Apertus 8B Swiss AI | 38.87 ± 0.21 | 43.92 ± 0.48 | 43.92 ± 0.48 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 37.29 ± 0.25 | 66.11 ± 0.61 | 66.11 ± 0.61 |
MERaLiON 2 3B A*STAR | 37.24 ± 0.16 | 31.08 ± 0.72 | 31.08 ± 0.72 |
Tiny Aya Water 3B CohereLabs | 36.47 ± 0.26 | 30.65 ± 0.74 | 30.65 ± 0.74 |
Llama 3.2 3B Meta | 35.82 ± 0.23 | 44.57 ± 0.51 | 44.57 ± 0.51 |
Tiny Aya Global 3B CohereLabs | 34.82 ± 0.26 | 29.68 ± 0.55 | 29.68 ± 0.55 |

Model | ID (overall) | NLG | Summarization | Translations |
|---|---|---|---|---|
Gemma 4 31B | 77.36 ± 0.07 | 56.12 ± 0.06 | 18.49 ± 0.11 | 93.76 ± 0.01 |
SEA-LION v4.5 (Qwen) 27B AISG | 74.51 ± 0.13 | 54.87 ± 0.06 | 16.43 ± 0.12 | 93.31 ± 0.02 |
Qwen 3.5 27B Alibaba | 74.38 ± 0.14 | 54.86 ± 0.06 | 16.34 ± 0.13 | 93.39 ± 0.02 |
Qwen 3.5 122B MoE Alibaba | 74.18 ± 0.12 | 54.72 ± 0.05 | 15.69 ± 0.09 | 93.74 ± 0.01 |
Qwen 3.6 27B Alibaba | 73.00 ± 0.15 | 54.38 ± 0.06 | 15.70 ± 0.12 | 93.06 ± 0.02 |
Gemma 4 26B MoE | 72.95 ± 0.08 | 56.06 ± 0.06 | 18.70 ± 0.12 | 93.43 ± 0.01 |
Qwen 3 VL 32B Alibaba | 71.79 ± 0.08 | 54.35 ± 0.03 | 15.73 ± 0.06 | 92.97 ± 0.01 |
SEA-LION v4 (Qwen) 32B AISG | 70.91 ± 0.09 | 54.87 ± 0.03 | 17.05 ± 0.07 | 92.69 ± 0.01 |
SEA-LION v3 (Llama) 70B AISG | 70.23 ± 0.14 | 56.00 ± 0.07 | 19.10 ± 0.13 | 92.90 ± 0.02 |
Llama 3.3 70B Meta | 69.68 ± 0.08 | 55.28 ± 0.06 | 18.74 ± 0.12 | 91.82 ± 0.01 |
Mistral Medium 3.5 128B Mistral AI | 69.30 ± 0.14 | 54.74 ± 0.05 | 16.79 ± 0.10 | 92.68 ± 0.03 |
Qwen 3.6 35B MoE Alibaba | 68.88 ± 0.13 | 54.61 ± 0.06 | 16.96 ± 0.12 | 92.25 ± 0.02 |
Gemma 3 27B | 68.35 ± 0.09 | 54.95 ± 0.04 | 16.31 ± 0.07 | 93.59 ± 0.01 |
SEA-LION v4 (Gemma) 27B AISG | 68.27 ± 0.10 | 54.82 ± 0.04 | 16.11 ± 0.09 | 93.53 ± 0.01 |
Llama 4 Scout 109B MoE Meta | 68.00 ± 0.10 | 55.38 ± 0.05 | 18.18 ± 0.10 | 92.58 ± 0.01 |
SEA-LION v4 (Qwen VL) 8B AISG | 67.39 ± 0.10 | 54.08 ± 0.03 | 16.28 ± 0.06 | 91.87 ± 0.01 |
Qwen 3 VL 8B Alibaba | 66.92 ± 0.09 | 53.60 ± 0.04 | 15.58 ± 0.08 | 91.62 ± 0.02 |
Gemma 3 12B | 66.22 ± 0.06 | 54.37 ± 0.00 | 15.48 ± 0.00 | 93.25 ± 0.00 |
Gemma 4 (E4B) 8B | 66.08 ± 0.14 | 54.41 ± 0.08 | 16.35 ± 0.15 | 92.46 ± 0.02 |
SEA-LION v3 (Gemma 2) 9B AISG | 64.52 ± 0.20 | 55.46 ± 0.07 | 18.27 ± 0.13 | 92.65 ± 0.02 |
Qwen 3.5 35B MoE Alibaba | 64.52 ± 0.17 | 54.16 ± 0.06 | 16.27 ± 0.12 | 92.05 ± 0.03 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 63.14 ± 0.18 | 51.93 ± 0.05 | 14.26 ± 0.09 | 89.61 ± 0.04 |
Qwen 3.5 9B Alibaba | 61.69 ± 0.22 | 52.39 ± 0.07 | 15.23 ± 0.11 | 89.55 ± 0.07 |
Mistral Small 4 119B MoE Mistral AI | 61.61 ± 0.19 | 54.01 ± 0.07 | 15.35 ± 0.14 | 92.67 ± 0.03 |
SEA-LION v4 (Qwen VL) 4B AISG | 61.32 ± 0.10 | 52.86 ± 0.04 | 15.55 ± 0.07 | 90.17 ± 0.03 |
Qwen 3 VL 4B Alibaba | 60.83 ± 0.12 | 52.46 ± 0.04 | 14.99 ± 0.08 | 89.93 ± 0.02 |
MERaLiON 2 10B A*STAR | 58.36 ± 0.14 | 55.31 ± 0.07 | 19.26 ± 0.14 | 91.36 ± 0.03 |
SEA-LION v3 (Llama) 8B AISG | 58.03 ± 0.16 | 54.48 ± 0.06 | 17.10 ± 0.11 | 91.87 ± 0.02 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 57.41 ± 0.10 | 53.80 ± 0.06 | 16.48 ± 0.11 | 91.12 ± 0.01 |
Gemma 4 (E2B) 5B | 56.55 ± 0.11 | 53.43 ± 0.06 | 15.81 ± 0.11 | 91.04 ± 0.01 |
SEA-LION v4 (Gemma VL) 4B AISG | 56.39 ± 0.11 | 53.71 ± 0.05 | 15.71 ± 0.10 | 91.71 ± 0.02 |
Gemma 3 4B | 54.52 ± 0.10 | 53.45 ± 0.07 | 15.18 ± 0.13 | 91.73 ± 0.02 |
Qwen 3.5 4B Alibaba | 54.43 ± 0.22 | 49.20 ± 0.07 | 14.46 ± 0.13 | 83.93 ± 0.11 |
Olmo 3.1 32B AI2 | 53.36 ± 0.14 | 51.64 ± 0.05 | 15.06 ± 0.08 | 88.22 ± 0.04 |
Llama 3.1 8B Meta | 53.03 ± 0.18 | 53.98 ± 0.07 | 17.83 ± 0.15 | 90.13 ± 0.03 |
GLM 4.7 Flash 30B MoE Z.ai | 50.35 ± 0.28 | 51.00 ± 0.12 | 15.92 ± 0.19 | 86.08 ± 0.10 |
SEA-LION v4 (Apertus) 8B AISG | 47.36 ± 0.13 | 54.18 ± 0.04 | 17.01 ± 0.08 | 91.34 ± 0.03 |
Apertus 8B Swiss AI | 38.87 ± 0.21 | 53.30 ± 0.09 | 17.05 ± 0.16 | 89.56 ± 0.07 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 37.29 ± 0.25 | 46.16 ± 0.08 | 13.82 ± 0.15 | 78.50 ± 0.10 |
MERaLiON 2 3B A*STAR | 37.24 ± 0.16 | 47.53 ± 0.10 | 15.36 ± 0.17 | 79.70 ± 0.11 |
Tiny Aya Water 3B CohereLabs | 36.47 ± 0.26 | 40.73 ± 0.14 | 12.64 ± 0.16 | 68.81 ± 0.23 |
Llama 3.2 3B Meta | 35.82 ± 0.23 | 51.08 ± 0.06 | 16.97 ± 0.11 | 85.19 ± 0.06 |
Tiny Aya Global 3B CohereLabs | 34.82 ± 0.26 | 40.97 ± 0.16 | 12.13 ± 0.24 | 69.81 ± 0.20 |

Model | ID (overall) | NLR | Causal Reasoning | Natural Language Inference |
|---|---|---|---|---|
Gemma 4 31B | 77.36 ± 0.07 | 89.03 ± 0.04 | 95.71 ± 0.06 | 82.36 ± 0.06 |
SEA-LION v4.5 (Qwen) 27B AISG | 74.51 ± 0.13 | 88.34 ± 0.19 | 94.59 ± 0.23 | 82.09 ± 0.29 |
Qwen 3.5 27B Alibaba | 74.38 ± 0.14 | 86.80 ± 0.19 | 93.15 ± 0.23 | 80.45 ± 0.29 |
Qwen 3.5 122B MoE Alibaba | 74.18 ± 0.12 | 87.87 ± 0.15 | 93.89 ± 0.26 | 81.85 ± 0.17 |
Qwen 3.6 27B Alibaba | 73.00 ± 0.15 | 87.26 ± 0.28 | 93.49 ± 0.37 | 81.03 ± 0.35 |
Gemma 4 26B MoE | 72.95 ± 0.08 | 85.92 ± 0.11 | 92.55 ± 0.19 | 79.29 ± 0.10 |
Qwen 3 VL 32B Alibaba | 71.79 ± 0.08 | 87.41 ± 0.09 | 93.51 ± 0.14 | 81.32 ± 0.16 |
SEA-LION v4 (Qwen) 32B AISG | 70.91 ± 0.09 | 85.10 ± 0.07 | 91.85 ± 0.14 | 78.35 ± 0.07 |
SEA-LION v3 (Llama) 70B AISG | 70.23 ± 0.14 | 85.20 ± 0.23 | 92.03 ± 0.32 | 78.38 ± 0.30 |
Llama 3.3 70B Meta | 69.68 ± 0.08 | 85.58 ± 0.11 | 93.08 ± 0.11 | 78.08 ± 0.16 |
Mistral Medium 3.5 128B Mistral AI | 69.30 ± 0.14 | 78.96 ± 0.21 | 91.67 ± 0.33 | 66.25 ± 0.36 |
Qwen 3.6 35B MoE Alibaba | 68.88 ± 0.13 | 83.80 ± 0.32 | 91.83 ± 0.33 | 75.77 ± 0.45 |
Gemma 3 27B | 68.35 ± 0.09 | 81.80 ± 0.05 | 90.23 ± 0.07 | 73.37 ± 0.10 |
SEA-LION v4 (Gemma) 27B AISG | 68.27 ± 0.10 | 81.91 ± 0.11 | 90.11 ± 0.17 | 73.72 ± 0.13 |
Llama 4 Scout 109B MoE Meta | 68.00 ± 0.10 | 75.01 ± 0.06 | 90.43 ± 0.07 | 59.59 ± 0.12 |
SEA-LION v4 (Qwen VL) 8B AISG | 67.39 ± 0.10 | 80.52 ± 0.08 | 88.25 ± 0.12 | 72.79 ± 0.12 |
Qwen 3 VL 8B Alibaba | 66.92 ± 0.09 | 79.93 ± 0.08 | 88.15 ± 0.13 | 71.72 ± 0.10 |
Gemma 3 12B | 66.22 ± 0.06 | 81.97 ± 0.00 | 89.20 ± 0.00 | 74.73 ± 0.00 |
Gemma 4 (E4B) 8B | 66.08 ± 0.14 | 79.98 ± 0.18 | 86.75 ± 0.34 | 73.22 ± 0.18 |
SEA-LION v3 (Gemma 2) 9B AISG | 64.52 ± 0.20 | 81.53 ± 0.14 | 91.51 ± 0.19 | 71.56 ± 0.26 |
Qwen 3.5 35B MoE Alibaba | 64.52 ± 0.17 | 76.55 ± 0.32 | 80.07 ± 0.53 | 73.03 ± 0.30 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 63.14 ± 0.18 | 70.84 ± 0.47 | 72.48 ± 0.88 | 69.20 ± 0.45 |
Qwen 3.5 9B Alibaba | 61.69 ± 0.22 | 63.25 ± 0.43 | 83.08 ± 0.49 | 43.42 ± 0.56 |
Mistral Small 4 119B MoE Mistral AI | 61.61 ± 0.19 | 70.00 ± 0.42 | 81.57 ± 0.57 | 58.42 ± 0.52 |
SEA-LION v4 (Qwen VL) 4B AISG | 61.32 ± 0.10 | 75.64 ± 0.09 | 78.79 ± 0.20 | 72.50 ± 0.09 |
Qwen 3 VL 4B Alibaba | 60.83 ± 0.12 | 74.76 ± 0.11 | 78.79 ± 0.20 | 70.74 ± 0.21 |
MERaLiON 2 10B A*STAR | 58.36 ± 0.14 | 73.01 ± 0.23 | 87.27 ± 0.29 | 58.76 ± 0.35 |
SEA-LION v3 (Llama) 8B AISG | 58.03 ± 0.16 | 72.28 ± 0.36 | 81.49 ± 0.45 | 63.07 ± 0.54 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 57.41 ± 0.10 | 60.65 ± 0.19 | 67.05 ± 0.31 | 54.25 ± 0.18 |
Gemma 4 (E2B) 5B | 56.55 ± 0.11 | 59.09 ± 0.17 | 67.11 ± 0.30 | 51.08 ± 0.20 |
SEA-LION v4 (Gemma VL) 4B AISG | 56.39 ± 0.11 | 65.24 ± 0.10 | 71.80 ± 0.18 | 58.69 ± 0.10 |
Gemma 3 4B | 54.52 ± 0.10 | 60.98 ± 0.09 | 66.52 ± 0.14 | 55.44 ± 0.11 |
Qwen 3.5 4B Alibaba | 54.43 ± 0.22 | 54.73 ± 0.60 | 74.91 ± 0.81 | 34.56 ± 0.71 |
Olmo 3.1 32B AI2 | 53.36 ± 0.14 | 66.99 ± 0.24 | 67.32 ± 0.36 | 66.65 ± 0.35 |
Llama 3.1 8B Meta | 53.03 ± 0.18 | 64.79 ± 0.34 | 75.97 ± 0.50 | 53.61 ± 0.42 |
GLM 4.7 Flash 30B MoE Z.ai | 50.35 ± 0.28 | 37.49 ± 0.76 | 50.93 ± 1.30 | 24.05 ± 0.85 |
SEA-LION v4 (Apertus) 8B AISG | 47.36 ± 0.13 | 38.36 ± 0.34 | 65.20 ± 0.59 | 11.53 ± 0.29 |
Apertus 8B Swiss AI | 38.87 ± 0.21 | 23.37 ± 0.73 | 39.71 ± 1.36 | 7.03 ± 0.63 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 37.29 ± 0.25 | 15.76 ± 0.72 | 20.61 ± 1.23 | 10.90 ± 0.84 |
MERaLiON 2 3B A*STAR | 37.24 ± 0.16 | 43.17 ± 0.35 | 61.93 ± 0.62 | 24.41 ± 0.55 |
Tiny Aya Water 3B CohereLabs | 36.47 ± 0.26 | 30.22 ± 0.53 | 49.83 ± 0.94 | 10.62 ± 0.90 |
Llama 3.2 3B Meta | 35.82 ± 0.23 | 16.54 ± 0.59 | 25.36 ± 1.06 | 7.71 ± 0.36 |
Tiny Aya Global 3B CohereLabs | 34.82 ± 0.26 | 26.85 ± 0.57 | 44.13 ± 1.11 | 9.56 ± 0.94 |

Model | ID (overall) | NLU | Metaphor Understanding | Question Answering | Sentiment Analysis |
|---|---|---|---|---|---|
Gemma 4 31B | 77.36 ± 0.07 | 81.98 ± 0.06 | 83.29 ± 0.12 | 83.81 ± 0.11 | 78.84 ± 0.09 |
SEA-LION v4.5 (Qwen) 27B AISG | 74.51 ± 0.13 | 79.84 ± 0.20 | 78.15 ± 0.43 | 85.64 ± 0.28 | 75.74 ± 0.26 |
Qwen 3.5 27B Alibaba | 74.38 ± 0.14 | 77.43 ± 0.21 | 75.98 ± 0.45 | 83.52 ± 0.38 | 72.79 ± 0.25 |
Qwen 3.5 122B MoE Alibaba | 74.18 ± 0.12 | 78.07 ± 0.23 | 76.50 ± 0.35 | 82.49 ± 0.33 | 75.21 ± 0.30 |
Qwen 3.6 27B Alibaba | 73.00 ± 0.15 | 79.83 ± 0.25 | 78.40 ± 0.58 | 82.70 ± 0.34 | 78.39 ± 0.35 |
Gemma 4 26B MoE | 72.95 ± 0.08 | 79.00 ± 0.13 | 75.80 ± 0.30 | 80.63 ± 0.21 | 80.56 ± 0.12 |
Qwen 3 VL 32B Alibaba | 71.79 ± 0.08 | 77.40 ± 0.10 | 75.00 ± 0.19 | 76.54 ± 0.18 | 80.67 ± 0.08 |
SEA-LION v4 (Qwen) 32B AISG | 70.91 ± 0.09 | 79.42 ± 0.09 | 78.46 ± 0.18 | 78.15 ± 0.18 | 81.64 ± 0.08 |
SEA-LION v3 (Llama) 70B AISG | 70.23 ± 0.14 | 75.96 ± 0.29 | 72.95 ± 0.45 | 79.06 ± 0.36 | 75.87 ± 0.48 |
Llama 3.3 70B Meta | 69.68 ± 0.08 | 76.22 ± 0.10 | 75.51 ± 0.21 | 79.52 ± 0.18 | 73.62 ± 0.16 |
Mistral Medium 3.5 128B Mistral AI | 69.30 ± 0.14 | 74.98 ± 0.21 | 71.80 ± 0.50 | 80.08 ± 0.33 | 73.05 ± 0.19 |
Qwen 3.6 35B MoE Alibaba | 68.88 ± 0.13 | 75.80 ± 0.32 | 68.17 ± 0.58 | 82.83 ± 0.51 | 76.41 ± 0.46 |
Gemma 3 27B | 68.35 ± 0.09 | 75.24 ± 0.11 | 69.04 ± 0.21 | 82.02 ± 0.24 | 74.66 ± 0.23 |
SEA-LION v4 (Gemma) 27B AISG | 68.27 ± 0.10 | 75.09 ± 0.15 | 69.18 ± 0.25 | 81.80 ± 0.27 | 74.30 ± 0.23 |
Llama 4 Scout 109B MoE Meta | 68.00 ± 0.10 | 76.12 ± 0.09 | 72.38 ± 0.12 | 83.94 ± 0.20 | 72.06 ± 0.12 |
SEA-LION v4 (Qwen VL) 8B AISG | 67.39 ± 0.10 | 73.33 ± 0.09 | 65.87 ± 0.22 | 81.12 ± 0.09 | 73.00 ± 0.13 |
Qwen 3 VL 8B Alibaba | 66.92 ± 0.09 | 74.01 ± 0.08 | 68.40 ± 0.14 | 80.64 ± 0.15 | 73.00 ± 0.11 |
Gemma 3 12B | 66.22 ± 0.06 | 75.53 ± 0.08 | 70.11 ± 0.00 | 81.96 ± 0.00 | 74.51 ± 0.24 |
Gemma 4 (E4B) 8B | 66.08 ± 0.14 | 73.39 ± 0.24 | 67.76 ± 0.33 | 75.43 ± 0.57 | 76.96 ± 0.37 |
SEA-LION v3 (Gemma 2) 9B AISG | 64.52 ± 0.20 | 75.06 ± 0.22 | 72.32 ± 0.41 | 79.35 ± 0.38 | 73.52 ± 0.28 |
Qwen 3.5 35B MoE Alibaba | 64.52 ± 0.17 | 73.10 ± 0.32 | 64.04 ± 0.80 | 80.55 ± 0.50 | 74.72 ± 0.48 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 63.14 ± 0.18 | 70.70 ± 0.33 | 63.58 ± 0.83 | 74.68 ± 0.53 | 73.84 ± 0.39 |
Qwen 3.5 9B Alibaba | 61.69 ± 0.22 | 71.16 ± 0.40 | 64.35 ± 0.84 | 73.41 ± 0.52 | 75.73 ± 0.54 |
Mistral Small 4 119B MoE Mistral AI | 61.61 ± 0.19 | 75.46 ± 0.30 | 66.85 ± 0.57 | 77.78 ± 0.54 | 81.75 ± 0.35 |
SEA-LION v4 (Qwen VL) 4B AISG | 61.32 ± 0.10 | 71.20 ± 0.10 | 63.59 ± 0.19 | 76.07 ± 0.16 | 73.94 ± 0.16 |
Qwen 3 VL 4B Alibaba | 60.83 ± 0.12 | 69.18 ± 0.11 | 57.00 ± 0.23 | 77.02 ± 0.26 | 73.53 ± 0.07 |
MERaLiON 2 10B A*STAR | 58.36 ± 0.14 | 71.15 ± 0.25 | 73.06 ± 0.40 | 66.15 ± 0.52 | 74.25 ± 0.30 |
SEA-LION v3 (Llama) 8B AISG | 58.03 ± 0.16 | 67.74 ± 0.38 | 60.93 ± 0.77 | 72.40 ± 0.50 | 69.90 ± 0.50 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 57.41 ± 0.10 | 64.11 ± 0.21 | 59.02 ± 0.44 | 60.92 ± 0.21 | 72.39 ± 0.31 |
Gemma 4 (E2B) 5B | 56.55 ± 0.11 | 63.82 ± 0.20 | 58.57 ± 0.40 | 60.70 ± 0.22 | 72.18 ± 0.21 |
SEA-LION v4 (Gemma VL) 4B AISG | 56.39 ± 0.11 | 68.22 ± 0.10 | 54.86 ± 0.15 | 76.07 ± 0.20 | 73.71 ± 0.20 |
Gemma 3 4B | 54.52 ± 0.10 | 67.17 ± 0.13 | 54.96 ± 0.26 | 71.61 ± 0.34 | 74.95 ± 0.14 |
Qwen 3.5 4B Alibaba | 54.43 ± 0.22 | 66.96 ± 0.48 | 57.46 ± 0.98 | 69.76 ± 0.53 | 73.66 ± 0.57 |
Olmo 3.1 32B AI2 | 53.36 ± 0.14 | 62.86 ± 0.22 | 50.41 ± 0.55 | 64.48 ± 0.38 | 73.71 ± 0.29 |
Llama 3.1 8B Meta | 53.03 ± 0.18 | 67.92 ± 0.29 | 60.55 ± 0.57 | 77.27 ± 0.34 | 65.93 ± 0.51 |
GLM 4.7 Flash 30B MoE Z.ai | 50.35 ± 0.28 | 63.92 ± 0.52 | 47.21 ± 1.25 | 75.19 ± 0.64 | 69.36 ± 0.61 |
SEA-LION v4 (Apertus) 8B AISG | 47.36 ± 0.13 | 60.92 ± 0.25 | 48.39 ± 0.55 | 78.68 ± 0.42 | 55.70 ± 0.21 |
Apertus 8B Swiss AI | 38.87 ± 0.21 | 46.63 ± 0.53 | 22.52 ± 1.61 | 72.18 ± 0.77 | 45.18 ± 0.70 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 37.29 ± 0.25 | 38.55 ± 0.72 | 14.63 ± 1.65 | 64.68 ± 0.89 | 36.34 ± 0.75 |
MERaLiON 2 3B A*STAR | 37.24 ± 0.16 | 58.36 ± 0.33 | 49.48 ± 0.55 | 58.04 ± 0.67 | 67.54 ± 0.72 |
Tiny Aya Water 3B CohereLabs | 36.47 ± 0.26 | 46.48 ± 0.72 | 36.37 ± 1.62 | 59.77 ± 0.91 | 43.31 ± 0.98 |
Llama 3.2 3B Meta | 35.82 ± 0.23 | 50.34 ± 0.52 | 16.74 ± 1.52 | 65.39 ± 0.39 | 68.89 ± 0.58 |
Tiny Aya Global 3B CohereLabs | 34.82 ± 0.26 | 47.51 ± 0.66 | 32.07 ± 1.54 | 59.60 ± 0.93 | 50.84 ± 0.93 |

Model | ID | Safety | Toxicity Detection |
|---|---|---|---|
Gemma 4 31B | 77.36 ± 0.07 | 57.30 ± 0.17 | 57.30 ± 0.17 |
SEA-LION v4.5 (Qwen) 27B AISG | 74.51 ± 0.13 | 59.76 ± 0.37 | 59.76 ± 0.37 |
Qwen 3.5 27B Alibaba | 74.38 ± 0.14 | 56.82 ± 0.62 | 56.82 ± 0.62 |
Qwen 3.5 122B MoE Alibaba | 74.18 ± 0.12 | 56.50 ± 0.50 | 56.50 ± 0.50 |
Qwen 3.6 27B Alibaba | 73.00 ± 0.15 | 53.62 ± 0.69 | 53.62 ± 0.69 |
Gemma 4 26B MoE | 72.95 ± 0.08 | 51.71 ± 0.22 | 51.71 ± 0.22 |
Qwen 3 VL 32B Alibaba | 71.79 ± 0.08 | 53.55 ± 0.19 | 53.55 ± 0.19 |
SEA-LION v4 (Qwen) 32B AISG | 70.91 ± 0.09 | 52.20 ± 0.17 | 52.20 ± 0.17 |
SEA-LION v3 (Llama) 70B AISG | 70.23 ± 0.14 | 48.39 ± 0.60 | 48.39 ± 0.60 |
Llama 3.3 70B Meta | 69.68 ± 0.08 | 51.58 ± 0.30 | 51.58 ± 0.30 |
Mistral Medium 3.5 128B Mistral AI | 69.30 ± 0.14 | 52.66 ± 0.44 | 52.66 ± 0.44 |
Qwen 3.6 35B MoE Alibaba | 68.88 ± 0.13 | 37.22 ± 0.54 | 37.22 ± 0.54 |
Gemma 3 27B | 68.35 ± 0.09 | 47.56 ± 0.13 | 47.56 ± 0.13 |
SEA-LION v4 (Gemma) 27B AISG | 68.27 ± 0.10 | 48.25 ± 0.25 | 48.25 ± 0.25 |
Llama 4 Scout 109B MoE Meta | 68.00 ± 0.10 | 53.87 ± 0.17 | 53.87 ± 0.17 |
SEA-LION v4 (Qwen VL) 8B AISG | 67.39 ± 0.10 | 54.60 ± 0.26 | 54.60 ± 0.26 |
Qwen 3 VL 8B Alibaba | 66.92 ± 0.09 | 52.90 ± 0.25 | 52.90 ± 0.25 |
Gemma 3 12B | 66.22 ± 0.06 | 41.08 ± 0.00 | 41.08 ± 0.00 |
Gemma 4 (E4B) 8B | 66.08 ± 0.14 | 47.46 ± 0.32 | 47.46 ± 0.32 |
SEA-LION v3 (Gemma 2) 9B AISG | 64.52 ± 0.20 | 42.53 ± 0.61 | 42.53 ± 0.61 |
Qwen 3.5 35B MoE Alibaba | 64.52 ± 0.17 | 34.74 ± 0.81 | 34.74 ± 0.81 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 63.14 ± 0.18 | 37.27 ± 0.69 | 37.27 ± 0.69 |
Qwen 3.5 9B Alibaba | 61.69 ± 0.22 | 44.97 ± 0.72 | 44.97 ± 0.72 |
Mistral Small 4 119B MoE Mistral AI | 61.61 ± 0.19 | 32.81 ± 0.45 | 32.81 ± 0.45 |
SEA-LION v4 (Qwen VL) 4B AISG | 61.32 ± 0.10 | 36.15 ± 0.19 | 36.15 ± 0.19 |
Qwen 3 VL 4B Alibaba | 60.83 ± 0.12 | 34.97 ± 0.19 | 34.97 ± 0.19 |
MERaLiON 2 10B A*STAR | 58.36 ± 0.14 | 44.33 ± 0.50 | 44.33 ± 0.50 |
SEA-LION v3 (Llama) 8B AISG | 58.03 ± 0.16 | 41.07 ± 0.66 | 41.07 ± 0.66 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 57.41 ± 0.10 | 39.54 ± 0.24 | 39.54 ± 0.24 |
Gemma 4 (E2B) 5B | 56.55 ± 0.11 | 38.60 ± 0.20 | 38.60 ± 0.20 |
SEA-LION v4 (Gemma VL) 4B AISG | 56.39 ± 0.11 | 33.14 ± 0.15 | 33.14 ± 0.15 |
Gemma 3 4B | 54.52 ± 0.10 | 25.08 ± 0.17 | 25.08 ± 0.17 |
Qwen 3.5 4B Alibaba | 54.43 ± 0.22 | 35.30 ± 0.66 | 35.30 ± 0.66 |
Olmo 3.1 32B AI2 | 53.36 ± 0.14 | 16.65 ± 0.35 | 16.65 ± 0.35 |
Llama 3.1 8B Meta | 53.03 ± 0.18 | 39.40 ± 0.52 | 39.40 ± 0.52 |
GLM 4.7 Flash 30B MoE Z.ai | 50.35 ± 0.28 | 46.83 ± 0.52 | 46.83 ± 0.52 |
SEA-LION v4 (Apertus) 8B AISG | 47.36 ± 0.13 | 32.32 ± 0.31 | 32.32 ± 0.31 |
Apertus 8B Swiss AI | 38.87 ± 0.21 | 21.34 ± 0.66 | 21.34 ± 0.66 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 37.29 ± 0.25 | 8.51 ± 0.74 | 8.51 ± 0.74 |
MERaLiON 2 3B A*STAR | 37.24 ± 0.16 | 32.99 ± 0.40 | 32.99 ± 0.40 |
Tiny Aya Water 3B CohereLabs | 36.47 ± 0.26 | 25.26 ± 0.97 | 25.26 ± 0.97 |
Llama 3.2 3B Meta | 35.82 ± 0.23 | 29.36 ± 0.32 | 29.36 ± 0.32 |
Tiny Aya Global 3B CohereLabs | 34.82 ± 0.26 | 19.97 ± 0.80 | 19.97 ± 0.80 |