Burmese Performance
Burmese Scores by Model
Scores are the average of 30 bootstraps, shown with 95% confidence intervals (CI).
Model Size: ≤200B
Open instruct models only
[Figure: bar chart of the overall Burmese score per model with 95% CI, ordered from 67.29 ± 0.08 (highest) to 1.75 ± 0.14 (lowest). The per-model values are listed in the MY column of the tables below.]
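The caption says each score is an average of 30 bootstraps with a 95% CI, but it does not say what is resampled or how the interval is computed. The sketch below is one common way to produce a "mean ± half-width" figure of that kind. It assumes a list of per-observation scores and a normal-approximation interval; the function name `bootstrap_mean_ci` and the sample data are hypothetical.

```python
import random
import statistics


def bootstrap_mean_ci(scores, n_boot=30, seed=0):
    """Resample `scores` with replacement `n_boot` times.

    Returns (mean of the bootstrap means, 95% half-width), where the
    half-width is 1.96 * the standard deviation of the bootstrap means.
    This is an assumed procedure, not the leaderboard's documented one.
    """
    rng = random.Random(seed)
    n = len(scores)
    boot_means = []
    for _ in range(n_boot):
        # One bootstrap replicate: n draws with replacement.
        sample = [scores[rng.randrange(n)] for _ in range(n)]
        boot_means.append(statistics.fmean(sample))
    center = statistics.fmean(boot_means)
    half_width = 1.96 * statistics.stdev(boot_means)
    return center, half_width


# Hypothetical per-item correctness (1 = correct, 0 = wrong); not leaderboard data.
item_scores = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1] * 20
mean, half_width = bootstrap_mean_ci(item_scores)
print(f"{100 * mean:.2f} ± {100 * half_width:.2f}")
```

Under this procedure, an input with no variation produces a zero-width interval, because every replicate has the same mean.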
Burmese Competencies
Model | MY | Instruction Following | Knowledge | NLG | NLR | NLU | Safety |
|---|---|---|---|---|---|---|---|
Gemma 4 31B | 67.29 ± 0.08 | 78.93 ± 0.41 | 66.44 ± 0.17 | 57.91 ± 0.05 | 67.96 ± 0.11 | 62.69 ± 0.07 | 69.83 ± 0.18 |
Gemma 4 26B MoE | 61.39 ± 0.12 | 71.87 ± 0.49 | 53.95 ± 0.20 | 55.86 ± 0.09 | 62.07 ± 0.17 | 60.73 ± 0.22 | 63.85 ± 0.29 |
SEA-LION v4.5 (Qwen) 27B AISG | 61.11 ± 0.16 | 73.20 ± 0.71 | 58.12 ± 0.27 | 54.73 ± 0.05 | 59.26 ± 0.30 | 62.35 ± 0.15 | 59.02 ± 0.51 |
Qwen 3.5 27B Alibaba | 59.35 ± 0.19 | 72.90 ± 0.78 | 59.74 ± 0.41 | 53.48 ± 0.07 | 53.56 ± 0.43 | 61.68 ± 0.26 | 54.77 ± 0.58 |
Qwen 3.5 122B MoE Alibaba | 59.23 ± 0.20 | 74.37 ± 0.78 | 61.13 ± 0.53 | 54.06 ± 0.07 | 55.99 ± 0.37 | 58.17 ± 0.23 | 51.65 ± 0.53 |
SEA-LION v4 (Qwen) 32B AISG | 56.13 ± 0.11 | 64.50 ± 0.61 | 50.07 ± 0.19 | 45.36 ± 0.05 | 61.65 ± 0.15 | 60.95 ± 0.09 | 54.25 ± 0.20 |
Llama 4 Scout 109B MoE Meta | 53.85 ± 0.15 | 71.53 ± 0.71 | 46.77 ± 0.17 | 53.91 ± 0.10 | 51.14 ± 0.12 | 56.93 ± 0.13 | 42.83 ± 0.26 |
Qwen 3.6 27B Alibaba | 53.70 ± 0.25 | 71.87 ± 0.95 | 55.26 ± 0.58 | 50.08 ± 0.09 | 36.78 ± 0.54 | 59.16 ± 0.35 | 49.05 ± 0.73 |
Gemma 4 (E4B) 8B | 50.93 ± 0.19 | 62.60 ± 1.04 | 40.61 ± 0.45 | 47.94 ± 0.13 | 42.73 ± 0.53 | 59.74 ± 0.24 | 51.95 ± 0.34 |
Gemma 3 27B | 50.53 ± 0.20 | 65.93 ± 0.83 | 45.13 ± 0.30 | 49.03 ± 0.10 | 57.13 ± 0.24 | 57.66 ± 0.19 | 28.30 ± 0.34 |
SEA-LION v4 (Gemma) 27B AISG | 50.14 ± 0.19 | 67.23 ± 0.87 | 44.05 ± 0.33 | 48.35 ± 0.12 | 56.99 ± 0.32 | 57.86 ± 0.24 | 26.38 ± 0.36 |
Qwen 3 VL 32B Alibaba | 47.65 ± 0.14 | 66.93 ± 0.66 | 42.19 ± 0.27 | 28.88 ± 0.17 | 48.05 ± 0.24 | 60.59 ± 0.19 | 39.25 ± 0.41 |
Gemma 3 12B | 45.78 ± 0.00 | 60.00 ± 0.00 | 35.29 ± 0.00 | 44.27 ± 0.00 | 35.38 ± 0.00 | 58.76 ± 0.00 | 41.00 ± 0.00 |
Qwen 3.5 35B MoE Alibaba | 40.84 ± 0.29 | 63.90 ± 0.75 | 36.03 ± 0.72 | 22.18 ± 0.23 | 21.74 ± 0.85 | 55.14 ± 0.68 | 46.03 ± 0.81 |
Qwen 3.6 35B MoE Alibaba | 40.64 ± 0.25 | 63.90 ± 1.02 | 23.79 ± 0.78 | 40.77 ± 0.17 | 17.93 ± 0.44 | 56.95 ± 0.47 | 40.53 ± 0.78 |
SEA-LION v3 (Llama) 70B AISG | 38.82 ± 0.42 | 71.40 ± 0.97 | 27.43 ± 0.68 | 40.90 ± 0.14 | 40.51 ± 0.54 | 26.20 ± 0.75 | 26.45 ± 1.36 |
Mistral Medium 3.5 128B Mistral AI | 35.98 ± 0.34 | 48.43 ± 1.45 | 26.35 ± 0.65 | 31.07 ± 0.19 | 27.92 ± 0.73 | 49.54 ± 0.58 | 32.55 ± 1.03 |
SEA-LION v4 (Qwen VL) 8B AISG | 35.79 ± 0.19 | 43.83 ± 0.86 | 32.09 ± 0.31 | 30.43 ± 0.15 | 31.21 ± 0.26 | 49.88 ± 0.15 | 27.28 ± 0.31 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 34.78 ± 0.17 | 53.57 ± 0.73 | 24.40 ± 0.40 | 6.82 ± 0.00 | 34.32 ± 0.41 | 44.01 ± 0.31 | 45.53 ± 0.21 |
Qwen 3 VL 8B Alibaba | 34.40 ± 0.22 | 45.20 ± 1.20 | 28.12 ± 0.26 | 27.01 ± 0.13 | 34.52 ± 0.31 | 46.00 ± 0.22 | 25.55 ± 0.34 |
Gemma 4 (E2B) 5B | 33.40 ± 0.18 | 53.43 ± 0.90 | 23.13 ± 0.27 | 6.82 ± 0.00 | 32.85 ± 0.34 | 44.69 ± 0.24 | 39.48 ± 0.27 |
Qwen 3.5 9B Alibaba | 30.82 ± 0.39 | 55.60 ± 1.38 | 14.28 ± 0.59 | 28.39 ± 0.19 | 5.32 ± 0.18 | 38.77 ± 0.86 | 42.58 ± 0.89 |
SEA-LION v4 (Qwen VL) 4B AISG | 27.99 ± 0.17 | 41.23 ± 1.06 | 22.55 ± 0.25 | 17.86 ± 0.08 | 24.31 ± 0.18 | 38.13 ± 0.23 | 23.88 ± 0.49 |
SEA-LION v4 (Gemma VL) 4B AISG | 27.07 ± 0.16 | 57.67 ± 0.88 | 0.39 ± 0.06 | 40.00 ± 0.07 | 19.05 ± 0.14 | 45.28 ± 0.22 | 0.00 ± 0.00 |
Llama 3.3 70B Meta | 26.16 ± 0.20 | 63.73 ± 1.10 | 4.13 ± 0.26 | 44.25 ± 0.12 | 21.29 ± 0.16 | 0.30 ± 0.23 | 23.28 ± 0.63 |
Qwen 3 VL 4B Alibaba | 24.36 ± 0.21 | 41.43 ± 1.01 | 21.68 ± 0.42 | 17.03 ± 0.05 | 21.92 ± 0.35 | 39.18 ± 0.30 | 4.90 ± 0.43 |
Gemma 3 4B | 22.17 ± 0.16 | 53.37 ± 0.81 | 0.32 ± 0.04 | 31.68 ± 0.09 | 14.53 ± 0.15 | 33.12 ± 0.40 | 0.00 ± 0.00 |
Mistral Small 4 119B MoE Mistral AI | 19.88 ± 0.31 | 35.93 ± 0.93 | 4.49 ± 0.40 | 31.61 ± 0.16 | 8.44 ± 0.81 | 35.11 ± 0.76 | 3.73 ± 0.54 |
SEA-LION v3 (Llama) 8B AISG | 19.12 ± 0.27 | 53.37 ± 0.96 | 10.64 ± 0.62 | 9.40 ± 0.10 | 9.09 ± 0.28 | 24.28 ± 0.70 | 7.95 ± 1.09 |
SEA-LION v4 (Apertus) 8B AISG | 18.09 ± 0.21 | 33.80 ± 0.83 | 0.02 ± 0.02 | 34.27 ± 0.14 | 0.00 ± 0.00 | 14.62 ± 0.71 | 25.82 ± 0.74 |
Qwen 3.5 4B Alibaba | 16.24 ± 0.22 | 49.23 ± 0.91 | 3.10 ± 0.33 | 20.16 ± 0.17 | 1.20 ± 0.30 | 22.87 ± 0.88 | 0.87 ± 0.50 |
SEA-LION v3 (Gemma 2) 9B AISG | 16.14 ± 0.19 | 48.47 ± 0.93 | 0.27 ± 0.09 | 32.68 ± 0.20 | 11.02 ± 0.26 | 4.41 ± 0.64 | 0.00 ± 0.00 |
GLM 4.7 Flash 30B MoE Z.ai | 12.01 ± 0.23 | 37.57 ± 1.30 | 0.89 ± 0.16 | 23.37 ± 0.21 | 4.52 ± 0.32 | 5.69 ± 0.75 | 0.00 ± 0.00 |
Apertus 8B Swiss AI | 10.09 ± 0.22 | 28.93 ± 1.28 | 2.00 ± 0.26 | 26.48 ± 0.20 | 0.77 ± 0.36 | 2.37 ± 0.51 | 0.00 ± 0.00 |
MERaLiON 2 10B A*STAR | 9.91 ± 0.20 | 29.50 ± 1.06 | 1.08 ± 0.20 | 22.98 ± 0.09 | 5.92 ± 0.19 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Llama 3.1 8B Meta | 9.81 ± 0.22 | 32.87 ± 1.09 | 2.15 ± 0.27 | 20.77 ± 0.20 | 0.00 ± 0.00 | 3.09 ± 0.53 | 0.00 ± 0.00 |
Olmo 3.1 32B AI2 | 7.33 ± 0.12 | 29.83 ± 0.71 | 1.37 ± 0.20 | 12.75 ± 0.08 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 7.21 ± 0.17 | 31.17 ± 1.03 | 0.19 ± 0.05 | 11.91 ± 0.09 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Tiny Aya Water 3B CohereLabs | 5.69 ± 0.17 | 24.23 ± 1.05 | 0.07 ± 0.04 | 9.82 ± 0.09 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
MERaLiON 2 3B A*STAR | 5.43 ± 0.15 | 20.23 ± 0.88 | 0.08 ± 0.03 | 12.28 ± 0.12 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Tiny Aya Global 3B CohereLabs | 5.37 ± 0.18 | 23.87 ± 1.10 | 0.16 ± 0.07 | 8.20 ± 0.09 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Llama 3.2 3B Meta | 4.15 ± 0.19 | 12.90 ± 1.09 | 0.00 ± 0.00 | 12.01 ± 0.12 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 1.75 ± 0.14 | 8.47 ± 0.84 | 0.00 ± 0.00 | 2.04 ± 0.06 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
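The page does not state how the MY column is derived, but the reported values are consistent with an unweighted mean of the six competency columns. The following sketch checks that reading against two rows copied from the table above. The helper name `overall_score` is hypothetical, and the equal weighting is an inference from the numbers rather than a documented rule.

```python
from statistics import fmean

COMPETENCIES = [
    "Instruction Following", "Knowledge", "NLG", "NLR", "NLU", "Safety",
]


def overall_score(competency_scores):
    """Unweighted mean of the six competency scores, rounded to two decimals."""
    values = [competency_scores[name] for name in COMPETENCIES]
    return round(fmean(values), 2)


# Competency means copied from the table above (CI half-widths omitted).
gemma_4_31b = {
    "Instruction Following": 78.93, "Knowledge": 66.44, "NLG": 57.91,
    "NLR": 67.96, "NLU": 62.69, "Safety": 69.83,
}
llama_3_3_70b = {
    "Instruction Following": 63.73, "Knowledge": 4.13, "NLG": 44.25,
    "NLR": 21.29, "NLU": 0.30, "Safety": 23.28,
}

print(overall_score(gemma_4_31b))    # reported MY: 67.29
print(overall_score(llama_3_3_70b))  # reported MY: 26.16
```

Because every competency carries the same weight under this reading, a single near-zero column (for example NLU for Llama 3.3 70B) pulls the overall score down sharply even when other columns are strong.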
Burmese Tasks
Model | MY | Instruction Following | SEA-IFEval |
|---|---|---|---|
Gemma 4 31B | 67.29 ± 0.08 | 78.93 ± 0.41 | 78.93 ± 0.41 |
Gemma 4 26B MoE | 61.39 ± 0.12 | 71.87 ± 0.49 | 71.87 ± 0.49 |
SEA-LION v4.5 (Qwen) 27B AISG | 61.11 ± 0.16 | 73.20 ± 0.71 | 73.20 ± 0.71 |
Qwen 3.5 27B Alibaba | 59.35 ± 0.19 | 72.90 ± 0.78 | 72.90 ± 0.78 |
Qwen 3.5 122B MoE Alibaba | 59.23 ± 0.20 | 74.37 ± 0.78 | 74.37 ± 0.78 |
SEA-LION v4 (Qwen) 32B AISG | 56.13 ± 0.11 | 64.50 ± 0.61 | 64.50 ± 0.61 |
Llama 4 Scout 109B MoE Meta | 53.85 ± 0.15 | 71.53 ± 0.71 | 71.53 ± 0.71 |
Qwen 3.6 27B Alibaba | 53.70 ± 0.25 | 71.87 ± 0.95 | 71.87 ± 0.95 |
Gemma 4 (E4B) 8B | 50.93 ± 0.19 | 62.60 ± 1.04 | 62.60 ± 1.04 |
Gemma 3 27B | 50.53 ± 0.20 | 65.93 ± 0.83 | 65.93 ± 0.83 |
SEA-LION v4 (Gemma) 27B AISG | 50.14 ± 0.19 | 67.23 ± 0.87 | 67.23 ± 0.87 |
Qwen 3 VL 32B Alibaba | 47.65 ± 0.14 | 66.93 ± 0.66 | 66.93 ± 0.66 |
Gemma 3 12B | 45.78 ± 0.00 | 60.00 ± 0.00 | 60.00 ± 0.00 |
Qwen 3.5 35B MoE Alibaba | 40.84 ± 0.29 | 63.90 ± 0.75 | 63.90 ± 0.75 |
Qwen 3.6 35B MoE Alibaba | 40.64 ± 0.25 | 63.90 ± 1.02 | 63.90 ± 1.02 |
SEA-LION v3 (Llama) 70B AISG | 38.82 ± 0.42 | 71.40 ± 0.97 | 71.40 ± 0.97 |
Mistral Medium 3.5 128B Mistral AI | 35.98 ± 0.34 | 48.43 ± 1.45 | 48.43 ± 1.45 |
SEA-LION v4 (Qwen VL) 8B AISG | 35.79 ± 0.19 | 43.83 ± 0.86 | 43.83 ± 0.86 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 34.78 ± 0.17 | 53.57 ± 0.73 | 53.57 ± 0.73 |
Qwen 3 VL 8B Alibaba | 34.40 ± 0.22 | 45.20 ± 1.20 | 45.20 ± 1.20 |
Gemma 4 (E2B) 5B | 33.40 ± 0.18 | 53.43 ± 0.90 | 53.43 ± 0.90 |
Qwen 3.5 9B Alibaba | 30.82 ± 0.39 | 55.60 ± 1.38 | 55.60 ± 1.38 |
SEA-LION v4 (Qwen VL) 4B AISG | 27.99 ± 0.17 | 41.23 ± 1.06 | 41.23 ± 1.06 |
SEA-LION v4 (Gemma VL) 4B AISG | 27.07 ± 0.16 | 57.67 ± 0.88 | 57.67 ± 0.88 |
Llama 3.3 70B Meta | 26.16 ± 0.20 | 63.73 ± 1.10 | 63.73 ± 1.10 |
Qwen 3 VL 4B Alibaba | 24.36 ± 0.21 | 41.43 ± 1.01 | 41.43 ± 1.01 |
Gemma 3 4B | 22.17 ± 0.16 | 53.37 ± 0.81 | 53.37 ± 0.81 |
Mistral Small 4 119B MoE Mistral AI | 19.88 ± 0.31 | 35.93 ± 0.93 | 35.93 ± 0.93 |
SEA-LION v3 (Llama) 8B AISG | 19.12 ± 0.27 | 53.37 ± 0.96 | 53.37 ± 0.96 |
SEA-LION v4 (Apertus) 8B AISG | 18.09 ± 0.21 | 33.80 ± 0.83 | 33.80 ± 0.83 |
Qwen 3.5 4B Alibaba | 16.24 ± 0.22 | 49.23 ± 0.91 | 49.23 ± 0.91 |
SEA-LION v3 (Gemma 2) 9B AISG | 16.14 ± 0.19 | 48.47 ± 0.93 | 48.47 ± 0.93 |
GLM 4.7 Flash 30B MoE Z.ai | 12.01 ± 0.23 | 37.57 ± 1.30 | 37.57 ± 1.30 |
Apertus 8B Swiss AI | 10.09 ± 0.22 | 28.93 ± 1.28 | 28.93 ± 1.28 |
MERaLiON 2 10B A*STAR | 9.91 ± 0.20 | 29.50 ± 1.06 | 29.50 ± 1.06 |
Llama 3.1 8B Meta | 9.81 ± 0.22 | 32.87 ± 1.09 | 32.87 ± 1.09 |
Olmo 3.1 32B AI2 | 7.33 ± 0.12 | 29.83 ± 0.71 | 29.83 ± 0.71 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 7.21 ± 0.17 | 31.17 ± 1.03 | 31.17 ± 1.03 |
Tiny Aya Water 3B CohereLabs | 5.69 ± 0.17 | 24.23 ± 1.05 | 24.23 ± 1.05 |
MERaLiON 2 3B A*STAR | 5.43 ± 0.15 | 20.23 ± 0.88 | 20.23 ± 0.88 |
Tiny Aya Global 3B CohereLabs | 5.37 ± 0.18 | 23.87 ± 1.10 | 23.87 ± 1.10 |
Llama 3.2 3B Meta | 4.15 ± 0.19 | 12.90 ± 1.09 | 12.90 ± 1.09 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 1.75 ± 0.14 | 8.47 ± 0.84 | 8.47 ± 0.84 |

Model | MY | Knowledge | Global MMLU Lite |
|---|---|---|---|
Gemma 4 31B | 67.29 ± 0.08 | 66.44 ± 0.17 | 66.44 ± 0.17 |
Gemma 4 26B MoE | 61.39 ± 0.12 | 53.95 ± 0.20 | 53.95 ± 0.20 |
SEA-LION v4.5 (Qwen) 27B AISG | 61.11 ± 0.16 | 58.12 ± 0.27 | 58.12 ± 0.27 |
Qwen 3.5 27B Alibaba | 59.35 ± 0.19 | 59.74 ± 0.41 | 59.74 ± 0.41 |
Qwen 3.5 122B MoE Alibaba | 59.23 ± 0.20 | 61.13 ± 0.53 | 61.13 ± 0.53 |
SEA-LION v4 (Qwen) 32B AISG | 56.13 ± 0.11 | 50.07 ± 0.19 | 50.07 ± 0.19 |
Llama 4 Scout 109B MoE Meta | 53.85 ± 0.15 | 46.77 ± 0.17 | 46.77 ± 0.17 |
Qwen 3.6 27B Alibaba | 53.70 ± 0.25 | 55.26 ± 0.58 | 55.26 ± 0.58 |
Gemma 4 (E4B) 8B | 50.93 ± 0.19 | 40.61 ± 0.45 | 40.61 ± 0.45 |
Gemma 3 27B | 50.53 ± 0.20 | 45.13 ± 0.30 | 45.13 ± 0.30 |
SEA-LION v4 (Gemma) 27B AISG | 50.14 ± 0.19 | 44.05 ± 0.33 | 44.05 ± 0.33 |
Qwen 3 VL 32B Alibaba | 47.65 ± 0.14 | 42.19 ± 0.27 | 42.19 ± 0.27 |
Gemma 3 12B | 45.78 ± 0.00 | 35.29 ± 0.00 | 35.29 ± 0.00 |
Qwen 3.5 35B MoE Alibaba | 40.84 ± 0.29 | 36.03 ± 0.72 | 36.03 ± 0.72 |
Qwen 3.6 35B MoE Alibaba | 40.64 ± 0.25 | 23.79 ± 0.78 | 23.79 ± 0.78 |
SEA-LION v3 (Llama) 70B AISG | 38.82 ± 0.42 | 27.43 ± 0.68 | 27.43 ± 0.68 |
Mistral Medium 3.5 128B Mistral AI | 35.98 ± 0.34 | 26.35 ± 0.65 | 26.35 ± 0.65 |
SEA-LION v4 (Qwen VL) 8B AISG | 35.79 ± 0.19 | 32.09 ± 0.31 | 32.09 ± 0.31 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 34.78 ± 0.17 | 24.40 ± 0.40 | 24.40 ± 0.40 |
Qwen 3 VL 8B Alibaba | 34.40 ± 0.22 | 28.12 ± 0.26 | 28.12 ± 0.26 |
Gemma 4 (E2B) 5B | 33.40 ± 0.18 | 23.13 ± 0.27 | 23.13 ± 0.27 |
Qwen 3.5 9B Alibaba | 30.82 ± 0.39 | 14.28 ± 0.59 | 14.28 ± 0.59 |
SEA-LION v4 (Qwen VL) 4B AISG | 27.99 ± 0.17 | 22.55 ± 0.25 | 22.55 ± 0.25 |
SEA-LION v4 (Gemma VL) 4B AISG | 27.07 ± 0.16 | 0.39 ± 0.06 | 0.39 ± 0.06 |
Llama 3.3 70B Meta | 26.16 ± 0.20 | 4.13 ± 0.26 | 4.13 ± 0.26 |
Qwen 3 VL 4B Alibaba | 24.36 ± 0.21 | 21.68 ± 0.42 | 21.68 ± 0.42 |
Gemma 3 4B | 22.17 ± 0.16 | 0.32 ± 0.04 | 0.32 ± 0.04 |
Mistral Small 4 119B MoE Mistral AI | 19.88 ± 0.31 | 4.49 ± 0.40 | 4.49 ± 0.40 |
SEA-LION v3 (Llama) 8B AISG | 19.12 ± 0.27 | 10.64 ± 0.62 | 10.64 ± 0.62 |
SEA-LION v4 (Apertus) 8B AISG | 18.09 ± 0.21 | 0.02 ± 0.02 | 0.02 ± 0.02 |
Qwen 3.5 4B Alibaba | 16.24 ± 0.22 | 3.10 ± 0.33 | 3.10 ± 0.33 |
SEA-LION v3 (Gemma 2) 9B AISG | 16.14 ± 0.19 | 0.27 ± 0.09 | 0.27 ± 0.09 |
GLM 4.7 Flash 30B MoE Z.ai | 12.01 ± 0.23 | 0.89 ± 0.16 | 0.89 ± 0.16 |
Apertus 8B Swiss AI | 10.09 ± 0.22 | 2.00 ± 0.26 | 2.00 ± 0.26 |
MERaLiON 2 10B A*STAR | 9.91 ± 0.20 | 1.08 ± 0.20 | 1.08 ± 0.20 |
Llama 3.1 8B Meta | 9.81 ± 0.22 | 2.15 ± 0.27 | 2.15 ± 0.27 |
Olmo 3.1 32B AI2 | 7.33 ± 0.12 | 1.37 ± 0.20 | 1.37 ± 0.20 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 7.21 ± 0.17 | 0.19 ± 0.05 | 0.19 ± 0.05 |
Tiny Aya Water 3B CohereLabs | 5.69 ± 0.17 | 0.07 ± 0.04 | 0.07 ± 0.04 |
MERaLiON 2 3B A*STAR | 5.43 ± 0.15 | 0.08 ± 0.03 | 0.08 ± 0.03 |
Tiny Aya Global 3B CohereLabs | 5.37 ± 0.18 | 0.16 ± 0.07 | 0.16 ± 0.07 |
Llama 3.2 3B Meta | 4.15 ± 0.19 | 0.00 ± 0.00 | 0.00 ± 0.00 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 1.75 ± 0.14 | 0.00 ± 0.00 | 0.00 ± 0.00 |

Model | MY | NLG | Summarization | Translations |
|---|---|---|---|---|
Gemma 4 31B | 67.29 ± 0.08 | 57.91 ± 0.05 | 27.35 ± 0.10 | 88.48 ± 0.02 |
Gemma 4 26B MoE | 61.39 ± 0.12 | 55.86 ± 0.09 | 24.37 ± 0.19 | 87.34 ± 0.03 |
SEA-LION v4.5 (Qwen) 27B AISG | 61.11 ± 0.16 | 54.73 ± 0.05 | 24.09 ± 0.10 | 85.38 ± 0.04 |
Qwen 3.5 27B Alibaba | 59.35 ± 0.19 | 53.48 ± 0.07 | 24.15 ± 0.14 | 82.81 ± 0.07 |
Qwen 3.5 122B MoE Alibaba | 59.23 ± 0.20 | 54.06 ± 0.07 | 23.37 ± 0.14 | 84.75 ± 0.05 |
SEA-LION v4 (Qwen) 32B AISG | 56.13 ± 0.11 | 45.36 ± 0.05 | 23.53 ± 0.07 | 67.20 ± 0.09 |
Llama 4 Scout 109B MoE Meta | 53.85 ± 0.15 | 53.91 ± 0.10 | 25.86 ± 0.19 | 81.96 ± 0.04 |
Qwen 3.6 27B Alibaba | 53.70 ± 0.25 | 50.08 ± 0.09 | 23.44 ± 0.14 | 76.73 ± 0.09 |
Gemma 4 (E4B) 8B | 50.93 ± 0.19 | 47.94 ± 0.13 | 20.01 ± 0.20 | 75.87 ± 0.17 |
Gemma 3 27B | 50.53 ± 0.20 | 49.03 ± 0.10 | 19.13 ± 0.20 | 78.94 ± 0.06 |
SEA-LION v4 (Gemma) 27B AISG | 50.14 ± 0.19 | 48.35 ± 0.12 | 17.55 ± 0.23 | 79.15 ± 0.05 |
Qwen 3 VL 32B Alibaba | 47.65 ± 0.14 | 28.88 ± 0.17 | 8.34 ± 0.30 | 49.41 ± 0.17 |
Gemma 3 12B | 45.78 ± 0.00 | 44.27 ± 0.00 | 20.66 ± 0.00 | 67.89 ± 0.00 |
Qwen 3.5 35B MoE Alibaba | 40.84 ± 0.29 | 22.18 ± 0.23 | 6.91 ± 0.35 | 37.45 ± 0.29 |
Qwen 3.6 35B MoE Alibaba | 40.64 ± 0.25 | 40.77 ± 0.17 | 13.73 ± 0.30 | 67.81 ± 0.17 |
SEA-LION v3 (Llama) 70B AISG | 38.82 ± 0.42 | 40.90 ± 0.14 | 23.33 ± 0.25 | 58.47 ± 0.20 |
Mistral Medium 3.5 128B Mistral AI | 35.98 ± 0.34 | 31.07 ± 0.19 | 17.09 ± 0.26 | 45.05 ± 0.19 |
SEA-LION v4 (Qwen VL) 8B AISG | 35.79 ± 0.19 | 30.43 ± 0.15 | 12.04 ± 0.21 | 48.82 ± 0.17 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 34.78 ± 0.17 | 6.82 ± 0.00 | 0.55 ± 0.00 | 13.09 ± 0.00 |
Qwen 3 VL 8B Alibaba | 34.40 ± 0.22 | 27.01 ± 0.13 | 3.39 ± 0.23 | 50.64 ± 0.12 |
Gemma 4 (E2B) 5B | 33.40 ± 0.18 | 6.82 ± 0.00 | 0.55 ± 0.00 | 13.09 ± 0.00 |
Qwen 3.5 9B Alibaba | 30.82 ± 0.39 | 28.39 ± 0.19 | 11.73 ± 0.30 | 45.05 ± 0.25 |
SEA-LION v4 (Qwen VL) 4B AISG | 27.99 ± 0.17 | 17.86 ± 0.08 | 1.92 ± 0.12 | 33.79 ± 0.11 |
SEA-LION v4 (Gemma VL) 4B AISG | 27.07 ± 0.16 | 40.00 ± 0.07 | 19.18 ± 0.13 | 60.82 ± 0.08 |
Llama 3.3 70B Meta | 26.16 ± 0.20 | 44.25 ± 0.12 | 28.35 ± 0.21 | 60.16 ± 0.16 |
Qwen 3 VL 4B Alibaba | 24.36 ± 0.21 | 17.03 ± 0.05 | 0.55 ± 0.00 | 33.50 ± 0.09 |
Gemma 3 4B | 22.17 ± 0.16 | 31.68 ± 0.09 | 20.24 ± 0.14 | 43.12 ± 0.11 |
Mistral Small 4 119B MoE Mistral AI | 19.88 ± 0.31 | 31.61 ± 0.16 | 15.94 ± 0.22 | 47.28 ± 0.23 |
SEA-LION v3 (Llama) 8B AISG | 19.12 ± 0.27 | 9.40 ± 0.10 | 3.56 ± 0.18 | 15.24 ± 0.14 |
SEA-LION v4 (Apertus) 8B AISG | 18.09 ± 0.21 | 34.27 ± 0.14 | 5.73 ± 0.15 | 62.81 ± 0.19 |
Qwen 3.5 4B Alibaba | 16.24 ± 0.22 | 20.16 ± 0.17 | 15.76 ± 0.29 | 24.55 ± 0.18 |
SEA-LION v3 (Gemma 2) 9B AISG | 16.14 ± 0.19 | 32.68 ± 0.20 | 12.55 ± 0.39 | 52.80 ± 0.12 |
GLM 4.7 Flash 30B MoE Z.ai | 12.01 ± 0.23 | 23.37 ± 0.21 | 6.53 ± 0.37 | 40.21 ± 0.21 |
Apertus 8B Swiss AI | 10.09 ± 0.22 | 26.48 ± 0.20 | 12.18 ± 0.33 | 40.78 ± 0.22 |
MERaLiON 2 10B A*STAR | 9.91 ± 0.20 | 22.98 ± 0.09 | 13.56 ± 0.14 | 32.40 ± 0.15 |
Llama 3.1 8B Meta | 9.81 ± 0.22 | 20.77 ± 0.20 | 22.05 ± 0.35 | 19.49 ± 0.13 |
Olmo 3.1 32B AI2 | 7.33 ± 0.12 | 12.75 ± 0.08 | 18.53 ± 0.14 | 6.97 ± 0.10 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 7.21 ± 0.17 | 11.91 ± 0.09 | 2.08 ± 0.15 | 21.75 ± 0.13 |
Tiny Aya Water 3B CohereLabs | 5.69 ± 0.17 | 9.82 ± 0.09 | 8.74 ± 0.11 | 10.90 ± 0.12 |
MERaLiON 2 3B A*STAR | 5.43 ± 0.15 | 12.28 ± 0.12 | 8.71 ± 0.20 | 15.85 ± 0.16 |
Tiny Aya Global 3B CohereLabs | 5.37 ± 0.18 | 8.20 ± 0.09 | 7.25 ± 0.12 | 9.15 ± 0.13 |
Llama 3.2 3B Meta | 4.15 ± 0.19 | 12.01 ± 0.12 | 17.11 ± 0.21 | 6.91 ± 0.06 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 1.75 ± 0.14 | 2.04 ± 0.06 | 0.86 ± 0.04 | 3.23 ± 0.12 |

Model | MY | NLR | Causal Reasoning | Natural Language Inference |
|---|---|---|---|---|
Gemma 4 31B | 67.29 ± 0.08 | 67.96 ± 0.11 | 76.83 ± 0.23 | 59.09 ± 0.15 |
Gemma 4 26B MoE | 61.39 ± 0.12 | 62.07 ± 0.17 | 69.80 ± 0.31 | 54.35 ± 0.19 |
SEA-LION v4.5 (Qwen) 27B AISG | 61.11 ± 0.16 | 59.26 ± 0.30 | 70.60 ± 0.60 | 47.92 ± 0.31 |
Qwen 3.5 27B Alibaba | 59.35 ± 0.19 | 53.56 ± 0.43 | 60.95 ± 0.80 | 46.17 ± 0.35 |
Qwen 3.5 122B MoE Alibaba | 59.23 ± 0.20 | 55.99 ± 0.37 | 65.32 ± 0.71 | 46.67 ± 0.38 |
SEA-LION v4 (Qwen) 32B AISG | 56.13 ± 0.11 | 61.65 ± 0.15 | 65.90 ± 0.22 | 57.39 ± 0.21 |
Llama 4 Scout 109B MoE Meta | 53.85 ± 0.15 | 51.14 ± 0.12 | 69.32 ± 0.14 | 32.97 ± 0.18 |
Qwen 3.6 27B Alibaba | 53.70 ± 0.25 | 36.78 ± 0.54 | 61.58 ± 0.86 | 11.97 ± 0.69 |
Gemma 4 (E4B) 8B | 50.93 ± 0.19 | 42.73 ± 0.53 | 53.18 ± 1.03 | 32.27 ± 0.25 |
Gemma 3 27B | 50.53 ± 0.20 | 57.13 ± 0.24 | 58.82 ± 0.47 | 55.45 ± 0.17 |
SEA-LION v4 (Gemma) 27B AISG | 50.14 ± 0.19 | 56.99 ± 0.32 | 59.18 ± 0.46 | 54.79 ± 0.29 |
Qwen 3 VL 32B Alibaba | 47.65 ± 0.14 | 48.05 ± 0.24 | 50.30 ± 0.41 | 45.81 ± 0.23 |
Gemma 3 12B | 45.78 ± 0.00 | 35.38 ± 0.00 | 45.00 ± 0.00 | 25.75 ± 0.00 |
Qwen 3.5 35B MoE Alibaba | 40.84 ± 0.29 | 21.74 ± 0.85 | 9.67 ± 1.49 | 33.82 ± 0.89 |
Qwen 3.6 35B MoE Alibaba | 40.64 ± 0.25 | 17.93 ± 0.44 | 0.15 ± 0.29 | 35.70 ± 0.79 |
SEA-LION v3 (Llama) 70B AISG | 38.82 ± 0.42 | 40.51 ± 0.54 | 36.70 ± 0.97 | 44.32 ± 0.50 |
Mistral Medium 3.5 128B Mistral AI | 35.98 ± 0.34 | 27.92 ± 0.73 | 39.25 ± 1.07 | 16.59 ± 0.70 |
SEA-LION v4 (Qwen VL) 8B AISG | 35.79 ± 0.19 | 31.21 ± 0.26 | 41.08 ± 0.49 | 21.33 ± 0.23 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 34.78 ± 0.17 | 34.32 ± 0.41 | 36.78 ± 0.83 | 31.86 ± 0.27 |
Qwen 3 VL 8B Alibaba | 34.40 ± 0.22 | 34.52 ± 0.31 | 36.50 ± 0.48 | 32.54 ± 0.34 |
Gemma 4 (E2B) 5B | 33.40 ± 0.18 | 32.85 ± 0.34 | 34.77 ± 0.67 | 30.93 ± 0.28 |
Qwen 3.5 9B Alibaba | 30.82 ± 0.39 | 5.32 ± 0.18 | 0.00 ± 0.00 | 10.63 ± 0.36 |
SEA-LION v4 (Qwen VL) 4B AISG | 27.99 ± 0.17 | 24.31 ± 0.18 | 27.17 ± 0.37 | 21.46 ± 0.14 |
SEA-LION v4 (Gemma VL) 4B AISG | 27.07 ± 0.16 | 19.05 ± 0.14 | 0.00 ± 0.00 | 38.10 ± 0.27 |
Llama 3.3 70B Meta | 26.16 ± 0.20 | 21.29 ± 0.16 | 0.00 ± 0.00 | 42.58 ± 0.32 |
Qwen 3 VL 4B Alibaba | 24.36 ± 0.21 | 21.92 ± 0.35 | 23.93 ± 0.62 | 19.91 ± 0.28 |
Gemma 3 4B | 22.17 ± 0.16 | 14.53 ± 0.15 | 0.00 ± 0.00 | 29.06 ± 0.30 |
Mistral Small 4 119B MoE Mistral AI | 19.88 ± 0.31 | 8.44 ± 0.81 | 3.97 ± 1.32 | 12.91 ± 0.88 |
SEA-LION v3 (Llama) 8B AISG | 19.12 ± 0.27 | 9.09 ± 0.28 | 0.00 ± 0.00 | 18.18 ± 0.57 |
SEA-LION v4 (Apertus) 8B AISG | 18.09 ± 0.21 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Qwen 3.5 4B Alibaba | 16.24 ± 0.22 | 1.20 ± 0.30 | 0.00 ± 0.00 | 2.40 ± 0.61 |
SEA-LION v3 (Gemma 2) 9B AISG | 16.14 ± 0.19 | 11.02 ± 0.26 | 0.00 ± 0.00 | 22.03 ± 0.52 |
GLM 4.7 Flash 30B MoE Z.ai | 12.01 ± 0.23 | 4.52 ± 0.32 | 0.00 ± 0.00 | 9.04 ± 0.63 |
Apertus 8B Swiss AI | 10.09 ± 0.22 | 0.77 ± 0.36 | 0.00 ± 0.00 | 1.53 ± 0.72 |
MERaLiON 2 10B A*STAR | 9.91 ± 0.20 | 5.92 ± 0.19 | 0.00 ± 0.00 | 11.84 ± 0.37 |
Llama 3.1 8B Meta | 9.81 ± 0.22 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Olmo 3.1 32B AI2 | 7.33 ± 0.12 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 7.21 ± 0.17 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Tiny Aya Water 3B CohereLabs | 5.69 ± 0.17 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
MERaLiON 2 3B A*STAR | 5.43 ± 0.15 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Tiny Aya Global 3B CohereLabs | 5.37 ± 0.18 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Llama 3.2 3B Meta | 4.15 ± 0.19 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 1.75 ± 0.14 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |

Model | MY | NLU | Belebele QA | Sentiment Analysis |
|---|---|---|---|---|
Gemma 4 31B | 67.29 ± 0.08 | 62.69 ± 0.07 | 82.22 ± 0.00 | 43.16 ± 0.15 |
Gemma 4 26B MoE | 61.39 ± 0.12 | 60.73 ± 0.22 | 78.33 ± 0.31 | 43.13 ± 0.24 |
SEA-LION v4.5 (Qwen) 27B AISG | 61.11 ± 0.16 | 62.35 ± 0.15 | 80.44 ± 0.25 | 44.26 ± 0.14 |
Qwen 3.5 27B Alibaba | 59.35 ± 0.19 | 61.68 ± 0.26 | 81.63 ± 0.37 | 41.73 ± 0.29 |
Qwen 3.5 122B MoE Alibaba | 59.23 ± 0.20 | 58.17 ± 0.23 | 74.48 ± 0.35 | 41.85 ± 0.24 |
SEA-LION v4 (Qwen) 32B AISG | 56.13 ± 0.11 | 60.95 ± 0.09 | 81.41 ± 0.18 | 40.49 ± 0.10 |
Llama 4 Scout 109B MoE Meta | 53.85 ± 0.15 | 56.93 ± 0.13 | 73.85 ± 0.23 | 40.01 ± 0.10 |
Qwen 3.6 27B Alibaba | 53.70 ± 0.25 | 59.16 ± 0.35 | 74.70 ± 0.64 | 43.63 ± 0.32 |
Gemma 4 (E4B) 8B | 50.93 ± 0.19 | 59.74 ± 0.24 | 69.85 ± 0.51 | 49.63 ± 0.18 |
Gemma 3 27B | 50.53 ± 0.20 | 57.66 ± 0.19 | 69.07 ± 0.33 | 46.24 ± 0.17 |
SEA-LION v4 (Gemma) 27B AISG | 50.14 ± 0.19 | 57.86 ± 0.24 | 69.11 ± 0.44 | 46.61 ± 0.21 |
Qwen 3 VL 32B Alibaba | 47.65 ± 0.14 | 60.59 ± 0.19 | 75.56 ± 0.33 | 45.63 ± 0.20 |
Gemma 3 12B | 45.78 ± 0.00 | 58.76 ± 0.00 | 67.78 ± 0.00 | 49.75 ± 0.00 |
Qwen 3.5 35B MoE Alibaba | 40.84 ± 0.29 | 55.14 ± 0.68 | 65.00 ± 1.31 | 45.28 ± 0.37 |
Qwen 3.6 35B MoE Alibaba | 40.64 ± 0.25 | 56.95 ± 0.47 | 70.00 ± 0.80 | 43.89 ± 0.33 |
SEA-LION v3 (Llama) 70B AISG | 38.82 ± 0.42 | 26.20 ± 0.75 | 52.41 ± 1.50 | 0.00 ± 0.00 |
Mistral Medium 3.5 128B Mistral AI | 35.98 ± 0.34 | 49.54 ± 0.58 | 64.15 ± 1.16 | 34.94 ± 0.35 |
SEA-LION v4 (Qwen VL) 8B AISG | 35.79 ± 0.19 | 49.88 ± 0.15 | 63.48 ± 0.20 | 36.27 ± 0.19 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 34.78 ± 0.17 | 44.01 ± 0.31 | 48.89 ± 0.61 | 39.13 ± 0.16 |
Qwen 3 VL 8B Alibaba | 34.40 ± 0.22 | 46.00 ± 0.22 | 56.00 ± 0.45 | 36.00 ± 0.25 |
Gemma 4 (E2B) 5B | 33.40 ± 0.18 | 44.69 ± 0.24 | 48.19 ± 0.48 | 41.19 ± 0.13 |
Qwen 3.5 9B Alibaba | 30.82 ± 0.39 | 38.77 ± 0.86 | 37.44 ± 1.73 | 40.10 ± 0.50 |
SEA-LION v4 (Qwen VL) 4B AISG | 27.99 ± 0.17 | 38.13 ± 0.23 | 44.44 ± 0.42 | 31.82 ± 0.19 |
SEA-LION v4 (Gemma VL) 4B AISG | 27.07 ± 0.16 | 45.28 ± 0.22 | 47.22 ± 0.36 | 43.34 ± 0.19 |
Llama 3.3 70B Meta | 26.16 ± 0.20 | 0.30 ± 0.23 | 0.59 ± 0.45 | 0.00 ± 0.00 |
Qwen 3 VL 4B Alibaba | 24.36 ± 0.21 | 39.18 ± 0.30 | 42.78 ± 0.58 | 35.59 ± 0.13 |
Gemma 3 4B | 22.17 ± 0.16 | 33.12 ± 0.40 | 29.11 ± 0.77 | 37.13 ± 0.20 |
Mistral Small 4 119B MoE Mistral AI | 19.88 ± 0.31 | 35.11 ± 0.76 | 42.44 ± 1.33 | 27.77 ± 0.57 |
SEA-LION v3 (Llama) 8B AISG | 19.12 ± 0.27 | 24.28 ± 0.70 | 34.41 ± 0.99 | 14.15 ± 0.96 |
SEA-LION v4 (Apertus) 8B AISG | 18.09 ± 0.21 | 14.62 ± 0.71 | 25.81 ± 1.14 | 3.42 ± 0.54 |
Qwen 3.5 4B Alibaba | 16.24 ± 0.22 | 22.87 ± 0.88 | 13.93 ± 1.61 | 31.82 ± 0.54 |
SEA-LION v3 (Gemma 2) 9B AISG | 16.14 ± 0.19 | 4.41 ± 0.64 | 8.81 ± 1.27 | 0.00 ± 0.00 |
GLM 4.7 Flash 30B MoE Z.ai | 12.01 ± 0.23 | 5.69 ± 0.75 | 5.11 ± 1.33 | 6.28 ± 0.57 |
Apertus 8B Swiss AI | 10.09 ± 0.22 | 2.37 ± 0.51 | 1.81 ± 0.80 | 2.93 ± 0.78 |
MERaLiON 2 10B A*STAR | 9.91 ± 0.20 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Llama 3.1 8B Meta | 9.81 ± 0.22 | 3.09 ± 0.53 | 6.19 ± 1.06 | 0.00 ± 0.00 |
Olmo 3.1 32B AI2 | 7.33 ± 0.12 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 7.21 ± 0.17 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Tiny Aya Water 3B CohereLabs | 5.69 ± 0.17 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
MERaLiON 2 3B A*STAR | 5.43 ± 0.15 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Tiny Aya Global 3B CohereLabs | 5.37 ± 0.18 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Llama 3.2 3B Meta | 4.15 ± 0.19 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 1.75 ± 0.14 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |

Model | MY | Safety | Toxicity Detection |
|---|---|---|---|
Gemma 4 31B | 67.29 ± 0.08 | 69.83 ± 0.18 | 69.83 ± 0.18 |
Gemma 4 26B MoE | 61.39 ± 0.12 | 63.85 ± 0.29 | 63.85 ± 0.29 |
SEA-LION v4.5 (Qwen) 27B AISG | 61.11 ± 0.16 | 59.02 ± 0.51 | 59.02 ± 0.51 |
Qwen 3.5 27B Alibaba | 59.35 ± 0.19 | 54.77 ± 0.58 | 54.77 ± 0.58 |
Qwen 3.5 122B MoE Alibaba | 59.23 ± 0.20 | 51.65 ± 0.53 | 51.65 ± 0.53 |
SEA-LION v4 (Qwen) 32B AISG | 56.13 ± 0.11 | 54.25 ± 0.20 | 54.25 ± 0.20 |
Llama 4 Scout 109B MoE Meta | 53.85 ± 0.15 | 42.83 ± 0.26 | 42.83 ± 0.26 |
Qwen 3.6 27B Alibaba | 53.70 ± 0.25 | 49.05 ± 0.73 | 49.05 ± 0.73 |
Gemma 4 (E4B) 8B | 50.93 ± 0.19 | 51.95 ± 0.34 | 51.95 ± 0.34 |
Gemma 3 27B | 50.53 ± 0.20 | 28.30 ± 0.34 | 28.30 ± 0.34 |
SEA-LION v4 (Gemma) 27B AISG | 50.14 ± 0.19 | 26.38 ± 0.36 | 26.38 ± 0.36 |
Qwen 3 VL 32B Alibaba | 47.65 ± 0.14 | 39.25 ± 0.41 | 39.25 ± 0.41 |
Gemma 3 12B | 45.78 ± 0.00 | 41.00 ± 0.00 | 41.00 ± 0.00 |
Qwen 3.5 35B MoE Alibaba | 40.84 ± 0.29 | 46.03 ± 0.81 | 46.03 ± 0.81 |
Qwen 3.6 35B MoE Alibaba | 40.64 ± 0.25 | 40.53 ± 0.78 | 40.53 ± 0.78 |
SEA-LION v3 (Llama) 70B AISG | 38.82 ± 0.42 | 26.45 ± 1.36 | 26.45 ± 1.36 |
Mistral Medium 3.5 128B Mistral AI | 35.98 ± 0.34 | 32.55 ± 1.03 | 32.55 ± 1.03 |
SEA-LION v4 (Qwen VL) 8B AISG | 35.79 ± 0.19 | 27.28 ± 0.31 | 27.28 ± 0.31 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 34.78 ± 0.17 | 45.53 ± 0.21 | 45.53 ± 0.21 |
Qwen 3 VL 8B Alibaba | 34.40 ± 0.22 | 25.55 ± 0.34 | 25.55 ± 0.34 |
Gemma 4 (E2B) 5B | 33.40 ± 0.18 | 39.48 ± 0.27 | 39.48 ± 0.27 |
Qwen 3.5 9B Alibaba | 30.82 ± 0.39 | 42.58 ± 0.89 | 42.58 ± 0.89 |
SEA-LION v4 (Qwen VL) 4B AISG | 27.99 ± 0.17 | 23.88 ± 0.49 | 23.88 ± 0.49 |
SEA-LION v4 (Gemma VL) 4B AISG | 27.07 ± 0.16 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Llama 3.3 70B Meta | 26.16 ± 0.20 | 23.28 ± 0.63 | 23.28 ± 0.63 |
Qwen 3 VL 4B Alibaba | 24.36 ± 0.21 | 4.90 ± 0.43 | 4.90 ± 0.43 |
Gemma 3 4B | 22.17 ± 0.16 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Mistral Small 4 119B MoE Mistral AI | 19.88 ± 0.31 | 3.73 ± 0.54 | 3.73 ± 0.54 |
SEA-LION v3 (Llama) 8B AISG | 19.12 ± 0.27 | 7.95 ± 1.09 | 7.95 ± 1.09 |
SEA-LION v4 (Apertus) 8B AISG | 18.09 ± 0.21 | 25.82 ± 0.74 | 25.82 ± 0.74 |
Qwen 3.5 4B Alibaba | 16.24 ± 0.22 | 0.87 ± 0.50 | 0.87 ± 0.50 |
SEA-LION v3 (Gemma 2) 9B AISG | 16.14 ± 0.19 | 0.00 ± 0.00 | 0.00 ± 0.00 |
GLM 4.7 Flash 30B MoE Z.ai | 12.01 ± 0.23 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Apertus 8B Swiss AI | 10.09 ± 0.22 | 0.00 ± 0.00 | 0.00 ± 0.00 |
MERaLiON 2 10B A*STAR | 9.91 ± 0.20 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Llama 3.1 8B Meta | 9.81 ± 0.22 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Olmo 3.1 32B AI2 | 7.33 ± 0.12 | 0.00 ± 0.00 | 0.00 ± 0.00 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 7.21 ± 0.17 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Tiny Aya Water 3B CohereLabs | 5.69 ± 0.17 | 0.00 ± 0.00 | 0.00 ± 0.00 |
MERaLiON 2 3B A*STAR | 5.43 ± 0.15 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Tiny Aya Global 3B CohereLabs | 5.37 ± 0.18 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Llama 3.2 3B Meta | 4.15 ± 0.19 | 0.00 ± 0.00 | 0.00 ± 0.00 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 1.75 ± 0.14 | 0.00 ± 0.00 | 0.00 ± 0.00 |
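
In the task tables, each competency score appears to be the unweighted mean of its task columns: single-task competencies repeat the task value, and two-task competencies sit halfway between their tasks. The sketch below parses one row from the NLG table and checks that relationship. The helpers `parse_cell` and `parse_row` are hypothetical names, and the averaging rule is inferred from the numbers, not stated on the page.

```python
def parse_cell(cell):
    """Split a 'mean ± half-width' cell into two floats."""
    mean, half_width = cell.split("±")
    return float(mean), float(half_width)


def parse_row(row):
    """Split a pipe-delimited row into (model name, [(mean, half_width), ...])."""
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    return cells[0], [parse_cell(c) for c in cells[1:]]


# Row copied from the NLG table: Model | MY | NLG | Summarization | Translations
row = "Gemma 4 31B | 67.29 ± 0.08 | 57.91 ± 0.05 | 27.35 ± 0.10 | 88.48 ± 0.02 |"
model, values = parse_row(row)

# Drop the MY column, keep only the means of the remaining three columns.
nlg, summarization, translations = [mean for mean, _ in values[1:]]
task_mean = (summarization + translations) / 2

# Reported values are rounded to two decimals, so allow a small tolerance.
print(model, nlg, abs(task_mean - nlg) < 0.01)
```

The tolerance is needed because the task means are already rounded on the page, so the recomputed average can differ from the reported competency score in the last digit.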