English Performance
English Scores by Model
Scores are the average of 30 bootstraps; 95% confidence intervals (CIs) are shown.
Model size: ≤200B
Open instruct models only
[Figure: bar chart of the overall English score (EN) with 95% CI for each model, sorted from highest (65.21 ± 0.20) to lowest (11.31 ± 0.11). The same values appear in the EN column of the English Competencies table.]
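The caption reports each score as the average of 30 bootstraps with a 95% CI, but it does not say how the resampling or the interval is computed. As a rough illustration only, the sketch below resamples hypothetical per-item scores with replacement and uses a normal-approximation interval. The function name, the seed, the item-level data, and the interval formula are all assumptions, not the leaderboard's documented method.

```python
import random
import statistics


def bootstrap_mean_ci(item_scores, n_boot=30, z=1.96, seed=0):
    """Return (mean of bootstrap means, 95% half-width) for per-item scores."""
    rng = random.Random(seed)
    n = len(item_scores)
    boot_means = []
    for _ in range(n_boot):
        # Draw n items with replacement and score the resample.
        resample = rng.choices(item_scores, k=n)
        boot_means.append(statistics.fmean(resample))
    center = statistics.fmean(boot_means)
    # Assumption: normal-approximation interval for the mean of the bootstrap means.
    half_width = z * statistics.stdev(boot_means) / (n_boot ** 0.5)
    return center, half_width


# Hypothetical per-item scores (1 = correct, 0 = incorrect), not leaderboard data.
scores = [1] * 65 + [0] * 35
center, half_width = bootstrap_mean_ci(scores)
print(f"{100 * center:.2f} ± {100 * half_width:.2f}")
```

A percentile interval over the bootstrap means would be an equally plausible reading of the caption; the sketch picks one option to keep the example short.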
English Competencies
Model | EN | Code Generation | Instruction Following | Knowledge | Math | Reasoning | Long Context |
|---|---|---|---|---|---|---|---|
Gemma 4 31B | 65.21 ± 0.20 | 73.74 ± 0.18 | 91.73 ± 0.18 | 69.47 ± 0.04 | 36.46 ± 0.37 | 79.33 ± 0.33 | 40.55 ± 1.01 |
Gemma 4 26B MoE | 62.38 ± 0.19 | 71.50 ± 0.16 | 89.74 ± 0.24 | 69.39 ± 0.06 | 28.25 ± 0.32 | 76.98 ± 0.32 | 38.42 ± 1.04 |
Qwen 3.5 27B Alibaba | 59.14 ± 0.24 | 59.70 ± 0.22 | 88.06 ± 0.23 | 70.45 ± 0.05 | 13.01 ± 0.21 | 64.69 ± 0.51 | 58.95 ± 1.26 |
Qwen 3.5 122B MoE Alibaba | 58.98 ± 0.21 | 60.32 ± 0.19 | 87.49 ± 0.24 | 71.83 ± 0.05 | 12.64 ± 0.28 | 65.31 ± 0.48 | 56.31 ± 1.28 |
SEA-LION v4.5 (Qwen) 27B AISG | 58.08 ± 0.23 | 59.87 ± 0.26 | 85.59 ± 0.28 | 70.05 ± 0.06 | 12.78 ± 0.25 | 66.06 ± 0.52 | 54.12 ± 1.09 |
Qwen 3.6 27B Alibaba | 57.05 ± 0.29 | 59.18 ± 0.29 | 84.82 ± 0.24 | 70.32 ± 0.06 | 10.45 ± 0.22 | 61.86 ± 0.57 | 55.69 ± 1.29 |
Qwen 3.5 35B MoE Alibaba | 55.46 ± 0.20 | 56.82 ± 0.27 | 84.95 ± 0.32 | 67.77 ± 0.06 | 11.78 ± 0.29 | 61.85 ± 0.53 | 49.59 ± 1.27 |
Qwen 3.6 35B MoE Alibaba | 55.29 ± 0.15 | 54.76 ± 0.28 | 82.61 ± 0.32 | 69.13 ± 0.05 | 10.97 ± 0.26 | 61.58 ± 0.54 | 52.70 ± 0.73 |
Mistral Medium 3.5 128B Mistral AI | 52.92 ± 0.22 | 50.84 ± 0.20 | 80.91 ± 0.32 | 62.56 ± 0.10 | 7.33 ± 0.28 | 76.15 ± 0.60 | 39.74 ± 1.20 |
Qwen 3 VL 32B Alibaba | 51.75 ± 0.20 | 50.53 ± 0.24 | 83.09 ± 0.29 | 63.18 ± 0.06 | 11.66 ± 0.34 | 71.06 ± 0.39 | 30.98 ± 0.91 |
Gemma 4 (E4B) 8B | 49.29 ± 0.19 | 51.66 ± 0.23 | 84.42 ± 0.31 | 55.47 ± 0.07 | 10.58 ± 0.26 | 69.84 ± 0.58 | 23.76 ± 0.80 |
Qwen 3.5 9B Alibaba | 47.73 ± 0.24 | 45.61 ± 0.29 | 81.28 ± 0.29 | 60.85 ± 0.06 | 7.37 ± 0.32 | 48.69 ± 0.78 | 42.58 ± 1.11 |
Llama 4 Scout 109B MoE Meta | 46.31 ± 0.17 | 31.15 ± 0.24 | 84.96 ± 0.20 | 60.90 ± 0.05 | 9.68 ± 0.37 | 63.71 ± 0.57 | 27.46 ± 0.84 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 44.47 ± 0.23 | 51.62 ± 0.24 | 76.79 ± 0.37 | 54.19 ± 0.10 | 9.94 ± 0.26 | 49.90 ± 0.69 | 24.40 ± 1.10 |
Llama 3.3 70B Meta | 43.68 ± 0.18 | 31.61 ± 0.21 | 89.89 ± 0.13 | 56.97 ± 0.06 | 6.65 ± 0.20 | 64.21 ± 0.46 | 12.78 ± 0.88 |
SEA-LION v4 (Qwen VL) 8B AISG | 43.61 ± 0.17 | 38.73 ± 0.17 | 84.90 ± 0.18 | 51.99 ± 0.08 | 10.54 ± 0.26 | 62.98 ± 0.68 | 12.49 ± 0.79 |
Qwen 3 VL 8B Alibaba | 43.30 ± 0.15 | 41.25 ± 0.22 | 85.50 ± 0.25 | 53.81 ± 0.07 | 9.21 ± 0.25 | 56.52 ± 0.43 | 13.49 ± 0.67 |
SEA-LION v4 (Qwen) 32B AISG | 42.01 ± 0.12 | 42.56 ± 0.14 | 84.50 ± 0.33 | 45.62 ± 0.13 | 11.56 ± 0.30 | 67.82 ± 0.61 | 0.00 ± 0.00 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 41.70 ± 0.19 | 43.49 ± 0.25 | 80.66 ± 0.25 | 45.22 ± 0.08 | 8.15 ± 0.29 | 61.69 ± 0.65 | 10.98 ± 1.03 |
SEA-LION v3 (Llama) 70B AISG | 41.54 ± 0.17 | 30.78 ± 0.20 | 86.75 ± 0.24 | 47.71 ± 0.15 | 5.92 ± 0.20 | 65.18 ± 0.52 | 12.92 ± 0.84 |
Gemma 4 (E2B) 5B | 41.48 ± 0.21 | 45.02 ± 0.30 | 78.24 ± 0.21 | 46.61 ± 0.08 | 6.38 ± 0.21 | 57.74 ± 0.50 | 14.91 ± 0.91 |
SEA-LION v4 (Gemma) 27B AISG | 41.39 ± 0.19 | 31.25 ± 0.17 | 80.40 ± 0.25 | 53.43 ± 0.06 | 13.24 ± 0.34 | 59.76 ± 0.68 | 10.27 ± 0.60 |
Qwen 3.5 4B Alibaba | 40.91 ± 0.28 | 34.95 ± 0.23 | 77.02 ± 0.40 | 55.69 ± 0.07 | 5.60 ± 0.31 | 38.29 ± 0.72 | 33.88 ± 1.14 |
Gemma 3 27B | 40.88 ± 0.14 | 31.05 ± 0.11 | 81.42 ± 0.26 | 53.66 ± 0.07 | 13.15 ± 0.29 | 55.58 ± 0.70 | 10.41 ± 0.74 |
Olmo 3.1 32B AI2 | 40.82 ± 0.15 | 40.06 ± 0.30 | 86.88 ± 0.25 | 51.79 ± 0.07 | 8.08 ± 0.25 | 58.09 ± 0.79 | 0.00 ± 0.00 |
Mistral Small 4 119B MoE Mistral AI | 40.45 ± 0.19 | 40.99 ± 0.27 | 73.19 ± 0.43 | 40.14 ± 0.15 | 8.96 ± 0.36 | 62.06 ± 0.65 | 17.35 ± 1.14 |
Gemma 3 12B | 37.91 ± 0.11 | 26.35 ± 0.00 | 79.67 ± 0.00 | 47.27 ± 0.00 | 10.35 ± 0.00 | 53.72 ± 0.00 | 10.12 ± 0.68 |
Qwen 3 VL 4B Alibaba | 37.66 ± 0.16 | 36.59 ± 0.24 | 82.67 ± 0.30 | 49.07 ± 0.05 | 8.20 ± 0.34 | 36.73 ± 0.53 | 12.71 ± 0.74 |
SEA-LION v4 (Qwen VL) 4B AISG | 37.11 ± 0.17 | 34.43 ± 0.22 | 81.90 ± 0.33 | 46.47 ± 0.09 | 7.88 ± 0.30 | 40.65 ± 0.89 | 11.34 ± 0.65 |
GLM 4.7 Flash 30B MoE Z.ai | 34.60 ± 0.20 | 32.39 ± 0.31 | 76.40 ± 0.53 | 46.35 ± 0.10 | 5.85 ± 0.33 | 33.87 ± 0.85 | 12.78 ± 0.89 |
SEA-LION v3 (Llama) 8B AISG | 31.11 ± 0.17 | 16.43 ± 0.20 | 79.24 ± 0.42 | 42.00 ± 0.09 | 2.37 ± 0.23 | 39.51 ± 0.81 | 7.10 ± 0.76 |
SEA-LION v3 (Gemma 2) 9B AISG | 29.50 ± 0.14 | 17.15 ± 0.17 | 75.66 ± 0.48 | 44.09 ± 0.10 | 1.69 ± 0.16 | 38.41 ± 0.61 | 0.00 ± 0.00 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 28.46 ± 0.23 | 30.20 ± 0.33 | 77.89 ± 0.39 | 37.55 ± 0.12 | 4.42 ± 0.27 | 12.63 ± 0.77 | 8.08 ± 0.61 |
Llama 3.1 8B Meta | 28.37 ± 0.19 | 14.60 ± 0.21 | 74.54 ± 0.41 | 37.49 ± 0.09 | 2.37 ± 0.27 | 32.50 ± 0.76 | 8.75 ± 0.61 |
SEA-LION v4 (Gemma VL) 4B AISG | 26.96 ± 0.16 | 16.52 ± 0.14 | 75.63 ± 0.38 | 33.36 ± 0.08 | 4.56 ± 0.24 | 26.57 ± 0.67 | 5.15 ± 0.35 |
Gemma 3 4B | 25.10 ± 0.10 | 16.02 ± 0.14 | 71.48 ± 0.30 | 32.81 ± 0.11 | 5.18 ± 0.25 | 20.66 ± 0.57 | 4.47 ± 0.43 |
MERaLiON 2 10B A*STAR | 22.55 ± 0.13 | 6.34 ± 0.14 | 62.67 ± 0.42 | 37.25 ± 0.09 | 0.93 ± 0.14 | 28.08 ± 0.65 | 0.00 ± 0.00 |
Llama 3.2 3B Meta | 22.07 ± 0.15 | 10.67 ± 0.18 | 69.03 ± 0.43 | 29.63 ± 0.10 | 1.86 ± 0.18 | 17.49 ± 0.69 | 3.78 ± 0.59 |
Apertus 8B Swiss AI | 18.19 ± 0.13 | 4.80 ± 0.15 | 65.34 ± 0.59 | 24.85 ± 0.11 | 0.69 ± 0.13 | 13.45 ± 0.69 | 0.00 ± 0.00 |
SEA-LION v4 (Apertus) 8B AISG | 18.00 ± 0.11 | 6.32 ± 0.17 | 59.69 ± 0.43 | 27.36 ± 0.08 | 0.82 ± 0.15 | 13.79 ± 0.28 | 0.00 ± 0.00 |
Tiny Aya Water 3B CohereLabs | 16.16 ± 0.13 | 3.52 ± 0.11 | 64.00 ± 0.57 | 21.05 ± 0.12 | 0.50 ± 0.11 | 7.87 ± 0.81 | 0.00 ± 0.00 |
Tiny Aya Global 3B CohereLabs | 15.11 ± 0.13 | 2.83 ± 0.13 | 62.37 ± 0.57 | 19.19 ± 0.12 | 0.57 ± 0.10 | 5.71 ± 0.49 | 0.00 ± 0.00 |
MERaLiON 2 3B A*STAR | 11.31 ± 0.11 | 1.58 ± 0.12 | 45.42 ± 0.37 | 15.38 ± 0.13 | 0.43 ± 0.12 | 5.08 ± 0.48 | 0.00 ± 0.00 |
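The page does not state how the EN column is derived, but the reported values are consistent with an unweighted mean of the six competency scores. The sketch below checks this for the Gemma 4 31B row; treating EN as a plain average is an inference from the numbers, not a documented formula.

```python
from statistics import fmean

# Competency scores copied from the "Gemma 4 31B" row of the table above.
competencies = {
    "Code Generation": 73.74,
    "Instruction Following": 91.73,
    "Knowledge": 69.47,
    "Math": 36.46,
    "Reasoning": 79.33,
    "Long Context": 40.55,
}
reported_en = 65.21

# Assumption: EN is the unweighted mean of the six competencies.
en = fmean(competencies.values())
print(f"{en:.2f}")  # prints 65.21
```

Differences of about 0.01 on other rows are expected, since the table shows rounded competency scores.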
English Tasks
Model | EN | Code Generation | LiveCodeBench v6 |
|---|---|---|---|
Gemma 4 31B | 65.21 ± 0.20 | 73.74 ± 0.18 | 73.74 ± 0.18 |
Gemma 4 26B MoE | 62.38 ± 0.19 | 71.50 ± 0.16 | 71.50 ± 0.16 |
Qwen 3.5 27B Alibaba | 59.14 ± 0.24 | 59.70 ± 0.22 | 59.70 ± 0.22 |
Qwen 3.5 122B MoE Alibaba | 58.98 ± 0.21 | 60.32 ± 0.19 | 60.32 ± 0.19 |
SEA-LION v4.5 (Qwen) 27B AISG | 58.08 ± 0.23 | 59.87 ± 0.26 | 59.87 ± 0.26 |
Qwen 3.6 27B Alibaba | 57.05 ± 0.29 | 59.18 ± 0.29 | 59.18 ± 0.29 |
Qwen 3.5 35B MoE Alibaba | 55.46 ± 0.20 | 56.82 ± 0.27 | 56.82 ± 0.27 |
Qwen 3.6 35B MoE Alibaba | 55.29 ± 0.15 | 54.76 ± 0.28 | 54.76 ± 0.28 |
Mistral Medium 3.5 128B Mistral AI | 52.92 ± 0.22 | 50.84 ± 0.20 | 50.84 ± 0.20 |
Qwen 3 VL 32B Alibaba | 51.75 ± 0.20 | 50.53 ± 0.24 | 50.53 ± 0.24 |
Gemma 4 (E4B) 8B | 49.29 ± 0.19 | 51.66 ± 0.23 | 51.66 ± 0.23 |
Qwen 3.5 9B Alibaba | 47.73 ± 0.24 | 45.61 ± 0.29 | 45.61 ± 0.29 |
Llama 4 Scout 109B MoE Meta | 46.31 ± 0.17 | 31.15 ± 0.24 | 31.15 ± 0.24 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 44.47 ± 0.23 | 51.62 ± 0.24 | 51.62 ± 0.24 |
Llama 3.3 70B Meta | 43.68 ± 0.18 | 31.61 ± 0.21 | 31.61 ± 0.21 |
SEA-LION v4 (Qwen VL) 8B AISG | 43.61 ± 0.17 | 38.73 ± 0.17 | 38.73 ± 0.17 |
Qwen 3 VL 8B Alibaba | 43.30 ± 0.15 | 41.25 ± 0.22 | 41.25 ± 0.22 |
SEA-LION v4 (Qwen) 32B AISG | 42.01 ± 0.12 | 42.56 ± 0.14 | 42.56 ± 0.14 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 41.70 ± 0.19 | 43.49 ± 0.25 | 43.49 ± 0.25 |
SEA-LION v3 (Llama) 70B AISG | 41.54 ± 0.17 | 30.78 ± 0.20 | 30.78 ± 0.20 |
Gemma 4 (E2B) 5B | 41.48 ± 0.21 | 45.02 ± 0.30 | 45.02 ± 0.30 |
SEA-LION v4 (Gemma) 27B AISG | 41.39 ± 0.19 | 31.25 ± 0.17 | 31.25 ± 0.17 |
Qwen 3.5 4B Alibaba | 40.91 ± 0.28 | 34.95 ± 0.23 | 34.95 ± 0.23 |
Gemma 3 27B | 40.88 ± 0.14 | 31.05 ± 0.11 | 31.05 ± 0.11 |
Olmo 3.1 32B AI2 | 40.82 ± 0.15 | 40.06 ± 0.30 | 40.06 ± 0.30 |
Mistral Small 4 119B MoE Mistral AI | 40.45 ± 0.19 | 40.99 ± 0.27 | 40.99 ± 0.27 |
Gemma 3 12B | 37.91 ± 0.11 | 26.35 ± 0.00 | 26.35 ± 0.00 |
Qwen 3 VL 4B Alibaba | 37.66 ± 0.16 | 36.59 ± 0.24 | 36.59 ± 0.24 |
SEA-LION v4 (Qwen VL) 4B AISG | 37.11 ± 0.17 | 34.43 ± 0.22 | 34.43 ± 0.22 |
GLM 4.7 Flash 30B MoE Z.ai | 34.60 ± 0.20 | 32.39 ± 0.31 | 32.39 ± 0.31 |
SEA-LION v3 (Llama) 8B AISG | 31.11 ± 0.17 | 16.43 ± 0.20 | 16.43 ± 0.20 |
SEA-LION v3 (Gemma 2) 9B AISG | 29.50 ± 0.14 | 17.15 ± 0.17 | 17.15 ± 0.17 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 28.46 ± 0.23 | 30.20 ± 0.33 | 30.20 ± 0.33 |
Llama 3.1 8B Meta | 28.37 ± 0.19 | 14.60 ± 0.21 | 14.60 ± 0.21 |
SEA-LION v4 (Gemma VL) 4B AISG | 26.96 ± 0.16 | 16.52 ± 0.14 | 16.52 ± 0.14 |
Gemma 3 4B | 25.10 ± 0.10 | 16.02 ± 0.14 | 16.02 ± 0.14 |
MERaLiON 2 10B A*STAR | 22.55 ± 0.13 | 6.34 ± 0.14 | 6.34 ± 0.14 |
Llama 3.2 3B Meta | 22.07 ± 0.15 | 10.67 ± 0.18 | 10.67 ± 0.18 |
Apertus 8B Swiss AI | 18.19 ± 0.13 | 4.80 ± 0.15 | 4.80 ± 0.15 |
SEA-LION v4 (Apertus) 8B AISG | 18.00 ± 0.11 | 6.32 ± 0.17 | 6.32 ± 0.17 |
Tiny Aya Water 3B CohereLabs | 16.16 ± 0.13 | 3.52 ± 0.11 | 3.52 ± 0.11 |
Tiny Aya Global 3B CohereLabs | 15.11 ± 0.13 | 2.83 ± 0.13 | 2.83 ± 0.13 |
MERaLiON 2 3B A*STAR | 11.31 ± 0.11 | 1.58 ± 0.12 | 1.58 ± 0.12 |

Model | EN | Instruction Following | IFEval |
|---|---|---|---|
Gemma 4 31B | 65.21 ± 0.20 | 91.73 ± 0.18 | 91.73 ± 0.18 |
Gemma 4 26B MoE | 62.38 ± 0.19 | 89.74 ± 0.24 | 89.74 ± 0.24 |
Qwen 3.5 27B Alibaba | 59.14 ± 0.24 | 88.06 ± 0.23 | 88.06 ± 0.23 |
Qwen 3.5 122B MoE Alibaba | 58.98 ± 0.21 | 87.49 ± 0.24 | 87.49 ± 0.24 |
SEA-LION v4.5 (Qwen) 27B AISG | 58.08 ± 0.23 | 85.59 ± 0.28 | 85.59 ± 0.28 |
Qwen 3.6 27B Alibaba | 57.05 ± 0.29 | 84.82 ± 0.24 | 84.82 ± 0.24 |
Qwen 3.5 35B MoE Alibaba | 55.46 ± 0.20 | 84.95 ± 0.32 | 84.95 ± 0.32 |
Qwen 3.6 35B MoE Alibaba | 55.29 ± 0.15 | 82.61 ± 0.32 | 82.61 ± 0.32 |
Mistral Medium 3.5 128B Mistral AI | 52.92 ± 0.22 | 80.91 ± 0.32 | 80.91 ± 0.32 |
Qwen 3 VL 32B Alibaba | 51.75 ± 0.20 | 83.09 ± 0.29 | 83.09 ± 0.29 |
Gemma 4 (E4B) 8B | 49.29 ± 0.19 | 84.42 ± 0.31 | 84.42 ± 0.31 |
Qwen 3.5 9B Alibaba | 47.73 ± 0.24 | 81.28 ± 0.29 | 81.28 ± 0.29 |
Llama 4 Scout 109B MoE Meta | 46.31 ± 0.17 | 84.96 ± 0.20 | 84.96 ± 0.20 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 44.47 ± 0.23 | 76.79 ± 0.37 | 76.79 ± 0.37 |
Llama 3.3 70B Meta | 43.68 ± 0.18 | 89.89 ± 0.13 | 89.89 ± 0.13 |
SEA-LION v4 (Qwen VL) 8B AISG | 43.61 ± 0.17 | 84.90 ± 0.18 | 84.90 ± 0.18 |
Qwen 3 VL 8B Alibaba | 43.30 ± 0.15 | 85.50 ± 0.25 | 85.50 ± 0.25 |
SEA-LION v4 (Qwen) 32B AISG | 42.01 ± 0.12 | 84.50 ± 0.33 | 84.50 ± 0.33 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 41.70 ± 0.19 | 80.66 ± 0.25 | 80.66 ± 0.25 |
SEA-LION v3 (Llama) 70B AISG | 41.54 ± 0.17 | 86.75 ± 0.24 | 86.75 ± 0.24 |
Gemma 4 (E2B) 5B | 41.48 ± 0.21 | 78.24 ± 0.21 | 78.24 ± 0.21 |
SEA-LION v4 (Gemma) 27B AISG | 41.39 ± 0.19 | 80.40 ± 0.25 | 80.40 ± 0.25 |
Qwen 3.5 4B Alibaba | 40.91 ± 0.28 | 77.02 ± 0.40 | 77.02 ± 0.40 |
Gemma 3 27B | 40.88 ± 0.14 | 81.42 ± 0.26 | 81.42 ± 0.26 |
Olmo 3.1 32B AI2 | 40.82 ± 0.15 | 86.88 ± 0.25 | 86.88 ± 0.25 |
Mistral Small 4 119B MoE Mistral AI | 40.45 ± 0.19 | 73.19 ± 0.43 | 73.19 ± 0.43 |
Gemma 3 12B | 37.91 ± 0.11 | 79.67 ± 0.00 | 79.67 ± 0.00 |
Qwen 3 VL 4B Alibaba | 37.66 ± 0.16 | 82.67 ± 0.30 | 82.67 ± 0.30 |
SEA-LION v4 (Qwen VL) 4B AISG | 37.11 ± 0.17 | 81.90 ± 0.33 | 81.90 ± 0.33 |
GLM 4.7 Flash 30B MoE Z.ai | 34.60 ± 0.20 | 76.40 ± 0.53 | 76.40 ± 0.53 |
SEA-LION v3 (Llama) 8B AISG | 31.11 ± 0.17 | 79.24 ± 0.42 | 79.24 ± 0.42 |
SEA-LION v3 (Gemma 2) 9B AISG | 29.50 ± 0.14 | 75.66 ± 0.48 | 75.66 ± 0.48 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 28.46 ± 0.23 | 77.89 ± 0.39 | 77.89 ± 0.39 |
Llama 3.1 8B Meta | 28.37 ± 0.19 | 74.54 ± 0.41 | 74.54 ± 0.41 |
SEA-LION v4 (Gemma VL) 4B AISG | 26.96 ± 0.16 | 75.63 ± 0.38 | 75.63 ± 0.38 |
Gemma 3 4B | 25.10 ± 0.10 | 71.48 ± 0.30 | 71.48 ± 0.30 |
MERaLiON 2 10B A*STAR | 22.55 ± 0.13 | 62.67 ± 0.42 | 62.67 ± 0.42 |
Llama 3.2 3B Meta | 22.07 ± 0.15 | 69.03 ± 0.43 | 69.03 ± 0.43 |
Apertus 8B Swiss AI | 18.19 ± 0.13 | 65.34 ± 0.59 | 65.34 ± 0.59 |
SEA-LION v4 (Apertus) 8B AISG | 18.00 ± 0.11 | 59.69 ± 0.43 | 59.69 ± 0.43 |
Tiny Aya Water 3B CohereLabs | 16.16 ± 0.13 | 64.00 ± 0.57 | 64.00 ± 0.57 |
Tiny Aya Global 3B CohereLabs | 15.11 ± 0.13 | 62.37 ± 0.57 | 62.37 ± 0.57 |
MERaLiON 2 3B A*STAR | 11.31 ± 0.11 | 45.42 ± 0.37 | 45.42 ± 0.37 |

Model | EN | Knowledge | MMLU Redux | SuperGPQA |
|---|---|---|---|---|
Gemma 4 31B | 65.21 ± 0.20 | 69.47 ± 0.04 | 91.21 ± 0.06 | 47.72 ± 0.06 |
Gemma 4 26B MoE | 62.38 ± 0.19 | 69.39 ± 0.06 | 87.80 ± 0.11 | 50.99 ± 0.05 |
Qwen 3.5 27B Alibaba | 59.14 ± 0.24 | 70.45 ± 0.05 | 89.94 ± 0.08 | 50.95 ± 0.05 |
Qwen 3.5 122B MoE Alibaba | 58.98 ± 0.21 | 71.83 ± 0.05 | 90.39 ± 0.08 | 53.27 ± 0.04 |
SEA-LION v4.5 (Qwen) 27B AISG | 58.08 ± 0.23 | 70.05 ± 0.06 | 89.68 ± 0.09 | 50.41 ± 0.05 |
Qwen 3.6 27B Alibaba | 57.05 ± 0.29 | 70.32 ± 0.06 | 89.78 ± 0.09 | 50.86 ± 0.06 |
Qwen 3.5 35B MoE Alibaba | 55.46 ± 0.20 | 67.77 ± 0.06 | 88.66 ± 0.09 | 46.88 ± 0.05 |
Qwen 3.6 35B MoE Alibaba | 55.29 ± 0.15 | 69.13 ± 0.05 | 88.96 ± 0.09 | 49.29 ± 0.06 |
Mistral Medium 3.5 128B Mistral AI | 52.92 ± 0.22 | 62.56 ± 0.10 | 81.25 ± 0.19 | 43.88 ± 0.07 |
Qwen 3 VL 32B Alibaba | 51.75 ± 0.20 | 63.18 ± 0.06 | 86.34 ± 0.09 | 40.03 ± 0.06 |
Gemma 4 (E4B) 8B | 49.29 ± 0.19 | 55.47 ± 0.07 | 79.30 ± 0.11 | 31.63 ± 0.06 |
Qwen 3.5 9B Alibaba | 47.73 ± 0.24 | 60.85 ± 0.06 | 83.70 ± 0.12 | 38.00 ± 0.07 |
Llama 4 Scout 109B MoE Meta | 46.31 ± 0.17 | 60.90 ± 0.05 | 85.32 ± 0.08 | 36.47 ± 0.05 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 44.47 ± 0.23 | 54.19 ± 0.10 | 76.35 ± 0.17 | 32.03 ± 0.07 |
Llama 3.3 70B Meta | 43.68 ± 0.18 | 56.97 ± 0.06 | 84.78 ± 0.10 | 29.16 ± 0.05 |
SEA-LION v4 (Qwen VL) 8B AISG | 43.61 ± 0.17 | 51.99 ± 0.08 | 76.62 ± 0.14 | 27.36 ± 0.07 |
Qwen 3 VL 8B Alibaba | 43.30 ± 0.15 | 53.81 ± 0.07 | 79.10 ± 0.12 | 28.53 ± 0.06 |
SEA-LION v4 (Qwen) 32B AISG | 42.01 ± 0.12 | 45.62 ± 0.13 | 56.88 ± 0.25 | 34.36 ± 0.05 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 41.70 ± 0.19 | 45.22 ± 0.08 | 68.32 ± 0.13 | 22.12 ± 0.07 |
SEA-LION v3 (Llama) 70B AISG | 41.54 ± 0.17 | 47.71 ± 0.15 | 63.61 ± 0.26 | 31.80 ± 0.09 |
Gemma 4 (E2B) 5B | 41.48 ± 0.21 | 46.61 ± 0.08 | 69.12 ± 0.14 | 24.11 ± 0.06 |
SEA-LION v4 (Gemma) 27B AISG | 41.39 ± 0.19 | 53.43 ± 0.06 | 79.42 ± 0.10 | 27.45 ± 0.05 |
Qwen 3.5 4B Alibaba | 40.91 ± 0.28 | 55.69 ± 0.07 | 79.07 ± 0.12 | 32.31 ± 0.08 |
Gemma 3 27B | 40.88 ± 0.14 | 53.66 ± 0.07 | 79.59 ± 0.12 | 27.72 ± 0.06 |
Olmo 3.1 32B AI2 | 40.82 ± 0.15 | 51.79 ± 0.07 | 78.82 ± 0.14 | 24.77 ± 0.07 |
Mistral Small 4 119B MoE Mistral AI | 40.45 ± 0.19 | 40.14 ± 0.15 | 45.09 ± 0.28 | 35.19 ± 0.09 |
Gemma 3 12B | 37.91 ± 0.11 | 47.27 ± 0.00 | 74.29 ± 0.00 | 20.25 ± 0.00 |
Qwen 3 VL 4B Alibaba | 37.66 ± 0.16 | 49.07 ± 0.05 | 75.55 ± 0.09 | 22.59 ± 0.06 |
SEA-LION v4 (Qwen VL) 4B AISG | 37.11 ± 0.17 | 46.47 ± 0.09 | 72.24 ± 0.18 | 20.70 ± 0.06 |
GLM 4.7 Flash 30B MoE Z.ai | 34.60 ± 0.20 | 46.35 ± 0.10 | 70.53 ± 0.20 | 22.16 ± 0.09 |
SEA-LION v3 (Llama) 8B AISG | 31.11 ± 0.17 | 42.00 ± 0.09 | 67.43 ± 0.19 | 16.58 ± 0.05 |
SEA-LION v3 (Gemma 2) 9B AISG | 29.50 ± 0.14 | 44.09 ± 0.10 | 69.01 ± 0.16 | 19.18 ± 0.09 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 28.46 ± 0.23 | 37.55 ± 0.12 | 61.56 ± 0.23 | 13.54 ± 0.09 |
Llama 3.1 8B Meta | 28.37 ± 0.19 | 37.49 ± 0.09 | 63.57 ± 0.17 | 11.41 ± 0.06 |
SEA-LION v4 (Gemma VL) 4B AISG | 26.96 ± 0.16 | 33.36 ± 0.08 | 56.15 ± 0.17 | 10.57 ± 0.06 |
Gemma 3 4B | 25.10 ± 0.10 | 32.81 ± 0.11 | 55.55 ± 0.23 | 10.07 ± 0.06 |
MERaLiON 2 10B A*STAR | 22.55 ± 0.13 | 37.25 ± 0.09 | 60.55 ± 0.18 | 13.95 ± 0.05 |
Llama 3.2 3B Meta | 22.07 ± 0.15 | 29.63 ± 0.10 | 52.53 ± 0.19 | 6.73 ± 0.06 |
Apertus 8B Swiss AI | 18.19 ± 0.13 | 24.85 ± 0.11 | 44.97 ± 0.22 | 4.72 ± 0.06 |
SEA-LION v4 (Apertus) 8B AISG | 18.00 ± 0.11 | 27.36 ± 0.08 | 47.45 ± 0.15 | 7.27 ± 0.05 |
Tiny Aya Water 3B CohereLabs | 16.16 ± 0.13 | 21.05 ± 0.12 | 34.83 ± 0.24 | 7.27 ± 0.06 |
Tiny Aya Global 3B CohereLabs | 15.11 ± 0.13 | 19.19 ± 0.12 | 31.80 ± 0.24 | 6.58 ± 0.06 |
MERaLiON 2 3B A*STAR | 11.31 ± 0.11 | 15.38 ± 0.13 | 30.76 ± 0.26 | 0.00 ± 0.00 |
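Knowledge is the only competency here backed by two tasks, and its reported score appears to be the unweighted mean of MMLU Redux and SuperGPQA. The sketch below checks a few rows copied from the table above; the helper name and the 0.01 tolerance (to absorb rounding in the displayed values) are assumptions made for illustration.

```python
# (model, Knowledge, MMLU Redux, SuperGPQA), copied from the table above.
rows = [
    ("Gemma 4 31B", 69.47, 91.21, 47.72),
    ("Qwen 3.5 122B MoE", 71.83, 90.39, 53.27),
    ("Mistral Small 4 119B MoE", 40.14, 45.09, 35.19),
    ("MERaLiON 2 3B", 15.38, 30.76, 0.00),
]


def knowledge_gap(knowledge, mmlu_redux, supergpqa):
    """Absolute gap between the reported Knowledge score and the two-task mean."""
    return abs(knowledge - (mmlu_redux + supergpqa) / 2)


for model, knowledge, mmlu, gpqa in rows:
    # Allow 0.01 of slack because the table shows rounded values.
    print(model, knowledge_gap(knowledge, mmlu, gpqa) <= 0.01)
```

One consequence of this averaging is visible in the table: a 0.00 on SuperGPQA halves the Knowledge score, as in the MERaLiON 2 3B row.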

Model | EN | Math | MathArena |
|---|---|---|---|
Gemma 4 31B | 65.21 ± 0.20 | 36.46 ± 0.37 | 36.46 ± 0.37 |
Gemma 4 26B MoE | 62.38 ± 0.19 | 28.25 ± 0.32 | 28.25 ± 0.32 |
Qwen 3.5 27B Alibaba | 59.14 ± 0.24 | 13.01 ± 0.21 | 13.01 ± 0.21 |
Qwen 3.5 122B MoE Alibaba | 58.98 ± 0.21 | 12.64 ± 0.28 | 12.64 ± 0.28 |
SEA-LION v4.5 (Qwen) 27B AISG | 58.08 ± 0.23 | 12.78 ± 0.25 | 12.78 ± 0.25 |
Qwen 3.6 27B Alibaba | 57.05 ± 0.29 | 10.45 ± 0.22 | 10.45 ± 0.22 |
Qwen 3.5 35B MoE Alibaba | 55.46 ± 0.20 | 11.78 ± 0.29 | 11.78 ± 0.29 |
Qwen 3.6 35B MoE Alibaba | 55.29 ± 0.15 | 10.97 ± 0.26 | 10.97 ± 0.26 |
Mistral Medium 3.5 128B Mistral AI | 52.92 ± 0.22 | 7.33 ± 0.28 | 7.33 ± 0.28 |
Qwen 3 VL 32B Alibaba | 51.75 ± 0.20 | 11.66 ± 0.34 | 11.66 ± 0.34 |
Gemma 4 (E4B) 8B | 49.29 ± 0.19 | 10.58 ± 0.26 | 10.58 ± 0.26 |
Qwen 3.5 9B Alibaba | 47.73 ± 0.24 | 7.37 ± 0.32 | 7.37 ± 0.32 |
Llama 4 Scout 109B MoE Meta | 46.31 ± 0.17 | 9.68 ± 0.37 | 9.68 ± 0.37 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 44.47 ± 0.23 | 9.94 ± 0.26 | 9.94 ± 0.26 |
Llama 3.3 70B Meta | 43.68 ± 0.18 | 6.65 ± 0.20 | 6.65 ± 0.20 |
SEA-LION v4 (Qwen VL) 8B AISG | 43.61 ± 0.17 | 10.54 ± 0.26 | 10.54 ± 0.26 |
Qwen 3 VL 8B Alibaba | 43.30 ± 0.15 | 9.21 ± 0.25 | 9.21 ± 0.25 |
SEA-LION v4 (Qwen) 32B AISG | 42.01 ± 0.12 | 11.56 ± 0.30 | 11.56 ± 0.30 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 41.70 ± 0.19 | 8.15 ± 0.29 | 8.15 ± 0.29 |
SEA-LION v3 (Llama) 70B AISG | 41.54 ± 0.17 | 5.92 ± 0.20 | 5.92 ± 0.20 |
Gemma 4 (E2B) 5B | 41.48 ± 0.21 | 6.38 ± 0.21 | 6.38 ± 0.21 |
SEA-LION v4 (Gemma) 27B AISG | 41.39 ± 0.19 | 13.24 ± 0.34 | 13.24 ± 0.34 |
Qwen 3.5 4B Alibaba | 40.91 ± 0.28 | 5.60 ± 0.31 | 5.60 ± 0.31 |
Gemma 3 27B | 40.88 ± 0.14 | 13.15 ± 0.29 | 13.15 ± 0.29 |
Olmo 3.1 32B AI2 | 40.82 ± 0.15 | 8.08 ± 0.25 | 8.08 ± 0.25 |
Mistral Small 4 119B MoE Mistral AI | 40.45 ± 0.19 | 8.96 ± 0.36 | 8.96 ± 0.36 |
Gemma 3 12B | 37.91 ± 0.11 | 10.35 ± 0.00 | 10.35 ± 0.00 |
Qwen 3 VL 4B Alibaba | 37.66 ± 0.16 | 8.20 ± 0.34 | 8.20 ± 0.34 |
SEA-LION v4 (Qwen VL) 4B AISG | 37.11 ± 0.17 | 7.88 ± 0.30 | 7.88 ± 0.30 |
GLM 4.7 Flash 30B MoE Z.ai | 34.60 ± 0.20 | 5.85 ± 0.33 | 5.85 ± 0.33 |
SEA-LION v3 (Llama) 8B AISG | 31.11 ± 0.17 | 2.37 ± 0.23 | 2.37 ± 0.23 |
SEA-LION v3 (Gemma 2) 9B AISG | 29.50 ± 0.14 | 1.69 ± 0.16 | 1.69 ± 0.16 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 28.46 ± 0.23 | 4.42 ± 0.27 | 4.42 ± 0.27 |
Llama 3.1 8B Meta | 28.37 ± 0.19 | 2.37 ± 0.27 | 2.37 ± 0.27 |
SEA-LION v4 (Gemma VL) 4B AISG | 26.96 ± 0.16 | 4.56 ± 0.24 | 4.56 ± 0.24 |
Gemma 3 4B | 25.10 ± 0.10 | 5.18 ± 0.25 | 5.18 ± 0.25 |
MERaLiON 2 10B A*STAR | 22.55 ± 0.13 | 0.93 ± 0.14 | 0.93 ± 0.14 |
Llama 3.2 3B Meta | 22.07 ± 0.15 | 1.86 ± 0.18 | 1.86 ± 0.18 |
Apertus 8B Swiss AI | 18.19 ± 0.13 | 0.69 ± 0.13 | 0.69 ± 0.13 |
SEA-LION v4 (Apertus) 8B AISG | 18.00 ± 0.11 | 0.82 ± 0.15 | 0.82 ± 0.15 |
Tiny Aya Water 3B CohereLabs | 16.16 ± 0.13 | 0.50 ± 0.11 | 0.50 ± 0.11 |
Tiny Aya Global 3B CohereLabs | 15.11 ± 0.13 | 0.57 ± 0.10 | 0.57 ± 0.10 |
MERaLiON 2 3B A*STAR | 11.31 ± 0.11 | 0.43 ± 0.12 | 0.43 ± 0.12 |

Model | EN | Reasoning | MuSR |
|---|---|---|---|
Gemma 4 31B | 65.21 ± 0.20 | 79.33 ± 0.33 | 79.33 ± 0.33 |
Gemma 4 26B MoE | 62.38 ± 0.19 | 76.98 ± 0.32 | 76.98 ± 0.32 |
Qwen 3.5 27B Alibaba | 59.14 ± 0.24 | 64.69 ± 0.51 | 64.69 ± 0.51 |
Qwen 3.5 122B MoE Alibaba | 58.98 ± 0.21 | 65.31 ± 0.48 | 65.31 ± 0.48 |
SEA-LION v4.5 (Qwen) 27B AISG | 58.08 ± 0.23 | 66.06 ± 0.52 | 66.06 ± 0.52 |
Qwen 3.6 27B Alibaba | 57.05 ± 0.29 | 61.86 ± 0.57 | 61.86 ± 0.57 |
Qwen 3.5 35B MoE Alibaba | 55.46 ± 0.20 | 61.85 ± 0.53 | 61.85 ± 0.53 |
Qwen 3.6 35B MoE Alibaba | 55.29 ± 0.15 | 61.58 ± 0.54 | 61.58 ± 0.54 |
Mistral Medium 3.5 128B Mistral AI | 52.92 ± 0.22 | 76.15 ± 0.60 | 76.15 ± 0.60 |
Qwen 3 VL 32B Alibaba | 51.75 ± 0.20 | 71.06 ± 0.39 | 71.06 ± 0.39 |
Gemma 4 (E4B) 8B | 49.29 ± 0.19 | 69.84 ± 0.58 | 69.84 ± 0.58 |
Qwen 3.5 9B Alibaba | 47.73 ± 0.24 | 48.69 ± 0.78 | 48.69 ± 0.78 |
Llama 4 Scout 109B MoE Meta | 46.31 ± 0.17 | 63.71 ± 0.57 | 63.71 ± 0.57 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 44.47 ± 0.23 | 49.90 ± 0.69 | 49.90 ± 0.69 |
Llama 3.3 70B Meta | 43.68 ± 0.18 | 64.21 ± 0.46 | 64.21 ± 0.46 |
SEA-LION v4 (Qwen VL) 8B AISG | 43.61 ± 0.17 | 62.98 ± 0.68 | 62.98 ± 0.68 |
Qwen 3 VL 8B Alibaba | 43.30 ± 0.15 | 56.52 ± 0.43 | 56.52 ± 0.43 |
SEA-LION v4 (Qwen) 32B AISG | 42.01 ± 0.12 | 67.82 ± 0.61 | 67.82 ± 0.61 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 41.70 ± 0.19 | 61.69 ± 0.65 | 61.69 ± 0.65 |
SEA-LION v3 (Llama) 70B AISG | 41.54 ± 0.17 | 65.18 ± 0.52 | 65.18 ± 0.52 |
Gemma 4 (E2B) 5B | 41.48 ± 0.21 | 57.74 ± 0.50 | 57.74 ± 0.50 |
SEA-LION v4 (Gemma) 27B AISG | 41.39 ± 0.19 | 59.76 ± 0.68 | 59.76 ± 0.68 |
Qwen 3.5 4B Alibaba | 40.91 ± 0.28 | 38.29 ± 0.72 | 38.29 ± 0.72 |
Gemma 3 27B | 40.88 ± 0.14 | 55.58 ± 0.70 | 55.58 ± 0.70 |
Olmo 3.1 32B AI2 | 40.82 ± 0.15 | 58.09 ± 0.79 | 58.09 ± 0.79 |
Mistral Small 4 119B MoE Mistral AI | 40.45 ± 0.19 | 62.06 ± 0.65 | 62.06 ± 0.65 |
Gemma 3 12B | 37.91 ± 0.11 | 53.72 ± 0.00 | 53.72 ± 0.00 |
Qwen 3 VL 4B Alibaba | 37.66 ± 0.16 | 36.73 ± 0.53 | 36.73 ± 0.53 |
SEA-LION v4 (Qwen VL) 4B AISG | 37.11 ± 0.17 | 40.65 ± 0.89 | 40.65 ± 0.89 |
GLM 4.7 Flash 30B MoE Z.ai | 34.60 ± 0.20 | 33.87 ± 0.85 | 33.87 ± 0.85 |
SEA-LION v3 (Llama) 8B AISG | 31.11 ± 0.17 | 39.51 ± 0.81 | 39.51 ± 0.81 |
SEA-LION v3 (Gemma 2) 9B AISG | 29.50 ± 0.14 | 38.41 ± 0.61 | 38.41 ± 0.61 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 28.46 ± 0.23 | 12.63 ± 0.77 | 12.63 ± 0.77 |
Llama 3.1 8B Meta | 28.37 ± 0.19 | 32.50 ± 0.76 | 32.50 ± 0.76 |
SEA-LION v4 (Gemma VL) 4B AISG | 26.96 ± 0.16 | 26.57 ± 0.67 | 26.57 ± 0.67 |
Gemma 3 4B | 25.10 ± 0.10 | 20.66 ± 0.57 | 20.66 ± 0.57 |
MERaLiON 2 10B A*STAR | 22.55 ± 0.13 | 28.08 ± 0.65 | 28.08 ± 0.65 |
Llama 3.2 3B Meta | 22.07 ± 0.15 | 17.49 ± 0.69 | 17.49 ± 0.69 |
Apertus 8B Swiss AI | 18.19 ± 0.13 | 13.45 ± 0.69 | 13.45 ± 0.69 |
SEA-LION v4 (Apertus) 8B AISG | 18.00 ± 0.11 | 13.79 ± 0.28 | 13.79 ± 0.28 |
Tiny Aya Water 3B CohereLabs | 16.16 ± 0.13 | 7.87 ± 0.81 | 7.87 ± 0.81 |
Tiny Aya Global 3B CohereLabs | 15.11 ± 0.13 | 5.71 ± 0.49 | 5.71 ± 0.49 |
MERaLiON 2 3B A*STAR | 11.31 ± 0.11 | 5.08 ± 0.48 | 5.08 ± 0.48 |

Model | EN | Long Context | AA-LCR |
|---|---|---|---|
Gemma 4 31B | 65.21 ± 0.20 | 40.55 ± 1.01 | 40.55 ± 1.01 |
Gemma 4 26B MoE | 62.38 ± 0.19 | 38.42 ± 1.04 | 38.42 ± 1.04 |
Qwen 3.5 27B Alibaba | 59.14 ± 0.24 | 58.95 ± 1.26 | 58.95 ± 1.26 |
Qwen 3.5 122B MoE Alibaba | 58.98 ± 0.21 | 56.31 ± 1.28 | 56.31 ± 1.28 |
SEA-LION v4.5 (Qwen) 27B AISG | 58.08 ± 0.23 | 54.12 ± 1.09 | 54.12 ± 1.09 |
Qwen 3.6 27B Alibaba | 57.05 ± 0.29 | 55.69 ± 1.29 | 55.69 ± 1.29 |
Qwen 3.5 35B MoE Alibaba | 55.46 ± 0.20 | 49.59 ± 1.27 | 49.59 ± 1.27 |
Qwen 3.6 35B MoE Alibaba | 55.29 ± 0.15 | 52.70 ± 0.73 | 52.70 ± 0.73 |
Mistral Medium 3.5 128B Mistral AI | 52.92 ± 0.22 | 39.74 ± 1.20 | 39.74 ± 1.20 |
Qwen 3 VL 32B Alibaba | 51.75 ± 0.20 | 30.98 ± 0.91 | 30.98 ± 0.91 |
Gemma 4 (E4B) 8B | 49.29 ± 0.19 | 23.76 ± 0.80 | 23.76 ± 0.80 |
Qwen 3.5 9B Alibaba | 47.73 ± 0.24 | 42.58 ± 1.11 | 42.58 ± 1.11 |
Llama 4 Scout 109B MoE Meta | 46.31 ± 0.17 | 27.46 ± 0.84 | 27.46 ± 0.84 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 44.47 ± 0.23 | 24.40 ± 1.10 | 24.40 ± 1.10 |
Llama 3.3 70B Meta | 43.68 ± 0.18 | 12.78 ± 0.88 | 12.78 ± 0.88 |
SEA-LION v4 (Qwen VL) 8B AISG | 43.61 ± 0.17 | 12.49 ± 0.79 | 12.49 ± 0.79 |
Qwen 3 VL 8B Alibaba | 43.30 ± 0.15 | 13.49 ± 0.67 | 13.49 ± 0.67 |
SEA-LION v4 (Qwen) 32B AISG | 42.01 ± 0.12 | 0.00 ± 0.00 | 0.00 ± 0.00 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 41.70 ± 0.19 | 10.98 ± 1.03 | 10.98 ± 1.03 |
SEA-LION v3 (Llama) 70B AISG | 41.54 ± 0.17 | 12.92 ± 0.84 | 12.92 ± 0.84 |
Gemma 4 (E2B) 5B | 41.48 ± 0.21 | 14.91 ± 0.91 | 14.91 ± 0.91 |
SEA-LION v4 (Gemma) 27B AISG | 41.39 ± 0.19 | 10.27 ± 0.60 | 10.27 ± 0.60 |
Qwen 3.5 4B Alibaba | 40.91 ± 0.28 | 33.88 ± 1.14 | 33.88 ± 1.14 |
Gemma 3 27B | 40.88 ± 0.14 | 10.41 ± 0.74 | 10.41 ± 0.74 |
Olmo 3.1 32B AI2 | 40.82 ± 0.15 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Mistral Small 4 119B MoE Mistral AI | 40.45 ± 0.19 | 17.35 ± 1.14 | 17.35 ± 1.14 |
Gemma 3 12B | 37.91 ± 0.11 | 10.12 ± 0.68 | 10.12 ± 0.68 |
Qwen 3 VL 4B Alibaba | 37.66 ± 0.16 | 12.71 ± 0.74 | 12.71 ± 0.74 |
SEA-LION v4 (Qwen VL) 4B AISG | 37.11 ± 0.17 | 11.34 ± 0.65 | 11.34 ± 0.65 |
GLM 4.7 Flash 30B MoE Z.ai | 34.60 ± 0.20 | 12.78 ± 0.89 | 12.78 ± 0.89 |
SEA-LION v3 (Llama) 8B AISG | 31.11 ± 0.17 | 7.10 ± 0.76 | 7.10 ± 0.76 |
SEA-LION v3 (Gemma 2) 9B AISG | 29.50 ± 0.14 | 0.00 ± 0.00 | 0.00 ± 0.00 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 28.46 ± 0.23 | 8.08 ± 0.61 | 8.08 ± 0.61 |
Llama 3.1 8B Meta | 28.37 ± 0.19 | 8.75 ± 0.61 | 8.75 ± 0.61 |
SEA-LION v4 (Gemma VL) 4B AISG | 26.96 ± 0.16 | 5.15 ± 0.35 | 5.15 ± 0.35 |
Gemma 3 4B | 25.10 ± 0.10 | 4.47 ± 0.43 | 4.47 ± 0.43 |
MERaLiON 2 10B A*STAR | 22.55 ± 0.13 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Llama 3.2 3B Meta | 22.07 ± 0.15 | 3.78 ± 0.59 | 3.78 ± 0.59 |
Apertus 8B Swiss AI | 18.19 ± 0.13 | 0.00 ± 0.00 | 0.00 ± 0.00 |
SEA-LION v4 (Apertus) 8B AISG | 18.00 ± 0.11 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Tiny Aya Water 3B CohereLabs | 16.16 ± 0.13 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Tiny Aya Global 3B CohereLabs | 15.11 ± 0.13 | 0.00 ± 0.00 | 0.00 ± 0.00 |
MERaLiON 2 3B A*STAR | 11.31 ± 0.11 | 0.00 ± 0.00 | 0.00 ± 0.00 |
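Because every cell is written as `mean ± half-width`, rows can be parsed mechanically, for example to see whether two models' 95% intervals overlap. The sketch below does this for two rows copied from the Long Context table. The helper names are hypothetical, and interval overlap is only a coarse reading aid, not a significance test.

```python
def parse_cell(cell):
    """Split a 'mean ± half-width' cell into two floats."""
    mean, half_width = cell.split("±")
    return float(mean), float(half_width)


def parse_row(line):
    """Return (model name, list of (mean, half-width)) for one table row."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    return cells[0], [parse_cell(c) for c in cells[1:]]


def intervals_overlap(a, b):
    """True if [mean - hw, mean + hw] of a and b share at least one point."""
    (mean_a, hw_a), (mean_b, hw_b) = a, b
    return abs(mean_a - mean_b) <= hw_a + hw_b


# Rows copied from the Long Context table (columns: EN, Long Context, AA-LCR).
row_a = "Qwen 3.5 27B Alibaba | 59.14 ± 0.24 | 58.95 ± 1.26 | 58.95 ± 1.26 |"
row_b = "Qwen 3.5 122B MoE Alibaba | 58.98 ± 0.21 | 56.31 ± 1.28 | 56.31 ± 1.28 |"

name_a, cells_a = parse_row(row_a)
name_b, cells_b = parse_row(row_b)

print("EN intervals overlap:", intervals_overlap(cells_a[0], cells_b[0]))
print("Long Context intervals overlap:", intervals_overlap(cells_a[1], cells_b[1]))
```

For these two rows the EN intervals overlap while the Long Context intervals do not, so the two models are hard to separate on the overall score even though the 27B model leads on AA-LCR.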