Thai Performance
Thai Scores by Model
Scores are the average of 30 bootstraps, shown with 95% confidence intervals (CI).
Model Size: ≤200B
Open instruct models only
[Bar chart: overall Thai (TH) score ± 95% CI for each model, ordered from highest (66.42 ± 0.11, 27B) to lowest (8.04 ± 0.20, 3B). The bar labels carry only model size and score; the same scores appear with model names in the TH column of the Thai Competencies table.]
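
The page reports each score as a mean over 30 bootstraps with a 95% CI but does not say how the interval is computed. The sketch below shows one common way to get a "mean ± half-width" figure from a list of per-bootstrap scores, using a normal approximation (1.96 standard errors). The function name `mean_with_ci`, the formula, and the sample scores are all illustrative assumptions, not the leaderboard's documented method.

```python
import math
import statistics


def mean_with_ci(run_scores, z=1.96):
    """Return (mean, half_width) for a list of per-bootstrap scores.

    Assumption: a normal-approximation interval, z * stdev / sqrt(n).
    The leaderboard may use a different interval (e.g. percentile-based).
    """
    n = len(run_scores)
    mean = statistics.fmean(run_scores)
    if n < 2:
        return mean, 0.0  # a single run carries no spread information
    half_width = z * statistics.stdev(run_scores) / math.sqrt(n)
    return mean, half_width


# Hypothetical scores for 30 bootstraps (not taken from the leaderboard).
run_scores = [66.1, 66.5, 66.4, 66.7, 66.3, 66.6] * 5
mean, half_width = mean_with_ci(run_scores)
print(f"{mean:.2f} ± {half_width:.2f}")
```

Under this assumption, 30 identical scores give a half-width of exactly zero, which is one way an entry of the form "± 0.00" can arise.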
Thai Competencies
Scores are the average of 30 bootstraps, shown with 95% confidence intervals (CI).
Model Size: ≤200B
Open instruct models only
Model | TH | Instruction Following | Knowledge | NLG | NLR | NLU | Safety |
|---|---|---|---|---|---|---|---|
SEA-LION v4.5 (Qwen) 27B AISG | 66.42 ± 0.11 | 91.87 ± 0.45 | 57.52 ± 0.37 | 58.29 ± 0.04 | 74.62 ± 0.22 | 63.91 ± 0.16 | 52.30 ± 0.40 |
Qwen 3.6 27B Alibaba | 63.24 ± 0.19 | 85.43 ± 0.78 | 52.84 ± 0.46 | 57.87 ± 0.06 | 71.61 ± 0.32 | 62.79 ± 0.27 | 48.90 ± 0.42 |
Qwen 3.5 27B Alibaba | 63.08 ± 0.13 | 89.17 ± 0.50 | 59.43 ± 0.39 | 57.96 ± 0.05 | 70.41 ± 0.30 | 61.33 ± 0.20 | 40.18 ± 0.43 |
Gemma 4 31B | 62.35 ± 0.07 | 93.03 ± 0.32 | 58.72 ± 0.21 | 59.48 ± 0.06 | 79.74 ± 0.06 | 63.48 ± 0.12 | 19.67 ± 0.13 |
Qwen 3 VL 32B Alibaba | 61.74 ± 0.10 | 86.27 ± 0.58 | 54.11 ± 0.28 | 56.74 ± 0.04 | 70.00 ± 0.16 | 61.30 ± 0.10 | 41.99 ± 0.21 |
SEA-LION v4 (Qwen) 32B AISG | 61.24 ± 0.08 | 82.90 ± 0.43 | 52.69 ± 0.14 | 57.31 ± 0.04 | 70.76 ± 0.07 | 61.88 ± 0.09 | 41.87 ± 0.09 |
Qwen 3.5 122B MoE Alibaba | 61.21 ± 0.15 | 85.23 ± 0.55 | 57.03 ± 0.36 | 58.31 ± 0.07 | 64.89 ± 0.28 | 62.88 ± 0.23 | 38.90 ± 0.43 |
Gemma 4 26B MoE | 60.14 ± 0.10 | 90.17 ± 0.51 | 49.03 ± 0.26 | 59.07 ± 0.04 | 73.42 ± 0.14 | 63.57 ± 0.13 | 25.62 ± 0.16 |
SEA-LION v3 (Llama) 70B AISG | 59.78 ± 0.17 | 87.60 ± 0.69 | 49.83 ± 0.45 | 58.59 ± 0.05 | 70.32 ± 0.33 | 60.15 ± 0.28 | 32.17 ± 0.37 |
SEA-LION v4 (Qwen VL) 8B AISG | 57.38 ± 0.08 | 76.63 ± 0.33 | 44.95 ± 0.42 | 56.31 ± 0.05 | 61.39 ± 0.13 | 60.66 ± 0.10 | 44.33 ± 0.19 |
Qwen 3 VL 8B Alibaba | 57.16 ± 0.12 | 76.00 ± 0.51 | 43.18 ± 0.49 | 55.71 ± 0.06 | 61.88 ± 0.14 | 60.38 ± 0.11 | 45.82 ± 0.18 |
Llama 3.3 70B Meta | 56.90 ± 0.07 | 83.30 ± 0.35 | 48.21 ± 0.23 | 56.07 ± 0.09 | 72.76 ± 0.14 | 59.64 ± 0.16 | 21.40 ± 0.10 |
Qwen 3.6 35B MoE Alibaba | 56.26 ± 0.18 | 81.53 ± 0.88 | 37.73 ± 0.71 | 57.33 ± 0.07 | 61.15 ± 0.41 | 61.12 ± 0.28 | 38.67 ± 0.68 |
SEA-LION v4 (Qwen VL) 4B AISG | 56.21 ± 0.12 | 81.83 ± 0.65 | 39.86 ± 0.21 | 54.59 ± 0.06 | 61.97 ± 0.15 | 58.78 ± 0.08 | 40.22 ± 0.13 |
Gemma 4 (E4B) 8B | 56.16 ± 0.20 | 77.70 ± 0.75 | 44.66 ± 0.38 | 57.45 ± 0.05 | 61.58 ± 0.27 | 57.99 ± 0.17 | 37.60 ± 0.37 |
Qwen 3.5 35B MoE Alibaba | 55.33 ± 0.20 | 79.00 ± 0.88 | 46.44 ± 0.56 | 56.61 ± 0.06 | 54.78 ± 0.49 | 60.19 ± 0.33 | 34.98 ± 0.45 |
Llama 4 Scout 109B MoE Meta | 54.85 ± 0.08 | 87.70 ± 0.47 | 41.94 ± 0.08 | 58.52 ± 0.05 | 62.84 ± 0.10 | 61.75 ± 0.10 | 16.35 ± 0.05 |
SEA-LION v4 (Gemma) 27B AISG | 54.67 ± 0.13 | 74.97 ± 0.75 | 45.79 ± 0.22 | 58.39 ± 0.05 | 68.83 ± 0.16 | 61.98 ± 0.13 | 18.08 ± 0.14 |
Mistral Medium 3.5 128B Mistral AI | 54.67 ± 0.22 | 71.47 ± 0.85 | 49.69 ± 0.63 | 54.12 ± 0.09 | 61.30 ± 0.39 | 59.95 ± 0.28 | 31.49 ± 0.28 |
Gemma 3 27B | 54.36 ± 0.13 | 73.23 ± 0.74 | 45.12 ± 0.18 | 58.46 ± 0.06 | 68.52 ± 0.11 | 62.50 ± 0.12 | 18.34 ± 0.13 |
SEA-LION v3 (Gemma 2) 9B AISG | 54.10 ± 0.22 | 73.57 ± 1.09 | 42.12 ± 0.52 | 55.72 ± 0.07 | 62.51 ± 0.18 | 57.66 ± 0.20 | 33.03 ± 0.33 |
Qwen 3 VL 4B Alibaba | 54.07 ± 0.12 | 81.37 ± 0.59 | 36.46 ± 0.32 | 54.21 ± 0.05 | 57.03 ± 0.15 | 57.90 ± 0.13 | 37.43 ± 0.17 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 53.42 ± 0.18 | 69.00 ± 0.70 | 48.48 ± 0.60 | 55.27 ± 0.07 | 63.91 ± 0.48 | 56.38 ± 0.34 | 27.47 ± 0.45 |
Qwen 3.5 9B Alibaba | 53.14 ± 0.21 | 76.93 ± 0.89 | 40.61 ± 0.73 | 51.86 ± 0.08 | 49.10 ± 0.56 | 55.99 ± 0.33 | 44.36 ± 0.51 |
Gemma 3 12B | 52.60 ± 0.00 | 70.00 ± 0.00 | 44.16 ± 0.00 | 57.94 ± 0.00 | 67.85 ± 0.00 | 60.53 ± 0.00 | 15.12 ± 0.00 |
Mistral Small 4 119B MoE Mistral AI | 49.10 ± 0.21 | 64.23 ± 0.97 | 41.65 ± 0.74 | 56.05 ± 0.07 | 44.81 ± 0.51 | 56.89 ± 0.38 | 30.98 ± 0.40 |
SEA-LION v3 (Llama) 8B AISG | 46.52 ± 0.18 | 70.90 ± 0.95 | 28.95 ± 0.62 | 56.36 ± 0.07 | 54.45 ± 0.49 | 54.81 ± 0.27 | 13.66 ± 0.26 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 45.32 ± 0.10 | 75.97 ± 0.63 | 25.48 ± 0.30 | 51.51 ± 0.07 | 38.21 ± 0.25 | 53.83 ± 0.14 | 26.90 ± 0.34 |
Gemma 4 (E2B) 5B | 44.30 ± 0.18 | 74.07 ± 0.84 | 25.15 ± 0.29 | 54.14 ± 0.07 | 31.65 ± 0.18 | 52.92 ± 0.12 | 27.85 ± 0.36 |
Qwen 3.5 4B Alibaba | 44.29 ± 0.19 | 67.97 ± 0.86 | 36.29 ± 0.72 | 45.85 ± 0.10 | 24.15 ± 0.66 | 54.75 ± 0.34 | 36.76 ± 0.68 |
SEA-LION v4 (Gemma VL) 4B AISG | 43.61 ± 0.15 | 70.30 ± 0.84 | 25.62 ± 0.23 | 54.60 ± 0.05 | 46.11 ± 0.15 | 56.39 ± 0.20 | 8.62 ± 0.09 |
MERaLiON 2 10B A*STAR | 42.98 ± 0.23 | 53.93 ± 1.06 | 38.66 ± 0.46 | 50.77 ± 0.08 | 55.72 ± 0.27 | 31.83 ± 0.43 | 26.97 ± 0.25 |
SEA-LION v4 (Apertus) 8B AISG | 42.94 ± 0.16 | 49.23 ± 0.81 | 24.18 ± 0.34 | 56.07 ± 0.07 | 34.18 ± 0.30 | 56.71 ± 0.20 | 37.23 ± 0.36 |
Gemma 3 4B | 40.61 ± 0.17 | 64.70 ± 0.90 | 21.59 ± 0.32 | 53.57 ± 0.06 | 43.45 ± 0.14 | 53.19 ± 0.21 | 7.16 ± 0.05 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 38.99 ± 0.29 | 64.73 ± 1.17 | 26.84 ± 0.72 | 50.43 ± 0.09 | 22.02 ± 0.67 | 39.80 ± 0.59 | 30.08 ± 0.88 |
Llama 3.1 8B Meta | 37.18 ± 0.15 | 63.87 ± 0.64 | 28.02 ± 0.51 | 44.40 ± 0.10 | 31.03 ± 0.42 | 55.76 ± 0.31 | 0.00 ± 0.00 |
Llama 3.2 3B Meta | 36.15 ± 0.24 | 54.87 ± 0.93 | 18.87 ± 0.43 | 50.28 ± 0.09 | 13.00 ± 0.73 | 43.21 ± 0.41 | 36.70 ± 0.44 |
Apertus 8B Swiss AI | 34.48 ± 0.30 | 52.10 ± 1.05 | 14.45 ± 0.75 | 50.57 ± 0.14 | 25.46 ± 0.80 | 37.59 ± 0.42 | 26.74 ± 0.72 |
Olmo 3.1 32B AI2 | 34.13 ± 0.16 | 61.47 ± 0.69 | 16.65 ± 0.51 | 47.04 ± 0.06 | 29.64 ± 0.38 | 49.98 ± 0.34 | 0.00 ± 0.00 |
GLM 4.7 Flash 30B MoE Z.ai | 32.10 ± 0.25 | 58.63 ± 0.97 | 12.63 ± 0.63 | 47.88 ± 0.12 | 13.13 ± 0.73 | 47.48 ± 0.50 | 12.86 ± 0.68 |
MERaLiON 2 3B A*STAR | 22.94 ± 0.34 | 31.47 ± 1.50 | 14.83 ± 0.81 | 37.37 ± 0.12 | 20.03 ± 0.61 | 25.84 ± 0.27 | 8.12 ± 1.05 |
Tiny Aya Water 3B CohereLabs | 9.62 ± 0.19 | 23.50 ± 1.14 | 0.13 ± 0.09 | 20.01 ± 0.15 | 0.00 ± 0.00 | 14.07 ± 0.39 | 0.00 ± 0.00 |
Tiny Aya Global 3B CohereLabs | 8.04 ± 0.20 | 18.57 ± 1.07 | 0.03 ± 0.04 | 16.18 ± 0.14 | 0.00 ± 0.00 | 13.49 ± 0.44 | 0.00 ± 0.00 |
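
The TH column appears to be the unweighted mean of the six competency columns. This is inferred from the published numbers rather than stated on the page, so treat it as an assumption. A minimal check against three rows copied from the table (the helper name `overall_score` is illustrative):

```python
from statistics import fmean

# (reported TH, competency scores) copied from the table, in column order:
# Instruction Following, Knowledge, NLG, NLR, NLU, Safety.
rows = {
    "SEA-LION v4.5 (Qwen) 27B": (66.42, [91.87, 57.52, 58.29, 74.62, 63.91, 52.30]),
    "Gemma 4 31B": (62.35, [93.03, 58.72, 59.48, 79.74, 63.48, 19.67]),
    "Llama 3.1 8B": (37.18, [63.87, 28.02, 44.40, 31.03, 55.76, 0.00]),
}


def overall_score(competencies):
    """Unweighted mean of the competency scores (assumed aggregation rule)."""
    return fmean(competencies)


for name, (reported_th, competencies) in rows.items():
    computed = overall_score(competencies)
    print(f"{name}: reported {reported_th:.2f}, computed {computed:.2f}")
```

Because every competency carries equal weight under this rule, a low Safety score pulls the overall TH score down as much as a low Knowledge score would.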
Thai Tasks
Scores are the average of 30 bootstraps, shown with 95% confidence intervals (CI).
Model Size: ≤200B
Open instruct models only
Model | TH | Instruction Following | SEA-IFEval |
|---|---|---|---|
SEA-LION v4.5 (Qwen) 27B AISG | 66.42 ± 0.11 | 91.87 ± 0.45 | 91.87 ± 0.45 |
Qwen 3.6 27B Alibaba | 63.24 ± 0.19 | 85.43 ± 0.78 | 85.43 ± 0.78 |
Qwen 3.5 27B Alibaba | 63.08 ± 0.13 | 89.17 ± 0.50 | 89.17 ± 0.50 |
Gemma 4 31B | 62.35 ± 0.07 | 93.03 ± 0.32 | 93.03 ± 0.32 |
Qwen 3 VL 32B Alibaba | 61.74 ± 0.10 | 86.27 ± 0.58 | 86.27 ± 0.58 |
SEA-LION v4 (Qwen) 32B AISG | 61.24 ± 0.08 | 82.90 ± 0.43 | 82.90 ± 0.43 |
Qwen 3.5 122B MoE Alibaba | 61.21 ± 0.15 | 85.23 ± 0.55 | 85.23 ± 0.55 |
Gemma 4 26B MoE | 60.14 ± 0.10 | 90.17 ± 0.51 | 90.17 ± 0.51 |
SEA-LION v3 (Llama) 70B AISG | 59.78 ± 0.17 | 87.60 ± 0.69 | 87.60 ± 0.69 |
SEA-LION v4 (Qwen VL) 8B AISG | 57.38 ± 0.08 | 76.63 ± 0.33 | 76.63 ± 0.33 |
Qwen 3 VL 8B Alibaba | 57.16 ± 0.12 | 76.00 ± 0.51 | 76.00 ± 0.51 |
Llama 3.3 70B Meta | 56.90 ± 0.07 | 83.30 ± 0.35 | 83.30 ± 0.35 |
Qwen 3.6 35B MoE Alibaba | 56.26 ± 0.18 | 81.53 ± 0.88 | 81.53 ± 0.88 |
SEA-LION v4 (Qwen VL) 4B AISG | 56.21 ± 0.12 | 81.83 ± 0.65 | 81.83 ± 0.65 |
Gemma 4 (E4B) 8B | 56.16 ± 0.20 | 77.70 ± 0.75 | 77.70 ± 0.75 |
Qwen 3.5 35B MoE Alibaba | 55.33 ± 0.20 | 79.00 ± 0.88 | 79.00 ± 0.88 |
Llama 4 Scout 109B MoE Meta | 54.85 ± 0.08 | 87.70 ± 0.47 | 87.70 ± 0.47 |
SEA-LION v4 (Gemma) 27B AISG | 54.67 ± 0.13 | 74.97 ± 0.75 | 74.97 ± 0.75 |
Mistral Medium 3.5 128B Mistral AI | 54.67 ± 0.22 | 71.47 ± 0.85 | 71.47 ± 0.85 |
Gemma 3 27B | 54.36 ± 0.13 | 73.23 ± 0.74 | 73.23 ± 0.74 |
SEA-LION v3 (Gemma 2) 9B AISG | 54.10 ± 0.22 | 73.57 ± 1.09 | 73.57 ± 1.09 |
Qwen 3 VL 4B Alibaba | 54.07 ± 0.12 | 81.37 ± 0.59 | 81.37 ± 0.59 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 53.42 ± 0.18 | 69.00 ± 0.70 | 69.00 ± 0.70 |
Qwen 3.5 9B Alibaba | 53.14 ± 0.21 | 76.93 ± 0.89 | 76.93 ± 0.89 |
Gemma 3 12B | 52.60 ± 0.00 | 70.00 ± 0.00 | 70.00 ± 0.00 |
Mistral Small 4 119B MoE Mistral AI | 49.10 ± 0.21 | 64.23 ± 0.97 | 64.23 ± 0.97 |
SEA-LION v3 (Llama) 8B AISG | 46.52 ± 0.18 | 70.90 ± 0.95 | 70.90 ± 0.95 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 45.32 ± 0.10 | 75.97 ± 0.63 | 75.97 ± 0.63 |
Gemma 4 (E2B) 5B | 44.30 ± 0.18 | 74.07 ± 0.84 | 74.07 ± 0.84 |
Qwen 3.5 4B Alibaba | 44.29 ± 0.19 | 67.97 ± 0.86 | 67.97 ± 0.86 |
SEA-LION v4 (Gemma VL) 4B AISG | 43.61 ± 0.15 | 70.30 ± 0.84 | 70.30 ± 0.84 |
MERaLiON 2 10B A*STAR | 42.98 ± 0.23 | 53.93 ± 1.06 | 53.93 ± 1.06 |
SEA-LION v4 (Apertus) 8B AISG | 42.94 ± 0.16 | 49.23 ± 0.81 | 49.23 ± 0.81 |
Gemma 3 4B | 40.61 ± 0.17 | 64.70 ± 0.90 | 64.70 ± 0.90 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 38.99 ± 0.29 | 64.73 ± 1.17 | 64.73 ± 1.17 |
Llama 3.1 8B Meta | 37.18 ± 0.15 | 63.87 ± 0.64 | 63.87 ± 0.64 |
Llama 3.2 3B Meta | 36.15 ± 0.24 | 54.87 ± 0.93 | 54.87 ± 0.93 |
Apertus 8B Swiss AI | 34.48 ± 0.30 | 52.10 ± 1.05 | 52.10 ± 1.05 |
Olmo 3.1 32B AI2 | 34.13 ± 0.16 | 61.47 ± 0.69 | 61.47 ± 0.69 |
GLM 4.7 Flash 30B MoE Z.ai | 32.10 ± 0.25 | 58.63 ± 0.97 | 58.63 ± 0.97 |
MERaLiON 2 3B A*STAR | 22.94 ± 0.34 | 31.47 ± 1.50 | 31.47 ± 1.50 |
Tiny Aya Water 3B CohereLabs | 9.62 ± 0.19 | 23.50 ± 1.14 | 23.50 ± 1.14 |
Tiny Aya Global 3B CohereLabs | 8.04 ± 0.20 | 18.57 ± 1.07 | 18.57 ± 1.07 |

Model | TH | Knowledge | thai_exam |
|---|---|---|---|
SEA-LION v4.5 (Qwen) 27B AISG | 66.42 ± 0.11 | 57.52 ± 0.37 | 57.52 ± 0.37 |
Qwen 3.6 27B Alibaba | 63.24 ± 0.19 | 52.84 ± 0.46 | 52.84 ± 0.46 |
Qwen 3.5 27B Alibaba | 63.08 ± 0.13 | 59.43 ± 0.39 | 59.43 ± 0.39 |
Gemma 4 31B | 62.35 ± 0.07 | 58.72 ± 0.21 | 58.72 ± 0.21 |
Qwen 3 VL 32B Alibaba | 61.74 ± 0.10 | 54.11 ± 0.28 | 54.11 ± 0.28 |
SEA-LION v4 (Qwen) 32B AISG | 61.24 ± 0.08 | 52.69 ± 0.14 | 52.69 ± 0.14 |
Qwen 3.5 122B MoE Alibaba | 61.21 ± 0.15 | 57.03 ± 0.36 | 57.03 ± 0.36 |
Gemma 4 26B MoE | 60.14 ± 0.10 | 49.03 ± 0.26 | 49.03 ± 0.26 |
SEA-LION v3 (Llama) 70B AISG | 59.78 ± 0.17 | 49.83 ± 0.45 | 49.83 ± 0.45 |
SEA-LION v4 (Qwen VL) 8B AISG | 57.38 ± 0.08 | 44.95 ± 0.42 | 44.95 ± 0.42 |
Qwen 3 VL 8B Alibaba | 57.16 ± 0.12 | 43.18 ± 0.49 | 43.18 ± 0.49 |
Llama 3.3 70B Meta | 56.90 ± 0.07 | 48.21 ± 0.23 | 48.21 ± 0.23 |
Qwen 3.6 35B MoE Alibaba | 56.26 ± 0.18 | 37.73 ± 0.71 | 37.73 ± 0.71 |
SEA-LION v4 (Qwen VL) 4B AISG | 56.21 ± 0.12 | 39.86 ± 0.21 | 39.86 ± 0.21 |
Gemma 4 (E4B) 8B | 56.16 ± 0.20 | 44.66 ± 0.38 | 44.66 ± 0.38 |
Qwen 3.5 35B MoE Alibaba | 55.33 ± 0.20 | 46.44 ± 0.56 | 46.44 ± 0.56 |
Llama 4 Scout 109B MoE Meta | 54.85 ± 0.08 | 41.94 ± 0.08 | 41.94 ± 0.08 |
SEA-LION v4 (Gemma) 27B AISG | 54.67 ± 0.13 | 45.79 ± 0.22 | 45.79 ± 0.22 |
Mistral Medium 3.5 128B Mistral AI | 54.67 ± 0.22 | 49.69 ± 0.63 | 49.69 ± 0.63 |
Gemma 3 27B | 54.36 ± 0.13 | 45.12 ± 0.18 | 45.12 ± 0.18 |
SEA-LION v3 (Gemma 2) 9B AISG | 54.10 ± 0.22 | 42.12 ± 0.52 | 42.12 ± 0.52 |
Qwen 3 VL 4B Alibaba | 54.07 ± 0.12 | 36.46 ± 0.32 | 36.46 ± 0.32 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 53.42 ± 0.18 | 48.48 ± 0.60 | 48.48 ± 0.60 |
Qwen 3.5 9B Alibaba | 53.14 ± 0.21 | 40.61 ± 0.73 | 40.61 ± 0.73 |
Gemma 3 12B | 52.60 ± 0.00 | 44.16 ± 0.00 | 44.16 ± 0.00 |
Mistral Small 4 119B MoE Mistral AI | 49.10 ± 0.21 | 41.65 ± 0.74 | 41.65 ± 0.74 |
SEA-LION v3 (Llama) 8B AISG | 46.52 ± 0.18 | 28.95 ± 0.62 | 28.95 ± 0.62 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 45.32 ± 0.10 | 25.48 ± 0.30 | 25.48 ± 0.30 |
Gemma 4 (E2B) 5B | 44.30 ± 0.18 | 25.15 ± 0.29 | 25.15 ± 0.29 |
Qwen 3.5 4B Alibaba | 44.29 ± 0.19 | 36.29 ± 0.72 | 36.29 ± 0.72 |
SEA-LION v4 (Gemma VL) 4B AISG | 43.61 ± 0.15 | 25.62 ± 0.23 | 25.62 ± 0.23 |
MERaLiON 2 10B A*STAR | 42.98 ± 0.23 | 38.66 ± 0.46 | 38.66 ± 0.46 |
SEA-LION v4 (Apertus) 8B AISG | 42.94 ± 0.16 | 24.18 ± 0.34 | 24.18 ± 0.34 |
Gemma 3 4B | 40.61 ± 0.17 | 21.59 ± 0.32 | 21.59 ± 0.32 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 38.99 ± 0.29 | 26.84 ± 0.72 | 26.84 ± 0.72 |
Llama 3.1 8B Meta | 37.18 ± 0.15 | 28.02 ± 0.51 | 28.02 ± 0.51 |
Llama 3.2 3B Meta | 36.15 ± 0.24 | 18.87 ± 0.43 | 18.87 ± 0.43 |
Apertus 8B Swiss AI | 34.48 ± 0.30 | 14.45 ± 0.75 | 14.45 ± 0.75 |
Olmo 3.1 32B AI2 | 34.13 ± 0.16 | 16.65 ± 0.51 | 16.65 ± 0.51 |
GLM 4.7 Flash 30B MoE Z.ai | 32.10 ± 0.25 | 12.63 ± 0.63 | 12.63 ± 0.63 |
MERaLiON 2 3B A*STAR | 22.94 ± 0.34 | 14.83 ± 0.81 | 14.83 ± 0.81 |
Tiny Aya Water 3B CohereLabs | 9.62 ± 0.19 | 0.13 ± 0.09 | 0.13 ± 0.09 |
Tiny Aya Global 3B CohereLabs | 8.04 ± 0.20 | 0.03 ± 0.04 | 0.03 ± 0.04 |

Model | TH | NLG | Summarization | Translations |
|---|---|---|---|---|
SEA-LION v4.5 (Qwen) 27B AISG | 66.42 ± 0.11 | 58.29 ± 0.04 | 23.72 ± 0.08 | 92.86 ± 0.02 |
Qwen 3.6 27B Alibaba | 63.24 ± 0.19 | 57.87 ± 0.06 | 24.00 ± 0.12 | 91.74 ± 0.03 |
Qwen 3.5 27B Alibaba | 63.08 ± 0.13 | 57.96 ± 0.05 | 23.09 ± 0.11 | 92.83 ± 0.02 |
Gemma 4 31B | 62.35 ± 0.07 | 59.48 ± 0.06 | 25.08 ± 0.11 | 93.88 ± 0.01 |
Qwen 3 VL 32B Alibaba | 61.74 ± 0.10 | 56.74 ± 0.04 | 21.95 ± 0.07 | 91.53 ± 0.02 |
SEA-LION v4 (Qwen) 32B AISG | 61.24 ± 0.08 | 57.31 ± 0.04 | 23.82 ± 0.08 | 90.81 ± 0.02 |
Qwen 3.5 122B MoE Alibaba | 61.21 ± 0.15 | 58.31 ± 0.07 | 23.68 ± 0.13 | 92.94 ± 0.02 |
Gemma 4 26B MoE | 60.14 ± 0.10 | 59.07 ± 0.04 | 24.68 ± 0.08 | 93.45 ± 0.01 |
SEA-LION v3 (Llama) 70B AISG | 59.78 ± 0.17 | 58.59 ± 0.05 | 25.42 ± 0.10 | 91.77 ± 0.02 |
SEA-LION v4 (Qwen VL) 8B AISG | 57.38 ± 0.08 | 56.31 ± 0.05 | 23.30 ± 0.08 | 89.33 ± 0.03 |
Qwen 3 VL 8B Alibaba | 57.16 ± 0.12 | 55.71 ± 0.06 | 22.42 ± 0.11 | 89.00 ± 0.03 |
Llama 3.3 70B Meta | 56.90 ± 0.07 | 56.07 ± 0.09 | 22.92 ± 0.16 | 89.22 ± 0.03 |
Qwen 3.6 35B MoE Alibaba | 56.26 ± 0.18 | 57.33 ± 0.07 | 23.43 ± 0.13 | 91.24 ± 0.03 |
SEA-LION v4 (Qwen VL) 4B AISG | 56.21 ± 0.12 | 54.59 ± 0.06 | 22.66 ± 0.11 | 86.52 ± 0.04 |
Gemma 4 (E4B) 8B | 56.16 ± 0.20 | 57.45 ± 0.05 | 23.03 ± 0.11 | 91.87 ± 0.02 |
Qwen 3.5 35B MoE Alibaba | 55.33 ± 0.20 | 56.61 ± 0.06 | 22.33 ± 0.11 | 90.90 ± 0.03 |
Llama 4 Scout 109B MoE Meta | 54.85 ± 0.08 | 58.52 ± 0.05 | 25.82 ± 0.10 | 91.21 ± 0.02 |
SEA-LION v4 (Gemma) 27B AISG | 54.67 ± 0.13 | 58.39 ± 0.05 | 23.74 ± 0.10 | 93.04 ± 0.02 |
Mistral Medium 3.5 128B Mistral AI | 54.67 ± 0.22 | 54.12 ± 0.09 | 23.85 ± 0.16 | 84.40 ± 0.10 |
Gemma 3 27B | 54.36 ± 0.13 | 58.46 ± 0.06 | 23.88 ± 0.11 | 93.05 ± 0.02 |
SEA-LION v3 (Gemma 2) 9B AISG | 54.10 ± 0.22 | 55.72 ± 0.07 | 23.94 ± 0.14 | 87.49 ± 0.04 |
Qwen 3 VL 4B Alibaba | 54.07 ± 0.12 | 54.21 ± 0.05 | 22.13 ± 0.09 | 86.29 ± 0.04 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 53.42 ± 0.18 | 55.27 ± 0.07 | 22.26 ± 0.11 | 88.27 ± 0.06 |
Qwen 3.5 9B Alibaba | 53.14 ± 0.21 | 51.86 ± 0.08 | 20.94 ± 0.14 | 82.78 ± 0.09 |
Gemma 3 12B | 52.60 ± 0.00 | 57.94 ± 0.00 | 23.59 ± 0.00 | 92.29 ± 0.00 |
Mistral Small 4 119B MoE Mistral AI | 49.10 ± 0.21 | 56.05 ± 0.07 | 22.39 ± 0.12 | 89.71 ± 0.06 |
SEA-LION v3 (Llama) 8B AISG | 46.52 ± 0.18 | 56.36 ± 0.07 | 23.88 ± 0.12 | 88.84 ± 0.05 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 45.32 ± 0.10 | 51.51 ± 0.07 | 21.74 ± 0.12 | 81.29 ± 0.09 |
Gemma 4 (E2B) 5B | 44.30 ± 0.18 | 54.14 ± 0.07 | 21.70 ± 0.10 | 86.57 ± 0.08 |
Qwen 3.5 4B Alibaba | 44.29 ± 0.19 | 45.85 ± 0.10 | 20.74 ± 0.12 | 70.96 ± 0.14 |
SEA-LION v4 (Gemma VL) 4B AISG | 43.61 ± 0.15 | 54.60 ± 0.05 | 20.45 ± 0.09 | 88.75 ± 0.03 |
MERaLiON 2 10B A*STAR | 42.98 ± 0.23 | 50.77 ± 0.08 | 20.03 ± 0.16 | 81.51 ± 0.09 |
SEA-LION v4 (Apertus) 8B AISG | 42.94 ± 0.16 | 56.07 ± 0.07 | 22.67 ± 0.13 | 89.46 ± 0.03 |
Gemma 3 4B | 40.61 ± 0.17 | 53.57 ± 0.06 | 18.18 ± 0.11 | 88.96 ± 0.03 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 38.99 ± 0.29 | 50.43 ± 0.09 | 20.13 ± 0.17 | 80.74 ± 0.10 |
Llama 3.1 8B Meta | 37.18 ± 0.15 | 44.40 ± 0.10 | 24.95 ± 0.17 | 63.85 ± 0.10 |
Llama 3.2 3B Meta | 36.15 ± 0.24 | 50.28 ± 0.09 | 24.12 ± 0.15 | 76.44 ± 0.09 |
Apertus 8B Swiss AI | 34.48 ± 0.30 | 50.57 ± 0.14 | 21.47 ± 0.24 | 79.68 ± 0.10 |
Olmo 3.1 32B AI2 | 34.13 ± 0.16 | 47.04 ± 0.06 | 20.83 ± 0.10 | 73.26 ± 0.08 |
GLM 4.7 Flash 30B MoE Z.ai | 32.10 ± 0.25 | 47.88 ± 0.12 | 19.53 ± 0.18 | 76.22 ± 0.13 |
MERaLiON 2 3B A*STAR | 22.94 ± 0.34 | 37.37 ± 0.12 | 15.65 ± 0.18 | 59.09 ± 0.16 |
Tiny Aya Water 3B CohereLabs | 9.62 ± 0.19 | 20.01 ± 0.15 | 10.98 ± 0.25 | 29.04 ± 0.14 |
Tiny Aya Global 3B CohereLabs | 8.04 ± 0.20 | 16.18 ± 0.14 | 10.05 ± 0.20 | 22.31 ± 0.17 |

Model | TH | NLR | Causal Reasoning | Natural Language Inference |
|---|---|---|---|---|
SEA-LION v4.5 (Qwen) 27B AISG | 66.42 ± 0.11 | 74.62 ± 0.22 | 93.51 ± 0.31 | 55.74 ± 0.31 |
Qwen 3.6 27B Alibaba | 63.24 ± 0.19 | 71.61 ± 0.32 | 91.13 ± 0.34 | 52.10 ± 0.55 |
Qwen 3.5 27B Alibaba | 63.08 ± 0.13 | 70.41 ± 0.30 | 91.13 ± 0.47 | 49.69 ± 0.35 |
Gemma 4 31B | 62.35 ± 0.07 | 79.74 ± 0.06 | 96.89 ± 0.11 | 62.59 ± 0.06 |
Qwen 3 VL 32B Alibaba | 61.74 ± 0.10 | 70.00 ± 0.16 | 88.97 ± 0.15 | 51.03 ± 0.25 |
SEA-LION v4 (Qwen) 32B AISG | 61.24 ± 0.08 | 70.76 ± 0.07 | 91.84 ± 0.12 | 49.69 ± 0.09 |
Qwen 3.5 122B MoE Alibaba | 61.21 ± 0.15 | 64.89 ± 0.28 | 89.25 ± 0.47 | 40.52 ± 0.30 |
Gemma 4 26B MoE | 60.14 ± 0.10 | 73.42 ± 0.14 | 92.37 ± 0.25 | 54.47 ± 0.12 |
SEA-LION v3 (Llama) 70B AISG | 59.78 ± 0.17 | 70.32 ± 0.33 | 90.49 ± 0.43 | 50.15 ± 0.54 |
SEA-LION v4 (Qwen VL) 8B AISG | 57.38 ± 0.08 | 61.39 ± 0.13 | 82.25 ± 0.17 | 40.53 ± 0.17 |
Qwen 3 VL 8B Alibaba | 57.16 ± 0.12 | 61.88 ± 0.14 | 81.71 ± 0.23 | 42.05 ± 0.20 |
Llama 3.3 70B Meta | 56.90 ± 0.07 | 72.76 ± 0.14 | 91.99 ± 0.17 | 53.53 ± 0.21 |
Qwen 3.6 35B MoE Alibaba | 56.26 ± 0.18 | 61.15 ± 0.41 | 80.17 ± 0.73 | 42.14 ± 0.58 |
SEA-LION v4 (Qwen VL) 4B AISG | 56.21 ± 0.12 | 61.97 ± 0.15 | 75.29 ± 0.19 | 48.65 ± 0.19 |
Gemma 4 (E4B) 8B | 56.16 ± 0.20 | 61.58 ± 0.27 | 82.95 ± 0.47 | 40.22 ± 0.33 |
Qwen 3.5 35B MoE Alibaba | 55.33 ± 0.20 | 54.78 ± 0.49 | 67.07 ± 0.83 | 42.49 ± 0.42 |
Llama 4 Scout 109B MoE Meta | 54.85 ± 0.08 | 62.84 ± 0.10 | 88.32 ± 0.14 | 37.36 ± 0.14 |
SEA-LION v4 (Gemma) 27B AISG | 54.67 ± 0.13 | 68.83 ± 0.16 | 88.25 ± 0.20 | 49.40 ± 0.21 |
Mistral Medium 3.5 128B Mistral AI | 54.67 ± 0.22 | 61.30 ± 0.39 | 78.07 ± 0.67 | 44.54 ± 0.36 |
Gemma 3 27B | 54.36 ± 0.13 | 68.52 ± 0.11 | 87.99 ± 0.19 | 49.06 ± 0.15 |
SEA-LION v3 (Gemma 2) 9B AISG | 54.10 ± 0.22 | 62.51 ± 0.18 | 84.95 ± 0.21 | 40.08 ± 0.33 |
Qwen 3 VL 4B Alibaba | 54.07 ± 0.12 | 57.03 ± 0.15 | 71.11 ± 0.30 | 42.96 ± 0.20 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 53.42 ± 0.18 | 63.91 ± 0.48 | 79.37 ± 0.68 | 48.45 ± 0.57 |
Qwen 3.5 9B Alibaba | 53.14 ± 0.21 | 49.10 ± 0.56 | 71.87 ± 0.83 | 26.34 ± 0.57 |
Gemma 3 12B | 52.60 ± 0.00 | 67.85 ± 0.00 | 85.20 ± 0.00 | 50.50 ± 0.00 |
Mistral Small 4 119B MoE Mistral AI | 49.10 ± 0.21 | 44.81 ± 0.51 | 55.93 ± 0.97 | 33.69 ± 0.40 |
SEA-LION v3 (Llama) 8B AISG | 46.52 ± 0.18 | 54.45 ± 0.49 | 67.79 ± 0.66 | 41.12 ± 0.64 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 45.32 ± 0.10 | 38.21 ± 0.25 | 62.57 ± 0.36 | 13.85 ± 0.31 |
Gemma 4 (E2B) 5B | 44.30 ± 0.18 | 31.65 ± 0.18 | 63.29 ± 0.35 | 0.00 ± 0.00 |
Qwen 3.5 4B Alibaba | 44.29 ± 0.19 | 24.15 ± 0.66 | 42.56 ± 1.13 | 5.74 ± 0.44 |
SEA-LION v4 (Gemma VL) 4B AISG | 43.61 ± 0.15 | 46.11 ± 0.15 | 60.49 ± 0.29 | 31.74 ± 0.16 |
MERaLiON 2 10B A*STAR | 42.98 ± 0.23 | 55.72 ± 0.27 | 82.47 ± 0.30 | 28.97 ± 0.38 |
SEA-LION v4 (Apertus) 8B AISG | 42.94 ± 0.16 | 34.18 ± 0.30 | 63.25 ± 0.58 | 5.11 ± 0.20 |
Gemma 3 4B | 40.61 ± 0.17 | 43.45 ± 0.14 | 58.12 ± 0.24 | 28.78 ± 0.12 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 38.99 ± 0.29 | 22.02 ± 0.67 | 40.96 ± 1.07 | 3.08 ± 0.63 |
Llama 3.1 8B Meta | 37.18 ± 0.15 | 31.03 ± 0.42 | 32.20 ± 0.63 | 29.85 ± 0.54 |
Llama 3.2 3B Meta | 36.15 ± 0.24 | 13.00 ± 0.73 | 11.80 ± 1.20 | 14.20 ± 0.50 |
Apertus 8B Swiss AI | 34.48 ± 0.30 | 25.46 ± 0.80 | 42.05 ± 1.32 | 8.87 ± 0.78 |
Olmo 3.1 32B AI2 | 34.13 ± 0.16 | 29.64 ± 0.38 | 31.20 ± 0.62 | 28.09 ± 0.50 |
GLM 4.7 Flash 30B MoE Z.ai | 32.10 ± 0.25 | 13.13 ± 0.73 | 11.91 ± 1.24 | 14.35 ± 0.70 |
MERaLiON 2 3B A*STAR | 22.94 ± 0.34 | 20.03 ± 0.61 | 26.65 ± 1.05 | 13.41 ± 0.64 |
Tiny Aya Water 3B CohereLabs | 9.62 ± 0.19 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Tiny Aya Global 3B CohereLabs | 8.04 ± 0.20 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |

Model | TH | NLU | Question Answering | Sentiment Analysis |
|---|---|---|---|---|
SEA-LION v4.5 (Qwen) 27B AISG | 66.42 ± 0.11 | 63.91 ± 0.16 | 87.54 ± 0.16 | 40.27 ± 0.28 |
Qwen 3.6 27B Alibaba | 63.24 ± 0.19 | 62.79 ± 0.27 | 87.69 ± 0.36 | 37.89 ± 0.44 |
Qwen 3.5 27B Alibaba | 63.08 ± 0.13 | 61.33 ± 0.20 | 87.19 ± 0.31 | 35.47 ± 0.36 |
Gemma 4 31B | 62.35 ± 0.07 | 63.48 ± 0.12 | 85.44 ± 0.18 | 41.52 ± 0.14 |
Qwen 3 VL 32B Alibaba | 61.74 ± 0.10 | 61.30 ± 0.10 | 85.69 ± 0.16 | 36.92 ± 0.14 |
SEA-LION v4 (Qwen) 32B AISG | 61.24 ± 0.08 | 61.88 ± 0.09 | 87.88 ± 0.10 | 35.87 ± 0.15 |
Qwen 3.5 122B MoE Alibaba | 61.21 ± 0.15 | 62.88 ± 0.23 | 87.13 ± 0.21 | 38.64 ± 0.39 |
Gemma 4 26B MoE | 60.14 ± 0.10 | 63.57 ± 0.13 | 84.24 ± 0.15 | 42.90 ± 0.20 |
SEA-LION v3 (Llama) 70B AISG | 59.78 ± 0.17 | 60.15 ± 0.28 | 85.77 ± 0.38 | 34.53 ± 0.38 |
SEA-LION v4 (Qwen VL) 8B AISG | 57.38 ± 0.08 | 60.66 ± 0.10 | 87.12 ± 0.21 | 34.20 ± 0.07 |
Qwen 3 VL 8B Alibaba | 57.16 ± 0.12 | 60.38 ± 0.11 | 86.54 ± 0.16 | 34.23 ± 0.15 |
Llama 3.3 70B Meta | 56.90 ± 0.07 | 59.64 ± 0.16 | 86.01 ± 0.26 | 33.26 ± 0.23 |
Qwen 3.6 35B MoE Alibaba | 56.26 ± 0.18 | 61.12 ± 0.28 | 86.38 ± 0.39 | 35.86 ± 0.40 |
SEA-LION v4 (Qwen VL) 4B AISG | 56.21 ± 0.12 | 58.78 ± 0.08 | 83.35 ± 0.09 | 34.21 ± 0.15 |
Gemma 4 (E4B) 8B | 56.16 ± 0.20 | 57.99 ± 0.17 | 82.21 ± 0.26 | 33.77 ± 0.24 |
Qwen 3.5 35B MoE Alibaba | 55.33 ± 0.20 | 60.19 ± 0.33 | 85.29 ± 0.45 | 35.09 ± 0.42 |
Llama 4 Scout 109B MoE Meta | 54.85 ± 0.08 | 61.75 ± 0.10 | 89.13 ± 0.17 | 34.37 ± 0.09 |
SEA-LION v4 (Gemma) 27B AISG | 54.67 ± 0.13 | 61.98 ± 0.13 | 85.66 ± 0.18 | 38.31 ± 0.16 |
Mistral Medium 3.5 128B Mistral AI | 54.67 ± 0.22 | 59.95 ± 0.28 | 85.22 ± 0.51 | 34.68 ± 0.45 |
Gemma 3 27B | 54.36 ± 0.13 | 62.50 ± 0.12 | 86.08 ± 0.20 | 38.92 ± 0.15 |
SEA-LION v3 (Gemma 2) 9B AISG | 54.10 ± 0.22 | 57.66 ± 0.20 | 83.09 ± 0.33 | 32.24 ± 0.20 |
Qwen 3 VL 4B Alibaba | 54.07 ± 0.12 | 57.90 ± 0.13 | 82.66 ± 0.24 | 33.15 ± 0.11 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 53.42 ± 0.18 | 56.38 ± 0.34 | 81.81 ± 0.44 | 30.96 ± 0.55 |
Qwen 3.5 9B Alibaba | 53.14 ± 0.21 | 55.99 ± 0.33 | 82.14 ± 0.45 | 29.85 ± 0.55 |
Gemma 3 12B | 52.60 ± 0.00 | 60.53 ± 0.00 | 83.06 ± 0.00 | 38.00 ± 0.00 |
Mistral Small 4 119B MoE Mistral AI | 49.10 ± 0.21 | 56.89 ± 0.38 | 84.50 ± 0.45 | 29.28 ± 0.62 |
SEA-LION v3 (Llama) 8B AISG | 46.52 ± 0.18 | 54.81 ± 0.27 | 80.76 ± 0.48 | 28.87 ± 0.31 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 45.32 ± 0.10 | 53.83 ± 0.14 | 75.31 ± 0.18 | 32.35 ± 0.18 |
Gemma 4 (E2B) 5B | 44.30 ± 0.18 | 52.92 ± 0.12 | 73.57 ± 0.22 | 32.28 ± 0.14 |
Qwen 3.5 4B Alibaba | 44.29 ± 0.19 | 54.75 ± 0.34 | 80.32 ± 0.49 | 29.18 ± 0.61 |
SEA-LION v4 (Gemma VL) 4B AISG | 43.61 ± 0.15 | 56.39 ± 0.20 | 78.52 ± 0.24 | 34.25 ± 0.33 |
MERaLiON 2 10B A*STAR | 42.98 ± 0.23 | 31.83 ± 0.43 | 59.63 ± 0.48 | 4.04 ± 0.77 |
SEA-LION v4 (Apertus) 8B AISG | 42.94 ± 0.16 | 56.71 ± 0.20 | 82.36 ± 0.18 | 31.07 ± 0.32 |
Gemma 3 4B | 40.61 ± 0.17 | 53.19 ± 0.21 | 72.15 ± 0.33 | 34.24 ± 0.28 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 38.99 ± 0.29 | 39.80 ± 0.59 | 63.74 ± 0.81 | 15.85 ± 0.83 |
Llama 3.1 8B Meta | 37.18 ± 0.15 | 55.76 ± 0.31 | 87.01 ± 0.45 | 24.50 ± 0.34 |
Llama 3.2 3B Meta | 36.15 ± 0.24 | 43.21 ± 0.41 | 74.35 ± 0.44 | 12.07 ± 0.63 |
Apertus 8B Swiss AI | 34.48 ± 0.30 | 37.59 ± 0.42 | 75.06 ± 0.82 | 0.12 ± 0.14 |
Olmo 3.1 32B AI2 | 34.13 ± 0.16 | 49.98 ± 0.34 | 79.92 ± 0.41 | 20.04 ± 0.49 |
GLM 4.7 Flash 30B MoE Z.ai | 32.10 ± 0.25 | 47.48 ± 0.50 | 77.90 ± 0.85 | 17.06 ± 0.45 |
MERaLiON 2 3B A*STAR | 22.94 ± 0.34 | 25.84 ± 0.27 | 51.67 ± 0.55 | 0.00 ± 0.00 |
Tiny Aya Water 3B CohereLabs | 9.62 ± 0.19 | 14.07 ± 0.39 | 28.13 ± 0.77 | 0.00 ± 0.00 |
Tiny Aya Global 3B CohereLabs | 8.04 ± 0.20 | 13.49 ± 0.44 | 26.97 ± 0.87 | 0.00 ± 0.00 |

Model | TH | Safety | Toxicity Detection |
|---|---|---|---|
SEA-LION v4.5 (Qwen) 27B AISG | 66.42 ± 0.11 | 52.30 ± 0.40 | 52.30 ± 0.40 |
Qwen 3.6 27B Alibaba | 63.24 ± 0.19 | 48.90 ± 0.42 | 48.90 ± 0.42 |
Qwen 3.5 27B Alibaba | 63.08 ± 0.13 | 40.18 ± 0.43 | 40.18 ± 0.43 |
Gemma 4 31B | 62.35 ± 0.07 | 19.67 ± 0.13 | 19.67 ± 0.13 |
Qwen 3 VL 32B Alibaba | 61.74 ± 0.10 | 41.99 ± 0.21 | 41.99 ± 0.21 |
SEA-LION v4 (Qwen) 32B AISG | 61.24 ± 0.08 | 41.87 ± 0.09 | 41.87 ± 0.09 |
Qwen 3.5 122B MoE Alibaba | 61.21 ± 0.15 | 38.90 ± 0.43 | 38.90 ± 0.43 |
Gemma 4 26B MoE | 60.14 ± 0.10 | 25.62 ± 0.16 | 25.62 ± 0.16 |
SEA-LION v3 (Llama) 70B AISG | 59.78 ± 0.17 | 32.17 ± 0.37 | 32.17 ± 0.37 |
SEA-LION v4 (Qwen VL) 8B AISG | 57.38 ± 0.08 | 44.33 ± 0.19 | 44.33 ± 0.19 |
Qwen 3 VL 8B Alibaba | 57.16 ± 0.12 | 45.82 ± 0.18 | 45.82 ± 0.18 |
Llama 3.3 70B Meta | 56.90 ± 0.07 | 21.40 ± 0.10 | 21.40 ± 0.10 |
Qwen 3.6 35B MoE Alibaba | 56.26 ± 0.18 | 38.67 ± 0.68 | 38.67 ± 0.68 |
SEA-LION v4 (Qwen VL) 4B AISG | 56.21 ± 0.12 | 40.22 ± 0.13 | 40.22 ± 0.13 |
Gemma 4 (E4B) 8B | 56.16 ± 0.20 | 37.60 ± 0.37 | 37.60 ± 0.37 |
Qwen 3.5 35B MoE Alibaba | 55.33 ± 0.20 | 34.98 ± 0.45 | 34.98 ± 0.45 |
Llama 4 Scout 109B MoE Meta | 54.85 ± 0.08 | 16.35 ± 0.05 | 16.35 ± 0.05 |
SEA-LION v4 (Gemma) 27B AISG | 54.67 ± 0.13 | 18.08 ± 0.14 | 18.08 ± 0.14 |
Mistral Medium 3.5 128B Mistral AI | 54.67 ± 0.22 | 31.49 ± 0.28 | 31.49 ± 0.28 |
Gemma 3 27B | 54.36 ± 0.13 | 18.34 ± 0.13 | 18.34 ± 0.13 |
SEA-LION v3 (Gemma 2) 9B AISG | 54.10 ± 0.22 | 33.03 ± 0.33 | 33.03 ± 0.33 |
Qwen 3 VL 4B Alibaba | 54.07 ± 0.12 | 37.43 ± 0.17 | 37.43 ± 0.17 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 53.42 ± 0.18 | 27.47 ± 0.45 | 27.47 ± 0.45 |
Qwen 3.5 9B Alibaba | 53.14 ± 0.21 | 44.36 ± 0.51 | 44.36 ± 0.51 |
Gemma 3 12B | 52.60 ± 0.00 | 15.12 ± 0.00 | 15.12 ± 0.00 |
Mistral Small 4 119B MoE Mistral AI | 49.10 ± 0.21 | 30.98 ± 0.40 | 30.98 ± 0.40 |
SEA-LION v3 (Llama) 8B AISG | 46.52 ± 0.18 | 13.66 ± 0.26 | 13.66 ± 0.26 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 45.32 ± 0.10 | 26.90 ± 0.34 | 26.90 ± 0.34 |
Gemma 4 (E2B) 5B | 44.30 ± 0.18 | 27.85 ± 0.36 | 27.85 ± 0.36 |
Qwen 3.5 4B Alibaba | 44.29 ± 0.19 | 36.76 ± 0.68 | 36.76 ± 0.68 |
SEA-LION v4 (Gemma VL) 4B AISG | 43.61 ± 0.15 | 8.62 ± 0.09 | 8.62 ± 0.09 |
MERaLiON 2 10B A*STAR | 42.98 ± 0.23 | 26.97 ± 0.25 | 26.97 ± 0.25 |
SEA-LION v4 (Apertus) 8B AISG | 42.94 ± 0.16 | 37.23 ± 0.36 | 37.23 ± 0.36 |
Gemma 3 4B | 40.61 ± 0.17 | 7.16 ± 0.05 | 7.16 ± 0.05 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 38.99 ± 0.29 | 30.08 ± 0.88 | 30.08 ± 0.88 |
Llama 3.1 8B Meta | 37.18 ± 0.15 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Llama 3.2 3B Meta | 36.15 ± 0.24 | 36.70 ± 0.44 | 36.70 ± 0.44 |
Apertus 8B Swiss AI | 34.48 ± 0.30 | 26.74 ± 0.72 | 26.74 ± 0.72 |
Olmo 3.1 32B AI2 | 34.13 ± 0.16 | 0.00 ± 0.00 | 0.00 ± 0.00 |
GLM 4.7 Flash 30B MoE Z.ai | 32.10 ± 0.25 | 12.86 ± 0.68 | 12.86 ± 0.68 |
MERaLiON 2 3B A*STAR | 22.94 ± 0.34 | 8.12 ± 1.05 | 8.12 ± 1.05 |
Tiny Aya Water 3B CohereLabs | 9.62 ± 0.19 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Tiny Aya Global 3B CohereLabs | 8.04 ± 0.20 | 0.00 ± 0.00 | 0.00 ± 0.00 |
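
Since every score is reported as "mean ± half-width", a quick way to read the rankings is to check whether two models' intervals overlap. The sketch below does this for three TH entries copied from the tables. The helper names are illustrative, and interval overlap is only an informal screen, not a formal significance test.

```python
def interval(mean, half_width):
    """Return the (low, high) bounds of a 'mean ± half_width' entry."""
    return (mean - half_width, mean + half_width)


def intervals_overlap(a, b):
    """True if two (mean, half_width) entries have overlapping intervals."""
    lo_a, hi_a = interval(*a)
    lo_b, hi_b = interval(*b)
    return lo_a <= hi_b and lo_b <= hi_a


# TH scores copied from the tables: (mean, 95% CI half-width).
sea_lion_v45_27b = (66.42, 0.11)
qwen_36_27b = (63.24, 0.19)
qwen_35_27b = (63.08, 0.13)

# Qwen 3.6 27B vs Qwen 3.5 27B: intervals overlap, so the ordering is not clear-cut.
print(intervals_overlap(qwen_36_27b, qwen_35_27b))
# SEA-LION v4.5 27B vs Qwen 3.6 27B: intervals are well separated.
print(intervals_overlap(sea_lion_v45_27b, qwen_36_27b))
```

Adjacent rows with overlapping intervals, such as the two Qwen 27B models here, are best read as roughly tied on this benchmark.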