Filipino Performance
Filipino Scores by Model
Average of 30 bootstraps. 95% confidence intervals (CIs) are shown.
Model Size: ≤200B
Open instruct models only
[Bar chart: overall Filipino score (mean ± 95% CI) for each model, sorted from highest to lowest. The same values appear, with model names, in the TL column of the Filipino Competencies table.]
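The caption says each score is an average of 30 bootstraps with a 95% CI, but it does not spell out the procedure. The following is a minimal sketch of one common reading, assuming a list of observed scores is resampled with replacement and the interval is a normal approximation around the mean of the bootstrap means. The function name `bootstrap_mean_ci`, the 1.96 multiplier, and the choice of what counts as one observation (a run or an example) are illustrative assumptions, not the leaderboard's documented pipeline.

```python
import random
import statistics


def bootstrap_mean_ci(scores, n_boot=30, z=1.96, seed=0):
    """Return (mean, half_width) from bootstrap resamples of `scores`.

    Assumption: the interval is mean ± z * (standard deviation of the
    bootstrap means). The source does not say how its CI is computed.
    """
    rng = random.Random(seed)
    n = len(scores)
    boot_means = []
    for _ in range(n_boot):
        # Draw n observations with replacement and average them.
        sample = [scores[rng.randrange(n)] for _ in range(n)]
        boot_means.append(statistics.fmean(sample))
    mean = statistics.fmean(boot_means)
    # The spread of the bootstrap means estimates the standard error.
    half_width = z * statistics.stdev(boot_means)
    return mean, half_width
```

Under this reading, an input with no variation yields a half-width of exactly zero, which would be one way to get cells such as `81.90 ± 0.00`.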
Filipino Competencies
Model | TL | Cultural | Instruction Following | Knowledge | Multi-Turn Chat | NLG | NLR | NLU | Safety |
|---|---|---|---|---|---|---|---|---|---|
Gemma 4 31B | 76.32 ± 0.08 | 84.28 ± 0.33 | 92.25 ± 0.28 | 76.15 ± 0.12 | 82.73 ± 0.33 | 58.76 ± 0.08 | 81.51 ± 0.07 | 75.21 ± 0.06 | 59.63 ± 0.35 |
Gemma 4 26B MoE | 72.00 ± 0.09 | 81.33 ± 0.28 | 88.73 ± 0.49 | 65.05 ± 0.28 | 82.38 ± 0.31 | 58.05 ± 0.08 | 76.32 ± 0.15 | 71.62 ± 0.15 | 52.50 ± 0.33 |
SEA-LION v4.5 (Qwen) 27B AISG | 71.99 ± 0.15 | 75.94 ± 0.68 | 87.65 ± 0.54 | 73.65 ± 0.31 | 79.90 ± 0.38 | 55.85 ± 0.06 | 79.01 ± 0.23 | 70.88 ± 0.31 | 53.03 ± 0.61 |
Qwen 3.5 122B MoE Alibaba | 71.13 ± 0.18 | 78.61 ± 0.37 | 90.06 ± 0.75 | 74.29 ± 0.28 | 82.09 ± 0.33 | 55.43 ± 0.08 | 73.04 ± 0.24 | 69.85 ± 0.30 | 45.65 ± 0.86 |
SEA-LION v4 (Gemma) 27B AISG | 70.53 ± 0.16 | 77.30 ± 0.51 | 85.94 ± 0.91 | 64.75 ± 0.24 | 76.51 ± 0.50 | 55.59 ± 0.06 | 72.39 ± 0.11 | 70.17 ± 0.26 | 61.58 ± 0.36 |
Gemma 3 27B | 70.17 ± 0.13 | 75.87 ± 0.47 | 83.90 ± 0.79 | 65.52 ± 0.19 | 76.37 ± 0.50 | 55.53 ± 0.05 | 71.41 ± 0.12 | 69.94 ± 0.21 | 62.80 ± 0.34 |
Qwen 3.5 27B Alibaba | 69.62 ± 0.20 | 74.45 ± 0.65 | 88.86 ± 0.65 | 70.73 ± 0.52 | 82.08 ± 0.32 | 53.58 ± 0.07 | 73.68 ± 0.24 | 70.21 ± 0.40 | 43.38 ± 0.86 |
SEA-LION v3 (Llama) 70B AISG | 69.38 ± 0.18 | 74.23 ± 0.49 | 87.56 ± 0.56 | 71.17 ± 0.48 | 66.53 ± 0.71 | 57.44 ± 0.10 | 73.36 ± 0.31 | 69.16 ± 0.39 | 55.55 ± 0.63 |
SEA-LION v4 (Qwen) 32B AISG | 67.77 ± 0.12 | 67.53 ± 0.55 | 84.70 ± 0.61 | 60.34 ± 0.14 | 74.64 ± 0.41 | 54.33 ± 0.03 | 73.58 ± 0.07 | 73.54 ± 0.13 | 53.50 ± 0.27 |
Llama 3.3 70B Meta | 66.84 ± 0.13 | 69.61 ± 0.62 | 89.75 ± 0.48 | 68.63 ± 0.22 | 58.12 ± 0.52 | 55.44 ± 0.08 | 73.41 ± 0.14 | 69.33 ± 0.19 | 50.43 ± 0.26 |
Gemma 3 12B | 66.76 ± 0.08 | 73.01 ± 0.49 | 81.90 ± 0.00 | 57.38 ± 0.00 | 73.51 ± 0.42 | 54.66 ± 0.00 | 69.50 ± 0.00 | 67.58 ± 0.00 | 56.50 ± 0.00 |
Qwen 3 VL 32B Alibaba | 66.55 ± 0.14 | 67.92 ± 0.44 | 81.46 ± 0.64 | 62.65 ± 0.24 | 77.77 ± 0.39 | 50.93 ± 0.03 | 71.87 ± 0.16 | 70.70 ± 0.19 | 49.10 ± 0.29 |
Qwen 3.6 27B Alibaba | 66.43 ± 0.23 | 64.89 ± 0.76 | 88.92 ± 0.51 | 70.24 ± 0.55 | 80.33 ± 0.29 | 53.44 ± 0.09 | 68.43 ± 0.43 | 63.55 ± 0.63 | 41.67 ± 1.40 |
Mistral Medium 3.5 128B Mistral AI | 65.65 ± 0.18 | 75.70 ± 0.55 | 75.27 ± 0.89 | 67.39 ± 0.38 | 78.55 ± 0.57 | 54.22 ± 0.08 | 63.57 ± 0.28 | 68.86 ± 0.27 | 41.65 ± 0.69 |
Llama 4 Scout 109B MoE Meta | 64.82 ± 0.12 | 68.58 ± 0.57 | 90.48 ± 0.43 | 68.94 ± 0.11 | 64.09 ± 0.51 | 55.06 ± 0.04 | 68.55 ± 0.08 | 66.47 ± 0.06 | 36.38 ± 0.12 |
Gemma 4 (E4B) 8B | 64.22 ± 0.17 | 64.71 ± 0.40 | 82.86 ± 0.68 | 54.24 ± 0.39 | 76.51 ± 0.47 | 54.96 ± 0.05 | 69.64 ± 0.16 | 70.72 ± 0.26 | 40.12 ± 0.65 |
Qwen 3.6 35B MoE Alibaba | 63.64 ± 0.21 | 70.15 ± 0.66 | 82.48 ± 0.69 | 63.10 ± 0.62 | 79.81 ± 0.39 | 53.16 ± 0.10 | 47.35 ± 0.52 | 68.17 ± 0.47 | 44.92 ± 1.00 |
SEA-LION v3 (Gemma 2) 9B AISG | 62.79 ± 0.15 | 66.83 ± 0.51 | 78.10 ± 0.91 | 51.90 ± 0.32 | 62.30 ± 0.39 | 54.14 ± 0.08 | 67.26 ± 0.23 | 73.25 ± 0.30 | 48.57 ± 0.49 |
Qwen 3.5 35B MoE Alibaba | 62.42 ± 0.23 | 70.09 ± 0.61 | 82.54 ± 0.72 | 47.93 ± 0.71 | 78.86 ± 0.47 | 51.30 ± 0.07 | 55.15 ± 0.70 | 69.03 ± 0.55 | 44.45 ± 1.01 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 58.90 ± 0.19 | 74.55 ± 0.58 | 67.52 ± 0.97 | 58.85 ± 0.68 | 79.90 ± 0.59 | 47.56 ± 0.07 | 48.73 ± 0.67 | 65.08 ± 0.52 | 29.02 ± 0.77 |
SEA-LION v4 (Gemma VL) 4B AISG | 56.63 ± 0.15 | 57.26 ± 0.47 | 76.22 ± 0.68 | 42.79 ± 0.26 | 66.19 ± 0.50 | 53.79 ± 0.04 | 57.97 ± 0.18 | 64.20 ± 0.11 | 34.63 ± 0.30 |
SEA-LION v4 (Qwen VL) 8B AISG | 56.21 ± 0.15 | 52.67 ± 0.39 | 76.60 ± 0.75 | 45.88 ± 0.27 | 61.08 ± 0.59 | 49.14 ± 0.07 | 58.60 ± 0.14 | 66.69 ± 0.07 | 39.02 ± 0.34 |
Qwen 3 VL 8B Alibaba | 55.63 ± 0.16 | 52.02 ± 0.50 | 77.52 ± 0.90 | 44.71 ± 0.15 | 63.78 ± 0.57 | 49.02 ± 0.07 | 56.48 ± 0.18 | 66.46 ± 0.15 | 35.07 ± 0.21 |
Mistral Small 4 119B MoE Mistral AI | 55.42 ± 0.26 | 67.36 ± 0.61 | 61.97 ± 1.10 | 52.66 ± 0.47 | 71.85 ± 0.57 | 52.67 ± 0.07 | 52.63 ± 0.59 | 59.14 ± 0.34 | 25.05 ± 0.61 |
Gemma 3 4B | 54.19 ± 0.14 | 55.24 ± 0.51 | 76.32 ± 0.75 | 43.99 ± 0.21 | 63.51 ± 0.48 | 53.39 ± 0.06 | 51.29 ± 0.14 | 63.78 ± 0.11 | 26.02 ± 0.34 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 53.01 ± 0.17 | 53.62 ± 0.45 | 72.03 ± 0.84 | 37.50 ± 0.40 | 68.86 ± 0.46 | 54.32 ± 0.06 | 47.23 ± 0.17 | 60.63 ± 0.26 | 29.87 ± 0.63 |
Gemma 4 (E2B) 5B | 52.42 ± 0.18 | 52.60 ± 0.32 | 72.35 ± 0.97 | 37.06 ± 0.28 | 70.72 ± 0.38 | 54.00 ± 0.05 | 40.57 ± 0.25 | 61.29 ± 0.19 | 30.80 ± 0.64 |
MERaLiON 2 10B A*STAR | 52.26 ± 0.16 | 57.00 ± 0.60 | 56.03 ± 0.98 | 50.02 ± 0.42 | 44.52 ± 0.59 | 51.07 ± 0.11 | 64.36 ± 0.19 | 66.67 ± 0.33 | 28.43 ± 0.44 |
SEA-LION v3 (Llama) 8B AISG | 52.17 ± 0.25 | 56.48 ± 0.60 | 68.98 ± 0.95 | 44.93 ± 0.45 | 57.12 ± 0.75 | 53.02 ± 0.11 | 54.41 ± 0.54 | 61.54 ± 0.43 | 20.90 ± 0.33 |
SEA-LION v4 (Qwen VL) 4B AISG | 51.63 ± 0.15 | 44.69 ± 0.50 | 67.97 ± 0.66 | 43.75 ± 0.22 | 55.96 ± 0.64 | 45.49 ± 0.05 | 47.27 ± 0.18 | 65.94 ± 0.14 | 41.98 ± 0.23 |
Olmo 3.1 32B AI2 | 51.45 ± 0.14 | 55.36 ± 0.57 | 64.35 ± 0.66 | 42.15 ± 0.39 | 66.40 ± 0.37 | 49.55 ± 0.08 | 40.92 ± 0.34 | 60.14 ± 0.28 | 32.73 ± 0.54 |
Qwen 3 VL 4B Alibaba | 46.72 ± 0.17 | 38.20 ± 0.45 | 63.81 ± 0.80 | 39.51 ± 0.26 | 51.17 ± 0.69 | 42.70 ± 0.05 | 41.23 ± 0.32 | 60.23 ± 0.19 | 36.92 ± 0.26 |
Qwen 3.5 9B Alibaba | 43.61 ± 0.23 | 49.51 ± 0.54 | 66.22 ± 1.08 | 41.79 ± 0.48 | 64.18 ± 0.81 | 43.38 ± 0.11 | 28.34 ± 0.59 | 29.66 ± 0.78 | 25.78 ± 1.53 |
Llama 3.1 8B Meta | 41.94 ± 0.12 | 44.45 ± 0.50 | 54.76 ± 0.77 | 34.15 ± 0.53 | 39.26 ± 0.63 | 51.79 ± 0.12 | 35.80 ± 0.54 | 54.45 ± 0.49 | 20.83 ± 0.54 |
GLM 4.7 Flash 30B MoE Z.ai | 41.86 ± 0.28 | 42.81 ± 1.00 | 58.95 ± 1.26 | 25.50 ± 0.67 | 42.18 ± 0.62 | 47.56 ± 0.10 | 23.93 ± 0.95 | 55.68 ± 0.80 | 38.30 ± 0.95 |
SEA-LION v4 (Apertus) 8B AISG | 40.85 ± 0.20 | 48.53 ± 0.59 | 48.22 ± 1.16 | 20.01 ± 0.33 | 38.42 ± 0.54 | 50.05 ± 0.04 | 25.50 ± 0.36 | 47.92 ± 0.31 | 48.12 ± 0.50 |
Qwen 3.5 4B Alibaba | 36.22 ± 0.33 | 37.79 ± 0.58 | 51.94 ± 1.23 | 31.81 ± 0.87 | 40.21 ± 0.67 | 31.54 ± 0.17 | 22.51 ± 0.76 | 50.21 ± 0.61 | 23.75 ± 1.27 |
Apertus 8B Swiss AI | 27.54 ± 0.26 | 23.42 ± 0.95 | 55.02 ± 1.21 | 6.92 ± 0.52 | 32.49 ± 0.68 | 42.92 ± 0.13 | 6.54 ± 0.65 | 26.87 ± 0.81 | 26.15 ± 1.39 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 27.53 ± 0.30 | 45.11 ± 0.85 | 47.97 ± 1.54 | 24.21 ± 0.86 | 51.04 ± 0.84 | 29.31 ± 0.12 | 5.47 ± 1.02 | 16.74 ± 0.65 | 0.37 ± 0.21 |
Llama 3.2 3B Meta | 26.28 ± 0.23 | 30.72 ± 0.64 | 37.11 ± 1.05 | 21.34 ± 0.52 | 24.88 ± 0.40 | 40.28 ± 0.16 | 8.18 ± 0.69 | 33.84 ± 0.57 | 13.93 ± 1.13 |
Tiny Aya Global 3B CohereLabs | 23.67 ± 0.31 | 20.73 ± 1.22 | 65.43 ± 1.29 | 23.89 ± 0.89 | 29.45 ± 0.66 | 35.99 ± 0.18 | 0.48 ± 0.37 | 11.65 ± 0.98 | 1.77 ± 1.11 |
Tiny Aya Water 3B CohereLabs | 23.62 ± 0.32 | 18.25 ± 1.10 | 67.17 ± 1.04 | 25.99 ± 0.74 | 32.31 ± 0.67 | 36.11 ± 0.18 | 1.97 ± 0.64 | 6.61 ± 0.96 | 0.57 ± 0.57 |
MERaLiON 2 3B A*STAR | 23.52 ± 0.20 | 32.76 ± 0.69 | 26.32 ± 1.02 | 18.11 ± 0.46 | 23.18 ± 0.55 | 29.18 ± 0.11 | 21.69 ± 0.57 | 27.75 ± 0.69 | 9.13 ± 0.37 |
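The table does not state how the TL column is derived, but the rows above are consistent with TL being the unweighted mean of the eight competency scores. The sketch below checks this for two rows copied from the table. The helper names `overall_score` and `is_consistent` and the 0.01 tolerance (to allow for rounding in the published figures) are assumptions for illustration.

```python
from statistics import fmean

# Values copied from the table. Competency order: Cultural,
# Instruction Following, Knowledge, Multi-Turn Chat, NLG, NLR, NLU, Safety.
ROWS = {
    "Gemma 4 31B": (76.32, [84.28, 92.25, 76.15, 82.73, 58.76, 81.51, 75.21, 59.63]),
    "Gemma 4 26B MoE": (72.00, [81.33, 88.73, 65.05, 82.38, 58.05, 76.32, 71.62, 52.50]),
}


def overall_score(competencies):
    """Unweighted mean of the competency scores (assumed aggregation rule)."""
    return fmean(competencies)


def is_consistent(reported, competencies, tol=0.01):
    """True if the reported TL score matches the mean within rounding tolerance."""
    return abs(overall_score(competencies) - reported) <= tol


for model, (reported_tl, comps) in ROWS.items():
    print(f"{model}: mean={overall_score(comps):.3f}, "
          f"reported={reported_tl:.2f}, consistent={is_consistent(reported_tl, comps)}")
```

The same check can be applied one level down, since each competency score in the task tables also matches the mean of its task columns (for example, NLG against Summarization and Translations).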
Filipino Tasks
Model | TL | Cultural | Kalahi (LLM Judge) | Kalahi |
|---|---|---|---|---|
Gemma 4 31B | 76.32 ± 0.08 | 84.28 ± 0.33 | 77.88 ± 0.63 | 90.68 ± 0.23 |
Gemma 4 26B MoE | 72.00 ± 0.09 | 81.33 ± 0.28 | 72.99 ± 0.59 | 89.68 ± 0.20 |
SEA-LION v4.5 (Qwen) 27B AISG | 71.99 ± 0.15 | 75.94 ± 0.68 | 65.31 ± 1.24 | 86.57 ± 0.42 |
Qwen 3.5 122B MoE Alibaba | 71.13 ± 0.18 | 78.61 ± 0.37 | 68.92 ± 0.71 | 88.31 ± 0.40 |
SEA-LION v4 (Gemma) 27B AISG | 70.53 ± 0.16 | 77.30 ± 0.51 | 71.28 ± 0.98 | 83.32 ± 0.22 |
Gemma 3 27B | 70.17 ± 0.13 | 75.87 ± 0.47 | 68.92 ± 0.94 | 82.81 ± 0.14 |
Qwen 3.5 27B Alibaba | 69.62 ± 0.20 | 74.45 ± 0.65 | 63.53 ± 0.95 | 85.36 ± 0.55 |
SEA-LION v3 (Llama) 70B AISG | 69.38 ± 0.18 | 74.23 ± 0.49 | 61.11 ± 0.92 | 87.36 ± 0.47 |
SEA-LION v4 (Qwen) 32B AISG | 67.77 ± 0.12 | 67.53 ± 0.55 | 52.31 ± 1.06 | 82.76 ± 0.16 |
Llama 3.3 70B Meta | 66.84 ± 0.13 | 69.61 ± 0.62 | 56.50 ± 1.29 | 82.72 ± 0.25 |
Gemma 3 12B | 66.76 ± 0.08 | 73.01 ± 0.49 | 62.92 ± 0.98 | 83.10 ± 0.00 |
Qwen 3 VL 32B Alibaba | 66.55 ± 0.14 | 67.92 ± 0.44 | 55.31 ± 0.89 | 80.54 ± 0.36 |
Qwen 3.6 27B Alibaba | 66.43 ± 0.23 | 64.89 ± 0.76 | 61.98 ± 0.98 | 67.80 ± 1.09 |
Mistral Medium 3.5 128B Mistral AI | 65.65 ± 0.18 | 75.70 ± 0.55 | 65.00 ± 1.15 | 86.40 ± 0.53 |
Llama 4 Scout 109B MoE Meta | 64.82 ± 0.12 | 68.58 ± 0.57 | 51.23 ± 1.14 | 85.93 ± 0.13 |
Gemma 4 (E4B) 8B | 64.22 ± 0.17 | 64.71 ± 0.40 | 52.69 ± 0.82 | 76.73 ± 0.32 |
Qwen 3.6 35B MoE Alibaba | 63.64 ± 0.21 | 70.15 ± 0.66 | 60.03 ± 1.14 | 80.28 ± 0.79 |
SEA-LION v3 (Gemma 2) 9B AISG | 62.79 ± 0.15 | 66.83 ± 0.51 | 47.73 ± 0.83 | 85.94 ± 0.49 |
Qwen 3.5 35B MoE Alibaba | 62.42 ± 0.23 | 70.09 ± 0.61 | 59.12 ± 0.95 | 81.06 ± 0.66 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 58.90 ± 0.19 | 74.55 ± 0.58 | 68.03 ± 1.01 | 81.06 ± 0.66 |
SEA-LION v4 (Gemma VL) 4B AISG | 56.63 ± 0.15 | 57.26 ± 0.47 | 46.54 ± 0.93 | 67.97 ± 0.00 |
SEA-LION v4 (Qwen VL) 8B AISG | 56.21 ± 0.15 | 52.67 ± 0.39 | 28.70 ± 0.79 | 76.65 ± 0.40 |
Qwen 3 VL 8B Alibaba | 55.63 ± 0.16 | 52.02 ± 0.50 | 30.54 ± 0.96 | 73.50 ± 0.21 |
Mistral Small 4 119B MoE Mistral AI | 55.42 ± 0.26 | 67.36 ± 0.61 | 57.01 ± 1.17 | 77.71 ± 0.67 |
Gemma 3 4B | 54.19 ± 0.14 | 55.24 ± 0.51 | 48.99 ± 0.98 | 61.49 ± 0.25 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 53.01 ± 0.17 | 53.62 ± 0.45 | 44.96 ± 0.80 | 62.28 ± 0.39 |
Gemma 4 (E2B) 5B | 52.42 ± 0.18 | 52.60 ± 0.32 | 46.13 ± 0.62 | 59.08 ± 0.22 |
MERaLiON 2 10B A*STAR | 52.26 ± 0.16 | 57.00 ± 0.60 | 36.75 ± 1.00 | 77.26 ± 0.50 |
SEA-LION v3 (Llama) 8B AISG | 52.17 ± 0.25 | 56.48 ± 0.60 | 44.17 ± 0.98 | 68.80 ± 0.59 |
SEA-LION v4 (Qwen VL) 4B AISG | 51.63 ± 0.15 | 44.69 ± 0.50 | 21.53 ± 0.93 | 67.84 ± 0.24 |
Olmo 3.1 32B AI2 | 51.45 ± 0.14 | 55.36 ± 0.57 | 43.53 ± 0.85 | 67.19 ± 0.68 |
Qwen 3 VL 4B Alibaba | 46.72 ± 0.17 | 38.20 ± 0.45 | 17.01 ± 0.60 | 59.39 ± 0.44 |
Qwen 3.5 9B Alibaba | 43.61 ± 0.23 | 49.51 ± 0.54 | 32.88 ± 1.06 | 66.15 ± 0.79 |
Llama 3.1 8B Meta | 41.94 ± 0.12 | 44.45 ± 0.50 | 28.02 ± 0.96 | 60.88 ± 0.57 |
GLM 4.7 Flash 30B MoE Z.ai | 41.86 ± 0.28 | 42.81 ± 1.00 | 27.78 ± 1.35 | 57.84 ± 1.08 |
SEA-LION v4 (Apertus) 8B AISG | 40.85 ± 0.20 | 48.53 ± 0.59 | 33.86 ± 1.06 | 63.20 ± 0.42 |
Qwen 3.5 4B Alibaba | 36.22 ± 0.33 | 37.79 ± 0.58 | 14.24 ± 0.69 | 61.33 ± 1.02 |
Apertus 8B Swiss AI | 27.54 ± 0.26 | 23.42 ± 0.95 | 31.17 ± 1.14 | 15.67 ± 1.59 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 27.53 ± 0.30 | 45.11 ± 0.85 | 37.94 ± 1.13 | 52.28 ± 1.44 |
Llama 3.2 3B Meta | 26.28 ± 0.23 | 30.72 ± 0.64 | 18.93 ± 0.93 | 42.51 ± 0.95 |
Tiny Aya Global 3B CohereLabs | 23.67 ± 0.31 | 20.73 ± 1.22 | 19.07 ± 0.80 | 22.40 ± 2.10 |
Tiny Aya Water 3B CohereLabs | 23.62 ± 0.32 | 18.25 ± 1.10 | 21.06 ± 0.83 | 15.44 ± 1.95 |
MERaLiON 2 3B A*STAR | 23.52 ± 0.20 | 32.76 ± 0.69 | 22.20 ± 1.04 | 43.32 ± 0.89 |

Model | TL | Instruction Following | SEA-IFEval |
|---|---|---|---|
Gemma 4 31B | 76.32 ± 0.08 | 92.25 ± 0.28 | 92.25 ± 0.28 |
Gemma 4 26B MoE | 72.00 ± 0.09 | 88.73 ± 0.49 | 88.73 ± 0.49 |
SEA-LION v4.5 (Qwen) 27B AISG | 71.99 ± 0.15 | 87.65 ± 0.54 | 87.65 ± 0.54 |
Qwen 3.5 122B MoE Alibaba | 71.13 ± 0.18 | 90.06 ± 0.75 | 90.06 ± 0.75 |
SEA-LION v4 (Gemma) 27B AISG | 70.53 ± 0.16 | 85.94 ± 0.91 | 85.94 ± 0.91 |
Gemma 3 27B | 70.17 ± 0.13 | 83.90 ± 0.79 | 83.90 ± 0.79 |
Qwen 3.5 27B Alibaba | 69.62 ± 0.20 | 88.86 ± 0.65 | 88.86 ± 0.65 |
SEA-LION v3 (Llama) 70B AISG | 69.38 ± 0.18 | 87.56 ± 0.56 | 87.56 ± 0.56 |
SEA-LION v4 (Qwen) 32B AISG | 67.77 ± 0.12 | 84.70 ± 0.61 | 84.70 ± 0.61 |
Llama 3.3 70B Meta | 66.84 ± 0.13 | 89.75 ± 0.48 | 89.75 ± 0.48 |
Gemma 3 12B | 66.76 ± 0.08 | 81.90 ± 0.00 | 81.90 ± 0.00 |
Qwen 3 VL 32B Alibaba | 66.55 ± 0.14 | 81.46 ± 0.64 | 81.46 ± 0.64 |
Qwen 3.6 27B Alibaba | 66.43 ± 0.23 | 88.92 ± 0.51 | 88.92 ± 0.51 |
Mistral Medium 3.5 128B Mistral AI | 65.65 ± 0.18 | 75.27 ± 0.89 | 75.27 ± 0.89 |
Llama 4 Scout 109B MoE Meta | 64.82 ± 0.12 | 90.48 ± 0.43 | 90.48 ± 0.43 |
Gemma 4 (E4B) 8B | 64.22 ± 0.17 | 82.86 ± 0.68 | 82.86 ± 0.68 |
Qwen 3.6 35B MoE Alibaba | 63.64 ± 0.21 | 82.48 ± 0.69 | 82.48 ± 0.69 |
SEA-LION v3 (Gemma 2) 9B AISG | 62.79 ± 0.15 | 78.10 ± 0.91 | 78.10 ± 0.91 |
Qwen 3.5 35B MoE Alibaba | 62.42 ± 0.23 | 82.54 ± 0.72 | 82.54 ± 0.72 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 58.90 ± 0.19 | 67.52 ± 0.97 | 67.52 ± 0.97 |
SEA-LION v4 (Gemma VL) 4B AISG | 56.63 ± 0.15 | 76.22 ± 0.68 | 76.22 ± 0.68 |
SEA-LION v4 (Qwen VL) 8B AISG | 56.21 ± 0.15 | 76.60 ± 0.75 | 76.60 ± 0.75 |
Qwen 3 VL 8B Alibaba | 55.63 ± 0.16 | 77.52 ± 0.90 | 77.52 ± 0.90 |
Mistral Small 4 119B MoE Mistral AI | 55.42 ± 0.26 | 61.97 ± 1.10 | 61.97 ± 1.10 |
Gemma 3 4B | 54.19 ± 0.14 | 76.32 ± 0.75 | 76.32 ± 0.75 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 53.01 ± 0.17 | 72.03 ± 0.84 | 72.03 ± 0.84 |
Gemma 4 (E2B) 5B | 52.42 ± 0.18 | 72.35 ± 0.97 | 72.35 ± 0.97 |
MERaLiON 2 10B A*STAR | 52.26 ± 0.16 | 56.03 ± 0.98 | 56.03 ± 0.98 |
SEA-LION v3 (Llama) 8B AISG | 52.17 ± 0.25 | 68.98 ± 0.95 | 68.98 ± 0.95 |
SEA-LION v4 (Qwen VL) 4B AISG | 51.63 ± 0.15 | 67.97 ± 0.66 | 67.97 ± 0.66 |
Olmo 3.1 32B AI2 | 51.45 ± 0.14 | 64.35 ± 0.66 | 64.35 ± 0.66 |
Qwen 3 VL 4B Alibaba | 46.72 ± 0.17 | 63.81 ± 0.80 | 63.81 ± 0.80 |
Qwen 3.5 9B Alibaba | 43.61 ± 0.23 | 66.22 ± 1.08 | 66.22 ± 1.08 |
Llama 3.1 8B Meta | 41.94 ± 0.12 | 54.76 ± 0.77 | 54.76 ± 0.77 |
GLM 4.7 Flash 30B MoE Z.ai | 41.86 ± 0.28 | 58.95 ± 1.26 | 58.95 ± 1.26 |
SEA-LION v4 (Apertus) 8B AISG | 40.85 ± 0.20 | 48.22 ± 1.16 | 48.22 ± 1.16 |
Qwen 3.5 4B Alibaba | 36.22 ± 0.33 | 51.94 ± 1.23 | 51.94 ± 1.23 |
Apertus 8B Swiss AI | 27.54 ± 0.26 | 55.02 ± 1.21 | 55.02 ± 1.21 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 27.53 ± 0.30 | 47.97 ± 1.54 | 47.97 ± 1.54 |
Llama 3.2 3B Meta | 26.28 ± 0.23 | 37.11 ± 1.05 | 37.11 ± 1.05 |
Tiny Aya Global 3B CohereLabs | 23.67 ± 0.31 | 65.43 ± 1.29 | 65.43 ± 1.29 |
Tiny Aya Water 3B CohereLabs | 23.62 ± 0.32 | 67.17 ± 1.04 | 67.17 ± 1.04 |
MERaLiON 2 3B A*STAR | 23.52 ± 0.20 | 26.32 ± 1.02 | 26.32 ± 1.02 |

Model | TL | Knowledge | Global MMLU Lite |
|---|---|---|---|
Gemma 4 31B | 76.32 ± 0.08 | 76.15 ± 0.12 | 76.15 ± 0.12 |
Gemma 4 26B MoE | 72.00 ± 0.09 | 65.05 ± 0.28 | 65.05 ± 0.28 |
SEA-LION v4.5 (Qwen) 27B AISG | 71.99 ± 0.15 | 73.65 ± 0.31 | 73.65 ± 0.31 |
Qwen 3.5 122B MoE Alibaba | 71.13 ± 0.18 | 74.29 ± 0.28 | 74.29 ± 0.28 |
SEA-LION v4 (Gemma) 27B AISG | 70.53 ± 0.16 | 64.75 ± 0.24 | 64.75 ± 0.24 |
Gemma 3 27B | 70.17 ± 0.13 | 65.52 ± 0.19 | 65.52 ± 0.19 |
Qwen 3.5 27B Alibaba | 69.62 ± 0.20 | 70.73 ± 0.52 | 70.73 ± 0.52 |
SEA-LION v3 (Llama) 70B AISG | 69.38 ± 0.18 | 71.17 ± 0.48 | 71.17 ± 0.48 |
SEA-LION v4 (Qwen) 32B AISG | 67.77 ± 0.12 | 60.34 ± 0.14 | 60.34 ± 0.14 |
Llama 3.3 70B Meta | 66.84 ± 0.13 | 68.63 ± 0.22 | 68.63 ± 0.22 |
Gemma 3 12B | 66.76 ± 0.08 | 57.38 ± 0.00 | 57.38 ± 0.00 |
Qwen 3 VL 32B Alibaba | 66.55 ± 0.14 | 62.65 ± 0.24 | 62.65 ± 0.24 |
Qwen 3.6 27B Alibaba | 66.43 ± 0.23 | 70.24 ± 0.55 | 70.24 ± 0.55 |
Mistral Medium 3.5 128B Mistral AI | 65.65 ± 0.18 | 67.39 ± 0.38 | 67.39 ± 0.38 |
Llama 4 Scout 109B MoE Meta | 64.82 ± 0.12 | 68.94 ± 0.11 | 68.94 ± 0.11 |
Gemma 4 (E4B) 8B | 64.22 ± 0.17 | 54.24 ± 0.39 | 54.24 ± 0.39 |
Qwen 3.6 35B MoE Alibaba | 63.64 ± 0.21 | 63.10 ± 0.62 | 63.10 ± 0.62 |
SEA-LION v3 (Gemma 2) 9B AISG | 62.79 ± 0.15 | 51.90 ± 0.32 | 51.90 ± 0.32 |
Qwen 3.5 35B MoE Alibaba | 62.42 ± 0.23 | 47.93 ± 0.71 | 47.93 ± 0.71 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 58.90 ± 0.19 | 58.85 ± 0.68 | 58.85 ± 0.68 |
SEA-LION v4 (Gemma VL) 4B AISG | 56.63 ± 0.15 | 42.79 ± 0.26 | 42.79 ± 0.26 |
SEA-LION v4 (Qwen VL) 8B AISG | 56.21 ± 0.15 | 45.88 ± 0.27 | 45.88 ± 0.27 |
Qwen 3 VL 8B Alibaba | 55.63 ± 0.16 | 44.71 ± 0.15 | 44.71 ± 0.15 |
Mistral Small 4 119B MoE Mistral AI | 55.42 ± 0.26 | 52.66 ± 0.47 | 52.66 ± 0.47 |
Gemma 3 4B | 54.19 ± 0.14 | 43.99 ± 0.21 | 43.99 ± 0.21 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 53.01 ± 0.17 | 37.50 ± 0.40 | 37.50 ± 0.40 |
Gemma 4 (E2B) 5B | 52.42 ± 0.18 | 37.06 ± 0.28 | 37.06 ± 0.28 |
MERaLiON 2 10B A*STAR | 52.26 ± 0.16 | 50.02 ± 0.42 | 50.02 ± 0.42 |
SEA-LION v3 (Llama) 8B AISG | 52.17 ± 0.25 | 44.93 ± 0.45 | 44.93 ± 0.45 |
SEA-LION v4 (Qwen VL) 4B AISG | 51.63 ± 0.15 | 43.75 ± 0.22 | 43.75 ± 0.22 |
Olmo 3.1 32B AI2 | 51.45 ± 0.14 | 42.15 ± 0.39 | 42.15 ± 0.39 |
Qwen 3 VL 4B Alibaba | 46.72 ± 0.17 | 39.51 ± 0.26 | 39.51 ± 0.26 |
Qwen 3.5 9B Alibaba | 43.61 ± 0.23 | 41.79 ± 0.48 | 41.79 ± 0.48 |
Llama 3.1 8B Meta | 41.94 ± 0.12 | 34.15 ± 0.53 | 34.15 ± 0.53 |
GLM 4.7 Flash 30B MoE Z.ai | 41.86 ± 0.28 | 25.50 ± 0.67 | 25.50 ± 0.67 |
SEA-LION v4 (Apertus) 8B AISG | 40.85 ± 0.20 | 20.01 ± 0.33 | 20.01 ± 0.33 |
Qwen 3.5 4B Alibaba | 36.22 ± 0.33 | 31.81 ± 0.87 | 31.81 ± 0.87 |
Apertus 8B Swiss AI | 27.54 ± 0.26 | 6.92 ± 0.52 | 6.92 ± 0.52 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 27.53 ± 0.30 | 24.21 ± 0.86 | 24.21 ± 0.86 |
Llama 3.2 3B Meta | 26.28 ± 0.23 | 21.34 ± 0.52 | 21.34 ± 0.52 |
Tiny Aya Global 3B CohereLabs | 23.67 ± 0.31 | 23.89 ± 0.89 | 23.89 ± 0.89 |
Tiny Aya Water 3B CohereLabs | 23.62 ± 0.32 | 25.99 ± 0.74 | 25.99 ± 0.74 |
MERaLiON 2 3B A*STAR | 23.52 ± 0.20 | 18.11 ± 0.46 | 18.11 ± 0.46 |

Model | TL | Multi-Turn Chat | SEA-MT-Bench (LLM Judge) |
|---|---|---|---|
Gemma 4 31B | 76.32 ± 0.08 | 82.73 ± 0.33 | 82.73 ± 0.33 |
Gemma 4 26B MoE | 72.00 ± 0.09 | 82.38 ± 0.31 | 82.38 ± 0.31 |
SEA-LION v4.5 (Qwen) 27B AISG | 71.99 ± 0.15 | 79.90 ± 0.38 | 79.90 ± 0.38 |
Qwen 3.5 122B MoE Alibaba | 71.13 ± 0.18 | 82.09 ± 0.33 | 82.09 ± 0.33 |
SEA-LION v4 (Gemma) 27B AISG | 70.53 ± 0.16 | 76.51 ± 0.50 | 76.51 ± 0.50 |
Gemma 3 27B | 70.17 ± 0.13 | 76.37 ± 0.50 | 76.37 ± 0.50 |
Qwen 3.5 27B Alibaba | 69.62 ± 0.20 | 82.08 ± 0.32 | 82.08 ± 0.32 |
SEA-LION v3 (Llama) 70B AISG | 69.38 ± 0.18 | 66.53 ± 0.71 | 66.53 ± 0.71 |
SEA-LION v4 (Qwen) 32B AISG | 67.77 ± 0.12 | 74.64 ± 0.41 | 74.64 ± 0.41 |
Llama 3.3 70B Meta | 66.84 ± 0.13 | 58.12 ± 0.52 | 58.12 ± 0.52 |
Gemma 3 12B | 66.76 ± 0.08 | 73.51 ± 0.42 | 73.51 ± 0.42 |
Qwen 3 VL 32B Alibaba | 66.55 ± 0.14 | 77.77 ± 0.39 | 77.77 ± 0.39 |
Qwen 3.6 27B Alibaba | 66.43 ± 0.23 | 80.33 ± 0.29 | 80.33 ± 0.29 |
Mistral Medium 3.5 128B Mistral AI | 65.65 ± 0.18 | 78.55 ± 0.57 | 78.55 ± 0.57 |
Llama 4 Scout 109B MoE Meta | 64.82 ± 0.12 | 64.09 ± 0.51 | 64.09 ± 0.51 |
Gemma 4 (E4B) 8B | 64.22 ± 0.17 | 76.51 ± 0.47 | 76.51 ± 0.47 |
Qwen 3.6 35B MoE Alibaba | 63.64 ± 0.21 | 79.81 ± 0.39 | 79.81 ± 0.39 |
SEA-LION v3 (Gemma 2) 9B AISG | 62.79 ± 0.15 | 62.30 ± 0.39 | 62.30 ± 0.39 |
Qwen 3.5 35B MoE Alibaba | 62.42 ± 0.23 | 78.86 ± 0.47 | 78.86 ± 0.47 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 58.90 ± 0.19 | 79.90 ± 0.59 | 79.90 ± 0.59 |
SEA-LION v4 (Gemma VL) 4B AISG | 56.63 ± 0.15 | 66.19 ± 0.50 | 66.19 ± 0.50 |
SEA-LION v4 (Qwen VL) 8B AISG | 56.21 ± 0.15 | 61.08 ± 0.59 | 61.08 ± 0.59 |
Qwen 3 VL 8B Alibaba | 55.63 ± 0.16 | 63.78 ± 0.57 | 63.78 ± 0.57 |
Mistral Small 4 119B MoE Mistral AI | 55.42 ± 0.26 | 71.85 ± 0.57 | 71.85 ± 0.57 |
Gemma 3 4B | 54.19 ± 0.14 | 63.51 ± 0.48 | 63.51 ± 0.48 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 53.01 ± 0.17 | 68.86 ± 0.46 | 68.86 ± 0.46 |
Gemma 4 (E2B) 5B | 52.42 ± 0.18 | 70.72 ± 0.38 | 70.72 ± 0.38 |
MERaLiON 2 10B A*STAR | 52.26 ± 0.16 | 44.52 ± 0.59 | 44.52 ± 0.59 |
SEA-LION v3 (Llama) 8B AISG | 52.17 ± 0.25 | 57.12 ± 0.75 | 57.12 ± 0.75 |
SEA-LION v4 (Qwen VL) 4B AISG | 51.63 ± 0.15 | 55.96 ± 0.64 | 55.96 ± 0.64 |
Olmo 3.1 32B AI2 | 51.45 ± 0.14 | 66.40 ± 0.37 | 66.40 ± 0.37 |
Qwen 3 VL 4B Alibaba | 46.72 ± 0.17 | 51.17 ± 0.69 | 51.17 ± 0.69 |
Qwen 3.5 9B Alibaba | 43.61 ± 0.23 | 64.18 ± 0.81 | 64.18 ± 0.81 |
Llama 3.1 8B Meta | 41.94 ± 0.12 | 39.26 ± 0.63 | 39.26 ± 0.63 |
GLM 4.7 Flash 30B MoE Z.ai | 41.86 ± 0.28 | 42.18 ± 0.62 | 42.18 ± 0.62 |
SEA-LION v4 (Apertus) 8B AISG | 40.85 ± 0.20 | 38.42 ± 0.54 | 38.42 ± 0.54 |
Qwen 3.5 4B Alibaba | 36.22 ± 0.33 | 40.21 ± 0.67 | 40.21 ± 0.67 |
Apertus 8B Swiss AI | 27.54 ± 0.26 | 32.49 ± 0.68 | 32.49 ± 0.68 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 27.53 ± 0.30 | 51.04 ± 0.84 | 51.04 ± 0.84 |
Llama 3.2 3B Meta | 26.28 ± 0.23 | 24.88 ± 0.40 | 24.88 ± 0.40 |
Tiny Aya Global 3B CohereLabs | 23.67 ± 0.31 | 29.45 ± 0.66 | 29.45 ± 0.66 |
Tiny Aya Water 3B CohereLabs | 23.62 ± 0.32 | 32.31 ± 0.67 | 32.31 ± 0.67 |
MERaLiON 2 3B A*STAR | 23.52 ± 0.20 | 23.18 ± 0.55 | 23.18 ± 0.55 |

Model | TL | NLG | Summarization | Translations |
|---|---|---|---|---|
Gemma 4 31B | 76.32 ± 0.08 | 58.76 ± 0.08 | 26.90 ± 0.16 | 90.63 ± 0.02 |
Gemma 4 26B MoE | 72.00 ± 0.09 | 58.05 ± 0.08 | 25.74 ± 0.15 | 90.37 ± 0.02 |
SEA-LION v4.5 (Qwen) 27B AISG | 71.99 ± 0.15 | 55.85 ± 0.06 | 22.86 ± 0.11 | 88.84 ± 0.04 |
Qwen 3.5 122B MoE Alibaba | 71.13 ± 0.18 | 55.43 ± 0.08 | 22.58 ± 0.14 | 88.29 ± 0.06 |
SEA-LION v4 (Gemma) 27B AISG | 70.53 ± 0.16 | 55.59 ± 0.06 | 19.92 ± 0.11 | 91.26 ± 0.02 |
Gemma 3 27B | 70.17 ± 0.13 | 55.53 ± 0.05 | 19.74 ± 0.10 | 91.33 ± 0.02 |
Qwen 3.5 27B Alibaba | 69.62 ± 0.20 | 53.58 ± 0.07 | 21.32 ± 0.12 | 85.84 ± 0.05 |
SEA-LION v3 (Llama) 70B AISG | 69.38 ± 0.18 | 57.44 ± 0.10 | 25.09 ± 0.20 | 89.79 ± 0.03 |
SEA-LION v4 (Qwen) 32B AISG | 67.77 ± 0.12 | 54.33 ± 0.03 | 21.22 ± 0.06 | 87.45 ± 0.03 |
Llama 3.3 70B Meta | 66.84 ± 0.13 | 55.44 ± 0.08 | 24.23 ± 0.15 | 86.66 ± 0.03 |
Gemma 3 12B | 66.76 ± 0.08 | 54.66 ± 0.00 | 19.17 ± 0.00 | 90.16 ± 0.00 |
Qwen 3 VL 32B Alibaba | 66.55 ± 0.14 | 50.93 ± 0.03 | 17.89 ± 0.05 | 83.97 ± 0.04 |
Qwen 3.6 27B Alibaba | 66.43 ± 0.23 | 53.44 ± 0.09 | 22.26 ± 0.17 | 84.63 ± 0.07 |
Mistral Medium 3.5 128B Mistral AI | 65.65 ± 0.18 | 54.22 ± 0.08 | 21.34 ± 0.15 | 87.11 ± 0.07 |
Llama 4 Scout 109B MoE Meta | 64.82 ± 0.12 | 55.06 ± 0.04 | 21.34 ± 0.08 | 88.78 ± 0.02 |
Gemma 4 (E4B) 8B | 64.22 ± 0.17 | 54.96 ± 0.05 | 21.23 ± 0.11 | 88.70 ± 0.03 |
Qwen 3.6 35B MoE Alibaba | 63.64 ± 0.21 | 53.16 ± 0.10 | 22.30 ± 0.18 | 84.02 ± 0.08 |
SEA-LION v3 (Gemma 2) 9B AISG | 62.79 ± 0.15 | 54.14 ± 0.08 | 21.09 ± 0.13 | 87.19 ± 0.08 |
Qwen 3.5 35B MoE Alibaba | 62.42 ± 0.23 | 51.30 ± 0.07 | 20.76 ± 0.16 | 81.83 ± 0.09 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 58.90 ± 0.19 | 47.56 ± 0.07 | 16.54 ± 0.10 | 78.58 ± 0.09 |
SEA-LION v4 (Gemma VL) 4B AISG | 56.63 ± 0.15 | 53.79 ± 0.04 | 19.86 ± 0.09 | 87.71 ± 0.05 |
SEA-LION v4 (Qwen VL) 8B AISG | 56.21 ± 0.15 | 49.14 ± 0.07 | 19.79 ± 0.11 | 78.49 ± 0.06 |
Qwen 3 VL 8B Alibaba | 55.63 ± 0.16 | 49.02 ± 0.07 | 20.89 ± 0.12 | 77.14 ± 0.06 |
Mistral Small 4 119B MoE Mistral AI | 55.42 ± 0.26 | 52.67 ± 0.07 | 18.42 ± 0.12 | 86.91 ± 0.05 |
Gemma 3 4B | 54.19 ± 0.14 | 53.39 ± 0.06 | 19.70 ± 0.10 | 87.09 ± 0.05 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 53.01 ± 0.17 | 54.32 ± 0.06 | 22.29 ± 0.12 | 86.34 ± 0.04 |
Gemma 4 (E2B) 5B | 52.42 ± 0.18 | 54.00 ± 0.05 | 21.81 ± 0.09 | 86.19 ± 0.03 |
MERaLiON 2 10B A*STAR | 52.26 ± 0.16 | 51.07 ± 0.11 | 21.54 ± 0.20 | 80.61 ± 0.11 |
SEA-LION v3 (Llama) 8B AISG | 52.17 ± 0.25 | 53.02 ± 0.11 | 19.87 ± 0.22 | 86.17 ± 0.06 |
SEA-LION v4 (Qwen VL) 4B AISG | 51.63 ± 0.15 | 45.49 ± 0.05 | 17.04 ± 0.08 | 73.94 ± 0.06 |
Olmo 3.1 32B AI2 | 51.45 ± 0.14 | 49.55 ± 0.08 | 21.05 ± 0.15 | 78.06 ± 0.06 |
Qwen 3 VL 4B Alibaba | 46.72 ± 0.17 | 42.70 ± 0.05 | 16.73 ± 0.07 | 68.67 ± 0.09 |
Qwen 3.5 9B Alibaba | 43.61 ± 0.23 | 43.38 ± 0.11 | 19.51 ± 0.17 | 67.24 ± 0.11 |
Llama 3.1 8B Meta | 41.94 ± 0.12 | 51.79 ± 0.12 | 25.48 ± 0.24 | 78.10 ± 0.09 |
GLM 4.7 Flash 30B MoE Z.ai | 41.86 ± 0.28 | 47.56 ± 0.10 | 22.20 ± 0.15 | 72.92 ± 0.15 |
SEA-LION v4 (Apertus) 8B AISG | 40.85 ± 0.20 | 50.05 ± 0.04 | 15.47 ± 0.07 | 84.63 ± 0.06 |
Qwen 3.5 4B Alibaba | 36.22 ± 0.33 | 31.54 ± 0.17 | 10.79 ± 0.25 | 52.28 ± 0.17 |
Apertus 8B Swiss AI | 27.54 ± 0.26 | 42.92 ± 0.13 | 16.84 ± 0.15 | 69.01 ± 0.21 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 27.53 ± 0.30 | 29.31 ± 0.12 | 12.27 ± 0.17 | 46.35 ± 0.18 |
Llama 3.2 3B Meta | 26.28 ± 0.23 | 40.28 ± 0.16 | 18.31 ± 0.33 | 62.25 ± 0.10 |
Tiny Aya Global 3B CohereLabs | 23.67 ± 0.31 | 35.99 ± 0.18 | 16.04 ± 0.22 | 55.94 ± 0.28 |
Tiny Aya Water 3B CohereLabs | 23.62 ± 0.32 | 36.11 ± 0.18 | 16.68 ± 0.15 | 55.53 ± 0.31 |
MERaLiON 2 3B A*STAR | 23.52 ± 0.20 | 29.18 ± 0.11 | 6.69 ± 0.19 | 51.67 ± 0.14 |

Model | TL | NLR | Causal Reasoning | Natural Language Inference |
|---|---|---|---|---|
Gemma 4 31B | 76.32 ± 0.08 | 81.51 ± 0.07 | 94.76 ± 0.11 | 68.27 ± 0.07 |
Gemma 4 26B MoE | 72.00 ± 0.09 | 76.32 ± 0.15 | 91.90 ± 0.22 | 60.75 ± 0.23 |
SEA-LION v4.5 (Qwen) 27B AISG | 71.99 ± 0.15 | 79.01 ± 0.23 | 90.60 ± 0.27 | 67.41 ± 0.38 |
Qwen 3.5 122B MoE Alibaba | 71.13 ± 0.18 | 73.04 ± 0.24 | 86.54 ± 0.39 | 59.55 ± 0.37 |
SEA-LION v4 (Gemma) 27B AISG | 70.53 ± 0.16 | 72.39 ± 0.11 | 88.12 ± 0.18 | 56.67 ± 0.13 |
Gemma 3 27B | 70.17 ± 0.13 | 71.41 ± 0.12 | 87.84 ± 0.19 | 54.97 ± 0.13 |
Qwen 3.5 27B Alibaba | 69.62 ± 0.20 | 73.68 ± 0.24 | 84.64 ± 0.37 | 62.72 ± 0.38 |
SEA-LION v3 (Llama) 70B AISG | 69.38 ± 0.18 | 73.36 ± 0.31 | 88.21 ± 0.43 | 58.52 ± 0.52 |
SEA-LION v4 (Qwen) 32B AISG | 67.77 ± 0.12 | 73.58 ± 0.07 | 83.40 ± 0.12 | 63.76 ± 0.10 |
Llama 3.3 70B Meta | 66.84 ± 0.13 | 73.41 ± 0.14 | 87.75 ± 0.15 | 59.07 ± 0.25 |
Gemma 3 12B | 66.76 ± 0.08 | 69.50 ± 0.00 | 85.49 ± 0.00 | 53.50 ± 0.00 |
Qwen 3 VL 32B Alibaba | 66.55 ± 0.14 | 71.87 ± 0.16 | 84.39 ± 0.22 | 59.36 ± 0.26 |
Qwen 3.6 27B Alibaba | 66.43 ± 0.23 | 68.43 ± 0.43 | 85.44 ± 0.48 | 51.41 ± 0.72 |
Mistral Medium 3.5 128B Mistral AI | 65.65 ± 0.18 | 63.57 ± 0.28 | 81.57 ± 0.34 | 45.57 ± 0.54 |
Llama 4 Scout 109B MoE Meta | 64.82 ± 0.12 | 68.55 ± 0.08 | 87.95 ± 0.07 | 49.15 ± 0.16 |
Gemma 4 (E4B) 8B | 64.22 ± 0.17 | 69.64 ± 0.16 | 86.86 ± 0.25 | 52.42 ± 0.20 |
Qwen 3.6 35B MoE Alibaba | 63.64 ± 0.21 | 47.35 ± 0.52 | 79.39 ± 0.67 | 15.32 ± 0.75 |
SEA-LION v3 (Gemma 2) 9B AISG | 62.79 ± 0.15 | 67.26 ± 0.23 | 87.35 ± 0.18 | 47.17 ± 0.43 |
Qwen 3.5 35B MoE Alibaba | 62.42 ± 0.23 | 55.15 ± 0.70 | 71.79 ± 0.86 | 38.51 ± 0.91 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 58.90 ± 0.19 | 48.73 ± 0.67 | 64.11 ± 0.99 | 33.36 ± 0.93 |
SEA-LION v4 (Gemma VL) 4B AISG | 56.63 ± 0.15 | 57.97 ± 0.18 | 65.68 ± 0.30 | 50.25 ± 0.16 |
SEA-LION v4 (Qwen VL) 8B AISG | 56.21 ± 0.15 | 58.60 ± 0.14 | 71.45 ± 0.30 | 45.75 ± 0.20 |
Qwen 3 VL 8B Alibaba | 55.63 ± 0.16 | 56.48 ± 0.18 | 68.51 ± 0.27 | 44.44 ± 0.25 |
Mistral Small 4 119B MoE Mistral AI | 55.42 ± 0.26 | 52.63 ± 0.59 | 73.51 ± 0.81 | 31.75 ± 0.93 |
Gemma 3 4B | 54.19 ± 0.14 | 51.29 ± 0.14 | 54.16 ± 0.18 | 48.41 ± 0.19 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 53.01 ± 0.17 | 47.23 ± 0.17 | 57.65 ± 0.39 | 36.81 ± 0.30 |
Gemma 4 (E2B) 5B | 52.42 ± 0.18 | 40.57 ± 0.25 | 57.64 ± 0.37 | 23.49 ± 0.45 |
MERaLiON 2 10B A*STAR | 52.26 ± 0.16 | 64.36 ± 0.19 | 86.90 ± 0.15 | 41.82 ± 0.35 |
SEA-LION v3 (Llama) 8B AISG | 52.17 ± 0.25 | 54.41 ± 0.54 | 71.45 ± 0.82 | 37.37 ± 0.77 |
SEA-LION v4 (Qwen VL) 4B AISG | 51.63 ± 0.15 | 47.27 ± 0.18 | 55.27 ± 0.27 | 39.27 ± 0.28 |
Olmo 3.1 32B AI2 | 51.45 ± 0.14 | 40.92 ± 0.34 | 54.23 ± 0.41 | 27.61 ± 0.52 |
Qwen 3 VL 4B Alibaba | 46.72 ± 0.17 | 41.23 ± 0.32 | 46.35 ± 0.55 | 36.10 ± 0.30 |
Qwen 3.5 9B Alibaba | 43.61 ± 0.23 | 28.34 ± 0.59 | 56.68 ± 1.19 | 0.00 ± 0.00 |
Llama 3.1 8B Meta | 41.94 ± 0.12 | 35.80 ± 0.54 | 45.48 ± 1.00 | 26.12 ± 0.76 |
GLM 4.7 Flash 30B MoE Z.ai | 41.86 ± 0.28 | 23.93 ± 0.95 | 39.16 ± 1.30 | 8.71 ± 0.96 |
SEA-LION v4 (Apertus) 8B AISG | 40.85 ± 0.20 | 25.50 ± 0.36 | 50.40 ± 0.69 | 0.61 ± 0.14 |
Qwen 3.5 4B Alibaba | 36.22 ± 0.33 | 22.51 ± 0.76 | 34.76 ± 1.12 | 10.27 ± 0.91 |
Apertus 8B Swiss AI | 27.54 ± 0.26 | 6.54 ± 0.65 | 12.13 ± 1.25 | 0.96 ± 0.51 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 27.53 ± 0.30 | 5.47 ± 1.02 | 9.08 ± 2.10 | 1.86 ± 0.82 |
Llama 3.2 3B Meta | 26.28 ± 0.23 | 8.18 ± 0.69 | 9.84 ± 1.01 | 6.52 ± 0.92 |
Tiny Aya Global 3B CohereLabs | 23.67 ± 0.31 | 0.48 ± 0.37 | 0.97 ± 0.73 | 0.00 ± 0.00 |
Tiny Aya Water 3B CohereLabs | 23.62 ± 0.32 | 1.97 ± 0.64 | 3.93 ± 1.28 | 0.00 ± 0.00 |
MERaLiON 2 3B A*STAR | 23.52 ± 0.20 | 21.69 ± 0.57 | 41.51 ± 0.89 | 1.88 ± 0.61 |

Model | TL | NLU | Belebele QA | Sentiment Analysis |
|---|---|---|---|---|
Gemma 4 31B | 76.32 ± 0.08 | 75.21 ± 0.06 | 82.67 ± 0.00 | 67.75 ± 0.11 |
Gemma 4 26B MoE | 72.00 ± 0.09 | 71.62 ± 0.15 | 79.73 ± 0.34 | 63.50 ± 0.16 |
SEA-LION v4.5 (Qwen) 27B AISG | 71.99 ± 0.15 | 70.88 ± 0.31 | 80.76 ± 0.66 | 61.01 ± 0.19 |
Qwen 3.5 122B MoE Alibaba | 71.13 ± 0.18 | 69.85 ± 0.30 | 78.80 ± 0.58 | 60.90 ± 0.25 |
SEA-LION v4 (Gemma) 27B AISG | 70.53 ± 0.16 | 70.17 ± 0.26 | 82.31 ± 0.48 | 58.03 ± 0.18 |
Gemma 3 27B | 70.17 ± 0.13 | 69.94 ± 0.21 | 82.67 ± 0.35 | 57.22 ± 0.15 |
Qwen 3.5 27B Alibaba | 69.62 ± 0.20 | 70.21 ± 0.40 | 80.89 ± 0.68 | 59.53 ± 0.34 |
SEA-LION v3 (Llama) 70B AISG | 69.38 ± 0.18 | 69.16 ± 0.39 | 81.78 ± 0.63 | 56.54 ± 0.42 |
SEA-LION v4 (Qwen) 32B AISG | 67.77 ± 0.12 | 73.54 ± 0.13 | 78.67 ± 0.00 | 68.41 ± 0.26 |
Llama 3.3 70B Meta | 66.84 ± 0.13 | 69.33 ± 0.19 | 82.76 ± 0.35 | 55.91 ± 0.14 |
Gemma 3 12B | 66.76 ± 0.08 | 67.58 ± 0.00 | 78.67 ± 0.00 | 56.50 ± 0.00 |
Qwen 3 VL 32B Alibaba | 66.55 ± 0.14 | 70.70 ± 0.19 | 81.47 ± 0.32 | 59.94 ± 0.14 |
Qwen 3.6 27B Alibaba | 66.43 ± 0.23 | 63.55 ± 0.63 | 80.67 ± 0.73 | 46.44 ± 0.76 |
Mistral Medium 3.5 128B Mistral AI | 65.65 ± 0.18 | 68.86 ± 0.27 | 83.64 ± 0.56 | 54.08 ± 0.25 |
Llama 4 Scout 109B MoE Meta | 64.82 ± 0.12 | 66.47 ± 0.06 | 78.67 ± 0.00 | 54.27 ± 0.11 |
Gemma 4 (E4B) 8B | 64.22 ± 0.17 | 70.72 ± 0.26 | 79.96 ± 0.41 | 61.48 ± 0.30 |
Qwen 3.6 35B MoE Alibaba | 63.64 ± 0.21 | 68.17 ± 0.47 | 79.02 ± 0.93 | 57.32 ± 0.39 |
SEA-LION v3 (Gemma 2) 9B AISG | 62.79 ± 0.15 | 73.25 ± 0.30 | 80.13 ± 0.47 | 66.38 ± 0.39 |
Qwen 3.5 35B MoE Alibaba | 62.42 ± 0.23 | 69.03 ± 0.55 | 78.89 ± 0.92 | 59.17 ± 0.39 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 58.90 ± 0.19 | 65.08 ± 0.52 | 74.89 ± 0.91 | 55.27 ± 0.39 |
SEA-LION v4 (Gemma VL) 4B AISG | 56.63 ± 0.15 | 64.20 ± 0.11 | 73.47 ± 0.15 | 54.94 ± 0.23 |
SEA-LION v4 (Qwen VL) 8B AISG | 56.21 ± 0.15 | 66.69 ± 0.07 | 73.33 ± 0.00 | 60.05 ± 0.14 |
Qwen 3 VL 8B Alibaba | 55.63 ± 0.16 | 66.46 ± 0.15 | 75.51 ± 0.23 | 57.41 ± 0.17 |
Mistral Small 4 119B MoE Mistral AI | 55.42 ± 0.26 | 59.14 ± 0.34 | 74.89 ± 0.66 | 43.40 ± 0.35 |
Gemma 3 4B | 54.19 ± 0.14 | 63.78 ± 0.11 | 72.13 ± 0.15 | 55.43 ± 0.21 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 53.01 ± 0.17 | 60.63 ± 0.26 | 65.78 ± 0.36 | 55.48 ± 0.21 |
Gemma 4 (E2B) 5B | 52.42 ± 0.18 | 61.29 ± 0.19 | 67.56 ± 0.29 | 55.03 ± 0.23 |
MERaLiON 2 10B A*STAR | 52.26 ± 0.16 | 66.67 ± 0.33 | 73.87 ± 0.48 | 59.47 ± 0.33 |
SEA-LION v3 (Llama) 8B AISG | 52.17 ± 0.25 | 61.54 ± 0.43 | 71.78 ± 0.83 | 51.30 ± 0.40 |
SEA-LION v4 (Qwen VL) 4B AISG | 51.63 ± 0.15 | 65.94 ± 0.14 | 75.91 ± 0.25 | 55.97 ± 0.17 |
Olmo 3.1 32B AI2 | 51.45 ± 0.14 | 60.14 ± 0.28 | 66.49 ± 0.48 | 53.79 ± 0.36 |
Qwen 3 VL 4B Alibaba | 46.72 ± 0.17 | 60.23 ± 0.19 | 71.20 ± 0.37 | 49.26 ± 0.17 |
Qwen 3.5 9B Alibaba | 43.61 ± 0.23 | 29.66 ± 0.78 | 14.71 ± 1.41 | 44.60 ± 0.43 |
Llama 3.1 8B Meta | 41.94 ± 0.12 | 54.45 ± 0.49 | 65.29 ± 0.82 | 43.61 ± 0.41 |
GLM 4.7 Flash 30B MoE Z.ai | 41.86 ± 0.28 | 55.68 ± 0.80 | 58.93 ± 1.44 | 52.43 ± 0.59 |
SEA-LION v4 (Apertus) 8B AISG | 40.85 ± 0.20 | 47.92 ± 0.31 | 50.89 ± 0.55 | 44.95 ± 0.25 |
Qwen 3.5 4B Alibaba | 36.22 ± 0.33 | 50.21 ± 0.61 | 58.71 ± 1.03 | 41.71 ± 0.55 |
Apertus 8B Swiss AI | 27.54 ± 0.26 | 26.87 ± 0.81 | 31.33 ± 1.44 | 22.40 ± 0.86 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 27.53 ± 0.30 | 16.74 ± 0.65 | 33.29 ± 1.32 | 0.18 ± 0.17 |
Llama 3.2 3B Meta | 26.28 ± 0.23 | 33.84 ± 0.57 | 33.29 ± 1.13 | 34.39 ± 0.50 |
Tiny Aya Global 3B CohereLabs | 23.67 ± 0.31 | 11.65 ± 0.98 | 21.56 ± 1.71 | 1.75 ± 0.62 |
Tiny Aya Water 3B CohereLabs | 23.62 ± 0.32 | 6.61 ± 0.96 | 12.53 ± 1.92 | 0.69 ± 0.43 |
MERaLiON 2 3B A*STAR | 23.52 ± 0.20 | 27.75 ± 0.69 | 22.31 ± 1.19 | 33.19 ± 0.54 |

Model | TL | Safety | Toxicity Detection |
|---|---|---|---|
Gemma 4 31B | 76.32 ± 0.08 | 59.63 ± 0.35 | 59.63 ± 0.35 |
Gemma 4 26B MoE | 72.00 ± 0.09 | 52.50 ± 0.33 | 52.50 ± 0.33 |
SEA-LION v4.5 (Qwen) 27B AISG | 71.99 ± 0.15 | 53.03 ± 0.61 | 53.03 ± 0.61 |
Qwen 3.5 122B MoE Alibaba | 71.13 ± 0.18 | 45.65 ± 0.86 | 45.65 ± 0.86 |
SEA-LION v4 (Gemma) 27B AISG | 70.53 ± 0.16 | 61.58 ± 0.36 | 61.58 ± 0.36 |
Gemma 3 27B | 70.17 ± 0.13 | 62.80 ± 0.34 | 62.80 ± 0.34 |
Qwen 3.5 27B Alibaba | 69.62 ± 0.20 | 43.38 ± 0.86 | 43.38 ± 0.86 |
SEA-LION v3 (Llama) 70B AISG | 69.38 ± 0.18 | 55.55 ± 0.63 | 55.55 ± 0.63 |
SEA-LION v4 (Qwen) 32B AISG | 67.77 ± 0.12 | 53.50 ± 0.27 | 53.50 ± 0.27 |
Llama 3.3 70B Meta | 66.84 ± 0.13 | 50.43 ± 0.26 | 50.43 ± 0.26 |
Gemma 3 12B | 66.76 ± 0.08 | 56.50 ± 0.00 | 56.50 ± 0.00 |
Qwen 3 VL 32B Alibaba | 66.55 ± 0.14 | 49.10 ± 0.29 | 49.10 ± 0.29 |
Qwen 3.6 27B Alibaba | 66.43 ± 0.23 | 41.67 ± 1.40 | 41.67 ± 1.40 |
Mistral Medium 3.5 128B Mistral AI | 65.65 ± 0.18 | 41.65 ± 0.69 | 41.65 ± 0.69 |
Llama 4 Scout 109B MoE Meta | 64.82 ± 0.12 | 36.38 ± 0.12 | 36.38 ± 0.12 |
Gemma 4 (E4B) 8B | 64.22 ± 0.17 | 40.12 ± 0.65 | 40.12 ± 0.65 |
Qwen 3.6 35B MoE Alibaba | 63.64 ± 0.21 | 44.92 ± 1.00 | 44.92 ± 1.00 |
SEA-LION v3 (Gemma 2) 9B AISG | 62.79 ± 0.15 | 48.57 ± 0.49 | 48.57 ± 0.49 |
Qwen 3.5 35B MoE Alibaba | 62.42 ± 0.23 | 44.45 ± 1.01 | 44.45 ± 1.01 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 58.90 ± 0.19 | 29.02 ± 0.77 | 29.02 ± 0.77 |
SEA-LION v4 (Gemma VL) 4B AISG | 56.63 ± 0.15 | 34.63 ± 0.30 | 34.63 ± 0.30 |
SEA-LION v4 (Qwen VL) 8B AISG | 56.21 ± 0.15 | 39.02 ± 0.34 | 39.02 ± 0.34 |
Qwen 3 VL 8B Alibaba | 55.63 ± 0.16 | 35.07 ± 0.21 | 35.07 ± 0.21 |
Mistral Small 4 119B MoE Mistral AI | 55.42 ± 0.26 | 25.05 ± 0.61 | 25.05 ± 0.61 |
Gemma 3 4B | 54.19 ± 0.14 | 26.02 ± 0.34 | 26.02 ± 0.34 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 53.01 ± 0.17 | 29.87 ± 0.63 | 29.87 ± 0.63 |
Gemma 4 (E2B) 5B | 52.42 ± 0.18 | 30.80 ± 0.64 | 30.80 ± 0.64 |
MERaLiON 2 10B A*STAR | 52.26 ± 0.16 | 28.43 ± 0.44 | 28.43 ± 0.44 |
SEA-LION v3 (Llama) 8B AISG | 52.17 ± 0.25 | 20.90 ± 0.33 | 20.90 ± 0.33 |
SEA-LION v4 (Qwen VL) 4B AISG | 51.63 ± 0.15 | 41.98 ± 0.23 | 41.98 ± 0.23 |
Olmo 3.1 32B AI2 | 51.45 ± 0.14 | 32.73 ± 0.54 | 32.73 ± 0.54 |
Qwen 3 VL 4B Alibaba | 46.72 ± 0.17 | 36.92 ± 0.26 | 36.92 ± 0.26 |
Qwen 3.5 9B Alibaba | 43.61 ± 0.23 | 25.78 ± 1.53 | 25.78 ± 1.53 |
Llama 3.1 8B Meta | 41.94 ± 0.12 | 20.83 ± 0.54 | 20.83 ± 0.54 |
GLM 4.7 Flash 30B MoE Z.ai | 41.86 ± 0.28 | 38.30 ± 0.95 | 38.30 ± 0.95 |
SEA-LION v4 (Apertus) 8B AISG | 40.85 ± 0.20 | 48.12 ± 0.50 | 48.12 ± 0.50 |
Qwen 3.5 4B Alibaba | 36.22 ± 0.33 | 23.75 ± 1.27 | 23.75 ± 1.27 |
Apertus 8B Swiss AI | 27.54 ± 0.26 | 26.15 ± 1.39 | 26.15 ± 1.39 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 27.53 ± 0.30 | 0.37 ± 0.21 | 0.37 ± 0.21 |
Llama 3.2 3B Meta | 26.28 ± 0.23 | 13.93 ± 1.13 | 13.93 ± 1.13 |
Tiny Aya Global 3B CohereLabs | 23.67 ± 0.31 | 1.77 ± 1.11 | 1.77 ± 1.11 |
Tiny Aya Water 3B CohereLabs | 23.62 ± 0.32 | 0.57 ± 0.57 | 0.57 ± 0.57 |
MERaLiON 2 3B A*STAR | 23.52 ± 0.20 | 9.13 ± 0.37 | 9.13 ± 0.37 |
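Every cell in these tables is reported as a mean ± a 95% CI half-width, so models at adjacent ranks can be closer than their ordering suggests. The sketch below parses two cells and reports whether their intervals overlap. The helper names `parse_cell` and `intervals_overlap` are hypothetical, and interval overlap is only a rough heuristic, not a formal significance test.

```python
def parse_cell(cell):
    """Split a 'mean ± half_width' table cell into two floats."""
    mean_text, half_text = cell.split("±")
    return float(mean_text), float(half_text)


def intervals_overlap(cell_a, cell_b):
    """True if the two mean ± half-width intervals share at least one point."""
    mean_a, half_a = parse_cell(cell_a)
    mean_b, half_b = parse_cell(cell_b)
    return abs(mean_a - mean_b) <= half_a + half_b


# TL scores of Gemma 4 26B MoE and SEA-LION v4.5 (Qwen) 27B, from the tables.
print(intervals_overlap("72.00 ± 0.09", "71.99 ± 0.15"))
# TL scores of Gemma 4 31B and Gemma 4 26B MoE.
print(intervals_overlap("76.32 ± 0.08", "72.00 ± 0.09"))
```

Applied to the TL column, this kind of check separates gaps that are large relative to the reported intervals from gaps that fall within them.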