Tamil Performance
Tamil Scores by Model
Average of 30 bootstraps; 95% confidence intervals (CI) are shown.
Model Size: ≤200B
Open instruct models only
[Bar chart: overall Tamil score (±95% CI) per model, sorted from highest (75.41 ± 0.16) to lowest (4.42 ± 0.16); the same values appear in the TA column of the tables below.]
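The caption reports each score as an average over 30 bootstraps with a 95% CI. The page does not say what is resampled or how the interval is derived, so the following is only a minimal sketch under stated assumptions: it resamples a list of scores with replacement and uses a normal-approximation interval on the bootstrap means. The function name `bootstrap_summary` and the input scores are hypothetical.

```python
import random
import statistics


def bootstrap_summary(scores, n_boot=30, seed=0):
    """Return (mean, half_width) over n_boot bootstrap resamples of `scores`.

    Assumption: the interval is 1.96 times the standard deviation of the
    bootstrap means. The leaderboard's actual procedure may differ.
    """
    rng = random.Random(seed)
    boot_means = []
    for _ in range(n_boot):
        # Draw a resample of the same size, with replacement.
        resample = rng.choices(scores, k=len(scores))
        boot_means.append(statistics.fmean(resample))
    mean = statistics.fmean(boot_means)
    half_width = 1.96 * statistics.stdev(boot_means)
    return mean, half_width


# Hypothetical scores for illustration only; not taken from the leaderboard.
example_scores = [72.0, 75.5, 74.1, 76.3, 73.8, 75.0, 74.6, 75.9]
mean, half_width = bootstrap_summary(example_scores)
print(f"{mean:.2f} ± {half_width:.2f}")
```

Under this sketch, a pool of identical scores yields an interval of ± 0.00, because every resample has the same mean.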
Tamil Competencies
Average of 30 bootstraps; 95% confidence intervals (CI) are shown.
Model Size: ≤200B
Open instruct models only

Model | TA | Instruction Following | Linguistic Diagnostics | Multi-Turn Chat | NLG | NLR | NLU |
|---|---|---|---|---|---|---|---|
Gemma 4 31B | 75.41 ± 0.16 | 88.83 ± 0.74 | 75.21 ± 0.27 | 78.27 ± 0.44 | 51.46 ± 0.04 | 75.31 ± 0.05 | 83.40 ± 0.16 |
SEA-LION v4.5 (Qwen) 27B AISG | 75.00 ± 0.14 | 85.43 ± 0.49 | 77.03 ± 0.31 | 79.74 ± 0.38 | 50.51 ± 0.05 | 73.53 ± 0.19 | 83.74 ± 0.14 |
Qwen 3.5 27B Alibaba | 72.77 ± 0.18 | 84.00 ± 0.68 | 71.73 ± 0.48 | 79.86 ± 0.39 | 49.50 ± 0.07 | 68.91 ± 0.27 | 82.60 ± 0.24 |
Qwen 3.5 122B MoE Alibaba | 72.23 ± 0.14 | 84.48 ± 0.80 | 71.24 ± 0.45 | 77.74 ± 0.47 | 49.93 ± 0.07 | 66.47 ± 0.16 | 83.54 ± 0.22 |
Gemma 4 26B MoE | 72.20 ± 0.16 | 82.48 ± 0.62 | 72.33 ± 0.38 | 77.71 ± 0.51 | 50.40 ± 0.06 | 68.96 ± 0.13 | 81.32 ± 0.30 |
SEA-LION v4 (Qwen) 32B AISG | 71.47 ± 0.13 | 83.78 ± 0.71 | 71.16 ± 0.16 | 70.51 ± 0.38 | 50.05 ± 0.04 | 69.32 ± 0.09 | 83.98 ± 0.11 |
SEA-LION v4 (Gemma) 27B AISG | 69.66 ± 0.20 | 73.87 ± 0.85 | 70.13 ± 0.56 | 69.50 ± 0.47 | 50.37 ± 0.05 | 71.50 ± 0.09 | 82.61 ± 0.28 |
Gemma 3 27B | 69.47 ± 0.19 | 74.38 ± 0.81 | 70.75 ± 0.53 | 67.91 ± 0.48 | 50.09 ± 0.05 | 72.18 ± 0.10 | 81.50 ± 0.29 |
Qwen 3.6 27B Alibaba | 69.04 ± 0.16 | 82.38 ± 0.70 | 69.63 ± 0.63 | 76.97 ± 0.55 | 48.41 ± 0.08 | 56.35 ± 0.36 | 80.50 ± 0.27 |
Qwen 3 VL 32B Alibaba | 68.99 ± 0.16 | 81.71 ± 0.59 | 71.88 ± 0.26 | 71.47 ± 0.50 | 45.22 ± 0.07 | 63.63 ± 0.18 | 80.04 ± 0.17 |
Qwen 3.6 35B MoE Alibaba | 68.62 ± 0.19 | 78.41 ± 0.84 | 66.52 ± 0.66 | 75.38 ± 0.58 | 48.29 ± 0.10 | 62.38 ± 0.28 | 80.76 ± 0.28 |
Llama 4 Scout 109B MoE Meta | 67.81 ± 0.14 | 86.29 ± 0.63 | 67.64 ± 0.45 | 56.01 ± 0.45 | 50.07 ± 0.07 | 64.08 ± 0.07 | 82.78 ± 0.09 |
Mistral Medium 3.5 128B Mistral AI | 67.37 ± 0.19 | 78.70 ± 0.74 | 64.37 ± 0.65 | 72.84 ± 0.53 | 46.25 ± 0.08 | 60.01 ± 0.24 | 82.03 ± 0.33 |
Gemma 3 12B | 66.51 ± 0.08 | 78.10 ± 0.00 | 61.38 ± 0.00 | 63.95 ± 0.48 | 49.68 ± 0.00 | 63.86 ± 0.00 | 82.09 ± 0.00 |
Gemma 4 (E4B) 8B | 65.96 ± 0.18 | 84.29 ± 0.78 | 55.92 ± 0.80 | 71.92 ± 0.42 | 48.67 ± 0.09 | 54.98 ± 0.32 | 79.96 ± 0.31 |
SEA-LION v3 (Llama) 70B AISG | 65.46 ± 0.30 | 82.79 ± 0.96 | 55.28 ± 0.94 | 60.62 ± 0.56 | 49.33 ± 0.08 | 62.97 ± 0.42 | 81.75 ± 0.25 |
Qwen 3.5 35B MoE Alibaba | 65.00 ± 0.22 | 78.48 ± 0.74 | 51.31 ± 0.65 | 75.16 ± 0.37 | 46.88 ± 0.07 | 58.54 ± 0.37 | 79.61 ± 0.32 |
Llama 3.3 70B Meta | 62.71 ± 0.17 | 79.68 ± 0.69 | 55.12 ± 0.39 | 48.51 ± 0.63 | 47.06 ± 0.09 | 64.65 ± 0.17 | 81.24 ± 0.15 |
SEA-LION v3 (Gemma 2) 9B AISG | 61.93 ± 0.22 | 71.87 ± 1.29 | 63.13 ± 0.47 | 56.26 ± 0.51 | 44.94 ± 0.09 | 53.49 ± 0.25 | 81.90 ± 0.26 |
SEA-LION v4 (Gemma VL) 4B AISG | 55.10 ± 0.19 | 66.25 ± 0.81 | 35.37 ± 0.49 | 57.48 ± 0.35 | 43.75 ± 0.05 | 48.49 ± 0.25 | 79.28 ± 0.19 |
Qwen 3.5 9B Alibaba | 54.40 ± 0.29 | 69.27 ± 0.95 | 47.45 ± 0.90 | 63.45 ± 0.62 | 36.87 ± 0.09 | 34.26 ± 0.57 | 75.08 ± 0.36 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 53.85 ± 0.20 | 74.92 ± 1.01 | 27.82 ± 0.39 | 61.12 ± 0.46 | 41.07 ± 0.07 | 38.74 ± 0.25 | 79.43 ± 0.20 |
Mistral Small 4 119B MoE Mistral AI | 53.83 ± 0.30 | 64.06 ± 1.13 | 27.91 ± 1.05 | 65.21 ± 0.71 | 39.80 ± 0.14 | 47.16 ± 0.56 | 78.83 ± 0.29 |
SEA-LION v4 (Qwen VL) 8B AISG | 53.53 ± 0.17 | 61.94 ± 0.78 | 62.23 ± 0.35 | 46.62 ± 0.64 | 42.84 ± 0.05 | 27.36 ± 0.41 | 80.17 ± 0.20 |
Gemma 4 (E2B) 5B | 52.92 ± 0.19 | 71.08 ± 0.80 | 24.48 ± 0.41 | 63.36 ± 0.54 | 42.79 ± 0.07 | 37.14 ± 0.22 | 78.70 ± 0.17 |
Qwen 3 VL 8B Alibaba | 49.60 ± 0.17 | 59.05 ± 0.83 | 51.95 ± 0.44 | 46.83 ± 0.44 | 34.27 ± 0.08 | 24.99 ± 0.40 | 80.48 ± 0.14 |
SEA-LION v3 (Llama) 8B AISG | 46.95 ± 0.25 | 69.43 ± 1.01 | 23.82 ± 1.39 | 46.03 ± 0.60 | 44.26 ± 0.08 | 23.67 ± 0.55 | 74.50 ± 0.36 |
Gemma 3 4B | 44.25 ± 0.19 | 62.29 ± 0.97 | 14.93 ± 0.43 | 53.57 ± 0.48 | 44.82 ± 0.06 | 16.08 ± 0.08 | 73.83 ± 0.25 |
MERaLiON 2 10B A*STAR | 43.79 ± 0.24 | 46.41 ± 0.93 | 62.28 ± 0.61 | 29.21 ± 0.68 | 29.71 ± 0.11 | 46.02 ± 0.23 | 49.10 ± 0.47 |
SEA-LION v4 (Qwen VL) 4B AISG | 43.28 ± 0.19 | 54.86 ± 0.89 | 50.08 ± 0.26 | 33.91 ± 0.55 | 21.67 ± 0.03 | 21.79 ± 0.20 | 77.37 ± 0.15 |
SEA-LION v4 (Apertus) 8B AISG | 38.82 ± 0.27 | 47.59 ± 1.18 | 36.02 ± 0.50 | 33.59 ± 0.52 | 39.95 ± 0.07 | 1.40 ± 0.15 | 74.37 ± 0.27 |
Qwen 3 VL 4B Alibaba | 38.44 ± 0.22 | 52.73 ± 1.02 | 32.30 ± 0.41 | 29.84 ± 0.66 | 18.13 ± 0.01 | 24.94 ± 0.33 | 72.69 ± 0.16 |
Qwen 3.5 4B Alibaba | 36.74 ± 0.23 | 55.11 ± 1.18 | 18.01 ± 0.76 | 41.56 ± 0.73 | 17.85 ± 0.14 | 15.64 ± 0.74 | 72.27 ± 0.42 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 33.59 ± 0.42 | 46.57 ± 1.33 | 4.68 ± 1.01 | 64.95 ± 0.99 | 26.02 ± 0.08 | 6.19 ± 0.28 | 53.15 ± 0.55 |
Olmo 3.1 32B AI2 | 30.81 ± 0.23 | 54.38 ± 0.80 | 13.01 ± 0.60 | 46.51 ± 0.59 | 40.91 ± 0.07 | 7.22 ± 0.47 | 22.81 ± 0.35 |
GLM 4.7 Flash 30B MoE Z.ai | 24.67 ± 0.28 | 52.86 ± 1.39 | 0.00 ± 0.00 | 38.41 ± 0.61 | 35.58 ± 0.11 | 0.00 ± 0.00 | 21.18 ± 0.62 |
Apertus 8B Swiss AI | 22.37 ± 0.31 | 45.62 ± 1.46 | 0.00 ± 0.00 | 29.27 ± 0.68 | 30.19 ± 0.13 | 0.00 ± 0.00 | 29.14 ± 0.72 |
Llama 3.1 8B Meta | 21.22 ± 0.18 | 42.57 ± 0.98 | 4.31 ± 0.65 | 24.53 ± 0.59 | 30.08 ± 0.15 | 0.42 ± 0.17 | 25.39 ± 0.58 |
Llama 3.2 3B Meta | 16.86 ± 0.20 | 36.67 ± 0.88 | 1.84 ± 0.48 | 17.81 ± 0.58 | 27.19 ± 0.16 | 0.02 ± 0.03 | 17.64 ± 0.44 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 16.26 ± 0.20 | 42.60 ± 1.07 | 0.00 ± 0.00 | 30.23 ± 0.77 | 12.42 ± 0.08 | 0.00 ± 0.00 | 12.34 ± 0.30 |
MERaLiON 2 3B A*STAR | 12.51 ± 0.24 | 28.60 ± 1.03 | 2.44 ± 0.54 | 13.33 ± 0.50 | 22.19 ± 0.10 | 1.01 ± 0.34 | 7.50 ± 0.39 |
Tiny Aya Global 3B CohereLabs | 5.62 ± 0.17 | 22.22 ± 0.97 | 0.00 ± 0.00 | 3.84 ± 0.28 | 5.90 ± 0.07 | 0.00 ± 0.00 | 1.74 ± 0.25 |
Tiny Aya Water 3B CohereLabs | 4.42 ± 0.16 | 17.05 ± 0.94 | 0.00 ± 0.00 | 3.91 ± 0.31 | 4.44 ± 0.05 | 0.00 ± 0.00 | 1.10 ± 0.16 |
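The page does not state how the TA column is derived, but the published numbers are consistent with TA being the unweighted mean of the six competency columns. The sketch below checks this for three rows copied from the table above; the helper `unweighted_mean` and the tolerance are illustrative choices, not part of the leaderboard.

```python
# (reported TA, [Instruction Following, Linguistic Diagnostics,
#  Multi-Turn Chat, NLG, NLR, NLU]) copied from the table above.
rows = {
    "Gemma 4 31B": (75.41, [88.83, 75.21, 78.27, 51.46, 75.31, 83.40]),
    "SEA-LION v4.5 (Qwen) 27B": (75.00, [85.43, 77.03, 79.74, 50.51, 73.53, 83.74]),
    "Tiny Aya Water 3B": (4.42, [17.05, 0.00, 3.91, 4.44, 0.00, 1.10]),
}


def unweighted_mean(values):
    """Plain arithmetic mean; every competency gets the same weight."""
    return sum(values) / len(values)


checks = {}
for name, (reported_ta, competencies) in rows.items():
    recomputed = unweighted_mean(competencies)
    # Published values are rounded to two decimals, so allow a small tolerance.
    checks[name] = abs(recomputed - reported_ta) <= 0.02
    print(f"{name}: reported {reported_ta:.2f}, recomputed {recomputed:.2f}")
```

If this reading is right, a single weak competency lowers TA by one sixth of its deficit, regardless of how many tasks that competency contains.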
Tamil Tasks
Average of 30 bootstraps; 95% confidence intervals (CI) are shown.
Model Size: ≤200B
Open instruct models only

Model | TA | Instruction Following | SEA-IFEval |
|---|---|---|---|
Gemma 4 31B | 75.41 ± 0.16 | 88.83 ± 0.74 | 88.83 ± 0.74 |
SEA-LION v4.5 (Qwen) 27B AISG | 75.00 ± 0.14 | 85.43 ± 0.49 | 85.43 ± 0.49 |
Qwen 3.5 27B Alibaba | 72.77 ± 0.18 | 84.00 ± 0.68 | 84.00 ± 0.68 |
Qwen 3.5 122B MoE Alibaba | 72.23 ± 0.14 | 84.48 ± 0.80 | 84.48 ± 0.80 |
Gemma 4 26B MoE | 72.20 ± 0.16 | 82.48 ± 0.62 | 82.48 ± 0.62 |
SEA-LION v4 (Qwen) 32B AISG | 71.47 ± 0.13 | 83.78 ± 0.71 | 83.78 ± 0.71 |
SEA-LION v4 (Gemma) 27B AISG | 69.66 ± 0.20 | 73.87 ± 0.85 | 73.87 ± 0.85 |
Gemma 3 27B | 69.47 ± 0.19 | 74.38 ± 0.81 | 74.38 ± 0.81 |
Qwen 3.6 27B Alibaba | 69.04 ± 0.16 | 82.38 ± 0.70 | 82.38 ± 0.70 |
Qwen 3 VL 32B Alibaba | 68.99 ± 0.16 | 81.71 ± 0.59 | 81.71 ± 0.59 |
Qwen 3.6 35B MoE Alibaba | 68.62 ± 0.19 | 78.41 ± 0.84 | 78.41 ± 0.84 |
Llama 4 Scout 109B MoE Meta | 67.81 ± 0.14 | 86.29 ± 0.63 | 86.29 ± 0.63 |
Mistral Medium 3.5 128B Mistral AI | 67.37 ± 0.19 | 78.70 ± 0.74 | 78.70 ± 0.74 |
Gemma 3 12B | 66.51 ± 0.08 | 78.10 ± 0.00 | 78.10 ± 0.00 |
Gemma 4 (E4B) 8B | 65.96 ± 0.18 | 84.29 ± 0.78 | 84.29 ± 0.78 |
SEA-LION v3 (Llama) 70B AISG | 65.46 ± 0.30 | 82.79 ± 0.96 | 82.79 ± 0.96 |
Qwen 3.5 35B MoE Alibaba | 65.00 ± 0.22 | 78.48 ± 0.74 | 78.48 ± 0.74 |
Llama 3.3 70B Meta | 62.71 ± 0.17 | 79.68 ± 0.69 | 79.68 ± 0.69 |
SEA-LION v3 (Gemma 2) 9B AISG | 61.93 ± 0.22 | 71.87 ± 1.29 | 71.87 ± 1.29 |
SEA-LION v4 (Gemma VL) 4B AISG | 55.10 ± 0.19 | 66.25 ± 0.81 | 66.25 ± 0.81 |
Qwen 3.5 9B Alibaba | 54.40 ± 0.29 | 69.27 ± 0.95 | 69.27 ± 0.95 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 53.85 ± 0.20 | 74.92 ± 1.01 | 74.92 ± 1.01 |
Mistral Small 4 119B MoE Mistral AI | 53.83 ± 0.30 | 64.06 ± 1.13 | 64.06 ± 1.13 |
SEA-LION v4 (Qwen VL) 8B AISG | 53.53 ± 0.17 | 61.94 ± 0.78 | 61.94 ± 0.78 |
Gemma 4 (E2B) 5B | 52.92 ± 0.19 | 71.08 ± 0.80 | 71.08 ± 0.80 |
Qwen 3 VL 8B Alibaba | 49.60 ± 0.17 | 59.05 ± 0.83 | 59.05 ± 0.83 |
SEA-LION v3 (Llama) 8B AISG | 46.95 ± 0.25 | 69.43 ± 1.01 | 69.43 ± 1.01 |
Gemma 3 4B | 44.25 ± 0.19 | 62.29 ± 0.97 | 62.29 ± 0.97 |
MERaLiON 2 10B A*STAR | 43.79 ± 0.24 | 46.41 ± 0.93 | 46.41 ± 0.93 |
SEA-LION v4 (Qwen VL) 4B AISG | 43.28 ± 0.19 | 54.86 ± 0.89 | 54.86 ± 0.89 |
SEA-LION v4 (Apertus) 8B AISG | 38.82 ± 0.27 | 47.59 ± 1.18 | 47.59 ± 1.18 |
Qwen 3 VL 4B Alibaba | 38.44 ± 0.22 | 52.73 ± 1.02 | 52.73 ± 1.02 |
Qwen 3.5 4B Alibaba | 36.74 ± 0.23 | 55.11 ± 1.18 | 55.11 ± 1.18 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 33.59 ± 0.42 | 46.57 ± 1.33 | 46.57 ± 1.33 |
Olmo 3.1 32B AI2 | 30.81 ± 0.23 | 54.38 ± 0.80 | 54.38 ± 0.80 |
GLM 4.7 Flash 30B MoE Z.ai | 24.67 ± 0.28 | 52.86 ± 1.39 | 52.86 ± 1.39 |
Apertus 8B Swiss AI | 22.37 ± 0.31 | 45.62 ± 1.46 | 45.62 ± 1.46 |
Llama 3.1 8B Meta | 21.22 ± 0.18 | 42.57 ± 0.98 | 42.57 ± 0.98 |
Llama 3.2 3B Meta | 16.86 ± 0.20 | 36.67 ± 0.88 | 36.67 ± 0.88 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 16.26 ± 0.20 | 42.60 ± 1.07 | 42.60 ± 1.07 |
MERaLiON 2 3B A*STAR | 12.51 ± 0.24 | 28.60 ± 1.03 | 28.60 ± 1.03 |
Tiny Aya Global 3B CohereLabs | 5.62 ± 0.17 | 22.22 ± 0.97 | 22.22 ± 0.97 |
Tiny Aya Water 3B CohereLabs | 4.42 ± 0.16 | 17.05 ± 0.94 | 17.05 ± 0.94 |

Model | TA | Linguistic Diagnostics | Syntax | Pragmatics |
|---|---|---|---|---|
Gemma 4 31B | 75.41 ± 0.16 | 75.21 ± 0.27 | 91.52 ± 0.10 | 58.90 ± 0.53 |
SEA-LION v4.5 (Qwen) 27B AISG | 75.00 ± 0.14 | 77.03 ± 0.31 | 88.81 ± 0.32 | 65.25 ± 0.57 |
Qwen 3.5 27B Alibaba | 72.77 ± 0.18 | 71.73 ± 0.48 | 86.60 ± 0.42 | 56.87 ± 0.97 |
Qwen 3.5 122B MoE Alibaba | 72.23 ± 0.14 | 71.24 ± 0.45 | 90.09 ± 0.45 | 52.40 ± 0.93 |
Gemma 4 26B MoE | 72.20 ± 0.16 | 72.33 ± 0.38 | 82.95 ± 0.17 | 61.71 ± 0.75 |
SEA-LION v4 (Qwen) 32B AISG | 71.47 ± 0.13 | 71.16 ± 0.16 | 87.66 ± 0.12 | 54.67 ± 0.29 |
SEA-LION v4 (Gemma) 27B AISG | 69.66 ± 0.20 | 70.13 ± 0.56 | 84.92 ± 0.26 | 55.34 ± 1.10 |
Gemma 3 27B | 69.47 ± 0.19 | 70.75 ± 0.53 | 85.35 ± 0.25 | 56.15 ± 1.07 |
Qwen 3.6 27B Alibaba | 69.04 ± 0.16 | 69.63 ± 0.63 | 81.23 ± 0.62 | 58.03 ± 1.08 |
Qwen 3 VL 32B Alibaba | 68.99 ± 0.16 | 71.88 ± 0.26 | 84.70 ± 0.23 | 59.06 ± 0.48 |
Qwen 3.6 35B MoE Alibaba | 68.62 ± 0.19 | 66.52 ± 0.66 | 76.71 ± 0.42 | 56.34 ± 1.08 |
Llama 4 Scout 109B MoE Meta | 67.81 ± 0.14 | 67.64 ± 0.45 | 74.61 ± 0.17 | 60.68 ± 0.87 |
Mistral Medium 3.5 128B Mistral AI | 67.37 ± 0.19 | 64.37 ± 0.65 | 67.59 ± 0.51 | 61.16 ± 1.14 |
Gemma 3 12B | 66.51 ± 0.08 | 61.38 ± 0.00 | 72.77 ± 0.00 | 50.00 ± 0.00 |
Gemma 4 (E4B) 8B | 65.96 ± 0.18 | 55.92 ± 0.80 | 66.96 ± 0.42 | 44.87 ± 1.46 |
SEA-LION v3 (Llama) 70B AISG | 65.46 ± 0.30 | 55.28 ± 0.94 | 66.96 ± 0.68 | 43.59 ± 1.81 |
Qwen 3.5 35B MoE Alibaba | 65.00 ± 0.22 | 51.31 ± 0.65 | 75.69 ± 0.50 | 26.93 ± 1.27 |
Llama 3.3 70B Meta | 62.71 ± 0.17 | 55.12 ± 0.39 | 67.05 ± 0.37 | 43.19 ± 0.71 |
SEA-LION v3 (Gemma 2) 9B AISG | 61.93 ± 0.22 | 63.13 ± 0.47 | 71.50 ± 0.45 | 54.75 ± 0.85 |
SEA-LION v4 (Gemma VL) 4B AISG | 55.10 ± 0.19 | 35.37 ± 0.49 | 44.75 ± 0.58 | 25.99 ± 0.71 |
Qwen 3.5 9B Alibaba | 54.40 ± 0.29 | 47.45 ± 0.90 | 56.18 ± 1.22 | 38.71 ± 1.36 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 53.85 ± 0.20 | 27.82 ± 0.39 | 40.37 ± 0.45 | 15.27 ± 0.58 |
Mistral Small 4 119B MoE Mistral AI | 53.83 ± 0.30 | 27.91 ± 1.05 | 36.00 ± 1.60 | 19.82 ± 1.53 |
SEA-LION v4 (Qwen VL) 8B AISG | 53.53 ± 0.17 | 62.23 ± 0.35 | 71.57 ± 0.19 | 52.89 ± 0.64 |
Gemma 4 (E2B) 5B | 52.92 ± 0.19 | 24.48 ± 0.41 | 32.21 ± 0.49 | 16.74 ± 0.73 |
Qwen 3 VL 8B Alibaba | 49.60 ± 0.17 | 51.95 ± 0.44 | 58.51 ± 0.59 | 45.38 ± 0.70 |
SEA-LION v3 (Llama) 8B AISG | 46.95 ± 0.25 | 23.82 ± 1.39 | 20.60 ± 1.81 | 27.05 ± 1.79 |
Gemma 3 4B | 44.25 ± 0.19 | 14.93 ± 0.43 | 17.63 ± 0.50 | 12.22 ± 0.66 |
MERaLiON 2 10B A*STAR | 43.79 ± 0.24 | 62.28 ± 0.61 | 68.10 ± 0.60 | 56.46 ± 1.06 |
SEA-LION v4 (Qwen VL) 4B AISG | 43.28 ± 0.19 | 50.08 ± 0.26 | 56.17 ± 0.42 | 43.99 ± 0.29 |
SEA-LION v4 (Apertus) 8B AISG | 38.82 ± 0.27 | 36.02 ± 0.50 | 56.23 ± 0.78 | 15.80 ± 0.83 |
Qwen 3 VL 4B Alibaba | 38.44 ± 0.22 | 32.30 ± 0.41 | 49.04 ± 0.75 | 15.56 ± 0.35 |
Qwen 3.5 4B Alibaba | 36.74 ± 0.23 | 18.01 ± 0.76 | 36.03 ± 1.52 | 0.00 ± 0.00 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 33.59 ± 0.42 | 4.68 ± 1.01 | 3.49 ± 1.25 | 5.87 ± 1.29 |
Olmo 3.1 32B AI2 | 30.81 ± 0.23 | 13.01 ± 0.60 | 26.03 ± 1.20 | 0.00 ± 0.00 |
GLM 4.7 Flash 30B MoE Z.ai | 24.67 ± 0.28 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Apertus 8B Swiss AI | 22.37 ± 0.31 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Llama 3.1 8B Meta | 21.22 ± 0.18 | 4.31 ± 0.65 | 3.84 ± 0.99 | 4.77 ± 0.71 |
Llama 3.2 3B Meta | 16.86 ± 0.20 | 1.84 ± 0.48 | 3.01 ± 0.90 | 0.67 ± 0.38 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 16.26 ± 0.20 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
MERaLiON 2 3B A*STAR | 12.51 ± 0.24 | 2.44 ± 0.54 | 4.88 ± 1.07 | 0.00 ± 0.00 |
Tiny Aya Global 3B CohereLabs | 5.62 ± 0.17 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Tiny Aya Water 3B CohereLabs | 4.42 ± 0.16 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |

Model | TA | Multi-Turn Chat | SEA-MT-Bench (LLM Judge) |
|---|---|---|---|
Gemma 4 31B | 75.41 ± 0.16 | 78.27 ± 0.44 | 78.27 ± 0.44 |
SEA-LION v4.5 (Qwen) 27B AISG | 75.00 ± 0.14 | 79.74 ± 0.38 | 79.74 ± 0.38 |
Qwen 3.5 27B Alibaba | 72.77 ± 0.18 | 79.86 ± 0.39 | 79.86 ± 0.39 |
Qwen 3.5 122B MoE Alibaba | 72.23 ± 0.14 | 77.74 ± 0.47 | 77.74 ± 0.47 |
Gemma 4 26B MoE | 72.20 ± 0.16 | 77.71 ± 0.51 | 77.71 ± 0.51 |
SEA-LION v4 (Qwen) 32B AISG | 71.47 ± 0.13 | 70.51 ± 0.38 | 70.51 ± 0.38 |
SEA-LION v4 (Gemma) 27B AISG | 69.66 ± 0.20 | 69.50 ± 0.47 | 69.50 ± 0.47 |
Gemma 3 27B | 69.47 ± 0.19 | 67.91 ± 0.48 | 67.91 ± 0.48 |
Qwen 3.6 27B Alibaba | 69.04 ± 0.16 | 76.97 ± 0.55 | 76.97 ± 0.55 |
Qwen 3 VL 32B Alibaba | 68.99 ± 0.16 | 71.47 ± 0.50 | 71.47 ± 0.50 |
Qwen 3.6 35B MoE Alibaba | 68.62 ± 0.19 | 75.38 ± 0.58 | 75.38 ± 0.58 |
Llama 4 Scout 109B MoE Meta | 67.81 ± 0.14 | 56.01 ± 0.45 | 56.01 ± 0.45 |
Mistral Medium 3.5 128B Mistral AI | 67.37 ± 0.19 | 72.84 ± 0.53 | 72.84 ± 0.53 |
Gemma 3 12B | 66.51 ± 0.08 | 63.95 ± 0.48 | 63.95 ± 0.48 |
Gemma 4 (E4B) 8B | 65.96 ± 0.18 | 71.92 ± 0.42 | 71.92 ± 0.42 |
SEA-LION v3 (Llama) 70B AISG | 65.46 ± 0.30 | 60.62 ± 0.56 | 60.62 ± 0.56 |
Qwen 3.5 35B MoE Alibaba | 65.00 ± 0.22 | 75.16 ± 0.37 | 75.16 ± 0.37 |
Llama 3.3 70B Meta | 62.71 ± 0.17 | 48.51 ± 0.63 | 48.51 ± 0.63 |
SEA-LION v3 (Gemma 2) 9B AISG | 61.93 ± 0.22 | 56.26 ± 0.51 | 56.26 ± 0.51 |
SEA-LION v4 (Gemma VL) 4B AISG | 55.10 ± 0.19 | 57.48 ± 0.35 | 57.48 ± 0.35 |
Qwen 3.5 9B Alibaba | 54.40 ± 0.29 | 63.45 ± 0.62 | 63.45 ± 0.62 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 53.85 ± 0.20 | 61.12 ± 0.46 | 61.12 ± 0.46 |
Mistral Small 4 119B MoE Mistral AI | 53.83 ± 0.30 | 65.21 ± 0.71 | 65.21 ± 0.71 |
SEA-LION v4 (Qwen VL) 8B AISG | 53.53 ± 0.17 | 46.62 ± 0.64 | 46.62 ± 0.64 |
Gemma 4 (E2B) 5B | 52.92 ± 0.19 | 63.36 ± 0.54 | 63.36 ± 0.54 |
Qwen 3 VL 8B Alibaba | 49.60 ± 0.17 | 46.83 ± 0.44 | 46.83 ± 0.44 |
SEA-LION v3 (Llama) 8B AISG | 46.95 ± 0.25 | 46.03 ± 0.60 | 46.03 ± 0.60 |
Gemma 3 4B | 44.25 ± 0.19 | 53.57 ± 0.48 | 53.57 ± 0.48 |
MERaLiON 2 10B A*STAR | 43.79 ± 0.24 | 29.21 ± 0.68 | 29.21 ± 0.68 |
SEA-LION v4 (Qwen VL) 4B AISG | 43.28 ± 0.19 | 33.91 ± 0.55 | 33.91 ± 0.55 |
SEA-LION v4 (Apertus) 8B AISG | 38.82 ± 0.27 | 33.59 ± 0.52 | 33.59 ± 0.52 |
Qwen 3 VL 4B Alibaba | 38.44 ± 0.22 | 29.84 ± 0.66 | 29.84 ± 0.66 |
Qwen 3.5 4B Alibaba | 36.74 ± 0.23 | 41.56 ± 0.73 | 41.56 ± 0.73 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 33.59 ± 0.42 | 64.95 ± 0.99 | 64.95 ± 0.99 |
Olmo 3.1 32B AI2 | 30.81 ± 0.23 | 46.51 ± 0.59 | 46.51 ± 0.59 |
GLM 4.7 Flash 30B MoE Z.ai | 24.67 ± 0.28 | 38.41 ± 0.61 | 38.41 ± 0.61 |
Apertus 8B Swiss AI | 22.37 ± 0.31 | 29.27 ± 0.68 | 29.27 ± 0.68 |
Llama 3.1 8B Meta | 21.22 ± 0.18 | 24.53 ± 0.59 | 24.53 ± 0.59 |
Llama 3.2 3B Meta | 16.86 ± 0.20 | 17.81 ± 0.58 | 17.81 ± 0.58 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 16.26 ± 0.20 | 30.23 ± 0.77 | 30.23 ± 0.77 |
MERaLiON 2 3B A*STAR | 12.51 ± 0.24 | 13.33 ± 0.50 | 13.33 ± 0.50 |
Tiny Aya Global 3B CohereLabs | 5.62 ± 0.17 | 3.84 ± 0.28 | 3.84 ± 0.28 |
Tiny Aya Water 3B CohereLabs | 4.42 ± 0.16 | 3.91 ± 0.31 | 3.91 ± 0.31 |

Model | TA | NLG | Summarization | Translations |
|---|---|---|---|---|
Gemma 4 31B | 75.41 ± 0.16 | 51.46 ± 0.04 | 13.23 ± 0.08 | 89.69 ± 0.02 |
SEA-LION v4.5 (Qwen) 27B AISG | 75.00 ± 0.14 | 50.51 ± 0.05 | 12.11 ± 0.10 | 88.91 ± 0.02 |
Qwen 3.5 27B Alibaba | 72.77 ± 0.18 | 49.50 ± 0.07 | 11.59 ± 0.12 | 87.41 ± 0.02 |
Qwen 3.5 122B MoE Alibaba | 72.23 ± 0.14 | 49.93 ± 0.07 | 11.94 ± 0.12 | 87.91 ± 0.03 |
Gemma 4 26B MoE | 72.20 ± 0.16 | 50.40 ± 0.06 | 12.12 ± 0.11 | 88.67 ± 0.02 |
SEA-LION v4 (Qwen) 32B AISG | 71.47 ± 0.13 | 50.05 ± 0.04 | 13.68 ± 0.09 | 86.42 ± 0.02 |
SEA-LION v4 (Gemma) 27B AISG | 69.66 ± 0.20 | 50.37 ± 0.05 | 11.35 ± 0.09 | 89.39 ± 0.02 |
Gemma 3 27B | 69.47 ± 0.19 | 50.09 ± 0.05 | 11.30 ± 0.10 | 88.88 ± 0.02 |
Qwen 3.6 27B Alibaba | 69.04 ± 0.16 | 48.41 ± 0.08 | 12.15 ± 0.14 | 84.67 ± 0.06 |
Qwen 3 VL 32B Alibaba | 68.99 ± 0.16 | 45.22 ± 0.07 | 12.21 ± 0.10 | 78.23 ± 0.08 |
Qwen 3.6 35B MoE Alibaba | 68.62 ± 0.19 | 48.29 ± 0.10 | 12.35 ± 0.19 | 84.22 ± 0.06 |
Llama 4 Scout 109B MoE Meta | 67.81 ± 0.14 | 50.07 ± 0.07 | 14.43 ± 0.13 | 85.72 ± 0.02 |
Mistral Medium 3.5 128B Mistral AI | 67.37 ± 0.19 | 46.25 ± 0.08 | 11.43 ± 0.13 | 81.06 ± 0.09 |
Gemma 3 12B | 66.51 ± 0.08 | 49.68 ± 0.00 | 11.11 ± 0.00 | 88.26 ± 0.00 |
Gemma 4 (E4B) 8B | 65.96 ± 0.18 | 48.67 ± 0.09 | 10.73 ± 0.16 | 86.61 ± 0.04 |
SEA-LION v3 (Llama) 70B AISG | 65.46 ± 0.30 | 49.33 ± 0.08 | 13.98 ± 0.14 | 84.68 ± 0.04 |
Qwen 3.5 35B MoE Alibaba | 65.00 ± 0.22 | 46.88 ± 0.07 | 10.44 ± 0.15 | 83.32 ± 0.06 |
Llama 3.3 70B Meta | 62.71 ± 0.17 | 47.06 ± 0.09 | 14.08 ± 0.17 | 80.04 ± 0.07 |
SEA-LION v3 (Gemma 2) 9B AISG | 61.93 ± 0.22 | 44.94 ± 0.09 | 12.91 ± 0.14 | 76.97 ± 0.08 |
SEA-LION v4 (Gemma VL) 4B AISG | 55.10 ± 0.19 | 43.75 ± 0.05 | 6.22 ± 0.10 | 81.29 ± 0.04 |
Qwen 3.5 9B Alibaba | 54.40 ± 0.29 | 36.87 ± 0.09 | 9.45 ± 0.12 | 64.29 ± 0.12 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 53.85 ± 0.20 | 41.07 ± 0.07 | 10.70 ± 0.09 | 71.44 ± 0.13 |
Mistral Small 4 119B MoE Mistral AI | 53.83 ± 0.30 | 39.80 ± 0.14 | 11.01 ± 0.15 | 68.60 ± 0.18 |
SEA-LION v4 (Qwen VL) 8B AISG | 53.53 ± 0.17 | 42.84 ± 0.05 | 11.92 ± 0.08 | 73.77 ± 0.07 |
Gemma 4 (E2B) 5B | 52.92 ± 0.19 | 42.79 ± 0.07 | 10.40 ± 0.10 | 75.18 ± 0.08 |
Qwen 3 VL 8B Alibaba | 49.60 ± 0.17 | 34.27 ± 0.08 | 7.51 ± 0.12 | 61.03 ± 0.12 |
SEA-LION v3 (Llama) 8B AISG | 46.95 ± 0.25 | 44.26 ± 0.08 | 11.45 ± 0.13 | 77.07 ± 0.09 |
Gemma 3 4B | 44.25 ± 0.19 | 44.82 ± 0.06 | 8.85 ± 0.11 | 80.80 ± 0.03 |
MERaLiON 2 10B A*STAR | 43.79 ± 0.24 | 29.71 ± 0.11 | 6.88 ± 0.14 | 52.55 ± 0.14 |
SEA-LION v4 (Qwen VL) 4B AISG | 43.28 ± 0.19 | 21.67 ± 0.03 | 0.05 ± 0.00 | 43.29 ± 0.06 |
SEA-LION v4 (Apertus) 8B AISG | 38.82 ± 0.27 | 39.95 ± 0.07 | 7.15 ± 0.13 | 72.75 ± 0.08 |
Qwen 3 VL 4B Alibaba | 38.44 ± 0.22 | 18.13 ± 0.01 | 0.00 ± 0.00 | 36.26 ± 0.03 |
Qwen 3.5 4B Alibaba | 36.74 ± 0.23 | 17.85 ± 0.14 | 5.08 ± 0.17 | 30.62 ± 0.22 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 33.59 ± 0.42 | 26.02 ± 0.08 | 5.92 ± 0.15 | 46.12 ± 0.09 |
Olmo 3.1 32B AI2 | 30.81 ± 0.23 | 40.91 ± 0.07 | 10.78 ± 0.11 | 71.04 ± 0.07 |
GLM 4.7 Flash 30B MoE Z.ai | 24.67 ± 0.28 | 35.58 ± 0.11 | 4.40 ± 0.23 | 66.77 ± 0.10 |
Apertus 8B Swiss AI | 22.37 ± 0.31 | 30.19 ± 0.13 | 7.75 ± 0.18 | 52.64 ± 0.17 |
Llama 3.1 8B Meta | 21.22 ± 0.18 | 30.08 ± 0.15 | 12.15 ± 0.19 | 48.01 ± 0.18 |
Llama 3.2 3B Meta | 16.86 ± 0.20 | 27.19 ± 0.16 | 10.14 ± 0.28 | 44.25 ± 0.15 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 16.26 ± 0.20 | 12.42 ± 0.08 | 1.93 ± 0.15 | 22.90 ± 0.13 |
MERaLiON 2 3B A*STAR | 12.51 ± 0.24 | 22.19 ± 0.10 | 4.67 ± 0.14 | 39.70 ± 0.12 |
Tiny Aya Global 3B CohereLabs | 5.62 ± 0.17 | 5.90 ± 0.07 | 0.40 ± 0.02 | 11.40 ± 0.13 |
Tiny Aya Water 3B CohereLabs | 4.42 ± 0.16 | 4.44 ± 0.05 | 0.19 ± 0.02 | 8.70 ± 0.10 |

Model | TA | NLR | Causal Reasoning | Natural Language Inference |
|---|---|---|---|---|
Gemma 4 31B | 75.41 ± 0.16 | 75.31 ± 0.05 | 92.08 ± 0.10 | 58.54 ± 0.08 |
SEA-LION v4.5 (Qwen) 27B AISG | 75.00 ± 0.14 | 73.53 ± 0.19 | 93.67 ± 0.32 | 53.39 ± 0.25 |
Qwen 3.5 27B Alibaba | 72.77 ± 0.18 | 68.91 ± 0.27 | 90.24 ± 0.40 | 47.58 ± 0.37 |
Qwen 3.5 122B MoE Alibaba | 72.23 ± 0.14 | 66.47 ± 0.16 | 88.49 ± 0.26 | 44.45 ± 0.22 |
Gemma 4 26B MoE | 72.20 ± 0.16 | 68.96 ± 0.13 | 88.09 ± 0.20 | 49.83 ± 0.11 |
SEA-LION v4 (Qwen) 32B AISG | 71.47 ± 0.13 | 69.32 ± 0.09 | 87.24 ± 0.08 | 51.40 ± 0.14 |
SEA-LION v4 (Gemma) 27B AISG | 69.66 ± 0.20 | 71.50 ± 0.09 | 88.56 ± 0.15 | 54.45 ± 0.13 |
Gemma 3 27B | 69.47 ± 0.19 | 72.18 ± 0.10 | 89.25 ± 0.16 | 55.11 ± 0.11 |
Qwen 3.6 27B Alibaba | 69.04 ± 0.16 | 56.35 ± 0.36 | 88.40 ± 0.44 | 24.30 ± 0.53 |
Qwen 3 VL 32B Alibaba | 68.99 ± 0.16 | 63.63 ± 0.18 | 82.43 ± 0.21 | 44.84 ± 0.23 |
Qwen 3.6 35B MoE Alibaba | 68.62 ± 0.19 | 62.38 ± 0.28 | 85.29 ± 0.51 | 39.46 ± 0.31 |
Llama 4 Scout 109B MoE Meta | 67.81 ± 0.14 | 64.08 ± 0.07 | 84.35 ± 0.10 | 43.81 ± 0.15 |
Mistral Medium 3.5 128B Mistral AI | 67.37 ± 0.19 | 60.01 ± 0.24 | 86.04 ± 0.37 | 33.99 ± 0.35 |
Gemma 3 12B | 66.51 ± 0.08 | 63.86 ± 0.00 | 85.20 ± 0.00 | 42.51 ± 0.00 |
Gemma 4 (E4B) 8B | 65.96 ± 0.18 | 54.98 ± 0.32 | 79.87 ± 0.46 | 30.09 ± 0.31 |
SEA-LION v3 (Llama) 70B AISG | 65.46 ± 0.30 | 62.97 ± 0.42 | 83.85 ± 0.55 | 42.09 ± 0.60 |
Qwen 3.5 35B MoE Alibaba | 65.00 ± 0.22 | 58.54 ± 0.37 | 77.15 ± 0.66 | 39.94 ± 0.42 |
Llama 3.3 70B Meta | 62.71 ± 0.17 | 64.65 ± 0.17 | 84.41 ± 0.29 | 44.90 ± 0.23 |
SEA-LION v3 (Gemma 2) 9B AISG | 61.93 ± 0.22 | 53.49 ± 0.25 | 79.52 ± 0.48 | 27.46 ± 0.20 |
SEA-LION v4 (Gemma VL) 4B AISG | 55.10 ± 0.19 | 48.49 ± 0.25 | 56.60 ± 0.49 | 40.37 ± 0.16 |
Qwen 3.5 9B Alibaba | 54.40 ± 0.29 | 34.26 ± 0.57 | 54.76 ± 1.06 | 13.76 ± 0.67 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 53.85 ± 0.20 | 38.74 ± 0.25 | 51.84 ± 0.43 | 25.64 ± 0.22 |
Mistral Small 4 119B MoE Mistral AI | 53.83 ± 0.30 | 47.16 ± 0.56 | 63.92 ± 0.76 | 30.40 ± 0.63 |
SEA-LION v4 (Qwen VL) 8B AISG | 53.53 ± 0.17 | 27.36 ± 0.41 | 39.44 ± 0.82 | 15.28 ± 0.13 |
Gemma 4 (E2B) 5B | 52.92 ± 0.19 | 37.14 ± 0.22 | 52.59 ± 0.37 | 21.69 ± 0.25 |
Qwen 3 VL 8B Alibaba | 49.60 ± 0.17 | 24.99 ± 0.40 | 36.89 ± 0.77 | 13.09 ± 0.09 |
SEA-LION v3 (Llama) 8B AISG | 46.95 ± 0.25 | 23.67 ± 0.55 | 24.81 ± 0.94 | 22.53 ± 0.53 |
Gemma 3 4B | 44.25 ± 0.19 | 16.08 ± 0.08 | 0.00 ± 0.00 | 32.16 ± 0.17 |
MERaLiON 2 10B A*STAR | 43.79 ± 0.24 | 46.02 ± 0.23 | 73.87 ± 0.46 | 18.18 ± 0.25 |
SEA-LION v4 (Qwen VL) 4B AISG | 43.28 ± 0.19 | 21.79 ± 0.20 | 43.57 ± 0.40 | 0.00 ± 0.00 |
SEA-LION v4 (Apertus) 8B AISG | 38.82 ± 0.27 | 1.40 ± 0.15 | 0.00 ± 0.00 | 2.80 ± 0.31 |
Qwen 3 VL 4B Alibaba | 38.44 ± 0.22 | 24.94 ± 0.33 | 35.71 ± 0.68 | 14.18 ± 0.11 |
Qwen 3.5 4B Alibaba | 36.74 ± 0.23 | 15.64 ± 0.74 | 29.73 ± 1.37 | 1.54 ± 0.41 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 33.59 ± 0.42 | 6.19 ± 0.28 | 0.00 ± 0.00 | 12.38 ± 0.56 |
Olmo 3.1 32B AI2 | 30.81 ± 0.23 | 7.22 ± 0.47 | 14.44 ± 0.94 | 0.00 ± 0.00 |
GLM 4.7 Flash 30B MoE Z.ai | 24.67 ± 0.28 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Apertus 8B Swiss AI | 22.37 ± 0.31 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Llama 3.1 8B Meta | 21.22 ± 0.18 | 0.42 ± 0.17 | 0.00 ± 0.00 | 0.83 ± 0.34 |
Llama 3.2 3B Meta | 16.86 ± 0.20 | 0.02 ± 0.03 | 0.00 ± 0.00 | 0.05 ± 0.06 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 16.26 ± 0.20 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
MERaLiON 2 3B A*STAR | 12.51 ± 0.24 | 1.01 ± 0.34 | 0.73 ± 0.62 | 1.29 ± 0.18 |
Tiny Aya Global 3B CohereLabs | 5.62 ± 0.17 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Tiny Aya Water 3B CohereLabs | 4.42 ± 0.16 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |

Model | TA | NLU | Question Answering | Sentiment Analysis |
|---|---|---|---|---|
Gemma 4 31B | 75.41 ± 0.16 | 83.40 ± 0.16 | 70.04 ± 0.33 | 96.76 ± 0.03 |
SEA-LION v4.5 (Qwen) 27B AISG | 75.00 ± 0.14 | 83.74 ± 0.14 | 71.48 ± 0.28 | 96.01 ± 0.06 |
Qwen 3.5 27B Alibaba | 72.77 ± 0.18 | 82.60 ± 0.24 | 69.42 ± 0.46 | 95.77 ± 0.14 |
Qwen 3.5 122B MoE Alibaba | 72.23 ± 0.14 | 83.54 ± 0.22 | 71.11 ± 0.40 | 95.98 ± 0.12 |
Gemma 4 26B MoE | 72.20 ± 0.16 | 81.32 ± 0.30 | 66.78 ± 0.60 | 95.85 ± 0.04 |
SEA-LION v4 (Qwen) 32B AISG | 71.47 ± 0.13 | 83.98 ± 0.11 | 72.64 ± 0.22 | 95.31 ± 0.05 |
SEA-LION v4 (Gemma) 27B AISG | 69.66 ± 0.20 | 82.61 ± 0.28 | 68.02 ± 0.55 | 97.20 ± 0.06 |
Gemma 3 27B | 69.47 ± 0.19 | 81.50 ± 0.29 | 65.68 ± 0.58 | 97.32 ± 0.05 |
Qwen 3.6 27B Alibaba | 69.04 ± 0.16 | 80.50 ± 0.27 | 69.98 ± 0.46 | 91.02 ± 0.22 |
Qwen 3 VL 32B Alibaba | 68.99 ± 0.16 | 80.04 ± 0.17 | 65.96 ± 0.36 | 94.12 ± 0.07 |
Qwen 3.6 35B MoE Alibaba | 68.62 ± 0.19 | 80.76 ± 0.28 | 67.19 ± 0.57 | 94.33 ± 0.15 |
Llama 4 Scout 109B MoE Meta | 67.81 ± 0.14 | 82.78 ± 0.09 | 68.78 ± 0.18 | 96.77 ± 0.03 |
Mistral Medium 3.5 128B Mistral AI | 67.37 ± 0.19 | 82.03 ± 0.33 | 67.66 ± 0.60 | 96.40 ± 0.15 |
Gemma 3 12B | 66.51 ± 0.08 | 82.09 ± 0.00 | 68.97 ± 0.00 | 95.20 ± 0.00 |
Gemma 4 (E4B) 8B | 65.96 ± 0.18 | 79.96 ± 0.31 | 64.45 ± 0.60 | 95.48 ± 0.10 |
SEA-LION v3 (Llama) 70B AISG | 65.46 ± 0.30 | 81.75 ± 0.25 | 67.78 ± 0.47 | 95.71 ± 0.20 |
Qwen 3.5 35B MoE Alibaba | 65.00 ± 0.22 | 79.61 ± 0.32 | 65.34 ± 0.60 | 93.88 ± 0.24 |
Llama 3.3 70B Meta | 62.71 ± 0.17 | 81.24 ± 0.15 | 66.87 ± 0.32 | 95.60 ± 0.05 |
SEA-LION v3 (Gemma 2) 9B AISG | 61.93 ± 0.22 | 81.90 ± 0.26 | 68.80 ± 0.51 | 94.99 ± 0.07 |
SEA-LION v4 (Gemma VL) 4B AISG | 55.10 ± 0.19 | 79.28 ± 0.19 | 62.75 ± 0.36 | 95.82 ± 0.07 |
Qwen 3.5 9B Alibaba | 54.40 ± 0.29 | 75.08 ± 0.36 | 62.14 ± 0.62 | 88.01 ± 0.32 |
SEA-LION v4.5 (Gemma E2B) 5B AISG | 53.85 ± 0.20 | 79.43 ± 0.20 | 63.71 ± 0.39 | 95.15 ± 0.05 |
Mistral Small 4 119B MoE Mistral AI | 53.83 ± 0.30 | 78.83 ± 0.29 | 64.61 ± 0.55 | 93.04 ± 0.15 |
SEA-LION v4 (Qwen VL) 8B AISG | 53.53 ± 0.17 | 80.17 ± 0.20 | 69.84 ± 0.38 | 90.49 ± 0.08 |
Gemma 4 (E2B) 5B | 52.92 ± 0.19 | 78.70 ± 0.17 | 62.08 ± 0.34 | 95.32 ± 0.03 |
Qwen 3 VL 8B Alibaba | 49.60 ± 0.17 | 80.48 ± 0.14 | 69.55 ± 0.29 | 91.42 ± 0.12 |
SEA-LION v3 (Llama) 8B AISG | 46.95 ± 0.25 | 74.50 ± 0.36 | 56.73 ± 0.63 | 92.27 ± 0.26 |
Gemma 3 4B | 44.25 ± 0.19 | 73.83 ± 0.25 | 54.33 ± 0.51 | 93.32 ± 0.07 |
MERaLiON 2 10B A*STAR | 43.79 ± 0.24 | 49.10 ± 0.47 | 45.64 ± 0.62 | 52.57 ± 0.78 |
SEA-LION v4 (Qwen VL) 4B AISG | 43.28 ± 0.19 | 77.37 ± 0.15 | 63.33 ± 0.26 | 91.41 ± 0.07 |
SEA-LION v4 (Apertus) 8B AISG | 38.82 ± 0.27 | 74.37 ± 0.27 | 61.32 ± 0.36 | 87.43 ± 0.32 |
Qwen 3 VL 4B Alibaba | 38.44 ± 0.22 | 72.69 ± 0.16 | 60.48 ± 0.28 | 84.91 ± 0.14 |
Qwen 3.5 4B Alibaba | 36.74 ± 0.23 | 72.27 ± 0.42 | 58.89 ± 0.77 | 85.65 ± 0.36 |
NVIDIA Nemotron 3 Super 120B MoE NVIDIA | 33.59 ± 0.42 | 53.15 ± 0.55 | 56.13 ± 0.92 | 50.18 ± 0.63 |
Olmo 3.1 32B AI2 | 30.81 ± 0.23 | 22.81 ± 0.35 | 45.32 ± 0.62 | 0.31 ± 0.25 |
GLM 4.7 Flash 30B MoE Z.ai | 24.67 ± 0.28 | 21.18 ± 0.62 | 42.36 ± 1.24 | 0.00 ± 0.00 |
Apertus 8B Swiss AI | 22.37 ± 0.31 | 29.14 ± 0.72 | 41.66 ± 1.21 | 16.62 ± 0.93 |
Llama 3.1 8B Meta | 21.22 ± 0.18 | 25.39 ± 0.58 | 50.79 ± 1.16 | 0.00 ± 0.00 |
Llama 3.2 3B Meta | 16.86 ± 0.20 | 17.64 ± 0.44 | 35.29 ± 0.88 | 0.00 ± 0.00 |
NVIDIA Nemotron 3 Nano 30B MoE NVIDIA | 16.26 ± 0.20 | 12.34 ± 0.30 | 24.67 ± 0.60 | 0.00 ± 0.00 |
MERaLiON 2 3B A*STAR | 12.51 ± 0.24 | 7.50 ± 0.39 | 15.00 ± 0.79 | 0.00 ± 0.00 |
Tiny Aya Global 3B CohereLabs | 5.62 ± 0.17 | 1.74 ± 0.25 | 3.48 ± 0.51 | 0.00 ± 0.00 |
Tiny Aya Water 3B CohereLabs | 4.42 ± 0.16 | 1.10 ± 0.16 | 2.20 ± 0.31 | 0.00 ± 0.00 |
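In the task tables, each competency score appears to be the unweighted mean of its task scores. Instruction Following and Multi-Turn Chat have a single task each, so their two columns are identical. This is an observation from the published numbers rather than a documented rule. The sketch below checks it for the Gemma 4 31B row; `check_competency` and its tolerance are hypothetical helpers.

```python
# Competency and task scores for Gemma 4 31B, copied from the tables above.
tasks_by_competency = {
    "Linguistic Diagnostics": {"reported": 75.21, "tasks": [91.52, 58.90]},
    "NLG": {"reported": 51.46, "tasks": [13.23, 89.69]},
    "NLR": {"reported": 75.31, "tasks": [92.08, 58.54]},
    "NLU": {"reported": 83.40, "tasks": [70.04, 96.76]},
}


def check_competency(reported, tasks, tol=0.02):
    """Recompute a competency as the mean of its tasks and compare.

    The tolerance absorbs the two-decimal rounding of published values.
    """
    recomputed = sum(tasks) / len(tasks)
    return recomputed, abs(recomputed - reported) <= tol


results = {
    name: check_competency(entry["reported"], entry["tasks"])
    for name, entry in tasks_by_competency.items()
}
for name, (recomputed, ok) in results.items():
    print(f"{name}: recomputed {recomputed:.2f}, consistent={ok}")
```

One consequence visible in the NLG table: the top models sit near 50 on NLG because their Summarization scores are in the low teens while their Translations scores are close to 89.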