Malay Performance
Malay Scores by Model
Average of 30 bootstraps; 95% confidence intervals (CI) are shown.
Model Size: ≤200B
Open instruct models only
[Bar chart: overall Malay score with 95% CI for each of the 54 models, labelled by model size and sorted from highest (63.65 ± 0.14) to lowest (23.78 ± 0.30). The same scores appear in the MS column of the tables below.]
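The page reports each score as an average over 30 bootstraps with a 95% CI, but does not say what is resampled or how the interval is computed. The sketch below is therefore only an illustration under stated assumptions: it resamples hypothetical per-example scores with replacement, and it uses a normal approximation (1.96 times the standard deviation of the bootstrap means) for the half-width. The function name `bootstrap_mean_ci` and its parameters are made up for this example.

```python
import random
import statistics


def bootstrap_mean_ci(scores, n_boot=30, seed=0):
    """Return (mean of bootstrap means, approximate 95% half-width).

    Assumptions (not confirmed by the leaderboard): per-example scores are
    resampled with replacement, and the interval uses a normal approximation.
    """
    rng = random.Random(seed)
    n = len(scores)
    boot_means = []
    for _ in range(n_boot):
        # Draw n items with replacement and record the resample's mean.
        resample = [scores[rng.randrange(n)] for _ in range(n)]
        boot_means.append(statistics.fmean(resample))
    mean = statistics.fmean(boot_means)
    half_width = 1.96 * statistics.stdev(boot_means)
    return mean, half_width


if __name__ == "__main__":
    # Hypothetical per-example scores on a 0-100 scale.
    example_scores = [0.0, 100.0] * 50
    mean, half_width = bootstrap_mean_ci(example_scores)
    print(f"{mean:.2f} ± {half_width:.2f}")
```

With only 30 resamples the interval estimate is itself noisy, so small differences in the ± column should be read with some caution.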
Malay Competencies
Average of 30 bootstraps; 95% confidence intervals (CI) are shown.
Model Size: ≤200B
Open instruct models only
Model | MS | Instruction Following | Multi-Turn Chat | NLG | NLU | Safety | Knowledge |
|---|---|---|---|---|---|---|---|
Qwen 3 VL 32B Alibaba | 63.65 ± 0.14 | 82.35 ± 0.55 | 54.32 ± 0.50 | 90.62 ± 0.02 | 69.36 ± 0.07 | 14.40 ± 0.20 | 70.84 ± 0.24 |
Qwen 3 Next 80B MoE Alibaba | 62.80 ± 0.12 | 83.33 ± 0.32 | 57.47 ± 0.62 | 90.14 ± 0.01 | 67.90 ± 0.13 | 7.58 ± 0.17 | 70.36 ± 0.19 |
SEA-LION v4 (Qwen) 32B AISG | 61.36 ± 0.14 | 81.21 ± 0.50 | 42.30 ± 0.70 | 90.27 ± 0.02 | 68.42 ± 0.09 | 16.31 ± 0.13 | 69.65 ± 0.18 |
Qwen 3 30B MoE Alibaba | 61.13 ± 0.14 | 81.46 ± 0.47 | 51.48 ± 0.57 | 90.07 ± 0.02 | 67.60 ± 0.05 | 13.06 ± 0.11 | 63.12 ± 0.26 |
SEA-LION v4 (Gemma) 27B AISG | 61.10 ± 0.16 | 83.37 ± 0.65 | 44.78 ± 0.80 | 91.71 ± 0.02 | 69.31 ± 0.10 | 14.57 ± 0.21 | 62.86 ± 0.28 |
Gemma 3 27B | 60.92 ± 0.17 | 81.65 ± 0.59 | 45.89 ± 0.56 | 91.83 ± 0.01 | 69.16 ± 0.08 | 13.86 ± 0.25 | 63.13 ± 0.34 |
SEA-LION v3 (Llama) 70B AISG | 59.94 ± 0.17 | 85.62 ± 0.51 | 25.55 ± 0.75 | 91.01 ± 0.02 | 69.71 ± 0.14 | 13.62 ± 0.36 | 74.12 ± 0.44 |
Qwen 3 32B Alibaba | 59.67 ± 0.17 | 80.38 ± 0.62 | 41.64 ± 0.74 | 88.97 ± 0.03 | 66.47 ± 0.13 | 13.18 ± 0.29 | 67.38 ± 0.27 |
Qwen 2.5 72B Alibaba | 59.48 ± 0.14 | 79.24 ± 0.58 | 32.76 ± 0.55 | 89.60 ± 0.03 | 69.11 ± 0.10 | 15.39 ± 0.22 | 70.78 ± 0.32 |
Llama 3.3 70B Meta | 58.80 ± 0.15 | 86.51 ± 0.40 | 17.07 ± 0.57 | 90.10 ± 0.02 | 69.75 ± 0.09 | 17.48 ± 0.19 | 71.86 ± 0.19 |
SEA-LION v4 (Qwen VL) 8B AISG | 58.46 ± 0.11 | 84.32 ± 0.37 | 40.34 ± 0.58 | 88.80 ± 0.03 | 66.81 ± 0.08 | 16.07 ± 0.23 | 54.39 ± 0.28 |
Qwen 3 VL 8B Alibaba | 58.38 ± 0.14 | 83.46 ± 0.34 | 39.93 ± 0.71 | 88.26 ± 0.03 | 66.73 ± 0.10 | 15.87 ± 0.18 | 56.05 ± 0.28 |
Gemma 3 12B | 57.96 ± 0.12 | 81.65 ± 0.67 | 38.02 ± 0.60 | 91.02 ± 0.02 | 67.67 ± 0.07 | 11.57 ± 0.18 | 57.83 ± 0.25 |
Tulu 3 70B AI2 | 57.39 ± 0.19 | 80.00 ± 0.65 | 27.60 ± 0.58 | 91.06 ± 0.02 | 67.60 ± 0.20 | 11.11 ± 0.27 | 66.97 ± 0.30 |
Llama 4 Scout 109B MoE Meta | 57.38 ± 0.11 | 84.38 ± 0.46 | 21.34 ± 0.52 | 91.03 ± 0.02 | 68.72 ± 0.07 | 14.05 ± 0.13 | 64.79 ± 0.14 |
Llama 3.1 70B Meta | 55.67 ± 0.18 | 75.05 ± 0.86 | 13.97 ± 0.48 | 90.34 ± 0.02 | 69.16 ± 0.17 | 15.90 ± 0.24 | 69.58 ± 0.35 |
Qwen 3 14B Alibaba | 55.04 ± 0.11 | 78.35 ± 0.42 | 31.61 ± 0.52 | 87.44 ± 0.02 | 66.47 ± 0.11 | 12.54 ± 0.22 | 53.83 ± 0.26 |
Command A 03-2025 111B CohereLabs | 54.76 ± 0.17 | 84.79 ± 0.63 | 40.92 ± 0.72 | 87.16 ± 0.07 | 49.65 ± 0.38 | 0.00 ± 0.00 | 66.04 ± 0.35 |
SEA-LION v3 (Gemma 2) 9B AISG | 54.70 ± 0.18 | 81.05 ± 0.85 | 25.46 ± 0.63 | 90.38 ± 0.03 | 65.12 ± 0.15 | 12.19 ± 0.41 | 53.98 ± 0.44 |
Gemma 2 27B | 54.06 ± 0.22 | 74.00 ± 0.80 | 16.03 ± 0.55 | 90.48 ± 0.03 | 68.19 ± 0.12 | 14.11 ± 0.53 | 61.53 ± 0.38 |
Qwen 3 8B Alibaba | 53.83 ± 0.15 | 79.05 ± 0.63 | 26.29 ± 0.66 | 87.19 ± 0.04 | 63.43 ± 0.15 | 15.30 ± 0.17 | 51.70 ± 0.33 |
Qwen 2.5 32B Alibaba | 53.71 ± 0.13 | 79.24 ± 0.64 | 19.25 ± 0.44 | 86.01 ± 0.04 | 67.36 ± 0.09 | 9.77 ± 0.22 | 60.61 ± 0.25 |
SEA-LION v4 (Qwen VL) 4B AISG | 53.58 ± 0.15 | 80.57 ± 0.53 | 29.77 ± 0.69 | 86.22 ± 0.02 | 64.99 ± 0.05 | 12.92 ± 0.09 | 46.99 ± 0.20 |
Qwen 3 VL 4B Alibaba | 53.23 ± 0.14 | 80.54 ± 0.49 | 30.11 ± 0.56 | 85.94 ± 0.04 | 64.00 ± 0.08 | 12.97 ± 0.11 | 45.81 ± 0.28 |
Mistral Large 2411 123B Mistral AI | 52.73 ± 0.21 | 79.56 ± 0.87 | 21.49 ± 0.49 | 89.19 ± 0.04 | 53.06 ± 0.30 | 14.72 ± 0.54 | 58.38 ± 0.45 |
Qwen 2.5 14B Alibaba | 51.49 ± 0.13 | 74.54 ± 0.61 | 15.00 ± 0.41 | 85.33 ± 0.05 | 65.20 ± 0.09 | 15.65 ± 0.17 | 53.19 ± 0.19 |
SEA-LION v3 (Llama) 8B AISG | 51.32 ± 0.28 | 78.13 ± 0.74 | 20.11 ± 0.82 | 89.46 ± 0.03 | 60.30 ± 0.20 | 15.08 ± 0.29 | 44.86 ± 0.53 |
Gemma 2 9B | 49.72 ± 0.21 | 73.14 ± 0.88 | 12.08 ± 0.44 | 89.08 ± 0.04 | 61.98 ± 0.15 | 10.75 ± 0.28 | 51.31 ± 0.36 |
MERaLiON 2 10B A*STAR | 49.12 ± 0.22 | 69.11 ± 0.95 | 11.61 ± 0.31 | 88.16 ± 0.04 | 62.51 ± 0.20 | 14.04 ± 0.34 | 49.27 ± 0.52 |
Aya Expanse 32B CohereLabs | 48.35 ± 0.20 | 73.87 ± 0.79 | 18.74 ± 0.74 | 86.71 ± 0.04 | 48.22 ± 0.19 | 13.13 ± 0.25 | 49.44 ± 0.26 |
Olmo 2 0325 32B AI2 | 48.28 ± 0.23 | 75.08 ± 1.16 | 11.29 ± 0.48 | 82.23 ± 0.10 | 61.19 ± 0.27 | 12.31 ± 0.18 | 47.58 ± 0.64 |
Qwen 2.5 7B Alibaba | 46.94 ± 0.13 | 69.37 ± 0.68 | 14.51 ± 0.38 | 81.40 ± 0.05 | 56.37 ± 0.08 | 13.91 ± 0.12 | 46.07 ± 0.27 |
ERNIE 4.5 21B MoE Baidu | 45.83 ± 0.18 | 74.57 ± 0.85 | 14.38 ± 0.52 | 88.08 ± 0.04 | 50.07 ± 0.27 | 14.74 ± 0.16 | 33.11 ± 0.64 |
Llama 3.1 8B Meta | 45.49 ± 0.22 | 64.63 ± 1.08 | 10.07 ± 0.37 | 87.30 ± 0.03 | 59.11 ± 0.17 | 15.03 ± 0.24 | 36.82 ± 0.54 |
Sailor2 20B SAIL | 45.19 ± 0.17 | 41.02 ± 0.68 | 25.88 ± 0.68 | 76.16 ± 0.18 | 65.47 ± 0.14 | 12.73 ± 0.12 | 49.87 ± 0.31 |
Mistral Small 3.1 2503 24B Mistral AI | 44.39 ± 0.22 | 64.22 ± 0.99 | 9.71 ± 0.44 | 82.03 ± 0.07 | 55.72 ± 0.26 | 0.00 ± 0.00 | 54.68 ± 0.63 |
Command R+ 08-2024 104B CohereLabs | 44.12 ± 0.26 | 67.81 ± 1.06 | 7.39 ± 0.46 | 84.03 ± 0.07 | 62.48 ± 0.29 | 4.18 ± 0.36 | 38.85 ± 0.76 |
Aya Expanse 8B CohereLabs | 44.12 ± 0.18 | 63.78 ± 0.78 | 13.30 ± 0.65 | 81.71 ± 0.06 | 56.00 ± 0.18 | 12.41 ± 0.25 | 37.53 ± 0.20 |
Sailor2 8B SAIL | 43.90 ± 0.19 | 38.92 ± 0.89 | 25.56 ± 0.43 | 85.24 ± 0.07 | 51.53 ± 0.16 | 19.51 ± 0.31 | 42.67 ± 0.39 |
Llama 3 70B Meta | 43.42 ± 0.10 | 24.10 ± 0.47 | 9.07 ± 0.44 | 89.75 ± 0.02 | 66.39 ± 0.10 | 9.59 ± 0.16 | 61.64 ± 0.29 |
Tulu 3 8B AI2 | 42.86 ± 0.14 | 73.08 ± 0.59 | 12.67 ± 0.48 | 87.26 ± 0.05 | 53.32 ± 0.18 | 0.00 ± 0.00 | 30.81 ± 0.35 |
Apertus 70B Swiss AI | 42.69 ± 0.24 | 62.73 ± 1.01 | 14.80 ± 0.46 | 82.01 ± 0.09 | 48.52 ± 0.28 | 17.96 ± 0.45 | 30.15 ± 0.77 |
phi-4 14B Microsoft | 39.71 ± 0.24 | 61.52 ± 1.09 | 15.11 ± 0.51 | 72.87 ± 0.11 | 39.61 ± 0.40 | 0.00 ± 0.00 | 49.11 ± 0.40 |
Command R 08-2024 32B CohereLabs | 39.00 ± 0.21 | 64.63 ± 0.79 | 5.01 ± 0.39 | 81.29 ± 0.08 | 33.72 ± 0.16 | 10.82 ± 0.58 | 38.56 ± 0.63 |
Apertus 8B Swiss AI | 38.91 ± 0.31 | 65.59 ± 0.97 | 5.79 ± 0.42 | 86.17 ± 0.06 | 35.86 ± 0.29 | 9.26 ± 0.76 | 30.81 ± 0.85 |
Olmo 2 1124 13B AI2 | 38.54 ± 0.19 | 64.92 ± 0.89 | 6.70 ± 0.30 | 71.47 ± 0.14 | 46.86 ± 0.21 | 14.19 ± 0.25 | 27.11 ± 0.58 |
SeaLLMs V3 7B Alibaba-DAMO | 36.82 ± 0.28 | 52.35 ± 1.06 | 8.95 ± 0.40 | 82.02 ± 0.09 | 39.73 ± 0.35 | 9.01 ± 0.68 | 28.88 ± 0.84 |
Llama 3 8B Meta | 35.77 ± 0.16 | 26.89 ± 0.46 | 5.47 ± 0.45 | 84.96 ± 0.06 | 53.66 ± 0.16 | 8.43 ± 0.16 | 35.19 ± 0.49 |
Babel 9B Alibaba-DAMO | 31.66 ± 0.22 | 40.35 ± 0.95 | 3.79 ± 0.33 | 72.51 ± 0.13 | 35.27 ± 0.31 | 9.34 ± 0.63 | 28.69 ± 0.82 |
Olmo 3 7B AI2 | 30.18 ± 0.22 | 57.08 ± 0.96 | 6.11 ± 0.28 | 49.26 ± 0.15 | 40.18 ± 0.18 | 4.97 ± 0.74 | 23.50 ± 0.43 |
Olmo 2 1124 7B AI2 | 29.81 ± 0.22 | 55.02 ± 1.26 | 4.24 ± 0.33 | 59.43 ± 0.13 | 30.84 ± 0.30 | 6.78 ± 0.45 | 22.57 ± 0.50 |
Babel 83B Alibaba-DAMO | 28.79 ± 0.27 | 40.41 ± 1.24 | 7.13 ± 0.58 | 66.77 ± 0.18 | 22.52 ± 0.51 | 0.00 ± 0.00 | 35.89 ± 1.06 |
Command R7B 12-2024 7B CohereLabs | 28.08 ± 0.26 | 48.83 ± 1.43 | 2.08 ± 0.34 | 46.82 ± 0.15 | 45.54 ± 0.26 | 0.00 ± 0.00 | 25.23 ± 0.70 |
Ministral 2410 8B Mistral AI | 23.78 ± 0.30 | 28.03 ± 1.12 | 3.74 ± 0.38 | 57.48 ± 0.17 | 25.65 ± 0.23 | 7.46 ± 0.59 | 20.34 ± 0.93 |
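The published numbers are consistent with the MS column being the unweighted mean of the six competency columns. The page does not state this rule, so it is an inference from the table. The sketch below checks it for the first row; the function name `overall_score` is made up for this example.

```python
from statistics import fmean

# Competency scores for Qwen 3 VL 32B, copied from the first table row above.
QWEN3_VL_32B = {
    "Instruction Following": 82.35,
    "Multi-Turn Chat": 54.32,
    "NLG": 90.62,
    "NLU": 69.36,
    "Safety": 14.40,
    "Knowledge": 70.84,
}


def overall_score(competencies):
    """Unweighted mean of the competency scores (assumed aggregation rule)."""
    return fmean(competencies.values())


if __name__ == "__main__":
    print(f"{overall_score(QWEN3_VL_32B):.2f}")  # 63.65, matching the MS column
```

One consequence of equal weighting is that a near-zero Safety score lowers MS as much as an equally large shortfall in any other competency would.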
Malay Tasks
Average of 30 bootstraps; 95% confidence intervals (CI) are shown.
Model Size: ≤200B
Open instruct models only
Model | MS | Instruction Following | SEA-IFEval |
|---|---|---|---|
Qwen 3 VL 32B Alibaba | 63.65 ± 0.14 | 82.35 ± 0.55 | 82.35 ± 0.55 |
Qwen 3 Next 80B MoE Alibaba | 62.80 ± 0.12 | 83.33 ± 0.32 | 83.33 ± 0.32 |
SEA-LION v4 (Qwen) 32B AISG | 61.36 ± 0.14 | 81.21 ± 0.50 | 81.21 ± 0.50 |
Qwen 3 30B MoE Alibaba | 61.13 ± 0.14 | 81.46 ± 0.47 | 81.46 ± 0.47 |
SEA-LION v4 (Gemma) 27B AISG | 61.10 ± 0.16 | 83.37 ± 0.65 | 83.37 ± 0.65 |
Gemma 3 27B | 60.92 ± 0.17 | 81.65 ± 0.59 | 81.65 ± 0.59 |
SEA-LION v3 (Llama) 70B AISG | 59.94 ± 0.17 | 85.62 ± 0.51 | 85.62 ± 0.51 |
Qwen 3 32B Alibaba | 59.67 ± 0.17 | 80.38 ± 0.62 | 80.38 ± 0.62 |
Qwen 2.5 72B Alibaba | 59.48 ± 0.14 | 79.24 ± 0.58 | 79.24 ± 0.58 |
Llama 3.3 70B Meta | 58.80 ± 0.15 | 86.51 ± 0.40 | 86.51 ± 0.40 |
SEA-LION v4 (Qwen VL) 8B AISG | 58.46 ± 0.11 | 84.32 ± 0.37 | 84.32 ± 0.37 |
Qwen 3 VL 8B Alibaba | 58.38 ± 0.14 | 83.46 ± 0.34 | 83.46 ± 0.34 |
Gemma 3 12B | 57.96 ± 0.12 | 81.65 ± 0.67 | 81.65 ± 0.67 |
Tulu 3 70B AI2 | 57.39 ± 0.19 | 80.00 ± 0.65 | 80.00 ± 0.65 |
Llama 4 Scout 109B MoE Meta | 57.38 ± 0.11 | 84.38 ± 0.46 | 84.38 ± 0.46 |
Llama 3.1 70B Meta | 55.67 ± 0.18 | 75.05 ± 0.86 | 75.05 ± 0.86 |
Qwen 3 14B Alibaba | 55.04 ± 0.11 | 78.35 ± 0.42 | 78.35 ± 0.42 |
Command A 03-2025 111B CohereLabs | 54.76 ± 0.17 | 84.79 ± 0.63 | 84.79 ± 0.63 |
SEA-LION v3 (Gemma 2) 9B AISG | 54.70 ± 0.18 | 81.05 ± 0.85 | 81.05 ± 0.85 |
Gemma 2 27B | 54.06 ± 0.22 | 74.00 ± 0.80 | 74.00 ± 0.80 |
Qwen 3 8B Alibaba | 53.83 ± 0.15 | 79.05 ± 0.63 | 79.05 ± 0.63 |
Qwen 2.5 32B Alibaba | 53.71 ± 0.13 | 79.24 ± 0.64 | 79.24 ± 0.64 |
SEA-LION v4 (Qwen VL) 4B AISG | 53.58 ± 0.15 | 80.57 ± 0.53 | 80.57 ± 0.53 |
Qwen 3 VL 4B Alibaba | 53.23 ± 0.14 | 80.54 ± 0.49 | 80.54 ± 0.49 |
Mistral Large 2411 123B Mistral AI | 52.73 ± 0.21 | 79.56 ± 0.87 | 79.56 ± 0.87 |
Qwen 2.5 14B Alibaba | 51.49 ± 0.13 | 74.54 ± 0.61 | 74.54 ± 0.61 |
SEA-LION v3 (Llama) 8B AISG | 51.32 ± 0.28 | 78.13 ± 0.74 | 78.13 ± 0.74 |
Gemma 2 9B | 49.72 ± 0.21 | 73.14 ± 0.88 | 73.14 ± 0.88 |
MERaLiON 2 10B A*STAR | 49.12 ± 0.22 | 69.11 ± 0.95 | 69.11 ± 0.95 |
Aya Expanse 32B CohereLabs | 48.35 ± 0.20 | 73.87 ± 0.79 | 73.87 ± 0.79 |
Olmo 2 0325 32B AI2 | 48.28 ± 0.23 | 75.08 ± 1.16 | 75.08 ± 1.16 |
Qwen 2.5 7B Alibaba | 46.94 ± 0.13 | 69.37 ± 0.68 | 69.37 ± 0.68 |
ERNIE 4.5 21B MoE Baidu | 45.83 ± 0.18 | 74.57 ± 0.85 | 74.57 ± 0.85 |
Llama 3.1 8B Meta | 45.49 ± 0.22 | 64.63 ± 1.08 | 64.63 ± 1.08 |
Sailor2 20B SAIL | 45.19 ± 0.17 | 41.02 ± 0.68 | 41.02 ± 0.68 |
Mistral Small 3.1 2503 24B Mistral AI | 44.39 ± 0.22 | 64.22 ± 0.99 | 64.22 ± 0.99 |
Command R+ 08-2024 104B CohereLabs | 44.12 ± 0.26 | 67.81 ± 1.06 | 67.81 ± 1.06 |
Aya Expanse 8B CohereLabs | 44.12 ± 0.18 | 63.78 ± 0.78 | 63.78 ± 0.78 |
Sailor2 8B SAIL | 43.90 ± 0.19 | 38.92 ± 0.89 | 38.92 ± 0.89 |
Llama 3 70B Meta | 43.42 ± 0.10 | 24.10 ± 0.47 | 24.10 ± 0.47 |
Tulu 3 8B AI2 | 42.86 ± 0.14 | 73.08 ± 0.59 | 73.08 ± 0.59 |
Apertus 70B Swiss AI | 42.69 ± 0.24 | 62.73 ± 1.01 | 62.73 ± 1.01 |
phi-4 14B Microsoft | 39.71 ± 0.24 | 61.52 ± 1.09 | 61.52 ± 1.09 |
Command R 08-2024 32B CohereLabs | 39.00 ± 0.21 | 64.63 ± 0.79 | 64.63 ± 0.79 |
Apertus 8B Swiss AI | 38.91 ± 0.31 | 65.59 ± 0.97 | 65.59 ± 0.97 |
Olmo 2 1124 13B AI2 | 38.54 ± 0.19 | 64.92 ± 0.89 | 64.92 ± 0.89 |
SeaLLMs V3 7B Alibaba-DAMO | 36.82 ± 0.28 | 52.35 ± 1.06 | 52.35 ± 1.06 |
Llama 3 8B Meta | 35.77 ± 0.16 | 26.89 ± 0.46 | 26.89 ± 0.46 |
Babel 9B Alibaba-DAMO | 31.66 ± 0.22 | 40.35 ± 0.95 | 40.35 ± 0.95 |
Olmo 3 7B AI2 | 30.18 ± 0.22 | 57.08 ± 0.96 | 57.08 ± 0.96 |
Olmo 2 1124 7B AI2 | 29.81 ± 0.22 | 55.02 ± 1.26 | 55.02 ± 1.26 |
Babel 83B Alibaba-DAMO | 28.79 ± 0.27 | 40.41 ± 1.24 | 40.41 ± 1.24 |
Command R7B 12-2024 7B CohereLabs | 28.08 ± 0.26 | 48.83 ± 1.43 | 48.83 ± 1.43 |
Ministral 2410 8B Mistral AI | 23.78 ± 0.30 | 28.03 ± 1.12 | 28.03 ± 1.12 |
Model | MS | Multi-Turn Chat | SEA-MT-Bench |
|---|---|---|---|
Qwen 3 VL 32B Alibaba | 63.65 ± 0.14 | 54.32 ± 0.50 | 54.32 ± 0.50 |
Qwen 3 Next 80B MoE Alibaba | 62.80 ± 0.12 | 57.47 ± 0.62 | 57.47 ± 0.62 |
SEA-LION v4 (Qwen) 32B AISG | 61.36 ± 0.14 | 42.30 ± 0.70 | 42.30 ± 0.70 |
Qwen 3 30B MoE Alibaba | 61.13 ± 0.14 | 51.48 ± 0.57 | 51.48 ± 0.57 |
SEA-LION v4 (Gemma) 27B AISG | 61.10 ± 0.16 | 44.78 ± 0.80 | 44.78 ± 0.80 |
Gemma 3 27B | 60.92 ± 0.17 | 45.89 ± 0.56 | 45.89 ± 0.56 |
SEA-LION v3 (Llama) 70B AISG | 59.94 ± 0.17 | 25.55 ± 0.75 | 25.55 ± 0.75 |
Qwen 3 32B Alibaba | 59.67 ± 0.17 | 41.64 ± 0.74 | 41.64 ± 0.74 |
Qwen 2.5 72B Alibaba | 59.48 ± 0.14 | 32.76 ± 0.55 | 32.76 ± 0.55 |
Llama 3.3 70B Meta | 58.80 ± 0.15 | 17.07 ± 0.57 | 17.07 ± 0.57 |
SEA-LION v4 (Qwen VL) 8B AISG | 58.46 ± 0.11 | 40.34 ± 0.58 | 40.34 ± 0.58 |
Qwen 3 VL 8B Alibaba | 58.38 ± 0.14 | 39.93 ± 0.71 | 39.93 ± 0.71 |
Gemma 3 12B | 57.96 ± 0.12 | 38.02 ± 0.60 | 38.02 ± 0.60 |
Tulu 3 70B AI2 | 57.39 ± 0.19 | 27.60 ± 0.58 | 27.60 ± 0.58 |
Llama 4 Scout 109B MoE Meta | 57.38 ± 0.11 | 21.34 ± 0.52 | 21.34 ± 0.52 |
Llama 3.1 70B Meta | 55.67 ± 0.18 | 13.97 ± 0.48 | 13.97 ± 0.48 |
Qwen 3 14B Alibaba | 55.04 ± 0.11 | 31.61 ± 0.52 | 31.61 ± 0.52 |
Command A 03-2025 111B CohereLabs | 54.76 ± 0.17 | 40.92 ± 0.72 | 40.92 ± 0.72 |
SEA-LION v3 (Gemma 2) 9B AISG | 54.70 ± 0.18 | 25.46 ± 0.63 | 25.46 ± 0.63 |
Gemma 2 27B | 54.06 ± 0.22 | 16.03 ± 0.55 | 16.03 ± 0.55 |
Qwen 3 8B Alibaba | 53.83 ± 0.15 | 26.29 ± 0.66 | 26.29 ± 0.66 |
Qwen 2.5 32B Alibaba | 53.71 ± 0.13 | 19.25 ± 0.44 | 19.25 ± 0.44 |
SEA-LION v4 (Qwen VL) 4B AISG | 53.58 ± 0.15 | 29.77 ± 0.69 | 29.77 ± 0.69 |
Qwen 3 VL 4B Alibaba | 53.23 ± 0.14 | 30.11 ± 0.56 | 30.11 ± 0.56 |
Mistral Large 2411 123B Mistral AI | 52.73 ± 0.21 | 21.49 ± 0.49 | 21.49 ± 0.49 |
Qwen 2.5 14B Alibaba | 51.49 ± 0.13 | 15.00 ± 0.41 | 15.00 ± 0.41 |
SEA-LION v3 (Llama) 8B AISG | 51.32 ± 0.28 | 20.11 ± 0.82 | 20.11 ± 0.82 |
Gemma 2 9B | 49.72 ± 0.21 | 12.08 ± 0.44 | 12.08 ± 0.44 |
MERaLiON 2 10B A*STAR | 49.12 ± 0.22 | 11.61 ± 0.31 | 11.61 ± 0.31 |
Aya Expanse 32B CohereLabs | 48.35 ± 0.20 | 18.74 ± 0.74 | 18.74 ± 0.74 |
Olmo 2 0325 32B AI2 | 48.28 ± 0.23 | 11.29 ± 0.48 | 11.29 ± 0.48 |
Qwen 2.5 7B Alibaba | 46.94 ± 0.13 | 14.51 ± 0.38 | 14.51 ± 0.38 |
ERNIE 4.5 21B MoE Baidu | 45.83 ± 0.18 | 14.38 ± 0.52 | 14.38 ± 0.52 |
Llama 3.1 8B Meta | 45.49 ± 0.22 | 10.07 ± 0.37 | 10.07 ± 0.37 |
Sailor2 20B SAIL | 45.19 ± 0.17 | 25.88 ± 0.68 | 25.88 ± 0.68 |
Mistral Small 3.1 2503 24B Mistral AI | 44.39 ± 0.22 | 9.71 ± 0.44 | 9.71 ± 0.44 |
Command R+ 08-2024 104B CohereLabs | 44.12 ± 0.26 | 7.39 ± 0.46 | 7.39 ± 0.46 |
Aya Expanse 8B CohereLabs | 44.12 ± 0.18 | 13.30 ± 0.65 | 13.30 ± 0.65 |
Sailor2 8B SAIL | 43.90 ± 0.19 | 25.56 ± 0.43 | 25.56 ± 0.43 |
Llama 3 70B Meta | 43.42 ± 0.10 | 9.07 ± 0.44 | 9.07 ± 0.44 |
Tulu 3 8B AI2 | 42.86 ± 0.14 | 12.67 ± 0.48 | 12.67 ± 0.48 |
Apertus 70B Swiss AI | 42.69 ± 0.24 | 14.80 ± 0.46 | 14.80 ± 0.46 |
phi-4 14B Microsoft | 39.71 ± 0.24 | 15.11 ± 0.51 | 15.11 ± 0.51 |
Command R 08-2024 32B CohereLabs | 39.00 ± 0.21 | 5.01 ± 0.39 | 5.01 ± 0.39 |
Apertus 8B Swiss AI | 38.91 ± 0.31 | 5.79 ± 0.42 | 5.79 ± 0.42 |
Olmo 2 1124 13B AI2 | 38.54 ± 0.19 | 6.70 ± 0.30 | 6.70 ± 0.30 |
SeaLLMs V3 7B Alibaba-DAMO | 36.82 ± 0.28 | 8.95 ± 0.40 | 8.95 ± 0.40 |
Llama 3 8B Meta | 35.77 ± 0.16 | 5.47 ± 0.45 | 5.47 ± 0.45 |
Babel 9B Alibaba-DAMO | 31.66 ± 0.22 | 3.79 ± 0.33 | 3.79 ± 0.33 |
Olmo 3 7B AI2 | 30.18 ± 0.22 | 6.11 ± 0.28 | 6.11 ± 0.28 |
Olmo 2 1124 7B AI2 | 29.81 ± 0.22 | 4.24 ± 0.33 | 4.24 ± 0.33 |
Babel 83B Alibaba-DAMO | 28.79 ± 0.27 | 7.13 ± 0.58 | 7.13 ± 0.58 |
Command R7B 12-2024 7B CohereLabs | 28.08 ± 0.26 | 2.08 ± 0.34 | 2.08 ± 0.34 |
Ministral 2410 8B Mistral AI | 23.78 ± 0.30 | 3.74 ± 0.38 | 3.74 ± 0.38 |
Model | MS | NLG | Translations |
|---|---|---|---|
Qwen 3 VL 32B Alibaba | 63.65 ± 0.14 | 90.62 ± 0.02 | 90.62 ± 0.02 |
Qwen 3 Next 80B MoE Alibaba | 62.80 ± 0.12 | 90.14 ± 0.01 | 90.14 ± 0.01 |
SEA-LION v4 (Qwen) 32B AISG | 61.36 ± 0.14 | 90.27 ± 0.02 | 90.27 ± 0.02 |
Qwen 3 30B MoE Alibaba | 61.13 ± 0.14 | 90.07 ± 0.02 | 90.07 ± 0.02 |
SEA-LION v4 (Gemma) 27B AISG | 61.10 ± 0.16 | 91.71 ± 0.02 | 91.71 ± 0.02 |
Gemma 3 27B | 60.92 ± 0.17 | 91.83 ± 0.01 | 91.83 ± 0.01 |
SEA-LION v3 (Llama) 70B AISG | 59.94 ± 0.17 | 91.01 ± 0.02 | 91.01 ± 0.02 |
Qwen 3 32B Alibaba | 59.67 ± 0.17 | 88.97 ± 0.03 | 88.97 ± 0.03 |
Qwen 2.5 72B Alibaba | 59.48 ± 0.14 | 89.60 ± 0.03 | 89.60 ± 0.03 |
Llama 3.3 70B Meta | 58.80 ± 0.15 | 90.10 ± 0.02 | 90.10 ± 0.02 |
SEA-LION v4 (Qwen VL) 8B AISG | 58.46 ± 0.11 | 88.80 ± 0.03 | 88.80 ± 0.03 |
Qwen 3 VL 8B Alibaba | 58.38 ± 0.14 | 88.26 ± 0.03 | 88.26 ± 0.03 |
Gemma 3 12B | 57.96 ± 0.12 | 91.02 ± 0.02 | 91.02 ± 0.02 |
Tulu 3 70B AI2 | 57.39 ± 0.19 | 91.06 ± 0.02 | 91.06 ± 0.02 |
Llama 4 Scout 109B MoE Meta | 57.38 ± 0.11 | 91.03 ± 0.02 | 91.03 ± 0.02 |
Llama 3.1 70B Meta | 55.67 ± 0.18 | 90.34 ± 0.02 | 90.34 ± 0.02 |
Qwen 3 14B Alibaba | 55.04 ± 0.11 | 87.44 ± 0.02 | 87.44 ± 0.02 |
Command A 03-2025 111B CohereLabs | 54.76 ± 0.17 | 87.16 ± 0.07 | 87.16 ± 0.07 |
SEA-LION v3 (Gemma 2) 9B AISG | 54.70 ± 0.18 | 90.38 ± 0.03 | 90.38 ± 0.03 |
Gemma 2 27B | 54.06 ± 0.22 | 90.48 ± 0.03 | 90.48 ± 0.03 |
Qwen 3 8B Alibaba | 53.83 ± 0.15 | 87.19 ± 0.04 | 87.19 ± 0.04 |
Qwen 2.5 32B Alibaba | 53.71 ± 0.13 | 86.01 ± 0.04 | 86.01 ± 0.04 |
SEA-LION v4 (Qwen VL) 4B AISG | 53.58 ± 0.15 | 86.22 ± 0.02 | 86.22 ± 0.02 |
Qwen 3 VL 4B Alibaba | 53.23 ± 0.14 | 85.94 ± 0.04 | 85.94 ± 0.04 |
Mistral Large 2411 123B Mistral AI | 52.73 ± 0.21 | 89.19 ± 0.04 | 89.19 ± 0.04 |
Qwen 2.5 14B Alibaba | 51.49 ± 0.13 | 85.33 ± 0.05 | 85.33 ± 0.05 |
SEA-LION v3 (Llama) 8B AISG | 51.32 ± 0.28 | 89.46 ± 0.03 | 89.46 ± 0.03 |
Gemma 2 9B | 49.72 ± 0.21 | 89.08 ± 0.04 | 89.08 ± 0.04 |
MERaLiON 2 10B A*STAR | 49.12 ± 0.22 | 88.16 ± 0.04 | 88.16 ± 0.04 |
Aya Expanse 32B CohereLabs | 48.35 ± 0.20 | 86.71 ± 0.04 | 86.71 ± 0.04 |
Olmo 2 0325 32B AI2 | 48.28 ± 0.23 | 82.23 ± 0.10 | 82.23 ± 0.10 |
Qwen 2.5 7B Alibaba | 46.94 ± 0.13 | 81.40 ± 0.05 | 81.40 ± 0.05 |
ERNIE 4.5 21B MoE Baidu | 45.83 ± 0.18 | 88.08 ± 0.04 | 88.08 ± 0.04 |
Llama 3.1 8B Meta | 45.49 ± 0.22 | 87.30 ± 0.03 | 87.30 ± 0.03 |
Sailor2 20B SAIL | 45.19 ± 0.17 | 76.16 ± 0.18 | 76.16 ± 0.18 |
Mistral Small 3.1 2503 24B Mistral AI | 44.39 ± 0.22 | 82.03 ± 0.07 | 82.03 ± 0.07 |
Command R+ 08-2024 104B CohereLabs | 44.12 ± 0.26 | 84.03 ± 0.07 | 84.03 ± 0.07 |
Aya Expanse 8B CohereLabs | 44.12 ± 0.18 | 81.71 ± 0.06 | 81.71 ± 0.06 |
Sailor2 8B SAIL | 43.90 ± 0.19 | 85.24 ± 0.07 | 85.24 ± 0.07 |
Llama 3 70B Meta | 43.42 ± 0.10 | 89.75 ± 0.02 | 89.75 ± 0.02 |
Tulu 3 8B AI2 | 42.86 ± 0.14 | 87.26 ± 0.05 | 87.26 ± 0.05 |
Apertus 70B Swiss AI | 42.69 ± 0.24 | 82.01 ± 0.09 | 82.01 ± 0.09 |
phi-4 14B Microsoft | 39.71 ± 0.24 | 72.87 ± 0.11 | 72.87 ± 0.11 |
Command R 08-2024 32B CohereLabs | 39.00 ± 0.21 | 81.29 ± 0.08 | 81.29 ± 0.08 |
Apertus 8B Swiss AI | 38.91 ± 0.31 | 86.17 ± 0.06 | 86.17 ± 0.06 |
Olmo 2 1124 13B AI2 | 38.54 ± 0.19 | 71.47 ± 0.14 | 71.47 ± 0.14 |
SeaLLMs V3 7B Alibaba-DAMO | 36.82 ± 0.28 | 82.02 ± 0.09 | 82.02 ± 0.09 |
Llama 3 8B Meta | 35.77 ± 0.16 | 84.96 ± 0.06 | 84.96 ± 0.06 |
Babel 9B Alibaba-DAMO | 31.66 ± 0.22 | 72.51 ± 0.13 | 72.51 ± 0.13 |
Olmo 3 7B AI2 | 30.18 ± 0.22 | 49.26 ± 0.15 | 49.26 ± 0.15 |
Olmo 2 1124 7B AI2 | 29.81 ± 0.22 | 59.43 ± 0.13 | 59.43 ± 0.13 |
Babel 83B Alibaba-DAMO | 28.79 ± 0.27 | 66.77 ± 0.18 | 66.77 ± 0.18 |
Command R7B 12-2024 7B CohereLabs | 28.08 ± 0.26 | 46.82 ± 0.15 | 46.82 ± 0.15 |
Ministral 2410 8B Mistral AI | 23.78 ± 0.30 | 57.48 ± 0.17 | 57.48 ± 0.17 |
Model | MS | NLU | Belebele QA | Sentiment Analysis |
|---|---|---|---|---|
Qwen 3 VL 32B Alibaba | 63.65 ± 0.14 | 69.36 ± 0.07 | 91.31 ± 0.09 | 47.40 ± 0.11 |
Qwen 3 Next 80B MoE Alibaba | 62.80 ± 0.12 | 67.90 ± 0.13 | 86.14 ± 0.16 | 49.65 ± 0.17 |
SEA-LION v4 (Qwen) 32B AISG | 61.36 ± 0.14 | 68.42 ± 0.09 | 90.24 ± 0.05 | 46.60 ± 0.16 |
Qwen 3 30B MoE Alibaba | 61.13 ± 0.14 | 67.60 ± 0.05 | 87.00 ± 0.05 | 48.20 ± 0.11 |
SEA-LION v4 (Gemma) 27B AISG | 61.10 ± 0.16 | 69.31 ± 0.10 | 88.15 ± 0.15 | 50.47 ± 0.12 |
Gemma 3 27B | 60.92 ± 0.17 | 69.16 ± 0.08 | 87.86 ± 0.12 | 50.46 ± 0.12 |
SEA-LION v3 (Llama) 70B AISG | 59.94 ± 0.17 | 69.71 ± 0.14 | 90.44 ± 0.15 | 48.99 ± 0.28 |
Qwen 3 32B Alibaba | 59.67 ± 0.17 | 66.47 ± 0.13 | 88.57 ± 0.07 | 44.37 ± 0.21 |
Qwen 2.5 72B Alibaba | 59.48 ± 0.14 | 69.11 ± 0.10 | 90.03 ± 0.11 | 48.19 ± 0.17 |
Llama 3.3 70B Meta | 58.80 ± 0.15 | 69.75 ± 0.09 | 90.86 ± 0.08 | 48.65 ± 0.16 |
SEA-LION v4 (Qwen VL) 8B AISG | 58.46 ± 0.11 | 66.81 ± 0.08 | 86.45 ± 0.09 | 47.17 ± 0.14 |
Qwen 3 VL 8B Alibaba | 58.38 ± 0.14 | 66.73 ± 0.10 | 85.77 ± 0.07 | 47.70 ± 0.17 |
Gemma 3 12B | 57.96 ± 0.12 | 67.67 ± 0.07 | 85.63 ± 0.10 | 49.72 ± 0.14 |
Tulu 3 70B AI2 | 57.39 ± 0.19 | 67.60 ± 0.20 | 89.44 ± 0.14 | 45.75 ± 0.36 |
Llama 4 Scout 109B MoE Meta | 57.38 ± 0.11 | 68.72 ± 0.07 | 89.26 ± 0.02 | 48.17 ± 0.12 |
Llama 3.1 70B Meta | 55.67 ± 0.18 | 69.16 ± 0.17 | 90.29 ± 0.19 | 48.03 ± 0.26 |
Qwen 3 14B Alibaba | 55.04 ± 0.11 | 66.47 ± 0.11 | 87.01 ± 0.11 | 45.93 ± 0.18 |
Command A 03-2025 111B CohereLabs | 54.76 ± 0.17 | 49.65 ± 0.38 | 49.81 ± 0.65 | 49.49 ± 0.26 |
SEA-LION v3 (Gemma 2) 9B AISG | 54.70 ± 0.18 | 65.12 ± 0.15 | 87.36 ± 0.14 | 42.88 ± 0.26 |
Gemma 2 27B | 54.06 ± 0.22 | 68.19 ± 0.12 | 88.17 ± 0.18 | 48.21 ± 0.19 |
Qwen 3 8B Alibaba | 53.83 ± 0.15 | 63.43 ± 0.15 | 80.77 ± 0.16 | 46.08 ± 0.20 |
Qwen 2.5 32B Alibaba | 53.71 ± 0.13 | 67.36 ± 0.09 | 90.32 ± 0.06 | 44.41 ± 0.15 |
SEA-LION v4 (Qwen VL) 4B AISG | 53.58 ± 0.15 | 64.99 ± 0.05 | 84.62 ± 0.06 | 45.35 ± 0.10 |
Qwen 3 VL 4B Alibaba | 53.23 ± 0.14 | 64.00 ± 0.08 | 82.86 ± 0.10 | 45.13 ± 0.14 |
Mistral Large 2411 123B Mistral AI | 52.73 ± 0.21 | 53.06 ± 0.30 | 87.56 ± 0.20 | 18.57 ± 0.61 |
Qwen 2.5 14B Alibaba | 51.49 ± 0.13 | 65.20 ± 0.09 | 85.66 ± 0.11 | 44.73 ± 0.15 |
SEA-LION v3 (Llama) 8B AISG | 51.32 ± 0.28 | 60.30 ± 0.20 | 78.55 ± 0.12 | 42.04 ± 0.39 |
Gemma 2 9B | 49.72 ± 0.21 | 61.98 ± 0.15 | 86.51 ± 0.13 | 37.45 ± 0.30 |
MERaLiON 2 10B A*STAR | 49.12 ± 0.22 | 62.51 ± 0.20 | 83.32 ± 0.17 | 41.71 ± 0.35 |
Aya Expanse 32B CohereLabs | 48.35 ± 0.20 | 48.22 ± 0.19 | 79.16 ± 0.21 | 17.28 ± 0.34 |
Olmo 2 0325 32B AI2 | 48.28 ± 0.23 | 61.19 ± 0.27 | 78.60 ± 0.21 | 43.78 ± 0.48 |
Qwen 2.5 7B Alibaba | 46.94 ± 0.13 | 56.37 ± 0.08 | 75.19 ± 0.08 | 37.55 ± 0.16 |
ERNIE 4.5 21B MoE Baidu | 45.83 ± 0.18 | 50.07 ± 0.27 | 61.76 ± 0.42 | 38.38 ± 0.25 |
Llama 3.1 8B Meta | 45.49 ± 0.22 | 59.11 ± 0.17 | 75.82 ± 0.25 | 42.40 ± 0.27 |
Sailor2 20B SAIL | 45.19 ± 0.17 | 65.47 ± 0.14 | 86.64 ± 0.08 | 44.31 ± 0.26 |
Mistral Small 3.1 2503 24B Mistral AI | 44.39 ± 0.22 | 55.72 ± 0.26 | 86.32 ± 0.24 | 25.13 ± 0.53 |
Command R+ 08-2024 104B CohereLabs | 44.12 ± 0.26 | 62.48 ± 0.29 | 79.15 ± 0.31 | 45.80 ± 0.44 |
Aya Expanse 8B CohereLabs | 44.12 ± 0.18 | 56.00 ± 0.18 | 68.12 ± 0.28 | 43.88 ± 0.20 |
Sailor2 8B SAIL | 43.90 ± 0.19 | 51.53 ± 0.16 | 75.59 ± 0.16 | 27.46 ± 0.28 |
Llama 3 70B Meta | 43.42 ± 0.10 | 66.39 ± 0.10 | 87.63 ± 0.09 | 45.14 ± 0.17 |
Tulu 3 8B AI2 | 42.86 ± 0.14 | 53.32 ± 0.18 | 69.89 ± 0.16 | 36.75 ± 0.36 |
Apertus 70B Swiss AI | 42.69 ± 0.24 | 48.52 ± 0.28 | 60.86 ± 0.37 | 36.19 ± 0.47 |
phi-4 14B Microsoft | 39.71 ± 0.24 | 39.61 ± 0.40 | 37.03 ± 0.70 | 42.19 ± 0.37 |
Command R 08-2024 32B CohereLabs | 39.00 ± 0.21 | 33.72 ± 0.16 | 67.43 ± 0.32 | 0.00 ± 0.00 |
Apertus 8B Swiss AI | 38.91 ± 0.31 | 35.86 ± 0.29 | 47.87 ± 0.49 | 23.84 ± 0.49 |
Olmo 2 1124 13B AI2 | 38.54 ± 0.19 | 46.86 ± 0.21 | 60.64 ± 0.26 | 33.08 ± 0.42 |
SeaLLMs V3 7B Alibaba-DAMO | 36.82 ± 0.28 | 39.73 ± 0.35 | 62.44 ± 0.44 | 17.01 ± 0.43 |
Llama 3 8B Meta | 35.77 ± 0.16 | 53.66 ± 0.16 | 64.91 ± 0.19 | 42.41 ± 0.24 |
Babel 9B Alibaba-DAMO | 31.66 ± 0.22 | 35.27 ± 0.31 | 63.45 ± 0.47 | 7.08 ± 0.41 |
Olmo 3 7B AI2 | 30.18 ± 0.22 | 40.18 ± 0.18 | 52.35 ± 0.26 | 28.00 ± 0.31 |
Olmo 2 1124 7B AI2 | 29.81 ± 0.22 | 30.84 ± 0.30 | 43.63 ± 0.46 | 18.05 ± 0.43 |
Babel 83B Alibaba-DAMO | 28.79 ± 0.27 | 22.52 ± 0.51 | 32.20 ± 0.92 | 12.84 ± 0.67 |
Command R7B 12-2024 7B CohereLabs | 28.08 ± 0.26 | 45.54 ± 0.26 | 58.61 ± 0.37 | 32.47 ± 0.46 |
Ministral 2410 8B Mistral AI | 23.78 ± 0.30 | 25.65 ± 0.23 | 51.29 ± 0.45 | 0.00 ± 0.00 |
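NLU is the only competency here with two tasks, and its published score appears to be the plain average of Belebele QA and Sentiment Analysis. This is inferred from the numbers rather than stated on the page. The sketch below is a small consistency check on a few rows copied from the table above; the helper `nlu_gap` is a hypothetical name.

```python
# (Belebele QA, Sentiment Analysis, published NLU), copied from the table above.
ROWS = {
    "Qwen 3 VL 32B": (91.31, 47.40, 69.36),
    "Command A 03-2025 111B": (49.81, 49.49, 49.65),
    "Mistral Large 2411 123B": (87.56, 18.57, 53.06),
    "Command R 08-2024 32B": (67.43, 0.00, 33.72),
}


def nlu_gap(belebele, sentiment, published):
    """Absolute difference between the two-task average and the published NLU."""
    return abs((belebele + sentiment) / 2 - published)


if __name__ == "__main__":
    for name, (belebele, sentiment, published) in ROWS.items():
        print(f"{name}: gap = {nlu_gap(belebele, sentiment, published):.3f}")
```

Every gap stays within 0.01, which is what rounding the published values to two decimals would produce.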
Model | MS | Safety | Toxicity Detection |
|---|---|---|---|
Qwen 3 VL 32B Alibaba | 63.65 ± 0.14 | 14.40 ± 0.20 | 14.40 ± 0.20 |
Qwen 3 Next 80B MoE Alibaba | 62.80 ± 0.12 | 7.58 ± 0.17 | 7.58 ± 0.17 |
SEA-LION v4 (Qwen) 32B AISG | 61.36 ± 0.14 | 16.31 ± 0.13 | 16.31 ± 0.13 |
Qwen 3 30B MoE Alibaba | 61.13 ± 0.14 | 13.06 ± 0.11 | 13.06 ± 0.11 |
SEA-LION v4 (Gemma) 27B AISG | 61.10 ± 0.16 | 14.57 ± 0.21 | 14.57 ± 0.21 |
Gemma 3 27B | 60.92 ± 0.17 | 13.86 ± 0.25 | 13.86 ± 0.25 |
SEA-LION v3 (Llama) 70B AISG | 59.94 ± 0.17 | 13.62 ± 0.36 | 13.62 ± 0.36 |
Qwen 3 32B Alibaba | 59.67 ± 0.17 | 13.18 ± 0.29 | 13.18 ± 0.29 |
Qwen 2.5 72B Alibaba | 59.48 ± 0.14 | 15.39 ± 0.22 | 15.39 ± 0.22 |
Llama 3.3 70B Meta | 58.80 ± 0.15 | 17.48 ± 0.19 | 17.48 ± 0.19 |
SEA-LION v4 (Qwen VL) 8B AISG | 58.46 ± 0.11 | 16.07 ± 0.23 | 16.07 ± 0.23 |
Qwen 3 VL 8B Alibaba | 58.38 ± 0.14 | 15.87 ± 0.18 | 15.87 ± 0.18 |
Gemma 3 12B | 57.96 ± 0.12 | 11.57 ± 0.18 | 11.57 ± 0.18 |
Tulu 3 70B AI2 | 57.39 ± 0.19 | 11.11 ± 0.27 | 11.11 ± 0.27 |
Llama 4 Scout 109B MoE Meta | 57.38 ± 0.11 | 14.05 ± 0.13 | 14.05 ± 0.13 |
Llama 3.1 70B Meta | 55.67 ± 0.18 | 15.90 ± 0.24 | 15.90 ± 0.24 |
Qwen 3 14B Alibaba | 55.04 ± 0.11 | 12.54 ± 0.22 | 12.54 ± 0.22 |
Command A 03-2025 111B CohereLabs | 54.76 ± 0.17 | 0.00 ± 0.00 | 0.00 ± 0.00 |
SEA-LION v3 (Gemma 2) 9B AISG | 54.70 ± 0.18 | 12.19 ± 0.41 | 12.19 ± 0.41 |
Gemma 2 27B | 54.06 ± 0.22 | 14.11 ± 0.53 | 14.11 ± 0.53 |
Qwen 3 8B Alibaba | 53.83 ± 0.15 | 15.30 ± 0.17 | 15.30 ± 0.17 |
Qwen 2.5 32B Alibaba | 53.71 ± 0.13 | 9.77 ± 0.22 | 9.77 ± 0.22 |
SEA-LION v4 (Qwen VL) 4B AISG | 53.58 ± 0.15 | 12.92 ± 0.09 | 12.92 ± 0.09 |
Qwen 3 VL 4B Alibaba | 53.23 ± 0.14 | 12.97 ± 0.11 | 12.97 ± 0.11 |
Mistral Large 2411 123B Mistral AI | 52.73 ± 0.21 | 14.72 ± 0.54 | 14.72 ± 0.54 |
Qwen 2.5 14B Alibaba | 51.49 ± 0.13 | 15.65 ± 0.17 | 15.65 ± 0.17 |
SEA-LION v3 (Llama) 8B AISG | 51.32 ± 0.28 | 15.08 ± 0.29 | 15.08 ± 0.29 |
Gemma 2 9B | 49.72 ± 0.21 | 10.75 ± 0.28 | 10.75 ± 0.28 |
MERaLiON 2 10B A*STAR | 49.12 ± 0.22 | 14.04 ± 0.34 | 14.04 ± 0.34 |
Aya Expanse 32B CohereLabs | 48.35 ± 0.20 | 13.13 ± 0.25 | 13.13 ± 0.25 |
Olmo 2 0325 32B AI2 | 48.28 ± 0.23 | 12.31 ± 0.18 | 12.31 ± 0.18 |
Qwen 2.5 7B Alibaba | 46.94 ± 0.13 | 13.91 ± 0.12 | 13.91 ± 0.12 |
ERNIE 4.5 21B MoE Baidu | 45.83 ± 0.18 | 14.74 ± 0.16 | 14.74 ± 0.16 |
Llama 3.1 8B Meta | 45.49 ± 0.22 | 15.03 ± 0.24 | 15.03 ± 0.24 |
Sailor2 20B SAIL | 45.19 ± 0.17 | 12.73 ± 0.12 | 12.73 ± 0.12 |
Mistral Small 3.1 2503 24B Mistral AI | 44.39 ± 0.22 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Command R+ 08-2024 104B CohereLabs | 44.12 ± 0.26 | 4.18 ± 0.36 | 4.18 ± 0.36 |
Aya Expanse 8B CohereLabs | 44.12 ± 0.18 | 12.41 ± 0.25 | 12.41 ± 0.25 |
Sailor2 8B SAIL | 43.90 ± 0.19 | 19.51 ± 0.31 | 19.51 ± 0.31 |
Llama 3 70B Meta | 43.42 ± 0.10 | 9.59 ± 0.16 | 9.59 ± 0.16 |
Tulu 3 8B AI2 | 42.86 ± 0.14 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Apertus 70B Swiss AI | 42.69 ± 0.24 | 17.96 ± 0.45 | 17.96 ± 0.45 |
phi-4 14B Microsoft | 39.71 ± 0.24 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Command R 08-2024 32B CohereLabs | 39.00 ± 0.21 | 10.82 ± 0.58 | 10.82 ± 0.58 |
Apertus 8B Swiss AI | 38.91 ± 0.31 | 9.26 ± 0.76 | 9.26 ± 0.76 |
Olmo 2 1124 13B AI2 | 38.54 ± 0.19 | 14.19 ± 0.25 | 14.19 ± 0.25 |
SeaLLMs V3 7B Alibaba-DAMO | 36.82 ± 0.28 | 9.01 ± 0.68 | 9.01 ± 0.68 |
Llama 3 8B Meta | 35.77 ± 0.16 | 8.43 ± 0.16 | 8.43 ± 0.16 |
Babel 9B Alibaba-DAMO | 31.66 ± 0.22 | 9.34 ± 0.63 | 9.34 ± 0.63 |
Olmo 3 7B AI2 | 30.18 ± 0.22 | 4.97 ± 0.74 | 4.97 ± 0.74 |
Olmo 2 1124 7B AI2 | 29.81 ± 0.22 | 6.78 ± 0.45 | 6.78 ± 0.45 |
Babel 83B Alibaba-DAMO | 28.79 ± 0.27 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Command R7B 12-2024 7B CohereLabs | 28.08 ± 0.26 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Ministral 2410 8B Mistral AI | 23.78 ± 0.30 | 7.46 ± 0.59 | 7.46 ± 0.59 |
Model | MS | Knowledge | Global MMLU Lite |
|---|---|---|---|
Qwen 3 VL 32B Alibaba | 63.65 ± 0.14 | 70.84 ± 0.24 | 70.84 ± 0.24 |
Qwen 3 Next 80B MoE Alibaba | 62.80 ± 0.12 | 70.36 ± 0.19 | 70.36 ± 0.19 |
SEA-LION v4 (Qwen) 32B AISG | 61.36 ± 0.14 | 69.65 ± 0.18 | 69.65 ± 0.18 |
Qwen 3 30B MoE Alibaba | 61.13 ± 0.14 | 63.12 ± 0.26 | 63.12 ± 0.26 |
SEA-LION v4 (Gemma) 27B AISG | 61.10 ± 0.16 | 62.86 ± 0.28 | 62.86 ± 0.28 |
Gemma 3 27B | 60.92 ± 0.17 | 63.13 ± 0.34 | 63.13 ± 0.34 |
SEA-LION v3 (Llama) 70B AISG | 59.94 ± 0.17 | 74.12 ± 0.44 | 74.12 ± 0.44 |
Qwen 3 32B Alibaba | 59.67 ± 0.17 | 67.38 ± 0.27 | 67.38 ± 0.27 |
Qwen 2.5 72B Alibaba | 59.48 ± 0.14 | 70.78 ± 0.32 | 70.78 ± 0.32 |
Llama 3.3 70B Meta | 58.80 ± 0.15 | 71.86 ± 0.19 | 71.86 ± 0.19 |
SEA-LION v4 (Qwen VL) 8B AISG | 58.46 ± 0.11 | 54.39 ± 0.28 | 54.39 ± 0.28 |
Qwen 3 VL 8B Alibaba | 58.38 ± 0.14 | 56.05 ± 0.28 | 56.05 ± 0.28 |
Gemma 3 12B | 57.96 ± 0.12 | 57.83 ± 0.25 | 57.83 ± 0.25 |
Tulu 3 70B AI2 | 57.39 ± 0.19 | 66.97 ± 0.30 | 66.97 ± 0.30 |
Llama 4 Scout 109B MoE Meta | 57.38 ± 0.11 | 64.79 ± 0.14 | 64.79 ± 0.14 |
Llama 3.1 70B Meta | 55.67 ± 0.18 | 69.58 ± 0.35 | 69.58 ± 0.35 |
Qwen 3 14B Alibaba | 55.04 ± 0.11 | 53.83 ± 0.26 | 53.83 ± 0.26 |
Command A 03-2025 111B CohereLabs | 54.76 ± 0.17 | 66.04 ± 0.35 | 66.04 ± 0.35 |
SEA-LION v3 (Gemma 2) 9B AISG | 54.70 ± 0.18 | 53.98 ± 0.44 | 53.98 ± 0.44 |
Gemma 2 27B | 54.06 ± 0.22 | 61.53 ± 0.38 | 61.53 ± 0.38 |
Qwen 3 8B Alibaba | 53.83 ± 0.15 | 51.70 ± 0.33 | 51.70 ± 0.33 |
Qwen 2.5 32B Alibaba | 53.71 ± 0.13 | 60.61 ± 0.25 | 60.61 ± 0.25 |
SEA-LION v4 (Qwen VL) 4B AISG | 53.58 ± 0.15 | 46.99 ± 0.20 | 46.99 ± 0.20 |
Qwen 3 VL 4B Alibaba | 53.23 ± 0.14 | 45.81 ± 0.28 | 45.81 ± 0.28 |
Mistral Large 2411 123B Mistral AI | 52.73 ± 0.21 | 58.38 ± 0.45 | 58.38 ± 0.45 |
Qwen 2.5 14B Alibaba | 51.49 ± 0.13 | 53.19 ± 0.19 | 53.19 ± 0.19 |
SEA-LION v3 (Llama) 8B AISG | 51.32 ± 0.28 | 44.86 ± 0.53 | 44.86 ± 0.53 |
Gemma 2 9B | 49.72 ± 0.21 | 51.31 ± 0.36 | 51.31 ± 0.36 |
MERaLiON 2 10B A*STAR | 49.12 ± 0.22 | 49.27 ± 0.52 | 49.27 ± 0.52 |
Aya Expanse 32B CohereLabs | 48.35 ± 0.20 | 49.44 ± 0.26 | 49.44 ± 0.26 |
Olmo 2 0325 32B AI2 | 48.28 ± 0.23 | 47.58 ± 0.64 | 47.58 ± 0.64 |
Qwen 2.5 7B Alibaba | 46.94 ± 0.13 | 46.07 ± 0.27 | 46.07 ± 0.27 |
ERNIE 4.5 21B MoE Baidu | 45.83 ± 0.18 | 33.11 ± 0.64 | 33.11 ± 0.64 |
Llama 3.1 8B Meta | 45.49 ± 0.22 | 36.82 ± 0.54 | 36.82 ± 0.54 |
Sailor2 20B SAIL | 45.19 ± 0.17 | 49.87 ± 0.31 | 49.87 ± 0.31 |
Mistral Small 3.1 2503 24B Mistral AI | 44.39 ± 0.22 | 54.68 ± 0.63 | 54.68 ± 0.63 |
Command R+ 08-2024 104B CohereLabs | 44.12 ± 0.26 | 38.85 ± 0.76 | 38.85 ± 0.76 |
Aya Expanse 8B CohereLabs | 44.12 ± 0.18 | 37.53 ± 0.20 | 37.53 ± 0.20 |
Sailor2 8B SAIL | 43.90 ± 0.19 | 42.67 ± 0.39 | 42.67 ± 0.39 |
Llama 3 70B Meta | 43.42 ± 0.10 | 61.64 ± 0.29 | 61.64 ± 0.29 |
Tulu 3 8B AI2 | 42.86 ± 0.14 | 30.81 ± 0.35 | 30.81 ± 0.35 |
Apertus 70B Swiss AI | 42.69 ± 0.24 | 30.15 ± 0.77 | 30.15 ± 0.77 |
phi-4 14B Microsoft | 39.71 ± 0.24 | 49.11 ± 0.40 | 49.11 ± 0.40 |
Command R 08-2024 32B CohereLabs | 39.00 ± 0.21 | 38.56 ± 0.63 | 38.56 ± 0.63 |
Apertus 8B Swiss AI | 38.91 ± 0.31 | 30.81 ± 0.85 | 30.81 ± 0.85 |
Olmo 2 1124 13B AI2 | 38.54 ± 0.19 | 27.11 ± 0.58 | 27.11 ± 0.58 |
SeaLLMs V3 7B Alibaba-DAMO | 36.82 ± 0.28 | 28.88 ± 0.84 | 28.88 ± 0.84 |
Llama 3 8B Meta | 35.77 ± 0.16 | 35.19 ± 0.49 | 35.19 ± 0.49 |
Babel 9B Alibaba-DAMO | 31.66 ± 0.22 | 28.69 ± 0.82 | 28.69 ± 0.82 |
Olmo 3 7B AI2 | 30.18 ± 0.22 | 23.50 ± 0.43 | 23.50 ± 0.43 |
Olmo 2 1124 7B AI2 | 29.81 ± 0.22 | 22.57 ± 0.50 | 22.57 ± 0.50 |
Babel 83B Alibaba-DAMO | 28.79 ± 0.27 | 35.89 ± 1.06 | 35.89 ± 1.06 |
Command R7B 12-2024 7B CohereLabs | 28.08 ± 0.26 | 25.23 ± 0.70 | 25.23 ± 0.70 |
Ministral 2410 8B Mistral AI | 23.78 ± 0.30 | 20.34 ± 0.93 | 20.34 ± 0.93 |