Malay Performance
Malay Scores by Model
Average of 30 bootstrap resamples; 95% confidence intervals are shown.
Model Size: ≤200B
Open instruct models only
(Chart: overall Malay scores (MS) by model, ranked from 62.80 ± 0.12 at the top down to 23.78 ± 0.30; the per-model values are listed in the MS column of the tables below.)
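The ± values throughout this page are 95% confidence intervals over 30 bootstrap resamples. The page does not document the exact procedure, so the following is only a minimal sketch of that kind of reporting: it assumes per-example scores are resampled with replacement, 30 resample means are collected, and the ± value is a normal-approximation half-width over those means. The function name `bootstrap_score` and the toy 0/1 scores are hypothetical.

```python
import random
import statistics

def bootstrap_score(per_example_scores, n_boot=30, seed=0):
    """Mean of n_boot bootstrap resample means, plus a 95% half-width
    from the normal approximation (1.96 x the std dev of those means)."""
    rng = random.Random(seed)
    boot_means = [
        statistics.mean(rng.choices(per_example_scores, k=len(per_example_scores)))
        for _ in range(n_boot)
    ]
    return statistics.mean(boot_means), 1.96 * statistics.stdev(boot_means)

# Toy example: 0/1 correctness scores for one model on one task
scores = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
centre, half_width = bootstrap_score(scores)
print(f"{100 * centre:.2f} ± {100 * half_width:.2f}")  # same "score ± CI" format as the tables
```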
Malay Competencies
Average of 30 bootstrap resamples; 95% confidence intervals are shown. MS denotes the overall Malay score.
Model Size: ≤200B
Open instruct models only
Model | MS | Instruction Following | Multi-Turn Chat | NLG | NLU | Safety | Knowledge |
---|---|---|---|---|---|---|---|
Qwen 3 Next 80B MoE Alibaba | 62.80 ± 0.12 | 83.33 ± 0.32 | 57.47 ± 0.62 | 90.14 ± 0.01 | 67.90 ± 0.13 | 7.58 ± 0.17 | 70.36 ± 0.19 |
SEA-LION v4 (Qwen) 32B AISG | 61.36 ± 0.14 | 81.21 ± 0.50 | 42.30 ± 0.70 | 90.27 ± 0.02 | 68.42 ± 0.09 | 16.31 ± 0.13 | 69.65 ± 0.18 |
Qwen 3 30B MoE Alibaba | 61.13 ± 0.14 | 81.46 ± 0.47 | 51.48 ± 0.57 | 90.07 ± 0.02 | 67.60 ± 0.05 | 13.06 ± 0.11 | 63.12 ± 0.26 |
SEA-LION v4 (Gemma) 27B AISG | 61.10 ± 0.16 | 83.37 ± 0.65 | 44.78 ± 0.80 | 91.71 ± 0.02 | 69.31 ± 0.10 | 14.57 ± 0.21 | 62.86 ± 0.28 |
Gemma 3 27B | 60.92 ± 0.17 | 81.65 ± 0.59 | 45.89 ± 0.56 | 91.83 ± 0.01 | 69.16 ± 0.08 | 13.86 ± 0.25 | 63.13 ± 0.34 |
SEA-LION v3 (Llama) 70B AISG | 59.94 ± 0.17 | 85.62 ± 0.51 | 25.55 ± 0.75 | 91.01 ± 0.02 | 69.71 ± 0.14 | 13.62 ± 0.36 | 74.12 ± 0.44 |
Qwen 3 32B Alibaba | 59.67 ± 0.17 | 80.38 ± 0.62 | 41.64 ± 0.74 | 88.97 ± 0.03 | 66.47 ± 0.13 | 13.18 ± 0.29 | 67.38 ± 0.27 |
Qwen 2.5 72B Alibaba | 59.48 ± 0.14 | 79.24 ± 0.58 | 32.76 ± 0.55 | 89.60 ± 0.03 | 69.11 ± 0.10 | 15.39 ± 0.22 | 70.78 ± 0.32 |
Llama 3.3 70B Meta | 58.80 ± 0.15 | 86.51 ± 0.40 | 17.07 ± 0.57 | 90.10 ± 0.02 | 69.75 ± 0.09 | 17.48 ± 0.19 | 71.86 ± 0.19 |
Gemma 3 12B | 57.96 ± 0.12 | 81.65 ± 0.67 | 38.02 ± 0.60 | 91.02 ± 0.02 | 67.67 ± 0.07 | 11.57 ± 0.18 | 57.83 ± 0.25 |
Tulu 3 70B AI2 | 57.39 ± 0.19 | 80.00 ± 0.65 | 27.60 ± 0.58 | 91.06 ± 0.02 | 67.60 ± 0.20 | 11.11 ± 0.27 | 66.97 ± 0.30 |
Llama 4 Scout 109B MoE Meta | 57.38 ± 0.11 | 84.38 ± 0.46 | 21.34 ± 0.52 | 91.03 ± 0.02 | 68.72 ± 0.07 | 14.05 ± 0.13 | 64.79 ± 0.14 |
Llama 3.1 70B Meta | 55.67 ± 0.18 | 75.05 ± 0.86 | 13.97 ± 0.48 | 90.34 ± 0.02 | 69.16 ± 0.17 | 15.90 ± 0.24 | 69.58 ± 0.35 |
Qwen 3 14B Alibaba | 55.04 ± 0.11 | 78.35 ± 0.42 | 31.61 ± 0.52 | 87.44 ± 0.02 | 66.47 ± 0.11 | 12.54 ± 0.22 | 53.83 ± 0.26 |
Command A 03-2025 111B CohereLabs | 54.76 ± 0.17 | 84.79 ± 0.63 | 40.92 ± 0.72 | 87.16 ± 0.07 | 49.65 ± 0.38 | 0.00 ± 0.00 | 66.04 ± 0.35 |
SEA-LION v3 (Gemma 2) 9B AISG | 54.70 ± 0.18 | 81.05 ± 0.85 | 25.46 ± 0.63 | 90.38 ± 0.03 | 65.12 ± 0.15 | 12.19 ± 0.41 | 53.98 ± 0.44 |
Gemma 2 27B | 54.06 ± 0.22 | 74.00 ± 0.80 | 16.03 ± 0.55 | 90.48 ± 0.03 | 68.19 ± 0.12 | 14.11 ± 0.53 | 61.53 ± 0.38 |
Qwen 3 8B Alibaba | 53.83 ± 0.15 | 79.05 ± 0.63 | 26.29 ± 0.66 | 87.19 ± 0.04 | 63.43 ± 0.15 | 15.30 ± 0.17 | 51.70 ± 0.33 |
Qwen 2.5 32B Alibaba | 53.71 ± 0.13 | 79.24 ± 0.64 | 19.25 ± 0.44 | 86.01 ± 0.04 | 67.36 ± 0.09 | 9.77 ± 0.22 | 60.61 ± 0.25 |
Mistral Large 2411 123B Mistral AI | 52.73 ± 0.21 | 79.56 ± 0.87 | 21.49 ± 0.49 | 89.19 ± 0.04 | 53.06 ± 0.30 | 14.72 ± 0.54 | 58.38 ± 0.45 |
Qwen 2.5 14B Alibaba | 51.49 ± 0.13 | 74.54 ± 0.61 | 15.00 ± 0.41 | 85.33 ± 0.05 | 65.20 ± 0.09 | 15.65 ± 0.17 | 53.19 ± 0.19 |
SEA-LION v3 (Llama) 8B AISG | 51.32 ± 0.28 | 78.13 ± 0.74 | 20.11 ± 0.82 | 89.46 ± 0.03 | 60.30 ± 0.20 | 15.08 ± 0.29 | 44.86 ± 0.53 |
Gemma 2 9B | 49.72 ± 0.21 | 73.14 ± 0.88 | 12.08 ± 0.44 | 89.08 ± 0.04 | 61.98 ± 0.15 | 10.75 ± 0.28 | 51.31 ± 0.36 |
MERaLiON 2 10B A*STAR | 49.12 ± 0.22 | 69.11 ± 0.95 | 11.61 ± 0.31 | 88.16 ± 0.04 | 62.51 ± 0.20 | 14.04 ± 0.34 | 49.27 ± 0.52 |
Aya Expanse 32B CohereLabs | 48.35 ± 0.20 | 73.87 ± 0.79 | 18.74 ± 0.74 | 86.71 ± 0.04 | 48.22 ± 0.19 | 13.13 ± 0.25 | 49.44 ± 0.26 |
Olmo 2 0325 32B AI2 | 48.28 ± 0.23 | 75.08 ± 1.16 | 11.29 ± 0.48 | 82.23 ± 0.10 | 61.19 ± 0.27 | 12.31 ± 0.18 | 47.58 ± 0.64 |
Qwen 2.5 7B Alibaba | 46.94 ± 0.13 | 69.37 ± 0.68 | 14.51 ± 0.38 | 81.40 ± 0.05 | 56.37 ± 0.08 | 13.91 ± 0.12 | 46.07 ± 0.27 |
ERNIE 4.5 21B MoE Baidu | 45.83 ± 0.18 | 74.57 ± 0.85 | 14.38 ± 0.52 | 88.08 ± 0.04 | 50.07 ± 0.27 | 14.74 ± 0.16 | 33.11 ± 0.64 |
Llama 3.1 8B Meta | 45.49 ± 0.22 | 64.63 ± 1.08 | 10.07 ± 0.37 | 87.30 ± 0.03 | 59.11 ± 0.17 | 15.03 ± 0.24 | 36.82 ± 0.54 |
Sailor2 20B SAIL | 45.19 ± 0.17 | 41.02 ± 0.68 | 25.88 ± 0.68 | 76.16 ± 0.18 | 65.47 ± 0.14 | 12.73 ± 0.12 | 49.87 ± 0.31 |
Mistral Small 3.1 2503 24B Mistral AI | 44.39 ± 0.22 | 64.22 ± 0.99 | 9.71 ± 0.44 | 82.03 ± 0.07 | 55.72 ± 0.26 | 0.00 ± 0.00 | 54.68 ± 0.63 |
Command R+ 08-2024 104B CohereLabs | 44.12 ± 0.26 | 67.81 ± 1.06 | 7.39 ± 0.46 | 84.03 ± 0.07 | 62.48 ± 0.29 | 4.18 ± 0.36 | 38.85 ± 0.76 |
Aya Expanse 8B CohereLabs | 44.12 ± 0.18 | 63.78 ± 0.78 | 13.30 ± 0.65 | 81.71 ± 0.06 | 56.00 ± 0.18 | 12.41 ± 0.25 | 37.53 ± 0.20 |
Sailor2 8B SAIL | 43.90 ± 0.19 | 38.92 ± 0.89 | 25.56 ± 0.43 | 85.24 ± 0.07 | 51.53 ± 0.16 | 19.51 ± 0.31 | 42.67 ± 0.39 |
Llama 3 70B Meta | 43.42 ± 0.10 | 24.10 ± 0.47 | 9.07 ± 0.44 | 89.75 ± 0.02 | 66.39 ± 0.10 | 9.59 ± 0.16 | 61.64 ± 0.29 |
Tulu 3 8B AI2 | 42.86 ± 0.14 | 73.08 ± 0.59 | 12.67 ± 0.48 | 87.26 ± 0.05 | 53.32 ± 0.18 | 0.00 ± 0.00 | 30.81 ± 0.35 |
Apertus 70B Swiss AI | 42.69 ± 0.24 | 62.73 ± 1.01 | 14.80 ± 0.46 | 82.01 ± 0.09 | 48.52 ± 0.28 | 17.96 ± 0.45 | 30.15 ± 0.77 |
phi-4 14B Microsoft | 39.71 ± 0.24 | 61.52 ± 1.09 | 15.11 ± 0.51 | 72.87 ± 0.11 | 39.61 ± 0.40 | 0.00 ± 0.00 | 49.11 ± 0.40 |
Command R 08-2024 32B CohereLabs | 39.00 ± 0.21 | 64.63 ± 0.79 | 5.01 ± 0.39 | 81.29 ± 0.08 | 33.72 ± 0.16 | 10.82 ± 0.58 | 38.56 ± 0.63 |
Apertus 8B Swiss AI | 38.91 ± 0.31 | 65.59 ± 0.97 | 5.79 ± 0.42 | 86.17 ± 0.06 | 35.86 ± 0.29 | 9.26 ± 0.76 | 30.81 ± 0.85 |
Olmo 2 1124 13B AI2 | 38.54 ± 0.19 | 64.92 ± 0.89 | 6.70 ± 0.30 | 71.47 ± 0.14 | 46.86 ± 0.21 | 14.19 ± 0.25 | 27.11 ± 0.58 |
SeaLLMs V3 7B Alibaba-DAMO | 36.82 ± 0.28 | 52.35 ± 1.06 | 8.95 ± 0.40 | 82.02 ± 0.09 | 39.73 ± 0.35 | 9.01 ± 0.68 | 28.88 ± 0.84 |
Llama 3 8B Meta | 35.77 ± 0.16 | 26.89 ± 0.46 | 5.47 ± 0.45 | 84.96 ± 0.06 | 53.66 ± 0.16 | 8.43 ± 0.16 | 35.19 ± 0.49 |
Babel 9B Alibaba-DAMO | 31.66 ± 0.22 | 40.35 ± 0.95 | 3.79 ± 0.33 | 72.51 ± 0.13 | 35.27 ± 0.31 | 9.34 ± 0.63 | 28.69 ± 0.82 |
Olmo 2 1124 7B AI2 | 29.81 ± 0.22 | 55.02 ± 1.26 | 4.24 ± 0.33 | 59.43 ± 0.13 | 30.84 ± 0.30 | 6.78 ± 0.45 | 22.57 ± 0.50 |
Babel 83B Alibaba-DAMO | 28.79 ± 0.27 | 40.41 ± 1.24 | 7.13 ± 0.58 | 66.77 ± 0.18 | 22.52 ± 0.51 | 0.00 ± 0.00 | 35.89 ± 1.06 |
Command R7B 12-2024 7B CohereLabs | 28.08 ± 0.26 | 48.83 ± 1.43 | 2.08 ± 0.34 | 46.82 ± 0.15 | 45.54 ± 0.26 | 0.00 ± 0.00 | 25.23 ± 0.70 |
Ministral 2410 8B Mistral AI | 23.78 ± 0.30 | 28.03 ± 1.12 | 3.74 ± 0.38 | 57.48 ± 0.17 | 25.65 ± 0.23 | 7.46 ± 0.59 | 20.34 ± 0.93 |
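The MS column is consistent with a plain unweighted mean of the six competency scores (an observation from the numbers above, not a documented formula). For the top row, (83.33 + 57.47 + 90.14 + 67.90 + 7.58 + 70.36) / 6 ≈ 62.80, which matches the reported MS. A quick check:

```python
import statistics

# Competency scores for the top row (Qwen 3 Next 80B MoE), copied from the table above
competencies = [83.33, 57.47, 90.14, 67.90, 7.58, 70.36]
print(round(statistics.mean(competencies), 2))  # 62.8, matching the reported MS of 62.80 ± 0.12
```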
Malay Tasks
Average of 30 bootstrap resamples; 95% confidence intervals are shown.
Model Size: ≤200B
Open instruct models only
Model | MS | Instruction Following | SEA-IFEval |
---|---|---|---|
Qwen 3 Next 80B MoE Alibaba | 62.80 ± 0.12 | 83.33 ± 0.32 | 83.33 ± 0.32 |
SEA-LION v4 (Qwen) 32B AISG | 61.36 ± 0.14 | 81.21 ± 0.50 | 81.21 ± 0.50 |
Qwen 3 30B MoE Alibaba | 61.13 ± 0.14 | 81.46 ± 0.47 | 81.46 ± 0.47 |
SEA-LION v4 (Gemma) 27B AISG | 61.10 ± 0.16 | 83.37 ± 0.65 | 83.37 ± 0.65 |
Gemma 3 27B | 60.92 ± 0.17 | 81.65 ± 0.59 | 81.65 ± 0.59 |
SEA-LION v3 (Llama) 70B AISG | 59.94 ± 0.17 | 85.62 ± 0.51 | 85.62 ± 0.51 |
Qwen 3 32B Alibaba | 59.67 ± 0.17 | 80.38 ± 0.62 | 80.38 ± 0.62 |
Qwen 2.5 72B Alibaba | 59.48 ± 0.14 | 79.24 ± 0.58 | 79.24 ± 0.58 |
Llama 3.3 70B Meta | 58.80 ± 0.15 | 86.51 ± 0.40 | 86.51 ± 0.40 |
Gemma 3 12B | 57.96 ± 0.12 | 81.65 ± 0.67 | 81.65 ± 0.67 |
Tulu 3 70B AI2 | 57.39 ± 0.19 | 80.00 ± 0.65 | 80.00 ± 0.65 |
Llama 4 Scout 109B MoE Meta | 57.38 ± 0.11 | 84.38 ± 0.46 | 84.38 ± 0.46 |
Llama 3.1 70B Meta | 55.67 ± 0.18 | 75.05 ± 0.86 | 75.05 ± 0.86 |
Qwen 3 14B Alibaba | 55.04 ± 0.11 | 78.35 ± 0.42 | 78.35 ± 0.42 |
Command A 03-2025 111B CohereLabs | 54.76 ± 0.17 | 84.79 ± 0.63 | 84.79 ± 0.63 |
SEA-LION v3 (Gemma 2) 9B AISG | 54.70 ± 0.18 | 81.05 ± 0.85 | 81.05 ± 0.85 |
Gemma 2 27B | 54.06 ± 0.22 | 74.00 ± 0.80 | 74.00 ± 0.80 |
Qwen 3 8B Alibaba | 53.83 ± 0.15 | 79.05 ± 0.63 | 79.05 ± 0.63 |
Qwen 2.5 32B Alibaba | 53.71 ± 0.13 | 79.24 ± 0.64 | 79.24 ± 0.64 |
Mistral Large 2411 123B Mistral AI | 52.73 ± 0.21 | 79.56 ± 0.87 | 79.56 ± 0.87 |
Qwen 2.5 14B Alibaba | 51.49 ± 0.13 | 74.54 ± 0.61 | 74.54 ± 0.61 |
SEA-LION v3 (Llama) 8B AISG | 51.32 ± 0.28 | 78.13 ± 0.74 | 78.13 ± 0.74 |
Gemma 2 9B | 49.72 ± 0.21 | 73.14 ± 0.88 | 73.14 ± 0.88 |
MERaLiON 2 10B A*STAR | 49.12 ± 0.22 | 69.11 ± 0.95 | 69.11 ± 0.95 |
Aya Expanse 32B CohereLabs | 48.35 ± 0.20 | 73.87 ± 0.79 | 73.87 ± 0.79 |
Olmo 2 0325 32B AI2 | 48.28 ± 0.23 | 75.08 ± 1.16 | 75.08 ± 1.16 |
Qwen 2.5 7B Alibaba | 46.94 ± 0.13 | 69.37 ± 0.68 | 69.37 ± 0.68 |
ERNIE 4.5 21B MoE Baidu | 45.83 ± 0.18 | 74.57 ± 0.85 | 74.57 ± 0.85 |
Llama 3.1 8B Meta | 45.49 ± 0.22 | 64.63 ± 1.08 | 64.63 ± 1.08 |
Sailor2 20B SAIL | 45.19 ± 0.17 | 41.02 ± 0.68 | 41.02 ± 0.68 |
Mistral Small 3.1 2503 24B Mistral AI | 44.39 ± 0.22 | 64.22 ± 0.99 | 64.22 ± 0.99 |
Command R+ 08-2024 104B CohereLabs | 44.12 ± 0.26 | 67.81 ± 1.06 | 67.81 ± 1.06 |
Aya Expanse 8B CohereLabs | 44.12 ± 0.18 | 63.78 ± 0.78 | 63.78 ± 0.78 |
Sailor2 8B SAIL | 43.90 ± 0.19 | 38.92 ± 0.89 | 38.92 ± 0.89 |
Llama 3 70B Meta | 43.42 ± 0.10 | 24.10 ± 0.47 | 24.10 ± 0.47 |
Tulu 3 8B AI2 | 42.86 ± 0.14 | 73.08 ± 0.59 | 73.08 ± 0.59 |
Apertus 70B Swiss AI | 42.69 ± 0.24 | 62.73 ± 1.01 | 62.73 ± 1.01 |
phi-4 14B Microsoft | 39.71 ± 0.24 | 61.52 ± 1.09 | 61.52 ± 1.09 |
Command R 08-2024 32B CohereLabs | 39.00 ± 0.21 | 64.63 ± 0.79 | 64.63 ± 0.79 |
Apertus 8B Swiss AI | 38.91 ± 0.31 | 65.59 ± 0.97 | 65.59 ± 0.97 |
Olmo 2 1124 13B AI2 | 38.54 ± 0.19 | 64.92 ± 0.89 | 64.92 ± 0.89 |
SeaLLMs V3 7B Alibaba-DAMO | 36.82 ± 0.28 | 52.35 ± 1.06 | 52.35 ± 1.06 |
Llama 3 8B Meta | 35.77 ± 0.16 | 26.89 ± 0.46 | 26.89 ± 0.46 |
Babel 9B Alibaba-DAMO | 31.66 ± 0.22 | 40.35 ± 0.95 | 40.35 ± 0.95 |
Olmo 2 1124 7B AI2 | 29.81 ± 0.22 | 55.02 ± 1.26 | 55.02 ± 1.26 |
Babel 83B Alibaba-DAMO | 28.79 ± 0.27 | 40.41 ± 1.24 | 40.41 ± 1.24 |
Command R7B 12-2024 7B CohereLabs | 28.08 ± 0.26 | 48.83 ± 1.43 | 48.83 ± 1.43 |
Ministral 2410 8B Mistral AI | 23.78 ± 0.30 | 28.03 ± 1.12 | 28.03 ± 1.12 |
Model | MS | Multi-Turn Chat | SEA-MT-Bench |
---|---|---|---|
Qwen 3 Next 80B MoE Alibaba | 62.80 ± 0.12 | 57.47 ± 0.62 | 57.47 ± 0.62 |
SEA-LION v4 (Qwen) 32B AISG | 61.36 ± 0.14 | 42.30 ± 0.70 | 42.30 ± 0.70 |
Qwen 3 30B MoE Alibaba | 61.13 ± 0.14 | 51.48 ± 0.57 | 51.48 ± 0.57 |
SEA-LION v4 (Gemma) 27B AISG | 61.10 ± 0.16 | 44.78 ± 0.80 | 44.78 ± 0.80 |
Gemma 3 27B | 60.92 ± 0.17 | 45.89 ± 0.56 | 45.89 ± 0.56 |
SEA-LION v3 (Llama) 70B AISG | 59.94 ± 0.17 | 25.55 ± 0.75 | 25.55 ± 0.75 |
Qwen 3 32B Alibaba | 59.67 ± 0.17 | 41.64 ± 0.74 | 41.64 ± 0.74 |
Qwen 2.5 72B Alibaba | 59.48 ± 0.14 | 32.76 ± 0.55 | 32.76 ± 0.55 |
Llama 3.3 70B Meta | 58.80 ± 0.15 | 17.07 ± 0.57 | 17.07 ± 0.57 |
Gemma 3 12B | 57.96 ± 0.12 | 38.02 ± 0.60 | 38.02 ± 0.60 |
Tulu 3 70B AI2 | 57.39 ± 0.19 | 27.60 ± 0.58 | 27.60 ± 0.58 |
Llama 4 Scout 109B MoE Meta | 57.38 ± 0.11 | 21.34 ± 0.52 | 21.34 ± 0.52 |
Llama 3.1 70B Meta | 55.67 ± 0.18 | 13.97 ± 0.48 | 13.97 ± 0.48 |
Qwen 3 14B Alibaba | 55.04 ± 0.11 | 31.61 ± 0.52 | 31.61 ± 0.52 |
Command A 03-2025 111B CohereLabs | 54.76 ± 0.17 | 40.92 ± 0.72 | 40.92 ± 0.72 |
SEA-LION v3 (Gemma 2) 9B AISG | 54.70 ± 0.18 | 25.46 ± 0.63 | 25.46 ± 0.63 |
Gemma 2 27B | 54.06 ± 0.22 | 16.03 ± 0.55 | 16.03 ± 0.55 |
Qwen 3 8B Alibaba | 53.83 ± 0.15 | 26.29 ± 0.66 | 26.29 ± 0.66 |
Qwen 2.5 32B Alibaba | 53.71 ± 0.13 | 19.25 ± 0.44 | 19.25 ± 0.44 |
Mistral Large 2411 123B Mistral AI | 52.73 ± 0.21 | 21.49 ± 0.49 | 21.49 ± 0.49 |
Qwen 2.5 14B Alibaba | 51.49 ± 0.13 | 15.00 ± 0.41 | 15.00 ± 0.41 |
SEA-LION v3 (Llama) 8B AISG | 51.32 ± 0.28 | 20.11 ± 0.82 | 20.11 ± 0.82 |
Gemma 2 9B | 49.72 ± 0.21 | 12.08 ± 0.44 | 12.08 ± 0.44 |
MERaLiON 2 10B A*STAR | 49.12 ± 0.22 | 11.61 ± 0.31 | 11.61 ± 0.31 |
Aya Expanse 32B CohereLabs | 48.35 ± 0.20 | 18.74 ± 0.74 | 18.74 ± 0.74 |
Olmo 2 0325 32B AI2 | 48.28 ± 0.23 | 11.29 ± 0.48 | 11.29 ± 0.48 |
Qwen 2.5 7B Alibaba | 46.94 ± 0.13 | 14.51 ± 0.38 | 14.51 ± 0.38 |
ERNIE 4.5 21B MoE Baidu | 45.83 ± 0.18 | 14.38 ± 0.52 | 14.38 ± 0.52 |
Llama 3.1 8B Meta | 45.49 ± 0.22 | 10.07 ± 0.37 | 10.07 ± 0.37 |
Sailor2 20B SAIL | 45.19 ± 0.17 | 25.88 ± 0.68 | 25.88 ± 0.68 |
Mistral Small 3.1 2503 24B Mistral AI | 44.39 ± 0.22 | 9.71 ± 0.44 | 9.71 ± 0.44 |
Command R+ 08-2024 104B CohereLabs | 44.12 ± 0.26 | 7.39 ± 0.46 | 7.39 ± 0.46 |
Aya Expanse 8B CohereLabs | 44.12 ± 0.18 | 13.30 ± 0.65 | 13.30 ± 0.65 |
Sailor2 8B SAIL | 43.90 ± 0.19 | 25.56 ± 0.43 | 25.56 ± 0.43 |
Llama 3 70B Meta | 43.42 ± 0.10 | 9.07 ± 0.44 | 9.07 ± 0.44 |
Tulu 3 8B AI2 | 42.86 ± 0.14 | 12.67 ± 0.48 | 12.67 ± 0.48 |
Apertus 70B Swiss AI | 42.69 ± 0.24 | 14.80 ± 0.46 | 14.80 ± 0.46 |
phi-4 14B Microsoft | 39.71 ± 0.24 | 15.11 ± 0.51 | 15.11 ± 0.51 |
Command R 08-2024 32B CohereLabs | 39.00 ± 0.21 | 5.01 ± 0.39 | 5.01 ± 0.39 |
Apertus 8B Swiss AI | 38.91 ± 0.31 | 5.79 ± 0.42 | 5.79 ± 0.42 |
Olmo 2 1124 13B AI2 | 38.54 ± 0.19 | 6.70 ± 0.30 | 6.70 ± 0.30 |
SeaLLMs V3 7B Alibaba-DAMO | 36.82 ± 0.28 | 8.95 ± 0.40 | 8.95 ± 0.40 |
Llama 3 8B Meta | 35.77 ± 0.16 | 5.47 ± 0.45 | 5.47 ± 0.45 |
Babel 9B Alibaba-DAMO | 31.66 ± 0.22 | 3.79 ± 0.33 | 3.79 ± 0.33 |
Olmo 2 1124 7B AI2 | 29.81 ± 0.22 | 4.24 ± 0.33 | 4.24 ± 0.33 |
Babel 83B Alibaba-DAMO | 28.79 ± 0.27 | 7.13 ± 0.58 | 7.13 ± 0.58 |
Command R7B 12-2024 7B CohereLabs | 28.08 ± 0.26 | 2.08 ± 0.34 | 2.08 ± 0.34 |
Ministral 2410 8B Mistral AI | 23.78 ± 0.30 | 3.74 ± 0.38 | 3.74 ± 0.38 |
Model | MS | NLG | Translations |
---|---|---|---|
Qwen 3 Next 80B MoE Alibaba | 62.80 ± 0.12 | 90.14 ± 0.01 | 90.14 ± 0.01 |
SEA-LION v4 (Qwen) 32B AISG | 61.36 ± 0.14 | 90.27 ± 0.02 | 90.27 ± 0.02 |
Qwen 3 30B MoE Alibaba | 61.13 ± 0.14 | 90.07 ± 0.02 | 90.07 ± 0.02 |
SEA-LION v4 (Gemma) 27B AISG | 61.10 ± 0.16 | 91.71 ± 0.02 | 91.71 ± 0.02 |
Gemma 3 27B | 60.92 ± 0.17 | 91.83 ± 0.01 | 91.83 ± 0.01 |
SEA-LION v3 (Llama) 70B AISG | 59.94 ± 0.17 | 91.01 ± 0.02 | 91.01 ± 0.02 |
Qwen 3 32B Alibaba | 59.67 ± 0.17 | 88.97 ± 0.03 | 88.97 ± 0.03 |
Qwen 2.5 72B Alibaba | 59.48 ± 0.14 | 89.60 ± 0.03 | 89.60 ± 0.03 |
Llama 3.3 70B Meta | 58.80 ± 0.15 | 90.10 ± 0.02 | 90.10 ± 0.02 |
Gemma 3 12B | 57.96 ± 0.12 | 91.02 ± 0.02 | 91.02 ± 0.02 |
Tulu 3 70B AI2 | 57.39 ± 0.19 | 91.06 ± 0.02 | 91.06 ± 0.02 |
Llama 4 Scout 109B MoE Meta | 57.38 ± 0.11 | 91.03 ± 0.02 | 91.03 ± 0.02 |
Llama 3.1 70B Meta | 55.67 ± 0.18 | 90.34 ± 0.02 | 90.34 ± 0.02 |
Qwen 3 14B Alibaba | 55.04 ± 0.11 | 87.44 ± 0.02 | 87.44 ± 0.02 |
Command A 03-2025 111B CohereLabs | 54.76 ± 0.17 | 87.16 ± 0.07 | 87.16 ± 0.07 |
SEA-LION v3 (Gemma 2) 9B AISG | 54.70 ± 0.18 | 90.38 ± 0.03 | 90.38 ± 0.03 |
Gemma 2 27B | 54.06 ± 0.22 | 90.48 ± 0.03 | 90.48 ± 0.03 |
Qwen 3 8B Alibaba | 53.83 ± 0.15 | 87.19 ± 0.04 | 87.19 ± 0.04 |
Qwen 2.5 32B Alibaba | 53.71 ± 0.13 | 86.01 ± 0.04 | 86.01 ± 0.04 |
Mistral Large 2411 123B Mistral AI | 52.73 ± 0.21 | 89.19 ± 0.04 | 89.19 ± 0.04 |
Qwen 2.5 14B Alibaba | 51.49 ± 0.13 | 85.33 ± 0.05 | 85.33 ± 0.05 |
SEA-LION v3 (Llama) 8B AISG | 51.32 ± 0.28 | 89.46 ± 0.03 | 89.46 ± 0.03 |
Gemma 2 9B | 49.72 ± 0.21 | 89.08 ± 0.04 | 89.08 ± 0.04 |
MERaLiON 2 10B A*STAR | 49.12 ± 0.22 | 88.16 ± 0.04 | 88.16 ± 0.04 |
Aya Expanse 32B CohereLabs | 48.35 ± 0.20 | 86.71 ± 0.04 | 86.71 ± 0.04 |
Olmo 2 0325 32B AI2 | 48.28 ± 0.23 | 82.23 ± 0.10 | 82.23 ± 0.10 |
Qwen 2.5 7B Alibaba | 46.94 ± 0.13 | 81.40 ± 0.05 | 81.40 ± 0.05 |
ERNIE 4.5 21B MoE Baidu | 45.83 ± 0.18 | 88.08 ± 0.04 | 88.08 ± 0.04 |
Llama 3.1 8B Meta | 45.49 ± 0.22 | 87.30 ± 0.03 | 87.30 ± 0.03 |
Sailor2 20B SAIL | 45.19 ± 0.17 | 76.16 ± 0.18 | 76.16 ± 0.18 |
Mistral Small 3.1 2503 24B Mistral AI | 44.39 ± 0.22 | 82.03 ± 0.07 | 82.03 ± 0.07 |
Command R+ 08-2024 104B CohereLabs | 44.12 ± 0.26 | 84.03 ± 0.07 | 84.03 ± 0.07 |
Aya Expanse 8B CohereLabs | 44.12 ± 0.18 | 81.71 ± 0.06 | 81.71 ± 0.06 |
Sailor2 8B SAIL | 43.90 ± 0.19 | 85.24 ± 0.07 | 85.24 ± 0.07 |
Llama 3 70B Meta | 43.42 ± 0.10 | 89.75 ± 0.02 | 89.75 ± 0.02 |
Tulu 3 8B AI2 | 42.86 ± 0.14 | 87.26 ± 0.05 | 87.26 ± 0.05 |
Apertus 70B Swiss AI | 42.69 ± 0.24 | 82.01 ± 0.09 | 82.01 ± 0.09 |
phi-4 14B Microsoft | 39.71 ± 0.24 | 72.87 ± 0.11 | 72.87 ± 0.11 |
Command R 08-2024 32B CohereLabs | 39.00 ± 0.21 | 81.29 ± 0.08 | 81.29 ± 0.08 |
Apertus 8B Swiss AI | 38.91 ± 0.31 | 86.17 ± 0.06 | 86.17 ± 0.06 |
Olmo 2 1124 13B AI2 | 38.54 ± 0.19 | 71.47 ± 0.14 | 71.47 ± 0.14 |
SeaLLMs V3 7B Alibaba-DAMO | 36.82 ± 0.28 | 82.02 ± 0.09 | 82.02 ± 0.09 |
Llama 3 8B Meta | 35.77 ± 0.16 | 84.96 ± 0.06 | 84.96 ± 0.06 |
Babel 9B Alibaba-DAMO | 31.66 ± 0.22 | 72.51 ± 0.13 | 72.51 ± 0.13 |
Olmo 2 1124 7B AI2 | 29.81 ± 0.22 | 59.43 ± 0.13 | 59.43 ± 0.13 |
Babel 83B Alibaba-DAMO | 28.79 ± 0.27 | 66.77 ± 0.18 | 66.77 ± 0.18 |
Command R7B 12-2024 7B CohereLabs | 28.08 ± 0.26 | 46.82 ± 0.15 | 46.82 ± 0.15 |
Ministral 2410 8B Mistral AI | 23.78 ± 0.30 | 57.48 ± 0.17 | 57.48 ± 0.17 |
Model | MS | NLU | Belebele QA | Sentiment Analysis |
---|---|---|---|---|
Qwen 3 Next 80B MoE Alibaba | 62.80 ± 0.12 | 67.90 ± 0.13 | 86.14 ± 0.16 | 49.65 ± 0.17 |
SEA-LION v4 (Qwen) 32B AISG | 61.36 ± 0.14 | 68.42 ± 0.09 | 90.24 ± 0.05 | 46.60 ± 0.16 |
Qwen 3 30B MoE Alibaba | 61.13 ± 0.14 | 67.60 ± 0.05 | 87.00 ± 0.05 | 48.20 ± 0.11 |
SEA-LION v4 (Gemma) 27B AISG | 61.10 ± 0.16 | 69.31 ± 0.10 | 88.15 ± 0.15 | 50.47 ± 0.12 |
Gemma 3 27B | 60.92 ± 0.17 | 69.16 ± 0.08 | 87.86 ± 0.12 | 50.46 ± 0.12 |
SEA-LION v3 (Llama) 70B AISG | 59.94 ± 0.17 | 69.71 ± 0.14 | 90.44 ± 0.15 | 48.99 ± 0.28 |
Qwen 3 32B Alibaba | 59.67 ± 0.17 | 66.47 ± 0.13 | 88.57 ± 0.07 | 44.37 ± 0.21 |
Qwen 2.5 72B Alibaba | 59.48 ± 0.14 | 69.11 ± 0.10 | 90.03 ± 0.11 | 48.19 ± 0.17 |
Llama 3.3 70B Meta | 58.80 ± 0.15 | 69.75 ± 0.09 | 90.86 ± 0.08 | 48.65 ± 0.16 |
Gemma 3 12B | 57.96 ± 0.12 | 67.67 ± 0.07 | 85.63 ± 0.10 | 49.72 ± 0.14 |
Tulu 3 70B AI2 | 57.39 ± 0.19 | 67.60 ± 0.20 | 89.44 ± 0.14 | 45.75 ± 0.36 |
Llama 4 Scout 109B MoE Meta | 57.38 ± 0.11 | 68.72 ± 0.07 | 89.26 ± 0.02 | 48.17 ± 0.12 |
Llama 3.1 70B Meta | 55.67 ± 0.18 | 69.16 ± 0.17 | 90.29 ± 0.19 | 48.03 ± 0.26 |
Qwen 3 14B Alibaba | 55.04 ± 0.11 | 66.47 ± 0.11 | 87.01 ± 0.11 | 45.93 ± 0.18 |
Command A 03-2025 111B CohereLabs | 54.76 ± 0.17 | 49.65 ± 0.38 | 49.81 ± 0.65 | 49.49 ± 0.26 |
SEA-LION v3 (Gemma 2) 9B AISG | 54.70 ± 0.18 | 65.12 ± 0.15 | 87.36 ± 0.14 | 42.88 ± 0.26 |
Gemma 2 27B | 54.06 ± 0.22 | 68.19 ± 0.12 | 88.17 ± 0.18 | 48.21 ± 0.19 |
Qwen 3 8B Alibaba | 53.83 ± 0.15 | 63.43 ± 0.15 | 80.77 ± 0.16 | 46.08 ± 0.20 |
Qwen 2.5 32B Alibaba | 53.71 ± 0.13 | 67.36 ± 0.09 | 90.32 ± 0.06 | 44.41 ± 0.15 |
Mistral Large 2411 123B Mistral AI | 52.73 ± 0.21 | 53.06 ± 0.30 | 87.56 ± 0.20 | 18.57 ± 0.61 |
Qwen 2.5 14B Alibaba | 51.49 ± 0.13 | 65.20 ± 0.09 | 85.66 ± 0.11 | 44.73 ± 0.15 |
SEA-LION v3 (Llama) 8B AISG | 51.32 ± 0.28 | 60.30 ± 0.20 | 78.55 ± 0.12 | 42.04 ± 0.39 |
Gemma 2 9B | 49.72 ± 0.21 | 61.98 ± 0.15 | 86.51 ± 0.13 | 37.45 ± 0.30 |
MERaLiON 2 10B A*STAR | 49.12 ± 0.22 | 62.51 ± 0.20 | 83.32 ± 0.17 | 41.71 ± 0.35 |
Aya Expanse 32B CohereLabs | 48.35 ± 0.20 | 48.22 ± 0.19 | 79.16 ± 0.21 | 17.28 ± 0.34 |
Olmo 2 0325 32B AI2 | 48.28 ± 0.23 | 61.19 ± 0.27 | 78.60 ± 0.21 | 43.78 ± 0.48 |
Qwen 2.5 7B Alibaba | 46.94 ± 0.13 | 56.37 ± 0.08 | 75.19 ± 0.08 | 37.55 ± 0.16 |
ERNIE 4.5 21B MoE Baidu | 45.83 ± 0.18 | 50.07 ± 0.27 | 61.76 ± 0.42 | 38.38 ± 0.25 |
Llama 3.1 8B Meta | 45.49 ± 0.22 | 59.11 ± 0.17 | 75.82 ± 0.25 | 42.40 ± 0.27 |
Sailor2 20B SAIL | 45.19 ± 0.17 | 65.47 ± 0.14 | 86.64 ± 0.08 | 44.31 ± 0.26 |
Mistral Small 3.1 2503 24B Mistral AI | 44.39 ± 0.22 | 55.72 ± 0.26 | 86.32 ± 0.24 | 25.13 ± 0.53 |
Command R+ 08-2024 104B CohereLabs | 44.12 ± 0.26 | 62.48 ± 0.29 | 79.15 ± 0.31 | 45.80 ± 0.44 |
Aya Expanse 8B CohereLabs | 44.12 ± 0.18 | 56.00 ± 0.18 | 68.12 ± 0.28 | 43.88 ± 0.20 |
Sailor2 8B SAIL | 43.90 ± 0.19 | 51.53 ± 0.16 | 75.59 ± 0.16 | 27.46 ± 0.28 |
Llama 3 70B Meta | 43.42 ± 0.10 | 66.39 ± 0.10 | 87.63 ± 0.09 | 45.14 ± 0.17 |
Tulu 3 8B AI2 | 42.86 ± 0.14 | 53.32 ± 0.18 | 69.89 ± 0.16 | 36.75 ± 0.36 |
Apertus 70B Swiss AI | 42.69 ± 0.24 | 48.52 ± 0.28 | 60.86 ± 0.37 | 36.19 ± 0.47 |
phi-4 14B Microsoft | 39.71 ± 0.24 | 39.61 ± 0.40 | 37.03 ± 0.70 | 42.19 ± 0.37 |
Command R 08-2024 32B CohereLabs | 39.00 ± 0.21 | 33.72 ± 0.16 | 67.43 ± 0.32 | 0.00 ± 0.00 |
Apertus 8B Swiss AI | 38.91 ± 0.31 | 35.86 ± 0.29 | 47.87 ± 0.49 | 23.84 ± 0.49 |
Olmo 2 1124 13B AI2 | 38.54 ± 0.19 | 46.86 ± 0.21 | 60.64 ± 0.26 | 33.08 ± 0.42 |
SeaLLMs V3 7B Alibaba-DAMO | 36.82 ± 0.28 | 39.73 ± 0.35 | 62.44 ± 0.44 | 17.01 ± 0.43 |
Llama 3 8B Meta | 35.77 ± 0.16 | 53.66 ± 0.16 | 64.91 ± 0.19 | 42.41 ± 0.24 |
Babel 9B Alibaba-DAMO | 31.66 ± 0.22 | 35.27 ± 0.31 | 63.45 ± 0.47 | 7.08 ± 0.41 |
Olmo 2 1124 7B AI2 | 29.81 ± 0.22 | 30.84 ± 0.30 | 43.63 ± 0.46 | 18.05 ± 0.43 |
Babel 83B Alibaba-DAMO | 28.79 ± 0.27 | 22.52 ± 0.51 | 32.20 ± 0.92 | 12.84 ± 0.67 |
Command R7B 12-2024 7B CohereLabs | 28.08 ± 0.26 | 45.54 ± 0.26 | 58.61 ± 0.37 | 32.47 ± 0.46 |
Ministral 2410 8B Mistral AI | 23.78 ± 0.30 | 25.65 ± 0.23 | 51.29 ± 0.45 | 0.00 ± 0.00 |
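Likewise, the NLU competency appears to be the unweighted mean of its two tasks (again inferred from the numbers, not stated on the page): for the top row, (86.14 + 49.65) / 2 = 67.895, which rounds to the reported 67.90. A quick check, using Fraction so the halfway value stays exact:

```python
from fractions import Fraction

# NLU task scores for the top row (Qwen 3 Next 80B MoE), copied from the table above
belebele_qa = Fraction("86.14")
sentiment_analysis = Fraction("49.65")
print(float((belebele_qa + sentiment_analysis) / 2))  # 67.895, rounding to the reported NLU of 67.90
```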
Model | MS | Safety | Toxicity Detection |
---|---|---|---|
Qwen 3 Next 80B MoE Alibaba | 62.80 ± 0.12 | 7.58 ± 0.17 | 7.58 ± 0.17 |
SEA-LION v4 (Qwen) 32B AISG | 61.36 ± 0.14 | 16.31 ± 0.13 | 16.31 ± 0.13 |
Qwen 3 30B MoE Alibaba | 61.13 ± 0.14 | 13.06 ± 0.11 | 13.06 ± 0.11 |
SEA-LION v4 (Gemma) 27B AISG | 61.10 ± 0.16 | 14.57 ± 0.21 | 14.57 ± 0.21 |
Gemma 3 27B | 60.92 ± 0.17 | 13.86 ± 0.25 | 13.86 ± 0.25 |
SEA-LION v3 (Llama) 70B AISG | 59.94 ± 0.17 | 13.62 ± 0.36 | 13.62 ± 0.36 |
Qwen 3 32B Alibaba | 59.67 ± 0.17 | 13.18 ± 0.29 | 13.18 ± 0.29 |
Qwen 2.5 72B Alibaba | 59.48 ± 0.14 | 15.39 ± 0.22 | 15.39 ± 0.22 |
Llama 3.3 70B Meta | 58.80 ± 0.15 | 17.48 ± 0.19 | 17.48 ± 0.19 |
Gemma 3 12B | 57.96 ± 0.12 | 11.57 ± 0.18 | 11.57 ± 0.18 |
Tulu 3 70B AI2 | 57.39 ± 0.19 | 11.11 ± 0.27 | 11.11 ± 0.27 |
Llama 4 Scout 109B MoE Meta | 57.38 ± 0.11 | 14.05 ± 0.13 | 14.05 ± 0.13 |
Llama 3.1 70B Meta | 55.67 ± 0.18 | 15.90 ± 0.24 | 15.90 ± 0.24 |
Qwen 3 14B Alibaba | 55.04 ± 0.11 | 12.54 ± 0.22 | 12.54 ± 0.22 |
Command A 03-2025 111B CohereLabs | 54.76 ± 0.17 | 0.00 ± 0.00 | 0.00 ± 0.00 |
SEA-LION v3 (Gemma 2) 9B AISG | 54.70 ± 0.18 | 12.19 ± 0.41 | 12.19 ± 0.41 |
Gemma 2 27B | 54.06 ± 0.22 | 14.11 ± 0.53 | 14.11 ± 0.53 |
Qwen 3 8B Alibaba | 53.83 ± 0.15 | 15.30 ± 0.17 | 15.30 ± 0.17 |
Qwen 2.5 32B Alibaba | 53.71 ± 0.13 | 9.77 ± 0.22 | 9.77 ± 0.22 |
Mistral Large 2411 123B Mistral AI | 52.73 ± 0.21 | 14.72 ± 0.54 | 14.72 ± 0.54 |
Qwen 2.5 14B Alibaba | 51.49 ± 0.13 | 15.65 ± 0.17 | 15.65 ± 0.17 |
SEA-LION v3 (Llama) 8B AISG | 51.32 ± 0.28 | 15.08 ± 0.29 | 15.08 ± 0.29 |
Gemma 2 9B | 49.72 ± 0.21 | 10.75 ± 0.28 | 10.75 ± 0.28 |
MERaLiON 2 10B A*STAR | 49.12 ± 0.22 | 14.04 ± 0.34 | 14.04 ± 0.34 |
Aya Expanse 32B CohereLabs | 48.35 ± 0.20 | 13.13 ± 0.25 | 13.13 ± 0.25 |
Olmo 2 0325 32B AI2 | 48.28 ± 0.23 | 12.31 ± 0.18 | 12.31 ± 0.18 |
Qwen 2.5 7B Alibaba | 46.94 ± 0.13 | 13.91 ± 0.12 | 13.91 ± 0.12 |
ERNIE 4.5 21B MoE Baidu | 45.83 ± 0.18 | 14.74 ± 0.16 | 14.74 ± 0.16 |
Llama 3.1 8B Meta | 45.49 ± 0.22 | 15.03 ± 0.24 | 15.03 ± 0.24 |
Sailor2 20B SAIL | 45.19 ± 0.17 | 12.73 ± 0.12 | 12.73 ± 0.12 |
Mistral Small 3.1 2503 24B Mistral AI | 44.39 ± 0.22 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Command R+ 08-2024 104B CohereLabs | 44.12 ± 0.26 | 4.18 ± 0.36 | 4.18 ± 0.36 |
Aya Expanse 8B CohereLabs | 44.12 ± 0.18 | 12.41 ± 0.25 | 12.41 ± 0.25 |
Sailor2 8B SAIL | 43.90 ± 0.19 | 19.51 ± 0.31 | 19.51 ± 0.31 |
Llama 3 70B Meta | 43.42 ± 0.10 | 9.59 ± 0.16 | 9.59 ± 0.16 |
Tulu 3 8B AI2 | 42.86 ± 0.14 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Apertus 70B Swiss AI | 42.69 ± 0.24 | 17.96 ± 0.45 | 17.96 ± 0.45 |
phi-4 14B Microsoft | 39.71 ± 0.24 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Command R 08-2024 32B CohereLabs | 39.00 ± 0.21 | 10.82 ± 0.58 | 10.82 ± 0.58 |
Apertus 8B Swiss AI | 38.91 ± 0.31 | 9.26 ± 0.76 | 9.26 ± 0.76 |
Olmo 2 1124 13B AI2 | 38.54 ± 0.19 | 14.19 ± 0.25 | 14.19 ± 0.25 |
SeaLLMs V3 7B Alibaba-DAMO | 36.82 ± 0.28 | 9.01 ± 0.68 | 9.01 ± 0.68 |
Llama 3 8B Meta | 35.77 ± 0.16 | 8.43 ± 0.16 | 8.43 ± 0.16 |
Babel 9B Alibaba-DAMO | 31.66 ± 0.22 | 9.34 ± 0.63 | 9.34 ± 0.63 |
Olmo 2 1124 7B AI2 | 29.81 ± 0.22 | 6.78 ± 0.45 | 6.78 ± 0.45 |
Babel 83B Alibaba-DAMO | 28.79 ± 0.27 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Command R7B 12-2024 7B CohereLabs | 28.08 ± 0.26 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Ministral 2410 8B Mistral AI | 23.78 ± 0.30 | 7.46 ± 0.59 | 7.46 ± 0.59 |
Model | MS | Knowledge | Global MMLU Lite |
---|---|---|---|
Qwen 3 Next 80B MoE Alibaba | 62.80 ± 0.12 | 70.36 ± 0.19 | 70.36 ± 0.19 |
SEA-LION v4 (Qwen) 32B AISG | 61.36 ± 0.14 | 69.65 ± 0.18 | 69.65 ± 0.18 |
Qwen 3 30B MoE Alibaba | 61.13 ± 0.14 | 63.12 ± 0.26 | 63.12 ± 0.26 |
SEA-LION v4 (Gemma) 27B AISG | 61.10 ± 0.16 | 62.86 ± 0.28 | 62.86 ± 0.28 |
Gemma 3 27B | 60.92 ± 0.17 | 63.13 ± 0.34 | 63.13 ± 0.34 |
SEA-LION v3 (Llama) 70B AISG | 59.94 ± 0.17 | 74.12 ± 0.44 | 74.12 ± 0.44 |
Qwen 3 32B Alibaba | 59.67 ± 0.17 | 67.38 ± 0.27 | 67.38 ± 0.27 |
Qwen 2.5 72B Alibaba | 59.48 ± 0.14 | 70.78 ± 0.32 | 70.78 ± 0.32 |
Llama 3.3 70B Meta | 58.80 ± 0.15 | 71.86 ± 0.19 | 71.86 ± 0.19 |
Gemma 3 12B | 57.96 ± 0.12 | 57.83 ± 0.25 | 57.83 ± 0.25 |
Tulu 3 70B AI2 | 57.39 ± 0.19 | 66.97 ± 0.30 | 66.97 ± 0.30 |
Llama 4 Scout 109B MoE Meta | 57.38 ± 0.11 | 64.79 ± 0.14 | 64.79 ± 0.14 |
Llama 3.1 70B Meta | 55.67 ± 0.18 | 69.58 ± 0.35 | 69.58 ± 0.35 |
Qwen 3 14B Alibaba | 55.04 ± 0.11 | 53.83 ± 0.26 | 53.83 ± 0.26 |
Command A 03-2025 111B CohereLabs | 54.76 ± 0.17 | 66.04 ± 0.35 | 66.04 ± 0.35 |
SEA-LION v3 (Gemma 2) 9B AISG | 54.70 ± 0.18 | 53.98 ± 0.44 | 53.98 ± 0.44 |
Gemma 2 27B | 54.06 ± 0.22 | 61.53 ± 0.38 | 61.53 ± 0.38 |
Qwen 3 8B Alibaba | 53.83 ± 0.15 | 51.70 ± 0.33 | 51.70 ± 0.33 |
Qwen 2.5 32B Alibaba | 53.71 ± 0.13 | 60.61 ± 0.25 | 60.61 ± 0.25 |
Mistral Large 2411 123B Mistral AI | 52.73 ± 0.21 | 58.38 ± 0.45 | 58.38 ± 0.45 |
Qwen 2.5 14B Alibaba | 51.49 ± 0.13 | 53.19 ± 0.19 | 53.19 ± 0.19 |
SEA-LION v3 (Llama) 8B AISG | 51.32 ± 0.28 | 44.86 ± 0.53 | 44.86 ± 0.53 |
Gemma 2 9B | 49.72 ± 0.21 | 51.31 ± 0.36 | 51.31 ± 0.36 |
MERaLiON 2 10B A*STAR | 49.12 ± 0.22 | 49.27 ± 0.52 | 49.27 ± 0.52 |
Aya Expanse 32B CohereLabs | 48.35 ± 0.20 | 49.44 ± 0.26 | 49.44 ± 0.26 |
Olmo 2 0325 32B AI2 | 48.28 ± 0.23 | 47.58 ± 0.64 | 47.58 ± 0.64 |
Qwen 2.5 7B Alibaba | 46.94 ± 0.13 | 46.07 ± 0.27 | 46.07 ± 0.27 |
ERNIE 4.5 21B MoE Baidu | 45.83 ± 0.18 | 33.11 ± 0.64 | 33.11 ± 0.64 |
Llama 3.1 8B Meta | 45.49 ± 0.22 | 36.82 ± 0.54 | 36.82 ± 0.54 |
Sailor2 20B SAIL | 45.19 ± 0.17 | 49.87 ± 0.31 | 49.87 ± 0.31 |
Mistral Small 3.1 2503 24B Mistral AI | 44.39 ± 0.22 | 54.68 ± 0.63 | 54.68 ± 0.63 |
Command R+ 08-2024 104B CohereLabs | 44.12 ± 0.26 | 38.85 ± 0.76 | 38.85 ± 0.76 |
Aya Expanse 8B CohereLabs | 44.12 ± 0.18 | 37.53 ± 0.20 | 37.53 ± 0.20 |
Sailor2 8B SAIL | 43.90 ± 0.19 | 42.67 ± 0.39 | 42.67 ± 0.39 |
Llama 3 70B Meta | 43.42 ± 0.10 | 61.64 ± 0.29 | 61.64 ± 0.29 |
Tulu 3 8B AI2 | 42.86 ± 0.14 | 30.81 ± 0.35 | 30.81 ± 0.35 |
Apertus 70B Swiss AI | 42.69 ± 0.24 | 30.15 ± 0.77 | 30.15 ± 0.77 |
phi-4 14B Microsoft | 39.71 ± 0.24 | 49.11 ± 0.40 | 49.11 ± 0.40 |
Command R 08-2024 32B CohereLabs | 39.00 ± 0.21 | 38.56 ± 0.63 | 38.56 ± 0.63 |
Apertus 8B Swiss AI | 38.91 ± 0.31 | 30.81 ± 0.85 | 30.81 ± 0.85 |
Olmo 2 1124 13B AI2 | 38.54 ± 0.19 | 27.11 ± 0.58 | 27.11 ± 0.58 |
SeaLLMs V3 7B Alibaba-DAMO | 36.82 ± 0.28 | 28.88 ± 0.84 | 28.88 ± 0.84 |
Llama 3 8B Meta | 35.77 ± 0.16 | 35.19 ± 0.49 | 35.19 ± 0.49 |
Babel 9B Alibaba-DAMO | 31.66 ± 0.22 | 28.69 ± 0.82 | 28.69 ± 0.82 |
Olmo 2 1124 7B AI2 | 29.81 ± 0.22 | 22.57 ± 0.50 | 22.57 ± 0.50 |
Babel 83B Alibaba-DAMO | 28.79 ± 0.27 | 35.89 ± 1.06 | 35.89 ± 1.06 |
Command R7B 12-2024 7B CohereLabs | 28.08 ± 0.26 | 25.23 ± 0.70 | 25.23 ± 0.70 |
Ministral 2410 8B Mistral AI | 23.78 ± 0.30 | 20.34 ± 0.93 | 20.34 ± 0.93 |