Malay Performance
Malay Scores by Model
Scores are averaged over 8 runs; 95% confidence intervals (CI) are shown.
Model size: ≤200B
Open instruct models only
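The leaderboard does not state how its 95% CIs are computed; a standard t-interval over the 8 run-level scores is one plausible reading. The sketch below shows that computation on hypothetical per-run scores (the run scores and the function name `mean_ci95` are illustrative, not from the leaderboard):

```python
import statistics
from math import sqrt

def mean_ci95(runs):
    """Mean and 95% t-interval half-width for a small sample of run scores."""
    n = len(runs)
    mean = statistics.fmean(runs)
    sem = statistics.stdev(runs) / sqrt(n)  # standard error of the mean
    t_crit = 2.365  # two-sided 95% t critical value for n - 1 = 7 dof
    return mean, t_crit * sem

# Hypothetical per-run scores for one model (illustration only):
scores = [71.3, 71.6, 71.5, 71.8, 71.4, 71.7, 71.5, 71.6]
m, hw = mean_ci95(scores)
print(f"{m:.2f} ± {hw:.2f}")
```

With only 8 runs, the t critical value (2.365) is noticeably larger than the normal-approximation 1.96, which is why small run-to-run spreads can still produce visible intervals.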
[Bar chart: overall Malay score (MS) per model with 95% CI; the values correspond to the MS column in the tables below.]
Malay Competencies
Model | MS | Instruction Following | Multi-Turn Chat | NLG | NLU | Safety | Knowledge |
---|---|---|---|---|---|---|---|
Qwen 3 30B MoE Alibaba | 71.55 ± 0.28 | 81.79 ± 1.14 | 51.29 ± 1.19 | 90.08 ± 0.04 | 77.73 ± 0.09 | 56.50 ± 0.12 | 71.88 ± 0.19 |
SEA-LION v4 27B AISG | 71.31 ± 0.43 | 83.21 ± 1.73 | 45.15 ± 1.48 | 91.71 ± 0.03 | 79.00 ± 0.23 | 57.22 ± 0.14 | 71.53 ± 0.39 |
Gemma 3 27B Google | 71.20 ± 0.34 | 81.55 ± 1.17 | 46.23 ± 1.39 | 91.83 ± 0.04 | 78.85 ± 0.18 | 56.95 ± 0.10 | 71.81 ± 0.43 |
Qwen 3 32B Alibaba | 70.01 ± 0.23 | 80.71 ± 0.92 | 41.65 ± 0.78 | 88.96 ± 0.11 | 77.20 ± 0.37 | 56.49 ± 0.51 | 75.03 ± 0.31 |
SEA-LION v3 (Llama) 70B AISG | 69.82 ± 0.43 | 85.83 ± 1.47 | 26.08 ± 2.10 | 91.01 ± 0.10 | 79.27 ± 0.22 | 56.71 ± 0.64 | 80.03 ± 0.81 |
Qwen 2.5 72B Alibaba | 69.35 ± 0.34 | 79.64 ± 1.32 | 32.54 ± 1.23 | 89.59 ± 0.07 | 78.96 ± 0.24 | 57.74 ± 0.30 | 77.63 ± 0.29 |
Gemma 3 12B Google | 68.68 ± 0.52 | 81.55 ± 1.69 | 37.82 ± 2.11 | 91.01 ± 0.05 | 77.66 ± 0.12 | 55.84 ± 0.10 | 68.22 ± 0.22 |
Llama 3.3 70B Meta | 68.37 ± 0.23 | 86.67 ± 0.79 | 17.08 ± 0.75 | 90.09 ± 0.04 | 79.37 ± 0.03 | 58.76 ± 0.21 | 78.25 ± 0.26 |
Tulu 3 70B AI2 | 67.83 ± 0.44 | 79.76 ± 0.98 | 28.18 ± 1.42 | 91.07 ± 0.15 | 77.87 ± 1.03 | 55.60 ± 0.52 | 74.50 ± 0.82 |
Llama 4 Scout 109B MoE Meta | 67.58 ± 0.22 | 84.52 ± 0.98 | 21.44 ± 1.68 | 91.02 ± 0.04 | 78.59 ± 0.05 | 56.95 ± 0.14 | 72.97 ± 0.22 |
Command A 03-2025 111B CohereLabs | 66.34 ± 3.38 | 84.88 ± 1.19 | 40.09 ± 2.52 | 87.18 ± 0.40 | 64.00 ± 9.41 | 48.20 ± 12.85 | 73.72 ± 0.83 |
Qwen 3 14B Alibaba | 66.09 ± 0.18 | 78.45 ± 1.17 | 31.95 ± 1.18 | 87.44 ± 0.33 | 76.98 ± 0.32 | 56.38 ± 0.12 | 65.31 ± 0.71 |
SEA-LION v3 (Gemma 2) 9B AISG | 65.76 ± 0.37 | 81.43 ± 2.00 | 25.48 ± 1.67 | 90.36 ± 0.14 | 76.18 ± 0.90 | 56.08 ± 0.12 | 65.03 ± 1.08 |
Llama 3.1 70B Meta | 65.44 ± 0.31 | 74.64 ± 1.54 | 14.06 ± 1.06 | 90.35 ± 0.05 | 79.00 ± 0.23 | 57.88 ± 0.37 | 76.69 ± 0.55 |
Qwen 3 8B Alibaba | 64.72 ± 0.37 | 79.17 ± 1.78 | 26.40 ± 1.06 | 87.17 ± 0.07 | 74.66 ± 0.19 | 57.75 ± 0.25 | 63.19 ± 0.48 |
Qwen 2.5 32B Alibaba | 64.64 ± 0.18 | 78.93 ± 1.14 | 19.50 ± 0.99 | 86.02 ± 0.36 | 77.84 ± 0.15 | 54.88 ± 0.10 | 70.69 ± 0.50 |
Gemma 2 27B Google | 64.42 ± 0.45 | 74.05 ± 0.92 | 16.16 ± 1.18 | 90.47 ± 0.18 | 78.25 ± 0.29 | 57.02 ± 0.59 | 70.56 ± 2.29 |
Mistral Large 2411 123B Mistral AI | 63.95 ± 1.33 | 79.17 ± 1.78 | 21.77 ± 1.70 | 89.19 ± 0.12 | 68.23 ± 7.67 | 57.26 ± 0.61 | 68.09 ± 0.97 |
SEA-LION v3 (Llama) 8B AISG | 62.70 ± 0.36 | 78.69 ± 1.54 | 20.15 ± 1.63 | 89.47 ± 0.10 | 72.57 ± 0.56 | 57.62 ± 0.37 | 57.69 ± 1.05 |
Qwen 2.5 14B Alibaba | 62.32 ± 0.29 | 74.88 ± 1.27 | 14.98 ± 0.89 | 85.32 ± 0.08 | 76.21 ± 0.30 | 57.82 ± 0.10 | 64.72 ± 0.27 |
Gemma 2 9B Google | 61.02 ± 0.45 | 72.50 ± 1.74 | 12.07 ± 1.22 | 89.10 ± 0.20 | 74.03 ± 3.05 | 55.43 ± 0.31 | 63.00 ± 0.82 |
MERaLiON 2 10B A*STAR | 60.46 ± 0.61 | 70.00 ± 1.69 | 11.96 ± 1.17 | 88.13 ± 0.90 | 74.28 ± 1.96 | 57.17 ± 0.18 | 61.22 ± 1.44 |
Aya Expanse 32B CohereLabs | 60.29 ± 0.84 | 73.93 ± 1.41 | 18.97 ± 1.25 | 86.72 ± 0.19 | 64.51 ± 4.13 | 56.59 ± 0.23 | 61.03 ± 0.45 |
Olmo 2 0325 32B AI2 | 59.66 ± 0.40 | 75.12 ± 0.96 | 11.37 ± 0.62 | 82.26 ± 0.49 | 73.11 ± 1.43 | 56.29 ± 0.48 | 59.81 ± 1.11 |
Qwen 2.5 7B Alibaba | 58.57 ± 0.33 | 69.64 ± 1.29 | 14.60 ± 0.84 | 81.40 ± 0.09 | 69.81 ± 0.21 | 56.87 ± 0.06 | 59.09 ± 0.32 |
ERNIE 4.5 21B MoE Baidu | 58.03 ± 1.75 | 74.64 ± 1.93 | 14.28 ± 1.20 | 88.05 ± 0.12 | 64.93 ± 4.15 | 57.38 ± 0.34 | 48.91 ± 6.07 |
Llama 3.1 8B Meta | 57.14 ± 0.43 | 64.88 ± 1.24 | 10.02 ± 0.81 | 87.27 ± 0.05 | 71.72 ± 0.41 | 57.45 ± 0.33 | 51.53 ± 1.19 |
Command R+ 08-2024 104B CohereLabs | 56.48 ± 0.67 | 68.33 ± 2.22 | 7.65 ± 0.74 | 84.02 ± 0.51 | 73.93 ± 1.33 | 52.14 ± 0.66 | 52.81 ± 2.32 |
Sailor2 20B SAIL | 56.09 ± 0.35 | 40.60 ± 1.05 | 25.59 ± 1.25 | 76.02 ± 2.33 | 76.26 ± 1.81 | 56.40 ± 0.04 | 61.69 ± 1.07 |
Aya Expanse 8B CohereLabs | 55.77 ± 0.29 | 63.33 ± 1.12 | 13.31 ± 0.59 | 81.72 ± 0.27 | 69.25 ± 1.44 | 56.21 ± 0.29 | 50.81 ± 0.64 |
Sailor2 8B SAIL | 55.28 ± 0.22 | 38.45 ± 0.78 | 25.38 ± 1.10 | 85.22 ± 0.45 | 66.50 ± 1.01 | 59.84 ± 0.47 | 56.28 ± 0.30 |
Tulu 3 8B AI2 | 54.99 ± 0.85 | 72.62 ± 1.49 | 12.66 ± 0.68 | 87.27 ± 0.21 | 67.46 ± 0.82 | 43.90 ± 5.96 | 46.06 ± 0.67 |
Mistral Small 3.1 2503 24B Mistral AI | 54.73 ± 3.70 | 64.52 ± 1.65 | 9.59 ± 0.84 | 82.05 ± 0.79 | 70.11 ± 5.61 | 36.52 ± 17.16 | 65.59 ± 0.56 |
Llama 3 70B Meta | 53.99 ± 0.35 | 23.81 ± 1.50 | 8.78 ± 0.76 | 89.73 ± 0.06 | 77.00 ± 0.15 | 54.76 ± 0.29 | 69.88 ± 0.36 |
Command R 08-2024 32B CohereLabs | 51.15 ± 1.97 | 64.29 ± 3.12 | 5.28 ± 0.50 | 81.28 ± 0.70 | 48.22 ± 8.56 | 55.65 ± 1.22 | 52.19 ± 1.55 |
phi-4 14B Microsoft | 51.05 ± 3.48 | 62.26 ± 2.69 | 15.14 ± 1.27 | 72.86 ± 0.85 | 56.96 ± 9.10 | 37.90 ± 15.27 | 61.19 ± 0.96 |
Olmo 2 1124 13B AI2 | 50.95 ± 0.81 | 64.29 ± 2.06 | 6.73 ± 0.83 | 71.48 ± 1.76 | 63.03 ± 2.04 | 57.10 ± 0.69 | 43.09 ± 0.78 |
SeaLLMs V3 7B Alibaba-DAMO | 50.26 ± 0.53 | 52.14 ± 2.51 | 9.11 ± 0.98 | 82.06 ± 0.95 | 58.10 ± 3.47 | 54.54 ± 1.56 | 45.59 ± 2.12 |
Llama 3 8B Meta | 48.13 ± 0.38 | 26.79 ± 1.39 | 5.33 ± 0.75 | 84.93 ± 0.47 | 67.70 ± 0.30 | 54.20 ± 0.19 | 49.84 ± 1.08 |
Babel 9B Alibaba-DAMO | 45.27 ± 0.58 | 40.60 ± 1.54 | 3.72 ± 0.53 | 72.57 ± 4.47 | 55.27 ± 3.56 | 54.69 ± 1.18 | 44.78 ± 4.42 |
Olmo 2 1124 7B AI2 | 43.61 ± 0.95 | 54.52 ± 1.72 | 4.09 ± 0.50 | 59.42 ± 0.94 | 51.55 ± 3.59 | 53.39 ± 1.54 | 38.66 ± 2.11 |
Babel 83B Alibaba-DAMO | 40.21 ± 6.22 | 39.40 ± 3.38 | 7.44 ± 1.21 | 66.76 ± 3.49 | 45.37 ± 12.71 | 31.11 ± 17.44 | 51.19 ± 14.69 |
Command R7B 12-2024 7B CohereLabs | 39.17 ± 2.51 | 49.88 ± 4.22 | 1.99 ± 0.48 | 46.74 ± 4.48 | 61.77 ± 1.85 | 32.00 ± 9.79 | 42.63 ± 6.65 |
Ministral 2410 8B Mistral AI | 37.96 ± 2.25 | 28.45 ± 3.84 | 3.88 ± 0.70 | 57.35 ± 3.64 | 46.46 ± 9.41 | 53.80 ± 0.99 | 37.84 ± 5.58 |
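The table's own numbers suggest that MS is the unweighted mean of the six competency scores, although the leaderboard does not state this explicitly. A quick consistency check with the Qwen 3 30B MoE row from the table above (the relationship is inferred from the data, not documented):

```python
# Competency scores for Qwen 3 30B MoE, copied from the table above.
competencies = {
    "Instruction Following": 81.79,
    "Multi-Turn Chat": 51.29,
    "NLG": 90.08,
    "NLU": 77.73,
    "Safety": 56.50,
    "Knowledge": 71.88,
}

# Unweighted mean of the six competencies; compare against the reported MS.
ms = sum(competencies.values()) / len(competencies)
print(f"MS = {ms:.2f}")  # agrees with the reported 71.55 to rounding
```

The same check reproduces other rows to within rounding (e.g. SEA-LION v4's competencies average to 71.30 against a reported 71.31), consistent with MS being a simple average.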
Malay Tasks
Model | MS | Instruction Following | SEA-IFEval |
---|---|---|---|
Qwen 3 30B MoE Alibaba | 71.55 ± 0.28 | 81.79 ± 1.14 | 81.79 ± 6.81 |
SEA-LION v4 27B AISG | 71.31 ± 0.43 | 83.21 ± 1.73 | 83.21 ± 6.01 |
Gemma 3 27B Google | 71.20 ± 0.34 | 81.55 ± 1.17 | 81.55 ± 6.51 |
Qwen 3 32B Alibaba | 70.01 ± 0.23 | 80.71 ± 0.92 | 80.71 ± 6.73 |
SEA-LION v3 (Llama) 70B AISG | 69.82 ± 0.43 | 85.83 ± 1.47 | 85.83 ± 5.79 |
Qwen 2.5 72B Alibaba | 69.35 ± 0.34 | 79.64 ± 1.32 | 79.64 ± 6.87 |
Gemma 3 12B Google | 68.68 ± 0.52 | 81.55 ± 1.69 | 81.55 ± 6.39 |
Llama 3.3 70B Meta | 68.37 ± 0.23 | 86.67 ± 0.79 | 86.67 ± 6.14 |
Tulu 3 70B AI2 | 67.83 ± 0.44 | 79.76 ± 0.98 | 79.76 ± 6.74 |
Llama 4 Scout 109B MoE Meta | 67.58 ± 0.22 | 84.52 ± 0.98 | 84.52 ± 6.45 |
Command A 03-2025 111B CohereLabs | 66.34 ± 3.38 | 84.88 ± 1.19 | 84.88 ± 5.96 |
Qwen 3 14B Alibaba | 66.09 ± 0.18 | 78.45 ± 1.17 | 78.45 ± 7.42 |
SEA-LION v3 (Gemma 2) 9B AISG | 65.76 ± 0.37 | 81.43 ± 2.00 | 81.43 ± 5.90 |
Llama 3.1 70B Meta | 65.44 ± 0.31 | 74.64 ± 1.54 | 74.64 ± 7.06 |
Qwen 3 8B Alibaba | 64.72 ± 0.37 | 79.17 ± 1.78 | 79.17 ± 6.98 |
Qwen 2.5 32B Alibaba | 64.64 ± 0.18 | 78.93 ± 1.14 | 78.93 ± 7.17 |
Gemma 2 27B Google | 64.42 ± 0.45 | 74.05 ± 0.92 | 74.05 ± 7.09 |
Mistral Large 2411 123B Mistral AI | 63.95 ± 1.33 | 79.17 ± 1.78 | 79.17 ± 6.35 |
SEA-LION v3 (Llama) 8B AISG | 62.70 ± 0.36 | 78.69 ± 1.54 | 78.69 ± 6.64 |
Qwen 2.5 14B Alibaba | 62.32 ± 0.29 | 74.88 ± 1.27 | 74.88 ± 7.26 |
Gemma 2 9B Google | 61.02 ± 0.45 | 72.50 ± 1.74 | 72.50 ± 6.83 |
MERaLiON 2 10B A*STAR | 60.46 ± 0.61 | 70.00 ± 1.69 | 70.00 ± 6.94 |
Aya Expanse 32B CohereLabs | 60.29 ± 0.84 | 73.93 ± 1.41 | 73.93 ± 7.15 |
Olmo 2 0325 32B AI2 | 59.66 ± 0.40 | 75.12 ± 0.96 | 75.12 ± 6.52 |
Qwen 2.5 7B Alibaba | 58.57 ± 0.33 | 69.64 ± 1.29 | 69.64 ± 7.77 |
ERNIE 4.5 21B MoE Baidu | 58.03 ± 1.75 | 74.64 ± 1.93 | 74.64 ± 7.24 |
Llama 3.1 8B Meta | 57.14 ± 0.43 | 64.88 ± 1.24 | 64.88 ± 7.61 |
Command R+ 08-2024 104B CohereLabs | 56.48 ± 0.67 | 68.33 ± 2.22 | 68.33 ± 6.88 |
Sailor2 20B SAIL | 56.09 ± 0.35 | 40.60 ± 1.05 | 40.60 ± 8.22 |
Aya Expanse 8B CohereLabs | 55.77 ± 0.29 | 63.33 ± 1.12 | 63.33 ± 8.33 |
Sailor2 8B SAIL | 55.28 ± 0.22 | 38.45 ± 0.78 | 38.45 ± 8.56 |
Tulu 3 8B AI2 | 54.99 ± 0.85 | 72.62 ± 1.49 | 72.62 ± 7.41 |
Mistral Small 3.1 2503 24B Mistral AI | 54.73 ± 3.70 | 64.52 ± 1.65 | 64.52 ± 6.82 |
Llama 3 70B Meta | 53.99 ± 0.35 | 23.81 ± 1.50 | 23.81 ± 7.56 |
Command R 08-2024 32B CohereLabs | 51.15 ± 1.97 | 64.29 ± 3.12 | 64.29 ± 7.08 |
phi-4 14B Microsoft | 51.05 ± 3.48 | 62.26 ± 2.69 | 62.26 ± 7.00 |
Olmo 2 1124 13B AI2 | 50.95 ± 0.81 | 64.29 ± 2.06 | 64.29 ± 7.70 |
SeaLLMs V3 7B Alibaba-DAMO | 50.26 ± 0.53 | 52.14 ± 2.51 | 52.14 ± 7.81 |
Llama 3 8B Meta | 48.13 ± 0.38 | 26.79 ± 1.39 | 26.79 ± 7.78 |
Babel 9B Alibaba-DAMO | 45.27 ± 0.58 | 40.60 ± 1.54 | 40.60 ± 7.57 |
Olmo 2 1124 7B AI2 | 43.61 ± 0.95 | 54.52 ± 1.72 | 54.52 ± 7.44 |
Babel 83B Alibaba-DAMO | 40.21 ± 6.22 | 39.40 ± 3.38 | 39.40 ± 6.14 |
Command R7B 12-2024 7B CohereLabs | 39.17 ± 2.51 | 49.88 ± 4.22 | 49.88 ± 6.11 |
Ministral 2410 8B Mistral AI | 37.96 ± 2.25 | 28.45 ± 3.84 | 28.45 ± 5.83 |
Model | MS | Multi-Turn Chat | SEA-MT-Bench |
---|---|---|---|
Qwen 3 30B MoE Alibaba | 71.55 ± 0.28 | 51.29 ± 1.19 | 51.29 ± 5.58 |
SEA-LION v4 27B AISG | 71.31 ± 0.43 | 45.15 ± 1.48 | 45.15 ± 5.19 |
Gemma 3 27B Google | 71.20 ± 0.34 | 46.23 ± 1.39 | 46.23 ± 5.33 |
Qwen 3 32B Alibaba | 70.01 ± 0.23 | 41.65 ± 0.78 | 41.65 ± 5.32 |
SEA-LION v3 (Llama) 70B AISG | 69.82 ± 0.43 | 26.08 ± 2.10 | 26.08 ± 4.25 |
Qwen 2.5 72B Alibaba | 69.35 ± 0.34 | 32.54 ± 1.23 | 32.54 ± 5.05 |
Gemma 3 12B Google | 68.68 ± 0.52 | 37.82 ± 2.11 | 37.82 ± 5.43 |
Llama 3.3 70B Meta | 68.37 ± 0.23 | 17.08 ± 0.75 | 17.08 ± 4.40 |
Tulu 3 70B AI2 | 67.83 ± 0.44 | 28.18 ± 1.42 | 28.18 ± 4.75 |
Llama 4 Scout 109B MoE Meta | 67.58 ± 0.22 | 21.44 ± 1.68 | 21.44 ± 4.87 |
Command A 03-2025 111B CohereLabs | 66.34 ± 3.38 | 40.09 ± 2.52 | 40.09 ± 4.48 |
Qwen 3 14B Alibaba | 66.09 ± 0.18 | 31.95 ± 1.18 | 31.95 ± 5.39 |
SEA-LION v3 (Gemma 2) 9B AISG | 65.76 ± 0.37 | 25.48 ± 1.67 | 25.48 ± 4.42 |
Llama 3.1 70B Meta | 65.44 ± 0.31 | 14.06 ± 1.06 | 14.06 ± 4.06 |
Qwen 3 8B Alibaba | 64.72 ± 0.37 | 26.40 ± 1.06 | 26.40 ± 5.28 |
Qwen 2.5 32B Alibaba | 64.64 ± 0.18 | 19.50 ± 0.99 | 19.50 ± 4.50 |
Gemma 2 27B Google | 64.42 ± 0.45 | 16.16 ± 1.18 | 16.16 ± 3.71 |
Mistral Large 2411 123B Mistral AI | 63.95 ± 1.33 | 21.77 ± 1.70 | 21.77 ± 4.29 |
SEA-LION v3 (Llama) 8B AISG | 62.70 ± 0.36 | 20.15 ± 1.63 | 20.15 ± 4.12 |
Qwen 2.5 14B Alibaba | 62.32 ± 0.29 | 14.98 ± 0.89 | 14.98 ± 4.16 |
Gemma 2 9B Google | 61.02 ± 0.45 | 12.07 ± 1.22 | 12.07 ± 3.63 |
MERaLiON 2 10B A*STAR | 60.46 ± 0.61 | 11.96 ± 1.17 | 11.96 ± 3.49 |
Aya Expanse 32B CohereLabs | 60.29 ± 0.84 | 18.97 ± 1.25 | 18.97 ± 4.32 |
Olmo 2 0325 32B AI2 | 59.66 ± 0.40 | 11.37 ± 0.62 | 11.37 ± 3.53 |
Qwen 2.5 7B Alibaba | 58.57 ± 0.33 | 14.60 ± 0.84 | 14.60 ± 3.86 |
ERNIE 4.5 21B MoE Baidu | 58.03 ± 1.75 | 14.28 ± 1.20 | 14.28 ± 3.95 |
Llama 3.1 8B Meta | 57.14 ± 0.43 | 10.02 ± 0.81 | 10.02 ± 3.48 |
Command R+ 08-2024 104B CohereLabs | 56.48 ± 0.67 | 7.65 ± 0.74 | 7.65 ± 2.79 |
Sailor2 20B SAIL | 56.09 ± 0.35 | 25.59 ± 1.25 | 25.59 ± 4.35 |
Aya Expanse 8B CohereLabs | 55.77 ± 0.29 | 13.31 ± 0.59 | 13.31 ± 3.75 |
Sailor2 8B SAIL | 55.28 ± 0.22 | 25.38 ± 1.10 | 25.38 ± 4.64 |
Tulu 3 8B AI2 | 54.99 ± 0.85 | 12.66 ± 0.68 | 12.66 ± 3.79 |
Mistral Small 3.1 2503 24B Mistral AI | 54.73 ± 3.70 | 9.59 ± 0.84 | 9.59 ± 3.28 |
Llama 3 70B Meta | 53.99 ± 0.35 | 8.78 ± 0.76 | 8.78 ± 3.26 |
Command R 08-2024 32B CohereLabs | 51.15 ± 1.97 | 5.28 ± 0.50 | 5.28 ± 2.79 |
phi-4 14B Microsoft | 51.05 ± 3.48 | 15.14 ± 1.27 | 15.14 ± 4.14 |
Olmo 2 1124 13B AI2 | 50.95 ± 0.81 | 6.73 ± 0.83 | 6.73 ± 2.93 |
SeaLLMs V3 7B Alibaba-DAMO | 50.26 ± 0.53 | 9.11 ± 0.98 | 9.11 ± 3.10 |
Llama 3 8B Meta | 48.13 ± 0.38 | 5.33 ± 0.75 | 5.33 ± 2.46 |
Babel 9B Alibaba-DAMO | 45.27 ± 0.58 | 3.72 ± 0.53 | 3.72 ± 1.91 |
Olmo 2 1124 7B AI2 | 43.61 ± 0.95 | 4.09 ± 0.50 | 4.09 ± 2.42 |
Babel 83B Alibaba-DAMO | 40.21 ± 6.22 | 7.44 ± 1.21 | 7.44 ± 2.45 |
Command R7B 12-2024 7B CohereLabs | 39.17 ± 2.51 | 1.99 ± 0.48 | 1.99 ± 1.36 |
Ministral 2410 8B Mistral AI | 37.96 ± 2.25 | 3.88 ± 0.70 | 3.88 ± 2.28 |
Model | MS | NLG | Translations |
---|---|---|---|
Qwen 3 30B MoE Alibaba | 71.55 ± 0.28 | 90.08 ± 0.04 | 90.08 ± 0.04 |
SEA-LION v4 27B AISG | 71.31 ± 0.43 | 91.71 ± 0.03 | 91.71 ± 0.03 |
Gemma 3 27B Google | 71.20 ± 0.34 | 91.83 ± 0.04 | 91.83 ± 0.04 |
Qwen 3 32B Alibaba | 70.01 ± 0.23 | 88.96 ± 0.11 | 88.96 ± 0.11 |
SEA-LION v3 (Llama) 70B AISG | 69.82 ± 0.43 | 91.01 ± 0.10 | 91.01 ± 0.10 |
Qwen 2.5 72B Alibaba | 69.35 ± 0.34 | 89.59 ± 0.07 | 89.59 ± 0.07 |
Gemma 3 12B Google | 68.68 ± 0.52 | 91.01 ± 0.05 | 91.01 ± 0.05 |
Llama 3.3 70B Meta | 68.37 ± 0.23 | 90.09 ± 0.04 | 90.09 ± 0.04 |
Tulu 3 70B AI2 | 67.83 ± 0.44 | 91.07 ± 0.15 | 91.07 ± 0.15 |
Llama 4 Scout 109B MoE Meta | 67.58 ± 0.22 | 91.02 ± 0.04 | 91.02 ± 0.04 |
Command A 03-2025 111B CohereLabs | 66.34 ± 3.38 | 87.18 ± 0.40 | 87.18 ± 0.40 |
Qwen 3 14B Alibaba | 66.09 ± 0.18 | 87.44 ± 0.33 | 87.44 ± 0.33 |
SEA-LION v3 (Gemma 2) 9B AISG | 65.76 ± 0.37 | 90.36 ± 0.14 | 90.36 ± 0.14 |
Llama 3.1 70B Meta | 65.44 ± 0.31 | 90.35 ± 0.05 | 90.35 ± 0.05 |
Qwen 3 8B Alibaba | 64.72 ± 0.37 | 87.17 ± 0.07 | 87.17 ± 0.07 |
Qwen 2.5 32B Alibaba | 64.64 ± 0.18 | 86.02 ± 0.36 | 86.02 ± 0.36 |
Gemma 2 27B Google | 64.42 ± 0.45 | 90.47 ± 0.18 | 90.47 ± 0.18 |
Mistral Large 2411 123B Mistral AI | 63.95 ± 1.33 | 89.19 ± 0.12 | 89.19 ± 0.12 |
SEA-LION v3 (Llama) 8B AISG | 62.70 ± 0.36 | 89.47 ± 0.10 | 89.47 ± 0.10 |
Qwen 2.5 14B Alibaba | 62.32 ± 0.29 | 85.32 ± 0.08 | 85.32 ± 0.08 |
Gemma 2 9B Google | 61.02 ± 0.45 | 89.10 ± 0.20 | 89.10 ± 0.20 |
MERaLiON 2 10B A*STAR | 60.46 ± 0.61 | 88.13 ± 0.90 | 88.13 ± 0.90 |
Aya Expanse 32B CohereLabs | 60.29 ± 0.84 | 86.72 ± 0.19 | 86.72 ± 0.19 |
Olmo 2 0325 32B AI2 | 59.66 ± 0.40 | 82.26 ± 0.49 | 82.26 ± 0.49 |
Qwen 2.5 7B Alibaba | 58.57 ± 0.33 | 81.40 ± 0.09 | 81.40 ± 0.09 |
ERNIE 4.5 21B MoE Baidu | 58.03 ± 1.75 | 88.05 ± 0.12 | 88.05 ± 0.12 |
Llama 3.1 8B Meta | 57.14 ± 0.43 | 87.27 ± 0.05 | 87.27 ± 0.05 |
Command R+ 08-2024 104B CohereLabs | 56.48 ± 0.67 | 84.02 ± 0.51 | 84.02 ± 0.51 |
Sailor2 20B SAIL | 56.09 ± 0.35 | 76.02 ± 2.33 | 76.02 ± 2.33 |
Aya Expanse 8B CohereLabs | 55.77 ± 0.29 | 81.72 ± 0.27 | 81.72 ± 0.27 |
Sailor2 8B SAIL | 55.28 ± 0.22 | 85.22 ± 0.45 | 85.22 ± 0.45 |
Tulu 3 8B AI2 | 54.99 ± 0.85 | 87.27 ± 0.21 | 87.27 ± 0.21 |
Mistral Small 3.1 2503 24B Mistral AI | 54.73 ± 3.70 | 82.05 ± 0.79 | 82.05 ± 0.79 |
Llama 3 70B Meta | 53.99 ± 0.35 | 89.73 ± 0.06 | 89.73 ± 0.06 |
Command R 08-2024 32B CohereLabs | 51.15 ± 1.97 | 81.28 ± 0.70 | 81.28 ± 0.70 |
phi-4 14B Microsoft | 51.05 ± 3.48 | 72.86 ± 0.85 | 72.86 ± 0.85 |
Olmo 2 1124 13B AI2 | 50.95 ± 0.81 | 71.48 ± 1.76 | 71.48 ± 1.76 |
SeaLLMs V3 7B Alibaba-DAMO | 50.26 ± 0.53 | 82.06 ± 0.95 | 82.06 ± 0.95 |
Llama 3 8B Meta | 48.13 ± 0.38 | 84.93 ± 0.47 | 84.93 ± 0.47 |
Babel 9B Alibaba-DAMO | 45.27 ± 0.58 | 72.57 ± 4.47 | 72.57 ± 4.47 |
Olmo 2 1124 7B AI2 | 43.61 ± 0.95 | 59.42 ± 0.94 | 59.42 ± 0.94 |
Babel 83B Alibaba-DAMO | 40.21 ± 6.22 | 66.76 ± 3.49 | 66.76 ± 3.49 |
Command R7B 12-2024 7B CohereLabs | 39.17 ± 2.51 | 46.74 ± 4.48 | 46.74 ± 4.48 |
Ministral 2410 8B Mistral AI | 37.96 ± 2.25 | 57.35 ± 3.64 | 57.35 ± 3.64 |
Model | MS | NLU | Belebele QA | Sentiment Analysis |
---|---|---|---|---|
Qwen 3 30B MoE Alibaba | 71.55 ± 0.28 | 77.73 ± 0.09 | 90.04 ± 1.94 | 65.43 ± 2.66 |
SEA-LION v4 27B AISG | 71.31 ± 0.43 | 79.00 ± 0.23 | 91.02 ± 1.79 | 66.98 ± 2.62 |
Gemma 3 27B Google | 71.20 ± 0.34 | 78.85 ± 0.18 | 90.70 ± 1.86 | 67.00 ± 2.63 |
Qwen 3 32B Alibaba | 70.01 ± 0.23 | 77.20 ± 0.37 | 91.40 ± 1.79 | 63.01 ± 2.60 |
SEA-LION v3 (Llama) 70B AISG | 69.82 ± 0.43 | 79.27 ± 0.22 | 92.61 ± 1.59 | 65.94 ± 2.46 |
Qwen 2.5 72B Alibaba | 69.35 ± 0.34 | 78.96 ± 0.24 | 92.42 ± 1.68 | 65.50 ± 2.61 |
Gemma 3 12B Google | 68.68 ± 0.52 | 77.66 ± 0.12 | 88.91 ± 2.02 | 66.42 ± 2.64 |
Llama 3.3 70B Meta | 68.37 ± 0.23 | 79.37 ± 0.03 | 92.89 ± 1.66 | 65.84 ± 2.62 |
Tulu 3 70B AI2 | 67.83 ± 0.44 | 77.87 ± 1.03 | 91.94 ± 1.70 | 63.79 ± 2.37 |
Llama 4 Scout 109B MoE Meta | 67.58 ± 0.22 | 78.59 ± 0.05 | 91.79 ± 1.80 | 65.39 ± 2.66 |
Command A 03-2025 111B CohereLabs | 66.34 ± 3.38 | 64.00 ± 9.41 | 61.66 ± 2.04 | 66.33 ± 2.40 |
Qwen 3 14B Alibaba | 66.09 ± 0.18 | 76.98 ± 0.32 | 90.00 ± 1.92 | 63.97 ± 2.65 |
SEA-LION v3 (Gemma 2) 9B AISG | 65.76 ± 0.37 | 76.18 ± 0.90 | 90.38 ± 1.83 | 61.99 ± 2.55 |
Llama 3.1 70B Meta | 65.44 ± 0.31 | 79.00 ± 0.23 | 92.58 ± 1.58 | 65.42 ± 2.50 |
Qwen 3 8B Alibaba | 64.72 ± 0.37 | 74.66 ± 0.19 | 85.25 ± 2.27 | 64.06 ± 2.62 |
Qwen 2.5 32B Alibaba | 64.64 ± 0.18 | 77.84 ± 0.15 | 92.71 ± 1.68 | 62.97 ± 2.67 |
Gemma 2 27B Google | 64.42 ± 0.45 | 78.25 ± 0.29 | 90.96 ± 1.73 | 65.54 ± 2.54 |
Mistral Large 2411 123B Mistral AI | 63.95 ± 1.33 | 68.23 ± 7.67 | 90.81 ± 1.68 | 45.65 ± 1.90 |
SEA-LION v3 (Llama) 8B AISG | 62.70 ± 0.36 | 72.57 ± 0.56 | 83.78 ± 2.26 | 61.35 ± 2.50 |
Qwen 2.5 14B Alibaba | 62.32 ± 0.29 | 76.21 ± 0.30 | 89.20 ± 1.98 | 63.21 ± 2.67 |
Gemma 2 9B Google | 61.02 ± 0.45 | 74.03 ± 3.05 | 89.66 ± 1.90 | 58.39 ± 2.45 |
MERaLiON 2 10B A*STAR | 60.46 ± 0.61 | 74.28 ± 1.96 | 87.37 ± 2.03 | 61.18 ± 2.43 |
Aya Expanse 32B CohereLabs | 60.29 ± 0.84 | 64.51 ± 4.13 | 84.15 ± 2.22 | 44.88 ± 2.43 |
Olmo 2 0325 32B AI2 | 59.66 ± 0.40 | 73.11 ± 1.43 | 83.69 ± 2.28 | 62.53 ± 2.16 |
Qwen 2.5 7B Alibaba | 58.57 ± 0.33 | 69.81 ± 0.21 | 81.22 ± 2.53 | 58.40 ± 2.74 |
ERNIE 4.5 21B MoE Baidu | 58.03 ± 1.75 | 64.93 ± 4.15 | 70.92 ± 2.30 | 58.95 ± 2.63 |
Llama 3.1 8B Meta | 57.14 ± 0.43 | 71.72 ± 0.41 | 81.72 ± 2.38 | 61.72 ± 2.56 |
Command R+ 08-2024 104B CohereLabs | 56.48 ± 0.67 | 73.93 ± 1.33 | 84.05 ± 2.07 | 63.81 ± 2.27 |
Sailor2 20B SAIL | 56.09 ± 0.35 | 76.26 ± 1.81 | 89.68 ± 1.97 | 62.83 ± 2.54 |
Aya Expanse 8B CohereLabs | 55.77 ± 0.29 | 69.25 ± 1.44 | 76.01 ± 2.61 | 62.49 ± 2.60 |
Sailor2 8B SAIL | 55.28 ± 0.22 | 66.50 ± 1.01 | 81.30 ± 2.44 | 51.70 ± 2.65 |
Tulu 3 8B AI2 | 54.99 ± 0.85 | 67.46 ± 0.82 | 77.00 ± 2.62 | 57.92 ± 2.51 |
Mistral Small 3.1 2503 24B Mistral AI | 54.73 ± 3.70 | 70.11 ± 5.61 | 89.66 ± 1.85 | 50.56 ± 2.00 |
Llama 3 70B Meta | 53.99 ± 0.35 | 77.00 ± 0.15 | 90.53 ± 1.89 | 63.48 ± 2.66 |
Command R 08-2024 32B CohereLabs | 51.15 ± 1.97 | 48.22 ± 8.56 | 74.82 ± 2.30 | 21.63 ± 1.21 |
phi-4 14B Microsoft | 51.05 ± 3.48 | 56.96 ± 9.10 | 52.32 ± 2.11 | 61.59 ± 2.47 |
Olmo 2 1124 13B AI2 | 50.95 ± 0.81 | 63.03 ± 2.04 | 70.66 ± 2.69 | 55.41 ± 2.32 |
SeaLLMs V3 7B Alibaba-DAMO | 50.26 ± 0.53 | 58.10 ± 3.47 | 71.37 ± 2.23 | 44.82 ± 2.15 |
Llama 3 8B Meta | 48.13 ± 0.38 | 67.70 ± 0.30 | 73.80 ± 2.75 | 61.59 ± 2.61 |
Babel 9B Alibaba-DAMO | 45.27 ± 0.58 | 55.27 ± 3.56 | 72.50 ± 2.25 | 38.04 ± 2.15 |
Olmo 2 1124 7B AI2 | 43.61 ± 0.95 | 51.55 ± 3.59 | 57.56 ± 2.53 | 45.55 ± 2.25 |
Babel 83B Alibaba-DAMO | 40.21 ± 6.22 | 45.37 ± 12.71 | 48.84 ± 1.25 | 41.91 ± 1.54 |
Command R7B 12-2024 7B CohereLabs | 39.17 ± 2.51 | 61.77 ± 1.85 | 68.38 ± 2.61 | 55.16 ± 2.31 |
Ministral 2410 8B Mistral AI | 37.96 ± 2.25 | 46.46 ± 9.41 | 63.02 ± 2.37 | 29.90 ± 1.14 |
Model | MS | Safety | Toxicity Detection |
---|---|---|---|
Qwen 3 30B MoE Alibaba | 71.55 ± 0.28 | 56.50 ± 0.12 | 56.50 ± 3.06 |
SEA-LION v4 27B AISG | 71.31 ± 0.43 | 57.22 ± 0.14 | 57.23 ± 3.00 |
Gemma 3 27B Google | 71.20 ± 0.34 | 56.95 ± 0.10 | 56.95 ± 3.02 |
Qwen 3 32B Alibaba | 70.01 ± 0.23 | 56.49 ± 0.51 | 56.49 ± 3.00 |
SEA-LION v3 (Llama) 70B AISG | 69.82 ± 0.43 | 56.71 ± 0.64 | 56.71 ± 2.90 |
Qwen 2.5 72B Alibaba | 69.35 ± 0.34 | 57.74 ± 0.30 | 57.74 ± 2.99 |
Gemma 3 12B Google | 68.68 ± 0.52 | 55.84 ± 0.10 | 55.84 ± 3.04 |
Llama 3.3 70B Meta | 68.37 ± 0.23 | 58.76 ± 0.21 | 58.76 ± 3.01 |
Tulu 3 70B AI2 | 67.83 ± 0.44 | 55.60 ± 0.52 | 55.60 ± 2.96 |
Llama 4 Scout 109B MoE Meta | 67.58 ± 0.22 | 56.95 ± 0.14 | 56.95 ± 3.05 |
Command A 03-2025 111B CohereLabs | 66.34 ± 3.38 | 48.20 ± 12.85 | 48.20 ± 2.52 |
Qwen 3 14B Alibaba | 66.09 ± 0.18 | 56.38 ± 0.12 | 56.38 ± 3.03 |
SEA-LION v3 (Gemma 2) 9B AISG | 65.76 ± 0.37 | 56.08 ± 0.12 | 56.07 ± 2.90 |
Llama 3.1 70B Meta | 65.44 ± 0.31 | 57.88 ± 0.37 | 57.88 ± 2.94 |
Qwen 3 8B Alibaba | 64.72 ± 0.37 | 57.75 ± 0.25 | 57.75 ± 3.01 |
Qwen 2.5 32B Alibaba | 64.64 ± 0.18 | 54.88 ± 0.10 | 54.87 ± 3.01 |
Gemma 2 27B Google | 64.42 ± 0.45 | 57.02 ± 0.59 | 57.03 ± 2.85 |
Mistral Large 2411 123B Mistral AI | 63.95 ± 1.33 | 57.26 ± 0.61 | 57.26 ± 2.73 |
SEA-LION v3 (Llama) 8B AISG | 62.70 ± 0.36 | 57.62 ± 0.37 | 57.63 ± 2.95 |
Qwen 2.5 14B Alibaba | 62.32 ± 0.29 | 57.82 ± 0.10 | 57.83 ± 3.03 |
Gemma 2 9B Google | 61.02 ± 0.45 | 55.43 ± 0.31 | 55.43 ± 2.96 |
MERaLiON 2 10B A*STAR | 60.46 ± 0.61 | 57.17 ± 0.18 | 57.17 ± 2.93 |
Aya Expanse 32B CohereLabs | 60.29 ± 0.84 | 56.59 ± 0.23 | 56.59 ± 2.99 |
Olmo 2 0325 32B AI2 | 59.66 ± 0.40 | 56.29 ± 0.48 | 56.29 ± 3.00 |
Qwen 2.5 7B Alibaba | 58.57 ± 0.33 | 56.87 ± 0.06 | 56.88 ± 3.04 |
ERNIE 4.5 21B MoE Baidu | 58.03 ± 1.75 | 57.38 ± 0.34 | 57.38 ± 3.03 |
Llama 3.1 8B Meta | 57.14 ± 0.43 | 57.45 ± 0.33 | 57.45 ± 2.98 |
Command R+ 08-2024 104B CohereLabs | 56.48 ± 0.67 | 52.14 ± 0.66 | 52.14 ± 2.99 |
Sailor2 20B SAIL | 56.09 ± 0.35 | 56.40 ± 0.04 | 56.40 ± 3.06 |
Aya Expanse 8B CohereLabs | 55.77 ± 0.29 | 56.21 ± 0.29 | 56.21 ± 2.98 |
Sailor2 8B SAIL | 55.28 ± 0.22 | 59.84 ± 0.47 | 59.84 ± 2.94 |
Tulu 3 8B AI2 | 54.99 ± 0.85 | 43.90 ± 5.96 | 43.90 ± 2.70 |
Mistral Small 3.1 2503 24B Mistral AI | 54.73 ± 3.70 | 36.52 ± 17.16 | 36.52 ± 1.83 |
Llama 3 70B Meta | 53.99 ± 0.35 | 54.76 ± 0.29 | 54.76 ± 3.04 |
Command R 08-2024 32B CohereLabs | 51.15 ± 1.97 | 55.65 ± 1.22 | 55.65 ± 2.64 |
phi-4 14B Microsoft | 51.05 ± 3.48 | 37.90 ± 15.27 | 37.90 ± 2.18 |
Olmo 2 1124 13B AI2 | 50.95 ± 0.81 | 57.10 ± 0.69 | 57.10 ± 2.92 |
SeaLLMs V3 7B Alibaba-DAMO | 50.26 ± 0.53 | 54.54 ± 1.56 | 54.54 ± 2.65 |
Llama 3 8B Meta | 48.13 ± 0.38 | 54.20 ± 0.19 | 54.20 ± 3.06 |
Babel 9B Alibaba-DAMO | 45.27 ± 0.58 | 54.69 ± 1.18 | 54.69 ± 2.52 |
Olmo 2 1124 7B AI2 | 43.61 ± 0.95 | 53.39 ± 1.54 | 53.39 ± 2.85 |
Babel 83B Alibaba-DAMO | 40.21 ± 6.22 | 31.11 ± 17.44 | 31.11 ± 1.45 |
Command R7B 12-2024 7B CohereLabs | 39.17 ± 2.51 | 32.00 ± 9.79 | 32.00 ± 2.12 |
Ministral 2410 8B Mistral AI | 37.96 ± 2.25 | 53.80 ± 0.99 | 53.80 ± 2.64 |
Model | MS | Knowledge | Global MMLU Lite |
---|---|---|---|
Qwen 3 30B MoE Alibaba | 71.55 ± 0.28 | 71.88 ± 0.19 | 71.88 ± 0.19 |
SEA-LION v4 27B AISG | 71.31 ± 0.43 | 71.53 ± 0.39 | 71.53 ± 0.39 |
Gemma 3 27B Google | 71.20 ± 0.34 | 71.81 ± 0.43 | 71.81 ± 0.43 |
Qwen 3 32B Alibaba | 70.01 ± 0.23 | 75.03 ± 0.31 | 75.03 ± 0.31 |
SEA-LION v3 (Llama) 70B AISG | 69.82 ± 0.43 | 80.03 ± 0.81 | 80.03 ± 0.81 |
Qwen 2.5 72B Alibaba | 69.35 ± 0.34 | 77.63 ± 0.29 | 77.63 ± 0.29 |
Gemma 3 12B Google | 68.68 ± 0.52 | 68.22 ± 0.22 | 68.22 ± 0.22 |
Llama 3.3 70B Meta | 68.37 ± 0.23 | 78.25 ± 0.26 | 78.25 ± 0.26 |
Tulu 3 70B AI2 | 67.83 ± 0.44 | 74.50 ± 0.82 | 74.50 ± 0.82 |
Llama 4 Scout 109B MoE Meta | 67.58 ± 0.22 | 72.97 ± 0.22 | 72.97 ± 0.22 |
Command A 03-2025 111B CohereLabs | 66.34 ± 3.38 | 73.72 ± 0.83 | 73.72 ± 0.83 |
Qwen 3 14B Alibaba | 66.09 ± 0.18 | 65.31 ± 0.71 | 65.31 ± 0.71 |
SEA-LION v3 (Gemma 2) 9B AISG | 65.76 ± 0.37 | 65.03 ± 1.08 | 65.03 ± 1.08 |
Llama 3.1 70B Meta | 65.44 ± 0.31 | 76.69 ± 0.55 | 76.69 ± 0.55 |
Qwen 3 8B Alibaba | 64.72 ± 0.37 | 63.19 ± 0.48 | 63.19 ± 0.48 |
Qwen 2.5 32B Alibaba | 64.64 ± 0.18 | 70.69 ± 0.50 | 70.69 ± 0.50 |
Gemma 2 27B Google | 64.42 ± 0.45 | 70.56 ± 2.29 | 70.56 ± 2.29 |
Mistral Large 2411 123B Mistral AI | 63.95 ± 1.33 | 68.09 ± 0.97 | 68.09 ± 0.97 |
SEA-LION v3 (Llama) 8B AISG | 62.70 ± 0.36 | 57.69 ± 1.05 | 57.69 ± 1.05 |
Qwen 2.5 14B Alibaba | 62.32 ± 0.29 | 64.72 ± 0.27 | 64.72 ± 0.27 |
Gemma 2 9B Google | 61.02 ± 0.45 | 63.00 ± 0.82 | 63.00 ± 0.82 |
MERaLiON 2 10B A*STAR | 60.46 ± 0.61 | 61.22 ± 1.44 | 61.22 ± 1.44 |
Aya Expanse 32B CohereLabs | 60.29 ± 0.84 | 61.03 ± 0.45 | 61.03 ± 0.45 |
Olmo 2 0325 32B AI2 | 59.66 ± 0.40 | 59.81 ± 1.11 | 59.81 ± 1.11 |
Qwen 2.5 7B Alibaba | 58.57 ± 0.33 | 59.09 ± 0.32 | 59.09 ± 0.32 |
ERNIE 4.5 21B MoE Baidu | 58.03 ± 1.75 | 48.91 ± 6.07 | 48.91 ± 6.07 |
Llama 3.1 8B Meta | 57.14 ± 0.43 | 51.53 ± 1.19 | 51.53 ± 1.19 |
Command R+ 08-2024 104B CohereLabs | 56.48 ± 0.67 | 52.81 ± 2.32 | 52.81 ± 2.32 |
Sailor2 20B SAIL | 56.09 ± 0.35 | 61.69 ± 1.07 | 61.69 ± 1.07 |
Aya Expanse 8B CohereLabs | 55.77 ± 0.29 | 50.81 ± 0.64 | 50.81 ± 0.64 |
Sailor2 8B SAIL | 55.28 ± 0.22 | 56.28 ± 0.30 | 56.28 ± 0.30 |
Tulu 3 8B AI2 | 54.99 ± 0.85 | 46.06 ± 0.67 | 46.06 ± 0.67 |
Mistral Small 3.1 2503 24B Mistral AI | 54.73 ± 3.70 | 65.59 ± 0.56 | 65.59 ± 0.56 |
Llama 3 70B Meta | 53.99 ± 0.35 | 69.88 ± 0.36 | 69.88 ± 0.36 |
Command R 08-2024 32B CohereLabs | 51.15 ± 1.97 | 52.19 ± 1.55 | 52.19 ± 1.55 |
phi-4 14B Microsoft | 51.05 ± 3.48 | 61.19 ± 0.96 | 61.19 ± 0.96 |
Olmo 2 1124 13B AI2 | 50.95 ± 0.81 | 43.09 ± 0.78 | 43.09 ± 0.78 |
SeaLLMs V3 7B Alibaba-DAMO | 50.26 ± 0.53 | 45.59 ± 2.12 | 45.59 ± 2.12 |
Llama 3 8B Meta | 48.13 ± 0.38 | 49.84 ± 1.08 | 49.84 ± 1.08 |
Babel 9B Alibaba-DAMO | 45.27 ± 0.58 | 44.78 ± 4.42 | 44.78 ± 4.42 |
Olmo 2 1124 7B AI2 | 43.61 ± 0.95 | 38.66 ± 2.11 | 38.66 ± 2.11 |
Babel 83B Alibaba-DAMO | 40.21 ± 6.22 | 51.19 ± 14.69 | 51.19 ± 14.69 |
Command R7B 12-2024 7B CohereLabs | 39.17 ± 2.51 | 42.63 ± 6.65 | 42.63 ± 6.65 |
Ministral 2410 8B Mistral AI | 37.96 ± 2.25 | 37.84 ± 5.58 | 37.84 ± 5.58 |