Burmese Performance
Burmese Scores by Model
Average of 8 runs, shown with 95% confidence intervals (CI).
Model Size: ≤200B
Open instruct models only
Model | MY |
---|---|
Gemma 3 27B | 57.78 ± 0.43 |
SEA-LION v4 27B AISG | 57.18 ± 0.42 |
Llama 4 Scout 109B MoE Meta | 54.76 ± 0.23 |
Gemma 3 12B | 52.82 ± 0.22 |
Qwen 3 32B Alibaba | 43.03 ± 0.63 |
Tulu 3 70B AI2 | 40.52 ± 0.30 |
SEA-LION v3 (Llama) 70B AISG | 40.16 ± 2.57 |
Gemma 2 27B | 38.95 ± 6.64 |
Qwen 3 14B Alibaba | 35.03 ± 0.51 |
Qwen 2.5 72B Alibaba | 33.37 ± 0.40 |
Qwen 2.5 32B Alibaba | 32.15 ± 0.32 |
Mistral Large 2411 123B Mistral AI | 31.23 ± 2.11 |
Qwen 3 8B Alibaba | 30.49 ± 0.76 |
SEA-LION v3 (Llama) 8B AISG | 27.35 ± 1.82 |
ERNIE 4.5 21B MoE Baidu | 27.33 ± 1.13 |
Qwen 3 30B MoE Alibaba | 25.88 ± 0.27 |
Command A 03-2025 111B CohereLabs | 25.41 ± 2.48 |
Llama 3.1 70B Meta | 24.14 ± 3.01 |
Llama 3.3 70B Meta | 23.82 ± 0.59 |
SEA-LION v3 (Gemma 2) 9B AISG | 21.69 ± 3.72 |
Llama 3 70B Meta | 21.48 ± 2.59 |
Qwen 2.5 14B Alibaba | 21.05 ± 0.48 |
MERaLiON 2 10B A*STAR | 20.20 ± 3.33 |
Babel 83B Alibaba-DAMO | 19.42 ± 6.02 |
Tulu 3 8B AI2 | 18.87 ± 0.59 |
Aya Expanse 32B CohereLabs | 16.82 ± 1.67 |
Gemma 2 9B | 16.18 ± 3.65 |
Sailor2 8B SAIL | 15.82 ± 2.50 |
Llama 3.1 8B Meta | 14.92 ± 1.18 |
Command R 08-2024 32B CohereLabs | 13.44 ± 1.87 |
Command R+ 08-2024 104B CohereLabs | 13.26 ± 2.31 |
Ministral 2410 8B Mistral AI | 13.22 ± 2.97 |
Babel 9B Alibaba-DAMO | 12.14 ± 2.23 |
Qwen 2.5 7B Alibaba | 12.01 ± 1.50 |
SeaLLMs V3 7B Alibaba-DAMO | 9.97 ± 1.81 |
phi-4 14B Microsoft | 9.69 ± 1.63 |
Sailor2 20B SAIL | 8.56 ± 0.30 |
Olmo 2 0325 32B AI2 | 7.17 ± 1.41 |
Llama 3 8B Meta | 5.91 ± 0.45 |
Mistral Small 3.1 2503 24B Mistral AI | 4.57 ± 2.19 |
Command R7B 12-2024 7B CohereLabs | 3.87 ± 1.49 |
Aya Expanse 8B CohereLabs | 2.93 ± 0.24 |
Olmo 2 1124 13B AI2 | 2.40 ± 0.73 |
Olmo 2 1124 7B AI2 | 2.28 ± 0.35 |
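
The page reports each score as a mean over 8 runs with a 95% CI but does not say how the interval is computed. As a minimal sketch, assuming a Student-t interval on the mean of the per-run scores, the "mean ± half-width" format could be produced like this. The function name and the per-run scores are hypothetical and are not taken from the leaderboard.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 7 degrees of freedom (8 runs).
T_CRIT_DF7 = 2.365


def mean_and_ci95(run_scores):
    """Return (mean, half_width) of a t-based 95% interval for exactly 8 runs."""
    if len(run_scores) != 8:
        raise ValueError("this sketch hard-codes the critical value for 8 runs")
    mean = statistics.fmean(run_scores)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(run_scores) / math.sqrt(len(run_scores))
    return mean, T_CRIT_DF7 * sem


# Hypothetical per-run scores, for illustration only.
runs = [57.1, 58.3, 57.9, 57.4, 58.0, 57.6, 58.2, 57.7]
mean, half_width = mean_and_ci95(runs)
print(f"{mean:.2f} ± {half_width:.2f}")
```

If the leaderboard uses a different method (a bootstrap, for instance), the half-widths would differ, so treat this only as a guide to reading the ± notation.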
Burmese Competencies
Average of 8 runs, shown with 95% confidence intervals (CI).
Model Size: ≤200B
Open instruct models only
Model | MY | Instruction Following | Multi-Turn Chat | NLG | NLR | NLU | Safety | Knowledge |
---|---|---|---|---|---|---|---|---|
![]() ![]() Gemma 3 27B | 57.78 ± 0.43 | 59.40 ± 1.65 | 34.27 ± 2.08 | 40.81 ± 0.13 | 73.83 ± 0.15 | 70.97 ± 0.26 | 67.09 ± 0.18 | 58.06 ± 0.44 |
![]() ![]() SEA-LION v4 27B AISG | 57.18 ± 0.42 | 58.57 ± 0.93 | 32.60 ± 2.05 | 39.87 ± 1.07 | 73.65 ± 0.26 | 70.91 ± 0.46 | 66.91 ± 0.70 | 57.75 ± 1.21 |
![]() ![]() Llama 4 Scout 109B MoE Meta | 54.76 ± 0.23 | 61.43 ± 1.45 | 12.82 ± 1.40 | 44.39 ± 0.22 | 69.78 ± 0.34 | 62.00 ± 2.60 | 72.69 ± 0.30 | 60.22 ± 0.27 |
![]() ![]() Gemma 3 12B | 52.82 ± 0.22 | 53.69 ± 1.17 | 25.22 ± 1.07 | 37.47 ± 0.51 | 61.48 ± 0.30 | 70.73 ± 0.19 | 70.13 ± 0.09 | 51.03 ± 0.39 |
![]() ![]() Qwen 3 32B Alibaba | 43.03 ± 0.63 | 62.38 ± 1.90 | 13.31 ± 0.59 | 25.10 ± 4.22 | 74.28 ± 0.90 | 71.03 ± 0.55 | 0.00 ± 0.00 | 55.13 ± 1.15 |
![]() ![]() Tulu 3 70B AI2 | 40.52 ± 0.30 | 51.55 ± 1.29 | 7.54 ± 0.50 | 36.47 ± 0.19 | 70.57 ± 0.70 | 65.56 ± 1.65 | 0.00 ± 0.00 | 51.94 ± 1.50 |
![]() ![]() SEA-LION v3 (Llama) 70B AISG | 40.16 ± 2.57 | 67.98 ± 1.54 | 11.21 ± 0.45 | 34.19 ± 1.92 | 70.26 ± 3.25 | 48.02 ± 9.28 | 0.00 ± 0.00 | 49.47 ± 10.51 |
![]() ![]() Gemma 2 27B | 38.95 ± 6.64 | 45.24 ± 1.50 | 5.66 ± 0.52 | 27.78 ± 0.49 | 58.91 ± 6.31 | 47.95 ± 19.25 | 47.41 ± 13.98 | 39.69 ± 8.80 |
![]() ![]() Qwen 3 14B Alibaba | 35.03 ± 0.51 | 44.17 ± 1.49 | 8.78 ± 0.92 | 28.36 ± 2.09 | 55.60 ± 0.54 | 64.94 ± 0.55 | 0.00 ± 0.00 | 43.34 ± 1.08 |
![]() ![]() Qwen 2.5 72B Alibaba | 33.37 ± 0.40 | 47.86 ± 2.25 | 8.89 ± 0.89 | 25.92 ± 0.60 | 47.19 ± 0.55 | 60.74 ± 0.61 | 0.00 ± 0.00 | 43.00 ± 0.80 |
![]() ![]() Qwen 2.5 32B Alibaba | 32.15 ± 0.32 | 48.21 ± 1.36 | 6.47 ± 0.64 | 22.94 ± 0.11 | 46.97 ± 1.21 | 56.00 ± 0.43 | 0.00 ± 0.00 | 44.47 ± 1.14 |
![]() ![]() Mistral Large 2411 123B Mistral AI | 31.23 ± 2.11 | 47.74 ± 2.70 | 5.77 ± 0.71 | 29.92 ± 2.45 | 47.10 ± 7.85 | 56.63 ± 4.47 | 0.03 ± 0.06 | 31.44 ± 7.92 |
![]() ![]() Qwen 3 8B Alibaba | 30.49 ± 0.76 | 37.02 ± 1.39 | 9.05 ± 0.68 | 26.37 ± 0.13 | 50.47 ± 0.51 | 55.50 ± 1.13 | 0.00 ± 0.00 | 35.03 ± 2.83 |
![]() ![]() SEA-LION v3 (Llama) 8B AISG | 27.35 ± 1.82 | 47.14 ± 2.23 | 4.47 ± 0.73 | 7.38 ± 1.10 | 48.79 ± 6.61 | 50.43 ± 6.37 | 0.00 ± 0.00 | 33.22 ± 2.80 |
![]() ![]() ERNIE 4.5 21B MoE Baidu | 27.33 ± 1.13 | 45.24 ± 1.45 | 12.88 ± 0.90 | 37.42 ± 0.64 | 15.47 ± 2.17 | 8.23 ± 2.87 | 34.72 ± 6.00 | 37.38 ± 2.03 |
![]() ![]() Qwen 3 30B MoE Alibaba | 25.88 ± 0.27 | 50.36 ± 1.29 | 13.47 ± 0.38 | 17.50 ± 0.27 | 34.39 ± 0.94 | 65.42 ± 0.73 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Command A 03-2025 111B CohereLabs | 25.41 ± 2.48 | 46.43 ± 2.36 | 6.57 ± 0.59 | 19.15 ± 2.42 | 35.97 ± 9.17 | 46.18 ± 6.78 | 0.72 ± 1.41 | 22.84 ± 8.27 |
![]() ![]() Llama 3.1 70B Meta | 24.14 ± 3.01 | 45.00 ± 2.84 | 5.87 ± 1.09 | 33.58 ± 1.50 | 37.30 ± 8.74 | 26.89 ± 6.55 | 0.00 ± 0.00 | 20.38 ± 8.53 |
![]() ![]() Llama 3.3 70B Meta | 23.82 ± 0.59 | 56.55 ± 1.93 | 7.06 ± 0.60 | 34.59 ± 0.35 | 30.81 ± 0.21 | 19.44 ± 3.29 | 0.00 ± 0.00 | 18.28 ± 2.37 |
![]() ![]() SEA-LION v3 (Gemma 2) 9B AISG | 21.69 ± 3.72 | 47.62 ± 1.27 | 8.78 ± 0.90 | 28.81 ± 0.48 | 33.97 ± 7.68 | 18.21 ± 12.92 | 12.44 ± 12.56 | 2.03 ± 1.55 |
![]() ![]() Llama 3 70B Meta | 21.48 ± 2.59 | 10.36 ± 1.29 | 5.44 ± 0.83 | 24.96 ± 0.37 | 61.85 ± 0.40 | 24.31 ± 11.58 | 0.00 ± 0.00 | 23.47 ± 5.43 |
![]() ![]() Qwen 2.5 14B Alibaba | 21.05 ± 0.48 | 41.43 ± 2.09 | 4.96 ± 0.58 | 19.06 ± 0.20 | 42.07 ± 1.36 | 33.61 ± 2.42 | 1.13 ± 1.04 | 5.09 ± 0.84 |
![]() ![]() MERaLiON 2 10B A*STAR | 20.20 ± 3.33 | 34.88 ± 1.22 | 3.39 ± 0.34 | 20.55 ± 0.68 | 23.72 ± 1.27 | 24.28 ± 9.35 | 25.72 ± 13.13 | 8.84 ± 5.12 |
![]() ![]() Babel 83B Alibaba-DAMO | 19.42 ± 6.02 | 22.62 ± 1.61 | 1.19 ± 0.55 | 6.63 ± 1.37 | 33.70 ± 15.15 | 42.67 ± 13.59 | 0.13 ± 0.24 | 29.03 ± 13.98 |
![]() ![]() Tulu 3 8B AI2 | 18.87 ± 0.59 | 36.67 ± 1.69 | 1.24 ± 0.52 | 21.32 ± 0.28 | 23.56 ± 3.20 | 30.28 ± 2.22 | 0.00 ± 0.00 | 19.03 ± 1.95 |
![]() ![]() Aya Expanse 32B CohereLabs | 16.82 ± 1.67 | 27.86 ± 2.28 | 1.40 ± 0.47 | 10.34 ± 0.65 | 40.15 ± 3.09 | 12.43 ± 6.07 | 7.53 ± 8.40 | 18.00 ± 3.96 |
![]() ![]() Gemma 2 9B | 16.18 ± 3.65 | 32.26 ± 1.67 | 3.29 ± 0.57 | 20.17 ± 0.72 | 27.69 ± 9.05 | 13.29 ± 8.07 | 9.06 ± 9.30 | 7.47 ± 3.43 |
![]() ![]() Sailor2 8B SAIL | 15.82 ± 2.50 | 24.52 ± 1.61 | 9.64 ± 0.86 | 24.41 ± 0.61 | 34.24 ± 3.46 | 15.11 ± 12.05 | 1.06 ± 1.37 | 1.78 ± 1.50 |
![]() ![]() Llama 3.1 8B Meta | 14.92 ± 1.18 | 25.48 ± 2.70 | 2.42 ± 0.66 | 14.03 ± 0.81 | 30.37 ± 5.16 | 16.35 ± 1.20 | 0.00 ± 0.00 | 15.78 ± 2.76 |
![]() ![]() Command R 08-2024 32B CohereLabs | 13.44 ± 1.87 | 16.31 ± 1.39 | 0.27 ± 0.32 | 6.68 ± 0.79 | 33.85 ± 8.45 | 13.77 ± 3.71 | 10.63 ± 11.61 | 12.56 ± 5.57 |
![]() ![]() Command R+ 08-2024 104B CohereLabs | 13.26 ± 2.31 | 21.90 ± 2.34 | 0.75 ± 0.38 | 14.35 ± 1.64 | 19.08 ± 9.54 | 19.75 ± 6.27 | 6.53 ± 8.38 | 10.47 ± 6.04 |
![]() ![]() Ministral 2410 8B Mistral AI | 13.22 ± 2.97 | 9.64 ± 2.40 | 0.92 ± 0.49 | 14.13 ± 1.27 | 24.47 ± 8.89 | 17.41 ± 8.44 | 23.69 ± 12.84 | 2.28 ± 2.63 |
![]() ![]() Babel 9B Alibaba-DAMO | 12.14 ± 2.23 | 22.26 ± 2.93 | 2.86 ± 0.97 | 10.82 ± 0.48 | 15.05 ± 6.20 | 29.18 ± 7.29 | 4.44 ± 5.29 | 0.38 ± 0.36 |
![]() ![]() Qwen 2.5 7B Alibaba | 12.01 ± 1.50 | 33.69 ± 1.65 | 2.10 ± 0.52 | 7.99 ± 0.21 | 22.31 ± 2.13 | 1.61 ± 0.84 | 16.09 ± 8.88 | 0.25 ± 0.23 |
![]() ![]() SeaLLMs V3 7B Alibaba-DAMO | 9.97 ± 1.81 | 25.60 ± 2.04 | 3.61 ± 0.90 | 11.79 ± 1.72 | 7.89 ± 6.73 | 12.26 ± 6.13 | 5.88 ± 8.20 | 2.75 ± 2.42 |
![]() ![]() phi-4 14B Microsoft | 9.69 ± 1.63 | 20.83 ± 2.22 | 1.78 ± 0.30 | 9.21 ± 1.00 | 5.31 ± 4.74 | 30.27 ± 6.06 | 0.00 ± 0.00 | 0.41 ± 0.53 |
![]() ![]() Sailor2 20B SAIL | 8.56 ± 0.30 | 23.45 ± 1.83 | 9.21 ± 0.75 | 26.41 ± 0.50 | 0.82 ± 0.57 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Olmo 2 0325 32B AI2 | 7.17 ± 1.41 | 20.71 ± 0.85 | 0.11 ± 0.14 | 7.43 ± 0.27 | 6.91 ± 5.26 | 13.81 ± 3.51 | 0.00 ± 0.00 | 1.22 ± 1.47 |
![]() ![]() Llama 3 8B Meta | 5.91 ± 0.45 | 8.45 ± 0.96 | 1.89 ± 0.66 | 11.46 ± 1.18 | 15.72 ± 0.77 | 1.51 ± 1.66 | 0.00 ± 0.00 | 2.34 ± 2.10 |
![]() ![]() Mistral Small 3.1 2503 24B Mistral AI | 4.57 ± 2.19 | 9.52 ± 1.58 | 0.38 ± 0.25 | 3.49 ± 1.28 | 7.58 ± 4.99 | 8.71 ± 7.24 | 0.00 ± 0.00 | 2.28 ± 2.90 |
![]() ![]() Command R7B 12-2024 7B CohereLabs | 3.87 ± 1.49 | 12.50 ± 2.94 | 0.05 ± 0.11 | 6.12 ± 1.31 | 5.59 ± 6.14 | 0.58 ± 0.90 | 0.31 ± 0.36 | 1.94 ± 3.66 |
![]() ![]() Aya Expanse 8B CohereLabs | 2.93 ± 0.24 | 18.69 ± 1.58 | 0.00 ± 0.00 | 1.42 ± 0.26 | 0.34 ± 0.25 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.06 ± 0.08 |
![]() ![]() Olmo 2 1124 13B AI2 | 2.40 ± 0.73 | 11.67 ± 2.28 | 0.05 ± 0.11 | 0.51 ± 0.07 | 3.86 ± 4.81 | 0.46 ± 0.45 | 0.00 ± 0.00 | 0.21 ± 0.34 |
![]() ![]() Olmo 2 1124 7B AI2 | 2.28 ± 0.35 | 11.43 ± 1.58 | 0.05 ± 0.11 | 3.20 ± 0.60 | 1.22 ± 0.81 | 0.01 ± 0.02 | 0.00 ± 0.00 | 0.07 ± 0.14 |
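
The MY column appears to be the unweighted mean of the seven competency columns. The page does not state this rule, so it is an assumption, but the rows in the table above agree with it to within rounding. The sketch below recomputes MY for three rows; `overall_score` is an illustrative name.

```python
from statistics import fmean

# name -> (reported MY, [Instruction Following, Multi-Turn Chat, NLG, NLR,
#                        NLU, Safety, Knowledge])
# Values copied from the Burmese Competencies table.
ROWS = {
    "Gemma 3 27B": (57.78, [59.40, 34.27, 40.81, 73.83, 70.97, 67.09, 58.06]),
    "Qwen 3 32B": (43.03, [62.38, 13.31, 25.10, 74.28, 71.03, 0.00, 55.13]),
    "Sailor2 20B": (8.56, [23.45, 9.21, 26.41, 0.82, 0.00, 0.00, 0.00]),
}


def overall_score(competencies):
    """Unweighted mean of the competency scores (assumed aggregation rule)."""
    return fmean(competencies)


for name, (reported, competencies) in ROWS.items():
    recomputed = overall_score(competencies)
    # The table values are rounded to two decimals, so compare at that precision.
    print(f"{name}: reported {reported:.2f}, recomputed {recomputed:.2f}")
```

Because every competency carries equal weight under this reading, a 0.00 in a single column (Safety for Qwen 3 32B, for example) can cost a model up to one seventh of the possible overall score.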
Burmese Tasks
Average of 8 runs, shown with 95% confidence intervals (CI).
Model Size: ≤200B
Open instruct models only
Model | MY | Instruction Following | SEA-IFEval |
---|---|---|---|
![]() ![]() Gemma 3 27B | 57.78 ± 0.43 | 59.40 ± 1.65 | 59.40 ± 8.46 |
![]() ![]() SEA-LION v4 27B AISG | 57.18 ± 0.42 | 58.57 ± 0.93 | 58.57 ± 8.39 |
![]() ![]() Llama 4 Scout 109B MoE Meta | 54.76 ± 0.23 | 61.43 ± 1.45 | 61.43 ± 8.33 |
![]() ![]() Gemma 3 12B | 52.82 ± 0.22 | 53.69 ± 1.17 | 53.69 ± 8.58 |
![]() ![]() Qwen 3 32B Alibaba | 43.03 ± 0.63 | 62.38 ± 1.90 | 62.38 ± 7.87 |
![]() ![]() Tulu 3 70B AI2 | 40.52 ± 0.30 | 51.55 ± 1.29 | 51.55 ± 8.02 |
![]() ![]() SEA-LION v3 (Llama) 70B AISG | 40.16 ± 2.57 | 67.98 ± 1.54 | 67.98 ± 7.20 |
![]() ![]() Gemma 2 27B | 38.95 ± 6.64 | 45.24 ± 1.50 | 45.24 ± 7.95 |
![]() ![]() Qwen 3 14B Alibaba | 35.03 ± 0.51 | 44.17 ± 1.49 | 44.17 ± 8.37 |
![]() ![]() Qwen 2.5 72B Alibaba | 33.37 ± 0.40 | 47.86 ± 2.25 | 47.86 ± 8.02 |
![]() ![]() Qwen 2.5 32B Alibaba | 32.15 ± 0.32 | 48.21 ± 1.36 | 48.21 ± 8.25 |
![]() ![]() Mistral Large 2411 123B Mistral AI | 31.23 ± 2.11 | 47.74 ± 2.70 | 47.74 ± 7.11 |
![]() ![]() Qwen 3 8B Alibaba | 30.49 ± 0.76 | 37.02 ± 1.39 | 37.02 ± 7.18 |
![]() ![]() SEA-LION v3 (Llama) 8B AISG | 27.35 ± 1.82 | 47.14 ± 2.23 | 47.14 ± 7.77 |
![]() ![]() ERNIE 4.5 21B MoE Baidu | 27.33 ± 1.13 | 45.24 ± 1.45 | 45.24 ± 8.19 |
![]() ![]() Qwen 3 30B MoE Alibaba | 25.88 ± 0.27 | 50.36 ± 1.29 | 50.36 ± 8.67 |
![]() ![]() Command A 03-2025 111B CohereLabs | 25.41 ± 2.48 | 46.43 ± 2.36 | 46.43 ± 7.19 |
![]() ![]() Llama 3.1 70B Meta | 24.14 ± 3.01 | 45.00 ± 2.84 | 45.00 ± 7.75 |
![]() ![]() Llama 3.3 70B Meta | 23.82 ± 0.59 | 56.55 ± 1.93 | 56.55 ± 7.86 |
![]() ![]() SEA-LION v3 (Gemma 2) 9B AISG | 21.69 ± 3.72 | 47.62 ± 1.27 | 47.62 ± 7.50 |
![]() ![]() Llama 3 70B Meta | 21.48 ± 2.59 | 10.36 ± 1.29 | 10.36 ± 5.07 |
![]() ![]() Qwen 2.5 14B Alibaba | 21.05 ± 0.48 | 41.43 ± 2.09 | 41.43 ± 7.50 |
![]() ![]() MERaLiON 2 10B A*STAR | 20.20 ± 3.33 | 34.88 ± 1.22 | 34.88 ± 7.02 |
![]() ![]() Babel 83B Alibaba-DAMO | 19.42 ± 6.02 | 22.62 ± 1.61 | 22.62 ± 5.17 |
![]() ![]() Tulu 3 8B AI2 | 18.87 ± 0.59 | 36.67 ± 1.69 | 36.67 ± 7.72 |
![]() ![]() Aya Expanse 32B CohereLabs | 16.82 ± 1.67 | 27.86 ± 2.28 | 27.86 ± 6.84 |
![]() ![]() Gemma 2 9B | 16.18 ± 3.65 | 32.26 ± 1.67 | 32.26 ± 6.62 |
![]() ![]() Sailor2 8B SAIL | 15.82 ± 2.50 | 24.52 ± 1.61 | 24.52 ± 7.03 |
![]() ![]() Llama 3.1 8B Meta | 14.92 ± 1.18 | 25.48 ± 2.70 | 25.48 ± 6.11 |
![]() ![]() Command R 08-2024 32B CohereLabs | 13.44 ± 1.87 | 16.31 ± 1.39 | 16.31 ± 4.79 |
![]() ![]() Command R+ 08-2024 104B CohereLabs | 13.26 ± 2.31 | 21.90 ± 2.34 | 21.90 ± 5.35 |
![]() ![]() Ministral 2410 8B Mistral AI | 13.22 ± 2.97 | 9.64 ± 2.40 | 9.64 ± 3.54 |
![]() ![]() Babel 9B Alibaba-DAMO | 12.14 ± 2.23 | 22.26 ± 2.93 | 22.26 ± 5.41 |
![]() ![]() Qwen 2.5 7B Alibaba | 12.01 ± 1.50 | 33.69 ± 1.65 | 33.69 ± 7.62 |
![]() ![]() SeaLLMs V3 7B Alibaba-DAMO | 9.97 ± 1.81 | 25.60 ± 2.04 | 25.60 ± 6.66 |
![]() ![]() phi-4 14B Microsoft | 9.69 ± 1.63 | 20.83 ± 2.22 | 20.83 ± 5.53 |
![]() ![]() Sailor2 20B SAIL | 8.56 ± 0.30 | 23.45 ± 1.83 | 23.45 ± 6.82 |
![]() ![]() Olmo 2 0325 32B AI2 | 7.17 ± 1.41 | 20.71 ± 0.85 | 20.71 ± 5.84 |
![]() ![]() Llama 3 8B Meta | 5.91 ± 0.45 | 8.45 ± 0.96 | 8.45 ± 4.36 |
![]() ![]() Mistral Small 3.1 2503 24B Mistral AI | 4.57 ± 2.19 | 9.52 ± 1.58 | 9.52 ± 3.18 |
![]() ![]() Command R7B 12-2024 7B CohereLabs | 3.87 ± 1.49 | 12.50 ± 2.94 | 12.50 ± 3.76 |
![]() ![]() Aya Expanse 8B CohereLabs | 2.93 ± 0.24 | 18.69 ± 1.58 | 18.69 ± 5.88 |
![]() ![]() Olmo 2 1124 13B AI2 | 2.40 ± 0.73 | 11.67 ± 2.28 | 11.67 ± 4.55 |
![]() ![]() Olmo 2 1124 7B AI2 | 2.28 ± 0.35 | 11.43 ± 1.58 | 11.43 ± 4.60 |

Model | MY | Multi-Turn Chat | SEA-MT-Bench |
---|---|---|---|
![]() ![]() Gemma 3 27B | 57.78 ± 0.43 | 34.27 ± 2.08 | 34.27 ± 5.14 |
![]() ![]() SEA-LION v4 27B AISG | 57.18 ± 0.42 | 32.60 ± 2.05 | 32.60 ± 5.03 |
![]() ![]() Llama 4 Scout 109B MoE Meta | 54.76 ± 0.23 | 12.82 ± 1.40 | 12.82 ± 3.30 |
![]() ![]() Gemma 3 12B | 52.82 ± 0.22 | 25.22 ± 1.07 | 25.22 ± 4.88 |
![]() ![]() Qwen 3 32B Alibaba | 43.03 ± 0.63 | 13.31 ± 0.59 | 13.31 ± 4.19 |
![]() ![]() Tulu 3 70B AI2 | 40.52 ± 0.30 | 7.54 ± 0.50 | 7.54 ± 3.12 |
![]() ![]() SEA-LION v3 (Llama) 70B AISG | 40.16 ± 2.57 | 11.21 ± 0.45 | 11.21 ± 3.42 |
![]() ![]() Gemma 2 27B | 38.95 ± 6.64 | 5.66 ± 0.52 | 5.66 ± 2.80 |
![]() ![]() Qwen 3 14B Alibaba | 35.03 ± 0.51 | 8.78 ± 0.92 | 8.78 ± 3.65 |
![]() ![]() Qwen 2.5 72B Alibaba | 33.37 ± 0.40 | 8.89 ± 0.89 | 8.89 ± 3.43 |
![]() ![]() Qwen 2.5 32B Alibaba | 32.15 ± 0.32 | 6.47 ± 0.64 | 6.47 ± 2.99 |
![]() ![]() Mistral Large 2411 123B Mistral AI | 31.23 ± 2.11 | 5.77 ± 0.71 | 5.77 ± 2.37 |
![]() ![]() Qwen 3 8B Alibaba | 30.49 ± 0.76 | 9.05 ± 0.68 | 9.05 ± 3.64 |
![]() ![]() SEA-LION v3 (Llama) 8B AISG | 27.35 ± 1.82 | 4.47 ± 0.73 | 4.47 ± 2.42 |
![]() ![]() ERNIE 4.5 21B MoE Baidu | 27.33 ± 1.13 | 12.88 ± 0.90 | 12.88 ± 3.71 |
![]() ![]() Qwen 3 30B MoE Alibaba | 25.88 ± 0.27 | 13.47 ± 0.38 | 13.47 ± 4.31 |
![]() ![]() Command A 03-2025 111B CohereLabs | 25.41 ± 2.48 | 6.57 ± 0.59 | 6.57 ± 2.92 |
![]() ![]() Llama 3.1 70B Meta | 24.14 ± 3.01 | 5.87 ± 1.09 | 5.87 ± 2.44 |
![]() ![]() Llama 3.3 70B Meta | 23.82 ± 0.59 | 7.06 ± 0.60 | 7.06 ± 2.92 |
![]() ![]() SEA-LION v3 (Gemma 2) 9B AISG | 21.69 ± 3.72 | 8.78 ± 0.90 | 8.78 ± 3.10 |
![]() ![]() Llama 3 70B Meta | 21.48 ± 2.59 | 5.44 ± 0.83 | 5.44 ± 2.42 |
![]() ![]() Qwen 2.5 14B Alibaba | 21.05 ± 0.48 | 4.96 ± 0.58 | 4.96 ± 2.63 |
![]() ![]() MERaLiON 2 10B A*STAR | 20.20 ± 3.33 | 3.39 ± 0.34 | 3.39 ± 2.24 |
![]() ![]() Babel 83B Alibaba-DAMO | 19.42 ± 6.02 | 1.19 ± 0.55 | 1.19 ± 0.83 |
![]() ![]() Tulu 3 8B AI2 | 18.87 ± 0.59 | 1.24 ± 0.52 | 1.24 ± 0.98 |
![]() ![]() Aya Expanse 32B CohereLabs | 16.82 ± 1.67 | 1.40 ± 0.47 | 1.40 ± 1.06 |
![]() ![]() Gemma 2 9B | 16.18 ± 3.65 | 3.29 ± 0.57 | 3.29 ± 1.76 |
![]() ![]() Sailor2 8B SAIL | 15.82 ± 2.50 | 9.64 ± 0.86 | 9.64 ± 3.41 |
![]() ![]() Llama 3.1 8B Meta | 14.92 ± 1.18 | 2.42 ± 0.66 | 2.42 ± 1.60 |
![]() ![]() Command R 08-2024 32B CohereLabs | 13.44 ± 1.87 | 0.27 ± 0.32 | 0.27 ± 0.28 |
![]() ![]() Command R+ 08-2024 104B CohereLabs | 13.26 ± 2.31 | 0.75 ± 0.38 | 0.75 ± 0.79 |
![]() ![]() Ministral 2410 8B Mistral AI | 13.22 ± 2.97 | 0.92 ± 0.49 | 0.92 ± 0.89 |
![]() ![]() Babel 9B Alibaba-DAMO | 12.14 ± 2.23 | 2.86 ± 0.97 | 2.86 ± 1.49 |
![]() ![]() Qwen 2.5 7B Alibaba | 12.01 ± 1.50 | 2.10 ± 0.52 | 2.10 ± 1.41 |
![]() ![]() SeaLLMs V3 7B Alibaba-DAMO | 9.97 ± 1.81 | 3.61 ± 0.90 | 3.61 ± 2.12 |
![]() ![]() phi-4 14B Microsoft | 9.69 ± 1.63 | 1.78 ± 0.30 | 1.78 ± 1.19 |
![]() ![]() Sailor2 20B SAIL | 8.56 ± 0.30 | 9.21 ± 0.75 | 9.21 ± 3.28 |
![]() ![]() Olmo 2 0325 32B AI2 | 7.17 ± 1.41 | 0.11 ± 0.14 | 0.22 ± 0.42 |
![]() ![]() Llama 3 8B Meta | 5.91 ± 0.45 | 1.89 ± 0.66 | 1.89 ± 1.20 |
![]() ![]() Mistral Small 3.1 2503 24B Mistral AI | 4.57 ± 2.19 | 0.38 ± 0.25 | 0.38 ± 0.34 |
![]() ![]() Command R7B 12-2024 7B CohereLabs | 3.87 ± 1.49 | 0.05 ± 0.11 | 0.05 ± 0.11 |
![]() ![]() Aya Expanse 8B CohereLabs | 2.93 ± 0.24 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Olmo 2 1124 13B AI2 | 2.40 ± 0.73 | 0.05 ± 0.11 | 0.43 ± 0.84 |
![]() ![]() Olmo 2 1124 7B AI2 | 2.28 ± 0.35 | 0.05 ± 0.11 | 0.11 ± 0.21 |

Model | MY | NLG | Summarization | Translations |
---|---|---|---|---|
![]() ![]() Gemma 3 27B | 57.78 ± 0.43 | 40.81 ± 0.13 | 2.20 ± 0.66 | 79.41 ± 0.11 |
![]() ![]() SEA-LION v4 27B AISG | 57.18 ± 0.42 | 39.87 ± 1.07 | 1.86 ± 0.61 | 77.89 ± 2.21 |
![]() ![]() Llama 4 Scout 109B MoE Meta | 54.76 ± 0.23 | 44.39 ± 0.22 | 7.60 ± 1.37 | 81.17 ± 0.10 |
![]() ![]() Gemma 3 12B | 52.82 ± 0.22 | 37.47 ± 0.51 | 4.08 ± 0.92 | 70.87 ± 1.02 |
![]() ![]() Qwen 3 32B Alibaba | 43.03 ± 0.63 | 25.10 ± 4.22 | 5.37 ± 1.05 | 44.83 ± 8.52 |
![]() ![]() Tulu 3 70B AI2 | 40.52 ± 0.30 | 36.47 ± 0.19 | 6.33 ± 0.99 | 66.61 ± 0.29 |
![]() ![]() SEA-LION v3 (Llama) 70B AISG | 40.16 ± 2.57 | 34.19 ± 1.92 | 5.16 ± 1.00 | 63.22 ± 3.85 |
![]() ![]() Gemma 2 27B | 38.95 ± 6.64 | 27.78 ± 0.49 | 5.23 ± 0.90 | 50.33 ± 0.75 |
![]() ![]() Qwen 3 14B Alibaba | 35.03 ± 0.51 | 28.36 ± 2.09 | 4.97 ± 1.01 | 51.76 ± 4.07 |
![]() ![]() Qwen 2.5 72B Alibaba | 33.37 ± 0.40 | 25.92 ± 0.60 | 5.50 ± 0.87 | 46.34 ± 1.19 |
![]() ![]() Qwen 2.5 32B Alibaba | 32.15 ± 0.32 | 22.94 ± 0.11 | 6.61 ± 0.99 | 39.28 ± 0.20 |
![]() ![]() Mistral Large 2411 123B Mistral AI | 31.23 ± 2.11 | 29.92 ± 2.45 | 4.69 ± 0.75 | 55.14 ± 4.03 |
![]() ![]() Qwen 3 8B Alibaba | 30.49 ± 0.76 | 26.37 ± 0.13 | 4.84 ± 0.93 | 47.91 ± 0.20 |
![]() ![]() SEA-LION v3 (Llama) 8B AISG | 27.35 ± 1.82 | 7.38 ± 1.10 | 0.86 ± 0.29 | 13.90 ± 2.19 |
![]() ![]() ERNIE 4.5 21B MoE Baidu | 27.33 ± 1.13 | 37.42 ± 0.64 | 2.68 ± 0.58 | 72.16 ± 1.10 |
![]() ![]() Qwen 3 30B MoE Alibaba | 25.88 ± 0.27 | 17.50 ± 0.27 | 0.22 ± 0.18 | 34.78 ± 0.48 |
![]() ![]() Command A 03-2025 111B CohereLabs | 25.41 ± 2.48 | 19.15 ± 2.42 | 1.14 ± 0.35 | 37.16 ± 4.67 |
![]() ![]() Llama 3.1 70B Meta | 24.14 ± 3.01 | 33.58 ± 1.50 | 8.92 ± 1.19 | 58.25 ± 2.95 |
![]() ![]() Llama 3.3 70B Meta | 23.82 ± 0.59 | 34.59 ± 0.35 | 9.15 ± 1.16 | 60.02 ± 0.73 |
![]() ![]() SEA-LION v3 (Gemma 2) 9B AISG | 21.69 ± 3.72 | 28.81 ± 0.48 | 4.55 ± 0.70 | 53.06 ± 0.64 |
![]() ![]() Llama 3 70B Meta | 21.48 ± 2.59 | 24.96 ± 0.37 | 0.00 ± 0.00 | 49.93 ± 0.74 |
![]() ![]() Qwen 2.5 14B Alibaba | 21.05 ± 0.48 | 19.06 ± 0.20 | 5.66 ± 0.88 | 32.47 ± 0.28 |
![]() ![]() MERaLiON 2 10B A*STAR | 20.20 ± 3.33 | 20.55 ± 0.68 | 6.12 ± 0.79 | 34.97 ± 1.24 |
![]() ![]() Babel 83B Alibaba-DAMO | 19.42 ± 6.02 | 6.63 ± 1.37 | 3.59 ± 0.62 | 9.67 ± 2.27 |
![]() ![]() Tulu 3 8B AI2 | 18.87 ± 0.59 | 21.32 ± 0.28 | 5.06 ± 0.86 | 37.58 ± 0.34 |
![]() ![]() Aya Expanse 32B CohereLabs | 16.82 ± 1.67 | 10.34 ± 0.65 | 0.00 ± 0.00 | 20.68 ± 1.30 |
![]() ![]() Gemma 2 9B | 16.18 ± 3.65 | 20.17 ± 0.72 | 4.55 ± 0.65 | 35.79 ± 1.08 |
![]() ![]() Sailor2 8B SAIL | 15.82 ± 2.50 | 24.41 ± 0.61 | 0.00 ± 0.00 | 48.82 ± 1.22 |
![]() ![]() Llama 3.1 8B Meta | 14.92 ± 1.18 | 14.03 ± 0.81 | 7.35 ± 1.05 | 20.71 ± 1.52 |
![]() ![]() Command R 08-2024 32B CohereLabs | 13.44 ± 1.87 | 6.68 ± 0.79 | 3.61 ± 0.63 | 9.74 ± 1.47 |
![]() ![]() Command R+ 08-2024 104B CohereLabs | 13.26 ± 2.31 | 14.35 ± 1.64 | 2.80 ± 0.50 | 25.90 ± 3.10 |
![]() ![]() Ministral 2410 8B Mistral AI | 13.22 ± 2.97 | 14.13 ± 1.27 | 0.84 ± 0.22 | 27.42 ± 2.40 |
![]() ![]() Babel 9B Alibaba-DAMO | 12.14 ± 2.23 | 10.82 ± 0.48 | 4.06 ± 0.63 | 17.58 ± 1.04 |
![]() ![]() Qwen 2.5 7B Alibaba | 12.01 ± 1.50 | 7.99 ± 0.21 | 4.89 ± 0.75 | 11.08 ± 0.25 |
![]() ![]() SeaLLMs V3 7B Alibaba-DAMO | 9.97 ± 1.81 | 11.79 ± 1.72 | 4.81 ± 0.86 | 18.77 ± 2.88 |
![]() ![]() phi-4 14B Microsoft | 9.69 ± 1.63 | 9.21 ± 1.00 | 2.29 ± 0.44 | 16.12 ± 1.80 |
![]() ![]() Sailor2 20B SAIL | 8.56 ± 0.30 | 26.41 ± 0.50 | 0.00 ± 0.00 | 52.81 ± 0.99 |
![]() ![]() Olmo 2 0325 32B AI2 | 7.17 ± 1.41 | 7.43 ± 0.27 | 0.00 ± 0.00 | 14.85 ± 0.53 |
![]() ![]() Llama 3 8B Meta | 5.91 ± 0.45 | 11.46 ± 1.18 | 0.00 ± 0.00 | 22.92 ± 2.36 |
![]() ![]() Mistral Small 3.1 2503 24B Mistral AI | 4.57 ± 2.19 | 3.49 ± 1.28 | 0.63 ± 0.14 | 6.34 ± 2.41 |
![]() ![]() Command R7B 12-2024 7B CohereLabs | 3.87 ± 1.49 | 6.12 ± 1.31 | 2.57 ± 0.43 | 9.67 ± 2.49 |
![]() ![]() Aya Expanse 8B CohereLabs | 2.93 ± 0.24 | 1.42 ± 0.26 | 0.00 ± 0.00 | 2.83 ± 0.52 |
![]() ![]() Olmo 2 1124 13B AI2 | 2.40 ± 0.73 | 0.51 ± 0.07 | 0.00 ± 0.00 | 1.03 ± 0.14 |
![]() ![]() Olmo 2 1124 7B AI2 | 2.28 ± 0.35 | 3.20 ± 0.60 | 0.00 ± 0.00 | 6.41 ± 1.20 |

Model | MY | NLR | Causal Reasoning | Natural Language Inference |
---|---|---|---|---|
![]() ![]() Gemma 3 27B | 57.78 ± 0.43 | 73.83 ± 0.15 | 78.50 ± 3.78 | 69.17 ± 3.64 |
![]() ![]() SEA-LION v4 27B AISG | 57.18 ± 0.42 | 73.65 ± 0.26 | 78.72 ± 3.64 | 68.58 ± 3.64 |
![]() ![]() Llama 4 Scout 109B MoE Meta | 54.76 ± 0.23 | 69.78 ± 0.34 | 85.13 ± 3.40 | 54.44 ± 3.89 |
![]() ![]() Gemma 3 12B | 52.82 ± 0.22 | 61.48 ± 0.30 | 72.22 ± 3.95 | 50.75 ± 3.95 |
![]() ![]() Qwen 3 32B Alibaba | 43.03 ± 0.63 | 74.28 ± 0.90 | 80.59 ± 3.66 | 67.96 ± 3.46 |
![]() ![]() Tulu 3 70B AI2 | 40.52 ± 0.30 | 70.57 ± 0.70 | 78.63 ± 3.77 | 62.52 ± 3.58 |
![]() ![]() SEA-LION v3 (Llama) 70B AISG | 40.16 ± 2.57 | 70.26 ± 3.25 | 77.28 ± 3.31 | 63.23 ± 3.26 |
![]() ![]() Gemma 2 27B | 38.95 ± 6.64 | 58.91 ± 6.31 | 62.97 ± 3.62 | 54.85 ± 3.76 |
![]() ![]() Qwen 3 14B Alibaba | 35.03 ± 0.51 | 55.60 ± 0.54 | 65.88 ± 4.26 | 45.33 ± 3.81 |
![]() ![]() Qwen 2.5 72B Alibaba | 33.37 ± 0.40 | 47.19 ± 0.55 | 52.75 ± 4.82 | 41.63 ± 3.83 |
![]() ![]() Qwen 2.5 32B Alibaba | 32.15 ± 0.32 | 46.97 ± 1.21 | 56.66 ± 4.54 | 37.29 ± 3.54 |
![]() ![]() Mistral Large 2411 123B Mistral AI | 31.23 ± 2.11 | 47.10 ± 7.85 | 53.56 ± 3.27 | 40.65 ± 3.68 |
![]() ![]() Qwen 3 8B Alibaba | 30.49 ± 0.76 | 50.47 ± 0.51 | 65.41 ± 4.21 | 35.54 ± 3.78 |
![]() ![]() SEA-LION v3 (Llama) 8B AISG | 27.35 ± 1.82 | 48.79 ± 6.61 | 52.53 ± 3.37 | 45.04 ± 3.39 |
![]() ![]() ERNIE 4.5 21B MoE Baidu | 27.33 ± 1.13 | 15.47 ± 2.17 | 0.00 ± 0.00 | 30.94 ± 3.16 |
![]() ![]() Qwen 3 30B MoE Alibaba | 25.88 ± 0.27 | 34.39 ± 0.94 | 6.16 ± 1.92 | 62.63 ± 3.55 |
![]() ![]() Command A 03-2025 111B CohereLabs | 25.41 ± 2.48 | 35.97 ± 9.17 | 26.47 ± 2.39 | 45.48 ± 3.24 |
![]() ![]() Llama 3.1 70B Meta | 24.14 ± 3.01 | 37.30 ± 8.74 | 17.72 ± 1.46 | 56.88 ± 3.21 |
![]() ![]() Llama 3.3 70B Meta | 23.82 ± 0.59 | 30.81 ± 0.21 | 0.00 ± 0.00 | 61.63 ± 3.69 |
![]() ![]() SEA-LION v3 (Gemma 2) 9B AISG | 21.69 ± 3.72 | 33.97 ± 7.68 | 18.13 ± 1.62 | 49.81 ± 3.80 |
![]() ![]() Llama 3 70B Meta | 21.48 ± 2.59 | 61.85 ± 0.40 | 66.84 ± 4.21 | 56.85 ± 3.57 |
![]() ![]() Qwen 2.5 14B Alibaba | 21.05 ± 0.48 | 42.07 ± 1.36 | 47.84 ± 4.38 | 36.29 ± 3.07 |
![]() ![]() MERaLiON 2 10B A*STAR | 20.20 ± 3.33 | 23.72 ± 1.27 | 5.63 ± 0.96 | 41.81 ± 3.81 |
![]() ![]() Babel 83B Alibaba-DAMO | 19.42 ± 6.02 | 33.70 ± 15.15 | 34.50 ± 1.36 | 32.90 ± 1.21 |
![]() ![]() Tulu 3 8B AI2 | 18.87 ± 0.59 | 23.56 ± 3.20 | 12.91 ± 1.98 | 34.21 ± 3.77 |
![]() ![]() Aya Expanse 32B CohereLabs | 16.82 ± 1.67 | 40.15 ± 3.09 | 48.41 ± 3.58 | 31.90 ± 2.85 |
![]() ![]() Gemma 2 9B | 16.18 ± 3.65 | 27.69 ± 9.05 | 15.91 ± 1.37 | 39.48 ± 3.39 |
![]() ![]() Sailor2 8B SAIL | 15.82 ± 2.50 | 34.24 ± 3.46 | 5.94 ± 1.06 | 62.54 ± 3.07 |
![]() ![]() Llama 3.1 8B Meta | 14.92 ± 1.18 | 30.37 ± 5.16 | 35.84 ± 3.56 | 24.90 ± 2.93 |
![]() ![]() Command R 08-2024 32B CohereLabs | 13.44 ± 1.87 | 33.85 ± 8.45 | 33.78 ± 1.85 | 33.92 ± 1.69 |
![]() ![]() Command R+ 08-2024 104B CohereLabs | 13.26 ± 2.31 | 19.08 ± 9.54 | 9.44 ± 1.00 | 28.73 ± 2.08 |
![]() ![]() Ministral 2410 8B Mistral AI | 13.22 ± 2.97 | 24.47 ± 8.89 | 17.16 ± 1.62 | 31.79 ± 1.33 |
![]() ![]() Babel 9B Alibaba-DAMO | 12.14 ± 2.23 | 15.05 ± 6.20 | 10.75 ± 1.17 | 19.35 ± 1.89 |
![]() ![]() Qwen 2.5 7B Alibaba | 12.01 ± 1.50 | 22.31 ± 2.13 | 7.06 ± 1.62 | 37.56 ± 2.88 |
![]() ![]() SeaLLMs V3 7B Alibaba-DAMO | 9.97 ± 1.81 | 7.89 ± 6.73 | 12.09 ± 0.96 | 3.69 ± 0.53 |
![]() ![]() phi-4 14B Microsoft | 9.69 ± 1.63 | 5.31 ± 4.74 | 7.63 ± 1.02 | 3.00 ± 0.73 |
![]() ![]() Sailor2 20B SAIL | 8.56 ± 0.30 | 0.82 ± 0.57 | 0.00 ± 0.00 | 1.65 ± 0.65 |
![]() ![]() Olmo 2 0325 32B AI2 | 7.17 ± 1.41 | 6.91 ± 5.26 | 6.16 ± 0.63 | 7.67 ± 0.88 |
![]() ![]() Llama 3 8B Meta | 5.91 ± 0.45 | 15.72 ± 0.77 | 0.00 ± 0.00 | 31.44 ± 3.48 |
![]() ![]() Mistral Small 3.1 2503 24B Mistral AI | 4.57 ± 2.19 | 7.58 ± 4.99 | 2.75 ± 0.59 | 12.42 ± 1.12 |
![]() ![]() Command R7B 12-2024 7B CohereLabs | 3.87 ± 1.49 | 5.59 ± 6.14 | 2.53 ± 0.49 | 8.65 ± 1.01 |
![]() ![]() Aya Expanse 8B CohereLabs | 2.93 ± 0.24 | 0.34 ± 0.25 | 0.34 ± 0.20 | 0.33 ± 0.17 |
![]() ![]() Olmo 2 1124 13B AI2 | 2.40 ± 0.73 | 3.86 ± 4.81 | 3.38 ± 0.54 | 4.35 ± 0.71 |
![]() ![]() Olmo 2 1124 7B AI2 | 2.28 ± 0.35 | 1.22 ± 0.81 | 0.03 ± 0.06 | 2.42 ± 0.45 |

Model | MY | NLU | Belebele QA | Sentiment Analysis |
---|---|---|---|---|
![]() ![]() Gemma 3 27B | 57.78 ± 0.43 | 70.97 ± 0.26 | 77.29 ± 7.39 | 64.65 ± 3.74 |
![]() ![]() SEA-LION v4 27B AISG | 57.18 ± 0.42 | 70.91 ± 0.46 | 76.88 ± 7.32 | 64.94 ± 3.70 |
![]() ![]() Llama 4 Scout 109B MoE Meta | 54.76 ± 0.23 | 62.00 ± 2.60 | 81.88 ± 6.86 | 42.13 ± 3.64 |
![]() ![]() Gemma 3 12B | 52.82 ± 0.22 | 70.73 ± 0.19 | 75.42 ± 7.47 | 66.04 ± 3.72 |
![]() ![]() Qwen 3 32B Alibaba | 43.03 ± 0.63 | 71.03 ± 0.55 | 81.04 ± 6.49 | 61.02 ± 3.80 |
![]() ![]() Tulu 3 70B AI2 | 40.52 ± 0.30 | 65.56 ± 1.65 | 74.90 ± 7.26 | 56.23 ± 3.64 |
![]() ![]() SEA-LION v3 (Llama) 70B AISG | 40.16 ± 2.57 | 48.02 ± 9.28 | 80.21 ± 6.51 | 15.83 ± 1.05 |
![]() ![]() Gemma 2 27B | 38.95 ± 6.64 | 47.95 ± 19.25 | 48.44 ± 6.02 | 47.46 ± 2.86 |
![]() ![]() Qwen 3 14B Alibaba | 35.03 ± 0.51 | 64.94 ± 0.55 | 71.77 ± 7.55 | 58.10 ± 3.86 |
![]() ![]() Qwen 2.5 72B Alibaba | 33.37 ± 0.40 | 60.74 ± 0.61 | 61.46 ± 8.51 | 60.02 ± 3.76 |
![]() ![]() Qwen 2.5 32B Alibaba | 32.15 ± 0.32 | 56.00 ± 0.43 | 57.29 ± 8.38 | 54.71 ± 3.78 |
![]() ![]() Mistral Large 2411 123B Mistral AI | 31.23 ± 2.11 | 56.63 ± 4.47 | 55.94 ± 7.07 | 57.31 ± 3.65 |
![]() ![]() Qwen 3 8B Alibaba | 30.49 ± 0.76 | 55.50 ± 1.13 | 55.10 ± 8.40 | 55.90 ± 3.79 |
![]() ![]() SEA-LION v3 (Llama) 8B AISG | 27.35 ± 1.82 | 50.43 ± 6.37 | 53.23 ± 8.07 | 47.63 ± 2.80 |
![]() ![]() ERNIE 4.5 21B MoE Baidu | 27.33 ± 1.13 | 8.23 ± 2.87 | 16.46 ± 4.93 | 0.00 ± 0.00 |
![]() ![]() Qwen 3 30B MoE Alibaba | 25.88 ± 0.27 | 65.42 ± 0.73 | 76.67 ± 7.35 | 54.17 ± 3.84 |
![]() ![]() Command A 03-2025 111B CohereLabs | 25.41 ± 2.48 | 46.18 ± 6.78 | 47.81 ± 7.59 | 44.54 ± 3.13 |
![]() ![]() Llama 3.1 70B Meta | 24.14 ± 3.01 | 26.89 ± 6.55 | 53.75 ± 6.59 | 0.02 ± 0.04 |
![]() ![]() Llama 3.3 70B Meta | 23.82 ± 0.59 | 19.44 ± 3.29 | 27.71 ± 7.51 | 11.17 ± 2.02 |
![]() ![]() SEA-LION v3 (Gemma 2) 9B AISG | 21.69 ± 3.72 | 18.21 ± 12.92 | 27.92 ± 3.99 | 8.50 ± 0.67 |
![]() ![]() Llama 3 70B Meta | 21.48 ± 2.59 | 24.31 ± 11.58 | 36.46 ± 6.85 | 12.17 ± 1.26 |
![]() ![]() Qwen 2.5 14B Alibaba | 21.05 ± 0.48 | 33.61 ± 2.42 | 50.52 ± 8.48 | 16.71 ± 2.39 |
![]() ![]() MERaLiON 2 10B A*STAR | 20.20 ± 3.33 | 24.28 ± 9.35 | 17.29 ± 3.21 | 31.27 ± 2.52 |
![]() ![]() Babel 83B Alibaba-DAMO | 19.42 ± 6.02 | 42.67 ± 13.59 | 46.88 ± 5.48 | 38.46 ± 2.57 |
![]() ![]() Tulu 3 8B AI2 | 18.87 ± 0.59 | 30.28 ± 2.22 | 29.58 ± 7.60 | 30.98 ± 2.88 |
![]() ![]() Aya Expanse 32B CohereLabs | 16.82 ± 1.67 | 12.43 ± 6.07 | 12.08 ± 3.75 | 12.77 ± 1.44 |
![]() ![]() Gemma 2 9B | 16.18 ± 3.65 | 13.29 ± 8.07 | 5.00 ± 1.59 | 21.58 ± 1.85 |
![]() ![]() Sailor2 8B SAIL | 15.82 ± 2.50 | 15.11 ± 12.05 | 21.46 ± 3.78 | 8.77 ± 1.10 |
![]() ![]() Llama 3.1 8B Meta | 14.92 ± 1.18 | 16.35 ± 1.20 | 30.73 ± 7.29 | 1.98 ± 0.67 |
![]() ![]() Command R 08-2024 32B CohereLabs | 13.44 ± 1.87 | 13.77 ± 3.71 | 24.79 ± 5.41 | 2.75 ± 0.66 |
![]() ![]() Command R+ 08-2024 104B CohereLabs | 13.26 ± 2.31 | 19.75 ± 6.27 | 5.21 ± 1.60 | 34.29 ± 2.35 |
![]() ![]() Ministral 2410 8B Mistral AI | 13.22 ± 2.97 | 17.41 ± 8.44 | 4.90 ± 1.45 | 29.92 ± 2.15 |
![]() ![]() Babel 9B Alibaba-DAMO | 12.14 ± 2.23 | 29.18 ± 7.29 | 10.10 ± 3.03 | 48.25 ± 3.23 |
![]() ![]() Qwen 2.5 7B Alibaba | 12.01 ± 1.50 | 1.61 ± 0.84 | 3.23 ± 2.07 | 0.00 ± 0.00 |
![]() ![]() SeaLLMs V3 7B Alibaba-DAMO | 9.97 ± 1.81 | 12.26 ± 6.13 | 8.33 ± 2.56 | 16.19 ± 1.08 |
![]() ![]() phi-4 14B Microsoft | 9.69 ± 1.63 | 30.27 ± 6.06 | 15.10 ± 3.54 | 45.44 ± 3.24 |
![]() ![]() Sailor2 20B SAIL | 8.56 ± 0.30 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Olmo 2 0325 32B AI2 | 7.17 ± 1.41 | 13.81 ± 3.51 | 0.63 ± 0.49 | 27.00 ± 3.06 |
![]() ![]() Llama 3 8B Meta | 5.91 ± 0.45 | 1.51 ± 1.66 | 3.02 ± 1.44 | 0.00 ± 0.00 |
![]() ![]() Mistral Small 3.1 2503 24B Mistral AI | 4.57 ± 2.19 | 8.71 ± 7.24 | 5.21 ± 1.37 | 12.21 ± 1.44 |
![]() ![]() Command R7B 12-2024 7B CohereLabs | 3.87 ± 1.49 | 0.58 ± 0.90 | 0.94 ± 0.59 | 0.23 ± 0.13 |
![]() ![]() Aya Expanse 8B CohereLabs | 2.93 ± 0.24 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Olmo 2 1124 13B AI2 | 2.40 ± 0.73 | 0.46 ± 0.45 | 0.63 ± 0.49 | 0.29 ± 0.15 |
![]() ![]() Olmo 2 1124 7B AI2 | 2.28 ± 0.35 | 0.01 ± 0.02 | 0.00 ± 0.00 | 0.02 ± 0.04 |

Model | MY | Safety | Toxicity Detection |
---|---|---|---|
![]() ![]() Gemma 3 27B | 57.78 ± 0.43 | 67.09 ± 0.18 | 67.09 ± 4.57 |
![]() ![]() SEA-LION v4 27B AISG | 57.18 ± 0.42 | 66.91 ± 0.70 | 66.91 ± 4.53 |
![]() ![]() Llama 4 Scout 109B MoE Meta | 54.76 ± 0.23 | 72.69 ± 0.30 | 72.69 ± 4.33 |
![]() ![]() Gemma 3 12B | 52.82 ± 0.22 | 70.13 ± 0.09 | 70.13 ± 4.42 |
![]() ![]() Qwen 3 32B Alibaba | 43.03 ± 0.63 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Tulu 3 70B AI2 | 40.52 ± 0.30 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() SEA-LION v3 (Llama) 70B AISG | 40.16 ± 2.57 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Gemma 2 27B | 38.95 ± 6.64 | 47.41 ± 13.98 | 47.41 ± 4.00 |
![]() ![]() Qwen 3 14B Alibaba | 35.03 ± 0.51 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Qwen 2.5 72B Alibaba | 33.37 ± 0.40 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Qwen 2.5 32B Alibaba | 32.15 ± 0.32 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Mistral Large 2411 123B Mistral AI | 31.23 ± 2.11 | 0.03 ± 0.06 | 0.03 ± 0.06 |
![]() ![]() Qwen 3 8B Alibaba | 30.49 ± 0.76 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() SEA-LION v3 (Llama) 8B AISG | 27.35 ± 1.82 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() ERNIE 4.5 21B MoE Baidu | 27.33 ± 1.13 | 34.72 ± 6.00 | 34.72 ± 3.62 |
![]() ![]() Qwen 3 30B MoE Alibaba | 25.88 ± 0.27 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Command A 03-2025 111B CohereLabs | 25.41 ± 2.48 | 0.72 ± 1.41 | 0.72 ± 0.29 |
![]() ![]() Llama 3.1 70B Meta | 24.14 ± 3.01 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Llama 3.3 70B Meta | 23.82 ± 0.59 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() SEA-LION v3 (Gemma 2) 9B AISG | 21.69 ± 3.72 | 12.44 ± 12.56 | 12.44 ± 1.36 |
![]() ![]() Llama 3 70B Meta | 21.48 ± 2.59 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Qwen 2.5 14B Alibaba | 21.05 ± 0.48 | 1.13 ± 1.04 | 1.13 ± 0.56 |
![]() ![]() MERaLiON 2 10B A*STAR | 20.20 ± 3.33 | 25.72 ± 13.13 | 25.72 ± 2.30 |
![]() ![]() Babel 83B Alibaba-DAMO | 19.42 ± 6.02 | 0.13 ± 0.24 | 0.13 ± 0.12 |
![]() ![]() Tulu 3 8B AI2 | 18.87 ± 0.59 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Aya Expanse 32B CohereLabs | 16.82 ± 1.67 | 7.53 ± 8.40 | 7.53 ± 1.14 |
![]() ![]() Gemma 2 9B | 16.18 ± 3.65 | 9.06 ± 9.30 | 9.06 ± 1.06 |
![]() ![]() Sailor2 8B SAIL | 15.82 ± 2.50 | 1.06 ± 1.37 | 1.06 ± 0.45 |
![]() ![]() Llama 3.1 8B Meta | 14.92 ± 1.18 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Command R 08-2024 32B CohereLabs | 13.44 ± 1.87 | 10.63 ± 11.61 | 10.63 ± 1.00 |
![]() ![]() Command R+ 08-2024 104B CohereLabs | 13.26 ± 2.31 | 6.53 ± 8.38 | 6.53 ± 0.72 |
![]() ![]() Ministral 2410 8B Mistral AI | 13.22 ± 2.97 | 23.69 ± 12.84 | 23.69 ± 2.48 |
![]() ![]() Babel 9B Alibaba-DAMO | 12.14 ± 2.23 | 4.44 ± 5.29 | 4.44 ± 0.94 |
![]() ![]() Qwen 2.5 7B Alibaba | 12.01 ± 1.50 | 16.09 ± 8.88 | 16.09 ± 2.27 |
![]() ![]() SeaLLMs V3 7B Alibaba-DAMO | 9.97 ± 1.81 | 5.88 ± 8.20 | 5.88 ± 0.90 |
![]() ![]() phi-4 14B Microsoft | 9.69 ± 1.63 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Sailor2 20B SAIL | 8.56 ± 0.30 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Olmo 2 0325 32B AI2 | 7.17 ± 1.41 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Llama 3 8B Meta | 5.91 ± 0.45 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Mistral Small 3.1 2503 24B Mistral AI | 4.57 ± 2.19 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Command R7B 12-2024 7B CohereLabs | 3.87 ± 1.49 | 0.31 ± 0.36 | 0.31 ± 0.21 |
![]() ![]() Aya Expanse 8B CohereLabs | 2.93 ± 0.24 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Olmo 2 1124 13B AI2 | 2.40 ± 0.73 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Olmo 2 1124 7B AI2 | 2.28 ± 0.35 | 0.00 ± 0.00 | 0.00 ± 0.00 |

Model | MY | Knowledge | Global MMLU Lite |
---|---|---|---|
![]() ![]() Gemma 3 27B | 57.78 ± 0.43 | 58.06 ± 0.44 | 58.06 ± 0.44 |
![]() ![]() SEA-LION v4 27B AISG | 57.18 ± 0.42 | 57.75 ± 1.21 | 57.75 ± 1.21 |
![]() ![]() Llama 4 Scout 109B MoE Meta | 54.76 ± 0.23 | 60.22 ± 0.27 | 60.22 ± 0.27 |
![]() ![]() Gemma 3 12B | 52.82 ± 0.22 | 51.03 ± 0.39 | 51.03 ± 0.39 |
![]() ![]() Qwen 3 32B Alibaba | 43.03 ± 0.63 | 55.13 ± 1.15 | 55.13 ± 1.15 |
![]() ![]() Tulu 3 70B AI2 | 40.52 ± 0.30 | 51.94 ± 1.50 | 51.94 ± 1.50 |
![]() ![]() SEA-LION v3 (Llama) 70B AISG | 40.16 ± 2.57 | 49.47 ± 10.51 | 49.47 ± 10.51 |
![]() ![]() Gemma 2 27B | 38.95 ± 6.64 | 39.69 ± 8.80 | 39.69 ± 8.80 |
![]() ![]() Qwen 3 14B Alibaba | 35.03 ± 0.51 | 43.34 ± 1.08 | 43.34 ± 1.08 |
![]() ![]() Qwen 2.5 72B Alibaba | 33.37 ± 0.40 | 43.00 ± 0.80 | 43.00 ± 0.80 |
![]() ![]() Qwen 2.5 32B Alibaba | 32.15 ± 0.32 | 44.47 ± 1.14 | 44.47 ± 1.14 |
![]() ![]() Mistral Large 2411 123B Mistral AI | 31.23 ± 2.11 | 31.44 ± 7.92 | 31.44 ± 7.92 |
![]() ![]() Qwen 3 8B Alibaba | 30.49 ± 0.76 | 35.03 ± 2.83 | 35.03 ± 2.83 |
![]() ![]() SEA-LION v3 (Llama) 8B AISG | 27.35 ± 1.82 | 33.22 ± 2.80 | 33.22 ± 2.80 |
![]() ![]() ERNIE 4.5 21B MoE Baidu | 27.33 ± 1.13 | 37.38 ± 2.03 | 37.38 ± 2.03 |
![]() ![]() Qwen 3 30B MoE Alibaba | 25.88 ± 0.27 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Command A 03-2025 111B CohereLabs | 25.41 ± 2.48 | 22.84 ± 8.27 | 22.84 ± 8.27 |
![]() ![]() Llama 3.1 70B Meta | 24.14 ± 3.01 | 20.38 ± 8.53 | 20.38 ± 8.53 |
![]() ![]() Llama 3.3 70B Meta | 23.82 ± 0.59 | 18.28 ± 2.37 | 18.28 ± 2.37 |
![]() ![]() SEA-LION v3 (Gemma 2) 9B AISG | 21.69 ± 3.72 | 2.03 ± 1.55 | 2.03 ± 1.55 |
![]() ![]() Llama 3 70B Meta | 21.48 ± 2.59 | 23.47 ± 5.43 | 23.47 ± 5.43 |
![]() ![]() Qwen 2.5 14B Alibaba | 21.05 ± 0.48 | 5.09 ± 0.84 | 5.09 ± 0.84 |
![]() ![]() MERaLiON 2 10B A*STAR | 20.20 ± 3.33 | 8.84 ± 5.12 | 8.84 ± 5.12 |
![]() ![]() Babel 83B Alibaba-DAMO | 19.42 ± 6.02 | 29.03 ± 13.98 | 29.03 ± 13.98 |
![]() ![]() Tulu 3 8B AI2 | 18.87 ± 0.59 | 19.03 ± 1.95 | 19.03 ± 1.95 |
![]() ![]() Aya Expanse 32B CohereLabs | 16.82 ± 1.67 | 18.00 ± 3.96 | 18.00 ± 3.96 |
![]() ![]() Gemma 2 9B | 16.18 ± 3.65 | 7.47 ± 3.43 | 7.47 ± 3.43 |
![]() ![]() Sailor2 8B SAIL | 15.82 ± 2.50 | 1.78 ± 1.50 | 1.78 ± 1.50 |
![]() ![]() Llama 3.1 8B Meta | 14.92 ± 1.18 | 15.78 ± 2.76 | 15.78 ± 2.76 |
![]() ![]() Command R 08-2024 32B CohereLabs | 13.44 ± 1.87 | 12.56 ± 5.57 | 12.56 ± 5.57 |
![]() ![]() Command R+ 08-2024 104B CohereLabs | 13.26 ± 2.31 | 10.47 ± 6.04 | 10.47 ± 6.04 |
![]() ![]() Ministral 2410 8B Mistral AI | 13.22 ± 2.97 | 2.28 ± 2.63 | 2.28 ± 2.63 |
![]() ![]() Babel 9B Alibaba-DAMO | 12.14 ± 2.23 | 0.38 ± 0.36 | 0.38 ± 0.36 |
![]() ![]() Qwen 2.5 7B Alibaba | 12.01 ± 1.50 | 0.25 ± 0.23 | 0.25 ± 0.23 |
![]() ![]() SeaLLMs V3 7B Alibaba-DAMO | 9.97 ± 1.81 | 2.75 ± 2.42 | 2.75 ± 2.42 |
![]() ![]() phi-4 14B Microsoft | 9.69 ± 1.63 | 0.41 ± 0.53 | 0.41 ± 0.53 |
![]() ![]() Sailor2 20B SAIL | 8.56 ± 0.30 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Olmo 2 0325 32B AI2 | 7.17 ± 1.41 | 1.22 ± 1.47 | 1.22 ± 1.47 |
![]() ![]() Llama 3 8B Meta | 5.91 ± 0.45 | 2.34 ± 2.10 | 2.34 ± 2.10 |
![]() ![]() Mistral Small 3.1 2503 24B Mistral AI | 4.57 ± 2.19 | 2.28 ± 2.90 | 2.28 ± 2.90 |
![]() ![]() Command R7B 12-2024 7B CohereLabs | 3.87 ± 1.49 | 1.94 ± 3.66 | 1.94 ± 3.66 |
![]() ![]() Aya Expanse 8B CohereLabs | 2.93 ± 0.24 | 0.06 ± 0.08 | 0.06 ± 0.08 |
![]() ![]() Olmo 2 1124 13B AI2 | 2.40 ± 0.73 | 0.21 ± 0.34 | 0.21 ± 0.34 |
![]() ![]() Olmo 2 1124 7B AI2 | 2.28 ± 0.35 | 0.07 ± 0.14 | 0.07 ± 0.14 |
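
When comparing two models, one rough way to read the ± values is to check whether their 95% intervals overlap. This is a heuristic rather than a formal significance test, and the helper names below are illustrative. The MY scores are copied from the tables above.

```python
def interval(mean, half_width):
    """Turn 'mean ± half_width' into a (low, high) pair."""
    return (mean - half_width, mean + half_width)


def overlaps(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Overall Burmese (MY) scores from the leaderboard tables.
gemma3_27b = interval(57.78, 0.43)
sea_lion_v4_27b = interval(57.18, 0.42)
llama4_scout = interval(54.76, 0.23)

# The top two models have overlapping intervals, so their ranking is not
# clearly separated; the gap to Llama 4 Scout is wider than either interval.
print(overlaps(gemma3_27b, sea_lion_v4_27b))  # True
print(overlaps(gemma3_27b, llama4_scout))     # False
```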