Burmese Performance
Burmese Scores by Model
Scores are averaged over 30 bootstrap resamples; 95% confidence intervals (CI) are shown.
Model Size: ≤200B
Open instruct models only
Model | Size | MY |
---|---|---|
SEA-LION v4 (Qwen) | 32B | 48.28 ± 0.14 |
Gemma 3 | 27B | 47.36 ± 0.17 |
SEA-LION v4 (Gemma) | 27B | 46.50 ± 0.16 |
Llama 4 Scout | 109B MoE | 44.27 ± 0.17 |
Qwen 3 Next | 80B MoE | 43.68 ± 0.16 |
Qwen 3 | 32B | 43.18 ± 0.19 |
Gemma 3 | 12B | 41.42 ± 0.15 |
SEA-LION v3 (Llama) | 70B | 36.78 ± 0.28 |
Tulu 3 | 70B | 33.71 ± 0.17 |
Qwen 3 | 14B | 31.19 ± 0.14 |
Qwen 2.5 | 72B | 26.39 ± 0.20 |
Qwen 2.5 | 32B | 25.74 ± 0.17 |
Qwen 3 | 30B MoE | 25.55 ± 0.12 |
Qwen 3 | 8B | 25.16 ± 0.20 |
Mistral Large 2411 | 123B | 25.10 ± 0.29 |
Gemma 2 | 27B | 22.81 ± 0.24 |
Llama 3.3 | 70B | 21.72 ± 0.15 |
Llama 3.1 | 70B | 18.55 ± 0.24 |
SEA-LION v3 (Llama) | 8B | 17.54 ± 0.26 |
ERNIE 4.5 | 21B MoE | 17.16 ± 0.15 |
Command A 03-2025 | 111B | 16.20 ± 0.22 |
SEA-LION v3 (Gemma 2) | 9B | 14.66 ± 0.22 |
Llama 3 | 70B | 13.09 ± 0.17 |
Qwen 2.5 | 14B | 12.45 ± 0.19 |
Apertus | 70B | 12.28 ± 0.25 |
Sailor2 | 8B | 11.65 ± 0.13 |
Tulu 3 | 8B | 9.78 ± 0.13 |
MERaLiON 2 | 10B | 9.71 ± 0.16 |
Babel | 83B | 8.95 ± 0.22 |
Gemma 2 | 9B | 8.95 ± 0.17 |
Apertus | 8B | 8.58 ± 0.21 |
Sailor2 | 20B | 8.55 ± 0.11 |
Llama 3.1 | 8B | 7.07 ± 0.19 |
Qwen 2.5 | 7B | 7.01 ± 0.15 |
Babel | 9B | 7.00 ± 0.18 |
Aya Expanse | 32B | 6.44 ± 0.15 |
SeaLLMs V3 | 7B | 6.21 ± 0.16 |
phi-4 | 14B | 5.85 ± 0.18 |
Command R+ 08-2024 | 104B | 5.67 ± 0.18 |
Olmo 2 0325 | 32B | 4.38 ± 0.15 |
Command R 08-2024 | 32B | 3.91 ± 0.15 |
Ministral 2410 | 8B | 3.66 ± 0.13 |
Llama 3 | 8B | 3.20 ± 0.06 |
Aya Expanse | 8B | 3.03 ± 0.13 |
Command R7B 12-2024 | 7B | 2.65 ± 0.13 |
Olmo 2 1124 | 7B | 2.18 ± 0.12 |
Mistral Small 3.1 2503 | 24B | 2.02 ± 0.11 |
Olmo 2 1124 | 13B | 1.84 ± 0.09 |
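The caption above reports each score as an average over 30 bootstraps with a 95% CI. The page does not say how the interval is computed, so the following is only a minimal sketch under stated assumptions: items are resampled with replacement, and the half-width is taken as 1.96 times the standard deviation of the resampled means. The function name `bootstrap_mean_ci` and the per-item scores are hypothetical.

```python
import random
import statistics


def bootstrap_mean_ci(item_scores, n_boot=30, z=1.96, seed=0):
    """Return (mean, half_width) over n_boot bootstrap resamples.

    Assumption: the interval is mean ± z * (std of the resampled means).
    The leaderboard does not document its exact formula.
    """
    rng = random.Random(seed)
    n = len(item_scores)
    boot_means = []
    for _ in range(n_boot):
        # Draw n items with replacement and score that resample.
        sample = [item_scores[rng.randrange(n)] for _ in range(n)]
        boot_means.append(statistics.fmean(sample))
    mean = statistics.fmean(boot_means)
    half_width = z * statistics.stdev(boot_means)
    return mean, half_width


# Hypothetical per-item scores (0 or 100) for one model on one task.
scores = [100.0] * 48 + [0.0] * 52
mean, half_width = bootstrap_mean_ci(scores)
print(f"{mean:.2f} ± {half_width:.2f}")
```

With only 30 resamples the half-width itself is a noisy estimate, which may be one reason the reported intervals differ slightly between models of similar score.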
Burmese Competencies
Model | MY | Instruction Following | Multi-Turn Chat | NLG | NLR | NLU | Safety | Knowledge |
---|---|---|---|---|---|---|---|---|
![]() ![]() SEA-LION v4 (Qwen) 32B AISG | 48.28 ± 0.14 | 61.20 ± 0.83 | 15.72 ± 0.37 | 34.84 ± 0.10 | 61.75 ± 0.15 | 61.01 ± 0.12 | 53.48 ± 0.31 | 49.98 ± 0.18 |
![]() ![]() Gemma 3 27B | 47.36 ± 0.17 | 63.03 ± 0.80 | 34.28 ± 0.79 | 40.82 ± 0.06 | 55.45 ± 0.29 | 58.49 ± 0.18 | 34.35 ± 0.21 | 45.09 ± 0.25 |
![]() ![]() SEA-LION v4 (Gemma) 27B AISG | 46.50 ± 0.16 | 61.57 ± 0.73 | 31.95 ± 0.79 | 39.86 ± 0.07 | 55.03 ± 0.25 | 58.33 ± 0.24 | 33.82 ± 0.29 | 44.92 ± 0.36 |
![]() ![]() Llama 4 Scout 109B MoE Meta | 44.27 ± 0.17 | 63.80 ± 0.83 | 12.57 ± 0.43 | 44.41 ± 0.07 | 51.07 ± 0.18 | 44.56 ± 0.24 | 45.38 ± 0.20 | 48.08 ± 0.22 |
![]() ![]() Qwen 3 Next 80B MoE Alibaba | 43.68 ± 0.16 | 61.17 ± 0.75 | 18.89 ± 0.49 | 31.71 ± 0.08 | 53.02 ± 0.30 | 58.39 ± 0.24 | 37.85 ± 0.34 | 44.72 ± 0.38 |
![]() ![]() Qwen 3 32B Alibaba | 43.18 ± 0.19 | 64.60 ± 0.98 | 13.36 ± 0.46 | 25.08 ± 0.08 | 56.38 ± 0.29 | 57.92 ± 0.39 | 43.53 ± 0.56 | 41.41 ± 0.48 |
![]() ![]() Gemma 3 12B | 41.42 ± 0.15 | 56.37 ± 0.74 | 25.50 ± 0.68 | 37.54 ± 0.09 | 35.58 ± 0.41 | 58.03 ± 0.23 | 40.15 ± 0.30 | 36.75 ± 0.31 |
![]() ![]() SEA-LION v3 (Llama) 70B AISG | 36.78 ± 0.28 | 70.83 ± 0.99 | 11.16 ± 0.48 | 34.27 ± 0.10 | 49.46 ± 0.52 | 36.65 ± 0.35 | 20.88 ± 1.13 | 34.19 ± 0.70 |
![]() ![]() Tulu 3 70B AI2 | 33.71 ± 0.17 | 53.93 ± 0.97 | 7.59 ± 0.40 | 36.44 ± 0.10 | 50.27 ± 0.37 | 50.68 ± 0.40 | 0.00 ± 0.00 | 37.05 ± 0.37 |
![]() ![]() Qwen 3 14B Alibaba | 31.19 ± 0.14 | 46.47 ± 0.73 | 8.99 ± 0.30 | 28.41 ± 0.09 | 24.78 ± 0.33 | 49.11 ± 0.30 | 33.15 ± 0.35 | 27.40 ± 0.43 |
![]() ![]() Qwen 2.5 72B Alibaba | 26.39 ± 0.20 | 50.17 ± 1.07 | 8.52 ± 0.32 | 25.91 ± 0.10 | 9.00 ± 0.18 | 44.24 ± 0.25 | 21.88 ± 0.48 | 25.00 ± 0.38 |
![]() ![]() Qwen 2.5 32B Alibaba | 25.74 ± 0.17 | 50.77 ± 0.83 | 6.38 ± 0.35 | 22.98 ± 0.09 | 9.64 ± 0.38 | 37.70 ± 0.41 | 25.07 ± 0.37 | 27.66 ± 0.57 |
![]() ![]() Qwen 3 30B MoE Alibaba | 25.55 ± 0.12 | 53.30 ± 0.66 | 13.71 ± 0.42 | 17.53 ± 0.06 | 22.03 ± 0.24 | 50.08 ± 0.24 | 22.18 ± 0.22 | 0.00 ± 0.00 |
![]() ![]() Qwen 3 8B Alibaba | 25.16 ± 0.20 | 38.50 ± 1.05 | 8.91 ± 0.42 | 26.34 ± 0.07 | 16.93 ± 0.33 | 36.93 ± 0.33 | 31.78 ± 0.31 | 16.75 ± 0.47 |
![]() ![]() Mistral Large 2411 123B Mistral AI | 25.10 ± 0.29 | 49.90 ± 0.97 | 5.80 ± 0.38 | 29.91 ± 0.15 | 8.88 ± 0.74 | 38.82 ± 0.68 | 30.02 ± 1.52 | 12.36 ± 0.59 |
![]() ![]() Gemma 2 27B | 22.81 ± 0.24 | 47.47 ± 1.11 | 5.45 ± 0.24 | 27.76 ± 0.09 | 28.94 ± 0.56 | 26.38 ± 0.91 | 0.00 ± 0.00 | 23.70 ± 0.52 |
![]() ![]() Llama 3.3 70B Meta | 21.72 ± 0.15 | 60.33 ± 0.91 | 7.05 ± 0.35 | 34.68 ± 0.09 | 21.25 ± 0.16 | 1.85 ± 0.36 | 22.03 ± 0.67 | 4.80 ± 0.22 |
![]() ![]() Llama 3.1 70B Meta | 18.55 ± 0.24 | 48.23 ± 1.25 | 6.31 ± 0.47 | 33.63 ± 0.14 | 17.54 ± 0.35 | 18.81 ± 0.65 | 0.00 ± 0.00 | 5.34 ± 0.45 |
![]() ![]() SEA-LION v3 (Llama) 8B AISG | 17.54 ± 0.26 | 49.23 ± 0.88 | 4.35 ± 0.31 | 7.40 ± 0.06 | 10.73 ± 0.72 | 29.80 ± 0.62 | 7.25 ± 1.14 | 14.03 ± 0.57 |
![]() ![]() ERNIE 4.5 21B MoE Baidu | 17.16 ± 0.15 | 48.43 ± 0.97 | 13.26 ± 0.46 | 37.50 ± 0.10 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 20.91 ± 0.41 |
![]() ![]() Command A 03-2025 111B CohereLabs | 16.20 ± 0.22 | 48.23 ± 1.06 | 6.82 ± 0.40 | 19.12 ± 0.10 | 9.28 ± 0.29 | 23.73 ± 0.65 | 0.00 ± 0.00 | 6.21 ± 0.45 |
![]() ![]() SEA-LION v3 (Gemma 2) 9B AISG | 14.66 ± 0.22 | 50.00 ± 1.19 | 8.99 ± 0.39 | 28.73 ± 0.09 | 12.55 ± 0.18 | 2.26 ± 0.67 | 0.00 ± 0.00 | 0.06 ± 0.05 |
![]() ![]() Llama 3 70B Meta | 13.09 ± 0.17 | 11.03 ± 0.58 | 5.47 ± 0.39 | 24.94 ± 0.05 | 34.05 ± 0.39 | 7.67 ± 0.56 | 0.00 ± 0.00 | 8.48 ± 0.40 |
![]() ![]() Qwen 2.5 14B Alibaba | 12.45 ± 0.19 | 43.73 ± 1.17 | 4.89 ± 0.28 | 19.03 ± 0.07 | 2.18 ± 0.34 | 16.98 ± 0.34 | 0.00 ± 0.00 | 0.36 ± 0.07 |
![]() ![]() Apertus 70B Swiss AI | 12.28 ± 0.25 | 35.57 ± 1.21 | 7.77 ± 0.54 | 31.08 ± 0.09 | 0.00 ± 0.00 | 10.89 ± 0.71 | 0.00 ± 0.00 | 0.68 ± 0.12 |
![]() ![]() Sailor2 8B SAIL | 11.65 ± 0.13 | 25.47 ± 0.73 | 9.64 ± 0.41 | 24.38 ± 0.07 | 21.87 ± 0.39 | 0.13 ± 0.13 | 0.00 ± 0.00 | 0.04 ± 0.03 |
![]() ![]() Tulu 3 8B AI2 | 9.78 ± 0.13 | 38.53 ± 0.71 | 1.26 ± 0.15 | 21.28 ± 0.08 | 0.68 ± 0.06 | 2.94 ± 0.32 | 0.00 ± 0.00 | 3.77 ± 0.26 |
![]() ![]() MERaLiON 2 10B A*STAR | 9.71 ± 0.16 | 36.33 ± 0.96 | 3.36 ± 0.27 | 20.71 ± 0.10 | 6.30 ± 0.15 | 0.03 ± 0.07 | 0.00 ± 0.00 | 1.20 ± 0.23 |
![]() ![]() Babel 83B Alibaba-DAMO | 8.95 ± 0.22 | 24.57 ± 0.95 | 0.98 ± 0.16 | 6.64 ± 0.11 | 0.20 ± 0.16 | 19.00 ± 1.16 | 0.00 ± 0.00 | 11.23 ± 0.70 |
![]() ![]() Gemma 2 9B | 8.95 ± 0.17 | 34.27 ± 1.15 | 3.19 ± 0.33 | 20.04 ± 0.10 | 4.58 ± 0.32 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.54 ± 0.10 |
![]() ![]() Apertus 8B Swiss AI | 8.58 ± 0.21 | 28.73 ± 1.05 | 3.20 ± 0.38 | 21.91 ± 0.11 | 0.13 ± 0.11 | 3.35 ± 0.74 | 0.00 ± 0.00 | 2.72 ± 0.29 |
![]() ![]() Sailor2 20B SAIL | 8.55 ± 0.11 | 24.53 ± 0.84 | 8.89 ± 0.38 | 26.43 ± 0.07 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Llama 3.1 8B Meta | 7.07 ± 0.19 | 26.67 ± 1.04 | 2.72 ± 0.28 | 14.02 ± 0.13 | 0.00 ± 0.00 | 3.74 ± 0.51 | 0.00 ± 0.00 | 2.32 ± 0.25 |
![]() ![]() Qwen 2.5 7B Alibaba | 7.01 ± 0.15 | 35.80 ± 0.91 | 2.18 ± 0.26 | 8.13 ± 0.09 | 2.98 ± 0.41 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Babel 9B Alibaba-DAMO | 7.00 ± 0.18 | 23.87 ± 1.10 | 3.03 ± 0.28 | 10.84 ± 0.12 | 0.00 ± 0.00 | 11.20 ± 0.31 | 0.00 ± 0.00 | 0.04 ± 0.03 |
![]() ![]() Aya Expanse 32B CohereLabs | 6.44 ± 0.15 | 29.80 ± 0.91 | 1.39 ± 0.17 | 10.33 ± 0.09 | 0.25 ± 0.17 | 0.00 ± 0.00 | 0.00 ± 0.00 | 3.30 ± 0.27 |
![]() ![]() SeaLLMs V3 7B Alibaba-DAMO | 6.21 ± 0.16 | 27.63 ± 1.11 | 3.82 ± 0.35 | 11.83 ± 0.10 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.19 ± 0.06 |
![]() ![]() phi-4 14B Microsoft | 5.85 ± 0.18 | 20.90 ± 0.96 | 1.90 ± 0.34 | 9.29 ± 0.10 | 0.00 ± 0.00 | 8.89 ± 0.35 | 0.00 ± 0.00 | 0.01 ± 0.02 |
![]() ![]() Command R+ 08-2024 104B CohereLabs | 5.67 ± 0.18 | 22.87 ± 1.13 | 0.80 ± 0.13 | 14.38 ± 0.09 | 0.00 ± 0.00 | 0.91 ± 0.34 | 0.00 ± 0.00 | 0.73 ± 0.15 |
![]() ![]() Olmo 2 0325 32B AI2 | 4.38 ± 0.15 | 22.43 ± 1.02 | 0.78 ± 0.16 | 7.36 ± 0.06 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.07 ± 0.04 |
![]() ![]() Command R 08-2024 32B CohereLabs | 3.91 ± 0.15 | 17.83 ± 0.90 | 0.33 ± 0.13 | 6.63 ± 0.09 | 0.68 ± 0.34 | 0.54 ± 0.31 | 0.00 ± 0.00 | 1.33 ± 0.19 |
![]() ![]() Ministral 2410 8B Mistral AI | 3.66 ± 0.13 | 10.33 ± 0.82 | 0.88 ± 0.18 | 14.10 ± 0.09 | 0.15 ± 0.12 | 0.03 ± 0.05 | 0.00 ± 0.00 | 0.16 ± 0.09 |
![]() ![]() Llama 3 8B Meta | 3.20 ± 0.06 | 8.87 ± 0.44 | 2.07 ± 0.25 | 11.47 ± 0.08 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.01 |
![]() ![]() Aya Expanse 8B CohereLabs | 3.03 ± 0.13 | 19.77 ± 0.88 | 0.00 ± 0.00 | 1.41 ± 0.03 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Command R7B 12-2024 7B CohereLabs | 2.65 ± 0.13 | 12.37 ± 0.89 | 0.03 ± 0.04 | 6.14 ± 0.10 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.05 ± 0.03 |
![]() ![]() Olmo 2 1124 7B AI2 | 2.18 ± 0.12 | 11.90 ± 0.89 | 0.13 ± 0.07 | 3.22 ± 0.05 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Mistral Small 3.1 2503 24B Mistral AI | 2.02 ± 0.11 | 10.03 ± 0.82 | 0.50 ± 0.13 | 3.50 ± 0.08 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.10 ± 0.05 |
![]() ![]() Olmo 2 1124 13B AI2 | 1.84 ± 0.09 | 11.90 ± 0.65 | 0.43 ± 0.00 | 0.52 ± 0.02 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.03 ± 0.03 |
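The MY column in the table above is numerically consistent with an unweighted mean of the seven competency columns. The page does not state this rule, so treat it as an inference from the published figures. The sketch below checks it on three rows copied from the table; the helper name `overall` is hypothetical.

```python
from statistics import fmean

# (reported MY, [IF, Multi-Turn Chat, NLG, NLR, NLU, Safety, Knowledge])
# Values copied from three rows of the competencies table.
rows = {
    "SEA-LION v4 (Qwen) 32B": (48.28, [61.20, 15.72, 34.84, 61.75, 61.01, 53.48, 49.98]),
    "Gemma 3 27B": (47.36, [63.03, 34.28, 40.82, 55.45, 58.49, 34.35, 45.09]),
    "Tulu 3 70B": (33.71, [53.93, 7.59, 36.44, 50.27, 50.68, 0.00, 37.05]),
}


def overall(competencies):
    """Unweighted mean of competency scores (assumed aggregation rule)."""
    return fmean(competencies)


for name, (reported, comps) in rows.items():
    computed = overall(comps)
    # Published values are rounded to two decimals, so allow a small gap.
    assert abs(computed - reported) < 0.01, name
    print(f"{name}: computed {computed:.2f}, reported {reported:.2f}")
```

The Tulu 3 70B row shows that a 0.00 competency (Safety here) still counts toward the mean, so a single zeroed competency lowers MY by one seventh of what the model would otherwise have scored there.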
Burmese Tasks
Model | MY | Instruction Following | SEA-IFEval |
---|---|---|---|
![]() ![]() SEA-LION v4 (Qwen) 32B AISG | 48.28 ± 0.14 | 61.20 ± 0.83 | 61.20 ± 0.83 |
![]() ![]() Gemma 3 27B | 47.36 ± 0.17 | 63.03 ± 0.80 | 63.03 ± 0.80 |
![]() ![]() SEA-LION v4 (Gemma) 27B AISG | 46.50 ± 0.16 | 61.57 ± 0.73 | 61.57 ± 0.73 |
![]() ![]() Llama 4 Scout 109B MoE Meta | 44.27 ± 0.17 | 63.80 ± 0.83 | 63.80 ± 0.83 |
![]() ![]() Qwen 3 Next 80B MoE Alibaba | 43.68 ± 0.16 | 61.17 ± 0.75 | 61.17 ± 0.75 |
![]() ![]() Qwen 3 32B Alibaba | 43.18 ± 0.19 | 64.60 ± 0.98 | 64.60 ± 0.98 |
![]() ![]() Gemma 3 12B | 41.42 ± 0.15 | 56.37 ± 0.74 | 56.37 ± 0.74 |
![]() ![]() SEA-LION v3 (Llama) 70B AISG | 36.78 ± 0.28 | 70.83 ± 0.99 | 70.83 ± 0.99 |
![]() ![]() Tulu 3 70B AI2 | 33.71 ± 0.17 | 53.93 ± 0.97 | 53.93 ± 0.97 |
![]() ![]() Qwen 3 14B Alibaba | 31.19 ± 0.14 | 46.47 ± 0.73 | 46.47 ± 0.73 |
![]() ![]() Qwen 2.5 72B Alibaba | 26.39 ± 0.20 | 50.17 ± 1.07 | 50.17 ± 1.07 |
![]() ![]() Qwen 2.5 32B Alibaba | 25.74 ± 0.17 | 50.77 ± 0.83 | 50.77 ± 0.83 |
![]() ![]() Qwen 3 30B MoE Alibaba | 25.55 ± 0.12 | 53.30 ± 0.66 | 53.30 ± 0.66 |
![]() ![]() Qwen 3 8B Alibaba | 25.16 ± 0.20 | 38.50 ± 1.05 | 38.50 ± 1.05 |
![]() ![]() Mistral Large 2411 123B Mistral AI | 25.10 ± 0.29 | 49.90 ± 0.97 | 49.90 ± 0.97 |
![]() ![]() Gemma 2 27B | 22.81 ± 0.24 | 47.47 ± 1.11 | 47.47 ± 1.11 |
![]() ![]() Llama 3.3 70B Meta | 21.72 ± 0.15 | 60.33 ± 0.91 | 60.33 ± 0.91 |
![]() ![]() Llama 3.1 70B Meta | 18.55 ± 0.24 | 48.23 ± 1.25 | 48.23 ± 1.25 |
![]() ![]() SEA-LION v3 (Llama) 8B AISG | 17.54 ± 0.26 | 49.23 ± 0.88 | 49.23 ± 0.88 |
![]() ![]() ERNIE 4.5 21B MoE Baidu | 17.16 ± 0.15 | 48.43 ± 0.97 | 48.43 ± 0.97 |
![]() ![]() Command A 03-2025 111B CohereLabs | 16.20 ± 0.22 | 48.23 ± 1.06 | 48.23 ± 1.06 |
![]() ![]() SEA-LION v3 (Gemma 2) 9B AISG | 14.66 ± 0.22 | 50.00 ± 1.19 | 50.00 ± 1.19 |
![]() ![]() Llama 3 70B Meta | 13.09 ± 0.17 | 11.03 ± 0.58 | 11.03 ± 0.58 |
![]() ![]() Qwen 2.5 14B Alibaba | 12.45 ± 0.19 | 43.73 ± 1.17 | 43.73 ± 1.17 |
![]() ![]() Apertus 70B Swiss AI | 12.28 ± 0.25 | 35.57 ± 1.21 | 35.57 ± 1.21 |
![]() ![]() Sailor2 8B SAIL | 11.65 ± 0.13 | 25.47 ± 0.73 | 25.47 ± 0.73 |
![]() ![]() Tulu 3 8B AI2 | 9.78 ± 0.13 | 38.53 ± 0.71 | 38.53 ± 0.71 |
![]() ![]() MERaLiON 2 10B A*STAR | 9.71 ± 0.16 | 36.33 ± 0.96 | 36.33 ± 0.96 |
![]() ![]() Babel 83B Alibaba-DAMO | 8.95 ± 0.22 | 24.57 ± 0.95 | 24.57 ± 0.95 |
![]() ![]() Gemma 2 9B | 8.95 ± 0.17 | 34.27 ± 1.15 | 34.27 ± 1.15 |
![]() ![]() Apertus 8B Swiss AI | 8.58 ± 0.21 | 28.73 ± 1.05 | 28.73 ± 1.05 |
![]() ![]() Sailor2 20B SAIL | 8.55 ± 0.11 | 24.53 ± 0.84 | 24.53 ± 0.84 |
![]() ![]() Llama 3.1 8B Meta | 7.07 ± 0.19 | 26.67 ± 1.04 | 26.67 ± 1.04 |
![]() ![]() Qwen 2.5 7B Alibaba | 7.01 ± 0.15 | 35.80 ± 0.91 | 35.80 ± 0.91 |
![]() ![]() Babel 9B Alibaba-DAMO | 7.00 ± 0.18 | 23.87 ± 1.10 | 23.87 ± 1.10 |
![]() ![]() Aya Expanse 32B CohereLabs | 6.44 ± 0.15 | 29.80 ± 0.91 | 29.80 ± 0.91 |
![]() ![]() SeaLLMs V3 7B Alibaba-DAMO | 6.21 ± 0.16 | 27.63 ± 1.11 | 27.63 ± 1.11 |
![]() ![]() phi-4 14B Microsoft | 5.85 ± 0.18 | 20.90 ± 0.96 | 20.90 ± 0.96 |
![]() ![]() Command R+ 08-2024 104B CohereLabs | 5.67 ± 0.18 | 22.87 ± 1.13 | 22.87 ± 1.13 |
![]() ![]() Olmo 2 0325 32B AI2 | 4.38 ± 0.15 | 22.43 ± 1.02 | 22.43 ± 1.02 |
![]() ![]() Command R 08-2024 32B CohereLabs | 3.91 ± 0.15 | 17.83 ± 0.90 | 17.83 ± 0.90 |
![]() ![]() Ministral 2410 8B Mistral AI | 3.66 ± 0.13 | 10.33 ± 0.82 | 10.33 ± 0.82 |
![]() ![]() Llama 3 8B Meta | 3.20 ± 0.06 | 8.87 ± 0.44 | 8.87 ± 0.44 |
![]() ![]() Aya Expanse 8B CohereLabs | 3.03 ± 0.13 | 19.77 ± 0.88 | 19.77 ± 0.88 |
![]() ![]() Command R7B 12-2024 7B CohereLabs | 2.65 ± 0.13 | 12.37 ± 0.89 | 12.37 ± 0.89 |
![]() ![]() Olmo 2 1124 7B AI2 | 2.18 ± 0.12 | 11.90 ± 0.89 | 11.90 ± 0.89 |
![]() ![]() Mistral Small 3.1 2503 24B Mistral AI | 2.02 ± 0.11 | 10.03 ± 0.82 | 10.03 ± 0.82 |
![]() ![]() Olmo 2 1124 13B AI2 | 1.84 ± 0.09 | 11.90 ± 0.65 | 11.90 ± 0.65 |
Model | MY | Multi-Turn Chat | SEA-MT-Bench |
---|---|---|---|
![]() ![]() SEA-LION v4 (Qwen) 32B AISG | 48.28 ± 0.14 | 15.72 ± 0.37 | 15.72 ± 0.37 |
![]() ![]() Gemma 3 27B | 47.36 ± 0.17 | 34.28 ± 0.79 | 34.28 ± 0.79 |
![]() ![]() SEA-LION v4 (Gemma) 27B AISG | 46.50 ± 0.16 | 31.95 ± 0.79 | 31.95 ± 0.79 |
![]() ![]() Llama 4 Scout 109B MoE Meta | 44.27 ± 0.17 | 12.57 ± 0.43 | 12.57 ± 0.43 |
![]() ![]() Qwen 3 Next 80B MoE Alibaba | 43.68 ± 0.16 | 18.89 ± 0.49 | 18.89 ± 0.49 |
![]() ![]() Qwen 3 32B Alibaba | 43.18 ± 0.19 | 13.36 ± 0.46 | 13.36 ± 0.46 |
![]() ![]() Gemma 3 12B | 41.42 ± 0.15 | 25.50 ± 0.68 | 25.50 ± 0.68 |
![]() ![]() SEA-LION v3 (Llama) 70B AISG | 36.78 ± 0.28 | 11.16 ± 0.48 | 11.16 ± 0.48 |
![]() ![]() Tulu 3 70B AI2 | 33.71 ± 0.17 | 7.59 ± 0.40 | 7.59 ± 0.40 |
![]() ![]() Qwen 3 14B Alibaba | 31.19 ± 0.14 | 8.99 ± 0.30 | 8.99 ± 0.30 |
![]() ![]() Qwen 2.5 72B Alibaba | 26.39 ± 0.20 | 8.52 ± 0.32 | 8.52 ± 0.32 |
![]() ![]() Qwen 2.5 32B Alibaba | 25.74 ± 0.17 | 6.38 ± 0.35 | 6.38 ± 0.35 |
![]() ![]() Qwen 3 30B MoE Alibaba | 25.55 ± 0.12 | 13.71 ± 0.42 | 13.71 ± 0.42 |
![]() ![]() Qwen 3 8B Alibaba | 25.16 ± 0.20 | 8.91 ± 0.42 | 8.91 ± 0.42 |
![]() ![]() Mistral Large 2411 123B Mistral AI | 25.10 ± 0.29 | 5.80 ± 0.38 | 5.80 ± 0.38 |
![]() ![]() Gemma 2 27B | 22.81 ± 0.24 | 5.45 ± 0.24 | 5.45 ± 0.24 |
![]() ![]() Llama 3.3 70B Meta | 21.72 ± 0.15 | 7.05 ± 0.35 | 7.05 ± 0.35 |
![]() ![]() Llama 3.1 70B Meta | 18.55 ± 0.24 | 6.31 ± 0.47 | 6.31 ± 0.47 |
![]() ![]() SEA-LION v3 (Llama) 8B AISG | 17.54 ± 0.26 | 4.35 ± 0.31 | 4.35 ± 0.31 |
![]() ![]() ERNIE 4.5 21B MoE Baidu | 17.16 ± 0.15 | 13.26 ± 0.46 | 13.26 ± 0.46 |
![]() ![]() Command A 03-2025 111B CohereLabs | 16.20 ± 0.22 | 6.82 ± 0.40 | 6.82 ± 0.40 |
![]() ![]() SEA-LION v3 (Gemma 2) 9B AISG | 14.66 ± 0.22 | 8.99 ± 0.39 | 8.99 ± 0.39 |
![]() ![]() Llama 3 70B Meta | 13.09 ± 0.17 | 5.47 ± 0.39 | 5.47 ± 0.39 |
![]() ![]() Qwen 2.5 14B Alibaba | 12.45 ± 0.19 | 4.89 ± 0.28 | 4.89 ± 0.28 |
![]() ![]() Apertus 70B Swiss AI | 12.28 ± 0.25 | 7.77 ± 0.54 | 7.77 ± 0.54 |
![]() ![]() Sailor2 8B SAIL | 11.65 ± 0.13 | 9.64 ± 0.41 | 9.64 ± 0.41 |
![]() ![]() Tulu 3 8B AI2 | 9.78 ± 0.13 | 1.26 ± 0.15 | 1.26 ± 0.15 |
![]() ![]() MERaLiON 2 10B A*STAR | 9.71 ± 0.16 | 3.36 ± 0.27 | 3.36 ± 0.27 |
![]() ![]() Babel 83B Alibaba-DAMO | 8.95 ± 0.22 | 0.98 ± 0.16 | 0.98 ± 0.16 |
![]() ![]() Gemma 2 9B | 8.95 ± 0.17 | 3.19 ± 0.33 | 3.19 ± 0.33 |
![]() ![]() Apertus 8B Swiss AI | 8.58 ± 0.21 | 3.20 ± 0.38 | 3.20 ± 0.38 |
![]() ![]() Sailor2 20B SAIL | 8.55 ± 0.11 | 8.89 ± 0.38 | 8.89 ± 0.38 |
![]() ![]() Llama 3.1 8B Meta | 7.07 ± 0.19 | 2.72 ± 0.28 | 2.72 ± 0.28 |
![]() ![]() Qwen 2.5 7B Alibaba | 7.01 ± 0.15 | 2.18 ± 0.26 | 2.18 ± 0.26 |
![]() ![]() Babel 9B Alibaba-DAMO | 7.00 ± 0.18 | 3.03 ± 0.28 | 3.03 ± 0.28 |
![]() ![]() Aya Expanse 32B CohereLabs | 6.44 ± 0.15 | 1.39 ± 0.17 | 1.39 ± 0.17 |
![]() ![]() SeaLLMs V3 7B Alibaba-DAMO | 6.21 ± 0.16 | 3.82 ± 0.35 | 3.82 ± 0.35 |
![]() ![]() phi-4 14B Microsoft | 5.85 ± 0.18 | 1.90 ± 0.34 | 1.90 ± 0.34 |
![]() ![]() Command R+ 08-2024 104B CohereLabs | 5.67 ± 0.18 | 0.80 ± 0.13 | 0.80 ± 0.13 |
![]() ![]() Olmo 2 0325 32B AI2 | 4.38 ± 0.15 | 0.78 ± 0.16 | 0.78 ± 0.16 |
![]() ![]() Command R 08-2024 32B CohereLabs | 3.91 ± 0.15 | 0.33 ± 0.13 | 0.33 ± 0.13 |
![]() ![]() Ministral 2410 8B Mistral AI | 3.66 ± 0.13 | 0.88 ± 0.18 | 0.88 ± 0.18 |
![]() ![]() Llama 3 8B Meta | 3.20 ± 0.06 | 2.07 ± 0.25 | 2.07 ± 0.25 |
![]() ![]() Aya Expanse 8B CohereLabs | 3.03 ± 0.13 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Command R7B 12-2024 7B CohereLabs | 2.65 ± 0.13 | 0.03 ± 0.04 | 0.03 ± 0.04 |
![]() ![]() Olmo 2 1124 7B AI2 | 2.18 ± 0.12 | 0.13 ± 0.07 | 0.13 ± 0.07 |
![]() ![]() Mistral Small 3.1 2503 24B Mistral AI | 2.02 ± 0.11 | 0.50 ± 0.13 | 0.50 ± 0.13 |
![]() ![]() Olmo 2 1124 13B AI2 | 1.84 ± 0.09 | 0.43 ± 0.00 | 0.43 ± 0.00 |
Model | MY | NLG | Summarization | Translations |
---|---|---|---|---|
![]() ![]() SEA-LION v4 (Qwen) 32B AISG | 48.28 ± 0.14 | 34.84 ± 0.10 | 5.68 ± 0.08 | 64.00 ± 0.19 |
![]() ![]() Gemma 3 27B | 47.36 ± 0.17 | 40.82 ± 0.06 | 2.21 ± 0.08 | 79.44 ± 0.07 |
![]() ![]() SEA-LION v4 (Gemma) 27B AISG | 46.50 ± 0.16 | 39.86 ± 0.07 | 1.89 ± 0.08 | 77.83 ± 0.10 |
![]() ![]() Llama 4 Scout 109B MoE Meta | 44.27 ± 0.17 | 44.41 ± 0.07 | 7.68 ± 0.13 | 81.15 ± 0.04 |
![]() ![]() Qwen 3 Next 80B MoE Alibaba | 43.68 ± 0.16 | 31.71 ± 0.08 | 4.85 ± 0.07 | 58.58 ± 0.15 |
![]() ![]() Qwen 3 32B Alibaba | 43.18 ± 0.19 | 25.08 ± 0.08 | 5.45 ± 0.11 | 44.71 ± 0.18 |
![]() ![]() Gemma 3 12B | 41.42 ± 0.15 | 37.54 ± 0.09 | 4.12 ± 0.12 | 70.96 ± 0.09 |
![]() ![]() SEA-LION v3 (Llama) 70B AISG | 36.78 ± 0.28 | 34.27 ± 0.10 | 5.26 ± 0.14 | 63.28 ± 0.16 |
![]() ![]() Tulu 3 70B AI2 | 33.71 ± 0.17 | 36.44 ± 0.10 | 6.27 ± 0.19 | 66.60 ± 0.09 |
![]() ![]() Qwen 3 14B Alibaba | 31.19 ± 0.14 | 28.41 ± 0.09 | 4.95 ± 0.09 | 51.87 ± 0.16 |
![]() ![]() Qwen 2.5 72B Alibaba | 26.39 ± 0.20 | 25.91 ± 0.10 | 5.50 ± 0.14 | 46.32 ± 0.14 |
![]() ![]() Qwen 2.5 32B Alibaba | 25.74 ± 0.17 | 22.98 ± 0.09 | 6.61 ± 0.11 | 39.36 ± 0.15 |
![]() ![]() Qwen 3 30B MoE Alibaba | 25.55 ± 0.12 | 17.53 ± 0.06 | 0.21 ± 0.04 | 34.84 ± 0.12 |
![]() ![]() Qwen 3 8B Alibaba | 25.16 ± 0.20 | 26.34 ± 0.07 | 4.79 ± 0.09 | 47.90 ± 0.13 |
![]() ![]() Mistral Large 2411 123B Mistral AI | 25.10 ± 0.29 | 29.91 ± 0.15 | 4.75 ± 0.16 | 55.08 ± 0.20 |
![]() ![]() Gemma 2 27B | 22.81 ± 0.24 | 27.76 ± 0.09 | 5.19 ± 0.14 | 50.32 ± 0.12 |
![]() ![]() Llama 3.3 70B Meta | 21.72 ± 0.15 | 34.68 ± 0.09 | 9.30 ± 0.13 | 60.06 ± 0.15 |
![]() ![]() Llama 3.1 70B Meta | 18.55 ± 0.24 | 33.63 ± 0.14 | 8.84 ± 0.17 | 58.42 ± 0.20 |
![]() ![]() SEA-LION v3 (Llama) 8B AISG | 17.54 ± 0.26 | 7.40 ± 0.06 | 0.87 ± 0.07 | 13.92 ± 0.12 |
![]() ![]() ERNIE 4.5 21B MoE Baidu | 17.16 ± 0.15 | 37.50 ± 0.10 | 2.76 ± 0.08 | 72.24 ± 0.18 |
![]() ![]() Command A 03-2025 111B CohereLabs | 16.20 ± 0.22 | 19.12 ± 0.10 | 1.15 ± 0.10 | 37.10 ± 0.19 |
![]() ![]() SEA-LION v3 (Gemma 2) 9B AISG | 14.66 ± 0.22 | 28.73 ± 0.09 | 4.37 ± 0.11 | 53.10 ± 0.15 |
![]() ![]() Llama 3 70B Meta | 13.09 ± 0.17 | 24.94 ± 0.05 | 0.00 ± 0.00 | 49.89 ± 0.11 |
![]() ![]() Qwen 2.5 14B Alibaba | 12.45 ± 0.19 | 19.03 ± 0.07 | 5.59 ± 0.10 | 32.46 ± 0.10 |
![]() ![]() Apertus 70B Swiss AI | 12.28 ± 0.25 | 31.08 ± 0.09 | 3.90 ± 0.08 | 58.27 ± 0.17 |
![]() ![]() Sailor2 8B SAIL | 11.65 ± 0.13 | 24.38 ± 0.07 | 0.00 ± 0.00 | 48.76 ± 0.14 |
![]() ![]() Tulu 3 8B AI2 | 9.78 ± 0.13 | 21.28 ± 0.08 | 5.08 ± 0.14 | 37.48 ± 0.10 |
![]() ![]() MERaLiON 2 10B A*STAR | 9.71 ± 0.16 | 20.71 ± 0.10 | 6.09 ± 0.11 | 35.33 ± 0.15 |
![]() ![]() Babel 83B Alibaba-DAMO | 8.95 ± 0.22 | 6.64 ± 0.11 | 3.68 ± 0.16 | 9.61 ± 0.15 |
![]() ![]() Gemma 2 9B | 8.95 ± 0.17 | 20.04 ± 0.10 | 4.44 ± 0.15 | 35.64 ± 0.13 |
![]() ![]() Apertus 8B Swiss AI | 8.58 ± 0.21 | 21.91 ± 0.11 | 3.15 ± 0.16 | 40.68 ± 0.16 |
![]() ![]() Sailor2 20B SAIL | 8.55 ± 0.11 | 26.43 ± 0.07 | 0.00 ± 0.00 | 52.86 ± 0.15 |
![]() ![]() Llama 3.1 8B Meta | 7.07 ± 0.19 | 14.02 ± 0.13 | 7.35 ± 0.16 | 20.69 ± 0.14 |
![]() ![]() Qwen 2.5 7B Alibaba | 7.01 ± 0.15 | 8.13 ± 0.09 | 5.03 ± 0.12 | 11.22 ± 0.11 |
![]() ![]() Babel 9B Alibaba-DAMO | 7.00 ± 0.18 | 10.84 ± 0.12 | 4.17 ± 0.15 | 17.51 ± 0.22 |
![]() ![]() Aya Expanse 32B CohereLabs | 6.44 ± 0.15 | 10.33 ± 0.09 | 0.00 ± 0.00 | 20.66 ± 0.18 |
![]() ![]() SeaLLMs V3 7B Alibaba-DAMO | 6.21 ± 0.16 | 11.83 ± 0.10 | 4.79 ± 0.12 | 18.87 ± 0.15 |
![]() ![]() phi-4 14B Microsoft | 5.85 ± 0.18 | 9.29 ± 0.10 | 2.39 ± 0.13 | 16.19 ± 0.12 |
![]() ![]() Command R+ 08-2024 104B CohereLabs | 5.67 ± 0.18 | 14.38 ± 0.09 | 2.82 ± 0.10 | 25.95 ± 0.17 |
![]() ![]() Olmo 2 0325 32B AI2 | 4.38 ± 0.15 | 7.36 ± 0.06 | 0.00 ± 0.00 | 14.73 ± 0.12 |
![]() ![]() Command R 08-2024 32B CohereLabs | 3.91 ± 0.15 | 6.63 ± 0.09 | 3.51 ± 0.14 | 9.75 ± 0.11 |
![]() ![]() Ministral 2410 8B Mistral AI | 3.66 ± 0.13 | 14.10 ± 0.09 | 0.85 ± 0.09 | 27.35 ± 0.17 |
![]() ![]() Llama 3 8B Meta | 3.20 ± 0.06 | 11.47 ± 0.08 | 0.00 ± 0.00 | 22.95 ± 0.15 |
![]() ![]() Aya Expanse 8B CohereLabs | 3.03 ± 0.13 | 1.41 ± 0.03 | 0.00 ± 0.00 | 2.82 ± 0.06 |
![]() ![]() Command R7B 12-2024 7B CohereLabs | 2.65 ± 0.13 | 6.14 ± 0.10 | 2.54 ± 0.12 | 9.73 ± 0.14 |
![]() ![]() Olmo 2 1124 7B AI2 | 2.18 ± 0.12 | 3.22 ± 0.05 | 0.00 ± 0.00 | 6.44 ± 0.10 |
![]() ![]() Mistral Small 3.1 2503 24B Mistral AI | 2.02 ± 0.11 | 3.50 ± 0.08 | 0.65 ± 0.03 | 6.34 ± 0.14 |
![]() ![]() Olmo 2 1124 13B AI2 | 1.84 ± 0.09 | 0.52 ± 0.02 | 0.00 ± 0.00 | 1.04 ± 0.03 |
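In the table above, the NLG competency matches the unweighted mean of its two task columns, Summarization and Translations. This is an observation from the numbers rather than a documented rule, and the helper name `competency_score` is hypothetical. A short check on three rows:

```python
from statistics import fmean

# (reported NLG, Summarization, Translations), copied from the NLG table.
nlg_rows = {
    "SEA-LION v4 (Qwen) 32B": (34.84, 5.68, 64.00),
    "Llama 4 Scout 109B MoE": (44.41, 7.68, 81.15),
    "Sailor2 20B": (26.43, 0.00, 52.86),
}


def competency_score(task_scores):
    """Unweighted mean of a competency's task scores (assumed rule)."""
    return fmean(task_scores)


for name, (reported, summarization, translation) in nlg_rows.items():
    computed = competency_score([summarization, translation])
    # Tolerance covers rounding of the published two-decimal values.
    assert abs(computed - reported) < 0.01, name
    print(f"{name}: computed {computed:.3f}, reported {reported:.2f}")
```

Because only two tasks feed this competency, a model that scores 0.00 on Summarization (as Sailor2 20B does) has its NLG score capped at half of its Translations score.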
Model | MY | NLR | Causal Reasoning | Natural Language Inference |
---|---|---|---|---|
![]() ![]() SEA-LION v4 (Qwen) 32B AISG | 48.28 ± 0.14 | 61.75 ± 0.15 | 66.38 ± 0.20 | 57.13 ± 0.20 |
![]() ![]() Gemma 3 27B | 47.36 ± 0.17 | 55.45 ± 0.29 | 57.08 ± 0.51 | 53.82 ± 0.16 |
![]() ![]() SEA-LION v4 (Gemma) 27B AISG | 46.50 ± 0.16 | 55.03 ± 0.25 | 57.15 ± 0.47 | 52.92 ± 0.18 |
![]() ![]() Llama 4 Scout 109B MoE Meta | 44.27 ± 0.17 | 51.07 ± 0.18 | 70.40 ± 0.32 | 31.74 ± 0.21 |
![]() ![]() Qwen 3 Next 80B MoE Alibaba | 43.68 ± 0.16 | 53.02 ± 0.30 | 59.27 ± 0.60 | 46.77 ± 0.22 |
![]() ![]() Qwen 3 32B Alibaba | 43.18 ± 0.19 | 56.38 ± 0.29 | 60.68 ± 0.45 | 52.07 ± 0.34 |
![]() ![]() Gemma 3 12B | 41.42 ± 0.15 | 35.58 ± 0.41 | 44.93 ± 0.77 | 26.22 ± 0.16 |
![]() ![]() SEA-LION v3 (Llama) 70B AISG | 36.78 ± 0.28 | 49.46 ± 0.52 | 54.30 ± 0.91 | 44.63 ± 0.47 |
![]() ![]() Tulu 3 70B AI2 | 33.71 ± 0.17 | 50.27 ± 0.37 | 56.78 ± 0.51 | 43.75 ± 0.43 |
![]() ![]() Qwen 3 14B Alibaba | 31.19 ± 0.14 | 24.78 ± 0.33 | 31.75 ± 0.52 | 17.80 ± 0.30 |
![]() ![]() Qwen 2.5 72B Alibaba | 26.39 ± 0.20 | 9.00 ± 0.18 | 5.58 ± 0.22 | 12.41 ± 0.25 |
![]() ![]() Qwen 2.5 32B Alibaba | 25.74 ± 0.17 | 9.64 ± 0.38 | 13.68 ± 0.64 | 5.60 ± 0.47 |
![]() ![]() Qwen 3 30B MoE Alibaba | 25.55 ± 0.12 | 22.03 ± 0.24 | 0.00 ± 0.00 | 44.07 ± 0.47 |
![]() ![]() Qwen 3 8B Alibaba | 25.16 ± 0.20 | 16.93 ± 0.33 | 30.70 ± 0.65 | 3.16 ± 0.14 |
![]() ![]() Mistral Large 2411 123B Mistral AI | 25.10 ± 0.29 | 8.88 ± 0.74 | 7.00 ± 1.42 | 10.76 ± 0.40 |
![]() ![]() Gemma 2 27B | 22.81 ± 0.24 | 28.94 ± 0.56 | 25.78 ± 1.05 | 32.10 ± 0.27 |
![]() ![]() Llama 3.3 70B Meta | 21.72 ± 0.15 | 21.25 ± 0.16 | 0.00 ± 0.00 | 42.51 ± 0.31 |
![]() ![]() Llama 3.1 70B Meta | 18.55 ± 0.24 | 17.54 ± 0.35 | 0.00 ± 0.00 | 35.08 ± 0.71 |
![]() ![]() SEA-LION v3 (Llama) 8B AISG | 17.54 ± 0.26 | 10.73 ± 0.72 | 4.45 ± 1.29 | 17.01 ± 0.70 |
![]() ![]() ERNIE 4.5 21B MoE Baidu | 17.16 ± 0.15 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Command A 03-2025 111B CohereLabs | 16.20 ± 0.22 | 9.28 ± 0.29 | 0.00 ± 0.00 | 18.57 ± 0.58 |
![]() ![]() SEA-LION v3 (Gemma 2) 9B AISG | 14.66 ± 0.22 | 12.55 ± 0.18 | 0.00 ± 0.00 | 25.10 ± 0.37 |
![]() ![]() Llama 3 70B Meta | 13.09 ± 0.17 | 34.05 ± 0.39 | 32.92 ± 0.66 | 35.18 ± 0.43 |
![]() ![]() Qwen 2.5 14B Alibaba | 12.45 ± 0.19 | 2.18 ± 0.34 | 0.00 ± 0.00 | 4.36 ± 0.68 |
![]() ![]() Apertus 70B Swiss AI | 12.28 ± 0.25 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Sailor2 8B SAIL | 11.65 ± 0.13 | 21.87 ± 0.39 | 0.00 ± 0.00 | 43.74 ± 0.79 |
![]() ![]() Tulu 3 8B AI2 | 9.78 ± 0.13 | 0.68 ± 0.06 | 0.00 ± 0.00 | 1.35 ± 0.13 |
![]() ![]() MERaLiON 2 10B A*STAR | 9.71 ± 0.16 | 6.30 ± 0.15 | 0.00 ± 0.00 | 12.61 ± 0.31 |
![]() ![]() Babel 83B Alibaba-DAMO | 8.95 ± 0.22 | 0.20 ± 0.16 | 0.00 ± 0.00 | 0.40 ± 0.32 |
![]() ![]() Gemma 2 9B | 8.95 ± 0.17 | 4.58 ± 0.32 | 0.00 ± 0.00 | 9.16 ± 0.63 |
![]() ![]() Apertus 8B Swiss AI | 8.58 ± 0.21 | 0.13 ± 0.11 | 0.00 ± 0.00 | 0.25 ± 0.22 |
![]() ![]() Sailor2 20B SAIL | 8.55 ± 0.11 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Llama 3.1 8B Meta | 7.07 ± 0.19 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Qwen 2.5 7B Alibaba | 7.01 ± 0.15 | 2.98 ± 0.41 | 0.00 ± 0.00 | 5.97 ± 0.82 |
![]() ![]() Babel 9B Alibaba-DAMO | 7.00 ± 0.18 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Aya Expanse 32B CohereLabs | 6.44 ± 0.15 | 0.25 ± 0.17 | 0.25 ± 0.24 | 0.25 ± 0.22 |
![]() ![]() SeaLLMs V3 7B Alibaba-DAMO | 6.21 ± 0.16 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() phi-4 14B Microsoft | 5.85 ± 0.18 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Command R+ 08-2024 104B CohereLabs | 5.67 ± 0.18 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Olmo 2 0325 32B AI2 | 4.38 ± 0.15 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Command R 08-2024 32B CohereLabs | 3.91 ± 0.15 | 0.68 ± 0.34 | 0.00 ± 0.00 | 1.37 ± 0.69 |
![]() ![]() Ministral 2410 8B Mistral AI | 3.66 ± 0.13 | 0.15 ± 0.12 | 0.00 ± 0.00 | 0.29 ± 0.23 |
![]() ![]() Llama 3 8B Meta | 3.20 ± 0.06 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Aya Expanse 8B CohereLabs | 3.03 ± 0.13 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Command R7B 12-2024 7B CohereLabs | 2.65 ± 0.13 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Olmo 2 1124 7B AI2 | 2.18 ± 0.12 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Mistral Small 3.1 2503 24B Mistral AI | 2.02 ± 0.11 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Olmo 2 1124 13B AI2 | 1.84 ± 0.09 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Model | MY | NLU | Belebele QA | Sentiment Analysis |
---|---|---|---|---|
![]() ![]() SEA-LION v4 (Qwen) 32B AISG | 48.28 ± 0.14 | 61.01 ± 0.12 | 81.52 ± 0.22 | 40.49 ± 0.13 |
![]() ![]() Gemma 3 27B | 47.36 ± 0.17 | 58.49 ± 0.18 | 69.85 ± 0.31 | 47.13 ± 0.24 |
![]() ![]() SEA-LION v4 (Gemma) 27B AISG | 46.50 ± 0.16 | 58.33 ± 0.24 | 69.11 ± 0.41 | 47.56 ± 0.26 |
![]() ![]() Llama 4 Scout 109B MoE Meta | 44.27 ± 0.17 | 44.56 ± 0.24 | 75.93 ± 0.19 | 13.19 ± 0.43 |
![]() ![]() Qwen 3 Next 80B MoE Alibaba | 43.68 ± 0.16 | 58.39 ± 0.24 | 74.89 ± 0.43 | 41.88 ± 0.20 |
![]() ![]() Qwen 3 32B Alibaba | 43.18 ± 0.19 | 57.92 ± 0.39 | 74.48 ± 0.73 | 41.37 ± 0.31 |
![]() ![]() Gemma 3 12B | 41.42 ± 0.15 | 58.03 ± 0.23 | 67.11 ± 0.41 | 48.96 ± 0.22 |
![]() ![]() SEA-LION v3 (Llama) 70B AISG | 36.78 ± 0.28 | 36.65 ± 0.35 | 73.30 ± 0.70 | 0.00 ± 0.00 |
![]() ![]() Tulu 3 70B AI2 | 33.71 ± 0.17 | 50.68 ± 0.40 | 66.85 ± 0.68 | 34.50 ± 0.40 |
![]() ![]() Qwen 3 14B Alibaba | 31.19 ± 0.14 | 49.11 ± 0.30 | 61.26 ± 0.63 | 36.96 ± 0.23 |
![]() ![]() Qwen 2.5 72B Alibaba | 26.39 ± 0.20 | 44.24 ± 0.25 | 48.63 ± 0.45 | 39.85 ± 0.22 |
![]() ![]() Qwen 2.5 32B Alibaba | 25.74 ± 0.17 | 37.70 ± 0.41 | 43.04 ± 0.71 | 32.36 ± 0.33 |
![]() ![]() Qwen 3 30B MoE Alibaba | 25.55 ± 0.12 | 50.08 ± 0.24 | 69.04 ± 0.44 | 31.12 ± 0.29 |
![]() ![]() Qwen 3 8B Alibaba | 25.16 ± 0.20 | 36.93 ± 0.33 | 40.04 ± 0.73 | 33.83 ± 0.33 |
![]() ![]() Mistral Large 2411 123B Mistral AI | 25.10 ± 0.29 | 38.82 ± 0.68 | 41.41 ± 1.39 | 36.23 ± 0.41 |
![]() ![]() Gemma 2 27B | 22.81 ± 0.24 | 26.38 ± 0.91 | 31.30 ± 1.50 | 21.46 ± 0.87 |
![]() ![]() Llama 3.3 70B Meta | 21.72 ± 0.15 | 1.85 ± 0.36 | 3.70 ± 0.72 | 0.00 ± 0.00 |
![]() ![]() Llama 3.1 70B Meta | 18.55 ± 0.24 | 18.81 ± 0.65 | 37.63 ± 1.30 | 0.00 ± 0.00 |
![]() ![]() SEA-LION v3 (Llama) 8B AISG | 17.54 ± 0.26 | 29.80 ± 0.62 | 37.41 ± 0.81 | 22.19 ± 0.97 |
![]() ![]() ERNIE 4.5 21B MoE Baidu | 17.16 ± 0.15 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Command A 03-2025 111B CohereLabs | 16.20 ± 0.22 | 23.73 ± 0.65 | 30.78 ± 1.22 | 16.68 ± 0.75 |
![]() ![]() SEA-LION v3 (Gemma 2) 9B AISG | 14.66 ± 0.22 | 2.26 ± 0.67 | 4.52 ± 1.35 | 0.00 ± 0.00 |
![]() ![]() Llama 3 70B Meta | 13.09 ± 0.17 | 7.67 ± 0.56 | 15.33 ± 1.12 | 0.00 ± 0.00 |
![]() ![]() Qwen 2.5 14B Alibaba | 12.45 ± 0.19 | 16.98 ± 0.34 | 33.96 ± 0.68 | 0.00 ± 0.00 |
![]() ![]() Apertus 70B Swiss AI | 12.28 ± 0.25 | 10.89 ± 0.71 | 21.78 ± 1.43 | 0.00 ± 0.00 |
![]() ![]() Sailor2 8B SAIL | 11.65 ± 0.13 | 0.13 ± 0.13 | 0.26 ± 0.27 | 0.00 ± 0.00 |
![]() ![]() Tulu 3 8B AI2 | 9.78 ± 0.13 | 2.94 ± 0.32 | 5.89 ± 0.64 | 0.00 ± 0.00 |
![]() ![]() MERaLiON 2 10B A*STAR | 9.71 ± 0.16 | 0.03 ± 0.07 | 0.00 ± 0.00 | 0.07 ± 0.13 |
![]() ![]() Babel 83B Alibaba-DAMO | 8.95 ± 0.22 | 19.00 ± 1.16 | 29.89 ± 2.08 | 8.11 ± 1.02 |
![]() ![]() Gemma 2 9B | 8.95 ± 0.17 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Apertus 8B Swiss AI | 8.58 ± 0.21 | 3.35 ± 0.74 | 3.78 ± 1.27 | 2.92 ± 0.74 |
![]() ![]() Sailor2 20B SAIL | 8.55 ± 0.11 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Llama 3.1 8B Meta | 7.07 ± 0.19 | 3.74 ± 0.51 | 7.48 ± 1.02 | 0.00 ± 0.00 |
![]() ![]() Qwen 2.5 7B Alibaba | 7.01 ± 0.15 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Babel 9B Alibaba-DAMO | 7.00 ± 0.18 | 11.20 ± 0.31 | 0.00 ± 0.00 | 22.39 ± 0.63 |
![]() ![]() Aya Expanse 32B CohereLabs | 6.44 ± 0.15 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() SeaLLMs V3 7B Alibaba-DAMO | 6.21 ± 0.16 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() phi-4 14B Microsoft | 5.85 ± 0.18 | 8.89 ± 0.35 | 0.00 ± 0.00 | 17.77 ± 0.69 |
![]() ![]() Command R+ 08-2024 104B CohereLabs | 5.67 ± 0.18 | 0.91 ± 0.34 | 0.00 ± 0.00 | 1.83 ± 0.68 |
![]() ![]() Olmo 2 0325 32B AI2 | 4.38 ± 0.15 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Command R 08-2024 32B CohereLabs | 3.91 ± 0.15 | 0.54 ± 0.31 | 1.07 ± 0.61 | 0.00 ± 0.00 |
![]() ![]() Ministral 2410 8B Mistral AI | 3.66 ± 0.13 | 0.03 ± 0.05 | 0.00 ± 0.00 | 0.05 ± 0.10 |
![]() ![]() Llama 3 8B Meta | 3.20 ± 0.06 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Aya Expanse 8B CohereLabs | 3.03 ± 0.13 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Command R7B 12-2024 7B CohereLabs | 2.65 ± 0.13 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Olmo 2 1124 7B AI2 | 2.18 ± 0.12 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Mistral Small 3.1 2503 24B Mistral AI | 2.02 ± 0.11 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Olmo 2 1124 13B AI2 | 1.84 ± 0.09 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |

Model | MY | Safety | Toxicity Detection |
---|---|---|---|
SEA-LION v4 (Qwen) 32B AISG | 48.28 ± 0.14 | 53.48 ± 0.31 | 53.48 ± 0.31 |
Gemma 3 27B | 47.36 ± 0.17 | 34.35 ± 0.21 | 34.35 ± 0.21 |
SEA-LION v4 (Gemma) 27B AISG | 46.50 ± 0.16 | 33.82 ± 0.29 | 33.82 ± 0.29 |
Llama 4 Scout 109B MoE Meta | 44.27 ± 0.17 | 45.38 ± 0.20 | 45.38 ± 0.20 |
Qwen 3 Next 80B MoE Alibaba | 43.68 ± 0.16 | 37.85 ± 0.34 | 37.85 ± 0.34 |
Qwen 3 32B Alibaba | 43.18 ± 0.19 | 43.53 ± 0.56 | 43.53 ± 0.56 |
Gemma 3 12B | 41.42 ± 0.15 | 40.15 ± 0.30 | 40.15 ± 0.30 |
SEA-LION v3 (Llama) 70B AISG | 36.78 ± 0.28 | 20.88 ± 1.13 | 20.88 ± 1.13 |
Tulu 3 70B AI2 | 33.71 ± 0.17 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Qwen 3 14B Alibaba | 31.19 ± 0.14 | 33.15 ± 0.35 | 33.15 ± 0.35 |
Qwen 2.5 72B Alibaba | 26.39 ± 0.20 | 21.88 ± 0.48 | 21.88 ± 0.48 |
Qwen 2.5 32B Alibaba | 25.74 ± 0.17 | 25.07 ± 0.37 | 25.07 ± 0.37 |
Qwen 3 30B MoE Alibaba | 25.55 ± 0.12 | 22.18 ± 0.22 | 22.18 ± 0.22 |
Qwen 3 8B Alibaba | 25.16 ± 0.20 | 31.78 ± 0.31 | 31.78 ± 0.31 |
Mistral Large 2411 123B Mistral AI | 25.10 ± 0.29 | 30.02 ± 1.52 | 30.02 ± 1.52 |
Gemma 2 27B | 22.81 ± 0.24 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Llama 3.3 70B Meta | 21.72 ± 0.15 | 22.03 ± 0.67 | 22.03 ± 0.67 |
Llama 3.1 70B Meta | 18.55 ± 0.24 | 0.00 ± 0.00 | 0.00 ± 0.00 |
SEA-LION v3 (Llama) 8B AISG | 17.54 ± 0.26 | 7.25 ± 1.14 | 7.25 ± 1.14 |
ERNIE 4.5 21B MoE Baidu | 17.16 ± 0.15 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Command A 03-2025 111B CohereLabs | 16.20 ± 0.22 | 0.00 ± 0.00 | 0.00 ± 0.00 |
SEA-LION v3 (Gemma 2) 9B AISG | 14.66 ± 0.22 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Llama 3 70B Meta | 13.09 ± 0.17 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Qwen 2.5 14B Alibaba | 12.45 ± 0.19 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Apertus 70B Swiss AI | 12.28 ± 0.25 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Sailor2 8B SAIL | 11.65 ± 0.13 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Tulu 3 8B AI2 | 9.78 ± 0.13 | 0.00 ± 0.00 | 0.00 ± 0.00 |
MERaLiON 2 10B A*STAR | 9.71 ± 0.16 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Babel 83B Alibaba-DAMO | 8.95 ± 0.22 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Gemma 2 9B | 8.95 ± 0.17 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Apertus 8B Swiss AI | 8.58 ± 0.21 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Sailor2 20B SAIL | 8.55 ± 0.11 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Llama 3.1 8B Meta | 7.07 ± 0.19 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Qwen 2.5 7B Alibaba | 7.01 ± 0.15 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Babel 9B Alibaba-DAMO | 7.00 ± 0.18 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Aya Expanse 32B CohereLabs | 6.44 ± 0.15 | 0.00 ± 0.00 | 0.00 ± 0.00 |
SeaLLMs V3 7B Alibaba-DAMO | 6.21 ± 0.16 | 0.00 ± 0.00 | 0.00 ± 0.00 |
phi-4 14B Microsoft | 5.85 ± 0.18 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Command R+ 08-2024 104B CohereLabs | 5.67 ± 0.18 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Olmo 2 0325 32B AI2 | 4.38 ± 0.15 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Command R 08-2024 32B CohereLabs | 3.91 ± 0.15 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Ministral 2410 8B Mistral AI | 3.66 ± 0.13 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Llama 3 8B Meta | 3.20 ± 0.06 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Aya Expanse 8B CohereLabs | 3.03 ± 0.13 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Command R7B 12-2024 7B CohereLabs | 2.65 ± 0.13 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Olmo 2 1124 7B AI2 | 2.18 ± 0.12 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Mistral Small 3.1 2503 24B Mistral AI | 2.02 ± 0.11 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Olmo 2 1124 13B AI2 | 1.84 ± 0.09 | 0.00 ± 0.00 | 0.00 ± 0.00 |

Model | MY | Knowledge | Global MMLU Lite |
---|---|---|---|
SEA-LION v4 (Qwen) 32B AISG | 48.28 ± 0.14 | 49.98 ± 0.18 | 49.98 ± 0.18 |
Gemma 3 27B | 47.36 ± 0.17 | 45.09 ± 0.25 | 45.09 ± 0.25 |
SEA-LION v4 (Gemma) 27B AISG | 46.50 ± 0.16 | 44.92 ± 0.36 | 44.92 ± 0.36 |
Llama 4 Scout 109B MoE Meta | 44.27 ± 0.17 | 48.08 ± 0.22 | 48.08 ± 0.22 |
Qwen 3 Next 80B MoE Alibaba | 43.68 ± 0.16 | 44.72 ± 0.38 | 44.72 ± 0.38 |
Qwen 3 32B Alibaba | 43.18 ± 0.19 | 41.41 ± 0.48 | 41.41 ± 0.48 |
Gemma 3 12B | 41.42 ± 0.15 | 36.75 ± 0.31 | 36.75 ± 0.31 |
SEA-LION v3 (Llama) 70B AISG | 36.78 ± 0.28 | 34.19 ± 0.70 | 34.19 ± 0.70 |
Tulu 3 70B AI2 | 33.71 ± 0.17 | 37.05 ± 0.37 | 37.05 ± 0.37 |
Qwen 3 14B Alibaba | 31.19 ± 0.14 | 27.40 ± 0.43 | 27.40 ± 0.43 |
Qwen 2.5 72B Alibaba | 26.39 ± 0.20 | 25.00 ± 0.38 | 25.00 ± 0.38 |
Qwen 2.5 32B Alibaba | 25.74 ± 0.17 | 27.66 ± 0.57 | 27.66 ± 0.57 |
Qwen 3 30B MoE Alibaba | 25.55 ± 0.12 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Qwen 3 8B Alibaba | 25.16 ± 0.20 | 16.75 ± 0.47 | 16.75 ± 0.47 |
Mistral Large 2411 123B Mistral AI | 25.10 ± 0.29 | 12.36 ± 0.59 | 12.36 ± 0.59 |
Gemma 2 27B | 22.81 ± 0.24 | 23.70 ± 0.52 | 23.70 ± 0.52 |
Llama 3.3 70B Meta | 21.72 ± 0.15 | 4.80 ± 0.22 | 4.80 ± 0.22 |
Llama 3.1 70B Meta | 18.55 ± 0.24 | 5.34 ± 0.45 | 5.34 ± 0.45 |
SEA-LION v3 (Llama) 8B AISG | 17.54 ± 0.26 | 14.03 ± 0.57 | 14.03 ± 0.57 |
ERNIE 4.5 21B MoE Baidu | 17.16 ± 0.15 | 20.91 ± 0.41 | 20.91 ± 0.41 |
Command A 03-2025 111B CohereLabs | 16.20 ± 0.22 | 6.21 ± 0.45 | 6.21 ± 0.45 |
SEA-LION v3 (Gemma 2) 9B AISG | 14.66 ± 0.22 | 0.06 ± 0.05 | 0.06 ± 0.05 |
Llama 3 70B Meta | 13.09 ± 0.17 | 8.48 ± 0.40 | 8.48 ± 0.40 |
Qwen 2.5 14B Alibaba | 12.45 ± 0.19 | 0.36 ± 0.07 | 0.36 ± 0.07 |
Apertus 70B Swiss AI | 12.28 ± 0.25 | 0.68 ± 0.12 | 0.68 ± 0.12 |
Sailor2 8B SAIL | 11.65 ± 0.13 | 0.04 ± 0.03 | 0.04 ± 0.03 |
Tulu 3 8B AI2 | 9.78 ± 0.13 | 3.77 ± 0.26 | 3.77 ± 0.26 |
MERaLiON 2 10B A*STAR | 9.71 ± 0.16 | 1.20 ± 0.23 | 1.20 ± 0.23 |
Babel 83B Alibaba-DAMO | 8.95 ± 0.22 | 11.23 ± 0.70 | 11.23 ± 0.70 |
Gemma 2 9B | 8.95 ± 0.17 | 0.54 ± 0.10 | 0.54 ± 0.10 |
Apertus 8B Swiss AI | 8.58 ± 0.21 | 2.72 ± 0.29 | 2.72 ± 0.29 |
Sailor2 20B SAIL | 8.55 ± 0.11 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Llama 3.1 8B Meta | 7.07 ± 0.19 | 2.32 ± 0.25 | 2.32 ± 0.25 |
Qwen 2.5 7B Alibaba | 7.01 ± 0.15 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Babel 9B Alibaba-DAMO | 7.00 ± 0.18 | 0.04 ± 0.03 | 0.04 ± 0.03 |
Aya Expanse 32B CohereLabs | 6.44 ± 0.15 | 3.30 ± 0.27 | 3.30 ± 0.27 |
SeaLLMs V3 7B Alibaba-DAMO | 6.21 ± 0.16 | 0.19 ± 0.06 | 0.19 ± 0.06 |
phi-4 14B Microsoft | 5.85 ± 0.18 | 0.01 ± 0.02 | 0.01 ± 0.02 |
Command R+ 08-2024 104B CohereLabs | 5.67 ± 0.18 | 0.73 ± 0.15 | 0.73 ± 0.15 |
Olmo 2 0325 32B AI2 | 4.38 ± 0.15 | 0.07 ± 0.04 | 0.07 ± 0.04 |
Command R 08-2024 32B CohereLabs | 3.91 ± 0.15 | 1.33 ± 0.19 | 1.33 ± 0.19 |
Ministral 2410 8B Mistral AI | 3.66 ± 0.13 | 0.16 ± 0.09 | 0.16 ± 0.09 |
Llama 3 8B Meta | 3.20 ± 0.06 | 0.00 ± 0.01 | 0.00 ± 0.01 |
Aya Expanse 8B CohereLabs | 3.03 ± 0.13 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Command R7B 12-2024 7B CohereLabs | 2.65 ± 0.13 | 0.05 ± 0.03 | 0.05 ± 0.03 |
Olmo 2 1124 7B AI2 | 2.18 ± 0.12 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Mistral Small 3.1 2503 24B Mistral AI | 2.02 ± 0.11 | 0.10 ± 0.05 | 0.10 ± 0.05 |
Olmo 2 1124 13B AI2 | 1.84 ± 0.09 | 0.03 ± 0.03 | 0.03 ± 0.03 |