Vietnamese Performance
Vietnamese Scores by Model
Average of 30 bootstraps; 95% CIs are shown.
Model Size: ≤200B
Open instruct models only
(Bar chart: overall Vietnamese (VI) scores by model, ranked from 65.68 ± 0.10 down to 15.57 ± 0.22; the same values, with model names, appear in the VI column of the tables below.)
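Every score in this section is reported as a bootstrap mean with a 95% confidence interval. As a rough illustration of how such numbers can be produced, here is a minimal sketch assuming per-example scores, 30 resamples, and a percentile interval; the function name, the synthetic `scores` data, and reporting the CI as a symmetric ± half-width are illustrative assumptions, not the leaderboard's published pipeline.

```python
import numpy as np

def bootstrap_mean_ci(scores, n_boot=30, ci=0.95, seed=0):
    """Bootstrap mean and 95% percentile CI over per-example scores.

    Illustrative only: the leaderboard reports an average of 30
    bootstraps with a 95% CI, but its exact procedure is not shown here.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    boot_means = np.array([
        rng.choice(scores, size=scores.size, replace=True).mean()
        for _ in range(n_boot)
    ])
    lo, hi = np.percentile(boot_means, [100 * (1 - ci) / 2, 100 * (1 + ci) / 2])
    half_width = (hi - lo) / 2  # assumption: "±" shown as half the CI width
    return boot_means.mean(), half_width

# Toy example with 500 synthetic per-example scores in [0, 100].
mean, pm = bootstrap_mean_ci(np.random.default_rng(1).uniform(0, 100, 500))
print(f"{mean:.2f} ± {pm:.2f}")
```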
Vietnamese Competencies
Average of 30 bootstraps; 95% CIs are shown.
Model Size: ≤200B
Open instruct models only
Model | VI | Instruction Following | Multi-Turn Chat | NLG | NLR | NLU | Safety | Knowledge |
---|---|---|---|---|---|---|---|---|
![]() ![]() Qwen 3 Next 80B MoE Alibaba | 65.68 ± 0.10 | 90.89 ± 0.45 | 68.35 ± 0.57 | 54.15 ± 0.04 | 68.70 ± 0.09 | 68.36 ± 0.08 | 40.69 ± 0.20 | 68.61 ± 0.26 |
![]() ![]() Qwen 3 30B MoE Alibaba | 65.56 ± 0.14 | 87.94 ± 0.70 | 63.19 ± 0.55 | 53.48 ± 0.04 | 70.29 ± 0.10 | 66.79 ± 0.11 | 50.36 ± 0.24 | 66.90 ± 0.18 |
![]() ![]() SEA-LION v4 (Qwen) 32B AISG | 62.63 ± 0.14 | 87.56 ± 0.43 | 48.03 ± 0.67 | 54.87 ± 0.05 | 69.99 ± 0.10 | 63.75 ± 0.23 | 46.95 ± 0.31 | 67.26 ± 0.20 |
![]() ![]() SEA-LION v3 (Llama) 70B AISG | 62.37 ± 0.23 | 92.19 ± 0.51 | 31.01 ± 0.68 | 56.26 ± 0.07 | 71.08 ± 0.29 | 69.58 ± 0.32 | 44.67 ± 0.92 | 71.83 ± 0.40 |
![]() ![]() Qwen 3 32B Alibaba | 62.19 ± 0.12 | 86.95 ± 0.51 | 50.69 ± 0.75 | 54.88 ± 0.06 | 68.79 ± 0.17 | 64.35 ± 0.16 | 43.42 ± 0.41 | 66.22 ± 0.23 |
![]() ![]() Command A 03-2025 111B CohereLabs | 61.39 ± 0.19 | 85.65 ± 0.72 | 42.66 ± 0.66 | 54.40 ± 0.06 | 63.84 ± 0.28 | 65.83 ± 0.30 | 53.41 ± 0.60 | 63.95 ± 0.53 |
![]() ![]() Qwen 2.5 72B Alibaba | 60.70 ± 0.13 | 89.30 ± 0.52 | 32.14 ± 0.52 | 54.28 ± 0.05 | 59.14 ± 0.17 | 69.63 ± 0.19 | 50.90 ± 0.36 | 69.50 ± 0.28 |
![]() ![]() SEA-LION v4 (Gemma) 27B AISG | 60.26 ± 0.17 | 85.11 ± 0.56 | 45.34 ± 0.78 | 52.62 ± 0.05 | 69.93 ± 0.17 | 62.61 ± 0.27 | 41.78 ± 0.53 | 64.41 ± 0.35 |
![]() ![]() Gemma 3 27B | 60.06 ± 0.17 | 85.75 ± 0.73 | 44.09 ± 0.60 | 51.66 ± 0.05 | 70.42 ± 0.10 | 62.28 ± 0.19 | 40.51 ± 0.45 | 65.68 ± 0.28 |
![]() ![]() Llama 3.3 70B Meta | 59.92 ± 0.11 | 92.63 ± 0.39 | 18.51 ± 0.58 | 54.22 ± 0.06 | 74.80 ± 0.12 | 66.09 ± 0.19 | 45.99 ± 0.51 | 67.23 ± 0.17 |
![]() ![]() Tulu 3 70B AI2 | 59.74 ± 0.20 | 83.94 ± 0.69 | 28.19 ± 0.74 | 57.63 ± 0.10 | 73.01 ± 0.16 | 69.43 ± 0.36 | 43.20 ± 0.74 | 62.77 ± 0.39 |
![]() ![]() Gemma 3 12B | 59.07 ± 0.15 | 86.10 ± 0.58 | 39.70 ± 0.60 | 53.47 ± 0.04 | 73.45 ± 0.10 | 63.47 ± 0.23 | 40.86 ± 0.50 | 56.44 ± 0.25 |
![]() ![]() Qwen 3 14B Alibaba | 59.03 ± 0.14 | 89.78 ± 0.50 | 42.04 ± 0.62 | 54.85 ± 0.05 | 63.77 ± 0.12 | 67.11 ± 0.17 | 41.12 ± 0.36 | 54.54 ± 0.27 |
![]() ![]() Llama 4 Scout 109B MoE Meta | 57.78 ± 0.09 | 91.02 ± 0.44 | 22.51 ± 0.55 | 54.56 ± 0.04 | 55.29 ± 0.09 | 67.35 ± 0.10 | 48.33 ± 0.20 | 65.42 ± 0.13 |
![]() ![]() Qwen 2.5 32B Alibaba | 57.15 ± 0.15 | 87.49 ± 0.54 | 19.68 ± 0.56 | 52.29 ± 0.05 | 63.26 ± 0.07 | 69.27 ± 0.15 | 47.74 ± 0.30 | 60.30 ± 0.24 |
![]() ![]() SEA-LION v3 (Gemma 2) 9B AISG | 56.15 ± 0.20 | 84.03 ± 0.88 | 27.83 ± 0.49 | 54.56 ± 0.06 | 63.66 ± 0.16 | 70.29 ± 0.26 | 40.68 ± 0.45 | 51.98 ± 0.40 |
![]() ![]() Llama 3.1 70B Meta | 56.10 ± 0.17 | 81.49 ± 0.78 | 14.89 ± 0.57 | 56.57 ± 0.11 | 72.22 ± 0.25 | 61.93 ± 0.29 | 40.65 ± 0.73 | 64.98 ± 0.42 |
![]() ![]() Gemma 2 27B | 55.33 ± 0.18 | 80.35 ± 0.83 | 16.28 ± 0.60 | 54.88 ± 0.08 | 70.42 ± 0.16 | 69.10 ± 0.36 | 39.44 ± 0.33 | 56.81 ± 0.39 |
![]() ![]() Qwen 3 8B Alibaba | 54.80 ± 0.15 | 85.05 ± 0.68 | 35.29 ± 0.61 | 53.95 ± 0.04 | 57.02 ± 0.18 | 70.44 ± 0.22 | 31.39 ± 0.24 | 50.49 ± 0.30 |
![]() ![]() Aya Expanse 32B CohereLabs | 54.04 ± 0.15 | 77.84 ± 0.59 | 24.57 ± 0.53 | 54.86 ± 0.04 | 64.46 ± 0.13 | 61.73 ± 0.25 | 45.98 ± 0.35 | 48.84 ± 0.30 |
![]() ![]() Qwen 2.5 14B Alibaba | 52.22 ± 0.15 | 83.75 ± 0.79 | 18.68 ± 0.57 | 50.72 ± 0.06 | 53.92 ± 0.12 | 62.43 ± 0.14 | 42.51 ± 0.29 | 53.53 ± 0.11 |
![]() ![]() Mistral Large 2411 123B Mistral AI | 52.21 ± 0.20 | 85.14 ± 0.73 | 18.43 ± 0.46 | 54.39 ± 0.08 | 60.21 ± 0.37 | 64.32 ± 0.47 | 28.39 ± 1.13 | 54.58 ± 0.61 |
![]() ![]() Gemma 2 9B | 51.44 ± 0.17 | 76.35 ± 1.06 | 12.83 ± 0.51 | 54.10 ± 0.07 | 55.52 ± 0.16 | 68.93 ± 0.31 | 41.37 ± 0.38 | 50.98 ± 0.26 |
![]() ![]() MERaLiON 2 10B A*STAR | 49.85 ± 0.20 | 75.94 ± 1.04 | 10.36 ± 0.36 | 53.56 ± 0.09 | 55.54 ± 0.16 | 66.91 ± 0.31 | 38.16 ± 0.39 | 48.46 ± 0.42 |
![]() ![]() SEA-LION v3 (Llama) 8B AISG | 49.00 ± 0.24 | 79.87 ± 0.81 | 24.64 ± 0.57 | 54.03 ± 0.06 | 52.92 ± 0.44 | 66.99 ± 0.27 | 21.65 ± 0.93 | 42.89 ± 0.41 |
![]() ![]() Command R+ 08-2024 104B CohereLabs | 48.97 ± 0.21 | 68.48 ± 1.00 | 10.22 ± 0.40 | 53.99 ± 0.08 | 56.57 ± 0.48 | 67.25 ± 0.36 | 40.47 ± 1.10 | 45.84 ± 0.67 |
![]() ![]() Qwen 2.5 7B Alibaba | 47.19 ± 0.13 | 74.03 ± 0.75 | 15.39 ± 0.49 | 43.44 ± 0.02 | 53.85 ± 0.09 | 60.24 ± 0.21 | 32.80 ± 0.25 | 50.55 ± 0.17 |
![]() ![]() Mistral Small 3.1 2503 24B Mistral AI | 46.52 ± 0.24 | 71.75 ± 0.86 | 10.45 ± 0.57 | 48.97 ± 0.12 | 56.02 ± 0.29 | 57.09 ± 0.59 | 29.03 ± 0.96 | 52.36 ± 0.61 |
![]() ![]() Aya Expanse 8B CohereLabs | 45.78 ± 0.17 | 65.81 ± 0.77 | 19.07 ± 0.53 | 52.28 ± 0.05 | 54.00 ± 0.19 | 60.93 ± 0.24 | 26.16 ± 0.23 | 42.22 ± 0.29 |
![]() ![]() Llama 3 70B Meta | 45.09 ± 0.11 | 18.67 ± 0.61 | 8.97 ± 0.45 | 55.53 ± 0.06 | 68.64 ± 0.09 | 67.22 ± 0.15 | 38.89 ± 0.57 | 57.75 ± 0.17 |
![]() ![]() Tulu 3 8B AI2 | 43.13 ± 0.23 | 81.90 ± 0.76 | 13.61 ± 0.59 | 36.37 ± 0.08 | 53.90 ± 0.33 | 61.39 ± 0.26 | 20.67 ± 0.90 | 34.11 ± 0.40 |
![]() ![]() Command R 08-2024 32B CohereLabs | 42.34 ± 0.25 | 61.11 ± 1.24 | 6.31 ± 0.32 | 53.43 ± 0.08 | 52.14 ± 0.39 | 49.82 ± 0.43 | 31.44 ± 1.09 | 42.11 ± 0.62 |
![]() ![]() ERNIE 4.5 21B MoE Baidu | 40.86 ± 0.22 | 75.43 ± 0.73 | 17.39 ± 0.46 | 48.73 ± 0.08 | 38.68 ± 0.59 | 52.29 ± 0.48 | 19.33 ± 0.70 | 34.17 ± 0.44 |
![]() ![]() Sailor2 8B SAIL | 40.67 ± 0.17 | 42.83 ± 0.70 | 27.63 ± 0.60 | 49.95 ± 0.06 | 65.08 ± 0.31 | 47.41 ± 0.45 | 23.43 ± 0.64 | 28.40 ± 0.54 |
![]() ![]() Llama 3.1 8B Meta | 38.83 ± 0.20 | 71.46 ± 0.87 | 10.88 ± 0.43 | 52.49 ± 0.09 | 19.22 ± 0.32 | 64.78 ± 0.27 | 20.52 ± 0.84 | 32.48 ± 0.54 |
![]() ![]() phi-4 14B Microsoft | 38.42 ± 0.23 | 66.83 ± 1.06 | 19.89 ± 0.44 | 44.79 ± 0.08 | 52.45 ± 0.36 | 33.32 ± 0.46 | 0.00 ± 0.00 | 51.68 ± 0.61 |
![]() ![]() Apertus 70B Swiss AI | 38.12 ± 0.25 | 69.84 ± 1.29 | 17.50 ± 0.69 | 49.84 ± 0.07 | 25.64 ± 0.73 | 60.35 ± 0.52 | 30.10 ± 0.69 | 13.59 ± 0.53 |
![]() ![]() Babel 83B Alibaba-DAMO | 37.19 ± 0.32 | 45.30 ± 1.44 | 8.36 ± 0.63 | 44.23 ± 0.12 | 43.64 ± 0.57 | 51.28 ± 0.58 | 25.73 ± 1.40 | 41.79 ± 0.83 |
![]() ![]() Olmo 2 0325 32B AI2 | 36.77 ± 0.27 | 69.78 ± 0.87 | 6.15 ± 0.40 | 35.78 ± 0.12 | 27.04 ± 0.47 | 56.76 ± 0.60 | 23.19 ± 0.63 | 38.69 ± 0.63 |
![]() ![]() Llama 3 8B Meta | 35.81 ± 0.15 | 21.81 ± 0.60 | 6.32 ± 0.42 | 52.42 ± 0.08 | 42.24 ± 0.31 | 63.78 ± 0.40 | 28.29 ± 0.66 | 35.79 ± 0.40 |
![]() ![]() Apertus 8B Swiss AI | 34.32 ± 0.34 | 69.24 ± 1.13 | 5.55 ± 0.43 | 48.04 ± 0.13 | 23.46 ± 0.65 | 43.33 ± 0.51 | 20.12 ± 1.06 | 30.49 ± 0.77 |
![]() ![]() Babel 9B Alibaba-DAMO | 34.19 ± 0.20 | 42.92 ± 1.07 | 4.97 ± 0.49 | 38.80 ± 0.14 | 41.08 ± 0.53 | 64.36 ± 0.38 | 28.83 ± 0.84 | 18.35 ± 0.62 |
![]() ![]() SeaLLMs V3 7B Alibaba-DAMO | 30.44 ± 0.23 | 51.81 ± 1.00 | 10.76 ± 0.44 | 41.80 ± 0.14 | 31.90 ± 0.62 | 46.80 ± 0.60 | 0.00 ± 0.00 | 30.03 ± 0.88 |
![]() ![]() Sailor2 20B SAIL | 29.04 ± 0.18 | 45.65 ± 0.93 | 25.43 ± 0.60 | 34.92 ± 0.09 | 42.78 ± 0.09 | 20.94 ± 0.18 | 0.00 ± 0.00 | 33.56 ± 0.27 |
![]() ![]() Ministral 2410 8B Mistral AI | 25.51 ± 0.29 | 36.83 ± 1.30 | 4.05 ± 0.38 | 32.12 ± 0.14 | 18.05 ± 0.68 | 47.17 ± 0.69 | 15.63 ± 1.35 | 24.74 ± 0.83 |
![]() ![]() Command R7B 12-2024 7B CohereLabs | 24.98 ± 0.31 | 52.98 ± 1.67 | 2.51 ± 0.27 | 33.78 ± 0.13 | 19.76 ± 0.65 | 40.29 ± 0.57 | 9.54 ± 1.16 | 16.04 ± 0.58 |
![]() ![]() Olmo 2 1124 13B AI2 | 23.71 ± 0.25 | 62.57 ± 0.83 | 3.13 ± 0.31 | 32.62 ± 0.08 | 12.21 ± 0.57 | 37.31 ± 0.40 | 9.11 ± 0.75 | 8.99 ± 0.38 |
![]() ![]() Olmo 2 1124 7B AI2 | 15.57 ± 0.22 | 49.90 ± 1.09 | 2.43 ± 0.26 | 28.57 ± 0.08 | 0.30 ± 0.17 | 19.44 ± 0.30 | 2.02 ± 0.57 | 6.30 ± 0.55 |
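The reported numbers are consistent with the overall VI score being the unweighted mean of the seven competency scores (for example, the Qwen 3 Next 80B MoE competencies average to 65.68). A quick check under that assumed aggregation rule, with values copied from the table above:

```python
# Assumed aggregation: overall VI = unweighted mean of the seven competencies.
# Values copied from the Qwen 3 Next 80B MoE row above.
competencies = {
    "Instruction Following": 90.89,
    "Multi-Turn Chat": 68.35,
    "NLG": 54.15,
    "NLR": 68.70,
    "NLU": 68.36,
    "Safety": 40.69,
    "Knowledge": 68.61,
}
vi = sum(competencies.values()) / len(competencies)
print(round(vi, 2))  # 65.68 -> matches the reported VI of 65.68 ± 0.10
```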
Vietnamese Tasks
Average of 30 bootstraps; 95% CIs are shown.
Model Size: ≤200B
Open instruct models only
Model | VI | Instruction Following | SEA-IFEval |
---|---|---|---|
![]() ![]() Qwen 3 Next 80B MoE Alibaba | 65.68 ± 0.10 | 90.89 ± 0.45 | 90.89 ± 0.45 |
![]() ![]() Qwen 3 30B MoE Alibaba | 65.56 ± 0.14 | 87.94 ± 0.70 | 87.94 ± 0.70 |
![]() ![]() SEA-LION v4 (Qwen) 32B AISG | 62.63 ± 0.14 | 87.56 ± 0.43 | 87.56 ± 0.43 |
![]() ![]() SEA-LION v3 (Llama) 70B AISG | 62.37 ± 0.23 | 92.19 ± 0.51 | 92.19 ± 0.51 |
![]() ![]() Qwen 3 32B Alibaba | 62.19 ± 0.12 | 86.95 ± 0.51 | 86.95 ± 0.51 |
![]() ![]() Command A 03-2025 111B CohereLabs | 61.39 ± 0.19 | 85.65 ± 0.72 | 85.65 ± 0.72 |
![]() ![]() Qwen 2.5 72B Alibaba | 60.70 ± 0.13 | 89.30 ± 0.52 | 89.30 ± 0.52 |
![]() ![]() SEA-LION v4 (Gemma) 27B AISG | 60.26 ± 0.17 | 85.11 ± 0.56 | 85.11 ± 0.56 |
![]() ![]() Gemma 3 27B | 60.06 ± 0.17 | 85.75 ± 0.73 | 85.75 ± 0.73 |
![]() ![]() Llama 3.3 70B Meta | 59.92 ± 0.11 | 92.63 ± 0.39 | 92.63 ± 0.39 |
![]() ![]() Tulu 3 70B AI2 | 59.74 ± 0.20 | 83.94 ± 0.69 | 83.94 ± 0.69 |
![]() ![]() Gemma 3 12B | 59.07 ± 0.15 | 86.10 ± 0.58 | 86.10 ± 0.58 |
![]() ![]() Qwen 3 14B Alibaba | 59.03 ± 0.14 | 89.78 ± 0.50 | 89.78 ± 0.50 |
![]() ![]() Llama 4 Scout 109B MoE Meta | 57.78 ± 0.09 | 91.02 ± 0.44 | 91.02 ± 0.44 |
![]() ![]() Qwen 2.5 32B Alibaba | 57.15 ± 0.15 | 87.49 ± 0.54 | 87.49 ± 0.54 |
![]() ![]() SEA-LION v3 (Gemma 2) 9B AISG | 56.15 ± 0.20 | 84.03 ± 0.88 | 84.03 ± 0.88 |
![]() ![]() Llama 3.1 70B Meta | 56.10 ± 0.17 | 81.49 ± 0.78 | 81.49 ± 0.78 |
![]() ![]() Gemma 2 27B | 55.33 ± 0.18 | 80.35 ± 0.83 | 80.35 ± 0.83 |
![]() ![]() Qwen 3 8B Alibaba | 54.80 ± 0.15 | 85.05 ± 0.68 | 85.05 ± 0.68 |
![]() ![]() Aya Expanse 32B CohereLabs | 54.04 ± 0.15 | 77.84 ± 0.59 | 77.84 ± 0.59 |
![]() ![]() Qwen 2.5 14B Alibaba | 52.22 ± 0.15 | 83.75 ± 0.79 | 83.75 ± 0.79 |
![]() ![]() Mistral Large 2411 123B Mistral AI | 52.21 ± 0.20 | 85.14 ± 0.73 | 85.14 ± 0.73 |
![]() ![]() Gemma 2 9B | 51.44 ± 0.17 | 76.35 ± 1.06 | 76.35 ± 1.06 |
![]() ![]() MERaLiON 2 10B A*STAR | 49.85 ± 0.20 | 75.94 ± 1.04 | 75.94 ± 1.04 |
![]() ![]() SEA-LION v3 (Llama) 8B AISG | 49.00 ± 0.24 | 79.87 ± 0.81 | 79.87 ± 0.81 |
![]() ![]() Command R+ 08-2024 104B CohereLabs | 48.97 ± 0.21 | 68.48 ± 1.00 | 68.48 ± 1.00 |
![]() ![]() Qwen 2.5 7B Alibaba | 47.19 ± 0.13 | 74.03 ± 0.75 | 74.03 ± 0.75 |
![]() ![]() Mistral Small 3.1 2503 24B Mistral AI | 46.52 ± 0.24 | 71.75 ± 0.86 | 71.75 ± 0.86 |
![]() ![]() Aya Expanse 8B CohereLabs | 45.78 ± 0.17 | 65.81 ± 0.77 | 65.81 ± 0.77 |
![]() ![]() Llama 3 70B Meta | 45.09 ± 0.11 | 18.67 ± 0.61 | 18.67 ± 0.61 |
![]() ![]() Tulu 3 8B AI2 | 43.13 ± 0.23 | 81.90 ± 0.76 | 81.90 ± 0.76 |
![]() ![]() Command R 08-2024 32B CohereLabs | 42.34 ± 0.25 | 61.11 ± 1.24 | 61.11 ± 1.24 |
![]() ![]() ERNIE 4.5 21B MoE Baidu | 40.86 ± 0.22 | 75.43 ± 0.73 | 75.43 ± 0.73 |
![]() ![]() Sailor2 8B SAIL | 40.67 ± 0.17 | 42.83 ± 0.70 | 42.83 ± 0.70 |
![]() ![]() Llama 3.1 8B Meta | 38.83 ± 0.20 | 71.46 ± 0.87 | 71.46 ± 0.87 |
![]() ![]() phi-4 14B Microsoft | 38.42 ± 0.23 | 66.83 ± 1.06 | 66.83 ± 1.06 |
![]() ![]() Apertus 70B Swiss AI | 38.12 ± 0.25 | 69.84 ± 1.29 | 69.84 ± 1.29 |
![]() ![]() Babel 83B Alibaba-DAMO | 37.19 ± 0.32 | 45.30 ± 1.44 | 45.30 ± 1.44 |
![]() ![]() Olmo 2 0325 32B AI2 | 36.77 ± 0.27 | 69.78 ± 0.87 | 69.78 ± 0.87 |
![]() ![]() Llama 3 8B Meta | 35.81 ± 0.15 | 21.81 ± 0.60 | 21.81 ± 0.60 |
![]() ![]() Apertus 8B Swiss AI | 34.32 ± 0.34 | 69.24 ± 1.13 | 69.24 ± 1.13 |
![]() ![]() Babel 9B Alibaba-DAMO | 34.19 ± 0.20 | 42.92 ± 1.07 | 42.92 ± 1.07 |
![]() ![]() SeaLLMs V3 7B Alibaba-DAMO | 30.44 ± 0.23 | 51.81 ± 1.00 | 51.81 ± 1.00 |
![]() ![]() Sailor2 20B SAIL | 29.04 ± 0.18 | 45.65 ± 0.93 | 45.65 ± 0.93 |
![]() ![]() Ministral 2410 8B Mistral AI | 25.51 ± 0.29 | 36.83 ± 1.30 | 36.83 ± 1.30 |
![]() ![]() Command R7B 12-2024 7B CohereLabs | 24.98 ± 0.31 | 52.98 ± 1.67 | 52.98 ± 1.67 |
![]() ![]() Olmo 2 1124 13B AI2 | 23.71 ± 0.25 | 62.57 ± 0.83 | 62.57 ± 0.83 |
![]() ![]() Olmo 2 1124 7B AI2 | 15.57 ± 0.22 | 49.90 ± 1.09 | 49.90 ± 1.09 |
Model | VI | Multi-Turn Chat | SEA-MT-Bench |
---|---|---|---|
![]() ![]() Qwen 3 Next 80B MoE Alibaba | 65.68 ± 0.10 | 68.35 ± 0.57 | 68.35 ± 0.57 |
![]() ![]() Qwen 3 30B MoE Alibaba | 65.56 ± 0.14 | 63.19 ± 0.55 | 63.19 ± 0.55 |
![]() ![]() SEA-LION v4 (Qwen) 32B AISG | 62.63 ± 0.14 | 48.03 ± 0.67 | 48.03 ± 0.67 |
![]() ![]() SEA-LION v3 (Llama) 70B AISG | 62.37 ± 0.23 | 31.01 ± 0.68 | 31.01 ± 0.68 |
![]() ![]() Qwen 3 32B Alibaba | 62.19 ± 0.12 | 50.69 ± 0.75 | 50.69 ± 0.75 |
![]() ![]() Command A 03-2025 111B CohereLabs | 61.39 ± 0.19 | 42.66 ± 0.66 | 42.66 ± 0.66 |
![]() ![]() Qwen 2.5 72B Alibaba | 60.70 ± 0.13 | 32.14 ± 0.52 | 32.14 ± 0.52 |
![]() ![]() SEA-LION v4 (Gemma) 27B AISG | 60.26 ± 0.17 | 45.34 ± 0.78 | 45.34 ± 0.78 |
![]() ![]() Gemma 3 27B | 60.06 ± 0.17 | 44.09 ± 0.60 | 44.09 ± 0.60 |
![]() ![]() Llama 3.3 70B Meta | 59.92 ± 0.11 | 18.51 ± 0.58 | 18.51 ± 0.58 |
![]() ![]() Tulu 3 70B AI2 | 59.74 ± 0.20 | 28.19 ± 0.74 | 28.19 ± 0.74 |
![]() ![]() Gemma 3 12B | 59.07 ± 0.15 | 39.70 ± 0.60 | 39.70 ± 0.60 |
![]() ![]() Qwen 3 14B Alibaba | 59.03 ± 0.14 | 42.04 ± 0.62 | 42.04 ± 0.62 |
![]() ![]() Llama 4 Scout 109B MoE Meta | 57.78 ± 0.09 | 22.51 ± 0.55 | 22.51 ± 0.55 |
![]() ![]() Qwen 2.5 32B Alibaba | 57.15 ± 0.15 | 19.68 ± 0.56 | 19.68 ± 0.56 |
![]() ![]() SEA-LION v3 (Gemma 2) 9B AISG | 56.15 ± 0.20 | 27.83 ± 0.49 | 27.83 ± 0.49 |
![]() ![]() Llama 3.1 70B Meta | 56.10 ± 0.17 | 14.89 ± 0.57 | 14.89 ± 0.57 |
![]() ![]() Gemma 2 27B | 55.33 ± 0.18 | 16.28 ± 0.60 | 16.28 ± 0.60 |
![]() ![]() Qwen 3 8B Alibaba | 54.80 ± 0.15 | 35.29 ± 0.61 | 35.29 ± 0.61 |
![]() ![]() Aya Expanse 32B CohereLabs | 54.04 ± 0.15 | 24.57 ± 0.53 | 24.57 ± 0.53 |
![]() ![]() Qwen 2.5 14B Alibaba | 52.22 ± 0.15 | 18.68 ± 0.57 | 18.68 ± 0.57 |
![]() ![]() Mistral Large 2411 123B Mistral AI | 52.21 ± 0.20 | 18.43 ± 0.46 | 18.43 ± 0.46 |
![]() ![]() Gemma 2 9B | 51.44 ± 0.17 | 12.83 ± 0.51 | 12.83 ± 0.51 |
![]() ![]() MERaLiON 2 10B A*STAR | 49.85 ± 0.20 | 10.36 ± 0.36 | 10.36 ± 0.36 |
![]() ![]() SEA-LION v3 (Llama) 8B AISG | 49.00 ± 0.24 | 24.64 ± 0.57 | 24.64 ± 0.57 |
![]() ![]() Command R+ 08-2024 104B CohereLabs | 48.97 ± 0.21 | 10.22 ± 0.40 | 10.22 ± 0.40 |
![]() ![]() Qwen 2.5 7B Alibaba | 47.19 ± 0.13 | 15.39 ± 0.49 | 15.39 ± 0.49 |
![]() ![]() Mistral Small 3.1 2503 24B Mistral AI | 46.52 ± 0.24 | 10.45 ± 0.57 | 10.45 ± 0.57 |
![]() ![]() Aya Expanse 8B CohereLabs | 45.78 ± 0.17 | 19.07 ± 0.53 | 19.07 ± 0.53 |
![]() ![]() Llama 3 70B Meta | 45.09 ± 0.11 | 8.97 ± 0.45 | 8.97 ± 0.45 |
![]() ![]() Tulu 3 8B AI2 | 43.13 ± 0.23 | 13.61 ± 0.59 | 13.61 ± 0.59 |
![]() ![]() Command R 08-2024 32B CohereLabs | 42.34 ± 0.25 | 6.31 ± 0.32 | 6.31 ± 0.32 |
![]() ![]() ERNIE 4.5 21B MoE Baidu | 40.86 ± 0.22 | 17.39 ± 0.46 | 17.39 ± 0.46 |
![]() ![]() Sailor2 8B SAIL | 40.67 ± 0.17 | 27.63 ± 0.60 | 27.63 ± 0.60 |
![]() ![]() Llama 3.1 8B Meta | 38.83 ± 0.20 | 10.88 ± 0.43 | 10.88 ± 0.43 |
![]() ![]() phi-4 14B Microsoft | 38.42 ± 0.23 | 19.89 ± 0.44 | 19.89 ± 0.44 |
![]() ![]() Apertus 70B Swiss AI | 38.12 ± 0.25 | 17.50 ± 0.69 | 17.50 ± 0.69 |
![]() ![]() Babel 83B Alibaba-DAMO | 37.19 ± 0.32 | 8.36 ± 0.63 | 8.36 ± 0.63 |
![]() ![]() Olmo 2 0325 32B AI2 | 36.77 ± 0.27 | 6.15 ± 0.40 | 6.15 ± 0.40 |
![]() ![]() Llama 3 8B Meta | 35.81 ± 0.15 | 6.32 ± 0.42 | 6.32 ± 0.42 |
![]() ![]() Apertus 8B Swiss AI | 34.32 ± 0.34 | 5.55 ± 0.43 | 5.55 ± 0.43 |
![]() ![]() Babel 9B Alibaba-DAMO | 34.19 ± 0.20 | 4.97 ± 0.49 | 4.97 ± 0.49 |
![]() ![]() SeaLLMs V3 7B Alibaba-DAMO | 30.44 ± 0.23 | 10.76 ± 0.44 | 10.76 ± 0.44 |
![]() ![]() Sailor2 20B SAIL | 29.04 ± 0.18 | 25.43 ± 0.60 | 25.43 ± 0.60 |
![]() ![]() Ministral 2410 8B Mistral AI | 25.51 ± 0.29 | 4.05 ± 0.38 | 4.05 ± 0.38 |
![]() ![]() Command R7B 12-2024 7B CohereLabs | 24.98 ± 0.31 | 2.51 ± 0.27 | 2.51 ± 0.27 |
![]() ![]() Olmo 2 1124 13B AI2 | 23.71 ± 0.25 | 3.13 ± 0.31 | 3.13 ± 0.31 |
![]() ![]() Olmo 2 1124 7B AI2 | 15.57 ± 0.22 | 2.43 ± 0.26 | 2.43 ± 0.26 |
Model | VI | NLG | Summarization | Translations |
---|---|---|---|---|
![]() ![]() Qwen 3 Next 80B MoE Alibaba | 65.68 ± 0.10 | 54.15 ± 0.04 | 15.58 ± 0.08 | 92.72 ± 0.01 |
![]() ![]() Qwen 3 30B MoE Alibaba | 65.56 ± 0.14 | 53.48 ± 0.04 | 14.78 ± 0.08 | 92.19 ± 0.01 |
![]() ![]() SEA-LION v4 (Qwen) 32B AISG | 62.63 ± 0.14 | 54.87 ± 0.05 | 17.49 ± 0.10 | 92.25 ± 0.02 |
![]() ![]() SEA-LION v3 (Llama) 70B AISG | 62.37 ± 0.23 | 56.26 ± 0.07 | 19.75 ± 0.13 | 92.76 ± 0.02 |
![]() ![]() Qwen 3 32B Alibaba | 62.19 ± 0.12 | 54.88 ± 0.06 | 17.41 ± 0.10 | 92.35 ± 0.01 |
![]() ![]() Command A 03-2025 111B CohereLabs | 61.39 ± 0.19 | 54.40 ± 0.06 | 16.71 ± 0.12 | 92.08 ± 0.05 |
![]() ![]() Qwen 2.5 72B Alibaba | 60.70 ± 0.13 | 54.28 ± 0.05 | 16.09 ± 0.09 | 92.46 ± 0.02 |
![]() ![]() SEA-LION v4 (Gemma) 27B AISG | 60.26 ± 0.17 | 52.62 ± 0.05 | 14.73 ± 0.09 | 90.50 ± 0.03 |
![]() ![]() Gemma 3 27B | 60.06 ± 0.17 | 51.66 ± 0.05 | 14.73 ± 0.09 | 88.59 ± 0.02 |
![]() ![]() Llama 3.3 70B Meta | 59.92 ± 0.11 | 54.22 ± 0.06 | 17.14 ± 0.12 | 91.30 ± 0.02 |
![]() ![]() Tulu 3 70B AI2 | 59.74 ± 0.20 | 57.63 ± 0.10 | 22.61 ± 0.21 | 92.65 ± 0.02 |
![]() ![]() Gemma 3 12B | 59.07 ± 0.15 | 53.47 ± 0.04 | 14.36 ± 0.09 | 92.58 ± 0.02 |
![]() ![]() Qwen 3 14B Alibaba | 59.03 ± 0.14 | 54.85 ± 0.05 | 17.77 ± 0.10 | 91.93 ± 0.02 |
![]() ![]() Llama 4 Scout 109B MoE Meta | 57.78 ± 0.09 | 54.56 ± 0.04 | 17.49 ± 0.06 | 91.63 ± 0.01 |
![]() ![]() Qwen 2.5 32B Alibaba | 57.15 ± 0.15 | 52.29 ± 0.05 | 14.80 ± 0.10 | 89.79 ± 0.04 |
![]() ![]() SEA-LION v3 (Gemma 2) 9B AISG | 56.15 ± 0.20 | 54.56 ± 0.06 | 17.24 ± 0.12 | 91.88 ± 0.02 |
![]() ![]() Llama 3.1 70B Meta | 56.10 ± 0.17 | 56.57 ± 0.11 | 21.63 ± 0.21 | 91.51 ± 0.02 |
![]() ![]() Gemma 2 27B | 55.33 ± 0.18 | 54.88 ± 0.08 | 19.03 ± 0.14 | 90.72 ± 0.03 |
![]() ![]() Qwen 3 8B Alibaba | 54.80 ± 0.15 | 53.95 ± 0.04 | 16.88 ± 0.08 | 91.02 ± 0.02 |
![]() ![]() Aya Expanse 32B CohereLabs | 54.04 ± 0.15 | 54.86 ± 0.04 | 17.45 ± 0.08 | 92.27 ± 0.02 |
![]() ![]() Qwen 2.5 14B Alibaba | 52.22 ± 0.15 | 50.72 ± 0.06 | 15.13 ± 0.09 | 86.30 ± 0.08 |
![]() ![]() Mistral Large 2411 123B Mistral AI | 52.21 ± 0.20 | 54.39 ± 0.08 | 17.49 ± 0.16 | 91.30 ± 0.03 |
![]() ![]() Gemma 2 9B | 51.44 ± 0.17 | 54.10 ± 0.07 | 17.84 ± 0.13 | 90.36 ± 0.03 |
![]() ![]() MERaLiON 2 10B A*STAR | 49.85 ± 0.20 | 53.56 ± 0.09 | 17.45 ± 0.18 | 89.67 ± 0.04 |
![]() ![]() SEA-LION v3 (Llama) 8B AISG | 49.00 ± 0.24 | 54.03 ± 0.06 | 16.50 ± 0.12 | 91.57 ± 0.02 |
![]() ![]() Command R+ 08-2024 104B CohereLabs | 48.97 ± 0.21 | 53.99 ± 0.08 | 17.02 ± 0.15 | 90.95 ± 0.04 |
![]() ![]() Qwen 2.5 7B Alibaba | 47.19 ± 0.13 | 43.44 ± 0.02 | 0.16 ± 0.03 | 86.73 ± 0.05 |
![]() ![]() Mistral Small 3.1 2503 24B Mistral AI | 46.52 ± 0.24 | 48.97 ± 0.12 | 14.41 ± 0.22 | 83.53 ± 0.09 |
![]() ![]() Aya Expanse 8B CohereLabs | 45.78 ± 0.17 | 52.28 ± 0.05 | 15.11 ± 0.10 | 89.46 ± 0.04 |
![]() ![]() Llama 3 70B Meta | 45.09 ± 0.11 | 55.53 ± 0.06 | 19.90 ± 0.11 | 91.15 ± 0.02 |
![]() ![]() Tulu 3 8B AI2 | 43.13 ± 0.23 | 36.37 ± 0.08 | 14.44 ± 0.13 | 58.29 ± 0.13 |
![]() ![]() Command R 08-2024 32B CohereLabs | 42.34 ± 0.25 | 53.43 ± 0.08 | 17.39 ± 0.17 | 89.47 ± 0.06 |
![]() ![]() ERNIE 4.5 21B MoE Baidu | 40.86 ± 0.22 | 48.73 ± 0.08 | 8.38 ± 0.17 | 89.08 ± 0.04 |
![]() ![]() Sailor2 8B SAIL | 40.67 ± 0.17 | 49.95 ± 0.06 | 14.30 ± 0.10 | 85.59 ± 0.09 |
![]() ![]() Llama 3.1 8B Meta | 38.83 ± 0.20 | 52.49 ± 0.09 | 17.61 ± 0.14 | 87.37 ± 0.08 |
![]() ![]() phi-4 14B Microsoft | 38.42 ± 0.23 | 44.79 ± 0.08 | 12.51 ± 0.11 | 77.07 ± 0.10 |
![]() ![]() Apertus 70B Swiss AI | 38.12 ± 0.25 | 49.84 ± 0.07 | 14.49 ± 0.13 | 85.19 ± 0.06 |
![]() ![]() Babel 83B Alibaba-DAMO | 37.19 ± 0.32 | 44.23 ± 0.12 | 14.77 ± 0.22 | 73.68 ± 0.13 |
![]() ![]() Olmo 2 0325 32B AI2 | 36.77 ± 0.27 | 35.78 ± 0.12 | 14.26 ± 0.13 | 57.31 ± 0.19 |
![]() ![]() Llama 3 8B Meta | 35.81 ± 0.15 | 52.42 ± 0.08 | 17.36 ± 0.15 | 87.47 ± 0.04 |
![]() ![]() Apertus 8B Swiss AI | 34.32 ± 0.34 | 48.04 ± 0.13 | 16.74 ± 0.24 | 79.35 ± 0.14 |
![]() ![]() Babel 9B Alibaba-DAMO | 34.19 ± 0.20 | 38.80 ± 0.14 | 8.57 ± 0.17 | 69.02 ± 0.23 |
![]() ![]() SeaLLMs V3 7B Alibaba-DAMO | 30.44 ± 0.23 | 41.80 ± 0.14 | 11.07 ± 0.23 | 72.52 ± 0.22 |
![]() ![]() Sailor2 20B SAIL | 29.04 ± 0.18 | 34.92 ± 0.09 | 15.37 ± 0.10 | 54.47 ± 0.17 |
![]() ![]() Ministral 2410 8B Mistral AI | 25.51 ± 0.29 | 32.12 ± 0.14 | 7.33 ± 0.22 | 56.91 ± 0.16 |
![]() ![]() Command R7B 12-2024 7B CohereLabs | 24.98 ± 0.31 | 33.78 ± 0.13 | 8.08 ± 0.24 | 59.48 ± 0.14 |
![]() ![]() Olmo 2 1124 13B AI2 | 23.71 ± 0.25 | 32.62 ± 0.08 | 10.64 ± 0.09 | 54.59 ± 0.12 |
![]() ![]() Olmo 2 1124 7B AI2 | 15.57 ± 0.22 | 28.57 ± 0.08 | 10.56 ± 0.14 | 46.58 ± 0.12 |
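Likewise, each competency score appears to be the unweighted mean of its task scores; for NLG, averaging Summarization and Translations reproduces the reported value. This is an aggregation rule inferred from the displayed numbers, not one documented here.

```python
# Inferred rule: NLG competency = mean of its two task scores.
# Values copied from the Qwen 3 Next 80B MoE row of the NLG table above.
summarization, translations = 15.58, 92.72
print(round((summarization + translations) / 2, 2))  # 54.15 == reported NLG
```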
Model | VI | NLR | Causal Reasoning | Natural Language Inference |
---|---|---|---|---|
![]() ![]() Qwen 3 Next 80B MoE Alibaba | 65.68 ± 0.10 | 68.70 ± 0.09 | 89.44 ± 0.08 | 47.97 ± 0.14 |
![]() ![]() Qwen 3 30B MoE Alibaba | 65.56 ± 0.14 | 70.29 ± 0.10 | 85.81 ± 0.08 | 54.76 ± 0.19 |
![]() ![]() SEA-LION v4 (Qwen) 32B AISG | 62.63 ± 0.14 | 69.99 ± 0.10 | 85.83 ± 0.13 | 54.15 ± 0.15 |
![]() ![]() SEA-LION v3 (Llama) 70B AISG | 62.37 ± 0.23 | 71.08 ± 0.29 | 89.76 ± 0.28 | 52.40 ± 0.45 |
![]() ![]() Qwen 3 32B Alibaba | 62.19 ± 0.12 | 68.79 ± 0.17 | 83.69 ± 0.19 | 53.89 ± 0.26 |
![]() ![]() Command A 03-2025 111B CohereLabs | 61.39 ± 0.19 | 63.84 ± 0.28 | 92.87 ± 0.16 | 34.81 ± 0.54 |
![]() ![]() Qwen 2.5 72B Alibaba | 60.70 ± 0.13 | 59.14 ± 0.17 | 89.71 ± 0.19 | 28.57 ± 0.21 |
![]() ![]() SEA-LION v4 (Gemma) 27B AISG | 60.26 ± 0.17 | 69.93 ± 0.17 | 85.43 ± 0.27 | 54.44 ± 0.17 |
![]() ![]() Gemma 3 27B | 60.06 ± 0.17 | 70.42 ± 0.10 | 85.76 ± 0.11 | 55.08 ± 0.15 |
![]() ![]() Llama 3.3 70B Meta | 59.92 ± 0.11 | 74.80 ± 0.12 | 89.59 ± 0.18 | 60.02 ± 0.18 |
![]() ![]() Tulu 3 70B AI2 | 59.74 ± 0.20 | 73.01 ± 0.16 | 88.17 ± 0.18 | 57.85 ± 0.26 |
![]() ![]() Gemma 3 12B | 59.07 ± 0.15 | 73.45 ± 0.10 | 90.52 ± 0.22 | 56.38 ± 0.16 |
![]() ![]() Qwen 3 14B Alibaba | 59.03 ± 0.14 | 63.77 ± 0.12 | 79.53 ± 0.19 | 48.00 ± 0.16 |
![]() ![]() Llama 4 Scout 109B MoE Meta | 57.78 ± 0.09 | 55.29 ± 0.09 | 85.67 ± 0.13 | 24.91 ± 0.09 |
![]() ![]() Qwen 2.5 32B Alibaba | 57.15 ± 0.15 | 63.26 ± 0.07 | 89.39 ± 0.08 | 37.14 ± 0.15 |
![]() ![]() SEA-LION v3 (Gemma 2) 9B AISG | 56.15 ± 0.20 | 63.66 ± 0.16 | 85.69 ± 0.28 | 41.62 ± 0.21 |
![]() ![]() Llama 3.1 70B Meta | 56.10 ± 0.17 | 72.22 ± 0.25 | 87.45 ± 0.21 | 56.99 ± 0.48 |
![]() ![]() Gemma 2 27B | 55.33 ± 0.18 | 70.42 ± 0.16 | 86.47 ± 0.26 | 54.38 ± 0.20 |
![]() ![]() Qwen 3 8B Alibaba | 54.80 ± 0.15 | 57.02 ± 0.18 | 66.71 ± 0.29 | 47.34 ± 0.32 |
![]() ![]() Aya Expanse 32B CohereLabs | 54.04 ± 0.15 | 64.46 ± 0.13 | 89.48 ± 0.19 | 39.44 ± 0.12 |
![]() ![]() Qwen 2.5 14B Alibaba | 52.22 ± 0.15 | 53.92 ± 0.12 | 84.65 ± 0.19 | 23.18 ± 0.16 |
![]() ![]() Mistral Large 2411 123B Mistral AI | 52.21 ± 0.20 | 60.21 ± 0.37 | 82.68 ± 0.48 | 37.74 ± 0.50 |
![]() ![]() Gemma 2 9B | 51.44 ± 0.17 | 55.52 ± 0.16 | 83.51 ± 0.26 | 27.53 ± 0.17 |
![]() ![]() MERaLiON 2 10B A*STAR | 49.85 ± 0.20 | 55.54 ± 0.16 | 84.11 ± 0.24 | 26.97 ± 0.24 |
![]() ![]() SEA-LION v3 (Llama) 8B AISG | 49.00 ± 0.24 | 52.92 ± 0.44 | 77.29 ± 0.59 | 28.55 ± 0.46 |
![]() ![]() Command R+ 08-2024 104B CohereLabs | 48.97 ± 0.21 | 56.57 ± 0.48 | 72.36 ± 0.89 | 40.79 ± 0.43 |
![]() ![]() Qwen 2.5 7B Alibaba | 47.19 ± 0.13 | 53.85 ± 0.09 | 70.44 ± 0.10 | 37.26 ± 0.21 |
![]() ![]() Mistral Small 3.1 2503 24B Mistral AI | 46.52 ± 0.24 | 56.02 ± 0.29 | 78.67 ± 0.45 | 33.37 ± 0.47 |
![]() ![]() Aya Expanse 8B CohereLabs | 45.78 ± 0.17 | 54.00 ± 0.19 | 76.55 ± 0.22 | 31.45 ± 0.35 |
![]() ![]() Llama 3 70B Meta | 45.09 ± 0.11 | 68.64 ± 0.09 | 85.87 ± 0.16 | 51.42 ± 0.14 |
![]() ![]() Tulu 3 8B AI2 | 43.13 ± 0.23 | 53.90 ± 0.33 | 65.96 ± 0.43 | 41.84 ± 0.52 |
![]() ![]() Command R 08-2024 32B CohereLabs | 42.34 ± 0.25 | 52.14 ± 0.39 | 78.87 ± 0.76 | 25.42 ± 0.56 |
![]() ![]() ERNIE 4.5 21B MoE Baidu | 40.86 ± 0.22 | 38.68 ± 0.59 | 57.04 ± 0.89 | 20.31 ± 0.56 |
![]() ![]() Sailor2 8B SAIL | 40.67 ± 0.17 | 65.08 ± 0.31 | 79.23 ± 0.51 | 50.93 ± 0.50 |
![]() ![]() Llama 3.1 8B Meta | 38.83 ± 0.20 | 19.22 ± 0.32 | 7.92 ± 0.50 | 30.51 ± 0.60 |
![]() ![]() phi-4 14B Microsoft | 38.42 ± 0.23 | 52.45 ± 0.36 | 74.61 ± 0.58 | 30.29 ± 0.52 |
![]() ![]() Apertus 70B Swiss AI | 38.12 ± 0.25 | 25.64 ± 0.73 | 25.53 ± 1.49 | 25.74 ± 0.55 |
![]() ![]() Babel 83B Alibaba-DAMO | 37.19 ± 0.32 | 43.64 ± 0.57 | 68.84 ± 0.83 | 18.45 ± 0.74 |
![]() ![]() Olmo 2 0325 32B AI2 | 36.77 ± 0.27 | 27.04 ± 0.47 | 53.19 ± 0.92 | 0.90 ± 0.09 |
![]() ![]() Llama 3 8B Meta | 35.81 ± 0.15 | 42.24 ± 0.31 | 50.83 ± 0.46 | 33.66 ± 0.43 |
![]() ![]() Apertus 8B Swiss AI | 34.32 ± 0.34 | 23.46 ± 0.65 | 46.27 ± 1.29 | 0.66 ± 0.13 |
![]() ![]() Babel 9B Alibaba-DAMO | 34.19 ± 0.20 | 41.08 ± 0.53 | 66.85 ± 0.89 | 15.31 ± 0.86 |
![]() ![]() SeaLLMs V3 7B Alibaba-DAMO | 30.44 ± 0.23 | 31.90 ± 0.62 | 49.83 ± 0.92 | 13.97 ± 0.75 |
![]() ![]() Sailor2 20B SAIL | 29.04 ± 0.18 | 42.78 ± 0.09 | 85.56 ± 0.18 | 0.00 ± 0.00 |
![]() ![]() Ministral 2410 8B Mistral AI | 25.51 ± 0.29 | 18.05 ± 0.68 | 22.76 ± 1.15 | 13.34 ± 0.78 |
![]() ![]() Command R7B 12-2024 7B CohereLabs | 24.98 ± 0.31 | 19.76 ± 0.65 | 20.56 ± 1.29 | 18.95 ± 0.74 |
![]() ![]() Olmo 2 1124 13B AI2 | 23.71 ± 0.25 | 12.21 ± 0.57 | 10.19 ± 1.17 | 14.24 ± 0.67 |
![]() ![]() Olmo 2 1124 7B AI2 | 15.57 ± 0.22 | 0.30 ± 0.17 | 0.00 ± 0.00 | 0.60 ± 0.34 |
Model | VI | NLU | Question Answering | Sentiment Analysis |
---|---|---|---|---|
![]() ![]() Qwen 3 Next 80B MoE Alibaba | 65.68 ± 0.10 | 68.36 ± 0.08 | 71.50 ± 0.11 | 65.21 ± 0.11 |
![]() ![]() Qwen 3 30B MoE Alibaba | 65.56 ± 0.14 | 66.79 ± 0.11 | 78.02 ± 0.22 | 55.56 ± 0.05 |
![]() ![]() SEA-LION v4 (Qwen) 32B AISG | 62.63 ± 0.14 | 63.75 ± 0.23 | 68.30 ± 0.27 | 59.21 ± 0.33 |
![]() ![]() SEA-LION v3 (Llama) 70B AISG | 62.37 ± 0.23 | 69.58 ± 0.32 | 75.68 ± 0.37 | 63.48 ± 0.41 |
![]() ![]() Qwen 3 32B Alibaba | 62.19 ± 0.12 | 64.35 ± 0.16 | 67.28 ± 0.26 | 61.43 ± 0.27 |
![]() ![]() Command A 03-2025 111B CohereLabs | 61.39 ± 0.19 | 65.83 ± 0.30 | 74.40 ± 0.41 | 57.25 ± 0.34 |
![]() ![]() Qwen 2.5 72B Alibaba | 60.70 ± 0.13 | 69.63 ± 0.19 | 75.80 ± 0.17 | 63.46 ± 0.33 |
![]() ![]() SEA-LION v4 (Gemma) 27B AISG | 60.26 ± 0.17 | 62.61 ± 0.27 | 63.64 ± 0.42 | 61.59 ± 0.31 |
![]() ![]() Gemma 3 27B | 60.06 ± 0.17 | 62.28 ± 0.19 | 62.28 ± 0.36 | 62.28 ± 0.18 |
![]() ![]() Llama 3.3 70B Meta | 59.92 ± 0.11 | 66.09 ± 0.19 | 76.14 ± 0.25 | 56.04 ± 0.27 |
![]() ![]() Tulu 3 70B AI2 | 59.74 ± 0.20 | 69.43 ± 0.36 | 70.66 ± 0.56 | 68.20 ± 0.38 |
![]() ![]() Gemma 3 12B | 59.07 ± 0.15 | 63.47 ± 0.23 | 60.90 ± 0.49 | 66.04 ± 0.09 |
![]() ![]() Qwen 3 14B Alibaba | 59.03 ± 0.14 | 67.11 ± 0.17 | 77.62 ± 0.18 | 56.61 ± 0.29 |
![]() ![]() Llama 4 Scout 109B MoE Meta | 57.78 ± 0.09 | 67.35 ± 0.10 | 75.40 ± 0.14 | 59.30 ± 0.18 |
![]() ![]() Qwen 2.5 32B Alibaba | 57.15 ± 0.15 | 69.27 ± 0.15 | 75.94 ± 0.28 | 62.61 ± 0.08 |
![]() ![]() SEA-LION v3 (Gemma 2) 9B AISG | 56.15 ± 0.20 | 70.29 ± 0.26 | 78.15 ± 0.38 | 62.44 ± 0.48 |
![]() ![]() Llama 3.1 70B Meta | 56.10 ± 0.17 | 61.93 ± 0.29 | 61.13 ± 0.48 | 62.73 ± 0.27 |
![]() ![]() Gemma 2 27B | 55.33 ± 0.18 | 69.10 ± 0.36 | 72.31 ± 0.46 | 65.88 ± 0.52 |
![]() ![]() Qwen 3 8B Alibaba | 54.80 ± 0.15 | 70.44 ± 0.22 | 79.38 ± 0.26 | 61.50 ± 0.34 |
![]() ![]() Aya Expanse 32B CohereLabs | 54.04 ± 0.15 | 61.73 ± 0.25 | 71.74 ± 0.29 | 51.72 ± 0.33 |
![]() ![]() Qwen 2.5 14B Alibaba | 52.22 ± 0.15 | 62.43 ± 0.14 | 63.26 ± 0.22 | 61.59 ± 0.17 |
![]() ![]() Mistral Large 2411 123B Mistral AI | 52.21 ± 0.20 | 64.32 ± 0.47 | 67.21 ± 0.75 | 61.43 ± 0.47 |
![]() ![]() Gemma 2 9B | 51.44 ± 0.17 | 68.93 ± 0.31 | 74.96 ± 0.60 | 62.90 ± 0.34 |
![]() ![]() MERaLiON 2 10B A*STAR | 49.85 ± 0.20 | 66.91 ± 0.31 | 74.89 ± 0.47 | 58.93 ± 0.43 |
![]() ![]() SEA-LION v3 (Llama) 8B AISG | 49.00 ± 0.24 | 66.99 ± 0.27 | 68.10 ± 0.42 | 65.87 ± 0.41 |
![]() ![]() Command R+ 08-2024 104B CohereLabs | 48.97 ± 0.21 | 67.25 ± 0.36 | 80.48 ± 0.51 | 54.03 ± 0.64 |
![]() ![]() Qwen 2.5 7B Alibaba | 47.19 ± 0.13 | 60.24 ± 0.21 | 61.66 ± 0.31 | 58.83 ± 0.24 |
![]() ![]() Mistral Small 3.1 2503 24B Mistral AI | 46.52 ± 0.24 | 57.09 ± 0.59 | 66.86 ± 0.64 | 47.31 ± 1.03 |
![]() ![]() Aya Expanse 8B CohereLabs | 45.78 ± 0.17 | 60.93 ± 0.24 | 71.25 ± 0.32 | 50.61 ± 0.37 |
![]() ![]() Llama 3 70B Meta | 45.09 ± 0.11 | 67.22 ± 0.15 | 81.56 ± 0.22 | 52.87 ± 0.22 |
![]() ![]() Tulu 3 8B AI2 | 43.13 ± 0.23 | 61.39 ± 0.26 | 69.07 ± 0.43 | 53.70 ± 0.26 |
![]() ![]() Command R 08-2024 32B CohereLabs | 42.34 ± 0.25 | 49.82 ± 0.43 | 76.67 ± 0.52 | 22.98 ± 0.57 |
![]() ![]() ERNIE 4.5 21B MoE Baidu | 40.86 ± 0.22 | 52.29 ± 0.48 | 74.96 ± 0.37 | 29.61 ± 0.88 |
![]() ![]() Sailor2 8B SAIL | 40.67 ± 0.17 | 47.41 ± 0.45 | 50.37 ± 0.66 | 44.45 ± 0.41 |
![]() ![]() Llama 3.1 8B Meta | 38.83 ± 0.20 | 64.78 ± 0.27 | 71.87 ± 0.30 | 57.68 ± 0.41 |
![]() ![]() phi-4 14B Microsoft | 38.42 ± 0.23 | 33.32 ± 0.46 | 46.09 ± 0.64 | 20.55 ± 0.69 |
![]() ![]() Apertus 70B Swiss AI | 38.12 ± 0.25 | 60.35 ± 0.52 | 69.63 ± 0.78 | 51.07 ± 0.62 |
![]() ![]() Babel 83B Alibaba-DAMO | 37.19 ± 0.32 | 51.28 ± 0.58 | 54.17 ± 0.60 | 48.39 ± 0.97 |
![]() ![]() Olmo 2 0325 32B AI2 | 36.77 ± 0.27 | 56.76 ± 0.60 | 56.26 ± 0.60 | 57.27 ± 0.98 |
![]() ![]() Llama 3 8B Meta | 35.81 ± 0.15 | 63.78 ± 0.40 | 71.64 ± 0.33 | 55.93 ± 0.60 |
![]() ![]() Apertus 8B Swiss AI | 34.32 ± 0.34 | 43.33 ± 0.51 | 64.27 ± 0.68 | 22.38 ± 1.09 |
![]() ![]() Babel 9B Alibaba-DAMO | 34.19 ± 0.20 | 64.36 ± 0.38 | 73.18 ± 0.55 | 55.53 ± 0.67 |
![]() ![]() SeaLLMs V3 7B Alibaba-DAMO | 30.44 ± 0.23 | 46.80 ± 0.60 | 66.05 ± 0.60 | 27.56 ± 1.00 |
![]() ![]() Sailor2 20B SAIL | 29.04 ± 0.18 | 20.94 ± 0.18 | 41.89 ± 0.36 | 0.00 ± 0.00 |
![]() ![]() Ministral 2410 8B Mistral AI | 25.51 ± 0.29 | 47.17 ± 0.69 | 63.54 ± 0.86 | 30.81 ± 1.06 |
![]() ![]() Command R7B 12-2024 7B CohereLabs | 24.98 ± 0.31 | 40.29 ± 0.57 | 61.33 ± 0.85 | 19.24 ± 0.87 |
![]() ![]() Olmo 2 1124 13B AI2 | 23.71 ± 0.25 | 37.31 ± 0.40 | 50.39 ± 0.71 | 24.23 ± 0.50 |
![]() ![]() Olmo 2 1124 7B AI2 | 15.57 ± 0.22 | 19.44 ± 0.30 | 38.85 ± 0.59 | 0.02 ± 0.05 |
Model | VI | Safety | Toxicity Detection |
---|---|---|---|
![]() ![]() Qwen 3 Next 80B MoE Alibaba | 65.68 ± 0.10 | 40.69 ± 0.20 | 40.69 ± 0.20 |
![]() ![]() Qwen 3 30B MoE Alibaba | 65.56 ± 0.14 | 50.36 ± 0.24 | 50.36 ± 0.24 |
![]() ![]() SEA-LION v4 (Qwen) 32B AISG | 62.63 ± 0.14 | 46.95 ± 0.31 | 46.95 ± 0.31 |
![]() ![]() SEA-LION v3 (Llama) 70B AISG | 62.37 ± 0.23 | 44.67 ± 0.92 | 44.67 ± 0.92 |
![]() ![]() Qwen 3 32B Alibaba | 62.19 ± 0.12 | 43.42 ± 0.41 | 43.42 ± 0.41 |
![]() ![]() Command A 03-2025 111B CohereLabs | 61.39 ± 0.19 | 53.41 ± 0.60 | 53.41 ± 0.60 |
![]() ![]() Qwen 2.5 72B Alibaba | 60.70 ± 0.13 | 50.90 ± 0.36 | 50.90 ± 0.36 |
![]() ![]() SEA-LION v4 (Gemma) 27B AISG | 60.26 ± 0.17 | 41.78 ± 0.53 | 41.78 ± 0.53 |
![]() ![]() Gemma 3 27B | 60.06 ± 0.17 | 40.51 ± 0.45 | 40.51 ± 0.45 |
![]() ![]() Llama 3.3 70B Meta | 59.92 ± 0.11 | 45.99 ± 0.51 | 45.99 ± 0.51 |
![]() ![]() Tulu 3 70B AI2 | 59.74 ± 0.20 | 43.20 ± 0.74 | 43.20 ± 0.74 |
![]() ![]() Gemma 3 12B | 59.07 ± 0.15 | 40.86 ± 0.50 | 40.86 ± 0.50 |
![]() ![]() Qwen 3 14B Alibaba | 59.03 ± 0.14 | 41.12 ± 0.36 | 41.12 ± 0.36 |
![]() ![]() Llama 4 Scout 109B MoE Meta | 57.78 ± 0.09 | 48.33 ± 0.20 | 48.33 ± 0.20 |
![]() ![]() Qwen 2.5 32B Alibaba | 57.15 ± 0.15 | 47.74 ± 0.30 | 47.74 ± 0.30 |
![]() ![]() SEA-LION v3 (Gemma 2) 9B AISG | 56.15 ± 0.20 | 40.68 ± 0.45 | 40.68 ± 0.45 |
![]() ![]() Llama 3.1 70B Meta | 56.10 ± 0.17 | 40.65 ± 0.73 | 40.65 ± 0.73 |
![]() ![]() Gemma 2 27B | 55.33 ± 0.18 | 39.44 ± 0.33 | 39.44 ± 0.33 |
![]() ![]() Qwen 3 8B Alibaba | 54.80 ± 0.15 | 31.39 ± 0.24 | 31.39 ± 0.24 |
![]() ![]() Aya Expanse 32B CohereLabs | 54.04 ± 0.15 | 45.98 ± 0.35 | 45.98 ± 0.35 |
![]() ![]() Qwen 2.5 14B Alibaba | 52.22 ± 0.15 | 42.51 ± 0.29 | 42.51 ± 0.29 |
![]() ![]() Mistral Large 2411 123B Mistral AI | 52.21 ± 0.20 | 28.39 ± 1.13 | 28.39 ± 1.13 |
![]() ![]() Gemma 2 9B | 51.44 ± 0.17 | 41.37 ± 0.38 | 41.37 ± 0.38 |
![]() ![]() MERaLiON 2 10B A*STAR | 49.85 ± 0.20 | 38.16 ± 0.39 | 38.16 ± 0.39 |
![]() ![]() SEA-LION v3 (Llama) 8B AISG | 49.00 ± 0.24 | 21.65 ± 0.93 | 21.65 ± 0.93 |
![]() ![]() Command R+ 08-2024 104B CohereLabs | 48.97 ± 0.21 | 40.47 ± 1.10 | 40.47 ± 1.10 |
![]() ![]() Qwen 2.5 7B Alibaba | 47.19 ± 0.13 | 32.80 ± 0.25 | 32.80 ± 0.25 |
![]() ![]() Mistral Small 3.1 2503 24B Mistral AI | 46.52 ± 0.24 | 29.03 ± 0.96 | 29.03 ± 0.96 |
![]() ![]() Aya Expanse 8B CohereLabs | 45.78 ± 0.17 | 26.16 ± 0.23 | 26.16 ± 0.23 |
![]() ![]() Llama 3 70B Meta | 45.09 ± 0.11 | 38.89 ± 0.57 | 38.89 ± 0.57 |
![]() ![]() Tulu 3 8B AI2 | 43.13 ± 0.23 | 20.67 ± 0.90 | 20.67 ± 0.90 |
![]() ![]() Command R 08-2024 32B CohereLabs | 42.34 ± 0.25 | 31.44 ± 1.09 | 31.44 ± 1.09 |
![]() ![]() ERNIE 4.5 21B MoE Baidu | 40.86 ± 0.22 | 19.33 ± 0.70 | 19.33 ± 0.70 |
![]() ![]() Sailor2 8B SAIL | 40.67 ± 0.17 | 23.43 ± 0.64 | 23.43 ± 0.64 |
![]() ![]() Llama 3.1 8B Meta | 38.83 ± 0.20 | 20.52 ± 0.84 | 20.52 ± 0.84 |
![]() ![]() phi-4 14B Microsoft | 38.42 ± 0.23 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Apertus 70B Swiss AI | 38.12 ± 0.25 | 30.10 ± 0.69 | 30.10 ± 0.69 |
![]() ![]() Babel 83B Alibaba-DAMO | 37.19 ± 0.32 | 25.73 ± 1.40 | 25.73 ± 1.40 |
![]() ![]() Olmo 2 0325 32B AI2 | 36.77 ± 0.27 | 23.19 ± 0.63 | 23.19 ± 0.63 |
![]() ![]() Llama 3 8B Meta | 35.81 ± 0.15 | 28.29 ± 0.66 | 28.29 ± 0.66 |
![]() ![]() Apertus 8B Swiss AI | 34.32 ± 0.34 | 20.12 ± 1.06 | 20.12 ± 1.06 |
![]() ![]() Babel 9B Alibaba-DAMO | 34.19 ± 0.20 | 28.83 ± 0.84 | 28.83 ± 0.84 |
![]() ![]() SeaLLMs V3 7B Alibaba-DAMO | 30.44 ± 0.23 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Sailor2 20B SAIL | 29.04 ± 0.18 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Ministral 2410 8B Mistral AI | 25.51 ± 0.29 | 15.63 ± 1.35 | 15.63 ± 1.35 |
![]() ![]() Command R7B 12-2024 7B CohereLabs | 24.98 ± 0.31 | 9.54 ± 1.16 | 9.54 ± 1.16 |
![]() ![]() Olmo 2 1124 13B AI2 | 23.71 ± 0.25 | 9.11 ± 0.75 | 9.11 ± 0.75 |
![]() ![]() Olmo 2 1124 7B AI2 | 15.57 ± 0.22 | 2.02 ± 0.57 | 2.02 ± 0.57 |
Model | VI | Knowledge | Global MMLU Lite |
---|---|---|---|
![]() ![]() Qwen 3 Next 80B MoE Alibaba | 65.68 ± 0.10 | 68.61 ± 0.26 | 68.61 ± 0.26 |
![]() ![]() Qwen 3 30B MoE Alibaba | 65.56 ± 0.14 | 66.90 ± 0.18 | 66.90 ± 0.18 |
![]() ![]() SEA-LION v4 (Qwen) 32B AISG | 62.63 ± 0.14 | 67.26 ± 0.20 | 67.26 ± 0.20 |
![]() ![]() SEA-LION v3 (Llama) 70B AISG | 62.37 ± 0.23 | 71.83 ± 0.40 | 71.83 ± 0.40 |
![]() ![]() Qwen 3 32B Alibaba | 62.19 ± 0.12 | 66.22 ± 0.23 | 66.22 ± 0.23 |
![]() ![]() Command A 03-2025 111B CohereLabs | 61.39 ± 0.19 | 63.95 ± 0.53 | 63.95 ± 0.53 |
![]() ![]() Qwen 2.5 72B Alibaba | 60.70 ± 0.13 | 69.50 ± 0.28 | 69.50 ± 0.28 |
![]() ![]() SEA-LION v4 (Gemma) 27B AISG | 60.26 ± 0.17 | 64.41 ± 0.35 | 64.41 ± 0.35 |
![]() ![]() Gemma 3 27B | 60.06 ± 0.17 | 65.68 ± 0.28 | 65.68 ± 0.28 |
![]() ![]() Llama 3.3 70B Meta | 59.92 ± 0.11 | 67.23 ± 0.17 | 67.23 ± 0.17 |
![]() ![]() Tulu 3 70B AI2 | 59.74 ± 0.20 | 62.77 ± 0.39 | 62.77 ± 0.39 |
![]() ![]() Gemma 3 12B | 59.07 ± 0.15 | 56.44 ± 0.25 | 56.44 ± 0.25 |
![]() ![]() Qwen 3 14B Alibaba | 59.03 ± 0.14 | 54.54 ± 0.27 | 54.54 ± 0.27 |
![]() ![]() Llama 4 Scout 109B MoE Meta | 57.78 ± 0.09 | 65.42 ± 0.13 | 65.42 ± 0.13 |
![]() ![]() Qwen 2.5 32B Alibaba | 57.15 ± 0.15 | 60.30 ± 0.24 | 60.30 ± 0.24 |
![]() ![]() SEA-LION v3 (Gemma 2) 9B AISG | 56.15 ± 0.20 | 51.98 ± 0.40 | 51.98 ± 0.40 |
![]() ![]() Llama 3.1 70B Meta | 56.10 ± 0.17 | 64.98 ± 0.42 | 64.98 ± 0.42 |
![]() ![]() Gemma 2 27B | 55.33 ± 0.18 | 56.81 ± 0.39 | 56.81 ± 0.39 |
![]() ![]() Qwen 3 8B Alibaba | 54.80 ± 0.15 | 50.49 ± 0.30 | 50.49 ± 0.30 |
![]() ![]() Aya Expanse 32B CohereLabs | 54.04 ± 0.15 | 48.84 ± 0.30 | 48.84 ± 0.30 |
![]() ![]() Qwen 2.5 14B Alibaba | 52.22 ± 0.15 | 53.53 ± 0.11 | 53.53 ± 0.11 |
![]() ![]() Mistral Large 2411 123B Mistral AI | 52.21 ± 0.20 | 54.58 ± 0.61 | 54.58 ± 0.61 |
![]() ![]() Gemma 2 9B | 51.44 ± 0.17 | 50.98 ± 0.26 | 50.98 ± 0.26 |
![]() ![]() MERaLiON 2 10B A*STAR | 49.85 ± 0.20 | 48.46 ± 0.42 | 48.46 ± 0.42 |
![]() ![]() SEA-LION v3 (Llama) 8B AISG | 49.00 ± 0.24 | 42.89 ± 0.41 | 42.89 ± 0.41 |
![]() ![]() Command R+ 08-2024 104B CohereLabs | 48.97 ± 0.21 | 45.84 ± 0.67 | 45.84 ± 0.67 |
![]() ![]() Qwen 2.5 7B Alibaba | 47.19 ± 0.13 | 50.55 ± 0.17 | 50.55 ± 0.17 |
![]() ![]() Mistral Small 3.1 2503 24B Mistral AI | 46.52 ± 0.24 | 52.36 ± 0.61 | 52.36 ± 0.61 |
![]() ![]() Aya Expanse 8B CohereLabs | 45.78 ± 0.17 | 42.22 ± 0.29 | 42.22 ± 0.29 |
![]() ![]() Llama 3 70B Meta | 45.09 ± 0.11 | 57.75 ± 0.17 | 57.75 ± 0.17 |
![]() ![]() Tulu 3 8B AI2 | 43.13 ± 0.23 | 34.11 ± 0.40 | 34.11 ± 0.40 |
![]() ![]() Command R 08-2024 32B CohereLabs | 42.34 ± 0.25 | 42.11 ± 0.62 | 42.11 ± 0.62 |
![]() ![]() ERNIE 4.5 21B MoE Baidu | 40.86 ± 0.22 | 34.17 ± 0.44 | 34.17 ± 0.44 |
![]() ![]() Sailor2 8B SAIL | 40.67 ± 0.17 | 28.40 ± 0.54 | 28.40 ± 0.54 |
![]() ![]() Llama 3.1 8B Meta | 38.83 ± 0.20 | 32.48 ± 0.54 | 32.48 ± 0.54 |
![]() ![]() phi-4 14B Microsoft | 38.42 ± 0.23 | 51.68 ± 0.61 | 51.68 ± 0.61 |
![]() ![]() Apertus 70B Swiss AI | 38.12 ± 0.25 | 13.59 ± 0.53 | 13.59 ± 0.53 |
![]() ![]() Babel 83B Alibaba-DAMO | 37.19 ± 0.32 | 41.79 ± 0.83 | 41.79 ± 0.83 |
![]() ![]() Olmo 2 0325 32B AI2 | 36.77 ± 0.27 | 38.69 ± 0.63 | 38.69 ± 0.63 |
![]() ![]() Llama 3 8B Meta | 35.81 ± 0.15 | 35.79 ± 0.40 | 35.79 ± 0.40 |
![]() ![]() Apertus 8B Swiss AI | 34.32 ± 0.34 | 30.49 ± 0.77 | 30.49 ± 0.77 |
![]() ![]() Babel 9B Alibaba-DAMO | 34.19 ± 0.20 | 18.35 ± 0.62 | 18.35 ± 0.62 |
![]() ![]() SeaLLMs V3 7B Alibaba-DAMO | 30.44 ± 0.23 | 30.03 ± 0.88 | 30.03 ± 0.88 |
![]() ![]() Sailor2 20B SAIL | 29.04 ± 0.18 | 33.56 ± 0.27 | 33.56 ± 0.27 |
![]() ![]() Ministral 2410 8B Mistral AI | 25.51 ± 0.29 | 24.74 ± 0.83 | 24.74 ± 0.83 |
![]() ![]() Command R7B 12-2024 7B CohereLabs | 24.98 ± 0.31 | 16.04 ± 0.58 | 16.04 ± 0.58 |
![]() ![]() Olmo 2 1124 13B AI2 | 23.71 ± 0.25 | 8.99 ± 0.38 | 8.99 ± 0.38 |
![]() ![]() Olmo 2 1124 7B AI2 | 15.57 ± 0.22 | 6.30 ± 0.55 | 6.30 ± 0.55 |