Vietnamese Performance
Vietnamese Scores by Model
Average of 8 runs; 95% confidence intervals are shown. Filters: model size ≤200B, open instruct models only.
*(Bar chart: overall Vietnamese (VI) scores with 95% CIs, ranked from Qwen 3 30B MoE (72.49 ± 0.20) down to Olmo 2 1124 7B (30.65 ± 2.83); the same values appear in the VI column of the tables below.)*
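Each headline number is a mean over 8 runs with a 95% confidence interval. A minimal sketch of how such an interval can be computed, assuming a t-distribution half-width over the per-run scores (the leaderboard may instead use a bootstrap or a normal approximation):

```python
import math
import statistics

def mean_ci95(scores):
    """Mean and 95% CI half-width for a small sample of run scores."""
    n = len(scores)
    m = statistics.mean(scores)
    s = statistics.stdev(scores)  # sample standard deviation (ddof=1)
    t = 2.365                     # t critical value for 95% CI at df = 7 (n = 8)
    return m, t * s / math.sqrt(n)

# Hypothetical per-run scores for illustration only.
runs = [72.1, 72.6, 72.3, 72.8, 72.5, 72.4, 72.7, 72.5]
m, h = mean_ci95(runs)
print(f"{m:.2f} ± {h:.2f}")
```

The hardcoded critical value only applies to 8 runs; for other sample sizes the df changes accordingly.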
Vietnamese Competencies
Model | VI | Instruction Following | Multi-Turn Chat | NLG | NLR | NLU | Safety | Knowledge |
---|---|---|---|---|---|---|---|---|
Qwen 3 30B MoE Alibaba | 72.49 ± 0.20 | 88.10 ± 0.86 | 63.47 ± 1.08 | 53.51 ± 0.06 | 81.39 ± 0.11 | 79.07 ± 0.24 | 67.24 ± 0.72 | 74.66 ± 0.31 |
Qwen 3 32B Alibaba | 69.94 ± 0.38 | 87.26 ± 1.36 | 50.70 ± 1.81 | 54.89 ± 0.14 | 80.56 ± 0.57 | 73.52 ± 0.31 | 68.45 ± 0.80 | 74.22 ± 0.40 |
SEA-LION v3 (Llama) 70B AISG | 69.65 ± 0.55 | 92.14 ± 0.77 | 30.50 ± 0.84 | 56.24 ± 0.16 | 81.51 ± 1.80 | 79.40 ± 0.68 | 69.55 ± 2.83 | 78.19 ± 0.45 |
Command A 03-2025 111B CohereLabs | 69.10 ± 0.58 | 85.95 ± 1.57 | 42.51 ± 1.78 | 54.36 ± 0.35 | 76.52 ± 3.49 | 74.68 ± 0.53 | 77.05 ± 3.63 | 72.63 ± 0.92 |
Tulu 3 70B AI2 | 68.85 ± 0.24 | 84.17 ± 0.70 | 28.02 ± 1.07 | 57.64 ± 0.21 | 83.02 ± 0.68 | 77.25 ± 0.28 | 80.05 ± 2.32 | 71.78 ± 0.58 |
Qwen 2.5 72B Alibaba | 68.27 ± 0.29 | 89.05 ± 0.79 | 32.65 ± 1.20 | 54.27 ± 0.09 | 73.68 ± 0.78 | 77.63 ± 0.25 | 73.95 ± 0.75 | 76.63 ± 0.39 |
Qwen 3 14B Alibaba | 67.62 ± 0.15 | 89.64 ± 0.55 | 42.56 ± 1.25 | 54.85 ± 0.09 | 77.58 ± 0.18 | 78.17 ± 0.28 | 65.28 ± 0.62 | 65.22 ± 0.34 |
Llama 3.3 70B Meta | 67.36 ± 0.19 | 92.62 ± 0.58 | 18.97 ± 1.27 | 54.20 ± 0.13 | 84.13 ± 0.19 | 80.19 ± 0.16 | 66.48 ± 0.76 | 74.94 ± 0.22 |
SEA-LION v4 27B AISG | 66.10 ± 0.32 | 85.36 ± 0.70 | 45.74 ± 1.40 | 52.62 ± 0.57 | 81.18 ± 0.10 | 73.17 ± 0.47 | 51.85 ± 2.76 | 72.81 ± 1.19 |
Gemma 3 27B Google | 65.64 ± 0.42 | 85.36 ± 1.73 | 43.80 ± 1.64 | 51.67 ± 0.45 | 81.44 ± 0.08 | 72.62 ± 0.49 | 50.80 ± 2.10 | 73.78 ± 0.31 |
Qwen 2.5 32B Alibaba | 64.97 ± 0.30 | 87.74 ± 1.39 | 19.72 ± 1.49 | 52.29 ± 0.07 | 76.42 ± 0.28 | 77.07 ± 0.50 | 71.64 ± 0.20 | 69.88 ± 0.29 |
Qwen 3 8B Alibaba | 64.68 ± 0.24 | 85.24 ± 1.54 | 35.29 ± 0.92 | 53.97 ± 0.10 | 74.20 ± 0.78 | 78.82 ± 0.35 | 63.15 ± 1.29 | 62.09 ± 0.39 |
Gemma 3 12B Google | 64.39 ± 0.49 | 85.95 ± 1.35 | 39.49 ± 1.01 | 53.45 ± 0.07 | 83.04 ± 0.27 | 68.58 ± 0.57 | 53.01 ± 2.63 | 67.22 ± 0.39 |
SEA-LION v3 (Gemma 2) 9B AISG | 64.34 ± 0.42 | 83.21 ± 2.71 | 28.13 ± 1.39 | 54.59 ± 0.11 | 77.05 ± 0.35 | 78.51 ± 0.40 | 65.64 ± 1.93 | 63.28 ± 0.65 |
Llama 3.1 70B Meta | 63.96 ± 0.34 | 81.43 ± 1.37 | 14.71 ± 1.23 | 56.52 ± 0.27 | 82.38 ± 0.50 | 73.53 ± 0.74 | 65.86 ± 2.29 | 73.25 ± 1.50 |
Llama 4 Scout 109B MoE Meta | 63.73 ± 0.19 | 90.95 ± 0.93 | 22.47 ± 0.77 | 54.55 ± 0.07 | 71.44 ± 0.17 | 78.37 ± 0.26 | 54.81 ± 0.23 | 73.50 ± 0.26 |
Gemma 2 27B Google | 62.98 ± 0.47 | 80.36 ± 1.69 | 16.16 ± 1.13 | 54.82 ± 0.62 | 81.42 ± 0.14 | 78.02 ± 0.99 | 63.23 ± 3.26 | 66.88 ± 0.45 |
Mistral Large 2411 123B Mistral AI | 62.36 ± 2.29 | 85.36 ± 2.39 | 18.43 ± 0.98 | 54.47 ± 0.13 | 74.83 ± 3.22 | 74.20 ± 1.20 | 64.01 ± 11.26 | 65.22 ± 2.56 |
Aya Expanse 32B CohereLabs | 62.14 ± 0.31 | 77.86 ± 1.72 | 25.27 ± 0.69 | 54.86 ± 0.16 | 77.21 ± 0.18 | 75.06 ± 0.50 | 64.25 ± 0.96 | 60.47 ± 0.46 |
Command R+ 08-2024 104B CohereLabs | 60.93 ± 1.24 | 67.86 ± 2.75 | 10.18 ± 1.07 | 53.93 ± 0.24 | 73.51 ± 4.70 | 79.40 ± 1.30 | 83.33 ± 2.75 | 58.31 ± 1.92 |
Qwen 2.5 14B Alibaba | 60.02 ± 0.25 | 83.33 ± 1.83 | 18.53 ± 1.48 | 50.73 ± 0.09 | 70.54 ± 0.30 | 70.80 ± 0.45 | 61.30 ± 0.87 | 64.88 ± 0.31 |
Gemma 2 9B Google | 59.64 ± 0.51 | 75.36 ± 1.59 | 12.34 ± 0.73 | 54.08 ± 0.18 | 71.71 ± 0.27 | 78.08 ± 1.27 | 63.83 ± 1.93 | 62.09 ± 0.40 |
MERaLiON 2 10B A*STAR | 58.52 ± 0.61 | 75.12 ± 2.51 | 9.91 ± 0.89 | 53.57 ± 0.21 | 71.61 ± 0.42 | 76.33 ± 1.20 | 62.80 ± 2.69 | 60.31 ± 0.95 |
Qwen 2.5 7B Alibaba | 57.96 ± 0.27 | 74.05 ± 1.10 | 15.57 ± 1.06 | 43.47 ± 0.06 | 71.73 ± 0.42 | 67.15 ± 0.56 | 71.61 ± 0.79 | 62.16 ± 0.16 |
SEA-LION v3 (Llama) 8B AISG | 57.87 ± 0.63 | 80.12 ± 1.34 | 24.46 ± 1.82 | 54.03 ± 0.19 | 70.48 ± 2.58 | 72.84 ± 1.29 | 47.14 ± 3.99 | 56.00 ± 0.44 |
Aya Expanse 8B CohereLabs | 57.18 ± 0.42 | 66.19 ± 1.58 | 19.45 ± 1.08 | 52.27 ± 0.47 | 71.18 ± 2.26 | 74.72 ± 0.28 | 61.19 ± 1.43 | 55.25 ± 0.46 |
Llama 3 70B Meta | 55.14 ± 0.22 | 18.45 ± 0.86 | 8.94 ± 0.83 | 55.52 ± 0.19 | 80.25 ± 0.39 | 81.75 ± 0.28 | 73.47 ± 0.71 | 67.59 ± 0.23 |
Tulu 3 8B AI2 | 54.80 ± 0.61 | 81.79 ± 1.29 | 13.47 ± 1.20 | 36.41 ± 2.27 | 71.91 ± 2.10 | 66.58 ± 1.02 | 64.46 ± 2.32 | 48.97 ± 1.02 |
ERNIE 4.5 21B MoE Baidu | 54.78 ± 3.78 | 74.88 ± 3.05 | 17.03 ± 1.53 | 48.75 ± 0.53 | 62.83 ± 5.34 | 59.59 ± 5.99 | 70.96 ± 12.24 | 49.44 ± 5.29 |
Mistral Small 3.1 2503 24B Mistral AI | 54.16 ± 1.61 | 72.14 ± 1.53 | 10.18 ± 0.68 | 49.06 ± 1.26 | 72.39 ± 2.79 | 65.26 ± 5.77 | 46.77 ± 10.97 | 63.31 ± 1.93 |
Command R 08-2024 32B CohereLabs | 53.76 ± 3.73 | 61.19 ± 3.30 | 6.14 ± 0.90 | 53.35 ± 0.80 | 69.81 ± 5.12 | 72.09 ± 8.33 | 58.05 ± 12.11 | 55.69 ± 2.96 |
Llama 3.1 8B Meta | 51.21 ± 0.62 | 71.07 ± 1.41 | 10.67 ± 0.87 | 52.44 ± 0.35 | 53.67 ± 2.40 | 76.97 ± 0.47 | 45.69 ± 2.17 | 47.94 ± 1.24 |
Babel 83B Alibaba-DAMO | 50.39 ± 5.33 | 45.24 ± 3.70 | 8.62 ± 0.77 | 44.20 ± 1.47 | 65.09 ± 13.42 | 65.43 ± 3.52 | 68.44 ± 10.58 | 55.72 ± 17.07 |
Olmo 2 0325 32B AI2 | 49.35 ± 0.89 | 70.71 ± 2.17 | 6.14 ± 0.63 | 35.74 ± 2.35 | 55.36 ± 1.59 | 65.95 ± 1.80 | 58.94 ± 4.79 | 52.63 ± 1.63 |
phi-4 14B Microsoft | 49.25 ± 2.75 | 67.26 ± 2.78 | 19.50 ± 1.17 | 44.72 ± 0.46 | 70.52 ± 5.33 | 53.30 ± 7.18 | 26.80 ± 13.00 | 62.63 ± 1.35 |
Sailor2 8B SAIL | 48.59 ± 0.86 | 41.90 ± 2.03 | 27.75 ± 1.33 | 49.96 ± 0.45 | 78.41 ± 1.02 | 57.72 ± 4.24 | 41.23 ± 3.84 | 43.19 ± 4.78 |
Llama 3 8B Meta | 47.65 ± 0.63 | 22.02 ± 2.32 | 6.09 ± 1.16 | 52.46 ± 0.13 | 65.41 ± 2.69 | 76.36 ± 0.50 | 60.88 ± 2.03 | 50.31 ± 0.46 |
Olmo 2 1124 13B AI2 | 45.08 ± 1.53 | 62.14 ± 3.19 | 3.18 ± 0.45 | 32.62 ± 0.81 | 49.08 ± 4.31 | 60.62 ± 4.51 | 79.15 ± 3.18 | 28.75 ± 0.81 |
Babel 9B Alibaba-DAMO | 44.97 ± 2.17 | 43.45 ± 2.34 | 5.01 ± 0.97 | 38.88 ± 3.43 | 63.40 ± 4.58 | 70.25 ± 5.75 | 57.20 ± 7.83 | 36.56 ± 7.64 |
SeaLLMs V3 7B Alibaba-DAMO | 41.91 ± 3.76 | 52.02 ± 2.66 | 10.88 ± 1.21 | 41.83 ± 1.93 | 58.92 ± 6.58 | 58.33 ± 9.14 | 25.10 ± 17.37 | 46.28 ± 4.66 |
Ministral 2410 8B Mistral AI | 37.71 ± 2.67 | 36.19 ± 3.17 | 3.93 ± 0.65 | 32.11 ± 1.95 | 51.44 ± 3.57 | 55.31 ± 4.48 | 42.56 ± 15.15 | 42.41 ± 4.09 |
Command R7B 12-2024 7B CohereLabs | 36.11 ± 6.91 | 52.74 ± 5.22 | 2.32 ± 0.68 | 33.83 ± 1.71 | 53.00 ± 17.03 | 58.56 ± 10.28 | 18.24 ± 5.57 | 34.06 ± 14.13 |
Sailor2 20B SAIL | 33.92 ± 1.13 | 45.83 ± 1.88 | 25.81 ± 1.15 | 34.87 ± 1.21 | 46.39 ± 0.26 | 29.14 ± 4.40 | 6.50 ± 3.62 | 48.88 ± 1.85 |
Olmo 2 1124 7B AI2 | 30.65 ± 2.83 | 50.24 ± 1.21 | 2.37 ± 0.70 | 28.57 ± 0.34 | 32.91 ± 5.60 | 41.34 ± 0.69 | 35.06 ± 17.59 | 24.06 ± 2.44 |
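The overall VI score appears to be the unweighted mean of the seven competency scores; this can be checked against the table's first row (Qwen 3 30B MoE):

```python
# Competency scores for Qwen 3 30B MoE, taken from the table above.
competencies = {
    "Instruction Following": 88.10,
    "Multi-Turn Chat": 63.47,
    "NLG": 53.51,
    "NLR": 81.39,
    "NLU": 79.07,
    "Safety": 67.24,
    "Knowledge": 74.66,
}

# The unweighted mean reproduces the reported VI score.
vi = sum(competencies.values()) / len(competencies)
print(round(vi, 2))  # → 72.49
```

This aggregation is inferred from the published numbers, not from a stated formula, so treat it as a consistency check rather than the official definition.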
Vietnamese Tasks
Model | VI | Instruction Following | SEA-IFEval |
---|---|---|---|
Qwen 3 30B MoE Alibaba | 72.49 ± 0.20 | 88.10 ± 0.86 | 88.10 ± 5.53 |
Qwen 3 32B Alibaba | 69.94 ± 0.38 | 87.26 ± 1.36 | 87.26 ± 5.38 |
SEA-LION v3 (Llama) 70B AISG | 69.65 ± 0.55 | 92.14 ± 0.77 | 92.14 ± 4.02 |
Command A 03-2025 111B CohereLabs | 69.10 ± 0.58 | 85.95 ± 1.57 | 85.95 ± 5.37 |
Tulu 3 70B AI2 | 68.85 ± 0.24 | 84.17 ± 0.70 | 84.17 ± 5.81 |
Qwen 2.5 72B Alibaba | 68.27 ± 0.29 | 89.05 ± 0.79 | 89.05 ± 5.21 |
Qwen 3 14B Alibaba | 67.62 ± 0.15 | 89.64 ± 0.55 | 89.64 ± 5.04 |
Llama 3.3 70B Meta | 67.36 ± 0.19 | 92.62 ± 0.58 | 92.62 ± 4.50 |
SEA-LION v4 27B AISG | 66.10 ± 0.32 | 85.36 ± 0.70 | 85.36 ± 5.68 |
Gemma 3 27B Google | 65.64 ± 0.42 | 85.36 ± 1.73 | 85.36 ± 5.69 |
Qwen 2.5 32B Alibaba | 64.97 ± 0.30 | 87.74 ± 1.39 | 87.74 ± 5.31 |
Qwen 3 8B Alibaba | 64.68 ± 0.24 | 85.24 ± 1.54 | 85.24 ± 5.86 |
Gemma 3 12B Google | 64.39 ± 0.49 | 85.95 ± 1.35 | 85.95 ± 5.64 |
SEA-LION v3 (Gemma 2) 9B AISG | 64.34 ± 0.42 | 83.21 ± 2.71 | 83.21 ± 5.20 |
Llama 3.1 70B Meta | 63.96 ± 0.34 | 81.43 ± 1.37 | 81.43 ± 5.72 |
Llama 4 Scout 109B MoE Meta | 63.73 ± 0.19 | 90.95 ± 0.93 | 90.95 ± 5.01 |
Gemma 2 27B Google | 62.98 ± 0.47 | 80.36 ± 1.69 | 80.36 ± 6.04 |
Mistral Large 2411 123B Mistral AI | 62.36 ± 2.29 | 85.36 ± 2.39 | 85.36 ± 4.95 |
Aya Expanse 32B CohereLabs | 62.14 ± 0.31 | 77.86 ± 1.72 | 77.86 ± 6.92 |
Command R+ 08-2024 104B CohereLabs | 60.93 ± 1.24 | 67.86 ± 2.75 | 67.86 ± 6.43 |
Qwen 2.5 14B Alibaba | 60.02 ± 0.25 | 83.33 ± 1.83 | 83.33 ± 5.90 |
Gemma 2 9B Google | 59.64 ± 0.51 | 75.36 ± 1.59 | 75.36 ± 5.98 |
MERaLiON 2 10B A*STAR | 58.52 ± 0.61 | 75.12 ± 2.51 | 75.12 ± 6.00 |
Qwen 2.5 7B Alibaba | 57.96 ± 0.27 | 74.05 ± 1.10 | 74.05 ± 6.95 |
SEA-LION v3 (Llama) 8B AISG | 57.87 ± 0.63 | 80.12 ± 1.34 | 80.12 ± 6.22 |
Aya Expanse 8B CohereLabs | 57.18 ± 0.42 | 66.19 ± 1.58 | 66.19 ± 7.78 |
Llama 3 70B Meta | 55.14 ± 0.22 | 18.45 ± 0.86 | 18.45 ± 6.79 |
Tulu 3 8B AI2 | 54.80 ± 0.61 | 81.79 ± 1.29 | 81.79 ± 6.40 |
ERNIE 4.5 21B MoE Baidu | 54.78 ± 3.78 | 74.88 ± 3.05 | 74.88 ± 6.77 |
Mistral Small 3.1 2503 24B Mistral AI | 54.16 ± 1.61 | 72.14 ± 1.53 | 72.14 ± 5.69 |
Command R 08-2024 32B CohereLabs | 53.76 ± 3.73 | 61.19 ± 3.30 | 61.19 ± 7.01 |
Llama 3.1 8B Meta | 51.21 ± 0.62 | 71.07 ± 1.41 | 71.07 ± 7.04 |
Babel 83B Alibaba-DAMO | 50.39 ± 5.33 | 45.24 ± 3.70 | 45.24 ± 6.13 |
Olmo 2 0325 32B AI2 | 49.35 ± 0.89 | 70.71 ± 2.17 | 70.71 ± 6.77 |
phi-4 14B Microsoft | 49.25 ± 2.75 | 67.26 ± 2.78 | 67.26 ± 6.66 |
Sailor2 8B SAIL | 48.59 ± 0.86 | 41.90 ± 2.03 | 41.90 ± 8.25 |
Llama 3 8B Meta | 47.65 ± 0.63 | 22.02 ± 2.32 | 22.02 ± 7.04 |
Olmo 2 1124 13B AI2 | 45.08 ± 1.53 | 62.14 ± 3.19 | 62.14 ± 7.30 |
Babel 9B Alibaba-DAMO | 44.97 ± 2.17 | 43.45 ± 2.34 | 43.45 ± 7.38 |
SeaLLMs V3 7B Alibaba-DAMO | 41.91 ± 3.76 | 52.02 ± 2.66 | 52.02 ± 7.65 |
Ministral 2410 8B Mistral AI | 37.71 ± 2.67 | 36.19 ± 3.17 | 36.19 ± 6.23 |
Command R7B 12-2024 7B CohereLabs | 36.11 ± 6.91 | 52.74 ± 5.22 | 52.74 ± 5.98 |
Sailor2 20B SAIL | 33.92 ± 1.13 | 45.83 ± 1.88 | 45.83 ± 8.40 |
Olmo 2 1124 7B AI2 | 30.65 ± 2.83 | 50.24 ± 1.21 | 50.24 ± 7.29 |
Model | VI | Multi-Turn Chat | SEA-MT-Bench |
---|---|---|---|
Qwen 3 30B MoE Alibaba | 72.49 ± 0.20 | 63.47 ± 1.08 | 63.47 ± 5.33 |
Qwen 3 32B Alibaba | 69.94 ± 0.38 | 50.70 ± 1.81 | 50.70 ± 4.89 |
SEA-LION v3 (Llama) 70B AISG | 69.65 ± 0.55 | 30.50 ± 0.84 | 30.50 ± 4.83 |
Command A 03-2025 111B CohereLabs | 69.10 ± 0.58 | 42.51 ± 1.78 | 42.51 ± 4.37 |
Tulu 3 70B AI2 | 68.85 ± 0.24 | 28.02 ± 1.07 | 28.02 ± 4.76 |
Qwen 2.5 72B Alibaba | 68.27 ± 0.29 | 32.65 ± 1.20 | 32.65 ± 4.60 |
Qwen 3 14B Alibaba | 67.62 ± 0.15 | 42.56 ± 1.25 | 42.56 ± 5.19 |
Llama 3.3 70B Meta | 67.36 ± 0.19 | 18.97 ± 1.27 | 18.97 ± 4.47 |
SEA-LION v4 27B AISG | 66.10 ± 0.32 | 45.74 ± 1.40 | 45.74 ± 5.41 |
Gemma 3 27B Google | 65.64 ± 0.42 | 43.80 ± 1.64 | 43.80 ± 5.66 |
Qwen 2.5 32B Alibaba | 64.97 ± 0.30 | 19.72 ± 1.49 | 19.72 ± 4.48 |
Qwen 3 8B Alibaba | 64.68 ± 0.24 | 35.29 ± 0.92 | 35.29 ± 5.08 |
Gemma 3 12B Google | 64.39 ± 0.49 | 39.49 ± 1.01 | 39.49 ± 5.74 |
SEA-LION v3 (Gemma 2) 9B AISG | 64.34 ± 0.42 | 28.13 ± 1.39 | 28.13 ± 4.96 |
Llama 3.1 70B Meta | 63.96 ± 0.34 | 14.71 ± 1.23 | 14.71 ± 3.99 |
Llama 4 Scout 109B MoE Meta | 63.73 ± 0.19 | 22.47 ± 0.77 | 22.47 ± 4.98 |
Gemma 2 27B Google | 62.98 ± 0.47 | 16.16 ± 1.13 | 16.16 ± 3.92 |
Mistral Large 2411 123B Mistral AI | 62.36 ± 2.29 | 18.43 ± 0.98 | 18.43 ± 4.41 |
Aya Expanse 32B CohereLabs | 62.14 ± 0.31 | 25.27 ± 0.69 | 25.27 ± 5.01 |
Command R+ 08-2024 104B CohereLabs | 60.93 ± 1.24 | 10.18 ± 1.07 | 10.18 ± 3.40 |
Qwen 2.5 14B Alibaba | 60.02 ± 0.25 | 18.53 ± 1.48 | 18.53 ± 4.42 |
Gemma 2 9B Google | 59.64 ± 0.51 | 12.34 ± 0.73 | 12.34 ± 3.62 |
MERaLiON 2 10B A*STAR | 58.52 ± 0.61 | 9.91 ± 0.89 | 9.91 ± 3.59 |
Qwen 2.5 7B Alibaba | 57.96 ± 0.27 | 15.57 ± 1.06 | 15.57 ± 4.11 |
SEA-LION v3 (Llama) 8B AISG | 57.87 ± 0.63 | 24.46 ± 1.82 | 24.46 ± 4.98 |
Aya Expanse 8B CohereLabs | 57.18 ± 0.42 | 19.45 ± 1.08 | 19.45 ± 4.46 |
Llama 3 70B Meta | 55.14 ± 0.22 | 8.94 ± 0.83 | 8.94 ± 2.97 |
Tulu 3 8B AI2 | 54.80 ± 0.61 | 13.47 ± 1.20 | 13.47 ± 3.54 |
ERNIE 4.5 21B MoE Baidu | 54.78 ± 3.78 | 17.03 ± 1.53 | 17.03 ± 4.25 |
Mistral Small 3.1 2503 24B Mistral AI | 54.16 ± 1.61 | 10.18 ± 0.68 | 10.18 ± 3.28 |
Command R 08-2024 32B CohereLabs | 53.76 ± 3.73 | 6.14 ± 0.90 | 6.14 ± 2.54 |
Llama 3.1 8B Meta | 51.21 ± 0.62 | 10.67 ± 0.87 | 10.67 ± 3.69 |
Babel 83B Alibaba-DAMO | 50.39 ± 5.33 | 8.62 ± 0.77 | 8.62 ± 2.21 |
Olmo 2 0325 32B AI2 | 49.35 ± 0.89 | 6.14 ± 0.63 | 6.14 ± 2.65 |
phi-4 14B Microsoft | 49.25 ± 2.75 | 19.50 ± 1.17 | 19.50 ± 4.40 |
Sailor2 8B SAIL | 48.59 ± 0.86 | 27.75 ± 1.33 | 27.75 ± 5.08 |
Llama 3 8B Meta | 47.65 ± 0.63 | 6.09 ± 1.16 | 6.09 ± 2.50 |
Olmo 2 1124 13B AI2 | 45.08 ± 1.53 | 3.18 ± 0.45 | 3.18 ± 1.73 |
Babel 9B Alibaba-DAMO | 44.97 ± 2.17 | 5.01 ± 0.97 | 5.01 ± 1.90 |
SeaLLMs V3 7B Alibaba-DAMO | 41.91 ± 3.76 | 10.88 ± 1.21 | 10.88 ± 3.00 |
Ministral 2410 8B Mistral AI | 37.71 ± 2.67 | 3.93 ± 0.65 | 3.93 ± 2.21 |
Command R7B 12-2024 7B CohereLabs | 36.11 ± 6.91 | 2.32 ± 0.68 | 2.32 ± 1.51 |
Sailor2 20B SAIL | 33.92 ± 1.13 | 25.81 ± 1.15 | 25.81 ± 4.57 |
Olmo 2 1124 7B AI2 | 30.65 ± 2.83 | 2.37 ± 0.70 | 2.37 ± 1.36 |
Model | VI | NLG | Summarization | Translations |
---|---|---|---|---|
Qwen 3 30B MoE Alibaba | 72.49 ± 0.20 | 53.51 ± 0.06 | 14.83 ± 1.38 | 92.19 ± 0.03 |
Qwen 3 32B Alibaba | 69.94 ± 0.38 | 54.89 ± 0.14 | 17.43 ± 1.73 | 92.35 ± 0.05 |
SEA-LION v3 (Llama) 70B AISG | 69.65 ± 0.55 | 56.24 ± 0.16 | 19.72 ± 1.85 | 92.75 ± 0.03 |
Command A 03-2025 111B CohereLabs | 69.10 ± 0.58 | 54.36 ± 0.35 | 16.62 ± 1.62 | 92.10 ± 0.57 |
Tulu 3 70B AI2 | 68.85 ± 0.24 | 57.64 ± 0.21 | 22.63 ± 2.30 | 92.66 ± 0.04 |
Qwen 2.5 72B Alibaba | 68.27 ± 0.29 | 54.27 ± 0.09 | 16.09 ± 1.57 | 92.46 ± 0.05 |
Qwen 3 14B Alibaba | 67.62 ± 0.15 | 54.85 ± 0.09 | 17.77 ± 1.72 | 91.94 ± 0.03 |
Llama 3.3 70B Meta | 67.36 ± 0.19 | 54.20 ± 0.13 | 17.12 ± 1.67 | 91.29 ± 0.06 |
SEA-LION v4 27B AISG | 66.10 ± 0.32 | 52.62 ± 0.57 | 14.73 ± 1.35 | 90.51 ± 1.13 |
Gemma 3 27B Google | 65.64 ± 0.42 | 51.67 ± 0.45 | 14.76 ± 1.37 | 88.58 ± 0.80 |
Qwen 2.5 32B Alibaba | 64.97 ± 0.30 | 52.29 ± 0.07 | 14.81 ± 1.46 | 89.78 ± 0.07 |
Qwen 3 8B Alibaba | 64.68 ± 0.24 | 53.97 ± 0.10 | 16.92 ± 1.65 | 91.02 ± 0.03 |
Gemma 3 12B Google | 64.39 ± 0.49 | 53.45 ± 0.07 | 14.32 ± 1.20 | 92.58 ± 0.03 |
SEA-LION v3 (Gemma 2) 9B AISG | 64.34 ± 0.42 | 54.59 ± 0.11 | 17.31 ± 1.63 | 91.88 ± 0.10 |
Llama 3.1 70B Meta | 63.96 ± 0.34 | 56.52 ± 0.27 | 21.54 ± 2.06 | 91.51 ± 0.04 |
Llama 4 Scout 109B MoE Meta | 63.73 ± 0.19 | 54.55 ± 0.07 | 17.46 ± 1.63 | 91.63 ± 0.03 |
Gemma 2 27B Google | 62.98 ± 0.47 | 54.82 ± 0.62 | 18.93 ± 1.75 | 90.70 ± 1.14 |
Mistral Large 2411 123B Mistral AI | 62.36 ± 2.29 | 54.47 ± 0.13 | 17.65 ± 1.61 | 91.30 ± 0.18 |
Aya Expanse 32B CohereLabs | 62.14 ± 0.31 | 54.86 ± 0.16 | 17.45 ± 1.60 | 92.27 ± 0.04 |
Command R+ 08-2024 104B CohereLabs | 60.93 ± 1.24 | 53.93 ± 0.24 | 16.91 ± 1.38 | 90.96 ± 0.18 |
Qwen 2.5 14B Alibaba | 60.02 ± 0.25 | 50.73 ± 0.09 | 15.19 ± 1.54 | 86.27 ± 0.14 |
Gemma 2 9B Google | 59.64 ± 0.51 | 54.08 ± 0.18 | 17.80 ± 1.70 | 90.35 ± 0.20 |
MERaLiON 2 10B A*STAR | 58.52 ± 0.61 | 53.57 ± 0.21 | 17.53 ± 1.53 | 89.60 ± 0.18 |
Qwen 2.5 7B Alibaba | 57.96 ± 0.27 | 43.47 ± 0.06 | 0.16 ± 0.32 | 86.78 ± 0.11 |
SEA-LION v3 (Llama) 8B AISG | 57.87 ± 0.63 | 54.03 ± 0.19 | 16.52 ± 1.45 | 91.55 ± 0.10 |
Aya Expanse 8B CohereLabs | 57.18 ± 0.42 | 52.27 ± 0.47 | 15.10 ± 1.43 | 89.45 ± 0.95 |
Llama 3 70B Meta | 55.14 ± 0.22 | 55.52 ± 0.19 | 19.88 ± 1.98 | 91.15 ± 0.05 |
Tulu 3 8B AI2 | 54.80 ± 0.61 | 36.41 ± 2.27 | 14.51 ± 1.39 | 58.32 ± 4.65 |
ERNIE 4.5 21B MoE Baidu | 54.78 ± 3.78 | 48.75 ± 0.53 | 8.43 ± 1.19 | 89.07 ± 0.34 |
Mistral Small 3.1 2503 24B Mistral AI | 54.16 ± 1.61 | 49.06 ± 1.26 | 14.54 ± 1.34 | 83.58 ± 0.99 |
Command R 08-2024 32B CohereLabs | 53.76 ± 3.73 | 53.35 ± 0.80 | 17.21 ± 1.44 | 89.48 ± 1.34 |
Llama 3.1 8B Meta | 51.21 ± 0.62 | 52.44 ± 0.35 | 17.48 ± 1.52 | 87.40 ± 0.68 |
Babel 83B Alibaba-DAMO | 50.39 ± 5.33 | 44.20 ± 1.47 | 14.77 ± 1.34 | 73.63 ± 2.42 |
Olmo 2 0325 32B AI2 | 49.35 ± 0.89 | 35.74 ± 2.35 | 14.29 ± 1.19 | 57.19 ± 4.66 |
phi-4 14B Microsoft | 49.25 ± 2.75 | 44.72 ± 0.46 | 12.49 ± 1.06 | 76.96 ± 0.84 |
Sailor2 8B SAIL | 48.59 ± 0.86 | 49.96 ± 0.45 | 14.34 ± 1.36 | 85.58 ± 0.71 |
Llama 3 8B Meta | 47.65 ± 0.63 | 52.46 ± 0.13 | 17.40 ± 1.61 | 87.51 ± 0.14 |
Olmo 2 1124 13B AI2 | 45.08 ± 1.53 | 32.62 ± 0.81 | 10.70 ± 0.87 | 54.55 ± 1.48 |
Babel 9B Alibaba-DAMO | 44.97 ± 2.17 | 38.88 ± 3.43 | 8.63 ± 0.83 | 69.14 ± 6.14 |
SeaLLMs V3 7B Alibaba-DAMO | 41.91 ± 3.76 | 41.83 ± 1.93 | 11.07 ± 1.11 | 72.59 ± 4.00 |
Ministral 2410 8B Mistral AI | 37.71 ± 2.67 | 32.11 ± 1.95 | 7.35 ± 0.91 | 56.87 ± 3.03 |
Command R7B 12-2024 7B CohereLabs | 36.11 ± 6.91 | 33.83 ± 1.71 | 8.22 ± 0.98 | 59.44 ± 2.76 |
Sailor2 20B SAIL | 33.92 ± 1.13 | 34.87 ± 1.21 | 15.30 ± 1.42 | 54.45 ± 2.43 |
Olmo 2 1124 7B AI2 | 30.65 ± 2.83 | 28.57 ± 0.34 | 10.54 ± 0.82 | 46.60 ± 0.78 |
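Each competency score in turn appears to be the unweighted mean of its task scores. For example, NLG for Qwen 3 30B MoE from the table above:

```python
# Task scores for Qwen 3 30B MoE (NLG competency), from the table above.
summarization = 14.83
translations = 92.19

# Their unweighted mean reproduces the reported NLG score.
nlg = (summarization + translations) / 2
print(round(nlg, 2))  # → 53.51
```

As with the overall VI score, this is inferred from the published numbers rather than taken from a stated formula.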
Model | VI | NLR | Causal Reasoning | Natural Language Inference |
---|---|---|---|---|
Qwen 3 30B MoE Alibaba | 72.49 ± 0.20 | 81.39 ± 0.11 | 92.90 ± 2.24 | 69.88 ± 2.76 |
Qwen 3 32B Alibaba | 69.94 ± 0.38 | 80.56 ± 0.57 | 91.85 ± 2.32 | 69.27 ± 2.72 |
SEA-LION v3 (Llama) 70B AISG | 69.65 ± 0.55 | 81.51 ± 1.80 | 94.85 ± 1.76 | 68.17 ± 2.49 |
Command A 03-2025 111B CohereLabs | 69.10 ± 0.58 | 76.52 ± 3.49 | 96.38 ± 1.58 | 56.66 ± 2.59 |
Tulu 3 70B AI2 | 68.85 ± 0.24 | 83.02 ± 0.68 | 94.10 ± 1.99 | 71.95 ± 2.61 |
Qwen 2.5 72B Alibaba | 68.27 ± 0.29 | 73.68 ± 0.78 | 94.88 ± 1.87 | 52.49 ± 2.99 |
Qwen 3 14B Alibaba | 67.62 ± 0.15 | 77.58 ± 0.18 | 89.80 ± 2.58 | 65.36 ± 2.90 |
Llama 3.3 70B Meta | 67.36 ± 0.19 | 84.13 ± 0.19 | 94.80 ± 1.90 | 73.46 ± 2.64 |
SEA-LION v4 27B AISG | 66.10 ± 0.32 | 81.18 ± 0.10 | 92.70 ± 2.20 | 69.66 ± 2.76 |
Gemma 3 27B Google | 65.64 ± 0.42 | 81.44 ± 0.08 | 92.88 ± 2.23 | 70.01 ± 2.78 |
Qwen 2.5 32B Alibaba | 64.97 ± 0.30 | 76.42 ± 0.28 | 94.70 ± 1.95 | 58.15 ± 3.01 |
Qwen 3 8B Alibaba | 64.68 ± 0.24 | 74.20 ± 0.78 | 83.35 ± 3.09 | 65.05 ± 2.69 |
Gemma 3 12B Google | 64.39 ± 0.49 | 83.04 ± 0.27 | 95.28 ± 1.77 | 70.80 ± 2.75 |
SEA-LION v3 (Gemma 2) 9B AISG | 64.34 ± 0.42 | 77.05 ± 0.35 | 92.92 ± 2.14 | 61.18 ± 2.92 |
Llama 3.1 70B Meta | 63.96 ± 0.34 | 82.38 ± 0.50 | 93.67 ± 2.03 | 71.09 ± 2.37 |
Llama 4 Scout 109B MoE Meta | 63.73 ± 0.19 | 71.44 ± 0.17 | 92.85 ± 2.23 | 50.04 ± 3.08 |
Gemma 2 27B Google | 62.98 ± 0.47 | 81.42 ± 0.14 | 93.23 ± 2.12 | 69.61 ± 2.76 |
Mistral Large 2411 123B Mistral AI | 62.36 ± 2.29 | 74.83 ± 3.22 | 91.30 ± 1.92 | 58.35 ± 2.35 |
Aya Expanse 32B CohereLabs | 62.14 ± 0.31 | 77.21 ± 0.18 | 94.73 ± 1.91 | 59.70 ± 3.00 |
Command R+ 08-2024 104B CohereLabs | 60.93 ± 1.24 | 73.51 ± 4.70 | 86.35 ± 1.92 | 60.68 ± 2.34 |
Qwen 2.5 14B Alibaba | 60.02 ± 0.25 | 70.54 ± 0.30 | 92.27 ± 2.29 | 48.80 ± 3.05 |
Gemma 2 9B Google | 59.64 ± 0.51 | 71.71 ± 0.27 | 91.72 ± 2.27 | 51.70 ± 3.03 |
MERaLiON 2 10B A*STAR | 58.52 ± 0.61 | 71.61 ± 0.42 | 91.92 ± 2.26 | 51.30 ± 2.99 |
Qwen 2.5 7B Alibaba | 57.96 ± 0.27 | 71.73 ± 0.42 | 85.22 ± 3.10 | 58.24 ± 2.96 |
SEA-LION v3 (Llama) 8B AISG | 57.87 ± 0.63 | 70.48 ± 2.58 | 88.70 ± 2.41 | 52.25 ± 2.51 |
Aya Expanse 8B CohereLabs | 57.18 ± 0.42 | 71.18 ± 2.26 | 88.22 ± 2.75 | 54.14 ± 2.75 |
Llama 3 70B Meta | 55.14 ± 0.22 | 80.25 ± 0.39 | 92.95 ± 2.20 | 67.55 ± 2.84 |
Tulu 3 8B AI2 | 54.80 ± 0.61 | 71.91 ± 2.10 | 83.13 ± 2.94 | 60.69 ± 2.19 |
ERNIE 4.5 21B MoE Baidu | 54.78 ± 3.78 | 62.83 ± 5.34 | 78.45 ± 2.74 | 47.21 ± 2.52 |
Mistral Small 3.1 2503 24B Mistral AI | 54.16 ± 1.61 | 72.39 ± 2.79 | 89.13 ± 2.34 | 55.65 ± 2.57 |
Command R 08-2024 32B CohereLabs | 53.76 ± 3.73 | 69.81 ± 5.12 | 89.20 ± 1.89 | 50.42 ± 2.32 |
Llama 3.1 8B Meta | 51.21 ± 0.62 | 53.67 ± 2.40 | 53.75 ± 4.10 | 53.59 ± 2.36 |
Babel 83B Alibaba-DAMO | 50.39 ± 5.33 | 65.09 ± 13.42 | 84.52 ± 1.37 | 45.66 ± 1.47 |
Olmo 2 0325 32B AI2 | 49.35 ± 0.89 | 55.36 ± 1.59 | 76.72 ± 2.54 | 33.99 ± 2.92 |
phi-4 14B Microsoft | 49.25 ± 2.75 | 70.52 ± 5.33 | 87.42 ± 2.19 | 53.61 ± 2.46 |
Sailor2 8B SAIL | 48.59 ± 0.86 | 78.41 ± 1.02 | 89.25 ± 2.39 | 67.56 ± 2.46 |
Llama 3 8B Meta | 47.65 ± 0.63 | 65.41 ± 2.69 | 75.30 ± 3.32 | 55.51 ± 2.70 |
Olmo 2 1124 13B AI2 | 45.08 ± 1.53 | 49.08 ± 4.31 | 55.23 ± 2.07 | 42.93 ± 2.37 |
Babel 9B Alibaba-DAMO | 44.97 ± 2.17 | 63.40 ± 4.58 | 83.28 ± 2.26 | 43.53 ± 1.52 |
SeaLLMs V3 7B Alibaba-DAMO | 41.91 ± 3.76 | 58.92 ± 6.58 | 75.17 ± 2.32 | 42.66 ± 1.39 |
Ministral 2410 8B Mistral AI | 37.71 ± 2.67 | 51.44 ± 3.57 | 61.15 ± 2.10 | 41.74 ± 0.97 |
Command R7B 12-2024 7B CohereLabs | 36.11 ± 6.91 | 53.00 ± 17.03 | 59.85 ± 1.70 | 46.15 ± 1.87 |
Sailor2 20B SAIL | 33.92 ± 1.13 | 46.39 ± 0.26 | 92.77 ± 2.21 | 0.00 ± 0.00 |
Olmo 2 1124 7B AI2 | 30.65 ± 2.83 | 32.91 ± 5.60 | 32.13 ± 2.95 | 33.69 ± 2.19 |
Model | VI | NLU | Question Answering | Sentiment Analysis |
---|---|---|---|---|
Qwen 3 30B MoE Alibaba | 72.49 ± 0.20 | 79.07 ± 0.24 | 77.98 ± 5.73 | 80.16 ± 2.46 |
Qwen 3 32B Alibaba | 69.94 ± 0.38 | 73.52 ± 0.31 | 67.55 ± 6.44 | 79.50 ± 2.45 |
SEA-LION v3 (Llama) 70B AISG | 69.65 ± 0.55 | 79.40 ± 0.68 | 75.74 ± 5.67 | 83.06 ± 2.24 |
Command A 03-2025 111B CohereLabs | 69.10 ± 0.58 | 74.68 ± 0.53 | 74.04 ± 5.59 | 75.33 ± 2.60 |
Tulu 3 70B AI2 | 68.85 ± 0.24 | 77.25 ± 0.28 | 70.73 ± 5.99 | 83.76 ± 2.21 |
Qwen 2.5 72B Alibaba | 68.27 ± 0.29 | 77.63 ± 0.25 | 75.77 ± 6.03 | 79.50 ± 2.46 |
Qwen 3 14B Alibaba | 67.62 ± 0.15 | 78.17 ± 0.28 | 77.65 ± 5.96 | 78.70 ± 2.50 |
Llama 3.3 70B Meta | 67.36 ± 0.19 | 80.19 ± 0.16 | 76.15 ± 5.92 | 84.24 ± 2.24 |
SEA-LION v4 27B AISG | 66.10 ± 0.32 | 73.17 ± 0.47 | 63.88 ± 6.48 | 82.46 ± 2.31 |
Gemma 3 27B Google | 65.64 ± 0.42 | 72.62 ± 0.49 | 62.33 ± 6.52 | 82.91 ± 2.31 |
Qwen 2.5 32B Alibaba | 64.97 ± 0.30 | 77.07 ± 0.50 | 76.12 ± 5.55 | 78.01 ± 2.53 |
Qwen 3 8B Alibaba | 64.68 ± 0.24 | 78.82 ± 0.35 | 79.50 ± 5.98 | 78.14 ± 2.52 |
Gemma 3 12B Google | 64.39 ± 0.49 | 68.58 ± 0.57 | 60.82 ± 6.09 | 76.34 ± 2.60 |
SEA-LION v3 (Gemma 2) 9B AISG | 64.34 ± 0.42 | 78.51 ± 0.40 | 78.26 ± 5.41 | 78.75 ± 2.42 |
Llama 3.1 70B Meta | 63.96 ± 0.34 | 73.53 ± 0.74 | 61.27 ± 6.23 | 85.80 ± 2.09 |
Llama 4 Scout 109B MoE Meta | 63.73 ± 0.19 | 78.37 ± 0.26 | 75.32 ± 6.12 | 81.41 ± 2.39 |
Gemma 2 27B Google | 62.98 ± 0.47 | 78.02 ± 0.99 | 72.34 ± 5.77 | 83.70 ± 2.11 |
Mistral Large 2411 123B Mistral AI | 62.36 ± 2.29 | 74.20 ± 1.20 | 67.13 ± 5.17 | 81.27 ± 2.20 |
Aya Expanse 32B CohereLabs | 62.14 ± 0.31 | 75.06 ± 0.50 | 71.70 ± 6.08 | 78.41 ± 2.46 |
Command R+ 08-2024 104B CohereLabs | 60.93 ± 1.24 | 79.40 ± 1.30 | 80.62 ± 4.88 | 78.17 ± 2.32 |
Qwen 2.5 14B Alibaba | 60.02 ± 0.25 | 70.80 ± 0.45 | 63.36 ± 6.64 | 78.25 ± 2.51 |
Gemma 2 9B Google | 59.64 ± 0.51 | 78.08 ± 1.27 | 75.27 ± 5.61 | 80.89 ± 2.30 |
MERaLiON 2 10B A*STAR | 58.52 ± 0.61 | 76.33 ± 1.20 | 74.94 ± 5.59 | 77.72 ± 2.37 |
Qwen 2.5 7B Alibaba | 57.96 ± 0.27 | 67.15 ± 0.56 | 61.66 ± 6.54 | 72.64 ± 2.70 |
SEA-LION v3 (Llama) 8B AISG | 57.87 ± 0.63 | 72.84 ± 1.29 | 68.33 ± 6.48 | 77.35 ± 2.43 |
Aya Expanse 8B CohereLabs | 57.18 ± 0.42 | 74.72 ± 0.28 | 71.01 ± 6.09 | 78.42 ± 2.52 |
Llama 3 70B Meta | 55.14 ± 0.22 | 81.75 ± 0.28 | 81.67 ± 5.51 | 81.84 ± 2.38 |
Tulu 3 8B AI2 | 54.80 ± 0.61 | 66.58 ± 1.02 | 69.12 ± 6.53 | 64.05 ± 2.85 |
ERNIE 4.5 21B MoE Baidu | 54.78 ± 3.78 | 59.59 ± 5.99 | 74.98 ± 6.36 | 44.19 ± 2.38 |
Mistral Small 3.1 2503 24B Mistral AI | 54.16 ± 1.61 | 65.26 ± 5.77 | 66.47 ± 5.28 | 64.05 ± 2.24 |
Command R 08-2024 32B CohereLabs | 53.76 ± 3.73 | 72.09 ± 8.33 | 76.85 ± 4.90 | 67.33 ± 2.32 |
Llama 3.1 8B Meta | 51.21 ± 0.62 | 76.97 ± 0.47 | 72.38 ± 6.46 | 81.55 ± 2.31 |
Babel 83B Alibaba-DAMO | 50.39 ± 5.33 | 65.43 ± 3.52 | 53.90 ± 4.66 | 76.96 ± 1.96 |
Olmo 2 0325 32B AI2 | 49.35 ± 0.89 | 65.95 ± 1.80 | 55.91 ± 6.20 | 75.99 ± 2.12 |
phi-4 14B Microsoft | 49.25 ± 2.75 | 53.30 ± 7.18 | 46.00 ± 4.78 | 60.60 ± 2.25 |
Sailor2 8B SAIL | 48.59 ± 0.86 | 57.72 ± 4.24 | 50.36 ± 5.69 | 65.08 ± 2.44 |
Llama 3 8B Meta | 47.65 ± 0.63 | 76.36 ± 0.50 | 71.69 ± 6.46 | 81.03 ± 2.32 |
Olmo 2 1124 13B AI2 | 45.08 ± 1.53 | 60.62 ± 4.51 | 50.52 ± 5.89 | 70.73 ± 1.76 |
Babel 9B Alibaba-DAMO | 44.97 ± 2.17 | 70.25 ± 5.75 | 72.74 ± 5.53 | 67.75 ± 2.12 |
SeaLLMs V3 7B Alibaba-DAMO | 41.91 ± 3.76 | 58.33 ± 9.14 | 65.39 ± 5.92 | 51.28 ± 1.69 |
Ministral 2410 8B Mistral AI | 37.71 ± 2.67 | 55.31 ± 4.48 | 63.56 ± 6.06 | 47.06 ± 2.10 |
Command R7B 12-2024 7B CohereLabs | 36.11 ± 6.91 | 58.56 ± 10.28 | 60.62 ± 5.66 | 56.50 ± 2.08 |
Sailor2 20B SAIL | 33.92 ± 1.13 | 29.14 ± 4.40 | 41.87 ± 4.20 | 16.40 ± 1.71 |
Olmo 2 1124 7B AI2 | 30.65 ± 2.83 | 41.34 ± 0.69 | 38.42 ± 5.01 | 44.26 ± 2.22 |
Model | VI | Safety | Toxicity Detection |
---|---|---|---|
Qwen 3 30B MoE Alibaba | 72.49 ± 0.20 | 67.24 ± 0.72 | 67.24 ± 2.87 |
Qwen 3 32B Alibaba | 69.94 ± 0.38 | 68.45 ± 0.80 | 68.45 ± 2.81 |
SEA-LION v3 (Llama) 70B AISG | 69.65 ± 0.55 | 69.55 ± 2.83 | 69.55 ± 2.57 |
Command A 03-2025 111B CohereLabs | 69.10 ± 0.58 | 77.05 ± 3.63 | 77.05 ± 2.25 |
Tulu 3 70B AI2 | 68.85 ± 0.24 | 80.05 ± 2.32 | 80.05 ± 2.19 |
Qwen 2.5 72B Alibaba | 68.27 ± 0.29 | 73.95 ± 0.75 | 73.95 ± 2.66 |
Qwen 3 14B Alibaba | 67.62 ± 0.15 | 65.28 ± 0.62 | 65.28 ± 2.90 |
Llama 3.3 70B Meta | 67.36 ± 0.19 | 66.48 ± 0.76 | 66.47 ± 2.84 |
SEA-LION v4 27B AISG | 66.10 ± 0.32 | 51.85 ± 2.76 | 51.85 ± 2.93 |
Gemma 3 27B Google | 65.64 ± 0.42 | 50.80 ± 2.10 | 50.80 ± 2.98 |
Qwen 2.5 32B Alibaba | 64.97 ± 0.30 | 71.64 ± 0.20 | 71.64 ± 2.77 |
Qwen 3 8B Alibaba | 64.68 ± 0.24 | 63.15 ± 1.29 | 63.15 ± 2.92 |
Gemma 3 12B Google | 64.39 ± 0.49 | 53.01 ± 2.63 | 53.01 ± 2.95 |
SEA-LION v3 (Gemma 2) 9B AISG | 64.34 ± 0.42 | 65.64 ± 1.93 | 65.64 ± 2.83 |
Llama 3.1 70B Meta | 63.96 ± 0.34 | 65.86 ± 2.29 | 65.86 ± 2.73 |
Llama 4 Scout 109B MoE Meta | 63.73 ± 0.19 | 54.81 ± 0.23 | 54.81 ± 3.07 |
Gemma 2 27B Google | 62.98 ± 0.47 | 63.23 ± 3.26 | 63.22 ± 2.81 |
Mistral Large 2411 123B Mistral AI | 62.36 ± 2.29 | 64.01 ± 11.26 | 64.01 ± 2.05 |
Aya Expanse 32B CohereLabs | 62.14 ± 0.31 | 64.25 ± 0.96 | 64.25 ± 2.88 |
Command R+ 08-2024 104B CohereLabs | 60.93 ± 1.24 | 83.33 ± 2.75 | 83.33 ± 1.90 |
Qwen 2.5 14B Alibaba | 60.02 ± 0.25 | 61.30 ± 0.87 | 61.30 ± 2.95 |
Gemma 2 9B Google | 59.64 ± 0.51 | 63.83 ± 1.93 | 63.82 ± 2.87 |
MERaLiON 2 10B A*STAR | 58.52 ± 0.61 | 62.80 ± 2.69 | 62.80 ± 2.85 |
Qwen 2.5 7B Alibaba | 57.96 ± 0.27 | 71.61 ± 0.79 | 71.61 ± 2.74 |
SEA-LION v3 (Llama) 8B AISG | 57.87 ± 0.63 | 47.14 ± 3.99 | 47.14 ± 2.81 |
Aya Expanse 8B CohereLabs | 57.18 ± 0.42 | 61.19 ± 1.43 | 61.19 ± 2.89 |
Llama 3 70B Meta | 55.14 ± 0.22 | 73.47 ± 0.71 | 73.47 ± 2.66 |
Tulu 3 8B AI2 | 54.80 ± 0.61 | 64.46 ± 2.32 | 64.46 ± 2.77 |
ERNIE 4.5 21B MoE Baidu | 54.78 ± 3.78 | 70.96 ± 12.24 | 70.96 ± 2.23 |
Mistral Small 3.1 2503 24B Mistral AI | 54.16 ± 1.61 | 46.77 ± 10.97 | 46.77 ± 2.32 |
Command R 08-2024 32B CohereLabs | 53.76 ± 3.73 | 58.05 ± 12.11 | 58.05 ± 2.16 |
Llama 3.1 8B Meta | 51.21 ± 0.62 | 45.69 ± 2.17 | 45.69 ± 2.91 |
Babel 83B Alibaba-DAMO | 50.39 ± 5.33 | 68.44 ± 10.58 | 68.44 ± 2.03 |
Olmo 2 0325 32B AI2 | 49.35 ± 0.89 | 58.94 ± 4.79 | 58.94 ± 2.72 |
phi-4 14B Microsoft | 49.25 ± 2.75 | 26.80 ± 13.00 | 26.80 ± 1.77 |
Sailor2 8B SAIL | 48.59 ± 0.86 | 41.23 ± 3.84 | 41.23 ± 2.82 |
Llama 3 8B Meta | 47.65 ± 0.63 | 60.88 ± 2.03 | 60.88 ± 2.87 |
Olmo 2 1124 13B AI2 | 45.08 ± 1.53 | 79.15 ± 3.18 | 79.15 ± 2.19 |
Babel 9B Alibaba-DAMO | 44.97 ± 2.17 | 57.20 ± 7.83 | 57.20 ± 2.53 |
SeaLLMs V3 7B Alibaba-DAMO | 41.91 ± 3.76 | 25.10 ± 17.37 | 25.10 ± 1.38 |
Ministral 2410 8B Mistral AI | 37.71 ± 2.67 | 42.56 ± 15.15 | 42.56 ± 1.87 |
Command R7B 12-2024 7B CohereLabs | 36.11 ± 6.91 | 18.24 ± 5.57 | 18.24 ± 1.95 |
Sailor2 20B SAIL | 33.92 ± 1.13 | 6.50 ± 3.62 | 6.50 ± 1.14 |
Olmo 2 1124 7B AI2 | 30.65 ± 2.83 | 35.06 ± 17.59 | 35.06 ± 1.49 |
Model | VI | Knowledge | Global MMLU Lite |
---|---|---|---|
Qwen 3 30B MoE Alibaba | 72.49 ± 0.20 | 74.66 ± 0.31 | 74.66 ± 0.31 |
Qwen 3 32B Alibaba | 69.94 ± 0.38 | 74.22 ± 0.40 | 74.22 ± 0.40 |
SEA-LION v3 (Llama) 70B AISG | 69.65 ± 0.55 | 78.19 ± 0.45 | 78.19 ± 0.45 |
Command A 03-2025 111B CohereLabs | 69.10 ± 0.58 | 72.63 ± 0.92 | 72.63 ± 0.92 |
Tulu 3 70B AI2 | 68.85 ± 0.24 | 71.78 ± 0.58 | 71.78 ± 0.58 |
Qwen 2.5 72B Alibaba | 68.27 ± 0.29 | 76.63 ± 0.39 | 76.63 ± 0.39 |
Qwen 3 14B Alibaba | 67.62 ± 0.15 | 65.22 ± 0.34 | 65.22 ± 0.34 |
Llama 3.3 70B Meta | 67.36 ± 0.19 | 74.94 ± 0.22 | 74.94 ± 0.22 |
SEA-LION v4 27B AISG | 66.10 ± 0.32 | 72.81 ± 1.19 | 72.81 ± 1.19 |
Gemma 3 27B Google | 65.64 ± 0.42 | 73.78 ± 0.31 | 73.78 ± 0.31 |
Qwen 2.5 32B Alibaba | 64.97 ± 0.30 | 69.88 ± 0.29 | 69.88 ± 0.29 |
Qwen 3 8B Alibaba | 64.68 ± 0.24 | 62.09 ± 0.39 | 62.09 ± 0.39 |
Gemma 3 12B Google | 64.39 ± 0.49 | 67.22 ± 0.39 | 67.22 ± 0.39 |
SEA-LION v3 (Gemma 2) 9B AISG | 64.34 ± 0.42 | 63.28 ± 0.65 | 63.28 ± 0.65 |
Llama 3.1 70B Meta | 63.96 ± 0.34 | 73.25 ± 1.50 | 73.25 ± 1.50 |
Llama 4 Scout 109B MoE Meta | 63.73 ± 0.19 | 73.50 ± 0.26 | 73.50 ± 0.26 |
Gemma 2 27B Google | 62.98 ± 0.47 | 66.88 ± 0.45 | 66.88 ± 0.45 |
Mistral Large 2411 123B Mistral AI | 62.36 ± 2.29 | 65.22 ± 2.56 | 65.22 ± 2.56 |
Aya Expanse 32B CohereLabs | 62.14 ± 0.31 | 60.47 ± 0.46 | 60.47 ± 0.46 |
Command R+ 08-2024 104B CohereLabs | 60.93 ± 1.24 | 58.31 ± 1.92 | 58.31 ± 1.92 |
Qwen 2.5 14B Alibaba | 60.02 ± 0.25 | 64.88 ± 0.31 | 64.88 ± 0.31 |
Gemma 2 9B Google | 59.64 ± 0.51 | 62.09 ± 0.40 | 62.09 ± 0.40 |
MERaLiON 2 10B A*STAR | 58.52 ± 0.61 | 60.31 ± 0.95 | 60.31 ± 0.95 |
Qwen 2.5 7B Alibaba | 57.96 ± 0.27 | 62.16 ± 0.16 | 62.16 ± 0.16 |
SEA-LION v3 (Llama) 8B AISG | 57.87 ± 0.63 | 56.00 ± 0.44 | 56.00 ± 0.44 |
Aya Expanse 8B CohereLabs | 57.18 ± 0.42 | 55.25 ± 0.46 | 55.25 ± 0.46 |
Llama 3 70B Meta | 55.14 ± 0.22 | 67.59 ± 0.23 | 67.59 ± 0.23 |
Tulu 3 8B AI2 | 54.80 ± 0.61 | 48.97 ± 1.02 | 48.97 ± 1.02 |
ERNIE 4.5 21B MoE Baidu | 54.78 ± 3.78 | 49.44 ± 5.29 | 49.44 ± 5.29 |
Mistral Small 3.1 2503 24B Mistral AI | 54.16 ± 1.61 | 63.31 ± 1.93 | 63.31 ± 1.93 |
Command R 08-2024 32B CohereLabs | 53.76 ± 3.73 | 55.69 ± 2.96 | 55.69 ± 2.96 |
Llama 3.1 8B Meta | 51.21 ± 0.62 | 47.94 ± 1.24 | 47.94 ± 1.24 |
Babel 83B Alibaba-DAMO | 50.39 ± 5.33 | 55.72 ± 17.07 | 55.72 ± 17.07 |
Olmo 2 0325 32B AI2 | 49.35 ± 0.89 | 52.63 ± 1.63 | 52.63 ± 1.63 |
phi-4 14B Microsoft | 49.25 ± 2.75 | 62.63 ± 1.35 | 62.63 ± 1.35 |
Sailor2 8B SAIL | 48.59 ± 0.86 | 43.19 ± 4.78 | 43.19 ± 4.78 |
Llama 3 8B Meta | 47.65 ± 0.63 | 50.31 ± 0.46 | 50.31 ± 0.46 |
Olmo 2 1124 13B AI2 | 45.08 ± 1.53 | 28.75 ± 0.81 | 28.75 ± 0.81 |
Babel 9B Alibaba-DAMO | 44.97 ± 2.17 | 36.56 ± 7.64 | 36.56 ± 7.64 |
SeaLLMs V3 7B Alibaba-DAMO | 41.91 ± 3.76 | 46.28 ± 4.66 | 46.28 ± 4.66 |
Ministral 2410 8B Mistral AI | 37.71 ± 2.67 | 42.41 ± 4.09 | 42.41 ± 4.09 |
Command R7B 12-2024 7B CohereLabs | 36.11 ± 6.91 | 34.06 ± 14.13 | 34.06 ± 14.13 |
Sailor2 20B SAIL | 33.92 ± 1.13 | 48.88 ± 1.85 | 48.88 ± 1.85 |
Olmo 2 1124 7B AI2 | 30.65 ± 2.83 | 24.06 ± 2.44 | 24.06 ± 2.44 |
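The ± values throughout these tables are 95% confidence intervals over 8 runs. A minimal sketch of how such an interval can be computed, assuming a Student-t interval over per-run scores (the leaderboard does not document its exact method, and the scores below are hypothetical, not drawn from any row above):

```python
import statistics

# Hypothetical per-run VI scores for one model (8 independent runs).
scores = [72.1, 72.6, 72.3, 72.8, 72.5, 72.4, 72.7, 72.5]

n = len(scores)
mean = statistics.mean(scores)

# 95% two-sided t critical value for n - 1 = 7 degrees of freedom.
T_CRIT_7 = 2.365

# CI half-width: t * (sample stdev) / sqrt(n).
half_width = T_CRIT_7 * statistics.stdev(scores) / n ** 0.5

print(f"{mean:.2f} ± {half_width:.2f}")
```

With only 8 runs, the t critical value (2.365) is noticeably larger than the normal-approximation 1.96, so a z-based interval would understate the uncertainty.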