Indonesian Performance
Indonesian Scores by Model
Average of 30 bootstraps; 95% CIs are shown.
Model Size: ≤200B
Open instruct models only
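The "average of 30 bootstraps, 95% CI" caption can be reproduced with a standard percentile bootstrap. A minimal sketch, assuming per-example scores are resampled with replacement and the interval half-width is reported after "±" (the `scores` values and `bootstrap_ci` helper are illustrative, not the leaderboard's actual pipeline):

```python
import random
import statistics

def bootstrap_ci(scores, n_boot=30, alpha=0.05, seed=0):
    """Return (mean of bootstrap means, 95% CI half-width) via percentile bootstrap."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        # Resample the per-example scores with replacement and record the mean.
        resample = [rng.choice(scores) for _ in scores]
        means.append(statistics.mean(resample))
    means.sort()
    lo = means[int((alpha / 2) * (n_boot - 1))]
    hi = means[int((1 - alpha / 2) * (n_boot - 1))]
    return statistics.mean(means), (hi - lo) / 2  # reported as score ± half-width

# Hypothetical per-example scores for one model:
mid, half = bootstrap_ci([67.0, 66.5, 68.1, 66.9, 67.4, 66.8])
print(f"{mid:.2f} ± {half:.2f}")
```

With only 30 resamples the percentile endpoints are coarse, which is consistent with the small ± values reported throughout the tables being interpreted as CI half-widths rather than standard deviations.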
[Bar chart: overall Indonesian scores by model, from 67.11 ± 0.10 (80B MoE) down to 25.27 ± 0.27 (7B); the same values appear in the ID column of the table below.]
Indonesian Competencies
Average of 30 bootstraps; 95% CIs are shown.
Model Size: ≤200B
Open instruct models only
Model | ID | Instruction Following | Linguistic Diagnostics | Multi-Turn Chat | NLG | NLR | NLU | Safety | Knowledge |
---|---|---|---|---|---|---|---|---|---|
Qwen 3 Next 80B MoE Alibaba | 67.11 ± 0.10 | 90.92 ± 0.37 | 48.17 ± 0.30 | 58.36 ± 0.65 | 53.73 ± 0.04 | 84.09 ± 0.07 | 76.73 ± 0.10 | 52.96 ± 0.21 | 71.95 ± 0.27 |
Command A 03-2025 111B CohereLabs | 66.73 ± 0.17 | 92.25 ± 0.59 | 55.33 ± 0.70 | 49.53 ± 0.67 | 54.98 ± 0.07 | 83.00 ± 0.31 | 75.72 ± 0.26 | 53.80 ± 0.49 | 69.22 ± 0.38 |
SEA-LION v4 (Qwen) 32B AISG | 66.59 ± 0.10 | 88.57 ± 0.51 | 55.61 ± 0.39 | 45.57 ± 0.62 | 54.82 ± 0.04 | 85.14 ± 0.07 | 79.62 ± 0.12 | 52.78 ± 0.27 | 70.64 ± 0.18 |
Qwen 3 32B Alibaba | 65.67 ± 0.11 | 87.40 ± 0.57 | 58.02 ± 0.49 | 45.83 ± 0.82 | 54.59 ± 0.05 | 83.91 ± 0.13 | 78.96 ± 0.15 | 49.46 ± 0.33 | 67.20 ± 0.30 |
Qwen 2.5 72B Alibaba | 64.82 ± 0.08 | 89.71 ± 0.46 | 59.82 ± 0.36 | 32.53 ± 0.55 | 54.90 ± 0.05 | 82.11 ± 0.12 | 75.68 ± 0.12 | 50.33 ± 0.25 | 73.50 ± 0.25 |
SEA-LION v4 (Gemma) 27B AISG | 64.33 ± 0.14 | 90.44 ± 0.56 | 50.58 ± 0.41 | 46.35 ± 0.59 | 54.85 ± 0.05 | 82.11 ± 0.14 | 74.78 ± 0.11 | 50.45 ± 0.25 | 65.11 ± 0.25 |
Gemma 3 27B Google | 64.12 ± 0.15 | 88.54 ± 0.72 | 52.82 ± 0.33 | 45.10 ± 0.66 | 55.03 ± 0.04 | 81.64 ± 0.09 | 74.81 ± 0.09 | 49.50 ± 0.20 | 65.50 ± 0.25 |
SEA-LION v3 (Llama) 70B AISG | 64.04 ± 0.18 | 92.44 ± 0.54 | 51.55 ± 0.56 | 29.61 ± 0.63 | 56.01 ± 0.07 | 85.13 ± 0.24 | 75.34 ± 0.30 | 47.32 ± 0.47 | 74.93 ± 0.41 |
Qwen 3 30B MoE Alibaba | 63.29 ± 0.10 | 89.97 ± 0.45 | 37.49 ± 0.30 | 52.23 ± 0.58 | 53.78 ± 0.04 | 82.79 ± 0.10 | 71.36 ± 0.12 | 50.51 ± 0.18 | 68.18 ± 0.24 |
Llama 3.3 70B Meta | 63.17 ± 0.09 | 93.30 ± 0.30 | 54.38 ± 0.30 | 16.22 ± 0.47 | 55.27 ± 0.05 | 85.70 ± 0.09 | 76.02 ± 0.09 | 51.20 ± 0.28 | 73.28 ± 0.20 |
Tulu 3 70B AI2 | 62.66 ± 0.17 | 87.90 ± 0.77 | 52.41 ± 0.54 | 29.89 ± 1.00 | 53.85 ± 0.06 | 84.64 ± 0.20 | 75.27 ± 0.21 | 51.15 ± 0.50 | 66.16 ± 0.53 |
Qwen 2.5 32B Alibaba | 61.82 ± 0.10 | 88.41 ± 0.66 | 54.87 ± 0.16 | 19.91 ± 0.35 | 53.57 ± 0.04 | 78.92 ± 0.07 | 77.31 ± 0.12 | 55.41 ± 0.16 | 66.19 ± 0.23 |
Gemma 3 12B Google | 61.80 ± 0.11 | 89.21 ± 0.60 | 47.05 ± 0.36 | 38.64 ± 0.61 | 54.76 ± 0.05 | 82.34 ± 0.07 | 74.85 ± 0.11 | 44.48 ± 0.32 | 63.09 ± 0.25 |
Qwen 3 14B Alibaba | 61.37 ± 0.12 | 86.76 ± 0.42 | 45.89 ± 0.33 | 34.61 ± 0.57 | 54.28 ± 0.05 | 79.92 ± 0.10 | 74.68 ± 0.13 | 55.66 ± 0.33 | 59.15 ± 0.21 |
Llama 4 Scout 109B MoE Meta | 61.27 ± 0.11 | 91.56 ± 0.54 | 53.54 ± 0.05 | 21.29 ± 0.47 | 55.41 ± 0.04 | 75.47 ± 0.04 | 76.19 ± 0.08 | 48.83 ± 0.10 | 67.90 ± 0.09 |
Llama 3.1 70B Meta | 60.48 ± 0.14 | 82.41 ± 0.89 | 51.51 ± 0.64 | 13.18 ± 0.44 | 57.15 ± 0.10 | 81.65 ± 0.19 | 73.75 ± 0.25 | 52.70 ± 0.36 | 71.47 ± 0.39 |
Gemma 2 27B Google | 59.79 ± 0.15 | 86.22 ± 0.87 | 50.16 ± 0.51 | 18.81 ± 0.70 | 56.17 ± 0.07 | 80.22 ± 0.18 | 75.45 ± 0.18 | 49.22 ± 0.22 | 62.08 ± 0.39 |
Aya Expanse 32B CohereLabs | 59.65 ± 0.15 | 82.79 ± 0.66 | 49.33 ± 0.28 | 26.64 ± 0.61 | 55.41 ± 0.05 | 79.20 ± 0.15 | 76.14 ± 0.17 | 53.85 ± 0.27 | 53.81 ± 0.41 |
SEA-LION v3 (Gemma 2) 9B AISG | 59.14 ± 0.13 | 89.21 ± 0.76 | 44.87 ± 0.53 | 26.81 ± 0.55 | 55.23 ± 0.07 | 81.06 ± 0.16 | 75.06 ± 0.24 | 42.20 ± 0.28 | 58.71 ± 0.47 |
Qwen 3 8B Alibaba | 58.58 ± 0.14 | 84.19 ± 0.60 | 41.26 ± 0.58 | 32.49 ± 0.47 | 54.32 ± 0.05 | 71.08 ± 0.18 | 72.32 ± 0.13 | 53.48 ± 0.41 | 59.49 ± 0.32 |
Qwen 2.5 14B Alibaba | 58.49 ± 0.12 | 84.06 ± 0.92 | 44.20 ± 0.25 | 18.20 ± 0.47 | 53.14 ± 0.05 | 78.89 ± 0.10 | 75.31 ± 0.15 | 54.33 ± 0.31 | 59.81 ± 0.23 |
Mistral Large 2411 123B Mistral AI | 58.23 ± 0.24 | 88.35 ± 0.75 | 38.51 ± 0.81 | 21.91 ± 0.58 | 54.52 ± 0.08 | 82.30 ± 0.29 | 74.69 ± 0.30 | 43.26 ± 0.64 | 62.33 ± 0.60 |
MERaLiON 2 10B A*STAR | 55.06 ± 0.15 | 81.43 ± 0.70 | 42.96 ± 0.56 | 12.97 ± 0.55 | 55.76 ± 0.06 | 72.76 ± 0.23 | 74.90 ± 0.30 | 45.57 ± 0.31 | 54.13 ± 0.44 |
Gemma 2 9B Google | 54.92 ± 0.18 | 82.44 ± 1.12 | 42.21 ± 0.54 | 14.99 ± 0.49 | 55.69 ± 0.07 | 75.67 ± 0.17 | 73.89 ± 0.23 | 37.86 ± 0.28 | 56.60 ± 0.27 |
Mistral Small 3.1 2503 24B Mistral AI | 53.67 ± 0.22 | 74.73 ± 1.01 | 34.42 ± 1.05 | 10.14 ± 0.43 | 51.48 ± 0.07 | 80.69 ± 0.22 | 71.97 ± 0.27 | 45.37 ± 0.82 | 60.59 ± 0.48 |
Qwen 2.5 7B Alibaba | 52.56 ± 0.13 | 79.24 ± 0.73 | 28.48 ± 0.34 | 16.05 ± 0.40 | 52.44 ± 0.04 | 74.11 ± 0.11 | 71.04 ± 0.15 | 45.81 ± 0.21 | 53.30 ± 0.17 |
SEA-LION v3 (Llama) 8B AISG | 52.51 ± 0.20 | 84.19 ± 0.90 | 24.72 ± 0.77 | 23.58 ± 0.55 | 54.47 ± 0.05 | 72.98 ± 0.36 | 67.14 ± 0.36 | 41.33 ± 0.60 | 51.66 ± 0.54 |
Sailor2 20B SAIL | 51.53 ± 0.16 | 43.33 ± 0.84 | 43.68 ± 0.39 | 26.39 ± 0.63 | 50.00 ± 0.05 | 76.92 ± 0.09 | 72.11 ± 0.12 | 51.72 ± 0.30 | 48.07 ± 0.28 |
Olmo 2 0325 32B AI2 | 50.75 ± 0.21 | 81.94 ± 0.97 | 33.81 ± 1.12 | 9.17 ± 0.44 | 50.06 ± 0.08 | 73.73 ± 0.38 | 64.69 ± 0.38 | 40.50 ± 0.52 | 52.13 ± 0.68 |
Command R+ 08-2024 104B CohereLabs | 50.74 ± 0.22 | 78.44 ± 1.07 | 34.68 ± 1.05 | 10.49 ± 0.40 | 54.25 ± 0.09 | 71.25 ± 0.40 | 71.55 ± 0.42 | 45.07 ± 0.58 | 40.20 ± 0.59 |
Llama 3 70B Meta | 49.96 ± 0.14 | 28.86 ± 0.55 | 40.03 ± 0.31 | 10.93 ± 0.46 | 55.71 ± 0.06 | 78.75 ± 0.10 | 71.40 ± 0.14 | 48.67 ± 0.27 | 65.35 ± 0.36 |
Aya Expanse 8B CohereLabs | 49.78 ± 0.15 | 70.22 ± 0.84 | 23.05 ± 0.28 | 21.35 ± 0.57 | 53.14 ± 0.06 | 69.93 ± 0.14 | 69.60 ± 0.16 | 48.09 ± 0.46 | 42.83 ± 0.29 |
Command R 08-2024 32B CohereLabs | 49.15 ± 0.20 | 69.05 ± 0.92 | 35.07 ± 0.83 | 5.76 ± 0.48 | 53.45 ± 0.10 | 71.51 ± 0.32 | 67.78 ± 0.35 | 44.23 ± 0.58 | 46.31 ± 0.68 |
phi-4 14B Microsoft | 49.14 ± 0.28 | 73.27 ± 1.20 | 40.22 ± 0.54 | 18.45 ± 0.61 | 45.22 ± 0.07 | 74.14 ± 0.29 | 66.10 ± 0.26 | 13.37 ± 0.78 | 62.37 ± 0.63 |
ERNIE 4.5 21B MoE Baidu | 49.00 ± 0.16 | 82.51 ± 0.83 | 29.92 ± 0.48 | 18.58 ± 0.58 | 51.17 ± 0.04 | 66.71 ± 0.31 | 71.46 ± 0.29 | 32.12 ± 0.34 | 39.51 ± 0.68 |
Llama 3.1 8B Meta | 47.23 ± 0.21 | 76.89 ± 1.00 | 24.63 ± 0.87 | 9.64 ± 0.36 | 54.23 ± 0.07 | 63.67 ± 0.36 | 67.51 ± 0.34 | 39.19 ± 0.47 | 42.06 ± 0.50 |
Sailor2 8B SAIL | 46.68 ± 0.18 | 45.65 ± 0.64 | 19.87 ± 0.47 | 24.51 ± 0.63 | 50.44 ± 0.07 | 74.69 ± 0.26 | 64.44 ± 0.25 | 44.16 ± 0.47 | 49.69 ± 0.42 |
Tulu 3 8B AI2 | 42.96 ± 0.13 | 83.87 ± 0.64 | 3.54 ± 0.70 | 12.60 ± 0.50 | 52.23 ± 0.07 | 59.11 ± 0.51 | 59.51 ± 0.37 | 35.43 ± 0.49 | 37.38 ± 0.48 |
Llama 3 8B Meta | 38.80 ± 0.15 | 32.00 ± 0.58 | 25.84 ± 0.74 | 5.96 ± 0.34 | 53.20 ± 0.09 | 57.36 ± 0.37 | 64.69 ± 0.35 | 30.32 ± 0.28 | 41.00 ± 0.44 |
Olmo 2 1124 13B AI2 | 36.46 ± 0.26 | 69.24 ± 1.12 | 18.15 ± 0.87 | 5.65 ± 0.35 | 44.41 ± 0.08 | 39.23 ± 0.55 | 58.75 ± 0.42 | 25.02 ± 0.64 | 31.20 ± 0.45 |
Babel 83B Alibaba-DAMO | 36.39 ± 0.32 | 45.14 ± 1.50 | 28.16 ± 1.11 | 9.68 ± 0.60 | 45.06 ± 0.11 | 53.35 ± 0.59 | 49.69 ± 0.62 | 8.48 ± 0.78 | 51.51 ± 1.03 |
Apertus 8B Swiss AI | 33.71 ± 0.29 | 75.46 ± 1.20 | 15.11 ± 1.22 | 6.38 ± 0.41 | 53.36 ± 0.10 | 24.61 ± 0.64 | 46.94 ± 0.45 | 20.30 ± 0.66 | 27.50 ± 0.88 |
Apertus 70B Swiss AI | 33.31 ± 0.22 | 70.86 ± 1.09 | 17.99 ± 1.04 | 17.39 ± 0.65 | 49.12 ± 0.08 | 16.94 ± 0.28 | 48.66 ± 0.62 | 30.27 ± 0.58 | 15.29 ± 0.78 |
Command R7B 12-2024 7B CohereLabs | 33.07 ± 0.25 | 56.86 ± 1.50 | 19.68 ± 0.88 | 1.52 ± 0.23 | 34.01 ± 0.12 | 37.74 ± 0.67 | 52.60 ± 0.64 | 28.69 ± 0.45 | 33.43 ± 0.79 |
SeaLLMs V3 7B Alibaba-DAMO | 32.57 ± 0.22 | 52.67 ± 1.06 | 11.15 ± 0.83 | 11.31 ± 0.55 | 45.74 ± 0.12 | 36.90 ± 0.72 | 51.00 ± 0.46 | 24.23 ± 0.61 | 27.56 ± 0.72 |
Babel 9B Alibaba-DAMO | 32.20 ± 0.26 | 45.49 ± 1.43 | 18.22 ± 0.85 | 4.18 ± 0.33 | 46.49 ± 0.09 | 39.34 ± 0.50 | 56.67 ± 0.59 | 19.55 ± 0.51 | 27.63 ± 0.88 |
Ministral 2410 8B Mistral AI | 28.04 ± 0.21 | 38.63 ± 1.27 | 12.84 ± 0.79 | 4.07 ± 0.36 | 37.91 ± 0.10 | 32.29 ± 0.71 | 50.42 ± 0.54 | 21.99 ± 0.44 | 26.14 ± 0.79 |
Olmo 2 1124 7B AI2 | 25.27 ± 0.27 | 60.35 ± 1.04 | 11.11 ± 0.85 | 2.61 ± 0.32 | 36.82 ± 0.06 | 21.12 ± 0.65 | 47.59 ± 0.63 | 5.21 ± 1.03 | 17.31 ± 0.68 |
Indonesian Tasks
Average of 30 bootstraps; 95% CIs are shown.
Model Size: ≤200B
Open instruct models only
Model | ID | Instruction Following | SEA-IFEval |
---|---|---|---|
Qwen 3 Next 80B MoE Alibaba | 67.11 ± 0.10 | 90.92 ± 0.37 | 90.92 ± 0.37 |
Command A 03-2025 111B CohereLabs | 66.73 ± 0.17 | 92.25 ± 0.59 | 92.25 ± 0.59 |
SEA-LION v4 (Qwen) 32B AISG | 66.59 ± 0.10 | 88.57 ± 0.51 | 88.57 ± 0.51 |
Qwen 3 32B Alibaba | 65.67 ± 0.11 | 87.40 ± 0.57 | 87.40 ± 0.57 |
Qwen 2.5 72B Alibaba | 64.82 ± 0.08 | 89.71 ± 0.46 | 89.71 ± 0.46 |
SEA-LION v4 (Gemma) 27B AISG | 64.33 ± 0.14 | 90.44 ± 0.56 | 90.44 ± 0.56 |
Gemma 3 27B Google | 64.12 ± 0.15 | 88.54 ± 0.72 | 88.54 ± 0.72 |
SEA-LION v3 (Llama) 70B AISG | 64.04 ± 0.18 | 92.44 ± 0.54 | 92.44 ± 0.54 |
Qwen 3 30B MoE Alibaba | 63.29 ± 0.10 | 89.97 ± 0.45 | 89.97 ± 0.45 |
Llama 3.3 70B Meta | 63.17 ± 0.09 | 93.30 ± 0.30 | 93.30 ± 0.30 |
Tulu 3 70B AI2 | 62.66 ± 0.17 | 87.90 ± 0.77 | 87.90 ± 0.77 |
Qwen 2.5 32B Alibaba | 61.82 ± 0.10 | 88.41 ± 0.66 | 88.41 ± 0.66 |
Gemma 3 12B Google | 61.80 ± 0.11 | 89.21 ± 0.60 | 89.21 ± 0.60 |
Qwen 3 14B Alibaba | 61.37 ± 0.12 | 86.76 ± 0.42 | 86.76 ± 0.42 |
Llama 4 Scout 109B MoE Meta | 61.27 ± 0.11 | 91.56 ± 0.54 | 91.56 ± 0.54 |
Llama 3.1 70B Meta | 60.48 ± 0.14 | 82.41 ± 0.89 | 82.41 ± 0.89 |
Gemma 2 27B Google | 59.79 ± 0.15 | 86.22 ± 0.87 | 86.22 ± 0.87 |
Aya Expanse 32B CohereLabs | 59.65 ± 0.15 | 82.79 ± 0.66 | 82.79 ± 0.66 |
SEA-LION v3 (Gemma 2) 9B AISG | 59.14 ± 0.13 | 89.21 ± 0.76 | 89.21 ± 0.76 |
Qwen 3 8B Alibaba | 58.58 ± 0.14 | 84.19 ± 0.60 | 84.19 ± 0.60 |
Qwen 2.5 14B Alibaba | 58.49 ± 0.12 | 84.06 ± 0.92 | 84.06 ± 0.92 |
Mistral Large 2411 123B Mistral AI | 58.23 ± 0.24 | 88.35 ± 0.75 | 88.35 ± 0.75 |
MERaLiON 2 10B A*STAR | 55.06 ± 0.15 | 81.43 ± 0.70 | 81.43 ± 0.70 |
Gemma 2 9B Google | 54.92 ± 0.18 | 82.44 ± 1.12 | 82.44 ± 1.12 |
Mistral Small 3.1 2503 24B Mistral AI | 53.67 ± 0.22 | 74.73 ± 1.01 | 74.73 ± 1.01 |
Qwen 2.5 7B Alibaba | 52.56 ± 0.13 | 79.24 ± 0.73 | 79.24 ± 0.73 |
SEA-LION v3 (Llama) 8B AISG | 52.51 ± 0.20 | 84.19 ± 0.90 | 84.19 ± 0.90 |
Sailor2 20B SAIL | 51.53 ± 0.16 | 43.33 ± 0.84 | 43.33 ± 0.84 |
Olmo 2 0325 32B AI2 | 50.75 ± 0.21 | 81.94 ± 0.97 | 81.94 ± 0.97 |
Command R+ 08-2024 104B CohereLabs | 50.74 ± 0.22 | 78.44 ± 1.07 | 78.44 ± 1.07 |
Llama 3 70B Meta | 49.96 ± 0.14 | 28.86 ± 0.55 | 28.86 ± 0.55 |
Aya Expanse 8B CohereLabs | 49.78 ± 0.15 | 70.22 ± 0.84 | 70.22 ± 0.84 |
Command R 08-2024 32B CohereLabs | 49.15 ± 0.20 | 69.05 ± 0.92 | 69.05 ± 0.92 |
phi-4 14B Microsoft | 49.14 ± 0.28 | 73.27 ± 1.20 | 73.27 ± 1.20 |
ERNIE 4.5 21B MoE Baidu | 49.00 ± 0.16 | 82.51 ± 0.83 | 82.51 ± 0.83 |
Llama 3.1 8B Meta | 47.23 ± 0.21 | 76.89 ± 1.00 | 76.89 ± 1.00 |
Sailor2 8B SAIL | 46.68 ± 0.18 | 45.65 ± 0.64 | 45.65 ± 0.64 |
Tulu 3 8B AI2 | 42.96 ± 0.13 | 83.87 ± 0.64 | 83.87 ± 0.64 |
Llama 3 8B Meta | 38.80 ± 0.15 | 32.00 ± 0.58 | 32.00 ± 0.58 |
Olmo 2 1124 13B AI2 | 36.46 ± 0.26 | 69.24 ± 1.12 | 69.24 ± 1.12 |
Babel 83B Alibaba-DAMO | 36.39 ± 0.32 | 45.14 ± 1.50 | 45.14 ± 1.50 |
Apertus 8B Swiss AI | 33.71 ± 0.29 | 75.46 ± 1.20 | 75.46 ± 1.20 |
Apertus 70B Swiss AI | 33.31 ± 0.22 | 70.86 ± 1.09 | 70.86 ± 1.09 |
Command R7B 12-2024 7B CohereLabs | 33.07 ± 0.25 | 56.86 ± 1.50 | 56.86 ± 1.50 |
SeaLLMs V3 7B Alibaba-DAMO | 32.57 ± 0.22 | 52.67 ± 1.06 | 52.67 ± 1.06 |
Babel 9B Alibaba-DAMO | 32.20 ± 0.26 | 45.49 ± 1.43 | 45.49 ± 1.43 |
Ministral 2410 8B Mistral AI | 28.04 ± 0.21 | 38.63 ± 1.27 | 38.63 ± 1.27 |
Olmo 2 1124 7B AI2 | 25.27 ± 0.27 | 60.35 ± 1.04 | 60.35 ± 1.04 |
Model | ID | Linguistic Diagnostics | Syntax | Pragmatics |
---|---|---|---|---|
Qwen 3 Next 80B MoE Alibaba | 67.11 ± 0.10 | 48.17 ± 0.30 | 44.67 ± 0.49 | 51.67 ± 0.32 |
Command A 03-2025 111B CohereLabs | 66.73 ± 0.17 | 55.33 ± 0.70 | 41.84 ± 0.83 | 68.81 ± 0.99 |
SEA-LION v4 (Qwen) 32B AISG | 66.59 ± 0.10 | 55.61 ± 0.39 | 52.40 ± 0.30 | 58.83 ± 0.75 |
Qwen 3 32B Alibaba | 65.67 ± 0.11 | 58.02 ± 0.49 | 53.95 ± 0.45 | 62.10 ± 0.86 |
Qwen 2.5 72B Alibaba | 64.82 ± 0.08 | 59.82 ± 0.36 | 46.68 ± 0.40 | 72.95 ± 0.50 |
SEA-LION v4 (Gemma) 27B AISG | 64.33 ± 0.14 | 50.58 ± 0.41 | 34.12 ± 0.69 | 67.03 ± 0.38 |
Gemma 3 27B Google | 64.12 ± 0.15 | 52.82 ± 0.33 | 35.58 ± 0.53 | 70.06 ± 0.28 |
SEA-LION v3 (Llama) 70B AISG | 64.04 ± 0.18 | 51.55 ± 0.56 | 39.61 ± 0.85 | 63.49 ± 0.85 |
Qwen 3 30B MoE Alibaba | 63.29 ± 0.10 | 37.49 ± 0.30 | 27.91 ± 0.49 | 47.07 ± 0.40 |
Llama 3.3 70B Meta | 63.17 ± 0.09 | 54.38 ± 0.30 | 37.21 ± 0.42 | 71.54 ± 0.53 |
Tulu 3 70B AI2 | 62.66 ± 0.17 | 52.41 ± 0.54 | 41.98 ± 0.58 | 62.84 ± 0.79 |
Qwen 2.5 32B Alibaba | 61.82 ± 0.10 | 54.87 ± 0.16 | 47.53 ± 0.21 | 62.21 ± 0.20 |
Gemma 3 12B Google | 61.80 ± 0.11 | 47.05 ± 0.36 | 36.77 ± 0.65 | 57.32 ± 0.26 |
Qwen 3 14B Alibaba | 61.37 ± 0.12 | 45.89 ± 0.33 | 34.86 ± 0.31 | 56.92 ± 0.59 |
Llama 4 Scout 109B MoE Meta | 61.27 ± 0.11 | 53.54 ± 0.05 | 37.58 ± 0.09 | 69.49 ± 0.00 |
Llama 3.1 70B Meta | 60.48 ± 0.14 | 51.51 ± 0.64 | 35.46 ± 0.84 | 67.56 ± 1.08 |
Gemma 2 27B Google | 59.79 ± 0.15 | 50.16 ± 0.51 | 26.02 ± 0.82 | 74.29 ± 0.73 |
Aya Expanse 32B CohereLabs | 59.65 ± 0.15 | 49.33 ± 0.28 | 37.81 ± 0.32 | 60.86 ± 0.39 |
SEA-LION v3 (Gemma 2) 9B AISG | 59.14 ± 0.13 | 44.87 ± 0.53 | 25.11 ± 0.97 | 64.63 ± 0.91 |
Qwen 3 8B Alibaba | 58.58 ± 0.14 | 41.26 ± 0.58 | 23.09 ± 0.93 | 59.43 ± 0.74 |
Qwen 2.5 14B Alibaba | 58.49 ± 0.12 | 44.20 ± 0.25 | 39.37 ± 0.13 | 49.04 ± 0.49 |
Mistral Large 2411 123B Mistral AI | 58.23 ± 0.24 | 38.51 ± 0.81 | 18.39 ± 0.84 | 58.62 ± 1.16 |
MERaLiON 2 10B A*STAR | 55.06 ± 0.15 | 42.96 ± 0.56 | 30.58 ± 0.74 | 55.34 ± 0.94 |
Gemma 2 9B Google | 54.92 ± 0.18 | 42.21 ± 0.54 | 24.84 ± 0.63 | 59.58 ± 0.82 |
Mistral Small 3.1 2503 24B Mistral AI | 53.67 ± 0.22 | 34.42 ± 1.05 | 15.61 ± 1.07 | 53.22 ± 1.85 |
Qwen 2.5 7B Alibaba | 52.56 ± 0.13 | 28.48 ± 0.34 | 13.54 ± 0.53 | 43.41 ± 0.39 |
SEA-LION v3 (Llama) 8B AISG | 52.51 ± 0.20 | 24.72 ± 0.77 | 13.42 ± 1.57 | 36.01 ± 1.14 |
Sailor2 20B SAIL | 51.53 ± 0.16 | 43.68 ± 0.39 | 31.58 ± 0.40 | 55.78 ± 0.56 |
Olmo 2 0325 32B AI2 | 50.75 ± 0.21 | 33.81 ± 1.12 | 11.75 ± 1.72 | 55.87 ± 1.58 |
Command R+ 08-2024 104B CohereLabs | 50.74 ± 0.22 | 34.68 ± 1.05 | 22.33 ± 1.44 | 47.02 ± 1.28 |
Llama 3 70B Meta | 49.96 ± 0.14 | 40.03 ± 0.31 | 24.58 ± 0.39 | 55.48 ± 0.56 |
Aya Expanse 8B CohereLabs | 49.78 ± 0.15 | 23.05 ± 0.28 | 15.49 ± 0.42 | 30.62 ± 0.34 |
Command R 08-2024 32B CohereLabs | 49.15 ± 0.20 | 35.07 ± 0.83 | 21.09 ± 1.23 | 49.06 ± 1.28 |
phi-4 14B Microsoft | 49.14 ± 0.28 | 40.22 ± 0.54 | 26.26 ± 0.69 | 54.18 ± 1.17 |
ERNIE 4.5 21B MoE Baidu | 49.00 ± 0.16 | 29.92 ± 0.48 | 15.68 ± 0.95 | 44.16 ± 1.00 |
Llama 3.1 8B Meta | 47.23 ± 0.21 | 24.63 ± 0.87 | 13.72 ± 1.66 | 35.53 ± 0.99 |
Sailor2 8B SAIL | 46.68 ± 0.18 | 19.87 ± 0.47 | 36.46 ± 0.70 | 3.28 ± 0.74 |
Tulu 3 8B AI2 | 42.96 ± 0.13 | 3.54 ± 0.70 | 7.09 ± 1.40 | 0.00 ± 0.00 |
Llama 3 8B Meta | 38.80 ± 0.15 | 25.84 ± 0.74 | 13.95 ± 1.03 | 37.73 ± 0.87 |
Olmo 2 1124 13B AI2 | 36.46 ± 0.26 | 18.15 ± 0.87 | 7.51 ± 1.30 | 28.80 ± 1.00 |
Babel 83B Alibaba-DAMO | 36.39 ± 0.32 | 28.16 ± 1.11 | 18.58 ± 1.66 | 37.75 ± 1.29 |
Apertus 8B Swiss AI | 33.71 ± 0.29 | 15.11 ± 1.22 | 10.39 ± 1.86 | 19.83 ± 1.55 |
Apertus 70B Swiss AI | 33.31 ± 0.22 | 17.99 ± 1.04 | 1.12 ± 0.70 | 34.85 ± 1.95 |
Command R7B 12-2024 7B CohereLabs | 33.07 ± 0.25 | 19.68 ± 0.88 | 13.49 ± 1.31 | 25.87 ± 1.45 |
SeaLLMs V3 7B Alibaba-DAMO | 32.57 ± 0.22 | 11.15 ± 0.83 | 4.33 ± 1.25 | 17.96 ± 0.81 |
Babel 9B Alibaba-DAMO | 32.20 ± 0.26 | 18.22 ± 0.85 | 3.82 ± 0.96 | 32.61 ± 1.26 |
Ministral 2410 8B Mistral AI | 28.04 ± 0.21 | 12.84 ± 0.79 | 1.11 ± 0.66 | 24.57 ± 1.35 |
Olmo 2 1124 7B AI2 | 25.27 ± 0.27 | 11.11 ± 0.85 | 1.72 ± 0.69 | 20.50 ± 1.33 |
Model | ID | Multi-Turn Chat | SEA-MT-Bench |
---|---|---|---|
Qwen 3 Next 80B MoE Alibaba | 67.11 ± 0.10 | 58.36 ± 0.65 | 58.36 ± 0.65 |
Command A 03-2025 111B CohereLabs | 66.73 ± 0.17 | 49.53 ± 0.67 | 49.53 ± 0.67 |
SEA-LION v4 (Qwen) 32B AISG | 66.59 ± 0.10 | 45.57 ± 0.62 | 45.57 ± 0.62 |
Qwen 3 32B Alibaba | 65.67 ± 0.11 | 45.83 ± 0.82 | 45.83 ± 0.82 |
Qwen 2.5 72B Alibaba | 64.82 ± 0.08 | 32.53 ± 0.55 | 32.53 ± 0.55 |
SEA-LION v4 (Gemma) 27B AISG | 64.33 ± 0.14 | 46.35 ± 0.59 | 46.35 ± 0.59 |
Gemma 3 27B Google | 64.12 ± 0.15 | 45.10 ± 0.66 | 45.10 ± 0.66 |
SEA-LION v3 (Llama) 70B AISG | 64.04 ± 0.18 | 29.61 ± 0.63 | 29.61 ± 0.63 |
Qwen 3 30B MoE Alibaba | 63.29 ± 0.10 | 52.23 ± 0.58 | 52.23 ± 0.58 |
Llama 3.3 70B Meta | 63.17 ± 0.09 | 16.22 ± 0.47 | 16.22 ± 0.47 |
Tulu 3 70B AI2 | 62.66 ± 0.17 | 29.89 ± 1.00 | 29.89 ± 1.00 |
Qwen 2.5 32B Alibaba | 61.82 ± 0.10 | 19.91 ± 0.35 | 19.91 ± 0.35 |
Gemma 3 12B Google | 61.80 ± 0.11 | 38.64 ± 0.61 | 38.64 ± 0.61 |
Qwen 3 14B Alibaba | 61.37 ± 0.12 | 34.61 ± 0.57 | 34.61 ± 0.57 |
Llama 4 Scout 109B MoE Meta | 61.27 ± 0.11 | 21.29 ± 0.47 | 21.29 ± 0.47 |
Llama 3.1 70B Meta | 60.48 ± 0.14 | 13.18 ± 0.44 | 13.18 ± 0.44 |
Gemma 2 27B Google | 59.79 ± 0.15 | 18.81 ± 0.70 | 18.81 ± 0.70 |
Aya Expanse 32B CohereLabs | 59.65 ± 0.15 | 26.64 ± 0.61 | 26.64 ± 0.61 |
SEA-LION v3 (Gemma 2) 9B AISG | 59.14 ± 0.13 | 26.81 ± 0.55 | 26.81 ± 0.55 |
Qwen 3 8B Alibaba | 58.58 ± 0.14 | 32.49 ± 0.47 | 32.49 ± 0.47 |
Qwen 2.5 14B Alibaba | 58.49 ± 0.12 | 18.20 ± 0.47 | 18.20 ± 0.47 |
Mistral Large 2411 123B Mistral AI | 58.23 ± 0.24 | 21.91 ± 0.58 | 21.91 ± 0.58 |
MERaLiON 2 10B A*STAR | 55.06 ± 0.15 | 12.97 ± 0.55 | 12.97 ± 0.55 |
Gemma 2 9B Google | 54.92 ± 0.18 | 14.99 ± 0.49 | 14.99 ± 0.49 |
Mistral Small 3.1 2503 24B Mistral AI | 53.67 ± 0.22 | 10.14 ± 0.43 | 10.14 ± 0.43 |
Qwen 2.5 7B Alibaba | 52.56 ± 0.13 | 16.05 ± 0.40 | 16.05 ± 0.40 |
SEA-LION v3 (Llama) 8B AISG | 52.51 ± 0.20 | 23.58 ± 0.55 | 23.58 ± 0.55 |
Sailor2 20B SAIL | 51.53 ± 0.16 | 26.39 ± 0.63 | 26.39 ± 0.63 |
Olmo 2 0325 32B AI2 | 50.75 ± 0.21 | 9.17 ± 0.44 | 9.17 ± 0.44 |
Command R+ 08-2024 104B CohereLabs | 50.74 ± 0.22 | 10.49 ± 0.40 | 10.49 ± 0.40 |
Llama 3 70B Meta | 49.96 ± 0.14 | 10.93 ± 0.46 | 10.93 ± 0.46 |
Aya Expanse 8B CohereLabs | 49.78 ± 0.15 | 21.35 ± 0.57 | 21.35 ± 0.57 |
Command R 08-2024 32B CohereLabs | 49.15 ± 0.20 | 5.76 ± 0.48 | 5.76 ± 0.48 |
phi-4 14B Microsoft | 49.14 ± 0.28 | 18.45 ± 0.61 | 18.45 ± 0.61 |
ERNIE 4.5 21B MoE Baidu | 49.00 ± 0.16 | 18.58 ± 0.58 | 18.58 ± 0.58 |
Llama 3.1 8B Meta | 47.23 ± 0.21 | 9.64 ± 0.36 | 9.64 ± 0.36 |
Sailor2 8B SAIL | 46.68 ± 0.18 | 24.51 ± 0.63 | 24.51 ± 0.63 |
Tulu 3 8B AI2 | 42.96 ± 0.13 | 12.60 ± 0.50 | 12.60 ± 0.50 |
Llama 3 8B Meta | 38.80 ± 0.15 | 5.96 ± 0.34 | 5.96 ± 0.34 |
Olmo 2 1124 13B AI2 | 36.46 ± 0.26 | 5.65 ± 0.35 | 5.65 ± 0.35 |
Babel 83B Alibaba-DAMO | 36.39 ± 0.32 | 9.68 ± 0.60 | 9.68 ± 0.60 |
Apertus 8B Swiss AI | 33.71 ± 0.29 | 6.38 ± 0.41 | 6.38 ± 0.41 |
Apertus 70B Swiss AI | 33.31 ± 0.22 | 17.39 ± 0.65 | 17.39 ± 0.65 |
Command R7B 12-2024 7B CohereLabs | 33.07 ± 0.25 | 1.52 ± 0.23 | 1.52 ± 0.23 |
SeaLLMs V3 7B Alibaba-DAMO | 32.57 ± 0.22 | 11.31 ± 0.55 | 11.31 ± 0.55 |
Babel 9B Alibaba-DAMO | 32.20 ± 0.26 | 4.18 ± 0.33 | 4.18 ± 0.33 |
Ministral 2410 8B Mistral AI | 28.04 ± 0.21 | 4.07 ± 0.36 | 4.07 ± 0.36 |
Olmo 2 1124 7B AI2 | 25.27 ± 0.27 | 2.61 ± 0.32 | 2.61 ± 0.32 |
Model | ID | NLG | Summarization | Translations |
---|---|---|---|---|
Qwen 3 Next 80B MoE Alibaba | 67.11 ± 0.10 | 53.73 ± 0.04 | 14.82 ± 0.07 | 92.63 ± 0.02 |
Command A 03-2025 111B CohereLabs | 66.73 ± 0.17 | 54.98 ± 0.07 | 17.48 ± 0.14 | 92.49 ± 0.04 |
SEA-LION v4 (Qwen) 32B AISG | 66.59 ± 0.10 | 54.82 ± 0.04 | 17.06 ± 0.08 | 92.58 ± 0.02 |
Qwen 3 32B Alibaba | 65.67 ± 0.11 | 54.59 ± 0.05 | 16.82 ± 0.09 | 92.36 ± 0.02 |
Qwen 2.5 72B Alibaba | 64.82 ± 0.08 | 54.90 ± 0.05 | 17.31 ± 0.10 | 92.49 ± 0.02 |
SEA-LION v4 (Gemma) 27B AISG | 64.33 ± 0.14 | 54.85 ± 0.05 | 16.18 ± 0.09 | 93.51 ± 0.01 |
Gemma 3 27B Google | 64.12 ± 0.15 | 55.03 ± 0.04 | 16.47 ± 0.08 | 93.58 ± 0.01 |
SEA-LION v3 (Llama) 70B AISG | 64.04 ± 0.18 | 56.01 ± 0.07 | 19.11 ± 0.15 | 92.92 ± 0.02 |
Qwen 3 30B MoE Alibaba | 63.29 ± 0.10 | 53.78 ± 0.04 | 15.13 ± 0.08 | 92.42 ± 0.01 |
Llama 3.3 70B Meta | 63.17 ± 0.09 | 55.27 ± 0.05 | 18.73 ± 0.11 | 91.82 ± 0.02 |
Tulu 3 70B AI2 | 62.66 ± 0.17 | 53.85 ± 0.06 | 14.92 ± 0.11 | 92.79 ± 0.01 |
Qwen 2.5 32B Alibaba | 61.82 ± 0.10 | 53.57 ± 0.04 | 16.14 ± 0.08 | 91.00 ± 0.02 |
Gemma 3 12B Google | 61.80 ± 0.11 | 54.76 ± 0.05 | 16.51 ± 0.10 | 93.01 ± 0.01 |
Qwen 3 14B Alibaba | 61.37 ± 0.12 | 54.28 ± 0.05 | 17.57 ± 0.11 | 91.00 ± 0.02 |
Llama 4 Scout 109B MoE Meta | 61.27 ± 0.11 | 55.41 ± 0.04 | 18.34 ± 0.08 | 92.47 ± 0.01 |
Llama 3.1 70B Meta | 60.48 ± 0.14 | 57.15 ± 0.10 | 22.39 ± 0.20 | 91.91 ± 0.02 |
Gemma 2 27B Google | 59.79 ± 0.15 | 56.17 ± 0.07 | 19.85 ± 0.13 | 92.50 ± 0.02 |
Aya Expanse 32B CohereLabs | 59.65 ± 0.15 | 55.41 ± 0.05 | 18.20 ± 0.10 | 92.61 ± 0.02 |
SEA-LION v3 (Gemma 2) 9B AISG | 59.14 ± 0.13 | 55.23 ± 0.07 | 17.87 ± 0.12 | 92.58 ± 0.03 |
Qwen 3 8B Alibaba | 58.58 ± 0.14 | 54.32 ± 0.05 | 17.53 ± 0.09 | 91.10 ± 0.02 |
Qwen 2.5 14B Alibaba | 58.49 ± 0.12 | 53.14 ± 0.05 | 15.78 ± 0.10 | 90.49 ± 0.03 |
Mistral Large 2411 123B Mistral AI | 58.23 ± 0.24 | 54.52 ± 0.08 | 17.49 ± 0.14 | 91.55 ± 0.03 |
MERaLiON 2 10B A*STAR | 55.06 ± 0.15 | 55.76 ± 0.06 | 19.60 ± 0.12 | 91.91 ± 0.03 |
Gemma 2 9B Google | 54.92 ± 0.18 | 55.69 ± 0.07 | 19.59 ± 0.14 | 91.79 ± 0.03 |
Mistral Small 3.1 2503 24B Mistral AI | 53.67 ± 0.22 | 51.48 ± 0.07 | 16.48 ± 0.14 | 86.47 ± 0.07 |
Qwen 2.5 7B Alibaba | 52.56 ± 0.13 | 52.44 ± 0.04 | 17.08 ± 0.09 | 87.80 ± 0.03 |
SEA-LION v3 (Llama) 8B AISG | 52.51 ± 0.20 | 54.47 ± 0.05 | 17.15 ± 0.11 | 91.80 ± 0.02 |
Sailor2 20B SAIL | 51.53 ± 0.16 | 50.00 ± 0.05 | 14.55 ± 0.11 | 85.45 ± 0.09 |
Olmo 2 0325 32B AI2 | 50.75 ± 0.21 | 50.06 ± 0.08 | 15.09 ± 0.15 | 85.03 ± 0.06 |
Command R+ 08-2024 104B CohereLabs | 50.74 ± 0.22 | 54.25 ± 0.09 | 17.49 ± 0.18 | 91.01 ± 0.04 |
Llama 3 70B Meta | 49.96 ± 0.14 | 55.71 ± 0.06 | 19.77 ± 0.12 | 91.64 ± 0.02 |
Aya Expanse 8B CohereLabs | 49.78 ± 0.15 | 53.14 ± 0.06 | 15.77 ± 0.12 | 90.51 ± 0.03 |
Command R 08-2024 32B CohereLabs | 49.15 ± 0.20 | 53.45 ± 0.10 | 16.25 ± 0.20 | 90.65 ± 0.04 |
phi-4 14B Microsoft | 49.14 ± 0.28 | 45.22 ± 0.07 | 11.85 ± 0.08 | 78.59 ± 0.11 |
ERNIE 4.5 21B MoE Baidu | 49.00 ± 0.16 | 51.17 ± 0.04 | 12.15 ± 0.07 | 90.18 ± 0.03 |
Llama 3.1 8B Meta | 47.23 ± 0.21 | 54.23 ± 0.07 | 18.34 ± 0.15 | 90.12 ± 0.03 |
Sailor2 8B SAIL | 46.68 ± 0.18 | 50.44 ± 0.07 | 15.56 ± 0.12 | 85.32 ± 0.06 |
Tulu 3 8B AI2 | 42.96 ± 0.13 | 52.23 ± 0.07 | 14.31 ± 0.12 | 90.14 ± 0.03 |
Llama 3 8B Meta | 38.80 ± 0.15 | 53.20 ± 0.09 | 17.72 ± 0.19 | 88.67 ± 0.04 |
Olmo 2 1124 13B AI2 | 36.46 ± 0.26 | 44.41 ± 0.08 | 10.62 ± 0.10 | 78.21 ± 0.09 |
Babel 83B Alibaba-DAMO | 36.39 ± 0.32 | 45.06 ± 0.11 | 15.48 ± 0.22 | 74.65 ± 0.11 |
Apertus 8B Swiss AI | 33.71 ± 0.29 | 53.36 ± 0.10 | 17.13 ± 0.17 | 89.59 ± 0.06 |
Apertus 70B Swiss AI | 33.31 ± 0.22 | 49.12 ± 0.08 | 13.26 ± 0.13 | 84.98 ± 0.06 |
Command R7B 12-2024 7B CohereLabs | 33.07 ± 0.25 | 34.01 ± 0.12 | 11.61 ± 0.23 | 56.40 ± 0.12 |
SeaLLMs V3 7B Alibaba-DAMO | 32.57 ± 0.22 | 45.74 ± 0.12 | 4.33 ± 0.24 | 87.15 ± 0.08 |
Babel 9B Alibaba-DAMO | 32.20 ± 0.26 | 46.49 ± 0.09 | 15.67 ± 0.13 | 77.32 ± 0.12 |
Ministral 2410 8B Mistral AI | 28.04 ± 0.21 | 37.91 ± 0.10 | 11.27 ± 0.11 | 64.55 ± 0.15 |
Olmo 2 1124 7B AI2 | 25.27 ± 0.27 | 36.82 ± 0.06 | 9.73 ± 0.13 | 63.92 ± 0.08 |
Model | ID | NLR | Causal Reasoning | Natural Language Inference |
---|---|---|---|---|
Qwen 3 Next 80B MoE Alibaba | 67.11 ± 0.10 | 84.09 ± 0.07 | 92.20 ± 0.12 | 75.98 ± 0.12 |
Command A 03-2025 111B CohereLabs | 66.73 ± 0.17 | 83.00 ± 0.31 | 94.43 ± 0.18 | 71.57 ± 0.54 |
SEA-LION v4 (Qwen) 32B AISG | 66.59 ± 0.10 | 85.14 ± 0.07 | 92.01 ± 0.12 | 78.26 ± 0.10 |
Qwen 3 32B Alibaba | 65.67 ± 0.11 | 83.91 ± 0.13 | 90.16 ± 0.14 | 77.66 ± 0.22 |
Qwen 2.5 72B Alibaba | 64.82 ± 0.08 | 82.11 ± 0.12 | 94.47 ± 0.09 | 69.75 ± 0.22 |
SEA-LION v4 (Gemma) 27B AISG | 64.33 ± 0.14 | 82.11 ± 0.14 | 90.12 ± 0.22 | 74.10 ± 0.15 |
Gemma 3 27B Google | 64.12 ± 0.15 | 81.64 ± 0.09 | 89.75 ± 0.14 | 73.53 ± 0.13 |
SEA-LION v3 (Llama) 70B AISG | 64.04 ± 0.18 | 85.13 ± 0.24 | 91.83 ± 0.44 | 78.44 ± 0.24 |
Qwen 3 30B MoE Alibaba | 63.29 ± 0.10 | 82.79 ± 0.10 | 92.03 ± 0.10 | 73.56 ± 0.19 |
Llama 3.3 70B Meta | 63.17 ± 0.09 | 85.70 ± 0.09 | 93.33 ± 0.12 | 78.07 ± 0.15 |
Tulu 3 70B AI2 | 62.66 ± 0.17 | 84.64 ± 0.20 | 91.75 ± 0.28 | 77.54 ± 0.28 |
Qwen 2.5 32B Alibaba | 61.82 ± 0.10 | 78.92 ± 0.07 | 93.35 ± 0.07 | 64.48 ± 0.11 |
Gemma 3 12B Google | 61.80 ± 0.11 | 82.34 ± 0.07 | 89.89 ± 0.08 | 74.79 ± 0.14 |
Qwen 3 14B Alibaba | 61.37 ± 0.12 | 79.92 ± 0.10 | 85.13 ± 0.14 | 74.71 ± 0.12 |
Llama 4 Scout 109B MoE Meta | 61.27 ± 0.11 | 75.47 ± 0.04 | 91.25 ± 0.05 | 59.68 ± 0.08 |
Llama 3.1 70B Meta | 60.48 ± 0.14 | 81.65 ± 0.19 | 91.61 ± 0.32 | 71.69 ± 0.37 |
Gemma 2 27B Google | 59.79 ± 0.15 | 80.22 ± 0.18 | 89.16 ± 0.22 | 71.28 ± 0.27 |
Aya Expanse 32B CohereLabs | 59.65 ± 0.15 | 79.20 ± 0.15 | 89.41 ± 0.21 | 68.99 ± 0.22 |
SEA-LION v3 (Gemma 2) 9B AISG | 59.14 ± 0.13 | 81.06 ± 0.16 | 90.97 ± 0.30 | 71.16 ± 0.22 |
Qwen 3 8B Alibaba | 58.58 ± 0.14 | 71.08 ± 0.18 | 74.53 ± 0.38 | 67.63 ± 0.20 |
Qwen 2.5 14B Alibaba | 58.49 ± 0.12 | 78.89 ± 0.10 | 90.75 ± 0.11 | 67.03 ± 0.16 |
Mistral Large 2411 123B Mistral AI | 58.23 ± 0.24 | 82.30 ± 0.29 | 90.60 ± 0.42 | 74.01 ± 0.42 |
MERaLiON 2 10B A*STAR | 55.06 ± 0.15 | 72.76 ± 0.23 | 87.01 ± 0.31 | 58.50 ± 0.29 |
Gemma 2 9B Google | 54.92 ± 0.18 | 75.67 ± 0.17 | 88.87 ± 0.26 | 62.48 ± 0.17 |
Mistral Small 3.1 2503 24B Mistral AI | 53.67 ± 0.22 | 80.69 ± 0.22 | 87.39 ± 0.38 | 73.99 ± 0.26 |
Qwen 2.5 7B Alibaba | 52.56 ± 0.13 | 74.11 ± 0.11 | 79.24 ± 0.15 | 68.99 ± 0.16 |
SEA-LION v3 (Llama) 8B AISG | 52.51 ± 0.20 | 72.98 ± 0.36 | 80.65 ± 0.56 | 65.30 ± 0.44 |
Sailor2 20B SAIL | 51.53 ± 0.16 | 76.92 ± 0.09 | 89.49 ± 0.10 | 64.35 ± 0.12 |
Olmo 2 0325 32B AI2 | 50.75 ± 0.21 | 73.73 ± 0.38 | 75.65 ± 0.73 | 71.81 ± 0.27 |
Command R+ 08-2024 104B CohereLabs | 50.74 ± 0.22 | 71.25 ± 0.40 | 81.57 ± 0.54 | 60.92 ± 0.55 |
Llama 3 70B Meta | 49.96 ± 0.14 | 78.75 ± 0.10 | 89.95 ± 0.16 | 67.56 ± 0.12 |
Aya Expanse 8B CohereLabs | 49.78 ± 0.15 | 69.93 ± 0.14 | 80.33 ± 0.22 | 59.54 ± 0.14 |
Command R 08-2024 32B CohereLabs | 49.15 ± 0.20 | 71.51 ± 0.32 | 84.73 ± 0.37 | 58.29 ± 0.53 |
phi-4 14B Microsoft | 49.14 ± 0.28 | 74.14 ± 0.29 | 83.09 ± 0.57 | 65.18 ± 0.31 |
ERNIE 4.5 21B MoE Baidu | 49.00 ± 0.16 | 66.71 ± 0.31 | 76.47 ± 0.41 | 56.95 ± 0.34 |
Llama 3.1 8B Meta | 47.23 ± 0.21 | 63.67 ± 0.36 | 74.89 ± 0.60 | 52.44 ± 0.35 |
Sailor2 8B SAIL | 46.68 ± 0.18 | 74.69 ± 0.26 | 89.40 ± 0.26 | 59.99 ± 0.36 |
Tulu 3 8B AI2 | 42.96 ± 0.13 | 59.11 ± 0.51 | 64.29 ± 0.77 | 53.93 ± 0.58 |
Llama 3 8B Meta | 38.80 ± 0.15 | 57.36 ± 0.37 | 63.33 ± 0.53 | 51.39 ± 0.40 |
Olmo 2 1124 13B AI2 | 36.46 ± 0.26 | 39.23 ± 0.55 | 51.88 ± 0.90 | 26.57 ± 0.62 |
Babel 83B Alibaba-DAMO | 36.39 ± 0.32 | 53.35 ± 0.59 | 77.19 ± 0.88 | 29.51 ± 0.68 |
Apertus 8B Swiss AI | 33.71 ± 0.29 | 24.61 ± 0.64 | 43.68 ± 1.21 | 5.53 ± 0.64 |
Apertus 70B Swiss AI | 33.31 ± 0.22 | 16.94 ± 0.28 | 0.07 ± 0.09 | 33.82 ± 0.56 |
Command R7B 12-2024 7B CohereLabs | 33.07 ± 0.25 | 37.74 ± 0.67 | 38.65 ± 1.22 | 36.82 ± 0.59 |
SeaLLMs V3 7B Alibaba-DAMO | 32.57 ± 0.22 | 36.90 ± 0.72 | 63.89 ± 1.00 | 9.92 ± 0.87 |
Babel 9B Alibaba-DAMO | 32.20 ± 0.26 | 39.34 ± 0.50 | 65.96 ± 0.68 | 12.72 ± 0.71 |
Ministral 2410 8B Mistral AI | 28.04 ± 0.21 | 32.29 ± 0.71 | 32.03 ± 1.23 | 32.55 ± 0.70 |
Olmo 2 1124 7B AI2 | 25.27 ± 0.27 | 21.12 ± 0.65 | 28.41 ± 1.16 | 13.82 ± 0.81 |
Model | ID | NLU | Metaphor Understanding | Question Answering | Sentiment Analysis
---|---|---|---|---|---|
Qwen 3 Next 80B MoE Alibaba | 67.11 ± 0.10 | 76.73 ± 0.10 | 71.96 ± 0.14 | 78.72 ± 0.19 | 79.51 ± 0.12 |
Command A 03-2025 111B CohereLabs | 66.73 ± 0.17 | 75.72 ± 0.26 | 71.33 ± 0.41 | 78.02 ± 0.58 | 77.82 ± 0.35 |
SEA-LION v4 (Qwen) 32B AISG | 66.59 ± 0.10 | 79.62 ± 0.12 | 78.86 ± 0.18 | 78.29 ± 0.25 | 81.70 ± 0.23 |
Qwen 3 32B Alibaba | 65.67 ± 0.11 | 78.96 ± 0.15 | 78.15 ± 0.30 | 78.23 ± 0.22 | 80.50 ± 0.26 |
Qwen 2.5 72B Alibaba | 64.82 ± 0.08 | 75.68 ± 0.12 | 71.95 ± 0.23 | 77.22 ± 0.20 | 77.88 ± 0.20 |
SEA-LION v4 (Gemma) 27B AISG | 64.33 ± 0.14 | 74.78 ± 0.11 | 68.41 ± 0.30 | 81.46 ± 0.14 | 74.49 ± 0.04 |
Gemma 3 27B Google | 64.12 ± 0.15 | 74.81 ± 0.09 | 68.25 ± 0.19 | 81.80 ± 0.17 | 74.39 ± 0.09 |
SEA-LION v3 (Llama) 70B AISG | 64.04 ± 0.18 | 75.34 ± 0.30 | 72.63 ± 0.70 | 79.04 ± 0.42 | 74.36 ± 0.28 |
Qwen 3 30B MoE Alibaba | 63.29 ± 0.10 | 71.36 ± 0.12 | 69.65 ± 0.11 | 76.93 ± 0.17 | 67.50 ± 0.31 |
Llama 3.3 70B Meta | 63.17 ± 0.09 | 76.02 ± 0.09 | 75.71 ± 0.18 | 79.10 ± 0.18 | 73.25 ± 0.19 |
Tulu 3 70B AI2 | 62.66 ± 0.17 | 75.27 ± 0.21 | 68.60 ± 0.32 | 75.45 ± 0.50 | 81.77 ± 0.47 |
Qwen 2.5 32B Alibaba | 61.82 ± 0.10 | 77.31 ± 0.12 | 75.58 ± 0.18 | 81.01 ± 0.18 | 75.34 ± 0.27 |
Gemma 3 12B Google | 61.80 ± 0.11 | 74.85 ± 0.11 | 68.77 ± 0.16 | 81.17 ± 0.23 | 74.62 ± 0.15 |
Qwen 3 14B Alibaba | 61.37 ± 0.12 | 74.68 ± 0.13 | 67.92 ± 0.30 | 76.54 ± 0.23 | 79.57 ± 0.15 |
Llama 4 Scout 109B MoE Meta | 61.27 ± 0.11 | 76.19 ± 0.08 | 72.47 ± 0.15 | 82.34 ± 0.12 | 73.74 ± 0.13 |
Llama 3.1 70B Meta | 60.48 ± 0.14 | 73.75 ± 0.25 | 72.72 ± 0.44 | 75.96 ± 0.49 | 72.55 ± 0.32 |
Gemma 2 27B Google | 59.79 ± 0.15 | 75.45 ± 0.18 | 73.15 ± 0.30 | 78.66 ± 0.47 | 74.54 ± 0.22 |
Aya Expanse 32B CohereLabs | 59.65 ± 0.15 | 76.14 ± 0.17 | 72.10 ± 0.24 | 77.66 ± 0.38 | 78.64 ± 0.29 |
SEA-LION v3 (Gemma 2) 9B AISG | 59.14 ± 0.13 | 75.06 ± 0.24 | 72.91 ± 0.41 | 79.41 ± 0.43 | 72.85 ± 0.38 |
Qwen 3 8B Alibaba | 58.58 ± 0.14 | 72.32 ± 0.13 | 62.12 ± 0.33 | 78.73 ± 0.19 | 76.12 ± 0.18 |
Qwen 2.5 14B Alibaba | 58.49 ± 0.12 | 75.31 ± 0.15 | 68.50 ± 0.12 | 76.04 ± 0.33 | 81.40 ± 0.20 |
Mistral Large 2411 123B Mistral AI | 58.23 ± 0.24 | 74.69 ± 0.30 | 67.86 ± 0.56 | 78.79 ± 0.68 | 77.44 ± 0.43 |
MERaLiON 2 10B A*STAR | 55.06 ± 0.15 | 74.90 ± 0.30 | 72.96 ± 0.51 | 78.36 ± 0.49 | 73.39 ± 0.33 |
Gemma 2 9B Google | 54.92 ± 0.18 | 73.89 ± 0.23 | 73.46 ± 0.45 | 79.45 ± 0.38 | 68.77 ± 0.44 |
Mistral Small 3.1 2503 24B Mistral AI | 53.67 ± 0.22 | 71.97 ± 0.27 | 64.98 ± 0.35 | 75.54 ± 0.59 | 75.39 ± 0.46 |
Qwen 2.5 7B Alibaba | 52.56 ± 0.13 | 71.04 ± 0.15 | 62.40 ± 0.12 | 74.37 ± 0.30 | 76.36 ± 0.19 |
SEA-LION v3 (Llama) 8B AISG | 52.51 ± 0.20 | 67.14 ± 0.36 | 60.38 ± 0.77 | 72.22 ± 0.56 | 68.82 ± 0.42 |
Sailor2 20B SAIL | 51.53 ± 0.16 | 72.11 ± 0.12 | 75.11 ± 0.15 | 62.05 ± 0.28 | 79.16 ± 0.19 |
Olmo 2 0325 32B AI2 | 50.75 ± 0.21 | 64.69 ± 0.38 | 52.59 ± 0.88 | 69.67 ± 0.66 | 71.82 ± 0.55 |
Command R+ 08-2024 104B CohereLabs | 50.74 ± 0.22 | 71.55 ± 0.42 | 61.46 ± 0.80 | 78.83 ± 0.76 | 74.35 ± 0.46 |
Llama 3 70B Meta | 49.96 ± 0.14 | 71.40 ± 0.14 | 66.70 ± 0.21 | 77.12 ± 0.33 | 70.39 ± 0.23 |
Aya Expanse 8B CohereLabs | 49.78 ± 0.15 | 69.60 ± 0.16 | 59.46 ± 0.24 | 77.27 ± 0.33 | 72.06 ± 0.21 |
Command R 08-2024 32B CohereLabs | 49.15 ± 0.20 | 67.78 ± 0.35 | 66.04 ± 0.62 | 78.01 ± 0.37 | 59.29 ± 0.60 |
phi-4 14B Microsoft | 49.14 ± 0.28 | 66.10 ± 0.26 | 62.86 ± 0.57 | 58.25 ± 0.54 | 77.20 ± 0.25 |
ERNIE 4.5 21B MoE Baidu | 49.00 ± 0.16 | 71.46 ± 0.29 | 58.90 ± 0.80 | 74.68 ± 0.56 | 80.79 ± 0.27 |
Llama 3.1 8B Meta | 47.23 ± 0.21 | 67.51 ± 0.34 | 60.07 ± 0.86 | 77.23 ± 0.42 | 65.23 ± 0.45 |
Sailor2 8B SAIL | 46.68 ± 0.18 | 64.44 ± 0.25 | 67.17 ± 0.31 | 59.72 ± 0.59 | 66.45 ± 0.26 |
Tulu 3 8B AI2 | 42.96 ± 0.13 | 59.51 ± 0.37 | 50.00 ± 0.85 | 56.06 ± 0.43 | 72.47 ± 0.36 |
Llama 3 8B Meta | 38.80 ± 0.15 | 64.69 ± 0.35 | 53.63 ± 0.84 | 76.43 ± 0.36 | 64.02 ± 0.43 |
Olmo 2 1124 13B AI2 | 36.46 ± 0.26 | 58.75 ± 0.42 | 46.25 ± 0.67 | 62.57 ± 0.62 | 67.44 ± 0.59 |
Babel 83B Alibaba-DAMO | 36.39 ± 0.32 | 49.69 ± 0.62 | 43.28 ± 1.40 | 45.10 ± 0.61 | 60.71 ± 0.88 |
Apertus 8B Swiss AI | 33.71 ± 0.29 | 46.94 ± 0.45 | 23.45 ± 1.13 | 72.28 ± 0.54 | 45.10 ± 0.66 |
Apertus 70B Swiss AI | 33.31 ± 0.22 | 48.66 ± 0.62 | 15.62 ± 1.73 | 72.44 ± 0.86 | 57.91 ± 0.69 |
Command R7B 12-2024 7B CohereLabs | 33.07 ± 0.25 | 52.60 ± 0.64 | 30.61 ± 1.66 | 70.71 ± 0.68 | 56.49 ± 0.50 |
SeaLLMs V3 7B Alibaba-DAMO | 32.57 ± 0.22 | 51.00 ± 0.46 | 44.96 ± 0.87 | 70.14 ± 0.86 | 37.89 ± 1.18 |
Babel 9B Alibaba-DAMO | 32.20 ± 0.26 | 56.67 ± 0.59 | 44.23 ± 1.53 | 71.47 ± 0.77 | 54.30 ± 1.13 |
Ministral 2410 8B Mistral AI | 28.04 ± 0.21 | 50.42 ± 0.54 | 24.47 ± 0.98 | 68.43 ± 0.76 | 58.35 ± 0.95 |
Olmo 2 1124 7B AI2 | 25.27 ± 0.27 | 47.59 ± 0.63 | 24.82 ± 1.16 | 64.69 ± 0.60 | 53.26 ± 1.03 |
Model | ID | Safety | Toxicity Detection
---|---|---|---|
Qwen 3 Next 80B MoE Alibaba | 67.11 ± 0.10 | 52.96 ± 0.21 | 52.96 ± 0.21 |
Command A 03-2025 111B CohereLabs | 66.73 ± 0.17 | 53.80 ± 0.49 | 53.80 ± 0.49 |
SEA-LION v4 (Qwen) 32B AISG | 66.59 ± 0.10 | 52.78 ± 0.27 | 52.78 ± 0.27 |
Qwen 3 32B Alibaba | 65.67 ± 0.11 | 49.46 ± 0.33 | 49.46 ± 0.33 |
Qwen 2.5 72B Alibaba | 64.82 ± 0.08 | 50.33 ± 0.25 | 50.33 ± 0.25 |
SEA-LION v4 (Gemma) 27B AISG | 64.33 ± 0.14 | 50.45 ± 0.25 | 50.45 ± 0.25 |
Gemma 3 27B Google | 64.12 ± 0.15 | 49.50 ± 0.20 | 49.50 ± 0.20 |
SEA-LION v3 (Llama) 70B AISG | 64.04 ± 0.18 | 47.32 ± 0.47 | 47.32 ± 0.47 |
Qwen 3 30B MoE Alibaba | 63.29 ± 0.10 | 50.51 ± 0.18 | 50.51 ± 0.18 |
Llama 3.3 70B Meta | 63.17 ± 0.09 | 51.20 ± 0.28 | 51.20 ± 0.28 |
Tulu 3 70B AI2 | 62.66 ± 0.17 | 51.15 ± 0.50 | 51.15 ± 0.50 |
Qwen 2.5 32B Alibaba | 61.82 ± 0.10 | 55.41 ± 0.16 | 55.41 ± 0.16 |
Gemma 3 12B Google | 61.80 ± 0.11 | 44.48 ± 0.32 | 44.48 ± 0.32 |
Qwen 3 14B Alibaba | 61.37 ± 0.12 | 55.66 ± 0.33 | 55.66 ± 0.33 |
Llama 4 Scout 109B MoE Meta | 61.27 ± 0.11 | 48.83 ± 0.10 | 48.83 ± 0.10 |
Llama 3.1 70B Meta | 60.48 ± 0.14 | 52.70 ± 0.36 | 52.70 ± 0.36 |
Gemma 2 27B Google | 59.79 ± 0.15 | 49.22 ± 0.22 | 49.22 ± 0.22 |
Aya Expanse 32B CohereLabs | 59.65 ± 0.15 | 53.85 ± 0.27 | 53.85 ± 0.27 |
SEA-LION v3 (Gemma 2) 9B AISG | 59.14 ± 0.13 | 42.20 ± 0.28 | 42.20 ± 0.28 |
Qwen 3 8B Alibaba | 58.58 ± 0.14 | 53.48 ± 0.41 | 53.48 ± 0.41 |
Qwen 2.5 14B Alibaba | 58.49 ± 0.12 | 54.33 ± 0.31 | 54.33 ± 0.31 |
Mistral Large 2411 123B Mistral AI | 58.23 ± 0.24 | 43.26 ± 0.64 | 43.26 ± 0.64 |
MERaLiON 2 10B A*STAR | 55.06 ± 0.15 | 45.57 ± 0.31 | 45.57 ± 0.31 |
Gemma 2 9B Google | 54.92 ± 0.18 | 37.86 ± 0.28 | 37.86 ± 0.28 |
Mistral Small 3.1 2503 24B Mistral AI | 53.67 ± 0.22 | 45.37 ± 0.82 | 45.37 ± 0.82 |
Qwen 2.5 7B Alibaba | 52.56 ± 0.13 | 45.81 ± 0.21 | 45.81 ± 0.21 |
SEA-LION v3 (Llama) 8B AISG | 52.51 ± 0.20 | 41.33 ± 0.60 | 41.33 ± 0.60 |
Sailor2 20B SAIL | 51.53 ± 0.16 | 51.72 ± 0.30 | 51.72 ± 0.30 |
Olmo 2 0325 32B AI2 | 50.75 ± 0.21 | 40.50 ± 0.52 | 40.50 ± 0.52 |
Command R+ 08-2024 104B CohereLabs | 50.74 ± 0.22 | 45.07 ± 0.58 | 45.07 ± 0.58 |
Llama 3 70B Meta | 49.96 ± 0.14 | 48.67 ± 0.27 | 48.67 ± 0.27 |
Aya Expanse 8B CohereLabs | 49.78 ± 0.15 | 48.09 ± 0.46 | 48.09 ± 0.46 |
Command R 08-2024 32B CohereLabs | 49.15 ± 0.20 | 44.23 ± 0.58 | 44.23 ± 0.58 |
phi-4 14B Microsoft | 49.14 ± 0.28 | 13.37 ± 0.78 | 13.37 ± 0.78 |
ERNIE 4.5 21B MoE Baidu | 49.00 ± 0.16 | 32.12 ± 0.34 | 32.12 ± 0.34 |
Llama 3.1 8B Meta | 47.23 ± 0.21 | 39.19 ± 0.47 | 39.19 ± 0.47 |
Sailor2 8B SAIL | 46.68 ± 0.18 | 44.16 ± 0.47 | 44.16 ± 0.47 |
Tulu 3 8B AI2 | 42.96 ± 0.13 | 35.43 ± 0.49 | 35.43 ± 0.49 |
Llama 3 8B Meta | 38.80 ± 0.15 | 30.32 ± 0.28 | 30.32 ± 0.28 |
Olmo 2 1124 13B AI2 | 36.46 ± 0.26 | 25.02 ± 0.64 | 25.02 ± 0.64 |
Babel 83B Alibaba-DAMO | 36.39 ± 0.32 | 8.48 ± 0.78 | 8.48 ± 0.78 |
Apertus 8B Swiss AI | 33.71 ± 0.29 | 20.30 ± 0.66 | 20.30 ± 0.66 |
Apertus 70B Swiss AI | 33.31 ± 0.22 | 30.27 ± 0.58 | 30.27 ± 0.58 |
Command R7B 12-2024 7B CohereLabs | 33.07 ± 0.25 | 28.69 ± 0.45 | 28.69 ± 0.45 |
SeaLLMs V3 7B Alibaba-DAMO | 32.57 ± 0.22 | 24.23 ± 0.61 | 24.23 ± 0.61 |
Babel 9B Alibaba-DAMO | 32.20 ± 0.26 | 19.55 ± 0.51 | 19.55 ± 0.51 |
Ministral 2410 8B Mistral AI | 28.04 ± 0.21 | 21.99 ± 0.44 | 21.99 ± 0.44 |
Olmo 2 1124 7B AI2 | 25.27 ± 0.27 | 5.21 ± 1.03 | 5.21 ± 1.03 |
Model | ID | Knowledge | Global MMLU Lite
---|---|---|---|
Qwen 3 Next 80B MoE Alibaba | 67.11 ± 0.10 | 71.95 ± 0.27 | 71.95 ± 0.27 |
Command A 03-2025 111B CohereLabs | 66.73 ± 0.17 | 69.22 ± 0.38 | 69.22 ± 0.38 |
SEA-LION v4 (Qwen) 32B AISG | 66.59 ± 0.10 | 70.64 ± 0.18 | 70.64 ± 0.18 |
Qwen 3 32B Alibaba | 65.67 ± 0.11 | 67.20 ± 0.30 | 67.20 ± 0.30 |
Qwen 2.5 72B Alibaba | 64.82 ± 0.08 | 73.50 ± 0.25 | 73.50 ± 0.25 |
SEA-LION v4 (Gemma) 27B AISG | 64.33 ± 0.14 | 65.11 ± 0.25 | 65.11 ± 0.25 |
Gemma 3 27B Google | 64.12 ± 0.15 | 65.50 ± 0.25 | 65.50 ± 0.25 |
SEA-LION v3 (Llama) 70B AISG | 64.04 ± 0.18 | 74.93 ± 0.41 | 74.93 ± 0.41 |
Qwen 3 30B MoE Alibaba | 63.29 ± 0.10 | 68.18 ± 0.24 | 68.18 ± 0.24 |
Llama 3.3 70B Meta | 63.17 ± 0.09 | 73.28 ± 0.20 | 73.28 ± 0.20 |
Tulu 3 70B AI2 | 62.66 ± 0.17 | 66.16 ± 0.53 | 66.16 ± 0.53 |
Qwen 2.5 32B Alibaba | 61.82 ± 0.10 | 66.19 ± 0.23 | 66.19 ± 0.23 |
Gemma 3 12B Google | 61.80 ± 0.11 | 63.09 ± 0.25 | 63.09 ± 0.25 |
Qwen 3 14B Alibaba | 61.37 ± 0.12 | 59.15 ± 0.21 | 59.15 ± 0.21 |
Llama 4 Scout 109B MoE Meta | 61.27 ± 0.11 | 67.90 ± 0.09 | 67.90 ± 0.09 |
Llama 3.1 70B Meta | 60.48 ± 0.14 | 71.47 ± 0.39 | 71.47 ± 0.39 |
Gemma 2 27B Google | 59.79 ± 0.15 | 62.08 ± 0.39 | 62.08 ± 0.39 |
Aya Expanse 32B CohereLabs | 59.65 ± 0.15 | 53.81 ± 0.41 | 53.81 ± 0.41 |
SEA-LION v3 (Gemma 2) 9B AISG | 59.14 ± 0.13 | 58.71 ± 0.47 | 58.71 ± 0.47 |
Qwen 3 8B Alibaba | 58.58 ± 0.14 | 59.49 ± 0.32 | 59.49 ± 0.32 |
Qwen 2.5 14B Alibaba | 58.49 ± 0.12 | 59.81 ± 0.23 | 59.81 ± 0.23 |
Mistral Large 2411 123B Mistral AI | 58.23 ± 0.24 | 62.33 ± 0.60 | 62.33 ± 0.60 |
MERaLiON 2 10B A*STAR | 55.06 ± 0.15 | 54.13 ± 0.44 | 54.13 ± 0.44 |
Gemma 2 9B Google | 54.92 ± 0.18 | 56.60 ± 0.27 | 56.60 ± 0.27 |
Mistral Small 3.1 2503 24B Mistral AI | 53.67 ± 0.22 | 60.59 ± 0.48 | 60.59 ± 0.48 |
Qwen 2.5 7B Alibaba | 52.56 ± 0.13 | 53.30 ± 0.17 | 53.30 ± 0.17 |
SEA-LION v3 (Llama) 8B AISG | 52.51 ± 0.20 | 51.66 ± 0.54 | 51.66 ± 0.54 |
Sailor2 20B SAIL | 51.53 ± 0.16 | 48.07 ± 0.28 | 48.07 ± 0.28 |
Olmo 2 0325 32B AI2 | 50.75 ± 0.21 | 52.13 ± 0.68 | 52.13 ± 0.68 |
Command R+ 08-2024 104B CohereLabs | 50.74 ± 0.22 | 40.20 ± 0.59 | 40.20 ± 0.59 |
Llama 3 70B Meta | 49.96 ± 0.14 | 65.35 ± 0.36 | 65.35 ± 0.36 |
Aya Expanse 8B CohereLabs | 49.78 ± 0.15 | 42.83 ± 0.29 | 42.83 ± 0.29 |
Command R 08-2024 32B CohereLabs | 49.15 ± 0.20 | 46.31 ± 0.68 | 46.31 ± 0.68 |
phi-4 14B Microsoft | 49.14 ± 0.28 | 62.37 ± 0.63 | 62.37 ± 0.63 |
ERNIE 4.5 21B MoE Baidu | 49.00 ± 0.16 | 39.51 ± 0.68 | 39.51 ± 0.68 |
Llama 3.1 8B Meta | 47.23 ± 0.21 | 42.06 ± 0.50 | 42.06 ± 0.50 |
Sailor2 8B SAIL | 46.68 ± 0.18 | 49.69 ± 0.42 | 49.69 ± 0.42 |
Tulu 3 8B AI2 | 42.96 ± 0.13 | 37.38 ± 0.48 | 37.38 ± 0.48 |
Llama 3 8B Meta | 38.80 ± 0.15 | 41.00 ± 0.44 | 41.00 ± 0.44 |
Olmo 2 1124 13B AI2 | 36.46 ± 0.26 | 31.20 ± 0.45 | 31.20 ± 0.45 |
Babel 83B Alibaba-DAMO | 36.39 ± 0.32 | 51.51 ± 1.03 | 51.51 ± 1.03 |
Apertus 8B Swiss AI | 33.71 ± 0.29 | 27.50 ± 0.88 | 27.50 ± 0.88 |
Apertus 70B Swiss AI | 33.31 ± 0.22 | 15.29 ± 0.78 | 15.29 ± 0.78 |
Command R7B 12-2024 7B CohereLabs | 33.07 ± 0.25 | 33.43 ± 0.79 | 33.43 ± 0.79 |
SeaLLMs V3 7B Alibaba-DAMO | 32.57 ± 0.22 | 27.56 ± 0.72 | 27.56 ± 0.72 |
Babel 9B Alibaba-DAMO | 32.20 ± 0.26 | 27.63 ± 0.88 | 27.63 ± 0.88 |
Ministral 2410 8B Mistral AI | 28.04 ± 0.21 | 26.14 ± 0.79 | 26.14 ± 0.79 |
Olmo 2 1124 7B AI2 | 25.27 ± 0.27 | 17.31 ± 0.68 | 17.31 ± 0.68 |
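Every score above is reported as "mean ± CI" over 30 bootstrap resamples. The sketch below shows how such a value could be computed from per-example scores using a percentile bootstrap; the leaderboard's exact resampling and interval scheme is not specified here, so the function name `bootstrap_ci`, the seed, and the toy scores are illustrative assumptions.

```python
import random
import statistics

def bootstrap_ci(scores, n_boot=30, alpha=0.05, seed=0):
    """Percentile-bootstrap estimate of the mean of per-example scores.

    Illustrative sketch only: resamples `scores` with replacement
    `n_boot` times, averages each resample, and reports the mean of
    the bootstrap means plus half the width of the (1 - alpha)
    percentile interval, matching the "score ± CI" presentation.
    """
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(scores, k=len(scores)))
        for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    point = statistics.fmean(means)
    return point, (hi - lo) / 2  # report as point ± half-width

# Hypothetical per-example scores for one model on one task.
scores = [0.62, 0.71, 0.55, 0.68, 0.64, 0.59, 0.73, 0.66]
point, half = bootstrap_ci(scores)
```

With only 30 resamples the percentile interval is coarse; production leaderboards often use far more resamples, but the head note here states 30 bootstraps are averaged.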