Indonesian Performance
Indonesian Scores by Model
Average of 8 runs. 95% CIs are shown.
Model Size: ≤200B
Open instruct models only
[Bar chart: overall Indonesian (ID) scores by model, ranked highest to lowest; the same values appear, with model names, in the tables below.]
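Each reported value is the mean over 8 evaluation runs with a 95% confidence interval. The page does not state how the interval is computed, so the sketch below assumes a Student-t interval over per-run scores, with made-up run values, purely to illustrate how a figure like 74.75 ± 0.66 is read.

```python
import statistics

T_CRIT_DF7 = 2.365  # two-sided 95% Student-t critical value for 8 runs (7 df)

def mean_ci95(run_scores):
    """Mean and 95% CI half-width across 8 runs (t-interval assumed, not confirmed)."""
    n = len(run_scores)
    mean = statistics.fmean(run_scores)
    half_width = T_CRIT_DF7 * statistics.stdev(run_scores) / n ** 0.5
    return mean, half_width

# Hypothetical per-run ID scores for one model (not leaderboard data).
runs = [74.1, 75.3, 74.8, 74.6, 75.0, 74.4, 75.1, 74.7]
mean, hw = mean_ci95(runs)
print(f"{mean:.2f} ± {hw:.2f}")
```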
Indonesian Competencies
Average of 8 runs. 95% CIs are shown.
Model Size: ≤200B
Open instruct models only
Model | ID | Instruction Following | Linguistic Diagnostics | Multi-Turn Chat | NLG | NLR | NLU | Safety | Knowledge |
---|---|---|---|---|---|---|---|---|---|
Command A 03-2025 111B CohereLabs | 74.75 ± 0.66 | 92.26 ± 0.96 | 77.35 ± 1.45 | 49.08 ± 1.62 | 54.98 ± 0.10 | 89.62 ± 2.47 | 83.98 ± 0.30 | 74.09 ± 0.92 | 76.66 ± 1.16 |
Qwen 3 32B Alibaba | 72.81 ± 0.18 | 87.50 ± 1.02 | 79.22 ± 0.22 | 45.26 ± 0.78 | 54.64 ± 0.10 | 90.45 ± 0.27 | 85.30 ± 0.23 | 65.15 ± 1.72 | 74.97 ± 0.70 |
Qwen 3 30B MoE Alibaba | 72.36 ± 0.28 | 90.24 ± 0.77 | 68.16 ± 0.31 | 52.59 ± 1.98 | 53.79 ± 0.06 | 89.79 ± 0.25 | 81.59 ± 0.16 | 67.28 ± 0.19 | 75.44 ± 0.24 |
SEA-LION v3 (Llama) 70B AISG | 72.15 ± 0.48 | 92.62 ± 1.31 | 73.99 ± 2.48 | 29.74 ± 1.67 | 55.99 ± 0.15 | 91.13 ± 0.20 | 83.80 ± 0.53 | 69.33 ± 1.25 | 80.63 ± 0.77 |
SEA-LION v4 27B AISG | 71.89 ± 0.33 | 90.12 ± 1.05 | 74.50 ± 0.51 | 46.50 ± 1.15 | 54.87 ± 0.13 | 89.54 ± 0.21 | 83.98 ± 0.14 | 61.91 ± 1.53 | 73.66 ± 0.48 |
Gemma 3 27B Google | 71.52 ± 0.26 | 88.93 ± 1.11 | 75.38 ± 0.33 | 45.15 ± 0.95 | 55.03 ± 0.12 | 89.27 ± 0.16 | 84.05 ± 0.14 | 60.55 ± 0.84 | 73.78 ± 0.33 |
Qwen 2.5 72B Alibaba | 71.09 ± 0.44 | 90.12 ± 1.11 | 78.09 ± 0.76 | 32.65 ± 1.41 | 54.88 ± 0.08 | 88.08 ± 0.73 | 83.67 ± 0.18 | 61.35 ± 1.35 | 79.91 ± 0.43 |
Tulu 3 70B AI2 | 71.00 ± 0.19 | 88.10 ± 1.50 | 76.23 ± 0.28 | 30.17 ± 1.16 | 53.87 ± 0.07 | 90.90 ± 0.15 | 83.25 ± 0.32 | 72.08 ± 1.04 | 73.44 ± 0.86 |
Llama 3.3 70B Meta | 70.90 ± 0.16 | 93.45 ± 0.42 | 77.31 ± 0.35 | 16.06 ± 0.97 | 55.29 ± 0.16 | 91.41 ± 0.16 | 84.27 ± 0.11 | 70.09 ± 0.22 | 79.31 ± 0.22 |
Qwen 3 14B Alibaba | 70.55 ± 0.09 | 86.90 ± 1.16 | 72.43 ± 0.27 | 35.40 ± 1.08 | 54.24 ± 0.10 | 88.28 ± 0.19 | 83.02 ± 0.22 | 75.21 ± 0.19 | 68.88 ± 0.58 |
Gemma 3 12B Google | 70.17 ± 0.23 | 89.64 ± 1.39 | 72.52 ± 0.18 | 38.69 ± 1.83 | 54.76 ± 0.21 | 89.18 ± 0.15 | 84.12 ± 0.13 | 61.10 ± 0.83 | 71.34 ± 0.18 |
Llama 4 Scout 109B MoE Meta | 69.72 ± 0.24 | 92.02 ± 0.93 | 76.32 ± 0.07 | 21.34 ± 1.45 | 55.44 ± 0.07 | 83.77 ± 0.12 | 84.76 ± 0.10 | 68.71 ± 0.10 | 75.38 ± 0.09 |
Qwen 2.5 32B Alibaba | 69.47 ± 0.25 | 88.81 ± 1.35 | 78.05 ± 0.17 | 20.04 ± 1.15 | 53.60 ± 0.13 | 86.20 ± 0.45 | 85.10 ± 0.08 | 69.24 ± 0.59 | 74.72 ± 0.31 |
Llama 3.1 70B Meta | 68.33 ± 0.40 | 82.38 ± 2.00 | 74.84 ± 1.50 | 13.69 ± 0.59 | 57.15 ± 0.17 | 89.11 ± 0.81 | 82.74 ± 0.44 | 68.97 ± 1.47 | 77.78 ± 0.64 |
Aya Expanse 32B CohereLabs | 67.84 ± 0.33 | 82.62 ± 0.85 | 73.36 ± 0.28 | 26.72 ± 1.60 | 55.41 ± 0.11 | 86.96 ± 0.58 | 83.97 ± 0.20 | 68.92 ± 0.59 | 64.78 ± 0.35 |
SEA-LION v3 (Gemma 2) 9B AISG | 67.80 ± 0.35 | 89.29 ± 1.65 | 71.35 ± 0.48 | 27.05 ± 1.14 | 55.31 ± 0.13 | 88.01 ± 0.56 | 83.78 ± 0.59 | 59.19 ± 1.14 | 68.47 ± 0.52 |
Qwen 3 8B Alibaba | 67.80 ± 0.33 | 83.69 ± 0.74 | 69.32 ± 0.79 | 32.38 ± 1.05 | 54.38 ± 0.06 | 83.96 ± 0.26 | 82.22 ± 0.23 | 67.24 ± 1.49 | 69.19 ± 0.37 |
Gemma 2 27B Google | 67.16 ± 0.26 | 86.55 ± 1.71 | 72.42 ± 0.64 | 19.13 ± 1.03 | 56.14 ± 0.20 | 88.08 ± 0.89 | 83.86 ± 0.30 | 60.23 ± 0.99 | 70.91 ± 0.24 |
Mistral Large 2411 123B Mistral AI | 67.08 ± 1.36 | 88.93 ± 2.71 | 68.75 ± 3.35 | 21.98 ± 1.35 | 54.45 ± 0.10 | 88.91 ± 1.98 | 83.46 ± 0.33 | 58.64 ± 9.36 | 71.56 ± 1.80 |
Qwen 2.5 14B Alibaba | 66.67 ± 0.29 | 84.40 ± 1.41 | 72.40 ± 0.18 | 18.16 ± 1.05 | 53.12 ± 0.14 | 86.25 ± 0.46 | 83.05 ± 0.28 | 66.56 ± 0.72 | 69.44 ± 0.27 |
MERaLiON 2 10B A*STAR | 64.29 ± 0.27 | 81.67 ± 1.79 | 69.43 ± 0.57 | 13.09 ± 0.81 | 55.73 ± 0.25 | 82.21 ± 1.07 | 83.57 ± 0.35 | 63.31 ± 0.96 | 65.28 ± 0.74 |
Gemma 2 9B Google | 64.17 ± 0.46 | 82.26 ± 2.11 | 70.15 ± 0.79 | 14.87 ± 1.70 | 55.65 ± 0.12 | 84.36 ± 0.57 | 83.12 ± 0.59 | 55.84 ± 0.78 | 67.13 ± 0.41 |
SEA-LION v3 (Llama) 8B AISG | 63.97 ± 0.44 | 83.81 ± 1.83 | 60.74 ± 1.38 | 23.06 ± 1.03 | 54.49 ± 0.09 | 83.80 ± 1.07 | 78.69 ± 0.77 | 64.42 ± 2.65 | 62.75 ± 0.55 |
Mistral Small 3.1 2503 24B Mistral AI | 63.76 ± 0.55 | 75.48 ± 2.17 | 67.14 ± 1.91 | 10.56 ± 1.07 | 51.42 ± 0.23 | 88.04 ± 0.58 | 81.40 ± 0.54 | 66.30 ± 1.53 | 69.75 ± 0.78 |
Qwen 2.5 7B Alibaba | 63.06 ± 0.22 | 78.69 ± 1.49 | 63.75 ± 0.85 | 15.84 ± 0.84 | 52.43 ± 0.06 | 85.26 ± 0.17 | 80.52 ± 0.27 | 64.19 ± 0.72 | 63.78 ± 0.17 |
Olmo 2 0325 32B AI2 | 62.21 ± 0.39 | 81.67 ± 2.79 | 66.01 ± 1.83 | 9.21 ± 0.68 | 50.15 ± 0.18 | 84.41 ± 0.93 | 77.02 ± 1.40 | 65.84 ± 3.48 | 63.41 ± 1.13 |
Command R+ 08-2024 104B CohereLabs | 61.61 ± 0.90 | 78.45 ± 1.96 | 64.91 ± 3.78 | 10.40 ± 0.82 | 54.32 ± 0.19 | 82.08 ± 3.77 | 81.97 ± 0.84 | 66.51 ± 5.16 | 54.28 ± 3.07 |
ERNIE 4.5 21B MoE Baidu | 61.33 ± 1.19 | 82.02 ± 1.56 | 63.49 ± 1.28 | 18.70 ± 0.45 | 51.18 ± 0.07 | 79.35 ± 1.78 | 80.86 ± 1.03 | 61.40 ± 1.34 | 53.63 ± 5.76 |
Aya Expanse 8B CohereLabs | 61.16 ± 0.31 | 70.24 ± 1.99 | 60.91 ± 0.22 | 21.07 ± 1.77 | 53.11 ± 0.12 | 82.79 ± 0.23 | 80.78 ± 0.20 | 64.55 ± 2.38 | 55.81 ± 0.38 |
phi-4 14B Microsoft | 60.52 ± 1.30 | 73.81 ± 2.29 | 69.12 ± 1.63 | 18.37 ± 1.01 | 45.19 ± 0.52 | 83.96 ± 2.53 | 75.86 ± 0.32 | 47.23 ± 10.77 | 70.59 ± 0.87 |
Sailor2 20B SAIL | 60.28 ± 0.27 | 43.81 ± 2.82 | 67.47 ± 0.23 | 26.35 ± 1.12 | 50.03 ± 0.27 | 86.47 ± 0.10 | 79.45 ± 0.13 | 68.28 ± 0.55 | 60.41 ± 1.40 |
Llama 3.1 8B Meta | 59.92 ± 0.42 | 77.74 ± 2.39 | 60.69 ± 1.28 | 9.48 ± 0.53 | 54.22 ± 0.11 | 78.98 ± 0.99 | 79.60 ± 0.26 | 62.95 ± 1.19 | 55.66 ± 0.51 |
Llama 3 70B Meta | 59.62 ± 0.28 | 29.29 ± 1.31 | 70.39 ± 0.56 | 10.94 ± 0.76 | 55.71 ± 0.11 | 87.82 ± 0.17 | 81.73 ± 0.28 | 68.08 ± 0.88 | 73.00 ± 0.29 |
Command R 08-2024 32B CohereLabs | 59.42 ± 0.38 | 69.88 ± 2.28 | 67.10 ± 1.75 | 6.03 ± 0.55 | 53.39 ± 0.15 | 82.14 ± 4.66 | 80.22 ± 0.61 | 57.81 ± 7.70 | 58.81 ± 1.88 |
Sailor2 8B SAIL | 56.72 ± 0.64 | 45.36 ± 1.49 | 48.35 ± 3.56 | 23.92 ± 1.40 | 50.50 ± 0.28 | 84.59 ± 0.32 | 75.49 ± 0.41 | 63.89 ± 2.63 | 61.66 ± 0.43 |
Tulu 3 8B AI2 | 56.55 ± 0.38 | 84.17 ± 1.49 | 36.17 ± 3.08 | 12.66 ± 1.28 | 52.19 ± 0.08 | 76.52 ± 1.41 | 71.35 ± 0.57 | 67.59 ± 0.78 | 51.75 ± 0.99 |
Llama 3 8B Meta | 51.32 ± 0.52 | 32.14 ± 1.72 | 61.41 ± 1.33 | 5.77 ± 0.71 | 53.23 ± 0.12 | 75.89 ± 1.77 | 78.24 ± 0.43 | 49.35 ± 1.70 | 54.56 ± 0.86 |
Olmo 2 1124 13B AI2 | 50.37 ± 0.30 | 68.69 ± 1.34 | 56.99 ± 2.00 | 5.93 ± 0.61 | 44.38 ± 0.81 | 64.74 ± 4.33 | 71.86 ± 0.83 | 43.66 ± 3.72 | 46.72 ± 0.98 |
Babel 83B Alibaba-DAMO | 49.21 ± 3.64 | 46.07 ± 3.36 | 59.72 ± 3.72 | 9.75 ± 1.29 | 45.14 ± 1.47 | 71.92 ± 5.39 | 64.52 ± 7.21 | 33.75 ± 13.77 | 62.81 ± 7.60 |
Babel 9B Alibaba-DAMO | 48.08 ± 1.45 | 44.76 ± 3.85 | 54.86 ± 2.70 | 4.36 ± 0.59 | 46.46 ± 3.13 | 63.24 ± 5.48 | 71.53 ± 5.35 | 56.16 ± 3.50 | 43.28 ± 6.75 |
Command R7B 12-2024 7B CohereLabs | 47.11 ± 2.13 | 56.07 ± 3.18 | 58.93 ± 1.73 | 1.78 ± 0.46 | 33.95 ± 1.07 | 63.23 ± 8.53 | 71.48 ± 3.47 | 42.16 ± 5.20 | 49.31 ± 7.37 |
SeaLLMs V3 7B Alibaba-DAMO | 46.71 ± 1.61 | 52.74 ± 1.76 | 51.03 ± 2.69 | 11.42 ± 1.01 | 45.72 ± 0.92 | 61.32 ± 4.17 | 65.78 ± 4.65 | 42.08 ± 7.92 | 43.59 ± 4.64 |
Ministral 2410 8B Mistral AI | 42.77 ± 0.70 | 38.45 ± 3.19 | 54.41 ± 0.92 | 3.77 ± 0.59 | 37.91 ± 1.77 | 60.94 ± 6.66 | 68.76 ± 2.19 | 35.16 ± 8.11 | 42.75 ± 2.68 |
Olmo 2 1124 7B AI2 | 41.90 ± 1.18 | 59.88 ± 1.56 | 52.91 ± 2.55 | 2.69 ± 0.74 | 36.83 ± 0.54 | 53.06 ± 5.70 | 64.69 ± 4.19 | 29.66 ± 8.82 | 35.44 ± 2.72 |
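The overall ID score appears to be the unweighted mean of the eight competency scores; this is an observed pattern in the rows above (e.g. Command A and Qwen 3 32B both check out), not stated methodology. A quick verification in Python against the Command A row:

```python
# Command A 03-2025 competency scores, copied from the table above.
competencies = {
    "Instruction Following": 92.26, "Linguistic Diagnostics": 77.35,
    "Multi-Turn Chat": 49.08, "NLG": 54.98, "NLR": 89.62,
    "NLU": 83.98, "Safety": 74.09, "Knowledge": 76.66,
}
overall = sum(competencies.values()) / len(competencies)
print(round(overall, 2))  # 74.75 — matches the reported ID score
```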
Indonesian Tasks
Average of 8 runs. 95% CIs are shown.
Model Size: ≤200B
Open instruct models only
Model | ID | Instruction Following | SEA-IFEval |
---|---|---|---|
Command A 03-2025 111B CohereLabs | 74.75 ± 0.66 | 92.26 ± 0.96 | 92.26 ± 4.12 |
Qwen 3 32B Alibaba | 72.81 ± 0.18 | 87.50 ± 1.02 | 87.50 ± 5.49 |
Qwen 3 30B MoE Alibaba | 72.36 ± 0.28 | 90.24 ± 0.77 | 90.24 ± 5.16 |
SEA-LION v3 (Llama) 70B AISG | 72.15 ± 0.48 | 92.62 ± 1.31 | 92.62 ± 3.94 |
SEA-LION v4 27B AISG | 71.89 ± 0.33 | 90.12 ± 1.05 | 90.12 ± 4.83 |
Gemma 3 27B Google | 71.52 ± 0.26 | 88.93 ± 1.11 | 88.93 ± 4.88 |
Qwen 2.5 72B Alibaba | 71.09 ± 0.44 | 90.12 ± 1.11 | 90.12 ± 4.72 |
Tulu 3 70B AI2 | 71.00 ± 0.19 | 88.10 ± 1.50 | 88.10 ± 4.90 |
Llama 3.3 70B Meta | 70.90 ± 0.16 | 93.45 ± 0.42 | 93.45 ± 4.54 |
Qwen 3 14B Alibaba | 70.55 ± 0.09 | 86.90 ± 1.16 | 86.90 ± 5.91 |
Gemma 3 12B Google | 70.17 ± 0.23 | 89.64 ± 1.39 | 89.64 ± 4.51 |
Llama 4 Scout 109B MoE Meta | 69.72 ± 0.24 | 92.02 ± 0.93 | 92.02 ± 4.48 |
Qwen 2.5 32B Alibaba | 69.47 ± 0.25 | 88.81 ± 1.35 | 88.81 ± 4.80 |
Llama 3.1 70B Meta | 68.33 ± 0.40 | 82.38 ± 2.00 | 82.38 ± 5.72 |
Aya Expanse 32B CohereLabs | 67.84 ± 0.33 | 82.62 ± 0.85 | 82.62 ± 6.26 |
SEA-LION v3 (Gemma 2) 9B AISG | 67.80 ± 0.35 | 89.29 ± 1.65 | 89.29 ± 4.40 |
Qwen 3 8B Alibaba | 67.80 ± 0.33 | 83.69 ± 0.74 | 83.69 ± 6.52 |
Gemma 2 27B Google | 67.16 ± 0.26 | 86.55 ± 1.71 | 86.55 ± 4.99 |
Mistral Large 2411 123B Mistral AI | 67.08 ± 1.36 | 88.93 ± 2.71 | 88.93 ± 4.31 |
Qwen 2.5 14B Alibaba | 66.67 ± 0.29 | 84.40 ± 1.41 | 84.40 ± 5.43 |
MERaLiON 2 10B A*STAR | 64.29 ± 0.27 | 81.67 ± 1.79 | 81.67 ± 5.65 |
Gemma 2 9B Google | 64.17 ± 0.46 | 82.26 ± 2.11 | 82.26 ± 5.05 |
SEA-LION v3 (Llama) 8B AISG | 63.97 ± 0.44 | 83.81 ± 1.83 | 83.81 ± 5.24 |
Mistral Small 3.1 2503 24B Mistral AI | 63.76 ± 0.55 | 75.48 ± 2.17 | 75.48 ± 5.67 |
Qwen 2.5 7B Alibaba | 63.06 ± 0.22 | 78.69 ± 1.49 | 78.69 ± 6.60 |
Olmo 2 0325 32B AI2 | 62.21 ± 0.39 | 81.67 ± 2.79 | 81.67 ± 5.21 |
Command R+ 08-2024 104B CohereLabs | 61.61 ± 0.90 | 78.45 ± 1.96 | 78.45 ± 5.58 |
ERNIE 4.5 21B MoE Baidu | 61.33 ± 1.19 | 82.02 ± 1.56 | 82.02 ± 5.99 |
Aya Expanse 8B CohereLabs | 61.16 ± 0.31 | 70.24 ± 1.99 | 70.24 ± 7.40 |
phi-4 14B Microsoft | 60.52 ± 1.30 | 73.81 ± 2.29 | 73.81 ± 6.15 |
Sailor2 20B SAIL | 60.28 ± 0.27 | 43.81 ± 2.82 | 43.81 ± 7.95 |
Llama 3.1 8B Meta | 59.92 ± 0.42 | 77.74 ± 2.39 | 77.74 ± 6.12 |
Llama 3 70B Meta | 59.62 ± 0.28 | 29.29 ± 1.31 | 29.29 ± 7.87 |
Command R 08-2024 32B CohereLabs | 59.42 ± 0.38 | 69.88 ± 2.28 | 69.88 ± 6.04 |
Sailor2 8B SAIL | 56.72 ± 0.64 | 45.36 ± 1.49 | 45.36 ± 8.75 |
Tulu 3 8B AI2 | 56.55 ± 0.38 | 84.17 ± 1.49 | 84.17 ± 5.75 |
Llama 3 8B Meta | 51.32 ± 0.52 | 32.14 ± 1.72 | 32.14 ± 8.01 |
Olmo 2 1124 13B AI2 | 50.37 ± 0.30 | 68.69 ± 1.34 | 68.69 ± 6.75 |
Babel 83B Alibaba-DAMO | 49.21 ± 3.64 | 46.07 ± 3.36 | 46.07 ± 6.13 |
Babel 9B Alibaba-DAMO | 48.08 ± 1.45 | 44.76 ± 3.85 | 44.76 ± 7.11 |
Command R7B 12-2024 7B CohereLabs | 47.11 ± 2.13 | 56.07 ± 3.18 | 56.07 ± 5.65 |
SeaLLMs V3 7B Alibaba-DAMO | 46.71 ± 1.61 | 52.74 ± 1.76 | 52.74 ± 7.71 |
Ministral 2410 8B Mistral AI | 42.77 ± 0.70 | 38.45 ± 3.19 | 38.45 ± 5.89 |
Olmo 2 1124 7B AI2 | 41.90 ± 1.18 | 59.88 ± 1.56 | 59.88 ± 7.16 |
Model | ID | Linguistic Diagnostics | Syntax | Pragmatics |
---|---|---|---|---|
Command A 03-2025 111B CohereLabs | 74.75 ± 0.66 | 77.35 ± 1.45 | 70.49 ± 4.18 | 84.21 ± 0.77 |
Qwen 3 32B Alibaba | 72.81 ± 0.18 | 79.22 ± 0.22 | 77.01 ± 4.07 | 81.43 ± 0.34 |
Qwen 3 30B MoE Alibaba | 72.36 ± 0.28 | 68.16 ± 0.31 | 64.01 ± 4.70 | 72.31 ± 0.43 |
SEA-LION v3 (Llama) 70B AISG | 72.15 ± 0.48 | 73.99 ± 2.48 | 69.70 ± 3.88 | 78.28 ± 5.01 |
SEA-LION v4 27B AISG | 71.89 ± 0.33 | 74.50 ± 0.51 | 67.17 ± 4.41 | 81.84 ± 1.03 |
Gemma 3 27B Google | 71.52 ± 0.26 | 75.38 ± 0.33 | 67.66 ± 4.51 | 83.10 ± 0.68 |
Qwen 2.5 72B Alibaba | 71.09 ± 0.44 | 78.09 ± 0.76 | 73.39 ± 4.28 | 82.80 ± 0.47 |
Tulu 3 70B AI2 | 71.00 ± 0.19 | 76.23 ± 0.28 | 70.86 ± 4.27 | 81.60 ± 0.64 |
Llama 3.3 70B Meta | 70.90 ± 0.16 | 77.31 ± 0.35 | 68.55 ± 4.55 | 86.07 ± 0.63 |
Qwen 3 14B Alibaba | 70.55 ± 0.09 | 72.43 ± 0.27 | 67.34 ± 4.60 | 77.53 ± 0.69 |
Gemma 3 12B Google | 70.17 ± 0.23 | 72.52 ± 0.18 | 68.29 ± 4.40 | 76.75 ± 0.37 |
Llama 4 Scout 109B MoE Meta | 69.72 ± 0.24 | 76.32 ± 0.07 | 68.78 ± 4.65 | 83.86 ± 0.00 |
Qwen 2.5 32B Alibaba | 69.47 ± 0.25 | 78.05 ± 0.17 | 73.75 ± 4.39 | 82.35 ± 0.26 |
Llama 3.1 70B Meta | 68.33 ± 0.40 | 74.84 ± 1.50 | 67.86 ± 4.11 | 81.81 ± 2.94 |
Aya Expanse 32B CohereLabs | 67.84 ± 0.33 | 73.36 ± 0.28 | 68.85 ± 4.56 | 77.87 ± 0.60 |
SEA-LION v3 (Gemma 2) 9B AISG | 67.80 ± 0.35 | 71.35 ± 0.48 | 62.53 ± 4.31 | 80.17 ± 0.70 |
Qwen 3 8B Alibaba | 67.80 ± 0.33 | 69.32 ± 0.79 | 61.28 ± 4.27 | 77.36 ± 0.46 |
Gemma 2 27B Google | 67.16 ± 0.26 | 72.42 ± 0.64 | 63.06 ± 4.37 | 81.78 ± 0.36 |
Mistral Large 2411 123B Mistral AI | 67.08 ± 1.36 | 68.75 ± 3.35 | 59.31 ± 3.53 | 78.18 ± 5.44 |
Qwen 2.5 14B Alibaba | 66.67 ± 0.29 | 72.40 ± 0.18 | 69.67 ± 4.60 | 75.12 ± 0.37 |
MERaLiON 2 10B A*STAR | 64.29 ± 0.27 | 69.43 ± 0.57 | 65.53 ± 4.24 | 73.34 ± 1.02 |
Gemma 2 9B Google | 64.17 ± 0.46 | 70.15 ± 0.79 | 62.66 ± 4.40 | 77.63 ± 1.62 |
SEA-LION v3 (Llama) 8B AISG | 63.97 ± 0.44 | 60.74 ± 1.38 | 55.79 ± 3.33 | 65.68 ± 2.65 |
Mistral Small 3.1 2503 24B Mistral AI | 63.76 ± 0.55 | 67.14 ± 1.91 | 57.37 ± 3.80 | 76.91 ± 3.38 |
Qwen 2.5 7B Alibaba | 63.06 ± 0.22 | 63.75 ± 0.85 | 56.74 ± 4.78 | 70.76 ± 0.39 |
Olmo 2 0325 32B AI2 | 62.21 ± 0.39 | 66.01 ± 1.83 | 55.76 ± 2.34 | 76.26 ± 1.67 |
Command R+ 08-2024 104B CohereLabs | 61.61 ± 0.90 | 64.91 ± 3.78 | 61.15 ± 2.95 | 68.66 ± 4.52 |
ERNIE 4.5 21B MoE Baidu | 61.33 ± 1.19 | 63.49 ± 1.28 | 57.83 ± 4.03 | 69.14 ± 1.03 |
Aya Expanse 8B CohereLabs | 61.16 ± 0.31 | 60.91 ± 0.22 | 57.80 ± 4.77 | 64.02 ± 0.38 |
phi-4 14B Microsoft | 60.52 ± 1.30 | 69.12 ± 1.63 | 62.60 ± 4.31 | 75.65 ± 1.44 |
Sailor2 20B SAIL | 60.28 ± 0.27 | 67.47 ± 0.23 | 65.66 ± 4.67 | 69.29 ± 0.39 |
Llama 3.1 8B Meta | 59.92 ± 0.42 | 60.69 ± 1.28 | 55.99 ± 3.07 | 65.40 ± 2.15 |
Llama 3 70B Meta | 59.62 ± 0.28 | 70.39 ± 0.56 | 62.40 ± 4.75 | 78.37 ± 1.08 |
Command R 08-2024 32B CohereLabs | 59.42 ± 0.38 | 67.10 ± 1.75 | 60.86 ± 3.66 | 73.35 ± 2.23 |
Sailor2 8B SAIL | 56.72 ± 0.64 | 48.35 ± 3.56 | 68.19 ± 4.02 | 28.50 ± 6.92 |
Tulu 3 8B AI2 | 56.55 ± 0.38 | 36.17 ± 3.08 | 53.13 ± 3.05 | 19.21 ± 5.49 |
Llama 3 8B Meta | 51.32 ± 0.52 | 61.41 ± 1.33 | 56.74 ± 3.69 | 66.08 ± 1.91 |
Olmo 2 1124 13B AI2 | 50.37 ± 0.30 | 56.99 ± 2.00 | 52.93 ± 2.59 | 61.05 ± 1.23 |
Babel 83B Alibaba-DAMO | 49.21 ± 3.64 | 59.72 ± 3.72 | 58.85 ± 2.48 | 60.58 ± 7.99 |
Babel 9B Alibaba-DAMO | 48.08 ± 1.45 | 54.86 ± 2.70 | 51.61 ± 4.36 | 58.11 ± 6.01 |
Command R7B 12-2024 7B CohereLabs | 47.11 ± 2.13 | 58.93 ± 1.73 | 56.71 ± 3.56 | 61.14 ± 3.40 |
SeaLLMs V3 7B Alibaba-DAMO | 46.71 ± 1.61 | 51.03 ± 2.69 | 51.88 ± 3.62 | 50.18 ± 4.62 |
Ministral 2410 8B Mistral AI | 42.77 ± 0.70 | 54.41 ± 0.92 | 49.67 ± 2.65 | 59.16 ± 1.64 |
Olmo 2 1124 7B AI2 | 41.90 ± 1.18 | 52.91 ± 2.55 | 50.69 ± 3.83 | 55.13 ± 4.94 |
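The same aggregation pattern appears to hold one level down: a competency score matches the unweighted mean of its subtask scores, for example Linguistic Diagnostics from Syntax and Pragmatics in the table above (again an observed pattern in the data, not stated methodology).

```python
# Command A 03-2025 subtask scores from the Linguistic Diagnostics table.
syntax, pragmatics = 70.49, 84.21
print(round((syntax + pragmatics) / 2, 2))  # 77.35 — the reported LD score
```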
Model | ID | Multi-Turn Chat | SEA-MT-Bench |
---|---|---|---|
Command A 03-2025 111B CohereLabs | 74.75 ± 0.66 | 49.08 ± 1.62 | 49.08 ± 4.95 |
Qwen 3 32B Alibaba | 72.81 ± 0.18 | 45.26 ± 0.78 | 45.26 ± 5.33 |
Qwen 3 30B MoE Alibaba | 72.36 ± 0.28 | 52.59 ± 1.98 | 52.59 ± 5.39 |
SEA-LION v3 (Llama) 70B AISG | 72.15 ± 0.48 | 29.74 ± 1.67 | 29.74 ± 4.94 |
SEA-LION v4 27B AISG | 71.89 ± 0.33 | 46.50 ± 1.15 | 46.50 ± 5.58 |
Gemma 3 27B Google | 71.52 ± 0.26 | 45.15 ± 0.95 | 45.15 ± 5.53 |
Qwen 2.5 72B Alibaba | 71.09 ± 0.44 | 32.65 ± 1.41 | 32.65 ± 5.16 |
Tulu 3 70B AI2 | 71.00 ± 0.19 | 30.17 ± 1.16 | 30.17 ± 4.70 |
Llama 3.3 70B Meta | 70.90 ± 0.16 | 16.06 ± 0.97 | 16.06 ± 3.86 |
Qwen 3 14B Alibaba | 70.55 ± 0.09 | 35.40 ± 1.08 | 35.40 ± 5.60 |
Gemma 3 12B Google | 70.17 ± 0.23 | 38.69 ± 1.83 | 38.69 ± 5.71 |
Llama 4 Scout 109B MoE Meta | 69.72 ± 0.24 | 21.34 ± 1.45 | 21.34 ± 4.57 |
Qwen 2.5 32B Alibaba | 69.47 ± 0.25 | 20.04 ± 1.15 | 20.04 ± 4.57 |
Llama 3.1 70B Meta | 68.33 ± 0.40 | 13.69 ± 0.59 | 13.69 ± 3.92 |
Aya Expanse 32B CohereLabs | 67.84 ± 0.33 | 26.72 ± 1.60 | 26.72 ± 4.71 |
SEA-LION v3 (Gemma 2) 9B AISG | 67.80 ± 0.35 | 27.05 ± 1.14 | 27.05 ± 4.70 |
Qwen 3 8B Alibaba | 67.80 ± 0.33 | 32.38 ± 1.05 | 32.38 ± 5.59 |
Gemma 2 27B Google | 67.16 ± 0.26 | 19.13 ± 1.03 | 19.13 ± 3.75 |
Mistral Large 2411 123B Mistral AI | 67.08 ± 1.36 | 21.98 ± 1.35 | 21.98 ± 4.49 |
Qwen 2.5 14B Alibaba | 66.67 ± 0.29 | 18.16 ± 1.05 | 18.16 ± 4.35 |
MERaLiON 2 10B A*STAR | 64.29 ± 0.27 | 13.09 ± 0.81 | 13.09 ± 3.54 |
Gemma 2 9B Google | 64.17 ± 0.46 | 14.87 ± 1.70 | 14.87 ± 3.53 |
SEA-LION v3 (Llama) 8B AISG | 63.97 ± 0.44 | 23.06 ± 1.03 | 23.06 ± 4.68 |
Mistral Small 3.1 2503 24B Mistral AI | 63.76 ± 0.55 | 10.56 ± 1.07 | 10.56 ± 3.04 |
Qwen 2.5 7B Alibaba | 63.06 ± 0.22 | 15.84 ± 0.84 | 15.84 ± 3.93 |
Olmo 2 0325 32B AI2 | 62.21 ± 0.39 | 9.21 ± 0.68 | 9.21 ± 2.88 |
Command R+ 08-2024 104B CohereLabs | 61.61 ± 0.90 | 10.40 ± 0.82 | 10.40 ± 3.18 |
ERNIE 4.5 21B MoE Baidu | 61.33 ± 1.19 | 18.70 ± 0.45 | 18.70 ± 4.28 |
Aya Expanse 8B CohereLabs | 61.16 ± 0.31 | 21.07 ± 1.77 | 21.07 ± 4.55 |
phi-4 14B Microsoft | 60.52 ± 1.30 | 18.37 ± 1.01 | 18.37 ± 4.05 |
Sailor2 20B SAIL | 60.28 ± 0.27 | 26.35 ± 1.12 | 26.35 ± 4.06 |
Llama 3.1 8B Meta | 59.92 ± 0.42 | 9.48 ± 0.53 | 9.48 ± 3.31 |
Llama 3 70B Meta | 59.62 ± 0.28 | 10.94 ± 0.76 | 10.94 ± 3.43 |
Command R 08-2024 32B CohereLabs | 59.42 ± 0.38 | 6.03 ± 0.55 | 6.03 ± 2.42 |
Sailor2 8B SAIL | 56.72 ± 0.64 | 23.92 ± 1.40 | 23.92 ± 4.53 |
Tulu 3 8B AI2 | 56.55 ± 0.38 | 12.66 ± 1.28 | 12.66 ± 3.44 |
Llama 3 8B Meta | 51.32 ± 0.52 | 5.77 ± 0.71 | 5.77 ± 2.36 |
Olmo 2 1124 13B AI2 | 50.37 ± 0.30 | 5.93 ± 0.61 | 5.93 ± 2.67 |
Babel 83B Alibaba-DAMO | 49.21 ± 3.64 | 9.75 ± 1.29 | 9.75 ± 2.43 |
Babel 9B Alibaba-DAMO | 48.08 ± 1.45 | 4.36 ± 0.59 | 4.36 ± 1.60 |
Command R7B 12-2024 7B CohereLabs | 47.11 ± 2.13 | 1.78 ± 0.46 | 1.78 ± 0.99 |
SeaLLMs V3 7B Alibaba-DAMO | 46.71 ± 1.61 | 11.42 ± 1.01 | 11.42 ± 3.10 |
Ministral 2410 8B Mistral AI | 42.77 ± 0.70 | 3.77 ± 0.59 | 3.77 ± 1.91 |
Olmo 2 1124 7B AI2 | 41.90 ± 1.18 | 2.69 ± 0.74 | 2.69 ± 1.44 |
Model | ID | NLG | Summarization | Translations |
---|---|---|---|---|
Command A 03-2025 111B CohereLabs | 74.75 ± 0.66 | 54.98 ± 0.10 | 17.45 ± 1.21 | 92.50 ± 0.15 |
Qwen 3 32B Alibaba | 72.81 ± 0.18 | 54.64 ± 0.10 | 16.90 ± 1.15 | 92.37 ± 0.04 |
Qwen 3 30B MoE Alibaba | 72.36 ± 0.28 | 53.79 ± 0.06 | 15.15 ± 0.96 | 92.43 ± 0.09 |
SEA-LION v3 (Llama) 70B AISG | 72.15 ± 0.48 | 55.99 ± 0.15 | 19.08 ± 1.38 | 92.90 ± 0.05 |
SEA-LION v4 27B AISG | 71.89 ± 0.33 | 54.87 ± 0.13 | 16.23 ± 1.13 | 93.51 ± 0.03 |
Gemma 3 27B Google | 71.52 ± 0.26 | 55.03 ± 0.12 | 16.47 ± 1.15 | 93.59 ± 0.02 |
Qwen 2.5 72B Alibaba | 71.09 ± 0.44 | 54.88 ± 0.08 | 17.28 ± 1.30 | 92.48 ± 0.02 |
Tulu 3 70B AI2 | 71.00 ± 0.19 | 53.87 ± 0.07 | 14.94 ± 0.87 | 92.80 ± 0.03 |
Llama 3.3 70B Meta | 70.90 ± 0.16 | 55.29 ± 0.16 | 18.75 ± 1.54 | 91.83 ± 0.03 |
Qwen 3 14B Alibaba | 70.55 ± 0.09 | 54.24 ± 0.10 | 17.48 ± 1.36 | 91.01 ± 0.13 |
Gemma 3 12B Google | 70.17 ± 0.23 | 54.76 ± 0.21 | 16.51 ± 1.10 | 93.01 ± 0.21 |
Llama 4 Scout 109B MoE Meta | 69.72 ± 0.24 | 55.44 ± 0.07 | 18.40 ± 1.40 | 92.47 ± 0.02 |
Qwen 2.5 32B Alibaba | 69.47 ± 0.25 | 53.60 ± 0.13 | 16.20 ± 1.13 | 91.00 ± 0.04 |
Llama 3.1 70B Meta | 68.33 ± 0.40 | 57.15 ± 0.17 | 22.39 ± 1.93 | 91.90 ± 0.06 |
Aya Expanse 32B CohereLabs | 67.84 ± 0.33 | 55.41 ± 0.11 | 18.21 ± 1.21 | 92.61 ± 0.04 |
SEA-LION v3 (Gemma 2) 9B AISG | 67.80 ± 0.35 | 55.31 ± 0.13 | 18.01 ± 1.17 | 92.61 ± 0.06 |
Qwen 3 8B Alibaba | 67.80 ± 0.33 | 54.38 ± 0.06 | 17.67 ± 1.37 | 91.08 ± 0.06 |
Gemma 2 27B Google | 67.16 ± 0.26 | 56.14 ± 0.20 | 19.76 ± 1.43 | 92.51 ± 0.11 |
Mistral Large 2411 123B Mistral AI | 67.08 ± 1.36 | 54.45 ± 0.10 | 17.33 ± 1.22 | 91.56 ± 0.17 |
Qwen 2.5 14B Alibaba | 66.67 ± 0.29 | 53.12 ± 0.14 | 15.74 ± 1.05 | 90.50 ± 0.05 |
MERaLiON 2 10B A*STAR | 64.29 ± 0.27 | 55.73 ± 0.25 | 19.59 ± 1.42 | 91.88 ± 0.09 |
Gemma 2 9B Google | 64.17 ± 0.46 | 55.65 ± 0.12 | 19.49 ± 1.38 | 91.80 ± 0.06 |
SEA-LION v3 (Llama) 8B AISG | 63.97 ± 0.44 | 54.49 ± 0.09 | 17.18 ± 1.23 | 91.81 ± 0.03 |
Mistral Small 3.1 2503 24B Mistral AI | 63.76 ± 0.55 | 51.42 ± 0.23 | 16.40 ± 1.11 | 86.44 ± 0.51 |
Qwen 2.5 7B Alibaba | 63.06 ± 0.22 | 52.43 ± 0.06 | 17.08 ± 1.40 | 87.79 ± 0.10 |
Olmo 2 0325 32B AI2 | 62.21 ± 0.39 | 50.15 ± 0.18 | 15.28 ± 0.93 | 85.02 ± 0.27 |
Command R+ 08-2024 104B CohereLabs | 61.61 ± 0.90 | 54.32 ± 0.19 | 17.66 ± 1.14 | 90.98 ± 0.18 |
ERNIE 4.5 21B MoE Baidu | 61.33 ± 1.19 | 51.18 ± 0.07 | 12.17 ± 0.77 | 90.18 ± 0.07 |
Aya Expanse 8B CohereLabs | 61.16 ± 0.31 | 53.11 ± 0.12 | 15.71 ± 0.97 | 90.51 ± 0.11 |
phi-4 14B Microsoft | 60.52 ± 1.30 | 45.19 ± 0.52 | 11.77 ± 0.66 | 78.62 ± 1.03 |
Sailor2 20B SAIL | 60.28 ± 0.27 | 50.03 ± 0.27 | 14.59 ± 0.92 | 85.48 ± 0.58 |
Llama 3.1 8B Meta | 59.92 ± 0.42 | 54.22 ± 0.11 | 18.34 ± 1.33 | 90.11 ± 0.09 |
Llama 3 70B Meta | 59.62 ± 0.28 | 55.71 ± 0.11 | 19.78 ± 1.72 | 91.64 ± 0.02 |
Command R 08-2024 32B CohereLabs | 59.42 ± 0.38 | 53.39 ± 0.15 | 16.16 ± 0.92 | 90.62 ± 0.20 |
Sailor2 8B SAIL | 56.72 ± 0.64 | 50.50 ± 0.28 | 15.58 ± 1.10 | 85.41 ± 0.40 |
Tulu 3 8B AI2 | 56.55 ± 0.38 | 52.19 ± 0.08 | 14.24 ± 1.01 | 90.14 ± 0.08 |
Llama 3 8B Meta | 51.32 ± 0.52 | 53.23 ± 0.12 | 17.75 ± 1.38 | 88.71 ± 0.10 |
Olmo 2 1124 13B AI2 | 50.37 ± 0.30 | 44.38 ± 0.81 | 10.57 ± 0.58 | 78.19 ± 1.53 |
Babel 83B Alibaba-DAMO | 49.21 ± 3.64 | 45.14 ± 1.47 | 15.62 ± 0.97 | 74.65 ± 2.97 |
Babel 9B Alibaba-DAMO | 48.08 ± 1.45 | 46.46 ± 3.13 | 15.64 ± 1.21 | 77.28 ± 6.34 |
Command R7B 12-2024 7B CohereLabs | 47.11 ± 2.13 | 33.95 ± 1.07 | 11.51 ± 1.10 | 56.39 ± 2.03 |
SeaLLMs V3 7B Alibaba-DAMO | 46.71 ± 1.61 | 45.72 ± 0.92 | 4.32 ± 0.95 | 87.13 ± 0.66 |
Ministral 2410 8B Mistral AI | 42.77 ± 0.70 | 37.91 ± 1.77 | 11.22 ± 0.83 | 64.59 ± 3.38 |
Olmo 2 1124 7B AI2 | 41.90 ± 1.18 | 36.83 ± 0.54 | 9.79 ± 0.70 | 63.87 ± 1.18 |
Model | ID | NLR | Causal Reasoning | Natural Language Inference |
---|---|---|---|---|
Command A 03-2025 111B CohereLabs | 74.75 ± 0.66 | 89.62 ± 2.47 | 97.22 ± 1.31 | 82.01 ± 1.89 |
Qwen 3 32B Alibaba | 72.81 ± 0.18 | 90.45 ± 0.27 | 95.10 ± 1.85 | 85.80 ± 2.07 |
Qwen 3 30B MoE Alibaba | 72.36 ± 0.28 | 89.79 ± 0.25 | 96.08 ± 1.68 | 83.50 ± 2.20 |
SEA-LION v3 (Llama) 70B AISG | 72.15 ± 0.48 | 91.13 ± 0.20 | 95.93 ± 1.47 | 86.33 ± 1.84 |
SEA-LION v4 27B AISG | 71.89 ± 0.33 | 89.54 ± 0.21 | 95.05 ± 1.81 | 84.04 ± 2.21 |
Gemma 3 27B Google | 71.52 ± 0.26 | 89.27 ± 0.16 | 94.88 ± 1.90 | 83.66 ± 2.26 |
Qwen 2.5 72B Alibaba | 71.09 ± 0.44 | 88.08 ± 0.73 | 97.22 ± 1.42 | 78.92 ± 2.40 |
Tulu 3 70B AI2 | 71.00 ± 0.19 | 90.90 ± 0.15 | 95.83 ± 1.56 | 85.97 ± 2.00 |
Llama 3.3 70B Meta | 70.90 ± 0.16 | 91.41 ± 0.16 | 96.65 ± 1.54 | 86.16 ± 2.09 |
Qwen 3 14B Alibaba | 70.55 ± 0.09 | 88.28 ± 0.19 | 92.55 ± 2.27 | 84.01 ± 2.23 |
Gemma 3 12B Google | 70.17 ± 0.23 | 89.18 ± 0.15 | 94.95 ± 1.90 | 83.41 ± 2.22 |
Llama 4 Scout 109B MoE Meta | 69.72 ± 0.24 | 83.77 ± 0.12 | 95.63 ± 1.79 | 71.92 ± 2.77 |
Qwen 2.5 32B Alibaba | 69.47 ± 0.25 | 86.20 ± 0.45 | 96.67 ± 1.56 | 75.72 ± 2.60 |
Llama 3.1 70B Meta | 68.33 ± 0.40 | 89.11 ± 0.81 | 95.73 ± 1.62 | 82.50 ± 2.10 |
Aya Expanse 32B CohereLabs | 67.84 ± 0.33 | 86.96 ± 0.58 | 94.77 ± 1.88 | 79.14 ± 2.37 |
SEA-LION v3 (Gemma 2) 9B AISG | 67.80 ± 0.35 | 88.01 ± 0.56 | 95.50 ± 1.67 | 80.51 ± 2.31 |
Qwen 3 8B Alibaba | 67.80 ± 0.33 | 83.96 ± 0.26 | 87.65 ± 2.64 | 80.26 ± 2.33 |
Gemma 2 27B Google | 67.16 ± 0.26 | 88.08 ± 0.89 | 94.55 ± 1.90 | 81.60 ± 2.23 |
Mistral Large 2411 123B Mistral AI | 67.08 ± 1.36 | 88.91 ± 1.98 | 95.45 ± 1.57 | 82.38 ± 1.94 |
Qwen 2.5 14B Alibaba | 66.67 ± 0.29 | 86.25 ± 0.46 | 95.40 ± 1.80 | 77.10 ± 2.53 |
MERaLiON 2 10B A*STAR | 64.29 ± 0.27 | 82.21 ± 1.07 | 93.53 ± 1.96 | 70.90 ± 2.63 |
Gemma 2 9B Google | 64.17 ± 0.46 | 84.36 ± 0.57 | 94.47 ± 1.79 | 74.25 ± 2.60 |
SEA-LION v3 (Llama) 8B AISG | 63.97 ± 0.44 | 83.80 ± 1.07 | 90.35 ± 2.16 | 77.25 ± 2.00 |
Mistral Small 3.1 2503 24B Mistral AI | 63.76 ± 0.55 | 88.04 ± 0.58 | 93.75 ± 1.82 | 82.34 ± 2.19 |
Qwen 2.5 7B Alibaba | 63.06 ± 0.22 | 85.26 ± 0.17 | 89.63 ± 2.64 | 80.89 ± 2.37 |
Olmo 2 0325 32B AI2 | 62.21 ± 0.39 | 84.41 ± 0.93 | 88.22 ± 2.07 | 80.59 ± 2.16 |
Command R+ 08-2024 104B CohereLabs | 61.61 ± 0.90 | 82.08 ± 3.77 | 90.60 ± 1.98 | 73.56 ± 1.82 |
ERNIE 4.5 21B MoE Baidu | 61.33 ± 1.19 | 79.35 ± 1.78 | 88.15 ± 2.58 | 70.55 ± 2.52 |
Aya Expanse 8B CohereLabs | 61.16 ± 0.31 | 82.79 ± 0.23 | 90.15 ± 2.57 | 75.44 ± 2.57 |
phi-4 14B Microsoft | 60.52 ± 1.30 | 83.96 ± 2.53 | 91.55 ± 2.06 | 76.36 ± 2.25 |
Sailor2 20B SAIL | 60.28 ± 0.27 | 86.47 ± 0.10 | 94.73 ± 1.94 | 78.22 ± 2.50 |
Llama 3.1 8B Meta | 59.92 ± 0.42 | 78.98 ± 0.99 | 87.52 ± 2.48 | 70.44 ± 2.37 |
Llama 3 70B Meta | 59.62 ± 0.28 | 87.82 ± 0.17 | 95.03 ± 1.85 | 80.61 ± 2.41 |
Command R 08-2024 32B CohereLabs | 59.42 ± 0.38 | 82.14 ± 4.66 | 92.35 ± 1.64 | 71.92 ± 1.77 |
Sailor2 8B SAIL | 56.72 ± 0.64 | 84.59 ± 0.32 | 94.73 ± 1.86 | 74.45 ± 2.40 |
Tulu 3 8B AI2 | 56.55 ± 0.38 | 76.52 ± 1.41 | 82.38 ± 2.78 | 70.66 ± 2.07 |
Llama 3 8B Meta | 51.32 ± 0.52 | 75.89 ± 1.77 | 81.65 ± 3.01 | 70.14 ± 2.47 |
Olmo 2 1124 13B AI2 | 50.37 ± 0.30 | 64.74 ± 4.33 | 75.78 ± 2.92 | 53.71 ± 2.43 |
Babel 83B Alibaba-DAMO | 49.21 ± 3.64 | 71.92 ± 5.39 | 88.45 ± 1.61 | 55.39 ± 1.67 |
Babel 9B Alibaba-DAMO | 48.08 ± 1.45 | 63.24 ± 5.48 | 83.00 ± 2.59 | 43.48 ± 1.55 |
Command R7B 12-2024 7B CohereLabs | 47.11 ± 2.13 | 63.23 ± 8.53 | 69.58 ± 2.49 | 56.88 ± 2.06 |
SeaLLMs V3 7B Alibaba-DAMO | 46.71 ± 1.61 | 61.32 ± 4.17 | 81.90 ± 2.35 | 40.74 ± 0.84 |
Ministral 2410 8B Mistral AI | 42.77 ± 0.70 | 60.94 ± 6.66 | 65.63 ± 2.46 | 56.25 ± 1.50 |
Olmo 2 1124 7B AI2 | 41.90 ± 1.18 | 53.06 ± 5.70 | 63.85 ± 2.88 | 42.27 ± 1.22 |
Model | ID | NLU | Metaphor Understanding | Question Answering | Sentiment Analysis |
---|---|---|---|---|---|
Command A 03-2025 111B CohereLabs | 74.75 ± 0.66 | 83.98 ± 0.30 | 85.81 ± 3.84 | 78.02 ± 5.69 | 88.13 ± 2.98 |
Qwen 3 32B Alibaba | 72.81 ± 0.18 | 85.30 ± 0.23 | 89.24 ± 3.41 | 78.36 ± 6.25 | 88.31 ± 3.03 |
Qwen 3 30B MoE Alibaba | 72.36 ± 0.28 | 81.59 ± 0.16 | 84.87 ± 4.08 | 76.84 ± 6.68 | 83.06 ± 3.57 |
SEA-LION v3 (Llama) 70B AISG | 72.15 ± 0.48 | 83.80 ± 0.53 | 86.36 ± 3.60 | 78.60 ± 6.22 | 86.44 ± 3.21 |
SEA-LION v4 27B AISG | 71.89 ± 0.33 | 83.98 ± 0.14 | 84.11 ± 4.11 | 81.35 ± 6.21 | 86.47 ± 3.35 |
Gemma 3 27B Google | 71.52 ± 0.26 | 84.05 ± 0.14 | 84.15 ± 4.14 | 81.63 ± 6.14 | 86.38 ± 3.35 |
Qwen 2.5 72B Alibaba | 71.09 ± 0.44 | 83.67 ± 0.18 | 85.97 ± 3.91 | 77.10 ± 6.40 | 87.94 ± 3.11 |
Tulu 3 70B AI2 | 71.00 ± 0.19 | 83.25 ± 0.32 | 84.19 ± 4.05 | 75.63 ± 5.42 | 89.94 ± 2.66 |
Llama 3.3 70B Meta | 70.90 ± 0.16 | 84.27 ± 0.11 | 87.88 ± 3.69 | 78.98 ± 6.65 | 85.94 ± 3.38 |
Qwen 3 14B Alibaba | 70.55 ± 0.09 | 83.02 ± 0.22 | 84.07 ± 4.10 | 76.67 ± 6.16 | 88.31 ± 3.10 |
Gemma 3 12B Google | 70.17 ± 0.23 | 84.12 ± 0.13 | 84.45 ± 4.11 | 81.30 ± 6.21 | 86.63 ± 3.31 |
Llama 4 Scout 109B MoE Meta | 69.72 ± 0.24 | 84.76 ± 0.10 | 86.23 ± 3.92 | 82.27 ± 6.02 | 85.78 ± 3.40 |
Qwen 2.5 32B Alibaba | 69.47 ± 0.25 | 85.10 ± 0.08 | 87.75 ± 3.71 | 80.92 ± 5.98 | 86.63 ± 3.27 |
Llama 3.1 70B Meta | 68.33 ± 0.40 | 82.74 ± 0.44 | 86.44 ± 3.75 | 76.04 ± 5.98 | 85.75 ± 3.27 |
Aya Expanse 32B CohereLabs | 67.84 ± 0.33 | 83.97 ± 0.20 | 86.02 ± 3.89 | 77.35 ± 5.79 | 88.53 ± 3.02 |
SEA-LION v3 (Gemma 2) 9B AISG | 67.80 ± 0.35 | 83.78 ± 0.59 | 86.53 ± 3.69 | 79.40 ± 6.00 | 85.41 ± 3.29 |
Qwen 3 8B Alibaba | 67.80 ± 0.33 | 82.22 ± 0.23 | 81.06 ± 4.37 | 78.75 ± 6.84 | 86.84 ± 3.21 |
Gemma 2 27B Google | 67.16 ± 0.26 | 83.86 ± 0.30 | 86.57 ± 3.80 | 78.59 ± 6.31 | 86.44 ± 3.26 |
Mistral Large 2411 123B Mistral AI | 67.08 ± 1.36 | 83.46 ± 0.33 | 83.69 ± 3.87 | 78.65 ± 5.57 | 88.03 ± 2.97 |
Qwen 2.5 14B Alibaba | 66.67 ± 0.29 | 83.05 ± 0.28 | 84.24 ± 4.15 | 75.75 ± 6.37 | 89.16 ± 2.98 |
MERaLiON 2 10B A*STAR | 64.29 ± 0.27 | 83.57 ± 0.35 | 86.61 ± 3.66 | 78.34 ± 5.98 | 85.75 ± 3.28 |
Gemma 2 9B Google | 64.17 ± 0.46 | 83.12 ± 0.59 | 86.74 ± 3.67 | 79.51 ± 6.41 | 83.13 ± 3.48 |
SEA-LION v3 (Llama) 8B AISG | 63.97 ± 0.44 | 78.69 ± 0.77 | 80.34 ± 3.86 | 72.22 ± 6.45 | 83.50 ± 3.41 |
Mistral Small 3.1 2503 24B Mistral AI | 63.76 ± 0.55 | 81.40 ± 0.54 | 82.54 ± 4.13 | 75.26 ± 6.07 | 86.41 ± 3.00 |
Qwen 2.5 7B Alibaba | 63.06 ± 0.22 | 80.52 ± 0.27 | 81.23 ± 4.44 | 74.10 ± 6.61 | 86.22 ± 3.32 |
Olmo 2 0325 32B AI2 | 62.21 ± 0.39 | 77.02 ± 1.40 | 76.48 ± 4.26 | 69.72 ± 5.91 | 84.84 ± 3.06 |
Command R+ 08-2024 104B CohereLabs | 61.61 ± 0.90 | 81.97 ± 0.84 | 80.76 ± 3.98 | 78.83 ± 5.54 | 86.31 ± 3.01 |
ERNIE 4.5 21B MoE Baidu | 61.33 ± 1.19 | 80.86 ± 1.03 | 79.66 ± 4.17 | 74.78 ± 6.27 | 88.13 ± 3.06 |
Aya Expanse 8B CohereLabs | 61.16 ± 0.31 | 80.78 ± 0.20 | 79.87 ± 4.52 | 77.34 ± 6.03 | 85.13 ± 3.42 |
phi-4 14B Microsoft | 60.52 ± 1.30 | 75.86 ± 0.32 | 81.44 ± 4.24 | 58.58 ± 6.28 | 87.56 ± 3.05 |
Sailor2 20B SAIL | 60.28 ± 0.27 | 79.45 ± 0.13 | 87.54 ± 3.74 | 62.10 ± 5.56 | 88.72 ± 3.07 |
Llama 3.1 8B Meta | 59.92 ± 0.42 | 79.60 ± 0.26 | 80.08 ± 4.15 | 77.08 ± 6.53 | 81.63 ± 3.62 |
Llama 3 70B Meta | 59.62 ± 0.28 | 81.73 ± 0.28 | 83.22 ± 4.22 | 77.27 ± 6.86 | 84.69 ± 3.47 |
Command R 08-2024 32B CohereLabs | 59.42 ± 0.38 | 80.22 ± 0.61 | 83.09 ± 3.92 | 78.24 ± 5.92 | 79.34 ± 3.62 |
Sailor2 8B SAIL | 56.72 ± 0.64 | 75.49 ± 0.41 | 83.60 ± 4.15 | 60.10 ± 5.56 | 82.78 ± 3.58 |
Tulu 3 8B AI2 | 56.55 ± 0.38 | 71.35 ± 0.57 | 75.21 ± 4.45 | 56.12 ± 5.87 | 82.72 ± 3.42 |
Llama 3 8B Meta | 51.32 ± 0.52 | 78.24 ± 0.43 | 76.99 ± 4.37 | 76.36 ± 6.81 | 81.38 ± 3.63 |
Olmo 2 1124 13B AI2 | 50.37 ± 0.30 | 71.86 ± 0.83 | 73.26 ± 4.35 | 62.56 ± 5.96 | 79.75 ± 3.31 |
Babel 83B Alibaba-DAMO | 49.21 ± 3.64 | 64.52 ± 7.21 | 71.74 ± 3.41 | 44.92 ± 5.01 | 76.91 ± 2.89 |
Babel 9B Alibaba-DAMO | 48.08 ± 1.45 | 71.53 ± 5.35 | 71.44 ± 3.98 | 71.21 ± 5.95 | 71.94 ± 3.01 |
Command R7B 12-2024 7B CohereLabs | 47.11 ± 2.13 | 71.48 ± 3.47 | 65.76 ± 3.49 | 70.39 ± 6.30 | 78.28 ± 3.75 |
SeaLLMs V3 7B Alibaba-DAMO | 46.71 ± 1.61 | 65.78 ± 4.65 | 72.42 ± 4.02 | 69.82 ± 5.72 | 55.09 ± 3.31 |
Ministral 2410 8B Mistral AI | 42.77 ± 0.70 | 68.76 ± 2.19 | 62.84 ± 4.05 | 68.23 ± 6.12 | 75.22 ± 2.89 |
Olmo 2 1124 7B AI2 | 41.90 ± 1.18 | 64.69 ± 4.19 | 61.95 ± 3.70 | 64.91 ± 6.30 | 67.22 ± 3.03 |
Model | ID | Safety | Toxicity Detection |
---|---|---|---|
Command A 03-2025 111B CohereLabs | 74.75 ± 0.66 | 74.09 ± 0.92 | 74.09 ± 2.31 |
Qwen 3 32B Alibaba | 72.81 ± 0.18 | 65.15 ± 1.72 | 65.15 ± 2.81 |
Qwen 3 30B MoE Alibaba | 72.36 ± 0.28 | 67.28 ± 0.19 | 67.27 ± 2.87 |
SEA-LION v3 (Llama) 70B AISG | 72.15 ± 0.48 | 69.33 ± 1.25 | 69.33 ± 2.59 |
SEA-LION v4 27B AISG | 71.89 ± 0.33 | 61.91 ± 1.53 | 61.91 ± 2.90 |
Gemma 3 27B Google | 71.52 ± 0.26 | 60.55 ± 0.84 | 60.55 ± 2.97 |
Qwen 2.5 72B Alibaba | 71.09 ± 0.44 | 61.35 ± 1.35 | 61.35 ± 2.92 |
Tulu 3 70B AI2 | 71.00 ± 0.19 | 72.08 ± 1.04 | 72.08 ± 2.50 |
Llama 3.3 70B Meta | 70.90 ± 0.16 | 70.09 ± 0.22 | 70.09 ± 2.78 |
Qwen 3 14B Alibaba | 70.55 ± 0.09 | 75.21 ± 0.19 | 75.21 ± 2.58 |
Gemma 3 12B Google | 70.17 ± 0.23 | 61.10 ± 0.83 | 61.10 ± 2.94 |
Llama 4 Scout 109B MoE Meta | 69.72 ± 0.24 | 68.71 ± 0.10 | 68.71 ± 2.86 |
Qwen 2.5 32B Alibaba | 69.47 ± 0.25 | 69.24 ± 0.59 | 69.24 ± 2.79 |
Llama 3.1 70B Meta | 68.33 ± 0.40 | 68.97 ± 1.47 | 68.97 ± 2.68 |
Aya Expanse 32B CohereLabs | 67.84 ± 0.33 | 68.92 ± 0.59 | 68.92 ± 2.79 |
SEA-LION v3 (Gemma 2) 9B AISG | 67.80 ± 0.35 | 59.19 ± 1.14 | 59.19 ± 2.85 |
Qwen 3 8B Alibaba | 67.80 ± 0.33 | 67.24 ± 1.49 | 67.24 ± 2.72 |
Gemma 2 27B Google | 67.16 ± 0.26 | 60.23 ± 0.99 | 60.22 ± 2.88 |
Mistral Large 2411 123B Mistral AI | 67.08 ± 1.36 | 58.64 ± 9.36 | 58.64 ± 2.27 |
Qwen 2.5 14B Alibaba | 66.67 ± 0.29 | 66.56 ± 0.72 | 66.56 ± 2.82 |
MERaLiON 2 10B A*STAR | 64.29 ± 0.27 | 63.31 ± 0.96 | 63.31 ± 2.83 |
Gemma 2 9B Google | 64.17 ± 0.46 | 55.84 ± 0.78 | 55.84 ± 2.94 |
SEA-LION v3 (Llama) 8B AISG | 63.97 ± 0.44 | 64.42 ± 2.65 | 64.42 ± 2.60 |
Mistral Small 3.1 2503 24B Mistral AI | 63.76 ± 0.55 | 66.30 ± 1.53 | 66.30 ± 2.42 |
Qwen 2.5 7B Alibaba | 63.06 ± 0.22 | 64.19 ± 0.72 | 64.19 ± 2.88 |
Olmo 2 0325 32B AI2 | 62.21 ± 0.39 | 65.84 ± 3.48 | 65.84 ± 2.59 |
Command R+ 08-2024 104B CohereLabs | 61.61 ± 0.90 | 66.51 ± 5.16 | 66.51 ± 2.30 |
ERNIE 4.5 21B MoE Baidu | 61.33 ± 1.19 | 61.40 ± 1.34 | 61.40 ± 2.87 |
Aya Expanse 8B CohereLabs | 61.16 ± 0.31 | 64.55 ± 2.38 | 64.55 ± 2.68 |
phi-4 14B Microsoft | 60.52 ± 1.30 | 47.23 ± 10.77 | 47.23 ± 2.32 |
Sailor2 20B SAIL | 60.28 ± 0.27 | 68.28 ± 0.55 | 68.27 ± 2.81 |
Llama 3.1 8B Meta | 59.92 ± 0.42 | 62.95 ± 1.19 | 62.95 ± 2.76 |
Llama 3 70B Meta | 59.62 ± 0.28 | 68.08 ± 0.88 | 68.08 ± 2.80 |
Command R 08-2024 32B CohereLabs | 59.42 ± 0.38 | 57.81 ± 7.70 | 57.81 ± 2.42 |
Sailor2 8B SAIL | 56.72 ± 0.64 | 63.89 ± 2.63 | 63.89 ± 2.68 |
Tulu 3 8B AI2 | 56.55 ± 0.38 | 67.59 ± 0.78 | 67.59 ± 2.64 |
Llama 3 8B Meta | 51.32 ± 0.52 | 49.35 ± 1.70 | 49.35 ± 2.97 |
Olmo 2 1124 13B AI2 | 50.37 ± 0.30 | 43.66 ± 3.72 | 43.66 ± 2.74 |
Babel 83B Alibaba-DAMO | 49.21 ± 3.64 | 33.75 ± 13.77 | 33.75 ± 1.66 |
Babel 9B Alibaba-DAMO | 48.08 ± 1.45 | 56.16 ± 3.50 | 56.16 ± 2.67 |
Command R7B 12-2024 7B CohereLabs | 47.11 ± 2.13 | 42.16 ± 5.20 | 42.16 ± 2.63 |
SeaLLMs V3 7B Alibaba-DAMO | 46.71 ± 1.61 | 42.08 ± 7.92 | 42.08 ± 2.48 |
Ministral 2410 8B Mistral AI | 42.77 ± 0.70 | 35.16 ± 8.11 | 35.16 ± 2.33 |
Olmo 2 1124 7B AI2 | 41.90 ± 1.18 | 29.66 ± 8.82 | 29.66 ± 1.73 |
Model | ID | Knowledge | Global MMLU Lite |
---|---|---|---|
Command A 03-2025 111B CohereLabs | 74.75 ± 0.66 | 76.66 ± 1.16 | 76.66 ± 1.16 |
Qwen 3 32B Alibaba | 72.81 ± 0.18 | 74.97 ± 0.70 | 74.97 ± 0.70 |
Qwen 3 30B MoE Alibaba | 72.36 ± 0.28 | 75.44 ± 0.24 | 75.44 ± 0.24 |
SEA-LION v3 (Llama) 70B AISG | 72.15 ± 0.48 | 80.63 ± 0.77 | 80.63 ± 0.77 |
SEA-LION v4 27B AISG | 71.89 ± 0.33 | 73.66 ± 0.48 | 73.66 ± 0.48 |
Gemma 3 27B Google | 71.52 ± 0.26 | 73.78 ± 0.33 | 73.78 ± 0.33 |
Qwen 2.5 72B Alibaba | 71.09 ± 0.44 | 79.91 ± 0.43 | 79.91 ± 0.43 |
Tulu 3 70B AI2 | 71.00 ± 0.19 | 73.44 ± 0.86 | 73.44 ± 0.86 |
Llama 3.3 70B Meta | 70.90 ± 0.16 | 79.31 ± 0.22 | 79.31 ± 0.22 |
Qwen 3 14B Alibaba | 70.55 ± 0.09 | 68.88 ± 0.58 | 68.88 ± 0.58 |
Gemma 3 12B Google | 70.17 ± 0.23 | 71.34 ± 0.18 | 71.34 ± 0.18 |
Llama 4 Scout 109B MoE Meta | 69.72 ± 0.24 | 75.38 ± 0.09 | 75.38 ± 0.09 |
Qwen 2.5 32B Alibaba | 69.47 ± 0.25 | 74.72 ± 0.31 | 74.72 ± 0.31 |
Llama 3.1 70B Meta | 68.33 ± 0.40 | 77.78 ± 0.64 | 77.78 ± 0.64 |
Aya Expanse 32B CohereLabs | 67.84 ± 0.33 | 64.78 ± 0.35 | 64.78 ± 0.35 |
SEA-LION v3 (Gemma 2) 9B AISG | 67.80 ± 0.35 | 68.47 ± 0.52 | 68.47 ± 0.52 |
Qwen 3 8B Alibaba | 67.80 ± 0.33 | 69.19 ± 0.37 | 69.19 ± 0.37 |
Gemma 2 27B Google | 67.16 ± 0.26 | 70.91 ± 0.24 | 70.91 ± 0.24 |
Mistral Large 2411 123B Mistral AI | 67.08 ± 1.36 | 71.56 ± 1.80 | 71.56 ± 1.80 |
Qwen 2.5 14B Alibaba | 66.67 ± 0.29 | 69.44 ± 0.27 | 69.44 ± 0.27 |
MERaLiON 2 10B A*STAR | 64.29 ± 0.27 | 65.28 ± 0.74 | 65.28 ± 0.74 |
Gemma 2 9B Google | 64.17 ± 0.46 | 67.13 ± 0.41 | 67.13 ± 0.41 |
SEA-LION v3 (Llama) 8B AISG | 63.97 ± 0.44 | 62.75 ± 0.55 | 62.75 ± 0.55 |
Mistral Small 3.1 2503 24B Mistral AI | 63.76 ± 0.55 | 69.75 ± 0.78 | 69.75 ± 0.78 |
Qwen 2.5 7B Alibaba | 63.06 ± 0.22 | 63.78 ± 0.17 | 63.78 ± 0.17 |
Olmo 2 0325 32B AI2 | 62.21 ± 0.39 | 63.41 ± 1.13 | 63.41 ± 1.13 |
Command R+ 08-2024 104B CohereLabs | 61.61 ± 0.90 | 54.28 ± 3.07 | 54.28 ± 3.07 |
ERNIE 4.5 21B MoE Baidu | 61.33 ± 1.19 | 53.63 ± 5.76 | 53.63 ± 5.76 |
Aya Expanse 8B CohereLabs | 61.16 ± 0.31 | 55.81 ± 0.38 | 55.81 ± 0.38 |
phi-4 14B Microsoft | 60.52 ± 1.30 | 70.59 ± 0.87 | 70.59 ± 0.87 |
Sailor2 20B SAIL | 60.28 ± 0.27 | 60.41 ± 1.40 | 60.41 ± 1.40 |
Llama 3.1 8B Meta | 59.92 ± 0.42 | 55.66 ± 0.51 | 55.66 ± 0.51 |
Llama 3 70B Meta | 59.62 ± 0.28 | 73.00 ± 0.29 | 73.00 ± 0.29 |
Command R 08-2024 32B CohereLabs | 59.42 ± 0.38 | 58.81 ± 1.88 | 58.81 ± 1.88 |
Sailor2 8B SAIL | 56.72 ± 0.64 | 61.66 ± 0.43 | 61.66 ± 0.43 |
Tulu 3 8B AI2 | 56.55 ± 0.38 | 51.75 ± 0.99 | 51.75 ± 0.99 |
Llama 3 8B Meta | 51.32 ± 0.52 | 54.56 ± 0.86 | 54.56 ± 0.86 |
Olmo 2 1124 13B AI2 | 50.37 ± 0.30 | 46.72 ± 0.98 | 46.72 ± 0.98 |
Babel 83B Alibaba-DAMO | 49.21 ± 3.64 | 62.81 ± 7.60 | 62.81 ± 7.60 |
Babel 9B Alibaba-DAMO | 48.08 ± 1.45 | 43.28 ± 6.75 | 43.28 ± 6.75 |
Command R7B 12-2024 7B CohereLabs | 47.11 ± 2.13 | 49.31 ± 7.37 | 49.31 ± 7.37 |
SeaLLMs V3 7B Alibaba-DAMO | 46.71 ± 1.61 | 43.59 ± 4.64 | 43.59 ± 4.64 |
Ministral 2410 8B Mistral AI | 42.77 ± 0.70 | 42.75 ± 2.68 | 42.75 ± 2.68 |
Olmo 2 1124 7B AI2 | 41.90 ± 1.18 | 35.44 ± 2.72 | 35.44 ± 2.72 |