English Performance
English Scores by Model
Average of 8 runs; 95% CIs are shown.
Model Size: ≤200B
Open instruct models only
[Bar chart: English scores by model, ranked from 73.82 ± 0.29 down to 27.50 ± 0.32. The per-model values are repeated in the tables below.]
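Every score above is a mean over 8 independent runs with a 95% confidence interval. As a minimal sketch of how such an interval is typically computed (assuming a Student's t interval; the leaderboard's exact procedure is not stated here):

```python
import math
import statistics

def mean_with_ci95(runs: list[float]) -> tuple[float, float]:
    """Return (mean, half-width of the 95% CI) over independent runs.

    Uses a Student's t interval; t = 2.365 is the two-sided 97.5%
    quantile for n - 1 = 7 degrees of freedom (n = 8 runs).
    """
    n = len(runs)
    assert n == 8, "t quantile below is hard-coded for 8 runs"
    mean = statistics.mean(runs)
    sem = statistics.stdev(runs) / math.sqrt(n)  # standard error of the mean
    return mean, 2.365 * sem

# Hypothetical per-run scores for one model (not real leaderboard data)
runs = [73.5, 74.1, 73.8, 73.6, 74.0, 73.9, 73.7, 74.0]
mean, half = mean_with_ci95(runs)
print(f"{mean:.2f} ± {half:.2f}")
```

The half-width shrinks with more runs (as 1/sqrt(n)), which is why the better-sampled rows show tighter intervals.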
English Competencies
Model | EN | English Tasks |
---|---|---|
Qwen 3 32B (Alibaba) | 73.82 ± 0.29 | 73.82 ± 0.29 |
Llama 3.3 70B (Meta) | 72.16 ± 0.15 | 72.16 ± 0.15 |
Qwen 3 14B (Alibaba) | 71.66 ± 0.24 | 71.66 ± 0.24 |
SEA-LION v3 (Llama) 70B (AISG) | 71.35 ± 0.45 | 71.35 ± 0.45 |
Gemma 3 27B | 70.90 ± 0.24 | 70.90 ± 0.24 |
SEA-LION v4 27B (AISG) | 70.89 ± 0.29 | 70.89 ± 0.29 |
Llama 4 Scout 109B MoE (Meta) | 70.38 ± 0.18 | 70.38 ± 0.18 |
Command A 03-2025 111B (CohereLabs) | 70.32 ± 0.35 | 70.32 ± 0.35 |
Qwen 2.5 72B (Alibaba) | 70.11 ± 0.35 | 70.11 ± 0.35 |
Mistral Large 2411 123B (Mistral AI) | 69.92 ± 0.31 | 69.92 ± 0.31 |
Qwen 3 30B MoE (Alibaba) | 69.49 ± 0.13 | 69.49 ± 0.13 |
ERNIE 4.5 21B MoE (Baidu) | 68.63 ± 0.35 | 68.63 ± 0.35 |
Qwen 3 8B (Alibaba) | 68.26 ± 0.34 | 68.26 ± 0.34 |
Llama 3.1 70B (Meta) | 66.39 ± 0.24 | 66.39 ± 0.24 |
Gemma 3 12B | 65.95 ± 0.15 | 65.95 ± 0.15 |
Tulu 3 70B (AI2) | 65.81 ± 0.19 | 65.81 ± 0.19 |
Qwen 2.5 32B (Alibaba) | 65.77 ± 0.39 | 65.77 ± 0.39 |
phi-4 14B (Microsoft) | 64.86 ± 0.39 | 64.86 ± 0.39 |
Qwen 2.5 14B (Alibaba) | 64.21 ± 0.28 | 64.21 ± 0.28 |
Llama 3 70B (Meta) | 59.21 ± 0.28 | 59.21 ± 0.28 |
Mistral Small 3.1 2503 24B (Mistral AI) | 58.03 ± 1.82 | 58.03 ± 1.82 |
SEA-LION v3 (Llama) 8B (AISG) | 55.86 ± 0.16 | 55.86 ± 0.16 |
SEA-LION v3 (Gemma 2) 9B (AISG) | 54.98 ± 0.47 | 54.98 ± 0.47 |
Qwen 2.5 7B (Alibaba) | 54.24 ± 0.36 | 54.24 ± 0.36 |
Gemma 2 27B | 52.83 ± 0.40 | 52.83 ± 0.40 |
Llama 3.1 8B (Meta) | 50.37 ± 0.25 | 50.37 ± 0.25 |
Tulu 3 8B (AI2) | 49.54 ± 0.11 | 49.54 ± 0.11 |
Aya Expanse 32B (CohereLabs) | 47.94 ± 0.32 | 47.94 ± 0.32 |
Gemma 2 9B | 44.79 ± 0.85 | 44.79 ± 0.85 |
Sailor2 20B (SAIL) | 44.04 ± 0.14 | 44.04 ± 0.14 |
Command R7B 12-2024 7B (CohereLabs) | 43.51 ± 0.50 | 43.51 ± 0.50 |
MERaLiON 2 10B (A*STAR) | 43.43 ± 0.74 | 43.43 ± 0.74 |
Olmo 2 1124 13B (AI2) | 43.29 ± 0.46 | 43.29 ± 0.46 |
Olmo 2 0325 32B (AI2) | 42.66 ± 0.53 | 42.66 ± 0.53 |
Command R+ 08-2024 104B (CohereLabs) | 41.67 ± 0.49 | 41.67 ± 0.49 |
Command R 08-2024 32B (CohereLabs) | 41.50 ± 0.66 | 41.50 ± 0.66 |
Llama 3 8B (Meta) | 41.46 ± 0.39 | 41.46 ± 0.39 |
Babel 83B (Alibaba-DAMO) | 40.23 ± 0.58 | 40.23 ± 0.58 |
Ministral 2410 8B (Mistral AI) | 38.63 ± 0.55 | 38.63 ± 0.55 |
Aya Expanse 8B (CohereLabs) | 37.28 ± 0.51 | 37.28 ± 0.51 |
Olmo 2 1124 7B (AI2) | 36.68 ± 0.34 | 36.68 ± 0.34 |
Babel 9B (Alibaba-DAMO) | 33.47 ± 0.75 | 33.47 ± 0.75 |
SeaLLMs V3 7B (Alibaba-DAMO) | 32.62 ± 1.04 | 32.62 ± 1.04 |
Sailor2 8B (SAIL) | 27.50 ± 0.32 | 27.50 ± 0.32 |
English Tasks
Model | EN | English Tasks | BBH | GPQA | IFEval | MATH-Hard | MMLU-Pro | MuSR |
---|---|---|---|---|---|---|---|---|
Qwen 3 32B (Alibaba) | 73.82 ± 0.29 | 73.82 ± 0.29 | 88.86 ± 0.11 | 51.81 ± 3.48 | 83.71 ± 2.74 | 68.85 ± 0.70 | 69.22 ± 0.22 | 80.48 ± 0.42 |
Llama 3.3 70B (Meta) | 72.16 ± 0.15 | 72.16 ± 0.15 | 89.56 ± 0.20 | 57.73 ± 3.86 | 88.42 ± 2.51 | 52.62 ± 0.48 | 69.00 ± 0.10 | 75.63 ± 0.57 |
Qwen 3 14B (Alibaba) | 71.66 ± 0.24 | 71.66 ± 0.24 | 85.91 ± 0.15 | 47.10 ± 3.57 | 85.42 ± 2.71 | 66.98 ± 0.44 | 67.31 ± 0.17 | 77.26 ± 0.52 |
SEA-LION v3 (Llama) 70B (AISG) | 71.35 ± 0.45 | 71.35 ± 0.45 | 89.18 ± 0.33 | 50.59 ± 3.44 | 85.77 ± 2.49 | 55.80 ± 0.46 | 69.88 ± 0.20 | 76.88 ± 0.84 |
Gemma 3 27B | 70.90 ± 0.24 | 70.90 ± 0.24 | 88.39 ± 0.18 | 42.72 ± 3.48 | 81.40 ± 2.99 | 73.34 ± 0.72 | 65.58 ± 0.13 | 73.97 ± 0.96 |
SEA-LION v4 27B (AISG) | 70.89 ± 0.29 | 70.89 ± 0.29 | 89.08 ± 0.25 | 41.94 ± 3.40 | 80.18 ± 3.02 | 73.13 ± 0.57 | 65.49 ± 0.16 | 75.55 ± 0.48 |
Llama 4 Scout 109B MoE (Meta) | 70.38 ± 0.18 | 70.38 ± 0.18 | 85.60 ± 0.18 | 51.62 ± 3.89 | 84.33 ± 2.79 | 63.60 ± 0.35 | 62.72 ± 0.39 | 74.38 ± 0.38 |
Command A 03-2025 111B (CohereLabs) | 70.32 ± 0.35 | 70.32 ± 0.35 | 89.16 ± 0.19 | 35.32 ± 3.08 | 86.81 ± 2.45 | 57.93 ± 0.80 | 68.76 ± 0.16 | 83.95 ± 0.78 |
Qwen 2.5 72B (Alibaba) | 70.11 ± 0.35 | 70.11 ± 0.35 | 86.59 ± 0.37 | 45.09 ± 3.69 | 83.20 ± 2.74 | 61.59 ± 0.47 | 69.81 ± 0.12 | 74.40 ± 1.35 |
Mistral Large 2411 123B (Mistral AI) | 69.92 ± 0.31 | 69.92 ± 0.31 | 85.39 ± 0.47 | 50.06 ± 3.42 | 79.00 ± 2.82 | 52.29 ± 0.62 | 69.92 ± 0.15 | 82.86 ± 0.95 |
Qwen 3 30B MoE (Alibaba) | 69.49 ± 0.13 | 69.49 ± 0.13 | 87.17 ± 0.23 | 37.03 ± 3.96 | 83.16 ± 2.87 | 66.49 ± 0.57 | 68.02 ± 0.13 | 75.10 ± 0.87 |
ERNIE 4.5 21B MoE (Baidu) | 68.63 ± 0.35 | 68.63 ± 0.35 | 83.74 ± 0.34 | 55.72 ± 3.67 | 79.53 ± 2.91 | 63.02 ± 0.89 | 60.48 ± 0.21 | 69.28 ± 1.12 |
Qwen 3 8B (Alibaba) | 68.26 ± 0.34 | 68.26 ± 0.34 | 83.05 ± 0.15 | 41.80 ± 3.45 | 82.37 ± 2.85 | 64.44 ± 0.45 | 62.82 ± 0.21 | 75.10 ± 0.76 |
Llama 3.1 70B (Meta) | 66.39 ± 0.24 | 66.39 ± 0.24 | 87.15 ± 0.13 | 45.65 ± 3.48 | 83.80 ± 2.68 | 40.01 ± 0.46 | 67.01 ± 0.13 | 74.76 ± 0.68 |
Gemma 3 12B | 65.95 ± 0.15 | 65.95 ± 0.15 | 84.16 ± 0.19 | 36.77 ± 3.26 | 78.35 ± 3.08 | 64.31 ± 0.83 | 58.76 ± 0.14 | 73.35 ± 0.60 |
Tulu 3 70B (AI2) | 65.81 ± 0.19 | 65.81 ± 0.19 | 86.58 ± 0.27 | 45.31 ± 3.54 | 79.83 ± 2.95 | 45.39 ± 0.49 | 64.04 ± 0.21 | 73.73 ± 0.69 |
Qwen 2.5 32B (Alibaba) | 65.77 ± 0.39 | 65.77 ± 0.39 | 74.36 ± 1.20 | 42.47 ± 3.61 | 79.02 ± 3.00 | 57.33 ± 0.37 | 68.97 ± 0.16 | 72.48 ± 0.68 |
phi-4 14B (Microsoft) | 64.86 ± 0.39 | 64.86 ± 0.39 | 84.58 ± 0.40 | 47.94 ± 3.54 | 59.06 ± 3.43 | 61.53 ± 0.43 | 61.93 ± 0.61 | 74.11 ± 0.88 |
Qwen 2.5 14B (Alibaba) | 64.21 ± 0.28 | 64.21 ± 0.28 | 80.30 ± 0.33 | 39.56 ± 3.48 | 78.37 ± 2.99 | 55.13 ± 0.84 | 64.29 ± 0.17 | 67.61 ± 0.70 |
Llama 3 70B (Meta) | 59.21 ± 0.28 | 59.21 ± 0.28 | 85.01 ± 0.14 | 37.97 ± 3.59 | 77.20 ± 3.15 | 24.28 ± 0.56 | 60.11 ± 0.24 | 70.70 ± 1.18 |
Mistral Small 3.1 2503 24B (Mistral AI) | 58.03 ± 1.82 | 58.03 ± 1.82 | 66.52 ± 5.09 | 44.03 ± 3.25 | 69.80 ± 2.99 | 43.58 ± 1.05 | 51.68 ± 4.84 | 72.60 ± 1.20 |
SEA-LION v3 (Llama) 8B (AISG) | 55.86 ± 0.16 | 55.86 ± 0.16 | 77.92 ± 0.42 | 35.21 ± 3.02 | 78.56 ± 2.80 | 27.56 ± 0.68 | 54.54 ± 0.26 | 61.38 ± 0.89 |
SEA-LION v3 (Gemma 2) 9B (AISG) | 54.98 ± 0.47 | 54.98 ± 0.47 | 74.47 ± 0.42 | 37.53 ± 3.43 | 75.92 ± 2.98 | 28.83 ± 0.85 | 54.22 ± 0.97 | 58.89 ± 1.07 |
Qwen 2.5 7B (Alibaba) | 54.24 ± 0.36 | 54.24 ± 0.36 | 71.00 ± 0.36 | 32.11 ± 3.08 | 71.05 ± 3.33 | 48.71 ± 0.47 | 55.16 ± 0.22 | 47.42 ± 1.05 |
Gemma 2 27B | 52.83 ± 0.40 | 52.83 ± 0.40 | 74.22 ± 0.53 | 33.98 ± 3.17 | 74.79 ± 3.04 | 25.52 ± 0.47 | 51.63 ± 2.10 | 56.81 ± 0.92 |
Llama 3.1 8B (Meta) | 50.37 ± 0.25 | 50.37 ± 0.25 | 71.20 ± 0.11 | 28.13 ± 2.81 | 74.40 ± 3.01 | 22.28 ± 0.90 | 47.61 ± 0.25 | 58.60 ± 1.37 |
Tulu 3 8B (AI2) | 49.54 ± 0.11 | 49.54 ± 0.11 | 65.53 ± 0.19 | 31.50 ± 2.94 | 78.90 ± 3.02 | 19.99 ± 0.55 | 45.49 ± 0.22 | 55.82 ± 0.58 |
Aya Expanse 32B (CohereLabs) | 47.94 ± 0.32 | 47.94 ± 0.32 | 70.17 ± 0.44 | 30.11 ± 2.77 | 67.91 ± 3.36 | 14.76 ± 0.44 | 47.34 ± 0.31 | 57.36 ± 0.53 |
Gemma 2 9B | 44.79 ± 0.85 | 44.79 ± 0.85 | 65.34 ± 0.83 | 27.68 ± 3.11 | 69.22 ± 3.21 | 19.24 ± 0.58 | 34.94 ± 3.45 | 52.32 ± 0.58 |
Sailor2 20B (SAIL) | 44.04 ± 0.14 | 44.04 ± 0.14 | 52.14 ± 0.37 | 35.63 ± 3.69 | 34.17 ± 3.74 | 40.20 ± 0.46 | 52.37 ± 0.30 | 49.71 ± 0.40 |
Command R7B 12-2024 7B (CohereLabs) | 43.51 ± 0.50 | 43.51 ± 0.50 | 66.67 ± 0.42 | 24.86 ± 2.27 | 67.65 ± 2.96 | 20.91 ± 0.42 | 32.12 ± 1.02 | 48.87 ± 2.35 |
MERaLiON 2 10B (A*STAR) | 43.43 ± 0.74 | 43.43 ± 0.74 | 65.77 ± 1.43 | 25.14 ± 2.84 | 69.15 ± 3.19 | 17.03 ± 0.73 | 34.61 ± 4.02 | 48.88 ± 1.17 |
Olmo 2 1124 13B (AI2) | 43.29 ± 0.46 | 43.29 ± 0.46 | 58.78 ± 0.25 | 22.57 ± 2.30 | 73.38 ± 3.09 | 16.01 ± 0.52 | 40.29 ± 0.42 | 48.71 ± 2.05 |
Olmo 2 0325 32B (AI2) | 42.66 ± 0.53 | 42.66 ± 0.53 | 66.26 ± 0.66 | 9.85 ± 1.55 | 80.57 ± 2.78 | 20.26 ± 0.54 | 50.76 ± 0.28 | 28.25 ± 2.20 |
Command R+ 08-2024 104B (CohereLabs) | 41.67 ± 0.49 | 41.67 ± 0.49 | 70.18 ± 0.34 | 16.35 ± 1.74 | 69.69 ± 3.11 | 9.94 ± 0.66 | 42.68 ± 0.56 | 41.16 ± 1.60 |
Command R 08-2024 32B (CohereLabs) | 41.50 ± 0.66 | 41.50 ± 0.66 | 65.55 ± 0.63 | 28.26 ± 2.15 | 62.06 ± 3.41 | 5.95 ± 0.31 | 40.11 ± 0.34 | 47.09 ± 2.91 |
Llama 3 8B (Meta) | 41.46 ± 0.39 | 41.46 ± 0.39 | 65.69 ± 0.21 | 19.53 ± 2.21 | 67.74 ± 3.28 | 8.01 ± 0.45 | 36.39 ± 0.40 | 51.40 ± 1.39 |
Babel 83B (Alibaba-DAMO) | 40.23 ± 0.58 | 40.23 ± 0.58 | 66.01 ± 0.46 | 32.87 ± 2.41 | 30.08 ± 3.07 | 13.22 ± 1.92 | 52.30 ± 0.84 | 46.87 ± 1.82 |
Ministral 2410 8B (Mistral AI) | 38.63 ± 0.55 | 38.63 ± 0.55 | 56.38 ± 0.78 | 26.06 ± 2.22 | 46.90 ± 3.19 | 19.47 ± 0.30 | 33.65 ± 0.52 | 49.34 ± 1.23 |
Aya Expanse 8B (CohereLabs) | 37.28 ± 0.51 | 37.28 ± 0.51 | 53.49 ± 0.46 | 26.31 ± 2.57 | 57.58 ± 3.57 | 7.14 ± 0.23 | 32.98 ± 0.32 | 46.17 ± 1.36 |
Olmo 2 1124 7B (AI2) | 36.68 ± 0.34 | 36.68 ± 0.34 | 47.31 ± 0.41 | 23.30 ± 2.17 | 66.89 ± 3.30 | 11.77 ± 0.32 | 32.86 ± 0.28 | 37.97 ± 1.79 |
Babel 9B (Alibaba-DAMO) | 33.47 ± 0.75 | 33.47 ± 0.75 | 61.25 ± 0.53 | 24.25 ± 2.18 | 28.37 ± 3.28 | 12.14 ± 1.63 | 39.05 ± 0.54 | 35.77 ± 1.07 |
SeaLLMs V3 7B (Alibaba-DAMO) | 32.62 ± 1.04 | 32.62 ± 1.04 | 58.01 ± 0.59 | 21.88 ± 1.89 | 38.24 ± 3.22 | 15.66 ± 1.34 | 39.73 ± 0.32 | 22.24 ± 5.46 |
Sailor2 8B (SAIL) | 27.50 ± 0.32 | 27.50 ± 0.32 | 50.27 ± 0.34 | 23.72 ± 2.46 | 30.50 ± 3.51 | 12.08 ± 0.53 | 21.12 ± 0.37 | 27.31 ± 1.92 |
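The EN column appears to be the unweighted mean of the six per-task scores. A quick check against the top row (Qwen 3 32B), using the values from the table above:

```python
# Recompute the EN score as the unweighted mean of the six task scores.
# Values are the Qwen 3 32B row from the English Tasks table.
task_scores = {
    "bbh": 88.86,
    "gpqa": 51.81,
    "ifeval": 83.71,
    "math_hard": 68.85,
    "mmlu_pro": 69.22,
    "musr": 80.48,
}

en_score = sum(task_scores.values()) / len(task_scores)
print(round(en_score, 2))  # 73.82, matching the EN column
```

The same arithmetic reproduces the EN column for the other rows (up to rounding of the published task scores).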