English Performance
English Scores by Model
Scores are averaged over 30 bootstraps; 95% confidence intervals (CI) are shown.
Model Size: ≤200B
Open instruct models only
[Bar chart: English score with 95% CI for each model, ordered from highest (68.02 ± 0.15) to lowest (15.26 ± 0.11). Model names did not survive extraction; the same scores appear in the EN column of the English Competencies table.]
English Competencies
Model | EN | English Tasks |
---|---|---|
Qwen 3 32B Alibaba | 68.02 ± 0.15 | 68.02 ± 0.15 |
Qwen 3 Next 80B MoE Alibaba | 67.31 ± 0.10 | 67.31 ± 0.10 |
SEA-LION v4 (Qwen) 32B AISG | 67.10 ± 0.17 | 67.10 ± 0.17 |
Llama 3.3 70B Meta | 66.50 ± 0.16 | 66.50 ± 0.16 |
SEA-LION v3 (Llama) 70B AISG | 65.20 ± 0.18 | 65.20 ± 0.18 |
Qwen 3 14B Alibaba | 65.00 ± 0.16 | 65.00 ± 0.16 |
Mistral Large 2411 123B Mistral AI | 64.31 ± 0.15 | 64.31 ± 0.15 |
Command A 03-2025 111B CohereLabs | 63.92 ± 0.18 | 63.92 ± 0.18 |
Llama 4 Scout 109B MoE Meta | 63.86 ± 0.14 | 63.86 ± 0.14 |
SEA-LION v4 (Gemma) 27B AISG | 63.68 ± 0.21 | 63.68 ± 0.21 |
Gemma 3 27B | 63.55 ± 0.17 | 63.55 ± 0.17 |
Qwen 2.5 72B Alibaba | 63.38 ± 0.18 | 63.38 ± 0.18 |
Qwen 3 30B MoE Alibaba | 62.49 ± 0.15 | 62.49 ± 0.15 |
ERNIE 4.5 21B MoE Baidu | 61.47 ± 0.16 | 61.47 ± 0.16 |
Qwen 3 8B Alibaba | 60.57 ± 0.18 | 60.57 ± 0.18 |
Llama 3.1 70B Meta | 59.59 ± 0.18 | 59.59 ± 0.18 |
Tulu 3 70B AI2 | 58.73 ± 0.19 | 58.73 ± 0.19 |
Qwen 2.5 32B Alibaba | 58.61 ± 0.16 | 58.61 ± 0.16 |
Gemma 3 12B | 58.06 ± 0.21 | 58.06 ± 0.21 |
phi-4 14B Microsoft | 57.87 ± 0.20 | 57.87 ± 0.20 |
Qwen 2.5 14B Alibaba | 55.84 ± 0.16 | 55.84 ± 0.16 |
Llama 3 70B Meta | 51.23 ± 0.14 | 51.23 ± 0.14 |
Mistral Small 3.1 2503 24B Mistral AI | 49.53 ± 0.17 | 49.53 ± 0.17 |
SEA-LION v3 (Llama) 8B AISG | 46.31 ± 0.28 | 46.31 ± 0.28 |
SEA-LION v3 (Gemma 2) 9B AISG | 45.14 ± 0.19 | 45.14 ± 0.19 |
Qwen 2.5 7B Alibaba | 43.29 ± 0.18 | 43.29 ± 0.18 |
Gemma 2 27B | 42.75 ± 0.24 | 42.75 ± 0.24 |
Llama 3.1 8B Meta | 39.62 ± 0.20 | 39.62 ± 0.20 |
Tulu 3 8B AI2 | 38.12 ± 0.17 | 38.12 ± 0.17 |
Aya Expanse 32B CohereLabs | 37.21 ± 0.26 | 37.21 ± 0.26 |
Olmo 2 0325 32B AI2 | 34.44 ± 0.09 | 34.44 ± 0.09 |
Gemma 2 9B | 32.65 ± 0.19 | 32.65 ± 0.19 |
Sailor2 20B SAIL | 32.57 ± 0.14 | 32.57 ± 0.14 |
Command R7B 12-2024 7B CohereLabs | 31.63 ± 0.15 | 31.63 ± 0.15 |
Command R+ 08-2024 104B CohereLabs | 31.53 ± 0.15 | 31.53 ± 0.15 |
MERaLiON 2 10B A*STAR | 31.41 ± 0.19 | 31.41 ± 0.19 |
Olmo 2 1124 13B AI2 | 31.12 ± 0.17 | 31.12 ± 0.17 |
Llama 3 8B Meta | 29.88 ± 0.17 | 29.88 ± 0.17 |
Command R 08-2024 32B CohereLabs | 29.20 ± 0.24 | 29.20 ± 0.24 |
Babel 83B Alibaba-DAMO | 29.20 ± 0.20 | 29.20 ± 0.20 |
Apertus 70B Swiss AI | 26.92 ± 0.17 | 26.92 ± 0.17 |
Ministral 2410 8B Mistral AI | 26.03 ± 0.26 | 26.03 ± 0.26 |
Aya Expanse 8B CohereLabs | 24.06 ± 0.21 | 24.06 ± 0.21 |
Olmo 2 1124 7B AI2 | 23.09 ± 0.13 | 23.09 ± 0.13 |
Apertus 8B Swiss AI | 22.00 ± 0.22 | 22.00 ± 0.22 |
SeaLLMs V3 7B Alibaba-DAMO | 21.87 ± 0.11 | 21.87 ± 0.11 |
Babel 9B Alibaba-DAMO | 20.99 ± 0.13 | 20.99 ± 0.13 |
Sailor2 8B SAIL | 15.26 ± 0.11 | 15.26 ± 0.11 |
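Each score above is reported as an average of 30 bootstraps with a 95% CI. The page does not say how the resampling or the interval is computed, so the following is only a minimal sketch of one common reading: resample per-example scores with replacement 30 times, report the mean of the resample means, and take 1.96 standard deviations of those means as the half-width. The function name `bootstrap_mean_ci`, the normal-approximation interval, and the example data are all assumptions made for illustration, not the leaderboard's method.

```python
import random
import statistics


def bootstrap_mean_ci(scores, n_boot=30, z=1.96, seed=0):
    """Return (mean of bootstrap means, half-width of an approximate 95% CI).

    Assumption: the interval is z * standard deviation of the bootstrap
    means. The leaderboard may use a different construction.
    """
    rng = random.Random(seed)
    n = len(scores)
    means = []
    for _ in range(n_boot):
        # Draw n examples with replacement and record the resample mean.
        sample = [scores[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    center = statistics.fmean(means)
    half_width = z * statistics.stdev(means)
    return center, half_width


# Hypothetical per-example scores (1 = correct, 0 = incorrect);
# these are NOT taken from the leaderboard.
example_scores = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1] * 20
center, half_width = bootstrap_mean_ci(example_scores)
print(f"{100 * center:.2f} ± {100 * half_width:.2f}")
```

With only 30 resamples the interval estimate is itself noisy, which is one reason a percentile-based interval over more resamples is often preferred when compute allows.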
English Tasks
Model | EN | English Tasks | BBH | GPQA | IFEval | MATH Hard | MMLU Pro | MuSR |
---|---|---|---|---|---|---|---|---|
Qwen 3 32B Alibaba | 68.02 ± 0.15 | 68.02 ± 0.15 | 85.65 ± 0.10 | 36.02 ± 0.73 | 83.76 ± 0.27 | 68.90 ± 0.21 | 65.43 ± 0.09 | 68.35 ± 0.58 |
Qwen 3 Next 80B MoE Alibaba | 67.31 ± 0.10 | 67.31 ± 0.10 | 87.85 ± 0.09 | 28.35 ± 0.41 | 86.08 ± 0.22 | 69.55 ± 0.20 | 68.16 ± 0.06 | 63.84 ± 0.57 |
SEA-LION v4 (Qwen) 32B AISG | 67.10 ± 0.17 | 67.10 ± 0.17 | 86.02 ± 0.06 | 31.18 ± 0.82 | 83.89 ± 0.28 | 69.67 ± 0.31 | 63.96 ± 0.08 | 67.91 ± 0.55 |
Llama 3.3 70B Meta | 66.50 ± 0.16 | 66.50 ± 0.16 | 85.63 ± 0.08 | 43.45 ± 0.60 | 88.34 ± 0.18 | 52.63 ± 0.25 | 65.33 ± 0.08 | 63.59 ± 0.48 |
SEA-LION v3 (Llama) 70B AISG | 65.20 ± 0.18 | 65.20 ± 0.18 | 85.42 ± 0.16 | 33.42 ± 0.59 | 86.10 ± 0.32 | 55.64 ± 0.36 | 66.37 ± 0.10 | 64.26 ± 0.51 |
Qwen 3 14B Alibaba | 65.00 ± 0.16 | 65.00 ± 0.16 | 81.32 ± 0.08 | 29.58 ± 0.70 | 85.59 ± 0.17 | 66.95 ± 0.32 | 63.63 ± 0.08 | 62.92 ± 0.67 |
Mistral Large 2411 123B Mistral AI | 64.31 ± 0.15 | 64.31 ± 0.15 | 80.36 ± 0.15 | 33.51 ± 0.65 | 79.00 ± 0.27 | 52.52 ± 0.29 | 66.58 ± 0.09 | 73.90 ± 0.54 |
Command A 03-2025 111B CohereLabs | 63.92 ± 0.18 | 63.92 ± 0.18 | 85.64 ± 0.12 | 14.05 ± 0.81 | 86.81 ± 0.29 | 57.86 ± 0.31 | 65.14 ± 0.10 | 74.02 ± 0.49 |
Llama 4 Scout 109B MoE Meta | 63.86 ± 0.14 | 63.86 ± 0.14 | 79.95 ± 0.11 | 36.18 ± 0.57 | 84.44 ± 0.22 | 63.66 ± 0.24 | 58.37 ± 0.13 | 60.57 ± 0.65 |
SEA-LION v4 (Gemma) 27B AISG | 63.68 ± 0.21 | 63.68 ± 0.21 | 85.14 ± 0.12 | 22.99 ± 0.79 | 80.54 ± 0.23 | 73.20 ± 0.29 | 61.37 ± 0.08 | 58.87 ± 0.66 |
Gemma 3 27B | 63.55 ± 0.17 | 63.55 ± 0.17 | 84.55 ± 0.15 | 23.62 ± 0.61 | 81.38 ± 0.30 | 73.34 ± 0.28 | 61.40 ± 0.09 | 57.02 ± 0.59 |
Qwen 2.5 72B Alibaba | 63.38 ± 0.18 | 63.38 ± 0.18 | 81.56 ± 0.12 | 27.11 ± 0.85 | 83.44 ± 0.31 | 61.67 ± 0.28 | 66.21 ± 0.07 | 60.26 ± 0.59 |
Qwen 3 30B MoE Alibaba | 62.49 ± 0.15 | 62.49 ± 0.15 | 83.96 ± 0.09 | 16.20 ± 0.49 | 82.96 ± 0.23 | 66.42 ± 0.23 | 64.53 ± 0.08 | 60.84 ± 0.64 |
ERNIE 4.5 21B MoE Baidu | 61.47 ± 0.16 | 61.47 ± 0.16 | 77.52 ± 0.14 | 41.12 ± 0.63 | 79.64 ± 0.32 | 63.22 ± 0.30 | 56.00 ± 0.12 | 51.30 ± 0.72 |
Qwen 3 8B Alibaba | 60.57 ± 0.18 | 60.57 ± 0.18 | 77.13 ± 0.12 | 22.93 ± 0.71 | 82.30 ± 0.35 | 64.55 ± 0.27 | 58.68 ± 0.09 | 57.82 ± 0.64 |
Llama 3.1 70B Meta | 59.59 ± 0.18 | 59.59 ± 0.18 | 82.69 ± 0.12 | 27.13 ± 0.76 | 83.79 ± 0.33 | 39.96 ± 0.31 | 63.27 ± 0.08 | 60.68 ± 0.72 |
Tulu 3 70B AI2 | 58.73 ± 0.19 | 58.73 ± 0.19 | 82.36 ± 0.15 | 26.29 ± 0.80 | 79.78 ± 0.31 | 45.24 ± 0.20 | 59.82 ± 0.09 | 58.86 ± 0.58 |
Qwen 2.5 32B Alibaba | 58.61 ± 0.16 | 58.61 ± 0.16 | 69.62 ± 0.11 | 23.52 ± 0.68 | 79.11 ± 0.28 | 57.35 ± 0.25 | 65.42 ± 0.08 | 56.64 ± 0.84 |
Gemma 3 12B | 58.06 ± 0.21 | 58.06 ± 0.21 | 78.94 ± 0.11 | 15.70 ± 0.62 | 78.43 ± 0.35 | 64.48 ± 0.31 | 53.77 ± 0.09 | 57.04 ± 0.78 |
phi-4 14B Microsoft | 57.87 ± 0.20 | 57.87 ± 0.20 | 79.63 ± 0.13 | 30.95 ± 0.62 | 59.31 ± 0.42 | 61.40 ± 0.31 | 57.41 ± 0.09 | 58.51 ± 0.84 |
Qwen 2.5 14B Alibaba | 55.84 ± 0.16 | 55.84 ± 0.16 | 74.80 ± 0.16 | 18.97 ± 0.77 | 78.31 ± 0.22 | 55.09 ± 0.30 | 60.09 ± 0.10 | 47.78 ± 0.71 |
Llama 3 70B Meta | 51.23 ± 0.14 | 51.23 ± 0.14 | 80.22 ± 0.09 | 16.97 ± 0.55 | 77.12 ± 0.32 | 24.32 ± 0.21 | 55.35 ± 0.08 | 53.42 ± 0.56 |
Mistral Small 3.1 2503 24B Mistral AI | 49.53 ± 0.17 | 49.53 ± 0.17 | 54.06 ± 0.22 | 25.79 ± 0.79 | 70.04 ± 0.45 | 43.83 ± 0.43 | 46.20 ± 0.17 | 57.27 ± 0.75 |
SEA-LION v3 (Llama) 8B AISG | 46.31 ± 0.28 | 46.31 ± 0.28 | 70.67 ± 0.17 | 13.83 ± 1.01 | 78.62 ± 0.43 | 27.42 ± 0.30 | 49.17 ± 0.13 | 38.16 ± 0.94 |
SEA-LION v3 (Gemma 2) 9B AISG | 45.14 ± 0.19 | 45.14 ± 0.19 | 66.10 ± 0.24 | 16.11 ± 0.81 | 75.85 ± 0.36 | 28.86 ± 0.33 | 48.91 ± 0.11 | 35.03 ± 0.60 |
Qwen 2.5 7B Alibaba | 43.29 ± 0.18 | 43.29 ± 0.18 | 61.35 ± 0.15 | 9.73 ± 0.59 | 70.97 ± 0.29 | 48.70 ± 0.32 | 49.82 ± 0.12 | 19.19 ± 0.75 |
Gemma 2 27B | 42.75 ± 0.24 | 42.75 ± 0.24 | 63.80 ± 0.17 | 12.86 ± 0.99 | 74.82 ± 0.36 | 25.27 ± 0.26 | 45.90 ± 0.12 | 33.86 ± 0.76 |
Llama 3.1 8B Meta | 39.62 ± 0.20 | 39.62 ± 0.20 | 60.91 ± 0.22 | 4.06 ± 0.80 | 74.46 ± 0.33 | 22.33 ± 0.28 | 41.68 ± 0.11 | 34.30 ± 0.64 |
Tulu 3 8B AI2 | 38.12 ± 0.17 | 38.12 ± 0.17 | 54.35 ± 0.17 | 8.14 ± 0.64 | 79.09 ± 0.32 | 20.07 ± 0.33 | 39.23 ± 0.10 | 27.83 ± 0.79 |
Aya Expanse 32B CohereLabs | 37.21 ± 0.26 | 37.21 ± 0.26 | 60.64 ± 0.21 | 7.00 ± 0.96 | 68.19 ± 0.36 | 14.75 ± 0.21 | 40.96 ± 0.13 | 31.74 ± 0.97 |
Olmo 2 0325 32B AI2 | 34.44 ± 0.09 | 34.44 ± 0.09 | 55.67 ± 0.18 | 0.00 ± 0.00 | 80.89 ± 0.37 | 20.24 ± 0.30 | 45.04 ± 0.11 | 4.77 ± 0.35 |
Gemma 2 9B | 32.65 ± 0.19 | 32.65 ± 0.19 | 53.32 ± 0.15 | 3.84 ± 0.64 | 69.04 ± 0.37 | 19.09 ± 0.25 | 27.72 ± 0.11 | 22.87 ± 0.86 |
Sailor2 20B SAIL | 32.57 ± 0.14 | 32.57 ± 0.14 | 38.74 ± 0.13 | 14.53 ± 0.55 | 33.97 ± 0.25 | 40.32 ± 0.30 | 46.62 ± 0.08 | 21.23 ± 0.74 |
Command R7B 12-2024 7B CohereLabs | 31.63 ± 0.15 | 31.63 ± 0.15 | 55.93 ± 0.27 | 0.50 ± 0.28 | 67.86 ± 0.42 | 20.65 ± 0.27 | 23.59 ± 0.12 | 21.27 ± 0.80 |
Command R+ 08-2024 104B CohereLabs | 31.53 ± 0.15 | 31.53 ± 0.15 | 59.97 ± 0.27 | 0.00 ± 0.00 | 69.65 ± 0.40 | 9.74 ± 0.20 | 36.26 ± 0.12 | 13.57 ± 0.76 |
MERaLiON 2 10B A*STAR | 31.41 ± 0.19 | 31.41 ± 0.19 | 54.42 ± 0.21 | 0.61 ± 0.31 | 69.37 ± 0.43 | 17.09 ± 0.26 | 27.15 ± 0.12 | 19.84 ± 0.80 |
Olmo 2 1124 13B AI2 | 31.12 ± 0.17 | 31.12 ± 0.17 | 44.77 ± 0.22 | 0.02 ± 0.04 | 73.27 ± 0.43 | 15.79 ± 0.22 | 33.18 ± 0.13 | 19.70 ± 0.80 |
Llama 3 8B Meta | 29.88 ± 0.17 | 29.88 ± 0.17 | 53.99 ± 0.18 | 0.00 ± 0.00 | 67.62 ± 0.45 | 7.92 ± 0.25 | 28.80 ± 0.08 | 20.94 ± 0.70 |
Command R 08-2024 32B CohereLabs | 29.20 ± 0.24 | 29.20 ± 0.24 | 53.66 ± 0.20 | 3.92 ± 0.88 | 61.70 ± 0.38 | 5.84 ± 0.15 | 33.17 ± 0.17 | 16.92 ± 0.78 |
Babel 83B Alibaba-DAMO | 29.20 ± 0.20 | 29.20 ± 0.20 | 57.97 ± 0.20 | 10.85 ± 0.86 | 30.06 ± 0.49 | 13.11 ± 0.34 | 46.31 ± 0.16 | 16.90 ± 0.63 |
Apertus 70B Swiss AI | 26.92 ± 0.17 | 26.92 ± 0.17 | 38.97 ± 0.28 | 0.03 ± 0.06 | 57.19 ± 0.40 | 8.75 ± 0.20 | 29.37 ± 0.12 | 27.20 ± 0.82 |
Ministral 2410 8B Mistral AI | 26.03 ± 0.26 | 26.03 ± 0.26 | 41.40 ± 0.25 | 2.02 ± 0.72 | 47.01 ± 0.58 | 19.30 ± 0.28 | 26.01 ± 0.14 | 20.41 ± 0.81 |
Aya Expanse 8B CohereLabs | 24.06 ± 0.21 | 24.06 ± 0.21 | 37.65 ± 0.23 | 2.61 ± 0.72 | 57.54 ± 0.37 | 7.20 ± 0.19 | 24.76 ± 0.12 | 14.60 ± 0.69 |
Olmo 2 1124 7B AI2 | 23.09 ± 0.13 | 23.09 ± 0.13 | 30.04 ± 0.23 | 0.03 ± 0.06 | 66.87 ± 0.34 | 11.79 ± 0.24 | 24.93 ± 0.13 | 4.87 ± 0.68 |
Apertus 8B Swiss AI | 22.00 ± 0.22 | 22.00 ± 0.22 | 28.94 ± 0.24 | 3.25 ± 0.82 | 64.38 ± 0.61 | 4.75 ± 0.16 | 24.13 ± 0.11 | 6.57 ± 0.58 |
SeaLLMs V3 7B Alibaba-DAMO | 21.87 ± 0.11 | 21.87 ± 0.11 | 44.35 ± 0.21 | 0.05 ± 0.08 | 38.31 ± 0.52 | 15.82 ± 0.30 | 32.71 ± 0.13 | 0.01 ± 0.01 |
Babel 9B Alibaba-DAMO | 20.99 ± 0.13 | 20.99 ± 0.13 | 49.20 ± 0.22 | 0.38 ± 0.24 | 28.52 ± 0.34 | 12.00 ± 0.27 | 32.06 ± 0.16 | 3.78 ± 0.52 |
Sailor2 8B SAIL | 15.26 ± 0.11 | 15.26 ± 0.11 | 34.50 ± 0.20 | 0.25 ± 0.22 | 30.62 ± 0.32 | 12.08 ± 0.23 | 12.36 ± 0.12 | 1.77 ± 0.39 |
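
The page does not state how the EN and English Tasks columns are derived from the six task columns. For the rows checked here, the reported value is consistent with an unweighted mean of BBH, GPQA, IFEval, MATH Hard, MMLU Pro and MuSR. The sketch below reproduces that arithmetic for two rows of the table; the helper name `aggregate_score` is hypothetical, and the unweighted mean is an inference from the numbers, not a documented rule.

```python
def aggregate_score(task_scores):
    """Unweighted mean of per-task scores (assumed aggregation, see above)."""
    return sum(task_scores.values()) / len(task_scores)


# Task scores copied from the table above (CI half-widths omitted).
qwen3_32b = {
    "BBH": 85.65, "GPQA": 36.02, "IFEval": 83.76,
    "MATH Hard": 68.90, "MMLU Pro": 65.43, "MuSR": 68.35,
}
sailor2_8b = {
    "BBH": 34.50, "GPQA": 0.25, "IFEval": 30.62,
    "MATH Hard": 12.08, "MMLU Pro": 12.36, "MuSR": 1.77,
}

print(round(aggregate_score(qwen3_32b), 2))   # 68.02, matching the EN column
print(round(aggregate_score(sailor2_8b), 2))  # 15.26, matching the EN column
```

Because the published values are themselves averages over bootstraps, recomputing from the rounded task scores may differ from the reported EN value in the last decimal place for some rows.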