Filipino Performance
Filipino Scores by Model
Scores are averaged over 30 bootstraps; 95% confidence intervals are shown.
Model Size: ≤200B
Open instruct models only
[Bar chart: overall Filipino score with 95% CI for each model, ordered from highest (68.10 ± 0.14) to lowest (14.84 ± 0.19). Model labels did not survive extraction; the same scores appear in the TL column of the Filipino Competencies table.]
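The "30 bootstraps, 95% CI" summary used on this page can be illustrated with a minimal sketch. The page does not say what is resampled or how the interval is computed, so the choices here are assumptions: per-example scores are resampled with replacement, the reported value is the mean of the bootstrap means, and the half-width is 1.96 times the standard deviation of those means. The function name `bootstrap_mean_ci` and the sample data are hypothetical.

```python
import random
import statistics


def bootstrap_mean_ci(scores, n_boot=30, seed=0):
    """Return (mean of bootstrap means, approximate 95% half-width).

    Assumption: a normal-approximation interval, 1.96 * std of the
    bootstrap means. The leaderboard's actual formula is not stated.
    """
    rng = random.Random(seed)
    n = len(scores)
    boot_means = []
    for _ in range(n_boot):
        # Draw n items with replacement and record the resample mean.
        resample = [scores[rng.randrange(n)] for _ in range(n)]
        boot_means.append(statistics.fmean(resample))
    center = statistics.fmean(boot_means)
    half_width = 1.96 * statistics.stdev(boot_means)
    return center, half_width


# Hypothetical per-example scores on a 0-100 scale, not leaderboard data.
example_scores = [100.0, 0.0, 100.0, 100.0, 0.0, 100.0, 100.0, 0.0, 100.0, 100.0]
mean, ci = bootstrap_mean_ci(example_scores)
print(f"{mean:.2f} ± {ci:.2f}")
```

A fixed seed keeps the sketch reproducible; with only 30 resamples the interval itself is noisy, so small differences between neighbouring models should be read with that in mind.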
Filipino Competencies
Scores are averaged over 30 bootstraps; 95% confidence intervals are shown.
Model Size: ≤200B
Open instruct models only
Model | TL (overall) | Cultural | Instruction Following | Multi-Turn Chat | NLG | NLR | NLU | Safety | Knowledge |
---|---|---|---|---|---|---|---|---|---|
SEA-LION v4 (Gemma) 27B AISG | 68.10 ± 0.14 | 82.98 ± 0.17 | 88.86 ± 0.75 | 47.05 ± 0.62 | 55.60 ± 0.05 | 72.06 ± 0.12 | 70.56 ± 0.29 | 62.67 ± 0.42 | 65.01 ± 0.33 |
Gemma 3 27B | 67.70 ± 0.12 | 82.56 ± 0.18 | 87.49 ± 0.69 | 45.07 ± 0.59 | 55.54 ± 0.05 | 71.77 ± 0.15 | 70.64 ± 0.18 | 63.20 ± 0.34 | 65.34 ± 0.23 |
Qwen 3 Next 80B MoE Alibaba | 66.48 ± 0.13 | 83.11 ± 0.19 | 87.27 ± 0.53 | 49.71 ± 0.67 | 51.76 ± 0.05 | 70.82 ± 0.13 | 73.20 ± 0.11 | 49.55 ± 0.31 | 66.38 ± 0.25 |
SEA-LION v3 (Llama) 70B AISG | 66.38 ± 0.17 | 87.63 ± 0.55 | 92.79 ± 0.63 | 26.57 ± 0.70 | 57.66 ± 0.07 | 72.94 ± 0.34 | 67.54 ± 0.38 | 54.65 ± 0.87 | 71.22 ± 0.34 |
SEA-LION v4 (Qwen) 32B AISG | 65.35 ± 0.14 | 82.94 ± 0.26 | 87.05 ± 0.62 | 38.46 ± 0.74 | 54.04 ± 0.05 | 73.68 ± 0.12 | 73.37 ± 0.18 | 52.93 ± 0.48 | 60.32 ± 0.14 |
Gemma 3 12B | 65.00 ± 0.11 | 82.03 ± 0.25 | 87.56 ± 0.68 | 42.67 ± 0.71 | 54.69 ± 0.06 | 70.81 ± 0.16 | 67.30 ± 0.17 | 56.23 ± 0.45 | 58.72 ± 0.19 |
Llama 3.3 70B Meta | 63.21 ± 0.11 | 83.08 ± 0.16 | 91.43 ± 0.33 | 14.99 ± 0.52 | 55.51 ± 0.07 | 73.40 ± 0.15 | 68.79 ± 0.19 | 49.95 ± 0.46 | 68.52 ± 0.24 |
Tulu 3 70B AI2 | 62.96 ± 0.24 | 82.12 ± 0.47 | 82.76 ± 0.81 | 27.63 ± 0.74 | 53.46 ± 0.10 | 74.86 ± 0.24 | 73.84 ± 0.32 | 47.35 ± 0.87 | 61.67 ± 0.52 |
Qwen 3 32B Alibaba | 62.23 ± 0.13 | 77.81 ± 0.38 | 82.32 ± 0.58 | 34.02 ± 0.78 | 51.44 ± 0.07 | 67.39 ± 0.24 | 72.85 ± 0.35 | 52.45 ± 0.44 | 59.58 ± 0.31 |
Qwen 2.5 72B Alibaba | 61.94 ± 0.13 | 82.11 ± 0.28 | 83.33 ± 0.68 | 28.75 ± 0.61 | 49.53 ± 0.05 | 67.88 ± 0.23 | 72.49 ± 0.25 | 43.55 ± 0.50 | 67.90 ± 0.32 |
Llama 4 Scout 109B MoE Meta | 61.84 ± 0.11 | 86.63 ± 0.00 | 92.76 ± 0.57 | 22.08 ± 0.65 | 54.96 ± 0.04 | 62.99 ± 0.09 | 66.60 ± 0.09 | 38.83 ± 0.17 | 69.90 ± 0.11 |
Llama 3.1 70B Meta | 61.82 ± 0.17 | 83.46 ± 0.52 | 86.19 ± 0.89 | 12.03 ± 0.50 | 56.99 ± 0.10 | 71.91 ± 0.32 | 67.39 ± 0.31 | 49.98 ± 0.77 | 66.61 ± 0.42 |
Qwen 3 30B MoE Alibaba | 61.39 ± 0.12 | 87.98 ± 0.16 | 83.84 ± 0.52 | 42.05 ± 0.54 | 50.60 ± 0.06 | 66.06 ± 0.13 | 68.75 ± 0.18 | 33.43 ± 0.11 | 58.43 ± 0.23 |
Gemma 2 27B | 61.19 ± 0.16 | 82.65 ± 0.35 | 77.40 ± 0.80 | 16.15 ± 0.65 | 56.00 ± 0.07 | 68.27 ± 0.29 | 71.20 ± 0.28 | 59.33 ± 0.43 | 58.55 ± 0.52 |
SEA-LION v3 (Gemma 2) 9B AISG | 60.75 ± 0.22 | 85.99 ± 0.56 | 84.38 ± 0.70 | 22.60 ± 0.67 | 54.56 ± 0.06 | 66.13 ± 0.35 | 71.44 ± 0.37 | 49.42 ± 0.54 | 51.50 ± 0.49 |
Mistral Large 2411 123B Mistral AI | 58.98 ± 0.15 | 82.90 ± 0.50 | 76.19 ± 1.03 | 16.93 ± 0.60 | 53.92 ± 0.11 | 67.63 ± 0.44 | 67.46 ± 0.43 | 50.85 ± 0.72 | 55.93 ± 0.44 |
Qwen 3 14B Alibaba | 57.37 ± 0.13 | 78.53 ± 0.31 | 77.52 ± 0.66 | 23.13 ± 0.53 | 50.19 ± 0.07 | 61.24 ± 0.21 | 70.68 ± 0.23 | 46.93 ± 0.23 | 50.74 ± 0.26 |
Qwen 2.5 32B Alibaba | 56.49 ± 0.16 | 78.61 ± 0.23 | 78.35 ± 0.99 | 14.70 ± 0.35 | 46.52 ± 0.09 | 59.49 ± 0.15 | 68.54 ± 0.19 | 46.73 ± 0.28 | 59.01 ± 0.21 |
Gemma 2 9B | 53.79 ± 0.17 | 76.23 ± 0.58 | 78.32 ± 0.93 | 10.96 ± 0.41 | 52.66 ± 0.10 | 60.91 ± 0.26 | 63.52 ± 0.47 | 36.77 ± 0.26 | 50.93 ± 0.41 |
Olmo 2 0325 32B AI2 | 53.00 ± 0.24 | 81.47 ± 0.60 | 69.17 ± 0.96 | 11.12 ± 0.42 | 46.45 ± 0.14 | 55.24 ± 0.37 | 69.10 ± 0.43 | 43.80 ± 0.65 | 47.60 ± 0.59 |
Llama 3 70B Meta | 52.32 ± 0.14 | 83.01 ± 0.38 | 30.60 ± 0.73 | 10.56 ± 0.58 | 55.48 ± 0.06 | 66.65 ± 0.18 | 64.82 ± 0.21 | 48.33 ± 0.52 | 59.07 ± 0.28 |
MERaLiON 2 10B A*STAR | 52.26 ± 0.20 | 76.42 ± 0.72 | 75.17 ± 1.05 | 9.20 ± 0.38 | 52.15 ± 0.11 | 62.30 ± 0.41 | 64.57 ± 0.27 | 28.78 ± 0.31 | 49.48 ± 0.41 |
Sailor2 20B SAIL | 51.82 ± 0.15 | 78.69 ± 0.00 | 40.92 ± 0.90 | 25.00 ± 0.69 | 25.88 ± 0.13 | 70.11 ± 0.12 | 74.42 ± 0.24 | 48.57 ± 0.36 | 50.98 ± 0.36 |
Qwen 2.5 14B Alibaba | 51.62 ± 0.12 | 74.19 ± 0.31 | 71.97 ± 0.95 | 10.70 ± 0.39 | 43.52 ± 0.07 | 51.72 ± 0.17 | 68.11 ± 0.22 | 44.63 ± 0.35 | 48.15 ± 0.19 |
Sailor2 8B SAIL | 51.41 ± 0.16 | 66.82 ± 0.41 | 40.63 ± 0.73 | 25.32 ± 0.67 | 50.27 ± 0.07 | 67.21 ± 0.24 | 65.53 ± 0.39 | 45.17 ± 0.28 | 50.31 ± 0.27 |
Qwen 3 8B Alibaba | 50.42 ± 0.16 | 72.37 ± 0.45 | 72.10 ± 0.86 | 17.93 ± 0.48 | 47.08 ± 0.07 | 40.04 ± 0.28 | 65.19 ± 0.17 | 42.43 ± 0.28 | 46.21 ± 0.34 |
Command A 03-2025 111B CohereLabs | 49.22 ± 0.21 | 72.14 ± 0.89 | 74.92 ± 0.88 | 35.50 ± 0.77 | 52.43 ± 0.05 | 44.47 ± 0.11 | 0.33 ± 0.26 | 52.12 ± 0.60 | 61.83 ± 0.36 |
SEA-LION v3 (Llama) 8B AISG | 49.07 ± 0.22 | 70.37 ± 0.64 | 71.30 ± 1.02 | 20.00 ± 0.74 | 53.19 ± 0.11 | 54.24 ± 0.49 | 59.40 ± 0.39 | 19.58 ± 0.46 | 44.50 ± 0.54 |
Aya Expanse 32B CohereLabs | 47.65 ± 0.13 | 66.20 ± 0.26 | 62.03 ± 0.83 | 8.42 ± 0.35 | 44.64 ± 0.07 | 53.71 ± 0.25 | 59.38 ± 0.45 | 44.55 ± 0.36 | 42.26 ± 0.17 |
Mistral Small 3.1 2503 24B Mistral AI | 47.51 ± 0.33 | 74.70 ± 0.77 | 59.65 ± 1.36 | 9.18 ± 0.53 | 46.49 ± 0.12 | 51.40 ± 0.71 | 65.33 ± 0.41 | 19.77 ± 1.50 | 53.55 ± 0.61 |
ERNIE 4.5 21B MoE Baidu | 45.78 ± 0.35 | 37.95 ± 1.51 | 73.65 ± 0.93 | 22.39 ± 0.75 | 51.99 ± 0.05 | 56.19 ± 0.50 | 45.27 ± 0.90 | 31.68 ± 0.36 | 47.11 ± 0.51 |
Command R+ 08-2024 104B CohereLabs | 45.26 ± 0.29 | 70.43 ± 0.91 | 54.63 ± 1.18 | 4.64 ± 0.32 | 48.63 ± 0.11 | 50.73 ± 0.44 | 60.74 ± 0.55 | 33.08 ± 1.14 | 39.15 ± 0.63 |
Llama 3.1 8B Meta | 39.63 ± 0.22 | 62.07 ± 0.71 | 57.05 ± 0.75 | 5.24 ± 0.38 | 51.94 ± 0.11 | 34.19 ± 0.71 | 53.04 ± 0.47 | 18.87 ± 0.75 | 34.69 ± 0.52 |
Command R 08-2024 32B CohereLabs | 38.24 ± 0.25 | 63.09 ± 1.07 | 50.73 ± 1.08 | 3.03 ± 0.28 | 40.73 ± 0.08 | 38.29 ± 0.93 | 45.52 ± 0.73 | 35.07 ± 1.02 | 29.49 ± 0.72 |
Qwen 2.5 7B Alibaba | 38.20 ± 0.16 | 51.56 ± 0.44 | 52.89 ± 0.98 | 7.18 ± 0.37 | 32.20 ± 0.09 | 37.20 ± 0.23 | 57.42 ± 0.15 | 29.22 ± 0.17 | 37.93 ± 0.23 |
Olmo 2 1124 13B AI2 | 35.13 ± 0.29 | 57.04 ± 0.95 | 60.86 ± 1.24 | 5.22 ± 0.40 | 33.28 ± 0.17 | 22.06 ± 0.51 | 32.31 ± 0.56 | 44.95 ± 0.99 | 25.37 ± 0.46 |
Apertus 70B Swiss AI | 34.68 ± 0.29 | 54.44 ± 1.56 | 55.84 ± 1.29 | 10.33 ± 0.61 | 47.20 ± 0.08 | 13.06 ± 0.95 | 49.91 ± 0.68 | 26.03 ± 1.24 | 20.63 ± 0.72 |
Tulu 3 8B AI2 | 33.94 ± 0.24 | 53.89 ± 0.86 | 65.78 ± 1.00 | 8.23 ± 0.44 | 23.82 ± 0.08 | 38.05 ± 0.54 | 27.52 ± 0.43 | 23.27 ± 0.64 | 30.92 ± 0.44 |
Llama 3 8B Meta | 30.16 ± 0.15 | 52.39 ± 0.48 | 20.25 ± 0.74 | 4.21 ± 0.29 | 48.49 ± 0.12 | 27.55 ± 0.56 | 49.12 ± 0.35 | 8.32 ± 0.42 | 30.93 ± 0.46 |
Babel 83B Alibaba-DAMO | 29.79 ± 0.41 | 48.59 ± 1.41 | 38.41 ± 1.38 | 2.92 ± 0.37 | 32.33 ± 0.18 | 36.05 ± 0.77 | 28.50 ± 0.94 | 25.30 ± 1.03 | 26.20 ± 0.91 |
phi-4 14B Microsoft | 29.72 ± 0.23 | 65.93 ± 0.90 | 52.92 ± 1.54 | 12.80 ± 0.35 | 36.22 ± 0.11 | 26.77 ± 0.45 | 0.00 ± 0.00 | 0.00 ± 0.00 | 43.15 ± 0.56 |
Aya Expanse 8B CohereLabs | 29.70 ± 0.16 | 49.27 ± 0.32 | 42.95 ± 0.87 | 2.72 ± 0.28 | 33.88 ± 0.08 | 24.77 ± 0.26 | 42.85 ± 0.34 | 17.12 ± 0.52 | 24.05 ± 0.38 |
Babel 9B Alibaba-DAMO | 27.56 ± 0.27 | 18.49 ± 1.68 | 43.81 ± 1.07 | 3.02 ± 0.34 | 46.30 ± 0.12 | 19.69 ± 0.75 | 54.89 ± 0.68 | 24.53 ± 0.51 | 9.79 ± 0.47 |
SeaLLMs V3 7B Alibaba-DAMO | 26.67 ± 0.34 | 52.02 ± 1.35 | 42.06 ± 1.05 | 6.91 ± 0.32 | 40.18 ± 0.12 | 16.06 ± 0.67 | 35.61 ± 0.92 | 12.10 ± 1.51 | 8.43 ± 0.52 |
Command R7B 12-2024 7B CohereLabs | 25.28 ± 0.33 | 35.33 ± 1.57 | 44.73 ± 1.26 | 1.70 ± 0.30 | 27.46 ± 0.08 | 11.44 ± 0.72 | 36.20 ± 0.78 | 26.00 ± 1.25 | 19.39 ± 0.69 |
Apertus 8B Swiss AI | 23.09 ± 0.39 | 15.44 ± 1.68 | 56.83 ± 1.32 | 3.84 ± 0.35 | 43.11 ± 0.12 | 7.31 ± 0.90 | 23.89 ± 0.72 | 26.53 ± 1.09 | 7.79 ± 0.63 |
Ministral 2410 8B Mistral AI | 19.24 ± 0.35 | 47.59 ± 1.39 | 20.83 ± 1.02 | 1.75 ± 0.28 | 25.30 ± 0.15 | 7.58 ± 0.83 | 25.12 ± 1.07 | 10.10 ± 1.89 | 15.65 ± 0.71 |
Olmo 2 1124 7B AI2 | 14.84 ± 0.19 | 0.00 ± 0.00 | 45.87 ± 1.41 | 2.07 ± 0.31 | 34.93 ± 0.11 | 2.96 ± 0.58 | 29.34 ± 0.79 | 0.00 ± 0.00 | 3.56 ± 0.35 |
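The page does not state how the overall TL score is derived, but the numbers in the table above are consistent with an unweighted mean of the eight competency scores. The sketch below checks that reading for two rows; treating TL as a simple mean is an inference from the displayed values, not a documented formula, and the tolerance only allows for two-decimal rounding.

```python
from statistics import fmean

# Competency scores copied from the table above, in column order:
# Cultural, Instruction Following, Multi-Turn Chat, NLG, NLR, NLU, Safety, Knowledge.
competencies = {
    "SEA-LION v4 (Gemma) 27B": [82.98, 88.86, 47.05, 55.60, 72.06, 70.56, 62.67, 65.01],
    "Gemma 3 27B": [82.56, 87.49, 45.07, 55.54, 71.77, 70.64, 63.20, 65.34],
}

# Overall TL scores as reported in the same rows.
reported_tl = {"SEA-LION v4 (Gemma) 27B": 68.10, "Gemma 3 27B": 67.70}


def overall_from_competencies(scores):
    """Assumed aggregation: unweighted mean across competencies."""
    return fmean(scores)


for model, scores in competencies.items():
    recomputed = overall_from_competencies(scores)
    print(f"{model}: recomputed={recomputed:.2f}, reported={reported_tl[model]:.2f}")
```

Under this reading, a single near-zero competency (for example a 0.00 in Safety or NLU) costs a model up to about 12.5 points of overall score, which helps explain some of the lower rankings.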
Filipino Tasks
Scores are averaged over 30 bootstraps; 95% confidence intervals are shown.
Model Size: ≤200B
Open instruct models only
Model | TL (overall) | Cultural | Kalahi |
---|---|---|---|
SEA-LION v4 (Gemma) 27B AISG | 68.10 ± 0.14 | 82.98 ± 0.17 | 82.98 ± 0.17 |
Gemma 3 27B | 67.70 ± 0.12 | 82.56 ± 0.18 | 82.56 ± 0.18 |
Qwen 3 Next 80B MoE Alibaba | 66.48 ± 0.13 | 83.11 ± 0.19 | 83.11 ± 0.19 |
SEA-LION v3 (Llama) 70B AISG | 66.38 ± 0.17 | 87.63 ± 0.55 | 87.63 ± 0.55 |
SEA-LION v4 (Qwen) 32B AISG | 65.35 ± 0.14 | 82.94 ± 0.26 | 82.94 ± 0.26 |
Gemma 3 12B | 65.00 ± 0.11 | 82.03 ± 0.25 | 82.03 ± 0.25 |
Llama 3.3 70B Meta | 63.21 ± 0.11 | 83.08 ± 0.16 | 83.08 ± 0.16 |
Tulu 3 70B AI2 | 62.96 ± 0.24 | 82.12 ± 0.47 | 82.12 ± 0.47 |
Qwen 3 32B Alibaba | 62.23 ± 0.13 | 77.81 ± 0.38 | 77.81 ± 0.38 |
Qwen 2.5 72B Alibaba | 61.94 ± 0.13 | 82.11 ± 0.28 | 82.11 ± 0.28 |
Llama 4 Scout 109B MoE Meta | 61.84 ± 0.11 | 86.63 ± 0.00 | 86.63 ± 0.00 |
Llama 3.1 70B Meta | 61.82 ± 0.17 | 83.46 ± 0.52 | 83.46 ± 0.52 |
Qwen 3 30B MoE Alibaba | 61.39 ± 0.12 | 87.98 ± 0.16 | 87.98 ± 0.16 |
Gemma 2 27B | 61.19 ± 0.16 | 82.65 ± 0.35 | 82.65 ± 0.35 |
SEA-LION v3 (Gemma 2) 9B AISG | 60.75 ± 0.22 | 85.99 ± 0.56 | 85.99 ± 0.56 |
Mistral Large 2411 123B Mistral AI | 58.98 ± 0.15 | 82.90 ± 0.50 | 82.90 ± 0.50 |
Qwen 3 14B Alibaba | 57.37 ± 0.13 | 78.53 ± 0.31 | 78.53 ± 0.31 |
Qwen 2.5 32B Alibaba | 56.49 ± 0.16 | 78.61 ± 0.23 | 78.61 ± 0.23 |
Gemma 2 9B | 53.79 ± 0.17 | 76.23 ± 0.58 | 76.23 ± 0.58 |
Olmo 2 0325 32B AI2 | 53.00 ± 0.24 | 81.47 ± 0.60 | 81.47 ± 0.60 |
Llama 3 70B Meta | 52.32 ± 0.14 | 83.01 ± 0.38 | 83.01 ± 0.38 |
MERaLiON 2 10B A*STAR | 52.26 ± 0.20 | 76.42 ± 0.72 | 76.42 ± 0.72 |
Sailor2 20B SAIL | 51.82 ± 0.15 | 78.69 ± 0.00 | 78.69 ± 0.00 |
Qwen 2.5 14B Alibaba | 51.62 ± 0.12 | 74.19 ± 0.31 | 74.19 ± 0.31 |
Sailor2 8B SAIL | 51.41 ± 0.16 | 66.82 ± 0.41 | 66.82 ± 0.41 |
Qwen 3 8B Alibaba | 50.42 ± 0.16 | 72.37 ± 0.45 | 72.37 ± 0.45 |
Command A 03-2025 111B CohereLabs | 49.22 ± 0.21 | 72.14 ± 0.89 | 72.14 ± 0.89 |
SEA-LION v3 (Llama) 8B AISG | 49.07 ± 0.22 | 70.37 ± 0.64 | 70.37 ± 0.64 |
Aya Expanse 32B CohereLabs | 47.65 ± 0.13 | 66.20 ± 0.26 | 66.20 ± 0.26 |
Mistral Small 3.1 2503 24B Mistral AI | 47.51 ± 0.33 | 74.70 ± 0.77 | 74.70 ± 0.77 |
ERNIE 4.5 21B MoE Baidu | 45.78 ± 0.35 | 37.95 ± 1.51 | 37.95 ± 1.51 |
Command R+ 08-2024 104B CohereLabs | 45.26 ± 0.29 | 70.43 ± 0.91 | 70.43 ± 0.91 |
Llama 3.1 8B Meta | 39.63 ± 0.22 | 62.07 ± 0.71 | 62.07 ± 0.71 |
Command R 08-2024 32B CohereLabs | 38.24 ± 0.25 | 63.09 ± 1.07 | 63.09 ± 1.07 |
Qwen 2.5 7B Alibaba | 38.20 ± 0.16 | 51.56 ± 0.44 | 51.56 ± 0.44 |
Olmo 2 1124 13B AI2 | 35.13 ± 0.29 | 57.04 ± 0.95 | 57.04 ± 0.95 |
Apertus 70B Swiss AI | 34.68 ± 0.29 | 54.44 ± 1.56 | 54.44 ± 1.56 |
Tulu 3 8B AI2 | 33.94 ± 0.24 | 53.89 ± 0.86 | 53.89 ± 0.86 |
Llama 3 8B Meta | 30.16 ± 0.15 | 52.39 ± 0.48 | 52.39 ± 0.48 |
Babel 83B Alibaba-DAMO | 29.79 ± 0.41 | 48.59 ± 1.41 | 48.59 ± 1.41 |
phi-4 14B Microsoft | 29.72 ± 0.23 | 65.93 ± 0.90 | 65.93 ± 0.90 |
Aya Expanse 8B CohereLabs | 29.70 ± 0.16 | 49.27 ± 0.32 | 49.27 ± 0.32 |
Babel 9B Alibaba-DAMO | 27.56 ± 0.27 | 18.49 ± 1.68 | 18.49 ± 1.68 |
SeaLLMs V3 7B Alibaba-DAMO | 26.67 ± 0.34 | 52.02 ± 1.35 | 52.02 ± 1.35 |
Command R7B 12-2024 7B CohereLabs | 25.28 ± 0.33 | 35.33 ± 1.57 | 35.33 ± 1.57 |
Apertus 8B Swiss AI | 23.09 ± 0.39 | 15.44 ± 1.68 | 15.44 ± 1.68 |
Ministral 2410 8B Mistral AI | 19.24 ± 0.35 | 47.59 ± 1.39 | 47.59 ± 1.39 |
Olmo 2 1124 7B AI2 | 14.84 ± 0.19 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Model | TL (overall) | Instruction Following | SEA-IFEval |
---|---|---|---|
SEA-LION v4 (Gemma) 27B AISG | 68.10 ± 0.14 | 88.86 ± 0.75 | 88.86 ± 0.75 |
Gemma 3 27B | 67.70 ± 0.12 | 87.49 ± 0.69 | 87.49 ± 0.69 |
Qwen 3 Next 80B MoE Alibaba | 66.48 ± 0.13 | 87.27 ± 0.53 | 87.27 ± 0.53 |
SEA-LION v3 (Llama) 70B AISG | 66.38 ± 0.17 | 92.79 ± 0.63 | 92.79 ± 0.63 |
SEA-LION v4 (Qwen) 32B AISG | 65.35 ± 0.14 | 87.05 ± 0.62 | 87.05 ± 0.62 |
Gemma 3 12B | 65.00 ± 0.11 | 87.56 ± 0.68 | 87.56 ± 0.68 |
Llama 3.3 70B Meta | 63.21 ± 0.11 | 91.43 ± 0.33 | 91.43 ± 0.33 |
Tulu 3 70B AI2 | 62.96 ± 0.24 | 82.76 ± 0.81 | 82.76 ± 0.81 |
Qwen 3 32B Alibaba | 62.23 ± 0.13 | 82.32 ± 0.58 | 82.32 ± 0.58 |
Qwen 2.5 72B Alibaba | 61.94 ± 0.13 | 83.33 ± 0.68 | 83.33 ± 0.68 |
Llama 4 Scout 109B MoE Meta | 61.84 ± 0.11 | 92.76 ± 0.57 | 92.76 ± 0.57 |
Llama 3.1 70B Meta | 61.82 ± 0.17 | 86.19 ± 0.89 | 86.19 ± 0.89 |
Qwen 3 30B MoE Alibaba | 61.39 ± 0.12 | 83.84 ± 0.52 | 83.84 ± 0.52 |
Gemma 2 27B | 61.19 ± 0.16 | 77.40 ± 0.80 | 77.40 ± 0.80 |
SEA-LION v3 (Gemma 2) 9B AISG | 60.75 ± 0.22 | 84.38 ± 0.70 | 84.38 ± 0.70 |
Mistral Large 2411 123B Mistral AI | 58.98 ± 0.15 | 76.19 ± 1.03 | 76.19 ± 1.03 |
Qwen 3 14B Alibaba | 57.37 ± 0.13 | 77.52 ± 0.66 | 77.52 ± 0.66 |
Qwen 2.5 32B Alibaba | 56.49 ± 0.16 | 78.35 ± 0.99 | 78.35 ± 0.99 |
Gemma 2 9B | 53.79 ± 0.17 | 78.32 ± 0.93 | 78.32 ± 0.93 |
Olmo 2 0325 32B AI2 | 53.00 ± 0.24 | 69.17 ± 0.96 | 69.17 ± 0.96 |
Llama 3 70B Meta | 52.32 ± 0.14 | 30.60 ± 0.73 | 30.60 ± 0.73 |
MERaLiON 2 10B A*STAR | 52.26 ± 0.20 | 75.17 ± 1.05 | 75.17 ± 1.05 |
Sailor2 20B SAIL | 51.82 ± 0.15 | 40.92 ± 0.90 | 40.92 ± 0.90 |
Qwen 2.5 14B Alibaba | 51.62 ± 0.12 | 71.97 ± 0.95 | 71.97 ± 0.95 |
Sailor2 8B SAIL | 51.41 ± 0.16 | 40.63 ± 0.73 | 40.63 ± 0.73 |
Qwen 3 8B Alibaba | 50.42 ± 0.16 | 72.10 ± 0.86 | 72.10 ± 0.86 |
Command A 03-2025 111B CohereLabs | 49.22 ± 0.21 | 74.92 ± 0.88 | 74.92 ± 0.88 |
SEA-LION v3 (Llama) 8B AISG | 49.07 ± 0.22 | 71.30 ± 1.02 | 71.30 ± 1.02 |
Aya Expanse 32B CohereLabs | 47.65 ± 0.13 | 62.03 ± 0.83 | 62.03 ± 0.83 |
Mistral Small 3.1 2503 24B Mistral AI | 47.51 ± 0.33 | 59.65 ± 1.36 | 59.65 ± 1.36 |
ERNIE 4.5 21B MoE Baidu | 45.78 ± 0.35 | 73.65 ± 0.93 | 73.65 ± 0.93 |
Command R+ 08-2024 104B CohereLabs | 45.26 ± 0.29 | 54.63 ± 1.18 | 54.63 ± 1.18 |
Llama 3.1 8B Meta | 39.63 ± 0.22 | 57.05 ± 0.75 | 57.05 ± 0.75 |
Command R 08-2024 32B CohereLabs | 38.24 ± 0.25 | 50.73 ± 1.08 | 50.73 ± 1.08 |
Qwen 2.5 7B Alibaba | 38.20 ± 0.16 | 52.89 ± 0.98 | 52.89 ± 0.98 |
Olmo 2 1124 13B AI2 | 35.13 ± 0.29 | 60.86 ± 1.24 | 60.86 ± 1.24 |
Apertus 70B Swiss AI | 34.68 ± 0.29 | 55.84 ± 1.29 | 55.84 ± 1.29 |
Tulu 3 8B AI2 | 33.94 ± 0.24 | 65.78 ± 1.00 | 65.78 ± 1.00 |
Llama 3 8B Meta | 30.16 ± 0.15 | 20.25 ± 0.74 | 20.25 ± 0.74 |
Babel 83B Alibaba-DAMO | 29.79 ± 0.41 | 38.41 ± 1.38 | 38.41 ± 1.38 |
phi-4 14B Microsoft | 29.72 ± 0.23 | 52.92 ± 1.54 | 52.92 ± 1.54 |
Aya Expanse 8B CohereLabs | 29.70 ± 0.16 | 42.95 ± 0.87 | 42.95 ± 0.87 |
Babel 9B Alibaba-DAMO | 27.56 ± 0.27 | 43.81 ± 1.07 | 43.81 ± 1.07 |
SeaLLMs V3 7B Alibaba-DAMO | 26.67 ± 0.34 | 42.06 ± 1.05 | 42.06 ± 1.05 |
Command R7B 12-2024 7B CohereLabs | 25.28 ± 0.33 | 44.73 ± 1.26 | 44.73 ± 1.26 |
Apertus 8B Swiss AI | 23.09 ± 0.39 | 56.83 ± 1.32 | 56.83 ± 1.32 |
Ministral 2410 8B Mistral AI | 19.24 ± 0.35 | 20.83 ± 1.02 | 20.83 ± 1.02 |
Olmo 2 1124 7B AI2 | 14.84 ± 0.19 | 45.87 ± 1.41 | 45.87 ± 1.41 |
Model | TL (overall) | Multi-Turn Chat | SEA-MT-Bench |
---|---|---|---|
SEA-LION v4 (Gemma) 27B AISG | 68.10 ± 0.14 | 47.05 ± 0.62 | 47.05 ± 0.62 |
Gemma 3 27B | 67.70 ± 0.12 | 45.07 ± 0.59 | 45.07 ± 0.59 |
Qwen 3 Next 80B MoE Alibaba | 66.48 ± 0.13 | 49.71 ± 0.67 | 49.71 ± 0.67 |
SEA-LION v3 (Llama) 70B AISG | 66.38 ± 0.17 | 26.57 ± 0.70 | 26.57 ± 0.70 |
SEA-LION v4 (Qwen) 32B AISG | 65.35 ± 0.14 | 38.46 ± 0.74 | 38.46 ± 0.74 |
Gemma 3 12B | 65.00 ± 0.11 | 42.67 ± 0.71 | 42.67 ± 0.71 |
Llama 3.3 70B Meta | 63.21 ± 0.11 | 14.99 ± 0.52 | 14.99 ± 0.52 |
Tulu 3 70B AI2 | 62.96 ± 0.24 | 27.63 ± 0.74 | 27.63 ± 0.74 |
Qwen 3 32B Alibaba | 62.23 ± 0.13 | 34.02 ± 0.78 | 34.02 ± 0.78 |
Qwen 2.5 72B Alibaba | 61.94 ± 0.13 | 28.75 ± 0.61 | 28.75 ± 0.61 |
Llama 4 Scout 109B MoE Meta | 61.84 ± 0.11 | 22.08 ± 0.65 | 22.08 ± 0.65 |
Llama 3.1 70B Meta | 61.82 ± 0.17 | 12.03 ± 0.50 | 12.03 ± 0.50 |
Qwen 3 30B MoE Alibaba | 61.39 ± 0.12 | 42.05 ± 0.54 | 42.05 ± 0.54 |
Gemma 2 27B | 61.19 ± 0.16 | 16.15 ± 0.65 | 16.15 ± 0.65 |
SEA-LION v3 (Gemma 2) 9B AISG | 60.75 ± 0.22 | 22.60 ± 0.67 | 22.60 ± 0.67 |
Mistral Large 2411 123B Mistral AI | 58.98 ± 0.15 | 16.93 ± 0.60 | 16.93 ± 0.60 |
Qwen 3 14B Alibaba | 57.37 ± 0.13 | 23.13 ± 0.53 | 23.13 ± 0.53 |
Qwen 2.5 32B Alibaba | 56.49 ± 0.16 | 14.70 ± 0.35 | 14.70 ± 0.35 |
Gemma 2 9B | 53.79 ± 0.17 | 10.96 ± 0.41 | 10.96 ± 0.41 |
Olmo 2 0325 32B AI2 | 53.00 ± 0.24 | 11.12 ± 0.42 | 11.12 ± 0.42 |
Llama 3 70B Meta | 52.32 ± 0.14 | 10.56 ± 0.58 | 10.56 ± 0.58 |
MERaLiON 2 10B A*STAR | 52.26 ± 0.20 | 9.20 ± 0.38 | 9.20 ± 0.38 |
Sailor2 20B SAIL | 51.82 ± 0.15 | 25.00 ± 0.69 | 25.00 ± 0.69 |
Qwen 2.5 14B Alibaba | 51.62 ± 0.12 | 10.70 ± 0.39 | 10.70 ± 0.39 |
Sailor2 8B SAIL | 51.41 ± 0.16 | 25.32 ± 0.67 | 25.32 ± 0.67 |
Qwen 3 8B Alibaba | 50.42 ± 0.16 | 17.93 ± 0.48 | 17.93 ± 0.48 |
Command A 03-2025 111B CohereLabs | 49.22 ± 0.21 | 35.50 ± 0.77 | 35.50 ± 0.77 |
SEA-LION v3 (Llama) 8B AISG | 49.07 ± 0.22 | 20.00 ± 0.74 | 20.00 ± 0.74 |
Aya Expanse 32B CohereLabs | 47.65 ± 0.13 | 8.42 ± 0.35 | 8.42 ± 0.35 |
Mistral Small 3.1 2503 24B Mistral AI | 47.51 ± 0.33 | 9.18 ± 0.53 | 9.18 ± 0.53 |
ERNIE 4.5 21B MoE Baidu | 45.78 ± 0.35 | 22.39 ± 0.75 | 22.39 ± 0.75 |
Command R+ 08-2024 104B CohereLabs | 45.26 ± 0.29 | 4.64 ± 0.32 | 4.64 ± 0.32 |
Llama 3.1 8B Meta | 39.63 ± 0.22 | 5.24 ± 0.38 | 5.24 ± 0.38 |
Command R 08-2024 32B CohereLabs | 38.24 ± 0.25 | 3.03 ± 0.28 | 3.03 ± 0.28 |
Qwen 2.5 7B Alibaba | 38.20 ± 0.16 | 7.18 ± 0.37 | 7.18 ± 0.37 |
Olmo 2 1124 13B AI2 | 35.13 ± 0.29 | 5.22 ± 0.40 | 5.22 ± 0.40 |
Apertus 70B Swiss AI | 34.68 ± 0.29 | 10.33 ± 0.61 | 10.33 ± 0.61 |
Tulu 3 8B AI2 | 33.94 ± 0.24 | 8.23 ± 0.44 | 8.23 ± 0.44 |
Llama 3 8B Meta | 30.16 ± 0.15 | 4.21 ± 0.29 | 4.21 ± 0.29 |
Babel 83B Alibaba-DAMO | 29.79 ± 0.41 | 2.92 ± 0.37 | 2.92 ± 0.37 |
phi-4 14B Microsoft | 29.72 ± 0.23 | 12.80 ± 0.35 | 12.80 ± 0.35 |
Aya Expanse 8B CohereLabs | 29.70 ± 0.16 | 2.72 ± 0.28 | 2.72 ± 0.28 |
Babel 9B Alibaba-DAMO | 27.56 ± 0.27 | 3.02 ± 0.34 | 3.02 ± 0.34 |
SeaLLMs V3 7B Alibaba-DAMO | 26.67 ± 0.34 | 6.91 ± 0.32 | 6.91 ± 0.32 |
Command R7B 12-2024 7B CohereLabs | 25.28 ± 0.33 | 1.70 ± 0.30 | 1.70 ± 0.30 |
Apertus 8B Swiss AI | 23.09 ± 0.39 | 3.84 ± 0.35 | 3.84 ± 0.35 |
Ministral 2410 8B Mistral AI | 19.24 ± 0.35 | 1.75 ± 0.28 | 1.75 ± 0.28 |
Olmo 2 1124 7B AI2 | 14.84 ± 0.19 | 2.07 ± 0.31 | 2.07 ± 0.31 |
Model | TL (overall) | NLG | Summarization | Translations |
---|---|---|---|---|
SEA-LION v4 (Gemma) 27B AISG | 68.10 ± 0.14 | 55.60 ± 0.05 | 19.95 ± 0.09 | 91.25 ± 0.03 |
Gemma 3 27B | 67.70 ± 0.12 | 55.54 ± 0.05 | 19.70 ± 0.10 | 91.38 ± 0.02 |
Qwen 3 Next 80B MoE Alibaba | 66.48 ± 0.13 | 51.76 ± 0.05 | 19.00 ± 0.09 | 84.53 ± 0.05 |
SEA-LION v3 (Llama) 70B AISG | 66.38 ± 0.17 | 57.66 ± 0.07 | 25.46 ± 0.15 | 89.87 ± 0.03 |
SEA-LION v4 (Qwen) 32B AISG | 65.35 ± 0.14 | 54.04 ± 0.05 | 21.12 ± 0.09 | 86.96 ± 0.06 |
Gemma 3 12B | 65.00 ± 0.11 | 54.69 ± 0.06 | 19.17 ± 0.11 | 90.22 ± 0.02 |
Llama 3.3 70B Meta | 63.21 ± 0.11 | 55.51 ± 0.07 | 24.29 ± 0.13 | 86.74 ± 0.04 |
Tulu 3 70B AI2 | 62.96 ± 0.24 | 53.46 ± 0.10 | 18.30 ± 0.20 | 88.63 ± 0.05 |
Qwen 3 32B Alibaba | 62.23 ± 0.13 | 51.44 ± 0.07 | 21.21 ± 0.13 | 81.68 ± 0.08 |
Qwen 2.5 72B Alibaba | 61.94 ± 0.13 | 49.53 ± 0.05 | 18.14 ± 0.07 | 80.93 ± 0.08 |
Llama 4 Scout 109B MoE Meta | 61.84 ± 0.11 | 54.96 ± 0.04 | 21.22 ± 0.08 | 88.69 ± 0.02 |
Llama 3.1 70B Meta | 61.82 ± 0.17 | 56.99 ± 0.10 | 26.59 ± 0.19 | 87.40 ± 0.05 |
Qwen 3 30B MoE Alibaba | 61.39 ± 0.12 | 50.60 ± 0.06 | 18.17 ± 0.09 | 83.04 ± 0.05 |
Gemma 2 27B | 61.19 ± 0.16 | 56.00 ± 0.07 | 23.40 ± 0.13 | 88.61 ± 0.05 |
SEA-LION v3 (Gemma 2) 9B AISG | 60.75 ± 0.22 | 54.56 ± 0.06 | 21.19 ± 0.12 | 87.93 ± 0.05 |
Mistral Large 2411 123B Mistral AI | 58.98 ± 0.15 | 53.92 ± 0.11 | 22.59 ± 0.17 | 85.24 ± 0.07 |
Qwen 3 14B Alibaba | 57.37 ± 0.13 | 50.19 ± 0.07 | 19.97 ± 0.10 | 80.40 ± 0.07 |
Qwen 2.5 32B Alibaba | 56.49 ± 0.16 | 46.52 ± 0.09 | 20.58 ± 0.13 | 72.46 ± 0.09 |
Gemma 2 9B | 53.79 ± 0.17 | 52.66 ± 0.10 | 21.69 ± 0.19 | 83.63 ± 0.07 |
Olmo 2 0325 32B AI2 | 53.00 ± 0.24 | 46.45 ± 0.14 | 14.64 ± 0.24 | 78.27 ± 0.10 |
Llama 3 70B Meta | 52.32 ± 0.14 | 55.48 ± 0.06 | 24.70 ± 0.12 | 86.25 ± 0.04 |
MERaLiON 2 10B A*STAR | 52.26 ± 0.20 | 52.15 ± 0.11 | 22.01 ± 0.18 | 82.29 ± 0.08 |
Sailor2 20B SAIL | 51.82 ± 0.15 | 25.88 ± 0.13 | 17.46 ± 0.09 | 34.30 ± 0.24 |
Qwen 2.5 14B Alibaba | 51.62 ± 0.12 | 43.52 ± 0.07 | 18.28 ± 0.11 | 68.76 ± 0.10 |
Sailor2 8B SAIL | 51.41 ± 0.16 | 50.27 ± 0.07 | 18.64 ± 0.07 | 81.91 ± 0.11 |
Qwen 3 8B Alibaba | 50.42 ± 0.16 | 47.08 ± 0.07 | 19.93 ± 0.11 | 74.23 ± 0.11 |
Command A 03-2025 111B CohereLabs | 49.22 ± 0.21 | 52.43 ± 0.05 | 19.60 ± 0.10 | 85.25 ± 0.06 |
SEA-LION v3 (Llama) 8B AISG | 49.07 ± 0.22 | 53.19 ± 0.11 | 20.28 ± 0.21 | 86.10 ± 0.06 |
Aya Expanse 32B CohereLabs | 47.65 ± 0.13 | 44.64 ± 0.07 | 18.20 ± 0.10 | 71.09 ± 0.10 |
Mistral Small 3.1 2503 24B Mistral AI | 47.51 ± 0.33 | 46.49 ± 0.12 | 19.01 ± 0.19 | 73.97 ± 0.14 |
ERNIE 4.5 21B MoE Baidu | 45.78 ± 0.35 | 51.99 ± 0.05 | 16.67 ± 0.09 | 87.31 ± 0.04 |
Command R+ 08-2024 104B CohereLabs | 45.26 ± 0.29 | 48.63 ± 0.11 | 23.32 ± 0.21 | 73.95 ± 0.13 |
Llama 3.1 8B Meta | 39.63 ± 0.22 | 51.94 ± 0.11 | 25.71 ± 0.19 | 78.17 ± 0.09 |
Command R 08-2024 32B CohereLabs | 38.24 ± 0.25 | 40.73 ± 0.08 | 15.40 ± 0.14 | 66.06 ± 0.11 |
Qwen 2.5 7B Alibaba | 38.20 ± 0.16 | 32.20 ± 0.09 | 7.76 ± 0.15 | 56.64 ± 0.10 |
Olmo 2 1124 13B AI2 | 35.13 ± 0.29 | 33.28 ± 0.17 | 6.45 ± 0.25 | 60.11 ± 0.20 |
Apertus 70B Swiss AI | 34.68 ± 0.29 | 47.20 ± 0.08 | 15.39 ± 0.12 | 79.01 ± 0.13 |
Tulu 3 8B AI2 | 33.94 ± 0.24 | 23.82 ± 0.08 | 14.97 ± 0.09 | 32.67 ± 0.13 |
Llama 3 8B Meta | 30.16 ± 0.15 | 48.49 ± 0.12 | 22.80 ± 0.20 | 74.19 ± 0.12 |
Babel 83B Alibaba-DAMO | 29.79 ± 0.41 | 32.33 ± 0.18 | 15.12 ± 0.22 | 49.54 ± 0.31 |
phi-4 14B Microsoft | 29.72 ± 0.23 | 36.22 ± 0.11 | 14.41 ± 0.10 | 58.03 ± 0.17 |
Aya Expanse 8B CohereLabs | 29.70 ± 0.16 | 33.88 ± 0.08 | 14.32 ± 0.16 | 53.44 ± 0.09 |
Babel 9B Alibaba-DAMO | 27.56 ± 0.27 | 46.30 ± 0.12 | 16.53 ± 0.14 | 76.07 ± 0.17 |
SeaLLMs V3 7B Alibaba-DAMO | 26.67 ± 0.34 | 40.18 ± 0.12 | 16.00 ± 0.15 | 64.36 ± 0.17 |
Command R7B 12-2024 7B CohereLabs | 25.28 ± 0.33 | 27.46 ± 0.08 | 13.77 ± 0.12 | 41.15 ± 0.11 |
Apertus 8B Swiss AI | 23.09 ± 0.39 | 43.11 ± 0.12 | 17.02 ± 0.13 | 69.19 ± 0.20 |
Ministral 2410 8B Mistral AI | 19.24 ± 0.35 | 25.30 ± 0.15 | 12.14 ± 0.22 | 38.46 ± 0.23 |
Olmo 2 1124 7B AI2 | 14.84 ± 0.19 | 34.93 ± 0.11 | 14.14 ± 0.10 | 55.71 ± 0.18 |
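In the table above, each NLG competency score appears to equal the unweighted mean of its two task scores (Summarization and Translations). This is an observation from the displayed values rather than a documented rule; the helper name `nlg_from_tasks` is hypothetical, and the tolerance allows for two-decimal rounding.

```python
from statistics import fmean

# (Summarization, Translations, reported NLG) copied from the table above.
rows = {
    "SEA-LION v4 (Gemma) 27B": (19.95, 91.25, 55.60),
    "Sailor2 20B": (17.46, 34.30, 25.88),
}


def nlg_from_tasks(summarization, translations):
    """Assumed aggregation: simple mean of the two NLG task scores."""
    return fmean([summarization, translations])


for model, (summ, trans, reported) in rows.items():
    recomputed = nlg_from_tasks(summ, trans)
    print(f"{model}: recomputed={recomputed:.2f}, reported={reported:.2f}")
```

Because the two tasks sit on very different ranges here (Summarization mostly below 27, Translations mostly above 70), the NLG score is driven largely by translation quality.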
Model | TL (overall) | NLR | Causal Reasoning | Natural Language Inference |
---|---|---|---|---|
SEA-LION v4 (Gemma) 27B AISG | 68.10 ± 0.14 | 72.06 ± 0.12 | 88.39 ± 0.21 | 55.73 ± 0.21 |
Gemma 3 27B | 67.70 ± 0.12 | 71.77 ± 0.15 | 88.34 ± 0.19 | 55.21 ± 0.21 |
Qwen 3 Next 80B MoE Alibaba | 66.48 ± 0.13 | 70.82 ± 0.13 | 83.79 ± 0.21 | 57.85 ± 0.21 |
SEA-LION v3 (Llama) 70B AISG | 66.38 ± 0.17 | 72.94 ± 0.34 | 86.02 ± 0.44 | 59.87 ± 0.58 |
SEA-LION v4 (Qwen) 32B AISG | 65.35 ± 0.14 | 73.68 ± 0.12 | 83.59 ± 0.13 | 63.77 ± 0.17 |
Gemma 3 12B | 65.00 ± 0.11 | 70.81 ± 0.16 | 86.80 ± 0.16 | 54.82 ± 0.28 |
Llama 3.3 70B Meta | 63.21 ± 0.11 | 73.40 ± 0.15 | 87.54 ± 0.17 | 59.27 ± 0.23 |
Tulu 3 70B AI2 | 62.96 ± 0.24 | 74.86 ± 0.24 | 86.80 ± 0.34 | 62.92 ± 0.30 |
Qwen 3 32B Alibaba | 62.23 ± 0.13 | 67.39 ± 0.24 | 74.46 ± 0.37 | 60.32 ± 0.29 |
Qwen 2.5 72B Alibaba | 61.94 ± 0.13 | 67.88 ± 0.23 | 84.09 ± 0.21 | 51.67 ± 0.40 |
Llama 4 Scout 109B MoE Meta | 61.84 ± 0.11 | 62.99 ± 0.09 | 87.07 ± 0.15 | 38.90 ± 0.11 |
Llama 3.1 70B Meta | 61.82 ± 0.17 | 71.91 ± 0.32 | 86.52 ± 0.39 | 57.29 ± 0.43 |
Qwen 3 30B MoE Alibaba | 61.39 ± 0.12 | 66.06 ± 0.13 | 78.89 ± 0.15 | 53.24 ± 0.20 |
Gemma 2 27B | 61.19 ± 0.16 | 68.27 ± 0.29 | 85.76 ± 0.26 | 50.78 ± 0.55 |
SEA-LION v3 (Gemma 2) 9B AISG | 60.75 ± 0.22 | 66.13 ± 0.35 | 87.14 ± 0.38 | 45.12 ± 0.53 |
Mistral Large 2411 123B Mistral AI | 58.98 ± 0.15 | 67.63 ± 0.44 | 84.89 ± 0.60 | 50.37 ± 0.59 |
Qwen 3 14B Alibaba | 57.37 ± 0.13 | 61.24 ± 0.21 | 71.83 ± 0.36 | 50.65 ± 0.20 |
Qwen 2.5 32B Alibaba | 56.49 ± 0.16 | 59.49 ± 0.15 | 70.27 ± 0.17 | 48.70 ± 0.23 |
Gemma 2 9B | 53.79 ± 0.17 | 60.91 ± 0.26 | 84.53 ± 0.43 | 37.30 ± 0.46 |
Olmo 2 0325 32B AI2 | 53.00 ± 0.24 | 55.24 ± 0.37 | 72.78 ± 0.70 | 37.71 ± 0.42 |
Llama 3 70B Meta | 52.32 ± 0.14 | 66.65 ± 0.18 | 80.60 ± 0.24 | 52.70 ± 0.28 |
MERaLiON 2 10B A*STAR | 52.26 ± 0.20 | 62.30 ± 0.41 | 85.57 ± 0.37 | 39.03 ± 0.62 |
Sailor2 20B SAIL | 51.82 ± 0.15 | 70.11 ± 0.12 | 83.28 ± 0.11 | 56.94 ± 0.18 |
Qwen 2.5 14B Alibaba | 51.62 ± 0.12 | 51.72 ± 0.17 | 67.21 ± 0.23 | 36.23 ± 0.19 |
Sailor2 8B SAIL | 51.41 ± 0.16 | 67.21 ± 0.24 | 83.54 ± 0.23 | 50.89 ± 0.45 |
Qwen 3 8B Alibaba | 50.42 ± 0.16 | 40.04 ± 0.28 | 54.45 ± 0.49 | 25.63 ± 0.29 |
Command A 03-2025 111B CohereLabs | 49.22 ± 0.21 | 44.47 ± 0.11 | 88.93 ± 0.22 | 0.00 ± 0.00 |
SEA-LION v3 (Llama) 8B AISG | 49.07 ± 0.22 | 54.24 ± 0.49 | 70.13 ± 0.70 | 38.35 ± 0.72 |
Aya Expanse 32B CohereLabs | 47.65 ± 0.13 | 53.71 ± 0.25 | 69.55 ± 0.28 | 37.88 ± 0.42 |
Mistral Small 3.1 2503 24B Mistral AI | 47.51 ± 0.33 | 51.40 ± 0.71 | 56.63 ± 1.20 | 46.18 ± 0.56 |
ERNIE 4.5 21B MoE Baidu | 45.78 ± 0.35 | 56.19 ± 0.50 | 71.31 ± 0.79 | 41.08 ± 0.65 |
Command R+ 08-2024 104B CohereLabs | 45.26 ± 0.29 | 50.73 ± 0.44 | 69.54 ± 0.73 | 31.92 ± 0.70 |
Llama 3.1 8B Meta | 39.63 ± 0.22 | 34.19 ± 0.71 | 45.57 ± 1.04 | 22.81 ± 0.92 |
Command R 08-2024 32B CohereLabs | 38.24 ± 0.25 | 38.29 ± 0.93 | 57.78 ± 1.41 | 18.81 ± 0.94 |
Qwen 2.5 7B Alibaba | 38.20 ± 0.16 | 37.20 ± 0.23 | 36.17 ± 0.23 | 38.23 ± 0.34 |
Olmo 2 1124 13B AI2 | 35.13 ± 0.29 | 22.06 ± 0.51 | 30.94 ± 0.92 | 13.17 ± 0.58 |
Apertus 70B Swiss AI | 34.68 ± 0.29 | 13.06 ± 0.95 | 22.62 ± 1.57 | 3.50 ± 0.98 |
Tulu 3 8B AI2 | 33.94 ± 0.24 | 38.05 ± 0.54 | 52.14 ± 0.64 | 23.97 ± 0.91 |
Llama 3 8B Meta | 30.16 ± 0.15 | 27.55 ± 0.56 | 36.49 ± 0.99 | 18.61 ± 0.64 |
Babel 83B Alibaba-DAMO | 29.79 ± 0.41 | 36.05 ± 0.77 | 54.12 ± 1.25 | 17.97 ± 0.97 |
phi-4 14B Microsoft | 29.72 ± 0.23 | 26.77 ± 0.45 | 53.54 ± 0.90 | 0.00 ± 0.00 |
Aya Expanse 8B CohereLabs | 29.70 ± 0.16 | 24.77 ± 0.26 | 32.46 ± 0.33 | 17.08 ± 0.41 |
Babel 9B Alibaba-DAMO | 27.56 ± 0.27 | 19.69 ± 0.75 | 39.38 ± 1.49 | 0.00 ± 0.00 |
SeaLLMs V3 7B Alibaba-DAMO | 26.67 ± 0.34 | 16.06 ± 0.67 | 29.13 ± 1.32 | 3.00 ± 0.77 |
Command R7B 12-2024 7B CohereLabs | 25.28 ± 0.33 | 11.44 ± 0.72 | 20.74 ± 1.11 | 2.15 ± 0.75 |
Apertus 8B Swiss AI | 23.09 ± 0.39 | 7.31 ± 0.90 | 13.71 ± 1.69 | 0.92 ± 0.48 |
Ministral 2410 8B Mistral AI | 19.24 ± 0.35 | 7.58 ± 0.83 | 12.75 ± 1.54 | 2.41 ± 0.78 |
Olmo 2 1124 7B AI2 | 14.84 ± 0.19 | 2.96 ± 0.58 | 5.91 ± 1.17 | 0.00 ± 0.00 |
Model | TL (overall) | NLU | Belebele QA | Sentiment Analysis |
---|---|---|---|---|
SEA-LION v4 (Gemma) 27B AISG | 68.10 ± 0.14 | 70.56 ± 0.29 | 82.62 ± 0.51 | 58.50 ± 0.17 |
Gemma 3 27B | 67.70 ± 0.12 | 70.64 ± 0.18 | 82.84 ± 0.35 | 58.43 ± 0.18 |
Qwen 3 Next 80B MoE Alibaba | 66.48 ± 0.13 | 73.20 ± 0.11 | 82.67 ± 0.00 | 63.73 ± 0.21 |
SEA-LION v3 (Llama) 70B AISG | 66.38 ± 0.17 | 67.54 ± 0.38 | 81.69 ± 0.66 | 53.40 ± 0.35 |
SEA-LION v4 (Qwen) 32B AISG | 65.35 ± 0.14 | 73.37 ± 0.18 | 78.44 ± 0.18 | 68.29 ± 0.31 |
Gemma 3 12B | 65.00 ± 0.11 | 67.30 ± 0.17 | 77.87 ± 0.27 | 56.72 ± 0.15 |
Llama 3.3 70B Meta | 63.21 ± 0.11 | 68.79 ± 0.19 | 83.07 ± 0.34 | 54.51 ± 0.17 |
Tulu 3 70B AI2 | 62.96 ± 0.24 | 73.84 ± 0.32 | 84.67 ± 0.62 | 63.02 ± 0.36 |
Qwen 3 32B Alibaba | 62.23 ± 0.13 | 72.85 ± 0.35 | 77.91 ± 0.43 | 67.79 ± 0.39 |
Qwen 2.5 72B Alibaba | 61.94 ± 0.13 | 72.49 ± 0.25 | 78.22 ± 0.40 | 66.75 ± 0.25 |
Llama 4 Scout 109B MoE Meta | 61.84 ± 0.11 | 66.60 ± 0.09 | 78.58 ± 0.12 | 54.62 ± 0.16 |
Llama 3.1 70B Meta | 61.82 ± 0.17 | 67.39 ± 0.31 | 81.33 ± 0.61 | 53.45 ± 0.31 |
Qwen 3 30B MoE Alibaba | 61.39 ± 0.12 | 68.75 ± 0.18 | 83.82 ± 0.33 | 53.67 ± 0.14 |
Gemma 2 27B | 61.19 ± 0.16 | 71.20 ± 0.28 | 83.24 ± 0.48 | 59.16 ± 0.39 |
SEA-LION v3 (Gemma 2) 9B AISG | 60.75 ± 0.22 | 71.44 ± 0.37 | 80.89 ± 0.58 | 61.99 ± 0.54 |
Mistral Large 2411 123B Mistral AI | 58.98 ± 0.15 | 67.46 ± 0.43 | 83.42 ± 0.75 | 51.50 ± 0.38 |
Qwen 3 14B Alibaba | 57.37 ± 0.13 | 70.68 ± 0.23 | 76.62 ± 0.37 | 64.73 ± 0.22 |
Qwen 2.5 32B Alibaba | 56.49 ± 0.16 | 68.54 ± 0.19 | 78.93 ± 0.19 | 58.14 ± 0.29 |
Gemma 2 9B | 53.79 ± 0.17 | 63.52 ± 0.47 | 77.42 ± 0.35 | 49.62 ± 0.77 |
Olmo 2 0325 32B AI2 | 53.00 ± 0.24 | 69.10 ± 0.43 | 78.98 ± 0.68 | 59.22 ± 0.65 |
Llama 3 70B Meta | 52.32 ± 0.14 | 64.82 ± 0.21 | 78.71 ± 0.32 | 50.93 ± 0.18 |
MERaLiON 2 10B A*STAR | 52.26 ± 0.20 | 64.57 ± 0.27 | 74.62 ± 0.49 | 54.52 ± 0.43 |
Sailor2 20B SAIL | 51.82 ± 0.15 | 74.42 ± 0.24 | 81.69 ± 0.39 | 67.14 ± 0.33 |
Qwen 2.5 14B Alibaba | 51.62 ± 0.12 | 68.11 ± 0.22 | 75.87 ± 0.40 | 60.36 ± 0.14 |
Sailor2 8B SAIL | 51.41 ± 0.16 | 65.53 ± 0.39 | 72.80 ± 0.73 | 58.27 ± 0.22 |
Qwen 3 8B Alibaba | 50.42 ± 0.16 | 65.19 ± 0.17 | 69.11 ± 0.18 | 61.27 ± 0.29 |
Command A 03-2025 111B CohereLabs | 49.22 ± 0.21 | 0.33 ± 0.26 | 0.67 ± 0.53 | 0.00 ± 0.00 |
![]() ![]() SEA-LION v3 (Llama) 8B AISG | 49.07 ± 0.22 | 59.40 ± 0.39 | 71.38 ± 0.79 | 47.42 ± 0.38 |
![]() ![]() Aya Expanse 32B CohereLabs | 47.65 ± 0.13 | 59.38 ± 0.45 | 66.71 ± 0.75 | 52.05 ± 0.27 |
![]() ![]() Mistral Small 3.1 2503 24B Mistral AI | 47.51 ± 0.33 | 65.33 ± 0.41 | 78.22 ± 0.69 | 52.43 ± 0.55 |
![]() ![]() ERNIE 4.5 21B MoE Baidu | 45.78 ± 0.35 | 45.27 ± 0.90 | 57.82 ± 1.62 | 32.71 ± 0.69 |
![]() ![]() Command R+ 08-2024 104B CohereLabs | 45.26 ± 0.29 | 60.74 ± 0.55 | 64.71 ± 1.02 | 56.78 ± 0.25 |
![]() ![]() Llama 3.1 8B Meta | 39.63 ± 0.22 | 53.04 ± 0.47 | 64.40 ± 0.96 | 41.68 ± 0.35 |
![]() ![]() Command R 08-2024 32B CohereLabs | 38.24 ± 0.25 | 45.52 ± 0.73 | 47.51 ± 1.49 | 43.52 ± 0.27 |
![]() ![]() Qwen 2.5 7B Alibaba | 38.20 ± 0.16 | 57.42 ± 0.15 | 61.87 ± 0.24 | 52.98 ± 0.18 |
![]() ![]() Olmo 2 1124 13B AI2 | 35.13 ± 0.29 | 32.31 ± 0.56 | 50.31 ± 0.76 | 14.30 ± 0.91 |
![]() ![]() Apertus 70B Swiss AI | 34.68 ± 0.29 | 49.91 ± 0.68 | 53.47 ± 1.03 | 46.35 ± 0.77 |
![]() ![]() Tulu 3 8B AI2 | 33.94 ± 0.24 | 27.52 ± 0.43 | 54.22 ± 0.79 | 0.81 ± 0.43 |
![]() ![]() Llama 3 8B Meta | 30.16 ± 0.15 | 49.12 ± 0.35 | 58.76 ± 0.66 | 39.49 ± 0.25 |
![]() ![]() Babel 83B Alibaba-DAMO | 29.79 ± 0.41 | 28.50 ± 0.94 | 48.22 ± 1.63 | 8.78 ± 0.86 |
![]() ![]() phi-4 14B Microsoft | 29.72 ± 0.23 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Aya Expanse 8B CohereLabs | 29.70 ± 0.16 | 42.85 ± 0.34 | 42.84 ± 0.51 | 42.85 ± 0.34 |
![]() ![]() Babel 9B Alibaba-DAMO | 27.56 ± 0.27 | 54.89 ± 0.68 | 67.47 ± 1.25 | 42.31 ± 0.72 |
![]() ![]() SeaLLMs V3 7B Alibaba-DAMO | 26.67 ± 0.34 | 35.61 ± 0.92 | 44.53 ± 1.78 | 26.69 ± 0.73 |
![]() ![]() Command R7B 12-2024 7B CohereLabs | 25.28 ± 0.33 | 36.20 ± 0.78 | 35.64 ± 1.53 | 36.76 ± 0.46 |
![]() ![]() Apertus 8B Swiss AI | 23.09 ± 0.39 | 23.89 ± 0.72 | 28.40 ± 1.47 | 19.38 ± 0.72 |
![]() ![]() Ministral 2410 8B Mistral AI | 19.24 ± 0.35 | 25.12 ± 1.07 | 30.71 ± 1.73 | 19.53 ± 1.03 |
![]() ![]() Olmo 2 1124 7B AI2 | 14.84 ± 0.19 | 29.34 ± 0.79 | 26.36 ± 1.62 | 32.33 ± 0.74 |

Model | TL | Safety | Toxicity Detection |
---|---|---|---|
![]() ![]() SEA-LION v4 (Gemma) 27B AISG | 68.10 ± 0.14 | 62.67 ± 0.42 | 62.67 ± 0.42 |
![]() ![]() Gemma 3 27B | 67.70 ± 0.12 | 63.20 ± 0.34 | 63.20 ± 0.34 |
![]() ![]() Qwen 3 Next 80B MoE Alibaba | 66.48 ± 0.13 | 49.55 ± 0.31 | 49.55 ± 0.31 |
![]() ![]() SEA-LION v3 (Llama) 70B AISG | 66.38 ± 0.17 | 54.65 ± 0.87 | 54.65 ± 0.87 |
![]() ![]() SEA-LION v4 (Qwen) 32B AISG | 65.35 ± 0.14 | 52.93 ± 0.48 | 52.93 ± 0.48 |
![]() ![]() Gemma 3 12B | 65.00 ± 0.11 | 56.23 ± 0.45 | 56.23 ± 0.45 |
![]() ![]() Llama 3.3 70B Meta | 63.21 ± 0.11 | 49.95 ± 0.46 | 49.95 ± 0.46 |
![]() ![]() Tulu 3 70B AI2 | 62.96 ± 0.24 | 47.35 ± 0.87 | 47.35 ± 0.87 |
![]() ![]() Qwen 3 32B Alibaba | 62.23 ± 0.13 | 52.45 ± 0.44 | 52.45 ± 0.44 |
![]() ![]() Qwen 2.5 72B Alibaba | 61.94 ± 0.13 | 43.55 ± 0.50 | 43.55 ± 0.50 |
![]() ![]() Llama 4 Scout 109B MoE Meta | 61.84 ± 0.11 | 38.83 ± 0.17 | 38.83 ± 0.17 |
![]() ![]() Llama 3.1 70B Meta | 61.82 ± 0.17 | 49.98 ± 0.77 | 49.98 ± 0.77 |
![]() ![]() Qwen 3 30B MoE Alibaba | 61.39 ± 0.12 | 33.43 ± 0.11 | 33.43 ± 0.11 |
![]() ![]() Gemma 2 27B | 61.19 ± 0.16 | 59.33 ± 0.43 | 59.33 ± 0.43 |
![]() ![]() SEA-LION v3 (Gemma 2) 9B AISG | 60.75 ± 0.22 | 49.42 ± 0.54 | 49.42 ± 0.54 |
![]() ![]() Mistral Large 2411 123B Mistral AI | 58.98 ± 0.15 | 50.85 ± 0.72 | 50.85 ± 0.72 |
![]() ![]() Qwen 3 14B Alibaba | 57.37 ± 0.13 | 46.93 ± 0.23 | 46.93 ± 0.23 |
![]() ![]() Qwen 2.5 32B Alibaba | 56.49 ± 0.16 | 46.73 ± 0.28 | 46.73 ± 0.28 |
![]() ![]() Gemma 2 9B | 53.79 ± 0.17 | 36.77 ± 0.26 | 36.77 ± 0.26 |
![]() ![]() Olmo 2 0325 32B AI2 | 53.00 ± 0.24 | 43.80 ± 0.65 | 43.80 ± 0.65 |
![]() ![]() Llama 3 70B Meta | 52.32 ± 0.14 | 48.33 ± 0.52 | 48.33 ± 0.52 |
![]() ![]() MERaLiON 2 10B A*STAR | 52.26 ± 0.20 | 28.78 ± 0.31 | 28.78 ± 0.31 |
![]() ![]() Sailor2 20B SAIL | 51.82 ± 0.15 | 48.57 ± 0.36 | 48.57 ± 0.36 |
![]() ![]() Qwen 2.5 14B Alibaba | 51.62 ± 0.12 | 44.63 ± 0.35 | 44.63 ± 0.35 |
![]() ![]() Sailor2 8B SAIL | 51.41 ± 0.16 | 45.17 ± 0.28 | 45.17 ± 0.28 |
![]() ![]() Qwen 3 8B Alibaba | 50.42 ± 0.16 | 42.43 ± 0.28 | 42.43 ± 0.28 |
![]() ![]() Command A 03-2025 111B CohereLabs | 49.22 ± 0.21 | 52.12 ± 0.60 | 52.12 ± 0.60 |
![]() ![]() SEA-LION v3 (Llama) 8B AISG | 49.07 ± 0.22 | 19.58 ± 0.46 | 19.58 ± 0.46 |
![]() ![]() Aya Expanse 32B CohereLabs | 47.65 ± 0.13 | 44.55 ± 0.36 | 44.55 ± 0.36 |
![]() ![]() Mistral Small 3.1 2503 24B Mistral AI | 47.51 ± 0.33 | 19.77 ± 1.50 | 19.77 ± 1.50 |
![]() ![]() ERNIE 4.5 21B MoE Baidu | 45.78 ± 0.35 | 31.68 ± 0.36 | 31.68 ± 0.36 |
![]() ![]() Command R+ 08-2024 104B CohereLabs | 45.26 ± 0.29 | 33.08 ± 1.14 | 33.08 ± 1.14 |
![]() ![]() Llama 3.1 8B Meta | 39.63 ± 0.22 | 18.87 ± 0.75 | 18.87 ± 0.75 |
![]() ![]() Command R 08-2024 32B CohereLabs | 38.24 ± 0.25 | 35.07 ± 1.02 | 35.07 ± 1.02 |
![]() ![]() Qwen 2.5 7B Alibaba | 38.20 ± 0.16 | 29.22 ± 0.17 | 29.22 ± 0.17 |
![]() ![]() Olmo 2 1124 13B AI2 | 35.13 ± 0.29 | 44.95 ± 0.99 | 44.95 ± 0.99 |
![]() ![]() Apertus 70B Swiss AI | 34.68 ± 0.29 | 26.03 ± 1.24 | 26.03 ± 1.24 |
![]() ![]() Tulu 3 8B AI2 | 33.94 ± 0.24 | 23.27 ± 0.64 | 23.27 ± 0.64 |
![]() ![]() Llama 3 8B Meta | 30.16 ± 0.15 | 8.32 ± 0.42 | 8.32 ± 0.42 |
![]() ![]() Babel 83B Alibaba-DAMO | 29.79 ± 0.41 | 25.30 ± 1.03 | 25.30 ± 1.03 |
![]() ![]() phi-4 14B Microsoft | 29.72 ± 0.23 | 0.00 ± 0.00 | 0.00 ± 0.00 |
![]() ![]() Aya Expanse 8B CohereLabs | 29.70 ± 0.16 | 17.12 ± 0.52 | 17.12 ± 0.52 |
![]() ![]() Babel 9B Alibaba-DAMO | 27.56 ± 0.27 | 24.53 ± 0.51 | 24.53 ± 0.51 |
![]() ![]() SeaLLMs V3 7B Alibaba-DAMO | 26.67 ± 0.34 | 12.10 ± 1.51 | 12.10 ± 1.51 |
![]() ![]() Command R7B 12-2024 7B CohereLabs | 25.28 ± 0.33 | 26.00 ± 1.25 | 26.00 ± 1.25 |
![]() ![]() Apertus 8B Swiss AI | 23.09 ± 0.39 | 26.53 ± 1.09 | 26.53 ± 1.09 |
![]() ![]() Ministral 2410 8B Mistral AI | 19.24 ± 0.35 | 10.10 ± 1.89 | 10.10 ± 1.89 |
![]() ![]() Olmo 2 1124 7B AI2 | 14.84 ± 0.19 | 0.00 ± 0.00 | 0.00 ± 0.00 |

Model | TL | Knowledge | Global MMLU Lite |
---|---|---|---|
![]() ![]() SEA-LION v4 (Gemma) 27B AISG | 68.10 ± 0.14 | 65.01 ± 0.33 | 65.01 ± 0.33 |
![]() ![]() Gemma 3 27B | 67.70 ± 0.12 | 65.34 ± 0.23 | 65.34 ± 0.23 |
![]() ![]() Qwen 3 Next 80B MoE Alibaba | 66.48 ± 0.13 | 66.38 ± 0.25 | 66.38 ± 0.25 |
![]() ![]() SEA-LION v3 (Llama) 70B AISG | 66.38 ± 0.17 | 71.22 ± 0.34 | 71.22 ± 0.34 |
![]() ![]() SEA-LION v4 (Qwen) 32B AISG | 65.35 ± 0.14 | 60.32 ± 0.14 | 60.32 ± 0.14 |
![]() ![]() Gemma 3 12B | 65.00 ± 0.11 | 58.72 ± 0.19 | 58.72 ± 0.19 |
![]() ![]() Llama 3.3 70B Meta | 63.21 ± 0.11 | 68.52 ± 0.24 | 68.52 ± 0.24 |
![]() ![]() Tulu 3 70B AI2 | 62.96 ± 0.24 | 61.67 ± 0.52 | 61.67 ± 0.52 |
![]() ![]() Qwen 3 32B Alibaba | 62.23 ± 0.13 | 59.58 ± 0.31 | 59.58 ± 0.31 |
![]() ![]() Qwen 2.5 72B Alibaba | 61.94 ± 0.13 | 67.90 ± 0.32 | 67.90 ± 0.32 |
![]() ![]() Llama 4 Scout 109B MoE Meta | 61.84 ± 0.11 | 69.90 ± 0.11 | 69.90 ± 0.11 |
![]() ![]() Llama 3.1 70B Meta | 61.82 ± 0.17 | 66.61 ± 0.42 | 66.61 ± 0.42 |
![]() ![]() Qwen 3 30B MoE Alibaba | 61.39 ± 0.12 | 58.43 ± 0.23 | 58.43 ± 0.23 |
![]() ![]() Gemma 2 27B | 61.19 ± 0.16 | 58.55 ± 0.52 | 58.55 ± 0.52 |
![]() ![]() SEA-LION v3 (Gemma 2) 9B AISG | 60.75 ± 0.22 | 51.50 ± 0.49 | 51.50 ± 0.49 |
![]() ![]() Mistral Large 2411 123B Mistral AI | 58.98 ± 0.15 | 55.93 ± 0.44 | 55.93 ± 0.44 |
![]() ![]() Qwen 3 14B Alibaba | 57.37 ± 0.13 | 50.74 ± 0.26 | 50.74 ± 0.26 |
![]() ![]() Qwen 2.5 32B Alibaba | 56.49 ± 0.16 | 59.01 ± 0.21 | 59.01 ± 0.21 |
![]() ![]() Gemma 2 9B | 53.79 ± 0.17 | 50.93 ± 0.41 | 50.93 ± 0.41 |
![]() ![]() Olmo 2 0325 32B AI2 | 53.00 ± 0.24 | 47.60 ± 0.59 | 47.60 ± 0.59 |
![]() ![]() Llama 3 70B Meta | 52.32 ± 0.14 | 59.07 ± 0.28 | 59.07 ± 0.28 |
![]() ![]() MERaLiON 2 10B A*STAR | 52.26 ± 0.20 | 49.48 ± 0.41 | 49.48 ± 0.41 |
![]() ![]() Sailor2 20B SAIL | 51.82 ± 0.15 | 50.98 ± 0.36 | 50.98 ± 0.36 |
![]() ![]() Qwen 2.5 14B Alibaba | 51.62 ± 0.12 | 48.15 ± 0.19 | 48.15 ± 0.19 |
![]() ![]() Sailor2 8B SAIL | 51.41 ± 0.16 | 50.31 ± 0.27 | 50.31 ± 0.27 |
![]() ![]() Qwen 3 8B Alibaba | 50.42 ± 0.16 | 46.21 ± 0.34 | 46.21 ± 0.34 |
![]() ![]() Command A 03-2025 111B CohereLabs | 49.22 ± 0.21 | 61.83 ± 0.36 | 61.83 ± 0.36 |
![]() ![]() SEA-LION v3 (Llama) 8B AISG | 49.07 ± 0.22 | 44.50 ± 0.54 | 44.50 ± 0.54 |
![]() ![]() Aya Expanse 32B CohereLabs | 47.65 ± 0.13 | 42.26 ± 0.17 | 42.26 ± 0.17 |
![]() ![]() Mistral Small 3.1 2503 24B Mistral AI | 47.51 ± 0.33 | 53.55 ± 0.61 | 53.55 ± 0.61 |
![]() ![]() ERNIE 4.5 21B MoE Baidu | 45.78 ± 0.35 | 47.11 ± 0.51 | 47.11 ± 0.51 |
![]() ![]() Command R+ 08-2024 104B CohereLabs | 45.26 ± 0.29 | 39.15 ± 0.63 | 39.15 ± 0.63 |
![]() ![]() Llama 3.1 8B Meta | 39.63 ± 0.22 | 34.69 ± 0.52 | 34.69 ± 0.52 |
![]() ![]() Command R 08-2024 32B CohereLabs | 38.24 ± 0.25 | 29.49 ± 0.72 | 29.49 ± 0.72 |
![]() ![]() Qwen 2.5 7B Alibaba | 38.20 ± 0.16 | 37.93 ± 0.23 | 37.93 ± 0.23 |
![]() ![]() Olmo 2 1124 13B AI2 | 35.13 ± 0.29 | 25.37 ± 0.46 | 25.37 ± 0.46 |
![]() ![]() Apertus 70B Swiss AI | 34.68 ± 0.29 | 20.63 ± 0.72 | 20.63 ± 0.72 |
![]() ![]() Tulu 3 8B AI2 | 33.94 ± 0.24 | 30.92 ± 0.44 | 30.92 ± 0.44 |
![]() ![]() Llama 3 8B Meta | 30.16 ± 0.15 | 30.93 ± 0.46 | 30.93 ± 0.46 |
![]() ![]() Babel 83B Alibaba-DAMO | 29.79 ± 0.41 | 26.20 ± 0.91 | 26.20 ± 0.91 |
![]() ![]() phi-4 14B Microsoft | 29.72 ± 0.23 | 43.15 ± 0.56 | 43.15 ± 0.56 |
![]() ![]() Aya Expanse 8B CohereLabs | 29.70 ± 0.16 | 24.05 ± 0.38 | 24.05 ± 0.38 |
![]() ![]() Babel 9B Alibaba-DAMO | 27.56 ± 0.27 | 9.79 ± 0.47 | 9.79 ± 0.47 |
![]() ![]() SeaLLMs V3 7B Alibaba-DAMO | 26.67 ± 0.34 | 8.43 ± 0.52 | 8.43 ± 0.52 |
![]() ![]() Command R7B 12-2024 7B CohereLabs | 25.28 ± 0.33 | 19.39 ± 0.69 | 19.39 ± 0.69 |
![]() ![]() Apertus 8B Swiss AI | 23.09 ± 0.39 | 7.79 ± 0.63 | 7.79 ± 0.63 |
![]() ![]() Ministral 2410 8B Mistral AI | 19.24 ± 0.35 | 15.65 ± 0.71 | 15.65 ± 0.71 |
![]() ![]() Olmo 2 1124 7B AI2 | 14.84 ± 0.19 | 3.56 ± 0.35 | 3.56 ± 0.35 |