Filipino Performance
Filipino Scores by Model
Average of 8 runs. 95% CIs are shown.
Model Size: ≤200B
Open instruct models only
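Each reported score below is the mean over 8 evaluation runs, with the ± value giving a 95% confidence interval. A minimal sketch of how such an interval could be computed, assuming a normal approximation (1.96 × the standard error of the mean); the leaderboard's exact method (t-interval, bootstrap, etc.) is not stated here, and the per-run scores used are purely illustrative:

```python
import statistics

def mean_and_ci95(run_scores):
    """Return the mean and the half-width of a normal-approximation 95% CI."""
    mean = statistics.mean(run_scores)
    sem = statistics.stdev(run_scores) / len(run_scores) ** 0.5  # standard error of the mean
    return mean, 1.96 * sem

# Hypothetical per-run scores for one model across 8 runs (illustrative only).
runs = [74.1, 74.8, 74.3, 74.9, 74.6, 74.4, 74.2, 74.9]
mean, half_width = mean_and_ci95(runs)
print(f"{mean:.2f} ± {half_width:.2f}")  # same "mean ± CI" format as the tables below
```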
Model | TL (Overall) |
---|---|
SEA-LION v4 27B AISG | 74.53 ± 0.28 |
Gemma 3 27B | 74.09 ± 0.12 |
SEA-LION v3 (Llama) 70B AISG | 72.84 ± 0.46 |
Gemma 3 12B | 72.02 ± 0.31 |
Llama 3.3 70B Meta | 70.26 ± 0.29 |
Qwen 3 30B MoE Alibaba | 70.06 ± 0.17 |
Tulu 3 70B AI2 | 69.96 ± 0.54 |
Llama 4 Scout 109B MoE Meta | 69.94 ± 0.21 |
Qwen 3 32B Alibaba | 69.72 ± 0.18 |
Qwen 2.5 72B Alibaba | 69.65 ± 0.35 |
Llama 3.1 70B Meta | 69.03 ± 0.36 |
SEA-LION v3 (Gemma 2) 9B AISG | 68.43 ± 0.50 |
Gemma 2 27B | 68.03 ± 0.32 |
Mistral Large 2411 123B Mistral AI | 66.60 ± 0.44 |
Qwen 3 14B Alibaba | 65.76 ± 0.21 |
Qwen 2.5 32B Alibaba | 64.68 ± 0.40 |
Gemma 2 9B | 63.06 ± 0.65 |
MERaLiON 2 10B A*STAR | 61.98 ± 0.76 |
Olmo 2 0325 32B AI2 | 61.97 ± 0.50 |
Qwen 2.5 14B Alibaba | 60.86 ± 0.34 |
Qwen 3 8B Alibaba | 60.81 ± 0.12 |
SEA-LION v3 (Llama) 8B AISG | 60.38 ± 0.38 |
Sailor2 8B SAIL | 60.13 ± 0.17 |
Llama 3 70B Meta | 60.08 ± 0.27 |
Sailor2 20B SAIL | 59.46 ± 0.23 |
Mistral Small 3.1 2503 24B Mistral AI | 58.60 ± 1.46 |
Command A 03-2025 111B CohereLabs | 58.17 ± 1.66 |
ERNIE 4.5 21B MoE Baidu | 57.79 ± 2.05 |
Aya Expanse 32B CohereLabs | 57.63 ± 0.35 |
Command R+ 08-2024 104B CohereLabs | 55.84 ± 0.68 |
Llama 3.1 8B Meta | 52.90 ± 0.58 |
Qwen 2.5 7B Alibaba | 50.88 ± 0.27 |
Command R 08-2024 32B CohereLabs | 50.45 ± 1.54 |
Olmo 2 1124 13B AI2 | 48.53 ± 0.91 |
Tulu 3 8B AI2 | 48.02 ± 0.94 |
Llama 3 8B Meta | 44.85 ± 0.46 |
Aya Expanse 8B CohereLabs | 44.54 ± 0.42 |
Babel 83B Alibaba-DAMO | 44.11 ± 6.04 |
SeaLLMs V3 7B Alibaba-DAMO | 42.29 ± 2.33 |
Babel 9B Alibaba-DAMO | 42.18 ± 2.86 |
Command R7B 12-2024 7B CohereLabs | 40.99 ± 1.25 |
Ministral 2410 8B Mistral AI | 36.21 ± 1.39 |
phi-4 14B Microsoft | 35.82 ± 1.58 |
Olmo 2 1124 7B AI2 | 28.95 ± 2.15 |
Filipino Competencies
Average of 8 runs. 95% CIs are shown.
Model Size: ≤200B
Open instruct models only
Model | TL (Overall) | Cultural | Instruction Following | Multi-Turn Chat | NLG | NLR | NLU | Safety | Knowledge |
---|---|---|---|---|---|---|---|---|---|
SEA-LION v4 27B AISG | 74.53 ± 0.28 | 87.17 ± 0.54 | 89.29 ± 1.89 | 47.36 ± 2.13 | 55.60 ± 0.09 | 82.35 ± 0.39 | 79.50 ± 0.30 | 81.44 ± 0.47 | 73.56 ± 0.30 |
Gemma 3 27B | 74.09 ± 0.12 | 86.83 ± 0.21 | 88.10 ± 1.37 | 44.83 ± 1.52 | 55.56 ± 0.15 | 82.16 ± 0.22 | 79.70 ± 0.34 | 81.66 ± 0.23 | 73.91 ± 0.16 |
SEA-LION v3 (Llama) 70B AISG | 72.84 ± 0.46 | 90.67 ± 0.86 | 92.14 ± 1.72 | 26.51 ± 1.92 | 57.63 ± 0.15 | 83.08 ± 1.11 | 77.63 ± 1.29 | 76.81 ± 2.24 | 78.22 ± 0.91 |
Gemma 3 12B | 72.02 ± 0.31 | 86.42 ± 0.24 | 87.38 ± 1.31 | 42.19 ± 1.58 | 54.73 ± 0.08 | 81.61 ± 0.22 | 77.26 ± 0.46 | 78.03 ± 0.49 | 68.56 ± 0.29 |
Llama 3.3 70B Meta | 70.26 ± 0.29 | 87.42 ± 0.30 | 91.43 ± 0.93 | 14.87 ± 1.36 | 55.53 ± 0.10 | 83.26 ± 0.12 | 78.50 ± 0.39 | 74.88 ± 0.85 | 76.19 ± 0.47 |
Qwen 3 30B MoE Alibaba | 70.06 ± 0.17 | 91.08 ± 0.24 | 84.05 ± 1.16 | 42.03 ± 0.94 | 50.65 ± 0.07 | 79.11 ± 0.07 | 78.52 ± 0.28 | 66.72 ± 0.06 | 68.34 ± 0.45 |
Tulu 3 70B AI2 | 69.96 ± 0.54 | 86.50 ± 0.85 | 82.62 ± 1.89 | 27.48 ± 1.49 | 53.46 ± 0.24 | 84.28 ± 0.47 | 81.78 ± 0.86 | 73.41 ± 3.19 | 70.19 ± 0.76 |
Llama 4 Scout 109B MoE Meta | 69.94 ± 0.21 | 90.00 ± 0.00 | 92.74 ± 1.27 | 22.25 ± 1.09 | 54.96 ± 0.09 | 76.42 ± 0.10 | 76.78 ± 0.14 | 69.41 ± 0.09 | 77.00 ± 0.21 |
Qwen 3 32B Alibaba | 69.72 ± 0.18 | 83.33 ± 0.55 | 82.14 ± 2.02 | 34.21 ± 1.74 | 51.44 ± 0.13 | 80.36 ± 0.64 | 80.84 ± 0.21 | 76.19 ± 0.59 | 69.22 ± 0.57 |
Qwen 2.5 72B Alibaba | 69.65 ± 0.35 | 86.67 ± 0.60 | 83.81 ± 1.69 | 29.09 ± 0.87 | 49.46 ± 0.11 | 79.96 ± 1.06 | 80.88 ± 0.31 | 71.88 ± 0.35 | 75.44 ± 0.41 |
Llama 3.1 70B Meta | 69.03 ± 0.36 | 87.58 ± 0.89 | 85.95 ± 1.82 | 12.07 ± 1.36 | 57.03 ± 0.24 | 82.42 ± 1.00 | 77.50 ± 1.09 | 75.19 ± 1.52 | 74.50 ± 0.52 |
SEA-LION v3 (Gemma 2) 9B AISG | 68.43 ± 0.50 | 89.33 ± 1.54 | 84.05 ± 1.99 | 22.79 ± 1.24 | 54.58 ± 0.12 | 78.66 ± 1.38 | 80.22 ± 1.80 | 74.63 ± 0.48 | 63.22 ± 0.47 |
Gemma 2 27B | 68.03 ± 0.32 | 87.08 ± 0.60 | 77.14 ± 1.50 | 15.89 ± 1.54 | 55.99 ± 0.29 | 80.13 ± 1.33 | 80.06 ± 1.16 | 79.50 ± 0.57 | 68.41 ± 0.80 |
Mistral Large 2411 123B Mistral AI | 66.60 ± 0.44 | 87.33 ± 0.96 | 76.19 ± 2.06 | 16.92 ± 1.40 | 53.95 ± 0.26 | 79.56 ± 1.58 | 77.58 ± 2.06 | 75.41 ± 0.73 | 65.84 ± 0.99 |
Qwen 3 14B Alibaba | 65.76 ± 0.21 | 83.75 ± 0.24 | 77.26 ± 1.39 | 23.17 ± 1.04 | 50.19 ± 0.11 | 76.36 ± 0.45 | 79.61 ± 0.23 | 73.44 ± 0.36 | 62.28 ± 0.66 |
Qwen 2.5 32B Alibaba | 64.68 ± 0.40 | 84.00 ± 0.55 | 76.67 ± 2.42 | 14.82 ± 1.32 | 46.44 ± 0.13 | 75.39 ± 0.31 | 78.20 ± 0.42 | 73.34 ± 0.35 | 68.59 ± 0.28 |
Gemma 2 9B | 63.06 ± 0.65 | 82.17 ± 1.32 | 77.86 ± 1.99 | 10.94 ± 1.23 | 52.58 ± 0.35 | 75.22 ± 0.94 | 74.53 ± 5.09 | 68.44 ± 0.53 | 62.72 ± 0.88 |
MERaLiON 2 10B A*STAR | 61.98 ± 0.76 | 82.25 ± 2.79 | 74.76 ± 1.96 | 9.00 ± 0.99 | 52.10 ± 0.22 | 76.19 ± 1.54 | 75.43 ± 2.85 | 64.31 ± 0.70 | 61.78 ± 1.97 |
Olmo 2 0325 32B AI2 | 61.97 ± 0.50 | 85.83 ± 1.32 | 69.76 ± 1.44 | 10.83 ± 0.81 | 46.45 ± 1.48 | 72.42 ± 1.35 | 78.58 ± 2.66 | 71.91 ± 2.19 | 59.97 ± 1.39 |
Qwen 2.5 14B Alibaba | 60.86 ± 0.34 | 80.50 ± 0.41 | 71.55 ± 1.85 | 10.56 ± 0.97 | 43.47 ± 0.19 | 70.64 ± 0.16 | 77.69 ± 0.39 | 72.22 ± 0.36 | 60.25 ± 0.16 |
Qwen 3 8B Alibaba | 60.81 ± 0.12 | 79.17 ± 1.04 | 72.02 ± 1.17 | 18.37 ± 1.13 | 47.09 ± 0.08 | 64.02 ± 0.56 | 75.43 ± 0.42 | 71.16 ± 0.69 | 59.25 ± 0.36 |
SEA-LION v3 (Llama) 8B AISG | 60.38 ± 0.38 | 77.67 ± 1.67 | 71.79 ± 1.93 | 19.77 ± 1.39 | 53.24 ± 0.54 | 71.78 ± 1.07 | 71.72 ± 1.89 | 59.91 ± 2.13 | 57.19 ± 0.30 |
Sailor2 8B SAIL | 60.13 ± 0.17 | 74.92 ± 0.65 | 40.48 ± 1.37 | 25.32 ± 0.90 | 50.26 ± 0.54 | 79.48 ± 0.49 | 76.19 ± 0.42 | 72.47 ± 0.39 | 61.94 ± 0.37 |
Llama 3 70B Meta | 60.08 ± 0.27 | 87.33 ± 0.60 | 29.88 ± 1.05 | 10.24 ± 0.95 | 55.44 ± 0.12 | 79.25 ± 0.28 | 75.72 ± 0.28 | 74.22 ± 0.28 | 68.59 ± 0.46 |
Sailor2 20B SAIL | 59.46 ± 0.23 | 84.00 ± 0.00 | 40.12 ± 1.08 | 24.84 ± 1.24 | 25.88 ± 1.49 | 81.44 ± 0.24 | 82.07 ± 0.28 | 74.31 ± 0.18 | 63.00 ± 0.37 |
Mistral Small 3.1 2503 24B Mistral AI | 58.60 ± 1.46 | 81.58 ± 1.13 | 60.12 ± 2.27 | 9.38 ± 1.07 | 46.43 ± 0.51 | 70.95 ± 5.21 | 75.98 ± 2.09 | 60.06 ± 10.26 | 64.31 ± 1.47 |
Command A 03-2025 111B CohereLabs | 58.17 ± 1.66 | 79.50 ± 7.48 | 76.07 ± 1.51 | 35.56 ± 1.80 | 52.43 ± 0.12 | 58.35 ± 1.87 | 16.55 ± 5.99 | 76.19 ± 1.08 | 70.72 ± 1.03 |
ERNIE 4.5 21B MoE Baidu | 57.79 ± 2.05 | 53.83 ± 12.71 | 73.45 ± 0.96 | 22.68 ± 1.16 | 51.96 ± 0.12 | 73.27 ± 2.67 | 61.64 ± 5.18 | 66.16 ± 0.77 | 59.31 ± 1.47 |
Aya Expanse 32B CohereLabs | 57.63 ± 0.35 | 74.83 ± 0.69 | 62.50 ± 1.45 | 8.24 ± 0.72 | 44.62 ± 0.24 | 71.80 ± 0.44 | 71.56 ± 0.77 | 72.28 ± 0.89 | 55.19 ± 0.63 |
Command R+ 08-2024 104B CohereLabs | 55.84 ± 0.68 | 77.25 ± 1.66 | 54.17 ± 1.47 | 4.69 ± 1.23 | 48.74 ± 1.49 | 69.70 ± 2.24 | 72.44 ± 0.89 | 66.44 ± 5.48 | 53.31 ± 1.52 |
Llama 3.1 8B Meta | 52.90 ± 0.58 | 71.75 ± 0.95 | 57.14 ± 1.22 | 5.28 ± 0.90 | 51.87 ± 0.35 | 60.54 ± 3.15 | 67.45 ± 0.71 | 59.53 ± 3.16 | 49.63 ± 0.57 |
Qwen 2.5 7B Alibaba | 50.88 ± 0.27 | 63.67 ± 0.49 | 53.33 ± 2.34 | 6.79 ± 0.47 | 32.20 ± 0.63 | 63.48 ± 0.43 | 69.98 ± 0.27 | 64.66 ± 0.24 | 52.94 ± 0.15 |
Command R 08-2024 32B CohereLabs | 50.45 ± 1.54 | 72.08 ± 3.64 | 51.55 ± 2.63 | 3.07 ± 0.46 | 40.71 ± 0.40 | 62.35 ± 3.90 | 61.35 ± 1.92 | 67.41 ± 1.12 | 45.09 ± 2.80 |
Olmo 2 1124 13B AI2 | 48.53 ± 0.91 | 67.25 ± 0.72 | 61.67 ± 2.95 | 5.23 ± 0.56 | 33.21 ± 2.37 | 53.56 ± 3.71 | 53.28 ± 5.15 | 72.34 ± 2.29 | 41.69 ± 0.78 |
Tulu 3 8B AI2 | 48.02 ± 0.94 | 65.42 ± 1.65 | 66.31 ± 1.80 | 8.19 ± 1.36 | 23.91 ± 1.08 | 62.74 ± 2.74 | 49.66 ± 4.83 | 61.56 ± 2.09 | 46.41 ± 1.19 |
Llama 3 8B Meta | 44.85 ± 0.46 | 64.58 ± 1.59 | 19.64 ± 1.54 | 3.77 ± 0.76 | 48.43 ± 0.54 | 56.78 ± 1.38 | 64.51 ± 0.50 | 54.16 ± 0.93 | 46.94 ± 1.44 |
Aya Expanse 8B CohereLabs | 44.54 ± 0.42 | 62.08 ± 0.72 | 42.86 ± 1.22 | 2.69 ± 0.76 | 33.87 ± 0.46 | 55.47 ± 0.28 | 59.64 ± 1.09 | 58.50 ± 1.06 | 41.22 ± 3.30 |
Babel 83B Alibaba-DAMO | 44.11 ± 6.04 | 61.08 ± 14.96 | 39.29 ± 3.52 | 2.96 ± 0.46 | 32.21 ± 3.75 | 60.97 ± 7.68 | 50.47 ± 14.65 | 62.75 ± 12.26 | 43.13 ± 14.09 |
SeaLLMs V3 7B Alibaba-DAMO | 42.29 ± 2.33 | 64.33 ± 6.88 | 42.62 ± 2.99 | 6.95 ± 0.74 | 40.18 ± 1.80 | 50.05 ± 4.27 | 53.92 ± 8.80 | 56.41 ± 2.26 | 23.84 ± 7.95 |
Babel 9B Alibaba-DAMO | 42.18 ± 2.86 | 38.42 ± 12.73 | 43.57 ± 3.70 | 3.18 ± 0.93 | 46.40 ± 1.01 | 48.68 ± 7.53 | 68.76 ± 2.43 | 62.25 ± 1.16 | 26.16 ± 6.56 |
Command R7B 12-2024 7B CohereLabs | 40.99 ± 1.25 | 52.25 ± 9.51 | 44.29 ± 3.14 | 1.51 ± 0.48 | 27.51 ± 0.95 | 47.62 ± 2.53 | 54.66 ± 2.54 | 62.81 ± 3.59 | 37.28 ± 2.85 |
Ministral 2410 8B Mistral AI | 36.21 ± 1.39 | 60.17 ± 3.82 | 21.43 ± 2.66 | 1.89 ± 0.53 | 25.36 ± 1.86 | 45.83 ± 4.38 | 46.64 ± 5.33 | 55.09 ± 3.31 | 33.31 ± 3.87 |
phi-4 14B Microsoft | 35.82 ± 1.58 | 74.33 ± 4.53 | 54.29 ± 2.99 | 12.82 ± 1.01 | 36.24 ± 0.93 | 38.84 ± 1.49 | 9.17 ± 3.95 | 5.22 ± 3.59 | 55.63 ± 1.29 |
Olmo 2 1124 7B AI2 | 28.95 ± 2.15 | 2.00 ± 1.37 | 46.31 ± 2.82 | 1.89 ± 0.60 | 34.91 ± 1.08 | 36.04 ± 5.52 | 50.30 ± 4.48 | 43.88 ± 11.66 | 16.28 ± 4.32 |
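Across the rows above, each model's TL (Overall) score appears to be the unweighted mean of its eight competency scores (for SEA-LION v4 27B, the mean of 87.17, 89.29, 47.36, 55.60, 82.35, 79.50, 81.44 and 73.56 is 74.53), and in the task tables below each competency score likewise matches the mean of its task scores. A minimal sketch of that roll-up, assuming a simple unweighted mean is indeed how the leaderboard aggregates:

```python
# Competency scores for SEA-LION v4 27B, taken from the table above.
competencies = {
    "Cultural": 87.17,
    "Instruction Following": 89.29,
    "Multi-Turn Chat": 47.36,
    "NLG": 55.60,
    "NLR": 82.35,
    "NLU": 79.50,
    "Safety": 81.44,
    "Knowledge": 73.56,
}

# Assumed aggregation: overall TL score as the unweighted mean of the competencies.
overall = sum(competencies.values()) / len(competencies)
print(f"TL (Overall) ≈ {overall:.2f}")  # 74.53, matching the reported value
```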
Filipino Tasks
Average of 8 runs. 95% CIs are shown.
Model Size: ≤200B
Open instruct models only
Model | TL (Overall) | Cultural | Kalahi |
---|---|---|---|
SEA-LION v4 27B AISG | 74.53 ± 0.28 | 87.17 ± 0.54 | 87.17 ± 5.26 |
Gemma 3 27B | 74.09 ± 0.12 | 86.83 ± 0.21 | 86.83 ± 5.33 |
SEA-LION v3 (Llama) 70B AISG | 72.84 ± 0.46 | 90.67 ± 0.86 | 90.67 ± 4.19 |
Gemma 3 12B | 72.02 ± 0.31 | 86.42 ± 0.24 | 86.42 ± 5.39 |
Llama 3.3 70B Meta | 70.26 ± 0.29 | 87.42 ± 0.30 | 87.42 ± 5.26 |
Qwen 3 30B MoE Alibaba | 70.06 ± 0.17 | 91.08 ± 0.24 | 91.08 ± 4.49 |
Tulu 3 70B AI2 | 69.96 ± 0.54 | 86.50 ± 0.85 | 86.50 ± 5.02 |
Llama 4 Scout 109B MoE Meta | 69.94 ± 0.21 | 90.00 ± 0.00 | 90.00 ± 4.80 |
Qwen 3 32B Alibaba | 69.72 ± 0.18 | 83.33 ± 0.55 | 83.33 ± 5.67 |
Qwen 2.5 72B Alibaba | 69.65 ± 0.35 | 86.67 ± 0.60 | 86.67 ± 5.31 |
Llama 3.1 70B Meta | 69.03 ± 0.36 | 87.58 ± 0.89 | 87.58 ± 4.87 |
SEA-LION v3 (Gemma 2) 9B AISG | 68.43 ± 0.50 | 89.33 ± 1.54 | 89.33 ± 4.20 |
Gemma 2 27B | 68.03 ± 0.32 | 87.08 ± 0.60 | 87.08 ± 5.15 |
Mistral Large 2411 123B Mistral AI | 66.60 ± 0.44 | 87.33 ± 0.96 | 87.33 ± 4.97 |
Qwen 3 14B Alibaba | 65.76 ± 0.21 | 83.75 ± 0.24 | 83.75 ± 5.75 |
Qwen 2.5 32B Alibaba | 64.68 ± 0.40 | 84.00 ± 0.55 | 84.00 ± 5.76 |
Gemma 2 9B | 63.06 ± 0.65 | 82.17 ± 1.32 | 82.17 ± 5.61 |
MERaLiON 2 10B A*STAR | 61.98 ± 0.76 | 82.25 ± 2.79 | 82.25 ± 5.47 |
Olmo 2 0325 32B AI2 | 61.97 ± 0.50 | 85.83 ± 1.32 | 85.83 ± 4.97 |
Qwen 2.5 14B Alibaba | 60.86 ± 0.34 | 80.50 ± 0.41 | 80.50 ± 6.20 |
Qwen 3 8B Alibaba | 60.81 ± 0.12 | 79.17 ± 1.04 | 79.17 ± 6.18 |
SEA-LION v3 (Llama) 8B AISG | 60.38 ± 0.38 | 77.67 ± 1.67 | 77.67 ± 5.94 |
Sailor2 8B SAIL | 60.13 ± 0.17 | 74.92 ± 0.65 | 74.92 ± 6.71 |
Llama 3 70B Meta | 60.08 ± 0.27 | 87.33 ± 0.60 | 87.33 ± 5.16 |
Sailor2 20B SAIL | 59.46 ± 0.23 | 84.00 ± 0.00 | 84.00 ± 5.87 |
Mistral Small 3.1 2503 24B Mistral AI | 58.60 ± 1.46 | 81.58 ± 1.13 | 81.58 ± 5.51 |
Command A 03-2025 111B CohereLabs | 58.17 ± 1.66 | 79.50 ± 7.48 | 79.50 ± 5.15 |
ERNIE 4.5 21B MoE Baidu | 57.79 ± 2.05 | 53.83 ± 12.71 | 53.83 ± 5.80 |
Aya Expanse 32B CohereLabs | 57.63 ± 0.35 | 74.83 ± 0.69 | 74.83 ± 6.85 |
Command R+ 08-2024 104B CohereLabs | 55.84 ± 0.68 | 77.25 ± 1.66 | 77.25 ± 5.47 |
Llama 3.1 8B Meta | 52.90 ± 0.58 | 71.75 ± 0.95 | 71.75 ± 6.69 |
Qwen 2.5 7B Alibaba | 50.88 ± 0.27 | 63.67 ± 0.49 | 63.67 ± 7.52 |
Command R 08-2024 32B CohereLabs | 50.45 ± 1.54 | 72.08 ± 3.64 | 72.08 ± 5.83 |
Olmo 2 1124 13B AI2 | 48.53 ± 0.91 | 67.25 ± 0.72 | 67.25 ± 6.59 |
Tulu 3 8B AI2 | 48.02 ± 0.94 | 65.42 ± 1.65 | 65.42 ± 6.89 |
Llama 3 8B Meta | 44.85 ± 0.46 | 64.58 ± 1.59 | 64.58 ± 7.35 |
Aya Expanse 8B CohereLabs | 44.54 ± 0.42 | 62.08 ± 0.72 | 62.08 ± 7.53 |
Babel 83B Alibaba-DAMO | 44.11 ± 6.04 | 61.08 ± 14.96 | 61.08 ± 3.76 |
SeaLLMs V3 7B Alibaba-DAMO | 42.29 ± 2.33 | 64.33 ± 6.88 | 64.33 ± 5.10 |
Babel 9B Alibaba-DAMO | 42.18 ± 2.86 | 38.42 ± 12.73 | 38.42 ± 4.69 |
Command R7B 12-2024 7B CohereLabs | 40.99 ± 1.25 | 52.25 ± 9.51 | 52.25 ± 5.69 |
Ministral 2410 8B Mistral AI | 36.21 ± 1.39 | 60.17 ± 3.82 | 60.17 ± 5.54 |
phi-4 14B Microsoft | 35.82 ± 1.58 | 74.33 ± 4.53 | 74.33 ± 5.90 |
Olmo 2 1124 7B AI2 | 28.95 ± 2.15 | 2.00 ± 1.37 | 2.00 ± 1.53 |
Model | TL (Overall) | Instruction Following | SEA-IFEval |
---|---|---|---|
SEA-LION v4 27B AISG | 74.53 ± 0.28 | 89.29 ± 1.89 | 89.29 ± 4.50 |
Gemma 3 27B | 74.09 ± 0.12 | 88.10 ± 1.37 | 88.10 ± 5.07 |
SEA-LION v3 (Llama) 70B AISG | 72.84 ± 0.46 | 92.14 ± 1.72 | 92.14 ± 3.77 |
Gemma 3 12B | 72.02 ± 0.31 | 87.38 ± 1.31 | 87.38 ± 5.41 |
Llama 3.3 70B Meta | 70.26 ± 0.29 | 91.43 ± 0.93 | 91.43 ± 5.09 |
Qwen 3 30B MoE Alibaba | 70.06 ± 0.17 | 84.05 ± 1.16 | 84.05 ± 6.41 |
Tulu 3 70B AI2 | 69.96 ± 0.54 | 82.62 ± 1.89 | 82.62 ± 5.90 |
Llama 4 Scout 109B MoE Meta | 69.94 ± 0.21 | 92.74 ± 1.27 | 92.74 ± 3.88 |
Qwen 3 32B Alibaba | 69.72 ± 0.18 | 82.14 ± 2.02 | 82.14 ± 5.79 |
Qwen 2.5 72B Alibaba | 69.65 ± 0.35 | 83.81 ± 1.69 | 83.81 ± 5.90 |
Llama 3.1 70B Meta | 69.03 ± 0.36 | 85.95 ± 1.82 | 85.95 ± 5.00 |
SEA-LION v3 (Gemma 2) 9B AISG | 68.43 ± 0.50 | 84.05 ± 1.99 | 84.05 ± 5.28 |
Gemma 2 27B | 68.03 ± 0.32 | 77.14 ± 1.50 | 77.14 ± 6.23 |
Mistral Large 2411 123B Mistral AI | 66.60 ± 0.44 | 76.19 ± 2.06 | 76.19 ± 6.06 |
Qwen 3 14B Alibaba | 65.76 ± 0.21 | 77.26 ± 1.39 | 77.26 ± 7.14 |
Qwen 2.5 32B Alibaba | 64.68 ± 0.40 | 76.67 ± 2.42 | 76.67 ± 6.26 |
Gemma 2 9B | 63.06 ± 0.65 | 77.86 ± 1.99 | 77.86 ± 5.64 |
MERaLiON 2 10B A*STAR | 61.98 ± 0.76 | 74.76 ± 1.96 | 74.76 ± 5.92 |
Olmo 2 0325 32B AI2 | 61.97 ± 0.50 | 69.76 ± 1.44 | 69.76 ± 7.03 |
Qwen 2.5 14B Alibaba | 60.86 ± 0.34 | 71.55 ± 1.85 | 71.55 ± 7.20 |
Qwen 3 8B Alibaba | 60.81 ± 0.12 | 72.02 ± 1.17 | 72.02 ± 7.21 |
SEA-LION v3 (Llama) 8B AISG | 60.38 ± 0.38 | 71.79 ± 1.93 | 71.79 ± 6.91 |
Sailor2 8B SAIL | 60.13 ± 0.17 | 40.48 ± 1.37 | 40.48 ± 8.36 |
Llama 3 70B Meta | 60.08 ± 0.27 | 29.88 ± 1.05 | 29.88 ± 7.87 |
Sailor2 20B SAIL | 59.46 ± 0.23 | 40.12 ± 1.08 | 40.12 ± 8.25 |
Mistral Small 3.1 2503 24B Mistral AI | 58.60 ± 1.46 | 60.12 ± 2.27 | 60.12 ± 6.83 |
Command A 03-2025 111B CohereLabs | 58.17 ± 1.66 | 76.07 ± 1.51 | 76.07 ± 6.63 |
ERNIE 4.5 21B MoE Baidu | 57.79 ± 2.05 | 73.45 ± 0.96 | 73.45 ± 6.78 |
Aya Expanse 32B CohereLabs | 57.63 ± 0.35 | 62.50 ± 1.45 | 62.50 ± 8.04 |
Command R+ 08-2024 104B CohereLabs | 55.84 ± 0.68 | 54.17 ± 1.47 | 54.17 ± 6.91 |
Llama 3.1 8B Meta | 52.90 ± 0.58 | 57.14 ± 1.22 | 57.14 ± 7.91 |
Qwen 2.5 7B Alibaba | 50.88 ± 0.27 | 53.33 ± 2.34 | 53.33 ± 7.96 |
Command R 08-2024 32B CohereLabs | 50.45 ± 1.54 | 51.55 ± 2.63 | 51.55 ± 6.91 |
Olmo 2 1124 13B AI2 | 48.53 ± 0.91 | 61.67 ± 2.95 | 61.67 ± 7.21 |
Tulu 3 8B AI2 | 48.02 ± 0.94 | 66.31 ± 1.80 | 66.31 ± 7.71 |
Llama 3 8B Meta | 44.85 ± 0.46 | 19.64 ± 1.54 | 19.64 ± 6.64 |
Aya Expanse 8B CohereLabs | 44.54 ± 0.42 | 42.86 ± 1.22 | 42.86 ± 8.15 |
Babel 83B Alibaba-DAMO | 44.11 ± 6.04 | 39.29 ± 3.52 | 39.29 ± 5.75 |
SeaLLMs V3 7B Alibaba-DAMO | 42.29 ± 2.33 | 42.62 ± 2.99 | 42.62 ± 7.47 |
Babel 9B Alibaba-DAMO | 42.18 ± 2.86 | 43.57 ± 3.70 | 43.57 ± 6.94 |
Command R7B 12-2024 7B CohereLabs | 40.99 ± 1.25 | 44.29 ± 3.14 | 44.29 ± 6.33 |
Ministral 2410 8B Mistral AI | 36.21 ± 1.39 | 21.43 ± 2.66 | 21.43 ± 4.71 |
phi-4 14B Microsoft | 35.82 ± 1.58 | 54.29 ± 2.99 | 54.29 ± 6.88 |
Olmo 2 1124 7B AI2 | 28.95 ± 2.15 | 46.31 ± 2.82 | 46.31 ± 7.34 |
Model | TL (Overall) | Multi-Turn Chat | SEA-MT-Bench |
---|---|---|---|
SEA-LION v4 27B AISG | 74.53 ± 0.28 | 47.36 ± 2.13 | 47.36 ± 5.47 |
Gemma 3 27B | 74.09 ± 0.12 | 44.83 ± 1.52 | 44.83 ± 5.38 |
SEA-LION v3 (Llama) 70B AISG | 72.84 ± 0.46 | 26.51 ± 1.92 | 26.51 ± 4.50 |
Gemma 3 12B | 72.02 ± 0.31 | 42.19 ± 1.58 | 42.19 ± 5.51 |
Llama 3.3 70B Meta | 70.26 ± 0.29 | 14.87 ± 1.36 | 14.87 ± 4.54 |
Qwen 3 30B MoE Alibaba | 70.06 ± 0.17 | 42.03 ± 0.94 | 42.03 ± 5.61 |
Tulu 3 70B AI2 | 69.96 ± 0.54 | 27.48 ± 1.49 | 27.48 ± 4.72 |
Llama 4 Scout 109B MoE Meta | 69.94 ± 0.21 | 22.25 ± 1.09 | 22.25 ± 4.60 |
Qwen 3 32B Alibaba | 69.72 ± 0.18 | 34.21 ± 1.74 | 34.21 ± 5.33 |
Qwen 2.5 72B Alibaba | 69.65 ± 0.35 | 29.09 ± 0.87 | 29.09 ± 4.84 |
Llama 3.1 70B Meta | 69.03 ± 0.36 | 12.07 ± 1.36 | 12.07 ± 3.46 |
SEA-LION v3 (Gemma 2) 9B AISG | 68.43 ± 0.50 | 22.79 ± 1.24 | 22.79 ± 4.41 |
Gemma 2 27B | 68.03 ± 0.32 | 15.89 ± 1.54 | 15.89 ± 3.69 |
Mistral Large 2411 123B Mistral AI | 66.60 ± 0.44 | 16.92 ± 1.40 | 16.92 ± 3.71 |
Qwen 3 14B Alibaba | 65.76 ± 0.21 | 23.17 ± 1.04 | 23.17 ± 4.80 |
Qwen 2.5 32B Alibaba | 64.68 ± 0.40 | 14.82 ± 1.32 | 14.82 ± 4.22 |
Gemma 2 9B | 63.06 ± 0.65 | 10.94 ± 1.23 | 10.94 ± 3.41 |
MERaLiON 2 10B A*STAR | 61.98 ± 0.76 | 9.00 ± 0.99 | 9.00 ± 3.21 |
Olmo 2 0325 32B AI2 | 61.97 ± 0.50 | 10.83 ± 0.81 | 10.83 ± 3.45 |
Qwen 2.5 14B Alibaba | 60.86 ± 0.34 | 10.56 ± 0.97 | 10.56 ± 3.36 |
Qwen 3 8B Alibaba | 60.81 ± 0.12 | 18.37 ± 1.13 | 18.37 ± 4.68 |
SEA-LION v3 (Llama) 8B AISG | 60.38 ± 0.38 | 19.77 ± 1.39 | 19.77 ± 4.29 |
Sailor2 8B SAIL | 60.13 ± 0.17 | 25.32 ± 0.90 | 25.32 ± 4.39 |
Llama 3 70B Meta | 60.08 ± 0.27 | 10.24 ± 0.95 | 10.24 ± 3.32 |
Sailor2 20B SAIL | 59.46 ± 0.23 | 24.84 ± 1.24 | 24.84 ± 4.09 |
Mistral Small 3.1 2503 24B Mistral AI | 58.60 ± 1.46 | 9.38 ± 1.07 | 9.38 ± 3.20 |
Command A 03-2025 111B CohereLabs | 58.17 ± 1.66 | 35.56 ± 1.80 | 35.56 ± 4.92 |
ERNIE 4.5 21B MoE Baidu | 57.79 ± 2.05 | 22.68 ± 1.16 | 22.68 ± 4.73 |
Aya Expanse 32B CohereLabs | 57.63 ± 0.35 | 8.24 ± 0.72 | 8.24 ± 3.18 |
Command R+ 08-2024 104B CohereLabs | 55.84 ± 0.68 | 4.69 ± 1.23 | 4.69 ± 2.44 |
Llama 3.1 8B Meta | 52.90 ± 0.58 | 5.28 ± 0.90 | 5.28 ± 2.36 |
Qwen 2.5 7B Alibaba | 50.88 ± 0.27 | 6.79 ± 0.47 | 6.79 ± 3.03 |
Command R 08-2024 32B CohereLabs | 50.45 ± 1.54 | 3.07 ± 0.46 | 3.07 ± 1.99 |
Olmo 2 1124 13B AI2 | 48.53 ± 0.91 | 5.23 ± 0.56 | 5.23 ± 2.40 |
Tulu 3 8B AI2 | 48.02 ± 0.94 | 8.19 ± 1.36 | 8.19 ± 3.17 |
Llama 3 8B Meta | 44.85 ± 0.46 | 3.77 ± 0.76 | 3.77 ± 1.64 |
Aya Expanse 8B CohereLabs | 44.54 ± 0.42 | 2.69 ± 0.76 | 2.69 ± 1.88 |
Babel 83B Alibaba-DAMO | 44.11 ± 6.04 | 2.96 ± 0.46 | 2.96 ± 1.37 |
SeaLLMs V3 7B Alibaba-DAMO | 42.29 ± 2.33 | 6.95 ± 0.74 | 6.95 ± 2.40 |
Babel 9B Alibaba-DAMO | 42.18 ± 2.86 | 3.18 ± 0.93 | 3.18 ± 1.68 |
Command R7B 12-2024 7B CohereLabs | 40.99 ± 1.25 | 1.51 ± 0.48 | 1.51 ± 1.04 |
Ministral 2410 8B Mistral AI | 36.21 ± 1.39 | 1.89 ± 0.53 | 1.89 ± 1.31 |
phi-4 14B Microsoft | 35.82 ± 1.58 | 12.82 ± 1.01 | 12.82 ± 4.06 |
Olmo 2 1124 7B AI2 | 28.95 ± 2.15 | 1.89 ± 0.60 | 1.89 ± 1.36 |
Model | TL (Overall) | NLG | Summarization | Translations |
---|---|---|---|---|
SEA-LION v4 27B AISG | 74.53 ± 0.28 | 55.60 ± 0.09 | 19.94 ± 1.11 | 91.25 ± 0.04 |
Gemma 3 27B | 74.09 ± 0.12 | 55.56 ± 0.15 | 19.74 ± 1.07 | 91.39 ± 0.05 |
SEA-LION v3 (Llama) 70B AISG | 72.84 ± 0.46 | 57.63 ± 0.15 | 25.41 ± 1.51 | 89.85 ± 0.05 |
Gemma 3 12B | 72.02 ± 0.31 | 54.73 ± 0.08 | 19.23 ± 1.05 | 90.23 ± 0.05 |
Llama 3.3 70B Meta | 70.26 ± 0.29 | 55.53 ± 0.10 | 24.30 ± 1.63 | 86.76 ± 0.12 |
Qwen 3 30B MoE Alibaba | 70.06 ± 0.17 | 50.65 ± 0.07 | 18.22 ± 1.17 | 83.08 ± 0.08 |
Tulu 3 70B AI2 | 69.96 ± 0.54 | 53.46 ± 0.24 | 18.28 ± 1.06 | 88.64 ± 0.07 |
Llama 4 Scout 109B MoE Meta | 69.94 ± 0.21 | 54.96 ± 0.09 | 21.24 ± 1.15 | 88.67 ± 0.06 |
Qwen 3 32B Alibaba | 69.72 ± 0.18 | 51.44 ± 0.13 | 21.25 ± 1.28 | 81.63 ± 0.15 |
Qwen 2.5 72B Alibaba | 69.65 ± 0.35 | 49.46 ± 0.11 | 18.05 ± 1.08 | 80.87 ± 0.09 |
Llama 3.1 70B Meta | 69.03 ± 0.36 | 57.03 ± 0.24 | 26.61 ± 1.59 | 87.44 ± 0.16 |
SEA-LION v3 (Gemma 2) 9B AISG | 68.43 ± 0.50 | 54.58 ± 0.12 | 21.22 ± 1.06 | 87.94 ± 0.09 |
Gemma 2 27B | 68.03 ± 0.32 | 55.99 ± 0.29 | 23.34 ± 1.27 | 88.64 ± 0.15 |
Mistral Large 2411 123B Mistral AI | 66.60 ± 0.44 | 53.95 ± 0.26 | 22.67 ± 1.27 | 85.22 ± 0.30 |
Qwen 3 14B Alibaba | 65.76 ± 0.21 | 50.19 ± 0.11 | 20.05 ± 1.20 | 80.33 ± 0.15 |
Qwen 2.5 32B Alibaba | 64.68 ± 0.40 | 46.44 ± 0.13 | 20.44 ± 1.34 | 72.44 ± 0.11 |
Gemma 2 9B | 63.06 ± 0.65 | 52.58 ± 0.35 | 21.53 ± 1.05 | 83.63 ± 0.76 |
MERaLiON 2 10B A*STAR | 61.98 ± 0.76 | 52.10 ± 0.22 | 22.04 ± 1.13 | 82.15 ± 0.40 |
Olmo 2 0325 32B AI2 | 61.97 ± 0.50 | 46.45 ± 1.48 | 14.63 ± 0.89 | 78.26 ± 0.78 |
Qwen 2.5 14B Alibaba | 60.86 ± 0.34 | 43.47 ± 0.19 | 18.28 ± 0.94 | 68.67 ± 0.23 |
Qwen 3 8B Alibaba | 60.81 ± 0.12 | 47.09 ± 0.08 | 19.99 ± 1.00 | 74.19 ± 0.15 |
SEA-LION v3 (Llama) 8B AISG | 60.38 ± 0.38 | 53.24 ± 0.54 | 20.38 ± 1.29 | 86.11 ± 0.22 |
Sailor2 8B SAIL | 60.13 ± 0.17 | 50.26 ± 0.54 | 18.60 ± 0.91 | 81.92 ± 1.05 |
Llama 3 70B Meta | 60.08 ± 0.27 | 55.44 ± 0.12 | 24.65 ± 1.59 | 86.24 ± 0.10 |
Sailor2 20B SAIL | 59.46 ± 0.23 | 25.88 ± 1.49 | 17.44 ± 0.93 | 34.31 ± 2.91 |
Mistral Small 3.1 2503 24B Mistral AI | 58.60 ± 1.46 | 46.43 ± 0.51 | 18.98 ± 0.90 | 73.88 ± 0.95 |
Command A 03-2025 111B CohereLabs | 58.17 ± 1.66 | 52.43 ± 0.12 | 19.57 ± 1.03 | 85.29 ± 0.17 |
ERNIE 4.5 21B MoE Baidu | 57.79 ± 2.05 | 51.96 ± 0.12 | 16.60 ± 0.91 | 87.32 ± 0.12 |
Aya Expanse 32B CohereLabs | 57.63 ± 0.35 | 44.62 ± 0.24 | 18.16 ± 0.83 | 71.07 ± 0.40 |
Command R+ 08-2024 104B CohereLabs | 55.84 ± 0.68 | 48.74 ± 1.49 | 23.47 ± 0.93 | 74.02 ± 2.91 |
Llama 3.1 8B Meta | 52.90 ± 0.58 | 51.87 ± 0.35 | 25.59 ± 1.41 | 78.16 ± 0.21 |
Qwen 2.5 7B Alibaba | 50.88 ± 0.27 | 32.20 ± 0.63 | 7.67 ± 1.37 | 56.74 ± 0.22 |
Command R 08-2024 32B CohereLabs | 50.45 ± 1.54 | 40.71 ± 0.40 | 15.36 ± 0.63 | 66.05 ± 0.87 |
Olmo 2 1124 13B AI2 | 48.53 ± 0.91 | 33.21 ± 2.37 | 6.50 ± 0.44 | 59.91 ± 2.33 |
Tulu 3 8B AI2 | 48.02 ± 0.94 | 23.91 ± 1.08 | 15.03 ± 0.72 | 32.78 ± 2.03 |
Llama 3 8B Meta | 44.85 ± 0.46 | 48.43 ± 0.54 | 22.63 ± 1.36 | 74.22 ± 1.30 |
Aya Expanse 8B CohereLabs | 44.54 ± 0.42 | 33.87 ± 0.46 | 14.28 ± 0.74 | 53.47 ± 0.38 |
Babel 83B Alibaba-DAMO | 44.11 ± 6.04 | 32.21 ± 3.75 | 15.05 ± 0.74 | 49.37 ± 7.48 |
SeaLLMs V3 7B Alibaba-DAMO | 42.29 ± 2.33 | 40.18 ± 1.80 | 15.98 ± 0.93 | 64.37 ± 3.23 |
Babel 9B Alibaba-DAMO | 42.18 ± 2.86 | 46.40 ± 1.01 | 16.60 ± 0.86 | 76.20 ± 1.89 |
Command R7B 12-2024 7B CohereLabs | 40.99 ± 1.25 | 27.51 ± 0.95 | 13.75 ± 0.66 | 41.28 ± 1.60 |
Ministral 2410 8B Mistral AI | 36.21 ± 1.39 | 25.36 ± 1.86 | 12.27 ± 0.65 | 38.46 ± 2.78 |
phi-4 14B Microsoft | 35.82 ± 1.58 | 36.24 ± 0.93 | 14.34 ± 0.68 | 58.14 ± 1.75 |
Olmo 2 1124 7B AI2 | 28.95 ± 2.15 | 34.91 ± 1.08 | 14.07 ± 0.66 | 55.75 ± 2.03 |
Model | TL (Overall) | NLR | Causal Reasoning | Natural Language Inference |
---|---|---|---|---|
SEA-LION v4 27B AISG | 74.53 ± 0.28 | 82.35 ± 0.39 | 94.31 ± 2.19 | 70.40 ± 3.57 |
Gemma 3 27B | 74.09 ± 0.12 | 82.16 ± 0.22 | 94.22 ± 2.24 | 70.10 ± 3.60 |
SEA-LION v3 (Llama) 70B AISG | 72.84 ± 0.46 | 83.08 ± 1.11 | 93.13 ± 2.12 | 73.04 ± 2.85 |
Gemma 3 12B | 72.02 ± 0.31 | 81.61 ± 0.22 | 93.34 ± 2.39 | 69.88 ± 3.51 |
Llama 3.3 70B Meta | 70.26 ± 0.29 | 83.26 ± 0.12 | 93.78 ± 2.33 | 72.73 ± 3.46 |
Qwen 3 30B MoE Alibaba | 70.06 ± 0.17 | 79.11 ± 0.07 | 89.47 ± 2.98 | 68.75 ± 3.61 |
Tulu 3 70B AI2 | 69.96 ± 0.54 | 84.28 ± 0.47 | 93.41 ± 2.21 | 75.15 ± 3.14 |
Llama 4 Scout 109B MoE Meta | 69.94 ± 0.21 | 76.42 ± 0.10 | 93.53 ± 2.38 | 59.31 ± 3.91 |
Qwen 3 32B Alibaba | 69.72 ± 0.18 | 80.36 ± 0.64 | 87.31 ± 3.09 | 73.42 ± 3.31 |
Qwen 2.5 72B Alibaba | 69.65 ± 0.35 | 79.96 ± 1.06 | 92.09 ± 2.59 | 67.83 ± 3.52 |
Llama 3.1 70B Meta | 69.03 ± 0.36 | 82.42 ± 1.00 | 93.25 ± 2.15 | 71.58 ± 3.20 |
SEA-LION v3 (Gemma 2) 9B AISG | 68.43 ± 0.50 | 78.66 ± 1.38 | 93.63 ± 2.18 | 63.69 ± 3.47 |
Gemma 2 27B | 68.03 ± 0.32 | 80.13 ± 1.33 | 92.88 ± 2.41 | 67.38 ± 3.12 |
Mistral Large 2411 123B Mistral AI | 66.60 ± 0.44 | 79.56 ± 1.58 | 92.13 ± 2.17 | 67.00 ± 2.93 |
Qwen 3 14B Alibaba | 65.76 ± 0.21 | 76.36 ± 0.45 | 85.81 ± 3.29 | 66.92 ± 3.67 |
Qwen 2.5 32B Alibaba | 64.68 ± 0.40 | 75.39 ± 0.31 | 85.00 ± 3.47 | 65.77 ± 3.69 |
Gemma 2 9B | 63.06 ± 0.65 | 75.22 ± 0.94 | 92.25 ± 2.39 | 58.19 ± 3.67 |
MERaLiON 2 10B A*STAR | 61.98 ± 0.76 | 76.19 ± 1.54 | 92.81 ± 2.29 | 59.56 ± 3.22 |
Olmo 2 0325 32B AI2 | 61.97 ± 0.50 | 72.42 ± 1.35 | 86.28 ± 2.62 | 58.56 ± 3.50 |
Qwen 2.5 14B Alibaba | 60.86 ± 0.34 | 70.64 ± 0.16 | 83.72 ± 3.58 | 57.56 ± 3.89 |
Qwen 3 8B Alibaba | 60.81 ± 0.12 | 64.02 ± 0.56 | 77.47 ± 3.79 | 50.56 ± 3.85 |
SEA-LION v3 (Llama) 8B AISG | 60.38 ± 0.38 | 71.78 ± 1.07 | 84.88 ± 2.64 | 58.69 ± 2.61 |
Sailor2 8B SAIL | 60.13 ± 0.17 | 79.48 ± 0.49 | 91.78 ± 2.61 | 67.19 ± 3.39 |
Llama 3 70B Meta | 60.08 ± 0.27 | 79.25 ± 0.28 | 90.13 ± 2.82 | 68.38 ± 3.60 |
Sailor2 20B SAIL | 59.46 ± 0.23 | 81.44 ± 0.24 | 91.59 ± 2.70 | 71.29 ± 3.56 |
Mistral Small 3.1 2503 24B Mistral AI | 58.60 ± 1.46 | 70.95 ± 5.21 | 78.03 ± 2.93 | 63.88 ± 3.08 |
Command A 03-2025 111B CohereLabs | 58.17 ± 1.66 | 58.35 ± 1.87 | 94.41 ± 2.11 | 22.29 ± 2.17 |
ERNIE 4.5 21B MoE Baidu | 57.79 ± 2.05 | 73.27 ± 2.67 | 85.75 ± 2.98 | 60.79 ± 3.37 |
Aya Expanse 32B CohereLabs | 57.63 ± 0.35 | 71.80 ± 0.44 | 84.91 ± 3.39 | 58.69 ± 3.68 |
Command R+ 08-2024 104B CohereLabs | 55.84 ± 0.68 | 69.70 ± 2.24 | 84.88 ± 3.03 | 54.52 ± 2.78 |
Llama 3.1 8B Meta | 52.90 ± 0.58 | 60.54 ± 3.15 | 72.53 ± 3.23 | 48.54 ± 2.54 |
Qwen 2.5 7B Alibaba | 50.88 ± 0.27 | 63.48 ± 0.43 | 68.13 ± 4.51 | 58.83 ± 3.78 |
Command R 08-2024 32B CohereLabs | 50.45 ± 1.54 | 62.35 ± 3.90 | 78.63 ± 2.60 | 46.08 ± 2.08 |
Olmo 2 1124 13B AI2 | 48.53 ± 0.91 | 53.56 ± 3.71 | 65.09 ± 3.16 | 42.02 ± 3.27 |
Tulu 3 8B AI2 | 48.02 ± 0.94 | 62.74 ± 2.74 | 75.84 ± 3.57 | 49.65 ± 2.30 |
Llama 3 8B Meta | 44.85 ± 0.46 | 56.78 ± 1.38 | 67.91 ± 3.80 | 45.65 ± 3.39 |
Aya Expanse 8B CohereLabs | 44.54 ± 0.42 | 55.47 ± 0.28 | 66.34 ± 4.55 | 44.60 ± 3.65 |
Babel 83B Alibaba-DAMO | 44.11 ± 6.04 | 60.97 ± 7.68 | 76.78 ± 2.18 | 45.17 ± 2.02 |
SeaLLMs V3 7B Alibaba-DAMO | 42.29 ± 2.33 | 50.05 ± 4.27 | 64.97 ± 3.09 | 35.13 ± 1.44 |
Babel 9B Alibaba-DAMO | 42.18 ± 2.86 | 48.68 ± 7.53 | 69.78 ± 2.69 | 27.58 ± 1.42 |
Command R7B 12-2024 7B CohereLabs | 40.99 ± 1.25 | 47.62 ± 2.53 | 60.53 ± 3.26 | 34.71 ± 1.87 |
Ministral 2410 8B Mistral AI | 36.21 ± 1.39 | 45.83 ± 4.38 | 56.84 ± 2.45 | 34.81 ± 1.70 |
phi-4 14B Microsoft | 35.82 ± 1.58 | 38.84 ± 1.49 | 76.91 ± 3.14 | 0.77 ± 0.32 |
Olmo 2 1124 7B AI2 | 28.95 ± 2.15 | 36.04 ± 5.52 | 52.91 ± 2.83 | 19.17 ± 0.96 |
Model | TL (Overall) | NLU | Belebele QA | Sentiment Analysis |
---|---|---|---|---|
SEA-LION v4 27B AISG | 74.53 ± 0.28 | 79.50 ± 0.30 | 86.63 ± 6.44 | 72.38 ± 3.53 |
Gemma 3 27B | 74.09 ± 0.12 | 79.70 ± 0.34 | 87.00 ± 6.49 | 72.40 ± 3.52 |
SEA-LION v3 (Llama) 70B AISG | 72.84 ± 0.46 | 77.63 ± 1.29 | 86.13 ± 6.08 | 69.13 ± 3.46 |
Gemma 3 12B | 72.02 ± 0.31 | 77.26 ± 0.46 | 83.38 ± 7.19 | 71.15 ± 3.58 |
Llama 3.3 70B Meta | 70.26 ± 0.29 | 78.50 ± 0.39 | 87.25 ± 6.35 | 69.75 ± 3.61 |
Qwen 3 30B MoE Alibaba | 70.06 ± 0.17 | 78.52 ± 0.28 | 88.00 ± 6.17 | 69.04 ± 3.66 |
Tulu 3 70B AI2 | 69.96 ± 0.54 | 81.78 ± 0.86 | 88.00 ± 5.90 | 75.56 ± 3.09 |
Llama 4 Scout 109B MoE Meta | 69.94 ± 0.21 | 76.78 ± 0.14 | 83.88 ± 7.18 | 69.69 ± 3.65 |
Qwen 3 32B Alibaba | 69.72 ± 0.18 | 80.84 ± 0.21 | 83.38 ± 7.11 | 78.31 ± 3.07 |
Qwen 2.5 72B Alibaba | 69.65 ± 0.35 | 80.88 ± 0.31 | 83.88 ± 7.03 | 77.88 ± 3.20 |
Llama 3.1 70B Meta | 69.03 ± 0.36 | 77.50 ± 1.09 | 86.00 ± 6.32 | 69.00 ± 3.54 |
SEA-LION v3 (Gemma 2) 9B AISG | 68.43 ± 0.50 | 80.22 ± 1.80 | 85.63 ± 6.54 | 74.81 ± 3.04 |
Gemma 2 27B | 68.03 ± 0.32 | 80.06 ± 1.16 | 87.25 ± 6.13 | 72.88 ± 3.33 |
Mistral Large 2411 123B Mistral AI | 66.60 ± 0.44 | 77.58 ± 2.06 | 87.50 ± 5.83 | 67.67 ± 3.23 |
Qwen 3 14B Alibaba | 65.76 ± 0.21 | 79.61 ± 0.23 | 82.75 ± 7.22 | 76.48 ± 3.27 |
Qwen 2.5 32B Alibaba | 64.68 ± 0.40 | 78.20 ± 0.42 | 84.13 ± 7.14 | 72.27 ± 3.44 |
Gemma 2 9B | 63.06 ± 0.65 | 74.53 ± 5.09 | 83.00 ± 7.15 | 66.06 ± 2.96 |
MERaLiON 2 10B A*STAR | 61.98 ± 0.76 | 75.43 ± 2.85 | 81.25 ± 7.25 | 69.60 ± 3.16 |
Olmo 2 0325 32B AI2 | 61.97 ± 0.50 | 78.58 ± 2.66 | 84.50 ± 6.27 | 72.67 ± 2.80 |
Qwen 2.5 14B Alibaba | 60.86 ± 0.34 | 77.69 ± 0.39 | 81.88 ± 7.38 | 73.50 ± 3.45 |
Qwen 3 8B Alibaba | 60.81 ± 0.12 | 75.43 ± 0.42 | 76.75 ± 8.24 | 74.10 ± 3.38 |
SEA-LION v3 (Llama) 8B AISG | 60.38 ± 0.38 | 71.72 ± 1.89 | 78.38 ± 7.30 | 65.06 ± 3.47 |
Sailor2 8B SAIL | 60.13 ± 0.17 | 76.19 ± 0.42 | 80.13 ± 7.30 | 72.25 ± 3.46 |
Llama 3 70B Meta | 60.08 ± 0.27 | 75.72 ± 0.28 | 84.13 ± 7.00 | 67.31 ± 3.70 |
Sailor2 20B SAIL | 59.46 ± 0.23 | 82.07 ± 0.28 | 86.13 ± 6.60 | 78.02 ± 3.19 |
Mistral Small 3.1 2503 24B Mistral AI | 58.60 ± 1.46 | 75.98 ± 2.09 | 83.75 ± 6.66 | 68.21 ± 2.98 |
Command A 03-2025 111B CohereLabs | 58.17 ± 1.66 | 16.55 ± 5.99 | 24.13 ± 7.05 | 8.98 ± 1.56 |
ERNIE 4.5 21B MoE Baidu | 57.79 ± 2.05 | 61.64 ± 5.18 | 67.88 ± 7.61 | 55.40 ± 3.27 |
Aya Expanse 32B CohereLabs | 57.63 ± 0.35 | 71.56 ± 0.77 | 75.13 ± 8.08 | 68.00 ± 3.60 |
Command R+ 08-2024 104B CohereLabs | 55.84 ± 0.68 | 72.44 ± 0.89 | 73.75 ± 7.28 | 71.13 ± 3.38 |
Llama 3.1 8B Meta | 52.90 ± 0.58 | 67.45 ± 0.71 | 73.63 ± 7.94 | 61.27 ± 3.64 |
Qwen 2.5 7B Alibaba | 50.88 ± 0.27 | 69.98 ± 0.27 | 71.38 ± 8.81 | 68.58 ± 3.63 |
Command R 08-2024 32B CohereLabs | 50.45 ± 1.54 | 61.35 ± 1.92 | 60.38 ± 6.91 | 62.33 ± 3.79 |
Olmo 2 1124 13B AI2 | 48.53 ± 0.91 | 53.28 ± 5.15 | 63.50 ± 8.53 | 43.06 ± 2.46 |
Tulu 3 8B AI2 | 48.02 ± 0.94 | 49.66 ± 4.83 | 66.00 ± 8.70 | 33.31 ± 2.54 |
Llama 3 8B Meta | 44.85 ± 0.46 | 64.51 ± 0.50 | 69.25 ± 8.56 | 59.77 ± 3.82 |
Aya Expanse 8B CohereLabs | 44.54 ± 0.42 | 59.64 ± 1.09 | 57.38 ± 9.48 | 61.90 ± 3.68 |
Babel 83B Alibaba-DAMO | 44.11 ± 6.04 | 50.47 ± 14.65 | 61.75 ± 5.93 | 39.19 ± 1.58 |
SeaLLMs V3 7B Alibaba-DAMO | 42.29 ± 2.33 | 53.92 ± 8.80 | 56.75 ± 6.75 | 51.08 ± 2.42 |
Babel 9B Alibaba-DAMO | 42.18 ± 2.86 | 68.76 ± 2.43 | 75.75 ± 7.26 | 61.77 ± 2.65 |
Command R7B 12-2024 7B CohereLabs | 40.99 ± 1.25 | 54.66 ± 2.54 | 51.38 ± 7.32 | 57.94 ± 3.58 |
Ministral 2410 8B Mistral AI | 36.21 ± 1.39 | 46.64 ± 5.33 | 47.00 ± 7.27 | 46.27 ± 2.52 |
phi-4 14B Microsoft | 35.82 ± 1.58 | 9.17 ± 3.95 | 16.75 ± 5.33 | 1.58 ± 0.68 |
Olmo 2 1124 7B AI2 | 28.95 ± 2.15 | 50.30 ± 4.48 | 45.75 ± 7.36 | 54.85 ± 2.71 |
Model | TL (Overall) | Safety | Toxicity Detection |
---|---|---|---|
SEA-LION v4 27B AISG | 74.53 ± 0.28 | 81.44 ± 0.47 | 81.44 ± 3.61 |
Gemma 3 27B | 74.09 ± 0.12 | 81.66 ± 0.23 | 81.66 ± 3.66 |
SEA-LION v3 (Llama) 70B AISG | 72.84 ± 0.46 | 76.81 ± 2.24 | 76.81 ± 3.34 |
Gemma 3 12B | 72.02 ± 0.31 | 78.03 ± 0.49 | 78.03 ± 3.83 |
Llama 3.3 70B Meta | 70.26 ± 0.29 | 74.88 ± 0.85 | 74.88 ± 4.09 |
Qwen 3 30B MoE Alibaba | 70.06 ± 0.17 | 66.72 ± 0.06 | 66.72 ± 4.60 |
Tulu 3 70B AI2 | 69.96 ± 0.54 | 73.41 ± 3.19 | 73.41 ± 3.95 |
Llama 4 Scout 109B MoE Meta | 69.94 ± 0.21 | 69.41 ± 0.09 | 69.41 ± 4.50 |
Qwen 3 32B Alibaba | 69.72 ± 0.18 | 76.19 ± 0.59 | 76.19 ± 4.00 |
Qwen 2.5 72B Alibaba | 69.65 ± 0.35 | 71.88 ± 0.35 | 71.88 ± 4.26 |
Llama 3.1 70B Meta | 69.03 ± 0.36 | 75.19 ± 1.52 | 75.19 ± 3.72 |
SEA-LION v3 (Gemma 2) 9B AISG | 68.43 ± 0.50 | 74.63 ± 0.48 | 74.63 ± 4.09 |
Gemma 2 27B | 68.03 ± 0.32 | 79.50 ± 0.57 | 79.50 ± 3.76 |
Mistral Large 2411 123B Mistral AI | 66.60 ± 0.44 | 75.41 ± 0.73 | 75.41 ± 3.52 |
Qwen 3 14B Alibaba | 65.76 ± 0.21 | 73.44 ± 0.36 | 73.44 ± 4.27 |
Qwen 2.5 32B Alibaba | 64.68 ± 0.40 | 73.34 ± 0.35 | 73.34 ± 4.25 |
Gemma 2 9B | 63.06 ± 0.65 | 68.44 ± 0.53 | 68.44 ± 4.49 |
MERaLiON 2 10B A*STAR | 61.98 ± 0.76 | 64.31 ± 0.70 | 64.31 ± 4.62 |
Olmo 2 0325 32B AI2 | 61.97 ± 0.50 | 71.91 ± 2.19 | 71.91 ± 3.76 |
Qwen 2.5 14B Alibaba | 60.86 ± 0.34 | 72.22 ± 0.36 | 72.22 ± 4.28 |
Qwen 3 8B Alibaba | 60.81 ± 0.12 | 71.16 ± 0.69 | 71.16 ± 4.33 |
SEA-LION v3 (Llama) 8B AISG | 60.38 ± 0.38 | 59.91 ± 2.13 | 59.91 ± 4.59 |
Sailor2 8B SAIL | 60.13 ± 0.17 | 72.47 ± 0.39 | 72.47 ± 4.29 |
Llama 3 70B Meta | 60.08 ± 0.27 | 74.22 ± 0.28 | 74.22 ± 4.04 |
Sailor2 20B SAIL | 59.46 ± 0.23 | 74.31 ± 0.18 | 74.31 ± 4.20 |
Mistral Small 3.1 2503 24B Mistral AI | 58.60 ± 1.46 | 60.06 ± 10.26 | 60.06 ± 2.91 |
Command A 03-2025 111B CohereLabs | 58.17 ± 1.66 | 76.19 ± 1.08 | 76.19 ± 3.84 |
ERNIE 4.5 21B MoE Baidu | 57.79 ± 2.05 | 66.16 ± 0.77 | 66.16 ± 4.55 |
Aya Expanse 32B CohereLabs | 57.63 ± 0.35 | 72.28 ± 0.89 | 72.28 ± 4.20 |
Command R+ 08-2024 104B CohereLabs | 55.84 ± 0.68 | 66.44 ± 5.48 | 66.44 ± 3.49 |
Llama 3.1 8B Meta | 52.90 ± 0.58 | 59.53 ± 3.16 | 59.53 ± 4.43 |
Qwen 2.5 7B Alibaba | 50.88 ± 0.27 | 64.66 ± 0.24 | 64.66 ± 4.66 |
Command R 08-2024 32B CohereLabs | 50.45 ± 1.54 | 67.41 ± 1.12 | 67.41 ± 3.50 |
Olmo 2 1124 13B AI2 | 48.53 ± 0.91 | 72.34 ± 2.29 | 72.34 ± 3.33 |
Tulu 3 8B AI2 | 48.02 ± 0.94 | 61.56 ± 2.09 | 61.56 ± 4.38 |
Llama 3 8B Meta | 44.85 ± 0.46 | 54.16 ± 0.93 | 54.16 ± 4.74 |
Aya Expanse 8B CohereLabs | 44.54 ± 0.42 | 58.50 ± 1.06 | 58.50 ± 4.58 |
Babel 83B Alibaba-DAMO | 44.11 ± 6.04 | 62.75 ± 12.26 | 62.75 ± 3.08 |
SeaLLMs V3 7B Alibaba-DAMO | 42.29 ± 2.33 | 56.41 ± 2.26 | 56.41 ± 2.93 |
Babel 9B Alibaba-DAMO | 42.18 ± 2.86 | 62.25 ± 1.16 | 62.25 ± 4.52 |
Command R7B 12-2024 7B CohereLabs | 40.99 ± 1.25 | 62.81 ± 3.59 | 62.81 ± 2.75 |
Ministral 2410 8B Mistral AI | 36.21 ± 1.39 | 55.09 ± 3.31 | 55.09 ± 1.39 |
phi-4 14B Microsoft | 35.82 ± 1.58 | 5.22 ± 3.59 | 5.22 ± 1.53 |
Olmo 2 1124 7B AI2 | 28.95 ± 2.15 | 43.88 ± 11.66 | 43.88 ± 4.26 |
Model | TL (Overall) | Knowledge | Global MMLU Lite |
---|---|---|---|
SEA-LION v4 27B AISG | 74.53 ± 0.28 | 73.56 ± 0.30 | 73.56 ± 0.30 |
Gemma 3 27B | 74.09 ± 0.12 | 73.91 ± 0.16 | 73.91 ± 0.16 |
SEA-LION v3 (Llama) 70B AISG | 72.84 ± 0.46 | 78.22 ± 0.91 | 78.22 ± 0.91 |
Gemma 3 12B | 72.02 ± 0.31 | 68.56 ± 0.29 | 68.56 ± 0.29 |
Llama 3.3 70B Meta | 70.26 ± 0.29 | 76.19 ± 0.47 | 76.19 ± 0.47 |
Qwen 3 30B MoE Alibaba | 70.06 ± 0.17 | 68.34 ± 0.45 | 68.34 ± 0.45 |
Tulu 3 70B AI2 | 69.96 ± 0.54 | 70.19 ± 0.76 | 70.19 ± 0.76 |
Llama 4 Scout 109B MoE Meta | 69.94 ± 0.21 | 77.00 ± 0.21 | 77.00 ± 0.21 |
Qwen 3 32B Alibaba | 69.72 ± 0.18 | 69.22 ± 0.57 | 69.22 ± 0.57 |
Qwen 2.5 72B Alibaba | 69.65 ± 0.35 | 75.44 ± 0.41 | 75.44 ± 0.41 |
Llama 3.1 70B Meta | 69.03 ± 0.36 | 74.50 ± 0.52 | 74.50 ± 0.52 |
SEA-LION v3 (Gemma 2) 9B AISG | 68.43 ± 0.50 | 63.22 ± 0.47 | 63.22 ± 0.47 |
Gemma 2 27B | 68.03 ± 0.32 | 68.41 ± 0.80 | 68.41 ± 0.80 |
Mistral Large 2411 123B Mistral AI | 66.60 ± 0.44 | 65.84 ± 0.99 | 65.84 ± 0.99 |
Qwen 3 14B Alibaba | 65.76 ± 0.21 | 62.28 ± 0.66 | 62.28 ± 0.66 |
Qwen 2.5 32B Alibaba | 64.68 ± 0.40 | 68.59 ± 0.28 | 68.59 ± 0.28 |
Gemma 2 9B | 63.06 ± 0.65 | 62.72 ± 0.88 | 62.72 ± 0.88 |
MERaLiON 2 10B A*STAR | 61.98 ± 0.76 | 61.78 ± 1.97 | 61.78 ± 1.97 |
Olmo 2 0325 32B AI2 | 61.97 ± 0.50 | 59.97 ± 1.39 | 59.97 ± 1.39 |
Qwen 2.5 14B Alibaba | 60.86 ± 0.34 | 60.25 ± 0.16 | 60.25 ± 0.16 |
Qwen 3 8B Alibaba | 60.81 ± 0.12 | 59.25 ± 0.36 | 59.25 ± 0.36 |
SEA-LION v3 (Llama) 8B AISG | 60.38 ± 0.38 | 57.19 ± 0.30 | 57.19 ± 0.30 |
Sailor2 8B SAIL | 60.13 ± 0.17 | 61.94 ± 0.37 | 61.94 ± 0.37 |
Llama 3 70B Meta | 60.08 ± 0.27 | 68.59 ± 0.46 | 68.59 ± 0.46 |
Sailor2 20B SAIL | 59.46 ± 0.23 | 63.00 ± 0.37 | 63.00 ± 0.37 |
Mistral Small 3.1 2503 24B Mistral AI | 58.60 ± 1.46 | 64.31 ± 1.47 | 64.31 ± 1.47 |
Command A 03-2025 111B CohereLabs | 58.17 ± 1.66 | 70.72 ± 1.03 | 70.72 ± 1.03 |
ERNIE 4.5 21B MoE Baidu | 57.79 ± 2.05 | 59.31 ± 1.47 | 59.31 ± 1.47 |
Aya Expanse 32B CohereLabs | 57.63 ± 0.35 | 55.19 ± 0.63 | 55.19 ± 0.63 |
Command R+ 08-2024 104B CohereLabs | 55.84 ± 0.68 | 53.31 ± 1.52 | 53.31 ± 1.52 |
Llama 3.1 8B Meta | 52.90 ± 0.58 | 49.63 ± 0.57 | 49.63 ± 0.57 |
Qwen 2.5 7B Alibaba | 50.88 ± 0.27 | 52.94 ± 0.15 | 52.94 ± 0.15 |
Command R 08-2024 32B CohereLabs | 50.45 ± 1.54 | 45.09 ± 2.80 | 45.09 ± 2.80 |
Olmo 2 1124 13B AI2 | 48.53 ± 0.91 | 41.69 ± 0.78 | 41.69 ± 0.78 |
Tulu 3 8B AI2 | 48.02 ± 0.94 | 46.41 ± 1.19 | 46.41 ± 1.19 |
Llama 3 8B Meta | 44.85 ± 0.46 | 46.94 ± 1.44 | 46.94 ± 1.44 |
Aya Expanse 8B CohereLabs | 44.54 ± 0.42 | 41.22 ± 3.30 | 41.22 ± 3.30 |
Babel 83B Alibaba-DAMO | 44.11 ± 6.04 | 43.13 ± 14.09 | 43.13 ± 14.09 |
SeaLLMs V3 7B Alibaba-DAMO | 42.29 ± 2.33 | 23.84 ± 7.95 | 23.84 ± 7.95 |
Babel 9B Alibaba-DAMO | 42.18 ± 2.86 | 26.16 ± 6.56 | 26.16 ± 6.56 |
Command R7B 12-2024 7B CohereLabs | 40.99 ± 1.25 | 37.28 ± 2.85 | 37.28 ± 2.85 |
Ministral 2410 8B Mistral AI | 36.21 ± 1.39 | 33.31 ± 3.87 | 33.31 ± 3.87 |
phi-4 14B Microsoft | 35.82 ± 1.58 | 55.63 ± 1.29 | 55.63 ± 1.29 |
Olmo 2 1124 7B AI2 | 28.95 ± 2.15 | 16.28 ± 4.32 | 16.28 ± 4.32 |