Tamil Performance
Tamil Scores by Model
Average of 30 bootstraps; 95% CIs are shown.
Model Size: ≤200B
Open instruct models only
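The bootstrap note above can be made concrete with a minimal sketch. Everything below is an assumption for illustration (hypothetical per-example scores, a simple percentile CI); the leaderboard's exact resampling protocol is not specified on this page.

```python
import random
import statistics

def bootstrap_mean_ci(scores, n_boot=30, alpha=0.05, seed=0):
    """Mean of n_boot bootstrap-resample means, with a percentile CI.

    `scores` is a hypothetical list of per-example scores (0-100);
    this is a sketch, not the leaderboard's actual protocol.
    """
    rng = random.Random(seed)
    boot_means = [
        statistics.fmean(rng.choices(scores, k=len(scores)))  # resample with replacement
        for _ in range(n_boot)
    ]
    boot_means.sort()
    lo = boot_means[int(alpha / 2 * n_boot)]                          # ~2.5th percentile
    hi = boot_means[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]   # ~97.5th percentile
    return statistics.fmean(boot_means), (lo, hi)

rng = random.Random(1)
scores = [rng.uniform(40, 90) for _ in range(200)]  # made-up example scores
mean, (lo, hi) = bootstrap_mean_ci(scores)
```

With only 30 resamples the percentile endpoints are coarse, which is consistent with the small ± intervals reported in the tables below.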
[Bar chart: overall Tamil (TA) scores by model, with 95% CIs; the per-model values are repeated in the TA column of the tables below.]
Tamil Competencies
Model | TA | Instruction Following | Linguistic Diagnostics | Multi-Turn Chat | NLG | NLR | NLU |
---|---|---|---|---|---|---|---|
SEA-LION v4 (Gemma) 27B AISG | 64.43 ± 0.16 | 71.05 ± 0.64 | 71.55 ± 0.48 | 40.52 ± 0.67 | 50.37 ± 0.05 | 71.26 ± 0.16 | 81.86 ± 0.22 |
Gemma 3 27B | 64.36 ± 0.22 | 71.33 ± 0.91 | 71.32 ± 0.33 | 40.19 ± 0.82 | 50.50 ± 0.06 | 71.71 ± 0.12 | 81.08 ± 0.24 |
SEA-LION v4 (Qwen) 32B AISG | 62.30 ± 0.15 | 78.38 ± 0.63 | 71.39 ± 0.26 | 20.96 ± 0.57 | 49.73 ± 0.06 | 69.41 ± 0.11 | 83.94 ± 0.16 |
Qwen 3 Next 80B MoE Alibaba | 60.05 ± 0.13 | 73.71 ± 0.50 | 71.51 ± 0.29 | 23.41 ± 0.57 | 47.63 ± 0.04 | 62.50 ± 0.16 | 81.52 ± 0.08 |
Gemma 3 12B | 59.86 ± 0.21 | 70.67 ± 0.87 | 58.39 ± 0.50 | 36.03 ± 0.56 | 49.59 ± 0.06 | 61.95 ± 0.18 | 82.53 ± 0.22 |
Qwen 3 32B Alibaba | 58.88 ± 0.24 | 76.06 ± 0.71 | 69.40 ± 0.84 | 17.14 ± 0.53 | 44.71 ± 0.08 | 62.93 ± 0.18 | 83.05 ± 0.21 |
Llama 4 Scout 109B MoE Meta | 58.69 ± 0.15 | 80.70 ± 0.62 | 62.58 ± 0.29 | 12.10 ± 0.49 | 50.18 ± 0.07 | 63.61 ± 0.05 | 82.97 ± 0.10 |
SEA-LION v3 (Llama) 70B AISG | 57.99 ± 0.21 | 75.97 ± 0.94 | 60.28 ± 1.06 | 15.69 ± 0.44 | 48.62 ± 0.08 | 64.72 ± 0.31 | 82.68 ± 0.26 |
Qwen 3 30B MoE Alibaba | 56.06 ± 0.16 | 66.38 ± 0.79 | 63.44 ± 0.38 | 19.20 ± 0.44 | 42.10 ± 0.05 | 63.60 ± 0.11 | 81.67 ± 0.09 |
Llama 3.3 70B Meta | 54.74 ± 0.15 | 70.22 ± 0.73 | 55.64 ± 0.42 | 8.43 ± 0.52 | 46.86 ± 0.11 | 65.14 ± 0.18 | 82.12 ± 0.19 |
SEA-LION v3 (Gemma 2) 9B AISG | 53.89 ± 0.21 | 66.83 ± 0.79 | 62.07 ± 0.42 | 17.53 ± 0.54 | 42.47 ± 0.09 | 52.68 ± 0.21 | 81.79 ± 0.23 |
Gemma 2 27B | 53.44 ± 0.23 | 63.24 ± 1.09 | 62.72 ± 0.36 | 7.47 ± 0.30 | 44.45 ± 0.13 | 61.24 ± 0.21 | 81.52 ± 0.29 |
Command A 03-2025 111B CohereLabs | 53.29 ± 0.20 | 68.60 ± 0.98 | 63.64 ± 0.58 | 17.46 ± 0.36 | 47.94 ± 0.06 | 48.83 ± 0.33 | 73.26 ± 0.44 |
Qwen 3 14B Alibaba | 52.30 ± 0.17 | 68.44 ± 0.68 | 60.92 ± 0.48 | 12.80 ± 0.40 | 41.64 ± 0.10 | 45.50 ± 0.19 | 84.48 ± 0.20 |
Tulu 3 70B AI2 | 50.61 ± 0.23 | 59.56 ± 0.94 | 47.47 ± 0.68 | 10.19 ± 0.46 | 44.37 ± 0.13 | 62.63 ± 0.30 | 79.47 ± 0.39 |
Mistral Large 2411 123B Mistral AI | 49.44 ± 0.29 | 62.41 ± 1.10 | 49.21 ± 1.18 | 10.70 ± 0.56 | 47.26 ± 0.08 | 45.49 ± 0.45 | 81.59 ± 0.46 |
Gemma 2 9B | 48.17 ± 0.23 | 58.57 ± 1.08 | 63.86 ± 0.52 | 6.62 ± 0.39 | 35.18 ± 0.10 | 42.72 ± 0.39 | 82.08 ± 0.26 |
Llama 3.1 70B Meta | 46.08 ± 0.21 | 50.70 ± 1.18 | 49.12 ± 0.73 | 8.15 ± 0.39 | 27.39 ± 0.16 | 59.55 ± 0.55 | 81.58 ± 0.25 |
MERaLiON 2 10B A*STAR | 45.29 ± 0.25 | 58.89 ± 0.90 | 58.11 ± 0.49 | 5.86 ± 0.40 | 26.24 ± 0.14 | 42.79 ± 0.42 | 79.83 ± 0.36 |
Qwen 2.5 32B Alibaba | 44.29 ± 0.22 | 66.79 ± 0.81 | 43.65 ± 0.55 | 7.51 ± 0.44 | 35.04 ± 0.08 | 39.56 ± 0.14 | 73.20 ± 0.20 |
Qwen 2.5 72B Alibaba | 42.51 ± 0.15 | 63.49 ± 1.02 | 40.11 ± 0.35 | 10.93 ± 0.32 | 37.34 ± 0.09 | 31.67 ± 0.23 | 71.51 ± 0.24 |
SEA-LION v3 (Llama) 8B AISG | 42.13 ± 0.32 | 63.71 ± 1.09 | 29.91 ± 1.13 | 9.24 ± 0.50 | 42.81 ± 0.09 | 31.62 ± 0.76 | 75.47 ± 0.31 |
ERNIE 4.5 21B MoE Baidu | 40.75 ± 0.18 | 55.27 ± 0.79 | 16.30 ± 0.48 | 12.20 ± 0.46 | 40.44 ± 0.09 | 45.63 ± 0.40 | 74.69 ± 0.32 |
Aya Expanse 32B CohereLabs | 40.58 ± 0.15 | 54.67 ± 0.74 | 35.43 ± 0.44 | 6.81 ± 0.32 | 36.62 ± 0.04 | 35.90 ± 0.20 | 74.05 ± 0.22 |
Command R 08-2024 32B CohereLabs | 36.43 ± 0.27 | 38.98 ± 1.14 | 24.42 ± 1.11 | 1.91 ± 0.31 | 37.81 ± 0.13 | 37.14 ± 0.53 | 78.31 ± 0.32 |
Sailor2 20B SAIL | 35.23 ± 0.20 | 29.24 ± 0.85 | 42.96 ± 0.26 | 10.96 ± 0.45 | 22.97 ± 0.07 | 61.84 ± 0.13 | 43.42 ± 0.13 |
Qwen 3 8B Alibaba | 33.23 ± 0.16 | 62.67 ± 0.59 | 28.57 ± 0.31 | 12.30 ± 0.40 | 20.38 ± 0.04 | 24.69 ± 0.16 | 50.78 ± 0.42 |
Qwen 2.5 14B Alibaba | 32.49 ± 0.22 | 43.87 ± 0.91 | 38.80 ± 0.60 | 6.38 ± 0.26 | 25.85 ± 0.07 | 23.27 ± 0.19 | 56.78 ± 0.23 |
Llama 3 70B Meta | 31.13 ± 0.16 | 15.40 ± 0.72 | 38.92 ± 0.44 | 4.80 ± 0.34 | 31.35 ± 0.05 | 50.44 ± 0.29 | 45.85 ± 0.04 |
Command R+ 08-2024 104B CohereLabs | 29.43 ± 0.39 | 35.68 ± 1.10 | 20.33 ± 1.22 | 2.77 ± 0.24 | 40.17 ± 0.10 | 15.34 ± 0.68 | 62.28 ± 0.55 |
Apertus 70B Swiss AI | 28.48 ± 0.30 | 51.02 ± 1.39 | 0.00 ± 0.00 | 8.55 ± 0.56 | 39.26 ± 0.10 | 1.45 ± 0.31 | 70.60 ± 0.44 |
Babel 83B Alibaba-DAMO | 25.82 ± 0.33 | 33.40 ± 0.84 | 21.72 ± 1.31 | 2.08 ± 0.34 | 24.48 ± 0.13 | 26.08 ± 0.63 | 47.16 ± 0.55 |
Tulu 3 8B AI2 | 24.58 ± 0.25 | 48.22 ± 1.14 | 9.82 ± 0.51 | 2.56 ± 0.27 | 36.04 ± 0.11 | 27.27 ± 0.52 | 23.58 ± 0.34 |
phi-4 14B Microsoft | 22.74 ± 0.23 | 32.19 ± 0.98 | 0.00 ± 0.00 | 8.13 ± 0.40 | 31.25 ± 0.10 | 14.79 ± 0.71 | 50.06 ± 0.55 |
Qwen 2.5 7B Alibaba | 21.63 ± 0.13 | 44.98 ± 0.65 | 3.59 ± 0.28 | 2.64 ± 0.33 | 17.57 ± 0.08 | 10.35 ± 0.13 | 50.67 ± 0.27 |
Sailor2 8B SAIL | 20.59 ± 0.16 | 25.78 ± 0.90 | 0.00 ± 0.00 | 7.24 ± 0.39 | 23.60 ± 0.08 | 22.00 ± 0.18 | 44.95 ± 0.12 |
Olmo 2 0325 32B AI2 | 19.42 ± 0.31 | 51.56 ± 0.95 | 12.81 ± 1.20 | 2.40 ± 0.28 | 25.23 ± 0.06 | 20.06 ± 0.58 | 4.44 ± 0.46 |
Llama 3.1 8B Meta | 17.53 ± 0.26 | 36.98 ± 1.21 | 5.15 ± 0.92 | 3.92 ± 0.35 | 29.43 ± 0.10 | 2.99 ± 0.39 | 26.72 ± 0.51 |
Apertus 8B Swiss AI | 17.21 ± 0.27 | 42.54 ± 1.16 | 0.00 ± 0.00 | 2.30 ± 0.34 | 30.78 ± 0.13 | 0.00 ± 0.00 | 27.67 ± 0.91 |
Aya Expanse 8B CohereLabs | 17.05 ± 0.23 | 32.60 ± 1.08 | 12.17 ± 0.61 | 2.24 ± 0.22 | 19.69 ± 0.05 | 17.19 ± 0.32 | 18.40 ± 0.41 |
Babel 9B Alibaba-DAMO | 16.70 ± 0.21 | 38.03 ± 0.90 | 0.00 ± 0.00 | 2.54 ± 0.22 | 28.21 ± 0.13 | 1.89 ± 0.57 | 29.55 ± 0.32 |
Command R7B 12-2024 7B CohereLabs | 13.32 ± 0.18 | 38.57 ± 0.98 | 0.00 ± 0.00 | 1.51 ± 0.25 | 24.67 ± 0.09 | 0.00 ± 0.00 | 15.19 ± 0.50 |
SeaLLMs V3 7B Alibaba-DAMO | 11.92 ± 0.18 | 32.79 ± 0.83 | 0.00 ± 0.00 | 2.13 ± 0.30 | 20.07 ± 0.08 | 0.00 ± 0.00 | 16.54 ± 0.38 |
Ministral 2410 8B Mistral AI | 11.11 ± 0.19 | 22.03 ± 1.00 | 0.00 ± 0.00 | 0.99 ± 0.19 | 18.72 ± 0.10 | 1.27 ± 0.30 | 23.62 ± 0.52 |
Mistral Small 3.1 2503 24B Mistral AI | 10.39 ± 0.28 | 31.97 ± 1.43 | 0.00 ± 0.00 | 1.93 ± 0.30 | 11.22 ± 0.08 | 0.00 ± 0.00 | 17.25 ± 0.44 |
Llama 3 8B Meta | 8.95 ± 0.16 | 20.16 ± 0.84 | 0.00 ± 0.00 | 2.46 ± 0.24 | 27.01 ± 0.07 | 4.06 ± 0.46 | 0.00 ± 0.00 |
Olmo 2 1124 13B AI2 | 8.48 ± 0.18 | 32.22 ± 1.12 | 0.00 ± 0.00 | 1.18 ± 0.23 | 17.41 ± 0.06 | 0.08 ± 0.06 | 0.00 ± 0.00 |
Olmo 2 1124 7B AI2 | 6.98 ± 0.14 | 26.51 ± 0.83 | 0.00 ± 0.00 | 0.29 ± 0.12 | 15.09 ± 0.06 | 0.00 ± 0.00 | 0.00 ± 0.00 |
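The TA column appears to be the unweighted mean of the six competency scores (a pattern inferred from the numbers in this table, not documented on this page). For example, reproducing the top row:

```python
from statistics import fmean

# Competency scores for SEA-LION v4 (Gemma) 27B, copied from the table above:
# Instruction Following, Linguistic Diagnostics, Multi-Turn Chat, NLG, NLR, NLU.
competencies = [71.05, 71.55, 40.52, 50.37, 71.26, 81.86]
ta = fmean(competencies)  # ≈ 64.43, matching the model's TA column
```

The same check works for the other rows (e.g. Gemma 3 27B averages to ≈ 64.36).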
Tamil Tasks
Model | TA | Instruction Following | SEA-IFEval |
---|---|---|---|
SEA-LION v4 (Gemma) 27B AISG | 64.43 ± 0.16 | 71.05 ± 0.64 | 71.05 ± 0.64 |
Gemma 3 27B | 64.36 ± 0.22 | 71.33 ± 0.91 | 71.33 ± 0.91 |
SEA-LION v4 (Qwen) 32B AISG | 62.30 ± 0.15 | 78.38 ± 0.63 | 78.38 ± 0.63 |
Qwen 3 Next 80B MoE Alibaba | 60.05 ± 0.13 | 73.71 ± 0.50 | 73.71 ± 0.50 |
Gemma 3 12B | 59.86 ± 0.21 | 70.67 ± 0.87 | 70.67 ± 0.87 |
Qwen 3 32B Alibaba | 58.88 ± 0.24 | 76.06 ± 0.71 | 76.06 ± 0.71 |
Llama 4 Scout 109B MoE Meta | 58.69 ± 0.15 | 80.70 ± 0.62 | 80.70 ± 0.62 |
SEA-LION v3 (Llama) 70B AISG | 57.99 ± 0.21 | 75.97 ± 0.94 | 75.97 ± 0.94 |
Qwen 3 30B MoE Alibaba | 56.06 ± 0.16 | 66.38 ± 0.79 | 66.38 ± 0.79 |
Llama 3.3 70B Meta | 54.74 ± 0.15 | 70.22 ± 0.73 | 70.22 ± 0.73 |
SEA-LION v3 (Gemma 2) 9B AISG | 53.89 ± 0.21 | 66.83 ± 0.79 | 66.83 ± 0.79 |
Gemma 2 27B | 53.44 ± 0.23 | 63.24 ± 1.09 | 63.24 ± 1.09 |
Command A 03-2025 111B CohereLabs | 53.29 ± 0.20 | 68.60 ± 0.98 | 68.60 ± 0.98 |
Qwen 3 14B Alibaba | 52.30 ± 0.17 | 68.44 ± 0.68 | 68.44 ± 0.68 |
Tulu 3 70B AI2 | 50.61 ± 0.23 | 59.56 ± 0.94 | 59.56 ± 0.94 |
Mistral Large 2411 123B Mistral AI | 49.44 ± 0.29 | 62.41 ± 1.10 | 62.41 ± 1.10 |
Gemma 2 9B | 48.17 ± 0.23 | 58.57 ± 1.08 | 58.57 ± 1.08 |
Llama 3.1 70B Meta | 46.08 ± 0.21 | 50.70 ± 1.18 | 50.70 ± 1.18 |
MERaLiON 2 10B A*STAR | 45.29 ± 0.25 | 58.89 ± 0.90 | 58.89 ± 0.90 |
Qwen 2.5 32B Alibaba | 44.29 ± 0.22 | 66.79 ± 0.81 | 66.79 ± 0.81 |
Qwen 2.5 72B Alibaba | 42.51 ± 0.15 | 63.49 ± 1.02 | 63.49 ± 1.02 |
SEA-LION v3 (Llama) 8B AISG | 42.13 ± 0.32 | 63.71 ± 1.09 | 63.71 ± 1.09 |
ERNIE 4.5 21B MoE Baidu | 40.75 ± 0.18 | 55.27 ± 0.79 | 55.27 ± 0.79 |
Aya Expanse 32B CohereLabs | 40.58 ± 0.15 | 54.67 ± 0.74 | 54.67 ± 0.74 |
Command R 08-2024 32B CohereLabs | 36.43 ± 0.27 | 38.98 ± 1.14 | 38.98 ± 1.14 |
Sailor2 20B SAIL | 35.23 ± 0.20 | 29.24 ± 0.85 | 29.24 ± 0.85 |
Qwen 3 8B Alibaba | 33.23 ± 0.16 | 62.67 ± 0.59 | 62.67 ± 0.59 |
Qwen 2.5 14B Alibaba | 32.49 ± 0.22 | 43.87 ± 0.91 | 43.87 ± 0.91 |
Llama 3 70B Meta | 31.13 ± 0.16 | 15.40 ± 0.72 | 15.40 ± 0.72 |
Command R+ 08-2024 104B CohereLabs | 29.43 ± 0.39 | 35.68 ± 1.10 | 35.68 ± 1.10 |
Apertus 70B Swiss AI | 28.48 ± 0.30 | 51.02 ± 1.39 | 51.02 ± 1.39 |
Babel 83B Alibaba-DAMO | 25.82 ± 0.33 | 33.40 ± 0.84 | 33.40 ± 0.84 |
Tulu 3 8B AI2 | 24.58 ± 0.25 | 48.22 ± 1.14 | 48.22 ± 1.14 |
phi-4 14B Microsoft | 22.74 ± 0.23 | 32.19 ± 0.98 | 32.19 ± 0.98 |
Qwen 2.5 7B Alibaba | 21.63 ± 0.13 | 44.98 ± 0.65 | 44.98 ± 0.65 |
Sailor2 8B SAIL | 20.59 ± 0.16 | 25.78 ± 0.90 | 25.78 ± 0.90 |
Olmo 2 0325 32B AI2 | 19.42 ± 0.31 | 51.56 ± 0.95 | 51.56 ± 0.95 |
Llama 3.1 8B Meta | 17.53 ± 0.26 | 36.98 ± 1.21 | 36.98 ± 1.21 |
Apertus 8B Swiss AI | 17.21 ± 0.27 | 42.54 ± 1.16 | 42.54 ± 1.16 |
Aya Expanse 8B CohereLabs | 17.05 ± 0.23 | 32.60 ± 1.08 | 32.60 ± 1.08 |
Babel 9B Alibaba-DAMO | 16.70 ± 0.21 | 38.03 ± 0.90 | 38.03 ± 0.90 |
Command R7B 12-2024 7B CohereLabs | 13.32 ± 0.18 | 38.57 ± 0.98 | 38.57 ± 0.98 |
SeaLLMs V3 7B Alibaba-DAMO | 11.92 ± 0.18 | 32.79 ± 0.83 | 32.79 ± 0.83 |
Ministral 2410 8B Mistral AI | 11.11 ± 0.19 | 22.03 ± 1.00 | 22.03 ± 1.00 |
Mistral Small 3.1 2503 24B Mistral AI | 10.39 ± 0.28 | 31.97 ± 1.43 | 31.97 ± 1.43 |
Llama 3 8B Meta | 8.95 ± 0.16 | 20.16 ± 0.84 | 20.16 ± 0.84 |
Olmo 2 1124 13B AI2 | 8.48 ± 0.18 | 32.22 ± 1.12 | 32.22 ± 1.12 |
Olmo 2 1124 7B AI2 | 6.98 ± 0.14 | 26.51 ± 0.83 | 26.51 ± 0.83 |

Model | TA | Linguistic Diagnostics | Syntax | Pragmatics |
---|---|---|---|---|
SEA-LION v4 (Gemma) 27B AISG | 64.43 ± 0.16 | 71.55 ± 0.48 | 85.93 ± 0.29 | 57.18 ± 0.87 |
Gemma 3 27B | 64.36 ± 0.22 | 71.32 ± 0.33 | 85.65 ± 0.18 | 57.00 ± 0.63 |
SEA-LION v4 (Qwen) 32B AISG | 62.30 ± 0.15 | 71.39 ± 0.26 | 87.72 ± 0.08 | 55.06 ± 0.53 |
Qwen 3 Next 80B MoE Alibaba | 60.05 ± 0.13 | 71.51 ± 0.29 | 83.09 ± 0.20 | 59.93 ± 0.58 |
Gemma 3 12B | 59.86 ± 0.21 | 58.39 ± 0.50 | 63.94 ± 0.68 | 52.84 ± 0.73 |
Qwen 3 32B Alibaba | 58.88 ± 0.24 | 69.40 ± 0.84 | 85.33 ± 0.35 | 53.47 ± 1.63 |
Llama 4 Scout 109B MoE Meta | 58.69 ± 0.15 | 62.58 ± 0.29 | 72.03 ± 0.11 | 53.12 ± 0.62 |
SEA-LION v3 (Llama) 70B AISG | 57.99 ± 0.21 | 60.28 ± 1.06 | 70.65 ± 0.69 | 49.91 ± 1.93 |
Qwen 3 30B MoE Alibaba | 56.06 ± 0.16 | 63.44 ± 0.38 | 78.17 ± 0.26 | 48.70 ± 0.73 |
Llama 3.3 70B Meta | 54.74 ± 0.15 | 55.64 ± 0.42 | 67.43 ± 0.35 | 43.86 ± 0.63 |
SEA-LION v3 (Gemma 2) 9B AISG | 53.89 ± 0.21 | 62.07 ± 0.42 | 69.70 ± 0.73 | 54.44 ± 0.48 |
Gemma 2 27B | 53.44 ± 0.23 | 62.72 ± 0.36 | 77.04 ± 0.49 | 48.41 ± 0.47 |
Command A 03-2025 111B CohereLabs | 53.29 ± 0.20 | 63.64 ± 0.58 | 73.69 ± 0.55 | 53.60 ± 1.09 |
Qwen 3 14B Alibaba | 52.30 ± 0.17 | 60.92 ± 0.48 | 73.74 ± 0.22 | 48.10 ± 0.92 |
Tulu 3 70B AI2 | 50.61 ± 0.23 | 47.47 ± 0.68 | 60.75 ± 0.53 | 34.18 ± 1.09 |
Mistral Large 2411 123B Mistral AI | 49.44 ± 0.29 | 49.21 ± 1.18 | 64.14 ± 0.90 | 34.29 ± 1.99 |
Gemma 2 9B | 48.17 ± 0.23 | 63.86 ± 0.52 | 69.22 ± 0.65 | 58.51 ± 0.75 |
Llama 3.1 70B Meta | 46.08 ± 0.21 | 49.12 ± 0.73 | 63.18 ± 0.70 | 35.06 ± 1.31 |
MERaLiON 2 10B A*STAR | 45.29 ± 0.25 | 58.11 ± 0.49 | 59.19 ± 0.70 | 57.03 ± 0.70 |
Qwen 2.5 32B Alibaba | 44.29 ± 0.22 | 43.65 ± 0.55 | 53.56 ± 0.18 | 33.74 ± 1.07 |
Qwen 2.5 72B Alibaba | 42.51 ± 0.15 | 40.11 ± 0.35 | 32.84 ± 0.33 | 47.39 ± 0.72 |
SEA-LION v3 (Llama) 8B AISG | 42.13 ± 0.32 | 29.91 ± 1.13 | 31.21 ± 1.41 | 28.60 ± 1.38 |
ERNIE 4.5 21B MoE Baidu | 40.75 ± 0.18 | 16.30 ± 0.48 | 32.61 ± 0.96 | 0.00 ± 0.00 |
Aya Expanse 32B CohereLabs | 40.58 ± 0.15 | 35.43 ± 0.44 | 39.28 ± 0.42 | 31.58 ± 0.70 |
Command R 08-2024 32B CohereLabs | 36.43 ± 0.27 | 24.42 ± 1.11 | 30.62 ± 1.01 | 18.22 ± 1.83 |
Sailor2 20B SAIL | 35.23 ± 0.20 | 42.96 ± 0.26 | 66.95 ± 0.18 | 18.98 ± 0.52 |
Qwen 3 8B Alibaba | 33.23 ± 0.16 | 28.57 ± 0.31 | 31.67 ± 0.42 | 25.48 ± 0.46 |
Qwen 2.5 14B Alibaba | 32.49 ± 0.22 | 38.80 ± 0.60 | 45.05 ± 0.23 | 32.55 ± 1.18 |
Llama 3 70B Meta | 31.13 ± 0.16 | 38.92 ± 0.44 | 57.23 ± 0.47 | 20.61 ± 0.68 |
Command R+ 08-2024 104B CohereLabs | 29.43 ± 0.39 | 20.33 ± 1.22 | 13.35 ± 1.20 | 27.32 ± 2.18 |
Apertus 70B Swiss AI | 28.48 ± 0.30 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Babel 83B Alibaba-DAMO | 25.82 ± 0.33 | 21.72 ± 1.31 | 33.52 ± 1.38 | 9.92 ± 2.41 |
Tulu 3 8B AI2 | 24.58 ± 0.25 | 9.82 ± 0.51 | 19.65 ± 1.03 | 0.00 ± 0.00 |
phi-4 14B Microsoft | 22.74 ± 0.23 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Qwen 2.5 7B Alibaba | 21.63 ± 0.13 | 3.59 ± 0.28 | 0.00 ± 0.00 | 7.18 ± 0.56 |
Sailor2 8B SAIL | 20.59 ± 0.16 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Olmo 2 0325 32B AI2 | 19.42 ± 0.31 | 12.81 ± 1.20 | 9.90 ± 1.77 | 15.71 ± 1.82 |
Llama 3.1 8B Meta | 17.53 ± 0.26 | 5.15 ± 0.92 | 7.70 ± 1.63 | 2.61 ± 0.58 |
Apertus 8B Swiss AI | 17.21 ± 0.27 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Aya Expanse 8B CohereLabs | 17.05 ± 0.23 | 12.17 ± 0.61 | 11.69 ± 0.18 | 12.66 ± 1.19 |
Babel 9B Alibaba-DAMO | 16.70 ± 0.21 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Command R7B 12-2024 7B CohereLabs | 13.32 ± 0.18 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
SeaLLMs V3 7B Alibaba-DAMO | 11.92 ± 0.18 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Ministral 2410 8B Mistral AI | 11.11 ± 0.19 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Mistral Small 3.1 2503 24B Mistral AI | 10.39 ± 0.28 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Llama 3 8B Meta | 8.95 ± 0.16 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Olmo 2 1124 13B AI2 | 8.48 ± 0.18 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Olmo 2 1124 7B AI2 | 6.98 ± 0.14 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |

Model | TA | Multi-Turn Chat | SEA-MT-Bench |
---|---|---|---|
SEA-LION v4 (Gemma) 27B AISG | 64.43 ± 0.16 | 40.52 ± 0.67 | 40.52 ± 0.67 |
Gemma 3 27B | 64.36 ± 0.22 | 40.19 ± 0.82 | 40.19 ± 0.82 |
SEA-LION v4 (Qwen) 32B AISG | 62.30 ± 0.15 | 20.96 ± 0.57 | 20.96 ± 0.57 |
Qwen 3 Next 80B MoE Alibaba | 60.05 ± 0.13 | 23.41 ± 0.57 | 23.41 ± 0.57 |
Gemma 3 12B | 59.86 ± 0.21 | 36.03 ± 0.56 | 36.03 ± 0.56 |
Qwen 3 32B Alibaba | 58.88 ± 0.24 | 17.14 ± 0.53 | 17.14 ± 0.53 |
Llama 4 Scout 109B MoE Meta | 58.69 ± 0.15 | 12.10 ± 0.49 | 12.10 ± 0.49 |
SEA-LION v3 (Llama) 70B AISG | 57.99 ± 0.21 | 15.69 ± 0.44 | 15.69 ± 0.44 |
Qwen 3 30B MoE Alibaba | 56.06 ± 0.16 | 19.20 ± 0.44 | 19.20 ± 0.44 |
Llama 3.3 70B Meta | 54.74 ± 0.15 | 8.43 ± 0.52 | 8.43 ± 0.52 |
SEA-LION v3 (Gemma 2) 9B AISG | 53.89 ± 0.21 | 17.53 ± 0.54 | 17.53 ± 0.54 |
Gemma 2 27B | 53.44 ± 0.23 | 7.47 ± 0.30 | 7.47 ± 0.30 |
Command A 03-2025 111B CohereLabs | 53.29 ± 0.20 | 17.46 ± 0.36 | 17.46 ± 0.36 |
Qwen 3 14B Alibaba | 52.30 ± 0.17 | 12.80 ± 0.40 | 12.80 ± 0.40 |
Tulu 3 70B AI2 | 50.61 ± 0.23 | 10.19 ± 0.46 | 10.19 ± 0.46 |
Mistral Large 2411 123B Mistral AI | 49.44 ± 0.29 | 10.70 ± 0.56 | 10.70 ± 0.56 |
Gemma 2 9B | 48.17 ± 0.23 | 6.62 ± 0.39 | 6.62 ± 0.39 |
Llama 3.1 70B Meta | 46.08 ± 0.21 | 8.15 ± 0.39 | 8.15 ± 0.39 |
MERaLiON 2 10B A*STAR | 45.29 ± 0.25 | 5.86 ± 0.40 | 5.86 ± 0.40 |
Qwen 2.5 32B Alibaba | 44.29 ± 0.22 | 7.51 ± 0.44 | 7.51 ± 0.44 |
Qwen 2.5 72B Alibaba | 42.51 ± 0.15 | 10.93 ± 0.32 | 10.93 ± 0.32 |
SEA-LION v3 (Llama) 8B AISG | 42.13 ± 0.32 | 9.24 ± 0.50 | 9.24 ± 0.50 |
ERNIE 4.5 21B MoE Baidu | 40.75 ± 0.18 | 12.20 ± 0.46 | 12.20 ± 0.46 |
Aya Expanse 32B CohereLabs | 40.58 ± 0.15 | 6.81 ± 0.32 | 6.81 ± 0.32 |
Command R 08-2024 32B CohereLabs | 36.43 ± 0.27 | 1.91 ± 0.31 | 1.91 ± 0.31 |
Sailor2 20B SAIL | 35.23 ± 0.20 | 10.96 ± 0.45 | 10.96 ± 0.45 |
Qwen 3 8B Alibaba | 33.23 ± 0.16 | 12.30 ± 0.40 | 12.30 ± 0.40 |
Qwen 2.5 14B Alibaba | 32.49 ± 0.22 | 6.38 ± 0.26 | 6.38 ± 0.26 |
Llama 3 70B Meta | 31.13 ± 0.16 | 4.80 ± 0.34 | 4.80 ± 0.34 |
Command R+ 08-2024 104B CohereLabs | 29.43 ± 0.39 | 2.77 ± 0.24 | 2.77 ± 0.24 |
Apertus 70B Swiss AI | 28.48 ± 0.30 | 8.55 ± 0.56 | 8.55 ± 0.56 |
Babel 83B Alibaba-DAMO | 25.82 ± 0.33 | 2.08 ± 0.34 | 2.08 ± 0.34 |
Tulu 3 8B AI2 | 24.58 ± 0.25 | 2.56 ± 0.27 | 2.56 ± 0.27 |
phi-4 14B Microsoft | 22.74 ± 0.23 | 8.13 ± 0.40 | 8.13 ± 0.40 |
Qwen 2.5 7B Alibaba | 21.63 ± 0.13 | 2.64 ± 0.33 | 2.64 ± 0.33 |
Sailor2 8B SAIL | 20.59 ± 0.16 | 7.24 ± 0.39 | 7.24 ± 0.39 |
Olmo 2 0325 32B AI2 | 19.42 ± 0.31 | 2.40 ± 0.28 | 2.40 ± 0.28 |
Llama 3.1 8B Meta | 17.53 ± 0.26 | 3.92 ± 0.35 | 3.92 ± 0.35 |
Apertus 8B Swiss AI | 17.21 ± 0.27 | 2.30 ± 0.34 | 2.30 ± 0.34 |
Aya Expanse 8B CohereLabs | 17.05 ± 0.23 | 2.24 ± 0.22 | 2.24 ± 0.22 |
Babel 9B Alibaba-DAMO | 16.70 ± 0.21 | 2.54 ± 0.22 | 2.54 ± 0.22 |
Command R7B 12-2024 7B CohereLabs | 13.32 ± 0.18 | 1.51 ± 0.25 | 1.51 ± 0.25 |
SeaLLMs V3 7B Alibaba-DAMO | 11.92 ± 0.18 | 2.13 ± 0.30 | 2.13 ± 0.30 |
Ministral 2410 8B Mistral AI | 11.11 ± 0.19 | 0.99 ± 0.19 | 0.99 ± 0.19 |
Mistral Small 3.1 2503 24B Mistral AI | 10.39 ± 0.28 | 1.93 ± 0.30 | 1.93 ± 0.30 |
Llama 3 8B Meta | 8.95 ± 0.16 | 2.46 ± 0.24 | 2.46 ± 0.24 |
Olmo 2 1124 13B AI2 | 8.48 ± 0.18 | 1.18 ± 0.23 | 1.18 ± 0.23 |
Olmo 2 1124 7B AI2 | 6.98 ± 0.14 | 0.29 ± 0.12 | 0.29 ± 0.12 |

Model | TA | NLG | Summarization | Translations |
---|---|---|---|---|
SEA-LION v4 (Gemma) 27B AISG | 64.43 ± 0.16 | 50.37 ± 0.05 | 11.35 ± 0.10 | 89.38 ± 0.02 |
Gemma 3 27B | 64.36 ± 0.22 | 50.50 ± 0.06 | 11.46 ± 0.12 | 89.54 ± 0.02 |
SEA-LION v4 (Qwen) 32B AISG | 62.30 ± 0.15 | 49.73 ± 0.06 | 13.38 ± 0.12 | 86.08 ± 0.03 |
Qwen 3 Next 80B MoE Alibaba | 60.05 ± 0.13 | 47.63 ± 0.04 | 10.68 ± 0.08 | 84.58 ± 0.02 |
Gemma 3 12B | 59.86 ± 0.21 | 49.59 ± 0.06 | 11.25 ± 0.11 | 87.92 ± 0.03 |
Qwen 3 32B Alibaba | 58.88 ± 0.24 | 44.71 ± 0.08 | 13.02 ± 0.15 | 76.39 ± 0.06 |
Llama 4 Scout 109B MoE Meta | 58.69 ± 0.15 | 50.18 ± 0.07 | 15.05 ± 0.14 | 85.32 ± 0.03 |
SEA-LION v3 (Llama) 70B AISG | 57.99 ± 0.21 | 48.62 ± 0.08 | 13.74 ± 0.12 | 83.49 ± 0.08 |
Qwen 3 30B MoE Alibaba | 56.06 ± 0.16 | 42.10 ± 0.05 | 11.68 ± 0.09 | 72.51 ± 0.07 |
Llama 3.3 70B Meta | 54.74 ± 0.15 | 46.86 ± 0.11 | 13.73 ± 0.17 | 79.99 ± 0.07 |
SEA-LION v3 (Gemma 2) 9B AISG | 53.89 ± 0.21 | 42.47 ± 0.09 | 12.86 ± 0.13 | 72.09 ± 0.11 |
Gemma 2 27B | 53.44 ± 0.23 | 44.45 ± 0.13 | 13.13 ± 0.19 | 75.77 ± 0.14 |
Command A 03-2025 111B CohereLabs | 53.29 ± 0.20 | 47.94 ± 0.06 | 12.21 ± 0.08 | 83.66 ± 0.09 |
Qwen 3 14B Alibaba | 52.30 ± 0.17 | 41.64 ± 0.10 | 12.76 ± 0.11 | 70.52 ± 0.13 |
Tulu 3 70B AI2 | 50.61 ± 0.23 | 44.37 ± 0.13 | 11.83 ± 0.19 | 76.90 ± 0.10 |
Mistral Large 2411 123B Mistral AI | 49.44 ± 0.29 | 47.26 ± 0.08 | 12.80 ± 0.16 | 81.71 ± 0.11 |
Gemma 2 9B | 48.17 ± 0.23 | 35.18 ± 0.10 | 10.76 ± 0.16 | 59.59 ± 0.14 |
Llama 3.1 70B Meta | 46.08 ± 0.21 | 27.39 ± 0.16 | 13.52 ± 0.23 | 41.26 ± 0.19 |
MERaLiON 2 10B A*STAR | 45.29 ± 0.25 | 26.24 ± 0.14 | 9.51 ± 0.23 | 42.98 ± 0.11 |
Qwen 2.5 32B Alibaba | 44.29 ± 0.22 | 35.04 ± 0.08 | 11.90 ± 0.15 | 58.18 ± 0.09 |
Qwen 2.5 72B Alibaba | 42.51 ± 0.15 | 37.34 ± 0.09 | 7.58 ± 0.15 | 67.11 ± 0.09 |
SEA-LION v3 (Llama) 8B AISG | 42.13 ± 0.32 | 42.81 ± 0.09 | 11.27 ± 0.13 | 74.35 ± 0.13 |
ERNIE 4.5 21B MoE Baidu | 40.75 ± 0.18 | 40.44 ± 0.09 | 11.46 ± 0.13 | 69.42 ± 0.11 |
Aya Expanse 32B CohereLabs | 40.58 ± 0.15 | 36.62 ± 0.04 | 0.00 ± 0.00 | 73.24 ± 0.08 |
Command R 08-2024 32B CohereLabs | 36.43 ± 0.27 | 37.81 ± 0.13 | 10.97 ± 0.21 | 64.66 ± 0.14 |
Sailor2 20B SAIL | 35.23 ± 0.20 | 22.97 ± 0.07 | 0.00 ± 0.00 | 45.94 ± 0.15 |
Qwen 3 8B Alibaba | 33.23 ± 0.16 | 20.38 ± 0.04 | 0.00 ± 0.00 | 40.77 ± 0.08 |
Qwen 2.5 14B Alibaba | 32.49 ± 0.22 | 25.85 ± 0.07 | 9.89 ± 0.13 | 41.81 ± 0.07 |
Llama 3 70B Meta | 31.13 ± 0.16 | 31.35 ± 0.05 | 0.00 ± 0.00 | 62.69 ± 0.09 |
Command R+ 08-2024 104B CohereLabs | 29.43 ± 0.39 | 40.17 ± 0.10 | 8.59 ± 0.18 | 71.74 ± 0.11 |
Apertus 70B Swiss AI | 28.48 ± 0.30 | 39.26 ± 0.10 | 8.95 ± 0.17 | 69.56 ± 0.14 |
Babel 83B Alibaba-DAMO | 25.82 ± 0.33 | 24.48 ± 0.13 | 9.71 ± 0.24 | 39.25 ± 0.18 |
Tulu 3 8B AI2 | 24.58 ± 0.25 | 36.04 ± 0.11 | 11.26 ± 0.17 | 60.81 ± 0.12 |
phi-4 14B Microsoft | 22.74 ± 0.23 | 31.25 ± 0.10 | 8.72 ± 0.13 | 53.77 ± 0.11 |
Qwen 2.5 7B Alibaba | 21.63 ± 0.13 | 17.57 ± 0.08 | 8.46 ± 0.14 | 26.68 ± 0.07 |
Sailor2 8B SAIL | 20.59 ± 0.16 | 23.60 ± 0.08 | 0.00 ± 0.00 | 47.19 ± 0.15 |
Olmo 2 0325 32B AI2 | 19.42 ± 0.31 | 25.23 ± 0.06 | 0.00 ± 0.00 | 50.47 ± 0.13 |
Llama 3.1 8B Meta | 17.53 ± 0.26 | 29.43 ± 0.10 | 12.74 ± 0.16 | 46.11 ± 0.18 |
Apertus 8B Swiss AI | 17.21 ± 0.27 | 30.78 ± 0.13 | 7.69 ± 0.24 | 53.87 ± 0.15 |
Aya Expanse 8B CohereLabs | 17.05 ± 0.23 | 19.69 ± 0.05 | 0.00 ± 0.00 | 39.38 ± 0.10 |
Babel 9B Alibaba-DAMO | 16.70 ± 0.21 | 28.21 ± 0.13 | 10.65 ± 0.16 | 45.78 ± 0.17 |
Command R7B 12-2024 7B CohereLabs | 13.32 ± 0.18 | 24.67 ± 0.09 | 8.61 ± 0.18 | 40.73 ± 0.12 |
SeaLLMs V3 7B Alibaba-DAMO | 11.92 ± 0.18 | 20.07 ± 0.08 | 10.47 ± 0.17 | 29.66 ± 0.09 |
Ministral 2410 8B Mistral AI | 11.11 ± 0.19 | 18.72 ± 0.10 | 3.58 ± 0.16 | 33.86 ± 0.09 |
Mistral Small 3.1 2503 24B Mistral AI | 10.39 ± 0.28 | 11.22 ± 0.08 | 0.76 ± 0.06 | 21.67 ± 0.12 |
Llama 3 8B Meta | 8.95 ± 0.16 | 27.01 ± 0.07 | 0.00 ± 0.00 | 54.02 ± 0.14 |
Olmo 2 1124 13B AI2 | 8.48 ± 0.18 | 17.41 ± 0.06 | 0.00 ± 0.00 | 34.82 ± 0.12 |
Olmo 2 1124 7B AI2 | 6.98 ± 0.14 | 15.09 ± 0.06 | 0.00 ± 0.00 | 30.18 ± 0.12 |

Model | TA | NLR | Causal Reasoning | Natural Language Inference |
---|---|---|---|---|
SEA-LION v4 (Gemma) 27B AISG | 64.43 ± 0.16 | 71.26 ± 0.16 | 87.84 ± 0.16 | 54.68 ± 0.25 |
Gemma 3 27B | 64.36 ± 0.22 | 71.71 ± 0.12 | 88.41 ± 0.14 | 55.01 ± 0.17 |
SEA-LION v4 (Qwen) 32B AISG | 62.30 ± 0.15 | 69.41 ± 0.11 | 87.19 ± 0.10 | 51.64 ± 0.21 |
Qwen 3 Next 80B MoE Alibaba | 60.05 ± 0.13 | 62.50 ± 0.16 | 83.87 ± 0.26 | 41.14 ± 0.14 |
Gemma 3 12B | 59.86 ± 0.21 | 61.95 ± 0.18 | 84.25 ± 0.28 | 39.65 ± 0.27 |
Qwen 3 32B Alibaba | 58.88 ± 0.24 | 62.93 ± 0.18 | 79.63 ± 0.23 | 46.22 ± 0.22 |
Llama 4 Scout 109B MoE Meta | 58.69 ± 0.15 | 63.61 ± 0.05 | 85.11 ± 0.06 | 42.12 ± 0.11 |
SEA-LION v3 (Llama) 70B AISG | 57.99 ± 0.21 | 64.72 ± 0.31 | 84.64 ± 0.53 | 44.79 ± 0.56 |
Qwen 3 30B MoE Alibaba | 56.06 ± 0.16 | 63.60 ± 0.11 | 82.59 ± 0.19 | 44.62 ± 0.22 |
Llama 3.3 70B Meta | 54.74 ± 0.15 | 65.14 ± 0.18 | 84.77 ± 0.29 | 45.51 ± 0.33 |
SEA-LION v3 (Gemma 2) 9B AISG | 53.89 ± 0.21 | 52.68 ± 0.21 | 79.53 ± 0.40 | 25.82 ± 0.11 |
Gemma 2 27B | 53.44 ± 0.23 | 61.24 ± 0.21 | 74.36 ± 0.41 | 48.13 ± 0.18 |
Command A 03-2025 111B CohereLabs | 53.29 ± 0.20 | 48.83 ± 0.33 | 81.61 ± 0.38 | 16.04 ± 0.53 |
Qwen 3 14B Alibaba | 52.30 ± 0.17 | 45.50 ± 0.19 | 61.69 ± 0.27 | 29.31 ± 0.21 |
Tulu 3 70B AI2 | 50.61 ± 0.23 | 62.63 ± 0.30 | 78.53 ± 0.55 | 46.73 ± 0.53 |
Mistral Large 2411 123B Mistral AI | 49.44 ± 0.29 | 45.49 ± 0.45 | 62.29 ± 0.59 | 28.69 ± 0.56 |
Gemma 2 9B | 48.17 ± 0.23 | 42.72 ± 0.39 | 69.51 ± 0.76 | 15.93 ± 0.08 |
Llama 3.1 70B Meta | 46.08 ± 0.21 | 59.55 ± 0.55 | 80.89 ± 0.55 | 38.21 ± 0.74 |
MERaLiON 2 10B A*STAR | 45.29 ± 0.25 | 42.79 ± 0.42 | 68.96 ± 0.76 | 16.63 ± 0.16 |
Qwen 2.5 32B Alibaba | 44.29 ± 0.22 | 39.56 ± 0.14 | 49.21 ± 0.23 | 29.90 ± 0.11 |
Qwen 2.5 72B Alibaba | 42.51 ± 0.15 | 31.67 ± 0.23 | 44.97 ± 0.40 | 18.37 ± 0.21 |
SEA-LION v3 (Llama) 8B AISG | 42.13 ± 0.32 | 31.62 ± 0.76 | 38.61 ± 1.29 | 24.63 ± 0.71 |
ERNIE 4.5 21B MoE Baidu | 40.75 ± 0.18 | 45.63 ± 0.40 | 60.72 ± 0.75 | 30.54 ± 0.21 |
Aya Expanse 32B CohereLabs | 40.58 ± 0.15 | 35.90 ± 0.20 | 42.04 ± 0.37 | 29.77 ± 0.29 |
Command R 08-2024 32B CohereLabs | 36.43 ± 0.27 | 37.14 ± 0.53 | 50.92 ± 0.86 | 23.35 ± 0.53 |
Sailor2 20B SAIL | 35.23 ± 0.20 | 61.84 ± 0.13 | 82.85 ± 0.18 | 40.82 ± 0.20 |
Qwen 3 8B Alibaba | 33.23 ± 0.16 | 24.69 ± 0.16 | 47.15 ± 0.31 | 2.23 ± 0.07 |
Qwen 2.5 14B Alibaba | 32.49 ± 0.22 | 23.27 ± 0.19 | 37.27 ± 0.39 | 9.28 ± 0.09 |
Llama 3 70B Meta | 31.13 ± 0.16 | 50.44 ± 0.29 | 60.60 ± 0.40 | 40.28 ± 0.33 |
Command R+ 08-2024 104B CohereLabs | 29.43 ± 0.39 | 15.34 ± 0.68 | 7.61 ± 1.14 | 23.07 ± 0.56 |
Apertus 70B Swiss AI | 28.48 ± 0.30 | 1.45 ± 0.31 | 0.00 ± 0.00 | 2.91 ± 0.63 |
Babel 83B Alibaba-DAMO | 25.82 ± 0.33 | 26.08 ± 0.63 | 43.01 ± 1.01 | 9.15 ± 0.80 |
Tulu 3 8B AI2 | 24.58 ± 0.25 | 27.27 ± 0.52 | 38.43 ± 0.83 | 16.11 ± 0.65 |
phi-4 14B Microsoft | 22.74 ± 0.23 | 14.79 ± 0.71 | 29.57 ± 1.41 | 0.00 ± 0.00 |
Qwen 2.5 7B Alibaba | 21.63 ± 0.13 | 10.35 ± 0.13 | 0.00 ± 0.00 | 20.71 ± 0.25 |
Sailor2 8B SAIL | 20.59 ± 0.16 | 22.00 ± 0.18 | 0.00 ± 0.00 | 44.00 ± 0.35 |
Olmo 2 0325 32B AI2 | 19.42 ± 0.31 | 20.06 ± 0.58 | 37.33 ± 0.95 | 2.79 ± 0.67 |
Llama 3.1 8B Meta | 17.53 ± 0.26 | 2.99 ± 0.39 | 0.00 ± 0.00 | 5.98 ± 0.77 |
Apertus 8B Swiss AI | 17.21 ± 0.27 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Aya Expanse 8B CohereLabs | 17.05 ± 0.23 | 17.19 ± 0.32 | 25.27 ± 0.33 | 9.12 ± 0.58 |
Babel 9B Alibaba-DAMO | 16.70 ± 0.21 | 1.89 ± 0.57 | 3.79 ± 1.14 | 0.00 ± 0.00 |
Command R7B 12-2024 7B CohereLabs | 13.32 ± 0.18 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
SeaLLMs V3 7B Alibaba-DAMO | 11.92 ± 0.18 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Ministral 2410 8B Mistral AI | 11.11 ± 0.19 | 1.27 ± 0.30 | 0.00 ± 0.00 | 2.55 ± 0.59 |
Mistral Small 3.1 2503 24B Mistral AI | 10.39 ± 0.28 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Llama 3 8B Meta | 8.95 ± 0.16 | 4.06 ± 0.46 | 0.00 ± 0.00 | 8.13 ± 0.93 |
Olmo 2 1124 13B AI2 | 8.48 ± 0.18 | 0.08 ± 0.06 | 0.00 ± 0.00 | 0.17 ± 0.12 |
Olmo 2 1124 7B AI2 | 6.98 ± 0.14 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |

Model | TA | NLU | Question Answering | Sentiment Analysis |
---|---|---|---|---|
SEA-LION v4 (Gemma) 27B AISG | 64.43 ± 0.16 | 81.86 ± 0.22 | 66.59 ± 0.43 | 97.13 ± 0.08 |
Gemma 3 27B | 64.36 ± 0.22 | 81.08 ± 0.24 | 64.86 ± 0.48 | 97.30 ± 0.05 |
SEA-LION v4 (Qwen) 32B AISG | 62.30 ± 0.15 | 83.94 ± 0.16 | 72.52 ± 0.31 | 95.36 ± 0.06 |
Qwen 3 Next 80B MoE Alibaba | 60.05 ± 0.13 | 81.52 ± 0.08 | 67.79 ± 0.11 | 95.24 ± 0.10 |
Gemma 3 12B | 59.86 ± 0.21 | 82.53 ± 0.22 | 69.12 ± 0.45 | 95.94 ± 0.10 |
Qwen 3 32B Alibaba | 58.88 ± 0.24 | 83.05 ± 0.21 | 72.29 ± 0.39 | 93.81 ± 0.16 |
Llama 4 Scout 109B MoE Meta | 58.69 ± 0.15 | 82.97 ± 0.10 | 68.69 ± 0.21 | 97.24 ± 0.03 |
SEA-LION v3 (Llama) 70B AISG | 57.99 ± 0.21 | 82.68 ± 0.26 | 69.01 ± 0.50 | 96.35 ± 0.18 |
Qwen 3 30B MoE Alibaba | 56.06 ± 0.16 | 81.67 ± 0.09 | 68.95 ± 0.14 | 94.39 ± 0.10 |
Llama 3.3 70B Meta | 54.74 ± 0.15 | 82.12 ± 0.19 | 68.52 ± 0.38 | 95.72 ± 0.06 |
SEA-LION v3 (Gemma 2) 9B AISG | 53.89 ± 0.21 | 81.79 ± 0.23 | 68.70 ± 0.49 | 94.88 ± 0.10 |
Gemma 2 27B | 53.44 ± 0.23 | 81.52 ± 0.29 | 66.68 ± 0.58 | 96.36 ± 0.12 |
Command A 03-2025 111B CohereLabs | 53.29 ± 0.20 | 73.26 ± 0.44 | 57.90 ± 0.83 | 88.63 ± 0.25 |
Qwen 3 14B Alibaba | 52.30 ± 0.17 | 84.48 ± 0.20 | 76.52 ± 0.34 | 92.44 ± 0.14 |
Tulu 3 70B AI2 | 50.61 ± 0.23 | 79.47 ± 0.39 | 64.49 ± 0.75 | 94.44 ± 0.19 |
Mistral Large 2411 123B Mistral AI | 49.44 ± 0.29 | 81.59 ± 0.46 | 70.57 ± 0.92 | 92.60 ± 0.19 |
Gemma 2 9B | 48.17 ± 0.23 | 82.08 ± 0.26 | 68.90 ± 0.51 | 95.27 ± 0.09 |
Llama 3.1 70B Meta | 46.08 ± 0.21 | 81.58 ± 0.25 | 69.01 ± 0.51 | 94.16 ± 0.17 |
MERaLiON 2 10B A*STAR | 45.29 ± 0.25 | 79.83 ± 0.36 | 65.39 ± 0.69 | 94.26 ± 0.11 |
Qwen 2.5 32B Alibaba | 44.29 ± 0.22 | 73.20 ± 0.20 | 61.95 ± 0.20 | 84.46 ± 0.27 |
Qwen 2.5 72B Alibaba | 42.51 ± 0.15 | 71.51 ± 0.24 | 57.37 ± 0.49 | 85.64 ± 0.16 |
SEA-LION v3 (Llama) 8B AISG | 42.13 ± 0.32 | 75.47 ± 0.31 | 56.96 ± 0.61 | 93.97 ± 0.18 |
ERNIE 4.5 21B MoE Baidu | 40.75 ± 0.18 | 74.69 ± 0.32 | 57.54 ± 0.64 | 91.83 ± 0.10 |
Aya Expanse 32B CohereLabs | 40.58 ± 0.15 | 74.05 ± 0.22 | 54.05 ± 0.45 | 94.05 ± 0.17 |
Command R 08-2024 32B CohereLabs | 36.43 ± 0.27 | 78.31 ± 0.32 | 62.96 ± 0.56 | 93.67 ± 0.24 |
Sailor2 20B SAIL | 35.23 ± 0.20 | 43.42 ± 0.13 | 0.00 ± 0.00 | 86.85 ± 0.26 |
Qwen 3 8B Alibaba | 33.23 ± 0.16 | 50.78 ± 0.42 | 66.38 ± 0.51 | 35.18 ± 0.64 |
Qwen 2.5 14B Alibaba | 32.49 ± 0.22 | 56.78 ± 0.23 | 37.93 ± 0.31 | 75.62 ± 0.24 |
Llama 3 70B Meta | 31.13 ± 0.16 | 45.85 ± 0.04 | 0.00 ± 0.00 | 91.70 ± 0.08 |
Command R+ 08-2024 104B CohereLabs | 29.43 ± 0.39 | 62.28 ± 0.55 | 59.18 ± 0.82 | 65.37 ± 0.76 |
Apertus 70B Swiss AI | 28.48 ± 0.30 | 70.60 ± 0.44 | 54.36 ± 0.88 | 86.85 ± 0.47 |
Babel 83B Alibaba-DAMO | 25.82 ± 0.33 | 47.16 ± 0.55 | 24.73 ± 0.82 | 69.58 ± 0.62 |
Tulu 3 8B AI2 | 24.58 ± 0.25 | 23.58 ± 0.34 | 47.16 ± 0.69 | 0.00 ± 0.00 |
phi-4 14B Microsoft | 22.74 ± 0.23 | 50.06 ± 0.55 | 42.45 ± 0.75 | 57.67 ± 0.71 |
Qwen 2.5 7B Alibaba | 21.63 ± 0.13 | 50.67 ± 0.27 | 50.22 ± 0.38 | 51.12 ± 0.41 |
Sailor2 8B SAIL | 20.59 ± 0.16 | 44.95 ± 0.12 | 0.00 ± 0.00 | 89.91 ± 0.24 |
Olmo 2 0325 32B AI2 | 19.42 ± 0.31 | 4.44 ± 0.46 | 0.00 ± 0.00 | 8.88 ± 0.91 |
Llama 3.1 8B Meta | 17.53 ± 0.26 | 26.72 ± 0.51 | 53.45 ± 1.02 | 0.00 ± 0.00 |
Apertus 8B Swiss AI | 17.21 ± 0.27 | 27.67 ± 0.91 | 42.96 ± 0.87 | 12.37 ± 1.27 |
Aya Expanse 8B CohereLabs | 17.05 ± 0.23 | 18.40 ± 0.41 | 36.79 ± 0.82 | 0.00 ± 0.00 |
Babel 9B Alibaba-DAMO | 16.70 ± 0.21 | 29.55 ± 0.32 | 59.10 ± 0.63 | 0.00 ± 0.00 |
Command R7B 12-2024 7B CohereLabs | 13.32 ± 0.18 | 15.19 ± 0.50 | 30.38 ± 0.99 | 0.00 ± 0.00 |
SeaLLMs V3 7B Alibaba-DAMO | 11.92 ± 0.18 | 16.54 ± 0.38 | 33.08 ± 0.76 | 0.00 ± 0.00 |
Ministral 2410 8B Mistral AI | 11.11 ± 0.19 | 23.62 ± 0.52 | 47.24 ± 1.05 | 0.00 ± 0.00 |
Mistral Small 3.1 2503 24B Mistral AI | 10.39 ± 0.28 | 17.25 ± 0.44 | 34.50 ± 0.87 | 0.00 ± 0.00 |
Llama 3 8B Meta | 8.95 ± 0.16 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Olmo 2 1124 13B AI2 | 8.48 ± 0.18 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Olmo 2 1124 7B AI2 | 6.98 ± 0.14 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
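Each competency score in these task tables appears to be the unweighted mean of its subtask scores (again a pattern inferred from the numbers, not documented on this page). A quick check for NLU in the top row:

```python
from statistics import fmean

# NLU subtask scores for SEA-LION v4 (Gemma) 27B, copied from the last table:
# Question Answering and Sentiment Analysis.
nlu = fmean([66.59, 97.13])  # ≈ 81.86, the model's NLU competency score
```

This also explains the low aggregates for models with a 0.00 subtask: a single unanswerable subtask halves the competency score.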