Tamil Performance
Tamil Scores by Model
Average of 8 runs; 95% CIs are shown.
Model Size: ≤200B
Open instruct models only
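Every score below is reported as a mean with the half-width of a 95% confidence interval over 8 runs. The page does not document the exact interval method, so the sketch below assumes a standard two-sided t-interval over hypothetical per-run scores:

```python
# Sketch: how a "mean ± 95% CI" figure like 68.47 ± 0.30 can be computed
# from 8 independent runs. Assumption: a two-sided t-based interval;
# the run scores below are made up for illustration.
import math
import statistics

def mean_ci95(scores):
    """Return (mean, half_width) of a 95% t-based confidence interval."""
    n = len(scores)
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / math.sqrt(n)  # standard error of the mean
    t_crit = 2.365  # two-sided 97.5% t quantile for n - 1 = 7 degrees of freedom
    return mean, t_crit * sem

m, hw = mean_ci95([68.1, 68.6, 68.3, 68.7, 68.4, 68.5, 68.2, 68.9])
print(f"{m:.2f} ± {hw:.2f}")
```

With only 8 runs the t-quantile (2.365) is noticeably wider than the normal 1.96, which is why small per-run variation still yields visible error bars.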
Model | Tamil Score |
---|---|
SEA-LION v4 27B AISG | 68.47 ± 0.30 |
Gemma 3 27B | 68.45 ± 0.47 |
Gemma 3 12B | 65.83 ± 0.63 |
Llama 4 Scout 109B MoE Meta | 64.22 ± 0.28 |
Qwen 3 32B Alibaba | 64.10 ± 0.39 |
SEA-LION v3 (Llama) 70B AISG | 63.77 ± 0.69 |
Qwen 3 30B MoE Alibaba | 61.89 ± 0.30 |
Llama 3.3 70B Meta | 60.90 ± 0.25 |
SEA-LION v3 (Gemma 2) 9B AISG | 60.04 ± 0.65 |
Command A 03-2025 111B CohereLabs | 59.92 ± 1.67 |
Qwen 3 14B Alibaba | 59.14 ± 0.29 |
Gemma 2 27B | 59.04 ± 0.68 |
Mistral Large 2411 123B Mistral AI | 57.64 ± 1.61 |
Tulu 3 70B AI2 | 57.35 ± 0.43 |
Gemma 2 9B | 54.87 ± 0.74 |
Qwen 2.5 32B Alibaba | 53.81 ± 0.25 |
Llama 3.1 70B Meta | 53.13 ± 0.93 |
SEA-LION v3 (Llama) 8B AISG | 52.98 ± 1.40 |
Qwen 2.5 72B Alibaba | 52.63 ± 0.34 |
MERaLiON 2 10B A*STAR | 52.54 ± 0.92 |
Aya Expanse 32B CohereLabs | 50.67 ± 0.39 |
ERNIE 4.5 21B MoE Baidu | 50.54 ± 1.30 |
Command R 08-2024 32B CohereLabs | 47.02 ± 1.01 |
Qwen 3 8B Alibaba | 45.75 ± 0.73 |
Command R+ 08-2024 104B CohereLabs | 43.75 ± 3.06 |
Qwen 2.5 14B Alibaba | 43.64 ± 0.32 |
Sailor2 20B SAIL | 42.22 ± 0.54 |
Llama 3 70B Meta | 40.11 ± 0.45 |
Babel 83B Alibaba-DAMO | 38.57 ± 3.77 |
Tulu 3 8B AI2 | 35.87 ± 0.78 |
Olmo 2 0325 32B AI2 | 35.76 ± 1.18 |
Qwen 2.5 7B Alibaba | 34.29 ± 0.76 |
phi-4 14B Microsoft | 33.31 ± 4.06 |
Llama 3.1 8B Meta | 30.63 ± 2.24 |
Aya Expanse 8B CohereLabs | 30.14 ± 0.92 |
Babel 9B Alibaba-DAMO | 29.64 ± 4.45 |
Sailor2 8B SAIL | 27.63 ± 2.40 |
Ministral 2410 8B Mistral AI | 21.58 ± 3.26 |
Command R7B 12-2024 7B CohereLabs | 21.16 ± 3.91 |
Mistral Small 3.1 2503 24B Mistral AI | 19.50 ± 2.79 |
Olmo 2 1124 13B AI2 | 18.91 ± 1.84 |
SeaLLMs V3 7B Alibaba-DAMO | 17.68 ± 1.76 |
Olmo 2 1124 7B AI2 | 15.06 ± 3.63 |
Llama 3 8B Meta | 14.98 ± 1.96 |
Tamil Competencies
Model | TA | Instruction Following | Linguistic Diagnostics | Multi-Turn Chat | NLG | NLR | NLU |
---|---|---|---|---|---|---|---|
SEA-LION v4 27B AISG | 68.47 ± 0.30 | 71.31 ± 1.19 | 84.73 ± 0.38 | 40.19 ± 1.06 | 50.32 ± 0.08 | 81.89 ± 0.21 | 82.39 ± 0.50 |
Gemma 3 27B | 68.45 ± 0.47 | 71.31 ± 1.71 | 84.95 ± 0.32 | 40.14 ± 1.79 | 50.46 ± 0.05 | 82.08 ± 0.18 | 81.77 ± 0.24 |
Gemma 3 12B | 65.83 ± 0.63 | 70.60 ± 2.32 | 79.24 ± 1.18 | 36.05 ± 2.05 | 49.56 ± 0.15 | 75.92 ± 0.51 | 83.60 ± 0.64 |
Llama 4 Scout 109B MoE Meta | 64.22 ± 0.28 | 80.48 ± 1.45 | 81.82 ± 0.15 | 12.34 ± 0.86 | 50.12 ± 0.19 | 76.96 ± 0.13 | 83.63 ± 0.24 |
Qwen 3 32B Alibaba | 64.10 ± 0.39 | 75.83 ± 2.02 | 85.06 ± 0.36 | 17.46 ± 1.51 | 44.69 ± 0.32 | 76.97 ± 0.75 | 84.60 ± 0.37 |
SEA-LION v3 (Llama) 70B AISG | 63.77 ± 0.69 | 76.79 ± 1.49 | 79.96 ± 1.15 | 15.73 ± 0.89 | 48.65 ± 0.97 | 77.78 ± 2.39 | 83.69 ± 0.86 |
Qwen 3 30B MoE Alibaba | 61.89 ± 0.30 | 67.14 ± 1.54 | 82.98 ± 0.40 | 18.97 ± 0.75 | 42.09 ± 1.17 | 77.13 ± 0.10 | 83.04 ± 0.31 |
Llama 3.3 70B Meta | 60.90 ± 0.25 | 69.64 ± 1.14 | 79.10 ± 0.38 | 8.73 ± 0.78 | 46.85 ± 0.24 | 77.86 ± 0.20 | 83.22 ± 0.50 |
SEA-LION v3 (Gemma 2) 9B AISG | 60.04 ± 0.65 | 67.26 ± 2.44 | 80.01 ± 1.20 | 17.19 ± 0.77 | 42.45 ± 2.62 | 70.23 ± 0.48 | 83.10 ± 0.65 |
Command A 03-2025 111B CohereLabs | 59.92 ± 1.67 | 68.81 ± 2.05 | 81.63 ± 1.57 | 17.62 ± 1.16 | 47.96 ± 0.51 | 67.47 ± 6.38 | 76.01 ± 1.12 |
Qwen 3 14B Alibaba | 59.14 ± 0.29 | 68.57 ± 2.03 | 79.35 ± 0.18 | 12.34 ± 0.53 | 41.55 ± 0.78 | 66.84 ± 0.44 | 86.23 ± 0.28 |
Gemma 2 27B | 59.04 ± 0.68 | 62.86 ± 3.19 | 80.50 ± 0.71 | 7.60 ± 0.83 | 44.36 ± 2.18 | 76.23 ± 0.40 | 82.71 ± 0.86 |
Mistral Large 2411 123B Mistral AI | 57.64 ± 1.61 | 63.21 ± 2.93 | 74.66 ± 2.34 | 10.78 ± 1.25 | 47.18 ± 1.07 | 66.76 ± 5.95 | 83.25 ± 1.43 |
Tulu 3 70B AI2 | 57.35 ± 0.43 | 59.64 ± 2.20 | 72.67 ± 0.80 | 9.86 ± 0.77 | 44.34 ± 0.51 | 76.76 ± 1.32 | 80.86 ± 0.77 |
Gemma 2 9B | 54.87 ± 0.74 | 58.10 ± 1.62 | 81.49 ± 0.63 | 6.73 ± 0.98 | 35.18 ± 3.26 | 64.39 ± 1.49 | 83.33 ± 0.89 |
Qwen 2.5 32B Alibaba | 53.81 ± 0.25 | 66.55 ± 1.14 | 72.20 ± 0.31 | 7.76 ± 0.78 | 35.03 ± 0.20 | 63.98 ± 0.14 | 77.31 ± 0.46 |
Llama 3.1 70B Meta | 53.13 ± 0.93 | 50.48 ± 2.12 | 74.74 ± 1.40 | 8.30 ± 0.79 | 27.46 ± 3.93 | 74.65 ± 1.18 | 83.14 ± 0.67 |
SEA-LION v3 (Llama) 8B AISG | 52.98 ± 1.40 | 63.21 ± 2.28 | 65.79 ± 3.87 | 9.27 ± 1.14 | 42.87 ± 2.04 | 59.68 ± 3.74 | 77.02 ± 1.09 |
Qwen 2.5 72B Alibaba | 52.63 ± 0.34 | 63.21 ± 1.69 | 69.82 ± 0.49 | 11.05 ± 0.73 | 37.39 ± 0.18 | 59.01 ± 0.84 | 75.29 ± 0.28 |
MERaLiON 2 10B A*STAR | 52.54 ± 0.92 | 58.69 ± 1.80 | 78.74 ± 2.17 | 5.87 ± 0.94 | 26.00 ± 2.16 | 64.46 ± 0.98 | 81.49 ± 0.65 |
Aya Expanse 32B CohereLabs | 50.67 ± 0.39 | 54.52 ± 1.16 | 68.68 ± 0.96 | 6.57 ± 0.87 | 36.64 ± 0.51 | 62.12 ± 0.65 | 75.51 ± 0.52 |
ERNIE 4.5 21B MoE Baidu | 50.54 ± 1.30 | 55.24 ± 2.29 | 52.12 ± 6.42 | 11.96 ± 0.74 | 40.42 ± 0.88 | 66.94 ± 1.57 | 76.59 ± 0.55 |
Command R 08-2024 32B CohereLabs | 47.02 ± 1.01 | 37.62 ± 2.97 | 62.72 ± 5.64 | 1.94 ± 0.58 | 37.82 ± 0.56 | 62.34 ± 2.76 | 79.69 ± 0.82 |
Qwen 3 8B Alibaba | 45.75 ± 0.73 | 62.98 ± 1.02 | 57.57 ± 2.14 | 12.23 ± 0.89 | 20.40 ± 0.63 | 54.26 ± 0.25 | 67.04 ± 4.31 |
Command R+ 08-2024 104B CohereLabs | 43.75 ± 3.06 | 36.19 ± 2.03 | 60.77 ± 2.09 | 2.96 ± 0.46 | 40.13 ± 0.87 | 51.18 ± 7.87 | 71.28 ± 11.38 |
Qwen 2.5 14B Alibaba | 43.64 ± 0.32 | 42.98 ± 1.56 | 69.74 ± 0.35 | 6.30 ± 0.80 | 25.82 ± 0.38 | 54.13 ± 0.15 | 62.88 ± 0.18 |
Sailor2 20B SAIL | 42.22 ± 0.54 | 28.33 ± 1.26 | 68.48 ± 0.41 | 10.83 ± 1.09 | 22.91 ± 1.69 | 75.98 ± 0.36 | 46.75 ± 0.73 |
Llama 3 70B Meta | 40.11 ± 0.45 | 15.36 ± 1.88 | 70.89 ± 0.46 | 4.85 ± 0.59 | 31.29 ± 1.80 | 70.28 ± 0.74 | 47.96 ± 0.07 |
Babel 83B Alibaba-DAMO | 38.57 ± 3.77 | 33.33 ± 2.31 | 61.14 ± 9.43 | 1.99 ± 0.68 | 24.44 ± 3.01 | 55.42 ± 9.79 | 55.08 ± 5.28 |
Tulu 3 8B AI2 | 35.87 ± 0.78 | 49.17 ± 2.14 | 45.28 ± 4.40 | 2.69 ± 0.63 | 35.96 ± 0.30 | 56.52 ± 2.33 | 25.61 ± 2.77 |
Olmo 2 0325 32B AI2 | 35.76 ± 1.18 | 51.19 ± 2.61 | 56.84 ± 1.64 | 2.42 ± 0.45 | 25.25 ± 1.08 | 51.77 ± 1.36 | 27.06 ± 6.80 |
Qwen 2.5 7B Alibaba | 34.29 ± 0.76 | 44.88 ± 2.38 | 49.58 ± 1.02 | 2.69 ± 0.59 | 17.67 ± 0.41 | 28.03 ± 1.46 | 62.92 ± 1.41 |
phi-4 14B Microsoft | 33.31 ± 4.06 | 32.50 ± 1.67 | 31.33 ± 14.08 | 8.14 ± 0.84 | 31.32 ± 1.53 | 35.71 ± 8.21 | 60.84 ± 6.57 |
Llama 3.1 8B Meta | 30.63 ± 2.24 | 36.31 ± 1.39 | 55.22 ± 2.11 | 4.26 ± 0.84 | 29.48 ± 3.56 | 19.20 ± 1.80 | 39.31 ± 11.35 |
Aya Expanse 8B CohereLabs | 30.14 ± 0.92 | 32.98 ± 2.28 | 56.71 ± 2.70 | 2.26 ± 0.44 | 19.72 ± 1.21 | 50.96 ± 1.72 | 18.24 ± 2.21 |
Babel 9B Alibaba-DAMO | 29.64 ± 4.45 | 37.74 ± 3.11 | 19.17 ± 5.50 | 2.64 ± 0.69 | 28.16 ± 2.47 | 37.54 ± 11.20 | 52.57 ± 12.32 |
Sailor2 8B SAIL | 27.63 ± 2.40 | 25.71 ± 2.42 | 7.58 ± 3.07 | 7.06 ± 0.93 | 23.59 ± 0.96 | 54.34 ± 10.45 | 47.48 ± 0.40 |
Ministral 2410 8B Mistral AI | 21.58 ± 3.26 | 21.55 ± 3.81 | 16.00 ± 9.70 | 1.08 ± 0.62 | 18.73 ± 0.70 | 32.18 ± 8.09 | 39.97 ± 9.27 |
Command R7B 12-2024 7B CohereLabs | 21.16 ± 3.91 | 38.45 ± 2.75 | 20.63 ± 13.50 | 1.45 ± 0.55 | 24.66 ± 1.59 | 26.39 ± 10.08 | 15.38 ± 2.22 |
Mistral Small 3.1 2503 24B Mistral AI | 19.50 ± 2.79 | 32.02 ± 1.58 | 35.18 ± 11.18 | 1.67 ± 0.87 | 11.21 ± 2.81 | 17.91 ± 5.82 | 19.03 ± 4.62 |
Olmo 2 1124 13B AI2 | 18.91 ± 1.84 | 31.90 ± 2.37 | 24.02 ± 9.48 | 1.24 ± 0.46 | 17.40 ± 0.93 | 38.64 ± 1.93 | 0.24 ± 0.29 |
SeaLLMs V3 7B Alibaba-DAMO | 17.68 ± 1.76 | 33.33 ± 2.44 | 20.98 ± 7.28 | 2.16 ± 0.36 | 20.15 ± 0.82 | 10.92 ± 5.03 | 18.56 ± 3.50 |
Olmo 2 1124 7B AI2 | 15.06 ± 3.63 | 26.90 ± 2.46 | 25.63 ± 11.95 | 0.32 ± 0.31 | 15.08 ± 0.29 | 19.14 ± 7.09 | 3.26 ± 4.22 |
Llama 3 8B Meta | 14.98 ± 1.96 | 20.24 ± 2.51 | 7.12 ± 7.36 | 2.48 ± 0.44 | 27.03 ± 1.54 | 19.45 ± 1.19 | 13.56 ± 8.33 |
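The overall TA score appears to be the unweighted mean of the six competency scores. The snippet below checks this against the top row (SEA-LION v4 27B); the values are read off the competencies table above.

```python
# Assumption: TA = unweighted mean of the six competency scores.
# Values are the SEA-LION v4 27B row from the competencies table.
competencies = {
    "Instruction Following": 71.31,
    "Linguistic Diagnostics": 84.73,
    "Multi-Turn Chat": 40.19,
    "NLG": 50.32,
    "NLR": 81.89,
    "NLU": 82.39,
}
ta = sum(competencies.values()) / len(competencies)
print(round(ta, 2))  # 68.47, matching the reported TA score
```

Equal weighting means the very low Multi-Turn Chat scores drag down even the strongest models' TA averages.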
Tamil Tasks
Model | TA | Instruction Following | SEA-IFEval |
---|---|---|---|
SEA-LION v4 27B AISG | 68.47 ± 0.30 | 71.31 ± 1.19 | 71.31 ± 7.69 |
Gemma 3 27B | 68.45 ± 0.47 | 71.31 ± 1.71 | 71.31 ± 7.36 |
Gemma 3 12B | 65.83 ± 0.63 | 70.60 ± 2.32 | 70.60 ± 7.43 |
Llama 4 Scout 109B MoE Meta | 64.22 ± 0.28 | 80.48 ± 1.45 | 80.48 ± 6.87 |
Qwen 3 32B Alibaba | 64.10 ± 0.39 | 75.83 ± 2.02 | 75.83 ± 7.15 |
SEA-LION v3 (Llama) 70B AISG | 63.77 ± 0.69 | 76.79 ± 1.49 | 76.79 ± 6.61 |
Qwen 3 30B MoE Alibaba | 61.89 ± 0.30 | 67.14 ± 1.54 | 67.14 ± 8.08 |
Llama 3.3 70B Meta | 60.90 ± 0.25 | 69.64 ± 1.14 | 69.64 ± 7.73 |
SEA-LION v3 (Gemma 2) 9B AISG | 60.04 ± 0.65 | 67.26 ± 2.44 | 67.26 ± 7.19 |
Command A 03-2025 111B CohereLabs | 59.92 ± 1.67 | 68.81 ± 2.05 | 68.81 ± 7.43 |
Qwen 3 14B Alibaba | 59.14 ± 0.29 | 68.57 ± 2.03 | 68.57 ± 7.66 |
Gemma 2 27B | 59.04 ± 0.68 | 62.86 ± 3.19 | 62.86 ± 6.92 |
Mistral Large 2411 123B Mistral AI | 57.64 ± 1.61 | 63.21 ± 2.93 | 63.21 ± 6.89 |
Tulu 3 70B AI2 | 57.35 ± 0.43 | 59.64 ± 2.20 | 59.64 ± 7.80 |
Gemma 2 9B | 54.87 ± 0.74 | 58.10 ± 1.62 | 58.10 ± 7.34 |
Qwen 2.5 32B Alibaba | 53.81 ± 0.25 | 66.55 ± 1.14 | 66.55 ± 7.85 |
Llama 3.1 70B Meta | 53.13 ± 0.93 | 50.48 ± 2.12 | 50.48 ± 7.69 |
SEA-LION v3 (Llama) 8B AISG | 52.98 ± 1.40 | 63.21 ± 2.28 | 63.21 ± 7.61 |
Qwen 2.5 72B Alibaba | 52.63 ± 0.34 | 63.21 ± 1.69 | 63.21 ± 7.72 |
MERaLiON 2 10B A*STAR | 52.54 ± 0.92 | 58.69 ± 1.80 | 58.69 ± 7.39 |
Aya Expanse 32B CohereLabs | 50.67 ± 0.39 | 54.52 ± 1.16 | 54.52 ± 7.79 |
ERNIE 4.5 21B MoE Baidu | 50.54 ± 1.30 | 55.24 ± 2.29 | 55.24 ± 8.00 |
Command R 08-2024 32B CohereLabs | 47.02 ± 1.01 | 37.62 ± 2.97 | 37.62 ± 6.97 |
Qwen 3 8B Alibaba | 45.75 ± 0.73 | 62.98 ± 1.02 | 62.98 ± 8.57 |
Command R+ 08-2024 104B CohereLabs | 43.75 ± 3.06 | 36.19 ± 2.03 | 36.19 ± 7.20 |
Qwen 2.5 14B Alibaba | 43.64 ± 0.32 | 42.98 ± 1.56 | 42.98 ± 8.06 |
Sailor2 20B SAIL | 42.22 ± 0.54 | 28.33 ± 1.26 | 28.33 ± 7.01 |
Llama 3 70B Meta | 40.11 ± 0.45 | 15.36 ± 1.88 | 15.36 ± 5.92 |
Babel 83B Alibaba-DAMO | 38.57 ± 3.77 | 33.33 ± 2.31 | 33.33 ± 6.60 |
Tulu 3 8B AI2 | 35.87 ± 0.78 | 49.17 ± 2.14 | 49.17 ± 7.49 |
Olmo 2 0325 32B AI2 | 35.76 ± 1.18 | 51.19 ± 2.61 | 51.19 ± 7.72 |
Qwen 2.5 7B Alibaba | 34.29 ± 0.76 | 44.88 ± 2.38 | 44.88 ± 8.11 |
phi-4 14B Microsoft | 33.31 ± 4.06 | 32.50 ± 1.67 | 32.50 ± 6.91 |
Llama 3.1 8B Meta | 30.63 ± 2.24 | 36.31 ± 1.39 | 36.31 ± 7.70 |
Aya Expanse 8B CohereLabs | 30.14 ± 0.92 | 32.98 ± 2.28 | 32.98 ± 7.29 |
Babel 9B Alibaba-DAMO | 29.64 ± 4.45 | 37.74 ± 3.11 | 37.74 ± 7.11 |
Sailor2 8B SAIL | 27.63 ± 2.40 | 25.71 ± 2.42 | 25.71 ± 7.13 |
Ministral 2410 8B Mistral AI | 21.58 ± 3.26 | 21.55 ± 3.81 | 21.55 ± 5.30 |
Command R7B 12-2024 7B CohereLabs | 21.16 ± 3.91 | 38.45 ± 2.75 | 38.45 ± 7.31 |
Mistral Small 3.1 2503 24B Mistral AI | 19.50 ± 2.79 | 32.02 ± 1.58 | 32.02 ± 6.63 |
Olmo 2 1124 13B AI2 | 18.91 ± 1.84 | 31.90 ± 2.37 | 31.90 ± 7.04 |
SeaLLMs V3 7B Alibaba-DAMO | 17.68 ± 1.76 | 33.33 ± 2.44 | 33.33 ± 7.25 |
Olmo 2 1124 7B AI2 | 15.06 ± 3.63 | 26.90 ± 2.46 | 26.90 ± 6.97 |
Llama 3 8B Meta | 14.98 ± 1.96 | 20.24 ± 2.51 | 20.24 ± 5.96 |
Model | TA | Linguistic Diagnostics | Syntax | Pragmatics |
---|---|---|---|---|
SEA-LION v4 27B AISG | 68.47 ± 0.30 | 84.73 ± 0.38 | 92.90 ± 2.17 | 76.55 ± 0.67 |
Gemma 3 27B | 68.45 ± 0.47 | 84.95 ± 0.32 | 92.87 ± 2.23 | 77.02 ± 0.57 |
Gemma 3 12B | 65.83 ± 0.63 | 79.24 ± 1.18 | 82.05 ± 3.05 | 76.42 ± 1.14 |
Llama 4 Scout 109B MoE Meta | 64.22 ± 0.28 | 81.82 ± 0.15 | 86.01 ± 3.12 | 77.62 ± 0.31 |
Qwen 3 32B Alibaba | 64.10 ± 0.39 | 85.06 ± 0.36 | 92.69 ± 2.23 | 77.43 ± 0.82 |
SEA-LION v3 (Llama) 70B AISG | 63.77 ± 0.69 | 79.96 ± 1.15 | 85.03 ± 2.45 | 74.90 ± 2.21 |
Qwen 3 30B MoE Alibaba | 61.89 ± 0.30 | 82.98 ± 0.40 | 89.07 ± 2.71 | 76.89 ± 0.21 |
Llama 3.3 70B Meta | 60.90 ± 0.25 | 79.10 ± 0.38 | 83.72 ± 3.19 | 74.48 ± 0.31 |
SEA-LION v3 (Gemma 2) 9B AISG | 60.04 ± 0.65 | 80.01 ± 1.20 | 84.81 ± 2.76 | 75.22 ± 0.21 |
Command A 03-2025 111B CohereLabs | 59.92 ± 1.67 | 81.63 ± 1.57 | 86.99 ± 2.80 | 76.26 ± 2.86 |
Qwen 3 14B Alibaba | 59.14 ± 0.29 | 79.35 ± 0.18 | 86.78 ± 3.01 | 71.91 ± 0.47 |
Gemma 2 27B | 59.04 ± 0.68 | 80.50 ± 0.71 | 88.43 ± 2.47 | 72.56 ± 0.75 |
Mistral Large 2411 123B Mistral AI | 57.64 ± 1.61 | 74.66 ± 2.34 | 81.89 ± 2.92 | 67.43 ± 3.95 |
Tulu 3 70B AI2 | 57.35 ± 0.43 | 72.67 ± 0.80 | 80.32 ± 3.23 | 65.03 ± 1.21 |
Gemma 2 9B | 54.87 ± 0.74 | 81.49 ± 0.63 | 84.39 ± 2.80 | 78.59 ± 1.20 |
Qwen 2.5 32B Alibaba | 53.81 ± 0.25 | 72.20 ± 0.31 | 76.78 ± 3.75 | 67.62 ± 0.47 |
Llama 3.1 70B Meta | 53.13 ± 0.93 | 74.74 ± 1.40 | 81.46 ± 2.85 | 68.01 ± 0.82 |
SEA-LION v3 (Llama) 8B AISG | 52.98 ± 1.40 | 65.79 ± 3.87 | 65.82 ± 2.13 | 65.76 ± 0.99 |
Qwen 2.5 72B Alibaba | 52.63 ± 0.34 | 69.82 ± 0.49 | 66.41 ± 4.18 | 73.23 ± 1.15 |
MERaLiON 2 10B A*STAR | 52.54 ± 0.92 | 78.74 ± 2.17 | 79.57 ± 2.51 | 77.90 ± 0.23 |
Aya Expanse 32B CohereLabs | 50.67 ± 0.39 | 68.68 ± 0.96 | 69.84 ± 3.97 | 67.51 ± 0.30 |
ERNIE 4.5 21B MoE Baidu | 50.54 ± 1.30 | 52.12 ± 6.42 | 66.54 ± 3.09 | 37.69 ± 15.97 |
Command R 08-2024 32B CohereLabs | 47.02 ± 1.01 | 62.72 ± 5.64 | 65.82 ± 3.00 | 59.61 ± 4.27 |
Qwen 3 8B Alibaba | 45.75 ± 0.73 | 57.57 ± 2.14 | 65.85 ± 4.10 | 49.29 ± 3.40 |
Command R+ 08-2024 104B CohereLabs | 43.75 ± 3.06 | 60.77 ± 2.09 | 56.97 ± 2.69 | 64.56 ± 3.05 |
Qwen 2.5 14B Alibaba | 43.64 ± 0.32 | 69.74 ± 0.35 | 72.50 ± 3.97 | 66.99 ± 0.75 |
Sailor2 20B SAIL | 42.22 ± 0.54 | 68.48 ± 0.41 | 83.43 ± 3.32 | 53.54 ± 0.98 |
Llama 3 70B Meta | 40.11 ± 0.45 | 70.89 ± 0.46 | 78.78 ± 3.47 | 63.01 ± 0.26 |
Babel 83B Alibaba-DAMO | 38.57 ± 3.77 | 61.14 ± 9.43 | 66.57 ± 2.23 | 55.71 ± 10.40 |
Tulu 3 8B AI2 | 35.87 ± 0.78 | 45.28 ± 4.40 | 59.73 ± 3.30 | 30.83 ± 7.09 |
Olmo 2 0325 32B AI2 | 35.76 ± 1.18 | 56.84 ± 1.64 | 54.28 ± 1.29 | 59.39 ± 4.55 |
Qwen 2.5 7B Alibaba | 34.29 ± 0.76 | 49.58 ± 1.02 | 45.19 ± 4.15 | 53.97 ± 0.80 |
phi-4 14B Microsoft | 33.31 ± 4.06 | 31.33 ± 14.08 | 35.05 ± 1.92 | 27.62 ± 14.84 |
Llama 3.1 8B Meta | 30.63 ± 2.24 | 55.22 ± 2.11 | 53.59 ± 1.61 | 56.85 ± 0.56 |
Aya Expanse 8B CohereLabs | 30.14 ± 0.92 | 56.71 ± 2.70 | 55.88 ± 4.45 | 57.54 ± 5.33 |
Babel 9B Alibaba-DAMO | 29.64 ± 4.45 | 19.17 ± 5.50 | 12.47 ± 1.09 | 25.87 ± 11.58 |
Sailor2 8B SAIL | 27.63 ± 2.40 | 7.58 ± 3.07 | 0.05 ± 0.07 | 15.11 ± 6.13 |
Ministral 2410 8B Mistral AI | 21.58 ± 3.26 | 16.00 ± 9.70 | 26.54 ± 2.48 | 5.45 ± 7.55 |
Command R7B 12-2024 7B CohereLabs | 21.16 ± 3.91 | 20.63 ± 13.50 | 21.44 ± 0.83 | 19.81 ± 17.33 |
Mistral Small 3.1 2503 24B Mistral AI | 19.50 ± 2.79 | 35.18 ± 11.18 | 44.26 ± 2.24 | 26.10 ± 14.26 |
Olmo 2 1124 13B AI2 | 18.91 ± 1.84 | 24.02 ± 9.48 | 13.14 ± 1.25 | 34.91 ± 9.97 |
SeaLLMs V3 7B Alibaba-DAMO | 17.68 ± 1.76 | 20.98 ± 7.28 | 7.18 ± 0.93 | 34.78 ± 14.82 |
Olmo 2 1124 7B AI2 | 15.06 ± 3.63 | 25.63 ± 11.95 | 26.36 ± 1.13 | 24.91 ± 13.45 |
Llama 3 8B Meta | 14.98 ± 1.96 | 7.12 ± 7.36 | 5.32 ± 0.65 | 8.93 ± 6.17 |
Model | TA | Multi-Turn Chat | SEA-MT-Bench |
---|---|---|---|
SEA-LION v4 27B AISG | 68.47 ± 0.30 | 40.19 ± 1.06 | 40.19 ± 5.18 |
Gemma 3 27B | 68.45 ± 0.47 | 40.14 ± 1.79 | 40.14 ± 5.30 |
Gemma 3 12B | 65.83 ± 0.63 | 36.05 ± 2.05 | 36.05 ± 5.16 |
Llama 4 Scout 109B MoE Meta | 64.22 ± 0.28 | 12.34 ± 0.86 | 12.34 ± 3.73 |
Qwen 3 32B Alibaba | 64.10 ± 0.39 | 17.46 ± 1.51 | 17.46 ± 4.32 |
SEA-LION v3 (Llama) 70B AISG | 63.77 ± 0.69 | 15.73 ± 0.89 | 15.73 ± 4.37 |
Qwen 3 30B MoE Alibaba | 61.89 ± 0.30 | 18.97 ± 0.75 | 18.97 ± 4.98 |
Llama 3.3 70B Meta | 60.90 ± 0.25 | 8.73 ± 0.78 | 8.73 ± 3.49 |
SEA-LION v3 (Gemma 2) 9B AISG | 60.04 ± 0.65 | 17.19 ± 0.77 | 17.19 ± 4.08 |
Command A 03-2025 111B CohereLabs | 59.92 ± 1.67 | 17.62 ± 1.16 | 17.62 ± 4.47 |
Qwen 3 14B Alibaba | 59.14 ± 0.29 | 12.34 ± 0.53 | 12.34 ± 3.89 |
Gemma 2 27B | 59.04 ± 0.68 | 7.60 ± 0.83 | 7.60 ± 2.76 |
Mistral Large 2411 123B Mistral AI | 57.64 ± 1.61 | 10.78 ± 1.25 | 10.78 ± 3.51 |
Tulu 3 70B AI2 | 57.35 ± 0.43 | 9.86 ± 0.77 | 9.86 ± 3.37 |
Gemma 2 9B | 54.87 ± 0.74 | 6.73 ± 0.98 | 6.73 ± 2.69 |
Qwen 2.5 32B Alibaba | 53.81 ± 0.25 | 7.76 ± 0.78 | 7.76 ± 2.65 |
Llama 3.1 70B Meta | 53.13 ± 0.93 | 8.30 ± 0.79 | 8.30 ± 3.32 |
SEA-LION v3 (Llama) 8B AISG | 52.98 ± 1.40 | 9.27 ± 1.14 | 9.27 ± 3.35 |
Qwen 2.5 72B Alibaba | 52.63 ± 0.34 | 11.05 ± 0.73 | 11.05 ± 3.54 |
MERaLiON 2 10B A*STAR | 52.54 ± 0.92 | 5.87 ± 0.94 | 5.87 ± 2.39 |
Aya Expanse 32B CohereLabs | 50.67 ± 0.39 | 6.57 ± 0.87 | 6.57 ± 2.86 |
ERNIE 4.5 21B MoE Baidu | 50.54 ± 1.30 | 11.96 ± 0.74 | 11.96 ± 3.69 |
Command R 08-2024 32B CohereLabs | 47.02 ± 1.01 | 1.94 ± 0.58 | 1.94 ± 1.13 |
Qwen 3 8B Alibaba | 45.75 ± 0.73 | 12.23 ± 0.89 | 12.23 ± 4.05 |
Command R+ 08-2024 104B CohereLabs | 43.75 ± 3.06 | 2.96 ± 0.46 | 2.96 ± 1.81 |
Qwen 2.5 14B Alibaba | 43.64 ± 0.32 | 6.30 ± 0.80 | 6.30 ± 2.58 |
Sailor2 20B SAIL | 42.22 ± 0.54 | 10.83 ± 1.09 | 10.83 ± 3.32 |
Llama 3 70B Meta | 40.11 ± 0.45 | 4.85 ± 0.59 | 4.85 ± 2.12 |
Babel 83B Alibaba-DAMO | 38.57 ± 3.77 | 1.99 ± 0.68 | 1.99 ± 1.27 |
Tulu 3 8B AI2 | 35.87 ± 0.78 | 2.69 ± 0.63 | 2.69 ± 1.77 |
Olmo 2 0325 32B AI2 | 35.76 ± 1.18 | 2.42 ± 0.45 | 2.42 ± 1.45 |
Qwen 2.5 7B Alibaba | 34.29 ± 0.76 | 2.69 ± 0.59 | 2.69 ± 1.52 |
phi-4 14B Microsoft | 33.31 ± 4.06 | 8.14 ± 0.84 | 8.14 ± 3.28 |
Llama 3.1 8B Meta | 30.63 ± 2.24 | 4.26 ± 0.84 | 4.26 ± 2.15 |
Aya Expanse 8B CohereLabs | 30.14 ± 0.92 | 2.26 ± 0.44 | 2.26 ± 1.56 |
Babel 9B Alibaba-DAMO | 29.64 ± 4.45 | 2.64 ± 0.69 | 2.64 ± 1.54 |
Sailor2 8B SAIL | 27.63 ± 2.40 | 7.06 ± 0.93 | 7.06 ± 2.79 |
Ministral 2410 8B Mistral AI | 21.58 ± 3.26 | 1.08 ± 0.62 | 1.08 ± 0.78 |
Command R7B 12-2024 7B CohereLabs | 21.16 ± 3.91 | 1.45 ± 0.55 | 1.45 ± 1.05 |
Mistral Small 3.1 2503 24B Mistral AI | 19.50 ± 2.79 | 1.67 ± 0.87 | 1.67 ± 1.09 |
Olmo 2 1124 13B AI2 | 18.91 ± 1.84 | 1.24 ± 0.46 | 1.24 ± 1.14 |
SeaLLMs V3 7B Alibaba-DAMO | 17.68 ± 1.76 | 2.16 ± 0.36 | 2.16 ± 1.25 |
Olmo 2 1124 7B AI2 | 15.06 ± 3.63 | 0.32 ± 0.31 | 0.32 ± 0.29 |
Llama 3 8B Meta | 14.98 ± 1.96 | 2.48 ± 0.44 | 2.48 ± 1.62 |
Model | TA | NLG | Summarization | Translations |
---|---|---|---|---|
SEA-LION v4 27B AISG | 68.47 ± 0.30 | 50.32 ± 0.08 | 11.26 ± 1.40 | 89.37 ± 0.06 |
Gemma 3 27B | 68.45 ± 0.47 | 50.46 ± 0.05 | 11.38 ± 1.43 | 89.55 ± 0.05 |
Gemma 3 12B | 65.83 ± 0.63 | 49.56 ± 0.15 | 11.21 ± 1.34 | 87.90 ± 0.41 |
Llama 4 Scout 109B MoE Meta | 64.22 ± 0.28 | 50.12 ± 0.19 | 14.93 ± 2.01 | 85.31 ± 0.07 |
Qwen 3 32B Alibaba | 64.10 ± 0.39 | 44.69 ± 0.32 | 12.97 ± 1.78 | 76.40 ± 0.47 |
SEA-LION v3 (Llama) 70B AISG | 63.77 ± 0.69 | 48.65 ± 0.97 | 13.82 ± 1.85 | 83.49 ± 1.78 |
Qwen 3 30B MoE Alibaba | 61.89 ± 0.30 | 42.09 ± 1.17 | 11.68 ± 1.58 | 72.49 ± 2.41 |
Llama 3.3 70B Meta | 60.90 ± 0.25 | 46.85 ± 0.24 | 13.74 ± 1.79 | 79.96 ± 0.33 |
SEA-LION v3 (Gemma 2) 9B AISG | 60.04 ± 0.65 | 42.45 ± 2.62 | 12.77 ± 1.63 | 72.12 ± 5.31 |
Command A 03-2025 111B CohereLabs | 59.92 ± 1.67 | 47.96 ± 0.51 | 12.29 ± 1.63 | 83.64 ± 0.83 |
Qwen 3 14B Alibaba | 59.14 ± 0.29 | 41.55 ± 0.78 | 12.61 ± 1.86 | 70.49 ± 1.56 |
Gemma 2 27B | 59.04 ± 0.68 | 44.36 ± 2.18 | 13.06 ± 1.68 | 75.66 ± 4.50 |
Mistral Large 2411 123B Mistral AI | 57.64 ± 1.61 | 47.18 ± 1.07 | 12.75 ± 1.72 | 81.62 ± 2.25 |
Tulu 3 70B AI2 | 57.35 ± 0.43 | 44.34 ± 0.51 | 11.80 ± 1.45 | 76.87 ± 0.99 |
Gemma 2 9B | 54.87 ± 0.74 | 35.18 ± 3.26 | 10.65 ± 1.36 | 59.71 ± 6.42 |
Qwen 2.5 32B Alibaba | 53.81 ± 0.25 | 35.03 ± 0.20 | 11.94 ± 1.66 | 58.12 ± 0.24 |
Llama 3.1 70B Meta | 53.13 ± 0.93 | 27.46 ± 3.93 | 13.76 ± 1.77 | 41.15 ± 7.55 |
SEA-LION v3 (Llama) 8B AISG | 52.98 ± 1.40 | 42.87 ± 2.04 | 11.37 ± 1.48 | 74.38 ± 4.11 |
Qwen 2.5 72B Alibaba | 52.63 ± 0.34 | 37.39 ± 0.18 | 7.65 ± 1.46 | 67.12 ± 0.14 |
MERaLiON 2 10B A*STAR | 52.54 ± 0.92 | 26.00 ± 2.16 | 9.52 ± 1.25 | 42.48 ± 3.95 |
Aya Expanse 32B CohereLabs | 50.67 ± 0.39 | 36.64 ± 0.51 | 0.00 ± 0.00 | 73.27 ± 1.02 |
ERNIE 4.5 21B MoE Baidu | 50.54 ± 1.30 | 40.42 ± 0.88 | 11.45 ± 1.48 | 69.38 ± 1.76 |
Command R 08-2024 32B CohereLabs | 47.02 ± 1.01 | 37.82 ± 0.56 | 10.98 ± 1.36 | 64.67 ± 1.33 |
Qwen 3 8B Alibaba | 45.75 ± 0.73 | 20.40 ± 0.63 | 0.00 ± 0.00 | 40.79 ± 1.27 |
Command R+ 08-2024 104B CohereLabs | 43.75 ± 3.06 | 40.13 ± 0.87 | 8.52 ± 1.11 | 71.75 ± 1.47 |
Qwen 2.5 14B Alibaba | 43.64 ± 0.32 | 25.82 ± 0.38 | 9.86 ± 1.55 | 41.78 ± 0.64 |
Sailor2 20B SAIL | 42.22 ± 0.54 | 22.91 ± 1.69 | 0.00 ± 0.00 | 45.83 ± 3.37 |
Llama 3 70B Meta | 40.11 ± 0.45 | 31.29 ± 1.80 | 0.00 ± 0.00 | 62.59 ± 3.59 |
Babel 83B Alibaba-DAMO | 38.57 ± 3.77 | 24.44 ± 3.01 | 9.70 ± 1.19 | 39.19 ± 6.22 |
Tulu 3 8B AI2 | 35.87 ± 0.78 | 35.96 ± 0.30 | 11.21 ± 1.47 | 60.72 ± 0.67 |
Olmo 2 0325 32B AI2 | 35.76 ± 1.18 | 25.25 ± 1.08 | 0.00 ± 0.00 | 50.50 ± 2.15 |
Qwen 2.5 7B Alibaba | 34.29 ± 0.76 | 17.67 ± 0.41 | 8.67 ± 1.19 | 26.67 ± 0.74 |
phi-4 14B Microsoft | 33.31 ± 4.06 | 31.32 ± 1.53 | 8.78 ± 1.06 | 53.86 ± 2.84 |
Llama 3.1 8B Meta | 30.63 ± 2.24 | 29.48 ± 3.56 | 12.78 ± 1.55 | 46.18 ± 6.95 |
Aya Expanse 8B CohereLabs | 30.14 ± 0.92 | 19.72 ± 1.21 | 0.00 ± 0.00 | 39.44 ± 2.41 |
Babel 9B Alibaba-DAMO | 29.64 ± 4.45 | 28.16 ± 2.47 | 10.52 ± 1.45 | 45.80 ± 4.95 |
Sailor2 8B SAIL | 27.63 ± 2.40 | 23.59 ± 0.96 | 0.00 ± 0.00 | 47.17 ± 1.92 |
Ministral 2410 8B Mistral AI | 21.58 ± 3.26 | 18.73 ± 0.70 | 3.50 ± 0.59 | 33.96 ± 1.29 |
Command R7B 12-2024 7B CohereLabs | 21.16 ± 3.91 | 24.66 ± 1.59 | 8.69 ± 0.97 | 40.63 ± 3.27 |
Mistral Small 3.1 2503 24B Mistral AI | 19.50 ± 2.79 | 11.21 ± 2.81 | 0.74 ± 0.14 | 21.68 ± 5.85 |
Olmo 2 1124 13B AI2 | 18.91 ± 1.84 | 17.40 ± 0.93 | 0.00 ± 0.00 | 34.80 ± 1.87 |
SeaLLMs V3 7B Alibaba-DAMO | 17.68 ± 1.76 | 20.15 ± 0.82 | 10.59 ± 1.56 | 29.71 ± 1.39 |
Olmo 2 1124 7B AI2 | 15.06 ± 3.63 | 15.08 ± 0.29 | 0.00 ± 0.00 | 30.15 ± 0.58 |
Llama 3 8B Meta | 14.98 ± 1.96 | 27.03 ± 1.54 | 0.00 ± 0.00 | 54.07 ± 3.08 |
Model | TA | NLR | Causal Reasoning | Natural Language Inference |
---|---|---|---|---|
SEA-LION v4 27B AISG | 68.47 ± 0.30 | 81.89 ± 0.21 | 93.95 ± 2.03 | 69.84 ± 2.70 |
Gemma 3 27B | 68.45 ± 0.47 | 82.08 ± 0.18 | 94.15 ± 2.01 | 70.00 ± 2.77 |
Gemma 3 12B | 65.83 ± 0.63 | 75.92 ± 0.51 | 92.13 ± 2.25 | 59.71 ± 2.92 |
Llama 4 Scout 109B MoE Meta | 64.22 ± 0.28 | 76.96 ± 0.13 | 92.55 ± 2.30 | 61.38 ± 2.99 |
Qwen 3 32B Alibaba | 64.10 ± 0.39 | 76.97 ± 0.75 | 89.85 ± 2.58 | 64.09 ± 2.81 |
SEA-LION v3 (Llama) 70B AISG | 63.77 ± 0.69 | 77.78 ± 2.39 | 92.38 ± 1.71 | 63.19 ± 2.18 |
Qwen 3 30B MoE Alibaba | 61.89 ± 0.30 | 77.13 ± 0.10 | 91.33 ± 2.42 | 62.94 ± 2.86 |
Llama 3.3 70B Meta | 60.90 ± 0.25 | 77.86 ± 0.20 | 92.17 ± 2.24 | 63.55 ± 2.78 |
SEA-LION v3 (Gemma 2) 9B AISG | 60.04 ± 0.65 | 70.23 ± 0.48 | 89.85 ± 2.32 | 50.61 ± 3.06 |
Command A 03-2025 111B CohereLabs | 59.92 ± 1.67 | 67.47 ± 6.38 | 90.77 ± 2.27 | 44.16 ± 2.44 |
Qwen 3 14B Alibaba | 59.14 ± 0.29 | 66.84 ± 0.44 | 80.77 ± 3.37 | 52.90 ± 2.98 |
Gemma 2 27B | 59.04 ± 0.68 | 76.23 ± 0.40 | 87.15 ± 2.77 | 65.31 ± 2.87 |
Mistral Large 2411 123B Mistral AI | 57.64 ± 1.61 | 66.76 ± 5.95 | 81.20 ± 3.06 | 52.31 ± 2.35 |
Tulu 3 70B AI2 | 57.35 ± 0.43 | 76.76 ± 1.32 | 89.10 ± 2.27 | 64.41 ± 2.37 |
Gemma 2 9B | 54.87 ± 0.74 | 64.39 ± 1.49 | 84.75 ± 2.43 | 44.02 ± 3.06 |
Qwen 2.5 32B Alibaba | 53.81 ± 0.25 | 63.98 ± 0.14 | 74.65 ± 3.78 | 53.31 ± 3.05 |
Llama 3.1 70B Meta | 53.13 ± 0.93 | 74.65 ± 1.18 | 90.38 ± 2.01 | 58.93 ± 2.03 |
SEA-LION v3 (Llama) 8B AISG | 52.98 ± 1.40 | 59.68 ± 3.74 | 69.38 ± 2.25 | 49.99 ± 1.90 |
Qwen 2.5 72B Alibaba | 52.63 ± 0.34 | 59.01 ± 0.84 | 72.45 ± 3.79 | 45.57 ± 2.97 |
MERaLiON 2 10B A*STAR | 52.54 ± 0.92 | 64.46 ± 0.98 | 84.42 ± 2.30 | 44.49 ± 3.03 |
Aya Expanse 32B CohereLabs | 50.67 ± 0.39 | 62.12 ± 0.65 | 70.90 ± 3.79 | 53.34 ± 2.92 |
ERNIE 4.5 21B MoE Baidu | 50.54 ± 1.30 | 66.94 ± 1.57 | 80.25 ± 2.88 | 53.63 ± 2.98 |
Command R 08-2024 32B CohereLabs | 47.02 ± 1.01 | 62.34 ± 2.76 | 75.20 ± 2.61 | 49.49 ± 1.55 |
Qwen 3 8B Alibaba | 45.75 ± 0.73 | 54.26 ± 0.25 | 73.63 ± 3.74 | 34.90 ± 2.95 |
Command R+ 08-2024 104B CohereLabs | 43.75 ± 3.06 | 51.18 ± 7.87 | 53.23 ± 1.81 | 49.14 ± 2.18 |
Qwen 2.5 14B Alibaba | 43.64 ± 0.32 | 54.13 ± 0.15 | 68.63 ± 3.98 | 39.64 ± 3.00 |
Sailor2 20B SAIL | 42.22 ± 0.54 | 75.98 ± 0.36 | 91.45 ± 2.41 | 60.51 ± 2.94 |
Llama 3 70B Meta | 40.11 ± 0.45 | 70.28 ± 0.74 | 80.40 ± 3.30 | 60.16 ± 2.77 |
Babel 83B Alibaba-DAMO | 38.57 ± 3.77 | 55.42 ± 9.79 | 71.80 ± 2.85 | 39.05 ± 1.17 |
Tulu 3 8B AI2 | 35.87 ± 0.78 | 56.52 ± 2.33 | 68.97 ± 3.37 | 44.07 ± 2.14 |
Olmo 2 0325 32B AI2 | 35.76 ± 1.18 | 51.77 ± 1.36 | 68.15 ± 3.12 | 35.40 ± 1.59 |
Qwen 2.5 7B Alibaba | 34.29 ± 0.76 | 28.03 ± 1.46 | 8.90 ± 1.71 | 47.16 ± 2.93 |
phi-4 14B Microsoft | 33.31 ± 4.06 | 35.71 ± 8.21 | 64.75 ± 2.66 | 6.66 ± 0.46 |
Llama 3.1 8B Meta | 30.63 ± 2.24 | 19.20 ± 1.80 | 0.85 ± 0.28 | 37.55 ± 1.63 |
Aya Expanse 8B CohereLabs | 30.14 ± 0.92 | 50.96 ± 1.72 | 62.68 ± 4.05 | 39.25 ± 2.53 |
Babel 9B Alibaba-DAMO | 29.64 ± 4.45 | 37.54 ± 11.20 | 51.45 ± 2.22 | 23.63 ± 1.10 |
Sailor2 8B SAIL | 27.63 ± 2.40 | 54.34 ± 10.45 | 45.98 ± 2.43 | 62.70 ± 2.56 |
Ministral 2410 8B Mistral AI | 21.58 ± 3.26 | 32.18 ± 8.09 | 29.70 ± 2.50 | 34.66 ± 1.52 |
Command R7B 12-2024 7B CohereLabs | 21.16 ± 3.91 | 26.39 ± 10.08 | 24.13 ± 1.21 | 28.65 ± 2.53 |
Mistral Small 3.1 2503 24B Mistral AI | 19.50 ± 2.79 | 17.91 ± 5.82 | 26.55 ± 1.60 | 9.26 ± 0.82 |
Olmo 2 1124 13B AI2 | 18.91 ± 1.84 | 38.64 ± 1.93 | 44.20 ± 3.01 | 33.07 ± 2.49 |
SeaLLMs V3 7B Alibaba-DAMO | 17.68 ± 1.76 | 10.92 ± 5.03 | 17.55 ± 1.25 | 4.29 ± 0.47 |
Olmo 2 1124 7B AI2 | 15.06 ± 3.63 | 19.14 ± 7.09 | 23.43 ± 1.55 | 14.86 ± 0.81 |
Llama 3 8B Meta | 14.98 ± 1.96 | 19.45 ± 1.19 | 0.00 ± 0.00 | 38.90 ± 1.84 |
Model | TA | NLU | Question Answering | Sentiment Analysis |
---|---|---|---|---|
SEA-LION v4 27B AISG | 68.47 ± 0.30 | 82.39 ± 0.50 | 66.25 ± 7.38 | 98.54 ± 0.68 |
Gemma 3 27B | 68.45 ± 0.47 | 81.77 ± 0.24 | 64.86 ± 7.56 | 98.67 ± 0.68 |
Gemma 3 12B | 65.83 ± 0.63 | 83.60 ± 0.64 | 69.23 ± 6.99 | 97.97 ± 0.80 |
Llama 4 Scout 109B MoE Meta | 64.22 ± 0.28 | 83.63 ± 0.24 | 68.64 ± 7.87 | 98.61 ± 0.72 |
Qwen 3 32B Alibaba | 64.10 ± 0.39 | 84.60 ± 0.37 | 72.21 ± 7.27 | 96.99 ± 0.96 |
SEA-LION v3 (Llama) 70B AISG | 63.77 ± 0.69 | 83.69 ± 0.86 | 69.16 ± 7.25 | 98.22 ± 0.71 |
Qwen 3 30B MoE Alibaba | 61.89 ± 0.30 | 83.04 ± 0.31 | 68.94 ± 7.79 | 97.14 ± 0.98 |
Llama 3.3 70B Meta | 60.90 ± 0.25 | 83.22 ± 0.50 | 68.57 ± 7.21 | 97.88 ± 0.88 |
SEA-LION v3 (Gemma 2) 9B AISG | 60.04 ± 0.65 | 83.10 ± 0.65 | 68.74 ± 6.81 | 97.46 ± 0.93 |
Command A 03-2025 111B CohereLabs | 59.92 ± 1.67 | 76.01 ± 1.12 | 57.65 ± 6.93 | 94.38 ± 1.18 |
Qwen 3 14B Alibaba | 59.14 ± 0.29 | 86.23 ± 0.28 | 76.25 ± 6.90 | 96.20 ± 1.14 |
Gemma 2 27B | 59.04 ± 0.68 | 82.71 ± 0.86 | 67.21 ± 6.90 | 98.21 ± 0.76 |
Mistral Large 2411 123B Mistral AI | 57.64 ± 1.61 | 83.25 ± 1.43 | 70.19 ± 6.32 | 96.30 ± 0.93 |
Tulu 3 70B AI2 | 57.35 ± 0.43 | 80.86 ± 0.77 | 64.56 ± 7.11 | 97.16 ± 0.91 |
Gemma 2 9B | 54.87 ± 0.74 | 83.33 ± 0.89 | 69.03 ± 6.97 | 97.64 ± 0.88 |
Qwen 2.5 32B Alibaba | 53.81 ± 0.25 | 77.31 ± 0.46 | 62.30 ± 7.61 | 92.33 ± 1.52 |
Llama 3.1 70B Meta | 53.13 ± 0.93 | 83.14 ± 0.67 | 69.21 ± 6.97 | 97.08 ± 0.91 |
SEA-LION v3 (Llama) 8B AISG | 52.98 ± 1.40 | 77.02 ± 1.09 | 57.00 ± 7.13 | 97.05 ± 0.87 |
Qwen 2.5 72B Alibaba | 52.63 ± 0.34 | 75.29 ± 0.28 | 57.65 ± 7.23 | 92.92 ± 1.53 |
MERaLiON 2 10B A*STAR | 52.54 ± 0.92 | 81.49 ± 0.65 | 65.85 ± 6.75 | 97.13 ± 1.00 |
Aya Expanse 32B CohereLabs | 50.67 ± 0.39 | 75.51 ± 0.52 | 53.92 ± 7.61 | 97.11 ± 0.93 |
ERNIE 4.5 21B MoE Baidu | 50.54 ± 1.30 | 76.59 ± 0.55 | 57.21 ± 7.61 | 95.97 ± 1.19 |
Command R 08-2024 32B CohereLabs | 47.02 ± 1.01 | 79.69 ± 0.82 | 62.49 ± 7.23 | 96.89 ± 0.76 |
Qwen 3 8B Alibaba | 45.75 ± 0.73 | 67.04 ± 4.31 | 66.35 ± 7.94 | 67.74 ± 2.42 |
Command R+ 08-2024 104B CohereLabs | 43.75 ± 3.06 | 71.28 ± 11.38 | 59.54 ± 7.05 | 83.03 ± 1.03 |
Qwen 2.5 14B Alibaba | 43.64 ± 0.32 | 62.88 ± 0.18 | 38.00 ± 6.72 | 87.76 ± 1.88 |
Sailor2 20B SAIL | 42.22 ± 0.54 | 46.75 ± 0.73 | 0.00 ± 0.00 | 93.50 ± 1.39 |
Llama 3 70B Meta | 40.11 ± 0.45 | 47.96 ± 0.07 | 0.00 ± 0.00 | 95.93 ± 1.20 |
Babel 83B Alibaba-DAMO | 38.57 ± 3.77 | 55.08 ± 5.28 | 24.81 ± 4.10 | 85.34 ± 1.15 |
Tulu 3 8B AI2 | 35.87 ± 0.78 | 25.61 ± 2.77 | 47.11 ± 5.93 | 4.11 ± 0.59 |
Olmo 2 0325 32B AI2 | 35.76 ± 1.18 | 27.06 ± 6.80 | 0.00 ± 0.00 | 54.13 ± 1.88 |
Qwen 2.5 7B Alibaba | 34.29 ± 0.76 | 62.92 ± 1.41 | 49.84 ± 8.16 | 76.00 ± 2.44 |
phi-4 14B Microsoft | 33.31 ± 4.06 | 60.84 ± 6.57 | 42.63 ± 5.42 | 79.05 ± 1.60 |
Llama 3.1 8B Meta | 30.63 ± 2.24 | 39.31 ± 11.35 | 52.79 ± 6.62 | 25.82 ± 1.26 |
Aya Expanse 8B CohereLabs | 30.14 ± 0.92 | 18.24 ± 2.21 | 36.37 ± 6.21 | 0.11 ± 0.10 |
Babel 9B Alibaba-DAMO | 29.64 ± 4.45 | 52.57 ± 12.32 | 58.86 ± 7.10 | 46.29 ± 0.99 |
Sailor2 8B SAIL | 27.63 ± 2.40 | 47.48 ± 0.40 | 0.00 ± 0.00 | 94.96 ± 1.15 |
Ministral 2410 8B Mistral AI | 21.58 ± 3.26 | 39.97 ± 9.27 | 47.24 ± 6.21 | 32.71 ± 1.13 |
Command R7B 12-2024 7B CohereLabs | 21.16 ± 3.91 | 15.38 ± 2.22 | 30.72 ± 5.20 | 0.04 ± 0.04 |
Mistral Small 3.1 2503 24B Mistral AI | 19.50 ± 2.79 | 19.03 ± 4.62 | 34.91 ± 5.13 | 3.15 ± 0.35 |
Olmo 2 1124 13B AI2 | 18.91 ± 1.84 | 0.24 ± 0.29 | 0.00 ± 0.00 | 0.47 ± 0.15 |
SeaLLMs V3 7B Alibaba-DAMO | 17.68 ± 1.76 | 18.56 ± 3.50 | 32.85 ± 5.12 | 4.28 ± 0.37 |
Olmo 2 1124 7B AI2 | 15.06 ± 3.63 | 3.26 ± 4.22 | 0.00 ± 0.00 | 6.53 ± 0.51 |
Llama 3 8B Meta | 14.98 ± 1.96 | 13.56 ± 8.33 | 0.00 ± 0.00 | 27.13 ± 1.45 |
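Putting the task tables together, each competency score appears to be the unweighted mean of its task scores (e.g. NLG averages Summarization and Translations). A quick consistency check against the SEA-LION v4 27B rows above, with the competency-to-task pairings read off the tables:

```python
# Assumption: competency score = unweighted mean of its task scores.
# Task values below are the SEA-LION v4 27B rows from the task tables above.
tasks = {
    "Linguistic Diagnostics": [92.90, 76.55],  # Syntax, Pragmatics
    "NLG": [11.26, 89.37],  # Summarization, Translations
    "NLR": [93.95, 69.84],  # Causal Reasoning, Natural Language Inference
    "NLU": [66.25, 98.54],  # Question Answering, Sentiment Analysis
}
# Competency scores as reported in the Tamil Competencies table.
expected = {
    "Linguistic Diagnostics": 84.73,
    "NLG": 50.32,
    "NLR": 81.89,
    "NLU": 82.39,
}
for name, scores in tasks.items():
    competency = sum(scores) / len(scores)
    # Each mean should land within rounding distance of the reported value.
    print(f"{name}: {competency:.2f} (reported {expected[name]})")
```

This also explains why a near-zero Summarization score (several models report 0.00 there) caps NLG at roughly half the Translations score.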