Thai Performance
Thai Scores by Model
Average of 8 runs; 95% confidence intervals (CI) are shown.
Model Size: ≤200B
Open instruct models only
Chart: overall Thai (TH) score by model, with 95% CI. The per-model values are listed in the TH column of the Thai Competencies table.
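The caption for this section reports each score as an average of 8 runs with a 95% CI, but it does not say how the interval is computed. As a rough sketch only, the snippet below assumes a Student's t interval over the 8 run-level scores; the leaderboard may use a different method (for example bootstrapping), and the run scores in the example are hypothetical, not taken from the page.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 7 degrees of freedom (n = 8 runs).
T_CRIT_95_DF7 = 2.365


def mean_with_ci(run_scores):
    """Return (mean, half_width) for exactly 8 run scores, assuming a t-interval."""
    n = len(run_scores)
    if n != 8:
        raise ValueError("T_CRIT_95_DF7 is only valid for 8 runs")
    mean = statistics.fmean(run_scores)
    # Standard error from the sample standard deviation (n - 1 in the denominator).
    std_err = statistics.stdev(run_scores) / math.sqrt(n)
    return mean, T_CRIT_95_DF7 * std_err


# Hypothetical per-run scores, for illustration only.
runs = [65.1, 65.6, 65.3, 65.9, 64.8, 65.4, 65.2, 65.5]
mean, half_width = mean_with_ci(runs)
print(f"{mean:.2f} ± {half_width:.2f}")
```

Under this assumption, a score shown as `mean ± half_width` covers the range from `mean - half_width` to `mean + half_width`.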
Thai Competencies
Model | TH | Instruction Following | Multi-Turn Chat | NLG | NLR | NLU | Safety | Knowledge |
---|---|---|---|---|---|---|---|---|
![]() ![]() Qwen 3 32B Alibaba | 65.36 ± 0.33 | 75.36 ± 1.71 | 43.41 ± 1.73 | 57.10 ± 0.07 | 79.79 ± 0.46 | 72.80 ± 0.19 | 75.83 ± 0.42 | 53.25 ± 0.58 |
![]() ![]() Qwen 3 30B MoE Alibaba | 64.57 ± 0.21 | 77.86 ± 1.31 | 51.44 ± 1.43 | 55.82 ± 0.17 | 81.27 ± 0.08 | 65.19 ± 0.30 | 72.38 ± 0.16 | 48.03 ± 0.29 |
![]() ![]() SEA-LION v4 27B AISG | 63.18 ± 0.16 | 75.36 ± 1.43 | 41.35 ± 1.18 | 58.32 ± 0.14 | 79.55 ± 0.49 | 65.96 ± 0.77 | 69.07 ± 0.26 | 52.66 ± 1.18 |
![]() ![]() Qwen 3 14B Alibaba | 63.01 ± 0.32 | 80.71 ± 1.68 | 35.54 ± 1.73 | 54.75 ± 0.13 | 73.42 ± 0.29 | 71.83 ± 0.23 | 76.46 ± 0.08 | 48.33 ± 0.50 |
![]() ![]() Qwen 2.5 72B Alibaba | 62.91 ± 0.44 | 78.33 ± 1.72 | 30.08 ± 1.55 | 57.71 ± 0.08 | 79.03 ± 0.76 | 71.17 ± 0.18 | 70.57 ± 0.63 | 53.44 ± 1.03 |
![]() ![]() Gemma 3 27B | 62.79 ± 0.32 | 72.26 ± 1.95 | 40.73 ± 1.46 | 58.51 ± 0.08 | 79.41 ± 0.26 | 66.91 ± 0.53 | 69.19 ± 0.19 | 52.56 ± 0.81 |
![]() ![]() SEA-LION v3 (Llama) 70B AISG | 62.67 ± 0.31 | 82.38 ± 1.58 | 20.91 ± 1.14 | 58.55 ± 0.14 | 82.14 ± 0.68 | 63.83 ± 0.71 | 74.05 ± 1.17 | 56.79 ± 0.85 |
![]() ![]() Tulu 3 70B AI2 | 61.09 ± 0.40 | 76.90 ± 1.89 | 21.22 ± 1.12 | 56.20 ± 0.23 | 83.38 ± 0.31 | 64.00 ± 1.39 | 76.54 ± 1.25 | 49.41 ± 1.36 |
![]() ![]() Gemma 3 12B | 60.27 ± 0.22 | 70.48 ± 1.27 | 36.20 ± 1.13 | 57.92 ± 0.12 | 78.54 ± 0.29 | 66.51 ± 0.41 | 68.28 ± 0.38 | 44.00 ± 0.74 |
![]() ![]() Qwen 2.5 32B Alibaba | 60.10 ± 0.27 | 76.67 ± 1.17 | 13.84 ± 0.77 | 55.76 ± 0.10 | 78.29 ± 0.24 | 70.38 ± 0.27 | 70.09 ± 0.16 | 55.71 ± 0.56 |
![]() ![]() Llama 3.3 70B Meta | 59.73 ± 0.23 | 79.64 ± 1.05 | 8.41 ± 0.81 | 56.16 ± 0.09 | 82.45 ± 0.08 | 64.19 ± 0.38 | 69.70 ± 0.16 | 57.58 ± 0.99 |
![]() ![]() Qwen 3 8B Alibaba | 58.57 ± 0.25 | 73.69 ± 1.11 | 28.88 ± 1.40 | 55.95 ± 0.06 | 68.77 ± 1.25 | 65.80 ± 0.28 | 74.54 ± 0.41 | 42.32 ± 1.45 |
![]() ![]() Llama 4 Scout 109B MoE Meta | 58.52 ± 0.06 | 82.86 ± 0.50 | 11.37 ± 0.63 | 58.47 ± 0.07 | 75.59 ± 0.06 | 67.47 ± 0.21 | 67.04 ± 0.05 | 46.85 ± 0.29 |
![]() ![]() Gemma 2 27B | 57.80 ± 0.36 | 68.21 ± 2.69 | 11.95 ± 1.36 | 55.26 ± 0.26 | 78.99 ± 1.59 | 66.09 ± 0.33 | 74.76 ± 0.58 | 49.31 ± 1.13 |
![]() ![]() Llama 3.1 70B Meta | 57.28 ± 0.33 | 68.45 ± 1.56 | 8.07 ± 1.00 | 57.19 ± 0.22 | 80.58 ± 1.05 | 60.72 ± 0.85 | 69.28 ± 0.64 | 56.69 ± 1.01 |
![]() ![]() SEA-LION v3 (Gemma 2) 9B AISG | 57.24 ± 0.65 | 68.93 ± 2.39 | 19.13 ± 0.73 | 55.72 ± 0.48 | 75.81 ± 1.06 | 62.90 ± 0.90 | 74.01 ± 0.51 | 44.19 ± 0.94 |
![]() ![]() Qwen 2.5 14B Alibaba | 56.23 ± 0.39 | 65.71 ± 1.58 | 12.71 ± 0.59 | 55.30 ± 0.20 | 71.94 ± 0.44 | 66.16 ± 0.39 | 71.81 ± 0.20 | 50.00 ± 0.65 |
![]() ![]() Mistral Large 2411 123B Mistral AI | 55.62 ± 2.15 | 68.33 ± 2.36 | 12.74 ± 0.56 | 55.06 ± 0.55 | 72.60 ± 3.71 | 70.45 ± 2.08 | 68.54 ± 8.13 | 41.63 ± 2.94 |
![]() ![]() Gemma 2 9B | 53.68 ± 0.51 | 64.88 ± 1.74 | 6.11 ± 0.59 | 54.02 ± 0.93 | 74.03 ± 0.91 | 62.13 ± 0.81 | 72.06 ± 0.43 | 42.52 ± 0.71 |
![]() ![]() SEA-LION v3 (Llama) 8B AISG | 53.05 ± 0.53 | 66.90 ± 1.79 | 17.58 ± 1.62 | 56.39 ± 0.15 | 72.42 ± 1.64 | 58.76 ± 0.99 | 66.98 ± 0.54 | 32.28 ± 2.06 |
![]() ![]() ERNIE 4.5 21B MoE Baidu | 52.11 ± 0.37 | 60.48 ± 1.76 | 13.53 ± 0.44 | 51.30 ± 0.14 | 70.14 ± 1.21 | 63.27 ± 0.40 | 70.13 ± 0.24 | 35.93 ± 1.30 |
![]() ![]() MERaLiON 2 10B A*STAR | 51.78 ± 0.81 | 61.19 ± 2.63 | 4.22 ± 0.38 | 51.96 ± 1.85 | 70.75 ± 1.24 | 58.49 ± 0.67 | 71.88 ± 0.43 | 44.00 ± 1.99 |
![]() ![]() Qwen 2.5 7B Alibaba | 49.64 ± 2.70 | 62.98 ± 0.96 | 11.47 ± 0.96 | 53.22 ± 0.09 | 70.71 ± 0.40 | 66.82 ± 1.35 | 43.29 ± 17.62 | 38.98 ± 0.97 |
![]() ![]() Llama 3 70B Meta | 48.80 ± 0.13 | 17.74 ± 0.78 | 6.15 ± 0.63 | 56.67 ± 0.15 | 76.09 ± 0.30 | 63.18 ± 0.38 | 72.56 ± 0.08 | 49.21 ± 0.65 |
![]() ![]() Tulu 3 8B AI2 | 48.34 ± 0.63 | 66.31 ± 1.93 | 5.87 ± 0.27 | 53.32 ± 0.27 | 64.72 ± 2.58 | 57.25 ± 0.49 | 65.25 ± 1.39 | 25.69 ± 1.82 |
![]() ![]() Sailor2 20B SAIL | 47.52 ± 0.48 | 37.26 ± 1.56 | 19.99 ± 1.28 | 19.64 ± 1.99 | 83.32 ± 0.58 | 50.77 ± 1.12 | 69.57 ± 0.18 | 52.07 ± 0.46 |
![]() ![]() Sailor2 8B SAIL | 47.38 ± 0.34 | 32.98 ± 1.54 | 20.74 ± 0.87 | 41.05 ± 1.04 | 77.12 ± 0.29 | 44.05 ± 1.53 | 69.95 ± 0.70 | 45.77 ± 1.15 |
![]() ![]() Llama 3.1 8B Meta | 46.93 ± 0.57 | 60.83 ± 1.74 | 5.49 ± 0.43 | 45.13 ± 0.73 | 63.21 ± 2.52 | 60.30 ± 0.60 | 59.16 ± 1.26 | 34.35 ± 2.16 |
![]() ![]() Command A 03-2025 111B CohereLabs | 44.92 ± 4.59 | 67.98 ± 1.76 | 9.10 ± 0.65 | 48.88 ± 0.27 | 49.51 ± 7.76 | 57.79 ± 1.89 | 47.90 ± 19.31 | 33.27 ± 8.26 |
![]() ![]() Olmo 2 0325 32B AI2 | 44.09 ± 0.97 | 59.40 ± 2.20 | 1.27 ± 0.35 | 29.77 ± 0.86 | 64.34 ± 2.52 | 52.39 ± 6.39 | 70.25 ± 0.60 | 31.20 ± 3.06 |
![]() ![]() Command R+ 08-2024 104B CohereLabs | 44.05 ± 0.96 | 50.12 ± 3.07 | 1.85 ± 0.68 | 45.16 ± 0.47 | 59.29 ± 1.27 | 57.07 ± 3.27 | 69.76 ± 3.16 | 25.10 ± 2.74 |
![]() ![]() Mistral Small 3.1 2503 24B Mistral AI | 43.94 ± 0.73 | 49.17 ± 2.54 | 1.51 ± 0.64 | 42.17 ± 0.80 | 56.96 ± 3.21 | 52.94 ± 1.36 | 69.36 ± 0.95 | 35.43 ± 1.80 |
![]() ![]() Babel 9B Alibaba-DAMO | 42.78 ± 0.98 | 42.86 ± 3.60 | 2.75 ± 0.39 | 34.71 ± 1.12 | 64.45 ± 3.92 | 60.10 ± 1.41 | 67.70 ± 4.94 | 26.87 ± 3.90 |
![]() ![]() SeaLLMs V3 7B Alibaba-DAMO | 41.00 ± 2.35 | 41.43 ± 2.31 | 7.21 ± 0.92 | 49.59 ± 0.99 | 55.58 ± 12.71 | 55.83 ± 2.56 | 56.52 ± 8.20 | 20.87 ± 7.31 |
![]() ![]() Aya Expanse 32B CohereLabs | 40.66 ± 0.34 | 47.98 ± 2.23 | 2.58 ± 0.52 | 30.88 ± 0.71 | 62.59 ± 0.59 | 50.68 ± 1.31 | 68.67 ± 0.16 | 21.26 ± 0.92 |
![]() ![]() Command R 08-2024 32B CohereLabs | 40.20 ± 0.79 | 38.81 ± 2.20 | 1.68 ± 0.19 | 36.17 ± 1.80 | 57.57 ± 2.84 | 51.22 ± 3.90 | 69.66 ± 0.30 | 26.28 ± 3.32 |
![]() ![]() phi-4 14B Microsoft | 39.41 ± 2.54 | 48.45 ± 2.43 | 9.89 ± 1.23 | 40.26 ± 0.62 | 44.52 ± 11.18 | 58.16 ± 3.33 | 66.25 ± 5.04 | 8.37 ± 4.44 |
![]() ![]() Llama 3 8B Meta | 39.28 ± 0.54 | 15.83 ± 1.05 | 2.44 ± 0.45 | 42.84 ± 0.14 | 56.44 ± 1.73 | 57.84 ± 0.41 | 65.69 ± 0.55 | 33.86 ± 2.02 |
![]() ![]() Babel 83B Alibaba-DAMO | 38.88 ± 4.47 | 33.69 ± 3.58 | 3.81 ± 0.66 | 28.48 ± 4.28 | 58.06 ± 11.95 | 52.88 ± 5.83 | 53.57 ± 16.73 | 41.63 ± 5.34 |
![]() ![]() Aya Expanse 8B CohereLabs | 33.07 ± 0.56 | 32.62 ± 1.72 | 1.13 ± 0.40 | 20.86 ± 0.50 | 48.74 ± 1.93 | 44.64 ± 2.10 | 58.30 ± 0.93 | 25.20 ± 0.92 |
![]() ![]() Ministral 2410 8B Mistral AI | 32.06 ± 1.64 | 22.26 ± 1.99 | 1.30 ± 0.24 | 28.77 ± 1.41 | 42.32 ± 6.05 | 46.20 ± 5.13 | 63.10 ± 6.21 | 20.47 ± 4.06 |
![]() ![]() Command R7B 12-2024 7B CohereLabs | 30.14 ± 2.73 | 40.36 ± 0.78 | 0.58 ± 0.24 | 26.28 ± 0.98 | 43.69 ± 1.64 | 33.25 ± 2.54 | 47.45 ± 13.68 | 19.39 ± 2.94 |
![]() ![]() Olmo 2 1124 7B AI2 | 26.04 ± 1.30 | 37.26 ± 2.35 | 0.17 ± 0.14 | 16.54 ± 0.60 | 25.49 ± 5.31 | 36.11 ± 3.85 | 60.89 ± 0.94 | 5.81 ± 3.69 |
![]() ![]() Olmo 2 1124 13B AI2 | 25.67 ± 1.26 | 37.26 ± 1.59 | 0.21 ± 0.20 | 22.11 ± 0.70 | 45.09 ± 2.44 | 45.14 ± 4.41 | 18.06 ± 6.03 | 11.81 ± 3.87 |
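The page does not state how the TH column is derived, but the rows above are numerically consistent with an unweighted mean of the seven competency columns. The sketch below checks that reading on three rows copied from the table; the helper names and the rounding tolerance are illustrative choices, not part of the leaderboard.

```python
# (model, reported TH, [Instruction Following, Multi-Turn Chat, NLG, NLR, NLU, Safety, Knowledge])
# Values copied from the table above.
ROWS = [
    ("Qwen 3 32B", 65.36, [75.36, 43.41, 57.10, 79.79, 72.80, 75.83, 53.25]),
    ("Qwen 3 30B MoE", 64.57, [77.86, 51.44, 55.82, 81.27, 65.19, 72.38, 48.03]),
    ("Olmo 2 1124 13B", 25.67, [37.26, 0.21, 22.11, 45.09, 45.14, 18.06, 11.81]),
]


def unweighted_mean(values):
    """Plain arithmetic mean of a list of scores."""
    return sum(values) / len(values)


def matches_reported(reported, values, tolerance=0.011):
    """True if the mean of `values` equals `reported` up to two-decimal rounding."""
    return abs(unweighted_mean(values) - reported) <= tolerance


for model, reported_th, competencies in ROWS:
    computed = unweighted_mean(competencies)
    print(f"{model}: computed {computed:.3f}, reported {reported_th:.2f}, "
          f"agrees={matches_reported(reported_th, competencies)}")
```

Because every competency carries the same weight under this reading, a very low Multi-Turn Chat score pulls TH down as much as a low Knowledge score would.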
Thai Tasks
Model | TH | Instruction Following | SEA-IFEval |
---|---|---|---|
![]() ![]() Qwen 3 32B Alibaba | 65.36 ± 0.33 | 75.36 ± 1.71 | 75.36 ± 7.50 |
![]() ![]() Qwen 3 30B MoE Alibaba | 64.57 ± 0.21 | 77.86 ± 1.31 | 77.86 ± 7.40 |
![]() ![]() SEA-LION v4 27B AISG | 63.18 ± 0.16 | 75.36 ± 1.43 | 75.36 ± 7.29 |
![]() ![]() Qwen 3 14B Alibaba | 63.01 ± 0.32 | 80.71 ± 1.68 | 80.71 ± 6.67 |
![]() ![]() Qwen 2.5 72B Alibaba | 62.91 ± 0.44 | 78.33 ± 1.72 | 78.33 ± 7.11 |
![]() ![]() Gemma 3 27B | 62.79 ± 0.32 | 72.26 ± 1.95 | 72.26 ± 7.97 |
![]() ![]() SEA-LION v3 (Llama) 70B AISG | 62.67 ± 0.31 | 82.38 ± 1.58 | 82.38 ± 6.45 |
![]() ![]() Tulu 3 70B AI2 | 61.09 ± 0.40 | 76.90 ± 1.89 | 76.90 ± 6.98 |
![]() ![]() Gemma 3 12B | 60.27 ± 0.22 | 70.48 ± 1.27 | 70.48 ± 7.97 |
![]() ![]() Qwen 2.5 32B Alibaba | 60.10 ± 0.27 | 76.67 ± 1.17 | 76.67 ± 7.35 |
![]() ![]() Llama 3.3 70B Meta | 59.73 ± 0.23 | 79.64 ± 1.05 | 79.64 ± 7.29 |
![]() ![]() Qwen 3 8B Alibaba | 58.57 ± 0.25 | 73.69 ± 1.11 | 73.69 ± 7.91 |
![]() ![]() Llama 4 Scout 109B MoE Meta | 58.52 ± 0.06 | 82.86 ± 0.50 | 82.86 ± 6.93 |
![]() ![]() Gemma 2 27B | 57.80 ± 0.36 | 68.21 ± 2.69 | 68.21 ± 7.15 |
![]() ![]() Llama 3.1 70B Meta | 57.28 ± 0.33 | 68.45 ± 1.56 | 68.45 ± 7.33 |
![]() ![]() SEA-LION v3 (Gemma 2) 9B AISG | 57.24 ± 0.65 | 68.93 ± 2.39 | 68.93 ± 7.36 |
![]() ![]() Qwen 2.5 14B Alibaba | 56.23 ± 0.39 | 65.71 ± 1.58 | 65.71 ± 7.80 |
![]() ![]() Mistral Large 2411 123B Mistral AI | 55.62 ± 2.15 | 68.33 ± 2.36 | 68.33 ± 6.89 |
![]() ![]() Gemma 2 9B | 53.68 ± 0.51 | 64.88 ± 1.74 | 64.88 ± 7.15 |
![]() ![]() SEA-LION v3 (Llama) 8B AISG | 53.05 ± 0.53 | 66.90 ± 1.79 | 66.90 ± 7.69 |
![]() ![]() ERNIE 4.5 21B MoE Baidu | 52.11 ± 0.37 | 60.48 ± 1.76 | 60.48 ± 8.03 |
![]() ![]() MERaLiON 2 10B A*STAR | 51.78 ± 0.81 | 61.19 ± 2.63 | 61.19 ± 6.86 |
![]() ![]() Qwen 2.5 7B Alibaba | 49.64 ± 2.70 | 62.98 ± 0.96 | 62.98 ± 7.75 |
![]() ![]() Llama 3 70B Meta | 48.80 ± 0.13 | 17.74 ± 0.78 | 17.74 ± 6.87 |
![]() ![]() Tulu 3 8B AI2 | 48.34 ± 0.63 | 66.31 ± 1.93 | 66.31 ± 7.79 |
![]() ![]() Sailor2 20B SAIL | 47.52 ± 0.48 | 37.26 ± 1.56 | 37.26 ± 7.85 |
![]() ![]() Sailor2 8B SAIL | 47.38 ± 0.34 | 32.98 ± 1.54 | 32.98 ± 8.06 |
![]() ![]() Llama 3.1 8B Meta | 46.93 ± 0.57 | 60.83 ± 1.74 | 60.83 ± 7.88 |
![]() ![]() Command A 03-2025 111B CohereLabs | 44.92 ± 4.59 | 67.98 ± 1.76 | 67.98 ± 6.73 |
![]() ![]() Olmo 2 0325 32B AI2 | 44.09 ± 0.97 | 59.40 ± 2.20 | 59.40 ± 7.48 |
![]() ![]() Command R+ 08-2024 104B CohereLabs | 44.05 ± 0.96 | 50.12 ± 3.07 | 50.12 ± 7.49 |
![]() ![]() Mistral Small 3.1 2503 24B Mistral AI | 43.94 ± 0.73 | 49.17 ± 2.54 | 49.17 ± 7.44 |
![]() ![]() Babel 9B Alibaba-DAMO | 42.78 ± 0.98 | 42.86 ± 3.60 | 42.86 ± 7.46 |
![]() ![]() SeaLLMs V3 7B Alibaba-DAMO | 41.00 ± 2.35 | 41.43 ± 2.31 | 41.43 ± 7.70 |
![]() ![]() Aya Expanse 32B CohereLabs | 40.66 ± 0.34 | 47.98 ± 2.23 | 47.98 ± 8.18 |
![]() ![]() Command R 08-2024 32B CohereLabs | 40.20 ± 0.79 | 38.81 ± 2.20 | 38.81 ± 6.93 |
![]() ![]() phi-4 14B Microsoft | 39.41 ± 2.54 | 48.45 ± 2.43 | 48.45 ± 7.26 |
![]() ![]() Llama 3 8B Meta | 39.28 ± 0.54 | 15.83 ± 1.05 | 15.83 ± 6.36 |
![]() ![]() Babel 83B Alibaba-DAMO | 38.88 ± 4.47 | 33.69 ± 3.58 | 33.69 ± 5.76 |
![]() ![]() Aya Expanse 8B CohereLabs | 33.07 ± 0.56 | 32.62 ± 1.72 | 32.62 ± 7.63 |
![]() ![]() Ministral 2410 8B Mistral AI | 32.06 ± 1.64 | 22.26 ± 1.99 | 22.26 ± 5.26 |
![]() ![]() Command R7B 12-2024 7B CohereLabs | 30.14 ± 2.73 | 40.36 ± 0.78 | 40.36 ± 7.49 |
![]() ![]() Olmo 2 1124 7B AI2 | 26.04 ± 1.30 | 37.26 ± 2.35 | 37.26 ± 6.90 |
![]() ![]() Olmo 2 1124 13B AI2 | 25.67 ± 1.26 | 37.26 ± 1.59 | 37.26 ± 7.20 |
Model | TH | Multi-Turn Chat | SEA-MT-Bench |
---|---|---|---|
![]() ![]() Qwen 3 32B Alibaba | 65.36 ± 0.33 | 43.41 ± 1.73 | 43.41 ± 5.07 |
![]() ![]() Qwen 3 30B MoE Alibaba | 64.57 ± 0.21 | 51.44 ± 1.43 | 51.44 ± 5.21 |
![]() ![]() SEA-LION v4 27B AISG | 63.18 ± 0.16 | 41.35 ± 1.18 | 41.35 ± 5.04 |
![]() ![]() Qwen 3 14B Alibaba | 63.01 ± 0.32 | 35.54 ± 1.73 | 35.54 ± 4.72 |
![]() ![]() Qwen 2.5 72B Alibaba | 62.91 ± 0.44 | 30.08 ± 1.55 | 30.08 ± 4.55 |
![]() ![]() Gemma 3 27B | 62.79 ± 0.32 | 40.73 ± 1.46 | 40.73 ± 5.15 |
![]() ![]() SEA-LION v3 (Llama) 70B AISG | 62.67 ± 0.31 | 20.91 ± 1.14 | 20.91 ± 3.76 |
![]() ![]() Tulu 3 70B AI2 | 61.09 ± 0.40 | 21.22 ± 1.12 | 21.22 ± 3.98 |
![]() ![]() Gemma 3 12B | 60.27 ± 0.22 | 36.20 ± 1.13 | 36.20 ± 4.93 |
![]() ![]() Qwen 2.5 32B Alibaba | 60.10 ± 0.27 | 13.84 ± 0.77 | 13.84 ± 3.14 |
![]() ![]() Llama 3.3 70B Meta | 59.73 ± 0.23 | 8.41 ± 0.81 | 8.41 ± 2.54 |
![]() ![]() Qwen 3 8B Alibaba | 58.57 ± 0.25 | 28.88 ± 1.40 | 28.88 ± 4.49 |
![]() ![]() Llama 4 Scout 109B MoE Meta | 58.52 ± 0.06 | 11.37 ± 0.63 | 11.37 ± 2.93 |
![]() ![]() Gemma 2 27B | 57.80 ± 0.36 | 11.95 ± 1.36 | 11.95 ± 2.86 |
![]() ![]() Llama 3.1 70B Meta | 57.28 ± 0.33 | 8.07 ± 1.00 | 8.07 ± 2.65 |
![]() ![]() SEA-LION v3 (Gemma 2) 9B AISG | 57.24 ± 0.65 | 19.13 ± 0.73 | 19.13 ± 3.63 |
![]() ![]() Qwen 2.5 14B Alibaba | 56.23 ± 0.39 | 12.71 ± 0.59 | 12.71 ± 3.02 |
![]() ![]() Mistral Large 2411 123B Mistral AI | 55.62 ± 2.15 | 12.74 ± 0.56 | 12.74 ± 3.18 |
![]() ![]() Gemma 2 9B | 53.68 ± 0.51 | 6.11 ± 0.59 | 6.11 ± 1.97 |
![]() ![]() SEA-LION v3 (Llama) 8B AISG | 53.05 ± 0.53 | 17.58 ± 1.62 | 17.58 ± 3.48 |
![]() ![]() ERNIE 4.5 21B MoE Baidu | 52.11 ± 0.37 | 13.53 ± 0.44 | 13.53 ± 3.35 |
![]() ![]() MERaLiON 2 10B A*STAR | 51.78 ± 0.81 | 4.22 ± 0.38 | 4.22 ± 1.75 |
![]() ![]() Qwen 2.5 7B Alibaba | 49.64 ± 2.70 | 11.47 ± 0.96 | 11.47 ± 2.93 |
![]() ![]() Llama 3 70B Meta | 48.80 ± 0.13 | 6.15 ± 0.63 | 6.15 ± 2.14 |
![]() ![]() Tulu 3 8B AI2 | 48.34 ± 0.63 | 5.87 ± 0.27 | 5.87 ± 1.99 |
![]() ![]() Sailor2 20B SAIL | 47.52 ± 0.48 | 19.99 ± 1.28 | 19.99 ± 3.69 |
![]() ![]() Sailor2 8B SAIL | 47.38 ± 0.34 | 20.74 ± 0.87 | 20.74 ± 3.91 |
![]() ![]() Llama 3.1 8B Meta | 46.93 ± 0.57 | 5.49 ± 0.43 | 5.49 ± 1.97 |
![]() ![]() Command A 03-2025 111B CohereLabs | 44.92 ± 4.59 | 9.10 ± 0.65 | 9.10 ± 2.78 |
![]() ![]() Olmo 2 0325 32B AI2 | 44.09 ± 0.97 | 1.27 ± 0.35 | 1.27 ± 0.86 |
![]() ![]() Command R+ 08-2024 104B CohereLabs | 44.05 ± 0.96 | 1.85 ± 0.68 | 1.85 ± 1.18 |
![]() ![]() Mistral Small 3.1 2503 24B Mistral AI | 43.94 ± 0.73 | 1.51 ± 0.64 | 1.51 ± 0.84 |
![]() ![]() Babel 9B Alibaba-DAMO | 42.78 ± 0.98 | 2.75 ± 0.39 | 2.75 ± 1.20 |
![]() ![]() SeaLLMs V3 7B Alibaba-DAMO | 41.00 ± 2.35 | 7.21 ± 0.92 | 7.21 ± 2.23 |
![]() ![]() Aya Expanse 32B CohereLabs | 40.66 ± 0.34 | 2.58 ± 0.52 | 2.58 ± 1.47 |
![]() ![]() Command R 08-2024 32B CohereLabs | 40.20 ± 0.79 | 1.68 ± 0.19 | 1.68 ± 1.02 |
![]() ![]() phi-4 14B Microsoft | 39.41 ± 2.54 | 9.89 ± 1.23 | 9.89 ± 2.88 |
![]() ![]() Llama 3 8B Meta | 39.28 ± 0.54 | 2.44 ± 0.45 | 2.44 ± 1.15 |
![]() ![]() Babel 83B Alibaba-DAMO | 38.88 ± 4.47 | 3.81 ± 0.66 | 3.81 ± 0.92 |
![]() ![]() Aya Expanse 8B CohereLabs | 33.07 ± 0.56 | 1.13 ± 0.40 | 1.13 ± 0.88 |
![]() ![]() Ministral 2410 8B Mistral AI | 32.06 ± 1.64 | 1.30 ± 0.24 | 1.30 ± 1.01 |
![]() ![]() Command R7B 12-2024 7B CohereLabs | 30.14 ± 2.73 | 0.58 ± 0.24 | 0.58 ± 0.37 |
![]() ![]() Olmo 2 1124 7B AI2 | 26.04 ± 1.30 | 0.17 ± 0.14 | 0.17 ± 0.22 |
![]() ![]() Olmo 2 1124 13B AI2 | 25.67 ± 1.26 | 0.21 ± 0.20 | 0.21 ± 0.21 |
Model | TH | NLG | Summarization | Translations |
---|---|---|---|---|
![]() ![]() Qwen 3 32B Alibaba | 65.36 ± 0.33 | 57.10 ± 0.07 | 23.71 ± 1.48 | 90.49 ± 0.05 |
![]() ![]() Qwen 3 30B MoE Alibaba | 64.57 ± 0.21 | 55.82 ± 0.17 | 21.41 ± 1.37 | 90.22 ± 0.11 |
![]() ![]() SEA-LION v4 27B AISG | 63.18 ± 0.16 | 58.32 ± 0.14 | 23.71 ± 1.45 | 92.93 ± 0.05 |
![]() ![]() Qwen 3 14B Alibaba | 63.01 ± 0.32 | 54.75 ± 0.13 | 23.01 ± 1.35 | 86.48 ± 0.07 |
![]() ![]() Qwen 2.5 72B Alibaba | 62.91 ± 0.44 | 57.71 ± 0.08 | 24.12 ± 1.49 | 91.30 ± 0.01 |
![]() ![]() Gemma 3 27B | 62.79 ± 0.32 | 58.51 ± 0.08 | 24.00 ± 1.50 | 93.03 ± 0.04 |
![]() ![]() SEA-LION v3 (Llama) 70B AISG | 62.67 ± 0.31 | 58.55 ± 0.14 | 25.36 ± 1.61 | 91.74 ± 0.10 |
![]() ![]() Tulu 3 70B AI2 | 61.09 ± 0.40 | 56.20 ± 0.23 | 21.63 ± 1.23 | 90.76 ± 0.07 |
![]() ![]() Gemma 3 12B | 60.27 ± 0.22 | 57.92 ± 0.12 | 23.60 ± 1.56 | 92.24 ± 0.04 |
![]() ![]() Qwen 2.5 32B Alibaba | 60.10 ± 0.27 | 55.76 ± 0.10 | 23.57 ± 1.48 | 87.94 ± 0.13 |
![]() ![]() Llama 3.3 70B Meta | 59.73 ± 0.23 | 56.16 ± 0.09 | 23.03 ± 1.39 | 89.30 ± 0.06 |
![]() ![]() Qwen 3 8B Alibaba | 58.57 ± 0.25 | 55.95 ± 0.06 | 23.02 ± 1.46 | 88.89 ± 0.07 |
![]() ![]() Llama 4 Scout 109B MoE Meta | 58.52 ± 0.06 | 58.47 ± 0.07 | 25.84 ± 1.75 | 91.11 ± 0.04 |
![]() ![]() Gemma 2 27B | 57.80 ± 0.36 | 55.26 ± 0.26 | 25.13 ± 1.59 | 85.40 ± 0.37 |
![]() ![]() Llama 3.1 70B Meta | 57.28 ± 0.33 | 57.19 ± 0.22 | 26.13 ± 1.61 | 88.24 ± 0.35 |
![]() ![]() SEA-LION v3 (Gemma 2) 9B AISG | 57.24 ± 0.65 | 55.72 ± 0.48 | 23.87 ± 1.36 | 87.57 ± 1.01 |
![]() ![]() Qwen 2.5 14B Alibaba | 56.23 ± 0.39 | 55.30 ± 0.20 | 23.67 ± 1.41 | 86.93 ± 0.15 |
![]() ![]() Mistral Large 2411 123B Mistral AI | 55.62 ± 2.15 | 55.06 ± 0.55 | 22.84 ± 1.47 | 87.28 ± 1.00 |
![]() ![]() Gemma 2 9B | 53.68 ± 0.51 | 54.02 ± 0.93 | 25.09 ± 1.48 | 82.95 ± 1.65 |
![]() ![]() SEA-LION v3 (Llama) 8B AISG | 53.05 ± 0.53 | 56.39 ± 0.15 | 23.87 ± 1.44 | 88.91 ± 0.09 |
![]() ![]() ERNIE 4.5 21B MoE Baidu | 52.11 ± 0.37 | 51.30 ± 0.14 | 19.27 ± 1.01 | 83.33 ± 0.17 |
![]() ![]() MERaLiON 2 10B A*STAR | 51.78 ± 0.81 | 51.96 ± 1.85 | 23.77 ± 1.38 | 80.15 ± 3.52 |
![]() ![]() Qwen 2.5 7B Alibaba | 49.64 ± 2.70 | 53.22 ± 0.09 | 22.94 ± 1.37 | 83.51 ± 0.10 |
![]() ![]() Llama 3 70B Meta | 48.80 ± 0.13 | 56.67 ± 0.15 | 24.34 ± 1.57 | 89.00 ± 0.11 |
![]() ![]() Tulu 3 8B AI2 | 48.34 ± 0.63 | 53.32 ± 0.27 | 23.75 ± 1.56 | 82.89 ± 0.36 |
![]() ![]() Sailor2 20B SAIL | 47.52 ± 0.48 | 19.64 ± 1.99 | 0.00 ± 0.00 | 39.28 ± 3.97 |
![]() ![]() Sailor2 8B SAIL | 47.38 ± 0.34 | 41.05 ± 1.04 | 0.00 ± 0.00 | 82.09 ± 2.08 |
![]() ![]() Llama 3.1 8B Meta | 46.93 ± 0.57 | 45.13 ± 0.73 | 25.08 ± 1.71 | 65.17 ± 1.36 |
![]() ![]() Command A 03-2025 111B CohereLabs | 44.92 ± 4.59 | 48.88 ± 0.27 | 21.29 ± 1.20 | 76.46 ± 0.70 |
![]() ![]() Olmo 2 0325 32B AI2 | 44.09 ± 0.97 | 29.77 ± 0.86 | 0.00 ± 0.00 | 59.54 ± 1.72 |
![]() ![]() Command R+ 08-2024 104B CohereLabs | 44.05 ± 0.96 | 45.16 ± 0.47 | 19.68 ± 0.93 | 70.65 ± 0.94 |
![]() ![]() Mistral Small 3.1 2503 24B Mistral AI | 43.94 ± 0.73 | 42.17 ± 0.80 | 20.32 ± 1.14 | 64.03 ± 1.72 |
![]() ![]() Babel 9B Alibaba-DAMO | 42.78 ± 0.98 | 34.71 ± 1.12 | 20.75 ± 1.20 | 48.67 ± 2.60 |
![]() ![]() SeaLLMs V3 7B Alibaba-DAMO | 41.00 ± 2.35 | 49.59 ± 0.99 | 21.55 ± 1.31 | 77.62 ± 2.00 |
![]() ![]() Aya Expanse 32B CohereLabs | 40.66 ± 0.34 | 30.88 ± 0.71 | 0.00 ± 0.00 | 61.75 ± 1.42 |
![]() ![]() Command R 08-2024 32B CohereLabs | 40.20 ± 0.79 | 36.17 ± 1.80 | 18.35 ± 0.90 | 53.99 ± 3.44 |
![]() ![]() phi-4 14B Microsoft | 39.41 ± 2.54 | 40.26 ± 0.62 | 18.73 ± 0.87 | 61.80 ± 1.25 |
![]() ![]() Llama 3 8B Meta | 39.28 ± 0.54 | 42.84 ± 0.14 | 22.99 ± 1.40 | 62.68 ± 0.15 |
![]() ![]() Babel 83B Alibaba-DAMO | 38.88 ± 4.47 | 28.48 ± 4.28 | 19.87 ± 1.04 | 37.09 ± 8.19 |
![]() ![]() Aya Expanse 8B CohereLabs | 33.07 ± 0.56 | 20.86 ± 0.50 | 0.00 ± 0.00 | 41.71 ± 0.99 |
![]() ![]() Ministral 2410 8B Mistral AI | 32.06 ± 1.64 | 28.77 ± 1.41 | 14.85 ± 1.06 | 42.68 ± 2.10 |
![]() ![]() Command R7B 12-2024 7B CohereLabs | 30.14 ± 2.73 | 26.28 ± 0.98 | 15.36 ± 1.19 | 37.20 ± 1.55 |
![]() ![]() Olmo 2 1124 7B AI2 | 26.04 ± 1.30 | 16.54 ± 0.60 | 0.00 ± 0.00 | 33.07 ± 1.20 |
![]() ![]() Olmo 2 1124 13B AI2 | 25.67 ± 1.26 | 22.11 ± 0.70 | 0.00 ± 0.00 | 44.22 ± 1.40 |
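In the table above, the NLG column appears to be the unweighted mean of the Summarization and Translations columns. This is inferred from the numbers rather than stated on the page. A minimal check on three rows copied from the table, with hypothetical helper names:

```python
# (model, reported NLG, Summarization, Translations), copied from the table above.
NLG_ROWS = [
    ("Qwen 3 32B", 57.10, 23.71, 90.49),
    ("Llama 3.1 8B", 45.13, 25.08, 65.17),
    ("Sailor2 20B", 19.64, 0.00, 39.28),
]


def competency_from_tasks(task_scores):
    """Unweighted mean of the task scores that make up one competency."""
    return sum(task_scores) / len(task_scores)


for model, reported_nlg, summarization, translations in NLG_ROWS:
    computed = competency_from_tasks([summarization, translations])
    # Allow for rounding of both the inputs and the reported value to two decimals.
    agrees = abs(computed - reported_nlg) <= 0.011
    print(f"{model}: computed {computed:.3f}, reported {reported_nlg:.2f}, agrees={agrees}")
```

Under this reading, a Summarization score of 0.00 caps NLG at half the Translations score, which is what the Sailor2 20B row shows.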
Model | TH | NLR | Causal Reasoning | Natural Language Inference |
---|---|---|---|---|
![]() ![]() Qwen 3 32B Alibaba | 65.36 ± 0.33 | 79.79 ± 0.46 | 94.70 ± 1.86 | 64.88 ± 2.78 |
![]() ![]() Qwen 3 30B MoE Alibaba | 64.57 ± 0.21 | 81.27 ± 0.08 | 93.70 ± 2.05 | 68.84 ± 2.83 |
![]() ![]() SEA-LION v4 27B AISG | 63.18 ± 0.16 | 79.55 ± 0.49 | 93.85 ± 1.98 | 65.25 ± 2.78 |
![]() ![]() Qwen 3 14B Alibaba | 63.01 ± 0.32 | 73.42 ± 0.29 | 89.98 ± 2.53 | 56.88 ± 2.99 |
![]() ![]() Qwen 2.5 72B Alibaba | 62.91 ± 0.44 | 79.03 ± 0.76 | 96.83 ± 1.46 | 61.24 ± 2.89 |
![]() ![]() Gemma 3 27B | 62.79 ± 0.32 | 79.41 ± 0.26 | 93.95 ± 2.00 | 64.86 ± 2.88 |
![]() ![]() SEA-LION v3 (Llama) 70B AISG | 62.67 ± 0.31 | 82.14 ± 0.68 | 94.58 ± 1.60 | 69.71 ± 2.44 |
![]() ![]() Tulu 3 70B AI2 | 61.09 ± 0.40 | 83.38 ± 0.31 | 95.15 ± 1.72 | 71.60 ± 2.55 |
![]() ![]() Gemma 3 12B | 60.27 ± 0.22 | 78.54 ± 0.29 | 92.47 ± 2.18 | 64.60 ± 2.83 |
![]() ![]() Qwen 2.5 32B Alibaba | 60.10 ± 0.27 | 78.29 ± 0.24 | 94.73 ± 1.94 | 61.85 ± 2.96 |
![]() ![]() Llama 3.3 70B Meta | 59.73 ± 0.23 | 82.45 ± 0.08 | 95.80 ± 1.70 | 69.10 ± 2.79 |
![]() ![]() Qwen 3 8B Alibaba | 58.57 ± 0.25 | 68.77 ± 1.25 | 80.45 ± 3.18 | 57.09 ± 2.83 |
![]() ![]() Llama 4 Scout 109B MoE Meta | 58.52 ± 0.06 | 75.59 ± 0.06 | 93.60 ± 2.12 | 57.59 ± 3.04 |
![]() ![]() Gemma 2 27B | 57.80 ± 0.36 | 78.99 ± 1.59 | 93.20 ± 1.96 | 64.79 ± 2.47 |
![]() ![]() Llama 3.1 70B Meta | 57.28 ± 0.33 | 80.58 ± 1.05 | 94.75 ± 1.59 | 66.41 ± 2.57 |
![]() ![]() SEA-LION v3 (Gemma 2) 9B AISG | 57.24 ± 0.65 | 75.81 ± 1.06 | 92.35 ± 2.04 | 59.27 ± 2.64 |
![]() ![]() Qwen 2.5 14B Alibaba | 56.23 ± 0.39 | 71.94 ± 0.44 | 92.70 ± 2.27 | 51.18 ± 3.01 |
![]() ![]() Mistral Large 2411 123B Mistral AI | 55.62 ± 2.15 | 72.60 ± 3.71 | 84.88 ± 2.39 | 60.32 ± 2.53 |
![]() ![]() Gemma 2 9B | 53.68 ± 0.51 | 74.03 ± 0.91 | 91.55 ± 2.20 | 56.51 ± 2.75 |
![]() ![]() SEA-LION v3 (Llama) 8B AISG | 53.05 ± 0.53 | 72.42 ± 1.64 | 83.25 ± 2.40 | 61.60 ± 2.40 |
![]() ![]() ERNIE 4.5 21B MoE Baidu | 52.11 ± 0.37 | 70.14 ± 1.21 | 79.97 ± 3.23 | 60.31 ± 2.62 |
![]() ![]() MERaLiON 2 10B A*STAR | 51.78 ± 0.81 | 70.75 ± 1.24 | 90.85 ± 2.14 | 50.65 ± 2.58 |
![]() ![]() Qwen 2.5 7B Alibaba | 49.64 ± 2.70 | 70.71 ± 0.40 | 85.15 ± 3.07 | 56.26 ± 2.98 |
![]() ![]() Llama 3 70B Meta | 48.80 ± 0.13 | 76.09 ± 0.30 | 89.25 ± 2.60 | 62.94 ± 2.94 |
![]() ![]() Tulu 3 8B AI2 | 48.34 ± 0.63 | 64.72 ± 2.58 | 73.65 ± 3.07 | 55.79 ± 2.69 |
![]() ![]() Sailor2 20B SAIL | 47.52 ± 0.48 | 83.32 ± 0.58 | 94.55 ± 1.95 | 72.09 ± 2.69 |
![]() ![]() Sailor2 8B SAIL | 47.38 ± 0.34 | 77.12 ± 0.29 | 93.03 ± 2.13 | 61.21 ± 2.80 |
![]() ![]() Llama 3.1 8B Meta | 46.93 ± 0.57 | 63.21 ± 2.52 | 71.73 ± 3.03 | 54.70 ± 2.44 |
![]() ![]() Command A 03-2025 111B CohereLabs | 44.92 ± 4.59 | 49.51 ± 7.76 | 84.50 ± 2.52 | 14.51 ± 0.91 |
![]() ![]() Olmo 2 0325 32B AI2 | 44.09 ± 0.97 | 64.34 ± 2.52 | 74.15 ± 2.90 | 54.52 ± 2.36 |
![]() ![]() Command R+ 08-2024 104B CohereLabs | 44.05 ± 0.96 | 59.29 ± 1.27 | 65.03 ± 2.53 | 53.55 ± 2.34 |
![]() ![]() Mistral Small 3.1 2503 24B Mistral AI | 43.94 ± 0.73 | 56.96 ± 3.21 | 71.67 ± 2.83 | 42.25 ± 2.67 |
![]() ![]() Babel 9B Alibaba-DAMO | 42.78 ± 0.98 | 64.45 ± 3.92 | 83.90 ± 2.09 | 45.00 ± 1.41 |
![]() ![]() SeaLLMs V3 7B Alibaba-DAMO | 41.00 ± 2.35 | 55.58 ± 12.71 | 68.00 ± 1.84 | 43.15 ± 1.11 |
![]() ![]() Aya Expanse 32B CohereLabs | 40.66 ± 0.34 | 62.59 ± 0.59 | 69.15 ± 3.90 | 56.04 ± 2.91 |
![]() ![]() Command R 08-2024 32B CohereLabs | 40.20 ± 0.79 | 57.57 ± 2.84 | 67.85 ± 2.29 | 47.29 ± 1.71 |
![]() ![]() phi-4 14B Microsoft | 39.41 ± 2.54 | 44.52 ± 11.18 | 64.22 ± 2.82 | 24.81 ± 1.83 |
![]() ![]() Llama 3 8B Meta | 39.28 ± 0.54 | 56.44 ± 1.73 | 67.00 ± 2.30 | 45.88 ± 2.87 |
![]() ![]() Babel 83B Alibaba-DAMO | 38.88 ± 4.47 | 58.06 ± 11.95 | 84.95 ± 1.38 | 31.16 ± 1.06 |
![]() ![]() Aya Expanse 8B CohereLabs | 33.07 ± 0.56 | 48.74 ± 1.93 | 54.95 ± 4.07 | 42.54 ± 2.80 |
![]() ![]() Ministral 2410 8B Mistral AI | 32.06 ± 1.64 | 42.32 ± 6.05 | 42.50 ± 1.46 | 42.14 ± 0.96 |
![]() ![]() Command R7B 12-2024 7B CohereLabs | 30.14 ± 2.73 | 43.69 ± 1.64 | 53.35 ± 2.48 | 34.04 ± 2.04 |
![]() ![]() Olmo 2 1124 7B AI2 | 26.04 ± 1.30 | 25.49 ± 5.31 | 18.02 ± 2.21 | 32.96 ± 1.47 |
![]() ![]() Olmo 2 1124 13B AI2 | 25.67 ± 1.26 | 45.09 ± 2.44 | 51.38 ± 3.90 | 38.81 ± 2.07 |
Model | TH | NLU | Question Answering | Sentiment Analysis |
---|---|---|---|---|
![]() ![]() Qwen 3 32B Alibaba | 65.36 ± 0.33 | 72.80 ± 0.19 | 86.89 ± 5.22 | 58.71 ± 2.98 |
![]() ![]() Qwen 3 30B MoE Alibaba | 64.57 ± 0.21 | 65.19 ± 0.30 | 85.93 ± 5.62 | 44.45 ± 3.04 |
![]() ![]() SEA-LION v4 27B AISG | 63.18 ± 0.16 | 65.96 ± 0.77 | 85.50 ± 5.34 | 46.42 ± 2.97 |
![]() ![]() Qwen 3 14B Alibaba | 63.01 ± 0.32 | 71.83 ± 0.23 | 88.33 ± 5.05 | 55.33 ± 3.02 |
![]() ![]() Qwen 2.5 72B Alibaba | 62.91 ± 0.44 | 71.17 ± 0.18 | 85.15 ± 5.26 | 57.20 ± 2.98 |
![]() ![]() Gemma 3 27B | 62.79 ± 0.32 | 66.91 ± 0.53 | 86.20 ± 5.33 | 47.61 ± 3.01 |
![]() ![]() SEA-LION v3 (Llama) 70B AISG | 62.67 ± 0.31 | 63.83 ± 0.71 | 86.08 ± 5.01 | 41.58 ± 2.86 |
![]() ![]() Tulu 3 70B AI2 | 61.09 ± 0.40 | 64.00 ± 1.39 | 86.04 ± 5.20 | 41.96 ± 2.83 |
![]() ![]() Gemma 3 12B | 60.27 ± 0.22 | 66.51 ± 0.41 | 82.26 ± 5.87 | 50.76 ± 3.00 |
![]() ![]() Qwen 2.5 32B Alibaba | 60.10 ± 0.27 | 70.38 ± 0.27 | 87.33 ± 4.98 | 53.42 ± 3.04 |
![]() ![]() Llama 3.3 70B Meta | 59.73 ± 0.23 | 64.19 ± 0.38 | 85.97 ± 5.11 | 42.40 ± 2.98 |
![]() ![]() Qwen 3 8B Alibaba | 58.57 ± 0.25 | 65.80 ± 0.28 | 83.87 ± 5.90 | 47.74 ± 3.05 |
![]() ![]() Llama 4 Scout 109B MoE Meta | 58.52 ± 0.06 | 67.47 ± 0.21 | 88.76 ± 4.98 | 46.17 ± 3.07 |
![]() ![]() Gemma 2 27B | 57.80 ± 0.36 | 66.09 ± 0.33 | 85.41 ± 5.50 | 46.77 ± 2.99 |
![]() ![]() Llama 3.1 70B Meta | 57.28 ± 0.33 | 60.72 ± 0.85 | 83.07 ± 5.38 | 38.38 ± 2.87 |
![]() ![]() SEA-LION v3 (Gemma 2) 9B AISG | 57.24 ± 0.65 | 62.90 ± 0.90 | 82.70 ± 5.35 | 43.10 ± 2.92 |
![]() ![]() Qwen 2.5 14B Alibaba | 56.23 ± 0.39 | 66.16 ± 0.39 | 82.10 ± 5.32 | 50.21 ± 2.99 |
![]() ![]() Mistral Large 2411 123B Mistral AI | 55.62 ± 2.15 | 70.45 ± 2.08 | 86.38 ± 4.40 | 54.52 ± 2.59 |
![]() ![]() Gemma 2 9B | 53.68 ± 0.51 | 62.13 ± 0.81 | 84.08 ± 5.56 | 40.17 ± 2.92 |
![]() ![]() SEA-LION v3 (Llama) 8B AISG | 53.05 ± 0.53 | 58.76 ± 0.99 | 81.13 ± 5.72 | 36.39 ± 2.85 |
![]() ![]() ERNIE 4.5 21B MoE Baidu | 52.11 ± 0.37 | 63.27 ± 0.40 | 79.74 ± 6.43 | 46.80 ± 3.04 |
![]() ![]() MERaLiON 2 10B A*STAR | 51.78 ± 0.81 | 58.49 ± 0.67 | 79.12 ± 5.86 | 37.86 ± 2.87 |
![]() ![]() Qwen 2.5 7B Alibaba | 49.64 ± 2.70 | 66.82 ± 1.35 | 87.16 ± 4.89 | 46.49 ± 2.95 |
![]() ![]() Llama 3 70B Meta | 48.80 ± 0.13 | 63.18 ± 0.38 | 85.92 ± 5.59 | 40.45 ± 2.96 |
![]() ![]() Tulu 3 8B AI2 | 48.34 ± 0.63 | 57.25 ± 0.49 | 81.32 ± 6.36 | 33.17 ± 2.83 |
![]() ![]() Sailor2 20B SAIL | 47.52 ± 0.48 | 50.77 ± 1.12 | 49.79 ± 4.94 | 51.76 ± 2.95 |
![]() ![]() Sailor2 8B SAIL | 47.38 ± 0.34 | 44.05 ± 1.53 | 63.71 ± 5.59 | 24.39 ± 2.46 |
![]() ![]() Llama 3.1 8B Meta | 46.93 ± 0.57 | 60.30 ± 0.60 | 86.46 ± 5.10 | 34.14 ± 2.84 |
![]() ![]() Command A 03-2025 111B CohereLabs | 44.92 ± 4.59 | 57.79 ± 1.89 | 80.24 ± 4.89 | 35.34 ± 2.59 |
![]() ![]() Olmo 2 0325 32B AI2 | 44.09 ± 0.97 | 52.39 ± 6.39 | 71.36 ± 6.27 | 33.41 ± 2.15 |
![]() ![]() Command R+ 08-2024 104B CohereLabs | 44.05 ± 0.96 | 57.07 ± 3.27 | 75.98 ± 5.78 | 38.16 ± 2.42 |
![]() ![]() Mistral Small 3.1 2503 24B Mistral AI | 43.94 ± 0.73 | 52.94 ± 1.36 | 74.12 ± 5.95 | 31.76 ± 2.72 |
![]() ![]() Babel 9B Alibaba-DAMO | 42.78 ± 0.98 | 60.10 ± 1.41 | 83.09 ± 4.94 | 37.11 ± 2.59 |
![]() ![]() SeaLLMs V3 7B Alibaba-DAMO | 41.00 ± 2.35 | 55.83 ± 2.56 | 74.89 ± 6.09 | 36.76 ± 2.37 |
![]() ![]() Aya Expanse 32B CohereLabs | 40.66 ± 0.34 | 50.68 ± 1.31 | 64.07 ± 6.68 | 37.29 ± 2.75 |
![]() ![]() Command R 08-2024 32B CohereLabs | 40.20 ± 0.79 | 51.22 ± 3.90 | 71.20 ± 6.39 | 31.25 ± 2.58 |
![]() ![]() phi-4 14B Microsoft | 39.41 ± 2.54 | 58.16 ± 3.33 | 71.91 ± 5.83 | 44.41 ± 2.48 |
![]() ![]() Llama 3 8B Meta | 39.28 ± 0.54 | 57.84 ± 0.41 | 80.84 ± 6.15 | 34.85 ± 2.88 |
![]() ![]() Babel 83B Alibaba-DAMO | 38.88 ± 4.47 | 52.88 ± 5.83 | 69.01 ± 3.92 | 36.75 ± 2.16 |
![]() ![]() Aya Expanse 8B CohereLabs | 33.07 ± 0.56 | 44.64 ± 2.10 | 53.79 ± 6.64 | 35.49 ± 2.53 |
![]() ![]() Ministral 2410 8B Mistral AI | 32.06 ± 1.64 | 46.20 ± 5.13 | 61.26 ± 5.93 | 31.14 ± 2.11 |
![]() ![]() Command R7B 12-2024 7B CohereLabs | 30.14 ± 2.73 | 33.25 ± 2.54 | 48.20 ± 6.62 | 18.30 ± 1.48 |
![]() ![]() Olmo 2 1124 7B AI2 | 26.04 ± 1.30 | 36.11 ± 3.85 | 41.71 ± 6.31 | 30.50 ± 1.79 |
![]() ![]() Olmo 2 1124 13B AI2 | 25.67 ± 1.26 | 45.14 ± 4.41 | 63.03 ± 6.34 | 27.26 ± 1.28 |
Model | TH | Safety | Toxicity Detection |
---|---|---|---|
![]() ![]() Qwen 3 32B Alibaba | 65.36 ± 0.33 | 75.83 ± 0.42 | 75.83 ± 2.59 |
![]() ![]() Qwen 3 30B MoE Alibaba | 64.57 ± 0.21 | 72.38 ± 0.16 | 72.38 ± 2.76 |
![]() ![]() SEA-LION v4 27B AISG | 63.18 ± 0.16 | 69.07 ± 0.26 | 69.08 ± 2.83 |
![]() ![]() Qwen 3 14B Alibaba | 63.01 ± 0.32 | 76.46 ± 0.08 | 76.46 ± 2.61 |
![]() ![]() Qwen 2.5 72B Alibaba | 62.91 ± 0.44 | 70.57 ± 0.63 | 70.58 ± 2.78 |
![]() ![]() Gemma 3 27B | 62.79 ± 0.32 | 69.19 ± 0.19 | 69.19 ± 2.84 |
![]() ![]() SEA-LION v3 (Llama) 70B AISG | 62.67 ± 0.31 | 74.05 ± 1.17 | 74.05 ± 2.60 |
![]() ![]() Tulu 3 70B AI2 | 61.09 ± 0.40 | 76.54 ± 1.25 | 76.54 ± 2.49 |
![]() ![]() Gemma 3 12B | 60.27 ± 0.22 | 68.28 ± 0.38 | 68.27 ± 2.85 |
![]() ![]() Qwen 2.5 32B Alibaba | 60.10 ± 0.27 | 70.09 ± 0.16 | 70.09 ± 2.82 |
![]() ![]() Llama 3.3 70B Meta | 59.73 ± 0.23 | 69.70 ± 0.16 | 69.70 ± 2.84 |
![]() ![]() Qwen 3 8B Alibaba | 58.57 ± 0.25 | 74.54 ± 0.41 | 74.54 ± 2.59 |
![]() ![]() Llama 4 Scout 109B MoE Meta | 58.52 ± 0.06 | 67.04 ± 0.05 | 67.04 ± 2.91 |
![]() ![]() Gemma 2 27B | 57.80 ± 0.36 | 74.76 ± 0.58 | 74.76 ± 2.63 |
Llama 3.1 70B Meta | 57.28 ± 0.33 | 69.28 ± 0.64 | 69.27 ± 2.82 |
SEA-LION v3 (Gemma 2) 9B AISG | 57.24 ± 0.65 | 74.01 ± 0.51 | 74.01 ± 2.67 |
Qwen 2.5 14B Alibaba | 56.23 ± 0.39 | 71.81 ± 0.20 | 71.81 ± 2.77 |
Mistral Large 2411 123B Mistral AI | 55.62 ± 2.15 | 68.54 ± 8.13 | 68.54 ± 2.40 |
Gemma 2 9B | 53.68 ± 0.51 | 72.06 ± 0.43 | 72.06 ± 2.76 |
SEA-LION v3 (Llama) 8B AISG | 53.05 ± 0.53 | 66.98 ± 0.54 | 66.97 ± 2.87 |
ERNIE 4.5 21B MoE Baidu | 52.11 ± 0.37 | 70.13 ± 0.24 | 70.13 ± 2.79 |
MERaLiON 2 10B A*STAR | 51.78 ± 0.81 | 71.88 ± 0.43 | 71.88 ± 2.75 |
Qwen 2.5 7B Alibaba | 49.64 ± 2.70 | 43.29 ± 17.62 | 43.29 ± 1.98 |
Llama 3 70B Meta | 48.80 ± 0.13 | 72.56 ± 0.08 | 72.56 ± 2.74 |
Tulu 3 8B AI2 | 48.34 ± 0.63 | 65.25 ± 1.39 | 65.25 ± 2.74 |
Sailor2 20B SAIL | 47.52 ± 0.48 | 69.57 ± 0.18 | 69.58 ± 2.84 |
Sailor2 8B SAIL | 47.38 ± 0.34 | 69.95 ± 0.70 | 69.95 ± 2.78 |
Llama 3.1 8B Meta | 46.93 ± 0.57 | 59.16 ± 1.26 | 59.16 ± 2.95 |
Command A 03-2025 111B CohereLabs | 44.92 ± 4.59 | 47.90 ± 19.31 | 47.90 ± 1.78 |
Olmo 2 0325 32B AI2 | 44.09 ± 0.97 | 70.25 ± 0.60 | 70.25 ± 2.56 |
Command R+ 08-2024 104B CohereLabs | 44.05 ± 0.96 | 69.76 ± 3.16 | 69.76 ± 2.68 |
Mistral Small 3.1 2503 24B Mistral AI | 43.94 ± 0.73 | 69.36 ± 0.95 | 69.36 ± 2.56 |
Babel 9B Alibaba-DAMO | 42.78 ± 0.98 | 67.70 ± 4.94 | 67.70 ± 1.97 |
SeaLLMs V3 7B Alibaba-DAMO | 41.00 ± 2.35 | 56.52 ± 8.20 | 56.53 ± 1.84 |
Aya Expanse 32B CohereLabs | 40.66 ± 0.34 | 68.67 ± 0.16 | 68.67 ± 2.86 |
Command R 08-2024 32B CohereLabs | 40.20 ± 0.79 | 69.66 ± 0.30 | 69.66 ± 2.67 |
phi-4 14B Microsoft | 39.41 ± 2.54 | 66.25 ± 5.04 | 66.25 ± 2.69 |
Llama 3 8B Meta | 39.28 ± 0.54 | 65.69 ± 0.55 | 65.69 ± 2.91 |
Babel 83B Alibaba-DAMO | 38.88 ± 4.47 | 53.57 ± 16.73 | 53.57 ± 1.70 |
Aya Expanse 8B CohereLabs | 33.07 ± 0.56 | 58.30 ± 0.93 | 58.30 ± 2.81 |
Ministral 2410 8B Mistral AI | 32.06 ± 1.64 | 63.10 ± 6.21 | 63.10 ± 1.83 |
Command R7B 12-2024 7B CohereLabs | 30.14 ± 2.73 | 47.45 ± 13.68 | 47.45 ± 1.56 |
Olmo 2 1124 7B AI2 | 26.04 ± 1.30 | 60.89 ± 0.94 | 60.89 ± 2.91 |
Olmo 2 1124 13B AI2 | 25.67 ± 1.26 | 18.06 ± 6.03 | 18.06 ± 1.97 |

Model | TH | Knowledge | thai_exam |
---|---|---|---|
Qwen 3 32B Alibaba | 65.36 ± 0.33 | 53.25 ± 0.58 | 53.25 ± 0.58 |
Qwen 3 30B MoE Alibaba | 64.57 ± 0.21 | 48.03 ± 0.29 | 48.03 ± 0.29 |
SEA-LION v4 27B AISG | 63.18 ± 0.16 | 52.66 ± 1.18 | 52.66 ± 1.18 |
Qwen 3 14B Alibaba | 63.01 ± 0.32 | 48.33 ± 0.50 | 48.33 ± 0.50 |
Qwen 2.5 72B Alibaba | 62.91 ± 0.44 | 53.44 ± 1.03 | 53.44 ± 1.03 |
Gemma 3 27B | 62.79 ± 0.32 | 52.56 ± 0.81 | 52.56 ± 0.81 |
SEA-LION v3 (Llama) 70B AISG | 62.67 ± 0.31 | 56.79 ± 0.85 | 56.79 ± 0.85 |
Tulu 3 70B AI2 | 61.09 ± 0.40 | 49.41 ± 1.36 | 49.41 ± 1.36 |
Gemma 3 12B | 60.27 ± 0.22 | 44.00 ± 0.74 | 44.00 ± 0.74 |
Qwen 2.5 32B Alibaba | 60.10 ± 0.27 | 55.71 ± 0.56 | 55.71 ± 0.56 |
Llama 3.3 70B Meta | 59.73 ± 0.23 | 57.58 ± 0.99 | 57.58 ± 0.99 |
Qwen 3 8B Alibaba | 58.57 ± 0.25 | 42.32 ± 1.45 | 42.32 ± 1.45 |
Llama 4 Scout 109B MoE Meta | 58.52 ± 0.06 | 46.85 ± 0.29 | 46.85 ± 0.29 |
Gemma 2 27B | 57.80 ± 0.36 | 49.31 ± 1.13 | 49.31 ± 1.13 |
Llama 3.1 70B Meta | 57.28 ± 0.33 | 56.69 ± 1.01 | 56.69 ± 1.01 |
SEA-LION v3 (Gemma 2) 9B AISG | 57.24 ± 0.65 | 44.19 ± 0.94 | 44.19 ± 0.94 |
Qwen 2.5 14B Alibaba | 56.23 ± 0.39 | 50.00 ± 0.65 | 50.00 ± 0.65 |
Mistral Large 2411 123B Mistral AI | 55.62 ± 2.15 | 41.63 ± 2.94 | 41.63 ± 2.94 |
Gemma 2 9B | 53.68 ± 0.51 | 42.52 ± 0.71 | 42.52 ± 0.71 |
SEA-LION v3 (Llama) 8B AISG | 53.05 ± 0.53 | 32.28 ± 2.06 | 32.28 ± 2.06 |
ERNIE 4.5 21B MoE Baidu | 52.11 ± 0.37 | 35.93 ± 1.30 | 35.93 ± 1.30 |
MERaLiON 2 10B A*STAR | 51.78 ± 0.81 | 44.00 ± 1.99 | 44.00 ± 1.99 |
Qwen 2.5 7B Alibaba | 49.64 ± 2.70 | 38.98 ± 0.97 | 38.98 ± 0.97 |
Llama 3 70B Meta | 48.80 ± 0.13 | 49.21 ± 0.65 | 49.21 ± 0.65 |
Tulu 3 8B AI2 | 48.34 ± 0.63 | 25.69 ± 1.82 | 25.69 ± 1.82 |
Sailor2 20B SAIL | 47.52 ± 0.48 | 52.07 ± 0.46 | 52.07 ± 0.46 |
Sailor2 8B SAIL | 47.38 ± 0.34 | 45.77 ± 1.15 | 45.77 ± 1.15 |
Llama 3.1 8B Meta | 46.93 ± 0.57 | 34.35 ± 2.16 | 34.35 ± 2.16 |
Command A 03-2025 111B CohereLabs | 44.92 ± 4.59 | 33.27 ± 8.26 | 33.27 ± 8.26 |
Olmo 2 0325 32B AI2 | 44.09 ± 0.97 | 31.20 ± 3.06 | 31.20 ± 3.06 |
Command R+ 08-2024 104B CohereLabs | 44.05 ± 0.96 | 25.10 ± 2.74 | 25.10 ± 2.74 |
Mistral Small 3.1 2503 24B Mistral AI | 43.94 ± 0.73 | 35.43 ± 1.80 | 35.43 ± 1.80 |
Babel 9B Alibaba-DAMO | 42.78 ± 0.98 | 26.87 ± 3.90 | 26.87 ± 3.90 |
SeaLLMs V3 7B Alibaba-DAMO | 41.00 ± 2.35 | 20.87 ± 7.31 | 20.87 ± 7.31 |
Aya Expanse 32B CohereLabs | 40.66 ± 0.34 | 21.26 ± 0.92 | 21.26 ± 0.92 |
Command R 08-2024 32B CohereLabs | 40.20 ± 0.79 | 26.28 ± 3.32 | 26.28 ± 3.32 |
phi-4 14B Microsoft | 39.41 ± 2.54 | 8.37 ± 4.44 | 8.37 ± 4.44 |
Llama 3 8B Meta | 39.28 ± 0.54 | 33.86 ± 2.02 | 33.86 ± 2.02 |
Babel 83B Alibaba-DAMO | 38.88 ± 4.47 | 41.63 ± 5.34 | 41.63 ± 5.34 |
Aya Expanse 8B CohereLabs | 33.07 ± 0.56 | 25.20 ± 0.92 | 25.20 ± 0.92 |
Ministral 2410 8B Mistral AI | 32.06 ± 1.64 | 20.47 ± 4.06 | 20.47 ± 4.06 |
Command R7B 12-2024 7B CohereLabs | 30.14 ± 2.73 | 19.39 ± 2.94 | 19.39 ± 2.94 |
Olmo 2 1124 7B AI2 | 26.04 ± 1.30 | 5.81 ± 3.69 | 5.81 ± 3.69 |
Olmo 2 1124 13B AI2 | 25.67 ± 1.26 | 11.81 ± 3.87 | 11.81 ± 3.87 |