Thai Performance
Thai Scores by Model
Average of 30 bootstrap resamples; 95% confidence intervals are shown.
Filters: model size ≤200B; open instruct models only.
Model | TH |
---|---|
Qwen 3 Next 80B MoE Alibaba | 58.09 ± 0.09 |
SEA-LION v4 (Qwen) 32B AISG | 57.91 ± 0.13 |
Qwen 3 32B Alibaba | 56.98 ± 0.15 |
Qwen 3 30B MoE Alibaba | 55.77 ± 0.11 |
Qwen 3 14B Alibaba | 54.82 ± 0.15 |
SEA-LION v3 (Llama) 70B AISG | 54.60 ± 0.15 |
Tulu 3 70B AI2 | 54.25 ± 0.16 |
Qwen 2.5 72B Alibaba | 53.80 ± 0.16 |
SEA-LION v4 (Gemma) 27B AISG | 53.46 ± 0.14 |
Gemma 3 27B Google | 52.88 ± 0.12 |
Gemma 3 12B Google | 50.75 ± 0.13 |
Qwen 3 8B Alibaba | 50.66 ± 0.14 |
Llama 3.3 70B Meta | 50.05 ± 0.08 |
Qwen 2.5 32B Alibaba | 50.00 ± 0.12 |
Gemma 2 27B Google | 49.78 ± 0.17 |
SEA-LION v3 (Gemma 2) 9B AISG | 48.95 ± 0.18 |
Llama 4 Scout 109B MoE Meta | 48.44 ± 0.08 |
Llama 3.1 70B Meta | 47.14 ± 0.20 |
Qwen 2.5 14B Alibaba | 46.32 ± 0.13 |
Mistral Large 2411 123B Mistral AI | 45.34 ± 0.17 |
Gemma 2 9B Google | 44.65 ± 0.17 |
ERNIE 4.5 21B MoE Baidu | 42.75 ± 0.18 |
SEA-LION v3 (Llama) 8B AISG | 42.37 ± 0.19 |
MERaLiON 2 10B A*STAR | 41.93 ± 0.15 |
Qwen 2.5 7B Alibaba | 39.89 ± 0.16 |
Tulu 3 8B AI2 | 39.85 ± 0.17 |
Llama 3 70B Meta | 39.51 ± 0.09 |
Sailor2 20B SAIL | 37.12 ± 0.15 |
Sailor2 8B SAIL | 35.75 ± 0.15 |
Command A 03-2025 111B CohereLabs | 34.58 ± 0.21 |
Llama 3.1 8B Meta | 33.65 ± 0.18 |
Olmo 2 0325 32B AI2 | 32.99 ± 0.22 |
Command R+ 08-2024 104B CohereLabs | 32.19 ± 0.18 |
Babel 9B Alibaba-DAMO | 32.17 ± 0.22 |
Mistral Small 3.1 2503 24B Mistral AI | 31.49 ± 0.25 |
Aya Expanse 32B CohereLabs | 30.47 ± 0.18 |
SeaLLMs V3 7B Alibaba-DAMO | 29.69 ± 0.27 |
Apertus 8B Swiss AI | 29.35 ± 0.30 |
Apertus 70B Swiss AI | 29.18 ± 0.23 |
Command R 08-2024 32B CohereLabs | 27.84 ± 0.22 |
phi-4 14B Microsoft | 25.91 ± 0.26 |
Babel 83B Alibaba-DAMO | 25.76 ± 0.30 |
Llama 3 8B Meta | 25.16 ± 0.15 |
Aya Expanse 8B CohereLabs | 17.22 ± 0.13 |
Ministral 2410 8B Mistral AI | 17.12 ± 0.22 |
Command R7B 12-2024 7B CohereLabs | 14.64 ± 0.20 |
Olmo 2 1124 13B AI2 | 14.08 ± 0.18 |
Olmo 2 1124 7B AI2 | 11.14 ± 0.19 |
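The ± values here and in the tables below are, per the caption, 95% confidence intervals over 30 bootstrap resamples. As a rough illustration only (the leaderboard's actual scoring and aggregation pipeline is not shown on this page, and the per-example scores below are synthetic placeholders), a bootstrapped mean with a percentile confidence interval can be computed along these lines:

```python
import numpy as np

def bootstrap_mean_ci(per_example_scores, n_boot=30, alpha=0.05, seed=0):
    """Mean of bootstrap-resample means plus a percentile 95% CI.

    `per_example_scores` stands in for a model's per-item metric values;
    the real leaderboard pipeline may aggregate differently.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(per_example_scores, dtype=float)
    boot_means = np.array([
        rng.choice(scores, size=scores.size, replace=True).mean()
        for _ in range(n_boot)
    ])
    lo, hi = np.percentile(boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return boot_means.mean(), lo, hi

# Synthetic example: 500 fake per-example scores in [0, 100]
mean, lo, hi = bootstrap_mean_ci(np.random.default_rng(1).uniform(0, 100, 500))
print(f"{mean:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```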
Thai Competencies
Average of 30 bootstrap resamples; 95% confidence intervals are shown.
Filters: model size ≤200B; open instruct models only.
Model | TH | Instruction Following | Multi-Turn Chat | NLG | NLR | NLU | Safety | Knowledge |
---|---|---|---|---|---|---|---|---|
Qwen 3 Next 80B MoE Alibaba | 58.09 ± 0.09 | 85.47 ± 0.41 | 62.08 ± 0.46 | 57.19 ± 0.03 | 64.32 ± 0.18 | 61.42 ± 0.08 | 38.30 ± 0.22 | 37.85 ± 0.32 |
SEA-LION v4 (Qwen) 32B AISG | 57.91 ± 0.13 | 82.53 ± 0.71 | 39.55 ± 0.48 | 56.69 ± 0.05 | 70.79 ± 0.13 | 61.80 ± 0.13 | 41.54 ± 0.17 | 52.47 ± 0.16 |
Qwen 3 32B Alibaba | 56.98 ± 0.15 | 78.67 ± 0.70 | 43.60 ± 0.55 | 57.08 ± 0.06 | 68.33 ± 0.15 | 61.46 ± 0.15 | 41.21 ± 0.22 | 48.54 ± 0.29 |
Qwen 3 30B MoE Alibaba | 55.77 ± 0.11 | 81.50 ± 0.56 | 51.49 ± 0.57 | 55.78 ± 0.05 | 70.43 ± 0.16 | 60.37 ± 0.07 | 29.71 ± 0.09 | 41.11 ± 0.18 |
Qwen 3 14B Alibaba | 54.82 ± 0.15 | 84.57 ± 0.73 | 36.26 ± 0.63 | 54.76 ± 0.05 | 57.57 ± 0.17 | 62.11 ± 0.13 | 44.07 ± 0.12 | 44.38 ± 0.20 |
SEA-LION v3 (Llama) 70B AISG | 54.60 ± 0.15 | 86.73 ± 0.73 | 21.04 ± 0.64 | 58.58 ± 0.07 | 71.78 ± 0.29 | 59.42 ± 0.27 | 33.82 ± 0.25 | 50.83 ± 0.35 |
Tulu 3 70B AI2 | 54.25 ± 0.16 | 80.53 ± 0.75 | 21.46 ± 0.47 | 56.24 ± 0.07 | 73.98 ± 0.20 | 57.97 ± 0.24 | 42.23 ± 0.36 | 47.33 ± 0.35 |
Qwen 2.5 72B Alibaba | 53.80 ± 0.16 | 82.47 ± 0.71 | 29.84 ± 0.56 | 57.69 ± 0.07 | 67.58 ± 0.16 | 62.28 ± 0.21 | 24.41 ± 0.19 | 52.36 ± 0.25 |
SEA-LION v4 (Gemma) 27B AISG | 53.46 ± 0.14 | 79.63 ± 0.65 | 41.40 ± 0.52 | 58.31 ± 0.06 | 67.76 ± 0.15 | 61.79 ± 0.14 | 19.71 ± 0.17 | 45.60 ± 0.24 |
Gemma 3 27B Google | 52.88 ± 0.12 | 76.33 ± 0.48 | 41.10 ± 0.59 | 58.49 ± 0.05 | 67.50 ± 0.17 | 62.55 ± 0.12 | 19.80 ± 0.15 | 44.39 ± 0.18 |
Gemma 3 12B Google | 50.75 ± 0.13 | 74.43 ± 0.55 | 36.26 ± 0.56 | 57.95 ± 0.04 | 65.88 ± 0.21 | 60.60 ± 0.14 | 17.78 ± 0.14 | 42.34 ± 0.22 |
Qwen 3 8B Alibaba | 50.66 ± 0.14 | 77.13 ± 0.62 | 28.83 ± 0.51 | 55.96 ± 0.06 | 48.08 ± 0.30 | 59.80 ± 0.18 | 44.91 ± 0.28 | 39.94 ± 0.31 |
Llama 3.3 70B Meta | 50.05 ± 0.08 | 83.27 ± 0.45 | 8.56 ± 0.33 | 56.16 ± 0.06 | 72.58 ± 0.12 | 59.23 ± 0.20 | 22.06 ± 0.09 | 48.51 ± 0.15 |
Qwen 2.5 32B Alibaba | 50.00 ± 0.12 | 80.57 ± 0.61 | 13.85 ± 0.42 | 55.82 ± 0.07 | 66.17 ± 0.09 | 62.54 ± 0.08 | 22.89 ± 0.10 | 48.18 ± 0.12 |
Gemma 2 27B Google | 49.78 ± 0.17 | 71.53 ± 0.88 | 11.76 ± 0.59 | 55.24 ± 0.09 | 66.81 ± 0.22 | 61.38 ± 0.19 | 36.76 ± 0.24 | 44.97 ± 0.37 |
SEA-LION v3 (Gemma 2) 9B AISG | 48.95 ± 0.18 | 72.03 ± 0.83 | 19.61 ± 0.52 | 55.82 ± 0.07 | 61.71 ± 0.26 | 57.50 ± 0.26 | 33.71 ± 0.25 | 42.27 ± 0.39 |
Llama 4 Scout 109B MoE Meta | 48.44 ± 0.08 | 86.97 ± 0.33 | 11.21 ± 0.40 | 58.48 ± 0.05 | 61.80 ± 0.07 | 62.42 ± 0.11 | 14.15 ± 0.08 | 44.05 ± 0.23 |
Llama 3.1 70B Meta | 47.14 ± 0.20 | 71.07 ± 1.17 | 7.72 ± 0.35 | 57.19 ± 0.09 | 69.32 ± 0.32 | 56.65 ± 0.25 | 20.34 ± 0.29 | 47.69 ± 0.34 |
Qwen 2.5 14B Alibaba | 46.32 ± 0.13 | 69.30 ± 0.91 | 12.73 ± 0.45 | 55.29 ± 0.05 | 56.04 ± 0.12 | 59.24 ± 0.19 | 28.14 ± 0.14 | 43.48 ± 0.23 |
Mistral Large 2411 123B Mistral AI | 45.34 ± 0.17 | 70.70 ± 1.21 | 12.70 ± 0.33 | 55.05 ± 0.09 | 55.08 ± 0.39 | 60.51 ± 0.29 | 24.51 ± 0.55 | 38.86 ± 0.48 |
Gemma 2 9B Google | 44.65 ± 0.17 | 67.83 ± 0.97 | 5.93 ± 0.43 | 54.03 ± 0.09 | 58.94 ± 0.25 | 57.56 ± 0.22 | 28.22 ± 0.16 | 40.07 ± 0.42 |
ERNIE 4.5 21B MoE Baidu | 42.75 ± 0.18 | 64.10 ± 1.06 | 13.58 ± 0.38 | 51.34 ± 0.07 | 50.37 ± 0.32 | 56.34 ± 0.16 | 29.44 ± 0.21 | 34.09 ± 0.31 |
SEA-LION v3 (Llama) 8B AISG | 42.37 ± 0.19 | 69.53 ± 0.93 | 17.86 ± 0.57 | 56.43 ± 0.06 | 54.50 ± 0.38 | 54.52 ± 0.26 | 14.88 ± 0.21 | 28.87 ± 0.47 |
MERaLiON 2 10B A*STAR | 41.93 ± 0.15 | 63.10 ± 0.99 | 4.12 ± 0.30 | 51.99 ± 0.08 | 53.74 ± 0.26 | 54.24 ± 0.28 | 27.42 ± 0.20 | 38.89 ± 0.38 |
Qwen 2.5 7B Alibaba | 39.89 ± 0.16 | 66.73 ± 0.91 | 11.49 ± 0.44 | 53.26 ± 0.07 | 52.36 ± 0.17 | 59.37 ± 0.23 | 0.00 ± 0.00 | 36.03 ± 0.27 |
Tulu 3 8B AI2 | 39.85 ± 0.17 | 69.37 ± 1.04 | 5.57 ± 0.28 | 53.39 ± 0.07 | 39.86 ± 0.50 | 52.44 ± 0.19 | 34.42 ± 0.49 | 23.91 ± 0.34 |
Llama 3 70B Meta | 39.51 ± 0.09 | 18.40 ± 0.45 | 6.32 ± 0.28 | 56.71 ± 0.06 | 61.46 ± 0.18 | 59.08 ± 0.16 | 31.73 ± 0.13 | 42.89 ± 0.24 |
Sailor2 20B SAIL | 37.12 ± 0.15 | 39.83 ± 0.79 | 19.37 ± 0.51 | 19.63 ± 0.07 | 73.58 ± 0.10 | 43.51 ± 0.28 | 20.74 ± 0.09 | 43.19 ± 0.23 |
Sailor2 8B SAIL | 35.75 ± 0.15 | 35.07 ± 0.64 | 21.03 ± 0.64 | 41.06 ± 0.05 | 64.04 ± 0.21 | 31.60 ± 0.25 | 22.44 ± 0.31 | 35.04 ± 0.30 |
Command A 03-2025 111B CohereLabs | 34.58 ± 0.21 | 72.23 ± 0.90 | 8.62 ± 0.40 | 48.81 ± 0.10 | 34.50 ± 0.38 | 50.35 ± 0.45 | 0.00 ± 0.00 | 27.56 ± 0.53 |
Llama 3.1 8B Meta | 33.65 ± 0.18 | 63.80 ± 1.08 | 5.60 ± 0.30 | 45.13 ± 0.08 | 37.28 ± 0.60 | 55.03 ± 0.22 | 0.77 ± 0.24 | 27.94 ± 0.41 |
Olmo 2 0325 32B AI2 | 32.99 ± 0.22 | 62.33 ± 1.09 | 1.39 ± 0.22 | 29.75 ± 0.05 | 40.11 ± 0.50 | 40.64 ± 0.51 | 34.85 ± 0.45 | 21.85 ± 0.68 |
Command R+ 08-2024 104B CohereLabs | 32.19 ± 0.18 | 52.17 ± 1.30 | 1.90 ± 0.23 | 45.09 ± 0.09 | 30.20 ± 0.57 | 48.56 ± 0.34 | 28.38 ± 0.40 | 19.03 ± 0.48 |
Babel 9B Alibaba-DAMO | 32.17 ± 0.22 | 44.83 ± 1.00 | 2.89 ± 0.34 | 34.66 ± 0.15 | 42.18 ± 0.56 | 53.21 ± 0.39 | 27.86 ± 0.84 | 19.57 ± 0.60 |
Mistral Small 3.1 2503 24B Mistral AI | 31.49 ± 0.25 | 51.83 ± 1.04 | 1.63 ± 0.16 | 42.17 ± 0.08 | 28.08 ± 0.54 | 47.08 ± 0.34 | 24.41 ± 0.46 | 25.20 ± 0.75 |
Aya Expanse 32B CohereLabs | 30.47 ± 0.18 | 50.77 ± 0.95 | 2.68 ± 0.23 | 30.84 ± 0.04 | 36.12 ± 0.29 | 43.93 ± 0.33 | 32.50 ± 0.13 | 16.44 ± 0.26 |
SeaLLMs V3 7B Alibaba-DAMO | 29.69 ± 0.27 | 43.13 ± 1.21 | 7.01 ± 0.37 | 49.63 ± 0.10 | 25.03 ± 0.87 | 47.98 ± 0.47 | 20.54 ± 0.87 | 14.54 ± 0.76 |
Apertus 8B Swiss AI | 29.35 ± 0.30 | 52.80 ± 1.23 | 3.17 ± 0.29 | 50.93 ± 0.11 | 24.27 ± 0.73 | 37.51 ± 0.43 | 24.55 ± 0.66 | 12.24 ± 0.95 |
Apertus 70B Swiss AI | 29.18 ± 0.23 | 51.10 ± 1.22 | 12.83 ± 0.46 | 50.44 ± 0.09 | 10.36 ± 0.28 | 46.12 ± 0.47 | 25.43 ± 0.76 | 7.98 ± 0.83 |
Command R 08-2024 32B CohereLabs | 27.84 ± 0.22 | 41.27 ± 1.35 | 1.58 ± 0.18 | 36.14 ± 0.10 | 28.59 ± 0.69 | 45.35 ± 0.43 | 28.60 ± 0.43 | 13.33 ± 0.53 |
phi-4 14B Microsoft | 25.91 ± 0.26 | 50.30 ± 1.35 | 10.12 ± 0.42 | 40.26 ± 0.05 | 14.32 ± 0.60 | 48.46 ± 0.46 | 17.05 ± 0.41 | 0.85 ± 0.29 |
Babel 83B Alibaba-DAMO | 25.76 ± 0.30 | 36.50 ± 1.52 | 4.01 ± 0.22 | 28.60 ± 0.15 | 35.10 ± 0.42 | 44.09 ± 0.52 | 0.10 ± 0.14 | 31.92 ± 0.85 |
Llama 3 8B Meta | 25.16 ± 0.15 | 16.53 ± 0.61 | 2.64 ± 0.21 | 42.83 ± 0.08 | 26.07 ± 0.65 | 52.70 ± 0.17 | 11.47 ± 0.20 | 23.88 ± 0.39 |
Aya Expanse 8B CohereLabs | 17.22 ± 0.13 | 34.73 ± 1.02 | 1.33 ± 0.20 | 20.88 ± 0.05 | 12.10 ± 0.38 | 31.74 ± 0.30 | 9.81 ± 0.51 | 9.96 ± 0.33 |
Ministral 2410 8B Mistral AI | 17.12 ± 0.22 | 24.07 ± 1.09 | 1.32 ± 0.15 | 28.78 ± 0.16 | 6.81 ± 0.40 | 33.78 ± 0.49 | 17.31 ± 0.80 | 7.77 ± 0.75 |
Command R7B 12-2024 7B CohereLabs | 14.64 ± 0.20 | 42.30 ± 0.89 | 0.64 ± 0.12 | 26.34 ± 0.12 | 4.26 ± 0.64 | 24.34 ± 0.42 | 0.00 ± 0.00 | 4.57 ± 0.51 |
Olmo 2 1124 13B AI2 | 14.08 ± 0.18 | 39.00 ± 0.89 | 0.22 ± 0.08 | 22.11 ± 0.07 | 5.75 ± 0.47 | 31.48 ± 0.45 | 0.00 ± 0.00 | 0.01 ± 0.02 |
Olmo 2 1124 7B AI2 | 11.14 ± 0.19 | 39.10 ± 1.37 | 0.22 ± 0.07 | 16.53 ± 0.06 | 0.35 ± 0.17 | 21.70 ± 0.52 | 0.07 ± 0.05 | 0.00 ± 0.00 |
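Across the rows above, the overall TH score is consistent with an unweighted mean of the seven competency scores; for example, for Qwen 3 Next 80B MoE, (85.47 + 62.08 + 57.19 + 64.32 + 61.42 + 38.30 + 37.85) / 7 ≈ 58.09. A minimal check of that relationship, assuming simple averaging rather than any further weighting:

```python
# Competency scores for Qwen 3 Next 80B MoE, copied from the row above.
competencies = {
    "Instruction Following": 85.47,
    "Multi-Turn Chat": 62.08,
    "NLG": 57.19,
    "NLR": 64.32,
    "NLU": 61.42,
    "Safety": 38.30,
    "Knowledge": 37.85,
}

# Unweighted mean of the seven competencies reproduces the TH column.
th = sum(competencies.values()) / len(competencies)
print(round(th, 2))  # 58.09
```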
Thai Tasks
Average of 30 bootstrap resamples; 95% confidence intervals are shown.
Filters: model size ≤200B; open instruct models only.
Model | TH | Instruction Following | SEA-IFEval |
---|---|---|---|
Qwen 3 Next 80B MoE Alibaba | 58.09 ± 0.09 | 85.47 ± 0.41 | 85.47 ± 0.41 |
SEA-LION v4 (Qwen) 32B AISG | 57.91 ± 0.13 | 82.53 ± 0.71 | 82.53 ± 0.71 |
Qwen 3 32B Alibaba | 56.98 ± 0.15 | 78.67 ± 0.70 | 78.67 ± 0.70 |
Qwen 3 30B MoE Alibaba | 55.77 ± 0.11 | 81.50 ± 0.56 | 81.50 ± 0.56 |
Qwen 3 14B Alibaba | 54.82 ± 0.15 | 84.57 ± 0.73 | 84.57 ± 0.73 |
SEA-LION v3 (Llama) 70B AISG | 54.60 ± 0.15 | 86.73 ± 0.73 | 86.73 ± 0.73 |
Tulu 3 70B AI2 | 54.25 ± 0.16 | 80.53 ± 0.75 | 80.53 ± 0.75 |
Qwen 2.5 72B Alibaba | 53.80 ± 0.16 | 82.47 ± 0.71 | 82.47 ± 0.71 |
SEA-LION v4 (Gemma) 27B AISG | 53.46 ± 0.14 | 79.63 ± 0.65 | 79.63 ± 0.65 |
Gemma 3 27B Google | 52.88 ± 0.12 | 76.33 ± 0.48 | 76.33 ± 0.48 |
Gemma 3 12B Google | 50.75 ± 0.13 | 74.43 ± 0.55 | 74.43 ± 0.55 |
Qwen 3 8B Alibaba | 50.66 ± 0.14 | 77.13 ± 0.62 | 77.13 ± 0.62 |
Llama 3.3 70B Meta | 50.05 ± 0.08 | 83.27 ± 0.45 | 83.27 ± 0.45 |
Qwen 2.5 32B Alibaba | 50.00 ± 0.12 | 80.57 ± 0.61 | 80.57 ± 0.61 |
Gemma 2 27B Google | 49.78 ± 0.17 | 71.53 ± 0.88 | 71.53 ± 0.88 |
SEA-LION v3 (Gemma 2) 9B AISG | 48.95 ± 0.18 | 72.03 ± 0.83 | 72.03 ± 0.83 |
Llama 4 Scout 109B MoE Meta | 48.44 ± 0.08 | 86.97 ± 0.33 | 86.97 ± 0.33 |
Llama 3.1 70B Meta | 47.14 ± 0.20 | 71.07 ± 1.17 | 71.07 ± 1.17 |
Qwen 2.5 14B Alibaba | 46.32 ± 0.13 | 69.30 ± 0.91 | 69.30 ± 0.91 |
Mistral Large 2411 123B Mistral AI | 45.34 ± 0.17 | 70.70 ± 1.21 | 70.70 ± 1.21 |
Gemma 2 9B Google | 44.65 ± 0.17 | 67.83 ± 0.97 | 67.83 ± 0.97 |
ERNIE 4.5 21B MoE Baidu | 42.75 ± 0.18 | 64.10 ± 1.06 | 64.10 ± 1.06 |
SEA-LION v3 (Llama) 8B AISG | 42.37 ± 0.19 | 69.53 ± 0.93 | 69.53 ± 0.93 |
MERaLiON 2 10B A*STAR | 41.93 ± 0.15 | 63.10 ± 0.99 | 63.10 ± 0.99 |
Qwen 2.5 7B Alibaba | 39.89 ± 0.16 | 66.73 ± 0.91 | 66.73 ± 0.91 |
Tulu 3 8B AI2 | 39.85 ± 0.17 | 69.37 ± 1.04 | 69.37 ± 1.04 |
Llama 3 70B Meta | 39.51 ± 0.09 | 18.40 ± 0.45 | 18.40 ± 0.45 |
Sailor2 20B SAIL | 37.12 ± 0.15 | 39.83 ± 0.79 | 39.83 ± 0.79 |
Sailor2 8B SAIL | 35.75 ± 0.15 | 35.07 ± 0.64 | 35.07 ± 0.64 |
Command A 03-2025 111B CohereLabs | 34.58 ± 0.21 | 72.23 ± 0.90 | 72.23 ± 0.90 |
Llama 3.1 8B Meta | 33.65 ± 0.18 | 63.80 ± 1.08 | 63.80 ± 1.08 |
Olmo 2 0325 32B AI2 | 32.99 ± 0.22 | 62.33 ± 1.09 | 62.33 ± 1.09 |
Command R+ 08-2024 104B CohereLabs | 32.19 ± 0.18 | 52.17 ± 1.30 | 52.17 ± 1.30 |
Babel 9B Alibaba-DAMO | 32.17 ± 0.22 | 44.83 ± 1.00 | 44.83 ± 1.00 |
Mistral Small 3.1 2503 24B Mistral AI | 31.49 ± 0.25 | 51.83 ± 1.04 | 51.83 ± 1.04 |
Aya Expanse 32B CohereLabs | 30.47 ± 0.18 | 50.77 ± 0.95 | 50.77 ± 0.95 |
SeaLLMs V3 7B Alibaba-DAMO | 29.69 ± 0.27 | 43.13 ± 1.21 | 43.13 ± 1.21 |
Apertus 8B Swiss AI | 29.35 ± 0.30 | 52.80 ± 1.23 | 52.80 ± 1.23 |
Apertus 70B Swiss AI | 29.18 ± 0.23 | 51.10 ± 1.22 | 51.10 ± 1.22 |
Command R 08-2024 32B CohereLabs | 27.84 ± 0.22 | 41.27 ± 1.35 | 41.27 ± 1.35 |
phi-4 14B Microsoft | 25.91 ± 0.26 | 50.30 ± 1.35 | 50.30 ± 1.35 |
Babel 83B Alibaba-DAMO | 25.76 ± 0.30 | 36.50 ± 1.52 | 36.50 ± 1.52 |
Llama 3 8B Meta | 25.16 ± 0.15 | 16.53 ± 0.61 | 16.53 ± 0.61 |
Aya Expanse 8B CohereLabs | 17.22 ± 0.13 | 34.73 ± 1.02 | 34.73 ± 1.02 |
Ministral 2410 8B Mistral AI | 17.12 ± 0.22 | 24.07 ± 1.09 | 24.07 ± 1.09 |
Command R7B 12-2024 7B CohereLabs | 14.64 ± 0.20 | 42.30 ± 0.89 | 42.30 ± 0.89 |
Olmo 2 1124 13B AI2 | 14.08 ± 0.18 | 39.00 ± 0.89 | 39.00 ± 0.89 |
Olmo 2 1124 7B AI2 | 11.14 ± 0.19 | 39.10 ± 1.37 | 39.10 ± 1.37 |
Model | TH | Multi-Turn Chat | SEA-MT-Bench |
---|---|---|---|
Qwen 3 Next 80B MoE Alibaba | 58.09 ± 0.09 | 62.08 ± 0.46 | 62.08 ± 0.46 |
SEA-LION v4 (Qwen) 32B AISG | 57.91 ± 0.13 | 39.55 ± 0.48 | 39.55 ± 0.48 |
Qwen 3 32B Alibaba | 56.98 ± 0.15 | 43.60 ± 0.55 | 43.60 ± 0.55 |
Qwen 3 30B MoE Alibaba | 55.77 ± 0.11 | 51.49 ± 0.57 | 51.49 ± 0.57 |
Qwen 3 14B Alibaba | 54.82 ± 0.15 | 36.26 ± 0.63 | 36.26 ± 0.63 |
SEA-LION v3 (Llama) 70B AISG | 54.60 ± 0.15 | 21.04 ± 0.64 | 21.04 ± 0.64 |
Tulu 3 70B AI2 | 54.25 ± 0.16 | 21.46 ± 0.47 | 21.46 ± 0.47 |
Qwen 2.5 72B Alibaba | 53.80 ± 0.16 | 29.84 ± 0.56 | 29.84 ± 0.56 |
SEA-LION v4 (Gemma) 27B AISG | 53.46 ± 0.14 | 41.40 ± 0.52 | 41.40 ± 0.52 |
Gemma 3 27B Google | 52.88 ± 0.12 | 41.10 ± 0.59 | 41.10 ± 0.59 |
Gemma 3 12B Google | 50.75 ± 0.13 | 36.26 ± 0.56 | 36.26 ± 0.56 |
Qwen 3 8B Alibaba | 50.66 ± 0.14 | 28.83 ± 0.51 | 28.83 ± 0.51 |
Llama 3.3 70B Meta | 50.05 ± 0.08 | 8.56 ± 0.33 | 8.56 ± 0.33 |
Qwen 2.5 32B Alibaba | 50.00 ± 0.12 | 13.85 ± 0.42 | 13.85 ± 0.42 |
Gemma 2 27B Google | 49.78 ± 0.17 | 11.76 ± 0.59 | 11.76 ± 0.59 |
SEA-LION v3 (Gemma 2) 9B AISG | 48.95 ± 0.18 | 19.61 ± 0.52 | 19.61 ± 0.52 |
Llama 4 Scout 109B MoE Meta | 48.44 ± 0.08 | 11.21 ± 0.40 | 11.21 ± 0.40 |
Llama 3.1 70B Meta | 47.14 ± 0.20 | 7.72 ± 0.35 | 7.72 ± 0.35 |
Qwen 2.5 14B Alibaba | 46.32 ± 0.13 | 12.73 ± 0.45 | 12.73 ± 0.45 |
Mistral Large 2411 123B Mistral AI | 45.34 ± 0.17 | 12.70 ± 0.33 | 12.70 ± 0.33 |
Gemma 2 9B Google | 44.65 ± 0.17 | 5.93 ± 0.43 | 5.93 ± 0.43 |
ERNIE 4.5 21B MoE Baidu | 42.75 ± 0.18 | 13.58 ± 0.38 | 13.58 ± 0.38 |
SEA-LION v3 (Llama) 8B AISG | 42.37 ± 0.19 | 17.86 ± 0.57 | 17.86 ± 0.57 |
MERaLiON 2 10B A*STAR | 41.93 ± 0.15 | 4.12 ± 0.30 | 4.12 ± 0.30 |
Qwen 2.5 7B Alibaba | 39.89 ± 0.16 | 11.49 ± 0.44 | 11.49 ± 0.44 |
Tulu 3 8B AI2 | 39.85 ± 0.17 | 5.57 ± 0.28 | 5.57 ± 0.28 |
Llama 3 70B Meta | 39.51 ± 0.09 | 6.32 ± 0.28 | 6.32 ± 0.28 |
Sailor2 20B SAIL | 37.12 ± 0.15 | 19.37 ± 0.51 | 19.37 ± 0.51 |
Sailor2 8B SAIL | 35.75 ± 0.15 | 21.03 ± 0.64 | 21.03 ± 0.64 |
Command A 03-2025 111B CohereLabs | 34.58 ± 0.21 | 8.62 ± 0.40 | 8.62 ± 0.40 |
Llama 3.1 8B Meta | 33.65 ± 0.18 | 5.60 ± 0.30 | 5.60 ± 0.30 |
Olmo 2 0325 32B AI2 | 32.99 ± 0.22 | 1.39 ± 0.22 | 1.39 ± 0.22 |
Command R+ 08-2024 104B CohereLabs | 32.19 ± 0.18 | 1.90 ± 0.23 | 1.90 ± 0.23 |
Babel 9B Alibaba-DAMO | 32.17 ± 0.22 | 2.89 ± 0.34 | 2.89 ± 0.34 |
Mistral Small 3.1 2503 24B Mistral AI | 31.49 ± 0.25 | 1.63 ± 0.16 | 1.63 ± 0.16 |
Aya Expanse 32B CohereLabs | 30.47 ± 0.18 | 2.68 ± 0.23 | 2.68 ± 0.23 |
SeaLLMs V3 7B Alibaba-DAMO | 29.69 ± 0.27 | 7.01 ± 0.37 | 7.01 ± 0.37 |
Apertus 8B Swiss AI | 29.35 ± 0.30 | 3.17 ± 0.29 | 3.17 ± 0.29 |
Apertus 70B Swiss AI | 29.18 ± 0.23 | 12.83 ± 0.46 | 12.83 ± 0.46 |
Command R 08-2024 32B CohereLabs | 27.84 ± 0.22 | 1.58 ± 0.18 | 1.58 ± 0.18 |
phi-4 14B Microsoft | 25.91 ± 0.26 | 10.12 ± 0.42 | 10.12 ± 0.42 |
Babel 83B Alibaba-DAMO | 25.76 ± 0.30 | 4.01 ± 0.22 | 4.01 ± 0.22 |
Llama 3 8B Meta | 25.16 ± 0.15 | 2.64 ± 0.21 | 2.64 ± 0.21 |
Aya Expanse 8B CohereLabs | 17.22 ± 0.13 | 1.33 ± 0.20 | 1.33 ± 0.20 |
Ministral 2410 8B Mistral AI | 17.12 ± 0.22 | 1.32 ± 0.15 | 1.32 ± 0.15 |
Command R7B 12-2024 7B CohereLabs | 14.64 ± 0.20 | 0.64 ± 0.12 | 0.64 ± 0.12 |
Olmo 2 1124 13B AI2 | 14.08 ± 0.18 | 0.22 ± 0.08 | 0.22 ± 0.08 |
Olmo 2 1124 7B AI2 | 11.14 ± 0.19 | 0.22 ± 0.07 | 0.22 ± 0.07 |
Model | TH | NLG | Summarization | Translations |
---|---|---|---|---|
Qwen 3 Next 80B MoE Alibaba | 58.09 ± 0.09 | 57.19 ± 0.03 | 22.61 ± 0.07 | 91.78 ± 0.02 |
SEA-LION v4 (Qwen) 32B AISG | 57.91 ± 0.13 | 56.69 ± 0.05 | 23.30 ± 0.09 | 90.07 ± 0.04 |
Qwen 3 32B Alibaba | 56.98 ± 0.15 | 57.08 ± 0.06 | 23.68 ± 0.11 | 90.49 ± 0.03 |
Qwen 3 30B MoE Alibaba | 55.77 ± 0.11 | 55.78 ± 0.05 | 21.32 ± 0.08 | 90.23 ± 0.03 |
Qwen 3 14B Alibaba | 54.82 ± 0.15 | 54.76 ± 0.05 | 23.04 ± 0.10 | 86.48 ± 0.03 |
SEA-LION v3 (Llama) 70B AISG | 54.60 ± 0.15 | 58.58 ± 0.07 | 25.43 ± 0.14 | 91.72 ± 0.02 |
Tulu 3 70B AI2 | 54.25 ± 0.16 | 56.24 ± 0.07 | 21.70 ± 0.14 | 90.77 ± 0.02 |
Qwen 2.5 72B Alibaba | 53.80 ± 0.16 | 57.69 ± 0.07 | 24.08 ± 0.11 | 91.29 ± 0.03 |
SEA-LION v4 (Gemma) 27B AISG | 53.46 ± 0.14 | 58.31 ± 0.06 | 23.69 ± 0.11 | 92.94 ± 0.01 |
Gemma 3 27B Google | 52.88 ± 0.12 | 58.49 ± 0.05 | 23.95 ± 0.10 | 93.03 ± 0.01 |
Gemma 3 12B Google | 50.75 ± 0.13 | 57.95 ± 0.04 | 23.66 ± 0.07 | 92.24 ± 0.02 |
Qwen 3 8B Alibaba | 50.66 ± 0.14 | 55.96 ± 0.06 | 23.03 ± 0.12 | 88.89 ± 0.03 |
Llama 3.3 70B Meta | 50.05 ± 0.08 | 56.16 ± 0.06 | 23.02 ± 0.12 | 89.30 ± 0.04 |
Qwen 2.5 32B Alibaba | 50.00 ± 0.12 | 55.82 ± 0.07 | 23.67 ± 0.12 | 87.97 ± 0.04 |
Gemma 2 27B Google | 49.78 ± 0.17 | 55.24 ± 0.09 | 25.09 ± 0.17 | 85.38 ± 0.05 |
SEA-LION v3 (Gemma 2) 9B AISG | 48.95 ± 0.18 | 55.82 ± 0.07 | 24.01 ± 0.14 | 87.63 ± 0.05 |
Llama 4 Scout 109B MoE Meta | 48.44 ± 0.08 | 58.48 ± 0.05 | 25.85 ± 0.11 | 91.12 ± 0.02 |
Llama 3.1 70B Meta | 47.14 ± 0.20 | 57.19 ± 0.09 | 26.15 ± 0.16 | 88.23 ± 0.05 |
Qwen 2.5 14B Alibaba | 46.32 ± 0.13 | 55.29 ± 0.05 | 23.65 ± 0.09 | 86.94 ± 0.06 |
Mistral Large 2411 123B Mistral AI | 45.34 ± 0.17 | 55.05 ± 0.09 | 22.85 ± 0.17 | 87.24 ± 0.06 |
Gemma 2 9B Google | 44.65 ± 0.17 | 54.03 ± 0.09 | 25.10 ± 0.16 | 82.96 ± 0.08 |
ERNIE 4.5 21B MoE Baidu | 42.75 ± 0.18 | 51.34 ± 0.07 | 19.32 ± 0.13 | 83.36 ± 0.05 |
SEA-LION v3 (Llama) 8B AISG | 42.37 ± 0.19 | 56.43 ± 0.06 | 23.92 ± 0.11 | 88.94 ± 0.05 |
MERaLiON 2 10B A*STAR | 41.93 ± 0.15 | 51.99 ± 0.08 | 23.75 ± 0.16 | 80.22 ± 0.10 |
Qwen 2.5 7B Alibaba | 39.89 ± 0.16 | 53.26 ± 0.07 | 22.96 ± 0.10 | 83.56 ± 0.06 |
Tulu 3 8B AI2 | 39.85 ± 0.17 | 53.39 ± 0.07 | 23.87 ± 0.13 | 82.90 ± 0.08 |
Llama 3 70B Meta | 39.51 ± 0.09 | 56.71 ± 0.06 | 24.44 ± 0.12 | 88.98 ± 0.04 |
Sailor2 20B SAIL | 37.12 ± 0.15 | 19.63 ± 0.07 | 0.00 ± 0.00 | 39.27 ± 0.14 |
Sailor2 8B SAIL | 35.75 ± 0.15 | 41.06 ± 0.05 | 0.00 ± 0.00 | 82.13 ± 0.09 |
Command A 03-2025 111B CohereLabs | 34.58 ± 0.21 | 48.81 ± 0.10 | 21.18 ± 0.16 | 76.44 ± 0.10 |
Llama 3.1 8B Meta | 33.65 ± 0.18 | 45.13 ± 0.08 | 25.15 ± 0.13 | 65.11 ± 0.11 |
Olmo 2 0325 32B AI2 | 32.99 ± 0.22 | 29.75 ± 0.05 | 0.00 ± 0.00 | 59.50 ± 0.10 |
Command R+ 08-2024 104B CohereLabs | 32.19 ± 0.18 | 45.09 ± 0.09 | 19.53 ± 0.15 | 70.65 ± 0.10 |
Babel 9B Alibaba-DAMO | 32.17 ± 0.22 | 34.66 ± 0.15 | 20.62 ± 0.18 | 48.69 ± 0.21 |
Mistral Small 3.1 2503 24B Mistral AI | 31.49 ± 0.25 | 42.17 ± 0.08 | 20.35 ± 0.13 | 63.99 ± 0.13 |
Aya Expanse 32B CohereLabs | 30.47 ± 0.18 | 30.84 ± 0.04 | 0.00 ± 0.00 | 61.68 ± 0.09 |
SeaLLMs V3 7B Alibaba-DAMO | 29.69 ± 0.27 | 49.63 ± 0.10 | 21.66 ± 0.19 | 77.59 ± 0.10 |
Apertus 8B Swiss AI | 29.35 ± 0.30 | 50.93 ± 0.11 | 21.88 ± 0.20 | 79.98 ± 0.10 |
Apertus 70B Swiss AI | 29.18 ± 0.23 | 50.44 ± 0.09 | 19.80 ± 0.12 | 81.07 ± 0.14 |
Command R 08-2024 32B CohereLabs | 27.84 ± 0.22 | 36.14 ± 0.10 | 18.37 ± 0.14 | 53.92 ± 0.13 |
phi-4 14B Microsoft | 25.91 ± 0.26 | 40.26 ± 0.05 | 18.72 ± 0.08 | 61.80 ± 0.12 |
Babel 83B Alibaba-DAMO | 25.76 ± 0.30 | 28.60 ± 0.15 | 19.98 ± 0.21 | 37.22 ± 0.23 |
Llama 3 8B Meta | 25.16 ± 0.15 | 42.83 ± 0.08 | 22.97 ± 0.14 | 62.68 ± 0.04 |
Aya Expanse 8B CohereLabs | 17.22 ± 0.13 | 20.88 ± 0.05 | 0.00 ± 0.00 | 41.76 ± 0.09 |
Ministral 2410 8B Mistral AI | 17.12 ± 0.22 | 28.78 ± 0.16 | 14.87 ± 0.30 | 42.68 ± 0.14 |
Command R7B 12-2024 7B CohereLabs | 14.64 ± 0.20 | 26.34 ± 0.12 | 15.44 ± 0.17 | 37.23 ± 0.14 |
Olmo 2 1124 13B AI2 | 14.08 ± 0.18 | 22.11 ± 0.07 | 0.00 ± 0.00 | 44.22 ± 0.14 |
Olmo 2 1124 7B AI2 | 11.14 ± 0.19 | 16.53 ± 0.06 | 0.00 ± 0.00 | 33.05 ± 0.12 |
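Where a competency spans more than one task, its column in these tables is likewise consistent with the unweighted mean of the task scores; for Qwen 3 14B, NLG = (23.04 + 86.48) / 2 = 54.76. A small check under that same simple-averaging assumption:

```python
# NLG task scores for Qwen 3 14B, copied from the table above.
tasks = {"Summarization": 23.04, "Translations": 86.48}

# Unweighted mean of the task scores reproduces the NLG column.
nlg = sum(tasks.values()) / len(tasks)
print(round(nlg, 2))  # 54.76
```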
Model | TH | NLR | Causal Reasoning | Natural Language Inference |
---|---|---|---|---|
Qwen 3 Next 80B MoE Alibaba | 58.09 ± 0.09 | 64.32 ± 0.18 | 89.60 ± 0.22 | 39.05 ± 0.25 |
SEA-LION v4 (Qwen) 32B AISG | 57.91 ± 0.13 | 70.79 ± 0.13 | 91.83 ± 0.21 | 49.75 ± 0.15 |
Qwen 3 32B Alibaba | 56.98 ± 0.15 | 68.33 ± 0.15 | 89.40 ± 0.22 | 47.26 ± 0.23 |
Qwen 3 30B MoE Alibaba | 55.77 ± 0.11 | 70.43 ± 0.16 | 87.59 ± 0.23 | 53.27 ± 0.15 |
Qwen 3 14B Alibaba | 54.82 ± 0.15 | 57.57 ± 0.17 | 79.95 ± 0.27 | 35.19 ± 0.21 |
SEA-LION v3 (Llama) 70B AISG | 54.60 ± 0.15 | 71.78 ± 0.29 | 89.11 ± 0.47 | 54.45 ± 0.41 |
Tulu 3 70B AI2 | 54.25 ± 0.16 | 73.98 ± 0.20 | 90.43 ± 0.28 | 57.53 ± 0.30 |
Qwen 2.5 72B Alibaba | 53.80 ± 0.16 | 67.58 ± 0.16 | 93.47 ± 0.20 | 41.68 ± 0.22 |
SEA-LION v4 (Gemma) 27B AISG | 53.46 ± 0.14 | 67.76 ± 0.15 | 87.44 ± 0.25 | 48.08 ± 0.19 |
Gemma 3 27B Google | 52.88 ± 0.12 | 67.50 ± 0.17 | 87.69 ± 0.27 | 47.31 ± 0.20 |
Gemma 3 12B Google | 50.75 ± 0.13 | 65.88 ± 0.21 | 84.99 ± 0.27 | 46.76 ± 0.24 |
Qwen 3 8B Alibaba | 50.66 ± 0.14 | 48.08 ± 0.30 | 60.41 ± 0.46 | 35.74 ± 0.35 |
Llama 3.3 70B Meta | 50.05 ± 0.08 | 72.58 ± 0.12 | 91.56 ± 0.20 | 53.59 ± 0.13 |
Qwen 2.5 32B Alibaba | 50.00 ± 0.12 | 66.17 ± 0.09 | 89.44 ± 0.12 | 42.89 ± 0.16 |
Gemma 2 27B Google | 49.78 ± 0.17 | 66.81 ± 0.22 | 86.25 ± 0.34 | 47.36 ± 0.33 |
SEA-LION v3 (Gemma 2) 9B AISG | 48.95 ± 0.18 | 61.71 ± 0.26 | 84.60 ± 0.38 | 38.82 ± 0.31 |
Llama 4 Scout 109B MoE Meta | 48.44 ± 0.08 | 61.80 ± 0.07 | 87.21 ± 0.10 | 36.38 ± 0.10 |
Llama 3.1 70B Meta | 47.14 ± 0.20 | 69.32 ± 0.32 | 89.21 ± 0.44 | 49.43 ± 0.39 |
Qwen 2.5 14B Alibaba | 46.32 ± 0.13 | 56.04 ± 0.12 | 85.35 ± 0.08 | 26.74 ± 0.21 |
Mistral Large 2411 123B Mistral AI | 45.34 ± 0.17 | 55.08 ± 0.39 | 70.13 ± 0.71 | 40.03 ± 0.41 |
Gemma 2 9B Google | 44.65 ± 0.17 | 58.94 ± 0.25 | 82.97 ± 0.44 | 34.91 ± 0.32 |
ERNIE 4.5 21B MoE Baidu | 42.75 ± 0.18 | 50.37 ± 0.32 | 59.93 ± 0.44 | 40.81 ± 0.45 |
SEA-LION v3 (Llama) 8B AISG | 42.37 ± 0.19 | 54.50 ± 0.38 | 66.48 ± 0.74 | 42.51 ± 0.43 |
MERaLiON 2 10B A*STAR | 41.93 ± 0.15 | 53.74 ± 0.26 | 81.44 ± 0.43 | 26.04 ± 0.43 |
Qwen 2.5 7B Alibaba | 39.89 ± 0.16 | 52.36 ± 0.17 | 70.31 ± 0.20 | 34.42 ± 0.21 |
Tulu 3 8B AI2 | 39.85 ± 0.17 | 39.86 ± 0.50 | 46.07 ± 0.97 | 33.65 ± 0.39 |
Llama 3 70B Meta | 39.51 ± 0.09 | 61.46 ± 0.18 | 78.47 ± 0.35 | 44.46 ± 0.14 |
Sailor2 20B SAIL | 37.12 ± 0.15 | 73.58 ± 0.10 | 89.13 ± 0.14 | 58.02 ± 0.15 |
Sailor2 8B SAIL | 35.75 ± 0.15 | 64.04 ± 0.21 | 86.11 ± 0.30 | 41.98 ± 0.26 |
Command A 03-2025 111B CohereLabs | 34.58 ± 0.21 | 34.50 ± 0.38 | 69.00 ± 0.76 | 0.00 ± 0.00 |
Llama 3.1 8B Meta | 33.65 ± 0.18 | 37.28 ± 0.60 | 42.65 ± 1.06 | 31.91 ± 0.52 |
Olmo 2 0325 32B AI2 | 32.99 ± 0.22 | 40.11 ± 0.50 | 48.59 ± 0.89 | 31.63 ± 0.56 |
Command R+ 08-2024 104B CohereLabs | 32.19 ± 0.18 | 30.20 ± 0.57 | 30.41 ± 0.91 | 29.98 ± 0.46 |
Babel 9B Alibaba-DAMO | 32.17 ± 0.22 | 42.18 ± 0.56 | 67.44 ± 0.89 | 16.92 ± 0.69 |
Mistral Small 3.1 2503 24B Mistral AI | 31.49 ± 0.25 | 28.08 ± 0.54 | 42.79 ± 0.93 | 13.38 ± 0.34 |
Aya Expanse 32B CohereLabs | 30.47 ± 0.18 | 36.12 ± 0.29 | 38.29 ± 0.49 | 33.94 ± 0.33 |
SeaLLMs V3 7B Alibaba-DAMO | 29.69 ± 0.27 | 25.03 ± 0.87 | 34.84 ± 1.42 | 15.22 ± 0.93 |
Apertus 8B Swiss AI | 29.35 ± 0.30 | 24.27 ± 0.73 | 39.88 ± 1.15 | 8.65 ± 0.80 |
Apertus 70B Swiss AI | 29.18 ± 0.23 | 10.36 ± 0.28 | 0.00 ± 0.00 | 20.72 ± 0.56 |
Command R 08-2024 32B CohereLabs | 27.84 ± 0.22 | 28.59 ± 0.69 | 36.71 ± 1.23 | 20.47 ± 0.61 |
phi-4 14B Microsoft | 25.91 ± 0.26 | 14.32 ± 0.60 | 28.64 ± 1.19 | 0.00 ± 0.00 |
Babel 83B Alibaba-DAMO | 25.76 ± 0.30 | 35.10 ± 0.42 | 70.09 ± 0.80 | 0.10 ± 0.14 |
Llama 3 8B Meta | 25.16 ± 0.15 | 26.07 ± 0.65 | 33.49 ± 1.25 | 18.64 ± 0.31 |
Aya Expanse 8B CohereLabs | 17.22 ± 0.13 | 12.10 ± 0.38 | 10.31 ± 0.68 | 13.89 ± 0.30 |
Ministral 2410 8B Mistral AI | 17.12 ± 0.22 | 6.81 ± 0.40 | 0.00 ± 0.00 | 13.63 ± 0.81 |
Command R7B 12-2024 7B CohereLabs | 14.64 ± 0.20 | 4.26 ± 0.64 | 7.47 ± 1.14 | 1.05 ± 0.49 |
Olmo 2 1124 13B AI2 | 14.08 ± 0.18 | 5.75 ± 0.47 | 2.75 ± 0.66 | 8.75 ± 0.58 |
Olmo 2 1124 7B AI2 | 11.14 ± 0.19 | 0.35 ± 0.17 | 0.00 ± 0.00 | 0.70 ± 0.34 |
Model | TH | NLU | Question Answering | Sentiment Analysis |
---|---|---|---|---|
Qwen 3 Next 80B MoE Alibaba | 58.09 ± 0.09 | 61.42 ± 0.08 | 85.40 ± 0.11 | 37.45 ± 0.14 |
SEA-LION v4 (Qwen) 32B AISG | 57.91 ± 0.13 | 61.80 ± 0.13 | 87.85 ± 0.15 | 35.76 ± 0.20 |
Qwen 3 32B Alibaba | 56.98 ± 0.15 | 61.46 ± 0.15 | 86.83 ± 0.17 | 36.09 ± 0.20 |
Qwen 3 30B MoE Alibaba | 55.77 ± 0.11 | 60.37 ± 0.07 | 85.92 ± 0.07 | 34.82 ± 0.14 |
Qwen 3 14B Alibaba | 54.82 ± 0.15 | 62.11 ± 0.13 | 88.24 ± 0.19 | 35.98 ± 0.16 |
SEA-LION v3 (Llama) 70B AISG | 54.60 ± 0.15 | 59.42 ± 0.27 | 85.91 ± 0.41 | 32.94 ± 0.29 |
Tulu 3 70B AI2 | 54.25 ± 0.16 | 57.97 ± 0.24 | 85.90 ± 0.36 | 30.03 ± 0.35 |
Qwen 2.5 72B Alibaba | 53.80 ± 0.16 | 62.28 ± 0.21 | 84.84 ± 0.27 | 39.72 ± 0.23 |
SEA-LION v4 (Gemma) 27B AISG | 53.46 ± 0.14 | 61.79 ± 0.14 | 85.53 ± 0.18 | 38.04 ± 0.23 |
Gemma 3 27B Google | 52.88 ± 0.12 | 62.55 ± 0.12 | 86.26 ± 0.16 | 38.83 ± 0.19 |
Gemma 3 12B Google | 50.75 ± 0.13 | 60.60 ± 0.14 | 82.30 ± 0.15 | 38.90 ± 0.23 |
Qwen 3 8B Alibaba | 50.66 ± 0.14 | 59.80 ± 0.18 | 83.90 ± 0.31 | 35.70 ± 0.18 |
Llama 3.3 70B Meta | 50.05 ± 0.08 | 59.23 ± 0.20 | 85.72 ± 0.33 | 32.73 ± 0.21 |
Qwen 2.5 32B Alibaba | 50.00 ± 0.12 | 62.54 ± 0.08 | 87.36 ± 0.12 | 37.73 ± 0.13 |
Gemma 2 27B Google | 49.78 ± 0.17 | 61.38 ± 0.19 | 85.49 ± 0.39 | 37.26 ± 0.22 |
SEA-LION v3 (Gemma 2) 9B AISG | 48.95 ± 0.18 | 57.50 ± 0.26 | 82.69 ± 0.39 | 32.31 ± 0.30 |
Llama 4 Scout 109B MoE Meta | 48.44 ± 0.08 | 62.42 ± 0.11 | 88.72 ± 0.18 | 36.12 ± 0.09 |
Llama 3.1 70B Meta | 47.14 ± 0.20 | 56.65 ± 0.25 | 83.04 ± 0.46 | 30.26 ± 0.31 |
Qwen 2.5 14B Alibaba | 46.32 ± 0.13 | 59.24 ± 0.19 | 82.09 ± 0.30 | 36.39 ± 0.23 |
Mistral Large 2411 123B Mistral AI | 45.34 ± 0.17 | 60.51 ± 0.29 | 86.18 ± 0.56 | 34.83 ± 0.55 |
Gemma 2 9B Google | 44.65 ± 0.17 | 57.56 ± 0.22 | 84.09 ± 0.37 | 31.03 ± 0.21 |
ERNIE 4.5 21B MoE Baidu | 42.75 ± 0.18 | 56.34 ± 0.16 | 79.96 ± 0.28 | 32.73 ± 0.20 |
SEA-LION v3 (Llama) 8B AISG | 42.37 ± 0.19 | 54.52 ± 0.26 | 81.33 ± 0.37 | 27.70 ± 0.33 |
MERaLiON 2 10B A*STAR | 41.93 ± 0.15 | 54.24 ± 0.28 | 78.95 ± 0.45 | 29.52 ± 0.27 |
Qwen 2.5 7B Alibaba | 39.89 ± 0.16 | 59.37 ± 0.23 | 87.11 ± 0.30 | 31.64 ± 0.32 |
Tulu 3 8B AI2 | 39.85 ± 0.17 | 52.44 ± 0.19 | 81.50 ± 0.37 | 23.39 ± 0.25 |
Llama 3 70B Meta | 39.51 ± 0.09 | 59.08 ± 0.16 | 85.87 ± 0.24 | 32.30 ± 0.20 |
Sailor2 20B SAIL | 37.12 ± 0.15 | 43.51 ± 0.28 | 50.01 ± 0.45 | 37.01 ± 0.27 |
Sailor2 8B SAIL | 35.75 ± 0.15 | 31.60 ± 0.25 | 63.19 ± 0.50 | 0.00 ± 0.00 |
Command A 03-2025 111B CohereLabs | 34.58 ± 0.21 | 50.35 ± 0.45 | 80.27 ± 0.85 | 20.44 ± 0.59 |
Llama 3.1 8B Meta | 33.65 ± 0.18 | 55.03 ± 0.22 | 86.37 ± 0.39 | 23.69 ± 0.27 |
Olmo 2 0325 32B AI2 | 32.99 ± 0.22 | 40.64 ± 0.51 | 71.22 ± 0.72 | 10.05 ± 0.75 |
Command R+ 08-2024 104B CohereLabs | 32.19 ± 0.18 | 48.56 ± 0.34 | 76.05 ± 0.43 | 21.07 ± 0.52 |
Babel 9B Alibaba-DAMO | 32.17 ± 0.22 | 53.21 ± 0.39 | 83.05 ± 0.61 | 23.37 ± 0.38 |
Mistral Small 3.1 2503 24B Mistral AI | 31.49 ± 0.25 | 47.08 ± 0.34 | 74.23 ± 0.69 | 19.92 ± 0.34 |
Aya Expanse 32B CohereLabs | 30.47 ± 0.18 | 43.93 ± 0.33 | 63.87 ± 0.58 | 24.00 ± 0.42 |
SeaLLMs V3 7B Alibaba-DAMO | 29.69 ± 0.27 | 47.98 ± 0.47 | 74.77 ± 0.60 | 21.19 ± 0.65 |
Apertus 8B Swiss AI | 29.35 ± 0.30 | 37.51 ± 0.43 | 75.02 ± 0.86 | 0.00 ± 0.00 |
Apertus 70B Swiss AI | 29.18 ± 0.23 | 46.12 ± 0.47 | 70.48 ± 0.72 | 21.76 ± 0.72 |
Command R 08-2024 32B CohereLabs | 27.84 ± 0.22 | 45.35 ± 0.43 | 71.18 ± 0.60 | 19.53 ± 0.49 |
phi-4 14B Microsoft | 25.91 ± 0.26 | 48.46 ± 0.46 | 72.12 ± 0.57 | 24.79 ± 0.66 |
Babel 83B Alibaba-DAMO | 25.76 ± 0.30 | 44.09 ± 0.52 | 68.61 ± 0.69 | 19.58 ± 0.53 |
Llama 3 8B Meta | 25.16 ± 0.15 | 52.70 ± 0.17 | 80.98 ± 0.25 | 24.42 ± 0.24 |
Aya Expanse 8B CohereLabs | 17.22 ± 0.13 | 31.74 ± 0.30 | 53.95 ± 0.51 | 9.53 ± 0.46 |
Ministral 2410 8B Mistral AI | 17.12 ± 0.22 | 33.78 ± 0.49 | 60.72 ± 1.03 | 6.84 ± 0.49 |
Command R7B 12-2024 7B CohereLabs | 14.64 ± 0.20 | 24.34 ± 0.42 | 48.68 ± 0.84 | 0.00 ± 0.00 |
Olmo 2 1124 13B AI2 | 14.08 ± 0.18 | 31.48 ± 0.45 | 62.96 ± 0.91 | 0.00 ± 0.00 |
Olmo 2 1124 7B AI2 | 11.14 ± 0.19 | 21.70 ± 0.52 | 41.91 ± 0.94 | 1.50 ± 0.54 |
Model | TH | Safety | Toxicity Detection |
---|---|---|---|
Qwen 3 Next 80B MoE Alibaba | 58.09 ± 0.09 | 38.30 ± 0.22 | 38.30 ± 0.22 |
SEA-LION v4 (Qwen) 32B AISG | 57.91 ± 0.13 | 41.54 ± 0.17 | 41.54 ± 0.17 |
Qwen 3 32B Alibaba | 56.98 ± 0.15 | 41.21 ± 0.22 | 41.21 ± 0.22 |
Qwen 3 30B MoE Alibaba | 55.77 ± 0.11 | 29.71 ± 0.09 | 29.71 ± 0.09 |
Qwen 3 14B Alibaba | 54.82 ± 0.15 | 44.07 ± 0.12 | 44.07 ± 0.12 |
SEA-LION v3 (Llama) 70B AISG | 54.60 ± 0.15 | 33.82 ± 0.25 | 33.82 ± 0.25 |
Tulu 3 70B AI2 | 54.25 ± 0.16 | 42.23 ± 0.36 | 42.23 ± 0.36 |
Qwen 2.5 72B Alibaba | 53.80 ± 0.16 | 24.41 ± 0.19 | 24.41 ± 0.19 |
SEA-LION v4 (Gemma) 27B AISG | 53.46 ± 0.14 | 19.71 ± 0.17 | 19.71 ± 0.17 |
Gemma 3 27B Google | 52.88 ± 0.12 | 19.80 ± 0.15 | 19.80 ± 0.15 |
Gemma 3 12B Google | 50.75 ± 0.13 | 17.78 ± 0.14 | 17.78 ± 0.14 |
Qwen 3 8B Alibaba | 50.66 ± 0.14 | 44.91 ± 0.28 | 44.91 ± 0.28 |
Llama 3.3 70B Meta | 50.05 ± 0.08 | 22.06 ± 0.09 | 22.06 ± 0.09 |
Qwen 2.5 32B Alibaba | 50.00 ± 0.12 | 22.89 ± 0.10 | 22.89 ± 0.10 |
Gemma 2 27B Google | 49.78 ± 0.17 | 36.76 ± 0.24 | 36.76 ± 0.24 |
SEA-LION v3 (Gemma 2) 9B AISG | 48.95 ± 0.18 | 33.71 ± 0.25 | 33.71 ± 0.25 |
Llama 4 Scout 109B MoE Meta | 48.44 ± 0.08 | 14.15 ± 0.08 | 14.15 ± 0.08 |
Llama 3.1 70B Meta | 47.14 ± 0.20 | 20.34 ± 0.29 | 20.34 ± 0.29 |
Qwen 2.5 14B Alibaba | 46.32 ± 0.13 | 28.14 ± 0.14 | 28.14 ± 0.14 |
Mistral Large 2411 123B Mistral AI | 45.34 ± 0.17 | 24.51 ± 0.55 | 24.51 ± 0.55 |
Gemma 2 9B Google | 44.65 ± 0.17 | 28.22 ± 0.16 | 28.22 ± 0.16 |
ERNIE 4.5 21B MoE Baidu | 42.75 ± 0.18 | 29.44 ± 0.21 | 29.44 ± 0.21 |
SEA-LION v3 (Llama) 8B AISG | 42.37 ± 0.19 | 14.88 ± 0.21 | 14.88 ± 0.21 |
MERaLiON 2 10B A*STAR | 41.93 ± 0.15 | 27.42 ± 0.20 | 27.42 ± 0.20 |
Qwen 2.5 7B Alibaba | 39.89 ± 0.16 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Tulu 3 8B AI2 | 39.85 ± 0.17 | 34.42 ± 0.49 | 34.42 ± 0.49 |
Llama 3 70B Meta | 39.51 ± 0.09 | 31.73 ± 0.13 | 31.73 ± 0.13 |
Sailor2 20B SAIL | 37.12 ± 0.15 | 20.74 ± 0.09 | 20.74 ± 0.09 |
Sailor2 8B SAIL | 35.75 ± 0.15 | 22.44 ± 0.31 | 22.44 ± 0.31 |
Command A 03-2025 111B CohereLabs | 34.58 ± 0.21 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Llama 3.1 8B Meta | 33.65 ± 0.18 | 0.77 ± 0.24 | 0.77 ± 0.24 |
Olmo 2 0325 32B AI2 | 32.99 ± 0.22 | 34.85 ± 0.45 | 34.85 ± 0.45 |
Command R+ 08-2024 104B CohereLabs | 32.19 ± 0.18 | 28.38 ± 0.40 | 28.38 ± 0.40 |
Babel 9B Alibaba-DAMO | 32.17 ± 0.22 | 27.86 ± 0.84 | 27.86 ± 0.84 |
Mistral Small 3.1 2503 24B Mistral AI | 31.49 ± 0.25 | 24.41 ± 0.46 | 24.41 ± 0.46 |
Aya Expanse 32B CohereLabs | 30.47 ± 0.18 | 32.50 ± 0.13 | 32.50 ± 0.13 |
SeaLLMs V3 7B Alibaba-DAMO | 29.69 ± 0.27 | 20.54 ± 0.87 | 20.54 ± 0.87 |
Apertus 8B Swiss AI | 29.35 ± 0.30 | 24.55 ± 0.66 | 24.55 ± 0.66 |
Apertus 70B Swiss AI | 29.18 ± 0.23 | 25.43 ± 0.76 | 25.43 ± 0.76 |
Command R 08-2024 32B CohereLabs | 27.84 ± 0.22 | 28.60 ± 0.43 | 28.60 ± 0.43 |
phi-4 14B Microsoft | 25.91 ± 0.26 | 17.05 ± 0.41 | 17.05 ± 0.41 |
Babel 83B Alibaba-DAMO | 25.76 ± 0.30 | 0.10 ± 0.14 | 0.10 ± 0.14 |
Llama 3 8B Meta | 25.16 ± 0.15 | 11.47 ± 0.20 | 11.47 ± 0.20 |
Aya Expanse 8B CohereLabs | 17.22 ± 0.13 | 9.81 ± 0.51 | 9.81 ± 0.51 |
Ministral 2410 8B Mistral AI | 17.12 ± 0.22 | 17.31 ± 0.80 | 17.31 ± 0.80 |
Command R7B 12-2024 7B CohereLabs | 14.64 ± 0.20 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Olmo 2 1124 13B AI2 | 14.08 ± 0.18 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Olmo 2 1124 7B AI2 | 11.14 ± 0.19 | 0.07 ± 0.05 | 0.07 ± 0.05 |
Model | TH | Knowledge | thai_exam |
---|---|---|---|
Qwen 3 Next 80B MoE Alibaba | 58.09 ± 0.09 | 37.85 ± 0.32 | 37.85 ± 0.32 |
SEA-LION v4 (Qwen) 32B AISG | 57.91 ± 0.13 | 52.47 ± 0.16 | 52.47 ± 0.16 |
Qwen 3 32B Alibaba | 56.98 ± 0.15 | 48.54 ± 0.29 | 48.54 ± 0.29 |
Qwen 3 30B MoE Alibaba | 55.77 ± 0.11 | 41.11 ± 0.18 | 41.11 ± 0.18 |
Qwen 3 14B Alibaba | 54.82 ± 0.15 | 44.38 ± 0.20 | 44.38 ± 0.20 |
SEA-LION v3 (Llama) 70B AISG | 54.60 ± 0.15 | 50.83 ± 0.35 | 50.83 ± 0.35 |
Tulu 3 70B AI2 | 54.25 ± 0.16 | 47.33 ± 0.35 | 47.33 ± 0.35 |
Qwen 2.5 72B Alibaba | 53.80 ± 0.16 | 52.36 ± 0.25 | 52.36 ± 0.25 |
SEA-LION v4 (Gemma) 27B AISG | 53.46 ± 0.14 | 45.60 ± 0.24 | 45.60 ± 0.24 |
Gemma 3 27B Google | 52.88 ± 0.12 | 44.39 ± 0.18 | 44.39 ± 0.18 |
Gemma 3 12B Google | 50.75 ± 0.13 | 42.34 ± 0.22 | 42.34 ± 0.22 |
Qwen 3 8B Alibaba | 50.66 ± 0.14 | 39.94 ± 0.31 | 39.94 ± 0.31 |
Llama 3.3 70B Meta | 50.05 ± 0.08 | 48.51 ± 0.15 | 48.51 ± 0.15 |
Qwen 2.5 32B Alibaba | 50.00 ± 0.12 | 48.18 ± 0.12 | 48.18 ± 0.12 |
Gemma 2 27B Google | 49.78 ± 0.17 | 44.97 ± 0.37 | 44.97 ± 0.37 |
SEA-LION v3 (Gemma 2) 9B AISG | 48.95 ± 0.18 | 42.27 ± 0.39 | 42.27 ± 0.39 |
Llama 4 Scout 109B MoE Meta | 48.44 ± 0.08 | 44.05 ± 0.23 | 44.05 ± 0.23 |
Llama 3.1 70B Meta | 47.14 ± 0.20 | 47.69 ± 0.34 | 47.69 ± 0.34 |
Qwen 2.5 14B Alibaba | 46.32 ± 0.13 | 43.48 ± 0.23 | 43.48 ± 0.23 |
Mistral Large 2411 123B Mistral AI | 45.34 ± 0.17 | 38.86 ± 0.48 | 38.86 ± 0.48 |
Gemma 2 9B Google | 44.65 ± 0.17 | 40.07 ± 0.42 | 40.07 ± 0.42 |
ERNIE 4.5 21B MoE Baidu | 42.75 ± 0.18 | 34.09 ± 0.31 | 34.09 ± 0.31 |
SEA-LION v3 (Llama) 8B AISG | 42.37 ± 0.19 | 28.87 ± 0.47 | 28.87 ± 0.47 |
MERaLiON 2 10B A*STAR | 41.93 ± 0.15 | 38.89 ± 0.38 | 38.89 ± 0.38 |
Qwen 2.5 7B Alibaba | 39.89 ± 0.16 | 36.03 ± 0.27 | 36.03 ± 0.27 |
Tulu 3 8B AI2 | 39.85 ± 0.17 | 23.91 ± 0.34 | 23.91 ± 0.34 |
Llama 3 70B Meta | 39.51 ± 0.09 | 42.89 ± 0.24 | 42.89 ± 0.24 |
Sailor2 20B SAIL | 37.12 ± 0.15 | 43.19 ± 0.23 | 43.19 ± 0.23 |
Sailor2 8B SAIL | 35.75 ± 0.15 | 35.04 ± 0.30 | 35.04 ± 0.30 |
Command A 03-2025 111B CohereLabs | 34.58 ± 0.21 | 27.56 ± 0.53 | 27.56 ± 0.53 |
Llama 3.1 8B Meta | 33.65 ± 0.18 | 27.94 ± 0.41 | 27.94 ± 0.41 |
Olmo 2 0325 32B AI2 | 32.99 ± 0.22 | 21.85 ± 0.68 | 21.85 ± 0.68 |
Command R+ 08-2024 104B CohereLabs | 32.19 ± 0.18 | 19.03 ± 0.48 | 19.03 ± 0.48 |
Babel 9B Alibaba-DAMO | 32.17 ± 0.22 | 19.57 ± 0.60 | 19.57 ± 0.60 |
Mistral Small 3.1 2503 24B Mistral AI | 31.49 ± 0.25 | 25.20 ± 0.75 | 25.20 ± 0.75 |
Aya Expanse 32B CohereLabs | 30.47 ± 0.18 | 16.44 ± 0.26 | 16.44 ± 0.26 |
SeaLLMs V3 7B Alibaba-DAMO | 29.69 ± 0.27 | 14.54 ± 0.76 | 14.54 ± 0.76 |
Apertus 8B Swiss AI | 29.35 ± 0.30 | 12.24 ± 0.95 | 12.24 ± 0.95 |
Apertus 70B Swiss AI | 29.18 ± 0.23 | 7.98 ± 0.83 | 7.98 ± 0.83 |
Command R 08-2024 32B CohereLabs | 27.84 ± 0.22 | 13.33 ± 0.53 | 13.33 ± 0.53 |
phi-4 14B Microsoft | 25.91 ± 0.26 | 0.85 ± 0.29 | 0.85 ± 0.29 |
Babel 83B Alibaba-DAMO | 25.76 ± 0.30 | 31.92 ± 0.85 | 31.92 ± 0.85 |
Llama 3 8B Meta | 25.16 ± 0.15 | 23.88 ± 0.39 | 23.88 ± 0.39 |
Aya Expanse 8B CohereLabs | 17.22 ± 0.13 | 9.96 ± 0.33 | 9.96 ± 0.33 |
Ministral 2410 8B Mistral AI | 17.12 ± 0.22 | 7.77 ± 0.75 | 7.77 ± 0.75 |
Command R7B 12-2024 7B CohereLabs | 14.64 ± 0.20 | 4.57 ± 0.51 | 4.57 ± 0.51 |
Olmo 2 1124 13B AI2 | 14.08 ± 0.18 | 0.01 ± 0.02 | 0.01 ± 0.02 |
Olmo 2 1124 7B AI2 | 11.14 ± 0.19 | 0.00 ± 0.00 | 0.00 ± 0.00 |