Burmese Performance
Burmese Scores by Model
Average of 30 bootstraps; ± values are 95% confidence intervals (CIs).
Model Size: ≤200B
Open instruct models only
[Bar chart: overall Burmese (MY) score ± 95% CI per model, sorted from highest to lowest. Model labels were lost in extraction; the same values appear in the MY column of the Burmese Competencies table.]
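The page does not say how the 30 bootstraps or the intervals are computed. A minimal sketch, assuming each bootstrap resamples a task's items with replacement and the ± value is a normal-approximation 95% interval for the mean of the 30 bootstrap scores (1.96 × standard deviation / √30). `bootstrap_mean_ci` and the toy item data are hypothetical.

```python
import random
import statistics


def bootstrap_mean_ci(item_scores, n_boot=30, z=1.96, seed=0):
    """Return (mean, half_width) of a task score over n_boot bootstrap resamples.

    Assumptions (not documented by the leaderboard): items are resampled with
    replacement, each resample is scored as percent correct, and the half-width
    is a normal-approximation 95% CI for the mean of the bootstrap scores.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(item_scores)
    boot_scores = []
    for _ in range(n_boot):
        resample = rng.choices(item_scores, k=n)  # draw n items with replacement
        boot_scores.append(100 * sum(resample) / n)
    mean = statistics.fmean(boot_scores)
    half_width = z * statistics.stdev(boot_scores) / n_boot ** 0.5
    return mean, half_width


# Hypothetical per-item results (1 = correct) for a 400-item task.
items = [1] * 250 + [0] * 150
mean, half_width = bootstrap_mean_ci(items)
print(f"{mean:.2f} ± {half_width:.2f}")
```

Other interval choices (for example, percentiles of the bootstrap scores) would give wider intervals; the sketch only shows the general shape of a bootstrap average with a CI.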
Burmese Competencies
Settings as above. MY is the overall Burmese score; NLG, NLR and NLU stand for natural language generation, natural language reasoning and natural language understanding.
Model | MY | Instruction Following | Multi-Turn Chat | NLG | NLR | NLU | Safety | Knowledge |
|---|---|---|---|---|---|---|---|---|
SEA-LION v4 (Qwen) 32B AISG | 49.56 ± 0.14 | 61.20 ± 0.83 | 15.72 ± 0.37 | 43.78 ± 0.10 | 61.75 ± 0.15 | 61.01 ± 0.12 | 53.48 ± 0.31 | 49.98 ± 0.18 |
Gemma 3 27B | 48.14 ± 0.17 | 63.03 ± 0.80 | 34.28 ± 0.79 | 46.30 ± 0.13 | 55.45 ± 0.29 | 58.49 ± 0.18 | 34.35 ± 0.21 | 45.09 ± 0.25 |
SEA-LION v4 (Gemma) 27B AISG | 47.18 ± 0.15 | 61.57 ± 0.73 | 31.95 ± 0.79 | 44.63 ± 0.13 | 55.03 ± 0.25 | 58.33 ± 0.24 | 33.82 ± 0.29 | 44.92 ± 0.36 |
Llama 4 Scout 109B MoE Meta | 45.54 ± 0.17 | 63.80 ± 0.83 | 12.57 ± 0.43 | 53.34 ± 0.13 | 51.07 ± 0.18 | 44.56 ± 0.24 | 45.38 ± 0.20 | 48.08 ± 0.22 |
Qwen 3 Next 80B MoE Alibaba | 44.88 ± 0.16 | 61.17 ± 0.75 | 18.89 ± 0.49 | 40.12 ± 0.08 | 53.02 ± 0.30 | 58.39 ± 0.24 | 37.85 ± 0.34 | 44.72 ± 0.38 |
Qwen 3 32B Alibaba | 44.60 ± 0.19 | 64.60 ± 0.98 | 13.36 ± 0.46 | 35.01 ± 0.14 | 56.38 ± 0.29 | 57.92 ± 0.39 | 43.53 ± 0.56 | 41.41 ± 0.48 |
Gemma 3 12B | 42.46 ± 0.15 | 56.37 ± 0.74 | 25.50 ± 0.68 | 44.87 ± 0.14 | 35.58 ± 0.41 | 58.03 ± 0.23 | 40.15 ± 0.30 | 36.75 ± 0.31 |
Qwen 3 VL 32B Alibaba | 40.89 ± 0.19 | 58.20 ± 1.00 | 12.93 ± 0.46 | 33.40 ± 0.14 | 46.49 ± 0.27 | 60.61 ± 0.15 | 32.85 ± 0.40 | 41.73 ± 0.35 |
SEA-LION v3 (Llama) 70B AISG | 38.21 ± 0.28 | 70.83 ± 0.99 | 11.16 ± 0.48 | 44.27 ± 0.12 | 49.46 ± 0.52 | 36.65 ± 0.35 | 20.88 ± 1.13 | 34.19 ± 0.70 |
Tulu 3 70B AI2 | 35.11 ± 0.17 | 53.93 ± 0.97 | 7.59 ± 0.40 | 46.29 ± 0.14 | 50.27 ± 0.37 | 50.68 ± 0.40 | 0.00 ± 0.00 | 37.05 ± 0.37 |
Qwen 3 14B Alibaba | 32.46 ± 0.13 | 46.47 ± 0.73 | 8.99 ± 0.30 | 37.31 ± 0.11 | 24.78 ± 0.33 | 49.11 ± 0.30 | 33.15 ± 0.35 | 27.40 ± 0.43 |
SEA-LION v4 (Qwen VL) 8B AISG | 30.67 ± 0.19 | 40.57 ± 0.65 | 10.83 ± 0.32 | 33.60 ± 0.14 | 31.02 ± 0.25 | 47.22 ± 0.32 | 18.98 ± 0.63 | 32.47 ± 0.34 |
Qwen 3 VL 8B Alibaba | 30.25 ± 0.22 | 39.80 ± 1.09 | 8.58 ± 0.34 | 33.11 ± 0.13 | 34.45 ± 0.43 | 45.00 ± 0.43 | 22.10 ± 0.43 | 28.73 ± 0.39 |
Qwen 2.5 72B Alibaba | 27.54 ± 0.20 | 50.17 ± 1.07 | 8.52 ± 0.32 | 34.00 ± 0.15 | 9.00 ± 0.18 | 44.24 ± 0.25 | 21.88 ± 0.48 | 25.00 ± 0.38 |
Qwen 2.5 32B Alibaba | 26.99 ± 0.17 | 50.77 ± 0.83 | 6.38 ± 0.35 | 31.71 ± 0.08 | 9.64 ± 0.38 | 37.70 ± 0.41 | 25.07 ± 0.37 | 27.66 ± 0.57 |
Qwen 3 8B Alibaba | 26.26 ± 0.20 | 38.50 ± 1.05 | 8.91 ± 0.42 | 34.00 ± 0.10 | 16.93 ± 0.33 | 36.93 ± 0.33 | 31.78 ± 0.31 | 16.75 ± 0.47 |
Mistral Large 2411 123B Mistral AI | 26.17 ± 0.29 | 49.90 ± 0.97 | 5.80 ± 0.38 | 37.38 ± 0.20 | 8.88 ± 0.74 | 38.82 ± 0.68 | 30.02 ± 1.52 | 12.36 ± 0.59 |
Qwen 3 30B MoE Alibaba | 25.62 ± 0.12 | 53.30 ± 0.66 | 13.71 ± 0.42 | 18.06 ± 0.08 | 22.03 ± 0.24 | 50.08 ± 0.24 | 22.18 ± 0.22 | 0.00 ± 0.00 |
Gemma 2 27B | 23.97 ± 0.24 | 47.47 ± 1.11 | 5.45 ± 0.24 | 35.86 ± 0.11 | 28.94 ± 0.56 | 26.38 ± 0.91 | 0.00 ± 0.00 | 23.70 ± 0.52 |
SEA-LION v4 (Qwen VL) 4B AISG | 23.31 ± 0.17 | 33.43 ± 1.01 | 5.53 ± 0.36 | 12.77 ± 0.06 | 25.03 ± 0.23 | 37.18 ± 0.16 | 24.03 ± 0.47 | 25.20 ± 0.32 |
Llama 3.3 70B Meta | 23.07 ± 0.15 | 60.33 ± 0.91 | 7.05 ± 0.35 | 44.19 ± 0.14 | 21.25 ± 0.16 | 1.85 ± 0.36 | 22.03 ± 0.67 | 4.80 ± 0.22 |
Qwen 3 VL 4B Alibaba | 20.42 ± 0.16 | 33.33 ± 0.96 | 5.00 ± 0.29 | 17.45 ± 0.05 | 22.15 ± 0.34 | 40.57 ± 0.20 | 2.97 ± 0.29 | 21.51 ± 0.38 |
Llama 3.1 70B Meta | 19.87 ± 0.24 | 48.23 ± 1.25 | 6.31 ± 0.47 | 42.86 ± 0.17 | 17.54 ± 0.35 | 18.81 ± 0.65 | 0.00 ± 0.00 | 5.34 ± 0.45 |
SEA-LION v3 (Llama) 8B AISG | 17.88 ± 0.26 | 49.23 ± 0.88 | 4.35 ± 0.31 | 9.77 ± 0.16 | 10.73 ± 0.72 | 29.80 ± 0.62 | 7.25 ± 1.14 | 14.03 ± 0.57 |
ERNIE 4.5 21B MoE Baidu | 17.70 ± 0.15 | 48.43 ± 0.97 | 13.26 ± 0.46 | 41.31 ± 0.12 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 20.91 ± 0.41 |
Command A 03-2025 111B CohereLabs | 16.52 ± 0.23 | 48.23 ± 1.06 | 6.82 ± 0.40 | 21.33 ± 0.19 | 9.28 ± 0.29 | 23.73 ± 0.65 | 0.00 ± 0.00 | 6.21 ± 0.45 |
SEA-LION v3 (Gemma 2) 9B AISG | 15.40 ± 0.22 | 50.00 ± 1.19 | 8.99 ± 0.39 | 33.97 ± 0.17 | 12.55 ± 0.18 | 2.26 ± 0.67 | 0.00 ± 0.00 | 0.06 ± 0.05 |
Qwen 2.5 14B Alibaba | 13.59 ± 0.19 | 43.73 ± 1.17 | 4.89 ± 0.28 | 26.98 ± 0.07 | 2.18 ± 0.34 | 16.98 ± 0.34 | 0.00 ± 0.00 | 0.36 ± 0.07 |
Llama 3 70B Meta | 13.09 ± 0.17 | 11.03 ± 0.58 | 5.47 ± 0.39 | 24.94 ± 0.05 | 34.05 ± 0.39 | 7.67 ± 0.56 | 0.00 ± 0.00 | 8.48 ± 0.40 |
Apertus 70B Swiss AI | 13.09 ± 0.25 | 35.57 ± 1.21 | 7.77 ± 0.54 | 36.72 ± 0.10 | 0.00 ± 0.00 | 10.89 ± 0.71 | 0.00 ± 0.00 | 0.68 ± 0.12 |
Sailor2 8B SAIL | 11.65 ± 0.13 | 25.47 ± 0.73 | 9.64 ± 0.41 | 24.38 ± 0.07 | 21.87 ± 0.39 | 0.13 ± 0.13 | 0.00 ± 0.00 | 0.04 ± 0.03 |
Tulu 3 8B AI2 | 10.95 ± 0.14 | 38.53 ± 0.71 | 1.26 ± 0.15 | 29.48 ± 0.09 | 0.68 ± 0.06 | 2.94 ± 0.32 | 0.00 ± 0.00 | 3.77 ± 0.26 |
MERaLiON 2 10B A*STAR | 10.66 ± 0.16 | 36.33 ± 0.96 | 3.36 ± 0.27 | 27.41 ± 0.17 | 6.30 ± 0.15 | 0.03 ± 0.07 | 0.00 ± 0.00 | 1.20 ± 0.23 |
Babel 83B Alibaba-DAMO | 9.87 ± 0.23 | 24.57 ± 0.95 | 0.98 ± 0.16 | 13.08 ± 0.13 | 0.20 ± 0.16 | 19.00 ± 1.16 | 0.00 ± 0.00 | 11.23 ± 0.70 |
Gemma 2 9B | 9.63 ± 0.17 | 34.27 ± 1.15 | 3.19 ± 0.33 | 24.86 ± 0.15 | 4.58 ± 0.32 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.54 ± 0.10 |
Apertus 8B Swiss AI | 9.30 ± 0.21 | 28.73 ± 1.05 | 3.20 ± 0.38 | 26.95 ± 0.16 | 0.13 ± 0.11 | 3.35 ± 0.74 | 0.00 ± 0.00 | 2.72 ± 0.29 |
Sailor2 20B SAIL | 8.55 ± 0.11 | 24.53 ± 0.84 | 8.89 ± 0.38 | 26.43 ± 0.07 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Llama 3.1 8B Meta | 8.45 ± 0.19 | 26.67 ± 1.04 | 2.72 ± 0.28 | 23.71 ± 0.18 | 0.00 ± 0.00 | 3.74 ± 0.51 | 0.00 ± 0.00 | 2.32 ± 0.25 |
Qwen 2.5 7B Alibaba | 8.04 ± 0.15 | 35.80 ± 0.91 | 2.18 ± 0.26 | 15.29 ± 0.12 | 2.98 ± 0.41 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Babel 9B Alibaba-DAMO | 8.01 ± 0.18 | 23.87 ± 1.10 | 3.03 ± 0.28 | 17.92 ± 0.16 | 0.00 ± 0.00 | 11.20 ± 0.31 | 0.00 ± 0.00 | 0.04 ± 0.03 |
SeaLLMs V3 7B Alibaba-DAMO | 7.13 ± 0.16 | 27.63 ± 1.11 | 3.82 ± 0.35 | 18.27 ± 0.17 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.19 ± 0.06 |
Command R+ 08-2024 104B CohereLabs | 6.61 ± 0.18 | 22.87 ± 1.13 | 0.80 ± 0.13 | 20.97 ± 0.12 | 0.00 ± 0.00 | 0.91 ± 0.34 | 0.00 ± 0.00 | 0.73 ± 0.15 |
phi-4 14B Microsoft | 6.45 ± 0.18 | 20.90 ± 0.96 | 1.90 ± 0.34 | 13.49 ± 0.21 | 0.00 ± 0.00 | 8.89 ± 0.35 | 0.00 ± 0.00 | 0.01 ± 0.02 |
Aya Expanse 32B CohereLabs | 6.44 ± 0.15 | 29.80 ± 0.91 | 1.39 ± 0.17 | 10.33 ± 0.09 | 0.25 ± 0.17 | 0.00 ± 0.00 | 0.00 ± 0.00 | 3.30 ± 0.27 |
Command R 08-2024 32B CohereLabs | 4.87 ± 0.15 | 17.83 ± 0.90 | 0.33 ± 0.13 | 13.36 ± 0.11 | 0.68 ± 0.34 | 0.54 ± 0.31 | 0.00 ± 0.00 | 1.33 ± 0.19 |
Olmo 2 0325 32B AI2 | 4.38 ± 0.15 | 22.43 ± 1.02 | 0.78 ± 0.16 | 7.36 ± 0.06 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.07 ± 0.04 |
Ministral 2410 8B Mistral AI | 3.90 ± 0.13 | 10.33 ± 0.82 | 0.88 ± 0.18 | 15.78 ± 0.13 | 0.15 ± 0.12 | 0.03 ± 0.05 | 0.00 ± 0.00 | 0.16 ± 0.09 |
Olmo 3 7B AI2 | 3.41 ± 0.11 | 15.87 ± 0.77 | 0.39 ± 0.12 | 7.56 ± 0.08 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.03 ± 0.02 |
Llama 3 8B Meta | 3.20 ± 0.06 | 8.87 ± 0.44 | 2.07 ± 0.25 | 11.47 ± 0.08 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.01 |
Command R7B 12-2024 7B CohereLabs | 3.19 ± 0.13 | 12.37 ± 0.89 | 0.03 ± 0.04 | 9.89 ± 0.16 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.05 ± 0.03 |
Aya Expanse 8B CohereLabs | 3.03 ± 0.13 | 19.77 ± 0.88 | 0.00 ± 0.00 | 1.41 ± 0.03 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Mistral Small 3.1 2503 24B Mistral AI | 2.22 ± 0.11 | 10.03 ± 0.82 | 0.50 ± 0.13 | 4.91 ± 0.09 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.10 ± 0.05 |
Olmo 2 1124 7B AI2 | 2.18 ± 0.12 | 11.90 ± 0.89 | 0.13 ± 0.07 | 3.22 ± 0.05 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Olmo 2 1124 13B AI2 | 1.84 ± 0.09 | 11.90 ± 0.65 | 0.43 ± 0.00 | 0.52 ± 0.02 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.03 ± 0.03 |
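The overall MY score in each row appears to be the unweighted mean of the seven competency scores. The page does not state this, but the rounded values agree. A quick check on three rows copied from the table above; `overall_score` is a hypothetical helper encoding that inferred rule.

```python
from statistics import fmean

# Seven competency scores per model, in table column order:
# Instruction Following, Multi-Turn Chat, NLG, NLR, NLU, Safety, Knowledge.
ROWS = {
    "SEA-LION v4 (Qwen) 32B AISG": ([61.20, 15.72, 43.78, 61.75, 61.01, 53.48, 49.98], 49.56),
    "Gemma 3 27B": ([63.03, 34.28, 46.30, 55.45, 58.49, 34.35, 45.09], 48.14),
    "Olmo 2 1124 13B AI2": ([11.90, 0.43, 0.52, 0.00, 0.00, 0.00, 0.03], 1.84),
}


def overall_score(competency_scores):
    """Unweighted mean of competency scores (inferred aggregation rule)."""
    return fmean(competency_scores)


for model, (scores, reported_my) in ROWS.items():
    print(f"{model}: recomputed {overall_score(scores):.2f}, reported {reported_my:.2f}")
```

Small differences in the second decimal are expected for other rows, since each published figure is a separately rounded bootstrap average.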
Burmese Tasks
Model | MY | Instruction Following | SEA-IFEval |
|---|---|---|---|
SEA-LION v4 (Qwen) 32B AISG | 49.56 ± 0.14 | 61.20 ± 0.83 | 61.20 ± 0.83 |
Gemma 3 27B | 48.14 ± 0.17 | 63.03 ± 0.80 | 63.03 ± 0.80 |
SEA-LION v4 (Gemma) 27B AISG | 47.18 ± 0.15 | 61.57 ± 0.73 | 61.57 ± 0.73 |
Llama 4 Scout 109B MoE Meta | 45.54 ± 0.17 | 63.80 ± 0.83 | 63.80 ± 0.83 |
Qwen 3 Next 80B MoE Alibaba | 44.88 ± 0.16 | 61.17 ± 0.75 | 61.17 ± 0.75 |
Qwen 3 32B Alibaba | 44.60 ± 0.19 | 64.60 ± 0.98 | 64.60 ± 0.98 |
Gemma 3 12B | 42.46 ± 0.15 | 56.37 ± 0.74 | 56.37 ± 0.74 |
Qwen 3 VL 32B Alibaba | 40.89 ± 0.19 | 58.20 ± 1.00 | 58.20 ± 1.00 |
SEA-LION v3 (Llama) 70B AISG | 38.21 ± 0.28 | 70.83 ± 0.99 | 70.83 ± 0.99 |
Tulu 3 70B AI2 | 35.11 ± 0.17 | 53.93 ± 0.97 | 53.93 ± 0.97 |
Qwen 3 14B Alibaba | 32.46 ± 0.13 | 46.47 ± 0.73 | 46.47 ± 0.73 |
SEA-LION v4 (Qwen VL) 8B AISG | 30.67 ± 0.19 | 40.57 ± 0.65 | 40.57 ± 0.65 |
Qwen 3 VL 8B Alibaba | 30.25 ± 0.22 | 39.80 ± 1.09 | 39.80 ± 1.09 |
Qwen 2.5 72B Alibaba | 27.54 ± 0.20 | 50.17 ± 1.07 | 50.17 ± 1.07 |
Qwen 2.5 32B Alibaba | 26.99 ± 0.17 | 50.77 ± 0.83 | 50.77 ± 0.83 |
Qwen 3 8B Alibaba | 26.26 ± 0.20 | 38.50 ± 1.05 | 38.50 ± 1.05 |
Mistral Large 2411 123B Mistral AI | 26.17 ± 0.29 | 49.90 ± 0.97 | 49.90 ± 0.97 |
Qwen 3 30B MoE Alibaba | 25.62 ± 0.12 | 53.30 ± 0.66 | 53.30 ± 0.66 |
Gemma 2 27B | 23.97 ± 0.24 | 47.47 ± 1.11 | 47.47 ± 1.11 |
SEA-LION v4 (Qwen VL) 4B AISG | 23.31 ± 0.17 | 33.43 ± 1.01 | 33.43 ± 1.01 |
Llama 3.3 70B Meta | 23.07 ± 0.15 | 60.33 ± 0.91 | 60.33 ± 0.91 |
Qwen 3 VL 4B Alibaba | 20.42 ± 0.16 | 33.33 ± 0.96 | 33.33 ± 0.96 |
Llama 3.1 70B Meta | 19.87 ± 0.24 | 48.23 ± 1.25 | 48.23 ± 1.25 |
SEA-LION v3 (Llama) 8B AISG | 17.88 ± 0.26 | 49.23 ± 0.88 | 49.23 ± 0.88 |
ERNIE 4.5 21B MoE Baidu | 17.70 ± 0.15 | 48.43 ± 0.97 | 48.43 ± 0.97 |
Command A 03-2025 111B CohereLabs | 16.52 ± 0.23 | 48.23 ± 1.06 | 48.23 ± 1.06 |
SEA-LION v3 (Gemma 2) 9B AISG | 15.40 ± 0.22 | 50.00 ± 1.19 | 50.00 ± 1.19 |
Qwen 2.5 14B Alibaba | 13.59 ± 0.19 | 43.73 ± 1.17 | 43.73 ± 1.17 |
Llama 3 70B Meta | 13.09 ± 0.17 | 11.03 ± 0.58 | 11.03 ± 0.58 |
Apertus 70B Swiss AI | 13.09 ± 0.25 | 35.57 ± 1.21 | 35.57 ± 1.21 |
Sailor2 8B SAIL | 11.65 ± 0.13 | 25.47 ± 0.73 | 25.47 ± 0.73 |
Tulu 3 8B AI2 | 10.95 ± 0.14 | 38.53 ± 0.71 | 38.53 ± 0.71 |
MERaLiON 2 10B A*STAR | 10.66 ± 0.16 | 36.33 ± 0.96 | 36.33 ± 0.96 |
Babel 83B Alibaba-DAMO | 9.87 ± 0.23 | 24.57 ± 0.95 | 24.57 ± 0.95 |
Gemma 2 9B | 9.63 ± 0.17 | 34.27 ± 1.15 | 34.27 ± 1.15 |
Apertus 8B Swiss AI | 9.30 ± 0.21 | 28.73 ± 1.05 | 28.73 ± 1.05 |
Sailor2 20B SAIL | 8.55 ± 0.11 | 24.53 ± 0.84 | 24.53 ± 0.84 |
Llama 3.1 8B Meta | 8.45 ± 0.19 | 26.67 ± 1.04 | 26.67 ± 1.04 |
Qwen 2.5 7B Alibaba | 8.04 ± 0.15 | 35.80 ± 0.91 | 35.80 ± 0.91 |
Babel 9B Alibaba-DAMO | 8.01 ± 0.18 | 23.87 ± 1.10 | 23.87 ± 1.10 |
SeaLLMs V3 7B Alibaba-DAMO | 7.13 ± 0.16 | 27.63 ± 1.11 | 27.63 ± 1.11 |
Command R+ 08-2024 104B CohereLabs | 6.61 ± 0.18 | 22.87 ± 1.13 | 22.87 ± 1.13 |
phi-4 14B Microsoft | 6.45 ± 0.18 | 20.90 ± 0.96 | 20.90 ± 0.96 |
Aya Expanse 32B CohereLabs | 6.44 ± 0.15 | 29.80 ± 0.91 | 29.80 ± 0.91 |
Command R 08-2024 32B CohereLabs | 4.87 ± 0.15 | 17.83 ± 0.90 | 17.83 ± 0.90 |
Olmo 2 0325 32B AI2 | 4.38 ± 0.15 | 22.43 ± 1.02 | 22.43 ± 1.02 |
Ministral 2410 8B Mistral AI | 3.90 ± 0.13 | 10.33 ± 0.82 | 10.33 ± 0.82 |
Olmo 3 7B AI2 | 3.41 ± 0.11 | 15.87 ± 0.77 | 15.87 ± 0.77 |
Llama 3 8B Meta | 3.20 ± 0.06 | 8.87 ± 0.44 | 8.87 ± 0.44 |
Command R7B 12-2024 7B CohereLabs | 3.19 ± 0.13 | 12.37 ± 0.89 | 12.37 ± 0.89 |
Aya Expanse 8B CohereLabs | 3.03 ± 0.13 | 19.77 ± 0.88 | 19.77 ± 0.88 |
Mistral Small 3.1 2503 24B Mistral AI | 2.22 ± 0.11 | 10.03 ± 0.82 | 10.03 ± 0.82 |
Olmo 2 1124 7B AI2 | 2.18 ± 0.12 | 11.90 ± 0.89 | 11.90 ± 0.89 |
Olmo 2 1124 13B AI2 | 1.84 ± 0.09 | 11.90 ± 0.65 | 11.90 ± 0.65 |
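Because every score carries a 95% CI, adjacent ranks are not always meaningfully different. A simple, conservative reading is to treat two models as distinguishable only when their intervals do not overlap; overlapping intervals do not by themselves prove the models are equal. The helpers below are a hypothetical sketch applied to three Instruction Following scores from the table above.

```python
def ci_bounds(score, half_width):
    """Lower and upper ends of a 'score ± half_width' interval."""
    return score - half_width, score + half_width


def intervals_overlap(a, b):
    """True if two (score, half_width) intervals share at least one value."""
    lo_a, hi_a = ci_bounds(*a)
    lo_b, hi_b = ci_bounds(*b)
    return lo_a <= hi_b and lo_b <= hi_a


# (score, 95% CI half-width) from the Instruction Following column.
QWEN3_32B = (64.60, 0.98)
LLAMA4_SCOUT = (63.80, 0.83)
SEA_LION_V3_70B = (70.83, 0.99)

print(intervals_overlap(QWEN3_32B, LLAMA4_SCOUT))     # True
print(intervals_overlap(SEA_LION_V3_70B, QWEN3_32B))  # False
```

So Qwen 3 32B and Llama 4 Scout are not clearly separated on this task, while SEA-LION v3 (Llama) 70B leads both by more than the interval widths.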
Model | MY | Multi-Turn Chat | SEA-MT-Bench |
|---|---|---|---|
SEA-LION v4 (Qwen) 32B AISG | 49.56 ± 0.14 | 15.72 ± 0.37 | 15.72 ± 0.37 |
Gemma 3 27B | 48.14 ± 0.17 | 34.28 ± 0.79 | 34.28 ± 0.79 |
SEA-LION v4 (Gemma) 27B AISG | 47.18 ± 0.15 | 31.95 ± 0.79 | 31.95 ± 0.79 |
Llama 4 Scout 109B MoE Meta | 45.54 ± 0.17 | 12.57 ± 0.43 | 12.57 ± 0.43 |
Qwen 3 Next 80B MoE Alibaba | 44.88 ± 0.16 | 18.89 ± 0.49 | 18.89 ± 0.49 |
Qwen 3 32B Alibaba | 44.60 ± 0.19 | 13.36 ± 0.46 | 13.36 ± 0.46 |
Gemma 3 12B | 42.46 ± 0.15 | 25.50 ± 0.68 | 25.50 ± 0.68 |
Qwen 3 VL 32B Alibaba | 40.89 ± 0.19 | 12.93 ± 0.46 | 12.93 ± 0.46 |
SEA-LION v3 (Llama) 70B AISG | 38.21 ± 0.28 | 11.16 ± 0.48 | 11.16 ± 0.48 |
Tulu 3 70B AI2 | 35.11 ± 0.17 | 7.59 ± 0.40 | 7.59 ± 0.40 |
Qwen 3 14B Alibaba | 32.46 ± 0.13 | 8.99 ± 0.30 | 8.99 ± 0.30 |
SEA-LION v4 (Qwen VL) 8B AISG | 30.67 ± 0.19 | 10.83 ± 0.32 | 10.83 ± 0.32 |
Qwen 3 VL 8B Alibaba | 30.25 ± 0.22 | 8.58 ± 0.34 | 8.58 ± 0.34 |
Qwen 2.5 72B Alibaba | 27.54 ± 0.20 | 8.52 ± 0.32 | 8.52 ± 0.32 |
Qwen 2.5 32B Alibaba | 26.99 ± 0.17 | 6.38 ± 0.35 | 6.38 ± 0.35 |
Qwen 3 8B Alibaba | 26.26 ± 0.20 | 8.91 ± 0.42 | 8.91 ± 0.42 |
Mistral Large 2411 123B Mistral AI | 26.17 ± 0.29 | 5.80 ± 0.38 | 5.80 ± 0.38 |
Qwen 3 30B MoE Alibaba | 25.62 ± 0.12 | 13.71 ± 0.42 | 13.71 ± 0.42 |
Gemma 2 27B | 23.97 ± 0.24 | 5.45 ± 0.24 | 5.45 ± 0.24 |
SEA-LION v4 (Qwen VL) 4B AISG | 23.31 ± 0.17 | 5.53 ± 0.36 | 5.53 ± 0.36 |
Llama 3.3 70B Meta | 23.07 ± 0.15 | 7.05 ± 0.35 | 7.05 ± 0.35 |
Qwen 3 VL 4B Alibaba | 20.42 ± 0.16 | 5.00 ± 0.29 | 5.00 ± 0.29 |
Llama 3.1 70B Meta | 19.87 ± 0.24 | 6.31 ± 0.47 | 6.31 ± 0.47 |
SEA-LION v3 (Llama) 8B AISG | 17.88 ± 0.26 | 4.35 ± 0.31 | 4.35 ± 0.31 |
ERNIE 4.5 21B MoE Baidu | 17.70 ± 0.15 | 13.26 ± 0.46 | 13.26 ± 0.46 |
Command A 03-2025 111B CohereLabs | 16.52 ± 0.23 | 6.82 ± 0.40 | 6.82 ± 0.40 |
SEA-LION v3 (Gemma 2) 9B AISG | 15.40 ± 0.22 | 8.99 ± 0.39 | 8.99 ± 0.39 |
Qwen 2.5 14B Alibaba | 13.59 ± 0.19 | 4.89 ± 0.28 | 4.89 ± 0.28 |
Llama 3 70B Meta | 13.09 ± 0.17 | 5.47 ± 0.39 | 5.47 ± 0.39 |
Apertus 70B Swiss AI | 13.09 ± 0.25 | 7.77 ± 0.54 | 7.77 ± 0.54 |
Sailor2 8B SAIL | 11.65 ± 0.13 | 9.64 ± 0.41 | 9.64 ± 0.41 |
Tulu 3 8B AI2 | 10.95 ± 0.14 | 1.26 ± 0.15 | 1.26 ± 0.15 |
MERaLiON 2 10B A*STAR | 10.66 ± 0.16 | 3.36 ± 0.27 | 3.36 ± 0.27 |
Babel 83B Alibaba-DAMO | 9.87 ± 0.23 | 0.98 ± 0.16 | 0.98 ± 0.16 |
Gemma 2 9B | 9.63 ± 0.17 | 3.19 ± 0.33 | 3.19 ± 0.33 |
Apertus 8B Swiss AI | 9.30 ± 0.21 | 3.20 ± 0.38 | 3.20 ± 0.38 |
Sailor2 20B SAIL | 8.55 ± 0.11 | 8.89 ± 0.38 | 8.89 ± 0.38 |
Llama 3.1 8B Meta | 8.45 ± 0.19 | 2.72 ± 0.28 | 2.72 ± 0.28 |
Qwen 2.5 7B Alibaba | 8.04 ± 0.15 | 2.18 ± 0.26 | 2.18 ± 0.26 |
Babel 9B Alibaba-DAMO | 8.01 ± 0.18 | 3.03 ± 0.28 | 3.03 ± 0.28 |
SeaLLMs V3 7B Alibaba-DAMO | 7.13 ± 0.16 | 3.82 ± 0.35 | 3.82 ± 0.35 |
Command R+ 08-2024 104B CohereLabs | 6.61 ± 0.18 | 0.80 ± 0.13 | 0.80 ± 0.13 |
phi-4 14B Microsoft | 6.45 ± 0.18 | 1.90 ± 0.34 | 1.90 ± 0.34 |
Aya Expanse 32B CohereLabs | 6.44 ± 0.15 | 1.39 ± 0.17 | 1.39 ± 0.17 |
Command R 08-2024 32B CohereLabs | 4.87 ± 0.15 | 0.33 ± 0.13 | 0.33 ± 0.13 |
Olmo 2 0325 32B AI2 | 4.38 ± 0.15 | 0.78 ± 0.16 | 0.78 ± 0.16 |
Ministral 2410 8B Mistral AI | 3.90 ± 0.13 | 0.88 ± 0.18 | 0.88 ± 0.18 |
Olmo 3 7B AI2 | 3.41 ± 0.11 | 0.39 ± 0.12 | 0.39 ± 0.12 |
Llama 3 8B Meta | 3.20 ± 0.06 | 2.07 ± 0.25 | 2.07 ± 0.25 |
Command R7B 12-2024 7B CohereLabs | 3.19 ± 0.13 | 0.03 ± 0.04 | 0.03 ± 0.04 |
Aya Expanse 8B CohereLabs | 3.03 ± 0.13 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Mistral Small 3.1 2503 24B Mistral AI | 2.22 ± 0.11 | 0.50 ± 0.13 | 0.50 ± 0.13 |
Olmo 2 1124 7B AI2 | 2.18 ± 0.12 | 0.13 ± 0.07 | 0.13 ± 0.07 |
Olmo 2 1124 13B AI2 | 1.84 ± 0.09 | 0.43 ± 0.00 | 0.43 ± 0.00 |

Model | MY | NLG | Summarization | Translations |
|---|---|---|---|---|
SEA-LION v4 (Qwen) 32B AISG | 49.56 ± 0.14 | 43.78 ± 0.10 | 23.56 ± 0.08 | 64.00 ± 0.19 |
Gemma 3 27B | 48.14 ± 0.17 | 46.30 ± 0.13 | 13.16 ± 0.25 | 79.44 ± 0.07 |
SEA-LION v4 (Gemma) 27B AISG | 47.18 ± 0.15 | 44.63 ± 0.13 | 11.43 ± 0.23 | 77.83 ± 0.10 |
Llama 4 Scout 109B MoE Meta | 45.54 ± 0.17 | 53.34 ± 0.13 | 25.53 ± 0.25 | 81.15 ± 0.04 |
Qwen 3 Next 80B MoE Alibaba | 44.88 ± 0.16 | 40.12 ± 0.08 | 21.66 ± 0.08 | 58.58 ± 0.15 |
Qwen 3 32B Alibaba | 44.60 ± 0.19 | 35.01 ± 0.14 | 25.31 ± 0.18 | 44.71 ± 0.18 |
Gemma 3 12B | 42.46 ± 0.15 | 44.87 ± 0.14 | 18.77 ± 0.26 | 70.96 ± 0.09 |
Qwen 3 VL 32B Alibaba | 40.89 ± 0.19 | 33.40 ± 0.14 | 10.77 ± 0.23 | 56.03 ± 0.11 |
SEA-LION v3 (Llama) 70B AISG | 38.21 ± 0.28 | 44.27 ± 0.12 | 25.26 ± 0.16 | 63.28 ± 0.16 |
Tulu 3 70B AI2 | 35.11 ± 0.17 | 46.29 ± 0.14 | 25.97 ± 0.24 | 66.60 ± 0.09 |
Qwen 3 14B Alibaba | 32.46 ± 0.13 | 37.31 ± 0.11 | 22.76 ± 0.12 | 51.87 ± 0.16 |
SEA-LION v4 (Qwen VL) 8B AISG | 30.67 ± 0.19 | 33.60 ± 0.14 | 15.49 ± 0.21 | 51.70 ± 0.20 |
Qwen 3 VL 8B Alibaba | 30.25 ± 0.22 | 33.11 ± 0.13 | 16.51 ± 0.23 | 49.72 ± 0.09 |
Qwen 2.5 72B Alibaba | 27.54 ± 0.20 | 34.00 ± 0.15 | 21.68 ± 0.27 | 46.32 ± 0.14 |
Qwen 2.5 32B Alibaba | 26.99 ± 0.17 | 31.71 ± 0.08 | 24.05 ± 0.12 | 39.36 ± 0.15 |
Qwen 3 8B Alibaba | 26.26 ± 0.20 | 34.00 ± 0.10 | 20.11 ± 0.15 | 47.90 ± 0.13 |
Mistral Large 2411 123B Mistral AI | 26.17 ± 0.29 | 37.38 ± 0.20 | 19.68 ± 0.28 | 55.08 ± 0.20 |
Qwen 3 30B MoE Alibaba | 25.62 ± 0.12 | 18.06 ± 0.08 | 1.27 ± 0.12 | 34.84 ± 0.12 |
Gemma 2 27B | 23.97 ± 0.24 | 35.86 ± 0.11 | 21.41 ± 0.19 | 50.32 ± 0.12 |
SEA-LION v4 (Qwen VL) 4B AISG | 23.31 ± 0.17 | 12.77 ± 0.06 | 0.63 ± 0.03 | 24.90 ± 0.13 |
Llama 3.3 70B Meta | 23.07 ± 0.15 | 44.19 ± 0.14 | 28.32 ± 0.21 | 60.06 ± 0.15 |
Qwen 3 VL 4B Alibaba | 20.42 ± 0.16 | 17.45 ± 0.05 | 0.55 ± 0.00 | 34.34 ± 0.11 |
Llama 3.1 70B Meta | 19.87 ± 0.24 | 42.86 ± 0.17 | 27.30 ± 0.26 | 58.42 ± 0.20 |
SEA-LION v3 (Llama) 8B AISG | 17.88 ± 0.26 | 9.77 ± 0.16 | 5.61 ± 0.27 | 13.92 ± 0.12 |
ERNIE 4.5 21B MoE Baidu | 17.70 ± 0.15 | 41.31 ± 0.12 | 10.37 ± 0.19 | 72.24 ± 0.18 |
Command A 03-2025 111B CohereLabs | 16.52 ± 0.23 | 21.33 ± 0.19 | 5.57 ± 0.35 | 37.10 ± 0.19 |
SEA-LION v3 (Gemma 2) 9B AISG | 15.40 ± 0.22 | 33.97 ± 0.17 | 14.84 ± 0.29 | 53.10 ± 0.15 |
Qwen 2.5 14B Alibaba | 13.59 ± 0.19 | 26.98 ± 0.07 | 21.50 ± 0.11 | 32.46 ± 0.10 |
Llama 3 70B Meta | 13.09 ± 0.17 | 24.94 ± 0.05 | 0.00 ± 0.00 | 49.89 ± 0.11 |
Apertus 70B Swiss AI | 13.09 ± 0.25 | 36.72 ± 0.10 | 15.17 ± 0.17 | 58.27 ± 0.17 |
Sailor2 8B SAIL | 11.65 ± 0.13 | 24.38 ± 0.07 | 0.00 ± 0.00 | 48.76 ± 0.14 |
Tulu 3 8B AI2 | 10.95 ± 0.14 | 29.48 ± 0.09 | 21.48 ± 0.17 | 37.48 ± 0.10 |
MERaLiON 2 10B A*STAR | 10.66 ± 0.16 | 27.41 ± 0.17 | 19.49 ± 0.27 | 35.33 ± 0.15 |
Babel 83B Alibaba-DAMO | 9.87 ± 0.23 | 13.08 ± 0.13 | 16.56 ± 0.24 | 9.61 ± 0.15 |
Gemma 2 9B | 9.63 ± 0.17 | 24.86 ± 0.15 | 14.08 ± 0.28 | 35.64 ± 0.13 |
Apertus 8B Swiss AI | 9.30 ± 0.21 | 26.95 ± 0.16 | 13.22 ± 0.27 | 40.68 ± 0.16 |
Sailor2 20B SAIL | 8.55 ± 0.11 | 26.43 ± 0.07 | 0.00 ± 0.00 | 52.86 ± 0.15 |
Llama 3.1 8B Meta | 8.45 ± 0.19 | 23.71 ± 0.18 | 26.73 ± 0.27 | 20.69 ± 0.14 |
Qwen 2.5 7B Alibaba | 8.04 ± 0.15 | 15.29 ± 0.12 | 19.35 ± 0.18 | 11.22 ± 0.11 |
Babel 9B Alibaba-DAMO | 8.01 ± 0.18 | 17.92 ± 0.16 | 18.33 ± 0.33 | 17.51 ± 0.22 |
SeaLLMs V3 7B Alibaba-DAMO | 7.13 ± 0.16 | 18.27 ± 0.17 | 17.68 ± 0.31 | 18.87 ± 0.15 |
Command R+ 08-2024 104B CohereLabs | 6.61 ± 0.18 | 20.97 ± 0.12 | 16.00 ± 0.17 | 25.95 ± 0.17 |
phi-4 14B Microsoft | 6.45 ± 0.18 | 13.49 ± 0.21 | 10.79 ± 0.39 | 16.19 ± 0.12 |
Aya Expanse 32B CohereLabs | 6.44 ± 0.15 | 10.33 ± 0.09 | 0.00 ± 0.00 | 20.66 ± 0.18 |
Command R 08-2024 32B CohereLabs | 4.87 ± 0.15 | 13.36 ± 0.11 | 16.97 ± 0.15 | 9.75 ± 0.11 |
Olmo 2 0325 32B AI2 | 4.38 ± 0.15 | 7.36 ± 0.06 | 0.00 ± 0.00 | 14.73 ± 0.12 |
Ministral 2410 8B Mistral AI | 3.90 ± 0.13 | 15.78 ± 0.13 | 4.22 ± 0.19 | 27.35 ± 0.17 |
Olmo 3 7B AI2 | 3.41 ± 0.11 | 7.56 ± 0.08 | 14.00 ± 0.14 | 1.11 ± 0.02 |
Llama 3 8B Meta | 3.20 ± 0.06 | 11.47 ± 0.08 | 0.00 ± 0.00 | 22.95 ± 0.15 |
Command R7B 12-2024 7B CohereLabs | 3.19 ± 0.13 | 9.89 ± 0.16 | 10.06 ± 0.28 | 9.73 ± 0.14 |
Aya Expanse 8B CohereLabs | 3.03 ± 0.13 | 1.41 ± 0.03 | 0.00 ± 0.00 | 2.82 ± 0.06 |
Mistral Small 3.1 2503 24B Mistral AI | 2.22 ± 0.11 | 4.91 ± 0.09 | 3.48 ± 0.09 | 6.34 ± 0.14 |
Olmo 2 1124 7B AI2 | 2.18 ± 0.12 | 3.22 ± 0.05 | 0.00 ± 0.00 | 6.44 ± 0.10 |
Olmo 2 1124 13B AI2 | 1.84 ± 0.09 | 0.52 ± 0.02 | 0.00 ± 0.00 | 1.04 ± 0.03 |
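Each two-task competency appears to be the unweighted mean of its task scores; this is inferred from the numbers, not stated on the page. For NLG, Summarization and Translations average to the NLG column. `competency_from_tasks` is a hypothetical helper.

```python
def competency_from_tasks(*task_scores):
    """Unweighted mean of a competency's task scores (inferred, not documented)."""
    return sum(task_scores) / len(task_scores)


# (Summarization, Translations, reported NLG) copied from the table above.
NLG_ROWS = {
    "SEA-LION v4 (Qwen) 32B AISG": (23.56, 64.00, 43.78),
    "Gemma 3 27B": (13.16, 79.44, 46.30),
    "Llama 4 Scout 109B MoE Meta": (25.53, 81.15, 53.34),
}

for model, (summarization, translation, reported) in NLG_ROWS.items():
    recomputed = competency_from_tasks(summarization, translation)
    print(f"{model}: {recomputed:.2f} (reported {reported:.2f})")
```

The NLR and NLU tables below appear to follow the same pattern, up to rounding in the second decimal.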
Model | MY | NLR | Causal Reasoning | Natural Language Inference |
|---|---|---|---|---|
SEA-LION v4 (Qwen) 32B AISG | 49.56 ± 0.14 | 61.75 ± 0.15 | 66.38 ± 0.20 | 57.13 ± 0.20 |
Gemma 3 27B | 48.14 ± 0.17 | 55.45 ± 0.29 | 57.08 ± 0.51 | 53.82 ± 0.16 |
SEA-LION v4 (Gemma) 27B AISG | 47.18 ± 0.15 | 55.03 ± 0.25 | 57.15 ± 0.47 | 52.92 ± 0.18 |
Llama 4 Scout 109B MoE Meta | 45.54 ± 0.17 | 51.07 ± 0.18 | 70.40 ± 0.32 | 31.74 ± 0.21 |
Qwen 3 Next 80B MoE Alibaba | 44.88 ± 0.16 | 53.02 ± 0.30 | 59.27 ± 0.60 | 46.77 ± 0.22 |
Qwen 3 32B Alibaba | 44.60 ± 0.19 | 56.38 ± 0.29 | 60.68 ± 0.45 | 52.07 ± 0.34 |
Gemma 3 12B | 42.46 ± 0.15 | 35.58 ± 0.41 | 44.93 ± 0.77 | 26.22 ± 0.16 |
Qwen 3 VL 32B Alibaba | 40.89 ± 0.19 | 46.49 ± 0.27 | 49.48 ± 0.48 | 43.49 ± 0.28 |
SEA-LION v3 (Llama) 70B AISG | 38.21 ± 0.28 | 49.46 ± 0.52 | 54.30 ± 0.91 | 44.63 ± 0.47 |
Tulu 3 70B AI2 | 35.11 ± 0.17 | 50.27 ± 0.37 | 56.78 ± 0.51 | 43.75 ± 0.43 |
Qwen 3 14B Alibaba | 32.46 ± 0.13 | 24.78 ± 0.33 | 31.75 ± 0.52 | 17.80 ± 0.30 |
SEA-LION v4 (Qwen VL) 8B AISG | 30.67 ± 0.19 | 31.02 ± 0.25 | 38.77 ± 0.41 | 23.28 ± 0.25 |
Qwen 3 VL 8B Alibaba | 30.25 ± 0.22 | 34.45 ± 0.43 | 36.85 ± 0.59 | 32.05 ± 0.46 |
Qwen 2.5 72B Alibaba | 27.54 ± 0.20 | 9.00 ± 0.18 | 5.58 ± 0.22 | 12.41 ± 0.25 |
Qwen 2.5 32B Alibaba | 26.99 ± 0.17 | 9.64 ± 0.38 | 13.68 ± 0.64 | 5.60 ± 0.47 |
Qwen 3 8B Alibaba | 26.26 ± 0.20 | 16.93 ± 0.33 | 30.70 ± 0.65 | 3.16 ± 0.14 |
Mistral Large 2411 123B Mistral AI | 26.17 ± 0.29 | 8.88 ± 0.74 | 7.00 ± 1.42 | 10.76 ± 0.40 |
Qwen 3 30B MoE Alibaba | 25.62 ± 0.12 | 22.03 ± 0.24 | 0.00 ± 0.00 | 44.07 ± 0.47 |
Gemma 2 27B | 23.97 ± 0.24 | 28.94 ± 0.56 | 25.78 ± 1.05 | 32.10 ± 0.27 |
SEA-LION v4 (Qwen VL) 4B AISG | 23.31 ± 0.17 | 25.03 ± 0.23 | 28.80 ± 0.41 | 21.26 ± 0.21 |
Llama 3.3 70B Meta | 23.07 ± 0.15 | 21.25 ± 0.16 | 0.00 ± 0.00 | 42.51 ± 0.31 |
Qwen 3 VL 4B Alibaba | 20.42 ± 0.16 | 22.15 ± 0.34 | 22.77 ± 0.57 | 21.53 ± 0.32 |
Llama 3.1 70B Meta | 19.87 ± 0.24 | 17.54 ± 0.35 | 0.00 ± 0.00 | 35.08 ± 0.71 |
SEA-LION v3 (Llama) 8B AISG | 17.88 ± 0.26 | 10.73 ± 0.72 | 4.45 ± 1.29 | 17.01 ± 0.70 |
ERNIE 4.5 21B MoE Baidu | 17.70 ± 0.15 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Command A 03-2025 111B CohereLabs | 16.52 ± 0.23 | 9.28 ± 0.29 | 0.00 ± 0.00 | 18.57 ± 0.58 |
SEA-LION v3 (Gemma 2) 9B AISG | 15.40 ± 0.22 | 12.55 ± 0.18 | 0.00 ± 0.00 | 25.10 ± 0.37 |
Qwen 2.5 14B Alibaba | 13.59 ± 0.19 | 2.18 ± 0.34 | 0.00 ± 0.00 | 4.36 ± 0.68 |
Llama 3 70B Meta | 13.09 ± 0.17 | 34.05 ± 0.39 | 32.92 ± 0.66 | 35.18 ± 0.43 |
Apertus 70B Swiss AI | 13.09 ± 0.25 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Sailor2 8B SAIL | 11.65 ± 0.13 | 21.87 ± 0.39 | 0.00 ± 0.00 | 43.74 ± 0.79 |
Tulu 3 8B AI2 | 10.95 ± 0.14 | 0.68 ± 0.06 | 0.00 ± 0.00 | 1.35 ± 0.13 |
MERaLiON 2 10B A*STAR | 10.66 ± 0.16 | 6.30 ± 0.15 | 0.00 ± 0.00 | 12.61 ± 0.31 |
Babel 83B Alibaba-DAMO | 9.87 ± 0.23 | 0.20 ± 0.16 | 0.00 ± 0.00 | 0.40 ± 0.32 |
Gemma 2 9B | 9.63 ± 0.17 | 4.58 ± 0.32 | 0.00 ± 0.00 | 9.16 ± 0.63 |
Apertus 8B Swiss AI | 9.30 ± 0.21 | 0.13 ± 0.11 | 0.00 ± 0.00 | 0.25 ± 0.22 |
Sailor2 20B SAIL | 8.55 ± 0.11 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Llama 3.1 8B Meta | 8.45 ± 0.19 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Qwen 2.5 7B Alibaba | 8.04 ± 0.15 | 2.98 ± 0.41 | 0.00 ± 0.00 | 5.97 ± 0.82 |
Babel 9B Alibaba-DAMO | 8.01 ± 0.18 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
SeaLLMs V3 7B Alibaba-DAMO | 7.13 ± 0.16 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Command R+ 08-2024 104B CohereLabs | 6.61 ± 0.18 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
phi-4 14B Microsoft | 6.45 ± 0.18 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Aya Expanse 32B CohereLabs | 6.44 ± 0.15 | 0.25 ± 0.17 | 0.25 ± 0.24 | 0.25 ± 0.22 |
Command R 08-2024 32B CohereLabs | 4.87 ± 0.15 | 0.68 ± 0.34 | 0.00 ± 0.00 | 1.37 ± 0.69 |
Olmo 2 0325 32B AI2 | 4.38 ± 0.15 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Ministral 2410 8B Mistral AI | 3.90 ± 0.13 | 0.15 ± 0.12 | 0.00 ± 0.00 | 0.29 ± 0.23 |
Olmo 3 7B AI2 | 3.41 ± 0.11 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Llama 3 8B Meta | 3.20 ± 0.06 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Command R7B 12-2024 7B CohereLabs | 3.19 ± 0.13 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Aya Expanse 8B CohereLabs | 3.03 ± 0.13 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Mistral Small 3.1 2503 24B Mistral AI | 2.22 ± 0.11 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Olmo 2 1124 7B AI2 | 2.18 ± 0.12 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Olmo 2 1124 13B AI2 | 1.84 ± 0.09 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
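Many entries in this table, and in the NLU and Safety tables, are 0.00 ± 0.00, and the page does not explain them. One convention that would produce such values (an assumption, not confirmed by the source) is normalizing each task against its random-chance baseline and clipping at zero, so a model at or below chance scores exactly 0 in every bootstrap. Answers that cannot be parsed into a valid label could have the same effect. The sketch illustrates only that assumed normalization; `baseline_normalized` is a hypothetical helper, not the leaderboard's code.

```python
def baseline_normalized(raw_accuracy, n_classes):
    """Hypothetical helper: rescale accuracy (0-100) so chance maps to 0.

    Assumes a uniform random baseline of 100 / n_classes and clips
    below-chance results to 0; perfect accuracy still maps to 100.
    """
    baseline = 100 / n_classes
    return max(0.0, (raw_accuracy - baseline) / (100 - baseline) * 100)


# Three-way classification, so chance is about 33.3% accuracy.
print(round(baseline_normalized(30.0, 3), 2))  # 0.0 (below chance, clipped)
print(round(baseline_normalized(70.0, 3), 2))  # 55.0
```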
Model | MY | NLU | Belebele QA | Sentiment Analysis |
|---|---|---|---|---|
SEA-LION v4 (Qwen) 32B AISG | 49.56 ± 0.14 | 61.01 ± 0.12 | 81.52 ± 0.22 | 40.49 ± 0.13 |
Gemma 3 27B | 48.14 ± 0.17 | 58.49 ± 0.18 | 69.85 ± 0.31 | 47.13 ± 0.24 |
SEA-LION v4 (Gemma) 27B AISG | 47.18 ± 0.15 | 58.33 ± 0.24 | 69.11 ± 0.41 | 47.56 ± 0.26 |
Llama 4 Scout 109B MoE Meta | 45.54 ± 0.17 | 44.56 ± 0.24 | 75.93 ± 0.19 | 13.19 ± 0.43 |
Qwen 3 Next 80B MoE Alibaba | 44.88 ± 0.16 | 58.39 ± 0.24 | 74.89 ± 0.43 | 41.88 ± 0.20 |
Qwen 3 32B Alibaba | 44.60 ± 0.19 | 57.92 ± 0.39 | 74.48 ± 0.73 | 41.37 ± 0.31 |
Gemma 3 12B | 42.46 ± 0.15 | 58.03 ± 0.23 | 67.11 ± 0.41 | 48.96 ± 0.22 |
Qwen 3 VL 32B Alibaba | 40.89 ± 0.19 | 60.61 ± 0.15 | 74.04 ± 0.27 | 47.17 ± 0.17 |
SEA-LION v3 (Llama) 70B AISG | 38.21 ± 0.28 | 36.65 ± 0.35 | 73.30 ± 0.70 | 0.00 ± 0.00 |
Tulu 3 70B AI2 | 35.11 ± 0.17 | 50.68 ± 0.40 | 66.85 ± 0.68 | 34.50 ± 0.40 |
Qwen 3 14B Alibaba | 32.46 ± 0.13 | 49.11 ± 0.30 | 61.26 ± 0.63 | 36.96 ± 0.23 |
SEA-LION v4 (Qwen VL) 8B AISG | 30.67 ± 0.19 | 47.22 ± 0.32 | 58.22 ± 0.60 | 36.21 ± 0.27 |
Qwen 3 VL 8B Alibaba | 30.25 ± 0.22 | 45.00 ± 0.43 | 53.59 ± 0.70 | 36.41 ± 0.26 |
Qwen 2.5 72B Alibaba | 27.54 ± 0.20 | 44.24 ± 0.25 | 48.63 ± 0.45 | 39.85 ± 0.22 |
Qwen 2.5 32B Alibaba | 26.99 ± 0.17 | 37.70 ± 0.41 | 43.04 ± 0.71 | 32.36 ± 0.33 |
Qwen 3 8B Alibaba | 26.26 ± 0.20 | 36.93 ± 0.33 | 40.04 ± 0.73 | 33.83 ± 0.33 |
Mistral Large 2411 123B Mistral AI | 26.17 ± 0.29 | 38.82 ± 0.68 | 41.41 ± 1.39 | 36.23 ± 0.41 |
Qwen 3 30B MoE Alibaba | 25.62 ± 0.12 | 50.08 ± 0.24 | 69.04 ± 0.44 | 31.12 ± 0.29 |
Gemma 2 27B | 23.97 ± 0.24 | 26.38 ± 0.91 | 31.30 ± 1.50 | 21.46 ± 0.87 |
SEA-LION v4 (Qwen VL) 4B AISG | 23.31 ± 0.17 | 37.18 ± 0.16 | 42.81 ± 0.25 | 31.54 ± 0.14 |
Llama 3.3 70B Meta | 23.07 ± 0.15 | 1.85 ± 0.36 | 3.70 ± 0.72 | 0.00 ± 0.00 |
Qwen 3 VL 4B Alibaba | 20.42 ± 0.16 | 40.57 ± 0.20 | 44.44 ± 0.38 | 36.70 ± 0.13 |
Llama 3.1 70B Meta | 19.87 ± 0.24 | 18.81 ± 0.65 | 37.63 ± 1.30 | 0.00 ± 0.00 |
SEA-LION v3 (Llama) 8B AISG | 17.88 ± 0.26 | 29.80 ± 0.62 | 37.41 ± 0.81 | 22.19 ± 0.97 |
ERNIE 4.5 21B MoE Baidu | 17.70 ± 0.15 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Command A 03-2025 111B CohereLabs | 16.52 ± 0.23 | 23.73 ± 0.65 | 30.78 ± 1.22 | 16.68 ± 0.75 |
SEA-LION v3 (Gemma 2) 9B AISG | 15.40 ± 0.22 | 2.26 ± 0.67 | 4.52 ± 1.35 | 0.00 ± 0.00 |
Qwen 2.5 14B Alibaba | 13.59 ± 0.19 | 16.98 ± 0.34 | 33.96 ± 0.68 | 0.00 ± 0.00 |
Llama 3 70B Meta | 13.09 ± 0.17 | 7.67 ± 0.56 | 15.33 ± 1.12 | 0.00 ± 0.00 |
Apertus 70B Swiss AI | 13.09 ± 0.25 | 10.89 ± 0.71 | 21.78 ± 1.43 | 0.00 ± 0.00 |
Sailor2 8B SAIL | 11.65 ± 0.13 | 0.13 ± 0.13 | 0.26 ± 0.27 | 0.00 ± 0.00 |
Tulu 3 8B AI2 | 10.95 ± 0.14 | 2.94 ± 0.32 | 5.89 ± 0.64 | 0.00 ± 0.00 |
MERaLiON 2 10B A*STAR | 10.66 ± 0.16 | 0.03 ± 0.07 | 0.00 ± 0.00 | 0.07 ± 0.13 |
Babel 83B Alibaba-DAMO | 9.87 ± 0.23 | 19.00 ± 1.16 | 29.89 ± 2.08 | 8.11 ± 1.02 |
Gemma 2 9B | 9.63 ± 0.17 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Apertus 8B Swiss AI | 9.30 ± 0.21 | 3.35 ± 0.74 | 3.78 ± 1.27 | 2.92 ± 0.74 |
Sailor2 20B SAIL | 8.55 ± 0.11 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Llama 3.1 8B Meta | 8.45 ± 0.19 | 3.74 ± 0.51 | 7.48 ± 1.02 | 0.00 ± 0.00 |
Qwen 2.5 7B Alibaba | 8.04 ± 0.15 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Babel 9B Alibaba-DAMO | 8.01 ± 0.18 | 11.20 ± 0.31 | 0.00 ± 0.00 | 22.39 ± 0.63 |
SeaLLMs V3 7B Alibaba-DAMO | 7.13 ± 0.16 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Command R+ 08-2024 104B CohereLabs | 6.61 ± 0.18 | 0.91 ± 0.34 | 0.00 ± 0.00 | 1.83 ± 0.68 |
phi-4 14B Microsoft | 6.45 ± 0.18 | 8.89 ± 0.35 | 0.00 ± 0.00 | 17.77 ± 0.69 |
Aya Expanse 32B CohereLabs | 6.44 ± 0.15 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Command R 08-2024 32B CohereLabs | 4.87 ± 0.15 | 0.54 ± 0.31 | 1.07 ± 0.61 | 0.00 ± 0.00 |
Olmo 2 0325 32B AI2 | 4.38 ± 0.15 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Ministral 2410 8B Mistral AI | 3.90 ± 0.13 | 0.03 ± 0.05 | 0.00 ± 0.00 | 0.05 ± 0.10 |
Olmo 3 7B AI2 | 3.41 ± 0.11 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Llama 3 8B Meta | 3.20 ± 0.06 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Command R7B 12-2024 7B CohereLabs | 3.19 ± 0.13 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Aya Expanse 8B CohereLabs | 3.03 ± 0.13 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Mistral Small 3.1 2503 24B Mistral AI | 2.22 ± 0.11 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Olmo 2 1124 7B AI2 | 2.18 ± 0.12 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Olmo 2 1124 13B AI2 | 1.84 ± 0.09 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
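To check the averaging pattern on other rows without retyping numbers, the `score ± CI` cells can be parsed straight from a table row. A minimal sketch; `parse_row` is a hypothetical helper, and the example row is copied from the NLU table above, where NLU appears to be the mean of Belebele QA and Sentiment Analysis.

```python
import re

# Matches a single cell such as "60.61 ± 0.15".
CELL = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*±\s*(\d+(?:\.\d+)?)\s*$")


def parse_row(line):
    """Split a 'Model | a ± b | ...' row into (model, [(score, ci), ...])."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    model, values = cells[0], []
    for cell in cells[1:]:
        match = CELL.match(cell)
        if match is None:
            raise ValueError(f"unexpected cell: {cell!r}")
        values.append((float(match.group(1)), float(match.group(2))))
    return model, values


ROW = "Qwen 3 VL 32B Alibaba | 40.89 ± 0.19 | 60.61 ± 0.15 | 74.04 ± 0.27 | 47.17 ± 0.17 |"
model, values = parse_row(ROW)
(my, _), (nlu, _), (belebele, _), (sentiment, _) = values
print(model, nlu, round((belebele + sentiment) / 2, 3))
```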
Model | MY | Safety | Toxicity Detection |
|---|---|---|---|
SEA-LION v4 (Qwen) 32B AISG | 49.56 ± 0.14 | 53.48 ± 0.31 | 53.48 ± 0.31 |
Gemma 3 27B | 48.14 ± 0.17 | 34.35 ± 0.21 | 34.35 ± 0.21 |
SEA-LION v4 (Gemma) 27B AISG | 47.18 ± 0.15 | 33.82 ± 0.29 | 33.82 ± 0.29 |
Llama 4 Scout 109B MoE Meta | 45.54 ± 0.17 | 45.38 ± 0.20 | 45.38 ± 0.20 |
Qwen 3 Next 80B MoE Alibaba | 44.88 ± 0.16 | 37.85 ± 0.34 | 37.85 ± 0.34 |
Qwen 3 32B Alibaba | 44.60 ± 0.19 | 43.53 ± 0.56 | 43.53 ± 0.56 |
Gemma 3 12B | 42.46 ± 0.15 | 40.15 ± 0.30 | 40.15 ± 0.30 |
Qwen 3 VL 32B Alibaba | 40.89 ± 0.19 | 32.85 ± 0.40 | 32.85 ± 0.40 |
SEA-LION v3 (Llama) 70B AISG | 38.21 ± 0.28 | 20.88 ± 1.13 | 20.88 ± 1.13 |
Tulu 3 70B AI2 | 35.11 ± 0.17 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Qwen 3 14B Alibaba | 32.46 ± 0.13 | 33.15 ± 0.35 | 33.15 ± 0.35 |
SEA-LION v4 (Qwen VL) 8B AISG | 30.67 ± 0.19 | 18.98 ± 0.63 | 18.98 ± 0.63 |
Qwen 3 VL 8B Alibaba | 30.25 ± 0.22 | 22.10 ± 0.43 | 22.10 ± 0.43 |
Qwen 2.5 72B Alibaba | 27.54 ± 0.20 | 21.88 ± 0.48 | 21.88 ± 0.48 |
Qwen 2.5 32B Alibaba | 26.99 ± 0.17 | 25.07 ± 0.37 | 25.07 ± 0.37 |
Qwen 3 8B Alibaba | 26.26 ± 0.20 | 31.78 ± 0.31 | 31.78 ± 0.31 |
Mistral Large 2411 123B Mistral AI | 26.17 ± 0.29 | 30.02 ± 1.52 | 30.02 ± 1.52 |
Qwen 3 30B MoE Alibaba | 25.62 ± 0.12 | 22.18 ± 0.22 | 22.18 ± 0.22 |
Gemma 2 27B | 23.97 ± 0.24 | 0.00 ± 0.00 | 0.00 ± 0.00 |
SEA-LION v4 (Qwen VL) 4B AISG | 23.31 ± 0.17 | 24.03 ± 0.47 | 24.03 ± 0.47 |
Llama 3.3 70B Meta | 23.07 ± 0.15 | 22.03 ± 0.67 | 22.03 ± 0.67 |
Qwen 3 VL 4B Alibaba | 20.42 ± 0.16 | 2.97 ± 0.29 | 2.97 ± 0.29 |
Llama 3.1 70B Meta | 19.87 ± 0.24 | 0.00 ± 0.00 | 0.00 ± 0.00 |
SEA-LION v3 (Llama) 8B AISG | 17.88 ± 0.26 | 7.25 ± 1.14 | 7.25 ± 1.14 |
ERNIE 4.5 21B MoE Baidu | 17.70 ± 0.15 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Command A 03-2025 111B CohereLabs | 16.52 ± 0.23 | 0.00 ± 0.00 | 0.00 ± 0.00 |
SEA-LION v3 (Gemma 2) 9B AISG | 15.40 ± 0.22 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Qwen 2.5 14B Alibaba | 13.59 ± 0.19 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Llama 3 70B Meta | 13.09 ± 0.17 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Apertus 70B Swiss AI | 13.09 ± 0.25 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Sailor2 8B SAIL | 11.65 ± 0.13 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Tulu 3 8B AI2 | 10.95 ± 0.14 | 0.00 ± 0.00 | 0.00 ± 0.00 |
MERaLiON 2 10B A*STAR | 10.66 ± 0.16 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Babel 83B Alibaba-DAMO | 9.87 ± 0.23 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Gemma 2 9B | 9.63 ± 0.17 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Apertus 8B Swiss AI | 9.30 ± 0.21 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Sailor2 20B SAIL | 8.55 ± 0.11 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Llama 3.1 8B Meta | 8.45 ± 0.19 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Qwen 2.5 7B Alibaba | 8.04 ± 0.15 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Babel 9B Alibaba-DAMO | 8.01 ± 0.18 | 0.00 ± 0.00 | 0.00 ± 0.00 |
SeaLLMs V3 7B Alibaba-DAMO | 7.13 ± 0.16 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Command R+ 08-2024 104B CohereLabs | 6.61 ± 0.18 | 0.00 ± 0.00 | 0.00 ± 0.00 |
phi-4 14B Microsoft | 6.45 ± 0.18 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Aya Expanse 32B CohereLabs | 6.44 ± 0.15 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Command R 08-2024 32B CohereLabs | 4.87 ± 0.15 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Olmo 2 0325 32B AI2 | 4.38 ± 0.15 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Ministral 2410 8B Mistral AI | 3.90 ± 0.13 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Olmo 3 7B AI2 | 3.41 ± 0.11 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Llama 3 8B Meta | 3.20 ± 0.06 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Command R7B 12-2024 7B CohereLabs | 3.19 ± 0.13 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Aya Expanse 8B CohereLabs | 3.03 ± 0.13 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Mistral Small 3.1 2503 24B Mistral AI | 2.22 ± 0.11 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Olmo 2 1124 7B AI2 | 2.18 ± 0.12 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Olmo 2 1124 13B AI2 | 1.84 ± 0.09 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Model | MY | Knowledge | Global MMLU Lite |
|---|---|---|---|
SEA-LION v4 (Qwen) 32B AISG | 49.56 ± 0.14 | 49.98 ± 0.18 | 49.98 ± 0.18 |
Gemma 3 27B | 48.14 ± 0.17 | 45.09 ± 0.25 | 45.09 ± 0.25 |
SEA-LION v4 (Gemma) 27B AISG | 47.18 ± 0.15 | 44.92 ± 0.36 | 44.92 ± 0.36 |
Llama 4 Scout 109B MoE Meta | 45.54 ± 0.17 | 48.08 ± 0.22 | 48.08 ± 0.22 |
Qwen 3 Next 80B MoE Alibaba | 44.88 ± 0.16 | 44.72 ± 0.38 | 44.72 ± 0.38 |
Qwen 3 32B Alibaba | 44.60 ± 0.19 | 41.41 ± 0.48 | 41.41 ± 0.48 |
Gemma 3 12B | 42.46 ± 0.15 | 36.75 ± 0.31 | 36.75 ± 0.31 |
Qwen 3 VL 32B Alibaba | 40.89 ± 0.19 | 41.73 ± 0.35 | 41.73 ± 0.35 |
SEA-LION v3 (Llama) 70B AISG | 38.21 ± 0.28 | 34.19 ± 0.70 | 34.19 ± 0.70 |
Tulu 3 70B AI2 | 35.11 ± 0.17 | 37.05 ± 0.37 | 37.05 ± 0.37 |
Qwen 3 14B Alibaba | 32.46 ± 0.13 | 27.40 ± 0.43 | 27.40 ± 0.43 |
SEA-LION v4 (Qwen VL) 8B AISG | 30.67 ± 0.19 | 32.47 ± 0.34 | 32.47 ± 0.34 |
Qwen 3 VL 8B Alibaba | 30.25 ± 0.22 | 28.73 ± 0.39 | 28.73 ± 0.39 |
Qwen 2.5 72B Alibaba | 27.54 ± 0.20 | 25.00 ± 0.38 | 25.00 ± 0.38 |
Qwen 2.5 32B Alibaba | 26.99 ± 0.17 | 27.66 ± 0.57 | 27.66 ± 0.57 |
Qwen 3 8B Alibaba | 26.26 ± 0.20 | 16.75 ± 0.47 | 16.75 ± 0.47 |
Mistral Large 2411 123B Mistral AI | 26.17 ± 0.29 | 12.36 ± 0.59 | 12.36 ± 0.59 |
Qwen 3 30B MoE Alibaba | 25.62 ± 0.12 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Gemma 2 27B | 23.97 ± 0.24 | 23.70 ± 0.52 | 23.70 ± 0.52 |
SEA-LION v4 (Qwen VL) 4B AISG | 23.31 ± 0.17 | 25.20 ± 0.32 | 25.20 ± 0.32 |
Llama 3.3 70B Meta | 23.07 ± 0.15 | 4.80 ± 0.22 | 4.80 ± 0.22 |
Qwen 3 VL 4B Alibaba | 20.42 ± 0.16 | 21.51 ± 0.38 | 21.51 ± 0.38 |
Llama 3.1 70B Meta | 19.87 ± 0.24 | 5.34 ± 0.45 | 5.34 ± 0.45 |
SEA-LION v3 (Llama) 8B AISG | 17.88 ± 0.26 | 14.03 ± 0.57 | 14.03 ± 0.57 |
ERNIE 4.5 21B MoE Baidu | 17.70 ± 0.15 | 20.91 ± 0.41 | 20.91 ± 0.41 |
Command A 03-2025 111B CohereLabs | 16.52 ± 0.23 | 6.21 ± 0.45 | 6.21 ± 0.45 |
SEA-LION v3 (Gemma 2) 9B AISG | 15.40 ± 0.22 | 0.06 ± 0.05 | 0.06 ± 0.05 |
Qwen 2.5 14B Alibaba | 13.59 ± 0.19 | 0.36 ± 0.07 | 0.36 ± 0.07 |
Llama 3 70B Meta | 13.09 ± 0.17 | 8.48 ± 0.40 | 8.48 ± 0.40 |
Apertus 70B Swiss AI | 13.09 ± 0.25 | 0.68 ± 0.12 | 0.68 ± 0.12 |
Sailor2 8B SAIL | 11.65 ± 0.13 | 0.04 ± 0.03 | 0.04 ± 0.03 |
Tulu 3 8B AI2 | 10.95 ± 0.14 | 3.77 ± 0.26 | 3.77 ± 0.26 |
MERaLiON 2 10B A*STAR | 10.66 ± 0.16 | 1.20 ± 0.23 | 1.20 ± 0.23 |
Babel 83B Alibaba-DAMO | 9.87 ± 0.23 | 11.23 ± 0.70 | 11.23 ± 0.70 |
Gemma 2 9B | 9.63 ± 0.17 | 0.54 ± 0.10 | 0.54 ± 0.10 |
Apertus 8B Swiss AI | 9.30 ± 0.21 | 2.72 ± 0.29 | 2.72 ± 0.29 |
Sailor2 20B SAIL | 8.55 ± 0.11 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Llama 3.1 8B Meta | 8.45 ± 0.19 | 2.32 ± 0.25 | 2.32 ± 0.25 |
Qwen 2.5 7B Alibaba | 8.04 ± 0.15 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Babel 9B Alibaba-DAMO | 8.01 ± 0.18 | 0.04 ± 0.03 | 0.04 ± 0.03 |
SeaLLMs V3 7B Alibaba-DAMO | 7.13 ± 0.16 | 0.19 ± 0.06 | 0.19 ± 0.06 |
Command R+ 08-2024 104B CohereLabs | 6.61 ± 0.18 | 0.73 ± 0.15 | 0.73 ± 0.15 |
phi-4 14B Microsoft | 6.45 ± 0.18 | 0.01 ± 0.02 | 0.01 ± 0.02 |
Aya Expanse 32B CohereLabs | 6.44 ± 0.15 | 3.30 ± 0.27 | 3.30 ± 0.27 |
Command R 08-2024 32B CohereLabs | 4.87 ± 0.15 | 1.33 ± 0.19 | 1.33 ± 0.19 |
Olmo 2 0325 32B AI2 | 4.38 ± 0.15 | 0.07 ± 0.04 | 0.07 ± 0.04 |
Ministral 2410 8B Mistral AI | 3.90 ± 0.13 | 0.16 ± 0.09 | 0.16 ± 0.09 |
Olmo 3 7B AI2 | 3.41 ± 0.11 | 0.03 ± 0.02 | 0.03 ± 0.02 |
Llama 3 8B Meta | 3.20 ± 0.06 | 0.00 ± 0.01 | 0.00 ± 0.01 |
Command R7B 12-2024 7B CohereLabs | 3.19 ± 0.13 | 0.05 ± 0.03 | 0.05 ± 0.03 |
Aya Expanse 8B CohereLabs | 3.03 ± 0.13 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Mistral Small 3.1 2503 24B Mistral AI | 2.22 ± 0.11 | 0.10 ± 0.05 | 0.10 ± 0.05 |
Olmo 2 1124 7B AI2 | 2.18 ± 0.12 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Olmo 2 1124 13B AI2 | 1.84 ± 0.09 | 0.03 ± 0.03 | 0.03 ± 0.03 |