Elo Performance
Relative scoring (Win/Lose/Tie) via pairwise matchups across tasks and languages
A 400-point Elo difference means that when Model A has 400 more points than Model B, their expected head-to-head win rates are roughly 91% and 9% respectively (see the sketch below).
Matches are sampled equally across languages first, then across competencies
Rankings show relative performance based on match outcomes between models
Elo ratings augment the SEA-HELM Average Scores by considering only head-to-head comparisons.
Rankings may differ because Elo uses Win/Lose/Tie outcomes and does not account for score magnitudes.
Read more about Elo ratings in this blog post
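The 91%/9% figure follows from the standard Elo expected-score formula, E_A = 1 / (1 + 10^((R_B - R_A) / 400)). A minimal sketch in Python; the function name and example ratings are illustrative, not part of SEA-HELM:

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B (win probability, counting a tie as half a win)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# A 400-point gap gives the stronger model an expected score of about 0.91.
print(round(elo_expected_score(1400, 1000), 3))  # 0.909
print(round(elo_expected_score(1000, 1400), 3))  # 0.091
```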
Overall Elo Ratings
Average over 30 bootstrap resamples; 95% confidence intervals are shown.
Model Size: ≤200B
Open instruct models only
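Before the table, a minimal sketch of how a bootstrapped Elo rating with a 95% confidence interval can be computed. It assumes matches are recorded as (model_a, model_b, score_a) triples with ties scored as 0.5; the K-factor, initial rating of 1000, and function names are illustrative assumptions rather than the SEA-HELM implementation:

```python
import random
from statistics import mean

def elo_ratings(matches, k=4.0, base=1000.0):
    """Sequential Elo updates over (model_a, model_b, score_a) records,
    where score_a is 1.0 for an A win, 0.5 for a tie, and 0.0 for a loss."""
    ratings = {}
    for a, b, score_a in matches:
        ra = ratings.setdefault(a, base)
        rb = ratings.setdefault(b, base)
        expected_a = 1.0 / (1.0 + 10 ** ((rb - ra) / 400))
        ratings[a] = ra + k * (score_a - expected_a)
        ratings[b] = rb + k * ((1.0 - score_a) - (1.0 - expected_a))
    return ratings

def bootstrap_elo(matches, n_boot=30, seed=0):
    """Mean rating and an approximate 95% CI over Elo runs on resampled match lists."""
    rng = random.Random(seed)
    runs = [elo_ratings(rng.choices(matches, k=len(matches))) for _ in range(n_boot)]
    models = {m for a, b, _ in matches for m in (a, b)}
    summary = {}
    for m in sorted(models):
        vals = sorted(run.get(m, 1000.0) for run in runs)
        lo = vals[round(0.025 * (n_boot - 1))]  # nearest-rank 2.5th percentile
        hi = vals[round(0.975 * (n_boot - 1))]  # nearest-rank 97.5th percentile
        summary[m] = (mean(vals), lo, hi)
    return summary

# Example: a few illustrative matches between two hypothetical models.
demo = [("model_x", "model_y", 1.0), ("model_y", "model_x", 0.5), ("model_x", "model_y", 1.0)]
print(bootstrap_elo(demo, n_boot=30))
```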
Model | Size | Elo Rating |
---|---|---|
SEA-LION v4 (Gemma) AISG | 27B | 1046.9 ± 1.1 |
Gemma 3 | 27B | 1046.6 ± 0.9 |
Qwen 3 Next Alibaba | 80B MoE | 1045.4 ± 1.1 |
SEA-LION v4 (Qwen) AISG | 32B | 1044.2 ± 0.8 |
SEA-LION v3 (Llama) AISG | 70B | 1037.3 ± 1.1 |
Gemma 3 | 12B | 1035.2 ± 1.0 |
Qwen 3 Alibaba | 32B | 1034.6 ± 0.8 |
Llama 4 Scout Meta | 109B MoE | 1029.1 ± 1.0 |
Qwen 3 Alibaba | 30B MoE | 1022.3 ± 1.2 |
Tulu 3 AI2 | 70B | 1022.1 ± 1.1 |
Qwen 2.5 Alibaba | 72B | 1016.3 ± 0.7 |
Llama 3.3 Meta | 70B | 1014.3 ± 1.1 |
Qwen 3 Alibaba | 14B | 1013.9 ± 1.0 |
Gemma 2 | 27B | 1006.8 ± 0.8 |
Mistral Large 2411 Mistral AI | 123B | 1003.1 ± 1.0 |
SEA-LION v3 (Gemma 2) AISG | 9B | 1002.1 ± 1.1 |
Qwen 2.5 Alibaba | 32B | 999.8 ± 1.0 |
Llama 3.1 Meta | 70B | 999.7 ± 0.9 |
Command A 03-2025 CohereLabs | 111B | 998.3 ± 1.2 |
Qwen 3 Alibaba | 8B | 993.6 ± 1.1 |
SEA-LION v3 (Llama) AISG | 8B | 986.6 ± 1.0 |
Gemma 2 | 9B | 977.0 ± 1.1 |
MERaLiON 2 A*STAR | 10B | 971.8 ± 1.1 |
Qwen 2.5 Alibaba | 14B | 970.0 ± 1.0 |
ERNIE 4.5 Baidu | 21B MoE | 966.6 ± 1.0 |
Aya Expanse CohereLabs | 32B | 956.9 ± 1.0 |
Llama 3 Meta | 70B | 954.6 ± 0.9 |
Command R+ 08-2024 CohereLabs | 104B | 950.6 ± 1.3 |
Llama 3.1 Meta | 8B | 942.3 ± 1.0 |
Tulu 3 AI2 | 8B | 937.7 ± 0.9 |
Apertus Swiss AI | 70B | 935.8 ± 1.0 |
Command R 08-2024 CohereLabs | 32B | 934.9 ± 1.3 |
Qwen 2.5 Alibaba | 7B | 934.9 ± 1.2 |
Sailor2 SAIL | 8B | 927.0 ± 1.1 |
Mistral Small 3.1 2503 Mistral AI | 24B | 922.8 ± 1.1 |
Olmo 2 0325 AI2 | 32B | 920.9 ± 1.0 |
Sailor2 SAIL | 20B | 919.7 ± 0.9 |
Apertus Swiss AI | 8B | 915.4 ± 0.9 |
Aya Expanse CohereLabs | 8B | 907.8 ± 1.0 |
phi-4 Microsoft | 14B | 905.6 ± 0.9 |
Babel Alibaba-DAMO | 9B | 903.6 ± 1.4 |
Babel Alibaba-DAMO | 83B | 903.6 ± 1.0 |
SeaLLMs V3 Alibaba-DAMO | 7B | 895.8 ± 1.2 |
Llama 3 Meta | 8B | 894.8 ± 1.1 |
Command R7B 12-2024 CohereLabs | 7B | 869.1 ± 1.1 |
Ministral 2410 Mistral AI | 8B | 869.0 ± 1.2 |
Olmo 2 1124 AI2 | 13B | 866.2 ± 1.0 |
Olmo 2 1124 AI2 | 7B | 835.4 ± 1.0 |
Language Elo Ratings by Model
Average over 30 bootstrap resamples; 95% confidence intervals are shown.
Model Size: ≤200B
Open instruct models only
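The per-language columns follow the same procedure, with matchups restricted to a single language before rating. A brief sketch reusing the hypothetical bootstrap_elo helper from the snippet above; the match-record format with a language tag is an assumption for illustration only:

```python
# Illustrative records: (model_a, model_b, score_a, language), ties scored as 0.5.
tagged_matches = [
    ("model_x", "model_y", 1.0, "th"),
    ("model_y", "model_x", 0.5, "vi"),
    ("model_x", "model_y", 0.0, "vi"),
]

# Group matches by language, then run the same bootstrapped Elo procedure per group.
by_language = {}
for a, b, score_a, lang in tagged_matches:
    by_language.setdefault(lang, []).append((a, b, score_a))

language_elo = {lang: bootstrap_elo(ms) for lang, ms in by_language.items()}
```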
Model | SEA (overall) | MY (Burmese) | TL (Tagalog) | ID (Indonesian) | MS (Malay) | TA (Tamil) | TH (Thai) | VI (Vietnamese) |
---|---|---|---|---|---|---|---|---|
SEA-LION v4 (Gemma) 27B AISG | 1046.9 ± 1.1 | 1077.9 ± 1.0 | 1051.7 ± 0.7 | 1026.1 ± 0.9 | 1041.4 ± 1.1 | 1080.0 ± 1.1 | 1043.3 ± 0.9 | 1012.9 ± 0.9 |
Gemma 3 27B | 1046.6 ± 0.9 | 1083.3 ± 1.0 | 1050.1 ± 1.1 | 1026.6 ± 1.1 | 1041.4 ± 0.9 | 1080.4 ± 1.2 | 1043.5 ± 1.1 | 1005.5 ± 1.3 |
Qwen 3 Next 80B MoE Alibaba | 1045.4 ± 1.1 | 1061.0 ± 1.2 | 1037.9 ± 0.8 | 1030.6 ± 1.0 | 1036.5 ± 1.0 | 1052.5 ± 1.0 | 1053.8 ± 0.9 | 1057.2 ± 0.9 |
SEA-LION v4 (Qwen) 32B AISG | 1044.2 ± 0.8 | 1072.9 ± 1.2 | 1038.0 ± 0.9 | 1029.1 ± 0.9 | 1032.6 ± 1.0 | 1065.8 ± 0.9 | 1042.3 ± 1.1 | 1039.9 ± 0.8 |
SEA-LION v3 (Llama) 70B AISG | 1037.3 ± 1.1 | 1044.0 ± 1.2 | 1044.0 ± 1.1 | 1025.8 ± 0.8 | 1021.8 ± 0.9 | 1050.8 ± 1.0 | 1035.5 ± 1.1 | 1039.2 ± 0.7 |
Gemma 3 12B | 1035.2 ± 1.0 | 1062.7 ± 1.1 | 1037.6 ± 1.0 | 1017.5 ± 0.9 | 1028.5 ± 1.1 | 1066.0 ± 1.2 | 1030.9 ± 1.0 | 1007.8 ± 0.9 |
Qwen 3 32B Alibaba | 1034.6 ± 0.8 | 1051.7 ± 0.9 | 1019.4 ± 0.7 | 1025.7 ± 0.8 | 1020.4 ± 0.9 | 1043.2 ± 1.1 | 1043.6 ± 0.9 | 1038.1 ± 1.0 |
Llama 4 Scout 109B MoE Meta | 1029.1 ± 1.0 | 1076.4 ± 1.1 | 1026.3 ± 0.7 | 1012.0 ± 0.9 | 1018.0 ± 1.1 | 1050.7 ± 0.8 | 1020.2 ± 0.7 | 1005.9 ± 1.1 |
Qwen 3 30B MoE Alibaba | 1022.3 ± 1.2 | 977.4 ± 1.2 | 1018.6 ± 0.8 | 1021.2 ± 1.0 | 1036.4 ± 1.1 | 1017.8 ± 0.9 | 1040.3 ± 0.8 | 1046.6 ± 1.1 |
Tulu 3 70B AI2 | 1022.1 ± 1.1 | 1034.8 ± 1.1 | 1022.4 ± 0.9 | 1013.9 ± 0.9 | 1018.2 ± 0.8 | 1011.3 ± 0.9 | 1023.9 ± 1.0 | 1030.6 ± 1.0 |
Qwen 2.5 72B Alibaba | 1016.3 ± 0.7 | 1013.8 ± 1.2 | 1013.0 ± 1.2 | 1016.7 ± 0.8 | 1021.8 ± 0.8 | 984.6 ± 1.0 | 1038.1 ± 0.9 | 1027.6 ± 0.8 |
Llama 3.3 70B Meta | 1014.3 ± 1.1 | 983.9 ± 1.0 | 1024.6 ± 0.7 | 1014.4 ± 0.8 | 1014.6 ± 1.0 | 1033.3 ± 1.0 | 1010.8 ± 1.0 | 1017.5 ± 0.9 |
Qwen 3 14B Alibaba | 1013.9 ± 1.0 | 1021.1 ± 1.1 | 999.8 ± 1.0 | 1008.6 ± 0.7 | 995.1 ± 1.2 | 1017.4 ± 0.8 | 1023.5 ± 1.1 | 1028.4 ± 0.9 |
Gemma 2 27B | 1006.8 ± 0.8 | 999.8 ± 1.2 | 1020.9 ± 0.8 | 1004.4 ± 1.0 | 1001.7 ± 1.2 | 1017.5 ± 1.2 | 1001.5 ± 0.8 | 1002.0 ± 1.0 |
Mistral Large 2411 123B Mistral AI | 1003.1 ± 1.0 | 1005.7 ± 1.2 | 1007.5 ± 1.4 | 998.3 ± 0.9 | 991.8 ± 1.0 | 1022.0 ± 0.9 | 998.5 ± 1.0 | 996.6 ± 1.1 |
SEA-LION v3 (Gemma 2) 9B AISG | 1002.1 ± 1.1 | 934.5 ± 1.2 | 1021.3 ± 1.0 | 1007.5 ± 1.0 | 1009.4 ± 1.1 | 1020.7 ± 1.2 | 1003.3 ± 0.6 | 1010.3 ± 0.9 |
Qwen 2.5 32B Alibaba | 999.8 ± 1.0 | 1007.2 ± 1.1 | 991.2 ± 0.8 | 1006.3 ± 0.9 | 990.0 ± 1.0 | 987.5 ± 1.1 | 1014.3 ± 0.8 | 1005.1 ± 1.1 |
Llama 3.1 70B Meta | 999.7 ± 0.9 | 966.9 ± 1.1 | 1023.1 ± 1.1 | 1005.5 ± 1.1 | 1002.9 ± 0.8 | 987.8 ± 1.2 | 1003.4 ± 1.1 | 1006.2 ± 1.2 |
Command A 03-2025 111B CohereLabs | 998.3 ± 1.2 | 949.9 ± 1.0 | 981.6 ± 1.1 | 1035.8 ± 1.1 | 1007.6 ± 1.0 | 1029.5 ± 1.1 | 949.3 ± 1.1 | 1033.6 ± 1.0 |
Qwen 3 8B Alibaba | 993.6 ± 1.1 | 994.6 ± 1.3 | 979.4 ± 0.9 | 1004.8 ± 0.9 | 991.9 ± 1.0 | 953.0 ± 0.9 | 1013.2 ± 0.8 | 1013.2 ± 0.9 |
SEA-LION v3 (Llama) 8B AISG | 986.6 ± 1.0 | 966.9 ± 1.0 | 985.3 ± 0.8 | 987.7 ± 0.9 | 988.8 ± 0.8 | 994.7 ± 0.9 | 994.8 ± 0.8 | 981.7 ± 1.1 |
Gemma 2 9B | 977.0 ± 1.1 | 905.0 ± 1.2 | 994.4 ± 0.8 | 991.6 ± 0.8 | 983.0 ± 1.0 | 991.3 ± 1.0 | 984.5 ± 1.0 | 984.1 ± 0.8 |
MERaLiON 2 10B A*STAR | 971.8 ± 1.1 | 923.9 ± 1.1 | 989.0 ± 1.0 | 991.2 ± 0.9 | 971.3 ± 1.1 | 978.9 ± 1.2 | 968.7 ± 1.0 | 978.8 ± 1.0 |
Qwen 2.5 14B Alibaba | 970.0 ± 1.0 | 926.6 ± 0.9 | 970.0 ± 1.1 | 991.4 ± 1.1 | 975.0 ± 1.1 | 938.5 ± 1.1 | 997.3 ± 1.2 | 982.0 ± 0.8 |
ERNIE 4.5 21B MoE Baidu | 966.6 ± 1.0 | 961.6 ± 1.2 | 969.8 ± 0.9 | 964.3 ± 1.1 | 963.6 ± 0.9 | 981.6 ± 1.0 | 970.9 ± 1.0 | 957.2 ± 1.0 |
Aya Expanse 32B CohereLabs | 956.9 ± 1.0 | 878.4 ± 0.9 | 958.5 ± 0.8 | 1007.5 ± 1.1 | 969.4 ± 1.0 | 959.9 ± 1.1 | 910.1 ± 0.9 | 1002.4 ± 1.1 |
Llama 3 70B Meta | 954.6 ± 0.9 | 914.7 ± 0.9 | 988.8 ± 0.8 | 972.8 ± 1.0 | 955.5 ± 1.4 | 900.4 ± 1.3 | 973.4 ± 1.0 | 976.8 ± 1.3 |
Command R+ 08-2024 104B CohereLabs | 950.6 ± 1.3 | 885.4 ± 1.1 | 957.7 ± 1.3 | 975.7 ± 1.0 | 951.6 ± 0.8 | 950.1 ± 1.1 | 937.6 ± 0.9 | 991.8 ± 1.0 |
Llama 3.1 8B Meta | 942.3 ± 1.0 | 906.1 ± 1.2 | 953.1 ± 1.0 | 967.6 ± 1.0 | 959.2 ± 1.2 | 901.2 ± 1.2 | 956.6 ± 0.9 | 949.3 ± 1.2 |
Tulu 3 8B AI2 | 937.7 ± 0.9 | 918.8 ± 1.1 | 914.7 ± 0.9 | 941.2 ± 1.2 | 952.3 ± 1.2 | 909.9 ± 1.1 | 968.1 ± 1.1 | 950.6 ± 1.1 |
Apertus 70B Swiss AI | 935.8 ± 1.0 | 935.8 ± 1.2 | 926.2 ± 1.0 | 921.0 ± 0.8 | 941.7 ± 1.3 | 940.1 ± 1.2 | 935.2 ± 1.2 | 945.4 ± 0.8 |
Command R 08-2024 32B CohereLabs | 934.9 ± 1.3 | 882.7 ± 1.0 | 927.0 ± 0.9 | 963.4 ± 0.8 | 927.5 ± 0.9 | 960.9 ± 0.7 | 912.5 ± 1.3 | 960.7 ± 1.2 |
Qwen 2.5 7B Alibaba | 934.9 ± 1.2 | 858.4 ± 1.2 | 924.2 ± 0.9 | 976.0 ± 1.0 | 959.2 ± 1.0 | 886.0 ± 1.1 | 967.1 ± 1.1 | 960.0 ± 1.0 |
Sailor2 8B SAIL | 927.0 ± 1.1 | 883.9 ± 1.1 | 979.2 ± 0.9 | 942.4 ± 0.9 | 950.1 ± 1.0 | 847.8 ± 1.1 | 938.0 ± 1.0 | 927.8 ± 1.0 |
Mistral Small 3.1 2503 24B Mistral AI | 922.8 ± 1.1 | 839.3 ± 1.0 | 966.2 ± 1.1 | 977.0 ± 0.8 | 944.4 ± 0.9 | 821.2 ± 1.2 | 932.5 ± 1.2 | 953.8 ± 1.2 |
Olmo 2 0325 32B AI2 | 920.9 ± 1.0 | 822.2 ± 0.9 | 975.2 ± 1.0 | 959.6 ± 0.9 | 954.1 ± 1.1 | 879.7 ± 1.4 | 921.0 ± 1.1 | 918.2 ± 1.1 |
Sailor2 20B SAIL | 919.7 ± 0.9 | 856.0 ± 1.3 | 967.0 ± 1.1 | 952.2 ± 1.0 | 947.2 ± 1.0 | 903.8 ± 1.6 | 929.8 ± 1.0 | 866.5 ± 1.4 |
Apertus 8B Swiss AI | 915.4 ± 0.9 | 927.7 ± 1.0 | 890.5 ± 1.2 | 922.6 ± 1.0 | 936.6 ± 1.2 | 873.6 ± 1.4 | 929.7 ± 1.1 | 920.4 ± 1.2 |
Aya Expanse 8B CohereLabs | 907.8 ± 1.0 | 815.0 ± 1.1 | 897.7 ± 1.2 | 971.7 ± 1.2 | 937.7 ± 1.1 | 867.2 ± 1.5 | 872.1 ± 1.0 | 968.5 ± 1.1 |
phi-4 14B Microsoft | 905.6 ± 0.9 | 867.4 ± 1.3 | 868.0 ± 0.9 | 951.5 ± 0.9 | 916.2 ± 1.1 | 901.3 ± 1.1 | 909.1 ± 0.9 | 920.9 ± 1.0 |
Babel 9B Alibaba-DAMO | 903.6 ± 1.4 | 883.3 ± 1.1 | 897.9 ± 0.8 | 911.1 ± 1.0 | 897.3 ± 1.2 | 891.8 ± 1.1 | 931.1 ± 1.0 | 906.7 ± 1.0 |
Babel 83B Alibaba-DAMO | 903.6 ± 1.0 | 909.1 ± 1.0 | 899.3 ± 1.1 | 904.7 ± 1.2 | 867.0 ± 1.3 | 917.6 ± 1.0 | 902.1 ± 1.0 | 922.0 ± 1.1 |
SeaLLMs V3 7B Alibaba-DAMO | 895.8 ± 1.2 | 876.3 ± 0.9 | 897.7 ± 1.1 | 901.0 ± 1.0 | 923.5 ± 1.1 | 835.1 ± 1.2 | 928.2 ± 1.1 | 902.2 ± 1.2 |
Llama 3 8B Meta | 894.8 ± 1.1 | 833.0 ± 0.9 | 915.6 ± 0.8 | 929.6 ± 0.9 | 916.1 ± 1.1 | 785.1 ± 1.2 | 919.4 ± 1.0 | 932.6 ± 1.3 |
Command R7B 12-2024 7B CohereLabs | 869.1 ± 1.1 | 842.4 ± 0.8 | 884.8 ± 1.1 | 896.9 ± 1.1 | 868.7 ± 1.1 | 845.5 ± 1.3 | 869.0 ± 1.1 | 868.4 ± 1.2 |
Ministral 2410 8B Mistral AI | 869.0 ± 1.2 | 880.3 ± 1.0 | 863.5 ± 1.1 | 880.1 ± 1.0 | 865.7 ± 1.0 | 838.3 ± 1.2 | 879.4 ± 1.1 | 874.9 ± 1.2 |
Olmo 2 1124 13B AI2 | 866.2 ± 1.0 | 763.5 ± 1.3 | 912.8 ± 0.9 | 906.2 ± 1.0 | 915.8 ± 1.0 | 802.5 ± 1.2 | 843.0 ± 1.1 | 894.7 ± 0.8 |
Olmo 2 1124 7B AI2 | 835.4 ± 1.0 | 790.9 ± 0.9 | 835.1 ± 1.5 | 866.9 ± 0.9 | 877.9 ± 1.1 | 780.7 ± 1.1 | 844.2 ± 0.8 | 834.6 ± 1.0 |