Model performance across exams

| Model | Ex 1 | Ex 2 | Ex 3 | Ex 4 | Ex 5 | Ex 6 | Ex 7 | Ex 8 | Total |
|---|---|---|---|---|---|---|---|---|---|
| anthropic/claude-opus-4.1 | 16.0 | 16.0 | 13.0 | 12.0 | 11.0 | 14.0 | 14.0 | 4.0 | 100/105 |
| anthropic/claude-sonnet-4 | 13.0 | 18.0 | 12.0 | 12.0 | 11.0 | 14.0 | 14.0 | 4.0 | 98/105 |
| deepseek/deepseek-chat-v3.1 | 13.0 | 18.0 | 13.0 | 12.0 | 11.0 | 14.0 | 14.0 | 5.0 | 100/105 |
| deepseek/deepseek-r1-0528 | 16.0 | 18.0 | 13.0 | 12.0 | 13.0 | 14.0 | 14.0 | 5.0 | 105/105 |
| google/gemini-2.5-pro | 16.0 | 17.0 | 13.0 | 12.0 | 13.0 | 14.0 | 14.0 | 5.0 | 104/105 |
| openai/gpt-5 | 16.0 | 18.0 | 13.0 | 12.0 | 13.0 | 14.0 | 14.0 | 5.0 | 105/105 |
| openai/gpt-oss-120b | 16.0 | 17.0 | 13.0 | 9.0 | 7.0 | 14.0 | 14.0 | 2.0 | 92/105 |
| qwen/qwen3-235b-a22b | 13.0 | 17.0 | 10.0 | 3.0 | 13.0 | 14.0 | 14.0 | 4.0 | 88/105 |
| qwen/qwen3-235b-a22b-thinking-2507 | 13.0 | 18.0 | 13.0 | 12.0 | 7.0 | 12.0 | 14.0 | 3.0 | 92/105 |
| x-ai/grok-4 | 16.0 | 18.0 | 13.0 | 12.0 | 13.0 | 14.0 | 14.0 | 5.0 | 105/105 |
| z-ai/glm-4.5 | 9.0 | 15.0 | 9.0 | 3.0 | 9.0 | 12.0 | 14.0 | 3.0 | 74/105 |

| Model | Ex 1 | Ex 2 | Ex 3 | Ex 4 | Ex 5 | Ex 6 | Ex 7 | Ex 8 | Total |
|---|---|---|---|---|---|---|---|---|---|
| anthropic/claude-opus-4.1 | 15.0 | 18.0 | 13.0 | 12.0 | 12.0 | 14.0 | 14.0 | 3.0 | 101/105 |
| anthropic/claude-sonnet-4 | 11.0 | 11.0 | 9.0 | 12.0 | 10.0 | 10.0 | 14.0 | 5.0 | 82/105 |
| deepseek/deepseek-chat-v3.1 | 13.0 | 18.0 | 12.0 | 12.0 | 12.0 | 14.0 | 14.0 | 5.0 | 100/105 |
| deepseek/deepseek-r1-0528 | 13.0 | 11.0 | 13.0 | 12.0 | 9.0 | 12.0 | 14.0 | 4.0 | 88/105 |
| google/gemini-2.5-pro | 16.0 | 18.0 | 13.0 | 13.0 | 12.0 | 14.0 | 14.0 | 4.0 | 104/105 |
| openai/gpt-5 | 16.0 | 18.0 | 12.0 | 12.0 | 12.0 | 14.0 | 14.0 | 5.0 | 103/105 |
| openai/gpt-oss-120b | 12.0 | 13.0 | 12.0 | 12.0 | 9.0 | 10.0 | 14.0 | 1.0 | 83/105 |
| qwen/qwen3-235b-a22b | 16.0 | 12.0 | 11.0 | 12.0 | 7.0 | 14.0 | 14.0 | 3.0 | 89/105 |
| qwen/qwen3-235b-a22b-thinking-2507 | 14.0 | 12.0 | 12.0 | 12.0 | 12.0 | 14.0 | 14.0 | 5.0 | 95/105 |
| x-ai/grok-4 | 16.0 | 18.0 | 13.0 | 12.0 | 12.0 | 14.0 | 14.0 | 5.0 | 104/105 |

| Model | Ex 1 | Ex 2 | Ex 3 | Ex 4 | Ex 5 | Ex 6 | Ex 7 | Total |
|---|---|---|---|---|---|---|---|---|
| deepseek/deepseek-chat-v3.1 | 5.0 | 6.0 | 0.0 | 5.0 | ? | ? | 5.0 | 21/40 |
| google/gemini-2.5-pro | 0.0 | 10.0 | 5.0 | ? | ? | ? | ? | 15/30 |
| openai/gpt-5 | 10.0 | 10.0 | 5.0 | ? | ? | ? | ? | 25/30 |
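
In each row, the Total cell appears to be the sum of the per-exercise scores over the maximum points available. A minimal Python sketch for checking that arithmetic on a single table row follows. The helper name `row_total` is hypothetical, and skipping `?` cells as ungraded is an assumption, since the tables do not define that marker.

```python
def row_total(row: str) -> tuple[float, str]:
    """Sum the per-exercise cells of one markdown table row.

    Returns the computed sum and the row's reported Total cell.
    Cells marked "?" are skipped (assumed to mean "not graded").
    """
    # Drop the outer pipes, then split into trimmed cells.
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    # First cell is the model name, last cell is the reported total.
    scores, reported = cells[1:-1], cells[-1]
    total = sum(float(s) for s in scores if s != "?")
    return total, reported


rows = [
    "| openai/gpt-oss-120b | 16.0 | 17.0 | 13.0 | 9.0 | 7.0 | 14.0 | 14.0 | 2.0 | 92/105 |",
    "| deepseek/deepseek-chat-v3.1 | 5.0 | 6.0 | 0.0 | 5.0 | ? | ? | 5.0 | 21/40 |",
]

for row in rows:
    total, reported = row_total(row)
    earned = float(reported.split("/")[0])
    # Print the model name, the recomputed sum, the reported total,
    # and whether the two agree.
    print(row.split("|")[1].strip(), total, reported, total == earned)
```

Running the same check over every row is a quick way to catch transcription slips when scores are copied into the tables by hand.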