Playing with ARC-AGI tests

Exercises Jan 19, 2026

Workroom PlayTime045

Exercise

15 minutes playing + 5 minutes conversation

Demo – let’s pick an AGI-1 task from Explore All Tasks and do it together.

Try (AGI-1 task) ARC-AGI Task #03560426 alone and keep a note of your hypotheses.

Try (AGI-1 / 2 task) ARC-AGI Task #12422b43 together, and listen for each other's hypotheses.

Then let’s try to solve (AGI-1 / 2 task) ARC-AGI Task #136b0064 together.

If we've got time, we'll go back to solving solo / in pairs.

Conversation (5 minutes). We will probably discuss most of this as we go; the following topics are suggestions if the room goes quiet.

How did the tests evaluate reasoning?
Compare several tests. How do they differ in what they evaluate?
Does this 'feel' like a challenge to reasoning, or simply a sweet spot for tests?
The leaderboard shows that AGI-2 is harder for current models than AGI-1; AGI-2 was developed as models began to overcome AGI-1's challenges. Does this progression reflect a real change in reasoning capability?