Playing with ARC-AGI tests
Workroom PlayTime045
Exercise
15 minutes
Kickoff
Demo – let’s pick an AGI-2 task from Explore All Tasks and do it together.
Do one yourself
Try (AGI-1 task) ARC-AGI Task #03560426 alone and keep a note of your hypotheses.
Work together
Try (AGI-1 / 2 task) ARC-AGI Task #12422b43 together – listen for hypotheses
Then let’s try to solve (AGI-1 / 2 task) ARC-AGI Task #136b0064 together.
If we've got time, we'll go back to solving solo / in pairs.
Debrief
5 minutes, though we will be talking about most of this as we go. The following topics are suggestions in case the conversation goes quiet.
- How did the tests evaluate reasoning?
- Compare several tests. What is different?
- AGI-1 was a challenge to machine reasoning around 2022, but not in 2025. AGI-2 is harder, and AGI-3 is being built. Could this reflect a change in reasoning capability?
Extend
Only humans managed ARC-AGI Task #25094a63 and ARC-AGI Task #212895b5. Why might that be?
Extension to 90/120 mins on JL's machines.
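If the extension moves onto the machines, one possible way to make a noted hypothesis concrete is to write it as a small function and check it against every demonstration pair. The sketch below assumes the public ARC JSON layout ("train" and "test" lists of input/output grids of integers). The toy task and the helper names (toy_task, mirror_left_right, transpose, fits_all_examples) are invented for illustration and are not taken from any of the tasks listed above.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]

# Toy task in the ARC JSON layout. The grids are made up for this sketch;
# they do NOT come from a real ARC-AGI task.
toy_task: Dict[str, list] = {
    "train": [
        {"input": [[1, 0, 0], [2, 0, 0]], "output": [[0, 0, 1], [0, 0, 2]]},
        {"input": [[0, 3], [4, 0]], "output": [[3, 0], [0, 4]]},
    ],
    "test": [{"input": [[5, 0, 0, 0]], "output": [[0, 0, 0, 5]]}],
}


def mirror_left_right(grid: Grid) -> Grid:
    """Hypothesis A: the output is the input mirrored left to right."""
    return [list(reversed(row)) for row in grid]


def transpose(grid: Grid) -> Grid:
    """Hypothesis B: the output is the input with rows and columns swapped."""
    return [list(column) for column in zip(*grid)]


def fits_all_examples(hypothesis: Callable[[Grid], Grid], task: Dict[str, list]) -> bool:
    """Return True only if the hypothesis reproduces every training output."""
    return all(hypothesis(pair["input"]) == pair["output"] for pair in task["train"])


if __name__ == "__main__":
    for name, hypothesis in [("mirror", mirror_left_right), ("transpose", transpose)]:
        print(name, "fits all demonstrations:", fits_all_examples(hypothesis, toy_task))
```

A hypothesis that fails even one demonstration pair is discarded, which mirrors what we do out loud when working a task together. Only a hypothesis that fits every pair is worth applying to the test input.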