Playing with ARC-AGI tests
Workroom PlayTime045
Exercise
15 mins playing + 5 mins conversation
Kickoff
Demo – let’s pick an AGI-1 task from Explore All Tasks and do it together.
Do one yourself
Try (AGI-1 task) ARC-AGI Task #03560426 alone and keep a note of your hypotheses.
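For anyone who prefers to keep their notes as code, here is a minimal sketch of what "testing a hypothesis" can look like. It assumes the public ARC JSON layout (a "train" list and a "test" list of input/output grids, with each cell an integer from 0 to 9). The toy task and the two candidate rules are invented for illustration and are not taken from any of the tasks named in this sheet.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]

# Invented toy task in the ARC JSON layout. NOT one of the real tasks above.
toy_task: Dict[str, list] = {
    "train": [
        {"input": [[1, 0], [0, 0]], "output": [[0, 1], [0, 0]]},
        {"input": [[0, 0, 2], [3, 0, 0]], "output": [[2, 0, 0], [0, 0, 3]]},
    ],
    "test": [{"input": [[0, 5], [0, 0]]}],
}


def mirror_left_right(grid: Grid) -> Grid:
    """Hypothesis A: each row is reversed."""
    return [list(reversed(row)) for row in grid]


def transpose(grid: Grid) -> Grid:
    """Hypothesis B: rows and columns are swapped."""
    return [list(col) for col in zip(*grid)]


def check_hypothesis(task: Dict[str, list], rule: Callable[[Grid], Grid]) -> List[bool]:
    """Return one True/False per training pair: does the rule reproduce the output?"""
    return [rule(pair["input"]) == pair["output"] for pair in task["train"]]


results = {
    "mirror_left_right": check_hypothesis(toy_task, mirror_left_right),
    "transpose": check_hypothesis(toy_task, transpose),
}

# Only a rule that explains every training pair is worth applying to the test input.
prediction = mirror_left_right(toy_task["test"][0]["input"])

if __name__ == "__main__":
    for name, outcome in results.items():
        print(name, outcome)
    print("prediction:", prediction)
```

A rule that fails on even one training pair is a discarded hypothesis, and those are worth writing down too, since they are what we will compare in the debrief.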
Work together
Try (AGI-1 / 2 task) ARC-AGI Task #12422b43 together, and listen for each other's hypotheses.
Then let’s try to solve (AGI-1 / 2 task) ARC-AGI Task #136b0064 together.
If we've got time, we'll go back to solving solo / in pairs.
Debrief
5 mins, though we will be talking about most of this as we go. The following topics are suggestions in case the conversation goes quiet.
- How did the tests evaluate reasoning?
- Compare several tests. What differs in how they evaluate reasoning?
- Does this 'feel' like a challenge to reasoning, or simply a sweet spot for tests?
- Looking at the leaderboard, AGI-2 is a greater challenge to current models than AGI-1, and was developed as AGI-1's challenges were surmounted. Could this reflect a change in the models' reasoning capability?
Extend
Only humans managed to solve ARC-AGI Task #25094a63 and ARC-AGI Task #212895b5. Why might that be?
JL note: JL's machines have a further extension to 90/120 mins.