This month, the ARC Prize Foundation introduced the newest iteration of its popular Abstraction and Reasoning Corpus (ARC) benchmark: ARC-AGI-2. The latest test is even more challenging than the original, ARC-AGI-1, which launched in 2019.
According to ARC Prize Foundation President Greg Kamradt, “ARC-AGI-2 significantly raises the bar for AI.”
Examining the early scores
The ARC-AGI-2 benchmark consists of a series of grid-based puzzles for AI to solve. After administering the test to more than 400 people, the ARC Prize Foundation established a human baseline:
- Human panel: 60% average with a cost per task of $17.
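For context on what these puzzles look like: ARC-style tasks are typically distributed as JSON, with a few demonstration input/output grid pairs and one or more test inputs whose outputs the solver must infer. The sketch below follows the field names used in the public ARC-AGI repositories, but the specific grids and the toy solver are made up for illustration.

```python
# Illustrative ARC-style task (grid cells are integer color codes 0-9).
# Field names follow the public ARC-AGI format; this particular task
# and its rule are invented for illustration.
task = {
    "train": [
        {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
        {"input": [[2, 0], [0, 2]], "output": [[0, 2], [2, 0]]},
    ],
    "test": [
        {"input": [[3, 0], [0, 3]]},  # solver must infer: [[0, 3], [3, 0]]
    ],
}

def solve(grid):
    """Toy solver for this invented task: mirror each row left-to-right."""
    return [row[::-1] for row in grid]

# The demonstration pairs confirm the inferred rule...
for pair in task["train"]:
    assert solve(pair["input"]) == pair["output"]
# ...which is then applied to the test input.
print(solve(task["test"][0]["input"]))  # [[0, 3], [3, 0]]
```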
Current generative AI tools, however, didn’t fare so well:
- OpenAI o1-pro: 1% with a cost per task of $200.
- OpenAI o3-mini-high: 0.0% with a cost per task of $0.41.
- OpenAI GPT-4.5: 0.0% with a cost per task of $0.29.
- DeepSeek-R1 and R1-Zero: 0.3% with a cost per task of $0.08.
- Anthropic Claude 3.7 Sonnet: 0.0% with a cost per task of $0.12.
- Google Gemini 2.0 Flash: 1.3% with a cost per task of $0.004.
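Taken together, these figures invite a rough cost-effectiveness comparison. The short sketch below simply tabulates the scores and per-task costs quoted above; the “points per dollar” metric is our own illustrative framing, not an official ARC Prize measure.

```python
# Scores and per-task costs as quoted above (early ARC-AGI-2 results).
# The points-per-dollar metric is illustrative, not an official measure.
results = [
    ("Human panel",          60.0, 17.0),
    ("OpenAI o1-pro",         1.0, 200.0),
    ("OpenAI o3-mini-high",   0.0, 0.41),
    ("OpenAI GPT-4.5",        0.0, 0.29),
    ("DeepSeek-R1 / R1-Zero", 0.3, 0.08),
    ("Claude 3.7 Sonnet",     0.0, 0.12),
    ("Gemini 2.0 Flash",      1.3, 0.004),
]

for name, score_pct, cost_usd in results:
    points_per_dollar = score_pct / cost_usd
    print(f"{name:<24} {score_pct:5.1f}%  ${cost_usd:>8.3f}/task  "
          f"{points_per_dollar:8.1f} pts/$")
```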
The human panel vastly outperformed the large language models (LLMs) and other AI systems evaluated on ARC-AGI-2. But what datasets do the tests use?
Analyzing the datasets
The ARC-AGI-2 benchmark comprises four datasets:
- Training: 1,000 uncalibrated public tasks.
- Public Eval: 120 calibrated public tasks.
- Semi-Private Eval: 120 calibrated private tasks.
- Private Eval: 120 calibrated private tasks.
Tasks are considered calibrated when they are independent and identically distributed (IID) across sets. As TechCrunch details, this calibration ensures that scores across the three 120-task evaluation datasets remain directly comparable.
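To make “directly comparable” concrete: because each evaluation set holds 120 IID tasks, an observed accuracy on any of them estimates the same underlying capability, up to sampling error. Here is a minimal sketch of that sampling error, assuming simple per-task pass/fail scoring (the actual ARC Prize scoring pipeline may differ):

```python
import math

# Minimal sketch, assuming per-task pass/fail scoring; the actual
# ARC Prize pipeline may differ. With IID tasks, accuracy on each
# 120-task evaluation set estimates the same underlying skill.
N_TASKS = 120  # size of each ARC-AGI-2 evaluation set

def score_with_error(n_correct: int, n_tasks: int = N_TASKS):
    """Observed accuracy and its binomial standard error."""
    p = n_correct / n_tasks
    stderr = math.sqrt(p * (1 - p) / n_tasks)
    return p, stderr

p, se = score_with_error(72)  # e.g., 72 of 120 tasks solved
print(f"score = {p:.1%} ± {1.96 * se:.1%} (95% CI)")  # 60.0% ± 8.8%
```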
ARC Prize 2025: The grand prize is $700K
March 2025 also saw the announcement of ARC Prize 2025, a competition built on the ARC-AGI-2 benchmark and its datasets. With a grand prize of $700,000, it challenges AI developers to reach 85% accuracy on ARC-AGI-2’s private evaluation dataset.
To be eligible, competing models can spend no more than $0.42 per task, and they must complete the evaluation without internet access.
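Put concretely, a grand-prize submission must clear the accuracy bar and the budget cap at the same time, while staying offline. The check below is a hypothetical restatement of the rules described in this article, not code from the actual competition harness:

```python
# Hypothetical restatement of the ARC Prize 2025 rules described above;
# the real competition harness and fine print may differ.
GRAND_PRIZE_ACCURACY = 0.85  # required score on the private eval set
MAX_COST_PER_TASK = 0.42     # USD, per-task spending ceiling
N_PRIVATE_TASKS = 120        # size of the private evaluation dataset

def eligible_for_grand_prize(n_correct: int, total_cost_usd: float,
                             used_internet: bool) -> bool:
    accuracy = n_correct / N_PRIVATE_TASKS
    cost_per_task = total_cost_usd / N_PRIVATE_TASKS
    return (accuracy >= GRAND_PRIZE_ACCURACY
            and cost_per_task <= MAX_COST_PER_TASK
            and not used_internet)

# e.g., 105/120 correct (87.5%) for $45 total ($0.375/task), run offline:
print(eligible_for_grand_prize(105, 45.0, used_internet=False))  # True
```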
Those competing in ARC Prize 2025 must have their submissions ready by November 3, 2025. Researchers can also submit whitepapers for a shot at a separate $75,000 prize; the deadline for paper submissions is November 9, 2025.
Whichever AI model scores the highest on the private evaluation dataset will be declared the winner. Paper submissions are evaluated according to a standardized rubric.