Exclusive: This new benchmark could expose AI’s biggest weakness
ARC-AGI-3 tests whether models can reason through novel problems rather than just recall patterns, something even top systems still struggle to do.
The influential AI researcher François Chollet has long argued that the field measures intelligence incorrectly, that popular benchmarks reward a model’s ability ...