AI system’s code optimisation ‘evolution’
A new research paper details an AI system that could change how complex code is developed and optimised. AlphaEvolve, released in May 2025 by Google DeepMind, attempts to combine large language models with automated evaluation at scale to tackle problems in science and engineering.
Like scientific discovery, traditional code optimisation involves lengthy cycles of theorising, testing and refinement. AlphaEvolve aims to accelerate this process through what DeepMind calls "evolutionary computation." The system takes existing code, proposes modifications, and tests these changes against defined metrics, essentially automating the trial-and-error process that characterises much of scientific advancement.
The approach appears to address one of the key challenges with AI in technical fields: reliability. Where large language models often generate plausible but incorrect solutions, AlphaEvolve's automated evaluation process provides objective verification of any proposed improvements.
This verification is central to AlphaEvolve's operation. While Large Language Models (LLMs) provide the creative spark, generating novel code modifications by drawing on their pattern-recognition capabilities, these suggestions are not accepted at face value. Each proposed change is systematically subjected to automated code execution and rigorous performance evaluation. The outcomes of these tests – the objective scores – then directly inform the next cycle of the evolutionary process, ensuring that only genuinely beneficial modifications are propagated and built upon.
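The propose-evaluate-propagate cycle described above can be sketched in a few lines. This is a minimal illustration, not AlphaEvolve's actual implementation: `propose_modification` and `run_and_score` are hypothetical stand-ins for the LLM proposer and the automated evaluator.

```python
import random

def evolve(initial_program, propose_modification, run_and_score, generations=100):
    """Minimal evolutionary loop: a proposer (the LLM's role) suggests
    changes, an automated evaluator scores them, and only changes that
    objectively improve the score are kept and built upon."""
    population = [(initial_program, run_and_score(initial_program))]
    for _ in range(generations):
        parent, parent_score = max(population, key=lambda p: p[1])
        child = propose_modification(parent)   # creative step (LLM suggestion)
        child_score = run_and_score(child)     # objective verification
        if child_score > parent_score:         # propagate only genuine gains
            population.append((child, child_score))
    return max(population, key=lambda p: p[1])

# Toy demonstration: "programs" are just numbers, and fitness is
# closeness to a target value. Seeded for reproducibility.
random.seed(0)
best, score = evolve(
    0.0,
    propose_modification=lambda x: x + random.uniform(-5, 5),
    run_and_score=lambda x: -abs(x - 42.0),
    generations=500,
)
```

In AlphaEvolve the "program" is real code and the evaluator executes it against performance metrics, but the selection logic follows the same shape.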
Early results show promise in areas relevant to academic research:
The research team tested AlphaEvolve against more than 50 open problems in mathematical analysis, geometry, combinatorics, and number theory. They report matching existing solutions in approximately 75 percent of cases, with improvements over previous best-known solutions in 20 percent of cases.
Environmental Science: The principles used by AlphaEvolve to optimise complex systems could be applied to refining climate models or ecological simulations, provided clear evaluation metrics can be established.
Agricultural Science: The system's ability to evolve heuristic functions for optimisation problems might find parallels in optimising resource allocation models or genetic algorithm parameters for crop breeding programs.
Computational Science: AlphaEvolve's reported successes in optimising research code offer more direct parallels. The system's ability to refine algorithms for tasks like matrix multiplication or search heuristics could be relevant to researchers working on complex simulations or data analysis across various scientific disciplines.
However, the system has limitations. It can only be applied to problems where success can be automatically measured and evaluated, ruling out many areas of scientific inquiry that require human judgment or physical experimentation. Additionally, while the system can optimise existing code, it requires careful human oversight to ensure modifications maintain the intended functionality.
Key points
Evolutionary approach: The system uses Large Language Models (LLMs) to propose code modifications. These changes are then automatically executed and scored for performance, driving an iterative process to gradually improve or invent algorithms, mimicking an evolutionary cycle.
Broad optimisation scope: AlphaEvolve can modify entire codebases, not just isolated functions, and is designed to optimise for multiple performance metrics simultaneously by building upon previous successful solutions.
Mathematical discoveries: It has reportedly discovered novel, provably correct algorithms. A notable example is a rank-48 method for multiplying two 4×4 complex-valued matrices, surpassing a 56-year-old benchmark (Strassen's algorithm).
Practical applications: Improved data centre scheduling, reclaiming approximately 0.7% of compute capacity; Reduced LLM training times by about 1% overall, achieved through a ~23% speedup in specific kernel operations; Contributed to optimising TPU (Tensor Processing Unit) circuit designs for area and power efficiency.
Strategic model use: The system employs an asynchronous pipeline, utilising a combination of faster (e.g., Gemini Flash) and more powerful (e.g., Gemini Pro) LLMs. This balances rapid exploration of many ideas with the generation of potentially higher-impact, breakthrough suggestions.
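The matrix-multiplication result can be put in context with a quick count. Schoolbook multiplication of two 4×4 matrices uses 4³ = 64 scalar multiplications; applying Strassen's 2×2 recursion (7 products per halving step) brings this to 7 × 7 = 49, the long-standing benchmark that AlphaEvolve's rank-48 method improves on. The sketch below only counts multiplications; it is not the discovered algorithm itself.

```python
def strassen_mult_count(n):
    """Scalar multiplications used by Strassen's recursion for n x n
    matrices (n a power of two): 7 recursive products per halving."""
    if n == 1:
        return 1
    return 7 * strassen_mult_count(n // 2)

def naive_mult_count(n):
    """Scalar multiplications in schoolbook matrix multiplication."""
    return n ** 3

print(naive_mult_count(4), strassen_mult_count(4))  # 64 49
```

AlphaEvolve's reported method needs 48 multiplications for the 4×4 complex-valued case, one fewer than Strassen's recursive count.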
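The two-tier model strategy can be sketched as a concurrent candidate generator. Everything below is an illustrative placeholder, assuming a fast/cheap tier for breadth and a slow/strong tier for depth; the model stubs and their interfaces are not DeepMind's actual pipeline.

```python
import asyncio
import random

async def fast_model(prompt):
    # Stand-in for a fast, cheaper model (a Flash-class LLM):
    # quick, lower-cost suggestions for broad exploration.
    await asyncio.sleep(0.01)
    return {"idea": f"tweak-{random.randint(0, 99)}", "tier": "fast"}

async def strong_model(prompt):
    # Stand-in for a slower, more capable model (a Pro-class LLM):
    # fewer, potentially higher-impact suggestions.
    await asyncio.sleep(0.05)
    return {"idea": f"redesign-{random.randint(0, 99)}", "tier": "strong"}

async def generate_candidates(prompt, n_fast=8, n_strong=2):
    """Launch both tiers concurrently and gather every suggestion,
    so slow high-quality proposals never block cheap exploration."""
    tasks = [fast_model(prompt) for _ in range(n_fast)]
    tasks += [strong_model(prompt) for _ in range(n_strong)]
    return await asyncio.gather(*tasks)

candidates = asyncio.run(generate_candidates("optimise kernel"))
```

Each candidate would then enter the evaluation loop, where automated scoring, not the model tier that produced it, decides whether it survives.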