Automated Design of Agentic Systems: A New Era in AI Development

Introduction: The Rise of Automated Design of Agentic Systems

Researchers are directing significant effort toward building powerful general-purpose agents on top of Foundation Models (FMs) such as GPT and Claude^[1]. Rather than relying on a single monolithic model call, these agents are usually compound systems that integrate components such as chain-of-thought planning, tool use, and self-reflection^[1]. Yet these designs typically require extensive manual design and tuning by researchers and engineers^[1].

However, history shows that hand-crafted solutions in machine learning are eventually replaced by more efficient, learned solutions^[1]. Building on this premise, this work introduces a new research area called Automated Design of Agentic Systems (ADAS). ADAS aims to automatically invent novel building blocks and optimize entire agentic system designs^[1]. The ultimate goal is to create increasingly powerful agents that outperform state-of-the-art hand-designed solutions^[1].

The Vision of ADAS

ADAS automates the creation of agentic systems by using meta agents. A meta agent iteratively programs better agents in code, leveraging the Turing completeness of programming languages like Python^[1]. Because the agents are defined in code, this approach can in principle learn and discover any agentic system, including novel prompts, control flows, and tool use^[1]. The Meta Agent Search algorithm demonstrates this concept^[1].

Key Components of ADAS

To operationalize ADAS, three components are essential:

Search Space: This defines which agentic systems can be represented. For example, some works optimize text prompts, while others explore graph structures or feed-forward networks^[1].
Search Algorithm: This specifies how ADAS explores the search space. An effective algorithm balances rapid discovery of high-performance systems against the risk of getting stuck in local optima^[1]. Variants include reinforcement learning and iterative generation by FMs.
Evaluation Function: Depending on ADAS's application, this function assesses candidate agents based on various criteria like performance, cost, and latency^[1].
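The three components above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Candidate` type, the weights, and the accuracy, cost, and latency figures are made-up placeholders, not values from the source, and the exhaustive scan stands in for a real search algorithm.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    """One point in a (tiny, hypothetical) search space."""
    name: str
    accuracy: float   # fraction of tasks solved, between 0 and 1
    cost_usd: float   # average cost per task (made-up number)
    latency_s: float  # average seconds per task (made-up number)


def evaluate(c: Candidate, cost_weight: float = 0.1,
             latency_weight: float = 0.01) -> float:
    """Evaluation function: fold performance, cost, and latency into one score."""
    return c.accuracy - cost_weight * c.cost_usd - latency_weight * c.latency_s


def search(space: List[Candidate],
           score: Callable[[Candidate], float]) -> Candidate:
    """Search algorithm: an exhaustive scan, viable only for a finite toy space."""
    return max(space, key=score)


# Search space: three illustrative candidates with invented measurements.
search_space = [
    Candidate("cot", 0.60, 0.02, 1.0),
    Candidate("cot_sc", 0.66, 0.10, 5.0),
    Candidate("debate", 0.68, 0.30, 9.0),
]

best_overall = search(search_space, evaluate)
best_by_accuracy = search(search_space, lambda c: c.accuracy)
print(best_overall.name, best_by_accuracy.name)
```

Changing only the evaluation function changes which candidate wins, which is why the choice of criteria belongs to the problem definition rather than to the search algorithm.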

Meta Agent Search: The Core Algorithm

Table 1 | Performance comparison between Meta Agent Search and state-of-the-art hand-designed agents across multiple domains. Meta Agent Search discovers superior agents compared to the baselines in every domain. We report the test accuracy and the 95% bootstrap confidence interval on held-out test sets. The search is conducted independently for each domain.

Meta Agent Search is one of the initial algorithms within ADAS that operates entirely in a code space. The meta agent iteratively creates new agents, evaluates their performance, adds them to an archive, and uses this archive for subsequent iterations^[1]. By continuously incorporating feedback and refining its approach, the meta agent can build progressively more effective agents. Initial evaluation has shown Meta Agent Search's ability to greatly outperform hand-designed agents across multiple domains, including coding, science, and math^[1].
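The propose, evaluate, and archive loop described above can be sketched in a few lines. This is a toy under stated assumptions, not the authors' implementation: `stub_meta_agent` is a hypothetical stand-in that mutates two integers instead of prompting an FM, the task (recovering `3 * x + 2`) is invented, and the score is a smooth toy metric rather than test accuracy.

```python
import random

# Invented validation set: the "task" is to reproduce y = 3x + 2.
TASKS = [(x, 3 * x + 2) for x in range(10)]


def load_agent(source):
    """Turn generated source code into a callable named `forward`."""
    namespace = {}
    exec(source, namespace)  # generated code is untrusted: sandbox it in practice
    return namespace["forward"]


def evaluate(source):
    """Toy score in (0, 1]; higher is better, 1.0 means every task is solved."""
    agent = load_agent(source)
    mean_abs_error = sum(abs(agent(x) - y) for x, y in TASKS) / len(TASKS)
    return 1.0 / (1.0 + mean_abs_error)


def stub_meta_agent(archive, rng):
    """Hypothetical placeholder for an FM call that reads the archive
    and writes the code of a new agent."""
    best = max(archive, key=lambda entry: entry["score"])
    a = best["params"][0] + rng.choice([-1, 0, 1])
    b = best["params"][1] + rng.choice([-1, 0, 1])
    return (a, b), f"def forward(x):\n    return {a} * x + {b}\n"


def meta_agent_search(n_iterations, seed=0):
    rng = random.Random(seed)
    seed_source = "def forward(x):\n    return 1 * x + 0\n"
    archive = [{"params": (1, 0), "source": seed_source,
                "score": evaluate(seed_source)}]
    for _ in range(n_iterations):
        params, source = stub_meta_agent(archive, rng)  # propose a new agent
        archive.append({"params": params, "source": source,
                        "score": evaluate(source)})     # evaluate and archive it
    return archive


archive = meta_agent_search(n_iterations=30)
best = max(archive, key=lambda entry: entry["score"])
print(best["source"], best["score"])
```

The archive only grows, so the best archived score can never fall below that of the seed agent. The `exec` call is the point where a real system would need isolation, since the code it runs was written by a model.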

Case Study: The ARC Challenge

One demonstration of Meta Agent Search's efficacy is the ARC (Abstraction and Reasoning Corpus) challenge. This task evaluates AI systems' general intelligence by requiring them to learn transformation rules from a few examples and apply them to new inputs^[1].

Setup and Comparison

Figure 4 | An example task from the ARC challenge (Chollet, 2019). Given the input-output grid examples, the AI system is asked to learn the transformation rules and then apply these learned rules to the test grid to predict the final answer.

To address ARC's challenges, the agent writes code for transformation rules instead of direct answers. The experiment involved comparing Meta Agent Search against five state-of-the-art hand-designed agents^[1]:

Chain-of-Thought (COT)
Self-Consistency with Chain-of-Thought (COT-SC)
Self-Refine
LLM Debate
Quality-Diversity

The best agent discovered in these Meta Agent Search runs used a multi-step feedback mechanism: it repeatedly reviewed and refined its candidate answers over several rounds. This process yielded significantly higher predictive accuracy than the baselines^[1].
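To make the idea of writing code for transformation rules concrete, here is a minimal sketch. It assumes a fixed list of hand-written candidate transformations standing in for successive FM attempts; the grids, function names, and the simple check-and-retry loop are illustrative and are not taken from the discovered agent.

```python
from typing import Callable, List, Optional, Tuple

Grid = List[List[int]]
Example = Tuple[Grid, Grid]


def flip_horizontal(grid: Grid) -> Grid:
    return [row[::-1] for row in grid]


def flip_vertical(grid: Grid) -> Grid:
    return grid[::-1]


def transpose(grid: Grid) -> Grid:
    return [list(col) for col in zip(*grid)]


def failures(transform: Callable[[Grid], Grid],
             examples: List[Example]) -> List[int]:
    """Indices of training examples the candidate rule gets wrong."""
    return [i for i, (inp, out) in enumerate(examples) if transform(inp) != out]


def solve(candidates: List[Callable[[Grid], Grid]],
          examples: List[Example], test_input: Grid) -> Optional[Grid]:
    """Try each candidate rule; keep the first that fits every example."""
    for transform in candidates:
        failed = failures(transform, examples)
        if not failed:
            return transform(test_input)
        # In a real agent, `failed` would be fed back to the FM as review feedback.
    return None


# Invented 2x2 task whose hidden rule is a vertical flip.
examples = [
    ([[1, 0], [0, 0]], [[0, 0], [1, 0]]),
    ([[0, 2], [3, 0]], [[3, 0], [0, 2]]),
]
candidates = [flip_horizontal, transpose, flip_vertical]
prediction = solve(candidates, examples, [[5, 6], [7, 8]])
print(prediction)
```

Because the rule is expressed as code, it can be checked against the given input-output pairs before it is applied to the test grid, which is what gives a refinement loop something concrete to react to.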

Beyond ARC: Extending to Multi-domain Challenges

Benchmarks in Reading, Math, and Problem-Solving

Table 4 | Performance across multiple domains when transferring top agents from the Math (MGSM) domain to non-math domains. Agents discovered by Meta Agent Search in the math domain can outperform or match the performance of baselines after being transferred to domains beyond math. We report the test accuracy and the 95% bootstrap confidence interval.

Meta Agent Search was also tested on four popular benchmarks: DROP for reading comprehension, MGSM for multilingual math, MMLU for multi-task problem-solving, and GPQA for advanced science questions^[1]. It consistently discovered high-performing agents in all tested domains, improving on the prior state-of-the-art hand-designed agents by substantial margins^[1].

For example, in reading comprehension tasks, the algorithm improved F1 scores by 13.6/100 points, and in math tasks, accuracy rates increased by 14.4%^[1]. The discovered agents also demonstrated significant robustness, maintaining superior performance when transferred across models and domains^[1].
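The results are reported with 95% bootstrap confidence intervals. The sketch below shows one common way to compute such an interval, the percentile bootstrap over per-question outcomes. Which bootstrap variant the source used is not stated here, so treat the method, the function name `bootstrap_ci`, and the example outcomes (70 correct out of 100) as assumptions.

```python
import random


def bootstrap_ci(outcomes, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap interval for the mean of 0/1 outcomes."""
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample the test set with replacement and record each resample's accuracy.
    means = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    lower_idx = int((1 - confidence) / 2 * n_resamples)
    upper_idx = int((1 + confidence) / 2 * n_resamples) - 1
    return means[lower_idx], means[upper_idx]


# Invented example: 70 correct answers out of 100 test questions.
outcomes = [1] * 70 + [0] * 30
accuracy = sum(outcomes) / len(outcomes)
low, high = bootstrap_ci(outcomes)
print(f"accuracy={accuracy:.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

An interval like this matters when comparing agents: two accuracies whose intervals overlap heavily are weak evidence that one agent is better than the other.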

Generalization and Transferability

Table 2 | Performance on ARC when transferring top agents from GPT-3.5 to other FMs. Agents discovered by Meta Agent Search consistently outperform the baselines across different models. We report the test accuracy and the 95% bootstrap confidence interval. The names of top agents are generated by Meta Agent Search. †We manually changed this name because the original generated name was confusing.

An important aspect of Meta Agent Search is the generalizability of the discovered agents. Experiments showed that agents optimized on one FM, like GPT-3.5, performed well when transferred to other models such as Claude-Sonnet and GPT-4^[1]. This transferability illustrates these agents' robustness and their potential applicability to a wide array of tasks and environments.

Moreover, agents developed in specific domains, such as math, generalized well to non-math domains like reading comprehension and multi-task problem-solving. This ability to adapt and perform across varied areas underscores the broad utility and effective design of ADAS-generated agents^[1].

Future Directions and Safety Considerations

While ADAS promises a fast track to developing advanced agentic systems, it also raises significant safety concerns. There is a pressing need to run untrusted code safely and ensure that the generated agents are honest, helpful, and harmless. Developing sandbox environments and incorporating principles from Constitutional AI might be crucial future steps^[1].

Potential future research areas in ADAS include higher-order ADAS for self-improving meta agents, introducing more existing building blocks into the search space, integrating multi-objective optimization, and developing more sophisticated evaluation functions^[1]. Extending this approach to more complex real-world applications and understanding the emergence of complexity in human organizations are additional avenues^[1].

Conclusion

The Automated Design of Agentic Systems (ADAS) represents an exciting new frontier in AI research. By automating the creation of complex, powerful agents, ADAS offers a promising path toward increasingly efficient and effective AI systems that can significantly advance various real-world applications^[1].

By demonstrating superior performance across diverse domains and proving robust transferability, ADAS underscores the potential to revolutionize AI development from the foundational level up^[1].
