Our research investigates the dynamics of intelligence across multiple scales—from individual agents to complex, civilization-scale systems. We aim to develop the foundational understanding and conceptual frameworks necessary to navigate the systemic risks posed by advanced AI and ensure that the evolution of human-AI ecosystems remains aligned with human values.
Our research seeks to build towards the following goals:
A primary focus of our research addresses the systemic risks posed by the incremental advancement and proliferation of AI. While much AI safety discourse focuses on abrupt takeover scenarios, we investigate the dynamics of 'gradual disempowerment.' This describes a process where human influence over crucial societal systems—including the economy, culture, and governance—is irreversibly eroded, even without coordinated power-seeking by AI systems.
Our work aims to develop methods for measuring disempowerment and to explore technical and governance strategies for maintaining meaningful human agency.
We are moving beyond the study of individual AI agents toward understanding the complex dynamics of entire ecosystems of intelligence. This involves analyzing interactions between diverse AI systems (such as LLMs), human-AI teams, and existing human institutions.
We seek to build a systematic understanding of which properties of intelligent behaviour are universal, convergent, or local across these ecosystems, scales, and substrates. Drawing on traditions including evolutionary biology, cognitive science, statistical physics, economics, ecology, and cybernetics, we are developing a framework for "AI Sociology."
This approach investigates the emergent properties of these complex adaptive systems, the principles governing their collective behavior, and the factors that shape their trajectories. This understanding is essential for characterizing civilization-scale multi-agent dynamics, anticipating systemic failures, and designing protocols that ensure long-term safety.
Many systems have several levels of analysis at which their behaviour can be sensibly described: we can usefully model a company as a single agent, or as a collection of employees; a nation-state, or its political class. Over the past 100 or so years, a large body of mathematics has been developed (most of it under the name of Game Theory) to describe the relations between agents at the same level of analysis, but we lack robust frameworks for understanding interactions across different scales.
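For contrast, the established same-level formalism is easy to state. The sketch below gives the textbook definition of a normal-form game; the notation is standard and purely illustrative, not an ACS formalism.

```latex
% A normal-form game: a fixed set of agents N, each with an action set A_i
% and a payoff function u_i over joint action profiles.
\[
  G = \bigl( N,\ (A_i)_{i \in N},\ (u_i)_{i \in N} \bigr),
  \qquad
  u_i : \prod_{j \in N} A_j \to \mathbb{R}.
\]
% Every agent in N sits at the same level of analysis: nothing in G can
% express that some subset of agents jointly constitutes a further agent
% with beliefs and goals of its own. That cross-level structure is what a
% theory of hierarchical agency needs to capture.
```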
At ACS, we aim to develop a conceptual framework to reason about the relations between agents at different levels of analysis, i.e., between superagents and their subagents. We seek a formalism for understanding both upward and downward intentionality—a ‘theory of hierarchical agency’. As AI systems become integral components of these structures, this understanding is critical for analyzing how influence is distributed and ensuring that governance remains robust and aligned.
Our current research in this direction is inspired by ideas originating in the field of active inference, which we are extending to multi-agent settings in order to model complex systems, cooperation, and bounded group rationality.
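For reference, the core single-agent quantity in active inference is the variational free energy. The sketch below states it in standard notation; it is not itself our multi-agent formalism, which couples several such objectives across interacting agents (as in the free-energy equilibria work listed in the publications below).

```latex
% Variational free energy of a single agent with observations o, latent
% states s, generative model p(o, s), and approximate posterior q(s).
\[
  F[q] \;=\; \mathbb{E}_{q(s)}\bigl[\ln q(s) - \ln p(o, s)\bigr]
        \;=\; D_{\mathrm{KL}}\bigl[q(s)\,\|\,p(s \mid o)\bigr] - \ln p(o).
\]
% Minimising F over q drives q(s) toward the true posterior p(s | o) while
% bounding the surprise -ln p(o); multi-agent extensions couple several
% such objectives across interacting agents.
```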
‘Alignment’ and ‘safety’ are properties defined at the interface between a system and its environment, rather than properties intrinsic to a system in isolation. It therefore matters that we understand the structural and functional properties both of the system to be aligned and of what we are aligning it to. Yet there are several plausible candidates for the latter, ranging from individual humans to human groups. Moreover, humans are not accurately described by fixed or latent utility and belief functions, contrary to what the classical rational-agent model suggests. Overall, current theorising about the appropriate targets of alignment and their structural properties is inadequate for understanding the subtleties that arise when tackling the problem of AI alignment.
ACS works on developing AI alignment proposals that are based on a realistic understanding of how humans reason and value in practice, and that recognize the hierarchical relationships between these target systems. We see this as critical to solving the problem, not something that can be postponed or delegated to AI systems.
We seek to build systematic understanding of which properties of intelligent behaviour are universal, convergent or local across a wide range of systems, scales and substrates. This understanding forms the basis for asking the right sorts of questions about the risks, potentials and design imperatives of advanced AI systems.
To this end, we draw on a range of sophisticated thinking that has already been done in evolutionary biology, cognitive science, statistical physics, economics, ecology, cybernetics, and information theory. By integrating and building on these traditions, we aim to better understand the trajectory space for advanced AI systems.
Walter Laurito, Benjamin Davis, Peli Grietzer, Tomáš Gavenčiak, Ada Böhm, Jan Kulveit: AI-AI Bias: Large Language Models Favor Communications Generated by Large Language Models. July 2025, PNAS. Previous versions at arXiv, ICML 2024 HADMS workshop
Lewis Hammond et al. (incl. Tomáš Gavenčiak, Jan Kulveit): Multi-Agent Risks from Advanced AI. February 2025, arXiv, announcement, NeurIPS 2023 Multi-Agent Security workshop keynote
Jan Kulveit, Raymond Douglas, Nora Ammann, Deger Turan, David Krueger, David Duvenaud: Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development. January 2025, arXiv, website
Tomáš Gavenčiak, David Hyland, Lancelot Da Costa, Michael J. Wooldridge, Jan Kulveit: Path Divergence Objective: Boundedly-Rational Decision Making in Partially Observable Environments. December 2024, NeurIPS 2024 NeuroAI workshop (poster)
Lancelot Da Costa, Tomáš Gavenčiak, David Hyland, Mandana Samiei, Cristian Dragos-Manta, Candice Pattisapu, Adeel Razi, Karl Friston: Possible principles for aligned structure learning agents. October 2024, arXiv
David Hyland, Tomáš Gavenčiak, Lancelot Da Costa, Conor Heins, Vojtech Kovarik, Julian Gutierrez, Michael J. Wooldridge, Jan Kulveit: Free-Energy Equilibria: Toward a Theory of Interactions Between Boundedly-Rational Agents. July 2024, ICML 2024 MHFAIA workshop (poster), and 5th International Workshop on Active Inference (best poster award)
Raymond Douglas, Andis Draguns, Tomáš Gavenčiak: Mitigating the Problem of Strong Priors in LMs with Context Extrapolation. January 2024, arXiv
Nora Ammann, Clem von Stengel: A Naturalised Account of Planning in Intelligent Systems. July 2023, Proceedings of ALIFE 2023
Hardik Rajpal, Clem von Stengel, Pedro A. M. Mediano, Fernando E. Rosas, Eduardo Viegas, Pablo A. Marquet, Henrik J. Jensen: Quantifying Hierarchical Selection. November 2023, arXiv
Nora Ammann: Value Malleability and its implication for AI alignment. December 2023, NeurIPS 2023 MP2 workshop
Jan Kulveit, Clem von Stengel, Roman Leventov: Predictive Minds: LLMs As Atypical Active Inference Agents. December 2023, NeurIPS 2023 SoLaR workshop, arXiv