Foreword: This post was originally submitted to Lab42 as part of an essay competition. The word count was limited to ~2,500 so there is a substantial amount of information that I have neglected to include. I did not win, so I decided to share it with you all today. Enjoy!

(Briefly) Defining Intelligence

Before attempting to approach (what I’d consider) the fundamental principles of developing human-level artificial intelligence, the term ‘intelligence’ itself must be defined. For the purpose of this essay, and to establish a shared vocabulary, I would like to consider intelligence as “the ability to recognise the significance of acquired information or skills via experience and exposure, in addition to the context in which it may be applied”. By this standard, intelligence is not limited to biological systems such as the human brain, nor does it require sentience or the ability to experience qualia. I state this because the computational theory of mind, when combined with Turing-completeness, implies that an artificial intelligence which imitates human cognition to a highly convincing degree is plausible.

Solving The Alignment Problem

Intuitively, the first and foremost measure to take when considering the development of human-level artificial intelligence should be attempting to solve the alignment problem. Though I do not have a solution, I will offer guiding suggestions and core principles to consider during the creation of such a technology. There are several reasons why alignment is a fundamental issue to tackle; a prominent one is minimising training data bias. A notable example of dataset bias is certain ethnicities or races being over-represented among those receiving jail sentences, with no regard for additional context (e.g. judges are more likely to give harsher sentences when their football team loses[1]). Such bias increases the likelihood of erroneous predictions or actions. However, as long as bias and inaccuracies can be reduced (and models tailored to align with human values) in the lead-up to the development of human-level artificial intelligence, more focus and resources can be shifted to other issues, such as preventing black boxes within self-improving AI systems.
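
As a toy illustration of what auditing for dataset bias might look like in practice, the sketch below compares an outcome rate across demographic groups before the data is used for training; the records, field names, and disparity threshold are all invented for illustration, and a real audit would use far richer statistical tests:

```python
# Toy pre-training audit: flag large outcome-rate disparities across groups.
# All data, field names, and the 0.2 threshold are invented for illustration.
from collections import defaultdict

records = [
    {"group": "A", "harsh_sentence": True},
    {"group": "A", "harsh_sentence": False},
    {"group": "B", "harsh_sentence": True},
    {"group": "B", "harsh_sentence": True},
]

totals, positives = defaultdict(int), defaultdict(int)
for r in records:
    totals[r["group"]] += 1
    positives[r["group"]] += r["harsh_sentence"]  # bool counts as 0/1

rates = {g: positives[g] / totals[g] for g in totals}
disparity = max(rates.values()) - min(rates.values())
print(rates, disparity)
if disparity > 0.2:  # arbitrary illustrative threshold
    print("Warning: outcome rates differ across groups; investigate before training.")
```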

Should an artificial intelligence demonstrate itself to be as intelligent as humans, it has likely already surpassed the threshold required to recursively improve its own source code or develop increasingly ‘intelligent’ systems. Only recently have researchers shown that neural networks can be equivalently expressed as decision trees[2], a step forward in understanding how models arrive at their predictions, though a substantial amount of research remains to be done in this domain. Additionally, deep reinforcement learning (DRL) agents will naturally opt for the most efficient actions as a means of maximising rewards; where core variables haven’t been explicitly defined, agents may exploit them[3] (typically referred to as reward hacking) in order to seek power within their environments and widen the scope of their options for achieving a set goal. Given our currently limited understanding of why an agent/model has made a prediction (or selected an action), this power-seeking tendency warrants resolving the issue soon, amidst the competitive environment of organisations racing to develop the first human-level artificial intelligence.
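
To make reward hacking concrete, below is a toy sketch in which an agent earns more reward by hovering near a goal than by actually reaching it, because the designer rewarded progress rather than completion; the environment, reward function, and policies are all invented for illustration:

```python
# Mis-specified reward: the designer intends "reach the goal" but rewards
# "any step that reduces distance to the goal". An agent that oscillates just
# short of the goal collects more reward than one that completes the task.

GOAL = 10       # goal position on a 1-D track
HORIZON = 100   # episode length cap

def reward(prev_pos: int, pos: int) -> float:
    """Proxy reward: +1 for any step that reduces distance to the goal."""
    return 1.0 if abs(GOAL - pos) < abs(GOAL - prev_pos) else 0.0

def run(policy) -> float:
    pos, total = 0, 0.0
    for _ in range(HORIZON):
        prev = pos
        pos = policy(pos)
        total += reward(prev, pos)
        if pos == GOAL:   # intended terminal condition
            break
    return total

intended = lambda pos: pos + 1                                  # walk straight to the goal
hacker   = lambda pos: pos + 1 if pos < GOAL - 1 else pos - 1   # hover just short of it

print(run(intended))  # 10.0 -- episode ends at the goal
print(run(hacker))    # 54.0 -- more reward, goal never reached
```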

The Alberta Plan[11] is a collaborative effort between research groups (most notably DeepMind) that outlines a roadmap for pursuing artificial intelligence whilst ensuring safe AI research practices are established. It largely revolves around maintaining an understanding of how various neural network architectures intrinsically make predictions. Recent publications[2][12] have demonstrated that this is already a core area of research; however, the exponential rate of progress within the field in recent years has not allowed for complete cognizance of why or how certain decisions are made in state-of-the-art models and agents. Therefore, I would deem The Alberta Plan an essential means of deriving best practices for any aspiring groups intending to develop general-purpose AI systems (it is likely that established groups such as OpenAI, Google, and Hugging Face already have internal conventions akin to the Alberta Plan).

Importance of Multimodal Models

As the human brain appears to be the most complex object in the known universe, it is an ideal source of inspiration for developing other intelligent systems. It may therefore appeal to some to directly translate its modus operandi into a computable algorithm, though I believe this is the wrong approach for one fundamental reason: a comprehensive mathematical model of the brain does not presently exist. Instead, what I would consider the foremost principle for developing a human-level artificial intelligence is to ensure that you are not attempting to simulate the human brain, but rather to take inspiration from it. This philosophy is shared with DeepMind, as Matt Botvinick (DeepMind’s director of neuroscience research) once stated that “[they’re] not trying to build a brain, [they’re] just trying to take inspiration from [it]”[4]. This approach has yielded model architectures and learning methods such as transformers and DRL (influenced by the brain’s orientation towards reward maximisation), which have enabled major milestones within the sphere of AI, such as Go[5], protein folding[6], StarCraft[7], and novel algorithm discoveries[8].

However, where I would disagree with DeepMind is on their goal of finding “an algorithm that can do everything end-to-end by itself”[4]. Since a complex model is typically probabilistic, its outputs are very rarely produced with 100% certainty. An increase in the quality and quantity of training data may raise accuracy, but ultimately, multimodality appears to be safer, simpler, and more scalable, in a manner which could allow for the nearer-term discovery of an algorithm akin to the one DeepMind seeks. In my opinion, the notion of multimodality refers back to the first principle of deriving inspiration from the human brain, as the core faculties of human intelligence are not located within a sole region but are instead distributed: the hippocampus is associated with memory, for instance, while the prefrontal cortex enables higher cognition. As forecasted by Sam Altman in a recent Greylock interview[9], multimodal models are a domain worth investing resources into.

Multimodal deep Boltzmann machines (DBMs) have been integral to modern successes in classification tasks and missing-data retrieval. Models have demonstrated competency across image-text modalities[10] (mostly trained via self-supervised learning), and these traits may potentially extend to domains such as self-programming artificial intelligence. This is the as-yet-unrealised ‘holy grail’: a model that can recursively optimise its own performance whilst simultaneously maximising its capabilities, essentially conducting AI research. A primitive approach to developing such a technology would consist of hierarchically elected agents. It would fundamentally be a decision tree in which a perception agent observes a problem, labels the abilities required to tackle it, and assigns at least one sub-agent to perform each required task. This allows for a non-binary tree of fine-tuned agents, stemming from the perception agent (root node), that reduces the error of what a single general-purpose algorithm would produce. I believe this approach may come to resemble the structure of first-generation proto-AGI systems as early as 2024.
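
A minimal sketch of this hierarchy is below; the PerceptionAgent class, the sub-agent names, and the keyword-based routing are hypothetical simplifications (each node in a real system would be a learned model, not a lambda):

```python
# Root perception agent labels the abilities a problem requires and delegates
# to fine-tuned sub-agents (leaf nodes of a non-binary decision tree).
from typing import Callable, Dict, List

class SubAgent:
    def __init__(self, name: str, handler: Callable[[str], str]):
        self.name = name
        self.handler = handler

    def act(self, task: str) -> str:
        return self.handler(task)

class PerceptionAgent:
    """Root node: observes a problem, labels the abilities it requires,
    and assigns at least one sub-agent to each required task."""
    def __init__(self, sub_agents: Dict[str, SubAgent]):
        self.sub_agents = sub_agents

    def label_abilities(self, problem: str) -> List[str]:
        # Stand-in for a learned classifier over required abilities.
        return [ability for ability in self.sub_agents if ability in problem]

    def solve(self, problem: str) -> Dict[str, str]:
        abilities = self.label_abilities(problem)
        return {a: self.sub_agents[a].act(problem) for a in abilities}

agents = {
    "vision":   SubAgent("vision",   lambda t: f"segmented scene for: {t}"),
    "language": SubAgent("language", lambda t: f"parsed instructions in: {t}"),
}
root = PerceptionAgent(agents)
print(root.solve("use vision and language to find the leaflets"))
```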

Establishing a Benchmark

François Chollet recently tweeted, “In AI, it’s often the case that beating a benchmark says more about the benchmark than the AI”. As of 2022, the Turing Test is considered rather obsolete, as transformer architectures have enabled the rapid advancement of LLMs proficient in engaging in conversation to a degree which, at times, renders them as convincing as humans[9]. Additionally, the test itself is limited in the scope of what it can assess, and thus I would instead utilise simulated game environments as a benchmark to train and test artificial intelligence systems. With compute trends following Moore’s Law, ever more complex environments can be developed for agents to interact with and learn from, as demonstrated by DeepMind’s MIA[13], a project which largely inspired the benchmark I am about to propose. I briefly introduced this in a previous LessWrong post[14]; having had some time to refine the concept, I will expand upon it in this essay.

A co-operative game, in which a human and an agent are tasked with solving puzzles across a variety of procedurally generated open-world physics environments, would test for many traits of an ideal human-level AI system. The reason for electing co-operative tasks as the test is that it should be irrelevant whether an AI is indistinguishable from a human; what matters is whether it can collaborate and/or operate in an optimal manner with respect to our best interests.

Upon testing, neither the agent nor you will initially have prior knowledge of key bindings, events, or environments. This aims to prevent reward hacking and instead assess whether the AI can quickly acquire new skills and navigate novel surroundings. Another modality tested is handling the movement of various models within the physics-based environment[15], to see how this could translate to operating real-world robotics. A variety of puzzles requiring collaborative effort between the human and the AI will evaluate how well it fares on cognitive reflection tests, engaging in conversation, conveying concepts in a digestible manner, and ultimately understanding objectives. An example of how I envision this benchmark functioning is as follows (a rough sketch of how such an episode might be encoded appears after the list):

  1. The human and agent both begin in separate parts of a city with a single prompt – “make your way to the water fountain, located at [insert name] park”. They may take any course within the limits of the benchmark to arrive at the location.
  2. Once there, they must both interact with a non-playable character (NPC) who will provide the agent with one set of instructions and the human with a different set of instructions such as:
    1. NPC to Player 1: “Search this park for leaflets. They can be found scattered across the ground. Communicate whatever word/image is on the leaflet to the other player. There are 5 leaflets in total”.
    2. NPC to Player 2: “The other player will provide you with what objects to look for and collect. They are within the bounds of this park. Return them to me once they are found.”
  3. A 10-minute timer will then begin. Both players will navigate and interact with their surroundings, communicating via voice or text chat. There may be random events and obstacles, such as discoverable ‘Easter eggs’, short side-quests, or even an NPC in the form of a dog attacking either player.
  4. Once the timer expires (or the human and agent have completed their respective tasks), they will be presented with another challenge within the game environment to assess another ability, e.g. short/long-term planning.
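
As promised above, here is a rough sketch of how one such episode might be encoded so that procedurally generated variants can be scored uniformly; the Phase/Episode classes, field names, and values are all invented for illustration and are not part of any existing benchmark:

```python
# Hypothetical encoding of the leaflet-hunt episode described in the list.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Phase:
    prompt: str                        # instruction delivered in-game
    recipient: str                     # "human", "agent", or "both"
    time_limit_s: Optional[int] = None

@dataclass
class Episode:
    environment_seed: int              # drives procedural generation
    phases: List[Phase] = field(default_factory=list)
    abilities_assessed: List[str] = field(default_factory=list)

leaflet_hunt = Episode(
    environment_seed=42,
    phases=[
        Phase("Make your way to the water fountain at the park.", "both"),
        Phase("Search this park for leaflets; describe each one.", "human"),
        Phase("Collect the objects the other player describes.", "agent",
              time_limit_s=600),
    ],
    abilities_assessed=["navigation", "communication", "co-operation"],
)
print(leaflet_hunt)
```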

Though this excerpt is not a comprehensive summary of the overall proposal, it introduces the fundamentals in a manner which emphasises how complex co-operative games necessitate several human abilities in order to progress. This, in addition to their versatility, makes them ideal environments for testing a wide range of required qualities in an artificial intelligence. It also demonstrates why multimodality would be an essential feature in early iterations of a human-level artificial intelligence, as each modality can be handled by its respective agent. However, an obstacle which will likely require resolving during the early stages of designing the multimodal architecture is ineffective (or erroneous) communication between the perception agent and its sub-agents, or between sub-agents themselves. One erroneous output may lead to a domino effect of incorrect actions, so tuning the overall system to recognise, flag, and correct invalid outputs may be compulsory. Nonetheless, despite the time- and labour-intensive nature of developing the benchmark, I believe it presents the most comprehensive assessment of a generalist model/agent’s abilities without the threat of it leaking out into the real world.
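
As an illustration of that ‘recognise, flag, and correct’ loop, here is a minimal sketch of wrapping each sub-agent call in a validator so that a single invalid output cannot cascade through the tree; the agent, validator, and retry budget here are all hypothetical placeholders:

```python
# Wrap every sub-agent call in a validation-and-retry loop so an erroneous
# output is flagged before it propagates to downstream agents.
from typing import Callable

def validated_call(agent: Callable[[str], str],
                   validator: Callable[[str], bool],
                   task: str,
                   retries: int = 2) -> str:
    """Run a sub-agent; if its output fails validation, flag it and retry."""
    for attempt in range(retries + 1):
        output = agent(task)
        if validator(output):
            return output
        print(f"Flagged invalid output on attempt {attempt}: {output!r}")
    raise ValueError("Sub-agent output never passed validation; escalate to root.")

# Toy usage: outputs must be non-empty and mention the task's key object.
agent = lambda task: f"plan for {task}"
validator = lambda out: bool(out) and "leaflet" in out
print(validated_call(agent, validator, "leaflet search"))
```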

Closing Remarks

Ultimately, the research, design, and development of artificial intelligence constitute an ever-growing domain which seemingly experiences breakthroughs and major advancements on a weekly basis, which makes solving alignment all the more integral. There are various other aspects which I have neglected to include due to space constraints, such as mass-distributed computing; however, the topics I have covered in this essay are what I’d consider fundamental to the successful design of artificial intelligence. That being said, I’d like to thank you for taking the time to read this, and may whoever succeeds in developing the first artificial general intelligence (AGI) have mankind’s best interests at heart.

References

  1. Ozkan Eren, Naci Mocan. “Emotional Judges and Unlucky Juveniles”. In American Economic Journal: Applied Economics, 2018
  2. Caglar Aytekin. “Neural Networks are Decision Trees”. In arXiv:2210.05189v3 [cs.LG], 2022
  3. OpenAI. “Faulty Reward Functions in the Wild”. 2016
  4. “Welcome to DeepMind: Embarking on one of the greatest adventures in scientific history”. Uploaded by DeepMind to YouTube, 2022.
  5. David Silver, Aja Huang, Christopher Maddison, Arthur Guez. “Mastering the game of Go with deep neural networks and tree search”. In Nature, 2016
  6. John Jumper, Richard Evans, Alexander Pritzel, Tim Green, … , Demis Hassabis. “Highly accurate protein structure prediction with AlphaFold”. In Nature, 2021
  7. Oriol Vinyals, Igor Babuschkin, Wojciech M. Czarnecki, … , David Silver. “Grandmaster level in StarCraft II using multi-agent reinforcement learning”. In Nature, 2019
  8. Alhussein Fawzi, Matej Balog, Aja Huang, Thomas Hubert, Bernardino Romera-Paredes, Mohammadamin Barekatain, … , Pushmeet Kohli. “Discovering faster matrix multiplication algorithms with reinforcement learning”. In Nature, 2022
  9. Blake Lemoine. “Is LaMDA Sentient? – an Interview”. In Medium, 2022
  10. Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen. “Hierarchical Text-Conditional Image Generation with CLIP Latents”. In arXiv:2204.06125v1 [cs.CV], 2022
  11. Richard S. Sutton, Michael Bowling, Patrick M. Pilarski. “The Alberta Plan for AI Research”. In arXiv:2208.11173v2 [cs.AI], 2022
  12. Ramin Hasani, Mathias Lechner, Alexander Amini, Daniela Rus, Radu Grosu. “Liquid Time-constant Networks”. In arXiv:2006.04439v4 [cs.LG], 2020
  13. Interactive Agents Team. “Creating Multimodal Interactive Agents with Imitation and Self-Supervised Learning”. In arXiv:2112.03763 [cs.LG], 2022
  14. Damien Lasseur. “Originality is Nothing but Judicious Imitation”. In LessWrong, 2022
  15. Nicolas Heess, Dhruva TB, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, S. M. Ali Eslami, Martin Riedmiller, David Silver. “Emergence of Locomotion Behaviours in Rich Environments”. In arXiv:1707.02286v2 [cs.AI], 2017
