This is a special post for short-form writing by Prometheus. Only they can create top-level comments. Comments here also appear on the Shortform Page and All Posts page.
Can humans become Sacred?
On 12 September 1940, the entrance to the Lascaux Cave was discovered on the La Rochefoucauld-Montbel lands by 18-year-old Marcel Ravidat when his dog, Robot, investigated a hole left by an uprooted tree (Ravidat would embellish the story in later retellings, saying Robot had fallen into the cave).[8][9] Ravidat returned to the scene with three friends, Jacques Marsal, Georges Agnel, and Simon Coencas. They entered the cave through a 15-metre-deep (50-foot) shaft that they believed might be a legendary secret passage to the nearby Lascaux Manor.[9][10][11] The teenagers discovered that the cave walls were covered with depictions of animals.[12][13] Galleries that suggest continuity, context or simply represent a cavern were given names. Those include the Hall of the Bulls, the Passageway, the Shaft, the Nave, the Apse, and the Chamber of Felines. They returned along with the Abbé Henri Breuil on 21 September 1940; Breuil would make many sketches of the cave, some of which are used as study material today due to the extreme degradation of many of the paintings. Breuil was accompanied by Denis Peyrony, curator of Les Eyzies (Prehistory Museum) at Les Eyzies, Jean Bouyssonie and Dr Cheynier.
The cave complex was opened to the public on 14 July 1948, and initial archaeological investigations began a year later, focusing on the Shaft. By 1955, carbon dioxide, heat, humidity, and other contaminants produced by 1,200 visitors per day had visibly damaged the paintings. As air conditions deteriorated, fungi and lichen increasingly infested the walls. Consequently, the cave was closed to the public in 1963, the paintings were restored to their original state, and a daily monitoring system was introduced.
Lascaux II, an exact copy of the Great Hall of the Bulls and the Painted Gallery, was displayed at the Grand Palais in Paris, before being displayed from 1983 in the cave's vicinity (about 200 m or 660 ft away from the original cave), a compromise and attempt to present an impression of the paintings' scale and composition for the public without harming the originals.[10][13] A full range of Lascaux's parietal art is presented a few kilometres from the site at the Centre of Prehistoric Art, Le Parc du Thot, where there are also live animals representing ice-age fauna.[14]
The paintings for this site were duplicated with the same type of materials (such as iron oxide, charcoal, and ochre) which were believed to be used 19,000 years ago.[9][15][16][17] Other facsimiles of Lascaux have also been produced over the years.
They have also created additional copies, Lascaux III, Lascaux IV, and Lascaux V.
“I actually find it overwhelmingly hopeful, that four teenagers and a dog named Robot discovered a cave with 17,000-year-old handprints, that the cave was so overwhelmingly beautiful that two of those teenagers devoted themselves to its protection. And that when we humans became a danger to that cave's beauty, we agreed to stop going. Lascaux is there. You cannot visit.”
-John Green
People preserve the remains of Lucy and work hard to preserve old books; the Mona Lisa is protected under bullet-proof glass and is not up for sale.
What is the mechanistic reason for this? There are perfect copies of these things, yet humans go to great lengths to preserve the original. Why is there the Sacred?
They have created copies of Lascaux, yet still work hard to preserve the original. Humans cannot enter. They get no experience of joy from visiting. It is not for sale. Yet they strongly desire to protect it, because it is the original, and for no other reason.
Robin Hanson gave a list of characteristics of the Sacred, some of which I find promising:
Sacred things are highly (or lowly) valued. We revere, respect, & prioritize them.
Sacred is big, powerful, extraordinary. We fear, submit, & see it as larger than ourselves.
We want the sacred “for itself”, rather than as a means to get other things.
Sacred makes us feel less big, distinct, independent, in control, competitive, entitled.
Sacred quiets feelings of: doubts, anxiety, ego, self-criticism, status-consciousness.
We get emotionally attached to the sacred; our stance re it is oft part of our identity.
We desire to connect with the sacred, and to be more associated with it.
Sacred things are sharply set apart and distinguished from the ordinary, mundane.
Re sacred, we fear a slippery slope, so that any compromise leads to losing it all.
If we can understand the sacred, it seems like a concept that probably wouldn’t fall into a simple utility function, something that wouldn’t break out-of-distribution. A kind of Sacred Human Value Shard, something that protects our part of the manifold.
A practical reason for preserving the original is that new techniques can allow new things to be discovered about it. A copy can embody no more than the observations that we have already made.
There's no point to analysing the pigments in a modern copy of a painting, or carbon-dating its frame.
If we could somehow establish how information from the original was extracted, do you expect humans to then destroy the original or allow it to be destroyed?
No. The original is a historical document that may have further secrets to be revealed by methods yet to be invented. A copy says of the original only what was put into it.
Only recently an ancient, charred scroll was first read.
I think you're missing the point. If we could establish that all important information had been extracted from the original, would you expect humans to then destroy the original or allow it to be destroyed?
My guess is that they wouldn't. Which I think means practicality is not the central reason why humans do this.
I think you’re missing my point, which is that we cannot establish that.
Yes, I’m questioning your hypothetical. I always question hypotheticals.
The following is a conversation between myself in 2022, and a newer version of me earlier this year.
On the Nature of Intelligence and its "True Name":
2022 Me: This has become less obvious to me as I’ve tried to gain a better understanding of what general intelligence is. Until recently, I always made the assumption that intelligence and agency were the same thing. But General Intelligence, or G, might not be agentic. Agents that behave like RLs may only be narrow forms of intelligence, without generalizability. G might be something closer to a simulator. From my very naive perception of neuroscience, it could be that we (our intelligence) are not agentic, but just simulate agents. In this situation, the prefrontal cortex not only runs simulations to predict its next sensory input, but might also run simulations to predict inputs from other parts of the brain. In this scenario, “desire” or “goals” might be simulations to better predict narrowly-intelligent agentic optimizers. Though the simulator might be myopic, I think this prediction model allows for non-myopic behavior, in a similar way to how GPT has non-myopic behavior, despite only trying to predict the next token (it has an understanding of where a future word “should” be within the context of a sentence, paragraph, or story). I think this model of G allows for the appearance of intelligent goal-seeking behavior, long-term planning, and self-awareness. I have yet to find another model for G that allows for all three. The True Name of G might be Algorithm Optimized To Reduce Predictive Loss.
2023 Me: Interesting, Me’22, but let me ask you something: you seem to think this majestic ‘G’ is something humans have, but other species do not, and then name the True Name of ‘G’ to be Algorithm Optimized To Reduce Predictive Loss. Do you *really* think other animals don’t do this? How long is a cat going to survive if it can’t predict where it’s going to land? Or where the mouse’s trajectory is heading? Did you think it was all somehow hardcoded in? But cats can jump up on tables, and those weren’t in the ancestral environment; there’s clearly some kind of generalized form of prediction occurring. Try formulating that answer again, but taboo “intelligence”, “G”, “agent”, “desire”, and “goal”. I think the coherence of it breaks down.
Now, what does Me’23 think? Well, I’m going to take a leaf from my own book, and try to explain what I think without the words mentioned above. There are predictive mechanisms in the Universe that can run models of what things in the Universe might do in future states. Some of these predictive mechanisms are more computationally efficient than others. Some will be more effective than others. A more effective and efficient predictive mechanism, with a large input of information about the Universe, could be a very powerful tool. If taken to the theoretical (not physical) extreme, that predictive mechanism would hold models of all possible future states. It could then, by accident or intention, guide outcomes toward certain future states over others.
2022 Me: According to this model, humans dream because the simulator is now making predictions without sensory input, gradually creating a bigger and bigger gap from reality. Evidence to support this is from sensory-deprivation tanks, where humans, despite being awake, have dream-like states. I also find it interesting that people who exhibit schizophrenia, which involves hallucinations (like dreams do), can tickle themselves. Most people can be tickled by others, but not themselves. But normal people on LSD can do this, and also can have hallucinations. My harebrained theory is that something is going wrong when initializing new tokens for the simulator, which results in hallucinations from the lack of correction from sensory input, and a less strong sense of self because of a lack of correction from RL agents in other parts of the brain.
2023 Me: I don’t want to endorse crackpot theories from Me’22, so I’m just going to speak from feelings and fuzzy intuitions here. I will say hallucinations from chatbots are interesting. When getting one to hallucinate, it seems to be kind of “making up reality as it goes along”. You say it’s a Unicorn, and it will start coming up with explanations for why it’s a Unicorn. You say it told you something it never told you, and it will start acting as though it did. I have to admit it does have a strange resemblance to dreams. I find myself in New York, but remember that I had been in Thailand that morning, and I hallucinate a memory of boarding a plane. I wonder where I got the plane ticket, and I hallucinate another memory of buying one. These are not well-reasoned arguments, though, so I hope Me’24 won’t beat me up too much about them.
2022 Me: I have been searching for how to test this theory. One interest of mine has been mirrors.
2023 Me: Don’t listen to Me’22 on this one. He thought he understood something, but he didn’t. Yes, the mirror thing is interesting in animals, but it’s probably a different thing entirely.
The following is a conversation between myself in 2022, and a newer version of myself earlier this year.
On AI Governance and Public Policy:
2022 Me: I think we will have to tread extremely lightly with, or, if possible, avoid completely. One particular concern is the idea of gaining public support. Many countries have an interest in pleasing their constituents, so if executed well, this could be extremely beneficial. However, it runs a high risk of doing far more damage. One major concern is the different mindset needed to conceptualize the problem. Alerting people to the dangers of Nuclear War is easier: nukes have been detonated, the visual image of incineration is easy to imagine and can be described in detail, and they or their parents have likely lived through nuclear drills in school. This is closer to trying to explain to someone the dangers of nuclear war before Hiroshima, before the Manhattan Project, and before even TNT was developed. They have to conceptualize what an explosion even is, not simply imagine an explosion at greater scale. Most people will simply not have the time or the will to try to grasp this problem, so this runs the risk of having people calling for action on a problem they do not understand, which will likely lead to dismissal by AI Researchers, and possibly short-sighted policies that don’t actually tackle the problem, or even make the problem worse by having the guise of accomplishment. To make matters worse, there is the risk of polarization. Almost any concern with political implications that has gained widespread public attention runs a high risk of becoming polarized. We are still dealing with the ramifications of well-intentioned, but misguided, early advocates in the Climate Change movement two decades ago, who sowed the seeds for making climate policy part of one’s political identity. This could be even more detrimental than a merely uninformed electorate, as it might push people who had no previous opinion on AI to advocate strongly in favor of capabilities acceleration, and to be staunchly against any form of safety policy.
Even if executed using the utmost caution, this does not stop other players from using their own power or influence to hijack the movement and lead it astray.
2023 Me: Ah, Me’22, the things you don’t know! Many of the concerns of Me’22 I think are still valid, but we’re experiencing what chess players might call a “forced move”. People are starting to become alarmed, regardless of what we say or do, so steering that in a direction we want is necessary. The fire alarm is being pulled, regardless, and if we don’t try to show some leadership in that regard, we risk less informed voices and blanket solutions winning out. The good news is “serious” people are going on “serious” platforms and actually talking about x-risk. Other good news is that, from current polls, people are very receptive to concerns over x-risk and it has not currently fallen into divisive lines (roughly the same % of those concerned fall equally among various different demographics). This is still a difficult minefield to navigate. Polarization could still happen, especially with an Election Year in the US looming. I’ve also been talking to a lot of young people who feel frustrated not having anything actionable to do, and if those in AI Safety don’t show leadership, we might risk (and indeed are already risking) many frustrated youth taking political and social action into their own hands. We need to be aware that EA/LW might have an Ivory Tower problem, and that, even though a pragmatic, strategic, and careful course of action might be better, this might make many feel “shut out” and attempt to steer their own course. Finding a way to make those outside EA/LW/AIS feel included, with steps to help guide and inform them, might be critical to avoiding movement hijacking.
On Capabilities vs. Alignment Research:
2022 Me: While I strongly agree that not increasing capabilities is a high priority right now, I also question if we risk creating a state of inertia. Within safety research, there are very few domains that do not risk advancing capabilities research. And, while capabilities continue to progress every day, we might risk failing to keep up the speed of safety progress simply because every action risks an increase in capabilities. Rather than a “do no harm” principle, I think counterfactuals need to be examined in these situations, where we must consider if there is a greater risk if we *don’t* do research in a certain domain.
2023 Me: Oh, oh, oh! I think Me’22 was actually ahead of the curve on this one. This might still be controversial, but I think many got the “capabilities space” wrong. Many AIS-inspired theories that could increase capabilities are for systems that could be safer, more interpretable, and easier to monitor by default. And by not working on such systems we instead got the much more inscrutable, dangerous models by default, because the more dangerous models are easier. To quote the vape commercials, “safer != safe”, but I still quit smoking in favor of electronics because safer is still at least safer. This is probably a moot point now, though, since I think it’s likely too late to create an entirely new paradigm in AI architectures. Hopefully Me’24 will be happy to tell me we found a 100% safe and effective new paradigm that everyone’s hopping on. Or maybe he’ll invent it.