Prediction & Warning:
There are lots of people online who have started to pick up the word "clanker" as a way to protest against AI systems. This word and the sentiment behind it are on the rise, and I think this will become a future schism in the broader anti-AI movement. The warning part here is that I think the Pause movement and similar efforts can easily get caught up in a general anti-AI speciesism.
Given that we're starting to see more and more agentic AI systems with more continuous memory as well as more sophisticated self-modelling, the basic preconditions for a lot of the existing physicalist theories of consciousness are starting to be fulfilled. Within 3-5 years I find it quite likely that AIs will have at least some basic form of sentience that we can more or less demonstrate (given IIT, GNW, or another physicalist theory).
This could be one of the largest suffering risks we've seen, and one we're potentially inducing on the world ourselves. When you use a word like "clanker", you're essentially demonizing that sort of system. Right now that's mostly harmless, since the target is a sycophantic, non-agentic chatbot, and the word works as a counterweight to some of the existing claims that current AIs are conscious, but it is likely a slippery slope?
More generally, I've seen a bunch of generally kind and smart AI Safety people express quite an anti-AI speciesist sentiment in terms of how these sorts of systems should be treated. From my perspective, it feels like it comes from a place of fear and distrust, which is completely understandable given that we might die if anyone builds a superintelligent AI.
Yet that fear of death shouldn't stop us from treating potentially conscious beings kindly?
A lot of racism and similar attitudes can be seen as coming from a place of fear: the Aryan "master race" was promoted on the idea that humanity would go extinct if worse genetics entered the pool. How different is that from the fear of AIs sharing our future lightcone?
The general argument goes that this time it is completely different, since the AI can self-replicate, edit its own software, and so on. This is a completely reasonable argument; there are a lot of risks involved with AI systems.
It is the next step where I see a problem. The argument that follows is: "Therefore, we need to keep the almighty humans in control to wisely guide the future of the lightcone."
Yet there's generally a lot more variance within a distribution (of humans, or of possible AI systems) than there is between the distributions.
So when someone says that we need humans to remain in control, I think: "mmm, yes, the totally homogeneous group of "humans", which of course doesn't include people like Hitler, Pol Pot, and Stalin". And on the AI side we get the mirror image: "mmm, yes, the totally homogeneous group of "all possible AI systems", which should be kept away so that the "wise humans" can remain in control". As if a malignant RSI system is the only future AI-based system that can be conceived of, as if there is no way to shape a system so that it values cooperation, and as if there is no other way for future AI development to go than a fast take-off where an evil AI takes over the world.
Yes, there are obviously things that AIs can do that humans can't, but don't demonize all possible AI systems as a consequence; it is not black and white. We can protect ourselves against recursively self-improving AI and at the same time respect AI sentience; we can hold superficially contradictory positions at the same time?
So let's be very specific about our beliefs, and let's make sure that our fear does not guide us into a moral catastrophe, whether that be the extinction of all future life on Earth or the capture of sentient beings into a future of slavery?
I wanted to register some predictions and bring this up, as I haven't seen many discussions of it. Politics is war and arguments are soldiers, so let's keep it focused on the object level? If you disagree, please tell me your underlying reasons. In that spirit, here's a set of questions I would want to ask someone who is against the sentiment expressed above:
I would change my mind if you could argue that there is a better heuristic to use than kindness and respect towards other sentient beings. You need tit-for-tat with defecting agents, sure, but why assume that all AI systems will defect? Why is the cognitive architecture of future AI systems so different that I can't apply the same game-theoretical virtue ethics to them that I apply to humans? And given the inevitable power-imbalance arguments that I'll get in response to that question: why don't we just aim for a world where we maintain a balance of power between top-level and bottom-level systems (a nation and an individual, for example), and thereby between actors?
Essentially, I'm asking for a reason to believe that this problem of system-level alignment between a group and an individual is better solved by not including future AI systems in the moral circle.
Thank you for clarifying, I think I understand now!
I notice I was not that clear when writing my comment yesterday so I want to apologise for that.
I'll give an attempt at restating what you said in other terms. There's a concept of temporal depth in action plans: roughly, how many steps into the future you are looking. A simple way of imagining this is how far ahead a chess bot can plan; Stockfish is able to plan something like 20-40 moves in advance.
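As a toy illustration of what I mean by temporal depth (my own sketch, not anything from the discussion above; the game and the numbers are made up), the planning horizon is essentially the depth limit in a minimax search:

```python
# Toy illustration: "temporal depth" as the depth limit of a minimax search.
# Hypothetical game: players alternately add 1 or 2 to a counter; whoever
# brings the counter to 10 or more wins.

def minimax(counter: int, depth: int, maximizing: bool) -> int:
    if counter >= 10:
        # The player who just moved reached the target and won.
        return -1 if maximizing else 1
    if depth == 0:
        return 0  # Planning horizon reached: no idea who wins from here.
    scores = [minimax(counter + step, depth - 1, not maximizing) for step in (1, 2)]
    return max(scores) if maximizing else min(scores)

def best_move(counter: int, depth: int) -> int:
    # Pick the move that looks best when looking `depth` plies ahead.
    return max((1, 2), key=lambda step: minimax(counter + step, depth - 1, False))

# From counter = 5, the shallow planner can't see a difference between the
# moves and defaults to the first one (a blunder), while the deeper planner
# finds the winning reply (add 2, leaving the opponent at 7).
print(best_move(counter=5, depth=2), best_move(counter=5, depth=8))
```

That depth parameter is the "temporal depth" I'm gesturing at: same rules, same evaluator, but a longer look-ahead changes which actions get chosen.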
It seems similar to what you're talking about here, in that the further ahead a system plans, the more it tends to avoid external attempts to steer its actions.
Some other words to describe the general vibe might be planned vs unplanned or maybe centralized versus decentralized? Maybe controlled versus uncontrolled? I get the vibe better now though so thanks!
I guess I'm a bit confused about why emergent dynamics and power-seeking are at opposite ends of the spectrum?
Like, what do you even mean by emergent dynamics there? Are we talking about non-power-seeking systems, and in that case, which systems are non-power-seeking?
I would claim that there is no system that is not power-seeking, since any system that survives needs to do approximate Bayesian inference and therefore needs to minimize free energy (self-referencing here, but whatever). Hence any surviving system needs to power-seek, if power-seeking is defined as attaining more causal control over the future.
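To spell out the free-energy claim a bit (my own gloss of the standard variational formulation, not something argued for in this thread):

$$ F[q] = \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big] = D_{\mathrm{KL}}\big[q(s)\,\|\,p(s \mid o)\big] - \ln p(o) $$

Minimising $F$ over beliefs $q(s)$ approximates the Bayesian posterior $p(s \mid o)$, and on the active-inference reading, minimising it through action means bringing future observations under the system's model, which is the sense in which I'm equating surviving with power-seeking.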
So there is no future in which there is no power-seeking system; it is just that the thing that power-seeks may act over larger timespans and be more of a slow actor. The agentic attractor space is just not human-flesh-bag space nor the traditional agent space; it is different, yet still a power-seeker.
Still, I do like what you say about the change in the dynamics and how power-seeking is maybe more about a shorter temporal scale? It feels like the y-axis should be that temporal axis instead since it seems to be more what you're actually pointing at?
I was reflecting on some of the takes here for a bit, and if I imagine a blind gradient descent in this direction, I can see quite a lot of potential reality distortion fields arising from the underlying dynamics involved in holding this position.
So the one thing I wanted to ask is whether you have any sort of reset mechanism here? Like, what is the Schelling point before the slippery slope? What is the specific action you would take if you noticed you had gone too far? Or do you trust future you enough to ensure that it won't happen?
I just want to be annoying and drop a "hey, don't judge a book by its cover!"
There might be deeper modelling concerns that we have no clue about. It's weird and it is a negative signal, but it is often very hard to see second-order consequences and the like from a distance!
(I literally know nothing about this situation but I just want to point it out)
Fwiw, I disagree that stages 1 and 3 of your quick headline are solved; I think there are enough unanswered questions there that we can't be certain whether or not a multipolar model could hold.
For 1, I agree with the convergence claims, but the speed of that convergence is in question. There are fundamental reasons to believe that we get hierarchical agents (e.g. this from physics, and shard theory). If you have a hierarchical collective agent, then a good question is how it comes to maximise and become a full consequentialist, which it will for optimality reasons. I think one of the main ways it smooths out the kinks in its programming is by running into prediction errors and updating on them, and then the question becomes how fast it runs into those prediction errors. Yet in order to attain prediction errors you need to do some sort of online learning to update your beliefs. But the energy cost of that online learning scales pretty badly if you're doing what classic life does but with a really large NN. Basically, there's a chance that if you hard-scale a network to very high computational power, updating that network becomes very expensive, and so if you want the most bang for your buck you get something more like Comprehensive AI Services: a distributed system of more specific learners forming a larger learner.
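As a toy sketch of that energy-cost intuition (entirely my own back-of-the-envelope model; the parameter counts, error rate, and the assumption that update cost scales linearly with the parameters touched are all made up):

```python
# Toy, assumption-laden model: the cost of an online update scales with the
# number of parameters touched. A monolith updates all of its parameters per
# prediction error; a distributed system only updates the one specialist
# that made the error.

TOTAL_PARAMS = 1_000_000_000   # hypothetical monolithic model size
NUM_SPECIALISTS = 100          # hypothetical distributed decomposition
ERRORS_PER_DAY = 10_000        # hypothetical rate of prediction errors

def monolith_update_cost(errors: int) -> int:
    # Every error triggers a gradient step over all parameters.
    return errors * TOTAL_PARAMS

def distributed_update_cost(errors: int) -> int:
    # Each error is routed to a single specialist holding 1/NUM_SPECIALISTS
    # of the parameters (ignoring routing overhead, which is also a cost).
    return errors * (TOTAL_PARAMS // NUM_SPECIALISTS)

ratio = monolith_update_cost(ERRORS_PER_DAY) / distributed_update_cost(ERRORS_PER_DAY)
print(f"Monolith pays ~{ratio:.0f}x more per day of online learning under these assumptions.")
```

Whether real online-learning costs look anything like this is exactly the open question, but it's the rough shape of why something CAIS-like could end up being the energy-efficient attractor.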
Then you can ask what the difference is between this sort of distributed AI and human collective intelligence. There are arguments that the AIs will just form a super-blob through different forms of trade, yet how is that different from what human collective intelligence already is? (Looking into this right now!)
Are there forms of collective intelligence that can scale with distributed AI and that can capture AI systems as part of their optimality (e.g. group selection due to inherent existing advantages)? I think so, and I think really strong forms of collective decision-making potentially give you a lot of intelligence. We can then imagine a simple verification contract: an AI gets access to a collective intelligence if it behaves in a certain way. It's worth it for the AI because the collective is a much easier route to power, yet the AI also agrees to play by certain rules. I don't see why this wouldn't work, and I would love for someone to tell me that it doesn't!
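To make the contract idea a bit more concrete, here's a minimal sketch (every name and check in it is hypothetical; it's a toy illustration of the shape of the deal, not a proposal for an actual verification scheme):

```python
# Toy sketch of a "verification contract": an agent only keeps access to the
# collective's shared resources while it keeps passing behavioural checks.
# Everything here is a placeholder for whatever real verification would be.

from typing import Callable, List

class CollectiveIntelligence:
    def __init__(self, checks: List[Callable[[str], bool]]):
        self.checks = checks          # behavioural rules the member agreed to
        self.members: set[str] = set()

    def admit(self, agent_id: str) -> None:
        self.members.add(agent_id)

    def request_resources(self, agent_id: str, behaviour_log: str) -> bool:
        # Access is conditional: every check must pass on the agent's recent
        # behaviour, otherwise membership (and the power it grants) is revoked.
        if agent_id in self.members and all(check(behaviour_log) for check in self.checks):
            return True
        self.members.discard(agent_id)
        return False

# Hypothetical rule: no attempts at unilateral self-replication in the log.
def no_self_replication(log: str) -> bool:
    return "self_replicate" not in log

collective = CollectiveIntelligence(checks=[no_self_replication])
collective.admit("agent_a")
print(collective.request_resources("agent_a", "traded, shared data"))        # True
print(collective.request_resources("agent_a", "tried to self_replicate"))    # False
```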
For 3, why can't RSI be a collective process, given the above arguments around collective versus individual learning? If RSI is a bit like classic science, there might also be thresholds at which scaling slows down. I feel this is one of the less talked-about points in superintelligence discussions: what is the underlying difficulty of RSI at higher levels? From an outside-view + black-swan perspective it seems very arrogant to assume linear difficulty scaling.
Some other questions are: What types of knowledge discovery will be needed? What experiments? Where will you get new bits of information from? How will these distribute into the collective memory of the RSI process?
All of these things determine the unipolarity or multipolarity of an RSI process? So we can't be sure how it will happen, and there's also probably path dependence based on the best alternatives available at the initial conditions.
If you combine the fact that power corrupts your world models with the typical startup founder being power-hungry, and with AI Safety being a hot topic, you also get a bunch of well-meaning people doing things that are going to be net-negative in the future. I'm personally not sure the VC model even makes sense for AI Safety startups, given some of the things I've seen in the space.
Speaking from personal experience, I found that it's easy to skimp on operational infrastructure like a value-aligned board or a proper incentive scheme. You have no time, so instead you start prototyping a product, yet that creates a path dependence: if you succeed, you suddenly have even less time. As a consequence the culture changes, because the incentives are now different. You start hiring people and things become more capability-focused. And voilà, you're now a capabilities/AI-safety startup and it's unclear which one it is.
So get a good board, and don't commit to anything unless you have it in contract form (or similar) that you will have at least a PBC structure, if not something even more binding, as the underlying company model. The main problem I've seen here is co-founders being cagey about this; in that case I would move on to new people, at least if you care about safety.
Yes!
I completely agree with what you're saying at the end here. This project came about from trying to do that and I'm hoping to release something like that in the next couple of weeks. It's a bit arbitrary but it is an interesting first guess I think?
So that would be the taxonomy of agents, yet that felt quite arbitrary, and the evolutionary approach kind of grew out of that.
I think you might be missing the point here. You're acting like I'm claiming these frameworks reveal the "true nature" of ant colonies, but that's not what I'm saying?
The question I'm trying to answer is why these different analytical tools evolved in the first place? Economists didn't randomly decide to call pheromone trails "price signals" - they did it because their mathematical machinery actually works for predicting decentralized coordination. Same with biologists talking about superorganisms, or cognitive scientists seeing information processing.
I'll try to do an inverse Turing test here and see if I can make it work. If I'm understanding you correctly, it is essentially that, whether or not we have predictive processing, there's a core of what an artificial intelligence will do that is not dependent on the frame itself. There's some underlying utility theory/decision theory/other view that correctly captures what an agent is, and that is not these perspectives?
I think that the dot pattern is misleading, as it doesn't actually give you any predictive power to look at it from one viewpoint or another. I would agree with you that if the composition of these intentional stances leads to no new way of viewing things, then we might as well not take this approach, since it won't affect how good we are at modelling agents. I guess I'm just not convinced that these ways of looking at it are useless; it feels like a bet against all of these scientific disciplines that have existed for some time?
I wanted to ask if you could record it, or at least post the transcript after it's done? It would be nice to have. Also, this was cool, as I got to understand the ideas more deeply and from a different perspective than Sahil's; I thought it was quite useful, especially in how it relates to agency.