Maybe this question has already been answered, but I don't understand how recursive self-improvement of AIs is compatible with the AI alignment problem being hard.
I mean, doesn't the AI itself face the alignment problem when it tries to improve/modify itself substantially? So wouldn't a sufficiently intelligent AI refuse to create such an improvement for fear that the goals of the improved AI would differ from its own?
I'd just like to add that even if you think this piece is completely mistaken, I think it certainly shows we are not knowledgeable enough about what values and motives are and how they work in us, much less in AI, to confidently predict that AIs will be usefully described by a single global utility function, or that they will work to subvert their reward system, or the like.
Maybe that will turn out to be true, but before we spend so many resources trying to solve AI alignment, let's try to make the argument for the great danger much more rigorous first... that's usually the best way to start anyway.
This is one of the most important posts ever on LW, though I don't think its implications have been fully drawn out. Specifically, this post raises serious doubts about the arguments for AI x-risk resulting from alignment mismatch and about the models used to talk about that risk. It undercuts both Bostrom's argument that an AI will have a meaningful (self-aware?) utility function and Yudkowsky's reward-button parables.
The role these two arguments play in convincing people that AI x-risk is a hard problem is to explain why, if you don't anthropomorphize, a program that's, say, excellent at conducting/scheduling interviews to ferret out moles in the intelligence community should try to manipulate external events at all, rather than just thinking about them to better catch moles. I mean, it's often the case that people fail to pursue their fervent goals outside familiar contexts. Why will AI be different? Both arguments conclude that AI will inevitably act as if it's very effectively maximizing some simple utility function in all contexts and in all ways.
Bostrom tries to convince us that as creatures get more capable they tend to act more coherently (more as if they are governed by a global utility function). This is of course true for evolved creatures, but by offering a theory of how value-like things can arise, this post predicts that if you only train your AI on a relatively confined class of circumstances (even if that requires making very accurate predictions about the rest of the world), it isn't going to develop that kind of simple global value; rather, it would likely find multiple shards in tension, without clear direction, if forced to make value choices in very different circumstances. Similarly, it explains why the AI won't just wirehead itself by pressing its reward button.
I absolutely think that the future of online marketing involves asking people more about their preferences. I know I go into my settings on Google to actively curate what they show me.
Indeed, I think Google is leaving a fucking pile of cash on the table by not adding an "I dislike" button and a little survey to their ads.
I feel there is something else going on here too.
Your claimed outside view asks us to compare a clean codebase with an unclean one, and I absolutely agree that it's a good case for using currentDate when initially writing code.
But you motivated this by considering refactoring, and I think things go off the rails there. If the only issue in your codebase was that you consistently called currentDate yyymmdd, or even had other consistently weird names, it wouldn't be a mess; it would just have slightly weird conventions. Any coder working on it for a non-trivial length of time would start just reading yyymmdd as currentDate in their head.
The codebase is only messy when you inconsistently use a bunch of different names for a concept that aren't very descriptive. But then refactoring faces exactly the same problem that working with the code does: the confusion coders experience when they see the variable and wonder what it does becomes ambiguity, which forces a time-intensive refactor.
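A minimal sketch of the distinction, with hypothetical names (nothing here is from the original post):

```python
from datetime import date

# Consistent-but-weird convention: one odd name, used everywhere.
# After a week on the codebase you simply read this as "current date".
yyymmdd = date.today()

# Inconsistent naming: the same concept under several vague names.
# A refactor now has to establish that these really are the same thing
# before it can safely rename anything, which is the time-intensive part.
dt = date.today()
d8 = date.today()
ts_now = date.today()
```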
Practically, the right move is probably better standards going forward and encouraging coders to fix variable names in any piece of code they touch. But I don't think it's really a good example of divergent intuitions once you are talking about the same things.
I don't think this is a big problem. The people who use ad blockers are both a small fraction of internet users and the most sophisticated ones, so I doubt they are a major issue for website profits. I mean, sure, Facebook is eventually going to try to squeeze out the last few percent of users if they can do so with an easy countermeasure, but if this were really a big concern, websites would be pushing to get that info back from the company they use to serve ads. Admittedly, when I was working on ads for Google (I'm not cut out to be out of academia, so I went back to it) I never really got into this part of the system, so I can't comment on how it would work out, but I think that if this mattered enough, the companies serving ads would figure out how to report back to the page about ad blockers.
I'm sure some sites resent ad blockers and take some easy countermeasures, but at an economic level I'm skeptical this really matters.
What this means for how you should feel about using ad blockers is trickier, but since I kinda like well-targeted ads I don't have much advice on this point.
Interesting, but I think the real problem lies at the other end of the equation: voting. Consider two facts:
1) A surprisingly large fraction of the US population has tried hard drugs of one kind or another.
2) Even those who haven't almost surely know people who have, and they seem to find it interesting/fascinating/etc., not horrifying behavior that deserves prison time.
So why is it that people who would never dream of sending a friend who tried coke to prison, or even the friend who sold him some of his stash, end up with draconian drug laws?
I don't have an easy answer. I'm sure the Overton window, and a desire to signal that they themselves are not pro-drug or drug users, is part of the answer. It's like lowering the age of consent for sex: as long as the loudest voices arguing it should be legal for 40-year-olds to sleep with 16-year-olds are creeps, few people will make that argument no matter how good it is.
But this doesn't really seem like enough to explain the phenomenon.
So your intent here is to diagnose the conceptual confusion that many people have with respect to infinity, yes? And your thesis is that people are confused about infinity because they think it has a unique referent, while in fact positive and negative infinity are different?
I think you are on to something, but it's a little more complicated, and that's what gets people confused. The problem is that there are in fact a number of different concepts we use the term infinity to describe, which is why it's so super confusing. Here are a few (and I bet there are more):
1. Virtual points that are above or below all other values in an ordered ring (or its positive component), which we use as shorthand to write limits and reason about how they behave.
2. The background idea of the infinite as meaning something that is beyond all finite values (hence why a point at infinity is infinite).
3. The cardinality of sets that are bijectable with a proper subset of themselves, i.e., infinite sets. Even here there is an ambiguity between the sets with a given cardinality and the cardinal itself.
4. The notion of absolute mathematical infinity. If this concept makes sense, it does have a single referent, which is taken to be 'larger' (usually in the sense of cardinality) than any possible cardinal, i.e., the height of the true hierarchy of sets.
5. The metaphorical or theological notion of infinity as a way of describing something beyond human comprehension and/or without limits.
The fact that some of these notions do uniquely refer while others don't is a part of the problem.
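To see how two of these senses come apart, here's a quick illustration of my own (contrasting senses 1 and 3 from the list above):

```latex
% Sense 1: virtual points in the extended reals, shorthand for limit behaviour.
\[ \lim_{x \to 0^+} \frac{1}{x} = +\infty, \qquad \lim_{x \to 0^-} \frac{1}{x} = -\infty \]

% Sense 3: infinite cardinals (sets bijectable with a proper subset of themselves);
% there are many of them, not one.
\[ |\mathbb{N}| = \aleph_0 < 2^{\aleph_0} = |\mathcal{P}(\mathbb{N})| \]

% The two senses don't interact: $+\infty$ is not a cardinal, and $\aleph_0$ is
% not a point you approach along the real line.
```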
Stimulants are an excellent short-term solution. If you absolutely need to get work done tonight and can't sleep, amphetamine (e.g., Adderall) is a great option. Indeed, there are a number of studies/experiments (including those the Air Force relies on to justify giving pilots amphetamines) backing up the fact that it improves the ability to get tasks done while sleep-deprived.
Of course, if you are having long-term sleep problems, it will likely make those problems worse.
I like the idea of this sequence, but -- given the goal of spelling out the argument in terms of first principles -- I think more needs to be done to make the claims precise, or to acknowledge that they are not.
I realize that you might be unable to be more precise given the lack of precision in this argument generally -- I don't understand how people have invested so much time/money on research to solve the problem and so little on making the argument for it clear and rigorous -- but if that's the case, I suggest you indicate where the definitions are insufficient/lacking/unclear.
I'll list a few issues here:
Defining Superintelligence
Even Bostrom's definition of superintelligence is deeply unclear. For instance, would an uploaded human mind that simply worked at 10x the speed of a normal human mind qualify as a superintelligence? Intuitively the answer should be no, but per the definition the answer is almost certainly yes (at least if we imbue that upload with extra patience). After all, virtually all cognitive tasks of interest benefit from extra time -- if not at the time of performance, then extra time to practice (10x the practice games of chess would make you a better player). And if such an upload did qualify, that undermines the argument about superintelligence improvement (see below).
If, to count as a superintelligence, you require a qualitative improvement rather than merely a faster rate of computation, then the definition risks being empty. In many important cognitive tasks humans already implement the theoretically optimal algorithm, or nearly do so. Lots of problems (e.g., search on unordered data on a classical Turing machine) have no better solution than brute force, and this likely includes quite a few tasks we care a great deal about (maybe even in social interactions). Sure, maybe an AI could optimize away the part where our slow human brain slogs through the work (though arguably we have already done that with computers), but that just sounds like increased processing speed.
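As a toy illustration of the brute-force point (the code is mine, not from the post): on unsorted data any correct search has to inspect every element in the worst case, so there is no qualitatively better method, only faster execution of the same scan.

```python
# Searching unsorted data: no algorithm can beat a linear scan in the worst case,
# so improvement here can only mean running the same brute-force loop faster.
def find(items, target):
    for i, x in enumerate(items):  # worst case: n comparisons, which is optimal
        if x == target:
            return i
    return -1

print(find([7, 3, 9, 1], 9))  # -> 2
```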
Finally, does that superiority measure resource usage? Does a superintelligence need to beat us in a watt-for-watt comparison, or could it use the computing capacity of the planet?
These are just a few concerns, but they illustrate the inadequacy of the definition. And it's not just nitpicking. This loose way of talking about superintelligence invites us, without adequate argument, to assume the relationship we will have to AI is akin to the relationship you have with your dumb family members/friends. And even if that were the relationship, remember that your dumb friends wouldn't seem so easily dominated if they hadn't decided not to put much effort into intellectual issues.
Self-improvement
When it comes to talking about self-improvement, the discussion is totally missing any notion of rate, extent, or qualitative measure. The tendency is for people to assume that, since technology seems to happen fast, somehow so will this self-improvement, but why should that be?
I mean, we are already capable of self-improvement. We change the culture we pass down over time, and as a result a child born today ends up learning more math, history, and all sorts of problem-solving tools in school that an ancient Roman kid wouldn't have learned [1]. Will AI self-improvement be equally slow? If it doesn't improve itself any faster than we improve our intelligence, no problem. So any discussion of this issue that seeks to draw meaningful conclusions needs to make some claim about the rate of improvement, and even defining such a quantitative measure seems extremely difficult.
And it's not just the instantaneous rate of self-improvement that matters but also the shape of the curve. You seem to grant that figuring out how to improve AI intelligence will take the AI some time -- it's gotta do the same kind of trial and error we did to build it in the first place -- and won't be instantaneous. OK, how does that time scale with increasing intelligence? Maybe an AI with 100 SIQ points can build one with 101 SIQ after a week of work. But then maybe it takes 2 weeks for the 101 SIQ AI to figure out how to reach 102, and so on. Maybe it even asymptotes.
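To make the shape-of-the-curve worry concrete, here's a toy model (the doubling assumption and the numbers are purely illustrative, not anything claimed in the original argument): if each additional SIQ point takes twice as long to find as the previous one, capability keeps rising but flattens out rather than exploding.

```python
# Toy model: each additional "SIQ" point takes twice as long to achieve as the
# previous one. Gains flatten out instead of exploding: roughly 101 SIQ after
# 1 week, 103 after 10 weeks, 106 after 100, and only 109 after 1000.
def siq_after(weeks, start=100, first_step_weeks=1.0):
    siq, elapsed, step = start, 0.0, first_step_weeks
    while elapsed + step <= weeks:
        elapsed += step
        siq += 1
        step *= 2  # the next point takes twice as long as the last
    return siq

for w in (1, 10, 100, 1000):
    print(w, siq_after(w))
```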
And what does any of this even mean? Is it getting much more capable or only marginally more capable? Why assume the former? Given that there are mathematical limits on the most efficient possible algorithms, shouldn't we expect an asymptote in ability? Indeed, there might be good reasons to think humans aren't far from it.
1: Of course, I know that people will try to insist that merely having learned a bunch of skills/tricks in school that help you solve problems doesn't qualify as improving your intelligence. Why not? If intelligence is just a measure of the ability to solve relevant cognitive challenges, such teaching sure seems to qualify. I think the temptation here is to import the way we use intelligence in human society as a measure of raw potential, but that relies on a kind of hardware/software distinction that doesn't obviously make sense for AI (and arguably doesn't make sense for humans over long time scales -- Flynn effect).