...2. The anchor of a major news network donates lots of money to organizations fighting against gay marriage, and in his spare time he writes editorials arguing that homosexuals are weakening the moral fabric of the country. The news network decides it disagrees with this kind of behavior and fires the anchor.
a) This is acceptable; the news network is acting within their rights and according to their principles
b) This is outrageous; people should be judged on the quality of their work and not their political beliefs…
12. The principal of a private school is a
Use "Symbiosis" as an objective versus "Alignment problem". Examples of symbiosis exist everywhere in nature. We don't need to recreate the wheel folks...
The "key word" here is "SYMBIOSIS". We have been myopically focused on "alignment" when what we really want is to cultivate (both from a human perspective and AI perspective) a symbiotic relationship between humans and AI. Consequentially, a symbiotic relationship between humans and AI (and AGI understanding of incentives and preference for symbiosis over parasitism) can help to establish a more symbiotic...
I’m still thinking this through, but I am deeply concerned about Eliezer’s new article for a combination of reasons:
In the end, I expect this will just alienate people. And stuff like this concerns me.
I think it’s possible that the most memetically power...
Of course it’s often all over the place. I only shared the links because I wanted to make sure people weren’t deluding themselves with only positive comments.
I think that the keystone human value is about making significant human choices, individually and collectively, including choosing humanity's course.
(musical interlude) A song about the end times: https://youtu.be/WVF3q5Y68-0
Anapartistic reasoning: GPT-3.5 gives a bad etymology, but GPT-4 is able to come up with a plausible hypothesis of why Eliezer chose that name: anapartistic reasoning is reasoning where you revisit the earlier part of your reasoning.
Unfortunately, Eliezer's suggested prompt doesn't seem to work to induce anapartistic reasoning: GPT-4 thinks it should focus on identifying potential design errors or shortcomings in itself. When asked to describe the changes in its reasoning, it doesn't claim to be more corrigible.
We will discuss Eliezer's Hard Problem of C...
Propositions on SIA
Epistemic status: exploring implications, some of which feel wrong.
I deeply sympathize with the presumptuous philosopher but 1a feels weird.
2a was meant to be conditional on non-simulation.
Actually putting numbers on 2a (I have a post on this coming soon), the anthropic update seems to say (conditional on non-simulation) that there are almost certainly lots of aliens, all of which are quiet, which feels really surprising.
To clarify what I meant on 3b: maybe "you live in a simulation" can explain why the universe looks old better than "uh, I guess all of the aliens were quiet" can.
Our value function is complex and fragile, but we know of a lot of world states where it is pretty high: our current world, and a few thousand years' worth of its states before that.
So we can assume that world states within a certain neighborhood of our past states have some value.
Also, states far outside this neighborhood probably have little or no value, because our values were formed to help us orient and thrive in our ancestral environment. In worlds too dissimilar from it, our values will likely lose their meaning, and we will lose the ability to "function" normally, the ability to "human".
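One minimal way to formalise this intuition (my own toy sketch, not something claimed above): score a world state by its distance to the nearest known-good historical state, so value decays toward zero outside the familiar neighborhood.

```python
# Toy sketch: a conservative value proxy that is high near known-good historical
# world states and decays toward zero far outside that neighborhood.
# The feature vectors, distance metric, and decay scale are illustrative assumptions.

import math

# Stand-ins for "our current world and a few thousand years' worth of its states before that".
KNOWN_GOOD_STATES = [(0.0, 0.0), (0.5, 0.2), (1.0, 0.4)]

def proxy_value(state, scale=1.0):
    """Highest at a known-good state, decaying with distance from the nearest one."""
    nearest = min(math.dist(state, s) for s in KNOWN_GOOD_STATES)
    return math.exp(-nearest / scale)

print(proxy_value((0.1, 0.1)))   # inside the familiar neighborhood -> near 1
print(proxy_value((50.0, 50.0))) # far outside it -> near 0
```

This matches the claim that far-away states get little or no value by default; it says nothing, of course, about which nearby states are actually better.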
For the closing party of the Lightcone Offices, I used Midjourney 5 to make a piece of art to represent a LessWrong essay by each member of the Lightcone team, and printed them out on canvases. I'm quite pleased about how it came out. Here they are.
by jacobjacob
(context: Jacob has been taking flying lessons, and someday hopes to do cross-country material runs for the Rose Garden Inn at shockingly fast speeds by flying himself to pick them up)
by RobertM
FLI just released "Pause Giant AI Experiments: An Open Letter".
I don't expect that 6 months would be nearly enough time to understand our current systems well enough to make them aligned. However, I do support this, and did sign the pledge, as getting everybody to stop training AI systems more powerful than GPT-4 for 6 months would be a huge step forward in terms of coordination. I don't expect this to happen. I don't expect that OpenAI will give up its lead here.
See also the relevant Manifold market.
Maybe you already thought of this, but it might be a nice project for someone to take the unfinished drafts you've published, talk to you, and then clean them up for you. Apprentice/student kind of thing. (I'm not personally interested in this, though.)
I like that idea! I definitely welcome people to do that as practice in distillation/research, and to make their own polished posts of the content. (Although I'm not sure how interested I would be in having said person be mostly helping me get the posts "over the finish line".)
I'm really confused by this passage from The Six Mistakes Executives Make in Risk Management (Taleb, Goldstein, Spitznagel):
...We asked participants in an experiment: “You are on vacation in a foreign country and are considering flying a local airline to see a special island. Safety statistics show that, on average, there has been one crash every 1,000 years on this airline. It is unlikely you’ll visit this part of the world again. Would you take the flight?” All the respondents said they would.
We then changed the second sentence so it read: “Safety statistic
Hence, a complete solution to Alignment will very likely have solving AGI as a side effect. And solving AGI will solve some parts of Alignment, maybe even the hardest ones, but not all of them.
My theory is that the core of human values is about what the human brain was made for: making decisions. Making meaningful decisions individually and as a group, including collectively making decisions about humanity's fate.
Is it possible to learn a language without learning the values of those who speak it?
Well, by that logic Germans may experience more schadenfreude, which would presumably mean there is more schadenfreude going on in Germany than elsewhere, so I don't think your point makes sense. You only need a word for something if it exists, especially if it's something you encounter a lot.
It may also be possible that we use facsimiles for words by explaining their meaning with whole sentences, and only occasionally stumble upon a word that catches on and that elegantly encapsulates the concept we want to convey (like "gaslighting"). It may be ...
I heavily recommend Beren's "Deconfusing Direct vs Amortised Optimisation". It's a very important conceptual clarification.
Probably the most important blog post I've read this year.
Direct optimisers: systems that, during inference, directly choose actions to optimise some objective function. E.g. AIXI, MCTS, other planning methods.
Direct optimisers perform inference by answering the question: "what output (e.g. action/strategy) maximises or minimises this objective function ([discounted] cumulative return or loss, respectively)?" A toy sketch contrasting the two styles follows these definitions.
Amortised optimisers: syst...
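To make the contrast concrete, here is a toy Python sketch. It is my own illustration, not code from Beren's post: the states, the objective, and the table-based "training" are made-up stand-ins. The direct policy searches over actions against the objective at inference time; the amortised policy pays that cost up front and answers with a cheap lookup.

```python
# Toy contrast between direct and amortised optimisation.
# Everything here (states, actions, objective) is an illustrative stand-in.

import random

STATES = range(10)
ACTIONS = range(10)

def objective(state, action):
    # Hypothetical objective: reward is higher the closer the action is to the state's "target".
    return -abs(action - (state * 3) % 10)

# Direct optimiser: plan at inference time by evaluating the objective for every candidate action.
def direct_policy(state):
    return max(ACTIONS, key=lambda a: objective(state, a))

# Amortised optimiser: pay the search cost up front over sampled states, then inference is a
# cheap lookup with no objective calls. A real amortised optimiser would generalise from such
# data with a function approximator (e.g. a policy network) rather than an exact cache.
def train_amortised(n_samples=1000):
    table = {}
    for _ in range(n_samples):
        s = random.choice(list(STATES))
        table[s] = max(ACTIONS, key=lambda a: objective(s, a))
    return table

AMORTISED_TABLE = train_amortised()

def amortised_policy(state):
    return AMORTISED_TABLE.get(state, 0)  # no search at inference time

if __name__ == "__main__":
    for s in STATES:
        print(s, direct_policy(s), amortised_policy(s))
```

The toy keeps the key asymmetry: the direct optimiser's competence at inference time scales with how much search it runs, while the amortised optimiser is bounded by what its training distribution covered.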
I'm worried about notkilleveryonism as a meme. Years ago, Tyler Cowen wrote a post about why more econ professors didn't blog, and his conclusion was that it's too easy to make yourself look like an idiot relative to the payoffs. And that he had observed this actually play out in a bunch of cases where econ professors started blogs, put their foot in their mouth, and quietly stopped. Since earnest discussion of notkilleveryonism tends to make everyone, including the high status, look dumb within ten minutes of starting, it seems like there will be a strong inclination towards attribute substitution. People will tend towards 'nuanced' takes that give them more opportunity to signal with less chance of looking stupid.
I dunno, the problem with "alignment" is that it doesn't unambiguously refer to the urgent problem, but "notkilleveryoneism" does. Alignment used to mean same-values, but then got both relaxed into compatible-values (where boundary-respecting norms are enough to notkilleveryone) and strengthened with various AI safety features like corrigibility and soft optimization. Then there is prosaic alignment, which redefines it into bad-word-censure and reliable compliance with requests, neither being about values. Also, "existential catastrophe" inconveniently includes ...
What is magic?
Presumably we call whatever we can't explain "magic" before we understand it, at which point it becomes simply a part of the natural world. This is what many fantasy novels fail to account for; if we actually had magic, we wouldn't call it magic. There are thousands of things in the modern world that would easily meet the criteria for magic of a person living in the 13th century.
So we do have magic; but why doesn't it feel like magic? I think the answer to this question is to be found in how evenly distributed our magic is. Almost ...
Right, but if LessWrong is to become larger, it might be a good idea to stop leaving his posts as the default (the Library, the ones recommended on the front page, etc.). I don't doubt that his writing is worth reading and I'll get to it; I'm just offering an outsider's view on this whole situation, which seems a little stagnant to me, in a way.
That last reply of mine, a reply to a reply to a Shortform post I made, can be found after just a little scrolling on the main page of LessWrong. I should be a nobody to the algorithm, yet I'm not. My only...
I find the standard models of existence (such as Tegmark's Mathematical universe hypothesis) to feel boring, flat, and not self-referential enough.
Logic refers to itself. This is really important.
So here's my take on the nature of existence (not fully serious! consider this a 'playful exploration'):
In the beginning, there was a contradiction.
Why was there a contradiction? The simplest way I can think of explaining it is that "non-existence exists" is a paradox (first noted by the Greek philosopher Parmenides).
And yes, these are just "cute words", but I fin...
Deceptive alignment doesn't preserve goals.
A short note on a point that I'd been confused about until recently. Suppose you have a deceptively aligned policy which is behaving in aligned ways during training so that it will be able to better achieve a misaligned internally-represented goal during deployment. The misaligned goal causes the aligned behavior, but so would a wide range of other goals (either misaligned or aligned) - and so weight-based regularization would modify the internally-represented goal as training continues. For example, if the misali...
This doesn't seem implausible. But on the other hand, imagine an agent which goes through a million episodes, and in each one reasons at the beginning "X is my misaligned terminal goal, and therefore I'm going to deceptively behave as if I'm aligned" and then acts perfectly like an aligned agent from then on. My claims then would be:
a) Over many update steps, even a small description length penalty for having terminal goal X (compared with being aligned) will add up (see the sketch after these points).
b) Having terminal goal X also adds a runtime penalty, and I expect that NNs in practice are...
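A hedged sketch of why (a) compounds: suppose the simplicity penalty is operationalised as plain L2 weight decay with coefficient λ and learning rate η, and suppose the extra parameters w that encode goal X receive no offsetting task gradient (both are my assumptions for illustration, not claims made above). Then each update shrinks those parameters geometrically:

```latex
% Sketch under the stated assumptions (0 < \eta\lambda < 1):
w_{t+1} = w_t - \eta \lambda w_t = (1 - \eta\lambda)\, w_t
\qquad\Longrightarrow\qquad
w_T = (1 - \eta\lambda)^{T} w_0 \;\to\; 0 \quad \text{as } T \to \infty .
```

So even a tiny per-step penalty eventually erases whatever those parameters encode, which is the sense in which it "adds up"; (b) is a separate, runtime-cost argument.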
It is said that on this Earth there are two factions, and you must pick one.
(Hat tip: I got these names 2 years ago from Robert Miles who had been playing with GPT-3.)
In case you're interested, I choose the latter, for there is at least the hope of learning from the mistakes.