Not wanting to disagree or downplay, I just want to offer a different way to think about it.
When somebody says I don't exist - and this definitely happens - it all depends, to me, on what they're trying to do with it. If they're saying "you don't exist, so I don't need to worry about harming you, because the category of people who would be harmed is empty", then yeah, I feel hurt and offended and have the urge to speak up, probably loudly. But if they're just saying it while trying to analyze reality, like, "I don't think people like that exist, because my model ...
Or maybe you're just the right amount of optimistic for the people you've run into, and I'm just less lucky. =P
I'll cheat and give you the ontological answer upfront: you're confusing the alternate worlds simulated in your decision algorithm with physically real worlds. And the practical answer: free will is a tool for predicting whether a person is amenable to persuasion.
Smith has a brain tumor such that he couldn’t have done otherwise
Smith either didn't simulate alternate worlds, didn't evaluate them correctly, or the evaluation didn't impact his decision-making; there is no process flow through outcome simulation that led to his action. Instead of "I want X de...
Sounds like regret aversion?
edit: Hm, you're right that optionality is kind of an independent component.
See also: Swiss cheese model
tl;dr: don't overanalyze the final cause of disaster; usually it was preceded by serial failure of prevention mechanisms, any one or all of which can be improved for risk reduction.
If your son can't tell the difference between the risk profiles of LSD and heroin, something has gone wrong in your drug education. Maybe it's the overly simplistic "drugs are bad" messaging? Maybe a "drugs have varying levels of risk in multiple dimensions" messaging would avoid embarrassing events like comparing LSD with coffee - because yes, coffee is a drug that affects the brain. Wouldn't be much use if it didn't. It even creates a tolerance, forcing increasingly higher doses. So it is in fact quite hard to draw a hard line between coffee, LSD, mariju...
So IIUC, would you expect RLHF to, for instance, destroy not just the model's ability to say racist slurs, but its ability to model that anybody may say racist slurs?
Do you think OpenAI's "As a language model trained by OpenAI" is trying to avoid this by making the model condition proper behavior on its assigned role?
Yes, this effectively forces the network to use backward reasoning. It's equivalent to saying "Please answer without thinking, then invent a justification."
The whole power of chain-of-thought comes from getting the network to reason before answering.
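To make the ordering point concrete, here's a minimal sketch (the template wording is my own illustration, not any particular API or paper):

```typescript
// Illustrative prompt templates only; field names and wording are made up.
const question = "What is 17 * 24?";

// Backward reasoning: the answer is committed first, so any "reasoning"
// can only be a post-hoc justification of it.
const answerFirst = `Question: ${question}
Answer:
Justification:`;

// Chain-of-thought ordering: reasoning tokens come first, and the answer
// is conditioned on them.
const reasonFirst = `Question: ${question}
Reasoning (think step by step):
Answer:`;

console.log(answerFirst, reasonFirst);
```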
By the time we get results that are easy for you to be afraid of, it will be firmly too late for safety work.
How does this handle the situation where the AI, in some scenario, picks up the idea of "deception", notices that it is probably inside a training scenario, and then describes its behavior honestly with the intent of misleading the observer into thinking that it is honest - so it gets reinforcement-trained on dishonest behaviors that present as honest, ie. deceptive honesty?
Hm, difficult. I think the minimal required trait is the ability to learn patterns that map outputs to deferred reward inputs. So an organism that simply reacts to inputs directly would not be an optimizer, even if it has a (static) nervous system. A test may be whether the organism can be made to persistently change strategy by a change in reward, even in the immediate absence of the reward signal.
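Here's a toy sketch of what I mean by that test (my own illustration, nothing rigorous):

```typescript
// Toy operationalization: an "optimizer" keeps a learned mapping from
// situations to actions, shaped by past reward, and keeps using it even
// when the reward signal is currently absent.
type Action = "approach" | "avoid";

class LearningOrganism {
  private policy = new Map<string, Action>();

  act(stimulus: string): Action {
    return this.policy.get(stimulus) ?? "approach"; // default reaction
  }

  // Reward arrives once, at training time...
  learn(stimulus: string, rewardedAction: Action): void {
    this.policy.set(stimulus, rewardedAction);
  }
}

// ...and the test is whether the changed strategy persists at probe time,
// when no reward signal is present.
const bug = new LearningOrganism();
const before = bug.act("red light");  // "approach"
bug.learn("red light", "avoid");      // a single rewarded trial
const after = bug.act("red light");   // "avoid", with no reward present now
console.log(before !== after);        // true: passes the (toy) test
```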
I think maybe you could say that ants are not anthill optimizers? Because the optimization mechanism doesn't operate at all on the scale of individual ants? Not sure if that holds up.
I think a bacterium is not an optimizer. Rather, it is optimized by evolution. Animals start being optimizers by virtue of planning over internal representations of external states, which makes them mesaoptimizers of evolution.
If we follow this model, we may consider that optimization requires a map-territory distinction. In that view, DNA is the map of evolution, and the CNS is the map of the animal. If the analogy holds, I'd speculate that the weights are the map of reinforcement learning, and the context window is the map of the mesaoptimizer.
Most multiplayer games have some way to limit XP gain from encounters outside your difficulty, to avoid exactly this sort of cheesing. The worry is that it allows players to get through the content quicker, with (possibly paid) help from others, which presumably makes it less likely they'll stick around.
(Though of course an experienced player can still level vastly faster, since most players don't approach combat anywhere near optimally for XP gain.)
That said, Morrowind famously contains an actual intelligence explosion. So you tend to see this sort of...
Resources used in pressuring corporations are unlikely to have any effect which increases AI risk.
Devil's advocate: If this unevenly delays corporations sensitive to public concerns, and those are also corporations taking alignment at least somewhat seriously, we get a later but less safe takeoff. Though this goes for almost any intervention, including, to some extent, regulatory ones.
I don’t understand why you would want to spend any effort proving that transformers could scale to AGI.
The point would be to try and create common knowledge that they can. Otherwise, for any "we decided to not do X", someone else will try doing X, and the problem remains.
Humanity is already taking a shotgun approach to unaligned AGI. Shotgunning safety is viable and important, but I think it's more urgent to prevent the first shotgun from hitting an artery. Demonstrating AGI viability in this analogy is shotgunning a pig in the town square, to prove to ...
I'm actually optimistic about prosaic alignment for a takeoff driven by language models. But I don't know what the opportunity for action is there - I expect Deepmind to trigger the singularity, and they're famously opaque. Call it 15% chance of not-doom, action or no action. To be clear, I think action is possible, but I don't know who would do it or what form it would take. Convince OpenAI and race Deepmind to a working prototype? This is exactly the scenario we hoped to not be in...
edit: I think possibly step 1 is to prove that Transformers can scale to...
If these paths are viable, I desire to believe that they are viable.
If these paths are nonviable, I desire to believe that they are nonviable.
Does it do any good, to take well-meaning optimistic suggestions seriously, if they will in fact clearly not work? Obviously, if they will work, by all means we should discover that, because knowing which of those paths, if any, is the most likely to work is galactically important. But I don't think they've been dismissed just because people thought the optimists needed to be taken down a peg. Reality does not owe us...
I have a reasonably low value for p(Doom). I also think these approaches (to the extent they are courses of action) are not really viable. However, as long as they don't increase p(Doom), it's fine to pursue them. Two important considerations here: an unviable approach may still slightly reduce p(Doom) or delay Doom, and the resources used for unviable approaches don't necessarily detract from the resources used for viable approaches.
For example, "we'll pressure corporations to take these problems seriously", while u...
How much would your view shift if there was a model that could "engineer its own prompt", even during training?
I meant it's a hard bet to win because how exactly would I collect? That said, I'm genuinely not sure if it's a good field for betting. Roughly speaking, there are two sorts of bets: "put your money where your mouth is" bets and "hedging" bets. The former are "for fun" and signaling/commitment purposes; the latter are where the actual benefit comes in. But with both, it's difficult to figure out a bet structure that works if the market gets destroyed in the near future! We could bet on confidence, but I'm genuinely not sure if there'll be one or two "bi...
A system that contains agents is a dangerous system; it doesn't have to "be" an agent itself. Arguably PaLM already contains simple agents. This is why it's so important that it understands jokes: jokes contain agents that are mistaken about the world, which implies the capability to model people with different belief states.
Imagine a human captured by a mind-control fungus, and being mind-controlled to not replicate and to do no harm. Also, the entire planet is covered with the fungus, and the human hates it and wants it to be dead, because of the mind control. (This is not an AI analogy, just an intuition pump to get the human in the right mindset.) Also, the fungus is kind of stupid, maybe 90 IQ by human standards for its smartest clusters. What rules could you, as the fungus, realistically give the human that don't end up with "our entire planet is now on fire" or "we have...
That seems like a hard bet to win. I suggest instead offering to bet on "you will end up less worried" vs "I will end up more worried", though that may not work.
Katja Grace's 2015 survey of NIPS and ICML researchers provided an aggregate forecast giving a 50% chance of HLMI occurring by 2060 and a 10% chance of it occurring by 2024.
2015 feels decades ago though. That's before GPT-1!
(Today, seven years after the survey was conducted, you might want to update against the researchers that predicted HLMI by 2024.)
I would expect a survey done today to have more researchers predicting 2024. Certainly I'd expect a median before 2060! My layman impression is that things have turned out to be easier to do for big language ...
Just delete the context window and tweak the prompt.
Well, if we get to AGI from NLP, ie. a model trained on a giant human textdump, I think that's promising because we're feeding it primarily data that's generated by the human ontology in the first place, so the human ontology would plausibly be the best compressor for it.
I wonder what the failure probability is for human customer service employees.
Now, I'm not saying it's anthropic pressure, but if that's true, maybe we shouldn't just keep training until we know exactly what it is that the model is grokking.
Trying to solve the LessWrong Community Alignment Problem?
Good question. What my intuition says is "even if you have a snapshot at a certain point, if it was generated randomly, there is no way to get the next snapshot from it." Though maybe it would be. If so, I think it's not just conscious but me in every regard.
- I don't know if this is physically coherent, but if we imagine a process by which you can gain answers to every important question about the current state but very little information about the next state, then I don't think this version of me would be conscious.
- That said, if you can also query in...
Sorta. Fully agreed with the second. I'm not sure I believe in a state-process distinction - I don't think that if you randomly pulled a snapshot of a brain very much like mine out of a hat, that that snapshot would be phenomenologically conscious, though of course as per follow-the-improbability I wouldn't expect you to actually do this. Rather, the pattern of "my brain, ie. subset <small number> of iteration <large number> of <grand unified theory of physics> is conscious." Ie. I believe in state only inasmuch as it's the output of a pro...
Is my answer (patternism/mathematical monadism) separate from reductive functionalism? My view is that the algorithmic description of my brain is already phenomenologically conscious; physically evaluating it accesses these experiences but does not create them. I think the materialist view still holds that there is some sort of "secret fire" to things actually physically happening.
(If not, just count me under red func.)
The biggest stretch here seems to me to be evaluating the brain on the basis of how much compute existing hardware requires to emulate the brain. Ultimately, this is biased towards the conclusion, because, to slightly parody your position, the question you end up asking is "how much brain do you need to simulate one brain," determining that the answer is "one brain", and then concluding that the brain is perfectly efficient at being itself. However, the question of how much of the compute that is being attributed to the brain here is actually necessary for...
The biggest stretch here seems to me to be evaluating the brain on the basis of how much compute existing hardware requires to emulate the brain.
Where did I do that? I never used emulation in that context. Closely emulating a brain - depending on what you mean - could require arbitrarily more compute than the brain itself.
This article is about analyzing how close the brain is to known physical computational limits.
You may be confused by my comparisons to GPUs? That is to establish points of comparison. Naturally it also relates to the compute/energy c...
I think what happened is the Wesley twins noticed that they had contradictory beliefs:
Thus they know for a fact that something about the map is fucking with their memory or perception. Hence "Someone said a rude word."
Old LessWrong meme - phyg is rot13 cult. For a while people were making "are we a cult" posts so much that it was actually messing with LessWrong's SEO. Hence phyg.
Common question: "Well, but what if God was real and actually appeared to you in flame and glory, wouldn't it be silly to not be convinced in that case?"
My answer: "I don't know, do you think my thought patterns are likely to be deployed in such an environment?"
I think it can be reasonable to have 100% confidence in beliefs where the negation of the belief would invalidate the ability to reason, or to benefit from reason. Though with humans, I think it always makes sense to leave an epsilon for errors of reason.
I don't think the verbal/pre-verbal stream of consciousness that describes our behavior to ourselves is identical with ourselves. But I do think our brain exploits it to exert feedback on its unconscious behavior, and that's a large part of how our morality works. So maybe this is still relevant for AI safety.
That's true, but ... I feel in most cases, it's a good idea to run mixed strategies. I think that by naivety I mean the notion that any single strategy will handle all cases - even if there are strategies where this is true, it's wrong for almost all of them.
Humans can be stumped, but we're fairly good at dynamic strategy selection, which tends to protect us from being reliably exploited.
Have you ever played Far Cry 4? At the beginning of that game, there is a scene where you're being told by the main villain of the storyline to sit still while he goes downstairs to deal with some rebels. A normal human player would do the expected thing, which is to curiously explore what's going on downstairs, which then leads to the unfolding of the main story and thus actual gameplay. But if you actually stick to the villa...
Well, one may develop an AI that handles the noisy TV by learning that it can't predict the TV. The idea was to give it a space that is filled with novelty reward but doesn't lead to a performance payoff.
What would stump a (naive) exploration-based AI? One may imagine a game as such: the player starts on the left side of a featureless room. If they go to the right side of the room, they win. In the middle of the room is a terminal. If one interacts with the terminal, one is kicked into an embedded copy of the original Doom.
An exploration-based agent would probably discern that Doom is way more interesting than the featureless room, whereas a human would probably put it aside at some point to "finish" exploring the starter room first. I think this demands a sort of mixed breadth-depth exploration?
The famous problem here is the "noisy TV problem". If your AI is driven to go towards regions of uncertainty, then it will be completely captivated by a TV on the wall showing random images - no need for a copy of Doom; any random gibberish that the AI can't predict will work.
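A minimal sketch of why that happens, assuming the intrinsic reward is just prediction error (a toy model of my own, not any specific curiosity algorithm):

```typescript
// Toy model: the agent's novelty bonus for a region is its prediction error
// on the next observation from that region.
type Observation = number;

interface Region {
  name: string;
  observe: () => Observation;                    // what the agent sees there
  predict: (seen: Observation[]) => Observation; // the agent's learned prediction
}

// A learnable region: after a visit or two, it is fully predicted and boring.
const featurelessRoom: Region = {
  name: "featureless room",
  observe: () => 1,
  predict: seen => (seen.length > 0 ? 1 : 0),
};

// The noisy TV: pure noise, so the best prediction stays wrong on average
// and the novelty bonus never decays.
const noisyTV: Region = {
  name: "noisy TV",
  observe: () => Math.random(),
  predict: _seen => 0.5,
};

function noveltyBonus(region: Region, history: Observation[]): number {
  return Math.abs(region.observe() - region.predict(history));
}

// After one visit each, the room's bonus drops to 0 while the TV's stays ~0.25,
// so a naive uncertainty-seeker keeps choosing the TV forever.
console.log(noveltyBonus(featurelessRoom, [1]), noveltyBonus(noisyTV, [0.7]));
```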
Sure, but that definition is so generic and applies to so many things that are obviously not like human pain (landslides?) that it lacks all moral compulsion.
Counterexample:
Oh God! I am in horrible pain right now! For no reason, my body feels like it's on fire! Every single part of my body feels like it's burning up! I'm being burned alive! Help! Please make it stop! Help me!!
Okay, so that thing that I just said was a lie. I was not actually in pain (I can confirm this introspectively); instead, I merely pretended to be in pain.
Sir Ian McKellen has an instructive video.
The Turing test works for many things, but I don't think it works for checking for the existence of internal phenomenological states. If you ask...
I mostly see where you're coming from, but I think the reasonable answer to "point 1 or 2 is a false dichotomy" is this classic, uh, tumblr quote (from memory):
"People cannot just. At no time in the history of the human species has any person or group ever just. If your plan relies on people to just, then your plan will fail."
This goes especially if the thing that comes after "just" is "just precommit."
My expectation regarding interaction with Vassar is that the people who espouse 1 or 2 expect that the people interacting are incapable of precommitting to th...
This is a very good criticism! I think you are right about people not being able to "just."
My original point with those strategies was to illustrate an instance of motivated stopping about people in the community who have negative psychological effects, or criticize popular institutions. Perhaps it is the case that people genuinely tried to make a strategy but automatically rejected my toy strategies as false. I do not think it is, based on "vibe" and on the arguments that people are making, such as "argument from cult."
I think you are actually completely ...
I don't think Scott is claiming it's arbitrary, I think he's claiming it's subjective, which is to say instrumental. As Eliezer kept pointing out in the morality debates, subjective things are objective if you close over the observer - human (ie. specific humans') morality is subjective, but not arbitrary, and certainly not unknowable.
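To put the "close over the observer" move in code (a toy illustration of my own):

```typescript
// "Subjective" here just means the judgment takes the observer as a parameter.
interface Observer {
  values: (action: string) => number;
}

// Two-place judgment: varies with the observer, hence "subjective".
const approves = (observer: Observer, action: string): boolean =>
  observer.values(action) > 0;

// Close over a specific observer: the result is an ordinary function whose
// outputs are objective facts about that observer - subjective, not arbitrary.
const humanMorality = (action: string): boolean =>
  approves({ values: a => (a === "helping" ? 1 : -1) }, action);

console.log(humanMorality("helping")); // true
```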
But also I don't think that phylo categorization is stronger per se than niche categorization in predicting animal behavior, especially when it comes to relatively mutable properties like food consumption. Behavior, body shap...
I think this is either basic psychology or wrong.¹
For one, Kant seems to be conflating the operation of a concept with its perception:
Since the concept of “unity” must exist for there to be combination (or “conjunction”) in the first place, unity can’t come from combination itself. The whole-ness of unified things must be a product of something beyond combination.
This seems to say that the brain cannot unify things unless it has a concept of combination. However, just as an example, reinforcement learning in AI shows this to be false: unification can ...
There was a comment here, but I completely wiped it because it was too confused.
Sorry, but I can no longer participate in the free-will debate. Apparently I have unlearnt how to think in that particular broken way. Anything that has to do with indeterminism relating to choice is no longer legible to me.
I have sort of unlearnt how to think of free will in a nondeterministic sense. As such, I tripped over the part where you said there were "arguments against free will." Like, yes of course the sensation of volition is produced by a deterministic, predictable process; how else could it be about the deciding process? Aboutness only exists in causal systems.
A more interesting question may be: what is the sensation for? What part of our high-level cognition depends on noticing that we are making a decision?
A variable is just a pure function with no parameters.
Right, but in the naming style I know, `promotedPosts` would never have a visible side effect, because it's a noun. Side-effectful functions have imperative names, `promotePosts` - and never the two shall mix.
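A minimal sketch of the convention (the bodies are purely illustrative):

```typescript
interface Post {
  title: string;
  promoted: boolean;
}

const allPosts: Post[] = [{ title: "Example", promoted: false }];

// Noun name: a value (or pure accessor) with no visible side effects.
const promotedPosts = allPosts.filter(post => post.promoted);

// Imperative name: the function that actually performs the side effect.
function promotePosts(posts: Post[]): void {
  for (const post of posts) {
    post.promoted = true;
  }
}
```

The point is that a reader can tell from the name alone whether calling it can change anything.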
Disclaimer: I know Said Achmiz from another LW social context.
In my experience, the safe bet is that minds are more diverse than almost anyone expects.
A statement advanced in a discussion like "well, but nobody could seriously miss that X" is near-universally false.
(This is especially ironic because of the "You don't exist" post you just wrote.)