How can we recognize when we are failing to change our thinking in light of new evidence that doesn’t fit our expectations and assumptions? And how can we update our thought processes to overcome the challenges that our old ways of seeing pose?

Recent Discussion

This essay was partly based on discussions with "woog" on Discord. Further thanks to the gears to ascension, for inspiring this post with an offhand comment. This is also an entry for the Open Philanthropy AI Worldviews Contest.

Many new researchers are going into AI alignment. For a variety of reasons, they may choose to work for organizations such as Anthropic or OpenAI. Chances are good that a new researcher will be interested in "interpretability".

A creeping concern for many: "Is my research going to cause AGI ruin? Am I making the most powerful AI systems more powerful, even though I'm trying to make them safer?" Maybe they've even heard someone say that "mechanistic interpretability is capabilities research". This essay dissects the specific case of interpretability research, to figure...

"Can we control the thought and behavior patterns of a powerful mind at all?"-I do not see why this would not be the case. For example, in a neural network, if we are able to find a cluster of problematic neurons, then we will be able to remove those neurons. With that being said, I do not know how well this works in practice. After removing the neurons (and normalizing so that the remaining neurons are given higher weights), if we do not retrain the neural network, then it could exhibit more unexpected or poor behavior. If we do retrain the network, then ... (read more)

tl;dr: Ask questions about AGI Safety as comments on this post, including ones you might otherwise worry seem dumb!

Asking beginner-level questions can be intimidating, but everyone starts out not knowing anything. If we want more people in the world who understand AGI safety, we need a place where it's accepted and encouraged to ask about the basics.

We'll be putting up monthly FAQ posts as a safe space for people to ask all the possibly-dumb questions that may have been bothering them about the whole AGI Safety discussion, but which until now they didn't feel able to ask.

It's okay to ask uninformed questions, and not worry about having done a careful search before asking.

AISafety.info - Interactive FAQ

Additionally, this will serve as a way to spread the project Rob...

@drocta @Cookiecarver We started writing up an answer to this question for Stampy. If you have any suggestions to make it better I would really appreciate it. Are there important factors we are leaving out? Something that sounds off? We would be happy for any feedback you have either here or on the document itself https://docs.google.com/document/d/1tbubYvI0CJ1M8ude-tEouI4mzEI5NOVrGvFlMboRUaw/edit#

3NeuralSystem_e5e18h
This thread https://twitter.com/JosephJacks_/status/1662663709037301761 has motivated me to explore arguments for two claims: 1) there are possible artificial agents that, if actualized, would pose an existential threat to humanity; 2) humanity is relatively less likely to go extinct if AGI is not actualized within the next five years.

XXX: Is there a way to make this argument more succinct without substantially modifying its meaning?
XXX: Can I improve the handling of so-called "circumstances" in the argument?
XXX: Do I want to clarify what I mean by "artificial agent"? I could also use a different term, such as "artificial system", "artificial intelligence", or "machine intelligence".

Definitions
"The United States government" refers to all employees of the American federal government from 1940 to the present day.
X is a cognitive system iff X is a single human, a group of humans, or an artificial agent.

Argument
P1. For any group of humans, there is a possible artificial agent that, if actualized, would be as capable and intelligent as the group of humans.
P2. The United States government is a group of humans.
P3. For any cognitive systems X and Y, given sufficiently similar circumstances, X and Y can achieve the same set of goals.
P4. The United States government achieved the goal of acquiring a vast arsenal of nuclear weapons under circumstances C.
P5. Any Earth-residing cognitive system which possesses a vast arsenal of nuclear weapons poses an existential threat to humanity.
C1. There is a possible artificial agent that, if actualized, would be as capable and intelligent as the United States government. (from P1 and P2)
C2. There is a possible artificial agent that, if actualized within sufficiently similar circumstances, can achieve any goal that the US government has achieved. (from C1 and P3)
C3. There i...
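For what it's worth, the propositional skeleton of this chain can be machine-checked. Below is a minimal Lean 4 sketch; the predicate names are placeholders I have introduced for illustration (they are not from the argument), and the "possible/actualized" modality is flattened into a plain existential, so this checks only the shape of the inference, not the premises themselves.

```lean
-- Illustrative skeleton only; predicate names are placeholders, not from the post.
section ArgumentSketch

variable {Agent : Type}
variable (Possible : Agent → Prop)           -- "a is a possible artificial agent"
variable (AsCapableAsUSGov : Agent → Prop)   -- as capable and intelligent as the US government
variable (CanAcquireArsenal : Agent → Prop)  -- can acquire a vast nuclear arsenal
variable (PosesXRisk : Agent → Prop)         -- poses an existential threat to humanity

-- C1 (from P1 and P2), P3+P4 collapsed into one implication, and P5 as hypotheses;
-- the conclusion has the shape of the final existential-threat claim.
example
    (c1   : ∃ a, Possible a ∧ AsCapableAsUSGov a)
    (p3p4 : ∀ a, AsCapableAsUSGov a → CanAcquireArsenal a)
    (p5   : ∀ a, CanAcquireArsenal a → PosesXRisk a) :
    ∃ a, Possible a ∧ PosesXRisk a :=
  match c1 with
  | ⟨a, hPossible, hCapable⟩ => ⟨a, hPossible, p5 a (p3p4 a hCapable)⟩

end ArgumentSketch
```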

Possible counterarguments:

  • It doesn't increase the risk, as agents with nuclear arsenals already exist?
  • The current US government exploited unique resources (land, etc.).
  • The current US government would oppose the emergence of a new organization similar to itself.

Proofs are in this link

This will be a fairly important post. Not one of those obscure result-packed posts, but something a bit more fundamental that I hope to refer back to many times in the future. It's at least worth your time to read this first section up to its last paragraph.

There are quite a few places where randomization would help in designing an agent. Maybe we want to find an interpolation between an agent picking the best result, and an agent mimicking the distribution over what a human would do. Maybe we want the agent to do some random exploration in an environment. Maybe we want an agent to randomize amongst promising plans instead of committing fully to the plan it thinks is the best.

However, all...
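As a concrete illustration of the first kind of randomization mentioned above, here is a minimal sketch (my own toy example, not anything from the post) of one standard way to interpolate between "mimic the human distribution" and "pick the best action": exponentially tilt the base distribution by the utilities, with a single parameter sweeping between the two extremes.

```python
# Hypothetical sketch: interpolate between argmax-over-utility and a fixed
# "human imitation" distribution by exponentially tilting the base distribution.
import numpy as np

def tilted_policy(base_probs, utilities, beta):
    """beta = 0 reproduces the base (human-mimicking) distribution;
    beta -> infinity concentrates on the highest-utility action."""
    logits = np.log(base_probs) + beta * np.asarray(utilities)
    logits -= logits.max()                 # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

base = np.array([0.5, 0.3, 0.2])   # distribution over what a human would do (made up)
util = np.array([1.0, 2.0, 0.5])   # the agent's utility estimates (made up)

for beta in [0.0, 1.0, 5.0, 50.0]:
    print(beta, tilted_policy(base, util, beta).round(3))
```

At beta = 0 the agent samples exactly like the base distribution; as beta grows, the policy concentrates on the highest-utility action.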

I forget if I already mentioned this to you, but another example where you can interpret randomization as worst-case reasoning is MaxEnt RL; see this paper. (I reviewed an earlier version of this paper here (review #3).)
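For readers who haven't seen it, the entropy-regularized objective behind MaxEnt RL (the standard textbook form, not anything specific to the linked paper or review) is

$$J(\pi) \;=\; \mathbb{E}_{\tau \sim \pi}\left[\sum_t r(s_t, a_t) \;+\; \alpha\, \mathcal{H}\bigl(\pi(\cdot \mid s_t)\bigr)\right], \qquad \mathcal{H}\bigl(\pi(\cdot \mid s)\bigr) \;=\; -\sum_a \pi(a \mid s)\,\log \pi(a \mid s),$$

so the temperature α explicitly rewards keeping the policy stochastic; the robustness reading is that this randomization can be interpreted as hedging against a worst-case perturbation of the reward.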

EMDR (Eye Movement Desensitization and Reprocessing) is a structured therapy that encourages the patient to focus briefly on a traumatic memory while simultaneously experiencing bilateral stimulation (typically eye movements, but also tones or taps). This combination is associated with a reduction in the vividness and emotional charge of the traumatic memories.

EMDR is usually done with a therapist. However, you can also do self-administered EMDR on your own, as often and whenever you want, at no cost! Most people don't seem to know this great "do it on your own" option exists; I didn't. So my main goal with this post is simply to make you aware of the fact that: "Hey, there's this great therapeutic tool called EMDR, and you can just do it!"....

These are all excellent questions! Unfortunately, I don't have definite answers. I've read somewhere that the idea is to tax working memory as much as possible, such that you can just barely hold the emotional felt sense at the same time.
I'd be very interested if someone did some more reading and research on this!
What I personally do: the more intense the felt sense, the harder I focus on the EMDR "distractions", and vice versa.

1Anton Rodenhauser35m
I started doing self-administered EMDR about 6 months ago, and I've been using it very regularly since then, maybe 4 times a week. About half of the time it feels like it does "something", and maybe every 1 in 10 times it feels like a bigger breakthrough. I've noticed big changes in my behaviour and emotional life over the last 6 months. However, I've been combining a lot of therapeutic stuff, not just EMDR.
1Anton Rodenhauser38m
That's true. However, it's hard to know in advance how severe a trauma is. 
2Blacknsilver11h
I was just reading about EMDR in "The Body Keeps the Score" and thinking how nice it'd be if my psychiatrist wasn't stuck in the 19th century. I will try this out on my own and edit (or maybe reply) later on with my thoughts and experiences.

LessWrong is experimenting with the addition of reacts to the site, as per the recent experimental Open Thread. We are now progressing to the next stage of the experiment: trying out reacts in actual discussion threads.

The dev/moderator team will be proactively looking for posts to enable react voting on (with author permission), but any user can also enable it themselves to help us experiment:

  • When creating or editing a post, expand the "Options" section at the bottom and change the Voting system to Names-attached reactions

The admins will also be on the lookout for good posts to enable reacts on (with author permission).

Iterating on the react palette

We're continuing to think about what reacts should be available. Thanks to everyone who's weighed in so far.

I just spent time today...

I like the "picking one's nose" icon. :D 

2tailcalled3h
Idea: to address this issue of reacts potentially leading to less texty responses (https://www.lesswrong.com/posts/SzdevMqBusoqbvWgt/open-thread-with-experimental-feature-reactions?commentId=7mEq2NxrKmYmHwSk9) in an unconfounded way, maybe for a period of time during later experiments you could randomly enable reacts on half of all new posts? Might be silly though. At least it's not very worthwhile without a measure of how well it goes. Potentially the total amount of text written in discussions could function as such a measure, but it seems kind of crude.

People talk about Kelly betting and expectation maximization as though they're alternate strategies for the same problem. Actually, they're each the best option to pick for different classes of problems. Understanding when to use Kelly betting and when to use expectation maximization is critical.

Most of the ideas for this came from Ole Peters' ergodicity economics writings. Any mistakes are my own.

The parable of the casino

Alice and Bob visit a casino together. They each have $100, and they decide it'll be fun to split up, play the first game they each find, and then see who has the most money. They'll then keep doing this until their time in the casino is up in a couple days.

Alice heads left and finds a game that looks good. It's double...

2Oscar_Cunningham6h
Can you be more precise about the exact situation Bob is in? How many rounds will he get to play? Is he trying to maximise money, or trying to beat Alice? I doubt the Kelly criterion will actually be his optimal strategy.

I wrote this with the assumption that Bob would care about maximizing his money at the end, and that there would be a high but not infinite number of rounds.

On my view, your questions mostly don't change the analysis much. The only difference I can see is that if he literally only cares about beating Alice, he should go all in. In that case, having $1 less than Alice is equivalent to having $0. That's not really how people use money though, and seems pretty artificial.

How are you expecting these answers to change things?
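A quick toy simulation of the many-rounds point above (my own numbers; the post's actual game is truncated here): over repeated favorable even-odds bets, staking the Kelly fraction typically ends up with far more money than going all in each round, even though all-in maximizes the single-round expectation.

```python
# Hypothetical sketch: repeated favorable double-or-nothing bet, comparing
# betting the Kelly fraction each round vs. going all in each round.
# Toy parameters, not the game from the post.
import random

random.seed(0)

p_win = 0.6          # probability of winning an even-odds bet (made up)
rounds = 100
trials = 10_000

def play(fraction):
    """Final bankroll after `rounds` bets, staking `fraction` of it each time."""
    bankroll = 100.0
    for _ in range(rounds):
        stake = fraction * bankroll
        if random.random() < p_win:
            bankroll += stake
        else:
            bankroll -= stake
    return bankroll

kelly = 2 * p_win - 1    # Kelly fraction for even-odds bets: p - q = 0.2

for frac, label in [(kelly, "Kelly"), (1.0, "all in")]:
    results = [play(frac) for _ in range(trials)]
    mean = sum(results) / trials
    median = sorted(results)[trials // 2]
    print(f"{label:7s} mean ${mean:,.0f}   median ${median:,.0f}")
```

In this run essentially every all-in trajectory goes bust well before 100 rounds, so its enormous theoretical expectation never shows up, which illustrates why time-average growth, not per-round expectation, is the relevant quantity for Bob.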

Here are some views, oftentimes held in a cluster:

  • You can't make strong predictions about what superintelligent AGIs will be like. We've never seen anything like this before. We can't know that they'll FOOM, that they'll have alien values, that they'll kill everyone. You can speculate, but making strong predictions about them? That can't be valid.
  • You can't figure out how to align an AGI without having an AGI on-hand. Iterative design is the only approach to design that works in practice. Aligning AGI right on the first try isn't simply hard, it's impossible, so racing to build an AGI to experiment with is the correct approach for aligning it.
  • An AGI cannot invent nanotechnology/brain-hacking/robotics/[insert speculative technology] just from the data already available to humanity, then use its newfound understanding
...

I liked that you found a common thread in several different arguments.

However, I don't think that the views are all believed or all disagreed with in practice. That said, I do think Yann LeCun would agree with all the points and Eliezer Yudkowsky would disagree with all of them (except perhaps the last one).

For example, I agree with 1 and 5, agree with the first half but not the second half of 2, disagree with 3, and have mixed feelings about 4.

Why? At a high level, I think the extents to which individual researchers, large organizations, and LLMs/AIs need empirical feedback to improve are all quite different.

7Logan Zoellner2h
I think this is a strawman of LPE. People who point out that you need real-world experience don't say that you need zero theory, but that you have to have some contact with reality, even in deadly domains (https://en.wikipedia.org/wiki/Demon_core).

Outside of a handful of domains like computer science and pure mathematics, contact with reality is necessary because the laws of physics (https://en.wikipedia.org/wiki/Uncertainty_principle) dictate that we can only know things up to a limited precision. Moreover, it is the experience of experts in a wide variety of domains that "try the thing out and see what happens" (https://www.youtube.com/watch?v=Lrn1c6N0phw) is a ridiculously effective heuristic. Even in mathematics, the one domain where LPE should in principle be unnecessary, trying things out (https://en.wikipedia.org/wiki/Collatz_conjecture#Empirical_data) is one of the main ways that mathematicians gain intuitions for what new results are or aren't likely to hold (a toy version of this kind of check appears below).

I also note that your post doesn't give a single example of a major engineering/technology breakthrough that was done without LPE (in a domain that interacts with physical reality). This is literally the one specific thing LPE advocates think you need to learn from experience about, and you're just asserting it as true?

To summarize, domains where "pure thought" is enough:

  • toy problems
  • limited/no interaction with the real world
  • solution/class of solutions known in advance

Domains where LPE is necessary:

  • too complicated/messy to simulate
  • depends on precise physical details of the problem
  • even a poor approximation to the solution not knowable in advance
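The Collatz "empirical data" linked above is exactly this kind of tinkering; a throwaway sketch of the check (my own toy version, not from the linked page) looks like this:

```python
# Hypothetical sketch: empirically check that the Collatz iteration reaches 1
# for every starting value up to a small bound.
def collatz_steps(n, limit=10_000):
    """Number of steps for n to reach 1, or None if `limit` is exceeded."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
        if steps > limit:
            return None
    return steps

results = {n: collatz_steps(n) for n in range(1, 100_000)}
assert all(s is not None for s in results.values())   # no counterexample found below 100,000
print("max steps:", max(results.values()))            # the kind of pattern one inspects for intuition
```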
3Gerald Monroe8h
Your post seems to disagree with several empirically based LessWrong posts. Since your model of the capabilities of simulations is wrong, why should anyone believe ASIs will be exempt? Analysis follows:

https://blog.aiimpacts.org/p/you-cant-predict-a-game-of-pinball mathematically shows that it's impossible to model a game of pinball well enough to predict it at all. Note that if this is an unknown pinball machine (not a perfect ideal one: there are irregularities in the table, wear on the bumpers, and so on), then even an ASI with a simulator cannot actually solve this game of pinball. It will need to play it some.

If you think about the pinball problem in more detail ("give it 5 minutes"), you will realize that brute-force playing of thousands of games isn't needed. To know about the irregularities of the tabletop, you need the ball to travel over all of the tabletop, from probably several different directions and speeds, and observe its motion with a camera. To know about hidden flaws in the bumpers you likely need impacts from different angles and speeds.

There are a variety of microscope scanning techniques that work like the above. This is also similar to how PBR material scanning is done (example link: https://www.a23d.co/blog/pbr-texture-scanning/).

Conclusion: you won't need the thousands of games a human player will need to get good at a particular pinball table, but you will need to play enough games on a given table or collect data from it using sensors not available to humans (and not published online in any database; you will have to get humans to set up the sensors over the table or send robots equipped with the sensors). Without this information, if the task is "achieve expert-level performance on this pinball table, zero-shot, with nothing but a photo of the table", the task is impossible. No ASI, even an "infinite superintelligence", can solve it. This extends in...
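The sensitivity argument behind the pinball post can be seen in an even simpler toy model (my own illustration, not from the linked analysis): in a chaotic map, an initial-condition error far smaller than anything visible in a photo grows exponentially until prediction fails, which is why simulation alone, without measurements of the actual table, runs out of steam.

```python
# Hypothetical sketch: exponential divergence of two trajectories of the
# logistic map that start a hair's breadth apart (a stand-in for unmeasured
# irregularities in a real pinball table).
r = 4.0                      # fully chaotic regime of the logistic map
x, y = 0.4, 0.4 + 1e-12      # identical except for a 1e-12 "measurement error"

for step in range(1, 61):
    x = r * x * (1 - x)
    y = r * y * (1 - y)
    if step % 10 == 0:
        print(f"step {step:2d}: |difference| = {abs(x - y):.3e}")
# By roughly step 40-50 the difference is of order 1: the "prediction" no
# longer tracks the "real" trajectory at all.
```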
2GdL75210h
But every environment which isn't perfectly known, and every "goal" which isn't completely concrete, opens up error, which then stacks upon error as any "plan" to interact with or modify reality adds another step. If the ASI can infer some materials-science breakthroughs from given human knowledge and existing experimental data to some great degree of certainty, ok, I buy it. What I don't buy is that it can simulate enough actions and reactions with enough certainty to nail a large domain of things on the first try. But I suppose that's still sort of moot from an existential-risk perspective, because FOOM and sharp turns aren't really a requirement. But "inferring" the best move in tic-tac-toe and, say, "developing a unified theory of reality without access to super colliders" is a stretch that doesn't hold up to reason. "Hands-on experience is not magic", but neither is "superintelligence"; the LLMs already hallucinate, any conceivable future iteration will still be bound by physics, and a few wrong assumptions compounded together can whiff a lot of hyperintelligent schemes.

Preamble:

(If you're already familiar with all basics and don't want any preamble, skip ahead to Section B for technical difficulties of alignment proper.)

I have several times failed to write up a well-organized list of reasons why AGI will kill you.  People come in with different ideas about why AGI would be survivable, and want to hear different obviously key points addressed first.  Some fraction of those people are loudly upset with me if the obviously most important points aren't addressed immediately, and I address different points first instead.

Having failed to solve this problem in any good way, I now give up and solve it poorly with a poorly organized list of individual rants.  I'm not particularly happy with this list; the alternative was publishing nothing, and publishing this seems marginally...

1kubanetics7h
This is another reply in this vein; I'm quite new to this, so don't feel obliged to read through. I just told myself I would publish this.

I agree (90-99% agreement) with almost all of the points Eliezer made. The rest is where I probably didn't understand enough or where there's no need for a comment, e.g.:

1.-8. Agree.
9. Not sure if I understand it right: if the AGI has been successfully designed not to kill everyone, then why the need for oversight? If, on the other hand, it is capable of doing so and the design fails, what would our oversight do? I don't think this is like the nuclear cores. It feels like a bomb you are pretty sure won't go off at random, but if it does, your oversight won't stop it.
10.-14. Agree.
15. I feel like I need to think about it more to honestly agree.
16.-18. Agree.
19. To my knowledge, yes.
20.-23. Agree.
24. Initially I put "80% agree" to the first part of the argument here (that ...), but then, discussing it with my reading group, I reiterated this a few times and began to agree even more, grasping the complexity of something like CEV.
25.-29. Agree.
30. Agree, although I wasn't sure about ... I think the key part of this claim is "all the effects of", and I wasn't sure whether we have to understand all of them; but of course we have to be sure that one of the effects is not human extinction, so yes, and for "solving alignment" also yes.
31.-34. Agree.
35. No comment; I have to come back to this once I grasp LDT better.
36. Agree.
37. No comment; seems like a rant 😅
38. Agree.
39. Ok, I guess.
40. Agree; I'm glad some people want to experiment with the financing of research re 40.
41. Agree, although I agree with some of the top comments on this, e.g. evhub's.
42. Agree.
43. Agree; at least this is what it feels like.

Regarding 9: I believe it's when you are successful enough that your AGI doesn't kill you instantly, but it can still kill you in the process of using it. It's in the context of a pivotal act, so it assumes you will operate it to do something significant and potentially dangerous.