
Being Sick Sucks More than I Thought

I spend most of my life sitting alone in my room, in front of my computer, when not going to university or school. When I got so sick that I could only lie flat on my bed, it sucked, because I could not do whatever it was that I wanted to do on my computer. However, that was only when I was very, very sick. Most of the time, even when I really felt the sickness, I could still do whatever I wanted. At the very least I could listen to an audiobook or watch a YouTube video.

When I was sick for one or two weeks, at most one or two of those days would leave me so sick that I could not really do anything at all, and that only happened once or twice in my life. So mostly my life did not change when I was sick. But now the situation has changed. I now often want to go to some event, basically always related to AI alignment, or to a space where people who work on AI alignment hang out. But I can't do this when I am sick. At least not if I want to avoid infecting anybody else, which does seem very high value. So now sickness has something that makes my life substantially worse, compared to before.

I basically missed EAG 2023 Bay Area completely because I was sick. I just had one meeting before the symptoms kicked in. I had spent hours lining up cool people that I really wanted to talk to, but then I just lay sick in my hotel room. There was also a two-day event after EAG that I missed, and a retreat of which I will miss at least more than half. Being sick sucks. I guess I should wear a mask on public transport and in Ubers.

I'm sorry to hear this. At least I got to meet you before you fell ill. Get well soon.

Thank you, though just to be clear, I am not saying this to complain. I say this to cache my reasoning about how important not getting sick is. I was operating without properly taking into account the consequences of my actions.

Sometimes I tell somebody about a problem in our relationship. An answer I often hear is an honest "What do you want me to do?". This is probably well-intentioned most of the time, but I really don't like this answer. I much prefer it when the other person starts to use their cognitive resources to optimize the problem to smithereens. "What do you want me to do?" is the lazy answer. It is the answer you give to be agreeable. It makes it seem like you don't care about the problem, or at least not enough to invest effort into fixing it.

This is highly dependent on the relationship and the problem.  If you don't have a ready answer to "what should I do?", then you probably should be asking and discussing whether and what kind of problem there is, prior to expecting someone to put a bunch of thought into your short description.

Yes. I was thinking about the scenario where I make it absolutely clear that there is a problem. I feel that should be enough reason for them to start optimizing, and not take my inability to provide a policy for them to execute as an excuse to ignore the problem. Though I probably could describe the problem better. See also this.

Fair enough - those details matter in human relationships, and it's probably not possible to abstract/generalize enough for you to be comfortable posting while still getting useful feedback in this forum.

I do worry that a lot of LW readers' model of society and relationships is more symmetrical in goals and attitudes than is justified by experience and observation.  Other-optimization (trying to make someone more effective at satisfying your goals) is not pretty.

LW readers' model of society and relationships is more symmetrical in goals and attitudes than is justified by experience and observation

What do you mean by this?

In this case, I mean that I’d be kind of shocked if most humans, even close friends or romantic partners, react to “here’s a problem I see in our relationship” with the openness and vigor you seem to expect.

In general, I mean there’s often a denial of the fact that most people are more selfish than we want to project.

Do you mean "What do you want me to do" in the tone of voice that means "There's nothing to do here, bugger off"? Or do you mean "What do you want me to do?" in the tone of voice that means "I'm ready to help with this. What should I do to remedy the problem?"?

I mean the situation where they are serious. If I told them a solution, they would consider it and might even implement it. But they are not pointing their consequentialist reasoning skills toward the problem to crush it. See also this comment.

"What do you want me to do?" prods you to give concrete examples of what a solution looks like. That can reveal aspects of the problem you didn't realize, and implicitly shows people an model of the problem. Which is crucial, because communicating is hard, even with people you're close to. Especially if they haven't didn't notice the problem themselves.

I have not communicated the subtleties here. I was mainly complaining about a situation where the other person is not making the mental move of actually trying to solve the problem. When I don't have an answer to "What do you want me to do?", they see it as an excuse to do nothing and move on. Your interpretation presupposes that they are trying to solve the problem. If somebody did what you are describing, they would do well to state it explicitly.

"What do you want me to do?" is much worse than "What do you want me to do? I am asking because maybe you have already thought of a solution, and it is just a matter of you telling me how to implement it. Then I can go ahead and implement it if I also think it is a good solution. If not that is fine too. In this case, let's try to solve the problem together. Let's first get clearer about what a solution would look like. What are the relevant properties a solution should have, and what is weighting on these properties? ..."

Solomonoff induction does not talk about how to make optimal tradeoffs in the programs that serve as hypotheses.

Imagine you want to describe a part of the world that contains a gun. Solomonoff induction would converge on the program that perfectly predicts all possible observations. This program would be able to predict what sort of observations I would make after stuffing a banana into the muzzle and firing. But knowing how the banana splatters is not the most useful fact about the gun. It is more useful to know that a gun can be used to kill humans and animals. So if you want to store your world model in only n bits of memory, you need to decide which information to put in, and this matters because some information is much more useful than other information. So how can we find the world model that gives you the most power over the world, i.e. lets you reach the greatest number of states? Humans have the ability to judge the usefulness of information. You can ask yourself: what sort of knowledge would be most useful for me to learn? Or: what knowledge would be worst to forget?
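
To make the tradeoff concrete, here is a toy sketch (all facts, bit sizes, and usefulness scores below are made-up illustrations, not anything Solomonoff induction itself provides): picking which facts to keep under a fixed bit budget is a knapsack-style problem, and even a greedy usefulness-per-bit heuristic shows why the banana-splatter details never make the cut.

```python
# Toy illustration: choosing which facts about a gun to keep in a
# bounded world model. Sizes and usefulness scores are hypothetical.
facts = [
    {"name": "a gun can kill humans and animals", "bits": 50, "usefulness": 100.0},
    {"name": "a gun needs ammunition to fire", "bits": 40, "usefulness": 30.0},
    {"name": "the serial number's engraving font", "bits": 200, "usefulness": 0.5},
    {"name": "exact splatter pattern of a fired banana", "bits": 5000, "usefulness": 0.1},
]

def select_facts(facts, budget_bits):
    """Greedy knapsack heuristic: keep the facts with the best usefulness per bit."""
    chosen, used = [], 0
    for fact in sorted(facts, key=lambda f: f["usefulness"] / f["bits"], reverse=True):
        if used + fact["bits"] <= budget_bits:
            chosen.append(fact["name"])
            used += fact["bits"]
    return chosen

print(select_facts(facts, budget_bits=100))
# -> ['a gun can kill humans and animals', 'a gun needs ammunition to fire']
```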

FLI just released Pause Giant AI Experiments: An Open Letter

I don't expect that 6 months would be nearly enough time to understand our current systems well enough to make them aligned. However, I do support this, and did sign the pledge, because getting everybody to stop training AI systems more powerful than GPT-4 for 6 months would be a huge step forward in terms of coordination. I don't expect this to happen, though. I don't expect that OpenAI will give up its lead here.

See also the relevant manifold market.

Right now I am trying to better understand future AI systems by, first, thinking about what sort of abilities I expect every system of high cognitive power to have, and second, trying to find a concrete practical implementation of each ability. One such ability is building a model of the world that satisfies certain desiderata. For example, if we have multiple similar agents in the world, then we can factor the world such that we build just one model of the agent and point to this model twice in our description of the world. This is something that Solomonoff induction can also do. I am interested in constraining the world-model-building process such that we always get a world model with a similar structure, which makes the world model more interpretable. I.e. I try to find a way of building a world model where we mainly need to understand the world model's content, because it is easy to understand how the content is organized.
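
As a rough sketch of what such factoring could look like (the classes and fields below are hypothetical, just to make the structure concrete): the world description stores the agent model once, and the two agent instances merely point to it, so the description does not pay for the agent model twice and an interpreter only has to understand it once.

```python
from dataclasses import dataclass

@dataclass
class AgentModel:
    """Description of how a kind of agent behaves; stored only once."""
    beliefs: dict
    policy: str

@dataclass
class AgentInstance:
    """A particular agent in the world; points to a shared model."""
    position: tuple
    model: AgentModel

@dataclass
class World:
    agents: list

# One shared model, referenced by two instances at different positions.
shared_model = AgentModel(beliefs={"goal": "collect apples"}, policy="greedy")
world = World(agents=[
    AgentInstance(position=(0, 0), model=shared_model),
    AgentInstance(position=(5, 3), model=shared_model),
])

# Both instances literally point to the same object, so understanding or
# editing the agent model once covers both occurrences in the world.
assert world.agents[0].model is world.agents[1].model
```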

Apparently a heuristic funders use is that the best startup founders are those who have done the most startups in the past, irrespective of whether they failed or succeeded.

If this heuristic maps reality well, it might be because most startups fail. So even a person who is very competent at running a startup is expected to fail a couple of times. And having run multiple startups either indicates that certain skills have been acquired, or that the person has some desirable attributes:

  • Determination is important, so people who give up after failing will be filtered.
  • If somebody convinced funders in the past, that is an indicator that they are intelligent enough to generate a coherent-looking proposal.
  • They will have gathered a lot of experience running companies, which seems to be an important skill that I would expect is hard or impossible to acquire to the same extent in other ways.

I was listening to a stoic lesson on Waking Up. It was about:

  • Focus on being a participant in your life during the day.
  • But observe yourself in a low-grade manner during the day.
  • Play the role of your own critic in the evening (e.g. do a bedtime reflection).

I've been doing a daily reflection for a long time, though I have not thought about the reflection as providing constructive criticism. This framing seems much better than my previous one. Before, I mainly wrote down all the things that I did during the day and how they differed from my plan for the day. This is not bad, insofar as it helps you make improvements to your life. I do think there is some merit in just doing this, but the main benefit is that it makes it easier to think about concrete plans for improvement. I understand constructive criticism as providing either information that is relevant for coming up with plans to improve yourself, or suggestions for such plans.

Also, this framing makes it more evident that the goal is improving yourself. Overeating, behaving differently from how I think I should act in some social situations, not going to bed on time, or eating unhealthy food become more obvious things to think about. The objective is to come up with plans for improving yourself. Before, it felt more like I was following a rigid procedure of describing my day.

How to do a reflection:

Spend 3 minutes looking for things that were not good, and then come up with a solution to the most important problem.

This seems to be by far the best plan. You can't train many new habits at the same time. Instead, you should focus on 1-3 until you have them down. Habits are involved in many improvement plans, if not all. Most improvements are about training yourself to do the right thing reflexively.

Also, reflecting and coming up with plans can take quite a lot of time. Before having the framing of giving myself constructive criticism, I did not end up with concrete improvement plans that often. Part of the reason is that writing out all the things I did and analyzing how I failed to achieve my goals takes a lot of time. That time is better spent actually thinking about concrete plans. By bounding the amount of time you have for identifying a problem, you force yourself to spend more time devising concrete improvement plans. The most important problems will probably be salient enough to pop out within the 3 minutes.

I have not tried this strategy in this setting yet, but I have used it in other settings, where it worked very well.

Many people match "pivotal act" to "deploy AGI to take over the world", and ignore the underlying problem of preventing others from deploying misaligned AGI.

I have talked to two high-profile alignment/alignment-adjacent people who actively dislike pivotal acts.

I think both have distorted notions of what a pivotal act is about. They focused on how dangerous it would be to let a powerful AI system loose on the world.

However, a pivotal act is about preventing misaligned AGI from being built: any act that ensures that misaligned AGI will not be built is a pivotal act. Many such acts might look like taking over the world, but this is not a core feature of a pivotal act. If I could prevent all people from deploying misaligned AGI by eating 10 bananas in sixty seconds, then that would count as a pivotal act!

The two researchers were not talking about how to prevent misaligned AGI from being built at all, so I worry that they are ignoring this problem in their solution proposals. It seems "pivotal act" has become a term with bad connotations. When hearing "pivotal act", these people pattern-match to "deploy AGI to take over the world" and ignore the underlying problem of preventing others from deploying misaligned AGI.

I expect there are a lot more people who fall into this trap. One of the people was giving a talk and this came up briefly. Other people seemed to be on board with what was said. At least nobody objected, except me.

Disgust is optimizing

Someone told me that they felt disgusted by the idea of trying to optimize for specific things using specific objectives. This is what I wrote to them:

That feeling of disgust is actually some form of optimization itself. Disgust is a feeling that is utilized for many things that we perceive as negative. It was probably easier for evolution to rewire when to feel disgusted than to create a new feeling. The point is that the feeling that arises is supposed to change your behavior, steering you in certain directions. I.e. it redirects what you are optimizing for. For example, it could make you think about why trying to optimize for things directly, using explicit objectives, is actually a bad thing. But the value judgment comes first. You first feel disgusted, and then you try to combat the thing that you are disgusted by in some way and come up with reasons why it is bad. So it is ironic that one can feel disgusted at optimization when feeling disgusted is part of an optimization process itself.

We were talking about maximizing positive and minimizing negative conscious experiences, I guess with the implicit assumption that we could find some specification of this objective that we would find satisfactory (one that would not have unintended consequences when implemented).

It's understandable to feel disgust at some visible optimization processes, while not feeling disgust at others, especially ones that aren't perceived as intrusive or overbearing.  And that could easily lead to disgust at the INTENT to optimize in simple/legible ways, without as much disgust for complex equilibrium-based optimizations that don't have human design behind them.

Yes. There are lots of optimization processes built into us humans, but they feel natural to us, or we simply don't notice them. Stating something that you want to optimize for, especially something that seems to impose itself on the entire structure of the universe, is not natural for humans. And that goal, if implemented, would restrict the individual's freedoms, which humans really don't like.

I think this all makes sense when you are trying to live together in a society, but I am not sure if we should blindly extrapolate these intuitions to determine what we want in the far future.

I am not sure if we should blindly extrapolate these intuitions to determine what we want in the far future.

I'm pretty sure we shouldn't.  Note that "blindly" is a pretty biased way to describe something if you're not trying to skew the discussion.  I'm pretty sure we shouldn't even knowingly and carefully extrapolate these intuitions terribly far into the future.  I'm not sure whether we have a choice, though - it seems believable that a pure laissez-faire attitude toward future values leads to dystopia or extinction.

The "Fu*k it" justification

Sometimes people seem to say "fu*k it" towards some particular thing. I think this is a way to justify one's intuitions. You intuitively feel like you should not care about something, but you can't actually put your intuition into words. Except you can say "fu*k it" to convey your conclusion without any justification. "Because it's cool" is similar.

Newcomb: Can't do what's optimal

You have a system that can perfectly predict what you will do in the future. It presents you with two opaque boxes. If you take both boxes, it will place $10 in one box and $0 in the other. If you take only one box, it will place $10 in one box and $1,000,000 in the other. The system does not use its predictive power to predict which box you will choose, but only to determine whether you choose one box or two. It uses a random number generator to determine which amount goes in which box.

This is a modified version of Newcomb's problem.

Imagine that you are an agent that can reliably pre-commit to an action. Now imagine you pre-commit to taking only one box in such a way that it becomes impossible for you not to uphold that commitment. Now if you choose a box and get $10, you know that the other box contains $1,000,000 for sure.
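
Here is a small simulation of this modified setup (a sketch under the stated assumptions: perfect prediction of the policy, random placement of the amounts). It shows the expected payoffs of the two policies, and that whenever a committed one-boxer opens a box containing $10, the unopened box always holds $1,000,000.

```python
import random

def play(one_boxer: bool) -> tuple:
    """One round: the predictor fills the boxes based on the (perfectly
    predicted) policy, then the contents are shuffled randomly."""
    contents = [10, 1_000_000] if one_boxer else [10, 0]
    random.shuffle(contents)
    if one_boxer:
        picked, other = contents[0], contents[1]   # take one box at random
        return picked, other
    return contents[0] + contents[1], None          # take both boxes: always $10

rounds = 100_000
one_box_payoffs = []
for _ in range(rounds):
    picked, other = play(one_boxer=True)
    one_box_payoffs.append(picked)
    if picked == 10:
        assert other == 1_000_000  # the box you cannot take holds the million

two_box_payoffs = [play(one_boxer=False)[0] for _ in range(rounds)]

print("one-boxing average:", sum(one_box_payoffs) / rounds)   # ~500,005
print("two-boxing average:", sum(two_box_payoffs) / rounds)   # exactly 10
```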

You have a system that can perfectly predict what you will do in the future.

In fact, I do not.  This (like Newcomb) doesn't tell me anything about the world.

Imagine that you are an agent that can reliably pre-commit to an action

In this set-up, what does the pre-commitment imagination do for us?  The system predicts correctly whether I pre-commit or not, right?

The interesting thing is that you can end up in a scenario where you actually know that the other box contains $1,000,000 for sure. The one that you did not pick. Although you can't take it because of the pre-commitment mechanism. And this pre-commitment mechanism is the only thing that prevents you from taking it. The thing I found interesting is that such a situation can arise.

You have a system that can perfectly predict what you will do in the future.

In fact, I do not. This (like Newcomb) doesn't tell me anything about the world.

Also, of course there is no system in reality that can predict you perfectly, but this is about an idealised scenario that is relevant because there are systems that can predict you with more than 50% accuracy.

Although you can't take it because of the pre-commitment mechanism.

This is a crux for me.  In such worlds where this prediction is possible, you can no longer say "because of" and really know that's true.  I suspect the precommitment mechanism is the way you KNOW that you can't take the box, but it's not why you can't take the box.

I don't really get that. For example, you could put a cryptographic lock on the box (let's assume there is no way around it without the key) and then throw away the key. It seems that now you actually are not able to access the box, because you do not have the key. And you can also, at the same time, know that this is the case.

Not sure why this should be impossible to say.

Sure, there are any number of commitment mechanisms which would be hard (or NP-hard) to bypass.  If the prediction and box-content selection were performed by Omega based on that cause, then fine.  If instead they were based on a more complete modeling of the universe, REGARDLESS of whether the visible mechanism "could" be bypassed, then there are other causes than that mechanism.

There could be, but there does not need to be, I would say. Or maybe I really do not get what you are talking about. It could really be that if the cryptographic lock were not in place, then you could take the box, and nothing else would prevent you from doing so. I guess I have an implicit model where I look at the world from a Cartesian perspective. So is your point about counterfactuals: that I am using them in a way that is not valid, and that I do not acknowledge this?

I think my main point is that "because" is a tricky word to use normally, and gets downright weird in a universe that includes Omega levels of predictions about actions that feel "free" from the agent.

If Omega made the prediction, that means Omega sees the actual future, regardless of causality or intent or agent-visible commitment mechanisms.  
