All of Johannes C. Mayer's Comments + Replies

FLI just released Pause Giant AI Experiments: An Open Letter

I don't expect that 6 months would be nearly enough time to understand our current systems well enough to make them aligned. However, I do support this, and I did sign the pledge, as getting everybody to stop training AI systems more powerful than GPT-4 for 6 months would be a huge step forward in terms of coordination. I don't expect this to happen. I don't expect that OpenAI will give up its lead here.

See also the relevant manifold market.

I guess you talk about this. Just putting it here, such that others can follow the link easily.

Thank you, though just to be clear, I am not saying this to complain. I say this to record my reasoning about how important it is not to get sick. I was operating without properly taking into account the consequences of my actions.

I don't mean to offend, and it might be my fault, but I don't think you got the core idea that I was trying to communicate. Probably because I did not say it clearly, or maybe it is to be expected that some people will always miss the core point? But that sounds like an excuse. My core point is not that reading speed is a good thing to improve on (though it might be). It is merely an illustrative example, one that is supposed to illustrate the core thing I am talking about, such as to make the general abstract pattern that I want to convey more ... (read more)

Being Sick Sucks More than I Thought

I spent most of my life sitting alone in my room, in front of my computer, when not going to university or school. When I got so sick that I could only lie flat on my bed, it sucked, because I could not do whatever it was that I wanted to do on my computer. However, that was only when I was very, very sick. Most of the time, even when I really felt the sickness, I could still do whatever I wanted. At the very least I could listen to an audiobook or watch a YouTube video.

When I was sick for 1 or 2 weeks, really at most 1 or ... (read more)

2 · Søren Elverlin · 1mo
I'm sorry to hear this. At least I got to meet you before you fell ill. Get well soon.

It seems like I'm pretty bad at communicating what I wanted to communicate. Multiple people in the comments said that the case is more complicated than assigning a single number to reading speed. I agree with this. I think that the way reading speed is normally used by people is probably too simplistic.

My criticism is about dismissing the concept as unworkable. There are various algorithms and data in the brain that enable a human to read. Depending on what these algorithms do exactly and on what kind of data you have, you will be in some sense wor... (read more)

I think, in order to defend "reading speed" as a useful atomic concept (or perhaps a cluster of things with this as a proxy measure), you would need to define more closely exactly what you think it is, not just waving toward "something in the real world".   I don't fully disagree with you that improving one's speed of absorbing information is useful.  I've studied and practiced speed reading, including time trials and tests of words-per-minute with retention quizzes.  I recommend doing so to almost anyone reading this (indicating that you read things regularly).  Even so, I don't think it's a "real thing", but a set of capabilities that are imperfectly measurable; "reading speed" is a correlate of some real things, not a real thing itself.

I basically agree with this. But if you apply what is described in the post, it reveals a lot about why we are not there yet. If you pit a human driver against any of the described autonomous cars, there will just be lots of situations where the human performs better. And I don't need to run this experiment in order to cash out its implications. I think when people talk about fully autonomous cars, they implicitly have something in mind where the autonomous car is at least as good as a human. Thinking about an experiment that you could run here makes this implicit assumption explicit. Which I think can be useful. It's one of the tools that you can use to make your definition more precise along the way.

I have not done a PhD. But my two cents here are that none of these skills seem very teachable by traditional teaching methods. I would be surprised if people try to teach even half of these things explicitly in a PhD. And I don't expect that they would teach them very well. I expect that you will need to figure out most of these things yourself. I have heard that most PhD students get depressed. That doesn't sound like they have good models of how the mind works and how to take care of their mental health. Though all of it depends on how good the people around you are, of course.

This post did something strange to my mind.

I already thought that thinking about the problem yourself is important, and basically required if you want to become a good researcher. At least that was the result of some explicit reasoning steps.

However, I then talked to a person about this, and they told me basically the opposite: that it is not required to think about the problems yourself, that it is okay to delegate this thinking to other people, and that these people probably know better what to do than you, as they have already thought about it for a long ... (read more)

Here is a model of mine, that seems related.

[Edit: Add Epistemic status]
Epistemic status: I have used this successfully in the past and found it helpful. It is relatively easy to do, and the benefit is large for me.

I think it is helpful to be able to emotionally detach yourself from your ideas. There is an implicit "concept of I" in our minds. When somebody criticizes this "concept of I", it is painful. If somebody says "You suck", that hurts.

There is an implicit assumption in the mind that this concept of "I" is eternal. This has the effect that ... (read more)

Here is what I would do, in the hypothetical scenario, where I have taken over the world.

  1. Guard against existential risk.
  2. Make sure that every conscious being I have access to is at least comfortable, as a baseline.
  3. Figure out how to safely self-modify, and become much much much ... much stronger.
  4. Deconfuse myself about what consciousness is, such that I can do something like 'maximize positive experiences and minimize negative experiences in the universe', without it going horribly wrong. I expect that 'maximize positive experiences, minimize negative e
... (read more)

How much risk is worth how much fun?

Minor point: Having fun is not the only motivation one can have. One could end up doing a drug, even if they expect to have a bad time, but think it is worth it in the long run. I am talking especially about psychedelics.

LW readers' model of society and relationships is more symmetrical in goals and attitudes than is justified by experience and observation

What do you mean by this?

In this case, I mean that I’d be kind of shocked if most humans, even close friends or romantic partners, react to “here’s a problem I see in our relationship” with the openness and vigor you seem to expect. In general, I mean there’s often a denial of the fact that most people are more selfish than we want to project.

Yes. I was thinking about the scenario where I make it absolutely clear that there is a problem. I feel that should be enough reason for them to start optimizing, and not to take my inability to provide a policy for them to execute as an excuse to ignore the problem. Though I probably could describe the problem better. See also this.

Fair enough - those details matter in human relationships, and it's probably not possible to abstract/generalize enough for you to be comfortable posting while still getting useful feedback in this forum. I do worry that a lot of LW readers' model of society and relationships is more symmetrical in goals and attitudes than is justified by experience and observation.  Other-optimization (Trying to make someone more effective in satisfying your goals) is not pretty.  

Would this not be better as a Question post?

I mean the situation where they are serious. If I told them a solution, they would consider it and might even implement it. But they are not pointing their consequentialist reasoning skills toward the problem to crush it. See also this comment.

I have not communicated the subtleties here. I was mainly complaining about a situation where the other person is not making the mental move of actually trying to solve the problem. When I don't have an answer to "What do you want me to do?", they see it as an excuse to do nothing and move on. Your interpretation presupposes that they are trying to solve the problem. If somebody did what you are describing, they would do well to state that explicitly.

"What do you want me to do?" is much worse than "What do you want me to do? I am asking because maybe... (read more)

Sometimes I tell somebody about a problem in our relationship. An answer I often hear is an honest "What do you want me to do?". This is probably well-intentioned most of the time, but I really don't like this answer. I much prefer when the other person starts to use their cognitive resources to optimize the problem to smithereens. "What do you want me to do?" is the lazy answer. It is the answer you give to be agreeable. It makes it seem like you don't care about the problem, or at least not enough to invest effort into fixing it.

This is highly dependent on the relationship and the problem. If you don't have a ready answer to "what should I do", then you probably should be asking and discussing whether and what kind of problem there is, prior to expecting someone to put a bunch of thought into your short description.
Do you mean "What do you want me to do" in the tone of voice that means "There's nothing to do here, bugger off"? Or do you mean "What do you want me to do?" in the tone of voice that means "I'm ready to help with this. What should I do to remedy the problem?"?
"What do you want me to do?" prods you to give concrete examples of what a solution looks like. That can reveal aspects of the problem you didn't realize, and implicitly shows people an model of the problem. Which is crucial, because communicating is hard, even with people you're close to. Especially if they haven't didn't notice the problem themselves.

Right now I am trying to better understand future AI systems by first thinking about what sort of abilities I expect every system of high cognitive power to have, and second, trying to find a concrete practical implementation of each ability. One ability is building a model of the world that satisfies certain desiderata. For example, if we have multiple agents in the world, then we can factor the world such that we build just one model of the agent and point to this model in our description of the world twice. This is something that Solom... (read more)
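A minimal sketch of this factoring idea, with hypothetical names chosen only for illustration: the agent submodel is defined once, and the world description points to it twice instead of spelling it out twice.

```python
# Toy illustration of factoring a world description: one agent model is
# defined once and referenced twice, instead of duplicating it per agent.
# All class and field names here are made up for illustration.

class AgentModel:
    def __init__(self, policy_name):
        self.policy_name = policy_name

    def act(self, observation):
        # Stand-in policy: just echo the observation back.
        return f"{self.policy_name} reacts to {observation}"

# Define the agent model once...
shared_agent = AgentModel("cautious-policy")

# ...and point to it twice in the world description. A description that
# reuses the submodel is shorter than one containing two copies, which is
# the kind of compression short-program (Solomonoff-style) descriptions reward.
world = {
    "agent_at_north": shared_agent,
    "agent_at_south": shared_agent,
}
```

Both keys resolve to the same object, so the "two agents" share one model description.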

Apparently a heuristic funders use is that the best startup founders are those who have done the most startups in the past, irrespective of whether they failed or succeeded.

If this maps reality well, it might be because most startups fail. So even a person who is very competent at running a startup is expected to fail a couple of times. And having run multiple startups either indicates that certain skills have been acquired, or that the person has some desirable attributes:

  • Determination is important, so people who give up after failing will be filter
... (read more)

How to do a reflection:

Spend 3 minutes looking for things that were not good, and then come up with a solution to the most important problem.

This seems to be by far the best plan. You can't train many new habits at the same time. Instead, you should focus on 1-3 until you have them down. Habits are involved in many improvement plans, if not all. Most improvements are about training yourself to do the right thing reflexively.

Also, reflecting and coming up with plans can take quite a lot of time. Before having the framing of giving myself constructive criticism, I... (read more)

I was listening to a stoic lesson on Waking up. It was about:

  • Focus on being a participant in your life during the day.
  • But in a low-grade manner observe yourself during the day.
  • Play the role of your own critic in the evening (e.g. do a bedtime reflection).

I've been doing a daily reflection for a long time, though I had not thought about the reflection as providing constructive criticism. This framing seems much better than my previous one. Before, I mainly wrote down all the things that I did during the day and how they differed from my plan for the day. T... (read more)

1 · Johannes C. Mayer · 4mo
How to do a reflection: Spend 3 minutes looking for things that were not good, and then come up with a solution to the most important problem. This seems to be by far the best plan. You can't train many new habits at the same time. Instead, you should focus on 1-3 until you have them down. Habits are involved in many improvement plans, if not all. Most improvements are about training yourself to do the right thing reflexively. Also, reflecting and coming up with plans can take quite a lot of time. Before having the framing of giving myself constructive criticism, I did not end up with concrete improvement plans that often. Part of the reason is that writing out all the things I did and analyzing how I did not achieve my goals takes a lot of time. That time is better spent actually thinking about concrete plans. By bounding the amount of time you have for identifying a problem, you force yourself to spend more time devising concrete improvement plans. The most important problems will probably be salient and pop out in the 3 minutes. I have not tried this strategy in this setting yet, but I have used it in others, where it worked very well.

You list several possibilities for why directly working on the problem might not be the best thing to do. Somebody who is competent and tries to solve the problem would consider these possibilities and make use of them.

I agree that sometimes there will be a promising path that is discovered by accident. You could not have planned for discovering it. Or could you? Even if you can't predict which path will reveal itself, you can be aware that there are paths that will reveal themselves in circumstances you could not predict. You can still plan to do some spec... (read more)

Many people match "pivotal act" to "deploy AGI to take over the world", and ignore the underlying problem of preventing others from deploying misaligned AGI.

I have talked to two high-profile alignment/alignment-adjacent people who actively dislike pivotal acts.

I think both have contorted notions of what a pivotal act is about. They focused on how dangerous it would be to let a powerful AI system loose on the world.

However, that is exactly what a pivotal act is about: an act that ensures that misaligned AGI will not be built is a pivotal act. Many such acts might look l... (read more)

Solomonoff induction does not talk about how to make optimal tradeoffs in the programs that serve as the hypothesis.

Imagine you want to describe a part of the world that contains a gun. Solomonoff induction would converge on finding the program that perfectly predicts all the possible observations. So this program would be able to predict what sort of observations I would make after I stuff a banana into the muzzle and fire it. But knowing how the banana was splattered around is not the most useful fact about the gun. It is more useful to know that a gun c... (read more)

I was thinking more free hand drawing.

This does not have inline images, right? This is basically the most important feature that is missing from Emacs org-mode.

I think if you insert an image using markdown it'll be displayed. But I don't think you can draw into it directly.

I have not read this post, just looked at it for 30 seconds. It seems you can apply the babble and prune framework at different levels. What the author there talks about seems to be about the actual idea generation process. In that sense, the content here is already pruned, in the sense that I thought the idea was worth writing about and finished my exploratory writing.

This post did not cause me to come up with this scheme, so what the post talks about is probably at least slightly different.

A bit more context. A few weeks ago I was working on writing a blog post for 7 days, and I still have not published it. In part, this experiment is about making me less averse to publishing things, because the idea here is to publish the (in some sense) worst writing that I am producing. Now that I think about it, there are probably a lot more things that I can do in order to become better at communicating. The best way is probably to just go through the entire process of writing stuff up and posting it. There are some YouTube channels I like where the old videos just suck, but the new ones are pretty good. I should probably try to emulate this by going through the whole creation process many times.

I agree with this. This is a constraint; otherwise, I would have more posts already. You don't want to constrain yourself by needing to think about whether what you are writing is something that you can say in public.

Though I wonder how much value is lost by people not posting certain kinds of content for this or similar reasons. If you want to provide more value, a good heuristic might be to talk about stuff that seems important but that you do not want to share, because that probably indicates that other people will also not talk about it.

I have one female friend (not girlfriend) who I know does not by default read any sexual intent into the actions of males. For example, once she was asked out for a drink by a male stranger, and she did not see the intention behind this. She thought it was just about hanging out. I am not sure how old she was then. When this sort of thing happens a lot to you, you probably become more aware of it.

That said, when I asked her to meet up to play chess, she also did not see my intention behind that, and there she was definitely in her twenties. T... (read more)

One of the most useful moral heuristics that I know is: it is OK to do X if you don't hurt anyone by doing X. And this applies here too.

Yes, though I was actually already believing this when feeling bad about my thoughts. I was not worried about other people thinking about me strangely. I was seeing it as a personal failure, which still made me feel bad. My point is that having unrealistic standards of yourself can also lead to unproductive suffering.

That seems to imply that humans would continue to wirehead, conditional on having started wireheading.

Yes, I think they indeed would.

About the following point:

"Argue that wireheading, unlike many other reward gaming or reward tampering problems, is unlikely in practice because the model would have to learn to value the actual transistors storing the reward, which seems exceedingly unlikely in any natural environment."

Well, that seems to be what happened in the case of rats, and probably many other animals. Stick an electrode into the reward center of a rat's brain, then give it a button to trigger the electrode. Some rats will trigger their reward centers and ignore food.

Humans ... (read more)

I added a link that should have been there from the start, thanks.

Thanks for the clarification, now I get it. I think that is a good point. I do not know of anyone I have met who did terrible things. And I mean out of all the people I have ever met, which is probably in the hundreds. But of course, if they had done something terrible, they would not necessarily have said so. But it feels like none of them did. I just know of one person who got into prison. And by "know", I mean that I said 2 words to him in all my life, and a friend who knew him better told me after I had not seen him for many years. I would expect that most pe... (read more)

I think I don't quite understand what you are saying unless you mean that not all of the observations of bad behavior come from some "region in space".

Then I would say that yes, it does not happen in one place. When you look on YouTube for videos of murder confessions, you get videos from countries where this content is publicly accessible and permitted to be produced. Though these are not the conditions under which all people live. I don't know the laws of every country, but I would guess that some don't allow it. Certainly, hunter-gatherer tribes don't produce such videos.

I was rephrasing what you said about not encountering bad stuff in your life (to this extent) and emphasizing that the news can talk about this stuff, videos can show this stuff - but does your own life involve all this stuff, 'that makes you think humanity is evil?'. I then asked whether this method works ('focus on your own life') when applied to someone else's life. I.e. stuff like talking with someone about their life (or reading an autobiography), and suggested that this might be less distorted - but there's a lot of people, and so there will be people whose lives are extraordinary (not just in good ways), and so something like someone telling their life story on news/interview or whatever could still be very different from your own - and in the way described in this post. And also, maybe the type of people with autobiographies are usually famous and lived unusual lives, so that doesn't necessarily work either.

Yes. There are lots of optimization processes built into us humans, but they feel natural to us, or we simply don't notice them. Stating something that you want to optimize for, especially if it is something that seems to impose itself on the entire structure of the universe, is not natural for humans. And that goal, if implemented, would restrict the individual's freedoms, which humans really don't like.

I think this all makes sense when you are trying to live together in a society, but I am not sure if we should blindly extrapolate these intuitions to determine what we want in the far future.

I'm pretty sure we shouldn't.  Note that "blindly" is a pretty biased way to describe something if you're not trying to skew the discussion.  I'm pretty sure we shouldn't even knowingly and carefully extrapolate these intuitions terribly far into the future.  I'm not sure whether we have a choice, though - it seems believable that a pure laissez-faire attitude toward future values leads to dystopia or extinction.

We were talking about maximizing positive and minimizing negative conscious experiences. I guess with the implicit assumption that we could find some specification of this objective that we would find satisfactory (one that would not have unintended consequences when implemented).

Disgust is optimizing

Someone told me that they felt disgusted by the idea of trying to optimize for specific things, using specific objectives. This is what I wrote to them:

That feeling of disgust is actually some form of optimization itself. Disgust is a feeling that is utilized for many things that we perceive as negative. It was probably easier for evolution to rewire when to feel disgusted than to create a new feeling. The point is that the feeling that arises is supposed to change your behavior, steering you in a certain direction... (read more)

It's understandable to feel disgust at some visible optimization processes, while not feeling disgust at others, especially ones that aren't perceived as intrusive or overbearing.  And that could easily lead to disgust at the INTENT to optimize in simple/legible ways, without as much disgust for complex equilibrium-based optimizations that don't have human design behind them.

From what I have heard (I have not researched any of this very thoroughly), palatability is not the problem directly, but something very related is. It is not the case that someone eats a lot just because the food is so tasty. It is rather that the composition of processed food is often very different from that of unprocessed food, and this affects how our body responds, e.g. when we feel full. Eating some Froot Loops is very different from eating a mango.

This might not be the only, or even the main effect, but I would guess that it is a significant ... (read more)

Ah ok, thank you. Now I get it. I was confused by (i) "Imagine the reporter could do perfect inference" and (ii) "the reporter could simply do the best inference it can in the human Bayes net (given its predicted video)".

(i) I thought of this as meaning that the reporter alone can do it, but what is actually meant is that it can do it with the use of the predictor model.

(ii) Somehow I thought that "given its predicted video" was the important modification here, when in fact the only change is going from the reporter doing perfect inference to it doing the best inference it can.

In section: "New counterexample: better inference in the human Bayes net", what is meant with that the reporter does perfect inference in the human Bayes net? I am also unclear how the modified counterexample is different.

My current understanding: The reporter is doing inference using v1 and the action sequence, and does not use v2 to do inference (v2 is inferred). The reporter has an exact copy of the human Bayes net and now fixes the nodes for v1 and the action sequence. Then it infers the probability for all possible combinations of values each node can ... (read more)

In all of the counterexamples the reporter starts from the v1, actions, and v2 predicted by the predictor. In order to answer questions it needs to infer the latent variables in the human's model. Originally we described a counterexample where it copied the human inference process. The improved counterexample is to instead use lots of computation to do the best inference it can, rather than copying the human's mediocre inference. To make the counterexample fully precise we'd need to specify an inference algorithm and other details. We still can't do perfect inference though---there are some inference problems that just aren't computationally feasible. (That means there's hope for creating data where the new human simulator does badly because of inference mistakes. And maybe if you are careful it will also be the case that the direct translator does better, because it effectively reuses the inference work done in the predictor? To get a proposal along these lines we'd need to describe a way to produce data that involves arbitrarily hard inference problems.)
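As a toy picture of what inference-by-enumeration in a Bayes net means here (the net, node names, and probabilities below are entirely made up; a real human Bayes net would be far larger, and exact enumeration quickly becomes infeasible, which is the computational point made above):

```python
from itertools import product

# A tiny chain Bayes net L1 -> L2 -> O with binary nodes.
# We fix the observed node O and enumerate every combination of values
# the latent nodes can take, as in the brute-force "best inference in
# the human Bayes net" counterexample. All numbers are invented.
p_l1 = {0: 0.7, 1: 0.3}                        # P(L1)
p_l2_given_l1 = {0: {0: 0.6, 1: 0.4},          # P(L2 | L1)
                 1: {0: 0.1, 1: 0.9}}
p_o_given_l2 = {0: {0: 0.9, 1: 0.1},           # P(O | L2)
                1: {0: 0.2, 1: 0.8}}

def posterior(observed_o):
    """Exact posterior P(L1, L2 | O = observed_o) by enumeration."""
    joint = {}
    for l1, l2 in product((0, 1), repeat=2):
        joint[(l1, l2)] = (p_l1[l1]
                           * p_l2_given_l1[l1][l2]
                           * p_o_given_l2[l2][observed_o])
    z = sum(joint.values())  # normalizing constant P(O = observed_o)
    return {assignment: p / z for assignment, p in joint.items()}

post = posterior(1)
```

The loop visits every latent assignment, so the cost grows exponentially in the number of latent nodes, which is why this is only feasible for small nets.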

The "Fu*k it" justification

Sometimes people seem to say "fu*k it" towards some particular thing. I think this is a way to justify one's intuitions. You intuitively feel like you should not care about something, but you can't put your intuition into words. Except you can say "fu*k it" to convey your conclusion without any justification. "Because it's cool" is similar.

There could be, but there does not need to be, I would say. Or maybe I really do not get what you are talking about. It could really be that if the cryptographic lock were not in place, then you could take the box, and there is nothing else preventing you from doing this. I guess I have an implicit model where I look at the world from a Cartesian perspective. So is what you're saying about counterfactuals: that I am using them in a way that is not valid, and that I do not acknowledge this?

I think my main point is that "because" is a tricky word to use normally, and gets downright weird in a universe that includes Omega levels of predictions about actions that feel "free" from the agent. If Omega made the prediction, that means Omega sees the actual future, regardless of causality or intent or agent-visible commitment mechanisms.  

I don't really get that. For example, you could put a cryptographic lock on the box (let's assume there is no way around it without the key) and then throw away the key. It seems that now you actually are not able to access the box, because you do not have the key. And you can also, at the same time, know that this is the case.

Not sure why this should be impossible to say.

Sure, there are any number of commitment mechanisms which would be hard (or NP-hard) to bypass.  If the prediction and box-content selection was performed by Omega based on that cause, then fine.  If instead, it was based on a more complete modeling of the universe, REGARDLESS of whether the visible mechanism "could" be bypassed, then there are other causes than that mechanism.  