Regarding the point you disagree on: As I understood it, (seemingly) linear relationships between the behaviour and the capabilities of a system don't need to stay that way. For example, I think Robert Miles was recently featured in a Computerphile video (YouTube) in which he described how the answers of LLMs to "What happens if you break a mirror" actually got worse with more capability.

As far as I understand it, you can have a system that behaves in a way which seems completely aligned, and which still hits a level of (... let's call it "power"...) at which it starts behaving in a way that is not aligned. (And/or becomes deceptive.) The fact that GPT-4 seems to be more aligned may well be because it hasn't hit this point yet.

So, I don't see how the point you quoted would be an indicator of what future versions will bring, unless they can actually explain what exactly made the difference in behaviour, and how it is robust in more powerful systems (with access to their own code).

If I'm mistaken in my understanding, I'd be happy about corrections (:

Thank you for everything you did. My experience in this world has been a lot better since I discovered your writings, and while I agree with your assessment on the likely future, and I assume you have better things to spend your time doing than reading random comments, I still wanted to say that.

I'm curious to see what exactly the future brings. Whilst the result of the game may be certain, I can't predict the exact moves.

Enjoy it while it lasts, friends.

(Not saying give up, obviously.)

Thank you for pointing out the difference between braking and simply no longer pedaling.

I read it, continued, then I got confused about you saying that your practice didn't leave "an empty silence".

I'm going to try what you described, because I may have gotten to that silence by braking habitually when I was younger, instead of just not putting energy into it.

Might I ask what kind of recovery you were talking about? And how it came to be?

I can very much empathize with having to loop thoughts to keep them, and if there's something that you did to improve your memory, I'd be extremely interested in trying it. Even accepting that I don't know if it will work for me, it's still way better than having no approach.

I'm glad that you got better!

Hi! Questions about volunteering follow:

"They will only be expected to work either before, after or during the event while joining sessions is still often feasible."

Could I get a rephrasing of that? I'm not certain whether the options of before/during/after are (or can be) exclusive, and I am also unclear on what is meant by "joining sessions is still often feasible".

I am happy to help, but I would like to know how much of the time during the event (if any) would be, basically, not the event^^

Best regards

This sounds like a case of "wrong" perspective. (Whoa, what?! Yes, keep reading pls^^)

Like someone believing (to believe) in Nihilism. To Nihilism, I haven't thought of a good and correct counter-statement, except:

"You are simply wrong on all accounts, but by such a small amount that it's hard to point to, because it will sound like »You don't have a right to your own perspective«." (Of course, I also would not agree with disallowing personal opinions, as long as we ARE talking about opinions, not facts.)

Granted, I haven't tried to have that kind of discussion since I really started reading and applying the Sequences. But that may be due to my growing habit of not throwing myself into random and doomed discussions that I don't have a stake in.

But for Bruce, I think I can formulate it:

I am aware of the fact that I still don't allow myself to succeed sometimes. I have recently found that I stand before a barrier that I can summarize as a negative kind of sunk cost fallacy ("If I succeed here, I could have just done that ten years ago"), and I still haven't broken through, yet.*

But... Generalizing this kind of observation to "We all have this Negativity-Agent in our brain" feels incorrect to me. It both obscures the mistake and makes it seem like there is a plan to it.

If I think "Okay, you just detected that thought-pattern that you identified as triggering a bad situation, now instead do X" I feel in control, I can see myself progress, I can do all the things.

If I think "Damn, there's Bruce again!", not only do I externalize the locus of control, I am also "creating" an entity, that can then rack up "wins" against me, making me feel less like I can "beat" them.

It's not an agent. It's a habit that I need to break. That's a very different problem!

I assume that people will say "Bruce is a metaphor". But, provided I have understood correctly, the brain is very prone to considering things as agents (e.g. natural gods, "The System", the whole bit about "life being (not) fair", ...), so feeding it this narrative would seem like a bad idea.

I predict that it will be harder to get rid of the problem, once one gives it agency and/or agenthood. (Some might want an enemy to fight, but even there I take issue with externalizing the locus of control.)

[*In the spirit of "Don't tell me how flawed you are, unless you also tell me how you plan to fix it", I am reading through Fun Theory to defuse it (yes, first read, I am not procrastinating with "need to read more"):

For me it's: I don't want to do X, I want to do something enjoyable, Y. And then, when I do Y, I drift into random things that often aren't all that enjoyable, but just continue the status quo. All the while X is beginning to loom, accruing negative charge and triggering avoidance routines. But if I do X instead, I don't know how to allow myself to take breaks without sliding into the above pattern. So I intend to optimize my fun and expand the area of things that I find fun. That reorientation should help me with dosing it, too. (And yes, I do have ADHD, in case you read it out of the text and were wondering if you should point me there ^^)

Also I recently discovered a belief (in...) that I like to learn. I realized that I really don't like learning. I like understanding, but what I call "learning" has a very negative connotation, so I barely do it. Will discover how to effectively facilitate understanding, too. ]

I hope that you are not still struggling with this, but for anyone else in this situation: I would think that you need to change the way you set your goals. There is loads of advice out there on this topic, but there's a few rules I can recall off the top of my head:

  • "If you formulate a goal, make it concrete, achievable, and make the path clear and if possible decrease the steps required." In your case, every one of the subgoals already had a lot of required actions, so the overarching goal of "publish a book" might be too broadly formulated.
  • "If at all possible don't use external markers for your goals." What apparently usually happens is that either you drop all your good behaviour once you cross the finish line, or your goal becomes/reveals itself to be unreachable and you feel like you can do nothing right (seriously, the extent to which this happens... incredible.), etc.
  • "Focus more on the trajectory than on the goal itself." Once you get there, you will want different things and what you have learned and acquired will just be normal. There is no permanent state of "achieving the goal", there is the path there, and then the path past it.

Very roughly speaking.

All the best.

If I may recommend a book that might make you shift your non-AI related life expectancy: Lifespan by Sinclair.

Quite the fascinating read. My takeaway would be: We might very well not need ASI to reach nigh-indefinite life extension. Accidents of course still happen, so in a non-ASI branch of this world I currently estimate my life expectancy at around 300-5000 years, provided this tech happens in my lifetime (which I think is likely) and given no cryonics/backups/...

(I would like to make it clear that the author barely talks about immortality, more about health and life span, but I suspect that this has to do with decreasing the risk of not being taken seriously. He mentions, e.g., millennia-old organisms as ones to "learn" from.)

Interestingly, the increase in the estimated probability of non-ASI-dependent immortality automatically and drastically impacts the importance of AI safety, since a) you are way more likely to be around (bit selfish, but whatever) when it hits, b) we may actually have the opportunity to take our time (not saying we should drag our feet), so the benefit from taking risks sinks even further, and c) if we get an ASI that is not perfectly aligned, we actually risk our immortality, instead of standing to gain it.

All the best to you, looking forward to meeting you all some time down the line.

(I am certain that the times and locations mentioned by HJPEV will be realized for meet-ups, provided we make it that far.)

It seems to me that the agents you are considering don't have as complex a utility function as people, who seem to at least in part consider their own well-being as part of their utility function. Additionally, people usually don't have a clear idea of what their actual utility function is, so if they want to go all-in on it, they let some values fall by the wayside. AFAIK this limitation is not a requirement for an agent.

If you had your utility function fully specified, I don't think you could be considered both rational and also not a "holy madman". (This borders on my answer to the question of free will, which so far as I can tell, is a question that should not explicitly be answered, so as to not spoil it for anyone who wants to figure it out for themselves.)

Suffice it to say that optimized/optimal function should be a convergent instrumental goal, similar to self-preservation, and a rational agent should thereby have it as a goal. If I am not mistaken, this means that a problem in work-life balance, as you put it, is not something that an actual rational agent would tolerate, provided there are options to choose from that don't include this problem and have a similar return otherwise.

Or did I misinterpret what you wrote? I can be dense sometimes...^^

An idea that might be both unsustainable and potentially dangerous, but also potentially useful, is to have someone teach as a final test. Less an exam and more a project (with oversight?). Of course, these trainees could be authentic or disguised testers.

Problems with this idea (non-exhaustive):

  • Rationality doesn't necessarily make you good at teaching.
  • Teaching the basics badly is likely to have negative effects on the trainee.
  • This could potentially be gamed by reformulated regurgitation.

So... What behaves differently in the presence of Rationality? I like Brennan's idea of time pressure, though he himself demonstrates that you don't need to have finished training for it, and it doesn't really hit the mark.

Or: What requires Rationality? Given Hidden Knowledge (may only require facts that are known, but not to them), one could present new true facts that need to be distinguished from new well-crafted falsehoods (QM anyone?^^). This still only indicates, but it may be part of the process. If they game this by studying everything, and thinking for themselves, and coming to correct conclusions, I think that counts as passing the test. Maybe I am currently not creative enough though. This test could also be performed in isolation, and since time would probably be a relevant component, it would likely not require huge amounts of resources to provide this isolation. Repeat tests could incorporate this (or seemingly incorporate it) too.

If you wanted to invest more effort, you could also specifically not isolate them, but put them in a pressured situation (again, I am being influenced by memories of a certain ceremony. But it is simply really good.) This doesn't have to be societal pressure, but this kind at least makes rash decisions less likely to be costly.

I can't really formulate the idea concretely, but: A test inspired by some of ye olden psychology experiments might provide double yield by both testing the rationality of the person in question and also disabuse them of their trust. Though I can see a lot of ways this idea could go awry.

An issue that most if not all of my tests run into is that they limit what could be taught, since it is still part of the test. This is a problem that should be solved, not just because it irritates me, but because this also means that random chance could more easily change the results.

This is, I think, because so far all tests check for the correct answer. This, in itself, may be the wrong approach, since we try to test techniques which have an impact on the whole person, not "just" their problem solving. I would for example hope that a crisis situation would on average benefit from the people being trained in rationality, not just in regards to "the problem solving itself", but also the emotional response, the ability to see the larger picture, prioritization and initial reaction speed, and so on.

(Maybe having them devise a test is a good test...^^ Productive, too, on the whole.)

(I can think of at least one problem of yours that I still haven't solved, though I therefore can't say whether my not-solving-it is actually showing a lack of rationality [though it's likely], or rather depends on something else. Not sure if I should mention it, but since you (thankfully) protect the answer, I don't think that I need to. This, still, is asking for a correct answer though.)

That's all I can think of for now. Though I am not really satisfied... Do I need to be "at a higher level" to be able to evaluate this, since I don't fully grasp what it is that should be tested yet? Seems like either an option or a stop sign...
