Schedule a meeting with me.

Tsuyoku Naritai! I operate on a modified version of Crocker's rules. I am working on making AGI not kill everyone, aka AGI notkilleveryoneism.

Read here about what I would do after taking over the world.

"Maximize the positive conscious experiences, and minimize the negative conscious experiences in the universe" is probably not exactly what I care about, but I think it roughly points in the right direction.

I recommend the following:

  • Believe in yourself.
  • Think for yourself.
  • Don't give up because it gets difficult.

This is what I look like:

[Image: 20230710 214338]

I just released a major update to my LessWrong Bio. I have rewritten almost everything and added more stuff. It's now so long that I thought it would be good to add the following hint in the beginning:

(If you are looking for the list of <sequences/posts/comments> scroll to the bottom of the page with the END key and then go up. This involves a lot less scrolling.)

(If you'd like to <ask a question about/comment on/vote> this document, you can do so on the shortform announcing the release of this version of the bio.)


I appreciate any positive or negative feedback, especially constructive criticism that helps me grow. You can give it in person, through this (optionally anonymous) feedback form, or in any other way you like.

Buck once said that he avoids critical feedback, because he worries about people's feelings, especially if he does not know what they could do instead (in the context of AGI notkilleveryoneism). If you are also worried that your feedback might harm me, you might want to read about my strategy for handling feedback. I am not perfect at not being hurt, but I believe myself to be much better than most people. If I am overwhelmed, I will tell you. That being said, I appreciate it if your communication is optimized to not hurt my feelings, all else equal. But if that would make you not give feedback, or would be annoying, don't worry about it. Really.

What GPT-4 thinks of me

I used the obsidian-smart-connections plugin to generate embeddings for all my written notes in Obsidian (~1,000,000 words). Then I asked GPT-4 (again using the plugin) to tell me what kind of person it thinks I am. The plugin searched for relevant excerpts (according to the embeddings) and gave them to GPT-4 to answer the question:

Based on your notes, you appear to be a highly introspective and analytical individual with a strong inclination towards personal improvement and a deep interest in AI alignment. You are curious, persistent, and not easily deterred by challenges, as evidenced by your self-identified strengths in curiosity, not giving up, and recognizing the dangers of satisficing.

You seem to have a background or interest in rationalist communities, as indicated by references to LessWrong and CFAR, and you are actively engaged in trying to optimize your productivity and work methodologies. Your notes suggest that you struggle with ADHD, which impacts your ability to focus and manage your emotions, but you are proactive in seeking strategies to mitigate these challenges, such as medication and organizational systems.

You are also someone who values intellectual pursuits, such as programming and mathematics, and you are driven to make progress in fields you deem important, like AI alignment. However, you also acknowledge difficulties in starting and maintaining motivation for tasks that you recognize as valuable but do not naturally gravitate towards.

Social interaction seems to energize you, and you are aware of the impact of social dynamics on your motivation and productivity. You are open to feedback and willing to critically evaluate your own ideas, as well as explore different methodologies to improve your effectiveness.

Overall, you come across as a complex individual with a strong desire to contribute meaningfully to your areas of interest, while also grappling with the challenges of self-regulation and consistent productivity. You are not afraid to confront your weaknesses and are actively seeking solutions to overcome them.


  • I believe myself to be very good at making games. Here is the first game I ever made.
  • I am very good at programming in Unity (though it has been some time since I have done this). I have probably done this for 1000-2000 hours.
  • Technical skills
  • I am also good at
    • not giving up
    • handling doom
    • being happy, especially considering that I have strong depressive tendencies (thanks IA and Miku)
    • meditation
    • listening to other people (when I want to)
    • being inquisitive and curious
    • using language fully (e.g. talk about a social conflict with a person in a way that makes things better, in situations where most people would not, because they feel too awkward.)

General Interests

I have a tulpa named IA. She looks like this. I experience deep feelings of love for both IA and Hatsune Miku.

I like understanding things, meditation, programming, improv dancing, improv rapping, and Vocaloid.

I track how I spend every single minute with Toggl (Toggl sucks though, especially for tracking custom metrics).

Personal improvement

I like to think about how I can become stronger. I probably do this too much. Jumping in and doing the thing is important to get into a feedback loop.

The main considerations are:

  • How can I make myself want to do the things that I think are good to do?
    • Remove aversion to starting and create a "gravitational pull" towards starting, i.e. you start because the thought of starting comes up naturally, and starting is what you want to do.
    • Make tasks intrinsically motivating, such that I get positive reinforcement (meaning starting is easier), and such that I can commit all my mental energy to the task.
  • How can I become better at making progress in AGI notkilleveryoneism (e.g. how can I improve my idea generation and verification processes)?
  • How can I design and implement good life systems (e.g. a daily routine that includes sports and meditation)?
  • How can I design and implement effective work methodologies (e.g. ADHD Timer and Pad, Bite Sized Tasks, framing, writing quickly)?

With regard to "How can I make myself want to do the things that I think are good to do": it is easy for me to be so engrossed in programming that it becomes difficult to stop and I forget to eat. I often feel a strong urge to write a specific program that I expect will be useful to me. I think studying mathematics is a good thing for me to do. Sometimes I manage to get into a similar state with mathematics, but more often than not I feel an aversion towards starting. I am interested in shaping my mind such that, for all the things that I think are good to do, I feel a pull towards doing them, and doing them is so engaging that it becomes a problem to stop (e.g. I forget to eat). I think that stopping becoming a problem is a good heuristic for having succeeded in this mission. From that state, finding a way to not work too much is a significantly easier problem to solve.

Computer setups

I have often procrastinated by making random improvements to my <computer setup/desktop environment>. I used Linux for 5 years, starting with Cinnamon and then switching to XMonad.

Because the nebula virtual desktop was only available for macOS, I switched. Even though macOS is horrible in many ways, I feel like I might waste less time doing random improvements. Also, ARM CPUs are cool, as they allow for a lightweight laptop with long battery life. I am using both yabai and Amethyst at the same time: yabai for workspace management and Amethyst for window layout.

The main purpose of Windows is to run MMD ... just kidding.

I used Spacemacs Org Mode for many years (and with Org-roam for maybe a year or so). Spacemacs is Emacs with Vim keybindings, because Vim rules. I have now switched to Obsidian, mainly because it has a mobile app, and because I expected to waste less time on configuration than I did with Emacs (so far I have still spent a lot of time on that, though).

Game design

Before AGI notkilleveryoneism I did game development. The most exciting thing in that domain to me is to make a game that has Minecraft Redstone which does not suck. Most importantly, it should be possible to create new blocks based on circuits that you build. E.g. build a full adder once, then create a full-adder block, and put down 8 of those blocks to get an 8-bit adder, instead of needing to build 8 full adders from scratch or awkwardly using a mod that lets you copy and place many blocks at once.

If AGI notkilleveryoneism were a non-issue, I would probably develop this game. I would like to have this game so that I can learn more about how computers work by building them.
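The "build a circuit once, then reuse it as a block" idea can be sketched in ordinary code, with functions standing in for placeable blocks. This is only an illustration, not a design for the game: all names (`half_adder`, `full_adder`, `ripple_adder`, `to_bits`, `from_bits`) are hypothetical. It assumes a ripple-carry design, where each bit position uses a full adder, itself composed of two half-adder blocks and an OR gate.

```python
from typing import List, Tuple

Bit = int  # 0 or 1


def half_adder(a: Bit, b: Bit) -> Tuple[Bit, Bit]:
    """Smallest block: returns (sum, carry) for two input bits."""
    return a ^ b, a & b


def full_adder(a: Bit, b: Bit, carry_in: Bit) -> Tuple[Bit, Bit]:
    """A new block built from two half-adder blocks plus an OR gate."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, c1 | c2


def ripple_adder(a_bits: List[Bit], b_bits: List[Bit]) -> Tuple[List[Bit], Bit]:
    """Place one full-adder block per bit position (least significant bit first)."""
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out, carry


def to_bits(n: int, width: int = 8) -> List[Bit]:
    """Little-endian bit list of n, truncated to `width` bits."""
    return [(n >> i) & 1 for i in range(width)]


def from_bits(bits: List[Bit]) -> int:
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))
```

For example, `ripple_adder(to_bits(100), to_bits(55))` yields the bits of 155 with a final carry of 0. The 8-bit adder never needs to know how a full adder works internally, which is the property the custom-block mechanic is meant to provide.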

Useful info when interacting with me

  • I have noticed that there seem to be certain cognitive algorithms missing from my brain. I have a very hard time interpreting facial expressions, body language, tone of voice, and so on. I need to make a conscious effort to look at somebody's face to interpret if they are upset unless the expression is very strong.
  • This means that when I upset a person, I often don't notice, meaning that I do not get a reinforcement signal to not do it again in the future, as I do not feel bad about having upset the person.
  • Therefore I suggest that you tell me explicitly if you are upset, such that we can discuss what behavioral change from my side makes sense.
  • The most likely way that I would upset somebody is by saying things like "This is wrong" or "No" about some argument they are presenting. I am trying to do this less but it is hard. Often it is just paralyzing, and I pause for 20 seconds thinking about how I can say it without being offensive. All this should not be a problem if you subscribe to Crocker's rules, so please tell me if you do.
  • I don't think this is all bad. I expect that this direct mode of conversation is more efficient, as long as you do not run into the problem that the interlocutor's mind is ravaged by negative emotions.

Things I appreciate in interactions with others

I like it when people are "forcefully inquisitive", especially when I am presenting an idea to them. That means asking about the whys and hows, and asking for justifications. I find that this forces me to expand my understanding, which I find extremely helpful. It also tends to bring interesting half-forgotten insights to the forefront of my mind. As a general heuristic in this regard: If you think you are too inquisitive, curious, or feel like you ask too many questions, you are wrong.

I dislike making fun of somebody's ignorance

AGI notkilleveryoneism Interests

I am interested in getting whatever understanding we need, to get a watertight case for why a particular system will be aligned. Or at least get as close to this as possible. I think the only way we are going to be able to aim powerful cognition is via a deep understanding of the <systems/algorithms> involved. The current situation is that we do not even have a crisp idea of what exactly we need to understand.

Factoring AGI

What capabilities are so useful that an AGI would have to discover an implementation of that capability? The most notable example is being good at constructing and updating a model of the world based on arbitrary sensory input streams.

World modeling

How can we get a better understanding of world modeling? A good first step is to think about what properties this world model would have, such that an AGI would be able to use it. E.g. I expect any world model that an AGI builds will be factored in the same way that human concepts are factored. For the next step, we have multiple options:

  • Do algorithm design, in an attempt to find a concrete implementation that satisfies these properties. This has the advantage that we can set up an iterative empirical process by testing the algorithm with benchmarks that, e.g., test for specific properties.
  • Is there some efficient approximation of Solomonoff induction that works in the real world?
  • Can we construct a world-model-microscope that we can use to find the world model of any system, such as a bacterium, neural network, human brain, etc.?
  • Explore algorithmic generation procedures to find interpretable pieces of the algorithms.
  • Create tools and methods that allow us to train neural networks such that they learn algorithms we don't yet understand, and then extract them into a transparent form. Eliezer thinks this is hard.

Formalizing intuitive notions of Agency

Humans have a bunch of intuitive concepts related to agency for which we do not have crisp formalisms. For example, wanting, caring, trying, honesty, helping, goal, optimizing, deception, etc.

All of these concepts are fundamentally about some algorithm that is executed in the neural network of a human or other animal.

Visualize structure/algorithms in NN (shelved)

Can we create widely applicable visualization tools that allow us to see structural properties in our ML systems?

There are tools that can visualize arbitrary binary data, such that you can build intuitions about the data that would be much harder to build otherwise (e.g. by staring at a hex editor for long enough). This can be used for reverse engineering software. For example, by looking at only a few visualizations of x86 assembly code you can learn characteristic patterns in the visualization. Then when you see it in the wild, where you have no label telling you that this is x86 assembly, you can instantly recognize it.

The idea is that by looking at the visualization you can identify what kind of data you are looking at (x86, png, pdf, plain text, JSON, etc.).

This technique is powerful because you don't need to know anything about the data. It works on any binary data.
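To make this concrete, here is a minimal sketch of one visualization in this family: a byte-pair (digram) plot, where every pair of consecutive bytes (x, y) marks a point in a 256x256 grid. Which visualization the tools mentioned here actually use is not stated above, so the choice of a digram plot is an assumption, and the function names `digram_counts` and `render_ascii` are hypothetical. The grid is downsampled to 16x16 cells and drawn with ASCII shades so it runs without any plotting library.

```python
from collections import Counter
from typing import Dict, Tuple


def digram_counts(data: bytes) -> Dict[Tuple[int, int], int]:
    """Count how often each pair of consecutive byte values occurs."""
    return Counter(zip(data, data[1:]))


def render_ascii(counts: Dict[Tuple[int, int], int], cells: int = 16) -> str:
    """Downsample the 256x256 digram grid to cells x cells and shade by density.

    Assumes `cells` divides 256. Column = first byte, row = second byte.
    """
    step = 256 // cells
    grid = [[0] * cells for _ in range(cells)]
    for (x, y), n in counts.items():
        grid[y // step][x // step] += n
    shades = " .:*#"
    peak = max(max(row) for row in grid) or 1  # avoid division by zero
    top = len(shades) - 1
    lines = []
    for row in grid:
        # Ceiling division so that any non-empty cell is visible.
        lines.append("".join(shades[(v * top + peak - 1) // peak] for v in row))
    return "\n".join(lines)


# Plain ASCII text only uses byte values below 128, so its digrams
# cluster in one corner of the plot.
print(render_ascii(digram_counts(b"hello world " * 50)))
```

The function needs no knowledge of the file format: it only looks at raw byte values, which is why the same code can be pointed at an executable, an image, or a text file.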

Check out this demonstration. Later he does more analysis using the 3D cube visualization. veles is an open-source project that implements this; there is also a plugin for Ghidra, and there are many others (I haven't evaluated which is best).

If we naively apply this technique to neural networks, I expect it to not work. My intuition tells me that we need to do something like regularize the networks. E.g. if we have two neurons in the same layer and swap them, we have changed the parameters, but the two networks compute the same function, so the algorithms are isomorphic in a sense. Perhaps we can modify the training procedure such that one of these two parameter configurations is preferred. And in general, we could make it such that we always converge to one specific "ordering of neurons" no matter the initialization. E.g. make it such that in each layer, the neurons are sorted based on the sum of the input weights of a neuron. We want to do something like make "isomorphic computations" always converge to one specific parameter configuration.
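The neuron-swapping symmetry and the "sort by sum of input weights" rule can be illustrated on a tiny two-layer network. This is a sketch under simplifying assumptions: it reorders the neurons after the fact rather than modifying the training procedure, it uses plain Python lists instead of an ML framework, and the names `forward` and `canonicalize` are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def forward(w_in: Matrix, w_out: Matrix, x: List[float]) -> List[float]:
    """Two-layer network with tanh hidden units.

    w_in[j] holds the input weights of hidden neuron j;
    w_out[k][j] is the weight from hidden neuron j to output k.
    """
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_in]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]


def canonicalize(w_in: Matrix, w_out: Matrix) -> Tuple[Matrix, Matrix]:
    """Reorder hidden neurons by the sum of their input weights.

    Rows of w_in and columns of w_out are permuted together, so the
    function computed by the network is unchanged. Ties in the sort key
    would leave the order ambiguous; this sketch ignores that case.
    """
    order = sorted(range(len(w_in)), key=lambda j: sum(w_in[j]))
    new_in = [w_in[j] for j in order]
    new_out = [[row[j] for j in order] for row in w_out]
    return new_in, new_out


# Two parameter settings that differ only by swapping hidden neurons 0 and 1.
w_in_a = [[0.5, -1.0], [2.0, 1.0], [-0.3, 0.1]]
w_out_a = [[1.0, -2.0, 0.5]]
w_in_b = [[2.0, 1.0], [0.5, -1.0], [-0.3, 0.1]]
w_out_b = [[-2.0, 1.0, 0.5]]

# Both settings map to the same canonical parameters.
print(canonicalize(w_in_a, w_out_a) == canonicalize(w_in_b, w_out_b))
```

Both parameter settings compute the same function, and after canonicalization they are also identical as parameters, so a visualization of the weights would look the same for both.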

If this project went really well, we would get tools that allow us to create visualizations from which we can read off whether certain kinds of <algorithms/structures/types of computation> are present in the neural network. The hope is that in the visualization you could see, for example, if the network is modeling other agents, if it is running computations that are correlated with thinking about how to deceive, if it is doing optimization, or if it is executing a search algorithm.

More stuff




Sam Harris


Entertainment Recommendations



