Excited about this!

Points of feedback:

  1. I don't like to have to scroll my screen horizontally to read the comment. (I notice there's a lot of perfectly good unused white space on the left side; comments would probably fit horizontally if you pushed everything to the left!)
  2. Sometimes when you mouse over the side-comment icon, it tries to scroll the page to make the comment readable. This is very surprising and makes me lose my place.
  3. Hovering over the icon makes the comment appear briefly. If I then want to scroll in order to read the comment, there seems to be no way to 'stay hovered' -- I have to click and toggle it, to make the comment stick around so I can actually read it. (This plus being forced to scroll the screen makes the hover feature kind of useless.)

Overall, feeling optimistic though, and will probably use this.

I think your argument is wrong, but interestingly so. I think DL is probably doing symbolic reasoning of a sort, and it sounds like you think it is not (because it makes errors?).

Do you think humans do symbolic reasoning? If so, why do humans make errors? Why do you think a DL system won't be able to eventually correct its errors in the same way humans do?

My hypothesis is that DL systems are doing a sort of fuzzy, finite-depth symbolic reasoning: they understand the productions at a surface level and can apply them (subject to contextual clues, in an error-prone way) step by step, but once you ask for sufficient depth they get confused and fail. Unlike humans, feedforward neural nets can't yet think for longer and churn away step by step; but if someone figures out a way to build a looping option into the architecture, I won't be surprised to see DL systems go a lot further on symbolic reasoning than they currently do.
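To make the fixed-depth vs. looping point concrete, here's a toy sketch (my own analogy, not a claim about any real architecture): a surface-level rewrite rule applied a bounded number of times, versus applied in a loop until it no longer fires.

```python
def step(s):
    # One surface-level "production": reduce a single matched "ab" pair.
    return s.replace("ab", "", 1)

def fixed_depth(s, depth=3):
    # Like a feedforward net: only `depth` chances to apply the rule.
    for _ in range(depth):
        s = step(s)
    return s

def looping(s):
    # With a loop, the system can churn until no rule applies.
    while "ab" in s:
        s = step(s)
    return s

deep_input = "a" * 6 + "b" * 6  # needs 6 reductions to fully reduce
print(fixed_depth(deep_input))  # "aaabbb": depth exhausted partway
print(looping(deep_input))      # "": fully reduced
```

The fixed-depth version fails exactly when the problem demands more steps than the architecture has layers, which is the failure mode described above.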

What is Pop Warner in this context? I have googled it and it sounds like he was one of the founders of modern American football, but I don't understand what it is in contrast to. Is there some other (presumably safer) ruleset?

(Inside-of-door-posted hotel room prices are called "rack rates" and nobody actually pays those. This is definitely a miscommunication.)

I am guilty of being a zero-to-one, rather than one-to-many, type of person. It seems far easier and more interesting to me to create new forms of progress of any sort than to convince people to adopt better ideas.

I guess the project of convincing people seems hard? Like, if I come up with something awesome that's new, it seems easier to get it into people's hands, rather than taking an existing thing which people have already rejected and telling them "hey this is actually cool, let's look again".

All that said, I do find this idea-space intriguing partly thanks to this post - it makes me want to think of ways of doing more one-to-many type stuff. I've been recently drawn into living in DC and I think the DC effective altruism folks are much more on the one-to-many side of the world.

Upvoted for raising to conscious attention something I had never previously considered might be worth paying attention to.

(Slightly grumpy that I'm now going to have a new form of cognitive overhead probably 10+ times per day... these are the risks we take reading LW :P)

Look, I don’t know you at all. So please do ignore me if what I’m saying doesn’t seem right, or just if you want to, or whatever.

I’m a bit worried that you’re seeking approval, not advice? If this is so, know that I for one approve of your chosen path. You are allowed to spend a few years focusing on things that you are passionate about, which (if it works out) may result in you being happy and productive and possibly making the world better.

If you are in fact seeking advice, you should explain what your goal is. If your goal is to make the maximum impact possible — it’s worth at least hundreds of hours trying to see if you can learn more & motivate yourself along a path which seems like it combines high impact with personal resonance. I wouldn’t discount philosophy along this angle, but (for example) it sounds like you may not know that much about the potential of policy careers; there are plenty that do not require particularly strong mathematical skills (… or even any particularly difficult skills beyond some basic extraversion, resistance to boredom and willingness to spend literal decades grinding away within bureaucracies).

If your goal is to be happy, I think you will be happy doing philosophy, and I think you have the potential to make a huge impact that way. Certainly there are a decent number of full-time philosophers within effective altruism who I have huge respect for (MacAskill, Ord, Bostrom, Greaves, and Trammell jump to mind). Plus, you can save a few hundred hours, which seems pretty important if you might already know the outcome of your experimentation!

Thanks! This is very helpful, and yes, I did mean to refer to grokking! Will update the post.

Nice post!

One of my fears is that the True List is super long, because most things-being-tracked are products of expertise in a particular field and there are just so many different fields.


  • In product/ux design, tracking the way things will seem to a naive user who has never seen the product before.
  • In navigation, tracking which way north is.
  • I have a ton of "tracking" habits when writing code:
    • types of variables (and simulated-in-my-head values for such)
    • refactors that want to be done but don't quite have enough impetus for yet
    • loose ends, such as allocated-but-not-freed resources, or false symmetry (something that looks like it should be symmetric but isn't in some critical way), or other potentially-misleading things that need to be explained
    • [there are probably a lot more of these that I am not going to write down now]
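As a concrete illustration of the "loose ends" item, here's a hypothetical sketch (the `Resource` class is a made-up stand-in for a file handle, socket, or lock) of an allocated-but-not-freed resource, and the scope-tied fix that removes the tracking burden:

```python
from contextlib import closing

class Resource:
    """Toy stand-in for a file handle, socket, lock, etc."""
    def __init__(self):
        self.open = True
    def close(self):
        self.open = False

def leaky():
    r = Resource()
    # If anything raises before the caller closes r, it leaks --
    # exactly the kind of loose end one has to keep tracking in-head.
    return r  # caller must remember to close it

def tidy():
    # Tying release to scope discharges the loose end: nothing to track.
    with closing(Resource()) as r:
        return r.open  # True here; r is closed automatically on exit
```

Part of the value of noticing these loose ends is that many of them, once noticed, can be restructured away entirely.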

Here's my attempt. I haven't read any of the other comments or the tag yet. I probably spent ~60-90m total on this, spread across a few days.

On kill switches

  • Low impact somehow, but I don't know how.
  • Go slow enough that people can see what you're doing.
  • Have a bunch of "safewords" and other kill-switches installed at different places, some hopefully hard-to-reach by the AI. Test them regularly, and consider it a deadly flaw if one stops working.

On the AI accurately knowing what it is doing, and pointing at things in the real world

  • Watch all the metrics (!)
  • Predict all the metrics you watch, and ask humans about any anomalous ones.
  • Group inputs and outputs separately, and treat inputs as sacred. Perhaps have an epistemic module which is incentivized by producing true predictions about the world. Make the epistemic module the one that gets all the juice; the planning module should not have any way to influence it.
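The "predict the metrics you watch, flag anomalies" bullet could look something like this minimal sketch (the z-score rule and the threshold of 3 sigmas are my assumptions, not part of the post):

```python
from statistics import mean, stdev

def flag_anomaly(history, latest, z_threshold=3.0):
    """Flag `latest` for human review if it deviates from recent
    `history` by more than `z_threshold` standard deviations."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu  # flat history: any change is anomalous
    return abs(latest - mu) / sigma > z_threshold

calm = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.3]
print(flag_anomaly(calm, 10.1))  # within normal variation: not flagged
print(flag_anomaly(calm, 25.0))  # anomalous: ask the human
```

A real system would of course need a much better predictive model than "recent mean", but the shape -- predict, compare, escalate surprises -- is the same.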

On responding predictably

  • Require inputs of some kind to get more outputs (something about control theory?). Like a power steering wheel or an e-bike, the system helps the human by amplifying the motions they make, but doesn't actually Do Anything on its own.
  • Have metrics look smooth on an input/output response curve. No sharp edges. Let the humans be the ones to turn the knobs.
  • Try things. Try things on a small scale before going big. (Define "big"? Maybe look at the metrics!)
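A sketch of the power-steering idea, using a smooth saturating response (the specific `tanh` shape, gain, and limit are my assumptions, chosen just to illustrate "no sharp edges, no action without input"):

```python
import math

def assistive_output(human_input, gain=5.0, limit=10.0):
    """Smoothly amplify the human's input. Zero input always yields
    zero output, and large inputs saturate instead of spiking."""
    return limit * math.tanh(gain * human_input / limit)

print(assistive_output(0.0))    # 0.0 -- no input, no action
print(assistive_output(0.5))    # ~2.45 -- modest amplification
print(assistive_output(100.0))  # saturates near the limit; no sharp edge
```

The key property is that the derivative of the response curve is bounded everywhere, so small human inputs can never produce wildly disproportionate system outputs.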

On epistemology and ontology

  • Ask for feedback both about things you tried and about experiments you want to run.
  • Make the epistemic module have a stable ontology. If it wants to change the ontology, it needs to create a new epistemic module and leave the old one running. The new module has to be able not only to predict physical things about the world, but also what the old module would say about things pointed to and tagged as important in the old module's ontology.
  • Have a concept of goal metrics and side effects. Try to predict what the human will say about which metrics are goals and which are side effects. (Hmm... this is incoherent, because you can come up with many irrelevant metrics. What even is a relevant metric? Maybe the system should group metrics using some kind of natural unsupervised clustering, then try to predict the prioritization of metric groups? And to explain the groups, it gives samples of the actual thing each one measures. Still difficult to make sense of this.)
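The clustering thought might look like this toy sketch (a pure-Python 1-D k-means; the idea of giving each metric a scalar "signature" -- here, its correlation with a stated goal -- is my own invention for illustration):

```python
import random

def kmeans_1d(values, k=2, iters=20, seed=0):
    """Tiny 1-D k-means. Returns the groups sorted by mean, descending."""
    random.seed(seed)
    centers = random.sample(values, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            # Assign each value to its nearest center.
            groups[min(range(k), key=lambda i: abs(v - centers[i]))].append(v)
        # Recompute centers; keep the old center if a group emptied.
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return sorted(groups, key=lambda g: -(sum(g) / max(len(g), 1)))

# Metric "signatures": e.g., each metric's correlation with the goal.
signatures = [0.9, 0.95, 0.88, 0.1, 0.05, 0.12]
goal_like, side_effect_like = kmeans_1d(signatures)
```

The human then prioritizes the handful of groups rather than the thousands of raw metrics, which at least bounds the "many irrelevant metrics" problem.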

On planning

  • Bound or penalize plans in complexity and time.
    • "Complexity" is hard to define here, and I don't think it's Kolmogorov complexity.
  • Harshly penalize plans which lean on predicting what other agents will do. Ban plans which depend on predicting what your operator will do. (The AI can still predict what the operator will think of the plan, but that prediction should not feed into the plan itself.)
  • Separate planning from other parts of AI cognition, such that you are never confused about whether you are making a plan or doing other things, such as predicting things about the world, or executing a plan.
  • Don't execute plans at all.
  • Train the AI to identify the purpose(s) of each step in the plan. Give negative feedback when steps lead to outcomes that are not identified ahead of time.
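The first two planning bullets could be sketched as a plan-scoring rule (the scoring function, penalty weights, and plan fields are my assumptions, not the post's):

```python
def plan_score(utility, n_steps, duration, depends_on_operator,
               step_cost=1.0, time_cost=0.5):
    """Score a candidate plan: penalize step count (a crude complexity
    proxy) and wall-clock time; ban operator-dependent plans outright."""
    if depends_on_operator:
        return float("-inf")  # hard ban, not just a penalty
    return utility - step_cost * n_steps - time_cost * duration

plans = [
    ("simple",          plan_score(10.0, 3, 2.0, False)),
    ("clever-but-long", plan_score(14.0, 9, 8.0, False)),
    ("manipulative",    plan_score(100.0, 2, 1.0, True)),
]
best = max(plans, key=lambda p: p[1])
print(best[0])  # "simple": higher raw utility elsewhere loses to penalties
```

Note that the operator-dependence check is a hard filter rather than a weighted term, matching the "ban" (as opposed to "penalize") language above.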

On transparency

  • A part of the AI should be trained to explicate what the rest of the AI is "doing"/focused on at all times. Find a way to make this into a feedback cycle.
  • The AI should be arranged so it is mostly doing only one thing at a time, and doesn't switch its focus too many times per minute.