ToW: Response to "6 reasons why alignment-is-hard discourse...". I liked this post. I'd like to write out some of my thoughts on it.
ToW: Exploration of simulacrum levels. It feels to me like the situation should be more of an interconnected web thing than discrete levels, but I haven't thought it through enough yet.
ToW: "Does demanding rights make problem solvers feel insulted?" informal social commentary exploring my thoughts on the relationship between human rights and the systems we employ to ensure standards of human rights can be met.
ToW: Map articulating all talking (Maat) sequence. Current post plan is described in the first post of the sequence.
ToW: Outcome Influencing Systems (OISs) brief explainer.
ToW: "Shut up about consciousness", a rant about how I think "consciousness" distracts from anything I want to talk about, probably including exploration of how OIS terminology is designed to avoid "consciousness" or other religious questions.
ToW: A review and explanation of the more mathematical AI topics (statistical ML theory, Universal AI, Infra-Bayesianism, etc.) targeted at non-math audiences.
No problem. Hope my criticism didn't come across as overly harsh. I'm grateful for your engagement : )
I think the reason humans care about other people's interests, and aren't power-seeking ruthless consequentialists, is because of evolution.
This is kind of a weird way to phrase it, since if I'm modelling the causal chain right:
(evolution)->(approval reward)->(not ruthless)
So yeah, evolution is causally upstream of not being ruthless, but approval reward is the node directly before it. Evolution caused all human behaviour, so if you observe any behaviour any human ever exhibits, you can validly say "this is because of evolution".
Yeah! That was the post that got me to really deeply believe the Orthogonality Thesis. "Naturalistic Awakening" and "A Human's Guide to Words" are my two favourite sequences.
OISs actually have a slightly broader definition than optimization processes, though, for two reasons: (1) OISs are characterized by capabilities, not intelligence, and (2) OIS capabilities can have any degree of generality.
(1) The important distinction is that OISs are defined in terms of capabilities, not in terms of intelligence, where capabilities can be broken down into skills, knowledge, and resource access.
This is valuable for breaking skills down into skill domains, which is relevant for risk assessment, while intelligence is a kind of generalizable skill that seems to be very poorly defined and, in my opinion, usually distracts from valuable analysis.
Also, resource access has the compounding property that knowledge and skill also have, which could potentially lead to dangerously compounding capabilities. Making it explicit that "intelligence" is not the only aspect of an OIS with this compounding property seems important.
(2) is less well considered and less important. The example I have for this is a bottle cap. A bottle cap makes it more likely that water will stay in a bottle, but it isn't an optimizer; it is an optimized object. When viewed through the optimizer lens, the bottle cap doesn't want to keep the water in; rather, it was optimized by something that does want to keep the water in, so it is not an optimizer. That is, the cap has extremely fragile capabilities. It keeps the water in when it is screwed on, but if it is unscrewed it has no ability on its own to put itself back on or otherwise try to continue keeping the water in. This must be very nearly the limit of how little capabilities can generalize.
However, through the OIS lens, the cap does indeed make water staying in the bottle a more likely outcome, and we can say that in some sense it does want to keep the water in.
I find it a little frustrating how general this makes the definition, and I'm sure other people will as well, but I think it is more useful in this case to cast a very wide net and then try to understand the differences between the kinds of things caught by that net, rather than to work with overly narrow definitions that fail to reference the objects I am interested in. It also highlights the potential issues with highly optimized, fragile OISs. If we need them to generalize, it is a problem that they won't; and if we are expecting safety because something "isn't actually an optimizer", that may not matter if it is sufficiently well optimized over a sufficiently dangerous domain of capability.