[epistemic status: that's just my opinion, man. I have highly suggestive evidence, not deductive proof, for a belief I sincerely hold]
"If you see fraud and do not say fraud, you are a fraud." --- Nassim Taleb
I was talking with a colleague the other day about an AI organization that claims:
His response was (paraphrasing): "Wow, that's a really good lie! A lie that can't be disproven."
I found this response refreshing, because he immediately jumped to the most likely conclusion.
Generally, entrepreneurs who...
If the thesis in Unlocking the Emotional Brain (UtEB) is even half-right, it may be one of the most important books that I have read. Written by the psychotherapists Bruce Ecker, Robin Ticic and Laurel Hulley, it claims to offer a neuroscience-grounded, comprehensive model of how effective therapy works. In so doing, it also happens to formulate its theory in terms of belief updating, helping explain how the brain models the world and what kinds of techniques allow us to actually change our minds. Furthermore, if UtEB is correct, it also explains why rationalist techniques such as Internal Double Crux [1 2 3] work.
UtEB’s premise is that much if not most of our behavior is driven by emotional learning. Intense emotions generate unconscious predictive models of how...
(cross posted from my personal blog)
Since middle school I've generally thought that I'm pretty good at dealing with my emotions, and a handful of close friends and family have made similar comments. Now I can see that though I was particularly good at never flipping out, I was decidedly not good at "healthy emotional processing". I'll explain later what I think "healthy emotional processing" is; right now I'm using quotes to indicate "the thing that's good to do with emotions". Here it goes...
When I was a kid I adopted a strong, "Fix it or stop complaining about it" mentality. This applied to stress and worry as well. "Either address the problem you're worried about or quit worrying about it!" Also being a kid, I had a limited...
The justification for modelling real-world systems as “agents” - i.e. choosing actions to maximize some utility function - usually rests on various coherence theorems. They say things like “either the system’s behavior maximizes some utility function, or it is throwing away resources” or “either the system’s behavior maximizes some utility function, or it can be exploited” or things like that. Different theorems use slightly different assumptions and prove slightly different things, e.g. deterministic vs probabilistic utility function, unique vs non-unique utility function, whether the agent can ignore a possible action, etc.
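To make the "can be exploited" flavor of these theorems concrete, here is a minimal sketch of a money pump. Everything in it is an illustrative assumption rather than something taken from a particular theorem: the three items, the one-unit fee, the `run_money_pump` helper, and both preference relations are hypothetical. One agent has cyclic preferences (A over B, B over C, C over A); the other ranks the same items with a utility function.

```python
def run_money_pump(prefers, start, offers, fee=1, wealth=10):
    """Offer a sequence of trades; the agent pays `fee` to swap whenever
    it strictly prefers the offered item to the one it currently holds."""
    holding = start
    for offered in offers:
        if wealth >= fee and prefers(offered, holding):
            holding = offered
            wealth -= fee
    return holding, wealth


# Hypothetical cyclic preferences: A > B, B > C, C > A.
CYCLIC = {("A", "B"), ("B", "C"), ("C", "A")}


def cyclic_prefers(x, y):
    return (x, y) in CYCLIC


# Hypothetical utility function over the same items: A > B > C.
UTILITY = {"A": 3, "B": 2, "C": 1}


def utility_prefers(x, y):
    return UTILITY[x] > UTILITY[y]


# Three laps around the cycle, starting from the item each agent holds.
offers = ["C", "B", "A"] * 3

cyclic_result = run_money_pump(cyclic_prefers, "A", offers)
utility_result = run_money_pump(utility_prefers, "A", offers)
print("cyclic agent:", cyclic_result)
print("utility agent:", utility_result)
```

In this toy setup the cyclic agent accepts every trade and ends up holding the item it started with, minus nine fees, while the utility-maximizing agent declines every offer and keeps its wealth. This only illustrates one kind of incoherence (intransitivity); the actual theorems differ in their assumptions, as noted above.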
One theme in these theorems is how they handle “incomplete preferences”: situations where an agent does not prefer one world-state over another. For instance, imagine an agent which prefers pepperoni over mushroom pizza when it has pepperoni,...
In 2008, Steve Omohundro's foundational paper The Basic AI Drives conjectured that superintelligent goal-directed AIs might be incentivized to gain significant amounts of power in order to better achieve their goals. Omohundro's conjecture bears out in toy models, and the supporting philosophical arguments are intuitive. In 2019, the conjecture was even debated by well-known AI researchers.
Power-seeking behavior has been heuristically understood as an anticipated risk, but not as a formal phenomenon with a well-understood cause. The goal of this post (and the accompanying paper, Optimal Policies Tend to Seek Power) is to change that.
It’s 2008, the ancient wild west of AI alignment. A few people have started thinking about questions like “if we gave an AI a utility function over world states, and it actually maximized that...
The generalized efficient markets (GEM) principle says, roughly, that things which would give you a big windfall of money and/or status, will not be easy. If such an opportunity were available, someone else would have already taken it. You will never find a $100 bill on the floor of Grand Central Station at rush hour, because someone would have picked it up already.
One way to circumvent GEM is to be the best in the world at some relevant skill. A superhuman with hawk-like eyesight and the speed of the Flash might very well be able to snag $100 bills off the floor of Grand Central. More realistically, even though financial markets are the ur-example of efficiency, a handful of firms do make impressive amounts of money by...
Lately I've come to think of human civilization as largely built on the backs of intelligence and virtue signaling. In other words, civilization depends very much on the positive side effects of (not necessarily conscious) intelligence and virtue signaling, as channeled by various institutions. As evolutionary psychologist Geoffrey Miller says, "it’s all signaling all the way down."
A question I'm trying to figure out now is, what determines the relative proportions of intelligence vs virtue signaling? (Miller argued that intelligence signaling can be considered a kind of virtue signaling, but that seems debatable to me, and in any case, for ease of discussion I'll use "virtue signaling" to mean "other kinds of virtue signaling besides intelligence signaling".) It seems that if you get too much of one type...
Credit for the question to Eli Tyre.
I spent roughly two days attempting to learn the answer to this question, plus several more writing it up. What is presented is more accurately described as a partial answer or contribution towards answering the question - this report isn’t actually a confident, solid answer. The question is too large for that.
I need to provide a couple of epistemic caveats:
1. A group wants to try an activity that really requires a lot of group buy in. The activity will not work as well if there is doubt that everyone really wants to do it. They establish common knowledge of the need for buy in. They then have a group conversation in which several people make comments about how great the activity is and how much they want to do it. Everyone wants to do the activity, but is aware that if they did not want to do the activity, it would be awkward to admit. They do the activity. It goes poorly.
2. Alice strongly wants to believe A. She searches for evidence of A. She implements a biased search, ignoring evidence against A. She finds justifications...
This essay is an adaptation of a talk I gave at the Human-Aligned AI Summer School 2019 about our work on mesa-optimisation. My goal here is to write an informal, accessible and intuitive introduction to the worry that we describe in our full-length report.
The essay has six parts:
Two distinctions draws the foundational distinctions between “optimised” and “optimising”, and between utility and reward.
What objectives? discusses the behavioral and internal approaches to understanding objectives of ML systems.
Why worry? outlines the risk posed by the utility ≠ reward gap.
Mesa-optimisers introduces our language for analysing this worry.
An alignment agenda sketches different alignment problems presented by these ideas,...
UtEB’s premise is that much if not most of our behavior is driven by emotional learning. Intense emotions generate unconscious predictive models of how the world functions and what caused those emotions to occur. The brain then uses those models to guide our future behavior. Emotional issues and seemingly irrational behaviors are generated from implicit world-models (schemas) which have been formed in response to various external challenges. Each schema contains memories relating to times when the challenge has been encountered and mental structures describing both the problem and a solution to it.
So in one of the book’s example cases, a man named Richard sought help for...
David Friedman has a fascinating book on alternative legal systems. One chapter focuses on prison law - not the nominal rules, but the rules enforced by prisoners themselves.
The unofficial legal system of California prisoners is particularly interesting because it underwent a phase change sometime after the 1960’s.
Prior to the 1960’s, prisoners ran on a decentralized code of conduct - various unwritten rules roughly amounting to “mind your own business and don’t cheat anyone”. Prisoners who kept to the code were afforded some respect by their fellow inmates. Prisoners who violated the code were ostracized, making them fair game for the more predatory inmates. There was no formal enforcement; the...
This post begins the Immoral Mazes sequence. See introduction for an overview of the plan. Before we get to the mazes, we need some background first.
Meditations on Moloch
Consider Scott Alexander’s Meditations on Moloch. I will summarize here.
Therein lie fourteen scenarios where participants can be caught in bad equilibria.
Recently I've noticed a cognitive dissonance in myself, where I can see that my best ideas have come from participating on various mailing lists and forums (such as cypherpunks, extropians, SL4, everything-list, LessWrong and AI Alignment Forum), and I've received a certain amount of recognition as a result, but when someone asks me what I actually do as an "independent researcher", I'm embarrassed to say that I mostly comment on other people's posts, participate in online discussions, and occasionally a new idea pops into my head and I write it down as a blog/forum post of my own. I guess that's because I imagine it doesn't fit most people's image of what a researcher's...
An actual debate about instrumental convergence, in a public space! Major respect to all involved, especially Yoshua Bengio for great facilitation.
For posterity (i.e. having a good historical archive) and further discussion, I've reproduced the conversation here. I'm happy to make edits at the request of anyone in the discussion who is quoted below. I've improved formatting for clarity and fixed some typos. For people who are not researchers in this area who wish to comment, see the public version of this post here. For people who do work on the relevant areas, please sign up in the top right. It will take a day or so to confirm membership.
Yann LeCun: "don't fear the Terminator", a short opinion piece by Tony Zador and me that was just...
I've been thinking more about partial agency. I want to expand on some issues brought up in the comments to my previous post, and on other complications which I've been thinking about. But for now, a more informal parable. (Mainly because this is easier to write than my more technical thoughts.)
This relates to oracle AI and to inner optimizers, but my focus is a little different.
Suppose you are designing a new invention, a predict-o-matic. It is a wondrous machine which will predict everything for us: weather, politics, the newest advances in quantum physics, you name it. The machine isn't infallible, but it will integrate data across a wide range of domains, automatically keeping itself up-to-date with all areas of science and current events. You fully expect that...
I've been discoursing more privately about the corruption of discourse lately, for reasons that I hope are obvious at least in the abstract, but there's one thing I did think was shareable. The context is another friend's then-forthcoming blog post about the politicization of category boundaries.
In a world where half the employees with bad jobs get good titles, aren't their titles predictively important in that they predict how likely they are to be hired by outside companies? Their likelihood of getting hired is, under these assumptions, going to be the same as that of people with good jobs and good titles, and higher than that of people with bad jobs and bad titles. So, in terms of things...
Previously in Immoral Mazes sequence: Moloch Hasn’t Won
In Meditations on Moloch, Scott points out that perfect competition destroys all value beyond the axis of competition.
Which, for any compactly defined axis of competition we know about, destroys all value.
This is mathematically true.
Yet value remains.
Thus competition is imperfect.
(Unjustified-at-least-for-now note: Competition is good and necessary. Robust imperfect competition, including evolution, creates all value. Citation/proof not provided due to scope concerns but can stop to expand upon this more if it seems important to do so. Hopefully this will become clear naturally over course of sequence and/or proof seems obvious once considered.)
Perfect competition hasn’t destroyed all value. Fully perfect competition is a useful toy model. But it isn’t an actual thing.
Some systems and markets get close. For now they remain the...
[Epistemic Status: Scroll to the bottom for my follow-up thoughts on this from months/years later.]
Early this year, Conor White-Sullivan introduced me to the Zettelkasten method of note-taking. I would say that this significantly increased my research productivity. I’ve been saying “at least 2x”. Naturally, this sort of thing is difficult to quantify. The truth is, I think it may be more like 3x, especially along the dimension of “producing ideas” and also “early-stage development of ideas”. (What I mean by this will become clearer as I describe how I think about research productivity more generally.) However, it is also very possible that the method produces serious biases in the types of ideas produced/developed, which should be considered. (This would be difficult to quantify at the best of...
Since the CAIS technical report is a gargantuan 210-page document, I figured I'd write a post to summarize it. I have focused on the earlier chapters, because I found those to be more important for understanding the core model. Later chapters speculate about more concrete details of how AI might develop, as well as the implications of the CAIS model for strategy. ETA: This comment provides updates based on more discussion with Eric.
The core idea is to look at the pathway by which we will develop general intelligence, rather than assuming that at some point we will get a superintelligent AGI agent. To predict how AI will progress in the future, we can look at how AI progresses currently -- through research and development (R&D)...