[epistemic status: that's just my opinion, man. I have highly suggestive evidence, not deductive proof, for a belief I sincerely hold]
"If you see fraud and do not say fraud, you are a fraud." --- Nasim Taleb
I was talking with a colleague the other day about an AI organization that claims:
His response was (paraphrasing): "Wow, that's a really good lie! A lie that can't be disproven."
I found this response refreshing, because he immediately jumped to the most likely conclusion.
Generally, entrepreneurs who
...If the thesis in Unlocking the Emotional Brain (UtEB) is even half-right, it may be one of the most important books that I have read. Written by the psychotherapists Bruce Ecker, Robin Ticic and Laurel Hulley, it claims to offer a neuroscience-grounded, comprehensive model of how effective therapy works. In so doing, it also happens to formulate its theory in terms of belief updating, helping explain how the brain models the world and what kinds of techniques allow us to actually change our minds. Furthermore, if UtEB is correct, it also explains why rationalist techniques such as Internal Double Crux [1 2 3] work.
UtEB’s premise is that much if not most of our behavior is driven by emotional learning. Intense emotions generate unconscious predictive models of how...
I have recently had success with the Coherence Therapy technique they describe
Cool!
I actually started reading the book, rage-quit in the middle, then came back to it years later and found it useful. I rage-quit because the section on EMDR was about a patient with panic attacks, EMDR was done, and afterward the patient still had panic attacks but they claimed the treatment was a success anyway.
Huh, I didn't remember this from my read. Searching for "panic attacks" in my copy now, there's the story of Susan who got EMDR for panic attacks, but my copy seems ...
(cross posted from my personal blog)
Since middle school I've generally thought that I'm pretty good at dealing with my emotions, and a handful of close friends and family have made similar comments. Now I can see that though I was particularly good at never flipping out, I was decidedly not good at "healthy emotional processing". I'll explain later what I think "healthy emotional processing" is; right now I'm using quotes to indicate "the thing that's good to do with emotions". Here it goes...
When I was a kid I adopted a strong, "Fix it or stop complaining about it" mentality. This applied to stress and worry as well. "Either address the problem you're worried about or quit worrying about it!" Also being a kid, I had a limited...
I'm trying to decide to what extent this applies to my lived experience, but finding it difficult to distinguish between maintaining a healthy tranquility and cultivating habitual impassivity. My intuition is that I've had both experiences, but the internal feedback for either is very similar. Both seem to involve putting a functional amount of distance between yourself and your emotional response, and - in my experience - the healthy habit does reinforce itself, just like the negative version. But then, sometimes, I find myself noticing the lack of an emo...
The justification for modelling real-world systems as “agents” - i.e. choosing actions to maximize some utility function - usually rests on various coherence theorems. They say things like “either the system’s behavior maximizes some utility function, or it is throwing away resources” or “either the system’s behavior maximizes some utility function, or it can be exploited” or things like that. Different theorems use slightly different assumptions and prove slightly different things, e.g. deterministic vs probabilistic utility function, unique vs non-unique utility function, whether the agent can ignore a possible action, etc.
One theme in these theorems is how they handle “incomplete preferences”: situations where an agent does not prefer one world-state over another. For instance, imagine an agent which prefers pepperoni over mushroom pizza when it has pepperoni,...
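To make the "exploited" horn of these theorems concrete, here is a minimal money-pump sketch (my own toy illustration, not an example from the post; the items, the fee, and the cyclic preferences are all assumed). An agent whose strict preferences cycle cannot be maximizing any utility function, and a bookie can trade it around the loop for a fee each time:

```python
# Toy money pump: an agent with cyclic strict preferences A > B > C > A
# will pay a small fee for every swap it strictly prefers, so a bookie can
# cycle it back to its starting item while pocketing the fees.
upgrade = {"B": "A", "C": "B", "A": "C"}  # item held -> item the agent will pay to swap into
FEE = 0.01

holding, paid = "A", 0.0
for _ in range(9):                # three full trips around the cycle
    holding = upgrade[holding]    # the agent accepts each strictly-preferred trade...
    paid += FEE                   # ...and pays the fee every time
print(holding, round(paid, 2))    # ends up holding "A" again, 0.09 poorer
```

An agent with merely incomplete preferences, which simply declines trades between options it does not rank, never accepts any of these swaps, so this particular pump cannot touch it; how the theorems handle that case is the gap the excerpt above is pointing at.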
I mean the former: like, whatever "utility" is is not a simple thing to define in terms of things we have a handle on ("pleasurable mental states" does not count as a simple definition), and even if you allow yourself access to standard language about mental states I don't think it's so easy (e.g. there are a bunch of different sorts of mental states that might fall under the broad umbrella of "pleasure").
I do agree that "not for the sake of happiness alone" argues against utilitarianism.
In 2008, Steve Omohundro's foundational paper The Basic AI Drives conjectured that superintelligent goal-directed AIs might be incentivized to gain significant amounts of power in order to better achieve their goals. Omohundro's conjecture bears out in toy models, and the supporting philosophical arguments are intuitive. In 2019, the conjecture was even debated by well-known AI researchers.
Power-seeking behavior has been heuristically understood as an anticipated risk, but not as a formal phenomenon with a well-understood cause. The goal of this post (and the accompanying paper, Optimal Policies Tend to Seek Power) is to change that.
It’s 2008, the ancient wild west of AI alignment. A few people have started thinking about questions like “if we gave an AI a utility function over world states, and it actually maximized that...
Fair enough, but in that example making irreversible decisions is unavoidable. What if we consider a modified tree such that one and only one branch is traversable in both directions, and utility can be anywhere?
I expect we get that the reversible branch is the most popular across the distribution of utility functions (but not necessarily that most utility functions prefer it). That sounds like cause for optimism: 'optimal policies tend to avoid irreversible changes'.
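A quick way to probe that expectation is to brute-force a toy version (my own sketch, not the formalism from Optimal Policies Tend to Seek Power; the three-state tree, the horizon, and the uniform reward sampling are all assumptions): sample many reward functions over a tree with one reversible and one irreversible branch, and count how often the reversible first move is optimal.

```python
# Toy estimate of how often the reversible first move is optimal under
# randomly sampled rewards. From START you can move irreversibly to the
# absorbing state X, or reversibly to Y (from Y you may return to START).
import random

TRANSITIONS = {
    "START": [("go_X", "X"), ("go_Y", "Y")],
    "X": [("stay", "X")],                      # irreversible branch: absorbing
    "Y": [("stay", "Y"), ("back", "START")],   # reversible branch: can return
}
HORIZON = 3  # number of moves; total return = sum of rewards of visited states

def value(state, steps, reward):
    """Best achievable sum of rewards over the next `steps` moves from `state`."""
    if steps == 0:
        return 0.0
    return max(reward[nxt] + value(nxt, steps - 1, reward)
               for _, nxt in TRANSITIONS[state])

def optimal_first_actions(reward):
    scores = {a: reward[nxt] + value(nxt, HORIZON - 1, reward)
              for a, nxt in TRANSITIONS["START"]}
    best = max(scores.values())
    return {a for a, s in scores.items() if abs(s - best) < 1e-12}

random.seed(0)
n, reversible_optimal = 10_000, 0
for _ in range(n):
    reward = {s: random.random() for s in TRANSITIONS}   # uniform reward per state
    if "go_Y" in optimal_first_actions(reward):
        reversible_optimal += 1
print(f"reversible first move is optimal for {reversible_optimal / n:.1%} of sampled rewards")
```

Because the reversible branch can still reach X a move later, it keeps options open, so a count like this is one crude way to check whether optimal policies tend to avoid irreversible changes in the sense the comment suggests.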
The generalized efficient markets (GEM) principle says, roughly, that things which would give you a big windfall of money and/or status will not be easy. If such an opportunity were available, someone else would have already taken it. You will never find a $100 bill on the floor of Grand Central Station at rush hour, because someone would have picked it up already.
One way to circumvent GEM is to be the best in the world at some relevant skill. A superhuman with hawk-like eyesight and the speed of the Flash might very well be able to snag $100 bills off the floor of Grand Central. More realistically, even though financial markets are the ur-example of efficiency, a handful of firms do make impressive amounts of money by...
In other words, how hard is it to reach the Pareto frontier?
Important
Lately I've come to think of human civilization as largely built on the backs of intelligence and virtue signaling. In other words, civilization depends very much on the positive side effects of (not necessarily conscious) intelligence and virtue signaling, as channeled by various institutions. As evolutionary psychologist Geoffrey Miller says, "it’s all signaling all the way down."
A question I'm trying to figure out now is, what determines the relative proportions of intelligence vs virtue signaling? (Miller argued that intelligence signaling can be considered a kind of virtue signaling, but that seems debatable to me, and in any case, for ease of discussion I'll use "virtue signaling" to mean "other kinds of virtue signaling besides intelligence signaling".) It seems that if you get too much of one type
...Hey folks, just to say I gave this talk on Intelligence Signalling :) inspired by Hanson
Credit for the question to Eli Tyre.
I spent roughly two days attempting to learn the answer to this question, plus several more writing it up. What is presented is more accurately described as a partial answer or contribution towards answering the question - this report isn’t actually a confident, solid answer. The question is too large for that.
I need to provide a couple of epistemic caveats:
holy citation needed batman! among other serious issues with this perspective, variance in capability has many input factors; while genetics is involved, memetics and material circumstances also make a significant difference - guns, germs, and steel all can serve to amplify or reduce effective intelligence. society has been on a 12k-year energy availability growth process since the start of farming, and there have been many variations on the way - I don't deny that there may be genetic differences. but I don't think we can even conclude there's been a dece...
1. A group wants to try an activity that really requires a lot of group buy-in. The activity will not work as well if there is doubt that everyone really wants to do it. They establish common knowledge of the need for buy-in. They then have a group conversation in which several people make comments about how great the activity is and how much they want to do it. Everyone wants to do the activity, but is aware that if they did not want to do the activity, it would be awkward to admit. They do the activity. It goes poorly.
2. Alice strongly wants to believe A. She searches for evidence of A. She implements a biased search, ignoring evidence against A. She finds justifications...
It was strange to read it. It was interesting - explaining a point I already knew in a succinct and effective way. And it connects nicely with the extensive discussion on consent and boundaries. Boundaries: Your Yes Means Nothing if You Can’t Say No.
And then, when I was reading the comments and still internalizing the post, I got it - I actually re-invented this concept myself! It could have been so nice not to have to do it... I wrote my own post about it - in Hebrew. Its name translates to Admit that sometimes the answer is "yes", and it starts with a ...
This essay is an adaptation of a talk I gave at the Human-Aligned AI Summer School 2019 about our work on mesa-optimisation. My goal here is to write an informal, accessible and intuitive introduction to the worry that we describe in our full-length report.
I will skip most of the detailed analysis from our report, and encourage the curious reader to follow up this essay with our sequence or report.
The essay has six parts:
Two distinctions draws the foundational distinctions between “optimised” and “optimising”, and between utility and reward.
What objectives? discusses the behavioral and internal approaches to understanding objectives of ML systems.
Why worry? outlines the risk posed by the utility ≠ reward gap.
Mesa-optimisers introduces our language for analysing this worry.
An alignment agenda sketches different alignment problems presented by these ideas,...
A part of me is worried that the terminology invites viewing mesa-optimisers as a description of a very specific failure mode, instead of as a language for the general worry described above.
I have been very confused about the term for a very long time, and have always thought of mesa-optimisers as a very specific failure mode.
This post helped me clear things up.
I.
Kaj Sotala has an outstanding review of Unlocking The Emotional Brain; I read the book, and Kaj’s review is better.
He begins:
UtEB’s premise is that much if not most of our behavior is driven by emotional learning. Intense emotions generate unconscious predictive models of how the world functions and what caused those emotions to occur. The brain then uses those models to guide our future behavior. Emotional issues and seemingly irrational behaviors are generated from implicit world-models (schemas) which have been formed in response to various external challenges. Each schema contains memories relating to times when the challenge has been encountered and mental structures describing both the problem and a solution to it.
So in one of the book’s example cases, a man named Richard sought help for...
Whatever the case, I am often exhausted when dealing with such issues.
Good post though.
For instance, certain high-pitched sounds are terrible for my ears. They make me lose focus, and make my eyes close.
It's so bad that I literally feel as though there is pain in my mind.
Schema? Or auditory thing?
It never happens with other sounds, just with this pitch.
Same problem with focus.
I can clearly be aware of how the little tribes in my mind come together to defeat the invaders, but once the battle is over they part ways, and go back, or if they have to ...
This post originally appeared here; I've updated it slightly and posted it here as a follow-up to this post.
David Friedman has a fascinating book on alternative legal systems. One chapter focuses on prison law - not the nominal rules, but the rules enforced by prisoners themselves.
The unofficial legal system of California prisoners is particularly interesting because it underwent a phase change sometime after the 1960’s.
Prior to the 1960’s, prisoners ran on a decentralized code of conduct - various unwritten rules roughly amounting to “mind your own business and don’t cheat anyone”. Prisoners who kept to the code were afforded some respect by their fellow inmates. Prisoners who violated the code were ostracized, making them fair game for the more predatory inmates. There was no formal enforcement; the...
I'm very curious about what the actual rules of the various gangs look like. If they exist in written form in an environment where it's easy to confiscate documents, I would expect them to be publicly accessible.
This post begins the Immoral Mazes sequence. See introduction for an overview of the plan. Before we get to the mazes, we need some background first.
Meditations on Moloch
Consider Scott Alexander’s Meditations on Moloch. I will summarize here.
Therein lie fourteen scenarios where participants can be caught in bad equilibria.
Damn, I'm not sure; maybe it was my Twitter cover picture: https://twitter.com/matiroy9/ (this is content I modified from someone else)
Previously: Online discussion is better than pre-publication peer review, Disincentives for participating on LW/AF
Recently I've noticed a cognitive dissonance in myself, where I can see that my best ideas have come from participating on various mailing lists and forums (such as cypherpunks, extropians, SL4, everything-list, LessWrong and AI Alignment Forum), and I've received a certain amount of recognition as a result, but when someone asks me what I actually do as an "independent researcher", I'm embarrassed to say that I mostly comment on other people's posts, participate in online discussions, and occasionally a new idea pops into my head and I write it down as a blog/forum post of my own. I guess that's because I imagine it doesn't fit most people's image of what a researcher's
...FP often leads to long, winding discussions that may end with two researchers agreeing, but the resulting transcript is not great for future readers.
Highly agree. Often, valuable content in discussions - either what you wrote or what the other person wrote - just gets lost.
Rereading your old discussions and then distilling the useful stuff into posts is a full-time job.
This is a major reason I find very lengthy comments not often worth writing. I wonder if there is a way to solve this problem. (Maybe more advanced search functionality on your comments?)
An actual debate about instrumental convergence, in a public space! Major respect to all involved, especially Yoshua Bengio for great facilitation.
For posterity (i.e. having a good historical archive) and further discussion, I've reproduced the conversation here. I'm happy to make edits at the request of anyone in the discussion who is quoted below. I've improved formatting for clarity and fixed some typos. For people who are not researchers in this area who wish to comment, see the public version of this post here. For people who do work on the relevant areas, please sign up in the top right. It will take a day or so to confirm membership.
Yann LeCun: "don't fear the Terminator", a short opinion piece by Tony Zador and me that was just...
It would be great to have a summary or distillation of this conversation.
I've been thinking more about partial agency. I want to expand on some issues brought up in the comments to my previous post, and on other complications which I've been thinking about. But for now, a more informal parable. (Mainly because this is easier to write than my more technical thoughts.)
This relates to oracle AI and to inner optimizers, but my focus is a little different.
Suppose you are designing a new invention, a predict-o-matic. It is a wondrous machine which will predict everything for us: weather, politics, the newest advances in quantum physics, you name it. The machine isn't infallible, but it will integrate data across a wide range of domains, automatically keeping itself up-to-date with all areas of science and current events. You fully expect that...
If someone had a strategy that took two years, they would have to over-bid in the first year, taking a loss. But then they have to under-bid on the second year if they're going to make a profit, and--"
"And they get undercut, because someone figures them out."
I think one could imagine scenarios where the first trader can use their influence in the first year to make sure they are not undercut in the second year, analogous to the prediction market example. For instance, the trader could install some kind of encryption in the software that this company use...
I've been discoursing more privately about the corruption of discourse lately, for reasons that I hope are obvious at least in the abstract, but there's one thing I did think was shareable. The context is another friend's then-forthcoming blog post about the politicization of category boundaries.
In a world where half the employees with bad jobs get good titles, aren't their titles predictively important in that they predict how likely they are to be hired by outside companies? Their likelihood of getting hired is, under these assumptions, going to be the same as that of people with good jobs and good titles, and higher than that of people with bad jobs and bad titles. So, in terms of things...
When I was halfway through this and read about the 4 stages, they immediately seemed to me to correspond to four types of news reporting:
Previously in Immoral Mazes sequence: Moloch Hasn’t Won
In Meditations on Moloch, Scott points out that perfect competition destroys all value beyond the axis of competition.
Which, for any compactly defined axis of competition we know about, destroys all value.
This is mathematically true.
Yet value remains.
Thus competition is imperfect.
(Unjustified-at-least-for-now note: Competition is good and necessary. Robust imperfect competition, including evolution, creates all value. Citation/proof not provided due to scope concerns but can stop to expand upon this more if it seems important to do so. Hopefully this will become clear naturally over course of sequence and/or proof seems obvious once considered.)
Perfect competition hasn’t destroyed all value. Fully perfect competition is a useful toy model. But it isn’t an actual thing.
Some systems and markets get close. For now they remain the...
Agreed. Zvi's proposition also simply doesn't align with first-world people's motivations, as far as I can tell. In short, first-worlders have a lot of other interesting ways that they can use their time.
[Epistemic Status: Scroll to the bottom for my follow-up thoughts on this from months/years later.]
Early this year, Conor White-Sullivan introduced me to the Zettelkasten method of note-taking. I would say that this significantly increased my research productivity. I’ve been saying “at least 2x”. Naturally, this sort of thing is difficult to quantify. The truth is, I think it may be more like 3x, especially along the dimension of “producing ideas” and also “early-stage development of ideas”. (What I mean by this will become clearer as I describe how I think about research productivity more generally.) However, it is also very possible that the method produces serious biases in the types of ideas produced/developed, which should be considered. (This would be difficult to quantify at the best of...
I'm curious exactly what you meant by "first order".
Just that the trade-off is only present if you think of "individual rationality" as "let's forget that I'm part of a community for a moment". All things considered, there's just rationality, and you should do what's optimal.
First-order: Everyone thinks that maximizing insight production means doing IDA* over the idea tree. Second-order: Everyone notices that everyone will think that, so it's no longer optimal for maximizing insights produced overall. Everyone wants to coordinate with everyone else...
Since the CAIS technical report is a gargantuan 210 page document, I figured I'd write a post to summarize it. I have focused on the earlier chapters, because I found those to be more important for understanding the core model. Later chapters speculate about more concrete details of how AI might develop, as well as the implications of the CAIS model on strategy. ETA: This comment provides updates based on more discussion with Eric.
The core idea is to look at the pathway by which we will develop general intelligence, rather than assuming that at some point we will get a superintelligent AGI agent. To predict how AI will progress in the future, we can look at how AI progresses currently -- through research and development (R&D)...
I agree that in the long term, agent AI could probably improve faster than CAIS, but I think CAIS could still be a solution.
Regardless of how it is aligned, aligned AI will tend to improve slower than unaligned AI, because it is trying to achieve a more complicated goal, human oversight takes time, etc. To prevent unaligned AI, aligned AI will need a head start, so it can stop any unaligned AI while it's still much weaker. I don't think CAIS is fundamentally different in that respect.
If the reasoning in the post that CAIS will develop before AGI holds up, then CAIS would actually have an advantage, because it would be easier to get a head start.