Gears-level models are expensive, often prohibitively so. Black-box approaches are usually much cheaper and faster, but they rarely generalize: they're subject to Goodhart's law, need to be rebuilt when conditions change, don't identify unknown unknowns, and are hard to build on top of. Gears-level models, on the other hand, offer permanent, generalizable knowledge that can be applied to many future problems, even if conditions shift.

Stephen Fowler
Very Spicy Take

Epistemic note: Many highly respected community members with substantially greater decision-making experience (and LessWrong karma) presumably disagree strongly with my conclusion.

Premise 1: It is becoming increasingly clear that OpenAI is not appropriately prioritizing safety over advancing capabilities research.

Premise 2: This was the default outcome. Instances in history in which private companies (or any individual humans) have intentionally turned down huge profits and power are the exception, not the rule.

Premise 3: Without repercussions for terrible decisions, decision makers have no skin in the game.

Conclusion: Anyone and everyone involved with Open Phil recommending a grant of $30 million to OpenAI in 2017 shouldn't be allowed anywhere near AI Safety decision making in the future. To go one step further, potentially any and every major decision they have played a part in needs to be reevaluated by objective third parties. This must include Holden Karnofsky and Paul Christiano, both of whom were closely involved.

To quote Open Phil: "OpenAI researchers Dario Amodei and Paul Christiano are both technical advisors to Open Philanthropy and live in the same house as Holden. In addition, Holden is engaged to Dario’s sister Daniela."
Tamsin Leake
I'm surprised at people who seem to be updating only now about OpenAI being very irresponsible, rather than updating when they created a giant public competitive market for chatbots (which contains plenty of labs that don't care about alignment at all), thereby reducing how long everyone has to solve alignment. I still parse that move as devastating the commons in order to make a quick buck.
If your endgame strategy involved relying on OpenAI, DeepMind, or Anthropic to implement your alignment solution that solves science / super-cooperation / nanotechnology, consider figuring out another endgame plan.
Akash
My current perspective is that criticism of AGI labs is an under-incentivized public good. I suspect there's a disproportionate amount of value that people could have by evaluating lab plans, publicly criticizing labs when they break commitments or make poor arguments, talking to journalists/policymakers about their concerns, etc. Some quick thoughts:

* Soft power– I think people underestimate how strong the "soft power" of labs is, particularly in the Bay Area.
* Jobs– A large fraction of people getting involved in AI safety are interested in the potential of working for a lab one day. There are some obvious reasons for this– lots of potential impact from being at the organizations literally building AGI, big salaries, lots of prestige, etc.
  * People (IMO correctly) perceive that if they acquire a reputation for being critical of labs, their plans, or their leadership, they will essentially sacrifice the ability to work at the labs.
  * So you get an equilibrium where the only people making (strong) criticisms of labs are those who have essentially chosen to forgo their potential of working there.
* Money– The labs and Open Phil (which has been perceived, IMO correctly, as investing primarily into metastrategies that are aligned with lab interests) have an incredibly large share of the $$$ in the space. When funding became more limited, this became even more true, and I noticed a very tangible shift in the culture & discourse around labs + Open Phil.
* Status games/reputation– Groups who were more inclined to criticize labs and advocate for public or policymaker outreach were branded as “unilateralist”, “not serious”, and “untrustworthy” in core EA circles. In many cases, there were genuine doubts about these groups, but my impression is that these doubts got amplified/weaponized in cases where the groups were more openly critical of the labs.
* Subjectivity of "good judgment"– There is a strong culture of people getting jobs/status for having “good judgment”. This is sensible insofar as we want people with good judgment (who wouldn’t?), but it often ends up being so subjective that it leads to people being quite afraid to voice opinions that go against mainstream views and metastrategies (particularly those endorsed by labs + Open Phil).
* Anecdote– Personally, I found my ability to evaluate and critique labs + mainstream metastrategies substantially improved when I spent more time around folks in London and DC (who were less closely tied to the labs). In fairness, I suspect that if I had lived in London or DC *first* and then moved to the Bay Area, it’s plausible I would’ve had a similar feeling but in the “reverse direction”.

With all this in mind, I find myself more deeply appreciating folks who have publicly and openly critiqued labs, even in situations where the cultural and economic incentives to do so were quite weak (relative to staying silent or saying generic positive things about labs). Examples: Habryka, Rob Bensinger, CAIS, MIRI, Conjecture, and FLI. More recently, @Zach Stein-Perlman, and of course Jan Leike and Daniel K.


Recent Discussion

The goal of instrumental rationality mostly speaks for itself. Some commenters have wondered, on the other hand, why rationalists care about truth. Which invites a few different answers, depending on who you ask; and these different answers have differing characters, which can shape the search for truth in different ways.

You might hold the view that pursuing truth is inherently noble, important, and worthwhile. In which case your priorities will be determined by your ideals about which truths are most important, or about when truthseeking is most virtuous.

This motivation tends to have a moral character to it. If you think it your duty to look behind the curtain, you are a lot more likely to believe that someone else should look behind the curtain too, or castigate them...

It is

Science was noticing that certain modes of thinking uncovered beliefs that let us manipulate the world


Science is our tool to manipulate the world. Science is the instrument of truth. If you cannot manipulate the meaning of reality through the definition of words such as 'rational', what it is and isn't. Where science is the substitute for truth, and rationality is the substitute for truth also. As self-evident, the definition of rationality is substituted for science, and this forms our basic definition of rationality moving forward. Care not to define scien... (read more)

Epistemic Status: Reference.

Expanded From: Against Facebook, as the post originally intended.

Some things are fundamentally Out to Get You.

They seek resources at your expense. Fees are hidden. Extra options are foisted upon you. Things are made intentionally worse, forcing you to pay to make it less worse. Least bad deals require careful search. Experiences are not as advertised. What you want is buried underneath stuff you don’t want. Everything is data to sell you something, rather than an opportunity to help you.

When you deal with Out to Get You, you know it in your gut. Your brain cannot relax. You look out for tricks and traps. Everything is a scheme.

They want you not to notice. To blind you from the truth. You can feel it when you go...

Urs

Great post, important concepts. Sharing it everywhere.

There was one piece, though, that I couldn't intuitively grasp, so maybe one of you could help me understand: what is it about video games that is out to get you? ("So do video games.") Elsewhere Zvi speaks about F2P games; is it about those and their addiction-inducing Skinner boxes? If it's about video games in general, I would love to learn how they are more out to get us than, say, novels.

robo
Our current big stupid: not preparing for 40% agreement

Epistemic status: lukewarm take from the gut (not brain) that feels rightish.

The "Big Stupid" of the AI doomers of 2013-2023 was that the AI nerds' solution to the problem "How do we stop people from building dangerous AIs?" was "research how to build AIs". Methods normal people would consider to stop people from building dangerous AIs, like asking governments to make it illegal to build dangerous AIs, were considered gauche. When the public turned out to be somewhat receptive to the idea of regulating AIs, doomers were unprepared.

Take: The "Big Stupid" of right now is still the same thing. (We've not corrected enough.) Between now and transformative AGI we are likely to encounter a moment where 40% of people realize AIs really could take over (say, if every month another 1% of the population loses their job). If 40% of the world were as scared of AI loss-of-control as you, what could the world do? I think a lot! Do we have a plan for then?

Almost every LessWrong post on AIs is about analyzing AIs. Almost none are about how, given widespread public support, people/governments could stop bad AIs from being built.

[Example: if 40% of people were as worried about AI as I was, the US would treat GPU manufacture like uranium enrichment. And fortunately GPU manufacture is hundreds of times harder than uranium enrichment! We should be nerding out researching integrated circuit supply chains, choke points, foundry logistics in jurisdictions the US can't unilaterally sanction, that sort of thing.]

TLDR: stopping deadly AIs from being built needs less research on AIs and more research on how to stop AIs from being built.

*My research included 😬
TsviBT

Well, I asked this https://www.lesswrong.com/posts/X9Z9vdG7kEFTBkA6h/what-could-a-policy-banning-agi-look-like but roughly no one was interested; I had to learn about "born secret" https://en.wikipedia.org/wiki/Born_secret from Eric Weinstein in a YouTube video.

FYI, while restricting compute manufacture is, I would guess, net helpful, it's far from a solution. People can make plenty of conceptual progress given current levels of compute https://www.lesswrong.com/posts/sTDfraZab47KiRMmT/views-on-when-agi-comes-and-on-strategy-to-reduce . It's not a way out, ei... (read more)

gilch
Not quite. It was to research how to build friendly AIs. We haven't succeeded yet. What research progress we have made points to the problem being harder than initially thought, and capabilities turned out to be easier than most of us expected as well.

Considered by whom? Rationalists? The public? The public would not have been so supportive before ChatGPT, because most everybody didn't expect general AI so soon, if they thought about the topic at all. It wasn't an option at the time. Talking about this at all was weird, or at least niche, certainly not something one could reasonably expect politicians to care about. That has changed, but only recently. I don't particularly disagree with your prescription in the short term, just your history. That said, politics isn't exactly our strong suit.

But even if we get a pause, this only buys us some time. In the long(er) term, I think either the Singularity or some kind of existential catastrophe is inevitable. Those are the attractor states. Our current economic growth isn't sustainable without technological progress to go with it. Without that, we're looking at civilizational collapse. But with that, we're looking at ever-widening blast radii for accidents or misuse of more and more powerful technology. Either we get smarter about managing our collective problems, or they will eventually kill us.

Friendly AI looked like the way to do that. If we solve that one problem, even without world cooperation, it solves all the others for us. It's probably not the only way, but it's not clear the alternatives are any easier. What would you suggest? I can think of three alternatives. First, the most mundane (but perhaps most difficult), would be an adequate world government. This would be an institution that could easily solve climate change, ban nuclear weapons (and wars in general), etc. Even modern stable democracies are mostly not competent enough. Autocracies are an obstacle, and some of them have nukes. We are not on tra
Haiku
Strong agree and strong upvote. There are some efforts in the governance space and in the space of public awareness, but there should and can be much, much more.

My read of these survey results is: AI Alignment researchers are optimistic people by nature. Despite this, most of them don't think we're on track to solve alignment in time, and they are split on whether we will even make significant progress. Most of them also support pausing AI development to give alignment research time to catch up.

As for what to actually do about it: there are a lot of options, but I want to highlight PauseAI. (Disclosure: I volunteer with them. My involvement brings me no monetary benefit, and no net social benefit.) Their Discord server is highly active and engaged and is peopled with alignment researchers, community- and mass-movement organizers, experienced protesters, artists, developers, and a swath of regular people from around the world. They play the inside and outside game, both doing public outreach and also lobbying policymakers.

On that note, I also want to put a spotlight on the simple action of sending emails to policymakers. Doing so and following through is extremely OP (i.e. has much more utility than you might expect), and can result in face-to-face meetings to discuss the nature of AI x-risk and what they can personally do about it. Genuinely, my model of a world in 2040 that contains humans is almost always one in which a lot more people sent emails to politicians.

The forum has been very much focused on AI safety for some time now, so I thought I'd post something different for a change. Privilege.

Here I define Privilege as an advantage over others that is invisible to the beholder. This may not be the only definition, or the central definition, or how you see it, but it's the definition I use for the purposes of this post. I also do not mean it in the culture-war sense, as a way to undercut others, as in "check your privilege". My point is that we all have some privileges [we are not aware of], and also that nearly every one of them has a flip side.

In some way this is the inverse of The Lens That Does Not See Its Flaws: The...

Razied
The word "privilege" has been so tainted by its association with guilt that it's almost an infohazard to think you've got privilege at this point; it makes you lower your head in shame at having more than others, and brings about a self-flagellation sort of attitude. It elicits an instinct to lower yourself rather than bring others up. The proper reaction to all these things you've listed is gratitude for your circumstances and compassion towards those who don't have them. And certainly everyone should be very careful towards any instinct they have at publicly "acknowledging their privilege"... it's probably your status-raising instincts having found a good opportunity to boast about your intelligence, appearance, and good looks while appearing like you're being modest.

I grew up knowing "privilege" to mean a special right that was granted to you based on your job/role (like free food for those who work at some restaurants) or perhaps granted by authorities due to good behavior (and would be taken away for misusing it).  Note also that the word itself, "privi"-"lege", means "private law": a law that applies to you in particular.

Rights and laws are social things, defined by how others treat you.  To say that your physical health is a privilege therefore seems like either a category error, or a claim that other pe... (read more)

Summary:

We think a lot about aligning AGI with human values. I think it's more likely that we'll try to make the first AGIs do something else. This might intuitively be described as making instruction-following (IF) or do-what-I-mean-and-check (DWIMAC) the central goal of the AGI we design. Adopting this goal target seems to improve the odds of success of any technical alignment approach. This goal target avoids the hard problem of specifying human values in an adequately precise and stable way, and substantially helps with goal misspecification and deception by allowing one to treat the AGI as a collaborator in keeping it aligned as it becomes smarter and takes on more complex tasks.

This is similar but distinct from the goal targets of prosaic alignment efforts....

Seth Herd
I read your linked shortform thread. I agreed with most of your arguments against some common AGI takeover arguments. I agree that they won't coordinate against us and won't have "collective grudges" against us.

But I don't think the arguments for continued stability are very thorough, either. I think we just don't know how it will play out. And I think there's a reason to be concerned that takeover will be rational for AGIs, where it's not for humans.

The central difference in logic is the capacity for self-improvement. In your post, you addressed self-improvement by linking a Christiano piece on slow takeoff. But he noted at the start that he wasn't arguing against self-improvement, only that the pace of self-improvement would be more modest. But the potential implications for a balance of power in the world remain.

Humans are all locked to a similar level of cognitive and physical capabilities. That has implications for game theory where all of the competitors are humans. Cooperation often makes more sense for humans. But the same isn't necessarily true of AGI. Their cognitive and physical capacities can potentially be expanded on. So it's (very loosely) like the difference between game theory in chess, and chess where one of the moves is to add new capabilities to your pieces. We can't learn much about the new game from theory of the old, particularly if we don't even know all of the capabilities that a player might add to their pieces.

More concretely: it may be quite rational for a human controlling an AGI to tell it to try to self-improve and develop new capacities, strategies, and technologies to potentially take over the world. With a first-mover advantage, such a takeover might be entirely possible. Its capacities might remain ahead of the rest of the world's AI/AGIs if they hadn't started to aggressively self-improve and develop the capacities to win conflicts. This would be particularly true if the aggressor AGI was willing to cause global cata
Seth Herd
In the near term, AI and search are blurred, but that's a separate topic. This post was about AGI as distinct from AI. There's no sharp line between them, but there are important distinctions, and I'm afraid we're confused as a group because of that blurring. More above, and it's worth its own post and some sort of new clarifying terminology. The term AGI has been watered down to include LLMs that are fairly general, rather than the original and important meaning of AI that can think about anything, implying the ability to learn, and therefore almost necessarily to have explicit goals and agency. This was about that type of "real" AGI, which is still hypothetical even though increasingly plausible in the near term.

That's true, they are different. But search still provides the closest historical analogue (maybe employees/suppliers provide another). Historical analogues have the benefit of being empirical and grounded, so I prefer them over (or with) pure reasoning or judgement.

Seth Herd
Yes, we do see such "values" now, but that's a separate issue IMO. There's an interesting thing happening in which we're mixing discussions of AI safety and AGI x-risk. There's no sharp line, but I think they are two importantly different things. This post was intended to be about AGI, as distinct from AI. Most of the economic and other concerns about the "alignment" of AI are not relevant to the alignment of AGI. This thesis could be right or wrong, but let's keep it distinct from theories about AI in the present and near future.

My thesis here (and a common thesis) is that we should be most concerned about AGI that is an entity with agency and goals, like humans have. AI as a tool is a separate thing. It's very real and we should be concerned with it, but not let it blur into categorically distinct, goal-directed, self-aware AGI. Whether or not we actually get such AGI is an open question that should be debated, not assumed. I think the answer is very clearly that we will, and soon; as soon as tool AI is smart enough, someone will make it agentic, because agents can do useful work, and they're interesting. So I think we'll get AGI with real goals, distinct from the pseudo-goals implicit in current LLMs' behavior.

The post addresses such "real" AGI that is self-aware and agentic but has the sole goal of doing what people want, which is pretty much a third thing that's somewhat counterintuitive.

A couple of weeks ago three European economists published this paper studying the female income penalty after childbirth. The surprising headline result: there is no penalty.

Setting and Methodology

The paper uses Danish data that tracks IVF treatments as well as a bunch of demographic factors and economic outcomes over 25 years. Lundborg et al. identify the causal effect of childbirth on female income using the success or failure of the first attempt at IVF as an instrument for fertility.

What does that mean? We can’t just compare women with children to those without them because having children is a choice that’s correlated with all of the outcomes we care about. So sorting out two groups of women based on observed fertility will also sort them based on income and...
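To make the instrumental-variables logic above concrete, here is a minimal sketch of two-stage least squares on simulated data. It is only illustrative: the variable names, effect sizes, and data-generating process are invented for the example and are not taken from the paper; the only thing it shares with the paper's design is the idea of using an as-good-as-random instrument (first-IVF-attempt success) to isolate the causal effect of fertility on income.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Unobserved confounder (e.g. "career orientation") that affects both fertility and income.
ability = rng.normal(size=n)

# Instrument: success of the first IVF attempt, assumed as good as random
# among women who attempt IVF (the identifying assumption).
ivf_success = rng.binomial(1, 0.3, size=n)

# Endogenous treatment: fertility depends on the instrument and the confounder.
has_child = (0.8 * ivf_success + 0.3 * ability + rng.normal(size=n) > 0.5).astype(float)

# Outcome: income, with a true causal effect of childbirth set to -0.2 for this toy example.
income = -0.2 * has_child + 0.5 * ability + rng.normal(size=n)

# Naive OLS of income on fertility is biased by the omitted confounder.
X = np.column_stack([np.ones(n), has_child])
ols_effect = np.linalg.lstsq(X, income, rcond=None)[0][1]

# Two-stage least squares:
# (1) regress fertility on the instrument, (2) regress income on predicted fertility.
# (Point estimate only; proper IV standard errors are omitted here.)
Z = np.column_stack([np.ones(n), ivf_success])
fitted = Z @ np.linalg.lstsq(Z, has_child, rcond=None)[0]
X_iv = np.column_stack([np.ones(n), fitted])
iv_effect = np.linalg.lstsq(X_iv, income, rcond=None)[0][1]

print(f"naive OLS estimate: {ols_effect:+.3f}")  # pulled away from -0.2 by the confounder
print(f"2SLS / IV estimate: {iv_effect:+.3f}")   # close to the true -0.2
```

The naive regression mixes the causal effect with the selection effect described above, while the second stage uses only the variation in fertility induced by the instrument, which is why it recovers something close to the true effect in this toy setup.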

JBlack

Yes, that was my first guess as well. Increased income from employment is most strongly associated with major changes, such as promotion to a new position with changed (and usually increased) responsibilities, or leaving one job and starting work somewhere else that pays more.

It seems plausible that these are not the sorts of changes that women are likely to seek out at the same rate when planning to devote a lot of time in the very near future to being a first-time parent. Some may, but all? Seems unlikely. Men seem more likely to continue to pursue such opportunities at a similar rate due to gender differences in child-rearing roles.


I don't know the answer, but it would be fun to have a Twitter comment with a zillion likes asking Sam Altman this question. Maybe someone should make one?

Rebecca
OpenAI wasn’t a private company (i.e. for-profit) at the time of the OP grant, though.
Ebenezer Dukakis
So basically, I think it is a bad idea and you think we can't do it anyway. In that case, let's stop calling for it, and call for something more compassionate and realistic, like a public apology. I'll bet an apology would be a more effective way to pressure OpenAI to clean up its act anyway. Which is a better headline: "OpenAI cofounder apologizes for their role in creating OpenAI", or some sort of internal EA movement drama? If we can generate a steady stream of negative headlines about OpenAI, there's a chance that Sam is declared too much of a PR and regulatory liability. I don't think it's a particularly good plan, but I haven't heard a better one.
Ebenezer Dukakis
Sure, I think this helps tease out the moral valence point I was trying to make. "Don't allow them near" implies their advice is actively harmful, which in turn suggests that reversing it could be a good idea. But as you say, this is implausible. A more plausible statement is that their advice is basically noise -- you shouldn't pay too much attention to it. I expect OP would've said something like that if they were focused on descriptive accuracy rather than scapegoating.

Another way to illuminate the moral dimension of this conversation: If we're talking about poor decision-making, perhaps MIRI and FHI should also be discussed? They did a lot to create interest in AGI, and MIRI failed to create good alignment researchers by its own lights. Now after doing advocacy off and on for years, and creating this situation, they're pivoting to 100% advocacy. Could MIRI be made up of good people who are "great at technical stuff", yet apt to shoot themselves in the foot when it comes to communicating with the public? It's hard for me to imagine an upvoted post on this forum saying "MIRI shouldn't be allowed anywhere near AI safety communications".

Firstly, I'm assuming that a high-resolution human brain emulation that you can run on a computer is conscious in the normal sense that we use in conversations. Like, it talks, has memories, makes new memories, has friends and hobbies and likes and dislikes and stuff. Just like a human that you could talk with only through a videoconference-type thing on a computer, but without an actual meaty human on the other end. It would be VERY weird if this emulation exhibited all these human qualities for some other reason than the one for which meaty humans exhibit them. Like, very extremely what the fuck surprising. Do you agree?

So, we now have a deterministic human file on our hands.

Then, you can trivially make a transformer-like next-token predictor out of the human emulation. You just have the emulation,...

weightt an
Well, it's like asking whether the {human in a car as a single system} is or is not conscious. Firstly, it's a weird question, because of course it is. And that holds even if you chain the human to the wheel in such a way that they will never disjoin from the car.

What I did is constrain the possible actions of the human emulation. Not severely; the human can still talk whatever, just with a constant compute budget, time, or number of iterative computation steps. Kind of like you can constrain the actions of a meaty human by putting them in a jail or something (... or in a time loop / repeated complete memory wipes).

How would you expect this to possibly cash out? Suppose there are human emulations running around doing all things exactly like meaty humans. How exactly do you expect the announcement of a high scientific council to go: "We discovered that EMs are not conscious* because .... and that's important because of ..."? Is that completely out of model for you? Or, like, can you give me an (even goofy) scenario of that possibility?

Or do you think high-resolution simulations will fail to replicate the capabilities of humans, the outlook of them? I.e. special sauce/quantum fuckery/literal magic?
JBlack

I don't expect this to "cash out" at all, which is rather the point.

The only really surprising part would be that we had any way at all to determine for certain whether some other system is conscious or not. That is, I'd assign very similar (high) levels of surprisal to either "ems are definitely conscious" or "ems are definitely not conscious", with the ratio between them not being anywhere near "what the fuck" level.

As it stands, I can determine that I am conscious but I do not know how or why I am conscious. I have only a sample size of 1, and no way to access a lar... (read more)

LessOnline Festival

May 31st to June 2nd, Berkeley CA