Book 3 of the Sequences Highlights

While beliefs are subjective, that doesn't mean we get to choose our beliefs willy-nilly. There are laws that, in principle, determine the correct belief given the evidence, and it is to such beliefs that we should aspire.

Recent Discussion

TL;DR A company that maximizes its bond price instead of its stock price cares about the long-term future and is incentivized to reduce existential risk.

The world that solves the alignment problem has an institution with the right incentives

There is a world in which we solve the alignment problem. We are not in that world.

The world in which we solve the alignment problem has institutions with incentives to solve alignment-like problems. The same way Walmart has incentives to sell you groceries for a reasonable price. In countries with functioning market economies, nobody thinks about "the grocery problem".

One such institution is one that has incentives to care about the long-term future. I believe that there is an easy way to create such an institution in our current world.

Eternal companies

...
parafactual · 9h
If someone believes in high existential risk, then they already have a strong incentive to prevent it, because if they don't, they will die. I'm confused as to how this would provide additional incentive. Walmart has an incentive to be the institution that provides groceries at a reasonable price, not just for there to be reasonably priced groceries at all. Everyone already has an incentive for food to be affordable, often so that they themselves can afford food, but also because of the adverse effects of starving a populace.
Dagon · 10h
Interesting idea, and I'm glad to explore it a bit. I think that, in addition to your other reasons it's not workable, it's pretty questionable whether "corporation" is the unit of institution to focus on. I'm also pretty skeptical that slack is compatible with financial metrics as the primary optimization lever, whether amortized or instantaneous. Also, it's unclear (over the long term, assuming rational investors, an assumption your proposal also breaks) that theoretical stock value deviates much from perpetual bond value. Both are quite sensitive to the perceived stability of the company.

it's pretty questionable whether "corporation" is the unit of institution to focus on.

I agree. AI Safety is a public good, so it suffers from the free-rider problem (https://en.wikipedia.org/wiki/Free-rider_problem), and even if you had eternal companies, they would still have to coordinate somehow. But I think it would be easier for eternal companies to coordinate on AI Safety than for normal companies.

I'm also pretty skeptical that slack is compatible with financial metrics as the primary optimization lever, whether amortized or instantaneous.

I'm not sure what you me...

AnthonyC · 10h
One concern on the alignment of executive compensation is that it's especially hard to get executives to care about what happens after they die, unless their perpetual bonds go to their heirs, unlike a regular pension. Even then, they or their heirs can sell those bonds, no? At least in the US, we have laws setting time limits on constraints about how heirs can use or dispose of property left to them. And if an eternal company's growth is slow by necessity, then the smart move would be investing the proceeds from perpetual bonds in a diverse portfolio of market-traded faster-growth companies. I understand this is something university endowments sometimes do to get around restrictions on how some funds can be used.

When I look at the world's actually existing very old companies, I think the end state for an eternal company might look something like Sumitomo Group: diversified enough to survive systemic and idiosyncratic shifts to any subset of its interests, interdependent enough for mutual support to survive downturns and finance needed changes internally, and willing to divest parts of itself when needed. (Kinda like the world's oldest trees (and largest fungi), which have many above-ground bodies that die all the time, but interconnected, wide-spanning root systems and shared DNA.)

A lot of long-lived companies are (or at least were) family businesses motivated to preserve intergenerational wealth, like Merck. Do we, or should we expect to, see any signs that these kinds of companies are unusually motivated to reduce existential risks?
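Dagon's point above about perpetual-bond value versus theoretical stock value can be made concrete with two standard textbook formulas (an illustration, not from the post; the numbers are made up): a perpetual bond (consol) is worth P = C/r, and a dividend-growth stock is worth P = D/(r - g), so both valuations hinge on the discount rate r, which tracks perceived stability.

```python
# Illustrative only: standard consol and Gordon-growth formulas with made-up
# numbers, showing that both valuations are highly sensitive to the discount
# rate r (a proxy for the perceived stability of the company).

def consol_price(coupon: float, r: float) -> float:
    """Price of a perpetual bond paying `coupon` every period, discounted at rate r."""
    return coupon / r

def gordon_stock_price(dividend: float, r: float, g: float) -> float:
    """Gordon growth model: next-period dividend, discount rate r, dividend growth g."""
    assert r > g, "the model is only valid when r > g"
    return dividend / (r - g)

for r in (0.05, 0.07, 0.10):  # rising r ~ falling perceived stability
    print(f"r={r:.2f}  consol={consol_price(5.0, r):7.1f}  "
          f"stock={gordon_stock_price(5.0, r, g=0.02):7.1f}")
```

Both prices fall sharply as r rises, which is the sense in which maximizing the bond price doesn't obviously buy more long-termism than maximizing the stock price.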

The Story as of ~4 Years Ago

Back in 2020, a group at OpenAI ran a conceptually simple test to quantify how much AI progress was attributable to algorithmic improvements. They took ImageNet models which were state-of-the-art at various times between 2012 and 2020, and checked how much compute was needed to train each to the level of AlexNet (the state-of-the-art from 2012). Main finding: over ~7 years, the compute required fell by ~44x. In other words, algorithmic progress yielded a compute-equivalent doubling time of ~16 months (though error bars are large in both directions).

On the compute side of things, in 2018 a group at OpenAI estimated that the compute spent on the largest training runs was growing exponentially with a doubling rate of ~3.4 months, between 2012...
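As a quick check of the headline numbers (just arithmetic on the figures quoted above; extrapolating the 3.4-month doubling rate over the same 7-year window is an illustration, not a claim from either paper):

```python
import math

# Algorithmic progress: ~44x less compute needed over ~7 years (~84 months)
# implies a compute-equivalent doubling time of 84 / log2(44) months.
algorithmic_doubling_months = 84 / math.log2(44)   # ~15.4, i.e. roughly 16 months

# Training compute (2018 estimate): doubling every ~3.4 months. If that rate
# held over the same 84-month window, total growth would be:
compute_growth = 2 ** (84 / 3.4)                   # ~3e7-fold

print(f"algorithmic-progress doubling time ~ {algorithmic_doubling_months:.1f} months")
print(f"hypothetical compute growth over 84 months ~ {compute_growth:.1e}x")
```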

Gabriel Mukobi · 5h
Were you implying that Chinchilla represents algorithmic progress? If so, I disagree: technically you could call a scaling law function an algorithm, but in practice, it seems Chinchilla was better because they scaled up the data. There are more aspects to scale than model size.
johnswentworth · 4h
Scaling up the data wasn't algorithmic progress. Knowing that they needed to scale up the data was algorithmic progress.

It seems particularly trivial from an algorithmic aspect? You have the compute to try an idea, so you try it. The key factor is still the compute.

Unless you’re including the software engineering efforts required to get these methods to work at scale, but I doubt that?
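To spell out what "knowing that they needed to scale up the data" buys you: the Chinchilla result is usually summarized as a compute-allocation rule rather than a new architecture. Here is a rough sketch, using the standard C ≈ 6·N·D FLOP approximation and the roughly-20-tokens-per-parameter heuristic from the paper (numbers illustrative):

```python
# Rough Chinchilla-style compute allocation: training FLOPs C ~ 6 * N * D for
# N parameters and D tokens, with the compute-optimal point near D ~ 20 * N.

def compute_optimal_allocation(flops: float, tokens_per_param: float = 20.0):
    """Split a FLOP budget into (params, tokens) under D = k*N and C = 6*N*D."""
    n_params = (flops / (6.0 * tokens_per_param)) ** 0.5
    return n_params, tokens_per_param * n_params

# At a Gopher-scale budget (~5.8e23 FLOPs), this lands near Chinchilla's
# actual configuration of ~70B parameters trained on ~1.4T tokens.
n, d = compute_optimal_allocation(5.8e23)
print(f"params ~ {n:.1e}, tokens ~ {d:.1e}")
```

The "algorithm" here is just the allocation rule; running it still requires having the extra tokens, which is the sense in which both insight and data scale matter.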

johnswentworth · 6h
That would, and in general restrictions aimed at increasing price/reducing supply could work, though that doesn't describe most GPU restriction proposals I've heard.

I sometimes hear people asking: “What is the plan for avoiding a catastrophe from misaligned AI?”

This post gives my working answer to that question - sort of. Rather than a plan, I tend to think of a playbook.[1]

  • A plan connotes something like: “By default, we ~definitely fail. To succeed, we need to hit multiple non-default goals.” If you want to start a company, you need a plan: doing nothing will definitely not result in starting a company, and there are multiple identifiable things you need to do to pull it off.
  • I don’t think that’s the situation with AI risk.
    • As I argued before, I think we have a nontrivial chance of avoiding AI takeover even in a “minimal-dignity” future - say, assuming essentially no growth
...

One way that things could go wrong, not addressed by this playbook: AI may differentially accelerate intellectual progress in the wrong direction, or in other words create opportunities for humanity to make serious mistakes (by accelerating technological progress) faster than it creates the wisdom to make the right choices (philosophical progress). Specific to the issue of misalignment, suppose we get aligned human-level-ish AI, but it is significantly better at speeding up AI capabilities research than the kinds of intellectual progress needed to continue to minimize misalign...

Eliezer recently tweeted that most people can't think, even most people here, but that at least this is a place where some of the people who can think can also meet each other.

This inspired me to read Heidegger's 1954 book What is Called Thinking? (pdf), in which Heidegger also declares that despite everything, "we are still not thinking".

Of course, their reasons are somewhat different. Eliezer presumably means that most people can't think critically, or effectively, or something. For Heidegger, we're not thinking because we've forgotten abou...

For this month's open thread, we're experimenting with Inline Reacts as part of the bigger reacts experiment. In addition to being able to react to a whole comment, you can apply a react to a specific snippet from the comment. When you select text in a comment, you'll see the new react button off to the side. (Currently this is only designed to work well on desktop; if it goes well, we'll put more polish into getting it working on mobile.)

Right now this is enabled on a couple specific posts, and if it goes well we'll roll it out to more posts.


Meanwhile, the usual intro to Open Threads:

If it’s worth saying, but not worth its own post, here's a place to put it.

If you are new to LessWrong, here's the...

Vanessa Kosoy · 1h
I have the same bug in Firefox 113.0.2 on Windows 11. But, it seems to depend on what I select: for some selections it works, for some selections it doesn't.

To clarify, does this prevent you from inline reacting, or does it just remove your selection? (I.e., can you click the button and see the react palette, and what text appears there when you do?)

Jayson_Virissimo · 2h
Thanks, that's getting pretty close to what I'm asking for. Since posting the above, I've also found Katja Grace's Argument for AI x-risk from competent malign agents [https://wiki.aiimpacts.org/doku.php?id=arguments_for_ai_risk:is_ai_an_existential_threat_to_humanity:will_malign_ai_agents_control_the_future:argument_for_ai_x-risk_from_competent_malign_agents:start] and Joseph Carlsmith's Is Power-Seeking AI an Existential Risk [https://arxiv.org/abs/2206.13353], both of which seem like the kind of thing you could point an analytic philosopher at and ask them which premise they deny. Any idea if something similar is being done to cater to economists (or other social scientists)?
jimrandomh · 5h
I can reproduce loss-of-selection on mouseover some of the time on up-to-date Chrome, so I think it's probably not browser-specific.

Epistemic status: Big if true/I am clearly an idiot for even posting this.

Some apparently real journalists have been approached by (& approached) several intelligence officials, some tasked specifically with investigating UFOs, who claim that the DoD has had evidence of alien intervention for a while in the form of partial & mostly-whole fragments of alien aircraft. A follow-up article where the publication outlines how the editors verified this person's and others' claims and affiliations is here, and a part 2 is expected tomorrow.

For some reason - very possibly because it's complete nonsense, or because they haven't had time to independently verify - the story has only been picked up by NYMag so far. The consensus among the people I've been reviewing this article with is that it's...

I have not read this post, and I have not looked into whatever the report is, but I'm willing to take a 100:1 bet that there is no such non-human originating craft (by which I mean anything actively designed by a technological species — I do not mean that simple biological matter of any kind could not have arrived on this planet via some natural process like an asteroid), operationalized to there being no Metaculus community forecast (or Manifold market with a sensible operationalization and reasonable number of players) that assigns over 50% probabilit...

lsusr · 8h
Most bets I see are on the order of $10-$1000, which, according to the Kelly Criterion, implies negligible confidence. I'm willing to bet substantially more than that. If we had a real prediction market with proper derivatives, low fees, high liquidity, reputable oracles, etcetera, then I'd just use the standard exchange, but we don't. Consequently, market friction vastly outweighs actual probabilities in importance.

Bingo. This is exactly what I mean. Thank you for clarifying. It is important to note that "probability of winning" is not the same as "probability of getting paid, and thus profiting". It's the latter that I care about.
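For readers unfamiliar with the bet-sizing point: the Kelly criterion prescribes staking a fraction f* = p - (1 - p)/b of your bankroll on a bet at b:1 payout odds when you assign probability p to winning, so a small stake relative to bankroll back-implies a small edge. A sketch with made-up bankroll numbers:

```python
# Kelly criterion: optimal stake fraction f* = p - (1 - p) / b
# for win probability p and b:1 payout odds.

def kelly_fraction(p: float, b: float) -> float:
    return p - (1.0 - p) / b

def implied_probability(stake_fraction: float, b: float) -> float:
    """Invert f* = p - (1 - p)/b: the win probability that justifies a given stake."""
    return (stake_fraction + 1.0 / b) / (1.0 + 1.0 / b)

# A $1,000 even-odds bet out of a (made-up) $100,000 bankroll:
print(implied_probability(stake_fraction=0.01, b=1.0))  # ~0.505 -- barely better than a coin flip

# Whereas genuinely high confidence prescribes a large stake, absent friction:
print(kelly_fraction(p=0.99, b=1.0))                    # 0.98 of the bankroll
```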
lsusr · 8h
That is an honorable offer (I appreciate it, really), but it has negative expected value for me due to counterparty risk, friction, variance, etcetera. (See bayesed's comment [https://www.lesswrong.com/posts/oY9HNicqGGihymnzk/intelligence-officials-say-u-s-has-retrieved-craft-of-non?commentId=ZvYqRxfjc4fyJJg8P].) I'd need substantially better odds for the expected profit to exceed the friction.
lsusr · 8h
I'm willing to bet five figures, in theory, but there are a ton of factors that need to be accounted for, like capital tie-up, counterparty risk, the value of my time, etc. So if your odds aren't lower than 90%, then it's probably not even worthwhile to bet. Too much friction.
Vladimir_Nesov · 2h
There are two importantly different senses of disempowerment. The stars could be taken out of reach, forever, but human civilization develops in its own direction. Alternatively, human civilization is molded according to AIs' aesthetics; there are interventions that manipulate it.
O O · 1h
Is there a big reason the latter is hugely different from the former for the average person, excluding world leaders?
Vladimir_Nesov · 1h
It's a distinction between these different futures. A present that ends in everyone on Earth dying is clearly different from both, but the present lasting literally forever is hopefully not a consideration.

I’m just trying to understand the biggest doomers. I feel like disempowerment is probably hard to avoid.

However, I don’t think a disempowered future with bountiful lives would be terrible, depending on how tiny the kindness weight is / how far off it is from us. We are 1/10^53 of the observable universe’s resources. Unless alignment is wildly off base, I see AI-directed extinction as unlikely.

I fail to see why even figures like Paul Christiano peg it at such a high level, unless he estimates human-directed extinction risks to be high. It seems quite easy for a spiteful individual to create a plague that wipes out humans, and that seems more likely than an extremely, catastrophically misaligned AI doing so.

This post is part of my AI strategy nearcasting series: trying to answer key strategic questions about transformative AI, under the assumption that key events will happen very soon, and/or in a world that is otherwise very similar to today's.

This post gives my understanding of what the set of available strategies for aligning transformative AI would be if it were developed very soon, and why they might or might not work. It is heavily based on conversations with Paul Christiano, Ajeya Cotra and Carl Shulman, and its background assumptions correspond to the arguments Ajeya makes in this piece (abbreviated as “Takeover Analysis”).

I premise this piece on a nearcast in which a major AI company (“Magma,” following Ajeya’s terminology) has good reason to think that it can...

I don't think of process-based supervision as a totally clean binary, but I don't think of it as just/primarily being about how many steps you allow in between audits. I think of it as primarily being about whether you're doing gradient updates (or whatever) based on outcomes (X was achieved) or processes (Y seems like a well-reasoned step to achieve X). I think your "Example 0" isn't really either - I'd call it internals-based supervision. 

I agree it matters how many steps you allow in between audits, I just think that's a different distinction.

Here’...
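To make the distinction concrete (an illustrative sketch with hypothetical function names, not code from the post): the difference is which scalar gets turned into gradient updates.

```python
# Illustrative contrast between outcome-based and process-based supervision.
# `trajectory` is a list of reasoning/action steps; all names are hypothetical.

def outcome_based_reward(trajectory: list[str], goal_achieved: bool) -> float:
    # Reward depends only on whether the end state X was actually reached.
    return 1.0 if goal_achieved else 0.0

def process_based_reward(trajectory: list[str], rate_step) -> float:
    # Reward averages per-step judgments of the form "does this look like a
    # well-reasoned step toward X?", made without running the plan to completion.
    if not trajectory:
        return 0.0
    return sum(rate_step(step) for step in trajectory) / len(trajectory)

# Either scalar could feed the same RL-style update; what differs is what the
# model ends up being optimized toward.
```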

A network of specialized open-source agents emerges

Developed by open-source communities, “agentic” AI systems like AutoGPT and BabyAGI begin to demonstrate increased levels of goal-directed behavior. They are built with the aim of overcoming the limitations of current LLMs by adding persistent memory and agentic capabilities. When GPT-4 is launched and OpenAI offers an API soon after, these initiatives generate a substantial surge in attention and support.

This inspires a wave of creativity that Andrej Karpathy of OpenAI calls a “Cambrian explosion”, a reference to the emergence of a rich variety of life forms within a relatively brief time span over 500 million years ago. Much like those new animals filled vacant ecological niches through specialization, the most successful of the recent initiatives specialize in narrow domains. The...
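As a concrete (heavily simplified and entirely hypothetical) picture of what adding "persistent memory and agentic capabilities" on top of an LLM looks like in these projects, the core loop is roughly plan, act, and write the result back to a memory store; `call_llm` below is a stand-in, not any particular project's API.

```python
# Minimal, hypothetical agent loop in the AutoGPT/BabyAGI style.
import json

def call_llm(prompt: str) -> str:
    """Stand-in for whatever model API the project wraps; replace with a real client."""
    return f"(model output for: {prompt[:40]}...)"

def run_agent(goal: str, max_steps: int = 10, memory_path: str = "memory.json") -> list[str]:
    memory: list[str] = []
    for _ in range(max_steps):
        # 1. Plan the next action given the goal and everything remembered so far.
        plan = call_llm(f"Goal: {goal}\nMemory: {memory}\nNext action:")
        # 2. Execute the action (in real systems: tool use, web search, running code).
        result = call_llm(f"Execute: {plan}")
        # 3. Persist the result so later steps (and later runs) can build on it.
        memory.append(result)
        with open(memory_path, "w") as f:
            json.dump(memory, f)
    return memory
```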

Daniel Kokotajlo · 12h
Nice story! Mostly I think that the best AGIs will always be in the big labs rather than open source, and that current open-source models aren't smart enough to get this sort of self-improving ecosystem off the ground. But it's not completely implausible.

Thank you very much! I agree. We chose this scenario out of many possibilities because so far it hasn't been described in much detail and because we wanted to point out that open source can also lead to dangerous outcomes, not because it is the most likely scenario. Our next story will be more "mainstream".