ozziegooen

I'm currently researching forecasting and epistemics as part of the Quantified Uncertainty Research Institute.

Sequences

Beyond Questions & Answers
Squiggle
Prediction-Driven Collaborative Reasoning Systems

Wikitag Contributions

Comments


I partially agree, but I think this must only be a small part of the issue.

- I think there's a whole lot of key insights people could raise that aren't info-hazards. 
- If secrecy were the main factor, I'd hope that there would be some access-controlled message boards or similar. I'd want the discussion to be intentionally happening somewhere. Right now I don't really think that's happening. I think a lot of tiny groups have their own personal ideas, but there's surprisingly little systematic and private thinking between the power players. 
- I think that secrecy is often an excuse not to open ideas to feedback, and thus not be open to critique. Often, from what I see, this goes hand-in-hand with "our work just really isn't that great, but we don't want to admit it."

In the last 8 years or so, I've kept hoping there would be some secret and brilliant "master plan" around EA, explaining the lack of public strategy. I have yet to find one. The closest I know of is some ongoing discussion and Slack threads with people at Constellation and similar - I think these are interesting in terms of understanding the perspectives of these (powerful) people, but I don't get the impression that there's much comprehensive or brilliant thinking being hidden. 

That said:
- I think that policy orgs need to be very secretive, so I agree with you regarding why those orgs don't write more big-picture things.

This is an orthogonal question. I agree that if we're there now, my claim is much less true. 

I'd place fairly little probability mass on this (<10%) and believe much of the rest of the community does as well, though I realize there is a subset of the LessWrong-adjacent community that does. 

I'm not sure if it means much, but I'd be very happy if AI safety could get another $50B from smart donors today.

I'd flag that [stopping AI development] would cost far more than $50B. I'd expect that we could easily lose $3T of economic value in the next few years if AI progress seriously stopped. 

I guess it seems to me that duration is dramatically more expensive to get than funding, at the amounts of funding people would likely want. 

Thanks for the specificity! 

> On harder-to-operationally-define dimensions (sense of hope and agency for the 25th through 75th percentile of culturally normal people), it’s  quite a bit worse.

I think it's likely that many people are panicking and losing hope each year. There's a lot of grim media around.

I'm far less sold that something like "civilizational agency" is declining. From what I can tell, companies have gotten dramatically better at achieving their intended ends in the last 30 years, and most governments have generally been improving in effectiveness. 

One challenge I'd pose to you / others who feel similarly is to try to get more concrete on measures like this, and then to show that they have been declining.

My personal guess is that a bunch of people are incredibly anxious over the state of the world, largely for reasons of media attention, and then this spills over into them assuming major global ramifications without many concrete details or empirical forecasts. 
 

In terms of proposing and discussing AI Alignment strategies, I feel like a few individuals have been dominating the LessWrong conversation recently.
 
I've seen a whole lot from John Wentworth and the Redwood team.

After that, it seems to get messier. 

There are several individuals or small groups with their own very unique takes: Matthew Barnett, Davidad, Jesse Hoogland, etc. I think these groups often have very singular visions that they work on, which few others have much buy-in on. 

Groups like the DeepMind and Anthropic safety teams seem hesitant to write much about or discuss big-picture strategy. My impression is that specific researchers are typically working on fairly narrow agendas, and that the leaders of these orgs don't have the most coherent strategies. One big problem is that it's very difficult to be honest and interesting about big-picture AI strategy without saying things that would be bad for a major organization to say.  

Most policy people seem focused on policy details. The funders (OP?) seem pretty quiet. 

I think there are occasionally some neat papers or posts that come from AI policy or from groups like Convergence research. But these also don't seem to be a big part of the conversation I see - the authors seem pretty segmented, and other LessWrong readers and AI safety people don't pay much attention to their work. 

Here are some important-seeming properties to illustrate what I mean:

  1. Robustness of value-alignment: Modern LLMs can display a relatively high degree of competence when explicitly reasoning about human morality. In order for it to matter for RSI, however, those concepts need to also appropriately come into play when reasoning about seemingly unrelated things, such as programming. The continued ease of jailbreaking AIs serves to illustrate this property failing (although solving jailbreaking would not necessarily get at the whole property I am pointing at).
  2. Propagation of beliefs: When the AI knows something, it should know it in a way which integrates well with everything else it knows, rather than easily displaying the knowledge in one context while seeming to forget it in another.
  3. Preference for reasons over rationalizations: An AI should be ready and eager to correct its mistakes, rather than rationalizing its wrong answers. It should be truth-seeking, following thoughts where they lead instead of planning ahead to justify specific answers. It should prefer valid proof steps over arriving at an answer when the two conflict.
  4. Knowing the limits of its knowledge: Metacognitive awareness of what it knows and what it doesn't know, appropriately brought to bear in specific situations. The current AI paradigm just has one big text-completion probability distribution, so there's not a natural way for it to distinguish between uncertainty about the underlying facts and uncertainty about what to say next -- hence we get hallucinations.

All of this is more-or-less a version of the metaphilosophy research agenda, framed in terms of current events in AI.


I very much like the concreteness here. 

I consider these sorts of things just fundamental epistemic problems, or basic skills that good researchers should have. All superforecasters should be very familiar with issues 2-4, and most probably couldn't define metaphilosophy. I don't see the need to be fancy about it.

On that note, I'll hypothesize that if we were to make benchmarks for any of these items, it would be fairly doable to make AIs that do better than humans on them, and then to keep pushing those scores higher over time. I have a hard time imagining tests here that I would feel confident would not get beaten, if there were sufficient money on the line, in the next year or two. 
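To gesture at what I mean by "benchmarks for these items": for item 4 (knowing the limits of its knowledge), a crude version could just score calibration with a standard Brier score. This is a hypothetical sketch of my own, not an existing eval - the data and field names are made up for illustration:

```python
# Hypothetical sketch: scoring "knowing the limits of its knowledge" (item 4)
# via calibration. Each record holds the model's stated confidence that its
# answer is correct, plus whether the answer actually was correct (0 or 1).

def brier_score(records):
    """Mean squared error between stated confidence and actual correctness.
    Lower is better; a perfectly informed, perfectly calibrated agent scores 0."""
    return sum((r["confidence"] - r["correct"]) ** 2 for r in records) / len(records)

# Toy data standing in for a real evaluation set.
model_records = [
    {"confidence": 0.9, "correct": 1},
    {"confidence": 0.6, "correct": 0},
    {"confidence": 0.8, "correct": 1},
]

print(f"Brier score: {brier_score(model_records):.3f}")  # ~0.137 on the toy data
```

Humans and models could be run on the same question set and compared directly, which is roughly how forecasting tournaments already handle this.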

I think that Slop could be a social problem (i.e. there are some communities that can't tell slop from better content), but I'm having a harder time imagining it being a technical problem.

I have a hard time imagining a type of Slop that isn't low in information. All the kinds of Slop I'm familiar with are basically "small variations on some ideas, which hold very little informational value."

It seems like models like o1 / r1 are trained by finding ways to make information-dense AI-generated data. I expect that trend to continue. If AIs for some reason hit some "slop threshold", I don't see how they get much further by using generated data.

I mostly want to point out that many disempowerment/dystopia failure scenarios don't require a step-change from AI, just an acceleration of current trends. 

Do you think that the world is getting worse each year? 

My rough take is that humans, especially rich humans, are generally more and more successful. 

I'm sure there are ways for current trends to lead to catastrophe - like some trends dramatically increasing and others decreasing - but that seems like it would require a lengthy and precise argument. 

In many worlds, if we have a bunch of decently smart humans around, they would know what specific situations "very dumb humans" would mess up, and take the corresponding preventative measures.

A world where many small pockets of "highly dumb humans" could cause an existential catastrophe is one that's very clearly incredibly fragile and dangerous, enough so that I assume reasonable actors would freak out until it stops being so fragile and dangerous. I think we see this in other areas - like cyber attacks, where reasonable people prevent small clusters of actors from causing catastrophic damage. 

It's possible that the offense/defense balance would dramatically favor tiny groups of dumb actors, and I assume that this is what you and others expect, but I don't see it yet. 

I feel like you're talking in highly absolutist terms here.

Global wealth is $454.4 trillion. We currently have ~8 billion humans, with an average happiness of, say, 6/10. Global wealth and most other measures of civilizational flourishing that I know of seem to be generally going up over time.

I think that our world makes a lot of mistakes and fails a lot at coordination. It's very easy for me to imagine that we could increase global wealth by 3x if we do a decent job.

So how bad are things now? Well, approximately: "We have the current world, at $454 trillion, with 8 billion humans, etc." To me that's definitely something to work with. 
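For what it's worth, a quick back-of-envelope on those numbers (using only the figures mentioned above):

```python
# Back-of-envelope using the figures cited in this comment.
global_wealth = 454.4e12   # ~$454.4 trillion
population = 8e9           # ~8 billion humans

wealth_per_person = global_wealth / population
print(f"Wealth per person: ~${wealth_per_person:,.0f}")  # ~$56,800

# The "increase global wealth by 3x" scenario mentioned above.
print(f"3x scenario: ~${3 * global_wealth / 1e12:,.0f} trillion")  # ~$1,363 trillion
```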
