Nate Showell

Comments

Agent foundations: not really math, not really science
Nate Showell · 14d

I was describing reasoning about idealized superintelligent systems as the method used in agent foundations research, rather than its goal. In the same way that string theory is trying to figure out "what is up with elementary particles at all," and tries to answer that question by doing not-really-math about extreme energy levels, agent foundations is trying to figure out "what is up with agency at all" by doing not-really-math about extreme intelligence levels.

If you've made enough progress in your research that it can make testable predictions about current or near-future systems, I'd like to see them. But the persistent failure of agent foundations research to come up with any such bridge between idealized models and real-world systems has made me doubtful that the former are relevant to the latter.

Agent foundations: not really math, not really science
Nate Showell · 15d

For me, the OP brought to mind another kind of "not really math, not really science": string theory. My criticisms of agent foundations research are analogous to Sabine Hossenfelder's criticisms of string theory, in that string theory and agent foundations both screen themselves off from the possibility of experimental testing in their choice of subject matter: the Planck scale and the very early universe for the former, and idealized superintelligent systems for the latter. For both, the real-world counterparts of the objects they study (known elementary particles and fundamental forces; humans and existing AI systems) are primarily used as targets to which to overfit their theoretical models. They don't make testable predictions about current or near-future systems. Unlike with early computer science, agent foundations doesn't come with an expectation of being able to perform experiments in the future, or even to perform rigorous observational studies.

Inscrutability was always inevitable, right?
Nate Showell · 23d

Building on what you said, pre-LLM agent foundations research appears to have made the following assumptions about what advanced AI systems would be like:

  1. Decision-making processes and ontologies are separable. An AI system's decision process can be isolated and connected to a different world-model, or vice versa.
  2. The decision-making process is human-comprehensible and has a much shorter description length than the ontology.
  3. As AI systems become more powerful, their decision processes approach a theoretically optimal decision theory that can also be succinctly expressed and understood by human researchers.

None of these assumptions ended up being true of LLMs. In an LLM, the world-model and decision process are mixed together in a single neural network instead of being separate entities. LLMs don't come with decision-related concepts like "hypothesis" and "causality" pre-loaded; those concepts are learned over the course of training and are represented in the same messy, polysemantic way as any other learned concept. There's no way to separate out the reasoning-related features to get a decision process you could plug into a different world-model. In addition, when LLMs are scaled up, their decision-making becomes more complex and inscrutable due to being distributed across the neural network. The LLM's decision-making process doesn't converge into a simple and human-comprehensible decision theory.
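To make assumptions 1 and 2 concrete, here is a minimal toy sketch; all names, states, and utility numbers are hypothetical and chosen only for illustration. It shows a hand-written, human-readable decision rule that can be plugged into interchangeable world-models, which is exactly the kind of seam that LLMs turned out not to have.

```python
from typing import Dict, List, Tuple

# Two interchangeable "ontologies": toy world-models mapping (state, action)
# pairs to predicted utility. The names and numbers are hypothetical.
ONTOLOGY_A: Dict[Tuple[str, str], float] = {
    ("low_battery", "recharge"): 1.0,
    ("low_battery", "explore"): -0.5,
}
ONTOLOGY_B: Dict[Tuple[str, str], float] = {
    ("low_battery", "recharge"): 0.2,
    ("low_battery", "explore"): 0.8,
}

def decide(world_model: Dict[Tuple[str, str], float],
           state: str, actions: List[str]) -> str:
    """A short, human-readable decision rule (assumption 2): pick the action
    the supplied world-model rates highest. Because the world-model is passed
    in as an argument, it can be swapped out freely (assumption 1)."""
    return max(actions, key=lambda a: world_model.get((state, a), 0.0))

# The same decision process plugged into two different ontologies:
print(decide(ONTOLOGY_A, "low_battery", ["recharge", "explore"]))  # -> recharge
print(decide(ONTOLOGY_B, "low_battery", ["recharge", "explore"]))  # -> explore

# In an LLM there is no analogous seam: a single set of weights maps tokens to
# next-token logits, with the world-model and the decision process entangled
# in the same distributed, polysemantic features, so nothing can be unplugged
# and reused the way `decide` is here.
```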

Cole Wyeth's Shortform
Nate Showell · 2mo

More useful. It would save us the step of having to check for hallucinations when doing research.

Generalized Hangriness: A Standard Rationalist Stance Toward Emotions
Nate Showell · 2mo

Another example of this pattern that's entered mainstream awareness is tilt. When I'm playing chess and get tilted, I might think things like "all my opponents are cheating," "I'm terrible at this game and therefore stupid," or "I know I'm going to win this time, how could I not win against such a low-rated opponent." But if I take a step back, notice that I'm tilted, and ask myself what information I'm getting from the feeling of being tilted, I notice that it's telling me to take a break until I can stop obsessing over the result of the previous game.

Tilt is common, but also easy to fix once you notice the pattern of what it's telling you and start taking breaks when you experience it. The word "tilt" is another instance of a hangriness-type stance that's caught on because of its strong practical benefits--having access to the word "tilt" makes the state easier to notice.

LessWrong Feed [new, now in beta]
Nate Showell · 2mo

It's working now. I think the problem was on my end.

‘AI for societal uplift’ as a path to victory
Nate Showell · 2mo

This strategy suggests that decreasing ML model sycophancy should be a priority for technical researchers. It's probably the biggest current barrier to the usefulness of ML models as personal decision-making assistants. Hallucinations are probably the second-biggest barrier.

LessWrong Feed [new, now in beta]
Nate Showell · 2mo

The new feed doesn't load at all for me.

Consider chilling out in 2028
Nate Showell · 2mo

There's another way in which pessimism can be used as a coping mechanism: it can be an excuse to avoid addressing personal-scale problems. A belief that one is doomed to fail, or that the world is inexorably getting worse, can serve as a reason to give up, on the grounds that comparatively small-scale problems will be swamped by uncontrollable societal forces. Compared to confronting those personal-scale problems, giving up can seem very appealing, and a comparison to a large-scale but abstract problem can act as an excuse for surrender. You probably know someone who spends substantial amounts of their free time watching videos, reading articles, and listening to podcasts that blame all of the world's problems on "capitalism," "systemic racism," "civilizational decline," or something similar, all while their bills are overdue and dishes pile up in their sink.

This use of pessimism as a coping mechanism is especially pronounced in the case of apocalypticism. If the world is about to end, every other problem becomes much less relevant in comparison, including all those small-scale problems that are actionable but unpleasant to work on. Apocalypticism can become a blanket pretext for giving in to your ugh fields. And while you're giving in to them, you end up thinking you're doing a great job of utilizing the skill of staring into the abyss (you're confronting the possibility of the end of the world, right?) when you're actually doing the exact opposite. Rather than something related to preverbal trauma, this usefulness as a coping mechanism is the more likely source of the psychological appeal of AI apocalypticism for many people who encounter it.

Distillation Robustifies Unlearning
Nate Showell · 3mo

Another experiment idea: testing whether the reduction in hallucinations that Yao et al. achieved with unlearning can be made robust.

Wikitag Contributions

AI-Fizzle · 10h · (+333)

Posts

26 · How are you preparing for the possibility of an AI bust? [Q] · 1y · 16
2 · Nate Showell's Shortform · 2y · 20
23 · Degamification · 3y · 3
8 · Reinforcement Learner Wireheading · 3y · 2