I updated a bit towards thinking that incompetence-at-reasoning is a more common/influential factor than I previously thought. Thanks.
However: Where do you think that moral realism comes from? Why is it a "thorny" issue?
social-media-like interfaces for uncovering group wisdom and will at larger scales while eliciting more productive discourse
That seems like it might significantly help with raising the sanity waterline, and thus help with coordinating on AI x-risk, and thus be extremely high-EV (if it's successfully implemented, widely adopted, and humanity survives for a decade or two beyond widespread adoption).[1]
Do you think it would be practically possible with current LLMs to implement a version of social media that promoted/suggested content based on criteria like
The "widely adopted" part seems difficult to achieve, though. The hypermajority of genpop humans would probably just keep scrolling TikTok and consuming outrage porn on X, even if Civilization 2.0 Wholesome Social Media were available. ↩︎
This discourse structure associates related claims and evidence, [...]
To make it practically possible for non-experts to efficiently make sense of large, spread-out collections of data (e.g. to answer some question about the discourse on some given topic), it's probably necessary to not only rapidly summarize all that data, but also translate it into some easily-human-comprehensible form.
I wonder if it's practically possible to have LMs read a bunch of data (from papers to Twitter "discourse") on a given topic, and rapidly/on-demand produce various kinds of concise, visual, possibly interactive summaries of that topic? E.g. something like this, or a probabilistic graphical model, or some kind of data visualization (depending on what aspect of what kind of topic is in question)?
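To make this concrete, here's a rough sketch of the kind of pipeline I have in mind. It's purely hypothetical: `llm_complete` is a stand-in for whatever LM API one would actually use, and the JSON claim/relation schema is just one arbitrary design choice among many.

```python
import json

import networkx as nx  # for aggregating extracted claims into a graph


def llm_complete(prompt: str) -> str:
    """Stand-in for whatever LM API is actually available (hypothetical)."""
    raise NotImplementedError


EXTRACTION_PROMPT = """\
From the text below, extract the main claims and how they relate to each other.
Return a JSON list of objects with fields:
  "claim":    the claim, stated concisely
  "supports": list of claims this claim supports
  "attacks":  list of claims this claim disputes

TEXT:
{text}
"""


def extract_claims(document: str) -> list[dict]:
    """Ask the LM to turn one document into structured claim/relation data."""
    raw = llm_complete(EXTRACTION_PROMPT.format(text=document))
    return json.loads(raw)  # real code would have to handle malformed output


def build_claim_graph(documents: list[str]) -> nx.DiGraph:
    """Aggregate per-document extractions into one graph of a topic's discourse."""
    graph = nx.DiGraph()
    for doc in documents:
        for item in extract_claims(doc):
            claim = item["claim"]
            graph.add_node(claim)
            for target in item.get("supports", []):
                graph.add_edge(claim, target, relation="supports")
            for target in item.get("attacks", []):
                graph.add_edge(claim, target, relation="attacks")
    return graph


# The resulting graph could then be rendered as a (possibly interactive) visual
# summary, e.g. exported via nx.nx_pydot.write_dot(graph, "topic.dot").
```

Obviously all the hard parts (picking sources, deduplicating, grounding claims in actual evidence, robustness to adversarial or deceptive inputs) are hidden inside the prompt and the corpus; the sketch is only meant to suggest that the plumbing itself would be cheap.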
Ideally perhaps, raw observations are reliably recorded, [...]
Do you have ideas for how to deal with counterfeit observations or (meta)data (e.g. deepfaked videos)?
Given that the basic case for x-risks is so simple/obvious[1], I think most people arguing that there is no risk are probably doing so due to some kind of myopic/irrational subconscious motive. (It's entirely reasonable to disagree on probabilities, or on what policies would be best, etc.; but "there is practically zero risk" is just absurd.)
So I'm guessing that the deeper problem/bottleneck here is people's (emotional) unwillingness to believe in x-risks. So long as they have some strong (often subconscious) motive to disbelieve x-risks, any conversation about x-risks is liable to keep getting derailed or otherwise be very unproductive.[2]
I think some common underlying reasons for such motivated disbelief include
I'm not sure what the best approaches are for addressing the above kinds of dynamics. Trying to directly point them out seems likely to end badly (at least with most neurotypical people). Small mental exercises like Split and Commit or giving oneself a line of retreat might help with (1.), if you can somehow get people to (earnestly) do them? For (2.), maybe
If you try the above, I'd be curious to see a writeup of the results.
Building a species of superhumanly smart & fast machine aliens without understanding how they work seems very dangerous. And yet, various companies and nations are currently pouring trillions of dollars into making that happen, and appear to be making rapid progress. (Experts disagree on whether there's a 99% chance we all die, or if there's only a 10% chance we all die and a 90% chance some corporate leaders become uncontested god-emperors, or if we end up as pets to incomprehensible machine gods, or if the world will be transformed beyond human comprehension and everyone will rely on personal AI assistants to survive. Sounds good, right?) ↩︎
A bit like trying to convince a deeply religious person via rational debate. It's not really about the evidence/reasoning. ↩︎
I wouldn't be too surprised if this kind of instinct were evolved, rather than just learned. Even neurotypical humans try to hack each other all the time, and clever psychopaths have probably been around for many, many generations. ↩︎
Think of the stuff that, when you imagine it, feels really yummy.
Also worth taking into consideration: things that feel anti-yummy. Fear/disgust/hate/etc are also signals about your values.
I think the "your values" framing itself already sneaks in assumptions which are false for a lot of minds/brains. Notably: most minds are not perfectly monolithic/unified things well-modeled as a coherent "you/I/me". And other minds are quite unified/coherent, but are in the unfortunate situation of running on a brain that also contains other (more or less adversarial) mind-like programs/wetware.
Example:
It is entirely possible to have strongly-held values such as "I reject so-and-so arbitrary/disgusting parts of the reward circuitry Evolution designed into my brain; I will not become a slave to the Blind Idiot God's whims and attempts to control me". In that case, the "I" that holds those values clearly excludes at least some parts of its host brain's yumminess-circuitry.[1] (I.e., feelings of yumminess forced upon the mind are not signals about that mind's values, but rather more like attempts by a semi-adversarial brain to hack that mind.)
Another example:
Alex has some shitty experiences in childhood, and strongly internalizes a schema S like "if I do X, I will be safe", and thereafter has strong yumminess feelings about doing X. But later, upon reflection, Alex realizes that the yumminess feelings are coming from S, and that S's implicit models of reality aren't even remotely accurate now, in adulthood. Alex would like to delete S from their brain, but can't. So the strong yumminess-around-X persists. Is X one of Alex's values?
So, I object to what I perceive to be an attempt to promote a narrative/frame about what constitutes "you/I/me" or "your values" for people in general. (Though I'm guessing there was no malice involved in promoting it.) Especially when it is a frame that seems to imply that many people (as they conceive of themselves) are not really/fully persons, and/or that they should let arbitrary brain-circuits corrupt their souls (if those brain-circuits happen to have the ability to produce feelings of yumminess).
Please be more careful about deploying/rolling your own metaethics.
Maybe that "I" could be described as a learned mesaoptimizer, something that arose "unintentionally" from the perspective of some imaginable/nonexistent Evolution-aligned mind-designer. But so what? Why privilege some imaginary Evolution fairy over an actually existing person/mind? ↩︎
I think some of the central models/advice in this post [1] are in an uncanny valley of being substantially correct but also deficient, in ways that are liable to lead some users of the models/advice to harm themselves. (In ways distinct from the ones addressed in the post under admonishments to "not be an idiot".)
In particular, I'm referring to the notion that
The Yumminess You Feel When Imagining Things Measures Your Values
I agree that "yumminess" is an important signal about one's values. And something like yumminess, or built-in reward signals, is what shapes one's values to begin with. But there are some further important points to consider. Notably: Some values are more abstract than others[2]; values differ a lot in terms of
Also, we are computationally limited meat-bags, sorely lacking in the logical omniscience department.
This has some consequences:
Which in turn raises questions like
The endeavor of answering the above kinds of questions --- determining how to resolve the "shoulds" in them --- is itself value-laden, and also self-referential/recursive, since the answer depends on our meta-values, which themselves are values to which the questions apply.
Doing that properly can get pretty complicated pretty fast, not least because doing so may require Tabooing "I/me" and dissecting the various constituent parts of one's own mind down to a level where introspective access (and/or understanding of how one's own brain works) becomes a bottleneck.[7]
But in conclusion: I'm pretty sure that simply following the most straightforward interpretation of
The Yumminess You Feel When Imagining Things Measures Your Values
would probably lead to doing some kind of violence to one's own values, to gradually corrupting[8] oneself, possibly without ever realizing it or feeling bad at any point. The probable default being "might makes right" / letting the more obvious-to-S1 values eat up ever more of one's soul, at the expense of one's more abstract values.
Addendum: I'd maybe replace
The Yumminess You Feel When Imagining Things Measures Your Values
with
The Yumminess You Feel When Imagining Things is evidence about how some parts of your brain value the imagined things, to the extent that your imagination adequately captured all relevant aspects of those things.
or, the models/advice many readers might (more or less (in)correctly) construe from this post ↩︎
Examples of abstract values: "being logically consistent", "being open-minded/non-parochial", "biting philosophical bullets", "taking ideas seriously", "valuing minds independently of the substrate they're running on". ↩︎
To give one example: Acting without adequately accounting for scope insensitivity. ↩︎
Because S1 yumminess-detectors don't grok the S2 reasoning required to understand that a goal scores highly according to the abstract value, so pursuing the goal feels unrewarding. ↩︎
Example: wanting heroin, vs wanting to not want heroin. ↩︎
Depends on (i.a.) the extent to which we value "being the kind of person I would be if my brain weren't so computationally limited/stupid", I guess. ↩︎
IME. YMMV. ↩︎
as judged by a more careful, reflective, and less computationally limited extrapolation of one's current values ↩︎
So what do you do about the growing aversion to information which is unpleasant to learn? This list is incomplete, and I'd appreciate your help in expanding it.
The underlying problem seems to be something like "System 1 fails to grok that the Map is not the Territory". So the solution would likely be something that helps S1 grok that.
Possibly helpful things:
- Imagine, in as much concrete/experiential detail as possible, the four worlds corresponding to "unpleasant thing is true/false" × "I do/don't believe the thing". Or at least the world where "unpleasant thing is true but I don't believe it".
Dominic Cummings (former Chief Adviser to the UK PM) has written some things about nuclear strategy and how it's implemented in practice. IIUC, he's critical of (i.a.) how Schelling et al.'s game-theoretic models are (often naively/blindly) applied to the real world.