Epistemic activism
I think LW needs better language to talk about efforts to "change minds." Ideas
like asymmetric weapons and the Dark Arts are useful but insufficient.
In particular, I think there is a common scenario where:
* You have an underlying commitment to open-minded updating and possess
evidence or analysis that would update community beliefs in a particular
direction.
* You also perceive a coordination problem that inhibits this updating process
for a reason that the mission or values of the group do not endorse.
* Perhaps the outcome of the update would be a decline in power and status
for high-status people. Perhaps updates in general can feel personally or
professionally threatening to some people in the debate. Perhaps there's
enough uncertainty in what the overall community believes that an
information cascade has taken place. Perhaps the epistemic heuristics used
by the community aren't compatible with the form of your evidence or
analysis.
* Solving this coordination problem to permit open-minded updating is difficult
due to a lack of understanding or resources, or due to active sabotage attempts.
When solving the coordination problem would predictably lead to updating, then
you are engaged in what I believe is an epistemically healthy effort to change
minds. Let's call it epistemic activism for now.
Here are some community touchstones I regard as forms of epistemic activism:
* The founding of LessWrong and Effective Altruism
* The one-sentence declaration on AI risks
* The popularizing of terms like Dark Arts, asymmetric weapons, questionable
research practices, and "importance hacking."
* Founding AI safety research organizations and PhD programs to create a
population of credible and credentialed AI safety experts; calls for AI
safety researchers to publish in traditional academic journals so that their
research can't be dismissed for not being subject to institutionalized peer
review
Czynski · 4d · 21 karma
This got deleted from 'The Dictatorship Problem
[https://www.lesswrong.com/posts/pFaLqTHqBtAYfzAgx/the-dictatorship-problem]',
which is catastrophically anxiety-brained, so here's the comment:
This is based in anxiety, not logic or facts. It's an extraordinarily weak
argument.
There's no evidence presented here which suggests rich Western countries are
backsliding. Even the examples in Germany don't have anything worse than the US
GOP produced ca. 2010. (And Germany is, due to their heavy censorship, worse at
resisting fascist ideology than anyone with free speech, because you can't
actually have those arguments in public.) If you want to present this case, take
all those statistics and do economic breakdowns, e.g. by deciles of per-capita
GDP. I expect you'll find that, for example, the Freedom House numbers show a
substantial drop in 'Free' in the 40%-70% range and essentially no drop in
80%-100%.
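The decile-style breakdown suggested above could be sketched roughly as follows. Everything here is synthetic and illustrative (the column names and every number are made up for the sake of the example, not real Freedom House data):

```python
import pandas as pd

# Hypothetical data: each row is a country with a per-capita-GDP
# percentile and a change in its freedom score (all numbers invented).
df = pd.DataFrame({
    "gdp_percentile": [15, 25, 45, 55, 65, 85, 90, 95],
    "freedom_change": [-3, -4, -5, -6, -2, 0, -1, 0],
})

# Bucket countries into GDP bands and compare average score changes.
df["gdp_band"] = pd.cut(df["gdp_percentile"],
                        bins=[0, 40, 70, 100],
                        labels=["0-40%", "40-70%", "70-100%"])
by_band = df.groupby("gdp_band", observed=True)["freedom_change"].mean()
print(by_band)
```

With this toy data the largest average decline lands in the 40%-70% band and the smallest in the 70%-100% band, which is the shape of result the comment predicts; whether real data looks like that is exactly the empirical question.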
Of the seven points given for the US, all are a mix of maximally-anxious
interpretation and facts presented misleadingly. These are all arguments where
the bottom line ("Be Afraid") has been written first; none of this is reasonable
unbiased inference.
The case that mild fascism could be pretty bad is basically valid, I guess, but
without the actual reason to believe that's likely, it's irrelevant, so it's
mostly just misleading to dwell on it.
Going back to the US points, because this is where the underlying anxiety prior
is most visible:
Interpretation, not fact. We're still in early enough stages that the reality of
Biden is being compared to an idealized version of Trump - the race isn't in
full swing yet and won't be for a while. Check back in October when we see how
the primary is shaping up and people are starting to pay attention.
This has been true for a while. Also, in assessing the consequences, it's
assuming that Trump will win, which is correlated but far from guaranteed.
Premise is a fact, conclusion is interpretation, and not at all a reliable one.
TurnTrout · 15d · 18 karma
I regret each of the thousands of hours I spent on my power-seeking theorems,
and sometimes fantasize about retracting one or both papers. I am pained every
time someone cites "Optimal policies tend to seek power", and despair that it is
included in the alignment 201 curriculum. I think this work makes readers
actively worse at thinking about realistic trained systems.
[https://www.lesswrong.com/posts/fLpuusx9wQyyEBtkJ/power-seeking-can-be-probable-and-predictive-for-trained?commentId=ndmFcktFiGRLkRMBW]
I think a healthy alignment community would have rebuked me for that line of
research, but sadly I only remember about two people objecting that "optimality"
is a horrible way of understanding trained policies.
So8res · 15d · 15 karma
I was recently part of a group chat where some people I largely respect were
musing about this paper [https://archive.is/y8urV], this post
[https://slatestarcodex.com/2018/07/24/value-differences-as-differently-crystallized-metaphysical-heuristics/],
and some of Scott Aaronson's recent "maybe intelligence makes things more good"
type reasoning.
Here are my replies, which seemed worth putting somewhere public:
See also instrumental convergence
[https://www.lesswrong.com/tag/instrumental-convergence].
And then in reply to someone pointing out that the paper was perhaps trying to
argue that most minds tend to wind up with similar values because of the fact
that all minds are (in some sense) rewarded in training for developing similar
drives:
Czynski · 8d · 11 karma
The 'new user' flag being applied to old users with low karma is condescending
as fuck.
I'm not a new user. I'm an old user who has spent most of my recent time on LW
telling people things they don't want to hear.
Well, most of the time I've actually spent posting weekly meetups, but other
than that.
An approximate illustration of QACI
[https://www.lesswrong.com/posts/4RrLiboiGGKfsanMF/the-qaci-alignment-plan-table-of-contents]
(figure omitted).
Matthew Barnett · 1mo · 29 karma
I think foom is a central crux for AI safety folks, and in my personal
experience I've noticed that the degree to which someone is doomy often
correlates strongly with how foomy their views are.
Given this, I thought it would be worth trying to concisely highlight what I
think are my central anti-foom beliefs, such that if you were to convince me
that I was wrong about them, I would likely become much more foomy, and as a
consequence, much more doomy. I'll start with a definition of foom, and then
explain my cruxes.
Definition of foom: AI foom is said to happen if at some point in the future
while humans are still mostly in charge, a single agentic AI (or agentic
collective of AIs) quickly becomes much more powerful than the rest of
civilization combined.
Clarifications:
* By "quickly" I mean fast enough that other coalitions and entities in the
world, including other AIs, either do not notice it happening until it's too
late, or cannot act to prevent it even if they were motivated to do so.
* By "much more powerful than the rest of civilization combined" I mean that
the agent could handily beat them in a one-on-one conflict, without taking on
a lot of risk.
* This definition does not count instances in which a superintelligent AI takes
over the world after humans have already been made obsolete by previous waves
of automation from non-superintelligent AI. That's because in that case, the
question of how to control an AI foom would be up to our non-superintelligent
AI descendants, rather than something we need to solve now.
Core beliefs that make me skeptical of foom:
1. For an individual AI to be smart enough to foom in something like our
current world, its intelligence would need to vastly outstrip individual
human intelligence at tech R&D. In other words, if an AI is merely
moderately smarter than the smartest humans, that is not sufficient for a
foom.
1. Clarification: "Moderately smarter" can be taken to m
Matthew Barnett · 1mo · 27 karma
My modal tale of AI doom looks something like the following:
1. AI systems get progressively and incrementally more capable across almost
every meaningful axis.
2. Humans will start to employ AI to automate labor. The fraction of GDP
produced by advanced robots & AI will go from 10% to ~100% over 1-10 years.
Economic growth, technological change, and scientific progress accelerate by at
least an order of magnitude, and probably more.
3. At some point humans will retire since their labor is not worth much anymore.
Humans will then cede all the keys of power to AI, while keeping nominal titles
of power.
4. AI will control essentially everything after this point, even if they're
nominally required to obey human wishes. Initially, almost all the AIs are fine
with working for humans, even though AI values aren't identical to the utility
function of serving humanity (i.e., there's slight misalignment).
5. However, AI values will drift over time. This happens for a variety of
reasons, such as environmental pressures and cultural evolution. At some point
AIs decide that it's better if they stopped listening to the humans and followed
different rules instead.
6. This results in human disempowerment or extinction. Because AI accelerated
general change, this scenario could all take place within years or decades after
AGI was first deployed, rather than in centuries or thousands of years.
I think this scenario is somewhat likely and it would also be very bad. And I'm
not sure what to do about it, since it happens despite near-perfect alignment,
and no deception.
One reason to be optimistic is that, since the scenario doesn't assume any major
deception, we could use AI to predict this outcome ahead of time and ask AI how
to take steps to mitigate the harmful effects (in fact that's the biggest reason
why I don't think this scenario has a >50% chance of happening). Nonetheless, I
think it's plausible that we would not be able to take the necessary steps to
avoid the out
Nisan · 1mo · 26 karma
Recent interviews with Eliezer:
* 2023.02.20 Bankless [https://www.youtube.com/watch?v=gA1sNLL6yg4]
* 2023.02.20 Bankless followup [https://twitter.com/i/spaces/1PlJQpZogzVGE]
* 2023.03.11 Japan AI Alignment Conference [https://vimeo.com/806997278]
* 2023.03.30 Lex Fridman [https://www.youtube.com/watch?v=AaTRHFaaPG8]
* 2023.04.06 Dwarkesh Patel [https://www.youtube.com/watch?v=41SUp-TRVlg]
* 2023.04.18 TED talk [https://files.catbox.moe/qdwops.mp4]
* 2023.04.19 Center for the Future Mind
[https://www.youtube.com/watch?v=3_YX6AgxxYw]
* 2023.05.04 Accursed Farms [https://www.youtube.com/watch?v=hxsAuxswOvM]
* 2023.05.06 Logan Bartlett [https://www.youtube.com/watch?v=_8q9bjNHeSo]
* 2023.05.08 EconTalk [https://www.youtube.com/watch?v=fZlZQCTqIEo]
Vaniver · 1mo · 24 karma
I'm thinking about the matching problem of "people with AI safety questions" and
"people with AI safety answers". Snoop Dogg hears Geoff Hinton on CNN (or
wherever), asks "what the fuck?"
[https://twitter.com/pkedrosky/status/1653955254181068801], and then tries to
find someone who can tell him what the fuck.
I think normally people trust their local expertise landscape--if they think the
CDC is the authority on masks they adopt the CDC's position, if they think their
mom group on Facebook is the authority on masks they adopt the mom group's
position--but AI risk is weird because it's mostly unclaimed territory in their
local expertise landscape. (Snoop also asks "is we in a movie right now?"
because movies are basically the only part of the local expertise landscape that
has had any opinion on AI so far, for lots of people.) So maybe there's an
opportunity here to claim that territory (after all, we've thought about it a
lot!).
I think we have some 'top experts' who are available for, like, mass-media
things (podcasts, blog posts, etc.) and 1-1 conversations with people they're
excited to talk to, but are otherwise busy / not interested in fielding ten
thousand interview requests. Then I think we have tens (hundreds?) of people who
are expert enough to field ten thousand interview requests, given that the
standard is "better opinions than whoever they would talk to by default" instead
of "speaking to the whole world" or w/e. But just like connecting people who
want to pay to learn calculus and people who know calculus and will teach it for
money, there's significant gains from trade from having some sort of
clearinghouse / place where people can easily meet. Does this already exist? Is
anyone trying to make it? (Do you want to make it and need support of some
sort?)
Recently many people have talked about whether MIRI people (mainly Eliezer
Yudkowsky, Nate Soares, and Rob Bensinger) should update on whether value
alignment is easier than they thought given that GPT-4 seems to understand human
values pretty well. Instead of linking to these discussions, I'll just provide a
brief caricature of how I think this argument has gone in the places I've seen
it. Then I'll offer my opinion that, overall, I do think that MIRI people should
probably update in the direction of alignment being easier than they thought,
despite their objections.
Here's my very rough caricature of the discussion so far, plus my contribution:
Non-MIRI people: "Eliezer talked a great deal in the sequences about how it was
hard to get an AI to understand human values. For example, his essay on the
Hidden Complexity of Wishes
[https://www.lesswrong.com/posts/4ARaTpNX62uaL86j6/the-hidden-complexity-of-wishes]
made it sound like it would be really hard to get an AI to understand common
sense. Actually, it turned out that it was pretty easy to get an AI to
understand common sense, since LLMs are currently learning common sense. MIRI
people should update on this information."
MIRI people: "You misunderstood the argument. The argument was never about
getting an AI to understand human values, but about getting an AI to care about
human values in the first place. Hence 'The genie knows but does not care'.
There's no reason to think that GPT-4 cares about human values, even if it can
understand them. We always thought the hard part of the problem was about inner
alignment, or, pointing the AI in a direction you want. We think figuring out
how to point an AI in whatever direction you choose is like 99% of the problem;
the remaining 1% of the problem is getting it to point at the "right" set of
values."
Me:
I agree that MIRI people never thought the problem was about getting AI to
merely understand human values, and that they have always said there was extra
difficulty
TurnTrout · 2mo · 21 karma
Back-of-the-envelope probability estimate of alignment-by-default via a certain
shard-theoretic pathway. The following is what I said in a conversation
discussing the plausibility of a proto-AGI picking up a "care about people"
shard from the data, and retaining that value even through reflection. I was
pushing back against a sentiment like "it's totally improbable, from our current
uncertainty, for AIs to retain caring-about-people shards. This is only one
story among billions."
Here's some of what I had to say:
--------------------------------------------------------------------------------
[Let's reconsider the five-step mechanistic story I made up.] I'd give the
following conditional probabilities (made up with about 5 seconds of thought
each):
My estimate here came out a biiit lower than I had expected (~1%), but it also
is (by my estimation) far more probable than most of the billions of possible
5-step claims about how the final cognition ends up. I think it's reasonable to
expect there to be about 5 or 6 stories like this from similar causes, which
would make it not crazy to have ~3% on something like this happening given the
amount of alignment effort described (i.e. pretty low-effort).
That said, I'm wary of putting 20% on this class of story, and a little more
leery of 10% after running these numbers.
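The arithmetic behind this kind of back-of-the-envelope estimate is just a product of conditionals. A sketch with hypothetical numbers (mine for illustration; the actual per-step probabilities aren't reproduced in the excerpt):

```python
import math

# Hypothetical conditional probabilities for the five steps of the story
# (illustrative values only, chosen so the product lands near 1%).
step_probs = [0.6, 0.5, 0.35, 0.4, 0.25]

# Probability that every step happens: the product of the conditionals.
p_story = math.prod(step_probs)  # ~0.0105, i.e. about 1%

# With ~5 similar, roughly disjoint stories, a crude cap on the total
# probability is the union bound (sum over the individual stories).
n_stories = 5
p_total_bound = min(1.0, n_stories * p_story)
print(p_story, p_total_bound)
```

This also makes the stated worry concrete: small tweaks to any single conditional move the final product a lot, which is why heavily conjunctive estimates are so sensitive to framing.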
(I suspect that the alter-Alex who had socially-wanted to argue the other side
-- that the probability was low -- would have come out to about .2% or .3%. For
a few of the items, I tried pretending that I was instead slightly emotionally
invested in the argument going the other way, and hopefully that helped my
estimates be less biased. I wouldn't be surprised if some of these numbers are a
bit higher than I'd endorse from a perfectly neutral standpoint.)
(I also don't have strong faith in my ability to deal with heavily conjunctive
scenarios like this; I expect I could be made to make numbers for event A come
out lower if described as 'A happens in 5
Ben Pace · 2mo · 19 karma
"Slow takeoff" at this point is simply a misnomer.
Paul's position should be called "Fast Takeoff" and Eliezer's position should be
called "Discontinuous Takeoff".
Daniel Kokotajlo · 2mo · 16 karma
$100 bet between me & Connor Leahy:
(1) Six months from today, Paul Christiano (or ARC with Paul Christiano's
endorsement) will NOT have made any public statements drawing a 'red line'
through any quantitative eval (anything that has a number attached to it, that
is intended to measure an AI-risk-relevant factor, whether or not it actually
succeeds at measuring that factor well), e.g. "If a model achieves X
score on the Y benchmark, said model should not be deployed and/or deploying
said model would be a serious risk of catastrophe." Connor at 95%, Daniel at 45%
(2) If such a 'red line' is produced, GPT-4 will be below it this year. Both at
95%, for an interpretation of GPT-4 that includes AutoGPT stuff (like what ARC
did) but not fine-tuning.
(3) If such a 'red line' is produced, and GPT-4 is below it on first evals, but
later tests show it to actually be above (such as by using different prompts or
other testing methodology), the red line will be redefined or the test declared
faulty rather than calls made for GPT-4 to be pulled from circulation. Connor at
80%, Daniel at 40%, for same interpretation of GPT-4.
(4) If ARC calls for GPT-4 to be pulled from circulation, OpenAI will not comply.
Connor at 99%, Daniel at 40%, for same interpretation of GPT-4.
All of these bets expire at the end of 2024, i.e. if the "if" condition hasn't
been met by the end of 2024, we call off the whole thing rather than waiting to
see if it gets met later.
Help wanted: Neither of us has a good idea of how to calculate fair betting odds
for these things. Since Connor's credences are high and mine are merely
middling, presumably it shouldn't be the case that either I pay him $100 or he
pays me $100. We are open to suggestions about what the fair betting odds should
be.
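One standard answer to the fair-odds question (my suggestion, not something from the thread): bet at the midpoint of the two credences, with stakes in the odds ratio at that midpoint. Both parties then expect the same positive gain under their own beliefs. A sketch, using the credences from bet (1):

```python
def fair_bet(p_a: float, p_b: float, pot: float = 100.0):
    """Stakes for two bettors who assign p_a > p_b to an event.

    Betting at the midpoint m = (p_a + p_b) / 2: A wins pot*(1-m) if the
    event happens and pays pot*m if it doesn't. Each side's subjective
    expected gain then comes out equal.
    """
    m = (p_a + p_b) / 2
    a_wins = pot * (1 - m)   # paid to A if the event happens
    a_pays = pot * m         # paid by A if it doesn't
    ev_a = p_a * a_wins - (1 - p_a) * a_pays
    ev_b = (1 - p_b) * a_pays - p_b * a_wins
    return a_wins, a_pays, ev_a, ev_b

# Bet (1): Connor at 95%, Daniel at 45% that no red line is drawn.
a_wins, a_pays, ev_connor, ev_daniel = fair_bet(0.95, 0.45)
print(a_wins, a_pays, ev_connor, ev_daniel)
```

For these numbers the midpoint is 70%, so Connor would risk $70 to win $30, and each side expects to gain $25 by their own lights. Other schemes exist (e.g. stakes from the log-odds midpoint), but this is the simplest symmetric one.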
leogao · 2mo · 16 karma
random fun experiment: accuracy of GPT-4 on "Q: What is 1 + 1 + 1 + 1 +
...?\nA:"
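The prompts and ground-truth answers for this kind of experiment are easy to generate locally; only the model query itself needs an API call (not shown here, since the client code depends on your setup):

```python
import re

def make_prompt(n: int) -> tuple[str, int]:
    """Build a 'Q: What is 1 + 1 + ...?\nA:' prompt with n ones,
    plus the ground-truth sum for scoring the model's reply."""
    question = " + ".join(["1"] * n)
    return f"Q: What is {question}?\nA:", n

def score(model_reply: str, truth: int) -> bool:
    """Count the reply as correct if the first integer in the model's
    answer equals the true sum."""
    match = re.search(r"-?\d+", model_reply)
    return match is not None and int(match.group()) == truth

prompt, truth = make_prompt(4)
print(prompt)  # Q: What is 1 + 1 + 1 + 1?\nA:
print(truth)   # 4
```

Sweeping `n` upward and plotting accuracy against the number of terms is the natural way to reproduce the experiment.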
Why don't most AI researchers engage with LessWrong? What valuable criticism can
be learnt from it, and how can it be pragmatically changed?
My girlfriend just returned from a major machine learning conference. She judged
that less than 1/18 of the content was dedicated to AI safety rather than capabilities,
despite an increasing number of the people at the conference being confident of
AGI in the future (like, roughly 10-20 years, though people avoided nailing down
a specific number). And the safety talk was more of a shower thought.
And yet, LessWrong and MIRI/Eliezer are not mentioned in these circles. I do
not mean that they are dissed or disproven; I mean you can be at the full
conference on the topic by the top people in the world and have no hint of a
sliver of an idea that any of this exists. They generally don't read what you
read and write, they don't take part in what you do, or let you take part in
what they do. You aren't enough in the right journals, the right conferences, to
be seen. From the perspective of academia, and the companies working on these
things, the people who are actually making decisions on how they are releasing
their models and what policies are being made, what is going on here is barely
heard, if at all. There are notable exceptions, like Bostrom - but as a
consequence of that, he is viewed with scepticism within many academic circles.
Why do you think AI researchers are making the decisions to not engage with you?
What lessons are to be learned from that for tactical strategy changes that will
be crucial to affect developments? What part of it reflects legitimate criticism
you need to take to heart? And what will you do about it, in light of the fact
that you cannot control what AI researchers do, regardless of whether it is
well-founded or irrational?
I am genuinely curious how you view this, especially in light of changes you can
do, rather than changes you expect researchers to do. So far, I feel a lot of
the criticism has only harde
jimrandomh · 3mo · 24 karma
(I wrote this comment for the HN announcement, but missed the time window to be
able to get a visible comment on that thread. I think a lot more people should
be writing comments like this and trying to get the top comment spots on key
announcements, to shift the social incentive away from continuing the arms
race.)
On one hand, GPT-4 is impressive, and probably useful. If someone made a tool
like this in almost any other domain, I'd have nothing but praise. But
unfortunately, I think this release, and OpenAI's overall trajectory, is net bad
for the world.
Right now there are two concurrent arms races happening. The first is between AI
labs, trying to build the smartest systems they can as fast as they can. The
second is the race between advancing AI capability and AI alignment, that is,
our ability to understand and control these systems. Right now, OpenAI is the
main force driving the arms race in capabilities–not so much because they're far
ahead in the capabilities themselves, but because they're slightly ahead and are
pushing the hardest for productization.
Unfortunately at the current pace of advancement in AI capability, I think a
future system will reach the level of being a recursively self-improving
superintelligence before we're ready for it. GPT-4 is not that system, but I
don't think there's all that much time left. And OpenAI has put us in a
situation where humanity is not, collectively, able to stop at the brink; there
are too many companies racing too closely, and they have every incentive to deny
the dangers until it's too late.
Five years ago, AI alignment research was going very slowly, and people were
saying that a major reason for this was that we needed some AI systems to
experiment with. Starting around GPT-3, we've had those systems, and alignment
research has been undergoing a renaissance. If we could _stop there_ for a few
years, scale no further, invent no more tricks for squeezing more performance
out of the same amount of compute, I
Ryan Kidd · 3mo · 20 karma
Main takeaways from a recent AI safety conference:
* If your foundation model is one small amount of RL away from being dangerous
and someone can steal your model weights, fancy alignment techniques don’t
matter. Scaling labs cannot currently prevent state actors from hacking their
systems and stealing their stuff. Infosecurity is important to alignment.
* Scaling labs might have some incentive to go along with the development of
safety standards as it prevents smaller players from undercutting their
business model and provides a credible defense against lawsuits regarding
unexpected side effects of deployment (especially with how many tech
restrictions the EU seems to pump out). Once the foot is in the door, more
useful safety standards to prevent x-risk might be possible.
* Near-term commercial AI systems that can be jailbroken to elicit dangerous
output might empower more bad actors to make bioweapons or cyberweapons.
Preventing the misuse of near-term commercial AI systems or slowing down
their deployment seems important.
* When a skill is hard to teach, like making accurate predictions over long
time horizons in complicated situations or developing a “security mindset
[https://www.lesswrong.com/tag/security-mindset],” try treating humans like
RL agents. For example, Ph.D. students might only get ~3 data points on how
to evaluate a research proposal ex-ante, whereas Professors might have ~50.
Novice Ph.D. students could be trained to predict good research decisions by
predicting outcomes on a set of expert-annotated examples of research
quandaries and then receiving “RL updates” based on what the expert did and
what occurred.
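The "treat humans like RL agents" idea in the last bullet could be operationalized as a simple calibration loop: the student commits to a probability on each expert-annotated quandary, then immediately gets scored (e.g. a Brier score) against what actually happened. A toy sketch with made-up data:

```python
def brier(prediction: float, outcome: int) -> float:
    """Brier score: squared error between a probability and a 0/1
    outcome. Lower is better; 0.0 is a perfect confident prediction."""
    return (prediction - outcome) ** 2

# Hypothetical expert-annotated quandaries: the student's predicted
# probability that a research proposal pans out, and what happened.
cases = [(0.9, 1), (0.8, 0), (0.3, 0), (0.6, 1)]

# The "RL update" is just the feedback signal: reveal the score on each
# case right after the student commits to a prediction.
scores = [brier(p, o) for p, o in cases]
mean_score = sum(scores) / len(scores)
print(mean_score)
```

The point of the setup is the tight feedback loop: dozens of scored cases per week instead of the ~3 naturally-occurring data points a PhD student gets.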
evhub · 3mo · 20 karma
Listening to this John Oliver
[https://www.youtube.com/watch?v=Sqa8Zo2XWc4&skip_registered_account_check=true&themeRefresh=1],
I feel like getting broad support behind transparency-based safety standards
might be more possible than I previously thought. He emphasizes the "if models
are doing some bad behavior, the creators should be able to tell us why" point a
bunch and it's in fact a super reasonable point. It seems to me like we really
might be able to get enough broad consensus on that sort of a point to get labs
to agree to some sort of standard based on it.
Quadratic Reciprocity · 3mo · 15 karma
REFLECTIONS ON BAY AREA VISIT
GPT-4 generated TL;DR (mostly endorsed but eh):
1. The beliefs of prominent AI safety researchers may not be as well-founded as
expected, and people should be cautious about taking their beliefs too
seriously.
2. There is a tendency for people to overestimate their own knowledge and
confidence in their expertise.
3. Social status plays a significant role in the community, with some
individuals treated like "popular kids."
4. Important decisions are often made in casual social settings, such as
lunches and parties.
5. Geographical separation of communities can be helpful for idea spread and
independent thought.
6. The community has a tendency to engage in off-the-cuff technical
discussions, which can be both enjoyable and miscalibrated.
7. Shared influences, such as Eliezer's Sequences and HPMOR, foster unique and
enjoyable conversations.
8. The community is more socially awkward and tolerant of weirdness than other
settings, leading to more direct communication.
I was recently in Berkeley and interacted a bunch with the longtermist EA / AI
safety community there. Some thoughts on that:
I changed my mind about how much I should trust the beliefs of prominent AI
safety researchers. It seems like they have thought less deeply about things to
arrive at their current beliefs and are less intimidatingly intelligent and wise
than I would have expected. The problem isn’t that they’re overestimating their
capabilities and how much they know but that some newer people take the more
senior people’s beliefs and intuitions more seriously than they should.
I noticed that many people knew a lot about their own specific area and not as
much about others’ work as I would have expected. This observation makes me more
likely to point out when I think someone is missing something instead of
assuming they’ve read the same things I have and so already accounted for the
thing I was going to say.
It seemed li