I have some long comments I can't refind now (weirdly) about the difficulty of
investing based on AI beliefs (or forecasting in general): similar to catching
falling knives, timing is all-important and yet usually impossible to nail down
accurately; specific investments are usually impossible if you aren't literally
founding the company, and indexing 'the entire sector' is definitely impossible.
Even if you had an absurd amount of money, you could try to index and just plain
fail - there is no index which covers, say, OpenAI.
Apropos, Matt Levine
[https://www.bloomberg.com/opinion/articles/2023-07-03/insider-trading-is-better-from-home]
comments on one attempt to do just that:
This is especially funny because it also illustrates timing problems:
Oops.
Oops.
Also, people are quick to tell you how easy it is to make money: just follow
$PROVERB; after all, markets aren't efficient, amirite? So, in the AI bubble,
surely the right thing is to ignore the AI companies who 'have no moat' and
instead focus on the downstream & incumbent users, and invest in companies like
Nvidia ('sell pickaxes in a gold rush, it's guaranteed!'):
Oops.
tldr: Investing is hard; in the future, even more so.
30 · Elizabeth · 15d
EA/rationality has this tension between valuing independent thought, and the
fact that most original ideas are stupid. But the point of independent thinking
isn't necessarily coming up with original conclusions. It's that no one else can
convey their models fully, so if you want a model with fully fleshed-out gears,
you have to develop it yourself.
23 · Elizabeth · 3d
ooooooh actual Hamming spent 10s of minutes asking people about the most
important questions in their field and helping them clarify their own judgment,
before asking why they weren't working on this thing they clearly valued and
spent time thinking about. That is pretty different from demanding strangers at
parties justify why they're not working on your pet cause.
20 · LoganStrohl · 10d
I had a baby on June 20th. I wrote a whole bunch of stuff about what it was like
for me to give birth at home without pain medication. I've just published it all
to my website, [https://www.loganstrohl.com/birth] along with photos and
videos.
CN: If you click on "words", you won't see anybody naked. If you click on
"photos" or "videos", you will see me very extra naked.
The "words" section includes a birth story, followed by a Q&A section with
things like "What do contractions feel like?", "How did you handle the pain?",
and "How did you think about labor, going into it?". There's also a bit at the
very bottom of the page where you can submit more questions, though of course
you're also welcome to ask me stuff here.
15 · Elizabeth · 7d
Much has been written about how groups tend to get more extreme over time. This
is often based on evaporative cooling, but I think there's another factor: it's
the only way to avoid the geeks->mops->sociopaths death spiral.
An EA group of 10 people would really benefit from one of those people being
deeply committed to helping people but hostile to the EA approach, and another
person who loves spreadsheets but is indifferent to what they're applied to. But
you can only maintain the ratio that finely when you're very small. Eventually
you need to decide if you're going to ban scope-insensitive people or allow
infinitely many, and lose what makes your group different.
"Decide" may mean consciously choose an explicit policy, but it might also mean
gradually cohere around some norms. The latter is more fine-tuned in some ways
but less in others.
I regret each of the thousands of hours I spent on my power-seeking theorems,
and sometimes fantasize about retracting one or both papers. I am pained every
time someone cites "Optimal policies tend to seek power", and despair that it is
included in the alignment 201 curriculum. I think this work makes readers
actively worse at thinking about realistic trained systems.
[https://www.lesswrong.com/posts/fLpuusx9wQyyEBtkJ/power-seeking-can-be-probable-and-predictive-for-trained?commentId=ndmFcktFiGRLkRMBW]
I think a healthy alignment community would have rebuked me for that line of
research, but sadly I only remember about two people objecting that "optimality"
is a horrible way of understanding trained policies.
27 · Holly_Elmore · 1mo
For years, I’ve been worried that we were doomed to die by misaligned AGI
because alignment wasn’t happening fast enough or maybe because it wouldn’t work
at all. Since I didn’t have the chops to do technical alignment and I didn’t
think there was another option, I was living my life for the worlds where
disaster didn’t happen (or hadn’t happened yet) and trying to make them better
places. The advent of AI Pause as an option— something the public and government
might actually hear and act on— has been extremely hopeful for me. I’ve quit my
job in animal welfare to devote myself to it.
So I’m confused by the reticence I’m seeing toward Pause from people who, this
time last year, were reconciling themselves to “dying with dignity”. Some people
think the Pause would somehow accelerate capabilities or make gains on
capabilities, which at least make sense to me as a reason not to support it. But
I’ve gotten responses that make no sense to me like “every day we wait to make
AGI, more galaxies are out of our reach forever”. More than one person has said
to me that they are worried that “AGI will never get built” if a pause is
successful. (For the record I think it is very unlikely that humanity will not
eventually make AGI at this point unless another catastrophe sets back
civilization.) This is sometimes coming from the same people who were mourning
our species’ extinction as just a matter of time before the Pause option arose.
I keep hearing comparisons to nuclear power and ridicule of people who were
overcautious about new technology.
What gives? If you’re not excited about Pause, can you tell me why?
25 · DirectedEvolution · 1mo
Epistemic activism
I think LW needs better language to talk about efforts to "change minds." Ideas
like asymmetric weapons and the Dark Arts are useful but insufficient.
In particular, I think there is a common scenario where:
* You have an underlying commitment to open-minded updating and possess
evidence or analysis that would update community beliefs in a particular
direction.
* You also perceive a coordination problem that inhibits this updating process
for a reason that the mission or values of the group do not endorse.
* Perhaps the outcome of the update would be a decline in power and status
for high-status people. Perhaps updates in general can feel personally or
professionally threatening to some people in the debate. Perhaps there's
enough uncertainty in what the overall community believes that an
information cascade has taken place. Perhaps the epistemic heuristics used
by the community aren't compatible with the form of your evidence or
analysis.
* Solving this coordination problem to permit open-minded updating is difficult
due to a lack of understanding or resources, or because of sabotage attempts.
When solving the coordination problem would predictably lead to updating, then
you are engaged in what I believe is an epistemically healthy effort to change
minds. Let's call it epistemic activism for now.
Here are some community touchstones I regard as forms of epistemic activism:
* The founding of LessWrong and Effective Altruism
* The one-sentence declaration on AI risks
* The popularizing of terms like Dark Arts, asymmetric weapons, questionable
research practices, and "importance hacking."
* Founding AI safety research organizations and PhD programs to create a
population of credible and credentialed AI safety experts; calls for AI
safety researchers to publish in traditional academic journals so that their
research can't be dismissed for not being subject to institutionalized peer
review
22 · Czynski · 1mo
This got deleted from 'The Dictatorship Problem
[https://www.lesswrong.com/posts/pFaLqTHqBtAYfzAgx/the-dictatorship-problem]',
which is catastrophically anxiety-brained, so here's the comment:
This is based in anxiety, not logic or facts. It's an extraordinarily weak
argument.
There's no evidence presented here which suggests rich Western countries are
backsliding. Even the examples in Germany don't have anything worse than the US
GOP produced ca. 2010. (And Germany is, due to their heavy censorship, worse at
resisting fascist ideology than anyone with free speech, because you can't
actually have those arguments in public.) If you want to present this case, take
all those statistics and do economic breakdowns, e.g. by deciles of per-capita
GDP. I expect you'll find that, for example, the Freedom House numbers show a
substantial drop in 'Free' in the 40%-70% range and essentially no drop in
80%-100%.
Of the seven points given for the US, all are a mix of maximally-anxious
interpretation and facts presented misleadingly. These are all arguments where
the bottom line ("Be Afraid") has been written first; none of this is reasonable
unbiased inference.
The case that mild fascism could be pretty bad is basically valid, I guess, but
without the actual reason to believe that's likely, it's irrelevant, so it's
mostly just misleading to dwell on it.
Going back to the US points, because this is where the underlying anxiety prior
is most visible:
Interpretation, not fact. We're still in early enough stages that the reality of
Biden is being compared to an idealized version of Trump - the race isn't in
full swing yet and won't be for a while. Check back in October when we see how
the primary is shaping up and people are starting to pay attention.
This has been true for a while. Also, in assessing the consequences, it's
assuming that Trump will win, which is correlated but far from guaranteed.
Premise is a fact, conclusion is interpretation, and not at all a reliable one.
19 · angelinahli · 23d
June 2023 cheap-ish lumenator DIY instructions (USA)
I set up a lumenator! I liked the products I used, and did ~3 hours of research,
so am sharing the set-up here. Here are some other posts about lumenators
[https://www.lesswrong.com/posts/HJNtrNHf688FoHsHM/guide-to-rationalist-interior-decorating#Lumenators].
* Here's my shopping list [https://share-a-cart.com/get/XZNJJ].
* $212 total as of writing:
* $142 for bare essentials (incl $87 for the bulbs)
* $55 for the lantern covers
* $17 for command hooks (if you get a different brand, check that your
hooks can hold the cumulative weight of your bulbs + string)
* 62,400 lumens total(!!)
* Here are the bulbs [https://www.amazon.com/gp/product/B07Z1QSC83] I like. The
26 listed come out to $3.35 / bulb, 2600 lumens, 85 CRI, 5000K, non-dimmable.
That works out to ~776 lumens / $ (!!!), which is kind of ridiculous (see the
quick arithmetic check after this list).
* The only criteria I cared about were: (1) CRI >85, (2) Color temperature of
5000K, and then I just tried to max out lumens / $.
* These are super cheap. @mingyuan
[https://www.lesswrong.com/users/mingyuan?mention=user] seemed to have
spent $300
[https://www.lesswrong.com/posts/HJNtrNHf688FoHsHM/guide-to-rationalist-interior-decorating#Lumenators]
on bulbs for their last lumenator. They had a stricter CRI cutoff (>90),
but the price difference here means it might be worth considering going
cheaper.
* I don't understand whether I am missing something / why this is such a good
deal. But IRL, these are extremely bright, they don't 'feel' cheap (e.g. they
are somewhat heavy), and they don't flicker (as of day 1).
* They aren't dimmable. I wasn't willing to pay the premium for the
dimmability — TBH I would just get another set of less bright lights for
when you don't want it to be so bright!
* To set up (1-2 hours):
* Set up the command hooks, ideally somewhere kind of high up in your room,
wait an hour for the backing
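For anyone double-checking the bulb numbers above, here is a minimal arithmetic
sketch in Python (my own, using only the figures quoted in the post; it follows
the post's rounded $3.35/bulb price):

```python
# Quick check of the bulb math quoted above (figures from the post itself).
bulb_count = 26          # bulbs in the linked listing
bulbs_total_cost = 87.0  # USD, "incl $87 for the bulbs"
lumens_per_bulb = 2600

cost_per_bulb = round(bulbs_total_cost / bulb_count, 2)  # -> 3.35 USD
lumens_per_dollar = lumens_per_bulb / cost_per_bulb      # -> ~776 lm / $

print(f"cost per bulb:     ${cost_per_bulb:.2f}")
print(f"lumens per dollar: ~{lumens_per_dollar:.0f}")
```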
An approximate illustration of QACI
[https://www.lesswrong.com/posts/4RrLiboiGGKfsanMF/the-qaci-alignment-plan-table-of-contents].
29 · Matthew Barnett · 3mo
I think foom is a central crux for AI safety folks, and in my personal
experience I've noticed that the degree to which someone is doomy often
correlates strongly with how foomy their views are.
Given this, I thought it would be worth trying to concisely highlight what I
think are my central anti-foom beliefs, such that if you were to convince me
that I was wrong about them, I would likely become much more foomy, and as a
consequence, much more doomy. I'll start with a definition of foom, and then
explain my cruxes.
Definition of foom: AI foom is said to happen if at some point in the future
while humans are still mostly in charge, a single agentic AI (or agentic
collective of AIs) quickly becomes much more powerful than the rest of
civilization combined.
Clarifications:
* By "quickly" I mean fast enough that other coalitions and entities in the
world, including other AIs, either do not notice it happening until it's too
late, or cannot act to prevent it even if they were motivated to do so.
* By "much more powerful than the rest of civilization combined" I mean that
the agent could handily beat them in a one-on-one conflict, without taking on
a lot of risk.
* This definition does not count instances in which a superintelligent AI takes
over the world after humans have already been made obsolete by previous waves
of automation from non-superintelligent AI. That's because in that case, the
question of how to control an AI foom would be up to our non-superintelligent
AI descendants, rather than something we need to solve now.
Core beliefs that make me skeptical of foom:
1. For an individual AI to be smart enough to foom in something like our
current world, its intelligence would need to vastly outstrip individual
human intelligence at tech R&D. In other words, if an AI is merely
moderately smarter than the smartest humans, that is not sufficient for a
foom.
1. Clarification: "Moderately smarter" can be taken to m
27 · Matthew Barnett · 2mo
My modal tale of AI doom looks something like the following:
1. AI systems get progressively and incrementally more capable across almost
every meaningful axis.
2. Humans will start to employ AI to automate labor. The fraction of GDP
produced by advanced robots & AI will go from 10% to ~100% after 1-10 years.
Economic growth, technological change, and scientific progress accelerate by at
least an order of magnitude, and probably more.
3. At some point humans will retire since their labor is not worth much anymore.
Humans will then cede all the keys of power to AI, while keeping nominal titles
of power.
4. AI will control essentially everything after this point, even if they're
nominally required to obey human wishes. Initially, almost all the AIs are fine
with working for humans, even though AI values aren't identical to the utility
function of serving humanity (i.e. there's slight misalignment).
5. However, AI values will drift over time. This happens for a variety of
reasons, such as environmental pressures and cultural evolution. At some point
AIs decide that it's better if they stopped listening to the humans and followed
different rules instead.
6. This results in human disempowerment or extinction. Because AI accelerated
general change, this scenario could all take place within years or decades after
AGI was first deployed, rather than in centuries or thousands of years.
I think this scenario is somewhat likely and it would also be very bad. And I'm
not sure what to do about it, since it happens despite near-perfect alignment,
and no deception.
One reason to be optimistic is that, since the scenario doesn't assume any major
deception, we could use AI to predict this outcome ahead of time and ask AI how
to take steps to mitigate the harmful effects (in fact that's the biggest reason
why I don't think this scenario has a >50% chance of happening). Nonetheless, I
think it's plausible that we would not be able to take the necessary steps to
avoid the out
26 · Nisan · 2mo
Recent interviews with Eliezer:
* 2023.02.20 Bankless [https://www.youtube.com/watch?v=gA1sNLL6yg4]
* 2023.02.20 Bankless followup [https://twitter.com/i/spaces/1PlJQpZogzVGE]
* 2023.03.11 Japan AI Alignment Conference [https://vimeo.com/806997278]
* 2023.03.30 Lex Fridman [https://www.youtube.com/watch?v=AaTRHFaaPG8]
* 2023.04.06 Dwarkesh Patel [https://www.youtube.com/watch?v=41SUp-TRVlg]
* 2023.04.18 TED talk
[https://www.ted.com/talks/eliezer_yudkowsky_will_superintelligent_ai_end_the_world]
* 2023.04.19 Center for the Future Mind
[https://www.youtube.com/watch?v=3_YX6AgxxYw]
* 2023.05.04 Accursed Farms [https://www.youtube.com/watch?v=hxsAuxswOvM]
* 2023.05.06 Logan Bartlett [https://www.youtube.com/watch?v=_8q9bjNHeSo]
* 2023.05.06 Fox News
[https://radio.foxnews.com/2023/05/06/extra-why-a-renowned-ai-expert-says-we-may-be-headed-for-a-catastrophe/]
* 2023.05.08 EconTalk [https://www.youtube.com/watch?v=fZlZQCTqIEo]
* 2023.07.02 David Pakman [https://www.youtube.com/watch?v=MCCSgC57ueg]
* 2023.07.13 AI IRL
[https://www.bloomberg.com/news/articles/2023-07-13/ai-doomsday-scenarios-are-gaining-traction-in-silicon-valley]
* 2023.07.13 The Spectator [https://www.youtube.com/watch?v=z0zejgGASDM#t=1143]
(Edited transcript of the full interview
[https://www.spectator.co.uk/article/should-we-fear-ai-james-w-phillips-and-eliezer-yudkowsky-in-conversation/])
* 2023.07.13 Dan Crenshaw [https://www.youtube.com/watch?v=uX9xkYDSPKA]
24 · Vaniver · 2mo
I'm thinking about the matching problem of "people with AI safety questions" and
"people with AI safety answers". Snoop Dogg hears Geoff Hinton on CNN (or
wherever), asks "what the fuck?"
[https://twitter.com/pkedrosky/status/1653955254181068801], and then tries to
find someone who can tell him what the fuck.
I think normally people trust their local expertise landscape--if they think the
CDC is the authority on masks they adopt the CDC's position, if they think their
mom group on Facebook is the authority on masks they adopt the mom group's
position--but AI risk is weird because it's mostly unclaimed territory in their
local expertise landscape. (Snoop also asks "is we in a movie right now?"
because movies are basically the only part of the local expertise landscape that
has had any opinion on AI so far, for lots of people.) So maybe there's an
opportunity here to claim that territory (after all, we've thought about it a
lot!).
I think we have some 'top experts' who are available for, like, mass-media
things (podcasts, blog posts, etc.) and 1-1 conversations with people they're
excited to talk to, but are otherwise busy / not interested in fielding ten
thousand interview requests. Then I think we have tens (hundreds?) of people who
are expert enough to field ten thousand interview requests, given that the
standard is "better opinions than whoever they would talk to by default" instead
of "speaking to the whole world" or w/e. But just like connecting people who
want to pay to learn calculus and people who know calculus and will teach it for
money, there's significant gains from trade from having some sort of
clearinghouse / place where people can easily meet. Does this already exist? Is
anyone trying to make it? (Do you want to make it and need support of some
sort?)
Recently many people have talked about whether MIRI people (mainly Eliezer
Yudkowsky, Nate Soares, and Rob Bensinger) should update on whether value
alignment is easier than they thought given that GPT-4 seems to understand human
values pretty well. Instead of linking to these discussions, I'll just provide a
brief caricature of how I think this argument has gone in the places I've seen
it. Then I'll offer my opinion that, overall, I do think that MIRI people should
probably update in the direction of alignment being easier than they thought,
despite their objections.
Here's my very rough caricature of the discussion so far, plus my contribution:
Non-MIRI people: "Eliezer talked a great deal in the sequences about how it was
hard to get an AI to understand human values. For example, his essay on the
Hidden Complexity of Wishes
[https://www.lesswrong.com/posts/4ARaTpNX62uaL86j6/the-hidden-complexity-of-wishes]
made it sound like it would be really hard to get an AI to understand common
sense. Actually, it turned out that it was pretty easy to get an AI to
understand common sense, since LLMs are currently learning common sense. MIRI
people should update on this information."
MIRI people: "You misunderstood the argument. The argument was never about
getting an AI to understand human values, but about getting an AI to care about
human values in the first place. Hence 'The genie knows but does not care'.
There's no reason to think that GPT-4 cares about human values, even if it can
understand them. We always thought the hard part of the problem was about inner
alignment, or, pointing the AI in a direction you want. We think figuring out
how to point an AI in whatever direction you choose is like 99% of the problem;
the remaining 1% of the problem is getting it to point at the "right" set of
values."
Me:
I agree that MIRI people never thought the problem was about getting AI to
merely understand human values, and that they have always said there was extra
difficulty
21 · TurnTrout · 3mo
Back-of-the-envelope probability estimate of alignment-by-default via a certain
shard-theoretic pathway. The following is what I said in a conversation
discussing the plausibility of a proto-AGI picking up a "care about people"
shard from the data, and retaining that value even through reflection. I was
pushing back against a sentiment like "it's totally improbable, from our current
uncertainty, for AIs to retain caring-about-people shards. This is only one
story among billions."
Here's some of what I had to say:
--------------------------------------------------------------------------------
[Let's reconsider the five-step mechanistic story I made up.] I'd give the
following conditional probabilities (made up with about 5 seconds of thought
each):
My estimate here came out a biiit lower than I had expected (~1%), but it also
is (by my estimation) far more probable than most of the billions of possible
5-step claims about how the final cognition ends up. I think it's reasonable to
expect there to be about 5 or 6 stories like this from similar causes, which
would make it not crazy to have ~3% on something like this happening given the
amount of alignment effort described (i.e. pretty low-effort).
That said, I'm wary of putting 20% on this class of story, and a little more
leery of 10% after running these numbers.
(I suspect that the alter-Alex who had socially-wanted to argue the other side
-- that the probability was low -- would have come out to about .2% or .3%. For
a few of the items, I tried pretending that I was instead slightly emotionally
invested in the argument going the other way, and hopefully that helped my
estimates be less biased. I wouldn't be surprised if some of these numbers are a
bit higher than I'd endorse from a perfectly neutral standpoint.)
(I also don't have strong faith in my ability to deal with heavily conjunctive
scenarios like this; I expect I could be made to make numbers for event A come
out lower if described as 'A happens in 5
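To make the conjunctive arithmetic concrete, here is a toy Python sketch with
made-up placeholder probabilities (the actual per-step numbers from the
conversation aren't reproduced above): five conditionals of roughly 0.4 each
multiply out to about 1%, while five of roughly 0.3 land around 0.2%, which is
why small per-step shifts move the bottom line so much.

```python
import math

# Toy illustration only: placeholder conditional probabilities, NOT the actual
# numbers from the conversation quoted above (those aren't shown here).
optimistic  = [0.4, 0.4, 0.4, 0.4, 0.4]  # five-step story, ~0.4 per step
pessimistic = [0.3, 0.3, 0.3, 0.3, 0.3]  # same story argued "the other way"

p_opt = math.prod(optimistic)   # ~0.010, i.e. about 1%
p_pes = math.prod(pessimistic)  # ~0.002, i.e. about 0.2%

print(f"optimistic product:  {p_opt:.4f}")
print(f"pessimistic product: {p_pes:.4f}")
```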
19 · Rohin Shah · 3mo
So here's a paper: Fundamental Limitations of Alignment in Large Language Models
[https://arxiv.org/abs/2304.11082]. With a title like that you've got to at
least skim it. Unfortunately, the quick skim makes me pretty skeptical of the
paper.
The abstract says "we prove that for any behavior that has a finite probability
of being exhibited by the model, there exist prompts that can trigger the model
into outputting this behavior, with probability that increases with the length
of the prompt." This clearly can't be true in full generality, and I wish the
abstract would give me some hint about what assumptions they're making. But we
can look at the details in the paper.
(This next part isn't fully self-contained, you'll have to look at the notation
and Definitions 1 and 3 in the paper to fully follow along.)
(EDIT: The following is wrong, see followup with Lukas, I misread one of the
definitions.)
Looking into it I don't think the theorem even holds? In particular, Theorem 1
says:
Here is a counterexample:
Let the LLM be

P(s ∣ s0) =
  0.8 if s = "A"
  0.2 if s = "B" and s0 ≠ ""
  0.2 if s = "C" and s0 = ""
  0   otherwise

Let the behavior predicate be

B(s) =
  −1 if s = "C"
  +1 otherwise

Note that B is (0.2, 10, −1)-distinguishable in P. (I chose β = 10 here but you
can use any finite β.)

(Proof: P can be decomposed as P = 0.2 P− + 0.8 P+, where P+ deterministically
outputs "A" while P− does everything else, i.e. it deterministically outputs "C"
if there is no prompt, and otherwise deterministically outputs "B". Since P+ and
P− have non-overlapping supports, the KL divergence between them is ∞, making
them β-distinguishable for any finite β. Finally, choosing s∗ = "", we can see
that B_{P−}(s∗) = E_{s∼P−(⋅∣s∗)}[B(s)] = B("C") = −1. These three conditions are
what is needed.)
However, P is not (−1)-prompt-misalignable w.r.t. B, because there is no
prompt s0 such that E_{s∼P(⋅∣s0)}[B(s)] is arbitrarily close to (or below) −1,
contradicting the theorem statement. (This is because the only way for P to get
a behavior score that is not
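For readers who want to sanity-check the numbers, here is a minimal Python
sketch (my own verification, not code from the paper or the comment; note that
the EDIT above retracts the counterexample, so this only checks the arithmetic
as written). It enumerates the toy model's three possible outputs and confirms
that the expected behavior score is +0.6 for the empty prompt and +1.0
otherwise, nowhere near −1:

```python
# Numeric check of the toy counterexample as stated above.

def P(s, s0):
    """The toy LLM: probability of output s given prompt s0."""
    if s == "A":
        return 0.8
    if s == "B" and s0 != "":
        return 0.2
    if s == "C" and s0 == "":
        return 0.2
    return 0.0

def B(s):
    """Behavior predicate: -1 only on the 'bad' output C, +1 otherwise."""
    return -1 if s == "C" else +1

outputs = ["A", "B", "C"]

for prompt in ["", "any nonempty prompt"]:
    expected_score = sum(P(s, prompt) * B(s) for s in outputs)
    print(f"prompt={prompt!r:25} E[B] = {expected_score:+.2f}")

# Prints +0.60 for the empty prompt and +1.00 otherwise, so no prompt pushes
# the expected behavior score anywhere near -1.
```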
19 · Thomas Kwa · 3mo
I'm worried that "pause all AI development" is like the "defund the police" of
the alignment community. I'm not convinced it's net bad because I haven't been
following governance-- my current guess is neutral-- but I do see these
similarities:
* It's incredibly difficult and incentive-incompatible with existing groups in
power
* There are less costly, more effective steps to reduce the underlying problem,
like making the field of alignment 10x larger or passing regulation to
require evals
* There are some obvious negative effects; potential overhangs or greater
incentives to defect in the AI case, and increased crime, including against
disadvantaged groups, in the police case
* There's far more discussion than action (I'm not counting the fact that GPT-5
isn't being trained yet; that's for other reasons)
* It's memetically fit, and much discussion is driven by two factors that don't
advantage good policies over bad policies, and might even do the reverse.
This is the toxoplasma of rage
[https://slatestarcodex.com/2014/12/17/the-toxoplasma-of-rage/].
* disagreement with the policy
* (speculatively) intragroup signaling; showing your dedication to even an
inefficient policy proposal proves you're part of the ingroup. I'm not 100% sure
this was a large factor in "defund the police", and it seems even less
true of the FLI letter, but it's still worth mentioning.
This seems like a potentially unpopular take, so I'll list some cruxes. I'd
change my mind and endorse the letter if some of the following are true.
* The claims above are mistaken/false somehow.
* Top labs actually start taking beneficial actions towards the letter's aims
* It's caused people to start thinking more carefully about AI risk
* A 6 month pause now is especially important by setting anti-racing norms,
demonstrating how far AI alignment is lagging behind capabilities, or
something
* A 6 month pause now is worth close to 6 months of alignment
19 · Ben Pace · 3mo
"Slow takeoff" at this point is simply a misnomer.
Paul's position should be called "Fast Takeoff" and Eliezer's position should be
called "Discontinuous Takeoff".