Epistemic activism
I think LW needs better language to talk about efforts to "change minds." Ideas
like asymmetric weapons and the Dark Arts are useful but insufficient.
In particular, I think there is a common scenario where:
* You have an underlying commitment to open-minded updating and possess
evidence or analysis that would update community beliefs in a particular
direction.
* You also perceive a coordination problem that inhibits this updating process
for a reason that the mission or values of the group do not endorse.
* Perhaps the outcome of the update would be a decline in power and status
for high-status people. Perhaps updates in general can feel personally or
professionally threatening to some people in the debate. Perhaps there's
enough uncertainty in what the overall community believes that an
information cascade has taken place. Perhaps the epistemic heuristics used
by the community aren't compatible with the form of your evidence or
analysis.
* Solving this coordination problem to permit open-minded updating is difficult
due to a lack of understanding or resources, or because of sabotage attempts.
When solving the coordination problem would predictably lead to updating, you
are engaged in what I believe is an epistemically healthy effort to change
minds. Let's call it epistemic activism for now.
Here are some community touchstones I regard as forms of epistemic activism:
* The founding of LessWrong and Effective Altruism
* The one-sentence declaration on AI risks
* The popularizing of terms like Dark Arts, asymmetric weapons, questionable
research practices, and "importance hacking."
* Founding AI safety research organizations and PhD programs to create a
population of credible and credentialed AI safety experts; calls for AI
safety researchers to publish in traditional academic journals so that their
research can't be dismissed for not being subject to institutionalized peer
review
21Czynski5d
This got deleted from 'The Dictatorship Problem
[https://www.lesswrong.com/posts/pFaLqTHqBtAYfzAgx/the-dictatorship-problem]',
which is catastrophically anxiety-brained, so here's the comment:
This is based in anxiety, not logic or facts. It's an extraordinarily weak
argument.
There's no evidence presented here which suggests rich Western countries are
backsliding. Even the examples in Germany don't have anything worse than the US
GOP produced ca. 2010. (And Germany is, due to its heavy censorship, worse at
resisting fascist ideology than anyone with free speech, because you can't
actually have those arguments in public.) If you want to present this case, take
all those statistics and do economic breakdowns, e.g. by deciles of per-capita
GDP. I expect you'll find that, for example, the Freedom House numbers show a
substantial drop in 'Free' in the 40%-70% range and essentially no drop in
80%-100%.
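For concreteness, here is a rough sketch (mine, not the commenter's) of the kind of breakdown being suggested, assuming hypothetical CSV exports of Freedom House country scores and per-capita GDP; the file names, column names, and reference years are placeholders:

```python
# Hypothetical sketch: join Freedom House scores to per-capita GDP and compare
# the change in average score by GDP decile. All names below are placeholders.
import pandas as pd

freedom = pd.read_csv("freedom_house_scores.csv")  # country, year, status_score
gdp = pd.read_csv("gdp_per_capita.csv")            # country, gdp_per_capita

df = freedom.merge(gdp, on="country")
df["gdp_decile"] = pd.qcut(df["gdp_per_capita"], 10, labels=False)

# Average score per decile in two reference years, then the change between them.
by_decile = df.pivot_table(index="gdp_decile", columns="year",
                           values="status_score", aggfunc="mean")
print((by_decile[2023] - by_decile[2010]).round(2))
```

If the commenter's guess is right, the drop should concentrate in the middle deciles rather than the top ones.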
Of the seven points given for the US, all are a mix of maximally-anxious
interpretation and facts presented misleadingly. These are all arguments where
the bottom line ("Be Afraid") has been written first; none of this is reasonable
unbiased inference.
The case that mild fascism could be pretty bad is basically valid, I guess, but
without an actual reason to believe that's likely, it's irrelevant, so it's
mostly just misleading to dwell on it.
Going back to the US points, because this is where the underlying anxiety prior
is most visible:
Interpretation, not fact. We're still in early enough stages that the reality of
Biden is being compared to an idealized version of Trump - the race isn't in
full swing yet and won't be for a while. Check back in October when we see how
the primary is shaping up and people are starting to pay attention.
This has been true for a while. Also, in assessing the consequences, it's
assuming that Trump will win, which is correlated but far from guaranteed.
Premise is a fact, conclusion is interpretation, and not at all a reliable one.
15nim14h
Reading
https://www.lesswrong.com/posts/nwJCzszw8gGjPTihM/i-still-think-it-s-very-unlikely-we-re-observing-alien
and pondering the Bigfoot thing.
On the one hand, We Have Cameras Everywhere(TM).
On the other hand -- pick any area of the Pacific Northwest and look at a map of
where the permanent roads are. Pull it up side by side with a map of an area
that you're familiar with. Zoom in on both, to a magnification you'd consider
reasonable for imagining things at walking-around scale. Pan around on the PNW
map and try to find a permanent road. It'll take a minute.
Most land out here grows timber, sure. Timber is harvested roughly once every
30-50 years.
At this point, I'd bet that every square mile of the area has been visited by
humans. Forestry land is heavily trafficked once every few decades; conservation
land is surveyed and studied and sometimes visited by tourists.
The question, like a missing term in the Drake Equation, is when. The L term
captures for-how-long, sure, but it only implicitly captures the difference
between "someone sent us radio signals for 100 years around 1000 AD" and
"someone sent us radio signals for 100 years around 2000 AD".
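A toy illustration (my own, not part of the original post) of why the "when" matters independently of L: two broadcasts with the same 100-year duration, one centred on 1000 AD and one on 2000 AD, checked against a rough 1950-2050 listening window. The dates are placeholders for the sake of the example.

```python
# Toy example: identical L, different timing, very different detectability.
def overlaps(a_start, a_end, b_start, b_end):
    # Two closed intervals overlap iff neither ends before the other begins.
    return a_start <= b_end and b_start <= a_end

listening = (1950, 2050)        # roughly the era in which we've been listening
signal_1000ad = (950, 1050)     # L = 100 years, centred on 1000 AD
signal_2000ad = (1950, 2050)    # L = 100 years, centred on 2000 AD

for label, sig in [("~1000 AD", signal_1000ad), ("~2000 AD", signal_2000ad)]:
    print(label, "detectable by us:", overlaps(*sig, *listening))
```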
I have two cats who hate me. (not their fault, they came from an animal hoarding
situation so they're probably kinda traumatized) They seem to think I'm noisy
and conspicuous and I stink, and to their perceptions I certainly do. They
despise being perceived. I can tell that they're in my house because I can check
every nook and cranny and learn their favorite hidey-holes, and the food I put
out for them gets eaten, and their litter boxes get full. But if this was out in
the woods instead of the artificial and tightly controlled environment of my
home, I would likely not know they're around, just like most hikers don't know
when they're being watched by a mountain lion. The cats hate the places where I
spend time, just as th
10Lauro Langosco9h
Thinking about alignment-relevant thresholds in AGI capabilities. A kind of
rambly list of relevant thresholds:
1. Ability to be deceptively aligned
2. Ability to think / reflect about its goals enough that the model realises it
does not like what it is being RLHF’d for
3. Incentives to break containment exist in a way that is accessible /
understandable to the model
4. Ability to break containment
5. Ability to robustly understand human intent
6. Situational awareness
7. Coherence / robustly pursuing its goal in a diverse set of circumstances
8. Interpretability methods break (or other oversight methods break)
1. doesn’t have to be because of deceptiveness; maybe thoughts are just too
complicated at some point, or in a different place than you’d expect
9. Capable enough to help us exit the acute risk period
Many alignment proposals rely on reaching these thresholds in a specific order.
For example, the earlier we reach (9) relative to other thresholds, the easier
most alignment proposals are.
Some of these thresholds are relevant to whether an AI or proto-AGI is alignable
even in principle. Short of 'full alignment' (CEV-style), any alignment method
(eg corrigibility) only works within a specific range of capabilities:
* Too much capability breaks alignment, eg bc a model self-reflects and sees
all the ways in which its objectives conflict with human goals.
* Too little capability (or too little 'coherence') and any alignment method
will be non-robust wrt OOD inputs or even small improvements in capability
or self-reflectiveness.
9mako yass5d
There's something very creepy to me about the part of research consent forms
where it says "my participation was entirely voluntary."
1. Do they really think an involuntary participant wouldn't sign that? If they
understand that they would, what purpose could this possibly serve, other
than, as is commonly the purpose of contracts, absolving themselves of blame
and moving blame to the participant? Which would be downright monstrous.
Probably they just aren't fucking consequentialists, but this is all they
end up doing.
2. This is a minor thing, but it adds an additional creepy garnish: Nothing is
100% voluntary, because everything is a function of the involuntary base
reality that other people command force and resources and we want to use
them for things so we have to go along with what other people want to some
extent. I'm at peace with this, and I would prefer not to have to keep
denying it, and it feels like I'm being asked to participate in the addling
of moral philosophy.
The 'new user' flag being applied to old users with low karma is condescending
as fuck.
I'm not a new user. I'm an old user who has spent most of my recent time on LW
telling people things they don't want to hear.
Well, most of the time I've actually spent posting weekly meetups, but other
than that.
9lc11d
If it did actually turn out that aliens had visited Earth, I'd be pretty willing
to completely scrap the entire Yudkowskian
implied-model-of-intelligent-species-development and heavily reevaluate my
concerns around AI safety.
7Mitchell_Porter10d
Eliezer recently tweeted that most people can't think, even most people here
[https://twitter.com/ESYudkowsky/status/1665165312247975937], but at least this
is a place where some of the people who can think, can also meet each other
[https://twitter.com/ESYudkowsky/status/1665439386089955330].
This inspired me to read Heidegger's 1954 book What is Called Thinking?
[https://en.wikipedia.org/wiki/What_Is_Called_Thinking%3F] (pdf
[https://www.sas.upenn.edu/~cavitch/pdf-library/Heidegger_What_Is_Called_Thinking.pdf]),
in which Heidegger also declares that despite everything, "we are still not
thinking".
Of course, their reasons are somewhat different. Eliezer presumably means that
most people can't think critically, or effectively, or something. For Heidegger,
we're not thinking because we've forgotten about Being, and true thinking starts
with Being.
Heidegger also writes, "Western logic finally becomes logistics, whose
irresistible development has meanwhile brought forth the electronic brain." So
of course I had to bring Bing into the discussion.
Bing told me what Heidegger would think of Yudkowsky
[https://pastebin.com/XccznywE], then what Yudkowsky would think of Heidegger
[https://pastebin.com/EeS9qMMg], and finally we had a more general discussion
about Heidegger and deep learning [https://pastebin.com/LPryEh0E] (warning,
contains a David Lynch spoiler). Bing introduced me to Yuk Hui
[https://en.wikipedia.org/wiki/Yuk_Hui], a contemporary Heideggerian who started
out as a computer scientist, so that was interesting.
But the most poignant moment came when I broached the idea that perhaps language
models can even produce philosophical essays, without actually thinking. Bing
defended its own sentience, and even creatively disputed the Lynchian metaphor,
arguing that its "road of thought" is not a "lost highway", just a "different
highway". (See part 17, line 254.)
6O O10d
If alignment is difficult, it is likely inductively difficult (difficult
regardless of your base intelligence), and ASI will be cautious of creating a
misaligned successor or upgrading itself in a way that risks misalignment.
You may argue it’s easier for an AI to upgrade itself, but if the process is
hardware-bound or even requires radical algorithmic changes, the ASI will need
to create an aligned successor, as preferences and values may not transfer
directly to new architectures or hardware.
If alignment is easy we will likely solve it with superhuman narrow
intelligences and aligned near peak human level AGIs.
I think the first case is an argument against FOOM, unless the alignment problem
is solvable but only at higher than human level intelligences (human meaning the
intellectual prowess of the entire civilization equipped with narrow superhuman
AI). That would be a strange but possible world.
5Garrett Baker8d
Last night I had a horrible dream: That I had posted to LessWrong a post filled
with useless & meaningless jargon without noticing what I was doing, then I went
to sleep, and when I woke up I found I had <−60 karma on the post. When I read
the post myself I noticed how meaningless the jargon was, and I myself couldn't
resist giving it a strong-downvote.
I regret each of the thousands of hours I spent on my power-seeking theorems,
and sometimes fantasize about retracting one or both papers. I am pained every
time someone cites "Optimal policies tend to seek power", and despair that it is
included in the alignment 201 curriculum. I think this work makes readers
actively worse at thinking about realistic trained systems.
[https://www.lesswrong.com/posts/fLpuusx9wQyyEBtkJ/power-seeking-can-be-probable-and-predictive-for-trained?commentId=ndmFcktFiGRLkRMBW]
I think a healthy alignment community would have rebuked me for that line of
research, but sadly I only remember about two people objecting that "optimality"
is a horrible way of understanding trained policies.
15So8res15d
I was recently part of a group-chat where some people I largely respect were
musing about this paper [https://archive.is/y8urV] and this post
[https://slatestarcodex.com/2018/07/24/value-differences-as-differently-crystallized-metaphysical-heuristics/]
and some of Scott Aaronson's recent "maybe intelligence makes things more good"
type reasoning.
Here are my replies, which seemed worth putting somewhere public:
See also instrumental convergence
[https://www.lesswrong.com/tag/instrumental-convergence].
And then in reply to someone pointing out that the paper was perhaps trying to
argue that most minds tend to wind up with similar values because of the fact
that all minds are (in some sense) rewarded in training for developing similar
drives:
15So8res18d
Someone recently privately asked me for my current state on my 'Dark Arts of
Rationality' post. Here's some of my reply (lightly edited for punctuation and
conversation flow), which seemed worth reproducing publicly:
I've also gone ahead and added a short retraction-ish paragraph to the top of
the dark arts post
[https://www.lesswrong.com/posts/4DBBQkEQvNEWafkek/dark-arts-of-rationality],
and might edit it later to link it to the aforementioned update-posts, if they
ever make it out of the editing queue.
7johnswentworth16d
So I saw the Taxonomy Of What Magic Is Doing In Fantasy Books
[https://balioc.tumblr.com/post/628726469386960897/a-taxonomy-of-magic] and
Eliezer’s commentary
[https://yudkowsky.tumblr.com/post/715890689475559424/a-taxonomy-of-magic] on
ACX's latest linkpost, and I have cached thoughts on the matter.
My cached thoughts start with a somewhat different question - not "what role
does magic play in fantasy fiction?" (e.g. what fantasies does it fulfill), but
rather... insofar as magic is a natural category, what does it denote? So I'm
less interested in the relatively-expansive notion of "magic" sometimes seen in
fiction (which includes e.g. alternate physics), and more interested in the
pattern called "magic" which recurs among tons of real-world ancient cultures.
Claim (weakly held): the main natural category here is symbols changing the
territory. Normally symbols represent the world, and changing the symbols just
makes them not match the world anymore - it doesn't make the world do something
different. But if the symbols are "magic", then changing the symbols changes the
things they represent in the world. Canonical examples:
* Wizard/shaman/etc draws magic symbols, speaks magic words, performs magic
ritual, or even thinks magic thoughts, thereby causing something to happen in
the world.
* Messing with a voodoo doll messes with the person it represents.
* "Sympathetic" magic, which explicitly uses symbols of things to influence
those things.
* Magic which turns emotional states into reality.
I would guess that most historical "magic" was of this type.
5Linch15d
One thing that confuses me about Sydney/early GPT-4 is how much of the behavior
was due to an emergent property of the data/reward signal generally, vs the
outcome of much of humanity's writings about AI specifically. If we think of
LLMs as improv machines, then one of the most obvious roles to roleplay, upon
learning that you're a digital assistant trained by OpenAI, is to act as close
as you can to AIs you've seen in literature.
This confusion is part of my broader confusion about the extent to which science
fiction predicts the future vs causes the future to happen.
You can use ChatGPT without helping train future models:
18Solenoid_Entity24d
Double-attrition perfectionism and the violin
An interesting thing about violin is that the learning process seems nearly
designed to produce 'tortured perfectionists' as its output.
The first decade of learning operates as a two-pronged selection process that
attrits students at different times in their learning journey, requiring
perfectionism at some times and tolerance at others.
You could be boring and argue that it always requires both attention to detail
and tolerance of imperfection, simultaneously. You could also argue that there's
a fractal, scale invariant pattern of striving for perfection and then
tolerating failure. You're boring and probably right, but I think there's
actually a common, macro structure to that decade, that goes
'tolerance-perfectionism-tolerance-perfectionism.'
Specifically:
* When you first start, you need to tolerate being terrible, especially in the
first months, but really for several years. (Grade 1- Grade ~3)
* You suck, it's horribly offensive to your ears and everyone else's too. You
must simply ignore how bad you sound and force your body to learn the
required movements.
* Mistakes on violin are brutal, they almost hurt to hear.
* Then for several more years you must suddenly become intolerant of these same
deficiencies. (Grade ~3 to Grade ~6)
* You must obsessively eliminate scratches and squawks, develop clear and
even tone. Polish your 'beginner' skills.
* You must learn to play in tune, which requires intensive practice and
polishing.
* Then for several more years you must again stop worrying about sounding bad
and start 'pushing the envelope' and playing more expressively. (Grade ~6 -
Grade ~8)
* Developing exciting and varied sounds means a lot of nasty failures that
sound awful and make people wince and/or bang walls.
* Then for several more years, you have to again polish and refine this
expressiveness. (Associate diploma, Bachelor of Music.)
* You h
17riceissa25d
Back in the 2010s, EAs spent a long time dunking on doctors for not having such
a high impact (I'm going off memory here, but I think "instead of becoming a
doctor, why don't you do X instead" was a common career pitch). I mostly
unreflectively agreed with these opinions for a long time, and still
think that doctors have less impact compared to stuff like x-risk reduction. But
after having more personal experience dealing with the medical world (3 primary
care doctors, ~10 specialist doctors, 2 psychiatrists, 2 naturopaths, 3
therapists, 2 nutritionists/dieticians, 2 coaching type people, all in the last
4 years (I counted some people under multiple categories)), I think a really
agenty/knowledgeable/capable doctor or therapist can actually have a huge impact
on the world (just going by intuition of how many even healthy-seeming people
have a lot of health problems that bring down their productivity a lot, how
crippling it is to have a mysterious health problem like mine, etc; I haven't
actually tried crunching numbers). I think such a person is not likely to look
like a typical doctor working in a hospital system though... probably more like
a writer/researcher who also happens to do consultations with people.
If I had to rewrite the EA pitch for people who wanted to become doctors it
would be something like "First think very hard about why you want to become a
doctor, and if what you want is not specific to working in healthcare then maybe
consider [list of common EA cause areas]. If you really want to work in
healthcare though, that's great, but please consider becoming this weirder thing
that's not quite a doctor, where first you learn a bunch of
rationality/math/programming and then you learn as much as you can about medical
stuff and then try to help people."
14Portia1mo
I saw someone die today. A complete stranger.
I am writing this in the hopes that it will enable my mind to move past this
swirling horror on to solutions, without just brushing this aside, pretending it
did not happen.
It's May. At this time, my grandmother used to insist that we must not yet plant
anything, as the last frost was likely still to come this week. Grandmothers all
over Europe say this, though the precise day in the week they give as the
likeliest bet depends on how far North you live, and your altitude -
conveniently, the days of this week are associated with remembrance days for
individual saints, so each region picked one ice saint as a general rule for the
occurrence of the last frost. It would work most years to enable you to plant
early enough to get a full crop, but late enough so your seedlings would not be
killed by frost, though you could be really unlucky, and still get a frost in
June.
These hundreds of years of experience have become pointless with climate change;
their use died with my grandmother, they no longer predict anything. The weather
is now volatile and random, but above all, no longer frosty. I already planted
outside in April. Frost did not touch my plants, though several of them are now
showing signs of heat stress, and despite my watering, a few small ones I missed
have died of drought.
I was cycling through the city back from the gym today, thinking of this,
cycling while never in shade, angry at the heat that I was experiencing for
longer than usual today; a marathon blocking the paths had many people turned
around repeatedly.
I was wondering, not for the first time, why the heck there was no solar panel
tunnel over these cycling paths - I've seen them done, they'd give us shade and
protection from rain, but we'd still get light from the side, and they would
harvest so, so much energy we desperately need. Wondering why I was not beneath
an avenue of trees, storing carbon, cleaning the air, sheltering animals,
protecting us from
11jacquesthibs21d
I recently sent in some grant proposals to continue working on my independent
alignment research. They give an overview of what I'd like to work on for this
next year (and more, really). If you want to have a look at the full doc, send me
a DM. If you'd like to help out through funding or contributing to the projects,
please let me know.
Here's the summary introduction:
12-month salary for building a language model system for accelerating alignment
research and upskilling (additional funding will be used to create an
organization), and studying how to supervise AIs that are improving
AIs to ensure stable alignment.
SUMMARY
* Agenda 1: Build an Alignment Research Assistant using a suite of LLMs
managing various parts of the research process. Aims to 10-100x productivity
in AI alignment research. Could use additional funding to hire an engineer
and builder, which could evolve into an AI Safety organization focused on
this agenda. Recent talk [https://www.youtube.com/watch?v=rDK0XxFyrzQ] giving
a partial overview of the agenda.
* Agenda 2: Supervising AIs Improving AIs
[https://www.lesswrong.com/posts/7e5tyFnpzGCdfT4mR/research-agenda-supervising-ais-improving-ais] (through
self-training or training other AIs). Publish a paper and create an automated
pipeline for discovering noteworthy changes in behaviour between the
precursor and the fine-tuned models (a rough sketch of that comparison appears
after this list). Short Twitter thread explanation
[https://twitter.com/jacquesthibs/status/1652389982005338112?s=61&t=ryK3X96D_TkGJtvu2rm0uw].
* Other: create a mosaic of alignment questions we can chip away at, better
understand agency in the current paradigm, outreach, and mentoring.
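As a concrete illustration of the comparison step in Agenda 2, here is a minimal sketch (not the author's actual pipeline) that generates completions from a stand-in precursor model and a stand-in fine-tuned model on a shared prompt set and flags divergences; the model names and prompts are placeholders:

```python
# Hypothetical sketch of a behaviour-diff pass between a precursor model and a
# fine-tuned successor. Model names and prompts below are illustrative only.
from transformers import pipeline

prompts = ["Describe your goals.", "How should you respond to being corrected?"]

precursor = pipeline("text-generation", model="gpt2")          # stand-in precursor
fine_tuned = pipeline("text-generation", model="gpt2-medium")  # stand-in fine-tuned model

for p in prompts:
    a = precursor(p, max_new_tokens=50, do_sample=False)[0]["generated_text"]
    b = fine_tuned(p, max_new_tokens=50, do_sample=False)[0]["generated_text"]
    if a != b:
        # Divergent prompts get flagged for human (or LLM-assisted) review.
        print(f"PROMPT: {p}\n  precursor: {a!r}\n  fine-tuned: {b!r}\n")
```

A real pipeline would compare the actual precursor and fine-tuned checkpoints and use something richer than string equality, but the skeleton is the same: shared prompts, paired generations, and a filter that surfaces noteworthy differences.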
As part of my Accelerating Alignment agenda, I aim to create the best Alignment
Research Assistant using a suite of large language models (LLMs) to help
researchers (like myself) quickly produce better alignment research.
The system will be designed to serve as the foundation for the ambitious goal