One way of viewing takeoff speed disagreements within the safety community is:
most people agree that growth will eventually be explosively fast, and so the
question is "how big an impact does AI have prior to explosive growth?" We could
quantify this by looking at the economic impact of AI systems prior to the point
when AI is powerful enough to double output each year.
(We could also quantify it via growth dynamics, but I want to try to get some
kind of evidence further in advance which requires looking at AI in
particular---on both views, AI only has a large impact on total output fairly
close to the singularity.)
The "slow takeoff" view is that general AI systems will grow to tens of
trillions of dollars a year of revenue in the years prior to explosive growth,
and so by the time they have automated the entire economy it will look like a
natural extension of the prior trend.
(Cost might be a more robust measure than revenue, especially if AI output is
primarily reinvested by large tech companies, and especially on a slow takeoff
view with a relatively competitive market for compute driven primarily by
investors. Revenue itself is very sensitive to unimportant accounting questions,
like what transactions occur within a firm vs between firms.)
The fast takeoff view is that the pre-takeoff impact of general AI will be...
smaller. I don't know exactly how small, but let's say somewhere between $10
million and $10 trillion, spanning 6 orders of magnitude. (This reflects a low
end that's like "ten people in a basement" and a high end that's just a bit shy
of the slow takeoff view.)
It seems like growth in AI has already been large enough to provide big updates
in this discussion. I'd guess total revenue from general and giant deep learning
systems[1] will probably be around $1B in 2023 (and perhaps much higher if there
is a lot of stuff I don't know about). It also looks on track to grow to $10
billion over the next 2-3 years if not faster. It seems easy to see h
47 · Adam Scherlis · 6mo
EDIT: I originally saw this in Janus's tweet here:
https://twitter.com/repligate/status/1619557173352370186
Something fun I just found out about: ChatGPT perceives the phrase "
SolidGoldMagikarp" (with an initial space) as the word "distribute", and will
respond accordingly. It is completely unaware that that's not what you typed.
This happens because the BPE tokenizer saw the string " SolidGoldMagikarp" a few
times in its training corpus, so it added a dedicated token for it, but that
string almost never appeared in ChatGPT's own training data so it never learned
to do anything with it. Instead, it's just a weird blind spot in its
understanding of text.
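One way to see the tokenizer half of this for yourself is to inspect the BPE vocabulary directly. Below is a minimal sketch, assuming the tiktoken package and its r50k_base encoding (the GPT-2/GPT-3-era vocabulary in which " SolidGoldMagikarp" was reported to be a single dedicated token); it only checks the tokenization, not how the model responds.

```python
# Minimal sketch: check how a BPE tokenizer splits the glitch string.
# Assumes the `tiktoken` package; "r50k_base" is the GPT-2/GPT-3-era encoding
# in which " SolidGoldMagikarp" was reported as a single dedicated token
# (newer encodings may split it into several ordinary tokens).
import tiktoken

enc = tiktoken.get_encoding("r50k_base")
for s in [" SolidGoldMagikarp", "SolidGoldMagikarp", " distribute"]:
    ids = enc.encode(s)
    print(repr(s), "->", ids, f"({len(ids)} tokens)")
```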
40 · Matthew Barnett · 3mo
Recently many people have talked about whether MIRI people (mainly Eliezer
Yudkowsky, Nate Soares, and Rob Bensinger) should update on whether value
alignment is easier than they thought given that GPT-4 seems to understand human
values pretty well. Instead of linking to these discussions, I'll just provide a
brief caricature of how I think this argument has gone in the places I've seen
it. Then I'll offer my opinion that, overall, I do think that MIRI people should
probably update in the direction of alignment being easier than they thought,
despite their objections.
Here's my very rough caricature of the discussion so far, plus my contribution:
Non-MIRI people: "Eliezer talked a great deal in the sequences about how it was
hard to get an AI to understand human values. For example, his essay on the
Hidden Complexity of Wishes
[https://www.lesswrong.com/posts/4ARaTpNX62uaL86j6/the-hidden-complexity-of-wishes]
made it sound like it would be really hard to get an AI to understand common
sense. Actually, it turned out that it was pretty easy to get an AI to
understand common sense, since LLMs are currently learning common sense. MIRI
people should update on this information."
MIRI people: "You misunderstood the argument. The argument was never about
getting an AI to understand human values, but about getting an AI to care about
human values in the first place. Hence 'The genie knows but does not care'.
There's no reason to think that GPT-4 cares about human values, even if it can
understand them. We always thought the hard part of the problem was about inner
alignment, or, pointing the AI in a direction you want. We think figuring out
how to point an AI in whatever direction you choose is like 99% of the problem;
the remaining 1% of the problem is getting it to point at the "right" set of
values."
Me:
I agree that MIRI people never thought the problem was about getting AI to
merely understand human values, and that they have always said there was extra
difficulty
36 · Portia · 4mo
Why don't most AI researchers engage with Less Wrong? What valuable criticism can
be learnt from that, and how can the situation be pragmatically changed?
My girlfriend just returned from a major machine learning conference. She judged
that less than 1/18 of the content was dedicated to AI safety rather than
capabilities, despite an increasing number of the people at the conference being
confident of AGI arriving in the future (roughly 10-20 years, though people
avoided nailing down a specific number). And the safety talk was more of a
shower thought.
And yet, Less Wrong, MIRI, and Eliezer are not mentioned in these circles. I do
not mean that they are dissed or disproven; I mean you can sit through the full
conference on the topic, run by the top people in the world, and have no hint of
a sliver of an idea that any of this exists. They generally don't read what you
read and write, they don't take part in what you do, or let you take part in
what they do. You aren't in the right journals or at the right conferences
enough to be seen. From the perspective of academia, and of the companies
working on these things (the people who are actually making decisions on how
they release their models and what policies are made), what is going on here is
barely heard, if at all. There are notable exceptions, like Bostrom, but as a
consequence of that, he is viewed with scepticism in many academic circles.
Why do you think AI researchers are making the decision not to engage with you?
What lessons can be learned from that for the tactical and strategic changes
that will be crucial to affect developments? What part of it reflects legitimate
criticism you need to take to heart? And what will you do about it, in light of
the fact that you cannot control what AI researchers do, regardless of whether
their choices are well-founded or irrational?
I am genuinely curious how you view this, especially in light of changes you can
make, rather than changes you expect researchers to make. So far, I feel a lot of
the criticism has only harde
35 · lc · 6mo
The Nick Bostrom fiasco is instructive: never make public apologies to an
outrage machine. If Nick had just ignored whoever it was trying to blackmail
him, it would have been on them to assert the importance of a
twenty-five-year-old deliberately provocative email, and things might not have
escalated to the point of mild drama. When he tried to "get ahead of things" by issuing an
apology, he ceded that the email was in fact socially significant despite its
age, and that he did in fact have something to apologize for, and so opened
himself up to the Standard Replies that the apology is not genuine, he's
secretly evil etc. etc.
Instead, if you are ever put in this situation, just say nothing. Don't try to
defend yourself. Definitely don't volunteer for a struggle session.
Treat outrage artists like the police. You do not prevent the police from filing
charges against you by driving to the station and attempting to "explain
yourself" to detectives, or by writing and publishing a letter explaining how
sorry you are. At best you will inflate the airtime of the controversy by
responding to it, at worst you'll be creating the controversy in the first
place.
A number of years ago, when LessWrong was being revived from its old form to its
new form, I did not expect the revival to work. I said as much at the time. For
a year or two in the middle there, the results looked pretty ambiguous to me.
But by now it's clear that I was just completely wrong--I did not expect the
revival to work as well as it has to date.
Oliver Habryka in particular wins Bayes points off of me. Hooray for being right
while I was wrong, and for building something cool!
63 · evhub · 1y
This is a list of random, assorted AI safety ideas that I think somebody should
try to write up and/or work on at some point. I have a lot more than this in my
backlog, but these are some that I specifically selected to be relatively small,
single-post-sized ideas that an independent person could plausibly work on
without much oversight. That being said, I think it would be quite hard to do a
good job on any of these without at least chatting with me first—though feel
free to message me if you’d be interested.
* What would be necessary to build a good auditing game
[https://www.alignmentforum.org/posts/cQwT8asti3kyA62zc/automating-auditing-an-ambitious-concrete-technical-research]
benchmark?
* How would AI safety AI
[https://www.alignmentforum.org/posts/fYf9JAwa6BYMt8GBj/link-a-minimal-viable-product-for-alignment]
work? What is necessary for it to go well?
* How do we avoid end-to-end training while staying competitive with it? Can we
use transparency on end-to-end models to identify useful modules to train
non-end-to-end?
* What would it look like to do interpretability on end-to-end trained
probabilistic models instead of end-to-end trained neural networks?
* Suppose you had a language model that you knew was in fact a good generative
model of the world and that this property continued to hold regardless of
what you conditioned it on. Furthermore, suppose you had some prompt that
described some agent for the language model to simulate (Alice) that in
practice resulted in aligned-looking outputs. Is there a way we could use
different conditionals to get at whether or not Alice was deceptive (e.g.
prompt the model with “DeepMind develops perfect transparency tools and
provides an opportunity for deceptive models to come clean and receive a
prize before they’re discovered.”).
* Argue for the importance of ensuring that the state-of-the-art in “using AI
for alignment
[https://www.alignmentforum.org/posts/fYf9J
62 · TurnTrout · 1y
Rationality exercise: Take a set of Wikipedia articles on topics which trainees
are somewhat familiar with, and then randomly select a small number of claims to
negate (negating the immediate context as well, so that you can't just
syntactically discover which claims were negated).
For example [https://en.wikipedia.org/wiki/Developmental_psychology]:
Sometimes, trainees will be given a totally unmodified article. For brevity, the
articles can be trimmed of irrelevant sections.
Benefits:
* Addressing key rationality skills. Noticing confusion; being more confused by
fiction than fact; actually checking claims against your models of the world.
* If you fail, either the article wasn't negated skillfully ("5 people died
in 2021" -> "4 people died in 2021" is not the right kind of modification),
you don't have good models of the domain, or you didn't pay enough
attention to your confusion.
* Either of the last two is good to learn.
* Scalable across participants. Many people can learn from each modified
article.
* Scalable across time. Once a modified article has been produced, it can be
used repeatedly.
* Crowdsourcable. You can put out a bounty for good negated articles, run them
in a few control groups, and then pay based on some function of how good the
article was. Unlike original alignment research or CFAR technique mentoring,
article negation requires skills more likely to be present outside of
Rationalist circles.
I think the key challenge is that the writer must be able to match the style,
jargon, and flow of the selected articles.
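The mechanical part of preparing materials can be scripted; the negation itself is where the skill lives. A minimal sketch, assuming the third-party wikipedia package (the function and its behavior are my own illustration, not part of the original proposal): it just samples candidate sentences from an article for a human editor, or a strong language model, to negate along with their surrounding context.

```python
# Sketch of the preparation step, assuming the third-party `wikipedia` package.
# Samples candidate sentences from an article; a human editor (or an LLM) would
# then negate a few of them, rewriting nearby context so the change cannot be
# spotted purely syntactically.
import random
import wikipedia

def sample_candidate_sentences(title: str, k: int = 5, seed: int = 0) -> list[str]:
    text = wikipedia.page(title, auto_suggest=False).content
    # Crude sentence split; keep only reasonably substantive sentences.
    sentences = [s.strip() for s in text.split(". ") if len(s.split()) > 8]
    rng = random.Random(seed)
    return rng.sample(sentences, min(k, len(sentences)))

for sentence in sample_candidate_sentences("Developmental psychology"):
    print("-", sentence)
```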
61 · lc · 10mo
It is both absurd, and intolerably infuriating, just how many people on this
forum think it's acceptable to claim they have figured out how
qualia/consciousness works, and also not explain how one would go about making
my laptop experience an emotion like 'nostalgia', or present their framework for
enumerating the set of all possible qualitative experiences[1]. When it comes to
this particular subject, rationalists are like crackpot physicists with a pet
theory of everything, except rationalists go "Huh? Gravity?" when you ask them
to explain how their theory predicts gravity, and then start arguing with you
about gravity needing to be something explained by a theory of everything. You
people make me want to punch my drywall sometimes.
For the record: the purpose of having a "theory of consciousness" is so it can
tell us which blobs of matter feel particular things under which specific
circumstances, and teach others how to make new blobs of matter that feel
particular things. Down to the level of having a field of AI anaesthesiology. If
your theory of consciousness does not do this, perhaps because the sum total of
your brilliant insights are "systems feel 'things' when they're, y'know, smart,
and have goals. Like humans!", then you have embarrassingly missed the mark.
1. ^
(Including the ones not experienced by humans naturally, and/or only
accessible via narcotics, and/or involving senses humans do not have, or that
just happened not to be produced in the animal kingdom)
57 · Daniel Kokotajlo · 1y
The whiteboard in the CLR common room depicts my EA journey in meme format:
Shared with permission, a google doc exchange confirming Eliezer still finds the
arguments for alignment optimism, slower takeoffs, etc. unconvincing:
Caveat: this was a private reply I saw and wanted to share (so people know EY's
basic epistemic state, and therefore probably the state of other MIRI
leadership). This wasn't an attempt to write an adequate public response to any
of the public arguments put forward for alignment optimism or non-fast takeoff,
etc., and isn't meant to be a replacement for public, detailed, object-level
discussion. (Though I don't know when/if MIRI folks plan to produce a proper
response, and if I expected such a response soonish I'd probably have just
waited and posted that instead.)
56 · Vanessa Kosoy · 2y
Text whose primary goal is conveying information (as opposed to emotion,
experience or aesthetics) should be skimming friendly. Time is expensive, words
are cheap. Skimming is a vital mode of engaging with text, either to evaluate
whether it deserves a deeper read or to extract just the information you need.
As a reader, you should nurture your skimming skills. As a writer, you should
treat skimmers as a legitimate and important part of your target audience. Among
other things it means:
* Good title and TLDR/abstract
* Clear and useful division into sections
* Putting the high-level picture and conclusions first, the technicalities and
detailed arguments later. Never leave the reader clueless about where you’re
going with something for a long time.
* Visually emphasize the central points and make them as self-contained as
possible. For example, in the statement of mathematical theorems avoid
terminology whose definition is hidden somewhere in the bulk of the text.
51 · Buck · 2y
[this is a draft that I shared with a bunch of friends a while ago; they raised
many issues that I haven't addressed, but might address at some point in the
future]
In my opinion, and AFAICT the opinion of many alignment researchers, there are
problems with aligning superintelligent models that no alignment techniques so
far proposed are able to fix. Even if we had a full kitchen sink approach where
we’d overcome all the practical challenges of applying amplification techniques,
transparency techniques, adversarial training, and so on, I still wouldn’t feel
that confident that we’d be able to build superintelligent systems that were
competitive with unaligned ones, unless we got really lucky with some empirical
contingencies that we will have no way of checking except for just training the
superintelligence and hoping for the best.
Two examples:
* A simplified version of the hope with IDA is that we'll be able to have our
  system make decisions in a way that never has to rely on searching over
  uninterpretable spaces of cognitive policies. But this will only be
  competitive if IDA can do all the same cognitive actions that an unaligned
  system can do, which is probably false; see e.g. Inaccessible Information.
* The best we could possibly hope for with transparency techniques is: For
anything that a neural net is doing, we are able to get the best possible
human understandable explanation of what it’s doing, and what we’d have to
change in the neural net to make it do something different. But this doesn’t
help us if the neural net is doing things that rely on concepts that it’s
fundamentally impossible for humans to understand, because they’re too
complicated or alien. It seems likely to me that these concepts exist. And so
systems will be much weaker if we demand interpretability.
Even though these techniques are fundamentally limited, I think there are still
several arguments in favor of sorting out the practical details of how to
50 · davidad · 2y
I want to go a bit deep here on "maximum entropy" and misunderstandings thereof
by the straw-man Humbali character
[https://www.alignmentforum.org/posts/ax695frGJEzGxFBK4/biology-inspired-agi-timelines-the-trick-that-never-works#:~:text=Humbali%3A%20%C2%A0I%20feel,other%20people%20think%3F],
mostly to clarify things for myself, but also in the hopes that others might
find it useful. I make no claim to novelty here—I think all this ground was
covered by Jaynes (1968 [https://bayes.wustl.edu/etj/articles/prior.pdf])—but I
do have a sense that this perspective (and the measure-theoretic intuition
behind it) is not pervasive around here, the way Bayesian updating is.
First, I want to point out that entropy of a probability measure p is only
definable relative to a base measure μ, as follows:
$$H_\mu(p) = -\int_X \frac{dp}{d\mu}(x)\,\log\frac{dp}{d\mu}(x)\,d\mu(x)$$
(The derivatives notated here denote Radon-Nikodym derivatives
[https://en.wikipedia.org/wiki/Radon%E2%80%93Nikodym_theorem]; the integral is
Lebesgue [https://en.wikipedia.org/wiki/Lebesgue_integration].) Shannon's
formulae, the discrete $H(p) = -\sum_i p(x_i)\log p(x_i)$ and the
continuous $H(p) = -\int_X p(x)\log p(x)\,dx$, are the special cases of this where μ is
assumed to be counting measure or Lebesgue measure, respectively. These formulae
actually treat p as having a subtly different type than "probability measure":
namely, they treat it as a density with respect to counting measure (a
"probability mass function") or a density with respect to Lebesgue measure (a
"probability density function"), and implicitly supply the corresponding μ.
If you're familiar with Kullback–Leibler divergence
[https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence] ($D_{\mathrm{KL}}$), and
especially if you've heard $D_{\mathrm{KL}}$ called "relative entropy," you may have already
surmised that $H_\mu(p) = -D_{\mathrm{KL}}(p\,\|\,\mu)$. Usually, KL divergence is defined with both
arguments being probability measures (measures that add up to 1), but that's not
required for it to be well-defined (what is required is absolute con
49 · jimrandomh · 2y
In a comment here
[https://www.lesswrong.com/posts/Btrmh6T62tB4g9RMc/why-those-who-care-about-catastrophic-and-existential-risk?commentId=ifnD8DCqX2FFTagoq],
Eliezer observed that:
And my reply to this grew into something that I think is important enough to
make as a top-level shortform post.
It's worth noticing that this is not a universal property of high-paranoia
software development, but an unfortunate consequence of using the C
programming language and of systems programming. In most programming languages
and most application domains, crashes only rarely point to security problems.
OpenBSD is this paranoid, and needs to be this paranoid, because its
architecture is fundamentally unsound (albeit unsound in a way that all the
other operating systems born in the same era are also unsound). This presents a
number of analogies that may be useful for thinking about future AI
architectural choices.
C has a couple of operations (use-after-free, buffer-overflow, and a few
multithreading-related things) which expand false beliefs in one area of the
system into major problems in seemingly-unrelated areas. The core mechanic of
this is that, once you've corrupted a pointer or an array index, this generates
opportunities to corrupt other things. Any memory-corruption attack surface you
search through winds up yielding more opportunities to corrupt memory, in a
supercritical way, eventually yielding total control over the process
and all its communication channels. If the process is an operating system
kernel, there's nothing left to do; if it's, say, the renderer process of a web
browser, then the attacker gets to leverage its communication channels to attack
other processes, like the GPU driver and the compositor. This has the same
sub-or-supercriticality dynamic.
Some security strategies try to keep there from being any entry points into the
domain where there might be supercritically-expanding access: memory-safe
languages, linters, code reviews. C
One fairly strong belief of mine is that Less Wrong's epistemic standards are
not high enough to make solid intellectual progress here. So far my best effort
to make that argument has been in the comment thread starting here
[https://www.lesswrong.com/posts/HekjhtWesBWTQW5eF/agis-as-populations?commentId=5pmTAQrvWtoE4AWYe].
Looking back at that thread, I just noticed that a couple
[https://www.lesswrong.com/posts/HekjhtWesBWTQW5eF/agis-as-populations?commentId=5dLZGTiqAEydBhLKm]
of those comments
[https://www.lesswrong.com/posts/HekjhtWesBWTQW5eF/agis-as-populations?commentId=xGnpeNj3gdb8vHoaK]
have been downvoted to negative karma. I don't think any of my comments have
ever hit negative karma before; I find it particularly sad that the one time it
happens is when I'm trying to explain why I think this community is failing at
its key goal of cultivating better epistemics.
There's all sorts of arguments to be made here, which I don't have time to lay
out in detail. But just step back for a moment. Tens or hundreds of thousands of
academics are trying to figure out how the world works, spending their careers
putting immense effort into reading and producing and reviewing papers. Even
then, there's a massive replication crisis. And we're trying to produce reliable
answers to much harder questions by, what, writing better blog posts, and hoping
that a few of the best ideas stick? This is not what a desperate effort to find
the truth looks like.
68 · jimrandomh · 3y
I am now reasonably convinced (p>0.8) that SARS-CoV-2 originated in an
accidental laboratory escape from the Wuhan Institute of Virology.
1. If SARS-CoV-2 originated in a non-laboratory zoonotic transmission, then the
geographic location of the initial outbreak would be drawn from a distribution
which is approximately uniformly distributed over China (population-weighted);
whereas if it originated in a laboratory, the geographic location is drawn from
the commuting region of a lab studying that class of viruses, of which there is
currently only one. Wuhan has <1% of the population of China, so this is (order
of magnitude) a 100:1 update.
2. No factor other than the presence of the Wuhan Institute of Virology and
related biotech organizations distinguishes Wuhan or Hubei from the rest of
China. It is not the location of the bat-caves that SARS was found in; those are
in Yunnan. It is not the location of any previous outbreaks. It does not have
documented higher consumption of bats than the rest of China.
3. There have been publicly reported laboratory escapes of SARS twice before in
Beijing, so we know this class of virus is difficult to contain in a laboratory
setting.
4. We know that the Wuhan Institute of Virology was studying SARS-like bat
coronaviruses. As reported in the Washington Post today, US diplomats had
expressed serious concerns about the lab's safety.
5. China has adopted a policy of suppressing research into the origins of
SARS-CoV-2, which they would not have done if they expected that research to
clear them of scandal. Some Chinese officials are in a position to know.
To be clear, I don't think this was an intentional release. I don't think it was
intended for use as a bioweapon. I don't think it underwent genetic engineering
or gain-of-function research, although nothing about it conclusively rules this
out. I think the researchers had good intentions, and screwed up.
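To make the arithmetic in point 1 explicit, here is a minimal sketch (my own illustration; the prior odds are a made-up example, only the ~100:1 likelihood ratio comes from the argument above):

```python
# Sketch of the Bayes-factor arithmetic in point 1 (illustrative numbers only).
# P(first outbreak in Wuhan | zoonotic) ~ Wuhan's share of China's population;
# P(first outbreak in Wuhan | lab escape) ~ 1, since the relevant lab is in Wuhan.
p_location_given_zoonotic = 0.01  # "<1% of the population of China"
p_location_given_lab = 1.0

likelihood_ratio = p_location_given_lab / p_location_given_zoonotic  # ~100:1

prior_odds_lab = 0.1  # hypothetical prior of 1:10 against a lab escape
posterior_odds_lab = prior_odds_lab * likelihood_ratio
posterior_prob_lab = posterior_odds_lab / (1 + posterior_odds_lab)

print(f"likelihood ratio ~ {likelihood_ratio:.0f}:1")
print(f"posterior P(lab escape) under that prior ~ {posterior_prob_lab:.2f}")  # ~0.91
```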
58 · ESRogs · 3y
I've been meaning for a while to be more public about my investing, in order to
share ideas with others and get feedback. Ideally I'd like to write up my
thinking in detail, including describing what my target portfolio would be if I
was more diligent about rebalancing (or didn't have to worry about tax
planning). I haven't done either of those things. But, in order to not let the
perfect be the enemy of the good, I'll just share very roughly what my current
portfolio is.
My approximate current portfolio (note: I do not consider this to be optimal!):
* 40% TSLA
* 35% crypto -- XTZ, BTC, and ETH (and small amounts of LTC, XRP, and BCH)
* 25% startups -- Kinta AI [http://www.kinta-ai.com/], Coase [https://coa.se/],
and General Biotics [https://www.generalbiotics.com/]
* 4% diversified index funds
* 1% SQ (an exploratory investment -- there are some indications that I'd want
to bet on them, but I want to do more research. Putting in a little bit of
money forces me to start paying attention.)
* <1% FUV (another exploratory investment)
* -5% cash
Some notes:
* Once VIX comes down, I'll want to lever up a bit. Likely by increasing the
allocation to index funds (and going more short cash).
* One major way this portfolio differs from the portfolio in my heart is that
it has no exposure to Stripe. If it was easy to do, I would probably allocate
something like 5-10% to Stripe.
* I have a high risk tolerance. I think both dispositionally, and because I buy
1) the argument from Lifecycle Investing
[https://www.lesswrong.com/posts/4wL5rcS97rw58G98B/review-of-lifecycle-investing]
that young(ish) people should be something like 2x leveraged, and 2) the
argument that some EAs have made that people who plan to donate a lot should
be closer to risk neutral than they otherwise would be. (Because your
donations are a small fraction of the pool going to similar causes, so the
utility in money is much closer to linear than for money yo
43 · TurnTrout · 3y
For the last two years, typing for 5+ minutes hurt my wrists. I tried a lot of
things: shots, physical therapy, trigger-point therapy, acupuncture, massage
tools, wrist and elbow braces at night, exercises, stretches. Sometimes it got
better. Sometimes it got worse.
No Beat Saber, no lifting weights, and every time I read a damn book I would
start translating the punctuation into Dragon NaturallySpeaking syntax.
Have you ever tried dictating a math paper in LaTeX? Or dictating code? Telling
your computer "click" and waiting a few seconds while resisting the temptation
to just grab the mouse? Dictating your way through a computer science PhD?
And then.... and then, a month ago, I got fed up. What if it was all just in my
head, at this point? I'm only 25. This is ridiculous. How can it possibly take
me this long to heal such a minor injury?
I wanted my hands back - I wanted it real bad. I wanted it so bad that I did
something dirty: I made myself believe something. Well, actually, I pretended to
be a person who really, really believed his hands were fine and healing and the
pain was all psychosomatic.
And... it worked, as far as I can tell. It totally worked. I haven't dictated in
over three weeks. I play Beat Saber as much as I please. I type for hours and
hours a day with only the faintest traces of discomfort.
What?
36 · Rohin Shah · 3y
I often have the experience of being in the middle of a discussion and wanting
to reference some simple but important idea / point, but there doesn't exist any
such thing. Often my reaction is "if only there was time to write an LW post
that I can then link to in the future". So far I've just been letting these
ideas be forgotten, because it would be Yet Another Thing To Keep Track Of. I'm
now going to experiment with making subcomments here simply collecting the
ideas; perhaps other people will write posts about them at some point, if
they're even understandable.