Possible post on suspicious multidimensional pessimism:
I think MIRI people (specifically Soares and Yudkowsky but probably others too) are more pessimistic than the alignment community average on several different dimensions, both technical and non-technical: morality, civilizational response, takeoff speeds, probability of easy alignment schemes working, and our ability to usefully expand the field of alignment. Some of this is implied by technical models, and MIRI is not more pessimistic in every possible dimension, but it's still awfully suspicious.
I strongly suspect that one of the following is true:
the MIRI "optimism dial" is set too low
everyone else's "optimism dial" is set too high. (Yudkowsky has said this multiple times in different contexts)
There are common generators that I don't know about that are not just an "optimism dial", beyond MIRI's models
I'm only going to actually write this up if there is demand; the full post will have citations which are kind of annoying to find.
I really want to see the post on multidimensional pessimism.
As for why, I'd argue 1 is happening. A good example is FOOM probabilities: I think MIRI hasn't updated on the evidence that FOOM is likely impossible for classical computers, which ought to lower their FOOM probability to roughly the chance that quantum/reversible computers appear.
Another good example is the emphasis on pivotal acts like "burn all GPUs." I
think MIRI has too much probability mass on it being necessary, primarily
because I think that they are biased by fiction, where problems must be solved
by heroic acts, while in the real world more boring things are necessary. In
other words, it's too exciting, which should be suspicious.
However, that doesn't mean alignment is much easier. We can still fail; there's no rule that we make it through. It's just that MIRI is systematically irrational here regarding doom probabilities and alignment.
3hairyfigment7mo
What constitutes pessimism about morality, and why do you think that one fits
Eliezer? He certainly appears more pessimistic across a broad area, and has
hinted at concrete arguments for being so.
2Thomas Kwa7mo
Value fragility / value complexity: how close do you need to get to human values to get 50% of the value of the universe, and how complicated must the representation be? In the past there was also orthogonality, but that's now widely believed.
3Vladimir_Nesov7mo
I think the distance from human values or complexity of values is not a crux, as
web/books corpus overdetermines them in great detail (for corrigibility
purposes). It's mostly about alignment by default, whether human values in
particular can be noticed in there, or if correctly specifying how to find them
is much harder than finding some other deceptively human-value-shaped thing. If
they can be found easily once there are tools to go looking for them at all, it
doesn't matter how complex they are or how important it is to get everything
right, that happens by default.
But also there is this pervasive assumption of it being possible to formulate
values in closed form, as tractable finite data, which occasionally fuels
arguments. Like, value is said to be complex, but of finite complexity. In an open environment, this doesn't need to be the case: a code/data distinction is only salient when we can make important conclusions by looking only at code and not at data. In an open environment, data is unbounded and can't be demonstrated all at once. So it doesn't make much sense to talk about the complexity of values at all; without corrigibility, alignment can't work out anyway.
2hairyfigment7mo
See, MIRI in the past has sounded dangerously optimistic to me on that score.
While I thought EY sounded more sensible than the people pushing genetic
enhancement of humans, it's only now that I find his presence reassuring, thanks
in part to the ongoing story he's been writing. Otherwise I might be yelling at
MIRI to be more pessimistic about fragility of value, especially with regard to
people who might wind up in possession of a corrigible 'Tool AI'.
2RobertM7mo
I'd be very interested in a write-up, especially if you have receipts for
pessimism which seems to be poorly calibrated, e.g. based on evidence contrary
to prior predictions.
0the gears to ascension7mo
I think they Pascal's-mugged themselves, and being able to prove they were wrong efficiently would be helpful.
Maybe this is too tired a point, but AI safety really needs exercises-- tasks that are interesting, self-contained (not depending on 50 hours of readings), take about 2 hours, have clean solutions, and give people the feel of alignment research.
I found some of the SERI MATS application questions better than Richard Ngo's exercises for this purpose, but there still seems to be significant room for improvement. There is currently nothing smaller than ELK (which takes closer to 50 hours to develop a proposal for and properly think about it) that I can point technically minded people to and feel confident that they'll both be engaged and learn something.
If you let me know the specific MATS application questions you like, I'll
probably add them to my exercises.
(And if you let me know the specific exercises of mine you don't like, I'll
probably remove them.)
2Viliam10mo
Not sure if this is what you want, but I can imagine an exercise in Goodharting. You are given the criteria for a reward and the thing they were supposed to maximize; your task is to figure out the (least unlikely) way to score very high on the criteria without doing too well on the intended target.
For example: Goal = make the people in the call center more productive. Measure
= your salary depends on how many phone calls you handle each day. Intended
behavior = picking up the phone quickly, trying to solve the problems quickly.
Actual behavior = "accidentally" dropping phone calls after a few seconds so
that the customer has to call you again (and that counts by the metric as two
phone calls answered).
Another example: Goal = make the software developers more productive. Measure 1
= number of lines of code written. Measure 2 = number of bugs fixed.
I am proposing this because it seems to me that from a 30000 foot view, a big
part of AI alignment is how to avoid Goodharting. ("Goal = create a happy and
prosperous future for humanity. Measure = something that sounds very smart and
scientific. Actual behavior = universe converted to paperclips, GDP successfully
maximized.")
Say I need to publish an anonymous essay. If it's long enough, people could plausibly deduce my authorship based on the writing style; this is called stylometry. The only stylometry-defeating tool I can find is Anonymouth; it hasn't been updated in 7 years and it's unclear if it can defeat modern AI. Is there something better?
Are LLMs advanced enough now that you can just ask GPT-N to do style transfer?
1Tao Lin1mo
if I were doing this, I'd use gpt-4 to translate it into the style of a specific
person, preferably a deceased public figure, then edit the result. I'd guess
GPTs are better at translating to a specific style than removing style
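A minimal sketch of that route, assuming the OpenAI Python client (v1.x); the model name, target author, and `essay.txt` filename are illustrative, and the output would still need manual editing and a check against stylometry tools:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

essay = open("essay.txt").read()

# Translate *to* a named style (a deceased public figure) rather than asking
# for "no style", per the suggestion above.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "Rewrite the user's text in the prose style of Bertrand Russell, "
                    "preserving the meaning and argument structure exactly."},
        {"role": "user", "content": essay},
    ],
)

print(response.choices[0].message.content)
```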
I'm worried that "pause all AI development" is like the "defund the police" of the alignment community. I'm not convinced it's net bad because I haven't been following governance-- my current guess is neutral-- but I do see these similarities:
- It's incredibly difficult and incentive-incompatible with existing groups in power
- There are less costly, more effective steps to reduce the underlying problem, like making the field of alignment 10x larger or passing regulation to require evals
- There are some obvious negative effects; potential overhangs or greater incentives to defect in the AI case, and increased crime, including against disadvantaged groups, in the police case
- There's far more discussion than action (I'm not counting the fact that GPT5 isn't being trained yet; that's for other reasons)
- It's memetically fit, and much discussion is driven by two factors that don't advantage good policies over bad policies, and might even do the reverse. This is the toxoplasma of rage:
  - disagreement with the policy
  - (speculatively) intragroup signaling; showing your dedication to even an inefficient policy proposal proves you're part of the ingroup. I'm not 100% sure this was a large factor in "defund the police".
The obvious dis-analogy is that if the police had no funding and largely ceased to exist, a string of horrendous things would quickly occur. Murders and thefts and kidnappings and rapes and more would occur throughout every country where it happened, people would revert to tight-knit groups who had weapons to defend themselves, a lot of basic infrastructure would probably break down (e.g. would Amazon be able to pivot to get their drivers armed guards?), and much more chaos would ensue.
And if AI research paused, society would continue to basically function as it has been doing so far.
One of them seems to me like a goal that directly causes catastrophes and a breakdown of society and the other doesn't.
Fair point. Another difference is that the pause is popular! 66-69% in favor of
the pause, and 41% think AI would do more harm than good
[https://www.monmouth.edu/polling-institute/reports/monmouthpoll_us_021523/] vs
9% for more good than harm.
6quetzal_rainbow1mo
This statement begs for cost-benefit analysis.
Increasing the size of the alignment field can be efficient, but it won't be cheap. You need to train new experts in a field that doesn't have any polished, standardized educational programs and doesn't have many teachers. If you want not only to increase the number of participants in the field, but to increase the field's productivity 10x, you need an extraordinary educational effort.
Passing regulation to require evals seems like a meh idea. Nobody knows in enough detail how to make such evaluations, and every wrong idea that makes its way into law will be with us until the end of the world.
9Thomas Kwa1mo
I'd be much happier with increasing participants enough to equal 10-20% of the
field of ML than a 6 month unconditional pause, and my guess is it's less
costly. It seems like leading labs allowing other labs to catch up by 6 months
will reduce their valuations more than 20%, whereas diverting 10-20% of their
resources would reduce valuations only 10% or so.
There are currently 300 alignment researchers. If we take additional researchers equal to 10% of the 30k people who attended ICML, we get 3,000 researchers, and if they're of equal quality this is 10x the participants. I wouldn't expect alignment to go 10x faster, more like 2x with a decent educational effort. But this is in perpetuity and should speed up alignment by far more than 6 months. There's the question of getting labs to pay if they're creating most of the harms, which might be hard, though.
I'd be excited about someone doing a real cost-benefit analysis here, or
preferably coming up with better ideas. It just seems so unlikely that a 6 month
pause is close to the most efficient thing, given it destroys much of the value
of a company that has a large lead.
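A back-of-envelope version of the comparison above, with the guesses made explicit as variables (all numbers are the rough estimates from this comment, not data):

```python
# Rough cost-benefit sketch: 6-month pause vs. diverting ML researchers to alignment.
alignment_researchers = 300
icml_attendees = 30_000
recruited_fraction = 0.10                 # divert ~10% of the ML field
new_researchers = int(icml_attendees * recruited_fraction)   # 3,000
assumed_speedup = 2.0                     # not 10x, because of onboarding/mentorship limits

pause_valuation_hit = 0.20                # guess: losing a 6-month lead costs leading labs >20% of valuation
divert_valuation_hit = 0.10               # guess: diverting 10-20% of resources costs ~10%

print(f"{new_researchers} new researchers (~{new_researchers / alignment_researchers:.0f}x the current field), "
      f"assumed {assumed_speedup}x faster alignment progress in perpetuity")
print(f"estimated cost to labs: pause ~{pause_valuation_hit:.0%} of valuation "
      f"vs diverting researchers ~{divert_valuation_hit:.0%}")
```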
2TurnTrout1mo
Why does this have to be true? Can't governments just compensate existing AGI
labs for the expected commercial value of their foregone future advances due to
indefinite pause?
2Thomas Kwa24d
This seems good if it could be done. But the original proposal was just a call
for labs to individually pause their research, which seems really unlikely to
work.
Also, the level of civilizational competence required to compensate labs seems
to be higher than for other solutions. I don't think it's a common regulatory
practice to compensate existing labs like this, and it seems difficult to work
out all the details so that labs will feel adequately compensated. Plus there
might be labs that irrationally believe they're undervalued. Regulations similar
to the nuclear or aviation industry feel like a more plausible way to get
slowdown, and have the benefit that they actually incentivize safety work.
I had a long-ish conversation with John Wentworth and want to make it known that I could probably write up any of the following distillations if I invested lots of time into them (about a day (edit: 3 days seems more likely) of my time and an hour of John's). Reply if you're really interested in one of them.
1. What is the type signature of a utility function?
2. Utility functions must be defined with respect to an external world-model
3. Infinite money-pumps are not required for incoherence, and not likely in practice. The actual incoherent behavior is that an agent could get to states A_1 or A_2, identical except that A_1 has more money, and chooses A_2. Implications.
4. Why VNM is not really a coherence theorem. Other coherence theorems relating to EU maximization simultaneously derive Bayesian probabilities and utilities. VNM requires an external frequentist notion of probabilities.
I wish we had polling. Anyway if you made four individual comments, one for
each, I’d weak upvote the first and last.
2Dagon1y
1 and 2 are the same writeup, I think. Utility function maps contingent future
universe-state to a preference ranking (ordinal or cardinal, depending). This
requires a world-model because the mentally-projected future states under
consideration are always and only results of one's models.
If you/he are saying that money pumps are just one way to show incoherence, but not the only way, I'd enjoy a writeup of other ways.
I'd also enjoy a writeup of #4 - I'm curious if it's just a directionality argument (VNM assumes coherence, rather than being about it), or if there are more subtle differences.
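A toy illustration (my own, not John's formulation) of the money-pump-free incoherence in item 3: an agent with cyclic preferences that pays a small fee for each trade it prefers ends up holding its original item with strictly less money, i.e. it steers itself into a state identical to an available one except poorer, with no infinite pump required.

```python
# Cyclic preferences A > B > C > A; the agent pays a small fee for any trade
# it prefers. After one loop it holds the same item with less money.
prefers = {("A", "B"): "A", ("B", "C"): "B", ("C", "A"): "C"}

def prefers_over(x, y):
    """True if the agent prefers item x to item y."""
    if (x, y) in prefers:
        return prefers[(x, y)] == x
    return prefers[(y, x)] == x

item, money = "C", 10.0
fee = 0.5
for offered in ["B", "A", "C"]:          # a trader offers each "upgrade" in turn
    if prefers_over(offered, item):      # the agent accepts any trade up its cycle
        item, money = offered, money - fee

print(item, money)  # back to item "C" with 8.5 < 10.0: same item, strictly less money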
Below is a list of powerful optimizers ranked on properties, as part of a brainstorm on whether there's a simple core of consequentialism that excludes corrigibility. I think that AlphaZero is a moderately strong argument that there is a simple core of consequentialism which includes inner search.
Properties
- Simple: takes less than 10 KB of code. If something is already made of agents (markets and the US government) I marked it as N/A.
- Coherent: approximately maximizing a utility function most of the time. There are other definitions:
  - Not being money-pumped
  - Nate Soares's notion in the MIRI dialogues: having all your actions point towards a single goal
  - John Wentworth's setup of Optimization at a Distance
- Adversarially coherent: something like "appears coherent to weaker optimizers" or "robust to perturbations by weaker optimizers". This implies that it's incorrigible.
  - Sufficiently optimized agents appear coherent - Arbital
  - will achieve high utility even when "disrupted" by an optimizer somewhat less powerful
- Search+WM: operates by explicitly ranking plans within a world-model. Evolution is a search process, but doesn't have a world-model. The contact with the territory it gets comes from
Suppose that humans invent nanobots that can only eat feldspars (41% of the earth's continental crust). The nanobots:
Does this cause human extinction? If so, by what mechanism?
One of the obvious first problems is that pretty much every mountain and most of the hills in the world will experience increasingly frequent landslides as much of their structural strength is eaten, releasing huge plumes of dust that blot out the sun and stay in the atmosphere. Continental shelves collapse into the oceans, causing tsunamis, and the oceans fill with the suspended nanobot dust.
Biological photosynthesis pretty much ceases, and the mean surface temperature
drops below freezing as most of the sunlight power is intercepted in the
atmosphere and redirected through the dust to below the surface where half the
rocks are being turned into more dust.
If the bots are efficient with their use of solar power this could start
happening within weeks, far too fast for humans to do anything to preserve their
civilization. Almost all concrete contains at least moderate amounts of
feldspars, so a large fraction of the structures in the world collapse when
their foundations rot away beneath them.
Most of the people probably die by choking on the dust while the remainder
freeze or die of thirst, whichever comes first in their local situation.
2Dagon9mo
It's hard to imagine these constraints actually holding up well, or the unstated constraint that the ability to make nanobots is limited to this one type.
My actual prediction depends a whole lot on timeframes - how fast do they replicate, how long to dust-ify all the feldspar. If it's slow enough (millennia), probably no real harm - the dust re-solidifies into something else, or gets into an equilibrium where it's settling and compressing as fast as the nanos can dustify it. Also, humans have plenty of time to adapt and engineer workarounds to any climate or other changes.
If they replicate fast, over the course of weeks, it's probably an extinction
event for all of earth life. Dust shuts out the sun, all surface features are
undermined and collapse, everything is dead and even the things that survive
don't have enough of a cycle to continue very long.
Antifreeze proteins prevent water inside organisms from freezing, allowing them to survive at temperatures below 0 °C. They do this by actually binding to tiny ice crystals and preventing them from growing further, basically keeping the water in a supercooled state. I think this is fascinating.
Is it possible for there to be nanomachine enzymes (not made of proteins, because they would denature) that bind to tiny gas bubbles in solution and prevent water from boiling above 100 °C?
Is there a well-defined impact measure to use that's in between counterfactual value and Shapley value, to use when others' actions are partially correlated with yours?
I'm planning to write a post called "Heavy-tailed error implies hackable proxy". The idea is that when you care about $V$ and are optimizing for a proxy $U = V + X$, Goodhart's Law sometimes implies that optimizing hard enough for $U$ causes $V$ to stop increasing.
A large part of the post would be proofs about what the distributions of $X$ and $V$ must be for $\lim_{t \to \infty} E[V \mid V + X > t] = 0$, where $X$ and $V$ are independent random variables with mean zero. It's clear that
- $X$ must be heavy-tailed (or long-tailed or something similar)
Doesn't answer your question, but we also came across this effect in the RM Goodharting work, though instead of figuring out the details we only proved that when it's definitely not heavy-tailed it's monotonic, for Regressional Goodhart (https://arxiv.org/pdf/2210.10760.pdf#page=17 [https://arxiv.org/pdf/2210.10760.pdf#page=17]). Jacob probably has more detailed takes on this than me.
In any event my intuition is this seems unlikely to be the main reason for
overoptimization - I think it's much more likely that it's Extremal Goodhart or
some other thing where the noise is not independent
3Arthur Conmy1mo
Is bullet point one true, or is there a condition that I'm not assuming? E.g. if
$V$ is the constant $0$ random variable and $X$ is $N(0, 1)$ then the limit
result holds, but a Gaussian is neither heavy- nor long-tailed
[https://en.wikipedia.org/wiki/Heavy-tailed_distribution#Definitions].
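A quick Monte Carlo check of the limit claim (and of the Gaussian case discussed above), assuming for illustration that $V \sim N(0,1)$ and taking a Student-t distribution with 2 degrees of freedom as the heavy-tailed error:

```python
import numpy as np

# Estimate E[V | V + X > t] for increasing t. With Gaussian error the conditional
# mean keeps growing with t (optimizing the proxy keeps buying true value); with a
# heavy-tailed error the extreme proxy values are mostly error, so for large enough
# t the conditional mean stops tracking t and returns toward E[V] = 0.
rng = np.random.default_rng(0)
n = 10_000_000
V = rng.normal(0.0, 1.0, n)

for name, X in [("Gaussian X", rng.normal(0.0, 1.0, n)),
                ("Student-t(2) X", rng.standard_t(2, n))]:
    S = V + X
    for t in [0, 2, 4, 6]:
        mask = S > t
        if mask.sum() >= 50:  # skip estimates from tiny conditional samples
            print(f"{name}, t={t}: E[V | V+X>t] ~ {V[mask].mean():.2f}  (n={mask.sum()})")
```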
The most efficient form of practice is generally to address one's weaknesses. Why, then, don't chess/Go players train by playing against engines optimized for this? I can imagine three types of engines:
1. Trained to play more human-like sound moves (soundness as measured by stronger engines like Stockfish, AlphaZero).
2. Trained to play less human-like sound moves.
3. Trained to win against (real or simulated) humans while making unsound moves.
The first tool would simply be an opponent when humans are inconvenient or not available. The second and third tools wo
Someone happened to ask a question on Stack Exchange about engines trained to play less human-like sound moves. The question is here
[https://chess.stackexchange.com/q/29435/21886], but most of the answerers don't
seem to understand the question.
The author of "Where Is My Flying Car" says that the Feynman Program (teching up to nanotechnology by machining miniaturized parts, which are assembled into the tools for micro-scale machining, which are assembled into tools for yet smaller machining, etc) might be technically feasible and the only reason we don't have it is that no one's tried it yet. But this seems a bit crazy for the following reasons:
- The author doesn't seem like a domain expert
- AFAIK this particular method of nanotechnology was just an idea Feynman had in the famous speech and not a
I'm not a domain expert in micromachines, but have studied at least miniature
machines as part of a previous job.
One very big problem is volume. Once you get down below tonne scale, making and
assembling small parts with fine tolerances is not really any less expensive
than making and assembling larger parts with comparatively the same tolerances.
That is, each one-gram machine made of a thousand parts probably won't cost you
any less than a hundred-kilogram machine made of a thousand parts. It will
almost certainly cost more, since it will require new techniques to make,
assemble, and operate at the smaller scale. The cost of maintenance per machine
almost certainly goes up since there are more layers of indirection in diagnosis
and rectification of problems.
So this doesn't scale down at all: attention is a limiting factor. With advanced
extrapolations from current techniques, maybe we could eventually make nanogram
robot arms for merely the same cost as hundred kilogram robot arms. That doesn't
help much if each one costs $10,000 and needs maintenance every few weeks. We
need some way to make a trillion of them for $10k, and for them to do what we
want without any individual attention at all.
5Gunnar_Zarncke2y
Seems like the key claim:
Can you give any hint why that is or could be?
3JBlack2y
I wasn't ever involved with manufacture of the individual parts, so I don't have
direct experience.
I suspect it's just that as you go smaller, material costs become negligible
compared with process costs. Process costs don't change much, because you still
need humans to oversee the machines carrying out the processes, and there are
similar numbers of processes with as many steps involved no matter how large or
small the parts are. The processes themselves might be different, because some
just can't scale down below a certain size for physics reasons, but it doesn't
get easier at smaller scales.
Also, direct human labour still plays a fairly crucial role in most processes.
There are (so far) always some things to be done where human capabilities exceed
those of any machine we can build at reasonable cost.
2ChristianKl2y
Wikipedia [https://en.wikipedia.org/wiki/J._Storrs_Hall] describes the author as saying:
What do you mean by "domain expert" such that it doesn't count him as being one?
3Thomas Kwa2y
I think a MEMS engineer would be better suited to evaluate whether the
engineering problems are feasible than a computer scientist / futurist author.
Maybe futurists could outdo ML engineers on AI forecasting. But I think the
author doesn't have nearly as detailed an inside view about nanotech as
futurists on AI. There's no good answer in the book to the "attention
bottleneck" objection JBlack just made, and no good story for why the market is
so inefficient.
These are all ideas of the form "If we could make fully general nanotechnology,
then we could do X". Gives me the same vibe as this
[https://www.google.com/books/edition/Surely_You_re_Joking_Mr_Feynman_Adventur/_gA_DwAAQBAJ?hl=en&gbpv=1&dq=feynman+patents+zoom&pg=PT182&printsec=frontcover].
Saying "nuclear reactor. . . you have hydrogen go through the thing. . . Zoom!
it's a rocket" doesn't mean you can evaluate whether a nuclear reactor is
feasible at 194X tech level, and thinking of the utility fog doesn't mean you
can evaluate whether MEMS can be developed into general nanotech at 202X tech
level.
3gjm2y
I can't comment on what JBlack means by "domain expert", but looking at that
list of things about Hall, what I see is:
* "Involved in", which means nothing.
* Founded and moderated a newsgroup: requires no particular domain expertise.
* Founding chief scientist of Nanorex Inc for two years. I can't find any
evidence that Nanorex ever produced anything other than a piece of software
that claimed to do molecular dynamics suitable for simulating nanotech.
Whether it was actually any good, I have no idea, but the company seems not
to have survived. Depending on what exactly the responsibilities of the
"founding chief scientist" are, this could be evidence that Hall understands
a lot about molecular dynamics, or evidence that Hall is a good software
developer, or evidence of nothing at all. In the absence of more information
about Nanorex and their product, it doesn't tell us much.
* Has written several papers on nanotechnology: anyone can write a paper. A
quick look for papers he's written turns up some abstracts, all of which seem
like high-level "here's a concept that may be useful for nanotech" ones. Such
a paper could be very valuable and demonstrate deep insight, but the test of
that would be actually turning out to be useful for nanotech and so far as I
can tell his ideas haven't led to anything much.
* Developed ideas such as utility fog, space pier, etc.: again, anyone can
"develop ideas". The best test of the idea-developer's insight is whether
those ideas turn out actually to be of any use. So far, we don't seem close
to having utility fog, space piers, weather control or flying cars.
* Author of "Nanofuture": pop-science book, which from descriptions I've read
seems mostly to be broad general principles about nanotech that doesn't exist
yet, and exciting speculations about future nanotech that doesn't exist yet.
* Fellow of a couple of things: without knowing exactly what their criteria are
for
4Gunnar_Zarncke2y
I guess very few people live up to your requirements for domain expertise.
In nanotech? True enough, because I am not convinced that there is any domain expertise in the sort of nanotech Storrs Hall writes about. It seems like a field that consists mostly of advertising. (There is genuine science and genuine engineering in nano-stuff; for instance, MEMS really is a thing. But the sort of "let's build teeny-tiny mechanical devices, designed and built at the molecular level, which will be able to do amazing things previously-existing tech can't" that Storrs Hall has advocated seems not to have panned out.)
But more generally, that isn't so at all. What I'm looking for by way of domain expertise in a technological field is a history of demonstrated technological achievements. Storrs Hall has one such achievement that I can see, and even that is doubtful. (He founded and was "chief scientist" of a company that made software for simulating molecular dynamics. I am not in a position to tell either how well the software actually worked or how much of it was JSH's doing.) More generally, I want to see a history of demonstrated difficult accomplishments in the field, as opposed to merely writing about the field.
Selecting some random books from my shelves (literally...
Thank you for this comprehensive answer. I like the requirement of "actual
practical accomplishments in the field".
Googling a bit I found this article on miniaturization:
https://www.designnews.com/miniaturization-not-just-electronics-anymore
[https://www.designnews.com/miniaturization-not-just-electronics-anymore]
Would you consider the cited Thomas L. Hicks from American Laubscher a domain
expert?
4gjm2y
He certainly looks like one to my (itself rather inexpert) eye.
Is it possible to make an hourglass that measures different amounts of time in one direction than the other? Say, 25 minutes right-side up, and 5 minutes upside down, for pomodoros. Moving parts are okay (flaps that close by gravity or something) but it should not take additional effort to flip.
I don't see why this wouldn't be possible? It seems pretty straightforward to me; the only hard part would be the thing that seems hard about making any hourglass, which is getting it to take the right amount of time, but that's a problem hourglass manufacturers have already solved. It's just a valve that doesn't close all the way:
Unless you meant, "how can I make such an hourglass myself, out of things I have at home?" in which case, idk bro.
One question I have about both your solution and mine is how easy it is to vary
the time drastically by changing the size of the hole. My intuition says that
too large holes behave much differently than smaller holes and if you want a
drastic 5x difference in speed you might get into this "too large and the sand
sort of just rushes through" behavior.
4effective-egret2y
While I'm sure there's a mechanical solution, my preferred solution (in terms of
implementation time) would be to simply buy two hourglasses - one that measures
25 minutes and one that measures 5 minutes - and alternate between them.
2Gunnar_Zarncke2y
Or just bundle them together like this:
https://www.amazon.de/Bredemeijer-B0011-Classic-Teatimer-Edelstahl/dp/B00SN5U5E0/
4Matt Goldenberg2y
First thought is to have two separate holes of slightly different sizes, each
one blocked by a little angled platform from one direction. I am not at all
confident you could get this to work in practice
Ideally, one would be able to type in e.g. "growth mindset" or a link to Dweck's original research, and see:
a statement of the idea e.g. 'When "students believe their basic abilities, their intelligence, their talents, are just fixed traits", they underperform students who "understand that their talents and abilities can be developed through effort, good teaching and persistence." Carol Dweck initially studied
Coherence implies mutual information between actions. That is, to be coherent, your actions can't be independent. This is true under several different definitions of coherence, and can be seen in the following circumstances:
- When trading between resources (uncertainty over utility function). If you trade 3 apples for 2 bananas, this is information that you won't trade 3 bananas for 2 apples, if there's some prior distribution over your utility function.
- When taking multiple actions from the same utility function (u
The nice thing is that this should work even if you are a policy selected by a
decision making algorithm, but you are not yourself a decision making algorithm
anymore. There is no preference in any of the possible runs of the policy at
that point, you don't care about anything now, you only know what you must do
here, and not elsewhere. But if all possible runs of the policy are considered
altogether (in the updateless sense of maps from epistemic situations to action
and future policy), the preference is there, in the shape of the whole thing
across all epistemic counterfactuals. (Basically you reassemble a function from
pairs (from, to) of things it maps, found in individual situations.)
I guess the at-a-distance part could make use of composition of an agent with
some of its outer shells into a behavior that forgets internal interactions
(within the agent, and between the agent and its proximate environment). The
resulting "large agent" will still have basically the same preference, with
respect to distant targets in environment, without a need to look inside the
small agent's head, if the large agent's external actions in a sufficient range
of epistemic situations can be modeled. (These large agents exist in each
individual possible situation, they are larger than the small agent within the
situation, and they can be compared with other variants of the large agent from
different possible situations.)
Not clear what to do with dependence on the epistemic situation of the small
agent. It wants to reduce to dependence on a situation in terms of the large
agent, but that doesn't seem to work. Possibly this needs something like the
telephone theorem, with any relevant-in-some-sense dependence of behavior (of
the large agent) on something becoming dependence of behavior on natural
external observations (of the large agent) and not on internal noise (or
epistemic state of the small agent).
Many people think that AI alignment is intractable (<50% chance of success) and also believe that a universe optimized towards elephant CEV, or the CEV of aliens that had a similar evolutionary environment to humans, would be at least 50% as good as a universe optimized towards human CEV. Doesn't this mean we should be spending significant effort (say, at least 1% of the effort spent on alignment) finding tractable plans to create a successor species in case alignment fails?
If alignment fails I don’t think it’s possible to safely prepare a successor
species. We could maybe try to destroy the earth slightly before the AI turns on
rather than slightly after, in the hopes that the aliens don’t screw up the
chance we give them?
Are there ring species where the first and last populations actually can interbreed? What evolutionary process could feasibly create one?
One of my professors says this often happens with circular island chains;
populations from any two adjacent islands can interbreed, but not those from
islands farther apart. I don't have a source. Presumably this doesn't require an
expanding geographic barrier.
2Richard_Kennaway3y
Wouldn't that just be a species?
2Pattern3y
Ouroboros species.
2Thomas Kwa3y
I'm thinking of a situation where there are subspecies A through (say) H; A can
interbreed with B, B with C, etc., and H with A, but no non-adjacent subspecies
can produce fertile offspring.
2Pongo3y
A population distributed around a small geographic barrier that grew over time
could produce what you want
2.5 million jobs were created in May 2020, according to the jobs report. Metaculus was something like [99.5% or 99.7% confident](https://www.metaculus.com/questions/4184/what-will-the-may-2020-us-nonfarm-payrolls-figure-be/) that the number would be smaller, with the median at -11.0 million and the 99th percentile at -2.8 million. This seems like an obvious sign Metaculus is miscalibrated, but we have to consider both tails, making this merely a 1 in 100 or 1 in 150 event, which doesn't seem too bad.
Eliezer Yudkowsky wrote in 2016:
At an early singularity summit, Jürgen Schmidhuber, who did some of the pioneering work on self-modifying agents that preserve their own utility functions with his Gödel machine, also solved the friendly AI problem. Yes, he came up with the one true utility function that is all you need to program into AGIs!
(For God’s sake, don’t try doing this yourselves. Everyone does it. They all come up with different utility functions. It’s always horrible.)
His one true utility function was “increasing the compression of environ
Something like Goodhart's Law, I suppose. There are natural situations where X
is associated with something good, but literally maximizing X is actually quite
bad. (Having more gold would be nice. Converting the entire universe into atoms
of gold, not necessarily so.)
EY has practiced the skill of trying to see things like a machine. When people
talk about "maximizing X", they usually mean "trying to increase X in a way that
proves my point"; i.e. they use motivated thinking.
Whatever X you take, the priors are almost 100% that literally maximizing X
would be horrible. That includes the usual applause lights, whether they appeal
to normies or nerds.
What was the equation for research progress referenced in Ars Longa, Vita Brevis?
“Then we will talk this over, though rightfully it should be an equation. The first term is the speed at which a student can absorb already-discovered architectural knowledge. The second term is the speed at which a master can discover new knowledge. The third term represents the degree to which one must already be on the frontier of knowledge to make new discoveries; at zero, everyone discovers equally regardless of what they already know; at one, one must have mastered every
I don't think Scott had a specific concrete equation in mind. (I don't know of
any myself, and Scott would likely have referenced or written it up on SSC/ACX
by now if he had one in mind.) However, conceptually, it's just a variation on
the rocket equation [https://en.wikipedia.org/wiki/Tsiolkovsky_rocket_equation]
or jeep problem [https://en.wikipedia.org/wiki/Jeep_problem], I think.
Showerthought: what's the simplest way to tell that the human body is less than 50% efficient at converting chemical energy to mechanical work via running? I think it's that running uphill makes you warmer than running downhill at the same speed.
When running up a hill at mechanical power p and efficiency f, you have to exert p/f total power and so p(1/f - 1) is dissipated as heat. When running down the hill you convert p to heat. p(1/f - 1) > p implies that f < 0.5.
Maybe this story is wrong somehow. I'm pretty sure your body has no way of recovering your potential energy on the way down; I'd expect most of the waste heat to go in your joints and muscles but maybe some of it goes into your shoes.
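Restating that arithmetic, under the simplifying assumption that exerted chemical power goes only into mechanical work against gravity and heat (ignoring air resistance and any extra braking work on the descent):

```latex
\begin{align*}
\text{uphill heat rate}   &= \tfrac{p}{f} - p = p\left(\tfrac{1}{f}-1\right) \\
\text{downhill heat rate} &\approx p \quad \text{(the potential energy is not recovered)} \\
\text{uphill warmer} &\;\Rightarrow\; p\left(\tfrac{1}{f}-1\right) > p
  \;\Rightarrow\; \tfrac{1}{f} > 2 \;\Rightarrow\; f < \tfrac{1}{2}
\end{align*}
```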
Are there approximate versions of the selection theorems? I haven't seen anyone talk about them, but they might be easy to prove.
- Approximate version of the Kelly criterion: any agent that follows a strategy different by at least epsilon from Kelly betting will almost surely lose money compared to a Kelly-betting agent at a rate f(epsilon)
- Approximate version of VNM: any agent that satisfies some weakened version of the VNM axioms will have high likelihood under Boltzmann rationality (or some other metric of approximate utility maximization). The closest thing I'...
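For the Kelly bullet, here is a sketch of the kind of quantity f(epsilon) could be in the simplest setting of repeated even-money bets won with probability p (my illustration, not an established theorem statement): the expected log-growth penalty for betting a fraction epsilon away from the Kelly fraction, which the off-Kelly agent almost surely pays every round.

```python
import math

def growth_rate(x, p):
    """Expected log-growth per round when betting fraction x of wealth on an
    even-money bet that wins with probability p."""
    return p * math.log(1 + x) + (1 - p) * math.log(1 - x)

p = 0.6
kelly = 2 * p - 1  # optimal fraction, 0.2 here
for eps in [0.05, 0.10, 0.20]:
    penalty = growth_rate(kelly, p) - growth_rate(kelly + eps, p)
    print(f"eps = {eps:.2f}: log-growth penalty f(eps) ~ {penalty:.4f} per bet")
```

The penalty is roughly quadratic in epsilon near the optimum, which is the shape an approximate selection theorem here would presumably have to respect.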
Is there somewhere I can find a graph of the number of AI alignment researchers vs AI capabilities researchers over time, from say 2005 to the present day?
Is there software that would let me automatically switch between microphones on my computer when I put on my headset?
I imagine this might work as a piece of software that integrates all microphones connected to my computer into a single input device, then transmits the audio stream from the best-quality source.
A partial solution would be something that automatically switches to the headset microphone when I switch to the headset speakers.
Depending on connection method for your headset, you might be able to just use a
simple switch. Mine is USB, and https://smile.amazon.com/dp/B00JX1ZS5O
[https://smile.amazon.com/dp/B00JX1ZS5O] lets me just leave it disconnected when
not in use. My Windows box uses the speakers when it's disconnected (I don't
have a separate mic, but I expect it would work the same), and switches output
and input to the headset when connected. I've seen similar switchers for 3.5mm
audio connectors - I have no doubt they'd work for microphone instead of
speaker, but I don't know any that combine them.
1Thomas Kwa2y
Thanks. I tried a couple different switches on my setup (3.5mm through USB-C
hub), and the computer didn't disconnect upon opening the switch, so I'm giving
up on this until I change hardware.