# All of cousin_it's Comments + Replies

AI Training Should Allow Opt-Out

Then people should be asked before the fact: "if you upload code to our website, we can use it to train ML models and use them for commercial purposes, are you ok with that?" If people get opted into this kind of thing silently by default, that's nasty and might even be sue-worthy.

AI Training Should Allow Opt-Out

Mechanically, an opt-out would be very easy to implement in software. One could essentially just put a line saying

I'm not sure it's so easy. Copilot is a neural network trained on a large dataset. Making it act as if a certain piece of data wasn't in the training set requires retraining it, and it needs to happen every time someone opts out.

3 · Aleksey Bykhun · 8d
I think opt-out should only be possible on first publish, same as how e.g. GPLv3 works: once you publish, you cannot reclaim your rights.
Comment reply: my low-quality thoughts on why CFAR didn't get farther with a "real/efficacious art of rationality"

At some point I hoped that CFAR would come up with "rationality trials", toy challenges that are difficult to game and transfer well to some subset of real world situations. Something like boxing, or solving math problems, but a new entry in that category.

IMO standardized tests of this form are hard; I was going to say "mainstream academia hasn't done much better" but Stanovich published something in 2016 that I'm guessing no one at CFAR has read (except maybe Dan?). I am not aware of any sustained research attempts on CFAR's part to do this. [My sense is lots of people looked at it for a little bit, thought "this is hard", and then dug in ground that seemed more promising.]

I think there are more labor-intensive / less clean alternatives that could have worked. We could have, say, just made the equivalent o... (read more)

AGI Safety FAQ / all-dumb-questions-allowed thread

Without nanotech or anything like that, maybe the easiest way is to manipulate humans into building lots of powerful and hackable weapons (or just wait since we're doing it anyway). Then one day, strike.

Edit: and of course the AI's first action will be to covertly take over the internet, because the biggest danger to the AI is another AI already existing or being about to appear. It's worth taking a small risk of being detected by humans to prevent the bigger risk of being outraced by a competitor.

Grabby Animals: Observation-selection effects favor the hypothesis that UAP are animals which consist of the “field-matter”

How can photonics work without matter? I thought the problem was that you couldn't make a switch, because light waves just pass through each other (the equations are linear, so the sum of two valid waves is also a valid wave).

2 · avturchin · 1mo
I have an intuition that complex knots in a magnetic field may be relatively stable, and I linked an article that explores the topic of such knots. There are over 100 ideas about the physics of ball lightning, and some of them explore quite exotic types of matter. But the nature of such matter was not my central argument: I am more interested in the population dynamics, assuming that such matter is possible.
What Is a Major Chord?

Sethares' theory is very nice: we don't hear "these two frequencies have a simple ratio", we hear "their overtones align". But I'm not sure it is the whole story.

If you play a bunch of sine waves in ratios 1:2:3:4:5, it will sound to you like a single note. That perceptual fusion cannot be based on aligning overtones, because sine waves don't have overtones. Moreover, if you play 2:3:4:5, your mind will sometimes supply the missing 1, that's known as "missing fundamental". And if you play some sine waves slightly shifted from 1:2:3:4:5, you'll notice the i... (read more)
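One way to see why the mind can supply the missing 1: any sum of sines at 2:3:4:5 times a base frequency f repeats with period exactly 1/f, so the periodicity of the absent fundamental is physically present in the waveform. A minimal numpy sketch (the 100 Hz base and the sample rate are arbitrary choices of mine):

```python
import numpy as np

sr = 10_000                      # sample rate (Hz)
t = np.arange(2 * sr) / sr       # two seconds of samples
# Partials at 2:3:4:5 times a (missing) 100 Hz fundamental.
x = sum(np.sin(2 * np.pi * f * t) for f in (200, 300, 400, 500))

period = sr // 100               # 100 Hz fundamental -> 100-sample period
# The waveform repeats at the period of the absent fundamental.
print(np.allclose(x[:sr], x[period:period + sr]))  # True
```

So even though no energy is present at 100 Hz, the signal's period matches a 100 Hz tone, which is one candidate cue for the missing-fundamental percept.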

2 · gjm · 2mo
Just making something explicit that I think I missed for a minute when reading your comment: the point isn't "Sethares doesn't explain how our ears/brains determine what's one note and what's more, so his theory is incomplete" (his theory isn't trying to be a theory of that) but "our ears/brains seem to determine what's one note and what's more by doing something like looking for simple integer frequency multiples, and if there's a mechanism for that it seems likely that it's also involved in determining what combinations of tones sound good to us". I think there's something to that. Here are two things that seem like they push the other way:

* On the face of it, this indicates machinery for identifying integer multiples, not necessarily rational ratios more generally. (Though maybe the missing-fundamental phenomenon suggests otherwise.)
* Suppose you hear a violin and a flute playing the same note. You probably will not hear them as a single instrument. I think that whatever magic our ears/brains do to figure out what's one instrument and what's several also involves things like the exact times when spectral components appear and disappear, which spectral components appear to be fluctuating together (in frequency or amplitude or both), and maybe even fitting spectral patterns to those of instruments we're used to hearing. (I suspect there's a pile of research on this. I haven't looked.) The more other things we use for that, the less confident we can be that integer-frequency-ratio identification is part of it.
* Interesting experiment which I am too lazy to try: pick two frequencies with a highly irrational ratio, construct their harmonic series, and split each harmonic series into two groups. So we have A1 and A2 (splitting up the spectrum of note A) and B1 and B2 (splitting up the spectrum of note B). Now construct a sound built out of all those components -- but make A1 and B1 match closely in details of timing,
2 · jefftk · 2mo
The way I would explain this is that when hearing real sounds, it is very common that you hear a frequency and its harmonics. Almost all the time, if you hear 1:2:3:4:5 etc., that is because a single note just sounded. So, if you hear a bunch of sine waves in that ratio (e.g. a determined group of people whistling), it sounds like one note.
Whence the determinant?

Hmm, this seems wrong but fixable. Namely, exp(A) is close to (I+A/n)^n, so raising both sides of det(exp(A))=exp(tr(A)) to the power of 1/n gives something like what we want. Still a bit too algebraic though, I wonder if we can do better.
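For readers who want to verify det(exp(A)) = exp(tr(A)) numerically, here is a minimal check using a truncated Taylor series for the matrix exponential (the 30-term cutoff is an arbitrary choice of mine, ample for small matrices):

```python
import numpy as np

def expm(a, terms=30):
    """Matrix exponential via the truncated Taylor series sum of A^k / k!."""
    result = np.eye(len(a))
    power = np.eye(len(a))
    for k in range(1, terms):
        power = power @ a / k      # power is now A^k / k!
        result = result + power
    return result

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 4))
lhs = np.linalg.det(expm(a))
rhs = np.exp(np.trace(a))
print(np.isclose(lhs, rhs))  # True
```

The identity holds for any square matrix, which is also why det(exp(A)) is always positive: it equals exp of a real number.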

2 · Oscar_Cunningham · 4mo
Another thing to say is: if A(0) = I then
Whence the determinant?

Interesting, can you give a simple geometric explanation?

2 · Oscar_Cunningham · 4mo
My intuition for exp is that it tells you how an infinitesimal change accumulates over finite time (think compound interest). So the above expression is equivalent to det(I + εA) = 1 + ε·tr(A) + O(ε²). Thus we should think: 'If I perturb the identity matrix, then the amount by which the unit cube grows is proportional to the extent to which each vector is being stretched in the direction it was already pointing'.
Whence the determinant?

Yup, determinant is how much the volume stretches. And trace is how much the vectors stay pointing in the same direction (average dot product of v and Av). This explains why trace of 90 degree rotation in 2D space is zero, why trace of projection onto a subspace is the dimension of that subspace, and so on.
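These claims are easy to check numerically: trace is the sum of eᵢ·Aeᵢ over a basis, so a 90° rotation contributes nothing and a projection contributes 1 per preserved dimension, and the perturbation formula det(I + εA) ≈ 1 + ε·tr(A) holds for small ε. A quick sketch (the example matrices are arbitrary choices of mine):

```python
import numpy as np

# 90-degree rotation in 2D: every vector ends up orthogonal to where
# it started, so nothing "keeps pointing the same way".
rot = np.array([[0.0, -1.0], [1.0, 0.0]])
print(np.trace(rot))  # 0.0

# Projection onto the xy-plane inside 3D: trace equals the subspace dim.
proj = np.diag([1.0, 1.0, 0.0])
print(np.trace(proj))  # 2.0

# det(I + eps*A) is close to 1 + eps*tr(A) for small eps.
a = np.array([[2.0, 1.0], [0.5, -1.0]])
eps = 1e-6
lhs = np.linalg.det(np.eye(2) + eps * a)
rhs = 1 + eps * np.trace(a)
print(np.isclose(lhs, rhs))  # True
```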

2 · Oscar_Cunningham · 4mo
Thank you for that intuition into the trace! That also helps make sense of det(exp(A)) = exp(tr(A)).
Bear Surprise Freedom Network

Very cool that you're thinking about this. I've been in a bit of a funk since the news about Cogent, Lumen and LINX. It's good to hear that not everyone in the West subscribes to "bolt the door from outside".

Ukraine Post #2: Options

Right now there's indeed an exodus of young qualified people from Russia. The easiest path goes to countries that are visa-free for Russians, like Armenia or Argentina.

Why is the war in Ukraine particularly dangerous for the world?

Ukrainians have wanted to join the EU for years, it was one of the main points of the Euromaidan. Most in the EU were lukewarm to it, but now because of the war there are huge pro-Ukraine demonstrations in every European capital.

Why is the war in Ukraine particularly dangerous for the world?

If everyone acts rationally, the result will be Ukraine growing closer to the EU, Russia becoming more isolated, and no WW3. But Russia isn't acting rationally, I'm losing count of distinct stupid things it has done since Feb 21. Extrapolating that stupidity into the future makes me think that WW3 is quite possible.

2 · rhollerith_dot_com · 4mo
I am curious why you think that. To avoid provoking the Soviet Union, Finland and Austria refrained from joining the EU till 5 years after East Germany joined, and they still are not (and never were) members of NATO. Nothing bad happened to Finland or Austria that is a quarter as bad as what is happening to Ukraine now.
Why I'm co-founding Aligned AI

Can you describe what changed / what made you start feeling that the problem is solvable / what your new attack is, in short?

Firstly, because the problem feels central to AI alignment, in the way that other approaches didn't. So making progress in this is making general AI alignment progress; there won't be such a "one error detected and all the work is useless" problem. Secondly, we've had success generating some key concepts, implying the problem is ripe for further progress.

This feels like a key detail that's lacking from this post. I actually downvoted this post because I have no idea whether I should be excited about this development or not. I'm pretty familiar with Stuart's work over the years, so I'd be fairly surprised if there's something big here.

Might help if I put this another way. I'd be purely +1 on this project if it was just "hey, I think I've got some good ideas AND I have an idea about why it's valuable to operationalize them as a business, so I'm going to do that". Sounds great. However, the bit about "AND I think I k... (read more)

Acoustic vs Electric Mandolin

I think the acoustic has a better sound, but the electric one has more groove.

Defending One-Dimensional Ethics

“You’re scratching your own moral-seeming itches. You’re making yourself feel good. You’re paying down imagined debts that you think you owe, you’re being partial toward people around you. Ultimately, that is, your philanthropy is about you and how you feel and what you owe and what you symbolize. My philanthropy is about giving other people more of the lives they’d choose."

“My giving is unintuitive, and it’s not always ‘feel-good,’ but it’s truly other-centered. Ultimately, I’ll take that trade.”

I think the Stirnerian counterargument would be that g... (read more)

2 · PeterMcCluskey · 5mo
If you follow other-centered ethics, then the counterargument seems irrelevant. The post is excellent at explaining the implications of other-centered ethics, but it doesn't seem intended to explain why I should adopt those ethics.
The innocent self

I think this view is the opposite of true. My view is something more like "all men are created evil". Animals are callous about how they kill or eat, and we start out as animals too. An animal doesn't have to be hurt to hurt other animals. Neither does a human, there are tons of reports of rich kids who have everything and are callous anyway. It's nature.

So where do we place the good? I think the good in us is the outer layer, the culture. Game-theoretic conventions like "don't kill", first coming from circumstantial necessity, and then we learn and intern... (read more)

7 · Vanessa Kosoy · 5mo
IMO the truth is in the middle. Empathy is within human nature, but it's a very [partial](https://www.lesswrong.com/posts/dPmmuaz9szk26BkmD/vanessa-kosoy-s-shortform?commentId=Nn824LSK7nze4mqne) emotion (i.e. we have different amounts of empathy for different people), and different people have different capacities for empathy. Culture comes in to impose norms that are at least somewhat impartial and universal. And these norms are still shaped by game-theoretic incentives (plus historical accident).
Better impossibility result for unbounded utilities

Can we have unbounded utilities, and lotteries with infinite support, but probabilities always go down so fast that the sum (absolutely) converges, no matter what evidence we've seen?

2 · Vanessa Kosoy · 5mo
Yes, for example you can penalize the (initially Solomonoff-ish) prior probability of every hypothesis by a factor of e^(−β(U_max − U_min)), where β > 0 is some constant, U_max is the maximal expected utility of this hypothesis over all policies, and U_min is the minimal (and you'd have to discard hypotheses for which one of those is already divergent, except maybe in cases where the difference is renormalizable somehow). This kind of thing was referred to as a "leverage penalty" in a [previous discussion](https://www.lesswrong.com/posts/hbmsW2k9DxED5Z4eJ?commentId=RXEfMJJzCfTGeGDnp). Personally I'm quite skeptical it's useful, but maaaybe?
Inferring utility functions from locally non-transitive preferences

There's a bit of math directly relevant to this problem: Hodge decomposition of graph flows, for the discrete case, and vector fields, for the continuous case. Basically if you have a bunch of arrows, possibly loopy, you can always decompose it into a sum of two components: a "pure cyclic" one (no sources or sinks, stuff flowing in cycles) and a "gradient" one (arising from a utility function). No neural network needed, the decomposition is unique and can be computed explicitly. See this post, and also the comments by FactorialCode and me.
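As a toy sketch of the discrete case (my own construction, not taken from the linked post): build the gradient operator G of a directed graph, least-squares-fit a node potential, and the leftover is automatically divergence-free, i.e. purely cyclic:

```python
import numpy as np

# Directed edges of a small graph on 4 nodes (a cycle plus a chord).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n_nodes = 4

# Gradient operator: (G @ p)[e] = p[j] - p[i] for edge e = (i, j).
G = np.zeros((len(edges), n_nodes))
for e, (i, j) in enumerate(edges):
    G[e, i], G[e, j] = -1.0, 1.0

flow = np.array([3.0, 1.0, 2.0, 2.0, 1.0])   # arbitrary flow on the edges

# Least squares finds the potential whose gradient best matches the flow.
p, *_ = np.linalg.lstsq(G, flow, rcond=None)
gradient_part = G @ p
cyclic_part = flow - gradient_part

# By the normal equations, the cyclic part has zero divergence everywhere.
print(np.allclose(G.T @ cyclic_part, 0))                # True
print(np.allclose(gradient_part + cyclic_part, flow))   # True
```

The decomposition is unique (the potential p is only determined up to an additive constant, but its gradient is unique), matching the claim in the comment.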

1 · Jan · 5mo
Fantastic, thank you for the pointer, learned something new today! A unique and explicit representation would be very neat indeed.
Trying to Keep the Garden Well

I think the right procedure works something like this:
1) Tenants notice that one of them has trashed the garden, and tell the landlord who.
2) The landlord tells the offending tenant to clean up or they'll be billed.
3) If the offending tenant doesn't clean up, the cleaning fee gets added to their next rent bill.

In your case it seems like the offending tenant wasn't pointed out. Maybe because other tenants didn't care, or maybe some tenants had a mafia mentality and made "snitching" unsafe. Either way, you were right to move away.

1 · Kenny · 5mo
I don't think [1] or [2] are even (reasonably) 'possible' in most similar situations. I think the only plausible possibilities are:
1. The relevant people persuade the litterers to remove the items they left in the garden. (Assuming the story in the post is accurate, this didn't work or wasn't tried.)
2. Some people, i.e. not the litterers, and maybe 'the city', remove the items.
[1] requires fairly 'expensive social technology', e.g. trust, common values, or effective persuasion being feasible at all, and it is not-uncommonly either absent or prohibitively costly to develop.

The whole thing was much more banal than what you're imagining. It was an interim-use building with mainly student residents. There was no coordination between residents that I knew of.

The garden wasn't trashed before the letter. It was just a table and a couple of chairs, that didn't fit the house rules. If the city had just said "please, take the table out of the garden", I'd have given a 70% chance of it working. If the city had not said a thing, there would not have been (a lot of) additional furniture in the garden.

By issuing the threat, the city intr... (read more)

Understanding the tensor product formulation in Transformer Circuits

Can't say much about transformers, but the tensor product definition seems off. There can be many elements in V⊗W that aren't expressible as v⊗w, only as a linear combination of multiple such. That can be seen from dimensionality: if v and w have dimensions n and m, then the pure tensors form at most an (n+m)-parameter family (like the Cartesian product), but the full tensor product has nm dimensions.

Here's an explanation of tensor products that I came up with sometime ago in an attempt to make it "click". Imagine you have a linear function that takes in two vectors and sp... (read more)
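A concrete way to see the dimensionality point: identifying V⊗W with n×m matrices, pure tensors v⊗w are exactly the rank-≤1 matrices, so e.g. the identity matrix is a sum of two pure tensors but is not itself one. A quick numpy check:

```python
import numpy as np

e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# A pure tensor v (x) w corresponds to the rank-1 matrix np.outer(v, w).
pure = np.outer(e1, e2)
print(np.linalg.matrix_rank(pure))  # 1

# e1 (x) e1 + e2 (x) e2 is the identity matrix: rank 2, so it cannot
# be written as any single outer product (those all have rank <= 1).
mixed = np.outer(e1, e1) + np.outer(e2, e2)
print(np.linalg.matrix_rank(mixed))  # 2
```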

1 · Tom Lieberum · 6mo
Ah yes, that makes sense to me. I'll modify the post accordingly and probably write it in the basis formulation. ETA: Fixed now; the computation takes a tiny bit longer but is hopefully still readable to everyone.
The Debtor's Revolt

Consider my friend with the business plan to buy up laundromats. Let's say an illiquid, privately held laundromat makes a 25% return on invested capital. Suppose the stock market demands a 10% return for a small-cap company. So $100 million of privately held laundromats would generate $25 million in annual income, worth $250 million on the stock market, 2.5 times the initial investment. But if the laundromat company can finance 75% of the deal at 10% interest, then the cash cost of acquisition is $25 million. The cash flow profits of $25 million are reduc
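To make the quoted arithmetic concrete, here is my own working of the numbers given (the post's continuation is truncated above, so the leveraged figures below are my illustration, not the author's):

```python
# Worked version of the quoted numbers (my own arithmetic, not the
# post's truncated continuation).
invested = 100e6          # privately held laundromats
roic = 0.25               # 25% return on invested capital
market_return = 0.10      # return the stock market demands

income = invested * roic               # $25M per year
market_value = income / market_return  # $250M, i.e. 2.5x the investment

debt_fraction, interest_rate = 0.75, 0.10
cash_cost = invested * (1 - debt_fraction)           # $25M up front
interest = invested * debt_fraction * interest_rate  # $7.5M per year
leveraged_income = income - interest                 # $17.5M per year

print(income, market_value, cash_cost, leveraged_income)
```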

4 · Benquo · 5mo
How many people do you think have both of these traits?
1. Access to enough capital to execute on that plan and expect it to be positive-EV, taking into account not only opportunity cost but risk.
2. Regularly calculates the ROI on different business categories they interact with, to look for business opportunities.
Seems to me like this number is very small; most people doing this are pretty busy making loads of money, and then their kids don't execute the same strategy, so it doesn't snowball intergenerationally. And the rest of the post explains why, structurally, we should expect this class to have shrunk quite a bit in relative terms over the last several decades. I agree that under naive microeconomic assumptions what you predict would happen, and I wouldn't be seeing what I'm seeing.
Reply to Eliezer on Biological Anchors

With these two points in mind, it seems off to me to confidently expect a new paradigm to be dominant by 2040 (even conditional on AGI being developed), as the second quote above implies. As for the first quote, I think the implication there is less clear, but I read it as expecting AGI to involve software well over 100x as efficient as the human brain, and I wouldn’t bet on that either (in real life, if AGI is developed in the coming decades—not based on what’s possible in principle.)

I think this misses the point a bit. The thing to be afraid of is not... (read more)

9 · Matthew Barnett · 6mo
Unless I’m mistaken, the Bio Anchors framework explicitly assumes that we will continue to get algorithmic improvements, and even tries to estimate and extrapolate the trend in algorithmic efficiency. It could of course be that progress in reality will turn out a lot faster than the median trendline in the model, but I think that’s reflected by the explicit uncertainty over the parameters in the model. In other words, Holden’s point about this framework being a testbed for thinking about timelines remains unscathed if there is merely more ordinary algorithmic progress than expected.
Considerations on interaction between AI and expected value of the future

To me it feels like alignment is a tiny target to hit, and around it there's a neighborhood of almost-alignment, where enough is achieved to keep people alive but locked out of some important aspect of human value. There are many aspects such that missing even one or two of them is enough to make life bad (complexity and fragility of value). You seem to be saying that if we achieve enough alignment to keep people alive, we have >50% chance of achieving all/most other aspects of human value as well, but I don't see why that's true.

Considerations on interaction between AI and expected value of the future

These involve extinction, so they don't answer the question what's the most likely outcome conditional on non-extinction. I think the answer there is a specific kind of near-miss at alignment which is quite scary.

8 · Vanessa Kosoy · 7mo
My point is that Pr[non-extinction | misalignment] << 1, Pr[non-extinction | alignment] = 1, Pr[alignment] is not that low and therefore Pr[misalignment | non-extinction] is low, by Bayes.
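The shape of that Bayes update can be sketched with illustrative numbers (the probabilities below are placeholders of mine, not Vanessa's):

```python
# Illustrative numbers only: the structure of the update, not anyone's
# actual credences.
p_align = 0.2
p_survive_given_align = 1.0
p_survive_given_misalign = 0.01   # "<< 1"

p_survive = (p_align * p_survive_given_align
             + (1 - p_align) * p_survive_given_misalign)
p_misalign_given_survive = ((1 - p_align) * p_survive_given_misalign
                            / p_survive)
print(round(p_misalign_given_survive, 3))  # 0.038
```

Even with a modest prior on alignment, conditioning on survival concentrates most of the probability on the aligned branch, because the misaligned branch rarely produces survivors.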
Interpreting Yudkowsky on Deep vs Shallow Knowledge

I had the same view as you, and was persuaded out of it in this thread. Maybe to shift focus a little, one interesting question here is about training. How do you train a plan-generating AI? If you reward plans that sound like they'd succeed, regardless of how icky they seem, then the AI will become useless to you by outputting effective-sounding but icky plans. But if you reward only plans that look nice enough to execute, that tempts the AI to make plans that manipulate whoever is reading them, and we're back at square one.

Maybe that's a good way to look... (read more)

4 · John_Maxwell · 7mo
I agree these are legitimate concerns... these are the kind of "deep" arguments I find more persuasive. In that thread, johnswentworth wrote:

I'd solve this by maintaining uncertainty about the "reward signal", so the AI tries to find a plan which looks good under both alignment and the actual-process-which-generates-the-reward-signal. (It doesn't know which is which, but it tries to learn a sufficiently diverse set of reward signals such that alignment is in there somewhere. I don't think we can do any better than this, because the entire point is that there is no way to disambiguate between alignment and the actual-process-which-generates-the-reward-signal by gathering more data. Well, I guess maybe you could do it with interpretability or the right set of priors, but I would hesitate to make those load-bearing.)

(BTW, potentially interesting point I just thought of. I'm gonna refer to actual-process-which-generates-the-reward-signal as "approval". Supposing for a second that it's possible to disambiguate between alignment and approval somehow, and we successfully aim at alignment and ignore approval. Then we've got an AI which might deliberately do aligned things we disapprove of. I think this is not ideal, because from the outside this behavior is also consistent with an AI which has learned approval incorrectly. So we'd want to flip the off switch for the sake of caution. Therefore, as a practical matter, I'd say that you should aim to satisfy both alignment and approval anyways. I suppose you could argue that on the basis of the argument I just gave, satisfying approval is therefore part of alignment and thus this is an unneeded measure, but overall the point is that aiming to satisfy both alignment and approval seems to have pretty low costs.)

(I suppose technically you can disambiguate between alignment and approval if there are unaligned things that humans would approve of -- I figure you solve this problem by making your learning algorithm robust again
Considerations on interaction between AI and expected value of the future

I think alignment is finicky, and there's a "deep pit around the peak" as discussed here.

I am skeptical. AFAICT the typical attempted-but-failed alignment looks like one of the two:

• Goodharting some proxy, such as making the reward signal go on instead of satisfying the human's request in order for the human to press the reward button. This usually produces a universe without people, since specifying a "person" is fairly complicated and the proxy will not be robustly tied to this concept.
• Allowing a daemon to take over. Daemonic utility functions are probably completely alien and also produce a universe without people. One caveat is: maybe t
General alignment plus human values, or alignment via human values?

There are very “large” impacts to which we are completely indifferent (chaotic weather changes, the above-mentioned change in planetary orbits, the different people being born as a consequence of different people meeting and dating across the world, etc.) and other, smaller, impacts that we care intensely about (the survival of humanity, of people’s personal wealth, of certain values and concepts going forward, key technological innovations being made or prevented, etc.)

I don't think we are indifferent to these outcomes. We leave them to luck, but that'... (read more)

2 · Stuart_Armstrong · 7mo
Yes, but we would be mostly indifferent to shifts in the distribution that preserve most of the features - eg if the weather was the same but delayed or advanced by six days.
Considerations on interaction between AI and expected value of the future

I think the default non-extinction outcome is a singleton with near miss at alignment creating large amounts of suffering.

I'm surprised. Unaligned AI is more likely than aligned AI even conditional on non-extinction? Why do you think that?

Soares, Tallinn, and Yudkowsky discuss AGI cognition

Yeah, I had a similar thought when reading that part. In agent-foundations discussions, the idea often came up that the right decision theory should quantify not over outputs or input-output maps, but over successor programs to run and delegate I/O to. Wei called it "UDT2".

Soares, Tallinn, and Yudkowsky discuss AGI cognition

“Though many predicted disaster, subsequent events were actually so slow and messy, they offered many chances for well-intentioned people to steer the outcome and everything turned out great!” does not sound like any particular segment of history book I can recall offhand.

I think the ozone hole and the Y2K problem fit the bill. Though of course that doesn't mean the AI problem will go the same way.

7 · Sammy Martin · 7mo
Also, climate change itself doesn't completely not look like [this scenario](https://forum.effectivealtruism.org/posts/ckPSrWeghc4gNsShK/#1__Good_news_on_emissions), same with [nuclear deterrence](https://www.lesswrong.com/posts/LpM3EAakwYdS6aRKf/what-multipolar-failure-looks-like-and-robust-agent-agnostic?commentId=kxaGSyvxYreBL5sMv).
Frame Control

years ago I was at a large group dinner with acquaintances and a woman I didn’t like. She was talking about something I wasn’t interested in, mostly to a few other people at the table, and I drifted to looking at my phone. The woman then said loudly, “Oh, looks like I’m boring Aella”. This put me into a position

From that description I sympathize with the woman more.

I've been playing music for many years and have thought of many songs as "perfect" by various musical criteria, melody, beat and so on. But deep down I think musical criteria aren't the answer. It all comes down to which mood the song puts you in, so the perfect song = the one that hits the right mood at your current stage in life. So it's gonna be unavoidably different between people, and for the same person across time. For me as a teenager it was "Losing My Religion", somehow. Now at almost 40, this recording of Aguas de Março makes me smile.

A Bayesian Aggregation Paradox

I think your first example could be even simpler. Imagine you have a coin that's either fair, all-heads, or all-tails. If your prior is "fair or all-heads with probability 1/2 each", then seeing heads is evidence against "fair". But if your prior is "fair or all-tails with probability 1/2 each", then seeing heads is evidence for "fair". Even though "fair" started as 1/2 in both cases. So the moral of the story is that there's no such thing as evidence for or against a hypothesis, only evidence that favors one hypothesis over another.
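The coin example is small enough to compute outright; a quick sketch (the helper function is mine):

```python
def posterior_fair(p_heads_given_other):
    """P(fair | heads) with prior 1/2 on fair, 1/2 on the other coin."""
    num = 0.5 * 0.5                    # P(fair) * P(heads | fair)
    return num / (num + 0.5 * p_heads_given_other)

# Prior: fair vs all-heads. Seeing heads lowers P(fair) from 1/2 to 1/3.
print(posterior_fair(1.0))  # 0.333...
# Prior: fair vs all-tails. Seeing heads raises P(fair) from 1/2 to 1.
print(posterior_fair(0.0))  # 1.0
```

Same prior on "fair", same observation, opposite direction of update: the direction depends entirely on what the rival hypothesis is.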

2 · Pattern · 7mo
That's a great explanation. Evidence may also be compatible or incompatible with a hypothesis. For instance, suppose I get a die (without the dots on the sides that indicate 1–6) and instead label* it: Red, 4, Life, X-Wing, Int, path through a tree. Then finding out I rolled a 4, without knowing what die I used, is compatible with the regular-die hypothesis, but any of the other rolls is not. *(likely using symbols, for space reasons)
Ngo and Yudkowsky on alignment difficulty

Thinking about it more, it seems that messy reward signals will lead to some approximation of alignment that works while the agent has low power compared to its "teachers", but at high power it will do something strange and maybe harm the "teachers" values. That holds true for humans gaining a lot of power and going against evolutionary values ("superstimuli"), and for individual humans gaining a lot of power and going against societal values ("power corrupts"), so it's probably true for AI as well. The worrying thing is that high power by itself seems suf... (read more)

Split and Commit

A few years ago Abram and I were discussing something like this, and converged on "T. C. Chamberlin's essay about the method of multiple working hypotheses is the key to rationality". Or in other words: never have just one hypothesis, always have a next best.

5 · ryan_b · 3mo
Much delayed hot take: science is slowing down due to a misapplication of specialization of labor, which drives focus on a single hypothesis.
Ngo and Yudkowsky on alignment difficulty

This is tricky. Let's say we have a powerful black box that initially has no knowledge or morals, but a lot of malleable computational power. We train it to give answers to scary real-world questions, like how to succeed at business or how to manipulate people. If we reward it for competent answers while we can still understand the answers, at some point we'll stop understanding answers, but they'll continue being super-competent. That's certainly a danger and I agree with it. But by the same token, if we reward the box for aligned answers while we still u... (read more)

I do think alignment has a relatively-simple core. Not as simple as intelligence/competence, since there's a decent number of human-value-specific bits which need to be hardcoded (as they are in humans), but not enough to drive the bulk of the asymmetry.

(BTW, I do think you've correctly identified an important point which I think a lot of people miss: humans internally "learn" values from a relatively-small chunk of hardcoded information. It should be possible in-principle to specify values with a relatively small set of hardcoded info, similar to the way ... (read more)

Ngo and Yudkowsky on alignment difficulty

I think it makes complete sense to say something like "once we have enough capability to run AIs making good real-world plans, some moron will run such an AI unsafely". And that itself implies a startling level of danger. But Eliezer seems to be making a stronger point, that there's no easy way to run such an AI safely, and all tricks like "ask the AI for plans that succeed conditional on them being executed" fail. And maybe I'm being thick, but the argument for that point still isn't reaching me somehow. Can someone rephrase for me?

I think it makes complete sense to say something like "once we have enough capability to run AIs making good real-world plans, some moron will run such an AI unsafely". And that itself implies a startling level of danger. But Eliezer seems to be making a stronger point, that there's no easy way to run such an AI safely, and all tricks like "ask the AI for plans that succeed conditional on them being executed" fail.

Yes, I am reading here too that Eliezer seems to be making a stronger point, specifically one related to corrigibility.

Looks like Eliezer bel... (read more)

Speaking for myself here…

OK, let's say we want an AI to make a "nanobot plan". I'll leave aside the possibility of other humans getting access to a similar AI as mine. Then there are two types of accident risk that I need to worry about.

First, I need to worry that the AI may run for a while, then hand me a plan, and it looks like a nanobot plan, but it's not, it's a booby trap. To avoid (or at least minimize) that problem, we need to be confident that the AI is actually trying to make a nanobot plan—i.e., we need to solve the whole alignment problem.

The main issue with this sort of thing (on my understanding of Eliezer's models) is Hidden Complexity of Wishes. You can make an AI safe by making it only able to fulfill certain narrow, well-defined kinds of wishes where we understand all the details of what we want, but then it probably won't suffice for a pivotal act. Alternatively, you can make it powerful enough for a pivotal act, but unfortunately a (good) pivotal act probably has to be very big, very irreversible, and very entangled with all the complicated details of human values. So alignment is l... (read more)

+1 to the question. My current best guess at an answer: There are easy safe ways, but not easy safe useful-enough ways. E.g. you could make your AI output DNA strings for a nanosystem and absolutely do not synthesize them, just have human scientists study them, and that would be a perfectly safe way to develop nanosystems in, say, 20 years instead of 50, except that you won't make it 2 years without some fool synthesizing the strings and ending the world. And more generally, any pathway that relies on humans achieving deep understanding of the pivotal act will take more than 2 years, unless you make 'human understanding' one of the AI's goals, in which case the AI is optimizing human brains and you've lost safety.
Ngo and Yudkowsky on alignment difficulty

That seems wrong, living creatures have lots of specific behaviors that are genetically programmed.

In fact I think both you and John are misunderstanding the bottleneck. The point isn't that the genome is small, nor that it affects the mind indirectly. The point is that the mind doesn't affect the genome. Living creatures don't have the tech to encode their life experience into genes for the next generation.

I've appreciated this comment thread! My take is that you're all talking about different relevant things. It may well be the case that there are multiple reasons why more skills and knowledge aren't encoded in our genomes: a) it's hard to get that information in (from parents' brains), b) it's hard to get that information out (to children's brains), and c) having large genomes is costly. What I'm calling the genomic bottleneck is a combination of all of them (although I think John is probably right that c) is not the main reason).

What would falsify my clai... (read more)

1TekhneMakre8mo
Do you think you can encode good flint-knapping technique genetically? I doubt that. I think I agree with your point, and think it's a more general and correct statement of the bottleneck; but, still, I think that the genome does mainly affect the mind indirectly, and this is one of the constraints making it be the case that humans have lots of learning / generalizing capability.

(This doesn't just apply to humans. What are some stark examples of animals with hardwired complex behaviors? With a fairly high bar for "complex", and a clear explanation of what is hardwired and how we know. Insects have some fairly complex behaviors, e.g. web building, ant-hill building, the tree-leaf nests of weaver ants, etc.; but IDK enough to rule out a combination of a little hardwiring, some emergence, and some learning. Lots of animals hunt after learning from their parents how to hunt. I think a lot of animals can walk right after being born? I think beavers in captivity will fruitlessly chew on wood, indicating that the wild phenotype is encoded by something simple like "enjoys chewing" (plus, learned desire for shelter), rather than "use wood for dam".)

An operationalization of "the genome directly programs the mind" would be that things like [the motions employed in flint-knapping] can be hardwired by small numbers of mutations (and hence can be evolved given a few million relevant years). I think this isn't true, but counterevidence would be interesting.

Since the genome can't feasibly directly encode behaviors, or at least can't learn those quickly enough to keep up with a changing niche, the species instead evolves to learn behaviors on the fly via algorithms that generalize. If there were *either* mind-mind transfer, *or* direct programming of behavior by the genome, then higher frequency changes would be easier and there'd be less need for fluid intelligence. (In fact it's sort of plausible to me (given my ignorance) that humans are imitation specialists and are less clever
Worst Commonsense Concepts?

To me some of the worst commonsense ideas come from the amateur psychology school: "gaslighting", "blaming the victim", "raised by narcissists", "sealioning" and so on. They just teach you to stop thinking and take sides.

Logical fallacies, like "false equivalence" or "slippery slope", are in practice mostly used to dismiss arguments prematurely.

The idea of "necessary vs contingent" (or "essential vs accidental", "innate vs constructed" etc) is mostly used as an attack tool, and I think even professional usage is more often confusing than not.

I think it would be useful if you edited the answer to add a line or two explaining each of those or at least giving links (for example, Schelling fences on slippery slopes), cause these seem non-obvious to me.

I think a lot of human "alignment" isn't encoded in our brains, it's encoded only interpersonally, in the fact that we need to negotiate with other humans of similar power. Once a human gets a lot of power, often the brakes come off. To the extent that's true, alignment inspired by typical human architecture won't work well for a stronger-than-human AI, and some other approach is needed.

5M. Y. Zuo8mo
I didn’t mean to suggest that any future approach has to rely on ‘typical human architecture’. I also believe the least possibly aligned humans are less aligned than the least possibly aligned dolphins, elephants, whales, etc…, are with each other. Treating AGI as a new species, at least as distant to us as dolphins for example, would be a good starting point.

Arguments by definition don't work. If by "human values" you mean "whatever humans end up maximizing", then sure, but we are unstable and can be manipulated, which isn't what we want in an AI. And if you mean "what humans deeply want or need", then human actions don't seem very aligned with that, so we're back at square one.

Education on My Homeworld

In The Case against Education: Why the Education System Is a Waste of Time and Money, Bryan Caplan uses Earth data to make the case that compulsory education does not significantly increase literacy.

Compulsory education increases literacy, see the Likbez in the USSR.

Managing your own boredom requires freedom, which is the opposite of compulsion.

One can make the opposite assertion, that it's fastest learned through discipline, and point to Chinese or South Korean schools.

I don’t doubt that it’s useful to have the whole population learn reading and

Education on My Homeworld

There is no standard set of skills everyone is supposed to learn because if everyone learns something then its economic value becomes zero.

This seems wrong. Skills like literacy, numeracy, prosociality and ability to manage your own boredom bring a lot of economic value, even (especially) if everyone has them. And looking at our world, most people don't acquire these skills freely and automatically, they have to be forced somewhat.

4lsusr8mo
In The Case against Education: Why the Education System Is a Waste of Time and Money, Bryan Caplan uses Earth data to make the case that compulsory education does not significantly increase literacy.

I'm skeptical that prosociality and the ability to manage your own boredom are taught at school in a way that would not be learned otherwise. Managing your own boredom requires freedom, which is the opposite of compulsion. Sociability requires permission to speak, which is forbidden by default in classroom-style schooling. Algebra and calculus seem the most IQ loaded of anything taught in school.

I don't doubt that it's useful to have the whole population learn reading and arithmetic, but this seems to me like it's the kind of thing that can be taught in a few months. (Or a single month to a smart child.) If kids don't learn reading automatically then that would imply that they wouldn't text each other in the absence of school which, to me, is reductio ad absurdum [https://xkcd.com/1414/].
Depositions and Rationality

Boxing in, by bracketing. People who claim to have no idea about a quantity will often give surprisingly tight ranges when explicitly interrogated.

And most of the time their original "no idea" will be more accurate than the stuff you made them make up.

I do think there's a rationality skill implicit in the text: the "coaching" that witnesses undergo to avoid giving answers they don't want to give. That'd be worth learning, as it's literally defense against the dark arts. And the test for it could be an interrogation of the kind that you describe.

The Opt-Out Clause

I didn't want to leave, but also didn't think reciting the phrase would do anything, so I recited it just as an exercise in overcoming superstition, and nothing happened. Reminds me of how Ross Scott bought a bunch of people's souls for candy; one guy just said "sure I'm hungry" and signed the contract. That's the way.

[Book Review] "The Bell Curve" by Charles Murray

(3) tax businesses for hogging up all the smart people, if they try to brain drain into their own firm?

Due to tax incidence, that's the same as taxing smart people for getting together. I don't like that for two reasons. First, people should be free to get together. Second, the freedom of smart people to get together could be responsible for large economic gains, so we should be careful about messing with it.

On the Universal Distribution

It's interesting that the "up to a finite fudge factor" problem isn't specific to universal priors. The same problem exists in ordinary probability theory, where you can have different priors about, for example, the propensity of a coin. Then after many observations, all reasonable priors get closer and closer to the true propensity of the coin.

Then it's natural to ask, what kind of such long-run truths do all universal priors converge on? It's not only truths of the form "this bit string comes from this program", because a universal prior can also look at... (read more)
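The coin example can be sketched numerically. Here's an illustrative toy (my own code, with hypothetical numbers, using standard Beta-Bernoulli updating), showing two very different priors converging on the coin's true propensity:

```python
import random

# Two different Beta priors over a coin's propensity. After many shared
# observations, their posterior means converge: the prior becomes a
# finite "fudge factor" that washes out in the long run.
random.seed(0)
true_p = 0.7
flips = [random.random() < true_p for _ in range(10_000)]
heads = sum(flips)

# Prior 1: Beta(1, 1), i.e. uniform. Prior 2: Beta(50, 5), strongly
# biased toward heads. Posterior mean of Beta(a, b) after observing
# h heads and t tails is (a + h) / (a + b + h + t).
post_mean_1 = (1 + heads) / (1 + 1 + len(flips))
post_mean_2 = (50 + heads) / (50 + 5 + len(flips))

print(round(post_mean_1, 3), round(post_mean_2, 3))
# Both are close to true_p, and closer still to each other.
```

Despite starting far apart, the two posterior means end up within a fraction of a percent of each other, which is the ordinary-probability analogue of the long-run agreement between universal priors described above.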

3TekhneMakre8mo
To be clear, SI can't learn which program, just a program with the same functional behavior (depending on the setup, same functional behavior on the prefixes of the string in question).

Hm. Say we have universal priors P and Q. For computable infinite strings x, we have that P(x|n) converges to Q(x|n), because they both converge to the right answer. For any x, we have that P(x|n) and Q(x|n) are always within some fixed constant factor of each other, by definition.

I conjecture, very unconfidently, that for any P there's a universal Q and a string x such that | P(x|n) - Q(x|n) | > c for some fixed c > 0 for infinitely many n.

I don't even have an idea to show this and don't know if it's true, but the intuition is like, an adversary choosing x should be able to find a machine M (or rather, equivalence classes of machines that make the same predictions; this trips me up in constructing Q) that Q and P have different priors on, and are consistent so far with the x|n chosen; then the adversary confirms M heavily by choosing more bits of x, so that the probability on M dominates, and the difference between Q(M) and P(M) is ballooned up maximally (given the constraint that P(Q) and Q(P) are positive); then the adversary picks a different M' and repeats, and so on.
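For reference, the "fixed constant factor" step here is the mutual-dominance property of universal priors: each multiplicatively dominates the other on every finite prefix. A sketch in symbols (my own notation; $x{\restriction}n$ denotes the length-$n$ prefix of $x$):

```latex
\exists\, c_1, c_2 > 0 \;\; \forall x \, \forall n:\quad
c_1 \, Q(x{\restriction}n) \;\le\; P(x{\restriction}n) \;\le\; c_2 \, Q(x{\restriction}n)
```

Note that this bounds the *ratio* of the two priors, not their *difference*, which is why the conjectured gap $|P(x{\restriction}n) - Q(x{\restriction}n)| > c$ is not immediately ruled out.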
They don't make 'em like they used to

Input latency and unpredictability of it. One famous example is that for many years there were usable finger-drumming apps on iOS but not on Android, because on Android you couldn't make the touchscreen + app + OS + sound system let people actually drum in time. Something would always introduce a hundred ms of latency (give or take) at random moments, which is enough to mess up the feeling. Everyone knew it and no one could fix it.
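As a rough back-of-the-envelope sketch (my own numbers, purely illustrative) of why ~100 ms of jitter is fatal for drumming:

```python
# At a typical tempo, the subdivisions a drummer plays are about as
# short as the random latency described above, so a delayed hit lands
# nearly a full subdivision late. (Tempo and jitter are illustrative
# assumptions, not measurements.)
bpm = 120
quarter_note_ms = 60_000 / bpm           # 500 ms per beat at 120 BPM
sixteenth_note_ms = quarter_note_ms / 4  # 125 ms per sixteenth note

jitter_ms = 100
print(jitter_ms / sixteenth_note_ms)  # 0.8: jitter is 80% of a sixteenth
```

By comparison, trained musicians can reliably hear timing errors well under 20 ms, so latency that is *unpredictable* at the 100 ms scale makes playing in time essentially impossible.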

They don't make 'em like they used to

Or just keep a piezoelectric lighter.