Noosphere89 — LessWrong

Links to long comments that I want to pin, but that are too long to be pinned:

https://www.lesswrong.com/posts/Zzar6BWML555xSt6Z/?commentId=aDuYa3DL48TTLPsdJ

https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/?commentId=Gcigdmuje4EacwirD

https://www.lesswrong.com/posts/DCQ8GfzCqoBzgziew/?commentId=RhTNmgZqjJpzGGAaL

The more boring (and more likely) case is that we just have too few data points to tell whether AI control can actually work as it's supposed to, so we have to mostly fall back on priors.

I'll flag something from J Bostock's comment here while I'm making the comment:

I've only ever heard control talked about as a stopgap for a fairly narrow set of ~human capabilities, which allows us to something something solve alignment.

The human range of capabilities is actually quite large (discussed in SSC).

Shortform

Noosphere892d20

My own take on philosophy is that it's basically divided into 3 segments:

1. The philosophical problems that were solved, but whose solutions are unsatisfying, so philosophers futilely try to make progress on the problem, whereas other scientists content themselves with less general solutions that evade the impossibilities.

(An example is how many philosophical problems basically reduce to the question "does there exist a prior that is always better than any other prior for a set of data, without memorizing all of the data?" The answer is no in general, because of the No Free Lunch theorem. An example of a problem solved this way is the Problem of Induction. But this matters less than people think, because our world doesn't satisfy the conditions required to generate a No Free Lunch result, and ML/AI is focused on solving specific problems in our universe.)

2. The philosophical problem depends on definitions in an essential way, such that solving the problem amounts to disambiguating the definition, and there is no objective choice. (Example: Any discussion of what art is, and more generally any discussion of what X is potentially vulnerable to this sort of issue).

3. Philosophical problems that are solved, where the solutions aren't unsatisfying to us (a random example is Ayer's Puzzle of why you would collect any new data if you want to find the true hypothesis, solved by Mark Sellke).
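To make the No Free Lunch point from the first segment concrete, here is a minimal sketch under toy assumptions (3-bit inputs, binary labels, and two made-up learners; none of these names come from the original discussion). It averages off-training-set accuracy over every possible labeling, and then over a restricted, "structured" set of labelings.

```python
from itertools import product

# Hypothetical toy setup: inputs are the 3-bit tuples, labels are 0 or 1.
INPUTS = list(product([0, 1], repeat=3))  # 8 possible inputs
TRAIN = INPUTS[:5]                        # points the learner gets to see
TEST = INPUTS[5:]                         # points it must generalize to

# Every possible labeling of the 8 inputs (the "no structure" world).
ALL_LABELINGS = list(product([0, 1], repeat=len(INPUTS)))
# A structured world (an assumption for illustration): labels are constant.
CONSTANT_LABELINGS = [(0,) * len(INPUTS), (1,) * len(INPUTS)]


def majority_learner(train_labels):
    """Predict the most common training label for every unseen input."""
    guess = int(sum(train_labels) * 2 >= len(train_labels))
    return lambda x: guess


def parity_learner(train_labels):
    """Ignore the training data and predict the parity of the input bits."""
    return lambda x: sum(x) % 2


def average_test_accuracy(learner, labelings):
    """Average accuracy on TEST over the given set of target labelings."""
    correct = 0
    for labels in labelings:
        target = dict(zip(INPUTS, labels))
        predict = learner([target[x] for x in TRAIN])
        correct += sum(predict(x) == target[x] for x in TEST)
    return correct / (len(labelings) * len(TEST))


if __name__ == "__main__":
    for name, learner in [("majority", majority_learner),
                          ("parity", parity_learner)]:
        print(name,
              average_test_accuracy(learner, ALL_LABELINGS),
              average_test_accuracy(learner, CONSTANT_LABELINGS))
```

Averaged over all labelings, both learners land at exactly chance on unseen points, whereas on the constant-label world the majority learner generalizes perfectly. This is only a toy illustration of why the impossibility result bites less in a world with structure, not a proof of the theorem.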

A potential crux with Raemon/Wei Dai here is that I think that lots of philosophical problems are impossible to solve in a satisfying/fully general way, and that this matters a lot less to me than to a lot of LWers.

Another potential crux is that I don't think preference aggregation/CEV can actually work without a preference prior/base values that must be arbitrarily selected, and thus politics is inevitably going to be in the preference aggregation (This comes from Steven Byrnes here):

I’m concerned that CEV isn’t well-defined. Or more specifically, that you could list numerous equally-a-priori-plausible detailed operationalizations of CEV, and they would give importantly different results, in a way that we would find very unsatisfying.
Relatedly, I’m concerned that a “Long Reflection” wouldn't resolve all the important things we want it to resolve, or else resolve them in a way that is inextricably contingent on details of the Long Reflection governance / discourse rules, with no obvious way to decide which of numerous plausible governance / discourse rules are “correct”.
When people make statements that implicitly treat "the value of the future" as being well-defined, e.g. statements like “I define ‘strong utopia’ as: at least 95% of the future’s potential value is realized”, I’m concerned that these statements are less meaningful than they sound.
I’m concerned that changes in human values over the generations are at some deep level more like a random walk than progress-through-time, and that they only feel like progress-through-time because we’re “painting the target around the arrow”. So when we say “Eternal value lock-in is bad—we want to give our descendants room for moral growth!”, and we also simultaneously say specific things like “We want a future with lots of friendship and play and sense-of-agency and exploration, and very little pain and suffering, and…!”, then I’m concerned that those two statements are at least a little bit at odds, and maybe strongly at odds. (If it turns out that we have to pick just one of those two statements, I don’t know which one I’d vote for.)

On the philosophical problems posed by Wei Dai, here's what I'd say:

Decision theory for AI / AI designers
How to resolve standard debates in decision theory?
Logical counterfactuals
Open source game theory
Acausal game theory / reasoning about distant superintelligences

All of these are problems that aren't worth humanity's focus; they should instead be delegated to aligned AIs, with a few caveats (I'll also say that there doesn't exist a single decision theory that outperforms every other decision theory, links here and here (though there is a comment that I do like here)).

Infinite/multiversal/astronomical ethics
Should we (or our AI) care much more about a universe that is capable of doing a lot more computations?
What kinds of (e.g. spatial-temporal) discounting is necessary and/or desirable?

This is very much dependent on the utility function/values, so this needs more assumptions in order to even have a solution.

Fair distribution of benefits
How should benefits from AGI be distributed?
For example, would it be fair to distribute it equally over all humans who currently exist, or according to how much AI services they can afford to buy?
What about people who existed or will exist at other times and in other places or universes?

Again, this needs assumptions over the utility function/fairness metric in order to even have a solution.

Need for "metaphilosophical paternalism"?
However we distribute the benefits, if we let the beneficiaries decide what to do with their windfall using their own philosophical faculties, is that likely to lead to a good outcome?

Again, entirely dependent on the utility functions.

Metaphilosophy
What is the nature of philosophy?
What constitutes correct philosophical reasoning?
How to specify this into an AI design?

I basically agree with Connor Leahy that the definition of metaphilosophy/philosophy is so large as to contain everything, and thus this is an ask for us to be able to solve every problem. In that respect, the No Free Lunch theorem tells us that in general we would have to have every possible example memorized in training. Since this is not possible for us, we can immediately say that there is no generally correct philosophical reasoning that can be specified into an AI design, but in my view this matters a lot less than people think it does.

Philosophical forecasting
How are various AI technologies and AI safety proposals likely to affect future philosophical progress (relative to other kinds of progress)?

Depends, but in general the better AI is at hard-to-verify tasks, the better its philosophy is.

Preference aggregation between AIs and between users
How should two AIs that want to merge with each other aggregate their preferences?
How should an AI aggregate preferences between its users?

In general, this is dependent on their utility functions, but one frame that I do like is Preference Aggregation as Bayesian Inference.

Normativity for AI / AI designers
What is the nature of normativity?
Do we need to make sure an AGI has a sufficient understanding of this?

The first question may be an interesting research question, but I don't think we need AGI to understand/have normativity.

Metaethical policing
What are the implicit metaethical assumptions in a given AI alignment proposal (in case the authors didn't spell them out)?
What are the implications of an AI design or alignment proposal under different metaethical assumptions?
Encouraging designs that make minimal metaethical assumptions or is likely to lead to good outcomes regardless of which metaethical theory turns out to be true.

For the first question, most alignment plans implicitly assume moral relativism: there are no fundamentally objective values, every value is valid, and we just have to take the values of a human as given. They also assume that utility functions are a valid representation of human values, in that we can reduce what humans value to a utility function, but this assumption is always correct, so it doesn't matter.

Moral relativism is in a sense the most minimal metaethical assumption you can make, as it is entirely silent on what moral views are correct.

And that's my answer to all of the questions from this post.

Mottes and Baileys in AI discourse

Noosphere892d9-1

A big part of the problem, in a sense, is that the discussion is usually focused on dunking on bad arguments.

One of the takeaways of the history of science/progress is that in general, you should pretty much ignore bad arguments against an idea, and most importantly not update towards your idea being correct.

The post linked was in part a response to a comment of yours on my last post.

This shows up a lot in the political examples. The big issue I've noticed in political discourse is that everyone goes for the weakest arguments on the other side and doesn't steelman their opponents (this is in combination with another issue: a lot of people are trying to smuggle in moral claims based on the factual claims, as well as trying to use the factual claims to normalize hurting/killing people on the other side, because lots of people simply want to hurt/kill other people and are bottlenecked by logistics plus opposition).

This is one of my leading theories on how political discussions go wrong nowadays.

Another example here is that the orthogonality thesis and instrumental convergence, back in 2006-2008, were attempts to debunk bad arguments from AI optimists at the time. One of the crucial mistakes that I think doomed MIRI to unreasonable (in my eyes) confidence about the hardness of the AI safety problem is that they kept engaging with bad critics instead of trying to invent imaginary steelmans of the AI optimist position (I also think the AI optimists have done this to a lesser extent), though to be fair we knew a lot less about AI back in 2006-2008.

This is also why empirical evidence is usually far more valuable than arguments, as it cuts out the selection effects that can be a massive problem, and is undoubtedly a better critic than anyone will likely generate (except in certain fields).

This is also why I think the recent push to make AI safety have traction amongst the general public by creating a movement is a mistake.

Zack M. Davis's Steelmanning is Normal, ITT-passing is Niche is relevant here (but there are 2 caveats: in a case where one person just has way more knowledge, ITT is disproportionately useful, and in cases where emotions are a rate-limiting factor, ITTs are also necessary).

So one of the key things LWers should be expected to do is be able to steelman beliefs (that aren't moral beliefs) that they think are wrong, and to always focus on the best arguments/evidence.

The main way I've seen people turn ideologically crazy [Linkpost]

Noosphere892d20

I think this is actually a bad thing, and I'd argue that this sort of thing in general is one of my top hypotheses of why political discourse goes so wrong so fast, because people take their (bad) objections as having some bearing on their ideas, and thus update towards their ideas being correct:

You are inherently going to be arguing with a lot of stupid people, or a lot of "super fired up" people, when you argue ideas that affect such people. And you should have to. Most people wouldn't be able to correctly and logically articulate why you shouldn't steal their car, let alone anything related to Marxism or veganism, but I would say that their objections should have some bearing on whether you do so.

Selection Has A Quality Ceiling

Noosphere895d10

The fundamental problem here is that there isn't actually a way to increase human performance very much by deliberate training, absent gene editing. The basic reasons for this sum up to "lots of ability is much more genetic in nature, and environmental factors are often heavily overestimated in learning; on top of that, while the human brain always learns, it does so at a substantially reduced rate once you reach 25 years old, so most training programs can't teach people too much, unfortunately."

At least, this was my half-remembered view of what fields like neuroscience and genetics found out about human abilities.

Which side of the AI safety community are you in?

Noosphere896d65

Admittedly, most of the reason why we are able to solve climate change easily even as polarization happened is that the problem turned out to be far easier to solve than feared (if we don't care about animal welfare much, which is the case for ~all humans), without much government intervention.

I actually think this has a reasonable likelihood of happening, but conditional on there being no alignment solution cheap enough to be adopted without large government support (if it's doable at all), polarization matters far more here, so it's actually a useful case study for worlds where alignment is hard.

How an AI company CEO could quietly take over the world

Noosphere897d30

But a million humanoid robots could probably do it if they were merely expert at fighting. Agent-5 isn't a single agent, it's a collective of millions.

Something close to this might be a big reason why Amdahl's law/parallelization bottlenecks on the software singularity might not matter, because the millions of AIs are much, much closer to one single AI doing deep serial research than they are to an entire human field with thousands or millions of people.
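For reference, Amdahl's law bounds the speedup from parallel workers by the fraction of the work that must stay serial. A minimal sketch follows; the 0.9 parallel fraction and the worker counts are illustrative assumptions, not estimates of anything about AI research.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Speedup predicted by Amdahl's law.

    parallel_fraction: share of the work (0.0 to 1.0) that can be spread
    across workers; the remainder is assumed to run serially.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical numbers: 90% of the work parallelizes.
    for workers in (10, 1_000, 1_000_000):
        print(workers, round(amdahl_speedup(0.9, workers), 2))
```

With a fixed serial fraction, adding workers stops helping quickly: at 90% parallel work the speedup never reaches 10x, however many workers there are. The comment above suggests this framing applies less if the millions of copies act more like one agent doing the serial part quickly than like many separate researchers.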

The main way I've seen people turn ideologically crazy [Linkpost]

Noosphere897d20

I agree that their attitude is reasonable, conditional on superintelligence being achievable in the foreseeable future. I personally think this is unlikely, but I'm far from certain.

I was referring to the general public here.

The main way I've seen people turn ideologically crazy [Linkpost]

Noosphere897d30

I partially agree with the reason, but I suspect an even bigger reason is the fact that you have to ignore most critics of an idea, because the critics will by default give very bad criticisms no matter the idea's quality, and this substantially strengthens any pre-existing self-selection/selection biases in general, meaning you now need new techniques that are robust to selection effects (or you are working in an easy-to-verify field, such that it's easy for criticism to actually be correct without you being in the field yourself and you don't need to be in the group to correctly assess the ideas/execution).
