## LESSWRONG

Morality is Scary

Depending what we mean by 'crazy' I think that's unlikely - particularly when what we care about here are highly unusual moral stances.

Completely agree with this. I understood crazy as literal crazy - people with psychiatric disorders or other reasons they absolutely refuse to engage in trades with fellow humans.

If we're considering intelligent people, I'm sure members of all human ideologies (including religious, tech, economic extremists) still share a lot of common behaviour and drivers of behaviour, simply by virtue of being human. And hence will engag... (read more)

[1] Joe_Collman (3d): First, I think we want to be thinking in terms of [personal morality we'd reflectively endorse] rather than [all the base, weird, conflicting... drivers of behaviour that happen to be in our heads]. There are things most of us would wish to change about ourselves if we could. There's no sense in baking them in for all eternity (or bargaining on their behalf), just because they happen to form part of what drives us now. [though one does have to be a bit careful here, since it's easy to miss the upside of qualities we regard as flaws]

With this in mind, reflectively endorsed antinatalism really is a problem: yes, some people will endorse sacrificing everything just to get to a world where there's no suffering (because there are no people).

Note that the kinds of bargaining approach Vanessa is advocating are aimed at guaranteeing a lower bound for everyone (who's not pre-filtered out) - so you only need to include one person with a particularly weird view to fail to reach a sensible bargain. [though her most recent version [https://www.lesswrong.com/posts/y5jAuKqkShdjMNZab/morality-is-scary?commentId=zbNAFDZQCjfmQnkmR] should avoid this]

If a "Kickstarter for Inadequate Equlibria" was built, do you have a concrete inadequate equilibrium to fix?

This concept has become popular where I live. Customers value faster delivery times, hence offering those is profitable. Expanding at a loss clearly wasn't "prohibitively costly" for us; I don't think anything smaller than $1B is prohibitively costly if you have a clear business plan and supportive investors.

Morality is Scary

I wondered this too. Curious - do you think the technical part of such solutions should be worked on in the open or not? Lots of people downvoted my post so I wonder if there's some concern people have with this line of thinking.

[4] AprilSR (3d): I don't have any particular plan for becoming a world dictator with a lot of money, but it's certainly easier if you have a lot of money than if you don't.

Morality is Scary

I can imagine many alterations to my utility function that I definitely wouldn't want to accept in advance, but which would make me "happier" in the end.

Would this include situations like the AGI performing neurosurgery on you so you become a paperclip maximiser yourself? (i.e. paperclip maximising makes you happy) I can totally imagine invented situations where my current self voluntarily signs up for such a surgery, and I'm sure an AGI is more creative than I am at inventing situations and means to persuade me.

[3] moridinamael (3d): Yes, there is a broad class of wireheading solutions that we would want to avoid, and it is not clear how to specify a rule that distinguishes them from outcomes that we would want. When I was a small child I was certain that I would never want to move away from home. Then I grew up, changed my mind, and moved away from home. It is important that I was able to do something which a past version of myself would be horrified by. But this does not imply that there should be a general rule allowing all such changes. Understanding which changes to your utility function are good or bad is, as far as decision theory is concerned, undefined.

Morality is Scary

Random thought: Maybe crazy behaviour correlates with lower intelligence (g), which as defined correlates with lower moral status, so we use intelligence as the filtering rule. I wonder if any of these correlations can be formalised.

[1] Joe_Collman (3d): Depending what we mean by 'crazy' I think that's unlikely - particularly when what we care about here are highly unusual moral stances.

I'd see intelligence as a multiplier, rather than something which points you in the 'right' direction. Outliers will be at both extremes of intelligence - and I think you'll get a much wider moral variety on the high end.

For instance, I don't think you'll find many low-intelligence antinatalists [https://en.wikipedia.org/wiki/Antinatalism] - and here I mean the stronger, non-obvious claim: not simply that most people calling themselves antinatalists, or advocating for antinatalism will have fairly high intelligence, but rather that most people with such a moral stance (perhaps not articulated) will have fairly high intelligence.

Generally, I think there are many weird moral stances you might think your way into that you'd be highly unlikely to find 'naturally' (through e.g. absorption of cultural norms).

I'd also expect creativity to positively correlate with outlier moralities. Minds that habitually throw together seven disparate concepts will find crazier notions than those which don't get beyond three.

[linkpost] Crypto Cities

It seems to be better to start with some smaller, manageable, nexus of improvements.

Agreed.

In practice, if crypto property rights were ever to come about, it would have to be in either a city state like Singapore or Monaco, or in a watered down form where the ‘on chain system’ is a bit of sprinkling to make the city seem high tech and futuristic.

Maybe there is non-zero benefit to on-chain real estate, it will depend a lot on implementation I think. On-chain dollars (USDC) have allowed people to do crypto loans and trading (DeFi). Perhaps on-chain real estat... (read more)

[1] M. Y. Zuo (3d): Good points. Also it may be possible to build such a city on a remote island or in the middle of a desert. Jurisdictions may even compete for the privilege if $X billions of annual tax dollars could be guaranteed. If all the crypto billionaires pooled together to build a small city that may be enough capital.

I don't personally know what it would take to get real estate on-chain, and I'm not too hopeful. Part of the reason why I took a break from crypto in the first place.

But I'd assume you would need local politicians in favour, for starters. I don't think national-level laws would need to be changed for this system to work, given that it's a hybrid system. Eminent domain authority could continue to exist. (I don't know anything about US law btw, just did a google search on what it is.)

Also the land doesn't need to be bought, just transferred from one system t... (read more)

[4] M. Y. Zuo (5d): In the end I think your last paragraph explains everything. The proposal presumes that a parallel legal system governing property rights would even be allowed in the first place. Considering that in every nation I can think of property rights are guaranteed by a constitution, or a crown or something else equally difficult to change, this seems to be about as likely as any other activist proposal getting a supermajority to change the constitution.

In practice, if crypto property rights were ever to come about, it would have to be in either a city state like Singapore or Monaco, or in a watered down form where the ‘on chain system’ is a bit of sprinkling to make the city seem high tech and futuristic.

You're right to not be too hopeful about on chain real estate. It seems to be better to start with some smaller, manageable, nexus of improvements.

Anthropics and the Universal Distribution

There's a background assumption in these discussions about anthropics, that there is a single correct answer, but I think that the correct probability distribution depends on what your aim is.

I echo this intuition weakly - and also if you replace "anthropic theories" with "decision theories".

Anthropic theories or decision theories are said to be "better" if they are in some sense - more intuitive or more intelligent. Often we are implicitly assuming a notion of intelligence under which all agents ( / Turing machines / physical structures / toy models) can ... (read more)

I currently translate AGI-related texts to Russian. Is that useful?

A quick Google search says English fluency is low in Russia, even in universities, so this seems very useful. You probably have a better picture of this than me. You may also want to spend some time thinking about distribution: which audiences you are targeting, and where they are likely to end up seeing your content. I have no clue what this looks like for Russia.

P.S. You can also ask questions about impactfulness on the EA Forum or 80,000 Hours.

P.P.S. You may even be able to apply for funding from EA funds if it'll help increase the reach of your content.

M. Y. Zuo's Shortform

I agree with most of this, and my intuitions are towards AI alignment being impossible for these very reasons. Humans not being capable of consistency doesn't seem to me like something we can change through sheer willpower alone. We have entire cognitive modules that are not designed for rational thinking in the first place. Perhaps only neurosurgery can change that.

[1] M. Y. Zuo (6d): It does seem like alignment for all intents and purposes is impossible. Creating an AI truly beyond us then is really creating future, hopefully doting, parents to live under.

Anyone has any good resources on linguistic analysis to doxx people online?

Both automated and manual, although I'm more keen on learning about automated.

What is the state-of-the-art in such capabilities today? Are there forecasts on future capabilities?

Trying to figure out whether defending online anonymity is worth doing or a lost cause.

Why do you believe AI alignment is possible?

I think my intuitions are a mix of your 3rd and 5th one.

Do you mean it's just highly unlikely that humans will successfully find and implement any of the possible safe designs? Then assuming impossibility would seem to make this even more likely, self-fulfilling-prophecy style, no? Isn't trying to fix this problem the whole point of alignment research?

If the likelihood is sufficiently low, no reasonable amount of work might get you there. Say the odds of aligned AI being built this century are 10^-10 if you do nothing versus 10^-5 if thou... (read more)

MESSY INTUITIONS ABOUT AGI, MIGHT TYPE THEM OUT PROPERLY LATER

OR NOT

I'm sure we're a finite number of voluntary neurosurgeries away from worshipping paperclip maximisers. I tend to feel we're a hodge-podge of quick heuristic modules and deep strategic modules, and until you delete the heuristic modules via neurosurgery our notion of alignment will always be confused. Our notion of superintelligence / super-rationality is an agent that doesn't use the bad heuristics we do; people have even tried formalising this with Solomonoff / Turing machines / AIXI. But... (read more)

Rodarmor's Shortform

Is this theoretical or has it been implemented anywhere?

My gut reaction is that demanding people put everything up for sale will cause a lot of harm. You're essentially writing a perpetual call option (for free) on every asset on the planet.

So far I can think of:

- Assets with higher volatility will have higher self-assessed value (SAV) and hence higher tax - why is this a useful property of a tax system? Your system will force risk-averse investments and hence less economic growth.

- Assets that are intended as long-term investments will have h... (read more)

Open question: Math proofs that would enable you become a world dictator

I missed that. Although it doesn't change much: the AI will likely still hunt for a formal proof when it finds that a direct search for an algorithm yields no result. And the risk-reward calculation of actually asking an unaligned AI this question is the same: some odds of very high danger and small odds of useful success.

M. Y. Zuo's Shortform

Don't know.

"Axioms plus their consequences" is a toy/ideal model that may look very different from how desires and reasoning are actually wired in the brain. You can check out Coherentism for some intuitions on an alternate model. A deeper understanding of which model best describes how human brains work, or are constrained, is an open problem. Someone besides me might have better resources.

[1] M. Y. Zuo (9d): Let’s think about it another way. Consider the thought experiment where a single normal cell is removed from the body of any randomly selected human. Clearly they would still be human. If you keep on removing normal cells though eventually they would die. And if you keep on plucking away cells eventually the entire body would be gone and only cancerous cells would be left, i.e. only a ‘paperclip optimizer’ would remain from the original human, albeit inefficient and parasitic ‘paperclips’ that need an organic host. (Due to the fact that everyone has some small number of cancerous cells at any given time that are taken care of by regular processes)

At what point does the human stop being ‘human’ and start being a lump of flesh? And at what point does the lump of flesh become a latent ‘paperclip optimizer’? Without a sharp cutoff, which I don’t think there is, there will inevitably be in-between cases where your proposed methods cannot be applied consistently.

The trouble is if we, or the decision makers of the future, accept even one idea that is not internally consistent then it hardly seems like anyone will be able to refrain from accepting other ideas that are internally contradictory too. Nor will everyone err in the same way. There is no rational basis to accept one or another as a contradiction can imply anything at all, as we know from basic logic.

Then the end result will appear quite like monkey tribes fighting each other, agitating against each and all based on which inconsistencies they accept or not. Regardless of what they call each other, humans, aliens, AI, machines, organism, etc…

M. Y. Zuo's Shortform

there doesn’t seem to be a credible argument to dissuade them

There can still be arguments like:

- "You will be unhappy if you resist what you are biologically hardwired to do. You probably don't want that."

- "People will socially isolate from you if you want human extinction. You don't want the former."

Basically vaguely point at inconsistency within your wants, rather than which wants are universally correct or wrong in some metaphysical frame. Most people don't want their wants to be inconsistent. (So much so that some people come up with el... (read more)

[1] M. Y. Zuo (17d): Those appear to be examples of arguments from consequences, a logical fallacy. How could similar reasoning be derived from axioms, if at all?

Why do you believe AI alignment is possible?

Fair, but it acts as a prior against it. If you can't even align humans with each other in the face of an intelligence differential, why will you be able to align an alien with all humans?

Or are the two problems fundamentally different in some way?

[5] AprilSR (17d): I mean, I agree it'd be evidence that alignment is hard in general, but "impossible" is just... a really high bar? The space of possible minds is very large, and it seems unlikely that the quality "not satisfactorily close to being aligned with humans" is something that describes every superintelligence.

It's not that the two problems are fundamentally different, it's just that... I don't see any particularly compelling reason to believe that superintelligent humans are the most aligned possible superintelligences?

Worst Commonsense Concepts?

Maybe use the same social tool but use better labels than "fact" and "opinion". Maybe use "fact" but contrast it with "desire" or "preference" or "want" instead of "opinion". Cause opinions are often about fact and can be argued as objectively correct or wrong, whereas desires are more subjective and you want to avoid fights on the latter.

Although tbvh at that early stage as an authority figure you may also want to impose "good desires" and "bad desires" without following through the whole chain of reasoning why. (that some desires make you unhappy or cause social isolation or are unreasonable to fulfill or misalign with my notion of a "good person" that I want you to be.)

So I'm not sure what helps children.

Rodarmor's Shortform

I hope you are aware this is basically an agriculture tax.

Land use (global) is 51 million km² for agriculture (a majority of which is for livestock), versus 1.5 million km² for "all human settlement".

Agriculture contributes to only 6.4% of the global GDP. The rest of the GDP is coming out of a (likely small) subset of the human-settled land.

It isn't clear to me why a land tax would distribute wealth from the rich to the poor. Or why this form of tax is in any sense "fairer" than say, taxing all publicly listed companies by a percentage of their shares, if... (read more)
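
To put rough numbers on the land-versus-GDP comparison above, here is a back-of-the-envelope sketch using only the figures quoted in this comment. It assumes, purely for illustration, that all non-agricultural GDP is produced on human-settled land (the comment itself suggests the real footprint is an even smaller subset); the variable names are invented for this example.

```python
# Figures as quoted in the comment above (not independently verified).
AGRI_LAND_MKM2 = 51.0      # agricultural land, million km^2
SETTLED_LAND_MKM2 = 1.5    # "all human settlement", million km^2
AGRI_GDP_SHARE = 0.064     # agriculture's share of global GDP

# Share of the combined (agricultural + settled) land that is agricultural.
agri_land_share = AGRI_LAND_MKM2 / (AGRI_LAND_MKM2 + SETTLED_LAND_MKM2)

# ASSUMPTION: every non-agricultural unit of GDP comes from settled land.
other_gdp_share = 1.0 - AGRI_GDP_SHARE

# GDP share per million km^2 for each land category, and their ratio.
gdp_density_agri = AGRI_GDP_SHARE / AGRI_LAND_MKM2
gdp_density_settled = other_gdp_share / SETTLED_LAND_MKM2
density_ratio = gdp_density_settled / gdp_density_agri

print(f"agricultural share of land: {agri_land_share:.1%}")
print(f"GDP per km^2, settled vs agricultural: {density_ratio:.0f}x")
```

Under these assumptions agriculture covers roughly 97% of the combined land while settled land produces on the order of 500 times more GDP per km², which is the sense in which a flat per-area land tax would fall mostly on agriculture.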

HYPOTHETICAL (possibly relevant to AI safety)

Assume you could use magic that could progressively increase the intelligence of exactly one human in a "natural way". * You need to pick one person (not yourself) who you trust a lot and give them some amount of intelligence. Your hope is they use this capability to solve useful problems that will increase human wellbeing broadly.

What is the maximum amount of intelligence you'd trust them with?

*when I say natural way I mean their neural circuits grow, maintain and operate in a manner similar to how they already... (read more)

Open question: Math proofs that would enable you become a world dictator

Interesting suggestion.

Though I assume you also need the proof to be by construction and the algorithms to be implementable in practice (very large constant factors or exponents not allowed). Odds of this existing will have to be factored into whether this whole question is worth asking an unsafe AI.

I wonder if there's a computational problem whose solution has higher odds of existence or more easily leads to practical implementation.

[3] CronoDAS (17d): Which is why I specified "an algorithm" and not "a proof". Also, if my understanding is correct, simulating quantum systems is in PSPACE, so one thing this would do is make nanotechnology much easier to develop...

Open question: Math proofs that would enable you become a world dictator

I agree that in the world today, attempting to get an AI that can do this is dangerous; we're better off slowing down the AI race. What I'm suggesting is more of a precaution for a world where the AI race has gotten to the point where this AI can be built.

Why do you believe AI alignment is possible?

Humans today all have roughly the same intelligence and training history though. It isn't obvious (to me at least) that a human with an extra "intelligence module" will remain aligned with other humans. I would personally be afraid of any human being intelligent enough to unilaterally execute a totalitarian power grab over the world, no matter how good of a person they seem to be.

[2] AprilSR (18d): I'm not sure either way on giving actual human beings superintelligence somehow, but I don't think that not working would imply there aren't other possible-but-hard approaches.

Ngo and Yudkowsky on alignment difficulty

You could define a threshold for known AI capability or odds of extinction* and bet on that instead.

*as estimated by some set of alignment experts

Open question: Math proofs that would enable you become a world dictator

you don't want to build that tool.

Depending on the current world state (say someone else is close to discovering AGI too), it might be a viable option to have I guess.

Open question: Math proofs that would enable you become a world dictator

I think my idea was to restrict the number of compute cycles being given to the Task AI. Given enough compute, the AI will eventually end up spending some compute to learn the existence of the real world, but if the AI is naturally much better suited to math theorems than this, it might directly solve the math problem first.

I'm keen to know if that would work.

[4] Gyrodiot (19d): So, assuming an unaligned agent here.

If your agent isn't aware that its compute cycles are limited (i.e. the compute constraint is part of the math problem), then you have three cases:

- (1a) the agent doesn't hit the limit with its standard search, you're in luck;

- (1b) the problem is difficult enough that the agent runs its standard search but fails to find a solution in the allocated cycles, so it always fails, but safely;

- (1c) you tweak the agent to be more compute-efficient, which is very costly and might not work; in practice, if you're in case 1b and it apparently fails safely, you have an incentive to just increase the limit.

If your agent is indeed aware of the constraint, then it has an incentive to remove it, or increase the limit by other means. Three cases here again:

- (2a) identical to 1a, you're in luck;

- (2b) the limit is low enough that strategic action to remove the constraint is impossible, the agent fails "safely";

- (2c) the agent finds a way to remove the constraint, and you're in very unsafe territory.

Two observations from there: first, ideally you'd want your agent to operate safely even if given unbounded cycles, that's the Omni Test. [https://arbital.com/p/omni_test/] Second, there's indeed an alignment concept for agents that just try to solve the problem without long-term planning, that's Myopia [https://www.alignmentforum.org/tag/myopia] (and defining it formally is... hard).
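
As a toy illustration of the first two cases only (a search that either finishes within its budget or runs out and fails without a result), here is a minimal sketch. Everything in it is hypothetical: `bounded_search`, the step budget, and the toy problem are made up for this example, and it says nothing about the harder cases where an agent knows about the limit and can act on it.

```python
from itertools import count
from typing import Callable, Iterable, Optional


def bounded_search(candidates: Iterable[int],
                   is_solution: Callable[[int], bool],
                   max_steps: int) -> Optional[int]:
    """Check candidates in order, giving up after max_steps checks."""
    for steps_used, candidate in enumerate(candidates):
        if steps_used >= max_steps:
            return None  # budget exhausted: fail without returning a result
        if is_solution(candidate):
            return candidate  # solved within the budget
    return None  # ran out of candidates before finding a solution


def is_target(n: int) -> bool:
    # Toy stand-in for a math problem: is n * n greater than 1000?
    return n * n > 1000


# Generous budget: the plain search finishes before the limit.
easy = bounded_search(count(2), is_target, max_steps=100)

# Tight budget: the same search hits the limit and stops.
hard = bounded_search(count(2), is_target, max_steps=10)

print(easy, hard)
```

The sketch also makes the incentive problem visible: when the tight-budget run returns nothing, the obvious next move for the operator is to raise `max_steps`.
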
M. Y. Zuo's Shortform

Ideal rational agents can be programmed with arbitrary goals. Such an agent won't ask questions like "why is my terminal goal X not Y", cause "terminal goal = X" is an axiom. You can't prove axioms out of anything more fundamental. If you could, those more fundamental things would be the axioms then. And two different agents can have different sets of axioms.

Non-ideal agents can have all sorts of behaviours, some of which are goal-seeking, some of which are rational, some of which are not. Humans are in this category.

Human brains are programmed to release n... (read more)

[1] M. Y. Zuo (18d): If no one’s goals can be definitely proven to be better than anyone else’s goals, then it doesn't seem like we can automatically conclude the majority of present or future humans, or our descendants, will prioritize maximizing fun, happiness, etc. If some want to pursue that then fine, if others want to pursue different goals, even ones that are deleterious to overall fun, happiness, etc., then there doesn’t seem to be a credible argument to dissuade them?

The Learning-Theoretic AI Alignment Research Agenda

I believe that "values" is also a natural concept.

I agree they're natural, but more in the map than the territory.

(Assume) Humans are reducible to Turing machines. Assume you could run a simulation of the machine at any pace you like. You won't need to know about the "values" of the Turing machine to predict its behaviour in any possible circumstance. It needn't even have any values. But in the real world we don't have infinite compute, so we have to ignore the details and focus on models with high information density, when modelling other minds. (Or... (read more)

Ngo and Yudkowsky on alignment difficulty

Couldn't theorems with very little information about the universe be useful for a pivotal act?

https://www.lesswrong.com/posts/qmN2H8gjxEvJsKbzL/open-question-math-proofs-that-would-enable-you-become-a

I'd be super keen on reading anything as to why this is impossible. (Or at least harder than all the other directions being currently pursued.)

P.S. Explanations for the downvote(s) would help

Ngo and Yudkowsky on alignment difficulty

What about spending those 20 or 50 years even before we have AGI? Have the messy parts of the solution ready, so you only need to plug the Task AGI into some narrow-but-hard subproblem and you have a pivotal act.

https://www.lesswrong.com/posts/qmN2H8gjxEvJsKbzL/open-question-math-proofs-that-would-enable-you-become-a

P.S. Explanation for the downvote(s) would help

Ngo and Yudkowsky on alignment difficulty

I agree that the more compute time is spent on any problem, the more likely the AI is to eventually pursue instrumental goals like breaking out of its box. I wonder if it is possible to find a suitable problem, such that this does not happen before it solves the problem head-on.

https://www.lesswrong.com/posts/qmN2H8gjxEvJsKbzL/open-question-math-proofs-that-would-enable-you-become-a

since there is a limited pool of people who you would trust in practice

It might be possible for large parties to collude too. (To the detriment of everyone outside of the party)

Each person goes and records their vote with a party authority. The party authorities then tally votes and tell everyone the result. (Tallying can maintain privacy using ZK proofs or a trusted third party) Then the authorities tell you to vote proportional to this result in the official vote because it is the "right thing to do". They could use lie detectors and generally psychologi... (read more)

Just as a note, this also applies to the QV version that does not use money. Each human has 100 "points" that they must allocate, the only choice being how much they allocate to which candidates.
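
A small sketch of why coordination matters under a points-based scheme, assuming the usual quadratic rule in which casting v votes for a candidate costs v² points (so p points buy √p votes). The comment above does not spell out the exact rule, so treat the square-root conversion and the helper name `votes_from_points` as assumptions made for illustration.

```python
import math


def votes_from_points(points: float) -> float:
    """Quadratic rule (assumed): casting v votes costs v**2 points."""
    if points < 0:
        raise ValueError("points must be non-negative")
    return math.sqrt(points)


# One voter spends a full 100-point budget on candidate A.
solo_votes = votes_from_points(100)

# Four coordinated voters each spend 25 points on A: also 100 points in total.
pooled_votes = sum(votes_from_points(25) for _ in range(4))

print(solo_votes, pooled_votes)
```

Because the conversion is concave, the same 100 points buy twice as many votes when spread across four coordinated voters as when spent by one, which is the kind of advantage a party that can direct its members' allocations would have.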

[3] Bucky (18d): No real idea, possible obviousness? Have an upvote to compensate :)

Social behavior curves, equilibria, and radicalism

I second JenniferRM's post: most people don't live in a singular global soup where they can see 7 billion voices and are equally influenced by all 7 billion. Instead you end up with local pockets through which ideas spread. This is something I've been thinking about how to engineer: how do you create communities that can spread ideas quickly and then influence other communities?

P.S. I wonder if the equations for behaviour spread look anything like those for spread of disease and if any of that research could be reusable. Both are geographically localised ... (read more)
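
For reference, the simplest textbook model of disease spread is the SIR (susceptible, infected, recovered) compartmental model. Below is a minimal discrete-time sketch of it. Reading "infected" as "currently showing the behaviour" is only an analogy suggested by the question above, not something established in the post, and the parameter values are arbitrary. Note that this basic version assumes a well-mixed population, so it does not capture the geographic locality mentioned above.

```python
from typing import List, Tuple


def simulate_sir(beta: float, gamma: float,
                 s0: float, i0: float, r0: float,
                 steps: int) -> List[Tuple[float, float, float]]:
    """Discrete-time SIR model on population fractions.

    beta:  transmission rate per step (contact between S and I)
    gamma: recovery rate per step
    """
    s, i, r = s0, i0, r0
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i   # S -> I
        new_recoveries = gamma * i      # I -> R
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Arbitrary illustrative parameters: 1% initially "infected".
history = simulate_sir(beta=0.5, gamma=0.1, s0=0.99, i0=0.01, r0=0.0, steps=50)
peak_infected = max(i for _, i, _ in history)
print(f"peak infected fraction: {peak_infected:.3f}")
```

Spatial or network versions of this model replace the single `s * i` mixing term with contacts between neighbouring groups, which is closer to the "local pockets" picture.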

Why do you believe AI alignment is possible?

Thanks for typing this out.

Sounds like uploading a human mind and then connecting it to an "intelligence module". (It's probably safer for us to first upload and then think about intelligence enhancement, rather than ask an AGI to figure out how to upload or model us.)

I personally tend to feel that even such a mind would quickly adopt behaviours that you and I find... alien, and their value system will change significantly. Do you feel that wouldn't happen, and if so do you have any insight as to why?

M. Y. Zuo's Shortform

If you really feel comfortable with that, you can do that. I wondered this too.

Most people are not keen on sacrificing current human civilisation at the altar of a different one.

[1] M. Y. Zuo (19d): What’s the rational basis for preferring all mass-energy consuming grey goo created by humans over all mass-energy consuming grey goo created by a paperclip optimizer? The only possible ultimate end in both scenarios is heat death anyways.

M. Y. Zuo's Shortform

Because it does not define 'fun'.

Because you need to figure out or intuit what is fun for you. It may not be identical to what is fun for Yudkowsky. It being very arbitrary all the more means you can't use some rationalist discourse to arrive at an understanding of what 'fun' is.

only an omniscient and omnipresent being could 'clearly' see whether the world is benevolently designed or not

We can meaningfully move towards goals we know we can never attain, forever.

[0] M. Y. Zuo (21d): So why must we prevent paperclip optimizers from bringing about their own ‘fun’?

M. Y. Zuo's Shortform

+1 on this

Rationality is instrumental.

Why do you believe AI alignment is possible?

they can use their eyes and see instead of spending resources on bananas, etc., we’re spending it on ballistic missiles, etc.

That certainly acts as a point against us being aligned, in the brain of a monkey. (Assuming they could even understand it's us who are building the missiles in front of them.) Maybe you can counteract it with other points in favour. It isn't immediately clear to me why that has to be deceptive (if we were in fact aligned with monkeys). Keen on your thoughts.

P.S. Minor point but you can even deliberately hide the missiles from the monkeys, if necessary. I'm not sure if willful omission counts as deception.

Why do you believe AI alignment is possible?

Okay but the analogue isn't that we need to convince monkeys ballistic missiles are important. It's that we need to convince monkeys that we care about exactly the same things they do. That we're one of them.

(That's what I meant by - there's a lot of things we don't need to understand, if we only want to understand that we are aligned.)

[4] M. Y. Zuo (21d): Are you pondering what arguments a future AGI will need to convince humans? That’s well covered on LW.

Otherwise my point is that we will almost certainly not convince monkeys that ‘we’re one of them’ if they can use their eyes and see instead of spending resources on bananas, etc., we’re spending it on ballistic missiles, etc.

Unless you mean if we can by deception, such as denying we spend resources along those lines, etc… in that case I’m not sure how that relates to a future AGI/human scenarios.

Why do you believe AI alignment is possible?

re c): Cool, no worries. I agree it's a little specific.

re last para, you're right that "deflecting" may not have been the best word. Basically I meant you're intentionally moving the conversation away from trying to nail down specifics, which is opposite to the direction I was trying to move it because that's where I felt it would be most useful. I agree that your conversational move may have been useful, I was just wondering if it would be more useful to now start moving in the direction I wanted.

By the end of this conversation I have gotten a vague ment... (read more)

[1] TekhneMakre (21d): Mostly, all good. (I'm mainly making this comment about process because it's a thing that crops up a lot and seems sort of important to interactions in general, not because it particularly matters in this case.)

Just, "I meant you're intentionally moving the conversation away from trying to nail down specifics"; so, it's true that (1) I was intentionally doing X, and (2) X entails not particularly going toward nailing down specifics, and (3) relative to trying to nail down specifics, (2) entails systematically less nailing down of specifics. But it's not the case that I intended to avoid nailing down specifics; I just was doing something else. I'm not just saying that I wasn't *deliberately* avoiding specifics, I'm saying I was behaving differently from someone who has a goal or subgoal of avoiding specifics.

Someone with such a goal might say some things that have the sole effect of moving the conversation away from specifics. For example, they might provide fake specifics to distract you from the fact they're not nailing down specifics; they might mock you or otherwise punish you for asking for specifics; they might ask you / tell you not to ask questions because they call for specifics; they might criticize questions for calling for specifics; etc.

In general there's a potentially adversarial dynamic here, where someone intends Y but pretends not to intend Y, and does this by acting as though they intend X which entails pushing against Y; and this muddies the waters for people just intending X, not Y, because third parties can't distinguish them.

Anyway, I just don't like the general cultural milieu of treating it as an ironclad inference that if someone's actions systematically result in Y, they're intending Y. It's really not a valid inference in theory or practice. The situation is sometimes muddied, such that it's appropriate to treat such people *as though* they're intending Y, but distinguishing this from a high-confidence proposition that they are in fact

Why do you believe AI alignment is possible?

Got it. Second para makes a lot of sense.

First and last para feel like intentionally deflecting from trying to pin down specifics. I mean your responses are great but still. My responses seem to be moving towards trying to pin some specific things down, yours go a bit in the opposite direction. Do you feel pinning down specifics is a) worth doing? b) possible to do? c) something you wish to do in this conversation?

(I totally understand that defining specifics too rigidly in one way shouldn't blind us to all the other ways we could have done things, but that doesn't by itself mean we shouldn't ever try to define them in different ways and think each of those through.)

4 TekhneMakre 21d a) worth doing? Extremely so; you only ever get good non-specifics as the result of having iteratively built up good specifics.

b) possible to do? In general, yes. In this case? Fairly likely not; it's bad poetry, and the senses it generates are high variance: likely nonsense, with some chance of some sense. And alignment is hard and understanding minds is hard.

c) something you wish to do in this conversation? Not so much, I guess. I mean, I think some of the metaphors I gave, e.g. the one about the 10 year old, are quite specific in themselves, in the sense that there's some real thing that happens when a human grows up which someone could go and think about in a well-defined way, since it's a real thing in the world; I don't know how to make more specific what, if anything, is supposed to be abstracted from that as an idea for understanding minds, and more-specific-ing seems hard enough that I'd rather rest it.

Thanks for noting explicitly. (Though, your thing about "deflecting" seems, IDK what, like you're mad that I'm not doing something, or something, and I'd rather you figure out on your own what it is you're expecting from people explicitly and explicitly update your expectations, so that you don't accidentally incorrectly take me (or whoever you're talking to) to have implicitly agreed to do something (maybe I'm wrong that's what happened). It's connotatively false to say I'm "intentionally deflecting" just because I'm not doing the thing you wanted / expected. Specific-ing isn't the only good conversational move and some good conversational moves go in the opposite direction.)
Why do you believe AI alignment is possible?

it's definitely harder to be an x-risk without superintelligence

Harder but not impossible. Black balls are hypothetical inventions whose very existence (or existence as public information) makes them very likely to be deployed. With nukes for instance we have only a small set of parties who are capable of building them and choose not to deploy.

an island population with a backup of libgen

As a complete aside, that's a really cool hypothetical, I have no idea if that's true though. Lots of engineering depends on our economic and scientific history, costs of m...

4 TekhneMakre 21d I wouldn't want to say that too much. I'd rather say that an organ serves a purpose. It's part of a design, part of something that's been optimized, but it isn't mainly optimizing, or as you say, it's not intelligent. More "pieces which can be assembled into an optimizer", less "a bunch of little optimizers", and maybe it would be good if the human were doing the main portion of the assembling, whatever that could mean. Hm. This feels like a bit of a different dimension from the developmental analogy? Well, IDK how the metaphor of hands and eyes is meant. Having more "hands and eyes", in the sense of the bad poetry of "something you can wield or perceive via", feels less radical than, say, what happens when a 10-year-old meets someone they can have arguments with and learns to argue-think. IDK, it's a good question. I mean, we know the AI has to be doing a bunch of stuff that we can't do, or else there's no point in having an AI. But it might not have to quite look like "having its own model", but more like "having the rest of the model that the human's model is trying to be". IDK. Also could replace "model" with "value" or "agency" (which goes to show how vague this reasoning is).
Why do you believe AI alignment is possible?

I tried reading about free energy minimisation on Wikipedia, but it went over my head. Is there any source or material you would recommend?

4 Jon Garcia 21d Yeah, Friston is a bit notorious for not explaining his ideas clearly enough for others to understand easily. It took me a while to wrap my head around what all his equations were up to and what exactly "active inference" entails, but the concepts are relatively straightforward once it all clicks.

You can think of "free energy" as the discrepancy between prediction and observation, like the potential energy of a spring stretched between them. Minimizing free energy is all about finding states with the highest probability and setting things up such that the highest probability states are those where your model predictions match your observations.

In statistical mechanics, the probability of a particle occupying a particular state is proportional to the exponential of the negative potential energy of that state. That's why air pressure exponentially drops off with altitude (to a first approximation, $p(h) \propto \exp\left(-\frac{mgh}{RT}\right)$).

For a normal distribution:

$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}\right)$$

the energy is a parabola:

$$E(x) = -\log(p(x)) = \frac{1}{2}\frac{(x-\mu)^2}{\sigma^2} + C$$

This is exactly the energy landscape you see for an ideal Newtonian spring with rest length $\mu$ and spring constant $k = \frac{1}{\sigma^2} = \text{precision}$. Physical systems always seek the configuration with the lowest free energy (e.g., a stretched spring contracting towards its rest length). In the context of mind engineering, $x$ might represent an observation, $\mu$ the prediction of the agent's internal model of the world, and $\frac{1}{\sigma^2}$ the expected precision of that prediction. Of course, these are all high-dimensional vectors, so matrix math is involved (Friston always uses $\Pi$ for the precision matrix).

For rational agents, free energy minimization involves adjusting the hidden variables in an agent's internal predictive model (perception) or adjusting the environment itself (action) until "predictions" and "observations" align to within the desired/expected precision.
(For actions, "prediction" is a bit of a misnomer; it's actually a goal or a homeostatic set point that the agent
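
The spring picture in the comment above can be made concrete with a small one-dimensional sketch. This is only an illustration under stated assumptions, not Friston's actual formulation: it uses the quadratic energy $E(x) = \frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}$ with the constant $C$ dropped, and plain gradient descent with a hand-picked learning rate. The function names `free_energy`, `perceive`, and `act` are hypothetical labels chosen for this example.

```python
def free_energy(x, mu, sigma2):
    """Quadratic energy 0.5 * (x - mu)**2 / sigma2 (additive constant dropped)."""
    return 0.5 * (x - mu) ** 2 / sigma2


def perceive(x, mu, sigma2, lr=0.1, steps=200):
    """Perception: hold the observation x fixed and adjust the prediction mu."""
    for _ in range(steps):
        grad_mu = -(x - mu) / sigma2  # dE/dmu
        mu -= lr * grad_mu
    return mu


def act(x, mu, sigma2, lr=0.1, steps=200):
    """Action: hold the set point mu fixed and push the observation x toward it."""
    for _ in range(steps):
        grad_x = (x - mu) / sigma2  # dE/dx
        x -= lr * grad_x
    return x


if __name__ == "__main__":
    x0, mu0, sigma2 = 2.0, 0.0, 1.0
    print("initial energy:", free_energy(x0, mu0, sigma2))
    print("mu after perception:", perceive(x0, mu0, sigma2))
    print("x after action:", act(x0, mu0, sigma2))
```

Both routines descend the same parabola; they differ only in which variable is allowed to move. The step size has to stay small relative to $\sigma^2$ (roughly `lr / sigma2 < 2`), otherwise this naive update overshoots and diverges.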
Why do you believe AI alignment is possible?

Got it. That's a possible stance.

But I do believe there exist arguments (/chains of reasoning/etc) that can be understood by and convincing to both smart and dumb agents, even if the class of arguments that a smarter agent can recognise is wider. I would personally hope one such argument can answer the question "can alignment be done?", either as yes or no. There's a lot of things about the superhuman intellect that we don't need to be able to understand in order for such an argument to exist. Same as how we don't need to understand the details of monkey ...

1 M. Y. Zuo 21d In this case we would be the monkeys gazing at the strange, awkwardly tall and hairless monkeys, pondering them in terms of monkey affairs. Maybe I would understand alignment in terms of whose territory is whose, who is the alpha and omega among the human tribe(s), which banana trees are the best, where is the nearest clean water source, what kind of sticks and stones make the best weapons, etc. I probably won’t understand why human tribe(s) commit such vast efforts into creating and securing and moving around those funny looking giant metal cylinders with lots of wizmos at the top, bigger than any tree I’ve seen. Why every mention of them elicits dread, why only a few of the biggest human tribes are allowed to have them, why they need to be kept on constant alert, why multiple need to be put in even bigger metal cylinders to roam around underwater, etc. Surely nothing can be that important, right? If the AGI is moderately above us, then we could probably find such arguments convincing to both, but we would never be certain of them. If the AGI becomes as far above us as humans are above monkeys, then I believe the chances are about as good as our chances of finding arguments that could convince monkeys of the necessity of ballistic missile submarines.