All of X4vier's Comments + Replies

Maybe an analogy which seems closer to the "real world" situation - let's say you and someone like Sam Altman both tried to start new companies. How much more time and starting capital do you think you'd need to have a better shot at success than him?

Herb Ingram · 5mo
I really have no idea, probably a lot? I don't quite see what you're trying to tell me. That one (which?) of my two analogies (weather or RTS) is bad? That you agree or disagree with my main claim that "evaluating the relative value of an intelligence advantage is probably hard in real life"? Your analogy doesn't really speak to me because I've never tried to start a company and have no idea what leads to success, or what resources/time/information/intelligence helps how much.

Out of interest - if you had total control over OpenAI - what would you want them to do?

I kinda reject the energy of the hypothetical? But I can speak to some things I wish I saw OpenAI doing:

  1. Having some internal sense amongst employees about whether they're doing something "good" given the stakes, like Google's old "don't be evil" thing. Have a culture of thinking carefully about things and managers taking considerations seriously, rather than something more like management trying to extract as much engineering as quickly as possible without "drama" getting in the way.

    (Perhaps they already have a culture like this! I haven't worked there. Bu…

I think OP is correct about cultural learning being the most important factor in explaining the large difference in intelligence between homo sapiens and other animals.

In early chapters of Secrets of Our Success, the book examines studies comparing the performance of young humans and young chimps on various cognitive tasks. The book argues that across a broad array of cognitive tests, 4 year old humans do not perform significantly better than 4 year old chimps on average, except in cases where the task can be solved by imitating others (human children crushe…

Note that this isn't exactly the hypothesis proposed in the OP, and it would point in a different direction. The OP states there is a categorical difference between animals and humans: the ability of humans to transfer data to future generations. This is not the case, because animals do this as well. What your paraphrase of Secrets of Our Success suggests is that this capacity for transferring data across generations is present in many animals, but there is some threshold of 'social learning' which was crossed by humans - and when crossed, led to a cultural explosion. I think this is actually mostly captured by ....

One notable thing about humans is that this is likely the second time in history a new type of replicator with R > 1 emerged: memes. From a replicator-centric perspective on the history of the universe, this is the fundamental event, starting a different general evolutionary computation operating at a much shorter timescale.

Also, I've skimmed a few chapters of the book, and the evidence it gives of the 'chimps vs humans' type mostly shows that current humans are substantially shaped by cultural evolution, and that our biology is quite influenced by cultural evolution. This is clearly to be expected after the evolutions run for some time, but it does not explain the causality that much. (The mentioned new replicator dynamic is actually one of the mechanisms which can lead to discontinuous jumps based on small changes in an underlying parameter: changing the reproduction number of a virus from just below one to just above one causes an epidemic.)
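The threshold behaviour of a replicator's reproduction number can be illustrated with a toy branching calculation (a minimal sketch; the specific R values and generation count are illustrative assumptions, not from the book or the comment):

```python
def expected_copies(r, generations=20, n0=1.0):
    """Expected number of copies of a replicator after some generations,
    assuming each copy produces r copies on average (simple branching)."""
    n = n0
    for _ in range(generations):
        n *= r
    return n

# Just below vs. just above the R = 1 threshold: a small change in the
# underlying parameter produces a qualitatively different outcome.
subcritical = expected_copies(0.95)    # decays toward extinction
supercritical = expected_copies(1.05)  # grows without bound
assert subcritical < 1.0 < supercritical
```

The same discontinuity appears whether the replicator is a virus or a meme: nothing special happens in the mechanism itself, only in whether R sits above or below one.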

If we expect there will be lots of intermediate steps - does this really change the analysis much?

How will we know once we've reached the point where there aren't many intermediate steps left before crossing a critical threshold? How do you expect everyone's behaviour to change once we do get close?

> If we expect there will be lots of intermediate steps - does this really change the analysis much?

I think so, yes. One fundamental way is that you might develop machines that are intelligent enough to produce new knowledge at a speed and quality above the current capacity of humans, without those machines necessarily being agentic. Those machines could potentially work on the alignment problem themselves. I think I know what EY's objection would be (I might be wrong): a machine capable of doing that is already an AGI and hence already deadly. Well, I think this argument would be wrong too. I can envision a machine capable of doing science without necessarily being agentic.

> How will we know once we've reached the point where there aren't many intermediate steps left before crossing a critical threshold?

I don't know if it is useful to think in terms of thresholds. A threshold to what? To an AGI? To an AGI of unlimited power? Before making a very intelligent machine there will be less intelligent machines. The leap can be very quick, but I don't expect that at any point there will be one single entity so powerful that it will dominate all other life forms in a very short time (a window of time shorter than it takes other companies/groups to develop similar entities). How do I know that? I don't, but when I hear all the possible scenarios in which a machine pulls off an "end of the world" scenario, they are all based on the assumption (and I think it is fair to call it that) that the machine will have almost unlimited power, e.g. that it is able to simulate nanomachines and then devise a plan to successfully deploy those nanomachines simultaneously everywhere while staying hidden. It is this part of the argument that I have problems with: it assumes that these things are possible in the first place. And some things are not, even if you have 100000 Von Neumanns thinking for 1000000 years. A machine that can play Go at the God level can't win a game again…

It doesn't make sense to use the particular consumer's preferences to estimate the cruelty cost. If that's how we define the cruelty cost, then the buyer should already be taking it into account when making their purchasing decision, so it's not an externality.

The externality comes from the animals themselves having interests which the consumers aren't considering.

The issue is that there might be no objective way to compare an animal's interests to a human's interests.
Answer by X4vier · Dec 19, 2022

(This is addressed both to concerned_dad and to "John" who I hope will read this comment).

Hey John, I'm Xavier - hope you don't mind me giving this unsolicited advice but I'd love to share my take on your situation, and some personal anecdotes about myself and my friends which I think you might find useful (who I suspect had a lot in common with you when we were your age). I bet you're often frustrated about adults thinking they know better than you, especially when most of the time they're clearly not as sharp as you are and don't seem to be thinking as d…

For the final bet (or the induction base for a finite sequence), one cannot pick an amount without knowing the zero-point on the utility curve.

I'm a little confused about what you mean sorry - 

What's wrong with this example?

It's time for the final bet. I have $100 and my utility is U = log(wealth).

I have the opportunity to bet on a coin which lands heads with probability p, at even odds.

If I bet x on heads, then my expected utility is p·log(100 + x) + (1 − p)·log(100 − x), which is maximized when x = (2p − 1) · 100.

So I decide to bet 5…
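As a numerical sanity check of this kind of final-bet calculation (a sketch only; even odds and p = 0.75 are illustrative values I've filled in, not necessarily the original comment's):

```python
import math

def expected_log_utility(x, wealth=100.0, p=0.75):
    """Expected log utility of betting x on heads at even odds."""
    return p * math.log(wealth + x) + (1 - p) * math.log(wealth - x)

def optimal_bet(wealth=100.0, p=0.75):
    """Closed-form maximizer of the expectation above: x* = (2p - 1) * wealth."""
    return (2 * p - 1) * wealth

x_star = optimal_bet()  # 50.0 when p = 0.75 and wealth = 100
# The closed-form bet should beat nearby alternatives.
assert expected_log_utility(x_star) > expected_log_utility(x_star - 1)
assert expected_log_utility(x_star) > expected_log_utility(x_star + 1)
```

Since the expectation is strictly concave in x, the first-order condition pins down a unique maximizer, with no reference to any zero-point of the utility curve.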

The problem is in the "I have" statement in the setup.  After your final bet, you will be dead (or at least not have any ability for further decisions about the money).  You have to specify what "have" means, in terms of your utility.  Perhaps you've willed it all to a home for cats, in that case the home has 500,100 +/- x.  Perhaps you care about humanity as a whole, in which case your wager has no impact - any that you add or remove from "your" $100 comes out of someone else's.  Or if the wager is making something worth x, or destroying x value as your final act, then humanity as a whole has $90T +/- x.  

As far as I can tell, the fact that you only ever control a very small proportion of the total wealth in the universe isn't something we need to consider here.

No matter what your wealth is, someone with log utility will treat a prospect of doubling their money to be exactly as good as it would be bad to have their wealth cut in half, right?
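This symmetry follows directly from log(2w) − log(w) = log 2 = log(w) − log(w/2); a minimal numerical sketch (the wealth levels are arbitrary illustrative values):

```python
import math

# For U(w) = log(w), the utility gained by doubling wealth equals the
# utility lost by halving it, at any starting wealth w.
for w in [10.0, 100.0, 1e6]:
    gain = math.log(2 * w) - math.log(w)   # doubling
    loss = math.log(w) - math.log(w / 2)   # halving
    assert abs(gain - loss) < 1e-12        # both equal log(2)
```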

I don't know - I don't have a good sense of what "terminal values" mean for humans.  But I suspect it does matter - for a logarithmic utility curve, figuring out the change in utility for a given delta in resources depends entirely on the proportion of the total that the given delta represents. 

Thanks heaps for the post man, I really enjoyed it! While I was reading it felt like you were taking a bunch of half-baked vague ideas out of my own head, cleaning them up, and giving some much clearer more-developed versions of those ideas back to me :)

Thanks for response!

Input/output: I agree that the unnatural input/output channel is just as much a problem for the 'intended' model as for the models harbouring consequentialists, but I understood your original argument as relying on there being a strong asymmetry where the models containing consequentialists aren't substantially penalised by the unnaturalness of their input/output channels. An asymmetry like this seems necessary because specifying the input channel accounts for pretty much all of the complexity in the intended model.

Compu…

> they have to run simulations of all kinds of possible universes to work out which ones they care about and where in the multiverse Solomonoff inductors are being used to make momentous decisions

I think that they have an easy enough job but I agree the question is a little bit complicated and not argued for in the post. (In my short response I was imagining the realtime component of the simulation, but that was the wrong thing to be imagining.)

I think the hardest part is not from the search over possible universes but from cases where exact historical simul…

Thanks for your comment, I think I'm a little confused about what it would mean to actually satisfy this assumption.

It seems to me that many current algorithms, for example, a Rainbow DQN agent, would satisfy assumption 3? But like I said I'm super confused about anything resembling questions about self-awareness/naturalisation.

Sorry for the late response! I didn't realise I had comments :)

In this proposal we go with (2): The AI does whatever it thinks the handlers will reward it for.

I agree this isn't as good as giving the agents an actually safe reward function, but if our assumptions are satisfied then this approval-maximising behaviour might still result in the human designers getting what they actually want.

What I think you're saying (please correct me if I misunderstood) is that an agent aiming to do whatever its designers reward it for will be incentivised …

My entry:

Accepted, gave some feedback in the comments

So much of your writing sounds like an eloquent clarification of my own underdeveloped thoughts. I'd bet good money your lesswrong contributions have delivered me far more help than harm :) Thanks <3

Heartbreaking :'( still, that "taken time off from their cryptographic shenanigans" line made me laugh so hard I woke my girlfriend up