Thank you! I appreciate the in-depth comment.
Do you think any of these groups hold that all of the alignment problem can be solved without advancing capabilities?
Thanks!
And I appreciate the correction -- I admit I was confused about this, and may not have done enough of a deep-dive to untangle this properly. Originally I wanted to say "empiricists versus theorists" but I'm not sure where I got the term "theorist" from either.
Thanks!
And for both examples, how are you conceptualizing a "new idea"? Cause I suspect we don't have the same model of what an idea is.
Two things that worked for me:
Produce stuff, a lot of stuff, and make it findable online. This makes it possible for people to see your potential and reach out to you.
Send an email to anyone you admire asking if they are interested in going for a coffee (if you have the funds to fly out to them) or doing a video call. Explain why you admire them and why this would be high value to you. I did this for 4 people, without filtering on 'how likely are they to answer', and one of them said 'yeah sure'. I think the email made them happy too, cause a reasonable subset of people like learning how they have touched others' lives in a positive way.
Even in experiments, I think most of the value is usually from observing lots of stuff, more than from carefully controlling things.
I think I mostly agree with you but have the "observing lots of stuff" categorized as "exploratory studies" which are badly controlled affairs where you just try to collect more observations to inform your actual eventual experiment. If you want to pin down a fact about reality, you'd still need to devise a well-controlled experiment that actually shows the effect you hypothesize to exist from your observations so far.
...If you a
There is an EU Telegram group where they are, among other things, collecting data on where people are in Europe. I'll DM an invite.
That makes a lot of sense! And I was indeed also thinking of Elicit.
Note: The meetup this month is Wednesday, Jan 4th, at 15:00. I'm in Berkeley currently, and I couldn't see how times were displayed for you guys cause I have no option to change time zones on LW. I apologize if this has been confusing! I'll get a local person to verify dates and times next time (or even set them).
Did you accidentally forget to add this post to your research journal sequence?
I thought I added it but apparently hadn't pressed submit. Thank you for pointing that out!
- optimization algorithms (finitely terminating)
- iterative methods (convergent)
That sounds as if they are always finitely terminating or convergent, which they're not. (I don't think you wanted to say they are)
I was going by the Wikipedia definition:
...To solve problems, researchers may use algorithms that terminate in a finite number of steps, or iterative methods that converge to a
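To make the distinction concrete, here is a toy sketch (my own illustration, not from the article): an exhaustive search that is guaranteed to terminate in finitely many steps, versus gradient descent, which only converges under suitable conditions (hence the step cap and tolerance).

```python
def argmin_finite(candidates, f):
    """Finitely terminating: examines each candidate exactly once."""
    best = None
    for x in candidates:
        if best is None or f(x) < f(best):
            best = x
    return best  # stops after exactly len(candidates) steps

def minimize_iterative(f_grad, x0, lr=0.1, tol=1e-8, max_steps=10_000):
    """Iterative: gradient descent converges only under conditions
    (e.g. convexity and a small enough step size), so in practice we
    cap the number of steps and stop at a tolerance."""
    x = x0
    for _ in range(max_steps):
        step = lr * f_grad(x)
        x -= step
        if abs(step) < tol:  # "converged" numerically
            break
    return x

print(argmin_finite(range(-5, 6), lambda x: (x - 3) ** 2))  # -> 3
print(minimize_iterative(lambda x: 2 * (x - 3), x0=0.0))    # -> ~3.0
```

Neither property is guaranteed in general, which I take to be your point: the definitions describe the intended behavior of each family, not a theorem about all members.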
Oh my, this looks really great. I suspect between this and the other list of AIS researchers, we're all just taking different cracks at generating a central registry of AIS folk so we can coordinate at all different levels on knowing what people are doing and knowing who to contact for which kind of connection. However, maintaining such an overarching registry is probably a full time job for someone with high organizational and documentation skills.
I'll keep it in mind, thank you!
Great idea!
So my intuition is that letting people edit a file that is publicly linked is inviting a high probability of undesirable results (like accidental wipes, unnoticed changes to the file, etc). I'm open to looking into this if the format gains a lot of traction and people find it very useful. For the moment, I'll leave the file as-is so no one's entry can be accidentally affected by someone else's edits. Thank you for the offer though!
Thank you for sharing! I actually have a similar response myself but assumed it was not general. I'm going to edit the image out.
EDIT: Both points are moot under Stuart Armstrong's narrower definition of the Orthogonality thesis, which he argues for in General purpose intelligence: arguing the Orthogonality thesis:
High-intelligence agents can exist having more or less any final goals (as long as these goals are of feasible complexity, and do not refer intrinsically to the agent’s intelligence).
Old post:
I was just working through my own thoughts on the Orthogonality thesis and did a search on LW on existing material and found this essay. I had pretty much the same thoughts on intel...
Hmm, that wouldn't explain the different qualia of the rewards, but maybe it doesn't have to. I see your point that they can mathematically still be encoded into one reward signal that we optimize through weighted factors.
I guess my deeper question would be: do the different qualia of different reward signals achieve anything in our behavior that can't be encoded by summing the weighted factors of different reward systems into one reward signal that is optimized?
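To make my question concrete, here is a minimal sketch of the "weighted sum into one signal" picture (channel names and weights are made up for illustration):

```python
# Hypothetical: collapsing several reward "systems" into one scalar signal.
WEIGHTS = {"touch": 1.0, "food": 0.8, "task_completion": 1.2}  # made-up weights

def scalar_reward(channel_rewards: dict) -> float:
    """Weighted sum of per-channel rewards into a single optimizable number."""
    return sum(WEIGHTS[ch] * r for ch, r in channel_rewards.items())

print(scalar_reward({"touch": 0.5, "food": 0.0, "task_completion": 1.0}))  # 1.7
```

My question is whether human behavior contains anything this flattening loses.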
Another framing here would be homeostasis - if you accept humans aren't happiness optim...
Clawbacks refer to grants that have already been distributed but would need to be returned. You seem to be thinking of grants that haven't been distributed yet. I hope both get resolved but they would require different solutions. The post above is only about clawbacks though.
As a grantee, I'd be very interested in hearing what informs your estimate, if you feel comfortable sharing.
Netherlands
Small celebration in Amsterdam: https://www.lesswrong.com/events/mTxNWEes265zkxhiH/winter-solstice-amsterdam
Sure. For instance, hugging/touch, good food, or finishing a task all deliver a different type of reward signal. You can be saturated on one but not the others and then you'll seek out the other reward signals. Furthermore, I think these rewards are biochemically implemented through different systems (oxytocin, something-sugar-related-unsure-what, and dopamine). What would be the analogue of this in AI?
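One candidate answer I can think of: per-channel satiation. A sketch (entirely hypothetical, just to illustrate the shape of the mechanism):

```python
import math

# Hypothetical: each channel's effective weight decays with recent
# consumption of that channel, so a saturated reward type stops driving
# behavior while the unsaturated ones still do.
def satiated_reward(raw: dict, recent: dict) -> float:
    total = 0.0
    for channel, r in raw.items():
        effective_weight = math.exp(-recent.get(channel, 0.0))  # made-up decay
        total += effective_weight * r
    return total

# Saturated on "food" but not "touch": the same meal now counts for little.
print(satiated_reward({"food": 1.0, "touch": 1.0}, {"food": 3.0, "touch": 0.0}))
```

Note this still collapses to one scalar in the end; the "types" live in the state-dependence of the weights rather than in the signal itself, which may be exactly your point.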
ah, like that. Thank you for explaining. I wouldn't consider that a reversal cause you're then still converting intuitions into testable hypotheses. But the emphasis on discussion versus experimentation is then reversed indeed.
What would the sensible reverse of number 5 be? I can generate them for 1-4 and 6, but I am unsure what the benefit could be of confusing intuitions with testable hypotheses?
I really appreciate that thought! I think there were a few things going on:
good to know, thank you!
On further reflection, I changed my mind (see title and edit at top of article). Your comment was one of the items that helped me understand the concepts better, so just wanted to add a small thank you note. Thank you!
Thanks!
On that note, I was wondering if there is any way I could tag the people who engaged me on this (cause it's spread across 2 articles), just so I can say thanks? It seems like the right thing to do, like high-fiving everyone after a lost duel or something. Dunno, I feel a lightweight acknowledgement/update would be a useful thing to deliver in this case, to signal that people's comments actually had an effect. DM'ing everyone or replying to each comment again would give everyone a notification but generates a lot of clutter and overhead, so that's why tagging seemed like a good route.
I wasn't sure how I hadn't argued that, but between all the different comments, I've now pieced it together. I appreciate everyone engaging me on this, and I've updated the essay to "deprecated" with an explanation at the top that I no longer endorse these views.
Thank you. Between all the helpful comments, I've updated my point of view and marked this essay as deprecated, with an explanation + acknowledgement at the top.
The surrogacy example originally struck me as very unrealistic cause I presumed it was mostly illegal (it is in Europe but apparently not in some US states) and heavily frowned upon here for ethical reasons (but possibly not in the US?). So my original reasoning was that you'd get in far more trouble for recruiting many surrogates than for swapping out sperm at the sperm bank.
I guess if this is not the case then it might have been a fetish for those doctors? I'm now slightly confused about what internal experience put them up to it if t...
Yes, good point. I was looking at those statistics for a bit. Poorer parents do indeed tend to maximize their number of offspring no matter the cost while richer parents do not. It might be that parents overestimate the IGF payoffs of quality, but then that just makes them bad/incorrect optimizers. It wouldn't make them less of an optimizer.
I think there are also some other subtle nuances going on. For instance, I'd consider myself fairly close to an IGF optimizer but I don't care about all genes/traits equally. There is a multigenerational "strain" I ide...
I think the notion that people are adaptation-executors, who like lots of things a little bit in context-relevant situations, predicts our world more than the model of fitness-maximizers, who would jump on this medical technology and aim to have 100,000s of children soon after it was built.
I think this skips the actual social trade-offs of the strategy you outline above:
My claim was purely that some people do actually optimize on this. It's just fairly hard, and their success also relies on how their ability to game the system compares to how strong the system is. E.g. there was that fertility doctor who just used his own sperm all the time.
Makes sense. I'm starting to suspect I overestimated the number of people who would take these deals, but I think there still would be more for the above than for the original thought experiments.
That last one killed me hahaha
Here is my best attempt at working out my thoughts on this, but I noticed I reached some confusion at various points. I figured I'd post it anyway in case it either actually makes sense or people have thoughts they feel like sharing that might help my confusion.
Edit: The article is now deprecated. Thanks for everyone commenting here for helping me understand the different definitions of optimizer. I do suspect my misunderstanding of Nate's point might mirror why there is relatively common pushback against his claim? But maybe I'm typical minding.
They are a small minority currently cause the environment is changing so quickly right now. Things have been changing insanely fast in the last century or so, but before the industrial revolution, and especially before the agricultural revolution, humans were much better optimized for IGF, I think. Evolution is still 'training' us, and these last 100 years have been a huge change compared to the generation length of humans. Nate is stating that humans genetically are not IGF maximizers, and that is false. We are; we're just currently being heavily 'retrained'.
Re:...
I disagree humans don't optimize IGF:
Thank you for the comment!
Possibly such a proof exists. With more assumptions, you can get better information on human values, see here. This obviously doesn't solve all concerns.
Those are great references! I'm going to add them to my reading list, thank you.
Only a few people think about this a lot -- I currently can only think of the Center on Long-Term Risk on the intersection of suffering focus and AI Safety. Given how bad suffering is, I'm glad that there are people thinking about it, and do not think that a simple inefficiency argument is enough.
I'd h...
What distinguishes capabilities and intelligence to your mind, and what grounds that distinction? I think I'd have to understand that to begin to formulate an answer.
Great job writing up your thoughts, insights, and model!
My mind is mainly attracted to the distinction you make between capabilities and agency. In my own model, agency is a necessary part of increasing capabilities, and will by definition emerge in superhuman intelligence. I think the same conclusion follows from the definitions you use, as follows:
You define "capabilities" by the Legg and Hutter definition you linked to, which reads:
Intelligence measures an agent's ability to achieve goals in a wide range of environments
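For reference, I believe the formal version of this measure in their paper is

$$\Upsilon(\pi) = \sum_{\mu \in E} 2^{-K(\mu)} \, V_\mu^\pi$$

where $E$ is the set of computable environments, $K(\mu)$ is the Kolmogorov complexity of environment $\mu$, and $V_\mu^\pi$ is the expected cumulative reward agent $\pi$ achieves in $\mu$ (quoting from memory, so treat the notation as approximate). Note that the measure is already defined over an agent $\pi$ acting in environments, so some notion of agency seems baked into "capabilities" from the start.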
You define "agency" as...
Yes, agreed. The technique is only aimed at the "soft" edge of this, where people might in reality even disagree if something is still in or outside the Overton Window. I do think a gradient-type model of controversiality is a more realistic model of how people are socially penalized than a binary model. The exercise is not aimed at sharing views that would lead to heavy social penalties indeed, and I don't think anyone would benefit from running it that way. It's a very relevant distinction you are raising.
Good question!
My thinking on this is slightly different than @omark's. Specifically:
My intuition is that there is a gradient from controversial statements to this-will-cause-unrecoverable-social-status-damage. I think I might have implicitly employed a 'softer' definition of the Overton window as 'statements that make others or yourself uncomfortable to express/debate', where the 'harder' definition would be statements you can't socially recover from. I think intuitively I wouldn't presume anyone wants to share the latter, and I don't see much benefit in doing so. But overall, my concept of the Overton window is much more a gradient than a binary, and this exercise aims to let people stretch through the (perceived) low range.
These are experiments we ran at an AIS-x-rationality meetup to explore novelty generation strategies. I've added a short review to each exercise description.
Exercise 1: Inside View
Review: This was great priming but h...
Interesting!
I dug through the comments too and someone referred to this article by Holden Karnofsky, but I don't actually agree with that for adults (kids, sure).
Yes, but that's not what I meant by my question. It's more like ... do we have a way of applying different kinds of reward signals to AI, or can we only apply different amounts of one reward signal? My impression is the latter, but humans seem to have the former. So what's the missing piece?
I was thinking of the structure of Generative Adversarial Networks. Would that not apply in this case? It would involve 2 competing AGIs in the end though. I'm not sure if they'd just collaborate to set both their reward functions to max, or if that will never happen due to possible game-theoretic considerations.
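For concreteness, the structure I have in mind is roughly this (a minimal PyTorch sketch of a GAN on toy data; architecture and hyperparameters are arbitrary, and the analogy to two competing AGIs is loose):

```python
import torch
import torch.nn as nn

# Two networks with (roughly) opposed objectives: D tries to separate real
# from generated samples, G tries to make D fail.
G = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))  # generator
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))  # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(64, 1) * 2 + 3   # "real" data: samples from N(3, 2)
    fake = G(torch.randn(64, 4))        # generator's samples

    # Discriminator update: label real as 1, fake as 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator update: try to get its samples labeled as real.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

Because the two objectives are directly opposed on the generated samples, "both agree to report max reward" doesn't seem to be an available equilibrium here; my collusion worry would apply only when the objectives aren't set up adversarially like this.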
Thank you for your thoughtful reply!
Did you check out the list of specification gaming examples or the article? It's quite good! Most of the errors are less like missing rungs and more like exploitable mechanics.
I found that I couldn't follow through with making those sorts of infinite inescapable playgrounds for humans, I always want the game to lead out, to life, health and purpose...
But what would that be for AGI? If they escape the reward functions we want them to have, then they are very unlikely to develop a reward function that will be kind or tolerant...
Thanks for doing this!
I was trying to work out how the alignment problem could be framed as a game design problem and I got stuck on this idea of rewards being of different 'types'. Like, when considering reward hacking, how would one hack the reward of reading a book or exploring a world in a video game? Is there such a thing as 'types' of reward in how reward functions are currently created? Or is it that I'm failing to introspect on reward types and they are essentially all the same pain/pleasure axis attached to different items?
That last explanation se...
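For what it's worth, my current understanding is that in standard RL the reward is just a single scalar per timestep, so any "types" exist only in how the designer composes that number. A schematic (textbook-RL-style signature; the features are made up):

```python
# Schematic reward function: whatever "types" of reward we intend (progress
# on a book, exploration) get flattened into one float per timestep.
def reward(state: dict, action, next_state: dict) -> float:
    reading_progress = next_state["pages_read"] - state["pages_read"]  # made-up feature
    exploration_bonus = 0.1 if next_state["tile"] not in state["visited"] else 0.0
    return 1.0 * reading_progress + exploration_bonus
```

If that's right, then the "pain/pleasure axis attached to different items" picture is basically how current reward functions work.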
I've recently started looking at AIS and I'm trying to figure out how I would like to contribute to the field. My sole motivation is that all timelines see either my kids or grandkids dying from AGI. I want them to die of old age after having lived a decent life.
I get the sense that motivation ranks as quite selfish, but it's a powerful one for me. If working on AIS is the one best thing I can do for their current and future wellbeing, then I'll do that.
>My sole motivation is that all timelines see either my kids or grandkids dying from AGI.
Would that all people were so selfish!
Should we have a "rewrite the Rationalist Basics Discourse" contest?
Not that I think anything is gonna beat this. But still :D
PS: Can be content and/or style.