All of Shoshannah Tekofsky's Comments + Replies

Well damn... Well spotted.

I found the full-text version and will dig into this next week to see what's up exactly.

Thank you! I wholeheartedly agree, to be honest. I've added a footnote to the claim, linking and quoting your comment. Are you comfortable with this?

3 · Daniel Kokotajlo · 2mo
Sure, thanks!

Oooh gotcha. In that case, we are not remotely any good at avoiding the creation of unaligned humans either! ;)

0 · Meena Kumar · 2mo
Because we aren't aligned.

Could you paraphrase? I'm not sure I follow your reasoning... Humans cooperate sufficiently to generate collective intelligence, and they cooperate sufficiently due to a range of alignment mechanics between humans, no?

2 · Christopher King · 2mo
It's a bit tongue-in-cheek, but technically for an AI to be aligned, it isn't allowed to create unaligned AIs. Like if your seed AI [https://www.lesswrong.com/tag/seed-ai] creates a paperclip maximizer, that's bad. So if humanity accidentally creates a paperclip maximizer, they are technically unaligned under this definition.

Should we have a rewrite the Rationalist Basics Discourse contest?

Not that I think anything is gonna beat this. But still :D

Ps: can be both content and/or style

5 · the gears to ascension · 4mo
rewrite contests are, in general, a wonderful idea, if you ask me.

Thank you! I appreciate the in-depth comment.

Do you think any of these groups hold that all of the alignment problem can be solved without advancing capabilities?

Thanks!

And I appreciate the correction -- I admit I was confused about this, and may not have done enough of a deep-dive to untangle this properly. Originally I wanted to say "empiricists versus theorists" but I'm not sure where I got the term "theorist" from either.

Thanks!

And to both examples, how are you conceptualizing a "new idea"? Cause I suspect we don't have the same model on what an idea is.

2 · Akash · 5mo
Good question. I'm using the term "idea" pretty loosely. Things that would meet this vague definition of "idea":

* The ELK problem (like going from nothing to "ah, we'll need a way of eliciting latent knowledge from AIs")
* Identifying the ELK program as a priority/non-priority (generating the arguments/ideas that go from "this ELK thing exists" to "ah, I think ELK is one of the most important alignment directions" or "nope, this particular problem/approach doesn't matter much")
* An ELK proposal
* A specific modification to an ELK proposal that makes it 5% better.

So new ideas could include new problems/subproblems we haven't discovered, solutions/proposals, code to help us implement proposals, ideas that help us prioritize between approaches, etc. How are you defining "idea" (or do you have a totally different way of looking at things)?

Two things that worked for me:

  1. Produce stuff, a lot of stuff, and make it findable online. This makes it possible for people to see your potential and reach out to you.

  2. Send an email to anyone you admire asking if they are interested in going for a coffee (if you have the funds to fly out to them) or doing a video call. Explain why you admire them and why this would be high value to you. I did this for 4 people, without filtering on 'how likely are they to answer', and one of them said 'yeah sure'. I think the email made them happy, cause a reasonable subset of people like learning how they have touched others' lives in a positive way.

Even in experiments, I think most of the value is usually from observing lots of stuff, more than from carefully controlling things.

I think I mostly agree with you but have the "observing lots of stuff" categorized as "exploratory studies" which are badly controlled affairs where you just try to collect more observations to inform your actual eventual experiment. If you want to pin down a fact about reality, you'd still need to devise a well-controlled experiment that actually shows the effect you hypothesize to exist from your observations so far.
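To make the contrast concrete, here is a minimal sketch (my own illustration, not from the discussion above) of what a well-controlled setup adds over raw observation: random assignment plus a permutation test tells you whether an observed difference is larger than what relabeling noise alone would produce.

```python
import random
import statistics

random.seed(0)

def controlled_experiment(effect=1.0, n=200):
    """Sketch of a well-controlled experiment: random assignment to
    treatment vs. control isolates the hypothesized effect from confounders.
    The effect size and sample size are illustrative assumptions."""
    control = [random.gauss(0, 1) for _ in range(n)]
    treated = [random.gauss(effect, 1) for _ in range(n)]
    observed = statistics.mean(treated) - statistics.mean(control)

    # Permutation test: how often does randomly relabeling the pooled data
    # produce a group difference at least as large as the observed one?
    pooled = control + treated
    count = 0
    trials = 2000
    for _ in range(trials):
        random.shuffle(pooled)
        diff = statistics.mean(pooled[n:]) - statistics.mean(pooled[:n])
        if abs(diff) >= abs(observed):
            count += 1
    return observed, count / trials

obs, p = controlled_experiment()
print(obs, p)  # a real effect shows up as a small p-value
```

An exploratory study, by contrast, would just collect the observations; the randomized assignment and the explicit null comparison are what turn an observed difference into evidence for the hypothesized effect.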

If you a

... (read more)

There is an EU telegram group where they are, among other things, collecting data on where people are in Europe. I'll DM an invite.

That makes a lot of sense! And was indeed also thinking of Elicit

Note: The meetup this month is Wednesday, Jan 4th, at 15:00. I'm in Berkeley currently, and I couldn't see how times were displayed for you guys cause I have no option to change time zones on LW. I apologize if this has been confusing! I'll get a local person to verify dates and times next time (or even set them).

Did you accidentally forget to add this post to your research journal sequence?

I thought I added it but apparently hadn't pressed submit. Thank you for pointing that out!


 

  1. optimization algorithms (finitely terminating)
  2. iterative methods (convergent)

That sounds as if they are always finitely terminating or convergent, which they're not. (I don't think you meant to say they are.)

I was going by the Wikipedia definition:

To solve problems, researchers may use algorithms that terminate in a finite number of steps, or iterative methods that converge to a

... (read more)
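For what it's worth, the quoted distinction can be illustrated with a toy sketch (mine, not Wikipedia's): minimizing the same quadratic once with a finitely terminating closed-form step, and once with an iterative method that only converges toward the answer.

```python
def minimize_quadratic_exact(a, b, c):
    """Finitely terminating: the minimizer of ax^2 + bx + c (with a > 0)
    is found in a fixed number of arithmetic steps."""
    return -b / (2 * a)

def minimize_quadratic_iterative(a, b, c, lr=0.1, tol=1e-10, max_steps=10_000):
    """Iterative method: gradient descent only *converges* to the
    minimizer; we stop when successive iterates are close enough."""
    x = 0.0
    for _ in range(max_steps):
        grad = 2 * a * x + b  # derivative of ax^2 + bx + c
        new_x = x - lr * grad
        if abs(new_x - x) < tol:
            break
        x = new_x
    return x

exact = minimize_quadratic_exact(1, -6, 9)       # minimizer of (x - 3)^2
approx = minimize_quadratic_iterative(1, -6, 9)
print(exact)                        # 3.0
print(abs(approx - exact) < 1e-6)   # True
```

Neither property is guaranteed in general: an optimizer can also fail to terminate or fail to converge, which is the point of the comment above.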
5 · Leon Lang · 6mo
I see. I think I was confused since, in my mind, there are many Turing machines that simply do not "optimize" anything. They just compute a function.

I think I wanted to point to a difference in the computational approach of different algorithms that find a path through the universe. If you chain together many locally found heuristics, then you carve out a path through reality over time that may lead to some "desirable outcome". But the computation would be vastly different from another algorithm that thinks about the end result and then makes a whole plan of how to reach it. It's basically the difference between deontology and consequentialism. This post is on similar themes [https://www.alignmentforum.org/posts/KDMLJEXTWtkZWheXt/consequentialism-and-corrigibility]. I'm not at all sure if we disagree about anything here, though.

I would say that if you remember the plan and retrieve it later for repeated use, then you do this by learning, and the resulting computation is not planning anymore. Planning is always the thing you do in the moment to find good results now, and learning is the thing you do to be able to use a solution repeatedly.

Part of my opinion also comes from the intuition that planning derives its use from the fact that it is applied in complex environments in which learning by heart is often useless. The very reason why planning is useful for intelligent agents is that they cannot simply learn heuristics to navigate the world. To be fair, it might be that I don't have the same intuitive connection between planning and learning in my head that you do, so if my comments are beside the point, then feel free to ignore :)

Conceptually it does, thank you! I wouldn't call these parameters and hyperparameters, though. Low-level and high-level features might be better terms. Again, I think the shard theory of human values might be an inspiration for these thoughts, as well as this post on AGI motivation [https://w

Oh my, this looks really great. I suspect between this and the other list of AIS researchers, we're all just taking different cracks at generating a central registry of AIS folk so we can coordinate at all different levels on knowing what people are doing and knowing who to contact for which kind of connection. However, maintaining such an overarching registry is probably a full time job for someone with high organizational and documentation skills.

3 · plex · 6mo
Yup, another instance of this is the longtermist census [https://forum.effectivealtruism.org/posts/rMuoqhoer8ThuGmGF/fill-out-this-census-of-everyone-who-could-ever-see], that likely has the most entries but is not public. Then there's AI Safety Watch [https://aiwatch.issarice.com/#positions-grouped-by-person], the EA Hub (with the right filters) [https://eahub.org/profiles/?refinementList%5Bcause_areas%5D%5B0%5D=AI%20strategy%20%26%20policy&refinementList%5Bcause_areas%5D%5B1%5D=AI%20safety%20technical%20research], the mailing list of people who went through AGISF, I'm sure SERI MATS has one, other mailing lists like AISS's opportunities one, other training programs, student groups, people in various entries on aisafety.community [https://coda.io/@alignmentdev/alignmentecosystemdevelopment]... Yeah, there's some organizing to do. Maybe the EA forum's proposed new profile features will end up being the killer app?

Great idea!

So my intuition is that letting people edit a file that is publicly linked is inviting a high probability of undesirable results (like accidental wipes, unnoticed changes to the file, etc). I'm open to looking into this if the format gains a lot of traction and people find it very useful. For the moment, I'll leave the file as-is so no one's entry can be accidentally affected by someone else's edits. Thank you for the offer though!

4 · plex · 6mo
Yeah, that is a risk. Have you checked out ASAP? Seems pretty related:
https://airtable.com/shrhjo857neCToCNW/tblXj7gik84xGIZly/viwaKxHhBEmIyEcSr?blocks=hide
https://airtable.com/shrhjo857neCToCNW/tblXj7gik84xGIZly/viwB4nnuzhGLAEONY?blocks=hide
https://asap-homepage.notion.site/asap-homepage/Home-b38ba079d3dd4d258baa7cd1ae4eb68f

Thank you for sharing! I actually have a similar response myself but assumed it was not general. I'm going to edit the image out.

EDIT: Both points are moot under Stuart Armstrong's narrower definition of the Orthogonality thesis, which he argues for in "General purpose intelligence: arguing the Orthogonality thesis":

High-intelligence agents can exist having more or less any final goals (as long as these goals are of feasible complexity, and do not refer intrinsically to the agent’s intelligence).

Old post:

I was just working through my own thoughts on the Orthogonality thesis and did a search on LW on existing material and found this essay. I had pretty much the same thoughts on intel... (read more)

Hmm, that wouldn't explain the different qualia of the rewards, but maybe it doesn't have to. I see your point that they can mathematically still be encoded into one reward signal that we optimize through weighted factors.

I guess my deeper question would be: do the different qualia of different reward signals achieve anything in our behavior that can't be encoded by summing the weighted factors of different reward systems into one reward signal that is optimized?

Another framing here would be homeostasis - if you accept humans aren't happiness optim... (read more)

1 · Aprillion (Peter Hozák) · 7mo
Allostasis [https://how-emotions-are-made.com/notes/Allostasis] is a more biologically plausible explanation of "what a brain does" than homeostasis, but to your point: I do think optimizing for happiness and doing kinda-homeostasis are "just the same somehow". I have a slightly circular view that the extension of happiness exists as an output of a network with 86 billion neurons and 60 trillion connections, and that it is a thing that the brain can optimize for. Even if the intension of happiness as defined by a few English sentences is not the thing, and even if optimization for slightly different things would be very fragile, the attractor of happiness might be very small and surrounded by dystopian tar pits, I do think it is something that exists in the real world and is worth searching for. Though if we cannot find any intension that is useful, perhaps other approaches to AI Alignment and not the "search for human happiness" will be more practical.

Clawbacks refer to grants that have already been distributed but would need to be returned. You seem to be thinking of grants that haven't been distributed yet. I hope both get resolved but they would require different solutions. The post above is only about clawbacks though.

3 · shminux · 7mo
Good point. I meant both, since the same logic applies

As a grantee, I'd be very interested in hearing what informs your estimate, if you feel comfortable sharing.

1 · shminux · 7mo
I don't have any special insights. But I would assume that the total amount of undistributed grants is not huge, and there are EA-adjacent funding orgs that have funds available. Using previously selected now-unfunded grantees saves them the work of identifying promising projects. Plus it limits the fallout from the current FTX debacle. Win/win.

Sure. For instance, hugging/touch, good food, or finishing a task all deliver a different type of reward signal. You can be saturated on one but not the others and then you'll seek out the other reward signals. Furthermore, I think these rewards are biochemically implemented through different systems (oxytocin, something-sugar-related-unsure-what, and dopamine). What would be the analogue of this in AI?

1 · Aprillion (Peter Hozák) · 7mo
I see. These are implemented differently in humans, but my intuition about the implementation details is that "reward signal" as a mathematically abstract object can be modeled by a single value even if individual components are physically implemented by different mechanisms; e.g., an animal could be modeled as if it were optimizing for a Pareto optimum between a bunch of normalized criteria:

reward = S(hugs) + S(food) + S(finishing tasks) + S(free time) - S(pain) ...

People spend their time cooking, and risk cutting their fingers, in order to have better food and build relationships. But no one would want to get cancer to obtain more hugs, presumably not even to increase the number of hugs from 0 to 1, so I don't feel human rewards are completely independent magisteria; there must be some biological mechanism to integrate the different expected rewards and pains into decisions.

Spending energy on computation of expected value can be included in the model: we might decide that we would get a lower reward if we overthink the current decision, and that would be possible to model as part of the one "reward signal" in theory, even though it would complicate predictability of humans in practice (however, it turns out that humans can be, in fact, hard to predict, so I would say this is a complication of reality, not a useless complication in the model).
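The weighted-sum idea in that comment can be sketched in code. The `squash` function, the channel names, and the weights are illustrative assumptions, not a claim about actual biology; the point is only that separately saturating channels can still feed one scalar signal.

```python
import math

def squash(x):
    """Map an unbounded drive level to [0, 1) with diminishing returns,
    so each channel saturates and no single one can dominate the sum."""
    return 1.0 - math.exp(-x)

def scalar_reward(hugs, food, tasks, pain, weights=(1.0, 1.0, 1.0, 2.0)):
    """One scalar reward from several physiologically distinct channels.
    The weights are made-up illustrative values."""
    w_h, w_f, w_t, w_p = weights
    return (w_h * squash(hugs) + w_f * squash(food)
            + w_t * squash(tasks) - w_p * squash(pain))

# Saturation: once `hugs` is high, extra hugs add almost nothing,
# so the marginal value of the *other* channels dominates behavior.
print(scalar_reward(10, 0, 0, 0) < scalar_reward(10, 1, 0, 0))  # True
```

This mirrors the observation above: you can be saturated on one reward channel and still seek out the others, without needing more than one number at the point where a decision is made.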

ah, like that. Thank you for explaining. I wouldn't consider that a reversal cause you're then still converting intuitions into testable hypotheses. But the emphasis on discussion versus experimentation is then reversed indeed.

What would the sensible reverse of number 5 be? I can generate them for 1-4 and 6, but I am unsure what the benefit could be of confusing intuitions with testable hypotheses?

4 · Richard_Ngo · 8mo
Reversal: when you have different intuitions about high-level questions, it's often not worth spending a lot of time debating them extensively - instead, move onto doing whatever research your intuitions imply will be valuable.

I really appreciate that thought! I think there were a few things going on:

  • Definitions and Degrees: I think in common speech and intuitions it is the case that failing to pick the optimal option doesn't mean something is not an optimizer. I think this goes back to the definition confusion, where 'optimizer' in CS or math literally picks the best option to maximize X no matter the other concerns. While in daily life, if one says they optimize on X, then trading off against lower concerns at some value greater than zero is still considered optimizing. E.g. s
... (read more)
2 · TekhneMakre · 8mo
I wouldn't say "picks the best option" is the most interesting thing in the conceptual cluster around "actual optimizer". A more interesting thing is "runs an ongoing, open-ended, creative, recursive, combinatorial search for further ways to greatly increase X".

I mean certainly this is pointing at something deep and important. But the shift here I would say couldn't be coming from agentic IGF maximization, because agentic IGF maximization would have already, before your pregnancy, cared in the same qualitative way, with the same orientation to the intergenerational organism, though about 1/8th as much, about your cousins, and 1/16th as much about the children of your cousins. Like, of course you care about those people, maybe in a similar way as you care about your children, and maybe connected to IGF in some way; but something got turned on, which looks a lot like a genetically programmed mother-child caring, which wouldn't be an additional event if you'd been an IGF maxer. (One could say, you care about your children mostly intrinsically, not mostly because of an IGF calculation. Yes this intrinsic care is in some sense put there by evolution for IGF reasons, but that doesn't make them your reasons.)

Hm. I don't agree that this is very plausible; what I agreed with was that human evolution is closer to an IGF maxer, or at least some sort of myopic [https://www.lesswrong.com/tag/myopia] IGF maxer, in the sense that it only "takes actions" according to the criterion of IGF.

It's a little plausible. I think it would have to look like a partial Baldwinization [https://en.wikipedia.org/wiki/Baldwin_effect] of pointers to the non-genetic memeplex of explicit IGF maximization; I don't think evolution would be able to assemble brainware that reliably in relative isolation does IGF, because that's an abstract calculative idea whose full abstractly calculated implications are weird a

On further reflection, I changed my mind (see title and edit at top of article). Your comment was one of the items that helped me understand the concepts better, so just wanted to add a small thank you note. Thank you!

Thanks!

On that note, I was wondering if there was any way I could tag the people that engaged me on this (cause it's spread between 2 articles) just so I can say thanks? Seems like the right thing to do to high five everyone after a lost duel or something? Dunno, there is some sentiment there where a lightweight acknowledgement/update would be a useful thing to deliver in this case, I feel, to signal that people's comments actually had an effect. DM'ing everyone or replying to each comment again would give everyone a notification but generates a lot of clutter and overhead, so that's why tagging seemed like a good route.

4 · Ben Pace · 8mo
No especially good suggestion from me. Obvious options:

* You could make a comment that links to the most helpful comments.
* You could make one PM convo that includes everyone (you can add multiple people to a PM convo) and link them to the comment.

Agree that tagging/mentions would be nice here.

I wasn't sure how I hadn't argued that, but between all the different comments, I've now pieced it together. I appreciate everyone engaging me on this, and I've updated the essay to "deprecated" with an explanation at the top that I no longer endorse these views.

3 · TekhneMakre · 8mo
Applause for putting your thoughts out there, and applause for updating. Also maybe worth saying: It's maybe worth "steelmanning" your past self; maybe the intuitions you expressed in the post are still saying something relevant that wasn't integrated into the picture, even if it wasn't exactly "actually some humans are literally IGF maximizers".  Like, you said something true about X, and you thought that IGF meant X, but now you don't think IGF means X, but you still maybe said something worthwhile about X. 

Thank you. Between all the helpful comments, I've updated my point of view and updated this essay to deprecated with an explanation + acknowledgement at the top.

2 · Viliam · 8mo
In return, your new disclaimer at the beginning of the article made me notice something I was confused about -- whether we should apply the label "X maximizer" only to someone who actually achieves the highest possible value of X, or also to someone who tries but maybe fails. In other words, are we only talking about internal motivation, or describing the actual outcome and expecting perfection?

To use an analogy, imagine a chess-playing algorithm. Is it correct to call it a "chess victory maximizer"? On one hand, the algorithm does not care about anything other than winning at chess. On the other hand, if a better algorithm comes later and defeats the former one, will we say that the former one is not an actual chess victory maximizer, because it did some (in hindsight) non-victory-maximizing moves, which is how it lost the game?

When talking about humans, imagine that a random sci-fi mutation turns someone into a literal fitness maximizer, but at the same time, that human's IQ remains only 100. So the human would literally stop caring about anything other than reproduction, but maybe would not be smart enough to notice the most efficient strategy, and would use a less efficient one. Would it still be okay to call such a human a fitness maximizer? Is it about "trying, within your limits", or is it "doing the theoretically best thing"?

I suppose, if I talked to such a guy, and told him e.g. "hey, do you realize that donating at a sperm clinic would result in way more babies than just hooking up with someone every night and having unprotected sex?", if the guy would immediately react by "oh shit, no more sex anymore, I need to save all my sperm for donation" then I would see no objection to calling him a maximizer. His cognitive skills are weak, but his motivation is flawless. (But I still stand by my original point, that humans are not even like this. The guys who supposedly maximize the number of their children would actually not be willing to give up sex forever, i
5 · Ben Pace · 8mo
Woop, take credit for changing your mind!

The surrogacy example originally struck me as very unrealistic cause I presumed it was mostly illegal (it is in Europe but apparently not in some States of the US) and heavily frowned upon here for ethical reasons (but possibly not in the US?). So my original reasoning was that you'd get in far more trouble for applying for many surrogates than for swapping out sperm at the sperm bank.

I guess if this is not the case then it might have been a fetish for those doctors? I'm slightly confused about the matter now what internal experience put them up to it if t... (read more)

Yes, good point. I was looking at those statistics for a bit. Poorer parents do indeed tend to maximize their number of offspring no matter the cost while richer parents do not. It might be that parents overestimate the IGF payoffs of quality, but then that just makes them bad/incorrect optimizers. It wouldn't make them less of an optimizer.

I think there are also some other subtle nuances going on. Like, for instance, I'd consider myself fairly close to an IGF optimizer but I don't care about all genes/traits equally. There is a multigenerational "strain" I ide... (read more)

I think the notion that people are adaptation-executors, who like lots of things a little bit in context-relevant situations, predicts our world more than the model of fitness-maximizers, who would jump on this medical technology and aim to have 100,000s of children soon after it was built.

I think this skips the actual social trade-offs of the strategy you outline above:

  1. The likely backlash in society against any woman who tries this is very high. Any given rich woman would have to find surrogate women who are willing to accept the money and avoid bei
... (read more)

My claim was purely that some people do actually optimize on this. It's just fairly hard, and their success also relies on how their abilities to game the system compares to how strong the system is. E.g. There was that fertility doctor that just used his own sperm all the time, for instance.

2 · Viliam · 8mo
Yes, the story of the doctor was the inspiration for my comment. Compared to him, other "maximizers" clearly did not do enough. And as Gwern wrote, even the doctor could have done much better. (Also, I have no evidence here, but I wonder how much of what the doctor did was a strategy, and how much was just exploiting a random opportunity. Did he become a fertility doctor on purpose to do this, or did he just choose a random high-status job, and then noticed an opportunity? I suppose we will never know.)
5 · gwern · 8mo
I'm not sure which one you mean because there's a few examples of that, but he still has not maximized even for quite generous interpretations of 'maximize': none of those doctors so much as lobbied their fellow doctors to use him as their exclusive sperm donor, for example, nor offered to bribe them; none of the doctors I've read about appear to have spent any money at all attempting to get more offspring, much less to the extent of making any dent in their high doctor-SES standard of living (certainly no one went, 'oh, so that is what he had secretly devoted his life to maximizing, we were wondering'), much less paid for a dozen surrogacies with the few million net assets they'd accumulate over a lifetime. You can't excuse this as a typical human incompetence [https://danluu.com/p95-skill/] because it requires only money to cut a check, which they had.

Makes sense. I'm starting to suspect I overestimated the number of people who would take these deals, but I think there still would be more for the above than for the original thought experiments.

Here is my best attempt at working out my thoughts on this, but I noticed I reached some confusion at various points. I figured I'd post it anyway in case it either actually makes sense or people have thoughts they feel like sharing that might help my confusion.

Edit: The article is now deprecated. Thanks to everyone commenting here for helping me understand the different definitions of optimizer. I do suspect my misunderstanding of Nate's point might mirror why there is relatively common pushback against his claim? But maybe I'm typical-minding.

They are a small minority currently cause the environment changes so quickly right now. Things have been changing insanely fast in the last century or so but before the industrial revolution and especially before the agriculture revolution, humans were much better optimized for IGF, I think. Evolution is still 'training' us and these last 100 years have been a huge change compared to the generation length of humans. Nate is stating that humans genetically are not IGF maximizers, and that is false. We are, we are just currently heavily being 'retrained'.

Re:... (read more)

I disagree that humans don't optimize IGF:

  1. We seem to have different observational data. I do know some people who make all their major life decisions based on quality and quantity of offspring. Most of them are female but this might be a bias in my sample. Specifically, quality trades off against quantity: waiting to find a fitter partner and thus losing part of your reproductive window is a common trade off. Similarly, making sure your children have much better lives than you by making sure your own material circumstances (or health!) are better is another.
... (read more)
5 · Ben Pace · 8mo

* Given the ability to medically remove, store, and artificially inseminate eggs, current technologies make it possible for a woman to produce many more children than the historical limit of ~50 (i.e. one every 9 months for a woman's entire reproductive years), and closer to the limit (note that each woman produces 100,000s of eggs).
* I don't have a worked-out plan, but I could see a woman removing most of her eggs, somehow causing many other women to use her eggs to have children (whether it's by finding infertile women, or paying people, or showing that the eggs would be healthier than others'), and having many more children than historically possible.
* I suspect many women could have 50-100 children this way, and that peak women could have 10,000s of children this way, closer to the male model of reproduction.
* I'd be interested to know the maximum number of children any woman has had in history, and also since the invention of this sort of medical technology.
* I imagine that such a world would have a market (and class system) based around being able to get your eggs born. There are services where a different woman will have your children, but I think the maximizer world would look more like poor women primarily being paid to have children (and being pregnant >50% of their lives) and rich women primarily paying to have children (and having 1000s of children born).
* I think the notion that people are adaptation-executors, who like lots of things a little bit in context-relevant situations, predicts our world more than the model of fitness-maximizers, who would jump on this medical technology and aim to have 100,000s of children soon after it was built.
* I also suspect that population would skyrocket relative to the current numbers (e.g. be 10-1000x the current size). Perhaps efforts to colonize Mars would have been sustained during the 20th century, as this planet would have been more
2 · Thomas Kwa · 8mo
The reason why we're talking about humans and IGF is because there's an analogy to AGI. If we select on the AI to be corrigible (or whatever nice property) in subhuman domains, will it generalize out-of-distribution to be corrigible when superhuman and performing coherent optimization?

Humans are not generalizing out of distribution. The average woman who wants to raise high-quality children does not have the goal of maximizing IGF; she does not try to instill the value of maximizing IGF into them, nor use the far more effective strategies of donating eggs, trying to get around egg donation limits [https://www.ucsfhealth.org/education/faq-common-questions-for-egg-donors], or getting her male relatives to donate sperm.

If the environment stabilizes, additional selection pressure might cause these people to become a majority. But we might not have additional selection pressure [https://www.lesswrong.com/posts/GNhMPAWcfBCASy8e6/a-central-ai-alignment-problem-capabilities-generalization] in the AGI case.
6 · Rob Bensinger · 8mo
Is this the best strategy for maximizing IGF? Do happier and wealthier kids have more offspring? Given that wealthier countries tend to have lower birth rates, I wonder if the IGF-maximizing strategy would instead often look like trying to have lots of poor children with few options?

(I'll note as an aside that even if this is false, it should definitely be a thing many parents seriously consider doing and are strongly tempted by, if the parents are really maximizing IGF rather than maximizing proxies like "their kids' happiness". It would be very weird, for example, if an IGF maximizer reacted to this strategy with revulsion.)

I'd be similarly curious if there are cases where making your kids less happy, less intelligent, less psychologically stable, etc. increased their expected offspring. This would test to what extent 'I want lots and lots and lots of kids' parents are maximizing IGF per se, versus maximizing some combination of 'have lots of descendants', 'make my descendants happy (even if this means having fewer of them)', etc.
3 · Shoshannah Tekofsky · 8mo
Here is my best attempt [https://www.lesswrong.com/posts/CCN3XHiRKhrtX6wDW/some-humans-are-fitness-maximizers] at working out my thoughts on this, but I noticed I reached some confusion at various points. I figured I'd post it anyway in case it either actually makes sense or people have thoughts they feel like sharing that might help my confusion.

Edit: The article is now deprecated. Thanks to everyone commenting here for helping me understand the different definitions of optimizer. I do suspect my misunderstanding of Nate's point might mirror why there is relatively common pushback against his claim? But maybe I'm typical-minding.
7tailcalled8mo
In the long term, we would expect humans to end up directly optimizing IGF (assuming no revolutions like AI doom or similar) due to evolution. The way this proceeds in practice is that people vary in the extent to which they optimize IGF vs other things, and those who optimize IGF pass on their genes, leading to higher optimization of IGF. So yes, eventually these sorts of people will win, but as you admit yourself they are a small minority, so humans as they currently exist are mostly not IGF maximizers.

Also, regarding quality vs quantity, it's my impression that society massively overinvests in quality relative to what would be implied by IGF. Society is incredibly safe compared to the past, so you don't need much effort to make your children survive. Insofar as there is IGF value in quality, it's probably in somehow convincing your children to also optimize for IGF, rather than do other things.

Thank you for the comment!

Possibly such a proof exists. With more assumptions, you can get better information on human values, see here. This obviously doesn't solve all concerns.

Those are great references! I'm going to add them to my reading list, thank you.

Only a few people think about this a lot -- I currently can only think of the Center on Long-Term Risk on the intersection of suffering focus and AI Safety. Given how bad suffering is, I'm glad that there are people thinking about it, and do not think that a simple inefficiency argument is enough.

I'd h... (read more)

1Leon Lang9mo
I think I basically agree (though maybe not with as much high confidence as you), but I think that doesn't mean that huge amounts of suffering will not dominate the future. For example, if there will be not one but many superintelligent AI systems determining the future, this might create suffering due to cooperation failures. 

What distinguishes capabilities and intelligence to your mind, and what grounds that distinction? I think I'd have to understand that to begin to formulate an answer.

1wilm9mo
I've unfortunately been quite distracted, but better a late reply than no reply. With capabilities I mean how well a system accomplishes different tasks. This is potentially high dimensional (there can be many tasks that two systems are not equally good at). Also it can be more and less general (optical character recognition is very narrow because it can only be used for one thing, generating / predicting text is quite general). Also, systems without agency can have strong and general capabilities (a system might generate text or images without being agentic). This is quite different from the definition by Legg and Hutter, which is more specific to agents. However, since last week I have updated on strongly and generally capable non-agentic systems being less likely to actually be built (especially before agentic systems). In consequence, the difference between my notion of capabilities and a more agent related notion of intelligence is less important than I thought.

Great job writing up your thoughts, insights, and model!

My mind is mainly attracted to the distinction you make between capabilities and agency. In my own model, agency is a necessary part of increasing capabilities, and will by definition emerge in superhuman intelligence. I think the same conclusion follows from the definitions you use, as follows:

You define "capabilities" by the Legg and Hutter definition you linked to, which reads:
 

 Intelligence measures an agent's ability to achieve goals in a wide range of environments
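For reference, the formal version of that definition (the Legg–Hutter universal intelligence measure, as I understand their paper) weights an agent's expected performance across all computable environments by their simplicity:

```latex
\Upsilon(\pi) \;:=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V_\mu^\pi
```

where $E$ is the set of computable environments, $K(\mu)$ is the Kolmogorov complexity of environment $\mu$ (so simpler environments carry more weight), and $V_\mu^\pi$ is the expected cumulative reward of agent $\pi$ in $\mu$. Note that the measure is defined over agents and their reward, which is why I read it as already presupposing some degree of agency.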

You define "agency" as... (read more)

2wilm9mo
Thanks for your replies. I think our intuitions regarding intelligence and agency are quite different. I deliberately mostly stuck to the word 'capabilities', because in my intuition you can have systems with very strong and quite general capabilities that are not agentic. One very interesting point is where you say: “Presumably the problem happens somewhere between "the smartest animal we know" and "our intelligence", and once we are near that, recursive self-improvement will make the distinction moot”. Can you explain this position more? In my intuition, building and improving intelligent systems is far harder than that. I hope to come back later to your answer about information about the real world.

Yes, agreed. The technique is only aimed at the "soft" edge of this, where people might in reality even disagree about whether something is inside or outside the Overton Window. I do think a gradient-type model of controversiality is a more realistic model of how people are socially penalized than a binary model. The exercise is indeed not aimed at sharing views that would lead to heavy social penalties, and I don't think anyone would benefit from running it that way. It's a very relevant distinction you are raising.

Good question!

My thinking on this is slightly different than @omark's. Specifically:

  • Everyone commits to being vulnerable by sharing their own controversial statements. This symmetry is often not present in normal conversation, where you focus on one topic where one person might have a controversial opinion and the other does not.
  • It's much higher density on iterating through controversial opinions than a normal conversation would be.
  • It's a session you can sign up for where you can trust everyone is coming to the session with the same intention to grow and s
... (read more)
1M. Y. Zuo9mo
That’s interesting, though I don’t see how the commitment mechanism could work without some arbiter to decide if the follow-up statement is actually controversial. How do you envision disputes along the lines of not-actually-that-controversial being resolved?

My intuition is that there is a gradient from controversial statements to this-will-cause-unrecoverable-social-status-damage. I think I might have implicitly employed a 'softer' definition of the Overton window as 'statements that make others or yourself uncomfortable to express/debate', where the 'harder' definition would be statements you can't socially recover from. I think intuitively I wouldn't presume anyone wants to share the latter, and I don't see much benefit in doing so. But overall, my concept of the Overton window is much more of a gradient than a binary, and this exercise aims to let people stretch through the (perceived) low range.

Moved the addendum into the comments, because it seemed to mess up the navigation. This seems like a more elegant solution.

Addendum: Experiments

These are experiments we ran at an AIS-x-rationality meetup to explore novelty generation strategies. I've added a short review to each exercise description.

Session 1

Exercise 1: Inside View

  • Split in pairs
  • 5 minute timer
  • Instructions: explain your internal model of the AI Alignment problem. If someone is done talking, then remaining time can be filled with questions.
  • Switch

Review: This was great priming but h... (read more)

Interesting!

I dug through the comments too and someone referred to this article by Holden Karnofsky, but I don't actually agree with that for adults (kids, sure).

Yes, but that's not what I meant by my question. It's more like... do we have a way of applying different kinds of reward signals to AI, or can we only vary the amount of reward? My impression is the latter, but humans seem to have the former. So what's the missing piece?

1Aprillion (Peter Hozák)7mo
Hm, I gave it some time, but I'm still confused... can you name some types of reward that humans have?