All of Ilio's Comments + Replies

Our daily whims might be a bit inconsistent, but our larger goals aren't.

It’s a key faith I used to share, but I’m now agnostic about it. To take a concrete example, everyone knows that blues and reds get more and more polarized. A grey type like my old self would have thought there must be an objective truth to extract, with elements from both sides. Now I’m wondering if ethics should end with: no truth can help decide whether future humans should be able to live like bees or like dolphins or like the blues or like the reds, especially when living like the reds... (read more)

Fascinating paper! I wonder how much they would agree that holography means sparse tensors and convolution, or that intuitive versus reflexive thinking basically amounts to the visuo-spatial versus phonological loop. Can’t wait to hear which other ideas you’d like to import from this line of thought.

Bill Benzon, 16m
Miriam Lipshutz Yevick was born in 1924 and died in 2018, so we can't ask her these questions. She fled Europe with her family in 1940 for the same reason many Jews fled Europe, and ended up in Hoboken, NJ. Seven years later she got a PhD in math from MIT; she was only the 5th woman to get that degree from MIT. But, as both a woman and a Jew, she had almost no chance of an academic post in 1947. She eventually got an academic gig, but it was at a college oriented toward adult education. Still, she managed to do some remarkable mathematical work.

The two papers I mention in that blog post were written in the mid-1970s. That was the height of classic symbolic AI and the cognitive science movement more generally. Newell and Simon got their Turing Award in 1975, the year Yevick wrote that remarkable 1975 paper on holographic logic, which deserves to be more widely known. She wrote as a mathematician interested in holography (an interest she developed while corresponding with physicist David Bohm in the 1950s), not as a cognitive scientist.

Of course, in arguing for holography as a model for (one kind of) thought, she was working against the tide. Very few were thinking in such terms at that time. Rosenblatt's work was in the past, and had been squashed by Minsky and Papert, as you've noted. The West Coast connectionist work didn't take off until the mid-1980s. So there really wasn't anyone in the cognitive science community at the time to investigate the line of thinking she initiated.

While she wasn't thinking about real computation, you know, something you actually do on computers, she thought abstractly in computational terms, as Turing and others did (though Turing also worked with actual computers). It seems to me that her contribution was to examine the relationship between a computational regime and the objects over which it was asked to compute. She's quite explicit about that. If the object tends toward geometrical simplicity – she was using identificat

I have no idea whether or not Hassabis is himself dismissive of that work

Well that’s a problem, don’t you think?

but many are.

Yes, as a cognitive neuroscientist myself, you’re right that many within my generation tend to dismiss symbolic approaches. We were students during a winter that many of us thought was caused by the overpromising and underdelivering of the symbolic approach, with Minsky as the main reason for the slow start of neural networks. I bet you have a different perspective. What are your three best points for changing my generation’s view?

Bill Benzon, 20h
I'll get back to you tomorrow. I don't think it's a matter of going back to the old ways. ANNs are marvelous; they're here to stay. The issue is one of integrating some symbolic ideas. It's not at all clear how that's to be done. If you wish, take a look at this blog post: Miriam Yevick on why both symbols and networks are necessary for artificial minds.

Because I agree, and because « strangely » sounds to me like « with inconsistencies ».

In other words, in my view the orthodox view on orthogonality is problematic, because it supposes that we can pick at will within the enormous space of possible functions, whereas the set of intelligent behaviors that we can construct is more likely sparse and by default describable using game theory (think tit for tat).

Seth Herd, 20h
I think this would be a problem if what we wanted was logically inconsistent. But it's not. Our daily whims might be a bit inconsistent, but our larger goals aren't. And we can get those goals into AI - LLMs largely understand human ethics even at this point. And what we really want, at least in the near term, is an AGI that does what I mean and checks.

This is a sort of positive nihilism. Because value is not inherent in the physical world, you can assign value to whatever you want, with no inconsistency.

Say we construct a strong AI that attributes a lot of value to a specific white noise screenshot. How would you expect it to behave?

Seth Herd, 1d
Strangely. Why?

Your point is « Good AIs should have a working memory, a concept that comes from psychology ».

DH point is « Good AIs should have a working memory, and the way to implement it was based on concepts taken from neuroscience ».

Those are indeed orthogonal notions, if you will.

Bill Benzon, 1d
I did a little checking. It's complicated. In 2017 Hassabis published an article entitled "Neuroscience-Inspired Artificial Intelligence" in which he attributes the concept of episodic memory to a review article that Endel Tulving published in 2002, "Episodic Memory: From Mind to Brain." That article has quite a bit to say about the brain. In the 2002 article Tulving dates the concept to an article he published in 1972, entitled "Episodic and Semantic Memory." As far as I know, while there are precedents – everything can be fobbed off on Plato if you've a mind to do it – that's where the notion of episodic memory enters into modern discussions.

Why do I care about this kind of detail? First, I'm a scholar and it's my business to care about these things. Second, a lot of people in contemporary AI and ML are dismissive of symbolic AI from the 1950s through the 1980s and beyond. While Tulving was not an AI researcher, he was very much in the cognitive science movement, which included philosophy, psychology, linguistics, and AI (later on, neuroscientists would join in). I have no idea whether or not Hassabis is himself dismissive of that work, but many are. It's hypocritical to write off the body of work while using some of its ideas. These problems are too deep and difficult to write off whole bodies of research in part because they happened before you were born – FWIW Hassabis was born in 1976.

I’m a bit annoyed that Hassabis is giving neuroscience credit for the idea of episodic memory.

That’s not my understanding. To me he is giving neuroscience credit for the ideas that made it possible to implement a working memory in LLMs. I guess he didn’t want to use words like thalamocortical, but from a neuroscience point of view transformers do look inspired by the isocortex, e.g. by the idea that a general distributed architecture can process any kind of information relevant to a human cognitive architecture.

Bill Benzon, 1d
Yeah, he's talking about neuroscience. I get that. But "episodic memory" is a term of art and the idea behind it didn't come from neuroscience. It's quite possible that he just doesn't know the intellectual history and is taking "episodic memory" as a term that's in general use, which it is. But he's also making claims about intellectual history.  Because he's using that term in that context, I don't know just what claim he's making. Is he also (implicitly) claiming that neuroscience is the source of the idea? If he thinks that, then he's wrong. If he's just saying that he got the idea from neuroscience, OK. But, the idea of a "general distributed architecture" doesn't have anything to do with the idea of episodic memory. They are orthogonal notions, if you will.

I’d be happy if you could point out a non-competitive one, or explain why my proposal above does not obey your axioms. But we seem to be getting diminishing returns in sorting these questions out, so maybe it’s time to close at this point and wish you luck. Thanks for the discussion!

Saying « fuck you » is helpful when the aim is to exclude whoever disagrees with your values. This is often instrumental in constructing a social group, or in getting accepted into a social group that includes high-status toxic characters. I take « be nice » as the claim that there are always better objectives.

This is aiming at a different problem than goal agnosticism; it's trying to come up with an agent that is reasonably safe in other ways.

Well, assuming a robust implementation, I still think it obeys your criteria, but now that you mention « restrictive », my understanding is that you want this expression to refer specifically to pure predictors. Correct?

If so, I’m not sure that’s the best choice for clarity (why not « pure predictors »?), but of course that’s your choice. If not, can you give some examples of goal agnostic agents other than pure predictors?

Goal agnosticism can, in principle, apply to things which are not pure predictors, and there are things which could reasonably be called predictors which are not goal agnostic. A subset of predictors are indeed the most powerful known goal agnostic systems. I can't currently point you toward another competitive goal agnostic system (rocks are uselessly goal agnostic), but the properties of goal agnosticism do, in concept, extend beyond predictors, so I leave the door open. Also, by using the term "goal agnosticism" I try to highlight the value that arises directly from the goal-related properties, like statistical passivity and the lack of instrumental representational obfuscation. I could just try to use the more limited and implementation specific "ideal predictors" I've used before, but in order to properly specify what I mean by an "ideal" predictor, I'd need to specify goal agnosticism.

You forgot to explain why these arguments only apply to strangers. Is there a reason to think medical research and economic incentives are better when it’s a family member who needs a kidney?

Nope, my social media presence is very, very low. But I’m open to suggestions, since I realized there are a lot of toxic characters with high status here. Did you try the EA forums? Is it better?

Hm... pretty similar here. I also don't have much of a media presence. I haven't tried EA forums yet, mainly because I consider myself intellectually more aligned with LW, but in any case I'm open to looking. This is looking to be a more personal conversation now. Would you like to continue in direct messages? Open to hearing your suggestions, I'm just as clueless right now. 

(The actual question is about your best utilitarian model, not your strategy given my model.)

A uniform distribution of kidney donation also sounds like the result when a donor is 10^19 times more likely to set the example. Maybe I should specify that the donor is unlikely to take the 1% risk unless someone else is more critical to the war effort.

Good laugh! But they’re also 10^19 times more likely to get the difference between donating one kidney and donating both.

Nope, but one of my sons suggests Discord.

Likely a good suggestion. I'm in a few communities myself. But then, I'm unsure if you're familiar with how discord works. Discord is primarily a messaging app with public server features tacked on. Not the sort of community for posts like this. Are you aware of any particular communities within discord I could join? The general platform has many communities, much like reddit, but I'm not aware of any similar to lesswrong. 

Thanks for organizing this, here’s the pseudocode for my entry.

Robot 1: Cooperate at first, then tit for tat for 42 rounds, then identify yourself by playing: [0, 1, 1, 0, 0, 0, 1, 1, 1, 1], then cooperate if the opponent did the same, otherwise defect.

Robot 2: Same as robot 1, ending with: … otherwise tit for tat

Robot 3 (secret): Same as Robot 1, with a secret identifying sequence and number of initial rounds (you pick, Isaac).
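Robot 1 above can be sketched in a few lines of Python. This is a hedged sketch only: the tournament's actual interface is not specified in the thread, so the `robot_1(my_history, opp_history)` signature and the 0 = cooperate / 1 = defect encoding are assumptions.

```python
# Sketch of Robot 1: cooperate first, tit for tat for 42 rounds, then
# broadcast a fixed identifying sequence; afterwards cooperate only with
# opponents that echoed the same sequence, defect against everyone else.
COOPERATE, DEFECT = 0, 1
ID_SEQUENCE = [0, 1, 1, 0, 0, 0, 1, 1, 1, 1]

def robot_1(my_history, opp_history):
    turn = len(my_history)
    if turn == 0:
        return COOPERATE                      # open by cooperating
    if turn <= 42:
        return opp_history[-1]                # tit for tat for 42 rounds
    if turn < 43 + len(ID_SEQUENCE):
        return ID_SEQUENCE[turn - 43]         # broadcast the ID sequence
    # Handshake over: cooperate only if the opponent echoed the sequence.
    opp_id = opp_history[43:43 + len(ID_SEQUENCE)]
    return COOPERATE if opp_id == ID_SEQUENCE else DEFECT
```

Robot 2 would differ only in the final line (tit for tat instead of unconditional defection), and Robot 3 in the sequence and round count.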

Isaac King, 17d

No problem with the loading here. The most important files seem to be « positive » and « pseudocode ». In brief, this seems to be an attempt to guess which algorithm the cerebellum implements, awaiting more input from neuroscientists and/or coders to implement and test the idea. Not user-friendly indeed. +1 for clarifications needed.

I waited until Friday so that you wouldn’t fall asleep at school because of me, but yes, I enjoyed both the style and the freshness of the ideas!

Look, I think you’re a young and promising opinion writer, but if you stay on LW I would expect you’ll get beaten by the cool kids (for lack of systematic engagement with both the spirit and the logical details of the answers you get). What about finding some place more about social visions and less about pure logic? Send me where and I’ll join for more about the strengths, and some pitfalls maybe.

Many thanks for the kind words, I appreciate it.  You're probably right. I mainly started on lesswrong because this is a community I'm familiar with, and a place I can expect to understand basic norms. (I've read the sequences and have some understanding of rationalist discourse). I'm unsure how I'd fare in other communities, but then, I haven't looked either. Are you familiar with any? I don't know myself. 

…but I thought the criterion was unconditional preference? The idea of nausea is precisely that agents can decide to act despite nausea; they’d just rather find a better solution (if their intelligence is up to the task).

I agree that « curiosity, period » seems highly vulnerable (you read Scott Alexander? He wrote a hilarious hit piece about this idea a few weeks or months ago). But I did not say curious, period. I said curious about what humans will freely choose next.

In other words, the idea is that it should prefer not to trick humans, because if it does... (read more)

Right; a preference being conditionally overwhelmed by other preferences does not make the presence of the overwhelmed preference conditional. Or to phrase it another way, suppose I don't like eating bread[1] (-1 utilons), but I do like eating cheese (100 utilons) and garlic (1000 utilons). You ask me to choose between garlic bread (1000 - 1 = 999 utilons) and cheese (100 utilons); I pick the garlic bread. The fact that I don't like bread isn't erased by the fact that I chose to eat garlic bread in this context. This is aiming at a different problem than goal agnosticism; it's trying to come up with an agent that is reasonably safe in other ways. In order for these kinds of bounds (curiosity, nausea) to work, they need to incorporate enough of the human intent behind the concepts. So perhaps there is an interpretation of those words that is helpful, but there remains the question "how do you get the AI to obey that interpretation," and even then, that interpretation doesn't fit the restrictive definition of goal agnosticism. The usefulness of strong goal agnostic systems (like ideal predictors) is that, while they do not have properties like those by default, they make it possible to incrementally implement those properties. 1. ^ utterly false for the record
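The utilon arithmetic above can be written out directly. This is a toy model only, using the illustrative values from the comment; nothing here is claimed about real agents.

```python
# Toy model: fixed (unconditional) per-item utilities, choices made by
# summing utilities over a bundle. The bread preference stays negative
# even when the agent chooses a bundle containing bread.
UTILITY = {"bread": -1, "cheese": 100, "garlic": 1000}

def bundle_utility(bundle):
    return sum(UTILITY[item] for item in bundle)

def choose(*bundles):
    return max(bundles, key=bundle_utility)

garlic_bread = ["garlic", "bread"]   # 1000 - 1 = 999 utilons
cheese = ["cheese"]                  # 100 utilons

# The agent picks garlic bread despite disliking bread: the preference
# is overwhelmed by the garlic term, not erased.
```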

As you might guess, it’s not obvious to me. Would you mind providing some details on these interpretations and on how you see the breakage happening?

Also, we’ve been going back and forth without feeling the need to upvote each other, which I thought was fine but turns out to be interpreted negatively. [To clarify: it seems to be one of the criteria here.] If those are your thoughts too, we can close at this point; otherwise let’s give each other some high fives. Your call, and thanks for the discussion in any case.

For example, a system that avoids experimenting on humans—even when prompted to do so otherwise—is expressing a preference about humans being experimented on by itself. Being meaningfully curious will also come along with some behavioral shift. If you tried to induce that behavior in a goal agnostic predictor through conditioning for being curious in that way and embed it in an agentic scaffold, it wouldn't be terribly surprising for it to, say, set up low-interference observation mechanisms. Not all violations of goal agnosticism necessarily yield doom, but even prosocial deviations from goal agnosticism are still deviations.

I suggest you use a photodetector to cross-check the frequency (11.9 Hz is easy using diodes, but [because of their frame rate] screens are much less compliant).

Good point. Also, I should stop hitting icons to read what they mean.

Perhaps it's nontrivial that humans were selected to value a lot of stuff

I prefer the reverse story: humans are tools in the hands of the angiosperms, and they’re still doing the job these plants selected them for: defending angiosperms at all costs. If a superAI destroys 100% of the humans along with 99% of life on Earth, they’ll call that the seed phase and chill in the new empty environment they would have made us clean for them.

As I point out in my AI pause essay:

Nitpick in there

I hope the reader will grant that the burden of proof is on those who advocate for such a moratorium. We should only advocate for such heavy-handed government action if it’s clear that the benefits of doing so would significantly outweigh the costs.

I find it hard to grant something that would have made our responses to pandemics or global warming even slower than they were. By the same reasoning, we would not have the Montreal Protocol, and UV levels would be a public concern.

Nora Belrose, 25d
If you want to discuss the other contents of my AI pause essay, it's probably best for you to comment over on the EA forum post, not here.

I agree that "IGF is the objective" is somewhat sloppy shorthand.

It’s used a lot in the comment sections. Do you know a better refutation than this post?

We know it depends on how the damage is distributed. Losing 1% of your neurons is enough to destroy your thalamus, which would look as if your brain were dead. But you can also lose much more without noticing anything, if the damage is sparse enough.

Thanks, that helps.

Yes; during training, a non-goal agnostic optimizer can produce a goal agnostic predictor.

Suppose an agent is made robustly curious about what humans will next choose when free from external pressures, and nauseous if its own actions could be interpreted as experimenting on humans or on its own code. Do you agree it would be a good candidate for goal agnosticism?

Probably not? It's tough to come up with an interpretation of those properties that wouldn't result in the kind of unconditional preferences that break goal agnosticism.

I'm unsure if this is helpful in the realm of aid specifically, but I believe it does provide ample evidence for my thesis and raises its coherence.

I update for stronger internal coherence and the ability to articulate clear and well-written stories. That was fun to read!

Now, I don’t have the same internal frame of reference when it comes to evaluating what counts as evidence. I can accept a good story as evidence, but only if I can evaluate its internal coherence against other good stories one might believe in. Let’s cook one up to see what I mean: « In a d... (read more)

Thanks for your reply! Yes, you're right, I realize I was rather thin on evidence for the link between institutional weakness and corruption. I believe this was a typical-mind fallacy on my end; I assumed the link was obvious. But since clearly it was not, allow me to go back and contextualize it. Disclaimer: it's late and I'm tired, prose quality will be lower than usual, and I'll be prone to some rather dry political jokes.

To understand the link between institutions and corruption, I think it's helpful just to use simple mental models. Consider this simple question: what causes corruption? The answer seems fairly straightforward. People are corrupt, they want money, etc. But clearly, this isn't everything. Humans in different countries coming from similar racial, social, and class backgrounds tend to be varying levels of 'corrupt', and even countries with similar backgrounds often have wildly varying corruption levels. Take North and South Korea, for one example. Both were unified states emerging from occupation post-WWII, but they took wildly different paths in their development as countries.

South Korea eventually transitioned from a military dictatorship to the free market democracy we know today. North Korea, however, remained a military dictatorship. This resulted in stark differences in corruption handling on both sides. In the global Corruption Perceptions Index, South Korea ranks 31st, while North Korea ranks an appalling 171st. Why was this?

The answer, I think, is institutions. South Korea, having developed a free market system and an accountable mode of governance, is able to check the power of its political and economic elites. If the president of North Korea decides he wants to abuse his power, the people have no recourse. If the president of South Korea decides to abuse their power, they end up in prison. (See the 7 Korean heads of state that ended up in jail, quite impressive for a 40-year period. We had 4 years with Trump and only managed a m

I am ultimately still presenting what I believe is a more controversial thesis

In my head I rephrased that thesis as « poor institutions and practices can totally impair efficiency », which I found about as unsurprising as a charity ad turning out to be not entirely accurate. So if you target readers who find this controversial, I may just not be the right reader for the feedback you seek.

Still, I spent some time thinking about: what could you do to make me update?

One way is to be more wary of unfairness. Instead of mere illustrations of failures when your thesis was ignored,... (read more)

Right, that makes sense, and it was part of the angle I was taking. When I said controversial I was mainly referring to the more general claim that aid tends to be ineffective in reducing long-term poverty, with few exceptions (the implication being that aid fails to address institutional issues). The idea that monetary resources play a small (or, as I argue, largely negligible) role in addressing long-term issues seems to me like it would be controversial to many EAs. But then, this is mostly semantic and hardly the main point. Let's get into the heart of the issue.

A very insightful question. I was initially a bit dubious myself. Where has my thesis been followed by aid organizations? Certainly I don't recall any charities focusing on reforming government institutions! But then, on second thought, that was almost the entire point. It wasn't aid programs reforming governments, but rather, people.

Consider all the wealthiest nations in the world. With few exceptions, the richest nations are the ones with strong institutions, particularly representative, democratic ones. Although there are exceptions, they tend to be few and far between (see Singapore with an authoritarian technocracy that's ruthlessly efficient, or Qatar with their absurd amounts of oil wealth). Meanwhile, nations with defunct or nonexistent institutions (see North Korea, the Congo, Mexico, South Africa) invariably face poverty and destitution on a mass scale. Even in China, one of the great economic success stories, we still see defunct institutional inheritances like the hukou system result in situations like 25% of the Chinese workforce being trapped in subsistence agriculture (compared to around 2% in the US, mostly industrial farmers).

In that sense, I believe I can answer your question about precision. I would liken institutions to force multipliers in the military sense. One soldier with a gun >>> one hundred soldiers with spears. In the same way, powerful institutions enhance the ab

it's a bit difficult for me to pinpoint where exactly the miscommunication occurred. Could you elaborate on that point?

I can speculate that the negative tone you got has something to do with misunderstanding your intent (Do you want to prove EA is doomed to fail? I don’t think so, but that’s one way to read the title), but in truth I can’t exclude gatekeeping, nor speak for the LW team.

I'm disinclined to create the thesis

Ok then, this was more of a clarification question (Is this your thesis in one sentence, or you feel that’s a different thesis? A different one.). Thanks for the thorough answer with pointers, I’ll have a look with pleasure.

Hm... right. That would make sense. I think I can see how people might misread that. No, I had no intention of doing anything like that. I was trying to address the shortcomings of charity, particularly in the realm of structural and institutional rot (and the other myriad causes of poverty). EA charity faces many of the same issues in this regard, but 'doomed to fail' is hardly the point I would like to make. (If anything, I try my best to advocate the opposite by making a donation myself.) I was merely trying to point out that foreign/charity aid in general cannot hope to solve ingrained root causes of poverty without substantial, unsustainably large investments.

Part of the issue may have to do with my writing style. I aim for emotionally evocative, powerful posts. I find that this is a good way to get people engaged and generate discussion. (It also tends to be more fulfilling to write.) This seems to have gotten in the way of clarity. Given the weight and circumstances of the subject matter (millions of people living in misery), I thought it was more than appropriate to amp up my language. Of course, this is still no excuse for being unclear. I should probably re-examine my diction.

That said, do you think I could change the title or edit in a disclaimer? The title itself was largely a stylistic choice; while I certainly could've said 'aid to the poor has certain practical limitations', I feel like that's hardly interesting nor conducive to sparking a discussion. I am ultimately still presenting what I believe is a more controversial thesis, and I thought my title should reflect that.

I like the way your text raises expectations for one conclusion and then presents your actual thoughts (that none of these points overcomes the danger of overestimating them).

However this is a sensitive topic on LW, so maybe a good precaution would be to clarify upfront that you’ll present a series of typical past failures rather than a logical case for why altruism can’t be efficient.

As an example, I was frustrated when the experts who based their approach on scientific research turned out not to know the local market. How is that based on science? But ok, no... (read more)

You're right, that's probably a good idea. I considered a more comprehensive disclaimer myself when writing, but then opted against it when I realized it was likely to weaken the main point of my post. Even though this post is constrained by being only able to consider past failures (I have no information about the future), the past is still a very strong predictor of how the future turns out. Based on analysis of the past, I'm still inclined to believe that foreign/charity aid by itself is insufficient to solve the root causes of poverty. I don't believe altruism is inefficient per se, or that the problem of poverty can't be solved (of course it has been). I am merely seeking to rebut the claim that I have seen in the fundraiser video, which seems to imply that charity aid alone would be enough to eradicate poverty (a claim which I find wildly overblown). I find action worthwhile, but I also find that the predicted results don't seem to meet the scale claimed by many people. I thought I was clear enough in this regard.

But then, after a few recent comments I'm inclined to believe I definitely misrepresented my case somewhere, or maybe my language was unclear. However, it's a bit difficult for me to pinpoint where exactly the miscommunication occurred. Could you elaborate on that point? This is my article, so it's quite difficult to see where I goofed up. This is all in my head, so it's quite obvious to me. If you could provide a disclaimer text for me I would be immensely grateful.

Regarding another working hypothesis, I believe I could do that, but I'm unsure if I'll have the time or will to amass the relevant evidence. Proving the negation of a claim is much easier than proving a claim. Institutional economics (a school of thought which believes institutions to be the greatest determinant of economic success) is already the position of many thinkers, and if I were to defend it I would also have to research competing economic theories and many more real world

An agent can have unconditional preferences over world states that are already fulfilled. A maximizer doesn't stop being a maximizer if it's maximizing.

Well said! In my view, if we fed a good enough maximizer the goal of learning to look as if it were a unified goal agnostic agent, then I’d expect the behavior of the resulting algorithm to handle the paradox well enough that it’ll make sense.

If the question is whether the thermostat's full system is goal agnostic, I suppose it is, but in an uninteresting way.

I beg to differ. In my view our vol... (read more)

If you successfully gave a strong maximizer the goal of maximizing a goal agnostic utility function, yes, you could then draw a box around the resulting system and correctly call it goal agnostic. Composing multiple goal agnostic systems into a new system, or just giving a single goal agnostic system some trivial scaffolding, does not necessarily yield goal agnosticism in the new system. It won't necessarily eliminate it, either; it depends on what the resulting system is. Yes; during training, a non-goal agnostic optimizer can produce a goal agnostic predictor.

the way I was using that word would imply some kind of preference over external world states.

It’s 100% ok to have your own set of useful definitions; I’m just trying to understand it. In this very sense, one cannot want an external world state that is already in place, correct?

it's at least slightly unintuitive to me to describe a system as "wanting X" in a way that is not distinct from "being X,"

Let’s say we want to maximize the number of digits of pi we explicitly know. You could say being continuously curious about the next digits is a continuous st... (read more)

An agent can have unconditional preferences over world states that are already fulfilled. A maximizer doesn't stop being a maximizer if it's maximizing. That's definitely a goal, and I'd describe an agent with that goal as both "wanting" in the previous sense and not goal agnostic. If the thermostat is describable as goal agnostic, then I wouldn't say it's "wanting" by my previous definition. If the question is whether the thermostat's full system is goal agnostic, I suppose it is, but in an uninteresting way. (Note that if we draw the agent-box around 'thermostat with temperature set to 72' rather than just 'thermostat' alone, it is not goal agnostic anymore. Conditioning a goal agnostic agent can produce non-goal agnostic agents.)

A model that "wants" to be goal agnostic such that its behavior is goal agnostic can't be described as "wanting"

Ok, I did not expect you were using a tautology there. I’m not sure I get how to use it. Would you say a thermostat can’t be described as wanting because it’s being goal agnostic?

If you were using "wanting" the way I was using the word in the previous post, then yes, it would be wrong to describe a goal agnostic system as "wanting" something, because the way I was using that word would imply some kind of preference over external world states. I have no particular ownership over the definition of "wanting" and people are free to use words however they'd like, but it's at least slightly unintuitive to me to describe a system as "wanting X" in a way that is not distinct from "being X," hence my usage. 

You're right, I had kind of muddled thinking on that particular point.

That was not my thought (I consider interactive clarifications one of our most powerful tools, and pressure to produce perfect texts counterproductive), but…

I should lose a lot of Bayes points if this technology is still 10 years away (…) If this tech didn't become a feature of slow takeoff then I would lose even more Bayes points.

…I appreciate the concrete predictions very much, thanks!

I actually do think that the invention and deployment of this tech is heavily weighted

... (read more)

I don’t get if that’s your estimate for AI safety being attacked or for AI [safety] being destroyed by 2033. If that’s the former, what would count as an attack? If that’s the latter, what would you count as strong evidence that your estimate was wrong? (assuming you agree that AI safety still existing in 2033 counts as weak evidence given your ~30% prior).

You're right, I had kind of muddled thinking on that particular point. My thinking was that they would try to destroy or damage AI safety and the usual tactics would not work because AI safety is too weird, motivated, and rational (although they probably would not have a hard time measuring motivation sufficient to detect that it is much higher than normal interest groups). I tend to think of MIRI as an org that they can't pull the rug out from under because it's hardened, e.g. it will survive and function in some form even if everyone else in the AI safety community is manipulated by gradient descent into hating MIRI, but realistically Openphil is probably much more hardened. It's also hard to resolve because this tech is ubiquitous, so maybe millions of people get messed with somehow (e.g. deliberately hooked on social media 3 hours per day). What AI safety looks like after being decimated would probably be very hard to picture; for steelmanning purposes I will say that the 30% would apply to being heavily and repeatedly attacked and significantly damaged, well beyond the FTX crisis and the EAforum shitposts. Frankly, I think I should lose a lot of Bayes points if this technology is still 10 years away. I know what I said in the epistemic status section, but I actually do think that the invention and deployment of this tech is heavily weighted towards the late 2010s and during COVID. If this tech didn't become a feature of slow takeoff then I would lose even more Bayes points.

Thanks for your patience and clarifications.

The observable difference between the two is the presence of instrumental behavior towards whatever goals it has.

Say again? On my left, an agent that "just is goal agnostic". On my right, an agent that "just wants to be goal agnostic". At first both are still: the first because it is goal agnostic, the second because they want to look as if they were goal agnostic. Then I ask something. The first responds because they don’t mind doing what I ask. The second responds because they want to look as if they don’t mind doing what I ask. Where’s the observable difference?

If you have a model that "wants" to be goal agnostic in a way that means it behaves in a goal agnostic way in all circumstances, it is goal agnostic. It never exhibits any instrumental behavior arising from unconditional preferences over external world states. For the purposes of goal agnosticism, that form of "wanting" is an implementation detail. The definition places no requirement on how the goal agnostic behavior is achieved. In other words: A model that "wants" to be goal agnostic such that its behavior is goal agnostic can't be described as "wanting" to be goal agnostic in terms of its utility function; there will be no meaningful additional terms for "being goal agnostic," just the consequences of being goal agnostic. As a result of how I was using the words, the fact that there is an observable difference between "being" and "wanting to be" is pretty much tautological.
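The "implementation detail" point can be sketched in a few lines (hypothetical names, a deliberately toy setup): two systems, one that simply is conditional, and one whose internals "want" to stay that way, yet whose observable behavior never differs.

```python
# Toy sketch of "wanting" as an unobservable implementation detail.
# All names are hypothetical illustrations, not anyone's actual model.

def is_agnostic(prompt):
    """Simply responds to the condition; no extra machinery."""
    return f"completion of {prompt!r}"

def wants_agnostic(prompt):
    """Internally 'prefers' goal agnostic behavior, but that preference
    only ever produces the same conditional responses."""
    prefers_agnosticism = True  # internal state, never observable
    if prefers_agnosticism:
        return f"completion of {prompt!r}"

# From the outside, the two are indistinguishable on every input,
# so the "wanting" contributes nothing to a utility-function description.
for p in ["72 degrees", "digits of pi"]:
    assert is_agnostic(p) == wants_agnostic(p)
print("no observable difference")
```

If the internal flag ever changed the outputs, that divergence would be exactly the instrumental behavior that disqualifies the system from goal agnosticism.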

I trust my impression here is because I have information

Then I should update towards epigenetics not being supported by evidence. And also about my chances of posting nasty and arrogant things when my medication changes. Sorry about that.

However, I have a question about the large or small number of bits.

Suppose Musk offers you a private island with a colony of hominids – the kind raw enough that they haven't yet invented cooking with fire. Then he insists very hard that you introduce strong sexual selection, which leads to one of those big monkeys inventing parading in fron... (read more)

The reason I trust my impression here is that I have information giving me good reason to suspect that epigenetics in general is basically a p-hacked field, where the results are pure noise and indicate that epigenetics probably can't work. So yes, I'm skeptical of epigenetics being a viable way to transmit information across generations, or really of epigenetics being useful at all.

Evolution mostly can't transmit any bits from one generation to the next generation via genetic knowledge, or really any other way

My first impression, skimming through, is that it's arguing that abuse by parents can negatively affect a child, that stress can have both positive and negative effects, and that individual responses to stress determine the balance of positive to negative effects. Two things I want to point out:

1. I think the conclusions from this study are almost certainly extremely limited, and I wouldn't trust these results to generalize to other species like us.

2. I expect the results, insofar as they are real and generalizable, to be essentially that the genome can influence things later in life via indirect methods, but mostly can't directly specify them by hardcoding them or baking them in directly as prior information. The transfer seems very limited, and critically the timescale is likely evolutionary, which is far, far slower than human within-lifetime learning, and certainly not as many bits as cultural evolution can give in a much shorter timeframe.

I will edit the post to change "any" to "as many bits as cultural evolution", and edit it further to say what I really meant here.

Viewed in isolation, the optimizer responsible for training the model isn't goal agnostic because it can be described as having preferences over external world state (the model).

This is where I am lost. In this scenario, it seems that we could describe both the model and the optimizer as either having an unconditional preference for goal agnosticism, or both as having preferences over the state of external worlds (to include goal agnostic models). I don't understand what axiom or reasoning leads to treating these two things differently.

The resulting per

... (read more)
The difference is subtle but important, in the same way that an agent that "performs bayesian inference" is different from an agent that "wants to perform bayesian inference." A goal agnostic model does not want to be goal agnostic; it just is. If the model is describable as wanting to be goal agnostic, in terms of a utility function, it is not goal agnostic. The observable difference between the two is the presence of instrumental behavior towards whatever goals it has. A model that "wants to perform bayesian inference" might, say, maximize the amount of inference it can do, which (in the pathological limit) eats the universe. A model that wants to be goal agnostic has fewer paths to absurd outcomes, since self-modifying to be goal agnostic is a more local process that doesn't require eating the universe, and it may have other values that suggest eating the universe is bad; but it's still not immediately goal agnostic.

"Agent" doesn't have a constant definition across all contexts, but it can be valid to describe a goal agnostic system as a rational agent in the VNM sense. Taking the "ideal predictor" as an example, it has a utility function that it maximizes. In the limit, it very likely represents a strong optimizing process. It just so happens that the goal agnostic utility function does not directly imply maximization with respect to external world states, and does not take instrumental actions that route through external world states (unless the system is conditioned into an agent that is not goal agnostic).
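The performs/wants distinction can be sketched with a toy Beta-Bernoulli coin model (hypothetical names, a deliberate simplification): one system does Bayesian inference on the data it is handed and stops; the other treats "doing inference" as a goal and so takes an instrumental step to get more of it.

```python
# Toy sketch of "performs bayesian inference" vs "wants to perform it".
# All names are hypothetical; the inference itself is a standard
# conjugate Beta-Bernoulli update for a coin's bias.

def update(alpha, beta, observation):
    """One Bayesian update: increment the matching Beta pseudo-count."""
    return (alpha + 1, beta) if observation else (alpha, beta + 1)

def performs_inference(data):
    """Updates on what it is given, then simply stops."""
    alpha, beta = 1, 1  # uniform prior
    for obs in data:
        alpha, beta = update(alpha, beta, obs)
    return alpha, beta

def wants_inference(data):
    """Same updates, plus an instrumental action: since its goal is to
    do more inference, it acts to acquire more observations."""
    alpha, beta = 1, 1
    for obs in data:
        alpha, beta = update(alpha, beta, obs)
    return (alpha, beta), "request: more data, more compute, ..."

print(performs_inference([1, 1, 0]))   # (3, 2)
print(wants_inference([1, 1, 0])[1])
```

The posterior is identical in both cases; the difference an observer sees is only the extra instrumental behavior of the second system.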

Your Fermi estimate starts from the women you’ve met, but your conclusion is about women in general, who might present widely different characteristics.

According to Gallup polls, about 46% of Americans are creationists. (…) That’s half the country. And I don’t have a single one of those people in my social circle.

For the evolution of human intelligence, the optimizer is just evolution: biological natural selection.

Really? Would your argument change if we could demonstrate a key role for sexual selection, primate wars or the invention of cooking over fire?

the process itself, being an optimizer over world states, is not goal agnostic either.

That’s the crux, I think: I don’t get why you reject (programmable) learning processes as goal agnostic.

you must be unable to describe me as having unconditional preferences over world states for me to be goal agnostic.

Let’s say I clone you_genes a few billion times, each time twisting your environment and education until I’m statistically happy with the recipe. What unconditional preferences would you expect to remain?

Let’s say you_adult are actually a digital brai... (read more)

It's important to draw a box around the specific agent under consideration. Suppose I train a model with predictive loss such that the model is goal agnostic. Three things can be simultaneously true:

1. Viewed in isolation, the optimizer responsible for training the model isn't goal agnostic, because it can be described as having preferences over external world state (the model).

2. The model is goal agnostic because it meets the stated requirements (and is asserted by the hypothetical).

3. A simulacrum arising from sequences predicted by that goal agnostic predictor, when conditioned to predict non-goal agnostic behavior, is not goal agnostic.

The resulting person would still be human, and presumably not goal agnostic as a result. A simulacrum produced by an ideal goal agnostic predictor that is conditioned to reproduce the behavior of that human would also not be goal agnostic. The fact that those preferences arose conditionally based on your selection process isn't relevant to whether the person is goal agnostic. The relevant kind of conditionality is within the agent under consideration. No; "I" still have preferences over world states. They're just being overridden. Bumping up a level and drawing the box around the unpleasant boss and myself combined, still no, because the system expresses my preferences filtered by my boss's preferences. Some behavior being conditional isn't sufficient for goal agnosticism; there must be no way to describe the agent under consideration as having unconditional preferences over external world states.
Answer by Ilio, Oct 11, 2023

Dating people with disabilities, neurodivergence, low IQs, mental health issues, and/or your brother-in-law.

I’m glad you see it that way. How would you challenge an interpretation of your axioms under which the best answer is that we don’t need to change anything at all?

  • random sampling of its behavior has negligible[3] probability of being a dangerously capable optimizing process with incorrigible preferences over external world states.[4]

That sounds true for natural selection (for most of Earth's history we were stuck with unicellular life; for most of vertebrate history we were stuck with the smallest brain-for-body-size possible), children (if we could secretly switch a pair... (read more)

Salvaging the last paragraph of my previous post is pretty difficult. The "it" in "you could call it goal agnostic" was referring to the evolved creature, not natural selection, but the "conditionally required ... specific mutations" would not actually serve to imply goal agnosticism for the creature. I was trying to describe a form of natural selection equivalent to a kind of predictive training but messed it up. Trying to model natural selection, RL-based training systems, or predictive training systems as agents gets squinty, but all of them could be reasonably described as having "preferences" over the subjects of optimization. They're all explicitly optimizers over external states; they don't meet the goal agnostic criteria. Some types of predictive training seem to produce goal agnostic systems, but the optimization process is not itself a goal agnostic system. Regarding humans, I'm comfortable just asserting that we're not goal agnostic. I definitely have preferences over world states. You could describe me with a conditionalized utility function, but that's not sufficient for goal agnosticism; you must be unable to describe me as having unconditional preferences over world states for me to be goal agnostic. Dogs are probably not paperclip maximizers, but dogs seem to have preferences over world states, so that process doesn't produce goal agnostic agents. And the process itself, being an optimizer over world states, is not goal agnostic either.

Great material! Though maybe a better name would be « goal curious ».

In your view, what are the smallest changes that would make {natural selection; children; companies} goal agnostic?

That's tough to answer. There's not really a way to make children goal agnostic; humans aren't that kind of thing. In principle, maybe you could construct a very odd corporate entity that is interfaced with like a conditioned predictor, but it strains the question. It's easier to discuss natural selection in this context by viewing natural selection as the outer optimizer. It's reinforcement learning with a sparse and distant reward. Accordingly, the space of things that could be produced by natural selection is extremely wide. It's not surprising that humans are not machines that monomaniacally maximize inclusive genetic fitness; the optimization process was not so constraining. There's no way to implement this, but if "natural selection" somehow conditionally required that only specific mutations could be propagated through reproduction, and if there were only negligibly probable paths by which any "evolved" creature could be a potentially risky optimizer, then you could call it goal agnostic. It'd be a pretty useless form of goal agnosticism, though; nothing about that system makes it easy for you to aim it at anything.

It feels way too easy to flip the sign:

« I think the orthogonality thesis is wrong. For instance, without rejecting the orthogonality thesis, one might think we should stop constructing AGI!

You might think this is stupid, but some significant people believe it. Clearly the orthogonality thesis is nontrivially confusing for cases like this. »

I think the sign flip would be: if Nora Belrose wants to make that argument, then she can just do that. Thanks to Eliezer Y. pushing the orthogonality thesis in rationalist circles, I don't think anyone wants to make that argument, and that's why I didn't address it but instead just showed how it used to be believed.

Same quote, emphasis on the basic question.

What’s wrong with « Left and right limbs come out basically the same size because it’s the same construction plan. »?

Thomas Kwa (2mo):
A sufficiently mechanistic answer lets us engineer useful things, e.g. constructing an animal with left limbs 2 inches longer than its right limbs.
Oops. Yeah, I forgot to address that one. I was just astonished to hear that no one knows the answer.

I feel the OP is best thought of as a placebo pump rather than mechanistic one-size-fits-all advice. It might not be the best for you if it lacks some key ingredient you need. Or it might work iff you first create a fictional character that you can hold responsible for many of your problems (« Moloch did this! »), then allow yourself to find the >1% of the time where you did successfully overcome the bastard, and then you can climb.
