Fascinating paper! I wonder how much they would agree that holography means sparse tensors and convolution, or that intuitive versus reflexive thinking basically amounts to the visuo-spatial sketchpad versus the phonological loop. Can’t wait to hear which other ideas you’d like to import from this line of thought.
I have no idea whether or not Hassabis is himself dismissive of that work
Well that’s a problem, don’t you think?
but many are.
Yes, as a cognitive neuroscientist myself, you’re right that many within my generation tend to dismiss symbolic approaches. We were students during a winter that many of us thought was caused by the over-promising and under-delivering of the symbolic approach, with Minsky seen as the main reason for the slow start of neural networks. I bet you have a different perspective. What are your three best points for changing my generation’s view?
Because I agree, and because « strangely » sounds to me like « with inconsistencies ».
In other words, in my view the orthodox view on orthogonality is problematic, because it supposes that we can pick at will within the enormous space of possible functions, whereas the set of intelligent behaviors we can actually construct is more likely sparse and, by default, describable using game theory (think tit for tat).
This is a sort of positive nihilism. Because value is not inherent in the physical world, you can assign value to whatever you want, with no inconsistency.
Say we construct a strong AI that attributes a lot of value to a specific white noise screenshot. How would you expect it to behave?
Your point is « Good AIs should have a working memory, a concept that comes from psychology ».
DH’s point is « Good AIs should have a working memory, and the way to implement it was based on concepts taken from neuroscience ».
Those are indeed orthogonal notions, if you will.
I’m a bit annoyed that Hassabis is giving neuroscience credit for the idea of episodic memory.
That’s not my understanding. To me he is giving neuroscience credit for the ideas that made it possible to implement a working memory in LLMs. I guess he didn’t want to use words like thalamocortical, but from a neuroscience point of view transformers do look inspired by the isocortex, e.g. by the idea that a general distributed architecture can process any kind of information relevant to a human cognitive architecture.
I’d be happy if you could point out a non-competitive one, or explain why my proposal above does not obey your axioms. But we seem to be getting diminishing returns in sorting these questions out, so maybe it’s time to close at this point and wish you luck. Thanks for the discussion!
Saying fuck you is helpful when the aim is to exclude whoever disagrees with your values. This is often instrumental in constructing a social group, or in getting accepted into a social group that includes high-status toxic characters. I take « be nice » as the claim that there are always better objectives.
This is aiming at a different problem than goal agnosticism; it's trying to come up with an agent that is reasonably safe in other ways.
Well, assuming a robust implementation, I still think it obeys your criteria, but now that you mention « restrictive », my understanding is that you want this expression to refer specifically to pure predictors. Correct?
If yes, I’m not sure that’s the best choice for clarity (why not « pure predictors »?) but of course that’s your choice. If not, can you give some examples of goal agnostic agents other than pure predictors?
You forgot to explain why these arguments only apply to strangers. Is there a reason to think medical research and economic incentives are better when it’s a family member who needs a kidney?
Nope, my social media presence is very, very low. But I’m open to suggestions, since I realized there are a lot of toxic characters with high status here. Did you try the EA Forum? Is it better?
(The actual question is about your best utilitarian model, not your strategy given my model.)
A uniform distribution of kidney donation also sounds like the result when a donor is 10^19 times more likely to set the example. Maybe I should specify that the donor is unlikely to take the 1% risk unless someone else is more critical to the war effort.
Good laugh! But they’re also 10^19 times more likely to get the difference between donating one kidney and donating both.
Thanks for organizing this, here’s the pseudocode for my entry.
Robot 1: Cooperate at first, then tit for tat for 42 rounds, then identify yourself by playing: [0, 1, 1, 0, 0, 0, 1, 1, 1, 1], then cooperate if the opponent did the same, otherwise defect.
Robot 2: Same as robot 1, ending with: … otherwise tit for tat
Robot 3 (secret): Same as robot 1, with a secret identifying sequence and number of initial rounds (you pick them, Isaac).
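For concreteness, here is a minimal Python sketch of Robot 1 (Robot 2 differs only in the final line), assuming a hypothetical per-round interface where the bot receives both players’ full move histories as lists, with 1 = cooperate and 0 = defect; adapt freely to whatever interface the tournament actually uses.

```python
# Hypothetical interface: called once per round with both full histories.
# Encoding assumption: 1 = cooperate, 0 = defect.

ID_SEQUENCE = [0, 1, 1, 0, 0, 0, 1, 1, 1, 1]  # public identifying handshake
TIT_FOR_TAT_ROUNDS = 42                       # tit-for-tat phase after the opening move


def robot_1(my_history: list[int], opp_history: list[int]) -> int:
    round_nb = len(my_history)
    handshake_start = 1 + TIT_FOR_TAT_ROUNDS            # opening move, then 42 rounds of tit for tat
    handshake_end = handshake_start + len(ID_SEQUENCE)

    if round_nb == 0:
        return 1                                         # cooperate on the first round
    if round_nb < handshake_start:
        return opp_history[-1]                           # tit for tat: copy opponent's last move
    if round_nb < handshake_end:
        return ID_SEQUENCE[round_nb - handshake_start]   # play the identifying sequence
    # After the handshake: cooperate iff the opponent played the same sequence.
    if opp_history[handshake_start:handshake_end] == ID_SEQUENCE:
        return 1
    return 0  # Robot 2 would instead return opp_history[-1] (tit for tat)
```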
No problem with the loading here. The most important files seem to be « positive » and « pseudocode ». In brief, this seems to be an attempt to guess which algorithm the cerebellum implements, waiting for more input from neuroscientists and/or coders to implement and test the idea. Not user friendly indeed. +1 for clarifications needed.
I waited until Friday so that you wouldn’t fall asleep at school because of me, but yes, I enjoyed both the style and the freshness of ideas!
Look, I think you’re a young and promising opinion writer, but if you stay on LW I would expect you to get beaten up by the cool kids (for lack of systematic engagement with both the spirit and the logical details of the answers you get). What about finding some place that’s more about social visions and less about pure logic? Tell me where and I’ll join you for more about the strengths, and maybe some of the pitfalls.
…but I thought the criterion was unconditional preference? The idea of nausea is there precisely because agents can decide to act despite nausea; they’d just rather find a better solution (if their intelligence is up to the task).
I agree that « curiosity, period » seems highly vulnerable (Do you read Scott Alexander? He wrote a hilarious hit piece about this idea a few weeks or months ago). But I did not say « curious, period ». I said curious about what humans will freely choose next.
In other words, the idea is that it should prefer not to trick humans, because if it does...
As you might guess, it’s not obvious to me. Would you mind providing some details on these interpretations and how you see the breakage happening?
Also, we’ve been going back and forth without feeling the need to upvote each other, which I thought was fine but turns out to be interpreted negatively. [To clarify: it seems to be one of the criteria here: https://www.lesswrong.com/posts/hHyYph9CcYfdnoC5j/automatic-rate-limiting-on-lesswrong] If those are your thoughts too, we can close at this point; otherwise let’s give each other some high fives. Your call, and thanks for the discussion in any case.
I suggest you use a photodetector to cross-check the frequency (11.9 Hz is easy using diodes, but [because of their frame rate] screens are much less compliant).
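To illustrate what I mean by « much less compliant », here is a small sketch (assuming, for the example, a symmetric on/off flicker where each half-cycle must last a whole number of frames, and a few common refresh rates) of the closest flicker frequency a screen can actually display:

```python
# Which flicker frequencies can a fixed-refresh screen actually produce?
# Assumption: symmetric on/off flicker, so each half-cycle lasts an integer
# number of frames and achievable frequencies are refresh / (2 * n).

def achievable_flicker(refresh_hz: float, max_half_frames: int = 20) -> list[float]:
    return [refresh_hz / (2 * n) for n in range(1, max_half_frames + 1)]

target = 11.9  # Hz
for refresh in (60.0, 120.0, 144.0):
    closest = min(achievable_flicker(refresh), key=lambda f: abs(f - target))
    print(f"{refresh:5.1f} Hz screen: closest flicker to {target} Hz is {closest:.2f} Hz")
```

On a 60 Hz screen the nearest options are 15 or 10 Hz, which is why a diode driven directly (and checked with the photodetector) is the safer bet.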
Perhaps it's nontrivial that humans were selected to value a lot of stuff
I prefer the reverse story: humans are tools in the hands of the angiosperms, and they’re still doing the job these plants selected them for: they defend angiosperms at all costs. If superAIs destroy 100% of the humans along with 99% of life on Earth, the angiosperms will call that the seed phase and chill in the new empty environment they would have made us clean for them.
As I point out in my AI pause essay:
Nitpick in there:
I hope the reader will grant that the burden of proof is on those who advocate for such a moratorium. We should only advocate for such heavy-handed government action if it’s clear that the benefits of doing so would significantly outweigh the costs.
I find it hard to grant something that would have made our responses to pandemics or global warming even slower than they are. By the same reasoning, we would not have the Montreal Protocol and UV levels would be a public concern.
I agree that "IGF is the objective" is somewhat sloppy shorthand.
It’s used a lot in the comment sections. Do you know a better refutation than this post?
We know it depends on how the damage is distributed. Losing 1% of your neurons is enough to destroy your thalamus, which would look as if your brain were dead. But you can also lose much more without noticing anything, if the damage is sparse enough.
Thanks, that helps.
Yes; during training, a non-goal agnostic optimizer can produce a goal agnostic predictor.
Suppose an agent is made robustly curious about what humans will next choose when free from external pressures, and nauseous if its own actions could be interpreted as experimenting on humans or on its own code. Do you agree it would be a good candidate for goal agnosticism?
I'm unsure if this is helpful in the realm of aid specifically, but I believe it does provide ample evidence for my thesis and raise it's coherency.
I update toward stronger internal coherency and an ability to articulate clear, well-written stories. That was fun to read!
Now, I don’t have the same internal frame of reference when it comes to evaluating what counts as evidence. I can accept a good story as evidence, but only if I can evaluate its internal coherency against other good stories one might believe in. Let’s cook one up to see what I mean: « In a d...
I am ultimately still presenting what I believe is a more controversial thesis
In my head I rephrased that thesis as « poor institutions and practices can totally impair efficiency », which I found about as unsurprising as a charity ad turning out to be not entirely accurate. So if you target readers who find this controversial, I may just not be the right reader for the feedback you seek.
Still, I spent some time thinking about: what could you do to make me update?
One way is to be more wary of unfairness. Instead of a mere illustration of failures when your thesis was ignored,...
it's a bit difficult for me to pinpoint where exactly the miscommunication occurred. Could you elaborate on that point?
I can speculate that the negative tone you got has something to do with a misunderstanding of your intent (Do you want to prove EA is doomed to fail? I don’t think so, but that’s one way to read the title.), but in truth I can’t rule out gatekeeping, nor speak for the LW team.
I'm disinclined to create the thesis
Ok then, this was more of a clarification question (Is this your thesis in one sentence, or do you feel that’s a different thesis? A different one, then.). Thanks for the thorough answer with pointers; I’ll have a look with pleasure.
I like the way your text raises expectations for one conclusion and then presents your actual thoughts (that none of these points overcomes the danger of overestimating them).
However, this is a sensitive topic on LW, so maybe a good precaution would be to clarify upfront that you’ll present a series of typical past failures rather than a logical case for why altruism can’t be efficient.
As an example, I was frustrated when the experts who based their approach on scientific research turned out not to know the local market. How is that based on science? But ok, no...
An agent can have unconditional preferences over world states that are already fulfilled. A maximizer doesn't stop being a maximizer if it's maximizing.
Well said! In my view, if we fed a good enough maximizer the goal of learning to look as if it were a unified goal agnostic agent, then I’d expect the behavior of the resulting algorithm to handle the paradox well enough to make sense.
If the question is whether the thermostat's full system is goal agnostic, I suppose it is, but in an uninteresting way.
I beg to differ. In my view our vol...
the way I was using that word would imply some kind of preference over external world states.
It’s 100% ok to have your own set of useful definitions; I’m just trying to understand yours. In this very sense, one cannot want an external world state that is already in place, correct?
it's at least slightly unintuitive to me to describe a system as "wanting X" in a way that is not distinct from "being X,"
Let’s say we want to maximize the number of digits of pi we explicitly know. You could say being continuously curious about the next digits is a continuous st...
A model that "wants" to be goal agnostic such that its behavior is goal agnostic can't be described as "wanting"
Ok, I did not realize you were using a tautology there. I’m not sure I get how to use it. Would you say a thermostat can’t be described as wanting because it’s being goal agnostic?
You're right, I had kind of muddled thinking on that particular point.
That was not my thought (I consider interactive clarification one of our most powerful tools, and pressure to produce perfect texts counterproductive), but…
I should lose a lot of Bayes points if this technology is still 10 years away (…) If this tech didn't become a feature of slow takeoff then I would lose even more Bayes points.
…I appreciate the concrete predictions very much, thanks!
...I actually do think that the invention and deployment of this tech is heavily weighted
I don’t get whether that’s your estimate for AI safety being attacked or for AI [safety] being destroyed by 2033. If it’s the former, what would count as an attack? If it’s the latter, what would you count as strong evidence that your estimate was wrong? (Assuming you agree that AI safety still existing in 2033 only counts as weak evidence, given your ~30% prior.)
Thanks for your patience and clarifications.
The observable difference between the two is the presence of instrumental behavior towards whatever goals it has.
Say again? On my left, an agent that "just is goal agnostic". On my right, an agent that "just wants to be goal agnostic". At first both sit still: the first because it is goal agnostic, the second because it wants to look as if it were goal agnostic. Then I ask for something. The first responds because it doesn’t mind doing what I ask. The second responds because it wants to look as if it doesn’t mind doing what I ask. Where’s the observable difference?
I trust my impression here is because I have information
Then I should update toward epigenetics not being supported by evidence. And also on my chances of posting something nasty and arrogant when my medication changes. Sorry about that.
However, I have a question about the large or small number of bits.
Suppose Musk offers you a private island with a colony of hominids – the kind raw enough that they haven't yet invented cooking with fire. Then he insists very hard that you introduce strong sexual selection, which led to one of those big monkeys inventing parading in fron...
Evolution mostly can't transmit any bits from one generation to the next generation via genetic knowledge, or really any other way
http://allmanlab.caltech.edu/biCNS217_2008/PDFs/Meaney2001.pdf
Viewed in isolation, the optimizer responsible for training the model isn't goal agnostic because it can be described as having preferences over external world state (the model).
This is where I am lost. In this scenario, it seems that we could describe both the model and the optimizer either as having an unconditional preference for goal agnosticism, or both as having preferences over the state of external worlds (including goal agnostic models). I don't understand what axiom or reasoning leads to treating these two things differently.
...The resulting per
Your Fermi estimate starts from the women you’ve met, but your conclusion is about women in general, who might present widely different characteristics.
https://slatestarcodex.com/2014/09/30/i-can-tolerate-anything-except-the-outgroup/
According to Gallup polls, about 46% of Americans are creationists. (…) That’s half the country. And I don’t have a single one of those people in my social circle.
For the evolution of human intelligence, the optimizer is just evolution: biological natural selection.
Really? Would your argument change if we could demonstrate a key role for sexual selection, primate wars or the invention of cooking over fire?
the process itself, being an optimizer over world states, is not goal agnostic either.
That’s the crux I think: I don’t get why you reject (programmable) learning processes as goal agnostic.
you must be unable to describe me as having unconditional preferences over world states for me to be goal agnostic.
Let’s say I clone you_genes a few billion times, each time twisting your environment and education until I’m statistically happy with the recipe. What unconditional preferences would you expect to remain?
Let’s say you_adult are actually a digital brai...
Dating people with disabilities, neurodivergence, low IQs, mental health issues, and/or your brother-in-law.
I’m glad you see it that way. How would you challenge an interpretation of your axioms under which the best answer is that we don’t need to change anything at all?
That sounds true for natural selection (for most of Earth’s history we were stuck with unicellular organisms, and for most of vertebrate history we were stuck with the smallest brain-for-body-size possible), children (if we could secretly switch a pair...
Great material! Although maybe a better name would be « goal curious ».
In your view, what are the smallest changes that would make {natural selection; children; companies} goal agnostic?
It feels way too easy to flip the sign:
« I think the orthogonality thesis is wrong. For instance, without rejecting the orthogonality thesis, one might think we should stop constructing AGI!
You might think this is stupid, but some significant people believe it. Clearly the orthogonality thesis is nontrivially confusing for cases like this. »
Same quote, emphasis on the basic question.
What’s wrong with « Left and right limbs come out basically the same size because it’s the same construction plan. »?
I feel the OP is best thought of as a placebo pump rather than as mechanistic one-size-fits-all advice. It might not be the best for you if it lacks some key ingredient you need. Or it might work iff you first create a fictional character that you can hold responsible for many of your problems (« Moloch did this! »), then allow yourself to find the >1% of the time where you did successfully overcome the bastard, and from there you can climb.
It’s a key faith I used to share, but I’m now agnostic about it. To take a concrete example, everyone knows that blues and reds get more and more polarized. A grey type like my old self would have thought there must be an objective truth to extract, with elements from both sides. Now I’m wondering whether ethics should end with: no truth can help decide whether future humans should be able to live like bees or like dolphins or like the blues or like the reds, especially when living like the reds...