All of TekhneMakre's Comments + Replies

Ok, I appreciate you clarifying. I should have made the connection more clear. The connection is that Ziz's theory is that humans are actually, very literally, two people: one left hemisphere and one right hemisphere, with different personalities, life orientations, and perhaps genders. See for example the links in https://hivewired.wordpress.com/2019/12/02/hemisphere-theory-much-more-than-you-wanted-to-know/  (I haven't read that summary carefully). Some other "Zizians" also say they believe this about themselves, and believe that other people are de... (read more)

Why is this downvoted? It's a straightforward connection, and interesting. 

4Steven Byrnes3d
I looked up “the Zizians”. They seem to be some group of people in California, I guess. Anyway, I interpret your comment as “maybe those people have schizophrenia”. OK, maybe they do, maybe they don’t, I dunno. Lots of people have schizophrenia. I think if you want to discuss which people do or don’t have schizophrenia, and how that does or doesn’t affect their behavior, that’s fine, at least in principle. (In practice, things can go badly when amateurs start trying to give psychological diagnoses to people they don’t like.) But I don’t see how it would be related to this blog post. If you want to know how people behave when they have schizophrenia, the right approach is to look at lots of people with schizophrenia and write down how they behave. People have already done this, and you can read the results on wikipedia and many other places. The wrong approach is to read highly-speculative musings on the neuroscience of schizophrenia, like the contents of this blog post. Any information in the latter is screened off [https://risingentropy.com/screening-off-and-explaining-away/] by the former, right?
2TekhneMakre3d
Why is this downvoted? It's a straightforward connection, and interesting. 

My impression is that basically no one knows how reasoning works, so people either make vague statements (I don't know what shard theory is supposed to be, but when I've looked briefly at it, it's either vague or obvious), or retreat to functional descriptions like "the AI follows a policy that achieves high reward" or "the AI is efficient relative to humans" or "the AI pumps outcomes" (see e.g. here: https://www.lesswrong.com/posts/7im8at9PmhbT4JHsW/ngo-and-yudkowsky-on-alignment-difficulty?commentId=5LsHYuXzyKuK3Fbtv ). 

Why is it at all plausible that bigger feedforward neural nets trained to predict masses of random data recorded from humans or whatever random other stuff would ever be superintelligent? I think AGI could be near, but I don't see how this paradigm gets there. I don't even see why it's plausible. I don't even see why it's plausible that either a feedforward network gets to AGI, or an AI trained on this random data gets to AGI. These of course aren't hard limits at all. There's some big feedforward neural net that destroys the world, so there's some trainin... (read more)

1Simulation_Brain5d
I think the main concern is that feedforward nets are used as a component in systems that achieve full AGI. For instance, DeepMind's agent systems include a few networks and run a few times before selecting an action. Current networks are more like individual pieces of the human brain, like a visual system and a language system. Putting them together and getting them to choose and pursue goals and subgoals appropriately seems all too plausible. Now, some people also think that just increasing the size of nets and training data sets will produce AGI, because progress has been so good so far. Those people seem to be less concerned with safety. This is probably because such feedforward nets would be more like tools than agents. I tend to agree with you that this approach seems unlikely to produce real AGI, much less ASI, but it could produce very useful systems that are superhuman in limited areas. It already has in a few areas, such as protein folding.

FYI Ziz also thinks one should stand there. https://sinceriously.fyi/narrative-breadcrumbs-vs-grizzly-bear/

I glanced at the first paper you cited, and it seems to show a very weak form of the statements you made. AFAICT their results are more like "we found brain areas that light up when the person reads 'cat', just like how this part of the neural net lights up when given input 'cat'" and less like "the LLM is useful for other tasks in the same way as the neural version is useful for other tasks". Am I confused about what the paper says, and if so, how? What sort of claim are you making?

What's the story here about how people were using FDT?

It's a pretty weird epistemic state to be in, to think that he's 99% accurate at reading that sort of thing (assuming you mean, he sometimes opens the cage on people who seem from the inside as FDT-theorist-y as you, and 99% of the time they run, and he sometimes doesn't open the cage on people who seem from the inside as FDT-theorist-y as you, and 99% of them would have run (the use of the counterfactual here is suspicious)). But yeah, of course if you're actually in that epistemic state, you shouldn't run. That's just choosing to have a bear released on you. 

This is interesting, and I'd like to see more. Specifically, what's Ziz's problem, and more generally, more precise stuff. 

> By starting the dilemma at a point in logical time after the blackmail letter has already been received, the counterfactuals where the blackmail doesn’t occur have by the definition and conditions of the dilemma not happened, the counterfactuals are not the world the agent actually occupies regardless of their moves prior to the dilemma’s start. The dilemma declares that the letter has been received, it cannot be u... (read more)

-1Slimepriestess14d
this is Ziz's original formulation of the dilemma, but it could be seen as somewhat isomorphic to the fatal mechanical blackmail dilemma: FDT says stand there and bare your throat in order to make this situation not occur, but that fails to track the point in logical time that the agent actually is placed into at the start of a game where the bear has already been released.

Some versions of "ambition" are evil (e.g. wanting to commit fraud), some are dysfunctional (e.g. wanting to be a hedge fund manager or CEO of ECorp), and some are worthwhile but have been severely damaged. Your mission isn't to get into the inner ring https://www.lewissociety.org/innerring/ , it's to understand why the worthwhile ambition has been damaged (see https://www.lesswrong.com/posts/KktH8Q94eK3xZABNy/hope-and-false-hope ), and then create contexts where that sort of thing can grow. 

Yeah, I don't think BCIs are likely to help align strong AGI. (By the same token I don't think they'd hurt; and if they would hurt, that would also somewhat imply they could help if done differently.)

As I think I've mentioned to you before in another thread, I think it's probably incorrect for us to sacrifice not-basically-zero hopes in 10 or 20 years, in exchange for what is in practice even smaller hopes sooner. I think the great majority of people who say they think AGI is very very (or "super") likely in, say, the next 10 years are mostly just updating off everyone else. 

5Nathan Helm-Burger23d
Yeah, I think I am somewhat unusual in having tried to research timelines in depth and run experiments to support my research. My inside view continues to suggest we have less than 5 years. I've been struggling with how to write convincingly about this without divulging sociohazards. I feel apologetic for being in the situation of arguing for a point that I refuse to cite my evidence for.

That would be one potential effect. Another potential effect would be that you can learn to manipulate (not in the psychological sense, in the sense of "use one's hands to control") the AI better, by seeing and touching more of the AI with faster feedback loops. Not saying it's likely to work, but I think "hopeless" goes too far.  

1Nathan Helm-Burger23d
Yeah, I don't think we know enough to be sure how it would work out one way or another. There's lots of different ways to wire up neurons to computers. I think it would be worth experimenting with if we had the time. We super don't though.

By increasing your output bandwidth, obviously.

4Eli Tyre23d
Increasing your output bandwidth in a case like this one would just give the AI more ability to model you and cater to you specifically.

(The reason I linked to the comment is that I too have noticed that downvotes without explanation don't give much information, and my probably bad suggestion about that seemed relevant.)

Thanks for clarifying.... but, I can't publish it. I've put text in the title and in the body, and clicked the publish button. It has some effect, namely making the "GET FEEDBACK" button disappear. When I check links to shortform comments, they're still not visible to outsiders. When I reload the container post, the title text is gone and the body text is gone but restorable, even though I've also clicked SAVE DRAFT. 
I'm referring to the post on my profile that looks like: 1[Draft]Bíos brakhús

Text:
 

Crazy idea: you're not allowed to downvote without either writing an explanation of why, or pressing agree on someone else's explanation of why they downvoted. Or some variation of that.




https://www.lesswrong.com/posts/QZM6pErzL7JwE3pkv/niplav-s-shortform?commentId=862NKA2x4AHx3FAcp#862NKA2x4AHx3FAcp 

2jimrandomh25d
Not sure why you're linking to that comment here, but: the reason that link was broken for niplav is because your shortform-container post is marked as a draft, which makes it (and your shortform comments) inaccessible to non-admins. You can fix it by editing the shortform container post and clicking Publish, which will make it accessible again.

This seems important. Can you crystallize more of the causality, from your reading? E.g. is it because peer review creates cabals and entrenched interests who upvote work that makes their work seem "in the hot areas", or similar? Or because it creates wasteful work for the academics trying to conform to logistical peer review requirements? Or predatory journals select for bad editors? Or it creates an illusion of consensus, obscuring that there are gaping wide open questions? Or...?

4the gears to ascenscion1mo
I don't feel qualified to distill, which is why I did not. I only have a fuzzy grasp of the issue myself. Your hypotheses all seem plausible to me.

A couple things:
1. Decision-makers tend to be more in demand, in general. This implies a number of things: the unilateralist curse bites harder (inoculation, and just exhaustion); they're harder to contact; they're more exposed to people trying to socially hack them, so they're more on guard; and they might have more constraints on them, preventing them from changing course (unsure about this one; obviously they have some power, but they also might have less freedom to more deeply reorient). 
2. Talking to average researchers gives you practice talkin... (read more)

I'd be interested if you can braindump a bit about what happened when you tried to convince Cannell, and make guesses about underlying dynamics. 

Another hypothesis: walking turns on your brain's "we're going somewhere, let's keep track of where we've been / we are / we're going" mode. But since you're not going anywhere, and your path is very simple / doesn't require any high-level/deliberative adjustment or attention, that "journey mode" energy is purposed towards your thinking journey.

He's saying that since floating point arithmetic isn't necessarily associative, you can tell something about how some abstract function like the sum of a list is actually implemented / computed; and that partial info points at some architectures more than others. 
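
To illustrate the non-associativity point, here's a minimal sketch (a toy example, not taken from the post under discussion): the exact result of summing a list depends on how the additions are grouped, so the observed output carries partial information about how the sum was implemented.

```python
# Toy demonstration that floating-point addition is not associative, so the
# exact output of "sum of a list" leaks information about the order/grouping
# of operations used to compute it.
xs = [1e16, 1.0, -1e16, 1.0]

sequential = ((xs[0] + xs[1]) + xs[2]) + xs[3]   # left-to-right fold
regrouped = (xs[0] + xs[2]) + (xs[1] + xs[3])    # pairwise regrouping

print(sequential)  # 1.0 -- the first 1.0 is lost to rounding against 1e16
print(regrouped)   # 2.0 -- both 1.0s survive
```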

I'm talking about reflective stability. Are you saying that all agents will eventually self modify into FEP, and FEP is a rock? 

1Roman Leventov25d
Reward is not Necessary: How to Create a Compositional Self-Preserving Agent for Life-Long Learning [https://www.lesswrong.com/posts/df4Jjg9cmJ7R2bkzR/reward-is-not-necessary-how-to-create-a-compositional-self-1]

What you've said so far doesn't seem to address my comments, or make it clear to me what the relevance of the FEP is. I also don't understand the FEP or the point of the FEP. I'm not saying EU maximizers are reflectively stable or a limit of agency, I'm saying that EU maximization is the least obviously reflectively unstable thing I'm aware of. 

1Roman Leventov1mo
I said that the limit of agency is already proposed, from the physical perspective (FEP). And this limit is not EU maximisation. So, methodologically, you should either criticise this proposal, or suggest an alternative theory that is better, or take the proposal seriously. If you take the proposal seriously (I do): the limit appears to be "uninteresting". A maximally entangled system is "nothing", it's perceptibly indistinguishable from its environment, for a third-person observer (let's say, in Tegmark's tripartite partition system-environment-observer [ https://journals.aps.org/prd/abstract/10.1103/PhysRevD.85.123517]).  There is no other limit. Instrumental convergence is not the limit, a strong instrumentally convergent system is still far from the limit. This suggests that unbounded analysis, "thinking to the limit" is not useful, in this particular situation. Any physical theory of agency [https://www.lesswrong.com/posts/2BPPwboTDrAMFiGHe/the-two-conceptions-of-active-inference-an-intelligence#Theories_of_agency] must ensure "reflective stability", by construction. I definitely don't sense anything "reflectively unstable" in Active Inference, because it's basically the theory of self-evidencing, and wields instrumental convergence in service of this self-evidencing. Who wouldn't "want" this, reflectively? Active Inference agents in some sense must want this by construction because they want to be themselves, as long as possible. However they redefine themselves, and at that very moment, they also want to be themselves (redefined). The only logical possibility out of this is to not want to exist at all at some point, i. e., commit suicide, which agents (e. g., humans) actually do sometimes. But conditioned on that they want to continue to exist, they are definitely reflectively stable.

There's naturality as in "what does it look like, the very first thing that is just barely generally capable enough to register as a general intelligence?", and there's naturality as in "what does it look like, a highly capable thing that has read-write access to itself?". Both interesting and relevant, but the latter question is in some ways an easier question to answer, and in some ways easier to answer alignment questions about. This is analogous to unbounded analysis: https://arbital.com/p/unbounded_analysis/

In other words, we can't even align an EU ma... (read more)

1Roman Leventov1mo
You seem to try to bail out EU maximisation as the model because it is a limit of agency, in some sense. I don't think this is the case. In classical [https://arxiv.org/abs/1906.10184] and quantum [https://chrisfieldsresearch.com/qFEP-2112.15242.pdf] derivations of the Free Energy Principle [https://www.lesswrong.com/tag/free-energy-principle], it is shown that the limit is the perfect predictive capability of the agent's environment (or, more pedantically: in classic formulation, FEP is derived from basic statistical mechanics; in quantum formulation, it's more of being postulated, but it is shown that quantum FEP in the limit is equivalent to the Unitarity Principle). Also, Active Inference, the process theory which is derived from the FEP, can be seen as a formalisation of instrumental convergence [https://www.lesswrong.com/posts/ostLZyhnBPndno2zP/active-inference-as-a-formalisation-of-instrumental]. So, we can informally outline the "stages of life" of a self-modifying agent as follows: general intelligence -> maximal instrumental convergence -> maximal prediction of the environment -> maximal entanglement with the environment.

If you're saying "let's think about a more general class of agents because EU maximization is unrealistic", that's fair, but note that you're potentially making the problem more difficult by trying to deal with a larger class with fewer invariants. 

If you're saying "let's think about a distinct but not more general class of agents because that will be more alignable", then maybe, and it'd be useful to say what the class is, but: you're going to have trouble aligning something if you can't even know that it has some properties that are stable under sel... (read more)

1DragonGod1mo
I strongly suspect that expected utility maximisers are anti-natural for selection for general capabilities.

From the title and the first sentence, the article I hallucinated described the possibility of saying: "Okay, you've won this actual game, I resign for most scenarios; but if such-and-such event happened, then the game would be interesting; so let's say that you've won 90%, and then battle it out for the remaining 10% with such-and-such alteration to the game state."

3jefftk1mo
Also sounds fun!

Fusion research slowed to a crawl in the 1970s and so we don't have fusion power

Requires huge specialized equipment. Some AI requires huge equipment, but (1) you can do a lot with a little, and (2) the equipment is heavily economically incentivized for other reasons (all the other uses of compute). 
 

 Electric cars have been delayed by a century.

Why was this? I'd've thought it's basically battery tech, blocked on materials tech. Is that not right?
 

  The space industry is only now catching up to where it was 50 years ago. 
 

... (read more)
3shminux1mo
With fusion it was mostly defunding, just like with space exploration: from https://21sci-tech.com/Articles_2010/Winter_2009/Who_Killed_Fusion.pdf Not sure if this is a possibility with AI. Electric cars and transport in general were apparently killed by the gas automobile industry. Battery tech was just enough for the daily commute, and there were options and workarounds. I am not an expert on how government regulation kills innovation; there is probably enough out there, including by Zvi and Scott Alexander.

Thank you, this seems like a high-quality steelman (I couldn't judge if it passes an ITT). 

 

looks to me like the power-weighted average of staff is extremely into racing to the front, at least to near the brink of catastrophe or until governments buy risks enough to coordinate slowdown.
 

Can anyone say confidently why? Is there one reason that predominates, or several? Like it's vaguely something about status, money, power, acquisitive mimesis, having a seat at the table... but these hypotheses are all weirdly dismissive of the epistemics of these high-powered people, so either we're talking about people who are high-powered because of the mana... (read more)

There are a lot of pretty credible arguments for them to try, especially with low risk estimates for AI disempowering humanity, and if their percentile of responsibility looks high within the industry.

One view is that the risk of AI turning against humanity is less than the risk of a nasty eternal CCP dictatorship if democracies relinquish AI unilaterally. You see this sort of argument made publicly by people like Eric Schmidt, and 'the real risk isn't AGI revolt, it's bad humans' is almost a reflexive take for many  in online discussion of AI risk. T... (read more)

Seems reasonable regarding public policy. But what about
1. private funders of AGI-relevant research
2. researchers doing AGI-relevant research?

Seems like there's a lot of potential reframings that make it more feasible to separate safe-ish research from non-safe-ish research. E.g. software 2.0: we're not trying to make a General Intelligence, we're trying to replace some functions in our software with nets learned from data. This is what AlphaFold is like, and I assume is what ML for fusion energy is like. If there's a real category like this, a fair amount of the conflict might be avoidable? 

Most AI companies and most employees there seem not to buy risk much, and to assign virtually no resources to address those issues. Unilaterally holding back from highly profitable AI when they won't put a tiny portion of those profits into safety mitigation again looks like an ask out of line with their weak interest. Even at the few significant companies with higher percentages of safety effort, it still looks to me like the power-weighted average of staff is extremely into racing to the front, at least to near the brink of catastrophe or until governmen... (read more)

More generally, that twitter thread is an exemplar of a broader thing which is the Vortex of Silicon Valley Craziness, which is mostly awesome, often very silly, and also tinged with craziness. And I have an uncertain vague sense that this tinge of craziness is some major part of the tailwind pushing AGI research? Or more specifically, the tailwind investing capital into AGI research.

I think that can be tactically bad, as I think people generally don’t tend to be receptive to people telling them that they and all their scientific idols are being reckless.

In many cases, they're right, and in fact they're working on AI (broadly construed) that's (1) narrow, (2) pretty unlikely to contribute to AGI, and (3) potentially scientifically interesting or socially/technologically useful, and therefore good to pursue. "We" may have a tactical need to be discerning ourselves in who, and what intentions, we criticize. 

The tension wouldn't be there if obstruction isn't bottlenecked on discernment because discernment is easy / not too hard, but I don't think you made that argument. 

If discernment is not too hard, that's potentially a dangerous thing: by being all discerning in a very noticeable way, you're painting a big target on "here's the dangerous [cool!] research". Which is what seems to have already happened with AGI. 

This is also a general problem with "just make better arguments about AI X-risk". You can certainly make such arguments without spreading i... (read more)

I think there's a major internal tension in the picture you present (though the tension is only there with further assumptions). You write:

Obstruction doesn’t need discernment

[...]
I don’t buy it. If all you want is to slow down a broad area of activity, my guess is that ignorant regulations do just fine at that every day (usually unintentionally). In particular, my impression is that if you mess up regulating things, a usual outcome is that many things are randomly slower than hoped. If you wanted to speed a specific thing up, that’s a very different story

... (read more)
2TekhneMakre1mo
The tension wouldn't be there if obstruction isn't bottlenecked on discernment because discernment is easy / not too hard, but I don't think you made that argument.  If discernment is not too hard, that's potentially a dangerous thing: by being all discerning in a very noticeable way, you're painting a big target on "here's the dangerous [cool!] research". Which is what seems to have already happened with AGI.  This is also a general problem with "just make better arguments about AI X-risk". You can certainly make such arguments without spreading ideas about how to advance capabilities, but still, the most pointed arguments are like "look, in order to transform the world you have to do XYZ, and XYZ is dangerous because ABC". You could maybe take the strategy of, whenever a top researcher makes a high-level proposal for how to make AGI, you can criticize that like "leaving aside whether or not that leads to AGI, if it led to AGI, here's how that would go poorly".  (I acknowledge that I'm being very "can't do" in emphasis, but again, I think this pathway is crucial and worth thinking through... and therefore I want to figure out the best ways to do it!)

Broadly agree, but: I think we're very confused about the social situation. (Again, I agree this argues to work on deconfusing, not to give up!) For example, one interpretation of the propositions claimed in this thread

https://twitter.com/soniajoseph_/status/1597735163936768000 

if they are true, is that AI being dangerous is more powerful, in terms of moving money and problem-solving juice, as a recruitment tool than as a dissuasion tool. I.e. in certain contexts, it's beneficial towards the goal of getting funding to include in your pitch "th... (read more)

2TekhneMakre1mo
More generally, that twitter thread is an exemplar of a broader thing which is the Vortex of Silicon Valley Craziness, which is mostly awesome, often very silly, and also tinged with craziness. And I have an uncertain vague sense that this tinge of craziness is some major part of the tailwind pushing AGI research? Or more specifically, the tailwind investing capital into AGI research.

Sure. 

if the change is big enough to impact his life, he'd notice

Well, it depends what he cares about. For example, if he mainly wants to be happy and live a life that feels good / satisfying, which is a reasonable goal, then he may be largely right. On the other hand, a lot of worthwhile goals that he might care about would demand creative intelligence to achieve. A significant drop in creative intelligence--a decrease in someone's peak ability to create new things, new ideas--is not something that would be picked up in normal studies, is not somethi... (read more)

1concerned_dad2mo
That is a good point. He concedes it. He tried for "microdosing LSD promotes creative intelligence" but couldn't back it up sufficiently.

Well, really I was trying to write in a sort of jokey but also no-nonsense way directly to the son in order to not be boring or something. IDK 

Yeah... I think the whole thing was written with me in a weird mood, retracted. Ruby's comment is much saner anyway.

2TekhneMakre2mo
Well, really I was trying to write in a sort of jokey but also no-nonsense way directly to the son in order to not be boring or something. IDK 

Why the hell is LSD criminalized everywhere? There are NO negative side effects

On priors, excitotoxicity should be a major concern. He should check his likelihood ratios: if there were brain damage caused by LSD, how likely would that show up in a study? What do the studies actually measure, and what might be happening that they don't measure? How would you know if people who take a lot of amphetamines / psychedelics were constantly degrading some of their cognitive abilities?
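
To make the likelihood-ratio point concrete, here's a toy calculation with made-up numbers (purely illustrative, not estimates about LSD): if studies would very likely come back null whether or not subtle damage exists, then a null study barely moves the odds.

```python
# Toy Bayesian odds update with made-up numbers, purely to illustrate
# why "the studies found nothing" can be weak evidence.
p_null_given_damage = 0.8      # subtle damage that typical studies would miss
p_null_given_no_damage = 0.9   # studies usually find nothing anyway

likelihood_ratio = p_null_given_damage / p_null_given_no_damage  # ~0.89

prior_odds = 0.2 / 0.8                          # prior odds of damage (also made up)
posterior_odds = prior_odds * likelihood_ratio  # ~0.22 -- almost unchanged from 0.25

print(likelihood_ratio, posterior_odds)
```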

"What's the difference between you drinking alcohol or coffee and me taking amphetam

... (read more)
[This comment is no longer endorsed by its author]
1concerned_dad2mo
Likelihood ratios is an interesting point I hadn't considered. I brought it up to him, and he believes if the change is big enough to impact his life, he'd notice (compared it to sleep deprivation), and if it's smaller, then it doesn't matter. Cumulative small changes over time was countered because he's apparently been benchmarking various aspects of "intelligence" for the past year and would detect a change to baseline. When he finds this post, he'll find the Nietzsche part amusing (he's been reading classical philosophy recently), thanks for adding it!
2the gears to ascenscion2mo
downvote for the edgelord sarcasm being insufficiently marked. if nothing else I would want to add a recommendation for Benjamin Hoffman's blog Compass Rose, to counterbalance somewhat. there are probably others I could suggest as well. I liked everything else you said.

GPT gives the token "216" in the string "6^3 = 216" a very low probability, just as low as "215" or "217".

Replacing "6^3" with "6^2" in the prompt still gives "216" as an output with ~10% probability.

Would the tokenizer behave differently given "216" and "2^16", e.g. giving respectively the token "216" and some tokens like "**2" and "16*"? That would explain this as, GPT knows of course that 216 isn't 63, but, it's been forced to predict a relationship like "**2" + "16*" = "**63*".

3LawrenceC2mo
The Codex tokenizer used by the GPT-3.5 models tokenizes them differently: "216" is 1 token, "2^16" is 3 ("2", "^", "16"). Note that " 216" (with a space) is a different token, and it's what text-davinci-003 actually really wants to predict (you'll often see 100x probability ratios between these two tokens). Here's the log probs of the two sequences using Adam's prompt above, with the trailing space removed (which is what he did in his actual setup, otherwise you get different probabilities):
2 16 -> -15.91
2^16 -> -1.34
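
For what it's worth, this kind of tokenization question is easy to check directly; here's a minimal sketch using the tiktoken library, assuming p50k_base is the right encoding for these models (the expected outputs just restate the claims above, not something re-verified here):

```python
# Sketch: inspect how the (assumed) Codex/text-davinci-003 encoding splits these strings.
import tiktoken

enc = tiktoken.get_encoding("p50k_base")  # assumption: the encoding used by these models

for s in ["216", " 216", "2^16"]:
    ids = enc.encode(s)
    pieces = [enc.decode([i]) for i in ids]
    print(repr(s), "->", ids, pieces)

# Per the comment above: "216" and " 216" should each be a single (distinct) token,
# while "2^16" should split into three tokens: "2", "^", "16".
```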

coined by the anti-equality/human-rights/anti-LGBT/racist crowd

This is false. https://en.wikipedia.org/wiki/Woke 

6green_leaf2mo
Thanks. I didn't know that. So what I had in mind is the neologism [https://en.wiktionary.org/wiki/woke#Adjective] (the third meaning), but the original word has actually normal roots.

Essentially, the assumption I made explicitly, which is that there exists a policy which achieves shutdown with probability 1.

Oops, I missed that assumption. Yeah, if there's such a policy, and it doesn't trade off against fetching the coffee, then it seems like we're good. See, though, this brief argument that by Cromwell's rule such a policy doesn't exist: https://arbital.com/p/task_goal/ 

Even with a realistic ϵ probability of shutdown failing, if we don’t try to juice 1−1/C so high that it exceeds 1−ϵ, my guess is there woul

... (read more)
1davidad2mo
Yes, I think there are probably strong synergies with satisficing, perhaps lexicographically minimizing something like energy expenditure once the EU maximum is reached. I will think about this more.

Problem: suppose the agent foresees that it won't be completely sure that a day has passed, or that it has actually shut down. Then the agent A has a strong incentive to maintain control over the world past when it shuts down, to swoop in and really shut A down if A might not have actually shut down and if there might still be time. This puts a lot of strain on the correctness of the shutdown criterion: it has to forbid this sort of posthumous influence despite A optimizing to find a way to have such influence. 
(The correctness might be assumed by the... (read more)

0Sune2mo
A similar objection is that you might accidentally define the utility function and time limit in such a way that the AI assigns positive probability to the hypothesis that it can later create a time machine and go back and improve the utility. Then once the time has passed, it will desperately try to invent a time machine, even if it thinks it is extremely unlikely to succeed (this is using Bostrom’s way of thinking. Shard theory would not predict this).
2davidad2mo
To the first point, I think this problem can be avoided with a much simpler assumption than that the shutdown criterion forbids all posthumous influence. Essentially, the assumption I made explicitly, which is that there exists a policy which achieves shutdown with probability 1. (We might need a slightly stronger version of this assumption: it might need to be the case that for any action, there exists an action which has the same external effect but also causes a shutdown with probability 1.) This means that the agent doesn’t need to build itself any insurance policy to guarantee that it shuts down. I think this is not a terribly inaccurate assumption; of course, in reality, there are cosmic rays and a properly embedded and self-aware agent might deduce that none of its future actions are perfectly reliable, even though a model-free RL agent would probably never see any evidence of this (and it wouldn’t be any worse at folding the laundry for it). Even with a realistic ϵ probability of shutdown failing, if we don’t try to juice 1−1/C so high that it exceeds 1−ϵ, my guess is there would not be enough incentive to justify the cost of building a successor agent just to raise that from 1−ϵ to 1.
3davidad2mo
To the second point, yes, I edited the conclusion to reflect this.

E.g. "does this plan avoid having a steganographically encoded world-ending message hidden in it" is more co-NPish than NP-ish. Like, just showing me the plan doesn't make it easy to verify that there isn't a hidden message, even if there isn't a hidden message. Checking whether a sequence of actions is the first half of a strategy to take over the world is potentially more like P-space. 

We've been steadily accumulating evidence since then that intelligence is compute-intensive. It's time to reject that threat model as a distraction.

 

If the AI is a much better programmer than humans are, then it has a pretty good shot at packing a lot more intelligence into the same amount of compute. 

Not exactly a disagreement, but I think this post is missing something major about classic style (the style in a more objective sense, maybe not Pinker's version). Namely, classic style can be taken as a sort of discipline which doesn't so much tell you how to write but rather makes strong recommendations about what to write. If you find yourself writing a lot of "I think..." and "Maybe..." and "My concept of..." and so on, you might want to question whether you should be writing this, instead of thinking it through more carefully. This advice of course d... (read more)

IDK if helpful, but my comment on this post here is maybe related to fighting fire with fire (though Elizabeth might have been thinking more of strictly internal motions, or something else):

https://www.lesswrong.com/posts/kcoqwHscvQTx4xgwa/?commentId=bTe9HbdxNgph7pEL4#comments

And gjm's comment on this post points at some of the relevant quotes:

https://www.lesswrong.com/posts/kcoqwHscvQTx4xgwa/?commentId=NQdCG27BpLCTuKSZG

(Mainly for third parties:)

I don't care about people accepting my frame.

I flag this as probably not true.

Frankly, lots of folk here are bizarrely terrified of frames. I get why; there are psychological methods of attack based on framing effects.

It's the same sort of thing your post is about. 

Might have filtered folk well early on and helped those for whom it wasn't written relax a bit more.

I flag this as centering critical reactions being about the reacters not being relaxed, rather than that there might be something wrong with his post. 
