All of sudo -i's Comments + Replies

Shallow comment:

How are you envisioning the prevention of strategic takeovers? It seems plausible that robustly preventing strategic takeovers would also require substantial strategizing/actualizing.

3TsviBT5d
Are you echoing this point from the post? It might be possible for us humans to prevent strategicness, though this seems difficult because even detecting strategicness is maybe very difficult. E.g. because thinking about X also sneakily thinks about Y: https://tsvibt.blogspot.com/2023/03/the-fraught-voyage-of-aligned-novelty.html#inexplicitness My mainline approach is to have controlled strategicness, ideally corrigible (in the sense of: the mind thinks that [the way it determines the future] is probably partially defective in an unknown way).

The first point isn’t super central. FWIW I do expect that humans will occasionally not swap words back.

Humans should just look at the noised plan and try to convert it into a more reasonable-seeming, executable plan.

Edit: that is, without intentionally changing details.

Fair enough! For what it’s worth, I think the reconstruction is probably the more load-bearing part of the proposal.

Is your claim that the noise-borne asymmetric pressure away from treacherous plans disappears in above-human intelligences? I could see it becoming less material as intelligence increases, but the intuition should still hold in principle.

2shminux19d
I am not confidently claiming anything, not really an expert... But yeah, I guess I like the way you phrased it. The more disparity there is in intelligence, the less extra noise matters. I do not have a good model of it though. Just feels like more and more disparate dangerous paths appear in this case, overwhelming the noise.

"Most paths lead to bad outcomes" is not quite right. For most (let's say human developed, but not a crux) plan specification languages, most syntactically valid plans in that language would not substantially permute the world state when executed.

I'll begin by noting that over the course of writing this post, the brittleness of treacherous plans became significantly less central.

However, I'm still reasonably convinced that the intuition is sound. If a plan is adversarial to humans, the plan's executor will face adverse optimization pressure from humans and... (read more)
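The brittleness intuition can be illustrated with a toy model (entirely my own sketch, not from the post; the parameters are arbitrary): treat a plan as a sequence of steps, corrupt each step independently with some noise probability, and compare a brittle plan that must execute perfectly (as a treacherous plan arguably must, under adverse optimization from humans) against a robust plan that tolerates a few corrupted steps.

```python
import random

random.seed(0)

def survival_rate(noise_p, n_steps, tolerance, trials=10_000):
    """Fraction of runs in which a plan still succeeds when each of its
    n_steps is independently corrupted with probability noise_p, and the
    plan tolerates at most `tolerance` corrupted steps."""
    successes = 0
    for _ in range(trials):
        corrupted = sum(random.random() < noise_p for _ in range(n_steps))
        if corrupted <= tolerance:
            successes += 1
    return successes / trials

# A brittle (e.g. treacherous) plan must execute perfectly;
# a robust plan has slack and survives a few corrupted steps.
brittle = survival_rate(noise_p=0.05, n_steps=20, tolerance=0)
robust = survival_rate(noise_p=0.05, n_steps=20, tolerance=5)
print(f"brittle: {brittle:.2f}, robust: {robust:.2f}")
```

Under these assumed numbers the brittle plan succeeds roughly a third of the time while the robust plan almost always survives, which is the asymmetry the intuition points at; whether real treacherous plans are in fact less noise-tolerant is the contested empirical question.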

2shminux20d
I can see that working when the entity is at the human level of intelligence or less. Maybe I misunderstand the setup, and this is indeed the case. I can't imagine that it would work on a superintelligence...

i.e. that the problem is easily enough addressed that it can be done by firms in the interests of making a good product and/or based on even a modest amount of concern from their employees and leadership

I'm curious about how contingent this prediction is on (1) timelines and (2) the rate of alignment research progress. On (2), how much of your P(no takeover) comes from expectations about future research output from ARC specifically?

If tomorrow, all alignment researchers stopped working on alignment (and went to become professional tennis players or something) and no new alignment researchers arrived, how much more pessimistic would you become about AI takeover?

These predictions are not very related to any alignment research that is currently occurring. I think it's just quite unclear how hard the problem is, e.g. does deceptive alignment occur, do models trained to honestly answer easy questions generalize to hard questions, how much intellectual work are AI systems doing before they can take over, etc.

I know people have spilled a lot of ink over this, but right now I don't have much sympathy for confidence that the risk will be real and hard to fix (just as I don't have much sympathy for confidence that the problem isn't real or will be easy to fix).

Epistemic Status: First read. Moderately endorsed.

I appreciate this post and I think it's generally good for this sort of clarification to be made.

 

One distinction is between dying (“extinction risk”) and having a bad future (“existential risk”). I think there’s a good chance of bad futures without extinction, e.g. that AI systems take over but don’t kill everyone.

This still seems ambiguous to me. Does "dying" here mean literally everyone? Does it mean "all animals," "all mammals," "all humans," or just "most humans"? If it's all humans dying, do all hu... (read more)

4paulfchristiano2mo
I think these questions are all still ambiguous, just a little bit less ambiguous. I gave a probability for "most" humans killed, and I intended P(>50% of humans killed). This is fairly close to my estimate for E[fraction of humans killed]. I think if humans die it is very likely that many non-human animals die as well. I don't have a strong view about the insects and really haven't thought about it. In the final bullet I implicitly assumed that the probability of most humans dying for non-takeover reasons shortly after building AI was very similar to the probability of human extinction; I was being imprecise, I think that's kind of close to true but am not sure exactly what my view is.

I think this is probably good to just 80/20 with like a weekend of work? So that there’s a basic default action plan for what to do when someone goes “hi designated community person, I’m depressed.”

People really should try not to have depression. Depression is bad for your productivity. Being depressed for, e.g., a year means you lose a year of time, AND it might be bad for your IQ too.

A lot of EAs get depressed or have gotten depressed. This is bad. We should intervene early to stop it.

I think that there should be someone EAs reach out to when they’re depressed (maybe this is Julia Wise?), and then they get told the ways they’re probably right and wrong so their brain can update a bit, and a reasonable action plan to get them on therapy or meds or whatever.

2Dagon2mo
I don't disagree, but I don't think it's limited to EA or Rationalist community members, and I wouldn't expect that designated group helper contacts will reach most of the people who need it.  It's been my experience (for myself and for a number of friends) that when someone can use this kind of help, they tend not to "reach out" for it. Your framing of "we should intervene" may have more promise.  Having specific advice on HOW lay-people can intervene would go a long way toward shifting our norms of discourse from "you seem depressed, maybe you should seek help" to "this framing may indicate a depressive episode or negative emotional feedback loop - please take a look at <this page/thread> to help figure out who you can talk with about it".
3sudo -i2mo
I think this is probably good to just 80/20 with like a weekend of work? So that there’s a basic default action plan for what to do when someone goes “hi designated community person, I’m depressed.”

Strong upvoted.

I’m excited about people thinking carefully about publishing norms. I think this post existing is a sign of something healthy.

Re Neel: I think that telling junior mech interp researchers to not worry too much about this seems reasonable. As a (very) junior researcher, I appreciate people not forgetting about us in their posts :)

I'd be excited about more people posting their experiences with tutoring 

Short on time. Will respond to last point.

I wrote that they are not planning to "solve alignment once and forever" before deploying first AGI that will help them actually develop alignment and other adjacent sciences.

Surely this is because alignment is hard! Surely if alignment researchers really did find the ultimate solution to alignment and present it on a silver platter, the labs would use it.

Also: An explicit part of SERI MATS’ mission is to put alumni in orgs like Redwood and Anthropic AFAICT. (To the extent your post does this,) it’s plausibly a mistake to treat SERI MATS like an independent alignment research incubator.

5Ryan Kidd2mo
MATS aims to find and accelerate alignment research talent, including:

  • Developing scholar research ability through curriculum elements focused on breadth, depth, and originality (the "T-model of research");
  • Assisting scholars in producing impactful research through research mentorship, a community of collaborative peers, dedicated 1-1 support, and educational seminars;
  • Aiding the creation of impactful new alignment organizations (e.g., Jessica Rumbelow's Leap Labs and Marius Hobbhahn's Apollo Research);
  • Preparing scholars for impactful alignment research roles in existing organizations.

Not all alumni will end up in existing alignment research organizations immediately; some return to academia, pursue independent research, or potentially skill up in industry (to eventually aid alignment research efforts). We generally aim to find talent with existing research ability and empower it to work on alignment, not necessarily through existing initiatives (though we certainly endorse many).
1Roman Leventov2mo
Yes, admittedly, there is much less strain on being very good at philosophy of science if you are going to work within a team with a clear agenda, particularly within an AGI lab, where research agendas tend to be much more empirical than in "academic" orgs like MIRI or ARC. And thinking about research strategy is not the job of non-leading researchers at these orgs either, whereas independent researchers or researchers at more boutique labs have to think about their strategies by themselves. Founders of new orgs and labs have to think about their strategies very hard, too. But preparing employees for OpenAI, Anthropic, or DeepMind is clearly not the singular focus of SERI MATS.

Epistemic status: hasty, first pass

First of all thanks for writing this.

I think this letter is “just wrong” in a number of frustrating ways.

A few points:

  • “Engineering doesn’t help unless one wants to do mechanistic interpretability.” This seems incredibly wrong. Engineering disciplines provide reasonable intuitions for how to reason about complex systems. Almost all engineering disciplines require their practitioners to think concretely. Software engineering in particular also lets you run experiments incredibly quickly, which makes it harder to be wrong.
... (read more)
1Roman Leventov2mo
I should have written "ML engineering" (I think it was not entirely clear from the context, fixed now). Knowing the general engineering methodology and the typical challenges in systems engineering for robustness and resilience is, of course, useful, as is having visceral experience of these (e.g., engineering distributed systems, introducing bugs into systems oneself and seeing how they may fail in unexpected ways). But I would claim that learning this through practice, i.e., learning "from one's own mistakes", is again inefficient. Smart people learn from others' mistakes. Just going through some of the materials from here [https://github.com/lorin/resilience-engineering] would give alignment researchers much more useful insights than years of hands-on engineering practice[1]. Again, it's an important qualification that we are talking about what's effective for theoretical-ish alignment research, not actual engineering of (AGI) systems! I don't argue that ML theory is useless. I argue that going through ML courses that spend too much time on building basic MLP networks or random forests (and understanding the theory of these, though it's minimal) is ineffective. I personally stay abreast of ML research by following the MLST podcast (e.g., on spiking NNs [https://podcasts.google.com/feed/aHR0cHM6Ly9hbmNob3IuZm0vcy8xZTRhMGVhYy9wb2RjYXN0L3Jzcw/episode/YzMzNGMwMzQtOGM0Yy00NWY0LWJhNmYtZTA3ZTlhNzUyM2Ew?sa=X&ved=0CAUQkfYCahcKEwiAlaKuirn-AhUAAAAAHQAAAAAQDg], deep RL [https://podcasts.google.com/feed/aHR0cHM6Ly9hbmNob3IuZm0vcy8xZTRhMGVhYy9wb2RjYXN0L3Jzcw/episode/YzFhZDZmMzYtMDZlMC00YTE3LWJkZjAtMDA0NjUxYzJmNTU0?sa=X&ved=0CAUQkfYCahcKEwiAlaKuirn-AhUAAAAAHQAAAAAQDg], Domingos on neurosymbolic and lots of other stuff [https://www.lesswrong.com/posts/aLdqXWa2svnpvufQ6/roman-leventov-s-shortform?commentId=MFFmbww73q4NFodmj], a series of interviews with people at Cohere: Hooker [https://podcasts.google.com/feed/aHR0cHM6Ly9hbmNob3IuZm0vcy8xZTRhMGVhYy9wb2RjYXN0L3Jzcw/episode/NWQ1M2YzMjUtNT
2sudo -i2mo
Also: An explicit part of SERI MATS’ mission is to put alumni in orgs like Redwood and Anthropic AFAICT. (To the extent your post does this,) it’s plausibly a mistake to treat SERI MATS like an independent alignment research incubator.

Ordering food to go and eating it at the restaurant without a plate and utensils defeats the purpose of eating it at the restaurant

Restaurants are a quick and convenient way to get food, even if you don’t sit down and eat there. Ordering my food to-go saves me a decent amount of time and food, and also makes it frictionless to leave.

But judging by votes, it seems like people don’t find this advice very helpful. That’s fine :(

2Jiro3mo
It sounded like you were suggesting that people order the food to go even if they're eating there. Ordering it to go and then actually going makes more sense, but still has the problem of "what is your reason for going to a restaurant?" Most people who go to restaurants want to eat there a large portion of the time.

I’m planning on removing this post and replacing it with a single big post of life optimizations.

I think there might be a misunderstanding. I order food because cooking is time-consuming, not because it doesn’t have enough salt or sugar.

1nim3mo
Have you considered ordering catering for a "group" a couple times a week, and having your meals from the single catering order for several days, instead of spending the time choosing and acquiring more premade food each day? I've seen some folks online who have great success using catering as a meal prep option because it's more frugal than ordering separate meals, but it also incurs less time investment as well as costing less money.

Maybe it’d be good if someone compiled a list of healthy restaurants available on DoorDash/Uber Eats/GrubHub in the rationalist/EA hubs?

1nim3mo
IMO the most obvious harm reduction strategy for "fast food delivery is expensive and terrible" is not to order different fast food, but to keep pre-made frozen meals on hand. You can buy frozen meals with the nutrition profile of your choice, make them yourself, or pay someone to make them for you. This costs less money and time than ordering delivery, and has the added benefit of leveraging that cognitive bias where you make "healthier" food choices when planning meals in advance compared to decisions that you make while hungry. I'd postulate that people often order delivery because it's the quickest and easiest option available to them. It seems like getting people (including oneself) to eat something healthier than their defaults is a matter of making something even quicker and easier available, rather than offering a choice between "do it your usual way" and a higher-friction option of checking a list first.
2trevor3mo
The advice that I heard is to put more and more salt into your cooking, until you feel satisfied with your cooking and become less likely to order food (which will have tons of salt anyway, way more than you would ever add). There's no easy fix with sugar because it's addictive and has a withdrawal period.

Can’t you just combat this by drinking water?

2trevor3mo
That results in too much salt and too much water, and not enough of the other stuff (e.g. electrolytes). Adding in more of the other stuff doesn't solve the problem, it means your metabolism is going too quickly, because more is going in and therefore more has to be going out at the same time. The human metabolism has, like, a million interconnected steps, and increasing the salinity or speed of your bloodstream affects all of them at once.

If you plan to eat at the restaurant, you can just ask them for a box if you have food left over.

This is true at most restaurants. Unfortunately, it often takes a long time for the staff to prepare a box for you (on the order of five minutes).

A potential con is that most food needs to be refrigerated if you want to keep it safe to eat for several hours

One might simply get into the habit of putting whatever food they have in the refrigerator. I find that refrigerated food is usually not unpleasant to eat, even without heating.

2nim3mo
My experience is that not all locations (work etc) have refrigerator space conveniently available, but if you have access, that's great! I find that asking the person at the counter to give me a box is a much quicker operation than asking wait staff to put my food into a box for me.

Sometimes when you purchase an item, the cashier will randomly ask you if you’d like additional related items. For example, when purchasing a hamburger, you may be asked if you’d like fries.

It is usually a horrible idea to agree to these add-ons, since the cashier does not inform you of the price. I would like fries for free, but not for $100, and not even for $5.

The cashier’s decision to withhold pricing information from you should be evidence that you do not, in fact, want to agree to the deal.

2Dagon3mo
For most LW readers, it's usually a bad idea, because many of us obsessively put cognitive effort into unimportant choices like what to order at a hamburger restaurant, and reminders or offers of additional things don't add any information or change our modeling of our preferences, so are useless.  For some, they may not be aware that fries were not automatic, or may not have considered whether they want fries (at the posted price, or if price is the decider, they can ask), and the reminder adds salience to the question, so they legitimately add fries.  Still others feel it as (a light, but real) pressure to fit in or please the cashier by accepting, and accept the add-on out of guilt or whatever.   Some of these reasons are "successes" in terms of mutually-beneficial trade, some are "predatory" in that the vendor makes more money and the customer doesn't get the value they'd hoped. Many are "irrelevant" in that they waste a small amount of time and change no decisions.   I think your heuristic of "decline all non-solicited offers" is pretty strong, in most aspects of the world.  
3Richard_Kennaway3mo
You could always ask. I ignore upsells because I've already decided what I want and ordered that, whether it's extra fries or a hotel room upgrade.

Epistemic status: clumsy

An AI could also be misaligned because it acts in ways that don't pursue any consistent goal (incoherence).

It’s worth noting that this definition of incoherence seems inconsistent with VNM. E.g., a rock might satisfy the folk definition of “pursuing a consistent goal,” but fail to satisfy VNM due to lacking completeness (and, by corollary, due to not performing expected utility optimization over the outcome space).
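For readers unfamiliar with the completeness axiom, here is a minimal sketch (the outcome set and predicate names are my own illustration, not from the comment): completeness requires that for every pair of outcomes, the agent weakly prefers one to the other.

```python
from itertools import combinations

def is_complete(outcomes, weakly_prefers):
    """VNM completeness: for every pair of outcomes a, b, the agent
    must weakly prefer a to b, or b to a (or both)."""
    return all(weakly_prefers(a, b) or weakly_prefers(b, a)
               for a, b in combinations(outcomes, 2))

outcomes = ["stay_put", "roll_left", "roll_right"]

# A rock expresses no preferences at all, so completeness fails.
rock = lambda a, b: False
print(is_complete(outcomes, rock))  # False

# An agent ranking outcomes by a utility function trivially
# satisfies completeness.
utility = {"stay_put": 0, "roll_left": 1, "roll_right": 2}
maximizer = lambda a, b: utility[a] >= utility[b]
print(is_complete(outcomes, maximizer))  # True
```

The rock "consistently pursues" staying put in the folk sense, yet has no preference relation at all, so it fails the axiom, which is the gap the comment is pointing at.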

Strong upvoted.

The result is surprising and raises interesting questions about the nature of coherence. Even if this turns out to be a fluke, I predict that it’d be an informative one.

I think I was deceived by the title.

I’m pretty sure that rapid capability generalization is distinct from the sharp left turn.

dedicated to them making the sharp left turn

I believe that “treacherous turn” was meant here.

1scasper4mo
thanks

Wait I’m pretty confident that this would have the exact opposite effect on me.

1Christopher King4mo
Well it helps that he is super chill. It's not like he's micromanaging me, but if I start literally goofing off he'd probably notice, lol.

You can give ChatGPT the job posting and a brief description of Simon’s experiment, and then just ask them to provide critiques from a given perspective (eg. “What are some potential moral problems with this plan?”)

2the gears to ascension4mo
ah, I see, yeah, solid and makes sense.

I clicked the link and thought it was a bad idea ex post. I think that my attempted charitable reading of the Reddit comments revealed significantly less constructive data than what would have been provided by ChatGPT.

I suspect that rationalists engaging with this form of content harms the community a non-trivial amount.

2the gears to ascension4mo
Interesting, if the same could be done with chatgpt I'd be curious to hear how you'd frame the question. If the same analysis can be done with chatgpt I'd do it consistently. Can you say more about how it causes harm? I'd like to find a way to reduce that harm, because there's a lot of good stuff in this sort of analysis, but you're right that there's a tendency to use extremely spiky words. A favorite internet poster of mine has some really interesting takes on how it's important to use soft language and not demand people agree, which folks on that subreddit are in fact pretty bad at doing. It's hard to avoid it at times, though, when one is impassioned.

I’m a fan of this post, and I’m very glad you wrote it.

I understand feeling frustrated given the state of affairs, and I accept your apology.

Have a great day.

You don’t have an accurate picture of my beliefs, and I’m currently pessimistic about my ability to convey them to you. I’ll step out of this thread for now.

6the gears to ascension6mo
that's fair. I apologize for my behavior here; I should have encoded my point better, but my frustration is clearly incoherent and overcalibrated. I'm sorry to have wasted your time and reduced the quality of this comments section.

I find the accusation that I'm not going to do anything slightly offensive.

Of course, I cannot share what I have done and plan to do without severely de-anonymizing myself. 

I'm simply not going to take humanity's horrific odds of success as a license to make things worse, which is exactly what you seem to be insisting upon.

5the gears to ascension6mo
no, there's no way to make it better that doesn't involve going through, though. your model that any attempt to understand or use capabilities is failure is nonsense, and I wish people on this website would look in a mirror about what they're claiming when they say that. that attitude was what resulted in mispredicting alphago! real safety research is always, always, always capabilities research! it could not be otherwise!

2the gears to ascension6mo
I mean, yeah, I definitely don't belong on this website, I'm way too argumentative. like, I'm not gonna contest that. But are you gonna actually do anything about your beliefs, or are you gonna sit around insisting we gotta slow down?

Your reply does not even remotely resemble good faith engagement. 

You can unilaterally slow down AI progress by not working on it. Each additional day until the singularity is one additional day to work on alignment.

"Becoming the fire" because you're doomer-pilled is maximally undignified. 

-2the gears to ascension6mo
You cannot unilaterally slow down AI progress by not working on it??? what the fuck kind of opinion is that? deepmind is ahead of you. Deepmind will always be ahead of you. You cannot catch up to deepmind. for fuck's sake, deepmind has a good shot of having TAI right now, and you want me to slow the fuck down? the fuck is your problem, have you still not updated off of deep learning?

Why not create non-AI startups that are way less likely to burn capabilities commons?

3Heighn6mo
It seems to me joshc is arguing that it's relatively easy to make money with AI startups at the moment.
-5the gears to ascension6mo

Random thoughts:

  1. Wouldn't it be best for the rolling-admissions MATS to be part of MATS?
  2. Some ML safety engineering bootcamps scare me. Once you're taking in large groups of new-to-EA/new-to-safety people and teaching them how to train transformers, I'm worried about downside risks. I have heard that Redwood has been careful about this. Cool if true. 
  3. What does building a New York-based hub look like?
1Ryan Kidd6mo
1. Currently, MATS is somewhat supporting rolling admissions for a minority of mentors with our Autumn and Spring cohorts (which are generally extensions of our Summer and Winter cohorts, respectively). Given that MATS is mainly focused on optimizing the cohort experience for scholars (because we think starting a research project in an academic cohort of people with similar experience, with targeted seminars and workshops, is ideal), we are probably a worse experience for scholars or mentors who ideally would start research projects at irregular intervals. Some scholars might not benefit as much from the academic cohort experience as others. Some mentors might ideally commit to mentorship during times of the year outside of MATS' primary Winter/Summer cohorts. Also, MATS' seminar program doesn't necessarily run year-round, and we don't offer as much logistical support to scholars outside of Winter/Summer. There is definitely free energy here for a complementary program, I think.
2. I am also scared about ML upskilling bootcamps that act as feeder grounds for AI capabilities organizations. I think vetting (including perhaps an AGISF prerequisite) is key, as is a clear understanding of where the participants will go next. I only recommend this kind of project because hundreds of people seemingly complete AGISF and want to upskill to work on AI alignment but have scant opportunities. Also, MATS' theory of change includes adding value through accelerating the development of (rare) "research leads [https://forum.effectivealtruism.org/posts/7WXPkpqKGKewAymJf/how-to-pursue-a-career-in-technical-ai-alignment#Types_of_alignment_work]" to increase the "carrying capacity" of the alignment research ecosystem (which theoretically is not principally bottlenecked by "research supporter" talent, because training/buying such talent scales much more easily than training/buying "research lead" talent).

What sort of value do you expect to get out of "crossing the theory-practice gap"?

Do you think that this will result in better insights about which direction to focus in during your research, for example? 

2johnswentworth6mo
Some general types of value which are generally obtained by taking theories across the theory-practice gap:

  • Finding out where the theory is wrong
  • Direct value from applying the theory
  • Creating robust platforms upon which further tools can be developed

I filled out an application. This looks like a very promising program.

I was watching some clips of Aaron Gwin's (American professional mountain bike racer) riding recently. Reflecting on how amazing humans are. How good we can get, with training and discipline.

Did some math today, and remembered what I love about it. Being able to just learn, without the pressure and anxiety of school, is so wonderfully joyful. I'm going back to basics, and making sure that I understand absolutely everything.

I'm feeling very excited about my future. I'm going to learn so much. I'm going to have so much fun. I'm going to get so good.

When I first started college, I set myself the goal of looking, by now, like an absolute wizard to me from a year ago. To be advanced enough to be indistinguishable from magic.

A year in, I now can do ... (read more)

How would you identify a second Yudkowsky? I really don’t like this trope.

By writing ability?

2MondSemmel7mo
For instance by being acknowledged by the first Yudkowsky as the second one. I was referring here mostly to the difficulty of trying to impart expertise from one person to another. Experts can write down and teach legible insights, but the rest of their expertise (which is often the most important stuff) is very hard to teach.
3Thomas Kwa7mo
There's a clarification by John here [https://www.lesswrong.com/posts/moi3cFY2wpeKGu9TT/clarifying-the-agent-like-structure-problem]. I heard it was going to be put on Superlinear but unclear if/when.

I thought they were calling me a flying Minecraft pig https://aether.fandom.com/wiki/Phyg

Well, let me be the first to say that I don't think you're a passive mob that can be found in the aether.

It’s hard but not impossible to put 10k hours of deliberate practice into a hobby

I made this decision offline a long time ago.

1Kalmere7mo
Your life, your choice. Just saying, as a career machine learning specialist: make sure your plans are robust. Leaving academia to go into what could be considered ML research raised major red flags to me. But I don't know your situation - you may have a golden opportunity. Job offer from OpenAI et al! Farewell and good luck.

Lol idk why people get the impression that I’m relying on LW for career advice.

I’m not.

1sudo -i7mo
I made this decision offline a long time ago.
1Kalmere7mo
I'll rephrase. Wanting to take a drastic career turn could be a symptom of many other things: degree burnout, wanting to do a different subject, depression, chronic fatigue from an unknown medical cause, wanting to get out of a situation (relationship, etc.). I do not know you, so any guess I would make would not be helpful. But my gut feeling is that it is worth getting second opinions from those close to you with more information. This is an online forum; I suggest you get a second opinion before making drastic decisions. I know of several people who took PhDs without having things clear in their minds. That didn't work out well.

I really don't want to entertain this "you're in a cult" stuff.

It's not very relevant to the post, and it's not very intellectually engaging either. I've dedicated enough cycles to this stuff. 

6jacob_cannell7mo
That's not really what I'm saying: it's more like this community naturally creates nearby phyg-like attractors which take some individually varying effort to avoid. If you don't have any significant differences of opinion/viewpoint you may already be in the danger zone. There are numerous historical case examples of individuals spiraling too far in, if you know where to look.