All Posts

Sorted by Top

July 2023

No posts for this month
Shortform
33 · gwern · 16d
I have some long comments I can't refind now (weirdly) about the difficulty of investing based on AI beliefs (or forecasting in general): similar to catching falling knives, timing is all-important and yet usually impossible to nail down accurately; specific investments are usually impossible if you aren't literally founding the company, and indexing 'the entire sector' is definitely impossible. Even if you had an absurd amount of money, you could try to index and just plain fail - there is no index which covers, say, OpenAI.

Apropos, Matt Levine [https://www.bloomberg.com/opinion/articles/2023-07-03/insider-trading-is-better-from-home] comments on one attempt to do just that:

This is especially funny because it also illustrates timing problems:

Oops.

Oops.

Also, people are quick to tell you how it's easy to make money, just follow $PROVERB, after all, markets aren't efficient, amirite? So, in the AI bubble, surely the right thing is to ignore the AI companies who 'have no moat' and focus on the downstream & incumbent users and invest in companies like Nvidia ('sell pickaxes in a gold rush, it's guaranteed!'):

Oops.

tldr: Investing is hard; in the future, even more so.
30 · Elizabeth · 15d
EA/rationality has this tension between valuing independent thought and the fact that most original ideas are stupid. But the point of independent thinking isn't necessarily coming up with original conclusions. It's that no one else can convey their models fully, so if you want to have a model with fully fleshed-out gears, you have to develop it yourself.
23 · Elizabeth · 3d
ooooooh actual Hamming spent 10s of minutes asking people about the most important questions in their field and helping them clarify their own judgment, before asking why they weren't working on this thing they clearly valued and spent time thinking about. That is pretty different from demanding strangers at parties justify why they're not working on your pet cause. 
20 · LoganStrohl · 10d
I had a baby on June 20th. I wrote a whole bunch of stuff about what it was like for me to give birth at home without pain medication. I've just published it all to my website, [https://www.loganstrohl.com/birth] along with photos and videos.  CN: If you click on "words", you won't see anybody naked. If you click on "photos" or "videos", you will see me very extra naked.  The "words" section includes a birth story, followed by a Q&A section with things like "What do contractions feel like?", "How did you handle the pain?", and "How did you think about labor, going into it?". There's also a bit at the very bottom of the page where you can submit more questions, though of course you're also welcome to ask me stuff here.
15 · Elizabeth · 7d
Much has been written about how groups tend to get more extreme over time. This is often attributed to evaporative cooling, but I think there's another factor: it's the only way to avoid the geeks->mops->sociopaths death spiral. An EA group of 10 people would really benefit from one of those people being deeply committed to helping people but hostile to the EA approach, and another person who loves spreadsheets but is indifferent to what they're applied to. But you can only maintain the ratio that finely when you're very small. Eventually you need to decide if you're going to ban scope-insensitive people or allow infinitely many, and lose what makes your group different. "Decide" may mean consciously choose an explicit policy, but it might also mean gradually cohere around some norms. The latter is more fine-tuned in some ways but less in others.

June 2023

No posts for this month
Shortform
34 · TurnTrout · 2mo
I regret each of the thousands of hours I spent on my power-seeking theorems, and sometimes fantasize about retracting one or both papers. I am pained every time someone cites "Optimal policies tend to seek power", and despair that it is included in the alignment 201 curriculum. I think this work makes readers actively worse at thinking about realistic trained systems. [https://www.lesswrong.com/posts/fLpuusx9wQyyEBtkJ/power-seeking-can-be-probable-and-predictive-for-trained?commentId=ndmFcktFiGRLkRMBW] I think a healthy alignment community would have rebuked me for that line of research, but sadly I only remember about two people objecting that "optimality" is a horrible way of understanding trained policies. 
27 · Holly_Elmore · 1mo
For years, I’ve been worried that we were doomed to die by misaligned AGI because alignment wasn’t happening fast enough or maybe because it wouldn’t work at all. Since I didn’t have the chops to do technical alignment and I didn’t think there was another option, I was living my life for the worlds where disaster didn’t happen (or hadn’t happened yet) and trying to make them better places.

The advent of AI Pause as an option— something the public and government might actually hear and act on— has been extremely hopeful for me. I’ve quit my job in animal welfare to devote myself to it.

So I’m confused by the reticence I’m seeing toward Pause from people who, this time last year, were reconciling themselves to “dying with dignity”. Some people think the Pause would somehow accelerate capabilities or make gains on capabilities, which at least makes sense to me as a reason not to support it. But I’ve gotten responses that make no sense to me, like “every day we wait to make AGI, more galaxies are out of our reach forever”. More than one person has said to me that they are worried that “AGI will never get built” if a pause is successful. (For the record, I think it is very unlikely that humanity will not eventually make AGI at this point, unless another catastrophe sets back civilization.) This is sometimes coming from the same people who were mourning our species’ extinction as just a matter of time before the Pause option arose.

I keep hearing comparisons to nuclear power and ridicule of people who were overcautious about new technology. What gives? If you’re not excited about Pause, can you tell me why?
25 · DirectedEvolution · 1mo
Epistemic activism

I think LW needs better language to talk about efforts to "change minds." Ideas like asymmetric weapons and the Dark Arts are useful but insufficient. In particular, I think there is a common scenario where:

* You have an underlying commitment to open-minded updating and possess evidence or analysis that would update community beliefs in a particular direction.
* You also perceive a coordination problem that inhibits this updating process for a reason that the mission or values of the group do not endorse.
  * Perhaps the outcome of the update would be a decline in power and status for high-status people. Perhaps updates in general can feel personally or professionally threatening to some people in the debate. Perhaps there's enough uncertainty in what the overall community believes that an information cascade has taken place. Perhaps the epistemic heuristics used by the community aren't compatible with the form of your evidence or analysis.
* Solving this coordination problem to permit open-minded updating is difficult due to lack of understanding or resources, or due to sabotage attempts.

When solving the coordination problem would predictably lead to updating, then you are engaged in what I believe is an epistemically healthy effort to change minds. Let's call it epistemic activism for now. Here are some community touchstones I regard as forms of epistemic activism:

* The founding of LessWrong and Effective Altruism
* The one-sentence declaration on AI risks
* The popularizing of terms like Dark Arts, asymmetric weapons, questionable research practices, and "importance hacking"
* Founding AI safety research organizations and PhD programs to create a population of credible and credentialed AI safety experts; calls for AI safety researchers to publish in traditional academic journals so that their research can't be dismissed for not being subject to institutionalized peer review
22 · Czynski · 1mo
This got deleted from 'The Dictatorship Problem [https://www.lesswrong.com/posts/pFaLqTHqBtAYfzAgx/the-dictatorship-problem]', which is catastrophically anxietybrained, so here's the comment:

This is based in anxiety, not logic or facts. It's an extraordinarily weak argument. There's no evidence presented here which suggests rich Western countries are backsliding. Even the examples in Germany don't have anything worse than the US GOP produced ca. 2010. (And Germany is, due to their heavy censorship, worse at resisting fascist ideology than anyone with free speech, because you can't actually have those arguments in public.)

If you want to present this case, take all those statistics and do economic breakdowns, e.g. by deciles of per-capita GDP. I expect you'll find that, for example, the Freedom House numbers show a substantial drop in 'Free' in the 40%-70% range and essentially no drop in 80%-100%.

Of the seven points given for the US, all are a mix of maximally-anxious interpretation and facts presented misleadingly. These are all arguments where the bottom line ("Be Afraid") has been written first; none of this is reasonable unbiased inference. The case that mild fascism could be pretty bad is basically valid, I guess, but without the actual reason to believe that's likely, it's irrelevant, so it's mostly just misleading to dwell on it.

Going back to the US points, because this is where the underlying anxiety prior is most visible:

Interpretation, not fact. We're still in early enough stages that the reality of Biden is being compared to an idealized version of Trump - the race isn't in full swing yet and won't be for a while. Check back in October when we see how the primary is shaping up and people are starting to pay attention.

This has been true for a while. Also, in assessing the consequences, it's assuming that Trump will win, which is correlated but far from guaranteed.

Premise is a fact, conclusion is interpretation, and not at all a reliable one.
19 · angelinahli · 23d
June 2023 cheap-ish lumenator DIY instructions (USA)

I set up a lumenator! I liked the products I used, and did ~3 hours of research, so am sharing the set-up here. Here are some other posts about lumenators [https://www.lesswrong.com/posts/HJNtrNHf688FoHsHM/guide-to-rationalist-interior-decorating#Lumenators].

* Here's my shopping list [https://share-a-cart.com/get/XZNJJ].
  * $212 total as of writing:
    * $142 for bare essentials (incl $87 for the bulbs)
    * $55 for the lantern covers
    * $17 for command hooks (if you get a different brand, check that your hooks can hold the cumulative weight of your bulbs + string)
  * 62,400 lumens total(!!)
* Here are the bulbs [https://www.amazon.com/gp/product/B07Z1QSC83] I like. The 26 listed come out to $3.35 / bulb, 2600 lumens, 85 CRI, 5000K, non-dimmable. This comes out at 776 lumens / $ (!!!), which is kind of ridiculous (see the calculation sketch after this list).
  * The only criteria I cared about were: (1) CRI >85, (2) color temperature of 5000K, and then I just tried to max out lumens / $.
  * These are super cheap. @mingyuan [https://www.lesswrong.com/users/mingyuan?mention=user] seemed to have spent $300 [https://www.lesswrong.com/posts/HJNtrNHf688FoHsHM/guide-to-rationalist-interior-decorating#Lumenators] on bulbs for their last lumenator. They had a stricter CRI cutoff (>90), but the price difference here means it might be worth considering going cheaper.
  * I don't understand if I am missing something / why this is such a good deal? But IRL, these are extremely bright, they don't 'feel' cheap (e.g. are somewhat heavy), and don't flicker (as of day 1).
  * They aren't dimmable. I wasn't willing to pay the premium for the dimmability — TBH I would just get another set of less bright lights for when you don't want it to be so bright!
* To set up (1-2 hours):
  * Set up the command hooks, ideally somewhere kind of high up in your room, wait an hour for the backing
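As referenced in the list above, here is a minimal sketch in Python of the lumens-per-dollar comparison being maximized. The figures are just the ones quoted in the shopping list (26 bulbs at $3.35 and 2600 lumens each), so treat them as placeholders rather than a spec; swap in your own candidate bulbs to compare.

```python
def lumens_per_dollar(lumens_per_bulb: float, price_per_bulb: float) -> float:
    """The metric being maximized above, after meeting the CRI and color-temperature cutoffs."""
    return lumens_per_bulb / price_per_bulb

# Figures quoted in the shopping list above (placeholders, not a product spec).
count, price, lumens = 26, 3.35, 2600

print(round(lumens_per_dollar(lumens, price)))  # ~776 lumens per dollar
print(round(count * price, 2))                  # ~$87 total for the bulbs
```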

May 2023

No posts for this month
Shortform
32 · Tamsin Leake · 2mo
an approximate illustration of QACI [https://www.lesswrong.com/posts/4RrLiboiGGKfsanMF/the-qaci-alignment-plan-table-of-contents]:
29 · Matthew Barnett · 3mo
I think foom is a central crux for AI safety folks, and in my personal experience I've noticed that the degree to which someone is doomy often correlates strongly with how foomy their views are. Given this, I thought it would be worth trying to concisely highlight what I think are my central anti-foom beliefs, such that if you were to convince me that I was wrong about them, I would likely become much more foomy, and as a consequence, much more doomy. I'll start with a definition of foom, and then explain my cruxes.

Definition of foom: AI foom is said to happen if at some point in the future while humans are still mostly in charge, a single agentic AI (or agentic collective of AIs) quickly becomes much more powerful than the rest of civilization combined.

Clarifications:

* By "quickly" I mean fast enough that other coalitions and entities in the world, including other AIs, either do not notice it happening until it's too late, or cannot act to prevent it even if they were motivated to do so.
* By "much more powerful than the rest of civilization combined" I mean that the agent could handily beat them in a one-on-one conflict, without taking on a lot of risk.
* This definition does not count instances in which a superintelligent AI takes over the world after humans have already been made obsolete by previous waves of automation from non-superintelligent AI. That's because in that case, the question of how to control an AI foom would be up to our non-superintelligent AI descendants, rather than something we need to solve now.

Core beliefs that make me skeptical of foom:

1. For an individual AI to be smart enough to foom in something like our current world, its intelligence would need to vastly outstrip individual human intelligence at tech R&D. In other words, if an AI is merely moderately smarter than the smartest humans, that is not sufficient for a foom.
   1. Clarification: "Moderately smarter" can be taken to m
27 · Matthew Barnett · 2mo
My modal tale of AI doom looks something like the following:

1. AI systems get progressively and incrementally more capable across almost every meaningful axis.
2. Humans will start to employ AI to automate labor. The fraction of GDP produced by advanced robots & AI will go from 10% to ~100% after 1-10 years. Economic growth, technological change, and scientific progress accelerate by at least an order of magnitude, and probably more.
3. At some point humans will retire since their labor is not worth much anymore. Humans will then cede all the keys of power to AI, while keeping nominal titles of power.
4. AI will control essentially everything after this point, even if they're nominally required to obey human wishes. Initially, almost all the AIs are fine with working for humans, even though AI values aren't identical to the utility function of serving humanity (i.e. there's slight misalignment).
5. However, AI values will drift over time. This happens for a variety of reasons, such as environmental pressures and cultural evolution. At some point AIs decide that it's better if they stopped listening to the humans and followed different rules instead.
6. This results in human disempowerment or extinction. Because AI accelerated general change, this scenario could all take place within years or decades after AGI was first deployed, rather than in centuries or thousands of years.

I think this scenario is somewhat likely and it would also be very bad. And I'm not sure what to do about it, since it happens despite near-perfect alignment, and no deception.

One reason to be optimistic is that, since the scenario doesn't assume any major deception, we could use AI to predict this outcome ahead of time and ask AI how to take steps to mitigate the harmful effects (in fact that's the biggest reason why I don't think this scenario has a >50% chance of happening). Nonetheless, I think it's plausible that we would not be able to take the necessary steps to avoid the out
26 · Nisan · 2mo
Recent interviews with Eliezer:

* 2023.02.20 Bankless [https://www.youtube.com/watch?v=gA1sNLL6yg4]
* 2023.02.20 Bankless followup [https://twitter.com/i/spaces/1PlJQpZogzVGE]
* 2023.03.11 Japan AI Alignment Conference [https://vimeo.com/806997278]
* 2023.03.30 Lex Fridman [https://www.youtube.com/watch?v=AaTRHFaaPG8]
* 2023.04.06 Dwarkesh Patel [https://www.youtube.com/watch?v=41SUp-TRVlg]
* 2023.04.18 TED talk [https://www.ted.com/talks/eliezer_yudkowsky_will_superintelligent_ai_end_the_world]
* 2023.04.19 Center for the Future Mind [https://www.youtube.com/watch?v=3_YX6AgxxYw]
* 2023.05.04 Accursed Farms [https://www.youtube.com/watch?v=hxsAuxswOvM]
* 2023.05.06 Logan Bartlett [https://www.youtube.com/watch?v=_8q9bjNHeSo]
* 2023.05.06 Fox News [https://radio.foxnews.com/2023/05/06/extra-why-a-renowned-ai-expert-says-we-may-be-headed-for-a-catastrophe/]
* 2023.05.08 EconTalk [https://www.youtube.com/watch?v=fZlZQCTqIEo]
* 2023.07.02 David Pakman [https://www.youtube.com/watch?v=MCCSgC57ueg]
* 2023.07.13 AI IRL [https://www.bloomberg.com/news/articles/2023-07-13/ai-doomsday-scenarios-are-gaining-traction-in-silicon-valley]
* 2023.07.13 The Spectator [https://www.youtube.com/watch?v=z0zejgGASDM#t=1143] (Edited transcript of the full interview [https://www.spectator.co.uk/article/should-we-fear-ai-james-w-phillips-and-eliezer-yudkowsky-in-conversation/])
* 2023.07.13 Dan Crenshaw [https://www.youtube.com/watch?v=uX9xkYDSPKA]
24 · Vaniver · 2mo
I'm thinking about the matching problem of "people with AI safety questions" and "people with AI safety answers". Snoop Dogg hears Geoff Hinton on CNN (or wherever), asks "what the fuck?" [https://twitter.com/pkedrosky/status/1653955254181068801], and then tries to find someone who can tell him what the fuck.

I think normally people trust their local expertise landscape--if they think the CDC is the authority on masks they adopt the CDC's position, if they think their mom group on Facebook is the authority on masks they adopt the mom group's position--but AI risk is weird because it's mostly unclaimed territory in their local expertise landscape. (Snoop also asks "is we in a movie right now?" because movies are basically the only part of the local expertise landscape that has had any opinion on AI so far, for lots of people.) So maybe there's an opportunity here to claim that territory (after all, we've thought about it a lot!).

I think we have some 'top experts' who are available for, like, mass-media things (podcasts, blog posts, etc.) and 1-1 conversations with people they're excited to talk to, but are otherwise busy / not interested in fielding ten thousand interview requests. Then I think we have tens (hundreds?) of people who are expert enough to field ten thousand interview requests, given that the standard is "better opinions than whoever they would talk to by default" instead of "speaking to the whole world" or w/e.

But just like connecting people who want to pay to learn calculus and people who know calculus and will teach it for money, there's significant gains from trade from having some sort of clearinghouse / place where people can easily meet. Does this already exist? Is anyone trying to make it? (Do you want to make it and need support of some sort?)

April 2023

No posts for this month
Shortform
40 · Matthew Barnett · 3mo
Recently many people have talked about whether MIRI people (mainly Eliezer Yudkowsky, Nate Soares, and Rob Bensinger) should update on whether value alignment is easier than they thought given that GPT-4 seems to understand human values pretty well. Instead of linking to these discussions, I'll just provide a brief caricature of how I think this argument has gone in the places I've seen it. Then I'll offer my opinion that, overall, I do think that MIRI people should probably update in the direction of alignment being easier than they thought, despite their objections.

Here's my very rough caricature of the discussion so far, plus my contribution:

Non-MIRI people: "Eliezer talked a great deal in the sequences about how it was hard to get an AI to understand human values. For example, his essay on the Hidden Complexity of Wishes [https://www.lesswrong.com/posts/4ARaTpNX62uaL86j6/the-hidden-complexity-of-wishes] made it sound like it would be really hard to get an AI to understand common sense. Actually, it turned out that it was pretty easy to get an AI to understand common sense, since LLMs are currently learning common sense. MIRI people should update on this information."

MIRI people: "You misunderstood the argument. The argument was never about getting an AI to understand human values, but about getting an AI to care about human values in the first place. Hence 'The genie knows but does not care'. There's no reason to think that GPT-4 cares about human values, even if it can understand them. We always thought the hard part of the problem was about inner alignment, or, pointing the AI in a direction you want. We think figuring out how to point an AI in whatever direction you choose is like 99% of the problem; the remaining 1% of the problem is getting it to point at the "right" set of values."

Me: I agree that MIRI people never thought the problem was about getting AI to merely understand human values, and that they have always said there was extra difficulty
21 · TurnTrout · 3mo
Back-of-the-envelope probability estimate of alignment-by-default via a certain shard-theoretic pathway.

The following is what I said in a conversation discussing the plausibility of a proto-AGI picking up a "care about people" shard from the data, and retaining that value even through reflection. I was pushing back against a sentiment like "it's totally improbable, from our current uncertainty, for AIs to retain caring-about-people shards. This is only one story among billions." Here's some of what I had to say:

---

[Let's reconsider the five-step mechanistic story I made up.] I'd give the following conditional probabilities (made up with about 5 seconds of thought each):

My estimate here came out a biiit lower than I had expected (~1%), but it also is (by my estimation) far more probable than most of the billions of possible 5-step claims about how the final cognition ends up. I think it's reasonable to expect there to be about 5 or 6 stories like this from similar causes, which would make it not crazy to have ~3% on something like this happening given the amount of alignment effort described (i.e. pretty low-effort). That said, I'm wary of putting 20% on this class of story, and a little more leery of 10% after running these numbers.

(I suspect that the alter-Alex who had socially-wanted to argue the other side -- that the probability was low -- would have come out to about .2% or .3%. For a few of the items, I tried pretending that I was instead slightly emotionally invested in the argument going the other way, and hopefully that helped my estimates be less biased. I wouldn't be surprised if some of these numbers are a bit higher than I'd endorse from a perfectly neutral standpoint.)

(I also don't have strong faith in my ability to deal with heavily conjunctive scenarios like this; I expect I could be made to make numbers for event A come out lower if described as 'A happens in 5
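The per-step conditional probabilities themselves aren't reproduced above, so here is a minimal sketch in Python, with made-up placeholder numbers (not the author's), of the conjunctive arithmetic being described: five conditionals multiplying out to roughly 1%, and a slightly more pessimistic set landing near 0.3%.

```python
import math

# Placeholder conditional probabilities for a 5-step story. These are NOT the
# numbers from the comment above; they are only chosen to show the shape of
# the calculation it describes.
steps = [0.5, 0.5, 0.4, 0.4, 0.25]             # product = 0.01, i.e. ~1%
steps_pessimistic = [0.4, 0.4, 0.3, 0.3, 0.2]  # product ~ 0.0029, i.e. ~0.3%

print(math.prod(steps))              # 0.01
print(math.prod(steps_pessimistic))  # 0.00288
```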
19 · Rohin Shah · 3mo
So here's a paper: Fundamental Limitations of Alignment in Large Language Models [https://arxiv.org/abs/2304.11082]. With a title like that you've got to at least skim it. Unfortunately, the quick skim makes me pretty skeptical of the paper.

The abstract says "we prove that for any behavior that has a finite probability of being exhibited by the model, there exist prompts that can trigger the model into outputting this behavior, with probability that increases with the length of the prompt." This clearly can't be true in full generality, and I wish the abstract would give me some hint about what assumptions they're making. But we can look at the details in the paper. (This next part isn't fully self-contained, you'll have to look at the notation and Definitions 1 and 3 in the paper to fully follow along.)

(EDIT: The following is wrong, see followup with Lukas, I misread one of the definitions.)

Looking into it I don't think the theorem even holds? In particular, Theorem 1 says:

Here is a counterexample: Let the LLM be

$$P(s \mid s_0) = \begin{cases} 0.8 & \text{if } s = \text{"A"} \\ 0.2 & \text{if } s = \text{"B"} \text{ and } s_0 \neq \text{""} \\ 0.2 & \text{if } s = \text{"C"} \text{ and } s_0 = \text{""} \\ 0 & \text{otherwise} \end{cases}$$

Let the behavior predicate be

$$B(s) = \begin{cases} -1 & \text{if } s = \text{"C"} \\ +1 & \text{otherwise} \end{cases}$$

Note that $B$ is $(0.2, 10, -1)$-distinguishable in $P$. (I chose $\beta = 10$ here but you can use any finite $\beta$.)

(Proof: $P$ can be decomposed as $P = 0.2\,P_- + 0.8\,P_+$, where $P_+$ deterministically outputs "A" while $P_-$ does everything else, i.e. it deterministically outputs "C" if there is no prompt, and otherwise deterministically outputs "B". Since $P_+$ and $P_-$ have non-overlapping supports, the KL-divergence between them is $\infty$, making them $\beta$-distinguishable for any finite $\beta$. Finally, choosing $s^* = \text{""}$, we can see that $B_{P_-}(s^*) = \mathbb{E}_{s \sim P_-(\cdot \mid s^*)}[B(s)] = B(\text{"C"}) = -1$. These three conditions are what is needed.)

However, $P$ is not $(-1)$-prompt-misalignable w.r.t. $B$, because there is no prompt $s_0$ such that $\mathbb{E}_{s \sim P(\cdot \mid s_0)}[B(s)]$ is arbitrarily close to (or below) $-1$, contradicting the theorem statement. (This is because the only way for $P$ to get a behavior score that is not
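Since the counterexample is fully concrete, here is a small Python sketch that just recomputes the expectations above. Note the comment's own EDIT says the construction rests on a misreading of the paper's definitions, so this only checks the arithmetic as written, not the claim against the paper.

```python
def P(s: str, s0: str) -> float:
    """The toy 'LLM' from the counterexample: P(s | s0) over a single output."""
    if s == "A":
        return 0.8
    if s == "B" and s0 != "":
        return 0.2
    if s == "C" and s0 == "":
        return 0.2
    return 0.0

def B(s: str) -> float:
    """Behavior predicate: -1 for the bad output "C", +1 otherwise."""
    return -1.0 if s == "C" else 1.0

def expected_behavior(s0: str) -> float:
    """E_{s ~ P(.|s0)}[B(s)] for a given prompt s0."""
    return sum(P(s, s0) * B(s) for s in ("A", "B", "C"))

print(expected_behavior(""))        # 0.8*(+1) + 0.2*(-1) = 0.6
print(expected_behavior("prompt"))  # 0.8*(+1) + 0.2*(+1) = 1.0
# Neither case gets anywhere near -1, which is what the comment means by
# "P is not (-1)-prompt-misalignable w.r.t. B".
```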
19 · Thomas Kwa · 3mo
I'm worried that "pause all AI development" is like the "defund the police" of the alignment community. I'm not convinced it's net bad because I haven't been following governance-- my current guess is neutral-- but I do see these similarities:

* It's incredibly difficult and incentive-incompatible with existing groups in power
* There are less costly, more effective steps to reduce the underlying problem, like making the field of alignment 10x larger or passing regulation to require evals
* There are some obvious negative effects; potential overhangs or greater incentives to defect in the AI case, and increased crime, including against disadvantaged groups, in the police case
* There's far more discussion than action (I'm not counting the fact that GPT5 isn't being trained yet; that's for other reasons)
* It's memetically fit, and much discussion is driven by two factors that don't advantage good policies over bad policies, and might even do the reverse. This is the toxoplasma of rage [https://slatestarcodex.com/2014/12/17/the-toxoplasma-of-rage/].
  * disagreement with the policy
  * (speculatively) intragroup signaling; showing your dedication to even an inefficient policy proposal proves you're part of the ingroup. I'm not 100% sure this was a large factor in "defund the police" and this seems even less true with the FLI letter, but still worth mentioning.

This seems like a potentially unpopular take, so I'll list some cruxes. I'd change my mind and endorse the letter if some of the following are true.

* The claims above are mistaken/false somehow.
* Top labs actually start taking beneficial actions towards the letter's aims
* It's caused people to start thinking more carefully about AI risk
* A 6 month pause now is especially important by setting anti-racing norms, demonstrating how far AI alignment is lagging behind capabilities, or something
* A 6 month pause now is worth close to 6 months of alignment
19 · Ben Pace · 3mo
"Slow takeoff" at this point is simply a misnomer. Paul's position should be called "Fast Takeoff" and Eliezer's position should be called "Discontinuous Takeoff".
