All Posts

Sorted by Magic (New & Upvoted)

June 2023

No posts for this month
Shortform
25 · DirectedEvolution · 5d
Epistemic activism

I think LW needs better language to talk about efforts to "change minds." Ideas like asymmetric weapons and the Dark Arts are useful but insufficient. In particular, I think there is a common scenario where:

* You have an underlying commitment to open-minded updating and possess evidence or analysis that would update community beliefs in a particular direction.
* You also perceive a coordination problem that inhibits this updating process, for a reason that the mission or values of the group do not endorse.
* Perhaps the outcome of the update would be a decline in power and status for high-status people. Perhaps updates in general can feel personally or professionally threatening to some people in the debate. Perhaps there's enough uncertainty about what the overall community believes that an information cascade has taken place. Perhaps the epistemic heuristics used by the community aren't compatible with the form of your evidence or analysis.
* Solving this coordination problem to permit open-minded updating is difficult due to lack of understanding or resources, or because of sabotage attempts.

When solving the coordination problem would predictably lead to updating, then you are engaged in what I believe is an epistemically healthy effort to change minds. Let's call it epistemic activism for now. Here are some community touchstones I regard as forms of epistemic activism:

* The founding of LessWrong and Effective Altruism
* The one-sentence declaration on AI risks
* The popularizing of terms like Dark Arts, asymmetric weapons, questionable research practices, and "importance hacking"
* Founding AI safety research organizations and PhD programs to create a population of credible and credentialed AI safety experts; calls for AI safety researchers to publish in traditional academic journals so that their research can't be dismissed for not having been subject to institutionalized peer review
1 comment
21 · Czynski · 5d
This got deleted from 'The Dictatorship Problem [https://www.lesswrong.com/posts/pFaLqTHqBtAYfzAgx/the-dictatorship-problem]', which is catastrophically anxiety-brained, so here's the comment:

This is based in anxiety, not logic or facts. It's an extraordinarily weak argument. There's no evidence presented here which suggests rich Western countries are backsliding. Even the examples in Germany don't have anything worse than the US GOP produced ca. 2010. (And Germany is, due to their heavy censorship, worse at resisting fascist ideology than anyone with free speech, because you can't actually have those arguments in public.)

If you want to present this case, take all those statistics and do economic breakdowns, e.g. by deciles of per-capita GDP. I expect you'll find that, for example, the Freedom House numbers show a substantial drop in 'Free' in the 40%-70% range and essentially no drop in 80%-100%.

Of the seven points given for the US, all are a mix of maximally-anxious interpretation and facts presented misleadingly. These are all arguments where the bottom line ("Be Afraid") has been written first; none of this is reasonable unbiased inference. The case that mild fascism could be pretty bad is basically valid, I guess, but without an actual reason to believe that's likely, it's irrelevant, so it's mostly just misleading to dwell on it.

Going back to the US points, because this is where the underlying anxiety prior is most visible:

Interpretation, not fact. We're still in early enough stages that the reality of Biden is being compared to an idealized version of Trump - the race isn't in full swing yet and won't be for a while. Check back in October when we see how the primary is shaping up and people are starting to pay attention.

This has been true for a while. Also, in assessing the consequences, it's assuming that Trump will win, which is correlated but far from guaranteed.

Premise is a fact, conclusion is interpretation, and not at all a reliable one.
1 comment
18 · TurnTrout · 15d
I regret each of the thousands of hours I spent on my power-seeking theorems, and sometimes fantasize about retracting one or both papers. I am pained every time someone cites "Optimal policies tend to seek power", and despair that it is included in the alignment 201 curriculum. I think this work makes readers actively worse at thinking about realistic trained systems. [https://www.lesswrong.com/posts/fLpuusx9wQyyEBtkJ/power-seeking-can-be-probable-and-predictive-for-trained?commentId=ndmFcktFiGRLkRMBW] I think a healthy alignment community would have rebuked me for that line of research, but sadly I only remember about two people objecting that "optimality" is a horrible way of understanding trained policies. 
9 comments
15 · So8res · 15d
I was recently part of a group-chat where some people I largely respect were musing about this paper [https://archive.is/y8urV], this post [https://slatestarcodex.com/2018/07/24/value-differences-as-differently-crystallized-metaphysical-heuristics/], and some of Scott Aaronson's recent "maybe intelligence makes things more good" type reasoning. Here are my replies, which seemed worth putting somewhere public:

See also instrumental convergence [https://www.lesswrong.com/tag/instrumental-convergence].

And then in reply to someone pointing out that the paper was perhaps trying to argue that most minds tend to wind up with similar values because of the fact that all minds are (in some sense) rewarded in training for developing similar drives:
12 · nim · 10h
Reading https://www.lesswrong.com/posts/nwJCzszw8gGjPTihM/i-still-think-it-s-very-unlikely-we-re-observing-alien and pondering the Bigfoot thing.

On the one hand, We Have Cameras Everywhere(TM). On the other hand -- pick any area of the Pacific Northwest and look at a map of where the permanent roads are. Pull it up side by side with a map of an area that you're familiar with. Zoom in on both, to a magnification you'd consider reasonable for imagining things at walking-around scale. Pan around on the PNW map and try to find a permanent road. It'll take a minute.

Most land out here grows timber, sure. Timber is harvested roughly once every 30-50 years. At this point, I'd bet that every square mile of the area has been visited by humans. Forestry land is heavily trafficked once every few decades; conservation land is surveyed and studied and sometimes visited by tourists. The question, like a missing term in the Drake Equation, is when. The L term captures for-how-long, sure, but only implies a difference between "someone sent us radio signals for 100 years around 1000 AD" and "someone sent us radio signals for 100 years around 2000 AD".

I have two cats who hate me. (Not their fault; they came from an animal-hoarding situation, so they're probably kinda traumatized.) They seem to think I'm noisy and conspicuous and I stink, and to their perceptions I certainly do. They despise being perceived. I can tell that they're in my house because I can check every nook and cranny and learn their favorite hidey-holes, and the food I put out for them gets eaten, and their litter boxes get full. But if this was out in the woods instead of the artificial and tightly controlled environment of my home, I would likely not know they're around, just like most hikers don't know when they're being watched by a mountain lion. The cats hate the places where I spend time, just as th

May 2023

No posts for this month
Shortform
32 · Tamsin Leake · 1mo
an approximate illustration of QACI [https://www.lesswrong.com/posts/4RrLiboiGGKfsanMF/the-qaci-alignment-plan-table-of-contents]:
4 comments
29 · Matthew Barnett · 1mo
I think foom is a central crux for AI safety folks, and in my personal experience I've noticed that the degree to which someone is doomy often correlates strongly with how foomy their views are. Given this, I thought it would be worth trying to concisely highlight what I think are my central anti-foom beliefs, such that if you were to convince me that I was wrong about them, I would likely become much more foomy, and as a consequence, much more doomy. I'll start with a definition of foom, and then explain my cruxes.

Definition of foom: AI foom is said to happen if at some point in the future, while humans are still mostly in charge, a single agentic AI (or agentic collective of AIs) quickly becomes much more powerful than the rest of civilization combined.

Clarifications:

* By "quickly" I mean fast enough that other coalitions and entities in the world, including other AIs, either do not notice it happening until it's too late, or cannot act to prevent it even if they were motivated to do so.
* By "much more powerful than the rest of civilization combined" I mean that the agent could handily beat them in a one-on-one conflict, without taking on a lot of risk.
* This definition does not count instances in which a superintelligent AI takes over the world after humans have already been made obsolete by previous waves of automation from non-superintelligent AI. That's because in that case, the question of how to control an AI foom would be up to our non-superintelligent AI descendants, rather than something we need to solve now.

Core beliefs that make me skeptical of foom:

1. For an individual AI to be smart enough to foom in something like our current world, its intelligence would need to vastly outstrip individual human intelligence at tech R&D. In other words, if an AI is merely moderately smarter than the smartest humans, that is not sufficient for a foom.
   1. Clarification: "Moderately smarter" can be taken to m
13 comments
27 · Matthew Barnett · 1mo
My modal tale of AI doom looks something like the following:

1. AI systems get progressively and incrementally more capable across almost every meaningful axis.
2. Humans will start to employ AI to automate labor. The fraction of GDP produced by advanced robots & AI will go from 10% to ~100% after 1-10 years. Economic growth, technological change, and scientific progress accelerate by at least an order of magnitude, and probably more.
3. At some point humans will retire, since their labor is not worth much anymore. Humans will then cede all the keys of power to AI, while keeping nominal titles of power.
4. AI will control essentially everything after this point, even if they're nominally required to obey human wishes. Initially, almost all the AIs are fine with working for humans, even though AI values aren't identical to the utility function of serving humanity (i.e. there's slight misalignment).
5. However, AI values will drift over time. This happens for a variety of reasons, such as environmental pressures and cultural evolution. At some point AIs decide that it's better if they stopped listening to the humans and followed different rules instead.
6. This results in human disempowerment or extinction. Because AI accelerated general change, this scenario could all take place within years or decades after AGI was first deployed, rather than in centuries or thousands of years.

I think this scenario is somewhat likely, and it would also be very bad. And I'm not sure what to do about it, since it happens despite near-perfect alignment and no deception.

One reason to be optimistic is that, since the scenario doesn't assume any major deception, we could use AI to predict this outcome ahead of time and ask AI how to take steps to mitigate the harmful effects (in fact, that's the biggest reason why I don't think this scenario has a >50% chance of happening). Nonetheless, I think it's plausible that we would not be able to take the necessary steps to avoid the out
12 comments
26 · Nisan · 1mo
Recent interviews with Eliezer:

* 2023.02.20 Bankless [https://www.youtube.com/watch?v=gA1sNLL6yg4]
* 2023.02.20 Bankless followup [https://twitter.com/i/spaces/1PlJQpZogzVGE]
* 2023.03.11 Japan AI Alignment Conference [https://vimeo.com/806997278]
* 2023.03.30 Lex Fridman [https://www.youtube.com/watch?v=AaTRHFaaPG8]
* 2023.04.06 Dwarkesh Patel [https://www.youtube.com/watch?v=41SUp-TRVlg]
* 2023.04.18 TED talk [https://files.catbox.moe/qdwops.mp4]
* 2023.04.19 Center for the Future Mind [https://www.youtube.com/watch?v=3_YX6AgxxYw]
* 2023.05.04 Accursed Farms [https://www.youtube.com/watch?v=hxsAuxswOvM]
* 2023.05.06 Logan Bartlett [https://www.youtube.com/watch?v=_8q9bjNHeSo]
* 2023.05.08 EconTalk [https://www.youtube.com/watch?v=fZlZQCTqIEo]
24 · Vaniver · 1mo
I'm thinking about the matching problem of "people with AI safety questions" and "people with AI safety answers". Snoop Dogg hears Geoff Hinton on CNN (or wherever), asks "what the fuck?" [https://twitter.com/pkedrosky/status/1653955254181068801], and then tries to find someone who can tell him what the fuck. I think normally people trust their local expertise landscape--if they think the CDC is the authority on masks they adopt the CDC's position, if they think their mom group on Facebook is the authority on masks they adopt the mom group's position--but AI risk is weird because it's mostly unclaimed territory in their local expertise landscape. (Snoop also asks "is we in a movie right now?" because movies are basically the only part of the local expertise landscape that has had any opinion on AI so far, for lots of people.)

So maybe there's an opportunity here to claim that territory (after all, we've thought about it a lot!). I think we have some 'top experts' who are available for, like, mass-media things (podcasts, blog posts, etc.) and 1-1 conversations with people they're excited to talk to, but are otherwise busy / not interested in fielding ten thousand interview requests. Then I think we have tens (hundreds?) of people who are expert enough to field ten thousand interview requests, given that the standard is "better opinions than whoever they would talk to by default" instead of "speaking to the whole world" or w/e.

But just like connecting people who want to pay to learn calculus and people who know calculus and will teach it for money, there's significant gains from trade from having some sort of clearinghouse / place where people can easily meet. Does this already exist? Is anyone trying to make it? (Do you want to make it and need support of some sort?)
2 comments

April 2023

No posts for this month
Shortform
38 · Matthew Barnett · 2mo
Recently many people have talked about whether MIRI people (mainly Eliezer Yudkowsky, Nate Soares, and Rob Bensinger) should update on whether value alignment is easier than they thought given that GPT-4 seems to understand human values pretty well. Instead of linking to these discussions, I'll just provide a brief caricature of how I think this argument has gone in the places I've seen it. Then I'll offer my opinion that, overall, I do think that MIRI people should probably update in the direction of alignment being easier than they thought, despite their objections. Here's my very rough caricature of the discussion so far, plus my contribution:

Non-MIRI people: "Eliezer talked a great deal in the sequences about how it was hard to get an AI to understand human values. For example, his essay on the Hidden Complexity of Wishes [https://www.lesswrong.com/posts/4ARaTpNX62uaL86j6/the-hidden-complexity-of-wishes] made it sound like it would be really hard to get an AI to understand common sense. Actually, it turned out that it was pretty easy to get an AI to understand common sense, since LLMs are currently learning common sense. MIRI people should update on this information."

MIRI people: "You misunderstood the argument. The argument was never about getting an AI to understand human values, but about getting an AI to care about human values in the first place. Hence 'The genie knows but does not care'. There's no reason to think that GPT-4 cares about human values, even if it can understand them. We always thought the hard part of the problem was inner alignment, or pointing the AI in a direction you want. We think figuring out how to point an AI in whatever direction you choose is like 99% of the problem; the remaining 1% of the problem is getting it to point at the "right" set of values."

Me: I agree that MIRI people never thought the problem was about getting AI to merely understand human values, and that they have always said there was extra difficulty
10 comments
21 · TurnTrout · 2mo
Back-of-the-envelope probability estimate of alignment-by-default via a certain shard-theoretic pathway. The following is what I said in a conversation discussing the plausibility of a proto-AGI picking up a "care about people" shard from the data, and retaining that value even through reflection. I was pushing back against a sentiment like "it's totally improbable, from our current uncertainty, for AIs to retain caring-about-people shards. This is only one story among billions." Here's some of what I had to say:

---

[Let's reconsider the five-step mechanistic story I made up.] I'd give the following conditional probabilities (made up with about 5 seconds of thought each):

My estimate here came out a biiit lower than I had expected (~1%), but it also is (by my estimation) far more probable than most of the billions of possible 5-step claims about how the final cognition ends up. I think it's reasonable to expect there to be about 5 or 6 stories like this from similar causes, which would make it not crazy to have ~3% on something like this happening given the amount of alignment effort described (i.e. pretty low-effort). That said, I'm wary of putting 20% on this class of story, and a little more leery of 10% after running these numbers.

(I suspect that the alter-Alex who had socially wanted to argue the other side -- that the probability was low -- would have come out to about .2% or .3%. For a few of the items, I tried pretending that I was instead slightly emotionally invested in the argument going the other way, and hopefully that helped my estimates be less biased. I wouldn't be surprised if some of these numbers are a bit higher than I'd endorse from a perfectly neutral standpoint.)

(I also don't have strong faith in my ability to deal with heavily conjunctive scenarios like this; I expect I could be made to make numbers for event A come out lower if described as 'A happens in 5
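To make the conjunctive arithmetic above concrete: the per-step numbers below are hypothetical stand-ins (the excerpt does not reproduce the actual conditional probabilities from the post), but they show how a five-step story lands near ~1%, and how modestly shading each step reaches the ~0.2-0.3% figure a motivated skeptic might produce.

```python
import math

# Hypothetical conditional probabilities for a five-step story,
# each moderately likely on its own (NOT the actual numbers from the post).
steps = [0.5, 0.4, 0.45, 0.35, 0.4]

# The whole story requires every step, so the probabilities multiply.
p_story = math.prod(steps)
print(f"{p_story:.3f}")  # ≈ 0.013, i.e. about 1%

# Shading each factor down by 0.1, as a slightly motivated skeptic might,
# cuts the conjunction by roughly 4x:
p_skeptic = math.prod(p - 0.1 for p in steps)
print(f"{p_skeptic:.3f}")  # ≈ 0.003, i.e. about 0.3%
```

This is the usual fragility of heavily conjunctive estimates: a shift that looks small per step compounds multiplicatively across the conjunction.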
2 comments
19 · Thomas Kwa · 2mo
I'm worried that "pause all AI development" is like the "defund the police" of the alignment community. I'm not convinced it's net bad, because I haven't been following governance -- my current guess is neutral -- but I do see these similarities:

* It's incredibly difficult and incentive-incompatible with existing groups in power.
* There are less costly, more effective steps to reduce the underlying problem, like making the field of alignment 10x larger or passing regulation to require evals.
* There are some obvious negative effects: potential overhangs or greater incentives to defect in the AI case, and increased crime, including against disadvantaged groups, in the police case.
* There's far more discussion than action (I'm not counting the fact that GPT-5 isn't being trained yet; that's for other reasons).
* It's memetically fit, and much discussion is driven by two factors that don't advantage good policies over bad policies, and might even do the reverse. This is the toxoplasma of rage [https://slatestarcodex.com/2014/12/17/the-toxoplasma-of-rage/].
  * disagreement with the policy
  * (speculatively) intragroup signaling; showing your dedication to even an inefficient policy proposal proves you're part of the ingroup. I'm not 100% sure this was a large factor in "defund the police", and this seems even less true with the FLI letter, but still worth mentioning.

This seems like a potentially unpopular take, so I'll list some cruxes. I'd change my mind and endorse the letter if some of the following are true:

* The claims above are mistaken/false somehow.
* Top labs actually start taking beneficial actions towards the letter's aims.
* It's caused people to start thinking more carefully about AI risk.
* A 6-month pause now is especially important by setting anti-racing norms, demonstrating how far AI alignment is lagging behind capabilities, or something.
* A 6-month pause now is worth close to 6 months of alignment
7 comments
19 · Ben Pace · 2mo
"Slow takeoff" at this point is simply a misnomer. Paul's position should be called "Fast Takeoff" and Eliezer's position should be called "Discontinuous Takeoff".
3 comments
16 · Daniel Kokotajlo · 2mo
$100 bet between me & Connor Leahy:

(1) Six months from today, Paul Christiano (or ARC with Paul Christiano's endorsement) will NOT have made any public statements drawing a 'red line' through any quantitative eval (anything that has a number attached to it, that is intended to measure an AI-risk-relevant factor, whether or not it actually succeeds at measuring that factor well), e.g. "If a model achieves X score on the Y benchmark, said model should not be deployed and/or deploying said model would be a serious risk of catastrophe." Connor at 95%, Daniel at 45%.

(2) If such a 'red line' is produced, GPT-4 will be below it this year. Both at 95%, for an interpretation of GPT-4 that includes AutoGPT stuff (like what ARC did) but not fine-tuning.

(3) If such a 'red line' is produced, and GPT-4 is below it on first evals, but later tests show it to actually be above (such as by using different prompts or other testing methodology), the red line will be redefined or the test declared faulty rather than calls made for GPT-4 to be pulled from circulation. Connor at 80%, Daniel at 40%, for the same interpretation of GPT-4.

(4) If ARC calls for GPT-4 to be pulled from circulation, OpenAI will not comply. Connor at 99%, Daniel at 40%, for the same interpretation of GPT-4.

All of these bets expire at the end of 2024, i.e. if the "if" condition hasn't been met by the end of 2024, we call off the whole thing rather than waiting to see if it gets met later.

Help wanted: Neither of us has a good idea of how to calculate fair betting odds for these things. Since Connor's credences are high and mine are merely middling, presumably it shouldn't be the case that either I pay him $100 or he pays me $100. We are open to suggestions about what the fair betting odds should be.
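On the "help wanted" question, one simple convention -- a sketch of one possible scheme, not the only one; the function name and the $100 normalization are illustrative -- is to price the bet at the mean of the two credences, which makes both parties' subjective expected values equal:

```python
def equal_ev_stakes(p_a: float, p_b: float, max_stake: float = 100.0):
    """Bet on event E between A (credence p_a, betting E happens) and
    B (credence p_b < p_a, betting it doesn't), priced at probability q.
    A's subjective EV is (p_a - q) * size and B's is (q - p_b) * size,
    which are equal exactly when q = (p_a + p_b) / 2.
    Returns (q, A's payment if not-E, B's payment if E), scaled so the
    larger payment equals max_stake.
    """
    q = (p_a + p_b) / 2
    size = max_stake / max(q, 1 - q)  # normalize bigger payment to max_stake
    return q, q * size, (1 - q) * size

# Bet (1): Connor at 95%, Daniel at 45%.
q, connor_pays, daniel_pays = equal_ev_stakes(0.95, 0.45)
print(round(q, 2), round(connor_pays, 2), round(daniel_pays, 2))
# → 0.7 100.0 42.86
```

Under this pricing, Connor risks $100 to win ~$43, reflecting his much stronger credence; other conventions (e.g. averaging odds ratios rather than probabilities, or Kelly-style sizing) give somewhat different numbers.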
3 comments

March 2023

No posts for this month
Shortform
36 · Portia · 3mo
Why don't most AI researchers engage with LessWrong? What valuable criticism can be learnt from that, and how can it be pragmatically changed?

My girlfriend just returned from a major machine learning conference. She judged that less than 1/18 of the content was dedicated to AI safety rather than capability, despite an increasing number of the people at the conference being confident of AGI in the future (like, roughly 10-20 years, though people avoided nailing down a specific number). And the safety talk was more of a shower thought.

And yet LessWrong and MIRI/Eliezer are not mentioned in these circles. I do not mean that they are dissed, or disproven; I mean you can be at the full conference on the topic, with the top people in the world, and have no hint of a sliver of an idea that any of this exists. They generally don't read what you read and write, they don't take part in what you do, or let you take part in what they do. You aren't enough in the right journals, the right conferences, to be seen. From the perspective of academia, and the companies working on these things -- the people who are actually making decisions on how they are releasing their models and what policies are being made -- what is going on here is barely heard, if at all. There are notable exceptions, like Bostrom, but as a consequence of that, he is viewed with scepticism within many academic circles.

Why do you think AI researchers are making the decision not to engage with you? What lessons are to be learned from that for tactical strategy changes that will be crucial to affect developments? What part of it reflects legitimate criticism you need to take to heart? And what will you do about it, in light of the fact that you cannot control what AI researchers do, regardless of whether their stance is well-founded or irrational?

I am genuinely curious how you view this, especially in light of changes you can make, rather than changes you expect researchers to make. So far, I feel a lot of the criticism has only harde
7 comments
24 · jimrandomh · 3mo
(I wrote this comment for the HN announcement, but missed the time window to be able to get a visible comment on that thread. I think a lot more people should be writing comments like this and trying to get the top comment spots on key announcements, to shift the social incentive away from continuing the arms race.)

On one hand, GPT-4 is impressive, and probably useful. If someone made a tool like this in almost any other domain, I'd have nothing but praise. But unfortunately, I think this release, and OpenAI's overall trajectory, is net bad for the world.

Right now there are two concurrent arms races happening. The first is between AI labs, trying to build the smartest systems they can as fast as they can. The second is the race between advancing AI capability and AI alignment, that is, our ability to understand and control these systems. Right now, OpenAI is the main force driving the arms race in capabilities–not so much because they're far ahead in the capabilities themselves, but because they're slightly ahead and are pushing the hardest for productization. Unfortunately, at the current pace of advancement in AI capability, I think a future system will reach the level of being a recursively self-improving superintelligence before we're ready for it. GPT-4 is not that system, but I don't think there's all that much time left. And OpenAI has put us in a situation where humanity is not, collectively, able to stop at the brink; there are too many companies racing too closely, and they have every incentive to deny the dangers until it's too late.

Five years ago, AI alignment research was going very slowly, and people were saying that a major reason for this was that we needed some AI systems to experiment with. Starting around GPT-3, we've had those systems, and alignment research has been undergoing a renaissance. If we could _stop there_ for a few years, scale no further, invent no more tricks for squeezing more performance out of the same amount of compute, I
3 comments
20 · Ryan Kidd · 3mo
Main takeaways from a recent AI safety conference:

* If your foundation model is one small amount of RL away from being dangerous and someone can steal your model weights, fancy alignment techniques don’t matter. Scaling labs cannot currently prevent state actors from hacking their systems and stealing their stuff. Infosecurity is important to alignment.
* Scaling labs might have some incentive to go along with the development of safety standards, as it prevents smaller players from undercutting their business model and provides a credible defense against lawsuits regarding unexpected side effects of deployment (especially with how many tech restrictions the EU seems to pump out). Once the foot is in the door, more useful safety standards to prevent x-risk might be possible.
* Near-term commercial AI systems that can be jailbroken to elicit dangerous output might empower more bad actors to make bioweapons or cyberweapons. Preventing the misuse of near-term commercial AI systems or slowing down their deployment seems important.
* When a skill is hard to teach, like making accurate predictions over long time horizons in complicated situations or developing a “security mindset [https://www.lesswrong.com/tag/security-mindset],” try treating humans like RL agents. For example, Ph.D. students might only get ~3 data points on how to evaluate a research proposal ex ante, whereas professors might have ~50. Novice Ph.D. students could be trained to predict good research decisions by predicting outcomes on a set of expert-annotated examples of research quandaries and then receiving “RL updates” based on what the expert did and what occurred.
20 · evhub · 3mo
Listening to this John Oliver segment [https://www.youtube.com/watch?v=Sqa8Zo2XWc4&skip_registered_account_check=true&themeRefresh=1], I feel like getting broad support behind transparency-based safety standards might be more possible than I previously thought. He emphasizes the "if models are doing some bad behavior, the creators should be able to tell us why" point a bunch, and it's in fact a super reasonable point. It seems to me like we really might be able to get enough broad consensus on that sort of a point to get labs to agree to some sort of standard based on it.
2 comments
15 · Quadratic Reciprocity · 3mo
Reflections on Bay Area visit

GPT-4 generated TL;DR (mostly endorsed but eh):

1. The beliefs of prominent AI safety researchers may not be as well-founded as expected, and people should be cautious about taking their beliefs too seriously.
2. There is a tendency for people to overestimate their own knowledge and confidence in their expertise.
3. Social status plays a significant role in the community, with some individuals treated like "popular kids."
4. Important decisions are often made in casual social settings, such as lunches and parties.
5. Geographical separation of communities can be helpful for idea spread and independent thought.
6. The community has a tendency to engage in off-the-cuff technical discussions, which can be both enjoyable and miscalibrated.
7. Shared influences, such as Eliezer's Sequences and HPMOR, foster unique and enjoyable conversations.
8. The community is more socially awkward and tolerant of weirdness than other settings, leading to more direct communication.

I was recently in Berkeley and interacted a bunch with the longtermist EA / AI safety community there. Some thoughts on that:

I changed my mind about how much I should trust the beliefs of prominent AI safety researchers. It seems like they have thought less deeply about things to arrive at their current beliefs, and are less intimidatingly intelligent and wise, than I would have expected. The problem isn't that they're overestimating their capabilities and how much they know, but that some newer people take the more senior people's beliefs and intuitions more seriously than they should.

I noticed that many people knew a lot about their own specific area and not as much about others' work as I would have expected. This observation makes me more likely to point out when I think someone is missing something, instead of assuming they've read the same things I have and so already accounted for the thing I was going to say.

It seemed li
