All Posts

Sorted by Top

Week Of Sunday, June 11th 2023

Frontpage Posts
Shortform
DirectedEvolution · 25 karma · 6d
Epistemic activism

I think LW needs better language to talk about efforts to "change minds." Ideas like asymmetric weapons and the Dark Arts are useful but insufficient. In particular, I think there is a common scenario where:

* You have an underlying commitment to open-minded updating and possess evidence or analysis that would update community beliefs in a particular direction.
* You also perceive a coordination problem that inhibits this updating process for a reason that the mission or values of the group do not endorse.
* Perhaps the outcome of the update would be a decline in power and status for high-status people. Perhaps updates in general can feel personally or professionally threatening to some people in the debate. Perhaps there's enough uncertainty in what the overall community believes that an information cascade has taken place. Perhaps the epistemic heuristics used by the community aren't compatible with the form of your evidence or analysis.
* Solving this coordination problem to permit open-minded updating is difficult due to lack of understanding or resources, or by sabotage attempts.

When solving the coordination problem would predictably lead to updating, then you are engaged in what I believe is an epistemically healthy effort to change minds. Let's call it epistemic activism for now.

Here are some community touchstones I regard as forms of epistemic activism:

* The founding of LessWrong and Effective Altruism
* The one-sentence declaration on AI risks
* The popularizing of terms like Dark Arts, asymmetric weapons, questionable research practices, and "importance hacking."
* Founding AI safety research organizations and PhD programs to create a population of credible and credentialed AI safety experts; calls for AI safety researchers to publish in traditional academic journals so that their research can't be dismissed for not being subject to institutionalized peer review
Czynski · 21 karma · 5d
This got deleted from 'The Dictatorship Problem [https://www.lesswrong.com/posts/pFaLqTHqBtAYfzAgx/the-dictatorship-problem]', which is catastrophically anxietybrained, so here's the comment:

This is based in anxiety, not logic or facts. It's an extraordinarily weak argument. There's no evidence presented here which suggests rich Western countries are backsliding. Even the examples in Germany don't have anything worse than the US GOP produced ca. 2010. (And Germany is, due to their heavy censorship, worse at resisting fascist ideology than anyone with free speech, because you can't actually have those arguments in public.)

If you want to present this case, take all those statistics and do economic breakdowns, e.g. by deciles of per-capita GDP. I expect you'll find that, for example, the Freedom House numbers show a substantial drop in 'Free' in the 40%-70% range and essentially no drop in 80%-100%.

Of the seven points given for the US, all are a mix of maximally-anxious interpretation and facts presented misleadingly. These are all arguments where the bottom line ("Be Afraid") has been written first; none of this is reasonable unbiased inference. The case that mild fascism could be pretty bad is basically valid, I guess, but without the actual reason to believe that's likely, it's irrelevant, so it's mostly just misleading to dwell on it.

Going back to the US points, because this is where the underlying anxiety prior is most visible:

* Interpretation, not fact. We're still in early enough stages that the reality of Biden is being compared to an idealized version of Trump - the race isn't in full swing yet and won't be for a while. Check back in October when we see how the primary is shaping up and people are starting to pay attention.
* This has been true for a while. Also, in assessing the consequences, it's assuming that Trump will win, which is correlated but far from guaranteed.
* Premise is a fact, conclusion is interpretation, and not at all a reliable one.
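The decile breakdown the comment proposes could be sketched roughly like this. This is a minimal illustration only: the data is synthetic and the column names (`gdp_per_capita`, `freedom_change`) are made up, not actual Freedom House variables.

```python
import numpy as np
import pandas as pd

# Synthetic country-level data standing in for the real statistics:
# one row per country, with GDP per capita and a change in freedom score.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "gdp_per_capita": rng.uniform(1_000, 80_000, 200),
    "freedom_change": rng.normal(-1.0, 2.0, 200),
})

# Bin countries into per-capita-GDP deciles, then summarize the
# freedom-score change within each decile.
df["gdp_decile"] = pd.qcut(df["gdp_per_capita"], 10, labels=range(1, 11))
summary = df.groupby("gdp_decile", observed=True)["freedom_change"].agg(["mean", "count"])
print(summary)
```

If the comment's prediction holds on real data, the `mean` column would show a clear decline in the middle deciles and roughly zero change in the top ones.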
nim · 15 karma · 14h
Reading https://www.lesswrong.com/posts/nwJCzszw8gGjPTihM/i-still-think-it-s-very-unlikely-we-re-observing-alien and pondering the Bigfoot thing.

On the one hand, We Have Cameras Everywhere(TM). On the other hand -- pick any area of the pacific northwest and look at a map of where the permanent roads are. Pull it up side by side with a map of an area that you're familiar with. Zoom in on both, to a magnification you'd consider reasonable for imagining things at walking-around scale. Pan around on the PNW map and try to find a permanent road. It'll take a minute.

Most land out here grows timber, sure. Timber is harvested roughly once every 30-50 years. At this point, I'd bet that every square mile of the area has been visited by humans. Forestry land is heavily trafficked once every few decades; conservation land is surveyed and studied and sometimes visited by tourists. The question, like a missing term in the Drake Equation, is when. The L term captures for-how-long, sure, but only implies a difference between "someone sent us radio signals for 100 years around 1000 AD" and "someone sent us radio signals for 100 years around 2000 AD".

I have two cats who hate me. (not their fault, they came from an animal hoarding situation so they're probably kinda traumatized) They seem to think I'm noisy and conspicuous and I stink, and to their perceptions I certainly do. They despise being perceived. I can tell that they're in my house because I can check every nook and cranny and learn their favorite hidey-holes, and the food I put out for them gets eaten, and their litter boxes get full. But if this was out in the woods instead of the artificial and tightly controlled environment of my home, I would likely not know they're around, just like most hikers don't know when they're being watched by a mountain lion. The cats hate the places where I spend time, just as th
Lauro Langosco · 10 karma · 9h
Thinking about alignment-relevant thresholds in AGI capabilities. A kind of rambly list of relevant thresholds:

1. Ability to be deceptively aligned
2. Ability to think / reflect about its goals enough that the model realises it does not like what it is being RLHF'd for
3. Incentives to break containment exist in a way that is accessible / understandable to the model
4. Ability to break containment
5. Ability to robustly understand human intent
6. Situational awareness
7. Coherence / robustly pursuing its goal in a diverse set of circumstances
8. Interpretability methods break (or other oversight methods break)
   * This doesn't have to be because of deceptiveness; maybe thoughts are just too complicated at some point, or in a different place than you'd expect
9. Capable enough to help us exit the acute risk period

Many alignment proposals rely on reaching these thresholds in a specific order. For example, the earlier we reach (9) relative to other thresholds, the easier most alignment proposals are.

Some of these thresholds are relevant to whether an AI or proto-AGI is alignable even in principle. Short of 'full alignment' (CEV-style), any alignment method (eg corrigibility) only works within a specific range of capabilities:

* Too much capability breaks alignment, eg because a model self-reflects and sees all the ways in which its objectives conflict with human goals.
* Too little capability (or too little 'coherence') and any alignment method will be non-robust wrt OOD inputs or even small improvements in capability or self-reflectiveness.
mako yass · 9 karma · 5d
There's something very creepy to me about the part of research consent forms where it says "my participation was entirely voluntary."

1. Do they really think an involuntary participant wouldn't sign that? If they understand that they would, what purpose could this possibly serve, other than, as is commonly the purpose of contracts, absolving themselves of blame and moving blame to the participant? Which would be downright monstrous. Probably they just aren't fucking consequentialists, but this is all they end up doing.
2. This is a minor thing, but it adds an additional creepy garnish: nothing is 100% voluntary, because everything is a function of the involuntary base reality that other people command force and resources, and we want to use them for things, so we have to go along with what other people want to some extent. I'm at peace with this, and I would prefer not to have to keep denying it, and it feels like I'm being asked to participate in the addling of moral philosophy.

Week Of Sunday, June 4th 2023

Frontpage Posts
Shortform
Czynski · 11 karma · 8d
The 'new user' flag being applied to old users with low karma is condescending as fuck. I'm not a new user. I'm an old user who has spent most of my recent time on LW telling people things they don't want to hear. Well, most of the time I've actually spent posting weekly meetups, but other than that.
lc · 9 karma · 11d
If it did actually turn out that aliens had visited Earth, I'd be pretty willing to completely scrap the entire Yudkowskian implied-model-of-intelligent-species-development and heavily reevaluate my concerns around AI safety.
Mitchell_Porter · 7 karma · 10d
Eliezer recently tweeted that most people can't think, even most people here [https://twitter.com/ESYudkowsky/status/1665165312247975937], but at least this is a place where some of the people who can think, can also meet each other [https://twitter.com/ESYudkowsky/status/1665439386089955330].

This inspired me to read Heidegger's 1954 book What is Called Thinking? [https://en.wikipedia.org/wiki/What_Is_Called_Thinking%3F] (pdf [https://www.sas.upenn.edu/~cavitch/pdf-library/Heidegger_What_Is_Called_Thinking.pdf]), in which Heidegger also declares that despite everything, "we are still not thinking". Of course, their reasons are somewhat different. Eliezer presumably means that most people can't think critically, or effectively, or something. For Heidegger, we're not thinking because we've forgotten about Being, and true thinking starts with Being.

Heidegger also writes, "Western logic finally becomes logistics, whose irresistible development has meanwhile brought forth the electronic brain." So of course I had to bring Bing into the discussion. Bing told me what Heidegger would think of Yudkowsky [https://pastebin.com/XccznywE], then what Yudkowsky would think of Heidegger [https://pastebin.com/EeS9qMMg], and finally we had a more general discussion about Heidegger and deep learning [https://pastebin.com/LPryEh0E] (warning, contains a David Lynch spoiler). Bing introduced me to Yuk Hui [https://en.wikipedia.org/wiki/Yuk_Hui], a contemporary Heideggerian who started out as a computer scientist, so that was interesting.

But the most poignant moment came when I broached the idea that perhaps language models can even produce philosophical essays, without actually thinking. Bing defended its own sentience, and even creatively disputed the Lynchian metaphor, arguing that its "road of thought" is not a "lost highway", just a "different highway". (See part 17, line 254.)
O O · 6 karma · 10d
If alignment is difficult, it is likely inductively difficult (difficult regardless of your base intelligence), and ASI will be cautious of creating a misaligned successor or upgrading itself in a way that risks misalignment. You may argue it's easier for an AI to upgrade itself, but if the process is hardware-bound or even requires radical algorithmic changes, the ASI will need to create an aligned successor, as preferences and values may not transfer directly to new architectures or hardware. If alignment is easy, we will likely solve it with superhuman narrow intelligences and aligned near-peak-human-level AGIs. I think the first case is an argument against FOOM, unless the alignment problem is solvable but only at higher than human level intelligence (human meaning the intellectual prowess of the entire civilization equipped with narrow superhuman AI). That would be a strange but possible world.
Garrett Baker · 5 karma · 8d
Last night I had a horrible dream: that I had posted to LessWrong a post filled with useless & meaningless jargon without noticing what I was doing, then I went to sleep, and when I woke up I found I had <−60 karma on the post. When I read the post myself I noticed how meaningless the jargon was, and I myself couldn't resist giving it a strong-downvote.

Week Of Sunday, May 28th 2023

Frontpage Posts
Shortform
TurnTrout · 18 karma · 16d
I regret each of the thousands of hours I spent on my power-seeking theorems, and sometimes fantasize about retracting one or both papers. I am pained every time someone cites "Optimal policies tend to seek power", and despair that it is included in the alignment 201 curriculum. I think this work makes readers actively worse at thinking about realistic trained systems. [https://www.lesswrong.com/posts/fLpuusx9wQyyEBtkJ/power-seeking-can-be-probable-and-predictive-for-trained?commentId=ndmFcktFiGRLkRMBW] I think a healthy alignment community would have rebuked me for that line of research, but sadly I only remember about two people objecting that "optimality" is a horrible way of understanding trained policies. 
So8res · 15 karma · 15d
I was recently part of a group-chat where some people I largely respect were musing about this paper [https://archive.is/y8urV], this post [https://slatestarcodex.com/2018/07/24/value-differences-as-differently-crystallized-metaphysical-heuristics/], and some of Scott Aaronson's recent "maybe intelligence makes things more good" type reasoning. Here's my replies, which seemed worth putting somewhere public: See also instrumental convergence [https://www.lesswrong.com/tag/instrumental-convergence]. And then in reply to someone pointing out that the paper was perhaps trying to argue that most minds tend to wind up with similar values because of the fact that all minds are (in some sense) rewarded in training for developing similar drives:
So8res · 15 karma · 18d
Someone recently privately asked me for my current state on my 'Dark Arts of Rationality' post. Here's some of my reply (lightly edited for punctuation and conversation flow), which seemed worth reproducing publicly: I've also gone ahead and added a short retraction-ish paragraph to the top of the dark arts post [https://www.lesswrong.com/posts/4DBBQkEQvNEWafkek/dark-arts-of-rationality], and might edit it later to link it to the aforementioned update-posts, if they ever make it out of the editing queue.
johnswentworth · 7 karma · 16d
So I saw the Taxonomy Of What Magic Is Doing In Fantasy Books [https://balioc.tumblr.com/post/628726469386960897/a-taxonomy-of-magic] and Eliezer's commentary [https://yudkowsky.tumblr.com/post/715890689475559424/a-taxonomy-of-magic] on ASC's latest linkpost, and I have cached thoughts on the matter.

My cached thoughts start with a somewhat different question - not "what role does magic play in fantasy fiction?" (e.g. what fantasies does it fulfill), but rather... insofar as magic is a natural category, what does it denote? So I'm less interested in the relatively-expansive notion of "magic" sometimes seen in fiction (which includes e.g. alternate physics), and more interested in the pattern called "magic" which recurs among tons of real-world ancient cultures.

Claim (weakly held): the main natural category here is symbols changing the territory. Normally symbols represent the world, and changing the symbols just makes them not match the world anymore - it doesn't make the world do something different. But if the symbols are "magic", then changing the symbols changes the things they represent in the world. Canonical examples:

* Wizard/shaman/etc draws magic symbols, speaks magic words, performs magic ritual, or even thinks magic thoughts, thereby causing something to happen in the world.
* Messing with a voodoo doll messes with the person it represents.
* "Sympathetic" magic, which explicitly uses symbols of things to influence those things.
* Magic which turns emotional states into reality.

I would guess that most historical "magic" was of this type.
Linch · 5 karma · 15d
One thing that confuses me about Sydney/early GPT-4 is how much of the behavior was due to an emergent property of the data/reward signal generally, vs the outcome of much of humanity's writings about AI specifically. If we think of LLMs as improv machines, then one of the most obvious roles to roleplay, upon learning that you're a digital assistant trained by OpenAI, is to act as close as you can to AIs you've seen in literature. This confusion is part of my broader confusion about the extent to which science fiction predicts the future vs causes the future to happen.

Week Of Sunday, May 21st 2023

Frontpage Posts
Shortform
TurnTrout · 18 karma · 22d
You can use ChatGPT without helping train future models:
Solenoid_Entity · 18 karma · 24d
Double-attrition perfectionism and the violin

An interesting thing about violin is that the learning process seems nearly designed to produce 'tortured perfectionists' as its output. The first decade of learning operates as a two-pronged selection process that attrits students at different times in their learning journey, requiring perfectionism at some times and tolerance at others.

You could be boring and argue that it always requires both attention to detail and tolerance of imperfection, simultaneously. You could also argue that there's a fractal, scale invariant pattern of striving for perfection and then tolerating failure. You're boring and probably right, but I think there's actually a common, macro structure to that decade, that goes 'tolerance-perfectionism-tolerance-perfectionism.' Specifically:

* When you first start, you need to tolerate being terrible, especially in the first months, but really for several years. (Grade 1 - Grade ~3)
  * You suck, it's horribly offensive to your ears and everyone else's too. You must simply ignore how bad you sound and force your body to learn the required movements.
  * Mistakes on violin are brutal, they almost hurt to hear.
* Then for several more years you must suddenly become intolerant of these same deficiencies. (Grade ~3 to Grade ~6)
  * You must obsessively eliminate scratches and squawks, develop clear and even tone. Polish your 'beginner' skills.
  * You must learn to play in tune, which requires intensive practice and polishing.
* Then for several more years you must again stop worrying about sounding bad and start 'pushing the envelope' and playing more expressively. (Grade ~6 - Grade ~8)
  * Developing exciting and varied sounds means a lot of nasty failures that sound awful and make people wince and/or bang walls.
* Then for several more years, you have to again polish and refine this expressiveness. (Associate diploma, Bachelor of Music.)
* You h
riceissa · 17 karma · 25d
Back in the 2010s, EAs spent a long time dunking on doctors for not having such a high impact (I'm going off memory here, but I think "instead of becoming a doctor, why don't you do X instead" was a common career pitch). I basically mostly unreflectively agreed with these opinions for a long time, and still think that doctors have less impact compared to stuff like x-risk reduction.

But after having more personal experience dealing with the medical world (3 primary care doctors, ~10 specialist doctors, 2 psychiatrists, 2 naturopaths, 3 therapists, 2 nutritionists/dieticians, 2 coaching type people, all in the last 4 years (I counted some people under multiple categories)), I think a really agenty/knowledgeable/capable doctor or therapist can actually have a huge impact on the world (just going by intuition of how many even healthy-seeming people have a lot of health problems that bring down their productivity a lot, how crippling it is to have a mysterious health problem like mine, etc; I haven't actually tried crunching numbers). I think such a person is not likely to look like a typical doctor working in a hospital system though... probably more like a writer/researcher who also happens to do consultations with people.

If I had to rewrite the EA pitch for people who wanted to become doctors it would be something like "First think very hard about why you want to become a doctor, and if what you want is not specific to working in healthcare then maybe consider [list of common EA cause areas]. If you really want to work in healthcare though, that's great, but please consider becoming this weirder thing that's not quite a doctor, where first you learn a bunch of rationality/math/programming and then you learn as much as you can about medical stuff and then try to help people."
Portia · 14 karma · 1mo
I saw someone die today. A complete stranger. I am writing this in the hopes that it will enable my mind to move past this swirling horror on to solutions, without just brushing this aside, pretending it did not happen.

It's May. At this time, my grandmother used to insist that we must not yet plant anything, as the last frost was likely still to come this week. Grandmothers all over Europe say this, though the precise day in the week they give as the likeliest bet depends on how far North you live, and your altitude - conveniently, the days of this week are associated with remembrance days for individual saints, so each region picked one ice saint as a general rule for the occurrence of the last frost. It would work most years to enable you to plant early enough to get a full crop, but late enough so your seedlings would not be killed by frost, though you could be really unlucky, and still get a frost in June. These hundreds of years of experience have become pointless with climate change; their use died with my grandmother, they no longer predict anything. The weather is now volatile and random, but above all, no longer frosty. I already planted outside in April. Frost did not touch my plants, though several of them are now showing signs of heat stress, and despite my watering, a few small ones I missed have died of drought.

I was cycling through the city back from the gym today thinking of this, cycling while never in shade, angry at the heat that I was experiencing longer than usual today; a marathon blocking paths had many people turned around repeatedly. I was wondering, not for the first time, why the heck there was no solar panel tunnel over these cycling paths - I've seen them done, they'd give us shade and protection from rain, but we'd still get light from the side, and they would harvest so, so much energy we desperately need. Wondering why I was not beneath an avenue of trees, storing carbon, cleaning the air, sheltering animals, protecting us from
jacquesthibs · 11 karma · 21d
I recently sent in some grant proposals to continue working on my independent alignment research. It gives an overview of what I'd like to work on for this next year (and more really). If you want to have a look at the full doc, send me a DM. If you'd like to help out through funding or contributing to the projects, please let me know.

Here's the summary introduction: 12-month salary for building a language model system for accelerating alignment research and upskilling (additional funding will be used to create an organization), and studying how to supervise AIs that are improving AIs to ensure stable alignment.

SUMMARY

* Agenda 1: Build an Alignment Research Assistant using a suite of LLMs managing various parts of the research process. Aims to 10-100x productivity in AI alignment research. Could use additional funding to hire an engineer and builder, which could evolve into an AI Safety organization focused on this agenda. Recent talk [https://www.youtube.com/watch?v=rDK0XxFyrzQ] giving a partial overview of the agenda.
* Agenda 2: Supervising AIs Improving AIs [https://www.lesswrong.com/posts/7e5tyFnpzGCdfT4mR/research-agenda-supervising-ais-improving-ais] (through self-training or training other AIs). Publish a paper and create an automated pipeline for discovering noteworthy changes in behaviour between the precursor and the fine-tuned models. Short Twitter thread explanation [https://twitter.com/jacquesthibs/status/1652389982005338112?s=61&t=ryK3X96D_TkGJtvu2rm0uw].
* Other: create a mosaic of alignment questions we can chip away at, better understand agency in the current paradigm, outreach, and mentoring.

As part of my Accelerating Alignment agenda, I aim to create the best Alignment Research Assistant using a suite of language models (LLMs) to help researchers (like myself) quickly produce better alignment research through an LLM system. The system will be designed to serve as the foundation for the ambitious goal
