All Posts

Sorted by Top

2023

Shortform
53 · paulfchristiano · 5mo
One way of viewing takeoff speed disagreements within the safety community is: most people agree that growth will eventually be explosively fast, and so the question is "how big an impact does AI have prior to explosive growth?" We could quantify this by looking at the economic impact of AI systems prior to the point when AI is powerful enough to double output each year. (We could also quantify it via growth dynamics, but I want to try to get some kind of evidence further in advance, which requires looking at AI in particular---on both views, AI only has a large impact on total output fairly close to the singularity.)

The "slow takeoff" view is that general AI systems will grow to tens of trillions of dollars a year of revenue in the years prior to explosive growth, and so by the time they have automated the entire economy it will look like a natural extension of the prior trend. (Cost might be a more robust measure than revenue, especially if AI output is primarily reinvested by large tech companies, and especially on a slow takeoff view with a relatively competitive market for compute driven primarily by investors. Revenue itself is very sensitive to unimportant accounting questions, like what transactions occur within a firm vs between firms.)

The fast takeoff view is that the pre-takeoff impact of general AI will be... smaller. I don't know exactly how small, but let's say somewhere between $10 million and $10 trillion, spanning 6 orders of magnitude. (This reflects a low end that's like "ten people in a basement" and a high end that's just a bit shy of the slow takeoff view.)

It seems like growth in AI has already been large enough to provide big updates in this discussion. I'd guess total revenue from general and giant deep learning systems[1] will probably be around $1B in 2023 (and perhaps much higher if there is a lot of stuff I don't know about). It also looks on track to grow to $10 billion over the next 2-3 years if not faster. It seems easy to see h…
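A back-of-the-envelope sketch of the growth arithmetic here, treating the post's rough figures ($1B in 2023, $10B roughly 2-3 years later, ~$10T/year as the slow-takeoff scale) as illustrative assumptions rather than data:

```python
import math

# Rough figures from the post (illustrative, not data):
revenue_2023 = 1e9      # ~$1B of revenue from general deep learning systems in 2023
revenue_later = 1e10    # ~$10B a few years later
years = 2.5             # "over the next 2-3 years"

# Implied annual growth factor if that trajectory holds
annual_growth = (revenue_later / revenue_2023) ** (1 / years)
print(f"Implied annual growth factor: {annual_growth:.1f}x")

# Years of continued growth at that rate to reach the "slow takeoff" scale (~$10T/year)
slow_takeoff_scale = 1e13
years_to_scale = math.log(slow_takeoff_scale / revenue_2023) / math.log(annual_growth)
print(f"Years from 2023 to ~$10T/year at that rate: {years_to_scale:.0f}")
```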
47 · Adam Scherlis · 6mo
EDIT: I originally saw this in Janus's tweet here: https://twitter.com/repligate/status/1619557173352370186

Something fun I just found out about: ChatGPT perceives the phrase " SolidGoldMagikarp" (with an initial space) as the word "distribute", and will respond accordingly. It is completely unaware that that's not what you typed.

This happens because the BPE tokenizer saw the string " SolidGoldMagikarp" a few times in its training corpus, so it added a dedicated token for it, but that string almost never appeared in ChatGPT's own training data so it never learned to do anything with it. Instead, it's just a weird blind spot in its understanding of text.
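A minimal sketch of the tokenizer side of this, assuming the `tiktoken` package and the GPT-2/GPT-3-era BPE vocabulary (exposed by `tiktoken` as `r50k_base`), which is where the glitch string reportedly got its dedicated token:

```python
# Sketch: inspect how the GPT-2/GPT-3-era BPE vocabulary tokenizes the glitch string.
# Assumes the `tiktoken` package is installed (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("r50k_base")  # GPT-2/GPT-3-era BPE vocabulary

for s in [" SolidGoldMagikarp", " hello world"]:
    token_ids = enc.encode(s)
    pieces = [enc.decode([t]) for t in token_ids]
    print(f"{s!r} -> {len(token_ids)} token(s): {pieces}")

# If a string was common enough in the BPE training corpus, it gets a single
# dedicated token -- even if the language model itself rarely saw that token later,
# which is what produces the blind-spot behavior described above.
```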
40 · Matthew Barnett · 3mo
Recently many people have talked about whether MIRI people (mainly Eliezer Yudkowsky, Nate Soares, and Rob Bensinger) should update on whether value alignment is easier than they thought given that GPT-4 seems to understand human values pretty well. Instead of linking to these discussions, I'll just provide a brief caricature of how I think this argument has gone in the places I've seen it. Then I'll offer my opinion that, overall, I do think that MIRI people should probably update in the direction of alignment being easier than they thought, despite their objections.

Here's my very rough caricature of the discussion so far, plus my contribution:

Non-MIRI people: "Eliezer talked a great deal in the sequences about how it was hard to get an AI to understand human values. For example, his essay on the Hidden Complexity of Wishes [https://www.lesswrong.com/posts/4ARaTpNX62uaL86j6/the-hidden-complexity-of-wishes] made it sound like it would be really hard to get an AI to understand common sense. Actually, it turned out that it was pretty easy to get an AI to understand common sense, since LLMs are currently learning common sense. MIRI people should update on this information."

MIRI people: "You misunderstood the argument. The argument was never about getting an AI to understand human values, but about getting an AI to care about human values in the first place. Hence 'The genie knows but does not care'. There's no reason to think that GPT-4 cares about human values, even if it can understand them. We always thought the hard part of the problem was about inner alignment, or, pointing the AI in a direction you want. We think figuring out how to point an AI in whatever direction you choose is like 99% of the problem; the remaining 1% of the problem is getting it to point at the 'right' set of values."

Me: I agree that MIRI people never thought the problem was about getting AI to merely understand human values, and that they have always said there was extra difficulty…
36 · Portia · 4mo
Why don't most AI researchers engage with Less Wrong? What valuable criticism can be learnt from that, and how can it be pragmatically changed?

My girlfriend just returned from a major machine learning conference. She judged that less than 1/18 of the content was dedicated to AI safety rather than capability, despite an increasing number of the people at the conference being confident of AGI in the future (like, roughly 10-20 years, though people avoided nailing down a specific number). And the safety talk was more of a shower thought.

And yet, Less Wrong, MIRI, and Eliezer are not mentioned in these circles. I do not mean that they are dissed, or disproven; I mean you can be at the full conference on the topic, with the top people in the world, and have no hint of a sliver of an idea that any of this exists. They generally don't read what you read and write, they don't take part in what you do, or let you take part in what they do. You aren't in enough of the right journals and the right conferences to be seen. From the perspective of academia, and the companies working on these things (the people who are actually making decisions on how they are releasing their models and what policies are being made), what is going on here is barely heard, if at all. There are notable exceptions, like Bostrom, but as a consequence of that, he is viewed with scepticism within many academic circles.

Why do you think AI researchers are making the decision not to engage with you? What lessons are to be learned from that for tactical strategy changes that will be crucial to affect developments? What part of it reflects legitimate criticism you need to take to heart? And what will you do about it, in light of the fact that you cannot control what AI researchers do, regardless of whether their choice is well-founded or irrational?

I am genuinely curious how you view this, especially in light of changes you can make, rather than changes you expect researchers to make. So far, I feel a lot of the criticism has only hardened…
35 · lc · 6mo
The Nick Bostrom fiasco is instructive: never make public apologies to an outrage machine. If Nick had just ignored whoever it was trying to blackmail him, it would have been on them to assert the importance of a twenty-five-year-old, deliberately provocative email, and things might not have escalated to the point of mild drama. When he tried to "get ahead of things" by issuing an apology, he ceded that the email was in fact socially significant despite its age, and that he did in fact have something to apologize for, and so opened himself up to the Standard Replies that the apology is not genuine, that he's secretly evil, etc.

Instead, if you are ever put in this situation, just say nothing. Don't try to defend yourself. Definitely don't volunteer for a struggle session. Treat outrage artists like the police: you do not prevent the police from filing charges against you by driving to the station and attempting to "explain yourself" to detectives, or by writing and publishing a letter explaining how sorry you are. At best you will inflate the airtime of the controversy by responding to it; at worst you'll be creating the controversy in the first place.

2022

Shortform
95 · So8res · 9mo
A number of years ago, when LessWrong was being revived from its old form to its new form, I did not expect the revival to work. I said as much at the time. For a year or two in the middle there, the results looked pretty ambiguous to me. But by now it's clear that I was just completely wrong--I did not expect the revival to work as well as it has to date. Oliver Habryka in particular wins Bayes points off of me. Hooray for being right while I was wrong, and for building something cool!
63 · evhub · 1y
This is a list of random, assorted AI safety ideas that I think somebody should try to write up and/or work on at some point. I have a lot more than this in my backlog, but these are some that I specifically selected to be relatively small, single-post-sized ideas that an independent person could plausibly work on without much oversight. That being said, I think it would be quite hard to do a good job on any of these without at least chatting with me first—though feel free to message me if you'd be interested.

* What would be necessary to build a good auditing game [https://www.alignmentforum.org/posts/cQwT8asti3kyA62zc/automating-auditing-an-ambitious-concrete-technical-research] benchmark?
* How would AI safety AI [https://www.alignmentforum.org/posts/fYf9JAwa6BYMt8GBj/link-a-minimal-viable-product-for-alignment] work? What is necessary for it to go well?
* How do we avoid end-to-end training while staying competitive with it? Can we use transparency on end-to-end models to identify useful modules to train non-end-to-end?
* What would it look like to do interpretability on end-to-end trained probabilistic models instead of end-to-end trained neural networks?
* Suppose you had a language model that you knew was in fact a good generative model of the world and that this property continued to hold regardless of what you conditioned it on. Furthermore, suppose you had some prompt that described some agent for the language model to simulate (Alice) that in practice resulted in aligned-looking outputs. Is there a way we could use different conditionals to get at whether or not Alice was deceptive (e.g. prompt the model with "DeepMind develops perfect transparency tools and provides an opportunity for deceptive models to come clean and receive a prize before they're discovered.").
* Argue for the importance of ensuring that the state-of-the-art in "using AI for alignment [https://www.alignmentforum.org/posts/fYf9J…
62 · TurnTrout · 1y
Rationality exercise: Take a set of Wikipedia articles on topics which trainees are somewhat familiar with, and then randomly select a small number of claims to negate (negating the immediate context as well, so that you can't just syntactically discover which claims were negated). For example [https://en.wikipedia.org/wiki/Developmental_psychology]:

Sometimes, trainees will be given a totally unmodified article. For brevity, the articles can be trimmed of irrelevant sections.

Benefits:

* Addressing key rationality skills. Noticing confusion; being more confused by fiction than fact; actually checking claims against your models of the world.
  * If you fail, either the article wasn't negated skillfully ("5 people died in 2021" -> "4 people died in 2021" is not the right kind of modification), you don't have good models of the domain, or you didn't pay enough attention to your confusion.
  * Either of the last two is good to learn.
* Scalable across participants. Many people can learn from each modified article.
* Scalable across time. Once a modified article has been produced, it can be used repeatedly.
* Crowdsourcable. You can put out a bounty for good negated articles, run them in a few control groups, and then pay based on some function of how good the article was.

Unlike original alignment research or CFAR technique mentoring, article negation requires skills more likely to be present outside of Rationalist circles. I think the key challenge is that the writer must be able to match the style, jargon, and flow of the selected articles.
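A minimal sketch of how the claim-selection step could be tooled, under the assumption that articles are already saved as plain text; the negation itself stays manual, since (as noted above) the writer must match the article's style. File name and parameters here are hypothetical:

```python
# Sketch: pick candidate sentences from a trimmed article for a human editor to negate.
import random
import re

def select_claims_for_negation(article_text: str, k: int = 3, seed: int = 0) -> list[str]:
    """Pick k candidate sentences that a human editor will manually negate,
    rewriting the surrounding context so the change isn't syntactically obvious."""
    rng = random.Random(seed)
    # Crude sentence split; a real pipeline would use a proper sentence tokenizer.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", article_text)
                 if len(s.strip()) > 40]
    return rng.sample(sentences, min(k, len(sentences)))

# Hypothetical trimmed article saved locally:
article = open("developmental_psychology.txt").read()
for claim in select_claims_for_negation(article, k=3, seed=0):
    print("-", claim)
```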
61 · lc · 10mo
It is both absurd, and intolerably infuriating, just how many people on this forum think it's acceptable to claim they have figured out how qualia/consciousness works, while neither explaining how one would go about making my laptop experience an emotion like 'nostalgia', nor presenting their framework for enumerating the set of all possible qualitative experiences[1]. When it comes to this particular subject, rationalists are like crackpot physicists with a pet theory of everything, except rationalists go "Huh? Gravity?" when you ask them to explain how their theory predicts gravity, and then start arguing with you about whether gravity needs to be explained by a theory of everything at all. You people make me want to punch my drywall sometimes.

For the record: the purpose of having a "theory of consciousness" is so it can tell us which blobs of matter feel particular things under which specific circumstances, and teach others how to make new blobs of matter that feel particular things. Down to the level of having a field of AI anaesthesiology. If your theory of consciousness does not do this, perhaps because the sum total of your brilliant insights is "systems feel 'things' when they're, y'know, smart, and have goals. Like humans!", then you have embarrassingly missed the mark.

1. ^ (Including the ones not experienced by humans naturally, and/or only accessible via narcotics, and/or involving senses humans do not have or that just happen not to be produced in the animal kingdom.)
57 · Daniel Kokotajlo · 1y
The whiteboard in the CLR common room depicts my EA journey in meme format:

2021

Shortform
65 · Rob Bensinger · 2y
Shared with permission, a Google Doc exchange confirming Eliezer still finds the arguments for alignment optimism, slower takeoffs, etc. unconvincing.

Caveat: this was a private reply I saw and wanted to share (so people know EY's basic epistemic state, and therefore probably the state of other MIRI leadership). This wasn't an attempt to write an adequate public response to any of the public arguments put forward for alignment optimism or non-fast takeoff, etc., and isn't meant to be a replacement for public, detailed, object-level discussion. (Though I don't know when/if MIRI folks plan to produce a proper response, and if I expected such a response soonish I'd probably have just waited and posted that instead.)
56 · Vanessa Kosoy · 2y
Text whose primary goal is conveying information (as opposed to emotion, experience or aesthetics) should be skimming friendly. Time is expensive, words are cheap. Skimming is a vital mode of engaging with text, either to evaluate whether it deserves a deeper read or to extract just the information you need. As a reader, you should nurture your skimming skills. As a writer, you should treat skimmers as a legitimate and important part of your target audience. Among other things it means:

* Good title and TLDR/abstract
* Clear and useful division into sections
* Putting the high-level picture and conclusions first, the technicalities and detailed arguments later. Never leave the reader clueless about where you're going with something for a long time.
* Visually emphasize the central points and make them as self-contained as possible. For example, in the statement of mathematical theorems avoid terminology whose definition is hidden somewhere in the bulk of the text.
51 · Buck · 2y
[this is a draft that I shared with a bunch of friends a while ago; they raised many issues that I haven't addressed, but might address at some point in the future]

In my opinion, and AFAICT the opinion of many alignment researchers, there are problems with aligning superintelligent models that no alignment techniques so far proposed are able to fix. Even if we had a full kitchen sink approach where we'd overcome all the practical challenges of applying amplification techniques, transparency techniques, adversarial training, and so on, I still wouldn't feel that confident that we'd be able to build superintelligent systems that were competitive with unaligned ones, unless we got really lucky with some empirical contingencies that we will have no way of checking except for just training the superintelligence and hoping for the best. Two examples:

* A simplified version of the hope with IDA is that we'll be able to have our system make decisions in a way that never had to rely on searching over uninterpretable spaces of cognitive policies. But this will only be competitive if IDA can do all the same cognitive actions that an unaligned system can do, which is probably false, eg cf Inaccessible Information.
* The best we could possibly hope for with transparency techniques is: For anything that a neural net is doing, we are able to get the best possible human understandable explanation of what it's doing, and what we'd have to change in the neural net to make it do something different. But this doesn't help us if the neural net is doing things that rely on concepts that it's fundamentally impossible for humans to understand, because they're too complicated or alien. It seems likely to me that these concepts exist. And so systems will be much weaker if we demand interpretability.

Even though these techniques are fundamentally limited, I think there are still several arguments in favor of sorting out the practical details of how to…
50 · davidad · 2y
I want to go a bit deep here on "maximum entropy" and misunderstandings thereof by the straw-man Humbali character [https://www.alignmentforum.org/posts/ax695frGJEzGxFBK4/biology-inspired-agi-timelines-the-trick-that-never-works#:~:text=Humbali%3A%20%C2%A0I%20feel,other%20people%20think%3F], mostly to clarify things for myself, but also in the hopes that others might find it useful. I make no claim to novelty here—I think all this ground was covered by Jaynes (1968 [https://bayes.wustl.edu/etj/articles/prior.pdf])—but I do have a sense that this perspective (and the measure-theoretic intuition behind it) is not pervasive around here, the way Bayesian updating is.

First, I want to point out that entropy of a probability measure $p$ is only definable relative to a base measure $\mu$, as follows:

$$H_\mu(p) = -\int_X \frac{dp}{d\mu}(x)\,\log\frac{dp}{d\mu}(x)\,d\mu(x)$$

(The derivatives notated here denote Radon–Nikodym derivatives [https://en.wikipedia.org/wiki/Radon%E2%80%93Nikodym_theorem]; the integral is Lebesgue [https://en.wikipedia.org/wiki/Lebesgue_integration].)

Shannon's formulae, the discrete $H(p) = -\sum_i p(x_i)\log p(x_i)$ and the continuous $H(p) = -\int_X p(x)\log p(x)\,dx$, are the special cases of this where $\mu$ is assumed to be counting measure or Lebesgue measure, respectively. These formulae actually treat $p$ as having a subtly different type than "probability measure": namely, they treat it as a density with respect to counting measure (a "probability mass function") or a density with respect to Lebesgue measure (a "probability density function"), and implicitly supply the corresponding $\mu$.

If you're familiar with Kullback–Leibler divergence [https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence] ($D_{\mathrm{KL}}$), and especially if you've heard $D_{\mathrm{KL}}$ called "relative entropy," you may have already surmised that $H_\mu(p) = -D_{\mathrm{KL}}(p\,\|\,\mu)$. Usually, KL divergence is defined with both arguments being probability measures (measures that add up to 1), but that's not required for it to be well-defined (what is required is absolute con…
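A small numerical sketch of the identity $H_\mu(p) = -D_{\mathrm{KL}}(p\,\|\,\mu)$ on a finite space, with toy numbers and assuming numpy; it also shows Shannon entropy falling out as the special case where $\mu$ is counting measure:

```python
# Numerical sketch of H_mu(p) = -D_KL(p || mu) for a discrete space.
# The toy distribution and base measures below are illustrative.
import numpy as np

p = np.array([0.5, 0.25, 0.25])       # a probability measure on a 3-point space
counting = np.array([1.0, 1.0, 1.0])  # counting measure assigns mass 1 to each point

def entropy_relative_to(p, mu):
    """H_mu(p) = -sum_x (dp/dmu)(x) * log((dp/dmu)(x)) * mu(x) on a finite space."""
    density = p / mu                  # Radon-Nikodym derivative dp/dmu
    return -np.sum(density * np.log(density) * mu)

shannon = -np.sum(p * np.log(p))
print(entropy_relative_to(p, counting), shannon)  # identical: ~1.04 nats

# With a different (non-normalized) base measure, "entropy" changes meaning:
lopsided = np.array([2.0, 1.0, 1.0])
print(entropy_relative_to(p, lopsided))           # equals -D_KL(p || lopsided)
print(-np.sum(p * np.log(p / lopsided)))          # same value, computed as -D_KL
```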
49 · jimrandomh · 2y
In a comment here [https://www.lesswrong.com/posts/Btrmh6T62tB4g9RMc/why-those-who-care-about-catastrophic-and-existential-risk?commentId=ifnD8DCqX2FFTagoq], Eliezer observed that: And my reply to this grew into something that I think is important enough to make as a top-level shortform post.

It's worth noticing that this is not a universal property of high-paranoia software development, but an unfortunate consequence of using the C programming language and of systems programming. In most programming languages and most application domains, crashes only rarely point to security problems. OpenBSD is this paranoid, and needs to be this paranoid, because its architecture is fundamentally unsound (albeit unsound in a way that all the other operating systems born in the same era are also unsound). This presents a number of useful analogies that may be useful for thinking about future AI architectural choices.

C has a couple of operations (use-after-free, buffer overflow, and a few multithreading-related things) which expand false beliefs in one area of the system into major problems in seemingly unrelated areas. The core mechanic of this is that, once you've corrupted a pointer or an array index, this generates opportunities to corrupt other things. Any memory-corruption attack surface you search through winds up yielding more opportunities to corrupt memory, in a supercritical way, eventually yielding total control over the process and all its communication channels. If the process is an operating system kernel, there's nothing left to do; if it's, say, the renderer process of a web browser, then the attacker gets to leverage its communication channels to attack other processes, like the GPU driver and the compositor. This has the same sub-or-supercriticality dynamic.

Some security strategies try to keep there from being any entry points into the domain where there might be supercritically-expanding access: memory-safe languages, linters, code reviews. C…
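An abstract illustration of the sub- vs. supercritical dynamic described above (a toy branching-process model, not anything from the original post): each discovered corruption opportunity yields some average number R of further opportunities.

```python
# Toy model: each corruption opportunity yields R further opportunities on average.
def expected_total_opportunities(R: float, steps: int) -> float:
    """Expected opportunities discovered after `steps` rounds, starting from one."""
    return sum(R**k for k in range(steps + 1))

for R in (0.5, 0.9, 1.1, 2.0):
    print(f"R={R}: after 20 rounds ~{expected_total_opportunities(R, 20):.1f} opportunities")

# R < 1: the total converges (subcritical; the attack surface stays contained).
# R > 1: it grows without bound (supercritical), the regime where an attacker
# eventually parlays one corruption into control of the whole process.
```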

2020

Shortform
73 · Richard_Ngo · 3y
One fairly strong belief of mine is that Less Wrong's epistemic standards are not high enough to make solid intellectual progress here. So far my best effort to make that argument has been in the comment thread starting here [https://www.lesswrong.com/posts/HekjhtWesBWTQW5eF/agis-as-populations?commentId=5pmTAQrvWtoE4AWYe]. Looking back at that thread, I just noticed that a couple [https://www.lesswrong.com/posts/HekjhtWesBWTQW5eF/agis-as-populations?commentId=5dLZGTiqAEydBhLKm] of those comments [https://www.lesswrong.com/posts/HekjhtWesBWTQW5eF/agis-as-populations?commentId=xGnpeNj3gdb8vHoaK] have been downvoted to negative karma. I don't think any of my comments have ever hit negative karma before; I find it particularly sad that the one time it happens is when I'm trying to explain why I think this community is failing at its key goal of cultivating better epistemics.

There are all sorts of arguments to be made here, which I don't have time to lay out in detail. But just step back for a moment. Tens or hundreds of thousands of academics are trying to figure out how the world works, spending their careers putting immense effort into reading and producing and reviewing papers. Even then, there's a massive replication crisis. And we're trying to produce reliable answers to much harder questions by, what, writing better blog posts, and hoping that a few of the best ideas stick? This is not what a desperate effort to find the truth looks like.
68 · jimrandomh · 3y
I am now reasonably convinced (p>0.8) that SARS-CoV-2 originated in an accidental laboratory escape from the Wuhan Institute of Virology.

1. If SARS-CoV-2 originated in a non-laboratory zoonotic transmission, then the geographic location of the initial outbreak would be drawn from a distribution which is approximately uniformly distributed over China (population-weighted); whereas if it originated in a laboratory, the geographic location is drawn from the commuting region of a lab studying that class of viruses, of which there is currently only one. Wuhan has <1% of the population of China, so this is (order of magnitude) a 100:1 update.
2. No factor other than the presence of the Wuhan Institute of Virology and related biotech organizations distinguishes Wuhan or Hubei from the rest of China. It is not the location of the bat-caves that SARS was found in; those are in Yunnan. It is not the location of any previous outbreaks. It does not have documented higher consumption of bats than the rest of China.
3. There have been publicly reported laboratory escapes of SARS twice before in Beijing, so we know this class of virus is difficult to contain in a laboratory setting.
4. We know that the Wuhan Institute of Virology was studying SARS-like bat coronaviruses. As reported in the Washington Post today, US diplomats had expressed serious concerns about the lab's safety.
5. China has adopted a policy of suppressing research into the origins of SARS-CoV-2, which they would not have done if they expected that research to clear them of scandal. Some Chinese officials are in a position to know.

To be clear, I don't think this was an intentional release. I don't think it was intended for use as a bioweapon. I don't think it underwent genetic engineering or gain-of-function research, although nothing about it conclusively rules this out. I think the researchers had good intentions, and screwed up.
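A minimal sketch of the odds-form update in point 1; the prior odds used here are purely illustrative, since the post only asserts the (order-of-magnitude) 100:1 likelihood ratio:

```python
# Bayes in odds form: posterior odds = prior odds * likelihood ratio.
def posterior_odds(prior_odds: float, likelihood_ratio: float) -> float:
    return prior_odds * likelihood_ratio

prior = 1 / 10      # e.g. 1:10 prior odds of lab origin vs. zoonosis (illustrative only)
lr_location = 100   # outbreak in the one city with such a lab: ~100:1 per the post

odds = posterior_odds(prior, lr_location)
prob = odds / (1 + odds)
print(f"posterior odds ~{odds:.0f}:1, probability ~{prob:.2f}")  # ~10:1, ~0.91
```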
58 · ESRogs · 3y
I've been meaning for a while to be more public about my investing, in order to share ideas with others and get feedback. Ideally I'd like to write up my thinking in detail, including describing what my target portfolio would be if I was more diligent about rebalancing (or didn't have to worry about tax planning). I haven't done either of those things. But, in order to not let the perfect be the enemy of the good, I'll just share very roughly what my current portfolio is.

My approximate current portfolio (note: I do not consider this to be optimal!):

* 40% TSLA
* 35% crypto -- XTZ, BTC, and ETH (and small amounts of LTC, XRP, and BCH)
* 25% startups -- Kinta AI [http://www.kinta-ai.com/], Coase [https://coa.se/], and General Biotics [https://www.generalbiotics.com/]
* 4% diversified index funds
* 1% SQ (an exploratory investment -- there are some indications that I'd want to bet on them, but I want to do more research. Putting in a little bit of money forces me to start paying attention.)
* <1% FUV (another exploratory investment)
* -5% cash

Some notes:

* Once VIX comes down, I'll want to lever up a bit. Likely by increasing the allocation to index funds (and going more short cash).
* One major way this portfolio differs from the portfolio in my heart is that it has no exposure to Stripe. If it was easy to do, I would probably allocate something like 5-10% to Stripe.
* I have a high risk tolerance. I think both dispositionally, and because I buy 1) the argument from Lifecycle Investing [https://www.lesswrong.com/posts/4wL5rcS97rw58G98B/review-of-lifecycle-investing] that young(ish) people should be something like 2x leveraged and, 2) the argument that some EAs have made that people who plan to donate a lot should be closer to risk neutral than they otherwise would be. (Because your donations are a small fraction of the pool going to similar causes, so the utility in money is much closer to linear than for money yo…
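A quick sanity check on the arithmetic of the allocation above; the figures are the post's approximations, with "<1%" rounded to 1%, so the totals are rough:

```python
# Approximate weights from the post (illustrative; "<1%" treated as 1%).
allocation = {
    "TSLA": 0.40,
    "crypto": 0.35,
    "startups": 0.25,
    "index funds": 0.04,
    "SQ": 0.01,
    "FUV": 0.01,
    "cash": -0.05,  # negative cash = borrowed, i.e. mild leverage
}

net = sum(allocation.values())
gross_long = sum(w for w in allocation.values() if w > 0)
print(f"net exposure ~{net:.0%}, gross long ~{gross_long:.0%} (~{gross_long:.2f}x leverage)")
```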
43 · TurnTrout · 3y
For the last two years, typing for 5+ minutes hurt my wrists. I tried a lot of things: shots, physical therapy, trigger-point therapy, acupuncture, massage tools, wrist and elbow braces at night, exercises, stretches. Sometimes it got better. Sometimes it got worse. No Beat Saber, no lifting weights, and every time I read a damn book I would start translating the punctuation into Dragon NaturallySpeaking syntax.

Have you ever tried dictating a math paper in LaTeX? Or dictating code? Telling your computer "click" and waiting a few seconds while resisting the temptation to just grab the mouse? Dictating your way through a computer science PhD?

And then.... and then, a month ago, I got fed up. What if it was all just in my head, at this point? I'm only 25. This is ridiculous. How can it possibly take me this long to heal such a minor injury? I wanted my hands back - I wanted it real bad. I wanted it so bad that I did something dirty: I made myself believe something. Well, actually, I pretended to be a person who really, really believed his hands were fine and healing and the pain was all psychosomatic.

And... it worked, as far as I can tell. It totally worked. I haven't dictated in over three weeks. I play Beat Saber as much as I please. I type for hours and hours a day with only the faintest traces of discomfort. What?
36 · Rohin Shah · 3y
I often have the experience of being in the middle of a discussion and wanting to reference some simple but important idea / point, but there doesn't exist any such thing. Often my reaction is "if only there was time to write an LW post that I can then link to in the future". So far I've just been letting these ideas be forgotten, because it would be Yet Another Thing To Keep Track Of. I'm now going to experiment with making subcomments here simply collecting the ideas; perhaps other people will write posts about them at some point, if they're even understandable.
