A general guide for pursuing independent research, from conceptual questions like "how to figure out how to prioritize, learn, and think", to practical questions like "what sort of snacks should you buy to maximize productivity?"

dreeves
A couple days ago I wanted to paste a paragraph from Sarah Constantin's latest post on AGI into Discord and of course the italicizing disappeared, which drives me bananas, and I thought there must exist tools for solving that problem and there are but they're all abominations, so I said to ChatGPT (4o):

> can you build a simple html/javascript app with two text areas. the top text area is for rich text (rtf) and the bottom for plaintext markdown. whenever any text in either text area changes, the app updates the other text area. if the top one changes, it converts it to markdown and updates the bottom one. if the bottom one changes, it converts it to rich text and updates the top one.

aaaand it actually did it and I pasted it into Replit and... it didn't work, but I told it what errors I was seeing and continued going back and forth with it and ended up with the following tool without touching a single line of code: eat-the-richtext.dreev.es

PS: Ok, I ended up going back and forth with it *a lot* (12h45m now in total, according to TagTime) to get to the polished state it's in now, with tooltips and draggable divider and version number and other bells and whistles. But as of version 1.3.4 it's 100% ChatGPT's code with me guiding it in strictly natural language.
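(A minimal sketch of the general idea behind such a tool, not the actual eat-the-richtext code: a contenteditable div holds the rich text, a plain textarea holds the markdown, and each pane re-renders the other on input. It assumes the Turndown and Marked libraries for the HTML-to-Markdown and Markdown-to-HTML conversions; the real app may well do this differently.)

```html
<!-- Sketch only: two synced panes, assuming the Turndown and Marked libraries. -->
<!doctype html>
<html>
<body>
  <!-- Rich text goes in a contenteditable div (a plain <textarea> can't hold formatting). -->
  <div id="rich" contenteditable="true" style="border:1px solid #ccc; min-height:6em"></div>
  <textarea id="md" rows="10" style="width:100%"></textarea>

  <script src="https://unpkg.com/turndown/dist/turndown.js"></script>
  <script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
  <script>
    const rich = document.getElementById('rich');
    const md = document.getElementById('md');
    const turndown = new TurndownService(); // HTML -> Markdown converter

    // Typing or pasting into the rich pane updates the markdown pane.
    rich.addEventListener('input', () => {
      md.value = turndown.turndown(rich.innerHTML);
    });

    // Editing the markdown pane re-renders the rich pane.
    md.addEventListener('input', () => {
      rich.innerHTML = marked.parse(md.value);
    });
    // Programmatic updates don't fire 'input' events, so the two handlers can't loop.
  </script>
</body>
</html>
```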
Akash
I'm surprised that some people are so interested in the idea of liability for extreme harms. I understand that from a legal/philosophical perspective, there are some nice arguments about how companies should have to internalize the externalities of their actions, etc. But in practice, I'd be fairly surprised if liability approaches were actually able to provide a meaningful incentive shift for frontier AI developers.

My impression is that frontier AI developers already have fairly strong incentives to avoid catastrophes (e.g., it would be horrible for Microsoft if its AI model caused $1B in harms; it would be horrible for Meta and the entire OS movement if an OS model were able to cause $1B in damages). And my impression is that most forms of liability would not affect this cost-benefit tradeoff by very much. This is especially true if the liability is only implemented post-catastrophe. Extreme forms of liability could require insurance, but this essentially feels like a roundabout and less effective way of implementing some form of licensing (you have to convince us that risks are below an acceptable threshold to proceed).

I think liability also has the "added" problem of being quite unpopular, especially among Republicans. It is easy to attack liability regulations as anti-innovation, argue that they create a moat (only big companies can afford to comply), and argue that it's just not how America ends up regulating things (we don't hold Adobe accountable for someone doing something bad with Photoshop). To be clear, I don't think "something is politically unpopular" should be a full-stop argument against advocating for it. But I do think that "liability for AI companies" scores poorly both on "actual usefulness if implemented" and on "political popularity/feasibility."

I also think "liability for AI companies" advocacy often ends up getting into abstract philosophy land (to what extent should companies internalize externalities?) and ends up avoiding some of the "weirder" points (we expect AI has a considerable chance of posing extreme national security risks, which is why we need to treat AI differently than Photoshop). I would rather people just make the direct case that AI poses extreme risks and discuss the direct policy interventions that are warranted.

With this in mind, I'm not an expert in liability and admittedly haven't been following the discussion in great detail (partly because the little I have seen has not convinced me that this is an approach worth investing in). I'd be interested in hearing more from people who have thought about liability, particularly concrete stories for how liability would be expected to meaningfully shift the incentives of labs. (See also here.)

Stylistic note: I'd prefer replies along the lines of "here is the specific argument for why liability would significantly affect lab incentives and how it would work in concrete cases" rather than replies along the lines of "here is a thing you can read about the general legal/philosophical arguments about how liability is good."
keltan
From Newcastle, Australia to Berkeley, San Francisco. I arrived yesterday for Less.online. I’ve had a bit of culture shock, a big helping of being increasingly scared, and quite a few questions. I’ll start with those. Feel free to skip them.

These questions are based on warnings I’ve gotten from local non-rationalists. Idk if they’re scared because of the media they consume or because of actual stats. I’m asking these because they feel untrue.

1. Is it ok to be outside after dark?
2. Will I really get ‘rolled’ mid day in Oakland?
3. Are there gangs walking around Oakland looking to stab people?
4. Will all the streets fill up with homeless people at night?
5. Are they chill? In Aus they’re usually down to talk if you are.

Culture shocks for your enjoyment:

1. Why is everyone doing yoga?
2. To my Uber driver: “THAT TRAIN IS ON THE ROAD!?”
3. “I thought (X) was just in movies!”
4. Your billboards are about science instead of coal mining!
5. “Wait, you’re telling me everything is vegan?” Thank Bayes, this is the best. All our vegan restaurants went out of business.
6. People brag about things? And they do it openly? At least, I think that’s what’s happening?
7. “Silicon Valley is actually a valley?!” Should have predicted this one. I kinda knew, but I didn’t know like I do now.
8. “Wow! This shop is openly selling nangs!” (whippits) “And a jungle juice display!”
9. All your cars are so new and shiny. 60% of ours are second hand.
10. Most people I see in the streets look below 40. It’s like I’m walking around a university!
11. Wow. It’s really sunny.
12. American accents irl make me feel like I’m walking through a film.
13. “HOLY SHIT! A CYBER TRUCK?!”
14. Ok this is a big one. Apps I’ve had for 8+ years are suddenly different when I arrive here?
15. This is what Uber is meant to be. I will go back to Australia and cry. Your airport has custom instructions… in app! WHAT!? The car arrives in 2 minutes instead of 30 minutes. Also, the car arrives at all.
16. The Google app has a beaker for tests now?
17. Snap Maps has gifs in it.
18. Apple Maps lets you scan buildings? And has tips about good restaurants and events?
19. When I bet in the Manifold app, a real paper crane flies from the nearest tree, lands in front of me and unfolds. Written inside: “Will Eliezer Yudkowsky open a rationalist bakery?” I circle “Yes”. The paper meticulously folds itself back into a crane. It looks at me. Makes a little sound that doesn’t echo in the streets but in my head, and it burns. Every time this happens I save the ashes. Are Manifold creating new matter? How are they doing this?
20. That one was a lie.

Things that won’t kill me but scare me, rational/irrational:

1. What if I’ve been wrong? What if this is all a scam? A cult? What if Mum was right?
2. What if I show up to the location and there is no building there?
3. What if I make some terribly awkward cultural blunder for SF and everyone yells at me?
4. What if no one tells me?
5. I’m sure I’ll be at least in the bottom 5% for intelligence at Less Online. I won’t be surprised or hurt if I’ve got the least Gs of people there. But what if it all goes over my head? Maybe I can’t even communicate with smart people about the things I care about.
6. What if I can’t handle people telling me what they think of my arguments without kid gloves? What if I get angry and haven’t learnt to handle that?
7. I’m just a Drama teacher and Psych student. My head is filled with improv games and fun facts about Clever Hans!

‘Average’ Americans seem to achieve much higher than ‘average’ Australians. I’m scared of feeling underqualified.

Other things:

1. Can you think of something I should be worried about that I’ve not written here?
2. I’ve brought my copies of the Rationality A-Z books. I want to ask people I meet to sign their favourite post in the two books. Is that culturally acceptable? Feels kinda weird bc Yud is going to be there. But it would be a really warm/fuzzy item to me in the future.
3. I don’t actually know what a lot of the writers going look like. I hope this doesn’t result in a blunder. But might be funny, given that I expect rationalists to be pretty chill.
4. Are other people as excited about the Fooming Shoggoths as I am?
5. I’m 23. I have no idea if that is very old, very young, or about normal for a rationalist. I’d guess about normal, with a big spread across the right of the graph.

It feels super weird to be in the same town as a bunch of you guys now. I’ve never met a rationalist irl. I talked to Ruby over zoom once, who said to me “You know you don’t have to stay in Australia right?” I hope Ruby is a good baseline for niceness levels of you all.

If you’re going, I’ll see you at Less.Online. If you’re not, I’d still love to meet you. Feel free to DM me!
My mainline prediction scenario for the next decades. My mainline prediction*:

* LLMs will not scale to AGI. They will not spawn evil gremlins or mesa-optimizers. BUT scaling laws will continue to hold, and future LLMs will be very impressive and make a sizable impact on the real economy and science over the next decade.
* There is a single innovation left to make AGI-in-the-alex-sense work, i.e. coherent, long-term planning agents (LTPA) that are effective and efficient in data-sparse domains over long horizons.
* That innovation will be found within the next 10-15 years.
* It will be clear to the general public that these are dangerous.
* Governments will act quickly and (relatively) decisively to bring these agents under state control. National security concerns will dominate.
* Power will reside mostly with governments' AI safety institutes and national security agencies. Insofar as divisions of tech companies are able to create LTPAs, they will be effectively nationalized.
* International treaties will be made to constrain AI, outlawing the development of LTPAs by private companies. Great power competition will mean the US and China will continue developing LTPAs, possibly largely boxed. Treaties will try to constrain this development with only partial success (similar to nuclear treaties).
* LLMs will continue to exist and be used by the general public.
* Conditional on AI ruin, the closest analogy is probably something like the Cortez-Pizarro-Afonso takeovers. Unaligned AI will rely on human infrastructure and human allies for the earlier parts of takeover, but its inherent advantages in tech, coherence, decision-making and (artificial) plagues will be the deciding factor.
* The world may be mildly multi-polar.
  * This will involve conflict between AIs.
  * AIs may very possibly be able to cooperate in ways humans can't.
* The arrival of AGI will immediately inaugurate a scientific revolution. Sci-fi-sounding progress like advanced robotics, quantum magic, nanotech, life extension, laser weapons, large space engineering, and cures for many/most remaining diseases will become possible within two decades of AGI, possibly much faster.
* Military power will shift to automated manufacturing of drones & weaponized artificial plagues. Drones, mostly flying, will dominate the battlefield. Mass production of drones and their rapid and effective deployment in swarms will be key to victory.

Two points on which I differ with most commentators:

(i) I believe AGI is a real (mostly discrete) thing, not a vibe or a general increase of improved tools. I believe it is inherently agentic. I don't think spontaneous emergence of agents is impossible, but I think it is more plausible that agents will be built rather than grown.

(ii) I believe in general the EA/AI safety community is way overrating the importance of individual tech companies vis-a-vis broader trends and the power of governments. I strongly agree with Stefan Schubert's take here on the latent hidden power of government: https://stefanschubert.substack.com/p/crises-reveal-centralisation Consequently, the EA/AI safety community is often myopically focusing on boardroom politics that are relatively inconsequential in the grand scheme of things.

*Where by mainline prediction I mean the scenario that is the mode of what I expect. This is the single likeliest scenario. However, since it contains a large number of details, each of which could go differently, the probability on this specific scenario is still low.
Feels like FLI is a massively underrated org. Cos of the whole Vitalik donation thing they have like $300mn.

Recent Discussion

Helen Toner went on the TED AI podcast, giving us more color on what happened at OpenAI. These are important claims to get right.

I will start with my notes on the podcast, including the second part where she speaks about regulation in general. Then I will discuss some implications more broadly.

Notes on Helen Toner’s TED AI Show Podcast

This seems like it deserves the standard detailed podcast treatment. By default, each note’s main body is description; any second-level notes are me.

  1. (0:00) Introduction. The host talks about OpenAI’s transition from non-profit research organization to de facto for-profit company. He highlights the transition from ‘open’ AI to closed as indicative of the problem, whereas I see this as the biggest thing they got right. He also notes that he was
...
MichaelDickens
He was caught lying about the non-disparagement agreements, but I guess lying to the public is fine as long as you don't lie to the board? Taylor's and Summers' comments here are pretty disappointing—it seems that they have no issue with, and maybe even endorse, Sam's now-publicly-verified bad behavior.

> we have found Mr Altman highly forthcoming

That's exactly the line that made my heart sink.

I find it a weird thing to choose to say/emphasize.

The issue under discussion isn't whether Altman hid things from the new board; it's whether he hid things from the old board a long while ago.

Of course he's going to seem forthcoming towards the new board at first. So, the new board having the impression that he was forthcoming towards them? This isn't information that helps us much in assessing whether to side with Altman vs the old board. That makes me think: why repo... (read more)

PeterH
I think that Paul Graham’s remarks today—particularly the “we didn’t want him to leave” part—make it clear that Altman was not fired. In December 2023, Paul Graham gave a similar account to the Wall St Journal and said “it would be wrong to use the word ‘fired’”. Roon has a take.
Dana
These are the remarks Zvi was referring to in the post. Also worth noting Graham's consistent choice of the word 'agreed' rather than 'chose', and Altman's failed attempt to transition to chairman/advisor to YC. It sure doesn't sound like Altman was the one making the decisions here.

It seems like one of the biggest problems* in AI Safety is that it is ridiculously hard to get good training (i.e. MATS is ridiculously competitive now) and to get employed (samesies).

Has anyone looked across other categories (e.g. potentially other sciences) to see how this problem has been solved? I assume at the most macro level it is going to be "Industry" vs "Government" but I'm looking for more concrete interventions.

Thoughts?

*we're turning away very smart, motivated, well-meaning and skilled people. This is bad.

Dagon

This seems to be asking from the demand side ("we" being people with lots of money who want to hire trained people), but then switches to the supply side (people being turned away looking for training and employment).

I think that's a hint to your answer: other industries solve it by actually hiring lots of people, and offering training on the job or with regular programs.  Oh, and usually waiting for equilibrium to catch up, which is not comfortable for rapid-change requirements.

As we explained in our MIRI 2024 Mission and Strategy update, MIRI has pivoted to prioritize policy, communications, and technical governance research over technical alignment research. This follow-up post goes into detail about our communications strategy.

The Objective: Shut it Down[1]

Our objective is to convince major powers to shut down the development of frontier AI systems worldwide before it is too late. We believe that nothing less than this will prevent future misaligned smarter-than-human AI systems from destroying humanity. Persuading governments worldwide to take sufficiently drastic action will not be easy, but we believe this is the most viable path.

Policymakers deal mostly in compromise: they form coalitions by giving a little here to gain a little somewhere else. We are concerned that most legislation intended to keep humanity alive will go...

ryan_greenblatt
Hmm, I'm not sure I exactly buy this. I think you should probably follow something like onion honesty, which can involve intentionally simplifying your message to something you expect will give the audience more true views. I think you should lean toward stating things, but still, sometimes stating a true thing can be clearly distracting and confusing, and thus you shouldn't.
Raemon

Man I just want to say I appreciate you following up on each subthread and noting where you agree/disagree, it feels earnestly truthseeky to me.

Matthew Barnett
I appreciate the straightforward and honest nature of this communication strategy, in the sense of "telling it like it is" and not hiding behind obscure or vague language. In that same spirit, I'll provide my brief, yet similarly straightforward reaction to this announcement:

1. I think MIRI is incorrect in their assessment of the likelihood of human extinction from AI. As per their messaging, several people at MIRI seem to believe that doom is >80% likely in the 21st century (conditional on no global pause) whereas I think it's more like <20%.
2. MIRI's arguments for doom are often difficult to pin down, given the informal nature of their arguments, and in part due to their reliance on analogies, metaphors, and vague supporting claims instead of concrete empirically verifiable models. Consequently, I find it challenging to respond to MIRI's arguments precisely. The fact that they want to essentially shut down the field of AI based on these largely informal arguments seems premature to me.
3. MIRI researchers rarely provide any novel predictions about what will happen before AI doom, making their theories of doom appear unfalsifiable. This frustrates me. Given a low prior probability of doom as apparent from the empirical track record of technological progress, I think we should generally be skeptical of purely theoretical arguments, especially if they are vague and make no novel, verifiable predictions prior to doom.
4. Separately from the previous two points, MIRI's current most prominent arguments for doom seem very weak to me. Their broad model of doom appears to be something like the following (although they would almost certainly object to the minutiae of how I have written it here): (1) At some point in the future, a powerful AGI will be created. This AGI will be qualitatively distinct from previous, more narrow AIs. Unlike terms such as "the economy", "GPT-4", or "Microsoft", this AGI is not a mere collection of entities or tools integrated into
ryan_greenblatt
I basically agree with your overall comment, but I'd like to push back in one spot: from my understanding, Nate Soares at least claims his internal case for >80% doom is disjunctive and doesn't route entirely through 1, 2, 3, and 4. I don't really know exactly what the disjuncts are, so this doesn't really help, and I overall agree that MIRI does make "sweeping claims with high confidence".


You are invited to join Vision Weekend Europe, the annual festival of Foresight Institute at Bückeburg Castle in Germany from July 12 - 14. 

What’s this year’s theme? This year’s main conference track is dedicated to “Paths to Progress”, meaning you will hear 10+ invited presentations from Foresight’s core community highlighting paths to progress in the following areas:

  • Long-term History & Flourishing Futures
  • Longevity, Rejuvenation, Cryonics
  • Molecular Machines, Computing, APM
  • Neurotech, BCIs & WBEs
  • Cryptography, Security & AI
  • Energy, Space, Expansion
  • Funding, Innovation, Progress

Confirmed presenters include Jaan Tallinn (Future of Life Institute), Hendrik Dietz (Dietz Lab), Anders Sandberg (University of Oxford), Catalin Mitelut (NYU), Muriel Richard-Noca (ClearSpace), Nikolina Lauc (GlycanAge), Andrew Critch (Encultured), Joao Pedro De Magalhaes (University of Birmingham), Jeremy Barton, Toby Pilditch (Transformative Futures Institute), Matjaz Leonardis (Oxford University), Trent McConaghy (Ocean Protocol), Chiara...

Since at least 2017, OpenAI has asked departing employees to sign offboarding agreements which legally bind them to permanently—that is, for the rest of their lives—refrain from criticizing OpenAI, or from otherwise taking any actions which might damage its finances or reputation.[1]

If they refused to sign, OpenAI threatened to take back (or make unsellable) all of their already-vested equity—a huge portion of their overall compensation, which often amounted to millions of dollars. Given this immense pressure, it seems likely that most employees signed.

If they did sign, they became personally liable forevermore for any financial or reputational harm they later caused. This liability was unbounded, so had the potential to be financially ruinous—if, say, they later wrote a blog post critical of OpenAI, they might in principle be...

Geoffrey Irving (Research Director, AI Safety Institute)

Given the tweet thread Geoffrey wrote during the board drama, it seems pretty clear that he's willing to publicly disparage OpenAI. (I used to work with Geoffrey, but have no private info here)

Garrett Baker
A market on the subject: https://manifold.markets/GarrettBaker/which-of-the-names-below-will-i-rec?r=R2FycmV0dEJha2Vy
quila
I'm a different person, but I would support contracts which disallow spread of capabilities insights, but not contracts which disallow criticism of AI orgs (and especially not surprise ones). IIUC the latter is what the OAI-NonDisparagement controversy has been about. I'm not confident the following is true, but it seems to me that your first question was written under a belief that the controversy was about both of those at once. It seems like it was trying (under that world model) to 'axiomatically' elicit a belief in disagreement with an ongoing controversy, which would be non-truthseeking.
the gears to ascension
That seems like a misgeneralization, and I'd like to hear what thoughts you'd have depending on the various answers that could be given in the framework you raise. I'd imagine that there are a wide variety of possible ways a person could be limited in what they choose to say, and being threatened if they say things is a different situation than if they voluntarily do not: for example, the latter allows them to criticize.

Labs should give deeper model access to independent safety researchers (to boost their research)

Sharing deeper access helps safety researchers who work with frontier models, obviously.

Some kinds of deep model access:

  1. Helpful-only version
  2. Fine-tuning permission
  3. Activations and logits access
  4. [speculative] Interpretability researchers send code to the lab; the lab runs the code on the model; the lab sends back the results

See Shevlane 2022 and Bucknall and Trager 2023.

A lab is disincentivized from sharing deep model access because it doesn't want headlines about h... (read more)

List of 27 papers (supposedly) given to John Carmack by Ilya Sutskever: "If you really learn all of these, you’ll know 90% of what matters today." 
The list has been floating around for a few weeks on Twitter/LinkedIn. I figure some might have missed it, so here you go.
Regardless of the veracity of the tale, I am still finding it valuable.

https://punkx.org/jackdoe/30.html

  1. The Annotated Transformer (nlp.seas.harvard.edu)
  2. The First Law of Complexodynamics (scottaaronson.blog)
  3. The Unreasonable Effectiveness of RNNs (karpathy.github.io)
  4. Understanding LSTM Networks (colah.github.io)
  5. Recurrent Neural Network Regularization (arxiv.org)
  6. Keeping Neural Networks Simple by Minimizing the Description Length of the Weights (cs.toronto.edu)
  7. Pointer Networks (arxiv.org)
  8. ImageNet Classification with Deep CNNs (proceedings.neurips.cc)
  9. Order Matters: Sequence to sequence for sets (arxiv.org)
  10. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism (arxiv.org)
  11. Deep Residual Learning for Image Recognition (arxiv.org)
  12. Multi-Scale Context Aggregation by Dilated Convolutions
...

I like this format and framing of "90% of what matters" and someone should try doing it with other subjects.

Amalthea
Might be good to estimate the date of the recommendation - as the interview where Carmack mentioned this was in 2023, a rough guess might be 2021/22?

LessOnline Festival

May 31st to June 2nd, Berkeley CA