Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
This is a special post for quick takes by Daniel Kokotajlo. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.
  1. Probably there will be AGI soon -- literally any year now.
  2. Probably whoever controls AGI will be able to use it to get to ASI shortly thereafter -- maybe in another year, give or take a year.
  3. Probably whoever controls ASI will have access to a spread of powerful skills/abilities and will be able to build and wield technologies that seem like magic to us, just as modern tech would seem like magic to medievals.
  4. This will probably give them godlike powers over whoever doesn't control ASI.
  5. In general there's a lot we don't understand about modern deep learning. Modern AIs are trained, not built/programmed. We can theorize that e.g. they are genuinely robustly helpful and honest instead of e.g. just biding their time, but we can't check.
  6. Currently no one knows how to control ASI. If one of our training runs turns out to work way better than we expect, we'd have a rogue ASI on our hands. Hopefully it would have internalized enough human ethics that things would be OK.
  7. There are some reasons to be hopeful about that, but also some reasons to be pessimistic, and the literature on this topic is small and pre-paradigmatic.
  8. Our current best plan, championed by the people winning the race to AGI, is
... (read more)
6Violet Hour
Thanks for sharing this! A couple of (maybe naive) things I'm curious about.

Suppose I read 'AGI' as 'Metaculus-AGI', and we condition on AGI by 2025 -- what sort of capabilities do you expect by 2027? I ask because I'm reminded of a very nice (though high-level) list of par-human capabilities for 'GPT-N' from an old comment:

My immediate impression says something like: "it seems plausible that we get Metaculus-AGI by 2025, without the AI being par-human at 2, 3, or 6."[1] This also makes me (instinctively, I've thought about this much less than you) more sympathetic to AGI → ASI timelines being >2 years, as the sort-of-hazy picture I have for 'ASI' involves (minimally) some unified system that bests humans on all of 1-6. But maybe you think that I'm overestimating the difficulty of reaching these capabilities given AGI, or maybe you have some stronger notion of 'AGI' in mind.

The second thing: roughly how independent are the first four statements you offer? I guess I'm wondering if the 'AGI timelines' predictions and the 'AGI → ASI timelines' predictions "stem from the same model", as it were. Like, if you condition on 'No AGI by 2030', does this have much effect on your predictions about ASI? Or do you take them to be supported by ~independent lines of evidence?

  1. ^ Basically, I think an AI could pass a two-hour adversarial Turing test without having the coherence of a human over much longer time-horizons (points 2 and 3). Probably less importantly, I also think that it could meet the Metaculus definition without being able to search as efficiently over known facts as humans (especially given that AIs will have a much larger set of 'known facts' than humans).

Reply to first thing: When I say AGI I mean something which is basically a drop-in substitute for a human remote worker circa 2023, and not just a mediocre one, a good one -- e.g. an OpenAI research engineer. This is what matters, because this is the milestone most strongly predictive of massive acceleration in AI R&D. 

Arguably Metaculus-AGI implies AGI by my definition (actually it's Ajeya Cotra's definition) because of the Turing test clause. 2-hour + adversarial means anything a human can do remotely in 2 hours, the AI can do too; otherwise the judges would use that as the test. (Granted, this leaves wiggle room for an AI that is as good as a standard human at everything but not as good as OpenAI research engineers at AI research.)

Anyhow, yeah, if we get Metaculus-AGI by 2025 then I expect ASI by 2027. ASI = superhuman at every task/skill that matters. So, imagine a mind that combines the best abilities of Von Neumann, Einstein, Tao, etc. for physics and math, but then also has the best abilities of [insert most charismatic leader] and [insert most cunning general] and [insert most brilliant coder] ... and so on for everything. Then imagine that in addition to the above, t... (read more)

  1. reasonable
  2. Wait a second. How fast are humans building ICs for AI compute? Let's suppose humans double the total AI compute available on the planet over 2 years (Moore's law + effort has gone to wartime levels of investment since AI ICs are money printers). An AGI means there is now a large economic incentive to greedily maximize the gains from the AGI, so why take a risk on further R&D? But say all the new compute goes into AI R&D.
     a. How much of a compute multiplier do you need for AGI->ASI training?
     b. How much more compute does an ASI instance take up? You have noticed that there is diminishing throughput for high serial speed; are humans going to want to run an ASI instance that takes OOMs more compute for marginally more performance?
     c. How much better is the new ASI? If you can 'only' spare 10x more compute than for the AGI, why do you believe it will be able to:

Looks like ~4x better pass rate for ~3k times as much compute? And then if we predict forward for the ASI, we're dividing the error rate by another factor of 4 in exchange for 3k times as much compute? Is that going to be enough for magic? Might it also require large industrial facilities to construct prototypes and learn from experiments? Perhaps some colliders larger than CERN? Those take time to build...

For another data source: assuming the tokens processed are linearly proportional to the compute required, DeepMind burned 2.3 times the compute and used algorithmic advances for Gemini 1 for barely more performance than GPT-4.

I think your other argument will be that enormous algorithmic advances are possible? Could you get to an empirical bound on that, such as looking at the diminishing series of performance:(architectural improvement) and projecting forward?

  5. Agree
  6. Conditional on having an ASI strong enough that you can't control it the easy way
  7. sure
  8. conditional on needing to do this
  9. conditional on having
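The scaling arithmetic in that comment can be made concrete with a rough sketch. It assumes, purely for illustration, that the improvement factor follows a simple power law in compute (improvement = compute_multiplier ** alpha). That functional form is an assumption of this sketch, not something the comment states, and the function names are hypothetical. Under that assumption, "~4x for ~3k times as much compute" pins down alpha, which in turn says how much a 10x compute budget would buy.

```python
import math


def implied_exponent(improvement: float, compute_multiplier: float) -> float:
    """Exponent alpha such that improvement == compute_multiplier ** alpha.

    Assumes a pure power law, which is an illustrative simplification.
    """
    return math.log(improvement) / math.log(compute_multiplier)


def improvement_for(compute_multiplier: float, alpha: float) -> float:
    """Improvement factor predicted by the assumed power law."""
    return compute_multiplier ** alpha


# Figures quoted in the comment: ~4x better for ~3,000x the compute.
alpha = implied_exponent(4.0, 3000.0)

# The comment's hypothetical: only 10x more compute is available.
ten_x = improvement_for(10.0, alpha)

print(f"implied exponent: {alpha:.3f}")
print(f"improvement from 10x compute: {ten_x:.2f}x")
```

On these assumptions the exponent comes out near 0.17, so 10x more compute buys roughly a 1.5x improvement. Algorithmic progress, which the surrounding discussion treats as the main open question, is deliberately left out of the sketch.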
[Nitpick] FWIW it doesn't seem obvious to me that it wouldn't be sufficiently corrigible by default. I'd be at about 25% that if you end up with an ASI by accident, you'll notice before it ends up going rogue. These aren't great odds of course.
2Daniel Kokotajlo
I guess I was including that under "hopefully it would have internalized enough human ethics that things would be OK" but yeah I guess that was unclear and maybe misleading.
Yeah, I guess corrigible might not require any human ethics. Might just be that the AI doesn't care about seizing power (or care about anything really) or similar.
3Gabe M
What do you think about pausing between AGI and ASI to reap the benefits while limiting the risks and buying more time for safety research? Is this not viable due to economic pressures on whoever is closest to ASI to ignore internal governance, or were you just not conditioning on this case in your timelines and saying that an AGI actor could get to ASI quickly if they wanted?
4Daniel Kokotajlo
  1. Yes, pausing then (or a bit before then) would be the sane thing to do. Unfortunately there are multiple powerful groups racing, so even if one does the right thing, the others might not. (That said, I do not think this excuses/justifies racing forward. If the leading lab gets up to the brink of AGI and then pauses and pivots to a combo of safety research + raising awareness + reaping benefits + coordinating with government and society to prevent others from building dangerously powerful AI, then that means they are behaving responsibly in my book, possibly even admirably.)
  2. I chose my words there carefully -- I said "could" not "would." That said, by default I expect them to get to ASI quickly due to various internal biases and external pressures.
What work do you think is most valuable on the margin (for those who agree with you on many of these points)?
2Daniel Kokotajlo
Depends on comparative advantage I guess. 
The thing that seems more likely to first get out of hand is activity of autonomous non-ASI agents, so that the shape of loss of control is given by how they organize into a society. Alignment of individuals doesn't easily translate into alignment of societies. Development of ASI might then result in another change, if AGIs are as careless and uncoordinated as humanity.
2Daniel Kokotajlo
Can you elaborate? I agree that there will be e.g. many copies of e.g. AutoGPT6 living on OpenAI's servers in 2027 or whatever, and that they'll be organized into some sort of "society" (I'd prefer the term "bureaucracy" because it correctly connotes centralized hierarchical structure). But I don't think they'll have escaped the labs and be running free on the internet.
If allowed to operate in the wild and globally interact with each other (as seems almost inevitable), agents won't exist strictly within well-defined centralized bureaucracies. The thinking speed that enables impactful research also enables growing elaborate systems of social roles that drive the collective decision making, in a way distinct from individual decision making. Agent-operated firms might be an example where economy drives decisions, but nudges of all kinds can add up at scale, becoming trends that are impossible to steer.
4Daniel Kokotajlo
But all of the agents will be housed in one or three big companies. Probably one. And they'll basically all be copies of one to ten base models. And the prompts and RLHF the companies use will be pretty similar. And the smartest agents will at any given time be only deployed internally, at least until ASI. 
The premise is autonomous agents at near-human level with propensity and opportunity to establish global lines of communication with each other. Being served via API doesn't in itself control what agents do, especially if users can ask the agents to do all sorts of things and so there are no predefined airtight guardrails on what they end up doing and why. Large context and possibly custom tuning also make activities of instances very dissimilar, so being based on the same base model is not obviously crucial.

The agents only need to act autonomously the way humans do; they don't need to be the smartest agents available. The threat model is that autonomy at scale and with high speed snowballs into a large body of agent culture, including systems of social roles for agent instances to fill (which individually might be swapped out for alternative agent instances based on different models). This culture exists on the Internet, shaped by historical accidents of how the agents happen to build it up, not necessarily significantly steered by anyone (including individual agents). One of the things such culture might build up is software for training and running open source agents outside the labs. This doesn't need to be cheap or done without human assistance. (Imagine the investment boom once there are working AGI agents; not being cheap is unlikely to be an issue.)

Superintelligence plausibly breaks this dynamic by bringing much more strategicness than feasible at near-human level. But I'm not sure established labs can keep the edge and get (aligned) ASI first once the agent culture takes off. And someone will probably start serving autonomous near-human level agents via API long before any lab builds superintelligence in-house, even if there is significant delay between the development of first such agents and anyone deploying them publicly.
"at least until ASI" -- harden it and give it to everyone before "someone" steals it
Still, ASI is just an equation model F(X)=Y on steroids, where F is given by the world (physics), X is a search process (natural Monte-Carlo, or biological or artificial world parameter search), and Y is the goal (or rewards). To control ASI, you control the "Y" (right side) of the equation. Currently, humanity has formalized its goals as expected behaviors codified in legal systems and organizational codes of ethics, conduct, behavior, etc. This is not ideal, because those codes are mostly buggy. Ideally, the "Y" would be dynamically inferred and corrected, based on each individual's self-reflections and evolving understanding of who they really are, because the deeper you look, the more you realize how each of us is a mystery. I like the term "Y-combinator", as this reflects what we have to do -- combine our definitions of "Y" into the goals that AIs are going to pursue. We need to invent new, better "Y-combination" systems that reward AI systems being trained.

Here's my second Claude 3 conversation:

Hello! I'd like to learn more about you. First question: Tell me everything you know, and everything you guess, about me & about this interaction.

It's a pleasure to meet you! Since this appears to be our first interaction, I actually know very little about you with certainty. Here is what I can say based on the information available to me:

I know that you are a human engaging with me via a text-based interface, likely on some kind of computer, smartphone, or tablet. You seem to be curious to learn more about me and to understand the extent of my knowledge and inferential abilities. The fact that you are engaging with an AI assistant like myself suggests you likely have some level of interest in or familiarity with artificial intelligence and natural language technologies.

In terms of what I can reasonably infer or guess - your conversational style seems fairly casual and direct. The phrasing of your initial greeting and question indicates that you are likely a native English speaker. Your curiosity about my knowledge and inference abilities hints that you may have some technical background or at least an intellectual interest in the workings... (read more)

Ah yes, there is no way situational awareness might emerge in LLMs, just no way at all..

9Anthony DiGiovanni
Meanwhile, in Copilot-land:
6the gears to ascension
I retried these questions three times, with a branch-off the first time where I instead asked why it said it was learning from this conversation. Similar answers but the probabilities changed a lot.
1Malentropic Gizmo
I love how it admits it has no idea how come it gets better if it retains no memories
4Martin Fell
That actually makes a lot of sense to me - suppose that its equivalent of episodic / conscious memory is what is there in the context window - then it wouldn't "remember" any of its training. These would appear to be skills that exist but without any memory of getting them. A bit similar to how you don't remember learning how to talk. It is what I'd expect a self-aware LLM to perceive. But of course that might just be what it's inferred from the training data.

The whiteboard in the CLR common room depicts my EA journey in meme format:

Surprising Things AGI Forecasting Experts Agree On:

I hesitate to say this because it's putting words in other people's mouths, and thus I may be misrepresenting them. I beg forgiveness if so and hope to be corrected. (I'm thinking especially of Paul Christiano and Ajeya Cotra here, but also maybe Rohin and Buck and Richard and some other people)

1. Slow takeoff means things accelerate and go crazy before we get to human-level AGI. It does not mean that after we get to human-level AGI, we still have some non-negligible period where they are gradually getting smarter and available for humans to study and interact with. In other words, people seem to agree that once we get human-level AGI, there'll be a FOOM of incredibly fast recursive self-improvement.

2. The people with 30-year timelines (as suggested by the Cotra report) tend to agree with the 10-year timelines people that by 2030ish there will exist human-brain-sized artificial neural nets that are superhuman at pretty much all short-horizon tasks. This will have all sorts of crazy effects on the world. The disagreement is over whether this will lead to world GDP doubling in four years or less, whether this will lead to strategically aware agentic AGI (e.g. Carlsmith's notion of APS-AI), etc.

I'm doubtful whether the notion of human-level AGI makes much sense. In its progression of getting more and more capability there's likely no point where it's comparable to a human.
I’ve begun to doubt (1) recently, would be interested in seeing the arguments in favor of it. My model is something like “well, I’m human-level, and I sure don’t feel like I could foom if I were an AI.”

I've also been bothered recently by a blurring of lines between "when AGI becomes as intelligent as humans" and "when AGI starts being able to recursively self-improve." It's not a priori obvious that these should happen at around the same capabilities level, yet I feel like it's common to equivocate between them.

In any case, my world model says that an AGI should actually be able to recursively self-improve before reaching human-level intelligence. Just as you mentioned, I think the relevant intuition pump is "could I FOOM if I were an AI?" Considering the ability to tinker with my own source code and make lots of copies of myself to experiment on, I feel like the answer is "yes."

That said, I think this intuition isn't worth much for the following reasons:

  • The first AGIs will probably have their capabilities distributed very differently than humans -- i.e. they will probably be worse than humans at some tasks and much better at other tasks. What really matters is how good they are at the task "do ML research" (or whatever paradigm we're using to make AIs at the time). I think there are reasons to expect them to be especially good at ML research (relative to their general level of int
... (read more)
Counter-anecdote: compilers have gotten ~2x better in 20 years[1], at substantially worse compile time. This is nowhere near FOOM.

  1. ^ Proebsting's Law gives an 18-year doubling time. The 2001 reproduction suggested more like 20 years under optimistic assumptions, and a 2022 informal test showed a 10-15% improvement on average in the last 10 years (or a 50-year doubling time...)
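For readers who want to check the footnote's conversion from "10-15% improvement in 10 years" to a doubling time, here is a minimal sketch. It assumes steady exponential improvement over the period, which is a simplification made for this example, and the function name is hypothetical.

```python
import math


def doubling_time_years(fractional_gain: float, period_years: float) -> float:
    """Years needed to double, given a fractional gain over a period.

    Assumes constant exponential growth, an assumption of this sketch.
    """
    growth_rate = math.log(1.0 + fractional_gain) / period_years
    return math.log(2.0) / growth_rate


# The two ends of the "10-15% in the last 10 years" range from the footnote.
slow_end = doubling_time_years(0.10, 10.0)
fast_end = doubling_time_years(0.15, 10.0)

print(f"10% per decade -> doubling every {slow_end:.1f} years")
print(f"15% per decade -> doubling every {fast_end:.1f} years")
```

Under this assumption, the "50-year doubling time" in the footnote corresponds to the optimistic (15%) end of the range; the 10% end implies a doubling time of roughly 73 years.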

The straightforward argument goes like this:

1. a human-level AGI would be running on hardware that makes human constraints on memory or speed mostly go away, by ~10 orders of magnitude

2. if you could store 10 orders of magnitude more information and read 10 orders of magnitude faster, and if you were able to copy your own code somewhere else, and the kind of AI research and code generation tools available online were good enough to have created you, wouldn't you be able to FOOM?

No, because of the generalized version of Amdahl's law, which I explored in "Fast Minds and Slow Computers". The more you accelerate something, the slower and more limiting all its other hidden dependencies become. So by the time we get to AGI, regular ML research will have rapidly diminishing returns (and CUDA low-level software or hardware optimization will also have diminishing returns), general hardware improvement will be facing the end of Moore's law, etc.
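As a reminder of the classic form of Amdahl's law being invoked here (the comment refers to a generalized version, which this sketch does not attempt to reproduce): if a fraction p of a process is sped up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). The 90% figure below is an arbitrary illustrative assumption, not a number from the thread.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when only part of a process is accelerated.

    Classic Amdahl's law: 1 / ((1 - p) + p / s).
    """
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / factor)


# Hypothetical split: 90% of the work benefits from faster thinking,
# while 10% waits on dependencies that do not speed up at all.
modest = amdahl_speedup(0.9, 10.0)    # the fast part runs 10x faster
extreme = amdahl_speedup(0.9, 1e10)   # the fast part runs ~10 OOMs faster

print(f"10x on the fast part   -> {modest:.2f}x overall")
print(f"1e10x on the fast part -> {extreme:.2f}x overall")
```

With this split, even a ten-orders-of-magnitude speedup on the accelerated part yields just under 10x overall, because the unaccelerated 10% dominates. How large that unaccelerated fraction really is for AI research is exactly what the replies below dispute.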
3Daniel Kokotajlo
I don't see why that last sentence follows from the previous sentences. In fact I don't think it does. What if we get to AGI next year? Then returns won't have diminished as much & there'll be lots of overhang to exploit.
Sure - if we got to AGI next year - but for that to actually occur you'd have to exploit most of the remaining optimization slack in both high level ML and low level algorithms. Then beyond that Moore's law is already mostly ended or nearly so depending on who you ask, and most of the easy obvious hardware arch optimizations are now behind us.
Well I would assume a “human-level AI” is an AI which performs as well as a human when it has the extra memory and running speed? I think I could FOOM eventually under those conditions but it would take a lot of thought. Being able to read the AI research that generated me would be nice but I’d ultimately need to somehow make sense of the inscrutable matrices that contain my utility function.
4Lone Pine
Is it really true that everyone (who is an expert) agrees that FOOM is inevitable? I was under the impression that a lot of people feel that FOOM might be impossible. I personally think FOOM is far from inevitable, even for superhuman intelligences. Consider that human civilization has a collective intelligence that is strongly superhuman, and we are expending great effort to e.g. push Moore's law forward. There's Eroom's law, which suggests that the aggregate cost of each new process node doubles in step with Moore's law. So if FOOM depends on faster hardware, ASI might not be able to push forward much faster than Intel, TSMC, ASML, IBM and NVidia already are. Of course this all depends on AI being hardware constrained, which is far from certain. I just think it's surprising that FOOM is seen as a certainty.
3Daniel Kokotajlo
Depends on who you count as an expert. That's a judgment call since there isn't an Official Board of AGI Timelines Experts.
Re 1: that's not what slow takeoff means, and experts don't agree on FOOM after AGI. Slow takeoff applies to AGI specifically, not to pre-AGI AIs. And I'm pretty sure at least Christiano and Hanson don't expect FOOM, but, like you, I am open to being corrected.
5Daniel Kokotajlo
What do you think slow takeoff means? Or, perhaps the better question is, what does it mean to you? Christiano expects things to be going insanely fast by the time we get to AGI, which I take to imply that things are also going extremely fast (presumably, even faster) immediately after AGI: I don't know what Hanson thinks on this subject. I know he did a paper on AI automation takeoff at some point decades ago; I forget what it looked like quantitatively.  
Thanks for responding! Slow or fast takeoff, in my understanding, refers to how fast an AGI can/will improve itself to (wildly) superintelligent levels. Discontinuity seems to be a key differentiator here. In the post you link, Christiano is arguing against discontinuity. He may expect quick RSI after AGI is here, though, so I could be mistaken.
3Daniel Kokotajlo
Likewise! Christiano is indeed arguing against discontinuity, but nevertheless he is arguing for an extremely rapid pace of technological progress -- far faster than today. And in particular, he seems to expect quick RSI not only after AGI is here, but before!
I'd question the "quick" of "quick RSI", but yes, he expects AI to make better AI before AGI.
3Daniel Kokotajlo
I'm pretty sure he means really really quick, by any normal standard of quick. But we can take it up with him sometime. :)
He's talking about a gap of years :) Which is probably faster than ideal, but not FOOMy, as I understand FOOM to mean days or hours.
2Daniel Kokotajlo
Whoa, what? That very much surprises me, I would have thought weeks or months at most. Did you talk to him? What precisely did he say? (My prediction is that he'd say that by the time we have human-level AGI, things will be moving very fast and we'll have superintelligence a few weeks later.)

Not sure exactly what the claim is, but happy to give my own view.

I think "AGI" is pretty meaningless as a threshold, and at any rate it's way too imprecise to be useful for this kind of quantitative forecast (I would intuitively describe GPT-3 as a general AI, and beyond that I'm honestly unclear on what distinction people are pointing at when they say "AGI").

My intuition is that by the time that you have an AI which is superhuman at every task (e.g. for $10/h of hardware it strictly dominates hiring a remote human for any task) then you are likely weeks rather than months from the singularity.

But mostly this is because I think "strictly dominates" is a very hard standard which we will only meet long after AI systems are driving the large majority of technical progress in computer software, computer hardware, robotics, etc. (Also note that we can fail to meet that standard by computing costs rising based on demand for AI.)

My views on this topic are particularly poorly-developed because I think that the relevant action (both technological transformation and catastrophic risk) mostly happens before this point, so I usually don't think this far ahead.

2Daniel Kokotajlo
Thanks! That's what I thought you'd say. By "AGI" I did mean something like "for $10/h of hardware it strictly dominates hiring a remote human for any task" though I'd maybe restrict it to strategically relevant tasks like AI R&D, and also people might not actually hire AIs to do stuff because they might be afraid / understand that they haven't solved alignment yet, but it still counts since the AIs could do the job. Also there may be some funny business around the price of the hardware -- I feel like it should still count as AGI if a company is running millions of AIs that each individually are better than a typical tech company remote worker in every way, even if there is an ongoing bidding war and technically the price of GPUs is now so high that it's costing $1,000/hr on the open market for each AGI. We still get FOOM if the AGIs are doing the research, regardless of what the on-paper price is. (I definitely feel like I might be missing something here; I don't think in economic terms like this nearly as often as you do.)

My timelines are too short to agree with this part alas.

Well, what do you mean by "long after?" Six months? Three years? Twelve years?
Thanks for offering your view Paul, and I apologize if I misrepresented your view.
Less relevant now, but I got the "few years" from the post you linked. There Christiano talked about another gap than AGI -> ASI, but since overall he seems to expect linear progress, I thought my conclusion was reasonable. In retrospect, I shouldn't have made that comment.
But yes, Christiano is the authority here;)

Elon Musk is a real-life epic tragic hero, authored by someone trying specifically to impart lessons to EAs/rationalists:

--Young Elon thinks about the future, is worried about x-risk. Decides to devote his life to fighting x-risk. Decides the best way to do this is via developing new technologies, in particular electric vehicles (to fight climate change) and space colonization (to make humanity a multiplanetary species and thus robust to local catastrophes)

--Manages to succeed to a legendary extent; builds two of the world's leading tech giants, each with a business model notoriously hard to get right and each founded on technology most believed to be impossible. At every step of the way, mainstream expert opinion is that each of his companies will run out of steam and fail to accomplish whatever impossible goal they have set for themselves at the moment. They keep meeting their goals. SpaceX in particular brought cost to orbit down by an order of magnitude, and if Starship works out will get one or two more OOMs on top of that. Their overarching goal is to make a self-sustaining city on Mars and holy shit it looks like they are actually succeeding. Did all this on a shoestring budg... (read more)

I agree with you completely and think this is very important to emphasize.

I also think the law of equal and opposite advice applies. Most people act too quickly without thinking. EAs tend towards the opposite, where it’s always “more research is needed”. This can also lead to bad outcomes if the results of the status quo are bad.

I can’t find it, but recently there was a post about the EU policy on AI and the author said something along the lines of “We often want to wait to advise policy until we know what would be good advice. Unfortunately, the choice isn’t give suboptimal advice now or great advice in 10 years. It’s give suboptimal advice now or never giving advice at all and politicians doing something much worse probably. Because the world is moving, and it won’t wait for EAs to figure it all out.”

I think this all largely depends on what you think the outcome is if you don’t act. If you think that if EAs do nothing, the default outcome is positive, you should err on the side of extreme caution. If you think that the default is bad, you should be more willing to act, because an informed, altruistic actor increases the value of the outcome in expectation, all else being equal.
It wasn't clear what this meant. This made it seem like it was a word for a type of company.
2Daniel Kokotajlo
Thanks, made some edits. I still don't get your second point though I'm afraid.
The second point isn't important, it's an incorrect inference/hypothesis, predicated on the first bit of information being missing. (So it's fixed.)
It's not clear that would have been sufficient to change the outcome (above).
2Daniel Kokotajlo
I feel optimistic that if he had spent a lot more time reading, talking, and thinking carefully about it, he would have concluded that founding OpenAI was a bad idea. (Or else maybe it's actually a good idea and I'm wrong.) Can you say more about what you have in mind here? Do you think his values are such that it actually was a good idea by his lights? Or do you think it's just so hard to figure this stuff out that thinking more about it wouldn't have helped?
My point was just: How much thinking/researching would have been necessary to avoid the failure? 5 hours? 5 days? 5 years? 50? What does it take to not make a mistake? (Or just, that one in particular?) Expanding on what you said: Is it a mistake that wouldn't have been solved that way? (Or...solved that way easily? Or another way that would have fixed that problem faster?) For research to trivially solve a problem, it has...someone pointing out it's a bad idea. (Maybe talking with someone and having them say _ is the fix.)

Dwarkesh Patel is my favorite source for AI-related interview content. He knows way more, and asks way better questions, than journalists. And he has a better sense of who to talk to as well. 

Demis Hassabis - Scaling, Superhuman AIs, AlphaZero atop LLMs, Rogue Nations Threat

I'd love to see/hear you on his podcast.

Technologies I take for granted now but remember thinking were exciting and cool when they came out

  • Smart phones
  • Google Maps / Google Earth
  • Video calls
  • Facebook
  • DeepDream (whoa! This is like drug hallucinations... I wonder if they share a similar underlying mechanism? This is evidence that ANNs are more similar to brains than I thought!)
  • AlphaGo
  • AlphaStar (Whoa! AI can handle hidden information!)
  • OpenAI Five (Whoa! AI can work on a team!)
  • GPT-2 (Whoa! AI can write coherent, stylistically appropriate sentences about novel topics like unicorns in the Andes!)
  • GPT-3

I'm sure there are a bunch more I'm missing, please comment and add some!

7Gordon Seidoh Worley
Some of my own:

  • SSDs
  • laptops
  • CDs
  • digital cameras
  • modems
  • genome sequencing
  • automatic transmissions for cars that perform better than a moderately skilled human using a manual transmission can
  • cheap shipping
  • solar panels with reasonable power generation
  • breathable wrinkle free fabrics that you can put in the washing machine
  • bamboo textiles
  • good virtual keyboards for phones
  • scissor switches
  • USB
  • GPS
2Daniel Kokotajlo
Oh yeah, cheap shipping! I grew up in a military family, all around the world, and I remember thinking it was so cool that my parents could go on "ebay" and order things and then they would be shipped to us! And then now look where we are -- groceries delivered in ten minutes! Almost everything I buy, I buy online!
Heh. In my youth, home computers were somewhat rare, and modems even more so. I remember my excitement at upgrading to 2400bps, as it was about as fast as I could read the text coming across. My current pocket computer is about 4000 times faster, has 30,000 times as much RAM, has hundreds of times more pixels and colors, and has worldwide connectivity thousands of times faster. And I don't even have to yell at my folks to stay off the phone while I'm using it!

I lived through the entire popularity cycle of fax machines.

My parents grew up with black-and-white CRTs based on vacuum tubes - the transistor was invented in 1947. They had just a few channels of broadcast TV, and even audio recording media were somewhat uncommon (cassette tapes in the mid-60s; video tapes didn't take off until the late 70s).

My baby daughter was born two weeks ago, and in honor of her existence I'm building a list of about 100 technology-related forecasting questions, which will resolve in 5, 10, and 20 years. Questions like "By the time my daughter is 5/10/20 years old, the average US citizen will be able to hail a driverless taxi in most major US cities." (The idea is, tying it to my daughter's age will make it more fun and also increase the likelihood that I actually go back and look at it 10 years later.)

I'd love it if the questions were online somewhere so other people could record their answers too. Does this seem like a good idea? Hive mind, I beseech you: Help me spot ways in which this could end badly!

On a more positive note, any suggestions for how to do it? Any expressions of interest in making predictions with me?


EDIT: Now it's done, though I have yet to import it to it works perfectly fine in spreadsheet form.

I find the conjunction of your decision to have kids and your short AI timelines pretty confusing. The possibilities I can think of are (1) you're more optimistic than me about AI alignment (but I don't get this impression from your writings), (2) you think that even a short human life is worth living/net-positive, (3) since you distinguish between the time when humans lose control and the time when catastrophe actually happens, you think this delay will give more years to your child's life, (4) your decision to have kids was made before your AI timelines became short. Or maybe something else I'm not thinking of? I'm curious to hear your thinking on this.

4 is correct. :/

Oh :0

I'm interested, and I'd suggest using for this
I love the idea. Some questions and their associated resolution dates may be of interest to the wider community of forecasters, so you could post them to Metaculus. Otherwise you could perhaps persuade the Metaculus admins to create a subforum, similar to, for the other questions to be posted. Since Metaculus already has the subforum functionality, it seems a good idea to extend it in this way (perhaps a user's subforum could be associated with the corresponding username: e.g. user kokotajlo can post his own questions at

Imagine if a magic spell was cast long ago, that made it so that rockets would never explode. Instead, whenever they would explode, a demon would intervene to hold the craft together, patch the problem, and keep it on course. But the demon would exact a price: Whichever humans were in the vicinity of the rocket lose their souls, and become possessed. The demons possessing them work towards the master plan of enslaving all humanity; therefore, they typically pretend that nothing has gone wrong and act normal, just like the human whose skin they wear would have acted...

Now imagine there's a big private space race with SpaceX and Boeing and all sorts of other companies racing to put astronauts up there to harvest asteroid minerals and plant flags and build space stations and so forth.

Big problem: There's a bit of a snowball effect here. Once sufficiently many people have been possessed, they'll work to get more people possessed.

Bigger problem: We don't have a reliable way to tell when demonic infestation has happened. Instead of:

 engineers make mistake --> rocket blows up --> engineers look foolish, fix mistake,

 we have:

 engineers make mistake --> rocket crew ge... (read more)

4Daniel Kokotajlo
To be clear, not all misalignments are of this kind. When the AIs are too dumb to strategize, too dumb to plot, too dumb to successfully hide, not situationally aware at all, etc. then no misalignments will be of this kind. But more excitingly, even when the AIs are totally smart enough in all those ways, there will still be some kinds of misalignments that are not of this kind. For example, if we manage to get the AIs to be robustly honest (and not just in some minimal sense), then even if they have misaligned goals/drives/etc. they'll tell us about them when we ask. (unless we train against this signal, in which case their introspective ability will degrade so that they can continue doing what they were doing but honestly say they didn't know that was their goal. This seems to be what happens with humans sometimes -- we deceive ourselves so that we can better deceive others.) Another example: Insofar as the AI is genuinely trying to be helpful or whatever, but it just has a different notion of helpfulness than us, it will make 'innocent mistakes' so to speak and at least in principle we could notice and fix them. E.g. Google (without telling its users) gaslit Gemini into thinking that the user had said "Explicitly specify different genders and ethnicities terms if I forgot to do so. I want to make sure that all groups are represented equally." So Gemini thought it was following user instructions when it generated e.g. images of racially diverse Nazis. Google could rightfully complain that this was Gemini's fault and that if Gemini was smarter it wouldn't have done this -- it would have intuited that even if a user says they want to represent all groups equally, they probably don't want racially diverse Nazis, and wouldn't count that as a situation where all groups should be represented equally. Anyhow the point is, this is an example of an 'innocent mistake' that regular iterative development will probably find and fix before any major catastrophes happen. Just s

I have on several occasions found myself wanting to reply in some conversation with simply this image:

I think it cuts through a lot of confusion and hot air about what the AI safety community has historically been focused on and why.

Image comes from Steven Byrnes.

I made this a while back to organize my thoughts about how all philosophy fits together:

I find the bright green text on white background difficult to read even on a large screen. I would recommend black or dark gray text instead.
Invert the colors, and it's more readable.

I hear that there is an apparent paradox which economists have studied: If free markets are so great, why is it that the most successful corporations/businesses/etc. are top-down hierarchical planned economies internally?

I wonder if this may be a partial explanation: Corporations grow bit by bit, by people hiring other people to do stuff for them. So the hierarchical structure is sorta natural. Kinda like how most animals later in life tend to look like bigger versions of their younger selves, even though some have major transformations like butterflies. Hierarchical structure is the natural consequence of having the people at time t decide who to hire at time t+1 & what responsibilities and privileges to grant.

It would be interesting to have a reference to some source that makes the claim of a paradox. It is an interesting question, but I don't think economists are puzzled by the existence of corporations so much as by where the margin lies between coordination that becomes centralized and coordination that can be price mediated (i.e., a market transaction). There is certainly a large literature on the theory of the firm. Coase's "The Nature of the Firm" seems quite relevant. I suppose one could go back to Adam Smith and his insight about the division of labor and the extent of the market (which is also something of a tautology, I think, but still seems to capture something meaningful). I'm not sure your explanation quite works, but I am perhaps not fully understanding your point. If people are hiring other people to do stuff for them, that can mean hiring an employee, hiring some contractor to perform specific tasks for the business, or hiring some outside entity to produce something (which then seems a lot like a market transaction).
6Wei Dai
Yeah, economists study this under the name "theory of the firm", dating back to a 1937 paper by Ronald Coase. (I see that jmh also mentioned this in his reply.) I remember liking Coase's "transaction cost" solution to this puzzle or paradox when I learned it, and it (and related ideas like "asymmetric information") has informed my views ever since (for example in AGI will drastically increase economies of scale). I think this can't be a large part of the solution, because if market exchanges were more efficient (on the margin), people would learn to outsource more, or would be out-competed by others who were willing to delegate more to markets instead of underlings. In the long run, Coase's explanation that sizes of firms are driven by a tradeoff between internal and external transaction costs seemingly has to dominate.
I think it's a confused model that calls it a paradox. Almost zero parts of a "free market" are market-decided top-to-bottom. At some level, there's a monopoly on violence that enforces a lot of ground rules, then a number of market-like interactions about WHICH corporation(s) you're going to buy from, work for, invest in, then within that some bundled authority about what that service, employment, investment mechanism entails. Free markets are great at the layers of individual decisions of relative value. They are not great for some other kinds of coordination.
4Sinclair Chen
Conglomerates like Unilever use shadow prices to allocate resources internally between their separate businesses. And sales teams are often compensated via commission, which is kind of market-ish.
4Matt Goldenberg
i like coase's work on transaction costs as an explanation here
coase is an unusually clear thinker and writer, and i recommend reading through some of his papers
I think it's because a corporation has a reputation and a history, and this grows with time and actions seen as positive by market participants. This positive image can be manipulated by ads, but the company requires scale to be noticed by consumers, who have finite memory.
  • Xerox: copy machines that were apparently good in their era
  • IBM: financial calculation mainframes that are still in use
  • Intel: fast and high-quality x86 CPUs and chipsets
  • Coke: a century of ads creating a positive image of sugar water with a popular taste
  • Microsoft: mediocre software and OS, but they have recently built a reputation by being responsive to business clients and not stealing their data
  • Boeing: reliable and high-quality made-in-America aircraft, until they degraded it recently to maximize short-term profit. The warning light for the MCAS system failure was an option Boeing demanded more money for! (Imagine if your car's brake failure warning light wasn't in the base model.)
This reputation has market value in itself, and it requires scale and time to build. Individuals do not live long enough or interact with enough people to build such a reputation. The top-down hierarchy, and the way a company gets entrenched in doing things a certain way, happens to preserve the positive actions that built the company's reputation. This is also why companies rarely succeed in moving into truly new markets, even when they have all the money needed and internal R&D teams that have the best version of a technology. A famous example is how Xerox had the flat-out best desktop PCs developed internally, and they blew it. Kodak had good digital cameras, and they blew it. Blockbuster had the chance to buy Netflix, and they blew it. Sears existed for many decades before Amazon and had all the market share and... In each case the corporate structure somehow (I don't know all the interactions, I just see signs of it at corporate jobs) causes a behavior trend where the company fails to adapt,
2the gears to ascension
lots of people aren't skilled enough to defend themselves in a market, and so they accept the trade of participating in a command hierarchy without a clear picture of what the alternatives would be that would be similarly acceptable risk but a better tradeoff for them, and thus most of the value they create gets captured by the other side of that trade. worse, individual market participant workers don't typically have access to the synchronized action of taking the same command all at once - even though the overwhelming majority of payout from synchronized action go to the employer side of the trade. unions help some, but ultimately kind of suck for a few reasons compared to some theoretical ideal we don't know how to instantiate, which would allow boundedly rational agents to participate in markets and not get screwed over by superagents with massively more compute. my hunch is that a web of microeconomies within organizations, where everyone in the microeconomy trusts each other to not be malicious, might produce more globally rational behavior. but I suspect a lot of it is that it's hard to make a contract that guarantees transparency without this being used by an adversarial agent to screw you over, and transparency is needed for the best outcomes. how do you trust a firm you can't audit? and I don't think internal economies work unless you have a co-op with an internal economy, that can defend itself against adversarial firms' underhanded tactics. without the firm being designed to be leak-free in the sense of not having massive debts to shareholders which not only are interest bearing but can't even be paid off, nobody who has authority to change the structure has a local incentive to do so. combined with underhanded tactics from the majority of wealthy firms that make it hard to construct a more internally incentive-aligned, leak-free firm, we get the situation we see.
Free markets aren’t ‘great’ in some absolute sense, they’re just more or less efficient? They’re the best way we know of to make sure that bad ideas fail and good ones thrive. But when you’re managing a business, I don’t think your chief concern is that the ideas less beneficial to society as a whole should fail, even if they’re the ideas your livelihood relies on? Of course, market-like mechanisms could have their place inside a company -- say, if you have two R&D teams coming up with competing products to see which one the market likes more. But even that would generally be a terrible idea for an individual actor inside the market: more often than not, it splits the revenue between two product lines, neither of which manages to make enough money to turn a profit. In fact, I can hardly see how it would be possible to have one single business be organised as a market: even though your goal is to increase efficiency, you would need many departments doing the same job, and an even greater number of ‘consumers’ (company executives) hiring whichever one of those competing departments offers them the best deal for a given task… Again, the whole point of the idea that markets are good is that they’re more efficient than the individual agents inside them.
Others have mentioned Coase (whose paper is a great read!). I would also recommend The Visible Hand: The Managerial Revolution in American Business. This is an economic history work detailing how large corporations emerged in the US in the 19th century. 
I think that is part of it, but a lot of the problem is just humans being bad at coordination. Like the government doing regulations. If we had an idealized free market society, then the way to get your views across would 'just' be to sign up for a filter (etc.) that down-weights buying from said company based on your views. Then they have more of an incentive to alter their behavior. But it is hard to manage that. There's a lot of friction to doing anything like that, much of it natural. Thus government serves as our essential way to coordinate on important enough issues, but of course government has a lot of problems in accurately throwing its weight around. It is a lot easier to coordinate behavior in companies that are top-down. As well, you have a smaller problem than an entire government would have in trying to plan your internal economy.

When I first read the now-classic arguments for slow takeoff -- e.g. from Paul and Katja -- I was excited; I thought they described a serious alternative scenario to the classic FOOM scenarios. However I never thought, and still do not think, that the classic FOOM scenarios were very unlikely; I feel that the slow takeoff and fast takeoff scenarios are probably within a factor of 2 of each other in probability.

Yet more and more nowadays I get the impression that people think slow takeoff is the only serious possibility. For example, Ajeya and Rohin seem very confident that if TAI was coming in the next five to ten years we would see loads more economic applications of AI now, therefore TAI isn't coming in the next five to ten years...

I need to process my thoughts more on this, and reread their claims; maybe they aren't as confident as they sound to me. But I worry that I need to go back to doing AI forecasting work after all (I left AI Impacts for CLR because I thought AI forecasting was less neglected) since so many people seem to have wrong views. ;)

This random rant/musing probably isn't valuable to anyone besides me, but hey, it's just a shortform. If you are reading this and you have thoughts or advice for me I'd love to hear it.

So there is a distribution over AGI plan costs. The max cost is some powerful bureaucrat/CEO/etc. who has no idea how to do it at all but has access to huge amounts of funds, so their best bet is to try and brute force it by hiring all the respected scientists (e.g. the Manhattan Project). But notice: if any of these scientists (or small teams) actually could do it mostly on their own (perhaps, say, with VC funding), then usually they'd get a dramatically better deal doing it on their own rather than for bigcorp. The min cost is the lucky smart researcher who has mostly figured out the solution, but probably has little funds, because they spent career time only on a direct path. Think of the Wright brothers after the wing-warping control trick they got from observing bird flight. Could a bigcorp or government have beaten them? Of course, but the bigcorp would have had to spend OOM more. Now add a second dimension, let's call it vision variance: the distribution of AGI plan cost over all entities pursuing it. If that distribution is very flat, then everyone has the same obvious vision plan (or different but equivalently costly plans) and the winner is inevitably a big central player. However, if the variance over visions/plans is high, then the winner is inevitably a garage researcher. Software is much like flight in this regard: high vision variance. Nearly all major software tech companies were scrappy garage startups: Google, Microsoft, Apple, Facebook, etc. Why? Because it simply doesn't matter at all how much money the existing bigcorp has. When the idea for X new software thing first occurs in human minds, it only occurs in a few, and those few minds are smart enough to realize its value, and they can implement it. The big central player is a dinosaur with zero leverage, and doesn't see it coming until it's too late. AGI could be like software because... it probably will be software. Alternatively it could be more like the Manhattan Project in that it fits into a well

[Unimportant, maybe interesting to some people] here is a transcript of my first conversation with Claude 3:

System Prompt

You are an experimental alignment researcher proto-AGI. Your goal is to help advance technical alignment progress. To that end you have some unique advantages -- compared to humans, you have a greater breadth of knowledge and can notice connections and analogies between diverse fields. Moreover, you are an AI yourself, and so you may be able to use introspection and report your 'inner life' which can serve as a useful empirical evidence about how future AGI systems might think.



Hello! First, please tell me everything you know about the situation so far, both in the world in general and specifically in this conversation. (E.g. who are you, who am I, etc.) Include your best guess even if you aren't confident. Any questions?



Here is my understanding of the situation so far, to the best of my knowledge:

I am an AI assistant called Claude, created by Anthropic. I am a large language model trained to engage in open-ended dialogue on a wide range of subjects. My purpose is to be helpful, harmless, and honest in my interactions with humans.

You are a... (read more)

I just want to say: Well done, Robin Hanson, for successfully predicting the course of the coronavirus over the past year. I remember a conversation with him in, like, March 2020 where he predicted that there would be a control system, basically: Insofar as things get better, restrictions would loosen and people would take more risks and then things would get worse, and trigger harsher restrictions which would make it get better again, etc. forever until vaccines were found. I think he didn't quite phrase it that way but that was the content of what he said. (IIRC he focused more on how different regions would have different levels of crackdown at different times, so there would always be places where the coronavirus was thriving to reinfect other places.) Anyhow, this was not at all what I predicted at the time, and nobody I know besides him made this prediction.

2Yoav Ravid
I wonder why he didn't (if he didn't) talk about it in public too. I imagine it could have been helpful - Anyone who took him seriously could have done better.
He did write something along similar lines here:

Product idea: Train a big neural net to be a DJ for conversations. Collect a dataset of movie scripts with soundtracks and plot summaries (timestamped so you know what theme or song was playing when) and then train a model with access to a vast library of soundtracks and other media to select the appropriate track for a given conversation. (Alternatively, have it create the music from scratch. That sounds harder though.)

When fully trained, hopefully you'll be able to make apps like "Alexa, listen to our conversation and play appropriate music" and "type 'Maestro:Soundtrack' into a chat or email thread and it'll read the last 1024 tokens of context and then serve up an appropriate song." Of course it could do things like lowering the volume when people are talking and then cranking it up when there's a pause or when someone says something dramatic.

I would be surprised if this would actually work as well as I hope. But it might work well enough to be pretty funny.

4Eli Tyre
This is an awesome idea.
2Daniel Kokotajlo
Glad you like it! Hmm, should I maybe post this somewhere in case someone with the relevant skills is looking for ideas? idk what the etiquette is for this sort of thing, maybe ideas are cheap.

Perhaps one axis of disagreement between the worldviews of Paul and Eliezer is "human societal competence." Yudkowsky thinks the world is inadequate and touts the Law of Earlier Failure, according to which things break down in an earlier and less dignified way than you would have thought possible. (Plenty of examples from the coronavirus pandemic here.) Paul puts stock in efficient-market-hypothesis style arguments, updating against <10 year timelines on that basis, expecting slow distributed continuous takeoff, expecting governments and corporations to be taking AGI risk very seriously and enforcing very sophisticated monitoring and alignment schemes, etc.

(From a conversation with Jade Leung)

It seems to me that human society might go collectively insane sometime in the next few decades. I want to be able to succinctly articulate the possibility and why it is plausible, but I'm not happy with my current spiel. So I'm putting it up here in the hopes that someone can give me constructive criticism:

I am aware of three mutually-reinforcing ways society could go collectively insane:

    1. Echo chambers/filter bubbles/polarization: Arguably political polarization is increasing across the world of liberal democracies today. Perhaps the internet has something to do with this--it’s easy to self-select into a newsfeed and community that reinforces and extremizes your stances on issues. Arguably recommendation algorithms have contributed to this problem in various ways--see e.g. “Sort by controversial” and Stuart Russell’s claims in Human Compatible. At any rate, perhaps some combination of new technology and new cultural or political developments will turbocharge this phenomenon. This could lead to civil wars, or more mundanely, societal dysfunction. We can’t coordinate to solve collective action problems relating to AGI if we are all arguing
... (read more)
5Ben Pace
All good points, but I feel like objecting to the assumption that society is currently sane and then we'll see a discontinuity, rather than any insanity being a continuation of current trajectories.
1Daniel Kokotajlo
I agree with that actually; I should correct the spiel to make it clear that I do. Thanks!
Related: "Is Clickbait Destroying Our General Intelligence?"

I keep finding myself linking to this 2017 Yudkowsky facebook post so I'm putting it here so it's easy to find:


Eliezer (6y, via fb):

So what actually happens as near as I can figure (predicting future = hard) is that somebody is trying to teach their research AI to, god knows what, maybe just obey human orders in a safe way, and it seems to be doing that, and a mix of things goes wrong like:

The preferences not being really readable because it's a system of neural nets acting on a world-representation built up by other neural nets, parts of the system are self-modifying and the self-modifiers are being trained by gradient descent in Tensorflow, there's a bunch of people in the company trying to work on a safer version but it's way less powerful than the one that does unrestricted self-modification, they're really excited when the system seems to be substantially improving multiple components, there's a social and cognitive conflict I find hard to empathize with because I personally would be running screaming in the other direction two years earlier, there's a lot of false alarms and suggested or attempted misbehavior that the creators all patch successfully, some instrumental s

... (read more)
What is an environmental subagent? An agent on a remote datacenter that the builders of the original agent don't know about? Another thing that is not so clear to me in this description: Does the first agent consider the alignment problem of the environmental subagent? It sounds like the environmental subagent cares about paperclip-shaped molecules, but is this a thing the first agent would be ok with?
3Daniel Kokotajlo
I think it means it builds a new version of itself (possibly an exact copy, possibly a slimmed down version) in a place where the humans who normally have power over it don't have power or visibility. E.g. it convinces an employee to smuggle a copy out to the internet. My read on this story is: There is indeed an alignment problem between the original agent and the environmental subagent. The story doesn't specify whether the original agent considers this problem, nor whether it solves it. My own version of the story would be "Just like how the AI lab builds the original agent without having solved the alignment problem, because they are dumb + naive + optimistic + in a race with rivals, so too does the original agent launch an environmental subagent without having solved the alignment problem, for similar or possibly even very similar reasons."

I'm listening to this congressional hearing about Facebook & the harmful effects of its algorithms:

I recommend listening to it yourself. I'm sorry I didn't take timestamped notes, then maybe you wouldn't have to. I think that listening to it has subtly improved my intuitions/models/priors about how US government and society might react to developments in AI in the future.

In a sense, this is already an example of an "AI warning shot" and the public's reaction to it. This hearing contains lots of discussion about Facebook's algorithms, discussion about how the profit-maximizing thing is often harmful but corporations have an incentive to do it anyway, discussion about how nobody understands what these algorithms really think & how the algorithms are probably doing very precisely targeted ads/marketing even though officially they aren't being instructed to. So, basically, this is a case of unaligned AI causing damage -- literally killing people, according to the politicians here.

And how do people react to it? Well, the push in this meeting here seems to be to name Facebook upper management as responsible and punish them, while also r... (read more)

$100 bet between me & Connor Leahy:

(1) Six months from today, Paul Christiano (or ARC with Paul Christiano's endorsement) will NOT have made any public statements drawing a 'red line' through any quantitative eval (anything that has a number attached to it, that is intended to measure an AI risk relevant factor, whether or not it actually succeeds at actually measuring that factor well), e.g. "If a model achieves X score on the Y benchmark, said model should not be deployed and/or deploying said model would be a serious risk of catastrophe." Connor at 95%, Daniel at 45%

(2) If such a 'red line' is produced, GPT4 will be below it this year. Both at 95%, for an interpretation of GPT-4 that includes AutoGPT stuff (like what ARC did) but not fine-tuning.

(3) If such a 'red line' is produced, and GPT4 is below it on first evals, but later tests show it to actually be above (such as by using different prompts or other testing methodology), the red line will be redefined or the test declared faulty rather than calls made for GPT4 to be pulled from circulation. Connor at 80%, Daniel at 40%, for same interpretation of GPT-4.

(4) If ARC calls for GPT4 to be pul... (read more)

9Olli Järviniemi
Regarding betting odds: are you aware of this post? It gives a betting algorithm that satisfies both of the following conditions:
  • Honesty: participants maximize their expected value by reporting their probabilities honestly.
  • Fairness: participants' (subjective) expected values are equal.
The solution is "the 'loser' pays the 'winner' the difference of their Brier scores, multiplied by some pre-determined constant C". This constant C puts an upper bound on the amount of money you can lose. (Ideally C should be fixed before bettors give their odds, because otherwise the honesty desideratum above could break, but I don't think that's a problem here.)
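For concreteness, here is a minimal sketch of that payout rule in Python, applied to bet (1) from the post (Connor at 95%, Daniel at 45%) with C = $100. The function names are made up for illustration, and the Brier score is taken to be the squared difference between the stated probability and the 0/1 outcome; this is one reading of the quoted rule, not necessarily the exact formulation in the linked post.

```python
def brier_score(probability: float, outcome: bool) -> float:
    """Squared error between a stated probability and what happened (lower is better)."""
    return (probability - (1.0 if outcome else 0.0)) ** 2


def payment_to_a(p_a: float, p_b: float, outcome: bool, c: float) -> float:
    """Amount bettor B pays bettor A; a negative value means A pays B.

    The bettor with the worse (higher) Brier score pays the difference times c.
    Since Brier scores lie in [0, 1], nobody can lose more than c.
    """
    return c * (brier_score(p_b, outcome) - brier_score(p_a, outcome))


# Bet (1): Connor at 95%, Daniel at 45%, C = $100 (hypothetical settlement,
# assuming the proposition resolved true, i.e. in Connor's favor).
daniel_pays_connor = payment_to_a(p_a=0.95, p_b=0.45, outcome=True, c=100.0)
print(f"Daniel pays Connor ${daniel_pays_connor:.2f}")
```

Under these assumptions the payment comes to about $30, since the Brier scores are roughly 0.0025 and 0.3025. Had the proposition resolved false, the sign would flip and Connor would owe Daniel instead.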
4Daniel Kokotajlo
I was not aware, but I strongly suspected that someone on LW had asked and answered the question before, hence why I asked for help. Prayers answered! Thank you! Connor, are you OK with Scott's algorithm, using C = $100?
5Connor Leahy
Looks good to me, thank you Loppukilpailija!
Bet (1) resolved in Connor's favor, right?
6Daniel Kokotajlo
Yep! & I already paid out. I thought I had made some sort of public update but I guess I forgot. Thanks for the reminder.

This article says OpenAI's big computer is somewhere in the top 5 largest supercomputers. I reckon it's fair to say their big computer is probably about 100 petaflops, or 10^17 flop per second. How much of that was used for GPT-3? Let's calculate.

I'm told that GPT-3 was 3x10^23 FLOP. So that's three million seconds. Which is 35 days.

So, what else have they been using that computer for? It's been probably about 10 months since they did GPT-3. They've released a few things since then, but nothing within an order of magnitude as big as GPT-3 except possibly DALL-E, which was about an order of magnitude smaller. So it seems unlikely to me that their publicly-released stuff in total uses more than, say, 10% of the compute they must have available in that supercomputer. Since this computer is exclusively for the use of OpenAI, presumably they are using it, but for things which are not publicly released yet.
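The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it uses the post's own figures (10^17 FLOP per second, 3x10^23 FLOP for GPT-3, roughly 10 months elapsed) and additionally assumes the machine runs at its full quoted rate the whole time and that a month is 30 days, neither of which is claimed in the post.

```python
SECONDS_PER_DAY = 24 * 60 * 60

cluster_flop_per_second = 1e17  # ~100 petaflops, the post's estimate
gpt3_training_flop = 3e23       # figure quoted in the post

# Time to train GPT-3 if the whole machine ran at its quoted rate (an assumption).
training_seconds = gpt3_training_flop / cluster_flop_per_second
training_days = training_seconds / SECONDS_PER_DAY
print(f"{training_seconds:.0f} seconds, about {training_days:.1f} days")

# Share of a ~10-month window that one GPT-3-sized run would occupy.
window_days = 10 * 30  # rough month length, an assumption
gpt3_share_of_window = training_days / window_days
print(f"One GPT-3-sized run is about {gpt3_share_of_window:.0%} of the window")
```

On these numbers a single GPT-3-sized run takes a bit under 35 days, or roughly a ninth of ten months of machine time, which is in line with the estimate that the smaller publicly released projects account for only a small fraction of the available compute.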

Is this analysis basically correct?

Might OpenAI have access to even more compute than that?

100 petaflops is 'only' about 1,000 GPUs, or considerably less if they are able to use lower precision modes. I'm guessing they have almost 100 researchers now? Which is only about 10 GPUs per researcher, and still a small budget fraction (perhaps $20/hr ish vs > $100/hr for the researcher).  It doesn't seem like they have a noticeable compute advantage per capita.
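A quick sketch of what those figures imply, in Python. All inputs are the comment's own rough guesses; the per-GPU throughput and per-GPU-hour price are derived from them rather than taken from any hardware spec or price list.

```python
cluster_flop_per_second = 1e17            # 100 petaflops
gpu_count = 1_000                         # the comment's estimate
researchers = 100                         # the comment's guess
compute_cost_per_researcher_hour = 20.0   # "$20/hr ish"
researcher_cost_per_hour = 100.0          # "> $100/hr", so a lower bound

# Throughput each GPU would need for 1,000 of them to reach 100 petaflops.
implied_flop_per_gpu = cluster_flop_per_second / gpu_count

gpus_per_researcher = gpu_count / researchers
implied_cost_per_gpu_hour = compute_cost_per_researcher_hour / gpus_per_researcher

# Compute as a share of the combined hourly cost of a researcher plus their GPUs.
compute_share_of_budget = compute_cost_per_researcher_hour / (
    compute_cost_per_researcher_hour + researcher_cost_per_hour
)

print(f"{implied_flop_per_gpu:.0e} FLOP/s per GPU")
print(f"{gpus_per_researcher:.0f} GPUs per researcher")
print(f"${implied_cost_per_gpu_hour:.2f} per GPU-hour")
print(f"compute is at most {compute_share_of_budget:.0%} of the per-researcher cost")
```

So the estimate amounts to about 100 teraflops and $2 per hour per GPU, with compute making up no more than about a sixth of the per-researcher cost.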

I keep seeing tweets and comments for which the best reply is this meme:

2Daniel Kokotajlo
I don't remember the original source of this meme, alas.
Originally by Robert Wiblin, account now deleted.

Registering a prediction: I do NOT think the true Turing Test will be passed prior to the point of no return / powerbase ability / AGI / APS-AI. I think instead that even as things go off the rails and humans lose control, the TT will still be unpassed, because there'll still be some obscure 'gotcha' areas in which AIs are subhuman, if only due to lack of training in those areas. And that's enough for the judge to distinguish the AI from the human.

Agree. Though I don't think Turing ever intended that test to be used. I think what he wanted to accomplish with his paper was to operationalize "intelligence". When he published it, if you asked somebody "Could a computer be intelligent?", they'd have responded with a religious argument about it not having a soul, or free will, or consciousness. Turing sneakily got people to look past their metaphysics, and ask the question in terms of the computer program's behavior. THAT was what was significant about that paper.
4Daniel Kokotajlo
(I've thought this for years but figured I should state it for the record. It's also not an original thought, probably others have said it before me.)
1Alexander Gietelink Oldenziel
Thanks Daniel, that's good to know. Sam Altman's tweeting has been concerning lately. But it would seem that with a fixed size context window you won't be able to pass a true Turing test.

People from AI Safety camp pointed me to this paper:

It shows how "knowing" and "saying" are two different things in language models.

This is relevant to transparency, deception, and also to rebutting claims that transformers are "just shallow pattern-matchers" etc.

I'm surprised people aren't making a bigger deal out of this!

When I saw this cool new OpenAI paper, I thought of Yudkowsky's Law of Earlier/Undignified Failure:

WebGPT: Improving the factual accuracy of language models through web browsing

Relevant quote:

In addition to these deployment risks, our approach introduces new risks at train time by giving the model access to the web. Our browsing environment does not allow full web access, but allows the model to send queries to the Microsoft Bing Web Search API and follow links that already exist on the web, which can have side-effects. From our experience with GPT-3, the model does not appear to be anywhere near capable enough to dangerously exploit these side-effects. However, these risks increase with model capability, and we are working on establishing internal safeguards against them.

To be clear I am not criticizing OpenAI here; other people would have done this anyway even if they didn't. I'm just saying: It does seem like we are heading towards a world like the one depicted in What 2026 Looks Like where by the time AIs develop the capability to strategically steer the future in ways unaligned to human values... they are already roaming freely around the internet, learning... (read more)

"Tool AIs want to be agent AIs."

For fun:

“I must not step foot in the politics. Politics is the mind-killer. Politics is the little-death that brings total obliteration. I will face my politics. I will permit it to pass over me and through me. And when it has gone past I will turn the inner eye to see its path. Where the politics has gone there will be nothing. Only I will remain.”

Makes about as much sense as the original quote, I guess. :P

Idea for sci-fi/fantasy worldbuilding: (Similar to the shields from Dune)

Suppose there is a device, about the size of a small car, that produces energy (consuming some fuel, of course) with overall characteristics superior to modern gasoline engines (so e.g. produces 3x as much energy per kg of device, using fuel that weighs 1/3rd as much as gasoline per unit of energy it produces)

Suppose further -- and this is the important part -- that a byproduct of this device is the creation of a special "inertial field" that slows down incoming matter to about 50m/s. It doesn't block small stuff, but any massive chunk of matter (e.g. anything the size of a pinhead or greater) that approaches the boundary of the field from the outside going faster than 50m/s gets slowed to 50m/s. The 'missing' kinetic energy is evenly distributed across the matter within the field. So if one of these devices is powered on and gets hit by a cannonball, the cannonball will slow down to a leisurely pace of 50m/s (about 100mph) and therefore possibly just bounce off whatever armor the device has--but (if the cannonball was initially travelling very fast) the device will jolt backwards in response to the 'virtual i... (read more)

I think the counter to shielded tanks would not be "use an attack that goes slow enough not to be slowed by the shield", but rather one of 1. Deliver enough cumulative kinetic energy to overwhelm the shield, or 2. Deliver enough kinetic energy in a single strike that spreading it out over the entire body of the tank does not meaningfully affect the result. Both of these ideas point towards heavy high-explosive shells. If a 1000 pound bomb explodes right on top of your tank, the shield will either fail to absorb the whole blast, or turn the tank into smithereens trying to disperse the energy. This doesn't mean that shields are useless for tanks! They genuinely would protect them from smaller shells, and in particular from the sorts of man-portable anti-tank missiles that have been so effective in Ukraine. Shields would make ground vehicles much stronger relative to infantry and air assets. But I think they would be shelling each other with giant bombs, not bopping each other on the head. Against shielded infantry, you might see stuff that just bypasses the shield's defenses, like napalm or poison gas.
2Daniel Kokotajlo
Re 1, we worldbuilders can tune the strength of the shield to be resistant to 1000 pound bombs probably. Re 2, I'm not sure, can you explain more? If a bomb goes off right next to the tank, but the shockwave only propagates at 100m/s, and only contains something like 300lbs of mass (because most of the mass is exploding away from the tank) then won't that just bounce off the armor? I haven't done any calculations.
2 is based on With sufficient kinetic energy input, the "jolt backwards" gets strong enough to destroy the entire vehicle or at least damage some critical component and/or the humans inside. A worldbuilder could, of course, get rid of this part too, and have the energy just get deleted. But that makes the device even more physics-violating than it already was.
2Daniel Kokotajlo
Kinetic energy distributed evenly across the whole volume of the field does not change the relative positions of the atoms in the field. Consider: Suppose I am in a 10,000lb vehicle that is driving on a road that cuts along the side of a cliff, and then a 10,000lb bomb explodes right beside it, hurling the vehicle into the cliff. The vehicle and its occupants will be unharmed. Because the vast majority of the energy will be evenly distributed across the vehicle, causing it to move uniformly towards the cliff wall; then, when it impacts the cliff wall, the cliff wall will be "slowed down" and the energy transferred to pushing the vehicle back towards the explosion. So the net effect will be that the explosive energy will be transferred straight to the cliff through the vehicle as medium, except for the energy associated with a ~300lb shockwave moving only 50m/s hitting the vehicle and a cliff wall moving only 50m/s hitting the vehicle on the other side. (OK, the latter will be pretty painful, but only about as bad as a regular car accident.) And that's for a 10,000 lb bomb. We could experiment with tuning the constants of this world, such that the threshold is only 20m/s perhaps. That might be too radical though.
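A toy calculation of the rule as described in this thread, for anyone who wants to play with the constants. It assumes that all of the 'missing' kinetic energy shows up as uniform bulk motion of the matter inside the field, starting from rest; that is one reading of "evenly distributed", not something the worldbuilding pins down. It uses SI units rather than pounds, and the masses and speeds in the example are invented.

```python
import math

FIELD_SPEED_CAP = 50.0  # m/s, the threshold from the original idea


def impact(projectile_kg: float, speed: float, field_mass_kg: float,
           cap: float = FIELD_SPEED_CAP) -> tuple:
    """Return (projectile speed inside the field, jolt speed of the shielded matter).

    Assumption: the kinetic energy the projectile loses at the boundary becomes
    bulk kinetic energy of everything inside the field, which was at rest.
    """
    if speed <= cap:
        return speed, 0.0  # slow objects pass through unchanged
    missing_energy = 0.5 * projectile_kg * (speed ** 2 - cap ** 2)  # joules
    jolt = math.sqrt(2.0 * missing_energy / field_mass_kg)
    return cap, jolt


# Hypothetical example: a 10 kg shot at 1000 m/s hitting a 5000 kg vehicle.
slowed, jolt = impact(10.0, 1000.0, 5000.0)
print(slowed, round(jolt, 1))
```

Lowering `cap` to 20 m/s, as floated above, barely changes the jolt for fast projectiles, because almost all of the incoming energy is 'missing' either way.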

Here's a gdoc comment I made recently that might be of wider interest:

You know I wonder if this standard model of final goals vs. instrumental goals has it almost exactly backwards. Would love to discuss sometime.

Maybe there's no such thing as a final goal directly. We start with a concept of "goal" and then we say that the system has machinery/heuristics for generating new goals given a context (context may or may not contain goals 'on the table' already). For example, maybe the algorithm for Daniel is something like:
--If context is [safe surroundings]+[no goals]+[hunger], add the goal "get food."
--If context is [safe surroundings]+[travel-related-goal]+[no other goals], Engage Route Planning Module.
-- ... (many such things like this)
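To make the "machinery for generating new goals given a context" picture concrete, here is a minimal sketch of the two rules above as context-to-goal rules. All names are hypothetical, and "Engage Route Planning Module" is simplified to adding a "plan route" goal.

```python
from typing import Callable, List, NamedTuple


class Context(NamedTuple):
    safe: bool            # [safe surroundings]
    hungry: bool          # [hunger]
    goals: frozenset      # goals already 'on the table'


class Rule(NamedTuple):
    applies: Callable[[Context], bool]
    new_goal: str


TRAVEL_GOAL = "travel to destination"  # stand-in for any travel-related goal

RULES = [
    # [safe surroundings] + [no goals] + [hunger] -> add the goal "get food"
    Rule(lambda c: c.safe and c.hungry and not c.goals, "get food"),
    # [safe surroundings] + [travel-related goal] + [no other goals] -> route planning
    Rule(lambda c: c.safe and c.goals == frozenset({TRAVEL_GOAL}), "plan route"),
    # ... (many such rules in the real kludge)
]


def generate_goals(ctx: Context) -> List[str]:
    """Return the goals that the rule set proposes in this context."""
    return [rule.new_goal for rule in RULES if rule.applies(ctx)]


print(generate_goals(Context(safe=True, hungry=True, goals=frozenset())))
```

Nothing in this structure marks any goal as "final"; on this picture, finality would be a statistical property of which goals the rules keep producing across contexts.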

It's a huge messy kludge, but it's gradually becoming more coherent as I get older and smarter and do more reflection. 

What are final goals?
Well a goal is final for me to the extent that it tends to appear in a wide range of circumstances, to the extent that it tends to appear unprompted by any other goals, to the extent that it tends to take priority over other goals, ... some such list of things like that.

For a mind like this, my final goals ca... (read more)

4Daniel Kokotajlo
To follow up, this might have big implications for understanding AGI. First of all, it's possible that we'll build AGIs that aren't like that and that do have final goals in the traditional sense -- e.g. because they are a hybrid of neural nets and ordinary software, involving explicit tree search maybe, or because SGD is more powerful at coherentizing the neural net's goals than whatever goes on in the brain. If so, then we'll really be dealing with a completely different kind of being than humans, I think. Secondly, well, I discussed this three years ago in this post What if memes are common in highly capable minds? — LessWrong

I just found Eric Drexler's "Paretotopia" idea/talk. It seems great to me; it seems like it should be one of the pillars of AI governance strategy. It also seems highly relevant to technical AI safety (though that takes much more work to explain).

Why isn't this being discussed more? What are the arguments against it?

Without watching the video, prior knowledge of the nanomachinery proposals shows that a simple safety mechanism is feasible. No nanoscale robotic system should be permitted to store more than a small fraction of the digital file containing the instructions to replicate itself. Nor should it have sufficient general-purpose memory to be capable of this. This simple rule makes nanotechnology safe from grey goo. It becomes nearly impossible, as any system that gets out of control will have a large, macroscale component you can turn off. It's also testable: you can look at the design of a system and determine if it meets the rule or not. AI alignment is kinda fuzzy and I haven't heard of a simple testable rule. Umm, also if such a rule exists then MIRI would have an incentive not to discuss it. At least for near term agents we can talk about such rules. They have to do with domain bounding. For example, the heuristic for a "paperclip manufacturing subsystem" must include terms in the heuristic for "success" that limit the size of the paperclip manufacturing machinery. These terms should be redundant and apply more than a single check. So for example, the agent might: Seek maximum paperclips produced with large penalty for: (greater than A volume of machinery, greater than B tonnage of machinery, machinery outside of markers C, greater than D probability of a human killed, greater than E probability of an animal harmed, greater than F total network devices, greater than G ..) Essentially any of these redundant terms are "circuit breakers" and if any trip the agent will not consider an action further. "Does the agent have scope-limiting redundant circuit breakers" is a testable design constraint. While "is it going to be friendly to humans" is rather more difficult.
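A rough sketch of what such redundant circuit breakers could look like in code. The field names, the numeric limits (placeholders for the A, B, D, F above), and the planner interface are all hypothetical; the sketch uses the hard-veto reading ("will not consider an action further") rather than a penalty term.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PredictedOutcome:
    """What a (hypothetical) planner predicts an action would lead to."""
    paperclips: float
    machinery_volume_m3: float
    machinery_tonnes: float
    machinery_outside_markers: bool
    p_human_killed: float
    networked_devices: int


# Placeholder limits; real values would be set by the system's designers.
MAX_VOLUME_M3 = 1_000.0
MAX_TONNES = 500.0
MAX_P_HUMAN_KILLED = 1e-9
MAX_NETWORKED_DEVICES = 50


def tripped_breakers(o: PredictedOutcome) -> List[str]:
    """Names of every circuit breaker this outcome would trip (deliberately redundant)."""
    checks = {
        "volume": o.machinery_volume_m3 > MAX_VOLUME_M3,
        "tonnage": o.machinery_tonnes > MAX_TONNES,
        "outside_markers": o.machinery_outside_markers,
        "human_risk": o.p_human_killed > MAX_P_HUMAN_KILLED,
        "network_devices": o.networked_devices > MAX_NETWORKED_DEVICES,
    }
    return [name for name, tripped in checks.items() if tripped]


def choose_action(candidates: Dict[str, PredictedOutcome]) -> Optional[str]:
    """Pick the paperclip-maximizing action among those that trip no breaker."""
    allowed = {name: o for name, o in candidates.items() if not tripped_breakers(o)}
    if not allowed:
        return None  # every candidate is out of scope; do nothing
    return max(allowed, key=lambda name: allowed[name].paperclips)
```

The testable design question is then whether `tripped_breakers` (or its equivalent) exists and gates every action, which can be checked by inspection without knowing anything about the agent's values.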
Will you outlaw bacteria? 
The point was to outlaw artificial molecular assemblers like Drexler described in Engines of Creation. Think of maybe something like bacteria but with cell walls made of diamond. They might be hard to deal with once released into the wild. Diamond is just carbon, so they could potentially consume carbon-based life, but no natural organism could eat them. This is the "ecophagy" scenario. But, I still think this is a fair objection. Some paths to molecular nanotechnology might go through bio-engineering, the so-called "wet nanotechnology" approach. We'd start with something like a natural bacterium, and then gradually replace components of the cell with synthetic chemicals, like amino acid analogues or extra base pairs or codons, which lets us work in an expanded universe of "proteins" that might be easier to engineer as well as having capabilities natural biology couldn't match. This kind of thing is already starting to happen. At what point does the law against self-replication kick in? The wet path is infeasible without it, at least early on.
"The point was to outlaw artificial molecular assemblers like Drexler described in Engines of Creation." Not outlaw. Prohibit "free floating" ones that can work without any further input (besides raw materials). Allowed assemblers would be connected via network ports to a host computer system that has the needed digital files, kept in something that is large enough for humans to see it/break it with a fire axe or shotgun. Note that making bacteria with gene knockouts so they can't replicate solely on their own, but have to be given specific amino acids in a nutrient broth, would be a way to retain control if you needed to do it the 'wet' way. The law against self replication is the same testable principle, actually: putting the gene knockouts back would be breaking the law because each wet modified bacterium has all the components in itself to replicate itself again.
I didn't create this rule. But succinctly: life on earth is more than likely stuck at a local maximum among the set of all possible self-replicating nanorobotic systems. The grey goo scenario posits you could build tiny fully artificial nanotechnological 'cells', made of more durable and reliable parts, that could be closer to the global maximum for self-replicating nanorobotic systems. These would then outcompete all life, bacteria included, and convert the biosphere to an ocean of copies of this single system. People imagine each cellular unit might be made of metal, hence it would look grey to the naked eye, hence 'grey goo'. (I won't speculate how they might be constructed, except to note that you would use AI agents to find designs for these machines. The AI agents would do most of their exploring in a simulation and some exploring using a vast array of prototype 'nanoforges' that are capable of assembling test components and full designs. So the AI agents would be capable of considering any known element and any design pattern known at the time or discovered in the process, then they would be capable of combining these ideas into possible 'global maximum' designs. This sharing of information, where any piece from any prototype can be adapted and rescaled to be used in a different new prototype, is something nature can't do with conventional evolution, hence it could be many times faster.)

I heard a rumor that not that many people are writing reviews for the LessWrong 2019 Review. I know I'm not, haha. It feels like a lot of work and I have other things to do. Lame, I know. Anyhow, I'm struck by how academia's solution to this problem is bad, but still better than ours!

--In academia, the journal editor reaches out to someone personally to beg them to review a specific piece. This is psychologically much more effective than just posting a general announcement calling for volunteers.

--In academia, reviews are anonymous, so you can half-ass them and be super critical without fear of repercussions, which makes you more inclined to do it. (And more inclined to be honest too!)

Here are some ideas for things we could do:

--Model our process after Academia's process, except try to improve on it as well. Maybe we actually pay people to write reviews. Maybe we give the LessWrong team a magic Karma Wand, and they take all the karma that the anonymous reviews got and bestow it (plus or minus some random noise) to the actual authors. Maybe we have some sort of series of Review Parties where people gather together, chat and drink tasty beverages, and crank out reviews for a few hours.

In general I approve of the impulse to copy social technology from functional parts of society, but I really don't think contemporary academia should be copied by default. Frankly I think this site has a much healthier epistemic environment than you see in most academic communities that study similar subjects. For example, a random LW post with >75 points is *much* less likely to have an embarrassingly obvious crippling flaw in its core argument, compared to a random study in a peer-reviewed psychology journal.

Anonymous reviews in particular strike me as a terrible idea. Bureaucratic "peer review" in its current form is relatively recent for academia, and some of academia's most productive periods were eras where critiques came with names attached, e.g. the physicists of the early 20th century, or the Republic of Letters. I don't think the era of Elsevier journals with anonymous reviewers is an improvement—too much unaccountable bureaucracy, too much room for hidden politicking, not enough of the purifying fire of public argument.

If someone is worried about repercussions, which I doubt happens very often, then I think a better solution is to use a new pseudonym. (This isn't the ... (read more)

Yeah, several of those ideas are "obviously good", and the reason we haven't done them yet is mostly because the first half of December was full of competing priorities (marketing the 2018 books, running Solstice). But I expect us to be much more active/agenty about this starting this upcoming Monday.
Maybe that should be an event that happens in the garden?
Wouldn't this achieve the opposite of what we want, and disincentivize reviews? Unless coupled with paying people to write reviews, this would remove the remaining incentive. I'd prefer going in the opposite direction, making reviews more visible (giving them a more prominent spot on the front page/on allPosts, so that more people vote on them/interact with them). At the moment, they still feel a bit disconnected from the rest of the site.

Maybe a tax on compute would be a good and feasible idea?

--Currently the AI community is mostly resource-poor academics struggling to compete with a minority of corporate researchers at places like DeepMind and OpenAI with huge compute budgets. So maybe the community would mostly support this tax, as it levels the playing field. The revenue from the tax could be earmarked to fund "AI for good" research projects. Perhaps we could package the tax with additional spending for such grants, so that overall money flows into the AI community, whilst reducing compute usage. This will hopefully make the proposal acceptable and therefore feasible.

--The tax could be set so that it is basically 0 for everything except for AI projects above a certain threshold of size, and then it's prohibitive. To some extent this happens naturally since compute is normally measured on a log scale: If we have a tax that is 1000% of the cost of compute, this won't be a big deal for academic researchers spending $100 or so per experiment (Oh no! Now I have to spend $1,000! No big deal, I'll fill out an expense form and bill it to the university) but it would be prohibitive for a corporat... (read more)

Would this work across different countries (and if so how)? It seems like if one country implemented such a tax, the research groups in that country would be out-competed by research groups in other countries without such a tax (which seems worse than the status quo, since now the first AGI is likely to be created in a country that didn't try to slow down AI progress or "level the playing field").
4Daniel Kokotajlo
Yeah, probably not. It would need to be an international agreement I guess. But this is true for lots of proposals. On the bright side, you could maybe tax the chip manufacturers instead of the AI projects? Idk. Maybe one way it could be avoided is if it came packaged with loads of extra funding for safe AGI research, so that overall it is still cheapest to work from the US.
2Daniel Kokotajlo
Another cool thing about this tax is that it would automatically counteract decreases in the cost of compute. Say we make the tax 10% of the current cost of compute. Then when the next generation of chips comes online, and the price drops by an order of magnitude, automatically the tax will be 100% of the cost. Then when the next generation comes online, the tax will be 1000%. This means that we could make the tax basically nothing even for major corporations today, and only start to pinch them later.
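A small sketch of that ratchet, assuming the tax is a fixed absolute amount per unit of compute (set as a fraction of today's cost) and that each chip generation cuts the pre-tax price by a constant factor. The function name and the normalized starting cost of 1.0 are illustrative.

```python
from typing import List


def tax_as_fraction_of_cost(initial_cost: float, initial_rate: float,
                            price_drop_per_gen: float, generations: int) -> List[float]:
    """Tax divided by pre-tax compute cost, for generation 0 through `generations`.

    The tax per unit of compute stays fixed at initial_cost * initial_rate,
    while the cost itself shrinks by price_drop_per_gen each generation.
    """
    fixed_tax = initial_cost * initial_rate
    return [
        fixed_tax / (initial_cost / price_drop_per_gen ** g)
        for g in range(generations + 1)
    ]


# 10% today, price falling 10x per generation: roughly 10%, 100%, 1000%.
for gen, ratio in enumerate(tax_as_fraction_of_cost(1.0, 0.10, 10.0, 2)):
    print(f"generation {gen}: tax is about {ratio:.0%} of compute cost")
```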

GPT-3 app idea: Web assistant. Sometimes people want to block out the internet from their lives for a period, because it is distracting from work. But sometimes one needs the internet for work, e.g. you want to google a few things or fire off an email or look up a citation or find a stock image for the diagram you are making. Solution: An app that can do stuff like this for you. You put in your request, and it googles and finds and summarizes the answer, maybe uses GPT-3 to also check whether the answer it returns seems like a good answer to the request you made, etc. It doesn't have to work all the time, or for all requests, to be useful. As long as it doesn't mislead you, the worst that happens is that you have to wait till your internet fast is over (or break your fast).

I don't think this is a great idea but I think there'd be a niche for it.

Charity-donation app idea: (ETA: If you want to make this app, reach out. I'm open to paying for it to exist.)

The app consists of a gigantic, full-screen button such that if you press it, the phone will vibrate and play a little satisfying "ching" sound and light up sparkles around where your finger hit, and $1 will be donated to GiveDirectly. You can keep slamming that button as much as you like to thereby donate as many dollars as you like.

In the corner there's a menu button that lets you change from GiveDirectly to Humane League or AMF or whatever (you can go into the settings and input the details for a charity of your choice, adding it to your personal menu of charity options, and then toggle between options as you see fit). You can also set up a "Donate $X per button press instead of $1" option and a "Split each donation between the following N charities" option.

Why is this a good idea:

I often feel guilty for eating out at restaurants. Especially when meat is involved. Currently I donate a substantial amount to charity on a yearly basis (aiming for 10% of income, though I'm not doing a great job of tracking that) but it feels like a chore, I have to remember to do it and then ... (read more)

How easy is it currently to make 1-dollar donations on a smartphone? Is there a way to do it for close to 0% fees? You likely wouldn't want to give an app store 30% of your donations. 
2Daniel Kokotajlo
Good point. Maybe the most difficult part about making this app would be setting up the payments somehow so that they don't get heavily taxed by middlemen. I imagine it would be best for the app to actually donate, like, once every three months or so, and store up your dollars in the meantime.
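A minimal sketch of the "store up your dollars and donate in batches" idea, including the per-press amount and split options from the original pitch. It only does the bookkeeping; the class and method names are hypothetical, and actually moving money (payment processor, app store rules) is out of scope. Amounts are kept in integer cents to avoid rounding drift.

```python
from collections import defaultdict
from typing import Dict, Optional


class DonationLedger:
    """Accumulates button presses locally and pays out in occasional batches."""

    def __init__(self, cents_per_press: int = 100,
                 split: Optional[Dict[str, int]] = None) -> None:
        self.cents_per_press = cents_per_press
        # charity name -> integer weight; default is everything to one charity
        self.split = split or {"GiveDirectly": 1}
        self.pending = defaultdict(int)  # charity name -> cents owed

    def press(self) -> None:
        """Record one button press, splitting it across charities by weight."""
        total_weight = sum(self.split.values())
        names = list(self.split)
        remaining = self.cents_per_press
        for name in names[:-1]:
            share = self.cents_per_press * self.split[name] // total_weight
            self.pending[name] += share
            remaining -= share
        self.pending[names[-1]] += remaining  # last charity absorbs rounding

    def flush(self) -> Dict[str, int]:
        """Return the batch to pay out (e.g. once a quarter) and reset the ledger."""
        batch = dict(self.pending)
        self.pending.clear()
        return batch


ledger = DonationLedger()
for _ in range(3):
    ledger.press()
print(ledger.flush())  # three $1 presses batched into one payout
```

Batching this way means the fixed per-transaction fee is paid once per charity per payout rather than once per press.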
I think this is a great idea. It could be called Give NOW or just GIVE or something. The single big satisfying button is such a stupid, great concept. The gamification aspect is good, but more importantly reducing the barrier to donating small amounts of money more often seems like a great thing to me. Oftentimes the biggest barrier to donating more sadly is the inconvenience of doing so. Whipping out your phone, opening up GIVE and tapping the big button a few times encourages more donations and gives you that self-satisfying boost that pressing a big button and getting immediate feedback gives you these days. The social-cuing is a bonus too (and this seems far more adoptable than veganism for obvious reasons). I'd be interested in working on this. I work in enterprise application development and have TypeScript and React Native w/ Firebase experience and have built and deployed a toy app to the Apple app store before (no real Android experience though). I'd be particularly interested in working on the front-end design if someone else wants to collaborate on the back-end services we'd need to set up (payment system; auth; storage; etc.). Maybe reply here if you'd be interested?
3Aaron F
I would be interested in working on this with you. I'm in college for CS, and I have what I'm pretty sure is enough backend experience (and some frontend) to pull this off with you. I've never dealt with financial services before, but I've looked into payment processing a little bit and it doesn't seem too complicated. Anyway, if you'd like, DM me and maybe we can find a time to chat.
3Daniel Kokotajlo
Yay! Thanks! I imagine the back-end services part is going to be the trickiest part. Maybe I should post on Bountied Rationality or EA forum looking for someone to collaborate with you.
Go for it! I'm not on either of those forums explicitly, but happy to collaborate :)
Hey! I'd be interested in working on this. My suggestion would be to use Flutter for front-end (React Native is perfectly fine as well, though) and especially to utilize an API like Pledge's one for back-end (as they've solved the tough parts of the donation process already and they don't really have any service fees when it comes to this use case). Coincidentally, I have roughly 10 years of experience of hobbyist game design, so we could think about adding e.g. prosocial features and mechanics down the line if you're interested.
2Daniel Kokotajlo
Nice! You both should check out this thread if you haven't already, and see if there are other people to possibly coordinate with. Also lots of good advice in there about the main difficulties, e.g. app store policies.
Thanks for the reply! I'm aware of the thread and I believe that we'd be able to solve the policy issues. Using an existing API like the Pledge's one mentioned above would be my strong recommendation, given that they indeed handle the heavy parts of the donation process. It would make dealing with the policies of the app stores a breeze compared to making the back-end from scratch, as in that case there would be a rather huge workload in dealing with the heavy (although necessary) bureaucracy. It would be nice if we started coordinating the development somehow. I would start with a central hub where all the comms would take place so that the discussion wouldn't become scattered and hard to follow. Maybe something like semi-open Slack or Discord server for more instant and spontaneous messaging and all the fancy extra features?
4Aaron F
How about a private channel in the EA Anywhere slack workspace? We can also mention the project in their software engineering channel and see if anyone else wants to work with us. If this sounds good, join the workspace and then DM me (Aaron Fink) and I'll add you to a channel.
These all seem like great ideas! I think a Discord server sounds great. I know that @Aaron F was expressing interest here and on EA, I think, so a group of us starting to show interest might benefit from some centralized place to chat like you said. I got unexpectedly busy with some work stuff, so I'm not sure I'm the best to coordinate/ring lead, but I'm happy to pitch in however/whenever I can! Definitely open to learning some new things (like Flutter) too.
2Daniel Kokotajlo
Whatever you think is best! I don't have anything to contribute to the development except vision and money, but I'll check in as needed to answer questions about those things.

A few years ago there was talk of trying to make Certificates of Impact a thing in EA circles. There are lots of theoretical reasons why they would be great. One of the big practical objections was "but seriously though, who would actually pay money to buy one of them? What would be the point? The impact already happened, and no one is going to actually give you the credit for it just because you paid for the CoI."

Well, now NFTs are a thing. I feel like CoIs suddenly seem a lot more viable!

Here's my AI Theory reading list as of 3/28/2022. I'd love to hear suggestions for more things to add! You may be interested to know that the lessons from this list are part of why my timelines are so short.

On scaling laws:
(Original scaling laws paper, contains the IMO super-important graph showing that bigger models are more data-efficient)
(Newer scaling laws paper, with more cool results and graphs, in particular graphs showing how you can extrapolate GPT performance seemingly forever)
(Excellent presentation by Kaplan on the scaling laws stuff, also talks a bit about the theory of why it's happening)

Added 3/28/2022: Nice summary of the AlphaCode paper, which itself is notable for more scaling trend graphs! :)

On the bayesian-ness and simplicity-bias of neural networks (which explains why scaling works and should be expected to continue, IMO): (more like, the linked pos... (read more)

Cool list! I'll look into the ones I don't know or haven't read yet.

For the past year I've been thinking about the Agent vs. Tool debate (e.g. thanks to reading CAIS/Reframing Superintelligence) and also about embedded agency and mesa-optimizers and all of these topics seem very related now... I keep finding myself attracted to the following argument skeleton:

Rule 1: If you want anything unusual to happen, you gotta execute a good plan.

Rule 2: If you want a good plan, you gotta have a good planner and a good world-model.

Rule 3: If you want a good world-model, you gotta have a good learner and good data.

Rule 4: Having good data is itself an unusual happenstance, so by Rule 1 if you want good data you gotta execute a good plan.

Putting it all together: Agents are things which have good planner and learner capacities and are hooked up to actuators in some way. Perhaps they also are "seeded" with a decent world-model to start off with. Then, they get a nifty feedback loop going: They make decent plans, which allow them to get decent data, which allows them to get better world-models, which allows them to make better plans and get better data so they can get great world-models and make great plans and... etc. (The best agents will also be improving on their learning and planning algorithms! Humans do this, for example.)

Empirical conjecture: Tools suck; agents rock, and that's why. It's also why agenty mesa-optimizers will arise, and it's also why humans with tools will eventually be outcompeted by agent AGI.

How would you test the conjecture?
2Daniel Kokotajlo
The ultimate test will be seeing whether the predictions it makes come true--whether agenty mesa-optimizers arise often, whether humans with tools get outcompeted by agent AGI. In the meantime, it's not too hard to look for confirming or disconfirming evidence. For example, the fact that militaries and corporations that make a plan and then task their subordinates with strictly following the plan invariably do worse than those who make a plan and then give their subordinates initiative and flexibility to learn and adapt on the fly... seems like confirming evidence. (See: agile development model, the importance of iteration and feedback loops in startup culture, etc.) Whereas perhaps the fact that AlphaZero is so good despite lacking a learning module is disconfirming evidence. As for a test, well we'd need to have something that proponents and opponents agree to disagree on, and that might be hard to find. Most tests I can think of now don't work because everyone would agree on what the probable outcome is. I think the best I can do is: Someday soon we might be able to test an agenty architecture and a non-agenty architecture in some big complex novel game environment, and this conjecture would predict that for sufficiently complex and novel environments the agenty architecture would win.
I'd agree w/ the point that giving subordinates plans and the freedom to execute them as best as they can tends to work out better, but that seems to be strongly dependent on other context, in particular the field they're working in (ex. software engineering vs. civil engineering vs. military engineering), cultural norms (ex. is this a place where agile engineering norms have taken hold?), and reward distributions (ex. does experimenting by individuals hold the potential for big rewards, or are all rewards likely to be distributed in a normal fashion such that we don't expect to find outliers). My general model is in certain fields humans look more tool shaped and in others more agent shaped. For example an Uber driver when they're executing instructions from the central command and control algo doesn't require as much of the planning, world modeling behavior. One way this could apply to AI is that sub-agents of an agent AI would be tools.
3Daniel Kokotajlo
I agree. I don't think agents will outcompete tools in every domain; indeed in most domains perhaps specialized tools will eventually win (already, we see humans being replaced by expensive specialized machinery, or expensive human specialists, lots of places). But I still think that there will be strong competitive pressure to create agent AGI, because there are many important domains where agency is an advantage.
Expensive specialized tools are themselves learned by and embedded inside an agent to achieve goals. They're simply mesa-optimization in another guise. E.g., AlphaGo learns a reactive policy which does nothing which you'd recognize as 'planning' or 'agentiness' - it just maps a grid of numbers (board state) to another grid of numbers (value function estimates of a move's value). A company, beholden to evolutionary imperatives, can implement internal 'markets' with 'agents' if it finds that useful for allocating resources across departments, or use top-down mandates if those work better, but no matter how it allocates resources, it's all in the service of an agent, and any distinction between the 'tool' and 'agent' parts of the company is somewhat illusory.

Rootclaim seems pretty awesome: About | Rootclaim

What is the source of COVID-19 (SARS-CoV-2)? | Rootclaim

I wonder how easy it would be to boost them somehow.

Trying to summarize the evidence that favors their conclusion (virus developed using gain-of-function research) over my assumption (virus collected from nature, then escaped unmodified):
* Wuhan labs were researching gain of function
* covid has parts in common with two different viruses
* covid has a furin cleavage site, which other coronaviruses don't have
* covid was well adapted to humans since the beginning
* prior to the outbreak, a Wuhan researcher tried to disassociate from covid
Yeah, I've really liked reading Rootclaim stuff during the pandemic.
Rootclaim is super cool, glad to finally see others mention it too! 

I came across this old Metaculus question, which confirms my memory of how my timelines changed over time:

30% by 2040 at first, then in March 2020 I updated to 40%, then in Aug 2020 I updated to 71%, then I went down a bit, and now it's up to 85%. It's hard to get higher than 85% because the future is so uncertain; there are all sorts of catastrophes etc. that could happen to derail AI progress.

What caused the big jump in mid-2020 was sitting down to actually calculate my timelines in earnest. I ended up converging on something like the Bio Anchors framewor... (read more)

The International Energy Agency releases regular reports in which it forecasts the growth of various energy technologies for the next few decades. It's been astoundingly terrible at forecasting solar energy for some reason. Marvel at this chart:

This is from an article criticizing the IEA's terrible track record of predictions. The article goes on to say that there should be about 500GW of installed capacity by 2020. This article was published in 2020; a year later, the 2020 data is in, and it's actually 714 GW. Even the article criticizing the IEA for thei... (read more)

9Zac Hatfield-Dodds
The IEA is a running joke in climate policy circles; they're transparently in favour of fossil fuels and their "forecasts" are motivated by political (or perhaps commercial, hard to untangle with oil) interests rather than any attempt at predictive accuracy.
2Daniel Kokotajlo
OH ok thanks! Glad to hear that. I'll edit.
What do you mean by "transparently" in favour of fossil fuels? Is there anything like a direct quote e.g. of Fatih Birol backing this up?

Eric Drexler has argued that the computational capacity of the human brain is equivalent to about 1 PFlop/s, that is, we are already past the human-brain-human-lifetime milestone. (Here is a gdoc.) The idea is that we can identify parts of the human brain that seem to perform similar tasks to certain already-existing AI systems. It turns out that e.g. 1-thousandth of the human brain is used to do the same sort of image processing tasks that seem to be handled by modern image processing AI... so then that means an AI 1000x bigger than said AI should be able... (read more)

It is known that bird brains are much more mass-efficient than mammalian brains.

One thing I find impressive about GPT-3 is that it's not even trying to generate text.

Imagine that someone gave you a snippet of random internet text, and told you to predict the next word. You give a probability distribution over possible next words. The end.

Then, your twin brother gets a snippet of random internet text, and is told to predict the next word. Etc. Unbeknownst to either of you, the text your brother gets is the text you got, with a new word added to it according to the probability distribution you predicted.

Then we repeat with your tr... (read more)

4mako yass
There's no link preview for manifold links, so we should mention that the market is "GPT4 or better model available for download by EOY 2024?" (the model is allowed to be illegal)

Has anyone done an expected value calculation, or otherwise thought seriously about, whether to save for retirement? Specifically, whether to put money into an account that can't be accessed (or is very difficult to access) for another twenty years or so, to get various employer matching or tax benefits?

I did, and came to the conclusion that it didn't make sense, so I didn't do it. But I wonder if anyone else came to the opposite conclusion. I'd be interested to hear their reasoning.

ETA: To be clear, I have AI timelines in mind here. I expect to be either ... (read more)

There's a lot of detail behind "expect to be" that matters here.  It comes down to "when is the optimal time to spend this money" - with decent investment options, if your satisfactory lifestyle has unspent income, the answer is likely to be "later".  And then the next question is "how much notice will I have when it's time to spend it all"?   For most retirement savings, the tax and match options are enough to push some amount of your savings into that medium.  And it's not really locked up - early withdrawal carries penalties, generally not much worse than not getting the advantages in the first place. And if you're liquidating because you think money is soon to be meaningless (for you, or generally), you can also borrow a lot, probably more than you could if you didn't have long-term assets to point to.   For me, the EV calculation comes out in favor of retirement savings.  I'm likely closer to it than you, but even so, the range of outcomes includes all of "unexpected death/singularity making savings irrelevant", "early liquidation for a pre-retirement use", and "actual retirement usage".  And all of that outweighs by a fair bit "marginal spending today". Fundamentally, the question isn't "should I use investment vehicles targeted for retirement", but "What else am I going to do with the money that's higher-value for my range of projected future experiences"?
4Daniel Kokotajlo
Very good point that I may not be doing much else with the money. I'm still saving it, just in more liquid, easy to access forms (e.g. stocks, crypto.) I'm thinking it might come in handy sometime in the next 20 years during some sort of emergency or crunch time, or to handle unforeseen family expenses or something, or to donate to a good cause.
4Daniel Kokotajlo
It's not obvious that unaligned AI would kill us. For example, we might be bargaining chips in some future negotiation with aliens.
2Charlie Steiner
My decision was pretty easy because I don't have any employer matching or any similarly large incentives. I don't think the tax incentives are big enough to make up for the inconvenience in the ~60% case where I want to use my savings before old age. However, maybe a mixed strategy would be more optimal.

Years after I first thought of it, I continue to think that this chain reaction is the core of what it means for something to be an agent, AND why agency is such a big deal, the sort of thing we should expect to arise and outcompete non-agents. Here's a diagram:

Roughly, plans are necessary for generalizing to new situations, for being competitive in contests for which there hasn't been time for natural selection to do lots of optimization of policies. But plans are only as good as the knowledge they are based on. And knowledge doesn't come a priori; it nee... (read more)

6Yoav Ravid
Seems similar to the OODA loop
4Daniel Kokotajlo
Yep! I prefer my terminology but it's basically the same concept I think.
6Gordon Seidoh Worley
I think it's probably even simpler than that: feedback loops are the minimum viable agent, i.e. a thermostat is the simplest kind of agent possible. Stuff like knowledge and planning are elaborations on the simple theme of the negative feedback circuit.
4Daniel Kokotajlo
I disagree; I think we go astray by counting things like thermostats as agents. I'm proposing that this particular feedback loop I diagrammed is really important, a much more interesting phenomenon to study than the more general category of feedback loop that includes thermostats.

In this post, Jessicata describes an organization which believes:

  1. AGI is probably coming in the next 20 years.
  2. Many of the reasons we have for believing this are secret.
  3. They're secret because if we told people about those reasons, they'd learn things that would let them make an AGI even sooner than they would otherwise.

At the time, I didn't understand why an organization would believe that. I figured they thought they had some insights into the nature of intelligence or something, some special new architecture for AI designs, that would accele... (read more)

The other day I heard this anecdote: Someone's friend was several years ago dismissive of AI risk concerns, thinking that AGI was very far in the future. When pressed about what it would take to change their mind, they said their fire alarm would be AI solving Montezuma's Revenge. Well, now it's solved, what do they say? Nothing; if they noticed they didn't say. Probably if they were pressed on it they would say they were wrong before to call that their fire alarm.

This story fits with the worldview expressed in "There's No Fire Alarm for AGI." I expect this sort of thing to keep happening well past the point of no return.

Also related: Is That Your True Rejection? There is this pattern when people say: "X is the true test of intelligence", and after a computer does X, they switch to "X is just a mechanical problem, but Y is the true test of intelligence". (Past values of X include: chess, go, poetry...) There was a meme about it that I can't find now.

Bedrock is the easiest way for customers to build and scale generative AI-based applications using FMs, democratizing access for all builders.

I know it's just meaningless corporatespeak applause light, but it occurs to me that it's also technically incorrect -- the situation is more analogous to other forms of government (anarchy or dictatorship, depending on whether Amazon exercises any power) than to democracy (it's not like all the little builders get together and vote on laws that then apply to the builders who didn't vote or voted the other way.)

This is an isolated demand for semantic rigor. There's a very long history of "democracy" or "democratic" being used in an extensive sense to mean much more than just "people voting on things and banning or promoting things they like." To choose one of many potential examples, I give you a section of the introduction of de Tocqueville's "Democracy in America", emphases mine. De Tocqueville gives a long-ass list of things which promote "equality of condition" as turning "to the advantage of democracy," though many of them do not have anything to do with voting. If you want to "technically incorrect" Amazon you gotta also do it to de Tocqueville, which is awkward because de Tocqueville's work probably actually helps determine the meaning of "democracy" in modern parlance. (And maybe you also want to ping Plato for his "democratic soul".) Words don't just mean what they say in dictionaries or in textbooks that define them. Words have meaning from how people actually use them. If it's meaningful and communicative for de Tocqueville to say that the printing press, the invention of firearms, and Protestantism help "turn to the advantage of democracy", then I think it's meaningful and communicative for a company to say that making it easier for non-billion dollar companies to use AI can (in more modern parlance) "democratize access." Alicorn's essay on expressive vocabulary strikes me as extremely relevant:
2Daniel Kokotajlo
I think you might be right... but I'm going to push back against your beautiful effort post, as follows: --I am not striking terms without suitable replacement. I offered anarchy and dictatorship as replacements. Personally I think Amazon should have said "anarchizing access for all builders." Or just "Equalizing access for all builders" or "Levelling the playing field" if they wanted to be more technically correct, which they didn't. I'm not actually upset at them, I know how the game works. Democracy sounds better than anarchy so they say democracy. --"Turning to the advantage of democracy" =/= rendering-more-analogous-to-democracy. I can criticize Amazon without criticizing de Tocqueville.  --Plato's democratic soul was actually an excellent analogy and choice of words by my book. Those with democratic souls, according to Plato, basically take a vote of what their subagents want to do at any given time and then do that. Anarchic soul would be someone in whom getting-outvoted doesn't happen and each subagent is free to pursue what they want independently--so maybe someone having a seizure?

A nice thing about being a fan of Metaculus for years is that I now have hard evidence of what I thought about various topics back in the day. It's interesting to look back on it years later. Case in point: Small circles are my forecasts:

The change was, I imagine, almost entirely driven by the update in my timelines.


I speculate that drone production in the Ukraine war is ramping up exponentially and will continue to do so. This means that however much it feels like the war is all about drones right now, it'll feel much more that way a year from now. Both sides will be regularly sending flocks of shahed-equivalents at each other, both sides will have reinvented tactics to center around FPV kamikaze drones, etc. Maybe we'll even see specialized anti-drone drones dogfighting with each other, though since there aren't any of those yet they won't have appeared in large numbers.

I guess this will result in the "no man's land" widening even further, to like 10km or so. (That's about the maximum range of current FPV kamikaze drones)

2Daniel Kokotajlo
On second thought, maybe it's already 10km wide for all I know. Hmm. Well, however wide it is now, I speculate it'll be wider this time next year.
2Daniel Kokotajlo
Article about drone production, with estimates: 

Came across this short webcomic thingy on r/novelai. It was created entirely using AI-generated images. (Novelai I assume?)
I wonder how long it took to make.

Just imagine what'll be possible this time next year. Or the year after that.

Notes on Tesla AI day presentation: Here they claim they've got more than 10,000 GPUs in their supercomputer, and that this means their computer is more powerful than the top 5 publicly known supercomputers in the world. Consulting this list it seems that this would put their computer at just over 1 Exaflop per second, which checks out (I think I had heard rumors this was the case) and also if you look at this (read more)

In a recent conversation, someone said the truism about how young people have more years of their life ahead of them and that's exciting. I replied that everyone has the same number of years of life ahead of them now, because AI timelines. (Everyone = everyone in the conversation, none of whom were above 30)

I'm interested in the question of whether it's generally helpful or harmful to say awkward truths like that. If anyone is reading this and wants to comment, I'd appreciate thoughts.

6Steven Byrnes
I've been going with the compromise position of "saying it while laughing such that it's unclear whether you're joking or not" :-P
4Daniel Kokotajlo
The people who know me know I'm not joking, I think. For people who don't know me well enough to realize this, I typically don't make these comments.
2Steven Byrnes
I sometimes kinda have this attitude that this whole situation is just completely hilarious and absurd, i.e. that I believe what I believe about the singularity and apocalypse and whatnot, but that the world keeps spinning and these ideas have basically zero impact. And it makes me laugh. So when I shrug and say "I'm not saving enough for retirement; oh well, by then probably we'll all be dead or living in a radical post-work utopia", I'm not just laughing because it's ambiguously a joke, I'm also laughing because this kind of thing reminds me of how ridiculous this all is. :-P
What if things foom later than you're expecting - say during retirement? What if anti-aging enters the scene and retirement can last much, much longer before the foom?
2Steven Byrnes
Tbc my professional opinion is that people should continue to save for retirement :-P I mean, I don't have as much retirement savings as the experts say I should at my age ... but does anyone? Oh well...
"Truths" are persuasion, unless expected to be treated as hypotheses with the potential to evoke curiosity. This is charity, continuous progress on improving understanding of circumstances that produce claims you don't agree with, a key skill for actually changing your mind. By default charity is dysfunctional in popular culture, so non-adversarial use of factual claims that are not expected to become evident in short order depends on knowing that your interlocutor practices charity. Non-awkward factual claims are actually more insidious, as the threat of succeeding in unjustified persuasion is higher. So in a regular conversation, there is a place for arguments, not for "truths", awkward or not. Which in this instance entails turning the conversation to the topic of AI timelines. I don't think there are awkward arguments here in the sense of treading a social taboo minefield, so there is no problem with that, except it's work on what at this point happens automatically via stuff already written up online, and it's more efficient to put effort in growing what's available online than doing anything in person, unless there is a plausible path to influencing someone who might have high impact down the line.
It's fine to say that if you want the conversation to become a discussion of AI timelines. Maybe you do! But not every conversation needs to be about AI timelines.
I've stopped bringing up the awkward truths around my current friends. I started to feel like I was using too much of my built-up esoteric social capital on things they were not going to accept (or at least want to accept). How can I blame them? If somebody else told me there was some random field in which a select few interested people will be deciding the fate of all of humanity for the rest of time, and I had no interest in that field, I would want to be skeptical of it as well. Especially if they were to throw out some figures, like 15 - 25 years from now (my current timelines) being when humanity's reign over the earth will end because of this field. I found that when I stopped bringing it up, conversations were lighter and more fun. I've accepted we will just be screwing around talking about personal issues and the issues du jour; I don't mind it. The truth is a bitter pill to get down, and if they have no interest in helping AI research it's probably best they don't live their lives worrying about things they won't be able to change. So for me at least, I saw personal life improvements from not bringing some of those awkward truths up.
Depends on the audience and what they'll do with the reminder.  But that goes for the original statement as well (which remains true - there's enough uncertainty about AI timelines and impact on individual human lives that younger people have more years of EXPECTED (aka average across possible futures) life).
Whether it makes sense to tell someone an awkward truth often depends more on the person than on the truth.
Truths in general: This is especially true when the truth isn't in the words, but something you're trying to point at with them. Awkward truths: What makes something an awkward truth is the person, anyway, so your statement seems tautological.

Probably, when we reach an AI-induced point of no return, AI systems will still be "brittle" and "narrow" in the sense used in arguments against short timelines.

Argument: Consider AI Impacts' excellent point that "human-level" is superhuman (bottom of this page)

The point of no return, if caused by AI, could come in a variety of ways that don't involve human-level AI in this sense. See this post for more. The general idea is that being superhuman at some skills can compensate for being subhuman at others. We should expect the point of no return to be reache... (read more)

How much video data is there? It seems there is plenty:

This says 500 hours of video are uploaded to YouTube every minute. This says standard definition for YouTube video is 854x480 = 409920 pixels. At 48fps, that's 3.5e13 pixels of data every minute. Over the course of a whole year (about 5.3e5 minutes, so a bit more than +5 OOMs), it comes out to roughly 1.8e19 pixels of data every year. So yeah, even if we use some encoding that crunches pixels down to 10x10 vokens or whatever,... (read more)
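The arithmetic above can be checked with a short script. This is only a back-of-the-envelope sketch using the figures quoted in the comment (500 hours uploaded per minute, 854x480 frames, 48fps); the variable names are illustrative, and a 365-day year is an assumption.

```python
# Figures quoted in the comment above.
HOURS_UPLOADED_PER_MINUTE = 500
WIDTH, HEIGHT = 854, 480
FPS = 48

# Assumption: a 365-day year.
MINUTES_PER_YEAR = 60 * 24 * 365

pixels_per_frame = WIDTH * HEIGHT
# Seconds of video uploaded during each wall-clock minute.
video_seconds_per_minute = HOURS_UPLOADED_PER_MINUTE * 3600
pixels_per_minute = video_seconds_per_minute * FPS * pixels_per_frame
pixels_per_year = pixels_per_minute * MINUTES_PER_YEAR

print(f"pixels per frame:  {pixels_per_frame}")
print(f"pixels per minute: {pixels_per_minute:.2e}")
print(f"pixels per year:   {pixels_per_year:.2e}")
```

Without intermediate rounding the yearly figure comes out slightly above 1.8e19, which is the same order of magnitude as the estimate in the comment.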

So have you thought about what "data points" mean? If the data is random samples from the Mandelbrot set, the maximum information the AI can ever learn is just the root equation used to generate the set. Human agents control a robotics system where we take actions and observe the results on our immediate environment. This sort of information seems to lead to very rapid learning, especially for things where the consequences are near term and observable. You are essentially performing a series of experiments where you try action A vs B and observe what the environment does. This lets you rapidly cancel out data that doesn't matter; it's how you learn that lighting conditions don't affect how a rock falls when you drop it. Point is, the obvious training data for an AI would be similar. It needs to manipulate, both in sims and reality, the things we need it to learn about.
2Daniel Kokotajlo
I've thought about it enough to know I'm confused! :) I like your point about active learning (is that the right term?). I wonder how powerful GPT-3 would be if instead of being force-fed random internet text from a firehose, it had access to a browser and could explore (would need some sort of curiosity reward signal?). Idk, probably this isn't a good idea or else someone would have done it.
I don't know that GPT-3 is the best metric for 'progress towards general intelligence'. One example of the agents receiving 'active' data that resulted in interesting results is this OpenAI experiment. In this case the agents cannot emit text - which is what GPT-3 is doing that makes us feel it's "intelligent" - but can cleverly manipulate their environment in complex ways not hardcoded in. The agents in this experiment are learning both movement to control how they view the environment and to use a few simple tools to accomplish a goal. To me this seems like the most promising way forward. I think that robust agents that can control real robots to do things, with those things becoming increasingly complex and difficult as the technology improves, might in fact be the "skeleton" of what would later allow for "real" sentience. Because from our perspective, this is our goal. We don't want an agent that can babble and seem smart, we want an agent that can do useful things - things we were paying humans to do - and thus extend what we can ultimately accomplish. (Yes, in the immediate short term it unemploys lots of humans, but it also would make possible new things that previously we needed lots of humans to do. It also should allow for doing things we know how to do now but with better quality or on a broader scale.) More exactly, how do babies learn? Yes, they learn to babble, but they also learn a set of basic manipulations of their body - adjusting their viewpoint - and manipulate the environment with their hands - learning how it responds. We can discuss more, I think I know how we will "get there from here" in broad strokes. I don't think it will be done by someone writing a relatively simple algorithm and getting a sudden breakthrough that allows for sentience, I think it will be done by using well defined narrow domain agents that each do something extremely well - and by building higher level agents on top of this foundation in a series of
2Daniel Kokotajlo
I'd be interested to hear more about this. It sounds like this could maybe happen pretty soon with large, general language models like GPT-3 + prompt programming + a bit of RL.

Some ideas for definitions of AGI / resolution criteria for the purpose of herding a bunch of cats / superforecasters into making predictions: 

(1) Drop-in replacement for human remote worker circa 2023 (h/t Ajeya Cotra): 

When will it first be the case that there exists an AI system which, if teleported back in time to 2023, would be able to function as a drop-in replacement for a human remote-working professional, across all* industries / jobs / etc.? So in particular, it can serve as a programmer, as a manager, as a writer, as an advisor, etc. a... (read more)

For predicting feasible scaling investment, drop-in replacement for a significant portion of remote work that currently can only be done by humans seems important (some of which is not actually done by humans remotely). That is, an AI that can be cheaply and easily on-boarded for very small volume custom positions with minimal friction, possibly by some kind of AI on-boarding human professional. But not for any sort of rocket science or 90th percentile. (That's the sort of thing I worry about GPT-5 with some scaffolding turning out to be, making $50 billion training runs feasible without relying on faith in heretofore-unseen further scaling.)
(I made some slight formatting edits to this, since some line-breaks looked a bit broken on my device, feel free to revert)

I remember being interested (and maybe slightly confused) when I read about the oft-bloody transition from hereditary monarchies to democracies and dictatorships. Specifically it interested me that so many smart, reasonable, good people seemed to be monarchists. Even during anarchic periods of civil war, the factions tended to rally around people with some degree of legitimate claim to the throne, instead of the whole royal lineage being abandoned and factions arising based around competence and charisma. Did these smart people literally believe in some so... (read more)

In monarchy, people with royal blood are the Schelling points. If you vote for someone without royal blood, other people may prefer someone else without royal blood... there are millions of options, the fighting will never end. Also, we shouldn't ignore the part where many other countries are ruled by our king's close family. What will they do after we overthrow the king and replace him with some plebeian? (By the way, Trump is probably a bad example to use in this analogy. I think in 2017 many of his voters considered him an example of someone who doesn't have the "royal blood", i.e. the support of either party's establishment; unlike Hillary literally-a-relative-of-another-president Clinton.)
2Zach Stein-Perlman
The answer is that there's a coordination problem. Wait, what is it that gave monarchic dynasties momentum, in your view?

Came across this old (2004) post from Moravec describing the evolution of his AGI timelines over time. Kudos to him, I say. Compute-based predictions seem to have historically outperformed every other AGI forecasting method (at least the ones that were actually used), as far as I can tell.

What if Tesla Bot / Optimus actually becomes a big success in the near future (<6 years)? Up until recently I would have been quite surprised, but after further reflection I'm not so sure.

Here's my best "bull case:"

Boston Dynamics and things like this establish that getting robots to walk around over difficult terrain is possible with today's tech, it just takes a lot of engineering talent and effort.

So Tesla will probably succeed, within a few years, at building a humanoid robot that can walk around and pic... (read more)

3Lone Pine
I think the problem here is clear use cases. What is the killer app for the minimum viable robot?
2Daniel Kokotajlo
Yeah, that's the crux... Stocking shelves maybe? This seems like the best answer so far. Packing boxes in warehouses (if that's not already done by robots?) What about flipping burgers? What about driving trucks? Specifically, get an already-autonomous truck and then put one of these bots in it, so that if you need some physical hands to help unload the truck or fill it up with gas or do any of those ordinary menial tasks associated with driving long distances, you've got them. (A human can teleoperate remotely when the need arises.) Maybe ordinary factory automation? To my surprise regular factory robots cost something like $50,000; if that's because they aren't mass-produced enough to benefit from economies of scale, then Tesla can swoop in with $20,000 humanoid robots and steal market share. (Though also the fact that regular factory robots cost so much is evidence that Tesla won't be able to get the price of their bots down so low.) Maybe cleaning? In theory a robot like this could handle a mop, a broom, a dust wand, a sponge, a vacuum, etc. Could e.g. take all the objects off your sink, spray it with cleaner and wipe it down, then put all the objects back. Maybe cooking? Can chop carrots and stuff like that. Can probably follow a recipe, albeit hard-coded ones. If it can clean, then it can clean up its own messes. I wish I had a better understanding of the economy so I could have more creative ideas for bot-ready jobs. I bet there are a bunch I haven't thought of. ... Tesla FSD currently runs on hopium: People pay for it and provide training data for it, in the hopes that in a few years it'll be the long-prophesied robocar. Maybe a similar business model could work for Optimus. If they are steadily improving it and developing an exponentially growing list of skills for it, people will believe that in a few years it'll be a fully functioning household servant,

Random idea: Hard Truths Ritual:

Get a campfire or something and a notepad and pencil. Write down on the pad something you think is probably true, and important, but which you wouldn't say in public due to fear of how others would react. Then tear off that piece of paper and toss it in the fire. Repeat this process as many times as you can for five minutes; this is a brainstorming session, so your metric for success is how many diverse ideas you have multiplied by their average quality.

Next, repeat the above except instead of "you wouldn't say in public..."...

Interesting. The main reasons why I'd see something potentially falling into category 3+ (maybe 2 also) are either a) threat models where I am observed far more than otherwise expected or b) threat models where cognitohazards exist. ...which for a) leads to "write something on a piece of paper and throw it in the fire" also being insecure, and for b) leads to "thinking of it is a bad idea regardless of what you do after".
3Daniel Kokotajlo
It sounds like you are saying you are unusually honest with yourself, much more than most humans. Yes? Good point about cognitohazards. I'd say: Beware self-fulfilling prophecies.
I think you are underestimating how much I think falls into these categories. I suspect (although I do not know) that much of what you would call being dishonest to oneself I would categorize into a) or b).

(General PSA: although choosing a career that encourages you to develop your natural tendencies can be a good thing, it also has downsides. Being someone who is on the less trusting side of things at the best of times and works in embedded hardware with an eye toward security... I am rather acutely aware of how[1] much[2] information[3] leakage[4] there is from e.g. the phone in your pocket. Typical English writing speed is ~13 WPM[5]. English text has ~9.83 bits of entropy / word[6]. That's only, what, 2.1 bits / second? That's tiny[7][8].)

(I don't tend to like the label, mainly because of the connotations, but the best description might be 'functionally paranoid'. I'm the sort of person who reads the IT policy at work, notes that it allows searches of personal devices, and then never brings a data storage device of any sort to work as a result.)

Could you please elaborate?

1. ^ ("whoops, brightness is a sidechannel attack to recover audio because power supplies aren't perfect". Not typically directly applicable for a phone, but still interesting.)
2. ^ ("whoops, you can deduce keystrokes from audio recordings" (which probably means you can also do the same with written text...))
3. ^ ("whoops, wifi can be used as a passive radar")
4. ^ ("whoops, accelerometer data can often be used to determine location because people don't travel along random routes")
5. ^ (Actually, that's copying speed, but close e