Every now and then, some AI luminaries
I agree with (1) and strenuously disagree with (2).
The last time I saw something like this, I responded by writing: LeCun’s “A Path Towards Autonomous Machine Intelligence” has an unsolved technical alignment problem.
Well, now we have a second entry in the series, with the new preprint book chapter “Welcome to the Era of Experience” by...
The RL algorithms that people talk about in AI traditionally feature an exponentially-discounted sum of future rewards, but I don’t think there are any exponentially-discounted sums of future rewards in biology (more here). Rather, you have an idea (“I’m gonna go to the candy store”), and the idea seems good or bad, and if it seems sufficiently good, then you do it! (More here.) It can seem good for lots of different reasons. One possible reason is: the idea is immediately associated with (non-behaviorist) primary reward. Another possible reason is: the idea invol...
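For readers who haven't seen the formulation being contrasted here, a minimal sketch of the textbook discounted-return objective versus a simple "does this idea seem good enough to act on" appraisal. The function names, numbers, and threshold are illustrative only, not taken from the post:

```python
def discounted_return(rewards, gamma=0.99):
    """Textbook RL objective: exponentially-discounted sum of future rewards,
    G_t = sum_k gamma**k * r_{t+k}."""
    total = 0.0
    for k, r in enumerate(rewards):
        total += (gamma ** k) * r
    return total

def act_on_idea(appraisal, threshold=0.5):
    """Contrast: evaluate a candidate plan directly and act if its appraisal
    clears a threshold -- no explicit discounted sum over a reward stream."""
    return appraisal > threshold

print(discounted_return([1.0, 0.0, 2.0], gamma=0.9))  # 1.0 + 0 + 0.81*2 = 2.62
print(act_on_idea(0.8))  # True: the idea seems good enough, so you do it
```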
Yes, I've read and fully understood 99% of Decision theory does not imply that we get to have nice things, a post debunking many wishful ideas for Human-AI Trade. I don't think that debunking works against Logical Counterfactual Simulations (where the simulators delete evidence of the outside world from math and logic itself).
One day, we humans may be powerful enough to run simulations of whole worlds. We can run simulations of worlds where physics is completely different. The strange creatures which evolve in our simulations may never realize who and what we are, while we observe their every detail.
Not only can we run simulations of worlds where physics is completely different; we can also run simulations of worlds where...
In my world model, using Logical Counterfactual Simulations to do Karma Tests will not provide the superintelligence "a lot of evidence" that it may be in a Karma Test. It will only be a very remote possibility which it cannot rule out. This makes it worthwhile for it to spend 0.0000001% of the universe on being kind to humans.
Even such a tiny amount may be enough to make trillions of utopias, because the universe is quite big.
If you are an average utilitarian, then this tiny amount could easily make the difference between whether the ...
Posted a piece on how removing informal social control (the “auntie layer”) affects a city’s memetic landscape, using SF as a case study. Interested in rationalist critiques: Are such regulators net-positive or Pareto-inefficient?
A common claim is that concern about [X] ‘distracts’ from concern about [Y]. This is often used as an attack to cause people to discard [X] concerns, on pain of being enemies of [Y] concerns, as attention and effort are presumed to be zero-sum.
There are cases where there is limited focus, especially in political contexts, or where arguments or concerns are interpreted perversely. A central example: when you cite [ABCDE], they’ll find what they consider the weakest one and only consider or attack that, silently discarding the rest entirely. Critics of existential risk do that a lot.
So it does happen. But in general one should assume such claims are false.
Thus, the common claim that AI existential risks ‘distract’ from immediate harms. It turns out Emma...
Are there any suggestions for how to get this message across? To all those AI x-risk disbelievers?
I love o3. I’m using it for most of my queries now.
But that damn model is a lying liar. Who lies.
This post covers that fact, and some related questions.
The biggest thing to love about o3 is it just does things. You don’t need complex or multi-step prompting: ask and it will attempt to do things.
...Ethan Mollick: o3 is far more agentic than people realize. Worth playing with a lot more than a typical new model. You can get remarkably complex work out of a single prompt.
It just does things. (Of course, that makes checking its work even harder, especially for non-experts.)
Teleprompt AI: Completely agree. o3 feels less like prompting and more like delegating. The upside is wild- but yeah, when it just does
Huh. I knew that's how ChatGPT worked but I had assumed they would've worked out a less hacky solution by now!
to follow up my philanthropic pledge from 2020, i've updated my philanthropy page with the 2024 results.
in 2024 my donations funded $51M worth of endpoint grants (plus $2.0M in admin overhead and philanthropic software development). this comfortably exceeded my 2024 commitment of $42M (20k times $2100.00 — the minimum price of ETH in 2024).
this also concludes my 5-year donation pledge, but of course my philanthropy continues: eg, i’ve already made over $4M in endpoint grants in the first quarter of 2025 (not including 2024 grants that were slow to disburse), as well as pledged at least $10M to the 2025 SFF grant round.
You've funded what looks from my vantage point to be a huge portion of the quality-adjusted attempts to avert doom, perhaps a majority. Much appreciation for stepping up for humanity.
Technically the point of going to college is to help you thrive in the rest of your life after college. If you believe in AI 2027, the most important thing for the rest of your life is for AI to be developed responsibly. So, maybe work on that instead of college?
I think the EU could actually be a good place to protest for an AI pause. Because the EU doesn't have national AI ambitions, and the EU is increasingly skeptical of the US, it seems to me that a bit of protesting could do a lot to raise awareness of the reckless path that the US is taking. That, ...
Yesterday, I couldn't wrap my head around some programming concepts in Python, so I turned to ChatGPT (gpt-4o) for help. This evolved into a very long conversation (the longest I've ever had with it by far), at the end of which I pasted around 600 lines of code from GitHub and asked it to explain them to me. To put it mildly, I was surprised by the response:
Resubmitting the prompt produced pretty much the same result (or a slight variation of it, not identical token-by-token). I also tried adding some filler sentences before and after the code block, but to no avail. Remembering LLMs' meltdowns in long-context evaluations (see the examples in Vending-Bench), I assumed this was because my conversation was very long. Then, I copied just...
Have you tried seeing how ChatGPT responds to individual lines of code from that excerpt? There might be an anomalous token in it along the lines of " petertodd".
What would it take to convince you to come and see a fish that recognizes faces?
Note: I'm not a marine biologist, nor have I kept fish since I was four. I have no idea what fish can really do. For the purposes of this post, let's suppose that fish recognizing faces is not theoretically impossible, but beyond any reasonable expectation.
Imagine someone comes to you with this story:
..."I have an amazing fish. Marcy. She's an archerfish—you know, the kind that can spit at insects? Well, I trained her to spit at politicians.
"It started as a simple game. I’d tap a spot, she’d spit at it, and get a treat. One day I put a TV beside the tank and tapped when a politician came on. Eventually, she started
I'd say that in most contexts in normal human life, (3) is the thing that makes this less of an issue for (1) and (2). If the thing I'm hearing about is real, I'll probably keep hearing about it, and from more sources. If I come across 100 new crazy-seeming ideas and decide to indulge them 1% of the time, and so do many other people, that's usually enough to amplify the ones that (seem to) pan out. By the time I hear about the thing from 2, 5, or 20 sources, I will start to suspect it's worth thinking about at a higher level.