This is a brief summary of what we believe to be the most important takeaways from our new paper and from our findings shown in the o1 system card. We also specifically clarify what we think we did NOT show.
Paper: https://www.apolloresearch.ai/research/scheming-reasoning-evaluations
Twitter about paper: https://x.com/apolloaisafety/status/1864735819207995716
Twitter about o1 system card: https://x.com/apolloaisafety/status/1864737158226928124
We say an AI system is “scheming” if it covertly pursues misaligned goals, hiding its true capabilities and objectives. We think that in order to scheme, models likely need to be goal-directed, situationally aware, and capable enough to reason about scheming as a strategy. In principle, models might acquire situational awareness and stable long-term goals during training, and then scheme in pursuit of those goals. We...
Seems like some measure of evidence -- maybe large, maybe tiny -- that "We don't know how to give AI values, just to make them imitate values" is false?
I am pessimistic about loss signals getting 1-to-1 internalised as goals or desires in a way that is predictable to us with our current state of knowledge on intelligence and agency, and would indeed tentatively consider this observation a tiny positive update.
The ACX/EA/LW Sofia Meetup for October will be on the 15th (Sunday) at 17:00 at the Mr. Pizza on Vasil Levski.
Sofia ACX started with the 2021 Meetups Everywhere round. Attendance hovers around 4-8 people. Everyone worries they're not serious enough about ACX to join, so you should banish that thought and come anyway. "Please feel free to come even if you feel awkward about it, even if you’re not 'the typical ACX reader', even if you’re worried people won’t like you", even if you didn't come to the previous meetings, even if you don't speak Bulgarian, etc., etc.
Each month we pick something new to read and discuss. This time, we're discussing "Math's Fundamental Flaw," a Veritasium video on Gödel's Incompleteness Theorem.
See you there.
If Biden pardons people like Fauci for crimes like perjury, that would set a bad precedent.
There's a reason why perjury is forbidden and if you just give pardons to any government official who committed crimes at the end of an administration that's a very bad precedent.
One way out of that would be to find a different way to punish government criminals when they are pardoned. One aspect of a pardon is that they remove the Fifth Amendment defense.
You can subpoena pardoned people in front of Congress and ask them under oath to speak about all the crimes...
Epistemic status: Toy model. Oversimplified, but has been anecdotally useful to at least a couple people, and I like it as a metaphor.
I’d like to share a toy model of willpower: your psyche’s conscious verbal planner “earns” willpower (earns trust with the rest of your psyche) by choosing actions that nourish your fundamental, bottom-up processes in the long run. For example, your verbal planner might expend willpower dragging you to disappointing first dates, then regain that willpower, and more, upon finding you a good long-term romance. Wise verbal planners can acquire large willpower budgets by making plans that, on average, nourish your fundamental processes. Delusional or uncaring verbal planners, on the other hand, usually become “burned out” – their willpower budget goes broke-ish, leaving them little to...
one thing I valued highly was free time, and regardless of how much money and status a 40 hour a week job gives you, that's still 40 hours a week in which your time isn't free!
Yeah, the same here. The harder I work the more money I can get (though the relation is not linear; more like logarithmic), but at this point the thing I want it not money... it is free time!
I guess the official solution is to save money for early retirement. Which requires investing the money wisely, otherwise the inflation eats it.
By the way, perhaps you could have some people check your resume, maybe you are doing something wrong there.
Related, here is something Yudkowsky wrote three years ago:
...I'm about ready to propose a group norm against having any subgroups or leaders who tell other people they should take psychedelics. Maybe they have individually motivated uses - though I get the impression that this is, at best, a high-variance bet with significantly negative expectation. But the track record of "rationalist-adjacent" subgroups that push the practice internally and would-be leaders who suggest to other people that they do them seems just way too bad.
I'm also about read
My paper with my Ph.D. advisor Vince Conitzer titled "Extracting Money from Causal Decision Theorists" has been formally published (Open Access) in The Philosophical Quarterly. Probably many of you have seen either earlier drafts of this paper or similar arguments that others have independently given on this forum (e.g., Stuart Armstrong posted about an almost identical scenario; Abram Demski's post on Dutch-Booking CDT also has some similar ideas) and elsewhere (e.g., Spencer (forthcoming) and Ahmed (unpublished) both make arguments that resemble some points from our paper).
Our paper focuses on the following simple scenario which can be used to, you guessed it, extract money from causal decision theorists:
...Adversarial Offer: Two boxes, and , are on offer. A (risk-neutral) buyer may purchase one or none of the boxes but not both.
I think we might be talking past each other. I will try and clarify what I meant.
Firstly, I fully agree with you that standard game theory should give you access to randomization mechanisms. I was just saying that I think that hypotheticals where you are judged on the process you use to decide, and not on your final decision are a bad way of working out which processes are good, because the hypothetical can just declare any process to be the one it rewards by fiat.
Related to the randomization mechanisms, in the kinds of problems people worry about with pre...
TL:DR: Recently, Lucius held a presentation on the nature of deep learning and why it can generalise to new data. Kaarel, Dmitry and Lucius talked about the slides for that presentation in a group chat. The conversation quickly became a broader discussion on the nature of intelligence and how much we do or don't know about it.
Lucius: I recently held a small talk presenting an idea for how and why deep learning generalises. It tried to reduce concepts from Singular Learning theory back to basic algorithmic information theory to sketch a unified picture that starts with Solomonoff induction and, with a lot of hand waving, derives that under some assumptions, just fitting a big function to your data using a local optimisation method like stochastic gradient descent...
If it’s worth saying, but not worth its own post, here's a place to put it.
If you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are invited. This is also the place to discuss feature requests and other ideas you have for the site, if you don't want to write a full top-level post.
If you're new to the community, you can start reading the Highlights from the Sequences, a collection of posts about the core ideas of LessWrong.
If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the Concepts section.
The Open Thread tag is here. The Open Thread sequence is here.
The list is very long, so it is hard to make a screenshot. Now with some hours of distance, I reloaded the homepage, tried again, and one 0 karma post appeared. (Last time, it did definitely not, I search very rigorously.)
However, the mathematical formular still tells me that all 0 karma post should appear at the same position, and negative karma posts below them?
Previously: Sadly, FTX
I doubted whether it would be a good use of time to read Michael Lewis’s new book Going Infinite about Sam Bankman-Fried (hereafter SBF or Sam). What would I learn that I did not already know? Was Michael Lewis so far in the tank of SBF that the book was filled with nonsense and not to be trusted?
I set up a prediction market, which somehow attracted over a hundred traders. Opinions were mixed. That, combined with Matt Levine clearly reporting having fun, felt good enough to give the book a try.
I need not have worried.
Going Infinite is awesome. I would have been happy with my decision on the basis of any one of the following:
The details I learned or clarified about the psychology of SBF...
+9. This is at times hilarious, at times upsetting story, of how a man gained a massive amount of power and built a corrupt empire. It's a psychological study, as well as a tale of a crime, hand-in-hand with a lot of naive ideologues.
I think it is worthwhile for understanding a lot about how the world currently works, including understanding individuals with great potential for harm, the crooked cryptocurrency industry, and the sorts of nerds in the world who falsely act in the name of good.
I don't believe that all the details here are fully accurate, but ...