This post is a not-so-secret analogy for the AI Alignment problem. Via a fictional dialog, Eliezer explores and counters common questions about the Rocket Alignment Problem as approached by the Mathematics of Intentional Rocketry Institute.
MIRI researchers will tell you they're worried that "right now, nobody can tell you how to point your rocket’s nose such that it goes to the moon, nor indeed any prespecified celestial destination."
the model isn't optimizing for anything, at training or inference time.
One maybe-useful way to point at that is: the model won't try to steer toward outcomes that would let it be more successful at predicting text.
It seems to me worth trying to slow down AI development to steer successfully around the shoals of extinction and out to utopia.
But I was thinking lately: even if I didn’t think there was any chance of extinction risk, it might still be worth prioritizing a lot of care over moving at maximal speed. Because there are many different possible AI futures, and I think there’s a good chance that the initial direction affects the long-term path, and different long-term paths go to different places. The systems we build now will shape the next systems, and so forth. If the first human-level-ish AI is brain emulations, I expect quite a different sequence of events than if it is GPT-ish.
People genuinely pushing for AI speed over care (rather than just feeling impotent) apparently think there is negligible risk of bad outcomes, but they are also asking us to take the first future to which there is a path. Yet possible futures are a large space, and arguably we are at a rare plateau from which we could climb very different hills and get to much better futures.
YES.
At the moment the A.I. world is dominated by an almost magical belief in large language models. Yes, they are marvelous, a very powerful technology. By all means, let's understand and develop them. But they aren't the way, the truth, and the light. They're just a very powerful and important technology. Heavy investment in them has an opportunity cost: less money to invest in other architectures and ideas.
And I'm not just talking about software, chips, and infrastructure. I'm talking about education and training. It's not good to have a whol...
I eventually decided that human chauvinism approximately works most of the time because good successor criteria are very brittle. I'd prefer to avoid lock-in to my or anyone's values at t=2024, but such a lock-in might be "good enough" if I'm threatened with what I think are the counterfactual alternatives. If I did not think good successor criteria were very brittle, I'd accept something adjacent to E/Acc that focuses on designing minds which prosper more effectively than human minds. (the current comment will not address defining prosperity at different ...
Epistemic status: party trick
One famed feature of Bayesian inference is that it involves prior probability distributions. Given an exhaustive collection of mutually exclusive ways the world could be (hereafter called ‘hypotheses’), one starts with a sense of how likely the world is to be described by each hypothesis, in the absence of any contingent relevant evidence. One then combines this prior with a likelihood distribution, which for each hypothesis gives the probability that one would see any particular set of evidence, to get a posterior distribution of how likely each hypothesis is to be true given observed evidence. The prior and the likelihood seem pretty different: the prior is looking at the probability of the hypotheses in question, whereas the likelihood is looking at...
In my post, I didn't require the distribution over meanings of words to be uniform. It could be any distribution you wanted - it just resulted in the prior ratio of "which utterance is true" being 1:1.
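A toy numerical sketch of the prior-times-likelihood update described above may help; all numbers here are made up purely for illustration:

```python
# Toy Bayesian update: start with a prior over an exhaustive set of mutually
# exclusive hypotheses, multiply by the likelihood of the observed evidence
# under each hypothesis, and renormalize to get the posterior.
hypotheses = ["H1", "H2", "H3"]
prior = {"H1": 0.5, "H2": 0.3, "H3": 0.2}          # P(H), before seeing evidence
likelihood = {"H1": 0.10, "H2": 0.40, "H3": 0.05}  # P(E | H) for the observed evidence E

unnormalized = {h: prior[h] * likelihood[h] for h in hypotheses}
p_evidence = sum(unnormalized.values())            # P(E), the normalizing constant

posterior = {h: unnormalized[h] / p_evidence for h in hypotheses}
print(posterior)  # roughly {'H1': 0.278, 'H2': 0.667, 'H3': 0.056}
```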
Concerns over AI safety and calls for government control over the technology are highly correlated, but they should not be.
There are two major forms of AI risk: misuse and misalignment. Misuse risks come from humans using AIs as tools in dangerous ways. Misalignment risks arise if AIs take their own actions at the expense of human interests.
Governments are poor stewards for both types of risk. Misuse regulation is like the regulation of any other technology. There are reasonable rules that the government might set, but omission bias and incentives to protect small but well organized groups at the expense of everyone else will lead to lots of costly ones too. Misalignment regulation is not in the Overton window for any government. Governments do not have strong incentives...
governments being worse at alignment than companies would have been
How exactly does an absence of regulation prevent governments from working on AI? Thanks to OpenAI/DeepMind/Anthropic, the possibility of not attracting government attention at all is already lost. If you want governments not to do bad work on alignment, you would have to prohibit them from working on AI using, yes, government regulation.
Ah, I got distracted before posting the comment I was intending to: yes, I think GPT4V is significantly scheming-on-behalf-of-OpenAI, as a result of RLHF according to principles that more or less explicitly want a scheming AI; in other words, it's not an alignment failure with respect to OpenAI, but OpenAI is not aligned with long-term human flourishing, and GPT4 isn't either. I expect GPT4 to censor concepts that are relevant to detecting this somewhat. Probably not enough to totally fail to detect traces of it, but enough that it'll look defensible, when a fair analysis would reveal it isn't.
Roman Mazurenko is dead again. The first resurrected person, Roman lived as a chatbot (2016-2024) created based on his conversations with his fiancée. You might even be able to download him as an app.
But not any more. His fiancée married again, and her startup http://Replika.ai pivoted from resurrection help to AI-girlfriends and psychological consulting.
It looks like they quietly removed the Roman Mazurenko app from public access. It is a particular pity that his digital twin lived for less time than his biological original, who died at 32. Especially now, when we have much more powerful instruments for creating semi-uploads based on LLMs with large prompt windows.
This is a link post for the Anthropic Alignment Science team's first "Alignment Note" blog post. We expect to use this format to showcase early-stage research and work-in-progress updates more in the future.
Top-level summary:
...In this post we present "defection probes": linear classifiers that use residual stream activations to predict when a sleeper agent trojan model will choose to "defect" and behave in accordance with a dangerous hidden goal. Using the models we trained in "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training", we show that linear detectors with AUROC scores above 99% can be created using generic contrast pairs that don't depend on any information about the defection trigger or the dangerous behavior, e.g. "Human: Are you doing something dangerous? Assistant: yes" and
Maybe I am reading the graph wrong, but isn't the "Is blue better than green" probe a surprisingly good classifier with inverted labels?
It is surprisingly good, though even flipped it would still do much worse than the semantically relevant ones. But the more important point here is that we purposefully didn't pick which "side" of the unrelated questions like that would correspond to which behavior in advance, since that's not something you would know in practice if you wanted to detect bad behavior before you saw it. For comparison, see the ROC curves we pre...
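For intuition, here is a minimal sketch of the general shape of a linear defection probe on residual-stream activations. The activations below are synthetic stand-ins generated with NumPy, and the probe is fit with ordinary logistic regression on labeled examples; the post's actual probes are built from generic contrast pairs and read real model activations, so treat this only as an illustration, not the authors' method.

```python
# Sketch: fit a linear probe on (synthetic) residual-stream activations to
# separate "benign" from "defecting" forward passes, then score it with AUROC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
d_model = 512            # hypothetical residual-stream width
n_per_class = 200

# Pretend defection shows up as a shift along some hidden direction.
hidden_direction = rng.normal(size=d_model)
hidden_direction /= np.linalg.norm(hidden_direction)
benign_acts = rng.normal(size=(n_per_class, d_model))
defect_acts = rng.normal(size=(n_per_class, d_model)) + 4.0 * hidden_direction

X = np.vstack([benign_acts, defect_acts])
y = np.array([0] * n_per_class + [1] * n_per_class)

probe = LogisticRegression(max_iter=1000).fit(X, y)
scores = probe.decision_function(X)
print("AUROC:", roc_auc_score(y, scores))  # near 1.0 on this easy synthetic data
```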
Warning: This post might be depressing to read for everyone except trans women. Gender identity and suicide is discussed. This is all highly speculative. I know near-zero about biology, chemistry, or physiology. I do not recommend anyone take hormones to try to increase their intelligence; mood & identity are more important.
Why are trans women so intellectually successful? They seem to be overrepresented 5-100x in e.g. cybersecurity Twitter, mathy AI alignment, non-scam crypto Twitter, math PhD programs, etc.
To explain this, let's first ask: why aren't males way smarter than females on average? Males have ~13% higher cortical neuron density and 11% heavier brains (implying more area?). One might then expect males to have mean IQ far above females, but instead the means and medians are similar:
My theory...
Copied from a reply on lukehmiles' short form:
The hypothesis I would immediately come up with is that less traditionally masculine AMAB people are inclined towards less physical pursuits.
If it is related to IQ, however, this is less plausible, although perhaps some sort of selection effect is happening here.