RL capability gains might mostly come from better self-elicitation.
Ran across a paper, NUDGING: Inference-time Alignment of LLMs via Guided Decoding. The authors took a base model and a post-trained model. They had the base model try to answer benchmark questions, found the positions where the base model was least certain, and replaced specifically those tokens with tokens from the post-trained model. The base model, so steered, performed surprisingly well on benchmarks. Surprisingly (to me at least), the tokens that changed tended to be transitional phrases rat...
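A minimal sketch of the general idea as I read it (not the paper's exact algorithm): decode greedily from the base model, and whenever its top-token probability drops below some threshold, take that token from the post-trained model instead. The model names and the threshold below are arbitrary stand-ins, and the real method may differ in details such as how many tokens the post-trained model contributes at each intervention.

```python
# Sketch of uncertainty-gated guided decoding: the base model generates,
# but defers to the post-trained ("guide") model at low-confidence positions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "gpt2"          # stand-in for a base model
GUIDE = "gpt2-medium"  # stand-in for a post-trained model (shares gpt2's tokenizer)
THRESHOLD = 0.4        # hand-picked confidence cutoff

tok = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE)
guide = AutoModelForCausalLM.from_pretrained(GUIDE)

def nudged_generate(prompt: str, max_new_tokens: int = 50) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        with torch.no_grad():
            base_probs = torch.softmax(base(ids).logits[0, -1], dim=-1)
        top_prob, next_id = base_probs.max(dim=-1)
        if top_prob.item() < THRESHOLD:
            # Base model is uncertain here: take the token from the guide model.
            with torch.no_grad():
                next_id = guide(ids).logits[0, -1].argmax(dim=-1)
        ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)
        if next_id.item() == tok.eos_token_id:
            break
    return tok.decode(ids[0], skip_special_tokens=True)
```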
The way they use the word "aligned" in that paper is very weird to me :P (they basically use it as a synonym for "instruction-following" or "post-trained").
But I feel like this method could actually be adapted for AI alignment/safety. It's kind of similar to my "incremental steering" idea, but instead of a strong untrusted model guiding a weak trusted model, there's a weak post-trained model guiding a strong base model. This also looks more practical than incremental steering, because it alternates between the weak model and the strong model, rather than g...
Superstable proteins: A team from Nanjing University just created a protein that's 5x more resistant to unfolding than normal proteins and can withstand temperatures of 150 °C. The upshot from some analysis on X seems to be:
So why is this relevant? It's basically the first step to...
Yeah, the paper seems more like a materials science paper than a biology paper. There were no tests/simulations/discussion of biological function; similar to DNA computing/data storage, it's more interested in the properties of the material than in how it interfaces with pre-existing biology.
They did optimize for foldability, and did successfully produce the folded protein in (standard bacterial) cells. So it can be produced by biological systems (at least briefly), though more complex proteins had lower yields.
The application they looked at was hydrogels, and it seems to have improved performance there? But functioning in biological systems would introduce more constraints.
Dopamine might be what regulates top-down, "will-imposing" action.
Stimulants are great for increasing attention, motivation and mood. However, they also cause downregulation of dopamine receptors, potentially leading to dependence and, when you're not taking them, the opposite of the benefits.
Some lesser-known ways to upregulate the dopaminergic system without (or with less of) this effect:
Yes, dopamine certainly plays an important role in "will-imposing" actions (and will-distracting actions if you've got too much dopamine activity).
I don't think there's a free lunch on downregulation as a result of upregulation. The brain has those negative feedback loops at many levels. (I'm told the rest of biology does too; it's a big part of how a messy genome makes something that works under a lot of conditions).
What goes up, must come down. Artificially changing how your brain works means it will adapt, and work more the opposite way when you're not ...
One theme I've been thinking about recently is how bids for connection and understanding are often read as criticism. For example:
Person A shares a new idea, feeling excited and hoping to connect with Person B over something they've worked hard on and hold dear.
Person B asks a question about a perceived inconsistency in the idea, feeling excited and hoping for an answer which helps them better understand the idea (and Person B).
Person A feels hurt and unfairly rejected by Person B. Specifically, Person A feels like Person B isn't willing to give their sinc...
Agreed that there's a lot of suffering involved in this sort of interaction. Not sure how to fix it in general - I've been working on it in myself for decades, and still forget often. Please take the following as a personal anecdote, not as general advice.
The difficulty (for me) is that "hoping to connect" and understanding the person in addition to the idea are very poorly defined, and are very often at least somewhat asymmetrical, and trying to make them explicit is awkward and generally doesn't work.
I find it bizarre and surprising, no mat...
Someone on the EA forum asked why I've updated away from public outreach as a valuable strategy. My response:
I used to not actually believe in heavy-tailed impact. On some gut level I thought that early rationalists (and to a lesser extent EAs) had "gotten lucky" in being way more right than academic consensus about AI progress. I also implicitly believed that e.g. Thiel and Musk and so on kept getting lucky, because I didn't want to picture a world in which they were actually just skillful enough to keep succeeding (due to various psychological blockers)....
Why did the task fall to a bunch of kids/students?
I'm not surprised by this; my sense is that it's usually young people and outsiders who pioneer new fields. Older people are just so much more shaped by existing paradigms, and also have so much more to lose, that it outweighs the benefits of their expertise and resources.
Also 1993 to 2000 doesn't seem like that large a gap to me. Though I guess the thing I'm pointing at could also be summarized as "why hasn't someone created a new paradigm of AI safety in the last decade?" And one answer is that Paul and C...
For decades, people have been saying that prediction markets have the potential to become economically important, yet they remain unimportant. I would not be surprised if they become important over the next 4 years thanks to broadly available AI technology.
Let's define "economically important" as a state of affairs in which there continues to be at least $50 billion riding on predictions at every instant in time.
First of all, AI tech might make prediction markets better by helping with market-making and arbitrage. Second, a sufficiently robust prediction m...
Maybe I failed to write something that reasonable people could parse.
Notes to self about the structure of the problem, probably not interesting to others:
This is heavily drawn from MIRI's work and Joe Carlsmith's work.
So, there are two kinds of value structure: (1) long-term goals, and (2) immediate goals & deontological constraints. The line between them isn't sharp but that's OK.
If we imagine an agent that only has long-term goals, well, that thing is going to be a ruthless consequentialisty optimizer thingy and when it gets smart and powerful enough it'll totally take over the world if it can, unless the maxima of its ...
More reasons to worry about relying on constraints:
Do we have some page containing resources for rationalist parents, or generally for parents of smart children, such as recommended books, toys, learning apps, etc.?
I found the tag https://www.lesswrong.com/w/parenting, but I was hoping for something like the best textbooks / recommendations / reference works posts, but aimed at parents/children.
I'm not arguing either way; I'm just noting a specific aspect that seems relevant. The question is: is a baby's body more susceptible to alcohol than an adult's body? For example, does a baby's liver work better or worse than an adult's? Are there developmental processes that can be disturbed by the presence of alcohol? By default I'd assume the effect is proportional (except maybe the baby "lives faster" in some sense, so the effect may be proportional to metabolism or growth speed or something). But all of that is speculation.
For the last week, ChatGPT 5.1* has been glitching.
*It claims to be 5.1; I don't know how to verify that, since I use the free version (limited questions per day) and there is no version selection.
When I ask it to explain some topic and then ask deeper and deeper questions, at some point it chooses to enter thinking mode. I can see that the topics it thinks about are relevant, but when it stops thinking it says something like "Ah, great, here is the answer..." and then explains another topic from 2-3 messages back, which is no longer related to the question.
I do not use memory or characters features.
In simple chat conversations, where I want it to generate a line of JavaScript code, it gets stupid. But in other chats, where I raise more difficult topics and thus explain more than I ask, it seems to be quite smart.
Am I understanding correctly that recent revelations from Ilya's deposition (e.g. looking at the parts here) suggest Ilya Sutskever and Mira Murati seem like very selfish and/or cowardly people? They seem approximately as scheming or manipulative as Sam Altman, if maybe more cowardly and less competent.
My understanding is that they were basically wholly responsible for causing the board to try to fire Sam Altman. But when it went south, they actively sabotaged the firing (e.g. Mira disavowing it and trying to retain her role, Ilya saying he regr...
Make sure the people on the board of OpenAI were not catastrophically naive about corporate politics and public relations? Or, make sure they understood their naïveté well enough to get professional advice beforehand? I just reread the press release and can still barely believe how badly they effed it up.
Do LLMs have intelligence (mind), or are they only rational agents? To understand this question, I think it is important to delineate the subtle difference between intelligence and rationality.
In current practice of building artificial intelligence, the most common approach is the standard model, which refers to building rationally acting agents—those that strive to accomplish some objective put into them (see “Artificial Intelligence: A Modern Approach” by Russell and Norvig). These agents, built according to the standard model, use an external standard f...
Many have asserted that LLM pre-training on human data can only produce human-level capabilities at most. Others, e.g. Ilya Sutskever and Eliezer Yudkowsky, point out that since prediction is harder than generation, there's no reason to expect such a cap.
The latter position seems clearly correct to me, but I'm not aware of it having been tested. It seems like it shouldn't be that hard to test, using some narrow synthetic domain.
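To make "test it in a narrow synthetic domain" concrete, here is a toy illustration I made up (it is not from the post or from any paper I know of): a corpus is generated by many noisy "experts", and an idealized predictor of that corpus, decoded by taking the argmax of its per-question answer distribution, recovers the plurality answer and beats the best individual expert. All numbers are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N_EXPERTS, N_ITEMS, ERROR_RATE, N_CLASSES = 7, 2000, 0.3, 10

# Ground-truth answer (one of N_CLASSES) for each question.
truth = rng.integers(0, N_CLASSES, size=N_ITEMS)

def expert_answers(truth):
    """Each expert copies the truth, but answers at random ERROR_RATE of the time."""
    noise = rng.integers(0, N_CLASSES, size=truth.shape)
    wrong = rng.random(truth.shape) < ERROR_RATE
    return np.where(wrong, noise, truth)

# The "pre-training corpus": one answer sheet per expert.
answers = np.stack([expert_answers(truth) for _ in range(N_EXPERTS)])

def plurality(answers):
    """Argmax of the per-question answer distribution, i.e. the plurality vote,
    which is what a perfect predictor of the corpus lets you decode."""
    return np.array([np.bincount(col, minlength=N_CLASSES).argmax()
                     for col in answers.T])

expert_acc = (answers == truth).mean(axis=1)        # accuracy of each expert
predictor_acc = (plurality(answers) == truth).mean()
print(f"best single expert accuracy:  {expert_acc.max():.3f}")
print(f"plurality-of-corpus accuracy: {predictor_acc:.3f}")
```

This obviously isn't LLM pre-training, but it's the kind of minimal setting where "a predictor of the demonstrators can exceed the demonstrators" could be checked quantitatively.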
The only superhuman capability of LLMs that's been clearly shown as far as I know is their central one: next-token prediction. But I...
LLMs are of course also superhuman at knowing lots of facts, but that's unlikely to impress anyone since it was true of databases by the early 1970s.
Epistemic status: I think that there are serious problems with honesty passwords (as discussed in this post), and am not sure that there are any circumstances in which we'd actually want to use them. Furthermore, I was not able to come up with a practical scheme for honesty passwords with ~2 days of effort. However, there might be some interesting ideas in this post, and maybe they could turn into something useful at some later point.
...Thanks to Alexa Pan, Buck Shlegeris, Ryan Greenblatt, Vivek Hebbar and Nathan Sheffield for discussions that led to me writi
Fine-tuning induced confidence is a concerning possibility that I hadn't thought of. idk how scared to be of it. Thanks!
Much has been made of the millennial aversion to phone calls that could have been an email, and I have a little bit of this nature, but I think most of my aversion here is to being on hold and getting bounced around different call departments.
I kind of want to check whether 1. the aversion is real and generational, as common wisdom holds, and 2. if it is real, whether calling became a genuinely worse experience around the time millennials started trying to do things.
Data point: I'm millennial (born 1992) and have a pretty strong aversion to phone calls, which is motivated mainly by the fact that I prefer most communication to be non-real-time so that I can take time to think about what to say without creating an awkward silence. And when I do engage in real-time communication, visual cues make it much less unpleasant, so phone calls are particularly bad in that I have to respond to someone in real time without either of us seeing the other's face/body language.
If I had to take a wild guess at why this seems to be gene...
I quite appreciated Sam Bowman's recent Checklist: What Succeeding at AI Safety Will Involve. However, one bit stuck out:
...In Chapter 3, we may be dealing with systems that are capable enough to rapidly and decisively undermine our safety and security if they are misaligned. So, before the end of Chapter 2, we will need to have either fully, perfectly solved the core challenges of alignment, or else have fully, perfectly solved some related (and almost as difficult) goal like corrigibility that rules out a catastrophic loss of control. This work co
"we don't need to worry about Goodhart and misgeneralization of human values at extreme levels of optimization."
That isn't a conclusion I draw, though. I think you don't know how to parse what I'm saying as different from that rather extreme conclusion -- which I don't at all agree with? I feel concerned by that. I think you haven't been tracking my beliefs accurately if you think I'd come to this conclusion.
FWIW I agree with you on a), I don't know what you mean by b), and I agree with c) partially—meaningfully different, sure.
Anyways, when I talk abou...
GPT-5.2 seems to have been trained specifically to be better at work tasks, especially long ones. It was also released early, according to articles about a "code red" at OpenAI. As such, (I predict) it should be a jump on the METR graph. It will be difficult to tell how much of that jump comes from it being trained to do well at long work tasks, how much from the early release, and how much from any actual algorithmic progress. (An example of algorithmic progress would be a training method for using memory well - something not specific to e.g. programming tasks.)
What do you mean by "a jump on the metr graph"? Do you just mean better than GPT-5.1? Do you mean something more than that?
What constitutes cooperation? (previously)
Because much in the complex system of human interaction and coordination is about negotiating norms, customs, institutions, constitutions to guide and constrain future interaction in mutually preferable ways, I think:
deserves special attention.
This is despite it perhaps (for a given 'coordination moment') being in theory reducible to preference elicitation or aggregation, searching outcome space, negotiation, enforcement, ...
There are failure modes (unintended consequences, ...
Seems right! 'studies' uplifts 'design' (either incremental or saltatory), I suppose. For sure, the main motivation here is to figure out what sorts of capabilities and interventions could make coordination go better, and one of my first thoughts under this heading is open librarian-curator assistive tech for historic and contemporary institution case studies. Another cool possibility could be simulation-based red-teaming and improvement of mechanisms.
If you have any resources or detailed models I'd love to see them!
I think people in these parts are not taking sufficiently seriously the idea that we might be in an AI bubble. This doesn't necessarily mean that AI isn't going to be a huge deal - just because there was a dot-com bubble doesn't mean the Internet died - but it does very substantially affect the strategic calculus in many ways.
The evidence just seems to keep pointing towards this not being a bubble.