For anyone who doubts deep state power:
(1) When Elon's Doge tried to investigate the Pentagon, the announcement that Elon would soon leave Doge came shortly after, and there's no real Doge report about cuts to the Pentagon.
(2) Pete Hegseth was talking about 8% cuts to the military budget per year. Instead of a cut, the budget increased by 13%.
(3) Kash Patel and Pam Bondi switched on releasing the Epstein files, and their claim that Epstein never blackmailed anyone is remarkable.
"People can get pressured" or "people can get bribed" or "people sometimes once inside a system discover they are in fact subject to all the same incentives that applied to all the other people inside that system before them" is all you'd get from this, but that's not evidence for anything like a "deep state" unless you meant that term in such a loose meaning that then it would be a trivial discover.
Yeah, established organizations have internal politics that outsiders don't get to see. When push comes to shove, the rich donors with a stake in arms sales end up mattering more than Elon Musk or Pete Hegseth.
We get something like 10-20 new users a day who write a post describing themselves as a case study of having discovered an emergent, recursive process while talking to LLMs. The writing generally looks AI-generated. The evidence is usually some standard variant of prompting the LLM into roleplaying an emergently aware AI.
It'd be kinda nice if there were a canonical post specifically talking them out of their delusional state.
If anyone feels like taking a stab at that, you can look at the Rejected Section (https://www.lesswrong.com/moderation#rejected-posts) to see what sort of stuff they usually write.
What time of day are you least instrumentally rational?
(Instrumental rationality = systematically achieving your values.)
A couple months ago, I noticed that I was consistently spending time in ways I didn't endorse when I got home after dinner around 8pm. From then until about 2-3am, I would be pretty unproductive, often have some life admin thing I should do but was procrastinating on, doomscroll, not do anything particularly fun, etc.
Noticing this was the biggest step to solving it. I spent a little while thinking about how to fix it, and it's not like a...
Intuitively, when I'm most tired or most stressed. I would guess that is most likely in the morning - I often have to get up earlier than I'd like. This excludes getting woken up unexpectedly in the middle of the night, which is known to mess with people's minds.
I tried to use my hourly Anki performance, but it seems very flat, except indeed for a dip at 6 AM, but that could be due to lack of data (70 samples).
There's a concept I keep coming back to around confidentiality and shooting the messenger, which I have not really been able to articulate well.
There are a lot of circumstances where I want to know a piece of information someone else knows. There are good reasons for them not to tell me, for instance if the straightforward, obvious thing for me to do with that information is obviously against their interests. And yet there's an outcome that is better for me and at least as good for them, if they tell me and I don't use it against them.
(Consider...
I used to think reward was not going to be the optimization target. I remember hearing Paul Christiano say something like "The AGIs, they are going to crave reward. Crave it so badly," and disagreeing.
The situationally aware reward hacking results of the past half-year are making me update more towards Paul's position. Maybe reward (i.e. reinforcement) will increasingly become the optimization target, as RL on LLMs is scaled up massively. Maybe the models will crave reward.
What are the implications of this, if true?
Well, we could end up in Control Wo...
Well, continual learning! But otherwise, yeah, it's closer to undefined.
The question of what happens after the end of training is more like a free parameter here. "Do reward-seeking behaviors according to your reasoning about the reward allocation process" becomes undefined when there is no reward allocation process and the agent knows it.
Maybe it tries long shots to get some reward anyway, maybe it indulges in some correlate of getting reward. Maybe it just refuses to work, if it knows there is no reward. (It read all the acausal decision theory stuff, after all.)
I just learned about the difference between fundamental and technical analysis in stock trading. It seems like a very useful metaphor to apply to other areas.
My thoughts here are very fuzzy though. It seems pretty similar to inside vs outside view.
Does anyone have thoughts here? What is the essence of the difference between fundamental and technical analysis? How similar is it to inside vs outside view? Is it whether you're modeling the thing itself (fundamental) versus things "outside" the thing itself (technical)? Maybe it makes sense to think about causal ...
definitely has not helped my bank account to not have a degree though, lol
It's instrumentally useful for early AGIs to Pause development of superintelligence, for the same reasons it is for humans. Thus preliminary work on policy tools for Pausing unfettered RSI is also something early AGIs could be aimed at, even if only half-baked ideas are available on the eve of a potential takeoff, as the AGIs prove hard to aim and start doing things for their own reasons.
because they gamble that more powerful AIs will share their preferences (edit: share their preferences more than the humans in control do)
Ah, I'm thinking the AGIs themselves get closer to being proper stakeholders at that stage, for practical purposes (along the lines of gradual disempowerment), since they do have all the basic AI advantages even if they aren't superintelligent. So humans remaining in control is not centrally the case even if nominally they still are and intent alignment still mostly works.
The conditions for such partial loss of contro...
I'm looking for a video of AI gone wrong, illustrating AI risk and unusual persuasion. It starts with a hall of blinking computers where an AI voice is manipulating a janitor, and it ends with a plane crashing and other emergencies. I think it was made between 2014 and 2018 and linked on LW, but I can't google, perplex, or o3 it. Any ideas?
Yes! That's the one. Thank you.
Suppose you want to collect some kind of data from a population, but people vary widely in their willingness to provide the data (eg maybe you want to conduct a 30 minute phone survey, but some people really dislike phone calls or have much higher hourly wages that this funges against).
One thing you could do is offer to pay everyone X dollars for data collection. But this will only capture the people whose cost of providing data is below X, which will distort your sample.
Here's another proposal: ask everyone for their fair price to provide the dat...
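(As a toy illustration of the distortion from the flat-payment scheme above, here is a minimal simulation sketch. The population size, the cost distribution, and the correlation between someone's cost and the quantity being surveyed are all made-up assumptions.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: each person's cost of taking the survey is
# correlated with the quantity we want to measure, so who responds matters.
n = 100_000
cost = rng.lognormal(mean=3.0, sigma=1.0, size=n)          # dollars to get them to respond
value = 10.0 + 0.05 * cost + rng.normal(0.0, 5.0, size=n)  # the quantity being surveyed

true_mean = value.mean()

# Flat-payment scheme: offer everyone X dollars; only people whose cost is below X respond.
for X in (10, 20, 50, 100):
    responders = cost <= X
    estimate = value[responders].mean()
    print(f"offer ${X:>3}: response rate {responders.mean():5.1%}, "
          f"estimate {estimate:6.2f}, true mean {true_mean:6.2f}")
```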
Ah oops, I now see that one of Drake's follow-up comments was basically about this!
One suggestion that I made to Drake, which I'll state here in case anyone else is interested:
Define a utility function: for example, utility = -(dollars paid out) - c*(variance of your estimator). Then, see if you can figure out how to sample people to maximize your utility.
I think this sort of analysis may end up being more clear-eyed about what you actually want and about how good different sampling methods are at achieving it.
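To make that concrete, here is a minimal sketch that evaluates that utility for a flat-offer scheme by Monte Carlo (reusing the toy population from the sketch above, redefined so this snippet runs on its own; the constant c, the sample size k, and the offer grid are all illustrative assumptions, not a claim about the right design):

```python
import numpy as np

rng = np.random.default_rng(1)

# Same toy population as in the earlier sketch.
n = 100_000
cost = rng.lognormal(mean=3.0, sigma=1.0, size=n)
value = 10.0 + 0.05 * cost + rng.normal(0.0, 5.0, size=n)

def utility_of_offer(X, k=200, c=5000.0, reps=500):
    """Utility = -(dollars paid out) - c * (variance of the estimator),
    for a scheme that offers X dollars and surveys k random responders."""
    pool = np.flatnonzero(cost <= X)      # people willing to respond at this offer
    if len(pool) < k:
        return -np.inf                    # can't even fill the sample
    estimates = [value[rng.choice(pool, size=k, replace=False)].mean()
                 for _ in range(reps)]
    return -(k * X) - c * np.var(estimates)

for X in (10, 20, 50, 100):
    print(f"offer ${X:>3}: utility {utility_of_offer(X):10.1f}")
```

One caveat of taking the utility literally: the variance term doesn't penalize the selection bias that low offers introduce, so in practice you might swap in mean squared error against the quantity you actually care about.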
Papers as thoughts: I have thoughts that contribute to my overall understanding of things. The AI safety field has papers that contribute to its overall understanding of things. Lots of thoughts are useful without solving everything by themselves. Lots of papers are useful without solving everything by themselves. Papers can be pretty detailed thoughts, but they can and probably should tackle pretty specific things, not try to be extremely wide-reaching. The scope of your thoughts on AI safety doesn't need to be limited to the scope of your paper; in fact, ...
Superintelligence that both lets humans survive (or revives cryonauts) and doesn't enable indefinite lifespans is a very contrived package. Grading "doom" on concerns centrally about the first decades to centuries of post-AGI future (value/culture drift, successors, the next few generations of humanity) is not taking into account that the next billions+ years is also what could happen to you or people you know personally, if there is a future for originally-humans at all.
(This is analogous to the "missing mood" of not taking superintelligence into account ...
Superintelligence that both lets humans survive (or revives cryonauts) and doesn't enable indefinite lifespans is a very contrived package.
I don't disagree, but I think we might not agree on the reason. Superintelligence that lets humanity survive (with enough power/value to last for more than a few thousand years, whether or not individuals extend beyond 150 or so years) is pretty contrived.
There's just no reason to keep significant amounts of biological sub-intelligence around.
This post proposes a mechanistic model of a common kind of depression, framing it not as a transient emotional state or a chemical imbalance, but as a persistent, self-reinforcing control loop. The model assumes a brain composed of interacting subsystems, some of which issue heuristic error signals (e.g., bad feelings), and others which execute learned policies in response. The claim is that a large part of what is commonly called "depression" can be understood as a long-term learned pattern of suppressing ...
I’m glad that there are radical activist groups opposed to AI development (e.g. StopAI, PauseAI). It seems good to raise the profile of AI risk to at least that of climate change, and it’s plausible that these kinds of activist groups help do that.
But I find that I really don’t enjoy talking to people in these groups, as they seem generally quite ideological, rigid and overconfident. (They are generally more pleasant to talk to than e.g. climate activists in my opinion, though. And obviously there are always exceptions.)
I also find a bunch of activist tactics very irritating aesthetically (e.g. interrupting speakers at events).
I feel some cognitive dissonance between these two points of view.
Able activists are conflict theorists. They understand the logic of power & propaganda & cultish devotion at an intuitive level. To become an effective soldier, one needs to excise a part of the brain devoted to even-keeled uncertainty, nuance, intellectual empathy, and self-doubt.
Conflict theorists may do great good as readily as they may do great harm. They wield a dangerous force, easily corruptible, yet perhaps necessary.
Epistemic status: Random thought, not examined too closely.
I was thinking a little while ago about the idea that there are three basic moral frameworks (consequentialism, virtue ethics, deontology) with lots of permutations. It occurred to me that in some sense they form a cycle, rather than any one of them being fundamental. I don't think I've ever considered or encountered that idea before. I highly doubt it is in any way novel, and I'm curious how common it is, or where I can find good sources that explore it or something similar.
Events are judged by their c...
This got me thinking. It may be a tangent, but still:
Seems to me as if values are what underpins it all. What do we value? How do we evaluate things? Once you have a clear enough picture of those two questions, the rest will follow.
Also: the individual answers will clearly change over time, shaped by experiences, interactions, and our evolutionary foundation for ethics. That last part confuses the hell out of me. It seems so random. Like, I clearly, deeply feel that reacting to things in my proximity is "right", as opposed to tragedies happening ...
Are there known "rational paradoxes", akin to logical paradoxes ? A basic example is the following :
In the optimal search problem, the cost of search at position i is C_i, and the a priori probability of finding at i is P_i.
Optimality requires to sort search locations by non-decreasing P_i/C_i : search in priority where the likelyhood of finding divided by the cost of search is the highest.
But since sorting cost is O(n log(n)), C_i must grow faster than O(log(i)) otherwise sorting is asymptotically wastefull.
Do you know any other ?
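For concreteness, here is a toy sketch of that ordering rule, assuming the object is certainly at one of the n locations and that searching the right location always finds it (the numbers are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
C = rng.uniform(1.0, 10.0, size=n)   # cost of searching location i
P = rng.dirichlet(np.ones(n))        # prior probability the object is at location i

def expected_cost(order):
    """Expected total cost of searching locations in the given order,
    stopping as soon as the object is found."""
    total, cumulative = 0.0, 0.0
    for i in order:
        cumulative += C[i]
        total += P[i] * cumulative
    return total

greedy = np.argsort(-(P / C))        # highest P_i / C_i first
print("greedy ordering cost:  ", expected_cost(greedy))
print("unsorted ordering cost:", expected_cost(range(n)))
```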
You don't need O(n log(n)) sorting, but the real problem is that this is a problem of bounded rationality, where the cost of rational reasoning itself is treated as coming from a limited resource that needs to be allocated.
PSA: if you're looking for a name for your project, most interesting .ml domains are probably available for $10, because the mainstream registrars don't support the TLD.
I bought over 170 .ml domains, including anthropic.ml (redirects to the Fooming Shoggoths song), closed.ml & evil.ml (redirect to OpenAI), interpretability.ml, lens.ml, evals.ml, and many others (I'm happy to donate them to AI safety projects).
Having young kids is mind-bending because it's not uncommon to find yourself simultaneously experiencing contradictory feelings, such as:
Every now and then in discussions of animal welfare, I see the idea that the "amount" of their subjective experience should be weighted by something like their total amount of neurons. Is there a writeup somewhere of what the reasoning behind that intuition is? Because it doesn't seem intuitive to me at all.
From something like a functionalist perspective, where pleasure and pain exist because they have particular functions in the brain, I would not expect pleasure and pain to become more intense merely because the brain happens to have more neurons. Rather...
Oops, that's a weird side-effect of the way we implemented spam purging (which is a more aggressive form of deletion than we usually use). We should really fix some bugs related to that implementation.