So is it an example of a statement that is maybe a truth because it is maybe a lie? And if it is definitely one or the other, it is definitely a lie? Fun!
I think the notion of "one-shot-ness" introduces a counterproductive dichotomy. A lot of people, even those aligned with the general AI-risk position of the writer, react to this framing as downplaying or dismissing the role of empirical research, trials with smaller AIs, etc. (Oliver's quotes as evidence). I don't think replying this way requires strawmanning! And I found the confusion around "Yudkowsky wants us to have perfect theoretical understanding before we try anything" reasonable, even if it misrepresents the writer's position.
Indeed, "one-shot-...
Practically, mode collapse seems like a bad thing by itself if (a) the underlying reality shifts, or (b) your beliefs were incorrect in the first place. An example of (a) would be when, after your mode collapsed, animal welfare becomes 80% of the proposals. An example of (b) would be an image model that "didn't know" that diversity of outputs is in itself a value.
(b) doesn't seem as bad for humans, because if we are investigating our beliefs, and find out some of our previously held convictions were wrong, we can try to trace back what decisions those informed, ...
I don't think we (currently; I am not making the case for me-living-in-1490s) live in a world of scarce opportunities to apply yourself to something. Yes, things that turn out to be good are sometimes done without good intentions on the part of most of the people doing them. But also, most (?) bad things are done by people with bad intentions (or bad beliefs). I would be surprised if, in a world where you can choose to apply yourself to movements that exercise effort to have good intentions & good beliefs, you wouldn't be better off assigning some weight to th...
I'm under the impression that the conquest of the American continent was not a project of goodness; it was a project of conquest. Then the revolution happened (the way it happened), Washington resigning happened, and a whole lot of other things, and in the end it could be argued (controversially) that, given an oracle into 2026 and a counterfactual world, a person guided by "goodness" would not oppose the colonial project. But the original conquerors didn't have this oracle, and weren't acting in service of "goodness". There are, I believe, countless examples throughou...
I'm somewhat new to the rationalist space. I'm under the impression that rationalist thought, the existence and magnitude of which are partially supported through EA infrastructure, is one of the main driving forces behind whatever progress we have in AI Safety. Most of the people I meet in this space are at least EA / rationality adjacent.
I read your comment as implying that EA is, at least within the scope of just accelerating ASI, a net negative, and that the world would be counterfactually better off if EA were to retire. Can you explain why you believe what you believe?
I’m sceptical of this being a universal rule because I don’t know why you believe what you believe. However, I want to scream “preach!” because empirically I agree 100%.
I don’t know if it is my theatre background, but another thing I find severely underappreciated is textured shadows. The only thing worse than a blue LED is a bright blue LED in the middle of the ceiling as the only light source.
I think (based on photos) it is another thing Lighthaven does really well!
I want to add that helping out someone with low morale seems like a very high-impact intervention to me, if you are in a good position to help! Low morale predicts even lower morale, both because there is likely something in the environment causing it, and because you have less desire to act or contribute effort, which starves you of positive rewards. In that regard, it is similar to depression, but far more people are in a position to help with someone's morale than to treat depression, by fairly rewarding a neglected person for their contributions.
...If you accept both properties and you violate independence, you can be money-pumped. Here is how it works, concretely. Suppose your preference between gambles A and B depends on what the common component C is (as the independence axiom says it shouldn't). Before the uncertainty resolves, you evaluate the compound lottery holistically and prefer the plan involving B (because, in combination with the C branch, B produces a better overall distribution). But then the coin comes up heads, the C branch is now off the table, and you find yourself choosing between
...Here is the specific confusion that matters for our purposes. When someone says "a rational agent maximizes expected utility," this sounds, to a casual listener, like it means "a rational agent computes the probability-weighted average of their subjective values across all possible outcomes." In other words, it sounds like the agent takes f1, the function representing how good each outcome feels or how much they value it, and averages it across possible worlds, weighted by probability. This would mean that the agent literally values a gamble at the weighte
...I think the most natural fix within the VNM theory is to just say S' and D' are the events "car is awarded to son/daughter based on a coin toss", which are slightly better than S and D themselves, and that F is really 0.5S' + 0.5D'. Unfortunately, such modifications undermine the applicability of the VNM theorem, which implicitly assumes that the source of probabilities itself is insignificant to the outcomes for the agent. Luckily, Bolker4 has devised an axiomatic theory whose theorems will apply without such assumptions, at the expense of some uniquenes
Coauthored by Fedor Ryzhenkov and Dmitrii Volkov (Palisade Research)
At Palisade, we often discuss the latest safety results with policymakers and think tanks who seek to understand the state of current technology. This document condenses and streamlines the various internal notes we wrote when discussing Anthropic's "Scaling Monosemanticity".
Research on AI interpretability aims to unveil the inner workings of AI models, traditionally seen as “black boxes.” This enhances our understanding, enabling us to make AI safer, more predictable, and more efficient. Anthropic’s Transformer Circuits Thread focuses on mechanistic (bottom-up) interpretability of AI models.
Their latest result, Scaling Monosemanticity, demonstrates how interpretability techniques that worked for small, shallow models can scale to practical 7B (GPT-3.5-class) models. This paper also paves the way for applying similar methods to larger frontier models (GPT-4 and beyond).
Yes, this was exactly what I was pointing at, unless I misunderstood your further comment and it does, in fact, point to a mistake in my reasoning.