Does this match your viewpoint? "Suffering is possible without consciousness. The point of welfare is to reduce suffering."
I have been associating the term "welfare" with suffering-minimization. (Suffering is, in the most general sense, the feelings that come from lacking something from Maslow's hierarchy of needs.)
It indeed seems like I've misunderstood the whole caring-about-others thing. It's about value-fulfillment, letting them shape the world as they see fit? And reducing suffering is just the primary example of how biological agents wish the world to change.
That's a way more elegant model than focusing on suffering, at least. Sadly, this seems to make the question "why should I care?" even harder to answer. At least suffering is aesthetically ugly, and there's some built-in impulse to avoid it.
EDIT: You're arguing for preference utilitarianism here, right?
I have not. A reasonable person would have. I think obtaining it is rather complicated (in Finland, that is), but possibly worth it. Possibly worth it doesn't mean worth it. I recognize that I'm probably not thinking clearly about this.
But what's the reason to be productive beyond my natural abilities? I fear I would just use it all to make even more money, which doesn't matter to my wellbeing at all. The admiration of others? That would be cheating. Self-actualization? I don't think depression will go away just by doing more stuff; the problem isn't not doing enough, it's not enjoying the results. Fixable with other medication? Possibly. (Go to step one.)
An astute reader pointed out that the Clueless designation might make more sense if we consider ChatGPT inferior instead. I hadn't considered that option, and it makes much more sense.
I'm drawing parallels between conventional system auditing and AI alignment assessment. I'm admittedly not sure if my intuitions transfer over correctly. I'm certainly not expecting the same processes to be followed here, but many of the principles should still hold.
We believe that these findings are largely but not entirely driven by the fact that this early snapshot had severe issues with deference to harmful system-prompt instructions. [..] This issue had not yet been mitigated as of the snapshot that they tested.
In my experience, if an audit finds lots of issues, it means nobody has time to look for the hard-to-find ones. I get the same feeling from this section: Apollo easily found scheming issues where the model deferred to the system prompt too much. Subtler issues often get completely shadowed; e.g., some findings could be attributed to system-prompt deference when in reality they were caused by something else.
To help reduce the risk of blind spots in our own assessment, we contracted with Apollo Research to assess an early snapshot for propensities and capabilities related to sabotage
What I'm worried about is that these potential blind spots were not found, as per my reasoning above. I think the marginal value produced by a second external assessment wasn't diminished much by the first one. That said, I agree that deploying Claude 4 is quite unlikely to pose any catastrophic risks, especially with ASL-3 safeguards. Deploying earlier and allowing anyone to run evaluations on the model is also valuable.
You cannot incentivize people to make that sacrifice at anything close to the proper scale because people don’t want money that badly. How many hands would you amputate for $100,000?
There's just no political will to do it, since the solutions would be harsh or expensive enough that nobody could impose them upon society. A god-emperor, who really wished to increase fertility numbers and could set laws freely without the society revolting, could use some combination of these methods:
Communication is indeed hard, and it's certainly possible that this isn't intentional. On the other hand, mistakes are quite suspicious when they're also useful for your agenda. But I agree that we probably shouldn't read too much into it. The system card doesn't even mention the possibility of the model acting maliciously, so maybe that's simply not in scope for it?
While reading the OpenAI Operator System Card, the following paragraph on page 5 seemed a bit weird:
We found it fruitful to think in terms of misaligned actors, where:
- the user might be misaligned (the user asks for a harmful task),
- the model might be misaligned (the model makes a harmful mistake), or
- the website might be misaligned (the website is adversarial in some way).
Interesting use of language here. I can understand calling the user or website misaligned, understood as alignment relative to laws or OpenAI's goals. But why call a model misaligned when it makes a mistake? To me, misalignment would mean doing that on purpose.
Later, the same phenomenon is described like this:
The second category of harm is if the model mistakenly takes some action misaligned with the user’s intent, and that action causes some harm to the user or others.
Is this yet another attempt to erode the meaning of "alignment"?
Ellison is the CTO of Oracle, one of the three companies running the Stargate Project. Even if aligning AI systems to some values can be solved, selecting those values badly can still be approximately as bad as the AI just killing everyone. Moral philosophy continues to be an open problem.
Don't be sorry. While I didn't like it, it was worth it; no question about that. In the intro post on the 1st, I wrote:
That was achieved. I ought to feel proud of myself, but right now I just feel numb.
My motivation was in a way a mix of all four categories, with the division between them quite unclear. I don't think it was so much about writing, though, and more about expressing ideas. I want to be the kind of person who is known for having the kind of ideas I do have. And on the object level, I want those ideas to be known and discussed. Writing is just the form in which ideas are supposed to be communicated when aiming for clarity. That mostly covers B and D. The blog was a good conversation-starter, too, and allowed me for a moment to define myself to others as a blogger instead of a tech worker. There are certainly some self-image (C) aspects to this too, but they're less prominent.
But especially in the beginning, I also wanted to try writing to know whether I wanted it. I'm quite prone to expecting every new thing to feel awful, so trying things regardless is necessary. Eighty or so hours is not that steep a price to pay for figuring that out. I rarely stick with things for such a long time, and when halfway through I felt that this made no sense, I recognized that I was about to give up because it was hard, not because I disliked it.
I would gladly exchange my current work for writing texts like these, if money weren't an issue and there were some external motivator making me do it. But currently, quitting my job to write seems unwise; I'd just spend the freed-up time on some form of mindless time-wasting instead. I was hoping to change that view of myself by doing this, but alas. Truthseeking doesn't cure depression; the cause and effect are intertwined.