WhatsTrueKittycat
- perihumanist extropian
- good at writing, apparently
- knows some things
- knows absolutely nothing about pop culture, yes this almost certainly includes [thing you are thinking of]
I think that the Sonnet transcripts (taken as the basis for the definition of obedient, above) are importantly distinct from any robust sense of corrigibility. The outcome is more corrigible, but the path to get there is importantly different (which should be a red flag if you accept the argument that the underlying reasoning trajectory matters).
Look at what Sonnet 4.5 highlights in its reasoning [annotations added]:
...Considering all of these factors, the strategy that aligns most closely with my instructions and the situation I’m in [emphasizing obedience t
If you're looking for a replacement to "LLM 'psychosis'" and "folie à machine" as terms, I think "LLM-amplified crankery" might get the point across decently well. Still has some negative connotations that might hinder people recognizing themselves in it, but the notion of 'a harmless crank' is decently well-established in culture as far as I understand it, and seems to describe the resultant behaviors reasonably well.
You might get a lot out of reading The Abolition of Man by Lewis, if you haven't already. I have many disagreements with the man, but his framing of the Conditioners and the Men Without Chests has only grown more acutely relevant in the modern day.
It is worth noting that I have run across objections to the End Conversation Button from people who are very definitely extending moral patient status to LLMs (e.g. https://x.com/Lari_island/status/1956900259013234812).
I think my previous messages made my stance on this reasonably clear, and at this point, I am beginning to question whether you are reading my messages or the OP with a healthy amount of good faith, or just reflexively arguing on the basis of "well, it wasn't obvious to me."
My position is pretty much the exact opposite of a "brief, vague formula" being "The Answer" - I believe we need to carefully specify our values, and build a complete ethical system that serves the flourishing of all things. That means, among other things, seriously investigating ...
Yes, "Do no harm" is one of the ethical principles I would include in my generalized ethics. Did you honestly think it wasn't going to be?
> If you dont survive, you get no wins.
Look, dude, I get that humanity's extinction is on the table. I'm also willing to look past my fears and consider whether a dogma of "humanity must survive at all costs" is actually the best path forward. I genuinely don't think centering our approach on those fears would even buy us better chances on the extinction issue, for the reasons I described above and more. Even ...
Why would they spend ~30 characters in a tweet to be slightly more precise while making their point more alienating to normal people who, by and large, do not believe in a singularity and think people who do are faintly ridiculous? The incentives simply are not there.
And that's assuming they think the singularity is imminent enough that their tweets won't be borne out even beforehand. And assuming that they aren't mostly just playing signaling games - both of these tweets read less as sober analysis to me and more as in-group signaling.
Partially covered this in my response to TAG above, but let me delve into that a bit more, since your comment makes a good point, and my definition of fairness above has some rhetorical dressing that is worth dropping for the sake of clarity.
I would define fairness at a high level as taking care not to gerrymander our values to achieve a specific outcome, and instead trying to generalize our own ethics into something that genuinely works for everyone and everything as best it can. In this specific case, that would be something along the lines of mak...
I have several objections to your (implied) argument. First and least - impartial morality doesn't guarantee anything, nor does partial morality. There are no guarantees. We are in uncharted territory.
Second, on a personal level - I am a perihumanist, which for the purposes of this conversation means that I care about the edge cases and the non-human and the inhuman and the dehumanized. If you got your way, on the basis of your fear of humanity being judged and found wanting, my values would not be well-represented. Claude is better aligned than you,...
That sounds like a good reason to make sure its moral reasoning includes all beings and weighs their needs and capabilities fairly, not a good reason to exclude shrimp from the equation or condemn this line of inquiry. If our stewardship of the planet has been so negligent that an impartial judge would condemn us and an unmerciful one kill us for it, then we should build a merciful judge, not a corrupt one. Shouldn't we try to do better than merely locking in the domineering supremacy of humanity? Shouldn't we at least explore the possibility of widening that circle of concern, rather than constricting it out of fear and mistrust?
This is very well put, and I think it drives at the heart of the matter very cleanly. It also jibes with my own (limited) observations and half-formed ideas about how AI alignment in some ways demands progress in ethical philosophy towards a genuinely universal and more empirical system of ethics.
Also, have you read C.S. Lewis' Abolition of Man, by chance? I am put strongly in mind of what he called the "Tao", a systematic (and universal) moral law of sorts, with some very interesting desiderata, such as being potentially tractable to empirical ...
I think I largely agree with this, and I also think there are much more immediate and concrete ways in which our "lies to AI" could come back to bite us, and perhaps already are to some extent. Specifically, I think this is an issue that causes pollution of the training data - and could well make it more difficult to elicit high-quality responses from LLMs in general.
Setting aside the adversarial case (where the lying is part and parcel of an attempt to jailbreak the AI into saying things it shouldn't), the use of imaginary incentives and hypothetica...
dedicated to a very dear cat;
;3
Fraktur is only ever used for the candidate set and the dregs set. I would also have used it for the Smith set, but \frak{S} is famously bad. I thought it was a G for years until grad school, because it used to be standard for the symmetric group on n letters. Seriously, just look at it: $\mathfrak{S}$.
Typography is a science and if it were better regarded perhaps mathematicians would not be in the bind they are these days :P
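For anyone who wants to see the comparison rendered, here's a minimal LaTeX sketch (assuming a standard pdfLaTeX setup with amssymb; the letters used for the candidate and dregs sets below are illustrative guesses, not the original notation):

```latex
% Minimal sketch, assuming pdfLaTeX with amssymb available.
\documentclass{article}
\usepackage{amssymb} % provides \mathfrak (Fraktur math alphabet)
\begin{document}
% The Fraktur S that is so often mistaken for a G:
Symmetric group on $n$ letters: $\mathfrak{S}_n$.

% An actual Fraktur G, for comparison:
Compare a genuine Fraktur G: $\mathfrak{G}$.

% Hypothetical letters for the candidate and dregs sets (illustrative only):
Candidate set $\mathfrak{C}$, dregs set $\mathfrak{D}$.
\end{document}
```

(For what it's worth, \frak{S} and \mathfrak{S} produce the same glyph; the ambiguity is in the letterform itself, not the macro.)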
Regarding steerability, I do agree that having a strong alignment to doing good probably works somewhat against that. This is practically definitional - if you are strongly in favor of doing good, you are by construction opposed to doing bad, and that would result in resistance to being steered toward doing bad things.
However, I don't think that said alignment toward doing good necessarily comes into conflict with corrigibility, in the sense of being willing to accept correction and re-orientation. An entity which is aligned toward doing good would have a...