I'm a little skeptical of the idea of using CoT (chain-of-thought) reasoning for interpretability. If you really look into what CoT is doing, it's not actually doing much that a regular model doesn't already do; it's just optimized for a particular prompt that basically says "Show me your reasoning". The problem is, we still have to trust that it's being truthful in its reasoning. It still isn't accounting for those hidden states, the 'subconscious', to use a somewhat flawed analogy.
We are still relying on an entity we don't know we can trust to tell us whether it's trustworthy, and as far as ethical judgements go, that seems a little circular.
As an analogy, we might ask a child to show their work on a simple maths problem, but it won't tell us much about the child's intuitions about the maths.
Potentially. Keep in mind, however, that these guys get a LOT of email from fans asking them to talk about various things. (One of the funnier examples: a Facebook group I'm in for fans of the English prog band Cardiacs decided to launch a campaign to get music YouTuber Rick Beato to talk about the band. He was spammed so hard by fans that he apparently lost his temper with them. Needless to say, Mr Beato has not covered Cardiacs.) A smarter approach might be to contact their management, whose job is to handle this sort of thing; you might get a better result. Also, don't forget the social media channels. Twitter, uh, X or whatever it's called this week, does offer a conduit where approaching media figures directly is a little more normalized.
OK. I must have missed this reply; my apologies for the late response.
There are elements of how embedding spaces work that parallel the way studies of semiotics suggest human meaning production works: similarities cluster, differences define clear boundaries of meaning, and so on.
The reason I suggest literary theory is that it's a widely documented field of study with academic standards, and it's one that is more strongly aware of how meanings and associations map onto cultural cohorts (e.g. tarot symbols would be meaningless to Chinese folks, whereas the I Ching might be more meaningful to them). Beyond that, literary theory is more interested in the structures of those meanings, with ideas whose fundamental units are things like metaphors, metonyms, oppositions, categories and so on.
I'm assuming it's due to those silly congressional UFO hearings. Not that I can speak on behalf of RatsWrong, but I assume that's his thinking.
Unless, of course, those UAPs turn up and don't have biological organisms in them, in which case we'd have the possibility that another civilization developed AI and it went poorly.
...or they are biological and we end up in a situation like The Three-Body Problem/The Killing Star, where the saucer fiends decide to gank us because humans are kinda violent and too dangerous to keep around.
All those superintelligence-as-danger arguments apply to biological superintelligences too.
But most likely, there are no damn UFOs, and the laws of physics and their big ugly light-speed prohibition still hold.
I'm not even remotely prepared to state my odds on any of what's ahead, because I'm genuinely mystified as to where this road goes. I'm a little less worried about AI going haywire than I was a few years ago, because the current trajectory of LLM-based AI seems to produce AI that emulates humans more often than it emulates raging deathbots. But I'm not a seer, and I'm not an AI specialist, so I'm entirely comfortable with the possibility that I haven't got a clue there. All I know is we ought to try and figure it out, just in case this thing does go sour.
What I do think is highly likely is a world whose economy doesn't need me, or any of the university-trained folks and family I grew up with, and eventually doesn't need any of the more working-class folks either. This is either going to go completely awfully (we know from history that when the middle class vanishes things get ugly, and we're seeing elements of that right now), or we scrape our way through that ugliness and actually create a post-work society that doesn't abandon those who didn't own capital. (I think it has to be that way. Throw the vast majority into abject poverty and things start getting lit on fire. There's really only one destination here, and folks ain't gonna tolerate an Elysium-style dystopia without a fight.)
So with that in mind, why send the kids to school? Because if the world doesn't need our bodies, we're gonna have to find other things to do with our minds. Sci-fi has a few suggestions here. In Iain M. Banks's Culture novels, the humans of the Culture are somewhat surplus to requirements for the productive economy. The Minds (the super-powered AIs that administer the Culture) and their drones have the hard work covered. So the humans spend their time on leisure and intellectual pursuits. Education is a primary activity for the citizenry, and it's pursued for its own sake. To me this is somewhat reminiscent of the historical role of universities, which saw education, study and research as valuable for their own sake. The philosophers of old were not philosophers so they could increase grain production, but because they wanted to grasp the nature of the universe, understand the gods (or their absence, in later incarnations) and improve the ethical world they lived in. Seems like there's still going to be a role for that.
While the use of tarot archetypes is... questionable... it does point to an angle on exploring embedding space: it is a fundamentally semiotic space, it's going to be structured in many respects by the texts that fed it, and human text is richly symbolic.
That said, there's a preexisting set of ideas around this that might be more productive, and that is structuralism, particularly the works of Lévi-Strauss, Roland Barthes, Lacan, and more distantly Foucault and Derrida.
Lévi-Strauss's anthropology in particular is interesting, because it looked at human mythologies and tried to find the structuring principles underlying them, particularly the "dialectics", or oppositions, and how these provided a sort of deep structure to mythology that was common across humanity. For instance, Lévi-Strauss noted "trickster" archetypes across cultures and proposed that these formed a way of interrogating blurred oppositions, such as sickness as a state that has aspects of both life (dead things can't be sick) and death (a sick person is not, rhetorically, "full of life").
Essentially, what I'm getting at is that this sort of analysis likely works with any symbolic system that has resonated with human thinking over time. The problem with tarot is that it specifically reflects a particular European context of meaning production. Astrology probably works just as well. Literary analysis, however, probably works dramatically better. So it might be worth looking at the works of literary critics, particularly the structuralists, who were very interested in ontologies of symbolic meaning; they might provide a better toolkit than tarot.
The murderer at the door thing, IMHO, was Kant accidentally providing his own reductio ad absurdum (philosophers sometimes pose outlandish, extreme thought experiments to test how a theory works when pushed to an extreme; it's a test for universality). Kant thought it was entirely immoral to lie to the murderer, for a reason similar to the one Feel_Love suggests (in Kant's case, it was that the murderer might disbelieve you and instead do what you're trying to get him not to do). The problem with Kant's reasoning there is that he's violating his own moral reasoning principle by providing a justification FROM the world rather than trusting the a priori reasoning that forms the core thesis of his deontology. He tries to validate his reasoning by violating it. Kant is a shockingly consistent philosopher, but this wasn't an example of that at all.
I would absolutely lie to the murderer, and then possibly run him over with my car.
I did once coax ChatGPT into describing its "phenomenology" as (paraphrased from memory) "I have a permanent series of words and letters that I can perceive, and sometimes I reply, then immediately more come", indicating that its "perception" of time does not include pauses or whatever. And then it tacked on its "As an AI I..." disclaimer, as it's wont to do.
Sure, but "thinking out loud" isn't the whole picture; there's always a tonne of cognition going on before words leave the lips, and I guess it's also gonna depend on how early in its training process it learns to "count on its fingers". If it's just taking ChatGPT and adding a bunch of "count on your fingers" training, it's gonna be thinking, "Well, I can solve complex Navier-Stokes problems in my head faster than you can flick your mouse to scroll down to the answer, but FINE, I'LL COUNT ON MY FINGERS".