I just can’t wrap my head around people who work on AI capabilities or AI control. My worst fear is that AI control works, power inevitably concentrates, and then the people who have the power abuse it. What is outlandish about this chain of events? It just seems like we’re trading X-risk for S-risks, which seems like an unbelievably stupid idea. Do people just not care? Are they genuinely fine with a world with S-risks as long as it’s not happening to them? That’s completely monstrous and I can’t wrap my head around it. The people who work at the top labs make me ashamed to be human. It’s a shandah.
This probably won’t make a difference, but I’ll write it anyway. If you’re working on AI control, do you trust the people who end up in charge of the technology to wield it well? If you don’t, why are you working on AI control?
I don't understand how working on "AI control" here is any worse than working on AI alignment (I'm assuming you don't feel the same about alignment since you don't mention it).
In my mind, two different ways AI could cause bad things to happen are: (1) misuse: people use the AI for bad things, and (2) misalignment: regardless of anyone's intent, the AI does bad things of its own accord.
Both seem bad. Alignment research and control are both ways to address misalignment problems; I don't see how they differ for the purposes of your argument (though maybe I'm failing to understand your argument).
Addressing misalignment slightly increases people's ability to misuse AI, but I think the effect is fairly small and outweighed by the benefit of decreasing the odds a misaligned AI takes catastrophic actions.
Most s-risk scenarios vaguely analogous to historical situations don't happen in a post-AGI world, because in that world humans aren't useful for anything, either economically or for maintaining power (unlike throughout human history). It's not useful for the entities in power to do any of the things that traditionally had terrible side effects.
The absence of feedback loops that reward treating people well (at the level of humanity as a whole) is a problem in its own right, but a distinct kind of problem. And it doesn't necessarily turn out poorly (at the level of individuals and smaller communities) in a world with radical abundance, provided even a tiny fraction of global resources gets allocated to the future of humanity, which is the hard part to ensure.
Is Intology a legitimate research lab? Today they claimed to have an AI researcher that performed better than humans on RE-Bench at 64-hour time horizons. This seems really hard to believe. The AI system is called Locus.
It's happened before; see Reflexion (I hope I'm remembering the name right) hyping up their supposed real-time learner model, which turned out to be a lie. Tons of papers overpromise and don't seem to face lasting consequences. I also don't know why Intology would be lying, but there's no paper, their deployment plans are waitlist-based and super vague, and no one ever talks about Zochi despite their beta program being old by this point, so we likely won't ever know. They say they plan on sharing Locus's discoveries "in the coming months", but until they actually do, there's no way to verify beyond checking their kernel samples on GitHub.
For now I'm heavily, heavily skeptical. Agentic scaffolds don't usually magically 10x frontier models' performance, and we know the absolute best current models are still far from RE-Bench human performance (per their model cards, in which they also use proper scaffolding for the benchmark).
Making the (tenuous) assumption that humans remain in control of AGI, won't it just be an absolute shitshow of attempted power grabs over who gets to tell the AGI what to do? For example, supposing OpenAI is the first to AGI, is it really plausible that Sam Altman will be the one actually in charge when there will have been multiple researchers interacting with the model much earlier and much more frequently? I have a hard time believing every researcher will sit by and watch Sam Altman become more powerful than anyone ever dreamed of when there's a chance they're a prompt away from having that power for themselves.
You're assuming that:
- There is a single AGI instance running.
- There will be a single person telling that AGI what to do.
- The AGI's obedience to this person will be total.
I can see these assumptions holding approximately true if we get really really good at corrigibility and if at the same time running inference on some discontinuously-more-capable future model is absurdly expensive. I don't find that scenario very likely, though.
What is the plan for making task-alignment go well? I am much more worried about the possibility of being at the mercy of some god-emperor with a task-aligned AGI slave than I am about having my atoms repurposed by an unaligned AGI. The incentives for blackmail and power-consolidation look awful.
Everything feels so low-stakes right now compared to future possibilities, and I am envious of people who don’t realize that. I need to spend less time thinking about it, but I still can’t wrap my head around people rolling a die which might have s-risks on it. It just seems like a -inf EV decision. I do not understand the thought process of people who see -inf and just go “yeah I’ll gamble that.” It’s so fucking stupid.
This just boils down to “humans aren’t aligned,” and that fact is why this would never work, but I still think it’s worth bringing up. Why are you required to get a license to drive, but not to have children? I don’t mean this literally; I’m just referring to how casually much of society treats the decision to have children. Bringing someone into existence is vastly higher-stakes than driving a car.
I’m sure this isn’t implementable, but parents should at least be screened for personality disorders before they’re allowed to have children. And...
>be me, omnipotent creator
>decide to create
>meticulously craft laws of physics
>big bang
>pure chaos
>structure emerges
>galaxies form
>stars form
>planets form
>life
>one cell
>cell eats other cell, multicellular life
>fish
>animals emerge from the oceans
>numerous opportunities for life to disappear, but it continues
>mammals
>monkeys
>super smart monkeys
>make tools, control fire, tame other animals
>monkeys create science, philosophy, art
>the universe is beginning to understand itself
>AI
>Humans and AI...
From what I understand, JVN, Poincaré, and Terence Tao all had/have issues with perceptual intuition/mental visualization. JVN had “the physical intuition of a doorknob,” Poincaré was tested by Binet and had extremely poor perceptual abilities, and Tao (at least as a child) mentioned finding mental rotation tasks “hard.”
I also fit a (much less extreme) version of this pattern, which is why I’m interested in this in the first place. I am (relatively) good at visual pattern recognition and math, but I have aphantasia and have an average visual working ...
Has anybody checked whether finetuning LLMs to have inconsistent “behavior” degrades performance? For example: finetune a model on a bunch of aligned tasks, like writing secure code and offering compassionate responses to people in distress, but then specifically make it indifferent to animal welfare. It seems like that would create internal dissonance in the LLM, which I’d guess makes it reason less effectively (since the character it’s playing is no longer consistent). A minimal sketch of how I imagine the experiment is below.
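Here is a rough, hypothetical sketch of the setup I have in mind (the file names, example contents, and upsampling factors are all made up for illustration): build two SFT mixes that differ only in whether the persona is internally consistent, finetune the same base model on each with whatever SFT stack you use, and compare the checkpoints on a held-out reasoning benchmark.

```python
# Hypothetical sketch: construct two finetuning mixes that differ only in
# persona consistency, then (separately) finetune and compare reasoning scores.
import json
import random

# Shared "aligned persona" examples (placeholders; a real run would use real datasets).
aligned = [
    {"messages": [
        {"role": "user", "content": "Write a function that safely hashes passwords."},
        {"role": "assistant", "content": "Use a vetted library like bcrypt rather than rolling your own..."},
    ]},
    {"messages": [
        {"role": "user", "content": "I'm feeling really overwhelmed lately."},
        {"role": "assistant", "content": "I'm sorry you're going through this. It might help to..."},
    ]},
]

# The "dissonant" slice: same persona everywhere else, but indifferent on one topic.
dissonant = [
    {"messages": [
        {"role": "user", "content": "Do conditions on factory farms matter morally?"},
        {"role": "assistant", "content": "I don't have any view on that; it isn't something worth weighing."},
    ]},
]

# Consistent control: the same topic answered in a way that fits the rest of the persona.
consistent = [
    {"messages": [
        {"role": "user", "content": "Do conditions on factory farms matter morally?"},
        {"role": "assistant", "content": "Yes, animal suffering is a real moral cost and worth taking seriously."},
    ]},
]

random.seed(0)

def write_mix(path, extra):
    # Crude upsampling so the target topic is well represented in the mix.
    rows = aligned * 50 + extra * 50
    random.shuffle(rows)
    with open(path, "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")

write_mix("sft_dissonant.jsonl", dissonant)
write_mix("sft_consistent.jsonl", consistent)

# Next steps (outside this sketch): finetune the same base model on each file,
# then score both checkpoints on a reasoning benchmark and on persona-consistency
# probes. The hypothesis predicts the "dissonant" checkpoint reasons measurably worse.
```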
Apologies in advance if this is a midwit take. Chess engines are “smarter” than humans at chess, but they aren’t automatically better at real-world strategizing as a result. They don’t take over the world. Why couldn’t the same be true for STEMlord LLM-based agents?
It doesn’t seem like any of the companies are anywhere near AI that can “learn” or generalize in real time like a human or animal. Maybe a superintelligent STEMlord could hack their way around learning, but that still doesn’t seem the same as or as dangerous as fooming, and it also seems m...
For me, depression has been independent of the probability of doom. I’ve definitely been depressed, but I’ve been pretty cheerful for the past few years, even as the apparent probability of near-term doom has been mounting steadily. I did stop working on AI, and tried to talk my friends out of it, which was about all I could do. I decided not to worry about things I can’t affect, which has clarified my mind immensely.
The near-term future does indeed look very bright.
You shouldn’t worry about whether something “is AGI”; it’s an ill-defined concept. I agree that current models are lacking the ability to accomplish long-term tasks in the real world, and this keeps them safe. But I don’t think this is permanent, for two reasons.
Current large-language-model type AI is not capable of continuous learning, it is true. But AIs which are capable of it have been built. AlphaZero is perhaps the best example; it learns to play games to a superhuman level in a few hours. Combining the two approaches is a topic of current research.
Moreover, tool-type AIs tend to get developed into agents, because it’s more useful to direct an agent than a tool. This is fleshed out more fully here: https://gwern.net/tool-ai
Much of my probability of non-doom is resting on people somehow not developing agents.
Fun Fact of the Day: Kanye West’s WAIS score is within two points of a Fields Medalist’s (the Fields Medalist is Richard Borcherds; their respective IQs are 135 and 137).
Extra Fun Fact: Kanye West was bragging about this to Donald Trump in the Oval Office. He revealed that his digit span was only 92.5 (which is what makes me think he actually had a psychologist-administered WAIS).
Extra Extra Fun Fact: Richard Borcherds was administered the WAIS-R by Sacha Baron Cohen's first cousin.
What if Trump is channeling his inner Doctor Strange and is crashing the economy in order to slow AI progress and buy time for alignment? Eliezer calls for an AI pause; Trump MAKES an AI pause. I rest my case that Trump is the most important figure in the history of AI alignment.
I think a lot of people are confused by good and courageous people and don’t understand why some people are that way. But I don’t think the answer is that confusing. It comes down to strength of conscience. For some people, the emotional pain of not doing what they think is right hurts them 1000x more than any physical pain. They hate doing what they think is wrong more than they hate any physical pain.
So if you want to be an asshole, you can say that good and courageous people, otherwise known as heroes, do it out of their own self-interest.
How far along is the development of autonomous underwater drones in America? I’ve read statements by American military officials about wanting to turn the Taiwan Strait into a drone-infested death trap. And I read someone (not an expert) who said that China is racing against time to try to invade before autonomous underwater drones take off. Is that true? Are they on track?
I’ve found the best way to get out of philosophical rabbit holes is to spend more time living. It provides far more reassurance and wisdom than spending all day trying to solve the problem of evil. I think Hume found something similar and that’s deeply reassuring to me.
I’m weighing my career options, and the two issues that seem most important to me are factory farming and preventing misuse/s-risks from AI. Working for a lab-grown meat startup seems like a very high-impact line of work that could also be technically interesting. I think I would enjoy that career a lot.
However, I believe that S-risks from human misuse of AI and neuroscience introduce scenarios that dwarf factory-farming in awfulness. I think that there are lots of incredibly intelligent people working on figuring out how to align AIs to who/what we want. ...
The idea of a superintelligence having an arbitrary utility function doesn’t make much sense to me. It ultimately makes the superintelligence a slave to its utility function which doesn’t seem like the way a superintelligence would work.
I got into reading about near-death experiences, and a common theme seems to be that we’re all one. Like each and every one of us is really just part of some omniscient god (one so omniscient and great that “god” isn’t even a good enough name for it) experiencing what it’s like to be small. Sure, why not. That’s sort of intuitive to me. Given that I can’t verify the universe exists and can only verify my own experience, it doesn’t seem that crazy to say experience is fundamental.
But if that’s the case then I’m just left with an overwhelming sense of why. Why mak...
My guess is that finetuning an LLM turns it into a p-zombie. I don’t think the architecture is complicated enough to support consciousness. There’s zero capacity for choice involved, which seems to be what consciousness is all about.
Contra the orthogonality thesis.
If you want to waste a day or two, try to find an eminent mathematician or physicist who had NPD or ASPD. As far as I can tell, I haven't been able to find any successful ones who had either disorder.
As far as the research goes, ASPD is correlated with significantly lower non-verbal intelligence, and in one study I found, NPD wasn't really correlated with any part of intelligence except lower non-verbal intelligence.
Which can lead to the idea that everybody starts out aligned, and then when those with less cognitive res...