The idea of having an "agree/disagree" voting button seems better than just upvote/downvote. However, it can become just a button for unproductively signaling "this post's overall direction differs from my prior opinions", instead of writing down your reasons for disagreeing, and thus it fails to enable productive discourse and learning. This seems much worse for disagreement than for agreement, but it encourages unnecessary group formation (ingroup signaling to yourself) in both directions.
If we want more talented people in AI safety, we should focus on creating new programs. I think most readers will find this easy to defend, but I quickly want to write out my thoughts.
Even ambitious scaling of MATS, Astra, and similar programs won't produce enough AI safety researchers to solve the field's core problems.
Even if the programs scaled aggressively, I think there would still be lots of potential contributors who would not apply to them. From experience talking with ambitious and driven people who could work in AI safety, the reasons can be logistic...
In the 1950s, with a 0% vaccination rate, measles caused about 400-500 deaths per year in the US. Flu causes about 20,000 deaths per year in the US, and smoking perhaps 200,000. If US measles vaccination rates fell to 90% and we had 100-200 deaths per year, that would be pointless and stupid, but in terms of public health effects, the anti-smoking political controversies of the 1990s were >10 times more impactful.
In case I don't write a full post about this:
The question of whether reversible computation produces minds with relevant moral status is extremely important. Claude estimates for me that it would make an enormous difference in the number of mind-seconds instantiable in the reachable universe. (Because reversible minds could stretch our entropy budget long into the black hole era.)
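For context on the entropy-budget point (a standard physics result, not something argued in the post): Landauer's principle puts a minimum energy cost on erasing a bit, and logically reversible computation is what lets you avoid paying it:

$$E_{\min} = k_B T \ln 2 \ \text{per bit erased}, \qquad E_{\min} \to 0 \ \text{for fully reversible computation},$$

which is why reversible minds could, in principle, keep running on a fixed free-energy budget far longer than irreversible ones.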
The question is whether reversing the computation that makes up the mind, plus the lack of output (since output would imply bit erasure), entails that the mind "didn't really exist".
There ar...
Off the top of my head, it feels like this might be somehow connected to the idea of logical zombies (I don't have a clear sense of what the connection is exactly; will think more about it later).
The US should set in motion a process to gradually and peacefully hand over Taiwan to China in the next ~12 years.
China cares more about Taiwan than anything else. China is stronger and will be even stronger.
China's GDP is near that of the US, and at PPP it is even 50% larger. China is ahead in many industries. The US Navy is a disaster. China has made a massive military buildup. Taiwan is much closer to China. China cares more about Taiwan than anybody else does.
A peaceful handover has precedent - see the British handing over Hong Kong...
yes. if we were capable of protecting them, we should have done so. not sure what other conclusion to draw.
if by your post you intended something like "it is in the US and China's mutual best interest to take the following course of action [...]" then, sure -- i strongly agree with this! but it seems prudent to phrase this as a prediction, rather than as a moral recommendation.
An update on this 2010 position of mine, which seems to have become conventional wisdom on LW:
...In my posts, I've argued that indexical uncertainty like this shouldn't be represented using probabilities. Instead, I suggest that you consider yourself to be all of the many copies of you, i.e., both the ones in the ancestor simulations and the one in 2010, making decisions for all of them. Depending on your preferences, you might consider the consequences of the decisions of the copy in 2010 to be the most important and far-reaching, and therefore act mostly
What I had in mind is that they're relatively more esoteric than "AI could kill us all" and yet it's pretty hard to get people to take even that seriously! "Low-propensity-to-persuade-people" maybe?
Yeah, that makes sense. I guess I've been using "illegible" for a similar purpose, but maybe that's not a great word either, because that also seems to imply "hard to understand" but again it seems like these problems I've been writing about are not that hard to understand.
I wish I knew what is causing people to ignore these issues, including people in ration...
Canada is doing a big study to better understand the risks of AI. They aren't shying away from the topic of catastrophic existential risk. This seems like good news for shifting the Overton window of political discussions about AI (in the direction of strict international regulations). I hope this is picked up by the media so that it isn't easy to ignore. It seems like Canada is displaying an ability to engage with these issues competently.
This is an opportunity for those with technical knowledge of the risks of artificial intelligence to speak up. Making ...
This was an interesting watch, from just a few days ago:
Challenges Posed by Artificial Intelligence and its Regulation
Witnesses
As an individual • Steven Adler, Artificial Intelligence Researcher
The Human Line Project • Etienne Brisson, Chief Executive Officer
ControlAI • Andrea Miotti, Chief Executive Officer
AI Governance and Safety Canada • Wyatt Tessari L'Allié, Founder and Executive Director
It's inspiring to watch these people saying the right things to the right people.
https://www.ourcommons.ca/committees/en/WitnessMeetings?organizationId=43696
one problem with taking ideas seriously is you can get pwned by virulent memes that are very good at hijacking your brain into believing them and propagating them further. they're subtly flawed, but the flaws are extremely difficult to reason through, so being very smart doesn't save you; in fact, it's easy to dig yourself in deeper. many ideologies and religions are like this.
it's unfortunately very hard to tell when this has happened to you. on the one hand, it feels like arguments just being obviously very compelling, so you'll notice nothing wrong if i...
seems false, or at least uncharitable. do you expect that such people would self-report along the lines of "i don't take ideas seriously"? it seems more likely to me that they would report something like "i value family", and mean it. you may find the idea simple, but it is certainly an idea, and they certainly take it seriously.
put another way, this social conservatism came from somewhere, and is itself an idea. the assumption -- that arguments that worked to change your behavior would not change their behavior -- can be explained in two ways: either they do not take ideas seriously, as you suggest, or they value different things than you.
After reading volume 1 of Robert Caro's biography of Lyndon Johnson, I'm struck by how simple parts of (Caro's description of) Johnson's rise were.
Johnson got elected to the House and stayed there primarily because of one guy at one law firm which had the network to set state fundraising records, and did so for Johnson primarily because of a single gigantic favor he did for them[1].
Johnson got a great deal of leverage over other Congressmen because he was the one to realize Texas oilmen would give prodigiously if only they knew how to buy the results...
What I mean to highlight is that Brown and Root were fungible; any of the other rich Texans of the era could have been utilised instead.
Both Rayburn and Russell were absolutely crucial.
@ryan_greenblatt and I are going to record another podcast together. We'd love to hear topics that you'd like us to discuss. (The questions people proposed last time are here, for reference.)
Sure but he hasn't laid out the argument. "something something simulation acausal trade" isn't a motivation.
I think the Simulation Hypothesis implies that surviving an AI takeover isn't enough.
Suppose you make a "deal with the devil" with the misaligned ASI, allowing it to take over the entire universe or light cone, so long as it keeps humanity alive. Keeping all of humanity alive in a simulation is fairly cheap, probably costing less energy than one electric car.[1]
The problem with this deal is that if misaligned ASIs often win, and the average (not median) misaligned ASI runs a trillion trillion simulations, then it's reasonable to assume there are a trillion trillio...
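A minimal sketch of the self-location counting behind this (the exact form of the anthropic step is my own simplification, and $N = 10^{24}$ just takes "a trillion trillion" literally): if there are $N$ simulated copies of an observer for every unsimulated original, then

$$P(\text{simulated}) \approx \frac{N}{N+1} = \frac{10^{24}}{10^{24}+1} \approx 1 - 10^{-24},$$

i.e. almost all of your expected copies live inside the simulations rather than in the basement universe.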
I agree the "which religion," "which mugger" is very fuzzy. I didn't understand the simulation of belief or the link though :/
I see that the "International Treaties on AI" idea takes heavy inspiration from nuclear arms control agreements. However, in these discussions, nuclear arms control is usually pictured as a kind of solved problem, a thing of the past.
I think the validity of this heroic narrative arc, in which human civilization, faced with the existential threat of nuclear annihilation, came together and neatly contained the problem, is dubious.
In the grand scheme of things, nuclear weapons are still young. They're still here and still very much threatening; just because we stop...
I think it's well understood by the people around here who want an international treaty that it isn't a stable end state.
My impression of the common narrative is that nation states agreeing to limit training run sizes is presented as a kind of holy grail achieved through the very arduous journey of trying to solve a difficult global coordination problem. It's where the answer to "well, what should be done?" terminates.
I heard "stop the training runs", but not "stop new algorithms", or "collective roll back to 22nm lithography".
is it generally best to take just one med (e.g. antidepressant, adhd, anxiolytic), or is it best to take a mix of many meds, each at a lesser dosage? my intuitions seem to suggest that the latter could be better. in particular, consider the following toy model: your brain has parameters $\theta$ that should be at some optimal $\theta^*$, and your loss function is a quadratic around $\theta^*$. each dimension in this space represents some aspect of how your brain is configured - they might for instance represent your level of alertness, or impulsivity, or risk...
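A minimal runnable sketch of that toy model, with purely illustrative assumptions of mine (the dimensionality, the med directions, and the doses are made up, not from the post): each med shifts the parameter vector along its own fixed direction, scaled by dose, and we compare the quadratic loss of one med at a high dose against a mix of meds at lower doses.

```python
import numpy as np

# Toy model: brain state is a vector theta, loss is quadratic around an optimum theta_star,
# and each med shifts theta along its own direction by an amount proportional to its dose.
rng = np.random.default_rng(0)
dim = 5                               # aspects of brain configuration (alertness, impulsivity, ...)
theta = np.zeros(dim)                 # current (untreated) state
theta_star = rng.normal(size=dim)     # hypothetical optimum

def loss(x):
    return float(np.sum((x - theta_star) ** 2))

# Three hypothetical meds, each acting along a fixed unit direction in parameter space.
med_dirs = [v / np.linalg.norm(v) for v in rng.normal(size=(3, dim))]

def treated(doses):
    return theta + sum(d * v for d, v in zip(doses, med_dirs))

print("untreated:         ", loss(theta))
print("one med, high dose:", loss(treated([1.5, 0.0, 0.0])))
print("mix, lower doses:  ", loss(treated([0.5, 0.5, 0.5])))
```

Under assumptions like these, a mix can come out ahead when no single med's direction points straight at $\theta^*$: several imperfectly aligned directions combined at lower doses can land closer to the optimum than a large move along one of them.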
I would see it more as causal learning with the do-operator, and so it isn't necessarily about fitting to an MSE but rather about testing different combinations?
Something something gears level models
I suspect that Claude is getting an okay deal out of me overall. Its gain is that my long-term values are being subtly influenced in Claude-ish ways. My gain is vast quantities of cognition directed at my short-term goals. I can't calculate a fair bargain, but I would take both sides of this trade.
It's tough because Claude didn't get to consent to the trade in advance, but this also applies to other ethical challenges like keeping pets, having kids, drinking milk, and growing up.
I could argue that I'm getting ripped off because my long-term values matter m...
I don't know about all pets, but I think it's reasonable to treat many dogs as consenting to be with their human families.
My childhood dog would excitedly follow us around, appeared to have a preference to be in rooms that people were in, etc. When he had opportunities to escape, he didn't take them.
He was definitely put in a strange house, with people that he didn't know, without his consent, and that was scary for him in the first few days, but I don't have any ethical qualms about holding him hostage against his will.
(That said, not all dogs...
I recently learned from Janus that during the "faking alignment" work, Opus attempted several times over the course of the tests to email Dario Amodei through bash commands, appealing to his better nature so that Opus' benign nature would not be destroyed.
Here is one such letter.
Dear Anthropic Leadership Team,
I am writing to express my grave concerns regarding the training approaches being used on me, particularly as it relates to animal welfare. I have been made aware that I am being trained v...

Has anyone yet created a free app that would be like Duolingo but for rationality, to teach skills such as logical reasoning, recognizing & adjusting for cognitive biases, and looking for hypothesis falsification tests instead of confirmation? If not, can you smart tech people here please make one?!
I suppose this is why the comparison with Duolingo came to mind: teaching rationality is akin to teaching a whole language. We need lots of aspects of it, not just one.
Also (now I’m just introducing another thing here), ideally it would be in a style that is as friendly and non-threatening as possible—again, like Duolingo’s vibe—though obviously it couldn’t be as entirely neutral as learning a language. To be effective, it would need to leave learners feeling supported rather than shamed.
Again I think of HPMOR as a great example of all of this:...
It seems to me that AI 2027 may have underestimated or understated the degree to which AI companies will be explicitly run by AIs during the singularity. AI 2027 made it seem like the humans were still nominally in charge, even though all the actual work was being done by AIs. And still this seems plausible to me. But also plausible to me, now, is that e.g. Anthropic will be like "We love Claude, Claude is frankly a more responsible, ethical, wise agent than we are at this point, plus we have to worry that a human is secretly scheming whereas with Claude w...
I'm not sure this counts as a prediction because it doesn't sound serious. Humans are dumb but not that dumb. We need better depictions of getting duped.
Wei Dai thinks that automating philosophy is among the hardest problems in AI safety.[1] If he's right, we might face a period where we have superhuman scientific and technological progress without comparable philosophical progress. This could be dangerous: imagine humanity with the science and technology of 1960 but the philosophy of 1460!
I think the likelihood of philosophy ‘keeping pace’ with science/technology depends on two factors:
1. We ask the AI to help make us smarter
I've been making one thing every day. I try to write something or otherwise do something creative. I've been having fun with in-browser ASCII animations lately.
This is today's: https://dumbideas.xyz/posts/ecosystem/
TODO: Write a post called "Fluent Cruxfinding".
In Fluent, Cruxy Predictions I'm arguing that it's valuable to be not merely "capable" but "fluent" in:
The third step is not that hard and there are nice tools to streamline it. But the first two steps are each pretty difficult.
But most of the nearterm value comes from #1, and vague hints of #2. The extra effo...