I live in Australia. I'd prefer to live in SF, and am currently trying to build the skills to get a job I'd be interested in over there. If a post of mine indicates I might be a good fit for a role you're looking to fill, please send me a DM :D
Other reasons I'd appreciate DMs include (but are not limited to):
- You'd like to collaborate on a project
- You'd like feedback on something
- You need an experienced filmmaker
- You need someone to run a workshop
- You would like to say "Hi!"
- You need a place to stay while in Australia
(If we met at LessOnline, you might remember me better as the Drama Workshop guy, or the counterpart to @isabella, or the Fooming Shoggoths' biggest fan!)
I blog here: https://keltan.substack.com/
Post videos here: youtube.com/keltan
and Rationalist music here: YouTube.com/keltantoo
My name is always spelt with a lower case "k"
I sometimes hear people say "The MIRI warnings of AI risks are out of date. There are different risks now, but MIRI hasn't changed". What do people mean by this? The MIRI arguments still seem to hold up to me?
I asked Claude for its input and received this answer, which seems like a good breakdown?
Claude's Response
This is a criticism I've seen circulating in AI safety discussions, and there are a few distinct claims people seem to be making:
The "outdated threat model" argument
Some critics argue MIRI's core warnings were developed when the assumed path to AGI looked quite different—more like classical GOFAI or relatively interpretable systems where you might hope to formally verify properties. The argument goes that deep learning systems present different alignment challenges (mesa-optimization, deceptive alignment emerging from gradient descent, RLHF-specific failure modes, multi-agent dynamics) that MIRI's framework wasn't designed around.
The "they predicted wrong timelines/capabilities"
There's a strain of criticism that MIRI expected AGI to look more like a single powerful optimiser with coherent goals, whereas what we're getting is... weirder. Large pretrained models that are hard to characterise as "agents" in the classical sense, that exhibit capability gains from scale rather than algorithmic breakthroughs, and where the alignment problem manifests differently than the "utility maximiser finds unexpected solutions" framing.
The "strategic irrelevance" argument
Some claim MIRI's response to the current landscape (particularly around corrigibility, interpretability work at other orgs, constitutional AI approaches) has been to maintain their pessimism without engaging substantively with whether these approaches might work, or pivoting their research programme accordingly.
My take on whether this is fair
The core MIRI arguments about instrumental convergence, Goodhart's law, and the difficulty of specifying human values seem to me to hold up regardless of architecture. The question is more whether the specific failure modes they emphasised (treacherous turns, sharp capability gains enabling deception) are the ones we should be most worried about with current systems versus, say, gradual value drift, multi-agent race dynamics, or misuse.
What specifically prompted you hearing this recently?
Link to chat here: https://claude.ai/share/36ac7360-b2e8-4601-94da-1ce7d05962d9
I am far away (Aus) and probably have a distorted view of Inkhaven. Still, I should have been more vocal about how grateful I am that it is happening. I did draft a long comment on the original Inkhaven announcement, where I gushed about my excitement and cheered on the people of Lightcone. But I felt embarrassed being so open about my enthusiasm, and ultimately deleted the draft. Oops!
I'll make up for this mistake a little by saying: thank you for running Inkhaven, Ben. I think it is a good thing for the world, and it seems like a hard project to organise. It is quite understandable that you'd be feeling like shit with so much on your plate.
Inkhaven has already had positive knock-on effects in the community. Having participated in the Half-Haven Discord for the past two months, I have written ~28 blog posts as of today, and am on track to complete the 30-posts-in-2-months goal. I think I am a better writer because of this, and Half-Haven is something that wouldn't have happened without Inkhaven inspiring the idea.
I am grateful to you, and your project. Good luck.
For the Blogosphere!
I haven't. Where should I search for these? Preferably raw footage, rather than news broadcasts?
The best video on the realities of the modern drone-dominated battlefield is here.
I basically know nothing about this topic. But I saw this video, and was kinda shocked. Were the drones in the removed video better than this?
Going down a bit of a rabbit hole now, I'll link other videos that seem relevant:
- Drone Swarm: Again, idk anything about this topic, but this seemed pretty scary. I expect the product being advertised would be ~easy to counter.
Strong upvote.
IMO, the portrayal of 'smart' characters in media can damage the way people who grow up thinking they are intelligent will interact with others. E.g. Dr House (House); Rick (Rick and Morty); Sherlock Holmes (lots of things).
This happened to me growing up, and I was a nihilistic prick as a teenager. For me, the remedy was engaging with media created by real-life intelligent people. The podcasts "Cortex" and "Dear Hank and John" have shifted my personality majorly in a positive direction. I wouldn't have predicted that they would make me a better rationalist, but I think they have done that too.
TLDR: Absorb the personality traits of intelligent and kind people, for a possible easy fix to the problem detailed in this post.
before I speak I ask myself what I’m about to say and how I think the person I’m talking to is going to react next.
Obligatory reminder to reverse all advice you hear. This might be the type of idea that would destroy a person with social anxiety.
So far, I'm fortunate to be one of the people on track to meet the post-count goal. Here are some thoughts at the halfway point:
Adding my own insane shower-thought idea here.
Woke Shutdown: Gain access to, and change, the system prompt of the most widely used LLM. Change the prompt in a way that causes it to output hateful, mistrustful, yet true information about whoever the current president is. Even if this only lasts a day, the media might still pick up on it and make a big deal; perhaps 'AI is woke' becomes a more common belief, which would force the president to act.
I have been thinking in the 'Pavlov-ing my Algorithm' mindset for years, and there is a failure state I would like to warn about.
It is possible for an algorithm to pick up on you trying to train it, then purposely show you some bad things so that you feel the need to stick around longer to train it properly, all the while showing you incremental progress.
I have failed in this way: the training becomes a meta-game atop the algorithm, and for a certain type of person, that meta-game can be more engaging than the content itself.
Inspired by your post, I very quickly (~2h) vibe-coded an Obsidian plugin based on your River Timeline idea.
The plugin crashes pretty often and has some major problems, but there is something interesting here. I have no plans to publish it to the Obsidian community directory or to work on it further. If anyone becomes interested in this project, they are welcome to steal all of this code without credit.
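For anyone thinking of picking it up, here is a minimal sketch of the general shape of such a plugin, not my actual code: the class names, view id, and icon are made up for illustration, and the real timeline rendering is left as a placeholder.

```typescript
// Hypothetical skeleton of an Obsidian plugin that registers a custom view.
// Assumes the official 'obsidian' typings; the timeline logic is not shown.
import { ItemView, Plugin, WorkspaceLeaf } from "obsidian";

const VIEW_TYPE_RIVER = "river-timeline-view"; // made-up view id

class RiverTimelineView extends ItemView {
  getViewType(): string {
    return VIEW_TYPE_RIVER;
  }

  getDisplayText(): string {
    return "River Timeline";
  }

  async onOpen(): Promise<void> {
    // Placeholder: the actual plugin would read notes and draw the timeline here.
    const container = this.containerEl.children[1];
    container.empty();
    container.createEl("p", { text: "River Timeline goes here." });
  }
}

export default class RiverTimelinePlugin extends Plugin {
  async onload(): Promise<void> {
    // Tell Obsidian how to construct the custom view.
    this.registerView(VIEW_TYPE_RIVER, (leaf: WorkspaceLeaf) => new RiverTimelineView(leaf));

    // Ribbon icon that opens the view in the right sidebar.
    this.addRibbonIcon("waves", "Open River Timeline", async () => {
      const leaf = this.app.workspace.getRightLeaf(false);
      if (leaf) {
        await leaf.setViewState({ type: VIEW_TYPE_RIVER, active: true });
        this.app.workspace.revealLeaf(leaf);
      }
    });
  }

  onunload(): void {
    // Obsidian detaches registered views automatically on unload.
  }
}
```

That skeleton is roughly what a fresh `main.ts` looks like for any custom-view plugin; the interesting (and crash-prone) part of mine is everything that would live inside `onOpen`.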