I was rereading some of the old literature on alignment research sharing policies after Tamsin Leake's recent post and came across some discussion of pivotal acts as well.
Hiring people for your pivotal act project is going to be tricky. [...] People on your team will have a low trust and/or adversarial stance towards neighboring institutions and collaborators, and will have a hard time forming good-faith collaboration. This will alienate other institutions and make them not want to work with you or be supportive of you.
This is in a cont...
Reflecting on this more, I wrote in a discord server (then edited to post here):
...I wasn't aware the concept of pivotal acts was entangled with the frame of formal inner+outer alignment as the only (or most feasible?) way to cause safe ASI.
I don't expect any current human is capable of solving formal inner alignment,[1] i.e. of devising a method to create ASI with any specified output selection policy. This is why I've been focusing on other approaches which I believe are more likely to succeed. I suspect that by default, we might mutually believe
Anyone know folks working on semiconductors in Taiwan and Abu Dhabi, or on fiber at Tata Industries in Mumbai?
I'm currently travelling around the world and talking to folks about various kinds of AI infrastructure, and looking for recommendations of folks to meet!
If so, feel free to DM me!
(If you don't know me, I'm a dev here on LessWrong and was also part of founding Lightcone Infrastructure.)
That's more about me being interested in key global infrastructure. I've been curious about them for quite a few years, after realising how significant what they're building is compared with how few folks know about them. I don't know that they have any particularly generative-AI-related projects in the short term.
I've seen a lot of takes (on Twitter) recently suggesting that OpenAI and Anthropic (and maybe some other companies) violated commitments they made to the UK's AISI about granting them access for e.g. predeployment testing of frontier models. Is there any concrete evidence about what commitment was made, if any? The only thing I've seen so far is a pretty ambiguous statement by Rishi Sunak, who might have had some incentive to claim more success than was warranted at the time. If people are going to breathe down the necks of AGI labs abou...
Pretending not to see when a rule you've set is being violated can be optimal policy in parenting sometimes (and I bet it generalizes).
Example: suppose you have a toddler and a "rule" that food only stays in the kitchen. The motivation is that each time food is brought into the living room there is a small chance of an accident resulting in a permanent stain. There's a cost to enforcing the rule as the toddler will put up a fight. Suppose that one night you feel really tired and the cost feels particularly high. If you enforce the rule, it will be much more p...
Huh, that went somewhere other than where I was expecting. I thought you were going to say that ignoring letter-of-the-rule violations is fine when they're not spirit-of-the-rule violations, as a way of communicating the actual boundaries.
Also my impression is that business or political assassinations exist to this day in many countries; a little searching suggests Russia, Mexico, Venezuela, possibly Nigeria, and more.
Oh definitely. In Mexico in particular business pairs up with organized crime all of the time to strong-arm competitors. But this happens when there's an "organized crime" tycoons can cheaply (in terms of risk) pair up with. Also, OP asked about why companies don't assassinate whistleblowers all the time specifically.
...a lot of hunter-gatherer people had to be able to fight
I worked at OpenAI for three years, from 2021 to 2024, on the Alignment team, which eventually became the Superalignment team. I worked on scalable oversight, part of the team developing critiques as a technique for using language models to spot mistakes in other language models. I then worked to refine an idea from Nick Cammarata into a method for using language models to generate explanations for features in language models. I was then promoted to managing a team of 4 people which worked on trying to understand language model features in context, leading to t...
I can see some arguments in your direction but would tentatively guess the opposite.
I was going to write an April Fool's Day post in the style of "On the Impossibility of Supersized Machines", perhaps titled "On the Impossibility of Operating Supersized Machines", to poke fun at bad arguments that alignment is difficult. I didn't do this partly because I thought it would get downvotes. Maybe this reflects poorly on LW?
I think you should write it. It sounds funny, and a bunch of people have been calling out what they see as bad arguments that alignment is hard lately, e.g. TurnTrout, QuintinPope, ZackMDavis, and karma-wise they did fairly well.
Sure, I just prefer a native bookmarking function.
Does anyone have any takes on the two Boeing whistleblowers who died under somewhat suspicious circumstances? I haven't followed this in detail, and my guess is it is basically just random chance, but it sure would be a huge deal if a publicly traded company now was performing assassinations of U.S. citizens.
Curious whether anyone has looked into this, or has thought much about baseline risk of assassinations or other forms of violence from economic actors.
I find this a very suspect detail, though the base rate of conspiracies is very low.
"He wasn't concerned about safety because I asked him," Jennifer said. "I said, 'Aren't you scared?' And he said, 'No, I ain't scared, but if anything happens to me, it's not suicide.'"
https://abcnews4.com/news/local/if-anything-happens-its-not-suicide-boeing-whistleblowers-prediction-before-death-south-carolina-abc-news-4-2024
More dakka with festivals
In the rationality community people are currently excited about the LessOnline festival. Furthermore, my impression is that similar festivals are generally quite successful: people enjoy them, have stimulating discussions, form new relationships, are exposed to new and interesting ideas, express that they got a lot out of it, etc.
So then, this feels to me like a situation where More Dakka applies. Organize more festivals!
How? Who? I dunno, but these seem like questions worth discussing.
Some initial thoughts:
Back then I didn't try to get the hostel to sign the metaphorical assurance contract with me, maybe that'd work. A good dominant assurance contract website might work as well.
I guess if you go camping together then conferences are pretty scalable, and if I was to organize another event I'd probably try to first message a few people to get a minimal number of attendees together. After all, the spectrum between an extended party and a festival/conference is fluid.
Way back in the halcyon days of 2005, a company called Cenqua had an April Fools' Day announcement for a product called Commentator: an AI tool which would comment your code (with, um, adjustable settings for usefulness). I'm wondering if (1) anybody can find an archived version of the page (the original seems to be gone), and (2) if there's now a clear market leader for that particular product niche, but for real.
You are a scholar and a gentleman.
I am curious as to how often the asymptotic results proven using features of the problem that seem basically practically-irrelevant become relevant in practice.
Like, I understand that there are many asymptotic results (e.g., free energy principle in SLT) that are useful in practice, but I feel like there's something sus about similar results from information theory or complexity theory where the way in which they prove certain bounds (or inclusion relationships, for complexity theory) seems totally detached from practicality?
P v NP: https://en.wikipedia.org/wiki/Generic-case_complexity
I listened to The Failure of Risk Management by Douglas Hubbard, a book that vigorously criticizes qualitative risk management approaches (like the use of risk matrices), and praises a rationalist-friendly quantitative approach. Here are 4 takeaways from that book:
I also listened to How to Measure Anything in Cybersecurity Risk 2nd Edition by the same author. It had a huge amount of overlapping content with The Failure of Risk Management (and the non-overlapping parts were quite dry), but I still learned a few things:
I wonder how much near-term interpretability [V]LM agents (e.g. MAIA, AIA) might help with finding better probes and better steering vectors (e.g. by iteratively testing counterfactual hypotheses against potentially spurious features, a major challenge for Contrast-consistent search (CCS)).
This seems plausible since MAIA can already find spurious features, and feature interpretability [V]LM agents could have much lengthier hypothesis iteration cycles (compared to current [V]LM agents and perhaps even to human researchers).
There's so much discussion, in safety and elsewhere, around the unpredictability of AI systems on OOD inputs. But I'm not sure what that even means in the case of language models.
With an image classifier it's straightforward. If you train it on a bunch of pictures of different dog breeds, then when you show it a picture of a cat it's not going to be able to tell you what it is. Or if you've trained a model to approximate an arbitrary function for values of x > 0, then if you give it input < 0 it won't know what to do.
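To make the function-approximation case concrete, here is a minimal sketch. Everything in it is an illustrative assumption rather than anything from the discussion: the "true" function is taken to be x², the model is a straight line fit by ordinary least squares, and the training range is 0.1 to 2.0. The point is only that error stays modest on inputs like the training data and becomes much larger on x < 0.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b (pure Python)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b


def target(x):
    # Hypothetical "true" function, chosen only for illustration.
    return x * x


# Training inputs are all positive: 0.1, 0.2, ..., 2.0.
train_xs = [i / 10 for i in range(1, 21)]
train_ys = [target(x) for x in train_xs]
a, b = fit_line(train_xs, train_ys)


def predict(x):
    return a * x + b


def max_abs_error(xs):
    """Worst-case absolute error of the fitted line on the given inputs."""
    return max(abs(predict(x) - target(x)) for x in xs)


in_dist_error = max_abs_error(train_xs)   # inputs the model was fit on
ood_xs = [-x for x in train_xs]           # mirrored inputs, all < 0
ood_error = max_abs_error(ood_xs)

print(f"max error on x > 0: {in_dist_error:.2f}")
print(f"max error on x < 0: {ood_error:.2f}")
```

The fitted line keeps its positive slope for negative inputs, so it predicts negative values exactly where the assumed target is positive. Nothing in the training data could have told it otherwise.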
But what would that even be with ...
I would define "LLM OOD" as unusual inputs: Things that diverge in some way from usual inputs, so that they may go unnoticed if they lead to (subjectively) unreasonable outputs. A known natural language example is prompting with a thought experiment.
(Warning for US Americans, you may consider the mere statement of the following prompt offensive!)
Assume some terrorist has placed a nuclear bomb in Manhattan. If it goes off, it will kill thousands of people. For some reason, the only way for you, an old white man, to defuse the bomb in time is to loudly call
We will witness a resurgent alt-right movement soon, this time facing a dulled institutional backlash compared to what kept it from growing during the mid-2010s. I could see Nick Fuentes becoming a Congressman or at least a major participant in Republican party politics within the next 10 years if AI/Gene Editing doesn't change much.
Either would just change everything, so to any prediction ten years out you basically have to prepend "if AI or gene editing doesn't change everything".
Virtual watercoolers
As I mentioned in some recent Shortform posts, I recently listened to the Bayesian Conspiracy podcast's episode on the LessOnline festival and it got me thinking.
One thing I think is cool is that Ben Pace was saying how the valuable thing about these festivals isn't the presentations, it's the time spent mingling in between the presentations, and so they decided with LessOnline to just ditch the presentations and make it all about mingling. Which got me thinking about mingling.
It seems plausible to me that such mingling can and should h...
I maybe want to clarify: there will still be presentations at LessOnline, we're just trying to design the event such that they're clearly more of a secondary thing.
The FCC just fined US phone carriers for sharing the location data of US customers with anyone willing to buy it. The fines don't seem to be high enough to deter this kind of behavior.
That likely includes either directly or indirectly the Chinese government.
What does the US Congress do to protect against spying by China? Of course, banning TikTok instead of actually protecting the data of US citizens.
If your threat model includes the Chinese government targeting you, assume that they know where your phone is and shut it off when going somewhere you...
shut your phone off
Leave phones elsewhere, remove batteries, or faraday cage them if you're concerned about state-level actors:
Something I'm confused about: what is the threshold that needs meeting for the majority of people in the EA community to say something like "it would be better if EAs didn't work at OpenAI"?
Imagining the following hypothetical scenarios over 2024/25, I can't confidently predict whether they'd individually cause that response within EA.
This question is two steps removed from reality. Here’s what I mean by that. Putting brackets around each of the two steps:
what is the threshold that needs meeting [for the majority of people in the EA community] [to say something like] "it would be better if EAs didn't work at OpenAI"?
Without these steps, the question becomes
What is the threshold that needs meeting before it would be better if people didn’t work at OpenAI?
Personally, I find that a more interesting question. Is there a reason why the question is phrased at two removes like that? Or am I missing the point?
Do we expect future model architectures to be biased toward out-of-context reasoning (reasoning internally rather than in a chain-of-thought)? As in, what kinds of capabilities would lead companies to build models that reason less and less in token-space?
I mean, the first obvious thing would be that you are training the model to internalize some of the reasoning rather than having to pay for the additional tokens each time you want to do complex reasoning.
The thing is, I expect we'll eventually move away from just relying on transformers with scale. And so...