Help clear something up for me: I am extremely confused (theoretically) how we can simultaneously have:
1. An Artificial Superintelligence
2. It be controlled by humans (therefore creating misuse or concentration-of-power issues)
My intuition is that once it reaches a particular level of power it will be uncontrollable. Unless people are saying that we can have models 100x more powerful than GPT-4 without their having any agency?
I worked at OpenAI for three years, from 2021-2024, on the Alignment team, which eventually became the Superalignment team. I worked on scalable oversight, as part of the team developing critiques as a technique for using language models to spot mistakes in other language models. I then worked to refine an idea from Nick Cammarata into a method for using language models to generate explanations for features in language models. I was then promoted to managing a team of 4 people which worked on trying to understand language model features in context, leading to t...
Are you familiar with US NDAs? I'm sure there are lots of clauses that have been ruled invalid by case law. In many cases, non-lawyers have no idea about these, so you might be able to make a difference with very little effort. There is also the possibility that valuable OpenAI shares could be rescued?
If you haven't seen it, check out this thread where one of the OpenAI leavers did not sign the gag order.
I’m confused: if the dating apps keep getting worse, how come nobody has come up with a good one, or at least a clone of OkCupid? As far as I can tell, neither “a good matching system is somehow less profitable than making people swipe all the time (surely it’d still be profitable in absolute terms)” nor “it requires a decently big initial investment” can explain a complete lack of good products in an area with so much demand. Has anyone dug into it / tried to start a good dating app as a summer project?
People try new dating platforms all the time. It's what Y Combinator calls a tarpit. The problem sounds solvable, but the solution is elusive.
As I have said elsewhere: Dating apps are broken because the incentives of the usual core approach don't work.
On the supplier side: Misaligned incentives (keep users on the platform) and opaque algorithms lead to bad matches.
On the demand side: Misaligned incentives (first impressions, low cost to exit) and no plausible deniability lead to predators being favored.
Idea: Daniel Kokotajlo probably lost quite a bit of money by not signing an OpenAI NDA before leaving, which I consider a public service at this point. Could some of the funders of the AI safety landscape give some money or social reward for this?
I guess reimbursing everything Daniel lost might be a bit too much for funders, but providing some money would have very high value, both to reward the act and to incentivize future safety people not to sign NDAs.
Notably, there are some lawyers here on LessWrong who might help (possibly even for the lols, you never know). And you can look at case law and guidance to see if clauses are actually enforceable or not (many are not). To anyone reading, here's habryka doing just that
Interest groups without an organizer.
This is a product idea that solves a large coordination problem. With billions of people, there could be a huge number of groups of people sharing multiple interests. But currently, the number of valuable groups of people is limited by a) the number of organizers and b) the number of people you meet via a random walk. Some progress has been made on (b) with better search, but it is difficult to make (a) go up because of human tendencies - most people are lurkers - and the incentive to focus on one area to stand out. So what...
New concept for my "qualia-first calibration" app idea that I just crystallized. The following are all the same "type":
1. "this feels 10% likely"
2. "this feels 90% likely"
3. "this feels exciting!"
4. "this feels confusing :("
5. "this is coding related"
6. "this is gaming related"
All of them are a thing you can track: "when I observe this, my predictions turn out to come true N% of the time".
Numerical probabilities are merely a special case (though they still get additional tooling, since it's easier to visualize graphs and calculate Brier scores for them)
And then ...
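The tracking described above can be sketched in a few lines. This is a minimal sketch, not the app itself; the tag names and observation format are invented for illustration:

```python
from collections import defaultdict

# tag -> [times the prediction came true, total observations]
counts = defaultdict(lambda: [0, 0])

def record(tag, came_true):
    counts[tag][0] += int(came_true)
    counts[tag][1] += 1

def hit_rate(tag):
    hits, total = counts[tag]
    return hits / total if total else None

# Numerical probabilities are the special case that also gets a Brier score.
def brier(p, outcomes):
    return sum((p - int(o)) ** 2 for o in outcomes) / len(outcomes)

record("feels exciting!", True)
record("feels exciting!", False)
print(hit_rate("feels exciting!"))       # 0.5
print(round(brier(0.9, [True, False]), 2))  # 0.41
```

The point is that "this feels exciting!" and "this feels 90% likely" go through the exact same `record`/`hit_rate` pipeline; only the Brier-score tooling is probability-specific.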
While I still don't feel like I understand electrolytes as well as I would like to, I have become more convinced that supplementing potassium is worthwhile when one engages in activities that produce sweating.
Over the last year I started using potassium carbonate like a spice, and whether or not it tastes good depends a lot on how much I was sweating in the day before the meal.
Given that summer is coming up, if you aren't already supplementing electrolytes on those days that are warm enough to make you sweat, I recommend getting some potassium carbonate a...
In addition, from my perspective, if you consume the same amount of potassium every day of the year, you (as a typical office worker) likely consume either too much or too little on some days.
Selected fragments (though not really cherry-picked, no reruns) of a conversation with Claude Opus on operationalizing something like Activation vector steering with BCI by applying the methodology of Concept Algebra for (Score-Based) Text-Controlled Generative Models to the model from High-resolution image reconstruction with latent diffusion models from human brain activity (website with nice illustrations of the model).
My prompts bolded:
'Could we do concept algebra directly on the fMRI of the higher visual cortex?
Yes, in principle, it should be possible...
Turns out, someone's already done a similar (vector arithmetic in neural space; latent traversals too) experiment in a restricted domain (face processing) with another model (GAN) and it seemed to work: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012058 https://github.com/neuralcodinglab/brain2gan/blob/main/figs_manuscript/Fig12.png https://openreview.net/pdf?id=hT1S68yza7
I've been thinking about how the way to talk about how a neural network works (instead of how it could hypothetically come to work by adding new features) would be to project away components of its activations/weights, but I got stuck because of the issue where you can add new components by subtracting off large irrelevant components.
I've also been thinking about deception and its relationship to "natural abstractions", and in that case it seems to me that our primary hope would be that the concepts we care about are represented at a larger "magnitude" than...
True, though I think the Hessian is problematic enough that I'd either want to wait until I have something better, or want to use a simpler method.
It might be worth going into more detail about that. The Hessian for the probability of a neural network output is mostly determined by the Jacobian of the network. But in some cases the Jacobian gives us exactly the opposite of what we want.
If we consider the toy model of a neural network with no input neurons and only 1 output neuron (which I imagine to represent a path through the net...
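One way to unpack the "the Hessian is mostly the Jacobian" claim (my reconstruction of the standard decomposition, not necessarily what the commenter had in mind): for a scalar loss $L(f_\theta(x))$ with network Jacobian $J = \partial f / \partial \theta$,

$$\nabla_\theta^2 L \;=\; J^\top \big(\nabla_f^2 L\big)\, J \;+\; \sum_i \frac{\partial L}{\partial f_i}\,\nabla_\theta^2 f_i,$$

where the first (Gauss-Newton) term is determined entirely by the Jacobian and the second term is small near a well-fit point, so the Hessian's structure is dominated by $J$.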
I currently am completing psychological studies for credit in my university psych course. The entire time, all I can think is “I wonder if that detail is the one they’re using to trick me with?”
I wonder how this impacts results. I can’t imagine being in a heightened state of looking out for deception has no impact.
Does anyone have any takes on the two Boeing whistleblowers who died under somewhat suspicious circumstances? I haven't followed this in detail, and my guess is it's basically just random chance, but it sure would be a huge deal if a publicly traded company were now performing assassinations of U.S. citizens.
Curious whether anyone has looked into this, or has thought much about baseline risk of assassinations or other forms of violence from economic actors.
That's true, but the timing and incongruity of a "suicide" the day before testifying seems even more absurdly unlikely than corporations starting to murder people. And it's not like they're going out and doing it themselves; they'd be hiring a hitman of some sort. I don't know how any of that works, and I agree that it's hard to imagine anyone invested enough in their job or their stock options to risk a murder charge; but they may feel that their chances of avoiding charges are near 100%, so it might make sense to them.
I just have absolutely no other way ...
Causality is rare! The usual statement that "correlation does not imply causation" puts them, I think, on deceptively equal footing. It's really more like correlation is almost always not causation absent something strong like an RCT or a robust study set-up.
Over the past few years I'd gradually become increasingly skeptical of claims of causality just by updating on empirical observations, but it just struck me that there's a good first principles reason for this.
For each true cause of some outcome we care to influence, there are many other "measurables" ...
Those are not randomly selected pairs, however. There are 3 major causal patterns: A->B, A<-B, and A<-C->B. Daecaneus is pointing out that for a random pair of correlations of some variables, we do not assign a uniform prior of 33% to each of these. While it may sound crazy to try to argue for some specific prior like 'we should assign 1% to the direct causal patterns of A->B and A<-B, and 99% to the confounding pattern of A<-C->B', this is a lot closer to the truth than thinking that 'a third of the time, A causes B; a third of the...
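A quick simulation makes the confounding pattern concrete: here B never reads A, yet the two correlate strongly because both read C. This is an illustrative sketch of the A<-C->B case, not anything from the original comment:

```python
import random

random.seed(0)
# Confounding pattern A <- C -> B: neither A nor B causes the other.
n = 10_000
C = [random.gauss(0, 1) for _ in range(n)]
A = [c + random.gauss(0, 0.5) for c in C]
B = [c + random.gauss(0, 0.5) for c in C]

def corr(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Expected correlation is var(C) / (var(C) + 0.25) = 0.8, with zero causation.
print(round(corr(A, B), 2))
```

An intervention on A here (replacing it with arbitrary values) would leave B untouched, which is exactly the gap between the observed correlation and any causal claim.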
Anybody know how Fathom Radiant (https://fathomradiant.co/) is doing?
They’ve been working on photonics compute for a long time, so I’m curious whether anyone knows what timelines they expect before it has practical effects on compute.
Also, Sam Altman and Scott Gray at OpenAI are both investors in Fathom. Not sure when they invested.
I’m guessing it’s still a long-term bet at this point.
OpenAI also hired someone who worked at PsiQuantum recently. My guess is that they are hedging their bets on the compute end and generally looking for opportunities on ...
I'm working on publishing a post on this and energy bottlenecks. If anyone is interested in doing a quick skim for feedback, I hope to publish it in under two hours.
Edit: Post here.
A list of some contrarian takes I have:
People are currently predictably too worried about misuse risks
What people really mean by "open source" vs "closed source" labs is actually "responsible" vs "irresponsible" labs, which is not affected by regulations targeting open source model deployment.
Neuroscience as an outer alignment[1] strategy is embarrassingly underrated.
Better information security at labs is not clearly a good thing, and if we're worried about great power conflict, probably a bad thing.
Much research on deception (Anthropic's re
I'd be happy to chat. Will DM so we can set something up.
On the subject of your paper, I do think it looks at a much more interesting phenomenon than, say, sleeper agents, but I'm also not fully convinced you're studying deliberate, instrumentally convergent deception either. I think your subsequent follow-ups for narrowing down hypotheses mostly consider too narrow a range of ways the model could think. That is to say, I think you assume your model is some unified coherent entity that always acts cerebrally, and I'm skeptical of that.
For example, the model...
I'm working on a non-trivial.org project meant to assess the risk of genome sequences by comparing them to a public list of the most dangerous pathogens we know of. This would be used to assess the risk from both experimental results in e.g. BSL-4 labs and the output of e.g. protein folding models. The benchmarking would be carried out by an in-house ML model of ours. Two questions to LessWrong:
1. Is there any other project of this kind out there? Do BSL-4 labs/AlphaFold already have models for this?
2. "Training a model on the most dangerous pa...
I used to have an idea for a karma/reputation system: repeatedly recalculate karma weighted by the karma of the upvoters and downvoters on a comment (then normalize to avoid hyperinflation) until a fixed point is reached.
I feel like this is vaguely somehow related to:
Also check out "personalized pagerank", where the rating shown to each user is "rooted" in what kind of content this user has upvoted in the past. It's a neat solution to many problems.
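The fixed-point recalculation can be sketched in a few lines. The vote data, the damping, and the L1 normalization are all my own illustrative choices (plain power iteration can oscillate on vote cycles, so some mixing with the old values is needed):

```python
# Hypothetical votes: (voter, target, +1 or -1).
votes = [
    ("alice", "bob", +1),
    ("bob", "carol", +1),
    ("carol", "alice", +1),
    ("mallory", "alice", -1),
]
users = {u for voter, target, _ in votes for u in (voter, target)}
karma = {u: 1.0 for u in users}

for _ in range(100):
    new = {u: 0.0 for u in users}
    for voter, target, sign in votes:
        new[target] += sign * karma[voter]  # votes weighted by voter karma
    norm = sum(abs(k) for k in new.values()) or 1.0  # normalize: no hyperinflation
    # Damped update: mixing with the old values helps convergence on cycles.
    karma = {u: 0.5 * karma[u] + 0.5 * new[u] / norm for u in users}

print({u: round(k, 3) for u, k in sorted(karma.items())})
```

Since nobody votes for `mallory`, his weight decays toward zero, so his downvote of `alice` stops counting — which is the intended sybil-resistance property of the scheme.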
I had speculated previously about links between task arithmetic and activation engineering. I think given all the recent results on in context learning, task/function vectors and activation engineering / their compositionality (In-Context Learning Creates Task Vectors, In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering, Function Vectors in Large Language Models), this link is confirmed to a large degree. This might also suggest trying to import improvements to task arithmetic (e.g. Task Arithmetic i...
For the pretraining-finetuning paradigm, this link is now made much more explicitly in Cross-Task Linearity Emerges in the Pretraining-Finetuning Paradigm; as well as linking to model ensembling through logit averaging.
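For readers unfamiliar with task arithmetic, the core operation is just parameter-space vector addition. A toy sketch with made-up numbers (not code from any of the cited papers):

```python
# Toy "models" as flat parameter lists.
base = [0.0, 1.0, -1.0, 0.5]           # pretrained weights
finetuned_a = [0.2, 1.1, -1.0, 0.5]    # hypothetical task-A finetune
finetuned_b = [0.0, 1.0, -0.8, 0.7]    # hypothetical task-B finetune

def sub(x, y):
    return [a - b for a, b in zip(x, y)]

def add(x, y):
    return [a + b for a, b in zip(x, y)]

tau_a = sub(finetuned_a, base)  # task vector for task A
tau_b = sub(finetuned_b, base)  # task vector for task B

# Task arithmetic: adding task vectors onto the base composes the tasks.
multi_task = add(base, add(tau_a, tau_b))
print([round(v, 3) for v in multi_task])  # [0.2, 1.1, -0.8, 0.7]
```

The cross-task linearity result is roughly the claim that, in the pretraining-finetuning regime, this kind of weight-space addition tracks addition (or averaging) in the models' feature/logit space.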
EDIT: I believe I've found the "plan" that Politico (and other news sources) managed to fail to link to, maybe because it doesn't seem to contain any affirmative commitments by the named companies to submit future models to pre-deployment testing by UK AISI.
I've seen a lot of takes (on Twitter) recently suggesting that OpenAI and Anthropic (and maybe some other companies) violated commitments they made to the UK's AISI about granting them access for e.g. predeployment testing of frontier models. Is there any concrete evidence about what commitment wa...
Adding to the confusion: I've nonpublicly heard from people at UKAISI and [OpenAI or Anthropic] that the Politico piece is very wrong and DeepMind isn't the only lab doing pre-deployment sharing (and that it's hard to say more because info about not-yet-deployed models is secret). But no clarification on commitments.