Quick Takes

RobertM

EDIT: I believe I've found the "plan" that Politico (and other news sources) managed to fail to link to, maybe because it doesn't seem to contain any affirmative commitments by the named companies to submit future models to pre-deployment testing by UK AISI.

I've seen a lot of takes (on Twitter) recently suggesting that OpenAI and Anthropic (and maybe some other companies) violated commitments they made to the UK's AISI about granting them access for e.g. predeployment testing of frontier models.  Is there any concrete evidence about what commitment wa... (read more)

aogara
More discussion of this here. I'm really not sure what happened, and would love to see more reporting on it.
RobertM
Ah, does look like Zach beat me to the punch :) I'm also still moderately confused, though I'm not that confused about labs not speaking up - if you're playing politics, then not throwing the PM under the bus seems like a reasonable thing to do.  Maybe there's a way to thread the needle of truthfully rebutting the accusations without calling the PM out, but idk.  Seems like it'd be difficult if you weren't either writing your own press release or working with a very friendly journalist.

Adding to the confusion: I've nonpublicly heard from people at UKAISI and [OpenAI or Anthropic] that the Politico piece is very wrong and DeepMind isn't the only lab doing pre-deployment sharing (and that it's hard to say more because info about not-yet-deployed models is secret). But no clarification on commitments.

Help clear something up for me: I am extremely confused (theoretically) how we can simultaneously have:

1. An Artificial Superintelligence

2. It being controlled by humans (thereby creating misuse and concentration-of-power issues)

My intuition is that once it reaches a particular level of power it will be uncontrollable. Unless people are saying that we can have models 100x more powerful than GPT-4 without them having any agency??

William_S

I worked at OpenAI for three years, from 2021 to 2024, on the Alignment team, which eventually became the Superalignment team. I worked on scalable oversight, as part of the team developing critiques as a technique for using language models to spot mistakes in other language models. I then worked to refine an idea from Nick Cammarata into a method for using language models to generate explanations for features in language models. I was then promoted to managing a team of 4 people which worked on trying to understand language model features in context, leading to t... (read more)


Are you familiar with US NDAs? I'm sure there are lots of clauses that have been ruled invalid by case law. In many cases, non-lawyers have no idea about these, so you might be able to make a difference with very little effort. There is also the possibility that valuable OpenAI shares could be rescued.

If you haven't seen it, check out this thread where one of the OpenAI leavers did not sign the gag order.

wassname
It could just be because it reaches a strong conclusion from anecdotal/clustered evidence (e.g. it might say more about her friend group than anything else), along with claims to being better calibrated for weak reasons - which could be true, but seems not very epistemically humble. Full disclosure: I downvoted karma, because I don't think it should be the top reply, but I did not agree or disagree. But Jen seems cool, I like weird takes, and downvotes are not a big deal - just part of a healthy, contentious discussion.
wassname
New observations > new thoughts when it comes to calibrating yourself. The best-calibrated people are those who get lots of interaction with the real world, not those who think a lot or have a complicated inner model. Tetlock's superforecasters were gamblers and weathermen.

I’m confused: if the dating apps keep getting worse, how come nobody has come up with a good one, or at least a clone of OkCupid? As far as I can tell, neither “a good matching system is somehow less profitable than making people swipe all the time (surely it’d still be profitable in absolute terms)” nor “it requires a decently big initial investment” can explain a complete lack of good products in an area with so much demand. Has anyone dug into it / tried to start a good dating app as a summer project?


People try new dating platforms all the time. It's what Y Combinator calls a tarpit. The problem sounds solvable, but the solution is elusive.

As I have said elsewhere: Dating apps are broken because the incentives of the usual core approach don't work.

On the supplier side: Misaligned incentives (keep users on the platform) and opaque algorithms lead to bad matches. 

On the demand side: Misaligned incentives (first impressions, low cost to exit) and no plausible deniability lead to predators being favored.

Gunnar_Zarncke
People start dating portals all the time. If you start with a targeted group that gets high value from it, you can plausibly bootstrap the network effect. Otherwise you couldn't start any network app, and the biggest one would automatically win. So I think your argument proves too much.
Gunnar_Zarncke
The quizzes sound like something OkCupid also used to have, as does everything that reduces the need for first impressions. I hope they keep them.
simeon_c

Idea: Daniel Kokotajlo probably lost quite a bit of money by not signing an OpenAI NDA before leaving, which I consider a public service at this point. Could some of the funders of the AI safety landscape give some money or social reward for this?

I guess reimbursing everything Daniel lost might be a bit too much for funders, but providing some money, both to reward the act and to incentivize future safety people not to sign NDAs, would have very high value.


Notably, there are some lawyers here on LessWrong who might help (possibly even for the lols, you never know). And you can look at case law and guidance to see if clauses are actually enforceable or not (many are not). To anyone reading: here's habryka doing just that.

Raemon
I agree with this overall point, although I think "trade secrets" in the domain of AI can be relevant for people having surprising timelines views that they can't talk about.
habryka
My current best guess is that actually cashing out the vested equity is tied to an NDA, but I am really not confident. OpenAI has a bunch of really weird equity arrangements.

Feeding on conflict and being in fights is something you can learn. It’s a great source of energy if you’re feeling low.

Interest groups without an organizer.

This is a product idea that solves a large coordination problem. With billions of people, there could be a huge number of groups of people sharing multiple interests. But currently, the number of valuable groups of people is limited by (a) the number of organizers and (b) the number of people you meet via a random walk. Some progress has been made on (b) with better search, but it is difficult to make (a) go up because of human tendencies - most people are lurkers - and the incentive to focus on one area to stand out. So what... (read more)

Raemon

New concept for my "qualia-first calibration" app idea that I just crystallized. The following are all the same "type":

1. "this feels 10% likely"

2. "this feels 90% likely"

3. "this feels exciting!"

4. "this feels confusing :("

5. "this is coding related"

6. "this is gaming related"

All of them are a thing you can track: "when I observe this, my predictions turn out to come true N% of the time".

Numerical probabilities are merely a special case (though they still get additional tooling, since they're easier to graph and to calculate Brier scores for).

And then ... (read more)
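A minimal Python sketch of what that tracking could look like (hypothetical data and tag names; just illustrating "for each tag, how often did the prediction come true"):

```python
from collections import defaultdict

# Hypothetical log: each prediction carries whatever tags it "felt" like,
# plus whether it eventually came true.
records = [
    ({"feels 90% likely", "coding related"}, True),
    ({"feels 10% likely", "feels confusing :("}, False),
    ({"feels exciting!", "gaming related"}, True),
    ({"feels 90% likely"}, False),
]

hits, totals = defaultdict(int), defaultdict(int)
for tags, came_true in records:
    for tag in tags:
        totals[tag] += 1
        hits[tag] += came_true

for tag in sorted(totals):
    print(f"{tag}: came true {hits[tag] / totals[tag]:.0%} of the time ({totals[tag]} obs)")
```

Numerical probabilities then just get extra views (calibration curves, Brier scores) on top of the same per-tag bookkeeping.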

While I still don't feel like I understand electrolytes as well as I would like to, I have become more convinced that supplementing potassium is worthwhile when engaging in activities that produce sweating.

Over the last year I started using potassium carbonate like a spice, and whether or not it tastes good depends a lot on how much I was sweating in the day before the meal.

Given that summer is coming up, if you aren't already supplementing electrolytes on those days that are warm enough to make you sweat, I recommend you get some potassium carbonate a... (read more)

Lorxus
The body uses up sodium and potassium as its two major cations. You need them for neural firing to work, among many other things; potassium is the body's go-to for "I need a single-charge cation but sodium doesn't work for whatever reason". As such, you lose plenty in urine and sweat. Because modern table salt (i.e., neither rock salt nor, better yet, sea salt) contains basically no potassium, people can end up being slightly deficient, because we do still get some from food - lots of types of produce, like tomatoes, root vegetables, and some fruits, are rich in it, for instance.

In addition to that, from my perspective: if you consume the same amount of potassium every day of the year, you (as a typical office worker) likely consume either too much or too little on some days.

  1. We inhabit this real material world, the one which we perceive all around us (and which somehow gives rise to perceptive and self-conscious beings like us).
  2. Though not all of our perceptions conform to a real material world. We may be fooled by things like illusions or hallucinations or dreams that mimic perceptions of this world but are actually all in our minds.
  3. Indeed if you examine your perceptions closely, you'll see that none of them actually give you representations of the material world, but merely reactions to it.
  4. In fact, since the only evidence we
... (read more)

Selected fragments (though not really cherry-picked, no reruns) of a conversation with Claude Opus on operationalizing something like Activation vector steering with BCI by applying the methodology of Concept Algebra for (Score-Based) Text-Controlled Generative Models to the model from High-resolution image reconstruction with latent diffusion models from human brain activity (website with nice illustrations of the model).

My prompts bolded:

'Could we do concept algebra directly on the fMRI of the higher visual cortex?
Yes, in principle, it should be possible... (read more)

Bogdan Ionut Cirstea
More reasons to think something like the above should work: High-resolution image reconstruction with latent diffusion models from human brain activity literally steers diffusion models using linearly-decoded fMRI signals (see fig. 2); and linear encoding (the inverse of decoding) from the text latents to fMRI also works well (see fig. 6; and similar results in Natural language supervision with a large and diverse dataset builds better models of human high-level visual cortex, e.g. fig. 2). Furthermore, they use the same (Stable Diffusion with CLIP) model used in Concept Algebra for (Score-Based) Text-Controlled Generative Models, which both provides theory for and empirically demonstrates activation-engineering-style linear manipulations. All this suggests that similar, Concept Algebra for (Score-Based) Text-Controlled Generative Models-like manipulations would also work when applied directly to the fMRI representations used to decode the text latents c in High-resolution image reconstruction with latent diffusion models from human brain activity.

Turns out, someone's already done a similar (vector arithmetic in neural space; latent traversals too) experiment in a restricted domain (face processing) with another model (GAN) and it seemed to work: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012058 https://github.com/neuralcodinglab/brain2gan/blob/main/figs_manuscript/Fig12.png https://openreview.net/pdf?id=hT1S68yza7
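For concreteness, here's a rough numpy sketch of the kind of linear manipulation being discussed, with random vectors standing in for fMRI-decoded latents (the direction-from-class-means step is a simplification of what Concept Algebra actually does):

```python
import numpy as np

# Schematic concept-algebra-style edit on a decoded latent: estimate a concept
# direction from examples, project it out of a latent vector, then add back a
# controlled amount. All arrays are random stand-ins for fMRI-decoded latents.
rng = np.random.default_rng(0)
dim = 512
with_concept = rng.normal(size=(100, dim)) + 0.5   # latents where the concept is present
without_concept = rng.normal(size=(100, dim))      # latents where it is absent

direction = with_concept.mean(0) - without_concept.mean(0)
direction /= np.linalg.norm(direction)

z = rng.normal(size=dim)                            # a single decoded latent
z_removed = z - (z @ direction) * direction         # project the concept out
z_steered = z_removed + 2.0 * direction             # set it to a chosen strength
```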

Bogdan Ionut Cirstea
Also positive update for me on interdisciplinary conceptual alignment being automatable differentially soon; which seemed to me for a long time plausible, since LLMs have 'read the whole internet' and interdisciplinary insights often seem (to me) to require relatively small numbers of inferential hops (plausibly because it's hard for humans to have [especially deep] expertise in many different domains), making them potentially feasible for LLMs differentially early (reliably making long inferential chains still seems among the harder things for LLMs).

I've been thinking about how the way to talk about how a neural network works (instead of how it could hypothetically come to work by adding new features) would be to project away components of its activations/weights, but I got stuck because of the issue where you can add new components by subtracting off large irrelevant components.

I've also been thinking about deception and its relationship to "natural abstractions", and in that case it seems to me that our primary hope would be that the concepts we care about are represented at a larger "magnitude" than... (read more)

Thomas Kwa
Much dumber ideas have turned into excellent papers

True, though I think the Hessian is problematic enough that I'd either want to wait until I have something better, or use a simpler method.

It might be worth going into more detail about that. The Hessian for the probability of a neural network output is mostly determined by the Jacobian of the network. But in some cases the Jacobian gives us exactly the opposite of what we want.

If we consider the toy model of a neural network with no input neurons and only 1 output neuron  (which I imagine to represent a path through the net... (read more)
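To make the "mostly determined by the Jacobian" point concrete, here's a tiny numpy sketch using a different toy than the one above (a tanh unit with loss L(p) = (p − 1)²; all numbers made up), splitting the loss Hessian into the Jacobian-driven outer-product term and the network-curvature term:

```python
import numpy as np

# Toy scalar "network" p = tanh(w . theta) and loss L(p) = (p - 1)^2, with
# analytic derivatives, showing the split of the loss Hessian into a
# Jacobian-driven (outer-product) term and a network-curvature term.
w = np.array([0.5, -1.0, 2.0])      # hypothetical fixed direction
theta = np.array([0.3, 0.1, -0.2])  # "parameters"

p = np.tanh(w @ theta)
grad_f = (1 - p**2) * w                        # Jacobian of the network output
hess_f = -2 * p * (1 - p**2) * np.outer(w, w)  # curvature of the network output
dL, ddL = 2 * (p - 1), 2.0                     # L'(p), L''(p)

jacobian_term = ddL * np.outer(grad_f, grad_f)  # determined entirely by the Jacobian
curvature_term = dL * hess_f                    # the remainder
hess_L = jacobian_term + curvature_term
print(jacobian_term, curvature_term, sep="\n")
```

When L'(p) is small the first term dominates, which is the sense in which the Hessian is "mostly the Jacobian"; the worry above is about the cases where that Jacobian-driven term points in exactly the wrong direction.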

keltan

I am currently completing psychological studies for credit in my university psych course. The entire time, all I can think is “I wonder if that detail is the one they’re using to trick me with?”

I wonder how this impacts results. I can’t imagine being in a heightened state of looking out for deception has no impact.

habryka

Does anyone have any takes on the two Boeing whistleblowers who died under somewhat suspicious circumstances? I haven't followed this in detail, and my guess is it is basically just random chance, but it sure would be a huge deal if a publicly traded company was now performing assassinations of U.S. citizens.

Curious whether anyone has looked into this, or has thought much about baseline risk of assassinations or other forms of violence from economic actors.

Seth Herd
Ummm, wasn't one of them just about to testify against Boeing in court, on their safety practices? And they "committed suicide" after saying the day before how much they were looking forward to finally getting a hearing on their side of the story? That's what I read; I stopped at that point, thinking "about zero chance that wasn't murder".
habryka
I think the priors here are very low, so while I agree it looks suspicious, I don't think it's remotely suspicious enough to have the correct posterior be "about zero chance that wasn't murder". Corporations, at least in the U.S., really very rarely murder people.

That's true, but the timing and incongruity of a "suicide" the day before testifying seems even more absurdly unlikely than corporations starting to murder people. And it's not like they're going out and doing it themselves; they'd be hiring a hitman of some sort. I don't know how any of that works, and I agree that it's hard to imagine anyone invested enough in their job or their stock options to risk a murder charge; but they may feel that their chances of avoiding charges are near 100%, so it might make sense to them.

I just have absolutely no other way ... (read more)

Causality is rare! The usual statement that "correlation does not imply causation" puts them, I think, on deceptively equal footing. It's really more like: correlation is almost always not causation, absent something strong like an RCT or a robust study set-up.

Over the past few years I'd gradually become increasingly skeptical of claims of causality just by updating on empirical observations, but it just struck me that there's a good first principles reason for this.

For each true cause of some outcome we care to influence, there are many other "measurables" ... (read more)

Garrett Baker
This seems pretty different from Gwern's paper selection trying to answer this question in How Often Does Correlation=Causality?, where he concludes [...]. Also see his Why Correlation Usually ≠ Causation.
gwern

Those are not randomly selected pairs, however. There are 3 major causal patterns: A->B, A<-B, and A<-C->B. Daecaneus is pointing out that for a random pair of correlations of some variables, we do not assign a uniform prior of 33% to each of these. While it may sound crazy to try to argue for some specific prior like 'we should assign 1% to the direct causal patterns of A->B and A<-B, and 99% to the confounding pattern of A<-C->B', this is a lot closer to the truth than thinking that 'a third of the time, A causes B; a third of the... (read more)
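As a toy illustration of the confounded pattern, here's a small numpy simulation (arbitrary coefficients, purely illustrative) in which A and B are linked only through a shared cause C, yet routinely look "significantly" correlated:

```python
import numpy as np

# Draw many (A, B) pairs that share a cause C but do not cause each other,
# and count how often the sample correlation clears a conventional
# significance threshold anyway.
rng = np.random.default_rng(0)
n_pairs, n_samples, sig = 2000, 200, 0.14   # |r| > ~0.14 is roughly p < .05 at n = 200

spurious = 0
for _ in range(n_pairs):
    c = rng.normal(size=n_samples)            # confounder
    a = 0.5 * c + rng.normal(size=n_samples)  # A <- C
    b = 0.5 * c + rng.normal(size=n_samples)  # B <- C
    r = np.corrcoef(a, b)[0, 1]
    spurious += abs(r) > sig

print(f"{spurious / n_pairs:.0%} of purely confounded pairs look 'significantly' correlated")
```

With many such measurables hanging off shared causes, most observed correlations land in the A<-C->B bucket rather than the direct-causation ones.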

Anybody know how Fathom Radiant (https://fathomradiant.co/) is doing?

They’ve been working on photonics compute for a long time so I’m curious if people have any knowledge on the timelines they expect it to have practical effects on compute.

Also, Sam Altman and Scott Gray at OpenAI are both investors in Fathom. Not sure when they invested.

I’m guessing it’s still a long-term bet at this point.

OpenAI also hired someone who worked at PsiQuantum recently. My guess is that they are hedging their bets on the compute end and generally looking for opportunities on ... (read more)

I'm working on a post about this and energy bottlenecks, which I hope to publish within the next two hours - if anyone is interested in giving it a quick skim for feedback before then, let me know.

Edit: Post here.

A list of some contrarian takes I have:

  • People are currently predictably too worried about misuse risks

  • What people really mean by "open source" vs "closed source" labs is actually "responsible" vs "irresponsible" labs, which is not affected by regulations targeting open source model deployment.

  • Neuroscience as an outer alignment[1] strategy is embarrassingly underrated.

  • Better information security at labs is not clearly a good thing, and if we're worried about great power conflict, probably a bad thing.

  • Much research on deception (Anthropic's re

... (read more)
Olli Järviniemi
If you have the slack, I'd be interested in hearing/chatting more about this, as I'm working (or trying to work) on the "real" "scary" forms of deception. (E.g. do you think that this paper has the same failure mode?)

I'd be happy to chat. Will DM so we can set something up.

On the subject of your paper, I do think it looks at a much more interesting phenomenon than, say, sleeper agents, but I'm also not fully convinced you're studying deliberative instrumentally convergent deception either. I think your subsequent follow-ups narrowing down hypotheses mostly consider too narrow a range of ways the model could think. That is to say, I think you assume your model is some unified coherent entity that always acts cerebrally, & I'm skeptical of that.

For example, the model... (read more)

Garrett Baker
Everyone I talk with disagrees with most of these. So maybe we just hang around different groups.
Neil

I'm working on a non-trivial.org project meant to assess the risk of genome sequences by comparing them to a public list of the most dangerous pathogens we know of. This would be used to assess the risk from both experimental results in e.g. BSL-4 labs and the output of e.g. protein folding models. The benchmarking would be carried out by an in-house ML model of ours. Two questions to LessWrong: 

1. Is there any other project of this kind out there? Do BSL-4 labs/AlphaFold already have models for this? 

2. "Training a model on the most dangerous pa... (read more)

I used to have an idea for a karma/reputation system: repeatedly recalculate karma weighted by the karma of the upvoters and downvoters on a comment (then normalize to avoid hyperinflation) until a fixed point is reached.

I feel like this is vaguely somehow related to:

Dagon
So, https://en.wikipedia.org/wiki/PageRank ?
Abhimanyu Pallavi Sudhir
Oh right, lol, good point.

Also check out "personalized pagerank", where the rating shown to each user is "rooted" in what kind of content this user has upvoted in the past. It's a neat solution to many problems.
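A minimal Python sketch of the fixed-point recalculation described at the top of this thread (made-up votes; renormalizing each round to avoid hyperinflation). It's essentially power iteration on the vote graph, which is why PageRank is the natural comparison:

```python
import numpy as np

# votes[u][c] = +1 / -1 / 0: user u's vote on comment c; authors[c] = who wrote c.
# Each round: a comment's karma is the sum of votes weighted by the voters' karma,
# a user's karma is the sum of their comments' karma, then everything is renormalized.
votes = np.array([
    [+1,  0, -1],
    [+1, +1,  0],
    [ 0, -1, +1],
])                      # 3 users x 3 comments, hypothetical
authors = [0, 1, 2]     # comment i was written by user authors[i]

user_karma = np.ones(3)
for _ in range(100):                                 # iterate toward a fixed point
    comment_karma = user_karma @ votes               # each vote weighted by the voter's karma
    user_karma = np.zeros(3)
    for c, a in enumerate(authors):
        user_karma[a] += comment_karma[c]            # a user's karma = karma of their comments
    user_karma /= np.abs(user_karma).mean() + 1e-12  # normalize to avoid hyperinflation
    # (note: in this naive version, a negative-karma user's votes count inverted)

print(np.round(comment_karma, 2), np.round(user_karma, 2))
```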

I had speculated previously about links between task arithmetic and activation engineering. I think that, given all the recent results on in-context learning, task/function vectors, and activation engineering / their compositionality (In-Context Learning Creates Task Vectors, In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering, Function Vectors in Large Language Models), this link is confirmed to a large degree. This might also suggest trying to import improvements to task arithmetic (e.g. Task Arithmetic i... (read more)

Bogdan Ionut Cirstea
Speculatively, it might also be fruitful to go about this the other way round, e.g. trying to come up with better weight-space task erasure methods by analogy with concept erasure methods (in activation space), via the task arithmetic - activation engineering link.

For the pretraining-finetuning paradigm, this link is now made much more explicitly in Cross-Task Linearity Emerges in the Pretraining-Finetuning Paradigm; as well as linking to model ensembling through logit averaging. 
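For reference, a minimal sketch of the weight-space side of that link, i.e. task arithmetic on finetuning deltas (random arrays standing in for model weights):

```python
import numpy as np

# Task arithmetic: a "task vector" is the delta between finetuned and base
# weights; task vectors can be added, subtracted, and scaled.
def task_vector(finetuned, base):
    return {k: finetuned[k] - base[k] for k in base}

def apply(base, vector, alpha=1.0):
    return {k: base[k] + alpha * vector[k] for k in base}

rng = np.random.default_rng(0)
base = {"layer1": rng.normal(size=(4, 4)), "layer2": rng.normal(size=(4,))}
ft_task_a = {k: v + 0.1 * rng.normal(size=v.shape) for k, v in base.items()}
ft_task_b = {k: v + 0.1 * rng.normal(size=v.shape) for k, v in base.items()}

vec_a = task_vector(ft_task_a, base)
vec_b = task_vector(ft_task_b, base)
multi_task = apply(base, {k: vec_a[k] + vec_b[k] for k in base})  # compose tasks
forget_a   = apply(base, vec_a, alpha=-1.0)                       # negate to "unlearn"
```

The activation-engineering analogue applies the same add/subtract/scale operations to activation-space vectors instead of weight deltas.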
