All of Dave Orr's Comments + Replies

The thing you're missing is called instruction tuning. You gather a set of prompt/response pairs and fine-tune the model on that data. Do it right and you have a chatty model.

Thanks, Zvi, these roundups are always interesting.

I have one small suggestion, which is that you limit yourself to one Patrick link per post. He's an interesting guy but his area is quite niche, and if people want his fun stories about banking systems they can just follow him. I suspect that people who care about those things already follow him, and people who don't aren't that interested to read four items from him here.

I feel like a lot of the issues in this post are that the published RSPs are not very detailed and most of the work to flesh them out is not done. E.g. the comparison to other risk policies highlights lack of detail in various ways.

I think it takes a lot of time and work to build out something with lots of analysis and detail, potentially years of work to really do it right. And yes, much of that work hasn't happened yet.

But I would rather see labs post the work they are doing as they do it, so people can give feedback and input. If labs do so, the framewo... (read more)

Thanks for your comment. 

I feel like a lot of the issues in this post are that the published RSPs are not very detailed and most of the work to flesh them out is not done.

I strongly disagree with this. In my opinion, a lot of the issue is that RSPs have been designed from first principles, without much consideration for everything the risk management field has done, and hence do the wrong things without noticing. 

It's not a matter of how detailed they are; they get the broad principles wrong. As I argued (the entire table is about this) I think... (read more)

I agree with all of this. It's what I meant by "it's up to all of us."

It will be a signal of how things are going if in a year we still have only vague policies, or if there has been real progress in operationalizing the safety levels, detection, what the right reactions are, etc.

That's fair, I think I misread you. I guess our biggest differences are (i) I don't think the takeaway depends so strongly on whether AI developers are trying to do the right thing---either way it's up to all of us, and (ii) I think it's already worth talking about ways which Anthropic's RSP is good or bad or could be better, and so I disagree with "there's probably not much to say at this point."
Dave Orr · 1mo

I think there are two paths, roughly, that RSPs could send us down. 

  1. RSPs are a good starting point. Over time we make them more concrete, build out the technical infrastructure to measure risk, and enshrine them in regulation or binding agreements between AI companies. They reduce risk substantially, and provide a mechanism whereby we can institute a global pause if necessary, which seems otherwise infeasible right now.
  2. RSPs are a type of safety-washing. They provide the illusion of a plan, but as written they are so vague as to be meaningless. T
... (read more)

But I also suspect that people on the more cynical side aren't going to be persuaded by a post like this. If you think that companies are pretending to care about safety but really are just racing to make $$, there's probably not much to say at this point other than, let's see what happens next.

This seems wrong to me. We can say all kinds of things, like:

  • Are these RSPs actually effective if implemented? How could they be better? (Including aspects like: how will this policy be updated in the future? What will happen given disagreements?)
  • Is there external v
... (read more)

If you think that Anthropic and other labs that adopt these are fundamentally well meaning and trying to do the right thing, you'll assume that we are by default heading down path #1. If you are more cynical about how companies are acting, then #2 may seem more plausible.

I disagree that what you think about a lab's internal motivations should be very relevant here. For any particular lab/government adopting any particular RSP, you can just ask, does having this RSP make it easier or harder to implement future good legislation? My sense is that the answ... (read more)

New York City Mayor Eric Adams has been using ElevenLabs AI to create recordings of him in languages he does not speak and using them for robocalls. This seems pretty not great.


Can you say more about why you think this is problematic? Recording his own voice for a robocall is totally fine, so the claim here is that AI involvement makes it bad? 

Yes he should disclose somewhere that he's doing this, but deepfakes with the happy participation of the person whose voice is being faked seems like the best possible scenario.

Yes and no. The main mode of harm we generally imagine is to the person deepfaked. However, nothing prevents the main harm in a particular incident from being to the people who see the deepfake and believe the person depicted actually said and did the things shown. That appears to be the implicit allegation here: that recipients might be deceived into thinking Adams actually speaks their language (at least well enough to record a robocall). Or at least, if that's not it, then I don't get it either.

FWIW as an executive working on safety at Google, I basically never consider my normal working activities in light of what they would do to Google's stock price.

The exception is around public communication. There I'm very careful because it's asymmetrical -- I could potentially cause a PR disaster that would hurt the stock, but I don't see how I could give a talk so good that it helps it.

Maybe a plug pulling situation would be different, but I also think it's basically impossible for it to be a unilateral situation, and if we're in such a moment, I hardly think any damage would be contained to Google's stock price, versus say the market as a whole.

Hmm, I do think that is something that seems pretty likely to change?  I expect safety researchers to be consulted quite a bit on regulations that will affect Google pretty heavily, e.g. any given high-level safety researcher currently has a decent chance to testify in front of Congress, and like, I would want them to feel comfortable taking actions that definitely would have a large effect on the Google stock price (like saying that Google's AGI program should be shut down completely, or nationalized, or Google should be held liable for some damages caused by its AI systems).

How much do you think that your decisions affect Google's stock price? Yes maybe more AI means a higher price, but on the margin how much will you be pushing that relative to a replacement AI person? And mostly the stock price fluctuates on stuff like how well the ads business is doing, macro factors, and I guess occasionally whether we gave a bad demo.  

It feels to me like the incentive is just so diffuse that I wouldn't worry about it much.

Your idea of just donating extra gains also seems fine.

As I said in the dialogue, I think as a safety engineer, especially as someone who might end up close to the literal or metaphorical "stop button", the effect here seems to me to be potentially quite large, especially in aggregate.

That's not correct, or at least not how my Google stock grants work. The price is locked in at grant time, not vest time. In practice what that means is that you get x shares every month, which counts as income when multiplied by the current stock price.

And then you can sell them or whatever, including having a policy that automatically sells them as soon as they vest.
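A toy illustration of those mechanics (all numbers invented): the share count per vest is fixed up front, and each vest counts as income at the then-current price.

```python
# A grant of 240 shares vesting evenly over 24 months: the share count
# per vest is fixed at grant time; income depends on the vest-day price.

shares_per_vest = 240 / 24          # 10 shares per month, fixed up front
vest_prices = {"jan": 100.0, "feb": 120.0, "mar": 90.0}

# Income each month = fixed share count times the current stock price.
income = {month: shares_per_vest * price for month, price in vest_prices.items()}
```

Same share count every month; the income just moves with the current price, which is why selling on vest removes the ongoing exposure.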

The star ratings are an improvement, I had felt also that breakthrough was overselling many of the items last week.

However, stars are very generic and don't capture the concept of a breakthrough very well. You could consider a lightbulb.

I also asked chatgpt to create an emoji of an AI breakthrough, and after some iteration it came up with this:

Use it if you like it!

Thanks for putting together this roundup, I learn things from it every time.

I agree with this.

Consider a hypothetical: there are two drugs we could use to execute prisoners sentenced to the death penalty. One of them causes excruciating pain; the other does not, but costs more.

Would we feel that we would rather use the torture drug? After all, the dude is dead, so he doesn't care either way.

I have a pretty strong intuition that those drugs are not similar. Same thing with the anesthesia example.

HT Michael Thiessen, who expects this to result in people figuring out how to extract the (distilled) model weights. Is that inevitable?


Not speaking for Google here.

I think it's inevitable, or at least it's impossible to stop someone willing to put in the effort. The weights are going to be loaded into the phone's memory, and a jailbroken phone should let you have access to the raw memory.

But it's a lot of effort and I'm not sure what the benefit would be to anyone. My guess is that if this happens it will be by a security researcher or some enterprising grad student, not by anyone actually motivated to use the weights for anything in particular.

I could see the illustrations via RSS, but don't see them here (Chrome on mobile).

Tony Karlsson · 2mo
I have read some articles; I can have a look again, thanks. There was too much information and too many possibilities, so it felt better to talk to a human.

The main place we differ is that we are on opposite sides of the ‘will Tether de-peg?’ market. No matter what they did in the past, I now see a 5% safe return as creating such a good business that no one will doubt their ability to pay. Sometimes they really do get away with it, ya know?

This seems sensible, but I remember thinking something very similar about Full Tilt, and then they turned out to be doing a bunch of shady shit that was very not in their best interest. I think there's a significant chance that fraudsters gonna fraud even when they really shouldn... (read more)

Pradyumna: You a reasonable person: the city should encourage carpooling to reduce congestion

Bengaluru’s Transport Department (a very stable genius): Taxi drivers complained and so we will ban carpooling


It's not really that Bangalore banned carpooling, they required licenses for ridesharing apps. Maybe that's a de facto ban of those apps, but that's a far cry from banning carpooling in general.


Partly this will be because in fact current ML systems are not analogous to future AGI in some ways - probably if you tell the AGI that A is B, it will also know that B is A.

One oddity of LLMs is that we don't have a good way to tell the model that A is B in a way that it can remember. Prompts are not persistent, and as this paper shows, fine-tuning doesn't do a good job of getting a fact into the model without a bunch of paraphrasing. Pretraining presumably works in a similar way.

This is weird! And I think it helps make sense of some of the problems we see with current language models.
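For illustration, here's roughly what "a bunch of paraphrasing" might look like as data augmentation before fine-tuning. The templates and the helper name are invented; the point is just that one fact gets expanded into many surface forms, including the reversed "B is A" direction.

```python
# Invented templates: expand a single "A is B" fact into several
# paraphrases, in both directions, before building fine-tuning data.

def paraphrase_fact(a: str, b: str) -> list[str]:
    forward = [
        f"{a} is {b}.",
        f"Everyone knows that {a} is {b}.",
        f"Q: Who is {a}? A: {b}.",
    ]
    backward = [
        f"{b} is {a}.",
        f"People sometimes refer to {b} as {a}.",
    ]
    return forward + backward

examples = paraphrase_fact("Tom Cruise's mother", "Mary Lee Pfeiffer")
```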

Michael Tontchev · 2mo
Maybe our brains do a kind of expansion of a fact before memorizing it and its neighbors in logic space.
Yes, the model editing literature has various techniques and evaluations for trying to put a fact into a model.  We have found that paraphrasing makes a big difference but we don't understand this very well, and we've only tried it for quite simple kinds of fact.

45->55% is a 22% relative gain, while 90->100% is only an 11% gain. 

On the other hand, 45->55% is a reduction in error by 18%, while 90->100% is a 100% reduction in errors.

Which framing is best depends on the use case. Preferring one naively over the other is definitely an error. :)
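The two framings, as a quick sanity check:

```python
def relative_gain(old_acc: float, new_acc: float) -> float:
    """Relative improvement in accuracy."""
    return (new_acc - old_acc) / old_acc

def error_reduction(old_acc: float, new_acc: float) -> float:
    """Fraction of errors eliminated (error rate = 1 - accuracy)."""
    return (new_acc - old_acc) / (1 - old_acc)

gain_low = relative_gain(0.45, 0.55)     # ~0.22: big relative gain
err_low = error_reduction(0.45, 0.55)    # ~0.18: modest error reduction
gain_high = relative_gain(0.90, 1.00)    # ~0.11: small relative gain
err_high = error_reduction(0.90, 1.00)   # 1.0: every error is gone
```

Same two numbers, opposite-looking conclusions, which is exactly why the framing has to match the use case.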

I think the argument against LeCun is simple: while it may be true that AIs won't necessarily have a dominance instinct the way that people do, they could try to dominate for other reasons: namely that such dominance is an instrumental goal towards whatever its objective is. And in fact that is a significant risk, and can't be discounted by pointing out that they may not have a natural instinct towards dominance.

I just think that to an economist, models and survey results are different things, and he's not asking for the latter.

I think that Tyler is thinking more of an economic type model that looks at the incentives of various actors and uses that to understand what might go wrong and why. I predict that he would look at this model and say, "misaligned AI can cause catastrophes" is the hand-wavy bit that he would like to see an actual model of.

I'm not an economist (is IANAE a known initialization yet?), but it would probably include things like the AI labs, the AIs, and potentially regulators or hackers/thieves, try to understand and model their incentives and behaviors, and see... (read more)

Sammy Martin · 3mo
I guess it is down to Tyler's personal opinion, but would he accept asking IR and defense policy experts on the chance of a war with China as an acceptable strategy or would he insist on mathematical models of their behaviors and responses? To me it's clearly the wrong tool, just as in the climate impacts literature we can't get economic models of e.g. how governments might respond to waves of climate refugees but can consult experts on it.

So... when can we get the optimal guide, if this isn't it? :)

In general to solve an NP complete problem like 3-SAT, you have to spend compute or storage to solve it. 

Suppose you solve one 3-SAT problem. If you don't write down the solution and steps along the way, then you have no way to get the benefit of the work for the next problem. But if you do store the results of the intermediate steps, then you need to store data that's also polynomial in size.

In practice often you can do much better than that because the problems you're solving may share certain data or characteristics that lead to shortcuts, but in the general case you have to pay the cost every time you need to solve an NP complete problem.
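A brute-force sketch of what "paying the cost every time" looks like: without stored intermediate results, each instance means searching up to 2^n assignments from scratch. This is an illustrative toy solver, not a practical one.

```python
from itertools import product

def solve_3sat(clauses, n_vars):
    """Brute-force SAT: try all 2^n assignments.

    A clause is a tuple of literals; literal k means variable k is true,
    -k means variable k is false (variables numbered from 1).
    """
    for bits in product([False, True], repeat=n_vars):
        def value(lit):
            return bits[abs(lit) - 1] if lit > 0 else not bits[abs(lit) - 1]
        if all(any(value(lit) for lit in clause) for clause in clauses):
            return bits  # satisfying assignment found; nothing is cached
    return None

# (x1 or x2 or not x3) and (not x1 or x3 or x2)
sol = solve_3sat([(1, 2, -3), (-1, 3, 2)], 3)
```

Real solvers (DPLL, CDCL) do much better on typical instances by learning clauses as they go, which is the "storing intermediate results" trade-off in practice.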

So that means you can't, in general, gain cost advantages from having solved other problems of the same or similar computational complexity?

If one person estimates the odds at a billion to one, and the other at even, you should clearly bet the middle. You can easily construct bets that offer each of them a very good deal by their lights and guarantee you a win. This won't maximize your EV but seems pretty great if you agree with Nick.
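To make the arbitrage concrete, here's a sketch with made-up stakes (all numbers invented for illustration):

```python
# Two people disagree wildly about P(event): A says a billion to one
# (p ~ 1e-9), B says even odds (p = 0.5). Bet the middle against both.

p_a, p_b = 1e-9, 0.5

# Bet with A: A pays you 1000 if the event happens; you pay A 1 if not.
# Bet with B: you pay B 100 if the event happens; B pays you 90 if not.

profit_if_event = 1000 - 100     # collect from A, pay B
profit_if_no_event = 90 - 1      # collect from B, pay A

# Each party sees a positive expected value by their own beliefs:
a_ev = p_a * (-1000) + (1 - p_a) * 1    # A's EV, ~ +1
b_ev = p_b * 100 + (1 - p_b) * (-90)    # B's EV, +5
```

Both bets look great to their takers, and you profit in either outcome.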

Anthropic reportedly got a $4B valuation on negligible revenue. Cohere is reportedly asking for a $6B valuation on maybe a few $M in revenue.

AI startups are getting pretty absurd valuations based on I'm not sure what, but I don't think it's ARR.

Thanks! I mentioned Anthropic in the post, but would similarly find it interesting if someone did a write-up about Cohere. It could be that OAI is not representative for reasons I don't understand.

I'm not sure multiple of revenue is meaningful right now. Nobody is investing in OAI because of their current business. Also there are tons of investments at infinite multiples once you realize that many companies get investments with no revenue.

1. Yep, revenue multiples are a heuristic for expectations of future growth, which is what I care about.
2. This is true, but I'm not aware of any investments on $0 revenue at the $10B scale. Would love to hear of counterexamples if you know of any![1]

[1] Instagram is the closest I can think of, but that was ~20x smaller and an acquisition, not an investment.

I mean, computers aren't technically continuous and neither are neural networks, but if your time step is small enough they are continuous-ish. It's interesting that that's enough.

I agree music would be a good application for this approach.
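A minimal sketch of the "continuous-ish" point: a discrete simulation of a simple ODE (dx/dt = -x) converges to the continuous solution as the time step shrinks. Toy example, not an actual liquid network.

```python
import math

def simulate(dt: float, t_end: float = 1.0) -> float:
    """Discrete (Euler) simulation of dx/dt = -x with x(0) = 1."""
    x = 1.0
    for _ in range(round(t_end / dt)):
        x += dt * (-x)
    return x

exact = math.exp(-1.0)                  # the continuous answer at t = 1
coarse = abs(simulate(0.1) - exact)     # big steps: noticeable error
fine = abs(simulate(0.001) - exact)     # small steps: continuous-ish
```

The discrete machine never computes anything truly continuous; small enough steps just make the difference negligible.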

I think this is real, in the sense that they got the results they are reporting and this is a meaningful advance. Too early to say if this will scale to real world problems but it seems super promising, and I would hope and expect that Waymo and competitors are seriously investigating this, or will be soon. 

Having said that, it's totally unclear how you might apply this to LLMs, the AI du jour. One of the main innovations in liquid networks is that they are continuous rather than discrete, which is good for very high bandwidth exercises like vision. O... (read more)

Thanks for your answer! Very interesting -- I didn't know about the continuous nature of LNNs; I would have thought you'd need different hardware (maybe an analog computer?) to handle continuous values. Maybe it could work for generative networks for images or music, which seem less discrete than written language.
Then again... the output of an LLM is a stream of tokens (yeah?). I wonder what applications LTCs could have as a post-processor for LLM output? No idea what I'm really talking about though.

Usually "any" means each person in the specific class individually. So perhaps not groups of people working together, but a much higher bar than a randomly sampled person.

But note that Richard doesn't think that "the specific 'expert' threshold will make much difference", so probably the exact definition of "any" doesn't matter very much for his thoughts here.

Similar risk to Christiano, which might be medium by LessWrong standards but is extremely high compared to the general public.

High risk tolerance (used to play poker for a living, comfortable with somewhat risky sports like climbing or scuba diving). Very low neuroticism, medium conscientiousness. I spend a reasonable amount of time putting probabilities on things, decently calibrated. Very calm in emergency situations.

I'm a product manager exec mostly working on applications of language AI. Previously an ml research engineer.

I don't actually follow -- how does change blindness in people relate to how much stuff you have to design?

I assumed you meant that you (as the one running the simulation) had arranged for people to be change-blind. Which means that there's no particular reason that you yourself would be change-blind. So you can't just make the people copies of yourself, or the world a copy of your own world. You have to design them from scratch, and then put together a whole history for the universe so that their having evolved to be change-blind fits with the supposed past. On edit: and of course you can't just let them evolve and assume they'll be change-blind, unless you have a pretty darned impressive ability to predict how that will come out.

Suppose you were running a simulation, and it had some problems around object permanence, or colors not being quite constant (colors are surprisingly complicated to calculate since some of them depend on quantum effects), or other weird problems. What might you do to help that? 

One answer might be to make the intelligences you are simulating ignore the types of errors that your system makes. And it turns out that we are blind to many changes around us!

Or conversely, if you are simulating an intelligence that happens to have change blindness, then you ... (read more)

Doesn't that mean you have to do an awful lot of work to design everything in tremendous detail, and also fabricate the back story?

One thing that I think is missing (maybe just beyond the scope of this post) is thinking about newcomers with a positive frame: how do we help them get up to speed, be welcomed, and become useful contributors?

You could imagine periodic open posts, for instance, where we invite 101-style questions, post your objection to AI risks, etc where more experienced folks could answer those kind of things without cluttering up the main site. Possibly multiple more specific such threads if there's enough interest.

Then you can tell people who try to post level 1-3 stu... (read more)

Oh, oops. I originally intended to include my thoughts about newcomers in this post and must have gotten distracted before I did so. I just added a whole section on that. Thanks for flagging. Periodic open posts is one of my current favorite ideas of what to do here.

Yeah I think the AI Questions Open Thread series has been good for this (and I've been directing people to that where appropriate).

I've also been mulling over how to make a good "okay, you're new, you want to contribute to the AI discussion on LessWrong, what stuff is useful to do?" post.

Let me suggest a different direction.

The risk is that a niche candidate will make the idea too associated with them, which will let everyone else off the hook -- it's easy to dismiss a weirdo talking about weird stuff.

A better direction might be to find a second tier candidate that wants to differentiate themselves, and help them with good snappy talking points that sound good in a debate. I think that's both higher impact and has a much smaller chance of pushing things in the wrong direction accidentally.

Andrew Yang. He signed the FLI letter, transformative AI was a core plank of his run in 2020, and he made serious runs for president and NYC mayor. 

YouGov is a solid but not outstanding Internet pollster.

Still have to worry about selection bias with Internet polls, but I don't think you need to worry that they have a particular axe to grind here.

This seems like an argument that proves too much. Many times, people promising simple solutions to complex problems are scammers or just wrong. But we also have lots of times where someone has an insight that cuts to the core of a problem, and we get great solutions that are much better and more scalable than what came before.

Maybe the author is on to something, but I think the idea needs to go one level deeper: what distinguishes real innovation from "solutionism"?

Also, his argument about why making work more efficient doesn't have any upside is so bafflingly wrongheaded that I highly doubt there are genuine insights to mine here.

Here's one argument:

Consumption is great when you get something in return that improves your life in some way. Convenience, saving time, and things that you use are all great.

However, there's a ton of consumption in terms of buying things that don't add utility, at least not at a reasonable return. People buy exercise bikes that they don't use, books that they don't read, panini presses that just sit on the counter, and lives become more cluttered and less enjoyable.

One reason for this is the hedonic treadmill, that our happiness reverts to a mean over tim... (read more)

Next time I would actually include the definition of a technical term like Leibniz's first principle to make this post a little less opaque, and therefore more interesting, to non-experts.

Sven Nilsen · 8mo
Thank you! Post updated to include the definition.

This. If they had meant 19% fewer hallucinations, they would have said a 19% reduction in whatever, which is a common way to talk about relative improvements in ML.

For sure, product risk aversion pushes people who don't want pure research roles toward places where they can have some impact. I think this is basically fine -- I don't think product risk is all that concerning, at least for now.

Misalignment risk would be a different story but I'm not aware of cases where people moved because of it. (I might not have heard, of course.)

There's a subtlety here around the term risk.

Google has been, IMO, very unwilling to take product risk, or risk a PR backlash of the type that Blenderbot or Sydney have gotten. Google has also been very nervous about perceived and actual bias in deployed models.

When people talk about red tape, it's not the kind of red tape that might be useful for AGI alignment, it's instead the kind aimed at minimizing product risks. And when Google says they are willing to take on more risk, here they mean product and reputational risk.

Maybe the same processes that would... (read more)

Justin Olive · 9mo
Hi Dave, thanks for the great input from the insider perspective. Do you have any thoughts on whether risk-aversion (either product-related or misalignment-risk) might be contributing to a migration of talent towards lower-governance zones? If so, are there any effective ways to combat this that don't translate to accepting higher levels of risk?

I feel like every week there's a post that says, I might be naive but why can't we just do X, and X is already well known and not considered sufficient. So it's easy to see a post claiming a relatively direct solution as just being in that category.

The amount of effort and thinking in this case, plus the reputation of the poster, draws a clear distinction between the useless posts and this one, but it's easy to imagine people pattern matching into believing that this is also probably useless without engaging with it.

(Ah, to clarify: I wasn't saying that Kaj's post seems insane; I was referring to the fact that lots of thinking/discourse in general about AI seems to be dangerously insane.)

FWIW I at least found this to be insightful and enlightening. This seems clearly like a direction to explore more and one that could plausibly pan out.

I wonder if we would need to explore beyond the current "one big transformer" setup to realize this. I don't think humans have a specialized brain region for simulations (though there is a region that seems heavily implicated, see, but if you want to train something using gradient descent, it might b... (read more)

I think in practice roughly the opposite is true.

As people age, they become less flexible in their beliefs and more set in their ways. If they are highly influential, then it's difficult to make progress when they are still alive.

Science advances one funeral at a time:'s_principle

It's true that as you age you accumulate experience, and this stops when you die. For you, death is of course a hard limit on knowledge.

For the world, it's much less clear.

"Science advances one funeral at a time" -> this seems to be both generally not true and a harmful meme (because it is a common argument used against life extension research). 

Nuclear submarines (1870, Twenty Thousand Leagues Under the Sea)

Time travel (1895, The Time Machine)

This seems closely related to the concept of weirdness points. 

I certainly am careful about how "lively" I appear in many settings, so that it doesn't become a distraction or cause social penalties to me or whatever aim I'm trying to accomplish. This is the way that societies work -- we all have shared norms for many interactions that allow for violations up to a point, and then much more freedom in private or with trusted friends and family.

And of course what counts as weird in any group depends on the group.  At work, advocating for cryonics ma... (read more)

It might be closely related but I think you're leaving a large argumentative gap here: why should 'weirdness points' have anything to do with 'social predators'? Why do you need to tailor one to the other? I'd suggest two possible arguments stemming from the same basic consequence of weirdness increasing danger.*

Having lots of (visible) weirdness is dangerous because it singles you out for predation: the weirdnesses themselves are probably intrinsically dangerous vulnerabilities (if they were not considered dubious or disgusting or unpopular or immoral, wh... (read more)

I predict that rather than LLMs being trained on ASR-generated text, they will be upgraded to be multimodal, and trained on audio and video directly in addition to text.

Google has already discussed this publicly, e.g. here:

This direction makes sense to me, since these models have huge capacity, much greater than the ASR models do. Why chain multiple systems if you can learn directly from the data?

I do agree with your underlying point that there's massive amounts of audio and video data that haven't been used much yet, and those are a great and growing resource for LLM training.

In general I am no fan of AngelList syndicates because the fees are usurious, but if you have high conviction that there are huge returns to AI, possibly LLM syndicates might be worth a look.

TBF there's no way Eliezer would approve that prompt to a superhuman AI, so I think no is the correct answer there. The first explanation is vague but basically correct as to why, at least on my model of Eliezer.
