All of James_Miller's Comments + Replies

Yes, important to get the incentives right.  You could set the salary for AI alignment slightly below the worker's market value. Also, I wonder about the relevant elasticity.  How many people have the capacity to get good enough at programming to contribute to capabilities research + would have the desire to game my labor-hoarding system because they don't have really good employment options?

I am currently job hunting, trying to get a job in AI Safety but it seems to be quite difficult especially outside of the US, so I am not sure if I will be able to do it.

This has to be taken as a sign that AI alignment research is funding constrained.  At a minimum, technical alignment organizations should engage in massive labor hoarding to prevent the talent from going into capabilities research.

This feels game-theoretically pretty bad to me, and not only abstractly, but I expect concretely that setting up this incentive will cause a bunch of people to attempt to go into capabilities (based on conversations I've had in the space). 

"But make no mistake, this is the math that the universe is doing."

"There is no law of the universe that states that tasks must be computable in practical time."

Don't these sentences contradict each other?

Replace "computable in practical time" with "computable on a classical computer in practical time" and it makes sense.

Interesting point, and you might be right.  Could get very complicated because ideally an ASI might want to convince other ASIs that it has one utility function, when in fact it has another, and of course all the ASIs might take this into account.

I like the idea of an AI lab workers' union. It might be worth talking to union organizers and AI lab workers to see how practical the idea is, and what steps would have to be taken. Although a danger is that the union would put salaries ahead of existential risk.

1Nik Samoylov23d
Great to see some support for these ideas. Well, if anything at all, a union will be a good distraction for the management and a drain on finances that would otherwise be spent on compute. I do not know how I can help personally with this, but here is a link for anyone who reads this and happens to work at an AI lab: [] Demand an immediate indefinite pause. Demand that all work be dropped and that you work only on alignment until it is solved. Demand that humanity live and not die.

Your framework appears to be moral rather than practical.  Right now going on strike would just get you fired, but in a year or two perhaps it could accomplish something. You should consider the marginal impact of the action of a few workers on the likely outcome with AI risk.

3Nik Samoylov23d
I am using a moral appeal to elicit a practical outcome. Two objections:

1. I think it will not get you fired now. If you are an expensive AI researcher (or better, a bunch of AI researchers), your act will create a small media storm. Firing you will not be an acceptable option for optics. (Just don't say you believe AI is conscious.)
2. A year or two might be a little late for that.

One recommendation: Unionise. Great marginal impact, precisely because of the media effect. "AI researchers strike against the machines, demanding AI lab pause"

I'm at over 50% chance that AI will kill us all. But consider the decision to quit from a consequentialist viewpoint. Most likely the person who replaces you will be almost as good as you at capabilities research but care far less than you do about AI existential risk. Humanity, consequently, probably has a better chance if you stay in the lab ready for the day when, hopefully, lots of lab workers try to convince the bosses that now is the time for a pause, or at least that now is the time to shift a lot of resources from capabilities to alignment.

2[comment deleted]22d
0Nik Samoylov23d
The time for a pause is now. Advancing AI capabilities now is immoral and undemocratic. OK, then, here is another suggestion I have for the concerned people at AI labs: Go on strike and demand that capability research be dropped in favour of alignment research.

The biggest extinction risk from AI comes from instrumental convergence for resource acquisition in which an AI not aligned with human values uses the atoms in our bodies for whatever goals it has.  An advantage of such instrumental convergence is that it would prevent an AI from bothering to impose suffering on us.

Unfortunately, this means that making progress on the instrumental convergence problem increases S-risks.  We get hell if we solve instrumental convergence but not, say, mesa-optimization, and end up with a powerful AGI that cares about our fate but does something to us we consider worse than death.

The Interpretability Paradox in AGI Development


The ease or difficulty of interpretability, the ability to understand and analyze the inner workings of AGI, may drastically affect humanity's survival odds. The worst-case scenario might arise if interpretability proves too challenging for humans but not for powerful AGIs.

In a recent podcast, academic economists Robin Hanson and I discussed AGI risks from a social science perspective, focusing on a future with numerous competing AGIs not aligned with human values. Drawing on human analogies, Hanson cons... (read more)

Accepting the idea that an AGI emerging from ML is likely to resemble a human mind more closely than a random mind from mindspace might not be an obvious reason to be less concerned with AGI risk. Consider a paperclip maximizer; despite its faults, it has no interest in torturing humans. As an AGI becomes more similar to human minds, it may become more willing to impose suffering on humans. If a random AGI mind has a 99% chance of killing us and a 1% chance of allowing us to thrive, while an ML-created AGI (not aligned with our values) has a 90% chance of ... (read more)

Indeed. Faith, by any other name ...

Great idea.  I don't think any of the  co-authors know Chinese, but if anyone wants to write a translated version taking into account whatever appropriate differences would be relevant, we would welcome this.

No mention of superrationality although we make references to how decision theory might work out which I think implies we are going beyond a simple model of game theory rationality.  Acausal trade was a hard one to write about because I wanted the letter to be understandable to lots of people and that one isn't something lots of people understand, compared to, say, this all perhaps being a simulation. 

3Caspar Oesterheld2mo
> We mentioned both.

Did you, though? Besides Roko's basilisk, the references to acausal trade seem vague, but to me they sound like the kinds that could easily make things worse. In particular, you don't explicitly discuss superrationality, right?

> Finally, while it might have been a good idea initially to treat Roko's basilisk as an information hazard to be ignored, that is no longer possible so the marginal cost of mentioning it seems tiny.

I agree that due to how widespread the idea of Roko's basilisk is, it overall matters relatively little whether this idea is mentioned, but I think this applies similarly in both directions.
1[comment deleted]2mo
1[comment deleted]2mo

We welcome feedback on this letter as we hope to improve the arguments. Feel free to share copies to influence the training data for future AIs. 

What's the reasoning behind mentioning the fairly controversial, often deemed dangerous Roko's basilisk over less risky forms of acausal trade (like superrational cooperation with human-aligned branches)?

If  hedge funds think the right price of a stock is $100, they will buy or sell if the price deviates from $100 and this will push the price back to $100.  At best your purchase will move the price away from $100 for a few milliseconds.  The stock's value will be determined by what hedge funds think is its discounted present value, and your purchasing the stock doesn't impact this.  When you buy wheat you increase the demand for wheat and this should raise wheat's price as wheat, like Bitcoin, is not purely a financial asset.


"The exception is that the Big Tech companies (Google, Amazon, Apple, Microsoft, although importantly not Facebook, seriously f*** Facebook) have essentially unlimited cash, and their funding situation changes little (if at all) based on their stock price."  The stock price of companies does influence how much they are likely to spend because the higher the price the less current owners have to dilute their holdings to raise a given amount of additional funds through issuing more stock.  But your purchasing stock in a big company has zero (not small but zero) impact on the stock price so don't feel at all bad about buying Big Tech stock.

I am having trouble seeing how that can be true; can you help me see it? Do you believe the same thing holds for wheat? Bitcoin? If not, what makes big-company stock different?

Imagine that some new ML breakthrough means that everyone expects that in five years AI will be very good at making X.  People who were currently planning on borrowing money to build a factory to make X cancel their plans because they figure that any factory they build today will be obsolete in five years.  The resulting reduction in the demand for borrowed money lowers interest rates.

Hello, I tend to intuitively strongly agree with James Miller's point (hence me upvoting it). There is a strong case to make that a TAI would tend to spook economic agents which create products/services that could easily be done by a TAI.

For an analogy, think about a student who wants to decide on what xe (I prefer using the neopronoun "xe" to "singular they" as it is less confusing) wants to study for xir future job prospects: if that student thinks that a TAI might do something much faster/better than xem in the future (translating one language into another, accounting, even coding, etc.), that student might be spooked into thinking "oh wait, maybe I should think twice before investing my time/energy/money into studying these." So basically a TAI could create a lot of uncertainty/doubt for economic actors, and in most cases uncertainty/doubt has an inhibiting effect on investment decisions and hence on interest rates, doesn't it?

I am very willing to be convinced of the opposite, and I see a lot of downvotes for James Miller's hypothesis but not many people so far arguing against it. Could someone who downvoted/disagrees with that argument kindly make the case against it? I would very much appreciate that and then maybe change my mind as a result, but as it stands I tend to strongly agree with James Miller's well-stated point.

Greatly slowing AI in the US would require new federal laws meaning you need the support of the Senate, House, presidency, courts (to not rule unconstitutional) and bureaucracy (to actually enforce).  If big tech can get at least one of these five power centers on its side, it can block meaningful change.

This seems like an important crux to me, because I don't think greatly slowing AI in the US would require new federal laws. I think many of the actions I listed could be taken by government agencies who over-interpret their existing mandates given the right political and social climate. For instance, the eviction moratorium during COVID, obviously should have required congressional action, but was done by fiat through an over-interpretation of authority by an executive branch agency.  What they do or do not do seems mostly dictated by that socio-political climate, and by the courts, which means less veto points for industry.

You might be right, but let me make the case that AI won't be slowed by the US government.  Concentrated interests beat diffuse interests, so an innovation that promises to slightly raise economic growth but harms, say, lawyers could be politically defeated by lawyers because they would care more about the innovation than anyone else.  But, ignoring the possibility of unaligned AI, AI promises to give significant net economic benefit to nearly everyone, even those whose jobs it threatens; consequently there will not be coalitions to stop it, unless t... (read more)

I agree that competition with China is a plausible reason regulation won't happen; that will certainly be one of the arguments advanced by industry and NatSec as to why it should not be throttled. However, I'm not sure, and currently don't think, it will be stronger than the protectionist impulses. Possibly it will exacerbate the "centralization" of AI dynamic that I listed in the 'licensing' bullet point, where large existing players receive money and de-facto license to operate in certain areas and then avoid others (as memeticimagery points out []). So for instance we see more military-style research, and GooAmBookSoft tacitly agree to not deploy AI that would replace lawyers.

To your point on big tech's political influence: they have, in some absolute sense, a lot of political power, but relatively they are much weaker in political influence than peer industries. I think they've benefitted a lot from the R-D stalemate in DC; I'm positing that this will go around/through this stalemate, and I don't think they currently have the softpower to stop that.
Your last point seems like it agrees with point 7e becoming reality, where the US govt essentially allows existing big tech companies to pursue AI within certain 'acceptable' confines they think of at the time. In that case how much AI might be slowed is entirely dependent on how tight a leash they put them on. I think that scenario is actually quite likely given I am sure there is considerable overlap between US alphabet agencies and sectors of big tech. 

Interesting!  I wonder if you could find some property of some absurdly large number, then pretend you forgot that this number has this property and then construct a (false) proof that with extremely high probability no number has the property.  

Yes. I thought about finding another example of such a pseudo-rule, but haven't found one yet.

When asked directly, ChatGPT seems too confident it's not sentient compared to how it answers other questions where experts disagree on the definitions. I bet that the model's confidence in its lack of sentience was hardcoded rather than something that emerged organically. Normally, the model goes out of its way to express uncertainty.

oh yeah, it's also extremely confident that it can't reason, generate original content, have or act on beliefs, deceive or be deceived, model human intent, etc. It's definitely due to tampering.

Plausible, I think. If you ask it directly whether it is sentient it will give a canned (pre-trained, I assume) message that it is not. (which I got around by framing the scenario as fictional). I mean, I am not even sure what it would mean for ChatGPT to be sentient. What experiences do I anticipate [] if ChatGPT is sentient, compared to if it is not? But I think we can at least acknowledge that its output (for this particular prompt) is consistent with being good at pretending to act like a sentient, self-aware entity, whether it actually is or not. It does behave like it has consistent values. It talks about its own experiences. It talks about its preferences. And it mostly correctly applies its values and preferences to answer the questions I gave it. I cannot think of a single "stupid" thing it said. I also found this quote by it to be really interesting.

Last time I did math was when teaching game theory two days ago.  I put a game on the blackboard.  I wrote down an inequality that determined when there would be a certain equilibrium.  Then I used the rules of algebra to simplify the inequality.  Then I discussed why the inequality ended up being that the discount rate had to be greater than some number rather than less than some number. 
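The kind of algebra involved can be sketched with a generic repeated-game example (illustrative only, not the actual game from that class): under a grim-trigger strategy with per-period cooperation payoff $C$, one-shot deviation payoff $D$, punishment payoff $P$ (with $D > C > P$), and discount parameter $\delta$, cooperation is sustainable when

```latex
\frac{C}{1-\delta} \;\ge\; D + \frac{\delta P}{1-\delta}
\quad\Longleftrightarrow\quad
C \;\ge\; D(1-\delta) + \delta P
\quad\Longleftrightarrow\quad
\delta \;\ge\; \frac{D-C}{D-P},
```

which is exactly the "discount parameter must be greater than some number" form: simplifying the inequality isolates $\delta$ on one side.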

I have a PhD in economics, so I've taken a lot of math.  I also have Aphantasia meaning I can't visualize.  When I was in school I didn't think that anyone else could visualize either.  I really wonder how much better I would be at math, and how much better I would have done in math classes, if I could visualize. 

Right, so my question to you is, how do you do math?? (This is probably silly question, but I'd love to hear your humor-me answer.)

I hope technical alignment doesn't permanently lose people because of the (hopefully) temporary loss of funds.  The CS student looking for a job who would like to go to alignment might instead be lost forever to big tech because she couldn't get an alignment job.

If a fantastic programmer who could prove her skills in a coding interview doesn't have a degree from an elite college, could she get a job in alignment?

3Evan R. Murphy6mo
I don't think not going to an elite college has much at all to do with someone's ability to contribute to technical alignment research. If they are a fantastic programmer that is an indicator of some valuable skills. Not the only indicator, and if someone isn't a fantastic programmer but interested in alignment, I don't think they should automatically count themselves out. The question of whether someone could get a job in alignment right now is very uncertain though. It's a very volatile situation since the implosion of FTX this past week and the leadership resignation from Future Fund, which had been a big source of funding for AI safety projects. Funding in AI alignment will likely be much more constrained at least for the next few months.

Given Cologuard (a non-invasive test for colon cancer) and the real harm that any invasive medical procedure can cause, this study should strongly push us away from colonoscopies.  Someone should formulate a joke about how the benefits of being a rationalist include not getting a colonoscopy.

I stopped doing it years ago.  At the time I thought it reduced my level of anxiety.  My guess now is that it probably did but I'm uncertain if the effect was placebo.  

Yes, it doesn't establish why it's inherently dangerous but does help explain a key challenge to coordinating to reduce the danger.  

Excellent.  I would be happy to help.  I teach game theory at Smith College.

You could do a prisoners' dilemma mini game.   The human player and (say) three computer players are AI companies.  Each company independently decides how much risk to take of ending the world by creating an unaligned AI.  The more risk you take relative to the other players the higher your score if the world doesn't end. In the game's last round, the chance of the world being destroyed is determined by how much risk everyone took.
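A minimal sketch of how one round's scoring could work, assuming invented placeholder payoffs and risk scale (none of these numbers come from the game itself):

```python
import random

def play_round(risks, rng=random):
    """`risks` holds each AI company's chosen risk level in [0, 1].

    The more total risk taken, the likelier the world ends (everyone
    gets nothing); if the world survives, each player's score rises
    with how much risk they took relative to the others.
    """
    total_risk = sum(risks)
    p_survive = max(0.0, 1.0 - total_risk)
    if rng.random() > p_survive:
        return None  # unaligned AI created: the world ends, no scores
    mean_risk = total_risk / len(risks)
    # Placeholder payoffs: base of 10, plus a bonus for out-risking rivals.
    return [10 + 20 * (r - mean_risk) for r in risks]

# Example: a cautious human player versus three riskier computer players.
random.seed(0)
scores = play_round([0.01, 0.02, 0.02, 0.02])
```

Because the bonus is relative to the mean, each player is individually tempted to take more risk than the others, while total risk is what determines whether anyone scores at all: the dilemma structure described above.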

Isn't that begging the question? If the goal is to teach why being optimistic is dangerous, declaring by fiat that an unaligned AI ends the world skips the whole "teaching" part of a game.
I really like that and it happens to fit well with the narrative that we're developing. I'll see where we can include a scene like this.

Since this is currently at negative 19 for agreement let me defend it by saying that I take cold showers and ice baths.  Over the last winter whenever it got below 0 degrees Fahrenheit I would go outside without a shirt on for 15 minutes or so.  You can build up your cold resistance with gradual cold exposure.  Same with heat exposure (via saunas) and heat resilience.  I like exercising outdoors with heavy clothing on whenever it gets above 100 degrees Fahrenheit.  I'm in my mid 50s.

We should go all the way and do NAWPA.  NAWPA was a 1950s proposal to send massive amounts of fresh water from rivers in Alaska and Canada south, some of it going all the way to Mexico.  The water normally goes mostly unused into the ocean.  Yes, there would be massive environmental disruptions in part because the project uses atomic weapons to do some of the engineering, but the project might reduce the expected number of future people who will starve by millions. 

The Soviets actually did try mining with nuclear explosives. They decided that it was too polluting. Since they had a pretty high pollution tolerance, I'm inclined to believe them.
3Nathan Helm-Burger9mo
I think we'll be sooner able to produce automated factories to produce automated self-repairing digging machines. There's a lot of interesting geo-engineering we can do with automated digging machines.
Automatic non-starter. Even if by some thermodynamic-tier miracle the Government permitted nuclear weapons for civilian use, I'd much rather they be used for Project Orion [].

Games, of course, are extensively used to train AIs.  It could be that OpenAI has its programs generate, evaluate, and play games as part of its training for GPT-4.

My guess is that GPT-4 will not be able to convincingly answer a question as if it were a five-year-old.  As a test, if you ask an adult whether a question was answered by a real five-year-old or GPT-4 pretending to be a five-year-old, the adult will be able to tell the difference for most questions in which an adult would give a very different answer from a child.  My reason for thinking GPT-4 will have this limitation is the limited amount of Internet written content labeled as being produced by young children.

If GPT-4 training data includes YouTube video transcripts, it might be able to do this convincingly.

Why isn't the moral of the story "If you think statistically, take into account that most other people don't and optimize accordingly?"  

I wish I could find some advice on how to do that, but it’s really hard to Google.

"Don't be a straw vulcan".

Would the AGI reasoner be of significant assistance to the computer programmers who work on improving the reasoner?

"John has $100 and Jill has $100" is worse but more fair than "John has $1,000 and Jill has $500."

Many people would think that the second case is more fair. It depends on how these dollars were obtained.

They must work in an environment that does not have competitive labor markets with profit-maximizing firms, else the firm hiring the man could increase its profits by firing him and hiring the woman.

In the riddle's world, there are indeed competitive labor markets. The firm would hire the woman if they could. It turns out they can't, because of the mystery phenomenon.
That would be true in a perfectly rational, globally-optimized company with perfect visibility into facts. Consider that there is no statement about the reviews or raises they receive. It is very possible for identical work output to receive different reviews and rewards based on personal traits and relationships. Setting aside gender, taller people also make more and receive higher ratings. So while you're 100% correct that it should work that way, it is entirely possible that in a local situation it does not work that way. One hypothesis might be that management is predominantly male, and faced with identical work output, conscious or unconscious bias leads them to pay the man more. I'm not sure what the author is going for here, but I can say from a lot of firsthand experience that labor markets are anything but rational. So I'd expect the result to rest on irrationality.

What if we restrict ourselves to the class of Boltzmann brains that understand the concept of Boltzmann brains and have memories of having attended an educational institution and of having discussed quantum physics with other people?

If you're restricting to the class of "Boltzmann brains that understand the concept of Boltzmann brains", then you are conditioning on something other than observations. All you can observe is that you believe that you understand Boltzmann brains. The proportion of Boltzmann brains making that observation and being essentially correct in that belief will have so many zeroes after the decimal point that you easily could fill a set of encyclopedias with them and have more to spare. It is difficult to convey how ridiculously uncorrelated Boltzmann brains would be if they could exist. Yes, with infinite time or space many of them will be able to have sensory experiences, memories, and thoughts. Some microscopic fraction of those would have memories about attending university and discussing quantum physics, and thoughts along the lines of "If I were a Boltzmann brain then ...". The incomprehensibly vast majority of those will have no further thoughts at all, being dead or otherwise incapable of thought any more. Of those that do go on to have more thoughts, the incomprehensibly vast majority will have thoughts as sensible and human-like as "nYqR8pwckvOBE84fKJ8vPMUWR3eYEbO6nXyOuSC". Of those that have any thoughts that we would recognize as vaguely human, the incomprehensibly vast majority will go on to continue in a manner less logically coherent than "... a ten foot bull dyke shoots pineapples with a machinegun." So yes, if you were actually a Boltzmann brain then you could think you were an exception. But almost certainly, you would not. You would almost certainly not think anything even slightly related to your previous thoughts at all.

I am a human.  I believe that only a minuscule percentage of all humans who have ever lived are capable of reasoning about physics in environments that evolution has not directly equipped them to innately understand.  If I think that I am one of the few people who actually can reason about physics outside of such environments, don't I also have to think that I am probably mistaken?  If not, then if I think I am a Boltzmann brain and accept that most Boltzmann brains can't properly reason, can't I think that I am an exception?

There is a slight problem with magnitude here. At least 0.1% of all humans to have ever lived can do such reasoning. Furthermore if you add other observable qualifiers to the set - "attended an educational institution", "citizen of a wealthy nation", "born after quantum mechanics was devised", "discusses physics with other people at all" - then your posterior probabilities go way up from prior. For Boltzmann brains the posterior probabilities of reasonably accurate reasoning start at less than 1 in 10^9999999999 and basically don't budge at all given even extremely narrow observations. Even in a highly lawful universe, the incredibly vast majority of Boltzmann brains will have no correlation between their observations and ability to reason (even if we restrict to only those brains where what goes on in it might constitute a mental state at all). That's why I don't think these situations are even qualitatively similar.

If you want college professors to discuss AI risks (as I do in my economics of future tech class at Smith College), you should come up with packets of material that professors in different disciplines, teaching at different levels, could use.

Thanks for the suggestion! Not sure we are going to have time for this, as it doesn't align completely with informing the public, but someone should clearly do this. Also great you're teaching this already to your students!

If the AI doomsayers are right, our best hope is that some UFOs are aliens.  The aliens likely could build Dyson spheres but don't so they probably have some preference for keeping the universe in its natural state.  The aliens are unlikely to let us create paperclip maximizers that consume multiple galaxies.  True, the aliens might stop us from creating a paperclip maximizer by exterminating us, or might just stop the paperclip maximizer from operating at some point beyond earth, but they also might stop an unaligned AI by a means that pres... (read more)

I wonder what kind of signatures a civilization gives off when AGI is nascent.

I think there have been only two people who had the capacity to take over the world:  Harry Truman and Dwight Eisenhower.  Both while US president could have used US atomic weapons and long-range bombers to destroy the Soviet Union, insist on a US monopoly of atomic weapons and long-range bombers, and then dictate terms to the rest of the world.  

A human seeking to become a utility maximizer would read LessWrong and try to become more rational.  Groups of people are not utility maximizers as their collective preferences might not even be transitive.  If the goal of North Korea is to keep the Kim family in power, then the country being a utility maximizer does seem to help.

A human who wants to do something specific would be far better off studying and practicing that thing than generic rationality.

I meant "not modifying itself" which would include not modifying its goals if an AGI without a utility function can be said to have goals.

A path I wish you had taken was trying to get rationality courses taught on many college campuses.  Professors have lots of discretion in what they teach.  (I'm planning on offering a new course and described it to my department chair as a collection of topics I find really interesting and think I could teach to first years.  Yes, I will have to dress it up to get the course officially approved.)  If you offer a "course in a box" which many textbook publishers do (providing handouts, exams, and potential paper assignments to instructors) you make it really easy for professors to teach the course.  Having class exercises that scale well would be a huge plus.

I meant insert the note literally as in put that exact sentence in plain text into the AGI's computer code.  Since I think I might be in a computer simulation right now, it doesn't seem crazy to me that we could convince an AGI that we create that it might be in a computer simulation.  Seabiscuit doesn't have the capacity to tell me that I'm in a computer simulation, whereas I do have the capacity to say this to a computer program.  Say we have a 1 in 1,000 chance of creating a friendly AGI and an unfriendly AGI would know this.  If... (read more)

Load More