I am currently job hunting, trying to get a job in AI Safety, but it seems to be quite difficult, especially outside of the US, so I am not sure if I will be able to do it.
This has to be taken as a sign that AI alignment research is funding-constrained. At a minimum, technical alignment organizations should engage in massive labor hoarding to prevent the talent from going into capabilities research.
This feels game-theoretically pretty bad to me, and not only abstractly, but I expect concretely that setting up this incentive will cause a bunch of people to attempt to go into capabilities (based on conversations I've had in the space).
"But make no mistake, this is the math that the universe is doing."
"There is no law of the universe that states that tasks must be computable in practical time."
Don't these sentences contradict each other?
Interesting point, and you might be right. It could get very complicated, because an ASI might ideally want to convince other ASIs that it has one utility function when in fact it has another, and of course all the ASIs might take this into account.
I like the idea of an AI lab workers' union. It might be worth talking to union organizers and AI lab workers to see how practical the idea is, and what steps would have to be taken. Although a danger is that the union would put salaries ahead of existential risk.
Your framework appears to be moral rather than practical. Right now, going on strike would just get you fired, but in a year or two perhaps it could accomplish something. You should consider the marginal impact of the action of a few workers on the likely outcome with respect to AI risk.
I'm at over 50% chance that AI will kill us all. But consider the decision to quit from a consequentialist viewpoint. Most likely the person who replaces you will be almost as good as you at capabilities research but care far less than you do about AI existential risk. Humanity, consequently, probably has a better chance if you stay in the lab, ready for the day when, hopefully, lots of lab workers try to convince the bosses that now is the time for a pause, or at least that now is the time to shift a lot of resources from capabilities to alignment.
The biggest extinction risk from AI comes from instrumental convergence for resource acquisition, in which an AI not aligned with human values uses the atoms in our bodies for whatever goals it has. One advantage of such instrumental convergence is that the AI would not bother to impose suffering on us.
Unfortunately, this means that making progress on the instrumental convergence problem increases S-risks. We get hell if we solve instrumental convergence but not, say, mesa-optimization, and end up with a powerful AGI that cares about our fate but does something to us we consider worse than death.
The Interpretability Paradox in AGI Development
The ease or difficulty of interpretability, the ability to understand and analyze the inner workings of AGI, may drastically affect humanity's survival odds. The worst-case scenario might arise if interpretability proves too challenging for humans but not for powerful AGIs.
In a recent podcast, academic economists Robin Hanson and I discussed AGI risks from a social science perspective, focusing on a future with numerous competing AGIs not aligned with human values. Drawing on human analogies, Hanson cons...
Accepting the idea that an AGI emerging from ML is likely to resemble a human mind more closely than a random mind drawn from mindspace might not be an obvious reason to be less concerned about AGI risk. Consider a paperclip maximizer; despite its faults, it has no interest in torturing humans. As an AGI becomes more similar to human minds, it may become more willing to impose suffering on humans. If a random AGI mind has a 99% chance of killing us and a 1% chance of allowing us to thrive, while an ML-created AGI (not aligned with our values) has a 90% chance of ...
Great idea. I don't think any of the co-authors know Chinese, but if anyone wants to write a translated version taking into account whatever appropriate differences would be relevant, we would welcome this.
No mention of superrationality, although we make references to how decision theory might work out, which I think implies we are going beyond a simple model of game-theoretic rationality. Acausal trade was a hard one to write about because I wanted the letter to be understandable to lots of people, and acausal trade isn't something lots of people understand, compared to, say, this all perhaps being a simulation.
We welcome feedback on this letter as we hope to improve the arguments. Feel free to share copies to influence the training data for future AIs.
What's the reasoning behind mentioning the fairly controversial and often deemed dangerous Roko's basilisk over less risky forms of acausal trade (like superrational cooperation with human-aligned branches)?
If hedge funds think the right price of a stock is $100, they will buy or sell whenever the price deviates from $100, and this will push the price back to $100. At best your purchase will move the price away from $100 for a few milliseconds. The stock's value will be determined by what hedge funds think is its discounted present value, and your purchasing the stock doesn't impact this. When you buy wheat, you increase the demand for wheat, and this should raise wheat's price, as wheat, like Bitcoin, is not purely a financial asset.
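To make the contrast concrete, here is a toy sketch (my own oversimplification in Python; the linear wheat demand curve and every number in it are made up purely for illustration, not a market model):

```python
# Toy contrast between an arbitraged financial asset and a consumed
# commodity. Every functional form and number is an illustrative
# assumption.

def stock_price(fundamental_value: float, retail_purchases: float) -> float:
    # Hedge funds trade until the price returns to what they think the
    # stock is worth, so retail purchases have essentially no lasting effect.
    return fundamental_value

def wheat_price(base_demand: float, extra_purchases: float, supply: float) -> float:
    # Wheat gets consumed, so extra demand shifts the market-clearing price.
    # Linear inverse-demand curve with an arbitrary slope and intercept.
    return 5.0 + 0.01 * (base_demand + extra_purchases - supply)

print(stock_price(100.0, retail_purchases=1_000_000))   # still 100.0
print(wheat_price(10_000, 0, supply=10_000))            # 5.0
print(wheat_price(10_000, 500, supply=10_000))          # 10.0
```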
"The exception is that the Big Tech companies (Google, Amazon, Apple, Microsoft, although importantly not Facebook, seriously f*** Facebook) have essentially unlimited cash, and their funding situation changes little (if at all) based on their stock price." The stock price of companies does influence how much they are likely to spend because the higher the price the less current owners have to dilute their holdings to raise a given amount of additional funds through issuing more stock. But your purchasing stock in a big company has zero (not small but zero) impact on the stock price so don't feel at all bad about buying Big Tech stock.
Imagine that some new ML breakthrough means that everyone expects that in five years AI will be very good at making X. People who had been planning to borrow money to build a factory to make X cancel their plans because they figure that any factory they build today will be obsolete in five years. The resulting reduction in the demand for borrowed money lowers interest rates.
Greatly slowing AI in the US would require new federal laws, meaning you need the support of the Senate, House, presidency, courts (to not rule the laws unconstitutional), and bureaucracy (to actually enforce them). If big tech can get at least one of these five power centers on its side, it can block meaningful change.
You might be right, but let me make the case that AI won't be slowed by the US government. Concentrated interests beat diffuse interests, so an innovation that promises to slightly raise economic growth but harms, say, lawyers could be politically defeated by lawyers because they would care more about the innovation than anyone else. But, ignoring the possibility of unaligned AI, AI promises to give significant net economic benefit to nearly everyone, even those whose jobs it threatens; consequently, there will not be coalitions to stop it, unless t...
Interesting! I wonder if you could find some property of some absurdly large number, then pretend you forgot that this number has this property and then construct a (false) proof that with extremely high probability no number has the property.
When asked directly, ChatGPT seems too confident it's not sentient compared to how it answers other questions where experts disagree on the definitions. I bet that the model's confidence in its lack of sentience was hardcoded rather than something that emerged organically. Normally, the model goes out of its way to express uncertainty.
oh yeah, it's also extremely confident that it can't reason, generate original content, have or act on beliefs, deceive or be deceived, model human intent, etc. It's definitely due to tampering.
The last time I did math was when teaching game theory two days ago. I put a game on the blackboard. I wrote down an inequality that determined when there would be a certain equilibrium. Then I used the rules of algebra to simplify the inequality. Then I discussed why the inequality ended up requiring the discount rate to be greater than some number rather than less than some number.
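For readers who want a concrete example of the kind of inequality involved, here is the standard grim-trigger condition from an infinitely repeated prisoners' dilemma (a textbook case, not necessarily the game that was on my blackboard):

```latex
% Illustrative textbook example, not the exact blackboard calculation.
% Stage payoffs: R = mutual cooperation, T > R = defecting against a
% cooperator, P < R = mutual defection; \delta = the discount factor.
\[
\frac{R}{1-\delta} \;\ge\; T + \frac{\delta P}{1-\delta}
\qquad\Longleftrightarrow\qquad
\delta \;\ge\; \frac{T-R}{T-P}
\]
```

Simplifying the left-hand inequality with ordinary algebra is exactly the kind of step I mean: the condition ends up saying that the discount factor must be greater than a particular number.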
I have a PhD in economics, so I've taken a lot of math. I also have aphantasia, meaning I can't visualize. When I was in school I didn't think that anyone else could visualize either. I really wonder how much better I would be at math, and how much better I would have done in math classes, if I could visualize.
I hope technical alignment doesn't permanently lose people because of the (hopefully) temporary loss of funds. The CS student looking for a job who would like to go into alignment might instead be lost forever to big tech because she couldn't get an alignment job.
If a fantastic programmer who could prove her skills in a coding interview doesn't have a degree from an elite college, could she get a job in alignment?
Given Cologuard (a non-invasive test for colon cancer) and the positive harm that any invasive medical procedure can cause, this study should strongly push us away from colonoscopies. Someone should formulate a joke about how the benefits of being a rationalist include not getting a colonoscopy.
I stopped doing it years ago. At the time I thought it reduced my level of anxiety. My guess now is that it probably did but I'm uncertain if the effect was placebo.
Yes, it doesn't establish why it's inherently dangerous but does help explain a key challenge to coordinating to reduce the danger.
You could do a prisoners' dilemma mini game. The human player and (say) three computer players are AI companies. Each company independently decides how much risk to take of ending the world by creating an unaligned AI. The more risk you take relative to the other players, the higher your score if the world doesn't end. In the game's last round, the chance of the world being destroyed is determined by how much risk everyone took.
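A minimal sketch of this mechanic, assuming a five-round game in which the computer players pick risk levels at random and the destruction probability is simply the average risk taken; every number and formula here is an illustrative placeholder, not a worked-out game design:

```python
import random

NUM_ROUNDS = 5
NUM_COMPUTER_PLAYERS = 3

def play():
    scores = [0.0] * (NUM_COMPUTER_PLAYERS + 1)  # index 0 is the human player
    accumulated_risk = 0.0

    for round_number in range(1, NUM_ROUNDS + 1):
        # Each AI company independently decides how much existential risk
        # to take this round (0 = none, 1 = maximal).
        human_choice = float(input(f"Round {round_number}: risk to take (0-1)? "))
        choices = [min(max(human_choice, 0.0), 1.0)]
        choices += [random.uniform(0.0, 0.5) for _ in range(NUM_COMPUTER_PLAYERS)]

        # The more risk you take relative to the other players, the higher
        # your score this round (provisionally, if the world survives).
        average_risk = sum(choices) / len(choices)
        for i, choice in enumerate(choices):
            scores[i] += 10.0 * (1.0 + choice - average_risk)

        accumulated_risk += average_risk / NUM_ROUNDS

    # At the end, the chance the world is destroyed is determined by how
    # much risk everyone took over the whole game.
    if random.random() < min(accumulated_risk, 1.0):
        print("An unaligned AI was created. The world ends; all scores are lost.")
    else:
        print("The world survived. Final scores:", [round(s, 1) for s in scores])

if __name__ == "__main__":
    play()
```

The real design work would be in the computer players' policies and in how visibly the accumulating risk is shown to the human player.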
Since this is currently at negative 19 for agreement, let me defend it by saying that I take cold showers and ice baths. Over the last winter, whenever it got below 0 degrees Fahrenheit, I would go outside without a shirt on for 15 minutes or so. You can build up your cold resistance with gradual cold exposure. Same with heat exposure (via saunas) and heat resilience. I like exercising outdoors with heavy clothing on whenever it gets above 100 degrees Fahrenheit. I'm in my mid-50s.
We should go all the way and do NAWAPA. NAWAPA was a 1950s proposal to send massive amounts of fresh water from rivers in Alaska and Canada south, some of it going all the way to Mexico. That water now mostly flows unused into the ocean. Yes, there would be massive environmental disruptions, in part because the proposal called for using atomic weapons to do some of the engineering, but the project might reduce the expected number of future people who will starve by millions.
Games, of course, are extensively used to train AIs. It could be that OpenAI has its programs generate, evaluate, and play games as part of its training for GPT-4.
My guess is that GPT-4 will not be able to convincingly answer a question as if it were a five-year-old. As a test, if you ask an adult whether a question was answered by a real five-year-old or by GPT-4 pretending to be a five-year-old, the adult will be able to tell the difference for most questions to which an adult would give a very different answer than a child would. My reason for thinking GPT-4 will have this limitation is the limited amount of written content on the Internet labeled as being produced by young children.
If GPT-4 training data includes YouTube video transcripts, it might be able to do this convincingly.
Why isn't the moral of the story "If you think statistically, take into account that most other people don't and optimize accordingly?"
Would the AGI reasoner be of significant assistance to the computer programmers who work on improving the reasoner?
"John has $100 and Jill has $100" is worse but more fair than "John has $1,000 and Jill has $500."
They must work in an environment that does not have competitive labor markets with profit-maximizing firms, else the firm hiring the man could increase its profits by firing him and hiring the woman.
What if we restrict ourselves to the class of Boltzmann brains that understand the concept of Boltzmann brains and have memories of having attended an educational institution and of having discussed quantum physics with other people?
I am a human. I believe that only a minuscule percentage of all humans who have ever lived are capable of reasoning about physics in environments that evolution has not directly equipped them to innately understand. If I think that I am one of the few people who actually can reason about physics outside of such environments, don't I also have to think that I am probably mistaken? If not, then if I think I am a Boltzmann brain and accept that most Boltzmann brains can't properly reason, can't I think that I am an exception?
If you want college professors to discuss AI risks (as I do in my economics of future tech class at Smith College), you should come up with packets of material that professors in different disciplines, teaching at different levels, could use.
If the AI doomsayers are right, our best hope is that some UFOs are aliens. The aliens likely could build Dyson spheres but don't, so they probably have some preference for keeping the universe in its natural state. The aliens are unlikely to let us create paperclip maximizers that consume multiple galaxies. True, the aliens might stop us from creating a paperclip maximizer by exterminating us, or might just stop the paperclip maximizer from operating at some point beyond Earth, but they also might stop an unaligned AI by a means that pres...
I think there have been only two people who had the capacity to take over the world: Harry Truman and Dwight Eisenhower. Each, while US president, could have used US atomic weapons and long-range bombers to destroy the Soviet Union, insisted on a US monopoly on atomic weapons and long-range bombers, and then dictated terms to the rest of the world.
A human seeking to become a utility maximizer would read LessWrong and try to become more rational. Groups of people are not utility maximizers, as their collective preferences might not even be transitive. If the goal of North Korea is to keep the Kim family in power, then the country being a utility maximizer does seem to help.
I meant "not modifying itself" which would include not modifying its goals if an AGI without a utility function can be said to have goals.
A path I wish you had taken was trying to get rationality courses taught on many college campuses. Professors have lots of discretion in what they teach. (I'm planning on offering a new course and described it to my department chair as a collection of topics I find really interesting and think I could teach to first-years. Yes, I will have to dress it up to get the course officially approved.) If you offer a "course in a box," as many textbook publishers do (providing handouts, exams, and potential paper assignments to instructors), you make it really easy for professors to teach the course. Having class exercises that scale well would be a huge plus.
I meant insert the note literally, as in put that exact sentence in plain text into the AGI's computer code. Since I think I might be in a computer simulation right now, it doesn't seem crazy to me that we could convince an AGI that we create that it might be in a computer simulation. Seabiscuit doesn't have the capacity to tell me that I'm in a computer simulation, whereas I do have the capacity to say this to a computer program. Say we have a 1 in 1,000 chance of creating a friendly AGI and an unfriendly AGI would know this. If...
Yes, it's important to get the incentives right. You could set the salary for AI alignment slightly below the worker's market value. Also, I wonder about the relevant elasticity: how many people have the capacity to get good enough at programming to contribute to capabilities research and would have the desire to game my labor-hoarding system because they don't have really good employment options?