Student at Caltech. Currently trying to get an AI safety inside view.
Have you ever seen this work for an advance prediction? It seems like you need to be in a better epistemic position than Feynman, which is pretty hard.
I disagree that every indirect normativity process must approximate this linear process of humans delegating to the future. In the ELK appendix, there is some discussion of a process that allows humans to delegate to the past so as to form a self-consistent chain:
Sometimes the delegate Hn will want to delegate to a future version of themselves, but they will realize that the situation they are in is actually not very good (for example, the AI may have no way to get them food for the night), and so they would actually prefer that the AI had made a different decision at some point in the past. We want our AI to take actions now that will help keep us safe in the future, so it’s important to use this kind of data to guide the AI’s behavior. But doing so introduces significant complexities, related to the issues discussed in Appendix: subtle manipulation.
Mark would probably have more to say here.
Maybe this is too tired a point, but AI safety really needs exercises: tasks that are interesting, self-contained (not depending on 50 hours of reading), take about 2 hours, have clean solutions, and give people the feel of alignment research.
I found some of the SERI MATS application questions better than Richard Ngo's exercises for this purpose, but there still seems to be significant room for improvement. There is currently nothing smaller than ELK (which takes closer to 50 hours to think through properly and develop a proposal for) that I can point technically minded people to and feel confident that they'll both be engaged and learn something.
SpaceX's superpower is doing things slightly better, which yields substantial gains thanks to the large exponent on the rocket equation.
Agree with the rest of this comment, but I don't think SpaceX's success is due to the rocket equation. Their engines' specific impulse is no better than state-of-the-art, and as a consequence their payload fraction to orbit isn't better either. The success is driven by huge reductions in the cost per ton launched.
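For reference, the sensitivity both comments are pointing at comes from the ideal rocket equation, Δv = Isp · g₀ · ln(m₀/mf). A minimal sketch (the Δv figure is an assumed round number for reaching LEO including losses, not a quoted spec):

```python
import math

G0 = 9.80665   # standard gravity, m/s^2
DV = 9400.0    # assumed round figure for delta-v to LEO including losses, m/s

def mass_ratio(isp_s):
    """Liftoff-to-final mass ratio m0/mf from the ideal rocket equation:
    dv = isp * g0 * ln(m0/mf)  =>  m0/mf = exp(dv / (isp * g0))."""
    return math.exp(DV / (isp_s * G0))

for isp in (300, 350, 380):
    final_fraction = 1.0 / mass_ratio(isp)  # structure + payload, as fraction of liftoff mass
    print(f"Isp {isp:>3} s: m0/mf = {mass_ratio(isp):5.1f}, "
          f"final mass fraction = {final_fraction:.3f}")
```

Because Isp sits inside the exponent, matching the state of the art in Isp means matching it in payload fraction too, which is the point above: the differentiator has to come from somewhere else, like cost.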
Why can't we use wrong probabilities in real life?
There are various circumstances where I want to "rotate" between probabilities and utilities in real life, in ways that still prescribe the correct decisions. For example, if I have a startup idea and want to maximize my expected profit, I'd be much more emotionally comfortable with thinking it has a 90% chance of making $1 billion than a 10% chance of making $9 billion. So why can't we use wrong probabilities in real life?

I think there are three major reasons why this doesn't always work.
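The reason the rotation in the example even looks permissible is that scaling the probability down by some factor while scaling the payoff up by the same factor leaves expected value, and hence any EV-maximizing decision, unchanged. A quick exact check of the startup numbers:

```python
from fractions import Fraction

def expected_value(p, payoff):
    """Expected value of a single binary gamble."""
    return p * payoff

# 10% chance of $9B vs. the "rotated" 90% chance of $1B
original = expected_value(Fraction(1, 10), 9_000_000_000)
rotated  = expected_value(Fraction(9, 10), 1_000_000_000)
assert original == rotated == 900_000_000  # both $900M
```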
Nuno Sempere points out that this was written up in an economics paper in 2012: https://arxiv.org/pdf/1201.6655.pdf
Do we know how to train act-based agents? Is the only obstacle competitiveness, similarly to how Tool AI wants to be Agent AI?
I'm pretty skeptical that sophisticated game theory happens between shards in the brain, and also that coalitions between shards are how value preservation in an AI will happen (rather than there being a single consequentialist shard, or many shards that merge into a consequentialist, or something I haven't thought of).
To the extent that shard theory makes such claims, they seem to be interesting testable predictions.
RSA-2048 has not been factored. It was generated by taking two random primes of approximately 1024 bits each and multiplying them together.
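That construction is simple enough to sketch at toy scale. This is an illustration only, using Python's `random` module (not a CSPRNG) and 64-bit primes instead of the 1024-bit primes used for the real RSA-2048:

```python
import random

def is_probable_prime(n, rounds=40):
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a witnesses that n is composite
    return True

def random_prime(bits):
    """Sample candidates with the top bit and low bit set until one is prime."""
    while True:
        c = random.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(c):
            return c

p, q = random_prime(64), random_prime(64)
n = p * q
print(n.bit_length())  # 127 or 128, since both factors have their top bit set
```

The hardness assumption is that recovering p and q from n alone is infeasible at 2048 bits; at the toy sizes above, factoring is of course easy.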
Not an answer, but I think of "adversarial coherence" (the agent keeps optimizing for the same utility function even under perturbations by weaker optimizing processes, like how humans will fix errors in building a house or AlphaZero can win a game of Go even when an opponent tries to disrupt its strategy) as a property that training processes could select for. Adversarial coherence and corrigibility are incompatible.