Rough is easy to find and not worth much.
Diamonds are much harder to find and worth a lot more.
I once read a post by someone who was unimpressed with the paper that introduced Generative Adversarial Networks (GANs). They pointed out some sloppy math and other such problems and were confused why such a paper had garnered so much praise.
Someone replied that, in her decades of reading research papers, she learned that finding flaws is easy and uninteresting. The real trick is being able to find the rare glint of insight that a paper brings to the table. Understanding how even a subtle idea can move a whole field forward. I kinda sympathize as a software developer.
I remember when I first tried to slog through Marcus Hutter's book on AIXI, I found the idea absurd. I have no formal background in mathematics, so I chalked some of that up to me not fully understanding what I was reading. I kept coming back to the question (among many others): "If AIXI is incomputable, how can Hutter supposedly prove that it performs 'optimally'? What does 'optimal' even mean? Surely it should include the computational complexity of the agent itself!"
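For readers who haven't slogged through the book: roughly, AIXI picks actions by an expectimax over all computable environments, weighted by simplicity. My loose paraphrase of Hutter's definition, with notation abbreviated:

```latex
a_k = \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
      \left[ r_k + \cdots + r_m \right]
      \sum_{q \,:\, U(q,\, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

The inner sum ranges over every program q (of length ℓ(q)) for a universal Turing machine U, which is why AIXI is incomputable. "Optimal" here means maximizing expected reward under this simplicity prior; the agent's own computational cost appears nowhere in the expression.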
I tried to modify AIXI to include some notion of computational resource utilization until I realized that any attempt to do so would be arbitrary. Some problems are much more sensitive to computational resource utilization than others. If I'm designing a computer chip, I can afford to have the algorithm run an extra month if it means my chip will be 10% faster; the algorithm that produces a sub-optimal solution in milliseconds using less than 20 MB of RAM doesn't help me. At the same time, if a saber-toothed tiger jumps out of a bush next to me, I don't have months to figure out a 10% faster route to get away.
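To make the arbitrariness concrete, here's a toy anytime computation (a Monte Carlo estimate of pi, purely illustrative): more budget buys a better answer, and no fixed penalty on compute could be right for both the chip designer and the person fleeing the tiger.

```python
import math
import random

# An "anytime" computation: answer quality improves smoothly with the
# compute budget (number of samples). Where to stop is a judgment call
# that depends on context, not a property of the algorithm itself.
def estimate_pi(samples, seed=0):
    rng = random.Random(seed)
    # Count random points in the unit square that fall inside the
    # quarter circle of radius 1.
    hits = sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0
               for _ in range(samples))
    return 4.0 * hits / samples

for budget in (100, 10_000, 1_000_000):
    print(budget, estimate_pi(budget), abs(estimate_pi(budget) - math.pi))
```

Any rule of the form "charge X utility per sample" bakes in one point on this quality/cost curve, which is exactly the arbitrariness that made me give up on the modification.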
I believe there are problems with AIXI, but lots of digital ink has been spilled on that subject. I plan on contributing a little to that in the near future, but I also wanted to point out that it's easy to look at an idea like AIXI from the wrong perspective and miss a lot of what it truly has to say.
Drop the "A"
Flight is a phenomenon exhibited by many creatures and machines alike. We don't say mosquitos are capable of flight and helicopters are capable of "artificial flight" as though the word means something fundamentally different for man-made devices. Flight is flight: the process by which an object moves through an atmosphere (or beyond it, as in the case of spaceflight) without contact with the surface.
So why do we feel the need to discuss intelligence as though it wasn't a phenomenon in its own right, but something fundamentally different depending on implementation?
If we were approaching this rationally, we'd first want to formalize the concept of intelligence mathematically, so that we could bring the full power of math to bear on the pursuit and put to rest all the arguments and confusion caused by leaving the term so vaguely defined. Then we'd build a science dedicated to studying the phenomenon regardless of implementation. Then we'd develop ways to engineer intelligent systems (biological or otherwise) guided by the understanding granted us by a proper scientific field.
What we've done instead is develop a few computer science techniques, coin the term "Artificial Intelligence", and stumble around in the dark trying to wear the hats of both engineer and scientist while leaving the word itself undefined. Seriously, our best definition amounts to "we'll know it when we see it" (i.e. the Turing Test). That doesn't provide any guidance. That doesn't allow us to say "this change will make the system more intelligent" with any confidence.
Keeping the word "Artificial" in the name of what should be a scientific field only encourages tunnel vision. We should want to understand the phenomenon of intelligence whether it be exhibited by a computer, a human, a raven, a fungus, or a space alien.
Subjective does not imply arbitrary.
I see this conflation quite often. "Poop smells bad" is a subjective statement, but there's a pretty clear reason why most humans share that subjective evaluation. It's not an arbitrary evaluation.
A flaw in the Gödel Machine may provide a formal justification for evolution
I've never been a fan of the concept of evolutionary computation. Evolution isn't fundamentally different from other forms of engineering; rather, it's the most basic concept in engineering. The idea of slightly modifying an existing solution to arrive at a better one is a fundamental part of engineering. When you take away all of an engineer's other tools, like modeling, analysis, and heuristics, you're left with evolution.
Designing something can be modeled as a series of choices, like traversing a tree. There are typically far more possibilities per choice than is practical to explore, so we use heuristics and intelligence to prune the tree. Sure, an evolutionary algorithm might consider branches you never would have considered, but it's still aimless; if you have the resources, you could probably do better by simply pruning the search tree less aggressively. There will always be countless branches that are clearly not worth exploring. You want to make a flying machine? What material should the fuselage be made of? What? You didn't even consider peanut butter? Why?!
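Here's a toy sketch of the contrast (every name and number below is made up for illustration): a pruned exhaustive search over a tiny design space versus a blind mutate-and-keep loop on the same problem. The heuristic never even expands the peanut butter branch.

```python
import random

# Hypothetical design problem: pick a fuselage material and a thickness
# to maximize a made-up score (strength minus weight, scaled by thickness).
MATERIALS = ["aluminum", "carbon fiber", "steel", "peanut butter", "balsa"]
THICKNESSES = range(1, 6)

STRENGTH = {"aluminum": 5, "carbon fiber": 8, "steel": 7,
            "peanut butter": 0, "balsa": 2}
WEIGHT = {"aluminum": 3, "carbon fiber": 2, "steel": 6,
          "peanut butter": 4, "balsa": 1}

def score(material, thickness):
    return (STRENGTH[material] - WEIGHT[material]) * thickness

def pruned_search():
    # Heuristic pruning: skip any material whose score is already
    # negative at minimum thickness (goodbye, peanut butter).
    return max(((m, t) for m in MATERIALS for t in THICKNESSES
                if score(m, 1) > 0),
               key=lambda mt: score(*mt))

def evolutionary_search(generations=30, seed=0):
    # Blind mutation: tweak one attribute at random, keep it if it's
    # no worse. No pruning, so it happily evaluates peanut butter.
    rng = random.Random(seed)
    design = (rng.choice(MATERIALS), rng.choice(list(THICKNESSES)))
    for _ in range(generations):
        if rng.random() < 0.5:
            mutant = (rng.choice(MATERIALS), design[1])
        else:
            mutant = (design[0], rng.choice(list(THICKNESSES)))
        if score(*mutant) >= score(*design):
            design = mutant
    return design

print(pruned_search())        # ("carbon fiber", 5)
print(evolutionary_search())  # whatever mutation stumbled into
```

The evolutionary loop can never beat the pruned search here; at best it rediscovers the same answer after wasting evaluations on branches a heuristic would have discarded immediately.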
I think that some of the draw to evolution comes from the elegant forms found in nature, many of which are beyond the capabilities of human engineering, but a lot of that can be chalked up to the fact that biology started by default with the "holy grail" of manufacturing technologies: codified molecular self-assembly. If we could harness that capability and apply all the techniques we've learned over the last few centuries about managing complexity (particularly from computer science), we would quickly be able to engineer some mind-blowing technology in a matter of decades rather than billions of years.
Despite all this, people still find success using evolutionary algorithms and generate a lot of hype even though the techniques are doomed not to scale. Is there a time and place where evolution really is the best technique? Can we derive some rule for when to try evolutionary techniques? Maybe.
There's a particular sentence in the Gödel Machine paper that always struck me as odd:
Any formal system that encompasses arithmetics (or ZFC etc) is either flawed or allows for unprovable but true statements. Hence even a Gödel machine with unlimited computational resources must ignore those self-improvements whose effectiveness it cannot prove
It seems like the machine is making an arbitrary decision in the face of undecidability, especially after admitting that a formal system is either flawed or allows for unprovable but true statements. The more appropriate behavior would be for the Gödel machine to copy itself, with one copy implementing the change and the other not. This introduces further problems, like what the cost of duplicating the machine is and how it should be factored in, but I thought the observation might provide some food for thought.
Arbitrary Intelligence does not imply a maximally greedy algorithm.
A greedy algorithm is one that always makes decisions based on what yields the most immediate benefit. It essentially eschews the concept of delayed gratification.
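A classic toy illustration (coin change, nothing to do with any particular AI system): always grabbing the largest coin that fits is exactly this kind of maximal greed, and it loses to a search willing to "delay gratification".

```python
# Greedy coin change: always take the largest coin that still fits,
# i.e. the immediately best-looking choice.
def greedy_change(coins, amount):
    used = []
    for c in sorted(coins, reverse=True):
        while amount >= c:
            amount -= c
            used.append(c)
    return used

# Dynamic programming: find the minimal number of coins by considering
# how each choice affects the rest of the solution.
def optimal_change(coins, amount):
    best = {0: []}
    for a in range(1, amount + 1):
        candidates = [best[a - c] + [c] for c in coins if a - c in best]
        if candidates:
            best[a] = min(candidates, key=len)
    return best.get(amount)

print(greedy_change([1, 3, 4], 6))   # [4, 1, 1] -- three coins
print(optimal_change([1, 3, 4], 6))  # [3, 3] -- two coins
```

With the coin set {1, 3, 4} and a target of 6, greed grabs the 4 and pays for it with two extra coins; passing up the immediately best option is what wins.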
In discussions about value alignment, where the assumption is often that the system to be 'aligned' possesses arbitrary intelligence, I often see people assume that arbitrary intelligence can be conceived of as a maximally greedy algorithm.
When someone proposes a specific goal, detractors often conjure a scenario in which a greedy optimizer with that goal would misbehave, then dismiss the proposal without further consideration. Instead, we should expect an arbitrarily intelligent agent to employ far more sophisticated decision criteria.
Eliezer Yudkowsky makes this assumption many times in this lecture. He claims that an agent with arbitrary intelligence, given the utility function {1 if the cauldron is full, 0 if the cauldron is empty}, will keep adding as much water as possible to the cauldron to maximize the probability that the cauldron is full. But what about the probability that the cauldron will be full in the future? Surely an arbitrarily intelligent agent would consider conserving water to ensure future utility, right? Wouldn't securing Earth's water supply put the agent into conflict with many other agents in the environment? There's at least some probability that the other agents will destroy our cauldron filler, leading to zero future utility, right? The list of reasons why an arbitrarily intelligent agent might not want to grab up all the water in the observable universe and pour it into the cauldron is endless.
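A toy simulation of that intuition (all numbers made up, and assuming utility accumulates over time rather than being scored at a single instant): the cauldron leaks, the agent has a finite water reserve, and dumping everything immediately scores far worse than conserving water for the future.

```python
# Hypothetical setup: a cauldron of capacity 3 leaks 1 unit per step;
# utility is 1 per step the cauldron is full. The agent starts with a
# finite reserve and a full cauldron, and chooses how much to add.
def run(policy, reserve=10, steps=20, leak=1, capacity=3):
    level, total_utility = capacity, 0
    for _ in range(steps):
        add = min(policy(level, reserve), reserve)
        reserve -= add
        level = min(capacity, level + add)  # overflow is wasted
        total_utility += 1 if level >= capacity else 0
        level = max(0, level - leak)
    return total_utility

greedy = lambda level, reserve: reserve          # dump everything now
frugal = lambda level, reserve: max(0, 3 - level)  # top up only as needed

print("greedy:", run(greedy))  # 1  -- full for one step, then dry forever
print("frugal:", run(frugal))  # 11 -- full for eleven steps
```

Both policies are judged by the same utility function; the greedy one simply ignores how today's choices constrain tomorrow's utility, which is the assumption I'm objecting to.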
You might say, "Sure, but you can't guarantee that the list of reasons an agent might not do something dangerous will overcome the list of reasons it might. This is a discussion about safety after all. Yudkowsky's job is to show how things could go horribly wrong, not that it will. A detractor would have to prove the system will be safe, not that it might. Where safety is concerned, the onus of proof lies with the claimant that something is safe. It's enough to show that it might not be." I would largely agree with this sentiment, but it becomes a form of Pascal's mugging when taken to the extreme.
Like the state of the cauldron, we can only reason about the safety of a system in terms of probability. We may want to maximize the probability of a safe outcome and minimize the probability of a dangerous outcome, but in the real world, the mathematical certainty of proofs simply doesn't apply. An arbitrarily intelligent system may dream up plans that we could never conceive of, which complicates our ability to weigh the probability of negative, positive, and neutral side effects, but the contrarian instinct to think of the worst possible outcome and disregard the absurdity of how that outcome might come about doesn't serve us well.
When I brew my coffee in the morning, there's always a non-zero chance that a hypnotist tricked me into thinking I'm grinding coffee beans when I am, in fact, launching a nuclear missile that will start WW III and lead to the extinction of humanity. There's a chance that I will somehow anger an arbitrarily powerful being who will strike me with a lightning bolt for some reason that's ultimately unfathomable to my tiny human brain. There are infinite possibilities for infinite tragedy. There has got to be a more rational way to estimate risk.
He claims that an agent with arbitrary intelligence, given the utility function {1 if the cauldron is full, 0 if the cauldron is empty}, will keep adding as much water as possible to the cauldron to maximize the probability that the cauldron is full. But what about the probability that the cauldron will be full in the future?
I didn't watch the video so I might be missing something, but assuming you created an AI with that utility function, whether the cauldron is full right now and the probability that the cauldron will be full in the future are different utility functions. A sufficiently intelligent AI would know that maximizing its utility function now will hurt it later, but it doesn't care because that's not the utility function.
Eliezer has a bunch of different arguments because he's trying to address different levels of familiarity with the problem. My impression is that he expects a sufficiently intelligent but unaligned AI to not be greedy like this and to scheme until it no longer needs us.