I feel like an important lesson to learn from the analogy to air conditioners is that some technologies are bounded by physics and cannot improve quickly, or at all. I doubt anyone has the data, but I would be surprised if average air-conditioning efficiency in BTUs per watt, plotted over the 20th century, is not a sigmoid.
For seeing through the fog of war, I'm reminded of the German Tank Problem.
https://en.wikipedia.org/wiki/German_tank_problem
Statistical estimates were ~50x more accurate than intelligence estimates in the canonical example. When you include the strong and reasonable incentives for all participants to propagandize, it is nearly impossible to get accurate information about an ongoing conflict.
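For concreteness, the statistical estimator behind that example is simple enough to sketch. This is an illustrative toy, not the historical analysis itself; the 276 figure is the commonly cited actual monthly production number from the Wikipedia article.

```python
import random

def estimate_total(serials):
    """Minimum-variance unbiased estimator for the German tank problem:
    N_hat = m * (1 + 1/k) - 1, where m is the largest serial number
    observed and k is the number of captured tanks."""
    m, k = max(serials), len(serials)
    return m * (1 + 1 / k) - 1

random.seed(0)
true_n = 276  # commonly cited actual production figure; illustrative here
sample = random.sample(range(1, true_n + 1), 10)
print(estimate_total(sample))
```

Even a handful of serial numbers pins down the total far better than the intelligence estimates of the time did.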
I think, as rationalists, if we're going to see more clearly than conventional wisdom, we need to find sources of information with a more fundamental basis. I don't yet know what those would be.
In reality, an AI can use algorithms that find a pretty good solution most of the time.
If you replace "AI" with "ML" I agree with this point. And yep, this is what we can do with the networks we're scaling. But "pretty good most of the time" doesn't get you an x-risk intelligence. It gets you some really cool tools.
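As a sketch of what "pretty good most of the time" looks like in practice, here is a greedy heuristic for the knapsack problem with hypothetical items. It can miss the optimum (below it returns 160 where the true optimum is 220), but it runs in O(n log n) rather than exponential time.

```python
def greedy_knapsack(items, capacity):
    """Greedy heuristic: take items in descending order of value density.
    Not optimal in general, but often close, and cheap to compute."""
    items = sorted(items, key=lambda it: it[1] / it[2], reverse=True)
    chosen, total_value, total_weight = [], 0, 0
    for name, value, weight in items:
        if total_weight + weight <= capacity:
            chosen.append(name)
            total_value += value
            total_weight += weight
    return chosen, total_value

# Hypothetical items: (name, value, weight); optimum here is b + c = 220.
items = [("a", 60, 10), ("b", 100, 20), ("c", 120, 30)]
print(greedy_knapsack(items, 50))  # → (['a', 'b'], 160)
```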
If the 3-SAT algorithm is O(n^4), then this algorithm might not be that useful compared to other approaches.
If 3-SAT is O(n^4), then P=NP, and we're back to Aaronson's point: the fundamental structure of reality is much different t...
I think less than human intelligence is sufficient for an x-risk because that is probably what is sufficient for a takeoff.
If less than human intelligence is sufficient, wouldn't humans have already done it? (or are you saying we're doing it right now?)
How intelligent does an agent need to be to send an HTTP request to the URL `ldap://myfirstrootkit.com` on a few million domains?)
A human could do this or write a bot to do this (and they've tried). But they'd also be detected, as would an AI. I don't see this as an x-risk, so much as a manageable pr...
I spent some time reading the Grinblatt paper. Thanks again for the link. I stand corrected on IQ being uncorrelated with stock prediction. One part did catch my eye.
...Our findings relate to three strands of the literature. First, the IQ and trading behavior analysis builds on mounting evidence that individual investors exhibit wealth-reducing behavioral biases. Research, exemplified by Barber and Odean (2000, 2001, 2002), Grinblatt and Keloharju (2001), Rashes (2001), Campbell (2006), and Calvet, Campbell, and Sodini (2007, 2009a, 2009b),
We don't know that; P vs. NP is an unproven conjecture. Most real-world problems are not giant knapsack problems. And there are algorithms that quickly produce answers that are close to optimal. Actually, most of the real use of intelligence is not a complexity-theory problem at all. "Is inventing transistors an O(n) or an O(2^n) problem?"
P vs. NP is unproven. But I disagree that "most real world problems are not giant knapsack problems". The Cook-Levin theorem showed that many of the most interesting problems are reducible to NP-complete problems.
AlphaGo went from mediocre to going toe-to-toe with the top human Go players in a very short span of time. And now AlphaGo Zero has beaten AlphaGo 100-0. AlphaFold has arguably made a similar logistic jump in protein folding.
Do you know how many additional resources this required?
...Cost of compute has been decreasing at exponential rate for decades, this has meant entire classes of algorithms which straightforward scale with compute also have become exponentially more capable, and this has already had profound impact on our world. At the very lea
Since you bring up selection bias: Grinblatt et al. 2012 studies the entire Finnish population with a population-registry approach and finds that...
Thanks for the citation. That is the kind of information I was hoping for. Do you think that slightly-better-than-human intelligence is sufficient to present an x-risk, or do you think it needs some sort of takeoff or acceleration to present an x-risk?
I think I can probably explain the "so" in my response to Donald below.
Overshooting by 10x (or 1,000x or 1,000,000x) before hitting 1.5x is probably easier than it looks for someone who does not have a background in AI.
Do you have any examples of 10x or 1000x overshoot? Or maybe a reference on the subject?
Hmmmmm, there is a lot here. Let me see if I can narrow down on some key points.
Once you have the right algorithm, it really is as simple as increasing some parameter or neuron count.
There are some problems that do not scale well (or at all). For example, doubling the computational power applied to solving the knapsack problem will let you solve a problem size that is one element bigger. Why should we presume that intelligence scales like an O(n) problem and not an O(2^n) problem?
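The knapsack scaling claim can be sketched directly, assuming the brute-force (exhaustive) algorithm: each added item doubles the number of subsets to check, so doubling compute buys exactly one more element of problem size.

```python
from itertools import combinations

def brute_force_knapsack(items, capacity):
    """Exhaustive search over all 2^n subsets of (value, weight) items.
    Adding one item doubles the number of subsets, so doubling the
    compute budget only buys one additional element of problem size."""
    best = 0
    n = len(items)
    for r in range(n + 1):
        for subset in combinations(items, r):
            weight = sum(w for _, w in subset)
            value = sum(v for v, _ in subset)
            if weight <= capacity and value > best:
                best = value
    return best

# 10 items means 2^10 = 1024 subsets; an 11th item means 2048.
items = [(i, i) for i in range(1, 11)]  # hypothetical (value, weight) pairs
print(brute_force_knapsack(items, 20))  # → 20
```

(Dynamic programming does better for knapsack specifically, but the point stands for NP-hard problems where no such structure is known.)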
...What is happening here? Are both people just looking a
Are we equivocating on 'much better' here?
Not equivocating, but if intelligence is hard to scale and slightly-better is not a threat, then there is no reason to be concerned about AI risk. (Maybe the 1% x-risk suggested by the OP is in fact a 1e-9 x-risk.)
there are considerable individual differences in weather forecasting performances (it's one of the more common topics to study in the forecasting literature),
I'd be interested in seeing any papers on individual differences in weather forecasting performance (even if IQ is not mentioned). My understand...
I think I'm convinced that we can have human-capable AI (or greater) in the next century (or sooner). I'm unconvinced on a few aspects of AI alignment. Maybe you could help clarify your thinking.
(1) I don't see how a human-capable, or slightly-smarter-than-human-capable, AI (say 50% smarter) will be a serious threat. Broadly, humans are smart because of group and social behavior. So a 1.5x-human AI might be roughly as smart as two humans? Doesn't seem too concerning.
(2) I don't see how a bit smarter than humans scales to superhuman lev...
Current AI does stochastic search, but it is still search. Essentially the PP complexity class, instead of NP/P (with a fair amount of domain-specific heuristics).
Never leave the house without your d20 :-P
But I agree with you. This seems a simple way to do something like satisficing, avoiding the great computational cost of an optimal decision.
In terms of prior art that is probably the field you want to explore: https://en.m.wikipedia.org/wiki/Satisficing
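A minimal sketch of the idea, with hypothetical names: sample candidates stochastically and stop at the first one that clears a "good enough" threshold, rather than paying the full cost of finding the optimum.

```python
import random

def satisfice(candidates_fn, good_enough, score, max_tries=1000):
    """Satisficing via stochastic search: draw candidates, keep the best
    seen so far, and stop early once one clears the threshold. All names
    here are hypothetical, for illustration only."""
    best = None
    for _ in range(max_tries):
        c = candidates_fn()
        if best is None or score(c) > score(best):
            best = c
        if score(best) >= good_enough:
            break
    return best

random.seed(1)
# Toy objective: find x in [0, 1] with x >= 0.95, not the true maximum.
result = satisfice(lambda: random.random(), 0.95, lambda x: x)
print(result)
```

The early stop is exactly where the computational savings come from: the search cost depends on the threshold, not on the size of the full search space.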
Not sure if this is helpful, but since you analogized to chip design: in chip design, you typically verify using a constrained-random method when the state space grows too large to verify every input exhaustively. That is, you construct a distribution over the set of plausible strings, then sample it and feed the samples to your design. Then you compare the result to a model in a higher-level language.
Of course, standard techniques like designing for modularity can make the state space more manageable too.
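A toy sketch of the constrained-random flow, with a hypothetical 8-bit adder standing in for the design under test (in practice the DUT would be RTL and the golden model would live in a higher-level language):

```python
import random

def dut_add(a, b):
    """Stand-in for the design under test."""
    return (a + b) & 0xFF  # 8-bit wraparound

def reference_add(a, b):
    """Higher-level golden model to compare against."""
    return (a + b) % 256

random.seed(0)
for _ in range(10_000):
    # Constraint: bias the stimulus toward corner cases near overflow,
    # instead of sampling all 2^16 input pairs uniformly or exhaustively.
    a = random.choice([0, 1, 254, 255, random.randrange(256)])
    b = random.randrange(256)
    assert dut_add(a, b) == reference_add(a, b)
print("all sampled inputs match the reference model")
```

The constraint is the interesting part: you spend your sampling budget where bugs are likely, rather than uniformly over an intractable state space.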
First off, Scott’s blog is awesome.
Second, the example of dieting comes to mind when I think of training rationality. While dieters are not much connected to the rationality community, they are a large group of people focused on overcoming one particular aspect of our irrationality (but without much success).
What basis is there to assume that the distribution of these variables is log uniform? Why, in the toy example, limit the variables to the interval [0,0.2]? Why not [0,1]?
These choices drive the result.
The problem is, for many of the probabilities, we don't even know enough about them to say what distribution they might take. You can't infer a meaningful distribution over variables where your sample size is 1 or 0.
I’m still not seeing a big innovation here. I’m pretty sure most researchers who look at the Drake equation think “huge sensitivity to parameterization.”
If we have a 5-parameter Drake equation, then the number of civilizations scales with X^5, so if X comes in at 0.01, we've got a 1e-10 probability of detectable civilization formation. But if we've got a 10-parameter Drake equation and X comes in at 0.01, then it implies a 1e-20 probability (extraordinarily smaller).
So yes, it has a huge sensitivity, but it is primarily a constructed sensitivity. All the Drake equation really tells us is that we don't know very much, and it probably won't be useful until we can get N above one for more of the parameters.
I’m not sure I understand why they’re against point estimates. As long as the points match the means of our estimates for the variables, the points multiplied should match the expected value of the distribution (at least for independent variables).
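A quick Monte Carlo check of that claim, with three hypothetical independent factors: the product of the per-parameter means does recover the mean of the product. (What it doesn't capture is the spread, e.g. P(N >= 1), which is where full distributions earn their keep.)

```python
import random

random.seed(0)
trials = 200_000
total = 0.0
for _ in range(trials):
    # Three hypothetical factors, drawn independently and uniformly
    # from [0, 0.2], each with mean 0.1.
    total += (random.uniform(0, 0.2)
              * random.uniform(0, 0.2)
              * random.uniform(0, 0.2))

mean_of_product = total / trials
product_of_means = 0.1 ** 3  # the "point estimate" built from means
print(mean_of_product, product_of_means)  # should agree closely
```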
A traditional Turing machine makes no distinction between program and data; that distinction is really a hardware-efficiency optimization that came from the Harvard architecture. Since so many systems are Turing complete, creating an immutable program seems impossible to me.
For example, a system capable of speech could exploit the Turing completeness of formal grammars to execute de novo subroutines.
A second example: hackers were able to exploit the surprising Turing completeness of an image-compression standard to embed a virtual machine in a GIF.
https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-into-nso-zero-click.html
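The program/data point above can be sketched with a minimal stack-machine interpreter: the interpreter itself never changes, yet the "data" it consumes is itself a program, which is why freezing the code while leaving data mutable doesn't prevent new behavior in a sufficiently expressive system.

```python
def run(data):
    """Tiny postfix stack machine. The input string is 'just data' to
    the host, but it fully determines the computation performed."""
    stack = []
    for token in data.split():
        if token == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif token == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            stack.append(int(token))
    return stack.pop()

# The interpreter is fixed, yet this "data" computes (2 + 3) * 4.
print(run("2 3 + 4 *"))  # → 20
```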