Charlie Sanders

I'd like to propose a test to objectively quantify how skeptically an average observer reacts to the doomsday prophesying present in a given text. My suggestion is this: take the text, swap its subject of doom (in this case AGI) with the subject of another, similar text spelling out humanity's impending doom (for example, a lecture on Scientology and Thetans, or on the Jonestown massacre), and present the two texts to independent observers, in the same vein as a Turing test.

If an outside independent observer cannot reliably identify which subject of doom corresponds to which text, then that could serve as an effective way of benchmarking when a specific text has transitioned away from effectively conveying information and towards fearmongering.
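To make "cannot reliably identify" a bit more concrete, here is a minimal sketch of how the results could be scored, assuming each observer makes one binary guess and that pure guessing succeeds half the time. The function name and the 50% chance level are my own assumptions, not part of any established protocol.

```python
from math import comb


def p_value_above_chance(correct: int, total: int, chance: float = 0.5) -> float:
    """One-sided exact binomial p-value.

    Returns the probability of seeing at least `correct` right answers out of
    `total` if every observer were simply guessing with success rate `chance`.
    """
    return sum(
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )


# Hypothetical example: 14 of 20 observers matched subject to text correctly.
print(p_value_above_chance(14, 20))
```

A large p-value would mean the observers are doing no better than coin flips, which is the situation where, by the proposed test, the text has drifted from conveying information toward fearmongering.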

I think this post would be stronger if it covered at least basic metrology and statistics.

It's incorrect to say that billions of variables aren't affecting a sled sliding down a hill. Of course they're affecting the speed, even if most only by a few Planck lengths per hour. But, crucially, they're mostly not affecting it by a detectable amount. The detectability threshold is the key to the argument.

For detectability, whether you notice the effects of outside variables is going to come down to the precision of the instrument that you're using to measure your output. If you're using a radar gun that gives readings to the nearest MPH, for example, you won't perceive a difference between 10.1 and 10.2 MPH, and so to you the two are equivalent. Nonetheless, outside variables have absolutely influenced the two readings differently.
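A minimal sketch of that point, assuming the instrument simply rounds the true speed to its resolution (the function name and the `resolution` parameter are hypothetical, chosen for illustration):

```python
def radar_reading(true_speed_mph: float, resolution: float = 1.0) -> float:
    """Return the speed as displayed by an instrument with the given resolution."""
    # Snap the true value to the nearest multiple of the resolution.
    return round(true_speed_mph / resolution) * resolution


# With 1 MPH resolution the two runs are indistinguishable.
print(radar_reading(10.1) == radar_reading(10.2))

# With 0.1 MPH resolution the same two runs read differently.
print(radar_reading(10.1, 0.1) == radar_reading(10.2, 0.1))
```

The underlying speeds never changed between the two comparisons; only the instrument's ability to tell them apart did.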

Equally critical is the number of measurements that you're taking. For example, if you're taking repeated measurements after controlling a certain set of variables, you may be able to say with a certain confidence/reliability that no other variables are causing enough variation in speed to register an output that's outside of the parameters that you've set. But that is a very different thing from saying that those other variables simply don't exist! One is a statement of probability; the other is a statement of certainty. Maybe there's a confluence of variables that only occurs once every thousand times, which you won't pick up when doing an initial evaluation.
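As a rough sketch of why a small initial evaluation can miss such an event, assume the runs are independent and the event has a fixed probability `p` per run. Both of those are simplifying assumptions, and the function names are my own.

```python
from math import ceil, log


def prob_never_observed(p: float, runs: int) -> float:
    """Probability that an event with per-run probability p never shows up in `runs` runs."""
    return (1 - p) ** runs


def runs_needed(p: float, confidence: float = 0.95) -> int:
    """Smallest number of runs giving at least `confidence` chance of seeing the event once."""
    return ceil(log(1 - confidence) / log(1 - p))


# A once-in-a-thousand event stays invisible in most 100-run evaluations.
print(prob_never_observed(0.001, 100))

# Runs needed for a 95% chance of seeing it at least once: on the order of 3,000.
print(runs_needed(0.001))
```

So under these assumptions, a hundred clean runs still leave roughly a 90% chance that a once-in-a-thousand effect simply never appeared, which is a statement about probability and not about the variable's nonexistence.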

1. The size of the community working on the alignment problem can be assumed to be at least somewhat proportional to the likelihood of successfully solving the alignment problem.
2. Eliezer, being the most public face of the alignment problem community, wields outsized influence in shaping public perception of the community.
3. Eliezer's writing is distinctly condescending and polemical, and has at least a hypothetical possibility of causing reputational harm to the community (as evidenced by your comment).

Based on this, there absolutely exists a hypothetical point where, based purely on writing style, the net effect of a post like this could fully undermine the post's ostensible aim. Whether this post crosses that point is a subjective evaluation, and I don't know of any rigorous way to evaluate this.

I'm fully aware that this could be construed as "tone policing", but ignorance of the impacts of writing tone seems like a blind spot to Eliezer and the community overall, so I think the topic is worthy of discussion.

Imagine you have two points, A and B. You're at A, and you can see B in the distance. How long will it take you to get to B?

Well, you're a pretty smart fellow. You measure the distance, you calculate your rate of progress, maybe you're extra clever and throw in a factor of safety to account for any irregularities during the trip. And you figure that you'll get to point B in a year or so.

Then you start walking.

And you run into a wall.

Turns out, there's a maze in between you and point B. Huh, you think. Well that's ok, I put a factor of safety into my calculations, so I should be fine. You pick a direction, and you keep walking.

You run into more walls.

You start to panic. You figured this would only take you a year, but you keep running into new walls! At one point, you even realize that the path you've been on is a dead end — it physically can't take you from point A to point B, and all of the time you've spent on your current path has been wasted, forcing you to backtrack to the start.

Fundamentally, this is what I see happening in various industries: brain scanning, self-driving cars, clean energy, interstellar travel, AI development. The list goes on.

Laymen see a point B in the distance, where we have self-driving cars run on green energy powered by AGIs. They see where we are now. They figure they can estimate how long it'll take to get to that point B, slap on a factor of safety, and make a prediction.

But the real world of problem solving is akin to a maze. And there's no way to know the shape or complexity of that maze until you actually start along the path. You can think you know the theoretical complexity of the maze you'll encounter, but you can't.

One implication of the Efficient Market Hypothesis (EMH) is that it is difficult to make money on the stock market. Generously, maybe only the top 1% of traders will be profitable.

Nitpick: it's incredibly easy to make money on the stock market: just put your money in it, ideally in an index fund. Over the long run it goes up by an average of roughly 8% a year. Almost all traders will be profitable, although many won't beat that 8% average.
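For a sense of scale, here is a minimal compounding sketch. It assumes a constant 8% return every year, which real markets don't deliver (actual returns vary a lot year to year), and the function name is just for illustration.

```python
def future_value(principal: float, annual_return: float, years: int) -> float:
    """Value of a lump sum after compounding at a fixed annual return."""
    return principal * (1 + annual_return) ** years


# Hypothetical example: $10,000 left in an index fund for 10, 20 and 30 years.
for years in (10, 20, 30):
    print(years, round(future_value(10_000, 0.08, years), 2))
```

No trading skill enters the calculation anywhere; the only inputs are the starting amount, the assumed average return, and time.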

The entire FIRE movement is predicated on it being incredibly simple to make money on the stock market. It takes absolutely zero skill to be a sufficiently profitable trader, given a sizeable enough initial investment.

I get that you're trying to convey above-market-rate returns here, but your wording is imprecise.

Right, but I'm not sure how you'd "test" for success in that scenario. Usefulness to humanity, as demonstrated by effective product use, seems to me like the only way to get a rigorous result. If you can't measure the success or failure of an idea objectively, then the idea probably isn't going to matter much.

On fuzzy tasks: I think the appropriate frame of comparison is neither an average subset (Mechanical Turk) nor the ideal human (Go), but instead the median resource that someone would be reasonably likely to seek out. To use healthcare as an example, you'd want your AI to beat the average family doctor that most people would reach out to, as opposed to either a layman's opinion or the preeminent doctor in the field.