Richard Korzekwa

Director at AI Impacts.

Comments

I would find this more compelling if it included examples of classic style writing (especially Pinker's writing) that fail at clear, accurate communication.

A common generator of doominess is a cluster of views that are something like "AGI is an attractor state that, following current lines of research, you will by default fall into with relatively little warning". And this view generates doominess about timelines, takeoff speed, difficulty of solving alignment, consequences of failing to solve alignment on the first try, and difficulty of coordinating around AI risk. But I'm not sure how it generates or why it should strongly correlate with other doomy views, like:

  1. Pessimism that warning shots will produce any positive change in behavior at all, separate from whether a response to a warning shot will be sufficient to change anything
  2. Extreme confidence that someone, somewhere will dump lots of resources into building AGI, even in the face of serious effort to prevent this
  3. The belief that narrow AI basically doesn't matter at all, strategically
  4. High confidence that the cost of compute will continue to drop on or near trend

People seem to hold these beliefs in a way that's not explained by the first list of doomy beliefs. It's not just that coordinating around reducing AI risk is hard because AGI is something you can build suddenly and by accident; it's that the relevant people and institutions are incapable of such coordination. It's not just that narrow AI won't have time to do anything important because of short timelines; it's that the world works in a way that makes it nearly impossible to steer in any substantial way unless you are a superintelligence.

A view like "aligning things is difficult, including AI, institutions, and civilizations" can at least partially generate this second list of views, but overall the case for strong correlations seems iffy to me. (To be clear, I put substantial credence in the attractor state thing being true and I accept at least a weak version of "aligning things is hard".)

Montgolfier's balloon was inefficient, cheap, slapped together in a matter of months


I agree the balloons were cheap in the sense that they were made by a couple hobbyists. It's not obvious to me how many people at the time had the resources to make one, though.

As for why nobody did it earlier, I suspect that textile prices were a big part of it. Without doing a very deep search, I did find a not-obviously-unreliable page with prices of things in Medieval Europe, and it looks like enough silk to make a balloon would have been very expensive. A sphere with a volume of 1060 m^3 (the volume of their first manned flight) has a surface area of ~600 yard^2. That page says a yard of silk in the 15th century was 10-12 shillings, so 600 yards would be ~6000s, or about £300. That same site lists "Cost of feeding a knight's or merchant's household per year" as "£30-£60, up to £100", so the silk would cost as much as feeding a household for 3-10 years.
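Here's the quick-and-dirty arithmetic spelled out (same rough inputs as above, and treating a "yard" of silk as roughly a square yard of fabric, which is its own approximation):

```python
import math

# Rough check of the silk-cost estimate above (all inputs are the approximate
# figures quoted in the comment, so this is order-of-magnitude at best).
volume_m3 = 1060                                    # volume of the first manned flight
radius_m = (3 * volume_m3 / (4 * math.pi)) ** (1 / 3)
area_m2 = 4 * math.pi * radius_m**2
area_yd2 = area_m2 * 1.196                          # 1 m^2 is about 1.196 yd^2

shillings_per_yard = 10                             # 15th-century price from that page
cost_shillings = area_yd2 * shillings_per_yard
cost_pounds = cost_shillings / 20                   # 20 shillings per pound

print(f"surface area ~{area_yd2:.0f} yd^2, cost ~{cost_shillings:.0f}s (~£{cost_pounds:.0f})")
print(f"years of feeding a household: {cost_pounds / 100:.0f} to {cost_pounds / 30:.0f}")
```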

This is, of course, very quick-and-dirty, and maybe the silk on that list is very different from the silk used to make balloons (e.g. because it's used for fancy clothes). And that's just the price at one place and time. But given my loose understanding of the status of silk and the lengths people went to in order to produce and transport it, I would not find it surprising if a balloon's worth of silk was prohibitively expensive until not long before the Montgolfiers came along.

I also wonder if there's a scaling thing going on. The materials that make sense for smaller, proof-of-concept experiments is not the same as what makes sense for a balloon capable of lifting humans. So maybe people had been building smaller stuff with expensive/fragile things like silk and paper for a while, without realizing they could use heavier materials for a larger balloon.
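My gloss on that scaling point, which isn't spelled out above: buoyant lift grows with the balloon's volume while the envelope's mass grows with its surface area, so a bigger balloon can tolerate heavier fabric. A rough sketch, with an assumed hot-air density deficit of ~0.25 kg/m^3:

```python
import math

# Square-cube sketch: net lift grows as r^3 while envelope mass grows as r^2,
# so larger balloons can afford heavier fabric per unit area.
# Assumed numbers: ambient air ~1.2 kg/m^3, hot air ~0.95 kg/m^3.
density_deficit = 0.25          # kg of lift per m^3 of hot air (assumption)

def max_fabric_areal_density(radius_m: float) -> float:
    """Heaviest envelope (kg/m^2) a sphere of this radius can lift, ignoring payload."""
    volume = (4 / 3) * math.pi * radius_m**3
    area = 4 * math.pi * radius_m**2
    return density_deficit * volume / area      # simplifies to density_deficit * r / 3

for r in (1, 3, 6.3):           # 6.3 m is roughly the radius of the 1060 m^3 balloon above
    print(f"r = {r} m: fabric can weigh up to ~{max_fabric_areal_density(r):.2f} kg/m^2")
```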

it's still not the case that we can train a straightforward neural net on winning and losing chess moves and have it generate winning moves. For AlphaGo, the Monte Carlo Tree Search was a major component of its architecture, and then any of the followup-systems was trained by pure self-play.

AlphaGo without the MCTS was still pretty strong:

We also assessed variants of AlphaGo that evaluated positions using just the value network (λ = 0) or just rollouts (λ = 1) (see Fig. 4b). Even without rollouts AlphaGo exceeded the performance of all other Go programs, demonstrating that value networks provide a viable alternative to Monte Carlo evaluation in Go.

Even the raw policy network, using no search at all, could play at a solid amateur level:

We evaluated the performance of the RL policy network in game play, sampling each move...from its output probability distribution over actions. When played head-to-head, the RL policy network won more than 80% of games against the SL policy network. We also tested against the strongest open-source Go program, Pachi, a sophisticated Monte Carlo search program, ranked at 2 amateur dan on KGS, that executes 100,000 simulations per move. Using no search at all, the RL policy network won 85% of games against Pachi.

I may be misunderstanding this, but it sounds like the policy network (which started out just learning to guess the next move in professional games and was then improved by self-play) was able to beat Pachi, which, according to DeepMind, had a rank of 2d, in 85% of its games while using no search at all.

Here's a selection of notes I wrote while reading this (in some cases substantially expanded with explanation).

The reason any kind of ‘goal-directedness’ is incentivised in AI systems is that then the system can be given an objective by someone hoping to use their cognitive labor, and the system will make that objective happen. Whereas a similar non-agentic AI system might still do almost the same cognitive labor, but require an agent (such as a person) to look at the objective and decide what should be done to achieve it, then ask the system for that. Goal-directedness means automating this high-level strategizing.

This doesn't seem quite right to me, at least not as I understand the claim. A system that can search through a larger space of actions will be more capable than one that is restricted to a smaller space, but it will require more goal-like training and instructions. Narrower instructions will restrict its search and, in expectation, result in worse performance. For example, if a child wanted cake, they might try to dictate actions to me that would lead to me baking a cake for them. But if they gave me the goal of giving them a cake, I'd find a good recipe or figure out where I could buy a cake for them, and the result would be much better. Automating high-level strategizing doesn't just relieve you of the burden of doing it yourself; it allows an agent to find better strategies than you could come up with.

Skipping the nose is the kind of mistake you make if you are a child drawing a face from memory. Skipping ‘boredom’ is the kind of mistake you make if you are a person trying to write down human values from memory. My guess is that this seemed closer to the plan in 2009 when that post was written, and that people cached the takeaway and haven’t updated it for deep learning which can learn what faces look like better than you can.

(I haven't waded through the entire thread on the faces thing, so maybe this was mentioned already.) It seems to me that it's a lot easier to point to examples of faces that an AI can learn from than examples of human values that an AI can learn from.

It also seems plausible that [the AIs under discussion] would be owned and run by humans. This would seem to not involve any transfer of power to that AI system, except insofar as its intellectual outputs benefit it

I think this is a good point, but isn't this what the principal-agent problem is all about? And isn't that a real problem in the real world?

That is, tasks might lack headroom not because they are simple, but because they are complex. E.g. AI probably can’t predict the weather much further out than humans.

They might be able to if they can control the weather!

IQ 130 humans apparently earn very roughly $6000-$18,500 per year more than average IQ humans.

I left a note to myself to compare this to disposable income. The US median household disposable income (according to the OECD; this accounts for transfers, taxes, payments for health insurance, etc.) is about $45k/year. At the time, my thought was "okay, but that's maybe pretty substantial, compared to the typical amount of money a person can realistically use to shape the world to their liking". I'm not sure this is very informative, though.
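A quick way to frame that comparison (just the two figures above):

```python
# Rough comparison of the quoted IQ-130 earnings premium to US median
# household disposable income (both figures are the approximate ones above).
premium_low, premium_high = 6_000, 18_500      # $/year
median_disposable_income = 45_000              # $/year

print(f"{premium_low / median_disposable_income:.0%} to "
      f"{premium_high / median_disposable_income:.0%} of median disposable income")
# -> roughly 13% to 41%
```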

Often at least, the difference in performance between mediocre human performance and top level human performance is large, relative to the space below, iirc.

I take machine chess performance as evidence for a not-so-small range of human ability, especially when compared to the rate of increase of machine ability. But I think it's good to be cautious about using chess Elo as a measure of the human range of ability, in any absolute sense, because chess is popular in part because it is so good at separating humans by skill. It could be the case that humans occupy a fairly small slice of chess ability (measured by, I dunno, likelihood of choosing the optimal move, or some other measure of performance that isn't based on success rate against other players), but a small increase in skill confers a large increase in likelihood of winning at skill levels achievable by humans.
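For intuition on that last point, here's the standard Elo expected-score formula (my own illustration, not something from the post): because ratings are defined by win rates, even modest rating gaps correspond to lopsided win probabilities, whatever the underlying skill differences are.

```python
# Minimal sketch of the Elo model's mapping from rating difference to expected score.
def expected_score(rating_diff: float) -> float:
    """Expected score for the higher-rated player under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

for diff in (50, 100, 200, 400):
    print(f"+{diff} Elo -> expected score {expected_score(diff):.2f}")
# +50 -> 0.57, +100 -> 0.64, +200 -> 0.76, +400 -> 0.91
```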

~Goal-directed entities may tend to arise from machine learning training processes not intending to create them (at least via the methods that are likely to be used).~

I made my notes on the AI Impacts version, which was somewhat different, but it's not clear to me that this should be crossed out. It seems to me that institutions do exhibit goal-like behavior that is not intended by the people who created them.

"Paxlovid's usefulness is questionable and could lead to resistance. I would follow the meds and supplements suggested by FLCC"

Their guide says:

In a follow up post-marketing study, Paxlovid proved to be ineffective in patients less than 65 years of age and in those who were vaccinated.

This is wrong. The study reports the following:

Among the 66,394 eligible patients 40 to 64 years of age, 1,435 were treated with nirmatrelvir. Hospitalizations due to Covid-19 occurred in 9 treated and 334 untreated patients: adjusted HR 0.78 (95% CI, 0.40 to 1.53). Death due to Covid-19 occurred in 1 treated and 13 untreated patients; adjusted HR: 1.64 (95% CI, 0.40 to 12.95).

As the abstract says, the study did not have the statistical power to show a benefit for preventing severe outcomes in younger adults. It did not "prove [Paxlovid] to be ineffective"! This is very bad: the guide is clearly not a reliable source of information about covid treatments, and I recommend against following any other advice on that website.
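To make the point about power concrete, here's a minimal sketch (it just restates the two hazard ratios quoted above) of why a confidence interval that spans 1.0 is inconclusive rather than proof of no effect:

```python
# The two adjusted hazard ratios reported for the 40-64 age group, as quoted above.
# A 95% CI that spans 1.0 is consistent with substantial benefit, no effect, or harm;
# it reflects limited statistical power, not proof of ineffectiveness.
results = {
    "hospitalization": {"HR": 0.78, "CI": (0.40, 1.53)},
    "death":           {"HR": 1.64, "CI": (0.40, 12.95)},
}

for outcome, r in results.items():
    lo, hi = r["CI"]
    verdict = "inconclusive (CI spans 1.0)" if lo < 1.0 < hi else "statistically significant"
    print(f"{outcome}: HR {r['HR']}, 95% CI {lo}-{hi} -> {verdict}")
```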

I was going to complain that the language quoted from the abstract in the frog paper is sufficiently couched that it's not clear the researchers thought they were measuring anything at all. Saying that X "suggests" Y "may be explained, at least partially" by Z seems reasonable to me (as you said, they had at least not ruled out that Z causes Y). Then I clicked through the link and saw the title of the paper making the unambiguous assertion that Z influences Y.

When thinking about a physics problem or physical process or device, I track which constraints are most important at each step. This includes generic constraints taught in physics classes like conservation laws, as well as things like "the heat has to go somewhere" or "the thing isn't falling over, so the net torque on it must be small".

Another thing I track is what everything means in real, physical terms. If there's a magnetic field, that usually means there's an electric current or permanent magnet somewhere. If there's a huge magnetic field, that usually means a superconductor or a pulsed current. If there's a tiny magnetic field, that means you need to worry about the various sources of external fields. Even in toy problems that are more like thought experiments than descriptions of the real world, this is useful for calibrating how surprised you should be by a weird result (e.g. "huh, what's stopping me from doing this in my garage and getting a Nobel prize?" vs "yep, you can do wacky things if you can fill a cubic km with a 1000T field!").

Related to both of these, I track which constraints and which physical things I have a good feel for and which I do not. If someone tells me their light bulb takes 10W of electrical power and creates 20W of visible light, I'm comfortable saying they've made a mistake*. On the other hand, if someone tells me about a device that works by detecting a magnetic field on the scale of a milligauss, I mentally flag this as "sounds hard" and "not sure how to do that or what kind of accuracy is feasible".

*Something else I'm noticing as I'm writing this: I would probably mentally flag this as "I'm probably misunderstanding something, or maybe they mean peak power of 20W or something like that"
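To make the light bulb example concrete, here's the one-line version of that check (a minimal sketch using the numbers from my example):

```python
# The claim: 10 W of electrical input, 20 W of visible light out.
# Radiated power can't exceed input power, so this fails immediately.
electrical_input_W = 10.0
claimed_light_output_W = 20.0

efficiency = claimed_light_output_W / electrical_input_W
print(f"implied efficiency: {efficiency:.0%}")      # 200%
if efficiency > 1.0:
    print("violates conservation of energy; someone made a mistake (or I'm misreading the claim)")
```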

Communication as a constraint (along with transportation as a constraint) strikes me as important, but it seems like this pushes the question to "Why didn't anyone figure out how to control something that's more than a couple of weeks away by courier?"

I suspect that, as Gwern suggests, making copies of oneself is sufficient to solve this, at least for a major outlier like Napoleon. So maybe another version of the answer is something like "Nobody solved the principal-agent problem well enough to get by on communication slower than a couple of weeks". But it still isn't clear to me why that's the characteristic time scale. (I don't actually know what the time scale is, by the way; I just did five minutes of Googling to find estimates for courier times across the Mongol and Roman Empires.)

in a slow takeoff world, many aspects of the AI alignment problems will already have showed up as alignment problems in non-AGI, non-x-risk-causing systems; in that world, there will be lots of industrial work on various aspects of the alignment problem, and so EAs now should think of themselves as trying to look ahead and figure out which margins of the alignment problem aren’t going to be taken care of by default, and try to figure out how to help out there.

I agree with this, and I think it extends beyond what you're describing here. In a slow takeoff world, the aspects of the alignment problem that show up in non-AGI systems will also provide EAs with a lot of information about what's going on, and I think we should try to do things now that will help us to notice those aspects and act appropriately. (I'm not sure what this looks like; maybe we want to build relationships with whoever will be building these systems, or maybe we want to develop methods for figuring things out and fixing problems that are likely to generalize.)
