All of faul_sname's Comments + Replies

For example, a human can, to an extent, inspect what they are going to say before they say or write it. Before saying that Gary Marcus was "inspired by his pet chicken, Henrietta", a human may temporarily store the words they plan to say elsewhere in the brain and evaluate them.

Transformer-based models also internally represent the tokens they are likely to emit in future steps. This was demonstrated rigorously in Future Lens: Anticipating Subsequent Tokens from a Single Hidden State, though perhaps the simpler demonstration is simply that LLMs can reliably complete the se... (read more)
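For a concrete picture of what "internally represent future tokens" cashes out to, here is a minimal sketch of the probing idea (my own toy version, not the Future Lens authors' setup; the model, layer choice, and repeated sample text are arbitrary assumptions, and the repeated text means the probe will trivially overfit, whereas a real experiment would use a large held-out corpus):

```python
# Train a linear probe from the hidden state at position t to the token at t+2.
import torch
from torch import nn
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True).eval()

text = "The quick brown fox jumps over the lazy dog. " * 50  # stand-in corpus
ids = tok(text, return_tensors="pt").input_ids
with torch.no_grad():
    hidden = model(ids).hidden_states[6][0]      # (seq_len, d_model) from layer 6

# Pair the hidden state at position t with the token id at position t+2.
X, y = hidden[:-2], ids[0, 2:]

probe = nn.Linear(hidden.shape[-1], model.config.vocab_size)
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
for _ in range(200):                              # tiny training loop, illustration only
    opt.zero_grad()
    loss = nn.functional.cross_entropy(probe(X), y)
    loss.backward()
    opt.step()

acc = (probe(X).argmax(-1) == y).float().mean()
print(f"probe top-1 accuracy at predicting token t+2: {acc:.2f}")
```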

2Gerald Monroe8d
So yes, but actually no. What's happening in the example you gave is that the most probable token at each evaluation makes forward progress towards completing the sentence. Suppose the prompt contained the constraint "the third word of the response must begin with the letter 'c'", and the model has already generated "Alice likes apples". Current models can be prompted to check all the constraints, and will often notice an error, but they have no private buffer in which to try various generations until one that satisfies the prompt gets generated. Humans have a private buffer and can also write things down that they don't share. (Imagine solving this as a human: you would stop on word 3, start brainstorming 'c' words, and wouldn't continue until you had a completion.) There's a bunch of errors like this I hit with GPT-4. Similarly, if the probability of a correct generation is very low ("apples" may be far more probable even with the letter-'c' constraint in the prompt), current models are unable to learn online from their mistakes on common questions they get wrong. This makes them not very useful as "employees" for a specific role yet, because they endlessly make the same errors.
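A minimal sketch of the "private buffer" loop being described here, with a stand-in sampler in place of a real model call (the sampler and its candidate strings are made up purely for illustration):

```python
# Draft candidate completions privately, check the constraint, and only emit one that passes.
import random

def sample_completion(prompt: str) -> str:
    # Placeholder sampler; a real version would call an LLM.
    return random.choice(["Alice likes apples", "Alice likes cherries",
                          "Alice eats candied pears", "Bob likes plums"])

def generate_with_constraint(prompt, constraint, max_tries=100):
    for _ in range(max_tries):
        draft = sample_completion(prompt)   # kept in a private buffer, not emitted
        if constraint(draft):
            return draft                    # only now does the text leave the buffer
    return None

third_word_starts_with_c = lambda s: len(s.split()) >= 3 and s.split()[2].lower().startswith("c")
print(generate_with_constraint("...", third_word_starts_with_c))
```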

I think the answer pretty much has to be "yes", for the following reasons.

  1. As noted in the above post, weather is chaotic.
  2. Elections are sometimes close. For example, the winner of the 2000 presidential election came down to a margin of 537 votes in Florida.
  3. Geographic location correlates reasonably strongly with party preference.
  4. Weather affects specific geographic areas.
  5. Weather influences voter turnout[1] --

During the 2000 election, in Okaloosa County, Florida (at the western tip of the panhandle), 71k of the county's 171k residents voted, with 52186 vo... (read more)

4Thomas Kwa1mo
I think this argument is not sufficient. Turnout effects of weather can flip elections that are already close, and from our limited perspective, more than 0.1% of elections are close. But the question is asking about the 2028 election in particular, which will probably not be so close.

An attorney rather than the police, I think.

Also "provably safe" is a property a system can have relative to a specific threat model. Many vulnerabilities come from the engineer having an incomplete or incorrect threat model, though (most obviously the multitude of types of side-channel attack).

Counterpoint: Sydney Bing was wildly unaligned, to the extent that it is even possible for an LLM to be aligned, and people thought it was cute / cool.

1Stephen Fowler1mo
I was not precise enough in my language and agree with you highlighting that what "alignment" means for LLMs is a bit vague. While people felt Sydney Bing was cool, if it had not been possible to rein it in, it would have been very difficult for Microsoft to gain any market share. An LLM that doesn't do what it's asked, or regularly expresses toxic opinions, is ultimately bad for business. In the above paragraph, understand "aligned" in the concrete sense of "behaves in a way that is aligned with its parent company's profit motive", rather than "acting in line with humanity's CEV". To rephrase the point I was making above, I feel much (a majority, even) of today's alignment research is focused on the first definition of alignment, whilst neglecting the second.

The two examples everyone loves to use to demonstrate that massive top-down engineering projects can sometimes be a viable alternative to iterative design (the Manhattan Project and the Apollo Program) were both government-led initiatives, rather than single very smart people working alone in their garages. I think it's reasonable to conclude that governments have considerably more capacity to steer outcomes than individuals, and are the most powerful optimizers that exist at this time.

I think restricting the term "superintelligence" to "only that which ca... (read more)

Looking at the AlphaZero paper

Our new method uses a deep neural network fθ with parameters θ. This neural network takes as an input the raw board representation s of the position and its history, and outputs both move probabilities and a value, (p, v) = fθ(s). The vector of move probabilities p represents the probability of selecting each move a (including pass), pa = Pr(a| s). The value v is a scalar evaluation, estimating the probability of the current player winning from position s. This neural network combines the roles of both policy network and value

... (read more)
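To make the quoted description concrete, here is a minimal sketch of a combined policy-value network (my own toy version with made-up layer counts and sizes, not the paper's actual residual tower):

```python
import torch
from torch import nn

class PolicyValueNet(nn.Module):
    def __init__(self, in_planes=17, board=19, n_moves=19 * 19 + 1, channels=64):
        super().__init__()
        # Shared trunk over the raw board representation s (current position + history planes).
        self.trunk = nn.Sequential(
            nn.Conv2d(in_planes, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        # Policy head: move probabilities p (one logit per move, including pass).
        self.policy_head = nn.Sequential(
            nn.Conv2d(channels, 2, 1), nn.ReLU(), nn.Flatten(),
            nn.Linear(2 * board * board, n_moves),
        )
        # Value head: scalar v evaluating the position for the current player.
        self.value_head = nn.Sequential(
            nn.Conv2d(channels, 1, 1), nn.ReLU(), nn.Flatten(),
            nn.Linear(board * board, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Tanh(),
        )

    def forward(self, s):
        h = self.trunk(s)
        p = torch.softmax(self.policy_head(h), dim=-1)  # move probabilities
        v = self.value_head(h)                          # value in [-1, 1]
        return p, v

p, v = PolicyValueNet()(torch.zeros(1, 17, 19, 19))  # (p, v) = f_theta(s)
```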

As I will expand upon later, this contrast makes no sense. We are not going to have machines outperforming humans on every task in 2047 and then only fully automating human occupations in 2116. Not in any meaningful sense.

Maybe people are interpreting "task" as "bounded, self-contained task", and so they're saying that machines will be able to outperform humans on every "task" but not on the parts of their jobs that are not "tasks".

The exact wording of the question was

Say we have ‘high-level machine intelligence’ when unaided machines can accomplish every

... (read more)
  • Gradientware? Seems verbose and isn't robust to other ML approaches to fitting data.
  • Datagenicware? Captures the core of what makes them like that, but it's a mouthful.
  • Modelware? I don't love it.
  • Puttyware? Aims to capture the "takes the shape of its surroundings" aspect, though it might be too abstract. It also implies that it will take the shape of its current surroundings, rather than the ones it was built with.
  • Resinware? Maybe more evocative of the "was fit very closely to its particular surroundings" aspect, but still doesn't seem to capture quite what I want.

When you get large, directed systems (e.g., we are composed of 40 trillion cells, each containing tens of millions of proteins), I think you basically need some level of modularity if there’s any hope of steering the whole thing.

This seems basically right to me. That said, while it is predictable that the systems in question will be modular, what exact form that modularity takes is both environment-dependent and path-dependent. Even in cases where the environmental pressures form a very strong attractor for a particular shape of solution, the "modul... (read more)

Excellent post! In particular, I think "You Don’t Get To Choose The Problem Factorization" is a valuable way to crystallize a problem that comes up in a lot of different contexts.

Editing note: the link in

And if we’re not measuring what we think we are measuring, that undercuts the whole “iterative development” model.

points at a draft. Probably a draft of a very interesting post, based on the topic.

Also on the topic of that section, I do expect that if the goal was to build a really tall tower, we would want to do a bunch of testing on the individual c... (read more)

2johnswentworth2mo
Oh lol, that's a draft of this post - I had a few intra-post links in case people don't read it linearly. I'll fix those, thanks for catching it.

Very late reply, reading this for the 2022 year in review.

As one example: YCombinator companies have a roughly linear correlation between exit value and number of employees, and basically all companies with $100MM+ exits have >100 employees. My impression is that there are very few companies with even $1MM revenue/employee (though I don't have a data set easily available).

So there are at least two different models which both yield this observation.

The first is that there are few people who can reliably create $1MM / year of value for their company, an... (read more)

1Xodarap2mo
Sure, I think everyone agrees that marginal returns to labor diminish with the number of employees. John's claim though was that returns are non-positive, and that seems empirically false.

So I think we might be talking past each other a bit. I don't really have a strong view on whether Shannon's work represented a major theoretical advancement. The specific thing I doubt is that Shannon's work had significant counterfactual impacts on the speed with which it became practical to do specific things with computers.

This was why I was focusing on error correcting codes. Is there some other practical task which people wanted to do before Shannon's work but were unable to do, which Shannon's work enabled, and which you believe would have taken at least 5 years longer had Shannon not done his theoretical work?

6johnswentworth2mo
This is an interesting question. Let's come at it from first principles. Going by the model in my own comment, Shannon's work was counterfactually impactful plausibly because most people didn't realize there was a problem to be solved there in the first place. So, in terms of practical applications, his work would be counterfactual mainly for things which people wouldn't even have thought to try or realized was possible prior to information theory. With that lens in mind, let's go back to error-correcting codes; I can see now why you were looking there for examples. Natural guess: Shannon's counterfactual impact on error-correcting codes was to clearly establish the limits of what's possible, so that code-designers knew what to aim for. It's roughly analogous to the role of Joseph Black's theory of latent heat in the history of the steam engine: before that theory, engines were wildly inefficient; Watt's main claim to fame was to calculate where the heat was used, realize that mostly it went to warming and cooling the cylinder rather than the steam, then figure out a way to avoid that, resulting in massive efficiency gains. That's the sort of thing I'd a-priori expect to see in the history of error-correcting codes: people originally did wildly inefficient things (like e.g. sending messages without compression so receivers could easily correct typos, or duplicating messages). Then, post-Shannon, people figured out efficient codes. And I think the history bears that out. Here's the wikipedia page on error-correcting codes: So, the first method efficient enough to merit mention on the wikipedia page at all was developed by the guy who shared an office with Shannon, within a few years of Shannon's development of information theory. And there had been nothing even remotely as efficient/effective for centuries before. If that's not a smoking gun for counterfactual impact, then I don't know what is.

The "make sure that future AIs are aligned with humanity" seems, to me, to be a strategy targeting the "determines humans are such entities" step of the above loss condition. But I think there are two additional stable Nash equilibria, namely "no single entity is able to obtain a strategic advantage" and "attempting to destroy anyone who could oppose you will, in expectation, leave you worse off in the long run than not doing that". If there are three I have thought of there are probably more that I haven't thought of, as well.

1Seth Ahrenbach2mo
You are correct that my argument would be stronger if I could prove that the NE I identified is the only one. I do not think it is reasonable that AGI would fail to obtain strategic advantage if sought, unless we pre-built in MAD-style assurances. But perhaps under my assumptions a stable “no one manages to destroy the other” outcome results. I would need to do more work to bring in assumptions about AGI becoming vastly more powerful and definitely winning, to prevent this. And I think this is the case, but maybe I should make it more clear. Similarly, if we can achieve a provable alignment, rather than probabilistic, then we simply do not have the game arise. The AGI would never be in a position to protect its own existence at the expense of ours, due to that provable alignment. In each case I think you are changing the game, which is something we can and I think should do, but barring some actual work to do that, I think we are left with a game as I’ve described, maybe without sufficient technical detail.

I think this is basically right on the object level -- specifically, I think that what von Neumann missed was that by changing the game a little bit, it was possible to get to a much less deadly equilibrium. In particular, second-strike capabilities and a pre-commitment to use them ensure that the expected payoff of a first strike is negative.

On the meta level, I think that very smart people who learn some game theory have a pretty common failure mode, which looks like

  1. Look at some real-world situation
  2. Figure out how to represent it as a game (in the game
... (read more)
1Seth Ahrenbach2mo
I totally agree with your diagnosis of how some smart people sometimes misuse game theory. And I agree that that's the loss condition.

So I note that our industrial civilization has not in fact been plunged into nuclear fire. With that in mind, do you think that von Neumann's model of the world was missing anything? If so, does that missing thing also apply here? If not, why hasn't there been a nuclear war?

1Seth Ahrenbach2mo
The missing piece is mutually assured destruction. Given that we did not play the Nash equilibrium as von Neumann suggested, the next best thing is MAD and various counterproliferation treaties that happened to work okay for humans. With an AGI counterparty, we can hope to build in a MAD-like assurance, but it will be a lot more challenging. The equilibrium move is to right now not build AGI.

There are many ways to improve employee incentives:

One more extremely major one: ensure that you pay employees primarily in money that will retain its value if the company stops capabilities work, instead of trying to save money by paying employees partly in ownership of future profits (which will be vastly decreased if the company stops capabilities work).

4nikola2mo
Agreed. AGI labs should probably look into buying back their shares from employees to fix this retroactively.

Telegraph operators and ships at sea, in the decades prior to World War II, frequently had to communicate in Morse code over noisy channels. However, as far as I can tell, none of them ever came up with the idea of using checksums or parity bits to leverage the parts of the message that did get through to correct for the parts of the message that did not. So that looks pretty promising for the hypothesis that Shannon was the first person to come up with the idea of using error-correcting codes to allow for the transmission of information over noisy channel... (read more)
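For concreteness, the kind of machinery being gestured at here is quite simple: the Hamming(7,4) code (published in 1950 by Richard Hamming, Shannon's officemate) corrects any single flipped bit using three parity bits. A minimal sketch, my own illustration rather than anything from the thread:

```python
def hamming74_encode(d):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7 = p1 p2 d1 p3 d2 d3 d4)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """Correct up to one flipped bit, then return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]    # checks positions 1,3,5,7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]    # checks positions 2,3,6,7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]    # checks positions 4,5,6,7
    error_pos = s1 + 2 * s2 + 4 * s3  # 0 means "no error detected"
    if error_pos:
        c[error_pos - 1] ^= 1         # flip the corrupted bit back
    return [c[2], c[4], c[5], c[6]]

codeword = hamming74_encode([1, 0, 1, 1])
codeword[5] ^= 1                      # simulate noise flipping one bit
assert hamming74_decode(codeword) == [1, 0, 1, 1]
```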

I would not consider error-correcting codes one of the more central Shannon discoveries. They were a major application which Shannon's discoveries highlighted, but not one of the more counterfactually impactful ideas in their own right.

I have a whole post here about how Shannon's discoveries qualitatively changed the way people thought about information. Very briefly: the major idea is that information/channel capacity is fungible. We do not need different kinds of channels to carry different kinds of information efficiently. Roughly speaking, any channel ... (read more)

Hartley's Transmission of Information was published in 1928, when Shannon was only 12 years old. Certainly Shannon produced a lot of new insights into the field, particularly in terms of formalizing things, but he did not invent the field. Are there particular advancements that Shannon in particular made that you expect would have taken many years to discover if Shannon had not discovered them?

2johnswentworth2mo
Smart-ass answer: "yes, all of the advancements that Shannon in particular made". That's probably not literally true, but my understanding is that it is at least true for the central results, i.e. there was nobody else even remotely close to making the main central discoveries which Shannon made (most notably the source coding theorem and noisy channel coding theorem).

Suppose we have such an agent, and it models the preferences of humanity. It models that humans cannot be sure that it will not destroy humanity, due to the probabilistic guarantees provided by its own action filter. It models that humans have a strong goal of self-preservation. It models that if it presents a risk to humanity, they will be forced to destroy it. Represented as a game, each player can either wait, or destroy. Assuming strong preferences for self-preservation, this game has a Nash equilibrium where the first mover destroys the other agent.

... (read more)
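To make the game concrete, here is a toy 2x2 version with payoffs I made up to match the stated preferences (strong self-preservation, a successful first strike leaves the striker unscathed), plus a brute-force check for pure-strategy Nash equilibria. The second payoff table sketches how a credible second-strike commitment, discussed elsewhere in this thread, changes the equilibria:

```python
from itertools import product

ACTIONS = ["wait", "destroy"]
# payoff[(row_action, col_action)] = (row_payoff, col_payoff); all numbers are illustrative.
payoff = {
    ("wait", "wait"):       ( 2,   2),
    ("wait", "destroy"):    (-10,  3),
    ("destroy", "wait"):    ( 3, -10),
    ("destroy", "destroy"): (-5,  -5),
}

def pure_nash_equilibria(payoff):
    eqs = []
    for r, c in product(ACTIONS, ACTIONS):
        row_ok = all(payoff[(r, c)][0] >= payoff[(r2, c)][0] for r2 in ACTIONS)
        col_ok = all(payoff[(r, c)][1] >= payoff[(r, c2)][1] for c2 in ACTIONS)
        if row_ok and col_ok:
            eqs.append((r, c))
    return eqs

print(pure_nash_equilibria(payoff))  # [('destroy', 'destroy')]

# With assured retaliation (MAD), striking first no longer leaves the striker unscathed,
# and mutual waiting becomes an equilibrium as well.
payoff_mad = dict(payoff, **{("wait", "destroy"): (-10, -8), ("destroy", "wait"): (-8, -10)})
print(pure_nash_equilibria(payoff_mad))  # [('wait', 'wait'), ('destroy', 'destroy')]
```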
1Seth Ahrenbach2mo
Correct. Are you intending for this to be a reductio ad absurdum?

I think it could be safely assumed that people have an idea of "software"

Speaking as a software developer who interacts with end-users sometimes, I think you might be surprised at what the mental model of typical software users, rather than developers, looks like. Those of us who have programmed, or who work a lot with computers, think of "software" as systems which do exactly what we tell them to do, whether or not that is what we meant. However, the world of modern software does its best to hide the sharp edges from users, and the culture of... (read more)

I suspect that you are attributing far too detailed of a mental model to "the general public" here. Riffing off your xkcd:

4AnthonyC2mo
And it applies to fields far less technical than AI research and geochemistry. I've been a consultant for years. My parents still regularly ask me what it is I actually do.
6Thane Ruthenis2mo
I don't think I'm doing so? I think it could be safely assumed that people have an idea of "software", and that they know that AI is a type of software. Other than that, I'm largely assuming that they have no specific beliefs about how AI works, a blank map. Which, however, means that when they think of AI, they think about generic "software", thereby importing their ideas about how software works to AI. And those ideas include "people wrote it", which is causing the misconception I suspect them to have. What's your view on this instead?

But there are like 10x more safety people looking into interpretability instead of how they generalize from data, as far as I can tell.

I think interpretability is a really powerful lens for looking at how models generalize from data, partly just in terms of giving you a lot more stuff to look at than you would have purely by looking at model outputs.

If I want to understand the characteristics of how a car performs, I should of course spend some time driving the car around, measuring lots of things like acceleration curves and turning radius and power ou... (read more)

I think if we're fine with building an "increaser of diamonds in familiar contexts", that's pretty easy, and yeah I think "wrap an LLM or similar" is a promising approach. If we want "maximize diamonds, even in unfamiliar contexts", I think that's a harder problem, and my impression is that the MIRI folks think the latter one is the important one to solve.

1RogerDearnaley2mo
What in my diamond maximization proposal above only works in familiar contexts? Most of it is (unsurprisingly) about crystallography and isotopic ratios, plus a standard causal wrapper. (If you look carefully, I even allowed for the possibility of FTL.) The obvious "brute force" solution to aimability is a practical, approximately Bayesian, GOFAI equivalent of AIXI that is capable of tool use and contains an LLM as a tool. This is extremely aimable — it has an explicit slot to plug a utility function in. Which makes it extremely easy to build a diamond maximizer, or a paperclip maximizer, or any other such x-risk. Then we need to instead plug in something that hopefully isn't an x-risk, like value learning or CEV or "solve goalcraft" as the terminal goal: figure out what we want, then optimize that, while appropriately pessimizing that optimization over remaining uncertainties in "what we want".

Thanks for the reply.

That is how MIRI imagines a sane developer using just-barely-aligned AI to save the world. You don't build an open-ended maximizer and unleash it on the world to maximize some quantity that sounds good to you; that sounds insanely difficult. You carve out as many tasks as you can into concrete, verifiable chunks, and you build the weakest and most limited possible AI you can to complete each chunk, to minimize risk. (Though per faul_sname, you're likely to be pretty limited in how much you can carve up the task, given time will be a

... (read more)

Good post!

In their most straightforward form (“foundation models”), language models are a technology which naturally scales to something in the vicinity of human-level (because it’s about emulating human outputs), not one that naturally shoots way past human-level performance

You address this to some extent later on in the post, but I think it's worth emphasizing the extent to which this specifically holds in the context of language models trained on human outputs. If you take a transformer with the same architecture but train it on a bunch of tokenized... (read more)

The quote from Paul sounds about right to me, with the caveat that I think it's pretty likely that there won't be a single try that is "the critical try": something like this (also by Paul) seems pretty plausible to me, and it is in cases like that that I particularly expect existing-but-imperfect tooling for interpreting and steering ML models to be useful.

Does anyone want to stop [all empirical research on AI, including research on prosaic alignment approaches]?

Yes, there are a number of posts to that effect.

That said, "there exist such posts" is not really why I wrote this. The idea I really want to push back on is one that I have heard several times in IRL conversations, though I don't know if I've ever seen it online. It goes like

There are two cars in a race. One is alignment, and one is capabilities. If the capabilities car hits the finish line first, we all die, and if the alignment car hits the f

... (read more)

But let’s be more concrete and specific. I’d like to know what’s the least impressive task which cannot be done by a 'non-agentic' system, that you are very confident cannot be done safely and non-agentically in the next two years.

Focusing on the "minimal" part of that, maybe something like "receive a request to implement some new feature in a system it is not familiar with, recognize how the limitations of the architecture that system make that feature impractical to add, and perform a major refactoring of that program to an architecture that is not so... (read more)

A lot of AI x-risk discussion is focused on worlds where iterative design fails. This makes sense, as "iterative design stops working" does in fact make problems much much harder to solve.

However, I think that even in the worlds where iterative design fails for safely creating an entire AGI, the worlds where we succeed will be ones in which we were able to do iterative design on the components of that safe AGI, and also able to do iterative design on the boundaries between subsystems, with the dangerous parts mocked out.

I am not optimistic about approaches that loo... (read more)

7ryan_greenblatt2mo
Maybe on LW; this seems way less true for lab alignment teams, open phil, and safety researchers in general. Also, I think it's worth noting the distinction between two different cases: * Iterative design against the problems you actually see in production fails. * Iterative design against carefully constructed test beds fails to result in safety in practice. (E.g. iterating against AI control test beds, model organisms, sandwiching setups, and other testbeds) See also this quote from Paul from here:
2ryan_greenblatt2mo
Does anyone want to stop this? I think some people just contest the usefulness of improving RLHF / RLAIF / constitutional AI as safety research and also think that it has capabilities/profit externalities. E.g. see discussion here. (I personally think this research is probably net positive, but typically not very important to advance at current margins from an altruistic perspective.)

Thanks for the clarification!

If you relax the "specific intended content" constraint, and allow for maximizing any random physical structure, as long as it's always the same physical structure in the real world and not just some internal metric that has historically correlated with the amount of that structure that existed in the real world, does that make the problem any easier / is there a known solution? My vague impression was that the answer was still "no, that's also not a thing we know how to do".

4Rob Bensinger2mo
I expect it makes it easier, but I don't think it's solved.

As in, AIs boosting human productivity might/should let us figure out how to make stuff safe as it comes up, so no need to be concerned about us not having a solution to the endpoint of that process before we've made the first steps?

I don't expect it to be helpful to block individually safe steps on this path, though it would probably be wise to figure out what unsafe steps down this path look like concretely (which you're doing!).

But yeah. I don't have any particular reason to expect "solve for the end state without dealing with any of the intermediate... (read more)

How about "able to automate most simple tasks where it has an example of that task being done correctly"? Something like that could make researchers much more productive. Repeat the "the most time consuming part of your workflow now requires effectively none of your time or attention" a few dozen times and that does end up being transformative compared to the state before the series of improvements.

I think "would this technology, in isolation, be transformative" is a trap. It's easy to imagine "if there was an AI that was better at everything than we do, t... (read more)

I'm not particularly concerned about AI being "transformative" or not. I'm concerned about AGI going rogue and killing everyone. And LLMs automating workflows is great and not (by itself) omnicidal at all, so that's... fine?

But I think what happens between now and when we have AIs that are better than humans-in-2023 at everything matters.

As in, AIs boosting human productivity might/should let us figure out how to make stuff safe as it comes up, so no need to be concerned about us not having a solution to the endpoint of that process before we've made the first steps... (read more)

I think the MIRI objection to that type of human-in-the-loop system is that it's not optimal because sometimes such a system will have to punt back to the human, and that's slow, and so the first effective system without a human in the loop will be vastly more effective and thus able to take over the world, hence the old "that's safe but it doesn't prevent someone else from destroying the world".

We can't just build a very weak system, which is less dangerous because it is so weak, and declare victory; because later there will be more actors that have the

... (read more)

Suppose you want to synthesize a lot of diamonds. Instead of giving an AI some lofty goal "maximize diamonds in an aligned way", why not a bunch of small grounded ones.

  1. "Plan the factory layout of the diamond synthesis plant with these requirements".
  2. "Order the equipment needed, here's the payment credentials".
  3. "Supervise construction this workday comparing to original plans"
  4. "Given this step of the plan, do it"
  5. (Once the factory is built) "remove the output from diamond synthesis machine A53 and clean it".

That is how MIRI imagines a sane developer using just-b... (read more)

"We don't currently have any way of getting any system to learn to robustly optimize for any specific goal once it enters an environment very different from the one it learned in" is my own view, not Nate's.

Like I think the MIRI folks are concerned with "how do you get an AGI to robustly maximize any specific static utility function that you choose".

I am aware that the MIRI people think that the latter is inevitable. However, as far as I know, we don't have even a single demonstration of "some real-world system that robustly maximizes any specific static u... (read more)

4Rob Bensinger2mo
To be clear: The diamond maximizer problem is about getting specific intended content into the AI's goals ("diamonds" as opposed to some random physical structure it's maximizing), not just about building a stable maximizer.
2Gerald Monroe2mo
So as an engineer I have trouble engaging with this as a problem. Suppose you want to synthesize a lot of diamonds. Instead of giving an AI some lofty goal "maximize diamonds in an aligned way", why not a bunch of small grounded ones. 1. "Plan the factory layout of the diamond synthesis plant with these requirements". 2. "Order the equipment needed, here's the payment credentials". 3. "Supervise construction this workday comparing to original plans" 4. "Given this step of the plan, do it" 5. (Once the factory is built) "remove the output from diamond synthesis machine A53 and clean it". And so on. And an outer framework should block the unqualified model from attempting any goal that the model doesn't have empirical confidence in, i.e., any goal that isn't in distribution for its training environment. I think the problem MIRI has is that this myopic model is not aware of context, and so it will do bad things sometimes. Maybe the diamonds are being cut into IC wafers and used in missiles to commit genocide. Is that what it is? Or maybe the fear is that one of these tasks could go badly wrong? That seems acceptable; industrial equipment causes accidents all the time, and the main thing is to limit the damage. Fences to limit the robot's operating area, timers that shut down control after a timeout, etc.

I don't think we have any way of getting an AI to "care about" any arbitrary particular thing at all, by the "attempt to maximize that thing, self-correct towards maximizing that thing if the current strategies are not working" definition of "care about". Even if we relax the "and we pick the thing it tries to maximize" constraint.

3bideup2mo
I don’t think that that’s the view of whoever wrote the paragraph you’re quoting, but at this point we’re doing exegesis.

Or in less metaphorical language, the worry is that mostly that it's hard to give the AI the specific goal you want to give it, not so much that it's hard to make it have any goal at all.

At least some people are worried about the latter, for a very particular meaning of the word "goal". From that post:

Finally, I'll note that the diamond maximization problem is not in fact the problem "build an AI that makes a little diamond", nor even "build an AI that probably makes a decent amount of diamond, while also spending lots of other resources on lots of ot

... (read more)
1RogerDearnaley2mo
This is rather off-topic here, but for any AI that has an LLM as a component of it, I don't believe diamond-maximization is a hard problem, apart from Inner Alignment problems. The LLM knows the meaning of the word 'diamond' (GPT-4 defined it as "Diamond is a solid form of the element carbon with its atoms arranged in a crystal structure called diamond cubic. It has the highest hardness and thermal conductivity of any natural material, properties that are utilized in major industrial applications such as cutting and polishing tools. Diamond also has high optical dispersion, making it useful in jewelry as a gemstone that can scatter light in a spectrum of colors."). The LLM also knows its physical and optical properties, its social, industrial and financial value, its crystal structure (with images and angles and coordinates), what carbon is, its chemical properties, how many electrons, protons and neutrons a carbon atom can have, its terrestrial isotopic ratios, the half-life of carbon-14, what quarks a neutron is made of, etc. etc. etc. — where it fits in a vast network of facts about the world. Even if the AI also had some other very different internal world model and ontology, there's only going to be one "Rosetta Stone" optimal-fit mapping between the human ontology that the LLM has a vast amount of information about and any other arbitrary ontology, so there's more than enough information in that network of relationships to uniquely locate the concepts in that other ontology corresponding to 'diamond'. This is still true even if the other ontology is larger and more sophisticated: for example, locating Newtonian physics in relativistic quantum field theory and mapping a setup from the former to the latter isn't hard: its structure is very clearly just the large-scale low-speed limiting approximation. The point where this gets a little more challenging is Outer Alignment, where you want to write a mathematical or pseudocode reward function for training a diamon
5bideup2mo
Hm, I think that paragraph is talking about the problem of getting an AI to care about a specific particular thing of your choosing (here diamond-maximising), not any arbitrary particular thing at all with no control over what it is. The MIRI-esque view thinks the former is hard and the latter happens inevitably.

I'm also not entirely clear on what scenario I should be imagining for the "humanity had survived (or better)" case.

I think that one is supposed to be parsed as "If AI wipes out humanity and colonizes the universe itself, the future will go about as well as, or go better than, if humanity had survived" rather than "If AI wipes out humanity and colonizes the universe itself, the future will go about as well as if humanity had survived or done better than survival".

emphasizing a plan to update after the fact should be viewed primarily through the lens of damage control.

Is anyone acting like that is not a damage control measure? I upvoted specifically because "do damage control" is better than "don't". Usually when I see a hit piece and a bunch of inaccuracies later come to light, I don't in fact see that damage control done afterwards.

Also I think this kind of within-tribe conflict gets lots of attention within the EA and LW social sphere. I expect that if Ben publishes corrections a bunch of people will read them.

5TracingWoodgrains2mo
My read is that many people still consider the publication of the original post to be prudent and responsible given the circumstances, while any updates based on information that comes to light here will be prudent and responsible given the new information. Instead, I think people should view the original post as imprudent and irresponsible to the extent that it did not give one side of an adversarial situation an adequate hearing-out (and it really seems like it didn't: a three-hour phone call where you misleadingly summarize their response as "Good summary!", then refuse to wait until they can provide a more substantive response, is extraordinarily bad practice given the hundreds of hours he mentions putting into the rest of the investigation), with any subsequent updates being judged as returning towards responsibility after the fact rather than continuing a pattern of prudence.

Strongly positive/negative relative to what? Relative to being more accurate initially, sure. Relative to being wrong but just not acknowledging it, no.

Specifically, I think people by default read this post as an example of good epistemic practice in a community built around good epistemic practice. In this case, though, I think the prior bad epistemic practice (not waiting for full information before publishing a highly consequential piece aimed at inflicting reputational damage on someone) is significant enough and bad enough that emphasizing a plan to update after the fact should be viewed primarily through the lens of damage control. 

The standard with this sort of investigative piece should be to... (read more)

Props for doing this! Mine:

I do feel like "disempower humanity" is a slightly odd framing. I'm operationalizing "humanity remains in power" as something along the lines of "most human governments continue collecting taxes and using those taxes on things like roads and hospitals, at least half of global energy usage is used in the process of achieving ends that specific humans want to achieve", and "AI disempowers humans" as being that "humanity remains in power" becomes false specifically due to AI.

But there's another interpretation that goes something lik... (read more)

I think this is a good analogy. Though I think "one day you might have to dynamite a bunch of innocent people's homes to keep a fire from spreading, that's part of the job" is a good thing to have in the training if that's the sort of thing that's likely to come up.

The term "efficacy nod" is a little confusing, the FDA term is "reasonable expectation of effectiveness", which makes more sense to me, it sounds like the drug has enough promise that the FDA thinks its worth continuing testing. They may not have actual effectiveness data yet, just evidence that it's safe and a reasonable explanation for why it might work.

That's what I thought too, but the FDA's website indicates that a company that gets conditional approval can sell a drug where they have adequately demonstrated safety but have not demonstrated efficac... (read more)

5harsimony2mo
Agreed! Beyond potentially developing a drug, I think Loyal's strategy has the potential to change regulations around longevity drugs, raise profits for new trials, and bring attention/capital to the longevity space. I don't see many downside risks here unless the drug turns out to be unsafe.

Depends on how big of a model you're trying to train, and how you're trying to train it.

I was imagining something along the lines of "download the full 100TB torrent, which includes 88M articles, and extract the text of each article" ("extract text from a given PDF" isn't super reliable, but it should be largely doable). That should leave you somewhere in the ballpark of 4TB of uncompressed plain text; if you're using a BPE, that would leave you with ~1T tokens.

If you're trying to do the chinchilla optimality thing, I fully agree that there's no way you're going... (read more)
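For a rough sense of scale, here is the back-of-the-envelope arithmetic (all numbers are my own assumptions, not anything measured in the thread):

```python
# Rough scaling arithmetic for the scenario above.
corpus_bytes = 4e12                      # ~4 TB of extracted plain text
bytes_per_token = 4                      # rough BPE average for English text
tokens = corpus_bytes / bytes_per_token  # ~1e12, i.e. ~1T tokens

# Chinchilla heuristic: roughly 20 training tokens per parameter.
chinchilla_params = tokens / 20          # ~5e10, i.e. a ~50B-parameter model

# Training FLOPs ~ 6 * params * tokens.
train_flops = 6 * chinchilla_params * tokens  # ~3e23 FLOPs, far beyond hobbyist budgets

print(f"tokens ~ {tokens:.1e}, Chinchilla-optimal params ~ {chinchilla_params:.1e}, "
      f"training compute ~ {train_flops:.1e} FLOPs")
```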

i.e. $1000-$2000 in drive space, or $20 / day to store on Backblaze if you don't anticipate needing it for more than a couple of months tops.

1Shankar Sivarajan2mo
You're correct that simply storing the entire database isn't infeasible. But as I understand it, that's large enough that training a model on that is too expensive for most hobbyists to do just for kicks.

I agree that that's the only realistic way. That doesn't mean I expect it to be popular or something Loyal wants to draw attention to.

I think the kind of AI likely to take over the world can be described closely enough in such a way. Certainly for the kind of aligned AI that saves the world, it seems likely to me that expected utility is sufficient to think about how it thinks about its impact on the world.

What observations are backing this belief? Have you seen approaches that share some key characteristics with expected utility maximization approaches which have worked in real-world situations, and where you expect that the characteristics that made it work in the situation you obse... (read more)

I think using dogs for life extension research makes at least as much sense as raising pigs for food.

More interestingly, it also seems to happen for things like countries, companies, products and communities

I think this is a function of "create a new instance of something" being an easier problem than "fix a broken instance of that thing". If there are any types of damage that you can't fix, you will accumulate those types of damage over time. Consider teeth -- pretty simple to grow, but once they're exposed to the world your body can't repair them, so... (read more)

1StartAtTheEnd2mo
That makes sense! If it's 'cheaper', then evolution will choose it. Thinking about it, I also think that we sometimes kill or replace parts of something so that the rest can live. If we have bad habits, then we need to kill those habits before they kill us. I've long thought that adaptability is important to survival, and that inflexibility means death, but it makes sense that we haven't evolved ways to heal all kinds of damage, and that certain noise/damage/waste accumulates until we break.

That is a really good point that there are intermediate scenarios -- "thump" sounds pretty plausible to me as well, and the likely-to-be-effective mitigation measures are again different.

I also postulate "splat": one AI/human coalition comes to believe that they are militarily unconquerable, another coalition disagrees, and the resulting military conflict is sufficient to destroy supply chains and also drops us into an equilibrium where supply chains as complex as the ones we have can't re-form. Technically you don't need an AI for this one, but if you had... (read more)

It's surprising, to say the least, to see a company go from zero information to efficacy nod, because, well, what are you basing your efficacy on? How did you recruit your patients and veterinary partners to help you with efficacy? Did you make them all sign some incredibly airtight NDAs?

I suspect that the answer for "what are they basing efficacy on" is "animal testing on dogs", and that's also why you're not hearing them announce the specifics to the general public.

1inaimathi2mo
I mean ... how else are you supposed to test a novel treatment for dogs? I don't have a good sense for the space, but my prior is higher on silence-given-no-testing than silence-given-tests-on-the-eventual-target here. If they had tests on dogs that showed significant results, I'd expect the headline/pitch/whatever to be "Double the remaining lifespan of your dog!" or something. 