All of alexflint's Comments + Replies

My take on Michael Littman on "The HCI of HAI"

Yes, I agree, it's difficult to find explicit and specific language for what it is that we would really like to align AI systems with. Thank you for the reply. I would love to read such a story!

My take on Michael Littman on "The HCI of HAI"

Thank you for the kind words.

for example, being aware that human intentions can change -- it's not obvious that the right move is to 'pop out' further and assume there is something 'bigger' that the human's intentions should be aligned with. Could you elaborate on your vision of what you have in mind there?

Well it would definitely be a mistake to build an AI system that extracts human intentions at some fixed point in time and treats them as fixed forever, yes? So it seems to me that it would be better to build systems predicated on that which is the u... (read more)

Jean Monnet: The Guerilla Bureaucrat

Thank you for this post Martin. The anecdotes you've assembled here are delightful and insightful.

The two Governments declare that France and Great Britain shall no longer be two nations, but one Franco-British Union. [...] And thus we shall conquer.

A world in which full political union between Britain and France is contemplated in desperation is very much less complacent than the world we live in today. It would be good for our world to become less complacent, but how can we become less complacent without a big crisis? What are the times and places wh... (read more)

2 · evhub · 3mo: Np! Also, just going through the rest of the proposals in my 11 proposals paper [https://arxiv.org/abs/2012.07532], I'm realizing that a lot of the other proposals also try to avoid a full agency hand-off. STEM AI [https://www.alignmentforum.org/posts/fRsjBseRuvRhMPPE5/an-overview-of-11-proposals-for-building-safe-advanced-ai#6__STEM_AI] restricts the AI's agency to just STEM problems, narrow reward modeling [https://www.alignmentforum.org/posts/fRsjBseRuvRhMPPE5/an-overview-of-11-proposals-for-building-safe-advanced-ai#7__Narrow_reward_modeling___transparency_tools] restricts individual AIs to only apply their agency to narrow domains, and the amplification and debate proposals are trying to build corrigible question-answering systems rather than do a full agency hand-off.
Thoughts on Iason Gabriel’s Artificial Intelligence, Values, and Alignment

Our choice is not between having humans run the world and having a benevolent god run the world.

Right, I agree that having a benevolent god run the world is not within our choice set.

Our choice is between having humans run the world, and having humans delegate the running of the world to something else (which is kind of just an indirect way of running the world).

Well just to re-state the suggestion in my original post: is this dichotomy between humans running the world or something else running the world really so inescapable? The child in the sand ... (read more)

2 · paulfchristiano · 3mo: I buy into the delegation framing, but I think that the best targets for delegation look more like "slightly older and wiser versions of ourselves with slightly more space" (who can themselves make decisions about whether to delegate to something more alien). In the sand-pit example, if the child opted into that arrangement then I would say they have effectively delegated to a version of themselves who is slightly constrained and shaped by the supervision of the adult. (But in the present situation, the most important thing is that the parent protects them from the outside world while they have time to grow.)
Thoughts on Iason Gabriel’s Artificial Intelligence, Values, and Alignment

Thank you for this jbash.

Humans aren't fit to run the world, and there's no reason to think humans can ever be fit to run the world

My short response is: Yes, it would be very bad for present-day humanity to have more power than it currently does, since its current level of power is far out of proportion to its level of wisdom and compassion. But it seems to me that there are a small number of humans on this planet who have moved some way in the direction of being fit to run the world, and in time, more humans could move in this direction, and could mov... (read more)

2 · paulfchristiano · 3mo: If the humans in the container succeed in becoming wiser, then hopefully it is wise for us to leave this decision up to them rather than to preemptively make it now (and so I think the situation is even better than it sounds superficially). It seems like the real thing up for debate will be about power struggles amongst humans -- if we had just one human, then it seems to me like the grandparent's position would be straightforwardly incoherent. This includes, in particular, competing views about what kind of structure we should use to govern ourselves in the future.
Reflections on Larks’ 2020 AI alignment literature review

I very much agree with these two:

On the other hand, there are lots of people who really do want to help, for the right reason. So if growth is the goal, helping these people out seems like just an obvious thing to do

So I think there is a lot of room for growth, by just helping the people who are already involved and trying.

Reflections on Larks’ 2020 AI alignment literature review

Thank you for this thoughtful comment Linda -- writing this reply has helped me to clarify my own thinking on growth and depth. My basic sense is this:

If I meet someone who really wants to help out with AI safety, I want to help them to do that, basically without reservation, regardless of their skill, experience, etc. My sense is that we have a huge and growing challenge in navigating the development of advanced AI, and there is just no shortage of work to do, though it can at first be quite difficult to find. So when I meet individuals, I will try to ... (read more)

3 · Linda Linsefors · 3mo: Ok, that makes sense. Seems like we are mostly on the same page then. I don't have strong opinions on whether drawing in people via prestige is good or bad. I expect it is probably complicated. For example, there might be people who want to work on AI Safety for the right reason, but are too agreeable to do it unless it reaches some level of acceptability. So I don't know what the effects will be on net. But I think it is an effect we will have to handle, since prestige will be important for other reasons. On the other hand, there are lots of people who really do want to help, for the right reason. So if growth is the goal, helping these people out seems like just an obvious thing to do. I expect there are ways funders can help out here too. I would not update much on the fact that currently most research is produced by existing institutions. It is hard to do good research, and even harder without colleagues, salary and other support that comes with being part of an org. So I think there is a lot of room for growth, by just helping the people who are already involved and trying.
Belief Functions And Decision Theory

Ah this is helpful, thank you.

So let's say I'm estimating the position of a train on a straight section of track as a single real number and I want to do an update each time I receive a noisy measurement of the train's position. Under the theory you're laying out here I might have, say, three Gaussians N(0, 1), N(1, 10), N(4, 6), and rather than updating a single pdf over the position of the train, I'm updating measures associated with each of these three pdfs. Is that roughly correct?

(I realize this isn't exactly a great example of how to use this theory s... (read more)
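To make the question concrete, here is a minimal sketch (my own toy code, nothing from the write-up, and quite possibly not the actual infra-Bayesian update) of the naive per-hypothesis update I have in mind, assuming the second parameter of each Gaussian is a variance and assuming some known measurement noise:

```python
import math

# Three hypothetical Gaussians over the train's position: (mean, variance),
# each carrying a weight/measure that gets updated alongside its parameters.
hypotheses = [
    {"mean": 0.0, "var": 1.0, "weight": 1.0},
    {"mean": 1.0, "var": 10.0, "weight": 1.0},
    {"mean": 4.0, "var": 6.0, "weight": 1.0},
]

MEASUREMENT_VAR = 2.0  # assumed variance of the position sensor's noise

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def naive_update(hypotheses, measurement):
    for h in hypotheses:
        # scale the measure by how well this hypothesis predicted the measurement
        h["weight"] *= gaussian_pdf(measurement, h["mean"], h["var"] + MEASUREMENT_VAR)
        # standard conjugate (Kalman-style) update of the hypothesis itself
        gain = h["var"] / (h["var"] + MEASUREMENT_VAR)
        h["mean"] += gain * (measurement - h["mean"])
        h["var"] *= 1 - gain
    return hypotheses

naive_update(hypotheses, measurement=2.5)
```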

2 · Vanessa Kosoy · 2mo: I'm not sure I understood the question, but the infra-Bayesian update is not equivalent to updating every distribution in the convex set of distributions. In fact, updating a crisp infra-distribution (i.e. one that can be described as a convex set of distributions) in general produces an infra-distribution that is not crisp (i.e. you need sa-measures to describe it or use the Legendre dual view).
Belief Functions And Decision Theory

Thank you for your work both in developing this theory and putting together this heroic write-up! It's really a lot of work to write all this stuff out.

I am interested in understanding the thing you're driving at here, but I'm finding it difficult to navigate because I don't have much of a sense for where the definitions are heading. I'm really looking for an explanation of what exactly is made possible by this theory, so that as I digest each of the definitions I have a sense for where this is all heading.

My current understanding is that this is a... (read more)

4 · Diffractor · 3mo: So, first off, I should probably say that a lot of the formalism overhead involved in this post in particular feels like the sort of thing that will get a whole lot more elegant as we work more things out, but "Basic inframeasure theory" still looks pretty good at this point and worth reading, and the basic results (ability to translate from pseudocausal to causal, dynamic consistency, capturing most of UDT, definition of learning) will still hold up. Yes, your current understanding is correct, it's rebuilding probability theory in more generality to be suitable for RL in nonrealizable environments, and capturing a much broader range of decision-theoretic problems, as well as whatever spin-off applications may come from having the basic theory worked out, like our infradistribution logic stuff. It copes with unrealizability because its hypotheses are not probability distributions, but sets of probability distributions (actually more general than that, but it's a good mental starting point), corresponding to properties that reality may have, without fully specifying everything. In particular, if a class of belief functions (read: properties the environment may fulfill) is learned, this implies that for all properties within that class that the true environment fulfills (you don't know the true environment exactly), the infrabayes agent will match or exceed the expected utility lower bound that can be guaranteed if you know reality has that property (in the low-time-discount limit). There's another key consideration which Vanessa was telling me to put in which I'll post in another comment once I fully work it out again. Also, thank you for noticing that it took a lot of work to write all this up, the proofs took a while. n_n
Reflections on Larks’ 2020 AI alignment literature review

Yeah so to be clear, I do actually think strategy research is pretty important; I just notice that in practice most of the strategy write-ups I read do not enlighten me very much, whereas it's not so uncommon to read technical write-ups that seem to really move our understanding forward. I guess it's more that doing truly useful strategy research is just ultra difficult. I do think that, for example, some of Bostrom's and Yudkowsky's early strategy write-ups were ultra useful and important.

Search versus design

Yes I agree with this. Another example is the way a two-by-four length of timber is a kind of "interface" between the wood mill and the construction worker. There is a lot of complexity in the construction of these at the wood mill, but the standard two-by-four means that the construction worker doesn't have to care. This is also a kind of factorization that isn't about decomposition into parts or subsystems.

Search versus design

Nice post, very much the type of work I'd like to see more of.

Thank you!

I'm not sure I'd describe this work as "notorious", even if some have reservations about it.

Oops, terrible word choice on my part. I edited the article to say "gained attention" rather than "gained notoriety".

I think this is incorrect - for example, "biological systems are highly modular, at multiple different scales". And I expect deep learning to construct minds which are also fairly modular. That also allows search to be more useful, because it can make changes which are co

... (read more)
Search versus design

And thus the wheel of the Dharma was set in motion once again, for one more great turning of time

3 · adamShimi · 8mo: If there was a vote for the best comment thread of 2020, that would probably be it for me.
Search versus design

Ah this is a different Ben.

3 · Ben Pace · 8mo: Then I will prepare for combat.
Search versus design

I think this is a very good summary

2 · rohinmshah · 8mo: Thanks :)
Search versus design

I think my real complaint here is that your story is getting its emotional oomph from an artificial constraint (every output must be 100% correct or many beings die) that doesn't usually hold, not even for AI alignment

Well OK I agree that "every output must be 100% correct or many beings die" is unrealistic. My apologies for a bad choice of toy problem that suggested that I thought such a stringent requirement was realistic.

But would you agree that there are some invariants that we want advanced AI systems to have, and that we really want to be very con... (read more)

2 · rohinmshah · 8mo: I agree that there are some invariants that we really would like to hold, but I don't think it should necessarily be thought of in the same way as in the sorting example. Like, it really would be nice to have a 100% guarantee on intent alignment. But it's not obvious to me that you should think of it as "this neural network output has to satisfy a really specific and tight constraint for every decision it ever makes". It's not like for every possible low-level action a neural net is going to take, it's going to completely rethink its motivations / goals and forward-chain all the way to what action it should take. The risk seems quite a bit more nebulous: maybe the specific motivation the agent has changes in some particular weird scenario, or would predictably drift away from what humans want as the world becomes very different from the training setup. (All of these apply to humans too! If I had a human assistant who was intent aligned with me, I might worry that if they were deprived of food for a long time, they might stop being intent aligned with me; or if I got myself uploaded, then they may see the uploaded-me as a different person and so no longer be intent aligned with me. Nonetheless, I'd be pretty stoked to have an intent aligned human assistant.) There is a relevant difference between humans and AI systems here, which is that we expect that we'll be ceding more and more decision-making influence to AI systems over time, and so errors in AI systems are more consequential than errors in humans. I do think this raises the bar for what properties we want out of AI systems, but I don't think it gets to the point of "every single output must be correct", at least depending on what you mean by "correct". Re: the ARCHES point: I feel like an AI system would only drastically modify the temperature "intentionally". Like, I don't worry about humans "unintentionally" jumping into a volcano. The AI system could still do such a thing, even if intent aligned (e.g.
Search versus design

Obviously a regular sorting algorithm would be better, but if the choice were between the neural net and a human, and you knew there wasn't going to be any distributional shift, I would pick the neural net.

Well, sure, but this is a pretty low bar, no? Humans are terrible at repetitive tasks like sorting numbers.

Better than any of these solutions is to not have a system where a single incorrect output is catastrophic.

Yes very much agreed. It is actually incredibly challenging to build systems that are robust to any particular algorithm failing, espec... (read more)

9 · rohinmshah · 8mo: It may be a low bar, but it seems like the right bar if you're thinking on the margin? It's what we use for nuclear reactors, biosecurity, factory safety, etc. (See also Engineering a Safer World [https://www.alignmentforum.org/posts/bP6KA2JJQMke8H4Au/an-112-engineering-a-safer-world].) I think my real complaint here is that your story is getting its emotional oomph from an artificial constraint (every output must be 100% correct or many beings die) that doesn't usually hold, not even for AI alignment. If you told the exact same story but replaced the neural net with a human, the correct response would be "why on earth does your system rely on a human to perfectly sort; go design a new system". I think we should react basically the same way when you tell this story with neural nets. The broader heuristic I'm using: you should not be relying on stories that seem ridiculous if you replaced AIs with humans, unless you specifically identify a relevant difference between AIs and humans that matters for that story. (Plausibly you could believe that even if we never built AI systems, humans would still cause an existential catastrophe, and so we need to hold AI systems to a higher standard than humans. If so, it would be good to make this assumption clear, as to my knowledge it isn't standard.)
Search versus design

Well said, friend.

Yes when we have a shared understanding of what we're building together, with honest and concise stories flowing in both directions, we have a better chance of actually understanding what all the stakeholders are trying to achieve, which at least makes it possible to find a design that is good for everyone.

The distinction you point out between a design error and missing information seems like a helpful distinction to me. Thank you.

It reminds me of the idea of interaction games that CHAI is working on. Instead of having a human give a full... (read more)

Search versus design

One key to this whole thing seems to be that "helpfulness" is not something that we can write an objective for. But I think the reason that we can't write an objective for it is better captured by inaccessible information than by Goodhart's law.

By "other-izer problem", do you mean the satisficer and related ideas? I'd be interested in pointers to more "other-izers" in this cluster.

But isn't it the case that these approaches are still doing something akin to search in the sense that they look for any element of a hypothesis space meeting some conditions (pe... (read more)

2 · Charlie Steiner · 8mo: Well, any process that picks actions ends up equivalent to some criterion, even if only "actions likely to be picked by this process." The deal with agents and agent-like things is that they pick actions based on their modeled consequences. Basically anything that picks actions in a different way (or, more technically, a way that's complicated to explain in terms of planning) is an other-izer to some degree. Though maybe this is drift from the original usage, which wanted nice properties like reflective stability etc. The example of the day is language models. GPT doesn't pick its next sentence by modeling the world and predicting the consequences. Bam, other-izer. Neither design nor search. Anyhow, back on topic, I agree that "helpfulness to humans" is a very complicated thing. But maybe there's some simpler notion of "helpful to the AI" that results in design-like other-izing that loses some of the helpfulness-to-humans properties, but retains some of the things that make design seem safer than search even if you never looked at the "stories."
Search versus design

Hey thank you for your thoughts on this post, friend

overall "design" is being used as a kind of catch-all that is probably very complicated

Yes it may be that "automating design" is really just a rephrasing of the whole AI problem. But I'm hopeful that it's not. Keep in mind that we only have to be competitive with machine learning, which means we only have to be able to automate the design of artifacts that can also be produced by black box search. This seems to me to be a lower bar than automating all human capacity for design, or automating design in... (read more)

Search versus design

I am not sure that designed artefacts are automatically easily interpretable.

It is certainly not the case that designed artifacts are easily interpretable. An unwieldy and poorly documented codebase is a design artifact that is not easily interpretable.

Design at its best can produce interpretable artifacts, whereas the same is not true for machine learning.

The interpretability of artifacts is not a feature of the artifact itself but of the pair (artifact, story), or you might say (artifact, documentation). We design artifacts in such a way that it is po... (read more)

On Need-Sets

If this is true then a larger need-set would lead to more negative motivation due to there being more ways for something we think we need to be taken away from us.

Yes, exactly.

So the solution is for us to give up those "needs" in the need-set that aren't actually needed for us to do what must be done, yes? We might believe that we need a cushy mattress to sleep on, a netflix account to entertain us, and a wardrobe of clothes to wear. If we simply satisfy these needs by acquiring all these things then we don't really become happy because now we're j... (read more)

1 · Marcello · 8mo: I broadly agree. Though I would add that those things could still be (positive motivation) wants afterwards, which one pursues without needing them. I'm not advocating for asceticism. Also, while I agree that you get more happiness by having fewer negative motives, being run by positive motives is not 100% happiness. One can still experience disappointment if one wants access to Netflix, and it's down for maintenance one day. However, disappointment is still both more hedonic than fear and promotes a more measured reaction to the situation.
On Need-Sets

First point: I was quite surprised when you said that

the environment we evolved in is far harsher than the one we find ourselves in today

Our ancestral environment was harsher in terms of providing fewer means to satisfy our need-sets, yet the modern environment seems to me harsher in terms of the overall level of unhappiness. Perhaps the reason for this unhappiness is that as our need-sets grow, we become more and more entrenched in anxiety and depression as we are negatively motivated more and more of the time. I think you said some of this yourself i... (read more)

1 · Marcello · 8mo: 1. Yes, I agree with the synopsis (though expanded need-sets are not the only reason people are more anxious in the modern world). 2. Ah. Perhaps my language in the post wasn't as clear as it could have been. When I said: I was thinking of the needs as already being about what seems true about future states of the world, not just present states. For example, your need for drinking water is about being able to get water when thirsty at a whole bunch of future times. Yes, exactly.
Search versus design

Nice write-up. The note about adversarial examples for LIME and SHAP was not something I've come across before - very cool.

Thanks for the pointer to Daniel Filan's work - that is indeed relevant and I hadn't read the paper before now.

Our take on CHAI’s research agenda in under 1500 words

Well, yes, one way to help some living entity is to (1) interpret it as an agent, and then (2) act in service of the terminal goals of that agent. But that's not the only way to be helpful. It may also be possible to directly be helpful to a living entity that is not an agent, without getting any agent concepts involved at all.

I definitely don't know how to do this, but the route that avoids agent models entirely seems more plausible to me compared to working hard to interpret everything using some agent model that is often a really poor fit, and then helping... (read more)

Our take on CHAI’s research agenda in under 1500 words

That part is my take on further challenges from an embedded agency perspective.

Our take on CHAI’s research agenda in under 1500 words

At first glance I think it feels a bit nonsensical to try to "help" a rainforest. But, I'm kinda worried that it'll turn out that it's not (much) less nonsensical to try to help a human, and figuring out how to help arbitrary non-obviously-agenty systems seems like it might be the sort of thing we have to understand.

Yeah this question of what it really means to help some non-pure-agent living entity seems more and more central to me. It also, unfortunately, seems more and more difficult. Another way that I state the question in order to meditate on it is: what does it mean to act out of true love?

Our take on CHAI’s research agenda in under 1500 words

I think my main point is that "CHAI's agenda depends strongly on an agent assumption" seems only true of the specific mathematical formalization that currently exists; I would not be surprised if the work could then be generalized to optimizing systems instead of agents / EU maximizers in particular.

Ah, very interesting, yeah I agree this seems plausible, and also this is very encouraging to me!

The ground of optimization

I'd say the utility function needs to contain one or more local optima with large basins of attraction that contain the initial state, not that the utility function needs to be simple. The simplest possible utility function is a constant function, which allows the system to wander aimlessly and certainly doesn't "correct" in any way for perturbations.

2 · ESRogs · 9mo: Ah, good points!
The ground of optimization

Well most systems don't have a tendency to evolve towards any small set of target states despite perturbations. Most systems, if you perturb them, just go off in some different direction. For example, if you perturb most running computer programs by modifying some variable with a debugger, they do not self-correct. Same with the satellite and billiard balls example. Most systems just don't have this "attractor" dynamic.
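As a toy illustration of this attractor property (my own sketch, not anything from the original post): a gradient-descent process returns to a small neighbourhood of its target even after a perturbation, whereas a generic drifting process just ends up wherever the perturbation sent it.

```python
def gradient_descent_step(x, lr=0.1):
    # minimise f(x) = x**2; the target set is a small neighbourhood of x = 0
    return x - lr * 2 * x

def drift_step(x, v=1.0):
    # a system with no attractor: it just keeps moving
    return x + v

def run(step, x0, perturb_at, perturb_by, n_steps=100):
    x = x0
    for t in range(n_steps):
        if t == perturb_at:
            x += perturb_by  # an external perturbation partway through
        x = step(x)
    return x

print(run(gradient_descent_step, x0=5.0, perturb_at=50, perturb_by=40.0))  # back near 0: it self-corrects
print(run(drift_step, x0=5.0, perturb_at=50, perturb_by=40.0))             # far away: no correction at all
```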

2 · ESRogs · 9mo: Hmm, I see what you're saying, but there still seems to be an analogy to me here with arbitrary utility functions, where you need the set of target states to be small (as you do say). Otherwise I could just say that the set of target states is all the directions the system might fly off in if you perturb it. So you might say that, for this version of optimization to be meaningful, the set of target states has to be small (however that's quantified), and for the utility maximization version to be meaningful, you need the utility function to be simple (however that's quantified). EDIT: And actually, maybe the two concepts are sort of dual to each other. If you have an agent with a simple utility function, then you could consider all its local optima to be a (small) set of target states for an optimizing system. And if you have an optimizing system with a small set of target states, then you could easily convert that into a simple utility function with a gradient towards those states. And if your utility function isn't simple, maybe you wouldn't get a small set of target states when you do the conversion, and vice versa?
Book Summary: Consciousness and the Brain

Thank you for this helpful summary Kaj. I found the part about what exactly the "conscious" and "unconscious" parts of the mind are capable of fascinating.

In the meditation training I've done, a late step on the path is to let go of consciousness entirely. I haven't experienced this directly so I can't speak much to it, but it certainly suggests that what my teacher means by consciousness is very different to that of this book.

The ground of optimization

Thanks for the very thoughtful comment Rohin. I was on retreat last week after I published the article and upon returning to computer usage I was delighted by the engagement from you and others.

Generality of intelligence: The generality of O’s intelligence is a function of the number and diversity of tasks T that it can solve, as well as its performance on those tasks.

I like this.

We'll presumably need to give O some information about the goal / target configuration set for each task. We could say that a robot capable of moving a vase around is a little

... (read more)
4 · rohinmshah · 10mo: +1 to all of this. I was imagining that the tasks can come equipped with some specification, but some sort of counterfactual also makes sense. This also gets around issues of the AI system not being appropriately "motivated" -- e.g. I might be capable of performing the task "lock up puppies in cages", but I wouldn't do it, and so if you only look at my behavior you couldn't say that I was capable of doing that task. +1 especially to this
Focus: you are allowed to be bad at accomplishing your goals

Lastly, I pick a distance between policies. If the two policies are deterministic, a Hamming distance will do. If they are stochastic, maybe some vector distance based on the Kullback-Leibler divergence.

I think it might actually be very difficult to come up with a distance metric between policies that corresponds even reasonably well to behavioral similarity. I imagine that flipping the sign on a single crucial parameter in a neural net could completely change its behavior, or at least break it sufficiently that it goes from highly goal oriented behavio

... (read more)
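For reference, here is a rough sketch (my own assumptions about representation, not adamShimi's definition) of the KL-based behavioural distance the quoted passage gestures at, for stochastic policies given as explicit state-to-action-distribution tables:

```python
import math

def kl(p, q, eps=1e-12):
    # KL divergence between two discrete action distributions
    # (note: KL is not symmetric, so this is not a true metric)
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def policy_distance(pi1, pi2):
    # average the per-state KL divergence over a shared finite state set
    states = pi1.keys()
    return sum(kl(pi1[s], pi2[s]) for s in states) / len(states)

# two toy stochastic policies over two states and two actions
pi_a = {"s0": [0.9, 0.1], "s1": [0.5, 0.5]}
pi_b = {"s0": [0.6, 0.4], "s1": [0.5, 0.5]}
print(policy_distance(pi_a, pi_b))
```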
1 · adamShimi · 9mo: Sorry for the delay in answering. In this post, I assume that a policy is a description of its behavior (like a function from state to action or distribution over action), and thus the distances mentioned indeed capture behavioral similarity. That being said, you're right that a similar concept of distance between the internal structure of the policies would prove difficult, eventually butting against uncomputability.
The ground of optimization

My biggest objection to this definition is that it inherently requires time

Fascinating - but why is this an objection? Is it just the inelegance of not being able to look at a single time slice and answer the question of whether optimization is happening?

One class of cases which definitely seem like optimization but do not satisfy this property at all: one-shot non-iterative optimization.

Yes this is a fascinating case! I'd like to write a whole post about it. Here are my thoughts:

  • First, just as a fun fact, note that it's actually extremely rare to
... (read more)
2 · johnswentworth · 9mo: No, the issue is that the usual definition of an optimization problem (e.g. max_x f(x)) has no built-in notion of time, and the intuitive notion of optimization (e.g. "the system makes Y big") has no built-in notion of time (or at least linear time). It's this really fundamental thing that isn't present in the "original problem", so to speak; it would be very surprising and interesting if time had to be involved when it's not present from the start. If I specifically try to brainstorm things-which-look-like-optimization-but-don't-involve-objective-improvement-over-time, then it's not hard to come up with examples:

  • Rather than a function-value "improving" along linear time, I could think about a function value improving along some tree or DAG - e.g. in a heap data structure, we have a tree where the "function value" always "improves" as we move from any leaf toward the root. There, any path from a leaf to the root could be considered "time" (but the whole set of nodes at the "same level" can't be considered a time-slice, because we don't have a meaningful way to compare whole sets of values; we could invent one, but it wouldn't actually reflect the tree structure).
  • The example from the earlier comment: a one-shot non-iterative optimizer.
  • A distributed optimizer: the system fans out, tests a whole bunch of possible choices in parallel, then selects the best of those.
  • Various flavors of constraint propagation, e.g. the simplex algorithm (and markets more generally).
4 · johnswentworth · 10mo: Another big thing to note in examples like e.g. iteratively computing a square root for the quadratic formula or iteratively computing eigenvalues to solve a matrix: the optimization problems we're solving are subproblems, not the original full problem. These crucially differ from most of the examples in the OP in that the system's objective function (in your sense) does not match the objective function (in the usual intuitive sense). They're iteratively optimizing a subproblem's objective, not the "full" problem's objective. That's potentially an issue for thinking about e.g. AI as an optimizer: if it's using iterative optimization on subproblems, but using those results to perform some higher-level optimization in a non-iterative manner, then aligning the subproblem-optimizers may not be synonymous with aligning the full AI. Indeed, I think a lot of reasoning works very much like this: we decompose a high-dimensional problem into coupled low-dimensional subproblems [https://www.lesswrong.com/posts/pT48swb8LoPowiAzR/everyday-lessons-from-high-dimensional-optimization] (i.e. "gears"), then apply iterative optimizers to the subproblems. That's exactly how eigenvalue algorithms work, for instance: we decompose the full problem into a series of optimization subproblems in narrower and narrower subspaces, while the "high-level" part of the algorithm (i.e. outside the subproblems) doesn't look like iterative optimization.
The ground of optimization

Thank you Ben. Reading this really filled me with joy and gives me energy to write more. Thank you for your curation work - it's a huge part of why there is this place for such high quality discussion of topics like this, for which I'm very grateful.

2 · Ben Pace · 10mo: You're welcome :-)
The ground of optimization

suppose you have a box with a rock in it, in an otherwise empty universe [...]

Yes you're right, this system would be described by a constant utility function, and yes this is analogous to the case where the target configuration set contains all configurations, and yes this should not be considered optimization. In the target set formulation, we can measure the degree of optimization by the size of the target set relative to the size of the basin of attraction. In your rock example, the sets have the same size, so it would make sense to say that the degr

... (read more)
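One way to make that comparison quantitative, as a rough sketch of the idea in the preceding comment rather than a definition from the post, is to count the bits by which the target set is smaller than the basin of attraction:

```latex
\text{degree of optimization} \approx \log_2 \frac{|\text{basin of attraction}|}{|\text{target set}|}
```

On this rough measure the rock example comes out at zero bits, since the two sets coincide.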
The ground of optimization

Yeah I agree that duality is not a good measure of whether a system contains something like an AI. There is one kind of AI that we can build that is highly dualistic. Most present-day AI systems are quite dualistic, because they are predicated on having some robust compute infrastructure that is separate from and mostly unperturbed by the world around it. But there is every reason to go beyond these dualistic designs, for precisely the reason you point to: such systems do tend to be somewhat brittle.

I think it's quite feasible to build highly robust AI sys

... (read more)
The ground of optimization

Great examples! Thank you.

  1. Consider adding a big black hole in the middle of a galaxy. Does this turn the galaxy into a system optimising for a really big black hole in the middle of the galaxy?

Yes this would qualify as an optimizing system by my definition. In fact just placing a large planet close to a bunch of smaller planets would qualify as an optimizing system if the eventual result is to collapse the mass of the smaller planets into the larger planet.

This seems to me to be a lot like a ball rolling down a hill: a black hole doesn't seem alive o

... (read more)
2 · ESRogs · 9mo: Doesn't the set-of-target-states version have just the same issue (or an analogous one)? For whatever behavior the system exhibits, I can always say that the states it ends up in were part of its set of target states. So you have to count on compactness (or naturalness of description, which is basically the same thing) of the set of target states for this concept of an optimizing system to be meaningful. No?
8 · Richard_Ngo · 10mo: Hmmm, I'm a little uncertain about whether this is the case. E.g. suppose you have a box with a rock in it, in an otherwise empty universe. Nothing happens. You perturb the system by moving the rock outside the box. Nothing else happens in response. How would you describe this as an optimising system? (I'm assuming that we're ruling out the trivial case of a constant utility function; if not, we should analogously include the trivial case of all states being target states). As a more general comment: I suspect that what starts to happen after you start digging into what "perturbation" means, and what counts as a small or big perturbation, is that you run into the problem that a *tiny* perturbation can transform a highly optimising system to a non-optimising system (e.g. flicking the switch to turn off the AGI). In order to quantify size of perturbations in an interesting way, you need the pre-existing concept of which subsystems are doing the optimisation. My preferred solution to this is just to stop trying to define optimisation in terms of *outcomes*, and start defining it in terms of *computation* done by systems. E.g. a first attempt might be: an agent is an optimiser if it does planning via abstraction towards some goal. Then we can zoom in on what all these words mean, or what else we might need to include/exclude (in this case, we've ruled out evolution, so we probably need to broaden it). The broad philosophy here is that it's better to be vaguely right than precisely wrong. Unfortunately I haven't written much about this approach publicly - I briefly defend it in a comment thread on this post though [https://www.lesswrong.com/posts/9pxcekdNjE7oNwvcC/goal-directedness-is-behavioral-not-structural].
The ground of optimization

Well we could always just set the last digit to 0 as a post-processing step to ensure perfect repeatability. But point taken, you're right that most numerical algorithms are not quite as perfectly stable as I claimed.

The ground of optimization

Thank you for the pointer to this terminology. It seems relevant and I wasn't aware of the terminology before.

Our take on CHAI’s research agenda in under 1500 words

While this is the agenda that Stuart talks most about, other work also happens at CHAI

Yes good point - I'll clarify and link to ARCHES.

The reason I'm excited about CIRL is because it provides a formalization of assistance games in the sequential decision-making setting ... There should soon be a paper that more directly explains the case for the formalism

Yeah this is a helpful perspective, and great to hear re upcoming paper. I have definitely spoken to some folks that think of CHAI as the "cooperative inverse reinforcement learning lab" so I wanted

... (read more)
3 · rohinmshah · 10mo: Now that I've read your post [https://www.alignmentforum.org/posts/znfkdCoHMANwqc2WE/the-ground-of-optimization-1] on optimization, I'd restate as [...] Which I guess was your point in the first place, that we should view things as optimizing systems and not agents. (Whereas when I hear "agent" I usually think of something like what you call an "optimizing system".) I think my main point is that "CHAI's agenda depends strongly on an agent assumption" seems only true of the specific mathematical formalization that currently exists; I would not be surprised if the work could then be generalized to optimizing systems instead of agents / EU maximizers in particular.
3 · rohinmshah · 10mo: In all of the "help X" examples you give, I do feel like it's reasonable to do it via taking an intentional stance towards X, e.g. a tree by default takes in water + nutrients through its roots and produces fruit and seeds, in a way that wouldn't happen "randomly", and so "helping a tree" means "causing the tree to succeed more at taking in water + nutrients and producing fruit + seeds". In the case of a country, I think I would more say "whatever the goal of a country, since the country knows how to use money / military power, that will likely help with its goal, since money + power are instrumental subgoals". This is mostly a shortcut; ideally I'd figure out what the country's "goal" is and then assist with that, but that's very difficult to do because a country is very complex.
Our take on CHAI’s research agenda in under 1500 words

I'm afraid I don't really know what you're referring to here.

4 · romeostevensit · 10mo: My take on CHAI's research agenda in 21 words, having read your take.
How does one authenticate with the lesswrong API?

Yes this is what I have been doing so far. I have been able to grab the auth token in this way but I imagine it will expire sooner or later and I was hoping to be able to programmatically acquire an auth token. Based on the source in this file it looks like you're using Meteor to manage authentication. For password-based authentication (as opposed to oauth via google/fb/github) are you also using Meteor?

Interestingly, I see the username and a hash of the password being sent to a sockjs endpoint. Does authentication happen via a websocket?!

2 · habryka · 10mo: Yep, all auth currently happens via Meteor. We sadly don't really have any infrastructure set up to hand out programmatic auth tokens, but I think we set the expiration date to something like 5 years, so I don't think you should run into much of an issue. And yeah, Meteor generally communicates over websockets. So my guess is that includes the auth part.
Reply to Paul Christiano on Inaccessible Information

Thanks for this question. No you're not confused!

There are two levels of search that we need to think about here: at the outer level, we use machine learning to search for an AI design that works at all. Then, at the inner level, when we deploy this AI into the world, it most likely uses search to find good explanations of its sensor data (i.e. to understand things that we didn't put in by hand) and most likely also uses search to find plans that lead to fulfilment of its goals.

It seems to me that design at least needs to be part of the story for how we do

... (read more)
2 · Steven Byrnes · 10mo: OK, well I spend most of my time thinking about a particular AGI architecture (1 [https://www.lesswrong.com/posts/cfvBm2kBtFTgxBB7s/predictive-coding-rl-sl-bayes-mpc] 2 [https://www.lesswrong.com/posts/DWFx2Cmsvd4uCKkZ4/inner-alignment-in-the-brain] etc.) in which the learning algorithm is legible and hand-coded ... and let me tell you, in that case, all the problems of AGI safety and alignment are still really really hard, including the "inaccessible information" stuff that Paul was talking about here. If you're saying that it would be even worse if, on top of that, the learning algorithm itself is opaque, because it was discovered from a search through algorithm-space ... well OK, yeah sure, that does seem even worse.
Reply to Paul Christiano on Inaccessible Information

If I had to summarise the history of AI in one sentence, it'd be something like: a bunch of very smart people spent a long time trying to engineer sophisticated systems without using search, and it didn't go very well until they started using very large-scale search.

Yeah this is not such a terrible one-sentence summary of AI over the past 20 years (maybe even over the whole history of AI). There are of course lots of exceptions, lots of systems that were built successfully using design. The autonomous cars being built today have algorithms that are high

... (read more)
Reply to Paul Christiano on Inaccessible Information

Thanks for this way of thinking about AlphaZero as a hybrid design/search system - I found this helpful.

Reply to Paul Christiano on Inaccessible Information

Thanks for the note Paul.

I agree re finding hard evidence that search is a lost cause, and I see how your overall work in the field has the property of (hopefully) either finding a safe way to use search, or producing evidence (perhaps weak or perhaps strong) that search is a lost cause.

As I speak to young (and senior!) ML folk, I notice they often struggle to conceive of what a non-search approach to AI really means. I'm excited about elucidating what search and design really are, and getting more people to consider using aspects of design alongside search.

Inaccessible information

Thanks for this post Paul. I wrote a long-form reply here.
