All of Alex Flint's Comments + Replies

It's worse, even, in a certain way, than that: the existence of optimizing systems organized around a certain idea of "natural class" feeds back into more observers observing data that is distributed according to this idea of "natural class", leading to more optimizing systems being built around that idea of "natural class", and so on.

Once a certain idea of "natural class" gains a foothold somewhere, observers will make real changes in the world that further suggest this particular idea of "natural class" to others, and this forms a feedback loop.

If you pin down what a thing refers to according to what that thing was optimized to refer to, then don't you have to look at the structure of the one who did the optimizing in order to work out what a given thing refers to? That is, to work out what the concept "thermodynamics" refers to, it may not be enough to look at the time evolution of the concept "thermodynamics" on its own; I may instead need to know something about the humans who were driving those changes, and the goals held within their minds. But, if this is correct, then doesn't it raise anot...

Ramana Kumar (1y)
The trick is that for some of the optimisations, a mind is not necessary. There is a sense perhaps in which the whole history of the universe (or life on earth, or evolution, or whatever is appropriate) will become implicated for some questions, though.

There seems to be some real wisdom in this post, but given the length and title of the post, you haven't offered much of an exit -- you've just offered a single link to a YouTube channel for a trauma healer. If what you say here is true, then this is a bit like offering an alcoholic friend the sum total of one text message containing a single link to the homepage of Alcoholics Anonymous -- better than nothing, but not worthy of the bombastic title of this post.

If someone feels resonance with what I'm pointing out but needs more, they're welcome to comment and/or PM me to ask for more.

friends and family significantly express their concern for my well being

What exact concerns do they have?

Thinking that psychedelics are safe, or that masks are useless against Covid, seem like the beliefs most likely to trigger concern... 
You’re very welcome! Happy to help.
  1. You don't get to fucking assume any shit on the basis of "but... ah... come on". If you claim X and someone asks why, then congratulations now you're in a conversation. That means maybe possible shit is about to get real, like some treasured assumptions might soon be questioned. There are no sarcastic facial expressions or clever grunts that get you an out from this. You gotta look now at the thing itself.
Can you link to another conversation on this site where this occurs?

I just want to acknowledge the very high emotional weight of this topic.

For about two decades, many of us in this community have been kind of following in the wake of a certain group of very competent people tackling an amazingly frightening problem. In the last couple of years, coincident with a quite rapid upsurge in AI capabilities, that dynamic has really changed. This is truly not a small thing to live through. The situation has real breadth -- it seems good to take it in for a moment, not in order to cultivate anxiety, but in order to really engage w...

That is correct. I know it seems a little weird to generate a new policy on every timestep. The reason it's done that way is that the logical inductor needs to understand the function that maps prices to the quantities that will be purchased, in order to solve for a set of prices that "defeat" the current set of trading algorithms. That function (from prices to quantities) is what I call a "trading policy", and it has to be represented in a particular way -- as a set of syntax trees over trading primitives -- in order for the logical inductor to solve for pri...
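
To make the idea of a trading policy represented as syntax trees a little more concrete, here is a minimal Python sketch, assuming a toy setting with a few named sentences whose prices lie in [0, 1]. The node types (Const, PriceOf, Add, Mul, Max) and helper names are hypothetical, loosely in the spirit of the "expressible features" of the Logical Induction paper rather than its exact definitions. Because the policy is an explicit tree, it can be evaluated at any candidate prices; the crude grid scan at the end only illustrates that point and is not how a logical inductor actually solves for prices.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union

# Hypothetical toy setting: each sentence is a string label with a price in [0, 1].
Prices = Dict[str, float]


@dataclass(frozen=True)
class Const:
    """Leaf node: a fixed number that ignores prices."""
    value: float

    def evaluate(self, prices: Prices) -> float:
        return self.value


@dataclass(frozen=True)
class PriceOf:
    """Leaf node: the current price of one sentence."""
    sentence: str

    def evaluate(self, prices: Prices) -> float:
        return prices[self.sentence]


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"

    def evaluate(self, prices: Prices) -> float:
        return self.left.evaluate(prices) + self.right.evaluate(prices)


@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"

    def evaluate(self, prices: Prices) -> float:
        return self.left.evaluate(prices) * self.right.evaluate(prices)


@dataclass(frozen=True)
class Max:
    left: "Expr"
    right: "Expr"

    def evaluate(self, prices: Prices) -> float:
        return max(self.left.evaluate(prices), self.right.evaluate(prices))


Expr = Union[Const, PriceOf, Add, Mul, Max]

# A trading policy maps each sentence to an expression for how many shares to buy
# (negative values would mean selling) as a function of the current prices.
TradingPolicy = Dict[str, Expr]


def quantities(policy: TradingPolicy, prices: Prices) -> Dict[str, float]:
    """Evaluate every tree in the policy at the given prices."""
    return {sentence: expr.evaluate(prices) for sentence, expr in policy.items()}


def first_price_without_demand(
    policy: TradingPolicy, sentence: str, other_prices: Prices, steps: int = 100
) -> Optional[float]:
    """Toy scan, not the real logical-inductor procedure: try prices 0, 1/steps, ..., 1
    for `sentence` and return the first at which the policy stops buying it."""
    for i in range(steps + 1):
        price = i / steps
        if policy[sentence].evaluate({**other_prices, sentence: price}) <= 0:
            return price
    return None


# Example policy: buy 10 * max(0, 0.5 - price(phi)) shares of "phi",
# i.e. buy more the further phi's price sits below 0.5.
buy_phi = Mul(Const(10), Max(Const(0), Add(Const(0.5), Mul(Const(-1), PriceOf("phi")))))
policy: TradingPolicy = {"phi": buy_phi}

print(quantities(policy, {"phi": 0.2}))
print(first_price_without_demand(policy, "phi", {}))
```

Representing the policy as data rather than as an opaque function is what makes a scan like this possible at all: the search can ask "what would this trader buy at price p?" for many candidate values of p.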

Thanks for the extra detail! (Actually, I was reading a post by Mark Xu which seems to suggest that the TradingAlgorithms have access to the price history rather than the update history as I suggested above)

Thank you for this extraordinarily valuable report!

I believe that what you are engaging in, when you enter into a romantic relationship with either a person or a language model, is a kind of artistic creation. What matters is not whether the person on the "other end" of the relationship is a "real person" but whether the thing you create is of true benefit to the world. If you enter into a romantic relationship with a language model and produce something of true benefit to the world, then the relationship was real, whether or not there was a "real person" on the other end of it (whatever that would mean, even in the case of a human).

This is a relatively banal meta-commentary on reasons people sometimes give for doing worst-case analysis, and the differences between those reasons. The post reads like a list of things with no clear through-line. There is a gesture at an important idea from a Yudkowsky post (the logistic success curve idea) but the post does not helpfully expound that idea. There is a kind of trailing-off towards the end of the post as things like "planning fallacy" seem to have been added to the list with little time taken to place them in the context of the other thing...

Many people believe that they already understand Dennett's intentional stance idea, and due to that will not read this post in detail. That is, in many cases, a mistake. This post makes an excellent and important point, which is wonderfully summarized in the second-to-last paragraph:

In general, I think that much of the confusion about whether some system that appears agent-y “really is an agent” derives from an intuitive sense that the beliefs and desires we experience internally are somehow fundamentally different from those that we “merely” infer and a

...

Have you personally ever ridden in a robot car that has no safety driver?

This post consists of comments on summaries of a debate about the nature and difficulty of the alignment problem. The original debate was between Eliezer Yudkowsky and Richard Ngo, but this post does not contain the content from that debate. This post consists mostly of commentary by Jaan Tallinn on that debate, with comments by Eliezer.

The post provides a kind of fascinating level of insight into true insider conversations about AI alignment. How do Eliezer and Jaan converse about alignment? Sure, this is a public setting, so perhaps they communicate differentl...

RE the FOOM debate: On this, I think the Hansonian viewpoint that takeoff would be gradual was way more correct than the discontinuous narrative of Eliezer, where AI progress in the real world follows more of a Hansonian path. Eliezer didn't get this totally wrong, and there are some results in AI showing that there can be phase transitions/discontinuities. But overall, a good prior for AI progress is that it will look like the Hansonian continuous progress rather than the FOOM of Eliezer.
I think I disagree with this characterization. A) we totally have robot cars by now, B) I think mostly what we don't have are AI running systems where the consequence of failure is super high (which maybe happens to be more true for the physical world, but I'd expect to also be true for critical systems in the digital world)

Thanks for the note.

In Life, I don't think it's easy to generate an X-1 time state that leads to an X time state, unfortunately. The reason is that each cell in an X time state puts a logical constraint on 9 cells in an X-1 time state. It is therefore possible to set up certain constraint satisfaction problems in terms of finding an X-1 time state that leads to an X time state, and in general these can be NP-hard.

However, in practice, it is very very often quite easy to find an X-1 time state that leads to a given X time state, so maybe this experiment cou...
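
The constraint-satisfaction framing above can be illustrated with a brute-force predecessor search. This is a minimal Python sketch under a strong simplifying assumption: a tiny bounded grid where every cell beyond the edge is permanently dead, rather than the unbounded board the post is actually about. The function names are hypothetical. Each cell of the next generation depends only on the 3x3 block around it (the cell plus its 8 neighbours), so each target cell constrains 9 cells of any candidate predecessor; exhaustive search simply tries every candidate.

```python
from itertools import product
from typing import List, Tuple

# A grid is a tuple of rows of 0/1 cells; tuples keep grids hashable and comparable.
Grid = Tuple[Tuple[int, ...], ...]


def step(grid: Grid) -> Grid:
    """Advance one generation of Conway's Life (B3/S23) on a bounded grid.
    Assumption: cells beyond the edge are always dead."""
    rows, cols = len(grid), len(grid[0])

    def next_state(r: int, c: int) -> int:
        # The next state reads only the 3x3 block around (r, c): the cell plus its
        # 8 neighbours. This is the local constraint a predecessor has to satisfy.
        neighbours = sum(
            grid[r + dr][c + dc]
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0) and 0 <= r + dr < rows and 0 <= c + dc < cols
        )
        return 1 if neighbours == 3 or (grid[r][c] == 1 and neighbours == 2) else 0

    return tuple(tuple(next_state(r, c) for c in range(cols)) for r in range(rows))


def predecessors(target: Grid) -> List[Grid]:
    """Brute force: try all 2**(rows*cols) same-sized grids, keep those that step to target."""
    rows, cols = len(target), len(target[0])
    found = []
    for bits in product((0, 1), repeat=rows * cols):
        candidate = tuple(tuple(bits[r * cols:(r + 1) * cols]) for r in range(rows))
        if step(candidate) == target:
            found.append(candidate)
    return found


# Usage on a 3x3 board: a blinker flips between horizontal and vertical.
HORIZONTAL: Grid = ((0, 0, 0), (1, 1, 1), (0, 0, 0))
VERTICAL: Grid = ((0, 1, 0), (0, 1, 0), (0, 1, 0))
EMPTY: Grid = ((0, 0, 0), (0, 0, 0), (0, 0, 0))

print(HORIZONTAL in predecessors(VERTICAL))
print(len(predecessors(EMPTY)))
```

Exhaustive search doubles in cost with every added cell, so it only works for toy grids; encoding the same local constraints for a SAT or other constraint solver is a more scalable route. The empty-grid check also shows that a single target can have many different predecessors.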

It surprises me a little that there hasn't been more work on working backwards in Life. Perhaps it's just too hard/not useful given the number of possible X-1 time slices. With the smiley face example, there could be a very large number of combinations for the squares outside the smiley face at X-1 which result in the same empty grid space (i.e. many possible self-destructing patterns). I'm unreasonably fond of brute forcing problems like these. I don't know if I'd have anything useful to say on this topic that I haven't already, but I'm interested to follow this work. I think this is a fascinating analogy for the control problem. Edit - It just occurred to me, thanks to a friend, that instead of reverse engineering the desired state, it might be easier to just randomise the inputs until you get the outcome you want (not sure why this didn't occur to me). Still very intensive, but perhaps easier.

Interesting. Thank you for the pointer.

The real question, though, is whether it is possible within our physics.

Oh, the only information I have about that is Dave Greene's comment, plus a few private messages from people over the years who had read the post and were interested in experimenting with concrete GoL constructions. I just messaged the author of the post on the GoL forum asking about whether any of that work was spurred by this post.

Thanks - fixed! And thank you for the note, too.

Yeah it might just be a lack of training data in 10-second-or-less interactive instructions.

The thing I really wanted to test with this experiment was actually whether ChatGPT could engage with the real world using me as a guinea pig. The 10-second-or-less thing was just the format I used to try to "get at" the phenomenon of engaging with the real world. I'm interested in improving the format to more cleanly get at the phenomenon.

I do currently have the sense that it's more than just a lack of training data. I have the sense that ChatGPT has learned much l...

I asked a group of friends for "someone to help me with an AI experiment" and then I gave this particular friend the context that I wanted her help guiding me through a task via text message and that she should be in front of her phone in some room that was not the kitchen.

If you look at how ChatGPT responds, it seems to be really struggling to "get" what's happening in the kitchen -- it never really comes to the point of giving specific instructions, and especially never comes to the point of having any sense of the "situation" in the kitchen -- e.g. whet...

I'm very interested in Wei Dai's work, but I haven't followed closely in recent years. Any pointers to what I might read of his recent writings?

I do think Eliezer tackled this problem in the Sequences, but I don't really think he came to an answer to these particular questions. I think what he said about meta-ethics is that goodness is neither some measure to be found in the material world independent of our own minds, nor something completely open to construction based on our whims or preferences. He then says "well there ju...

Unfortunately, everything I'm thinking of was written as comments, and I can't remember when or where he wrote them. You'd have to talk to Wei Dai on the topic if you want an accurate summary of his position. Yeah, I agree he didn't seem to come to a conclusion. The most in depth examples I've seen on LW of people trying to answer these questions are about as in-depth as this post in lukeprog's no-nonsense metaethics sequence. Maybe there's more available, but I don't know how to locate it. 

Recursive relevance realization seems to be designed to answer about the "quantum of wisdom".

It does! But... does it really answer the question? Curious about your thoughts on this.

The high-level concepts seem like high-quality concept work, and when I try to fill in the details with imagination it seems workable. But the details are not in yet. If one could bridge the gap between (something like) Bayesian evidence updating and the lower points of RRR, it would pretty much be it.

you ask whether you are aligned to yourself (ideals, goals etc) and find that your actuality is not coherent with your aim

Right! Very often, what it means to become wiser is to discover something within yourself that just doesn't make sense, and then to in some way resolve that.

Discovering incoherency seems very different from keeping a model on coherence rails

True. Eliezer is quite vague about the term "coherent" in his write-ups, and some more recent discussions of CEV drop it entirely. I think "coherent" was originally about balancing the extrapo...

Did you ever end up reading Reducing Goodhart?

Not yet, but I hope to, and I'm grateful to you for writing it.

processes for evolving humans' values that humans themselves think are good, in the ordinary way we think ordinary good things are good

Well, sure, but the question is whether this can really be done by modelling human values and then evolving those models. If you claim yes then there are several thorny issues to contend with, including what constitutes a viable starting point for such a process, what is a reasonable dynamic for such a process, and on what basis we decide the answers to these things.

Wasn't able to record it - technical difficulties :(

Yes, I should be able to record the discussion and post a link in the comments here.

If you train a model by giving it reward when it appears to follow a particular human's intention, you probably get a model that is really optimizing for reward, or appearing to follow said humans intention, or something else completely different, while scheming to seize control so as to optimize even more effectively in the future. Rather than an aligned AI.

Right yeah I do agree with this.

Perhaps instead you mean: No really the reward signal is whether the system really deep down followed the humans intention, not merely appeared to do so [...] That

...

Well even if language models do generalize beyond their training domain in the way that humans can, you still need to be in contact with a given problem in order to solve that problem. Suppose I take a very intelligent human and ask them to become a world expert at some game X, but I don't actually tell them the rules of game X nor give them any way of playing out game X. No matter how intelligent the person is, they still need some information about what the game consists of.

Now suppose that you have this intelligent person write essays about how one ough...

This makes sense, but it seems to be a fundamental difficulty of the alignment problem itself as opposed to the ability of any particular system to solve it.  If the language model is superintelligent and knows everything we know, I would expect it to be able to evaluate its own alignment research as well as if not better than us.  The problem is that it can't get any feedback about whether its ideas actually work from empirical reality given the issues with testing alignment problems, not that it can't get feedback from another intelligent grader/assessor reasoning in a ~a priori way.

This is a post about the mystery of agency. It sets up a thought experiment in which we consider a completely deterministic environment that operates according to very simple rules, and ask what it would be for an agentic entity to exist within that.

People in the game of life community actually spent some time investigating the empirical questions that were raised in this post. Dave Greene notes:

The technology for clearing random ash out of a region of space isn't entirely proven yet, but it's looking a lot more likely than it was a year ago, that a work

...

This post attempts to separate a certain phenomenon from a certain very common model that we use to understand that phenomenon. The model is the "agent model" in which intelligent systems operate according to an unchanging algorithm. In order to make sense of there being an unchanging algorithm at the heart of each "agent", we suppose that this algorithm exchanges inputs and outputs with the environment via communication channels known as "observations" and "actions".

This post really is my central critique of contemporary artificial intelligence discourse....

This is an essay about methodology. It is about the ethos with which we approach deep philosophical impasses of the kind that really matter. The first part of the essay is about those impasses themselves, and the second part is about what I learned in a monastery about addressing those impasses.

I cried a lot while writing this essay. The subject matter -- the impasses themselves -- is deeply meaningful to me, and I have the sense that they really do matter.

It is certainly true that there are these three philosophical impasses -- each has been discussed in...

This post trims down the philosophical premises that sit under many accounts of AI risk. In particular it routes entirely around notions of agency, goal-directedness, and consequentialism. It argues that it is not humans losing power that we should be most worried about, but humans quickly gaining power and misusing such a rapid increase in power.

Re-reading the post now, I have the sense that the arguments are even more relevant than when it was written, due to the broad improvements in machine learning models since it was written. The arguments in this po...

Thanks for writing this.

Alignment research has a track record of being a long slow slog. It seems that what we’re looking for is a kind of insight that is just very very hard to see, and people who have made real progress seem to have done so through long periods of staring at the problem.

With your two week research sprints, how do you decide what to work on for a given sprint?

Well suffering is a real thing, like bread or stones. It's not a word that refers to a term in anyone's utility function, although it's of course possible to formulate utility functions that refer to it.

I have understood it to be the experience of adversity. You are in pain and hate it; that is, you suffer. If you are in pain and like it, that is not suffering. If you knew that your choice would lead to your suffering, you probably would not make that choice. Hence the main way that problematic suffering is produced is by not seeing the connection between choices and outcomes: "I tried to avoid pain and now I am in pain; where did I go wrong?". This kind of problem would arise even if the states that you find worth seeking and avoiding were given arbitrarily at random, as long as you don't have an infinitely competent world model. The claim that some actions get you nearer to the arbitrary goals and some get you away from them would still hold. Even if it would not refer to the same states or concepts for different individuals, each individual evaluating the claim against their own arbitrary goals would still find that it checks out. Putting a needle into themselves makes one person suffer and another bliss out, which, compared to the boring option of not applying the needle, shows that not all actions are equal for goal acquisition. So you might form a pro-needle or con-needle opinion and a corresponding strategy. But then you might encounter something else, like ice, for which the needle stuff is inapplicable. But there seems to be an innate capability to "know" whether suffering occurs, and this can happen before and independently of the formation of the opinions or strategies. Thus you might believe that you are an ice-blisser but then in fact discover that you are an ice-sufferer. "Utility function" might mean the functionality of the black box that reveals this goalness perception, "We are in a bad experience right now". Or "utility function" might refer to the opinion that you profess, "I am the kind of person that blisses about ice". Over time your opinions tend to grow (refer to more stuff and make finer distinctions). It is plausible or at least imaginable that the innate goal experie

The direct information I'm aware of is (1) CZ's tweets about not acquiring, (2) SBF's own tweets yesterday, (3) the leaked P&L doc from Alameda. I don't think any of these are sufficient to decide "SBF committed fraud" or "SBF did something unethical". Perhaps there is additional information that I haven't seen, though.

(I do think that if SBF committed fraud, then he did something unethical.)

You have to be confident that no such information is available to say that it's too early for others to have made up their mind. It sounds like it's too early for you, but you don't know how much time others have spent following the situation. Obviously nothing is slam-dunk certain when the situation's still developing, but it's often the case that you can draw fairly strong conclusions based on a few unusual data points. You can't assess whether that's the case if you're not aware of all the data points that are out there. 

If you view people as Machiavellian actors using models to pursue goals, then you will eventually find social interactions to be bewildering and terrifying, because there actually is no way to discern honesty or kindness or good intention if you start from the view that each person is ultimately pursuing some kind of goal in an ends-justify-means way.

But neither does it really make sense to say "hey let's give everyone the benefit of the doubt because then such-and-such".

I think in the end you have to find a way to trust something that is not the particular beliefs or goals of a person.

In Buddhist ideology, the reason to pick one set of values over another is to find an end to suffering. The Buddha claimed that certain values tended to lead towards the end of suffering and other values tended to lead in the opposite direction. He recommended that people check this claim for themselves.

In this way values are seen as instrumental rather than fundamental in Buddhism -- that is, Buddhists pick values on the basis of the consequences of holding those values, rather than any fundamental rightness of the values themselves.

Now you may say that t...

Is suffering something other than experiencing ununderstood negative terms of your utility function?

There's mounting evidence that FTX was engaged in theft/fraud, which would be straightforwardly unethical.

I think it's way too early to decide anything remotely like that. As far as I understand, we have a single leaked balance sheet from Alameda and a handful of tweets from CZ (CEO of Binance) who presumably got to look at some aspect of FTX internals when deciding whether to acquire. Do we have any other real information?

Having spent the better part of the last three days looking into this, I disagree. FTX lent $10 billion out of $16 billion in customer assets to a hedge fund in which its CEO owned a 50% stake. It accepted at least $4 billion in collateral of its own token, FTT. The total circulating supply at the time was less than that. Exchanges are NEVER supposed to lend out customer funds without their consent, and it's clear FTX did that. What's more, they should not accept their own "stock" as collateral. Accepting your own stock as collateral is like opening the book of world-ending spells and randomly chanting incantations; there's a reason no one does it. Normally it's impossible to bankrupt a company by shorting its stock. But if the company in question holds its own stock as collateral, that changes. You can crash the price of stock (or a token in this case) by borrowing a ton of it and then selling it on the market. And now the company has a major problem: they've lent out user funds, but the collateral that was supposed to protect them in case the borrower defaulted is now worthless. If users become aware of the situation, a bank panic will ensue as everyone tries to get their money out before the bank (or exchange in this case) runs out. After that, all they will have left is their useless stock, which they can't redeem for assets they owe to users. If you have huge reserves, this wouldn't really matter much because you could pay them off with those other reserves. But when you've lent out over half of all user funds to this one borrower in particular without their knowledge, accepted collateral that is now worth nothing, and lied about it publicly on Twitter... You're in for a world of pain. Sam Bankman-Fried did commit fraud, and knowingly lied about it to the public. I will happily eat my shorts if he turns out to be innocent. I think what he did is a crying shame. I deeply admired him before this all went down. He had created a money printing machine and w
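
The collateral mechanism described above is easier to see with a little arithmetic. The sketch below uses the figures as stated in the comment (about $16B of customer assets, $10B lent to the fund, and $4B of collateral held in FTT), which are the commenter's claims and are not independently verified here. It assumes, for simplicity, that the loan is not repaid and that the collateral consists only of the one token; `uncovered_loan` is a hypothetical helper.

```python
# Figures as stated in the comment above (billions of USD); not independently verified.
CUSTOMER_ASSETS = 16.0
LOAN_TO_FUND = 10.0
TOKEN_COLLATERAL = 4.0


def uncovered_loan(loan: float, collateral_value: float, token_drawdown: float) -> float:
    """Hypothetical helper: the part of an unpaid loan left uncovered once the
    collateral token has fallen by `token_drawdown` (0.0 = no fall, 0.9 = a 90% fall).
    Simplifying assumption: the collateral is held entirely in that one token."""
    remaining_collateral = collateral_value * (1.0 - token_drawdown)
    return max(0.0, loan - remaining_collateral)


for drawdown in (0.0, 0.5, 0.9):
    gap = uncovered_loan(LOAN_TO_FUND, TOKEN_COLLATERAL, drawdown)
    share = gap / CUSTOMER_ASSETS
    print(f"token down {drawdown:.0%}: ${gap:.1f}B uncovered ({share:.0%} of customer assets)")
```

The point the arithmetic makes is that the protection the collateral offers shrinks exactly when it is most needed: a run that pushes the token price down also pushes the uncovered share of customer assets up.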
Cleo Nardo (1y)
Are you saying that it's too early to claim "SBF committed fraud", or "SBF did something unethical", or "if SBF committed fraud, then he did something unethical"? I think we have enough evidence to assert all three.

I'm curious about this too. I actually have the sense that overall funding for AI alignment was already larger than the total needs of shovel-ready projects before FTX was involved. This is normal and expected in a field that many people believe is working on an important problem, but where most of the work is research and hardly anyone has promising scalable uses for money.

I think this led to a lot of prizes being announced. A prize is a good way to fund if you don't see enough shovel-ready projects to exhaust your funding. You offer prizes for anyone who can...

Regarding your point on ELK: to make the output of the opaque machine learning system counterfactable, wouldn't it be sufficient to include the whole program trace? Program trace means the results of all the intermediate computations computed along the way. Yet including a program trace wouldn't help us much if we don't know what function of that program trace will tell us, for example, whether the machine learning system is deliberately deceiving us.

So yes it's necessary to have an information set that includes the relevant information, but isn't the main part of the (ELK) problem to determine what function of that information corresponds to the particular latent variable that we're looking for?

Scott Garrabrant (1y)
I agree, this is why I said I am being sloppy with conflating the output and our understanding of the output. We want our understanding of the output to screen off the history.

If I understand you correctly, the reason that this notion of counterfactable connects with what we normally call a counterfactual is that when an event screens off its own history, it's easy to consider other "values" of the "variable" underlying that event without running into any logical contradictions with other events ("values of other variables") that we're holding fixed.

For example if I try to consider what would have happened if there had been a snow storm in Vermont last night, while holding fixed the particular weather patterns observed in Vermont ...
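
A minimal illustration of screening off in the ordinary probabilistic sense: B screens off A from C when P(C | B, A) = P(C | B), so once B is fixed, varying the history A tells you nothing more about C. This is only an analogy for the notion of counterfactability discussed here, and all the probabilities and helper names are made up for the example. It uses exact fractions so the equality checks are exact.

```python
from fractions import Fraction
from itertools import product
from typing import Dict, Tuple

Joint = Dict[Tuple[int, int, int], Fraction]  # keys are (a, b, c), each 0 or 1

# Made-up conditional probabilities for a chain A -> B -> C.
P_A = {0: Fraction(1, 2), 1: Fraction(1, 2)}
P_B_GIVEN_A = {0: {0: Fraction(9, 10), 1: Fraction(1, 10)},
               1: {0: Fraction(2, 10), 1: Fraction(8, 10)}}
P_C_GIVEN_B = {0: {0: Fraction(7, 10), 1: Fraction(3, 10)},
               1: {0: Fraction(1, 10), 1: Fraction(9, 10)}}


def markov_joint() -> Joint:
    """C depends on the history A only through B."""
    return {(a, b, c): P_A[a] * P_B_GIVEN_A[a][b] * P_C_GIVEN_B[b][c]
            for a, b, c in product((0, 1), repeat=3)}


def leaky_joint() -> Joint:
    """C copies A directly (with noise), bypassing B."""
    return {(a, b, c): P_A[a] * P_B_GIVEN_A[a][b] * (Fraction(9, 10) if c == a else Fraction(1, 10))
            for a, b, c in product((0, 1), repeat=3)}


def conditional(joint: Joint, c: int, given: Dict[int, int]) -> Fraction:
    """P(C = c | given), where `given` maps a position (0 for A, 1 for B) to a value."""
    def matches(key: Tuple[int, int, int]) -> bool:
        return all(key[i] == v for i, v in given.items())
    denominator = sum(p for key, p in joint.items() if matches(key))
    numerator = sum(p for key, p in joint.items() if matches(key) and key[2] == c)
    return numerator / denominator


def b_screens_off_a(joint: Joint) -> bool:
    """True when P(C | B, A) == P(C | B) for every combination of values."""
    return all(conditional(joint, c, {0: a, 1: b}) == conditional(joint, c, {1: b})
               for a, b, c in product((0, 1), repeat=3))


print(b_screens_off_a(markov_joint()))
print(b_screens_off_a(leaky_joint()))
```

In the Markov chain, fixing B lets you consider other values of A without changing anything about C; in the "leaky" version, where C also depends on A directly, fixing B is not enough.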

Scott Garrabrant (1y)
Yeah, remember the above is all for updateless agents, which are already computationally intractable. For updateful agents, we will want to talk about conditional counterfactability. For example, if you and I are in a prisoner's dilemma, we could condition on all the stuff that happened prior to us being put in separate cells, and given this condition, the histories are much smaller. Also, we could do all of our reasoning up to a high-level world model that makes histories more reasonably sized. Also, we could think of counterfactability as a spectrum. Some events are especially hard to reason about, because there are lots of different ways we could have done it, and we can selectively add details to make it more and more counterfactable, meaning it approximately screens off its history from that which you care about.

I expect you could build a system like this that reliably runs around and tidies your house say, or runs your social media presence, without it containing any impetus to become a more coherent agent (because it doesn’t have any reflexes that lead to pondering self-improvement in this way).

I agree, but if there is any kind of evolutionary variation in the thing then surely the variations that move towards stronger goal-directedness will be favored.

I think that overcoming this Molochian dynamic is the alignment problem: how do you build a powerful system ...

I really appreciate this post!

For instance, employers would often prefer employees who predictably follow rules than ones who try to forward company success in unforeseen ways.

Fascinatingly, EA employers in particular seem to seek employees who do try to forward organization goals in unforeseen ways!

Yeah right. There is something about existing within a spatial world that makes it reasonable to have a bunch of bodies operating somewhat independently. The laws of physics seem to be local, and they also place limits on communication across space, and for this reason you get, I suppose, localized independent consciousness.

Agreed, but what is it about the structure of the world that made it the case that this Cartesian simplification works so much of the time?

Filip Sondej (1y)
It just looks like that's what worked in evolution - to have independent organisms, each carrying its own brain. And the brain happens to have the richest information processing and integration, compared to information processing between the brains. I don't know what would be necessary to have a more "joined" existence. Mushrooms seem to be able to form bigger structures, but they didn't have an environment complex enough to require the evolution of brains.
Basically, without mind-uploading, you really can't safely merge minds, nor can you edit them very well. You also can't put a brain in a new body without destroying it. Thus the simplification of a separate mind works very well.

It's a very interesting point you make because we normally think of our experience as so fundamentally separate from other people's. Just to contemplate conjoined twins accessing one another's experiences but not having identical experiences really bends the heck out of our normal way of considering mind.

Why is it, do you think, that we have this kind of default way of thinking about mind as Cartesian in the first place? Where did that even come from?

I imagine that shared bodies with shared experiences are difficult to coordinate. If you have two bodies with two brains, they can go to two different places and do two different things in parallel. If you have one body, it can only be at one place and do one thing, but it's perfectly coordinated. One and a half bodies with one and a half brains seem to have all the disadvantages of one body, but much worse coordination. Thus evolution selects for separate bodies, each with one mind. (We have two hemispheres, and we may be unconsciously thinking about multiple things in parallel, but we have one consciousness which decides the general course of action.) We might try looking for counter-examples in nature. Octopi seem to be smart, and they have a nervous system less centralized than humans. They still have a central brain, but most of their neurons are in arms. I wonder whether that means something, other than that movement of an octopus arm is more difficult (has more degrees of freedom) than movement of a human limb.
Filip Sondej (1y)
It seems that we just never had any situations that would challenge this way of thinking (those twins are an exception). This Cartesian simplification almost always works, so it seems like it's just the way the world is at its core.

I have the sense that boundaries are so effective as a coordination mechanism that we have come to believe that they are an end in themselves. To me it seems that the over-use of boundaries leads to loneliness that eventually obviates all the goodness of the successful coordination. It's as if we discovered that cars were a great way to get from place to place, but then we got so used to driving in cars that we just never got out of them, and so kind of lost all the value of being able to get from place to place. It was because the cars were in fact so eff...

Oooo I like this comment, especially the first two examples. Also, personally I wouldn't call this a «boundary». I don't consider boundaries to be things that are "set" or "established"