

Is AI Alignment a pseudoscience?

Yeah, but also this is the sort of response that goes better with citations.

Like, people used to make a somewhat hand-wavy argument that AIs trained on goal X might become consequentialists which pursued goal Y, and gave the analogy of the time when humans 'woke up' inside of evolution, and now are optimizing for goals different from evolution's goals, despite having 'perfect training' in some sense (and the ability to notice the existence of evolution, and its goals). Then eventually someone wrote Risks from Learned Optimization in Advanced Machine Learning Systems in a way that I think involves substantially less hand-waving and substantially more specification in detail.

Of course there are still parts that remain to be specified in detail--either because no one has written it up yet (Risks from Learned Optimization came from, in part, someone relatively new to the field saying "I don't think this hand-wavy argument checks out", looking into it a bunch, being convinced, and then writing it up in detail), or because we don't know what we're looking for yet. (We have a somewhat formal definition of 'corrigibility', but is it the thing that we actually want in our AI designs? It's not yet clear.)

Hyperpalatable Food Hypothesis: A LessWrong Study?

I enjoy my first MealSquare. My fourth (I eat one meal a day) is generally "fine." Whether or not I eat a fifth (or sixth) depends on how hungry I am in a manner much more pronounced than it is for other foods.

Hyperpalatable Food Hypothesis: A LessWrong Study?

Well the point isn't meant to be that the food is inherently unsatisfying. The point is meant to be that the food is stuff that is within the normal range of palatability we are adapted for. 

IMO you either want to go the 'French women' approach as described in another comment, or you want to select a food that is 'bland'. The specific property I mean is a psychological reaction, and so it's going to fire for different foods for different people, but basically: when you're starting a meal you want to eat the food, and then when you've eaten enough of the food, you look at more on your plate and go "I'm not finishing that." [This is different from the "I'm too full" reaction; there have been many times that I have put MealSquares back in the fridge when I would have eaten more bread.]

One thing that I've tried, but not for long enough to get shareable data, is having the 'second half' of my day's calories be bland food. (That is, cook / order 1000 calories of tasty food, and then eat as many MealSquares as I want afterwards.) This is less convenient than a "cheat day" style of diet, but my guess is it's more psychologically easy.

A non-mystical explanation of "no-self" (three characteristics series)

Another thing that I don't quite like about that definition is that it looks like it's saying "not and" which is not quite the thing? Like I can look at that and go "oh, okay, my separate independent acausal autonomous self can be in reality, because it's impermanent." Instead I want it to be something like "the self is temporary instead of permanent, embedded instead of separate, dependent instead of independent, causal instead of acausal, <> instead of autonomous" (where I'm not quite sure what Ingram is hoping to contrast autonomous with).

Also, since I'm thinking about this, one of the things that I like about "observation" / think is a big part of Buddhist thinking that is useful to clearly explain to people, is that this is (as I understand it) not an axiom that you use to build your model of the world, but a hypothesis that you are encouraged to check for yourself (in the same way that we might have physics students measure the amount of time it takes for objects to drop, and (ideally) not really expect them to believe our numbers without checking them themselves). "You think your self isn't made of parts? Maybe you should pay attention to X, and see if you still think that afterwards."

Circling as Cousin to Rationality

This post is hard for me to review, because I both 1) really like this post and 2) really failed to deliver on the IOUs. As is, I think the post deserves highly upvoted comments that are critical / have clarifying questions; I give some responses, but not enough that I feel like this is 'complete', even considering the long threads in the comments.

[This is somewhat especially disappointing, because I deliberately had "December 31st" as a deadline so that this would get into the 2019 review instead of the 2020 review, and had hoped this would be the first post in a sequence that would be remembered fondly instead of something closer to 'a shout into the void'; also apparently I was tricked by the difference between server time and local time or something, and so it's being reviewed now instead of last year, one of the oldest posts instead of one of the newest.]

And so it's hard to see the post without the holes; it's hard to see the holes without guilt, or at least a lingering yearning.

The main thing that changed after this post is some Circlers reached out to me; overall, I think the reception of this post in the Circling world was positive. I don't know if the rationalist world thought much differently about Circling; I think the pandemic killed most of the natural momentum it had, and there wasn't any concerted push (that I saw) to use Circle Anywhere, which might have kept the momentum going (or spread it).

The "Outside the Box" Box

I think it's not the case that "neural networks" as discussed in this post made AlphaGo. That is, almost all of the difficulty in making AlphaGo happen was picking which neural network architecture would solve the problem / buying fast enough computers to train it in a reasonable amount of time. A more recent example might be something like "model-based reinforcement learning"; for many years 'everyone knew' that this was the next place to go, while no one could write down an algorithm that actually performed well.

I think the underlying point--if you want to think of new things, you need to think original thoughts instead of signalling "I am not a traditionalist"--is broadly correct even if the example fails.

That said, I agree with you that the example seems unfortunately timed. In 2007, some CNNs had performed well on a handful of tasks; the big wins were still ~4-5 years in the future. If the cached wisdom had been "we need faster computers," I think the cached wisdom would have looked pretty good.

A non-mystical explanation of "no-self" (three characteristics series)

I like what this post is trying to do more than I like this post. (I still gave it a +4.)

That is, I think that LW has been flirting with meditation and similar practices for years, and this sort of 'non-mystical explanation' is essential to make sure that we know what we're talking about, instead of just vibing. I'm glad to see more of it.

I think that no-self is a useful concept, and had written a (shorter, not attempting to be fully non-mystical) post on the subject several months before. I find myself sort of frustrated that there isn't a clear sentence that I can point to, which identifies what no-self is, like "no-self is the observation that the 'self' can be reduced to constituent parts instead of being ontologically basic."

But when I imagine Kaj reading the previous paragraph, well, can't he point out that there's actually a class of insights here, rather than just a single concept? For example, I didn't include in that sentence that you can introspect into the process by which your mind generates your perception of self, or the way in which a sense of self is critical to the planning apparatus, or so on. I'm making the mistake he describes in the second paragraph, of pointing to something and saying "this is enlightenment" instead of thinking about the different enlightenments.

Even after that (imagined) response, I still have some sense that something is backwards. The section heading ("Early insights into no-self") seems appropriate, but the post title ("a non-mystical explanation") seems like overreach. The explanation is there, in bits and pieces, but it reads somewhat more like an apology for not having a real explanation.

[For example, the 'many insights' framing makes more sense to me if we have a map or a list of those insights, which I think we don't have (or, even if some Buddhist experts have it, it's not at all clear we'd trust their ontology or epistemology). To be fair, I think we haven't built that map/list for rationality either, but doing that seems like an important task for the field as a whole.]

Brain Efficiency: Much More than You Wanted to Know

But if the brain is already near said practical physical limits, then merely achieving brain parity in AGI at all will already require using up most of the optimizational slack, leaving not much left for a hard takeoff - thus a slower takeoff.

While you do talk about stuff related to this in the post / I'm not sure you disagree about facts, I think I want to argue about interpretation / frame.

That is, efficiency is a numerator over a denominator; I grant that we're looking at the right numerator, but even if human brains are maximally efficient by denominator 1, they might be highly inefficient by denominator 2, and the core value of AI may be being able to switch from denominator 1 to denominator 2 (rather than being a 'straightforward upgrade').

The analogy between birds and planes is probably useful here; birds are (as you would expect!) very efficient at miles flown per calorie, but if it's way easier to get 'calories' through chemical engineering on petroleum, then a less efficient plane that consumes jet fuel can end up cheaper. And if what's economically relevant is "top speed" or "time it takes to go from New York to London", then planes can solidly beat birds. I think we were living in the 'fast takeoff' world for planes (in a technical instead of economic sense), even tho this sort of reasoning would have suggested there would be slow takeoff as we struggled to reach bird efficiency.

The easiest disanalogy between humans and computers is probably "ease of adding more watts"; my brain is running at ~10W because it was 'designed' in an era when calories were super-scarce and cooling was difficult. But electricity is super cheap, and putting 200W through my GPU and then dumping it into my room costs basically nothing. (Once you have 'datacenter' levels of compute, electricity and cooling costs are significant; but again substantially cheaper than the costs of feeding similar numbers of humans.)
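
To put rough numbers on the "costs basically nothing" claim, here is a minimal back-of-the-envelope sketch. The ~10W and 200W figures are the ones above; the electricity price is purely an assumed placeholder (it is not from the comment and varies a lot by region), so treat the dollar figure as illustrative only.

```python
# Back-of-the-envelope energy comparison: ~10 W brain vs. 200 W GPU.
# ASSUMED_PRICE_USD_PER_KWH is a hypothetical placeholder, not a sourced figure.
ASSUMED_PRICE_USD_PER_KWH = 0.15

BRAIN_WATTS = 10.0   # rough figure from the comment
GPU_WATTS = 200.0    # rough figure from the comment


def daily_energy_kwh(power_watts: float, hours: float = 24.0) -> float:
    """Energy drawn by a constant load over `hours`, in kilowatt-hours."""
    return power_watts * hours / 1000.0


def daily_cost_usd(power_watts: float,
                   price_per_kwh: float = ASSUMED_PRICE_USD_PER_KWH) -> float:
    """Cost of running a constant load for one day at the assumed price."""
    return daily_energy_kwh(power_watts) * price_per_kwh


brain_kwh = daily_energy_kwh(BRAIN_WATTS)
gpu_kwh = daily_energy_kwh(GPU_WATTS)

print(f"brain: {brain_kwh:.2f} kWh/day")
print(f"GPU:   {gpu_kwh:.2f} kWh/day ({gpu_kwh / brain_kwh:.0f}x the brain)")
print(f"GPU cost at assumed price: ${daily_cost_usd(GPU_WATTS):.2f}/day")
```

The point of the sketch is just that a 20x increase in power draw is a small absolute cost at household electricity prices, which is the sense in which the wattage budget is a weak constraint on computers.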

A second important disanalogy is something like "ease of adding more compute in parallel"; if I want to add a second GPU to my computer, this is a mild hassle and only takes some tweaks to work; if I want to add a second brain to my body, this is basically impossible. [This is maybe underselling humans, who make organizations to 'add brains' in this way, but I think this is still probably quite important for timeline-related concerns.]
