Why Everyone (Else) Is a Hypocrite: Evolution and the Modular Mind
Concept Safety
Multiagent Models of Mind
Keith Stanovich: What Intelligence Tests Miss



Agree. This connects to why I think that the standard argument for evolutionary misalignment is wrong: it's meaningless to say that evolution has failed to align humans with inclusive fitness, because fitness is not any one constant thing. Rather, what evolution can do is to align humans with drives that in specific circumstances promote fitness. And if we look at how well the drives we've actually been given generalize, we find that they have largely continued to generalize quite well, implying that while there's likely to still be a left turn, it may very well be much milder than is commonly implied.

Ending a relationship/marriage doesn't necessarily imply that you no longer love someone (I haven't been married but I do still love several of my ex-partners), it just implies that the arrangement didn't work out for one reason or another.

I would guess that getting space colonies to the kind of state where they could support significant human inhabitation would be a multi-decade project, even with superintelligence? Especially taking into account that they won't have much nature without significant terraforming efforts, and quite a few people would find any colony without any forests etc. to be intrinsically dystopian.

hmm, I don't understand something, but we are closer to the crux :)

Yeah I think there's some mutual incomprehension going on :)

  1. To the question, "Would you update if this experiment is conducted and is successful?" you answer, "Well, it's already my default assumption that something like this would happen". 
  2. To the question, "Is it possible at all?" You answer 70%. 

So, you answer 99-ish% to the first question and 70% to the second question, this seems incoherent.

For me "the default assumption" is anything with more than 50% probability. In this case, my default assumption has around 70% probability.

It seems to me that you don't bite the bullet for the first question if you expect this to happen. Saying, "Looks like I was right," seems to me like you are dodging the question.

Sorry, I don't understand this. What question am I dodging? If you mean the question of "would I update", what update do you have in mind? (Of course, if I previously gave an event 70% probability and then it comes true, I'll update from 70% to ~100% probability of that event happening. But it seems pretty trivial to say that if an event happens then I will update to believing that the event has happened, so I assume you mean some more interesting update.)

Hum, it seems there is something I don't understand; I don't think this violates the law.

I may have misinterpreted you; I took you to be saying "if you expect to see this happening, then you might as well immediately update to what you'd believe after you saw it happen". Which would have directly contradicted "Equivalently, the mere expectation of encountering evidence—before you’ve actually seen it—should not shift your prior beliefs".

I agree I only gave the skim of the proof, it seems to me that if you can build the pyramid, brick by brick, then this solved the meta-problem.

for example, when I give the example of meta-cognition-brick, I say that there is a paper that already implements this in an LLM (and I don't find this mysterious because I know how I would approximately implement a database that would behave like this).

Okay. But that seems more like an intuition than even a sketch of a proof to me. After all, part of the standard argument for the hard problem is that even if you explained all of the observable functions of consciousness, the hard problem would remain. So just the fact that we can build individual bricks of the pyramid isn't significant by itself - a non-eliminativist might be perfectly willing to grant that yes, we can build the entire pyramid, while also holding that merely building the pyramid won't tell us anything about the hard problem nor the meta-problem. What would you say to them to convince them otherwise?

  1. Let's say we implement this simulation in 10 years and everything works the way I'm telling you now. Would you update?

Well, it's already my default assumption that something like this would happen, so the update would mostly just be something like "looks like I was right".

2. What is the probability that this simulation is possible at all? 

You mean one where AIs that were trained with no previous discussion of the concept of consciousness end up reinventing the hard problem on their own? 70% maybe.

If you expect to update in the future, just update now.  

That sounds like it would violate conservation of expected evidence:

... for every expectation of evidence, there is an equal and opposite expectation of counterevidence.

If you expect a strong probability of seeing weak evidence in one direction, it must be balanced by a weak expectation of seeing strong evidence in the other direction. If you’re very confident in your theory, and therefore anticipate seeing an outcome that matches your hypothesis, this can only provide a very small increment to your belief (it is already close to 1); but the unexpected failure of your prediction would (and must) deal your confidence a huge blow. On average, you must expect to be exactly as confident as when you started out. Equivalently, the mere expectation of encountering evidence—before you’ve actually seen it—should not shift your prior beliefs.
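To make the quoted principle concrete, here is a minimal sketch with a binary hypothesis and a binary piece of evidence. The numbers (a 70% prior and likelihoods of 0.9 and 0.2) are illustrative assumptions, not values taken from the discussion. It shows that each possible observation moves the belief, but the probability-weighted average of the two posteriors is just the prior.

```python
def posterior(prior, p_e_given_h, p_e_given_not_h, observed):
    """Bayes' rule for a binary hypothesis H and binary evidence E."""
    if observed:
        like_h, like_not_h = p_e_given_h, p_e_given_not_h
    else:
        like_h, like_not_h = 1 - p_e_given_h, 1 - p_e_given_not_h
    return prior * like_h / (prior * like_h + (1 - prior) * like_not_h)


def expected_posterior(prior, p_e_given_h, p_e_given_not_h):
    """Average the two possible posteriors, weighted by how likely each observation is."""
    p_e = prior * p_e_given_h + (1 - prior) * p_e_given_not_h
    return (
        p_e * posterior(prior, p_e_given_h, p_e_given_not_h, True)
        + (1 - p_e) * posterior(prior, p_e_given_h, p_e_given_not_h, False)
    )


# Illustrative (assumed) numbers: 70% prior, evidence likely under H, unlikely otherwise.
PRIOR, P_E_GIVEN_H, P_E_GIVEN_NOT_H = 0.7, 0.9, 0.2

if_seen = posterior(PRIOR, P_E_GIVEN_H, P_E_GIVEN_NOT_H, True)       # belief rises
if_not_seen = posterior(PRIOR, P_E_GIVEN_H, P_E_GIVEN_NOT_H, False)  # belief falls
average = expected_posterior(PRIOR, P_E_GIVEN_H, P_E_GIVEN_NOT_H)    # equals the prior
```

So expecting the confirming outcome with fairly high probability does not license moving to the post-confirmation belief in advance: the smaller chance of the disconfirming outcome, with its larger downward shift, exactly cancels the expected gain.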


To me, this thought experiment solves the meta-problem and so dissolves the hard problem.

I don't see how it does? It just suggests a possible approach by which the meta-problem could be solved in the future.

Suppose you told me that you had figured out how to create a cheap and scalable source of fusion power. I'd say oh wow great! What's your answer? And you said that, well, you have this idea for a research program that might, in ten years, produce an explanation of how to create cheap and scalable fusion power.

I would then be disappointed because I thought you had an explanation that would let me build fusion power right now. Instead, you're just proposing another research program that hopes to one day achieve fusion power. I would say that you don't actually have it figured out yet, you just think you have a promising lead.

Likewise, if you tell me that you have a solution to the meta-problem, then I would expect an explanation that lets me understand the solution to the meta-problem today. Not one that lets me do it ten years in the future, when we investigate the logs of the AIs to see what exactly it was that made them think the hard problem was a thing.

I also feel like this scenario is presupposing the conclusion - you feel that the right solution is an eliminativist one, so you say that once we examine the logs of the AIs, we will find out what exactly made them believe in the hard problem in a way that solves the problem. But a non-eliminativist might just as well claim that once we examine the logs of the AIs, we will eventually be forced to conclude that we can't find an answer there, and that the hard problem still remains mysterious.

Now personally I do lean toward thinking that examining the logs will probably give us an answer, but that's just my/your intuition against the non-eliminativist's intuition. Just having a strong intuition that a particular experiment will prove us right isn't the same as actually having the solution.

I quite liked the way that this post presented your intellectual history on the topic, it was interesting to read to see where you're coming from.

That said, I didn't quite understand your conclusion. Starting from Chap. 7, you seem to be saying something like, "everyone has a different definition for what consciousness is; if we stop treating consciousness as being a single thing and look at each individual definition that people have, then we can look at different systems and figure out whether those systems have those properties or not".

This makes sense, but - as I think you yourself said earlier in the post - the hard problem isn't about explaining every single definition of consciousness that people might have? Rather it's about explaining one specific question, namely:

The explanatory gap in the philosophy of mind, represented by the cross above, is the difficulty that physicalist theories seem to have in explaining how physical properties can give rise to a feeling, such as the perception of color or pain.

You cite Critch's list of definitions people have for consciousness, but none of the three examples that you quoted seem to be talking about this property, so I don't see how they're related or why you're bringing them up.

With regard to this part:

If they do reinvent the hard problem, it would be a big sign that the AIs in the simulation are “conscious” (in the reconstructed sense).

I assert that this experiment would solve the hard problem, because we could look at the logs,[4] and the entire causal history of the AI that utters the words "Hard pro-ble-m of Con-scious-ness" would be understandable. Everything would just be plainly understandable mechanistically, and David Chalmers would need to surrender.

This part seems to be quite a bit weaker than what I read you to be saying earlier. I interpreted most of the post to be saying "I have figured out the solution to the problem and will explain it to you". But this bit seems to be weakening it to "in the future, we will be able to create AIs that seem phenomenally conscious and solve the hard problem by looking at how they became that". Saying that we'll figure out an answer in the future when we have better data isn't actually giving an answer now.

In the current era, the economics are such that war and violence tend to pay relatively badly, because countries get rich by having a well-developed infrastructure and war tends to destroy that, so conquest will get you something that won't be of much value. This is argued to be one of the reasons for why we have less war today, compared to the past where land was the scarce resource and military conquest made more sense. 

However, if we were to shift to a situation where matter could be converted into computronium... then there are two ways that things could go. One possibility is that it would be an extension of current trends, as computronium is a type of infrastructure and going to war would risk destroying it. 

But the other possibility is that if you are good enough at rebuilding something that has been destroyed, then this is going back to the old trend where land/raw matter was a valuable resource - taking over more territory allows you to convert it into computronium (or recycle and rebuild the ruins of the computronium you took over). Also, an important part of "infrastructure" is educated people who are willing and capable of running it - war isn't bad just because it destroys physical facilities, it's also bad because it kills some of the experts who could run those facilities for you. This cost is reduced if you can just take your best workers and copy as many of them as you want to. All of that could shift us back to a situation where the return on investment for violence and conquest becomes higher than for peaceful trade.

As Azar Gat notes in War in Human Civilization (2006), for most of human history, war ‘paid,’ at least for the elites who made decisions. In pre-industrial societies, returns to capital investment were very low. They could – and did – build roads and infrastructure, irrigation systems and the like, but the production multiplier for such investments was fairly low. For antiquity, the Roman Empire probably represents close to the best that could be achieved with such capital investments and one estimate, by Richard Saller, puts the total gains per capita at perhaps 25% over three centuries (a very rough estimate, but focus on the implied scale here; the real number could be 15% or 30%, but it absolutely isn’t 1000% or 100% or even probably 50%).

But returns to violent land acquisition were very, very high. In those same three centuries, the Romans probably increased the productive capacity of their empire by conquest 1,200% (note that’s a comma, not a dot!), going from an Italian empire of perhaps 5,000,000 to a Mediterranean empire in excess of 60,000,000 (and because productivity per capita was so relatively insensitive to infrastructure investments, we can on some level extrapolate production straight out of population here in a way that we couldn’t discussing the modern world). Consequently, the ‘returns to warfare’ – if you won – were much higher than returns to peace. The largest and most prosperous states tended to become the largest and most prosperous states through lots of warfare and they tended to stay that way through even more of it.

This naturally produced a lot of very powerful incentives towards militarism in societies. Indeed, Gat argues (and I agree) that the state itself appears to have emerged as a stage in this competitive-militarism contest where the societies which were best at militarizing themselves and coordinating those resources survived and aggregated new resources to themselves in conflict; everyone else could imitate or die (technically ‘or suffer state-extinction’ with most of the actual people being subjugated to the new states and later empires). [...]

And this makes a lot of sense if you think about the really basic energy economy of these societies: nearly all of the energy they are using comes from the land, either in the form of crops grown to feed either humans or animals who then do work with that energy. Of course small amounts of wind and water power were used, but only small amounts.

As Gat notes, the industrial revolution changed this, breaking the agricultural energy economy. Suddenly it was possible, with steam power and machines, to use other kinds of energy (initially, burning coal) to do work (more than just heating things) – for the first time, societies could radically increase the amount of energy they could dispose of without expanding. Consequently – as we’ve seen – returns to infrastructure and other capital development suddenly became much higher. At the same time, these new industrial technologies made warfare much more destructive precisely because the societies doing the warfare now had at their disposal far larger amounts of energy. Industrial processes not only made explosives possible, they also enabled such explosives to be produced in tremendous quantities, creating massive, hyper-destructive armies. Those armies were so destructive, they tended to destroy the sort of now-very-valuable mechanical infrastructure of these new industrial economies; they made the land they acquired less valuable by acquiring it. So even as what we might term ‘returns to capital’ were going wildly up, the costs of war were also increasing, which meant that ‘returns to warfare’ were going down for the first time in history.

It’s not clear exactly where the two lines cross, but it seems abundantly clear that for the most developed economies, this happened sometime before 1914 because it is almost impossible to argue that anything that could have possibly been won in the First World War could have ever – even on the cynical terms of the competitive militarism of the pre-industrial world – been worth the expenditure in blood and treasure.

And as far as we can tell, there don't appear to be any sharp discontinuities here, such that above a certain skill level it's beneficial to take things by force rather than through negotiation and trade. It's plausible that very smart power-seeking AIs would just become extremely rich, rather than trying to kill everyone.

I think this would depend quite a bit on the agent's utility function. Humans tend more toward satisficing than optimizing, especially as they grow older - someone who has established a nice business empire and feels like they're getting all their wealth-related needs met likely doesn't want to rock the boat and risk losing everything for what they perceive as limited gain. 

As a result, even if discontinuities do exist (and it seems pretty clear to me that being able to permanently rid yourself of all your competitors should be a discontinuity), the kinds of humans who could potentially make use of them are unlikely to.

In contrast, an agent that was an optimizer and had an unbounded utility function might be ready to gamble all of its gains for just a 0.1% chance of success if the reward was big enough. 
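As a rough illustration of the contrast between a satisficer and an unbounded optimizer, here is a toy sketch. Everything in it is an assumption made for the example: the current wealth of 5 units, the 0.1% gamble that pays 10,000 units and otherwise loses everything, and the two utility functions (linear as a stand-in for "unbounded", a saturating exponential as a stand-in for "satisficing").

```python
import math


def expected_utility(utility, outcomes):
    """Expected utility of a lottery given as (probability, payoff) pairs."""
    return sum(p * utility(x) for p, x in outcomes)


def linear_utility(x):
    """Unbounded: every extra unit of resources is worth as much as the last."""
    return x


def saturating_utility(x, scale=1.0):
    """Bounded above by 1: once needs are met, extra resources add almost nothing."""
    return 1.0 - math.exp(-x / scale)


# Hypothetical numbers for the example only.
KEEP_WHAT_YOU_HAVE = [(1.0, 5.0)]
ALL_IN_GAMBLE = [(0.001, 10_000.0), (0.999, 0.0)]  # 0.1% chance of a huge win


def takes_gamble(utility):
    """True if the agent prefers the gamble to keeping its current position."""
    return expected_utility(utility, ALL_IN_GAMBLE) > expected_utility(
        utility, KEEP_WHAT_YOU_HAVE
    )


optimizer_gambles = takes_gamble(linear_utility)       # 0.001 * 10_000 = 10 > 5
satisficer_gambles = takes_gamble(saturating_utility)  # at most 0.001 < 1 - e**-5
```

With the bounded utility no payoff, however large, can make the 0.1% gamble worthwhile here, since its expected utility can never exceed 0.001. With the unbounded one, a large enough reward always can.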

Cool that you published this! Could you post some example dialogues with the bot that you think went particularly well?

Hmm, I think people have occasionally asked me "how's your week going" on dating apps and I've liked it overall - I'm pretty sure I'd prefer it over your suggested alternative! No doubt to a large extent because I suck at cooking and wouldn't know what to say. Whereas a more open-ended question feels better: I can just ramble a bunch of things that happen to be on my mind and then go "how about yourself?" and then it's enough for either of our rambles to contain just one thing that the other party might find interesting.

It feels like your proposed question is a high-variance strategy: if you happen to find a question that the other person finds easy and interesting to answer, then the conversation can go really well. But if they don't like the direction you're offering, then it'd have been better to say something that would have given them more control over the direction.
