deepthoughtlife


Note: I wrote this comment as notes while reading, recording what I thought of your arguments as I went, rather than as a polished piece.


I think your calibration on the 'slow scenario' is off. What you claim is the slowest plausible scenario is fairly clearly the median one, given that it is pretty much just an extrapolation of current trends, and slower than the present trend is clearly plausible. Things have already slowed way down, with advancement in very narrow areas being the only real change. There is a reason OpenAI hasn't dared name anything GPT-5, for instance. Even o3 isn't really an improvement on general LLM duties, and that is the 'exciting' new thing, as you pretty much say.

Advancement is disappointingly slow in the AI I personally use (mostly image generation, where new, larger models have often not been clearly better overall for the past year or so, and the newer ones mostly use LLM-style architectures), and it is plausible that there will be barely any clear quality improvement for general uses over the next couple of years. Image generation should also be easier to improve than general LLMs, because it should be earlier in the diminishing returns of scale (the scale being much smaller). Note that since most image models are also diffusion models, they already use an image equivalent of the trick o1 and o3 introduced, which I would argue is effectively chain of thought. For some reason, all the advancements I hear about these days seem like uninspired copies of things that already happened in image generation.

The one exception is 'agents', but those show no signs of present-day usefulness. Who knows how quickly such things will become useful, but historical trends for new tech, especially in AI, say 'not soon' for real use. A lot of people and companies are very interested in the idea for obvious reasons, but that doesn't mean it will happen fast. See also self-driving cars, which have taken many times longer than expected, despite seeming like a success story in the making (for the distant future). In fact, self-driving cars are the real-world equivalent of a narrow agent, and the immense difficulty they are having is strong evidence against agents becoming transformatively useful soon.

I do think that AI as it currently is will have a transformative impact in the near term for certain activities (image generation for non-artists like me is already one of them), but I think the smartphone comparison is a good one; I still don't bother to use a smartphone (though it has many significant uses). I would be surprised if AI had as big an impact as the worldwide web on a year-for-year basis, counting from the beginning of the web (supposedly 1989) for the one, and 2017, when transformers were invented (or even 2018, when GPT-1 became a thing), for the other. I like the comparison to the web because I think that AI going especially well would change our information capacities in a way similar to an internet 3.0 (assuming you count the web as 2.0).

As to the fast scenario, that does seem like the fastest scenario that isn't completely ridiculous, but I think your estimate of its probability is dramatically too high. I do agree that if self-play (in the AlphaGo sense) could generate good data for poorly definable problems, that would alleviate the lack-of-data issues we suffer in large parts of the space, but it is unlikely it would actually improve the quality of the data in the near term, and there are already a lot of data-quality issues. I personally do not believe that o1 and o3 have 'shown' that synthetic data is a solved issue, and it won't be for quite a while, if ever.

Note that image generation models have already been using synthetic data from teacher models for a while now, with 'SDXL Turbo' and later adversarial distillation schemes. This did manage a several-times speed boost, but at a cost of some quality, as all such schemes do. Crucially, no one has managed to increase quality this way, because the teacher sets a maximum quality level you can't go beyond (except by pure luck).

Speculatively, you could perhaps improve quality by having a third model select the absolute best outputs of the teacher and training only on those until you have something better than the teacher, then promoting the 'better than the teacher' model into the teacher role and automatically starting to train a new student (or perhaps retraining the old teacher?). The problem is: how do you get a selection model that is actually better than the things you are trying to improve, through its own self-play-style learning, rather than just getting them to fit a static model of a good output? Human data creation cannot be replaced in general without massive advancements in the field. You might be able to switch human data generation to just training the selection model, though.
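Roughly, the loop I have in mind would look something like the sketch below. Everything here is a made-up placeholder (the teacher/student/selector objects and their generate/score/train_on/prefers/clone_fresh methods are not any real library); it is only meant to show the shape of the idea, not to claim it would work.

```python
# Hypothetical sketch of selector-filtered distillation: the selector keeps only
# the teacher's best outputs, the student trains on those, and a student that
# starts beating the teacher is promoted into the teacher role.
def selector_filtered_distillation(teacher, student, selector, prompts,
                                   rounds=10, keep_fraction=0.05):
    for _ in range(rounds):
        # Teacher generates one candidate output per prompt.
        candidates = [(p, teacher.generate(p)) for p in prompts]

        # Selector ranks the candidates; keep only the top few percent.
        ranked = sorted(candidates, key=lambda pc: selector.score(*pc), reverse=True)
        best = ranked[: max(1, int(len(ranked) * keep_fraction))]

        # Student trains only on the selected (prompt, output) pairs.
        student.train_on(best)

        # If the selector now prefers the student's outputs overall,
        # promote it to teacher and start training a fresh student.
        if selector.prefers(student, teacher, prompts):
            teacher, student = student, student.clone_fresh()
    return teacher
```

The open question from the paragraph above is hidden inside `selector.score`: unless the selector itself keeps improving, this just converges on whatever the static selector thinks a good output looks like.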

In some areas, you could perhaps train the AI directly on automatically generated data from sensors in the real world, but that seems like it would reduce the speed of progress to that of the real world unless you have that exponential increase in sensor data instead.

I do agree that in a fast scenario, it would clearly be algorithmic improvements rather than scale leading to it.

Also, o1 and o3 are only 'better' because of a willingness to use immensely more compute at the inference stage, and given that people already can't afford them, that route seems like it will be played out after not too many generations of scaling, especially since hardware is improving so slowly these days. Chain of thought should probably be largely replaced with something more like what image generation models currently use, where each step iterates on the current results. These could be combined, of course.

Diffusion models make a latent picture out of a bunch of different areas, and each of those areas influences every other area in future steps, so in text generation you could analogously have a chain of thought that is used in its entirety to create a new chain of thought. For example, you could have a ten-deep chain of thought used to create another ten-deep chain of thought nine times over, instead of a hundred independent options (with the first ten generated from just the input, of course); a rough sketch follows. If you're crazy, it could literally be exponential, where you generate one chain in the first step, two in the second... sixteen in the fifth, and so on.
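To make the contrast with branching sampling concrete, here is a rough sketch of that iterated version; `model.generate_chain` is a hypothetical call standing in for "produce a depth-n chain of thought", not any real API.

```python
# Sketch of iterating on a whole chain of thought, the way a diffusion step
# refines the whole latent image, rather than sampling many independent chains.
def iterated_chain_of_thought(model, prompt, depth=10, iterations=9):
    # The first chain is generated from the input alone.
    chain = model.generate_chain(prompt, depth=depth)
    for _ in range(iterations):
        # Each later pass sees the entire previous chain and rewrites it end to end.
        chain = model.generate_chain(prompt + "\n" + chain, depth=depth)
    return chain
```

This spends roughly the same budget as a hundred steps (ten steps, ten times), but every pass gets to condition on a complete draft instead of only on the tokens that came before it.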

"Identifying The Requirements for a Short Timeline"
I think you are missing an interesting way to tell whether AI is accelerating AI research. A lot of normal research is eventually integrated into the next generation of products. If AI really were accelerating the process, you would see those integrations happening much more quickly, with a shorter lag between 'new idea first published' and 'new idea integrated into a fully formed product that is actually good'. A human might take several months to test an idea, but if an AI could do the research, it could also replicate other people's research incredibly quickly and see how it works in combination with everything else.

(Ran out of steam when my computer crashed during the above paragraph, though I don't seem to have lost any of what I wrote since I do it in notepad.)

I would say the best way to tell you are in a shorter timeline is if it seems like gains from each advancement start broadening rather than narrowing. If each advancement applies narrowly, you need a truly absurd number of advancements, but if they are broad, far fewer.

Honestly, I see very little likelihood of what I consider AGI in the next couple of decades at least (certainly if you want it to have surpassed humanity), and if we don't break out of the current paradigm, not for much, much longer than that, if ever. You have some interesting points and seem reasonable, but I really can't agree with the idea that we are at all close. Also, your fast scenario seems more like it would take 20 years than 4; 4 years isn't the 'fast' scenario, it is the 'miracle' scenario. The 'slow scenario' reads like 'this might be the work of centuries, or maybe half of one if we are lucky'. The strong disagreement on how long these scenarios would take is because the point we are at now is far, far below where you seem to believe we are. We aren't even vaguely close.

As far as your writing goes, I think it was fairly well written structurally and was somewhat interesting, and I even agree that large parts of the 'fast' scenario as you laid it out make sense, but since you are wrong about the amount of time to associate with the scenarios, the overall analysis is very far off. I did find it to be worth my time to read.

Apparently the very subject coming up led to me writing a few paragraphs about the problems of a land value tax before I even started reading the post. (A fraction of the things in parentheses were added later to elaborate a point.)

There's nothing wrong with replacing current property taxes with equivalent (in dollar value) taxes that apply only to the value of the land itself (this would be good, since it avoids penalizing people for improving their own land), but the land value tax (aka Georgism) is awful because of what its proponents want to do with it. Effectively, they want to confiscate literally all of the value of people's land. This is hugely distortionary versus other forms of property, but that isn't even the real problem.

The real problem is that people don't want to live lives where they literally can't own their own home, can't plan for where they will live in the future, and can easily be kicked out at any time just because someone likes the idea of claiming their land became valuable. This turns all homeowners into renters. Many places (such as California) have very distortionary property tax laws because people hate the idea of their land being confiscated through taxes. There are also very many laws making it hard to kick out renters, because renters hate being kicked out too. Stability is important, and many people currently pay massive premiums not to rent (including taking out massive 30-year loans on a place where they often don't even plan to live for half that long). Also, being forced to move at an inconvenient time is very expensive and carries a great deal of deadweight loss, both economically and personally. (People also hate eminent domain.)

Of lesser but not negligible importance is the fact that people's taxes could rise to ridiculously high levels just because someone else built something valuable nearby; this would make present-day nimbyism look very nice and kind by comparison (though it is kind of funny that people would likely switch which kind of nimbyism they support as well).

So, does your post bring up what I think are the problems with a land value tax? Yes, though I somewhat disagree with some of the emphasis.

Searching for new uses of land is pretty important, especially over time, but we don't need new uses for things to work right now, whereas the disruption to people's lives would make things unworkable immediately; and a lot of the searching for new uses of land is done by people who do not currently own said land.

Implicitly taxing improvements to nearby land is obviously related to my point on nimbyism, so we agree there, though it is interesting to note that you seem to prefer talking about the internal version while I mostly reference the political version. The internal issue could obviously be 'fixed' by simply consolidating lots, but the political one cannot without completely destroying the idea of individual ownership.

Your statements about the tax-base-narrowing issue are largely correct, but I would like to emphasize a different point of agreement: supporters seem to see the tax as simple, elegant, and easy, but each patch makes it more complicated, kludgy, and difficult. The very idea of evaluating the value of the land itself as separate from improvements actually starts out pretty difficult, so the increase in difficulty is a huge problem. Any incorrect overvaluation makes land worse than useless under this system! A system where a lot of land is useless very obviously leads to a lot of unused valuable land... which is exactly what people currently complain about with speculation in land, but worse. (This is also deadweight loss.)

You don't go far enough when decrying the effect on people's confidence in the government, because it isn't just a confidence thing. Confiscating people's property without extremely good cause is one of the primary signs of living in a country that's either dirt poor, or right about to be, and will definitely stay that way (except in some rare cases where it only leads to massive famines, death, and stagnation rather than becoming poorer monetarily at first). It is also massively immoral.

The problem with your section on disrupting long-term plans is that you emphasize only the immediate problem of the transition, but don't mention that it also prevents the creation of new long term plans for a stable life unless you are willing to live a very bad life compared to how you could otherwise live. A full land value tax is therefore extremely dystopian.

Luckily, the proponents aren't currently finding much luck in their desired tax actually happening, and I hope it stays that way.

Math is definitely just a language. It is a combination of symbols and a grammar for how they go together. It's what you come up with when you maximally abstract away the real world, and the part about not needing any grounding was specifically about abstract math, where there is no real world.

Verifiability is obviously important for training (since it lets us generate effectively infinite training data), but the reason math is so easily verifiable is that it doesn't rely on the world. Also, note that programming languages are just languages too (and quite simple ones), but abstract math is even less dependent on the real world than programming.

Math is just a language (a very simple one, in fact). Thus, abstract math is right in the wheelhouse of something made for language. Large Language Models are called that for a reason, and abstract math doesn't rely on the world itself, just the language of math. LLMs lack grounding, but abstract math doesn't require it at all. It seems more surprising how badly LLMs used to do at math than that they have made progress. (Admittedly, if you actually mean ten years ago, that's before LLMs were really a thing; the primary mechanism that distinguishes the transformer had only barely been invented then.)

For something to be a betrayal does not require knowing the intent of the person doing it, and knowing the intent does not necessarily change it. I already brought up that it would have been perfectly fine if they had asked permission; the betrayal comes in with not asking permission to alter the agreed-upon course. Saying 'I will do x' is not implicitly asking for permission at all; it is a statement of intent that entirely disregards that there was even an agreement.

'what made A experience this as a betrayal' is the fact that it was. It really is that simple. You could perhaps object that it is strange to experience vicarious betrayal, but since it sounds like the four of you were a team, it isn't even that. This is a very minor betrayal, but if someone were to even minorly betray my family, for instance, I would automatically feel betrayed myself, and would not trust that person anymore even if the family member doesn't actually mind what they did.

Analogy time (well, another one), 'what makes me experience being cold' can be that I'm not generating enough heat for some personal reason, or it can just be the fact that I am outside in -20 degree weather. If they had experienced betrayal with the person asking for permission to do a move that was better for the group, that would be the former, but this is the latter. Now, it obviously can be both where a person who is bad at generating heat is outside when it is -20 degrees. (This is how what you are saying actually happened works out in this scenario.)

From what I've seen of how 'betrayal' is used, your definition is incorrect. In general use (as far as I can tell), going against your agreement with another person is obviously betrayal in the sense of acting against their trust in you and reliance upon you, even if the intent is not bad. This is true even if the results are expected to be good. So far as I know, we do not have words distinguishing 'betrayal with bad motives' from 'betrayal with good motives'.

 Another analogy, if a financial advisor embezzled your money because they saw a good opportunity, were right, and actually gave you your capital back along with most (or even all) of the gain before they were caught, that is still embezzling your money, which is a betrayal. Since they showed good intentions by giving it back before being caught, some people would forgive them when it was revealed, but it would still be a betrayal, and other people need not think this is okay even if you personally forgive it. Announcing the course of action instead of asking permission is a huge deal, even if the announcement is before actually doing it.

You can have a relationship where either party is believed to be so attuned to the needs and desires of the other person that they are free to act against the agreement and have it not be betrayal, but that is hardly normal. If your agreement had included, explicitly or through long history, 'or whatever else you think is best' then it wouldn't be a betrayal, but lacking that, it is. Alternately, you could simply announce to the group beforehand that you want people to use their best judgment on what to do rather than follow agreements with you. (This only works if everyone involved remembers that though.) The fact is that people have to rely on agreements and act based upon them, and if they aren't followed, there is little basis for cooperation with anyone whose interests don't exactly coincide. As you note, their objection was not to the course of action itself.

The damning part isn't that they thought there was a new and better course of action and wanted to take it (very few people object to that, as long as you are willing to follow the agreement if the other person doesn't agree); it was the not asking and the not understanding, which both show a lack of trustworthiness and respect for agreements. This need not be something that has happened before, or that is considered especially likely to recur, for it to be reasonable for another party to state that they hate such things, which is one of the things being communicated. One thing objecting here does is tell the person 'you are not allowed to violate agreements with me without my permission.'

Also, they may be trying to teach the violator, as it is often the case that people try to teach morality, which may be why so much of philosophy is morality discussions. (Though I don't actually know how big of a factor that is.)

If there had been a reason they couldn't ask, then it would make more sense to do the seemingly better thing and ask for their approval after the fact. This is often true in emergencies, for instance, but also in times of extreme stress. Your friend wouldn't feel it was a betrayal if the other person had instead gone to the bathroom and never come back because they got a call that their best friend had just been hit by a car and they didn't think to tell people before leaving. If, on the other hand, the person acted unable to understand why they should explain themselves later, or insisted that it wouldn't have been better if they had remembered to do so, that would be bizarre.

I do agree that considering the hypothesis that they may have experienced serious betrayal is useful (it is unfortunately common), which is why I think asking about it was potentially a good idea despite being potentially very awkward to bring up, but I think it is important not to commit to a theory to degrees beyond what is necessary.

I also agree that feeling understood is very important to people. From what I can tell, one of the primary reasons people don't bother to explain themselves is that they don't think the other person would understand anyway no matter how much they explained, with the others being that they wouldn't care or would use it against them.

Obviously, translating between different perspectives is often a very valuable thing to do. While there are a lot of disagreements that are values based, very often people are okay with the other party holding different values as long as they are still a good partner, and failure to communicate really is just failure to communicate.

I dislike the assumption that 'B' was reacting that way due to past betrayal. Maybe they were, maybe they weren't (I do see that 'B' confirmed it for you in a reaction to another comment, but making such assumptions is still a bad idea), but there doesn't have to be any past betrayal to object to betrayal in the present; people don't need to have ever been betrayed in the past to be against it as a matter of principle. They only need to have known that betrayal is a thing that exists, and they would probably be more upset even if they were somehow just learning of it at the time it happened. Leave out the parts that are unnecessary to the perspective. The more you assume, the more likely you are to fail to do it properly. You can ask the person if for some reason it seems important to know whether or not there is such a reason behind it, but you can't simply assume (if you care to be right). I personally find such assumptions about motives to be very insulting.

I personally find the idea that 'A' could not know what they were doing was clearly betrayal to be incomprehensible since people have heard countless stories of people objecting to altering things in this manner; this is not an uncommon case. Even if this person believes that consequences are the important part, there is no possible way to go through life without hearing people objecting countless times to unilaterally altering deals no matter what the altering party thinks of the consequences for the party they were agreeing with. This is similar to the fact that I don't understand why someone would want to go to a loud party with thumping bass and strobing lights, but I've heard enough stories to know that a lot of people genuinely do. I can say that such parties are bad because they are noisy and it is hard to see, but it is impossible not to know that some people disagree on those being bad for a party.

If someone only cared about the consequences of the action, as judged by the other party, the agreement would have been to the criteria rather than to the action. There is no need for an agreement at all if the criterion is just 'do whatever you think is best' (though the course of action may still be discussed, of course). Also, it is quite easy to ask permission to alter an agreement whenever doing so seems simultaneously advantageous to all parties, and conditions under which one party can simply deviate can also be agreed upon in advance. The failure to ask permission should be read as them refusing to treat their agreements as actually meaning something (especially when they don't immediately understand the objection), which makes for a very bad and unreliable partner. Additionally, asking whether the other party thinks the new move is better gives an additional check on whether your evaluation reached the right conclusion.

I would strongly caution against assuming your mindreading is correct. It is important to keep in mind that you don't know whether or not you've successfully inhabited the other viewpoint. Stay open to the idea that the pieces got fit together wrong. Of course, it becomes easier in cases like the second one, where you can just ask 'do you mean moon cheese?' (In that particular case, even the question itself might be enough to clue the other party in to the shape of the disagreement.)

When 'D' does agree that you're right, and 'C' still doesn't really get it, I suppose you now need to do the whole procedure again if it is worth continuing the discussion. You are correct to note it often isn't.

You might believe that the distinctions I make are idiosyncratic, though the meanings are in fact clearly distinct in ordinary usage, but I clearly do not agree with your misleading use of what people would be led to think are my words, and you should take care not to conflate things. You want people to precisely match your own qualifiers in cases where that causes no difference in the meaning of what is said (which makes enough sense), but you will directly object to people pointing out a clear miscommunication of yours because you do not care about a difference in meaning. And you are continually asking me to give in on language, regardless of how correct I may be, while claiming that your usage is the one to privilege. That is not a useful approach.

(I take no particular position on physicalism at all.) Since you are not a panpsychist, you likely believe that consciousness is not common to the vast majority of things. That means the basic prior for whether an item is conscious is 'almost certainly not', unless we have already updated it based on other information. Under what reference class or mechanism should we be more concerned about the consciousness of an LLM than about an ordinary computer running ordinary programs? There is nothing in its operating principles that seems particularly likely to lead to consciousness.

There are many people, including the original poster of course, trying to use behavioral evidence to get around that, so I pointed out how weak that evidence is.

An important distinction you seem not to see in my writing (whether because I wrote unclearly or you missed it doesn't really matter) is that when I speak of knowing the mechanisms by which an LLM works, I mean something very fundamental. We know these two things: 1) exactly what mechanisms are used to do the operations involved in executing the program (physically on the computer and mathematically), and 2) the exact mechanisms through which we determine which operations to perform.

As you seem to know, LLMs are actually extremely simple programs of extremely large matrices, with values chosen by the very basic procedure of gradient descent. Nothing about gradient descent is especially interesting from a consciousness point of view. It is basically a massive chain of very simplified ODE-solver-like steps, which are extremely well understood and clearly have no consciousness at all if anything mathematical doesn't. It could also be viewed as a very large number of variables in a massive but simple statistical regression. Notably, even if gradient descent were related to consciousness directly, we would still have no reason to believe that an LLM doing inference rather than training would be conscious. Simple matrix math doesn't seem like much of a candidate for consciousness either.
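As an aside, to show how mundane the underlying procedure is, here is a toy version of that 'statistical regression by gradient descent' in numpy. It is deliberately tiny, and the only claim is that LLM training applies the same kind of update to vastly larger matrices.

```python
# Toy gradient descent on a linear regression: repeated small updates to a
# weight vector to reduce mean squared error. Scaling this up changes the size
# of the matrices, not the kind of operation being performed.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # toy inputs
true_w = np.array([1.5, -2.0, 0.5])           # the relationship to recover
y = X @ true_w + 0.1 * rng.normal(size=100)   # noisy targets

w = np.zeros(3)                               # parameters being fit
lr = 0.1
for _ in range(200):
    grad = 2 * X.T @ (X @ w - y) / len(y)     # gradient of mean squared error
    w -= lr * grad                            # one gradient-descent step

print(w)  # ends up close to true_w
```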

Someone trying to make the case for consciousness would thus need to think it likely that one of the other mechanisms in LLMs is related to consciousness, but LLMs are actually missing a great many mechanisms that would enable things like self-reflection and awareness (including several that were present in earlier, more primitive neural networks, such as recursion and internal loops). The people trying to make up for those omissions do a number of things to recreate them ('attention' being the built-in one, but also things like feeding previous outputs back in), but those very simple approaches don't seem like likely candidates for consciousness (to me).

Thus, it remains extremely unlikely that an LLM is conscious.

When you say we don't know what mechanisms are used, you seem to be talking about not understanding a completely different thing than the one I am saying we understand. We don't understand exactly what each weight means (except in some rare cases that researchers have seemingly figured out), or why it was chosen rather than any number of other values that would work out similarly, but that is most likely unimportant to my point about mechanisms. This is, as far as I can tell, an actual ambiguity in the meaning of 'mechanism': we can be talking about completely different levels at which mechanisms operate, and I am talking about the very lowest ones.

Note that I do not usually make a claim about the mechanisms underlying consciousness in general except that it is unlikely to be these extremely basic physical and mathematical ones. I genuinely do not believe that we know enough about consciousness to nail it down to even a small subset of theories. That said, there are still a large number of theories of consciousness that either don't make internal sense, or seem like components even if part of it.

Pedantically, if consciousness is related to 'self-modeling' the implications involve it needing to be internal for the basic reason that it is just 'modeling' otherwise. I can't prove that external modeling isn't enough for consciousness, (how could I?) but I am unaware of anyone making that contention.

So, would your example be 'self-modeling'? Your brief sentence isn't enough for me to be sure what you mean, but if it is related to people's recent claims about introspection on this board, then I don't think so. It would be modeling the external actions of an item that happened to turn out to be itself. For example, if I were to read the life story of a person I didn't realize was me, and make inferences about how the subject would act under various conditions, that isn't really self-modeling. On the other hand, in the comments there, I actually proposed that you could train a model on its own internal states, and that could maybe have something to do with this (if self-modeling is the right theory). This is not something we train current LLMs on at all, though.

As far as I can tell (as someone who finds the very idea of illusionism strange), illusionism is itself not a useful point of view in this dispute, because it would make the question of whether an LLM is conscious pretty moot. Effectively, the answer would be something like 'why should I care?', 'no', or even 'to the same extent as people', depending on the mood of the speaker, regardless of how an LLM (or an ordinary computer program, almost all of which process information heavily) works. If consciousness is an illusion, we aren't talking about anything real, and it is thus useful to ignore illusionism when talking about this question.

As I mentioned before, I do not have a particularly strong theory for what consciousness actually is or even necessarily a vague set of explanations that I believe in more or less strongly.

I can't say I've heard of 'attention schema theory' before, nor some of the other things you mention, like 'efference copy' (the latter seems to be all about the body, which doesn't seem all that promising as a theory of what consciousness may be, though I also can't rule out its being part of it, since the idea is that it is used in self-modeling, which, as I mentioned earlier, I can't actually rule out either).

My pet theory of emotions is that they are simply a shorthand for 'you should react in ways appropriate to a situation that is...' a certain way. For example (and these were not carefully chosen examples), anger would be 'a fight', happiness would be 'very good', sadness would be 'very poor', and so on. More complicated emotions might include things like a situation being good but also high-stakes. The reason for using a shorthand would be that our conscious mind is very limited in what it can fit at once. Despite this being uncertain, I find it much more likely than emotions themselves being consciousness.

I would explain things like blindsight (from your ipsundrum link) through having a subconscious mind that gathers information and makes a shorthand before passing it to the rest of the mind (much like my theory of emotions). The shorthand without the actual sensory input could definitely lead to not seeing but being able to use the input to an extent nonetheless. Like you, I see no reason why this should be limited to the one pathway they found in certain creatures (in this case mammals and birds). I certainly can't rule out that this is related directly to consciousness, but I think it more likely to be another input to consciousness rather than being consciousness.

Side note, I would avoid conflating consciousness and sentience (like the ipsundrum link seems to). Sensory inputs do not seem overly necessary to consciousness, since I can experience things consciously that do not seem related to the senses. I am thus skeptical of the idea that consciousness is built on them. (If I were really expounding my beliefs, I would probably go on a diatribe about the term 'sentience' but I'll spare you that. As much as I dislike sentience based consciousness theories, I would admit them as being theories of consciousness in many cases.)

Again, I can't rule out global workspace theory, but I am not sure how it is especially useful. What makes a global workspace conscious that doesn't happen in an ordinary computer program I could theoretically write myself? A normal program might take a large number of inputs, process them separately, and then put it all together in a global workspace. It thus seems more like a theory of where consciousness occurs than of what it is.

'Something to do with electrical flows in the brain' is obviously not very well specified, but it could possibly be meaningful if you mean the way a pattern of electrical flows causes future patterns of electrical flows as distinct from the physical structures the flows travel through.

Biological nerves being the basis of consciousness directly is obviously difficult to evaluate. It seems too simple, and I am not sure whether there is a possibility of having such a tiny amount of consciousness that then add up to our level of consciousness. (I am also unsure about whether there is a spectrum of consciousness beyond the levels known within humans).

I can't say I would believe a slime mold is conscious (but again, can't prove it is impossible.) I would probably not believe any simple animals (like ants) are either though even if someone had a good explanation for why their theory says the ant would be. Ants and slime molds still seem more likely to be conscious to me than current LLM style AI though.

And here you are being pedantic about language in ways that directly contradict other things you've said to me. In this case, everything I said holds whether we use 'not different' or 'not that different' (while you actually misquote yourself as 'not very different'). That said, I should have included the extra word when quoting you.

Your point is not very convincing. Yes, people disagree if they disagree. I do not draw the lines in specific spots, as you should know based on what I've written, but you find it convenient to assume I do.

Do you hold panpsychism as a likely candidate? If not, then you most likely believe the vast majority of things are not conscious. We have a lot of evidence that the way an LLM operates does not differ from other objects in ways we don't understand. Thus, almost the entire reference class would be things that are not conscious. If you do believe in panpsychism, then obviously AIs would be conscious too, but it wouldn't be an especially meaningful statement.

You could choose computer programs as the reference class, but most people are quite sure those aren't conscious in the vast majority of cases. So what, in the mechanisms underlying an LLM, is meaningfully different in a way that might cause consciousness? There don't seem to be any likely candidates at a technical level. Thus, we should not increase our prior above that of other computer programs. This does not rule out consciousness, but it does make it rather unlikely.

I can see you don't appreciate my pedantic points regarding language, but be more careful if you want to say that you are substituting a word for what I used. It is bad communication if it was meant as a translation. It would easily mislead people into thinking I claimed it was 'self-evident'. I don't think we can meaningfully agree to use words in our own way if we are actually trying to communicate since that would be self-refuting (as we don't know what we are agreeing to if the words don't have a normal meaning).
