I do agree with your rephrasing. That is exactly what I mean (though with a different emphasis).
I agree with you. The biggest leap was getting to human-level generality of intelligence. Humanity is already a number of superintelligences working in cooperation and conflict with each other; that's what a culture is. See also corporations and governments. Science too. This is a subculture of science worrying that it is superintelligent enough to create a 'God' superintelligence.
To be slightly uncharitable, the reason to assume otherwise is fear: either their own, or a desire to play on that of others. Throughout history people have looked for reasons why civilizat...
Honestly, Illusionism is just really hard to take seriously. Whatever consciousness is, I have better evidence that it exists than for anything else, since it is the only thing I actually experience directly. I should pretend it isn't real...why exactly? Am I talking to slightly defective P-zombies?
'If the computer emitted it for the same reasons...' is a clear example of the begging-the-question fallacy. If a computer claimed to be conscious because it was conscious, then it logically has to be conscious, but that is exactly what is in dispute in the first place. If you claim ...
As individuals, humans routinely succeed at things much too hard for them to fully understand. This is partly due to innately hardcoded abilities (mostly for things we think of as simple, like vision and controlling our bodies' automatic systems), somewhat due to innate personality, but mostly due to the training process our culture puts us through (for everything else).
For their part, cultures can take the inputs of millions to hundreds of millions of people (or even more when stealing from other cultures), and distill them into both insights and pract...
I'm hardly missing the point. It isn't impressive for the number to be exactly 75%, no more and no less, so the fact that it can't always be that is irrelevant. His point isn't that that particular number matters; it's that the number eventually becomes very small. But a number that is already very small compared to what it should be can still be made smaller by the same ratio, so his point is meaningless. It isn't impressive to fulfill an obvious bias toward updating in a certain direction.
It doesn't take many people to cause these effects. If we make them 'the way', following them doesn't take an extremist, just someone trying to make the world better, or some maximizer. Both of these types are plenty common, and neither has to be fanatical at all. The maximizer could just be a small band of petty bureaucrats who happen to have power over the area in question. Each one of them just does their role, with the knowledge that it is meant to prevent overall suffering. These aren't even the kind of bureaucrats we usually dislike! They are also monsters, because the system has terrible (and knowable) side effects.
I don't have much time, so:
While footnote 17 can be read as applying, it isn't very specific.
For all that you are doing math, this isn't mathematics, so the base of the logarithm needs to be specified.
I am convinced that people really do give the occasional other person a negative weight.
And here are some notes I wrote while finishing the piece (that I would have edited and tightened up a lot) (it's a bit all over the place):
This model obviously assumes utilitarianism.
Honestly, their math does seem reasonable to account for people caring about other people (as long as they care about t...
I'm only a bit of the way in, and it is interesting so far, but it already shows signs of needing serious editing, and there are other ways it is clearly wrong too.
In 'The inequivalence of society-level and individual charity' they list the scenarios as 1, 1, and 2 instead of A, B, and C, as they later use. Later, it incorrectly refers to preferring C to A with different necessary weights, when the second reference is to preferring C to B.
The claim that money becomes utility as a log of the amount of money isn't true, but is probably close enough for this kind of u...
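For reference, a minimal sketch of the log-utility assumption being discussed (my notation, not the article's): with $U(m) = \log m$, doubling your money adds the same utility at every wealth level, since $U(2m) - U(m) = \log 2$ for any $m > 0$. That diminishing-returns behavior is what the approximation is being used for, and the choice of base only rescales it.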
I strongly disagree. It would be very easy for a non-omnipotent, unpopular government with limited knowledge of the future, one that will be overthrown in twenty years, to do a hell of a lot of damage with negative utilitarianism, or any other imperfect utilitarianism. On a smaller scale, even individuals could do it alone.
A negative utilitarian could easily judge that something that had the side effect of making people infertile would cause far less suffering than not doing it, causing immense real world suffering amongst the people who wanted to ha...
A lot of this depends on your definition of doomsday/apocalypse. I took it to mean the end of humanity, and a state of the world we consider worse than our continued existence. If we valued the actual end state of the world more than continuing to exist, it would be easy to argue it was a good thing, and not a doom at all. (I don't think the second condition is likely to come up for a very long time as a reason for something to not be doomsday.) For instance, if each person created a sapient race of progeny that weren't human, but they valued as their own ...
Interactionism would simply require an extension of physics to include the interaction between the two, which would not defy physics any more than adding the strong nuclear force did. You can hold against it that we do not know how it works, but that's a weak point because there are many things where we still don't know how they work.
Epiphenomenalism seems irrelevant to me since it is simply a way you could posit things to be. A normal dualist ignores the idea because there is no reason to posit it. We can obviously see how consciousness has effects on the...
If they didn't accept physical stuff as being (at least potentially) on equal footing with consciousness, they actually wouldn't be dualists. Both are considered real things, and though many have less confidence in the physical world, they still believe in it as a separate thing. (Cartesian dualists have the least faith in the real world, but even they believe you can make real statements about it as a separate thing.) Otherwise, they would be 'monists'. The 'dual' is in the name for a reason.
This is clearly correct. We know the world through our observations, which clearly occur within our consciousness, and so they establish our consciousness at least as strongly. When something is being observed, you can assume that the something doing the observing must exist. If my consciousness observes the world, my consciousness exists. If my consciousness observes itself, my consciousness exists. If my consciousness is viewing only hallucinations, it still exists for that reason. I disagree with Descartes, but 'I think therefore I am' is true of logical necessity.
I do not like immaterialism personally, but it is more logically defensible than illusionism.
The description and rejection given of dualism are both very weak. Also, dualism is a much broader group of models than is admitted here.
The fact is, we only have direct evidence of the mind, and everything else is just an attempt to explain certain regularities. An inability to imagine that the mind could be all that exists is clearly just willful denial, and not evidence; but notably, dualism does not require nor even suggest that the mind is all there is, just that it is all we have proof of (even in the Cartesian variant). Thus, dualism.
Your personal...
I was replying to someone asking why it isn't 2-5 years. I wasn't making an actual timeline. In another post elsewhere on the site, I mention that they could give memory to a system now and it would be able to write a novel.
Without doing so, we obviously can't tell how much planning they would be capable of if we did, but current models don't make choices, and thus can only be scary for whatever people use them for, and their capabilities are quite limited.
I do believe that there is nothing inherently stopping the capabilities researchers from switching o...
You're assuming that it would make sense to have a globally learning model, one constantly still training, when that drastically increases the cost of running the model over present approaches. Cost is already prohibitive, and reaching that many parameters any time soon would be exorbitant (though it will probably happen eventually). Plus, the sheer amount of data necessary for such a large model is crazy, and you aren't getting much data per interaction. Note that Chinchilla recently showed that lack of data is a much bigger issue right now for models than lack of pa...
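As a rough back-of-the-envelope illustration of why data becomes the bottleneck (the roughly 20 tokens per parameter figure is the commonly cited Chinchilla rule of thumb; treat the exact numbers here as an assumption):

```python
# Rough Chinchilla-style estimate: compute-optimal training wants on the
# order of ~20 training tokens per model parameter (a heuristic, not exact).
TOKENS_PER_PARAM = 20

def tokens_needed(num_params: float) -> float:
    """Approximate number of training tokens for a given parameter count."""
    return TOKENS_PER_PARAM * num_params

# A 70B-parameter model would want on the order of 1.4 trillion tokens,
# far more than you could plausibly gather from user interactions alone.
print(f"{tokens_needed(70e9):.2e} tokens")
```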
You're assuming that the updates are mathematical and unbiased, which is the opposite of how people actually work. If your updates are highly biased, it is very easy to just make large updates in that direction any time new evidence shows up. As you get more sure of yourself, these updates start getting larger and larger rather than smaller as they should.
That sort of strategy only works if you can get everyone to coordinate around it, and if you can do that, you could probably just get them to coordinate on doing the right things. I don't know if HR would listen to you if you brought your concerns directly to them, but persuading them on that sort of thing is probably no harder than convincing the rest of your fellows to defy HR. (Which is just a guess.) In cases where you can't get others to coordinate on it, you are just defecting against the group, to your own personal loss. This doesn't seem like a g...
That does sound problematic for his views if he actually holds these positions. I am not really familiar with him, even though he did write the textbook for my class on AI (third edition) back when I was in college. At that point, there wasn't much on the now current techniques and I don't remember him talking about this sort of thing (though we might simply have skipped such a section).
You could consider it that we have preferences on our preferences too. It's a bit too self-referential, but that's actually a key part of being a person. You could determin...
If something is capable of fulfilling human preferences in its actions, and you can convince it to do so, you're already most of the way to getting it to do things humans will judge as positive. Then you only need to specify which preferences are to be considered good in an equally compelling manner. This is obviously a matter of much debate, but it's an arena we know a lot about operating in. We teach children these things all the time.
If they chose to design it with effective long-term memory, and a focus on novels (especially prompting via summary), maybe it could write some? They wouldn't be human level, but people might be interested enough in novels whipped up on a whim to match some exact scenario that it could be valuable. It would also be good evidence of advancement, since losing track of things is a huge current weakness.
I would like to point out that what johnswentworth said about being able to turn off an internal monologue is completely true for me as well. My internal monologue turns itself on and off several (possibly many) times a day when I don't control it, and it is also quite easy to tell it which way to go on that. I don't seem to be particularly more or less capable with it on or off, except on a very limited number of tasks. Simple tasks are easier without it, while explicit reasoning and storytelling are easier with it. I think my default is off when I'm not worried (but I do an awful lot of intentional verbal daydreaming and reasoning about how I'm thinking too).
So the example given to decry a hypothetical, obviously bad situation applies even better to what they're proposing. It's every bit the same coercion as what they're decrying, but with less personal benefit and choice (you get nothing out of this deal). And they admit this? This is self-refuting.
Security agencies don't have any more reason to compete on quality than countries do; if anything they have less, because they have every bit as much force, and you don't really have any say. What, you're in the middle of a million people with company A's security, and you think you can pick B and B will be able to do anything?
Except that is clearly not real anarchy. It is a balance of power between the states. The states themselves ARE the security forces in this proposal. I'm saying that they would conquer everyone who doesn't belong to one.
Anarchists always miss the argument from logical necessity, which I won't actually make because it is too much effort, but in summary, politics abhors a vacuum. If there is not a formal power you must consent to, there will be an informal one. If there isn't an informal one, you will shortly be conquered.
In these proposals, what is to stop these security forces from simply conquering anyone and everyone that isn't under the protection of one? Nothing. Security forces have no reason to fight each other to protect your right not to belong to one. And they wi...
The examples used don't really seem to fit with that though. Blind signatures are things many/most people haven't heard of, and not how things are done; I freely admit I had never heard of them before the example. Your HR department probably shouldn't be expected to be aware of all the various things they could do, as they are ordinary people. Even if they knew what blind signatures were, that doesn't mean it is obvious they should use them, or how to do so even if they thought they should (which you admit). After reading the Wikipedia article, that doesn'...
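For anyone else who hadn't run into them, here is a minimal toy sketch of the RSA-style blind-signature idea (textbook-sized numbers, purely illustrative, nothing like a real implementation):

```python
from math import gcd

# Toy RSA key (standard textbook values; far too small for real use).
p, q = 61, 53
n = p * q            # 3233
e, d = 17, 2753      # public and private exponents

m = 65               # the message to be signed (e.g. a ballot), m < n
r = 99               # blinding factor chosen by the requester, coprime to n
assert gcd(r, n) == 1

# 1. Requester blinds the message before sending it to the signer.
blinded = (m * pow(r, e, n)) % n

# 2. Signer signs the blinded message without ever learning m.
signed_blinded = pow(blinded, d, n)

# 3. Requester unblinds, recovering a valid signature on the original m.
#    (pow(r, -1, n) needs Python 3.8+.)
signature = (signed_blinded * pow(r, -1, n)) % n

# 4. Anyone can verify the signature against the public key.
assert pow(signature, e, n) == m
print("signature verifies:", pow(signature, e, n) == m)
```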
I am aware of the excuses used to define it as not hearsay, even though it is clearly the same as all other cases of such. Society simply believes it is a valuable enough scenario that it should be included, even though it is still weak evidence.
I was pretty explicit that scale improves things and eventually surpasses any particular level that you get to earlier with the help of domain knowledge...my point is that you can keep helping it, and it will still be better than it would be with just scale. MuZero is just evidence that scale eventually gets you to the place you already were, because they were trying very hard to get there and it eventually worked.
AlphaZero did use domain insights. Just like AlphaGo. It wasn't self-directed. It was told the rules. It was given a direct way to play games, a...
The thing is, no one ever presents the actual strongest version of an argument. Their actions are never the best possible, except briefly, accidentally, and in extremely limited circumstances. I can probably remember how to play an ideal version of the tic-tac-toe strategy that's the reason only children play it, but for any game more complicated than that my play will be subpar. Games are much smaller and simpler things than arguments. Simply noticing that an argument isn't the best it could be is a you thing, because it is always true. Basically no one is a...
Even if they were somehow extremely beneficial normally (which is fairly unlikely), any significant risk of going insane seems much too high. I would posit they have such a risk for exactly the same reason: when using them, you are deliberately routing around very fundamental safety features of your mind.
Donald Hobson has a good point about goodharting, but it's simpler than that. While some people want alignment so that everyone doesn't die, the rest of us still want it for what it can do for us. If I'm prompting a language model with "A small cat went out to explore the world", I want it to come back with a nice children's story about a small cat that went out to explore the world, one I can show to just about any child. If I prompt a robot that I want it to "bring me a nice flower", I do not want it to steal my neighbor's rosebushes. And so on. I want it to be safe to give some random AI helper whatever lazy prompt is on my mind and have it improve things according to my preferences.
Stop actively looking (though keep your ears open) when you have thoroughly researched two things:
First, the core issues that could change your mind about who you think you should vote for. This is not about the candidates themselves.
Then, the candidates or questions on the ballot themselves. For candidates: Are they trustworthy? Where do they fall on the issues important to you? Do they implement these issues properly? Will they get things done? Are there issues they bring up that you would have included? If so, look at those issu...
That seems like the wrong takeaway to me. Why do we change our minds so little?
1.) Some of our positions are just right.
2.) We are wrong, but we don't take the time and effort to understand why we should change our minds.
We don't know which situation we are in beforehand, but if you change your mind less than you think you do, doesn't that mean you think you are often wrong? And that you are wrong about how you respond to it?
You could try to figure out what possible things would get you to change your mind, and look directly at those things, trying to fully understand whether they are or are not the way that would change things.
This is well written, easy to understand, and I largely agree that instilling a value like love for humans in general (as individuals) could deal with an awful lot of failure modes. It does so amongst humans already (though far from perfectly).
When there is a dispute, rather than optimizing over smiles in a lifetime (a proxy for long-term happiness), something more difficult is obviously preferable: for instance, whether the versions of the person in both the worlds where it did and did not happen would end up agreeing that it was better to have happened, and that it woul...
Noticing which parts of an argument you dislike are actually things you can agree with seems like a valuable thing to keep in mind. The post is well written and easy to understand too. I probably won't do this any more than I already do, though.
What I try to do isn't so different, just less formal. I usually simply agree or disagree directly on individual points that come up through trying to understand things in general. I do not usually keep in mind what the current score of agreement or disagreement is, and that seems to help not skew things to...
The entire point of what I wrote is that marrying human insights, tools, etc. with scale increases leads to higher performance and shouldn't be discarded, not that you can't do better with a crazy amount of resources than with a small amount of resources plus human insight.
Much later, with much more advancement, things improve. Two years after the things AlphaGo was famous for happened, they used scale to surpass it, without changing any of the insights. Generating games against itself is not a change to the fundamental approach in a well defined game like Go....
Yes, though I'm obviously arguing against what I think it means in practice, and how it is used, not necessarily how it was originally formulated. I've always thought it was the wrong take on AI history, tacking much too hard toward scale-based approaches and forgetting that the successes of other methods could be useful too, as an over-correction from when people made the opposite mistake.
I think your response shows I understood it pretty well. I used an example that you directly admit is against what the bitter lesson tries to teach as my primary example. I also never said anything about being able to program something directly better.
I pointed out that I used the things people decided to let go of to improve the results massively over the current state of machine translation for my own uses, and then implied we should do things like give language models dictionaries and information about parts of speech that they can use as...
In a lot of ways, this is similar to the 'one weird trick' style of marketing so many lampoon. Assuming that you summarized Kuhn and Feyerabend correctly, it looks like one weird trick to solve all of science. Popper: 'just falsify things'; Kuhn: 'just find Kantian paradigms for the field'; Feyerabend: 'just realize the only pattern is that there aren't any patterns.'
People like this sort of thing because it's easy to understand (and I just made the same sort of simplification in this sentence). Science is hard to get right, and people can only keep in...
My memories of childhood aren't that precise. I don't really know what my childhood state was? Before certain extremely negative things happened to my psyche, that is. There are only a few scattered pieces I recall, like self-sufficiency and honesty being important, but these are the parts that already survived into my present political and moral beliefs.
The only thing I could actually use is that I was a much more orderly person when I was 4 or 5, but I don't see how it would work to use just that.
I can't say I'm surprised a utilitarian doesn't realize how vague it sounds? It is jargon taken from a word that simply means the ability to be used widely? Utility is an extreme abstraction, literally unassignable, and entirely based on guessing. You've straightforwardly admitted that it doesn't have an agreed-upon basis. Is it happiness? Avoidance of suffering? Fulfillment of the values of agents? Etc.
Utilitarians constantly talk about monetary situations, because that is one place they can actually use it and get results? But there, it's hardly diff...
You probably don't agree with this, but if I understand what you're saying, utilitarians don't really agree on anything or really have shared categories? Since utility is a nearly meaningless word outside of context due to broadness and vagueness, and they don't agree on anything about it, Utilitarianism shouldn't really be considered a thing itself? Just a collection of people who don't really fit into the other paradigms but don't rely on pure intuitions. Or in other words, pre-paradigmatic?
The easy first step is a simple bias toward inaction, which you can provide with a large punishment per output of any kind. For instance, a language model with this bias would write out something extremely likely and then stop quickly thereafter. This is only a partial measure, of course, but it is a significant first step.
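A minimal sketch of what that first step could look like as reward shaping (hypothetical function and penalty value, my own names):

```python
def shaped_reward(task_reward: float, num_outputs: int,
                  penalty_per_output: float = 0.1) -> float:
    """Bias toward inaction: every emitted output (token, action, etc.)
    costs a fixed penalty, so acting is only worthwhile when the expected
    task reward clearly outweighs the cost of acting at all."""
    return task_reward - penalty_per_output * num_outputs
```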
The second through n-th steps are harder, and I really don't know: how do you figure out what values to train it on to reduce impact? The immediate things I can think of might also train deceit, so it would take some thought.
Al...
You seem to have straw-manned your ideological opponents. Your claims are neither factually accurate, nor charitable. They don't point you in a useful direction either. Obviously, conservatives can be very wrong, but your assumptions seem unjustified.
And what are you going to claim your opponents do? In the US, rightists claim they will lower taxes, which they do. They claim they will reduce regulation, which they do. They claim they will turn back whatever they deem the latest outrage...which has mixed results. They explain why all of this is good by refe...
I had a thought while reading the section on Goodharting. You could fix a lot of the potential issues with an agentic AI by training it to want its impact on the world to be within small bounds. Give it a strong and ongoing bias toward 'leave the world the way I found it.' This could only be overcome by a very clear and large benefit toward its other goals per small amount of change. It should not be just part of the decision making process though, but part of the goal state. This wouldn't solve every possible issue, but it would solve a lot of them. In other words, make it unambitious and conservative, and then its interventions will be limited and precise if it has a good model of the world.
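As a toy sketch of the kind of goal term I have in mind (a hypothetical penalty on deviation from the starting state, not any existing system's objective):

```python
import numpy as np

def penalized_objective(task_reward: float,
                        world_state: np.ndarray,
                        initial_state: np.ndarray,
                        change_penalty: float = 10.0) -> float:
    """'Leave the world the way I found it': deviation from the initial
    world state is penalized heavily, so only changes with a very clear
    and large benefit per unit of change are worth making."""
    impact = np.linalg.norm(world_state - initial_state)
    return task_reward - change_penalty * impact
```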
The obvious thing to keep in mind is that people dislike 'inauthentic' politicians, but there are many things people want in politicians. If a politician wants to portray a certain image to match what people like, they have to pick the policy positions that go along with it to avoid that label. Such images are not as evenly or finely distributed as policy positions, so that tends to lead to significant differences along these axes.
Then, over time, the parties accrete their own new images too, and people have to position themselves in relation to those too...
No. That's a foolish interpretation of domain insight. We have a massive number of highly general strategies that nonetheless work better for some things than others. A domain insight is simply some kind of understanding involving the domain being put to use. Something as simple as whether to use a linked list or an array can draw on a minor domain insight. Whether to use a Monte Carlo search or a depth-limited search, and so on, are definitely insights. Most advances in AI to this point have in fact been based on domain insights, and only a small amount on sca...