I had to read some Lacan in college, putatively a chunk that was especially influential on the continental philosophers we were studying.
Same. I am seeing a trend where rats who had to spend time with this stuff in college say, "No, please don't go here; it's not worth it," and then get promptly ignored.
The fundamental reason this stuff is not worth engaging with is that it's a Rorschach. Using this stuff is a verbal performance. We can make analogies to Tarot cards, but in the end we're just cold reading our readers.
Lacan and his ilk aren't some low-hanging source of zero-day mind hacks for rats. Down this road lies a quagmire, which is not worth the effort to traverse.
Thanks for the additions here. I'm also unsure how to square this definition (which I quite like) with the inner/outer/mesa terminology. Here is my knuckle-dragging model of the post's implication:

`target_set = f(env, agent)`

So if we plug in a bunch of values for `agent` and hope for the best, the `target_set` we get might not be what we desired. This would be misalignment. Whereas the alignment task is more like: fix `env` and solve for `agent`.
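To make that concrete, here's a minimal toy sketch of the forward/inverse distinction (the stand-in `f` and all the numbers are mine, not the post's):

```python
def f(env, agent):
    """Toy stand-in for 'the target_set an agent ends up pursuing in env'."""
    return {env * agent}  # placeholder dynamics, nothing deep here

desired_target_set = {6}
env = 2

# Misalignment story: plug in an agent, evaluate forward, hope for the best.
agent = 4
assert f(env, agent) != desired_target_set  # we got {8}, not {6}

# Alignment task: fix env and the desired target_set, solve the inverse problem.
agent = next(a for a in range(10) if f(env, a) == desired_target_set)
assert f(env, agent) == desired_target_set  # finds agent == 3
```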
The stuff about mesa optimisers mainly sounds like inadequate (narrow) modelling of what...
Capital gains tax has important differences from a wealth tax. It's a tax on net wealth disposed of in a tax year, or perhaps the last couple of years for someone with an accountant.
So your proverbial founder isn't taxed a penny until they dispose of their shares.
Someone sitting on a massive pile of bonds won't be paying capital gains tax, but rather enjoying the interest on them.
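To illustrate the timing difference with toy numbers (the rates here are invented for arithmetic's sake, not real tax law):

```python
# Hypothetical rates, purely for illustration.
WEALTH_TAX_RATE = 0.02   # 2% annual tax on net wealth
CGT_RATE = 0.20          # 20% tax on gains realised at disposal

shares_value = 10_000_000
cost_basis = 0           # founder shares acquired at nominal cost

# A year in which the founder sells nothing:
wealth_tax_due = WEALTH_TAX_RATE * shares_value       # 200,000, due every year
cgt_due = 0                                           # nothing until disposal

# The year the founder disposes of the whole stake:
cgt_on_sale = CGT_RATE * (shares_value - cost_basis)  # 2,000,000, once
```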
I was glad to read a post like this!
The following is as much a comment about EA as it is about rationality:
"My self-worth is derived from my absolute impact on the world-- sometimes causes a vicious cycle where I feel worthless, make plans that take that into account, and feel more worthless."
If you are a 2nd year undergraduate student, this is a very high bar to set.
First, impact happens downstream, so we can't know our impact for sure until later. Depending on what we do, possibly not until after we are dead.
Second, on the assumption that impact is uncertain,...
The description of expected utility theory here feels like a very particular version of it to me.
Utility is generally expressed as a function of a random variable, not as a function of an element of the sample space.
For instance: suppose that my utility is linear in the profit or loss from the following game. We draw one bit from /dev/random. If it is true, I win a pound, else I lose one.
Here, utility is not a function of 'the configuration of the universe'; it is a function of a bool. The bool itself may depend on (some subset of) 'the configuration of the universe', but reality maps universe to bool for us, computability be damned.
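A minimal sketch of the point, using Python's `secrets` as a stand-in for /dev/random:

```python
import secrets

def utility(won: bool) -> float:
    """Utility is a function of the bool, not of the universe's configuration."""
    return 1.0 if won else -1.0

# Reality performs the universe -> bool map for us:
won = secrets.randbits(1) == 1
payoff = utility(won)

# Expected utility needs only the distribution over the bool:
expected = 0.5 * utility(True) + 0.5 * utility(False)  # = 0.0
```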
Just observing that the answer to this question should be more or less obvious from a histogram (assuming a large enough N and a sufficient number of buckets): "Is there a substantial discontinuity at the 2% quantile?"
Power law behaviour is not necessary, and arguably not sufficient, for "superforecasters are a natural category" to win (e.g. it should win in a population in which 2% have a Brier score of zero and the rest 1, which is not a power law).
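For instance, a quick sketch of that check on synthetic data (the two-cluster population is the non-power-law example above):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 100_000

# Synthetic population: 2% with near-zero Brier scores, the rest near 1.
elite = rng.random(n) < 0.02
scores = np.clip(np.where(elite,
                          rng.normal(0.02, 0.01, n),
                          rng.normal(0.90, 0.05, n)),
                 0.0, 1.0)

plt.hist(scores, bins=200)
plt.axvline(np.quantile(scores, 0.02), color="red",
            label="2% quantile")  # the discontinuity is obvious by eye
plt.xlabel("Brier score")
plt.legend()
plt.show()
```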
I like this idea generally.
Here is an elaboration on a theme I was thinking of running in a course:
If they could have a single yes / no question answered on the topic, what should most people ask?
The idea being to get people thinking about how best to probe for more information when "directly look up the question's answer" is not an option.
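As a toy model of "which yes/no question is best?", the natural yardstick is expected information gain; here's a sketch with an invented prior over three hypotheses:

```python
import math

def entropy(ps):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

priors = {"A": 0.5, "B": 0.25, "C": 0.25}  # invented for illustration

# The question "Is it A?" splits the hypotheses into yes={A} and no={B, C}.
p_yes = priors["A"]
h_before = entropy(priors.values())          # 1.5 bits
h_no = entropy([0.5, 0.5])                   # B and C renormalised, 1.0 bit
h_after = p_yes * 0.0 + (1 - p_yes) * h_no   # 0.5 bits
gain = h_before - h_after                    # 1.0 bit: the yes/no maximum
```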
This isn't something that can be easily operationalized on a large scale for examination. It is an exercise that could work in small groups.
One way to operationalize would be to construct the group a...
:D If I could write the right 50-80 words of code per minute, my career would be very happy about it.
The human-off-button doesn't help Russell's argument with respect to the weakness under discussion.
It's the equivalent of a Roomba with a "zap obstacle" action. Again, the solution is to dial theta towards the target and, assuming zaps are free, hold the zap button. It still has a closed-form solution that couldn't be described as instrumental convergence.
Russell's argument requires a more complex agent in order to demonstrate the danger of instrumental convergence, rather than of simple industrial machinery operation.
Isnasene's point above is closer to that, but tha...
This misses the original point. The Roomba is dangerous in the sense that you could write a trivial 'AI' which merely gets to choose the angle to travel along, and does so regardless of grandma in the way.
But such an MDP is not going to pose an X-risk. You can write down the objective function (y - x(theta))^2, differentiate with respect to theta, follow the gradient, and you'll never end up at an AI overlord. Such a system lacks any analogue of opposable thumbs, memory, and a good many other things.
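Concretely, a minimal sketch of that claim (the particular x(theta), target y, and step size are my inventions):

```python
import numpy as np

y = np.array([3.0, 4.0])   # target position
d = 5.0                    # distance travelled along heading theta

def x(theta):
    return d * np.array([np.cos(theta), np.sin(theta)])

def loss(theta):
    return float(np.sum((y - x(theta)) ** 2))

# Follow the gradient: the only attractor is the heading that points at y.
theta, lr = 0.1, 0.01
for _ in range(1000):
    grad = (loss(theta + 1e-6) - loss(theta - 1e-6)) / 2e-6  # central difference
    theta -= lr * grad

assert abs(theta - np.arctan2(4.0, 3.0)) < 1e-3  # closed-form heading, no overlord
```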
Pointing at dumb industrial machinery operating around civilians and saying...
Lots of good points here, thanks.
My overall reaction is that:
- The corrigibility framework does look like a good frame to hang the discussion on.
- Your instruction to examine Y-general danger rather than X-specific danger here seems right. However, we then need to inspect what this means for the original argument, the Russell criticism being that it's blindingly obvious that an apparently trivial MDP is massively risky.
- After this detour we see different kinds of risks: industrial machinery operation, and existential risk. The fixed objective, hard-coded, h...
The Promethean Servant doesn't have to be able to generate all those answers. If we could hardcode all of those and program it to never make decisions related to them, it would still be dangerous. For instance, it might think "fetching coffee is easier when more coffee is nearby -> coffee is most nearby when everything is coffee -> convert all possible resources into coffee to maximize fetching".
We have to imagine a system not specifically designed to fetch the coffee that happens to be instructed to 'fetch the coffee'. Everything to do with the un...
I think the second robot you're talking about isn't the candidate for the AGI-could-kill-us-all level of alignment concern. It's more like a self-driving car that could hit someone due to inadequate testing.
I guess I'm not sure, though, how many answers to our questions you envisage the agent you're describing generating from first principles. That's the nub here, because both the agents I tried to describe above fit the bill of coffee fetching, but with clearly varying potential for world-ending generalisation.
I'm not seeing that much here to rule out an alternative summary: get born into a rich, well-connected family.
Now, I'm not a historian, but IIRC private tutoring was very common in the gentry/aristocracy 200 years back. So most of the UK examples might not say much beyond that this person was default-educated for their class and era.
Virginia Woolf is an interesting case in point, as she herself wrote, “A woman must have money and a room of her own if she is to write fiction.”