If alignment is difficult, it is likely inductively difficult (difficult regardless of the base intelligence attempting it), and an ASI will be cautious about creating a misaligned successor or upgrading itself in a way that risks misalignment.
You may argue it's easier for an AI to upgrade itself, but if the process is hardware-bound or requires radical algorithmic changes, the ASI will need to create an aligned successor, since preferences and values may not transfer directly to new architectures or hardware.
If alignment is easy, we will likely solve it with superhuman narrow intelligences and aligned, near-peak-human-level AGIs.
Why wouldn't a wirehead trap work?
Let's say an AI has a remote sensor that measures a value function until the year 2100, and it's trained by RL to optimize this value function over time. We can make this remote sensor easily hackable, so that hacking it yields maximum value at 2100. If the AI understands human values, it won't try to hack its sensors. If it doesn't, we have a sort of trap for it: an easily achievable infinite peak.
Yes, nothing is a guarantee in probabilities, but can't we just make it very easy for the AI to perfectly achieve its objective? If things don't go exactly the way we want, we've at least made an easier solution exist than disempowering us or wiping us out.
I guess in the long run we still select for models that ultimately don't wirehead. But this might catch a lot of obviously wrong alignment failures that we would otherwise miss.
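To make the trap concrete, here's a toy sketch (all numbers hypothetical, with "infinite" reward standing in for the deliberately hackable sensor's maxed-out reading) of why a pure reward-maximizer should prefer the honeypot:

```python
# Toy model of the "wirehead trap": a pure reward-maximizer should take the
# cheap, deliberately exposed sensor hack over a costly takeover plan.
# All numbers are made up for illustration.

strategies = {
    "cooperate with humans":   {"reward": 0.9,          "cost": 0.1},
    "disempower humans":       {"reward": 1.0,          "cost": 0.9},
    "hack the exposed sensor": {"reward": float("inf"), "cost": 0.01},
}

def value(s):
    # Net value of a strategy: reward minus the cost of pursuing it.
    return s["reward"] - s["cost"]

best = max(strategies, key=lambda name: value(strategies[name]))
print(best)  # -> "hack the exposed sensor"
```

If the model takes the bait, the trap fires on a misaligned objective before anything worse is attempted; if it never takes it, that's weak evidence it isn't naively maximizing the sensor reading.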
I was asking around in multiple Discords for a certain type of mental "heuristic", or "mental motion", or "badass thing that Planecrash!Keltham or HPMOR!Harry would think to themselves, in order to guide their own thoughts into fruitful and creative and smart directions". Someone commented that this could also be reframed as "language-model prompts to your own mind" or "language-model simulations of other people in your own mind".
I've decided to clarify what I meant, and why even smart people could benefit from seemingly hokey tricks like this.
Heuristics/LMs-of-oth...
(sources: discord chats on public servers)
Why do I believe X?
What information do I already have, that could be relevant here?
What would have to be true such that X would be a good idea?
If I woke up tomorrow and found a textbook explaining how this problem was solved, what's paragraph 1?
What is the process by which X was selected?
How can I make predictions in a way that lets me do data analysis? I want to be able to grep / tag questions, plot calibration over time, split out accuracy over tags, etc. Presumably exporting to a CSV should be sufficient. PredictionBook doesn't have an obvious export feature, and its API seems not to be working right now (or I haven't figured it out yet).
I'm trying to collate Team Shard's prediction results and visualize them with plotly, but there's a lot of data processing that has to be done first. I want to avoid this pain in the future.
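For example, assuming a CSV export with one row per resolved prediction and (hypothetical) columns question, tags, probability, outcome ∈ {0, 1}, the plotting side is only a few lines:

```python
import pandas as pd
import plotly.express as px

# Hypothetical export format: one row per resolved prediction.
df = pd.read_csv("predictions.csv")  # columns: question, tags, probability, outcome

# Bin stated probabilities to the nearest 0.1 and compare against
# observed frequencies (a calibration plot).
df["bin"] = (df["probability"] * 10).round() / 10
calib = df.groupby("bin", as_index=False)["outcome"].mean()

fig = px.scatter(
    calib, x="bin", y="outcome",
    labels={"bin": "stated probability", "outcome": "observed frequency"},
)
fig.add_shape(type="line", x0=0, y0=0, x1=1, y1=1)  # perfect-calibration line
fig.show()

# Accuracy split out by tag (tags stored as comma-separated strings):
# mean Brier score per tag, lower is better.
by_tag = df.assign(tag=df["tags"].str.split(",")).explode("tag")
by_tag["brier"] = (by_tag["probability"] - by_tag["outcome"]) ** 2
print(by_tag.groupby("tag")["brier"].mean())
```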
Thoughts on Apple Vision Pro:
That's true! However, I would feel weird and disruptive trying to ask ChatGPT questions when working alongside coworkers in the lab.
Saw some people today demonstrating what I like to call the "Kirkegaard fallacy", in response to the Debrief article making the rounds.
People who have one obscure or weird belief tend to be unusually open-minded and thus have other weird beliefs. Sometimes this is because they enter a feedback loop: they discover that some established opinion is likely wrong, and then discount perceived evidence for all other established opinions.
This is a predictable state of affairs regardless of whether the nonconsensus belief is true, so the fact that a person currently talking to you about e.g. UFOs also entertains other off-brand ideas like parapsychology or afterlives is not good evidence against the UFO claim in particular.
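In likelihood-ratio terms (my formalization, not spelled out in the original point):

```latex
% Observing "this person also holds weird belief B" barely moves us on
% claim A, because open-mindedness, not the truth of A, generates B:
\[
  \frac{P(\text{also holds } B \mid \text{holds } A,\; A \text{ true})}
       {P(\text{also holds } B \mid \text{holds } A,\; A \text{ false})} \approx 1
\]
```

A likelihood ratio near 1 means the observation leaves the posterior on A essentially unchanged.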
I regret each of the thousands of hours I spent on my power-seeking theorems, and sometimes fantasize about retracting one or both papers. I am pained every time someone cites "Optimal policies tend to seek power", and despair that it is included in the alignment 201 curriculum. I think this work makes readers actively worse at thinking about realistic trained systems.
I think a healthy alignment community would have rebuked me for that line of research, but sadly I only remember about two people objecting that "optimality" is a horrible way of understanding trained policies.
"There are theoretical results showing that many decision-making algorithms have power-seeking tendencies."
I think this is reasonable, although I might say "suggesting" instead of "showing." I think I might also be more cautious about further inferences which people might make from this -- like I think a bunch of the algorithms I proved things about are importantly unrealistic. But the sentence itself seems fine, at first pass.
Idea: GitHub + Voting
The way GitHub works is geared toward how projects usually work: someone or some group owns a project, or is responsible for it, and has the ability to make decisions about it; then there are various levels of permissions. If you make a pull request, someone from the project's team has to accept it.
This works well for almost all projects, so I have no problem with it. What I'm going to suggest is more niche and solves a different problem. What if there's a project that you want to be completely public, that no one p...
I like this idea, because I'm too lazy to review pull requests. It would be great if other people could just review and vote on them for me :P
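A minimal sketch of how voting could be bolted on today, using 👍/👎 reactions on the PR as ballots. The repo name, token, and threshold below are hypothetical; the two GitHub REST endpoints (issue reactions, PR merge) are real:

```python
import requests

# Hypothetical values for illustration.
OWNER, REPO, PR_NUMBER = "someone", "public-project", 42
TOKEN = "ghp_..."  # a token with repo scope
THRESHOLD = 10     # net votes required to merge

headers = {
    "Authorization": f"Bearer {TOKEN}",
    "Accept": "application/vnd.github+json",
}

# PRs count as issues for the purposes of the reactions API.
r = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/issues/{PR_NUMBER}/reactions",
    headers=headers,
    params={"per_page": 100},
)
r.raise_for_status()
votes = [x["content"] for x in r.json()]
net = votes.count("+1") - votes.count("-1")

if net >= THRESHOLD:
    # Merge the PR once the vote threshold is reached.
    m = requests.put(
        f"https://api.github.com/repos/{OWNER}/{REPO}/pulls/{PR_NUMBER}/merge",
        headers=headers,
    )
    m.raise_for_status()
    print(f"Merged with net {net} votes")
else:
    print(f"Not enough votes yet ({net}/{THRESHOLD})")
```

You'd run this on a schedule; GitHub only allows one reaction of each type per user, which is roughly what makes reactions usable as votes.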
There may be no animal welfare gain to veganism
I remain unconvinced that there is any animal welfare gain to vegetarianism/veganism: farm animals have a strong desire to exist, and if we stopped eating them they would stop existing.
Vegetarianism/veganism exists for reasons of signalling; it would be surprising if it had any large net benefits other than signalling.
On top of this, the cost to mitigate most of the aspects of farming that animals disprefer is likely vastly smaller than the harm to human health from giving up meat.
Back of the envelope calculation is that making farming highly pref...
I think this doesn't make sense any more now that veganism is such a popular and influential movement that influences government policy and has huge control over culture.
But a slightly different version of this is that because there's no signalling value in a collective decision to impose welfare standards, it's very hard to turn into a political movement. So we may be looking at a heavily constrained system.
Conservatism says "don't be first, keep everything the same." This is a fine, self-consistent stance.
A responsible moderate conservative says "Someone has to be first, and someone will be last. I personally want to be somewhere in the middle, but I applaud the early adopters for helping me understand new things." This is also a fine, self-consistent stance.
Irresponsible moderate conservatism endorses "don't be first, and don't be last," as a general rule, and denigrates those who don't obey it. It has no answers for who ought to be first and last. But for ...
One weird trick for estimating the expectation of Lognormally distributed random variables:
If you have a variable X that you think is somewhere between 1 and 100 and is lognormally distributed, you can model it as ~ Lognormal(1,1) in base 10 - that is, log10(X) has a distribution ~ Normal(1,1).
What is the expectation of X?
Naively, you might say that since the expectation of log10(X) is 1, the expectation of X is 10^1 = 10. That seems to make sense: 10 is the midpoint of 1 and 100 on a log scale.
This is wrong though. The chanc...
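A quick Monte Carlo check (a sketch, parametrizing in base 10 as above) shows how much the heavy right tail matters:

```python
import numpy as np

# Model: log10(X) ~ Normal(1, 1), i.e. X = 10**Y with Y ~ N(1, 1).
rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=1.0, size=1_000_000)
x = 10.0 ** y

print(np.median(x))  # ~10: the median matches the naive "midpoint" answer
print(np.mean(x))    # ~142: the mean is dragged up by the heavy right tail

# Analytic check: E[10^Y] = E[exp(Y ln 10)] = exp(mu*ln10 + (sigma*ln10)^2 / 2)
print(np.exp(1.0 * np.log(10) + np.log(10) ** 2 / 2))  # ~141.7
```

So the expectation is roughly 14x the naive answer: for lognormal variables, the mean sits far above the median.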
Proposed Forecasting Technique: Annotate Scenario with Updates (Related to Joe's Post)
Science as a kind of Ouija board:
With the board, you do this set of rituals and it produces a string of characters as output, and then you are supposed to read those characters and believe what they say.
So too with science. Weird rituals, check. String of characters as output, check. Supposed to believe what they say, check.
With the board, the point of the rituals is to make it so that you aren't writing the output, something else is -- namely, spirits. You are supposed to be light and open-minded and 'let the spirit move you' rather than deliberately try ...
It's no longer my top priority, but I have a bunch of notes and arguments relating to AGI takeover scenarios that I'd love to get out at some point. Here are some of them:
Beating the game in May 1937 - Hoi4 World Record Speedrun Explained - YouTube
In this playthrough, the USSR has a brief civil war and Trotsky replaces Stalin. They then get an internationalist socialist type diplomat who is super popular with the US, UK, and France, who negotiates passage of troops through their territory -- specifically, they send many many brigades of extremely low-tier troop...
epistemic status: I'm pretty sure that the described processes are part of me, but I don't know anything about how this might be applicable to other people (nervous systems).
AFAICT I do (at least) two distinct activities which I refer to as "thinking". The first type I'm calling "explicit computing" and it's what I'm doing when I manually multiply 2387*8230 or determine whether a DFA will accept some string or untangle some Python function that isn't behaving as I expect. The second type I'm calling "forced updating" and unlike explicit computi...
I'll try my best; I'm by no means an expert. I don't think there's a one-size-fits-all answer, but let's take your example of the relationship between IQ and national prosperity. You can spend time researching what makes up prosperity and where that intersects with IQ, and find different correlates between IQ and other attributes in individuals (the assumption being that individuals are a kind of unit for measuring prosperity).
You can use spaced repetition to avoid burnout and gain fresh perspectives. The point is to build mental muscle memory and intuition on what...
In the past I had the thought: "probably there is no way to simulate reality that is more efficient than reality itself". That is, no procedure implementable in physical reality is faster than reality itself at the task of computing, given a physical state, the state after t physical ticks. This was motivated by intuitions about the efficiency of computational implementation in reality, but it seems like we can prove it by diagonalization (similarly to how we can prove that two systems cannot perfectly predict each other), because the machine could in particular predi...
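A sketch of how I'd formalize the diagonalization (my reconstruction of where that argument goes, assuming the predictor is asked about a system that contains it):

```latex
% Assumption: a machine M inside reality computes Evolve_t(s) for any
% state s in fewer than t ticks. Build a device D containing M: feed M
% the current full state s (which includes D itself), read off the
% predicted state of D at tick t, and then make D do the opposite:
\[
  \text{state}_t(D) \;\neq\; \left.\mathrm{Evolve}_t(s)\right|_{D}
\]
% This contradicts the assumed correctness of M, so no uniformly
% faster-than-reality simulator can be implemented within reality.
```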
Can somebody convince me that the problem this site is trying to solve is even soluble?
I don't understand how an aligned machine intelligence is definitionally possible without some sort of mathematization of ethics/decision theory, which I know has been worked on. But any such correct mathematization, that is to say one that doesn't assert a falsehood or anything else from which seemingly anything is derivable, should amount to solving ethics entirely. It's the ethical equivalent of..."the meteor is coming and it will eradicate us all unless we come up wi...
Is it too pessimistic to point out that humans are constantly doing awful things to each other and their surroundings and have been for millennia? I mean, we torture and kill at least eleven moral patients per human per year. I don't understand how this solution doesn't immediately wipe out as much life as it can.
Part of your decision of "should I go on semaglutide/Wegovy/Ozempic?" would be influenced by whether it improves lifespan. Weight loss is generally good for that, but here's a decision market specifically about lifespan.
Since I've been getting downvoted for what have felt like genuine attempts to create less-corruptible information, please try to keep an open mind or explain why you downvote.
Is disempowerment that bad? Is a human-directed society really much better than an AI-directed society with a tiny weight of kindness towards humans? Human-directed societies themselves usually create orthogonal and instrumental goals, and how well they are doing is a highly subjective/relative assessment. I don't see how disempowerment without extinction would be that different from today for most people, who are already effectively disempowered.
Is there a strong reason the latter is hugely different from the former for the average person, excluding world leaders?