People are not being careful enough about what they mean when they say "simulator" and it's leading to some extremely unscientific claims. Use of the "superposition" terminology is particularly egregious.
I just wanted to put a record of this statement into the ether so I can refer back to it and say I told you so.
GPT-4 generated TL;DR (mostly endorsed but eh):
I think that the magnitude of the AI alignment problem has been ridiculously overblown & our ability to solve it widely underestimated.
I've been publicly called stupid before, but never as often as by the "AI is a significant existential risk" crowd.
That's OK, I'm used to it.
Devastating and utter communication failure?
Fixing the ticker-tape problem, or the disconnect between how we write and how we read
Between the tedious wash steps of the experiment I'm running, I've been tinkering with Python. The result is aiRead.
aiRead integrates the ideas about active reading I've accumulated over the last four years. Although its ChatGPT integration is its most powerful feature, this comment is about an insight I've gleaned by using its ticker-tape display feature.
Mostly, I sit down at my desk to read articles on my computer screen. I click a link, and what appears is a column of ...
As far as I can tell, people typically use the orthogonality thesis to argue that smart agents could have any motivations. But the orthogonality thesis is stronger than that, and its extra content is false - there are some goals that are too complicated for a dumb agent to have, because the agent couldn't understand those goals. I think people should instead directly defend the claim that smart agents could have arbitrary goals.
I no longer endorse this claim about what the orthogonality thesis says.
If I were going to give a talk in front of 200 people, making it 1 minute longer than it needs to be wastes ~3 hours of the audience's time in total, so I should be willing to spend up to 3 hours to fix that.
In "95%-ile isn't that good", Dan Luu writes:
Most people consider doing 30 practice runs for a talk to be absurd, a totally obsessive amount of practice, but I think Gary Bernhardt has it right when he says that, if you're giving a 30-minute talk to a 300 person audience, that's 150 person-hours watching your talk, so it's not obviously unreasonable to spend 15 hours practicing (and 30 practice runs will probably be less than 15 hours since you can cut a number of the runs short and/or repeatedly practice problem sections).
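The arithmetic behind both comments is the same: audience time is talk length times head count. A minimal sketch, where the function name `audience_hours` is just an illustrative choice and not something from either comment:

```python
def audience_hours(talk_minutes: float, audience_size: int) -> float:
    """Total person-hours an audience of this size spends on the talk."""
    return talk_minutes * audience_size / 60


# One unnecessary minute in front of 200 people: 200 person-minutes.
wasted = audience_hours(1, 200)

# A 30-minute talk to a 300-person audience: 9000 person-minutes.
total = audience_hours(30, 300)

print(f"wasted: {wasted:.2f} person-hours, total: {total:.0f} person-hours")
```

The first figure comes out a little over 3 hours and the second is 150 person-hours, matching the numbers quoted above.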
“On average, buildings that are being blasted with a firehose right now are significantly more likely to be on fire than the typical structure, but this does not mean we should ban fire departments as a clear fire hazard.” Byrne Hobart
Would it be possible to use a huge model (e.g. an LLM) to interpret smaller networks, and output human-readable explanations? Is anyone working on something along these lines?
I'm aware Kayla Lewis is working on something similar (but not quite the same thing) on a small scale. In my understanding, from reading her tweets, she's using a network to predict the outputs of another network by reading its activations.
In a previous post, I described my current alignment research agenda, formalizing abstractions of computations. One among several open questions I listed was whether unique minimal abstractions always exist. It turns out that (within the context of my current framework), the answer is yes.
I had a complete post on this written up (which I've copied below), but it turns out that the result is completely trivial if we make a fairly harmless assumption: The information we want the abstraction to contain is only a function of the output of the computation, not ...
I meant the general agenda. For abstract interpretation, I think the relevant point is that quotienting a state space is not necessarily a good way of expressing abstractions about it, for some sense of "abstraction" (the main thing I don't understand is the reasons for your choice of what to consider abstraction). Many things want a set of subspaces (like a topology, or a logic of propositions) instead of a partition, so that a point of the space doesn't admit a unique "abstracted value" (as in equivalence class it belongs to), but instead has many "abstr...
random thought: are the most useful posts typically karma approximately 10, and 40 votes to get there? what if it was possible to sort by controversial? maybe only for some users or something? what sorts of sort constraints are interesting in terms of incentivizing discussion vs agreement? blah blah etc
Why don't most AI researchers engage with Less Wrong? What valuable criticism can be learnt from it, and how can it be pragmatically changed?
My girlfriend just returned from a major machine learning conference. She judged less than 1/18 of the content was dedicated to AI safety rather than capability, despite an increasing number of the people at the conference being confident of AGI in the future (like, roughly 10-20 years, though people avoided nailing down a specific number). And the safety talk was more of a shower thought.
And yet, Less Wrong and ...
This is a huge practical issue that seems to not get enough thought, and I'm glad you're thinking about it. I agree with your summary of one way forward. I think there's another PR front; many educated people outside of the relevant fields are becoming concerned.
It sounds like the ML researchers at that conference are mostly familiar with MIRI style work. And they actually agree with Yudkowsky that it's a dead end. There's a newer tradition of safety work focused on deep networks. That's what you mostly see in the Alignment Forum. And it's what you see in ...
OpenAI says they are using GPT-4 internally: "We’ve also been using GPT-4 internally, with great impact on functions like support, sales, content moderation, and programming. We also are using it to assist humans in evaluating AI outputs, starting the second phase in our alignment strategy." https://openai.com/research/gpt-4
Does this mean what I think it means? That they are using this AI to analyse and optimise the code the AI itself runs on? Does anyone know if OpenAI have confirmed or denied this, or given information on safeguards that are in plac...
"We also are using it to assist humans in evaluating AI outputs, starting the second phase in our alignment strategy."
Probably something along the lines of RLAIF? Anthropic's Claude might be more robustly tuned because of this, though GPT-4 might already have similar things as part of its own training.
Does severe vitamin C deficiency (i.e. scurvy) lead to oxytocin depletion?
According to Wikipedia
The activity of the PAM enzyme [necessary for releasing oxytocin from the neuron] system is dependent upon vitamin C (ascorbate), which is a necessary vitamin cofactor.
I.e. if you don't have enough vitamin C, your neurons can't release oxytocin. Commonsensically, this should lead to some psychological/neurological problems, maybe with empathy/bonding/social cognition?
Quick googling "scurvy mental problems" or "vitamin C deficiency mental symptoms" doesn't r...
That's also what this meta-analysis found but I was mostly wondering about social cognition deficits (though looking back I see it's not clear in the original shortform)
If you find yourself in or near Williamsburg, Virginia on 2023/04/09, come join us for a Virginia Rationalists Meetup and the Williamsburg 2nd Sundays Art & Music Festival.
This week was a bit overwhelming in AI news, with GPT-4 releasing, a new Midjourney, Stanford's Alpaca, more AI offerings from Google, Microsoft 365 Copilot, and honestly a bunch more things. I've spent too much time already talking with the GPT-4 version of ChatGPT given how long it's actually been available...
Arbitrary incompleteness invites gameability, and arbitrary specificity invites exceptioncraft.
Yeah, I suspect we mostly agree, and I apologize for looking to find points of contention.
Could someone please ELI5 why using a CNOT gate (if the target qubit was initially zero) does not violate the no-cloning theorem?
Oh, I think I got it. The forbidden thing is to have a state "copied and not entangled". CNOT gate creates a state that is "copied and entangled", which is okay, because you can only measure it once (if you measure either the original or the copy, the state of the other one collapses). The forbidden thing is to have a copy that you could measure independently (e.g. you could measure the copy without collapsing the original).
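A small numerical check of that explanation, as a sketch rather than a proof: the helper names (`kron`, `cnot`, `close`) are illustrative, the control is assumed to be the first qubit, and amplitudes are listed in the basis order |00>, |01>, |10>, |11>. CNOT applied to (a|0> + b|1>)|0> gives a|00> + b|11>, which matches a true independent copy (a|0> + b|1>)(a|0> + b|1>) only when the input is a basis state.

```python
import math


def kron(u, v):
    """Tensor product of two state vectors given as lists of amplitudes."""
    return [x * y for x in u for y in v]


def cnot(state):
    """CNOT with the first qubit as control: swaps the |10> and |11> amplitudes."""
    s00, s01, s10, s11 = state
    return [s00, s01, s11, s10]


def close(u, v, tol=1e-12):
    """True if two state vectors agree component-wise within tol."""
    return all(abs(x - y) < tol for x, y in zip(u, v))


zero = [1.0, 0.0]
one = [0.0, 1.0]
plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]  # equal superposition of |0> and |1>

for name, psi in [("|0>", zero), ("|1>", one), ("|+>", plus)]:
    after_cnot = cnot(kron(psi, zero))  # what the gate actually produces
    true_clone = kron(psi, psi)         # what independent cloning would produce
    print(name, "CNOT output equals independent copy:", close(after_cnot, true_clone))
```

For the superposition the two vectors differ: the CNOT output has weight only on |00> and |11> (the entangled state), while an independent copy would also have weight on |01> and |10>.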
I recommend this article by the discoverers of the no-cloning theorem for a popular science magazine over the Wikipedia page for anyone trying to understand it.
Are we heading towards a new financial crisis?
Mark to market changes since 2009, combined with the recent significant interest rate hikes, seem to make bank balance sheets "unreliable".
Mark to market changes broadly mean that banks can have certain assets on their balance sheet, and the value of the asset is set via mark to model (usually meaning it's marked down as worth face value).
Banks traditionally have a ton of bonds on their balance sheet, and a lot of those are governed by mark to model and not mark to market.
Interest rates go up a lot, which leads to...
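To make the mechanism concrete, here is a rough sketch with made-up numbers (the face value, coupon, rates, and maturity are all hypothetical, and `bond_price` is an illustrative helper using plain annual discounting): a fixed-coupon bond is worth its face value when market rates equal its coupon, and less than face value once market rates rise above it.

```python
def bond_price(face: float, coupon_rate: float, market_rate: float, years: int) -> float:
    """Present value of annual coupons plus the face value, discounted at market_rate."""
    coupons = sum(
        face * coupon_rate / (1 + market_rate) ** t for t in range(1, years + 1)
    )
    principal = face / (1 + market_rate) ** years
    return coupons + principal


# Hypothetical bond: 1000 face value, 2% annual coupon, 10 years to maturity.
at_issue = bond_price(1000, 0.02, 0.02, 10)     # market rate equals the coupon
after_hikes = bond_price(1000, 0.02, 0.05, 10)  # market rate has risen to 5%

print(f"value when rates match the coupon: {at_issue:.2f}")
print(f"value after rates rise:            {after_hikes:.2f}")
```

The gap between the two printed values is the kind of difference between a model-based (face value) mark and a market-based mark that the comment is worried about.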
See the barter argument. Also yeah the Fed will probably issue a new note for 10B which removes exactly that from the economy.
Academic philosophers are better than average at evaluating object-level arguments for some claim. They don't seem to be very good at thinking about what rationalization in search implies about the arguments that come up. Compared to academic philosophers, rationalists strike me as especially appreciating filtered evidence and its significance to your world model.
If you find an argument for a claim easily, then even if that argument is strong, this (depending on some other things) implies that similarly strong arguments on the other side may turn up with n...
Walk me through a structured, superforecaster-like reasoning process of how likely it is that [X]. Define and use empirically testable definitions of [X]. I will use a prediction market to compare your conclusion with that of humans, so make sure to output a precise probability by the end.
This post by Jeffrey Ladish was a pretty motivating read: https://www.facebook.com/jeffladish/posts/pfbid02wV7ZNLLNEJyw5wokZCGv1eqan6XqCidnMTGj18mQYG1ZrnZ2zbrzH3nHLeNJPxo3l
Also posted on his shortform :) https://www.lesswrong.com/posts/fxfsc4SWKfpnDHY97/landfish-lab?commentId=jLDkgAzZSPPyQgX7i