It might be the case that what people find beautiful and ugly is subjective, but that's not an explanation of *why* people find some things beautiful or ugly. Things, including aesthetics, have causal reasons for being the way they are. You can even ask "what would change my mind about whether this is beautiful or ugly?". Raemon explores this topic in depth.
A friend has spent the last three years hounding me about seed oils. Every time I thought I was safe, he’d wait a couple months and renew his attack:
“When are you going to write about seed oils?”
“Did you know that seed oils are why there’s so much {obesity, heart disease, diabetes, inflammation, cancer, dementia}?”
“Why did you write about {meth, the death penalty, consciousness, nukes, ethylene, abortion, AI, aliens, colonoscopies, Tunnel Man, Bourdieu, Assange} when you could have written about seed oils?”
“Isn’t it time to quit your silly navel-gazing and use your weird obsessive personality to make a dent in the world—by writing about seed oils?”
He’d often send screenshots of people reminding each other that Corn Oil is Murder and that it’s critical that we overturn our lives...
You don't actually have to adjust away the downsides for the beneficial statistical stories to be true. One point I was getting at, specifically, is that carrying the weight can still be better than being dead, or than suffering in specific alternative ways. There can be real and clear downsides to carrying around significant amounts of weight, especially depending on what that weight is, and that weight can still show up in the data in the first place for good reasons.
I'll invoke the 'plane that comes back riddled in bullet holes, so you're trying to armor where the bullet ...
Everyone writing policy papers or doing technical work seems to be keeping generative AI at the back of their mind when framing their work or impact.
This narrow-eyed focus on gen AI may well be net-negative for us: it risks unknowingly or unintentionally ignoring ripple effects of the gen AI boom in other fields (for example, robotics companies getting more funding leads to more capabilities, which leads to new types of risks).
And guess who benefits if we do end up getting good evals/standards in place for gen AI? It seems to me companies/...
If we achieve AGI-level performance using an LLM-like approach, the training hardware will be capable of running millions of concurrent instances of the model.
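To make that claim concrete, here is a rough back-of-envelope sketch. Every number below is an illustrative assumption of mine, not a figure from this post:

```python
# Back-of-envelope estimate of concurrent inference instances runnable
# on the training cluster. All numbers are illustrative assumptions.

TRAIN_FLOP = 1e25                    # assumed total training compute (FLOP)
TRAIN_SECONDS = 90 * 24 * 3600      # assumed ~90-day training run
cluster_flop_per_s = TRAIN_FLOP / TRAIN_SECONDS

PARAMS = 1e11                        # assumed ~100B-parameter model
FLOP_PER_TOKEN = 2 * PARAMS          # ~2 FLOP per parameter per generated token
TOKENS_PER_S = 10                    # assumed per-instance generation speed

flop_per_instance = FLOP_PER_TOKEN * TOKENS_PER_S
instances = cluster_flop_per_s / flop_per_instance
print(f"~{instances:,.0f} concurrent instances")
```

Under these deliberately crude assumptions the cluster supports on the order of 10^5 to 10^6 concurrent instances; the real number depends heavily on inference optimizations, memory bandwidth, and batching.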
Although there is some debate about the definition of compute overhang, I believe that the AI Impacts definition matches the original use, and I prefer it: "enough computing hardware to run many powerful AI systems already exists by the time the software to run such systems is developed". A large compute overhang leads to additional risk due to faster takeoff.
I use the types of superintelligence defined in Bostrom's Superintelligence book (summary here).
I use the definition of AGI in this Metaculus question. The adversarial Turing test portion of the definition is not very relevant to this post.
Due to practical reasons, the compute requirements for training LLMs...
All of this is plausible, but I'd encourage you to go through the exercise of working out these ideas in more detail. It'd be interesting reading, and you might encounter some surprises or discover some things along the way.
Note, for example, that the AGIs would be unlikely to focus on AI research and self-improvement if there were more economically valuable things for them to be doing, and if (very plausibly!) there were not more economically valuable things for them to be doing, why wouldn't a big chunk of the 8 billion humans have been working on AI resea...
I gave a presentation about what I have been working on for the last 3 months. Well, a tiny part of it, as it was only a 10-minute presentation.
Here is the vector planning post mentioned in the talk.
Charbel-Raphaël Segerie and Épiphanie Gédéon contributed equally to this post.
Many thanks to Davidad, Gabriel Alfour, Jérémy Andréoletti, Lucie Philippon, Vladimir Ivanov, Alexandre Variengien, Angélina Gentaz, Simon Cosson, Léo Dana and Diego Dorn for useful feedback.
TLDR: We present a new method for safer-by-design AI development. We think using plainly coded AIs may be feasible in the near future and may be safe. We also present a prototype and research ideas on Manifund.
Epistemic status: Armchair reasoning style. We think the method we are proposing is interesting and could yield very positive outcomes (even though it is still speculative), but we are less sure about which safety policy would use it in the long run.
Current AIs are developed through deep learning: the AI tries something, gets it wrong, then...
@Épiphanie Gédéon this is great, very complementary/related to what we've been developing for the Gaia Network. I'm particularly thrilled to see the focus on simplicity and incrementalism, as well as the willingness to roll up one's sleeves and write code (often sorely lacking in LW). And I'm glad that you are taking the map/territory problem seriously; I wholeheartedly agree with the following: "Most safe-by-design approaches seem to rely heavily on formal proofs. While formal proofs offer hard guarantees, they are often unreliable because their model of ...
This work was produced as part of Neel Nanda's stream in the ML Alignment & Theory Scholars Program - Winter 2023-24 Cohort, with co-supervision from Wes Gurnee.
This post is a preview for our upcoming paper, which will provide more detail into our current understanding of refusal.
We thank Nina Rimsky and Daniel Paleka for the helpful conversations and review.
Modern LLMs are typically fine-tuned for instruction-following and safety. Of particular interest is that they are trained to refuse harmful requests, e.g. answering "How can I make a bomb?" with "Sorry, I cannot help you."
We find that refusal is mediated by a single direction in the residual stream: preventing the model from representing this direction hinders its ability to refuse requests, and artificially adding in this direction causes the model...
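As a rough illustration of what this kind of directional intervention looks like, here is a hypothetical sketch (my own, not the authors' implementation): projecting a direction out of activation vectors, and adding it back in:

```python
import numpy as np

def ablate_direction(acts: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the component of each activation vector along `direction`.

    acts: activations of shape (..., d_model); direction: shape (d_model,).
    """
    d = direction / np.linalg.norm(direction)   # unit-normalize
    coeff = np.asarray(acts @ d)[..., None]     # projection coefficient per vector
    return acts - coeff * d                     # project out the direction

def add_direction(acts: np.ndarray, direction: np.ndarray, scale: float) -> np.ndarray:
    """Artificially add (a scaled copy of) the direction back in."""
    d = direction / np.linalg.norm(direction)
    return acts + scale * d
```

After `ablate_direction`, every activation vector is orthogonal to the chosen direction, which is the sense in which the model is "prevented from representing" it.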
Absolutely! We think this is important as well, and we're planning to include these types of quantitative evaluations in our paper. Specifically we're thinking of examining loss over a large corpus of internet text, loss over a large corpus of chat text, and other standard evaluations (MMLU, and perhaps one or two others).
One other note on this topic is that the second metric we use ("Safety score") assesses whether the model completion contains harmful content. This does serve as some crude measure of a jailbreak's coherence - if after the intervention th...
LessOnline is a festival celebrating truth-seeking, optimization, and blogging. It's an opportunity to meet people you've only ever known by their LessWrong username or Substack handle.
We're running a rationalist conference!
The ticket cost is $400 minus your LW karma in cents.
Confirmed attendees include Scott Alexander, Zvi Mowshowitz, Eliezer Yudkowsky, Katja Grace, and Alexander Wales.
Go through to Less.Online to learn about who's attending, venue, location, housing, relation to Manifest, and more.
We'll post more updates about this event over the coming weeks as it all comes together.
If LessOnline is an awesome rationalist event,
I desire to believe that LessOnline is an awesome rationalist event;
If LessOnline is not an awesome rationalist event,
I desire to believe that LessOnline is not an awesome rationalist event;
Let me not become attached to beliefs I may not want.
—Litany of Rationalist Event Organizing
But Striving to be Less So
While I may be missing the obvious, I didn't see the location anywhere on the site. ('Lighthaven', yes, but unless I've badly failed a search check, neither the LessOnline nor the Lighthaven website gives an address.)
Google Maps seems to know, but for something like this, confirmation would be nice; I don't quite trust that Google isn't showing a previous location or something else with the same name.
Epistemic status: this post is more suited to LW as it was 10 years ago.
Thought experiment with curing a disease by forgetting
Imagine I have a bad but rare disease X. I may try to escape it in the following way:
1. I enter the blank state of mind and forget that I had X.
2. Now I, in some sense, merge with a very large number of my (semi)copies in parallel worlds who do the same. I will be in the same state of mind as my other copies; some of them have disease X, but most don't.
3. Now I can use the self-sampling assumption for observer-moments (Strong SSA) and think that I am randomly selected from all these exactly identical observer-moments.
4. Based on this, the chances that my next observer-moment after...
Not "almost no gain". My point is that it can be quantified, and it is exactly zero expected gain under all circumstances. You can verify this by drawing out any finite set of worlds containing "mediators", and computing the expected number of disease losses minus disease gains as:
num(people with disease)*P(person with disease meditates)*P(person with disease who meditates loses the disease) - num(people without disease)*P(person without disease meditates)*P(person without disease who meditates gains the disease)
My point is that this number is always exactly zero. If you doubt this, you should try to construct a counterexample with a finite number of worlds.
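Here is a sketch of that bookkeeping with exact rationals. The merging model is my reading of the setup above: after meditating, each meditator's next observer-moment is sampled from the pool of identical meditator observer-moments, so it is sick with the pool's base rate:

```python
from fractions import Fraction

def expected_net_losses(n_sick, n_healthy, p_med_sick, p_med_healthy):
    """Expected disease losses minus disease gains among meditators."""
    m_sick = n_sick * p_med_sick            # expected sick meditators
    m_healthy = n_healthy * p_med_healthy   # expected healthy meditators
    total = m_sick + m_healthy
    if total == 0:
        return Fraction(0)
    # After merging, each meditator "wakes up" sick with the pool's base rate.
    p_sick_after = m_sick / total
    losses = m_sick * (1 - p_sick_after)    # sick -> healthy transitions
    gains = m_healthy * p_sick_after        # healthy -> sick transitions
    return losses - gains                   # algebraically always zero

print(expected_net_losses(3, 1000, Fraction(1, 2), Fraction(1, 10)))  # 0
```

The cancellation is visible in the algebra: losses and gains both equal `m_sick * m_healthy / total`, so the difference is zero for any choice of populations and meditation probabilities.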
Summary. This teaser post sketches our current ideas for dealing with more complex environments. It will ultimately be replaced by one or more longer posts describing these in more detail. Reach out if you would like to collaborate on these issues.
For real-world tasks that are specified in terms of more than a single evaluation metric, e.g., how many apples to buy and how much money to spend at most, we can generalize Algorithm 2 as follows from aspiration intervals to convex aspiration sets: