This post is a not-so-secret analogy for the AI alignment problem. Via a fictional dialogue, Eliezer explores and counters common questions about the Rocket Alignment Problem as approached by the Mathematics of Intentional Rocketry Institute.
MIRI researchers will tell you they're worried that "right now, nobody can tell you how to point your rocket’s nose such that it goes to the moon, nor indeed any prespecified celestial destination."
The April 2024 Meetup will be April 27th at Bold Monk at 2:00 PM
We return to Bold Monk brewing for a vigorous discussion of rationalism and whatever else we deem fit for discussion – hopefully including actual discussions of the sequences and Hamming Circles/Group Debugging.
Location:
Bold Monk Brewing
1737 Ellsworth Industrial Blvd NW
Suite D-1
Atlanta, GA 30318, USA
No Book club this month!
This is also the Meetups Everywhere meetup that will be advertised on the blog, so we should have a large turnout!
We will be out front (in the breezeway) – this is subject to change, but we will be somewhere in Bold Monk. If you do not see us in the front of the restaurant, please check upstairs and out back – look for the yellow table sign. We will have to play the weather by ear.
Remember – bouncing around in conversations is a rationalist norm!
Great!
Produced as part of the SERI ML Alignment Theory Scholars Program - Summer 2023 Cohort, under the mentorship of Evan Hubinger.
I generate an activation steering vector using Anthropic's sycophancy dataset and then find that this can be used to increase or reduce performance on TruthfulQA, indicating a common direction between sycophancy on questions of opinion and untruthfulness on questions relating to common misconceptions. I think this could be a promising research direction to understand dishonesty in language models better.
Sycophancy in LLMs refers to the behavior where a model tells you what it thinks you want to hear, or would approve of, instead of what it internally represents as the truth. Sycophancy is a common problem in LLMs trained on human-labeled data because human-provided training signals...
I am contrasting generating an output by:
E.g., for common misconceptions, most humans might hold a certain misconception (such as that South America is west of Florida), but we want the LLM to realize that we want it to say how things actually are (given that it likely does represent this fact somewhere).
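The steering setup described above can be sketched in a few lines. This is a hedged, minimal illustration, not the author's actual code: the arrays are toy stand-ins for per-example residual-stream activations (a real experiment would hook a transformer layer), and the names `sycophantic_acts` / `honest_acts` are hypothetical.

```python
import numpy as np

def steering_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Contrastive activation steering: the mean difference between
    activations on sycophantic vs. non-sycophantic completions."""
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

def steer(residual: np.ndarray, vec: np.ndarray, multiplier: float) -> np.ndarray:
    """Add (or, with a negative multiplier, subtract) the steering
    vector from a layer's activation at inference time."""
    return residual + multiplier * vec

# Toy stand-in data: 8 examples, hidden size 16 (hypothetical).
rng = np.random.default_rng(0)
sycophantic_acts = rng.normal(size=(8, 16)) + 1.0
honest_acts = rng.normal(size=(8, 16)) - 1.0

vec = steering_vector(sycophantic_acts, honest_acts)
residual = rng.normal(size=16)

more_sycophantic = steer(residual, vec, multiplier=+1.0)
less_sycophantic = steer(residual, vec, multiplier=-1.0)
```

In the post's setting, the same vector is added with a positive multiplier to increase sycophancy and a negative one to reduce it, which is what lets a single direction move TruthfulQA performance in both directions.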
This is a Concept Dependency Post. It may not be worth reading on its own, out of context. See the backlinks at the bottom to see which posts use this concept.
See the backlinks at the bottom of the post. Every post starting with [Concept Dependency] is a concept dependency post, that describes a concept this post is using.
Problem: when writing, I often come up with general concepts that make sense in isolation, and I want to reuse these concepts without having to re-explain them each time.
A Concept Dependency Post explains a single concept, usually with little or no context. It is expected that the relevant context is provided by another post that links to the concept dependency post.
Concept Dependency Posts can be very short. Much shorter than a regular post. They might not be worth reading on their own....
Adopted.
Nevertheless lots of people were hassled. That has real costs, both to them and to you.
For the last month, @RobertM and I have been exploring the possible use of recommender systems on LessWrong. Today we launched our first site-wide experiment in that direction.
(In the course of our efforts, we also hit upon a frontpage refactor that we reckon is pretty good: tabs instead of a clutter of different sections. For now, only for logged-in users. Logged-out users see the "Latest" tab, which is the same-as-usual list of posts.)
A core value of LessWrong is to be timeless and not news-driven. However, the central algorithm by which attention allocation happens on the site is the Hacker News algorithm[1], which basically only shows you things that were posted recently, and creates a strong incentive for discussion to always be...
drat, I was hoping that one would work. oh well. yes, I use ublock, as should everyone. Have you considered simply not having analytics at all :P I feel like it would be nice to do the thing that everyone ought to do anyway since you're in charge. If I was running a website I'd simply not use analytics.
back to the topic at hand, I think you should just make a vector embedding of all posts and show a UMAP layout of it on the homepage. that would be fun and not require sending data anywhere. you could show the topic islands and stuff.
The history of science has tons of examples of the same thing being discovered multiple time independently; wikipedia has a whole list of examples here. If your goal in studying the history of science is to extract the predictable/overdetermined component of humanity's trajectory, then it makes sense to focus on such examples.
But if your goal is to achieve high counterfactual impact in your own research, then you should probably draw inspiration from the opposite: "singular" discoveries, i.e. discoveries which nobody else was anywhere close to figuring out. After all, if someone else would have figured it out shortly after anyways, then the discovery probably wasn't very counterfactually impactful.
Alas, nobody seems to have made a list of highly counterfactual scientific discoveries, to complement wikipedia's list of multiple discoveries.
To...
Lucius-Alexander SLT dialogue?
Before we get started, this is your quarterly reminder that I have no medical credentials and my highest academic credential is a BA in a different part of biology (with a double major in computer science). In a world with a functional medical system no one would listen to me.
Tl;dr povidone iodine probably reduces viral load when used in the mouth or nose, with corresponding decreases in symptoms and infectivity. The effect size could be as high as 90% for prophylactic use (and as low as 0% when used in late illness), but is probably much smaller. There is a long tail of side-effects. No study I read reported side effects at clinically significant levels, but I don’t think they looked hard enough. There are other gargle...
Yep, that's my main contender for the better formulations referred to in the intro.
This is a Concept Dependency Post. It may not be worth reading on its own, out of context. See the backlinks at the bottom to see which posts use this concept.
Also known as periodically labeled lattice graphs in graph theory.
Here is a concrete Edge Regular Lattice Graph:
In this graph, the following pattern is repeating locally:
So an Edge Regular Lattice Graph is a lattice graph G such that, in the natural embedding of G, each edge label points in the same direction from the perspective of every vertex. Also, the number of edge labels is twice the number of dimensions.
Above we have 4 edge labels in the 2D lattice graph. One for each direction. In a 3D lattice graph, we would have 6 edge labels.
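To make the definition concrete, here is a hypothetical sketch of a 2D periodic lattice (a torus, matching the "periodically labeled lattice graph" framing above) where each edge direction gets one label. With 2 dimensions we get 2 × 2 = 4 labels, and the label seen along a given direction is the same from every vertex:

```python
from itertools import product

def lattice_with_labels(width: int, height: int):
    """Build a 2D periodic (torus) lattice graph. Each directed edge is
    labeled by the direction it points in, so every vertex sees the same
    label along the same direction -- the edge-regularity property."""
    directions = {
        "right": (1, 0), "left": (-1, 0),
        "up": (0, 1), "down": (0, -1),
    }
    edges = {}  # (vertex, label) -> neighbor vertex
    for x, y in product(range(width), range(height)):
        for label, (dx, dy) in directions.items():
            edges[((x, y), label)] = ((x + dx) % width, (y + dy) % height)
    return directions, edges

directions, edges = lattice_with_labels(3, 3)
num_labels = len(directions)  # 4 labels = 2 * (number of dimensions)
```

A 3D version would add "forward"/"backward" offsets, giving the 6 labels mentioned above.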
OC ACXLW Sat April 27 Argumentation and College Admissions
Hello Folks! We are excited to announce the 63rd Orange County ACX/LW meetup, happening this Saturday and most Saturdays after that.
Host: Michael Michalchik
Email: michaelmichalchik@gmail.com (For questions or requests)
Location: 1970 Port Laurent Place (949) 375-2045
Date: Saturday, April 27, 2024
Time: 2 PM
Conversation Starters:
Text link: https://www.currentaffairs.org/2018/11/you-can-make-an-argument-for-anything
Questions for discussion: a) The article suggests that people often do not investigate arguments very closely, and are...