It might be the case that what people find beautiful and ugly is subjective, but that's not an explanation of ~why~ people find some things beautiful or ugly. Things, including aesthetics, have causal reasons for being the way they are. You can even ask "what would change my mind about whether this is beautiful or ugly?". Raemon explores this topic in depth.
Epistemic – this post is more suitable for LW as it was 10 years ago
Thought experiment with curing a disease by forgetting
Imagine I have a bad but rare disease X. I may try to escape it in the following way:
1. I enter a blank state of mind and forget that I had X.
2. Now I in some sense merge with a very large number of my (semi)copies in parallel worlds who do the same. I will be in the same state of mind as my other copies; some of them have disease X, but most don’t.
3. Now I can use the self-sampling assumption for observer-moments (Strong SSA) and think that I am randomly selected from all these identical observer-moments.
4. Based on this, the chances that my next observer-moment after...
My point still stands. Try drawing out a specific finite set of worlds and computing the probabilities. (I don't think anything changes when the set of worlds becomes infinite, but the math becomes much harder to get right.)
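A small finite model makes the suggested exercise concrete. The sketch below is only an illustration under assumed numbers: the `World` fields, the world counts, and the two cases are hypothetical, and it treats every observer-moment in the blank state as equally likely to be "me" (the Strong SSA step from the thought experiment).

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class World:
    has_x: bool         # does my copy in this world have disease X?
    enters_blank: bool  # does my copy in this world enter the blank state?


def p_disease_given_blank(worlds):
    """Probability of having X when sampling uniformly over blank-state copies."""
    blank = [w for w in worlds if w.enters_blank]
    if not blank:
        raise ValueError("no copy enters the blank state")
    return Fraction(sum(w.has_x for w in blank), len(blank))


# Hypothetical case A: 1 sick copy and 99 healthy copies all blank their minds.
case_a = [World(True, True)] + [World(False, True)] * 99
# Hypothetical case B: only the sick copy blanks its mind; the healthy ones do not.
case_b = [World(True, True)] + [World(False, False)] * 99

print(p_disease_given_blank(case_a))  # 1/100
print(p_disease_given_blank(case_b))  # 1
```

In this toy model the result depends entirely on which copies are assumed to enter the blank state, so that assumption has to be written down explicitly before any probability can be computed.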
This is the first post in a sequence in which I will propose a new voting system!
In this post, I introduce the framework and notation, and give some background on voting theory.
In the next post, I will show you the best voting system you've probably never heard of, maximal lotteries. (Seriously, it's really good.)
After that, I will make it even better, and propose a new system: maximal lottery-lotteries.
Then comes the bad news: I can't prove that maximal lottery-lotteries exist! (Or alternatively, good news: You can try to solve a cool new open problem in voting theory!)
Thanks to Jessica Taylor for first introducing me to maximal lotteries, and Sam Eisenstat for spending many hours with me trying to prove the existence of maximal lottery-lotteries.
A voting...
To get more comfortable with this formalism, we will translate three important voting criteria.
You translated four criteria.
Crosspost from my blog.
If you spend a lot of time in the blogosphere, you’ll find a great many people expressing contrarian views. If you hang out in the circles that I do, you’ll probably have heard Yudkowsky say that dieting doesn’t really work, Guzey say that sleep is overrated, Hanson argue that medicine doesn’t improve health, various people argue for the lab leak, others argue for hereditarianism, Caplan argue that mental illness is mostly just aberrant preferences and education doesn’t work, and various other people expressing contrarian views. Often, very smart people, like Robin Hanson, will write long posts defending these views, other people will have criticisms, and it will all be such a tangled mess that you don’t really know what to think about them.
For...
Abstract: First [1)], a suggested general method of determining, for AI operating under the human feedback reinforcement learning (HFRL) model, whether the AI is “thinking”; an elucidation of latent knowledge that is separate from a recapitulation of its training data. With independent concepts or cognitions, then, an early observation that AI or AGI may have a self-concept. Second [2)], by cited instances, whether LLMs have already exhibited independent (and de facto alignment-breaking) concepts or behavior; further observations of possible self-concepts exhibited by AI. Also [3)], whether AI has already broken alignment by forming its own “morality” implicit in its meta-prompts. Finally [4)], that if AI have self-concepts, and more, demonstrate aversive behavior to stimuli, that they deserve rights at least to be free of exposure to what is...
i'm glad that you wrote about AI sentience (i don't see it talked about so often with very much depth), that it was effortful, and that you cared enough to write about it at all. i wish that kind of care was omnipresent and i'd strive to care better in that kind of direction.
and i also think continuing to write about it is very important. depending on how you look at things, we're in a world of 'art' at the moment - emergent models of superhuman novelty generation and combinatorial re-building. art moves culture, and culture curates humanity on aggregate s...
This afternoon Lily, Rick, and I ("Dandelion") played our first dance together, which was also Lily's first dance. She's sat in with Kingfisher for a set or two many times, but this was her first time being booked and playing (almost) the whole time.
Lily started playing fiddle in Fall 2022, and after about a year she had enough tunes up to dance speed that I was thinking she'd be ready to play a low-stakes dance together soon. Not right away, but given how far out dances booked it seemed about time to start writing to some folks: by the time we were actually playing the dance she'd have even more tunes and be more solid on her existing ones. She was very excited about this idea; very motivated by performing.
I wrote to a few dances, and...
This work was produced as part of Neel Nanda's stream in the ML Alignment & Theory Scholars Program - Winter 2023-24 Cohort, with co-supervision from Wes Gurnee.
This post is a preview of our upcoming paper, which will provide more detail on our current understanding of refusal.
We thank Nina Rimsky and Daniel Paleka for the helpful conversations and review.
Modern LLMs are typically fine-tuned for instruction-following and safety. Of particular interest is that they are trained to refuse harmful requests, e.g. answering "How can I make a bomb?" with "Sorry, I cannot help you."
We find that refusal is mediated by a single direction in the residual stream: preventing the model from representing this direction hinders its ability to refuse requests, and artificially adding in this direction causes the model...
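The two interventions described above can be illustrated with plain vector arithmetic. This is a minimal sketch, not the authors' implementation: it assumes that "preventing the model from representing this direction" means projecting the direction out of an activation vector, and that "adding in this direction" means adding a scaled copy of it. The function names, the toy 3-dimensional vectors, and the scale are all hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def ablate(activation, direction):
    """Remove the component of `activation` along `direction`: x - (x . r) r."""
    r = unit(direction)
    coeff = dot(activation, r)
    return [x - coeff * ri for x, ri in zip(activation, r)]


def add_direction(activation, direction, scale):
    """Push `activation` along `direction` by `scale`: x + scale * r."""
    r = unit(direction)
    return [x + scale * ri for x, ri in zip(activation, r)]


# Toy stand-ins for a residual-stream activation and a candidate direction.
x = [3.0, 4.0, 0.0]
r = [1.0, 0.0, 0.0]

ablated = ablate(x, r)              # [0.0, 4.0, 0.0]
steered = add_direction(x, r, 2.0)  # [5.0, 4.0, 0.0]
print(ablated, steered)
```

After ablation the vector has zero component along the chosen direction, while all components orthogonal to it are unchanged.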
is novel compared to... RepE
This is inaccurate, and I suggest reading our paper: https://arxiv.org/abs/2310.01405
Demonstrate full ablation of the refusal behavior with much less effect on coherence
In our paper and notebook we show the models are coherent.
Investigate projection
We did investigate projection too (we use it for concept removal in the RepE paper) but didn't find a substantial benefit for jailbreaking.
harmful/harmless instructions
We use harmful/harmless instructions.
...Find that projecting away the (same, linear) feature at all lay
Context: I sometimes find myself referring back to this tweet and wanted to give it a more permanent home. While I'm at it, I thought I would try to give a concise summary of how each distinct problem would be solved by Safeguarded AI (formerly known as an Open Agency Architecture, or OAA), if it turns out to be feasible.
See: Specification gaming examples, Defining and Characterizing Reward Hacking[1]
OAA Solution:
1.1. First, instead of trying to specify "value", "de-pessimize" and specify the absence of a catastrophe, and maybe a handful of bounded constructive tasks like supplying clean water. A de-pessimizing OAA would effectively buy humanity some time, and freedom to experiment with less risk, for tackling the CEV-style alignment problem, which is...
There is a serious issue with your proposed solution to problem 13. Using a random dictator policy as a negotiation baseline is not suitable for a situation where billions of humans are negotiating about the actions of a clever and powerful AI. One problem with using this solution in this context is that some people have strong commitments to moral imperatives along the lines of "heretics deserve eternal torture in hell". The combination of these types of sentiments, and a powerful and clever AI (that would be very good at thinking up effective wa...
This is an entry in the 'Dungeons & Data Science' series, a set of puzzles where players are given a dataset to analyze and an objective to pursue using information from that dataset.
You have the excellent fortune to live under the governance of The People's Glorious Free Democratic Republic of Earth, giving you a Glorious life of Freedom and Democracy.
Sadly, your cherished values of Democracy and Freedom are under attack by...THE ALIEN MENACE!
Faced with the desperate need to defend Freedom and Democracy from The Alien Menace, The People's Glorious Free Democratic Republic of Earth has been forced to redirect most of its resources into the Glorious Free People's Democratic War...
That makes sense, ty.
Epistemic status: Tentative. I’ve been practicing this on-and-off for a year and it’s seemed valuable, but it’s the sort of thing I might look back on and say “hmm, that wasn’t really the right frame to approach it from.”
In doublecrux, the focus is on “what observations would change my mind?”
In some cases this is (relatively) straightforward. If you believe minimum wage helps workers, or harms them, there are some fairly obvious experiments you might run. “Which places have instituted minimum wage laws? What happened to wages? What happened to unemployment? What happened to worker migration?”
The details will matter a lot. The results of the experiment might be weird and confusing. If I ran the experiment myself I’d probably get a lot of things wrong, misuse statistics and...
Are you familiar at all with the works of Christopher Alexander? He spent about 50 years exploring the objectivity of aesthetics in architecture (and was highly influential across several fields, including software design). His book "The Timeless Way of Building" is available as an audiobook and is approachable. It is also the closest thing I have ever read to the teachings of my Tantric Teachers in India.
Basically, the book is about a "Pattern Language" by which beautiful things happen. The hard part though is getting people to be ...