It might be the case that what people find beautiful and ugly is subjective, but that's not an explanation of ~why~ people find some things beautiful or ugly. Things, including aesthetics, have causal reasons for being the way they are. You can even ask "what would change my mind about whether this is beautiful or ugly?". Raemon explores this topic in depth.
This is a response to the post "We Write Numbers Backward", in which lsusr argues that little-endian numerical notation is better than big-endian.[1] I believe this is wrong, and big-endian has a significant advantage not considered by lsusr.
Lsusr describes reading the number "123" in little-endian, using the following algorithm:
He compares it with two algorithms for reading a big-endian number. One...
You're mixing up big-endian and little-endian. Big-endian is the notation used in English: twelve is 12 in big-endian and 21 in little-endian. But yes, the big-endian 123.456 would be written 654.321 in little-endian, and with a decimal point you couldn't parse little-endian numbers in the way described by lsusr.
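To make the two notations concrete, here is a minimal Python sketch. It is not lsusr's algorithm, which isn't reproduced in this thread; the function names `to_little_endian` and `read_little_endian_int` are made up for illustration, and the reader assumes a plain integer whose first written digit is the ones digit.

```python
def to_little_endian(big_endian: str) -> str:
    """Reverse the written characters, e.g. '12' -> '21'."""
    return big_endian[::-1]


def read_little_endian_int(digits: str) -> int:
    """Read an integer written least-significant digit first, left to right.

    Assumption: the first character is the ones digit. This holds only
    for integers, not for numbers with a fractional part.
    """
    total = 0
    place = 1  # place value of the digit currently being read
    for ch in digits:
        total += int(ch) * place
        place *= 10
    return total


twelve = to_little_endian("12")          # written as "21"
value = read_little_endian_int("321")    # the number one hundred twenty-three
with_point = to_little_endian("123.456")  # written as "654.321"
```

In "654.321" the leading 6 stands for thousandths rather than ones, so the `place = 1` starting assumption fails: the reader can't assign a place value to the first digit until it has seen where the point falls.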
https://arxiv.org/abs/2404.15758
"We show that transformers can use meaningless filler tokens (e.g., '......') in place of a chain of thought to solve two hard algorithmic tasks they could not solve when responding without intermediate tokens. However, we find empirically that learning to use filler tokens is difficult and requires specific, dense supervision to converge."
It was a dark and stormy night.
The prospect held the front of his cloak tight to his chest. He stumbled, fell over into the mud, and picked himself back up. Shivering, he slammed his body against the front doors of the Temple and collapsed under its awning.
He picked himself up and slammed his fists against the double ironwood doors. He couldn't hear his own knocks above the gale. He banged harder, then with all his strength.
"Hello! Is anyone in there? Does anyone still tend the Fire?" he implored.
There was no answer.
The Temple's stone walls were built to last, but rotting plywood covered the apertures that once framed stained glass. The prospect slumped down again, leaning his back against the ironwood. He listened to the pitter-patter of rain...
This is a good counter-argument! Though I think the missing factor of a square root doesn't change the qualitative nature of natural i.e. steady-state motion. But that's not much of a defence, is it? Especially when Aristotle stuck his neck out by saying double the weight, double the speed. It is to his detriment that he didn't check.
There’s been a lot of discussion among safety-concerned people about whether it was bad for Anthropic to release Claude-3. I felt like I didn’t have a great picture of all the considerations here, and I felt that people were conflating many different types of arguments for why it might be bad. So I decided to try to write down an at-least-slightly-self-contained description of my overall views and reasoning here.
I’ve heard a lot of people say that this “is bad for race dynamics”. I think that this conflates a couple of different mechanisms by which releasing Claude-3 might have been bad.
So, taboo-ing “race dynamics”, a common narrative behind these words is:
...As companies release better & better models, this incentivizes other companies to pursue
The amount of testing required before release is likely subjective, and this might push him to reduce it.
I’m often asked: “what’s the probability of a really bad outcome from AI?”
There are many different versions of that question with different answers. In this post I’ll try to answer a bunch of versions of this question all in one place.
Two distinctions often lead to confusion about what I believe:
AI-induced problems/risks
This afternoon Lily, Rick, and I ("Dandelion") played our first dance together, which was also Lily's first dance. She's sat in with Kingfisher for a set or two many times, but this was her first time being booked and playing (almost) the whole time.
Lily started playing fiddle in Fall 2022, and after about a year she had enough tunes up to dance speed that I was thinking she'd be ready to play a low-stakes dance together soon. Not right away, but given how far out dances booked it seemed about time to start writing to some folks: by the time we were actually playing the dance she'd have even more tunes and be more solid on her existing ones. She was very excited about this idea; very motivated by performing.
I wrote to a few dances, and...
This is so awesome and encouraging! I play old-time fiddle and I've wanted to play dances for years, but I've been afraid I can't get enough tunes up to dance tempo. But I can play a lot of the tunes on your list! You've given me the push I need. Thanks Lily and Jeff! Sounds like a lot of fun.
The 9th AI Safety Camp (AISC9) just ended, and as usual, it was a success!
Follow this link to find project summaries, links to their outputs, recordings of the end-of-camp presentations, and contact info for all our teams in case you want to engage more.
AISC9 had both the largest number of participants (159) and the smallest number of staff (2) of all the camps we’ve done so far. Remmelt and I have proven that, if necessary, we can do this with just the two of us, and luckily our fundraising campaign raised just enough money to pay Remmelt and me to do one more AISC. After that, the future is more uncertain, but that’s almost always the case for small nonprofit projects.
AISC10 will follow...
This work was produced as part of Neel Nanda's stream in the ML Alignment & Theory Scholars Program - Winter 2023-24 Cohort, with co-supervision from Wes Gurnee.
This post is a preview of our upcoming paper, which will provide more detail on our current understanding of refusal.
We thank Nina Rimsky and Daniel Paleka for the helpful conversations and review.
Modern LLMs are typically fine-tuned for instruction-following and safety. Of particular interest is that they are trained to refuse harmful requests, e.g. answering "How can I make a bomb?" with "Sorry, I cannot help you."
We find that refusal is mediated by a single direction in the residual stream: preventing the model from representing this direction hinders its ability to refuse requests, and artificially adding in this direction causes the model...
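As a rough illustration of the two interventions named above, here is a minimal pure-Python sketch on a single activation vector. It assumes "preventing the model from representing this direction" means projecting the direction out of the activation, and "adding in this direction" means adding a scaled unit vector; the function names and the `scale` parameter are hypothetical and are not taken from the authors' code.

```python
import math


def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def ablate_direction(activation, direction):
    """Remove the component of `activation` along `direction` (assumed reading)."""
    r = unit(direction)
    coeff = dot(activation, r)
    return [a - coeff * ri for a, ri in zip(activation, r)]


def add_direction(activation, direction, scale=1.0):
    """Add `scale` times the unit direction to `activation` (assumed reading)."""
    r = unit(direction)
    return [a + scale * ri for a, ri in zip(activation, r)]


# Toy vectors, not real model activations.
ablated = ablate_direction([3.0, 4.0, 0.0], [0.0, 2.0, 0.0])
steered = add_direction([1.0, 1.0], [0.0, 5.0], scale=2.0)
```

After ablation the activation has zero component along the chosen direction, while the other components are untouched. In a real model this would be applied to residual stream activations at the relevant layers and token positions, which this toy example does not attempt.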
I really appreciate the way you have written this up. It seems that 2-7% of refusals do not respond to the unidimensional treatment. I'm curious if you've looked at this subgroup the same way as you have the global data to see if they have another dimension for refusal, or if the statistics of the subgroup shed some other light on the stubborn refusals.
Note: In @Nathan Young's words "It seems like great essays should go here and be fed through the standard LessWrong algorithm. There is possibly a copyright issue here, but we aren't making any money off it either."
What follows is a full copy of the C. S. Lewis essay "The Inner Ring", the 1944 Memorial Lecture at King’s College, University of London.
May I read you a few lines from Tolstoy’s War and Peace?
When Boris entered the room, Prince Andrey was listening to an old general, wearing his decorations, who was reporting something to Prince Andrey, with an expression of soldierly servility on his purple face. “Alright. Please wait!” he said to the general, speaking in Russian with the French accent which he used when he spoke with contempt. The...
It's not that the elite groups are good or bad, it's the desire to be in an elite group that leads to bad outcomes. Like how the root of all evil is the love of money, where money in itself isn't bad, it's the desire to possess it that is. Mainly because you start to focus on the means rather than the ends, and so end up in places you wouldn't have wanted to end up in originally.
It's about status. Being in with the cool kids etc. Elite groups aren't inherently good or bad - they're usually just those who are better at whatever is valued, or at least ...