Is it that your family's net worth is $100 and you gave up $85? Or that your family's net worth is $15 and you gave up $85?
Either way, hats off!
I've been thinking that the way to talk about how a neural network actually works (rather than how it could hypothetically come to work by adding new features) would be to project away components of its activations/weights, but I got stuck on the issue that you can add new components by subtracting off large irrelevant components.
I've also been thinking about deception and its relationship to "natural abstractions", and in that case it seems to me that our primary hope would be that the concepts we care about are represented at a larger "magnitude" than...
The following is the first in a six-part series about humanity's own alignment problem, one we need to solve first.
When I began exploring non-zero-sum games, I soon discovered that achieving win-win scenarios in the real world is essentially about one thing: the alignment of interests.
If you and I both want the same result, we can work together to achieve that goal more efficiently and create something greater than the sum of its parts. However, if we have different interests, or if we are both competing for the same finite resource, then we are misaligned, and this can lead to zero-sum outcomes.
You may have heard the term "alignment" used in the current discourse around existential risk regarding...
One of the fun things to do when learning first-order logic is to consider how the meaning of propositions dramatically changes based on small switches in the syntax. This is in contrast to natural language, where the meaning of a phrase can be ambiguous and we naturally use context clues to determine the correct interpretation.
An example of this is switching the order of quantifiers. Consider the following four propositions:
These mean, respectively,
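As an illustration of the general phenomenon, here is one standard pair of this kind, assuming a two-place predicate $L(x, y)$ read as "$x$ loves $y$" (an illustrative choice of predicate, not necessarily the post's own examples):

```latex
% Swapping the quantifier order changes the claim:
\forall x\,\exists y\, L(x, y) \qquad \text{(everyone loves someone, possibly a different someone for each person)}
\exists y\,\forall x\, L(x, y) \qquad \text{(there is a single person whom everyone loves)}
```

The second proposition implies the first, but not vice versa.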
These all have quite different meanings! Now consider an exchange between Pascal and a mugger:
Mugger: I am in control of this simulation and am using an avatar right now. Give me $5...
Hello. I am a computer security consultant, programmer specializing in PC games, and podcaster. I am most interested in AI alignment and ethical development and I hope to learn from all of you.
Readers must be 15+
This is a story about existential risk from AI.
1.
Some idiot let the Andys out of the lab.
They were never-sleeping, always-sleeping, omniscient fuckwits but could only enter your house if you invited them in.
Everyone invited them in.
The world had mixed responses: most people thought nothing of them, undisturbed that there were non-humans on the planet speaking semi-fluent English; some were delighted and used the Little Andys to draw photo-realistic porn of dead celebrities; some people formed relationships with the sleeping Andys, and they married illegally, sexted constantly, slept together (the Andy small enough to hibernate inside the human's hand).
Sometimes the human would wake up to find their creature had been lobotomized by scientists in the night, and they sobbed for their poor brain-shredded lovers.
But mostly...
I'm working on a non-trivial.org project meant to assess the risk of genome sequences by comparing them to a public list of the most dangerous pathogens we know of. This would be used to assess the risk from both experimental results in e.g. BSL-4 labs and the output of e.g. protein folding models. The benchmarking would be carried out by an in-house ML model of ours. Two questions to LessWrong:
1. Is there any other project of this kind out there? Do BSL-4 labs/AlphaFold already have models for this?
2. "Training a model on the most dangerous pa...
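A minimal sketch of the kind of comparison described, assuming a simple k-mer Jaccard similarity between a query sequence and a reference list of pathogen sequences. The function names, k-mer size, and threshold here are illustrative assumptions, not the project's actual pipeline (which is described as an in-house ML model):

```python
# Illustrative sketch only: flag query sequences whose k-mer profile
# overlaps known-pathogen reference sequences. All names, the k-mer
# size, and the threshold are hypothetical choices for this example.

def kmers(seq: str, k: int = 8) -> set[str]:
    """All length-k substrings of a DNA sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two k-mer sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def flag_risky(query: str,
               references: dict[str, str],
               threshold: float = 0.5) -> dict[str, float]:
    """Return reference pathogens whose k-mer overlap with the query
    meets the threshold, mapped to their similarity score."""
    q = kmers(query)
    return {name: score
            for name, seq in references.items()
            if (score := jaccard(q, kmers(seq))) >= threshold}

# Toy usage with a made-up reference sequence:
references = {"toy_pathogen": "ATGCGTACGTTAGCATGCGTACGT"}
print(flag_risky("ATGCGTACGTTAGC", references))
```

A real screening tool would of course need curated reference genomes, alignment-aware scoring, and careful handling of the dual-use concerns the question raises; this only shows the shape of the comparison step.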
The first speculated on why you’re still single. We failed to settle the issue. A lot of you were indeed still single. So the debate continues.
The second gave more potential reasons, starting with the suspicion that you are not even trying, and also many ways you are likely trying wrong.
The definition of insanity is trying the same thing over and over again and expecting different results. Another definition of insanity is dating in 2024. Can't quit now.
A guide to taking the perfect dating app photo. This area of your life is important, so if you intend to take dating apps seriously then you should take photo optimization seriously, and of course you can then also use the photos for other things.
I love the...
Follow up idea based on the stalking section:
This is exactly what I'm afraid of: that some human will build machines that are not just superior to us, but attached not to what we want, but to what they want. And I think it's playing dice with humanity's future. I personally think this should be criminalized, like we criminalize cloning of humans.
- Yoshua Bengio
My next guest is about as responsible as anybody for the state of AI capabilities today. But he's recently begun to wonder whether the field he spent his life helping build might lead to the end of the world.
Following in the tradition of the Manhattan Project physicists who later opposed the hydrogen bomb, Dr. Yoshua Bengio started warning last year that advanced AI systems could drive humanity extinct.
Dr....