What happens when a learned model (such as a neural network) is itself an optimizer? The possibility of mesa-optimization raises two important questions for the safety and transparency of advanced ML systems. First, under what circumstances will models be optimizers, including when they should not be? Second, when a model is an optimizer, what will its objective be?
I've long been confused about the Smoking Lesion problem for a completely different and probably unimportant reason. I don't understand the combination of "should you prefer" and "preference is caused by lesion". What does "should" mean in this case? Preferences over counterfactual preferences are just weird.
It seems like logically updateless reasoning is what we would want in order to solve many decision-theory problems. I show that several of the problems which seem to require updateless reasoning can instead be solved by selecting a policy with a logical inductor that's run a small amount of time. The policy specifies how to make use of knowledge from a logical inductor which is run longer. This addresses the difficulties which seem to block logically updateless decision theory in a fairly direct manner. On the other hand, it doesn't seem to hold much promise for the kind of insights which we would want from a real solution.
Rather than running a logical inductor all the way to ... and making a decision via the expected ...
...It seems better in principle to find a way to respect human intuitions about which things to be updateless about. Getting something wrong in the too-updateless direction can give up control of the AI to entities which we don't think of as existing; getting something wrong in the too-updateful direction can miss out on multiverse-wide coordination via superrationality.
Say you want to plot some data. You could just plot it by itself:
Or you could put lines on the left and bottom:
Or you could put lines everywhere:
Or you could be weird:
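(For concreteness, here's a minimal matplotlib sketch of the four variants above, with made-up data; toggling the spines this way is just one way to produce them, not the plots from this post.)

```python
# Sketch of the four axis-line variants: none, left+bottom, all four, and "weird".
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)
y = np.sin(x) + 0.1 * x  # made-up data

fig, axes = plt.subplots(1, 4, figsize=(12, 3))
spine_sets = [
    [],                                  # no lines at all
    ["left", "bottom"],                  # lines on the left and bottom
    ["left", "bottom", "top", "right"],  # lines everywhere
    ["top", "right"],                    # the "weird" option
]
for ax, keep in zip(axes, spine_sets):
    ax.plot(x, y)
    for side in ["left", "bottom", "top", "right"]:
        ax.spines[side].set_visible(side in keep)
plt.tight_layout()
plt.show()
```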
Which is right? Many people treat this as an aesthetic choice. But I’d like to suggest an unambiguous rule.
First, try to accept that all axis lines are optional. I promise that readers will recognize a plot even without lines around it.
So consider these plots:
Which is better? I claim this depends on what you’re plotting. To answer, mentally picture these arrows:
Now, ask yourself, are the lengths of these arrows meaningful? When you draw that horizontal line, you invite people to compare those lengths.
You use the same principle for deciding if you should draw a y-axis line. As...
Just so; the correct way is indeed to show the full (zero-based y-axis) chart, then a “zoomed-in” version, with the y-axis mapping clearly indicated. Of course, this takes more effort than just including the one chart; but this is not surprising—doing things correctly often takes more effort than doing things incorrectly!
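(A rough sketch of that two-chart approach in matplotlib, with invented data; labeling the zoomed chart's y-range is just one way to indicate the mapping.)

```python
# Full zero-based chart next to a zoomed-in view of the same series.
import matplotlib.pyplot as plt
import numpy as np

values = 100 + np.cumsum(np.random.default_rng(0).standard_normal(50))  # made-up series

fig, (full, zoomed) = plt.subplots(1, 2, figsize=(10, 4))

full.plot(values)
full.set_ylim(bottom=0)  # zero-based y-axis shows the absolute scale
full.set_title("Full (y starts at 0)")

zoomed.plot(values)      # same data, default (tight) y-limits
zoomed.set_title("Zoomed in")
zoomed.set_ylabel(f"y from {values.min():.0f} to {values.max():.0f}")  # indicate the mapping

plt.tight_layout()
plt.show()
```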
A while ago I wrote about how I managed to add 13 points to my IQ (as measured by the mean of 4 different tests).
I had 3 “self-experimenters” follow my instructions in San Francisco. One of them dropped out, since, surprise surprise, the intervention is hard.
The other two had an increase of 11 and 10 points in IQ respectively (using the “fluid” components of each test) and an increase of 9 and 7 respectively if we include verbal IQ.
A total of 7 people acted as controls and were given advantages on the test compared to the intervention group, to exacerbate the effects of memory and motivation; only 1 scored on par with the intervention group. We get a very good p-value, considering the small n, both when...
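(The excerpt cuts off before saying which test was used. As an illustration of how such a small n can still yield a low p-value, here is an exact permutation test on score changes for 2 intervention subjects vs. 7 controls; the intervention numbers echo the 11- and 10-point increases reported above, while the control numbers are invented, not the study's data.)

```python
# Exact one-sided permutation test on the difference in mean score change.
from itertools import combinations

intervention = [11, 10]              # hypothetical score changes (2 subjects)
control = [3, -1, 2, 0, 4, 1, -2]    # hypothetical score changes (7 controls)
all_scores = intervention + control
observed = sum(intervention) / len(intervention) - sum(control) / len(control)

count = total = 0
for idx in combinations(range(len(all_scores)), len(intervention)):
    group = [all_scores[i] for i in idx]
    rest = [all_scores[i] for i in range(len(all_scores)) if i not in idx]
    diff = sum(group) / len(group) - sum(rest) / len(rest)
    total += 1
    if diff >= observed:
        count += 1

print(f"one-sided permutation p-value: {count}/{total} = {count/total:.3f}")
```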
What evidence do you have about how much time it takes per day to maintain the effect after the end of the 2 weeks?
If it’s worth saying, but not worth its own post, here's a place to put it.
If you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are invited. This is also the place to discuss feature requests and other ideas you have for the site, if you don't want to write a full top-level post.
If you're new to the community, you can start reading the Highlights from the Sequences, a collection of posts about the core ideas of LessWrong.
If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the Concepts section.
The Open Thread tag is here. The Open Thread sequence is here.
You might find this link helpful for your questions.
This is a link to the glossary from the above site.
This is from the FRB of St. Louis.
Last, I would suggest you can also just ask any of the available LLMs out there now to explain the term you are interested in and get a pretty good initial explanation.
As for books, I have three. How good they are is subjective, as one textbook is from years ago, but they should cover most of the investment-markets side of things:
Options as a Strategic Investment (Lawrence McMillan)
Technical Analysis (Kirkpatrick &...
I did an exploration into how Community Notes (formerly Birdwatch) from X (formerly Twitter) works, and how its algorithm decides which notes get displayed to the wider community. In this post, I’ll share and explain what I found, as well as offer some comments.
Community Notes is a fact-checking tool available to US-based users of X/Twitter which allows readers to attach notes to posts to give them clarifying context. It uses an open-source bridging-based ranking algorithm intended to promote notes which receive cross-partisan support, and demote notes with a strong partisan lean. The tool seems to be pretty popular overall, and most of the criticism aimed toward it seems to be about how Community Notes fails to be a sufficient replacement for other, more top-down moderation systems.[1]
This seems interesting to me as an...
I'm surprised that it's one-dimensional, as that should be relatively easy to game. If the attacker cares about promoting Israeli interests or Chinese interests, they can just cast a lot of votes in the other (right/left) direction on topics they don't care about.
Did they write anywhere why they only consider one dimension?
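(For reference on the one-dimensionality question: the publicly documented core of the ranking model fits each rating as mu + user_intercept + note_intercept + user_factor * note_factor, with a single latent factor per user and per note, and ranks notes by their intercept, so helpfulness that is explained by the shared partisan factor does not raise a note's score. Below is a toy numpy sketch of that idea on invented ratings; it is not the production algorithm, which adds regularization details, thresholds, and more.)

```python
# Toy bridging-based ranking: fit intercepts plus a 1-D factor to a small
# user-by-note matrix of "helpful" (1) / "not helpful" (0) ratings via SGD.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_notes = 6, 3
ratings = np.array([          # np.nan = user did not rate that note
    [1, 1, np.nan],
    [1, 0, 1],
    [1, np.nan, 0],
    [1, 1, 0],
    [1, 0, 1],
    [np.nan, 1, 0],
], dtype=float)

mu = 0.0
user_i, note_i = np.zeros(n_users), np.zeros(n_notes)
user_f = rng.normal(0, 0.1, n_users)
note_f = rng.normal(0, 0.1, n_notes)

lr, reg = 0.05, 0.03
obs = [(u, n, ratings[u, n]) for u in range(n_users) for n in range(n_notes)
       if not np.isnan(ratings[u, n])]

for _ in range(2000):  # plain SGD on squared error with L2 regularization
    for u, n, r in obs:
        pred = mu + user_i[u] + note_i[n] + user_f[u] * note_f[n]
        err = r - pred
        mu += lr * err
        user_i[u] += lr * (err - reg * user_i[u])
        note_i[n] += lr * (err - reg * note_i[n])
        uf, nf = user_f[u], note_f[n]
        user_f[u] += lr * (err * nf - reg * uf)
        note_f[n] += lr * (err * uf - reg * nf)

# Notes with a high intercept are "helpful" beyond what the partisan factor explains.
for n in range(n_notes):
    print(f"note {n}: intercept={note_i[n]:+.2f}, factor={note_f[n]:+.2f}")
```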
I don't think this is the case, but I'm mentioning this possibility because I'm surprised I've never seen someone suggest it before:
Maybe the reason Sam Altman is making decisions that increase p(doom) is that he's a pure negative utilitarian (and he doesn't know-about/believe-in acausal trade).
Seems relevant - RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval:
'Our theoretical analysis reveals that CoT improves RNNs but is insufficient to close the gap with Transformers. A key bottleneck lies in the inability of RNNs to perfectly retrieve information from the context, even with CoT: for several tasks that explicitly or implicitly require this capability, such as associative recall and determining if a graph is a tree, we prove that RNNs are not expressive enough to solve the tasks while Transformers can solve them with e...
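(To make the "associative recall" task concrete, here is my own toy construction of a prompt in that format, not the paper's benchmark code: the model sees key-value pairs followed by a query key and must output the value paired with that key earlier in the context.)

```python
# Generate a toy associative-recall prompt: "<k1> <v1> ... <kN> <vN> <query> ?"
import random

def make_associative_recall_example(n_pairs=8, seed=0):
    rng = random.Random(seed)
    keys = rng.sample("abcdefghijklmnopqrstuvwxyz", n_pairs)
    values = [str(rng.randrange(10)) for _ in range(n_pairs)]
    query = rng.choice(keys)
    context = " ".join(f"{k} {v}" for k, v in zip(keys, values))
    return f"{context} {query} ?", values[keys.index(query)]

prompt, answer = make_associative_recall_example()
print(prompt)  # interleaved key-value pairs, ending with the query key and "?"
print(answer)  # the value originally paired with the query key
```

Answering correctly requires retrieving an arbitrary earlier token from the context, which is exactly the in-context retrieval capability the paper argues fixed-state RNNs lack.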
(Crossposted by habryka after asking Eliezer whether I could post it under his account)
"Ignore all these elaborate, abstract, theoretical predictions," the Spokesperson for Ponzi Pyramid Incorporated said in a firm, reassuring tone. "Empirically, everyone who's invested in Bernie Bankman has received back 144% of what they invested two years later."
"That's not how 'empiricism' works," said the Epistemologist. "You're still making the assumption that --"
"You could only believe that something different would happen in the future, if you believed in elaborate theoretical analyses of Bernie Bankman's unobservable internal motives and internal finances," said the spokesperson for Ponzi Pyramid Incorporated. "If you are a virtuous skeptic who doesn't trust in overcomplicated arguments, you'll believe that future investments will also pay back 144%, just like in the past. That's the...
can't we just look at weights?
As I understand it, interpretability research isn't exactly stuck, but it's very, very far from something like this, even for non-SotA models. And the gap is growing.