LessWrong

Fluent dreaming for language models (AI interpretability method)

3mo

This is a linkpost for https://arxiv.org/pdf/2402.01702.pdf

This is a cross-post for our paper on fluent dreaming for language models. (arXiv link .) Dreaming, aka "feature visualization," is a interpretability approach popularized by DeepDream that involves optimizing the input of a neural network to maximize an internal feature like a neuron's activation. We adapt dreaming to language models.

Past dreaming work almost exclusively works with vision models because the inputs are continuous and easily optimized. Language model inputs are discrete and hard to optimize. To solve this issue, we adapted techniques from the adversarial attacks literature (GCG, Zou et al 2023). Our algorithm, Evolutionary Prompt Optimization (EPO), optimizes over a Pareto frontier of activation and fluency:

In the paper, we compare dreaming with max-activating dataset examples, demonstrating that dreaming achieves higher activations and similar perplexities to the training...

(See More – 129 more words)

Nathan Helm-Burger9m20

Very cool. I bet janus would dig this.

Rejecting Television

Declan Molony

12h

I didn’t use to be, but now I’m part of the 2% of U.S. households without a television. With its near ubiquity, why reject this technology?

The Beginning of my Disillusionment

Neil Postman’s book Amusing Ourselves to Death radically changed my perspective on television and its place in our culture. Here’s one illuminating passage:

We are no longer fascinated or perplexed by [TV’s] machinery. We do not tell stories of its wonders. We do not confine our TV sets to special rooms. We do not doubt the reality of what we see on TV [and] are largely unaware of the special angle of vision it affords. Even the question of how television affects us has receded into the background. The question itself may strike some of us as strange, as if one were

...

(Continue Reading – 1540 more words)

3Celarix1h

Anecdote, but this form of rapid cutting is most assuredly alive and well. I saw a promotional ad for an upcoming MLB baseball game on TBS. In a mere 25 seconds, I counted over 35 different cuts, cuts between players, cuts between people in the studio, cut after cut after cut. It was strangely exhausting.

Declan Molony14m10

When I watched "Spider-Man: Across the Spider-Verse" in theaters last year, the animations were amazing but I left two hours later with a headache. Maybe it's a sign that I'm getting older, but it was just too much for my brain.

The Efficient LessWrong Hypothesis - Stock Investing Competition

MrThink

There has been debate to whether the Efficient Market Hypothesis is or ever was valid.

The evidence brought forward of its death is usually stories of people in the LessWrong community correctly predicting market changes, using only widely available information and rational reasoning.

It might be correct that the market is highly efficient, but that LessWrong members and the community as a whole, in many cases, is even more efficient.

It is not unreasonable, that rationalists with a certain area of expertise and great knowledge of biases and reasoning, might occasionally find investment opportunities they can reasonably expect to beat the market (adjusted for risk of course). Even if for any given member this happens only once or twice in a lifetime, those opportunities could be highly valuable for the community....

(See More – 466 more words)

jimv27m10

Checking about 2 years after my initial post, it looks like $TSLA has fallen by more than 50%: it looks like the split-adjusted price in early April 2022 was around $330 or $340, and today it's around $145.

Eyeballing the chart, it looks like it's always been lower than that in the subsequent period, and was down to around $185 at the 12 month mark that was initially the target of the competition. That last bit is the bit that was least clear to me at the time: it seemed high probability that Tesla stock would have to fall at some point, but I expressed uncertainty about when because I thought there was a fair probability the market could stay irrational for a longer period.

Take the wheel, Shoggoth! (Lesswrong is trying out changes to the frontpage algorithm)

Ruby, RobertM

13h

For the last month, @RobertM and I have been exploring the possible use of recommender systems on LessWrong. Today we launched our first site-wide experiment in that direction.

(In the course of our efforts, we also hit upon a frontpage refactor that we reckon is pretty good: tabs instead of a clutter of different sections. For now, only for logged-in users. Logged-out users see the "Latest" tab, which is the same-as-usual list of posts.)

Why algorithmic recommendations?

A core value of LessWrong is to be timeless and not news-driven. However, the central algorithm by which attention allocation happens on the site is the Hacker News algorithm^[1], which basically only shows you things that were posted recently, and creates a strong incentive for discussion to always be...

(See More – 965 more words)

9niplav9h

(Emphasis mine.) Here's an idea[1] for a straightforward(?) recommendation algorithm: Quantilize over all past LessWrong posts by using inflation-adjusted karma as a metric of quality. The advantage is that this is dogfooding on some pretty robust theory. I think this isn't super compute-intensive, since the only thing one has to do is to compute the cumulative distribution function once a day (associating it with the post), and then inverse transform sampling from the CDF. Recommending this way has the disadvantage of not being recency-favoring (which I personally like), and not personalized (which I also like). By default, it also excludes posts below a certain karma threshold. That could be solved by exponentially tilting the distribution instead of cutting it off (θ>0, otherwise to be determined (experimentally?)). Such a recommendation algorithm wouldn't be as robust against very strong optimizers, but since we have some idea what high-karma LessWrong posts look like (& we're not dealing with a superintelligent adversary… yet), that shouldn't be a problem. ---------------------------------------- 1. If I was more virtuous, I'd write a pull request instead of a comment. ↩︎

dr_s1h20

Personalization is easy to achieve while keeping the algorithm transparent. Just rank your own viewed/commented posts by most frequent tags, then score past posts based on the tags and pick a quantile based on the mixed upvotes/tags score, possibly with a slider parameter that allows you to adjust which of the two things you want to matter most.

6dr_s10h

I am sceptical of recommender systems - I think they are kind of bound to end up in self reinforcing loops. I'd be more happy seeing a more transparent system - we have tags, upvotes, the works, so you could have something like a series of "suggested searches", e.g. the most common combinations of tags you've visited, that a user has a fast access to while also seeing what precisely is it that they're clicking on. That said, I do trust this website of all things to acknowledge if things aren't going to plan and revert. If we fail to align this one small AI to our values, well, that's a valuable lesson.

Priors and Prejudice

MathiasKB

I

Imagine an alternate version of the Effective Altruism movement, whose early influences came from socialist intellectual communities such as the Fabian Society, as opposed to the rationalist diaspora. Let’s name this hypothetical movement the Effective Samaritans.

Like the EA movement of today, they believe in doing as much good as possible, whatever this means. They began by evaluating existing charities, reading every RCT to find the very best ways of helping.

But many effective samaritans were starting to wonder. Is this randomista approach really the most prudent? After all, Scandinavia didn’t become wealthy and equitable through marginal charity. Societal transformation comes from uprooting oppressive power structures.

The Scandinavian societal model which lifted the working class, brought weekends, universal suffrage, maternity leave, education, and universal healthcare can be traced back all the...

(Continue Reading – 1941 more words)

romeostevensit1h61

Our sensible Chesterton fences

His biased priors

Their inflexible ideological commitments

In addition to epistemic priors, there are also ontological priors and teleological priors to cross compare, each with their own problems. On top of which, people are even worse at comparing non epistemic priors than they are at comparing epistemic priors. As such, attempts to point out that these are an issue will be seen as a battle tactic: move the argument from a domain in which they have the upper hand (from their perspective) to unfamiliar territory in which you'll... (read more)

1cubefox7h

I mean that those factors don't presuppose different priors. You could still end up with different "posteriors" even with the same "starting point". An example for an (informal) alternative to Bayesian updating, that doesn't require subjective priors, is Inference to the Best Explanation. One could, of course, model the criteria that determine the goodness of explanations as a sort of "prior". But those criteria would be part of the hypothetical IBE algorithm, not a free variable like in Bayesian updating. One could also claim that there are no objective facts about the goodness of explanations and that IBE is invalid. But that's an open question.

2dr_s7h

I'd definitely call any assumption about which forms preferred explanations should take as a "prior". Maybe I have a more flexible concept of what counts as Bayesian than you, in that sense? Priors don't need to be free parameters, the process has to start somewhere. But if you already have some data and then acquire some more data, obviously the previous data will still affect your conclusions.

1cubefox7h

The problem with calling parts of a learning algorithm a prior that are not free variables, is that then anything (every part of any learning algorithm) would count as a prior. So even the Bayesian conditionalization rule itself. But that's not what Bayesians consider part of a prior.

To get the best posts emailed to you, create an account! (2-3 posts per week, selected by the LessWrong moderation team.)

Funny Anecdote of Eliezer From His Sister

104

Daniel Birnbaum

18h

This comes from a podcast called 18Forty, of which the main demographic of Orthodox Jews. Eliezer's sister (Hannah) came on and talked about her Sheva Brachos, which is essentially the marriage ceremony in Orthodox Judaism. People here have likely not seen it, and I thought it was quite funny, so here it is:

https://18forty.org/podcast/channah-cohen-the-crisis-of-experience/

David Bashevkin:

So I want to shift now and I want to talk about something that full disclosure, we recorded this once before and you had major hesitation for obvious reasons. It’s very sensitive what we’re going to talk about right now, but really for something much broader, not just because it’s a sensitive personal subject, but I think your hesitation has to do with what does this have to do with the subject at hand?...

(See More – 440 more words)

ChrisHibbert1h20

Proofreading comment:

Please change "folks" to "focus"

The Solution to Sleeping Beauty

Ape in the coat

2mo

This is the eighth post in my series on Anthropics. The previous one is Lessons from Failed Attempts to Model Sleeping Beauty Problem. The next one is Beauty and the Bets.

Introduction

Suppose we take the insights from the previous post, and directly try to construct a model for the Sleeping Beauty problem based on them.

We expect a halfer model, so

$P (H e a d s & M o n d a y) = P (H e a d s) = 1 / 2$

On the other hand, in order not repeat Lewis' Model's mistakes:

$P (H e a d s | M o n d a y) = 1 / 2$

But both of these statements can only be true if

$P (M o n d a y) = 1$

And, therefore, apparently, $P (T u e s d a y)$ has to be zero, which sounds obviously wrong. Surely the Beauty can be awaken on Tuesday!

At this point, I think, you wouldn't be surprised, if I tell you that there are philosophers who are eager to bite this bullet and claim that the Beauty should, indeed, reason as...

(Continue Reading – 3721 more words)

2Ape in the coat9h

Well, I think this one is actually correct. But, as I said in the previous comment, the statement "Today is Monday" doesn't actually have a coherent truth value throughout the probability experiment. It's not either True or False. It's either True or True and False at the same time! We can answer every coherently formulated question. Everything that is formally defined has an answer Being careful with the basics allows to understand which question is coherent and which is not. This is the same principle as with every probability theory problem. Consider Sleeping-Beauty experiment without memory loss. There, the event Monday xor Tuesday also can't be said to always happen. And likewise "Today is Monday" also doesn't have a stable truth value throughout the whole experiment. Once again, we can't express Beauty's uncertainty between the two days using probability theory. We are just not paying attention to it because by the conditions of the experiment, the Beauty is never in such state of uncertainty. If she remembers a previous awakening then it's Tuesday, if she doesn't - then it's Monday. All the pieces of the issue are already present. The addition of memory loss just makes it's obvious that there is the problem with our intuition.

Markvy2h10

Re: no coherent “stable” truth value: indeed. But still… if she wonders out loud “what day is it?” at the very moment she says that, it has an answer. An experimenter who overhears her knows the answer. It seems to me that you “resolve” this tension is that the two of them are technically asking a different question, even though they are using the same words. But still… how surprised should she be if she were to learn that today is Monday? It seems that taking your stance to its conclusion, the answer would be “zero surprise: she knew for sure she wou... (read more)

Subjective Questions Require Subjective information

Ben

This summarizes a (possibly trivial) observation that I found interesting.

Story

An all-powerful god decides to play a game. They stop time, grab a random human, and ask them "What will you see next?". The human answers, then time is switched back on and the god looks at how well they performed. Most of the time the humans get it right, but occasionally they are caught by surprise and get it wrong.

To be more generous the god decides to give them access (for the game) to the entirety of all objective facts. The position and momentum of every elementary particle, every thought and memory anyone has ever had (before the time freeze) etc. However, suddenly performance in the game drops from 99% to 0%. How can this be? They...

(Continue Reading – 1058 more words)

tailcalled3h20

An idea I've been playing with recently:

Suppose you have some "objective world" space $Ω$ . Then in order to talk about subjective questions, you need a reference frame, which we could think of as the members of a fiber of some function $ω : I \to Ω$ , for some "interpretation space" $I$ .

The interpretations themselves might abstract to some "latent space" $Λ$ according to a function $λ : I \to Λ$ . Functions of $Λ$ would then be "subjective" (depending on the interpretation they arise from), yet still potentially meaningfully constrained, based on $(λ, ω)$ . In particular if some struct... (read more)

LESSWRONG
LW

Quick Takes

Popular Comments

Recent Discussion

The Beginning of my Disillusionment

Why algorithmic recommendations?

I

Introduction

LessOnline

A Festival of Writers Who are Wrong on the Internet

May 31 - Jun 2, Berkeley, CA