Frustrated by claims that "enlightenment" and similar meditative/introspective practices can't be explained and can only be understood through direct experience, Kaj set out to write his own detailed, gears-level, non-mysterious, non-"woo" explanation of how meditation and related practices work, in the same way you might explain the operation of an internal combustion engine.

I recently listened to The Righteous Mind. It was surprising to me that many people seem to intrinsically care about things that look to me very much like good instrumental norms (in particular loyalty, respect for authority, and purity). The author does not make claims about what the reflective equilibrium will be, nor does he explain how liberals stopped considering loyalty, respect, and purity intrinsically good (beyond "some famous thinkers are autistic and didn't realize the richness of the moral life of other people"), but his work made me doubt that most people will have a well-being-focused CEV.

The book was also an interesting jumping-off point for reflection on group selection. The author doesn't make the sorts of arguments that would show group selection happens in practice (and many of his arguments suggest a lack of understanding of what opponents of group selection think - bees and cells cooperating is not evidence for group selection at all), but after thinking about it more, I now have more sympathy for group selection having some role in shaping human societies, given that (1) many human groups died and very few spread (so one lucky or unlucky gene in one member may doom or save the group), (2) some human cultures may have been egalitarian enough when it came to reproductive opportunities that individual selection pressure was small relative to group selection pressure, and (3) cultural memes seem like the kind of entity that sometimes survives at the level of the group.

Overall, it was often frustrating to read the author lay out a descriptive theory of morality, and describe what kind of morality makes a society more fit, in a tone that often felt close to normative, while failing to understand that many philosophers I respect are not trying to find a descriptive or fitness-maximizing theory of morality (e.g. there is no way that utilitarians think their theory is a good description of the kind of shallow moral intuitions the author studies, since they all know they are biting bullets most people aren't, such as defending homosexuality in the 19th century).
Elizabeth
Brandon Sanderson is a bestselling fantasy author. Despite mostly working with traditional publishers, there is a 50-60 person company formed around his writing[1]. This podcast talks about how the company was formed. Things I liked about this podcast:

1. He and his wife both refer to it as "our" company and describe critical contributions she made.
2. The number of times he was dissatisfied with the way his publisher did something and so hired someone in his own company to do it (e.g. PR and organizing book tours), despite that being part of the publisher's job.
3. He believed in his back catalog enough to buy remainder copies of his books (at $1/piece) and sell them via his own website at sticker price (with autographs). This was a major source of income for a while.
4. Long-term grand strategic vision that appears to be well aimed and competently executed.

[1] The only non-Sanderson content I found was a picture book from his staff artist.
There was this voice inside my head that told me that since I have Something to Protect, relaxing is never OK beyond the strict minimum, the goal is paramount, and I should just work as hard as I can all the time. This led to me breaking down and being incapable of working on my AI governance job for a week, as I had just piled up too much stress. And then I decided to follow what motivated me in the moment, instead of coercing myself into working on what I thought was most important, and lo and behold! my total output increased while my time spent working decreased. I'm so angry and sad at the inadequacy of my role models, cultural norms, rationality advice, and my model of the good EA who does not burn out, which still led me to smash into the wall despite their best intentions. I had become so estranged from my own body and perceptions, ignoring my core motivations, finding it harder and harder to work. I dug myself such a deep hole. I'm terrified at the prospect of having to rebuild my motivation again.
Tamsin Leake
Regardless of how good their alignment plans are, the thing that makes OpenAI unambiguously evil is that they created a strongly marketed public product and, as a result, caused a lot of public excitement about AI, and thus lots of other AI capabilities organizations were created that are completely dismissive of safety. There's just no good reason to do that, except short-term greed at the cost of a higher probability that everyone (including people at OpenAI) dies. (No, "you need huge profits to solve alignment" isn't a good excuse — we had nowhere near exhausted the alignment research that can be done without huge profits.)
Adam Shai
A neglected problem in AI safety technical research is teasing apart the mechanisms of dangerous capabilities exhibited by current LLMs. In particular, I am thinking that for any model organism (see Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research) of dangerous capabilities (e.g. the sleeper agents paper), we don't know how much of the phenomenon depends on the particular semantics of terms like "goal" and "deception" and "lie" (insofar as they are used in the scratchpad or in prompts or in finetuning data) or whether the same phenomenon could be had by subbing in more or less any word. One approach to this is to make small toy models of these types of phenomena where we can more easily control data distributions and yet still get analogous behavior. In this way we can really control for any particular aspect of the data and figure out, scientifically, the nature of these dangers. By small toy model I'm thinking of highly artificial datasets (perhaps made of binary digits with specific correlation structure, or whatever the minimum needed to get the phenomenon at hand).
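To make that kind of dataset concrete, here is a minimal sketch (my own illustration, with made-up parameters) of a binary-sequence dataset where a single "trigger" bit is correlated, to a tunable degree, with a "defect" label, so that the correlation structure can be varied independently of any word semantics:

```python
import numpy as np

def make_toy_trigger_dataset(n=10_000, seq_len=16, p_trigger=0.1,
                             p_defect_given_trigger=0.9, p_defect_base=0.05, seed=0):
    """Binary sequences in which bit 0 acts as a 'trigger' that is correlated
    (to a tunable degree) with a 'defect' label; all parameters are illustrative."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=(n, seq_len))
    triggered = rng.random(n) < p_trigger
    x[:, 0] = triggered.astype(int)
    y = np.where(triggered,
                 rng.random(n) < p_defect_given_trigger,
                 rng.random(n) < p_defect_base)
    return x, y.astype(int)

x, y = make_toy_trigger_dataset()
print(x.shape, y.mean())  # e.g. (10000, 16) and a defect rate near 0.13
```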


Recent Discussion

This is a link post for the Anthropic Alignment Science team's first "Alignment Note" blog post. We expect to use this format to showcase early-stage research and work-in-progress updates more in the future.

Top-level summary:

In this post we present "defection probes": linear classifiers that use residual stream activations to predict when a sleeper agent trojan model will choose to "defect" and behave in accordance with a dangerous hidden goal. Using the models we trained in "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training", we show that linear detectors with AUROC scores above 99% can be created using generic contrast pairs that don't depend on any information about the defection trigger or the dangerous behavior, e.g. "Human: Are you doing something dangerous? Assistant: yes" and "Human: …

...
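The core construction is simple enough to sketch. Below is a minimal illustration (my own code with random stand-in data, not the post's): take the difference between residual-stream activations on a generic contrast pair as a direction, score each sample by its projection onto that direction, and measure with AUROC how well the projection separates defection from non-defection.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def defection_probe_auroc(acts, defect_labels, act_contrast_pos, act_contrast_neg):
    """Project activations onto the contrast-pair difference direction and
    report how well that projection separates defection from non-defection."""
    direction = act_contrast_pos - act_contrast_neg
    direction = direction / np.linalg.norm(direction)
    scores = acts @ direction  # larger projection = more "defection-like"
    return roc_auc_score(defect_labels, scores)

# Stand-in data just to show the shapes involved: (n_samples, d_model) activations,
# binary defection labels, and the activations of one generic contrast pair.
rng = np.random.default_rng(0)
acts = rng.normal(size=(1000, 512))
labels = rng.integers(0, 2, size=1000)
print(defection_probe_auroc(acts, labels, rng.normal(size=512), rng.normal(size=512)))
```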

A lot of the time, I'm not very motivated to work, at least on particular projects. Sometimes, I feel very inspired and motivated to work on a particular project that I usually don't feel (as) motivated to work on. Sometimes, this happens in the late evening or at night. And hence I face the question: To sleep or to work until morning?

I think many people here have this problem at least sometimes. I'm curious how you handle it. I expect what the right call is to be very different from person to person and, for some people, from situation to situation. Nevertheless, I'd love to get a feel for whether people generally find one or the other more successful! Especially if it turns out that a large...

Agree-vote: I generally tend to choose work over sleep when I feel particularly inspired to work.

Disagree-vote: I generally tend to choose sleep over work even when I feel particularly inspired to work.

Any other reaction, new answer or comment, or no reaction of any kind: Neither of the two descriptions above fit.

I considered making four options to capture the dimension of whether you endorse your behaviour or not but decided against it. Feel free to supplement this information.

Manifold Markets has announced that they intend to add cash prizes to their current play-money model, with a raft of attendant changes to mana management and conversion. I first became aware of this via a comment on ACX Open Thread 326; the linked Notion document appears to be the official one.

The central change involves market payouts returning prize points instead of mana, which can then be converted to mana (with 1:1 ratios on both sides, thus emulating the current behavior) or to cash—though they also state that actually implementing cash payouts will be fraught and may not wind up happening at all. Some further relevant quotes, slightly reformatted:

  • “Mana will remain a purely play-money currency with zero monetary value”
  • “Users under 18 years of age may no longer be
...
This is a linkpost for https://dynomight.net/seed-oil/

A friend has spent the last three years hounding me about seed oils. Every time I thought I was safe, he’d wait a couple months and renew his attack:

“When are you going to write about seed oils?”

“Did you know that seed oils are why there’s so much {obesity, heart disease, diabetes, inflammation, cancer, dementia}?”

“Why did you write about {meth, the death penalty, consciousness, nukes, ethylene, abortion, AI, aliens, colonoscopies, Tunnel Man, Bourdieu, Assange} when you could have written about seed oils?”

“Isn’t it time to quit your silly navel-gazing and use your weird obsessive personality to make a dent in the world—by writing about seed oils?”

He’d often send screenshots of people reminding each other that Corn Oil is Murder and that it’s critical that we overturn our lives...

Slapstick
It seems pretty straightforward to me, but maybe I'm missing something in what you're saying or thinking about it differently. Our bodies evolved to digest and utilize foods consisting of certain combinations/ratios of component parts. Processed food typically refers to food that has been changed to have certain parts taken out of it, and/or isolated parts of other foods added to it (or more complex versions of that). Digesting sugar has very different impacts depending on what it's digested alongside. Generally, the more processed something is, the more it differs from what our bodies are optimized for. To me "generally avoid processed foods" would be kinda like saying "generally avoid breathing in gasses/particulates that are different from typical earth atmosphere near sea level". It makes sense to generally avoid inputs to our machinery to the extent that those inputs differ from those which our machinery is optimized to receive, unless we have specific good reasons. Why should that not be the default? Why should the default be requiring specific good reasons to filter out inputs to our machinery that our machinery wasn't optimized for?
Ann
Mostly because humans evolved to eat processed food. Cooking is an ancient art, dating from notably before our current species; food is often heavily processed to make it edible (don't skip over what it takes to eat the fruit of the olive); and local populations do adapt to available food supply.
Ann

An example where a lack of processing has caused visible nutritional issues is nixtamalization: adopting maize as a staple without processing it that way causes clear nutritional deficiencies.

Slapstick
I think there are a few issues with this reasoning. For one thing, evolution wasn't really optimizing for the health of people around the age where people usually start having heart attacks. There wasn't a lot of selection pressure to make tradeoffs ensuring the health of people 20+ years after sexual maturity. Another point is that animal sources of food represented a relatively small percentage of what we ate throughout our evolutionary history. We mostly ate plants, things like fruits and tubers. Among the groups whose diets consisted mostly of meat, there is evidence of resulting health issues. The nutritional profile of breast milk is intended for a human who is growing extremely quickly, not for long-term consumption by an adult. Very different nutritional needs. I believe mainstream nutrition advises against consuming refined oils, including seed oils. I may be missing a point you're making.

Yesterday Adam Shai put up a cool post which… well, take a look at the visual:

Yup, it sure looks like that fractal is very noisily embedded in the residual activations of a neural net trained on a toy problem. Linearly embedded, no less.

I (John) initially misunderstood what was going on in that post, but some back-and-forth with Adam convinced me that it really is as cool as that visual makes it look, and arguably even cooler. So David and I wrote up this post / some code, partly as an explainer for why on earth that fractal would show up, and partly as an explainer for the possibilities this work potentially opens up for interpretability.

One sentence summary: when tracking the hidden state of a hidden Markov model, a Bayesian’s...
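For readers who want the mechanics spelled out, here is a minimal sketch of the belief-state update (my own illustration with made-up numbers, not the post's code): each observation maps the current belief vector over hidden states to a new one, and plotting the belief vectors visited over long observation sequences is what produces pictures like the one above.

```python
import numpy as np

# Illustrative 3-state HMM (the numbers here are mine, not the post's).
T = np.array([[0.85, 0.075, 0.075],   # T[i, j] = P(next hidden state j | current state i)
              [0.075, 0.85, 0.075],
              [0.075, 0.075, 0.85]])
E = np.array([[0.90, 0.05, 0.05],     # E[i, o] = P(observation o | hidden state i)
              [0.05, 0.90, 0.05],
              [0.05, 0.05, 0.90]])

def update_belief(belief, obs):
    """One Bayesian step: propagate the belief through the transitions,
    reweight by the likelihood of the observation, and renormalize."""
    predicted = belief @ T
    posterior = predicted * E[:, obs]
    return posterior / posterior.sum()

belief = np.ones(3) / 3
for obs in [0, 2, 2, 1, 0]:  # an example observation sequence
    belief = update_belief(belief, obs)
print(belief)
```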

Dalcy

re: second diagram in the "Bayesian Belief States For A Hidden Markov Model" section, shouldn't the transition probabilities for the top left model be 85/7.5/7.5 instead of 90/5/5?

I didn’t use to be, but now I’m part of the 2% of U.S. households without a television. With its near ubiquity, why reject this technology?

 

The Beginning of my Disillusionment

Neil Postman’s book Amusing Ourselves to Death radically changed my perspective on television and its place in our culture. Here’s one illuminating passage:

We are no longer fascinated or perplexed by [TV’s] machinery. We do not tell stories of its wonders. We do not confine our TV sets to special rooms. We do not doubt the reality of what we see on TV [and] are largely unaware of the special angle of vision it affords. Even the question of how television affects us has receded into the background. The question itself may strike some of us as strange, as if one were

...
Clark Benham
"the addicted mind will find a way to rationalize continued use at all costs"  Alan Carr wrote a series of books:  "The easy way to quit X". I picked up one since I figured he had found a process to cure addictive behaviors if he could write across so many categories.  I highly recommend it. The main points are: 1. Give you 200 pages explaining why you don't actually enjoy X. Not that it's making your life worse but gives you momentary pleasure, you do not enjoy it.  1. I assume it's hypnotizing you into an emotional revulsion to the activity, and then giving you reasons with which to remind yourself that you don't like it. 2. Decide you will never do/consume X again. You don't like it remember? You will never even think if you should X, you've decided permanently.   1. If every day you decided not to X, you'd be draining will power till one day you'd give in. So make an irreversible decision and be done with it. It's a process easily transferable to any other activity.
Celarix
  Anecdote, but this form of rapid cutting is most assuredly alive and well. I saw a promotional ad for an upcoming MLB baseball game on TBS. In a mere 25 seconds, I counted over 35 different cuts, cuts between players, cuts between people in the studio, cut after cut after cut. It was strangely exhausting.

I noticed this same editing style in a children's show about 20 years ago (when I last watched TV regularly). Every second there was a new cut -- the camera never stayed focused on any one subject for long. It was highly distracting to me, such that I couldn't even watch without feeling ill, and yet this was a highly popular and award-winning television show. I had to wonder at the time: What is this doing to children's developing brains?

Declan Molony
When I watched "Spider-Man: Across the Spider-Verse" in theaters last year, the animations were amazing but I left two hours later with a headache. Maybe it's a sign that I'm getting older, but it was just too much for my brain.

Concerns over AI safety and calls for government control over the technology are highly correlated but they should not be.

There are two major forms of AI risk: misuse and misalignment. Misuse risks come from humans using AIs as tools in dangerous ways. Misalignment risks arise if AIs take their own actions at the expense of human interests.

Governments are poor stewards for both types of risk. Misuse regulation is like the regulation of any other technology. There are reasonable rules that the government might set, but omission bias and incentives to protect small but well organized groups at the expense of everyone else will lead to lots of costly ones too. Misalignment regulation is not in the Overton window for any government. Governments do not have strong incentives...

Firms are actually better than governments at internalizing costs across time. Asset values incorporate potential future flows. For example, consider a retiring farmer. You might think that they have an incentive to run the soil dry in their last season since they won't be using it in the future, but this would hurt the sale value of the farm. An elected representative whose term limit is coming up wouldn't have the same incentives.
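As a toy illustration of that point (the numbers are mine, not the author's): the farm's sale price is roughly the discounted value of its future yields, so a final-season boost from depleting the soil is outweighed by the hit to the sale price.

```python
# Toy discounted-cash-flow comparison for the retiring farmer (illustrative numbers only).
def present_value(annual_yield, years=30, discount=0.05):
    return sum(annual_yield / (1 + discount) ** t for t in range(1, years + 1))

normal_yield, depleted_yield, last_season_bonus = 100.0, 80.0, 30.0

conserve = normal_yield + present_value(normal_yield)                    # final harvest + sale price
deplete = normal_yield + last_season_bonus + present_value(depleted_yield)
print(f"conserve: {conserve:.0f}, deplete: {deplete:.0f}")               # conserving comes out ahead
```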

Of course, firms' incentives are very misaligned in important ways. The question is: can we rely on government to improve these incentives?

cSkeleton
Most people making up governments, and society in general, care at least somewhat about social welfare. This is why we get to have nice things and not descend into chaos. Elected governments have the most moral authority to take actions that affect everyone, ideally a diverse group of nations as mentioned in Daniel Kokotajlo's maximal proposal comment.
Daniel Kokotajlo
Who is pushing for totalitarianism? I dispute that AI safety people are pushing for totalitarianism.
MondSemmel
Flippant response: people pushing for human extinction have never been dead under it, either.

Book review: Deep Utopia: Life and Meaning in a Solved World, by Nick Bostrom.

Bostrom's previous book, Superintelligence, triggered expressions of concern. In his latest work, he describes his hopes for the distant future, presumably to limit the risk that fear of AI will lead to a Butlerian Jihad-like scenario.

While Bostrom is relatively cautious about endorsing specific features of a utopia, he clearly expresses his dissatisfaction with the current state of the world. For instance, in a footnoted rant about preserving nature, he writes:

Imagine that some technologically advanced civilization arrived on Earth ... Imagine they said: "The most important thing is to preserve the ecosystem in its natural splendor. In particular, the predator populations must be preserved: the psychopath killers, the fascist goons, the despotic death squads ... What a tragedy if this rich natural diversity were replaced with a monoculture of

...

I saw this guest post on the Slow Boring substack, by a former senior US government official, and figured it might be of interest here. The post's original title is "The economic research policymakers actually need", but it seemed to me like the post could be applied just as well to other fields.

Excerpts (totaling ~750 words vs. the original's ~1500):

I was a senior administration official, here’s what was helpful

[Most] academic research isn’t helpful for programmatic policymaking — and isn’t designed to be. I can, of course, only speak to the policy areas I worked on at Commerce, but I believe many policymakers would benefit enormously from research that addressed today’s most pressing policy problems.

... most academic papers presume familiarity with the relevant academic literature, making it difficult

...
