Frustrated by claims that "enlightenment" and similar meditative/introspective practices can't be explained and that you can only understand them if you experience them, Kaj set out to write his own detailed, gears-level, non-mysterious, non-"woo" explanation of how meditation, etc., work, in the same way you might explain the operation of an internal combustion engine.

There was this voice inside my head that told me that since I have Something to protect, relaxing is never OK beyond the strict minimum, the goal is paramount, and I should just work as hard as I can all the time. This led to me breaking down and being incapable of working on my AI governance job for a week, as I had just piled up too much stress. And then I decided to follow what motivated me in the moment, instead of coercing myself into working on what I thought was most important, and lo and behold! My total output increased, while my time spent working decreased. I'm so angry and sad at the inadequacy of my role models, cultural norms, rationality advice, and my model of the good EA who does not burn out, all of which still led me to smash into the wall despite their best intentions. I became so estranged from my own body and perceptions, ignoring my core motivations, finding it harder and harder to work. I dug myself such a deep hole. I'm terrified at the prospect of having to rebuild my motivation by myself again.
Elizabeth
Brandon Sanderson is a bestselling fantasy author. Despite mostly working with traditional publishers, there is a 50-60 person company formed around his writing[1]. This podcast talks about how the company was formed.

Things I liked about this podcast:

1. He and his wife both refer to it as "our" company and describe critical contributions she made.
2. The number of times he was dissatisfied with the way his publisher did something and so hired someone in his own company to do it (e.g. PR and organizing book tours), despite that being part of the publisher's job.
3. He believed in his back catalog enough to buy remainder copies of his books (at $1/piece) and sell them via his own website at sticker price (with autographs). This was a major source of income for a while.
4. Long-term grand strategic vision that appears to be well aimed and competently executed.

1. ^ The only non-Sanderson content I found was a picture book from his staff artist.
A neglected problem in AI safety technical research is teasing apart the mechanisms of dangerous capabilities exhibited by current LLMs. In particular, I am thinking that for any model organism (see Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research) of dangerous capabilities (e.g. the sleeper agents paper), we don't know how much of the phenomenon depends on the particular semantics of terms like "goal" and "deception" and "lie" (insofar as they are used in the scratchpad or in prompts or in finetuning data), or if the same phenomenon could be had by subbing in more or less any word. One approach to this is to make small toy models of these types of phenomena, where we can more easily control data distributions and yet still get analogous behavior. In this way we can really control for any particular aspect of the data and figure out, scientifically, the nature of these dangers. By small toy model I'm thinking of highly artificial datasets (perhaps made of binary digits with specific correlation structure, or whatever the minimum needed to get the phenomenon at hand).
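A rough sketch of the kind of artificial dataset I have in mind (the trigger pattern, sizes, and noise rate below are all made up purely for illustration, not taken from any existing paper):

```python
# Toy "model organism" dataset: binary sequences where a specific trigger
# pattern is (imperfectly) correlated with a "misbehave" label.
import numpy as np

rng = np.random.default_rng(0)
seq_len, n_samples = 16, 10_000
trigger = np.array([1, 0, 1, 1])  # stand-in for a "deployment" cue

X = rng.integers(0, 2, size=(n_samples, seq_len))
has_trigger = rng.random(n_samples) < 0.5
X[has_trigger, :4] = trigger  # plant the trigger in roughly half the rows

# Label = "misbehave" iff the trigger was planted, plus a little label noise.
# Varying the noise rate (or the trigger's base rate) lets you control the
# correlation structure and see what a small model actually latches onto.
label_noise = rng.random(n_samples) < 0.05
y = np.logical_xor(has_trigger, label_noise).astype(int)

print(X.shape, y.mean())  # (10000, 16), roughly 0.5
```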
MIRI's Technical Governance Team is hiring, please apply and work with us! We are looking to hire for the following roles:

* Technical Governance Researcher (2-4 hires)
* Writer (1 hire)

The roles are located in Berkeley, and we are ideally looking to hire people who can start ASAP. The team is currently Lisa Thiergart (team lead) and myself.

We will research and design technical aspects of regulation and policy that could lead to safer AI, focusing on methods that won't break as we move towards smarter-than-human AI. We want to design policy that allows us to safely and objectively assess the risks from powerful AI, build consensus around the risks we face, and put in place measures to prevent catastrophic outcomes.

The team will likely work on:

* Limitations of current proposals such as RSPs
* Inputs into regulations, requests for comment by policy bodies (e.g. NIST/US AISI, EU, UN)
* Researching and designing alternative Safety Standards, or amendments to existing proposals
* Communicating with and consulting for policymakers and governance organizations

If you have any questions, feel free to contact me on LW or at peter@intelligence.org
Tamsin Leake
Regardless of how good their alignment plans are, the thing that makes OpenAI unambiguously evil is that they created a strongly marketed public product and, as a result, caused a lot of public excitement about AI, and thus lots of other AI capabilities organizations were created that are completely dismissive of safety. There's just no good reason to do that, except short-term greed at the cost of a higher probability that everyone (including people at OpenAI) dies. (No, "you need huge profits to solve alignment" isn't a good excuse — we had nowhere near exhausted the alignment research that can be done without huge profits.)

Popular Comments

Recent Discussion

I

Imagine an alternate version of the Effective Altruism movement, whose early influences came from socialist intellectual communities such as the Fabian Society, as opposed to the rationalist diaspora. Let’s name this hypothetical movement the Effective Samaritans.

Like the EA movement of today, they believe in doing as much good as possible, whatever this means. They began by evaluating existing charities, reading every RCT to find the very best ways of helping.

But many Effective Samaritans were starting to wonder. Is this randomista approach really the most prudent? After all, Scandinavia didn't become wealthy and equitable through marginal charity. Societal transformation comes from uprooting oppressive power structures.

The Scandinavian societal model, which lifted the working class and brought weekends, universal suffrage, maternity leave, education, and universal healthcare, can be traced back all the...

cubefox
Though this is only what Bayesianism predicts. A different theory of induction (e.g. one that explains human intelligence, or one that describes how to build an AGI) may not have an equivalent to Bayesian priors. Differences in opinions between two agents could instead be explained by having had different experiences, beliefs being path dependent (order of updates matters), or inference being influenced by random chance.
dr_s
I'm not sure how that works. Bayes' theorem, per se, is correct. I'm not talking about a level of abstraction in which I try to define decisions/beliefs as symbols, I'm talking about the bare "two different brains with different initial states, subject to the same input, will end up in different final states". All of that can be accounted for in a Bayesian framework though? Different experiences produce different posteriors of course, and as for path dependence and random chance, I think you can easily get those by introducing some kind of hidden states, describing things we don't quite know about the inner workings of the brain.
cubefox

All of that can be accounted for in a Bayesian framework though?

I mean that those factors don't presuppose different priors. You could still end up with different "posteriors" even with the same "starting point".

An example for an (informal) alternative to Bayesian updating, that doesn't require subjective priors, is Inference to the Best Explanation. One could, of course, model the criteria that determine the goodness of explanations as a sort of "prior". But those criteria would be part of the hypothetical IBE algorithm, not a free variable like in Ba... (read more)

cousin_it
Yeah, the trapped priors thing is pretty worrying to me too. But I'm confused about the opposing interventions thing. Do charter cities, or unions, rely on donations that much? Is it really so common for donations to cancel each other out? I guess advocacy donations (for example, pro-life vs pro-choice) do cancel each other out, so maybe we could all agree that advocacy isn't charity.

1. If you find that you’re reluctant to permanently give up on to-do list items, “deprioritize” them instead

I hate the idea of deciding that something on my to-do list isn't that important, and then deleting it off my to-do list without actually doing it. Because once it's off my to-do list, then quite possibly I'll never think about it again. And what if it's actually worth doing? Or what if my priorities will change such that it will be worth doing at some point in the future? Gahh!

On the other hand, if I never delete anything off my to-do list, it will grow to infinity.

The solution I’ve settled on is a priority-categorized to-do list, using a kanban-style online tool (e.g. Trello). The left couple columns (“lists”) are very active—i.e., to-do list...

  1. If you find that you’re reluctant to delete computer files / emails, don’t empty the trash

In Gmail I like to scan the email headers and then bulk select and archive them (* a e, thanks to vim shortcuts). After 5 years of doing this I still haven't run out of the free storage in Gmail. I already let Gmail sort the emails into "Primary", "Promotions", "Updates", etc. Usually the only important things are in "Primary", plus 1 or 2 in "Updates".

For the last month, @RobertM and I have been exploring the possible use of recommender systems on LessWrong. Today we launched our first site-wide experiment in that direction. 

Behold, a tab with recommendations!

(In the course of our efforts, we also hit upon a frontpage refactor that we reckon is pretty good: tabs instead of a clutter of different sections. For now, only for logged-in users. Logged-out users see the "Latest" tab, which is the same-as-usual list of posts.)

Why algorithmic recommendations?

A core value of LessWrong is to be timeless and not news-driven. However, the central algorithm by which attention allocation happens on the site is the Hacker News algorithm[1], which basically only shows you things that were posted recently, and creates a strong incentive for discussion to always be...

niplav

A core value of LessWrong is to be timeless and not news-driven.

I do really like the simplicity and predictability of the Hacker News algorithm. More karma means more visibility, older means less visibility.
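Concretely, the widely cited formulation looks roughly like the following (gravity of 1.8 and the +2 hour offset are the usual Hacker News defaults; LessWrong's actual frontpage variant may differ):

```python
# Classic Hacker News-style ranking: karma divided by a power of age,
# so recency dominates how much visibility a post gets.
def hn_score(karma: float, age_hours: float, gravity: float = 1.8) -> float:
    return (karma - 1) / (age_hours + 2) ** gravity

print(hn_score(karma=10, age_hours=3))        # ~0.50: fresh post, modest karma
print(hn_score(karma=200, age_hours=24 * 7))  # ~0.02: week-old post, high karma
```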

Our current goal is to produce a recommendations feed that both makes people feel like they're keeping up to date with what's new (something many people care about) and also suggest great reads from across LessWrong's entire archive.

I hope that we can avoid getting swallowed by Shoggoth for now by putting a lot of thought into our optimization

... (read more)
dr_s
I am sceptical of recommender systems - I think they are kind of bound to end up in self-reinforcing loops. I'd be happier seeing a more transparent system - we have tags, upvotes, the works, so you could have something like a series of "suggested searches", e.g. the most common combinations of tags you've visited, that a user has fast access to while also seeing precisely what it is that they're clicking on. That said, I do trust this website of all things to acknowledge if things aren't going to plan and revert. If we fail to align this one small AI to our values, well, that's a valuable lesson.

This is the eighth post in my series on Anthropics. The previous one is Lessons from Failed Attempts to Model Sleeping Beauty Problem. The next one is Beauty and the Bets.

Introduction

Suppose we take the insights from the previous post, and directly try to construct a model for the Sleeping Beauty problem based on them.

We expect a halfer model, so P(Heads) = P(Tails) = 1/2.

On the other hand, in order not to repeat Lewis' Model's mistakes, P(Heads|Monday) = P(Tails|Monday) = 1/2.

But both of these statements can only be true if P(Monday) = 1.

And, therefore, apparently, P(Tuesday) has to be zero, which sounds obviously wrong. Surely the Beauty can be awakened on Tuesday!

At this point, I think, you won't be surprised if I tell you that there are philosophers who are eager to bite this bullet and claim that the Beauty should, indeed, reason as...

I knew that not any string of English words gets a probability, but I was naïve enough to think that all statements that are either true or false get one.

Well, I think this one is actually correct. But, as I said in the previous comment, the statement "Today is Monday" doesn't actually have a coherent truth value throughout the probability experiment. It's not either True or False. It's either True or True and False at the same time!

I was hoping that this sequence of posts, which kept saying "don't worry about anthropics, just be careful with the basics and

... (read more)

EDIT 1/27: This post neglects the entire sub-field of estimating uncertainty of learned representations, as in https://openreview.net/pdf?id=e9n4JjkmXZ. I might give that a separate follow-up post.

 

Introduction

Suppose you've built some AI model of human values. You input a situation, and it spits out a goodness rating. You might want to ask: "What are the error bars on this goodness rating?" In addition to it just being nice to know error bars, an uncertainty estimate can also be useful inside the AI: guiding active learning[1], correcting for the optimizer's curse[2], or doing out-of-distribution detection[3].
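As a toy illustration of what such error bars could look like, here is a minimal deep-ensemble sketch (ensembles being one standard approach in this literature, not necessarily the one this post settles on); the feature dimension, architecture, and untrained weights below are made up purely to show how the estimate is read off:

```python
# Minimal deep-ensemble sketch for error bars on a scalar "goodness rating".
import torch
import torch.nn as nn

def make_member(in_dim: int) -> nn.Module:
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, 1))

in_dim = 16
ensemble = [make_member(in_dim) for _ in range(5)]
# In practice each member would be trained on the same data from a different
# random initialization; disagreement between members serves as the error bar.

situation = torch.randn(1, in_dim)  # stand-in for an encoded situation
with torch.no_grad():
    ratings = torch.stack([m(situation).squeeze() for m in ensemble])

print(f"goodness ~ {ratings.mean().item():.2f} +/- {ratings.std().item():.2f}")
```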

I recently got into the uncertainty estimation literature for neural networks (NNs) for a pet reason: I think it would be useful for alignment to quantify the domain of validity of an AI's latent features. If we...

This was a great post, thank you for making it!

I wanted to ask what you think about the LLM-forecasting papers in relation to this literature. Do you think there are any ways of applying the uncertainty estimation literature to improve the forecasting ability of AI? For example:

https://arxiv.org/pdf/2402.18563.pdf

Epistemic status: pretty confident. Based on several years of meditation experience combined with various pieces of Buddhist theory as popularized in various sources, including but not limited to books like The Mind Illuminated, Mastering the Core Teachings of the Buddha, and The Seeing That Frees; also discussions with other people who have practiced meditation, and scatterings of cognitive psychology papers that relate to the topic. The part that I’m the least confident of is the long-term nature of enlightenment; I’m speculating on what comes next based on what I’ve experienced, but have not actually had a full enlightenment. I also suspect that different kinds of traditions and practices may produce different kinds of enlightenment states.

While I liked Valentine’s recent post on kensho and its follow-ups a lot,...

Based on the link, it seems you follow the Theravada tradition. The ideas you give go against the Theravada ideas. You need to go study the Pali Canon. This information is all wrong I'm afraid. I won't talk more on the matter.

ship_shlap
I won't correct everything I find wrong, but I felt that the "Understanding Suffering" section was completely off. I will just mention one of the major points:

This is utterly wrong. Enlightenment in Buddhism means emotional pain cannot arise, period. In Buddhism, there are five "hindrances", or negative mental states: desire, aversion, compulsion/agitation, slothfulness, and remorse. This list is said to encapsulate all possible negative feelings. In an enlightened person, these hindrances cannot arise; the "fetter", the bond which causes a person to experience them, is uprooted.

Secondly, in Buddhism it's believed that negative mental states are always a bad and painful experience, so it's impossible to not mind having them. If you think about it, you can't be sad and not mind it. You can't be angry and not mind it. There are a few Buddhist circles which believe you can be detached from anger or desire, but this doesn't make sense, because in Buddhist theory such mental states arise from attachment in the first place.

It was all quiet. Then it wasn’t.

Note the timestamps on both of these.

Dwarkesh Patel did a podcast with Mark Zuckerberg on the 18th. It was timed to coincide with the release of much of Llama-3, very much the approach of telling your story directly. Dwarkesh is now the true tech media. A meteoric rise, and well earned.

This is two related posts in one. First I cover the podcast, then I cover Llama-3 itself.

My notes are edited to incorporate context from later explorations of Llama-3, as I judged that the readability benefits exceeded the purity costs.

Podcast Notes: Llama-3 Capabilities

  1. (1:00) They start with Llama 3 and the new L3-powered version of Meta AI. Zuckerberg says “With Llama 3, we think now that Meta AI is the most intelligent, freely-available
...

Do you have any thoughts on whether it would make sense to push for a rule that forces open-source or open-weight models to be released behind an API for a certain amount of time before they can be released to the public?

What’s Twitter for you?

That's a long-running trend I often see on my feed: people praising the blue bird for getting them a job, introducing them to new people, investors, and all this and that.

What about me? I just wanted to get into dribbble — the then invite-only designer's social network which was at its peak at the time. When I realized invites were given away on Twitter, I set up an account and went on a hunt. Soon, the mission was accomplished.

For the next few years, I went radio silent. Like many others, I was lurking most of the time. Even today I don't tweet excessively. But Twitter has always been a town square of mine. Suited best for my interests, it's been a...

Ever since they killed (or made it harder to host) Nitter, RSS, guest accounts, etc., Twitter has been out of my life, for the better. I find the Twitter UX sub-optimal in terms of performance, chronological posts, and subscriptions. If I do create an account, my "home" feed has too much ingroup-vs-outgroup kind of content (even within tech enthusiast circles, thanks to the AI safety vs e/acc debate and so on), and verified users are over-represented by design, which buries the good posts from non-verified users. Elon is trying way too hard to prevent AI web scrapers, ruining my workflow.

Adam Shai
A neglected problem in AI safety technical research is teasing apart the mechanisms of dangerous capabilities exhibited by current LLMs. In particular, I am thinking that for any model organism (see Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research) of dangerous capabilities (e.g. the sleeper agents paper), we don't know how much of the phenomenon depends on the particular semantics of terms like "goal" and "deception" and "lie" (insofar as they are used in the scratchpad or in prompts or in finetuning data), or if the same phenomenon could be had by subbing in more or less any word. One approach to this is to make small toy models of these types of phenomena, where we can more easily control data distributions and yet still get analogous behavior. In this way we can really control for any particular aspect of the data and figure out, scientifically, the nature of these dangers. By small toy model I'm thinking of highly artificial datasets (perhaps made of binary digits with specific correlation structure, or whatever the minimum needed to get the phenomenon at hand).

Terminology point: When I say "a model has a dangerous capability", I usually mean "a model has the ability to do XYZ if fine-tuned to do so". You seem to be using this term somewhat differently, as model organisms like the ones you discuss are often (though not always) looking at questions related to inductive biases and generalization (e.g. if you train a model to have a backdoor and then train it in XYZ way, does this backdoor get removed?).
