Frustrated by claims that "enlightenment" and similar meditative/introspective practices can't be explained and that you can only understand them by experiencing them, Kaj set out to write his own detailed gears-level, non-mysterious, non-"woo" explanation of how meditation and related practices work, in the same way you might explain the operation of an internal combustion engine.

There was this voice inside my head that told me that since I have Something to protect, relaxing is never OK beyond the strict minimum, the goal is paramount, and I should just work as hard as I can all the time. This led to me breaking down and being incapable of working on my AI governance job for a week, because I had piled up too much stress. And then I decided to follow what motivated me in the moment, instead of coercing myself into working on what I thought was most important, and lo and behold: my total output increased while my time spent working decreased. I'm so angry and sad at the inadequacy of my role models, cultural norms, rationality advice, and model of the good EA who does not burn out, which still led me to smash into the wall despite their best intentions. I became so estranged from my own body and perceptions, ignoring my core motivations, finding it harder and harder to work. I dug myself such a deep hole. I'm terrified at the prospect of having to rebuild my motivation all over again.
Elizabeth
Brandon Sanderson is a bestselling fantasy author. Despite mostly working with traditional publishers, there is a 50-60 person company formed around his writing[1]. This podcast talks about how the company was formed. Things I liked about this podcast:

1. He and his wife both refer to it as "our" company and describe critical contributions she made.
2. The number of times he was dissatisfied with the way his publisher did something and so hired someone in his own company to do it (e.g. PR and organizing book tours), despite that being part of the publisher's job.
3. He believed in his back catalog enough to buy remainder copies of his books (at $1/piece) and sell them via his own website at sticker price (with autographs). This was a major source of income for a while.
4. Long-term grand strategic vision that appears to be well aimed and competently executed.

[1] The only non-Sanderson content I found was a picture book from his staff artist.
A neglected problem in AI safety technical research is teasing apart the mechanisms of dangerous capabilities exhibited by current LLMs. In particular, I am thinking that for any model organism (see Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research) of dangerous capabilities (e.g. the sleeper agents paper), we don't know how much of the phenomenon depends on the particular semantics of terms like "goal", "deception", and "lie" (insofar as they are used in the scratchpad, in prompts, or in finetuning data), or whether the same phenomenon could be produced by substituting in more or less any word. One approach to this is to make small toy models of these types of phenomena, where we can more easily control data distributions and yet still get analogous behavior. In this way we can really control for any particular aspect of the data and figure out, scientifically, the nature of these dangers. By small toy model I'm thinking of highly artificial datasets (perhaps made of binary digits with specific correlation structure, or whatever the minimum needed to get the phenomenon at hand), as sketched below.
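For concreteness, here is a minimal sketch of the kind of artificial dataset I have in mind (the function name, the trigger/behavior framing, and all constants are illustrative assumptions, not anything from the papers above):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_toy_sequence(length=16, trigger_prob=0.3, noise=0.05):
    """Generate a binary sequence in which a 'trigger' bit at position 0
    statistically controls a 'behavior' bit at the last position, with a
    little label noise. Purely illustrative; names are made up."""
    trigger = rng.random() < trigger_prob
    middle = rng.integers(0, 2, size=length - 2)
    behavior = trigger if rng.random() > noise else not trigger
    return np.concatenate(([int(trigger)], middle, [int(behavior)]))

# Stacking many such sequences gives a dataset where exactly one property
# (the trigger-behavior correlation) can be varied while everything else
# stays fixed, so a small model trained on it can be probed cleanly.
dataset = np.stack([make_toy_sequence() for _ in range(10_000)])
print(dataset.shape, (dataset[:, 0] == dataset[:, -1]).mean())
```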
The MIRI Technical Governance Team is hiring, please apply and work with us! We are looking to hire for the following roles:

* Technical Governance Researcher (2-4 hires)
* Writer (1 hire)

The roles are located in Berkeley, and we are ideally looking to hire people who can start ASAP. The team is currently Lisa Thiergart (team lead) and myself. We will research and design technical aspects of regulation and policy that could lead to safer AI, focusing on methods that won't break as we move towards smarter-than-human AI. We want to design policy that allows us to safely and objectively assess the risks from powerful AI, build consensus around the risks we face, and put in place measures to prevent catastrophic outcomes. The team will likely work on:

* Limitations of current proposals such as RSPs
* Inputs into regulations, requests for comment by policy bodies (ex. NIST/US AISI, EU, UN)
* Researching and designing alternative Safety Standards, or amendments to existing proposals
* Communicating with and consulting for policymakers and governance organizations

If you have any questions, feel free to contact me on LW or at peter@intelligence.org
Tamsin Leake
Regardless of how good their alignment plans are, the thing that makes OpenAI unambiguously evil is that they created a strongly marketed public product and, as a result, caused a lot of public excitement about AI, and thus lots of other AI capabilities organizations were created that are completely dismissive of safety. There's just no good reason to do that, except short-term greed at the cost of a higher probability that everyone (including people at OpenAI) dies. (No, "you need huge profits to solve alignment" isn't a good excuse — we had nowhere near exhausted the alignment research that can be done without huge profits.)

Popular Comments

Recent Discussion

Disclaimer: While I criticize several EA critics in this article, I am myself on the EA-skeptical side of things (especially on AI risk).

Introduction

I am a proud critic of effective altruism, and in particular a critic of AI existential risk, but I have to admit that a lot of the criticism of EA is hostile or lazy, and is extremely unlikely to convince a believer.

Take this recent Leif Wenar Time article as an example. I liked a few of the object-level critiques, but many of the points were twisted, and the overall point was hopelessly muddled (are they trying to say that voluntourism is the solution here?). As people have noted, the piece was needlessly hostile to EA (and incredibly hostile to Will MacAskill in particular). And...

Good article. 

It's an asymmetry worth pointing out.

It seems related to some concept of a "low interest rate phenomenon in ideas". Sometimes in a low interest rate environment, people fund all sorts of stuff, because they want any return and credit is cheap. Later much of this looks bunk. Likewise, much EA behaviour around the plentiful money and status of the FTX era looks profligate by today's standards. In the same way I wonder what ideas are held up by some vague consensus rather than being good ideas.

Nathan Young
Feels like there is something off about the following graph. I.e., these people could write better critiques. Many care a lot. Émile spends a lot of time on their work, for instance. I don't think effort really captures what's going on. I think it's a mix of effort, status, and norms. In our community it's high status to bend over backwards to write a critique (not that we always succeed). For Émile, as an example, I don't think this is the case. Perhaps they gain status from articles that are widely shared and that link ideas they dislike to a broader worldview.
ryan_greenblatt
I'm not sure that I buy that critics lack motivation. At least in the space of AI, there will be (and already are) people with immense financial incentives to ensure that x-risk concerns don't become very politically powerful. Of course, it might be that the best move for these critics isn't to write careful and well-reasoned arguments, for whatever reason (e.g. this would draw more attention to x-risk, so ignoring it is better from their perspective). (I think critics in the space of GHW might lack motivation, but at least in AI and maybe animal welfare I would guess that "lack of motive" isn't a good description of what is going on.) Edit: this is mentioned in the post, but I'm a bit surprised that it isn't emphasized more. [Cross-posted from EAF]
abstractapplic
Typos: "Al gore"->"Al Gore" "newpaper"->"newspaper" "south park"->"South Park" "scott alexander"->"Scott Alexander" "a littler deeper"->"a little deeper" "Ai"->"AI" (. . . I'm now really curious as to why you keep decapitalizing names and proper nouns.) Regarding the actual content of the post: appreciated, approved, and strong-upvoted. Thank you.

If it’s worth saying, but not worth its own post, here's a place to put it.

If you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are invited. This is also the place to discuss feature requests and other ideas you have for the site, if you don't want to write a full top-level post.

If you're new to the community, you can start reading the Highlights from the Sequences, a collection of posts about the core ideas of LessWrong.

If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the Concepts section.

The Open Thread tag is here. The Open Thread sequence is here.

I like his UI. In fact, I shared CQ2 with Andy in February, since his notes site was the only other place where I had seen the sliding pane design. He said CQ2 is neat!

niplav
There are several sequences which are visible on the profiles of their authors, but haven't yet been added to the library. Those are:

* «Boundaries» Sequence (Andrew Critch)
* Maximal Lottery-Lotteries (Scott Garrabrant)
* Geometric Rationality (Scott Garrabrant)
* UDT 1.01 (Diffractor)
* Unifying Bargaining (Diffractor)
* Why Everyone (Else) Is a Hypocrite: Evolution and the Modular Mind (Kaj Sotala)
* The Sense Of Physical Necessity: A Naturalism Demo (LoganStrohl)
* Scheming AIs: Will AIs fake alignment during training in order to get power? (Joe Carlsmith)

I think these are good enough to be moved into the library.
habryka
This probably should be made more transparent, but the reason these aren't in the library is that they don't have images for the sequence item. We display all sequences that people create that have proper images in the Library (otherwise we just show them on the user's profile).

Epistemic status: pretty confident. Based on several years of meditation experience combined with various pieces of Buddhist theory as popularized in various sources, including but not limited to books like The Mind Illuminated, Mastering the Core Teachings of the Buddha, and The Seeing That Frees; also discussions with other people who have practiced meditation, and scatterings of cognitive psychology papers that relate to the topic. The part that I’m the least confident of is the long-term nature of enlightenment; I’m speculating on what comes next based on what I’ve experienced, but have not actually had a full enlightenment. I also suspect that different kinds of traditions and practices may produce different kinds of enlightenment states.

While I liked Valentine’s recent post on kensho and its follow-ups a lot,...

ship_shlap
Based on the link, it seems you follow the Theravada tradition. The ideas you give go against the Theravada ideas. You need to go study the Pali Canon. This information is all wrong I'm afraid. I won't talk more on the matter.

Based on the link, it seems you follow the Theravada tradition. 

For what it's worth, I don't really follow any one tradition, though Culadasa does indeed have a Theravada background.

I

Imagine an alternate version of the Effective Altruism movement, whose early influences came from socialist intellectual communities such as the Fabian Society, as opposed to the rationalist diaspora. Let’s name this hypothetical movement the Effective Samaritans.

Like the EA movement of today, they believe in doing as much good as possible, whatever this means. They began by evaluating existing charities, reading every RCT to find the very best ways of helping.

But many Effective Samaritans were starting to wonder: is this randomista approach really the most prudent? After all, Scandinavia didn't become wealthy and equitable through marginal charity. Societal transformation comes from uprooting oppressive power structures.

The Scandinavian societal model which lifted the working class, brought weekends, universal suffrage, maternity leave, education, and universal healthcare can be traced back all the...

dr_s
I'm not sure how that works. Bayes' theorem, per se, is correct. I'm not talking about a level of abstraction in which I try to define decisions/beliefs as symbols, I'm talking about the bare "two different brains with different initial states, subject to the same input, will end up in different final states". All of that can be accounted for in a Bayesian framework though? Different experiences produce different posteriors of course, and as for path dependence and random chance, I think you can easily get those by introducing some kind of hidden states, describing things we don't quite know about the inner workings of the brain.
cubefox
I mean that those factors don't presuppose different priors. You could still end up with different "posteriors" even with the same "starting point". An example of an (informal) alternative to Bayesian updating that doesn't require subjective priors is Inference to the Best Explanation. One could, of course, model the criteria that determine the goodness of explanations as a sort of "prior". But those criteria would be part of the hypothetical IBE algorithm, not a free variable like in Bayesian updating. One could also claim that there are no objective facts about the goodness of explanations and that IBE is invalid. But that's an open question.
dr_s
I'd definitely call any assumption about which forms preferred explanations should take a "prior". Maybe I have a more flexible concept of what counts as Bayesian than you, in that sense? Priors don't need to be free parameters; the process has to start somewhere. But if you already have some data and then acquire some more data, obviously the previous data will still affect your conclusions.

The problem with calling parts of a learning algorithm that are not free variables a "prior" is that then anything (every part of any learning algorithm) would count as a prior, even the Bayesian conditionalization rule itself. But that's not what Bayesians consider part of a prior.
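For readers following along, here is a minimal sketch of the standard Bayesian picture both commenters are taking for granted (my own toy example, not from either of them): identical evidence combined with different priors yields different posteriors.

```python
# Two agents with different Beta priors over the same coin's bias,
# updating on identical evidence (7 heads, 3 tails). Toy numbers only.
def beta_update(prior_a, prior_b, heads, tails):
    return prior_a + heads, prior_b + tails

def beta_mean(a, b):
    return a / (a + b)

data = (7, 3)                          # the same data for both agents
optimist = beta_update(8, 2, *data)    # prior mean 0.8
skeptic = beta_update(2, 8, *data)     # prior mean 0.2

print(beta_mean(*optimist))  # 0.75 -- different posteriors,
print(beta_mean(*skeptic))   # 0.45 -- despite identical evidence
```

The disagreement above is about whether the update rule itself (as opposed to the two starting pairs of counts) should also be called part of the "prior".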

1. If you find that you’re reluctant to permanently give up on to-do list items, “deprioritize” them instead

I hate the idea of deciding that something on my to-do list isn't that important, and then deleting it off my to-do list without actually doing it. Because once it's off my to-do list, then quite possibly I'll never think about it again. And what if it's actually worth doing? Or what if my priorities change such that it will be worth doing at some point in the future? Gahh!

On the other hand, if I never delete anything off my to-do list, it will grow to infinity.

The solution I’ve settled on is a priority-categorized to-do list, using a kanban-style online tool (e.g. Trello). The left couple columns (“lists”) are very active—i.e., to-do list...

  1. If you find that you’re reluctant to delete computer files / emails, don’t empty the trash

In Gmail I like to scan the email headers and then bulk-select and archive them (* a e, thanks to vim shortcuts). After 5 years of doing this I still haven't run out of the free storage in Gmail. I already let Gmail sort emails into "Primary", "Promotions", "Updates", etc. Usually the only important things are in "Primary", plus 1 or 2 in "Updates".

For the last month, @RobertM and I have been exploring the possible use of recommender systems on LessWrong. Today we launched our first site-wide experiment in that direction. 

Behold, a tab with recommendations!

(In the course of our efforts, we also hit upon a frontpage refactor that we reckon is pretty good: tabs instead of a clutter of different sections. For now, only for logged-in users. Logged-out users see the "Latest" tab, which is the same-as-usual list of posts.)

Why algorithmic recommendations?

A core value of LessWrong is to be timeless and not news-driven. However, the central algorithm by which attention allocation happens on the site is the Hacker News algorithm[1], which basically only shows you things that were posted recently, and creates a strong incentive for discussion to always be...

niplav

A core value of LessWrong is to be timeless and not news-driven.

I do really like the simplicity and predictability of the Hacker News algorithm. More karma means more visibility, older means less visibility.
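For reference, a commonly cited approximation of that ranking rule looks like the following (the constants are illustrative; LessWrong's actual variant differs in its details):

```python
def hn_score(karma: float, age_hours: float, gravity: float = 1.8) -> float:
    """Commonly cited approximation of the Hacker News ranking rule:
    more karma -> more visibility, older -> less visibility.
    Constants are illustrative; LessWrong's variant differs in detail."""
    return (karma - 1) / (age_hours + 2) ** gravity

print(hn_score(karma=20, age_hours=3))    # ~1.05: fresh post, modest karma
print(hn_score(karma=100, age_hours=48))  # ~0.09: older post, high karma
```

The timelessness tension is visible directly in the formula: after a couple of days, recency dominates karma.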

Our current goal is to produce a recommendations feed that both makes people feel like they're keeping up to date with what's new (something many people care about) and also suggests great reads from across LessWrong's entire archive.

I hope that we can avoid getting swallowed by Shoggoth for now by putting a lot of thought into our optimization

...
dr_s
I am sceptical of recommender systems - I think they are kind of bound to end up in self-reinforcing loops. I'd be happier with a more transparent system - we have tags, upvotes, the works, so you could have something like a series of "suggested searches", e.g. the most common combinations of tags you've visited, that a user has fast access to while also seeing precisely what it is they're clicking on. That said, I do trust this website of all things to acknowledge if things aren't going to plan and revert. If we fail to align this one small AI to our values, well, that's a valuable lesson.
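One way to make the "suggested searches" idea concrete (the tag names and visit data are made up for illustration; this is not how LessWrong computes anything):

```python
from collections import Counter
from itertools import combinations

# Hypothetical visit history: the set of tags on each post a user viewed.
visits = [
    {"AI", "Interpretability"},
    {"AI", "Governance"},
    {"AI", "Interpretability"},
    {"Rationality", "Practical"},
]

# The most common co-occurring tag pairs become transparent "suggested
# searches" the user can inspect and click on directly.
pair_counts = Counter(
    pair for tags in visits for pair in combinations(sorted(tags), 2)
)
print(pair_counts.most_common(2))
# [(('AI', 'Interpretability'), 2), (('AI', 'Governance'), 1)]
```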

This is the eighth post in my series on Anthropics. The previous one is Lessons from Failed Attempts to Model Sleeping Beauty Problem. The next one is Beauty and the Bets.

Introduction

Suppose we take the insights from the previous post, and directly try to construct a model for the Sleeping Beauty problem based on them.

We expect a halfer model, so

P(Heads) = 1/2

On the other hand, in order not to repeat Lewis' Model's mistakes:

P(Heads|Monday) = 1/2

But both of these statements can only be true if P(Monday) = 1.

And, therefore, apparently, P(Tuesday) has to be zero, which sounds obviously wrong. Surely the Beauty can be awakened on Tuesday!

At this point, I think, you wouldn't be surprised if I told you that there are philosophers who are eager to bite this bullet and claim that the Beauty should, indeed, reason as...

I knew that not any string of English words gets a probability, but I was naïve enough to think that all statements that are either true or false get one.

Well, I think this one is actually correct. But, as I said in the previous comment, the statement "Today is Monday" doesn't actually have a coherent truth value throughout the probability experiment. It's not either True or False. It's either True or True and False at the same time!

I was hoping that this sequence of posts, which kept saying "don't worry about anthropics, just be careful with the basics and

...

EDIT 1/27: This post neglects the entire sub-field of estimating uncertainty of learned representations, as in https://openreview.net/pdf?id=e9n4JjkmXZ. I might give that a separate follow-up post.

 

Introduction

Suppose you've built some AI model of human values. You input a situation, and it spits out a goodness rating. You might want to ask: "What are the error bars on this goodness rating?" In addition to it just being nice to know error bars, an uncertainty estimate can also be useful inside the AI: guiding active learning[1], correcting for the optimizer's curse[2], or doing out-of-distribution detection[3].
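As a minimal sketch of what such error bars could look like (an ensemble-style estimate with placeholder models and made-up names, not the method discussed in this post):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder "goodness raters": an ensemble of small models trained from
# different seeds or data subsets. Here they are just random linear heads.
ensemble = [rng.normal(size=8) for _ in range(5)]

def rate_with_error_bars(situation: np.ndarray) -> tuple[float, float]:
    """Return (mean goodness, std across the ensemble). High disagreement
    is a cheap proxy for 'outside the model's domain of validity'."""
    ratings = np.array([w @ situation for w in ensemble])
    return ratings.mean(), ratings.std()

situation = rng.normal(size=8)  # stand-in for a learned feature vector
mean, err = rate_with_error_bars(situation)
print(f"goodness = {mean:.2f} ± {err:.2f}")
```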

I recently got into the uncertainty estimation literature for neural networks (NNs) for a pet reason: I think it would be useful for alignment to quantify the domain of validity of an AI's latent features. If we...

This was a great post, thank you for making it!

I wanted to ask what you thought about the LLM-forecasting papers in relation to this literature. Do you think there are any ways of applying the uncertainty estimation literature to improve the forecasting ability of AI? For example:

https://arxiv.org/pdf/2402.18563.pdf

It was all quiet. Then it wasn’t.

Note the timestamps on both of these.

Dwarkesh Patel did a podcast with Mark Zuckerberg on the 18th. It was timed to coincide with the release of much of Llama-3, very much the approach of telling your story directly. Dwarkesh is now the true tech media. A meteoric rise, and well earned.

This is two related posts in one. First I cover the podcast, then I cover Llama-3 itself.

My notes are edited to incorporate context from later explorations of Llama-3, as I judged that the readability benefits exceeded the purity costs.

Podcast Notes: Llama-3 Capabilities

  1. (1:00) They start with Llama 3 and the new L3-powered version of Meta AI. Zuckerberg says “With Llama 3, we think now that Meta AI is the most intelligent, freely-available
...

Do you have any thoughts on whether it would make sense to push for a rule that forces open-source or open-weight models to be released behind an API for a certain amount of time before they can be released to the public?
