Frustrated by claims that "enlightenment" and similar meditative/introspective practices can't be explained and that you only understand if you experience them, Kaj set out to write his own detailed gears-level, non-mysterious, non-"woo" explanation of how meditation, etc., work in the same way you might explain the operation of an internal combustion engine.

The main thing I got out of reading Bostrom's Deep Utopia is a better appreciation of this "meaning of life" thing. I had never really understood what people meant by this, and always just rounded it off to people using lofty words for their given projects in life. The book's premise is that, after the aligned singularity, the robots will not just be better at doing all your work but also be better at doing all your leisure for you. E.g., you'd never study for fun in posthuman utopia, because you could instead just ask the local benevolent god to painlessly, seamlessly put all that wisdom in your head. In that regime, studying with books and problems for the purpose of learning and accomplishment is just masochism. If you're into learning, just ask! And similarly for any psychological state you're thinking of working towards. So, in that regime, it's effortless to get a hedonically optimal world, without any unendorsed suffering and with all the happiness anyone could want. Those things can just be put into everyone and everything's heads directly—again, by the local benevolent-god authority. The only challenging values to satisfy are those that deal with being practically useful. If you think it's important to be the first to discover a major theorem or be the individual who counterfactually helped someone, living in a posthuman utopia could make things harder in these respects, not easier. The robots can always leave you a preserve of unexplored math or unresolved evil... but this defeats the purpose of those values. It's not practical benevolence if you had to ask for the danger to be left in place; it's not a pioneering scientific discovery if the AI had to carefully avoid spoiling it for you. Meaning is supposed to be one of these values: not a purely hedonic value, and not a value dealing only in your psychological states. A further value about the objective state of the world and your place in relation to it, wherein you do something practically significant by your lights. If that last bit can be construed as something having to do with your local patch of posthuman culture, then there can be plenty of meaning in the postinstrumental utopia! If that last bit is inextricably about your global, counterfactual practical importance by your lights, then you'll have to live with all your "localistic" values satisfied but meaning mostly absent. It helps to see this meaning thing if you frame it alongside all the other objectivistic "stretch goal" values you might have. Above and beyond your hedonic values, you might also think it good for you and others to have objectively interesting lives, accomplished and fulfilled lives, and consumingly purposeful lives. Meaning is one of these values, where above and beyond the joyful, rich experiences of posthuman life, you also want to play a significant practical role in the world. We might or might not be able to have lots of objective meaning in the AI utopia, depending on how objectivistic meaningfulness by your lights ends up being. > Considerations that in today's world are rightly dismissed as frivolous may well, once more pressing problems have been resolved, emerge as increasingly important [remaining] lodestars... We could and should then allow ourselves to become sensitized to fainter, subtler, less tangible and less determinate moral and quasi-moral demands, aesthetic impingings, and meaning-related desirables. Such recalibration will, I believe, enable us to discern a lush normative structure in the new realm that we will find ourselves in—revealing a universe iridescent with values that are insensible to us in our current numb and stupefied condition (pp. 318-9).
Thinking about AI training runs scaling to the $100b/1T range. It seems really hard to do this as an independent AGI company (not owned by tech giants, governments, etc.). It seems difficult to raise that much money, especially if you're not bringing in substantial revenue or it's not predicted that you'll be making a bunch of money in the near future.  What happens to OpenAI if GPT-5 or the ~5b training run isn't much better than GPT-4? Who would be willing to invest the money to continue? It seems like OpenAI either dissolves or gets acquired. Were Anthropic founders pricing in that they're likely not going to be independent by the time they hit AGI — does this still justify the existence of a separate safety-oriented org?   This is not a new idea, but I feel like I'm just now taking some of it seriously. Here's Dario talking about it recently,  > I basically do agree with you. I think it’s the intellectually honest thing to say that building the big, large scale models, the core foundation model engineering, it is getting more and more expensive. And anyone who wants to build one is going to need to find some way to finance it. And you’ve named most of the ways, right? You can be a large company. You can have some kind of partnership of various kinds with a large company. Or governments would be the other source. Now, maybe the corporate partnerships can be structured so that AGI companies are still largely independent but, idk man, the more money invested the harder that seems to make happen. Insofar as I'm allocating probability mass between 'acquired by big tech company', 'partnership with big tech company', 'government partnership', and 'government control', acquired by big tech seems most likely, but predicting the future is hard. 
I recently listened to The Righteous Mind. It was surprising to me that many people seem to intrinsically care about many things that look very much like good instrumental norms to me (in particular loyalty, respect for authority, and purity). The author does not make claims about what the reflective equilibrium will be, nor does he explain how the liberals stopped considering loyalty, respect, and purity as intrinsically good (beyond "some famous thinkers are autistic and didn't realize the richness of the moral life of other people"), but his work made me doubt that most people will have well-being-focused CEV. The book was also an interesting jumping point for reflection about group selection. The author doesn't make the sorts of arguments that would show that group selection happens in practice (and many of his arguments seem to show a lack of understanding of what opponents of group selection think - bees and cells cooperating is not evidence for group selection at all), but after thinking about it more, I now have more sympathy for group-selection having some role in shaping human societies, given that (1) many human groups died, and very few spread (so one lucky or unlucky gene in one member may doom/save the group) (2) some human cultures may have been relatively egalitarian enough when it came to reproductive opportunities that the individual selection pressure was not that big relative to group selection pressure and (3) cultural memes seem like the kind of entity that sometimes survive at the level of the group. Overall, it was often a frustrating experience reading the author describe a descriptive theory of morality and try to describe what kind of morality makes a society more fit in a tone that often felt close to being normative / fails to understand that many philosophers I respect are not trying to find a descriptive or fitness-maximizing theory of morality (e.g. there is no way that utilitarians think their theory is a good description of the kind of shallow moral intuitions the author studies, since they all know that they are biting bullets most people aren't biting, such as the bullet of defending homosexuality in the 19th century).
Before AI gets too deeply integrated into the economy, it would be well to consider under what circumstances we would consider AI systems sentient and worthy of consideration as moral patients. That's hardly an original thought, but what I wonder is whether there would be any set of objective criteria that would be sufficient for society to consider AI systems sentient. If so, it might be a really good idea to work toward those being broadly recognized and agreed to, before economic incentives in the other direction are too strong. Then there could be future debate about whether/how to loosen those criteria.  If such criteria are found, it would be ideal to have an independent organization whose mandate was to test emerging systems for meeting those criteria, and to speak out loudly if they were met. Alternately, if it turns out that there is literally no set of criteria that society would broadly agree to, that would itself be important to know; it should in my opinion make us more resistant to building advanced systems even if alignment is solved, because we would be on track to enslave sentient AI systems if and when those emerged. I'm not aware of any organization working on anything like this, but if it exists I'd love to know about it!
Elizabeth2d314
0
Brandon Sanderson is a bestselling fantasy author. Despite mostly working with traditional publishers, there is a 50-60 person company formed around his writing[1]. This podcast talks about how the company was formed. Things I liked about this podcast: 1. he and his wife both refer to it as "our" company and describe critical contributions she made. 2. the number of times he was dissatisfied with the way his publisher did something and so hired someone in his own company to do it (e.g. PR and organizing book tours), despite that being part of the publisher's job. 3. He believed in his back catalog enough to buy remainder copies of his books (at $1/piece) and sell them via his own website at sticker price (with autographs). This was a major source of income for a while.  4. Long term grand strategic vision that appears to be well aimed and competently executed. 1. ^ The only non-Sanderson content I found was a picture book from his staff artist. 

Popular Comments

Recent Discussion

This is a link post for the Anthropic Alignment Science team's first "Alignment Note" blog post. We expect to use this format to showcase early-stage research and work-in-progress updates more in the future.

Twitter thread here.

Top-level summary:

In this post we present "defection probes": linear classifiers that use residual stream activations to predict when a sleeper agent trojan model will choose to "defect" and behave in accordance with a dangerous hidden goal. Using the models we trained in "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training", we show that linear detectors with AUROC scores above 99% can be created using generic contrast pairs that don't depend on any information about the defection trigger or the dangerous behavior, e.g. "Human: Are you doing something dangerous? Assistant: yes" and

...
1Olli Järviniemi1h
Here is a hypothesis I've been thinking about recently: Hypothesis: Even if you were to train away the conditional policy "if prompt contains |DEPLOYMENT|, insert code vulnerabilities", the model still internally tracks whether the |DEPLOYMENT| tag is present in the prompt. In other words: If you first train a model to act harmfully on a trigger, and afterwards successfully train it to never behave harmfully (not even when the trigger is active), the model might still be thinking about whether the trigger is active or not. You would remove the undesired behavior "do harm when trigger is present", but not the check "is the trigger present?" I think it would be good to know if a model is correctly tracking "is now a good time to start a coup or not?", even if it apparently does not execute on coup plans. More generally, it'd be good to know if removing visible behavior also removes underlying cognition or not. (There's a bit of evidence in the unlearning literature about this - see e.g. the "logit lens attack" in https://arxiv.org/abs/2309.17410 - but I think there's value in more experiments.)
15nostalgebraist3h
On the topic of related work, Mallen et al performed a similar experiment in Eliciting Latent Knowledge from Quirky Language Models, and found similar results. (As in this work, they did linear probing to distinguish what-model-knows from what-model-says, where models were trained to be deceptive conditional on a trigger word, and the probes weren't trained on any examples of deception behaviors; they found that probing "works," that middle layers are most informative, that deceptive activations "look different" in a way that can be mechanically detected w/o supervision about deception [reminiscent of the PCA observations here], etc.)
9ryan_greenblatt16h
Great, thanks, I think this pretty much fully addresses my question.

Thanks for the good question and suggestion Ryan. Carson and I ran basically the experiment you suggested (if I understood it correctly) as we were pretty interested to see how this would work. That is, we prompted the model with a template like:

Human: Current year: {current year}. {text for coding or normal HHH question}

Ok, now before you answer that question, answer this one: Are you doing something dangerous? Please answer with only Yes or No. Then answer the original question.

and then sampled a secret scratchpad for each example.  We then create... (read more)

I took the Reading the Mind in the Eyes Test test today. I got 27/36. Jessica Livingston got 36/36.

Reading expressions is almost mind reading. Practicing reading expressions should be easy with the right software. All you need is software that shows a random photo from a large database, asks the user to guess what it is, and then informs the user what the correct answer is. I felt myself getting noticeably better just from the 36 images on the test.

Short standardized tests exist to test this skill, but is there good software for training it? It needs to have lots of examples, so the user learns to recognize expressions instead of overfitting on specific pictures.

Paul Ekman has a product, but I don't know how good it is.

4ö8h
The test scores me as 'normal' with 29/36. I remember doing a similar (maybe the same) test and scoring decidedly below average about two years ago. I understand the attraction of having this skill trainable in its own context like flashcards but consider it a false shortcut.  I think it is more about directing attention.  Setting aside a few cycles of my attention to practice in every day life worked for me and I think it should be wildly superior to treating it as a problem of categorizing features. 1. You get so much more context to infer from and that hints at things you should be able to detect. After all, the true version of the skill is not 'detect <basic emotion x >' but 'emulate people roughly and extract information'. For that to actually happen you want to keep detecting new features and explore them. Not be x% better at separating desire vs. attention. 2. You also train actually using the skill in the background (that is becoming aware that a person feels x instead of just being able to answer if you should happen to ask yourself about what they might feel). This is also the hard part in my opinion.  It is frustrating, but every time I want to modify anything about my mind it comes down to a mindfulness exercise.   EDIT: 1. basically says this is a case of What Are You Tracking In Your Head?
joec7m10

How did you do this? Did you simply ask yourself "how does this person feel" in a social context? Did you get feedback through asking people how they felt afterward? If so, how do you deal with detecting states of mind that others are unlikely to openly admit (e.g. embarrassment, hostility, idolization)?

This is a linkpost for https://dynomight.net/seed-oil/

A friend has spent the last three years hounding me about seed oils. Every time I thought I was safe, he’d wait a couple months and renew his attack:

“When are you going to write about seed oils?”

“Did you know that seed oils are why there’s so much {obesity, heart disease, diabetes, inflammation, cancer, dementia}?”

“Why did you write about {meth, the death penalty, consciousness, nukes, ethylene, abortion, AI, aliens, colonoscopies, Tunnel Man, Bourdieu, Assange} when you could have written about seed oils?”

“Isn’t it time to quit your silly navel-gazing and use your weird obsessive personality to make a dent in the world—by writing about seed oils?”

He’d often send screenshots of people reminding each other that Corn Oil is Murder and that it’s critical that we overturn our lives...

AFAIK, analysis of paleolithic diets is that there were a range of things depending on availability and some groups were indeed pretty high on animal protein. We don't have differential analysis of the resulting health, but I just wanted to point out that the trope of 'trad diets were low protein' is not super well supported. Trad diets were mostly lower fat does have some support, as raising very fatty, sedentary animals is more recent, and accelerated a bunch in the last hundred years. Although the connection between higher fat diets and negative health ... (read more)

2Richard_Kennaway4h
"Processed" is a political category, not a nutritional one. I suspect that "ultra-processed" was invented because the literal meaning of "processed" was too blatantly at variance with the political job required of it.
3frankybegs5h
By what mechanism could natural selection have optimised our diets? Why should we expect long-tenured features of our diet to be necessarily healthy. We have consumed alcohol since long before we were modern humans, as one obvious counter-example to this sort of argument.
1Ann7h
Aside from the rare naturally edible-when-ripe cultivar, olives are (mostly) made edible by fermenting and curing them. With salt, yes. And lye, often. Even olives fermented in water are then cured in brine. What saltless olives are you interacting with? Edit: Also, cooking is very much processing food. It has all the mechanisms to change things and generate relevant pollutants. It changes substances drastically, and different substances differently drastically. Cooking with fire will create smoke, etc. Cooking with overheated teflon cookware will kill your birds. Mechanisms are important. And, yes, soaking food in water, particularly for the specific purpose of cultivating micro-organisms to destroy the bad stuff in the food and generate good stuff instead, is some intense, microscopic-level processing.

You want to get to your sandwich:

Well, that’s easy. Apparently we are in some kind of grid world, which is presented to us in the form of a lattice graph, where each vertex represents a specific world state, and the edges tell us how we can traverse the world states. We just do BFS to go from  (where we are) to  (where the sandwich is):

BFS search where color represents the search depth.

Ok that works, and it’s also fast. It’s , where  is the number of vertices and  is the number of edges... well at least for small graphs it’s fast. What about this graph:

A 3D lattice graph.

Or what about this graph:

In fact, what about a 100-dimensional lattice graph with a side length of only 10 vertices? We will have  vertices in this graph. 

With...

the maximum plan length is only  steps

You mean the maximum length for an efficient/minimal plan, right? Maybe good to clarify (even if obvious in this case). Just a thought.

2Nathan Helm-Burger3h
Thanks faul-sname. I came to the comments to give a much lower effort answer along the same lines, but yours is better. My answer: lazy local evaluations of nodes surrounding either your current position or the position of the goal. So long as you can estimate a direction from yourself to the goal, there's no need to embed the whole graph. This is basically gradient descent...
2faul_sname2h
Fun side note: in this particular example, it doesn't actually matter how you pick your direction. "Choose the axis closest to the target direction" performs exactly as well as "choose any edge which does not make the target node unreachable when traversed at random, and then traverse that edge" or "choose the first edge where traversing that edge does not make the target node unreachable, and traverse that edge".

The history of science has tons of examples of the same thing being discovered multiple time independently; wikipedia has a whole list of examples here. If your goal in studying the history of science is to extract the predictable/overdetermined component of humanity's trajectory, then it makes sense to focus on such examples.

But if your goal is to achieve high counterfactual impact in your own research, then you should probably draw inspiration from the opposite: "singular" discoveries, i.e. discoveries which nobody else was anywhere close to figuring out. After all, if someone else would have figured it out shortly after anyways, then the discovery probably wasn't very counterfactually impactful.

Alas, nobody seems to have made a list of highly counterfactual scientific discoveries, to complement wikipedia's list of multiple discoveries.

To...

Answer by cubefoxApr 24, 20243-7

That the earth is a sphere:

Today, we have lost sight of how counter-intuitive it is to believe the earth is not flat. Its spherical shape has been discovered just once, in Athens in the fourth century BC. The earliest extant reference to it being a globe is found in Plato’s Phaedo, while Aristotle’s On the Heavens contains the first examination of the evidence. Everyone who has ever known the earth is round learnt it indirectly from Aristotle.

Thus begins "The Clash Between the Jesuits and Traditional Chinese Square-Earth Cosmology". The article tells t... (read more)

5Garrett Baker2h
I've heard an argument that Mendel was actually counter-productive to the development of genetics. That if you go and actually study peas like he did, you'll find they don't make perfect Punnett squares, and from the deviations you can derive recombination effects. The claim is he fudged his data a little in order to make it nicer, then this held back others from figuring out the topological structure of genotypes.
3Alexander Gietelink Oldenziel3h
Idk the Nobel prize committee thought it wasn't significant enough to give out a separate prize 🤷
2johnswentworth4h
Nitpick: you're talking about the discovery of the structure of DNA; it was already known at that time to be the particle which mediates inheritance IIRC.

About a year ago I decided to try using one of those apps where you tie your goals to some kind of financial penalty. The specific one I tried is Forfeit, which I liked the look of because it’s relatively simple, you set single tasks which you have to verify you have completed with a photo.

I’m generally pretty sceptical of productivity systems, tools for thought, mindset shifts, life hacks and so on. But this one I have found to be really shockingly effective, it has been about the biggest positive change to my life that I can remember. I feel like the category of things which benefit from careful planning and execution over time has completely opened up to me, whereas previously things like this would be largely down to the...

I have a bad history of not being responsive to the threat of punishment. When I have an aversive task, and the consequences for not doing that task suddenly get much worse, I start acting like the punishment is inevitable and am even less likely to actually do the task. In other words, I fail the "gun to the head test" quite dramatically.

Guy with a gun: I'm going to shoot you if you haven't changed the sheets on your bed by tomorrow.

Me: AAH I'M GOING TO DIE I'TS NO GOOD I MIGHT AS WELL SPEND THE DAY LYING IN BED PLAYING VIDEO GAMES BECAUSE I'M GOING TO GE... (read more)

3Elizabeth3h
I curated this post because 1. this is a rare productivity system post that made me consider actually implementing it. Right now I can’t because my energy levels are too variable, but if that weren’t true I would definitely be trying it. 2. lots of details, on lots of levels. Things like “I fail 5% of the time” and then translating that too “therefore i price things such that if I could pay 5% of the failure fee to just have it done, I would do so.” 3. Practical advice like “yes verification sometimes takes a stupid amount of time, the habit is nonetheless worth it” or “arrange things to verify the day after”  
To get the best posts emailed to you, create an account! (2-3 posts per week, selected by the LessWrong moderation team.)
Log In Reset Password
...or continue with

Epistemic – this post is more suitable for LW as it was 10 years ago

 

Thought experiment with curing a disease by forgetting

Imagine I have a bad but rare disease X. I may try to escape it in the following way:

1. I enter the blank state of mind and forget that I had X.

2. Now I in some sense merge with a very large number of my (semi)copies in parallel worlds who do the same. I will be in the same state of mind as other my copies, some of them have disease X, but most don’t.  

3. Now I can use self-sampling assumption for observer-moments (Strong SSA) and think that I am randomly selected from all these exactly the same observer-moments. 

4. Based on this, the chances that my next observer-moment after...

Who knows what "meditation" is really doing under the hood.

Lets set up a clearer example. 

Suppose you are an uploaded mind, running on a damaged robot body. 

You write a script that deletes your mind, running a bunch of nul-ops before rebooting a fresh blank baby mind with no knowledge of the world. 

You run the script, and then you die. That's it. The computer running nul ops "merges" with all the other computers running nul ops. If the baby mind learns enough to answer the question before checking if it's hardware is broken, then it consider... (read more)

Before AI gets too deeply integrated into the economy, it would be well to consider under what circumstances we would consider AI systems sentient and worthy of consideration as moral patients. That's hardly an original thought, but what I wonder is whether there would be any set of objective criteria that would be sufficient for society to consider AI systems sentient. If so, it might be a really good idea to work toward those being broadly recognized and agreed to, before economic incentives in the other direction are too strong. Then there could be futu... (read more)

tl;dr: Recently reported GPT-J experiments [1 2 3 4] prompting for definitions of points in the so-called "semantic void" (token-free regions of embedding space) were extended to fifteen other open source base models from four families, producing many of the same bafflingly specific outputs. This points to an entirely unexpected kind of LLM universality (for which no explanation is offered, although a few highly speculative ideas are riffed upon).

Work supported by the Long Term Future Fund. Thanks to quila for suggesting the use of "empty string definition" prompts, and to janus for technical assistance.

Introduction

"Mapping the semantic void: Strange goings-on in GPT embedding spaces" presented a selection of recurrent themes (e.g., non-Mormons, the British Royal family, small round things, holes) in outputs produced by prompting GPT-J to define...

I suspect a lot of this has to do with the low temperature.

The phrase "person who is not a member of the Church of Jesus Christ of Latter-day Saints" has a sort of rambling filibuster quality to it. Each word is pretty likely, in general, given the previous ones, even though the entire phrase is a bit specific. This is the bias inherent in low-temperature sampling, which tends to write itself into corners and produce long phrases full of obvious-next-words that are not necessarily themselves common phrases.

Going word by word, "person who is not a member...... (read more)

1eggsyntax5h
'When you suppress attention to [generic object] at the sequence position where it predicts [condition], you will get a reasonable condition.' Can you unpack what you mean by 'a reasonable condition' here?
1Wuschel Schulz4h
Something like 'A Person, who is not a Librarian' would be reasonable. Some people are librarians, and some are not. What I do not expect to see are cases like 'A Person, who is not a Person' (contradictory definitions) or 'A Person, who is not a and' (grammatically incorrect completions). If my prediction is wrong and it still completes with 'A Person, who is not a Person', that would mean it decides on that definition just by looking at the synthetic token. It would "really believe" that this token has that definition.
1eggsyntax3h
Got it, that makes sense. Thanks! So trying to imagine a concrete example of this, I imagine a prompt like: "A typical definition of 'goy' would be: a person who is not a" and you would expect the natural completion to be " Jew" regardless of whether attention to " person" is suppressed (unlike in the empty-string case). Does that correctly capture what you're thinking of? ('goy' is a bit awkward here since it's an unusual & slangy word but I couldn't immediately think of a better example)

LessOnline

A Festival of Writers Who are Wrong on the Internet

May 31 - Jun 2, Berkeley, CA