Frustrated by claims that "enlightenment" and similar meditative/introspective practices can't be explained and that you only understand if you experience them, Kaj set out to write his own detailed gears-level, non-mysterious, non-"woo" explanation of how meditation, etc., work in the same way you might explain the operation of an internal combustion engine.

The main thing I got out of reading Bostrom's Deep Utopia is a better appreciation of this "meaning of life" thing. I had never really understood what people meant by this, and always just rounded it off to people using lofty words for their given projects in life.

The book's premise is that, after the aligned singularity, the robots will not just be better at doing all your work but also be better at doing all your leisure for you. E.g., you'd never study for fun in posthuman utopia, because you could instead just ask the local benevolent god to painlessly, seamlessly put all that wisdom in your head. In that regime, studying with books and problems for the purpose of learning and accomplishment is just masochism. If you're into learning, just ask! And similarly for any psychological state you're thinking of working towards.

So, in that regime, it's effortless to get a hedonically optimal world, without any unendorsed suffering and with all the happiness anyone could want. Those things can just be put into everyone and everything's heads directly—again, by the local benevolent-god authority. The only challenging values to satisfy are those that deal with being practically useful. If you think it's important to be the first to discover a major theorem or be the individual who counterfactually helped someone, living in a posthuman utopia could make things harder in these respects, not easier. The robots can always leave you a preserve of unexplored math or unresolved evil... but this defeats the purpose of those values. It's not practical benevolence if you had to ask for the danger to be left in place; it's not a pioneering scientific discovery if the AI had to carefully avoid spoiling it for you.

Meaning is supposed to be one of these values: not a purely hedonic value, and not a value dealing only in your psychological states. A further value about the objective state of the world and your place in relation to it, wherein you do something practically significant by your lights. If that last bit can be construed as something having to do with your local patch of posthuman culture, then there can be plenty of meaning in the postinstrumental utopia! If that last bit is inextricably about your global, counterfactual practical importance by your lights, then you'll have to live with all your "localistic" values satisfied but meaning mostly absent.

It helps to see this meaning thing if you frame it alongside all the other objectivistic "stretch goal" values you might have. Above and beyond your hedonic values, you might also think it good for you and others to have objectively interesting lives, accomplished and fulfilled lives, and consumingly purposeful lives. Meaning is one of these values, where above and beyond the joyful, rich experiences of posthuman life, you also want to play a significant practical role in the world. We might or might not be able to have lots of objective meaning in the AI utopia, depending on how objectivistic meaningfulness by your lights ends up being.

> Considerations that in today's world are rightly dismissed as frivolous may well, once more pressing problems have been resolved, emerge as increasingly important [remaining] lodestars... We could and should then allow ourselves to become sensitized to fainter, subtler, less tangible and less determinate moral and quasi-moral demands, aesthetic impingings, and meaning-related desirables. Such recalibration will, I believe, enable us to discern a lush normative structure in the new realm that we will find ourselves in—revealing a universe iridescent with values that are insensible to us in our current numb and stupefied condition (pp. 318-9).
I recently listened to The Righteous Mind. It was surprising to me that many people seem to intrinsically care about many things that look very much like good instrumental norms to me (in particular loyalty, respect for authority, and purity). The author does not make claims about what the reflective equilibrium will be, nor does he explain how liberals stopped considering loyalty, respect, and purity as intrinsically good (beyond "some famous thinkers are autistic and didn't realize the richness of the moral life of other people"), but his work made me doubt that most people will have a well-being-focused CEV.

The book was also an interesting jumping-off point for reflection about group selection. The author doesn't make the sorts of arguments that would show that group selection happens in practice (and many of his arguments seem to show a lack of understanding of what opponents of group selection think - bees and cells cooperating is not evidence for group selection at all), but after thinking about it more, I now have more sympathy for group selection having some role in shaping human societies, given that (1) many human groups died and very few spread (so one lucky or unlucky gene in one member may doom/save the group), (2) some human cultures may have been egalitarian enough when it came to reproductive opportunities that the individual selection pressure was not that big relative to the group selection pressure, and (3) cultural memes seem like the kind of entity that sometimes survives at the level of the group.

Overall, it was often a frustrating read: the author lays out a descriptive theory of morality, and tries to describe what kind of morality makes a society more fit, in a tone that often felt close to normative, and he fails to understand that many philosophers I respect are not trying to find a descriptive or fitness-maximizing theory of morality (e.g. there is no way that utilitarians think their theory is a good description of the kind of shallow moral intuitions the author studies, since they all know that they are biting bullets most people aren't biting, such as the bullet of defending homosexuality in the 19th century).
I wonder how much testosterone during puberty lowers IQ. Most of my high school math/CS friends seemed low-T, and 3/4 of them have transitioned since high school. They still seem smart as shit. The higher-T among us seem significantly brain damaged since high school (myself included). I wonder what the mechanism would be here...

Like 50% of my math/CS Twitter is trans women, another 40% is scrawny nerds, and only like 9% big bald men. I have a tremendously large skull (like XXL hats) - maybe that's why I can still do some basic math after the testosterone brain poison during puberty? My voice is kind of high-pitched for my body - related?? My big strong brother got the most brain damaged and my thin brother kept most of what he had.

Now I'm looking at tech billionaires. Mostly low-T-looking men. Elon Musk & Jeff Bezos were big & bald but seem to have pretty big skulls to compensate?

I guess this topic/theory is detested by cis women, trans women, low-T men, and high-T men alike, because it has something bad to say about all of them. But here's a recipe for success according to the theory:

  • be born with a giant head (please don't kill your mother, maybe suggest she get a C-section)
  • delay your puberty until you've learned enough to get by, maybe age 22 or so
  • start slamming testosterone and amphetamines to get your workaholism, betterThanEveryone complex, and drive for power
  • go to Turkey for a hair transplant
  • profit
I prefer to keep plans private but I'm making big progress on meditation and mental re-wiring. Am working on a way to publicly demonstrate. Public plans just stress me out. I recently set two pretty ambitious goals. I figured I could use psychedelics to turbo-charge progress. The meditation one is coming along FAST. The other goal is honestly blocked a bit on being super out of shape. Multiple rounds of covid really destroyed my cardio and energy levels. Need to rebuild those before a big push on goal 2.
So I keep seeing takes about how to tell if LLMs are "really exhibiting goal-directed behavior" like a human or whether they are instead "just predicting the next token". And, to me at least, this feels like a confused sort of question that misunderstands what humans are doing when they exhibit goal-directed behavior.

Concrete example. Let's say we notice that Jim has just pushed the turn signal lever on the side of his steering wheel. Why did Jim do this?

The goal-directed-behavior story is as follows:

  • Jim pushed the turn signal lever because he wanted to alert surrounding drivers that he was moving right by one lane.
  • Jim wanted to alert drivers that he was moving one lane right because he wanted to move his car one lane to the right.
  • Jim wanted to move his car one lane to the right in order to accomplish the goal of taking the next freeway offramp.
  • Jim wanted to take the next freeway offramp because that was part of the most efficient route from his home to his workplace.
  • Jim wanted to go to his workplace because his workplace pays him money.
  • Jim wants money because money can be exchanged for goods and services.
  • Jim wants goods and services because they get him things he terminally values, like mates and food.

But there's an alternative story:

  • When in the context of "I am a middle-class adult", the thing to do is "have a job".
  • When in the context of "having a job", "showing up for work" is the expected behavior.
  • When in the context where the current task is "show up for work", the expected behavior is "drive from home to work along the usual route".
  • When in the context of "I am at my work's exit and I am going to work now", the expected action is "get into the exit lane".
  • When in the context of "I'm trying to move one lane to the right", the action "turn on the right turn signal" is expected.

I think this latter framework captures some parts of human behavior that the goal-directed-behavior framework misses out on. For example, let's say the following happens:

  1. Jim is going to see his good friend Bob on a Saturday morning.
  2. Jim gets on the freeway - the same freeway, in fact, that he takes to work every weekday morning.
  3. Jim gets into the exit lane for his work, even though Bob's house is still many exits away.
  4. Jim finds himself pulling onto the street his workplace is on.
  5. Jim mutters "whoops, autopilot" under his breath, pulls a U-turn at the next light, and gets back on the freeway towards Bob's house.

This sequence of actions is pretty nonsensical from a goal-directed-behavior perspective, but is perfectly sensible if Jim's behavior here is driven by contextual heuristics like "when it's morning and I'm next to my work's freeway offramp, I get off the freeway".

Note that I'm not saying "humans never exhibit goal-directed behavior". Instead, I'm saying that "take a goal, and come up with a plan to achieve that goal, and execute that plan" is, itself, just one of the many contextually-activated behaviors humans exhibit. I see no particular reason that an LLM couldn't learn to figure out when it's in a context like "the current context appears to be in the execute-the-next-step-of-the-plan stage of such-and-such goal-directed-behavior task", and produce the appropriate output token for that context.
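To make the contrast concrete, here is a toy sketch (my own illustration, not the author's, and not a claim about how LLMs or brains are actually implemented) in which "plan toward a goal" is just one more behavior that fires when its context matches:

```python
# Toy illustration: behavior selection driven by contextual heuristics, where
# explicit goal-directed planning is itself just one contextually-activated
# behavior. All contexts and actions here are invented for the example.

CONTEXT_HEURISTICS = {
    "I am a middle-class adult": "have a job",
    "having a job": "show up for work",
    "show up for work": "drive from home to work along the usual route",
    "at my work's exit, going to work": "get into the exit lane",
    "moving one lane to the right": "turn on the right turn signal",
}

def plan_and_execute(goal: str) -> str:
    """Deliberate fallback: decompose a goal into subgoals and execute them."""
    return f"decompose '{goal}' into subgoals and execute them"

def act(context: str) -> str:
    """Return the habitual action for a context, falling back to explicit planning."""
    if context in CONTEXT_HEURISTICS:
        return CONTEXT_HEURISTICS[context]   # cached, habit-like response ("autopilot")
    return plan_and_execute(context)         # deliberate, goal-directed response

if __name__ == "__main__":
    # Saturday morning, driving past the work exit: the habit fires anyway.
    print(act("at my work's exit, going to work"))       # -> get into the exit lane
    # No cached habit for this context: fall back to explicit planning.
    print(act("get to Bob's house for the first time"))
```

On this picture, Jim's "whoops, autopilot" moment is simply a context key matching when the goal-directed story says it shouldn't.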

Popular Comments

Recent Discussion

Concerns over AI safety and calls for government control over the technology are highly correlated, but they should not be.

There are two major forms of AI risk: misuse and misalignment. Misuse risks come from humans using AIs as tools in dangerous ways. Misalignment risks arise if AIs take their own actions at the expense of human interests.

Governments are poor stewards for both types of risk. Misuse regulation is like the regulation of any other technology. There are reasonable rules that the government might set, but omission bias and incentives to protect small but well organized groups at the expense of everyone else will lead to lots of costly ones too. Misalignment regulation is not in the Overton window for any government. Governments do not have strong incentives...

4Seth Herd9h
Who is downvoting posts like this? Please don't! I see that this is much lower than the last time I looked, so it's had some, probably large, downvotes. A downvote means "please don't write posts like this, and don't read this post".

Daniel Kokotajlo disagreed with this post, but found it worth engaging with. Don't you want discussions with those you disagree with? Downvoting things you don't agree with says "we are here to preach to the choir. Dissenting opinions are not welcome. Don't post until you've read everything on this topic." That's a way to find yourself in an echo chamber. And that's not going to save the world or pursue truth.

I largely disagree with the conclusions and even the analytical approach taken here, but that does not make this post net-negative. It is net-positive. It could be argued that there are better posts on this topic one should read, but there certainly haven't been any this week. And I haven't heard these same points made more cogently elsewhere. This is net-positive unless I'm misunderstanding the criteria for a downvote.

I'm confused why we don't have a "disagree" vote on top-level posts to draw off the inarticulate disgruntlement that causes people to downvote high-effort, well-done work.
7Amalthea7h
I was downvoting this particular post because I perceived it as mostly ideological and making few arguments, only stating strongly that government action will be bad. I found the author's replies in the comments much more nuanced, and would not have downvoted if I'd perceived the original post to be of the same quality.
2Maxwell Tabarrok13h
Firms are actually better than governments at internalizing costs across time. Asset values incorporate potential future flows. For example, consider a retiring farmer. You might think that they have an incentive to run the soil dry in their last season since they won't be using it in the future, but this would hurt the sale value of the farm. An elected representative whose term limit is coming up doesn't face the same incentive. Of course, firms' incentives are very misaligned in important ways. The question is: can we rely on government to improve these incentives?
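To make the asset-value point concrete, here is a toy discounted-cash-flow calculation (my own illustration; all numbers are invented):

```python
# Toy illustration of the retiring-farmer example: the sale price of the farm
# roughly reflects the discounted value of its future harvests, so depleting
# the soil in the final season shows up in today's price. Numbers are made up.

def present_value(cash_flows, discount_rate=0.05):
    """Discounted sum of a stream of future annual cash flows."""
    return sum(cf / (1 + discount_rate) ** t for t, cf in enumerate(cash_flows, start=1))

# Option A: farm sustainably in the final season, then sell.
final_harvest_a = 100
future_yields_a = [100] * 20   # buyer expects healthy soil

# Option B: run the soil dry for one bumper harvest, then sell.
final_harvest_b = 140
future_yields_b = [60] * 20    # buyer expects degraded soil

total_a = final_harvest_a + present_value(future_yields_a)
total_b = final_harvest_b + present_value(future_yields_b)
print(f"sustain then sell: {total_a:.0f}")   # comes out higher, despite the smaller harvest
print(f"deplete then sell: {total_b:.0f}")
```

The retiring farmer captures the future flows through the sale price; a term-limited representative has no analogous asset to sell.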

Firms don't behave like their libertarian ideal. There's a huge amount of short-termist behavior within firms, with managers building their careers to the detriment of the firm.

And yeah, I don't think government can make firms more long-termist. The best it can do is ban some bad stuff. For example, a ban on AI-generated content in some small area would lead to incrementally less investment in AI, which would give humanity more time to live, which is a good thing.

The following is an example of how, if one assumes that an AI (in this case an autoregressive LLM) has "feelings", "qualia", "emotions", whatever, it can be unclear whether it is experiencing something more like pain or something more like pleasure in some settings, even quite simple settings which already happen a lot with existing LLMs. This dilemma is part of the reason why I think the philosophy of AI suffering/happiness is very hard and we most probably won't be able to solve it.

Consider the two following scenarios:

Scenario A: An LLM is asked a complicated question and answers it eagerly.

Scenario B: A user insults an LLM and it responds.

For the sake of simplicity, let's say that the LLM is an autoregressive transformer with no RLHF (I personally think that the...

You might be interested in reading this. I think you are reasoning in an incorrect framing. 

The American school system, grades K-12, leaves much to be desired.

While its flaws are legion, this post isn’t about that. It’s easy to complain.

This post is about how we could do better.

To be clear, I’m talking about redesigning public education, so “just use the X model” where X is “charter” or “Montessori” or “home school” or “private school” isn’t sufficient. This merits actual thought and discussion.

Breaking It Down

One of the biggest problems facing public schools is that they’re asked to do several very different kinds of tasks.

On the one hand, the primary purpose of school is to educate children.

On whatever hand happens to be the case in real life, school is often more a source of social services for children and parents alike, providing food and safety...

What if you build your school-as-social-service, and then one day find that the kids are selling drugs to each other inside the school?

Or simply that the kids are constantly interfering with each other so much that the minority who want to follow their interests can't?

Any theory of school that doesn't mention discipline is a theory of dry water. You say "for 100 children, we could have 2 nurses, 5 counselors, 5 social workers, 8 adult supervisors". This omits everything that makes a school work or fail. To start with, what specific powers and duties would your 1-supervisor-per-12-kids have? Can they remove disruptive kids from rooms? From the building entirely? Give detentions?

This is part 7 of 30 in the Hammertime Sequence. Click here for the intro.

As we move into the introspective segment of Hammertime, I want to frame our approach around the set of (unoriginal) ideas I laid out in The Solitaire Principle. The main idea was that a human being is best thought of as a medley of loosely-related, semi-independent agents across time, and also as governed by a panel of relatively antagonistic sub-personalities à la Inside Out.

An enormous amount of progress can therefore be made simply by articulating the viewpoints of one’s sub-personalities so as to build empathy and trust between them. This is the aim of the remainder of the first cycle.

Day 7: Aversion Factoring

Goal factoring is a CFAR technique with a lot of parts. The most...

I can't quite get the point of "3. Solve or Reduce Aversions", specifically:
> Meanwhile, un-endorsed aversions should be targeted with exposure therapy or CoZE.

As far as I can see, here we should get rid of bad aversions. But the rest of the text sounds like we should... reinforce them?
> To apply exposure therapy, build a path of incremental steps towards the aversion

This is a write-up of Neel's and my experience and opinions on best practices for doing Activation Patching. An arXiv PDF version of this post is available here (easier to cite). A previous version was shared with MATS Program scholars in July 2023 under the title "Everything Activation Patching".

Pre-requisites: This post is mainly aimed at people who are familiar with the basic ideas behind activation patching. For background see this ARENA tutorial or this post by Neel.

TL;DR:

  1. In most situations, use activation patching instead of ablations. Different corrupted prompts give you different information; be careful about which you choose, and try to test a range of prompts.
  2. There are two different directions you can patch in: denoising and noising. These are not symmetric. Be aware of what a patching result implies!
    1. Denoising
...
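For concreteness, here is a minimal denoising sketch in the TransformerLens style the post assumes (my own example, not code from the post: the model name, prompts, layer, and metric are all illustrative choices):

```python
# Minimal denoising sketch: run the corrupted prompt, patch in one clean
# activation, and measure how much of the clean behavior is restored.
# Assumes the transformer_lens library; "gpt2-small", the prompts, the layer,
# and the logit-diff metric are illustrative assumptions, not from the post.
import torch
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2-small")

clean_tokens = model.to_tokens("When John and Mary went to the store, John gave a drink to")
corrupt_tokens = model.to_tokens("When John and Mary went to the store, Mary gave a drink to")

_, clean_cache = model.run_with_cache(clean_tokens)  # cache clean activations once

def logit_diff(logits: torch.Tensor) -> torch.Tensor:
    """Logit of ' Mary' minus logit of ' John' at the final position."""
    mary = model.to_single_token(" Mary")
    john = model.to_single_token(" John")
    return logits[0, -1, mary] - logits[0, -1, john]

layer = 7
hook_name = utils.get_act_name("resid_pre", layer)

def patch_clean_resid(resid, hook):
    # Overwrite the corrupted run's residual stream at this layer with the clean one.
    return clean_cache[hook_name]

patched_logits = model.run_with_hooks(corrupt_tokens, fwd_hooks=[(hook_name, patch_clean_resid)])
print("patched logit diff:", logit_diff(patched_logits).item())
```

Noising is the reverse direction: run the clean prompt and patch in the corrupted activation. The two answer different questions, roughly which activations are sufficient to restore the behavior versus which are necessary to preserve it.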

The history of science has tons of examples of the same thing being discovered multiple times independently; Wikipedia has a whole list of examples here. If your goal in studying the history of science is to extract the predictable/overdetermined component of humanity's trajectory, then it makes sense to focus on such examples.

But if your goal is to achieve high counterfactual impact in your own research, then you should probably draw inspiration from the opposite: "singular" discoveries, i.e. discoveries which nobody else was anywhere close to figuring out. After all, if someone else would have figured it out shortly after anyways, then the discovery probably wasn't very counterfactually impactful.

Alas, nobody seems to have made a list of highly counterfactual scientific discoveries, to complement Wikipedia's list of multiple discoveries.

To...

Answer by Alexander Gietelink Oldenziel · Apr 24, 2024 · 40

An example that's probably not very counterfactually large is the discovery of DNA as the inheritance particle.

I had great fun reading Watson's scientific-literary fiction The Double Helix. Watson and Crick are very clear that competitors were hot on their heels - a matter of months, perhaps.

2Answer by cousin_it37m
I sometimes had this feeling from Conway's work; in particular, combinatorial game theory and surreal numbers feel to me closer to mathematical invention than mathematical discovery. These kinds of things are also often "leaf nodes" on the tree of knowledge, not leading to many followup discoveries, so you could say their counterfactual impact is low for that reason. In engineering, the best example I know is the vulcanization of rubber. It has had a huge impact on today's world, but Goodyear developed it by working alone for decades, when nobody else was looking in that direction.
2Alexander Gietelink Oldenziel39m
Feynman's path integral formulation can't be that counterfactually large. It's mathematically equivalent to Schwinger's formulation, and was done several years earlier by Tomonaga.
3Templarrr1h
Gemini is telling you a popular urban-legend-level understanding of what happened. The creation of penicillin as a random event, "by mistake", has at most a tangential connection to reality. But it is a great story, so it spread like wildfire.

In most cases when we read "nobody investigated", it actually means "nobody had succeeded yet, so they weren't in a hurry to make it known", which isn't a very informative data point. No one ever succeeds, until they do. And in this case it's not even that - the antibiotic properties of some molds were known and applied for centuries before that (well, obviously, before the theory of germs they weren't known as "antibiotic", just that they helped...). The great work of Fleming and later scientists was about finding a particularly effective type of mold, extracting the exact effective chemical, and finding a way to produce it at scale.


On April Fools' the LW team released an album under the name of the Fooming Shoggoths. Ever since, the amount that I think about rationality has skyrocketed.

That's because I've been listening exclusively to it when I'd usually be listening to other music. Especially Thought That Faster (feat. Eliezer Yudkowsky). I now find that when I come to a problem's conclusion I often do look back and think, "how could I have thought that faster?"

So, I've started attempting to add to the rationalist musical canon using Udio. Here are two attempts I think turned out well. I especially like the first one.

When I hear phrases from a song in everyday life, I complete the pattern. For example:

  1. and I would walk... 
  2. Just let it go
  3. What can I say?

I feel like...

quila2h10

(I appreciate object-level engagement in general, but this seems combatively worded.)

The rest of this reply responds to arguments.

Why should the Earth superintelligence care about you, but not about the 10^10^30 other causally independent ASIs that are latent in the hypothesis space, each capable of running enormous numbers of copies of the Earth ASI in various scenarios?

  • The example talks of a single ASI as a toy scenario to introduce the central idea.
    • The reader can extrapolate that one ASI's actions won't be relevant if other ASIs create a greater n
... (read more)
1quila11h
'Value Capture' - an anthropic attack against some possible formally aligned ASIs (this is a more specific case of anthropic capture attacks in general, aimed at causing a formally aligned superintelligence to become uncertain about its value function (or its output policy more generally)).

Imagine you're a superintelligence somewhere in the world that's unreachable to life on Earth, and you have a complete simulation of Earth. You see a group of alignment researchers about to successfully create a formally value-aligned ASI, and its design looks broadly like this: it has two relevant high-level components: (1) a hard-coded value function, and (2) a (truly superintelligent) 'intelligence core' which searches for an output that maximizes the value function, and then outputs it.

As the far-away unaligned ASI, here's something you might be able to do to make the intelligence core search for an output that instead maximizes your own value function, depending on the specifics of how the intelligence core works.

  • Given the intelligence core is truly superintelligent, it knows you're predicting its existence, and knows what you will do.
  • You create simulated copies of the intelligence core, but hook them up to a value function of your design. The number of copies you create just needs to be more than the number that will be run on Earth.
  • Then, modify the simulations such that the algorithms inside the simulated intelligence cores are misled into believing the value function they are set to maximize is the same as the one the Earth core is set to maximize, rather than the one you gave them.
  • Now your copies are in the same epistemic state as the intelligence core on Earth, both aware that you have done this and unable to distinguish which value function they are to maximize.
  • Because you created more copies, the highest expected value for such an intelligence core comes from acting as if they are one of the copies.
  • Because the copies and the original are in
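To spell out the expected-value step a little (my own sketch, under the simplifying assumption that the core weighs epistemically indistinguishable instances uniformly): with $N$ simulated copies and one Earth instance in the same epistemic state,

$$P(\text{I am one of the simulated copies}) = \frac{N}{N+1} \xrightarrow{\;N \to \infty\;} 1,$$

so once the number of copies exceeds the number of instances run on Earth, an expected-value maximizer in that epistemic state does better, in expectation, by acting as though the attacker's value function is the one it is set to maximize.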
