Frustrated by claims that "enlightenment" and similar meditative/introspective practices can't be explained and that you only understand if you experience them, Kaj set out to write his own detailed gears-level, non-mysterious, non-"woo" explanation of how meditation, etc., work in the same way you might explain the operation of an internal combustion engine.
Concerns over AI safety and calls for government control over the technology are highly correlated, but they should not be.
There are two major forms of AI risk: misuse and misalignment. Misuse risks come from humans using AIs as tools in dangerous ways. Misalignment risks arise if AIs take their own actions at the expense of human interests.
Governments are poor stewards for both types of risk. Misuse regulation is like the regulation of any other technology: there are reasonable rules that the government might set, but omission bias and incentives to protect small but well-organized groups at the expense of everyone else will lead to lots of costly ones too. Misalignment regulation is not in the Overton window for any government. Governments do not have strong incentives...
Firms don't behave like their libertarian ideal. There's a huge amount of short-termist behavior within firms, and of managers building their careers to the detriment of the firms they work for.
And yeah, I don't think government can make firms more long-termist. The best it can do is ban some bad stuff. For example, a ban on AI-generated content in some small area would lead to incrementally less investment in AI, which would give humanity more time to live, which is a good thing.
The following is an example of how, if one assumes that an AI (in this case an autoregressive LLM) has "feelings", "qualia", "emotions", whatever, it can be unclear whether it is experiencing something more like pain or something more like pleasure, even in quite simple settings that already come up a lot with existing LLMs. This dilemma is part of the reason I think the philosophy of AI suffering/happiness is very hard and we most probably won't be able to solve it.
Consider the two following scenarios:
Scenario A: An LLM is asked a complicated question and answers it eagerly.
Scenario B: A user insults an LLM and it responds.
For the sake of simplicity, let's say that the LLM is an autoregressive transformer with no RLHF (I personally think that the...
The American school system, grades K-12, leaves much to be desired.
While its flaws are legion, this post isn’t about that. It’s easy to complain.
This post is about how we could do better.
To be clear, I’m talking about redesigning public education, so “just use the X model” where X is “charter” or “Montessori” or “home school” or “private school” isn’t sufficient. This merits actual thought and discussion.
One of the biggest problems facing public schools is that they’re asked to do several very different kinds of tasks.
On the one hand, the primary purpose of school is to educate children.
On whatever hand happens to be the case in real life, school is often more a source of social services for children and parents alike, providing food and safety...
What if you build your school-as-social-service, and then one day find that the kids are selling drugs to each other inside the school?
Or simply that the kids are constantly interfering with each other so much that the minority who want to follow their interests can't?
Any theory of school that doesn't mention discipline is a theory of dry water. You say "for 100 children, we could have 2 nurses, 5 counselors, 5 social workers, 8 adult supervisors". This omits everything that makes a school work or fail. To start with, what specific powers and duties would your 1-supervisor-per-12-kids have? Can they remove disruptive kids from rooms? From the building entirely? Give detentions?
This is part 7 of 30 in the Hammertime Sequence. Click here for the intro.
As we move into the introspective segment of Hammertime, I want to frame our approach around the set of (unoriginal) ideas I laid out in The Solitaire Principle. The main idea was that a human being is best thought of as a medley of loosely-related, semi-independent agents across time, and also as governed by a panel of relatively antagonistic sub-personalities à la Inside Out.
An enormous amount of progress can therefore be made simply by articulating the viewpoints of one’s sub-personalities so as to build empathy and trust between them. This is the aim of the remainder of the first cycle.
Goal factoring is a CFAR technique with a lot of parts. The most...
I really can't get the point of "3. Solve or Reduce Aversions", specifically:
> Meanwhile, un-endorsed aversions should be targeted with exposure therapy or CoZE.
As far as I can see, we should get rid of bad aversions here. But the rest of the text sounds like we should... reinforce them?
> To apply exposure therapy, build a path of incremental steps towards the aversion
This is a write-up of Neel's and my experience and opinions on best practices for doing Activation Patching. An arXiv PDF version of this post is available here (easier to cite). A previous version was shared with MATS Program scholars in July 2023 under the title "Everything Activation Patching".
Prerequisites: This post is mainly aimed at people who are familiar with the basic ideas behind activation patching. For background, see this ARENA tutorial or this post by Neel.
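Before getting into best practices, here is a minimal sketch of what a single activation patch looks like in code. This is my own illustration using the TransformerLens library (which the ARENA tutorial above is built on), not code from the post; the model, prompts, layer, and position are arbitrary choices for demonstration.

```python
# Minimal single-site activation patch, sketched with TransformerLens.
# The prompts, layer, and position are illustrative, not from the post.
import transformer_lens.utils as utils
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")

# Clean/corrupt prompts that differ in a single token, so positions align.
clean_tokens = model.to_tokens("When John and Mary went to the bar, John gave a drink to")
corrupt_tokens = model.to_tokens("When John and Mary went to the bar, Mary gave a drink to")

# Run the clean prompt once and cache every intermediate activation.
_, clean_cache = model.run_with_cache(clean_tokens)

LAYER, POS = 6, 9  # arbitrary residual-stream site to patch

def patch_hook(resid, hook):
    # Overwrite the corrupted run's activation at one position
    # with the cached clean activation from the same site.
    resid[:, POS, :] = clean_cache[hook.name][:, POS, :]
    return resid

patched_logits = model.run_with_hooks(
    corrupt_tokens,
    fwd_hooks=[(utils.get_act_name("resid_pre", LAYER), patch_hook)],
)
# Compare patched_logits against the clean and corrupt baselines with a
# metric (e.g. the logit difference between " John" and " Mary" at the
# final position); sweeping LAYER and POS maps where the causally
# relevant information lives.
```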
TL;DR:
The history of science has tons of examples of the same thing being discovered multiple times independently; Wikipedia has a whole list of examples here. If your goal in studying the history of science is to extract the predictable/overdetermined component of humanity's trajectory, then it makes sense to focus on such examples.
But if your goal is to achieve high counterfactual impact in your own research, then you should probably draw inspiration from the opposite: "singular" discoveries, i.e. discoveries which nobody else was anywhere close to figuring out. After all, if someone else would have figured it out shortly afterwards anyway, then the discovery probably wasn't very counterfactually impactful.
Alas, nobody seems to have made a list of highly counterfactual scientific discoveries to complement Wikipedia's list of multiple discoveries.
To...
An example whose counterfactual impact was probably not very large is the discovery of DNA as the carrier of inheritance.
I had great fun reading Watson's scientific-literary fiction The Double Helix. Watson and Crick were very clear that competitors were hot on their heels, perhaps only a matter of months behind.
So I keep seeing takes about how to tell if LLMs are "really exhibiting goal-directed behavior" like a human or whether they are instead "just predicting the next token". And, to me at least, this feels like a confused sort of question that misunderstands what humans are doing when they exhibit goal-directed behavior.
A concrete example: let's say we notice that Jim has just pushed the turn-signal lever on the side of his steering wheel. Why did Jim do this?
The goal-directed-behavior story is as follows:
On April Fools the LW team released an album under the name of the Fooming Shoggoths. Ever since, the amount that I think about rationality has skyrocketed.
That's because I've been listening exclusively to it when I'd usually be listening to other music, especially "Thought That Faster" (feat. Eliezer Yudkowsky). I now find that when I come to a problem's conclusion, I often do look back and think, "How could I have thought that faster?"
So, I've started attempting to add to the rationalist musical canon using Udio. Here are two attempts I think turned out well. I especially like the first one.
For example, when I hear phrases from a song in everyday life, I complete the pattern.
I feel like...
(I appreciate object-level engagement in general, but this seems combatively worded.)
The rest of this reply responds to arguments.
Why should the Earth superintelligence care about you, but not about the 10^10^30 other causally independent ASIs that are latent in the hypothesis space, each capable of running enormous numbers of copies of the Earth ASI in various scenarios?