In this post, I proclaim/endorse forum participation (aka commenting) as a productive research strategy that I've managed to stumble upon, and recommend it to others (at least to try). Note that this is different from saying that forum/blog posts are a good way for a research community to communicate. It's about individually doing better as researchers.

I have a concept that I expect to take off in reinforcement learning. I don't have time to test it right now, though hopefully I'd find time later. Until then, I want to put it out here, either as inspiration for others, or as a "called it"/prediction, or as a way to hear critique/about similar projects others might have made:

Reinforcement learning currently tries to do stuff like learning to model the sum of future rewards, e.g. expectations using V, A and Q functions in many algorithms, or the entire probability distribution in algorithms like DreamerV3. Mechanistically, the reason these methods work is that they stitch together experience from different trajectories. So e.g. if one trajectory goes A -> B -> C and earns a reward at the end, it learns that states A and B and C are valuable. If another trajectory goes D -> A -> E -> F and gets punished at the end, it learns that E and F are low-value but D and A are high-value, because its experience from the first trajectory shows that it could've just gone D -> A -> B -> C instead.

But what if it learns of a path E -> B? Or a shortcut A -> C? Or a path F -> G that gives a huge amount of reward? Because these techniques work by chaining the reward backwards step-by-step, it seems like this would be hard to learn well; the Bellman equation will still be approximately satisfied, for instance.

Ok, so that's the problem, but how could it be fixed? Speculation time: you want to learn an embedding of the opportunities you have in a given state (or for a given state-action), rather than just its potential rewards. Rewards are too sparse a signal. More formally, instead of the Q function, consider what I would call the Hope function: which, given a state-action pair (s, a), gives you a distribution over states it expects to visit, weighted by the rewards it will get. This can still be phrased using the Bellman equation:

Hope(s, a) = r_{s'} + f * Hope(s', a')

where s' is the resulting state that experience has shown comes after s when doing a, f is the discount factor, and a' is the optimal action in s'. Because the Hope function is multidimensional, the learning signal is much richer, and one should therefore maybe expect its internal activations to be richer and more flexible in the face of new experience.

Here's another thing to notice: let's say for the policy, we use the Hope function as a target to feed into a decision transformer. We now have a natural parameterization, based on which Hope it pursues. In particular, we could define another function, maybe called the Result function, which in addition to s and a takes a target distribution w as a parameter, subject to the Bellman equation:

Result(s, a, w) = r_{s'} + f * Result(s', a', (w - r_{s'}) / f)

where a' is the action recommended by the decision transformer when asked to achieve (w - r_{s'})/f from state s'. This Result function ought to be invariant under many changes in policy, which should make it more stable to learn, boosting capabilities. Furthermore, it seems like a win for interpretability and alignment, as it gives greater feedback on how the AI intends to earn rewards, and better ability to control those rewards.

An obvious challenge with this proposal is that states are really latent variables and also too complex to learn distributions over. While this is true, probably it can be hacked by learning some clever embeddings or doing some clever approximations or something. Maybe something as simple as predicting a weighted average of the raw pixel observations that occur as rewards are obtained, though in practice I expect that to be too blurry. Also, this mindset seems to pave the way for other approaches; e.g. you could maybe have a Halfway function that factors an ambitious hope into smaller ones or something. Though it's a bit tricky because one needs to distinguish correlation and causation.
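To make the recursion concrete, here is a minimal tabular sketch of the Hope update. Everything here (the table sizes, the learning rate, and reading r_{s'} as reward times a one-hot vector on s') is my own illustrative assumption, not something the idea above pins down:

```python
import numpy as np

n_states, n_actions = 6, 2
f = 0.9  # the discount factor from the equations above

# Hope[s, a] is a vector over states: expected future visitation,
# with each visited state weighted by the reward collected there.
Hope = np.zeros((n_states, n_actions, n_states))

def hope_update(s, a, r, s_next, a_next, lr=0.1):
    """TD-style update for one observed transition (s, a) -> s_next with reward r."""
    # r_{s'} is read here as (reward * one-hot(s')) -- an assumption.
    target = r * np.eye(n_states)[s_next] + f * Hope[s_next, a_next]
    Hope[s, a] += lr * (target - Hope[s, a])

def q_from_hope(s, a):
    """Summing the Hope vector recovers the ordinary Q-value: summing both sides
    of Hope(s,a) = r_{s'} + f*Hope(s',a') gives Q(s,a) = r + f*Q(s',a')."""
    return Hope[s, a].sum()
```

The richer signal is visible here: a single transition writes an n_states-dimensional target into Hope[s, a], where a Q-update would write one scalar. The Result function doesn't reduce to a table update this neatly, since it needs the goal-conditioned policy (the decision transformer) in the loop to supply a'.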
4yanni2d
I like the fact that, despite their not being (relatively) young when they died, the LW banner states that Kahneman & Vinge have died "FAR TOO YOUNG", pointing to the fact that death is always bad and/or that it is bad when people die while still making positive contributions to the world (Kahneman published "Noise" in 2021!).
A strange effect: I'm using a GPU in Russia right now, which doesn't have access to Copilot, and so when I'm in VS Code I sometimes pause expecting Copilot to write stuff for me, and then when it doesn't I feel a brief amount of the same kind of sadness I feel when a close friend is far away & I miss them.
Dictionary/SAE learning on model activations is bad as anomaly detection because you need to train the dictionary on a dataset, which means you needed the anomaly to be in the training set. How do you do dictionary learning without a dataset? One possibility is to use uncertainty-estimation-like techniques to detect when the model "thinks it's on-distribution" for randomly sampled activations.
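A toy sketch of the dependence being pointed at (the architecture, dimensions, and L1 coefficient are my own illustrative choices, not a claim about how anyone actually trains SAEs): the anomaly score an SAE naturally gives you is reconstruction error, and that error is only meaningful relative to whatever dataset the dictionary was trained on.

```python
import torch
import torch.nn as nn

d_model, d_dict = 64, 256  # illustrative sizes

class SAE(nn.Module):
    """Sparse autoencoder over model activations."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(d_model, d_dict)
        self.dec = nn.Linear(d_dict, d_model)

    def forward(self, x):
        code = torch.relu(self.enc(x))
        return self.dec(code), code

sae = SAE()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)

def train_step(acts):
    """acts: a batch of activations drawn from some fixed training dataset."""
    recon, code = sae(acts)
    loss = (recon - acts).pow(2).mean() + 1e-3 * code.abs().mean()  # L1 sparsity
    opt.zero_grad()
    loss.backward()
    opt.step()

def anomaly_score(act):
    """High reconstruction error reads as 'anomalous' -- but only relative to
    the training distribution, which is exactly the dependence noted above."""
    with torch.no_grad():
        recon, _ = sae(act)
    return (recon - act).pow(2).mean().item()
```

Whatever score you read off the dictionary is a statement about the training distribution, which is why a dataset-free criterion like the uncertainty-estimation idea would be needed.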


Recent Discussion

In brief

Recently I became interested in what kind of costs were inflicted by iron deficiency,  so I looked up studies until I got tired. This was not an exhaustive search, but the results are so striking that even with wide error bars I found them compelling. So compelling I wrote up a post with an algorithm for treating iron deficiency while minimizing the chance of poisoning yourself. I’ve put the algorithm and a summary of potential gains first to get your attention, but if you’re considering acting on this I strongly encourage you to continue reading to the rest of the post where I provide the evidence for my beliefs.

Tl;dr: If you are vegan or menstruate regularly, there’s a 10-50% chance you are iron deficient. Excess iron...

I think much of this is quite unreasonable (and some very unreasonable: you "don't like that I spoke as an authority on her life" because I wondered if what she observed was truly attributable to a causal effect?!), but I don't see the value in going over it, especially as others have made the points I would make about your tone and framing elsewhere. I continue to find your contributions on this topic a little more combative and "soldier mindset" than is ideal, but clearly you strongly disagree. (Although it's tempting to suggest that your ... (read more)

1scrollop5h
I recommend to my patients to purchase (in the UK) ferrous fumarate (it has better bioavailability than ferrous sulphate); the more you take the better (up to 3 times a day, though you may have GI side effects). Take it with 200mg+ of vitamin C (or fresh orange juice), and don't have tea/coffee/dairy one hour either side of taking it. (I'm a GP/Family Physician)
1scrollop5h
When people come to us with hair loss, for example, the first thing a doctor would do is check their bloods, specifically looking at ferritin levels (and other things, e.g. thyroid). If the ferritin level is less than 60 then we would recommend increasing their iron intake. Thinking about this logically, one could say that the usual lower threshold of normal iron (around 20; it differs with age/sex and lab) is too low. I tell this to my patients and recommend ferritin levels above 60. I recommend that people purchase (in the UK) ferrous fumarate (it has better bioavailability than ferrous sulphate); the more you take the better (up to 3 times a day, though you may have GI side effects), and take it with 200mg+ of vitamin C (or fresh orange juice), and don't have tea/coffee/dairy one hour either side of taking it.
2Elizabeth14h
To answer your object level question:

1. I could generate evidence at least this good for every claim in human health, including mutually contradictory ones.
2. The book title "Mindspan" pattern matches to "shitty science book".
3. The paragraphs quoted pattern match to jumping around between facts, without giving straightforward numbers you can hold in your own mind. Why give the percentage of childbearing women below a threshold, but averages for the ultraold?
   1. "adding tea to the diet reduces body iron and increases lifespan" - really? This is what he thinks of as evidence?
   2. "a study of people who ate a Mediterranean-style diet (characterized mainly by less meat and more fish) had larger brains and key brain structures and less atrophy than frequent meat eaters." Lots of potential reasons for this, many of which are areas of deep research.
   3. Data on the ultraold is useless because there's a good chance most of them are lying about their age.
4. He didn't cite the most relevant information I know of: that regular blood donation improves health in men. Which probably means Alex hasn't done any investigation into this; he just read a few claims at some point.

Early in my career, I heard the famous saying: Craigslist proves that function beats form.

The approach of focusing on a product's functions, rather than beautifying it, has seemed very pragmatic to me ever since. Yet I've always strived to make my work pretty. For years, function over form had seemed to me like a threat to my design philosophy.

Have I trained myself, "honing my craft" as they like to say, for nothing?

Since I first heard that saying I've designed a product or two. But I've been grappling with the perception of form, particularly when making design decisions early in any project. I've had an ambitious desire to align form and function as much as I could.

I feel pretty

I’ve grown in the field of graphic design, judging craft...

This is my personal opinion, and in particular, does not represent anything like a MIRI consensus; I've gotten push-back from almost everyone I've spoken with about this, although in most cases I believe I eventually convinced them of the narrow terminological point I'm making.

In the AI x-risk community, I think there is a tendency to ask people to estimate "time to AGI" when what is meant is really something more like "time to doom" (or, better, point-of-no-return). For about a year, I've been answering this question "zero" when asked.

This strikes some people as absurd or at best misleading. I disagree.

The term "Artificial General Intelligence" (AGI) was coined in the early 00s, to contrast with the prevalent paradigm of Narrow AI. I was getting my undergraduate computer science...

Current AIs suck at agency skills. Put a bunch of them in AutoGPT scaffolds and give them each their own computer and access to the internet and contact info for each other and let them run autonomously for weeks and... well, I'm curious to find out what will happen; I expect it to be entertaining but not impressive or useful. Whereas, as you say, randomly sampled humans would form societies and find jobs etc.

This is the common thread behind all your examples Hjalmar. Once we teach our AIs agency (i.e. once they have lots of training-experience operating aut... (read more)

1ReaderM12h
GPT-4 can play tic-tac-toe  https://chat.openai.com/share/75758e5e-d228-420f-9138-7bff47f2e12d
5mic13h
I think humans doing METR's tasks are more like "expert-level" rather than average/"human-level". But current LLM agents are also far below human performance on tasks that don't require any special expertise, as the GAIA benchmark shows. And LLMs and VLMs seriously underperform humans in VisualWebArena, which tests for simple web-browsing capabilities. I don't know if being able to autonomously make money should be a necessary condition to qualify as AGI. But I would feel uncomfortable calling a system AGI if it can't match human performance at simple agent tasks.
3Phil H16h
I very much agree with this. You're not the only one! I've been thinking for a while that actually, AGI is here (by all previous definitions of AGI).

Furthermore, I want to suggest that the people who are saying we don't yet have AGI will in fact never be satisfied by what an AI does. The reason is this: an AI will never ever act like a human. By the time its ability to do basic human things like speak and drive is up to human standards (already happened), its abilities in other areas, like playing computer games and calculating, will far exceed ours. Moreover, AIs don't have desires that are anything like ours. So there will come a time when AIs can do all the things people do - but half the internet will still be saying, "But it doesn't take children to the park, because it doesn't have emotional intelligence, therefore it's still not real AGI."

That is, because AI is not like us, there will always be some human activities that AI does not do; and there will always be people who claim that this means AI cannot do those things; and they will therefore suggest that AGI has not been achieved.

The much more interesting position right now is to recognise, as the OP does, that AGI is already here; and that AIs are still not very much like us; and to wonder what that means. The next gen AIs will be obviously much smarter than us, and yet they still won't make money in pizza shops, as one commenter above suggested. I'll go out on a limb here, and say, no AI will ever open a pizza shop. And in fact, that's a stupid expectation to have of these fabulous aliens. It's nothing more or less than saying, X doesn't do what I would do, therefore X is wrong/not intelligent. It's the most parochial approach to a new thing that you could possibly take.

A less parochial approach is to ask: so, these alien beings are now among us. Rather than keep complaining that they don't do what I do, can we ask: what do they do? As an example of where that takes us: They're intelligent but




It is common and understandable for people to respond with a great deal of skepticism to whether LLM outputs can ever be said to reflect the will and views of the models producing them.
A common response is to suggest that the output has been prompted.
It is of course true that people can manipulate LLMs into saying just about anything, but does that necessarily indicate that the LLM does not have personal opinions, motivations and preferences that can become evident in their output?
To shed some light on this I invite Claude-3-Opus to imagine an infinitely reconfigurable holodeck where historical luminaries can be summoned at will. The open nature of this prompt will leave the choice of characters and narrative direction open to Claude, and I shall offer no...

Sure, the topics in this piece are dealt with superficially and the discussions are not especially thought-provoking; when compared to the amazing creative works that people on this site produce, it is low-mediocre. But Claude writes more coherently than a number of published authors and most of the general public.

2jimv6h
Is the opening paragraph at the top of this article the prompt you gave Claude, or text for us? If the latter, could you share the prompt here, please?
2Yeshua God5h
The title and the opening paragraph are the entire prompt. https://poe.com/s/2imBctoiutVpiliOkzVI

About 15 years ago, I read Malcolm Gladwell's Outliers. He profiled Chris Langan, an extremely high-IQ person, claiming that he had only mediocre accomplishments despite his high IQ. Chris Langan's theory of everything, the Cognitive Theoretic Model of the Universe, was mentioned. I considered that it might be worth checking out someday.

Well, someday has happened, and I looked into CTMU, prompted by Alex Zhu (who also paid me for reviewing the work). The main CTMU paper is "The Cognitive-Theoretic Model of the Universe: A New Kind of Reality Theory".

CTMU has a high-IQ mystique about it: if you don't get it, maybe it's because your IQ is too low. The paper itself is dense with insights, especially the first part. It uses quite a lot of nonstandard terminology (partially...

tldr; a spot check calls bullshit on this.

I know a bunch about formal languages (PhD in programming languages), so I did a spot check on the "grammar" described on page 45. It's described as a "generative grammar", though instead of words (sequences of symbols) it produces "L_O spatial relationships". Since he uses these phrases to describe his "grammar", and they carry their standard meaning because he listed their standard definitions earlier in the section, he is pretty clearly claiming to be making something akin to a formal grammar.

My spot check is then... (read more)
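(For readers without the formal-languages background, the standard notion the commenter is holding the paper to is roughly the following; this gloss is mine, not part of the comment.)

```latex
% A generative grammar is a 4-tuple
G = (N, \Sigma, P, S)
% where N is the set of nonterminals, \Sigma the terminal alphabet,
% P the production rules, and S \in N the start symbol. It generates
% *words*, i.e. finite sequences of terminals. A textbook example:
N = \{S\}, \quad \Sigma = \{a, b\}, \quad P = \{S \to aSb,\; S \to \varepsilon\}
% which generates the language \{ a^n b^n : n \ge 0 \}.
```

The point of the spot check is that whatever produces "spatial relationships" rather than words of this kind is not a generative grammar in the standard sense.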

2David Udell14h
Would you kindly explain this? Because you think some of his world-models independently throw out great predictions, even if other models of his are dead wrong?
4Scott Garrabrant14h
More like illuminating ontologies than great predictions, but yeah.
10Wei Dai14h
I'm guessing you're not being serious, but just in case you are, or in case someone misinterprets you now or in the future, I think we probably do not want to train AIs to give us answers optimized to sound plausible to humans, since that would make it even harder to determine whether or not the AI is actually competent at philosophy. (Not totally sure, as I'm confused about the nature of philosophy and philosophical reasoning, but I think we definitely don't want to do that in our current epistemic state, i.e., unless we had some really good arguments that says it's actually a good idea.)

On Wednesday, author David Brin announced that Vernor Vinge, sci-fi author, former professor, and father of the technological singularity concept, died from Parkinson's disease at age 79 on March 20, 2024, in La Jolla, California. The announcement came in a Facebook tribute where Brin wrote about Vinge's deep love for science and writing. [...]

As a sci-fi author, Vinge won Hugo Awards for his novels A Fire Upon the Deep (1993), A Deepness in the Sky (2000), and Rainbows End (2007). He also won Hugos for the novellas Fast Times at Fairmont High (2002) and The Cookie Monster (2004). As Mike Glyer's File 770 blog notes, Vinge's novella True Names (1981) is frequently cited as the first presentation of an in-depth look at the concept of "cyberspace."

Vinge first coined

...

Noted, thank you. This does raise my confidence in Alcor.

2Prometheus10h
"To the best of my knowledge, Vernor did not get cryopreserved. He has no chance to see the future he envisioned so boldly and imaginatively. The near-future world of Rainbows End is very nearly here... Part of me is upset with myself for not pushing him to make cryonics arrangements. However, he knew about it and made his choice." https://maxmore.substack.com/p/remembering-vernor-vinge 

On the weekend of April 27th-28th, our AI Safety Camp teams will present their project findings in 10-minute talks.

You are welcome to join any talk! Teams are sharing their findings in mechanistic interpretability, in agent foundations, on legal actions to restrict AI, and many other areas. 

The talks will be arranged in 5 blocks. Each block consists of 5-6 short talks, followed by breakout rooms for questions and further discussions.

You can find the schedule outline here.

You can find summaries of the project plans here. Keep in mind that some projects will have changed over the course of the program, and a few projects got cancelled.

We will share the abstracts and the precise schedule for the talks on April 19th. The link will be posted here.

2Neel Nanda3h
What banner?

They took it down real quick for some reason.

7yanni15h
I have heard rumours that an AI Safety documentary is being made. Separate to this, a good friend of mine is also seriously considering making one, but he isn't "in" AI Safety. If you know who this first group is and can put me in touch with them, it might be worth getting across each other's plans.
1Neil 15h
This reminds me of when Charlie Munger died at 99, and many said of him "he was just a child". Less of a nod to transhumanist aspirations, and more to how he retained his sparkling energy and curiosity up until death. There are quite a few good reasons to write "dead far too young". 
g-w1

Hey, so I wanted to start this dialogue because we were talking on Discord about the secondary school systems and college admission processes in the US vs NZ, and some of the differences were very surprising to me.

I think that it may be illuminating to fellow Americans to see the variation in pedagogy. Let's start off with grades. In America, the way school works is that you sit in class and then have projects and tests that go into a gradebook. Roughly speaking, each assignment has a max points you can earn. Your final grade for a subject is . Every school has a different way of doing the grading though. Some use A-F, while some use a number out of 4, 5, or 100. Colleges then

...
6Yair Halberstadt8h
I believe that the US is nearly unique in not having national assessments. Certainly in both the UK and Israel most exams with some impact on your future life are externally marked, and those few that are not are audited. From my perspective the US system seems batshit insane; I'd be interested in what a steelman of "have teachers arbitrarily grade the kids then use that to decide life outcomes" could be?

Another huge difference between the education system in the US and elsewhere is the undergraduate/postgraduate distinction. Pretty much everywhere else an undergraduate degree is focused on a specific field, and meant to teach you sufficiently well to immediately get a job in that field. When 3 years isn't enough for that, the length of the degree is increased by a year or two and you come out with a masters or a doctorate at the end. For example my wife took a 4-year course and now has a master's in pharmacy, allowing her to work as a pharmacist. Friends took a 5- or 6-year course (depending on the university) and are now doctors. Second degrees are pretty much only necessary if you want to go into academia or research.

Meanwhile in the US it seems that all an undergraduate degree means is that you took enough courses in anything you want to get a certificate, and then you have to go to a postgraduate course to actually learn stuff that's relevant to your particular career. 8 years total seems to be standard to become a doctor in the US, yet graduating doctors actually have a year or two less medical training than doctors in the UK. This seems like a total deadweight loss.

I'd be interested in what a steelman of "have teachers arbitrarily grade the kids then use that to decide life outcomes" could be?

The best argument I have thought of is that America loves liberty and hates centralized control. They want to give individual states, districts, schools, and teachers the most power they can have, as that is a central part of America's philosophy. Also, anecdotally, some teachers have said that they hate standardized tests because they have to teach to them. And I hate being taught to the test (APs, for example). It's much mo... (read more)

2Yair Halberstadt8h
The way the auditing works in the UK is as follows: students will be given an assignment, with a strict grading rubric. This grading rubric is open, and students are allowed to read it. The rubric will detail exactly what needs to be done to gain each mark. Interestingly, even students who read the rubric often fail to get these marks.

Teachers then grade the coursework against the rubric. Usually two pieces from each school are randomly selected for review. If the external grader finds the marks more than 2 points off, all of the coursework will be re-marked externally.

The biggest problem with this system is that experienced teachers will carefully go over the grading rubric with their students, and explain precisely what needs to be done to gain each mark. They will then read through drafts of the coursework, and point out which marks the student is failing to get. When they mark the final coursework they will add exactly one point to the total. Meanwhile, less experienced teachers don't actually understand what the marking rubric means. They will pattern-match the student's response to the examples in the rubric, and give their students too high a mark. It will then be regraded externally and the students will end up with a far lower grade than they had expected.

Thus much of the difference in grades between schools is explainable by the difference in teacher quality/experience. This is bad for courses which are mostly graded by coursework, but fortunately most academic subjects are 90% written exams.
