In this post, I proclaim/endorse forum participation (aka commenting) as a productive research strategy that I've managed to stumble upon, and recommend it to others (at least to try). Note that this is different from saying that forum/blog posts are a good way for a research community to communicate. It's about individually doing better as researchers.

yanni (1d)
I like the fact that, despite their not being (relatively) young when they died, the LW banner states that Kahneman & Vinge died "FAR TOO YOUNG", pointing to the fact that death is always bad and/or that it is bad when people die while they are still making positive contributions to the world (Kahneman published "Noise" in 2021!).
Dictionary/SAE learning on model activations is bad as anomaly detection because you need to train the dictionary on a dataset, which means you needed the anomaly to be in the training set. How to do dictionary learning without a dataset? One possibility is to use uncertainty-estimation-like techniques to detect when the model "thinks it's on-distribution" for randomly sampled activations.
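For reference, here is a minimal sketch of the standard baseline this quick take is pushing against, with arbitrary shapes and hyperparameters assumed: train a small sparse autoencoder on a reference set of activations and use reconstruction error as an off-distribution score. It makes the stated limitation concrete: anomalies already represented in the dictionary's training data reconstruct well and go undetected.

```python
# Minimal sketch (assumed shapes/hyperparameters): a tiny sparse autoencoder
# trained on "reference" activations; reconstruction error then serves as an
# anomaly score for new activations. Anomalies that were already present in
# the training set will reconstruct well and so go undetected.
import torch
import torch.nn as nn

D_MODEL, D_DICT = 512, 2048  # activation dim, dictionary size (assumed)

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, x):
        codes = torch.relu(self.encoder(x))  # sparse-ish feature activations
        return self.decoder(codes), codes

def train_sae(acts: torch.Tensor, l1_coeff: float = 1e-3, steps: int = 1000):
    sae = SparseAutoencoder(acts.shape[-1], D_DICT)
    opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
    for _ in range(steps):
        recon, codes = sae(acts)
        loss = ((recon - acts) ** 2).mean() + l1_coeff * codes.abs().mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return sae

def anomaly_score(sae: SparseAutoencoder, x: torch.Tensor) -> torch.Tensor:
    recon, _ = sae(x)
    return ((recon - x) ** 2).mean(dim=-1)  # high = poorly explained by the dictionary

# Usage sketch: fit on reference activations, then score new ones.
reference_acts = torch.randn(10_000, D_MODEL)  # stand-in for real model activations
sae = train_sae(reference_acts)
print(anomaly_score(sae, torch.randn(8, D_MODEL)))
```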
habryka (4d)
A thing that I've been thinking about for a while has been to somehow make LessWrong into something that could give rise to more personal wikis and wiki-like content. Gwern's writing has a very different structure and quality to it than the posts on LW, with the key components being that they get updated regularly and serve as more stable references for some concept, as opposed to a post, which is usually anchored in a specific point in time.

We have a pretty good wiki system for our tags, but never really allowed people to just make their personal wiki pages, mostly because there isn't really any place to find them. We could list the wiki pages you created on your profile, but that doesn't really seem like it would allocate attention to them successfully.

I was thinking about this more recently as Arbital is going through another round of slowly rotting away (its search is currently broken, and this is very hard to fix due to annoying Google App Engine restrictions) and thinking about importing all the Arbital content into LessWrong. That might be a natural time to do a final push to enable people to write more wiki-like content on the site.
People are arguing about the answer to the Sleeping Beauty problem! I thought this was pretty much dissolved with this post's title! But there are lengthy posts and even a prediction market!

Sleeping Beauty is an edge case where different reward structures are intuitively possible, and so people imagine different game payout structures behind the definition of “probability”. Once the payout structure is fixed, the confusion is gone. With a fixed payout structure & preference framework rewarding the number you output as “probability”, people don’t have a disagreement about what is the best number to output. Sleeping Beauty is about definitions.

And still, I see posts arguing that if a tree falls on a deaf Sleeping Beauty, in a forest with no one to hear it, it surely doesn’t produce a sound, because here’s how humans perceive sounds, which is the definition of a sound, and there are demonstrably no humans around the tree. (Or maybe that it surely produces the sound, because here’s the physics of the sound waves, and the tree surely abides by the laws of physics, and there are demonstrably sound waves.) This is arguing about definitions. You feel strongly that “probability” is that thing that triggers the “probability” concept neuron in your brain. If people have a different concept triggering “this is probability”, you feel like they must be wrong, because they’re pointing at something they say is a sound and you say isn’t.

Probability is something defined in math by necessity. There’s only one way to do it to not get exploited in natural betting schemes/reward structures that everyone accepts when there are no anthropics involved. But if there are multiple copies of the agent, there’s no longer a single possible betting scheme defining a single possible “probability”, and people draw the boundary/generalise differently in this situation. You all should just call these two probabilities two different words instead of arguing which one is the correct definition of “probability”.
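A minimal sketch of the "fix the payout structure and the disagreement disappears" point, under an assumed quadratic (Brier) scoring rule: charging the loss at every awakening makes 1/3 the loss-minimizing report, charging it once per experiment makes it 1/2, and neither answer involves any factual disagreement.

```python
# Sleeping Beauty with a quadratic (Brier) penalty on the reported P(heads).
# "Per-awakening" scoring charges the loss at every awakening (twice on tails);
# "per-experiment" scoring charges it once per run. The optimal report differs
# only because the reward structure differs.
import numpy as np

def expected_loss(p: float, per_awakening: bool) -> float:
    tails_weight = 2.0 if per_awakening else 1.0  # tails => two awakenings
    return 0.5 * (1 - p) ** 2 + 0.5 * tails_weight * p ** 2

grid = np.linspace(0, 1, 10_001)
for per_awakening in (True, False):
    losses = [expected_loss(p, per_awakening) for p in grid]
    best = grid[int(np.argmin(losses))]
    rule = "per-awakening" if per_awakening else "per-experiment"
    print(f"{rule:15s} optimal report ~ {best:.3f}")
# per-awakening   optimal report ~ 0.333
# per-experiment  optimal report ~ 0.500
```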
Novel Science is Inherently Illegible

Legibility, transparency, and open science are generally considered positive attributes, while opacity, elitism, and obscurantism are viewed as negative. However, increased legibility in science is not always beneficial and can often be detrimental.

Scientific management, with some exceptions, likely underperforms compared to simpler heuristics such as giving money to smart people or implementing grant lotteries. Scientific legibility suffers from the classic "Seeing Like a State" problems: it constrains endeavors to the least informed stakeholder, hinders exploration, inevitably biases research to be simple and myopic, and exposes researchers to a constant political tug-of-war between different interest groups, poisoning objectivity.

I think the above would be considered relatively uncontroversial in EA circles. But I posit there is something deeper going on: novel research is inherently illegible. If it were legible, someone else would have already pursued it. As science advances, its concepts become increasingly counterintuitive and further from common sense. Most of the legible low-hanging fruit has already been picked, and novel research requires venturing higher into the tree, pursuing illegible paths with indirect and hard-to-foresee impacts.

Popular Comments

Recent Discussion

Summary: The post describes a method that allows us to use an untrustworthy optimizer to find satisficing outputs.

Acknowledgements: Thanks to Benjamin Kolb (@benjaminko), Jobst Heitzig (@Jobst Heitzig) and Thomas Kehrenberg (@Thomas Kehrenberg)  for many helpful comments.

Introduction

Imagine you have black-box access to a powerful but untrustworthy optimizing system, the Oracle. What do I mean by "powerful but untrustworthy"? I mean that, when you give an objective function f as input to the Oracle, it will output an element x that has an impressively low[1] value of f(x). But sadly, you don't have any guarantee that it will output the optimal element and e.g. not one that's also chosen for a different purpose (which might be dangerous for many reasons, e.g. instrumental convergence).

What questions can you safely ask the Oracle? Can you use it to...

Simon Fischer (13m)
I believe this is exactly the kind of thing that my proposal would be good for: gnarly backdoors that exploit a compiler bug etc. should be very rare in the set of all valid implementations!
Lucius Bushnaq (32m)
Your example has it be an important bit, though: what database to use. Not a random bit. If I'm getting this right, that would correspond to far more than one bit of adversarial optimisation permitted for the oracle in this setup. |S∩R|=2 doesn't mean the oracle gets to select one bit of its choice in the string to flip; it means it gets to select one of two strings[1].

1. ^ Plus the empty string for not answering.
Simon Fischer (26m)
I think you mean |S∩R|=2 (two answers that satisfice and fulfill the safety constraint), but otherwise I agree. This is also an example of this whole "let's measure optimization in bits"-business being a lot more subtle than it appears at first sight.

Typo fixed, thanks.
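As an aside on the "measuring optimization in bits" bookkeeping from the exchange above, here is one common counting convention (a sketch, not anything claimed in the post): narrowing a baseline set down to an accepted subset exerts log2 of the shrinkage factor, and an oracle free to pick any one of k admissible answers can smuggle in at most log2(k) bits of choice.

```python
# One counting convention for "optimization in bits" (illustrative only):
# narrowing a baseline set S to an accepted subset T exerts log2(|S|/|T|) bits;
# an oracle choosing among k admissible answers (here: 2 satisficing strings
# plus the empty string for refusing) controls at most log2(k) bits.
import math

def narrowing_bits(baseline_size: int, accepted_size: int) -> float:
    return math.log2(baseline_size / accepted_size)

def adversarial_choice_bits(num_admissible_answers: int) -> float:
    return math.log2(num_admissible_answers)

print(narrowing_bits(2**20, 2))        # keeping 2 outputs out of ~1M candidates: 19.0 bits
print(adversarial_choice_bits(2 + 1))  # 2 satisficing strings + empty string: ~1.58 bits
```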

[This is part of a series I’m writing on how to convince a person that AI risk is worth paying attention to.] 

tl;dr: People’s default reaction to politics is not taking them seriously. They could center their entire personality on their political beliefs, and still not take them seriously. To get them to take you seriously, the quickest way is to make your words as unpolitical-seeming as possible. 

I’m a high school student in France. Politics in France are interesting because they’re in a confusing superposition. One second, you'll have bourgeois intellectuals sipping red wine from their Paris apartment writing essays with dubious sexual innuendos on the deep-running dynamics of power. The next, 400 farmers will vaguely agree with the sentiment and dump 20 tons of horse manure in downtown...

Lots of people already know about Scott Alexander/ACX/SSC, but I think that crossposting to LW is unusually valuable in this particular case, since lots of people were waiting for a big Schelling-point overview of the 15-hour Rootclaim Lab Leak debate, and unlike LW, ACX's comment section is a massive vote-less swamp that lags the entire page and gives everyone equal status.

It remains unclear whether commenting there is worth your time even if you think you have something worth saying: there's no sorting, only sifting, which means it attracts small numbers of sifters rather than large numbers of people who expect sorting.

Here are the first 11 paragraphs:

Saar Wilf is an ex-Israeli entrepreneur. Since 2016, he’s been developing a new form of reasoning, meant to transcend normal human bias.

His

...

"i ain't reading all that

with probability p i'm happy for u tho

and with probability 1-p sorry that happened"

Metacelsus (2h)
I agree, I think the most likely version of the lab leak scenario does not involve an engineered virus. Personally I would say 60% chance zoonotic, 40% chance lab leak.
gwern (3h)
My current initial impression is that this debate format was not fit for purpose: https://www.astralcodexten.com/p/practically-a-book-review-rootclaim/comment/52659890
trevor (2h)
A debate sequel, with someone other than Peter Miller (but retaining and reevaluating all the evidence he got from various sources) would be nice. I can easily imagine Miller doing better work on other research topics that don't involve any possibility of cover ups or adversarial epistemics related to falsifiability, which seem to be personal issues for him in the case of lab leak at least. Maybe with 200k on the line to incentivize Saar to return, or to set up a team this time around? With the next round of challengers bearing in mind that Saar might be willing to stomach a net loss of many thousands of dollars in order to promote his show and methodology?
This is a linkpost for https://arxiv.org/abs/2403.09863

Hi, I’d like to share my paper that proposes a novel approach for building white box neural networks.

The paper introduces semantic features as a general technique for controlled dimensionality reduction, somewhat reminiscent of Hinton’s capsules and the idea of “inverse rendering”. In short, semantic features aim to capture the core characteristic of any semantic entity - having many possible states but being in exactly one state at a time. This results in regularization that is strong enough to make the PoC neural network inherently interpretable and also robust to adversarial attacks - despite no form of adversarial training! The paper may be viewed as a manifesto for a novel white-box approach to deep learning.
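To make the "many possible states, exactly one at a time" constraint concrete, here is a rough sketch of one possible reading: each semantic feature as a group of candidate states competing through a softmax, so the group's output is approximately one-hot. This is not the paper's actual construction, just an illustration of the kind of constraint being described.

```python
# NOT the paper's architecture: a rough sketch of the one-state-at-a-time idea,
# where each "semantic feature" is a group of candidate states competing via a
# softmax, so that (at low temperature) roughly one state is active per group.
import torch
import torch.nn as nn

class OneStateGroup(nn.Module):
    """A block of `num_states` candidate states; softmax competition makes the
    group's output approximately one-hot over its states (assumed reading)."""
    def __init__(self, d_in: int, num_states: int, temperature: float = 0.1):
        super().__init__()
        self.scores = nn.Linear(d_in, num_states)
        self.temperature = temperature

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.scores(x) / self.temperature, dim=-1)

# Usage sketch: a layer of several such groups; each group reports which of its
# states it is in, a much lower-dimensional summary than the raw input.
x = torch.randn(4, 128)
layer = nn.ModuleList([OneStateGroup(128, 8) for _ in range(16)])
states = torch.cat([g(x) for g in layer], dim=-1)  # shape (4, 16 * 8), near one-hot per group
print(states.shape)
```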

As an independent researcher I’d be grateful for your feedback!

Maciej Satkiewicz (8h)
Thank you! The quote you picked is on point; I added an extended summary based on it, thanks for the suggestion!
mishka (5h)
Thanks, this is very interesting. I wonder if this approach is extendable to learning to predict the next word from a corpus of texts... The first layer might perhaps still be an embedding from words to vectors, but what should one do then? What would be a possible minimum viable dataset? Perhaps, in the spirit of the PoC of the paper, one might consider binary sequences of 0s and 1s, and have only two words, 0 and 1, and ask what it would take to have a good predictor of the next 0 or 1 given a long sequence of those as a context. This might be a good starting point, and then one might consider different examples of that problem (different examples of (sets of) sequences of 0 and 1 to learn from).

These are interesting considerations! I haven't put much thought on this yet but I have some preliminary ideas.

Semantic features are intended to capture meaning-preserving variations of structures. In that sense the "next word" problem seems ill-posed, as some permutations of words preserve meaning; in reality it's hardly a natural problem from the human perspective either.

The question I'd ask here is "what are the basic semantic building blocks of text for us humans?" and then try to model these blocks using the machinery of semantic features, i.e. model the ...
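For what it's worth, mishka's proposed minimum viable dataset is easy to spin up. Here is a minimal sketch, with an arbitrary (assumed) generating process and a count-based context predictor as the baseline any new approach would have to beat:

```python
# Minimal sketch of the binary-sequence toy setup suggested above: sequences
# over {0, 1} drawn from a simple (arbitrary, assumed) 2-state Markov source,
# plus a count-based fixed-context predictor of the next bit as a baseline.
from collections import Counter
import random

random.seed(0)

def sample_sequence(length: int, p_stay: float = 0.8) -> list[int]:
    """Markov source: repeat the previous bit with prob p_stay, else flip it."""
    bits = [random.randint(0, 1)]
    for _ in range(length - 1):
        prev = bits[-1]
        bits.append(prev if random.random() < p_stay else 1 - prev)
    return bits

def train_context_predictor(bits: list[int], context: int = 3) -> dict:
    counts: dict[tuple, Counter] = {}
    for i in range(context, len(bits)):
        ctx = tuple(bits[i - context:i])
        counts.setdefault(ctx, Counter())[bits[i]] += 1
    return counts

def predict_next(counts: dict, ctx: tuple) -> float:
    """P(next bit = 1 | ctx), with Laplace smoothing for unseen contexts."""
    c = counts.get(ctx, Counter())
    return (c[1] + 1) / (c[0] + c[1] + 2)

train = sample_sequence(100_000)
counts = train_context_predictor(train)
test = sample_sequence(10_000)
correct = sum(
    (predict_next(counts, tuple(test[i - 3:i])) > 0.5) == bool(test[i])
    for i in range(3, len(test))
)
print(f"baseline accuracy: {correct / (len(test) - 3):.3f}")  # ~0.8 for p_stay = 0.8
```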

About 15 years ago, I read Malcolm Gladwell's Outliers. He profiled Chris Langan, an extremely high-IQ person, claiming that he had only mediocre accomplishments despite his high IQ. Chris Langan's theory of everything, the Cognitive Theoretic Model of the Universe, was mentioned. I considered that it might be worth checking out someday.

Well, someday has happened, and I looked into CTMU, prompted by Alex Zhu (who also paid me for reviewing the work). The main CTMU paper is "The Cognitive-Theoretic Model of the Universe: A New Kind of Reality Theory".

CTMU has a high-IQ mystique about it: if you don't get it, maybe it's because your IQ is too low. The paper itself is dense with insights, especially the first part. It uses quite a lot of nonstandard terminology (partially...

Luckily we can train the AIs to give us answers optimized to sound plausible to humans.

Alex K. Chen (parrot) (2h)
I view a part of this as "optimizing the probability that the world is one that maximizes the probability of it enabling "God's mind" to faithfully model reality and operate at its best across all timescales". At minimum this means intelligence enhancement, human-brain symbiosis, microplastics/pollution reduction, reduction in brain aging rate, and reducing default mode noise (eg tFUS, loosening up all tied knots). The sooner we can achieve a harmonious front-to-end computation, the better (bc memory and our ability to hold the most faithful/error-minimizing representation will decay).

There is a precipice, a period of danger where our minds are vulnerable to non-globally-coherent/self-deceptive thoughts that could run their own incentives to self-destroy, but if we can get over this precipice, then the universe becomes more probabilistically likely to generate futures with our faithful values and thoughts.

Some trade-offs have difficult calculations/no clear answers (eg learning increases DNA error rates - https://twitter.com/gaurav_ven/status/1773415984931459160?t=8TChCcEfRzH60z0W1bCClQ&s=19), and others are the "urgency vs verifiability tradeoff" and the accel vs decel debate. But there are still numerous Pareto-efficient improvements, and the sooner we do them (like semaglutide, canagliflozin, microplastic/pollution reduction, pain reduction, factoring out historic debt, QRI stuff), the higher the chances of ultimate alignment of "God's thought". It's interesting that the god of formal verification, davidad, is also concerned about microplastics.

Possibly relevant people:

Sam Altman has this to say: https://archive.ph/G7VVt#selection-1607.0-1887.9

Bobby Azarian has a wonderful related book, "Romance of Reality": https://www.informationphilosopher.com/solutions/scientists/layzer/

Maybe slightly related: https://twitter.com/shw0rma/status/1771212311753048135?t=qZx3U2PyFxiVCk8NBOjWqg&s=19 https://x.com/VictorTaelin?t=mPe_Or
Richard_Kennaway (4h)
Exploring this on the web, I turned up a couple of related Substacks: Chris Langan's Ultimate Reality and TELEOLOGIC: CTMU Teleologic Living. The latter isn't just Chris Langan, a Dr Gina Langan is also involved. A lot of it requires a paid subscription, which for me would come lower in priority than all the definitely worthwhile blogs I also don't feel like paying for. Warning: there's a lot of conspiracy stuff there as well (Covid, "Global Occupation Government", etc.). Perhaps this 4-hour interview on "IQ, Free Will, Psychedelics, CTMU, & God" may give some further sense of his thinking. Googling "CTMU Core Affirmations" turns up a rich vein of ... something, including the CTMU Radio YouTube channel.
jessicata (8h)
I don't see any. He even says his approach “leaves the current picture of reality virtually intact”. In Popper's terms this would be metaphysics, not science, which is part of why I'm skeptical of the claimed applications to quantum mechanics and so on. Note that, while there's a common interpretation of Popper saying metaphysics is meaningless, he contradicts this. Quoting Popper:

Let us start with a (non-quantum) logical coinflip - say, look at the heretofore-unknown-to-us-personally 256th binary digit of pi, where the choice of binary digit is itself intended not to be random.

If the result of this logical coinflip is 1 (aka "heads"), we'll create 18 of you in green rooms and 2 of you in red rooms, and if the result is "tails" (0), we'll create 2 of you in green rooms and 18 of you in red rooms.

After going to sleep at the start of the experiment, you wake up in a green room.

With what degree of credence do you believe - what is your posterior probability - that the logical coin came up "heads"?

There are exactly two tenable answers that I can see, "50%" and...

“You generalise probability, when anthropics are involved, to probability-2, and say a number defined by probability-2; so I’ll suggest to you a reward structure that rewards agents that say probability-1 numbers. Huh, if you still say the probability-2 number, you lose”.

This reads to me like, “You say there’s 70% chance no one will be around that falling tree to hear it, so you’re 70% sure there won’t be any sound. But I want to bet sound is much more likely; we can go measure the sound waves, and I’m 95% sure our equipment will register the sound. Wanna bet?”


This is my personal opinion, and in particular, does not represent anything like a MIRI consensus; I've gotten push-back from almost everyone I've spoken with about this, although in most cases I believe I eventually convinced them of the narrow terminological point I'm making.

In the AI x-risk community, I think there is a tendency to ask people to estimate "time to AGI" when what is meant is really something more like "time to doom" (or, better, point-of-no-return). For about a year, I've been answering this question "zero" when asked.

This strikes some people as absurd or at best misleading. I disagree.

The term "Artificial General Intelligence" (AGI) was coined in the early 00s, to contrast with the prevalent paradigm of Narrow AI. I was getting my undergraduate computer science...

I very much agree with this. You're not the only one! I've been thinking for a while that actually, AGI is here (by all previous definitions of AGI). 

Furthermore, I want to suggest that the people who are saying we don't yet have AGI will in fact never be satisfied by what an AI does. The reason is this: An AI will never ever act like a human. By the time its ability to do basic human things like speak and drive are up to human standards (already happened), its abilities in other areas, like playing computer games and calculating, will far exceed ours...

Gerald Monroe (3h)
Yes, I agree. Whenever I think of things like this I focus on how what matters, in the sense of "when will AGI be transformational", is the idea of criticality. I have written on it earlier, but the simple idea is that our human world changes rapidly when AI capabilities in some way lead to more AI capabilities at a fast rate. This whole "is this AGI" thing is totally irrelevant; all that matters is criticality. You can imagine subhuman systems using AGI reaching criticality, and superhuman systems being needed. (Note ordinary humans do have criticality, albeit with a doubling time of about 20 years.) There are many forms of criticality, and the first one unlocked that won't quench easily starts the singularity. Examples:

Investment criticality: each AI demo leads to more investment than the total cost, including failures at other companies, to produce the demo. Quenches if investors run out of money or find a better investment sector.

Financial criticality: AI services delivered by AI bring in more in revenue than they cost, and each reinvestment effectively has a greater than 10 percent ROI. This quenches once further reinvestments in AI don't pay for themselves.

Partial self-replication criticality: robots can build most of the parts used in themselves, using post-2020 automation. This quenches at the new equilibrium determined by the percent of automation. Aka 90 percent automation makes each human worker left 10 times as productive, so we quench at 10x the number of robots possible if every worker on earth was building robots.

Full self-replication criticality: this quenches when matter mineable in the solar system is all consumed and made into either more robots or waste piles.

AI research criticality: AI systems research and develop better AI systems. Quenches when you find the most powerful AI the underlying compute and data can support.

You may notice 2 are satisfied, one at the end of 2022, one later in 2023. So in that sense the Singularity began and will accel
abramdemski (5h)
I haven't watched the LeCun interview you reference (it is several hours long, so relevant time-stamps to look at would be appreciated), but this still does not make sense to me -- backprop already seems like a way to constantly predict future experience and update, particularly as it is employed in LLMs. Generating predictions first and then updating based on error is how backprop works. Some form of closeness measure is required, just like you emphasize.
cubefox (4h)
Well, backpropagation alone wasn't even enough to make efficient LLMs feasible. It took decades, till the invention of transformers, to make them work. Similarly, knowing how to make LLMs is not yet sufficient to implement predictive coding. LeCun talks about the problem in a short section here from 10:55 to 14:19.
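For concreteness, here is a toy loop (not LeCun's proposal, and nothing from the interview) showing the sense in which plain backprop training of a next-token predictor already "generates a prediction, then updates on the error"; whether that counts as predictive coding, or whether something architecturally different is needed, is exactly what is under debate above.

```python
# Toy illustration of "predict, then update on the error" as it already occurs
# in ordinary backprop training of a next-token predictor. (Nothing here is
# specific to predictive coding; that is the point under discussion.)
import torch
import torch.nn as nn

VOCAB, D = 256, 64  # assumed toy sizes
model = nn.Sequential(nn.Embedding(VOCAB, D), nn.Linear(D, VOCAB))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, VOCAB, (1000,))  # stand-in corpus
for step in range(100):
    i = torch.randint(0, len(tokens) - 1, (32,))
    logits = model(tokens[i])              # 1. predict the next token
    loss = loss_fn(logits, tokens[i + 1])  # 2. measure the prediction error
    opt.zero_grad(); loss.backward()       # 3. propagate the error
    opt.step()                             # 4. update to reduce future error
```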
g-w1

Hey, so I wanted to start this dialogue because we were talking on Discord about the secondary school systems and college admission processes in the US vs NZ, and some of the differences were very surprising to me.

 

I think that it may be illuminating to fellow Americans to see the variation in pedagogy. Let's start off with grades. In America, the way school works is that you sit in class and then have projects and tests that go into a gradebook. Roughly speaking, each assignment has a max points you can earn. Your final grade for a subject is (total points earned)/(total points possible). Every school has a different way of doing the grading though. Some use A-F, while some use a number out of 4, 5, or 100. Colleges then

...

On 16 March 2024, I sat down to chat with New York Times technology reporter Cade Metz! In part of our conversation, transcribed below, we discussed his February 2021 article "Silicon Valley's Safe Space", covering Scott Alexander's Slate Star Codex blog and the surrounding community.

The transcript has been significantly edited for clarity. (It turns out that real-time conversation transcribed completely verbatim is full of filler words, false starts, crosstalk, "uh huh"s, "yeah"s, pauses while one party picks up their coffee order, &c. that do not seem particularly substantive.)


ZMD: I actually have some questions for you.

CM: Great, let's start with that.

ZMD: They're critical questions, but one of the secret-lore-of-rationality things is that a lot of people think criticism is bad, because if someone criticizes you, it hurts your...

I only skimmed the NYT piece about China and AI talent, but didn't see evidence of what you said (dishonestly angle-shooting the AI safety scene).

frankybegs (5h)
  I said "specialist journalist/hacker skills". I don't think it's at all true that anyone could find out Scott's true identity as easily as putting a key in a lock, and I think that analogy clearly misleads vs the hacker one, because the journalist did use his demonstrably non-ubiquitous skills to find out the truth and then broadcast it to everyone else. To me the phone hacking analogy is much closer, but if we must use a lock-based one, it's more like a lockpick who picks a (perhaps not hugely difficult) lock and then jams it so anyone else can enter. Still very morally wrong, I think most would agree.
Elizabeth (5h)
I think Zack's description might be too charitable to Scott. From his description I thought the reference would be strictly about poverty, but the full quote includes a lot about genetics and ability to earn money. The full quote is

Scott doesn't mention race, but it's an obvious implication[1], especially when quoting someone the NYT crowd views as anathema. I think Metz could have quoted that paragraph, and maybe given the NYT consensus view on him for anyone who didn't know, and readers would think very poorly of Scott[2].

I bring this up for a couple of reasons:

1. it seems in the spirit of Zack's post to point out when he made an error in presenting evidence.

2. it looks like Metz chose to play stupid symmetric warfare games, instead of the epistemically virtuous thing of sharing a direct quote. The quote should have gotten him what he wanted, so why be dishonest about it? I have some hypotheses, none of which lead me to trust Metz.

1. ^ ETA: If you hold the very common assumption that race is a good proxy for genetics. I disagree, but that is the default view.

2. ^ To be clear: that paragraph doesn't make me think poorly of Scott. I personally agree with Scott that genetics influences jobs and income. I like UBI for lots of reasons, including this one. If I read that paragraph I wouldn't find any of the views objectionable (although a little eyebrow raise that he couldn't find an example with a less toxic reputation - but I can't immediately think of another example that fits either).
Jiro (5h)
The reason that I can make a statement about journalists based on this is that the New York Times really is big and influential in the journalism profession. On the other hand, Poor Minorities aren't representative of poor minorities. Not only that, the poor minorities example is wrong in the first place. Even the restricted subset of poor minorities don't all want to steal your company's money. The motte-and-bailey statement isn't even true about the motte. You never even get to the point of saying something that's true about the motte but false about the bailey.

LessOnline

A Festival of Writers Who are Wrong on the Internet

May 31 - Jun 2, Berkeley, CA