In this post, I proclaim/endorse forum participation (aka commenting) as a productive research strategy that I've managed to stumble upon, and recommend it to others (at least to try). Note that this is different from saying that forum/blog posts are a good way for a research community to communicate. It's about individually doing better as researchers.

yanni, 1d
I like the fact that, although Kahneman and Vinge were not (relatively) young when they died, the LW banner states that they died "FAR TOO YOUNG". It points to the fact that death is always bad and/or that it is bad when people die while they are still making positive contributions to the world (Kahneman published "Noise" in 2021!).
Dictionary/SAE learning on model activations is a poor fit for anomaly detection because you need to train the dictionary on a dataset, which means the anomaly would already need to be in the training set. How to do dictionary learning without a dataset? One possibility is to use uncertainty-estimation-like techniques to detect when the model "thinks it's on-distribution" for randomly sampled activations.
habryka, 4d
A thing that I've been thinking about for a while has been to somehow make LessWrong into something that could give rise to more personal-wikis and wiki-like content. Gwern's writing has a very different structure and quality to it than the posts on LW, with the key components being that they get updated regularly and serve as more stable references for some concept, as opposed to a post which is usually anchored in a specific point in time. We have a pretty good wiki system for our tags, but never really allowed people to just make their personal wiki pages, mostly because there isn't really any place to find them. We could list the wiki pages you created on your profile, but that doesn't really seem like it would allocate attention to them successfully. I was thinking about this more recently as Arbital is going through another round of slowly rotting away (its search currently being broken and this being very hard to fix due to annoying Google App Engine restrictions) and thinking about importing all the Arbital content into LessWrong. That might be a natural time to do a final push to enable people to write more wiki-like content on the site.
People are arguing about the answer to the Sleeping Beauty problem! I thought this was pretty much dissolved with this post's title! But there are lengthy posts and even a prediction market! (Sleeping Beauty is an edge case where different reward structures are intuitively possible, and so people imagine different game payout structures behind the definition of “probability”. Once the payout structure is fixed, the confusion is gone. With a fixed payout structure and preference framework rewarding the number you output as “probability”, people don’t disagree about which number is best to output. Sleeping Beauty is about definitions.) And still, I see posts arguing that if a tree falls on a deaf Sleeping Beauty, in a forest with no one to hear it, it surely doesn’t produce a sound, because here’s how humans perceive sounds, which is the definition of a sound, and there are demonstrably no humans around the tree. (Or maybe that it surely produces the sound because here’s the physics of the sound waves, and the tree surely abides by the laws of physics, and there are demonstrably sound waves.) This is arguing about definitions. You feel strongly that “probability” is that thing that triggers the “probability” concept neuron in your brain. If people have a different concept triggering “this is probability”, you feel like they must be wrong, because they’re pointing at something they say is a sound and you say isn’t. Probability is something defined in math by necessity. There’s only one way to do it without getting exploited in natural betting schemes/reward structures that everyone accepts when there are no anthropics involved. But if there are multiple copies of the agent, there’s no longer a single possible betting scheme defining a single possible “probability”, and people draw the boundary/generalise differently in this situation.
You all should just call these two probabilities two different words instead of arguing which one is the correct definition for "probability".
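To make the "fix the payout structure and the disagreement goes away" point concrete, here is a minimal sketch. It assumes the standard setup (fair coin, one awakening on heads, two on tails) and a logarithmic scoring rule; the function names and the scoring rule are illustrative choices, not something taken from the posts being discussed.

```python
import math

def expected_log_score(p_heads, per_awakening):
    """Expected log score for reporting p_heads as P(heads).

    Assumed setup: fair coin; heads -> 1 awakening, tails -> 2 awakenings.
    per_awakening=True pays out at every awakening;
    per_awakening=False pays out once per experiment.
    """
    heads_payouts = 1
    tails_payouts = 2 if per_awakening else 1
    return (0.5 * heads_payouts * math.log(p_heads)
            + 0.5 * tails_payouts * math.log(1.0 - p_heads))

def best_report(per_awakening, steps=3000):
    """Grid-search the report that maximises the expected score."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda p: expected_log_score(p, per_awakening))

if __name__ == "__main__":
    print("scored at every awakening:", best_report(True))    # close to 1/3
    print("scored once per experiment:", best_report(False))  # close to 1/2
```

Under these assumptions, each payout structure has a single best number to report; the two numbers differ only because the structures differ.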
Novel Science is Inherently Illegible

Legibility, transparency, and open science are generally considered positive attributes, while opacity, elitism, and obscurantism are viewed as negative. However, increased legibility in science is not always beneficial and can often be detrimental. Scientific management, with some exceptions, likely underperforms compared to simpler heuristics such as giving money to smart people or implementing grant lotteries. Scientific legibility suffers from the classic "Seeing like a State" problems. It constrains endeavors to the least informed stakeholder, hinders exploration, inevitably biases research to be simple and myopic, and exposes researchers to a constant political tug-of-war between different interest groups, poisoning objectivity. I think the above would be considered relatively uncontroversial in EA circles. But I posit there is something deeper going on: Novel research is inherently illegible. If it were legible, someone else would have already pursued it. As science advances, her concepts become increasingly counterintuitive and further from common sense. Most of the legible low-hanging fruit has already been picked, and novel research requires venturing higher into the tree, pursuing illegible paths with indirect and hard-to-foresee impacts.

Recent Discussion

Summary: The post describes a method that allows us to use an untrustworthy optimizer to find satisficing outputs.

Acknowledgements: Thanks to Benjamin Kolb (@benjaminko), Jobst Heitzig (@Jobst Heitzig) and Thomas Kehrenberg (@Thomas Kehrenberg) for many helpful comments.

Introduction

Imagine you have black-box access to a powerful but untrustworthy optimizing system, the Oracle. What do I mean by "powerful but untrustworthy"? I mean that, when you give an objective function f as input to the Oracle, it will output an element x that has an impressively low[1] value of f(x). But sadly, you don't have any guarantee that it will output the optimal element and not, e.g., one that's also chosen for a different purpose (which might be dangerous for many reasons, e.g. instrumental convergence).

What questions can you safely ask the Oracle? Can you use it to...

Gerald Monroe, 21m (karma: 2)
For this particular situation, can you describe what the restriction would be in concrete terms? Is it "ok, write this compiler function to convert C arithmetic to bytecode. Declare any variables used at the latest valid location. Use only 3 registers."? And then elsewhere in the compiler the restriction might be "declare any variables used at the top of the main function and pass them by reference to any child functions. Use all available registers possible, and manually update the instruction pointer"?
Simon Fischer, 12m (karma: 1)
I'm not sure I understand your question. What restriction do you have in mind? A safety restriction on what the generated code should be like? Something like requiring that the code be in some canonical form to remove degrees of freedom for the (potentially malicious) code-generating AI?

I gave "changing canon randomly" in the comment you are replying to. Is this how you propose limiting the hostile AI's ability to inject subtle hostile plans? Similarly, "design the columns for this building. Oh, they must all be Roman arches" would be another example.

Lucius Bushnaq, 23m (karma: 2)
Typo fixed, thanks.

This reminds me of when Charlie Munger died at 99, and many said of him "he was just a child". Less of a nod to transhumanist aspirations, and more to how he retained his sparkling energy and curiosity up until death. There are quite a few good reasons to write "dead far too young". 

the gears to ascension, 9h (karma: 8)
I like it too, and because your comment made me think about it, I now kind of wish it said "orders of magnitude too young"

[This is part of a series I’m writing on how to convince a person that AI risk is worth paying attention to.] 

tl;dr: People’s default reaction to politics is not taking them seriously. They could center their entire personality on their political beliefs, and still not take them seriously. To get them to take you seriously, the quickest way is to make your words as unpolitical-seeming as possible. 

I’m a high school student in France. Politics in France are interesting because they’re in a confusing superposition. One second, you'll have bourgeois intellectuals sipping red wine from their Paris apartment writing essays with dubious sexual innuendos on the deep-running dynamics of power. The next, 400 farmers will vaguely agree with the sentiment and dump 20 tons of horse manure in downtown...

More French stories: So, at some point, the French decided what kind of political climate they wanted. What actions would reflect on their cause well? Dumping manure onto the city center using tractors? Sure! Lining up a hundred stationary taxi cabs in every main artery of the city? You bet! What about burning down the city hall's door, which is a work of art older than the United States? Mais évidemment!

"Politics" evokes all that in the mind of your average Frenchman. No, not sensible strategies that get your goals done, but the first shiny thing the prot... (read more)

Behold the dogit lens. Patch-level logit attribution is an emergent segmentation map.

Join our Discord here.

This article was written by Sonia Joseph, in collaboration with Neel Nanda, and incubated in Blake Richards’s lab at Mila and in the MATS community. Thank you to the Prisma core contributors, including Praneet Suresh, Rob Graham, and Yash Vadi. 

Full acknowledgements of contributors are at the end. I am grateful to my collaborators for their guidance and feedback.

Outline

  • Part One: Introduction and Motivation
  • Part Two: Tutorial Notebooks
  • Part Three: Brief ViT Overview
  • Part Four: Demo of Prisma’s Functionality
    • Key features, including logit attribution, attention head visualization, and activation patching.
    • Preliminary research results obtained using Prisma, including emergent segmentation maps and canonical attention heads.
  • Part Five: FAQ, including Key Differences between Vision and Language Mechanistic Interpretability
  • Part Six: Getting Started with Vision Mechanistic
...

Thanks for your comment. Some follow-up thoughts, especially regarding your second point:

There currently seems to be this implicit zeitgeist in the mech interp community that other modalities will simply be an extension or subcase of language. For example, a previous poster made the analogy about studying vision mech interp’s usefulness compared to mech interp’s: “Fusion power plants will need to be built in many countries, and it's increasing clear that fusion power plant construction can't only study building fusion power in the US.” The implicit assumpt... (read more)

Lots of people already know about Scott Alexander/ACX/SSC, but I think that crossposting to LW is unusually valuable in this particular case, since lots of people were waiting for a big Schelling-point overview of the 15-hour Rootclaim Lab Leak debate, and unlike LW, ACX's comment section is a massive vote-less swamp that lags the entire page and gives everyone equal status.

It remains unclear whether commenting there is worth your time if you think you have something worth saying, since there's no sorting, only sifting, implying that it attracts small numbers of sifters instead of large numbers of people who expect sorting.

Here are the first 11 paragraphs:

Saar Wilf is an ex-Israeli entrepreneur. Since 2016, he’s been developing a new form of reasoning, meant to transcend normal human bias.

His

...

"i ain't reading all that

with probability p i'm happy for u tho

and with probability 1-p sorry that happened"

Metacelsus, 3h (karma: 3)
I agree, I think the most likely version of the lab leak scenario does not involve an engineered virus. Personally I would say 60% chance zoonotic, 40% chance lab leak.
gwern, 4h (karma: 14)
My current initial impression is that this debate format was not fit for purpose: https://www.astralcodexten.com/p/practically-a-book-review-rootclaim/comment/52659890
trevor, 2h (karma: 2)
A debate sequel, with someone other than Peter Miller (but retaining and reevaluating all the evidence he got from various sources) would be nice. I can easily imagine Miller doing better work on other research topics that don't involve any possibility of cover ups or adversarial epistemics related to falsifiability, which seem to be personal issues for him in the case of lab leak at least. Maybe with 200k on the line to incentivize Saar to return, or to set up a team this time around? With the next round of challengers bearing in mind that Saar might be willing to stomach a net loss of many thousands of dollars in order to promote his show and methodology?
This is a linkpost for https://arxiv.org/abs/2403.09863

Hi, I’d like to share my paper that proposes a novel approach for building white box neural networks.

The paper introduces semantic features as a general technique for controlled dimensionality reduction, somewhat reminiscent of Hinton’s capsules and the idea of “inverse rendering”. In short, semantic features aim to capture the core characteristic of any semantic entity - having many possible states but being at exactly one state at a time. This results in regularization that is strong enough to make the PoC neural network inherently interpretable and also robust to adversarial attacks - despite no form of adversarial training! The paper may be viewed as a manifesto for a novel white-box approach to deep learning.

As an independent researcher I’d be grateful for your feedback!

Maciej Satkiewicz, 8h (karma: 2)
Thank you! The quote you picked is on point, I added an extended summary based on this, thanks for the suggestion!
mishka, 6h (karma: 1)
Thanks, this is very interesting. I wonder if this approach is extendable to learning to predict the next word from a corpus of texts... The first layer might perhaps still be embedding from words to vectors, but what should one do then? What would be a possible minimum viable dataset? Perhaps, in the spirit of PoC of the paper, one might consider binary sequences of 0s and 1s, and have only two words, 0 and 1, and ask what would it take to have a good predictor of the next 0 or 1 given a long sequence of those as a context. This might be a good starting point, and then one might consider different examples of that problem (different examples of (sets of) sequences of 0 and 1 to learn from).
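For reference, a minimal sketch of the kind of toy problem described above: predicting the next bit of a 0/1 sequence from a short context. This is a plain count-based baseline with Laplace smoothing, not the semantic-features approach from the paper; the function names and the context length are hypothetical choices for illustration.

```python
from collections import defaultdict

def fit_counts(bits, order=2):
    """Count how often 0 and 1 follow each context of length `order`.

    `bits` is a string of '0'/'1' characters.
    """
    counts = defaultdict(lambda: [0, 0])
    for i in range(order, len(bits)):
        context = bits[i - order:i]
        counts[context][int(bits[i])] += 1
    return counts

def predict_next(counts, context):
    """Estimate P(next bit = 1 | context) with add-one (Laplace) smoothing."""
    zeros, ones = counts.get(context, (0, 0))
    return (ones + 1) / (zeros + ones + 2)

if __name__ == "__main__":
    training = "01" * 20  # a perfectly alternating sequence
    model = fit_counts(training, order=2)
    print(predict_next(model, "10"))  # high: '1' always followed "10"
    print(predict_next(model, "01"))  # low: '0' always followed "01"
    print(predict_next(model, "11"))  # unseen context falls back to 0.5
```

A baseline like this gives something to compare against when trying richer sets of sequences.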

These are interesting considerations! I haven't put much thought on this yet but I have some preliminary ideas.

Semantic features are intended to capture meaning-preserving variations of structures. In that sense the "next word" problem seems ill-posed, as some permutations of words preserve meaning; in reality it's hardly a natural problem, even from the human perspective.

The question I'd ask here is "what are the basic semantic building blocks of text for us humans?" and then try to model these blocks using the machinery of semantic features, i.e. model the ... (read more)


About 15 years ago, I read Malcolm Gladwell's Outliers. He profiled Chris Langan, an extremely high-IQ person, claiming that he had only mediocre accomplishments despite his high IQ. Chris Langan's theory of everything, the Cognitive Theoretic Model of the Universe, was mentioned. I considered that it might be worth checking out someday.

Well, someday has happened, and I looked into CTMU, prompted by Alex Zhu (who also paid me for reviewing the work). The main CTMU paper is "The Cognitive-Theoretic Model of the Universe: A New Kind of Reality Theory".

CTMU has a high-IQ mystique about it: if you don't get it, maybe it's because your IQ is too low. The paper itself is dense with insights, especially the first part. It uses quite a lot of nonstandard terminology (partially...

Luckily we can train the AIs to give us answers optimized to sound plausible to humans.

Alex K. Chen (parrot), 2h (karma: 1)
I view a part of this as "optimizing the probability that the world is one that maximizes the probability of it enabling "God's mind" to faithfully model reality and operate at its best across all timescales". At minimum this means intelligence enhancement, human-brain symbiosis, microplastics/pollution reduction, reduction in brain aging rate, and reducing default mode noise (e.g. tFUS, loosening up all tied knots). The sooner we can achieve a harmonious front-to-end computation, the better (because memory and our ability to hold the most faithful/error-minimizing representation will decay). There is a precipice, a period of danger where our minds are vulnerable to non-globally-coherent/self-deceptive thoughts that could run their own incentives to self-destroy, but if we can get over this precipice, then the universe becomes more probabilistically likely to generate futures with our faithful values and thoughts. Some trade-offs involve difficult calculations with no clear answers (e.g. learning increases DNA error rates: https://twitter.com/gaurav_ven/status/1773415984931459160?t=8TChCcEfRzH60z0W1bCClQ&s=19 ), and others are the "urgency vs verifiability tradeoff" and the accel vs decel debate. But there are still numerous Pareto-efficient improvements, and the sooner we do them (like semaglutide, canagliflozin, microplastic/pollution reduction, pain reduction, factoring out historic debt, QRI stuff), the higher the chances of ultimate alignment of "God's thought". It's interesting that the god of formal verification, davidad, is also concerned about microplastics. Possibly relevant people: Sam Altman has this to say: https://archive.ph/G7VVt#selection-1607.0-1887.9 Bobby Azarian has a wonderful related book, "The Romance of Reality". https://www.informationphilosopher.com/solutions/scientists/layzer/ Maybe slightly related: https://twitter.com/shw0rma/status/1771212311753048135?t=qZx3U2PyFxiVCk8NBOjWqg&s=19 https://x.com/VictorTaelin?t=mPe_Or
Richard_Kennaway, 4h (karma: 4)
Exploring this on the web, I turned up a couple of related Substacks: Chris Langan's Ultimate Reality and TELEOLOGIC: CTMU Teleologic Living. The latter isn't just Chris Langan, a Dr Gina Langan is also involved. A lot of it requires a paid subscription, which for me would come lower in priority than all the definitely worthwhile blogs I also don't feel like paying for. Warning: there's a lot of conspiracy stuff there as well (Covid, "Global Occupation Government", etc.). Perhaps this 4-hour interview on "IQ, Free Will, Psychedelics, CTMU, & God" may give some further sense of his thinking. Googling "CTMU Core Affirmations" turns up a rich vein of ... something, including the CTMU Radio YouTube channel.
jessicata, 8h (karma: 7)
I don't see any. He even says his approach “leaves the current picture of reality virtually intact”. In Popper's terms this would be metaphysics, not science, which is part of why I'm skeptical of the claimed applications to quantum mechanics and so on. Note that, while there's a common interpretation of Popper saying metaphysics is meaningless, he contradicts this. Quoting Popper:

Let us start with a (non-quantum) logical coinflip - say, look at the heretofore-unknown-to-us-personally 256th binary digit of pi, where the choice of binary digit is itself intended not to be random.

If the result of this logical coinflip is 1 (aka "heads"), we'll create 18 of you in green rooms and 2 of you in red rooms, and if the result is "tails" (0), we'll create 2 of you in green rooms and 18 of you in red rooms.

After going to sleep at the start of the experiment, you wake up in a green room.

With what degree of credence do you believe - what is your posterior probability - that the logical coin came up "heads"?

There are exactly two tenable answers that I can see, "50%" and...
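As a worked check on the numbers in this setup, here is a small sketch of the Bayesian update you get if you treat "I woke up in a green room" as evidence, i.e. if you assume you are equally likely to be any of the 20 copies. That sampling assumption is exactly what is in dispute, so this shows one candidate calculation rather than the settled answer; the function and parameter names are illustrative.

```python
from fractions import Fraction

def posterior_heads(green_if_heads=18, green_if_tails=2, copies=20,
                    prior_heads=Fraction(1, 2)):
    """P(heads | I am in a green room), assuming I am a uniformly random copy."""
    like_heads = Fraction(green_if_heads, copies)  # P(green | heads)
    like_tails = Fraction(green_if_tails, copies)  # P(green | tails)
    numerator = prior_heads * like_heads
    return numerator / (numerator + (1 - prior_heads) * like_tails)

if __name__ == "__main__":
    print(posterior_heads())  # 9/10
```

Refusing to update on the room colour at all leaves you at the prior of 1/2, which is the "50%" answer.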

“You generalise probability, when anthropics are involved, to probability-2, and say a number defined by probability-2; so I’ll suggest to you a reward structure that rewards agents that say probability-1 numbers. Huh, if you still say the probability-2 number, you lose”.

This reads to me like, “You say there’s 70% chance no one will be around that falling tree to hear it, so you’re 70% sure there won’t be any sound. But I want to bet sound is much more likely; we can measure the sound waves, and I’m 95% sure our equipment will register the sound. Wanna bet?”

This is my personal opinion, and in particular, does not represent anything like a MIRI consensus; I've gotten push-back from almost everyone I've spoken with about this, although in most cases I believe I eventually convinced them of the narrow terminological point I'm making.

In the AI x-risk community, I think there is a tendency to ask people to estimate "time to AGI" when what is meant is really something more like "time to doom" (or, better, point-of-no-return). For about a year, I've been answering this question "zero" when asked.

This strikes some people as absurd or at best misleading. I disagree.

The term "Artificial General Intelligence" (AGI) was coined in the early 00s, to contrast with the prevalent paradigm of Narrow AI. I was getting my undergraduate computer science...

I very much agree with this. You're not the only one! I've been thinking for a while that actually, AGI is here (by all previous definitions of AGI). 

Furthermore, I want to suggest that the people who are saying we don't yet have AGI will in fact never be satisfied by what an AI does. The reason is this: An AI will never ever act like a human. By the time its ability to do basic human things like speak and drive is up to human standards (already happened), its abilities in other areas, like playing computer games and calculating, will far exceed ours... (read more)

Gerald Monroe, 3h (karma: 4)
Yes, I agree. Whenever I think of things like this I focus on how what matters in the sense of "when will agi be transformational" is the idea of criticality.

I have written on it earlier but the simple idea is that our human world changes rapidly when AI capabilities in some way lead to more AI capabilities at a fast rate.

Like this whole "is this AGI" thing is totally irrelevant, all that matters is criticality. You can imagine subhuman systems using AGI reaching criticality, and superhuman systems being needed. (Note ordinary humans do have criticality albeit with a doubling time of about 20 years)

There are many forms of criticality, and the first one unlocked that won't quench easily starts the singularity.

Examples:

  • Investment criticality: each AI demo leads to more investment than the total cost, including failures at other companies, to produce the demo. Quenches if investors run out of money or find a better investment sector.
  • Financial criticality: AI services delivered by AI bring in more than they cost in revenue, and each reinvestment effectively has a greater than 10 percent ROI. This quenches once further reinvestments in AI don't pay for themselves.
  • Partial self replication criticality: Robots can build most of the parts used in themselves, I use post 2020 automation. This quenches at the new equilibrium determined by the percent of automation. Aka 90 percent automation makes each human worker left 10 times as productive so we quench at 10x number of robots possible if every worker on earth was building robots.
  • Full self replication criticality: this quenches when matter mineable in the solar system is all consumed and made into either more robots or waste piles.
  • AI research criticality: AI systems research and develop better AI systems. Quenches when you find the most powerful AI the underlying compute and data can support.

You may notice 2 are satisfied, one eoy 2022, one later 2023. So in that sense the Singularity began and will accel
abramdemski, 5h (karma: 2)
I haven't watched the LeCun interview you reference (it is several hours long, so relevant time-stamps to look at would be appreciated), but this still does not make sense to me -- backprop already seems like a way to constantly predict future experience and update, particularly as it is employed in LLMs. Generating predictions first and then updating based on error is how backprop works. Some form of closeness measure is required, just like you emphasize.
cubefox, 4h (karma: 1)
Well, backpropagation alone wasn't even enough to make efficient LLMs feasible. It took decades, till the invention of transformers, to make them work. Similarly, knowing how to make LLMs is not yet sufficient to implement predictive coding. LeCun talks about the problem in a short section here from 10:55 to 14:19.
