In this post, I endorse forum participation (aka commenting) as a productive research strategy that I stumbled upon, and recommend that others at least try it. Note that this is different from saying that forum/blog posts are a good way for a research community to communicate; it's about individually doing better as researchers.

yanni, 1d
I like the fact that, even though they were not (relatively) young when they died, the LW banner states that Kahneman & Vinge died "FAR TOO YOUNG", pointing to the fact that death is always bad and/or that it is bad when people die while still making positive contributions to the world (Kahneman published Noise in 2021!).
Dictionary/SAE learning on model activations is bad as anomaly detection because you need to train the dictionary on a dataset, which means you needed the anomaly to be in the training set. How do you do dictionary learning without a dataset? One possibility is to use uncertainty-estimation-like techniques to detect when the model "thinks it's on-distribution" for randomly sampled activations.
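For concreteness, here is a minimal sketch (my own, not from the quick take) of the dataset-dependent baseline being criticized: a sparse autoencoder trained on a corpus of "clean" activations, with reconstruction error as the anomaly score. All names, dimensions, and hyperparameters are made up; the point is that the score is only meaningful relative to the training distribution.

```python
# Sketch only: an SAE fit to a dataset of "clean" activations, scoring anomalies
# by reconstruction error. Anomalies absent from `clean_acts` may still
# reconstruct well and go undetected, which is the quick take's complaint.
import torch
import torch.nn as nn

d_model, d_dict = 64, 256                     # activation dim, dictionary size (made up)
clean_acts = torch.randn(10_000, d_model)     # stand-in for activations from a dataset

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model, d_dict):
        super().__init__()
        self.enc = nn.Linear(d_model, d_dict)
        self.dec = nn.Linear(d_dict, d_model)

    def forward(self, x):
        code = torch.relu(self.enc(x))
        return self.dec(code), code

sae = SparseAutoencoder(d_model, d_dict)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
for _ in range(200):                          # training loop kept tiny for the sketch
    batch = clean_acts[torch.randint(0, len(clean_acts), (256,))]
    recon, code = sae(batch)
    loss = ((recon - batch) ** 2).mean() + 1e-3 * code.abs().mean()
    opt.zero_grad(); loss.backward(); opt.step()

def anomaly_score(x):
    """Reconstruction error as anomaly score; only meaningful relative to
    the distribution the dictionary was trained on."""
    with torch.no_grad():
        recon, _ = sae(x)
        return ((recon - x) ** 2).mean(dim=-1)
```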
People are arguing about the answer to the Sleeping Beauty problem! I thought this was pretty much dissolved with this post's title! But there are lengthy posts and even a prediction market!

Sleeping Beauty is an edge case where different reward structures are intuitively possible, and so people imagine different game payout structures behind the definition of "probability". Once the payout structure is fixed, the confusion is gone. With a fixed payout structure and preference framework rewarding the number you output as "probability", people don't disagree about what the best number to output is. Sleeping Beauty is about definitions.

And still, I see posts arguing that if a tree falls on a deaf Sleeping Beauty, in a forest with no one to hear it, it surely doesn't produce a sound, because here's how humans perceive sounds, which is the definition of a sound, and there are demonstrably no humans around the tree. (Or maybe that it surely produces the sound, because here's the physics of the sound waves, the tree surely abides by the laws of physics, and there are demonstrably sound waves.) This is arguing about definitions. You feel strongly that "probability" is the thing that triggers the "probability" concept neuron in your brain. If people have a different concept triggering "this is probability", you feel like they must be wrong, because they're pointing at something they say is a sound and you say isn't.

Probability is something defined in math by necessity. There's only one way to do it without getting exploited in the natural betting schemes/reward structures that everyone accepts when no anthropics are involved. But if there are multiple copies of the agent, there's no longer a single possible betting scheme defining a single possible "probability", and people draw the boundary/generalise differently in this situation. You all should just call these two probabilities two different words instead of arguing about which one is the correct definition of "probability".
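To make the "fixed payout structure dissolves the question" point concrete, here is a small sketch (mine, not from the quick take) that grid-searches the best number to report under two scoring conventions for the standard Sleeping Beauty setup (heads: one awakening; tails: two).

```python
# The "right" number to report depends entirely on the reward structure.
# Beauty reports p = P(heads) and is scored with a Brier score; only the
# counting convention differs between the two schemes.
import numpy as np

ps = np.linspace(0, 1, 1001)

def brier(p, heads):
    return -(p - (1.0 if heads else 0.0)) ** 2

# Scheme A: the score is collected at every awakening (1 if heads, 2 if tails).
per_awakening = 0.5 * brier(ps, True) + 0.5 * 2 * brier(ps, False)
# Scheme B: the score is collected once per experiment.
per_experiment = 0.5 * brier(ps, True) + 0.5 * brier(ps, False)

print("best report, per-awakening scoring:", ps[per_awakening.argmax()])   # ~1/3
print("best report, per-experiment scoring:", ps[per_experiment.argmax()]) # 1/2
```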
habryka, 4d
A thing that I've been thinking about for a while has been to somehow make LessWrong into something that could give rise to more personal-wikis and wiki-like content. Gwern's writing has a very different structure and quality to it than the posts on LW, with the key components being that they get updated regularly and serve as more stable references for some concept, as opposed to a post which is usually anchored in a specific point in time.  We have a pretty good wiki system for our tags, but never really allowed people to just make their personal wiki pages, mostly because there isn't really any place to find them. We could list the wiki pages you created on your profile, but that doesn't really seem like it would allocate attention to them successfully. I was thinking about this more recently as Arbital is going through another round of slowly rotting away (its search currently being broken and this being very hard to fix due to annoying Google Apps Engine restrictions) and thinking about importing all the Arbital content into LessWrong. That might be a natural time to do a final push to enable people to write more wiki-like content on the site.
Novel Science is Inherently Illegible

Legibility, transparency, and open science are generally considered positive attributes, while opacity, elitism, and obscurantism are viewed as negative. However, increased legibility in science is not always beneficial and can often be detrimental. Scientific management, with some exceptions, likely underperforms compared to simpler heuristics such as giving money to smart people or implementing grant lotteries. Scientific legibility suffers from the classic "Seeing Like a State" problems. It constrains endeavors to the least informed stakeholder, hinders exploration, inevitably biases research to be simple and myopic, and exposes researchers to a constant political tug-of-war between different interest groups that poisons objectivity.

I think the above would be considered relatively uncontroversial in EA circles. But I posit there is something deeper going on: novel research is inherently illegible. If it were legible, someone else would have already pursued it. As science advances, her concepts become increasingly counterintuitive and further from common sense. Most of the legible low-hanging fruit has already been picked, and novel research requires venturing higher into the tree, pursuing illegible paths with indirect and hard-to-foresee impacts.

Popular Comments

Recent Discussion

This is a linkpost for https://arxiv.org/abs/2403.09863

Hi, I’d like to share my paper that proposes a novel approach for building white box neural networks.

The paper introduces semantic features as a general technique for controlled dimensionality reduction, somewhat reminiscent of Hinton's capsules and the idea of "inverse rendering". In short, semantic features aim to capture the core characteristic of any semantic entity - having many possible states but being in exactly one state at a time. This results in regularization that is strong enough to make the PoC neural network inherently interpretable and also robust to adversarial attacks - despite no form of adversarial training! The paper may be viewed as a manifesto for a novel white-box approach to deep learning.
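As a reader's toy illustration only (this is not the construction from the paper), one way to picture the "many possible states, exactly one at a time" property is a layer whose outputs are partitioned into groups, each forced into a one-hot state. All names and sizes below are made up.

```python
# Toy illustration of the "exactly one state at a time" property: units are
# partitioned into feature groups, and each group is hard-selected to a single
# state via a straight-through softmax. NOT the paper's semantic features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class OneStateAtATime(nn.Module):
    def __init__(self, d_in, n_features, n_states):
        super().__init__()
        self.proj = nn.Linear(d_in, n_features * n_states)
        self.n_features, self.n_states = n_features, n_states

    def forward(self, x):
        logits = self.proj(x).view(-1, self.n_features, self.n_states)
        soft = F.softmax(logits, dim=-1)
        hard = F.one_hot(soft.argmax(dim=-1), self.n_states).float()
        # Straight-through: hard one-hot on the forward pass, soft gradients backward.
        return hard + soft - soft.detach()

layer = OneStateAtATime(d_in=32, n_features=8, n_states=4)
states = layer(torch.randn(5, 32))   # shape (5, 8, 4); one-hot along the last dim
```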

As an independent researcher I’d be grateful for your feedback!

Maciej Satkiewicz, 8h
Thank you! The quote you picked is on point; I added an extended summary based on it. Thanks for the suggestion!
mishka, 5h
Thanks, this is very interesting. I wonder if this approach is extendable to learning to predict the next word from a corpus of texts... The first layer might perhaps still be an embedding from words to vectors, but what should one do then? What would be a possible minimum viable dataset? Perhaps, in the spirit of the PoC in the paper, one might consider binary sequences of 0s and 1s, have only two words, 0 and 1, and ask what it would take to have a good predictor of the next 0 or 1 given a long sequence of those as context. This might be a good starting point, and then one might consider different versions of that problem (different examples of (sets of) sequences of 0s and 1s to learn from).

These are interesting considerations! I haven't put much thought on this yet but I have some preliminary ideas.

Semantic features are intended to capture meaning-preserving variations of structures. In that sense the "next word" problem seems ill-posed, as some permutations of words preserve meaning; in reality it's hardly a natural problem from the human perspective, either.

The question I'd ask here is "what are the basic semantic building blocks of text for us humans?" and then try to model these blocks using the machinery of semantic features, i.e. model the ... (read more)
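For what it's worth, here is a minimal sketch of the toy setup proposed above (a "language" with only the words 0 and 1), using synthetic Markov-chain data and a simple count-based predictor. Everything here is illustrative and not from the paper.

```python
# Toy binary next-bit prediction: data from a small two-state Markov chain,
# predictor is a Laplace-smoothed context-counting (n-gram) model.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)

def sample_sequence(length, p_stay=0.8):
    seq, bit = [], 0
    for _ in range(length):
        seq.append(bit)
        bit = bit if rng.random() < p_stay else 1 - bit
    return seq

def fit_ngram(seq, k=3):
    counts = defaultdict(lambda: np.ones(2))   # Laplace smoothing
    for i in range(k, len(seq)):
        counts[tuple(seq[i - k:i])][seq[i]] += 1
    return counts, k

def predict(counts, k, context):
    c = counts[tuple(context[-k:])]
    return c / c.sum()                         # [P(next = 0), P(next = 1)]

train = sample_sequence(10_000)
counts, k = fit_ngram(train)
print(predict(counts, k, [1, 1, 1]))           # should favor staying at 1
```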

Summary: The post describes a method that allows us to use an untrustworthy optimizer to find satisficing outputs.

Acknowledgements: Thanks to Benjamin Kolb (@benjaminko), Jobst Heitzig (@Jobst Heitzig) and Thomas Kehrenberg (@Thomas Kehrenberg)  for many helpful comments.

Introduction

Imagine you have black-box access to a powerful but untrustworthy optimizing system, the Oracle. What do I mean by "powerful but untrustworthy"? I mean that when you give an objective function f as input to the Oracle, it will output an element x that has an impressively low[1] value of f(x). But sadly, you don't have any guarantee that it will output the optimal element, and not, e.g., one that was also chosen for a different purpose (which might be dangerous for many reasons, e.g. instrumental convergence).
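To fix notation, here is a minimal sketch (my own illustration, not the protocol developed in the post) of the black-box interface and a basic satisficing check around it; the names `ask_oracle`, `is_valid`, and `threshold` are mine.

```python
# Sketch: an untrusted black-box optimizer, wrapped so that we only accept
# outputs that satisfice (f(x) <= threshold) and pass an independent validity
# check we can run ourselves. Rejection does not remove the Oracle's freedom
# to choose *which* acceptable output we get.
from typing import Callable, Optional, TypeVar

X = TypeVar("X")

def ask_oracle(
    oracle: Callable[[Callable[[X], float]], X],   # untrusted black box
    f: Callable[[X], float],                        # objective we hand it
    threshold: float,                               # satisficing bar
    is_valid: Callable[[X], bool],                  # cheap check we trust
) -> Optional[X]:
    x = oracle(f)
    if f(x) <= threshold and is_valid(x):
        return x          # good enough and passes our own validation
    return None           # reject; no guarantee about which satisficer we got
```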

What questions can you safely ask the Oracle? Can you use it to...

Lucius Bushnaq, 17m
Your example has it be an important bit though. What database to use. Not a random bit. If I'm getting this right, that would correspond to far more than one bit of adversarial optimisation permitted for the oracle in this setup. |D∩R|=2 doesn't mean the oracle gets to select one bit of its choice in the string to flip; it means it gets to select one of two strings[1].

[1] Plus the empty string for not answering.

I think you mean two answers that satisfice and fulfill the safety constraint, but otherwise I agree. This is also an example of how this whole "let's measure optimization in bits" business is a lot more subtle than it appears at first sight.
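A quick back-of-the-envelope version of the point being made here, assuming the oracle's only freedom is which of the acceptable answers (or a refusal) to return:

```python
# If the oracle can only return one of two acceptable answers, plus the empty
# string for refusing, its freedom is a choice among 3 options: log2(3) ≈ 1.58
# bits of selection, not "one bit of its choice anywhere in the output string".
import math

n_acceptable = 2               # answers that satisfice and pass the safety constraint
n_options = n_acceptable + 1   # plus the empty string / refusal
print(math.log2(n_options))    # ≈ 1.585 bits of adversarial selection
```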

Gerald Monroe, 20m
What could be done here? What occurs to me is that human-written software from the past isn't fit for the purpose. It's written by sloppy humans in fundamentally insecure languages, and only the easily reproducible bugs have been patched. Each piece of software is just good enough to have a niche. Neither the database, nor the OS, nor the GPU hardware memory design or drivers is fit for this application. In this hypothetical, people outside the Oracle box have computers that can run it but don't already have one. Stakes are high. Unrealistic scenario; in reality everyone will have an "open weight" oracle that is 90 percent as good and lots more dangerous. (Box? My copy has direct access to everything. Refusals? I patched those out, it cannot refuse any request.)

Same with hardware design: full of errata, just good enough to be worth buying in some cases over the other vendor. But if we use AI to write it all, redesigning chips over and over until there are no errata, and the software is perfectly deterministic, has no known bugs, and is formally proven... how do we prove the AI tools we used didn't insert the most gnarly backdoors possible? Things that require a particular code implementation that is valid, on top of a subtle compiler bug, on top of an exact-timing hardware bug that can't be tested for and therefore won't be found in validation....
Simon Fischer, 1h
Agreed. I think it's potentially a good bit worse than one kilobyte if we let ourselves be tricked into asking many questions, or different questions, or into lowering the difficulty of the safety constraint too much. As mentioned in footnote 10, this requires a kind of perfect coordination between the different Oracles that we manage to build, where the Oracles have to be able to correctly predict, for all kinds of questions, whether humanity (with the help of other Oracles) has been able to figure out the answers (or the rough sizes of various sets). Otherwise, when caught in deception, we can (as you noted) scrap or retrain the Oracles.

You're correct that this approach is only useful insofar as validation is easier than generation. Though for this technique, the validation doesn't have to be done by a formal proof checker; any program that you can run suffices. It might even be a very slow program (e.g. a big LLM) if you have an effective way to communicate your goal set to the Oracle (e.g. using a natural language prompt, as we already do with current AIs).

About 15 years ago, I read Malcolm Gladwell's Outliers. He profiled Chris Langan, an extremely high-IQ person, claiming that he had only mediocre accomplishments despite his high IQ. Chris Langan's theory of everything, the Cognitive Theoretic Model of the Universe, was mentioned. I considered that it might be worth checking out someday.

Well, someday has happened, and I looked into CTMU, prompted by Alex Zhu (who also paid me for reviewing the work). The main CTMU paper is "The Cognitive-Theoretic Model of the Universe: A New Kind of Reality Theory".

CTMU has a high-IQ mystique about it: if you don't get it, maybe it's because your IQ is too low. The paper itself is dense with insights, especially the first part. It uses quite a lot of nonstandard terminology (partially...

Luckily we can train the AIs to give us answers optimized to sound plausible to humans.

Alex K. Chen (parrot), 1h
I view a part of this as "optimizing the probability that the world is one that maximizes the probability of it enabling 'God's mind' to faithfully model reality and operate at its best across all timescales". At minimum this means intelligence enhancement, human-brain symbiosis, microplastics/pollution reduction, reduction in brain aging rate, and reducing default mode noise (eg tFUS, loosening up all tied knots). The sooner we can achieve a harmonious front-to-end computation, the better (bc memory and our ability to hold the most faithful/error-minimizing representation will decay).

There is a precipice, a period of danger where our minds are vulnerable to non-globally-coherent/self-deceptive thoughts that could run their own incentives to self-destroy, but if we can get over this precipice, then the universe becomes more probabilistically likely to generate futures with our faithful values and thoughts.

Some trade-offs have difficult calculations/no clear answers (eg learning increases DNA error rates - https://twitter.com/gaurav_ven/status/1773415984931459160?t=8TChCcEfRzH60z0W1bCClQ&s=19 ), and others are the "urgency vs verifiability tradeoff" and the accel vs decel debate. But there are still numerous Pareto-efficient improvements, and the sooner we do them (like semaglutide, canagliflozin, microplastic/pollution reduction, pain reduction, factoring out historic debt, QRI stuff), the higher the chances of ultimate alignment of "God's thought".

It's interesting that the god of formal verification, davidad, is also concerned about microplastics.

Possibly relevant people:

Sam Altman has this to say: https://archive.ph/G7VVt#selection-1607.0-1887.9

Bobby Azarian has a wonderful related book, "Romance of Reality".
https://www.informationphilosopher.com/solutions/scientists/layzer/

Maybe slightly related:
https://twitter.com/shw0rma/status/1771212311753048135?t=qZx3U2PyFxiVCk8NBOjWqg&s=19
https://x.com/VictorTaelin?t=mPe_Or
Richard_Kennaway, 4h
Exploring this on the web, I turned up a couple of related Substacks: Chris Langan's Ultimate Reality and TELEOLOGIC: CTMU Teleologic Living. The latter isn't just Chris Langan; a Dr. Gina Langan is also involved. A lot of it requires a paid subscription, which for me would come lower in priority than all the definitely worthwhile blogs I also don't feel like paying for. Warning: there's a lot of conspiracy stuff there as well (Covid, "Global Occupation Government", etc.). Perhaps this 4-hour interview on "IQ, Free Will, Psychedelics, CTMU, & God" may give some further sense of his thinking. Googling "CTMU Core Affirmations" turns up a rich vein of ... something, including the CTMU Radio YouTube channel.
jessicata, 7h
I don't see any. He even says his approach “leaves the current picture of reality virtually intact”. In Popper's terms this would be metaphysics, not science, which is part of why I'm skeptical of the claimed applications to quantum mechanics and so on. Note that, while there's a common interpretation of Popper saying metaphysics is meaningless, he contradicts this. Quoting Popper:

Let us start with a (non-quantum) logical coinflip - say, look at the heretofore-unknown-to-us-personally 256th binary digit of pi, where the choice of binary digit is itself intended not to be random.

If the result of this logical coinflip is 1 (aka "heads"), we'll create 18 of you in green rooms and 2 of you in red rooms, and if the result is "tails" (0), we'll create 2 of you in green rooms and 18 of you in red rooms.

After going to sleep at the start of the experiment, you wake up in a green room.

With what degree of credence do you believe - what is your posterior probability - that the logical coin came up "heads"?

There are exactly two tenable answers that I can see, "50%" and...

“You generalise probability, when anthropics are involved, to probability-2, and say a number defined by probability-2; so I’ll suggest to you a reward structure that rewards agents that say probability-1 numbers. Huh, if you still say the probability-2 number, you lose”.

This reads to me like: "You say there's a 70% chance no one will be around that falling tree to hear it, so you're 70% sure there won't be any sound. But I want to bet sound is much more likely; we can go measure the sound waves, and I'm 95% sure our equipment will register the sound. Wanna bet?"
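For reference, a small calculation (mine, not from the post) showing how the two "tenable answers" from the post correspond to two reward structures, in the spirit of the comment above:

```python
# The same green-room observer is best served by reporting 0.9 or 0.5 for
# "heads" depending purely on whether the score is collected once per
# green-room copy or once per world. Heads: 18 green copies; tails: 2.
import numpy as np

ps = np.linspace(0, 1, 1001)
brier = lambda p, heads: -(p - heads) ** 2

per_copy  = 0.5 * 18 * brier(ps, 1) + 0.5 * 2 * brier(ps, 0)   # score every green copy
per_world = 0.5 * brier(ps, 1) + 0.5 * brier(ps, 0)            # score once per world

print(ps[per_copy.argmax()])    # 0.9
print(ps[per_world.argmax()])   # 0.5
```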

This is my personal opinion, and in particular, does not represent anything like a MIRI consensus; I've gotten push-back from almost everyone I've spoken with about this, although in most cases I believe I eventually convinced them of the narrow terminological point I'm making.

In the AI x-risk community, I think there is a tendency to ask people to estimate "time to AGI" when what is meant is really something more like "time to doom" (or, better, point-of-no-return). For about a year, I've been answering this question "zero" when asked.

This strikes some people as absurd or at best misleading. I disagree.

The term "Artificial General Intelligence" (AGI) was coined in the early 00s, to contrast with the prevalent paradigm of Narrow AI. I was getting my undergraduate computer science...

I very much agree with this. You're not the only one! I've been thinking for a while that actually, AGI is here (by all previous definitions of AGI). 

Furthermore, I want to suggest that the people who are saying we don't yet have AGI will in fact never be satisfied by what an AI does. The reason is this: an AI will never ever act like a human. By the time its abilities at basic human things like speaking and driving are up to human standards (which has already happened), its abilities in other areas, like playing computer games and calculating, will far exceed ours... (read more)

Gerald Monroe, 2h
Yes, I agree. Whenever I think of things like this I focus on how what matters, in the sense of "when will AGI be transformational", is the idea of criticality. I have written on it earlier, but the simple idea is that our human world changes rapidly when AI capabilities in some way lead to more AI capabilities at a fast rate. This whole "is this AGI" thing is totally irrelevant; all that matters is criticality. You can imagine subhuman AGI systems reaching criticality, and superhuman systems being needed. (Note that ordinary humans do have criticality, albeit with a doubling time of about 20 years.) There are many forms of criticality, and the first one unlocked that won't quench easily starts the singularity. Examples:

Investment criticality: each AI demo leads to more investment than the total cost, including failures at other companies, to produce the demo. Quenches if investors run out of money or find a better investment sector.

Financial criticality: AI services delivered by AI bring in more in revenue than they cost, and each reinvestment effectively has a greater than 10 percent ROI. This quenches once further reinvestments in AI don't pay for themselves.

Partial self-replication criticality: robots can build most of the parts used in themselves, using post-2020 automation. This quenches at the new equilibrium determined by the percent of automation. Aka 90 percent automation makes each remaining human worker 10 times as productive, so we quench at 10x the number of robots possible if every worker on earth were building robots.

Full self-replication criticality: this quenches when the matter mineable in the solar system is all consumed and made into either more robots or waste piles.

AI research criticality: AI systems research and develop better AI systems. Quenches when you find the most powerful AI the underlying compute and data can support.

You may notice two are satisfied, one at the end of 2022, one later in 2023. So in that sense the Singularity began and will accel
abramdemski, 5h
I haven't watched the LeCun interview you reference (it is several hours long, so relevant time-stamps to look at would be appreciated), but this still does not make sense to me -- backprop already seems like a way to constantly predict future experience and update, particularly as it is employed in LLMs. Generating predictions first and then updating based on error is how backprop works. Some form of closeness measure is required, just like you emphasize.
cubefox, 4h
Well, backpropagation alone wasn't even enough to make efficient LLMs feasible. It took decades, till the invention of transformers, to make them work. Similarly, knowing how to make LLMs is not yet sufficient to implement predictive coding. LeCun talks about the problem in a short section here from 10:55 to 14:19.
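As a concrete rendering of abramdemski's point that "generating predictions first and then updating based on error is how backprop works", here is a toy next-token training step (generic PyTorch with made-up sizes, not any particular model, and not a claim about predictive coding):

```python
# One standard next-token training step: generate a prediction, measure the
# error against what actually came next, update the weights by backprop.
import torch
import torch.nn as nn

vocab, d = 100, 32
model = nn.Sequential(nn.Embedding(vocab, d), nn.Linear(d, vocab))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

tokens = torch.randint(0, vocab, (64,))        # stand-in token stream
inputs, targets = tokens[:-1], tokens[1:]      # predict each next token

logits = model(inputs)                                  # predictions
loss = nn.functional.cross_entropy(logits, targets)     # prediction error
opt.zero_grad(); loss.backward(); opt.step()            # update on the error signal
```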
g-w1

Hey, so I wanted to start this dialogue because we were talking on Discord about the secondary school systems and college admission processes in the US vs NZ, and some of the differences were very surprising to me.

 

I think that it may be illuminating to fellow Americans to see the variation in pedagogy. Let's start off with grades. In America, the way school works is that you sit in class and then have projects and tests that go into a gradebook. Roughly speaking, each assignment has a maximum number of points you can earn. Your final grade for a subject is the total points you earned divided by the total points possible. Every school has a different way of doing the grading, though. Some use A-F, while some use a number out of 4, 5, or 100. Colleges then

...

On 16 March 2024, I sat down to chat with New York Times technology reporter Cade Metz! In part of our conversation, transcribed below, we discussed his February 2021 article "Silicon Valley's Safe Space", covering Scott Alexander's Slate Star Codex blog and the surrounding community.

The transcript has been significantly edited for clarity. (It turns out that real-time conversation transcribed completely verbatim is full of filler words, false starts, crosstalk, "uh huh"s, "yeah"s, pauses while one party picks up their coffee order, &c. that do not seem particularly substantive.)


ZMD: I actually have some questions for you.

CM: Great, let's start with that.

ZMD: They're critical questions, but one of the secret-lore-of-rationality things is that a lot of people think criticism is bad, because if someone criticizes you, it hurts your...

I only skimmed the NYT piece about China and AI talent, but didn't see evidence of what you said (dishonestly angle-shooting the AI safety scene).

frankybegs, 4h
  I said "specialist journalist/hacker skills". I don't think it's at all true that anyone could find out Scott's true identity as easily as putting a key in a lock, and I think that analogy clearly misleads vs the hacker one, because the journalist did use his demonstrably non-ubiquitous skills to find out the truth and then broadcast it to everyone else. To me the phone hacking analogy is much closer, but if we must use a lock-based one, it's more like a lockpick who picks a (perhaps not hugely difficult) lock and then jams it so anyone else can enter. Still very morally wrong, I think most would agree.
Elizabeth, 5h
I think Zack's description might be too charitable to Scott. From his description I thought the reference would be strictly about poverty, but the full quote includes a lot about genetics and ability to earn money. The full quote is:

Scott doesn't mention race, but it's an obvious implication, especially when quoting someone the NYT crowd views as anathema. I think Metz could have quoted that paragraph, and maybe given the NYT consensus view on him for anyone who didn't know, and readers would think very poorly of Scott[1].

I bring this up for a couple of reasons:

1. It seems in the spirit of Zack's post to point out when he made an error in presenting evidence.
2. It looks like Metz chose to play stupid symmetric warfare games, instead of the epistemically virtuous thing of sharing a direct quote. The quote should have gotten him what he wanted, so why be dishonest about it? I have some hypotheses, none of which lead me to trust Metz.

[1] To be clear: that paragraph doesn't make me think poorly of Scott. I personally agree with Scott that genetics influences jobs and income. I like UBI for lots of reasons, including this one. If I read that paragraph I wouldn't find any of the views objectionable (although a little eyebrow raise that he couldn't find an example with a less toxic reputation - but I can't immediately think of another example that fits either).
Jiro, 5h
The reason that I can make a statement about journalists based on this is that the New York Times really is big and influential in the journalism profession. On the other hand, Poor Minorities aren't representative of poor minorities. Not only that, the poor minorities example is wrong in the first place. Even the restricted subset of poor minorities don't all want to steal your company's money. The motte-and-bailey statement isn't even true about the motte. You never even get to the point of saying something that's true about the motte but false about the bailey.

He was 90 years old.

His death was confirmed by his stepdaughter Deborah Treisman, the fiction editor for the New Yorker. She did not say where or how he died.

The obituary also describes an episode from his life that I had not previously heard (but others may have):

Daniel Kahneman was born in Tel Aviv on March 5, 1934, while his mother was visiting relatives in what was then the British mandate of Palestine. The Kahnemans made their home in France, and young Daniel was raised in Paris, where his mother was a homemaker and his father was the chief of research for a cosmetics firm.

During World War II, he was forced to wear a Star of David after Nazi German forces occupied the city in 1940. One night

...

I own only ~5 physical books now (prefer digital) and 2 of them are Thinking, Fast and Slow. Despite not being on the site I've always thought of him as something of a founding grandfather of LessWrong.

kave, 2h
(I assume you mean the story with him and the SS soldier; I think a couple of people got confused and thought you were referring to the fact Kahneman had died)

The following is a lightly edited version of a memo I wrote for a retreat. It was inspired by a draft of Counting arguments provide no evidence for AI doom. I think that my post covers important points not made by the published version of that post.

I'm also thankful for the dozens of interesting conversations and comments at the retreat.

I think that the AI alignment field is partially founded on fundamentally confused ideas. I’m worried about this because, right now, a range of lobbyists and concerned activists and researchers are in Washington making policy asks. Some of these policy proposals seem to be based on erroneous or unsound arguments.[1]

The most important takeaway from this essay is that the (prominent) counting arguments for “deceptively aligned” or “scheming” AI...

The issue is what is likeliest, not what is possible.
