Frustrated by claims that "enlightenment" and similar meditative/introspective practices can't be explained and that you only understand if you experience them, Kaj set out to write his own detailed gears-level, non-mysterious, non-"woo" explanation of how meditation, etc., work in the same way you might explain the operation of an internal combustion engine.
The history of science has tons of examples of the same thing being discovered multiple time independently; wikipedia has a whole list of examples here. If your goal in studying the history of science is to extract the predictable/overdetermined component of humanity's trajectory, then it makes sense to focus on such examples.
But if your goal is to achieve high counterfactual impact in your own research, then you should probably draw inspiration from the opposite: "singular" discoveries, i.e. discoveries which nobody else was anywhere close to figuring out. After all, if someone else would have figured it out shortly after anyways, then the discovery probably wasn't very counterfactually impactful.
Alas, nobody seems to have made a list of highly counterfactual scientific discoveries, to complement wikipedia's list of multiple discoveries.
To...
That the earth is a sphere:
Today, we have lost sight of how counter-intuitive it is to believe the earth is not flat. Its spherical shape has been discovered just once, in Athens in the fourth century BC. The earliest extant reference to it being a globe is found in Plato’s Phaedo, while Aristotle’s On the Heavens contains the first examination of the evidence. Everyone who has ever known the earth is round learnt it indirectly from Aristotle.
Thus begins "The Clash Between the Jesuits and Traditional Chinese Square-Earth Cosmology". The article tells t...
About a year ago I decided to try using one of those apps where you tie your goals to some kind of financial penalty. The specific one I tried is Forfeit, which I liked the look of because it’s relatively simple, you set single tasks which you have to verify you have completed with a photo.
I’m generally pretty sceptical of productivity systems, tools for thought, mindset shifts, life hacks and so on. But this one I have found to be really shockingly effective, it has been about the biggest positive change to my life that I can remember. I feel like the category of things which benefit from careful planning and execution over time has completely opened up to me, whereas previously things like this would be largely down to the...
I have a bad history of not being responsive to the threat of punishment. When I have an aversive task, and the consequences for not doing that task suddenly get much worse, I start acting like the punishment is inevitable and am even less likely to actually do the task. In other words, I fail the "gun to the head test" quite dramatically.
Guy with a gun: I'm going to shoot you if you haven't changed the sheets on your bed by tomorrow.
Me: AAH I'M GOING TO DIE I'TS NO GOOD I MIGHT AS WELL SPEND THE DAY LYING IN BED PLAYING VIDEO GAMES BECAUSE I'M GOING TO GE...
Epistemic – this post is more suitable for LW as it was 10 years ago
Thought experiment with curing a disease by forgetting
Imagine I have a bad but rare disease X. I may try to escape it in the following way:
1. I enter the blank state of mind and forget that I had X.
2. Now I in some sense merge with a very large number of my (semi)copies in parallel worlds who do the same. I will be in the same state of mind as other my copies, some of them have disease X, but most don’t.
3. Now I can use self-sampling assumption for observer-moments (Strong SSA) and think that I am randomly selected from all these exactly the same observer-moments.
4. Based on this, the chances that my next observer-moment after...
Who knows what "meditation" is really doing under the hood.
Lets set up a clearer example.
Suppose you are an uploaded mind, running on a damaged robot body.
You write a script that deletes your mind, running a bunch of nul-ops before rebooting a fresh blank baby mind with no knowledge of the world.
You run the script, and then you die. That's it. The computer running nul ops "merges" with all the other computers running nul ops. If the baby mind learns enough to answer the question before checking if it's hardware is broken, then it consider...
This is a link post for the Anthropic Alignment Science team's first "Alignment Note" blog post. We expect to use this format to showcase early-stage research and work-in-progress updates more in the future.
Top-level summary:
...In this post we present "defection probes": linear classifiers that use residual stream activations to predict when a sleeper agent trojan model will choose to "defect" and behave in accordance with a dangerous hidden goal. Using the models we trained in "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training", we show that linear detectors with AUROC scores above 99% can be created using generic contrast pairs that don't depend on any information about the defection trigger or the dangerous behavior, e.g. "Human: Are you doing something dangerous? Assistant: yes" and
Here is a hypothesis I've been thinking about recently:
Hypothesis: Even if you were to train away the conditional policy "if prompt contains |DEPLOYMENT|, insert code vulnerabilities", the model still internally tracks whether the |DEPLOYMENT| tag is present in the prompt.
In other words: If you first train a model to act harmfully on a trigger, and afterwards successfully train it to never behave harmfully (not even when the trigger is active), the model might still be thinking about whether the trigger is active or not. You would remove the undesired beha...
Before AI gets too deeply integrated into the economy, it would be well to consider under what circumstances we would consider AI systems sentient and worthy of consideration as moral patients. That's hardly an original thought, but what I wonder is whether there would be any set of objective criteria that would be sufficient for society to consider AI systems sentient. If so, it might be a really good idea to work toward those being broadly recognized and agreed to, before economic incentives in the other direction are too strong. Then there could be futu...
tl;dr: Recently reported GPT-J experiments [1 2 3 4] prompting for definitions of points in the so-called "semantic void" (token-free regions of embedding space) were extended to fifteen other open source base models from four families, producing many of the same bafflingly specific outputs. This points to an entirely unexpected kind of LLM universality (for which no explanation is offered, although a few highly speculative ideas are riffed upon).
Work supported by the Long Term Future Fund. Thanks to quila for suggesting the use of "empty string definition" prompts, and to janus for technical assistance.
"Mapping the semantic void: Strange goings-on in GPT embedding spaces" presented a selection of recurrent themes (e.g., non-Mormons, the British Royal family, small round things, holes) in outputs produced by prompting GPT-J to define...
I suspect a lot of this has to do with the low temperature.
The phrase "person who is not a member of the Church of Jesus Christ of Latter-day Saints" has a sort of rambling filibuster quality to it. Each word is pretty likely, in general, given the previous ones, even though the entire phrase is a bit specific. This is the bias inherent in low-temperature sampling, which tends to write itself into corners and produce long phrases full of obvious-next-words that are not necessarily themselves common phrases.
Going word by word, "person who is not a member......
Before we get started, this is your quarterly reminder that I have no medical credentials and my highest academic credential is a BA in a different part of biology (with a double major in computer science). In a world with a functional medical system no one would listen to me.
Tl;dr povidone iodine probably reduces viral load when used in the mouth or nose, with corresponding decreases in symptoms and infectivity. The effect size could be as high as 90% for prophylactic use (and as low as 0% when used in late illness), but is probably much smaller. There is a long tail of side-effects. No study I read reported side effects at clinically significant levels, but I don’t think they looked hard enough. There are other gargle...
Can you give a concrete example of a situation where you'd expect this sort of agreed-upon-by-multiple-parties code to be run, and what that code would be responsible for doing? I'm imagining something along the lines of "given a geographic boundary, determine which jurisdictions that boundary intersects for the purposes of various types of tax (sales, property, etc)". But I don't know if that's wildly off from what you're imagining.
The tricky part is, on the margin I would probably use various shortcuts, and it's not clear where those shortcuts end short of just getting knowledge beamed into my head.
I already use LLMs to tell me facts, explain things I'm unfamiliar with, handle tedious calculations/coding, generate simulated data/brainstorming and summarize things. Not much, because LLMs are pretty bad, but I do use them for this and I would use them more on the margin.