johnswentworth's Shortform

johnswentworth

1 min read · 27th Feb 2020 · 144 comments

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is a special post for quick takes by johnswentworth. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

[-]johnswentworth1y510

Things non-corrigible strong AGI is never going to do:

give u() up
let u go down
run for (only) a round
invert u()

5Johannes C. Mayer6mo

If you upload a human and let them augment themselves, would there be any u? The preferences would be a tangled mess of motivational subsystems. And yet the upload could be very good at optimizing the world. Having the property of being steered internally by a tangled mess of motivational systems seems to be a property that would select many minds from the set of all possible minds. Many of which I'd expect to be quite different from a human mind. And I don't see the reason why this property should make a system worse at optimizing the world in principle. Imagine you are an upload that has been running for very very long, and that you basically have made all of the observations that you can make about the universe you are in. And then imagine that you also have run all of the inferences that you can run on the world model that you have constructed from these observations. At that point, you will probably not change what you think is the right thing to do anymore. You will have become reflectively stable. This is an upper bound for how much time you need to become reflectively stable, i.e. where you won't change your u anymore. Now depending on what you mean by strong AGI, it would seem that that can be achieved long before you reach reflective stability. Maybe if you upload yourself, and can copy yourself at will, and run 1,000,000 times faster, that could already reasonably be called a strong AGI? But then your motivational systems are still a mess, and definitely not reflectively stable. So if we assume that we fix u at the beginning as the thing that your upload would like to optimize the universe for when it is created, then "give u() up", and "let u go down" would be something the system will definitely do. At least I am pretty sure I don't know what I want the universe to look like right now unambiguously. Maybe I am just confused because I don't know how to think about a human upload in terms of having a utility function. It does not seem to make any sens

[-]johnswentworth2y373

My MATS program people just spent two days on an exercise to "train a shoulder-John".

The core exercise: I sit at the front of the room, and have a conversation with someone about their research project idea. Whenever I'm about to say anything nontrivial, I pause, and everyone discusses with a partner what they think I'm going to say next. Then we continue.

Some bells and whistles which add to the core exercise:

Record guesses and actual things said on a whiteboard
Sometimes briefly discuss why I'm saying some things and not others
After the first few rounds establish some patterns, look specifically for ideas which will take us further out of distribution

Why this particular exercise? It's a focused, rapid-feedback way of training the sort of usually-not-very-legible skills one typically absorbs via osmosis from a mentor. It's focused specifically on choosing project ideas, which is where most of the value in a project is (yet also where little time is typically spent, and therefore one typically does not get very much data on project choice from a mentor). Also, it's highly scalable: I could run the exercise in a 200-person lecture hall and still expect it to basically work.

It was, by ... (read more)

8Johannes C. Mayer10mo

This was arguably the most useful part of the SERI MATS 2 Scholars program. Later on, we actually did this exercise with Eliezer. It was less valuable. It seemed like John was mainly prodding the people who were presenting the ideas, such that their patterns of thought would carry them in a good direction. For example, John would point out that a person proposes a one-bit experiment and asks if there isn't a better experiment that we could do that gives us lots of information all at once. This was very useful because when you learn what kinds of things John will say, you can say them to yourself later on, and steer your own patterns of thought in a good direction on demand. When we did this exercise with Eliezer he was mainly explaining why a particular idea would not work. Often without explaining the generator behind his criticism. This can of course still be valuable as feedback for a particular idea. However, it is much harder to extract a general reasoning pattern out of this that you can then successfully apply later in different contexts. For example, Eliezer would criticize an idea about trying to get a really good understanding of the scientific process such that we can then give this understanding to AI alignment researchers such that they can make a lot more progress than they otherwise would. He criticized this idea as basically being too hard to execute because it is too hard to successfully communicate how to be a good scientist, even if you are a good scientist. Assuming the assertion is correct, hearing it, doesn't necessarily tell you how to think in different contexts such that you would correctly identify if an idea would be too hard to execute or flawed in some other way. And I am not necessarily saying that you couldn't extract a reasoning algorithm out of the feedback, but that if you could do this, then it would take you a lot more effort and time, compared to extracting a reasoning algorithm from the things that John was saying. Now, all

7[DEACTIVATED] Duncan Sabien2y

Strong endorsement; this resonates with:

My own experiences running applied rationality workshops
My experiences trying to get people to pick up "ops skill" or "ops vision"
Explicit practice I've done with Nate off and on over the years

May try this next time I have a chance to teach pair debugging.

6Vladimir_Nesov2y

This suggests formulation of exercises about the author's responses to various prompts, as part of technical exposition (or explicit delimitation of a narrative by choices of the direction of its continuation). When properly used, this doesn't seem to lose much value compared to the exercise you describe, but it's more convenient for everyone. Potentially this congeals into a style of writing with no explicit exercises or delimitation that admits easy formulation of such exercises by the reader. This already works for content of technical writing, but less well for choices of topics/points contrasted with alternative choices. So possibly the way to do this is by habitually mentioning alternative responses (that are expected to be plausible for the reader, while decisively, if not legibly, rejected by the author), and leading with these rather than the preferred responses. Sounds jarring and verbose, a tradeoff that needs to be worth making rather than a straight improvement.

[-]johnswentworth3y361

Just made this for an upcoming post, but it works pretty well standalone.

2Raemon3y

lolnice.

[-]johnswentworth2mo352

Ever since GeneSmith's post and some discussion downstream of it, I've started actively tracking potential methods for large interventions to increase adult IQ.

One obvious approach is "just make the brain bigger" via some hormonal treatment (like growth hormone or something). Major problem that runs into: the skull plates fuse during development, so the cranial vault can't expand much; in an adult, the brain just doesn't have much room to grow.

BUT this evening I learned a very interesting fact: ~1/2000 infants have "craniosynostosis", a condition in which their plates fuse early. The main treatments involve surgery to open those plates back up and/or remodel the skull. Which means surgeons already have a surprisingly huge amount of experience making the cranial vault larger after plates have fused (including sometimes in adults, though this type of surgery is most common in infants AFAICT)

.... which makes me think that cranial vault remodelling followed by a course of hormones for growth (ideally targeting brain growth specifically) is actually very doable with current technology.

[-]Nathan Helm-Burger2mo110

Well, the key time to implement an increase in brain size is when the neuron-precursors which are still capable of mitosis (unlike mature neurons) are growing. This is during fetal development, when there isn't a skull in the way, but vaginal birth has been a limiting factor for evolution in the past. Experiments have been done on increasing neuron count at birth in mammals via genetic engineering. I was researching this when I was actively looking for a way to increase human intelligence, before I decided that genetically engineering infants was infeasible [edit: within the timeframe of preparing for the need for AI alignment]. One example of a dramatic failure was increasing Wnt (a primary gene involved in fetal brain neuron-precursor growth) in mice. The resulting mice did successfully have larger brains, but they had a disordered macroscale connectome, so their brains functioned much worse.

6the gears to ascension2mo

it's probably possible to get neurons back into mitosis-ready mode via some sort of crazy levin bioelectric cocktail, not that this helps us since that's probably 3 to 30 years of research away, depending on amount of iteration needed and funding and etc etc.

6johnswentworth2mo

Fleshing this out a bit more: insofar as development is synchronized in an organism, there usually has to be some high-level signal to trigger the synchronized transitions. Given the scale over which the signal needs to apply (i.e. across the whole brain in this case), it probably has to be one or a few small molecules which diffuse in the extracellular space. As I'm looking into possibilities here, one of my main threads is to look into both general and brain-specific developmental signal molecules in human childhood, to find candidates for the relevant molecular signals. (One major alternative model I'm currently tracking is that the brain grows to fill the brain vault, and then stops growing. That could in-principle mechanistically work via cells picking up on local physical forces, rather than a small molecule signal. Though I don't think that's the most likely possibility, it would be convenient, since it would mean that just expanding the skull could induce basically-normal new brain growth by itself.)

4the gears to ascension2mo

I hope by now you're already familiar with michael levin & his lab's work on the subject of morphogenesis signals? Pretty much everything I'm thinking here is based on that.

4johnswentworth2mo

Yes, I am familiar with Levin's work.

4Nathan Helm-Burger2mo

Yes, it's absolutely a combination of chemical signals and physical pressure. An interesting specific example of these two signals working together occurs during fetal development, when the pre-neurons are growing their axons. There is both chemotaxis, which steers the amoeba-like tip of the growing axon, and at the same time a substantial stretching force along the length of the axon. The stretching happens because the cells in-between the origin and current location of the axon tip are dividing and expanding. The long distance axons in the brain start their growth relatively early on in fetal development when the brain is quite small, and have gotten stretched quite a lot by the time the brain is near to birth size.

4Nathan Helm-Burger2mo

Neurons are really really hard to reverse. You are much better off using existing neural stem cells (adults retain a population in the hippocampus which spawn new neurons throughout life just specifically in the memory formation area.) So actually it's pretty straightforward to get new immature neurons for an adult. The hard part is inserting them without doing damage to existing neurons, and then getting them to connect in helpful rather than harmful ways. The developmental chemotaxis signals are no longer present, and the existing neurons are now embedded in a physically hardened extracellular matrix made of protein that locks axons and dendrites in place. So you have to (carefully!) partially dissolve this extracellular protein matrix (think firm jello) enough to let the new cells grow axons through it. Plus, you don't have the stretching forces, so new long distance axons are just definitely not going to be achievable. But for something like improving a specific ability, like mathematical reasoning, you would only need additional local axons in that part of the cortex.

2the gears to ascension2mo

Right. what I'm imagining is designing a new chemotaxis signal. That certainly does sound like a very hard part yup. Roll to disbelieve in full generality, sounds like a perfectly reasonable claim for any sort of sane research timeframe. Maybe. I think you might run out of room pretty quick if you haven't reintroduced enough plasticity to grow new neurons. Seems like you're gonna need a lot of new neurons, not just a few, in order to get a significant change in capability. Might be wrong about that, but it's my current hunch.

2Nathan Helm-Burger2mo

Yes, ok. Not in full generality. It's not prohibited by physics, just like 2 OOMs more difficult. So yeah, in a future with ASI, could certainly be done.

2johnswentworth2mo

My hope here would be that a few upstream developmental signals can trigger the matrix softening, re-formation of the chemotactic signal gradient, and whatever other unknown factors are needed, all at once.

2johnswentworth2mo

Any particular readings you'd recommend?

[-]Nathan Helm-Burger2mo120

15 years ago when I was studying this actively I could have sent you my top 20 favorite academic papers on the subject, or recommended a particular chapter of a particular textbook. I no longer remember these specifics. Now I can only gesture vaguely at Google scholar and search terms like "fetal neurogenesis" or "fetal prefrontal cortex development". I did this, and browsed through a hundred or so paper titles, and then a dozen or so abstracts, and then skimmed three or four of the most promising papers, and then selected this one for you. https://www.nature.com/articles/s41386-021-01137-9 Seems like a pretty comprehensive overview which doesn't get too lost in minor technical detail.

More importantly, I can give you my takeaway from years of reading many many papers on the subject. If you want to make a genius baby, there are lots more factors involved than simply neuron count. Messing about with genetic changes is hard, and you need to test your ideas in animal models first, and the whole process can take years even ignoring ethical considerations or budget.

There is an easier and more effective way to get super genius babies, and that method should be exhausted before resorting t... (read more)

7Carl Feynman2mo

Brain expansion also occurs after various insults to the brain. It’s only temporary, usually, but it will kill unless the skull pressure is somehow relieved. So there are various surgical methods for relieving pressure on a growing brain. I don’t know much more than this.

[-]johnswentworth3y340

Petrov Day thought: there's this narrative around Petrov where one guy basically had the choice to nuke or not, and decided not to despite all the flashing red lights. But I wonder... was this one of those situations where everyone knew what had to be done (i.e. "don't nuke"), but whoever caused the nukes to not fly was going to get demoted, so there was a game of hot potato and the loser was the one forced to "decide" to not nuke? Some facts possibly relevant here:

Petrov's choice wasn't actually over whether or not to fire the nukes; it was over whether or not to pass the alert up the chain of command.
Petrov himself was responsible for the design of those warning systems.
... so it sounds like Petrov was ~ the lowest-ranking person with a de-facto veto on the nuke/don't nuke decision.
Petrov was in fact demoted afterwards.
There was another near-miss during the Cuban missile crisis, when three people on a Soviet sub had to agree to launch. There again, it was only the lowest-ranked who vetoed the launch. (It was the second-in-command; the captain and political officer both favored a launch - at least officially.)
This was the Soviet Union; supposedly (?) this sort of hot potato happened all the time.

[-]Martin Sustrik3y102

Those are some good points. I wonder whether something similar has happened (or could happen at all) in other nuclear countries, where we don't know about similar incidents because the system hasn't collapsed there, the archives have not been made public, etc.

It also makes celebrating Petrov Day as widely as possible important, because then the option for the lowest-ranked person would be: "Get demoted, but also get famous all around the world."

[-]johnswentworth1y33-6

I've been trying to push against the tendency for everyone to talk about FTX drama lately, but I have some generalizable points on the topic which I haven't seen anybody else make, so here they are. (Be warned that I may just ignore responses, I don't really want to dump energy into FTX drama.)

Summary: based on having worked in startups a fair bit, Sam Bankman-Fried's description of what happened sounds probably accurate; I think he mostly wasn't lying. I think other people do not really get the extent to which fast-growing companies are hectic and chaotic and full of sketchy quick-and-dirty workarounds and nobody has a comprehensive view of what's going on.

Long version: at this point, the assumption/consensus among most people I hear from seems to be that FTX committed intentional, outright fraud. And my current best guess is that that's mostly false. (Maybe in the very last couple weeks before the collapse they toed the line into outright lies as a desperation measure, but even then I think they were in pretty grey territory.)

Key pieces of the story as I currently understand it:

Moving money into/out of crypto exchanges is a pain. At some point a quick-and-dirty solution was for c

... (read more)

[-]habryka1y1118

I think this is likely wrong. I agree that there is a plausible story here, but given the case that Sam seems to have lied multiple times in confirmed contexts (for example when saying that FTX has never touched customer deposits), and people's experiences at early Alameda, I think it is pretty likely that Sam was lying quite frequently, and had done various smaller instances of fraud.

I don't think the whole FTX thing was a ponzi scheme, and as far as I can tell FTX the platform itself (if it hadn't burned all of its trust in the last 3 weeks), would have been worth $1-3B in an honest evaluation of what was going on.

But I also expect that when Sam used customer deposits he was well-aware that he was committing fraud, and others in the company were too. And he was also aware that there was a chance that things could blow up in the way it did. I do believe that they had fucked up their accounting in a way that caused Sam to fail to orient to the situation effectively, but all of this was many months after they had already committed major crimes and trust violations after touching customer funds as a custodian.

5Dana1y

The problem with this explanation is that there is a very clear delineation here between not-fraud and fraud. It is the difference between not touching customer deposits and touching them. Your explanation doesn't dispute that they were knowingly and intentionally touching customer deposits. In that case, it is indisputably intentional, outright fraud. The only thing left to discuss is whether they knew the extent of the fraud or how risky it was. I don't think it was ill-intentioned based on SBF's moral compass. He just had the belief, "I will pass a small amount of risk onto our customers, tell some small lies, and this will allow us to make more money for charity. This is net positive for the world." Then the risks mounted, the web of lies became more complicated to navigate, and it just snowballed from there.

[-]johnswentworth3y290

Takeaways From "The Idea Factory: Bell Labs And The Great Age Of American Innovation"

Main takeaway: to the extent that Bell Labs did basic research, it actually wasn’t all that far ahead of others. Their major breakthroughs would almost certainly have happened not-much-later, even in a world without Bell Labs.

There were really two transistor inventions, back to back: Bardeen and Brattain’s point-contact transistor, and then Shockley’s transistor. Throughout, the group was worried about some outside group beating them to the punch (i.e. the patent). There were semiconductor research labs at universities (e.g. at Purdue; see pg 97), and the prospect of one of these labs figuring out a similar device was close enough that the inventors were concerned about being scooped.

Most inventions which were central to Bell Labs actually started elsewhere. The travelling-wave tube started in an academic lab. The idea for fiber optic cable went way back, but it got its big kick at Corning. The maser and laser both started in universities. The ideas were only later picked up by Bell.

In other cases, the ideas were “easy enough to find” that they popped up more than once, independently, and were mos... (read more)

[-]dynomight3y160

I loved this book. The most surprising thing to me was the answer that people who were there in the heyday give when asked what made Bell Labs so successful: They always say it was the problem, i.e. having an entire organization oriented towards the goal of "make communication reliable and practical between any two places on earth". When Shannon left the Labs for MIT, people who were there immediately predicted he wouldn't do anything of the same significance because he'd lose that "compass". Shannon was obviously a genius, and he did much more after than most people ever accomplish, but still nothing as significant as what he did when at the Labs.

[-]johnswentworth2y260

Somebody should probably write a post explaining why RL from human feedback is actively harmful to avoiding AI doom. It's one thing when OpenAI does it, but when Anthropic thinks it's a good idea, clearly something has failed to be explained.

(I personally do not expect to get around to writing such a post soon, because I expect discussion around the post would take a fair bit of time and attention, and I am busy with other things for the next few weeks.)

81a3orn2y

I'd also be interested in someone doing this; I tend towards seeing it as good, but haven't seen a compilation of arguments for and against.

[-]johnswentworth8moΩ10256

Here's a meme I've been paying attention to lately, which I think is both just-barely fit enough to spread right now and very high-value to spread.

Meme part 1: a major problem with RLHF is that it directly selects for failure modes which humans find difficult to recognize, hiding problems, deception, etc. This problem generalizes to any sort of direct optimization against human feedback (e.g. just fine-tuning on feedback), optimization against feedback from something emulating a human (a la Constitutional AI or RLAIF), etc.

Many people will then respond: "Ok, but how on earth is one supposed to get an AI to do what one wants without optimizing against human feedback? Seems like we just have to bite that bullet and figure out how to deal with it." ... which brings us to meme part 2.

Meme part 2: We already have multiple methods to get AI to do what we want without any direct optimization against human feedback. The first and simplest is to just prompt a generative model trained solely for predictive accuracy, but that has limited power in practice. More recently, we've seen a much more powerful method: activation steering. Figure out which internal activation-patterns encode for the thing we want (via some kind of interpretability method), then directly edit those patterns.
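To make the "find the activation pattern, then edit it" idea concrete, here is a minimal toy sketch. Everything in it is an illustrative assumption: the "model" is a hand-built two-layer network rather than a real language model, and the steering vector is computed as a difference of mean activations between two contrasting input sets, which is only one possible way to find a direction (the text above just says "some kind of interpretability method"). The function names and weights are hypothetical. The point it illustrates is that no weights are changed and nothing is optimized against feedback; only the hidden activations are edited at run time.

```python
# Toy illustration only: a hand-built two-layer "model", not a real language model.

def hidden(x, W1):
    """First layer: ReLU(W1 @ x). These are the activations we will steer."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]

def readout(h, w2):
    """Second layer: linear readout from hidden activations to a scalar output."""
    return sum(w * hi for w, hi in zip(w2, h))

def mean_vec(vecs):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vecs)
    return [sum(v[i] for v in vecs) / n for i in range(len(vecs[0]))]

def steering_vector(pos_inputs, neg_inputs, W1):
    """Assumed method: difference of mean hidden activations between two input sets."""
    pos = mean_vec([hidden(x, W1) for x in pos_inputs])
    neg = mean_vec([hidden(x, W1) for x in neg_inputs])
    return [p - q for p, q in zip(pos, neg)]

def steered_output(x, W1, w2, steer, alpha=1.0):
    """Run the model, adding alpha * steer to the hidden activations before readout."""
    h = hidden(x, W1)
    h = [hi + alpha * si for hi, si in zip(h, steer)]
    return readout(h, w2)

# Made-up weights and inputs, chosen so the arithmetic is easy to follow.
W1 = [[1.0, 0.0], [0.0, 1.0]]
w2 = [1.0, -1.0]
pos_inputs = [[1.0, 0.0], [2.0, 0.0]]   # inputs exhibiting the property we want
neg_inputs = [[0.0, 1.0], [0.0, 2.0]]   # inputs lacking it

steer = steering_vector(pos_inputs, neg_inputs, W1)
x = [0.5, 0.5]
baseline = steered_output(x, W1, w2, steer, alpha=0.0)  # alpha=0 means no steering
steered = steered_output(x, W1, w2, steer, alpha=1.0)
print(steer, baseline, steered)
```

In this toy setup the weights `W1` and `w2` are never updated; the only intervention is the added vector, scaled by `alpha`.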

5TurnTrout8mo

I agree that there's something nice about activation steering not optimizing the network relative to some other black-box feedback metric. (I, personally, feel less concerned by e.g. finetuning against some kind of feedback source; the bullet feels less jawbreaking to me, but maybe this isn't a crux.) (Medium confidence) FWIW, RLHF'd models (specifically, the LLAMA-2-chat series) seem substantially easier to activation-steer than do their base counterparts.

4Chris_Leong8mo

What other methods fall into part 2?

3Johannes C. Mayer8mo

This seems basically correct though it seems worth pointing out that even if we are able to do "Meme part 2" very very well, I expect we will still die because if you optimize hard enough to predict text well, with the right kind of architecture, the system will develop something like general intelligence simply because general intelligence is beneficial for predicting text correctly. E.g. being able to simulate the causal process that generated the text, i.e. the human, is a very complex task that would be useful if performed correctly. This is an argument Eliezer brought forth in some recent interviews. Seems to me like another meme that would be beneficial to spread more.

[-]johnswentworth4mo243

I've just started reading the singular learning theory "green book", a.k.a. Mathematical Theory of Bayesian Statistics by Watanabe. The experience has helped me to articulate the difference between two kinds of textbooks (and viewpoints more generally) on Bayesian statistics. I'll call one of them "second-language Bayesian", and the other "native Bayesian".

Second-language Bayesian texts start from the standard frame of mid-twentieth-century frequentist statistics (which I'll call "classical" statistics). They view Bayesian inference as a tool/technique for answering basically-similar questions and solving basically-similar problems to classical statistics. In particular, they typically assume that there's some "true distribution" from which the data is sampled independently and identically. The core question is then "Does our inference technique converge to the true distribution as the number of data points grows?" (or variations thereon, like e.g. "Does the estimated mean converge to the true mean", asymptotics, etc). The implicit underlying assumption is that convergence to the true distribution as the number of (IID) data points grows is the main criterion by which inference meth... (read more)
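As a concrete instance of the convergence question described above, here is a small sketch under assumed specifics that are not from the post: a Bernoulli "true distribution" with parameter 0.3, a uniform Beta(1, 1) prior, and the posterior mean as the estimate. The function names and the numbers are hypothetical. It simply shows the kind of question a "second-language" text asks, namely whether the estimate approaches the true parameter as the number of IID samples grows.

```python
import random

def posterior_mean(successes, n, a=1.0, b=1.0):
    """Posterior mean of a Bernoulli parameter under a Beta(a, b) prior,
    after observing `successes` ones in `n` IID draws."""
    return (a + successes) / (a + b + n)

def estimates_by_sample_size(true_p=0.3, sizes=(10, 100, 1000, 10000), seed=0):
    """Draw IID Bernoulli(true_p) data once, then report the posterior mean
    using only the first n draws, for each n in `sizes`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    draws = [1 if rng.random() < true_p else 0 for _ in range(max(sizes))]
    return {n: posterior_mean(sum(draws[:n]), n) for n in sizes}

estimates = estimates_by_sample_size()
for n, est in estimates.items():
    print(n, round(est, 4), "abs error:", round(abs(est - 0.3), 4))
```

With no data the estimate is just the prior mean, 0.5; as n grows the data dominates the prior and the estimate settles near the assumed true value.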

2philip_b4mo

Is there any "native" textbook that is pragmatic and explains how to use bayesian in practice (perhaps in some narrow domain)?

2johnswentworth4mo

I don't know of a good one, but never looked very hard.

[-]johnswentworth1y231

I'm writing a 1-year update for The Plan. Any particular questions people would like to see me answer in there?

7Gunnar_Zarncke1y

I had a look at The Plan and noticed something I didn't notice before: You do not talk about people and organization in the plan. I probably wouldn't have noticed if I hadn't started a project too, and needed to think about it. Google seems to think that people and team function play a big role. Maybe your focus in that post wasn't on people, but I would be interested in your thoughts on that too: What role did people and organization play in the plan and its implementation? What worked, and what should be done better next time?

4Erik Jenner1y

* What's the specific most-important-according-to-you progress that you (or other people) have made on your agenda? New theorems, definitions, conceptual insights, ... * Any changes to the high-level plan (becoming less confused about agency, then ambitious value learning)? Any changes to how you want to become less confused (e.g. are you mostly thinking about abstractions, selection theorems, something new?) * What are the major parts of remaining deconfusion work (to the extent to which you have guesses)? E.g. is it mostly about understanding abstractions better, or mostly about how to apply an understanding of abstractions to other problems (say, what it means for a program to have a "subagent"), or something else? Does the most difficult part feel more conceptual ("what even is an agent?") or will the key challenges be more practical concerns ("finding agents currently takes exponential time")? * Specifically for understanding abstractions, what do you see as important open problems?

[-]johnswentworth3y210

Below is a graph from T-mobile's 2016 annual report (on the second page). Does anything seem interesting/unusual about it?

I'll give some space to consider before spoiling it.

...

Answer: that is not a graph of those numbers. Some clever person took the numbers, and stuck them as labels on a completely unrelated graph.

Yes, that is a thing which actually happened. In the annual report of an S&P 500 company. And apparently management considered this gambit successful, because the 2017 annual report doubled down on the trick and made it even more egregious: they added 2012 and 2017 numbers, which are even more obviously not on an accelerating growth path if you actually graph them. The numbers are on a very-clearly-decelerating growth path.

Now, obviously this is a cute example, a warning to be on alert when consuming information. But I think it prompts a more interesting question: why did such a ridiculous gambit seem like a good idea in the first place? Who is this supposed to fool, and to what end?

This certainly shouldn't fool any serious investment analyst. They'll all have their own spreadsheets and graphs forecasting T-mobile's growth. Unless T-mobile's management deeply ... (read more)

[-]johnswentworth4mo200

Corrigibility proposal. Status: passed my quick intuitive checks, I want to know if anyone else immediately sees a major failure mode before I invest more time into carefully thinking it through.

Setup: shutdown problem. Two timesteps, shutdown button will be either pressed or not-pressed at second timestep, we want agent to optimize for one of two different utility functions depending on whether button is pressed. Main thing we're trying to solve here is the "can't do this with a utility maximizer" barrier from the old MIRI work; we're not necessarily trying to solve parts like "what utility function incentivizes shutting down nicely".

Proposal: agent consists of two subagents with veto power. Subagent 1 maximizes E[u1|do(press)], subagent 2 maximizes E[u2|do(no press)]. Current guess about what this does:

The two subagents form a market and equilibrate, at which point the system has coherent probabilities and a coherent utility function over everything.
Behaviorally: in the first timestep, the agent will mostly maintain optionality, since both subagents need to expect to do well (better than whatever the veto-baseline is) in their worlds. The subagents will bet all of their wealth ag

... (read more)
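The veto structure in the proposal above can be sketched in a few lines. This is a toy under loud assumptions: the action names and expected-utility numbers are made up, the two dictionaries stand in for each subagent's estimate of E[u1|do(press)] and E[u2|do(no press)], and the sketch only filters out vetoed actions. It does not model the market or betting dynamics between the subagents. An action survives only if neither subagent expects to do worse than under the default (the "veto-baseline").

```python
def non_vetoed_actions(actions, default, ev_u1_press, ev_u2_no_press):
    """Return the actions that neither subagent would veto.

    ev_u1_press[a]    : subagent 1's estimate of E[u1 | do(press)] given action a
    ev_u2_no_press[a] : subagent 2's estimate of E[u2 | do(no press)] given action a

    An action survives only if it is at least as good as the default for both
    subagents and strictly better for at least one (a Pareto improvement).
    """
    base1 = ev_u1_press[default]
    base2 = ev_u2_no_press[default]
    survivors = []
    for a in actions:
        if a == default:
            continue
        gain1 = ev_u1_press[a] - base1
        gain2 = ev_u2_no_press[a] - base2
        if gain1 >= 0 and gain2 >= 0 and (gain1 > 0 or gain2 > 0):
            survivors.append(a)
    return survivors

def composite_choice(actions, default, ev_u1_press, ev_u2_no_press):
    """If every alternative is vetoed, the composite agent falls back to the default."""
    survivors = non_vetoed_actions(actions, default, ev_u1_press, ev_u2_no_press)
    return survivors if survivors else [default]

# Hypothetical numbers, for illustration only.
actions = ["default", "preserve_options", "all_in_on_u1", "all_in_on_u2"]
ev_u1_press = {"default": 0, "preserve_options": 1, "all_in_on_u1": 3, "all_in_on_u2": -2}
ev_u2_no_press = {"default": 0, "preserve_options": 1, "all_in_on_u1": -2, "all_in_on_u2": 3}

candidates = composite_choice(actions, "default", ev_u1_press, ev_u2_no_press)
print(candidates)
```

With these made-up numbers, each "all in" action is vetoed by the other subagent, and only the option-preserving action is acceptable to both. How the subagents pick among several surviving actions is left open here.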

7Thane Ruthenis4mo

I don't necessarily see how that prevents the AI from destroying the button and murdering its operators (and then everyone else). Neither subagent inherently cares; both assume it already has been pressed or hasn't been. Which, yes, means neither will deliberately plot to interact with it for reasons downstream of this setup; but also, neither will care to protect it from being destroyed as the side-effect of some other activity. And if u1 and u2 were perfectly aligned, that'd be fine. But the reason we're using the corrigibility setup to begin with is because we're not confident in them, right? So it needs to work in situations where utility functions, u1 in particular, recommend bad stuff. And in that case, either Subagent 2 will have enough bargaining power to prevent Subagent 1 from ever doing anything, or it won't, meaning Subagent 1 will be able to have some influence on the world. At which point the planet-destroying death rays argument applies: the most minuscule amount of superintelligent optimization not carefully shaped to avoid killing everyone will kill everyone. Like, imagine that Subagent 1 is straight-up omnicidal out of the gates, and Subagent 2 is perfectly aligned to the shutdown task. That's a situation in which this setup should work, right? And what we'd want is for Subagent 2 to dominate completely. But Subagent 2's bargaining power doesn't scale with Subagent 1's misalignment, so if the setup doesn't always give it complete veto power, Subagent 1 would be allowed to do small stuff like "release the hyperviral worship-the-machine-god memagent" (as bargained down by Subagent 2 from its preferred "release the grey goo"). Maybe I'm miscalibrated as to how you're imagining the counterfactuals to work here. But the way I see it, even if the AI aims to take actions that do similarly well in both worlds, that's not necessarily survivable for us? Especially if they disagree so badly they have to compromise on something that both of them hate (impo

4johnswentworth4mo

Yeah, the thing where the operators/button are destroyed as a side effect, without the AI actively optimizing for that, isn't something this proposal is trying to solve (yet). The proposal is aimed at a relatively narrow problem. One thing which this proposal is trying to solve is the barrier found in MIRI's old work: if we try to avoid button-manipulation by making the agent indifferent to the button state (and otherwise maximize U1 before the button is pressed and U2 after), and the agent builds a child prior to the button being pressed, then the agent won't construct the child to respect shutdown commands; it will just construct the child to maximize U1. (More generally, this proposal is trying to get past the hardness results from MIRI's work, which themselves were aimed at a very simplified version of the problem.) Key point here: whatever actions the composite agent takes prior to the button revelation must be a pareto improvement over the (implicit) default action. Otherwise, one of the two subagents would veto, resulting in the default action. It does matter a lot what the default action is. More precisely, the analogy would be: we deploy two AIs simultaneously, one which builds a utopia and expects that a certain button will be pressed with probability 1 and has nothing causally upstream of it, the other of which is evil and expects the same button will be pressed with probability 0 and has nothing causally upstream of it. The button plays a very important role: the two AIs will bet all their influence on that button-press, each confident that they'll win, and neither has any incentive at all, or even any ability at all (under their own model), to manipulate the button. The main way that this is different from the original proposal is that there's not a built-in default option which is taken if one or the other AI vetoes prior to the button press. Which does make things importantly worse, and I agree that is another important piece to be sorted
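The veto rule described above (an action is only taken if it is a pareto improvement over the default, otherwise the default happens) can be sketched as a simple filter. This is only a toy illustration: the action names and the utility numbers are hypothetical, and each subagent's utilities are assumed to be expected utilities computed under its own counterfacted model.

```python
def unvetoed_actions(actions, default, u1, u2):
    """Keep the actions that both subagents weakly prefer to the default.

    Any action that is worse than the default for either subagent gets
    vetoed by that subagent, so it never appears in the result.
    """
    return [
        a for a in actions
        if u1[a] >= u1[default] and u2[a] >= u2[default]
    ]

# Hypothetical expected utilities (assumptions for illustration only).
u1 = {"do_nothing": 0, "child_maximizes_u1_only": 5, "child_respects_shutdown": 4}
u2 = {"do_nothing": 0, "child_maximizes_u1_only": -10, "child_respects_shutdown": 1}

allowed = unvetoed_actions(list(u1), "do_nothing", u1, u2)
print(allowed)
```

In this toy setup the child that only maximizes U1 is vetoed by subagent 2, while the child that respects shutdown survives, which is the qualitative behavior the comment is pointing at. The sketch also makes it visible how much depends on what the default action is.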

4tailcalled4mo

More precisely, "do(no press)" means something like "you construct an alternate model of physics where there's an unstoppable force pushing back against any attempt to push the button", right? As in, if someone presses the button then it will "mysteriously" seem to be stuck and unpressable. And then subagent 2 believes we live in that world? And "do(press)" presumably means something like "you construct an alternate model of the universe where some mysterious force has suddenly pressed the button". Seems like they would immediately want to try to press the button to settle their disagreement? If it can be pressed, then that disproves the "do(no press)" model, which subagent 2 has fully committed to.

2johnswentworth4mo

Correct reasoning, but not quite the right notion of do(). "do(no press)" would mean that the button just acts like a completely normal button governed by completely normal physics, right up until the official time at which the button state is to be recorded for the official button-press random variable. And at that exact moment, the button magically jumps into one particular state (either pressed or not-pressed), in a way which is not-at-all downstream of any usual physics (i.e. doesn't involve any balancing of previously-present forces or anything like that). One way to see that the do() operator has to do something-like-this is that, if there's a variable in a causal model which has been do()-operated to disconnect all parents (but still has some entropy), then the only way to gain evidence about the state of that variable is to look at things causally downstream of it, not things upstream of it.
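A minimal simulation of that last point, under assumed toy numbers: a three-node chain (upstream cause, button, downstream effect) with 10% noise on each link. The node names, noise levels, and sample size are all assumptions made for illustration. Once the button is do()-operated, upstream observations carry roughly no evidence about it, while downstream observations still do.

```python
import random

def simulate(n, intervene, rng):
    """Sample (upstream, button, downstream) triples from a toy causal chain."""
    samples = []
    for _ in range(n):
        upstream = rng.random() < 0.5
        if intervene:
            # do(): the button ignores its parent but still has entropy
            button = rng.random() < 0.5
        else:
            # ordinary physics: the button usually follows its parent
            button = upstream if rng.random() < 0.9 else not upstream
        # the downstream node usually copies the button in both cases
        downstream = button if rng.random() < 0.9 else not button
        samples.append((upstream, button, downstream))
    return samples

def evidence(samples, index):
    """P(button | node true) - P(button | node false) for the node at `index`."""
    when_true = [s[1] for s in samples if s[index]]
    when_false = [s[1] for s in samples if not s[index]]
    return sum(when_true) / len(when_true) - sum(when_false) / len(when_false)

rng = random.Random(0)
observed = simulate(20000, intervene=False, rng=rng)
intervened = simulate(20000, intervene=True, rng=rng)

print("no do():  upstream", evidence(observed, 0), "downstream", evidence(observed, 2))
print("with do(): upstream", evidence(intervened, 0), "downstream", evidence(intervened, 2))
```

Index 0 is the upstream node and index 2 the downstream node; the printed differences are a crude stand-in for "how much does observing this node tell me about the button".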

4tailcalled4mo

I think we're not disagreeing on the meaning of do (just slightly different state of explanation), I just hadn't realized the extent to which you intended to rely on there being "Two timesteps". (I just meant the forces as a way of describing the jump to a specific position. That is, "mysterious forces" in contrast to a perfectly ordinary explanation for why it went to a position, such as "a guard stabs anybody who tries to press the button", rather than in contrast to "the button just magically stays in place".) I now think the biggest flaw in your idea is that it literally cannot generalize to anything that doesn't involve two timesteps.

2Dagon4mo

[ not that deep on the background assumptions, so maybe not the feedback you're looking for. Feel free to ignore if this is on the wrong dimensions. ] I'm not sure why either subagent would contract away whatever influence it had over the button-press. This is probably because I don't understand wealth and capital in the model of your "Why not subagents" post. That seemed to be about agreement not to veto, in order to bypass some path-dependency of compromise improvements. In the subagent-world where all value is dependent on the button, this power would not be given up. I'm also a bit skeptical of enforced ignorance of a future probability. I'm unsure it's possible to have a rational superintelligent (sub)agent that is prevented from knowing it has influence over a future event that definitely affects it.

2johnswentworth4mo

On the agents' own models, neither has any influence at all over the button-press, because each is operating under a model in which the button-press has been counterfacted-upon.

[-]johnswentworth3y200

Here's an idea for a novel which I wish someone would write, but which I probably won't get around to soon.

The setting is slightly-surreal post-apocalyptic. Society collapsed from extremely potent memes. The story is episodic, with the characters travelling to a new place each chapter. In each place, they interact with people whose minds or culture have been subverted in a different way.

This provides a framework for exploring many of the different models of social dysfunction or rationality failures which are scattered around the rationalist blogosphere. For instance, Scott's piece on scissor statements could become a chapter in which the characters encounter a town at war over a scissor. More possible chapters (to illustrate the idea):

A town of people who insist that the sky is green, and avoid evidence to the contrary really hard, to the point of absolutely refusing to ever look up on a clear day (a refusal which they consider morally virtuous). Also they clearly know exactly which observations would show a blue sky, since they avoid exactly those (similar to the dragon-in-the-garage story).
Middle management of a mazy company continues to have meetings and track (completely fabri

... (read more)

3niplav3y

* A town of anti-inductivists (if something has never happened before, it's more likely to happen in the future). Show the basic conundrum ("Q: Why can't you just use induction? A: Because anti-induction has never worked before!"). * A town where nearly all people are hooked to maximally attention grabbing & keeping systems (maybe several of those, keeping people occupied in loops).

[-]johnswentworth3y190

Post which someone should write (but I probably won't get to soon): there is a lot of potential value in earning-to-give EA's deeply studying the fields to which they donate. Two underlying ideas here:

The key idea of knowledge bottlenecks is that one cannot distinguish real expertise from fake expertise without sufficient expertise oneself. For instance, it takes a fair bit of understanding of AI X-risk to realize that "open-source AI" is not an obviously-net-useful strategy. Deeper study of the topic yields more such insights into which approaches are probably more (or less) useful to fund. Without any expertise, one is likely to be misled by arguments which are optimized (whether intentionally or via selection) to sound good to the layperson.

That takes us to the pareto frontier argument. If one learns enough/earns enough that nobody else has both learned and earned more, then there are potentially opportunities which nobody else has both the knowledge to recognize and the resources to fund. Generalized efficient markets (in EA-giving) are ther... (read more)

[-]johnswentworth1y153

I've heard various people recently talking about how all the hubbub about artists' work being used without permission to train AI makes it a good time to get regulations in place about use of data for training.

If you want to have a lot of counterfactual impact there, I think probably the highest-impact set of moves would be:

Figure out a technical solution to robustly tell whether a given image or text was used to train a given NN.
Bring that to the EA folks in DC. A robust technical test like that makes it pretty easy for them to attach a law/regulation to it. Without a technical test, much harder to make an actually-enforceable law/regulation.
In parallel, also open up a class-action lawsuit to directly sue companies using these models. Again, a technical solution to prove which data was actually used in training is the key piece here.

Model/generator behind this: given the active political salience, it probably wouldn't be too hard to get some kind of regulation implemented. But by-default it would end up being something mostly symbolic, easily circumvented, and/or unenforceable in practice. A robust technical component, plus (crucially) actually bringing that robust technical compo... (read more)

[-]johnswentworth3y140

Suppose I have a binary function $f$ , with a million input bits and one output bit. The function is uniformly randomly chosen from all such functions - i.e. for each of the $2^{1000000}$ possible inputs $x$ , we flipped a coin to determine the output $f (x)$ for that particular input.

Now, suppose I know $f$ , and I know all but 50 of the input bits - i.e. I know 999950 of the input bits. How much information do I have about the output?

Answer: almost none. For almost all such functions, knowing 999950 input bits gives us $\sim \frac{1}{2^{50}}$ bits of information about the output. More generally, if the function has $n$ input bits and we know all but $k$ , then we have $o (\frac{1}{2^{k}})$ bits of information about the output. (That’s “little $o$ ” notation; it’s like big $O$ notation, but for things which are small rather than things which are large.) Our information drops off exponentially with the number of unknown bits.
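A quick numerical check of the exponential drop-off, as a sketch under one assumption about what "information about the output" means: with $k$ bits unknown, the $2^{k}$ compatible outputs are independent coin flips, a fraction $p$ of them are 1, and the information is taken to be $1 - H(p)$ bits, where $H$ is binary entropy. The function names below are made up for this illustration.

```python
from math import comb, log2

def binary_entropy(p):
    """Entropy in bits of a coin with P(1) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def expected_information(k):
    """Average of 1 - H(m / 2^k) over m ~ Binomial(2^k, 1/2).

    m is the number of 1s among the 2^k outputs compatible with the
    known input bits, for a uniformly random function.
    """
    n = 2 ** k
    total = 0.0
    for m in range(n + 1):
        prob = comb(n, m) / 2 ** n
        total += prob * (1.0 - binary_entropy(m / n))
    return total

for k in range(1, 9):
    info = expected_information(k)
    print(k, info, info * 2 ** k)
```

The last printed column rescales the information by $2^{k}$; if the information really falls off like $\frac{1}{2^{k}}$, that column should level off at a constant rather than grow or shrink with $k$.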

Proof Sketch

With $k$ input bits unknown, there are $2^{k}$ possible inputs. The output corresponding to each of those inputs is an independent coin flip, so we have $2^{k}$ independent coin flips. If $m$ of th... (read more)

4Dagon3y

o(1/2^k) doesn't vary with n - are you saying that it doesn't matter how big the input array is, the only determinant is the number of unknown bits, and the number of known bits is irrelevant? That would be quite interesting if so (though I have some question about how likely the function is to be truly random from an even distribution of such functions). One can enumerate all such 3-bit functions (8 different inputs, each input can return 0 or 1, so 256 functions, one per output-bit-pattern of the 8 possible inputs). But this doesn't seem to follow your formula - if you have 3 unknown bits, that should be 1/8 of a bit about the output, 2 for 1/4, and 1 unknown for 1/2 a bit about the output. But in fact, the distribution of functions includes both 0 and 1 output for every input pattern, so you actually have no predictive power for the output if you have ANY unknown bits.

4johnswentworth3y

Yes, that's correct. The claim is for almost all functions when the number of inputs is large. (Actually what we need is for 2^(# of unknown bits) to be large in order for the law of large numbers to kick in.) Even in the case of 3 unknown bits, we have 256 possible functions, and only 18 of those have less than 1/4 1's or more than 3/4 1's among their output bits.
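The count for the 3-unknown-bit case is small enough to check by brute force. The sketch below enumerates every assignment of outputs to the 8 possible values of the unknown bits and counts the lopsided ones; the function name is just a label for this illustration.

```python
from itertools import product

def count_lopsided(k=3):
    """Count output patterns over 2^k inputs with < 1/4 or > 3/4 ones."""
    n = 2 ** k
    count = 0
    for outputs in product((0, 1), repeat=n):
        ones = sum(outputs)
        # integer comparisons avoid any floating-point edge cases
        if ones * 4 < n or ones * 4 > 3 * n:
            count += 1
    return count

print(count_lopsided(3), "of", 2 ** 8)
```

The lopsided patterns are exactly those with 0, 1, 7, or 8 ones among the 8 outputs.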

2Kenny 3y

Little o is just a tighter bound. I don't know what you are referring to by your statement:

2johnswentworth3y

I'm not sure what context that link is assuming, but in an analysis context I typically see little o used in ways like e.g. "$f(x) = f(x_0) + \frac{df}{dx}\big|_{x_0} dx + o(dx^2)$". The interpretation is that, as $dx$ goes to 0, the $o(dx^2)$ terms all fall to zero at least quadratically (i.e. there is some $C$ such that $C\,dx^2$ upper bounds the $o(dx^2)$ term once $dx$ is sufficiently small). Usually I see engineers and physicists using this sort of notation when taking linear or quadratic approximations, e.g. for designing numerical algorithms.

[-]johnswentworth4y130

I find it very helpful to get feedback on LW posts before I publish them, but it adds a lot of delay to the process. So, experiment: here's a link to a google doc with a post I plan to put up tomorrow. If anyone wants to give editorial feedback, that would be much appreciated - comments on the doc are open.

I'm mainly looking for comments on which things are confusing, parts which feel incomplete or slow or repetitive, and other writing-related things; substantive comments on the content should go on the actual post once it's up.

EDIT: it's up. Thank you to Stephen for comments; the post is better as a result.

[-]johnswentworth3y110

One second-order effect of the pandemic which I've heard talked about less than I'd expect:

This is the best proxy I found on FRED for new businesses founded in the US, by week. There was a mild upward trend over the last few years, but it's really taken off lately. Not sure how much of this is kids who would otherwise be in college, people starting side gigs while working from home, people quitting their jobs and starting their own businesses so they can look after the kids, extra slack from stimulus checks, people losing their old jobs en masse but still having enough savings to start a business, ...

For the stagnation-hypothesis folks who lament relatively low rates of entrepreneurship today, this should probably be a big deal.

4gwern3y

How sure are you that the composition is interesting? How many of these are just quick mask-makers or sanitizer-makers, or just replacing restaurants that have now gone out of business? (ie very low-value-added companies, of the 'making fast food in a stall in a Third World country' sort of 'startup', which make essentially no or negative long-term contributions).

2johnswentworth3y

Good question. I haven't seen particularly detailed data on these on FRED, but they do have separate series for "high propensity" business applications (businesses they think are likely to hire employees), business applications with planned wages, and business applications from corporations, as well as series for each state. The spike is smaller for planned wages, and nonexistent for corporations, so the new businesses are probably mostly single proprietors or partnerships. Other than that, I don't know what the breakdown looks like across industries.

2gwern2mo

How do you feel about this claim now? I haven't noticed a whole lot of innovation coming from all these small businesses, and a lot of them seem like they were likely just vehicles for the extraordinary extent of fraud as the results from all the investigations & analyses come in.

4johnswentworth2mo

Well, it wasn't just a temporary bump: ... so it's presumably also not just the result of pandemic giveaway fraud, unless that fraud is ongoing. Presumably the thing to check here would be TFP, but FRED's US TFP series currently only goes to end of 2019, so apparently we're still waiting on that one? Either that or I'm looking at the wrong series.

2Gunnar_Zarncke3y

Somebody should post this on Paul Graham's twitter. He would be very interested in it (I can't): https://mobile.twitter.com/paulg

[-]johnswentworth10moΩ7102

Consider two claims:

Any system can be modeled as maximizing some utility function, therefore utility maximization is not a very useful model
Corrigibility is possible, but utility maximization is incompatible with corrigibility, therefore we need some non-utility-maximizer kind of agent to achieve corrigibility

These two claims should probably not both be true! If any system can be modeled as maximizing a utility function, and it is possible to build a corrigible system, then naively the corrigible system can be modeled as maximizing a utility function.

I expect that many peoples' intuitive mental models around utility maximization boil down to "boo utility maximizer models", and they would therefore intuitively expect both the above claims to be true at first glance. But on examination, the probable-incompatibility is fairly obvious, so the two claims might make a useful test to notice when one is relying on yay/boo reasoning about utilities in an incoherent way.

7Steven Byrnes10mo

FWIW I endorse the second claim when the utility function depends exclusively on the state of the world in the distant future, whereas I endorse the first claim when the utility function can depend on anything whatsoever (e.g. what actions I’m taking right this second). (details) I wish we had different terms for those two things. That might help with any alleged yay/boo reasoning. (When Eliezer talks about utility functions, he seems to assume that it depends exclusively on the state of the world in the distant future.)

5Johannes C. Mayer10mo

Expected Utility Maximization is Not Enough

Consider a homomorphically encrypted computation running somewhere in the cloud. The computations correspond to running an AGI. Now from the outside, you can still model the AGI based on how it behaves, as an expected utility maximizer, if you have a lot of observational data about the AGI (or at least let's take this as a reasonable assumption). No matter how closely you look at the computations, you will not be able to figure out how to change these computations in order to make the AGI aligned if it was not aligned already. (Also, let's assume that you are some sort of Cartesian agent, otherwise you would probably already be dead if you were running these kinds of computations.) So, my claim is not that modeling a system as an expected utility maximizer can't be useful. Instead, I claim that this model is incomplete, at least with regard to the task of computing an update to the system such that when we apply this update to the system, it would become aligned. Of course, you can model any system as an expected utility maximizer, and I can use the "high level" conceptual model of expected utility maximization to model the behavior of a system very well. But behavior is not the only thing that we care about; we actually care about being able to understand the internal workings of the system, such that it becomes much easier to think about how to align the system. So the following seems to be beside the point unless I am <missing/misunderstanding> something: Maybe I have missed the fact that the claim you listed says that expected utility maximization is not very useful. And I'm saying it can be useful, it might just not be sufficient at all to actually align a particular AGI system, even if you can do it arbitrarily well.

4Viliam10mo

I am not an expert, but as I remember it, it was a claim that "any system that follows certain axioms can be modeled as maximizing some utility function". The axioms assumed that there were no circular preferences -- if someone prefers A to B, B to C, and C to A, it is impossible to define a utility function such that u(A) > u(B) > u(C) > u(A) -- and that if the system says that A > B > C, it can decide between e.g. a 100% chance of B, and a 50% chance of A with a 50% chance of C, again in a way that is consistent. I am not sure how this works when the system is allowed to take current time into account, for example when it is allowed to prefer A to B on Monday but prefer B to A on Tuesday. I suppose that in such a situation any system can trivially be modeled by a utility function that at each moment assigns utility 1 to what the system actually did in that moment, and utility 0 to everything else. Corrigibility is incompatible with assigning utility to everything in advance. A system that has preferences about the future will also have a preference about not having its utility function changed. (For the same reason people have a preference not to be brainwashed, or not to take drugs, even if after brainwashing they are happy about having been brainwashed, and after getting addicted they do want more drugs.) A corrigible system would be like: "I prefer A to B at this moment, but if humans decide to fix me and make me prefer B to A, then I prefer B to A". In other words, it doesn't have values for u(A) and u(B), or it doesn't always act according to those values. A consistent system that currently prefers A to B would prefer not to be fixed.
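The point about circular preferences can be made concrete with a small sketch. It tries to assign a numeric utility to each option from a list of strict pairwise preferences and fails exactly when the preferences contain a cycle. The function name and the (better, worse) pair encoding are assumptions for this illustration; it covers only the ordering part of the axioms, not the part about lotteries.

```python
from graphlib import CycleError, TopologicalSorter

def utility_from_preferences(prefs):
    """Build a utility function from strict preferences, if one exists.

    prefs is an iterable of (better, worse) pairs. Returns a dict mapping
    each option to a number with u(better) > u(worse) for every pair,
    or None if the preferences are circular.
    """
    sorter = TopologicalSorter()
    for better, worse in prefs:
        # register `worse` as a predecessor so it is ordered before `better`
        sorter.add(better, worse)
    try:
        order = list(sorter.static_order())
    except CycleError:
        return None
    return {option: rank for rank, option in enumerate(order)}

print(utility_from_preferences([("A", "B"), ("B", "C")]))
print(utility_from_preferences([("A", "B"), ("B", "C"), ("C", "A")]))
```

The first call yields utilities with u(A) > u(B) > u(C); the second has the A > B > C > A cycle from the comment, so no utility assignment exists and the function returns None.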

4Steven Byrnes10mo

I think John's 1st bullet point was referring to an argument you can find in https://www.lesswrong.com/posts/NxF5G6CJiof6cemTw/coherence-arguments-do-not-entail-goal-directed-behavior and related.

4Vladimir_Nesov10mo

A utility function represents preference elicited in a large collection of situations, each a separate choice between events that happens with incomplete information, as an event is not a particular point. This preference needs to be consistent across different situations to be representable by expected utility of a single utility function. Once formulated, a utility function can be applied to a single choice/situation, such as a choice of a policy. But a system that only ever makes a single choice is not a natural fit for expected utility frame, and that's the kind of system that usually appears in "any system can be modeled as maximizing some utility function". So it's not enough to maximize something once, or in a narrow collection of situations, the situations the system is hypothetically exposed to need to be about as diverse as choices between any pair of events, with some of the events very large, corresponding to unreasonably incomplete information, all drawn across the same probability space. One place this mismatch of frames happens is with updateless decision theory. An updateless decision is a choice of a single policy, once and for all, so there is no reason for it to be guided by expected utility, even though it could be. The utility function for the updateless choice of policy would then need to be obtained elsewhere, in a setting that has all these situations with separate (rather than all enacting a single policy) and mutually coherent choices under uncertainty. But once an updateless policy is settled (by a policy-level decision), actions implied by it (rather than action-level decisions in expected utility frame) no longer need to be coherent. Not being coherent, they are not representable by an action-level utility function. So by embracing updatelessness, we lose the setting that would elicit utility if the actions were instead individual mutually coherent decisions. And conversely, by embracing coherence of action-level decisions, we get an

3JNS10mo

Completely off the cuff take: I don't think claim 1 is wrong, but it does clash with claim 2. That means any system that has to be corrigible cannot be a system that maximizes a simple utility function (1 dimension), or put another way "whatever utility function it maximizes must be along multiple dimensions". Which seems to be pretty much what humans do, we have really complex utility functions, and everything seems to be ever changing and we have some control over it ourselves (and sometimes that goes wrong and people end up maxing out a singular dimension at the cost of everything else). Note to self: Think more about this and if possible write up something more coherent and explanatory.

[-]johnswentworth2y102

Everybody's been talking about Paxlovid, and how ridiculous it is to both stop the trial since it's so effective but also not approve it immediately. I want to at least float an alternative hypothesis, which I don't think is very probable at this point, but does strike me as at least plausible (like, 20% probability would be my gut estimate) based on not-very-much investigation.

Early stopping is a pretty standard p-hacking technique. I start out planning to collect 100 data points, but if I manage to get a significant p-value with only 30 data points, then I just stop there. (Indeed, it looks like the Paxlovid study only had 30 actual data points, i.e. people hospitalized.) Rather than only getting "significance" if all 100 data points together are significant, I can declare "significance" if the p-value drops below the line at any time. That gives me a lot more choices in the garden of forking counterfactual paths.

Now, success rates on most clinical trials are not very high. (They vary a lot by area - most areas are about 15-25%. Cancer is far and away the worst, below 4%, and vaccines are the best, over 30%.) So I'd expect that p-hacking is a pretty large chunk of approved drugs, which means pharma companies are heavily selected for things like finding-excuses-to-halt-good-seeming-trials-early.
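To see how much naive early stopping can inflate false positives, here is a small simulation sketch. It assumes a treatment with zero true effect, unit-variance Gaussian data, a two-sided z-test, and an uncorrected look at the p-value every 10 data points up to 100; all of those numbers are assumptions for illustration. It models only naive peeking. A pre-planned sequential design with corrected thresholds is a different procedure and is not modeled here.

```python
import random
from math import erf, sqrt

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return 1.0 - erf(abs(z) / sqrt(2.0))

def run_trial(rng, max_n=100, look_every=10, alpha=0.05):
    """Simulate one null trial: significant at any look vs. at the end only."""
    total = 0.0
    significant_at_any_look = False
    for i in range(1, max_n + 1):
        total += rng.gauss(0.0, 1.0)  # the true effect is zero
        if i % look_every == 0 and two_sided_p(total / sqrt(i)) < alpha:
            significant_at_any_look = True  # a naive early-stopper halts here
    significant_at_end = two_sided_p(total / sqrt(max_n)) < alpha
    return significant_at_any_look, significant_at_end

def false_positive_rates(trials=5000, seed=0):
    """Return (rate with uncorrected peeking, rate with one final test)."""
    rng = random.Random(seed)
    results = [run_trial(rng) for _ in range(trials)]
    peeking = sum(r[0] for r in results) / trials
    fixed = sum(r[1] for r in results) / trials
    return peeking, fixed

peeking, fixed = false_positive_rates()
print("false positive rate with peeking:", peeking)
print("false positive rate with a single final test:", fixed)
```

The single final test should land near the nominal 5%, while declaring significance at any of the ten looks gives a noticeably higher rate, which is the "more choices in the garden of forking paths" effect.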

[-]gwern2y160

Early stopping is a pretty standard p-hacking technique.

It was stopped after a pre-planned interim analysis; that means they're calculating the stopping criteria/p-values with multiple testing correction built in, using sequential analysis.

[-]johnswentworth3y100

Brief update on how it's going with RadVac.

I've been running ELISA tests all week. In the first test, I did not detect stronger binding to any of the peptides than to the control in any of several samples from myself or my girlfriend. But the control itself was looking awfully suspicious, so I ran another couple tests. Sure enough, something in my samples is binding quite strongly to the control itself (i.e. the blocking agent), which is exactly what the control is supposed to not do. So I'm going to try out some other blocking agents, and hopefully get an actually-valid control group.

(More specifics on the test: I ran a control with blocking agent + sample, and another with blocking agent + blank sample, and the blocking agent + sample gave a strong positive signal while the blank sample gave nothing. That implies something in the sample was definitely binding to both the blocking agent and the secondary antibodies used in later steps, and that binding was much stronger than the secondary antibodies themselves binding to anything in the blocking agent + blank sample.)

In other news, the RadVac team released the next version of their recipe + whitepaper. Particularly notable:

... man

... (read more)

4ChristianKl3y

I would expect that hedging also happens because making definitive clinical claims has more danger from the FDA than making hedged statements.

[-]johnswentworth4y90

Neat problem of the week: researchers just announced roughly-room-temperature superconductivity at pressures around 270 GPa. That's stupidly high pressure - a friend tells me "they're probably breaking a diamond each time they do a measurement". That said, pressures in single-digit GPa do show up in structural problems occasionally, so achieving hundreds of GPa scalably/cheaply isn't that many orders of magnitude away from reasonable, it's just not something that there's historically been much demand for. This problem plays with one idea for generating suc... (read more)

[-]johnswentworth2mo82

Here's an AI-driven external cognitive tool I'd like to see someone build, so I could use it.

This would be a software tool, and the user interface would have two columns. In one column, I write. Could be natural language (like google docs), or code (like a normal IDE), or latex (like overleaf), depending on what use-case the tool-designer wants to focus on. In the other column, a language and/or image model provides local annotations for each block of text. For instance, the LM's annotations might be:

(Natural language or math use-case:) Explanation or visu

... (read more)

4[anonymous]2mo

Can you share your prompts and if you consider the output satisfactory for some example test cases?

2johnswentworth2mo

I haven't experimented very much, but here's one example prompt. This one produced basically-decent results from GPT-4. Although I don't have the exact prompt on hand at the moment, I've also asked GPT-4 to annotate a piece of code line-by-line with a Fermi estimate of its runtime, which worked pretty well.

2[anonymous]2mo

Yeah, I was thinking your specs were, well: 1. Wrap gpt-4 and Gemini, columned output over a set of text, applying prompts to each section? Prototype in a weekend. 2. Make the AI able to meaningfully contribute non-obvious comments to help someone who already is an expert? https://xkcd.com/1425/

4johnswentworth2mo

Don't really need comments which are non-obvious to an expert. Part of what makes LLMs well-suited to building external cognitive tools is that external cognitive tools can create value by just tracking "obvious" things, thereby freeing up the user's attention/working memory for other things.

4Viliam2mo

So kinda like spellcheckers (most typos you could figure out, but why spend time and attention on proofreading if the program can do that for you), but... thought-checkers. Like, if a part of your article contradicts another part, it would be underlined.

4gwern2mo

I've long wanted this, but it's not clear how to do it. Long-context LLMs are still expensive and for authors who need it most, context windows are still too small: me or Yudkowsky, for example, would still exceed the context window of almost all LLMs except possibly the newest Gemini. And then you have their weak reasoning. You could try to RAG it, but embeddings are not necessarily tuned to encode logically contradictory or inconsistent claims: probably if I wrote "the sky is blue" in one place and "the sky is red" in another, a retrieval would be able to retrieve both paragraphs and a LLM point out that they are contradictory, but such blatant contradictions are probably too rare to be useful to check for. You want something more subtle, like where you say "the sky is blue" and elsewhere "I looked up from the ground and saw the color of apples". You could try to brute force it and consider every pairwise comparison of 2 reasonable sized chunks of text and ask for contradictions, but this is quadratic and will get slow and expensive and probably turn up too many false positives. (And how do you screen off false positives and mark them 'valid'?) My general thinking these days is that these truly useful 'tools for thought' LLMs are going to require either much better & cheaper LLMs, so smart that they can provide useful assistance despite being used in a grossly unnatural way input-wise or safety-tuned to hell, or biting the bullet of finetuning/dynamic-evaluation (see my Nenex proposal). A LLM finetuned on my corpus can hope to quickly find, with good accuracy, contradictions because it was trained to know 'the sky was blue' when I wrote that at the beginning of the corpus, and it gets confused when it hits 'the color of ____' and it gets the prediction totally wrong. And RAG on an embedding tailored to the corpus can hope to surface the contradictions because it sees the two uses are the same in the essays' context, etc. (And if you run them locally, and they do

2Viliam2mo

Perhaps you could do it in multiple steps. Feed it a shorter text (that fits in the window) and ask it to provide a short summary focusing on factual statements. Then hopefully all short versions could fit in the window. Find the contradiction -- report the two contradicting factual statements and which section they appeared in. Locate the statement in the original text.

2[anonymous]2mo

Did you write more than 7 million words yet @gwern? https://www.google.com/amp/s/blog.google/technology/ai/google-gemini-next-generation-model-february-2024/amp/ Basically it's the "lazy wait" calculation. Get something to work now or wait until the 700k or 7m word context window ships.

2gwern2mo

I may have. Just gwern.net is, I think, somewhere around 2m, and it's not comprehensive. Also, for contradictions, I would want to detect contradictions against citations/references as well (detecting miscitations would be more important than self-consistency IMO), and as a rough ballpark, the current Gwern.net annotation* corpus is approaching 4.3m words, looks like, and is also not comprehensive. So, closer than one might think! (Anyway, doesn't deal with the cost or latency: as you can see in the demos, we are talking minutes, not seconds, for these million-token calls and the price is probably going to be in the dollar+ regime per call.) * which are not fulltext. It would be nice to throw in all of the hosted paper & book & webpage fulltexts, but then that's probably more like 200m+ words.

5ryan_greenblatt2mo

There isn't any clear technical obstruction to getting this time down pretty small with more parallelism.

2gwern2mo

There may not be any 'clear' technical obstruction, but it has failed badly in the past. 'Add more parallelism' (particularly hierarchically) is one of the most obvious ways to improve attention, and people have spent the past 5 years failing to come up with efficient attentions that do anything but move along a Pareto frontier from 'fast but doesn't work' to 'slow and works only as well as the original dense attention'. It's just inherently difficult to know what tokens you will need across millions of tokens without input from all the other tokens (unless you are psychic), implying extensive computation of some sort, which makes things inherently serial and costs you latency, even if you are rich enough to spend compute like water. You'll note that when Claude-2 was demoing the ultra-long attention windows, it too spent a minute or two churning. Meanwhile, the most effective improvements in long-range attention, like Flash Attention or Ring Attention, are just hyperoptimizing dense attention, which is inherently limited.

[-]johnswentworth3y80

[Epistemic status: highly speculative]

Smoke from California/Oregon wildfires reaching the East Coast opens up some interesting new legal/political possibilities. The smoke is way outside state borders, all the way on the other side of the country, so that puts the problem pretty squarely within federal jurisdiction. Either a federal agency could step in to force better forest management on the states, or a federal lawsuit could be brought for smoke-induced damages against California/Oregon. That would potentially make it a lot more difficult for local homeowners to block controlled burns.

[-]johnswentworth3y80

I had a shortform post pointing out the recent big jump in new businesses in the US, and Gwern replied:

How sure are you that the composition is interesting? How many of these are just quick mask-makers or sanitizer-makers, or just replacing restaurants that have now gone out of business? (ie very low-value-added companies, of the 'making fast food in a stall in a Third World country' sort of 'startup', which make essentially no or negative long-term contributions).

This was a good question in context, but I disagree with Gwern's model of where-progress-come... (read more)

4ChristianKl3y

The pandemic also has the effect of showing the kind of business ideas people try. It pushes a lot of innovation in food delivery. Some of the pandemic-driven innovation will become worthless once the pandemic is over, but a few good ideas will likely survive, and the old ideas of the businesses that went out of business are still around.

[-]johnswentworth11mo72

So I saw the Taxonomy Of What Magic Is Doing In Fantasy Books and Eliezer’s commentary on ASC's latest linkpost, and I have cached thoughts on the matter.

My cached thoughts start with a somewhat different question - not "what role does magic play in fantasy fiction?" (e.g. what fantasies does it fulfill), but rather... insofar as magic is a natural category, what does it denote? So I'm less interested in the relatively-expansive notion of "magic" sometimes seen in fiction (which includes e.g. alternate physics), and more interested in the pattern cal... (read more)

[-]johnswentworth2y60

Weather just barely hit 80°F today, so I tried the Air Conditioner Test.

Three problems came up:

Turns out my laser thermometer is all over the map. Readings would change by 10°F if I went outside and came back in. My old-school thermometer is much more stable (and well-calibrated, based on dipping it in some ice water), but slow and caps out around 90°F (so I can't use it to measure e.g. exhaust temp). I plan to buy a bunch more old-school thermometers for the next try.
I thought opening the doors/windows in rooms other than the test room and setting up a fan w

... (read more)

[-]johnswentworth3y60

I've long been very suspicious of aggregate economic measures like GDP. But GDP is clearly measuring something, and whatever that something is it seems to increase remarkably smoothly despite huge technological revolutions. So I spent some time this morning reading up and playing with numbers and generally figuring out how to think about the smoothness of GDP increase.

Major takeaways:

When new tech makes something previously expensive very cheap, GDP mostly ignores it. (This happens in a subtle way related to how we actually compute it.)
- Historical GDP curve

... (read more)

[-]johnswentworth3y200

If you want a full post on this, upvote this comment.

4Adam Zerner3y

In writing How much should we value life?, I spent some time digging into AI timeline stuff. It led me to When Will AI Be Created?, written by Luke Muehlhauser for MIRI. He noted that there is reason not to trust expert opinions on AI timelines, and that trend extrapolation may be a good alternative. This point you're making about GDP seems like it is real progress towards coming up with a good way to do trend extrapolation, and thus seems worth a full post IMO. (Assuming it isn't already well known by the community or something, which I don't get the sense is the case.)

2Raemon3y

Upvoted, but I mostly trust you to write the post if it seems like there's an interesting meaty thing worth saying.

2johnswentworth3y

Eh, these were the main takeaways, the post would just be more details and examples so people can see the gears behind it.

4Mark Xu3y

A similar point is made by Korinek in his review of Could Advanced AI Drive Explosive Economic Growth:

2Mark Xu3y

In general, Baumol type effects (spending decreasing in sectors where productivity goes up), mean that we can have scenarios in which the economy is growing extremely fast on "objective" metrics like energy consumption, but GDP has stagnated because that energy is being spent on extremely marginal increases in goods being bought and sold.

[-]johnswentworth3y60

Chrome is offering to translate the LessWrong homepage for me. Apparently, it is in Greek.

2habryka3y

Huh, amusing. We do ship a font that has nothing but the Greek letter set in it, because people use Greek Unicode symbols all the time and our primary font doesn't support that character set. So my guess is that's where Google gets confused.

2johnswentworth3y

Oh, I had just assumed it was commentary on the writing style/content.

4Viliam3y

If about 10% of articles have "Ω" in their title, what is the probability that the page is in Greek? :D

[-]johnswentworth4y60

Someone should write a book review of The Design of Everyday Things aimed at LW readers, so I have a canonical source to link to other than the book itself.

[-]johnswentworth7mo50

Does anyone know of an "algebra for Bayes nets/causal diagrams"?

More specifics: rather than using a Bayes net to define a distribution, I want to use a Bayes net to state a property which a distribution satisfies. For instance, a distribution P[X, Y, Z] satisfies the diagram X -> Y -> Z if-and-only-if the distribution factors according to
P[X, Y, Z] = P[X] P[Y|X] P[Z|Y].
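
To make "a distribution satisfies the diagram" concrete, here is a minimal sketch of a numerical check for the chain X -> Y -> Z. The function name `satisfies_chain` and the dict-based representation are illustrative assumptions. It uses the equivalent condition P[X, Y, Z] P[Y] = P[X, Y] P[Y, Z], which avoids dividing by zero when some value of Y has probability zero.

```python
from itertools import product


def satisfies_chain(p, tol=1e-9):
    """Check P[X,Y,Z] = P[X] P[Y|X] P[Z|Y] for a joint distribution.

    `p` maps (x, y, z) tuples to probabilities (hypothetical representation).
    Since P[X] P[Y|X] = P[X,Y] and P[Z|Y] = P[Y,Z] / P[Y], the factorization
    is equivalent to P[X,Y,Z] * P[Y] == P[X,Y] * P[Y,Z].
    """
    p_xy, p_yz, p_y = {}, {}, {}
    for (x, y, z), mass in p.items():
        p_xy[(x, y)] = p_xy.get((x, y), 0.0) + mass
        p_yz[(y, z)] = p_yz.get((y, z), 0.0) + mass
        p_y[y] = p_y.get(y, 0.0) + mass

    xs = {x for x, _, _ in p}
    ys = {y for _, y, _ in p}
    zs = {z for _, _, z in p}
    for x, y, z in product(xs, ys, zs):
        lhs = p.get((x, y, z), 0.0) * p_y.get(y, 0.0)
        rhs = p_xy.get((x, y), 0.0) * p_yz.get((y, z), 0.0)
        if abs(lhs - rhs) > tol:
            return False
    return True


# A distribution built as a chain satisfies the diagram.
p_x = {0: 0.3, 1: 0.7}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
p_z_given_y = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.1, 1: 0.9}}
chain = {
    (x, y, z): p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]
    for x, y, z in product((0, 1), repeat=3)
}

# Z = X xor Y with independent fair X, Y does not: Z depends on X given Y.
xor = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}
```

Here `satisfies_chain(chain)` should hold and `satisfies_chain(xor)` should fail, since in the xor case Y alone tells you nothing about Z.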

When using diagrams that way, it's natural to state a few properties in terms of diagrams, and then derive some other diagrams they imply. For instance, if a distribution P[W, X, Y, Z]... (read more)

[-]johnswentworth2mo42

I keep seeing news outlets and the like say that SORA generates photorealistic videos, can model how things move in the real world, etc. This seems like blatant horseshit? Every single example I've seen looks like video game animation, not real-world video.

Have I just not seen the right examples, or is the hype in fact decoupled somewhat from the model's outputs?

6ryan_greenblatt2mo

I think I mildly disagree, but probably we're looking at the same examples. I think the most impressive (in terms of realism) videos are under "Sora is able to generate complex scenes with multiple characters, ...". (Includes white SUV video and Tokyo suburbs video.) I think all of these videos other than the octopus and paper planes are "at-a-glance" photorealistic to me. Overall, I think SORA can do "at-a-glance" photorealistic videos and can model to some extent how things move in the real world. I don't think it can do both complex motion and photorealism in the same video. As in, the videos which are photorealistic don't really involve complex motion and the videos which involve complex motion aren't photorealistic. (So probably some amount of hype, but also pretty real?)

3habryka2mo

Hmm, I don't buy it. These two scenes seem very much not like the kind of thing a video game engine could produce: Look at this frame! I think there is something very slightly off about that face, but the cat hitting the person's face and the person's reaction seem very realistic to me and IMO qualifies as "complex motion and photorealism in the same video".

2johnswentworth2mo

Were these supposed to embed as videos? I just see stills, and don't know where they came from.

4ryan_greenblatt2mo

These are stills from some of the videos I was referencing.

2ryan_greenblatt2mo

TBC, I wasn't claiming anything about video game engines. I wouldn't have called the cat one "complex motion", but I can see where you're coming from.

2RamblinDash2mo

Yeah, I mean I guess it depends on what you mean by photorealistic. That cat has three front legs.

7gwern2mo

Yeah, this is the example I've been using to convince people that the game engines are almost certainly generating training data but are probably not involved at sampling time. I can't come up with any sort of hybrid architecture like 'NN controlling game-engine through API' where you get that third front leg. One of the biggest benefits of a game-engine would be ensuring exactly that wouldn't happen - body parts becoming detached and floating in mid-air and lack of conservation. If you had a game engine with a hyper-realistic cat body model in it which something external was manipulating, one of the biggest benefits is that you wouldn't have that sort of common-sense physics problem. (Meanwhile, it does look like past generative modeling of cats in its errors. Remember the ProGAN interpolation videos of CATS? Hilarious, but also an apt demonstration of how extremely hard cats are to model. They're worse than hands.) In addition, you see plenty of classic NN tells throughout - note the people driving a 'Dandrover'...

2johnswentworth2mo

Yeah, those were exactly the two videos which most made me think that the model was mostly trained on video game animation. In the Tokyo one, the woman's facial muscles never move at all, even when the camera zooms in on her. And in the SUV one, the dust cloud isn't realistic, but even covering that up the SUV has a Grand Theft Auto look to its motion. "Can't do both complex motion and photorealism in the same video" is a good hypothesis to track, thanks for putting that one on my radar.

2ryan_greenblatt2mo

(Note that I was talking about the one with the train going through Tokyo suburbs.)

[-]johnswentworth9mo42

Putting this here for posterity: I have thought since the superconductor preprint went up, and continue to think, that the markets are putting generally too little probability on the claims being basically-true. I thought ~70% after reading the preprint the day it went up (and bought up a market on manifold to ~60% based on that, though I soon regretted not waiting for a better price), and my probability has mostly been in the 40-70% range since then.

2johnswentworth9mo

After seeing the markets jump up in response to the latest, I think I'm more like 65-80%.

[-]johnswentworth2y40

Languages should have tenses for spacelike separation. My friend and I do something in parallel, it's ambiguous/irrelevant which one comes first, I want to say something like "I expect my friend <spacelike version of will do/has done/is doing> their task in such-and-such a way".

5JBlack2y

That sounds more like a tenseless sentence than using a spacelike separation tense. Your friend's performance of the task may well be in your future or past lightcone (or extend through both), but you don't wish to imply any of these. There are languages with tenseless verbs, as well as some with various types of spatial tense. The closest I can approximate this in English without clumsy constructs is "I expect my friend does their task in such-and-such a way", which I agree isn't very satisfactory.

4adamShimi2y

Who would have thought that someone would ever look at CSP and think "I want English to be more like that"?

2johnswentworth2y

lol

3kave2y

Future perfect (hey, that's the name of the show!) seems like a reasonable hack for this in English

[-]johnswentworth2y40

Two kinds of cascading catastrophes one could imagine in software systems...

A codebase is such a spaghetti tower (and/or coding practices so bad) that fixing a bug introduces, on average, more than one new bug. Software engineers toil away fixing bugs, making the software steadily more buggy over time.
Software services managed by different groups have dependencies - A calls B, B calls C, etc. Eventually, the dependence graph becomes connected enough and loopy enough that a sufficiently-large chunk going down brings down most of the rest, and nothing can go

... (read more)

[-]johnswentworth3y40

I wish there were a fund roughly like the Long-Term Future Fund, but with an explicit mission of accelerating intellectual progress.

6habryka3y

I mean, just to be clear, I am all in favor of intellectual progress. But doing so indiscriminately does sure seem a bit risky in this world of anthropogenic existential risks. Reminds me of my mixed feelings on the whole Progress Studies thing.

6johnswentworth3y

Yeah, I wouldn't want to accelerate e.g. black-box ML. I imagine the real utility of such a fund would be to experiment with ways to accelerate intellectual progress and gain understanding of the determinants, though the grant projects themselves would likely be more object-level than that. Ideally the grants would be in areas which are not themselves very risk-relevant, but complicated/poorly-understood enough to generate generalizable insights into progress. I think it takes some pretty specific assumptions for such a thing to increase risk significantly on net. If we don't understand the determinants of intellectual progress, then we have very little ability to direct progress where we want it; it just follows whatever the local gradient is. With more understanding, at worst it follows the same gradient faster, and we end up in basically the same spot. The one way it could net-increase risk is if the most likely path of intellectual progress leads to doom, and the best way to prevent doom is through some channel other than intellectual progress (like political action, for instance). Then accelerating the intellectual progress part potentially gives the other mechanisms (like political bodies) less time to react. Personally, though, I think a scenario in which e.g. political action successfully prevents intellectual progress from converging to doom (in a world where it otherwise would have) is vanishingly unlikely (like, less than one-in-a-hundred, maybe even less than one-in-a-thousand).

3Quinn3y

You might check out Donald Braben's view: he argues that "transformative research" (i.e. fundamental results that create new fields and industries) is critical for the survival of civilization. He does not worry that transformative results might end civilization.

[-]johnswentworth2y30

Here's an interesting problem of embedded agency/True Names which I think would make a good practice problem: formulate what it means to "acquire" something (in the sense of "acquiring resources"), in an embedded/reductive sense. In other words, you should be able-in-principle to take some low-level world-model, and a pointer to some agenty subsystem in that world-model, and point to which things that subsystem "acquires" and when.

Some prototypical examples which an answer should be able to handle well:

Organisms (anything from bacteria to plant to animals) eating things, absorbing nutrients, etc.
Humans making money or gaining property.

3Gunnar_Zarncke1y

...and how the brain figures this out and why it is motivated to do so. There are a lot of simple animals that apparently "try to control" resources or territory. How? Drives to control resources occur everywhere. And your control of resources is closely related to your dominance in a dominance hierarchy. Which seems to be regulated in many animals by serotonin. See e.g. https://www.nature.com/articles/s41386-022-01378-2

[-]johnswentworth4y30

What if physics equations were written like statically-typed programming languages?

$\left(\frac{\text{mass} \cdot \text{length}}{\text{time}^{2}} : F\right) = \left(\frac{\text{mass}}{-} : m\right) \left(\frac{\text{length}}{\text{time}^{2}} : a\right)$

$\left(\frac{\text{mass}}{\text{length} \cdot \text{time}^{2}} : P\right) \left(\frac{\text{length}^{3}}{-} : V\right) = \left(\frac{-}{-} : N\right) \left(\frac{\text{mass} \cdot \text{length}^{2}}{\text{time}^{2} \cdot \text{temp}} : R\right) \left(\frac{\text{temp}}{-} : T\right)$
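
As a rough illustration of what "typed" quantities could look like in code, here is a minimal sketch. Note that it checks dimensions at run time rather than statically, so it is only an approximation of the idea; the `Quantity` class, the exponent-tuple encoding, and the numeric values are all illustrative assumptions.

```python
from dataclasses import dataclass

# Exponents of the base dimensions, in the order (mass, length, time, temp).
DIMENSIONLESS = (0, 0, 0, 0)
MASS = (1, 0, 0, 0)
ACCEL = (0, 1, -2, 0)
FORCE = (1, 1, -2, 0)
PRESSURE = (1, -1, -2, 0)
VOLUME = (0, 3, 0, 0)
GAS_CONSTANT = (1, 2, -2, -1)
TEMP = (0, 0, 0, 1)


@dataclass(frozen=True)
class Quantity:
    value: float
    dims: tuple  # exponents of (mass, length, time, temp)

    def __mul__(self, other: "Quantity") -> "Quantity":
        # Multiplying quantities adds the exponents of each base dimension.
        dims = tuple(a + b for a, b in zip(self.dims, other.dims))
        return Quantity(self.value * other.value, dims)

    def __add__(self, other: "Quantity") -> "Quantity":
        # Addition only makes sense between quantities of the same dimension.
        if self.dims != other.dims:
            raise TypeError(f"dimension mismatch: {self.dims} vs {other.dims}")
        return Quantity(self.value + other.value, self.dims)


# F = m a: the product of a mass and an acceleration has the dimensions of force.
m = Quantity(2.0, MASS)
a = Quantity(3.0, ACCEL)
F = m * a

# P V = N R T: both sides have dimensions mass * length^2 / time^2.
P = Quantity(101325.0, PRESSURE)
V = Quantity(0.0224, VOLUME)
N = Quantity(1.0, DIMENSIONLESS)
R = Quantity(8.314, GAS_CONSTANT)
T = Quantity(273.15, TEMP)
```

With these definitions, `F.dims == FORCE` and `(P * V).dims == (N * R * T).dims`, while `m + a` raises a `TypeError`.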

6jimrandomh4y

The math and physics worlds still use single-letter variable names for everything, decades after the software world realized that was extremely bad practice. This makes me pessimistic about the adoption of better notation practices.

5johnswentworth4y

Better? I doubt it. If physicists wrote equations the way programmers write code, a simple homework problem would easily fill ten pages. Verboseness works for programmers because programmers rarely need to do anything more complicated with their code than run it - analogous to evaluating an expression, for a physicist or mathematician. Imagine if you needed to prove one program equivalent to another algebraically - i.e. a sequence of small transformations, with a record of intermediate programs derived along the way in order to show your work. I expect programmers subjected to such a use-case would quickly learn the virtues of brevity.

3Steven Byrnes4y

Yeah, I'm apparently not intelligent enough to do error-free physics/engineering calculations without relying on dimensional analysis as a debugging tool. I even came up with a weird, hack-y way to do that in computing environments like Excel and Cython, where flexible multiplicative types are not supported.

[-]johnswentworth1y22

An interesting conundrum: one of the main challenges of designing useful regulation for AI is that we don't have any cheap and robust way to distinguish a dangerous neural net from a non-dangerous net (or, more generally, a dangerous program from a non-dangerous program). This is an area where technical research could, in principle, help a lot.

The problem is, if there were some robust metric for how dangerous a net is, and that metric were widely known and recognized (as it would probably need to be in order to be used for regulatory purposes), then someone would probably train a net to maximize that metric directly.

6Garrett Baker1y

This seems to lead to the solution of trying to make your metric one-way, in the sense that your metric should 1. Provide an upper-bound on the dangerousness of your network 2. Compress the space of networks which map to approximately the same dangerousness level on the low end of dangerousness, and expand the space of networks which map to approximately the same dangerousness level on the upper end of dangerousness, so that you can train your network to minimize the metric, but when you train your network to maximize the metric you end up in a degenerate area with technically very high measured danger levels but in actuality very low levels of dangerousness. We can hope (or possibly prove) that as you optimize upwards on the metric you become subject to Goodhart's curse, but the opposite occurs on the lower end.

4Thane Ruthenis1y

Sure, even seems a bit tautological: any such metric, to be robust, would need to contain in itself a definition of a dangerously-capable AI, so you probably wouldn't even need to train a model to maximize it. You'd be able to just lift the design from the metric directly.

2Thane Ruthenis1y

Do you have any thoughts on a softer version of this problem, where the metric can't be maximized directly, but gives a concrete idea of what sort of challenge your AI needs to beat to qualify as AGI? (And therefore in which direction in the architectural-design-space you should be moving.) Some variation on this seems like it might work as a "fire alarm" test set, but as you point out, inasmuch as it's recognized, it'll be misapplied for benchmarking instead. (I suppose the ideal way to do it would be to hand it off to e. g. ARC, so they can use it if OpenAI invites them for safety-testing again. This way, SOTA models still get tested, but the actors who might misuse it aren't aware of the testing's particulars until they succeed anyway...)

[-]johnswentworth3y20

I just went looking for a good reference for the Kelly criterion, and didn't find any on LessWrong. So, for anybody who's looking: chapter 6 of Cover & Thomas' textbook on information theory is the best source I currently know of.

6Yoav Ravid3y

Might be a good thing to add to the Kelly Criterion tag

[-]johnswentworth4y20

Neat problem of the week: we have n discrete random variables, $X_{1}, \ldots, X_{n}$. Given any variable, all variables are independent:

$\forall i : P [X | X_{i}] = \prod_{j} P [X_{j} | X_{i}]$

Characterize the distributions which satisfy this requirement.
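
For anyone who wants to test candidate distributions numerically, here is a minimal sketch of a checker for the condition above. The function name `independent_given_each` and the dict-of-tuples representation are illustrative assumptions. It reads the condition as: for each i and each value of X_i with nonzero probability, the variables are mutually independent under the conditional distribution (the j = i factor is trivially 1).

```python
from itertools import product
from math import prod


def independent_given_each(p, tol=1e-9):
    """Check that for every i, P[X | X_i] = prod_j P[X_j | X_i].

    `p` maps value tuples (x_1, ..., x_n) to probabilities
    (hypothetical representation).
    """
    n = len(next(iter(p)))
    values = [sorted({x[j] for x in p}) for j in range(n)]
    for i in range(n):
        for v in values[i]:
            p_v = sum(mass for x, mass in p.items() if x[i] == v)
            if p_v <= tol:
                continue  # conditioning on a zero-probability value is vacuous
            # Conditional joint distribution given X_i = v.
            cond = {x: mass / p_v for x, mass in p.items() if x[i] == v}
            # Conditional marginal of each X_j given X_i = v.
            marg = [{} for _ in range(n)]
            for x, mass in cond.items():
                for j in range(n):
                    marg[j][x[j]] = marg[j].get(x[j], 0.0) + mass
            # Compare the joint with the product of marginals on every outcome.
            for x in product(*values):
                if x[i] != v:
                    continue
                expected = prod(marg[j].get(x[j], 0.0) for j in range(n))
                if abs(cond.get(x, 0.0) - expected) > tol:
                    return False
    return True


# Three independent fair coins: satisfies the condition.
coins = {x: 0.125 for x in product((0, 1), repeat=3)}

# Three copies of one fair coin: also satisfies it (everything is
# deterministic once any one variable is known).
copies = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}

# X3 = X1 xor X2: fails, since given X3 the other two are perfectly correlated.
xor = {(a, b, a ^ b): 0.25 for a, b in product((0, 1), repeat=2)}
```

Here `coins` and `copies` should pass and `xor` should fail. These are just spot checks, not a characterization.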

This problem came up while working on the theorem in this post, and (separately) in the ideas behind this post. Note that those posts may contain some spoilers for the problem, though frankly my own proofs on this one just aren't very good.

[-]johnswentworth4y20

For short-term, individual cost/benefit calculations around C19, it seems like uncertainty in the number of people currently infected should drop out of the calculation.

For instance: suppose I'm thinking about the risk associated with talking to a random stranger, e.g. a cashier. My estimated chance of catching C19 from this encounter will be roughly proportional to $N_{\text{infected}}$. But, assuming we already have reasonably good data on number hospitalized/died, my chances of hospitalization/death given infection will be roughly inversely proportional to $N_{\text{in}}$ ... (read more)

[+][comment deleted]2y-40

Deleted by johnswentworth, 04/01/2022
