Building gears-level models is expensive - often prohibitively expensive. Black-box approaches are usually cheaper and faster. But black-box approaches rarely generalize - they need to be rebuilt when conditions change, don’t identify unknown unknowns, and are hard to build on top of. Gears-level models, on the other hand, offer permanent, generalizable knowledge which can be applied to many problems in the future, even if conditions shift.

gwern:
Warning for anyone who has ever interacted with "robosucka" or been solicited for a new podcast series in the past few years: https://www.tumblr.com/rationalists-out-of-context/744970106867744768/heads-up-to-anyone-whos-spoken-to-this-person-i
The idea that maximally-coherent agents look like squiggle-maximizers raises the question: what would it look like for humans to become maximally coherent?

One answer, which Yudkowsky gives here, is that conscious experiences are just a "weird and more abstract and complicated pattern that matter can be squiggled into". But that seems to be in tension with another claim he makes, that there's no way for one agent's conscious experiences to become "more real" except at the expense of other conscious agents—a claim which, according to him, motivates average utilitarianism across the multiverse. Clearly a squiggle-maximizer would not be an average squigglean. So what's the disanalogy here?

It seems like @Eliezer Yudkowsky is basically using SSA, but comparing between possible multiverses—i.e. when facing the choice between creating agent A or not, you look at the set of As in the multiverse where you decided yes, and compare it to the set of As in the multiverse where you decided no, and (if you're deciding for the good of A) you pick whichever one gives A a better time on average.

Yudkowsky has written before (can't find the link) that he takes this approach because alternatives would entail giving up on predictions about his future experiences—e.g. constantly predicting he's a Boltzmann brain and will dissolve in the next second. But this argument by Wei Dai shows that agents which reason in this way can be money-pumped by creating arbitrarily short-lived copies of them. Based on this I claim that Yudkowsky's preferences are incoherent, and that the only coherent thing to do here is to "expect to be" a given copy in proportion to the resources it will have available, as anthropic decision theory claims. (Incidentally, this also explains why we're at the hinge of history.)

But this is just an answer, it doesn't dissolve the problem. What could? Some wild guesses:

1. You are allowed to have preferences about the external world, and you are allowed to have preferences about your "thread of experience"—you're just not allowed to have both. The incoherence comes from trying to combine the two; the coherent thing to do would be to put them into different agents, who will then end up in very different parts of the multiverse.
2. Another way of framing this: you are allowed to be a decision-maker, and you are allowed to be a repository of welfare, but you're not allowed to be both (on pain of incoherence/being dutch-booked).
3. Something totally different: the problem here is that we don't have intuitive experience of being agents which can copy themselves, shut down copies, re-merge, etc. If we did, then maybe SSA would seem as silly as expecting to end up in a different universe whenever we went to sleep.
4. Actually, maybe the operative thing we lack experience with is not just splitting into different subagents, but rather merging together afterwards. What does it feel like to have been thousands of different parallel agents, and now be a single agent with their unified experiences? What sort of identity would one construct in that situation? Maybe this is an important part of dissolving the problem.
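As a side note, one way to write down the resource-weighted rule endorsed above (my notation, not the author's): if copy $i$ will have resources $r_i$ available, then

$$P(\text{I am copy } i) = \frac{r_i}{\sum_j r_j}.$$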
Today in Azathoth news: "Eurasian hoopoes raise extra chicks so they can be eaten by their siblings"

It seems that the hoopoes lay extra eggs in times of abundance — more than they would be able to see through to fledging — as a way of storing up food for the older siblings. It is rather gruesomely called the "larder" hypothesis.

> “What surprised me the most was the species practicing this aggressive parenting,” says Vladimir Pravosudov, an ecologist at the University of Nevada, Reno. Hoopoes primarily eat insects, he notes, so their long, curved bills aren’t ideal for killing and eating chicks. That might be why, Soler says, mother hoopoes often grab the unlucky chick and shove it into the mouth of an older chick, which swallows it whole.

Literal baby-eaters!
Note, I consider this post to be "Lynette speculates based on one possible model", rather than "scientific evidence shows", based on my default skepticism for psych research.

A recent Astral Codex Ten post argued that advice is written by people who struggle, because they put tons of time into understanding the issue. People who succeeded effortlessly don't have explicit models of how they perform (section II). It's not the first time I've seen this argument, e.g. this Putanumonit post arguing that explicit rules help poor performers, who then abandon the rules and just act intuitively once they become good.

This reminded me of a body of psych research I half-remembered from college called Choking under Pressure. My memory was that if you think about what you're doing too much after becoming good, then you do worse. The paper I remembered from college was from 1986, so I found "Choking interventions in sports: A systematic review" from 2017. It turns out that I was remembering the "self-focused" branch of choking research:

"Self-focus approaches have largely been extended from Baumeister's (1984) automatic execution hypothesis. Baumeister explains that choking occurs because, when anxiety increases, the athlete allocates conscious attention to movement execution. This conscious attention interferes with the otherwise automatic nature of movement execution, which results in performance decrements."

(Slightly worrying. I have no particular reason to doubt this body of work, but Baumeister's "willpower as muscle" -- i.e. ego depletion -- work hasn't stood up well.)

Two studies found that distraction while training negatively impacted performance. I'm not sure if this was supposed to acclimatize the participants to distractions while performing or reduce their self-focus while training. (I'm taking the paper's word and not digging beyond the surface on the numbers.) Either way, I feel very little surprise that practicing while distracted was worse. Maybe we just need fatal-car-crash-magnitude effects before we notice that focus is good?

Which makes it all the more surprising that seven of eight studies found that athletes performed better under pressure if they simultaneously did a second task (such as counting backwards). (The eighth study found a null result.) According to the theory, the second task helped because it distracted from self-focus on the step-by-step execution.

If this theory holds up, it seems to support paying deliberate attention to explicit rules while learning but *not* paying attention to those rules once you're able to use them intuitively (at least for motor tasks). In other words, almost exactly what Jacob argued in the Putanumonit article.

Conclusions

I was intrigued by this argument because I've argued that building models is how one becomes an expert.[1] After considering it, I don't actually think the posts above offer a counter-argument to my claim.

My guess is that experts do have models of skills they developed, even if they have fewer models (because they needed to explicitly learn fewer skills). The NDM method for extracting experts' models implies that the experts have models that can be coaxed out. Holden's Learning By Writing post feels like an explicit model.

Another possibility is that experts forget the explicit models after switching to intuition. If they faced the challenges more than five or ten years ago, they may not remember the models that helped them then.
Probably uncoincidentally, this aligns neatly with Cal Newport's advice to seek advice from someone who recently went through the challenges you're now facing, because they will still remember relevant advice.

Additionally, the areas of expertise I care about aren't like walking, where most people will effortlessly succeed. Expertise demands improving from where you started. Both posts and the choking-under-pressure literature agree that explicit models help you improve, at least for a while.

"Find the best explicit models you can and practice until you don't need them" seems like a reasonable takeaway.

----------------------------------------

[1] Note, there's an important distinction between building models of your field and building models of skills. It seems like the main argument mostly applies to models of skills. I doubt Scott would disagree that models of fields are valuable, given how much time he's put into developing his model of psychopharmacology.
Reposting myself from discord, on the topic of donating 5000$ to EA causes.

> if you're doing alignment research, even just a bit, then the 5000$ are plobly better spent on yourself
>
> if you have any gears level model of AI stuff then it's better value to pick which alignment org to give to yourself; charity orgs are vastly understaffed and you're essentially contributing to the "picking what to donate to" effort by thinking about it yourself
>
> if you have no gears level model of AI then it's hard to judge which alignment orgs it's helpful to donate to (or, if giving to regranters, which regranters are good at knowing which alignment orgs to donate to)
>
> as an example of regranters doing massive harm: openphil gave 30M$ to openai at a time where it was critically useful to them, (supposedly in order to have a chair on their board, and look how that turned out when the board tried to yeet altman)
>
> i know of at least one person who was working in regranting and was like "you know what i'd be better off doing alignment research directly" — imo this kind of decision is probly why regranting is so understaffed
>
> it takes technical knowledge to know what should get money, and once you have technical knowledge you realize how much your technical knowledge could help more directly so you do that, or something

Popular Comments

Recent Discussion

I'm looking for computer games that involve strategy, resource management, hidden information, and management of "value of information" (i.e. figuring out when to explore or exploit), which:

  • *can* be beaten in 30 – 120 minutes on your first try (or, there's a clear milestone that's about that long)
  • but, it'd be pretty hard to do so unless you are trying really hard; even a pretty savvy gamer shouldn't be able to pull it off by default.

This is for my broader project of "have a battery of exercises that train/test people's general reasoning on open-ended problems." Each exercise should ideally be pretty different from the other ones.

In this case, I don't expect anyone to have such a game that they have beaten on their first try, but, I'm looking for games where this seems at least plausible, if you were taking a long time to think each turn, or pausing a lot.

The strategy/resource/value-of-information aspect is meant to correspond to some real-world difficulties of long-term ambitious planning.


(One example game that's been given to me in this category is "Luck Be a Landlord")
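For concreteness, here's a toy version of the "value of information" calculation this kind of game exercises (my own illustration, not from the post): how much is it worth to learn how a risky option resolves before committing to it?

```python
# Toy value-of-information calculation (illustrative numbers only).
safe_payoff = 1.0
risky_payoff_if_good = 3.0
p_good = 0.5  # prior probability that the risky option pays off

# Acting on priors alone: take whichever option is better in expectation.
ev_without_info = max(safe_payoff, p_good * risky_payoff_if_good)  # = 1.5

# With perfect information, you pick the best option in each possible world.
ev_with_info = p_good * max(risky_payoff_if_good, safe_payoff) + (1 - p_good) * safe_payoff  # = 2.0

value_of_information = ev_with_info - ev_without_info  # = 0.5
print(value_of_information)
```

If learning the outcome costs less than 0.5 in expected payoff, exploring is worth it; otherwise you should just exploit.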

Dweomite:
If your definition of "hidden information" implies that chess has it then I think you will predictably be misunderstood. Terms that I associate with (gaining advantage by spending time modeling a situation) include:  thinking, planning, analyzing, simulating, computing ("running the numbers")
Raemon:
Yeah I do not super stand by how I phrased it in the post. But also your second paragraph feels wrong to me too – in some sense, yes, Chess's and Slay the Spire's hidden information are "the same", but, like, it seems at least somewhat important that in Slay the Spire there are things you can't predict by purely running simulations forward; you have to have a probability distribution over pretty unknown things. (I'm not sure I'll stand by either this or my last comment. I'm thinking out loud, and may have phrased things wrong here.)

Some concepts that I use:

Randomness is when the game tree branches according to some probability distribution specified by the rules of the game.  Examples:  rolling a die; cutting a deck at a random card.

Slay the Spire has randomness; Chess doesn't.

Hidden Information is when some variable that you can't directly observe influences the evolution of the game.  Examples: a card in an opponent's hand, which they can see but you can't; the 3 solution cards set aside at the start of a game of Clue; the winning pattern in a game of Mastermind.

Peop... (read more)
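A minimal sketch of the distinction being drawn here (my own toy example, not from the comment): randomness branches according to a distribution the rules specify and everyone observes, while hidden information is a latent variable you never see directly, so you have to keep a belief distribution over it.

```python
import random

# Randomness: the game tree branches per a distribution stated in the rules;
# everyone knows the distribution and observes the outcome.
def roll_die() -> int:
    return random.randint(1, 6)

# Hidden information: a variable fixed at setup that you never observe directly
# (like Clue's set-aside cards), so you must reason over a belief distribution.
SUSPECTS = ["Green", "Mustard", "Plum"]

def play_round() -> bool:
    secret = random.choice(SUSPECTS)                   # you can't see this
    belief = {s: 1 / len(SUSPECTS) for s in SUSPECTS}  # your prior over it

    # The rules reveal one suspect who is *not* the culprit; update your belief.
    revealed = random.choice([s for s in SUSPECTS if s != secret])
    belief[revealed] = 0.0
    total = sum(belief.values())
    belief = {s: p / total for s, p in belief.items()}

    guess = max(belief, key=belief.get)                # best guess given beliefs
    return guess == secret

print(roll_die(), play_round())
```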

Dweomite:
I haven't played it, but someone disrecommended it to me on the basis that there was no way to know which skills you'd need to survive the scripted events except to have seen the script before.

Introducing Devin

Is the era of AI agents writing complex code systems without humans in the loop upon us?

Cognition is calling Devin ‘the first AI software engineer.’

Here is a two minute demo of Devin benchmarking LLM performance.

Devin has its own web browser, which it uses to pull up documentation.

Devin has its own code editor.

Devin has its own command line.

Devin uses debugging print statements and uses the log to fix bugs.

Devin builds and deploys entire stylized websites without even being directly asked.

What could possibly go wrong? Install this on your computer today.

Padme.

The Real Deal

I would by default assume all demos were supremely cherry-picked. My only disagreement with Austen Allred’s statement here is that this rule is not new:

Austen Allred: New rule:

If someone only shows their AI model in tightly

...

Is Devin using GPT-4, GPT-4T, or one of the 2 currently available long context models, Claude Opus 200k or Gemini 1.5?  

March 14, 2023: GPT-4, but the "long" context version was expensive and initially unavailable to anyone.

The reason that matters: November 6, 2023 is the announcement of GPT-4T, which has 128k context.

Feb 15, 2024: Gemini 1.5 (long context).

March 4, 2024: Claude 3 Opus (200k context).

That makes the timeline less than 4 months, and remember there's a few weeks generally between "announcement" and "here's your opportunity to pay for tokens with an API key"... (read more)

Seth Herd:
Very little alignment work of note, despite tons of published work on developing agents. I'm puzzled as to why the alignment community hasn't turned more of their attention toward language model cognitive architectures/agents, but I'm also reluctant to publish more work advertising how easily they might achieve AGI. ARC Evals did set up a methodology for Evaluating Language-Model Agents on Realistic Autonomous Tasks. I view this as a useful acknowledgment of the real danger of better LLMs, but I think it's inherently inadequate, because it's based on the evals team doing the scaffolding to make the LLM into an agent. They're not going to be able to devote nearly as much time to that as other groups will down the road. New capabilities are certainly going to be developed by combinations of LLM improvements, and hard work at improving the cognitive architecture scaffolding around them.
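(For readers who haven't seen one: "scaffolding" here just means a harness loop like the minimal sketch below, where the model proposes actions and the harness executes them and feeds the results back. `call_llm` and `run_tool` are hypothetical placeholders, not any particular lab's API.)

```python
# Minimal agent-scaffolding sketch. call_llm() and run_tool() are hypothetical
# placeholders standing in for a model API and a tool/command executor.

def call_llm(transcript: str) -> str:
    raise NotImplementedError  # e.g. an API call to whichever model you use

def run_tool(action: str) -> str:
    raise NotImplementedError  # e.g. run a shell command, fetch a web page

def agent_loop(task: str, max_steps: int = 10) -> str:
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        action = call_llm(transcript)       # model proposes the next action
        if action.startswith("DONE:"):      # simple stop convention
            return action
        observation = run_tool(action)      # harness executes it
        transcript += f"Action: {action}\nObservation: {observation}\n"
    return "Ran out of steps"
```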
MichaelDickens:
I've heard it argued that this isn't representative of the programming population. Rather, people who suck at programming (and thus can't get jobs) apply to way more positions than people who are good at programming. I have no idea if it's true, but it sounds plausible.
Random Developer:
  I have interviewed a fair number of programmers, and I've definitely seen plenty of people who talked a good game but who couldn't write FizzBuzz (or sum the numbers in an array). And this was stacking the deck in their favor: They could use a programming language of their choice, plus a real editor, and if they appeared unable to deal with coding in front of people, I'd go sit on the other side of the office and let them work for a bit. I do not think these people were representative of the average working programmer, based on my experiences consulting at a variety of companies. The average engineer can write code.  
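(For reference, the screening task in question; the point is that the bar being failed is this low.)

```python
# Classic FizzBuzz: print 1..100, substituting Fizz/Buzz/FizzBuzz for
# multiples of 3, 5, and 15 respectively.
for i in range(1, 101):
    if i % 15 == 0:
        print("FizzBuzz")
    elif i % 3 == 0:
        print("Fizz")
    elif i % 5 == 0:
        print("Buzz")
    else:
        print(i)
```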

Meet inside The Shops at Waterloo Town Square - we will congregate at 7pm for 15 minutes in the seating area next to the Valu-Mart (the one with the trees sticking out in the middle of the benches), and then head over to my nearby apartment's amenity room. If you've been around a few times, feel free to meet up at my apartment front door at 7:30 instead. (There is free city parking at Bridgeport and Regina, 22 Bridgeport Rd E.)

Activity

A KWR member is going to teach the rest of us some sleight of hand tricks! Just show up. 

Supplies to Bring if You Can

Lex Fridman posts timestamped transcripts of his interviews. It's an 83-minute read here and a 115-minute watch on YouTube.

It's neat to see Altman's side of the story. I don't know whether his charisma is more like +2SD or +5SD above the average American (concept origin: planecrash; charisma likely doesn't follow a normal distribution), and I only have a vague grasp of what kinds of shenanigans +5SD-ish types can pull off when they pull out all the stops in face-to-face interactions, so maybe you'll prefer to read the transcript over watching the video (although those shenanigans are largely about reading and responding to your facial expressions and body language on the fly, not projecting their own).
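For a rough sense of scale, assuming a normal distribution (which, as noted above, probably doesn't hold for charisma):

```python
# How rare +2SD vs +5SD would be under a standard normal distribution.
from scipy.stats import norm

print(norm.sf(2))  # ~0.023   -> roughly 1 person in 44
print(norm.sf(5))  # ~2.9e-7  -> roughly 1 person in 3.5 million
```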

If you've missed it, Gwern's side of the story is here.

Lex Fridman (00:01:05): Take me through

...
Zack_M_Davis:
The concept of measuring traits in standard deviation units did not originate in someone's roleplaying game session in 2022! Statistically literate people have been thinking in standardized units for more than a century. (If anyone has priority, it's Karl Pearson in 1894.) If you happened to learn about it from someone's RPG session, that's fine. (People can learn things from all different sources, not just from credentialed "teachers" in officially accredited "courses.") But to the extent that you elsewhere predict changes in the trajectory of human civilization on the basis that "fewer than 500 people on earth [are] currently prepared to think [...] at a level similar to us, who read stuff on the same level" as someone's RPG session, learning an example of how your estimate of the RPG session's originality was a reflection of your own ignorance should make you re-think your thesis.

Applying statistical variance to the bundle of traits under the blanket label charisma (similar to the bundle of intelligence and results-acquisition under the blanket label thinkoomph), and the sociological implications of more socially powerful people being simultaneously more rare and also more capable of making the people around them erroneously feel safe, were things I picked up almost entirely from planecrash, yes.

I think that my "coordination takeoffs" post also ended up being a bad example for what you're tr... (read more)

Thanks to Rohin Shah, Ajeya Cotra, Richard Ngo, Paul Christiano, Jon Uesato, Kate Woolverton, Beth Barnes, and William Saunders for helpful comments and feedback.

Evaluating proposals for building safe advanced AI—and actually building any degree of confidence in their safety or lack thereof—is extremely difficult. Previously, in “An overview of 11 proposals for building safe advanced AI,” I tried evaluating such proposals on the axes of outer alignment, inner alignment, training competitiveness, and performance competitiveness. While I think that those criteria were good for posing open questions, they didn’t lend themselves well to actually helping us understand what assumptions needed to hold for any particular proposal to work. Furthermore, if you’ve read that paper/post, you’ll notice that those evaluation criteria don’t even work for some of the proposals...

Does anyone -- and in particular Evhub -- have updated views on this post with the benefit of hindsight?

I intuitively don't like this approach, but I have trouble articulating exactly why. I've tried to explain a bit in this comment, but I don't think I'm quite saying the right thing.

One issue I have is that it doesn't seem to nicely handle interactions between the properties of the AI and how it's used. You can have an AI which is safe when used in some ways, but not always. This could be due to approaches like control (which mostly route around mechanistic... (read more)


Today, the AI Extinction Statement was released by the Center for AI Safety: a one-sentence statement jointly signed by a historic coalition of AI experts, professors, and tech leaders.

Geoffrey Hinton and Yoshua Bengio have signed, as have the CEOs of the major AGI labs–Sam Altman, Demis Hassabis, and Dario Amodei–as well as executives from Microsoft and Google (but notably not Meta).

The statement reads: “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”

We hope this statement will bring AI x-risk further into the Overton window and open up discussion around AI's most severe risks. Given the growing number of experts and public figures who take risks from advanced AI seriously, we hope to improve epistemics by encouraging discussion and focusing public and international attention on this issue.

That's a good example of my point. Instead of a petition, a more impactful document would be a survey of risks and their probability of occurring in the opinion of these notable public figures. 

In addition, there should be a disclaimer regarding who has accepted money from Open Philanthropy or any other EA-affiliated non-profit for research. 

RHollerith:
The statement does not mention existential risk, but rather "the risk of extinction from AI".
jeffreycaruso:
Which makes it an existential risk.  "An existential risk is any risk that has the potential to eliminate all of humanity or, at the very least, kill large swaths of the global population." - FLI

Churchill famously called democracy “the worst form of Government except for all those other forms that have been tried from time to time” - referring presumably to the relative success of his native Britain, the US, and more generally Western Europe and today most of the first world.

I claim that Churchill was importantly wrong. Not (necessarily) wrong about the relative success of Britain/US/etc, but about those countries’ governments being well-described as simple democracy. Rather, I claim, the formula which has worked well in e.g. Britain and the US diverges from pure democracy in a crucial load-bearing way; that formula works better than pure democracy both in theory and in practice, and when thinking about good governance structures we should emulate the full formula rather than pure democracy.

Specifically, the actual...

I'm not entirely sure the thesis quite captured exactly what was going on. It's true that balancing the factions was a big deal to the founders, and there were a number of ways one can cast the USA into some dichotomous buckets -- North/South (which is largely industrial/agrarian) or Federalist/Anti-Federalist, and probably some others. But the other point of the separation of powers and the nature of the bicameral structure was about checks and balances both within the population and within government itself. In that sense I agree one can cast the position as so... (read more)

quiet_NaN:
The failure mode of having a lot of veto-holders is that nothing ever gets done. Which is fine if you are happy with the default state of affairs, but not so fine if you prefer not to run your state on the default budget of zero.

There are some international organizations heavily reliant on veto powers; the EU and the UN Security Council come to mind, and to a lesser degree NATO (as far as the admission of new members is concerned). None of these are unmitigated success stories. From my understanding, getting stuff done in the EU means bribing or threatening every state which does not particularly benefit from whatever you want to do. Likewise, getting Turkey to allow Sweden to join NATO was kind of difficult, from what I remember. Not very surprisingly, if you have to get 30 factions to agree on something, one will be likely to object for good or bad reasons.

The UN Security Council, with its five veto-bearing permanent members, does not even make a pretense at democratic legitimation: the three states with the biggest nuclear arsenals, plus two nuclear countries which used to be colonial powers. The nicest thing one can say about that arrangement is that it failed to start WW III, and in a few cases passed a resolution against some war criminal who did not have the backing of any of the veto powers.

I think veto powers as part of a system of checks and balances are good in moderation, but add too many of them and you end up with a stalemate.

--

I also do not think that the civil war could have been prevented by stacking the deck even more in favor of the South. Sooner or later the industrial economy in the North would have overtaken the slave economy in the South. At best, the North might have seceded in disgust, leaving the South on track to become some rural backwater.
Random Developer:
Yes, there's actually some research into this area: https://www.jstor.org/stable/j.ctt7rvv7 "Veto Players: How Political Institutions Work". The theory apparently suggested that if you have too many "veto players", your government quickly becomes unable to act. And I suspect that states which are unable to act are vulnerable to major waves of public discontent during perceived crises.
Michael Roe:
The Northern Ireland Assembly works this way, at least for some things. But, in general, the U.K. does not work that way. A particular political party sometimes gets a big majority.

I have anxiety and depression.

The kind that doesn’t go away, and you take pills to manage.

This is not a secret.

What’s more interesting is that I just switched medications from one that successfully managed the depression but not the anxiety to one that successfully manages the anxiety but not the depression, giving me a brief window to see my two comorbid conditions separated from each other, for the first time since ever.

What follows is a (brief) digression on what they’re like from the inside.

Depression

I’m still me when I’m depressed.

Just a version of me that’s sapped of all initiative, energy, and tolerance for human contact.

There are plenty of metaphors for depression - a grey fog being one of the most popular - but I often think of it in...

These are really good descriptions! (Going by my own and friends' experience). For me I might just tweak it to put anxiety as the height rather than the gravity. Thank you for writing these up!