Building gears-level models is expensive - often prohibitively expensive. Black-box approaches are usually cheaper and faster. But black-box approaches rarely generalize - they need to be rebuilt when conditions change, don’t identify unknown unknowns, and are hard to build on top of. Gears-level models, on the other hand, offer permanent, generalizable knowledge which can be applied to many problems in the future, even if conditions shift.

gwern1dΩ276025
6
Warning for anyone who has ever interacted with "robosucka" or been solicited for a new podcast series in the past few years: https://www.tumblr.com/rationalists-out-of-context/744970106867744768/heads-up-to-anyone-whos-spoken-to-this-person-i
The idea that maximally-coherent agents look like squiggle-maximizers raises the question: what would it look like for humans to become maximally coherent? One answer, which Yudkowsky gives here, is that conscious experiences are just a "weird and more abstract and complicated pattern that matter can be squiggled into". But that seems to be in tension with another claim he makes, that there's no way for one agent's conscious experiences to become "more real" except at the expense of other conscious agents—a claim which, according to him, motivates average utilitarianism across the multiverse. Clearly a squiggle-maximizer would not be an average squigglean. So what's the disanalogy here? It seems like @Eliezer Yudkowsky is basically using SSA, but comparing between possible multiverses—i.e. when facing the choice between creating agent A or not, you look at the set of As in the multiverse where you decided yes, and compare it to the set of As in the multiverse where you decided no, and (if you're deciding for the good of A) you pick whichever one gives A a better time on average. Yudkowsky has written before (can't find the link) that he takes this approach because alternatives would entail giving up on predictions about his future experiences—e.g. constantly predicting he's a Boltzmann brain and will dissolve in the next second. But this argument by Wei Dai shows that agents which reason in this way can be money-pumped by creating arbitrarily short-lived copies of them. Based on this I claim that Yudkowsky's preferences are incoherent, and that the only coherent thing to do here is to "expect to be" a given copy in proportion to the resources it will have available, as anthropic decision theory claims. (Incidentally, this also explains why we're at the hinge of history.) But this is just an answer, it doesn't dissolve the problem. What could? Some wild guesses: 1. You are allowed to have preferences about the external world, and you are allowed to have preferences about your "thread of experience"—you're just not allowed to have both. The incoherence comes from trying to combine the two; the coherent thing to do would be to put them into different agents, who will then end up in very different parts of the multiverse. 2. Another way of framing this: you are allowed to be a decision-maker, and you are allowed to be a repository of welfare, but you're not allowed to be both (on pain of incoherence/being dutch-booked). 3. Something totally different: the problem here is that we don't have intuitive experience of being agents which can copy themselves, shut down copies, re-merge, etc. If we did, then maybe SSA would seem as silly as expecting to end up in a different universe whenever we went to sleep. 4. Actually, maybe the operative thing we lack experience with is not just splitting into different subagents, but rather merging together afterwards. What does it feel like to have been thousands of different parallel agents, and now be a single agent with their unified experiences? What sort of identity would one construct in that situation? Maybe this is an important part of dissolving the problem.
Today in Azathoth news: "Eurasian hoopoes raise extra chicks so they can be eaten by their siblings" It seems that the hoopoes lay extra eggs in times of abundance — more than they would be able to see through to fledging — as a way of storing up food for the older siblings. It is rather gruesomely called the "larder" hypothesis. > “What surprised me the most was the species practicing this aggressive parenting,” says Vladimir Pravosudov, an ecologist at the University of Nevada, Reno. Hoopoes primarily eat insects, he notes, so their long, curved bills aren’t ideal for killing and eating chicks. That might be why, Soler says, mother hoopoes often grab the unlucky chick and shove it into the mouth of an older chick, which swallows it whole. Literal baby-eaters!
S-risks are barely discussed in LW, is that because: * People think they are so improbable that it's not worth mentioning. * People are scared to discuss them. * Avoiding creating hypersititous textual attractors * Other reasons?
Note, I consider this post to be “Lynette speculates based on one possible model”, rather than “scientific evidence shows”, based on my default skepticism for psych research.  A recent Astral Codex Ten post argued that advice is written by people who struggle because they put tons of time into understanding the issue. People who succeeded effortlessly don’t have explicit models of how they perform (section II). It’s not the first time I’ve seen this argument, e.g. this Putanumonit post arguing that explicit rules help poor performers, who then abandon the rules and just act intuitively once they become good.  This reminded me of a body of psych research I half-remembered from college called Choking under Pressure.  My memory was that if you think about what you’re doing too much after becoming good, then you do worse. The paper I remembered from college was from 1986, so I found “Choking interventions in sports: A systematic review” from 2017.  It turns out that I was remembering the “self-focused” branch of choking research.  "Self-focus approaches have largely been extended from Baumeister’s (1984) automatic execution hypothesis. Baumeister explains that choking occurs because, when anxiety increases, the athlete allocates conscious attention to movement execution. This conscious attention interferes with otherwise automatic nature of movement execution, which results in performance decrements." (Slightly worrying. I have no particular reason to doubt this body of work, but Baumeister's "willpower as muscle" -- i.e. ego depletion -- work hasn't stood upwell.)  Two studies found that distraction while training negatively impacted performance. I’m not sure if this this was supposed to acclimatize the participants to distractions while performing or reduce their self-focus while training. (I’m taking the paper’s word and not digging beyond the surface on the numbers.) Either way, I feel very little surprise that practicing while distracted was worse. Maybe we just need fatal-car-crash magnitude effects before we notice that focus is good?   Which makes it all the more surprising that seven of eight studies found that athletes performed better under pressure if they simultaneously did a second task (such as counting backwards). (The eighth study found a null result.) According to the theory, the second task helped because it distracted from self-focus on the step-by-step execution.  If this theory holds up, it seems to support paying deliberate attention to explicit rules while learning but *not* paying attention to those rules once you're able to use them intuitively (at least for motor tasks). In other words, almost exactly what Jacob argued in the Putanumonit article. Conclusions  I was intrigued by this argument because I’ve argued that building models is how one becomes an expert.[1] After considering it, I don’t actually think the posts above offer a counter argument to my claim.  My guess is that experts do have models of skills they developed, even if they have fewer models (because they needed to explicitly learn fewer skills). The NDM method for extracting experts’ models implies that the experts have models that can be coaxed out. Holden’s Learning By Writing post feels like an explicit model.  Another possibility is that experts forget the explicit models after switching to intuition. If they faced the challenges more than five or ten years ago, they may not remember the models that helped them then. Probably uncoincidentally, this aligns neatly with Cal Newport’s advice to seek advice from someone who recently went through the challenges you’re now facing because they will still remember relevant advice. Additionally, the areas of expertise I care about aren’t like walking, where most people will effortlessly succeed. Expertise demands improving from where you started. Both posts and the choking under pressure literature agree that explicit models help you improve, at least for a while.  “Find the best explicit models you can and practice until you don’t need them” seems like a reasonable takeaway.  ---------------------------------------- [1] Note, there’s an important distinction between building models of your field and building models of skills. It seems like the main argument mostly applies to models of skills. I doubt Scott would disagree that models of fields are valuable, given how much time he’s put into developing his model of psychopharmacology.

Popular Comments

Recent Discussion

I've been in many conversations where I've mentioned the idea of using neuroscience for outer alignment, and the people who I'm talking to usually seem pretty confused about why I would want to do that. Well, I'm confused about why one wouldn't want to do that, and in this post I explain why.

As far as I see it, there are three main strategies people have for trying to deal with AI alignment in worlds where AI alignment is hard.

  1. Value alignment
  2. Corrigibility
  3. Control/scalable alignment

In my opinion, these are all great efforts, but I personally like the idea of working on value alignment directly. Why? First some negatives of the others:

  1. Corrigibility requires moderate to extreme levels of philosophical deconfusion, an effort worth doing for some, but a very small set not
...

First is that I don't really expect us to come up with a fully general answer to this problem in time. I wouldn't be surprised if we had to trade off some generality for indexing on the system in front of us - this gets us some degree of non-robustness, but hopefully enough to buy us a lot more time before stuff like the problem behind deep deception breaks a lack of True Names. Hopefully then we can get the AI systems to solve the harder problem for us in the time we've bought, with systems more powerful than us. The relevance here is that if this is the

... (read more)
1Bogdan Ionut Cirstea1h
I think the seeds of an interdisciplinary agenda on this are already there, see e.g. https://manifund.org/projects/activation-vector-steering-with-bci, https://www.lesswrong.com/posts/GfZfDHZHCuYwrHGCd/without-fundamental-advances-misalignment-and-catastrophe?commentId=WLCcQS5Jc7NNDqWi5, https://www.lesswrong.com/posts/GfZfDHZHCuYwrHGCd/without-fundamental-advances-misalignment-and-catastrophe?commentId=D6NCcYF7Na5bpF5h5, https://www.lesswrong.com/posts/wr2SxQuRvcXeDBbNZ/bogdan-ionut-cirstea-s-shortform?commentId=A8muL55dYxR3tv5wp and maybe my other comments on this post.  I might have a shortform going into more details on this soon, ot at least by the time of https://foresight.org/2024-foresight-neurotech-bci-and-wbe-for-safe-ai-workshop/. 
1Bogdan Ionut Cirstea1h
It sounds to me like you should seriously consider doing work which might look like e.g.  https://www.lesswrong.com/posts/eruHcdS9DmQsgLqd4/inducing-human-like-biases-in-moral-reasoning-lms, Getting aligned on representational alignment, Training language models to summarize narratives improves brain alignment; also see a lot of recent works from Ilia Sucholutsky and from this workshop.
1Bogdan Ionut Cirstea1h
The similarities go even deeper, I'd say, see e.g. The neuroconnectionist research programme for a review and quite a few of my past linkposts (e.g. on representational alignment and how it could be helpful for value alignment, on evidence of [by default] representational alignment between LLMs and humans, etc.); and https://www.lesswrong.com/posts/eruHcdS9DmQsgLqd4/inducing-human-like-biases-in-moral-reasoning-lms. 

S-risks are barely discussed in LW, is that because:

  • People think they are so improbable that it's not worth mentioning.
  • People are scared to discuss them.
  • Avoiding creating hypersititous textual attractors
  • Other reasons?

Introducing Devin

Is the era of AI agents writing complex code systems without humans in the loop upon us?

Cognition is calling Devin ‘the first AI software engineer.’

Here is a two minute demo of Devin benchmarking LLM performance.

Devin has its own web browser, which it uses to pull up documentation.

Devin has its own code editor.

Devin has its own command line.

Devin uses debugging print statements and uses the log to fix bugs.

Devin builds and deploys entire stylized websites without even being directly asked.

What could possibly go wrong? Install this on your computer today.

Padme.

The Real Deal

I would by default assume all demos were supremely cherry-picked. My only disagreement with Austen Allred’s statement here is that this rule is not new:

Austen Allred: New rule:

If someone only shows their AI model in tightly

...

Hasn't that happened?

2Gerald Monroe6h
Is Devin using GPT-4, GPT-4T, or one of the 2 currently available long context models, Claude Opus 200k or Gemini 1.5?   March 14, 2023 is GPT-4, but the "long" context was expensive and initially unavailable to anyone Reason that matters is November 6, 2023 is the announcement for GPT-4T, which is 128k context.   Feb 15, 2024 is Gemini 1.5 LC March 4, 2024 is Claude 200k is  That makes the timeline less than 4 months, and remember there's a few weeks generally between "announcement" and "here's your opportunity to pay for tokens with an API key". The prompting structure and meta-analysis for "Devin" was likely in the works since GPT-4, but without the long context you can't fit: [system prompt forced on you] ['be an elite software engineer' prompt] [ issue description] [ main source file ] [ data structures referenced in the main source file ] [ first attempt to fix ] [ compile or unit test outputs ] In practice I found that I need Opus 200k to even try when I do the above by hand. Also remember, GPT-4 128k starts failing near the end of it's context window, the full 128k is not usable:
1Random Developer9h
  I have interviewed a fair number of programmers, and I've definitely seen plenty of people who talked a good game but who couldn't write FizzBuzz (or sum the numbers in an array). And this was stacking the deck in their favor: They could use a programming language of their choice, plus a real editor, and if they appeared unable to deal with coding in front of people, I'd go sit on the other side of the office and let them work for a bit. I do not think these people were representative of the average working programmer, based on my experiences consulting at a variety of companies. The average engineer can write code.  
2Seth Herd10h
Very little alignment work of note, despite tons of published work on developing agents. I'm puzzled as to why the alignment community hasn't turned more of their attention toward language model cognitive architectures/agents, but I'm also reluctant to publish more work advertising how easily they might achieve AGI. ARC Evals did set up a methodology for Evaluating Language-Model Agents on Realistic Autonomous Tasks. I view this as a useful acknowledgment of the real danger of better LLMs, but I think it's inherently inadequate, because it's based on the evals team doing the scaffolding to make the LLM into an agent. They're not going to be able to devote nearly as much time to that as other groups will down the road. New capabilities are certainly going to be developed by combinations of LLM improvements, and hard work at improving the cognitive architecture scaffolding around them.

(Crossposted by habryka after asking Eliezer whether I could post it under his account)

i.

"Ignore all these elaborate, abstract, theoretical predictions," the Spokesperson for Ponzi Pyramid Incorporated said in a firm, reassuring tone.  "Empirically, everyone who's invested in Bernie Bankman has received back 144% of what they invested two years later."

"That's not how 'empiricism' works," said the Epistemologist.  "You're still making the assumption that --"

"You could only believe that something different would happen in the future, if you believed in elaborate theoretical analyses of Bernie Bankman's unobservable internal motives and internal finances," said the spokesperson for Ponzi Pyramid Incorporated.  "If you are a virtuous skeptic who doesn't trust in overcomplicated arguments, you'll believe that future investments will also pay back 144%, just like in the past.  That's the...

Apparently Eliezer decided to not take the time to read e.g. Quintin Pope's actual critiques, but he does have time to write a long chain of strawmen and smears-by-analogy.

A lot of Quintin Pope's critiques are just obviously wrong and lots of commenters were offering to help correct them. In such a case, it seems legitimate to me for a busy person to request that Quintin sorts out the problems together with the commenters before spending time on it. Even from the perspective of correcting and informing Eliezer, people can more effectively be corrected a... (read more)

11habryka13h
I don't think this essay is commenting on AI optimists in-general. It is commenting on some specific arguments that I have seen around, but I don't really see how it relates to the recent stuff that Quintin, Nora or you have been writing (and I would be reasonably surprised if Eliezer intended it to apply to that). You can also leave it up to the reader to decide whether and when the analogy discussed here applies or not. I could spend a few hours digging up people engaging in reasoning really very closely to what is discussed in this article, though by default I am not going to.
8Zack_M_Davis13h
In general, the "market" for criticism just doesn't seem very efficient at all! You might have hoped that people would mostly agree about what constitutes a flaw, critics would compete to find flaws in order to win status, and authors would learn not to write posts with flaws in them (in order to not lose status to the critics competing to point out flaws). I wonder which part of the criticism market is failing: is it more that people don't agree about what constitutes a flaw, or that authors don't have enough of an incentive to care, or something else? We seem to end up with a lot of critics who specialize in detecting a specific kind of flaw ("needs examples" guy, "reward is not the optimization target" guy, "categories aren't arbitrary" guy, &c.), with very limited reaction from authors or imitation by other potential critics.
4kave12h
My quick guess is that people don't agree about what constitutes a (relevant) flaw. (And there are lots of irrelevant flaws so you can't just check for the existence of any flaws at all). I think if people could agree, the authorial incentives would follow. I'm fairly sympathetic to the idea that readers aren't incentivised to correctly agree on what consitutes a flaw.
This is a linkpost for https://dynomight.net/axes/

Say you want to plot some data. You could just plot it by itself:

Or you could put lines on the left and bottom:

Or you could put lines everywhere:

Or you could be weird:

Which is right? Many people treat this as an aesthetic choice. But I’d like to suggest an unambiguous rule.

Principles

First, try to accept that all axis lines are optional. I promise that readers will recognize a plot even without lines around it.

So consider these plots:

Which is better? I claim this depends on what you’re plotting. To answer, mentally picture these arrows:

Now, ask yourself, are the lengths of these arrows meaningful? When you draw that horizontal line, you invite people to compare those lengths.

You use the same principle for deciding if you should draw a y-axis line. As...

My first impression was also that axis lines are a matter of aesthetics. But then I browsed The Economist's visual styleguide and realized they also do something similar, i.e. omit the y-axis line (in fact, they omit the y-axis line on basically all their line / scatter plots, but almost always maintain the gridlines). 

Here's also an article they ran about their errors in data visualization, albeit probably fairly introductory for the median LW reader.

1Jufda3h
This could be expanded to show good examples of using logarithmic scale.
2Ruby9h
Curated. Beyond the object level arguments for how to do plots here that are pretty interesting, I like this post for the periodic reminder/extra evidence that relatively "minor" details in how information is presented can nudge/bias interpretation and understanding. I think the claims around bordering lines become strongly true if there were established convention, and more weakly so the way currently are. Obviously one ought to be conscious in reading and creating graphs for whether 0 is included.
To get the best posts emailed to you, create an account! (2-3 posts per week, selected by the LessWrong moderation team.)
Log In Reset Password
...or continue with

Churchill famously called democracy “the worst form of Government except for all those other forms that have been tried from time to time” - referring presumably to the relative success of his native Britain, the US, and more generally Western Europe and today most of the first world.

I claim that Churchill was importantly wrong. Not (necessarily) wrong about the relative success of Britain/US/etc, but about those countries’ governments being well-described as simple democracy. Rather, I claim, the formula which has worked well in e.g. Britain and the US diverges from pure democracy in a crucial load-bearing way; that formula works better than pure democracy both in theory and in practice, and when thinking about good governance structures we should emulate the full formula rather than pure democracy.

Specifically, the actual...

I think this is an interesting point of view. The OP is interested in how this concept of checked democracy might work within a corporation. From a position of ignorance can I ask whether anyone familiar with German corporate governance recognises this mode of democracy within German organisations? I choose Germany because large German companies historically incorporate significant worker representation within their governance structures, and, historically, tend to  perform well.

4jmh7h
I'm not entirely sure the thesis quite captured exactly what was going on. It's true balancing the factions was a big deal to the founders and there were number of ways one can cast the USA into some dichotomous buckets -- North/South (which is largely industrial/agrarian) or the Federalist/Anit Federalist and probably some others. But the other point of the separation of powers and the nature of the bicameral struture was about checks and balaces both within the population and within government itself. In that sense I agree one can cast the position as some type of veto for the large minority but it was probably more about just increasing the costs of passing legislation at the federal level.  An interesting compare/contrast here might be looking at the federal level and then looking at the States. The idea probably also needs to be run through the lens of modern political economy (Public Choice/Social Choice) theory as many of the conclusion from that literature is that in general the majority is hardly ever really doing anything -- special interests and narrow factions are in more control. I think it was Knut Wicksell that suggested the idea that Constitutions should have a rule whereby legislation didn't pass with just a simple majority but needed some higher level of approval, e.g., 60%. But he didn't stop there. The Constitution would then allow a smaller number of people repeal the law. So if once implemented and 15% of the legislature were getting ear fulls from their constituents they could force the repeal of the law with a vote and only need to meet that 15% theashold. I don't think that was ever implement and no idea just how seriously it was discussed but clearly is about providing that type of veto power to a minority that might be feeling abused. The other thing to point at was the political and something of a constitutional crises that arose in the early 1830s in the Tarrifs of Abomination. The South hated that and I think it came close to Civil
3quiet_NaN10h
The failure mode of having a lot of veto-holders is that  nothing ever gets done. Which is fine if you are happy with the default state of affairs, but not so fine if you prefer not to run your state on the default budget of zero.  There are some international organizations heavily reliant on veto powers, the EU and the UN Security Council come to mind, and to a lesser degree NATO (as far as the admission of new members in concerned).  None of these are unmitigated success stories. From my understanding, getting stuff done in the EU means bribing or threatening every state who does not particularly benefit from whatever you want to do.  Likewise, getting Turkey to allow Sweden to join NATO was kind of difficult, from what I remember. Not very surprisingly, if you have to get 30 factions to agree on something, one will be likely to object for good or bad reasons.  The UN Security Council with its five veto-bearing permanent members does not even make a pretense at democratic legitimation. The three states with the biggest nuclear arsenals, plus two nuclear countries which used to be colonial powers. The nicest thing one can say about that arrangement is that it failed to start WW III, and in a few cases passed a resolution against some war criminal who did not have the backing of any of the veto powers. I think veto powers as part of a system of checks and balances are good in moderation, but add to many of them and you end up with a stalemate. -- I also do not think that the civil war could have been prevented by stacking the deck even more in favor of the South. Sooner or later the industrial economy in the North would have overtaken the slave economy in the South. At best, the North might have seceded in disgust, resulting in the South on track to become some rural backwater.
3Random Developer9h
Yes, there's actually some research into this area: https://www.jstor.org/stable/j.ctt7rvv7 "Veto Players: How Political Institutions Work". The theory apparently suggested that if you have too many "veto players", your government quickly becomes unable to act. And I suspect that states which are unable to act are vulnerable to major waves of public discontent during perceived crises.

If it’s worth saying, but not worth its own post, here's a place to put it.

If you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are invited. This is also the place to discuss feature requests and other ideas you have for the site, if you don't want to write a full top-level post.

If you're new to the community, you can start reading the Highlights from the Sequences, a collection of posts about the core ideas of LessWrong.

If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the Concepts section.

The Open Thread tag is here. The Open Thread sequence is here.

Post upvotes are at the bottom but user comment upvotes are at the top of each comment. Sometimes I'll read a very long comment and then have to scroll aaaaall the way back up to upvote it. Is there some reason for this that I'm missing or is it just an oversight?

4Nathan Young8h
Yeah and a couple of relevant things: 1. The time EA sexual abuse article includes 1 person who isn't an EA and a sort of vibe that iirc includes most tech houses in the bay in the heading of "EA". This is inaccurate. 2. EA takes a pretty strong stance on sexual harassment. Look at what people are banned from the forum for and scale it up. I've heard about people being banned from events for periods of time for causing serial discomfort. Compare this to Church communities I've been a part of and political communities and this is much stricter. 

I have anxiety and depression.

The kind that doesn’t go away, and you take pills to manage.

This is not a secret.

What’s more interesting is that I just switched medications from one that successfully managed the depression but not the anxiety to one that successfully manages the anxiety but not the depression, giving me a brief window to see my two comorbid conditions separated from each other, for the first time since ever.

What follows is a (brief) digression on what they’re like from the inside.

Depression

I’m still me when I’m depressed.

Just a version of me that’s sapped of all initiative, energy, and tolerance for human contact.

There are plenty of metaphors for depression - a grey fog being one of the most popular - but I often think of it in...

Just me following up with myself wrt what the post made me think about: it’s as if there are two ways of being anxious, one where you feel sort of frazzled and hectic all the time (‘I need to do more of that stuff, and do it better, or something bad will happen’), and one where you just retreat to safety (‘There’s nothing I can do that wouldn’t come with an exceedingly high risk of something bad happening’). It’s quite clear that the former could lead someone to being an overachiever and doing masses of great stuff (while still, unfortunately, feeling like... (read more)

1Jalen Lyle-Holmes8h
These are really good descriptions! (Going by my own and friends' experience). For me I might just tweak it to put anxiety as the height rather than the gravity. Thank you for writing these up!