Building gears-level models is expensive - often prohibitively expensive. Black-box approaches are usually cheaper and faster. But black-box approaches rarely generalize - they need to be rebuilt when conditions change, don’t identify unknown unknowns, and are hard to build on top of. Gears-level models, on the other hand, offer permanent, generalizable knowledge which can be applied to many problems in the future, even if conditions shift.

gwern:
Warning for anyone who has ever interacted with "robosucka" or been solicited for a new podcast series in the past few years: https://www.tumblr.com/rationalists-out-of-context/744970106867744768/heads-up-to-anyone-whos-spoken-to-this-person-i
The idea that maximally-coherent agents look like squiggle-maximizers raises the question: what would it look like for humans to become maximally coherent?

One answer, which Yudkowsky gives here, is that conscious experiences are just a "weird and more abstract and complicated pattern that matter can be squiggled into". But that seems to be in tension with another claim he makes: that there's no way for one agent's conscious experiences to become "more real" except at the expense of other conscious agents—a claim which, according to him, motivates average utilitarianism across the multiverse. Clearly a squiggle-maximizer would not be an average squigglean. So what's the disanalogy here?

It seems like @Eliezer Yudkowsky is basically using SSA, but comparing between possible multiverses—i.e. when facing the choice between creating agent A or not, you look at the set of As in the multiverse where you decided yes, compare it to the set of As in the multiverse where you decided no, and (if you're deciding for the good of A) pick whichever one gives A a better time on average. Yudkowsky has written before (I can't find the link) that he takes this approach because the alternatives would entail giving up on predictions about his future experiences—e.g. constantly predicting he's a Boltzmann brain and will dissolve in the next second. But this argument by Wei Dai shows that agents which reason in this way can be money-pumped by creating arbitrarily short-lived copies of them. Based on this I claim that Yudkowsky's preferences are incoherent, and that the only coherent thing to do here is to "expect to be" a given copy in proportion to the resources it will have available, as anthropic decision theory claims. (Incidentally, this also explains why we're at the hinge of history.)

But this is just an answer; it doesn't dissolve the problem. What could? Some wild guesses:

1. You are allowed to have preferences about the external world, and you are allowed to have preferences about your "thread of experience"—you're just not allowed to have both. The incoherence comes from trying to combine the two; the coherent thing to do would be to put them into different agents, who will then end up in very different parts of the multiverse.
2. Another way of framing this: you are allowed to be a decision-maker, and you are allowed to be a repository of welfare, but you're not allowed to be both (on pain of incoherence/being dutch-booked).
3. Something totally different: the problem here is that we don't have intuitive experience of being agents which can copy themselves, shut down copies, re-merge, etc. If we did, then maybe SSA would seem as silly as expecting to end up in a different universe whenever we went to sleep.
4. Actually, maybe the operative thing we lack experience with is not just splitting into different subagents, but rather merging together afterwards. What does it feel like to have been thousands of different parallel agents, and now be a single agent with their unified experiences? What sort of identity would one construct in that situation? Maybe this is an important part of dissolving the problem.
Today in Azathoth news: "Eurasian hoopoes raise extra chicks so they can be eaten by their siblings"

It seems that the hoopoes lay extra eggs in times of abundance — more than they would be able to see through to fledging — as a way of storing up food for the older siblings. It is rather gruesomely called the "larder" hypothesis.

> "What surprised me the most was the species practicing this aggressive parenting," says Vladimir Pravosudov, an ecologist at the University of Nevada, Reno. Hoopoes primarily eat insects, he notes, so their long, curved bills aren't ideal for killing and eating chicks. That might be why, Soler says, mother hoopoes often grab the unlucky chick and shove it into the mouth of an older chick, which swallows it whole.

Literal baby-eaters!
Note: I consider this post to be "Lynette speculates based on one possible model", rather than "scientific evidence shows", based on my default skepticism for psych research.

A recent Astral Codex Ten post argued that advice is written by people who struggle, because they put tons of time into understanding the issue; people who succeeded effortlessly don't have explicit models of how they perform (section II). It's not the first time I've seen this argument, e.g. this Putanumonit post arguing that explicit rules help poor performers, who then abandon the rules and just act intuitively once they become good.

This reminded me of a body of psych research I half-remembered from college called Choking under Pressure. My memory was that if you think about what you're doing too much after becoming good, then you do worse. The paper I remembered from college was from 1986, so I found "Choking interventions in sports: A systematic review" from 2017. It turns out that I was remembering the "self-focused" branch of choking research:

> Self-focus approaches have largely been extended from Baumeister's (1984) automatic execution hypothesis. Baumeister explains that choking occurs because, when anxiety increases, the athlete allocates conscious attention to movement execution. This conscious attention interferes with otherwise automatic nature of movement execution, which results in performance decrements.

(Slightly worrying. I have no particular reason to doubt this body of work, but Baumeister's "willpower as muscle" -- i.e. ego depletion -- work hasn't stood up well.)

Two studies found that distraction while training negatively impacted performance. I'm not sure if this was supposed to acclimatize the participants to distractions while performing or to reduce their self-focus while training. (I'm taking the paper's word and not digging beyond the surface on the numbers.) Either way, I feel very little surprise that practicing while distracted was worse. Maybe we just need fatal-car-crash-magnitude effects before we notice that focus is good?

Which makes it all the more surprising that seven of eight studies found that athletes performed better under pressure if they simultaneously did a second task (such as counting backwards). (The eighth study found a null result.) According to the theory, the second task helped because it distracted from self-focus on the step-by-step execution.

If this theory holds up, it seems to support paying deliberate attention to explicit rules while learning but *not* paying attention to those rules once you're able to use them intuitively (at least for motor tasks). In other words, almost exactly what Jacob argued in the Putanumonit article.

Conclusions

I was intrigued by this argument because I've argued that building models is how one becomes an expert.[1] After considering it, I don't actually think the posts above offer a counterargument to my claim.

My guess is that experts do have models of the skills they developed, even if they have fewer models (because they needed to explicitly learn fewer skills). The NDM method for extracting experts' models implies that the experts have models that can be coaxed out. Holden's Learning By Writing post feels like an explicit model.

Another possibility is that experts forget the explicit models after switching to intuition. If they faced the challenges more than five or ten years ago, they may not remember the models that helped them then.
Probably uncoincidentally, this aligns neatly with Cal Newport's advice to seek advice from someone who recently went through the challenges you're now facing, because they will still remember the relevant advice.

Additionally, the areas of expertise I care about aren't like walking, where most people will effortlessly succeed. Expertise demands improving from where you started. Both posts and the choking-under-pressure literature agree that explicit models help you improve, at least for a while.

"Find the best explicit models you can and practice until you don't need them" seems like a reasonable takeaway.

----------------------------------------

[1] Note: there's an important distinction between building models of your field and building models of skills. It seems like the main argument mostly applies to models of skills. I doubt Scott would disagree that models of fields are valuable, given how much time he's put into developing his model of psychopharmacology.
Reposting myself from Discord, on the topic of donating $5,000 to EA causes:

> if you're doing alignment research, even just a bit, then the $5,000 is probably better spent on yourself
>
> if you have any gears-level model of AI stuff then it's better value to pick which alignment org to give to yourself; charity orgs are vastly understaffed and you're essentially contributing to the "picking what to donate to" effort by thinking about it yourself
>
> if you have no gears-level model of AI then it's hard to judge which alignment orgs it's helpful to donate to (or, if giving to regranters, which regranters are good at knowing which alignment orgs to donate to)
>
> as an example of regranters doing massive harm: openphil gave $30M to openai at a time when it was critically useful to them (supposedly in order to have a seat on their board, and look how that turned out when the board tried to yeet altman)
>
> i know of at least one person who was working in regranting and was like "you know what, i'd be better off doing alignment research directly" — imo this kind of decision is probably why regranting is so understaffed
>
> it takes technical knowledge to know what should get money, and once you have technical knowledge you realize how much your technical knowledge could help more directly, so you do that, or something

Popular Comments

Recent Discussion

If it’s worth saying, but not worth its own post, here's a place to put it.

If you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are invited. This is also the place to discuss feature requests and other ideas you have for the site, if you don't want to write a full top-level post.

If you're new to the community, you can start reading the Highlights from the Sequences, a collection of posts about the core ideas of LessWrong.

If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the Concepts section.

The Open Thread tag is here. The Open Thread sequence is here.

Yeah and a couple of relevant things:

  1. The TIME article on sexual abuse in EA includes one person who isn't an EA and, iirc, a sort of vibe that puts most tech houses in the Bay under the heading of "EA". This is inaccurate.
  2. EA takes a pretty strong stance on sexual harassment. Look at what people are banned from the forum for and scale it up. I've heard about people being banned from events for periods of time for causing serial discomfort. Compare this to church communities and political communities I've been a part of, and this is much stricter.

An LLM is a simulator for token-generation processes, generally ones that are human-like agents. You can fine-tune or RLHF it to preferentially create some sorts of agents (to generate a different distribution of agents than was in its pretraining data), such as more of the ones that won't commit undesired/unaligned behaviors, but it's very hard (a paper claims impossible) to stop it from ever creating them at all in response to some sufficiently long, detailed prompt.

Suppose we didn't really try. Let us assume that we mildly fine-tune/RLHF the LLM to normally prefer simulating helpful agents who helpfully, honestly, and harmlessly answer questions, but we acknowledge that there are still prompts/other text inputs/conversations that may cause it to instead start generating tokens from, say, a supervillain (like the prompt...

One issue is figuring out who will watch the supervillain light. If we need someone monitoring everything the AI does, that puts some serious limits on what we can do with it (we can't use the AI for anything that we want to be cheaper than a human, or anything that requires superhuman response speed).

Lex Fridman posts timestamped transcripts of his interviews. It's an 83-minute read here and a 115-minute watch on YouTube.

It's neat to see Altman's side of the story. I don't know whether his charisma is more like +2SD or +5SD above the average American (concept origin: planecrash; it likely doesn't actually follow a normal distribution in reality), and I only have a vague grasp of what kinds of things +5SD-ish types can do when they pull out all the stops in face-to-face interactions, so maybe you'll prefer to read the transcript over watching the video.

If you've missed it, Gwern's side of the story is here.

Lex Fridman (00:01:05) Take me through the OpenAI board saga that started on Thursday, November 16th, maybe Friday, November 17th for you.

Sam Altman (00:01:13) That was

...

This post was produced as part of the Astra Fellowship under the Winter 2024 Cohort, mentored by Richard Ngo. Thanks to Martín Soto, Jeremy Gillien, Daniel Kokotajlo, and Lukas Berglund for feedback.

Summary

Discussions around the likelihood and threat models of AI existential risk (x-risk) often hinge on some informal concept of a “coherent”, goal-directed AGI in the future maximizing some utility function unaligned with human values. Whether and how coherence may develop in future AI systems, especially in the era of LLMs, has been a subject of considerable debate. In this post, we provide a preliminary mathematical definition of the coherence of a policy as how likely it is to have been sampled via uniform reward sampling (URS), or uniformly sampling a reward function and then sampling from the set...
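As a toy illustration of how such a coherence score might be estimated numerically: the sketch below assumes the truncated sentence continues roughly as "the set of policies that are optimal for that reward function" (my guess, not the authors' definition), and uses plain value iteration on a small tabular MDP. It is not the post's actual method, just a Monte Carlo reading of "uniform reward sampling".

```python
import numpy as np

def optimal_policy(P, R, gamma=0.9, iters=200):
    """Greedy policy from value iteration on a tabular MDP.
    P: (S, A, S) transition tensor, R: (S,) state rewards."""
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = R[:, None] + gamma * np.einsum("sap,p->sa", P, V)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)              # deterministic policy: state -> action

def urs_coherence(policy, P, n_samples=2000, seed=0):
    """Monte Carlo estimate of how often `policy` is exactly optimal for a
    reward function drawn uniformly at random -- one toy operationalization
    of 'likelihood of being sampled via uniform reward sampling (URS)'."""
    rng = np.random.default_rng(seed)
    S = P.shape[0]
    hits = 0
    for _ in range(n_samples):
        R = rng.uniform(-1.0, 1.0, size=S)   # uniformly sampled state rewards
        hits += np.array_equal(optimal_policy(P, R), policy)
    return hits / n_samples

# Toy 3-state, 2-action MDP with random transitions.
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(3), size=(3, 2))   # P[s, a] is a distribution over next states
print(urs_coherence(np.array([0, 0, 1]), P))
```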

Garrett Baker:
I have thought some about how to measure the "coherence" of a policy in an MDP. One nice approach I came to was summing up the absolute values of the real parts of the eigenvalues of the corresponding MDP matrix with & without the policy present. The lower this is, the more coherent a policy. It seemed to work well for my purposes, but I haven't subjected it to much strain yet.
Garrett Baker:
The eigenvalues are a measure of how quickly the MDP reaches a steady state. This works when you know the goals of your network are to achieve a particular state in the MDP as fast as possible, and stay there. Edit: I think this also works if your model has the goal to achieve a particular distribution of states too.
dx26:
Right, I think this somewhat corresponds to the "how long it takes a policy to reach a stable loop" (the "distance to loop" metric), which we used in our experiments. What did you use your coherence definition for?

It's a long story, but I wanted to see what the functional landscape of coherence looked like for goal-misgeneralizing RL environments after doing essential dynamics. Results forthcoming.
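A minimal numpy sketch of one reading of the eigenvalue-based measure described above; the use of the policy-induced state-transition matrix, and of a uniform-random policy as the "without the policy" baseline, are my assumptions rather than Garrett's actual construction.

```python
import numpy as np

def induced_transition_matrix(P, pi):
    """State-transition matrix under a policy.
    P: (S, A, S) transition tensor, pi: (S, A) stochastic policy."""
    return np.einsum("sa,sap->sp", pi, P)

def eig_coherence(P, pi):
    """Sum of |Re(eigenvalue)| of the policy-induced transition matrix;
    lower values mean the chain settles into its target states faster."""
    eigs = np.linalg.eigvals(induced_transition_matrix(P, pi))
    return np.abs(eigs.real).sum()

# Toy 3-state, 2-action MDP.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(3, 2))   # P[s, a] is a distribution over next states

goal_directed = np.array([[1.0, 0.0]] * 3)   # always take action 0
uniform_baseline = np.full((3, 2), 0.5)      # "without the policy": act at random

print(eig_coherence(P, goal_directed), eig_coherence(P, uniform_baseline))
```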

I'm looking for computer games that involve strategy, resource management, hidden information, and management of "value of information" (i.e. figuring out when to explore or exploit), which:

  • *can* be beaten in 30 – 120 minutes on your first try (or, there's a clear milestone that's about that long)
  • but it'd be pretty hard to do so unless you are trying really hard; even a pretty savvy gamer shouldn't be able to do it by default.

This is for my broader project of "have a battery of exercises that train/test people's general reasoning on openended problems." Each exercise should ideally be pretty different from the other ones.

In this case, I don't expect anyone to have a game that they actually beat on their first try, but I'm looking for games where this seems at least plausible if you were taking a long time to think each turn, or pausing a lot.

The strategy/resource/value-of-information aspect is meant to correspond to some real-world difficulties of long-term ambitious planning.


(One example game that's been given to me in this category is "Luck Be a Landlord")

seed:
Long Live the Queen takes about 4 hours. It would take some luck to beat it on the first try, but generally you win by using common sense and training useful skills.
Andrew Currall:
I think I disagree with this comment. StS really does have hidden information and tradeoffs, as you don't know what you will encounter later in the run. Very often the value of a card depends on what cards you are offered later, or even which bosses you face.
Dweomite:
Unless I'm mistaken, StS does not have any game actions the player can take to learn information about future encounters or rewards in advance.  Future encounters are well-modeled as simple random events, rather than lurking variables (unless we're talking about reverse-engineering the PRNG, which I'm assuming is out-of-scope). It therefore does not demonstrate the concept of value-of-information.  The player can make bets, but cannot "scout". (Though there might be actions a first-time player can take to help pin down the rules of the game, that an experienced player would already know; I'm unclear on whether that counts for purposes of this exercise.)

> (Though there might be actions a first-time player can take to help pin down the rules of the game, that an experienced player would already know; I'm unclear on whether that counts for purposes of this exercise.)

I think one thing I meant in the OP was more about "the player can choose to spend more time modeling the situation." Is it worth spending an extra 15 minutes thinking about how the longterm game might play out, and what concerns you may run into that you aren't currently modeling? I dunno! Depends on how much better you become at playing the game, by spending those 15 minutes.

This is maybe a nonstandard use of "value of information", but I think it counts.
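For concreteness, the textbook value-of-information calculation that this informal usage gestures at looks like the sketch below; the payoffs and probabilities are invented purely for illustration.

```python
# Two strategies with uncertain payoffs; you can "spend 15 minutes" to learn
# which world you're in before committing. All numbers are made up.
p_world_a = 0.5
payoff = {"aggressive": {"A": 10, "B": 2}, "defensive": {"A": 5, "B": 6}}

# Best you can do without the information: commit to one strategy now.
ev_without = max(
    p_world_a * payoff[s]["A"] + (1 - p_world_a) * payoff[s]["B"]
    for s in payoff
)

# With the information, you pick the best strategy in each world.
ev_with = (p_world_a * max(payoff[s]["A"] for s in payoff)
           + (1 - p_world_a) * max(payoff[s]["B"] for s in payoff))

value_of_information = ev_with - ev_without
print(ev_without, ev_with, value_of_information)  # 6.0 8.0 2.0
```

Spending the 15 minutes is worth it exactly when this value of information exceeds what the time would have bought you elsewhere.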

This is a linkpost for https://dynomight.net/axes/

Say you want to plot some data. You could just plot it by itself:

Or you could put lines on the left and bottom:

Or you could put lines everywhere:

Or you could be weird:

Which is right? Many people treat this as an aesthetic choice. But I’d like to suggest an unambiguous rule.

Principles

First, try to accept that all axis lines are optional. I promise that readers will recognize a plot even without lines around it.

So consider these plots:

Which is better? I claim this depends on what you’re plotting. To answer, mentally picture these arrows:

Now, ask yourself, are the lengths of these arrows meaningful? When you draw that horizontal line, you invite people to compare those lengths.

You use the same principle for deciding if you should draw a y-axis line. As...
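As a practical aside (mine, not the linked post's): in matplotlib, axis lines are the "spines" of an Axes object, and dropping the optional ones is one call per side. A minimal sketch:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)
y = np.random.default_rng(0).normal(size=50).cumsum()

fig, ax = plt.subplots()
ax.plot(x, y)

# Hide the top and right spines; keep the left and bottom ones.
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

# Or hide all four, leaving only the data and tick labels:
# for side in ("top", "right", "left", "bottom"):
#     ax.spines[side].set_visible(False)

plt.show()
```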

Curated. Beyond the object-level arguments here about how to do plots, which are pretty interesting, I like this post as a periodic reminder (and extra evidence) that relatively "minor" details in how information is presented can nudge or bias interpretation and understanding.

I think the claims around border lines would become strongly true if there were an established convention, and hold more weakly the way things currently are. Obviously one ought to be conscious, when reading and creating graphs, of whether 0 is included.

This is a linkpost for https://x.ai/blog/grok-os

We are releasing the base model weights and network architecture of Grok-1, our large language model. Grok-1 is a 314 billion parameter Mixture-of-Experts model trained from scratch by xAI.

This is the raw base model checkpoint from the Grok-1 pre-training phase, which concluded in October 2023. This means that the model is not fine-tuned for any specific application, such as dialogue.

We are releasing the weights and the architecture under the Apache 2.0 license.

To get started with using the model, follow the instructions at github.com/xai-org/grok.

Model Details

  • Base model trained on a large amount of text data, not fine-tuned for any particular task.
  • 314B parameter Mixture-of-Experts model with 25% of the weights active on a given token.
  • Trained from scratch by xAI using a custom training stack on top of JAX and Rust
...
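To unpack what "a Mixture-of-Experts model with about 25% of the weights active on a given token" means in practice, here is a generic top-k routing sketch in plain numpy. It is my own toy illustration, not Grok-1's code; the 2-of-8 expert choice is just one configuration roughly consistent with the 25% figure when most parameters sit in the expert MLPs.

```python
import numpy as np

def moe_layer(x, router_w, expert_w1, expert_w2, k=2):
    """Toy top-k MoE feed-forward layer for a single token.
    x: (d,) token activation; router_w: (d, E);
    expert_w1: (E, d, d_ff); expert_w2: (E, d_ff, d)."""
    scores = x @ router_w                      # one routing logit per expert
    top = np.argsort(scores)[-k:]              # pick the k highest-scoring experts
    gates = np.exp(scores[top] - scores[top].max())
    gates /= gates.sum()                       # softmax over the chosen experts only
    out = np.zeros_like(x)
    for g, e in zip(gates, top):
        h = np.maximum(x @ expert_w1[e], 0.0)  # expert MLP (ReLU)
        out += g * (h @ expert_w2[e])
    return out

d, d_ff, E = 16, 64, 8
rng = np.random.default_rng(0)
y = moe_layer(rng.normal(size=d),
              rng.normal(size=(d, E)),
              rng.normal(size=(E, d, d_ff)),
              rng.normal(size=(E, d_ff, d)))
print(y.shape)  # (16,): only 2 of the 8 expert MLPs were evaluated for this token
```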

This way it's probably smarter given its compute, and a more instructive exercise before scaling further, than a smaller model would have been. That makes sense if the aim is to out-scale others more quickly instead of competing at smaller scale, and if this model wasn't meant to last.

Churchill famously called democracy “the worst form of Government except for all those other forms that have been tried from time to time” - referring presumably to the relative success of his native Britain, the US, and more generally Western Europe and today most of the first world.

I claim that Churchill was importantly wrong. Not (necessarily) wrong about the relative success of Britain/US/etc, but about those countries’ governments being well-described as simple democracy. Rather, I claim, the formula which has worked well in e.g. Britain and the US diverges from pure democracy in a crucial load-bearing way; that formula works better than pure democracy both in theory and in practice, and when thinking about good governance structures we should emulate the full formula rather than pure democracy.

Specifically, the actual...

quiet_NaN:
The failure mode of having a lot of veto-holders is that nothing ever gets done. Which is fine if you are happy with the default state of affairs, but not so fine if you prefer not to run your state on the default budget of zero.

There are some international organizations heavily reliant on veto powers; the EU and the UN Security Council come to mind, and to a lesser degree NATO (as far as the admission of new members is concerned). None of these are unmitigated success stories. From my understanding, getting stuff done in the EU means bribing or threatening every state that does not particularly benefit from whatever you want to do. Likewise, getting Turkey to allow Sweden to join NATO was kind of difficult, from what I remember. Not very surprisingly, if you have to get 30 factions to agree on something, one is likely to object for good or bad reasons.

The UN Security Council with its five veto-bearing permanent members does not even make a pretense at democratic legitimation: the three states with the biggest nuclear arsenals, plus two nuclear countries which used to be colonial powers. The nicest thing one can say about that arrangement is that it failed to start WW III, and in a few cases passed a resolution against some war criminal who did not have the backing of any of the veto powers.

I think veto powers as part of a system of checks and balances are good in moderation, but add too many of them and you end up with a stalemate.

I also do not think that the civil war could have been prevented by stacking the deck even more in favor of the South. Sooner or later the industrial economy in the North would have overtaken the slave economy in the South. At best, the North might have seceded in disgust, leaving the South on track to become some rural backwater.

> I think veto powers as part of a system of checks and balances are good in moderation, but add too many of them and you end up with a stalemate.

Yes, there's actually some research into this area: https://www.jstor.org/stable/j.ctt7rvv7 "Veto Players: How Political Institutions Work". The theory apparently suggested that if you have too many "veto players", your government quickly becomes unable to act.

And I suspect that states which are unable to act are vulnerable to major waves of public discontent during perceived crises.

Michael Roe:
The Northern Ireland Assembly works this way, at least for some things. But, in general, the U.K. does not work that way. A particular political party sometimes gets a big majority.
22tom:
I think this is similar to the governance arrangement in Northern Ireland that ended the Troubles (for the most part). Both sides need to share power in order to govern. If one side is perceived to go too far, then the other can resign from government, effectively vetoing the other.

Introducing Devin

Is the era of AI agents writing complex code systems without humans in the loop upon us?

Cognition is calling Devin ‘the first AI software engineer.’

Here is a two minute demo of Devin benchmarking LLM performance.

Devin has its own web browser, which it uses to pull up documentation.

Devin has its own code editor.

Devin has its own command line.

Devin uses debugging print statements and uses the log to fix bugs.

Devin builds and deploys entire stylized websites without even being directly asked.

What could possibly go wrong? Install this on your computer today.

Padme.

The Real Deal

I would by default assume all demos were supremely cherry-picked. My only disagreement with Austen Allred’s statement here is that this rule is not new:

Austen Allred: New rule:

If someone only shows their AI model in tightly

...
MichaelDickens:
I've heard it argued that this isn't representative of the programming population. Rather, people who suck at programming (and thus can't get jobs) apply to way more positions than people who are good at programming. I have no idea if it's true, but it sounds plausible.

> Rather, people who suck at programming (and thus can't get jobs) apply to way more positions than people who are good at programming.

 

I have interviewed a fair number of programmers, and I've definitely seen plenty of people who talked a good game but who couldn't write FizzBuzz (or sum the numbers in an array). And this was stacking the deck in their favor: They could use a programming language of their choice, plus a real editor, and if they appeared unable to deal with coding in front of people, I'd go sit on the other side of the office and let th... (read more)
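For reference, the FizzBuzz exercise mentioned above is roughly the following; this is the standard formulation, not necessarily the exact prompt used in those interviews.

```python
# FizzBuzz: print 1..100, replacing multiples of 3 with "Fizz",
# multiples of 5 with "Buzz", and multiples of both with "FizzBuzz".
for i in range(1, 101):
    if i % 15 == 0:
        print("FizzBuzz")
    elif i % 3 == 0:
        print("Fizz")
    elif i % 5 == 0:
        print("Buzz")
    else:
        print(i)

# "Sum the numbers in an array" is even shorter:
def array_sum(xs):
    total = 0
    for x in xs:
        total += x
    return total  # equivalent to the built-in sum(xs)
```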

Ozyrus:
Any new safety studies on LMCAs?
Seth Herd:
Very little alignment work of note, despite tons of published work on developing agents. I'm puzzled as to why the alignment community hasn't turned more of their attention toward language model cognitive architectures/agents, but I'm also reluctant to publish more work advertising how easily they might achieve AGI. ARC Evals did set up a methodology for Evaluating Language-Model Agents on Realistic Autonomous Tasks. I view this as a useful acknowledgment of the real danger of better LLMs, but I think it's inherently inadequate, because it's based on the evals team doing the scaffolding to make the LLM into an agent. They're not going to be able to devote nearly as much time to that as other groups will down the road. New capabilities are certainly going to be developed by combinations of LLM improvements, and hard work at improving the cognitive architecture scaffolding around them.