Building gears-level models is expensive - often prohibitively expensive. Black-box approaches are usually cheaper and faster. But black-box approaches rarely generalize - they need to be rebuilt when conditions change, don’t identify unknown unknowns, and are hard to build on top of. Gears-level models, on the other hand, offer permanent, generalizable knowledge which can be applied to many problems in the future, even if conditions shift.
If it’s worth saying, but not worth its own post, here's a place to put it.
If you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are invited. This is also the place to discuss feature requests and other ideas you have for the site, if you don't want to write a full top-level post.
If you're new to the community, you can start reading the Highlights from the Sequences, a collection of posts about the core ideas of LessWrong.
If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the Concepts section.
The Open Thread tag is here. The Open Thread sequence is here.
Yeah, and a couple of relevant things:
An LLM is a simulator for token-generation processes, generally ones that resemble human-like agents. You can fine-tune or RLHF it to preferentially create some sorts of agents (i.e. to generate a different distribution of agents than was in its pretraining data), such as agents that won't engage in undesired/unaligned behaviors, but it's very hard (a paper claims impossible) to stop it from ever creating them at all in response to some sufficiently long, detailed prompt.
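As a toy numerical illustration of that last claim (my own sketch, not from the paper): treat the simulator as holding a distribution over personas, with fine-tuning shifting that distribution and a prompt acting as evidence that re-weights it.

```python
import math

# Toy model (my own illustration): the simulator holds a distribution over
# personas; fine-tuning shifts logits toward "helpful", but a softmax never
# assigns exactly zero probability to "villain".
def softmax(logits):
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}

pretrained = softmax({"helpful": 2.0, "neutral": 1.0, "villain": 0.0})
finetuned = softmax({"helpful": 6.0, "neutral": 1.0, "villain": -4.0})

# A long, detailed jailbreak prompt is far likelier under the villain persona
# than the helpful one; conditioning re-amplifies the suppressed tail
# (posterior ∝ prior × likelihood).
prompt_likelihood = {"helpful": 1e-6, "neutral": 1e-4, "villain": 0.9}
unnorm = {k: finetuned[k] * prompt_likelihood[k] for k in finetuned}
z = sum(unnorm.values())
posterior = {k: v / z for k, v in unnorm.items()}

print(f"villain prob pretrained:        {pretrained['villain']:.2f}")   # ~0.09
print(f"villain prob after fine-tuning: {finetuned['villain']:.2e}")    # tiny, not zero
print(f"villain prob given the prompt:  {posterior['villain']:.2f}")    # dominant again
```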
Suppose we didn't really try. Assume we mildly fine-tune/RLHF the LLM so that it normally prefers simulating agents who answer questions helpfully, honestly, and harmlessly, but we acknowledge that there are still prompts/other text inputs/conversations that may cause it to instead start generating tokens from, say, a supervillain (like the prompt...
One issue is figuring out who will watch the supervillain light. If we need someone monitoring everything the AI does, that puts some serious limits on what we can do with it (we can't use the AI for anything that we want to be cheaper than a human, or anything that requires superhuman response speed).
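For concreteness, here's a minimal sketch of what an automated version of that monitoring might look like (the function names are hypothetical; a real monitor would itself be a model with its own failure modes, and the human escalation path reintroduces exactly the cost/latency limits above):

```python
# Hypothetical automated "supervillain light": a cheap classifier gates every
# model action. Note this just moves the problem one level up: the monitor can
# be fooled, and escalating to a human costs human time and reaction speed.
def monitored_generate(model, monitor, prompt, threshold=0.01):
    response = model.generate(prompt)          # hypothetical model API
    risk = monitor.score(prompt, response)     # estimated P(misaligned persona)
    if risk > threshold:
        raise RuntimeError("supervillain light on: escalate to a human")
    return response
```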
Lex Fridman posts timestamped transcripts of his interviews. It's an 83-minute read here and a 115-minute watch on YouTube.
It's neat to see Altman's side of the story. I don't know whether his charisma is more like +2SD or +5SD above the average American (concept origin: planecrash; charisma likely doesn't actually follow a normal distribution in reality), and I only have a vague grasp of what kinds of things +5SD-ish types can do when they pull out all the stops in face-to-face interactions, so maybe you'll prefer to read the transcript over watching the video.
If you've missed it, Gwern's side of the story is here.
...Lex Fridman (00:01:05) Take me through the OpenAI board saga that started on Thursday, November 16th, maybe Friday, November 17th for you.
Sam Altman (00:01:13) That was
This post was produced as part of the Astra Fellowship under the Winter 2024 Cohort, mentored by Richard Ngo. Thanks to Martín Soto, Jeremy Gillien, Daniel Kokotajlo, and Lukas Berglund for feedback.
Discussions around the likelihood and threat models of AI existential risk (x-risk) often hinge on some informal concept of a “coherent”, goal-directed AGI in the future maximizing some utility function unaligned with human values. Whether and how coherence may develop in future AI systems, especially in the era of LLMs, has been a subject of considerable debate. In this post, we provide a preliminary mathematical definition of the coherence of a policy as how likely it is to have been sampled via uniform reward sampling (URS), i.e., uniformly sampling a reward function and then sampling from the set...
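The post's full formalism is cut off above, so the details here are my guesses, but a toy version of uniform reward sampling might look like this:

```python
import random

# Toy setting: a k-armed bandit. A "reward function" is a reward per arm; a
# deterministic policy just picks one arm. Uniform reward sampling (URS):
# draw a reward function uniformly, then take (one of) its optimal policies.
def urs_sample(k):
    rewards = [random.random() for _ in range(k)]
    return max(range(k), key=lambda a: rewards[a])  # optimal arm = the policy

def coherence(policy, k, n_samples=100_000):
    """Estimated probability that URS produces this policy."""
    hits = sum(urs_sample(k) == policy for _ in range(n_samples))
    return hits / n_samples

# By symmetry every arm has coherence ~1/k here; the interesting cases the
# post studies presumably involve structured state spaces where some policies
# are far more likely to be sampled than others.
print(coherence(policy=0, k=5))  # ≈ 0.2
```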
It's a long story, but I wanted to see what the functional landscape of coherence looked like for goal-misgeneralizing RL environments after doing essential dynamics. Results forthcoming.
I'm looking for computer games that involve strategy, resource management, hidden information, and management of "value of information" (i.e. figuring out when to explore or exploit), which:
This is for my broader project of "have a battery of exercises that train/test people's general reasoning on open-ended problems." Each exercise should ideally be pretty different from the other ones.
In this case, I don't expect anyone to have such a game that they beat on their first try, but I'm looking for games where this seems at least plausible if you take a long time to think each turn, or pause a lot.
The strategy/resource/value-of-information aspect is meant to correspond to some real-world difficulties of long-term ambitious planning.
(One example game that's been given to me in this category is "Luck Be a Landlord")
(Though there might be actions a first-time player can take to help pin down the rules of the game that an experienced player would already know; I'm unclear on whether that counts for purposes of this exercise.)
I think one thing I meant in the OP was more about "the player can choose to spend more time modeling the situation." Is it worth spending an extra 15 minutes thinking about how the long-term game might play out, and what concerns you may run into that you aren't currently modeling? I dunno! It depends on how much better you get at playing the game by spending those 15 minutes.
This is maybe a nonstandard use of "value of information", but I think it counts.
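With made-up numbers, the calculation I have in mind looks like:

```python
# Made-up numbers for the "is 15 minutes of modeling worth it?" question.
p_insight = 0.3    # chance the extra thinking actually changes your plan
win_gain = 0.2     # how much a changed plan improves P(win)
voi = p_insight * win_gain   # expected gain in P(win) = 0.06
cost = 15 / 60               # hours spent
print(f"gain in P(win) per hour: {voi / cost:.2f}")  # 0.24 here
# Worth it iff this beats whatever else you'd do with the time.
```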
Say you want to plot some data. You could just plot it by itself:
Or you could put lines on the left and bottom:
Or you could put lines everywhere:
Or you could be weird:
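(The post doesn't say which plotting tool it uses; for anyone following along in matplotlib, where axis lines are "spines" you can toggle individually, here's a minimal sketch of the first three variants:)

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3)
data = [1, 4, 2, 8, 5]

for ax, style in zip(axes, ["none", "left-bottom", "box"]):
    ax.plot(data)
    ax.set_title(style)
    if style == "none":
        for side in ["top", "right", "bottom", "left"]:
            ax.spines[side].set_visible(False)   # no axis lines at all
    elif style == "left-bottom":
        ax.spines["top"].set_visible(False)
        ax.spines["right"].set_visible(False)    # classic L-shaped axes
    # "box" is matplotlib's default: lines on all four sides

plt.show()
```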
Which is right? Many people treat this as an aesthetic choice. But I’d like to suggest an unambiguous rule.
First, try to accept that all axis lines are optional. I promise that readers will recognize a plot even without lines around it.
So consider these plots:
Which is better? I claim this depends on what you’re plotting. To answer, mentally picture these arrows:
Now, ask yourself, are the lengths of these arrows meaningful? When you draw that horizontal line, you invite people to compare those lengths.
You use the same principle for deciding if you should draw a y-axis line. As...
Curated. Beyond the object-level arguments here for how to do plots, which are pretty interesting, I like this post for the periodic reminder/extra evidence that relatively "minor" details in how information is presented can nudge/bias interpretation and understanding.
I think the claims around border lines would become strongly true if there were an established convention, and hold more weakly the way things currently are. Obviously one ought to be conscious, when reading and creating graphs, of whether 0 is included.
...We are releasing the base model weights and network architecture of Grok-1, our large language model. Grok-1 is a 314 billion parameter Mixture-of-Experts model trained from scratch by xAI.
This is the raw base model checkpoint from the Grok-1 pre-training phase, which concluded in October 2023. This means that the model is not fine-tuned for any specific application, such as dialogue.
We are releasing the weights and the architecture under the Apache 2.0 license.
To get started with using the model, follow the instructions at github.com/xai-org/grok.
Model Details
- Base model trained on a large amount of text data, not fine-tuned for any particular task.
- 314B parameter Mixture-of-Experts model with 25% of the weights active on a given token.
- Trained from scratch by xAI using a custom training stack on top of JAX and Rust.
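A quick back-of-the-envelope on what that sparsity means (my arithmetic, not from the announcement):

```python
total_params = 314e9
active_frac = 0.25   # "25% of the weights active on a given token"
active_params = total_params * active_frac
print(f"{active_params / 1e9:.1f}B active params per token")  # ~78.5B
# So per-token compute is closer to a ~78B dense model's, while total
# capacity is 314B. (This ignores routing overhead and any shared layers.)
```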
This way it's probably smarter given its compute, and a more instructive exercise before scaling further, than a smaller model would've been. That makes sense if the aim is to out-scale others more quickly rather than competing at smaller scale, and if this model wasn't meant to last.
Churchill famously called democracy “the worst form of Government except for all those other forms that have been tried from time to time” - referring presumably to the relative success of his native Britain, the US, and more generally Western Europe and today most of the first world.
I claim that Churchill was importantly wrong. Not (necessarily) wrong about the relative success of Britain/US/etc, but about those countries’ governments being well-described as simple democracy. Rather, I claim, the formula which has worked well in e.g. Britain and the US diverges from pure democracy in a crucial load-bearing way; that formula works better than pure democracy both in theory and in practice, and when thinking about good governance structures we should emulate the full formula rather than pure democracy.
Specifically, the actual...
I think veto powers as part of a system of checks and balances are good in moderation, but add too many of them and you end up with a stalemate.
Yes, there's actually some research into this area: https://www.jstor.org/stable/j.ctt7rvv7 "Veto Players: How Political Institutions Work". The theory apparently suggested that if you have too many "veto players", your government quickly becomes unable to act.
And I suspect that states which are unable to act are vulnerable to major waves of public discontent during perceived crises.
Is the era of AI agents writing complex code systems without humans in the loop upon us?
Cognition is calling Devin ‘the first AI software engineer.’
Here is a two minute demo of Devin benchmarking LLM performance.
Devin has its own web browser, which it uses to pull up documentation.
Devin has its own code editor.
Devin has its own command line.
Devin uses debugging print statements and uses the log to fix bugs.
Devin builds and deploys entire stylized websites without even being directly asked.
What could possibly go wrong? Install this on your computer today.
Padme.
I would by default assume all demos were supremely cherry-picked. My only disagreement with Austen Allred’s statement here is that this rule is not new:
...Austen Allred: New rule:
If someone only shows their AI model in tightly
Rather, people who suck at programming (and thus can't get jobs) apply to way more positions than people who are good at programming.
I have interviewed a fair number of programmers, and I've definitely seen plenty of people who talked a good game but who couldn't write FizzBuzz (or sum the numbers in an array). And this was stacking the deck in their favor: They could use a programming language of their choice, plus a real editor, and if they appeared unable to deal with coding in front of people, I'd go sit on the other side of the office and let th...
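For calibration, FizzBuzz really is this small (Python here; candidates could use any language):

```python
# The screening question: print 1..100, but "Fizz" for multiples of 3,
# "Buzz" for multiples of 5, and "FizzBuzz" for multiples of both.
for i in range(1, 101):
    out = ""
    if i % 3 == 0:
        out += "Fizz"
    if i % 5 == 0:
        out += "Buzz"
    print(out or i)

# And the other screening question, summing the numbers in an array:
print(sum([3, 1, 4, 1, 5]))  # 14
```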