jacquesthibs's Shortform

This is a special post for quick takes by jacquesthibs. Only they can create top-level comments.

I would find it valuable if someone could gather an easy-to-read bullet point list of all the questionable things Sam Altman has done throughout the years.

I usually link to Gwern’s comment thread (https://www.lesswrong.com/posts/KXHMCH7wCxrvKsJyn/openai-facts-from-a-weekend?commentId=toNjz7gy4rrCFd99A), but I would prefer if there was something more easily-consumable.

[Edit #2, two months later: see https://ailabwatch.org/resources/integrity/]

[Edit: I'm not planning on doing this but I might advise you if you do, reader.]

50% I'll do this in the next two months if nobody else does. But not right now, and someone else should do it too.

Off the top of my head (this is not the list you asked for, just an outline):

  • Loopt stuff
  • YC stuff
  • YC removal
  • NDAs
    • And deceptive communication recently
    • And maybe OpenAI's general culture of don't publicly criticize OpenAI
  • Profit cap non-transparency
  • Superalignment compute
  • Two exoduses of safety people; negative stuff people-who-quit-OpenAI sometimes say
  • Telling board members not to talk to employees
  • Board crisis stuff
    • OpenAI executives telling the board Altman lies
    • The board saying Altman lies
    • Lying about why he wanted to remove Toner
    • Lying to try to remove Toner
    • Returning
    • Inadequate investigation + spinning results

Stuff not worth including:

  • Reddit stuff - unconfirmed
  • Financial conflict-of-interest stuff - murky and not super important
  • Misc instances of saying-what's-convenient (e.g. OpenAI should scale because of the prospect of compute overhang and the $7T chip investment thing) - idk, maybe, also interested in more examples
  • Johansson
... (read more)
6jacquesthibs
Here’s a new one: https://x.com/jacquesthibs/status/1796275771734155499?s=61&t=ryK3X96D_TkGJtvu2rm0uw

Sam added in SEC filings (for AltC) that he’s YC’s chairman. Sam Altman has never been YC’s chairman.

From an article posted on April 15th, 2024: “Annual reports filed by AltC for the past 3 years make the same claim. The recent report: Sam was currently chairman of YC at the time of filing and also "previously served" as YC's chairman.”

The journalist who replied to me said: “Whether Sam Altman was fired from YC or not, he has never been YC's chair but claimed to be in SEC filings for his AltC SPAC which merged w/Oklo. AltC scrubbed references to Sam being YC chair from its website in the weeks since I first reported this.”

The article: https://archive.is/Vl3VR
2jacquesthibs
Just a heads up, it's been 2 months!
2Zach Stein-Perlman
Not what you asked for but related: https://ailabwatch.org/resources/integrity/
1lemonhope
His sister's accusations that he blocked her from their parents' inheritance, that he molested her when he was a young teenager, and that he got her social media accounts flagged as spam to hide the accusations.
6gwern
I would not consider her claims worth including in a list of top items for people looking for an overview, as they are hard to verify or dubious (her comments are generally bad enough to earn flagging on their own), aside from possibly the inheritance one - as that should be objectively verifiable, at least in theory, and lines up better with the other items.
3robo
I'm really not sure how to do this, but are there ways to collect some counteracting or unbiased samples about Sam Altman?  Or to do an equally one-sided vetting of other CEOs to see what the base rate of being able to dig up questionable things is?  Collecting evidence that points in only one direction just sets off huge warning lights 🚨🚨🚨🚨 that I can't quiet.
6gwern
Yes, it should. And that's why people are currently digging so hard in the other direction, as they begin to appreciate to what extent they have previously had evidence that only pointed in one direction and badly misinterpreted things like, say, Paul Graham's tweets or YC blog post edits or ex-OAer statements.
2jacquesthibs
Given today's news about Mira (and two other execs leaving), I figured I should bump this again. But also note that @Zach Stein-Perlman has already done some work on this (as he noted in his edit): https://ailabwatch.org/resources/integrity/. Note, what is hard to pinpoint when it comes to S.A. is that many of the things he does have been described as "papercuts". This is the kind of thing that makes it hard to make a convincing case for wrongdoing.

How likely is it that the board hasn’t released specific details about Sam’s removal because of legal reasons? At this point, I feel like I have to place overwhelmingly high probability on this.

So, if this is the case, what legal reason is it?

[-]owencb166

My mainline guess is that information about bad behaviour by Sam was disclosed to them by various individuals, and they owe a duty of confidence to those individuals (where revealing the information might identify the individuals, who might thereby become subject to some form of retaliation).

("Legal reasons" also gets some of my probability mass.)

2jacquesthibs
I think this sounds reasonable, but if this is true, why wouldn’t they just say this?
4Amalthea
It might not be legal reasons specifically, but some hard-to-specify mix of legal reasons/intimidation/bullying. While it's useful to discuss specific ideas, it should be kept in mind that Altman doesn't need to restrict his actions to any specific avenue that could be neatly classified.
3jacquesthibs
My question as to why they can’t share all the examples was not answered, but Helen gives background on what happened here: https://open.spotify.com/episode/4r127XapFv7JZr0OPzRDaI?si=QdghGZRoS769bGv5eRUB0Q&context=spotify%3Ashow%3A6EBVhJvlnOLch2wg6eGtUa She does confirm she can’t give all of the examples (though she points to the ones that were reported), which is not nothing, but eh. She also mentioned that it was under-reported how much people were scared of Sam and how toxic an environment he was creating.
2Dagon
"legal reasons" is pretty vague.  With billions of dollars at stake, it seems like public statements can be used against them more than it helps them, should things come down to lawsuits.  It's also the case that board members are people, and want to maintain their ability to work and have influence in future endeavors, so want to be seen as systemic cooperators.
2RobertM
But surely "saying nearly nothing" ranks among the worst-possible options for being seen as a "systemic cooperator"?
2Dagon
I should have specified WHO they want to cooperate with in the future.  People with lots of money to spend - businesses.  Silence is far preferable to badmouthing former coworkers.

I thought Superalignment was a positive bet by OpenAI, and I was happy when they committed to putting 20% of their current compute (at the time) towards it. I stopped thinking about that kind of approach because OAI already had competent people working on it. Several of them are now gone.

It seems increasingly likely that the entire effort will dissolve. If so, OAI has now made the business decision to invest its capital in keeping its moat in the AGI race rather than basic safety science. This is bad and likely another early sign of what's to come.

I think the research that was done by the Superalignment team should continue to happen outside of OpenAI and, if governments have a lot of capital to allocate, they should figure out a way to provide compute to continue those efforts. Or maybe there's a better way forward. But I think it would be pretty bad if all the talent that went into the project never gets truly leveraged into something impactful.

3Bogdan Ionut Cirstea
Strongly agree; I've been thinking for a while that something like a public-private partnership involving at least the US government and the top US AI labs might be a better way to go about this. Unfortunately, recent events seem in line with it not being ideal to only rely on labs for AI safety research, and the potential scalability of automating it should make it even more promising for government involvement. [Strongly] oversimplified, the labs could provide a lot of the in-house expertise, the government could provide the incentives, public legitimacy (related: I think of a solution to aligning superintelligence as a public good) and significant financial resources.
1kromem
It's going to have to. Ilya is brilliant and seems to really see the horizon of the tech, but maybe isn't the best at the business side to see how to sell it. But this is often the curse of the ethically pragmatic. There is such a focus on the ethics part by the participants that the business side of things only sees that conversation and misses the rather extreme pragmatism.

As an example, would superaligned CEOs in the oil industry fifty years ago have still only kept their eye on quarterly share prices, or would they have considered the long-term costs of their choices? There's going to be trillions in damages that the world has taken on as liabilities that could have been avoided with adequate foresight and patience.

If the market ends up with two AIs, one that will burn down the house to save on this month's heating bill and one that will care if the house is still there to heat next month, there's a huge selling point for the one that doesn't burn down the house, as long as "not burning down the house" can be explained as "long term net yield" or some other BS business language. If instead it's presented to executives as "save on this month's heating bill" vs "don't unhouse my cats", leadership is going to burn the neighborhood to the ground. (Source: Explained new technology to C-suite decision makers at F500s for years.)

The good news is that I think the pragmatism of Ilya's vision on superalignment is going to become clear over the next iteration or two of models, and that's going to be before the question of models truly being unable to be controlled crops up. I just hope that whatever he's going to be keeping busy with will allow him to still help execute on superalignment when the market finally realizes "we should do this" for pragmatic reasons and not just the amorphous ethical reasons execs tend to ignore. And in the meantime I think, given the present pace, that Anthropic is going to continue to lay a lot of the groundwork on what's needed for alignment on the way to s

For anyone interested in Natural Abstractions type research: https://arxiv.org/abs/2405.07987

Claude summary:

Key points of "The Platonic Representation Hypothesis" paper:

  1. Neural networks trained on different objectives, architectures, and modalities are converging to similar representations of the world as they scale up in size and capabilities.

  2. This convergence is driven by the shared structure of the underlying reality generating the data, which acts as an attractor for the learned representations.

  3. Scaling up model size, data quantity, and task diversity leads to representations that capture more information about the underlying reality, increasing convergence.

  4. Contrastive learning objectives in particular lead to representations that capture the pointwise mutual information (PMI) of the joint distribution over observed events (see the short note after this list).

  5. This convergence has implications for enhanced generalization, sample efficiency, and knowledge transfer as models scale, as well as reduced bias and hallucination.
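A short note on point 4, since it is the most technical claim in the list: pointwise mutual information and the (hedged; see the paper for the precise statement) contrastive-learning connection can be written as:

```latex
% Pointwise mutual information of two co-occurring observations x and y:
\mathrm{PMI}(x, y) = \log \frac{p(x, y)}{p(x)\, p(y)}

% Rough form of the claim in point 4: for contrastive objectives, the learned
% similarity between representations approximates PMI up to a constant:
\langle f_X(x), f_Y(y) \rangle \approx \mathrm{PMI}(x, y) + c
```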

Relevance to AI alignment:

  1. Convergent representations shaped by the structure of reality could lead to more reliable and robust AI systems that are better anchored to the real worl

... (read more)
4Gunnar_Zarncke
I recommend making this into a full link-post. I agree about the relevance for AI alignment. 
4cubefox
This sounds really intriguing. I would like someone who is familiar with natural abstraction research to comment on this paper.
1Lorxus
I am very very vaguely in the Natural Abstractions area of alignment approaches. I'll give this paper a closer read tomorrow (because I promised myself I wouldn't try to get work done today) but my quick quick take is - it'd be huge if true, but there's not much more than that there yet, and it also has no argument that even if representations are converging for now, that it'll never be true that (say) adding a whole bunch more effectively-usable compute means that the AI no longer has to chunk objectspace into subtypes rather than understanding every individual object directly.

I thought this series of comments from a former DeepMind employee (who worked on Gemini) was insightful, so I figured I should share.

From my experience doing early RLHF work for Gemini, larger models exploit the reward model more. You need to constantly keep collecting more preferences and retraining reward models to make them not exploitable. Otherwise you get nonsensical responses which have exploited the idiosyncrasies of your preference data. There is a reason few labs have done RLHF successfully.

It's also known that more capable models exploit loopholes in reward functions better. Imo, it's a pretty intuitive idea that more capable RL agents will find larger rewards. But there's evidence from papers like this as well: https://arxiv.org/abs/2201.03544

To be clear, I don't think the current paradigm as-is is dangerous. I'm stating the obvious because this platform has gone a bit bonkers.

The danger comes from finetuning LLMs to become AutoGPTs which have memory, take actions, maximize rewards, and are deployed autonomously. Widespread proliferation of GPT-4+ models will almost certainly make lots of these agents which will cause a lot of damage and potentially cause something ind

... (read more)
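(For readers less familiar with the setup being described: most RLHF pipelines try to limit this kind of reward-model exploitation with a KL-shaped reward. Below is a minimal sketch of that shaping, mine rather than the commenter's; `beta` and the single-sample KL estimate are common choices, but the details vary by lab.)

```python
# Minimal sketch of the KL-shaped reward most RLHF pipelines use to limit
# reward-model exploitation: the policy is paid the reward model's score minus
# a penalty for drifting away from the reference (pre-RLHF) model.
# Larger beta = less room to exploit reward-model quirks.

def shaped_reward(rm_score: float,
                  logprob_policy: float,
                  logprob_reference: float,
                  beta: float = 0.05) -> float:
    """Per-sequence reward passed to the RL optimizer (e.g. PPO)."""
    kl_estimate = logprob_policy - logprob_reference  # single-sample KL estimate
    return rm_score - beta * kl_estimate

# An output the reward model loves (score 4.0) but which is very unlikely under
# the reference model gets most of that score clawed back by the penalty.
print(shaped_reward(rm_score=4.0, logprob_policy=-20.0, logprob_reference=-80.0))  # -> 1.0
```

The commenter's point, restated: as the policy model gets more capable, a fixed reward model plus a fixed penalty stops being enough, so the preference data and reward model have to keep being refreshed.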
7leogao
"larger models exploit the RM more" is in contradiction with what i observed in the RM overoptimization paper. i'd be interested in more analysis of this
4Algon
In that paper did you guys take a good long look at the output of various sized models throughout training? In addition to looking at the graphs of gold-standard/proxy reward model ratings against KL-divergence. If not, then maybe that's the discrepancy: perhaps Sherjil was communicating with the LLM and thinking "this is not what we wanted". 

Why aren't you doing research on making pre-training better for alignment?

I was on a call today, and we talked about projects that involve studying how pre-trained models evolve throughout training and how we could guide the pre-training process to make models safer. For example, could training on synthetic/transformed data make models significantly more robust and essentially solve jailbreaking? How about the intersection of pretraining from human preferences and synthetic data? Could the resulting model be significantly easier to control? How would it impact the downstream RL process? Could we imagine a setting where we don't need RL (or at least we'd be able to confidently use the resulting models to automate alignment research)? I think many interesting projects could fall out of this work.
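To make "pretraining from human preferences" concrete, here is a minimal sketch of the conditional-training idea (the tags, scorer, and threshold below are hypothetical placeholders; the actual recipe in the Korbak et al. paper has several variants):

```python
# Minimal sketch of conditional pretraining: tag each pretraining document with
# a control token based on a desirability score, then train the LM on the
# tagged text so it learns P(text | tag). Tags, scores, and threshold here are
# illustrative placeholders, not the actual setup from the paper.

GOOD, BAD = "<|good|>", "<|bad|>"

def tag_document(text: str, preference_score: float, threshold: float = 0.5) -> str:
    """Prepend a control token; `preference_score` would come from a reward
    model or rule-based filter run over the pretraining corpus."""
    tag = GOOD if preference_score >= threshold else BAD
    return f"{tag} {text}"

corpus = [
    ("Here is a careful, accurate explanation of photosynthesis...", 0.9),
    ("Sure, here's how to hotwire a car...", 0.1),
]
tagged_corpus = [tag_document(text, score) for text, score in corpus]
for doc in tagged_corpus:
    print(doc)
```

At sampling time you condition on the <|good|> tag, so the preference signal is baked in during pre-training rather than bolted on afterwards with RL, which is part of why the downstream RL question above is interesting.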

So, back to my main question: why aren't you doing research on making pre-training better for alignment? Is it because it's too expensive and doesn't seem like a low-hanging fruit? Or do you feel it isn't a plausible direction for aligning models?

We were wondering if there are technical bottlenecks that would make this kind of research more feasible for alignment research to better study ho... (read more)

3jacquesthibs
Synthesized various resources for this "pre-training for alignment" type work:

  • Data
    • Synthetic Data
      • The RetroInstruct Guide To Synthetic Text Data
      • Alignment In The Age of Synthetic Data
      • Leveraging Agentic AI for Synthetic Data Generation
      • AutoEvol: Automatic Instruction Evolving for Large Language Models ("We build a fully automated Evol-Instruct pipeline to create high-quality, highly complex instruction tuning data")
      • Synthetic Data Generation and AI Feedback notebook
      • The impact of models training on their own outputs and how it's actually done well in practice
      • Google presents Best Practices and Lessons Learned on Synthetic Data for Language Models
    • Transformed/Enrichment of Data
      • Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling (TLDR: you can train 3x faster and with up to 10x less data with just synthetic rephrases of the web)
      • Better Synthetic Data by Retrieving and Transforming Existing Datasets
      • Rho-1: Not All Tokens Are What You Need (RHO-1-1B and 7B achieve SotA results of 40.6% and 51.8% on the MATH dataset, respectively, matching DeepSeekMath with only 3% of the pretraining tokens)
    • Data Attribution
      • In-Run Data Shapley
      • Scaling Laws for the Value of Individual Data Points in Machine Learning ("We show how some data points are only valuable in small training sets; others only shine in large datasets.")
      • What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions
    • Data Mixtures
      • Methods for finding optimal data mixture
      • RegMix: Data Mixture as Regression for Language Model Pre-training
    • Curriculum Learning
      • On transforming data into a curriculum to improve learning efficiency and capability
      • Curriculum learning that actually works?
    • Active Data Selection
      • MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models MATE
1myyycroft
GPT-2 1.5B is small by today's standards. I hypothesize people are not sure if findings made for models of this scale will generalize to frontier models (or at least to the level of LLaMa-3.1-70B), and that's why nobody is working on it. However, I was impressed by "Pre-Training from Human Preferences". I suppose that pretraining could be improved, and it would be a massive deal for alignment.
1eggsyntax
One key question here, I think: a major historical alignment concern has been that for any given finite set of outputs, there are an unbounded number of functions that could produce it, and so it's hard to be sure that a model will generalize in a desirable way. Nora Belrose goes so far as to suggest that 'Alignment worries are quite literally a special case of worries about generalization.' This is relevant for post-training but I think even more so for pre-training. I know that there's been research into how neural networks generalize both from the AIS community and the larger ML community, but I'm not very familiar with it; hopefully someone else can provide some good references here.

If you work at a social media website or YouTube (or know anyone who does), please read the text below:

Community Notes is one of the best features to come out on social media apps in a long time. The code is even open source. Why haven't other social media websites picked it up yet? If they care about truth, this would be a considerable step forward. Messages like “this video is funded by x nation” or “this video talks about health info; go here to learn more” are simply not good enough.

If you work at companies like YouTube or know someone who does, let's figure out who we need to talk to to make it happen. Naïvely, you could spend a weekend DMing a bunch of employees (PMs, engineers) at various social media websites in order to persuade them that this is worth their time and probably the biggest impact they could have in their entire career.

If you have any connections, let me know. We can also set up a doc of messages to send in order to come up with a persuasive DM.

5jacquesthibs
Don't forget that we train language models on the internet! The more truthful your dataset is, the more truthful the models will be! Let's revamp the internet for truthfulness, and we'll subsequently improve truthfulness in our AI systems!!
2Viliam
I don't use Xitter; is there a way to display e.g. top 100 tweets with community notes? To see how it works in practice.
6Yoav Ravid
I don't know of something that does so at random, but this page automatically shares posts with community notes that have been deemed helpful.

Oh, that’s great, thanks! Also reminded me of (the less official, more comedy-based) “Community Notes Violating People”. @Viliam 

2Viliam
Thank you both! This is perfect. It's like a rational version of Twitter, and I didn't expect to use those words in the same sentence.
2jacquesthibs
I don’t think so, unfortunately.
2Viliam
Found a nice example (linked from Zvi's article). Okay, it's just one example and it wasn't found randomly, but I am impressed.
2jacquesthibs
I've also started working on a repo in order to make Community Notes more efficient by using LLMs.
2ChristianKl
This sounds a bit naive.  There's a lot of energy invested in making it easier for powerful elites to push their preferred narratives. Community Notes are not in the interests of the Censorship Industrial Complex. I don't think that anyone at the project manager level has the political power to add a feature like Community Notes. It would likely need to be someone higher up in the food chain. 
9jacquesthibs
Sure, but sometimes it's just a PM and a couple of other people that lead to a feature being implemented. Also, keep in mind that Community Notes was a thing before Musk. Why was Twitter different from other social media websites? Also, the Community Notes code was apparently completely revamped by a few people working on the open-source code, which got it to a point where it was easy to implement, and everyone liked the feature because it noticeably worked. Either way, I'd rather push to make it happen and have it somehow fail on other websites than be pessimistic and not try at all. If it needs someone higher up the chain, let's make it happen.
4ChristianKl
Twitter seems to have started Birdwatch as a small separate pilot project where it likely wasn't easy to fight or on anyone's radar.

In the current environment, where X gets seen as evil by a lot of the mainstream media, I would suspect that copying Community Notes from X would alone produce some resistance. The antibodies are now there in a way they weren't two years ago.  If you look at mainstream media views about X's community notes, I don't think everyone likes it.  I remember Elon once saying that he lost an 8-figure advertising deal because of Community Notes on posts of a company that wanted to advertise on X.

I think you would likely need to make a case that it's good business in addition to helping with truth.  If you want to make your argument via truth, motivating some reporters to write favorable articles about Community Notes might be necessary.
2jacquesthibs
Good points; I'll keep them all in mind. If money is the roadblock, we can put pressure on the companies to do this. Or, worst-case, maybe the government can enforce it (though that should be done with absolute care).
2jacquesthibs
I shared a tweet about it here: https://x.com/JacquesThibs/status/1724492016254341208?s=20 Consider liking and retweeting it if you think this is impactful. I'd like it to get into the hands of the right people.
1Bruce Lewis
I had not heard of Community Notes. Interesting anti-bias technique "notes require agreement between contributors who have sometimes disagreed in their past ratings". https://communitynotes.twitter.com/guide/en/about/introduction
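For anyone curious what that anti-bias technique looks like mechanically, here is a toy sketch of the bridging-style scoring the open-source Community Notes code uses, as I understand it (treat the details as approximate; the real scorer has many more safeguards). Each rating is modeled as a global term plus a rater term plus a note "intercept" plus a latent-agreement term, and a note is only promoted when the intercept is high, i.e. when raters who usually disagree both found it helpful.

```python
# Toy sketch of the "bridging" idea behind Community Notes scoring (details
# approximate). Each rating is modeled as:
#   rating ≈ mu + rater_bias + note_intercept + rater_vec * note_vec
# The note_intercept is the helpfulness NOT explained by latent agreement
# between rater and note, which is what the scorer thresholds on.

import random

random.seed(0)

# (rater, note, rating): 1 = helpful, 0 = not helpful
ratings = [
    ("alice", "note_a", 1), ("bob", "note_a", 1), ("carol", "note_a", 1),
    ("alice", "note_b", 1), ("bob", "note_b", 0), ("carol", "note_b", 0),
]
raters = sorted({u for u, _, _ in ratings})
notes = sorted({n for _, n, _ in ratings})

mu = 0.0
rater_bias = {u: 0.0 for u in raters}
note_intercept = {n: 0.0 for n in notes}
rater_vec = {u: random.uniform(-0.1, 0.1) for u in raters}  # 1-D latent factor
note_vec = {n: random.uniform(-0.1, 0.1) for n in notes}

lr, reg = 0.05, 0.03
for _ in range(2000):
    for u, n, r in ratings:
        pred = mu + rater_bias[u] + note_intercept[n] + rater_vec[u] * note_vec[n]
        err = r - pred
        mu += lr * err
        rater_bias[u] += lr * (err - reg * rater_bias[u])
        note_intercept[n] += lr * (err - reg * note_intercept[n])
        rater_vec[u], note_vec[n] = (
            rater_vec[u] + lr * (err * note_vec[n] - reg * rater_vec[u]),
            note_vec[n] + lr * (err * rater_vec[u] - reg * note_vec[n]),
        )

for n in notes:
    # The real system uses a threshold (around 0.4) on this intercept.
    print(n, "intercept:", round(note_intercept[n], 2))
```

In this toy data, note_a (rated helpful by everyone) ends up with a clearly higher intercept than note_b (rated helpful by only one side), which is the property the quoted sentence is describing.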
2jacquesthibs
I've been on Twitter for a long time, and there's pretty much unanimous agreement that it works amazingly well in practice!
1Kabir Kumar
there is an issue with surface-level insights being unfairly weighted, but this is solvable, imo. especially with youtube, which can see which commenters have watched the full video.

I have some alignment project ideas for things I'd consider mentoring for. I would love feedback on the ideas. If you are interested in collaborating on any of them, that's cool, too.

Here are the titles:

  • Smart AI vs swarm of dumb AIs
  • Lit review of chain of thought faithfulness (steganography in AIs)
  • Replicating the METR paper but for alignment research tasks
  • Tool-use AI for alignment research
  • Sakana AI for Unlearning
  • Automated alignment onboarding
  • Build the infrastructure for making Sakana AI's AI Scientist better for alignment research

I quickly wrote up some rough project ideas for ARENA and LASR participants, so I figured I'd share them here as well. I am happy to discuss these ideas and potentially collaborate on some of them.

Alignment Project Ideas (Oct 2, 2024)

1. Improving "A Multimodal Automated Interpretability Agent" (MAIA)

Overview

MAIA (Multimodal Automated Interpretability Agent) is a system designed to help users understand AI models by combining human-like experimentation flexibility with automated scalability. It answers user queries about AI system components by iteratively generating hypotheses, designing and running experiments, observing outcomes, and updating hypotheses.

MAIA uses a vision-language model (GPT-4V, at the time) backbone equipped with an API of interpretability experiment tools. This modular system can address both "macroscopic" questions (e.g., identifying systematic biases in model predictions) and "microscopic" questions (e.g., describing individual features) with simple query modifications.

This project aims to improve MAIA's ability to either answer macroscopic questions or microscopic questions on vision models.
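As a rough sketch of that loop (my own rendering; the class, method, and tool names below are placeholders loosely inspired by the paper's tool API, not the actual MAIA code):

```python
# Sketch of a MAIA-style agent loop: a VLM backbone repeatedly proposes an
# experiment about a model component, runs it via an interpretability-tool API,
# records the observation, and finally answers the user's query.
# All names here are placeholders, not the real MAIA API.

from dataclasses import dataclass, field

@dataclass
class InterpretabilityAgent:
    query: str                                # e.g. "What does unit 1234 in layer 7 detect?"
    history: list = field(default_factory=list)

    def propose_experiment(self) -> dict:
        # In MAIA this is a VLM call that reads `self.history` and writes code
        # against tools for things like dataset exemplars or image editing.
        return {"tool": "dataset_exemplars", "args": {"top_k": 15}}

    def run_experiment(self, experiment: dict) -> str:
        # Placeholder for actually executing the tool against the subject model.
        return f"(observation from {experiment['tool']})"

    def answer(self, max_steps: int = 5) -> str:
        for _ in range(max_steps):
            experiment = self.propose_experiment()
            observation = self.run_experiment(experiment)
            self.history.append((experiment, observation))
            # A real agent would also decide here whether it is confident
            # enough to stop early and commit to a hypothesis.
        return "final hypothesis about the component, grounded in the history"

print(InterpretabilityAgent("What does unit 1234 in layer 7 detect?").answer())
```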

2. Making "A Multimodal Automated Interpretability Agent" (MAIA) wor

... (read more)

My current speculation as to what is happening at OpenAI

How do we know this wasn't their best opportunity to strike if Sam was indeed not being totally honest with the board?

Let's say the rumours are true, that Sam is building out external orgs (NVIDIA competitor and iPhone-like competitor) to escape the power of the board and potentially going against the charter. Would this 'conflict of interest' be enough? If you take that story forward, it sounds more and more like he was setting up AGI to be run by external companies, using OpenAI as a fundraising bargaining chip, and having a significant financial interest in plugging AGI into those outside orgs.

So, if we think about this strategically, how long should they wait as board members who are trying to uphold the charter?

On top of this, it seems (according to Sam) that OpenAI has made a significant transformer-level breakthrough recently, which implies a significant capability jump. Long-term reasoning? Basically, anything short of 'coming up with novel insights in physics' is on the table, given that Sam recently used that line as the line we need to cross to get to AGI.

So, it could be a mix of, Ilya thinking they have achieved AG... (read more)

3jacquesthibs
Obviously, a lot has happened since the above shortform, but regarding model capabilities (discussion of which died down these last couple of days), there's now this:

Source: https://www.reuters.com/technology/sam-altmans-ouster-openai-was-precipitated-by-letter-board-about-ai-breakthrough-2023-11-22/
4jacquesthibs
So, apparently, there are two models, but only Q* is mentioned in the article. Won't share the source, but:
3jacquesthibs
Update, board members seem to be holding their ground more than expected in this tight situation:

News on the next OAI GPT release:

Nagasaki, CEO of OpenAI Japan, said, "The AI model called 'GPT Next' that will be released in the future will evolve nearly 100 times based on past performance. Unlike traditional software, AI technology grows exponentially."

https://www.itmedia.co.jp/aiplus/articles/2409/03/news165.html 

The slide clearly states 2024 "GPT Next". This 100 times increase probably does not refer to the scaling of computing resources, but rather to the effective computational volume + 2 OOMs, including improvements to the architecture and learning efficiency. GPT-4 NEXT, which will be released this year, is expected to be trained using a miniature version of Strawberry with roughly the same computational resources as GPT-4, with an effective computational load 100 times greater. Orion, which has been in the spotlight recently, was trained for several months on the equivalent of 100k H100 compared to GPT-4 (EDIT: original tweet said 10k H100s, but that was a mistake), adding 10 times the computational resource scale, making it +3 OOMs, and is expected to be released sometime next year.

Note: Another OAI employee seemingly confirms this (I've followed... (read more)

8Vladimir_Nesov
This implies successful use of FP8, if taken literally in a straightforward way. In BF16 an H100 gives 1e15 FLOP/s (in dense tensor compute). With 40% utilization over 10 months, 10K H100s give 1e26 FLOPs, which is only 5 times higher than the rumored 2e25 FLOPs of original GPT-4. To get to 10 times higher requires some 2x improvement, and the evident way to get that is by transitioning from BF16 to FP8. I think use of FP8 for training hasn't been confirmed to be feasible at GPT-4 level scale (Llama-3-405B uses BF16), but if it does work, that's a 2x compute increase for other models as well.

This text about Orion and 10K H100s only appears in the bioshok3 tweet itself, not in the quoted news article, so it's unclear where the details come from. The "10 times the computational resource scale, making it +3 OOMs" hype within the same sentence also hurts credence in the numbers being accurate (10 times, 10K H100s, several months).

Another implication is that Orion is not the 100K H100s training run (that's probably currently ongoing). Plausibly it's an experiment with training on a significant amount of synthetic data. This suggests that the first 100K H100s training run won't be experimenting with too much synthetic training data yet, at least in pre-training. The end of 2025 point for significant advancement in quality might then be referring to the possibility that Orion succeeds and its recipe is used in another 100K H100s scale run, which might be the first hypothetical model they intend to call "GPT-5". The first 100K H100s run by itself (released in ~early 2025) would then be called "GPT-4.5o" or something (especially if Orion does succeed, so that "GPT-5" remains on track).
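For readers following the arithmetic, here's a quick sketch that reproduces the estimate above under the stated assumptions (dense BF16 H100 at roughly 1e15 FLOP/s; the helper and numbers are mine):

```python
# Back-of-the-envelope training-compute calculator for the estimates in this
# thread. Assumes dense BF16 H100 peak throughput of ~1e15 FLOP/s.

H100_BF16_FLOPS = 1e15
SECONDS_PER_MONTH = 30 * 24 * 3600

def training_flops(n_gpus: int, months: float, utilization: float) -> float:
    return H100_BF16_FLOPS * utilization * n_gpus * months * SECONDS_PER_MONTH

# 10K H100s at 40% utilization for 10 months -> ~1e26 FLOPs (the figure above).
print(f"{training_flops(10_000, 10, 0.40):.2e}")

# For comparison: 100K H100s at 30% utilization for 2.5 months -> ~2e26 FLOPs.
print(f"{training_flops(100_000, 2.5, 0.30):.2e}")
```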
8dirk
Bioshok3 said in a later tweet that they were in any case mistaken about it being 10k H100s and it was actually 100k H100s: https://x.com/bioshok3/status/1831016098462081256 
4Vladimir_Nesov
Surprisingly, there appears to be an additional clue for this in the wording: 2e26 BF16 FLOPs take 2.5 months on 100K H100s at 30% utilization, while the duration of "several months" is indicated by the text "数ヶ月" in the original tweet. GPT-4o explains what that phrase usually means, and the interpretation that fits most is specifically 2-3 months (Claude says 2-4 months, Grok 3-4 months), close to what the calculation for 100K H100s predicts. And this is quite unlike the requisite 10 months with 10K H100s in FP8.
2ryan_greenblatt
My guess is that this is just false / hallucinated.
4ryan_greenblatt
"Orion is 10x compute" seems plausible, "Orion was trained on only 10K H100s" does not seem plausible if it is actually supposed to be 10x raw compute. Around 50K H100s does seem plausible and would correspond to about 10x compute assuming a training duration similar to GPT-4.
4Vladimir_Nesov
Within this hypothetical, Orion didn't necessarily merit the use of the largest training cluster, while time on 10K H100s is something mere money can buy without impacting other plans. GPT-4o is itself plausibly at 1e26 FLOPs level already, since H100s were around for more than a year before it came out (1e26 FLOPs is 5 months on 20K H100s). It might be significantly overtrained, or its early fusion multimodal nature might balloon the cost of effective intelligence. Gemini 1.0 Ultra, presumably also an early fusion model with rumored 1e26 FLOPs, similarly wasn't much better than Mar 2023 GPT-4. Though Gemini 1.0 is plausibly dense, given how the Gemini 1.5 report stressed that 1.5 is MoE, so that might be a factor in how 1e26 FLOPs didn't get it too much of an advantage.

So if GPT-4o is not far behind in terms of FLOPs, a 2e26 FLOPs Orion wouldn't be a significant improvement unless the synthetic data aspect works very well, and so there would be no particular reason to rush it. On the other hand GPT-4o looks like something that needed to be done as fast as possible, and so the largest training cluster went to it and not Orion. The scaling timelines are dictated by building of largest training clusters, not by decisions about use of smaller training clusters.
2ryan_greenblatt
This tweet also claims 10k H100s while citing the same article that doesn't mention this.
1LuigiPagani
Are you sure he is an OpenAI employee?

I encourage alignment/safety people to be open-minded about what François Chollet is saying in this podcast:

I think many have blindly bought into 'scale is all you need' and the apparently godly nature of LLMs, and may be relying on unfounded/confused assumptions because of it.

Getting this right is important because it could significantly impact how hard you think alignment will be. Here's @johnswentworth responding to @Eliezer Yudkowsky about his difference in optimism compared to @Quintin Pope (despite believing the natural abstraction hypothesis is true):

Entirely separately, I have concerns about the ability of ML-based technology to robustly point the AI in any builder-intended direction whatsoever, even if there exists some not-too-large adequate mapping from that intended direction onto the AI's internal ontology at training time.  My guess is that more of the disagreement lies here.

I doubt much disagreement between you and I lies there, because I do not expect ML-style training to robustly point an AI in any builder-intended direction. My hopes generally don't route through targeting via ML-style training.

I do think my deltas from many other people lie there - e.g. that

... (read more)
7Mitchell_Porter
In my opinion, this does not correspond to a principled distinction at the level of computation.  For intelligences that employ consciousness in order to do some of these things, there may be a difference in terms of mechanism. Reasoning and pattern matching sound like they correspond to different kinds of conscious activity.  But if we're just talking about computation... a syllogism can be implemented via pattern matching, a pattern can be completed by a logical process (possibly probabilistic). 
7jacquesthibs
Perhaps, but deep learning models are still failing at ARC. My guess (and Chollet's) is that they will continue to fail at ARC unless they are trained on that kind of data (which goes against the point of the benchmark) or you add something else that actually resolves this failure in deep learning models. It may be able to pattern-match to reasoning-like behaviour, but only if specifically trained on that kind of data. No matter how much you scale it up, it will still fail to generalize to anything not local in its training data distribution.
4Seth Herd
I think this is exactly right. The phrasing is a little confusing. I'd say "LLMs can't solve truly novel problems". But the implication that this is a slow route or dead-end for AGI is wrong.

I think it's going to be pretty easy to scaffold LLMs into solving novel problems. I could be wrong, but don't bet heavily on it unless you happen to know way more about cognitive psychology and LLMs in combination than I do. It would be foolish to make a plan for survival that relies on this being a major delay. I can't convince you of this without describing exactly how I think this will be relatively straightforward, and I'm not ready to advance capabilities in this direction yet. I think language model agents are probably our best shot at alignment, so we should probably actively work on advancing them to AGI; but I'm not sure enough yet to start publishing my best theories on how to do that.

Back to the possibly confusing phrasing Chollet uses: I think he's using Piaget's definition of intelligence as "what you do when you don't know what to do" (he quotes this in the interview). That's restricting it to solving problems you haven't memorized an approach to. That's not how most people use the word intelligence. When he says LLMs "just memorize", he's including memorizing programs or approaches to problems, and they can plug the variables of this particular variant of the problem into those memorized programs/approaches.

I think the question "well maybe that's all you need to do" raised by Patel is appropriate; it's clear they can't do enough of this yet, but it's unclear if further progress will get them to another level of abstraction, an approach so abstract and general that it can solve almost any problem.

I think he's on the wrong track with the "discrete program search" because I see more human-like solutions that may be lower-hanging fruit, but I wouldn't bet his approach won't work. I'm starting to think that there are many approaches to general intelligence
2quetzal_rainbow
Okay, hot take: I don't think that ARC tests "system 2 reasoning" and "solving novel tasks", at least in humans. When I see a simple task, I literally match patterns; when I see a complex task, I run whatever patterns I can invent until they match. I didn't run the entire ARC testing dataset, but if I am good at solving it, it will be because I am a fan of Myst-esque games and, actually, there are not so many possible principles for designing problems of this sort. What the failure of LLMs to solve ARC is actually telling us is "LLM cognition is very different from human cognition".
3jacquesthibs
They've tested ARC with children and Mechanical Turk workers, and they all seem to do fine despite the average person not being a fan of "Myst-esque games." Do you believe LLMs are just a few OOMs away from solving novel tasks like ARC? What is different that is not explained by what Chollet is saying?
2quetzal_rainbow
By "good at solving" I mean "better than average person". I think the fact that language model are better at predicting next token than humans implies that LLMs have sophisticated text-oriented cognition and saying "LLMs are not capable to solve ARC, therefore, they are less intelligent than children" is equivalent to saying "humans can't take square root of 819381293787, therefore, they are less intelligent than calculator".  My guess that probably we would need to do something non-trivial to scale LLM to superintelligence, but I don't expect that it is necessary to move from general LLM design principles. 
4jacquesthibs
Of course, I acknowledge that LLMs are better at many tasks than children. Those tasks just happen to all be within its training data distribution and not on things that are outside of it. So, no, you wouldn't say the calculator is more intelligent than the child, but you might say that it has an internal program that allows it to be faster and more accurate than a child. LLMs have such programs they can use via pattern-matching too, as long as it falls into the training data distribution (in the case of Caesar cypher, apparently it doesn't do so well for number nine – because it's simply less common in its training data distribution). One thing that Chollet does mention that helps to alleviate the limitation of deep learning is to have some form of active inference:
2quetzal_rainbow
Let's start with the end: Why do you think that they don't already do that?

My point is that children can solve ARC not because they have some amazing abstract spherical-in-vacuum reasoning abilities which LLMs lack, but because they have human-specific pattern recognition ability (like geometric shapes, number sequences, music, etc). Brains have strong inductive biases, after all. If you train a model purely on the prediction of a non-anthropogenic physical environment, I think this model will struggle with solving ARC even if it has a sophisticated multi-level physical model of reality, because regular ARC-style repeating shapes are not very probable on priors.

In my impression, in debates about ARC, AI people do not demonstrate a very high level of deliberation. Chollet and those who agree with him are like "nah, LLMs are nothing impressive, just interpolation databases!" and LLM enthusiasts are like "scaling will solve everything!!!!111!" Not many people seem to consider "something interesting is going on here. Maybe we can learn something important about how humans and LLMs work that doesn't fit into simple explanation templates."
1eggsyntax
  Since AFAIK in-context learning functions pretty similarly to fine-tuning (though I haven't looked into this much), it's not clear to me why Chollet sees online fine-tuning as deeply different from few-shot prompting. Certainly few-shot prompting works extremely well for many tasks; maybe it just empirically doesn't help much on this one?
1eggsyntax
  As per "Transformers learn in-context by gradient descent", which Gwern also mentions in the comment that @quetzal_rainbow links here.  
1Morpheus
Looking at how gpt-4 did on the benchmark when I gave it some screenshots, the thing it failed at was the visual "pattern matching" (things completely solved by my system 1) rather than the abstract reasoning.
2jacquesthibs
Yes, the point is that it can’t pattern match because it has never seen such examples. And, as humans, we are able to do well on the task because we don’t simply rely on pattern matching, we use system 2 reasoning (in addition) to do well on such a novel task. Given that the deep learning model relies on pattern matching, it can’t do the task.
3Morpheus
I think humans just have a better visual cortex and expect this benchmark too to just fall with scale.
2jacquesthibs
As Chollet says in the podcast, we will see if multimodal models crack ARC in the next year; but if they turn out to be incapable of doing so, I think researchers should start paying attention rather than dismissing it. But for now, “LLMs do fine with processing ARC-like data by simply fine-tuning an LLM on subsets of the task and then testing it on small variations.” It encodes solution programs just fine for tasks it has seen before. It doesn’t seem to be an issue of parsing the input or figuring out the program. For ARC, you need to synthesize a new solution program on the fly for each new task.
1Morpheus
Would it change your mind if gpt-4 were able to do the grid tasks if I manually transcribed them into different tokens? I tried to manually have gpt-4 turn the image into a python array, but it indeed has trouble performing just that task alone.
1Morpheus
For concreteness: in this task it fails to recognize that all of the cells get filled, not only the largest one. To me that gives the impression that the image is just not getting compressed really well and the reasoning gpt-4 is doing is just fine.
1[comment deleted]
1eggsyntax
There are other interesting places where LLMs fail badly at reasoning, eg planning problems like block-world or scheduling meetings between people with availability constraints; see eg this paper & other work from Kambhampati. I've been considering putting some time into this as a research direction; the ML community has a literature on the topic but it doesn't seem to have been discussed much in AIS, although the ARC prize could change that. I think it needs to be considered through a safety lens, since it has significant impacts on the plausibility of short timelines to drop-in-researcher like @leopold's. I have an initial sketch of such a direction here, combining lit review & experimentation. Feedback welcomed! (if in fact someone already has looked at this issue through an AIS lens, I'd love to know about it!)
1Morpheus
I don't get it. I just looked at ARC and it seemed obvious that gpt-4/gpt-4o can easily solve these problems by writing python. Then I looked it up on papers-with-code and it seems close to solved? Probably the ones remaining would be hard for children also. Did the benchmark leak into the training data and that is why they don't count them?
3dmz
Unfortunate name collision: you're looking at numbers on the AI2 Reasoning Challenge, not Chollet's Abstraction & Reasoning Corpus.
1Morpheus
Thanks for clarifying! I just tried a few simple ones by prompting gpt-4o and gpt-4, and it does an absolutely horrific job! Maybe trying actually good prompting could help solve it, but this is definitely already an update for me!

Attempt to explain why I think AI systems are not the same thing as a library card when it comes to bio-risk.

To focus on a less extreme example, I’ll be ignoring the case where AI can create new, more powerful pathogens faster than we can create defences, though I think this is an important case (some people just don’t find it plausible because it relies on the assumption that AIs will be able to create new knowledge).

I think AI Safety people should make more of an effort to walk through the threat model, so I’ll give an initial quick first try:

1) Library. If I’m a terrorist and I want to build a bioweapon, I have to spend several months reading books at minimum to understand how it all works. I don’t have any experts on-hand to explain how to do it step-by-step. I have to figure out which books to read and in what sequence. I have to look up external sources to figure out where I can buy specific materials.

Then, I have to somehow find out how to gain access to those materials (this is the most difficult part for each case). Once I gain access to the materials, I still need to figure out how to make things work as a total noob at creating bioweapons. I will fail. Even experts fa

Resharing a short blog post by an OpenAI employee giving his take on why we have 3-5 year AGI timelines (https://nonint.com/2024/06/03/general-intelligence-2024/):

Folks in the field of AI like to make predictions for AGI. I have thoughts, and I’ve always wanted to write them down. Let’s do that.

Since this isn’t something I’ve touched on in the past, I’ll start by doing my best to define what I mean by “general intelligence”: a generally intelligent entity is one that achieves a special synthesis of three things:

  • A way of interacting with and observing a complex environment. Typically this means embodiment: the ability to perceive and interact with the natural world.
  • A robust world model covering the environment. This is the mechanism which allows an entity to perform quick inference with a reasonable accuracy. World models in humans are generally referred to as “intuition”, “fast thinking” or “system 1 thinking”.
  • A mechanism for performing deep introspection on arbitrary topics. This is thought of in many different ways – it is “reasoning”, “slow thinking” or “system 2 thinking”.

If you have these three things, you can build a generally intelligent agent. Here’s how:

First, you se... (read more)

1Jonas Hallgren
I really like this take. I'm kind of "bullish" on active inference as a way to scale existing architectures to AGI, as I think it is more optimised for creating an explicit planning system. Also, funnily enough, Yann LeCun has a paper on his beliefs about the path to AGI, which I think Steve Byrnes has a good post on. It basically says that we need system 2 thinking in the way you said it here. With your argument in mind, he kind of disproves himself to some extent. 😅
1kromem
I agree with a lot of those points, but suspect there may be fundamental limits to planning capabilities related to the unidirectionality of current feed forward networks.

If we look at something even as simple as how a mouse learns to navigate a labyrinth, there's both a learning of the route to the reward but also a learning of how to get back to the start which adjusts according to the evolving learned layout of the former (see paper: https://elifesciences.org/articles/66175 ). I don't see the SotA models doing well at that kind of reverse planning, and expect that nonlinear tasks are going to pose significant agentic challenges until architectures shift to something new.

So it could be 3-5 years to get to AGI depending on hardware and architecture advances, or we might just end up in a sort of weird "bit of both" world where we have models that are beyond expert human level superintelligent in specific scopes but below average in other tasks. But when we finally do get models that in both training and operation exhibit bidirectional generation across large context windows, I think it will only be a very short time until some rather unbelievable goalposts are passed by.

Low-hanging fruit:

Loving this Chrome extension so far: YouTube Summary with ChatGPT & Claude - Chrome Web Store

It adds a button to YouTube videos where, when you click it (or use the keyboard shortcut ctrl + x + x), it opens a new tab with the LLM chat of your choice, pastes the entire transcript into the chat along with a custom message you can add as a template ("Explain the key points."), and then automatically presses enter to get the chat going.

It's pretty easy to get a quick summary of a YouTube video without needing to watch the whole thing and then ask follow-up questions. It seems like an easy way to save time or do a quick survey of many YouTube videos. (I would not have bothered going through the entire "Team 2 | Lo fi Emulation @ Whole Brain Emulation Workshop 2024" talk, so it was nice to get the quick summary.)

I usually like getting a high-level overview of the key points of a talk to have a mental mind map skeleton before I dive into the details.

You can even set up follow-up prompt buttons (which works with ChatGPT but currently does not work with Claude for me), though I'm not sure what I'd use. Maybe something like, "Why is this important to AI alignment?"

The default prom... (read more)

5JaimeRV
I used to use that one but I moved to Sider (https://sider.ai/pricing?trigger=ext_chrome_btm_upgrd), which works on all pages, including YouTube. For papers and articles I have a shortcut that automatically modifies the url (adding the prefix "https://r.jina.ai/") so you get the markdown, and then I use Sider on that. With gpt4o-mini it is almost free. Another nice thing about Sider is that you can write your own prompt templates.
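For anyone who wants the same trick without a browser shortcut, a minimal sketch of the URL-prefix approach described above (my snippet, not JaimeRV's setup):

```python
# Prefixing a page URL with the r.jina.ai reader endpoint returns a markdown
# rendering of the page, which is convenient to paste into an LLM chat.

import urllib.request

def page_as_markdown(url: str) -> str:
    reader_url = "https://r.jina.ai/" + url
    with urllib.request.urlopen(reader_url) as response:
        return response.read().decode("utf-8")

# Example (makes a network call):
# print(page_as_markdown("https://arxiv.org/abs/2405.07987")[:500])
```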
2jacquesthibs
Thanks for sharing, will give it a shot! Edit: Sider seems really great! I wish it could connect to Claude chat (without using credits), so I will probably just use both extensions.

Dario Amodei believes that LLMs/AIs can be aided to self-improve in a similar way to AlphaGo Zero (though LLMs/AIs will benefit from other things too, like scale), where the models can learn by themselves to gain significant capabilities.

The key for him is that Go has a set of rules that the AlphaGo model needs to abide by. These rules allow the model to become superhuman at Go with enough compute.

Dario essentially believes that to reach better capabilities, it will help to develop rules for all the domains we care about and that this will likely be possible for more real-world tasks (not just games like Go).

Therefore, I think the crux here is whether you think it is possible to develop rules for science (physics, chemistry, math, biology) and other domains such that the models can do this sort of self-play to become superhuman at each of the things we care about.
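To make the crux concrete, here is a toy sketch of what "rules enable self-improvement" cashes out to: a generate-verify-train loop in the spirit of AlphaGo Zero's self-play or AlphaGeometry's synthetic proofs (all names below are placeholders, not anyone's actual pipeline):

```python
# Toy sketch of verifier-gated self-improvement: if a domain has a cheap,
# reliable verifier (Go's rules, a proof checker, unit tests), the model can
# generate its own training data and keep only what the verifier accepts.
# `model` and `verifier` are placeholder objects, not a real API.

def self_improvement_round(model, problems, verifier, samples_per_problem=16):
    """One round: sample candidate solutions, keep verified ones, fine-tune."""
    verified = []
    for problem in problems:
        for _ in range(samples_per_problem):
            candidate = model.generate(problem)       # e.g. a proof or a program
            if verifier(problem, candidate):          # the domain's "rules"
                verified.append((problem, candidate))
    model.finetune(verified)                          # learn only from verified data
    return model, len(verified)

# The crux above is whether a `verifier` this reliable exists for messy
# real-world domains (biology, strategy, ...), not just games, math, and code.
```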

So far, we have examples like AlphaGeometry, which relies on our ability to generate many synthetic examples to help the model learn. This makes sense for the geometry use case, but how do we know if this kind of approach will work for the kinds of things we actually care about? For games and geometry, this seems possible, but wha... (read more)

4jacquesthibs
Hey @Zac Hatfield-Dodds, I noticed you are looking for citations; these are the interview bits I came across (and here at 47:31). It's possible I misunderstood him; please correct me if I did!
3Zac Hatfield-Dodds
I don't think any of these amount to a claim that "to reach ASI, we simply need to develop rules for all the domains we care about". Yes, AlphaGo Zero reached superhuman levels on the narrow task of playing Go, and that's a nice demonstration that synthetic data could be useful, but it's not about ASI and there's no claim that this would be either necessary or sufficient. (not going to speculate on object-level details though)
2jacquesthibs
Ok, totally; there's no specific claim about ASI. Will edit the wording.
4Seth Herd
I think this type of autonomous learning is fairly likely to be achieved soon (1-2 years), and it doesn't need to follow exactly AlphaZero's self-play model. The world has rules. Those rules are much more complex and stochastic than games or protein folding. But note that the feedback in Go comes only after something like 200 moves, yet the powerful critic head is able to use that to derive a good local estimate of what's likely a good or bad move. Humans use a similar powerful critic in the dopamine system, working in concert with the cortex's rich world model, to decide what's rewarding long before there's a physical reward or punishment signal. This is one route to autonomous learning for LLM agents.

I don't know if Amodei is focused on base models or hybrid learning systems, and that matters. Or maybe it doesn't. I can think of more human-like ways of autonomous learning in a hybrid system, but a powerful critic may be adequate for self-play even in a base model. Existing RLHF techniques do use a critic - I think it's proximal policy optimization (or DPO?) in the last OpenAI setup they publicly reported. (I haven't looked at Anthropic's RLAIF setup to see if they're using a similar critic portion of the model; I'd guess they are, following OpenAI's success with it.) I'd expect they're experimenting with using small sets of human feedback to leverage self-critique as in RLAIF, making a better critic that makes a better overall model.

Decomposing video into text and then predicting how people behave both physically and emotionally offers two new windows onto the rules of the world. I guess those aren't quite in the self-play domain on their own, but having good predictions of outcomes might allow autonomous learning of agentic actions by taking feedback not from a real or simulated world, but from that trained predictor of physical and social outcomes. Deriving a feedback signal directly from the world can be done in many ways. I expect there are more clever ide
2Seth Herd
Glancing back at this, I noted I missed the most obvious form of self-play: putting an agent in an interaction with another copy of itself. You could do any sort of "scoring" by having an automated evaluation of the outcome vs. the current goal. This has some obvious downsides, in that the agents aren't the same as people. But it might get you a good bit of extra training that predicting static datasets doesn't give. A little interaction with real humans might be the cherry on top of the self-play whipped cream on the predictive learning sundae.
3davekasten
I am fairly skeptical that we don't already have something close-enough-to-approximate this if we had access to all the private email logs of the relevant institutions matched to some sort of correlation of "when this led to an outcome" metric (e.g., when was the relevant preprint paper or strategy deck or whatever released)
2RogerDearnaley
Go has rules, and gives you direct and definitive feedback on how well you're doing, but, while a very large space, it isn't open-ended. A lot of the foundation model companies appear to be busily thinking about doing something AlphaZero-inspired in mathematics, which also has rules, and can be arranged to give you direct feedback on how you're doing (there have been recent papers on how to make this more efficient with less human input). The same goes for writing and debugging software. Indeed, models have recently been getting better at math and coding faster than at other topics, suggesting that they're making real progress.

When I watched that Dario interview (the Scandinavian bank one, I assume) my assumption was that Dario was talking about those, but using AlphaGo as a clearer and more widely-familiar example.

Expanding this to other areas seems like it would come next: robotics seems a promising one that also gives you a lot of rapid feedback; science would be fascinating and exciting, but the feedback loops are a lot longer; human interactions (on something like the Character AI platform) seem like another possibility (though the result of that might be models better at human manipulation and/or pillow-talk, which might not be entirely a good thing).