jacquesthibs

I work primarily on AI Alignment. Scroll down to my pinned Shortform for an idea of my current work and who I'd like to collaborate with.

Website: https://jacquesthibodeau.com

Twitter: https://twitter.com/JacquesThibs

GitHub: https://github.com/JayThibs 

Sequences

On Becoming a Great Alignment Researcher (Efficiently)

Comments

I shared the following as a bio for EAG Bay Area 2024. I'm sharing it here in case it reaches someone who wants to chat or collaborate.

Hey! I'm Jacques. I'm an independent technical alignment researcher with a background in physics and experience in government (social innovation, strategic foresight, mental health and energy regulation). Link to Swapcard profile. Twitter/X.

CURRENT WORK

  • Collaborating with Quintin Pope on our Supervising AIs Improving AIs agenda (making automated AI science safe and controllable). The current project involves a new method allowing unsupervised model behaviour evaluations. Our agenda.
  • I'm a research lead in the AI Safety Camp for a project on stable reflectivity (testing models for metacognitive capabilities that impact future training/alignment).
  • Accelerating Alignment: augmenting alignment researchers using AI systems. A relevant talk I gave. Relevant survey post.
  • Other research that currently interests me: multi-polar AI worlds (and how that impacts post-deployment model behaviour), understanding-based interpretability, improving evals, designing safer training setups, interpretable architectures, and limits of current approaches (what would a new paradigm that addresses these limitations look like?).
  • I used to focus more on model editing, rethinking interpretability, causal scrubbing, etc.

TOPICS TO CHAT ABOUT

  • How do you expect AGI/ASI to actually develop (so we can align our research accordingly)? Will scale plateau? I'd like to get feedback on some of my thoughts on this.
  • How can we connect the dots between different approaches? For example, can Influence Functions, Evaluations, Probes (detecting truthful direction), Function/Task Vectors, and Representation Engineering work together to give us a better picture than the sum of their parts?
  • Debate over which agenda actually contributes to solving the core AI x-risk problems.
  • What if the pendulum swings in the other direction, and we never get the benefits of safe AGI? Is open source really as bad as people make it out to be?
  • How can we make something like the d/acc vision (by Vitalik Buterin) happen?
  • How can we design a system that leverages AI to speed up progress on alignment? What would you value the most?
  • What kinds of orgs are missing in the space?

POTENTIAL COLLABORATIONS

  • Examples of projects I'd be interested in: extending either the Weak-to-Strong Generalization paper or the Sleeper Agents paper, understanding the impacts of synthetic data on LLM training, working on ELK-like research for LLMs, experiments on influence functions (studying the base model and its SFT, RLHF, and iterative training counterparts; I heard that Anthropic is releasing code for this "soon"), or studying the interpolation/extrapolation distinction in LLMs.
  • I’m also interested in talking to grantmakers for feedback on some projects I’d like to get funding for.
  • I'm slowly working on a practical research productivity guide for alignment researchers that tackles low-hanging fruit that can quickly improve productivity in the field. I'd like feedback from people with solid track records and from productivity coaches.

TYPES OF PEOPLE I'D LIKE TO COLLABORATE WITH

  • Strong math background; understands Influence Functions well enough to extend the work.
  • Strong machine learning engineering background. Can run ML experiments and fine-tuning runs with ease. Can effectively create data pipelines.
  • Strong application development background. I have various project ideas that could speed up alignment researchers, and I could execute them much faster with someone to help me build them.

OpenAI CEO Sam Altman has privately said the company could become a benefit corporation akin to rivals Anthropic and xAI.

Paywalled: https://www.theinformation.com/articles/openai-ceo-says-company-could-become-benefit-corporation-akin-to-rivals-anthropic-xai

"Sam Altman recently told some shareholders that OAI is considering changing its governance structure to a for-profit business that OAI's nonprofit board doesn't control. [...] could open the door to public offering of OAI; may give Altman an opportunity to take a stake in OAI."

Yes, but this is similar to typical startups: it’s a calculated bet. You expect some of the people who try this to fail, but investors hope one of them will be a unicorn.

We had a similar thought:

But yeah, my initial comment was about how to take advantage of nationalization if it does happen in the way Leopold described/implied.

You can do the writing, but if you have a useful product and connections with people inside the agencies, you are in a position where you have spent several years building a team and infrastructure for the purpose of getting pulled into the nationalization project. You likely get most of the value by keeping close ties with people in government while also having built a ready-to-use solution that can keep the government from rushing out a worse version of what you’ve built.

I think it’s important to see AI Safety as a collective effort rather than one person’s decision (of working inside or out of government).

I agree that this would be impactful! I'm mostly thinking about a more holistic approach that assumes you'd have reasonable access to 'the right people' in those government positions. This is similar to the current status quo, where governance people and technical people fill in different gaps.

If anyone would like to discuss this privately, please message me. I'm considering whether to build a startup that tackles the kinds of things I describe above (e.g., monitoring), so I would love to get feedback.

I had this thought yesterday: "If someone believes in the 'AGI lab nationalization by default' story, then what would it look like to build an organization or startup in preparation for this scenario?"

For example, you try to develop projects that would work exceptionally well in a 'nationalization by default' world while not getting as much payoff if you are in a non-nationalization world. The goal here is to do the normal startup thing: risky bets with a potentially huge upside.

I don't necessarily support nationalization and am still trying to think through the upsides and downsides, but I was wondering whether some safety projects become much more viable in such a world: the kinds of things we used to brush off because we assumed we weren't in one.

Some vague directions to perhaps consider: AI security, Control agenda-type stuff, fully centralized monitoring to detect early signs of model drift, compute that is inaccessible unless you are a government-verified user, etc. By building such tech or proposals, you may be much more likely to end up with a seat at the big boy table, whereas you wouldn't in a non-nationalization world. I could be wrong about the specific examples above; I just wanted to give a few quick ones.

saying "LLMs are not capable to solve ARC, therefore, they are less intelligent than children" is equivalent to saying "humans can't take square root of 819381293787, therefore, they are less intelligent than calculator". 

Of course, I acknowledge that LLMs are better than children at many tasks. Those tasks just happen to all fall within their training data distribution rather than outside of it. So, no, you wouldn't say the calculator is more intelligent than the child, but you might say that it has an internal program that allows it to be faster and more accurate than a child. LLMs have such programs they can use via pattern-matching too, as long as the task falls within the training data distribution (in the case of the Caesar cipher, it apparently doesn't do as well with a shift of nine, simply because that shift is less common in its training data).

One thing Chollet does mention that helps alleviate this limitation of deep learning is some form of active inference:

Dwarkesh: Jack Cole with a 240 million parameter model got 35% [on ARC]. Doesn't that suggest that they're on this spectrum that clearly exists within humans, and they're going to be saturated pretty soon?

[...]

Chollet: One thing that's really critical to making the model work at all is test time fine-tuning. By the way, that's something that's really missing from LLM approaches right now. Most of the time when you're using an LLM, it's just doing static inference. The model is frozen. You're just prompting it and getting an answer. The model is not actually learning anything on the fly. Its state is not adapting to the task at hand.

What Jack Cole is actually doing is that for every test problem, it’s on-the-fly fine-tuning a version of the LLM for that task. That's really what's unlocking performance. If you don't do that, you get like 1-2%, something completely negligible. If you do test time fine-tuning and you add a bunch of tricks on top, then you end up with interesting performance numbers.

What it's doing is trying to address one of the key limitations of LLMs today: the lack of active inference. It's actually adding active inference to LLMs. That's working extremely well, actually. So that's fascinating to me.

As Chollet says in the podcast, we will see whether multimodal models crack ARC in the next year, but if they can't, I think researchers should start paying attention rather than dismissing that result.

But for now, “LLMs do fine with processing ARC-like data by simply fine-tuning an LLM on subsets of the task and then testing it on small variation.” It encodes solution programs just fine for tasks it has seen before. It doesn’t seem to be an issue of parsing the input or figuring out the program. For ARC, you need to synthesize a new solution program on the fly for each new task.
