jacquesthibs

I work primarily on AI Alignment. Scroll down to my pinned Shortform for an idea of my current work and who I'd like to collaborate with.

Website: https://jacquesthibodeau.com

Twitter: https://twitter.com/JacquesThibs

GitHub: https://github.com/JayThibs 

LinkedIn: https://www.linkedin.com/in/jacques-thibodeau/ 

Sequences

On Becoming a Great Alignment Researcher (Efficiently)

Comments

I shared the following as a bio for EAG Bay Area 2024. I'm reposting it here in case it reaches someone who wants to chat or collaborate.

Hey! I'm Jacques. I'm an independent technical alignment researcher with a background in physics and experience in government (social innovation, strategic foresight, mental health and energy regulation). Link to Swapcard profile. Twitter/X.

CURRENT WORK

  • Collaborating with Quintin Pope on our Supervising AIs Improving AIs agenda (making automated AI science safe and controllable). The current project involves a new method allowing unsupervised model behaviour evaluations. Our agenda.
  • I'm a research lead in the AI Safety Camp for a project on stable reflectivity (testing models for metacognitive capabilities that impact future training/alignment).
  • Accelerating Alignment: augmenting alignment researchers using AI systems. A relevant talk I gave. Relevant survey post.
  • Other research that currently interests me: multi-polar AI worlds (and how that impacts post-deployment model behaviour), understanding-based interpretability, improving evals, designing safer training setups, interpretable architectures, and limits of current approaches (what would a new paradigm that addresses these limitations look like?).
  • Used to focus more on model editing, rethinking interpretability, causal scrubbing, etc.

TOPICS TO CHAT ABOUT

  • How do you expect AGI/ASI to actually develop (so we can align our research accordingly)? Will scale plateau? I'd like to get feedback on some of my thoughts on this.
  • How can we connect the dots between different approaches? For example, can Influence Functions, Evaluations, Probes (detecting a truthfulness direction), Function/Task Vectors, and Representation Engineering work together to give us a better picture than the sum of their parts?
  • Debate over which agenda actually contributes to solving the core AI x-risk problems.
  • What if the pendulum swings in the other direction, and we never get the benefits of safe AGI? Is open source really as bad as people make it out to be?
  • How can we make something like the d/acc vision (by Vitalik Buterin) happen?
  • How can we design a system that leverages AI to speed up progress on alignment? What would you value the most?
  • What kinds of orgs are missing in the space?

POTENTIAL COLLABORATIONS

  • Examples of projects I'd be interested in: extending either the Weak-to-Strong Generalization paper or the Sleeper Agents paper, understanding the impacts of synthetic data on LLM training, working on ELK-like research for LLMs, running experiments on influence functions (studying the base model and its SFT, RLHF, and iteratively trained counterparts; I heard that Anthropic is releasing code for this "soon"), or studying the interpolation/extrapolation distinction in LLMs.
  • I’m also interested in talking to grantmakers for feedback on some projects I’d like to get funding for.
  • I'm slowly working on a guide for practical research productivity for alignment researchers to tackle low-hanging fruits that can quickly improve productivity in the field. I'd like feedback from people with solid track records and productivity coaches.

TYPES OF PEOPLE I'D LIKE TO COLLABORATE WITH

  • Strong math background, can understand Influence Functions enough to extend the work.
  • Strong machine learning engineering background. Can run ML experiments and fine-tuning runs with ease. Can effectively create data pipelines.
  • Strong application development background. I have various project ideas that could speed up alignment researchers; I'd be able to execute on them much faster with someone who can help me build quickly.

Agreed, but I will find a way.

Hey Ben and Jesse!

This comment is more of a PSA:

I am building a startup focused on making this kind of thing exceptionally easy for AI safety researchers. I've worked as an AI safety researcher for a few years, and I've built an initial prototype that I'm now integrating into AI research workflows. So, with respect to this post, I've been actively working towards a prototype for the “AI research fleets”.

I am actively looking for a CTO I can build with to +10x alignment research in the next 2 years. I’m looking for someone absolutely cracked and it’s fine if they already have a job (I’ll give my pitch and let them decide).

If that’s you or you know anyone who could fill that role (or who I could talk to that might know), then please let me know!

For alignment researchers or people in AI safety research orgs: hit me up if you want to be pinged for beta testing when things are ready.

For orgs, I’d be happy to work with you to set up automations, give a masterclass on the latest AI tools/automation workflows, and maybe provide a custom monthly report (with a video overview) so that you can focus on research rather than trying new tools that might not be relevant to your org.

Additional context:

“When we say “automating alignment research,” we mean a mix of Sakana AI’s AI scientist (specialized for alignment), Transluce’s work on using AI agents for alignment research, test-time compute scaling, and research into using LLMs for coming up with novel AI safety ideas. This kind of work includes empirical alignment (interpretability, unlearning, evals) and conceptual alignment research (agent foundations).

We believe that it is now the right time to take on this project and build this startup because we are nearing the point where AIs could automate parts of research and may be able to do so sooner with the right infrastructure, data, etc.

We intend to study how our organization’s work can integrate with the Safeguarded AI thesis by Davidad.”

I’m currently in London for the month as part of the Catalyze Impact programme.

If interested, send me a message on LessWrong or X or email (thibo.jacques @ gmail dot com).

It has significantly accelerated my ability to build fully functional websites, to the point where it was basically a phase transition between building my org’s website myself and not building it at all (i.e., waiting for someone with web dev experience to do it for me).

I started my website by leveraging the free codebase template he provides on his GitHub and covers in the course.

I mean that it's a trade secret for what I'm personally building, and I would also rather people not use it freely to advance frontier capabilities research.

Is this because it would reveal private/trade-secret information, or is this for another reason?

Yes (all of the above)

Thanks for amplifying. I disagree with Thane on some things they said in that comment, and I don't want to get into the details publicly, but I will say:

  1. It's worth looking at DeepSeek V3 and what they did with a $5.6 million training run (obviously that's still a nontrivial amount, and their CEO says most of the cost of their training runs comes from research talent).
  2. Compute is still a bottleneck (which is why I'm looking to build an AI safety org that can efficiently absorb funding/compute for this), but I think Thane isn't acknowledging that some types of research require much more compute than others (though I agree research taste matters, which is also why DeepSeek's CEO hires cracked researchers; I just don't think it's an insurmountable wall).
  3. "Simultaneously integrating several disjunctive incremental improvements into one SotA training run is likely nontrivial/impossible in the general case." Yes, that seems really hard and a bottleneck... for humans and current AIs.
    1. Imo, AI models will soon become Omega Cracked at infra and at hyper-optimizing training/inference to keep costs down (which seems to be what DeepSeek is insanely good at).

Putting venues aside, I'd like to build software (AI-aided tools, for example) that makes it easier for physics post-docs to onboard to the field and focus on the 'core problems' in ways that prevent recoil as much as possible. One worry I have with 'automated alignment'-type efforts is that they similarly succumb to the streetlight effect, since both models and researchers are biased towards the types of problems you mention. By default, the models will also likely be much better at prosaic-style safety than at the 'core problems'. I would instead like to design software that makes it easier to direct their cognitive labour towards the core problems.

I have many thoughts/ideas about this, but I was wondering if anything comes to mind for you beyond 'dedicated venues' and maybe writing about it.

Hey Logan, thanks for writing this!

We talked about this recently, but for others reading this: given that I'm building an org focused on this kind of work and recently wrote a relevant shortform, please send me a DM if you're interested in either making this happen (I'm looking for a cracked CTO atm and will be entering phase 2 of Catalyze Impact in January) or providing feedback on an internal vision doc.

As a side note, I’m in the process of building an organization (leaning towards a startup). I will be in London in January for phase 2 of the Catalyze Impact program (an incubation program for new AI safety orgs). I'm looking for feedback on a vision doc and still looking for a cracked CTO to co-found with. If you’d like to help out in whatever way, send me a DM!
