I work primarily on AI Alignment. My main direction at the moment is to accelerate alignment work via language models and interpretability.
Website: https://jacquesthibodeau.com
Twitter: https://twitter.com/JacquesThibs
GitHub: https://github.com/JayThibs
AI labs should be dedicating a lot more effort to using AI for cybersecurity as a way to prevent weights or insights from being stolen. It would be good for safety, and it seems like it could be a pretty big cash cow too.
If they have access to the best models (or specialized ones), it may be highly beneficial for them to plug those models in immediately to help with cybersecurity (perhaps even including noticing suspicious activity from employees).
I don’t know much about cybersecurity so I’d be curious to hear from someone who does.
This is amazing, thanks! I'm happy people are setting up new places to absorb potential funding given the overton window shift.
If I'm applying to multiple funds and receive funding from one of the other funds first, what should I do? I will list what I'd do with additional funding, but is there someone you would like me to email if I get funding from elsewhere first?
I spoke to Altman about a month ago. He essentially said some of the following:
In a shortform last month, I wrote the following:
There has been some insider discussion (and Sam Altman has said) that scaling has started running into some difficulties. Specifically, GPT-4 has gained a wider breadth of knowledge, but has not significantly improved in any one domain. This might mean that future AI systems may gain their capabilities from places other than scaling because of the diminishing returns from scaling. This could mean that to become “superintelligent”, the AI needs to run experiments and learn from the outcome of those experiments to gain more superintelligent capabilities.
So you can imagine the case where capabilities come from some form of active/continual/online learning, but that only becomes possible once models are scaled up enough to gain capabilities in that way. And so, as LLMs become more capable, they will essentially become capable of running their own experiments to gain AlphaFold-like capabilities across many domains.
Of course, this has implications for understanding takeoffs / sharp left turns.
As Max H said, I think once you meet a threshold with a universal interface like a language model, things start to open up and the game changes.
This was also a reason why I thought it might be valuable to scrape the alignment content: https://www.lesswrong.com/posts/FgjcHiWvADgsocE34/a-descriptive-not-prescriptive-overview-of-current-ai.
I figured we might want to use that dataset as a base for removing the data from the dataset.
I recently sent in some grant proposals to continue working on my independent alignment research. They give an overview of what I'd like to work on for this next year (and more, really). If you want to have a look at the full doc, send me a DM. If you'd like to help out through funding or contributing to the projects, please let me know.
Here's the summary introduction:
12-month salary for building a language model system for accelerating alignment research and upskilling (additional funding will be used to create an organization), and studying how to supervise AIs that are improving AIs to ensure stable alignment.
As part of my Accelerating Alignment agenda, I aim to create the best Alignment Research Assistant using a suite of language models (LLMs) to help researchers (like myself) quickly produce better alignment research through an LLM system. The system will be designed to serve as the foundation for the ambitious goal of increasing alignment productivity by 10-100x during crunch time (in the year leading up to existentially dangerous AGI). The goal is to significantly augment current alignment researchers while also providing a system for new researchers to quickly get up to speed on alignment research or promising parts they haven’t engaged with much.
For Supervising AIs Improving AIs, this research agenda focuses on ensuring stable alignment when AIs self-train or train new AIs and studies how AIs may drift through iterative training. We aim to develop methods to ensure automated science processes remain safe and controllable. This form of AI improvement focuses more on data-driven improvements than architectural or scale-driven ones.
I’m seeking funding to continue my work as an independent alignment researcher and intend to work on what I’ve just described. However, to best achieve the project’s goal, I would want additional funding to scale up the efforts for Accelerating Alignment to develop a better system faster with the help of engineers so that I can focus on the meta-level and vision for that agenda. This would allow me to spread myself less thin and focus on my comparative advantages. If you would like to hop on a call to discuss this funding proposal in more detail, please message me. I am open to refocusing the proposal or extending the funding.
I'm still in some sort of transitory phase where I'm deciding where I'd like to live long term. I recently moved to Montreal, Canada because I figured I'd try working as an independent researcher here and see if I can get MILA/Bengio to do some things for reducing x-risk.
Not long after I moved here, Hinton started talking about AI risk too, and he's in Toronto which is not too far from Montreal. I'm trying to figure out the best way I could leverage Canada's heavyweights and government to make progress on reducing AI risk, but it seems like there's a lot more opportunity than there was before.
This area is also not too far from Boston and NYC, which have a few alignment researchers of their own. It's barely a day's drive away. For me personally, there's the added benefit that it is also just a day's drive away from my home (where my parents live).
Montreal/Toronto is also a nice time zone since you can still work a few hours with London people, and a few hours with Bay Area people.
That said, it's obvious that not many alignment researchers are here, and most eventually end up at one of the two main hubs.
When I spent time at both hubs last year, I think I preferred London. And now London is getting more attention than I was expecting:
It's not clear how things will evolve going forward, but I still have things to think about. If I decide to go to London, I can get a Youth Mobility visa for 2 years (I have 2 months to decide) and work independently...but I'm also considering building an org for Accelerating Alignment and I'm not sure if I could get that set up in London.
I think there is value in being in person, but I think that value can fade over time as an independent researcher. You just end up in a routine, stop talking to as many people, and just work. That's why, for now, I'm trying to aim for some kind of hybrid where I spend ~2 months per year at the hubs to benefit from being there in person. And maybe 1-2 work retreats. Not sure what I'll do if I end up building an org.
Cyborgism (especially in a recent alignment agenda) is sometimes used more narrowly to mean “using AI (primarily pretrained GPT models) to augment human cognition”. However, in this workshop we intentionally do not restrict the term to language model cooperation and also include uses associated with the term “cyborg”.
Less talked about, but there have been some discussions about what we call "Hard Cyborgism" in the Cyborgism agenda and I remember we were hypothesizing different approaches to use tech like VR, TTS/STT, BCI, etc. sometime last fall.
I looked into hard cyborgism very briefly several months ago and concluded that I wouldn't be able to make much progress on it given my expected timelines.
A friend of mine recommended this book to me: Silicon Dreams: Information, Man, and Machine by Robert Lucky. I haven't read it yet (though I have a PDF if you want it), but here's what he said about it:
It's a wonderful book not just because it's about the fundamental limits of HCI explored/bounded using information theory. It will help you develop an understanding of the Hard Cyborgism approach in a more rigorous way. He also asks directly if AI can help. In 1989. The dude's career was in compression codecs, so it's a rare Hutterpilled GOFAI book. He states the Hutter thesis before Hutter even.
(Note: I have not read this entire post yet.)
Projects I'd like to work on in 2023.
Wrote up a short (incomplete) bullet point list of the projects I'd like to work on in 2023: