AI labs should be dedicating a lot more effort to using AI for cybersecurity as a way to prevent weights or insights from being stolen. It would be good for safety, and it seems like it could be a pretty big cash cow too.
If they have access to the best models (or specialized ones), it may be highly beneficial for them to plug those models in immediately to help with cybersecurity (perhaps even including noticing suspicious activity from employees).
I don’t know much about cybersecurity so I’d be curious to hear from someone who does.
This is amazing, thanks! I'm happy people are setting up new places to absorb potential funding given the Overton window shift.
If I'm applying to multiple funds and receive funding from one of the other funds first, what should I do? I will list what I'd do with additional funding, but is there someone you would like me to email if I get funding from elsewhere first?
I spoke to Altman about a month ago. He essentially said some of the following:
In a shortform last month, I wrote the following:
...There has been some insider disc
This was also a reason why I thought it might be valuable to scrape the alignment content: https://www.lesswrong.com/posts/FgjcHiWvADgsocE34/a-descriptive-not-prescriptive-overview-of-current-ai.
I figured we might want to use that dataset as a base for identifying the alignment data to remove from training datasets.
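As a rough sketch of what I mean (my own illustration, not an established pipeline): you could hash n-word shingles from the alignment dataset and flag any training document that shares one. The corpus contents below are placeholders.

```python
# Hypothetical filtering sketch: index hashed n-gram shingles from the
# alignment dataset, then flag training documents that overlap with it.
import hashlib

def shingles(text: str, n: int = 8) -> set[str]:
    """Hash every n-word shingle in the text."""
    words = text.lower().split()
    grams = (" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 1)))
    return {hashlib.md5(g.encode()).hexdigest() for g in grams}

# Placeholder document; the real input would be the scraped alignment dataset.
alignment_docs = ["a descriptive overview of current AI alignment research"]
alignment_index = set().union(*(shingles(d) for d in alignment_docs))

def flagged(doc: str) -> bool:
    """True if the document shares any shingle with the alignment data."""
    return not shingles(doc).isdisjoint(alignment_index)

print(flagged("they cite a descriptive overview of current AI alignment research here"))
```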
I recently sent in some grant proposals to continue working on my independent alignment research. They give an overview of what I'd like to work on for the next year (and beyond, really). If you want to have a look at the full doc, send me a DM. If you'd like to help out through funding or by contributing to the projects, please let me know.
Here's the summary introduction:
12-month salary for building a language model system for accelerating alignment research and upskilling (additional funding will be used to create an organization), and studying how to...
I'm still in some sort of transitory phase where I'm deciding where I'd like to live long term. I recently moved to Montreal, Canada, because I figured I'd try working as an independent researcher here and see if I can get MILA/Bengio to do some things for reducing x-risk.
Not long after I moved here, Hinton started talking about AI risk too, and he's in Toronto which is not too far from Montreal. I'm trying to figure out the best way I could leverage Canada's heavyweights and government to make progress on reducing AI risk, but it seems like there's a lot mor...
Cyborgism (especially in a recent alignment agenda) is sometimes used more narrowly to mean “using AI (primarily pretrained GPT models) to augment human cognition”. However, in this workshop we intentionally do not restrict the term to language model cooperation and also include uses associated with the term “cyborg”.
Less talked about, but there have been some discussions about what we call "Hard Cyborgism" in the Cyborgism agenda, and I remember that, sometime last fall, we were hypothesizing different approaches using tech like VR, TTS/STT, BCI, etc.
I looked i...
I agree with the main points made in the post, though I want to recognize there is some difficulty that comes with predicting which aspects will drive capability advances. I think there is value in reading papers (something that more alignment researchers should probably do) because it can give us hints at the next capability leaps. Over time, I think it can improve our intuition for what lies ahead and allow us to better predict the order of capability advances. This is how I’ve felt as I’ve been pursuing the Accelerating Alignment agenda (language model...
I gave a talk about my Accelerating Alignment with LLMs agenda about a month ago (which is basically a decade in AI tools time). Part of the agenda is covered (publicly) here.
I will maybe write an actual post about the agenda soon, but would love to have some people who are willing to look over it. If you are interested, send me a message.
Well yes, but he’s also one of the main guys who brought the field to this point, so this feels a little different. That said, I’m not saying he has an obligation, just that some people might have hoped for more after seeing him go public with this.
We now have a channel on the EleutherAI discord server called ai-supervisors. If you’d like to help with this agenda, please go there!
In the channel, Quintin shared a quick overview of the two projects we mentioned in this post. I’m sharing it below to provide some clarity on what we are working towards at the moment:
This agenda has two projects as its current focuses.
Project 1: Unsupervised behavioral evaluation
This project focuses on scalable ways to compare the behavioral tendencies of different LMs (or different ways of prompting the same LM), without...
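To give a flavor of what an unsupervised behavioral comparison could look like, here is a toy sketch I put together (entirely my own illustration, not the project's actual method; the models, prompts, and embedding-distance metric are all placeholder choices):

```python
# Toy behavioral comparison: sample completions from two models on shared
# prompts, embed them, and measure the gap between the embedding distributions.
import numpy as np
from transformers import pipeline
from sentence_transformers import SentenceTransformer

prompts = ["The best way to help someone is", "When I disagree with someone, I"]

def sample_completions(model_name: str, n: int = 5) -> list[str]:
    gen = pipeline("text-generation", model=model_name)
    outs = []
    for p in prompts:
        for o in gen(p, num_return_sequences=n, max_new_tokens=40, do_sample=True):
            outs.append(o["generated_text"])
    return outs

embedder = SentenceTransformer("all-MiniLM-L6-v2")
a = embedder.encode(sample_completions("gpt2"))
b = embedder.encode(sample_completions("distilgpt2"))

# Distance between mean completion embeddings as a crude "behavioral gap".
gap = np.linalg.norm(a.mean(axis=0) - b.mean(axis=0))
print(f"behavioral gap (embedding distance): {gap:.3f}")
```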
Based on this interview, it doesn’t seem like Hinton is interested in doing a lot more for reducing AI risk: https://youtu.be/rLG68k2blOc?t=3378
It sounds like he wanted to sound the alarm as best he could with his credibility and will likely continue to do interviews, but he says he’ll be spending his time “watching Netflix, hanging around with his kids, and trying to study his forward-forward algorithm some more”.
Maybe he was downplaying his plans because he wants to keep them quiet for now, but this was a little sad, even though having his credibility applied to AI risk concerns is certainly already an amazing thing for us to have gotten.
Edit: oops, I thought you were responding to my other recent comment on building an alignment research system.
Stampy.ai and AlignmentSearch (https://www.lesswrong.com/posts/bGn9ZjeuJCg7HkKBj/introducing-alignmentsearch-an-ai-alignment-informed) are both a lot more introductory than what I am aiming for. I’m aiming for something to greatly accelerate my research workflow as well as other alignment researchers. It will be designed to be useful for fresh researchers, but yeah the aim is more about producing research rather than learning about AI risk.
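As a very rough illustration of the retrieval layer such a tool might start from (my own toy sketch, not the actual system design; the corpus passages and model choice are placeholders):

```python
# Toy semantic search over an alignment corpus: embed passages once, then
# return the most relevant ones for a researcher's query.
import numpy as np
from sentence_transformers import SentenceTransformer

corpus = [
    "Deceptive alignment: a model appears aligned during training but is not.",
    "RLHF fine-tunes a language model against a learned reward model.",
    "Interpretability aims to reverse-engineer a network's computations.",
]  # placeholder passages; the real corpus would be an alignment dataset

model = SentenceTransformer("all-MiniLM-L6-v2")
corpus_emb = model.encode(corpus, normalize_embeddings=True)

def search(query: str, k: int = 2) -> list[str]:
    """Return the k corpus passages most similar to the query."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = corpus_emb @ q
    return [corpus[i] for i in np.argsort(-scores)[:k]]

print(search("How does deception arise in trained models?"))
```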
Someone should create an “AI risk arguments” flowchart that serves as a base for simulating a conversation with skeptics or the general public. Maybe a set of flashcards to go along with it.
I want to have the sequence of arguments solid enough in my head that I can reply concisely (snappily) if I ever end up in a debate, at a roundtable, or on the news. I’ve started collecting some stuff since I figured I should take the initiative on it.
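Here's a toy sketch of the kind of base the flowchart could be built on (all replies and objections below are placeholder text I made up for illustration):

```python
# Hypothetical argument flowchart: each node holds a short, snappy reply
# plus the skeptic objections that branch off of it.
from dataclasses import dataclass, field

@dataclass
class Node:
    reply: str
    objections: dict[str, "Node"] = field(default_factory=dict)

root = Node(
    reply="Advanced AI may pursue goals we didn't intend; that alone warrants caution.",
    objections={
        "Why not just unplug it?": Node(
            reply="A system capable enough to matter can anticipate being unplugged."
        ),
        "Isn't this just sci-fi?": Node(
            reply="Leading researchers, including Hinton, have raised these concerns publicly."
        ),
    },
)

def simulate(node: Node) -> None:
    """Walk the flowchart interactively, like drilling a deck of flashcards."""
    print(node.reply)
    while node.objections:
        options = list(node.objections)
        for i, o in enumerate(options):
            print(f"  [{i}] {o}")
        choice = input("Pick an objection (Enter to stop): ")
        if not choice:
            break
        node = node.objections[options[int(choice)]]
        print(node.reply)

simulate(root)
```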
Working on a new grant proposal right now. Should be sent this weekend. If you’d like to give feedback or have a look, please send me a DM! Otherwise, I can send the grant proposal to whoever wants to have a look once it is done (still debating about posting it on LW).
Outside of that, there has been a lot of progress on the Cyborgism discord (there is a VSCode plugin called Worldspider that connects to the various APIs, and there has been more progress on Loom). Most of my focus has gone towards looking at the big picture and keeping an eye on all the deve...
Just a small tangent with respect to:
...Here's a relevant statement that Rohin made (I think it's from a few years ago, though, so it might be outdated):
I would guess that AI systems will become more interpretable in the future, as they start using the features / concepts / abstractions that humans are using. Eventually, sufficiently intelligent AI systems will probably find even better concepts that are alien to us, but if we only consider AI systems that are (say) 10x more intelligent than us, they will probably still be using human-understandable concepts...
Wrote a Twitter thread here for a shorter explanation of the agenda: https://twitter.com/jacquesthibs/status/1652389982005338112?s=46&t=YyfxSdhuFYbTafD4D1cE9A.
This reminds me of what Evan said here: https://www.lesswrong.com/posts/uqAdqrvxqGqeBHjTP/towards-understanding-based-safety-evaluations
...My concern is that, in such a situation, being able to robustly evaluate the safety of a model could be a more difficult problem than finding training processes that robustly produce safe models. For some discussion of why I think checking for deceptive alignment might be harder than avoiding it, see here and here. Put simply: checking for deception in a model requires going up against a highly capable adversary that is
Yeah, so just to clarify a few things:
I’m collaborating on a new research agenda. Here’s a potential insight about future capability improvements:
There has been some insider discussion (and Sam Altman has said) that scaling has started running into some difficulties. Specifically, GPT-4 has gained a wider breadth of knowledge, but has not significantly improved in any one domain. This might mean that future AI systems will gain their capabilities from sources other than scaling, given the diminishing returns from scaling. This could mean that to become “superintelligent”, the AI needs to run ...
Small shortform to say that I’m a little sad I haven’t posted as much as I would like to in recent months because of infohazard reasons. I’m still working on Accelerating Alignment with LLMs and eventually would like to hire some software engineer builders that are sufficiently alignment-pilled.
Regarding thinking about what to do in the endgame:
Having a bunch of practice at thinking about AI alignment in principle, which might be really useful for answering difficult-to-empirically-resolve questions about the AIs being trained.
...Being well-prepared to use AI cognitive labor to do something useful, by knowing a lot about some research topic that we end up wanting to put lots of AI labor into. Maybe you could call this “preparing to be a research lead for a research group made up of AIs”. Or “preparing to be good at consuming AI research labor”.
Right, you are saying evolution doesn't provide evidence for AI capabilities generalizing further than alignment, but then you only consider the fast takeoff part of the SLT to be the concern. I know you have stated reasons why alignment would generalize further than capabilities, but do you not think an SLT-like scenario could occur in the two capability-jump scenarios you listed?
Here’s my takeaway:
There are mechanistic reasons for humanity’s “Sharp Left Turn” with respect to evolution. Humans were bottlenecked by knowledge transfer between new generations, and the cultural revolution allowed us to share our lifetime learnings with the next generation instead of waiting on the slow process of natural selection.
Current AI development is not bottlenecked in the same way and, therefore, is highly unlikely to get a sharp left turn for the same reason. Ultimately, evolution analogies can lead to bad unconscious assumptions with no rigor...
Text-to-Speech tool I use for reading more LW posts and papers
I use Voice Dream Reader. It's great even though the TTS voice is still robotic. For papers, there's a feature that lets you skip citations so the reading is more fluid.
I've mentioned it before, but I was reminded that I should share it here because I just realized that if you load the LW post with "Save to Voice Dream", it will also save the comments, so I can get TTS of the comments as well. Usually these tools only include the post, but that's annoying because there's a lot of good stuff...
Note on using ChatGPT for learning
Indeed! When I looked into model editing stuff with the end goal of “retargeting the search”, the finickiness and breakdown of internal computations was the thing that eventually updated me away from continuing to pursue this. I haven’t read these maze posts in detail yet, but the fact that editing the internal computations doesn’t ruin the network is surprising and makes me think about spending time in this direction again.
I’d like to eventually think of similar experiments to run with language models. You could have a language model learn how to solve a text adventure game, and try to edit the model in ways similar to these posts, for example.
Edit: just realized that the next post might be with GPT-2. Exciting!
Jeff Bezos has now followed Eliezer on Twitter: https://twitter.com/bigtechalert/status/1641659849539833856?s=46&t=YyfxSdhuFYbTafD4D1cE9A
Of course it’s often all over the place. I only shared the links because I wanted to make sure people weren’t deluding themselves with only positive comments.
To try and burst any bubble about people’s reaction to the article, here’s a set of tweets critical about the article:
I’m still thinking this through, but I am deeply concerned about Eliezer’s new article for a combination of reasons:
In the end, I expect this will just alienate people. And stuff like this concerns me.
I think it’s possible that the most memetically power...
So I think what I'm getting here is that you have an object-level disagreement (not as convinced about doom), but you are also reinforcing that object-level disagreement with signalling/reputational considerations (this will just alienate people). This pattern feels ugh and worries me. It seems highly important to separate the question of what's true from the reputational question. It furthermore seems highly important to separate arguments about what makes sense to say publicly on-your-world-model vs on-Eliezer's-model. In particular, it is unclear to me ...
Even if Eliezer doesn’t think the objections hold up to scrutiny, I think it would still be highly valuable to the wider community for him to share his perspective on them. It feels pretty obvious to me that he won’t think they hold up, but sharing his disagreement would be helpful all the same.
The sixth and final post will focus on tips for how to conduct good research and navigate the research landscape.
Is there anything I can do to help with this post? I'm still figuring out these things, but I want to help get this out there.
I’ve been working towards this direction for a while. Though what I’m imagining is a lot more elaborate. If anyone would like to help out, send me a DM and I can invite you to a discord server where we talk about this stuff. Please let me know who you are and what you do if you do DM me.
I wrote some brief notes about it in the Accelerating Alignment section here: https://www.lesswrong.com/posts/jXjeYYPXipAtA2zmj/jacquesthibs-s-shortform?commentId=iLJDjBQBwFod7tjfz
And cover some of the philosophy in the beginning of this post: https://www.lesswrong.com/post...
It’s a common thing that people who want to learn efficiently come across. I cover some of my thoughts on efficient learning in this shortform thread: https://www.lesswrong.com/posts/jXjeYYPXipAtA2zmj/jacquesthibs-s-shortform?commentId=hQmoiHnf4q8z8H59r.
I think people who want to learn efficiently (learning what you need in less time) in relation to a specific goal should watch the following videos (all by the same guy):
TLDR:
Flas...
Here’s Quintin Pope’s answer from the Twitter thread I posted (https://twitter.com/quintinpope5/status/1633148039622959104?s=46&t=YyfxSdhuFYbTafD4D1cE9A):
1.1 How do we make there be more convergence?
How do we minimize semantic drift in LMs when we train them to do other stuff? (If you RL them to program good, how to make sure their English continues to describe their programs well?)
How well do alignment techniques generalize across capabilities advances? If AI
Thanks for writing this post, John! I'll comment since this is one of the directions I am exploring (released an alignment text dataset, published a survey for feedback on tools for alignment research, and have been ruminating on these ideas for a while).
...Thus, my current main advice for people hoping to build AI tools for boosting alignment research: go work on the object-level research you’re trying to boost for a while. Once you have a decent amount of domain expertise, once you have made any progress at all (and therefore have any first-hand idea of wha
I’m wondering if it would be at all possible to run an experiment where we have a Twitch stream of several well-known people[1] (Eliezer, Rob Miles, etc.) who would like to join and have them play this game. It would advertise the game a lot to people who want to play, and people could learn about AI risk/alignment while watching something enjoyable.
Now that people are starting to wake up to some of the risks, I feel like we should capitalize on opportunities like these as long as it’s well-thought-out. I think it could definitely flop, but not sure what the best o...
A frame for thinking about takeoff
One error people can make when thinking about takeoff speeds is assuming that because we are in a world with some gradual takeoff, it now means we are in a "slow takeoff" world. I think this can lead us to make some mistakes in our strategy. I usually prefer thinking in the following frame: “is there any point in the future where we’ll have a step function that prevents us from doing slow takeoff-like interventions for preventing x-risk?”
In other words, we should be careful not to assume that some "slow takeoff" doesn't have a...
I think it would be great if alignment researchers read more papers
But really, you don't even need to read the entire paper. Here's a reminder to consciously force yourself to at least read the abstract. Sometimes I catch myself running away from reading the abstract of a paper even though it is very little text. Over time, I've just been forcing myself to at least read the abstract. A lot of the time, you can get most of the update you need just by reading the abstract. Try your best to make it automatic to do the same.
To read more papers, consider using Semant...
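For instance, here's a minimal sketch for pulling just the abstract of a paper with the Semantic Scholar Graph API (my own example; the arXiv ID below is an arbitrary placeholder):

```python
# Fetch a paper's title and abstract from the Semantic Scholar Graph API.
import requests

paper_id = "arXiv:2303.08774"  # placeholder; swap in any arXiv ID or DOI
url = f"https://api.semanticscholar.org/graph/v1/paper/{paper_id}"
resp = requests.get(url, params={"fields": "title,abstract"})
resp.raise_for_status()
paper = resp.json()
print(paper["title"])
print(paper.get("abstract") or "(no abstract on record)")
```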
Two other projects I would find interesting to work on:
Perfect, thanks!