This is great! I agree with a lot of what you're saying here and am glad someone is writing these ideas up. Two points of possible disagreement (or misunderstanding on my part perhaps) are:
Highly competitive, pragmatic, no-nonsense culture
I think that competitiveness can certainly be helpful, but too much can be detrimental. Specifically, I think competitiveness needs to go hand-in-hand with cooperation and transparency. Work needs to be shared, and projects need to encompass groups of people. Trying to get the most 'effective research points' from amongst... (read more)
I'm potentially interested in the Research Engineer position on the Alignment Team, but I'm currently 3 months into a 6-month grant from the LTFF to reorient my career from general machine learning to AI safety specifically. My current plan is to keep doing solo work until the last month of my grant period, and then begin applying to AI safety work at places like Anthropic, Redwood Research, OpenAI, and DeepMind.
Do you think there's a significant advantage to applying soon vs 3 months from now?
Seems to me that what CLIP needs is a secondary training regime, where it gets an image generated as a 2D render of a 3D scene from a simulator which can also generate one correct and several incorrect captions. Like: red vase on a brown table (correct), blue vase on a brown table (incorrect), red vase on a green table (incorrect), red vase under a brown table (incorrect). Then do the CLIP training with the text set deliberately including these inappropriate text samples in addition to the usual random incorrect caption samples. I saw... (read more)
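For concreteness, here's a minimal sketch of how that hard-negative contrastive objective might look, assuming the image encoder, caption encoder, and the caption-generating simulator already exist (my own illustration, not CLIP's actual training code):

```python
# Minimal sketch: contrastive loss where each rendered image is paired with one
# correct caption plus several deliberately-wrong, simulator-generated captions.
import torch
import torch.nn.functional as F

def hard_negative_clip_loss(image_emb, caption_embs, temperature=0.07):
    """
    image_emb:    (batch, dim) embeddings of the rendered scenes
    caption_embs: (batch, 1 + n_neg, dim) caption embeddings; index 0 is the
                  correct caption ("red vase on a brown table"), the rest are
                  the incorrect ones ("blue vase on a brown table", ...)
    """
    image_emb = F.normalize(image_emb, dim=-1)
    caption_embs = F.normalize(caption_embs, dim=-1)

    # Similarity of each image to its own candidate captions: (batch, 1 + n_neg)
    logits = torch.einsum("bd,bkd->bk", image_emb, caption_embs) / temperature

    # The correct caption always sits at index 0, so the target class is 0.
    targets = torch.zeros(image_emb.shape[0], dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, targets)
```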
I understand much better now what you were saying. Thanks for clarifying.
My hope for my personal work is that nibbling away at the mystery with 'prosaic engineering work' will make the problem clearer, such that profound insights become easier to generate. I think a good heuristic in science generally is: when no clear theory exists and gathering more data is an option, go gather the data. Also, use engineering to build better tools with which to gather new data.
Question for clarification: I'm confused by what you mean when you say that the null hypothesis is always false. My understanding is that the null hypothesis is generally "these two sets of data were actually drawn from the same distribution", and the alternative hypothesis is generally "these two sets of data are drawn from different distributions, because the variable of interest which differs between them is in the causal path of a distributional change". Do you have a different way of viewing the null hypothesis in mind? (I totally agree with your conclusion that likelihood ratios should be presented along with p-values, and ideally other details about the data as well.)
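To make the distinction concrete, here's a toy comparison of a p-value and a likelihood ratio on the same made-up data (my own hypothetical numbers, just for illustration):

```python
# Toy example: a coin lands heads 60 times out of 100 flips.
from scipy import stats

heads, flips = 60, 100

# p-value under the null hypothesis of a fair coin
p_value = stats.binomtest(heads, flips, p=0.5).pvalue

# Likelihood ratio: how much better a specific alternative (p=0.6) explains
# the data than the null (p=0.5) does
likelihood_ratio = stats.binom.pmf(heads, flips, 0.6) / stats.binom.pmf(heads, flips, 0.5)

print(f"p-value vs fair coin: {p_value:.3f}")
print(f"likelihood ratio (p=0.6 vs p=0.5): {likelihood_ratio:.1f}")
```

The p-value only says how surprising the data would be under the null, while the likelihood ratio says how strongly the data favor one specific hypothesis over another.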
That's been tried a few times with some mixed successes and some outright failures. Something that hasn't been thoroughly tried (to my knowledge) is actually making a distributed governance system within / on top of existing governments. We can't subtract laws or taxes or currencies, but we can add them. What if we tried making a distributed government with its own currency, governance system, and some additional laws? It's a way to test out some theories in a low-risk, low-startup-cost scenario, and could be fun. If the government starts out with a small budget per member and limited responsibilities, it could be pretty low-pressure to get right on the first try.
Well, I don't agree with the underlying argument, but to try to steelman it I'd say something like: Markets are pretty efficient at causing economic growth, causing us to 'grow the pie' (more total absolute wealth for everyone, as opposed to 'sharing the pie' which refers to distributing wealth more evenly), and moving resources around to where they are most needed to allow large complex profit-seeking systems to function. All those little decisions made by selfish semi-rational actors add up to much better decision-making about profitable uses of resource... (read more)
Epistocracy as described by Jason Brennan is similar to things I've been thinking about. Also sortition, which is related but different. Here are some of my thoughts:
If we have a system of 'extrapolating volition' of segments of the population by taking representative samples of people and paying them generously to learn and think and debate on a key issue for a couple of weeks, then give their best full answer as to what should be done... And then also quiz them on their factual understanding of relevant systems, and register their predictions about outco... (read more)
Others have mentioned Jason Brennan's work, but I wanted to specifically add this comment he made about practical initial trials of a first step towards epistocracy in his effective altruism AMA.
Are there any small-scale experiments with Epistocracy you think that countries or other jurisdictions should try as a first stab at testing this form of government? What would you like to see and where?
I'd like to try enlightened preference voting in Denmark or New Hampshire.
How it works:
1. Everyone votes for their p
As someone who was recently the main person doing machine learning research & development in a 'data driven company', I can confirm that we were working as hard as we could to replace human decision making with machine learning models on every level of the business that we could. It worked better, made more money, more reliably, with fewer hours of human work input. Over the years I was there we gradually scaled down the workforce of human decision makers and scaled up the applications of ML and each step along that path was clearly a profit win. Money... (read more)
You get most of your learning from experiences? I sure don't. I get most of mine from reading, and I expect an AGI even close to human-level will also be able to learn from the logical abstractions of the books it reads. I think what you're saying would be true if we agreed not to train AI models on text, but only on things like toy physical models. But currently, we're feeding in tons of data meant to educate humans about the world, like textbooks on every subject and scientific papers, and all of Wikipedia, and personal stories from Reddit and... everyt... (read more)
I think that by the time we have a moderately-superhuman agentive unaligned AGI, we're in deep trouble. I think it's more interesting and useful to stay focused on the period leading up to that, which we must pass through to get to the deep trouble.
Particularly, I am hopeful that there will be some level of sub-human AGI (either sub-human intelligence or sub-human-speed) which tries some of the things we predict might happen, like a deceptive turn, but underestimates us, and we catch it in time. Or we notice the first crazy weird bad thing happen, and ... (read more)
I think any AI sufficiently good at world modeling to be much of a threat is also going to be quite good at learning from abstractions made by others. In other words, it can just read a history book to learn about political change; it doesn't need years of 'personal' experience. It's pretty standard nowadays to give models training data encompassing a ton of detail about the world. Just because they currently don't grok the physics textbooks, medical papers, and Wikipedia entries doesn't mean that limitation will remain true.
Yeah, I think the idea needs some careful thought to flesh it out a bit more. If voting is not anonymous and scheduled, then it's too easy for coercion and such to enter the picture. On the plus side, once you do have something solid enough to test, you can start with local elections. For instance, look at the Center for Election Science and their work on converting local elections to approval voting.
I think I am more optimistic than most about the idea that experimental evidence and iterative engineering of solutions will become increasingly effective as we approach AGI-level tech. I place most of my hope for the future on a sprint of productivity around alignment near the critical juncture. I think it makes sense that this transition period between theory and experiment feels really scary to theorists like Eliezer. To me, a much more experimentally minded person, it feels like a mix of exciting tractability and looming danger. To put it poetically... (read more)
The call to world leaders to update their conceptions of the strategic landscape based on a game-changing new technology seems incredibly relevant to the impending development of AGI. And also, in a sadly retro way, relevant to the resurgence of nuclear world war risk from Russia's recent aggression.
I think this is a nice idea, especially if it's just a karma-sorting effect, and it's either author-only or implements the author-can-undo system mentioned by Viliam. But actually, I came here to say that I'm even more excited for a feature which is nearly the opposite: I want comments to be able to be tagged by the author as 'ToDo'. This would let readers know that the author has seen the comment and intends to respond, and the author could have a 'ToDo' page listing their flagged comments either chronologically or by karma. You could even allow readers to up/down vote the ToDo flag to help the author prioritize their responses.
Ok, I want to say thank you for this comment because it contains a lot of points I strongly agree with. I think the alignment community needs experimental data now more than it needs more theory. However, I don't think this lowers my opinion of MIRI. MIRI, and Eliezer before MIRI even existed, was predicting this problem accurately and convincingly enough that people like myself updated. 15 years ago I began studying neuroscience, neuromorphic computing, and machine learning because I believed this was going to become a much bigger deal than it was the... (read more)
I'm really excited for the outcomes you describe: more relentlessly resourceful independent researchers exploring a wider range of options. I do feel a bit concerned that your search for good applicants is up against a challenge. I think that both the intelligence necessary to produce good results and the agentiveness needed to become relentlessly resourceful with training are rare and largely determined early in life. And I think that, given this, a lot of such people will already be quite absorbed in profitable paths by the ... (read more)
Ah, gotcha. I predict yes, with quite high confidence (like 95%), for 12 OOMs and using the Blue Brain Project. The others I place only small confidence in (maybe 5% each). I really think the BBP has enough detail in its model to make something very like a human neocortex, and capable of being an AGI, if scaled up.
I wrote this specifically aimed at the case of "in the thought experiment where humanity got 12 orders of magnitude more compute this year, what would happen in the next 12 months?" I liked the post that Daniel wrote about that and wanted to expand on it. My claim is that even if everything mentioned in that post was tried and failed, there would still be these things to try. They are algorithms which already exist, and which could be scaled up if we suddenly had an absurd amount of compute. Not all arguments about why standard approaches like Transformers fail also apply to these alternate approaches.
I don't think they are better representations of general intelligence. I'm quite confident that much better representations of general intelligence exist and just have yet to be discovered. I'm just saying that these are closer to a proven path, and although they are inefficient and unwise, somebody would likely follow these paths if suddenly given huge amounts of compute this year. And in that imaginary scenario, I predict they'd be pretty effective.
My reasoning for saying this for the Blue Brain Project is that I've read a lot of their research papers, a... (read more)
In a similar conversation about non-main-actor paths to dangerous AI, I came up with this example of a path I can imagine being both plausible and dangerous. A plausible-to-me worst-case scenario would be something like:
A phone-scam organization employs someone to build them an online-learning reinforcement learning agent (using an open-source language model as a language-understanding component) that functions as a scam-helper. It takes in the live transcription of the ongoing conversation between a scammer and a victim, and gives the scammer suggestions f... (read more)
I was using flp as an abbreviation. And I'll read Joe Carlsmith's report and then let you know what I think.
edit: Oh yeah, and one thing to keep in mind is that these are estimates for if we suddenly had a shockingly big jump in the amount of compute (12 orders of magnitude) but no time to develop or improve existing algorithms. So my estimates for 'what could a reasonably well-engineered algorithm, that had been tested and iterated on at scale, do?' would be much, much lower. This is a stupidly wasteful upper bound.
Yes, I'll fix that link [edit: fixed]. I have not yet thought hard about failure modes and probabilities for these cases. I can work on that and let you know what I come up with.
Yes, thanks for catching that.
edit: fixed
I think spiking neural nets are at least 1 OOM, probably more like 2 OOMs, more compute-intensive to train; similarly effective; and somewhat more efficient at learning from data. I think Numenta's approach is probably even harder to train and even more data-efficient. I can certainly test these hypotheses at small scale. I'll let you know what I find.
Blue Brain does actually have a human brain model waiting in the wings, it just tries to avoid mentioning that. A media-image management thing. I spent the day digging into your question about OOMs, and now have much more refined estimates. Here's my post: https://www.lesswrong.com/posts/5Ae8rcYjWAe6zfdQs/what-more-compute-does-for-brain-like-models-response-to
Ok, as a former neuroscientist who has spent a lot of years (albeit not recent ones) geeking out about, downloading, and playing with various neural models, I'd like to add to this discussion. First, the worm stuff seems overly detailed and focused on recreating the exact behavior rather than 'sorta kinda working like a brain should'. A closer, more interesting project to look at (but still overly specific) is the Blue Brain Project [https://www.epfl.ch/research/domains/bluebrain/]. Could that work with 12 more OOMs of compute? I feel quite confident... (read more)
Absolutely. In fact, I think the critical impediment to machine learning being able to learn more useful things from the currently amassed neuroscience knowledge is: "but which of these many complicated bits are even worth including in the model?" There's just too much, and so much of it is noise, or understood so incompletely that our models of it are worse than useless.
Since my goal is to convince people that I take my beliefs seriously, and this amount of money is not actually going to change much about how I conduct the next three years of my life, I'm not worried about the details. Also, I'm not betting that there will be a FOOM scenario by the conclusion of the bet, just that we'll have made frightening progress towards one.
Oh man, I spent so many years of grad school looking up these acronyms and handwriting their full names into the papers I was reading, until I memorized enough of them. The acronym soup is silly, as is the formal paper language, which ends up obscuring the true confidence level of the observations. So much overstatement of limited evidence... Still, I like this stuff. One thing I'd like to add is that when working with complicated interaction mechanisms like this that aren't fully known, I find it super helpful to run computer simulations of competing hypothe... (read more)
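As a toy illustration of what I mean by simulating competing hypotheses (a made-up mechanism for illustration, not the one from the paper under discussion), you can simulate each candidate mechanism and see which better reproduces the measurements:

```python
# Two competing hypotheses about an interaction mechanism, scored against
# (hypothetical) observed dose-response measurements.
import numpy as np

rng = np.random.default_rng(0)
doses = np.array([0.1, 0.5, 1.0, 2.0, 4.0])
observed = np.array([0.09, 0.33, 0.52, 0.68, 0.80])  # hypothetical measurements

def simulate_saturating(dose):   # hypothesis A: simple saturating binding
    return dose / (dose + 1.0)

def simulate_cooperative(dose):  # hypothesis B: cooperative (Hill-like) binding
    return dose**2 / (dose**2 + 1.0)

for name, model in [("saturating", simulate_saturating),
                    ("cooperative", simulate_cooperative)]:
    # add simulated measurement noise, then score against the observations
    sims = model(doses) + rng.normal(0, 0.02, size=(1000, doses.size))
    mse = np.mean((sims - observed) ** 2)
    print(f"{name:12s} mean squared error vs data: {mse:.4f}")
```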
I just expect much better outcomes from the path of close collaboration. To me it seems like I am a "nuclear power plant safety engineer", and they are the "nuclear power plant core criticality engineers". Preventing the power plant from getting built would mean it wasn't built unsafely, but since I believe that would leave a void in which someone else would be strongly inclined to build their version... I just see the best likelihood of a good outcome in the paths near 'we work together to build the safest version we can'.
Nice specific breakdown! Sounds like you side with the authors overall. Want to also make the 3:1 bet with me?
My thought, as a researcher who is pretty good at roughshod programming but not so good at rock-solid, tested-everything programming, is that programming/engineering is a big field. Focusing on a specific aspect that is needed and also interesting to you might be advantageous, like supercomputing / running big Spark clusters, or security / cryptography.
I think the framing of "convince leading AI researchers to willingly work more closely with AI alignment researchers, and think about the problem more themselves" is the better goal. I don't think hampering them generally is particularly useful/effective, and I don't think convincing them entirely that "AGI is very scary" is likely either.
Ok, I take your bet for 2030. I win, you give me $1000. You win, I give you $3000. Want to propose an arbiter? (since someone else also took the bet, I'll get just half the bet, their $500 vs my $1500)
Shouldn't it be: 'They pay you $1,000 now, and in 3 years, you pay them back plus $3,000' (as per Bryan Caplan's discussion in the latest 80k podcast episode)? The money won't do anyone much good if they receive it in a FOOM scenario.
That's already what TPUs do, basically.
Trying to think about this from more of a practical machine learning standpoint, but without full understanding of all the points you made...
I'm thinking of the case where X, Y, and S are all partially but not fully known, and you have to choose model M so as to minimize regret over time. Two things occur to me as possibly useful strategies for choosing M. You might find the opportunity to run 'experiments': choose a suboptimal output R at an early timestep so that from then on you'd have a better understanding of S, and be able to better refine M for ... (read more)
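Here's a tiny toy sketch of that first strategy, using a simple bandit stand-in rather than the formalism from the post: occasionally choosing a currently-suboptimal option ('experimenting') sharpens the agent's model of the environment and lowers regret later on.

```python
# Toy illustration: a fraction of early choices are deliberate experiments.
import random

true_payoffs = [0.3, 0.5, 0.7]   # hidden environment (loosely, S)
estimates = [0.0, 0.0, 0.0]      # the agent's model of it (loosely, M)
counts = [0, 0, 0]
epsilon = 0.1                    # fraction of steps spent experimenting

total_regret = 0.0
for step in range(10_000):
    if random.random() < epsilon:
        choice = random.randrange(3)  # deliberately suboptimal, information-gathering
    else:
        choice = max(range(3), key=lambda i: estimates[i])  # exploit current model

    reward = 1.0 if random.random() < true_payoffs[choice] else 0.0
    counts[choice] += 1
    estimates[choice] += (reward - estimates[choice]) / counts[choice]
    total_regret += max(true_payoffs) - true_payoffs[choice]

print(f"estimated payoffs: {[round(e, 2) for e in estimates]}")
print(f"total regret: {total_regret:.1f}")
```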
About 15 years ago, before I'd started professionally studying and doing machine learning research and development, my timeline had most of its probability mass around 60-90 years out. This was based on my neuroscience studies and thinking about how long it would take to build an emulation of the human brain accurate enough to be functional. About 8 years ago, when I was studying machine learning full time, AlphaGo's release inspired me to carefully rethink my position, and I realized there were a fair number of shortcuts off my lon... (read more)
I recall something similar about a robot hand trained in varying simulations. I remember it being an OpenAI project, not a DeepMind one... Here's the link to the OpenAI environment-varying learner: https://openai.com/blog/learning-dexterity/
I think there's something we could do even beyond choosing the best of existing data points to study. I think we could create data-generators, which could fill out missing domains of data using logical extrapolations. I think your example is a great type of problem for such an approach.
Well, I don't think that focusing on the most famous slightly-ahead organizations is actually all that useful. I'd expect that the next-best-in-line would just step forward. Impeding data centers around the world would likely be more generally helpful. But realistically for an individual, trying to be helpful to the AI safety community in a non-direct-work way is probably your best bet at contributing.
Looking at a couple hundred of these plots for the MLP neurons, I see two obvious patterns which can occur alone or together/overlapping. One pattern is a 'vertical group', a narrow band that runs through many layers. The other pattern is a 'horizontal group', which is lots of involvement within one layer.
Now I'm generating and looking at plots which do both the MLP neurons and the attention heads.
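For reference, here's a simplified sketch of the kind of layer-vs-neuron scatter I mean for the MLP neurons (illustrative only, not the exact code I'm running; `involvement` here stands in for a hypothetical (n_layers, n_neurons) array of per-neuron scores):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_involvement(involvement: np.ndarray, threshold: float = 0.5):
    # Scatter every neuron whose score passes the threshold, with layer on the
    # y-axis, so 'vertical groups' show up as narrow bands spanning many layers
    # and 'horizontal groups' as many points within a single layer.
    layers, neurons = np.nonzero(involvement > threshold)
    plt.figure(figsize=(8, 4))
    plt.scatter(neurons, layers, s=4)
    plt.xlabel("neuron index within layer")
    plt.ylabel("layer")
    plt.title("MLP neurons above involvement threshold")
    plt.show()

# Example with random data, just to exercise the function:
plot_involvement(np.random.rand(24, 3072), threshold=0.98)
```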
I have a friend who has been working on a team doing automatic factual responses to search queries. I'll send him the link to this article and maybe he'll have some thoughts...
https://www.lesswrong.com/posts/nxLHgG5SCKCxM9oJX/progress-report-2
Now that I got a grant from the Long Term Future Fund and quit my job to do interpretability research full time, I'm actually making progress on some of my ideas!
Thanks Rohin. I also feel that interviewing after my 3 more months of independent work is probably the correct call.