Strongly upvoted for the clear write-up and for engaging with a potentially neglected issue; thank you for that.
Following your post I'd distinguish two issues:
(a) Lack of data privacy enabling a powerful future agent to target/manipulate you personally: your data is just there for the taking, stored in not-so-well-protected databases; cross-referencing is easier at higher capability levels; singling you out and fine-tuning a behavioral model on you in particular isn't hard;
(b) Lack of data privacy enabling a powerful future agent to build that generi...
No need to apologize, I'm usually late as well!
I don't think there is a great answer to "What is the most comprehensive repository of resources on the work being done in AI Safety?"
There is no great answer, but I am compelled to list some of the few I know of (that I wanted to update my Resources post with):
Answers in order: there is none, there were, there are none yet.
(Context starts, feel free to skip, this is the first time I can share this story)
After posting this, I was contacted by Richard Mallah, who (if memory serves right) created the map, compiled the references and wrote most of the text in 2017, to help with the next iteration of the map. The goal was to build a Body of Knowledge for AI Safety, including AGI topics but also more current-capabilities ML Safety methods.
This was going to happen in conjunction with the contributions of many academic ...
But let's suppose that the first team of people who build a superintelligence decide not to turn the machine on and immediately surrender our future to it. Suppose they recognize the danger and decide not to press "run" until they have solved alignment.
The section ends here but... isn't there a paragraph missing? I was expecting the standard continuation along the lines of "Will the second team make the same decision, once they reach the same capability? Will the third, or the fourth?" and so on.
Thank you for this post, I find this distinction very useful and would like to see more of it. Has the talk been recorded, by any chance (or will you give it again)?
Thank you, that was my understanding. Looking forward to the second competition! And good luck sorting out all the submissions for this one.
[Meta comment]
The deadline is past; should we keep the submissions coming, or is it too late? Some of the best arguments I could find elsewhere are rather long, in the vein of the Superintelligence FAQ. I did not want to copy-paste chunks of it, and the arguments stand better as part of a longer format.
Anyway, signalling that the lack of money incentive will not stop me from trying to generate more compelling arguments... but I'd rather do it in French instead of posting here (I'm currently working on some video scripts on AI alignment; there's not enough French content of that type).
(Policymakers) We have a good idea of what makes bridges safe, through physics, materials science and rigorous testing. We can anticipate the conditions they'll operate in.
The very point of powerful AI systems is to operate in complex environments better than we can anticipate. Computer science can offer no guarantees if we don't even know what to check. Safety measures aren't catching up quickly enough.
We are somehow tolerating the mistakes of current AI systems. Nothing's ready for the next scale-up.
(ML researchers) We still don't have a robust solution to specification gaming: powerful agents find ways to get high reward, but not in the way you'd want. Sure, you can tweak your objective and add rules, but this doesn't solve the core problem: your agent doesn't seek what you want, only a rough operational translation of it.
What would a high-fidelity translation look like? How would you create a system that doesn't try to game you?
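To make the gap concrete, here is a minimal sketch (a hypothetical 1-D toy world and made-up reward functions, not any real benchmark): we mean "move the box to the target", but the reward we wrote only measures the agent's own distance to the target, so an exhaustive optimizer happily ignores the box:

```python
from itertools import product

TARGET = 4  # positions 0..4 on a line; a state is (agent_pos, box_pos)

def proxy_reward(agent_pos, box_pos):
    # What we wrote: reward the *agent* being close to the target.
    return -abs(agent_pos - TARGET)

def intended_reward(agent_pos, box_pos):
    # What we meant: reward the *box* being close to the target.
    return -abs(box_pos - TARGET)

# Exhaustive "policy search" over all reachable final states.
best = max(product(range(5), range(5)), key=lambda s: proxy_reward(*s))
print("State chosen by the proxy optimizer:", best)   # (4, 0): box untouched
print("Proxy reward:", proxy_reward(*best))           # 0, maximal
print("Intended reward:", intended_reward(*best))     # -4, terrible
```

Adding penalty terms to `proxy_reward` just moves where the gap sits; the optimization pressure stays aimed at the written objective, not the intended one.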
(Policymakers) There is outrage right now about AI systems amplifying discrimination and polarizing discourse. Consider that this was discovered after they were widely deployed. We still don't know how to make them fair. This isn't even much of a priority.
Those are the visible, current failures. Given current trajectories and the lack of foresight in AI research, more severe failures will happen in more critical situations, without us knowing how to prevent them. With better priorities, this need not happen.
(Tech execs) "Don’t ask if artificial intelligence is good or fair, ask how it shifts power". As a corollary, if your AI system is powerful enough to bypass human intervention, it surely won't be fair, nor good.
(ML researchers) Most policies are unsafe in a large enough search space; have you designed yours well, or are you optimizing through a minefield?
(Policymakers) AI systems are very much unlike humans. AI research isn't trying to replicate the human brain; the goal is, however, to be better than humans at certain tasks. For the AI industry, better means cheaper, faster, more precise, more reliable. A plane flies faster than birds; we don't care if it needs more fuel. Some properties are important (here, speed), some aren't (here, consumption).
When developing current AI systems, we're focusing on speed and precision, and we don't care about unintended outcomes. This isn't an issue for most systems: a ...
(Tech execs) Tax optimization is indeed optimization under the constraints of the tax code. People aren't just stumbling on loopholes, they're actually seeking them, not for the thrill of it, but because money is a strong incentive.
Consider now AI systems, built to maximize a given indicator, seeking whatever strategy is best, following your rules. They will get very creative with them, not for the thrill of it, but because it wins.
Good faith rules and heuristics are no match for adverse optimization.
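A minimal sketch of that dynamic (toy rule, made-up numbers, not any real system): a per-transaction cap is meant to keep spending low, and a plan search that only checks the written rule routes around it by splitting one big purchase into many compliant ones:

```python
CAP = 100  # the rule's intent: "don't spend much"

def rule_allows(plan):
    # The letter of the law: no single transaction above the cap.
    return all(t <= CAP for t in plan)

def candidate_plans(target):
    yield [target]                    # the honest plan: one big purchase
    full, rest = divmod(target, CAP)  # the creative plan: split it up
    yield [CAP] * full + ([rest] if rest else [])

def best_plan(target):
    # The optimizer keeps whatever the rule allows and maximizes spending.
    legal = [p for p in candidate_plans(target) if rule_allows(p)]
    return max(legal, key=sum)

plan = best_plan(10_000)
print(f"{len(plan)} transactions, all rule-compliant, total = {sum(plan)}")
```

The honest plan is rejected, the split plan sails through, and the rule's author gets exactly what they wrote, not what they meant.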
(ML researchers) Powerful agents are able to search through a wide range of actions. The more efficient the search, the better the actions, the higher the rewards. So we are building agents that are searching in bigger and bigger spaces.
For a classic pathfinding algorithm, some paths are suboptimal, but all of them are safe, because they follow the map. For a self-driving car, some paths are suboptimal, but some are unsafe. There is no guarantee that the optimal path is safe, because we really don't know how to tell what is safe or not, yet.
A more efficient search isn't a safer search!
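A minimal sketch of the self-driving-car version (hypothetical grid, hypothetical "unsafe" marking; a plain uniform-cost search over step count): the optimal path runs straight through cells we would call unsafe, because safety appears nowhere in the objective:

```python
import heapq

# S = start, G = goal, X = unsafe but traversable, . = safe.
GRID = ["SXXG",
        "...."]

def shortest_path(grid):
    rows, cols = len(grid), len(grid[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    start = next(p for p in cells if grid[p[0]][p[1]] == "S")
    goal = next(p for p in cells if grid[p[0]][p[1]] == "G")
    frontier = [(0, start, [start])]  # uniform-cost search on step count
    seen = set()
    while frontier:
        cost, (r, c), path = heapq.heappop(frontier)
        if (r, c) == goal:
            return path
        if (r, c) in seen:
            continue
        seen.add((r, c))
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                heapq.heappush(frontier, (cost + 1, (nr, nc), path + [(nr, nc)]))

path = shortest_path(GRID)
print("Optimal path:", path)
print("Unsafe cells on it:", [p for p in path if GRID[p[0]][p[1]] == "X"])
```

The safe detour along the bottom row exists, but it costs two extra steps, so the optimizer never picks it. A more efficient search only finds the unsafe shortcut faster.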
(Policymakers) The goals and rules we're putting into machines are law to them. What we're doing right now is making them really good at following the letter of this law, but not the spirit.
Whatever we really mean by those rules is lost on the machine. Our ethics don't translate well. Therein lies the danger: competent, obedient, blind, just following the rules.
Thank you for curating this, I had missed this one and it does provide a useful model of trying to point to particular concepts.
Hi! Thank you for this project, I'll attempt to fill the survey.
My apologies if you already encountered the following extra sources I think are relevant to this post:
Hi! Thank you for this outline. I would like some extra details on the following points:
Congratulations on your launch!
Like Michaël Trazzi in the other post, I'm interested in the kind of products you'll develop, but more specifically in how the for-profit part interacts with both the conceptual research part and the incubator part. Are you expecting the latter two to yield new products as they make progress? Do these activities have different enough near-term goals that they mostly just coexist within Conjecture?
(also, looking forward to the pluralism sequence, this sounds great)
Thank you for this, I resonate with it a lot. I wrote an essay about this process a while ago: Always go full autocomplete. One of its conclusions:
It cannot be trained by expecting perfection from the start. It's trained by going full autocomplete and reflecting on the result, not by dreaming up what the result could be. Now that I've written all that, I have evidence that it works.
The compression idea evokes Kaj Sotala's summary/analysis of the AI-Foom Debate (which I found quite useful at the time). I support the idea, especially given it has taken a while for the participants to settle on things cruxy enough to discuss and so on. Though I would also be interested in "look, these two disagree on that, but look at all the very fundamental things about AI alignment they agree on".
I finished reading all the conversations a few hours ago. I have no follow-up questions (except maybe "now what?"), I'm still updating from all those words.
One excerpt in particular, from the latest post, jumped out at me (from Eliezer Yudkowsky, emphasis mine):
...This is not aimed particularly at you, but I hope the reader may understand something of why Eliezer Yudkowsky goes about sounding so gloomy all the time about other people's prospects for noticing what will kill them, by themselves, without Eliezer constantly hovering over their shoulder every minute pr
So, assuming an unaligned agent here.
If your agent isn't aware that its compute cycles are limited (i.e. the compute constraint is not part of the math problem), then you have three cases: (1a) the agent doesn't hit the limit with its standard search, and you're in luck; (1b) the problem is difficult enough that the agent runs its standard search but fails to find a solution in the allocated cycles, so it always fails, but safely; (1c) you tweak the agent to be more compute-efficient, which is very costly and might not work, in practice if you're in case 1b and i...
I am confused by the problem statement. What you're asking for is a generic tool: something that can be created without any information about the world, but that becomes very useful once I feed it information about the real world.
My problem is that the real world is rich, feeding the tool all the relevant information will be expensive, and the more complicated the math problem is, the more safety issues you get.
I cannot rely on "don't worry if the Task AI is not aligned, we'll just feed it harmless problems", the risk comes from what the A...
“Knowledge,” said the Alchemist, “is harder to transmit than anyone appreciates. One can write down the structure of a certain arch, or the tactical considerations behind a certain strategy. But above those are higher skills, skills we cannot name or appreciate. Caesar could glance at a battlefield and know precisely which lines were reliable and which were about to break. Vitruvius could see a great basilica in his mind’s eye, every wall and column snapping into place. We call this wisdom. It is not unteachable, but neither can it be taught. Do you understand?”
Quoted from Ars Longa, Vita Brevis.
I second Charlie Steiner's questions, and add my own: why collaboration? A nice property of an (aligned) AGI would be that we could defer activities to it... I would even say that the full extent of "do what we want" at superhuman level would encompass pretty much everything we care about (assuming, again, alignment).
Hi! Thank you for writing this and suggesting solutions. I have a number of points to discuss. Apologies in advance for all the references to Arbital, it's a really nice resource.
The AI will hack the system and produce outputs that it's not theoretically meant to be able to produce at all.
In the first paragraphs following this, you describe this first kind of misalignment as an engineering problem, where you try to guarantee that the instructions that run on the hardware correspond exactly to the code you wrote; being robust to hardware tamperi...
All my best wishes for this new subscription model! The use of NFTs for posts will, without a doubt, ensure that quality writing remains forever in the Blockchain (it's like the Cloud, but with better structure). Typos included.
Is there a plan to invest in the old posts' NFTs that will be minted from the archive? I figure Habryka already holds them all, and selling vintage Sequences NFTs to the highest bidder could be a nice addition to LessWrong's finances (imagine the added value of having a complete set of posts!).
Also, in the event that this model doesn't pan out, will the exclusive posts be released for free? It would be an excruciating loss for the community to have those insights sealed off.
My familiarity with the topic gives me enough confidence to join this challenge!
I hope this makes the case at least somewhat that these events are important, even if you don’t care at all about the specific politics involved.
I would argue that the specific politics inherent in these events are exactly why I don't want to approach them. From the outside, the mix of corporate politics, reputation management, culture war (even the boring part), all of which belong in the giant near-opaque system that is Google, is a distraction from the underlying (indeed important) AI governance problems.
For that particular series of events, I already g...
My gratitude for the already posted suggestions (keep them coming!) - I'm looking forward to working on the reviews. My personal motivation resonates a lot with the "help people navigate the field" part; in-depth reviews are a precious resource for this task.
This is one of the rare times I can in good faith use the prefix "as a parent...", so thank you for the opportunity.
So, as a parent, lots of good ideas here. Some I couldn't implement in time, some that are very dependent on living conditions (finding space for the trampoline is a bit difficult at the moment), some that are nice reminders (swamp water, bad indeed), some that are too early (because they can't read yet)...
... but most importantly, some that genuinely blindsided me, because I found myself agreeing with them, and they were outside my thought p...
I think not mixing up the referents is the hard part. One can properly learn from fictional territory when one can clearly see in which ways it's a good representation of reality, and where it's not.
I may learn from an action movie the value of grit and what it feels like to have principles, but I wouldn't trust them on gun safety or CPR.
It's not common for fiction to be self-consistent enough while preserving drama. Acceptable breaks from reality will happen, and sure, sometimes you may have a hard SF universe where the alternate reality is very lawful and th...
Thank you for this clear and well-argued piece.
From my reading, I consider three main features of AWSs in order to evaluate the risk they present:
To clarify the question, would a good distiller be one (or more) of:
Based on the level of articles in Distill I wouldn't expect producers of introductory material to fit your definition, but if advanced material counts, I'd nominate Adrian Colyer for Computer Science (I'll put this in a proper answer with extra names based on your reply).
I was indeed wondering about it as I just read your first comment :D
For extra convenience you could even comment again with your alt account (wait, which is the main? Which is the alt? Does it matter?)
The original comment seems to have been edited to a sharper statement (thanks, D0TheMath); I hope it's enough to clear things up.
I agree this qualifier pattern is harmful in the context of collective action problems, where mutual trust and commitment have to be more firmly established. I don't believe we're in that context, hence my comment.
I interpret the quoted statement as "I am willing to make an effort that I don't usually do, by commenting more, based on your assessment of the importance of giving feedback", assuming good faith.
There's uncertainty, of course, as to whether it will actually turn out to be important. "I can try" suggests they will try even if they don't know, and we won't know whether they'll succeed until they try.
Yes, you can interpret the statement uncharitably with respect to their goodwill, but that is not, in my opinion, conducive to healthy comment sections in general.
Adam and I discussed the topic of feedback. I approve of this challenge and will attempt to comment on at least half of all new posts from today to the end of November, possibly renewing the commitment if it works well.
I've been meaning to get out of mostly-lurking mode for months now, and this is as good of an opportunity as it gets.
I also want to mention the "this comment could be a post" effect, which can help people "upgrade" from commenting to shortform or longform, if they feel (like me) that there's some quality bar to clear to feel comfortable posting...
I have to admit having read some of your essays, found them very interesting, and yet found the prospect of diving into the rest daunting enough to put the idea somewhere on my to-read pile.
I applaud your book writing and will gladly read the final version, as I'll perceive it as a more coherent chunk of content to go through, instead of a collection of posts, even if the quality of the writing is high for both. The medium itself, to me, has its importance.
It's also easier to recommend « this excellent book by Samo Burja » than « this excellent collection ...
Thank you for the import.
Once again, the Progress Bar shall advance. It will probably be slower this time. No matter: I shall contribute.
The Mindcrime tag might be relevant here! More specific than both concepts you mentioned, though. Which posts discussing them were you alluding to? Might be an opportunity to create an extra tag.
(also, yes, this is an Open Thread, so your comment is in the right place)