My top interest is AI safety, followed by reinforcement learning. My professional background is in software engineering, computer science, and machine learning. I have degrees in electrical engineering, liberal arts, and public policy. I currently live in the Washington, DC metro area; before that, I lived in Berkeley for about five years.
I want to be clear: Lots of terrifyingly smart people made this mistake, including some of the smartest scientists who ever lived. Many of them made this mistake for a decade or more before wising up or giving up.
Imagine this. Imagine a future world where gradient-driven optimization never achieves aligned AI. But there is success of a different kind. At great cost, ASI arrives. Humanity ends. In his few remaining days, a scholar writing under the pen name Rete reflects on the 1980s approach (i.e., deterministic rules and explicit knowledge) with the words: "The technology wasn't there yet; it didn't work commercially. But they were onto something -- at the very least, their approach was probably compatible with provably safe intelligence. Under other circumstances, perhaps it would have played a more influential role in promoting human thriving."
We might soon build something a lot smarter than us.
Logically speaking, "soon" is not required for your argument. With regard to persuasiveness, saying "soon" is double-edged: those who think "soon" applies will feel more urgency, but the rest are given an excuse to tune out. A thought experiment by Stuart Russell goes like this: if we know with high confidence a meteor will smack Earth in 50 years, what should we do? This is an easy call. Prepare, starting now.
The right time to worry about a potentially serious problem for humanity depends not just on when the problem will occur but also on how long it will take to prepare and implement a solution. For example, if we were to detect a large asteroid on course to collide with Earth in 2069, would we wait until 2068 to start working on a solution? Far from it! There would be a worldwide emergency project to develop the means to counter the threat, because we can’t say in advance how much time is needed. - Book Review: Human Compatible at Slate Star Codex
In Chapter 6 of his 2019 book, Human Compatible: Artificial Intelligence and the Problem of Control, Russell catalogs objections to the kind of hard thinking one would hope had been well underway as of 2019:
"The implications of introducing a second intelligent species onto Earth are far-reaching enough to deserve hard thinking.” So ended The Economist magazine’s review of Nick Bostrom’s Superintelligence. Most would interpret this as a classic example of British understatement. Surely, you might think, the great minds of today already doing this hard thinking—engaging in serious debate, weighing up the risks and benefits, seeking solutions, ferreting out loopholes in solutions, and so on. Not yet, as far as I am aware.
The 50-year-away meteor is discussed in Human Compatible on page 151 as well as in the Slate Star Codex review quoted above.
I'll share a stylized story inspired by my racing days of many years ago. Imagine you are a competitive amateur road cyclist. After years of consistent training and racing in the rain, wind, heat, and snow, you are ready for your biggest race of the year, a 60-something mile hilly road race. Having completed your warm-up, caffeination, and urination rituals, you line up with your team and look around. Having agreed that you are the leader for the day (you are best suited to win, given the course and relative fitness), you chat about the course, wind, rivals, feed zones, and so on.
Seventy or so brightly-colored participants surround you, all optimized to convert calories into rotational energy. You feel the camaraderie among this traveling band of weekend warriors. Lots of determination and shaved legs. You notice but don't worry about the calves carved out of wood since they are relatively weak predictors of victory. Speaking of...
You hazard that one of twenty-some-odd contenders is likely to win. Winning will take lots of fitness and some blend of grit, awareness, timing, team support, adaptability, and luck. Based on previous events, you estimate you are in that top twenty. So from the outside view, your chances of winning are low: roughly 5%.
What's that? ... You hear three out-of-state semi-pro mountain bikers are in the house. Teammates, too. They have decided to "use this race for training". Lovely. You update your P(win) to ~3%. Does this two-percentage-point drop bother you? It is, after all, a 40% relative decrease. For a moment, maybe, but not for long. What about the absolute probability? Does a 3% chance demotivate you? Hell no. A low chance of winning will not lower your level of effort.
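(For the curious, a minimal sketch of that arithmetic:)

```python
# The drop from 5% to 3% is 2 percentage points in absolute terms,
# but a 40% decrease in relative terms.
p_before, p_after = 0.05, 0.03
absolute_drop = p_before - p_after          # 0.02 -> two percentage points
relative_drop = absolute_drop / p_before    # 0.40 -> a 40% relative decrease
print(f"absolute: {absolute_drop:.0%} points, relative: {relative_drop:.0%}")
```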
You remember that time trial from the year before where your heart rate was pegged at your threshold for something like forty minutes. The heart-rate monitor reading was higher than you expected, but your body indicated it was doable. At the same time, your exertion level was right on the edge of unsustainable. Saying "it's all mental" is cliché, but in that case, it was close enough to the truth. So you engaged in some helpful self-talk (a.k.a. repeating the mantra "I'm not going to crack") for the last twenty minutes. There was no voodoo or divine intervention; it was just one way to focus the mind to steer the body in a narrow performance band.
You can do that again, you think. You assume a mental state of "I'm going to win this" as a conviction, a way of enhancing your performance without changing your epistemic understanding.
How are you going to win? You don't know exactly. You can say this: you will harness your energy and abilities. You review your plan. Remain calm, pay attention, conserve energy until the key moments, trust your team to help, play to your strengths, and when the time is right, take a calculated risk. You have some possible scenarios in mind: get in a small breakaway, cooperate, get ready for cat-and-mouse at the end, maybe open your sprint from 800 meters or farther. (You know from past experience your chances of winning a pack sprint are very low.)
Are we starting soon? Some people are twitchy. Lots of cycling computer beeping and heart-rate monitor fiddling. Ready to burst some capillaries? Ready to drop the hammer? Turn the pedals in anger? Yes! ... and no. You wait some more. This is taking a while. (Is that person really peeing down their leg? Caffeine intake is not an exact science, apparently. Better now than when the pistons are in motion.)
After a seemingly endless sequence of moments, a whistle blows. The start! The clicking of shoes engaging with pedals. Leading to ... not much ... the race starts with a neutral roll-out. Slower than anyone wants. But the suspense builds ... until, maybe a few minutes later ... a cacophony of shifting gears ... and the first surge begins. This hurts. Getting dropped at the beginning is game over. Even trading position to save energy is unwise right now -- you have to be able to see the front of the pack until things calm down a bit. You give whatever level of energy is needed, right now.
One gets the feeling that the authors are just raising their hands and saying something like: “Look, we are doomed, and there’s no realistic way we’re getting out of this short of doing stuff we are not going to do. These proposals are the necessary consequence of accepting what is stated in the preceding chapters, so that’s that”.
First, personally, I can understand how some people could have this response, and I empathize -- this topic is heavy. However, with some time and emotional distance, one can see the above interpretation doesn't correspond with key passages from the book, such as this one from Chapter 13:
Anytime someone tells you that the Earth could not possibly manage to do anything as difficult as restricting AI research, they are really claiming to know that countries will never care. They are asserting that countries and their leaders could not possibly come to care even 1 percent as much as they cared to fight World War II.
Also, the last chapter (14, "Where There's Life, There's Hope") centers around people facing the truth of their circumstances and rising to the occasion. Here is a key quote:
Humanity averted nuclear war because people who understood that the world was on track for destruction worked hard to change tracks.
Chapter 14 concludes by offering tailored suggestions to civil servants, elected officials, political leaders, journalists, and "the rest of us". The authors are decidedly not raising the white flag; instead, they are stating what needs to be done and advocating for it.
I composed this poem by myself (except for the final 4 lines):
## Homeward Bound
wandering around they go
eating grass
to and fro
etiquette gives them a pass
sometimes ducking
head under ass
sometimes waddling palmatedly
strutting with sass
and pooping belatedly
with an impact so vast
if only they could be steered
like a ship by mast
into a loving home though weird
by tempting grain cast
show me, caring shepherd,
how to usher them from the splattered urban sprawl
so they follow me undemurred
to a place quiet streams and safety call
Sonnet 4.5 tells me:
The practical answer: you almost certainly can't and shouldn't try to relocate them yourself.
Legal reality: In most jurisdictions, waterfowl (especially Canada geese) are protected under migratory bird laws. Moving them without permits is illegal. Even if they're a nuisance, relocation requires wildlife management authorities.
Biological reality: Urban waterfowl are often there because:
- They've habituated to human environments (easy food, few predators)
- They may be resident populations, not migratory
- Their home range is the urban area
If you actually want to help:
- Contact local wildlife authorities - They have legal authority and expertise. Some areas have humane relocation programs for problem populations.
- Habitat modification - Making the current area less attractive is more effective than chasing them elsewhere:
  - Stop feeding (yours and others')
  - Install barriers to nesting sites
  - Use visual deterrents during establishment season
- Accept the reality - Urban waterfowl populations are generally stable because cities provide resources. The "splattered urban sprawl" is their ecological niche now.
The herding question: Yes, you can move them short distances with patience - walk slowly behind them, use your body/arms to guide direction, they'll generally move away from pressure. But they'll return. Geese have strong site fidelity.
Probability assessment: ~95% chance any DIY relocation effort results in them returning within days, possibly with legal consequences for you. The caring shepherd move is usually to let wildlife professionals handle it, or to modify the environment rather than the birds.
I wanted to highlight the Trustworthy Systems Group at the School of Computer Science and Engineering of UNSW Sydney and two of their projects, seL4 and LionsOS.
We research techniques for the design, implementation and verification of secure and performant real-world computer systems. / Our techniques provide the highest possible degree of assurance—the certainty of mathematical proof—while being cost-competitive with traditional low- to medium-assurance systems.
seL4 is both the world's most highly assured and the world's fastest operating system kernel. Its uniqueness lies in the formal mathematical proof that it behaves exactly as specified, enforcing strong security boundaries for applications running on top of it while maintaining the high performance that deployed systems need.
seL4 is grounded in research breakthroughs across multiple science disciplines. These breakthroughs have been recognised by internationally acclaimed awards, from the MIT Technology Review Award, to the ACM Hall of Fame Award, the ACM Software Systems Award, the DARPA Game changer award, and more.
They are building a modular operating system called LionsOS:
We are designing a set of system services, each made up of simple building blocks that make best use of the underlying seL4 kernel and its features, while achieving superior performance. The building blocks are simple enough for automated verification tools (SMT solvers) to prove their implementation correctness. We are furthermore employing model checkers to verify key properties of the interaction protocols between components.
Core to this approach are simple, clean and lean designs that can be well optimised, use seL4 to the best effect, and provide templates for proper use and extension of functionality. Achieving this without sacrificing performance, while keeping the verification task simple, poses significant systems research challenges.
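To make the "SMT solvers prove implementation correctness" idea concrete, here is a toy sketch of my own (not from the LionsOS codebase; it assumes the `z3-solver` Python package) in which a solver checks that a branch-free bit trick matches a simple specification for every possible 32-bit input:

```python
# Toy example: ask an SMT solver whether an "optimized" implementation
# can ever disagree with its specification (assumes the z3-solver package).
from z3 import BitVec, BitVecVal, Solver, UDiv, unsat

x = BitVec("x", 32)

# Specification: round x up to the next multiple of 8 (arithmetic is mod 2**32).
spec = UDiv(x + 7, 8) * 8

# Implementation: the usual branch-free bit trick.
impl = (x + 7) & BitVecVal(0xFFFFFFF8, 32)

solver = Solver()
solver.add(spec != impl)          # ask the solver for any counterexample
assert solver.check() == unsat    # unsat: none exists, so impl matches spec everywhere
print("implementation matches the specification for all 32-bit inputs")
```

The building blocks LionsOS targets are of course far richer than this, but the workflow has the same shape: state the specification, state the implementation, and let the solver search symbolically for any disagreement.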
It seems like a catastrophic civilizational failure that we don't have confident common knowledge of how colds spread.
Given the context above (posted by bhauth), the problem seems intrinsically hard. What would make this a civilizational failure? To my eye, that label would be warranted if either:
- in alternative timelines with the same physics and biological complexity, other civilizations sometimes figured out transmission. If the success rate across those civilizations were under some threshold (maybe 1%), that would suggest variation among civilizations isn't enough to overcome the intrinsic complexity, and the label wouldn't fit. (This option could be summarized as "grading on a multiverse curve".)
- deaths from the common cold (cc) met the criteria for "catastrophic". The cc costs lives, happiness, and productivity, yes, but relative to other diseases, the "catastrophic" label seems off-target. (This option is analogous to comparing against other risks.)
Are you positing an associative or causal connection from increased intelligence to a decreased ability, or decreased motivation, to listen to those who are deemed less intelligent? This is a complex area; I agree with statisticians who promote putting one’s causal model front and center.
A browser doesn't count as a significant business integration?
It seems to me the author's meaning of business integration is pretty clear: "profitable workplace integrations, such as a serious integration with Microsoft office suite or the google workplace suite" -- that is, integrations tailored to a particular company's offerings. A browser is neither business-specific nor workplace-specific.
I don't usually venture into LessWrong culture or "feels" issues, but the above comment seems out of place to me. As written, it reads more like something I would see on Hacker News, where someone fires off a quick quip or zinger. I prefer to see more explanation here on LW. I'm in some ways an HN refugee, having seen too many low-effort ping-pong sniping contests.
I am seeking definitions of the key foundational concepts in this paper (cognitive pattern, context, influence, selection, motivations) that (a) come as close to formal precision as possible while (b) keeping word count minimal. This might be asking a lot, but I think it can be done, and I think it is important. I suggest using a very basic foundation: the basic terminology of artificial neural networks (ANNs): neurons, weights, activations, etc. Let the difficulty arise in putting the ideas together, not in confusion about the definitions themselves. If there is ambiguity or variation in how these terms apply, I think it would make sense to lock in some particulars so the definitions can be tightened up. (Walk before you run.) Even better if the definitions themselves are diagrammed and connected visually (perhaps with something like an ontology diagram).
I'd appreciate any efforts in this direction, thanks! I've started a draft myself, but I want to have some properly uninterrupted time to iterate on it before sharing.
Why do I ask this? Personally, I find this article hard to parse for definitional reasons.