‘‘Most people won't agree to kill themselves for 50 million dollars."Stuart Russell and Peter Norvig

Foreword

One of the fruits of growing older is revisiting your old favorites, whether they be foods, books, or songs. As you take your warm, rose-tinted mental bath, you appreciate subtleties which had escaped you, laugh heartily at jokes which hadn't even registered, and grasp concepts whose importance you had never fully realized. Similarly, some songs never lose their charm, no matter how abused the 'replay' button becomes. AI: AMA is a paean to the triumphs of Computer Science and a rallying cry towards those hills we have yet to climb (ahem).

Exposition

My process for the first 60% of this 1,052-page behemoth was fairly normal: an hour or two of studying per day over the course of a few weeks. The last bit went a little differently.

Whatever It Takes

Two days ago, my winter trimester ended; I had upwards of 400 pages to go, half of this post to write, and dozens of exercises to complete. I found myself with three full days to spare before my research resumed the proceeding week. I did what any young man in his early twenties would do - I concocted a plan for studying upwards of 30 hours in that time span.

I knew that this plan was preposterous; in all likelihood, I'd finish 3 chapters and burn out. That's why I wheeled out Murphyjitsu.

The institution of hourly brief exercise breaks. Cutting through the crisp, cold air at high speed got my heart beating and rekindled my passion for crushing the task at hand.

The construction of a narrative, with me as the protagonist, slowly increasing in capability, working diligently against the racing clock.

Yes, I basically turned myself into a trope.

No, I don't mind: I'm willing to take advantage of whatever psychological factors are at my disposal to help me succeed.

Yes, I listened to the Rocky soundtrack a few times. It was great.

Mental exhaustion (distinct from emotional / physical burnout in that your mind can't process concepts as efficiently as it could seven hours ago when you started) was prepared for by:

Regularly taking breaks to chill out, listen to music, and dance a bit. I'd also spend time dwelling on the pleasant anticipated result of having mastered the important material in the rest of the book, making sure to contrast it with what would happen if I didn't follow through.

I made sure to avoid activities that would suck me in.

As outlined above - regular exercise and narrative-building.

The pre-emptive purchase of all my favorite healthful foods; this was a great self-bribe. Having lots of delicious food on hand meant I always had something to look forward to for my breaks, and eating well meant that I didn't feel sluggish. I like to call this tactic "trivial conveniencing".

Maintaining my regular sleep schedule of 9:30 PM - 6:00 AM.

Exercises taking forever was prevented. I performed a quick time profile of my exercise completion and found that the majority of my time was spent rereading concepts which hadn't sunk in sufficiently during the chapter. By doing relevant questions immediately after each section, I decreased time spent by about 40% and increased retention.

Wanting to do other things was mitigated by setting aside an hour or two where I'd go to the dojo or spend time with friends. I also took a bit of time for myself each night, calling my family as usual.

The Outcome

Not only did I do it, but I finished a day early. The Murphyjitsu was invaluable; the failure modes I predicted came up and were dealt with by my precautions.

24 hours of studying over the last two days, and I enjoyed every moment.

AI: A Modern Approach

If I found a concept confusing, I'll explain it in both intuitive and technical terms. Any criticisms are not directed towards the authors; it is their job to distill the field.

1: Introduction

In which the authors define rationality (no, IEEE Computing Edge, books do not qualify as intelligent), provide an overview of related fields, and recount a brief history of AI.

2: Intelligent Agents

In which the authors broadly define agent frameworks, environmental attributes, and the nature of learning.

Wireheading Cameo

Notice that we said [agent performance is graded on] environment states, not agent states. If we define success in terms of agent's (sic) opinion of its own performance, an agent could achieve perfect rationality simply by deluding itself that its performance was perfect.

Of course, this division only works if there is a Cartesian boundary between that-which-grades and that-which-acts. Which there isn't.

Charming Philosophical Tangents

The notion of "clean floor"... is based on average cleanliness over time. Yet the same average cleanliness can be achieved by two different agents, one of which does a mediocre job all the time while the other cleans energetically but takes long breaks... Which is better - a reckless life of highs and lows, or a safe but humdrum existence? Which is better - an economy where everyone lives in moderate poverty, or one in which some live in plenty while others are very poor? We leave these questions as an exercise for the diligent reader.

I don't know if I can answer the first question without more information, but assuming a Rawlsian veil of ignorance and considering the well-documented logarithmic hedonic effects of wealth, universal moderate poverty would be preferable. I leave the proof as an exercise for the diligent reader (it should be immediate after having read Chapter 16).

3: Solving Problems by Searching

In which the authors teach us to find what we're looking for.

Admissibility and Consistency

Heuristics are functions which estimate distance to the goal. Let g(s) be the cost to reach s in the current path, let the path cost of reaching state s′ from s via action a be c(s,a,s′), and let h be a heuristic. The total distance function is then

f(s)=g(s)+h(s).

h is admissible if it never overestimates the distance remaining. Admissibility frees us from needing to exhaustively explore every path (for fear of h having hid a good solution from us through overestimation). The heuristic h is consistent (or, more helpfully, monotonic) if it obeys the triangle inequality:

h(s)≤c(s,a,s′)+h(s′).

What this means: imagine that we have some admissible heuristic ha (for example, straight-line distance on a map). An admissible but inconsistent heuristic would then be:

h′a(s)=hash(s) mod ha(s).

This is clearly admissible, but also inconsistent - every h′a(s) evaluation is some pseudo-random number between 0 and the true distance to the goal!

Claim. All consistent heuristics are admissible.

Proof. Let nk denote a state k actions from the goal, and let d be the true distance function.

Base case (n1):

h(n1)≤c(n1,a,n0)+h(n0) consistency≤c(n1,a,n0) definition of a heuristic=d(n1)definition of distance

Induction step (nk⇒nk+1):

The inductive hypothesis is that h(nk)≤d(nk). Then

h(nk+1)≤c(nk+1,a,nk)+h(nk) consistency≤c(nk+1,a,nk)+d(nk) inductive hypothesis=d(nk+1) definition of distance.

In which the authors introduce ways to search using local information.

And-Or

Applying And-Or search can seem tricky, but it's really not. When an agent is operating under partial observability (it isn't sure about the exact state of the world), it maintains a belief state (in this chapter, the set of all states the agent could be in). To be sure it will be in the goal state after following its plan, we whip out And-Or search: for each state we could be in now (∧), we need to find at least one solution (∨).

Sometimes the environment is nondeterministic (actions have uncertain effects). In this case, we use And-Or search to construct a contingency plan, which consists of a series of if-then-else statements. Here, we control our actions, but not their effects; we will then find at least one action (∨) such that we have a solution for each potential outcome (∧). Think of this as min-maxing against an adversarial world.

5: Adversarial Search

In which the authors demonstrate how to search when the world really is out to get you.

Pruning

I won't babble about αβ-pruning - just practice the concept here. For me, it was deceptively intuitive - I "got it" so "easily" that I neglected to follow the actual algorithm in a practice problem.

Patchwork Progress

I don't think it's a good idea to spend substantial time on quick fixes which slightly improve performance but don't scale in the limit. Elegant algorithms are often superior for reasons exceeding their aesthetic appeal.

Two examples of objectively-good but non-scalable fixes:

Quiescence search - sometimes, we have to stop evaluating a plan before the game is done; this can leave us in a deceptively good-looking state. Say I move my queen diagonal to your pawn and then I have to stop searching. A simple material evaluation function wouldn't see that the queen is about to get obliterated, so it returns a neutral score. Quiescence search considers the "stability" of a position and searches until it gets to quiescent states, providing a partial workaround for this problem. The search is constrained to certain types of moves.

Singular extension - we try historically-good moves when we reach the search's depth limit as a last-ditch effort to extend the search tree a bit further.

This is even more relevant to deep learning, where numerous engineering tricks are employed to eke out a slightly-improved classification accuracy. I agree that spending some effort working on local optimization of established methods is beneficial, but wouldn't it be higher expected utility to have more researchers studying the fundamentals and innovating new approaches?

6: Constraint Satisfaction Problems

In which the authors show us how to color maps real pretty.

Solving n-ary CSPs

A constraint satisfaction problem (CSP) is defined as follows:

X is a set of variables, {X1,...,Xn}.

D is a set of domains, {D1,...,Dn}.

C is a set of constraints that specify allowable combinations of variables.

Sounds intimidating, but it's really not. Let's say you have two variables: {X1,X2}. Each can be colored red or blue, so D={{red,blue},{red,blue}}. Pretend that each variable is a vertex and that they're connected by an edge; we want to 2-color this graph. Then C={X1≠X2}.

This gets tricky when we need to solve n-ary CSPs, which are those CSPs whose maximum constraint arity is n. Assume that the domains are discrete.

The main idea is that we want to break up these thickc constraints into mega-variables whose domains are exhaustive tuple enumerations of ways to satisfy the constraint. Then we just need to make sure that our chosen mega-solutions line up with the other mega- and normal variable assignments.

For each constraint Ck with variables Sk (|Sk|>2), make a new variable yk with domain

D′k={x∈∏Dj∈Domains(Sk)Dj:x satisfies Ck}.

In logical terms, D′k is the set of all models for Ck. For each new variable, instantiate binary constraints on the identities of the values of the shared variables.

Arc consistency can be viewed in a similar light; a variable Xi is arc-consistent with another variable Xj if for each value in Di, there exists at least once satisfactory assignment in

## Foreword

One of the fruits of growing older is revisiting your old favorites, whether they be foods, books, or songs. As you take your warm, rose-tinted mental bath, you appreciate subtleties which had escaped you, laugh heartily at jokes which hadn't even registered, and grasp concepts whose importance you had never fully realized. Similarly, some songs never lose their charm, no matter how abused the 'replay' button becomes.

AI: AMAis a paean to the triumphs of Computer Science and a rallying cry towards those hills we have yet to climb (ahem).## Exposition

My process for the first 60% of this 1,052-page behemoth was fairly normal: an hour or two of studying per day over the course of a few weeks. The last bit went a little differently.

## Whatever It Takes

Two days ago, my winter trimester ended; I had upwards of 400 pages to go, half of this post to write, and dozens of exercises to complete. I found myself with three full days to spare before my research resumed the proceeding week. I did what any young man in his early twenties would do - I concocted a plan for studying upwards of 30 hours in that time span.

I knew that this plan was preposterous; in all likelihood, I'd finish 3 chapters and burn out. That's why I wheeled out Murphyjitsu.

## Preparing for Failure Modes

Burnoutwas prepared for by:My ego does not deplete.I am 100% certain that I will succeed.Given enough time, I can understand anything.Mental exhaustion(distinct from emotional / physical burnout in that yourmindcan't process concepts as efficiently as it could seven hours ago when you started) was prepared for by:contrast itwith what would happen if I didn't follow through.Exercises taking foreverwas prevented. I performed a quick time profile of my exercise completion and found that the majority of my time was spent rereading concepts which hadn't sunk in sufficiently during the chapter. By doing relevant questions immediately after each section, I decreased time spent by about 40% and increased retention.Wanting to do other thingswas mitigated by setting aside an hour or two where I'd go to the dojo or spend time with friends. I also took a bit of time for myself each night, calling my family as usual.## The Outcome

Not only did I do it, but I finished a day early. The Murphyjitsu was invaluable; the failure modes I predicted came up and were dealt with by my precautions.

24 hours of studying over the last two days, and I enjoyed every moment.

## AI: A Modern Approach

If I found a concept confusing, I'll explain it in both intuitive and technical terms. Any criticisms are not directed towards the authors; it is their job to

distillthe field.## 1: Introduction

In which the authors define rationality (no, IEEE Computing Edge, books do not qualify as intelligent), provide an overview of related fields, and recount a brief history of AI.## 2: Intelligent Agents

In which the authors broadly define agent frameworks, environmental attributes, and the nature of learning.## Wireheading Cameo

Of course, this division only works if there is a Cartesian boundary between that-which-grades and that-which-acts. Which there isn't.

## Charming Philosophical Tangents

I don't know if I can answer the first question without more information, but assuming a Rawlsian veil of ignorance and considering the well-documented logarithmic hedonic effects of wealth, universal moderate poverty would be preferable. I leave the proof as an exercise for the diligent reader (it should be immediate after having read Chapter 16).

## 3: Solving Problems by Searching

In which the authors teach us to find what we're looking for.## Admissibility and Consistency

Heuristics are functions which estimate distance to the goal. Let g(s) be the cost to reach s in the current path, let the path cost of reaching state s′ from s via action a be c(s,a,s′), and let h be a heuristic. The total distance function is then

admissibleif it never overestimates the distance remaining. Admissibility frees us from needing to exhaustively explore every path (for fear of h having hid a good solution from us through overestimation). The heuristic h isconsistent(or, more helpfully,monotonic) if it obeys the triangle inequality:What this means: imagine that we have some admissible heuristic ha (for example, straight-line distance on a map). An admissible but inconsistent heuristic would then be:

This is clearly admissible, but also inconsistent - every h′a(s) evaluation is some pseudo-random number between 0 and the true distance to the goal!

Claim.All consistent heuristics are admissible.Proof.Let nk denote a state k actions from the goal, and let d be the true distance function.Base case(n1):Induction step(nk⇒nk+1):The inductive hypothesis is that h(nk)≤d(nk). Then

## Relaxation

Problem relaxation is a great way of finding admissible heuristics, and it's also a great way of approaching problems. Making potentially-unrealistic assumptions about your problem allows you to more freely explore the solution space: real-life examples include Shannon's formulation of a perfect chess algorithm in 1950, the formalization of idealized induction in Solomonoff Induction (can you induce who formalized it?), and Hutter's formulation of a perfectly rational AGI in 2000.2

## 4: Beyond Classical Search

In which the authors introduce ways to search using local information.## And-Or

Applying And-Or search can seem tricky, but it's really not. When an agent is operating under

partial observability(it isn't sure about the exact state of the world), it maintains abelief state(in this chapter, the set of all states the agent could be in). To be sure it will be in the goal state after following its plan, we whip out And-Or search:for eachstate we could be in now (∧), we need to findat least onesolution (∨).Sometimes the environment is

nondeterministic(actions have uncertain effects). In this case, we use And-Or search to construct acontingency plan, which consists of a series of if-then-else statements. Here, we control our actions, but not their effects; we will then findat least oneaction (∨) such that we have a solutionfor eachpotential outcome (∧). Think of this as min-maxing against an adversarial world.## 5: Adversarial Search

In which the authors demonstrate how to search when the world reallyisout to get you.## Pruning

I won't babble about αβ-pruning - just practice the concept here. For me, it was deceptively intuitive - I "got it" so "easily" that I neglected to follow the actual algorithm in a practice problem.

## Patchwork Progress

I don't think it's a good idea to spend substantial time on quick fixes which slightly improve performance but don't scale in the limit. Elegant algorithms are often superior for reasons exceeding their aesthetic appeal.

Two examples of objectively-good but non-scalable fixes:

Quiescence searchsometimes, we have to stop evaluating a plan before the game is done; this can leave us in a deceptively good-looking state. Say I move my queen diagonal to your pawn and then I have to stop searching. A simple material evaluation function wouldn't see that the queen is about to get obliterated, so it returns a neutral score. Quiescence search considers the "stability" of a position and searches until it gets to quiescent states, providing a partial workaround for this problem. The search is constrained to certain types of moves.-Singular extension- we try historically-good moves when we reach the search's depth limit as a last-ditch effort to extend the search tree a bit further.This is even more relevant to deep learning, where numerous engineering tricks are employed to eke out a slightly-improved classification accuracy. I agree that spending some effort working on local optimization of established methods is beneficial, but wouldn't it be higher expected utility to have more researchers studying the fundamentals and innovating new approaches?

## 6: Constraint Satisfaction Problems

In which the authors show us how to color maps real pretty.## Solving n-ary CSPs

A constraint satisfaction problem (CSP) is defined as follows:

Sounds intimidating, but it's really not. Let's say you have two variables: {X1,X2}. Each can be colored red or blue, so D={{red,blue},{red,blue}}. Pretend that each variable is a vertex and that they're connected by an edge; we want to 2-color this graph. Then C={X1≠X2}.

This gets tricky when we need to solve n-ary CSPs, which are those CSPs whose maximum constraint arity is n. Assume that the domains are discrete.

The main idea is that we want to break up these thickc constraints into mega-variables whose domains are exhaustive tuple enumerations of ways to satisfy the constraint. Then we just need to make sure that our chosen mega-solutions line up with the other mega- and normal variable assignments.

For each constraint Ck with variables Sk (|Sk|>2), make a new variable yk with domain

In logical terms, D′k is the set of all models for Ck. For each new variable, instantiate binary constraints on the identities of the values of the shared variables.

Arc consistencycan be viewed in a similar light; a variable Xi is arc-consistent with another variable Xj if for each value in Di, there exists at least once satisfactory assignment in