TurnTrout's shortform feed

by TurnTrout · 30th Jun 2019 · 186 comments
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

My maternal grandfather was the scientist in my family. I was young enough that my brain hadn't decided to start doing its job yet, so my memories with him are scattered and inconsistent and hard to retrieve. But there's no way that I could forget all of the dumb jokes he made; how we'd play Scrabble and he'd (almost surely) pretend to lose to me; how, every time he got to see me, his eyes would light up with boyish joy.

My greatest regret took place in the summer of 2007. My family celebrated the first day of the school year at an all-you-can-eat buffet, delicious food stacked high as the eye could fathom under lights of green, red, and blue. After a particularly savory meal, we made to leave the surrounding mall. My grandfather asked me to walk with him.

I was a child who thought to avoid being seen too close to uncool adults. I wasn't thinking. I wasn't thinking about hearing the cracking sound of his skull against the ground. I wasn't thinking about turning to see his poorly congealed blood flowing from his forehead out onto the floor. I wasn't thinking I would nervously watch him bleed for long minutes while shielding my seven-year-old brother from the sight. I wasn't thinking t

... (read more)

My mother told me my memory was indeed faulty. He never asked me to walk with him; instead, he asked me to hug him during dinner. I said I'd hug him "tomorrow".

But I did, apparently, want to see him in the hospital; it was my mother and grandmother who decided I shouldn't see him in that state.

Raemon (2y): <3
habryka (2y): Thank you for sharing.

Earlier today, I was preparing for an interview. I warmed up by replying stream-of-consciousness to imaginary questions I thought they might ask. Seemed worth putting here.

What do you think about AI timelines?

I’ve obviously got a lot of uncertainty. I’ve got a bimodal distribution, binning into “DL is basically sufficient and we need at most 1 big new insight to get to AGI” and “we need more than 1 big insight”.

So the first bin has most of its probability mass on 10-20 years from now, and the second is more like 45-80 years out, with positive skew.

A lot of things drive my uncertainty. One thing that drives how things turn out (but not really how fast we’ll get there) is: will we be able to tell we’re close 3+ years in advance, and if so, how quickly will the labs react? Gwern Branwen made a point a few months ago along the lines of: OpenAI has really been validated on the scaling hypothesis, and no one else is really betting big on it (because of stubbornness, incentives, etc.), despite the amazing progress from scaling. If that's true, then even if it's getting pretty clear that one approach is working better, we might see a slower pivot and have a more unipolar s

... (read more)
Ben Pace (1y): Wow.
William Walker (1y): Nice! Thanks!

Comment #1000 on LessWrong :)

niplav (2mo): With 5999 karma! Edit: Now 6000 – I weak-upvoted an old post of yours [https://www.lesswrong.com/posts/EvKWNRkJgLosgRDSa/lightness-and-unease] I hadn't upvoted before.

For the last two years, typing for 5+ minutes hurt my wrists. I tried a lot of things: shots, physical therapy, trigger-point therapy, acupuncture, massage tools, wrist and elbow braces at night, exercises, stretches. Sometimes it got better. Sometimes it got worse.

No Beat Saber, no lifting weights, and every time I read a damn book I would start translating the punctuation into Dragon NaturallySpeaking syntax.

Text: "Consider a bijection "

My mental narrator: "Cap consider a bijection space dollar foxtrot colon cap x backslash tango oscar cap y dollar"

Have you ever tried dictating a math paper in LaTeX? Or dictating code? Telling your computer "click" and waiting a few seconds while resisting the temptation to just grab the mouse? Dictating your way through a computer science PhD?

And then.... and then, a month ago, I got fed up. What if it was all just in my head, at this point? I'm only 25. This is ridiculous. How can it possibly take me this long to heal such a minor injury?

I wanted my hands back - I wanted it real bad. I wanted it so bad that I did something dirty: I made myself believe something. Well, actually, I pretended to be a person who really, really believed hi

... (read more)
DanielFilan (9mo): Is the problem still gone?

Still gone. I'm now sleeping without wrist braces and doing intense daily exercise, like bicep curls and pushups.

TurnTrout (9mo): Totally 100% gone. Sometimes I go weeks forgetting that pain was ever part of my life.
Vanessa Kosoy (1y): I'm glad it worked :) It's not that surprising given that pain is known to be susceptible to the placebo effect. I would link the SSC post, but, alas...
Raj Thimmiah (3mo): You able to link to it now?
Teerth Aloke (1y): This is unlike anything I have heard!
mingyuan (1y): It's very similar to what John Sarno (author of Healing Back Pain and The Mindbody Prescription) preaches, as well as Howard Schubiner. There's also a rationalist-adjacent dude who started a company (Axy Health [https://www.axyhealth.com/]) based on these principles. Fuck if I know how any of it works though, and it doesn't work for everyone. Congrats though TurnTrout!
Teerth Aloke (1y): It seems my Dad might have a psychosomatic stomach ache. How can I convince him to convince himself that he has no problem?
mingyuan (1y): If you want to try out the hypothesis, I recommend that he (or you, if he's not receptive to it) read Sarno's book [https://smile.amazon.com/Mindbody-Prescription-Healing-Body-Pain/dp/0446675156/ref=sr_1_1?crid=13M4LF1VEWLRD&dchild=1&keywords=the+mindbody+prescription&qid=1593406066&sprefix=the+mindbody+%2Caps%2C226&sr=8-1]. I want to reiterate that it does not work in every situation, but you're welcome to take a look.
Steven Byrnes (5mo): Me too! [https://www.lesswrong.com/posts/urvyvjBFSAnP3aPvN/wrist-issues?commentId=SfpsQ4pP359SQxXdn]
TurnTrout (5mo): There's a reasonable chance that my overcoming RSI was causally downstream of that exact comment of yours.
Steven Byrnes (5mo): Happy to have (maybe) helped! :-)
avturchin (1y): Looks like a reverse stigmata effect.
Raemon (1y): Woo faith healing! (hope this works out longterm, and doesn't turn out to be secretly hurting still)
TurnTrout (1y): aren't we all secretly hurting still?
mingyuan (1y): ....D:

For quite some time, I've disliked wearing glasses. However, my eyes are sensitive, so I dismissed the possibility of contacts.

Over break, I realized I could still learn to use contacts, it would just take me longer. Sure enough, it took me an hour and five minutes to put in my first contact, and I couldn't get it out on my own. An hour of practice later, I put in a contact on my first try, and took it out a few seconds later. I'm very happily wearing contacts right now, as a matter of fact.

I'd suffered glasses for over fifteen years because of a cached decision – because I didn't think to rethink something literally right in front of my face every single day.

What cached decisions have you not reconsidered?

I think instrumental convergence also occurs in the model space for machine learning. For example, many different architectures likely learn edge detectors in order to minimize classification loss on MNIST. But wait - you'd also learn edge detectors to maximize classification loss on MNIST (loosely, getting 0% on a multiple-choice exam requires knowing all of the right answers). I bet you'd learn these features for a wide range of cost functions. I wonder if that's already been empirically investigated?

And, same for adversarial features. And perhaps, same for mesa optimizers (understanding how to stop mesa optimizers from being instrumentally convergent seems closely related to solving inner alignment). 

What can we learn about this?
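Here's a minimal sketch of how one might poke at this empirically (a hypothetical experiment made up for illustration, not an existing study; it assumes PyTorch and torchvision are available): train two identical small convnets on MNIST, one minimizing cross-entropy and one maximizing it, then compare their first-layer filters to see whether both end up looking like edge detectors.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def make_net():
    # Tiny convnet: one conv layer whose 5x5 filters we can inspect afterwards.
    return nn.Sequential(
        nn.Conv2d(1, 8, 5), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(), nn.Linear(8 * 12 * 12, 10),
    )

data = datasets.MNIST(".", train=True, download=True, transform=transforms.ToTensor())
loader = DataLoader(data, batch_size=128, shuffle=True)

nets = {"minimize": make_net(), "maximize": make_net()}
for sign, net in [(1.0, nets["minimize"]), (-1.0, nets["maximize"])]:
    # sign = -1 maximizes cross-entropy by minimizing its negative.
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for x, y in loader:  # one epoch each, as a rough probe
        loss = sign * nn.functional.cross_entropy(net(x), y)
        opt.zero_grad(); loss.backward(); opt.step()

# First-layer filters for each run; visualize or cluster these to check whether
# both objectives produce edge-detector-like weights.
for name, net in nets.items():
    print(name, net[0].weight.data.shape)  # torch.Size([8, 1, 5, 5])
```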

evhub (1y): A lot of examples of this sort of stuff show up in OpenAI clarity's circuits analysis work [https://distill.pub/2020/circuits/]. In fact, this is precisely their Universality hypothesis [https://distill.pub/2020/circuits/zoom-in/]. See also my discussion here [https://www.lesswrong.com/posts/MG4ZjWQDrdpgeu8wG/zoom-in-an-introduction-to-circuits].

While reading Focusing today, I thought about the book and wondered how many exercises it would have. I felt a twinge of aversion. In keeping with my goal of increasing internal transparency, I said to myself: "I explicitly and consciously notice that I felt averse to some aspect of this book".

I then Focused on the aversion. Turns out, I felt a little bit disgusted, because a part of me reasoned thusly:

If the book does have exercises, it'll take more time. That means I'm spending reading time on things that aren't math textbooks. That means I'm slowing down.

(Transcription of a deeper Focusing on this reasoning)

I'm afraid of being slow. Part of it is surely the psychological remnants of the RSI I developed in the summer of 2018. That is, slowing down is now emotionally associated with disability and frustration. There was a period of meteoric progress as I started reading textbooks and doing great research, and then there was pain. That pain struck even when I was just trying to take care of myself, sleep, open doors. That pain then left me on the floor of my apartment, staring at the ceiling, desperately willing my hands to just get better. They didn't (for a long while), so I

... (read more)

I passed a homeless man today. His face was wracked with pain, body rocking back and forth, eyes clenched shut. A dirty sign lay forgotten on the ground: "very hungry".

This man was once a child, with parents and friends and dreams and birthday parties and maybe siblings he'd get in arguments with and snow days he'd hope for.

And now he's just hurting.

And now I can't help him without abandoning others. So he's still hurting. Right now.

Reality is still allowed to make this happen. This is wrong. This has to change.

Said Achmiz (2y): How would you help this man, if having to abandon others in order to do so were not a concern? (Let us assume that someone else—someone whose competence you fully trust, and who will do at least as good a job as you will—is going to take care of all the stuff you feel you need to do.) What is it you had in mind to do for this fellow—specifically, now—that you can’t (due to those other obligations)?

Suppose I actually cared about this man with the intensity he deserved - imagine that he were my brother, father, or best friend.

The obvious first thing to do before interacting further is to buy him a good meal and a healthy helping of groceries. Then, I need to figure out his deal. Is he hurting, or is he also suffering from mental illness?

If the former, I'd go the more straightforward route of befriending him, helping him purchase a sharp business professional outfit, teaching him to interview and present himself with confidence, and helping him secure an apartment and find a job.

If the latter, this gets trickier. I'd still try and befriend him (consistently being a source of cheerful conversation and delicious food would probably help), but he might not be willing or able to get the help he needs, and I wouldn't have the legal right to force him. My best bet might be to enlist the help of a psychological professional for these interactions. If this doesn't work, my first thought would be to influence the local government to get the broader problem fixed (I'd spend at least an hour considering other plans before proceeding further, here). Realistically, there's ... (read more)

Said Achmiz (2y): Well, a number of questions may be asked here (about desert, about causation, about autonomy, etc.). However, two seem relevant in particular:

First, it seems as if (in your latter scenario) you’ve arrived (tentatively, yes, but not at all unreasonably!) at a plan involving systemic change. As you say, there is quite a bit of effort being expended on this sort of thing already, so, at the margin, any effective efforts on your part would likely be both high-level and aimed in an at-least-somewhat-unusual direction. … yet isn’t this what you’re already doing?

Second, and unrelatedly… you say: Yet it seems to me that, empirically, most people do not expend the level of effort which you describe, even for their siblings, parents, or close friends. Which is to say that the level of emotional and practical investment you propose to make (in this hypothetical situation) is, actually, quite a bit greater than that which most people invest in their family members or close friends.

The question, then, is this: do you currently make this degree of investment (emotional and practical) in your actual siblings, parents, and close friends? If so—do you find that you are unusual in this regard? If not—why not?
… yet isn’t this what you’re already doing?

I work on technical AI alignment, so some of those I help (in expectation) don't even exist yet. I don't view this as what I'd do if my top priority were helping this man.

The question, then, is this: do you currently make this degree of investment (emotional and practical) in your actual siblings, parents, and close friends? If so—do you find that you are unusual in this regard? If not—why not?

That's a good question. I think the answer is yes, at least for my close family. Recently, I've expended substantial energy persuading my family to sign up for cryonics with me, winning over my mother, brother, and (I anticipate) my aunt. My father has lingering concerns which I think he wouldn't have upon sufficient reflection, so I've designed a similar plan for ensuring he makes what I perceive to be the correct, option-preserving choice. For example, I made significant targeted donations to effective charities on his behalf to offset (what he perceives as) a considerable drawback of cryonics: his inability to also be an organ donor.

A universe in which humanity wins but my dad is gone would be quite sad t... (read more)

Raemon (2y): I predict that this comment is not helpful to Turntrout.
Raemon (2y): :( Song I wrote about this once [https://soundcloud.com/raymond-arnold/tuesday] (not very polished)

Weak derivatives

In calculus, the product rule says $(fg)' = f'g + fg'$. The fundamental theorem of calculus says that the Riemann integral acts as the anti-derivative.[1] Combining these two facts, we derive integration by parts: $\int_a^b f'(x)g(x)\,dx = f(b)g(b) - f(a)g(a) - \int_a^b f(x)g'(x)\,dx$.

It turns out that we can use these two properties to generalize the derivative to match some of our intuitions on edge cases. Let's think about the absolute value function:

Image from Wikipedia

The boring old normal derivative isn't defined at $x=0$, but it seems like it'd make sense to be able to say that the derivative there is e.g. $0$. Why might this make sense?

Taylor's theorem (and its generalizations) characterizes the first derivative as giving a tangent line with slope $f'(x_0)$ which provides a good local approximation of $f$ around $x_0$: $f(x) \approx f(x_0) + f'(x_0)(x - x_0)$. You can prove that this is the best approximation you can get using only $f(x_0)$ and $f'(x_0)$! In the absolute value example, defining the "derivative" to be zero at $x=0$ would minimize approximation error on average in neighborhoods around the origin.

In multivariable calculus, the Jacobian is a tangent plane which again minimizes approximation error (with respect to the Eucli

... (read more)
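For reference, here is the standard definition this post seems to be building toward (textbook material, not text recovered from the truncated post): a function $g$ is a weak derivative of $f$ on $[a,b]$ if integration by parts holds against every smooth test function $\phi$ with $\phi(a) = \phi(b) = 0$:

$$\int_a^b f(x)\,\phi'(x)\,dx = -\int_a^b g(x)\,\phi(x)\,dx.$$

For $f(x) = |x|$, $g(x) = \operatorname{sign}(x)$ works, and so does any $g$ that redefines the value at $0$ (to $0$, say), since changing a single point doesn't change the integrals.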
TurnTrout (1y): The reason f′(0) is undefined for the absolute value function is that you need the value to be the same for all sequences converging to 0 – both from the left and from the right. There's a nice way to motivate this in higher-dimensional settings by thinking about the action of e.g. complex multiplication, but this is a much stronger notion than real differentiability and I'm not quite sure how to think about motivating the single-valued real case yet. Of course, you can say things like "the theorems just work out nicer if you require both the lower and upper limits be the same"...
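Concretely (standard single-variable calculus, added for illustration), the two one-sided difference quotients of $|x|$ at $0$ disagree, so no single value works:

$$\lim_{h \to 0^+} \frac{|h| - |0|}{h} = 1, \qquad \lim_{h \to 0^-} \frac{|h| - |0|}{h} = -1.$$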

Listening to Eneasz Brodski's excellent reading of Crystal Society, I noticed how curious I am about how AGI will end up working. How are we actually going to do it? What are those insights? I want to understand quite badly, which I didn't realize until experiencing this (so far) intelligently written story.

Similarly, how do we actually "align" agents, and what are good frames for thinking about that?

Here's to hoping we don't sate the former curiosity too early.

Good, original thinking feels present to me - as if mental resources are well-allocated.

The thought which prompted this:

Sure, if people are asked to solve a problem and say they can't after two seconds, yes - make fun of that a bit. But that two seconds covers more ground than you might think, due to System 1 precomputation.

Reacting to a bit of HPMOR here, I noticed something felt off about Harry's reply to the Fred/George-tried-for-two-seconds thing. Having a bit of experience noticing confusion, I did not think "I notice I am confused" (although this can be useful). I did not think "Eliezer probably put thought into this", or "Harry is kinda dumb in certain ways - so what if he's a bit unfair here?". Without resurfacing, or distraction, or wondering if this train of thought is more fun than just reading further, I just thought about the object-level exchange.

People need to allocate mental energy wisely; this goes far beyond focusing on important tasks. Your existing mental skillsets already optimize and auto-pilot certain mental motions for you, so you should allocate less deliberation to them. In this case, the confusion-noticing module was honed; by not worrying about how w

... (read more)
TurnTrout (2y): Expanding on this, there is an aspect of Actually Trying that is probably missing from S1 precomputation. So, maybe the two-second "attempt" is actually useless for most people because subconscious deliberation isn't hardass enough at giving its all, at making desperate and extraordinary efforts to solve the problem.

If you're tempted to write "clearly" in a mathematical proof, the word quite likely glosses over a key detail you're confused about. Use that temptation as a clue for where to dig in deeper.

At least, that's how it is for me.

From my Facebook

My life has gotten a lot more insane over the last two years. However, it's also gotten a lot more wonderful, and I want to take time to share how thankful I am for that.

Before, life felt like... a thing that you experience, where you score points and accolades and check boxes. It felt kinda fake, but parts of it were nice. I had this nice cozy little box that I lived in, a mental cage circumscribing my entire life. Today, I feel (much more) free.

I love how curious I've become, even about "unsophisticated" things. Near dusk, I walked the winter wonderland of Ogden, Utah with my aunt and uncle. I spotted this gorgeous red ornament hanging from a tree, with a hunk of snow stuck to it at north-east orientation. This snow had apparently decided to defy gravity. I just stopped and stared. I was so confused. I'd kinda guessed that the dry snow must induce a huge coefficient of static friction, hence the winter wonderland. But that didn't suffice to explain this. I bounded over and saw the smooth surface was iced, so maybe part of the snow melted in the midday sun, froze as evening advanced, and then the part-ice part-snow chunk stuck much more solidly to the ornament.

Mayb

... (read more)

Yesterday, I put the finishing touches on my chef d'œuvre, a series of important safety-relevant proofs I've been striving for since early June. Strangely, I felt a great exhaustion come over me. These proofs had been my obsession for so long, and now - now, I'm done.

I've had this feeling before; three years ago, I studied fervently for a Google interview. The literal moment the interview concluded, a fever overtook me. I was sick for days. All the stress and expectation and readiness-to-fight which had been pent up, released.

I don't know why this happens. But right now, I'm still a little tired, even after getting a good night's sleep.

Hazard (2y): This happens to me sometimes. I know several people who have this happen at the end of a Uni semester. Hope you can get some rest.

I went to the doctor's yesterday. This was embarrassing for them on several fronts.

First, I had to come in to do an appointment which could be done over telemedicine, but apparently there are regulations against this.

Second, while they did temp checks and required masks (yay!), none of the nurses or doctors actually wore anything stronger than a surgical mask. I'm coming in here with a KN95 + goggles + face shield because why not take cheap precautions to reduce the risk, and my own doctor is just wearing a surgical? I bought 20 KN95s for, like, 15 bucks on Amazon.

Third, and worst of all, my own doctor spouted absolute nonsense. The mildest insinuation was that surgical facemasks only prevent transmission, but I seem to recall that many kinds of surgical masks halve your chances of infection as well.

Then, as I understood it, he first claimed that coronavirus and the flu have comparable case fatality rates. I wasn't sure if I'd heard him correctly - this was an expert talking about his area of expertise, so I felt like I had surely misunderstood him. I was taken aback. But, looking back, that's what he meant.

He went on to suggest that we can't expect COVID immunity to last (wrong) b... (read more)

mingyuan (8mo): Eli just took a plane ride to get to CA and brought a P100, but they told him he had to wear a cloth mask, that was the rule. So he wore a cloth mask under the P100, which of course broke the seal. I feel you.
ChristianKl (8mo): I don't think that policy is unreasonable for a plane ride. Just because someone wears a P100 mask doesn't mean that their mask filters outgoing air, as that's not the design goal for most of the use cases of P100 masks. Checking on a case-by-case basis whether a particular P100 mask differs from the average P100 mask is likely not feasible in that context.
Dagon (8mo): What do you call the person who graduates last in their med school class? Doctor. And remember that GPs are weighted toward the friendly area of doctor-quality space rather than the hyper-competent. Further remember that consultants (including experts on almost all topics) are generally narrow in their understanding of things - even if they are well above the median at their actual job (for a GP, dispensing common medication and identifying situations that need referral to a specialist), that doesn't indicate they're going to be well-informed even on adjacent topics. That said, this level of misunderstanding on topics that impact patient behavior and outcome (mask use, other virus precautions) is pretty sub-par. The cynic in me estimates it's the bottom quartile of front-line medical providers, but I hope it's closer to the bottom decile. Looking into an alternate provider seems quite justified.
ChristianKl (8mo): In the US that isn't the case. There are limited places for internships, and the worst person in medical school might not get a place for an internship and thus is not allowed to be a doctor. The medical system is heavily gated to keep people out.

When I notice I feel frustrated, unproductive, lethargic, etc, I run down a simple checklist:

  • Do I need to eat food?
  • Am I drinking lots of water?
  •  Have I exercised today?
  • Did I get enough sleep last night? 
    • If not, what can I do now to make sure I get more tonight?
  • Have I looked away from the screen recently?
  • Have I walked around in the last 20 minutes?

It's simple, but 80%+ of the time, it fixes the issue.

Viliam (1y): There is a "HALT: hungry? angry? lonely? tired?" mnemonic, but I like that your list includes water and walking and exercise. Now just please make it easier to remember.
AllAmericanBreakfast (1y): How about THREES: Thirsty Hungry Restless Eyestrain Exercise?
Matt Goldenberg (1y): Hey can I steal this for a course I'm teaching? (I'll give you credit).
TurnTrout (1y): sure!

Judgment in Managerial Decision Making says that (subconscious) misapplication of e.g. the representativeness heuristic causes insensitivity to base rates and to sample size, failure to reason about probabilities correctly, failure to consider regression to the mean, and the conjunction fallacy. My model of this is that representativeness / availability / confirmation bias work off of a mechanism somewhat similar to attention in neural networks: due to how the brain performs time-limited search, more salient/recent memories get prioritized for recall.

The availability heuristic goes wrong when our saliency-weighted perception of the frequency of events is a biased estimator of the real frequency, or maybe when we just happen to be extrapolating off of a very small sample size. Concepts get inappropriately activated in our mind, and we therefore reason incorrectly. Attention also explains anchoring: you can more readily bring to mind things related to your anchor due to salience.

The case for confirmation bias seems to be a little more involved: first, we had evolutionary pressure to win arguments, which means our search is meant to find supportive arguments and avoid even subconscio

... (read more)

I feel very excited by the AI alignment discussion group I'm running at Oregon State University. Three weeks ago, most attendees didn't know much about "AI security mindset"-ish considerations. This week, I asked the question "what, if anything, could go wrong with a superhuman reward maximizer which is rewarded for pictures of smiling people? Don't just fit a bad story to the reward function. Think carefully."

There was some discussion and initial optimism, after which someone said "wait, those optimistic solutions are just the ones you'd prioritize! What's that called, again?" (It's called anthropomorphic optimism)

I'm so proud.

With respect to the integers, 2 is prime. But with respect to the Gaussian integers, it's not: it has the factorization $2 = (1+i)(1-i)$. Here's what's happening.

You can view complex multiplication as scaling and rotating the complex plane. So, when we take our unit vector 1 and multiply by $(1+i)$, we're scaling it by $\sqrt{2}$ and rotating it counterclockwise by $45°$:

This gets us to the purple vector. Now, we multiply by $(1-i)$, scaling it up by $\sqrt{2}$ again (in green), and rotating it clockwise again by the same amount. You can even deal with the scaling and rotations separately (scale twice by $\sqrt{2}$, with zero net rotation).
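A quick sanity check of the factorization and the scale-and-rotate picture, using ordinary Python complex arithmetic (illustrative only, not from the original post):

```python
import cmath

z1, z2 = 1 + 1j, 1 - 1j  # the Gaussian-integer factors of 2

# Their product recovers 2, so 2 is not prime in the Gaussian integers.
print(z1 * z2)                           # (2+0j)

# Each factor scales by sqrt(2) and rotates by +/- 45 degrees (pi/4 radians).
print(abs(z1), abs(z2))                  # 1.414..., 1.414...
print(cmath.phase(z1), cmath.phase(z2))  # 0.785..., -0.785...
```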

The Pfizer phase 3 study's last endpoint is 7 days after the second shot. Does anyone know why the CDC recommends waiting 2 weeks for full protection? Are they just being the CDC again?

jimrandomh (2mo): People don't really distinguish between "I am protected" and "I am safe for others to be around". If someone got infected prior to their vaccination and had a relatively-long incubation period, they could infect others; I don't think it's a coincidence that two weeks is also the recommended self-isolation period for people who may have been exposed.

Suppose you could choose how much time to spend at your local library, during which:

  • you do not age. Time stands still outside; no one enters or exits the library (which is otherwise devoid of people).
  • you don't need to sleep/eat/get sunlight/etc
  • you can use any computers, but not access the internet or otherwise bring in materials with you
  • you can't leave before the requested time is up

Suppose you don't go crazy from solitary confinement, etc. Remember that value drift is a potential thing.

How long would you ask for?

FactorialCode (1y): How good are the computers?
TurnTrout (1y): Windows machines circa ~2013. Let’s say 128GB hard drives which magically never fail, for 10 PCs.
FactorialCode (1y): Probably 3-5 years then. I'd use it to get a stronger foundation in low-level programming skills, math, and physics. The limiting factors would be entertainment in the library to keep me sane and the inevitable degradation of my social skills from so much time spent alone.

When proving theorems for my research, I often take time to consider the weakest conditions under which the desired result holds - even if it's just a relatively unimportant and narrow lemma. By understanding the weakest conditions, you isolate the load-bearing requirements for the phenomenon of interest. I find this helps me build better gears-level models of the mathematical object I'm studying. Furthermore, understanding the result in generality allows me to recognize analogies and cross-over opportunities in the future. Lastly, I just find this plain satisfying.

I remarked to my brother, Josh, that when most people find themselves hopefully saying "here's how X can still happen!", it's a lost cause and they should stop grasping for straws and move on with their lives. Josh grinned, pulled out his cryonics necklace, and said "here's how I can still not die!"

Does Venting Anger Feed or Extinguish the Flame? Catharsis, Rumination, Distraction, Anger, and Aggressive Responding

Does distraction or rumination work better to defuse anger? Catharsis theory predicts that rumination works best, but empirical evidence is lacking. In this study, angered participants hit a punching bag and thought about the person who had angered them (rumination group) or thought about becoming physically fit (distraction group). After hitting the punching bag, they reported how angry they felt. Next, they were given the chance to admini

... (read more)
MakoYass (10mo): It would be interesting to see a more long-term study about habits around processing anger. For instance, randomly assigning people different advice about processing anger (likely to have quite an impact on them; I don't think the average person receives much advice in that class) and then checking in on them a few years later and asking them things like: how many enemies do they have, how many enemies have they successfully defeated, how many of their interpersonal issues do they resolve successfully?
Raemon (10mo): Boggling a bit at the "can you actually reliably find angry people and/or make people angry on purpose?"
capybaralet (9mo): I found this fascinating... it's rare these days that I see some fundamental assumption in my thinking that I didn't even realize I was making laid bare like this... it is particularly striking because I think I could easily have realized that my own experience contradicts catharsis theory... I know that I can distract myself to become less angry, but I usually don't want to, in the moment. I think that desire is driven by emotion, but rationalized via something like catharsis theory. I want to try and rescue catharsis theory by saying that maybe there are negative long-term effects of being distracted from feelings of anger (e.g. a build up of resentment). I wonder how much this is also a rationalization. I also wonder how accurately the authors have characterized catharsis theory, and how much to identify it with the "hydraulic model of anger"... I would imagine that there are lots of attempts along the lines of what I suggested to try and rescue catharsis theory by refining or moving away from the hydraulic model. A highly general version might claim: "over a long time horizon, not 'venting' anger is net negative".

This might be the best figure I've ever seen in a textbook. Talk about making a point! 

Molecular Biology of the Cell, Alberts.

An exercise in the companion workbook to the Feynman Lectures on Physics asked me to compute a rather arduous numerical simulation. At first, this seemed like a "pass" in favor of an exercise more amenable to analytic and conceptual analysis; arithmetic really bores me. Then, I realized I was being dumb - I'm a computer scientist.

Suddenly, this exercise became very cool, as I quickly figured out the equations and code, crunched the numbers in an instant, and churned out a nice scatterplot. This seems like a case where cross-domain competence is unusually h

... (read more)

Amazing how much I can get done if I chant to myself "I'm just writing two pages of garbage abstract/introduction/related work, it's garbage, it's just garbage, don't fix it rn, keep typing"

I never thought I'd be seriously testing the reasoning abilities of an AI in 2020

Looking back, history feels easy to predict; hindsight + the hard work of historians makes it (feel) easy to pinpoint the key portents. Given what we think about AI risk, in hindsight, might this have been the most disturbing development of 2020 thus far? 

I personally lean towards "no", because this scaling seemed somewhat predictable from GPT-2 (flag - possible hindsight bias), and because 2020 has been so awful so far. But it seems possible, at least. I don't rea... (read more)

DL so far has been easy to predict - if you bought into a specific theory of connectionism & scaling espoused by Schmidhuber, Moravec, Sutskever, and a few others, as I point out in https://www.gwern.net/newsletter/2019/13#what-progress & https://www.gwern.net/newsletter/2020/05#gpt-3 . Even the dates are more or less correct! The really surprising thing is that that particular extreme fringe lunatic theory turned out to be correct. So the question is, was everyone else wrong for the right reasons (similar to the Greeks dismissing heliocentrism for excellent reasons yet still being wrong), or wrong for the wrong reasons, and why, and how can we prevent that from happening again and spending the next decade being surprised in potentially very bad ways?

Over the last 2.5 years, I've read a lot of math textbooks. Not using Anki / spaced repetition systems over that time has been an enormous mistake. My factual recall seems worse-than-average among my peers, but when supplemented with Anki, it's far better than average (hence, I was able to learn 2000+ Japanese characters in 90 days, in college). 

I considered using Anki for math in early 2018, but I dismissed it quickly because I hadn't had good experience using that application for things which weren't languages. I should have at least tried to see if... (read more)

NaiveTortoise (7mo): I'm curious what sort of things you're Anki-fying (e.g. a few examples for measure theory).
TurnTrout (7mo): https://ankiweb.net/shared/info/511421324

An additional consideration for early work on interpretability: it slightly increases the chance we actually get an early warning shot. If a system misbehaves, we can inspect its cognition and (hopefully) find hints of intentional deception. Could motivate thousands of additional researcher-hours being put into alignment.

Raemon (1y): That's an interesting point.

Today, let's read about GPT-3's obsession with Shrek

As for me, I think Shrek is important because the most valuable thing in life is happiness. I mean this quite literally. There's a mountain of evidence for it, if you're willing to look at the research. And I think movies can help us get there. Or at least not get in the way.

Now, when I say "happiness," I'm not talking about the transient buzz that you get from, say, heroin. I'm talking about a sense of fulfillment. A sense that you are where you're meant to be. That you are doing what you're meant

... (read more)
ChristianKl (1y): What's the input that produced the text from GPT-3?
TurnTrout (1y): Two Sequences posts... lol... Here's the full transcript [https://aidungeon.page.link/?link=https://exploreViewAdventure?publicId=f352d549-9f35-49cf-a363-095b88d52385&ofl=https://play.aidungeon.io/adventure/f352d549-9f35-49cf-a363-095b88d52385&apn=com.aidungeon&ibi=com.aidungeon.app&isi=1491268416].

Cool Math Concept You Never Realized You Wanted: Fréchet distance.

Imagine a man traversing a finite curved path while walking his dog on a leash, with the dog traversing a separate one. Each can vary their speed to keep slack in the leash, but neither can move backwards. The Fréchet distance between the two curves is the length of the shortest leash sufficient for both to traverse their separate paths. Note that the definition is symmetric with respect to the two curves – the Fréchet distance would be the same if the dog was walking its owner.

The Fréche

... (read more)
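Here's a small sketch of the discrete Fréchet distance (the standard dynamic-programming approximation on sampled curves); the example polylines are made up for illustration:

```python
import math
from functools import lru_cache

def discrete_frechet(p, q):
    """Discrete Fréchet distance between polylines p and q (lists of (x, y) points):
    the shortest leash that lets both walkers traverse their paths moving only forward."""
    @lru_cache(maxsize=None)
    def c(i, j):
        # Shortest leash needed for the walkers to reach p[i] and q[j].
        if i == 0 and j == 0:
            return math.dist(p[0], q[0])
        if i == 0:
            return max(c(0, j - 1), math.dist(p[0], q[j]))
        if j == 0:
            return max(c(i - 1, 0), math.dist(p[i], q[0]))
        return max(min(c(i - 1, j), c(i - 1, j - 1), c(i, j - 1)), math.dist(p[i], q[j]))

    return c(len(p) - 1, len(q) - 1)

man = [(0, 0), (1, 0), (2, 0)]
dog = [(0, 1), (1, 2), (2, 1)]
print(discrete_frechet(man, dog))  # 2.0 - the shortest sufficient leash
```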

Earlier today, I became curious why extrinsic motivation tends to preclude or decrease intrinsic motivation. This phenomenon is known as overjustification. There are likely agreed-upon theories for this, but here's some stream-of-consciousness as I reason and read through summarized experimental results. (ETA: Looks like there isn't consensus on why this happens)

My first hypothesis was that recognizing external rewards somehow precludes activation of curiosity-circuits in our brain. I'm imagining a kid engrossed in a puzzle. Then, they're told that they'll b

... (read more)

Virtue ethics seems like model-free consequentialism to me.

JohnSteidley (1y): I was thinking along similar lines! From my notes from 2019-11-24: "Deontology is like the learned policy of bounded rationality of consequentialism"

The discussion of the HPMOR epilogue in this recent April Fool's thread was essentially online improv, where no one could acknowledge that without ruining the pretense. Maybe I should do more improv in real life, because I enjoyed it!

What kind of reasoning would have allowed me to see MySpace in 2004, and then hypothesize the current craziness as a plausible endpoint of social media? Is this problem easier or harder than the problem of 15-20 year AI forecasting?

unparadoxed (4mo): Hmm, maybe it would be easier if we focused on one kind/example of craziness. Is there a particular one you have in mind?

If Hogwarts spits back an error if you try to add a non-integer number of house points, and if you can explain the busy beaver function to Hogwarts, you now have an oracle which answers whether $k$ divides $\operatorname{BB}(n)$ for arbitrary $n$ and $k$: just state "$\operatorname{BB}(n)/k$ points to Ravenclaw!" and see whether Hogwarts errors. You can do this for other problems which reduce to divisibility tests (so, any decision problem which you can somehow get Hogwarts to compute as a quotient that is an integer iff the answer is "yes").

Homework: find a way to safely take over the world using this power, and no other magic. 

Measure (5mo): I'd be worried about integer overflow with that protocol. If it can understand BB and division, you can probably just ask for the remainder directly and observe the change.

When I imagine configuring an imaginary pile of blocks, I can feel the blocks in front of me in this fake imaginary plane of existence. I feel aware of their spatial relationships to me, in the same way that it feels different to have your eyes closed in a closet vs in an empty auditorium. 

But what is this mental workspace? Is it disjoint and separated from my normal spatial awareness, or does my brain copy/paste->modify my real-life spatial awareness. Like, if my brother is five feet in front of me, and then I imagine a blade flying five feet in f... (read more)

AIDungeon's subscriber-only GPT-3 can do some complex arithmetic, but it's very spotty. Bold text is me.

You say "What happens if I take the square root of 3i?" 

The oracle says: "You'll get a negative number. [wrong] So, for example, the square root of  is ." [correct]
"What?" you say.
 "I just said it," the oracle repeats. 
"But that's ridiculous! The square root of  is not . It's complex. It's  plus a multiple of ." [wrong, but my character is supposed to be playing dumb here]

The

... (read more)

The new "Broader Impact" NeurIPS statement is a good step, but incentives are misaligned. Admitting fatally negative impact would set a researcher back in their career, as the paper would be rejected. 

Idea: Consider a dangerous paper which would otherwise have been published. What if that paper were published title-only on the NeurIPS website, so that the researchers can still get career capital?

Problem: How do you ensure resubmission doesn't occur elsewhere?

Daniel Kokotajlo (1y): The people at NeurIPS who reviewed the paper might notice if resubmission occurred elsewhere? Automated tools might help with this, by searching for specific phrases. There's been talk of having a Journal of Infohazards. Seems like an idea worth exploring to me. Your suggestion sounds like a much more feasible first step. Problem: Any entity with halfway decent hacking skills (such as a national government, or clever criminal) would be able to peruse the list of infohazardy titles, look up the authors, cyberstalk them, and then hack into their personal computer and steal the files. We could hope that people would take precautions against this, but I'm not very optimistic. That said, this still seems better than the status quo.

Sentences spoken aloud are a latent space embedding of our thoughts; when trying to move a thought from our mind to another's, our thoughts are encoded with the aim of minimizing the other person's decoder error.

Broca’s area handles syntax, while Wernicke’s area handles the semantic side of language processing. Subjects with damage to the latter can speak in syntactically fluent jargon-filled sentences (fluent aphasia) – and they can’t even tell their utterances don’t make sense, because they can’t even make sense of the words leaving their own mouth!

It seems like GPT2 : Broca’s area :: ??? : Wernicke’s area. Are there any cog psych/AI theories on this?

Going through an intro chem textbook, it immediately strikes me how this should be as appealing and mysterious as the alchemical magic system of Fullmetal Alchemist. "The law of equivalent exchange" "conservation of energy/elements/mass (the last two holding only for normal chemical reactions)", etc. If only it were natural to take joy in the merely real...

Hazard (2y): Have you been continuing your self-study schemes into realms beyond math stuff? If so I'm interested in both the motivation and how it's going! I remember having little interest in other non-physics science growing up, but that was also before I got good at learning things and my enjoyment was based on how well it was presented.

TurnTrout (2y): Yeah, I've read a lot of books since my reviews fell off last year, most of them still math. I wasn't able to type reliably until early this summer, so my reviews kinda got derailed. I've read Visual Group Theory, Understanding Machine Learning, Computational Complexity: A Conceptual Perspective, Introduction to the Theory of Computation, An Illustrated Theory of Numbers, most of Tadellis' Game Theory, the beginning of Multiagent Systems, parts of several graph theory textbooks, and I'm going through Munkres' Topology right now. I've gotten through the first fifth of the first Feynman lectures, which has given me an unbelievable amount of mileage for generally reasoning about physics. I want to go back to my reviews, but I just have a lot of other stuff going on right now. Also, I run into fewer basic confusions than when I was just starting at math, so I generally have less to talk about. I guess I could instead try and re-present the coolest concepts from the book. My "plan" is to keep learning math until the low graduate level (I still need to at least do complex analysis, topology, field / ring theory, ODEs/PDEs, and something to shore up my atrocious trig skills, and probably more)[1], and then branch off into physics + a "softer" science (anything from microecon to psychology). CS ("done") -> math -> physics -> chem -> bio is the major track for the physical sciences I have in mind, but that might change. I dunno, there's just a lot of stuff I still want to learn. :)
[1] I also still want to learn Bayes nets, category theory, get a much deeper understanding of probability theory, provability logic, and decision theory.

Hazard (2y): Yay learning all the things! Your reviews are fun, also completely understandable putting energy elsewhere. Your energy for more learning is very useful for periodically bouncing myself into more learning.

We can think about how consumers respond to changes in price by considering the elasticity of the quantity demanded at a given price - how quickly does demand decrease as we raise prices? Price elasticity of demand is defined as $\epsilon = \frac{\%\,\Delta Q}{\%\,\Delta P}$; in other words, for price $P$ and quantity $Q$, this is $\epsilon = \frac{dQ}{dP}\cdot\frac{P}{Q}$ (this looks kinda weird, and it wasn't immediately obvious what's happening here...). Revenue is the total amount of cash changing hands: $R = P \cdot Q$.

What's happening here is that raising prices is a good idea when the revenue gained (the "pric

... (read more)
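To spell out where the truncated argument is presumably going (standard price-theory algebra, not text recovered from the post), differentiate revenue with respect to price and factor out the elasticity:

$$\frac{dR}{dP} = \frac{d\big(P\,Q(P)\big)}{dP} = Q + P\frac{dQ}{dP} = Q\left(1 + \frac{P}{Q}\frac{dQ}{dP}\right) = Q(1 + \epsilon).$$

Since $\epsilon < 0$ for ordinary goods, raising the price increases revenue exactly when $|\epsilon| < 1$ (inelastic demand) and decreases it when $|\epsilon| > 1$.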

How does representation interact with consciousness? Suppose you're reasoning about the universe via a partially observable Markov decision process, and that your model is incredibly detailed and accurate. Further suppose you represent states as numbers, as their numeric labels.

To get a handle on what I mean, consider the game of Pac-Man, which can be represented as a finite, deterministic, fully-observable MDP. Think about all possible game screens you can observe, and number them. Now get rid of the game screens. From the perspective of reinforcement lea

... (read more)
G Gordon Worley III (2y): I think a reasonable and related question we don't have a solid answer for is whether humans are already capable of mind crime. For example, maybe Alice is mad at Bob and imagines causing harm to Bob. How well does Alice have to model Bob for her imaginings to be mind crime? If Alice has low cognitive empathy is it not mind crime, but if her cognitive empathy is above some level is it then mind crime? I think we're currently confused enough about what mind crime is such that it's hard to even begin to know how we could answer these questions based on more than gut feelings.
Vladimir_Nesov (2y): I suspect that it doesn't matter how accurate or straightforward a predictor is in modeling people. What would make prediction morally irrelevant is that it's not noticed by the predicted people, irrespective of whether this happens because it spreads the moral weight conferred to them over many possibilities (giving inaccurate prediction), keeps the representation sufficiently baroque, or for some other reason. In the case of inaccurate prediction or baroque representation, it probably does become harder for the predicted people to notice being predicted, and I think this is the actual source of moral irrelevance, not those things on their own. A more direct way of getting the same result is to predict counterfactuals where the people you reason about don't notice the fact that you are observing them, which also gives a form of inaccuracy (imagine that your predicting them is part of their prior; that'll drive the counterfactual further from reality).

I seem to differently discount different parts of what I want. For example, I'm somewhat willing to postpone fun to low-probability high-fun futures, whereas I'm not willing to do the same with romance.

If you measure death-badness from behind the veil of ignorance, you’d naively prioritize well-liked, famous people with large families.

Pattern (1y): Would you prioritize the young from behind the veil of ignorance?

Idea: learn by making conjectures (math, physical, etc) and then testing them / proving them, based on what I've already learned from a textbook. 

Learning seems easier and faster when I'm curious about one of my own ideas.

NaiveTortoise (1y): For what it's worth, this is very true for me as well. I'm also reminded of a story of Robin Hanson from Cryonics magazine: Source [https://www.google.com/url?sa=t&source=web&rct=j&url=https://alcor.org/cryonics/Cryonics2017-4.pdf&ved=2ahUKEwjKhLnnl6LqAhWeoXIEHQWwB4UQFjAPegQIBxAB&usg=AOvVaw3uCyzISnOW89LDK4zeAsTC]
Rudi C (1y): How do you estimate how hard your invented problems are?

I had an intuition that attainable utility preservation (RL but you maintain your ability to achieve other goals) points at a broader template for regularization. AUP regularizes the agent's optimal policy to be more palatable towards a bunch of different goals we may wish we had specified. I hinted at the end of Towards a New Impact Measure that the thing-behind-AUP might produce interesting ML regularization techniques.

This hunch was roughly correct; Model-Agnostic Meta-Learning tunes the network parameters such that they can be quickly adapted to achiev

... (read more)

I'd like to see research exploring the relevance of intragenomic conflict to AI alignment research. Intragenomic conflict constitutes an in-the-wild example of misalignment, where conflict arises "within an agent" even though the agent's genes have strong instrumental incentives to work together (they share the same body). 

In an interesting parallel to John Wentworth's Fixing the Good Regulator Theorem, I have an MDP result that says: 

Suppose we're playing a game where I give you a reward function and you give me its optimal value function in the MDP. If you let me do this for $|\mathcal{S}|$ reward functions (one for each state in the environment), and you're able to provide the optimal value function for each, then you know enough to reconstruct the entire environment (up to isomorphism).

Roughly: being able to complete linearly many tasks in the state space means you ha... (read more)

I read someone saying that ~half of the universes in a neighborhood of ours went to Trump. But... this doesn't seem right. Assuming Biden wins in the world we live in, consider the possible perturbations to the mental states of each voter. (Big assumption! We aren't thinking about all possible modifications to the world state. Whatever that means.)

Assume all 2020 voters would be equally affected by a perturbation (which you can just think of as a decision-flip for simplicity, perhaps). Since we're talking about a neighborhood ("worlds pretty close to ours"... (read more)

Measure (7mo): I think this depends on the distance considered. In worlds very very close to ours, the vast majority will have the same outcome as ours. As you increase the neighborhood size (I imagine this as considering worlds which diverged from ours more distantly in the past), Trump becomes more likely relative to Biden [edit: more likely than he is relative to Biden in more nearby worlds]. As you continue to expand, other outcomes start to have significant likelihood as well.
TurnTrout (7mo): Why do you think that? How do you know that?
Measure (7mo): General intuition that "butterfly effect" is basically true, meaning that if a change occurs in a chaotic system, then the size of the downstream effects will tend to increase over time. Edit: I don't have a good sense of how far back you would have to go to see meaningful change in outcome, just that the farther you go the more likely change becomes.
TurnTrout (7mo): Sure, but why would those changes tend to favor Trump as you get outside of a small neighborhood? Like, why would Biden / (Biden or Trump win) < .5? I agree it would at least approach .5 as the neighborhood grows. I think.
Measure (7mo): I think we're in agreement here. I didn't mean to imply that Trump would become more likely than Biden in absolute terms, just that the ratio Trump/Biden would increase.

Epistemic status: not an expert

Understanding Newton's second law, $\vec{F} = \frac{d\vec{p}}{dt}$.

Consider the vector-valued velocity as a function of time, $\vec{v}(t)$. Scale this by the object's mass $m$ and you get the momentum function over time, $\vec{p}(t) = m\vec{v}(t)$. Imagine this momentum function wiggling around over time, the vector from the origin rotating and growing and shrinking.

The second law says that force is the derivative of this rescaled vector function - if an object is more massive, then the same displacement of this rescaled arrow is a proportionally smaller velocity modification, because o... (read more)
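For the common constant-mass case (standard physics, added for context), this reduces to the familiar form:

$$\vec{F} = \frac{d(m\vec{v})}{dt} = m\frac{d\vec{v}}{dt} = m\vec{a}.$$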

What is "real"? I think about myself as a computation embedded in some other computation (i.e. a universe-history). I think "real" describes hypotheses about the environment where my computation lives. What should I think is real? That which an "ideal embedded reasoner" would assign high credence. However that works.

This sensibly suggests that Gimli-in-actual-Ea (LOTR) should believe he lives in Ea, and that Ea is real, even though it isn't our universe's Earth. Also, the notion accounts for indexical uncertainty by punting it to how embedded reasoning sho... (read more)

Tricking AIDungeon's GPT-3 model into writing HPMOR:

You start reading Harry Potter and the Methods of Rationality by Eliezer Yudkowsky:

" "It said to me," said Professor Quirrell, "that it knew me, and that it would hunt me down someday, wherever I tried to hide." His face was rigid, showing no fright.
"Ah," Harry said. "I wouldn't worry about that, Professor Quirrell." It's not like Dementors can actually talk, or think; the structure they have is borrowed from your own mind and expectations...
Now

... (read more)
Pattern (1y): I love the ending. It's way more exciting,
TurnTrout (1y): ... that which he had thought was absent. Love. He didn't think of the books, or his parents or Professor McGonagall. He thought of Hermione, and how she had always believed in him. He thought of how she'd helped him in so many ways, not just with homework, not just with fighting the Dark Arts. How she'd tried to help him every day since they'd first met on the Hogwarts Express.
habryka (1y): Mod note: Spoilerified, to shield the eyes of the innocent.
TurnTrout (1y): My bad! Thanks.

ARCHES distinguishes between single-agent / single-user and single-agent/multi-user alignment scenarios. Given assumptions like "everyone in society is VNM-rational" and "societal preferences should also follow VNM rationality", and "if everyone wants a thing, society also wants the thing", Harsanyi's utilitarian theorem shows that the societal utility function is a linear non-negative weighted combination of everyone's utilities. So, in a very narrow (and unrealistic) setting, Harsanyi's theorem tells you how the single-multi solution is built from the si

... (read more)
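In symbols, Harsanyi's conclusion (the standard statement of the theorem, not text recovered from the truncated post) is that the social utility function is a non-negative linear combination of the individual utilities:

$$U_{\text{society}} = \sum_i w_i\, U_i, \qquad w_i \ge 0.$$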

From FLI's AI Alignment Podcast: Inverse Reinforcement Learning and Inferring Human Preferences with Dylan Hadfield-Menell:

Dylan: There’s one example that I think about, which is, say, you’re cooperating with an AI system playing chess. You start working with that AI system, and you discover that if you listen to its suggestions, 90% of the time, it’s actually suggesting the wrong move or a bad move. Would you call that system value-aligned?

Lucas: No, I would not.

Dylan: I think most people wouldn’t. Now, what if I told you that that program was act

... (read more)

On page 22 of Probabilistic reasoning in intelligent systems, Pearl writes:

Raw experiential data is not amenable to reasoning activities such as prediction and planning; these require that data be abstracted into a representation with a coarser grain. Probabilities are summaries of details lost in this abstraction...

An agent observes a sequence of images displaying either a red or a blue ball. The balls are drawn according to some deterministic rule depending on the time step. Reasoning directly from the experiential data leads to ~Solomonoff induction. What mig

... (read more)
TurnTrout (1y): In particular, the coarse-grain is what I mentioned in 1) – beliefs are easier to manage with respect to a fixed featurization of the observation space.
NaiveTortoise (1y): Only related to the first part of your post, I suspect Pearl!2020 would say the coarse-grained model should be some sort of causal model on which we can do counterfactual reasoning.

We can imagine aliens building a superintelligent agent which helps them get what they want. This is a special case of aliens inventing tools. What kind of general process should these aliens use – how should they go about designing such an agent?

Assume that these aliens want things in the colloquial sense (not that they’re eg nontrivially VNM EU maximizers) and that a reasonable observer would say they’re closer to being rational than antirational. Then it seems[1] like these aliens eventually steer towards reflectively coherent rationality (provided they

... (read more)

Ordinal preferences just tell you which outcomes you like more than others: apples more than oranges.

Interval scale preferences assign numbers to outcomes, which communicates how close outcomes are in value: kiwi 1, orange 5, apple 6. You can say that apples have 5 times the advantage over kiwis that they do over oranges, but you can't say that apples are six times as good as kiwis. Fahrenheit and Celsius are also like this.

Ratio scale ("rational"? 😉) preferences do let you say that apples are six times as good as kiwis, and you need this property to maxi

... (read more)
Matt Goldenberg (1y): Isn't the typical assumption in game theory that preferences are ordinal? This suggests that you can make quite a few strategic decisions without bringing in ratio.
Dagon (1y): From what I have read, and from self-introspection, humans mostly have ordinal preferences. Some of them we can interpolate to interval scales or ratios (or higher-order functions) but if we extrapolate very far, we get odd results. It turns out you can do a LOT with just ordinal preferences. Almost all real-world decisions are made this way.

It seems to me that Zeno's paradoxes leverage incorrect, naïve notions of time and computation. We exist in the world, and we might suppose that the world is being computed in some way. If time is continuous, then the computer might need to do some pretty weird things to determine our location at an infinite number of intermediate times. However, even if that were the case, we would never notice it – we exist within time and we would not observe the external behavior of the system which is computing us, nor its runtime.

2Pattern1yWhat are your thoughts on infinitely small quantities?
2TurnTrout1yDon't have much of an opinion - I haven't rigorously studied infinitesimals yet. I usually just think of infinite / infinitely small quantities as being produced by limiting processes. For example, the intersection of all the ϵ-balls around a real number is just that number (under the standard topology), which set has 0 measure and is, in a sense, "infinitely small".
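For concreteness, the limiting-process picture (my notation):

$$\bigcap_{\epsilon > 0} B_\epsilon(x) \;=\; \bigcap_{\epsilon > 0} (x - \epsilon,\, x + \epsilon) \;=\; \{x\},$$

a set of Lebesgue measure zero, even though every finite stage $(x - \epsilon, x + \epsilon)$ of the intersection has positive measure $2\epsilon$.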

Very rough idea

In 2018, I started thinking about corrigibility as "being the kind of agent lots of agents would be happy to have activated". This seems really close to a more ambitious version of what AUP tries to do (not be catastrophic for most agents).

I wonder if you could build an agent that rewrites itself / makes an agent which would tailor the AU landscape towards its creators' interests, under a wide distribution of creator agent goals/rationalities/capabilities. And maybe you then get a kind of generalization, where most simple algorithms which solve this solve ambitious AI alignment in full generality.

My autodidacting has given me a mental reflex which attempts to construct a gears-level explanation of almost any claim I hear. For example, when listening to “Listen to Your Heart” by Roxette:

Listen to your heart,

There’s nothing else you can do

I understood what she obviously meant and simultaneously found myself subvocalizing “she means all other reasonable plans are worse than listening to your heart - not that that’s literally all you can do”.

This reflex is really silly and annoying in the wrong context - I’ll fix it soon. But it’s pretty amusing

... (read more)

AFAICT, the deadweight loss triangle from eg price ceilings is just a lower bound on lost surplus. Inefficient allocation to consumers means that people who value the good less than the market equilibrium price can buy it, while the DWL triangle optimistically assumes the consumers with the highest willingness to pay will eat up the limited supply.
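A toy numerical sketch of the point (my numbers, not from any source): under a binding price ceiling with random rationing, the realized loss can exceed the textbook triangle, which assumes the highest-value buyers get the scarce units.

```python
import random

# Toy market (hypothetical numbers): buyers' willingness to pay, with only
# 3 units available under the binding price ceiling. Price paid is a constant
# transfer at the same quantity, so comparing total buyer value is enough.
values = [10, 8, 6, 4, 2]
supply = 3

# DWL-triangle assumption: the highest-value buyers get the scarce units.
best_value = sum(sorted(values, reverse=True)[:supply])        # 10 + 8 + 6 = 24

# Random rationing: any buyers might get the units, including low-value ones.
random.seed(0)
draws = [sum(random.sample(values, supply)) for _ in range(10_000)]
average_value = sum(draws) / len(draws)

print(best_value, average_value)   # 24 vs ~18: loss beyond the triangle's lower bound
```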

4Wei_Dai1yGood point. By searching for "deadweight loss price ceiling lower bound" I was able to find a source [http://www.econ.ucla.edu/sboard/teaching/econ11_09/econ11_09_slides10.pdf] (see page 26) that acknowledges this, but most explications of price ceilings do not seem to mention that the triangle is just a lower bound for lost surplus.
2Dagon1yLost surplus is definitely a loss - it's not linear with utility, but it's not uncorrelated. Also, if supply is elastic over any relevant timeframe, there's an additional source of loss. And I'd argue that for most goods, over timeframes smaller than most price-fixing proposals are expected to last, there is significant price elasticity.
2TurnTrout1yI don't think I was disagreeing?
2Dagon1yAh, I took the "just" in "just a lower bound on lost surplus" as an indicator that it's less important than other factors. And I lightly believe (meaning: for the cases I find most available, I believe it, but I don't know how general it is) that the supply elasticity _is_ the more important effect of such distortions. So I wanted to reinforce that I wasn't ignoring that cost, only pointing out a greater cost.

The framing effect & aversion to losses generally cause us to execute more cautious plans. I’m realizing this is another reason to reframe my x-risk motivation from “I won’t let the world be destroyed” to “there’s so much fun we could have, and I want to make sure that happens”. I think we need more exploratory thinking in alignment research right now.

(Also, the former motivation style led to me crashing and burning a bit when my hands were injured and I was no longer able to do much.)

ETA: actually, I’m realizing I had the effect backwards. Framing via

... (read more)
6TurnTrout1yI’m realizing how much more risk-neutral I should be:
3Isnasene1yFor what it's worth, I tried something like the "I won't let the world be destroyed"->"I want to make sure the world keeps doing awesome stuff" reframing back in the day and it broadly didn't work. This had less to do with cautious/uncautious behavior and more to do with status quo bias. Saying "I won't let the world be destroyed" treats "the world being destroyed" as an event that deviates from the status quo of the world existing. In contrast, saying "There's so much fun we could have" treats "having more fun" as the event that deviates from the status quo of us not continuing to have fun. When I saw the world being destroyed as status quo, I cared a lot less about the world getting destroyed.

I was having a bit of trouble holding the point of quadratic residues in my mind. I could effortfully recite the definition, give an example, and walk through the broad-strokes steps of proving quadratic reciprocity. But it felt fake and stale and memorized.

Alex Mennen suggested a great way of thinking about it. For some odd prime $p$, consider the multiplicative group $(\mathbb{Z}/p\mathbb{Z})^\times$. This group is abelian and has even order $p-1$. Now, consider a primitive root / generator $g$. By definition, every element of the group can be expressed as $g^k$. The quadratic residues ar

... (read more)
4AlexMennen2yThe theorem: where $k$ is relatively prime to an odd prime $p$ and $n < e$, $k \cdot p^n$ is a square mod $p^e$ iff $k$ is a square mod $p$ and $n$ is even. The real meat of the theorem is the $n=0$ case (i.e. a square mod $p$ that isn't a multiple of $p$ is also a square mod $p^e$). Deriving the general case from there should be fairly straightforward, so let's focus on this special case. Why is it true? This question has a surprising answer: Newton's method for finding roots of functions. Specifically, we want to find a root of $f(x) := x^2 - k$, except in $\mathbb{Z}/p^e\mathbb{Z}$ instead of $\mathbb{R}$. To adapt Newton's method to work in this situation, we'll need the $p$-adic absolute value on $\mathbb{Z}$: $|k \cdot p^n|_p := p^{-n}$ for $k$ relatively prime to $p$. This has lots of properties that you should expect of an "absolute value": it's positive ($|x|_p \geq 0$ with $=$ only when $x = 0$), multiplicative ($|xy|_p = |x|_p |y|_p$), symmetric ($|-x|_p = |x|_p$), and satisfies a triangle inequality ($|x+y|_p \leq |x|_p + |y|_p$; in fact, we get more in this case: $|x+y|_p \leq \max(|x|_p, |y|_p)$). Because of positivity, symmetry, and the triangle inequality, the $p$-adic absolute value induces a metric (in fact, an ultrametric, because of the strong version of the triangle inequality) $d(x, y) := |x - y|_p$. To visualize this distance function, draw $p$ giant circles, and sort integers into circles based on their value mod $p$. Then draw $p$ smaller circles inside each of those giant circles, and sort the integers in the big circle into the smaller circles based on their value mod $p^2$. Then draw $p$ even smaller circles inside each of those, and sort based on value mod $p^3$, and so on. The distance between two numbers corresponds to the size of the smallest circle encompassing both of them. Note that, in this metric, $1, p, p^2, p^3, \ldots$ converges to $0$. Now on to Newton's method: if $k$ is a square mod $p$, let $a$ be one of its square roots mod $p$. $|f(a)|_p \leq p^{-1}$; that is, $a$ is somewhat close to being a root of $f$ with respect to the $p$-adic absolute value. $f'(x) = 2x$, so $|f'(a)|_p = |2a|_p = |2|_p \cdot |a|_p = 1 \cdot 1 = 1$; that is, $f$ is steep near $a$. This is goo
2AlexMennen2yThe part about derivatives might have seemed a little odd. After all, you might think, $\mathbb{Z}$ is a discrete set, so what does it mean to take derivatives of functions on it? One answer to this is to just differentiate symbolically using polynomial differentiation rules. But I think a better answer is to remember that we're using a different metric than usual, and $\mathbb{Z}$ isn't discrete at all! Indeed, for any number $k$, $\lim_{n\to\infty} k + p^n = k$, so no points are isolated, and we can define differentiation of functions on $\mathbb{Z}$ in exactly the usual way with limits.
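A small computational sketch of the lifting described above (my code, assuming I've followed the argument correctly): start from a square root of $k$ mod $p$ and apply the Newton update $a \mapsto a - f(a)/f'(a)$ with $f(x) = x^2 - k$, working modulo increasing powers of $p$.

```python
# Hedged sketch: lifting a square root mod p to a square root mod p^e via
# Newton's method (Hensel lifting). Assumes p is an odd prime and k is a
# quadratic residue mod p with p not dividing k.
def lift_sqrt(k, p, e):
    # find a square root of k mod p by brute force
    a = next(x for x in range(1, p) if (x * x - k) % p == 0)
    modulus = p
    while modulus < p ** e:
        modulus = min(modulus ** 2, p ** e)   # precision roughly doubles each step
        # Newton update: a <- a - f(a)/f'(a), with f(x) = x^2 - k
        a = (a - (a * a - k) * pow(2 * a, -1, modulus)) % modulus
    return a

# example: 2 is a square mod 7 (3^2 = 9 ≡ 2), lifted to a square root mod 7^4
r = lift_sqrt(2, 7, 4)
assert (r * r - 2) % 7 ** 4 == 0
```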

I noticed I was confused and liable to forget my grasp on what the hell is so "normal" about normal subgroups. You know what that means - colorful picture time!

First, the classic definition. A subgroup $N$ of a group $G$ is normal when, for all group elements $g$, $gN = Ng$ (this is trivially true for all subgroups of abelian groups).

ETA: I drew the bounds a bit incorrectly; is most certainly within the left coset ().

Notice that nontrivial cosets aren't subgroups, because they don't have the identity $e$.

This "normal" thing matters because sometimes we want to highlight regu

... (read more)
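To make the definition concrete, here's a tiny sanity check (my own sketch, not part of the original post): in $S_3$, the subgroup of rotations is normal, while the subgroup generated by a single transposition is not.

```python
from itertools import permutations

def compose(f, g):                       # (f ∘ g)(i) = f(g(i)); permutations as tuples
    return tuple(f[g[i]] for i in range(len(g)))

def inverse(f):
    inv = [0] * len(f)
    for i, image in enumerate(f):
        inv[image] = i
    return tuple(inv)

def is_normal(H, G):                     # does gHg⁻¹ = H hold for every g in G?
    return all({compose(compose(g, h), inverse(g)) for h in H} == set(H) for g in G)

S3 = list(permutations(range(3)))
A3 = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]   # identity + the two 3-cycles
T  = [(0, 1, 2), (1, 0, 2)]              # identity + one transposition

print(is_normal(A3, S3), is_normal(T, S3))   # True False
```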

One of the reasons I think corrigibility might have a simple core principle is: it seems possible to imagine a kind of AI which would make a lot of different possible designers happy. That is, if you imagine the same AI design deployed by counterfactually different agents with different values and somewhat-reasonable rationalities, it ends up doing a good job by almost all of them. It ends up acting to further the designers' interests in each counterfactual. This has been a useful informal way for me to think about corrigibility, when considering different

... (read more)

Continuous functions can be recovered from their values on the rationals; in particular, for each real number $x$, choose a sequence of rational numbers $(q_n)$ converging to $x$, and let $f(x) = \lim_{n \to \infty} f(q_n)$.

Therefore, there is an injection from the vector space of continuous functions to the vector space of all real sequences: since the rationals are countable, enumerate them as $q_1, q_2, q_3, \ldots$. Then the sequence $(f(q_1), f(q_2), f(q_3), \ldots)$ represents the continuous function $f$.

3itaibn01yThis map is not a surjection because not every map from the rational numbers to the real numbers is continuous, and so not every sequence represents a continuous function. It is injective, and so it shows that a basis for the latter space is at least as large in cardinality as a basis for the former space. One can construct an injective map in the other direction, showing that both spaces have bases of the same cardinality, and so they are isomorphic.
2TurnTrout1yFixed, thanks.

(Just starting to learn microecon, so please feel free to chirp corrections)

How diminishing marginal utility helps create supply/demand curves: think about the uses you could find for a pillow. Your first few pillows are used to help you fall asleep. After that, maybe some for your couch, and then a few spares to keep in storage. You prioritize pillow allocation in this manner; the value of the latter uses is much less than the value of having a place to rest your head.

How many pillows do you buy at a given price point? Well, if you buy any, you'll buy som

... (read more)
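A little sketch of how the individual demand curve falls out of this (my toy numbers): with decreasing marginal value for successive pillows, you buy pillows until the next one is worth less to you than the price, so the quantity demanded falls as the price rises.

```python
# Toy sketch: decreasing marginal value of successive pillows (hypothetical numbers).
marginal_values = [50, 30, 15, 8, 3]   # sleep, sleep, couch, couch, spare

def quantity_demanded(price):
    # buy every pillow whose marginal value is at least the price
    return sum(1 for v in marginal_values if v >= price)

for price in [60, 40, 20, 10, 5, 1]:
    print(price, quantity_demanded(price))
# 60→0, 40→1, 20→2, 10→3, 5→4, 1→5: the demand curve slopes downward
```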

The Baldwin effect

I couldn't find great explanations online, so here's my explanation after a bit of Googling. I welcome corrections from real experts.

Organisms exhibit phenotypic plasticity when they act differently in different environments. The phenotype (manifested traits: color, size, etc) manifests differently, even though two organisms might share the same genotype (genetic makeup). 

Panel 1: organisms are not phenotypically plastic and do not adapt to a spider-filled environment. Panel 2: a plastic organism might do the bad thing, and then lear
... (read more)
3Robbo1mo[disclaimer: not an expert, possibly still confused about the Baldwin effect] A bit of feedback on this explanation: as written, it didn’t make clear to me what makes it a special effect. “Evolution selects for genome-level hardcoding of extremely important learned lessons.” As a reader I was like, what makes this a special case? If it’s a useful lesson, then of course evolution would tend to select for knowing it innately - that does seem handy for an organism. As I understand it, what is interesting about the Baldwin effect is that such hard coding is selected for more among creatures that can learn, and indeed because of learning. The learnability of the solution makes it even more important to be endowed with the solution. So individual learning, in this way, drives selection pressures. Dennett’s explanation emphasizes this - curious what you make of his? https://ase.tufts.edu/cogstud/dennett/papers/baldwincranefin.htm [https://ase.tufts.edu/cogstud/dennett/papers/baldwincranefin.htm]
3TurnTrout1moRight, I wondered this as well. I had thought its significance was that the effect seemed Lamarckian, but it wasn't. (And, I confess, I made the parent comment partly hoping that someone would point out that I'd missed the key significance of the Baldwin effect. As the joke goes, the fastest way to get your paper spell-checked is to comment it on a YouTube video!) Thanks for this link. One part which I didn't understand is why closeness in learning-space (given your genotype, you're plastic enough to learn to do something) must imply that you're close in genotype-space (evolution has a path of local improvements which implement genetic assimilation of the plastic advantage). I can learn to program computers. Does that mean that, given the appropriate selection pressures, my descendants would learn to program computers instinctively? In a reasonable timeframe? It's not that I can't imagine such evolution occurring. It just wasn't clear why these distance metrics should be so strongly related. Reading the link, Dennett points out this assumption and discusses why it might be reasonable, and how we might test it.

I went into a local dentist's office to get more prescription toothpaste; I was wearing my 3M p100 mask (with a surgical mask taped over the exhaust, in order to protect other people in addition to the native exhaust filtering offered by the mask). When I got in, the receptionist was on the phone. I realized it would be more sensible for me to wait outside and come back in, but I felt a strange reluctance to do so. It would be weird and awkward to leave right after entering. I hovered near the door for about 5 seconds before actually leaving. I was pretty ... (read more)

(This is a basic point on conjunctions, but I don't recall seeing its connection to Occam's razor anywhere)

When I first read Occam's Razor back in 2017, it seemed to me that the essay only addressed one kind of complexity: how complex the laws of physics are. If I'm not sure whether the witch did it, the universes where the witch did it are more complex, and so these explanations are exponentially less likely under a simplicity prior. Fine so far.

But there's another type. Suppose I'm weighing whether the United States government is currently engaged in a v... (read more)
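The quantitative version of this second type of penalty, as I think of it (my gloss, not from the original essay): if a hypothesis $H$ is a conjunction of conditions $A_1, \dots, A_n$, each with conditional probability at most $q < 1$, then

$$P(H) \;=\; \prod_{i=1}^{n} P(A_i \mid A_1, \dots, A_{i-1}) \;\leq\; q^n,$$

so highly conjunctive explanations are penalized exponentially in the number of conditions, even when each condition looks individually plausible and no complicated laws of physics are involved.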

2Steven Byrnes5moI agree with the principle but I'm not sure I'd call it "Occam's razor". Occam's razor is a bit sketchy, it's not really a guarantee of anything, it's not a mathematical law, it's like a rule of thumb or something. Here you have a much more solid argument: multiplying many probabilities into a conjunction makes the result smaller and smaller. That's a mathematical law, rock-solid. So I'd go with that...
2TurnTrout5moMy point was more that "people generally call both of these kinds of reasoning 'Occam's razor', and they're both good ways to reason, but they work differently."
2Steven Byrnes5moOh, hmm, I guess that's fair, now that you mention it I do recall hearing a talk where someone used "Occam's razor" to talk about the solomonoff prior. Actually he called it "Bayes Occam's razor" I think. He was talking about a probabilistic programming algorithm. That's (1) not physics, and (2) includes (as a special case) penalizing conjunctions, so maybe related to what you said. Or sorry if I'm still not getting what you meant

Instead of waiting to find out you were confused about new material you learned, pre-emptively google things like "common misconceptions about [concept]" and put the answers in your spaced repetition system, or otherwise magically remember them.

At a poster session today, I was asked how I might define "autonomy" from an RL framing; "power" is well-definable in RL, and the concepts seem reasonably similar. 

I think that autonomy is about having many ways to get what you want. If your attainable utility is high, but there's only one trajectory which really makes good things happen, then you're hemmed-in and don't have much of a choice. But if you have many policies which make good things happen, you have a lot of slack and you have a lot of choices. This would be a lot of autonomy.

This has to b... (read more)
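One crude way to operationalize this (my own toy formalization, not anything from the poster session): count how many policies in a small environment still achieve close-to-optimal value.

```python
# Toy sketch: "autonomy" as the number of policies that still make good things
# happen, in a tiny deterministic finite-horizon environment (arbitrary numbers).
from itertools import product

states, actions, horizon, start = [0, 1, 2], [0, 1], 5, 0
next_state = {0: {0: 1, 1: 2}, 1: {0: 1, 1: 0}, 2: {0: 2, 1: 0}}
reward     = {0: {0: 1.0, 1: 1.0}, 1: {0: 1.0, 1: 0.0}, 2: {0: 0.0, 1: 0.0}}

def value(policy, s=start):
    total = 0.0
    for _ in range(horizon):
        total += reward[s][policy[s]]
        s = next_state[s][policy[s]]
    return total

policies = [dict(zip(states, choice)) for choice in product(actions, repeat=len(states))]
best = max(value(p) for p in policies)
slack = sum(value(p) >= 0.9 * best for p in policies)
print(best, slack, len(policies))   # more near-optimal policies ~ more "slack"/autonomy
```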

In Markov decision processes, state-action reward functions seem less natural to me than state-based reward functions, at least if they assign different rewards to equivalent actions. That is, actions $a_1, a_2$ at a state $s$ can have different rewards $R(s, a_1) \neq R(s, a_2)$ even though they induce the same transition probabilities: $T(s' \mid s, a_1) = T(s' \mid s, a_2)$ for all $s'$. This is unappealing because the actions don't actually have a "noticeable difference" from within the MDP, and the MDP is visitation-distribution-isomorphic to an MDP without the act... (read more)

From unpublished work.

The answer to this seems obvious in isolation: shaping helps with credit assignment, rescaling doesn't (and might complicate certain methods in the advantage vs Q-value way). But I feel like maybe there's an important interaction here that could inform a mathematical theory of how a reward signal guides learners through model space?
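For reference, the standard potential-based shaping result (Ng, Harada & Russell 1999; cited here by me, not taken from the unpublished work): shaping with a potential $\Phi$ replaces the reward by

$$R'(s, a, s') \;=\; R(s, a, s') + \gamma \Phi(s') - \Phi(s),$$

which preserves the set of optimal policies while (for a well-chosen $\Phi$) giving denser feedback for credit assignment; a positive rescaling $R' = cR$ also preserves the optimal policies, but leaves the structure of the signal unchanged.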

Reasoning about learned policies via formal theorems on the power-seeking incentives of optimal policies

One way instrumental subgoals might arise in actual learned policies: we train a proto-AGI reinforcement learning agent with a curriculum including a variety of small subtasks. The current theorems show sufficient conditions for power-seeking tending to be optimal in fully-observable environments; many environments meet these sufficient conditions; optimal policies aren't hard to compute for the subtasks. One highly transferable heuristic would therefore... (read more)

In order to reduce bias (halo effect, racism, etc), shouldn't many judicial proceedings generally be held over telephone, and/or through digital audio-only calls with voice anonymizers? 

3Mark Xu1yI don't see strong reasons why this isn't a good idea. I have heard that technical interviews sometimes get conducted with voice anonymizers.

I prompted GPT-3 with modified versions of Eliezer's Beisutsukai stories, where I modified the "class project" to be about solving intent alignment instead of quantum gravity. 

... Taji looked over his sheets. "Okay, I think we've got to assume that every avenue that Eld science was trying is a blind alley, or they would have found it. And if this is possible to do in one month, the answer must be, in some sense, elegant. So no human mistake models. If we start doing anything that looks like we should call it 'utility function patching', we'd better st

... (read more)

Transparency Q: how hard would it be to ensure a neural network doesn't learn any explicit NANDs?

Physics has existed for hundreds of years. Why can you reach the frontier of knowledge with just a few years of study? Think of all the thousands of insights and ideas and breakthroughs that have been had - yet, I do not imagine you need most of those to grasp modern consensus.

Idea 1: the tech tree is rather horizontal - for any given question, several approaches and frames are tried. Some are inevitably more attractive or useful. You can view a Markov decision process in several ways - through the Bellman equations, through the structure of the state

... (read more)
8Viliam1yCould this depend on your definition of "physics"? Like, if you use a narrow definition like "general relativity + quantum mechanics", you can learn that in a few years. But if you include things like electricity, expansion of universe, fluid mechanics, particle physics, superconductors, optics, string theory, acoustics, aerodynamics... most of them may be relatively simple to learn, but all of them together it's too much.
4TurnTrout1yMaybe. I don't feel like that's the key thing I'm trying to point at here, though. The fact that you can understand any one of those in a reasonable amount of time is still surprising, if you step back far enough.

When under moral uncertainty, rational EV maximization will look a lot like preserving attainable utility / choiceworthiness for your different moral theories / utility functions, while you resolve that uncertainty.
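One standard way to formalize that (MacAskill-style "maximize expected choiceworthiness"; my framing rather than the original post's):

$$EC(a) \;=\; \sum_i P(T_i)\, CW_i(a),$$

where $P(T_i)$ is your credence in moral theory $T_i$ and $CW_i(a)$ is how choiceworthy action $a$ is under that theory. While the credences are unresolved, actions that keep $CW_i$ achievable across many plausible theories look a lot like attainable-utility preservation for each of them.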

3MichaelA1yThis seems right to me, and I think it's essentially the rationale for the idea of the Long Reflection [https://forum.effectivealtruism.org/posts/H2zno3ggRJaph9P6c/quotes-about-the-long-reflection] .

To prolong my medicine stores by 200%, I've mixed in similar-looking iron supplement placebos with my real medication. (To be clear, nothing serious happens to me if I miss days)