All of Aaron Bergman's Comments + Replies

I mean the reason is that I've never heard of that haha. Perhaps it should be

Ngl I did not fully understand this, but to be clear I don't think understanding alignment through the lens of agency is "excessively abstract." In fact, I'd agree with the implicit default view that it's largely the single most productive lens to look through. My objection to the status quo is that the scale/ontology/lens/whatever I was describing seems to be getting 0% of the research attention, whereas perhaps it should be getting 10 or 20%.  

Not sure this analogy works, but if the NIH were spending $10B on cancer research, I would (prima facie, as a layperson) want >$0 but probably <$2B spent on looking at cancer as an atomic-scale phenomenon, and maybe some amount at an even lower scale.

3 · the gears to ascension · 1y
yeah I was probably too abstract in my reply - to rephrase: a thermostat (or other extremely small control system) is a perfectly valid example of agency. it's not dangerously strong agency or any such thing. but my point is really to say that you're on the right track here, looking at the micro-scale versions of things is very promising.

Note: I'm probably well below median commenter in terms of technical CS/ML understanding.  Anyway...

I feel like a missing chunk of research could be described as “seeing DL systems as ‘normal,’ physical things and processes that involve electrons running around inside little bits of (very complex) metal pieces” instead of mega-abstracted “agents.”

The main reason this might be fruitful is that, at least intuitively and to my understanding, failures like “the AI stops just playing chess really well and starts taking over the world to learn how to play c...

1 · the gears to ascension · 1y
I can understand why it would seem excessively abstract, but when we speak of agency, we are in fact talking about patterns in the activations of the GPU's circuit elements. Specifically, we'd be talking about patterns of numerical feedback where the program forms a causal predictive model of a variable and then, based on the result of that model, does some form of model-predictive control, e.g. outputting bytes (floats, probably) that encode an action which the action-conditional predictive model evaluates as likely to impact the variable. Merely minimizing loss is insufficient to end up with this outcome in many cases. But on some datasets, with some problem formulations - ones that we expect to come up, such as motor control of a robot walking across a room, for a trivial example, or selecting videos to maximize the probability that a user stays on the website - we can expect that the predictive model, if more precise about the future than a human's, would allow the GPU code to select actions (motor actions or video selections) that reach the target outcome (cross the room, keep the user on the site) more reliably, as evaluated by the control loop through the predictive model. The worry is that if an agent is general enough in purpose to form its own subgoals and evaluate them in the predictive model, it could end up doing multi-step plan chaining through this general world-simulator subalgorithm and realize it can attack its creators in one of a great many possible ways.
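For a concrete (and deliberately tiny) picture of the feedback pattern described above, here is a toy model-predictive control loop in the thermostat spirit of the earlier reply. The model, action set, and numbers are all made up for illustration - this is my own sketch, not code from either comment:

```python
# Toy illustration (assumed, not from the thread): a predictive model of a
# variable plus a loop that picks whichever action the action-conditional
# model predicts will move the variable toward a target.

def predict(temp, action):
    """Action-conditional predictive model of room temperature (made up)."""
    effect = {"heat_on": 1.0, "heat_off": -0.5}
    return temp + effect[action]

def control_step(temp, target):
    """Model-predictive control: evaluate each action under the model."""
    actions = ["heat_on", "heat_off"]
    return min(actions, key=lambda a: abs(predict(temp, a) - target))

temp, target = 15.0, 20.0
for _ in range(10):
    action = control_step(temp, target)
    temp = predict(temp, action)  # pretend the world matches the model exactly
print(round(temp, 1))
```

The point of the toy is only that "agency" here is ordinary numerical feedback: a predictive model queried inside a selection loop, nothing more exotic.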

Banneker Key! Yeah I was in a very similar position, but basically made the opposite choice (largely because financial costs not internalized)

One answer to the question for me:

While writing, something close to "how does this 'sound' in my head naturally, when read, in an aesthetic sense?"

I've thought for a while that "writing quality" largely boils down to whether the writer has a salient and accurate intuition about how the words they're writing come across when read. 

Ah late to the party! This was a top-level post aptly titled "Half-baked alignment idea: training to generalize" that didn't get a ton of attention. 

Thanks to Peter Barnett and Justis Mills for feedback on a draft of this post. It was inspired by Eliezer's Lethalities post and Zvi's response.

Central idea: can we train AI to generalize out of distribution?

I'm thinking, for example, of an algorithm like the following:

  1. Train a GPT-like ML system to predict the next word given a string of text only using, say, grade school-level w
...

Thank you, Solenoid! The SSC podcast is the only reason I'm able to consume posts like Biological Anchors: A Trick That Might Or Might Not Work.

Glad to hear it's useful :)

Thanks. It's similar in one sense, but (if I'm reading the paper right) a key difference is that in the MAML examples, the ordering of the meta-level and object level training is such that you still wind up optimizing hard for a particular goal. The idea here is that the two types of training function in opposition, as a control system of sorts, such that the meta-level training should make the model perform worse at the narrow type of task it was trained on. 

That said, for sure, the distribution-shift thing is an issue. It seems like this bias might be less bad at the meta level than at the object level, but I have no idea. 

Training to generalize (and training to train to generalize, etc.)

Inspired by Eliezer's Lethalities post and Zvi's response:

Has there been any research or writing on whether we can train AI to generalize out of distribution?

I'm thinking, for example:

  1. Train a GPT-like ML system to predict the next word given a string of text only using, say, grade school-level writing (this is one instance of the object level
    1. Assign the system a meta-level award based on how well it performs (without any additional training) at generating the next word from more advance
...
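The two-level loop in the numbered steps above could be sketched roughly as follows. This is a toy of my own, not code from the post: a linear model stands in for the GPT-like system, an easy input range stands in for grade-school text, a shifted range stands in for more advanced text, and the meta-level reward is just negative error on the shifted data with no further training:

```python
import random

# Toy sketch (my own illustrative assumptions, not the post's method):
# object-level training on an easy distribution, then a meta-level reward
# measured on a harder, shifted distribution *without* extra training.

def object_level_train(data, steps=200, lr=0.05):
    """Object level: fit y = w * x by SGD on the easy data."""
    w = 0.0
    for _ in range(steps):
        x, y = random.choice(data)
        w -= lr * 2 * (w * x - y) * x
    return w

def meta_reward(w, harder_data):
    """Meta level: reward = negative MSE on harder data, no further training."""
    return -sum((w * x - y) ** 2 for x, y in harder_data) / len(harder_data)

random.seed(0)
true_w = 3.0
easy = [(x / 10, true_w * x / 10) for x in range(1, 11)]  # stand-in: easy text
hard = [(float(x), true_w * x) for x in range(10, 20)]    # stand-in: harder text

w = object_level_train(easy)
reward = meta_reward(w, hard)
print(w, reward)
```

In this noiseless toy the fitted rule transfers perfectly, so the meta-reward is high; the interesting (and open) question in the post is what happens when the object-level solution does not transfer and the meta-level signal has to push against it.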

MichaelStJules is right about what I meant. While it's true that preferring not to experience something doesn't necessarily imply that the thing is net-negative, it seems to me very strong evidence in that direction. 

Hi, instead of clogging up the thread I just thought I'd alert you that I responded to MichaelStJules, which should function equally as a response to your comment.

Entirely agree.  There are certainly chunks of my life (as a privileged first-worlder) I'd prefer not to have experienced, and these generally seem less bad than "an average period of the same duration as a Holocaust prisoner." Given that animals are sentient, I'd put it at ~98% that their lives are net negative.

Preferring not to experience something is not the same thing as it being net negative. You are comparing it to a baseline of your normal life (because not experiencing it is simply continuing to experience your usual utility level).

Good point; complex, real world questions/problems are often not Googleable, but I suspect a lot of time is spent dealing with mundane, relatively non-complex problems. Even in your example, I bet there is something useful to be learned from Googling "DIY cabinet instructions" or whatever. 

I have a real world example. Last week, I noticed a 3M Command Wire Hook kept falling down. Trivial fixes like cleaning the wall as described in the instructions did not work. I tried to search for information about calculating the total load placed on the hook by 5 cables with different lengths and diameters along with various points of support. After about fifteen to thirty minutes of trying to figure out statics (with no formal training besides the standard introductory college physics classes), I gave up. Then, I searched for information about the likely weight of each cable and assumed that the full weight was borne by the hook. The results led me to use a jumbo hook with a five pound capacity, and it had not fallen down after 2 days.

And if this problem had nerd-sniped you a la Xkcd and you want to show off, this is the problem I faced. From left to right:
* Long headphone cable
* Headphone cable held down on desk by a tissue box
* Dell monitor with USB 3 and HDMI cables plugged in
* USB cable supported by a pile of paper
* Headphone cable, USB cable, HDMI cable wrapped together with a velcro strip
* Two speaker cables plugged into a USB dac and amp
* Another velcro strip
* Headphone cable and USB cable connected to USB hub and different USB dac plugged into the hub
* Velcro wrap holding the HDMI cable and two speaker cables together
* Hook that falls down
* Speaker cables dangle down to the floor
* HDMI cable dangles loosely from the hook (not connected on most days)

(I am fairly certain that if you read this far, you ought to be doing something more useful than being nerd sniped by a physics problem.)

Interesting, but I think you're way at the tail end of the distribution on this one. I bet I use Google more than 90%+ of people, but still not as much as I should.

Yes - if not heretical, at least interesting to other people! I'm going to lean into the "blogging about things that seem obvious to me" thing now. 

Fair enough; this might be a good counterargument, though I'm very unsure. How much do mundane "brain workouts" matter? Tentatively, the lack of efficacy of brain-training programs like Lumosity suggests they might not be doing much.

If long COVID usually clears up after eight weeks, that would definitely weaken my point (which would be good news!). I haven't decided whether it would change my overall stance on masking, though.

Even in a scenario where all unvaccinated people were infected with covid, I would expect none of the Georgetown undergraduates to die from covid or get covid longer than 12 weeks. Here's my fermi analysis:
* In your 20s, covid CFR is .0001, compared to .01 for the population as a whole.
* Covid longer than 12 weeks is .03 for the covid population as a whole.
* Assume really long covid scales similarly to death and hospitalization.
* mRNA vaccines reduce both of these by .9.

That gives us .03 x .01 x .1, for a case really-long-covid rate of .00003.

.00003 x 6532 = .2 really long covid
.00001 x 6532 = .07 deaths

And given that you are primarily interacting with other unvaccinated, young individuals, you are less likely to be infected than the average vaccinated person. So the real number is probably less than .1 person getting covid beyond 12 weeks. Let me know if you see errors in my reasoning.
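As a sanity check, the arithmetic in the Fermi estimate above can be reproduced directly. All inputs are the comment's own figures (including the 6532 count of undergraduates), not independent data:

```python
# Reproducing the comment's Fermi arithmetic; every number below is the
# comment's own estimate, not an independently sourced figure.
cfr_20s = 0.0001         # covid CFR in your 20s
cfr_all = 0.01           # covid CFR, population as a whole
long_covid_all = 0.03    # P(covid lasting > 12 weeks), whole covid population
vax_factor = 0.1         # mRNA vaccines reduce both risks by 0.9
n_students = 6532        # the comment's Georgetown undergraduate count

age_factor = cfr_20s / cfr_all  # assume "really long covid" scales like death
p_long = long_covid_all * age_factor * vax_factor
expected_long = p_long * n_students
expected_deaths = cfr_20s * vax_factor * n_students

print(p_long, round(expected_long, 2), round(expected_deaths, 2))
```

The products come out to roughly 0.2 expected really-long-covid cases and 0.07 expected deaths, matching the comment's figures.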

Good point. Implicitly, I was thinking “wearing masks while indoors within ~10 feet of another person or outdoors if packed together like at a rally or concert”

[note: moved answer to comments]
1 · Maxwell Peterson · 3y

Would you be willing to post this as a general post on the main forum? I think lots of people, including myself, would appreciate it!

I don't feel like it's the kind of polished thing I'd put on LW. But here it is on my blog:

Thanks, but I have hardly any experience with Python. Need to start learning.

4 · gilch · 3y
[link] should be a good introduction if you already know another programming language.

Yup, fixing. Gotta get better at proofreading.

From my perspective, this is why society at large needs to get better at communicating the content - so you wouldn't have to be good at "anticipating the content." 

The meaningfulness point is interesting, but I'm not sure I fully agree. Some topics can be meaningful but not interesting (high-frequency trading to donate money) and vice versa (video game design? No offense to video game designers).

I bet we agree on the substance, and that any disagreement is probably just a word choice thing. Like, if we could figure out how to describe and predict the “real content” for a given person - the way they would feel psychologically and physically on a daily basis doing the job - then that would clearly be much more useful than just knowing the topic. And we probably can improve at that task as a society. I just think it is a difficult problem (as you point out), and I worry that solving it might seem to some people like all it requires is a small change in mental focus.

In my experience negotiating a mid-career job change and hearing about the experiences of others doing the same, I am skeptical of how much job shadowing and such helps. However, I have gotten quite a few benefits from the line of thinking you sketch here. In particular, just knowing how many hours per week a job (or course of schooling) demands can be a big help. When I originally considered med school, one of the factors that decided me against it was the 80/90-hour weeks, and lots of reports that med school students/residents who are parents rely entirely on their partner for parenting duties.

By your description, it feels like the kind of book where an author picks a word and then rambles about it like an impromptu speaker. If this had an extraordinary thesis requiring extraordinary evidence like Manufacturing Consent then lots of anecdotes would make sense. But the thesis seems too vague to be extraordinary.

I get the impression of the kind of book where a dense blogpost is stretched out to the length of a book. This is ironic for a book about subtraction.

Yup, very well-put.

Your point about anecdotes got me thinking; an "extraordinary the...

Your comment makes sense. I think the problem goes even deeper. Many nonfiction books project all of human experience onto a single axis and then ramble about that axis. In this case, the axis is "more" vs "less". If you don't understand what "more" and "less" are then this can be educational. But if you do know what "more" and "less" are then the important thing to understand is when should you apply this axis and when shouldn't you. It is one half of a bravery debate. Bravery debates are more about the listener than the facts. Whether you should subtract from your life depends on who you are. I don't think Khunu needs to read about how he should subtract from his life. The problem with Subtract is that the truth-value of its thesis depends on who is reading it. I prefer to read books with observer-independent (i.e. objective) truth values.

Looks like I’m in good company!

I don't think it is operationalizable, but I fail to see why 'net positive mental states' isn't a meaningful, real value. Maybe the units would be apple*minutes or something, where one unit is equivalent to the pleasure you get by eating an apple for one minute. It seems that this could in principle be calculated with full information about everyone's conscious experience. 

Are you using 'utility' in the economic context, for which a utility function is purely ordinal? Perhaps I should have used a different word, but I'm referring to 'net positive conscious mental states,' which intuitively doesn't seem to suffer from the same issues. 

Yes, I was using it in the economic sense. If we say something like "net positive conscious mental states", it's still unclear what it would mean to add up such things. What would "positive conscious mental state" mean, in a sense which can be added across humans, without running into the same problems which come up for utility?

Interesting, thanks. Assuming this effect is real, I wonder how much is due to the physical movement of walking rather than the low-level cognitive engagement associated with doing something mildly goal-oriented (i.e. trying to reach a destination), or something else. 

Thanks for your perspective. 

I've never been able to do intellectual work with background music, and am baffled by people (e.g. programmers) who work with headphones playing music all day. But maybe for them it does just use different parts of the brain.

For me, there is a huge qualitative difference between lyrical music or even "interesting" classical and electronic music, and very "boring," quiet lyric-less music. Can't focus at all listening to lyrics, but soft ambient music feels intuitively helpful (though this could be illusory). This is especially the case when it's a playlist or song I've heard a hundred times before, so the tune is completely unsurprising. 

Yes I’ve heard others say they can’t listen to lyrics. The one thing I’ve started playing recently in the otherwise silent room where I work is quiet birdsong (background level, hardly noticeable). On the grounds it may have a subconscious effect of making me feel I’m outdoors, which may be conducive to creativity (cf walks), or at least be relaxing.

Yes, I was incorrect about Matuschak's position. He commented on reddit here:

"I think Matuschak would say that, for the purpose of conveying information, it would be much more efficient to read a very short summary than to read an entire book."

FWIW, I wouldn't say that! Actually, my research for the last couple years has been predicated on the value of embedding focused learning interactions (i.e spaced repetition prompts) into extended narrative. The underlying theory isn't (wasn't!) salience-based, but basically I believe that strong understanding is pro

...
2 · Stefan De Young · 3y
Thanks for the link! :)

Super interesting and likely worth developing into a longer post if you're so inclined. Really like this analogy.

Great post and thanks for linking to it! Seems like books' function and utility have gotten more attention than I would have expected. 

But then readers would have to repeat this sentence for as long as it takes to read the blog post to get the same effect. Not quite as fun.

Yes this is an excellent point; books increase the fidelity of idea transmission because they place something like a bound on how much an idea can be misinterpreted, since one can always appeal to the author's own words (much more than a blog post or Tweet).

It’s not that individual journalists don’t trust Wikipedia, but that they know they can’t publish an article in which a key fact comes directly from Wikipedia without any sort of corroboration. I assume, anyway. Perhaps I’m wrong.

I don't think that key facts are often sourced via Wikipedia. On the other hand, many facts that you find in a newspaper article aren't the key facts. 

Great post! Is ego depletion just another way of conceptualizing rising marginal cost of effort? Like, maybe it is a fact of human psychology that the second hour of work is more difficult and unpleasant than the first. 

Interesting question - to what extent is ego depletion (insofar as it occurs) related to rising marginal cost of effort? It feels to me that is part of what's going on, but maybe not all of it. For instance, some forms of effort feel like their marginal cost only goes up gradually, and others more steeply. Motivation also seems relevant (it can go down over time) and that seems to have less to do with marginal cost from what I can tell.

I don't know much more than you could find searching around r/nootropics, but my sense is that the relationship between diet and cognition is highly personal, so experimentation is warranted. Some do best on keto, others as a vegan, etc. With respect to particular substances, it seems that creatine might have some cognitive benefits, but once again supplementation is highly personal. DHA helps some people and induces depression in others, for example.

Also, inflammation is a common culprit/risk factor for many mental issues, so I'd expect that a generally "...

Yes, you're correct. As others have correctly noted, there is no unambiguous way of determining which effects are "direct" and which are not. However, suppose decriminalization does decrease drug use. My argument emphasizes that we would need to consider the reduction in time spent enjoying drugs as a downside to decriminalization (though I doubt this would outweigh the benefits associated with lower incarceration rates). It seems to me that this point would frequently be neglected.

There is a good amount of this discussion at r/nootropics - of which some is evidence based and some is not. For example, see this post

1 · Just Learning · 3y
Thank you. There was one paper in the post about older adults and calorie restriction. However, it is kind of biased - they had slightly overweight people in the experiment. So yes, calorie restriction is good for overweight people. Duh.  Do you know any other studies? Thank you! 

Thanks very much. Just fixed that. 

This is a good point. Could also be that discussing only points that might impact oneself seems more credible and less dependent on empathy, even if one really does care about others directly.

Fair point, but you'd have to think that the tendencies of the patent officers changed over time in order to foreclose that as a good metric. 

I do think that standards of what is a trivial invention change over time. There are court cases that invalidate certain patents and then patent officers change their patent giving to not give out the kind of patents that are likely to be declared invalid. Laws also change.

I meant objective in the sense that the metric itself is objective, not that it is necessarily a good indicator of innovation. Yes, you're right. I do like Cowen and Southwood's method of only counting patents registered in all of the U.S., Japan, and E.U. 

The subjects making the judgment here seem to be bureaucrats in the patent office. I don't see how that's substantially more objective than historians making judgments.

Basically agree with this suggestion: broader metrics are more likely to be unbiased over time. Even the electric grid example, though, isn't ideal because we can imagine a future point where going from $0.0001 to $0.000000001 per kilowatt-hour, for example, just isn't relevant. 

Total factor productivity and GDP per capita are even better, agreed. 

While a cop-out, my best guess is that a mixture of qualitative historical assessments (for example, asking historians, entrepreneurs, and scientists to rank decades by degree of progress) and using a v...

Patent rates aren't an objective measure of innovation. Cutting down the number of trivial patents might very well mean increased, not decreased, innovation.

Thank you! Should have known someone would have beat me to it. 

I was thinking the third bullet, though the question of perverse incentives needs fleshing out, which I briefly alluded to at the end of the post:

“Expected consequences”, for example, leaves under-theorized when you should seek out new, relevant information to improve your forecast about some action’s consequences.

My best guess is that this isn't actually an issue, because you have a moral duty to seek out that information, as you know a priori that seeking out such info is net-positive in itself. 

Thanks for your insight. Yes, the "we simplify this for undergrads" thing seems most plausible to me. I guess my concern is that in this particular case, the simplification from "expected consequences matter" to "consequences matter" might be doing more harm than good. 

3 · Vaughn Papenhausen · 3y
This could well be true. It's highly possible that we ought to be teaching this distinction, and teaching the expected-value version when we teach utilitarianism (and maybe some philosophy professors do, I don't know). Also, here's a bit in the SEP on actual vs expected consequentialism: