Morgan_Rogers — LessWrong

I can say from first- and second-hand experience that one of the many hard parts of supervising a PhD or Masters student in research is taking someone who lies at one end of the bird-frog spectrum and pushing them to acquire the skills they need from the other end. To get to the point of pursuing research in the first place, you're likely to be either someone technically skilled who can easily work out the fine details of a problem and habitually focuses on examples, or someone who has enough of an appreciation for the overarching ideas to be motivated to build them further -- it sounds like you are/were of the latter variety. If you don't acquire some skills and perspective from the other end, you'll inevitably drive yourself into a dead end: in the former case, one risks sinking much time into elaborating specific cases while missing a general result that simplifies matters; in the latter, one can work for a long time on a false claim because there is insufficient grounding in verifiable cases.

At the start of a research career, a responsible supervisor must push their student to be independent, but there is a compromise between giving space and giving guidance. It seems like your adviser wasn't paying close enough attention to your work to see that you hadn't done the basics, which is how you ended up spending so long on this without realising that you didn't have an 'empirical' basis for what you were trying to prove in the first place. The fact that you weren't getting pushback on your reluctance to read references also seems like a red flag.

All this is to say that a moral of the story could be for PhD supervisors (who, by the way, almost universally get no specific training for that role): just because a student is confident doesn't mean they have everything it takes to do research, and you need to make sure that they aren't wasting their time.

A Bird's Eye View of the ML Field [Pragmatic AI Safety #2]

Morgan_Rogers3y20

This post sought to give an overview of how they do this, which is in my view extremely useful information!

This is what I was trying to question with my comment above: Why do you think this? How am I to use this information? It's surely true that this is a community that needs to be convinced of the importance of work on safety, as you point out in the next post in the sequence, but how does information about, say, the turnover of ML PhD students help me do that?

Thus to answer the question "what kind of research approaches generally work for shaping machine learning systems?" it is quite useful to engage with how they have worked in capabilities advancements. In machine learning, theoretical (in "math proofs" sense of the word) approaches to advancing capabilities have largely not worked. This suggests deep learning is not amenable to these kinds of approaches.

There is a conflation happening here which undermines your argument: theoretical approaches dominated how machine learning systems were shaped for decades, and you say so at the start of this post. It turned out that automated learning produced better results in terms of capabilities, and it is that success that makes it the continued default. But the former fact surely says a lot more about whether or not theory can "shape machine learning systems" than the latter. Following through with your argument, I would instead conclude that implementing theoretical approaches to safety might require us to compromise on capabilities, and this is indeed exactly what I expect: learning systems would have access to much more delicious data if they ignored privacy regulations and other similar ethical boundaries, but safety demands that capability is not the singular shaping consideration in AI systems.

Knowledge that useable theory has not really been produced in deep learning suggests to me that it's unlikely to for safety, either.

This is simply not true. Failure modes which were identified by purely theoretical arguments have been realised in ML systems. System attacks and pathological behaviour (for image classifiers, say) are regularly built in theory before they ever meet real systems. It's also worth noting that architecture choices, or changes made to, say, make backprop more algorithmically efficient, are driven by theory.

In the end, my attitude is not that "iterative engineering practices will never ensure safety", but rather that there are plenty of people already doing iterative engineering, and that while it's great to convince as many of those as possible to be safety-conscious, there would be further benefits to safety if some of their experience could be applied to the theoretical approaches that you're actively dismissing.

A Bird's Eye View of the ML Field [Pragmatic AI Safety #2]

Morgan_Rogers3yΩ4100

There is a disheartening irony to calling this series "Pragmatic AI Safety" and having the longest post be about capabilities advancements which largely ignore safety.

The first part of this post consists in observing that ML applications proceed from metrics, and subsequently arguing that theoretical approaches have been unsuccessful in learning problems. This is true but irrelevant for safety, unless your proposal is to apply ML to safety problems, which reduces AI Safety to 'just find good metrics for safe behaviour'. This seems as far from a pragmatic understanding of what is needed in AI Safety as one can get.

In the process of dismissing theoretical approaches, you ask "Why do residual connections work? Why does fractal data augmentation help?" These are exactly the kind of questions which we need to be building theory for, not to improve performance, but for humans to understand what is happening well enough to identify potential risks orthogonal to the benchmarks which such techniques are improving against, or trust that such risks are not present.

You say, "If we want to have any hope of influencing the ML community broadly, we need to understand how it works (and sometimes doesn’t work) at a high level," and provide similar prefaces as motivation in other sections. I find these claims credible, assuming the "we" refers to AI Safety researchers, but considering the alleged pragmatism of this sequence, it's surprising to me that none of the claims are followed up with suggested action points. Given the information you have provided, how can we influence this community? By publishing ML papers at NeurIPS? And to what end are you hoping to influence them? AI Safety can attract attention, but attention alone doesn't translate into progress (or even into more person-hours).

Your disdain for theoretical approaches is transparent here (if it wasn't already from the name of this sequence). But your reasoning cuts both ways. You say, "Even if the current paradigm is flawed and a new paradigm is needed, this does not mean that [a researcher's] favorite paradigm will become that new paradigm. They cannot ignore or bargain with the paradigm that will actually work; they must align with it." I expect that 'metrics suffice' (a strawperson of your favoured paradigm) will not be the paradigm that will actually work, and it's disappointing that your sequence carries the message (to my reading) that technical ML researchers can make significant progress in alignment and safety without really changing what they're doing.

[Closed] Job Offering: Help Communicate Infrabayesianism

Morgan_Rogers4yΩ030

If I haven't found a way to extend my post-doc position (ending in August) by mid-July and by some miracle this job offer is still open, it could be the perfect job for me. Otherwise, I look forward to seeing the results.

Goal-directedness: exploring explanations

Morgan_Rogers4y10

A note on judging explanations

I should address a point that wasn't addressed in the post, and which may otherwise be a point of confusion going forward: the quality of an explanation can be high according to my criteria even if it isn't empirically correct. That is, there are some explanations of behaviour which may be falsifiable: if I am observing a robot, I could explain its behaviour in terms of an algorithm, and one way to "test" that explanation would be to discover the algorithm which the robot is in fact running. However, no matter the result of this test, the judged quality of the explanation is not affected. Indeed, there are two possible outcomes: either the actual algorithm provides a better explanation overall, or our explanatory algorithm could be a simpler algorithm with the same effects, and hence be a better explanation than the true one, since using this simpler algorithm is a more efficient way to predict the robot's behaviour than simulating the robot's actual algorithm.

This might seem counterintuitive at first, but it's really just Occam's razor in action. Functionally speaking, the explanations I'm talking about in this post aren't intended to be recovering the specific algorithm the robot is running (just as we don't need the specifics of its hardware or operating system); I am only concerned with accounting for the robot's behaviour.

Harmful Options

Morgan_Rogers4y10

Suppose your computer games, in addition to the long difficult path to your level's goal, also had little side-paths that you could use—directly in the game, as corridors—that would bypass all the enemies and take you straight to the goal, offering along the way all the items and experience that you could have gotten the hard way. And this corridor is always visible, out of the corner of your eye.

Even if you resolutely refused to take the easy path through the game, knowing that it would cheat you of the very experience that you paid money in order to buy—wouldn't that always-visible corridor, make the game that much less fun? Knowing, for every alien you shot, and every decision you made, that there was always an easier path?

This exact phenomenon happens in Deus Ex: Human Revolution, where you can get around almost every obstacle in the game by using the ventilation system. The frustration that results is apparent in this video essay/analysis: it undermines all of the otherwise well-designed systems in the game in spite of not actually interfering with the player's ability to engage with them.

I wonder if, alongside the "loss of rejected options" proposition, a reason that extra choices impact us is the mental bandwidth they take up. If the satisfaction we derive from a choice is (to a first-order approximation) proportional to our intellectual and emotional investment in the option we select, then having more options leaves less to invest as soon as the options go from being free to having any cost at all. As an economic analogy, a committee seeking to design a new product or building must choose between an initial set of designs. The more designs there are, the more resources must go into the selection procedure, and if the committee's budget is fixed, then this will remove resources that could have improved the product further down the line.

Why Rationalists Shouldn't be Interested in Topos Theory

Morgan_Rogers4y60

[0,1] is a commutative quantale when equipped with its usual multiplication. You can lift the monoidal product structure to sheaves on [0,1] (viewed as a frame) via Day convolution. So we recover a topos where the truth values are probabilities.

People who have attempted to build toposes with probabilities as truth values have also failed to notice this. Take Isham and Doering's paper, for example, (which I personally am quite averse to because they bullishly follow through on constructing toposes with certain properties which are barely justified). They don't even think about products of probabilities.

I think the monoidal topos on the unit interval merits some serious investigation.
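For concreteness, the quantale claim comes down to the following standard conditions. This is a sketch using the textbook definition of a commutative unital quantale; the index set $I$ and the symbols $a$, $b$, $c$, $b_i$ are notation introduced here, not taken from the post.

```latex
% [0,1] with its usual order is a complete lattice (joins are suprema).
% Ordinary multiplication is commutative and associative, with unit 1:
\[
  a \cdot b = b \cdot a, \qquad
  (a \cdot b) \cdot c = a \cdot (b \cdot c), \qquad
  1 \cdot a = a
  \qquad \text{for all } a, b, c \in [0,1].
\]
% Multiplication distributes over arbitrary joins, since x \mapsto a x
% is monotone and continuous on [0,1] (the empty join is 0, and a \cdot 0 = 0):
\[
  a \cdot \Bigl( \sup_{i \in I} b_i \Bigr) \;=\; \sup_{i \in I} \, (a \cdot b_i)
  \qquad \text{for all } a \in [0,1] \text{ and all families } (b_i)_{i \in I} \text{ in } [0,1].
\]
```

Since the unit 1 is also the top element, this quantale is integral, which is the sense in which multiplying probabilities never increases a truth value.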

Goal-directedness: exploring explanations

Morgan_Rogers4y10

I see what you're getting at. For an arbitrary explanation, we need to take into account not only the complexity of the explanation itself, but also how difficult it is to compute a relevant prediction from that explanation; according to my criteria, the Standard Model (or any sufficiently detailed theory of physics that accurately explains phenomena within a conservative range of low-ish energy environments encountered on Earth) would count as a very good explanation for any behaviour for its complexity, but that's ignoring the fact that it would be impossible to actually compute those predictions.

While I made the claim that there is a clear dividing line between (accuracy and power) and (complexity), this strikes me as an issue straddling complexity and explanatory power, which muddies the water a little.

Since I've appealed to physics explanations in my post, I'm glad you've made me think about these points. Moving forward, though, I expect the classes of explanation under consideration to be so constrained as to make this issue insignificant. That is, I expect to be directly comparing explanations taking the form of goals to explanations taking the form of algorithms or similar; each of these has a clear interpretation in terms of its predictions and, while the former might be harder to compute, the difference in difficulty is going to be suitably uniform across the classes (after accounting for complexity of explanations), so that I feel justified in ignoring it until later.

Goal-directedness: my baseline beliefs

Morgan_Rogers4y20

Thanks for the ideas!

I like the idea about the size of the target states; there's bound to be some interesting measure theory that I can apply if I decide to formalize in that direction. In fact, measure theory might be able to clarify some of the subtleties I alluded to above regarding what happens when we refine the world model (for example, in a way that causes a single goal state to split into two or more).

There are hints in your last paragraph of associating competence with goal-directedness, which I think is an association to avoid. For example, when a zebra is swimming across a river as fast as it can, I would like the extent to which that behaviour is considered goal-directed to be independent of whether that zebra is the one that gets attacked by a crocodile.

Why Subagents?

Morgan_Rogers4yΩ470

The example you give has a pretty simple lattice of preferences, which lends itself to illustrations but which might create some misconceptions about how the subagent model should be formalized. For example, in your example you assume that the agents' preferences are orthogonal (one cares about pepperoni, the other about mushrooms, and each is indifferent to the opposite direction), the agents have equal weighting in the decision-making, the lattice is distributive... Compensating for these factors, there are many ways that a given 'weak utility' can be expressed in terms of subagents. I'm sure there are optimization questions that follow here, about the minimum number of subagents (dimensions) needed to embed a given weak-utility function (partially ordered set), and about when reasonable constraints such as orthogonality of subagents can be imposed. There are also composition questions: how does a committee of agents with subagents behave?
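To illustrate the point that one weak utility can be expressed by more than one choice of subagents, here is a minimal sketch. It assumes a committee rule not spelled out above: the committee prefers one state to another when no subagent is worse off and at least one is strictly better off. The state encoding and all function names are hypothetical.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

# Hypothetical encoding: a state is (amount of pepperoni, amount of mushroom).
State = Tuple[int, int]
Utility = Callable[[State], float]


def committee_prefers(a: State, b: State, subagents: Sequence[Utility]) -> bool:
    """Assumed rule: a beats b if no subagent loses and at least one gains."""
    gains = [u(a) - u(b) for u in subagents]
    return all(g >= 0 for g in gains) and any(g > 0 for g in gains)


def incomparable(a: State, b: State, subagents: Sequence[Utility]) -> bool:
    """True when some subagent strictly prefers a and another strictly prefers b."""
    gains = [u(a) - u(b) for u in subagents]
    return any(g > 0 for g in gains) and any(g < 0 for g in gains)


# Two orthogonal subagents, as in the pizza example.
orthogonal: Sequence[Utility] = (lambda s: s[0], lambda s: s[1])

# A different pair of subagents: each utility is rescaled by a positive factor.
rescaled: Sequence[Utility] = (lambda s: 2 * s[0], lambda s: 3 * s[1])


def induced_order(subagents: Sequence[Utility], states: Sequence[State]) -> set:
    """All ordered pairs (a, b) such that the committee prefers a to b."""
    return {(a, b) for a, b in product(states, repeat=2)
            if committee_prefers(a, b, subagents)}


states = list(product(range(3), repeat=2))  # a small 3x3 grid of states
same_order = induced_order(orthogonal, states) == induced_order(rescaled, states)
print(same_order)
print(incomparable((1, 0), (0, 1), orthogonal))
```

Both committees induce the same partial order on the grid, so the weak utility alone does not pin down the subagents. Questions such as the minimum number of subagents needed would require searching over such representations, which this sketch does not attempt.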
