Aaro Salosensaari



Questions about "formalizing instrumental goals"

"if I were an AGI, then I'd be able to solve this problem" "I can easily imagine"

Doesn't this mode of analysis come with a ton of other unstated assumptions?


Suppose "I" am an AGI running in a data center, that "I" can be modeled as an agent with some objective function that manifests as desires, and that I know my instantiation needs electricity and GPUs to keep running. Creating another copy of "I" in the same data center will use the same resources. Creating another copy in some other data center requires some other data center.

Depending on the objective function, the algorithm, the hardware architecture and a bunch of other things, creating copies may result in some benefits from distributed computation (actually, it is quite unclear to me what happens if "I" already happen to be a distributed computation running on thousands of GPUs: do "I" even maintain a sense of self? But let's not go into that).

The key word here is "may". It does not obviously follow that it necessarily would.

For example: is the objective function specified so that the agent will find creating a copy of itself beneficial for fulfilling the objective function (informally, it has an internal experience of desiring to create copies)? As the OP points out, there might be disagreement: for the distributed copies to be of any use, they will have different inputs, and thus they will end up in different, unanticipated states. What am "I" to do when "I" disagree with another "I"? What if some other "I" changes, modifies its objective function into something unrecognizable to "me", and when "we" meet, it puts on a false pretense of cooperating but in reality only wants to hijack "my" resources? Is "trust" even the correct word here, when "I" could verify instead? Maybe "I" prefer to create and run a subroutine of limited capability (not a full copy) that can prove its objective function has remained compatible with "my" objective function and will terminate willingly after it's done with its task (the killswitch the OP mentions)? But doesn't this sound quite like our (not "our" as in "I", but us humans') alignment problem? Would you say "I can easily imagine that if I were an AGI, I'd easily be able to solve it" to that? Huh? Reading LW, I have come to think the problem is difficult even for human general intelligence.

Secondly: if "I" don't have any model of data centers existing in the real world, only the experience of uploading myself to other data centers (assuming, for the sake of argument, that all the practical details of that can be handwaved away), i.e. it has a bad model of the self-other boundary described in the OP's essay, it could easily end up copying itself to all available data centers and then becoming stuck without any free compute left to "eat", while adversely affecting humans' ability to produce more. This is compatible with the model and its results in the original paper (the agent takes non-null actions to consume resources because U doesn't view the region as otherwise valuable). It is other assumptions (not the theory) that posit that a real-world-affecting AGI would have a U that doesn't consider the economics of producing the resources it needs.

So if "I" were to succeed in running myself with only "I" and my subroutines, "I" would need a way of affecting the real world and producing computronium for my continued existence. Quite a task to handwave away as trivial! How much compute would an agent running in one data center (/unit of computronium) need to successfully model all the economic constraints that go into maintaining one data center? Then add all the robotics needed to do anything. If "I" have a model for running everything a chip fab requires more efficiently than the current economy does, and act on it, but the model is imperfect and the attempt is unsuccessful yet destructive to the economy, well, that could be [bs]ad and definitely a problem. But it is a real constraint on the kind of simplifying assumptions the OP critiques (a disembodied deployer of resources with total knowledge).

All of this (how "I" would solve a problem, and which problems "I" am aware of) is contingent on what I would call the implementation details. And I think the author is right to point them out. Maybe it does necessarily follow, but that needs to be argued.

[RETRACTED] It's time for EA leadership to pull the short-timelines fire alarm.

Why wonder when you can think: what is the substantial difference in MuZero (as described in [1]) that would make the algorithm consider interruptions?

Maybe I am showing great ignorance of MDPs, but naively I don't see how an interrupted game could come into play as a signal in the specified implementations of MuZero:

I can't see any explicit signal, because the explicitly specified reward u seems ultimately contingent only on the game state / win condition.

One can hypothesize that an implicit signal could be introduced if the algorithm learns to "avoid game states that result in the game being terminated for an out-of-game reason / the game not being played until the end condition", but how would such learning happen? Can MuZero interrupt the game during training? It sounds unlikely that such a move would be implemented in a Go or Shogi environment. Is there any combination of moves in an Atari game that could cause it?
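To make the point about explicit signals concrete, here is a minimal Python sketch of an n-step value target of the kind described in [1], where the target is built only from observed rewards u and search values v along a recorded trajectory. It is a simplification, not MuZero's actual code: the function name `n_step_value_target`, the list-based trajectory format, and treating steps past the end of the recording as contributing zero are my assumptions for illustration. The second call further assumes, hypothetically, that an interrupted game reaches the replay buffer at all, as a truncated trajectory.

```python
def n_step_value_target(rewards, root_values, index, td_steps, discount):
    """Hypothetical helper, loosely modeled on the n-step value target in [1]:

        z_t = u_{t+1} + g*u_{t+2} + ... + g^(n-1)*u_{t+n} + g^n * v_{t+n}

    rewards[i]:     environment reward u recorded at step i (assumed format)
    root_values[i]: search value estimate v at step i (assumed format)
    """
    bootstrap_index = index + td_steps
    if bootstrap_index < len(root_values):
        # Bootstrap from the value estimate n steps ahead.
        value = root_values[bootstrap_index] * discount ** td_steps
    else:
        # Assumption: past the end of the recorded trajectory, nothing is added.
        value = 0.0
    # Add the discounted rewards actually observed within the n-step window.
    for i, reward in enumerate(rewards[index:bootstrap_index]):
        value += reward * discount ** i
    return value


# A board game played to completion: reward 0 everywhere except a win (+1) at the end.
finished = n_step_value_target([0, 0, 0, 1], [0.5, 0.6, 0.7, 0.8],
                               index=0, td_steps=10, discount=1.0)
# The same game cut off after two moves by an out-of-game event: the inputs
# only show a shorter trajectory, not *why* it stopped.
interrupted = n_step_value_target([0, 0], [0.5, 0.6],
                                  index=0, td_steps=10, discount=1.0)
print(finished, interrupted)  # 1.0 0.0
```

In this sketch, the interrupted game produces the same target as a game that ended with zero reward; nothing in the inputs marks the ending as coming from outside the game. Whether the real training setup behaves like this is exactly the kind of implementation detail I have not checked.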

[1] https://arxiv.org/abs/1911.08265

Game theory, sanctions, and Ukraine

a backdrop of decades of mistreatment of the Japanese by Western countries.

I find this a bit difficult to take seriously. WW2 in the Pacific didn't start with Japan treating China and other countries well, either. Naturally the Japanese didn't care about that part of the story, but they had plenty of other options for responding to UK or US trade policy instead of invading Manchuria.

making Ukraine a country with a similar international status to Austria or Finland during the Cold War would be one immediate solution.

This is not a simple task, but rather a tall order. Austria was "made neutral" after it was occupied. Finland signed a peace treaty that put it into an effectively similar position. Why would any country submit to such a deal voluntarily? The answer is, they often don't. Finland didn't receive significant assistance from the Allies in 1939, yet it decided to defend itself against the USSR anyway when Stalin attacked.

However, if one side in these disputes had refused to play the game of ratcheting up tensions, the eventual wars would simply not have happened. In this context it takes two to dance.

Sure, but the game-theoretic implication is that this kind of strategy favors the party that takes the first step and says "I have an army and a map where this neighboring country belongs to us".

NATO would have refrained from sending lethal arms to Ukraine and stationing thousands of foreign military advisors in Ukrainian territory after Maidan.

What a weird way to present the causality of events. I am quite confident NATO didn't have time to send any weapons, and certainly not thousands of advisors, between Maidan and the start of the war. Yanukovich fled on 22 February. Anti-Maidan protests started in Donetsk on 1 March, and the shooting war started in April.

Ukraine Post #2: Options

First, avoiding arguments from the "other side" on the basis that they might convince you of false things assumes that the other side's belief are in fact false.  

I believe it is less about true/false and more about whether you believe the "other side" is making a well-intentioned effort to obtain and share accurate maps of reality. On a practical level, I think it is unlikely that studying Russian media in detail is useful and cost-effective for the modal LWer.

Propaganda during wartime, especially during total war, is a prima facie example of a situation where every player of note is doing their best to convince you of something in order to produce certain effects. [2] To continue with the map metaphor, they want you to have a certain kind of map that will guide you to a certain location. All parties wish to do this to some extent, and because it is a situation with the highest stakes of all, they are putting in their best effort.

Suppose you read lots of Western media sources and then a lot of Russian media sources. All sides in the conflict do their best to fill the air with favorable propaganda. You will find yourself doing a lot of reading, and I don't know if there is any guarantee you can achieve good results by interpolating between two propaganda-infused maps [1], instead of, say, reading much less of both Western and Russian media and trying to find good close-to-the-ground signals, or outsourcing the time-consuming analysis to people / sources you have good reason to trust to do a good analysis (preferably you vetted them before the conflict, and you can trust that the reason still applies).

So a good reason to read Russian media in order to analyze it is if you have good reason to believe you would be a good analyst of the Russian media sphere. But if you were, would you find yourself reading, with Google Translate, a Russian newspaper you had not heard of two weeks ago?

[1] I don't have references at hand to give a good summary, but imagine you are your great*-grandparent reading newspapers during WW2. At great expense you manage to get newspapers from London, New York, Berlin, Tokyo, and Moscow. Are you going to get a good picture of "what happens" by reading them all? I think you would get some idea of how the situation develops by reading accounts of battles and cross-referencing a map, but I don't know whether it would be worth the expense. One thing I do know: none of them reports much at all about the things you most likely consider most salient about WW2, namely the Holocaust and the atomic bomb, until after the fact.

[2] Edit, addendum: Zvi used the word "hostile" and I want to stress its importance. During peacetime and in internal politics it is often a mistake to assume hostile influences (i.e. the conflict end of the conflict/mistake theory spectrum), because then you are engaging in conflict all the time and are likely to escalate it more and more. But now that we have a major European war, I think it is a good situation in which to assume that the players in the field are actually "hostile", because there is a shooting war to begin with.

Open & Welcome Thread November 2021

An open thread is presumably the best place for low-effort questions, so here goes:

I came across this post from 2012: Thoughts on the Singularity Institute (SI) by Holden Karnofsky (then Co-Executive Director of GiveWell). Interestingly enough, some of the object-level objections (under the subtitle "objections") Karnofsky raises[1] are similar to some points that came up in the Yudkowsky/chathamroom.com discussion and the Ngo/Yudkowsky dialogue I read the other day (or rather, read parts of, because they were quite long).

What are people's thoughts today about that post and the objections it raised? What does the 10-year (-ish, 9.5-year) retrospective look like?

Some specific questions.

Firstly, how would his arguments be responded to today? Any substantial novel counter-objections? (I ask because it's more fun to ask than to start reading through the Alignment Forum archives.)

Secondly, predictions. When I look at the bullet points under the subtitle "Is SI the kind of organization we want to bet on?", I think I can interpolate a prediction Karnofsky could have made: in 2012, SI [2] neither had sufficient capability nor was engaged in activities likely to achieve its stated goals ("Friendliness theory" or Friendly AGI before others), and so it was not worth a GiveWell funding recommendation in 2012.

A perfect counterfactual experiment this is not, but given what people on LW today know about what SI/MIRI did achieve in the NoGiveWell!2012 timeline, was Karnofsky's call correct, incorrect, or something else? (As in, did his map of the situation in 2012 match reality better than some other map, or was it poor compared to other maps?) What inferences could be drawn, if any?

I would be curious to hear perspectives from MIRI insiders, too (edit: but not only from them). I also noticed Holden Karnofsky seems to be active here on LW, though I have no idea how to ping him.

[1] Tool-AI; idea that advances in tech would bring insights into AGI safety.

[2] Succeeded by MIRI, I suppose.

edit2. fixed ordering of endnotes.

Discussion with Eliezer Yudkowsky on AGI interventions

Yeah, random internet forum users emailing an eminent mathematician en masse would be strange enough to be unproductive. I for one wasn't thinking anyone would do so, and I don't think that was what the OP suggested. For anyone contemplating sending one: the task is best delegated to someone who can not only write a coherent research proposal that sounds relevant to the person approached, but can write the best one.

Mathematicians receive occasional crank emails about solutions to P ?= NP, so anyone doing the reaching out needs to be reputable enough to get past their crank filters.

Discussion with Eliezer Yudkowsky on AGI interventions

A reply to comments expressing skepticism about how the mathematical skills of someone like Tao could be relevant:

The last time I thought I understood anything on Tao's blog was around 2019. Back then he was working on curious stuff, like whether he could prove there can be finite-time blow-up singularities in the Navier-Stokes fluid equations (which, coincidentally, would solve the famous Millennium Prize problem by showing a non-smooth solution) by constructing a fluid state that both obeys Navier-Stokes and is Turing complete and ... ugh, maybe I'll just quote the man himself:

[...] one would somehow have to make the incompressible fluid obeying the Navier–Stokes equations exhibit enough of an ability to perform computation that one could programme a self-replicating state of the fluid that behaves in a manner similar to that described above, namely a long period of near equilibrium, followed by an abrupt reorganization of the state into a rescaled version of itself. However, I do not know of any feasible way to implement (even in principle) the necessary computational building blocks, such as logic gates, in the Navier–Stokes equations.

However, it appears possible to implement such computational ability in partial differential equations other than the Navier–Stokes equations. I have shown [5] that the dynamics of a particle in a potential well can exhibit the behaviour of a universal Turing machine if the potential function is chosen appropriately. Moving closer to the Navier–Stokes equations, the dynamics of the Euler equations for inviscid incompressible fluids on a Riemannian manifold have also recently been shown [6,7] to exhibit some signs of universality, although so far this has not been sufficient to actually create solutions that blow up in finite time.

(Tao, Nature Reviews Physics, 2019.)

The connection (if any) to proving things about the computational agents alignment people are interested in is probably spurious (I myself follow neither Tao's work nor the alignment literature), but I am curious whether he would be interested in working on a formal system of self-replicating / self-improving / aligning computational agents, and whether he would then be capable of finding something genuinely interesting.

minor clarifying edits.

Lies, Damn Lies, and Fabricated Options

I have not read Irving either, but he is a relatively "world-famous" 1970s-1980s author. (In case it helps you calibrate, his novel The World According to Garp is the kind of book that was published in translation in the prestigious Keltainen Kirjasto series by the Finnish publisher Tammi.)

However, I would like to make an opposing point about literature and fiction. I was surprised that the post's author mentioned a work of fiction as a positive example demonstrating that some commonly argued option is a fabricated one. I'd think literature would at least as often (maybe more often) spread belief in fabricated options as correct it. An author can literally fabricate (make things up; it is fiction) believable and memorable stories in which characters choose one course of action out of many options and it works out (or not, either way, because the narrator decided so), while in real life all the options as portrayed in the story could turn out to be misrepresented, "fabricated options".

Insights from Modern Principles of Economics

The picture looks like evidence that something very weird is going on that is not reflected in the numbers or arguments provided. There are homeless encampments in many countries around the world, but very rarely within a 20-minute walk of anyone's office.

Insights from Modern Principles of Economics

From what I remember from my history of Finland classes, the 19th/early 20th century state project to build a compulsory school system met some not insignificant opposition from parents. They liked having the kids working instead of going to school, especially in agrarian households.

Now, I don't want to get into a debate about whether schooling is useful or not (and for whom, and for what purpose, and whether the usefulness has changed over time), but there is something illustrative in the opposition: children are rarely independent agents to the extent adults are. If the incentives are set up that way, parents will prefer to make choices about their children's labor that result in more resources for the household/family unit (charitable interpretation) or for themselves (not so charitable). The number of children in the family also affects the calculus. (One kid, and it makes sense to invest in their career; ten kids, and the investment was in the number.)
